<!-- ##### SECTION Title ##### -->
Character Set Conversion

<!-- ##### SECTION Short_Description ##### -->
convert strings between different character sets using iconv()

<!-- ##### SECTION Long_Description ##### -->
<para>

</para>

  <refsect2 id="file-name-encodings">
    <title>File Name Encodings</title>

    <para>
      Historically, Unix has not had a defined encoding for file
      names: a file name is valid as long as it does not have path
      separators in it ("/"). However, displaying file names may
      require conversion: from the character set in which they were
      created, to the character set in which the application
      operates. Consider the Spanish file name
      "<filename>Presentación.sxi</filename>". If the
      application which created it uses ISO-8859-1 for its encoding,
      then the actual file name on disk would look like this:
    </para>

    <programlisting id="filename-iso8859-1">
Character: P  r  e  s  e  n  t  a  c  i  ó  n  .  s  x  i
Hex code:  50 72 65 73 65 6e 74 61 63 69 f3 6e 2e 73 78 69
    </programlisting>

    <para>
      However, if the application uses UTF-8, the actual file name on
      disk would look like this:
    </para>

    <programlisting id="filename-utf-8">
Character: P  r  e  s  e  n  t  a  c  i  ó     n  .  s  x  i
Hex code:  50 72 65 73 65 6e 74 61 63 69 c3 b3 6e 2e 73 78 69
    </programlisting>

    <para>
      GLib uses UTF-8 for its strings, and GUI toolkits like GTK+
      that use GLib do the same thing. If you get a file name from
      the file system, for example, from
      <function>readdir(3)</function> or from <link
      linkend="g_dir_read_name"><function>g_dir_read_name()</function></link>,
      and you wish to display the file name to the user, you
      <emphasis>will</emphasis> need to convert it into UTF-8.
      The opposite case is when the user types the name of a file
      to save: the toolkit will give you that string in
      UTF-8 encoding, and you will need to convert it to the
      character set used for file names before you can create the
      file with <function>open(2)</function> or
      <function>fopen(3)</function>.
    </para>

    <para>
      By default, GLib assumes that file names on disk are in UTF-8
      encoding. This is a valid assumption for file systems which
      were created relatively recently: most applications use UTF-8
      encoding for their strings, and that is also what they use for
      the file names they create. However, older file systems may
      still contain file names created in "older" encodings, such as
      ISO-8859-1. In this case, for compatibility reasons, you may
      want to instruct GLib to use that particular encoding for file
      names rather than UTF-8. You can do this by specifying the
      encoding for file names in the <link
      linkend="G_FILENAME_ENCODING"><envar>G_FILENAME_ENCODING</envar></link>
      environment variable. For example, if your installation uses
      ISO-8859-1 for file names, you can put this in your
      <filename>~/.profile</filename>:
    </para>

    <programlisting>
export G_FILENAME_ENCODING=ISO-8859-1
    </programlisting>

    <para>
      GLib provides the functions <link
      linkend="g_filename_to_utf8"><function>g_filename_to_utf8()</function></link>
      and <link
      linkend="g_filename_from_utf8"><function>g_filename_from_utf8()</function></link>
      to perform the necessary conversions. These functions convert
      file names from the encoding specified in
      <envar>G_FILENAME_ENCODING</envar> to UTF-8 and vice versa.
      <xref linkend="file-name-encodings-diagram"/> illustrates how
      these functions are used to convert between UTF-8 and the
      encoding for file names in the file system.
    </para>

    <figure id="file-name-encodings-diagram">
      <title>Conversion between File Name Encodings</title>
      <graphic fileref="file-name-encodings.png" format="PNG"/>
    </figure>

    <refsect3 id="file-name-encodings-checklist">
      <title>Checklist for Application Writers</title>

      <para>
        This section is a practical summary of the detailed
        description above. You can use this as a checklist of
        things to do to make sure your applications process file
        name encodings correctly.
      </para>

      <orderedlist>
        <listitem>
          <para>
            If you get a file name from the file system from a
            function such as <function>readdir(3)</function> or
            <function>gtk_file_chooser_get_filename()</function>,
            you do not need to do any conversion to pass that
            file name to functions like <function>open(2)</function>,
            <function>rename(2)</function>, or
            <function>fopen(3)</function>; those are "raw"
            file names which the file system understands.
          </para>
        </listitem>

        <listitem>
          <para>
            If you need to display a file name, convert it to UTF-8
            first by using <link
            linkend="g_filename_to_utf8"><function>g_filename_to_utf8()</function></link>.
            If conversion fails, display a string like
            "<literal>Unknown file name</literal>". <emphasis>Do
            not</emphasis> convert this string back into the
            encoding used for file names if you wish to pass it to
            the file system; use the original file name instead.
            For example, the document window of a word processor
            could display "Unknown file name" in its title bar but
            still let the user save the file, as it would keep the
            raw file name internally. This can happen if the user
            has not set the <envar>G_FILENAME_ENCODING</envar>
            environment variable even though some of their file
            names are not encoded in UTF-8.
          </para>
        </listitem>

        <listitem>
          <para>
            If your user interface lets the user type a file name
            for saving or renaming, convert it to the encoding used
            for file names in the file system by using <link
            linkend="g_filename_from_utf8"><function>g_filename_from_utf8()</function></link>.
            Pass the converted file name to functions like
            <function>fopen(3)</function>. If conversion fails, ask
            the user to enter a different file name. This can
            happen if the user types Japanese characters when
            <envar>G_FILENAME_ENCODING</envar> is set to
            <literal>ISO-8859-1</literal>, for example.
          </para>
        </listitem>
      </orderedlist>
    </refsect3>
  </refsect2>

<!-- ##### SECTION See_Also ##### -->
<para>

</para>

<!-- ##### SECTION Stability_Level ##### -->


<!-- ##### FUNCTION g_convert ##### -->
<para>

</para>

@str:
@len:
@to_codeset:
@from_codeset:
@bytes_read:
@bytes_written:
@error:
@Returns:


<!-- ##### FUNCTION g_convert_with_fallback ##### -->
<para>

</para>

@str:
@len:
@to_codeset:
@from_codeset:
@fallback:
@bytes_read:
@bytes_written:
@error:
@Returns:


<!-- ##### STRUCT GIConv ##### -->
<para>
The <structname>GIConv</structname> struct wraps an
<function>iconv()</function> conversion descriptor. It contains private data
and should only be accessed using the following functions.
</para>


<!-- ##### FUNCTION g_convert_with_iconv ##### -->
<para>

</para>

@str:
@len:
@converter:
@bytes_read:
@bytes_written:
@error:
@Returns:


<!-- ##### MACRO G_CONVERT_ERROR ##### -->
<para>
Error domain for character set conversions. Errors in this domain will
be from the #GConvertError enumeration. See #GError for information on
error domains.
</para>



<!-- ##### FUNCTION g_iconv_open ##### -->
<para>

</para>

@to_codeset:
@from_codeset:
@Returns:


<!-- ##### FUNCTION g_iconv ##### -->
<para>

</para>

@converter:
@inbuf:
@inbytes_left:
@outbuf:
@outbytes_left:
@Returns:


<!-- ##### FUNCTION g_iconv_close ##### -->
<para>

</para>

@converter:
@Returns:


<!-- ##### FUNCTION g_locale_to_utf8 ##### -->
<para>

</para>

@opsysstring:
@len:
@bytes_read:
@bytes_written:
@error:
@Returns:


<!-- ##### FUNCTION g_filename_to_utf8 ##### -->
<para>

</para>

@opsysstring:
@len:
@bytes_read:
@bytes_written:
@error:
@Returns:


<!-- ##### FUNCTION g_filename_from_utf8 ##### -->
<para>

</para>

@utf8string:
@len:
@bytes_read:
@bytes_written:
@error:
@Returns:


<!-- ##### FUNCTION g_filename_from_uri ##### -->
<para>

</para>

@uri:
@hostname:
@error:
@Returns:


<!-- ##### FUNCTION g_filename_to_uri ##### -->
<para>

</para>

@filename:
@hostname:
@error:
@Returns:


<!-- ##### FUNCTION g_get_filename_charsets ##### -->
<para>

</para>

@charsets:
@Returns:


<!-- ##### FUNCTION g_filename_display_name ##### -->
<para>

</para>

@filename:
@Returns:


<!-- ##### FUNCTION g_filename_display_basename ##### -->
<para>

</para>

@filename:
@Returns:


<!-- ##### FUNCTION g_uri_list_extract_uris ##### -->
<para>

</para>

@uri_list:
@Returns:


<!-- ##### FUNCTION g_locale_from_utf8 ##### -->
<para>

</para>

@utf8string:
@len:
@bytes_read:
@bytes_written:
@error:
@Returns:


<!-- ##### ENUM GConvertError ##### -->
<para>
Error codes returned by character set conversion routines.
</para>

@G_CONVERT_ERROR_NO_CONVERSION: Conversion between the requested character sets
is not supported.
@G_CONVERT_ERROR_ILLEGAL_SEQUENCE: Invalid byte sequence in conversion input.
@G_CONVERT_ERROR_FAILED: Conversion failed for some reason.
@G_CONVERT_ERROR_PARTIAL_INPUT: Partial character sequence at end of input.
@G_CONVERT_ERROR_BAD_URI: URI is invalid.
@G_CONVERT_ERROR_NOT_ABSOLUTE_PATH: Pathname is not an absolute path.

<!-- ##### FUNCTION g_get_charset ##### -->
<para>

</para>

@charset:
@Returns:


<!--
Local variables:
mode: sgml
sgml-parent-document: ("../glib-docs.sgml" "book" "refentry")
End:
-->