<!-- ##### SECTION Title ##### -->
Character Set Conversion

<!-- ##### SECTION Short_Description ##### -->
convert strings between different character sets using iconv()

<!-- ##### SECTION Long_Description ##### -->
<para>

</para>

    <refsect2 id="file-name-encodings">
      <title>File Name Encodings</title>

      <para>
        Historically, Unix has not had a defined encoding for file
        names:  a file name is valid as long as it contains neither a
        path separator ("/") nor a NUL byte.  However, displaying file
        names may require conversion from the character set in which
        they were created to the character set in which the
        application operates.  Consider the Spanish file name
        "<filename>Presentaci&oacute;n.sxi</filename>".  If the
        application which created it uses ISO-8859-1 for its encoding,
        then the actual file name on disk would look like this:
      </para>

      <programlisting id="filename-iso8859-1">
Character:  P  r  e  s  e  n  t  a  c  i  &oacute;  n  .  s  x  i
Hex code:   50 72 65 73 65 6e 74 61 63 69 f3 6e 2e 73 78 69
      </programlisting>

      <para>
        However, if the application uses UTF-8, the actual file name on
        disk would look like this:
      </para>

      <programlisting id="filename-utf-8">
Character:  P  r  e  s  e  n  t  a  c  i  &oacute;     n  .  s  x  i
Hex code:   50 72 65 73 65 6e 74 61 63 69 c3 b3 6e 2e 73 78 69
      </programlisting>
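
      <para>
        The difference between the two listings can be reproduced in a
        few lines of C.  The following is only an illustrative sketch
        and not part of GLib:  the helper names
        <function>latin1_to_utf8()</function> and
        <function>print_hex()</function> are made up for this example.
        It relies on the fact that every ISO-8859-1 byte has the same
        value as its Unicode code point, so bytes from 0x80 upwards
        become a two-byte UTF-8 sequence.
      </para>

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Encode ISO-8859-1 bytes as UTF-8.  Bytes below 0x80 are copied as-is;
 * bytes 0x80..0xff become a two-byte sequence.  dst must have room for
 * 2 * len + 1 bytes.  Returns the number of bytes written, not counting
 * the terminating NUL.  (Hypothetical helper, not a GLib function.) */
size_t latin1_to_utf8(const unsigned char *src, size_t len, unsigned char *dst)
{
    assert(src != NULL && dst != NULL);

    size_t out = 0;
    for (size_t i = 0; i < len; i++) {
        unsigned char c = src[i];
        if (c < 0x80) {
            dst[out++] = c;
        } else {
            dst[out++] = (unsigned char)(0xc0 | (c >> 6));   /* lead byte */
            dst[out++] = (unsigned char)(0x80 | (c & 0x3f)); /* continuation */
        }
    }
    dst[out] = '\0';
    return out;
}

/* Print a byte string as space-separated hex, like the listings above. */
void print_hex(const unsigned char *s, size_t len)
{
    for (size_t i = 0; i < len; i++)
        printf("%02x%s", s[i], i + 1 < len ? " " : "\n");
}
```

      <para>
        Applied to the 16 ISO-8859-1 bytes shown earlier, the byte f3
        comes out as c3 b3 and the name grows to 17 bytes.
      </para>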

      <para>
        GLib uses UTF-8 for its strings, and GUI toolkits like GTK+
        that use GLib do the same thing.  If you get a file name from
        the file system, for example, from
        <function>readdir(3)</function> or from <link
        linkend="g_dir_read_name"><function>g_dir_read_name()</function></link>,
        and you wish to display the file name to the user, you
        <emphasis>will</emphasis> need to convert it into UTF-8.  The
        opposite case is when the user types the name of a file to
        save:  the toolkit will give you that string in
        UTF-8 encoding, and you will need to convert it to the
        character set used for file names before you can create the
        file with <function>open(2)</function> or
        <function>fopen(3)</function>.
      </para>
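
      <para>
        To make both directions concrete, here is a minimal sketch
        that calls <function>iconv(3)</function> directly.  It is not
        GLib's implementation:  the function name
        <function>convert_string()</function> is hypothetical, and the
        fixed output estimate of four bytes per input byte is an
        assumption that holds for the single-byte and UTF-8 encodings
        discussed here, but not necessarily for every encoding.
      </para>

```c
#include <assert.h>
#include <iconv.h>
#include <stdlib.h>
#include <string.h>

/* Convert the NUL-terminated string src from from_codeset to to_codeset.
 * Returns a newly allocated string that the caller must free(), or NULL
 * if the conversion is unsupported or the input cannot be converted.
 * (Hypothetical helper, not a GLib function.) */
char *convert_string(const char *src, const char *to_codeset,
                     const char *from_codeset)
{
    assert(src != NULL);

    iconv_t cd = iconv_open(to_codeset, from_codeset);
    if (cd == (iconv_t)-1)
        return NULL;

    size_t in_left = strlen(src);
    size_t out_left = in_left * 4;      /* assumed upper bound, see above */
    char *out = malloc(out_left + 1);
    if (out == NULL) {
        iconv_close(cd);
        return NULL;
    }

    char *in_ptr = (char *)src;         /* iconv() does not modify the input */
    char *out_ptr = out;

    size_t rc = iconv(cd, &in_ptr, &in_left, &out_ptr, &out_left);
    if (rc != (size_t)-1)
        rc = iconv(cd, NULL, NULL, &out_ptr, &out_left); /* flush shift state */
    iconv_close(cd);

    if (rc == (size_t)-1) {
        free(out);
        return NULL;
    }
    *out_ptr = '\0';
    return out;
}
```

      <para>
        Calling it with "UTF-8" as the target and "ISO-8859-1" as the
        source covers the display case; swapping the two arguments
        covers the case where a typed name has to go to disk.
      </para>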

      <para>
        By default, GLib assumes that file names on disk are in UTF-8
        encoding.  This is a valid assumption for file systems which
        were created relatively recently:  most applications use UTF-8
        encoding for their strings, and that is also what they use for
        the file names they create.  However, older file systems may
        still contain file names created in "older" encodings, such as
        ISO-8859-1.  In this case, for compatibility reasons, you may
        want to instruct GLib to use that particular encoding for file
        names rather than UTF-8.  You can do this by specifying the
        encoding for file names in the <link
        linkend="G_FILENAME_ENCODING"><envar>G_FILENAME_ENCODING</envar></link>
        environment variable.  For example, if your installation uses
        ISO-8859-1 for file names, you can put this in your
        <filename>~/.profile</filename>:
      </para>

      <programlisting>
export G_FILENAME_ENCODING=ISO-8859-1
      </programlisting>
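
      <para>
        As a rough illustration of how a program could honour this
        variable, the sketch below picks the file name encoding from
        the environment and falls back to UTF-8.  It is a
        simplification and not GLib's actual lookup:  it only takes
        the first entry if the value is a comma-separated list, and
        the names <function>parse_filename_encoding()</function> and
        <function>filename_encoding()</function> are hypothetical.
      </para>

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Copy the first entry of a G_FILENAME_ENCODING-style value into buf.
 * A NULL or empty value yields the default, "UTF-8".  Returns buf.
 * (Hypothetical helper, not a GLib function.) */
char *parse_filename_encoding(const char *value, char *buf, size_t size)
{
    assert(buf != NULL && size > 0);

    /* Length of the first entry: everything before the first comma. */
    size_t n = value != NULL ? strcspn(value, ",") : 0;
    if (n == 0) {
        value = "UTF-8";
        n = strlen(value);
    }
    if (n >= size)
        n = size - 1;           /* truncate rather than overflow buf */
    memcpy(buf, value, n);
    buf[n] = '\0';
    return buf;
}

/* Look the encoding up in the process environment. */
char *filename_encoding(char *buf, size_t size)
{
    return parse_filename_encoding(getenv("G_FILENAME_ENCODING"), buf, size);
}
```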

      <para>
        GLib provides the functions <link
        linkend="g_filename_to_utf8"><function>g_filename_to_utf8()</function></link>
        and <link
        linkend="g_filename_from_utf8"><function>g_filename_from_utf8()</function></link>
        to perform the necessary conversions.  These functions convert
        file names from the encoding specified in
        <envar>G_FILENAME_ENCODING</envar> to UTF-8 and vice versa.
        <xref linkend="file-name-encodings-diagram"/> illustrates how
        these functions are used to convert between UTF-8 and the
        encoding for file names in the file system.
      </para>

      <figure id="file-name-encodings-diagram">
        <title>Conversion between File Name Encodings</title>
        <graphic fileref="file-name-encodings.png" format="PNG"/>
      </figure>

      <refsect3 id="file-name-encodings-checklist">
        <title>Checklist for Application Writers</title>

        <para>
          This section is a practical summary of the detailed
          description above.  You can use this as a checklist of
          things to do to make sure your applications process file
          name encodings correctly.
        </para>

        <orderedlist>
          <listitem>
            <para>
              If you get a file name from the file system, from a
              function such as <function>readdir(3)</function> or
              <function>gtk_file_chooser_get_filename()</function>,
              you do not need to do any conversion to pass that
              file name to functions like <function>open(2)</function>,
              <function>rename(2)</function>, or
              <function>fopen(3)</function>:  those are "raw"
              file names which the file system understands.
            </para>
          </listitem>

          <listitem>
            <para>
              If you need to display a file name, convert it to UTF-8
              first by using <link
              linkend="g_filename_to_utf8"><function>g_filename_to_utf8()</function></link>.
              If conversion fails, display a string like
              "<literal>Unknown file name</literal>".  <emphasis>Do
              not</emphasis> convert this string back into the
              encoding used for file names if you wish to pass it to
              the file system; use the original file name instead.
              For example, the document window of a word processor
              could display "Unknown file name" in its title bar but
              still let the user save the file, as it would keep the
              raw file name internally.  This can happen if the user
              has not set the <envar>G_FILENAME_ENCODING</envar>
              environment variable even though some of their files
              have names that are not encoded in UTF-8.
            </para>
          </listitem>

          <listitem>
            <para>
              If your user interface lets the user type a file name
              for saving or renaming, convert it to the encoding used
              for file names in the file system by using <link
              linkend="g_filename_from_utf8"><function>g_filename_from_utf8()</function></link>.
              Pass the converted file name to functions like
              <function>fopen(3)</function>.  If conversion fails, ask
              the user to enter a different file name.  This can
              happen if the user types Japanese characters when
              <envar>G_FILENAME_ENCODING</envar> is set to
              <literal>ISO-8859-1</literal>, for example.
            </para>
          </listitem>
        </orderedlist>
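
        <para>
          The second and third items of the checklist can be sketched
          as follows.  This is an illustration under stated
          assumptions rather than GLib code:  it uses
          <function>iconv(3)</function> in place of the GLib
          functions, the file system encoding is passed in explicitly
          as <parameter>fs_codeset</parameter>, and the names
          <function>display_name()</function> and
          <function>name_for_disk()</function> are hypothetical.
        </para>

```c
#include <assert.h>
#include <iconv.h>
#include <stdlib.h>
#include <string.h>

/* Return a malloc()ed copy of s, or NULL if memory runs out. */
static char *copy_string(const char *s)
{
    char *copy = malloc(strlen(s) + 1);
    if (copy != NULL)
        strcpy(copy, s);
    return copy;
}

/* Convert src between codesets with iconv(3).  Returns a malloc()ed
 * string, or NULL if src is not valid in from_codeset or cannot be
 * represented in to_codeset. */
static char *convert(const char *src, const char *to_codeset,
                     const char *from_codeset)
{
    iconv_t cd = iconv_open(to_codeset, from_codeset);
    if (cd == (iconv_t)-1)
        return NULL;

    size_t in_left = strlen(src);
    size_t out_left = in_left * 4;      /* assumed ample for these encodings */
    char *out = malloc(out_left + 1);
    char *in_ptr = (char *)src;         /* iconv() does not modify the input */
    char *out_ptr = out;
    size_t rc = (size_t)-1;

    if (out != NULL)
        rc = iconv(cd, &in_ptr, &in_left, &out_ptr, &out_left);
    iconv_close(cd);
    if (rc == (size_t)-1) {
        free(out);
        return NULL;
    }
    *out_ptr = '\0';
    return out;
}

/* Item 2: a UTF-8 string for display only.  On failure a placeholder is
 * returned; the caller keeps using raw_name for file system calls. */
char *display_name(const char *raw_name, const char *fs_codeset)
{
    assert(raw_name != NULL);
    char *utf8 = convert(raw_name, "UTF-8", fs_codeset);
    return utf8 != NULL ? utf8 : copy_string("Unknown file name");
}

/* Item 3: turn a name typed by the user (UTF-8) into the on-disk
 * encoding.  NULL means the caller should ask for a different name. */
char *name_for_disk(const char *typed_utf8, const char *fs_codeset)
{
    assert(typed_utf8 != NULL);
    return convert(typed_utf8, fs_codeset, "UTF-8");
}
```

        <para>
          Note that the placeholder returned by
          <function>display_name()</function> is never converted back;
          the raw name stays the only value handed to
          <function>open(2)</function> and friends.
        </para>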
      </refsect3>
    </refsect2>

<!-- ##### SECTION See_Also ##### -->
<para>

</para>

<!-- ##### SECTION Stability_Level ##### -->


<!-- ##### FUNCTION g_convert ##### -->
<para>

</para>

@str: 
@len: 
@to_codeset: 
@from_codeset: 
@bytes_read: 
@bytes_written: 
@error: 
@Returns: 


<!-- ##### FUNCTION g_convert_with_fallback ##### -->
<para>

</para>

@str: 
@len: 
@to_codeset: 
@from_codeset: 
@fallback: 
@bytes_read: 
@bytes_written: 
@error: 
@Returns: 


<!-- ##### STRUCT GIConv ##### -->
<para>
The <structname>GIConv</structname> struct wraps an
<function>iconv()</function> conversion descriptor. It contains private data
and should only be accessed using the following functions.
</para>


<!-- ##### FUNCTION g_convert_with_iconv ##### -->
<para>

</para>

@str: 
@len: 
@converter: 
@bytes_read: 
@bytes_written: 
@error: 
@Returns: 


<!-- ##### MACRO G_CONVERT_ERROR ##### -->
<para>
Error domain for character set conversions. Errors in this domain will
be from the #GConvertError enumeration. See #GError for information on
error domains.
</para>



<!-- ##### FUNCTION g_iconv_open ##### -->
<para>

</para>

@to_codeset: 
@from_codeset: 
@Returns: 


<!-- ##### FUNCTION g_iconv ##### -->
<para>

</para>

@converter: 
@inbuf: 
@inbytes_left: 
@outbuf: 
@outbytes_left: 
@Returns: 


<!-- ##### FUNCTION g_iconv_close ##### -->
<para>

</para>

@converter: 
@Returns: 


<!-- ##### FUNCTION g_locale_to_utf8 ##### -->
<para>

</para>

@opsysstring: 
@len: 
@bytes_read: 
@bytes_written: 
@error: 
@Returns: 


<!-- ##### FUNCTION g_filename_to_utf8 ##### -->
<para>

</para>

@opsysstring: 
@len: 
@bytes_read: 
@bytes_written: 
@error: 
@Returns: 


<!-- ##### FUNCTION g_filename_from_utf8 ##### -->
<para>

</para>

@utf8string: 
@len: 
@bytes_read: 
@bytes_written: 
@error: 
@Returns: 


<!-- ##### FUNCTION g_filename_from_uri ##### -->
<para>

</para>

@uri: 
@hostname: 
@error: 
@Returns: 


<!-- ##### FUNCTION g_filename_to_uri ##### -->
<para>

</para>

@filename: 
@hostname: 
@error: 
@Returns: 


<!-- ##### FUNCTION g_get_filename_charsets ##### -->
<para>

</para>

@charsets: 
@Returns: 


<!-- ##### FUNCTION g_filename_display_name ##### -->
<para>

</para>

@filename: 
@Returns: 


<!-- ##### FUNCTION g_filename_display_basename ##### -->
<para>

</para>

@filename: 
@Returns: 


<!-- ##### FUNCTION g_uri_list_extract_uris ##### -->
<para>

</para>

@uri_list: 
@Returns: 


<!-- ##### FUNCTION g_locale_from_utf8 ##### -->
<para>

</para>

@utf8string: 
@len: 
@bytes_read: 
@bytes_written: 
@error: 
@Returns: 


<!-- ##### ENUM GConvertError ##### -->
<para>
Error codes returned by character set conversion routines.
</para>
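
<para>
To show where the conversion-related codes of this enumeration might
come from, the sketch below classifies the outcome of a plain
<function>iconv(3)</function> conversion by its
<varname>errno</varname> value. The mapping is an assumption made for
illustration and is not taken from GLib's source; the
<type>convert_status</type> enumeration and
<function>check_conversion()</function> are hypothetical local
stand-ins, not GLib identifiers.
</para>

```c
#include <assert.h>
#include <errno.h>
#include <iconv.h>
#include <stddef.h>

/* Local stand-in for the conversion-related GConvertError codes. */
enum convert_status {
    CONVERT_OK,
    CONVERT_NO_CONVERSION,      /* codeset pair is not supported */
    CONVERT_ILLEGAL_SEQUENCE,   /* invalid byte sequence in the input */
    CONVERT_PARTIAL_INPUT,      /* input ends in the middle of a character */
    CONVERT_FAILED              /* any other failure */
};

/* Run len bytes of src through iconv(3), discarding the output, and
 * report how the conversion ended. */
enum convert_status check_conversion(const char *src, size_t len,
                                     const char *to_codeset,
                                     const char *from_codeset)
{
    assert(src != NULL);

    iconv_t cd = iconv_open(to_codeset, from_codeset);
    if (cd == (iconv_t)-1)
        return errno == EINVAL ? CONVERT_NO_CONVERSION : CONVERT_FAILED;

    enum convert_status status = CONVERT_OK;
    char *in_ptr = (char *)src;         /* iconv() does not modify the input */
    size_t in_left = len;
    char scratch[256];

    while (in_left > 0 && status == CONVERT_OK) {
        char *out_ptr = scratch;
        size_t out_left = sizeof scratch;

        if (iconv(cd, &in_ptr, &in_left, &out_ptr, &out_left) == (size_t)-1) {
            if (errno == E2BIG)
                continue;               /* scratch full: reuse it and go on */
            else if (errno == EILSEQ)
                status = CONVERT_ILLEGAL_SEQUENCE;
            else if (errno == EINVAL)
                status = CONVERT_PARTIAL_INPUT;
            else
                status = CONVERT_FAILED;
        }
    }

    iconv_close(cd);
    return status;
}
```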

@G_CONVERT_ERROR_NO_CONVERSION: Conversion between the requested character sets
is not supported.
@G_CONVERT_ERROR_ILLEGAL_SEQUENCE: Invalid byte sequence in conversion input.
@G_CONVERT_ERROR_FAILED: Conversion failed for some reason.
@G_CONVERT_ERROR_PARTIAL_INPUT: Partial character sequence at end of input.
@G_CONVERT_ERROR_BAD_URI: URI is invalid.
@G_CONVERT_ERROR_NOT_ABSOLUTE_PATH: Pathname is not an absolute path.

<!-- ##### FUNCTION g_get_charset ##### -->
<para>

</para>

@charset: 
@Returns: 


<!--
Local variables:
mode: sgml
sgml-parent-document: ("../glib-docs.sgml" "book" "refentry")
End:
-->
    401