  This document describes how to use STLport to read and write utf8 files.
utf8 is a way of encoding wide characters. In the C++ Standard library,
encoding is handled by the codecvt locale facet, which is part of the ctype
category. However, utf8 only describes how encoding must be performed; it
cannot be used to classify characters, so it does not carry enough information
to generate all the ctype category facets of a locale instance.

In C++ this means that the following code throws an exception to
signal that the locale could not be created:

#include <locale>
// Will throw a std::runtime_error exception.
std::locale loc(".utf8");

For the same reason, building a locale whose ctype facets are based on
utf8 also fails:

// Will throw a std::runtime_error exception:
std::locale loc(std::locale::classic(), ".utf8", std::locale::ctype);

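The two failures above can be observed without terminating the program by
catching the exception. The following is a minimal sketch; the helper names
can_create_locale and can_create_ctype_locale are hypothetical and not part of
STLport or the Standard library.

```cpp
#include <locale>
#include <stdexcept>

// Hypothetical helper: true if a complete locale with this name can be
// constructed, false if the library signals failure with std::runtime_error.
bool can_create_locale(const char* name) {
    try {
        std::locale loc(name);
        (void)loc;  // only the success of the construction matters
        return true;
    } catch (const std::runtime_error&) {
        return false;
    }
}

// Hypothetical helper: same check, but only the ctype category is taken
// from the named locale; everything else comes from the classic locale.
bool can_create_ctype_locale(const char* name) {
    try {
        std::locale loc(std::locale::classic(), name, std::locale::ctype);
        (void)loc;
        return true;
    } catch (const std::runtime_error&) {
        return false;
    }
}
```

According to the description above, both helpers should return false for
".utf8" under STLport. Other Standard library implementations may accept
that name, so the result is platform dependent.
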
The only way to get a locale instance that handles utf8 encoding is to
request explicitly that only the codecvt facet be based on utf8:

// Succeeds if the platform provides the necessary support.
std::locale loc(std::locale::classic(), new std::codecvt_byname<wchar_t, char, std::mbstate_t>(".utf8"));

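  Once you have obtained a locale instance, you can imbue a wide file stream
with it to read/write utf8 files:

// A wide stream is required: the codecvt<wchar_t, char, mbstate_t> facet
// converts between wchar_t in memory and char in the file.
std::wfstream fstr("file.utf8");
fstr.imbue(loc);
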
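The steps above can be combined into a small round trip. This is only a
sketch: make_codecvt_locale, write_line and read_line are hypothetical names,
and falling back to the classic locale when the encoding is unavailable is an
assumption made here to keep the example usable on any platform.

```cpp
#include <cwchar>
#include <fstream>
#include <locale>
#include <stdexcept>
#include <string>

// Builds a locale whose codecvt facet comes from the named locale and whose
// other facets come from the classic locale. Assumption: if the platform
// does not support the name, the classic locale is returned unchanged.
std::locale make_codecvt_locale(const char* name) {
    try {
        return std::locale(std::locale::classic(),
            new std::codecvt_byname<wchar_t, char, std::mbstate_t>(name));
    } catch (const std::runtime_error&) {
        return std::locale::classic();
    }
}

// Writes one line of wide text; the imbued codecvt facet encodes it.
bool write_line(const char* path, const std::wstring& text,
                const std::locale& loc) {
    std::wofstream out;
    out.imbue(loc);  // imbued before opening, so no I/O has happened yet
    out.open(path);
    if (!out) return false;
    out << text << L'\n';
    out.flush();     // conversion errors surface when the buffer is written
    return out.good();
}

// Reads one line of wide text; the imbued codecvt facet decodes it.
bool read_line(const char* path, std::wstring& text, const std::locale& loc) {
    std::wifstream in;
    in.imbue(loc);
    in.open(path);
    if (!in) return false;
    return !std::getline(in, text).fail();
}
```

A caller would typically build the locale once with
make_codecvt_locale(".utf8") and pass it to both functions. With the classic
fallback only plain ASCII text is expected to survive the round trip.
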
You can also access the facet directly to perform utf8 encoding/decoding operations:

typedef std::codecvt<wchar_t, char, std::mbstate_t> codecvt_t;
const codecvt_t& encoding = std::use_facet<codecvt_t>(loc);

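As an illustration of direct facet use, the sketch below wraps the standard
codecvt members out() and in() in two functions. The names encode and decode
and the buffer sizing strategy are choices made for this example, not
something prescribed by STLport.

```cpp
#include <cwchar>
#include <locale>
#include <stdexcept>
#include <string>
#include <vector>

typedef std::codecvt<wchar_t, char, std::mbstate_t> codecvt_t;

// Encodes wide characters to bytes with the codecvt facet of 'loc'.
std::string encode(const std::wstring& wide, const std::locale& loc) {
    const codecvt_t& facet = std::use_facet<codecvt_t>(loc);
    std::mbstate_t state = std::mbstate_t();
    // max_length() bounds the number of bytes produced per wide character.
    std::vector<char> buffer(wide.size() * facet.max_length() + 1);
    const wchar_t* from_next = 0;
    char* to_next = 0;
    codecvt_t::result r = facet.out(state,
        wide.data(), wide.data() + wide.size(), from_next,
        &buffer[0], &buffer[0] + buffer.size(), to_next);
    if (r != codecvt_t::ok)
        throw std::runtime_error("encoding failed");
    return std::string(&buffer[0], to_next);
}

// Decodes bytes to wide characters with the codecvt facet of 'loc'.
std::wstring decode(const std::string& bytes, const std::locale& loc) {
    const codecvt_t& facet = std::use_facet<codecvt_t>(loc);
    std::mbstate_t state = std::mbstate_t();
    // Each wide character comes from at least one byte of input.
    std::vector<wchar_t> buffer(bytes.size() + 1);
    const char* from_next = 0;
    wchar_t* to_next = 0;
    codecvt_t::result r = facet.in(state,
        bytes.data(), bytes.data() + bytes.size(), from_next,
        &buffer[0], &buffer[0] + buffer.size(), to_next);
    if (r != codecvt_t::ok)
        throw std::runtime_error("decoding failed");
    return std::wstring(&buffer[0], to_next);
}
```

Passing a locale built with the utf8 codecvt facet makes encode produce utf8
bytes. Passing std::locale::classic() only handles characters that the "C"
locale can represent.
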
Notes:

1. The dot ('.') in front of utf8 is mandatory. This is a POSIX convention:
locale names have the following format:
language[_country[.encoding]]

Ex: 'fr_FR'
    'french'
    'ru_RU.koi8r'

2. For the moment, utf8 encoding is only supported under Windows. The less
common utf7 encoding is also supported.

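To make the name format from note 1 concrete, here is a small sketch that
splits a locale name into its three parts. LocaleNameParts and
split_locale_name are hypothetical names introduced for illustration; STLport
does not expose such a function.

```cpp
#include <string>

// Hypothetical holder for the parts of language[_country[.encoding]].
struct LocaleNameParts {
    std::string language;
    std::string country;
    std::string encoding;
};

// Splits a locale name at the first '.' and then at the first '_'.
// Missing parts are left empty.
LocaleNameParts split_locale_name(const std::string& name) {
    LocaleNameParts parts;
    std::string rest = name;

    // Everything after the dot is the encoding.
    std::string::size_type dot = rest.find('.');
    if (dot != std::string::npos) {
        parts.encoding = rest.substr(dot + 1);
        rest.erase(dot);
    }

    // What remains is language, optionally followed by _country.
    std::string::size_type underscore = rest.find('_');
    if (underscore != std::string::npos) {
        parts.country = rest.substr(underscore + 1);
        rest.erase(underscore);
    }

    parts.language = rest;
    return parts;
}
```

For ".utf8" this yields an empty language and country with "utf8" as the
encoding, which shows why the leading dot is needed: without it, "utf8" would
be read as a language name.
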