Name: icu
URL: http://site.icu-project.org/
Version: 4.6
License: MIT
Security Critical: yes

Description:
This directory contains the source code of ICU 4.6 for C/C++.

1. It was obtained with the following:

    $ svn export --native-eol LF http://source.icu-project.org/repos/icu/icu/tags/release-4-6 icu46

2. Platform header files for Linux, FreeBSD, OpenBSD, Android, Mac OS X, and QNX:

   - Apply platform.patch in the patches directory. It applies the upstream
     patch to platform.h.in (see http://bugs.icu-project.org/trac/ticket/8248)
     and changes source/common/unicode/ptypes.h to refer to plinux.h and
     pmac.h generated below.

   - 'runConfigureICU Linux', 'runConfigureICU FreeBSD', and
     'runConfigureICU MacOSX' are run to generate
     source/common/unicode/platform.h.

   - On OpenBSD, source/common/unicode/platform.h is generated
     by the icu4c port in the ports directory and not by runConfigureICU.
     If the file has to be updated, run:
     cd /home/ports/textproc/icu4c && make configure

   - Rename the generated platform.h to 'plinux.h', 'pfreebsd.h',
     'popenbsd.h', or 'pmac.h', according to the platform it was generated on.

   - Apply patches/pmach.h.patch on Mac to pmac.h.

   - On Android, pandroid.h was generated by copying plinux.h to
     pandroid.h and applying patches/pandroid.h.patch.

   - For QNX, pqnx.h was generated by copying plinux.h to
     pqnx.h and applying patches/platform.qnx.patch.

   - For NaCl (icu_nacl.gypi), pnacl.h was generated by copying plinux.h to
     pnacl.h and applying patches/pnacl.h.patch.

   - Apply the CL at https://codereview.chromium.org/15973007/ to plinux.h.

3. The following directories were removed because they're not used by Chromium
   at the moment:
   as_is
   packaging
   source/extra
   source/sample
   source/layout
   source/layoutex
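
   The removal in step 3 could be scripted roughly as follows. This is only a
   sketch: the helper name remove_unused_dirs and the idea of passing the root
   of the exported tree are assumptions for illustration, not part of the
   original procedure.

```python
import shutil
from pathlib import Path

# Directories listed in step 3, relative to the root of the exported ICU tree.
UNUSED_DIRS = [
    "as_is",
    "packaging",
    "source/extra",
    "source/sample",
    "source/layout",
    "source/layoutex",
]


def remove_unused_dirs(icu_root):
    """Delete the step-3 directories under icu_root (hypothetical helper).

    Returns the relative paths that were actually present and removed.
    """
    removed = []
    for rel in UNUSED_DIRS:
        path = Path(icu_root) / rel
        if path.is_dir():
            shutil.rmtree(path)
            removed.append(rel)
    return removed
```

   Directories that are already missing are skipped, so the helper can be
   re-run safely on a partially cleaned tree.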

4. The word breaking for Chinese and Japanese was modified to use a word
   frequency list with the following patches and cjdict.txt.

   - patches/segmentation.patch :
       Adds dictionary (word-frequency)-based word breaking for CJK.
       (Korean is supported in the code, but it does not do anything
        because we don't have a Korean word list.)

   - source/data/brkitr/cjdict.txt :
       Chinese and Japanese word frequency list.
       See the file for the license/copyright notice.

   - source/data/brkitr/cc_edict.txt :
       The list of words derived from CC-Edict.

   - patches/brkitr.patch
     * word.txt : Chinese/Japanese segmentation rules, Hebrew-script-specific
                  handling of U+0022, and splitting of FQDN into labels at '.'.
                  For Hebrew, see http://unicode.org/cldr/track/ticket/3120
     * line.txt : Incorporated line_he and minor changes in CL, OP and ID
                  definitions.
                  For Hebrew, see http://unicode.org/cldr/track/ticket/4004
                  For others, see http://unicode.org/cldr/track/ticket/3974
                                  http://unicode.org/cldr/track/ticket/4200
                                  http://unicode.org/cldr/track/ticket/
     * brklocal.mk : build file changes to drop unnecessary brkitr rule
                     files (e.g. word_ja.txt, line_he.txt)

   - android/brkitr.patch (to be applied for Android build only) :
       Reverts some of the Chinese/Japanese segmentation rule changes in
       patches/brkitr.patch to reduce binary size for Android.

   If you want to run ICU tests, you have to copy source/data/brkitr/cjdict.txt
   to source/test/testdata/cjdict-truncated.txt to pass TestTrieWithValue test.

5. Converter changes : converters.patch
  - Include only what we really need. See source/data/mappings/ucmlocal.txt
  - Alias and mapping changes : source/data/mappings/convrtrs.txt
  - Change several tables and add six new tables, three of which
    are 'fake' tables for ISO-2022-CN(-Ext).
  - ucnv2022.c is modified to use the three 'fake' tables added above for
    ISO-2022-CN(-Ext).

6. Locale changes
  - patches/locale1.patch :
      Filipino, Amharic, and Swahili locales
      exemplar character set changes for CJK + 9 Indian locales
      Minor fixes for Danish, , Turkish, and Korean.

  - patches/locale2.patch :
      The minimum locale data Chrome needs for 47 languages Chrome is
      not localized to. Each locale data file has ExemplarCharacters,
      LocaleScript, layout, and the name of the language for a locale
      in its native language.

  - patches/locale3.patch : Locale build configuration files. They
    add reslocal.mk or {trns,sprep,rbnf,coll}local.mk files to
    source/data/{coll,curr,lang,locale,region,translit,zone,rbnf,sprep}.

  - In source/data/region, run the following command to get rid of numeric region
    display names we don't use (everything other than 419).
     $ sed -i  '/[0-35-9][0-9][0-9]{/ d' *.txt

  - android/patch_locale.sh (to be run for Android build only):
      Makes changes to source/data/{curr,region,lang} to exclude these data
      except the language and script names of zh_Hans and zh_Hant.

  - Add tg.txt to source/data/locale and source/data/lang to add the minimal
    locale data necessary for the spellchecker. In both directories, add tg.txt
    to reslocal.mk.
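
  To see what the sed filter for source/data/region keeps and drops, the same
  pattern can be tried in Python. This is a sketch only: the sample lines are
  made up for illustration and are not copied from the real region files, and
  drop_numeric_regions is a hypothetical helper name.

```python
import re

# Same pattern as the sed command: a three-digit code whose first digit is
# not '4', immediately followed by '{'. In Python the brace must be escaped.
NUMERIC_REGION = re.compile(r"[0-35-9][0-9][0-9]\{")


def drop_numeric_regions(lines):
    """Return the lines that the sed 'd' command would leave in place."""
    return [line for line in lines if not NUMERIC_REGION.search(line)]


# Made-up sample entries in a resource-bundle-like syntax.
sample = [
    '        001{"World"}',
    '        419{"Latin America"}',
    '        US{"United States"}',
]
kept = drop_numeric_regions(sample)
```

  Note that the pattern keeps every code starting with '4', not only 419, so
  it relies on 419 being the only such code that matters in these files.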

7. Removal of unihan collation tables from data/coll/{zh,ja,ko}.txt

  - patches/unihan.patch:
    unihan collation tables are never used in Chrome/WebKit, but they take
    about 1MB in the uncompressed ICU data file in ICU 4.2.1.

8. Timezone data update
  - Grab the latest version of the following timezone data files and
    put them in source/data/misc.

     metaZones.txt
     timezoneTypes.txt
     windowsZones.txt
     zoneinfo64.txt

   As of Mar 2014, the latest version is 2014a and the above files
   are available at
   http://source.icu-project.org/repos/icu/data/trunk/tzdata/icunew/2014a/44/
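
   The four files and their destinations can be listed with a few lines of
   Python. This sketch only builds the URL and target path pairs and downloads
   nothing. It assumes the files sit directly under the directory URL above,
   and the name tz_download_plan is hypothetical.

```python
from urllib.parse import urljoin

# Directory URL and file names taken from step 8.
BASE_URL = "http://source.icu-project.org/repos/icu/data/trunk/tzdata/icunew/2014a/44/"
TZ_FILES = ["metaZones.txt", "timezoneTypes.txt", "windowsZones.txt", "zoneinfo64.txt"]


def tz_download_plan(base_url=BASE_URL, dest_dir="source/data/misc"):
    """Return (url, destination path) pairs; no network access happens here."""
    return [(urljoin(base_url, name), f"{dest_dir}/{name}") for name in TZ_FILES]
```

   For a later tzdata release, pass the corresponding directory URL as
   base_url, keeping the trailing slash so that urljoin appends the file name.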

9. Transliterator customization

   - Add el_Upper.txt taken from ICU 52 to source/data/translit

   - Also add css3transform.txt to the same directory
   - Put the following line in trnslocal.mk

     TRANSLIT_SOURCE=css3transform.txt

10. Build-related changes

  - patches/wpo.patch
  - patches/vscomp.patch
    (see http://bugs.icu-project.org/trac/ticket/8355 and
         http://bugs.icu-project.org/trac/ticket/8356 )
  - patches/rtti.patch : Make RTTI work without exception handling on Windows
    (see http://bugs.icu-project.org/trac/ticket/8343)
  - patches/data.build.patch :
      To remove some data files we don't use and cut down the data size.
  - patches/data.build.win.patch :
      Windows-only data build patch. Add a new target DATALIB to makedata.mak
  - patches/clang.patch: To build with Clang.
    (see http://bugs.icu-project.org/trac/ticket/8954 Two other chunks in
    the patch have already been fixed in the ICU trunk.)
  - add an empty file (stubdatabuilt.txt) to source/stubdata

11. Pre-built data libraries are checked in.

    Before building the data file on Linux, re-run 'runConfigureICU Linux'
    if it was last run without data.build.patch in #10 above.

    Because we removed the layout and layoutex directories in step 3,
    'runConfigureICU Linux' will fail even with '--disable-layout'. A
    work-around is to have a copy of our icu tree in a separate build directory
    and add back the directories we removed in step 3 before
    running 'runConfigureICU'.

    'make' will fail in the 1st pass. Copy source/data/in/coll/invuca.icu
    to {BUILD_DIR_ROOT}/data/out/build/icudt46l/coll and re-run 'make'
    in {BUILD_DIR_ROOT}/data.

    'make' will fail again when pkgdata looks for css3transform.res. Edit
    data/out/tmp/icudata.lst to replace 'css3transform.res' with 'root.res'
    (see http://bugs.icu-project.org/trac/ticket/10570 ) and run 'make' again.
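
    The icudata.lst edit is a plain text substitution and could be done as
    sketched below. The helper name patch_icudata_lst is hypothetical, and the
    code assumes the list is an ordinary text file that can be read and
    rewritten whole.

```python
from pathlib import Path


def patch_icudata_lst(lst_path):
    """Replace 'css3transform.res' with 'root.res' in the given list file.

    Returns True if the file was changed, False if there was nothing to do.
    """
    path = Path(lst_path)
    original = path.read_text()
    patched = original.replace("css3transform.res", "root.res")
    if patched == original:
        return False
    path.write_text(patched)
    return True
```

    A call such as patch_icudata_lst("data/out/tmp/icudata.lst"), made from
    {BUILD_DIR_ROOT}, would then be followed by running 'make' again.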

    - source/data/in/icudtl.dat : Built on Linux with all the patches
      above applied. icudt46l.dat is generated in
      {BUILD_DIR_ROOT}/data/out/tmp and copied to the above location with the
      version number (46) dropped.

    - windows/icudt.dll : With icudt46l.dat in place, all the patches applied
      and header files moved (#11 below), generated by building the icudt_build
      project of build/icudt_build.sln on Windows. icudt46.dll is
      generated in bin/{Release,Debug}, copied to windows/icudt.dll,
      and checked in. Note that we drop the version number ('46') from the
      dll name to avoid having to update our build scripts/configuration
      files every time ICU is upgraded to a new version.

    - {mac,linux}/icudt46l_dat.S : Built on Linux with all the
      patches above (except android/brkitr.patch) applied and checked in.
      This file will be generated in {BUILD_DIR_ROOT}/data/out/tmp.

      mac/icudt46l_dat.S is identical to linux/icudt46l_dat.S except for
      the header portion, which is changed from the Linux version to read as
      follows (no leading whitespace) :

          .globl _icudt46_dat
          #ifdef U_HIDE_DATA_SYMBOL
                 .private_extern _icudt46_dat
          #endif
                 .data
                 .const
                 .align 4
          _icudt46_dat:


    - android/icudt46l_dat.S : Built on Linux with all the patches above and
      android/brkitr.patch applied and android/patch_locale.sh executed, and
      checked in.
    - android/icudtl.dat : Generated as icudt46l.dat in
      {BUILD_DIR_ROOT}/data/out/tmp along with icudt46l_dat.S and
      copied to the above location with '46' dropped from its name.
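
    The renaming applied to the .dat and .dll files (the version number is
    dropped, everything else is kept) can be expressed as a small helper. This
    is a sketch; drop_icu_version is a hypothetical name and the default
    version string is simply the one used throughout this file.

```python
import re


def drop_icu_version(filename, version="46"):
    """Drop the ICU version digits that directly follow the 'icudt' prefix."""
    return re.sub(r"^icudt" + re.escape(version), "icudt", filename)


dat_name = drop_icu_version("icudt46l.dat")   # name used for the checked-in .dat
dll_name = drop_icu_version("icudt46.dll")    # name used for the checked-in .dll
```

    The helper leaves names without the version untouched. The *_dat.S files
    keep the version in their names, so it is not meant to be applied to them.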

12. Apply the fix found with static analysis tools such as PSV and Coverity

  - patches/static.analysis.patch
  - Upstream trunk and 4.8 no longer have this code.

13. Fix for msvs2010 applied:
--- D:/src/ent/src/third_party/icu/source/common/stringpiece.cpp
 (revision 78292)
+++ D:/src/ent/src/third_party/icu/source/common/stringpiece.cpp
 (working copy)
@@ -75,7 +75,7 @@
 * Visual Studios 9.0.
 * Cygwin with MSVC 9.0 also complains here about redefinition.
 */
-#if (!defined(_MSC_VER) || (_MSC_VER > 1500)) && !defined(CYGWINMSVC)
+#if (!defined(_MSC_VER) || (_MSC_VER > 1600)) && !defined(CYGWINMSVC)
 const int32_t StringPiece::npos;
 #endif

14. Fix for locales that don't use '.' as decimal separator: patches/nan.patch
  - upstream bug: http://bugs.icu-project.org/trac/ticket/8561
  - Handle decimal separator characters other than the dot. This is required
    because decNumber's parser expects the dot as a decimal separator.
  - Locales that don't use the dot were producing "NaN" values.
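
  The failure mode can be illustrated outside ICU with Python's decimal
  module, whose string parser also accepts only '.' as the decimal separator.
  This is an analogy under that assumption, not the code in nan.patch, and
  parse_localized is a hypothetical helper.

```python
from decimal import Decimal, InvalidOperation


def parse_localized(text, decimal_separator):
    """Map the locale's decimal separator to '.' before parsing.

    Falls back to NaN when the parser still rejects the string, which
    mirrors the symptom described above.
    """
    normalized = text.replace(decimal_separator, ".")
    try:
        return Decimal(normalized)
    except InvalidOperation:
        return Decimal("NaN")


good = parse_localized("3,14", ",")   # separator mapped to '.', parses normally
bad = parse_localized("3,14", ".")    # ',' left in place, parser rejects it
```

  With the separator mapped, the value parses as 3.14; without the mapping the
  same input ends up as NaN.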

15. Fix a bug in the regex engine.
  - patches/regex.patch
  - upstream bug: http://bugs.icu-project.org/trac/ticket/8666 (fixed in the upstream)

16. Apply the upstream patch for Korean search collator support (ICU 4.6.1).
   - patches/search_collation.patch
   - upstream bug: http://bugs.icu-project.org/trac/ticket/8290

17. Fix a use-of-uninitialized-memory bug in regular expression matching
   - patches/rematch.patch
   - upstream bug: http://bugs.icu-project.org/trac/ticket/8824

18. Make it compile with -Werror on gcc 4.6
   - patches/gcc46.patch (ToT upstream does not have this code any more).

19. Fix four out-of-bounds memory access errors in common/uloc.c
    and common/uresbund.c
   - patches/uloc.patch
   - upstream bugs:
     1. http://bugs.icu-project.org/trac/ticket/8984 (_canonicalize)
     2. http://bugs.icu-project.org/trac/ticket/9114 (_getKeywords)
     3. http://bugs.icu-project.org/trac/ticket/8812 (uresbund)
        http://bugs.icu-project.org/trac/ticket/8813 (uresbund)
     4. http://bugs.icu-project.org/trac/ticket/10250 (_getKeywords)

20. Fix a null pointer error in ubrk_setText in ubrk.cpp.
    - patches/ubrk.patch
    - upstream bug : http://bugs.icu-project.org/trac/ticket/9115

21. Fix a clang warning in rbbi.cpp by merging in an upstream change.
    - patches/changeset_30255.patch
    - upstream change : http://bugs.icu-project.org/trac/changeset/30255

22. Fix time zone handling and compilation on iOS.
    - patches/ios_timezone.patch
    - upstream bugs : http://bugs.icu-project.org/trac/ticket/9051
                      http://bugs.icu-project.org/trac/ticket/8661

23. Fix a buffer overflow in utext
    - patches/utext.patch
    - upstream change : http://bugs.icu-project.org/trac/changeset/29356

24. Fix compilation errors on VS2012 and above.
    - patches/vs2012.patch

25. Fix a buffer overflow in UTF-16/32 detection.
    - patches/csetdet.patch
    - upstream bug: http://bugs.icu-project.org/trac/ticket/10318

26. Add BreakIterator::getRuleStatus
    - patches/breakiterator.patch
    - Copy and paste BreakIterator::getRuleStatus API from ICU 52

27. Change export of U_ICUDATA_ENTRY_POINT from U_IMPORT to U_EXPORT.
    - patches/declspec.patch

28. Add support for QNX Neutrino.
    -  patches/platform.qnx.patch:
       See #2 about the platform header generation.
    -  patches/si_value.undef.patch:
       Work around an all-lowercase macro defined in <signal.h>.
       Upstream took a different approach:
       http://bugs.icu-project.org/trac/ticket/9935
    -  patches/xopen_source.patch:
       Set _XOPEN_SOURCE to 600 as in the upstream changeset:
       http://bugs.icu-project.org/trac/changeset/30418