Name: icu
URL: http://site.icu-project.org/
Version: 4.6
License: MIT
Security Critical: yes

Description:
This directory contains the source code of ICU 4.6 for C/C++.

1. It was obtained with the following:

    $ svn export --native-eol LF http://source.icu-project.org/repos/icu/icu/tags/release-4-6 icu46

2. Platform header files for Linux, FreeBSD, OpenBSD, Android and Mac OS X:

   - Apply platform.patch in the patches directory. It applies the upstream
     patch to platform.h.in (see http://bugs.icu-project.org/trac/ticket/8248)
     and changes source/common/unicode/ptypes.h to refer to plinux.h and
     pmac.h generated below.

   - 'runConfigureICU Linux', 'runConfigureICU FreeBSD', and
     'runConfigureICU MacOSX' are run to generate
     source/common/unicode/platform.h.

   - On OpenBSD, source/common/unicode/platform.h is generated
     by the icu4c port in the ports directory and not by runConfigureICU.
     If the file has to be updated, run:
     cd /home/ports/textproc/icu4c && make configure

   - Rename the generated platform.h to 'plinux.h', 'pfreebsd.h',
     'popenbsd.h' or 'pmac.h', depending on the platform.

   - Apply patches/pmach.h.patch on Mac to pmac.h

   - On Android, pandroid.h was generated by copying plinux.h to
     pandroid.h and applying patches/pandroid.h.patch.

   - Apply the CL at https://codereview.chromium.org/15973007/ to plinux.h
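
   The generate-and-rename steps above could be scripted roughly as follows.
   This is only a sketch: both function names are hypothetical, it covers
   just the three platforms that step 2 runs runConfigureICU for, and it
   assumes runConfigureICU is invoked from the 'source' directory.

```shell
# Map a runConfigureICU platform argument to the per-platform header name
# used in step 2. Only the three platforms named there are covered.
platform_header_name() {
    case "$1" in
        Linux)   echo plinux.h ;;
        FreeBSD) echo pfreebsd.h ;;
        MacOSX)  echo pmac.h ;;
        *)       echo "unsupported platform: $1" >&2; return 1 ;;
    esac
}

# Hypothetical helper: configure for one platform and rename the generated
# platform.h. $2 is the path to the ICU 'source' directory (an assumption
# about where runConfigureICU is run from).
generate_platform_header() (
    platform=$1
    icu_source_dir=$2
    header=$(platform_header_name "$platform") || exit 1
    cd "$icu_source_dir" || exit 1
    ./runConfigureICU "$platform" || exit 1
    mv common/unicode/platform.h "common/unicode/$header"
)
```

   For example, 'generate_platform_header Linux icu46/source' would leave
   the result in icu46/source/common/unicode/plinux.h.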

3. The following directories were removed because they're not used by Chromium
   at the moment:
   as_is
   packaging
   source/extra
   source/sample
   source/layout
   source/layoutex
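
   A minimal sketch of the removal, assuming the tree exported in step 1.
   The function name is made up for illustration.

```shell
# Remove the directories listed in step 3 from an ICU checkout.
# $1 is the root of the exported tree (e.g. icu46 from step 1).
prune_unused_dirs() (
    cd "$1" || exit 1
    rm -rf as_is packaging source/extra source/sample \
           source/layout source/layoutex
)
```

   Usage: 'prune_unused_dirs icu46'. Everything else under source/ is left
   untouched.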

4. The word breaking for Chinese and Japanese was modified to use a word
   frequency list with the following patches and cjdict.txt.

   - patches/segmentation.patch :
       Adds dictionary (word-frequency)-based word breaking for CJK.
       (Korean is supported in the code, but it does not do anything
        because we don't have a Korean word list.)

   - source/data/brkitr/cjdict.txt :
       Chinese and Japanese word frequency list.
       See the file for the license/copyright notice.

   - source/data/brkitr/cc_edict.txt :
       The list of words derived from CC-Edict.

   - patches/brkitr.patch
     * word.txt : Chinese/Japanese segmentation rules, Hebrew-script-specific
                  handling of U+0022, and splitting of FQDN into labels at '.'.
                  For Hebrew, see http://unicode.org/cldr/track/ticket/3120
     * line.txt : Incorporated line_he and minor changes in CL, OP and ID
                  definitions.
                  For Hebrew, see http://unicode.org/cldr/track/ticket/4004
                  For others, see http://unicode.org/cldr/track/ticket/3974
                                  http://unicode.org/cldr/track/ticket/4200
                                  http://unicode.org/cldr/track/ticket/
     * brklocal.mk : build file changes to drop unnecessary brkitr rule
                     files (e.g. word_ja.txt, line_he.txt)

   - android/brkitr.patch (to be applied for Android build only) :
       Reverts some of the Chinese/Japanese segmentation rule changes in
       patches/brkitr.patch to reduce binary size for Android.

   If you want to run the ICU tests, copy source/data/brkitr/cjdict.txt
   to source/test/testdata/cjdict-truncated.txt so that the TestTrieWithValue
   test passes.

5. Converter changes : converters.patch
  - Include only what we really need. See source/data/mappings/ucmlocal.txt
  - Alias and mapping changes : source/data/mappings/convrtrs.txt
  - Change several tables and add six new tables, three of which
    are 'fake' tables for ISO-2022-CN(-Ext).
  - ucnv2022.c is modified to use the 3 'fake' tables added above for
    ISO-2022-CN(-Ext).

6. Locale changes
  - patches/locale1.patch :
      Filipino, Amharic, and Swahili locales
      exemplar character set changes for CJK + 9 Indian locales
      Minor fixes for Danish, , Turkish, and Korean.

  - patches/locale2.patch :
      The minimum locale data Chrome needs for 47 languages Chrome is
      not localized to. Each locale data file has ExemplarCharacters,
      LocaleScript, layout, and the name of the language for a locale
      in its native language.

  - patches/locale3.patch : Locale build configuration files. They
    add reslocal.mk or {trns,sprep,rbnf,coll}local.mk files to
    source/data/{coll,curr,lang,locales,region,translit,zone,rbnf,sprep}.

  - In source/data/region, run the following command to get rid of numeric region
    display names we don't use (everything other than 419).
     $ sed -i  '/[0-35-9][0-9][0-9]{/ d' *.txt

  - android/patch_locale.sh (to be run for Android build only):
      Makes changes to source/data/{curr,region,lang} to exclude these data
      except the language and script names of zh_Hans and zh_Hant.
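
  To see what the sed expression in the region step above matches, the
  sketch below runs the same pattern as a stdin filter on a few made-up
  lines. The sample entries only imitate the layout of a region resource
  file and are not copied from ICU data. The pattern deletes any line
  containing three digits followed by '{' unless the first digit is 4,
  which is why 419 survives.

```shell
# Same expression as in step 6, but reading stdin and writing stdout
# instead of editing files in place.
filter_numeric_regions() {
    sed '/[0-35-9][0-9][0-9]{/ d'
}

# Made-up sample lines in the style of a region resource file.
sample_regions() {
    printf '%s\n' \
        '        001{"World"}' \
        '        150{"Europe"}' \
        '        419{"Latin America"}' \
        '        US{"United States"}'
}

# Prints only the 419 and US lines.
sample_regions | filter_numeric_regions
```

  Note that 'sed -i' without a suffix argument, as used in the step itself,
  is GNU sed syntax.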

7. Removal of unihan collation tables from data/coll/{zh,ja,ko}.txt

  - patches/unihan.patch:
    The unihan collation tables are never used in Chrome/WebKit, but they take
    about 1MB in the uncompressed ICU data file in ICU 4.2.1.

8. Timezone data update
  - Grab the latest version of the following timezone data files and
    put them in source/data/misc.

     metaZones.txt
     timezoneTypes.txt
     windowsZones.txt
     zoneinfo64.txt

   As of Dec 2013, the latest version is 2013h and the above files
   are available at
   http://source.icu-project.org/repos/icu/data/trunk/tzdata/icunew/2013h/44/
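
   One possible way to script the download, shown as a sketch. The base URL
   and file names are the ones listed in step 8. The function names are
   hypothetical, and the fetch helper assumes curl is installed and that the
   files sit directly under that URL.

```shell
# Base URL given in step 8 (2013h, as of Dec 2013).
TZ_BASE_URL='http://source.icu-project.org/repos/icu/data/trunk/tzdata/icunew/2013h/44'

# Print one download URL per timezone data file named in step 8.
tz_data_urls() {
    for f in metaZones.txt timezoneTypes.txt windowsZones.txt zoneinfo64.txt; do
        echo "$TZ_BASE_URL/$f"
    done
}

# Hypothetical helper: download each file into source/data/misc.
# $1 is the root of the icu tree.
fetch_tz_data() (
    cd "$1/source/data/misc" || exit 1
    for url in $(tz_data_urls); do
        curl -fsSO "$url" || exit 1
    done
)
```

   For a newer tzdata release, only TZ_BASE_URL would need to change.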

9. Transliterator customization

   - Add the following files taken from ICU 52 to source/data/translit

     {tr,el,az}_{Upper,Lower,Title}.txt

   - Also add css3transform.txt to the same directory
   - Put the following line in trnslocal.mk

     TRANSLIT_SOURCE=css3transform.txt
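
   The brace pattern in step 9 stands for nine files. As a small
   illustration, the hypothetical function below spells them out using
   plain loops, so it does not depend on shell brace expansion.

```shell
# List the nine case-mapping transliterator files covered by
# {tr,el,az}_{Upper,Lower,Title}.txt, one per line.
translit_case_files() {
    for lang in tr el az; do
        for kind in Upper Lower Title; do
            echo "${lang}_${kind}.txt"
        done
    done
}

translit_case_files
```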

10. Build-related changes

  - patches/wpo.patch
  - patches/vscomp.patch
    (see http://bugs.icu-project.org/trac/ticket/8355 and
         http://bugs.icu-project.org/trac/ticket/8356 )
  - patches/rtti.patch : Make RTTI work without exception handling on Windows
    (see http://bugs.icu-project.org/trac/ticket/8343)
  - patches/data.build.patch :
      Removes some data files we don't use to cut down the data size.
  - patches/data.build.win.patch :
      Windows-only data build patch. Adds a new target DATALIB to makedata.mak
  - patches/clang.patch: To build with Clang.
    (see http://bugs.icu-project.org/trac/ticket/8954 ; two other chunks in
    the patch have already been fixed in the ICU trunk.)
  - Add an empty file (stubdatabuilt.txt) to source/stubdata

11. Pre-built data libraries are checked in.

    Before building the data file on Linux, re-run 'runConfigureICU Linux'
    if it was run without data.build.patch in #10 above.

    Because we removed the layout and layoutex directories in step 3,
    'runConfigureICU Linux' will fail even with '--disable-layout'. A
    workaround is to keep a copy of our icu tree in a separate build directory
    and add back the directories we removed in step 3 before
    running 'runConfigureICU'.

    'make' will fail in the 1st pass. Copy source/data/in/coll/invuca.icu
    to {BUILD_DIR_ROOT}/data/out/build/icudt46l/coll and re-run 'make'
    in {BUILD_DIR_ROOT}/data.

    'make' will fail again when pkgdata looks for css3transform.res. Edit
    data/out/tmp/icudata.lst to replace 'css3transform.res' with 'root.res'
    (see http://bugs.icu-project.org/trac/ticket/10570 ) and run 'make' again.
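
    The icudata.lst edit described above can also be done with sed rather
    than by hand. A minimal sketch, with a hypothetical function name, that
    writes to a temporary file so it does not depend on 'sed -i':

```shell
# Replace the css3transform.res entry with root.res in a pkgdata list file.
# $1 is the path to icudata.lst (data/out/tmp/icudata.lst in the build dir).
patch_icudata_list() {
    lst=$1
    sed 's/css3transform\.res/root.res/' "$lst" > "$lst.new" &&
        mv "$lst.new" "$lst"
}
```

    Usage: 'patch_icudata_list data/out/tmp/icudata.lst' from
    {BUILD_DIR_ROOT}, followed by re-running 'make'.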

    - source/data/in/icudt46l.dat : Built on Linux with all the patches
      above applied. This file will be generated in
      {BUILD_DIR_ROOT}/data/out/tmp.

    - windows/icudt.dll : With icudt46l.dat in place, all the patches applied
      and header files moved (#11 below), generated by building the icudt_build
      project of build/icudt_build.sln on Windows. icudt46.dll is
      generated in bin/{Release,Debug}, copied to windows/icudt.dll,
      and checked in. Note that we drop the version number ('46') from the
      dll name to avoid having to update our build scripts/configuration
      files every time ICU is upgraded to a new version.

    - {mac,linux}/icudt46l_dat.S : Built on Mac and Linux with all the
      patches above (except android/brkitr.patch) applied and checked in.
      This file will be generated in {BUILD_DIR_ROOT}/data/out/tmp.

      Alternatively, one can just generate icudt46l_dat.S on Linux and adapt
      the header portion to match the current header in mac/icudt46l_dat.S.
      The header is as follows, with no leading space on each line:

          .globl _icudt46_dat
          #ifdef U_HIDE_DATA_SYMBOL
                 .private_extern _icudt46_dat
          #endif
                 .data
                 .const
                 .align 4
          _icudt46_dat:

    - android/icudt46l_dat.S : Built on Linux with all the patches above and
      android/brkitr.patch applied and android/patch_locale.sh executed, and
      checked in.

12. Apply the fixes found with static analysis tools such as PSV and Coverity

  - patches/static.analysis.patch
  - upstream trunk/4.8 does not have this code any more.

13. Fix for msvs2010 applied:
--- D:/src/ent/src/third_party/icu/source/common/stringpiece.cpp (revision 78292)
+++ D:/src/ent/src/third_party/icu/source/common/stringpiece.cpp (working copy)
@@ -75,7 +75,7 @@
 * Visual Studios 9.0.
 * Cygwin with MSVC 9.0 also complains here about redefinition.
 */
-#if (!defined(_MSC_VER) || (_MSC_VER > 1500)) && !defined(CYGWINMSVC)
+#if (!defined(_MSC_VER) || (_MSC_VER > 1600)) && !defined(CYGWINMSVC)
 const int32_t StringPiece::npos;
 #endif

14. Fix for locales that don't use '.' as decimal separator: patches/nan.patch
  - upstream bug: http://bugs.icu-project.org/trac/ticket/8561
  - Handle characters other than the dot. This is required because decNumber's
    parser expects the dot as the decimal separator.
  - Locales that don't use the dot were producing "NaN" values.

15. Fix a bug in the regex engine.
  - patches/regex.patch
  - upstream bug: http://bugs.icu-project.org/trac/ticket/8666 (fixed upstream)

16. Apply the upstream patch for Korean search collator support (ICU 4.6.1).
   - patches/search_collation.patch
   - upstream bug: http://bugs.icu-project.org/trac/ticket/8290

17. Fix a use-of-uninitialized-memory bug in regular expression matching
   - patches/rematch.patch
   - upstream bug: http://bugs.icu-project.org/trac/ticket/8824

18. Make it compile with -Werror on gcc 4.6
   - patches/gcc46.patch (ToT upstream does not have this code any more).

19. Fix four out-of-bounds memory access errors in common/uloc.c
    and common/uresbund.c
   - patches/uloc.patch
   - upstream bugs:
     1. http://bugs.icu-project.org/trac/ticket/8984 (_canonicalize)
     2. http://bugs.icu-project.org/trac/ticket/9114 (_getKeywords)
     3. http://bugs.icu-project.org/trac/ticket/8812 (uresbund)
        http://bugs.icu-project.org/trac/ticket/8813 (uresbund)
     4. http://bugs.icu-project.org/trac/ticket/10250 (_getKeywords)

20. Fix a null pointer error in ubrk_setText in ubrk.cpp.
    - patches/ubrk.patch
    - upstream bug : http://bugs.icu-project.org/trac/ticket/9115

21. Fix a clang warning in rbbi.cpp by merging in an upstream change.
    - patches/changeset_30255.patch
    - upstream change : http://bugs.icu-project.org/trac/changeset/30255

22. Fix time zone handling and compilation on iOS.
    - patches/ios_timezone.patch
    - upstream bugs : http://bugs.icu-project.org/trac/ticket/9051
                      http://bugs.icu-project.org/trac/ticket/8661

23. Fix a buffer overflow in utext
    - patches/utext.patch
    - upstream change : http://bugs.icu-project.org/trac/changeset/29356

24. Fix compilation errors on VS2012 and above.
    - patches/vs2012.patch

25. Fix a buffer overflow in UTF-16/32 detection.
    - patches/csetdet.patch
    - upstream bug: http://bugs.icu-project.org/trac/ticket/10318

26. Add BreakIterator::getRuleStatus
    - patches/breakiterator.patch
    - Copy and paste BreakIterator::getRuleStatus API from ICU 52