Name: icu
URL: http://site.icu-project.org/
Version: 4.6
License: MIT
Security Critical: yes

Description:
This directory contains the source code of ICU 4.6 for C/C++.

1. It was obtained with the following:

  $ svn export --native-eol LF http://source.icu-project.org/repos/icu/icu/tags/release-4-6 icu46

2. Platform header files for Linux, FreeBSD, OpenBSD, Android and Mac OS X:

  - Apply platform.patch in the patches directory. It applies the upstream
    patch to platform.h.in (see http://bugs.icu-project.org/trac/ticket/8248)
    and changes source/common/unicode/ptypes.h to refer to plinux.h and
    pmac.h generated below.

  - 'runConfigureICU Linux', 'runConfigureICU FreeBSD', and
    'runConfigureICU MacOSX' are run to generate
    source/common/unicode/platform.h.

  - On OpenBSD, source/common/unicode/platform.h is generated
    by the icu4c port in the ports directory and not by runConfigureICU.
    If the file has to be updated, you can do:
      cd /home/ports/textproc/icu4c && make configure

  - Rename the generated platform.h to 'plinux.h', 'pfreebsd.h',
    'popenbsd.h' or 'pmac.h', depending on the platform.

  - Apply patches/pmach.h.patch on Mac to pmac.h

  - On Android, pandroid.h was generated by copying plinux.h to
    pandroid.h and applying patches/pandroid.h.patch.

  - Apply the CL at https://codereview.chromium.org/15973007/ to plinux.h

3. The following directories were removed because they're not used by Chromium
   at the moment:
     as_is
     packaging
     source/extra
     source/sample
     source/layout
     source/layoutex

4. Word breaking for Chinese and Japanese was modified to use a word
   frequency list with the following patch and cjdict.txt.

  - patches/segmentation.patch :
    Adds dictionary (word-frequency)-based word breaking for CJK.
    (Korean is supported in the code, but it does not do anything
    because we don't have a Korean word-list.)

  - source/data/brkitr/cjdict.txt :
    Chinese and Japanese word frequency list.
    See the file for license/copyright notice.

  - source/data/brkitr/cc_edict.txt :
    The list of words derived from CC-Edict.

  - patches/brkitr.patch
    * word.txt : Chinese/Japanese segmentation rules, Hebrew-script-specific
      handling of U+0022, and splitting of FQDN into labels at '.'.
      For Hebrew, see http://unicode.org/cldr/track/ticket/3120
    * line.txt : Incorporated line_he and minor changes in CL, OP and ID
      definitions.
      For Hebrew, see http://unicode.org/cldr/track/ticket/4004
      For others, see http://unicode.org/cldr/track/ticket/3974
                      http://unicode.org/cldr/track/ticket/4200
                      http://unicode.org/cldr/track/ticket/
    * brklocal.mk : build file changes to drop unnecessary brkitr rule
      files (e.g. word_ja.txt, line_he.txt)

  - android/brkitr.patch (to be applied for Android build only) :
    Reverts some of the Chinese/Japanese segmentation rule changes in
    patches/brkitr.patch to reduce binary size for Android.

  If you want to run ICU tests, you have to copy source/data/brkitr/cjdict.txt
  to source/test/testdata/cjdict-truncated.txt to pass the TestTrieWithValue
  test.

5. Converter changes : converters.patch
  - Include only what we really need. See source/data/mappings/ucmlocal.txt
  - Alias and mapping changes : source/data/mappings/convrtrs.txt
  - Changes several tables and adds six new tables, three of which
    are 'fake' tables for ISO-2022-CN(-Ext).
  - ucnv2022.c is modified to use the 3 'fake' tables added above for
    ISO-2022-CN(-Ext).

6. Locale changes
  - patches/locale1.patch :
    Filipino, Amharic, and Swahili locales
    exemplar character set changes for CJK + 9 Indian locales
    Minor fixes for Danish, , Turkish, and Korean.

  - patches/locale2.patch :
    The minimum locale data Chrome needs for 47 languages Chrome is
    not localized to.
    Each locale data file has ExemplarCharacters,
    LocaleScript, layout, and the name of the language for a locale
    in its native language.

  - patches/locale3.patch : Locale build configuration files. They
    add reslocal.mk or {trns,sprep,rbnf,coll}local.mk files to
    source/data/{coll,curr,lang,locales,region,translit,zone,rbnf,sprep}.

  - In source/data/region, run the following command to get rid of numeric
    region display names we don't use (everything other than 419).
      $ sed -i '/[0-35-9][0-9][0-9]{/ d' *.txt

  - android/patch_locale.sh (to be run for Android build only):
    Makes changes to source/data/{curr,region,lang} to exclude that data,
    except for the language and script names of zh_Hans and zh_Hant.

7. Removal of unihan collation tables from data/coll/{zh,ja,ko}.txt

  - patches/unihan.patch:
    Unihan collation tables are never used in Chrome/WebKit, but they take
    about 1MB in the uncompressed ICU data file in ICU 4.2.1.

8. Timezone data update
  - Grab the latest version of the following timezone data files and
    put them in source/data/misc.

      metaZones.txt
      timezoneTypes.txt
      windowsZones.txt
      zoneinfo64.txt

    As of November 2011, the latest version is 2011n and the above files
    are available at
    http://source.icu-project.org/repos/icu/data/trunk/tzdata/icunew/2011n/44/

9. Build-related changes

  - patches/wpo.patch
  - patches/vscomp.patch
    (see http://bugs.icu-project.org/trac/ticket/8355 and
     http://bugs.icu-project.org/trac/ticket/8356 )
  - patches/rtti.patch : Make RTTI work without exception handling on Windows
    (see http://bugs.icu-project.org/trac/ticket/8343)
  - patches/data.build.patch :
    Removes some data files we don't use to cut down the data size.
  - patches/data.build.win.patch :
    Windows-only data build patch. Adds a new target DATALIB to makedata.mak
  - patches/clang.patch: To build with Clang.
    (see http://bugs.icu-project.org/trac/ticket/8954 ; two other chunks in
    the patch have already been fixed in the ICU trunk.)
  - Add an empty file (stubdatabuilt.txt) to source/stubdata

10. Pre-built data libraries are checked in.

  Before building the data file on Linux, re-run 'runConfigureICU Linux'
  if it was previously run without data.build.patch from #9 above.

  'make' will fail in the 1st pass. Copy source/data/in/coll/invuca.icu
  to {BUILD_DIR_ROOT}/data/out/build/icudt46l/coll and re-run 'make'
  in {BUILD_DIR_ROOT}/data.

  - source/data/in/icudt46l.dat : Built on Linux with all the patches
    above applied.

  - windows/icudt.dll : With icudt46l.dat in place, all the patches applied
    and header files moved (#11 below), generated by building the icudt_build
    project of build/icudt_build.sln on Windows. icudt46.dll is
    generated in bin/{Release,Debug} and copied to windows/icudt.dll
    and checked in. Note that we drop the version number ('46') from the
    dll name to avoid having to update our build scripts/configuration
    files every time ICU is upgraded to a new version.

  - {mac,linux}/icudt46l_dat.S : Built on Mac and Linux with all the
    patches above (except android/brkitr.patch) applied and checked in.

  - android/icudt46l_dat.S : Built on Linux with all the patches above and
    android/brkitr.patch applied and android/patch_locale.sh executed, and
    checked in.

11. The header files were moved as shown below:

  source/common/unicode ==> public/common/unicode
  source/i18n/unicode ==> public/i18n/unicode

12. Apply the fixes found with static analysis tools such as PSV and Coverity

  - patches/static.analysis.patch
  - upstream trunk/4.8 does not have this code any more.

13. Fix for msvs2010 applied:
  --- D:/src/ent/src/third_party/icu/source/common/stringpiece.cpp
  (revision 78292)
  +++ D:/src/ent/src/third_party/icu/source/common/stringpiece.cpp
  (working copy)
  @@ -75,7 +75,7 @@
    * Visual Studios 9.0.
    * Cygwin with MSVC 9.0 also complains here about redefinition.
    */
  -#if (!defined(_MSC_VER) || (_MSC_VER > 1500)) && !defined(CYGWINMSVC)
  +#if (!defined(_MSC_VER) || (_MSC_VER > 1600)) && !defined(CYGWINMSVC)
   const int32_t StringPiece::npos;
   #endif

14. Fix for locales that don't use '.' as decimal separator: patches/nan.patch
  - upstream bug: http://bugs.icu-project.org/trac/ticket/8561
  - Handle other chars besides the dot. This is required because decNumber's
    parser expects the dot as a decimal separator.
  - Locales that don't use dot were producing "NaN" values.

15. Fix a bug in the regex engine.
  - patches/regex.patch
  - upstream bug: http://bugs.icu-project.org/trac/ticket/8666 (fixed in the upstream)

16. Apply the upstream patch for Korean search collator support (ICU 4.6.1).
  - patches/search_collation.patch
  - upstream bug: http://bugs.icu-project.org/trac/ticket/8290

17. Fix a use-of-uninitialized-memory bug in regular expression matching
  - patches/rematch.patch
  - upstream bug: http://bugs.icu-project.org/trac/ticket/8824

18. Make it compile with -Werror on gcc 4.6
  - patches/gcc46.patch (ToT upstream does not have this code any more).

19. Fix four out-of-bounds memory access errors in common/uloc.c
    and common/uresbund.c
  - patches/uloc.patch
  - upstream bug:
    1. http://bugs.icu-project.org/trac/ticket/8984 (_canonicalize)
    2. http://bugs.icu-project.org/trac/ticket/9114 (_getKeywords)
    3. http://bugs.icu-project.org/trac/ticket/8812 (uresbund)
       http://bugs.icu-project.org/trac/ticket/8813 (uresbund)
    4. http://bugs.icu-project.org/trac/ticket/10250 (_getKeywords)

20. Fix a null pointer error in ubrk_setText in ubrk.cpp.
  - patches/ubrk.patch
  - upstream bug : http://bugs.icu-project.org/trac/ticket/9115

21. Fix a clang warning in rbbi.cpp by merging in an upstream change.
  - patches/changeset_30255.patch
  - upstream change : http://bugs.icu-project.org/trac/changeset/30255

22. Fix time zone handling and compilation on iOS.
  - patches/ios_timezone.patch
  - upstream bugs : http://bugs.icu-project.org/trac/ticket/9051
                  - http://bugs.icu-project.org/trac/ticket/8661

23. Fix a buffer overflow in utext
  - patches/utext.patch
  - upstream change : http://bugs.icu-project.org/trac/changeset/29356

24. Fix compilation errors on VS2012.
  - patches/vs2012.patch

25. Fix a buffer overflow in UTF-16/32 detection.
  - patches/csetdet.patch
  - upstream bug: http://bugs.icu-project.org/trac/ticket/10318
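
Note on the sed filter in #6: the pattern deletes every line containing a
three-digit code followed by '{' unless the code starts with 4, which is how
419 survives. The sketch below mirrors that pattern in Python to show which
lines are kept. The function name and the sample lines are made up for
illustration; they only imitate the look of a region resource file and are
not copied from ICU data.

```python
import re

# Same pattern as the sed expression '/[0-35-9][0-9][0-9]{/ d':
# three digits whose first digit is anything but 4, followed by '{'.
NUMERIC_REGION = re.compile(r"[0-35-9][0-9][0-9]\{")


def drop_numeric_regions(lines):
    """Return only the lines that the sed command would leave in place."""
    return [line for line in lines if not NUMERIC_REGION.search(line)]


# Hypothetical sample lines in the style of a region display-name table.
sample = [
    '        001{"World"}',
    '        419{"Latin America"}',
    '        US{"United States"}',
]
kept = drop_numeric_regions(sample)
```

Note that the pattern spares any code beginning with 4, not just 419, so it
relies on 419 being the only such code present in the data.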