Name: icu
URL: http://site.icu-project.org/
Version: 4.6
License: MIT
Security Critical: yes

Description:
This directory contains the source code of ICU 4.6 for C/C++.

1. It was obtained with the following:

  $ svn export --native-eol LF http://source.icu-project.org/repos/icu/icu/tags/release-4-6 icu46

2. Platform header files for Linux, FreeBSD, OpenBSD, Android, Mac OS X, and QNX:

  - Apply platform.patch in the patches directory. It applies the upstream
    patch to platform.h.in (see http://bugs.icu-project.org/trac/ticket/8248)
    and changes source/common/unicode/ptypes.h to refer to the plinux.h and
    pmac.h generated below.

  - 'runConfigureICU Linux', 'runConfigureICU FreeBSD', and
    'runConfigureICU MacOSX' are run to generate
    source/common/unicode/platform.h.

  - On OpenBSD, source/common/unicode/platform.h is generated
    by the icu4c port in the ports directory, not by runConfigureICU.
    If the file has to be updated, run:
      cd /home/ports/textproc/icu4c && make configure

  - Rename the generated platform.h to 'plinux.h', 'pfreebsd.h',
    'popenbsd.h', or 'pmac.h', depending on the platform.

  - Apply patches/pmach.h.patch on Mac to pmac.h

  - On Android, pandroid.h was generated by copying plinux.h to
    pandroid.h and applying patches/pandroid.h.patch.

  - For QNX, pqnx.h was generated by copying plinux.h to
    pqnx.h and applying patches/platform.qnx.patch.

  - For NaCl (icu_nacl.gypi), pnacl.h was generated by copying plinux.h to
    pnacl.h and applying patches/pnacl.h.patch.

  - Apply the CL at https://codereview.chromium.org/15973007/ to plinux.h

3. The following directories were removed because they're not used by Chromium
   at the moment:
     as_is
     packaging
     source/extra
     source/sample
     source/layout
     source/layoutex

4. Word breaking for Chinese and Japanese was modified to use a word
   frequency list with the following patches and cjdict.txt.

  - patches/segmentation.patch :
    Adds dictionary (word-frequency)-based word breaking for CJK.
    (Korean is supported in the code, but it does not do anything
    because we don't have a Korean word list.)

  - source/data/brkitr/cjdict.txt :
    Chinese and Japanese word frequency list.
    See the file for the license/copyright notice.

  - source/data/brkitr/cc_edict.txt :
    The list of words derived from CC-Edict.

  - patches/brkitr.patch
    * word.txt : Chinese/Japanese segmentation rules, Hebrew-script-specific
      handling of U+0022, and splitting of FQDNs into labels at '.'.
      For Hebrew, see http://unicode.org/cldr/track/ticket/3120
    * line.txt : Incorporated line_he and minor changes in the CL, OP and ID
      definitions.
      For Hebrew, see http://unicode.org/cldr/track/ticket/4004
      For others, see http://unicode.org/cldr/track/ticket/3974
                      http://unicode.org/cldr/track/ticket/4200
                      http://unicode.org/cldr/track/ticket/
    * brklocal.mk : build file changes to drop unnecessary brkitr rule
      files (e.g. word_ja.txt, line_he.txt)

  - android/brkitr.patch (to be applied for the Android build only) :
    Reverts some of the Chinese/Japanese segmentation rule changes in
    patches/brkitr.patch to reduce binary size for Android.

  If you want to run the ICU tests, you have to copy source/data/brkitr/cjdict.txt
  to source/test/testdata/cjdict-truncated.txt to pass the TestTrieWithValue test.

5. Converter changes : converters.patch
  - Include only the converters we really need. See
    source/data/mappings/ucmlocal.txt
  - Alias and mapping changes : source/data/mappings/convrtrs.txt
  - Changes several tables and adds six new tables, three of which
    are 'fake' tables for ISO-2022-CN(-Ext).
  - ucnv2022.c is modified to use the 3 'fake' tables added above for
    ISO-2022-CN(-Ext).

6. Locale changes
  - patches/locale1.patch :
    Filipino, Amharic, and Swahili locales
    exemplar character set changes for CJK + 9 Indian locales
    Minor fixes for Danish, , Turkish, and Korean.

  - patches/locale2.patch :
    The minimum locale data Chrome needs for the 47 languages Chrome is
    not localized to. Each locale data file has ExemplarCharacters,
    LocaleScript, layout, and the name of the language for a locale
    in its native language.

  - patches/locale3.patch : Locale build configuration files. They
    add reslocal.mk or {trns,sprep,rbnf,coll}local.mk files to
    source/data/{coll,curr,lang,locale,region,translit,zone,rbnf,sprep}.

  - In source/data/region, run the following command to get rid of the numeric
    region display names we don't use (everything other than 419).
    $ sed -i '/[0-35-9][0-9][0-9]{/ d' *.txt

  - android/patch_locale.sh (to be run for the Android build only):
    Makes changes to source/data/{curr,region,lang} to exclude these data
    except the language and script names of zh_Hans and zh_Hant.

  - Add tg.txt to source/data/locale and source/data/lang to add the minimal
    locale data necessary for the spellchecker. In both directories, add
    tg.txt to reslocal.mk

7. Removal of unihan collation tables from data/coll/{zh,ja,ko}.txt

  - patches/unihan.patch:
    The unihan collation tables are never used in Chrome/Webkit, but they take
    about 1MB in the uncompressed ICU data file in ICU 4.2.1.

8. Timezone data update
  - Grab the latest version of the following timezone data files and
    put them in source/data/misc.

      metaZones.txt
      timezoneTypes.txt
      windowsZones.txt
      zoneinfo64.txt

    As of Mar 2014, the latest version is 2014a and the above files
    are available at
    http://source.icu-project.org/repos/icu/data/trunk/tzdata/icunew/2014a/44/

9. Transliterator customization

  - Add el_Upper.txt taken from ICU 52 to source/data/translit

  - Also add css3transform.txt to the same directory
  - Put the following line in trnslocal.mk

    TRANSLIT_SOURCE=css3transform.txt

10. Build-related changes

  - patches/wpo.patch
  - patches/vscomp.patch
    (see http://bugs.icu-project.org/trac/ticket/8355 and
    http://bugs.icu-project.org/trac/ticket/8356 )
  - patches/rtti.patch : Make RTTI work without exception handling on Windows
    (see http://bugs.icu-project.org/trac/ticket/8343)
  - patches/data.build.patch :
    Removes some data files we don't use to cut down the data size.
  - patches/data.build.win.patch :
    Windows-only data build patch. Adds a new target DATALIB to makedata.mak
  - patches/clang.patch: To build with Clang.
    (see http://bugs.icu-project.org/trac/ticket/8954 . Two other chunks in
    the patch have already been fixed in the ICU trunk.)
  - Add an empty file (stubdatabuilt.txt) to source/stubdata

11. Pre-built data libraries are checked in.

  Before building the data file on Linux, re-run 'runConfigureICU Linux'
  if it was run without data.build.patch from #10 above.

  Because we removed the layout and layoutex directories in step 3,
  'runConfigureICU Linux' will fail even with '--disable-layout'. A
  work-around is to keep a copy of our icu tree in a separate build directory
  and add back the directories we removed in step 3 before
  running 'runConfigureICU'.

  'make' will fail on the first pass. Copy source/data/in/coll/invuca.icu
  to {BUILD_DIR_ROOT}/data/out/build/icudt46l/coll and re-run 'make'
  in {BUILD_DIR_ROOT}/data.

  'make' will fail again when pkgdata looks for css3transform.res. Edit
  data/out/tmp/icudata.lst to replace 'css3transform.res' with 'root.res'
  (see http://bugs.icu-project.org/trac/ticket/10570 ) and run 'make' again.
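
  The icudata.lst edit described above can be scripted. The following is a
  minimal sketch in Python, not part of ICU or of the Chromium build. It
  assumes icudata.lst is a plain-text list of data items, and the function
  name replace_list_entry is hypothetical.

```python
from pathlib import Path


def replace_list_entry(list_path, old="css3transform.res", new="root.res"):
    """Replace every occurrence of `old` with `new` in the list file.

    Returns the number of occurrences replaced. The file is rewritten
    only when at least one occurrence was found.
    """
    path = Path(list_path)
    text = path.read_text()
    count = text.count(old)
    if count:
        # Plain substring replacement, mirroring the manual edit:
        # any directory prefix in front of the entry is kept as-is.
        path.write_text(text.replace(old, new))
    return count
```

  Call it with the path to {BUILD_DIR_ROOT}/data/out/tmp/icudata.lst after the
  second 'make' failure, then run 'make' again. A return value of 0 means the
  entry was not found and the file was left untouched.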

  - source/data/in/icudtl.dat : Built on Linux with all the patches
    above applied. icudt46l.dat is generated in
    {BUILD_DIR_ROOT}/data/out/tmp and copied to the above location with the
    version number (46) dropped.

  - windows/icudt.dll : With icudt46l.dat in place, all the patches applied
    and header files moved (#11 below), generated by building the icudt_build
    project of build/icudt_build.sln on Windows. icudt46.dll is
    generated in bin/{Release,Debug}, copied to windows/icudt.dll,
    and checked in. Note that we drop the version number ('46') from the
    dll name to avoid having to update our build scripts/configuration
    files every time ICU is upgraded to a new version.

  - {mac,linux}/icudt46l_dat.S : Built on Linux with all the
    patches above (except android/brkitr.patch) applied and checked in.
    This file will be generated in {BUILD_DIR_ROOT}/data/out/tmp.

    mac/icudt46l_dat.S is identical to linux/icudt46l_dat.S except for its
    header. It is made by changing the header portion of the Linux version
    to read as follows (no leading whitespace) :

.globl _icudt46_dat
#ifdef U_HIDE_DATA_SYMBOL
.private_extern _icudt46_dat
#endif
.data
.const
.align 4
_icudt46_dat:

  - android/icudt46l_dat.S : Built on Linux with all the patches above and
    android/brkitr.patch applied and android/patch_locale.sh executed, and
    checked in.
  - android/icudtl.dat : Generated as icudt46l.dat in
    {BUILD_DIR_ROOT}/data/out/tmp along with icudt46l_dat.S and
    copied to the above location with '46' dropped from its name.

12. Apply the fixes for issues found with static analysis tools such as PSV
    and Coverity

  - patches/static.analysis.patch
  - upstream trunk/4.8 does not have this code any more.

13. Fix for msvs2010 applied:
--- D:/src/ent/src/third_party/icu/source/common/stringpiece.cpp (revision 78292)
+++ D:/src/ent/src/third_party/icu/source/common/stringpiece.cpp (working copy)
@@ -75,7 +75,7 @@
  * Visual Studios 9.0.
  * Cygwin with MSVC 9.0 also complains here about redefinition.
  */
-#if (!defined(_MSC_VER) || (_MSC_VER > 1500)) && !defined(CYGWINMSVC)
+#if (!defined(_MSC_VER) || (_MSC_VER > 1600)) && !defined(CYGWINMSVC)
 const int32_t StringPiece::npos;
 #endif

14. Fix for locales that don't use '.' as the decimal separator: patches/nan.patch
  - upstream bug: http://bugs.icu-project.org/trac/ticket/8561
  - Handle other chars besides the dot. This is required because decNumber's
    parser expects the dot as the decimal separator.
  - Locales that don't use the dot were producing "NaN" values.

15. Fix a bug in the regex engine.
  - patches/regex.patch
  - upstream bug: http://bugs.icu-project.org/trac/ticket/8666 (fixed upstream)

16. Apply the upstream patch for Korean search collator support (ICU 4.6.1).
  - patches/search_collation.patch
  - upstream bug: http://bugs.icu-project.org/trac/ticket/8290

17. Fix a use-of-uninitialized-memory bug in regular expression matching
  - patches/rematch.patch
  - upstream bug: http://bugs.icu-project.org/trac/ticket/8824

18. Make it compile with -Werror on gcc 4.6
  - patches/gcc46.patch (ToT upstream does not have this code any more).

19. Fix four out-of-bounds memory access errors in common/uloc.c
    and common/uresbund.c
  - patches/uloc.patch
  - upstream bugs:
    1. http://bugs.icu-project.org/trac/ticket/8984 (_canonicalize)
    2. http://bugs.icu-project.org/trac/ticket/9114 (_getKeywords)
    3. http://bugs.icu-project.org/trac/ticket/8812 (uresbund)
       http://bugs.icu-project.org/trac/ticket/8813 (uresbund)
    4. http://bugs.icu-project.org/trac/ticket/10250 (_getKeywords)

20. Fix a null pointer error in ubrk_setText in ubrk.cpp.
  - patches/ubrk.patch
  - upstream bug : http://bugs.icu-project.org/trac/ticket/9115

21. Fix a clang warning in rbbi.cpp by merging in an upstream change.
  - patches/changeset_30255.patch
  - upstream change : http://bugs.icu-project.org/trac/changeset/30255

22. Fix time zone handling and compilation on iOS.
  - patches/ios_timezone.patch
  - upstream bugs : http://bugs.icu-project.org/trac/ticket/9051
                    http://bugs.icu-project.org/trac/ticket/8661

23. Fix a buffer overflow in utext
  - patches/utext.patch
  - upstream change : http://bugs.icu-project.org/trac/changeset/29356

24. Fix compilation errors on VS2012 and above.
  - patches/vs2012.patch

25. Fix a buffer overflow in UTF-16/32 detection.
  - patches/csetdet.patch
  - upstream bug: http://bugs.icu-project.org/trac/ticket/10318

26. Add BreakIterator::getRuleStatus
  - patches/breakiterator.patch
  - Copy and paste the BreakIterator::getRuleStatus API from ICU 52

27. Change the export of U_ICUDATA_ENTRY_POINT from U_IMPORT to U_EXPORT.
  - patches/declspec.patch

28. Add support for QNX Neutrino.
  - patches/platform.qnx.patch:
    See #2 about the platform header generation.
  - patches/si_value.undef.patch:
    Work around an all-lowercase macro defined in <signal.h>.
    Upstream took a different approach:
    http://bugs.icu-project.org/trac/ticket/9935
  - patches/xopen_source.patch:
    Set _XOPEN_SOURCE to 600 as in the upstream changeset:
    http://bugs.icu-project.org/trac/changeset/30418