Name: icu
URL: http://site.icu-project.org/
Version: 4.6
License: MIT
Security Critical: yes

Description:
This directory contains the source code of ICU 4.6 for C/C++

1. It was obtained with the following:

   $ svn export --native-eol LF http://source.icu-project.org/repos/icu/icu/tags/release-4-6 icu46

2. Platform header files for Linux, FreeBSD, OpenBSD, Android and Mac OS X:

   - Apply platform.patch in the patches directory. It applies the upstream
     patch to platform.h.in (see http://bugs.icu-project.org/trac/ticket/8248)
     and changes source/common/unicode/ptypes.h to refer to the plinux.h and
     pmac.h generated below.

   - 'runConfigureICU Linux', 'runConfigureICU FreeBSD', and
     'runConfigureICU MacOSX' are run to generate
     source/common/unicode/platform.h.

   - On OpenBSD, source/common/unicode/platform.h is generated
     by the icu4c port in the ports directory and not by runConfigureICU.
     In case the file has to be updated you can do:
     cd /home/ports/textproc/icu4c && make configure

   - Rename the generated platform.h to 'plinux.h', 'pfreebsd.h',
     'popenbsd.h' or 'pmac.h', depending on the platform.

   - Apply patches/pmach.h.patch on Mac to pmac.h

   - On Android, pandroid.h was generated by copying plinux.h to
     pandroid.h and applying patches/pandroid.h.patch.

   - Apply the CL at https://codereview.chromium.org/15973007/ to plinux.h

3. The following directories were removed because they're not used by Chromium
   at the moment:
     as_is
     packaging
     source/extra
     source/sample
     source/layout
     source/layoutex


4. The word breaking for Chinese and Japanese was modified to use a word
   frequency list with the following patch and cjdict.txt.

   - patches/segmentation.patch :
     Adds dictionary (word-frequency)-based word breaking for CJK
     (Korean is supported in the code, but it does not do anything
     because we don't have a Korean word-list.)

   - source/data/brkitr/cjdict.txt :
     Chinese and Japanese word frequency list.
     See the file for license/copyright notice

   - source/data/brkitr/cc_edict.txt :
     the list of words derived from CC-Edict.

   - patches/brkitr.patch
     * word.txt : Chinese/Japanese segmentation rules, Hebrew-script-specific
       handling of U+0022, and splitting of FQDN into labels at '.'.
       For Hebrew, see http://unicode.org/cldr/track/ticket/3120
     * line.txt : Incorporated line_he and minor changes in CL, OP and ID
       definitions.
       For Hebrew, see http://unicode.org/cldr/track/ticket/4004
       For others, see http://unicode.org/cldr/track/ticket/3974
                       http://unicode.org/cldr/track/ticket/4200
                       http://unicode.org/cldr/track/ticket/
     * brklocal.mk : build file changes to drop unnecessary brkitr rule
       files (e.g. word_ja.txt, line_he.txt)

   - android/brkitr.patch (to be applied for Android build only) :
     Reverts some of the Chinese/Japanese segmentation rule changes in
     patches/brkitr.patch to reduce binary size for Android.

   If you want to run ICU tests, you have to copy source/data/brkitr/cjdict.txt
   to source/test/testdata/cjdict-truncated.txt to pass TestTrieWithValue test.

5. Converter changes : converters.patch
   - Include what we really need. See source/data/mappings/ucmlocal.txt
   - Alias and mapping changes : source/data/mappings/convrtrs.txt
   - Changes several tables and adds six new tables, three of which
     are 'fake' tables for ISO-2022-CN(-Ext).
   - ucnv2022.c is modified to use 3 'fake' tables added above for
     ISO-2022-CN(-Ext).

6. Locale changes
   - patches/locale1.patch :
     Filipino, Amharic, and Swahili locales
     exemplar character set changes for CJK + 9 Indian locales
     Minor fixes for Danish, , Turkish, and Korean.

   - patches/locale2.patch :
     The minimum locale data Chrome needs for 47 languages Chrome is
     not localized to.
     Each locale data file has ExemplarCharacters,
     LocaleScript, layout, and the name of the language for a locale
     in its native language.

   - patches/locale3.patch : Locale build configuration files. They
     add reslocal.mk or {trns,sprep,rbnf,coll}local.mk files to
     source/data/{coll,curr,lang.locale,curr,region,translit,zone,rbnf,sprep}.

   - In source/data/region, run the following command to get rid of numeric region
     display names we don't use (everything other than 419).
     $ sed -i '/[0-35-9][0-9][0-9]{/ d' *.txt

   - android/patch_locale.sh (to be run for Android build only):
     Makes changes to source/data/{curr,region,lang} to exclude these data
     except the language and script names of zh_Hans and zh_Hant.

7. Removal of unihan collation tables from data/coll/{zh,ja,ko}.txt

   - patches/unihan.patch:
     unihan collation tables are never used in Chrome/WebKit, but they take
     about 1MB in the uncompressed ICU data file in ICU 4.2.1.

8. Timezone data update
   - Grab the latest version of the following timezone data files and
     put them in source/data/misc.

       metaZones.txt
       timezoneTypes.txt
       windowsZones.txt
       zoneinfo64.txt

     As of Dec 2013, the latest version is 2013h and the above files
     are available at
     http://source.icu-project.org/repos/icu/data/trunk/tzdata/icunew/2013h/44/

9. Transliterator customization

   - Add the following files taken from ICU 52 to source/data/translit

       {tr,el,az}_{Upper,Lower,Title}.txt

   - Also add css3transform.txt to the same directory
   - Put the following line in trnslocal.mk

       TRANSLIT_SOURCE=css3transform.txt

10. Build-related changes

   - patches/wpo.patch
   - patches/vscomp.patch
     (see http://bugs.icu-project.org/trac/ticket/8355 and
     http://bugs.icu-project.org/trac/ticket/8356 )
   - patches/rtti.patch : Make RTTI work without exception handling on Windows
     (see http://bugs.icu-project.org/trac/ticket/8343)
   - patches/data.build.patch :
     To remove some data files we don't use and cut down the data size.
   - patches/data.build.win.patch :
     Windows-only data build patch. Adds a new target DATALIB to makedata.mak
   - patches/clang.patch: To build with Clang.
     (see http://bugs.icu-project.org/trac/ticket/8954 . Two other chunks in
     the patch have already been fixed in the ICU trunk.)
   - add an empty file (stubdatabuilt.txt) to source/stubdata

11. Pre-built data libraries are checked in.

   Before building the data file on Linux, run 'runConfigureICU Linux' again
   if it was run without data.build.patch in #10 above.

   Because we removed the layout and layoutex directories in step 3,
   'runConfigureICU Linux' will fail even with '--disable-layout'. A
   work-around is to have a copy of our icu tree in a separate build directory
   and add back the directories we removed in step 3 before
   running 'runConfigureICU'.

   'make' will fail in the 1st pass. Copy source/data/in/coll/invuca.icu
   to {BUILD_DIR_ROOT}/data/out/build/icudt46l/coll and re-run 'make'
   in {BUILD_DIR_ROOT}/data.

   'make' will fail again when pkgdata looks for css3transform.res. Edit
   data/out/tmp/icudata.lst to replace 'css3transform.res' with 'root.res'
   (see http://bugs.icu-project.org/trac/ticket/10570 ) and run 'make' again.


   - source/data/in/icudt46l.dat : Built on Linux with all the patches
     above applied. This file will be generated in
     {BUILD_DIR_ROOT}/data/out/tmp.

   - windows/icudt.dll : With icudt46l.dat in place, all the patches applied
     and header files moved (#11 below), generated by building icudt_build
     project of build/icudt_build.sln on Windows. icudt46.dll is
     generated in bin/{Release,Debug} and copied to windows/icudt.dll
     and checked in. Note that we drop the version number ('46') from the
     dll name to avoid having to update our build scripts/configuration
     files every time ICU is upgraded to a new version.

   - {mac,linux}/icudt46l_dat.S : Built on Mac and Linux with all the
     patches above (except android/brkitr.patch) applied and checked in.
     This file will be generated in {BUILD_DIR_ROOT}/data/out/tmp.

     Alternatively, one can just generate icudt46l_dat.S on Linux and adapt
     the header portion to match the current header in mac/icudt46l_dat.S.
     That header is as follows, with no leading space on any line:

.globl _icudt46_dat
#ifdef U_HIDE_DATA_SYMBOL
       .private_extern _icudt46_dat
#endif
       .data
       .const
       .align 4
_icudt46_dat:


   - android/icudt46l_dat.S : Built on Linux with all the patches above and
     android/brkitr.patch applied and android/patch_locale.sh executed, and
     checked in.

12. Apply the fix found with static analysis tools such as PSV and Coverity

   - patches/static.analysis.patch
   - upstream trunk/4.8 do not have this code any more.

13. Fix for msvs2010 applied:
--- D:/src/ent/src/third_party/icu/source/common/stringpiece.cpp
(revision 78292)
+++ D:/src/ent/src/third_party/icu/source/common/stringpiece.cpp
(working copy)
@@ -75,7 +75,7 @@
  * Visual Studios 9.0.
  * Cygwin with MSVC 9.0 also complains here about redefinition.
  */
-#if (!defined(_MSC_VER) || (_MSC_VER > 1500)) && !defined(CYGWINMSVC)
+#if (!defined(_MSC_VER) || (_MSC_VER > 1600)) && !defined(CYGWINMSVC)
 const int32_t StringPiece::npos;
 #endif

14. Fix for locales that don't use '.' as decimal separator: patches/nan.patch
   - upstream bug: http://bugs.icu-project.org/trac/ticket/8561
   - Handle other chars besides the dot. This is required because decNumber's
     parser expects the dot as a decimal separator.
   - Locales that don't use the dot were producing "NaN" values.

15. Fix a bug in the regex engine.
   - patches/regex.patch
   - upstream bug: http://bugs.icu-project.org/trac/ticket/8666 (fixed upstream)

16. Apply the upstream patch for Korean search collator support (ICU 4.6.1).
   - patches/search_collation.patch
   - upstream bug: http://bugs.icu-project.org/trac/ticket/8290

17. Fix a use-of-uninitialized-memory bug in regular expression matching
   - patches/rematch.patch
   - upstream bug: http://bugs.icu-project.org/trac/ticket/8824

18. Make it compile with -Werror on gcc 4.6
   - patches/gcc46.patch (ToT upstream does not have this code any more).

19. Fix four out-of-bounds memory access errors in common/uloc.c
    and common/uresbund.c
   - patches/uloc.patch
   - upstream bugs:
     1. http://bugs.icu-project.org/trac/ticket/8984 (_canonicalize)
     2. http://bugs.icu-project.org/trac/ticket/9114 (_getKeywords)
     3. http://bugs.icu-project.org/trac/ticket/8812 (uresbund)
        http://bugs.icu-project.org/trac/ticket/8813 (uresbund)
     4. http://bugs.icu-project.org/trac/ticket/10250 (_getKeywords)

20. Fix a null pointer error in ubrk_setText in ubrk.cpp.
   - patches/ubrk.patch
   - upstream bug : http://bugs.icu-project.org/trac/ticket/9115

21. Fix a clang warning in rbbi.cpp by merging in an upstream change.
   - patches/changeset_30255.patch
   - upstream change : http://bugs.icu-project.org/trac/changeset/30255

22. Fix time zone handling and compilation on iOS.
   - patches/ios_timezone.patch
   - upstream bugs : http://bugs.icu-project.org/trac/ticket/9051
                     http://bugs.icu-project.org/trac/ticket/8661

23. Fix a buffer overflow in utext
   - patches/utext.patch
   - upstream change : http://bugs.icu-project.org/trac/changeset/29356

24. Fix compilation errors on VS2012 and above.
   - patches/vs2012.patch

25. Fix a buffer overflow in UTF-16/32 detection.
   - patches/csetdet.patch
   - upstream bug: http://bugs.icu-project.org/trac/ticket/10318

26. Add BreakIterator::getRuleStatus
   - patches/breakiterator.patch
   - Copy and paste BreakIterator::getRuleStatus API from ICU 52
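
Note on the sed filter in step 6: the following is a minimal sketch of what
that filter does, run on made-up sample data rather than on the real
source/data/region files. The file name sample.txt and its four entries are
hypothetical, and the entry layout (a code followed by '{' and a quoted name)
is an assumption made for illustration. The filter is run without -i here, so
it prints to stdout and leaves the input file untouched.

```shell
#!/bin/sh
# Hypothetical sample input; the real files live in source/data/region.
tmpdir=$(mktemp -d)
cat > "$tmpdir/sample.txt" <<'EOF'
        001{"World"}
        419{"Latin America"}
        AD{"Andorra"}
        US{"United States"}
EOF

# Same pattern as in step 6: delete every line that contains a three-digit
# code whose first digit is not 4, immediately followed by '{'.
filtered=$(sed '/[0-35-9][0-9][0-9]{/ d' "$tmpdir/sample.txt")

# Show what survives the filter.
printf '%s\n' "$filtered"

rm -rf "$tmpdir"
```

With this sample, the 001 entry is dropped while 419 and the two-letter
region codes are kept. The pattern leaves every three-digit code starting
with 4 alone, not only 419.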