*   Copyright (C) 2004-2011, International Business Machines
*   Corporation and others.  All Rights Reserved.
*
*   file name:  changes.txt
*   encoding:   US-ASCII
*   tab size:   8 (not used)
*   indentation:4
*
*   created on: 2004may06
*   created by: Markus W. Scherer
*
*   change log for Unicode updates

---------------------------------------------------------------------------- ***

Unicode 6.1 update

(TODO: Copy and adjust most of the 6.0 update instructions,
except retain this following section in this new form.
So far, this just documents the new procedure for building the property names data.)

* run genpname
  (builds both pnames.icu and propname_data.h)
- ~/svn.icu/tools/trunk/bld/unicode/c$ genpname/genpname -v -d ~/svn.icu/trunk/src/source/data/in
- ~/svn.icu/tools/trunk/bld/unicode/c$ genpname/genpname -v -d ~/svn.icu/trunk/src/source/common --csource
- rebuild ICU & tools

---------------------------------------------------------------------------- ***

ICU 4.8 (no Unicode update, just new script codes)

* 9 new script codes from ISO 15924 http://www.unicode.org/iso15924/codechanges.html
  (added 2010-12-21)
    Afak 439 Afaka
    Jurc 510 Jurchen
    Mroo 199 Mro, Mru
    Nshu 499 Nüshu
    Shrd 319 Sharada, Śāradā
    Sora 398 Sora Sompeng
    Takr 321 Takri, Ṭākrī, Ṭāṅkrī
    Tang 520 Tangut
    Wole 480 Woleai
  -> uscript.h
  -> com.ibm.icu.lang.UScript
       find     USCRIPT_([^ ]+) *= ([0-9]+),(.+)
       replace  public static final int \1 = \2;\3
  -> genpname/SyntheticPropertyValueAliases.txt
  -> add to expectedLong and expectedShort names in cintltst/cucdapi.c/TestUScriptCodeAPI()
     and in com.ibm.icu.dev.test.lang.TestUScript.java

* run genpname/preparse.pl (on Linux)
  + cd ~/svn.icu/tools/trunk/src/unicode/c/genpname
  + make sure that data.h is writable
  + perl preparse.pl ~/svn.icu/trunk/src > out.txt
  + preparse.pl shows no errors, out.txt Info and Warning lines look ok

* rebuild Unicode tools (at least genpname) using make
  - You might first need to "make install" ICU so that the tools build can pick
    up the new definitions from the installed header files.

* run genpname
  (builds both pnames.icu and propname_data.h)
- ~/svn.icu/tools/trunk/bld/unicode/c$ genpname/genpname -v -d ~/svn.icu/trunk/src/source/data/in
- ~/svn.icu/tools/trunk/bld/unicode/c$ genpname/genpname -v -d ~/svn.icu/trunk/src/source/common --csource
- rebuild ICU & tools

* run genprops
- ~/svn.icu/tools/trunk/bld/unicode/c$ genprops/genprops -d ~/svn.icu/trunk/src/source/data/in -s ~/svn.icu/trunk/src/source/data/unidata -i ~/svn.icu/trunk/dbg/data/out/build/icudt48l -u 6.0
- ~/svn.icu/tools/trunk/bld/unicode/c$ genprops/genprops -d ~/svn.icu/trunk/src/source/common --csource -s ~/svn.icu/trunk/src/source/data/unidata -i ~/svn.icu/trunk/dbg/data/out/build/icudt48l -u 6.0
- rebuild ICU & tools

* update Java data files
- refresh just the UCD-related files, just to be safe
- see (ICU4C)/source/data/icu4j-readme.txt
- mkdir /tmp/icu4j
- ~/svn.icu/trunk/dbg$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
- copy the big-endian Unicode data files to another location,
  separate from the other data files
    mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/icudt48b
    ~/svn.icu/trunk/dbg/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt48b/pnames.icu /tmp/icu4j/com/ibm/icu/impl/data/icudt48b
    ~/svn.icu/trunk/dbg/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt48b/uprops.icu /tmp/icu4j/com/ibm/icu/impl/data/icudt48b
- refresh ICU4J
    ~/svn.icu/trunk/dbg/data/out/icu4j$ jar uf ~/svn.icu4j/trunk/src/main/shared/data/icudata.jar -C /tmp/icu4j com/ibm/icu/impl/data/icudt48b

---------------------------------------------------------------------------- ***

Unicode 6.0 update

*** related ICU Trac tickets

7264 Unicode 6.0 Update

*** Unicode version numbers
- makedata.mak
- uchar.h
  (configure.in & configure: have been modified to extract the version from uchar.h)
- com.ibm.icu.util.VersionInfo

*** data files & enums & parser code

* file preparation

~/svn.icu/tools/trunk/src/unicode/c/genprops/misc$ ./ucdcopy.py ~/uni60/20100720/ucd ~/uni60/processed
- This now prepares both unidata and testdata files in respective output subfolders.

* PropertyAliases.txt changes
- new Script_Extensions property defined in the new ScriptExtensions.txt file
  but not listed in PropertyAliases.txt; reported to unicode.org;
  -> added to tools/trunk/src/unicode/c/genpname/SyntheticPropertyAliases.txt
       scx; Script_Extensions
  -> uchar.h with new UProperty section
  -> com.ibm.icu.lang.UProperty, parallel with uchar.h

* PropertyValueAliases.txt changes
- 12 new block names:
    Alchemical_Symbols
    Bamum_Supplement
    Batak
    Brahmi
    CJK_Unified_Ideographs_Extension_D
    Emoticons
    Ethiopic_Extended_A
    Kana_Supplement
    Mandaic
    Miscellaneous_Symbols_And_Pictographs
    Playing_Cards
    Transport_And_Map_Symbols
  -> add to uchar.h
  -> add to UCharacter.UnicodeBlock
       Eclipse find     UBLOCK_([^ ]+) = [0-9]+, (/.+)
               replace  public static final UnicodeBlock \1 = new UnicodeBlock("\1", \1_ID); \2
- Joining_Group (jg) values:
    Teh_Marbuta_Goal becomes the new canonical value for the old Hamza_On_Heh_Goal which becomes an alias
  -> uchar.h & UCharacter.JoiningGroup
- 3 new scripts:
    sc ; Batk ; Batak
    sc ; Brah ; Brahmi
    sc ; Mand ; Mandaic
  -> remove these from SyntheticPropertyValueAliases.txt
  -> add alias USCRIPT_MANDAIC to USCRIPT_MANDAEAN
  -> fix expectedLong names in cucdapi.c/TestUScriptCodeAPI()
     and in com.ibm.icu.dev.test.lang.TestUScript.java
- 13 new script codes from ISO 15924 http://www.unicode.org/iso15924/codechanges.html
  (added 2009-11-11..2010-07-18)
    Bass 259 Bassa Vah
    Dupl 755 Duployan shorthand
    Elba 226 Elbasan
    Gran 343 Grantha
    Kpel 436 Kpelle
    Loma 437 Loma
    Mend 438 Mende
    Merc 101 Meroitic Cursive
    Narb 106 Old North Arabian
    Nbat 159 Nabataean
    Palm 126 Palmyrene
    Sind 318 Sindhi
    Wara 262 Warang Citi
  -> uscript.h
  -> com.ibm.icu.lang.UScript
       find     USCRIPT_([^ ]+) *= ([0-9]+),(.+)
       replace  public static final int \1 = \2;\3
  -> SyntheticPropertyValueAliases.txt
  -> add to expectedLong and expectedShort names in cintltst/cucdapi.c/TestUScriptCodeAPI()
     and in com.ibm.icu.dev.test.lang.TestUScript.java
- ISO 15924 name change
    Mero 100 Meroitic Hieroglyphs (was Meroitic)
  -> add new alias USCRIPT_MEROITIC_HIEROGLYPHS to USCRIPT_MEROITIC
- property value alias added for Cham, was already moved out of SyntheticPropertyValueAliases.txt

* UnicodeData.txt changes
- new CJK block:
    2B740;<CJK Ideograph Extension D, First>;Lo;0;L;;;;;N;;;;;
    2B81D;<CJK Ideograph Extension D, Last>;Lo;0;L;;;;;N;;;;;
  -> add to tools/trunk/src/unicode/c/gennames/gennames.c, with new ucdVersion

* build Unicode tools using CMake+make

* run genpname/preparse.pl (on Linux)
  + cd ~/svn.icu/tools/trunk/src/unicode/c/genpname
  + make sure that data.h is writable
  + perl preparse.pl ~/svn.icu/trunk/src > out.txt
  + preparse.pl shows no errors, out.txt Info and Warning lines look ok

* rebuild Unicode tools (at least genpname) using make
  - You might first need to "make install" ICU so that the tools build can pick
    up the new definitions from the installed header files.
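
  Note: the uscript.h -> com.ibm.icu.lang.UScript find/replace listed under the
  new ISO 15924 script codes can also be applied outside an editor. The following
  is a minimal Python sketch of the same regular expression, not the procedure
  that was actually used. The function name and the sample enum line
  (USCRIPT_EXAMPLE_SCRIPT, 999, Xmpl) are hypothetical and are not real ICU constants.

```python
import re

# Same pattern and replacement as the editor find/replace for uscript.h.
FIND = re.compile(r"USCRIPT_([^ ]+) *= ([0-9]+),(.+)")
REPLACE = r"public static final int \1 = \2;\3"

def uscript_enum_to_java(line):
    """Convert one uscript.h enum line into a UScript.java constant line.

    Lines that do not match the pattern are returned unchanged.
    """
    return FIND.sub(REPLACE, line)

if __name__ == "__main__":
    # Hypothetical sample input, shaped like a uscript.h enum entry.
    sample = "    USCRIPT_EXAMPLE_SCRIPT        = 999,/* Xmpl */"
    print(uscript_enum_to_java(sample))
```

  The pattern needs at least one character after the comma, so an enum line
  without a trailing comment is left unchanged and must be converted by hand.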

* run genpname
- ~/svn.icu/tools/trunk/bld/unicode$ c/genpname/genpname -v -d ~/svn.icu/trunk/src/source/data/in
- rebuild ICU & tools

* update source/data/unidata/norm2/nfkc_cf.txt
- follow the instructions in nfkc_cf.txt for updating it from DerivedNormalizationProps.txt

* update source/data/unidata/norm2/uts46.txt
- download http://www.unicode.org/Public/idna/6.0.0/IdnaMappingTable.txt
  to ~/svn.icu/tools/trunk/src/unicode/py
- adjust idna2nrm.py to handle new disallowed_STD3_valid and disallowed_STD3_mapped values
- ~/svn.icu/tools/trunk/src/unicode/py$ ./idna2nrm.py
- ~/svn.icu/tools/trunk/src/unicode/py$ cp uts46.txt ~/svn.icu/trunk/src/source/data/unidata/norm2

* update uts46test.cpp and UTS46Test.java if there are new characters that are equivalent to
  sequences with non-LDH ASCII (that is, their decompositions contain '=' or similar)
- grep IdnaMappingTable.txt or uts46.txt for "disallowed_STD3_valid" on non-ASCII characters
- Unicode 6.0: U+2260, U+226E, U+226F

* generate core properties data files
- ~/svn.icu/tools/trunk/src/unicode$ ./makeprops.sh ~/svn.icu/trunk/src ~/svn.icu/trunk/bld
- rebuild ICU & tools
- run makeuca.sh so that genuca picks up the new nfc.nrm:
  ~/svn.icu/tools/trunk/src/unicode$ ./makeuca.sh ~/svn.icu/trunk/src ~/svn.icu/trunk/bld
- rebuild ICU & tools

* implement new Script_Extensions property (provisional)
- parser & generator: genprops & uprops.icu
- uscript.h, uprops.h, uchar.c, uniset_props.cpp and others, plus cintltst/cucdapi.c & intltest/usettest.cpp
- UScript.java, UCharacterProperty.java, UnicodeSet.java, TestUScript.java, UnicodeSetTest.java

* switch ubidi.icu, ucase.icu and uprops.icu from UTrie to UTrie2
- (one-time change)
- genbidi/gencase/genprops tools changes
- re-run makeprops.sh (see above)
- UCharacterProperty.java, UCharacterTypeIterator.java,
  UBiDiProps.java, UCaseProps.java, and several others with minor changes;
  UCharacterPropertyReader.java deleted and its code folded into UCharacterProperty.java

* update Java data files
- refresh just the UCD-related files, just to be safe
- see (ICU4C)/source/data/icu4j-readme.txt
- mkdir /tmp/icu4j
- ~/svn.icu/trunk/bld$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
    output:
      ...
      Unicode .icu files built to ./out/build/icudt45l
      mkdir -p ./out/icu4j/com/ibm/icu/impl/data/icudt45b
      echo ubidi.icu ucase.icu uprops.icu > ./out/icu4j/add.txt
      LD_LIBRARY_PATH=../lib:../stubdata:../tools/ctestfw:$LD_LIBRARY_PATH ../bin/icupkg ./out/tmp/icudt45l.dat ./out/icu4j/icudt45b.dat -a ./out/icu4j/add.txt -s ./out/build/icudt45l -x '*' -tb -d ./out/icu4j/com/ibm/icu/impl/data/icudt45b
      jar cf ./out/icu4j/icudata.jar -C ./out/icu4j com/ibm/icu/impl/data/icudt45b
      mkdir -p /tmp/icu4j/main/shared/data
      cp ./out/icu4j/icudata.jar /tmp/icu4j/main/shared/data
- copy the big-endian Unicode data files to another location,
  separate from the other data files
    mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/icudt45b/coll
    mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/icudt45b/brkitr
    ~/svn.icu/trunk/bld/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt45b/*.icu /tmp/icu4j/com/ibm/icu/impl/data/icudt45b
    ~/svn.icu/trunk/bld/data/out/icu4j$ rm /tmp/icu4j/com/ibm/icu/impl/data/icudt45b/cnvalias.icu
    ~/svn.icu/trunk/bld/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt45b/*.nrm /tmp/icu4j/com/ibm/icu/impl/data/icudt45b
    ~/svn.icu/trunk/bld/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt45b/coll/*.icu /tmp/icu4j/com/ibm/icu/impl/data/icudt45b/coll
    ~/svn.icu/trunk/bld/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt45b/brkitr/* /tmp/icu4j/com/ibm/icu/impl/data/icudt45b/brkitr
- refresh ICU4J
    ~/svn.icu/trunk/bld/data/out/icu4j$ jar uf ~/svn.icu4j/trunk/src/main/shared/data/icudata.jar -C /tmp/icu4j com/ibm/icu/impl/data/icudt45b

* refresh Java test .txt files
- copy new .txt files into ICU4J's main/tests/core/src/com/ibm/icu/dev/data/unicode

* un-hardcode normalization skippable (NF*_Inert) test data
- removes one manual step from the Unicode upgrade, and removes dependency on one of Mark's tools

* copy updated break iterator test files
- now handled by early ucdcopy.py and
  copying the uni60/processed/testdata files to ~/svn.icu/trunk/src/source/test/testdata
  (old instructions:
   copy from (Unicode 6.0)/ucd/auxiliary/*BreakTest-6....txt
   to ~/svn.icu/trunk/src/source/test/testdata)
- they are not used in ICU4J

* UCA

- get output from Mark's tools; look in
    http://www.unicode.org/~book/incoming/mark/uca6.0.0/
    http://www.macchiato.com/unicode/utc/additional-uca-files
    http://www.unicode.org/Public/UCA/6.0.0/
    http://www.unicode.org/~mdavis/uca/
- update source/data/unidata/FractionalUCA.txt with FractionalUCA_SHORT.txt
- update source/data/unidata/UCARules.txt with UCA_Rules_SHORT.txt
- update Han-implicit ranges for new CJK extensions:
  swapCJK() in ucol.cpp & ImplicitCEGenerator.java
- genuca: allow bytes 02 for U+FFFE, new merge-sort character;
  do not add it into invuca so that tailoring primary-after an ignorable works
- genuca: permit space between [variable top] bytes
- ucol.cpp: treat noncharacters like unassigned rather than ignorable
- run makeuca.sh:
  ~/svn.icu/tools/trunk/src/unicode$ ./makeuca.sh ~/svn.icu/trunk/src ~/svn.icu/trunk/bld
- rebuild ICU4C
- refresh ICU4J collation data:
  (subset of instructions above for properties data refresh, except copies all coll/*)
    ~/svn.icu/trunk/bld$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
    mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/icudt45b/coll
    ~/svn.icu/trunk/bld/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt45b/coll/* /tmp/icu4j/com/ibm/icu/impl/data/icudt45b/coll
    ~/svn.icu/trunk/bld/data/out/icu4j$ jar uf ~/svn.icu4j/trunk/src/main/shared/data/icudata.jar -C /tmp/icu4j com/ibm/icu/impl/data/icudt45b
- update (ICU)/source/test/testdata/CollationTest_*.txt
  and (ICU4J)/main/tests/collate/src/com/ibm/icu/dev/data/CollationTest_*.txt
  with output from Mark's Unicode tools
- run all tests with the *_SHORT.txt or the full files (the full ones have comments)
- note on intltest: if collate/UCAConformanceTest fails, then
  utility/MultithreadTest/TestCollators will fail as well;
  fix the conformance test before looking into the multi-thread test

* When refreshing all of ICU4J data from ICU4C
- ~/svn.icu/trunk/bld$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
- cp /tmp/icu4j/main/shared/data/icudata.jar ~/svn.icu4j/trunk/src/main/shared/data
  or
- ~/svn.icu/trunk/bld$ make ICU4J_ROOT=~/svn.icu4j/trunk/src icu4j-data-install

*** LayoutEngine script information

(For details see the Unicode 5.2 change log below.)

* Run ICU4J com.ibm.icu.dev.tool.layout.ScriptNameBuilder. This generates LEScripts.h, LELanguages.h,
  ScriptAndLanguageTags.h and ScriptAndLanguageTags.cpp in the working directory. (It also generates
  ScriptRunData.cpp, which is no longer needed.)

  The generated files have a current copyright date and "@draft" statement.

* copy the above files into <icu>/source/layout, replacing the old files.
* fix mixed line endings
* review the diffs and fix incorrect @draft and missing aliases;
  Unicode-derived script codes should be "born stable" like constants in uchar.h, uscript.h etc.
* manually re-add the "Indic script xyz v.2" tags in ScriptAndLanguageTags.h

---------------------------------------------------------------------------- ***

Unicode 5.2 update

*** related ICU Trac tickets

7084 Unicode 5.2

7167 verify collation bytes
7235 Java test NAME_ALIAS
7236 Java DerivedCoreProperties.txt test
7237 Java BidiTest.txt
7238 UTrie2 in core unidata
7239 test for tailoring gaps
7240 Java fix CollationMiscTest
7243 update layout engine for Unicode 5.2

*** Unicode version numbers
- makedata.mak
- uchar.h
- configure.in & configure
- update ucdVersion in gennames.c if an algorithmic range changes

*** data files & enums & parser code

* file preparation

python source\tools\genprops\misc\ucdcopy.py "C:\Documents and Settings\mscherer\My Documents\unicode\ucd\5.2.0" C:\svn\icuproj\icu\trunk\source\data\unidata
- includes finding files regardless of version numbers,
  copying them, and performing the equivalent processing of the
  ucdstrip and ucdmerge tools on the desired set of files

* notes on changes
- PropertyAliases.txt
    moved from numeric to enumerated:
      ccc ; Canonical_Combining_Class
    new string properties:
      NFKC_CF ; NFKC_Casefold
      Name_Alias; Name_Alias
    new binary properties:
      Cased ; Cased
      CI ; Case_Ignorable
      CWCF ; Changes_When_Casefolded
      CWCM ; Changes_When_Casemapped
      CWKCF ; Changes_When_NFKC_Casefolded
      CWL ; Changes_When_Lowercased
      CWT ; Changes_When_Titlecased
      CWU ; Changes_When_Uppercased
    new CJK Unihan properties (not supported by ICU)
- PropertyValueAliases.txt
    new block names
    new scripts
    one script code change:
      sc ; Qaai ; Inherited
      ->
      sc ; Zinh ; Inherited ; Qaai
    new Line_Break (lb) value:
      lb ; CP ; Close_Parenthesis
    new Joining_Group (jg) values: Farsi_Yeh, Nya
    other new values:
      ccc; 214; ATA ; Attached_Above
- DerivedBidiClass.txt
    new default-R range: U+1E800 - U+1EFFF
- UnicodeData.txt
    all of the ISO comments are gone
    new CJK block end:
      9FC3;<CJK Ideograph, Last> -> 9FCB;<CJK Ideograph, Last>
    new CJK block:
      2A700;<CJK Ideograph Extension C, First>;Lo;0;L;;;;;N;;;;;
      2B734;<CJK Ideograph Extension C, Last>;Lo;0;L;;;;;N;;;;;

* genpname
- run preparse.pl
  + cd \svn\icuproj\icu\trunk\source\tools\genpname
  + make sure that data.h is writable
  + perl preparse.pl \svn\icuproj\icu\trunk > out.txt
  + preparse.pl complains with errors like the following:
      Error: sc:Egyp already set to Egyptian_Hieroglyphs, cannot set to Egyp at preparse.pl line 1322, <GEN6> line 34.
    This is because ICU 4.0 had scripts from ISO 15924 which are now
    added to Unicode 5.2, and the Perl script shows a conflict between SyntheticPropertyValueAliases.txt
    and PropertyValueAliases.txt.
    -> Removed duplicate script entries from SyntheticPropertyValueAliases.txt:
       Egyp, Java, Lana, Mtei, Orkh, Armi, Avst, Kthi, Phli, Prti, Samr, Tavt
  + preparse.pl complains with errors about block names missing from uchar.h; add them

* uchar.h & uscript.h & uprops.h & uprops.c & genprops
- new block & script values
  + 26 new blocks
    copy new blocks from Blocks.txt
    MS VC++ 2008 regular expression:
      find "^{[0-9A-F]+}\.\.{[0-9A-F]+}; {[A-Z].+}$"
      replace with " UBLOCK_\3 = 172, /*[\1]*/"
  + several new script values already added in ICU 4.0 for ISO 15924 coverage
    (removed from SyntheticPropertyValueAliases.txt, see genpname notes above)
  + 3 new script values added for ISO 15924 and Unicode 5.2 coverage
  + 1 new script value added for ISO 15924 coverage (not in Unicode 5.2)
    (added to SyntheticPropertyValueAliases.txt)
- new Joining Group (JG) values: Farsi_Yeh, Nya
- new Line_Break (lb) value:
    lb ; CP ; Close_Parenthesis

* hardcoded Unihan range end/limit
- Unihan range end moves from 9FC3 to 9FCB
  search for both 9FC3 (end) and 9FC4 (limit) (regex 9FC[34], case-insensitive)
  + do change gennames.c

* Compare definitions of new binary properties with what we used to use
  in algorithms, to see if the definitions changed.
- Verified that definitions for Cased and Case_Ignorable are unchanged.
  The gencase tool now parses the newly public Case_Ignorable values
  in case the definition changes in the future.

* uchar.c & uprops.h & uprops.c & genprops
- new numeric values that didn't exist in Unicode data before:
  1/7, 1/9, 1/10, 3/10, 1/16, 3/16
  the ones with denominators >9 cannot be supported by uprops.icu formatVersion 5,
  therefore redesign the encoding of numeric types and values for formatVersion 6;
  design for simple numbers up to at least 144 ("one gross"),
  large values up to at least 10^20,
  and fractions with numerators -1..17 and denominators 1..16
  to cover current and expected future values
  (e.g., more Han numeric values, Meroitic twelfths)

* reimplement Hangul_Syllable_Type for new Jamo characters
- the old code assumed that all Jamo characters are in the 11xx block
- Unicode 5.2 fills holes there and adds new Jamo characters in
    A960..A97F; Hangul Jamo Extended-A
  and in
    D7B0..D7FF; Hangul Jamo Extended-B
- Hangul_Syllable_Type can be trivially derived from a subset of
  Grapheme_Cluster_Break values

* build Unicode data source code for hardcoding core data
    C:\svn\icuproj\icu\trunk\source\data>NMAKE /f makedata.mak ICUMAKE=\svn\icuproj\icu\trunk\source\data\ CFG=x86\release uni-core-data

    ICU data make path is \svn\icuproj\icu\trunk\source\data\
    ICU root path is \svn\icuproj\icu\trunk
    Information: cannot find "ucmlocal.mk". Not building user-additional converter files.
    Information: cannot find "brklocal.mk". Not building user-additional break iterator files.
    Information: cannot find "reslocal.mk". Not building user-additional resource bundle files.
    Information: cannot find "collocal.mk". Not building user-additional resource bundle files.
    Information: cannot find "rbnflocal.mk". Not building user-additional resource bundle files.
    Information: cannot find "trnslocal.mk". Not building user-additional transliterator files.
    Information: cannot find "misclocal.mk". Not building user-additional miscellaenous files.
    Information: cannot find "spreplocal.mk". Not building user-additional stringprep files.
    Creating data file for Unicode Property Names
    Creating data file for Unicode Character Properties
    Creating data file for Unicode Case Mapping Properties
    Creating data file for Unicode BiDi/Shaping Properties
    Creating data file for Unicode Normalization
    Unicode .icu files built to "\svn\icuproj\icu\trunk\source\data\out\build\icudt43l"
    Unicode .c source files built to "\svn\icuproj\icu\trunk\source\data\out\tmp"

- copy the .c source files to C:\svn\icuproj\icu\trunk\source\common
  and rebuild the common library

*** UCA

- update FractionalUCA.txt with new canonical closure (output from Mark's Unicode tools)
- update source/data/unidata/UCARules.txt with UCA_Rules_SHORT.txt from Mark's Unicode tools
- update source/test/testdata/CollationTest_*.txt with output from Mark's Unicode tools
[ Begin obsolete instructions:
  Starting with UCA 5.2, we use the CollationTest_*_SHORT.txt files not the *_STUB.txt files.
- generate the source/test/testdata/CollationTest_*_STUB.txt files via source/tools/genuca/genteststub.py
  on Windows:
    python C:\svn\icuproj\icu\trunk\source\tools\genuca\genteststub.py CollationTest_NON_IGNORABLE_SHORT.txt CollationTest_NON_IGNORABLE_STUB.txt
    python C:\svn\icuproj\icu\trunk\source\tools\genuca\genteststub.py CollationTest_SHIFTED_SHORT.txt CollationTest_SHIFTED_STUB.txt
  End obsolete instructions]
- run all tests with the *_SHORT.txt or the full files (the full ones have comments)
  not just the *_STUB.txt files
- note on intltest: if collate/UCAConformanceTest fails, then
  utility/MultithreadTest/TestCollators will fail as well;
  fix the conformance test before looking into the multi-thread test

*** Implement Cased & Case_Ignorable properties
- via UProperty; call ucase.h functions ucase_getType() and ucase_getTypeOrIgnorable()
- Problem: These properties should be disjoint, but aren't
- UTC 2009nov decision: skip all Case_Ignorable regardless of whether they are Cased or not
- change ucase.icu to be able to store any combination of Cased and Case_Ignorable

*** Implement Changes_When_Xyz properties
- without stored data

*** Implement Name_Alias property
- add it as another name field in unames.icu
- make it available via u_charName() and UCharNameChoice and
- consider it in u_charFromName()

*** Break iterators

* Update break iterator rules to new UAX versions and new property values
* Update source/test/testdata/<boundary>Test.txt files from <unicode.org ucd>/ucd/auxiliary

*** new BidiTest file
- review format and data
- copy BidiTest.txt to source/test/testdata
- write test code using this data
- fix ICU code where it fails the conformance test

*** Java
- generally, find and update code corresponding to C/C++
- UCharacter.UnicodeBlock constants:
  a) add an _ID integer per new block, update COUNT
  b) add a class instance per new block
     Visual Studio regex:
       find UBLOCK_{[^ ]+} = [0-9]+, {/.+}
       replace with public static final UnicodeBlock \1 = new UnicodeBlock("\1", \1_ID); \2
- CHAR_NAME_ALIAS -> UCharacter.getNameAlias() and getCharFromNameAlias()

- port test changes to Java

*** LayoutEngine script information

(For comparison, see the Unicode 5.1 update: http://bugs.icu-project.org/trac/changeset/23833)

* Run ICU4J com.ibm.icu.dev.tool.layout.ScriptNameBuilder. This generates LEScripts.h, LELanguages.h,
  ScriptAndLanguageTags.h and ScriptAndLanguageTags.cpp in the working directory. (It also generates
  ScriptRunData.cpp, which is no longer needed.)

  The generated files have a current copyright date and "@draft" statement.

  -> Eric Mader wrote in email on 20090930:
     "I think the tool has been modified to update @draft to @stable for
     older scripts and to add @draft for new scripts.
     (I worked with an intern on this last year.)
     You should check the output after you run it."

* copy the above files into <icu>/source/layout, replacing the old files.
* fix mixed line endings
* review the diffs and fix incorrect @draft and missing aliases
* manually re-add the "Indic script xyz v.2" tags in ScriptAndLanguageTags.h

  Add new default entries to the indicClassTables array in <icu>/source/layout/IndicClassTables.cpp
  and the complexTable array in <icu>/source/layoutex/ParagraphLayout.cpp. (This step should be automated...)

  -> Eric Mader wrote in email on 20090930:
     "This is just a matter of making sure that all the per-script tables have
     entries for any new scripts that were added.
     If any new Indic characters were added, then the class tables in
     IndicClassTables.cpp should be updated to reflect this.
     John Emmons should know how to do this if it's required."

* rebuild the layout and layoutex libraries.
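
Note on the numeric-value redesign listed earlier in this Unicode 5.2 update
(under "uchar.c & uprops.h & uprops.c & genprops"): the stated ranges can be
checked with a few lines of Python. This is a minimal sketch that assumes only
the limits quoted there (numerators -1..17, denominators 1..16, and
denominators >9 being unsupported by formatVersion 5). The function names are
hypothetical, and the sketch does not model the actual uprops.icu bit encoding.

```python
from fractions import Fraction

# New numeric values listed in the Unicode 5.2 notes.
NEW_VALUES = [Fraction(1, 7), Fraction(1, 9), Fraction(1, 10),
              Fraction(3, 10), Fraction(1, 16), Fraction(3, 16)]

def fits_v6_design(value):
    """True if the fraction lies inside the stated formatVersion 6 design range."""
    return -1 <= value.numerator <= 17 and 1 <= value.denominator <= 16

def needs_new_format(value):
    """True if the denominator is >9, which the notes say formatVersion 5 cannot store."""
    return value.denominator > 9

if __name__ == "__main__":
    for v in NEW_VALUES:
        print(v, "fits v6 design:", fits_v6_design(v),
              "needs new format:", needs_new_format(v))
```

Under these assumptions all six values fit the new design range, and the four
with denominators 10 and 16 are the ones that forced the format change.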

*** Documentation
- Update User Guide
  + Jamo_Short_Name, sfc->scf, binary property value aliases

---------------------------------------------------------------------------- ***

Unicode 5.1 update

*** related ICU Trac tickets

5696 Update to Unicode 5.1

*** Unicode version numbers
- makedata.mak
- uchar.h
- configure.in & configure
- update ucdVersion in gennames.c if an algorithmic range changes

*** data files & enums & parser code

* file preparation
- ucdstrip:
    DerivedCoreProperties.txt
    DerivedNormalizationProps.txt
    NormalizationTest.txt
    PropList.txt
    Scripts.txt
    GraphemeBreakProperty.txt
    SentenceBreakProperty.txt
    WordBreakProperty.txt
- ucdstrip and ucdmerge:
    EastAsianWidth.txt
    LineBreak.txt

* my ucd2unidata.bat (needs to be updated each time with UCD and file version numbers)
copy 5.1.0\ucd\BidiMirroring.txt ..\unidata\
copy 5.1.0\ucd\Blocks.txt ..\unidata\
copy 5.1.0\ucd\CaseFolding.txt ..\unidata\
copy 5.1.0\ucd\DerivedAge.txt ..\unidata\
copy 5.1.0\ucd\extracted\DerivedBidiClass.txt ..\unidata\
copy 5.1.0\ucd\extracted\DerivedJoiningGroup.txt ..\unidata\
copy 5.1.0\ucd\extracted\DerivedJoiningType.txt ..\unidata\
copy 5.1.0\ucd\extracted\DerivedNumericValues.txt ..\unidata\
copy 5.1.0\ucd\NormalizationCorrections.txt ..\unidata\
copy 5.1.0\ucd\PropertyAliases.txt ..\unidata\
copy 5.1.0\ucd\PropertyValueAliases.txt ..\unidata\
copy 5.1.0\ucd\SpecialCasing.txt ..\unidata\
copy 5.1.0\ucd\UnicodeData.txt ..\unidata\

ucdstrip < 5.1.0\ucd\DerivedCoreProperties.txt > ..\unidata\DerivedCoreProperties.txt
ucdstrip < 5.1.0\ucd\DerivedNormalizationProps.txt > ..\unidata\DerivedNormalizationProps.txt
ucdstrip < 5.1.0\ucd\NormalizationTest.txt > ..\unidata\NormalizationTest.txt
ucdstrip < 5.1.0\ucd\PropList.txt > ..\unidata\PropList.txt
ucdstrip < 5.1.0\ucd\Scripts.txt > ..\unidata\Scripts.txt
ucdstrip < 5.1.0\ucd\auxiliary\GraphemeBreakProperty.txt > ..\unidata\GraphemeBreakProperty.txt
ucdstrip < 5.1.0\ucd\auxiliary\SentenceBreakProperty.txt > ..\unidata\SentenceBreakProperty.txt
ucdstrip < 5.1.0\ucd\auxiliary\WordBreakProperty.txt > ..\unidata\WordBreakProperty.txt
ucdstrip < 5.1.0\ucd\EastAsianWidth.txt | ucdmerge > ..\unidata\EastAsianWidth.txt
ucdstrip < 5.1.0\ucd\LineBreak.txt | ucdmerge > ..\unidata\LineBreak.txt

* genpname
- run preparse.pl
  + cd \svn\icuproj\icu\uni51\source\tools\genpname
  + make sure that data.h is writable
  + perl preparse.pl \svn\icuproj\icu\uni51 > out.txt
  + preparse.pl complains with errors like the following:
      Error: sc:Cari already set to Carian, cannot set to Cari at preparse.pl line 1308, <GEN6> line 30.
    This is because ICU 3.8 had scripts from ISO 15924 which are now
    added to Unicode 5.1, and the script shows a conflict between SyntheticPropertyValueAliases.txt
    and PropertyValueAliases.txt.
    -> Removed duplicate script entries from SyntheticPropertyValueAliases.txt:
       Cari, Cham, Kali, Lepc, Lyci, Lydi, Olck, Rjng, Saur, Sund, Vaii
  + PropertyValueAliases.txt now explicitly contains values for boolean properties:
    N/Y, No/Yes, F/T, False/True
    -> Added N/No and Y/Yes to preparse.pl function read_PropertyValueAliases.
       It will use further values from the file if present.

* uchar.h & uscript.h & uprops.h & uprops.c & genprops
- new block & script values
  + 17 new blocks
  + 11 new script values already added in ICU 3.8 for ISO 15924 coverage
    (removed from SyntheticPropertyValueAliases.txt)
  + 14 new script values added for ISO 15924 coverage (not in Unicode 5.1)
    (added to SyntheticPropertyValueAliases.txt)
- uprops.icu (uprops.h) only provides 7 bits for script codes.
  In ICU 4.0 there are USCRIPT_CODE_LIMIT=130 script codes now.
  There is none above 127 yet which is the script code for an
  assigned Unicode character, so ICU 4.0 uprops.icu does not store any
  script code values greater than 127.
  However, it does need to store the maximum script value=USCRIPT_CODE_LIMIT-1=129
  in a parallel bit field, and that overflows now.
  Also, future values >=128 would be incompatible anyway.
  uprops.h is modified to move around several of the bit fields
  in the properties vector words, and now uses 8 bits for the script code.
  Two other bit fields also grow to accommodate future growth:
  Block (current count: 172) grows from 8 to 9 bits,
  and Word_Break grows from 4 to 5 bits.
- renamed property Simple_Case_Folding (sfc->scf)
  + nothing to be done: handled as normal alias
- new property JSN Jamo_Short_Name
  + no new API: only contributes to the Name property
- new Grapheme_Cluster_Break (GCB) value: SM=SpacingMark
- new Joining Group (JG) value: Burushashki_Yeh_Barree
- new Sentence_Break (SB) values:
    SB ; CR ; CR
    SB ; EX ; Extend
    SB ; LF ; LF
    SB ; SC ; SContinue
- new Word_Break (WB) values:
    WB ; CR ; CR
    WB ; Extend ; Extend
    WB ; LF ; LF
    WB ; MB ; MidNumLet

* Further changes in the 2008-02-29 update:
- Default_Ignorable_Code_Point: The new file removes Cc, Cs, noncharacters from DICP
  because they should not normally be invisible.
- new Joining Group (JG) value Burushashki_Yeh_Barree was renamed to Burushaski_Yeh_Barree (one 'h' removed)
- new Grapheme_Cluster_Break (GCB) value: PP=Prepend
- new Word_Break (WB) value: NL=Newline

* hardcoded Unihan range end/limit (see Unicode 4.1 update for comparison)
- Unihan range end moves from 9FBB to 9FC3
  search for both 9FBB (end) and 9FBC (limit) (regex 9FB[BC], case-insensitive)
    + do change gennames.c

* build Unicode data source code for hardcoding core data
C:\svn\icuproj\icu\uni51\source\data>NMAKE /f makedata.mak ICUMAKE=\svn\icuproj\icu\uni51\source\data\ CFG=debug uni-core-data

ICU data make path is \svn\icuproj\icu\uni51\source\data\
ICU root path is \svn\icuproj\icu\uni51
Information: cannot find "ucmlocal.mk". Not building user-additional converter files.
Information: cannot find "brklocal.mk". Not building user-additional break iterator files.
Information: cannot find "reslocal.mk". Not building user-additional resource bundle files.
Information: cannot find "collocal.mk". Not building user-additional resource bundle files.
Information: cannot find "rbnflocal.mk". Not building user-additional resource bundle files.
Information: cannot find "trnslocal.mk". Not building user-additional transliterator files.
Information: cannot find "misclocal.mk". Not building user-additional miscellaenous files.
Creating data file for Unicode Character Properties
Creating data file for Unicode Case Mapping Properties
Creating data file for Unicode BiDi/Shaping Properties
Creating data file for Unicode Normalization
Unicode .icu files built to "\svn\icuproj\icu\uni51\source\data\out\build\icudt39l"
Unicode .c source files built to "\svn\icuproj\icu\uni51\source\data\out\tmp"

- copy the .c source files to C:\svn\icuproj\icu\uni51\source\common
  and rebuild the common library

*** Break iterators

* Update break iterator rules to new UAX versions and new property values

*** UCA

* update FractionalUCA.txt and UCARules.txt with new canonical closure

*** Test suites
- Test that APIs using Unicode property value aliases (like UnicodeSet)
  support all of the boolean values N/Y, No/Yes, F/T, False/True
  -> TestBinaryValues() tests in both cintltst and intltest

*** LayoutEngine script information
* Run ICU4J com.ibm.icu.dev.tool.layout.ScriptNameBuilder. This generates LEScripts.h, LELanguage.h,
  ScriptAndLanguageTags.h and ScriptAndLanguageTags.cpp in the working directory. (it also generates
  ScriptRunData.cpp, which is no longer needed.)

  The generated files have a current copyright date and "@draft" statement.

* copy the above files into <icu>/source/layout, replacing the old files.

  Add new default entries to the indicClassTables array in <icu>/source/layout/IndicClassTables.cpp
  and the complexTable array in <icu>/source/layoutex/ParagraphLayout.cpp. (This step should be automated...)

* rebuild the layout and layoutex libraries.
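
Illustration for the uprops.h bit-field note earlier in this Unicode 5.1 section:
a minimal sketch of why the maximum script value 129 overflows a 7-bit field
but fits in 8 bits. The helper name fits_in_bits is made up for this note and
is not ICU code; the numbers are the ones quoted in that note.

```python
# Hypothetical helper, not part of ICU: checks unsigned bit-field capacity.
def fits_in_bits(value: int, bits: int) -> bool:
    """Return True if a non-negative value fits into an unsigned field of `bits` bits."""
    return 0 <= value < (1 << bits)


USCRIPT_CODE_LIMIT = 130             # ICU 4.0 value quoted in the note
max_script = USCRIPT_CODE_LIMIT - 1  # 129, stored in a parallel bit field

# The old layout had 7 bits for the script code, the new one has 8.
for bits in (7, 8):
    print(bits, "bits hold", max_script, ":", fits_in_bits(max_script, bits))

# The Block count of 172 still fits in 8 bits; per the note, the field is
# widened to 9 bits only to leave room for future growth.
print("blocks:", fits_in_bits(172, 8), fits_in_bits(172, 9))
```

A 7-bit field holds values up to 127 only, which is why storing 129 forced
the layout change even though no assigned character uses such a script code yet.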

*** Documentation
- Update User Guide
    + Jamo_Short_Name, sfc->scf, binary property value aliases

---------------------------------------------------------------------------- ***

Unicode 5.0 update

*** related Jitterbugs

5084    RFE: Update to Unicode 5.0

*** data files & enums & parser code

* file preparation
- ucdstrip:
    DerivedCoreProperties.txt
    DerivedNormalizationProps.txt
    NormalizationTest.txt
    PropList.txt
    Scripts.txt
    GraphemeBreakProperty.txt
    SentenceBreakProperty.txt
    WordBreakProperty.txt
- ucdstrip and ucdmerge:
    EastAsianWidth.txt
    LineBreak.txt

* my ucd2unidata.bat (needs to be updated each time with UCD and file version numbers)
copy 5.0.0\ucd\BidiMirroring.txt ..\unidata\
copy 5.0.0\ucd\Blocks.txt ..\unidata\
copy 5.0.0\ucd\CaseFolding.txt ..\unidata\
copy 5.0.0\ucd\DerivedAge.txt ..\unidata\
copy 5.0.0\ucd\extracted\DerivedBidiClass.txt ..\unidata\
copy 5.0.0\ucd\extracted\DerivedJoiningGroup.txt ..\unidata\
copy 5.0.0\ucd\extracted\DerivedJoiningType.txt ..\unidata\
copy 5.0.0\ucd\extracted\DerivedNumericValues.txt ..\unidata\
copy 5.0.0\ucd\NormalizationCorrections.txt ..\unidata\
copy 5.0.0\ucd\PropertyAliases.txt ..\unidata\
copy 5.0.0\ucd\PropertyValueAliases.txt ..\unidata\
copy 5.0.0\ucd\SpecialCasing.txt ..\unidata\
copy 5.0.0\ucd\UnicodeData.txt ..\unidata\

ucdstrip < 5.0.0\ucd\DerivedCoreProperties.txt > ..\unidata\DerivedCoreProperties.txt
ucdstrip < 5.0.0\ucd\DerivedNormalizationProps.txt > ..\unidata\DerivedNormalizationProps.txt
ucdstrip < 5.0.0\ucd\NormalizationTest.txt > ..\unidata\NormalizationTest.txt
ucdstrip < 5.0.0\ucd\PropList.txt > ..\unidata\PropList.txt
ucdstrip < 5.0.0\ucd\Scripts.txt > ..\unidata\Scripts.txt
ucdstrip < 5.0.0\ucd\auxiliary\GraphemeBreakProperty.txt > ..\unidata\GraphemeBreakProperty.txt
ucdstrip <
5.0.0\ucd\auxiliary\SentenceBreakProperty.txt > ..\unidata\SentenceBreakProperty.txt
ucdstrip < 5.0.0\ucd\auxiliary\WordBreakProperty.txt > ..\unidata\WordBreakProperty.txt
ucdstrip < 5.0.0\ucd\EastAsianWidth.txt | ucdmerge > ..\unidata\EastAsianWidth.txt
ucdstrip < 5.0.0\ucd\LineBreak.txt | ucdmerge > ..\unidata\LineBreak.txt

* update FractionalUCA.txt and UCARules.txt with new canonical closure

* genpname
- run preparse.pl
    + make sure that data.h is writable
    + perl preparse.pl \cvs\oss\icu > out.txt

* uchar.h & uscript.h & uprops.h & uprops.c & genprops
- new block & script values
    + script values already added in ICU 3.6 because all of ISO 15924 is now covered

* build Unicode data source code for hardcoding core data
C:\cvs\oss\icu\source\data>NMAKE /f makedata.mak ICUMAKE=\cvs\oss\icu\source\data\ CFG=debug uni-core-data

ICU data make path is \cvs\oss\icu\source\data\
ICU root path is \cvs\oss\icu
Information: cannot find "ucmlocal.mk". Not building user-additional converter files.
[etc.]
Creating data file for Unicode Character Properties
Creating data file for Unicode Case Mapping Properties
Creating data file for Unicode BiDi/Shaping Properties
Creating data file for Unicode Normalization
Unicode .icu files built to "\cvs\oss\icu\source\data\out\build\icudt35l"
Unicode .c source files built to "\cvs\oss\icu\source\data\out\tmp"

- copy the .c source files to C:\cvs\oss\icu\source\common
  and rebuild the common library

*** Unicode version numbers
- makedata.mak
- uchar.h
- configure.in

*** LayoutEngine script information
* Run ICU4J com.ibm.icu.dev.tool.layout.ScriptNameBuilder. This generates LEScripts.h, LELanguage.h,
  ScriptAndLanguageTags.h and ScriptAndLanguageTags.cpp in the working directory. (it also generates
  ScriptRunData.cpp, which is no longer needed.)

  The generated files have a current copyright date and "@draft" statement.

* copy the above files into <icu>/source/layout, replacing the old files.

  Add new default entries to the indicClassTables array in <icu>/source/layout/IndicClassTables.cpp
  and the complexTable array in <icu>/source/layoutex/ParagraphLayout.cpp. (This step should be automated...)

* rebuild the layout and layoutex libraries.

---------------------------------------------------------------------------- ***

Unicode 4.1 update

*** related Jitterbugs

4332    RFE: Update to Unicode 4.1
4157    RBBI, TR29 4.1 updates

*** data files & enums & parser code

* file preparation
- ucdstrip:
    DerivedCoreProperties.txt
    DerivedNormalizationProps.txt
    NormalizationTest.txt
    GraphemeBreakProperty.txt
    SentenceBreakProperty.txt
    WordBreakProperty.txt
- ucdstrip and ucdmerge:
    EastAsianWidth.txt
    LineBreak.txt

* add new files to the repository
    GraphemeBreakProperty.txt
    SentenceBreakProperty.txt
    WordBreakProperty.txt

* update FractionalUCA.txt and UCARules.txt with new canonical closure

* genpname
- handle new enumerated properties in sub read_uchar
- run preparse.pl

* uchar.h & uscript.h & uprops.h & uprops.c & genprops
- new binary properties
    + Pattern_Syntax
    + Pattern_White_Space
- new enumerated properties
    + Grapheme_Cluster_Break
    + Sentence_Break
    + Word_Break
- new block & script & line break values

* gencase
- case-ignorable changes
  see http://www.unicode.org/versions/Unicode4.1.0/#CaseMods
  now: (D47a) Word_Break=MidLetter or Mn, Me, Cf, Lm, Sk

*** Unicode version numbers
- makedata.mak
- uchar.h
- configure.in

*** tests
- verify that u_charMirror() round-trips
- test all new properties and some new values of old properties

*** other code

*
hardcoded Unihan range end/limit
- Unihan range end moves from 9FA5 to 9FBB
  search for both 9FA5 (end) and 9FA6 (limit) (regex 9FA[56], case-insensitive)
    + do not modify BOCU/BOCSU code because that would change the encoding
      and break binary compatibility!
    + similarly, do not change the GB 18030 range data (ucnvmbcs.c),
      NamePrepProfile.txt
    + ignore trietest.c: test data is arbitrary
    + ignore tstnorm.cpp: test optimization, not important
    + ignore collation: 9FA[56] only appears in comments; swapCJK() uses the whole block up to 9FFF
    + do change line_th.txt and word_th.txt
      by replacing hardcoded ranges with the new property values
    + do change gennames.c

    source\data\brkitr\line_th.txt(229): \u33E0-\u33FE \u3400-\u4DB5 \u4E00-\u9FA5 \uA000-\uA48C \uA490-\uA4C6
    source\data\brkitr\word_th.txt(23): \u33E0-\u33FE \u3400-\u4DB5 \u4E00-\u9FA5 \uA000-\uA48C \uA490-\uA4C6
    source\tools\gennames\gennames.c(971): 0x4e00, 0x9fa5,

* case mappings
- compare new special casing context conditions with previous ones
  see http://www.unicode.org/versions/Unicode4.1.0/#CaseMods

* genpname
- consider storing only the short name if it is the same as the long name

*** other reviews
- UAX #29 changes (grapheme/word/sentence breaks)
- UAX #14 changes (line breaks)
- Pattern_Syntax & Pattern_White_Space

---------------------------------------------------------------------------- ***

Unicode 4.0.1 update

*** related Jitterbugs

3170    RFE: Update to Unicode 4.0.1
3171    Add new Unicode 4.0.1 properties
3520    use Unicode 4.0.1 updates for break iteration

*** data files & enums & parser code

* file preparation
- ucdstrip: DerivedNormalizationProps.txt, NormalizationTest.txt, DerivedCoreProperties.txt
- ucdstrip and ucdmerge: EastAsianWidth.txt, LineBreak.txt

* file fixes
- fix UnicodeData.txt general categories of
Ethiopic digits Nd->No
  according to PRI #26
  http://www.unicode.org/review/resolved-pri.html#pri26
- undone again because no corrigendum in sight;
  instead modified tests to not check consistency on this for Unicode 4.0.1

* ucdterms.txt
- update from http://www.unicode.org/copyright.html
  formatted for plain text

* uchar.h & uprops.h & uprops.c & genprops
- add UBLOCK_CYRILLIC_SUPPLEMENT because the block is renamed
- add U_LB_INSEPARABLE due to a spelling fix
    + put short name comment only on line with new constant
      for genpname perl script parser
- new binary properties
    + STerm
    + Variation_Selector

* genpname
- fix genpname perl script so that it doesn't choke on more than 2 names per property value
- perl script: correctly calculate the maximum number of fields per row

* uscript.h
- new script code Hrkt=Katakana_Or_Hiragana

* gennorm.c track changes in DerivedNormalizationProps.txt
- "FNC" -> "FC_NFKC"
- single field "NFD_NO" -> two fields "NFD_QC; N" etc.
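
A rough sketch of the DerivedNormalizationProps.txt field change listed above.
Only "FNC" -> "FC_NFKC" and "NFD_NO" -> "NFD_QC; N" come from these notes.
The _MAYBE -> "M" rule, the function name normalize_fields and the sample
FNC value are assumptions for illustration; this is not the gennorm.c code.

```python
# Hypothetical helper: maps old-style property fields to (property, value).
def normalize_fields(fields):
    """Take the fields that follow the code point column and return a
    (property, value) pair in the newer two-field shape."""
    fields = [f.strip() for f in fields]
    name = fields[0]

    # Renamed property, as listed in the notes.
    if name == "FNC":
        name = "FC_NFKC"

    if len(fields) == 1:
        # Old format: the value is folded into the property name.
        # The _MAYBE case is an assumption covered by the "etc." in the notes.
        for suffix, value in (("_NO", "N"), ("_MAYBE", "M")):
            if name.endswith(suffix):
                return name[: -len(suffix)] + "_QC", value
        return name, None

    # New format: the property name and its value are separate fields.
    return name, fields[1]


print(normalize_fields(["NFD_NO"]))            # old single field
print(normalize_fields(["NFD_QC", " N"]))      # new two fields
print(normalize_fields(["FNC", "0020 03B9"]))  # renamed property, sample value
```

Accepting both shapes in one place would let a parser read files from before
and after the format change without a version switch.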

* genprops/props2.c track changes in DerivedNumericValues.txt
- changed from 3 columns to 2, dropping the numeric type
    + assume that the type is always numeric for Han characters,
      and that only those are added in addition to what UnicodeData.txt lists

*** Unicode version numbers
- makedata.mak
- uchar.h
- configure.in

*** tests
- update test of default bidi classes according to PRI #28
  /tsutil/cucdtst/TestUnicodeData
  http://www.unicode.org/review/resolved-pri.html#pri28
- bidi tests: change exemplar character for ES depending on Unicode version
- change hardcoded expected property values where they change

*** other code

* name matching
- read UCD.html

* scripts
- use new Hrkt=Katakana_Or_Hiragana

* ZWJ & ZWNJ
- are now part of combining character sequences
- break iteration used to assume that LB classes did not overlap; now they do for ZWJ & ZWNJ