      1 * Copyright (C) 2004-2011, International Business Machines
      2 * Corporation and others.  All Rights Reserved.
      3 *
      4 *   file name:  changes.txt
      5 *   encoding:   US-ASCII
      6 *   tab size:   8 (not used)
      7 *   indentation:4
      8 *
      9 *   created on: 2004may06
     10 *   created by: Markus W. Scherer
     11 *
     12 * change log for Unicode updates
     13 
     14 ---------------------------------------------------------------------------- ***
     15 
     16 Unicode 6.1 update
     17 
     18 (TODO: Copy and adjust most of the 6.0 update instructions,
     19  except retain this following section in this new form.
     20  So far, this just documents the new procedure for building the property names data.)
     21 
     22 * run genpname
     23   (builds both pnames.icu and propname_data.h)
     24 - ~/svn.icu/tools/trunk/bld/unicode/c$ genpname/genpname -v -d ~/svn.icu/trunk/src/source/data/in
     25 - ~/svn.icu/tools/trunk/bld/unicode/c$ genpname/genpname -v -d ~/svn.icu/trunk/src/source/common --csource
     26 - rebuild ICU & tools
     27 
     28 ---------------------------------------------------------------------------- ***
     29 
     30 ICU 4.8 (no Unicode update, just new script codes)
     31 
     32 * 9 new script codes from ISO 15924 http://www.unicode.org/iso15924/codechanges.html
     33   (added 2010-12-21)
     34     Afak    439     Afaka
     35     Jurc    510     Jurchen
     36     Mroo    199     Mro, Mru
     37     Nshu    499     Nüshu
     38     Shrd    319     Sharada, Śāradā
     39     Sora    398     Sora Sompeng
     40     Takr    321     Takri, Ṭākrī, Ṭāṅkrī
     41     Tang    520     Tangut
     42     Wole    480     Woleai
     43   -> uscript.h
     44   -> com.ibm.icu.lang.UScript
     45     find     USCRIPT_([^ ]+) *= ([0-9]+),(.+)
     46     replace  public static final int \1 = \2;\3
     47   -> genpname/SyntheticPropertyValueAliases.txt
     48   -> add to expectedLong and expectedShort names in cintltst/cucdapi.c/TestUScriptCodeAPI()
     49       and in com.ibm.icu.dev.test.lang.TestUScript.java
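
The find/replace pair above can also be applied outside an editor. A minimal Python sketch, assuming the uscript.h enum lines have the form "USCRIPT_NAME = number,/* comment */"; the sample lines and their numeric values are illustrative placeholders, not copied from uscript.h.

```python
import re

# Same pattern and replacement as the find/replace pair in the notes.
FIND = re.compile(r"USCRIPT_([^ ]+) *= ([0-9]+),(.+)")
REPLACE = r"public static final int \1 = \2;\3"

def c_enum_to_java(line):
    """Convert one uscript.h enum line into a UScript.java constant line."""
    return FIND.sub(REPLACE, line)

# Illustrative input lines (numeric values are placeholders).
sample = [
    "      USCRIPT_AFAKA = 147,/* Afak */",
    "      USCRIPT_JURCHEN     = 148,/* Jurc */",
]
converted = [c_enum_to_java(s) for s in sample]
for line in converted:
    print(line)
```

Lines that do not match the pattern pass through unchanged, so the result still needs a manual review before it is pasted into UScript.java.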
     50 
     51 * run genpname/preparse.pl (on Linux)
     52   + cd ~/svn.icu/tools/trunk/src/unicode/c/genpname
     53   + make sure that data.h is writable
     54   + perl preparse.pl ~/svn.icu/trunk/src > out.txt
     55   + preparse.pl shows no errors, out.txt Info and Warning lines look ok
     56 
     57 * rebuild Unicode tools (at least genpname) using make
     58 - You might first need to "make install" ICU so that the tools build can pick
     59   up the new definitions from the installed header files.
     60 
     61 * run genpname
     62   (builds both pnames.icu and propname_data.h)
     63 - ~/svn.icu/tools/trunk/bld/unicode/c$ genpname/genpname -v -d ~/svn.icu/trunk/src/source/data/in
     64 - ~/svn.icu/tools/trunk/bld/unicode/c$ genpname/genpname -v -d ~/svn.icu/trunk/src/source/common --csource
     65 - rebuild ICU & tools
     66 
     67 * run genprops
     68 - ~/svn.icu/tools/trunk/bld/unicode/c$ genprops/genprops -d ~/svn.icu/trunk/src/source/data/in -s ~/svn.icu/trunk/src/source/data/unidata -i ~/svn.icu/trunk/dbg/data/out/build/icudt48l -u 6.0
     69 - ~/svn.icu/tools/trunk/bld/unicode/c$ genprops/genprops -d ~/svn.icu/trunk/src/source/common --csource -s ~/svn.icu/trunk/src/source/data/unidata -i ~/svn.icu/trunk/dbg/data/out/build/icudt48l -u 6.0
     70 - rebuild ICU & tools
     71 
     72 * update Java data files
     73 - refresh just the UCD-related files, just to be safe
     74 - see (ICU4C)/source/data/icu4j-readme.txt
     75 - mkdir /tmp/icu4j
     76 - ~/svn.icu/trunk/dbg$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
     77 - copy the big-endian Unicode data files to another location,
     78   separate from the other data files
     79     mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/icudt48b
     80     ~/svn.icu/trunk/dbg/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt48b/pnames.icu /tmp/icu4j/com/ibm/icu/impl/data/icudt48b
     81     ~/svn.icu/trunk/dbg/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt48b/uprops.icu /tmp/icu4j/com/ibm/icu/impl/data/icudt48b
     82 - refresh ICU4J
     83     ~/svn.icu/trunk/dbg/data/out/icu4j$ jar uf ~/svn.icu4j/trunk/src/main/shared/data/icudata.jar -C /tmp/icu4j com/ibm/icu/impl/data/icudt48b
     84 
     85 ---------------------------------------------------------------------------- ***
     86 
     87 Unicode 6.0 update
     88 
     89 *** related ICU Trac tickets
     90 
     91 7264 Unicode 6.0 Update
     92 
     93 *** Unicode version numbers
     94 - makedata.mak
     95 - uchar.h
     96   (configure.in & configure: have been modified to extract the version from uchar.h)
     97 - com.ibm.icu.util.VersionInfo
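
For illustration only: the configure scripts do this with shell tools, but the idea of reading the version out of uchar.h can be sketched in Python. This assumes the header defines the version as a quoted U_UNICODE_VERSION macro; the sample header text is made up for the example.

```python
import re

def unicode_version_from_header(header_text):
    """Return the U_UNICODE_VERSION string found in uchar.h text, or None."""
    match = re.search(r'^\s*#\s*define\s+U_UNICODE_VERSION\s+"([^"]+)"',
                      header_text, re.MULTILINE)
    return match.group(1) if match else None

# Made-up excerpt standing in for the real header.
sample_header = '/* excerpt */\n#define U_UNICODE_VERSION "6.0"\n'
print(unicode_version_from_header(sample_header))
```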
     98 
     99 *** data files & enums & parser code
    100 
    101 * file preparation
    102 
    103 ~/svn.icu/tools/trunk/src/unicode/c/genprops/misc$ ./ucdcopy.py ~/uni60/20100720/ucd ~/uni60/processed
    104 - This now prepares both unidata and testdata files in respective output subfolders.
    105 
    106 * PropertyAliases.txt changes
    107 - new Script_Extensions property defined in the new ScriptExtensions.txt file
    108   but not listed in PropertyAliases.txt; reported to unicode.org;
    109   -> added to tools/trunk/src/unicode/c/genpname/SyntheticPropertyAliases.txt
    110     scx; Script_Extensions
    111   -> uchar.h with new UProperty section
    112   -> com.ibm.icu.lang.UProperty, parallel with uchar.h
    113 
    114 * PropertyValueAliases.txt changes
    115 - 12 new block names:
    116   Alchemical_Symbols
    117   Bamum_Supplement
    118   Batak
    119   Brahmi
    120   CJK_Unified_Ideographs_Extension_D
    121   Emoticons
    122   Ethiopic_Extended_A
    123   Kana_Supplement
    124   Mandaic
    125   Miscellaneous_Symbols_And_Pictographs
    126   Playing_Cards
    127   Transport_And_Map_Symbols
    128   -> add to uchar.h
    129   -> add to UCharacter.UnicodeBlock
    130     Eclipse find     UBLOCK_([^ ]+) = [0-9]+, (/.+)
    131             replace  public static final UnicodeBlock \1 = new UnicodeBlock("\1", \1_ID); \2
    132 - Joining_Group (jg) values:
    133   Teh_Marbuta_Goal becomes the new canonical value for the old Hamza_On_Heh_Goal which becomes an alias
    134   -> uchar.h & UCharacter.JoiningGroup
    135 - 3 new scripts:
    136   sc ; Batk      ; Batak
    137   sc ; Brah      ; Brahmi
    138   sc ; Mand      ; Mandaic
    139   -> remove these from SyntheticPropertyValueAliases.txt
    140   -> add alias USCRIPT_MANDAIC to USCRIPT_MANDAEAN
    141   -> fix expectedLong names in cucdapi.c/TestUScriptCodeAPI()
    142       and in com.ibm.icu.dev.test.lang.TestUScript.java
    143 - 13 new script codes from ISO 15924 http://www.unicode.org/iso15924/codechanges.html
    144   (added 2009-11-11..2010-07-18)
    145   Bass        259     Bassa Vah
    146   Dupl        755     Duployan shorthand
    147   Elba        226     Elbasan
    148   Gran        343     Grantha
    149   Kpel        436     Kpelle
    150   Loma        437     Loma
    151   Mend        438     Mende
    152   Merc        101     Meroitic Cursive
    153   Narb        106     Old North Arabian
    154   Nbat        159     Nabataean
    155   Palm        126     Palmyrene
    156   Sind        318     Sindhi
    157   Wara        262     Warang Citi
    158   -> uscript.h
    159   -> com.ibm.icu.lang.UScript
    160     find     USCRIPT_([^ ]+) *= ([0-9]+),(.+)
    161     replace  public static final int \1 = \2;\3
    162   -> SyntheticPropertyValueAliases.txt
    163   -> add to expectedLong and expectedShort names in cintltst/cucdapi.c/TestUScriptCodeAPI()
    164       and in com.ibm.icu.dev.test.lang.TestUScript.java
    165 - ISO 15924 name change
    166   Mero        100     Meroitic Hieroglyphs (was Meroitic)
    167   -> add new alias USCRIPT_MEROITIC_HIEROGLYPHS to USCRIPT_MEROITIC
    168 - property value alias added for Cham, was already moved out of SyntheticPropertyValueAliases.txt
    169 
    170 * UnicodeData.txt changes
    171 - new CJK block:
    172   2B740;<CJK Ideograph Extension D, First>;Lo;0;L;;;;;N;;;;;
    173   2B81D;<CJK Ideograph Extension D, Last>;Lo;0;L;;;;;N;;;;;
    174   -> add to tools/trunk/src/unicode/c/gennames/gennames.c, with new ucdVersion
    175 
    176 * build Unicode tools using CMake+make
    177 
    178 * run genpname/preparse.pl (on Linux)
    179   + cd ~/svn.icu/tools/trunk/src/unicode/c/genpname
    180   + make sure that data.h is writable
    181   + perl preparse.pl ~/svn.icu/trunk/src > out.txt
    182   + preparse.pl shows no errors, out.txt Info and Warning lines look ok
    183 
    184 * rebuild Unicode tools (at least genpname) using make
    185 - You might first need to "make install" ICU so that the tools build can pick
    186   up the new definitions from the installed header files.
    187 
    188 * run genpname
    189 - ~/svn.icu/tools/trunk/bld/unicode$ c/genpname/genpname -v -d ~/svn.icu/trunk/src/source/data/in
    190 - rebuild ICU & tools
    191 
    192 * update source/data/unidata/norm2/nfkc_cf.txt
    193 - follow the instructions in nfkc_cf.txt for updating it from DerivedNormalizationProps.txt
    194 
    195 * update source/data/unidata/norm2/uts46.txt
    196 - download http://www.unicode.org/Public/idna/6.0.0/IdnaMappingTable.txt
    197   to ~/svn.icu/tools/trunk/src/unicode/py
    198 - adjust idna2nrm.py to handle new disallowed_STD3_valid and disallowed_STD3_mapped values
    199 - ~/svn.icu/tools/trunk/src/unicode/py$ ./idna2nrm.py
    200 - ~/svn.icu/tools/trunk/src/unicode/py$ cp uts46.txt ~/svn.icu/trunk/src/source/data/unidata/norm2
    201 
    202 * update uts46test.cpp and UTS46Test.java if there are new characters that are equivalent to
    203   sequences with non-LDH ASCII (that is, their decompositions contain '=' or similar)
    204 - grep IdnaMappingTable.txt or uts46.txt for "disallowed_STD3_valid" on non-ASCII characters
    205 - Unicode 6.0: U+2260, U+226E, U+226F
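
The grep step can be sketched in Python. This assumes IdnaMappingTable.txt lines look like "code point or range ; status ; optional mapping # comment"; the sample lines below are modeled on that format for illustration and are not copied from the file.

```python
def non_ascii_std3_valid(lines):
    """Return code points above U+007F whose status is disallowed_STD3_valid."""
    found = []
    for line in lines:
        data = line.split("#", 1)[0].strip()   # drop the trailing comment
        if not data:
            continue
        fields = [f.strip() for f in data.split(";")]
        if len(fields) < 2 or fields[1] != "disallowed_STD3_valid":
            continue
        start, _, end = fields[0].partition("..")
        first = int(start, 16)
        last = int(end, 16) if end else first
        found.extend(cp for cp in range(first, last + 1) if cp > 0x7F)
    return found

# Illustrative lines in the assumed format.
sample = [
    "0000..002C    ; disallowed_STD3_valid  # ASCII controls and punctuation",
    "00C0          ; mapped                 ; 00E0 # LATIN CAPITAL LETTER A WITH GRAVE",
    "2260          ; disallowed_STD3_valid  # NOT EQUAL TO",
    "226E..226F    ; disallowed_STD3_valid  # NOT LESS-THAN..NOT GREATER-THAN",
]
print(["U+%04X" % cp for cp in non_ascii_std3_valid(sample)])
```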
    206 
    207 * generate core properties data files
    208 - ~/svn.icu/tools/trunk/src/unicode$ ./makeprops.sh ~/svn.icu/trunk/src ~/svn.icu/trunk/bld
    209 - rebuild ICU & tools
    210 - run makeuca.sh so that genuca picks up the new nfc.nrm:
    211   ~/svn.icu/tools/trunk/src/unicode$ ./makeuca.sh ~/svn.icu/trunk/src ~/svn.icu/trunk/bld
    212 - rebuild ICU & tools
    213 
    214 * implement new Script_Extensions property (provisional)
    215 - parser & generator: genprops & uprops.icu
    216 - uscript.h, uprops.h, uchar.c, uniset_props.cpp and others, plus cintltst/cucdapi.c & intltest/usettest.cpp
    217 - UScript.java, UCharacterProperty.java, UnicodeSet.java, TestUScript.java, UnicodeSetTest.java
    218 
    219 * switch ubidi.icu, ucase.icu and uprops.icu from UTrie to UTrie2
    220 - (one-time change)
    221 - genbidi/gencase/genprops tools changes
    222 - re-run makeprops.sh (see above)
    223 - UCharacterProperty.java, UCharacterTypeIterator.java,
    224   UBiDiProps.java, UCaseProps.java, and several others with minor changes;
    225   UCharacterPropertyReader.java deleted and its code folded into UCharacterProperty.java
    226 
    227 * update Java data files
    228 - refresh just the UCD-related files, just to be safe
    229 - see (ICU4C)/source/data/icu4j-readme.txt
    230 - mkdir /tmp/icu4j
    231 - ~/svn.icu/trunk/bld$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
    232   output:
    233     ...
    234     Unicode .icu files built to ./out/build/icudt45l
    235     mkdir -p ./out/icu4j/com/ibm/icu/impl/data/icudt45b
    236     echo ubidi.icu ucase.icu uprops.icu > ./out/icu4j/add.txt
    237     LD_LIBRARY_PATH=../lib:../stubdata:../tools/ctestfw:$LD_LIBRARY_PATH  ../bin/icupkg ./out/tmp/icudt45l.dat ./out/icu4j/icudt45b.dat -a ./out/icu4j/add.txt -s ./out/build/icudt45l -x '*' -tb -d ./out/icu4j/com/ibm/icu/impl/data/icudt45b
    238     jar cf ./out/icu4j/icudata.jar -C ./out/icu4j com/ibm/icu/impl/data/icudt45b
    239     mkdir -p /tmp/icu4j/main/shared/data
    240     cp ./out/icu4j/icudata.jar /tmp/icu4j/main/shared/data
    241 - copy the big-endian Unicode data files to another location,
    242   separate from the other data files
    243     mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/icudt45b/coll
    244     mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/icudt45b/brkitr
    245     ~/svn.icu/trunk/bld/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt45b/*.icu /tmp/icu4j/com/ibm/icu/impl/data/icudt45b
    246     ~/svn.icu/trunk/bld/data/out/icu4j$ rm /tmp/icu4j/com/ibm/icu/impl/data/icudt45b/cnvalias.icu
    247     ~/svn.icu/trunk/bld/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt45b/*.nrm /tmp/icu4j/com/ibm/icu/impl/data/icudt45b
    248     ~/svn.icu/trunk/bld/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt45b/coll/*.icu /tmp/icu4j/com/ibm/icu/impl/data/icudt45b/coll
    249     ~/svn.icu/trunk/bld/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt45b/brkitr/* /tmp/icu4j/com/ibm/icu/impl/data/icudt45b/brkitr
    250 - refresh ICU4J
    251     ~/svn.icu/trunk/bld/data/out/icu4j$ jar uf ~/svn.icu4j/trunk/src/main/shared/data/icudata.jar -C /tmp/icu4j com/ibm/icu/impl/data/icudt45b
    252 
    253 * refresh Java test .txt files
    254 - copy new .txt files into ICU4J's main/tests/core/src/com/ibm/icu/dev/data/unicode
    255 
    256 * un-hardcode normalization skippable (NF*_Inert) test data
    257 - removes one manual step from the Unicode upgrade, and removes dependency on one of Mark's tools
    258 
    259 * copy updated break iterator test files
    260 - now handled by the earlier ucdcopy.py step, followed by
    261   copying the uni60/processed/testdata files to ~/svn.icu/trunk/src/source/test/testdata
    262   (old instructions:
    263    copy from (Unicode 6.0)/ucd/auxiliary/*BreakTest-6....txt
    264    to ~/svn.icu/trunk/src/source/test/testdata)
    265 - they are not used in ICU4J
    266 
    267 * UCA
    268 
    269 - get output from Mark's tools; look in
    270     http://www.unicode.org/~book/incoming/mark/uca6.0.0/
    271     http://www.macchiato.com/unicode/utc/additional-uca-files
    272     http://www.unicode.org/Public/UCA/6.0.0/
    273     http://www.unicode.org/~mdavis/uca/
    274 - update source/data/unidata/FractionalUCA.txt with FractionalUCA_SHORT.txt
    275 - update source/data/unidata/UCARules.txt with UCA_Rules_SHORT.txt
    276 - update Han-implicit ranges for new CJK extensions:
    277   swapCJK() in ucol.cpp & ImplicitCEGenerator.java
    278 - genuca: allow bytes 02 for U+FFFE, new merge-sort character;
    279   do not add it into invuca so that tailoring primary-after an ignorable works
    280 - genuca: permit space between [variable top] bytes
    281 - ucol.cpp: treat noncharacters like unassigned rather than ignorable
    282 - run makeuca.sh:
    283   ~/svn.icu/tools/trunk/src/unicode$ ./makeuca.sh ~/svn.icu/trunk/src ~/svn.icu/trunk/bld
    284 - rebuild ICU4C
    285 - refresh ICU4J collation data:
    286   (subset of instructions above for properties data refresh, except copies all coll/*)
    287     ~/svn.icu/trunk/bld$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
    288     mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/icudt45b/coll
    289     ~/svn.icu/trunk/bld/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt45b/coll/* /tmp/icu4j/com/ibm/icu/impl/data/icudt45b/coll
    290     ~/svn.icu/trunk/bld/data/out/icu4j$ jar uf ~/svn.icu4j/trunk/src/main/shared/data/icudata.jar -C /tmp/icu4j com/ibm/icu/impl/data/icudt45b
    291 - update (ICU)/source/test/testdata/CollationTest_*.txt
    292   and (ICU4J)/main/tests/collate/src/com/ibm/icu/dev/data/CollationTest_*.txt
    293   with output from Mark's Unicode tools
    294 - run all tests with the *_SHORT.txt or the full files (the full ones have comments)
    295 - note on intltest: if collate/UCAConformanceTest fails, then
    296   utility/MultithreadTest/TestCollators will fail as well;
    297   fix the conformance test before looking into the multi-thread test
    298 
    299 * When refreshing all of ICU4J data from ICU4C
    300 - ~/svn.icu/trunk/bld$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
    301 - cp /tmp/icu4j/main/shared/data/icudata.jar ~/svn.icu4j/trunk/src/main/shared/data
    302 or
    303 - ~/svn.icu/trunk/bld$ make ICU4J_ROOT=~/svn.icu4j/trunk/src icu4j-data-install
    304 
    305 *** LayoutEngine script information
    306 
    307 (For details see the Unicode 5.2 change log below.)
    308 
    309 * Run ICU4J com.ibm.icu.dev.tool.layout.ScriptNameBuilder. This generates LEScripts.h, LELanguages.h,
    310 ScriptAndLanguageTags.h and ScriptAndLanguageTags.cpp in the working directory. (It also generates
    311 ScriptRunData.cpp, which is no longer needed.)
    312 
    313 The generated files have a current copyright date and "@draft" statement.
    314 
    315 * copy the above files into <icu>/source/layout, replacing the old files.
    316 * fix mixed line endings
    317 * review the diffs and fix incorrect @draft and missing aliases;
    318   Unicode-derived script codes should be "born stable" like constants in uchar.h, uscript.h etc.
    319 * manually re-add the "Indic script xyz v.2" tags in ScriptAndLanguageTags.h
    320 
    321 ---------------------------------------------------------------------------- ***
    322 
    323 Unicode 5.2 update
    324 
    325 *** related ICU Trac tickets
    326 
    327 7084 Unicode 5.2
    328 
    329 7167 verify collation bytes
    330 7235 Java test NAME_ALIAS
    331 7236 Java DerivedCoreProperties.txt test
    332 7237 Java BidiTest.txt
    333 7238 UTrie2 in core unidata
    334 7239 test for tailoring gaps
    335 7240 Java fix CollationMiscTest
    336 7243 update layout engine for Unicode 5.2
    337 
    338 *** Unicode version numbers
    339 - makedata.mak
    340 - uchar.h
    341 - configure.in & configure
    342 - update ucdVersion in gennames.c if an algorithmic range changes
    343 
    344 *** data files & enums & parser code
    345 
    346 * file preparation
    347 
    348 python source\tools\genprops\misc\ucdcopy.py "C:\Documents and Settings\mscherer\My Documents\unicode\ucd\5.2.0" C:\svn\icuproj\icu\trunk\source\data\unidata
    349 - includes finding files regardless of version numbers,
    350   copying them, and performing the equivalent processing of the
    351   ucdstrip and ucdmerge tools on the desired set of files
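
A rough sketch of the kind of processing meant here. It assumes that "ucdstrip" removes trailing comments from data lines and that "ucdmerge" merges adjacent code point ranges that carry the same value; that reading is an assumption, and ucdcopy.py and the original tools remain the authority. All function names are hypothetical.

```python
def strip_comments(lines):
    """Drop trailing '#' comments from data lines; keep other lines as they are."""
    out = []
    for line in lines:
        data, sep, _ = line.partition("#")
        if sep and data.strip():
            out.append(data.rstrip())   # data line with a trailing comment
        else:
            out.append(line.rstrip())   # comment-only, blank, or plain data line
    return out

def parse_range(field):
    """Parse 'XXXX' or 'XXXX..YYYY' into (first, last)."""
    start, _, end = field.strip().partition("..")
    first = int(start, 16)
    return first, (int(end, 16) if end else first)

def format_range(first, last):
    return "%04X" % first if first == last else "%04X..%04X" % (first, last)

def merge_ranges(lines):
    """Merge adjacent ranges with equal values (comment lines are dropped here)."""
    merged = []   # list of [first, last, value]
    for line in lines:
        if not line or line.startswith("#"):
            continue
        field, _, value = line.partition(";")
        first, last = parse_range(field)
        value = value.strip()
        if merged and merged[-1][2] == value and merged[-1][1] + 1 == first:
            merged[-1][1] = last
        else:
            merged.append([first, last, value])
    return [format_range(f, l) + ";" + v for f, l, v in merged]

# Illustrative input in a LineBreak.txt-like shape.
raw = [
    "# sample header",
    "0020;Na # SPACE",
    "0021..0023;Na # punctuation",
    "0024;Na",
    "00A1;A # INVERTED EXCLAMATION MARK",
]
stripped = strip_comments(raw)
print(merge_ranges(stripped))
```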
    352 
    353 * notes on changes
    354 - PropertyAliases.txt
    355   moved from numeric to enumerated:
    356     ccc       ; Canonical_Combining_Class
    357   new string properties:
    358     NFKC_CF   ; NFKC_Casefold
    359     Name_Alias; Name_Alias
    360   new binary properties:
    361     Cased     ; Cased
    362     CI        ; Case_Ignorable
    363     CWCF      ; Changes_When_Casefolded
    364     CWCM      ; Changes_When_Casemapped
    365     CWKCF     ; Changes_When_NFKC_Casefolded
    366     CWL       ; Changes_When_Lowercased
    367     CWT       ; Changes_When_Titlecased
    368     CWU       ; Changes_When_Uppercased
    369   new CJK Unihan properties (not supported by ICU)
    370 - PropertyValueAliases.txt
    371   new block names
    372   new scripts
    373   one script code change:
    374     sc ; Qaai      ; Inherited
    375     ->
    376     sc ; Zinh      ; Inherited                        ; Qaai
    377   new Line_Break (lb) value:
    378     lb ; CP        ; Close_Parenthesis
    379   new Joining_Group (jg) values: Farsi_Yeh, Nya
    380   other new values:
    381     ccc; 214; ATA  ; Attached_Above
    382 - DerivedBidiClass.txt
    383   new default-R range: U+1E800 - U+1EFFF
    384 - UnicodeData.txt
    385   all of the ISO comments are gone
    386   new CJK block end:
    387     9FC3;<CJK Ideograph, Last> -> 9FCB;<CJK Ideograph, Last>
    388   new CJK block:
    389     2A700;<CJK Ideograph Extension C, First>;Lo;0;L;;;;;N;;;;;
    390     2B734;<CJK Ideograph Extension C, Last>;Lo;0;L;;;;;N;;;;;
    391 
    392 * genpname
    393 - run preparse.pl
    394   + cd \svn\icuproj\icu\trunk\source\tools\genpname
    395   + make sure that data.h is writable
    396   + perl preparse.pl \svn\icuproj\icu\trunk > out.txt
    397   + preparse.pl complains with errors like the following:
    398       Error: sc:Egyp already set to Egyptian_Hieroglyphs, cannot set to Egyp at preparse.pl line 1322, <GEN6> line 34.
    399     This is because ICU 4.0 had scripts from ISO 15924 which are now
    400     added to Unicode 5.2, and the Perl script shows a conflict between SyntheticPropertyValueAliases.txt
    401     and PropertyValueAliases.txt.
    402     -> Removed duplicate script entries from SyntheticPropertyValueAliases.txt:
    403        Egyp, Java, Lana, Mtei, Orkh, Armi, Avst, Kthi, Phli, Prti, Samr, Tavt
    404   + preparse.pl complains with errors about block names missing from uchar.h; add them
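
The conflict that preparse.pl reports can be checked ahead of time. A minimal sketch, assuming both files list scripts as "sc ; Code ; Long_Name" lines (for SyntheticPropertyValueAliases.txt this format is an assumption); the function names and sample lines are hypothetical.

```python
def script_codes(lines):
    """Collect short script codes from 'sc ; Code ; Name' lines."""
    codes = set()
    for line in lines:
        fields = [f.strip() for f in line.split("#", 1)[0].split(";")]
        if len(fields) >= 3 and fields[0] == "sc":
            codes.add(fields[1])
    return codes

def duplicate_script_codes(synthetic_lines, ucd_lines):
    """Return codes defined in both files; these should leave the synthetic file."""
    return sorted(script_codes(synthetic_lines) & script_codes(ucd_lines))

# Illustrative excerpts of the two files.
synthetic = [
    "sc ; Egyp ; Egyptian_Hieroglyphs",
    "sc ; Java ; Javanese",
    "sc ; Blis ; Blissymbols",
]
ucd = [
    "sc ; Egyp      ; Egyptian_Hieroglyphs",
    "sc ; Java      ; Javanese",
    "sc ; Latn      ; Latin",
]
print(duplicate_script_codes(synthetic, ucd))
```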
    405 
    406 * uchar.h & uscript.h & uprops.h & uprops.c & genprops
    407 - new block & script values
    408   + 26 new blocks
    409     copy new blocks from Blocks.txt
    410     MS VC++ 2008 regular expression:
    411       find "^{[0-9A-F]+}\.\.{[0-9A-F]+}; {[A-Z].+}$"
    412       replace with "    UBLOCK_\3 = 172, /*[\1]*/"
    413   + several new script values already added in ICU 4.0 for ISO 15924 coverage
    414     (removed from SyntheticPropertyValueAliases.txt, see genpname notes above)
    415   + 3 new script values added for ISO 15924 and Unicode 5.2 coverage
    416   + 1 new script value added for ISO 15924 coverage (not in Unicode 5.2)
    417     (added to SyntheticPropertyValueAliases.txt)
    418 - new Joining_Group (jg) values: Farsi_Yeh, Nya
    419 - new Line_Break (lb) value:
    420     lb ; CP        ; Close_Parenthesis
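
The Blocks.txt conversion above can be sketched in Python as well. Unlike the editor regex, which leaves the block name and the constant 172 for manual cleanup, this sketch also upper-cases the name and counts upward from a starting ID; both additions are assumptions for illustration, and the IDs must be checked against uchar.h.

```python
import re

# Python form of the "find" expression: start..end; Block Name
BLOCK_LINE = re.compile(r"^([0-9A-F]+)\.\.([0-9A-F]+); ([A-Z].+)$")

def blocks_to_enum(lines, first_id):
    """Turn Blocks.txt lines into uchar.h-style UBLOCK_ enum entries."""
    entries = []
    next_id = first_id
    for line in lines:
        match = BLOCK_LINE.match(line)
        if not match:
            continue   # skip comments and blank lines
        start, _end, name = match.groups()
        constant = re.sub(r"[ -]", "_", name).upper()
        entries.append("    UBLOCK_%s = %d, /*[%s]*/" % (constant, next_id, start))
        next_id += 1
    return entries

# Illustrative input; the starting ID is a placeholder.
sample = [
    "# Blocks.txt excerpt",
    "0800..083F; Samaritan",
    "A960..A97F; Hangul Jamo Extended-A",
]
enum_lines = blocks_to_enum(sample, 172)
for entry in enum_lines:
    print(entry)
```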
    421 
    422 * hardcoded Unihan range end/limit
    423 - Unihan range end moves from 9FC3 to 9FCB
    424   search for both 9FC3 (end) and 9FC4 (limit) (regex 9FC[34], case-insensitive)
    425   + do change gennames.c
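
The search described above, as a small Python sketch. The source lines in the sample are hypothetical and only show which spellings the case-insensitive pattern catches.

```python
import re

# Old range end (9FC3) and limit (9FC4), in either letter case.
UNIHAN_END = re.compile(r"9FC[34]", re.IGNORECASE)

def find_hardcoded_ends(lines):
    """Return (line_number, text) pairs that mention the old end or limit."""
    return [(n, text) for n, text in enumerate(lines, 1)
            if UNIHAN_END.search(text)]

# Hypothetical source lines.
sample = [
    "#define CJK_END 0x9fc3",
    "if (c < 0x9FC4) {",
    "int newEnd = 0x9FCB;",
]
hits = find_hardcoded_ends(sample)
print(hits)
```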
    426 
    427 * Compare definitions of new binary properties with what we used to use
    428   in algorithms, to see if the definitions changed.
    429 - Verified that definitions for Cased and Case_Ignorable are unchanged.
    430   The gencase tool now parses the newly public Case_Ignorable values
    431   in case the definition changes in the future.
    432 
    433 * uchar.c & uprops.h & uprops.c & genprops
    434 - new numeric values that didn't exist in Unicode data before:
    435     1/7, 1/9, 1/10, 3/10, 1/16, 3/16
    436   the ones with denominators >9 cannot be supported by uprops.icu formatVersion 5,
    437   therefore redesign the encoding of numeric types and values for formatVersion 6;
    438   design for simple numbers up to at least 144 ("one gross"),
    439   large values up to at least 10^20,
    440   and fractions with numerators -1..17 and denominators 1..16
    441   to cover current and expected future values
    442   (e.g., more Han numeric values, Meroitic twelfths)
    443 
    444 * reimplement Hangul_Syllable_Type for new Jamo characters
    445 - the old code assumed that all Jamo characters are in the 11xx block
    446 - Unicode 5.2 fills holes there and adds new Jamo characters in
    447     A960..A97F; Hangul Jamo Extended-A
    448   and in
    449     D7B0..D7FF; Hangul Jamo Extended-B
    450 - Hangul_Syllable_Type can be trivially derived from a subset of
    451   Grapheme_Cluster_Break values
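
The derivation can be written down as a lookup. A minimal sketch using the long property value names from the Unicode Character Database; it illustrates the mapping only and is not how the ICU code is structured.

```python
# Grapheme_Cluster_Break values that correspond to a Hangul_Syllable_Type.
GCB_TO_HST = {
    "L": "Leading_Jamo",
    "V": "Vowel_Jamo",
    "T": "Trailing_Jamo",
    "LV": "LV_Syllable",
    "LVT": "LVT_Syllable",
}

def hangul_syllable_type(gcb_value):
    """Map a Grapheme_Cluster_Break value to a Hangul_Syllable_Type value."""
    return GCB_TO_HST.get(gcb_value, "Not_Applicable")

print(hangul_syllable_type("LV"))
print(hangul_syllable_type("Extend"))
```

Because the lookup depends only on the Grapheme_Cluster_Break value, it needs no assumption about which block a Jamo character lives in.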
    452 
    453 * build Unicode data source code for hardcoding core data
    454 C:\svn\icuproj\icu\trunk\source\data>NMAKE /f makedata.mak ICUMAKE=\svn\icuproj\icu\trunk\source\data\ CFG=x86\release uni-core-data
    455 
    456 ICU data make path is \svn\icuproj\icu\trunk\source\data\
    457 ICU root path is \svn\icuproj\icu\trunk
    458 Information: cannot find "ucmlocal.mk". Not building user-additional converter files.
    459 Information: cannot find "brklocal.mk". Not building user-additional break iterator files.
    460 Information: cannot find "reslocal.mk". Not building user-additional resource bundle files.
    461 Information: cannot find "collocal.mk". Not building user-additional resource bundle files.
    462 Information: cannot find "rbnflocal.mk". Not building user-additional resource bundle files.
    463 Information: cannot find "trnslocal.mk". Not building user-additional transliterator files.
    464 Information: cannot find "misclocal.mk". Not building user-additional miscellaenous files.
    465 Information: cannot find "spreplocal.mk". Not building user-additional stringprep files.
    466 Creating data file for Unicode Property Names
    467 Creating data file for Unicode Character Properties
    468 Creating data file for Unicode Case Mapping Properties
    469 Creating data file for Unicode BiDi/Shaping Properties
    470 Creating data file for Unicode Normalization
    471 Unicode .icu files built to "\svn\icuproj\icu\trunk\source\data\out\build\icudt43l"
    472 Unicode .c source files built to "\svn\icuproj\icu\trunk\source\data\out\tmp"
    473 
    474 - copy the .c source files to C:\svn\icuproj\icu\trunk\source\common
    475   and rebuild the common library
    476 
    477 *** UCA
    478 
    479 - update FractionalUCA.txt with new canonical closure (output from Mark's Unicode tools)
    480 - update source/data/unidata/UCARules.txt with UCA_Rules_SHORT.txt from Mark's Unicode tools
    481 - update source/test/testdata/CollationTest_*.txt with output from Mark's Unicode tools
    482 [ Begin obsolete instructions:
    483   Starting with UCA 5.2, we use the CollationTest_*_SHORT.txt files not the *_STUB.txt files.
    484     - generate the source/test/testdata/CollationTest_*_STUB.txt files via source/tools/genuca/genteststub.py
    485       on Windows:
    486         python C:\svn\icuproj\icu\trunk\source\tools\genuca\genteststub.py CollationTest_NON_IGNORABLE_SHORT.txt CollationTest_NON_IGNORABLE_STUB.txt
    487         python C:\svn\icuproj\icu\trunk\source\tools\genuca\genteststub.py CollationTest_SHIFTED_SHORT.txt CollationTest_SHIFTED_STUB.txt
    488   End obsolete instructions]
    489 - run all tests with the *_SHORT.txt or the full files (the full ones have comments),
    490   not just the *_STUB.txt files
    491 - note on intltest: if collate/UCAConformanceTest fails, then
    492   utility/MultithreadTest/TestCollators will fail as well;
    493   fix the conformance test before looking into the multi-thread test
    494 
    495 *** Implement Cased & Case_Ignorable properties
    496 - via UProperty; call ucase.h functions ucase_getType() and ucase_getTypeOrIgnorable()
    497 - Problem: These properties should be disjoint, but aren't
    498 - UTC 2009nov decision: skip all Case_Ignorable regardless of whether they are Cased or not
    499 - change ucase.icu to be able to store any combination of Cased and Case_Ignorable
    500 
    501 *** Implement Changes_When_Xyz properties
    502 - without stored data
    503 
    504 *** Implement Name_Alias property
    505 - add it as another name field in unames.icu
    506 - make it available via u_charName() and UCharNameChoice
    507 - consider it in u_charFromName()
    508 
    509 *** Break iterators
    510 
    511 * Update break iterator rules to new UAX versions and new property values
    512 * Update source/test/testdata/<boundary>Test.txt files from <unicode.org ucd>/ucd/auxiliary
    513 
    514 *** new BidiTest file
    515 - review format and data
    516 - copy BidiTest.txt to source/test/testdata
    517 - write test code using this data
    518 - fix ICU code where it fails the conformance test
    519 
    520 *** Java
    521 - generally, find and update code corresponding to C/C++
    522 - UCharacter.UnicodeBlock constants:
    523   a) add an _ID integer per new block, update COUNT
    524   b) add a class instance per new block
    525      Visual Studio regex:
    526         find            UBLOCK_{[^ ]+} = [0-9]+, {/.+}
    527         replace with    public static final UnicodeBlock \1 = new UnicodeBlock("\1", \1_ID); \2
    528 - CHAR_NAME_ALIAS -> UCharacter.getNameAlias() and getCharFromNameAlias()
    529 
    530 - port test changes to Java
    531 
    532 *** LayoutEngine script information
    533 
    534 (For comparison, see the Unicode 5.1 update: http://bugs.icu-project.org/trac/changeset/23833)
    535 
    536 * Run ICU4J com.ibm.icu.dev.tool.layout.ScriptNameBuilder. This generates LEScripts.h, LELanguages.h,
    537 ScriptAndLanguageTags.h and ScriptAndLanguageTags.cpp in the working directory. (It also generates
    538 ScriptRunData.cpp, which is no longer needed.)
    539 
    540 The generated files have a current copyright date and "@draft" statement.
    541 
    542 -> Eric Mader wrote in email on 20090930:
    543     "I think the tool has been modified to update @draft to @stable for
    544      older scripts and to add @draft for new scripts.
    545      (I worked with an intern on this last year.)
    546      You should check the output after you run it."
    547 
    548 * copy the above files into <icu>/source/layout, replacing the old files.
    549 * fix mixed line endings
    550 * review the diffs and fix incorrect @draft and missing aliases
    551 * manually re-add the "Indic script xyz v.2" tags in ScriptAndLanguageTags.h
    552 
    553 Add new default entries to the indicClassTables array in <icu>/source/layout/IndicClassTables.cpp
    554 and the complexTable array in <icu>/source/layoutex/ParagraphLayout.cpp. (This step should be automated...)
    555 
    556 -> Eric Mader wrote in email on 20090930:
    557     "This is just a matter of making sure that all the per-script tables have
    558      entries for any new scripts that were added.
    559      If any new Indic characters were added, then the class tables in
    560      IndicClassTables.cpp should be updated to reflect this.
    561      John Emmons should know how to do this if it's required."
    562 
    563 * rebuild the layout and layoutex libraries.
    564 
    565 *** Documentation
    566 - Update User Guide
    567   + Jamo_Short_Name, sfc->scf, binary property value aliases
    568 
    569 ---------------------------------------------------------------------------- ***
    570 
    571 Unicode 5.1 update
    572 
    573 *** related ICU Trac tickets
    574 
    575 5696 Update to Unicode 5.1
    576 
    577 *** Unicode version numbers
    578 - makedata.mak
    579 - uchar.h
    580 - configure.in & configure
    581 - update ucdVersion in gennames.c if an algorithmic range changes
    582 
    583 *** data files & enums & parser code
    584 
    585 * file preparation
    586 - ucdstrip:
    587     DerivedCoreProperties.txt
    588     DerivedNormalizationProps.txt
    589     NormalizationTest.txt
    590     PropList.txt
    591     Scripts.txt
    592     GraphemeBreakProperty.txt
    593     SentenceBreakProperty.txt
    594     WordBreakProperty.txt
    595 - ucdstrip and ucdmerge:
    596     EastAsianWidth.txt
    597     LineBreak.txt
    598 
    599 * my ucd2unidata.bat (needs to be updated each time with UCD and file version numbers)
    600 copy 5.1.0\ucd\BidiMirroring.txt ..\unidata\
    601 copy 5.1.0\ucd\Blocks.txt ..\unidata\
    602 copy 5.1.0\ucd\CaseFolding.txt ..\unidata\
    603 copy 5.1.0\ucd\DerivedAge.txt ..\unidata\
    604 copy 5.1.0\ucd\extracted\DerivedBidiClass.txt ..\unidata\
    605 copy 5.1.0\ucd\extracted\DerivedJoiningGroup.txt ..\unidata\
    606 copy 5.1.0\ucd\extracted\DerivedJoiningType.txt ..\unidata\
    607 copy 5.1.0\ucd\extracted\DerivedNumericValues.txt ..\unidata\
    608 copy 5.1.0\ucd\NormalizationCorrections.txt ..\unidata\
    609 copy 5.1.0\ucd\PropertyAliases.txt ..\unidata\
    610 copy 5.1.0\ucd\PropertyValueAliases.txt ..\unidata\
    611 copy 5.1.0\ucd\SpecialCasing.txt ..\unidata\
    612 copy 5.1.0\ucd\UnicodeData.txt ..\unidata\
    613 
    614 ucdstrip < 5.1.0\ucd\DerivedCoreProperties.txt > ..\unidata\DerivedCoreProperties.txt
    615 ucdstrip < 5.1.0\ucd\DerivedNormalizationProps.txt > ..\unidata\DerivedNormalizationProps.txt
    616 ucdstrip < 5.1.0\ucd\NormalizationTest.txt > ..\unidata\NormalizationTest.txt
    617 ucdstrip < 5.1.0\ucd\PropList.txt > ..\unidata\PropList.txt
    618 ucdstrip < 5.1.0\ucd\Scripts.txt > ..\unidata\Scripts.txt
    619 ucdstrip < 5.1.0\ucd\auxiliary\GraphemeBreakProperty.txt > ..\unidata\GraphemeBreakProperty.txt
    620 ucdstrip < 5.1.0\ucd\auxiliary\SentenceBreakProperty.txt > ..\unidata\SentenceBreakProperty.txt
    621 ucdstrip < 5.1.0\ucd\auxiliary\WordBreakProperty.txt > ..\unidata\WordBreakProperty.txt
    622 ucdstrip < 5.1.0\ucd\EastAsianWidth.txt | ucdmerge > ..\unidata\EastAsianWidth.txt
    623 ucdstrip < 5.1.0\ucd\LineBreak.txt | ucdmerge > ..\unidata\LineBreak.txt
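
Because the batch file above has to be edited for every UCD version, its command lines could also be generated from the version string. The following is only a sketch under that assumption: build_commands() and the three list names are hypothetical helpers, not part of the ICU tools, and the file lists simply mirror the ones in these notes.

```python
# Sketch: generate the command lines of ucd2unidata.bat for one UCD version.
# build_commands() is a hypothetical helper; the file lists are copied from
# the Unicode 5.1 notes and may need adjusting for other versions.

COPY_FILES = [
    "BidiMirroring.txt",
    "Blocks.txt",
    "CaseFolding.txt",
    "DerivedAge.txt",
    "extracted\\DerivedBidiClass.txt",
    "extracted\\DerivedJoiningGroup.txt",
    "extracted\\DerivedJoiningType.txt",
    "extracted\\DerivedNumericValues.txt",
    "NormalizationCorrections.txt",
    "PropertyAliases.txt",
    "PropertyValueAliases.txt",
    "SpecialCasing.txt",
    "UnicodeData.txt",
]

STRIP_FILES = [
    "DerivedCoreProperties.txt",
    "DerivedNormalizationProps.txt",
    "NormalizationTest.txt",
    "PropList.txt",
    "Scripts.txt",
    "auxiliary\\GraphemeBreakProperty.txt",
    "auxiliary\\SentenceBreakProperty.txt",
    "auxiliary\\WordBreakProperty.txt",
]

STRIP_MERGE_FILES = ["EastAsianWidth.txt", "LineBreak.txt"]


def build_commands(version, dest="..\\unidata"):
    """Return the list of batch command lines for the given UCD version."""
    root = version + "\\ucd"
    commands = []
    for rel in COPY_FILES:
        commands.append("copy " + root + "\\" + rel + " " + dest + "\\")
    for rel in STRIP_FILES:
        name = rel.split("\\")[-1]  # output goes to the flat unidata folder
        commands.append(
            "ucdstrip < " + root + "\\" + rel + " > " + dest + "\\" + name)
    for rel in STRIP_MERGE_FILES:
        commands.append(
            "ucdstrip < " + root + "\\" + rel
            + " | ucdmerge > " + dest + "\\" + rel)
    return commands


if __name__ == "__main__":
    print("\n".join(build_commands("5.1.0")))
```

Running the script with a different version string (for example "5.0.0") would produce the same command list with only the source folder changed.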
    624 
    625 * genpname
    626 - run preparse.pl
    627   + cd \svn\icuproj\icu\uni51\source\tools\genpname
    628   + make sure that data.h is writable
    629   + perl preparse.pl \svn\icuproj\icu\uni51 > out.txt
    630   + preparse.pl complains with errors like the following:
    631       Error: sc:Cari already set to Carian, cannot set to Cari at preparse.pl line 1308, <GEN6> line 30.
    632     This is because ICU 3.8 already had script codes from ISO 15924 that
    633     Unicode 5.1 now adds, so preparse.pl reports a conflict between
    634     SyntheticPropertyValueAliases.txt and PropertyValueAliases.txt.
    635     -> Removed duplicate script entries from SyntheticPropertyValueAliases.txt:
    636        Cari, Cham, Kali, Lepc, Lyci, Lydi, Olck, Rjng, Saur, Sund, Vaii
    637   + PropertyValueAliases.txt now explicitly contains values for boolean properties:
    638       N/Y, No/Yes, F/T, False/True
    639     -> Added N/No and Y/Yes to preparse.pl function read_PropertyValueAliases.
    640        It will use further values from the file if present.
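
The duplicate entries that make preparse.pl fail could be found before running it. The following is a minimal sketch, assuming both alias files use the semicolon-delimited "property ; short ; long" layout with "#" comments. parse_value_aliases() and find_conflicts() are hypothetical names, and the sketch ignores properties whose lines use a different field layout.

```python
# Sketch: list property values defined in both alias files.
# Assumes "property ; short_name ; long_name" lines with '#' comments.

def parse_value_aliases(text):
    """Return {(property, short_name): long_name} for the given file text."""
    aliases = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()   # drop comments
        if not line:
            continue
        fields = [field.strip() for field in line.split(";")]
        if len(fields) >= 3:
            aliases[(fields[0], fields[1])] = fields[2]
    return aliases


def find_conflicts(official_text, synthetic_text):
    """Return sorted (property, short_name) keys present in both files."""
    official = parse_value_aliases(official_text)
    synthetic = parse_value_aliases(synthetic_text)
    return sorted(key for key in synthetic if key in official)


if __name__ == "__main__":
    # Tiny made-up excerpts; the real files are much longer.
    official = "sc ; Cari ; Carian\nsc ; Cham ; Cham\n"
    synthetic = "# ISO 15924 codes\nsc ; Cari ; Cari\nsc ; Afak ; Afak\n"
    for prop, short in find_conflicts(official, synthetic):
        print("remove from SyntheticPropertyValueAliases.txt:", prop, short)
```

Each reported key is a candidate for removal from SyntheticPropertyValueAliases.txt, as was done for the 11 scripts listed above.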
    641 
    642 * uchar.h & uscript.h & uprops.h & uprops.c & genprops
    643 - new block & script values
    644   + 17 new blocks
    645   + 11 new script values already added in ICU 3.8 for ISO 15924 coverage
    646     (removed from SyntheticPropertyValueAliases.txt)
    647   + 14 new script values added for ISO 15924 coverage (not in Unicode 5.1)
    648     (added to SyntheticPropertyValueAliases.txt)
    649 - uprops.icu (uprops.h) only provides 7 bits for script codes.
    650   In ICU 4.0 there are USCRIPT_CODE_LIMIT=130 script codes now.
    651   No script code above 127 is yet assigned to any Unicode
    652   character, so ICU 4.0 uprops.icu does not store any
    653   script code values greater than 127.
    654   However, it does need to store the maximum script value=USCRIPT_CODE_LIMIT-1=129
    655   in a parallel bit field, and that overflows now.
    656   Also, future values >=128 would be incompatible anyway.
    657   uprops.h is modified to move around several of the bit fields
    658   in the properties vector words, and now uses 8 bits for the script code.
    659   Two other bit fields also grow to accommodate future growth:
    660   Block (current count: 172) grows from 8 to 9 bits,
    661   and Word_Break grows from 4 to 5 bits.
    662 - renamed property Simple_Case_Folding (sfc->scf)
    663   + nothing to be done: handled as normal alias
    664 - new property JSN Jamo_Short_Name
    665   + no new API: only contributes to the Name property
    666 - new Grapheme_Cluster_Break (GCB) value: SM=SpacingMark
    667 - new Joining Group (JG) value: Burushashki_Yeh_Barree
    668 - new Sentence_Break (SB) values:
    669     SB ; CR        ; CR
    670     SB ; EX        ; Extend
    671     SB ; LF        ; LF
    672     SB ; SC        ; SContinue
    673 - new Word_Break (WB) values:
    674     WB ; CR        ; CR
    675     WB ; Extend    ; Extend
    676     WB ; LF        ; LF
    677     WB ; MB        ; MidNumLet
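
The bit-field sizing described above for uprops.h (7 versus 8 bits for script codes, 8 versus 9 bits for Block) can be checked with simple arithmetic. This is only an illustration with hypothetical helper names, using the counts quoted in these notes. It does not reflect the actual layout of the properties vector words.

```python
# Sketch: check which unsigned bit-field widths can hold the quoted maxima.

def bits_needed(max_value):
    """Smallest number of bits that can hold every value in 0..max_value."""
    return max(1, max_value.bit_length())


def fits(max_value, bits):
    """True if 0..max_value fits into an unsigned field of the given width."""
    return max_value < (1 << bits)


USCRIPT_CODE_LIMIT = 130              # quoted in the notes for ICU 4.0
MAX_SCRIPT = USCRIPT_CODE_LIMIT - 1   # 129, the stored maximum script value
BLOCK_COUNT = 172                     # current block count quoted in the notes

if __name__ == "__main__":
    print("max script value needs", bits_needed(MAX_SCRIPT), "bits")
    print("fits in 7 bits:", fits(MAX_SCRIPT, 7))
    print("fits in 8 bits:", fits(MAX_SCRIPT, 8))
    print("block count fits in 8 bits:", fits(BLOCK_COUNT, 8))
```

The maximum script value 129 no longer fits into 7 bits, which is the overflow noted above. The block count still fits into 8 bits, so its extra bit is headroom for future growth.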
    678 
    679 * Further changes in the 2008-02-29 update:
    680 - Default_Ignorable_Code_Point: The new file removes Cc, Cs, noncharacters from DICP
    681   because they should not normally be invisible.
    682 - new Joining Group (JG) value Burushashki_Yeh_Barree was renamed to Burushaski_Yeh_Barree (one 'h' removed)
    683 - new Grapheme_Cluster_Break (GCB) value: PP=Prepend
    684 - new Word_Break (WB) value: NL=Newline
    685 
    686 * hardcoded Unihan range end/limit (see Unicode 4.1 update for comparison)
    687 - Unihan range end moves from 9FBB to 9FC3
    688   search for both 9FBB (end) and 9FBC (limit) (regex 9FB[BC], case-insensitive)
    689   + do change gennames.c
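
The search for the old range end and limit can be scripted. Below is a minimal sketch, assuming the limit is always end+1 and that a case-insensitive substring match is good enough. find_hardcoded() is a hypothetical helper and the sample lines are made up. Every hit still needs manual review (see the Unicode 4.1 notes for places that must not change).

```python
# Sketch: find lines mentioning the old Unihan range end or limit,
# equivalent to the case-insensitive regex 9FB[BC] for end 9FBB.
import re


def find_hardcoded(lines, end_hex):
    """Return [(line_number, text)] for lines containing end or end+1 in hex."""
    end = int(end_hex, 16)
    limit = end + 1
    pattern = re.compile(
        "{:04X}|{:04X}".format(end, limit), re.IGNORECASE)
    return [(number, text)
            for number, text in enumerate(lines, 1)
            if pattern.search(text)]


if __name__ == "__main__":
    sample = [
        "        0x4e00, 0x9fbb,",      # made-up line with the old end
        "    if (c < 0x9FBC) {",        # made-up line with the old limit
        "        0x4e00, 0x9fc3,",      # already updated, not reported
    ]
    for number, text in find_hardcoded(sample, "9FBB"):
        print(number, text.strip())
```

A plain substring match also reports longer hex numbers that merely contain the digits, so the output is a list of candidates rather than a list of required changes.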
    690 
    691 * build Unicode data source code for hardcoding core data
    692 C:\svn\icuproj\icu\uni51\source\data>NMAKE /f makedata.mak ICUMAKE=\svn\icuproj\icu\uni51\source\data\ CFG=debug uni-core-data
    693 
    694 ICU data make path is \svn\icuproj\icu\uni51\source\data\
    695 ICU root path is \svn\icuproj\icu\uni51
    696 Information: cannot find "ucmlocal.mk". Not building user-additional converter files.
    697 Information: cannot find "brklocal.mk". Not building user-additional break iterator files.
    698 Information: cannot find "reslocal.mk". Not building user-additional resource bundle files.
    699 Information: cannot find "collocal.mk". Not building user-additional resource bundle files.
    700 Information: cannot find "rbnflocal.mk". Not building user-additional resource bundle files.
    701 Information: cannot find "trnslocal.mk". Not building user-additional transliterator files.
    702 Information: cannot find "misclocal.mk". Not building user-additional miscellaenous files.
    703 Creating data file for Unicode Character Properties
    704 Creating data file for Unicode Case Mapping Properties
    705 Creating data file for Unicode BiDi/Shaping Properties
    706 Creating data file for Unicode Normalization
    707 Unicode .icu files built to "\svn\icuproj\icu\uni51\source\data\out\build\icudt39l"
    708 Unicode .c source files built to "\svn\icuproj\icu\uni51\source\data\out\tmp"
    709 
    710 - copy the .c source files to C:\svn\icuproj\icu\uni51\source\common
    711   and rebuild the common library
    712 
    713 *** Break iterators
    714 
    715 * Update break iterator rules to new UAX versions and new property values
    716 
    717 *** UCA
    718 
    719 * update FractionalUCA.txt and UCARules.txt with new canonical closure
    720 
    721 *** Test suites
    722 - Test that APIs using Unicode property value aliases (like UnicodeSet)
    723   support all of the boolean values N/Y, No/Yes, F/T, False/True
    724   -> TestBinaryValues() tests in both cintltst and intltest
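
The kind of check meant here can be illustrated outside of ICU. The sketch below is not the cintltst/intltest code and does not call any ICU API. parse_binary_value() is a hypothetical stand-in that accepts the eight boolean value names listed above, and ignoring case is an assumption of the sketch.

```python
# Sketch: accept all boolean property value aliases N/Y, No/Yes, F/T, False/True.

BINARY_VALUE_ALIASES = {
    "n": False, "no": False, "f": False, "false": False,
    "y": True, "yes": True, "t": True, "true": True,
}


def parse_binary_value(name):
    """Map a boolean value alias to True/False (case-insensitive, assumed)."""
    key = name.strip().lower()
    if key not in BINARY_VALUE_ALIASES:
        raise ValueError("not a binary property value alias: " + repr(name))
    return BINARY_VALUE_ALIASES[key]


if __name__ == "__main__":
    for alias in ("N", "No", "F", "False", "Y", "Yes", "T", "True"):
        print(alias, "->", parse_binary_value(alias))
```

A test in this style loops over all eight names and fails if any of them is rejected or maps to the wrong boolean.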
    725 
    726 *** LayoutEngine script information
    727 * Run ICU4J com.ibm.icu.dev.tool.layout.ScriptNameBuilder. This generates LEScripts.h, LELanguage.h,
    728 ScriptAndLanguageTags.h and ScriptAndLanguageTags.cpp in the working directory. (It also generates
    729 ScriptRunData.cpp, which is no longer needed.)
    730 
    731 The generated files have a current copyright date and "@draft" statement.
    732 
    733 * copy the above files into <icu>/source/layout, replacing the old files.
    734 
    735 Add new default entries to the indicClassTables array in <icu>/source/layout/IndicClassTables.cpp
    736 and the complexTable array in <icu>/source/layoutex/ParagraphLayout.cpp. (This step should be automated...)
    737 
    738 * rebuild the layout and layoutex libraries.
    739 
    740 *** Documentation
    741 - Update User Guide
    742   + Jamo_Short_Name, sfc->scf, binary property value aliases
    743 
    744 ---------------------------------------------------------------------------- ***
    745 
    746 Unicode 5.0 update
    747 
    748 *** related Jitterbugs
    749 
    750 5084 RFE: Update to Unicode 5.0
    751 
    752 *** data files & enums & parser code
    753 
    754 * file preparation
    755 - ucdstrip:
    756     DerivedCoreProperties.txt
    757     DerivedNormalizationProps.txt
    758     NormalizationTest.txt
    759     PropList.txt
    760     Scripts.txt
    761     GraphemeBreakProperty.txt
    762     SentenceBreakProperty.txt
    763     WordBreakProperty.txt
    764 - ucdstrip and ucdmerge:
    765     EastAsianWidth.txt
    766     LineBreak.txt
    767 
    768 * my ucd2unidata.bat (needs to be updated each time with UCD and file version numbers)
    769 copy 5.0.0\ucd\BidiMirroring.txt ..\unidata\
    770 copy 5.0.0\ucd\Blocks.txt ..\unidata\
    771 copy 5.0.0\ucd\CaseFolding.txt ..\unidata\
    772 copy 5.0.0\ucd\DerivedAge.txt ..\unidata\
    773 copy 5.0.0\ucd\extracted\DerivedBidiClass.txt ..\unidata\
    774 copy 5.0.0\ucd\extracted\DerivedJoiningGroup.txt ..\unidata\
    775 copy 5.0.0\ucd\extracted\DerivedJoiningType.txt ..\unidata\
    776 copy 5.0.0\ucd\extracted\DerivedNumericValues.txt ..\unidata\
    777 copy 5.0.0\ucd\NormalizationCorrections.txt ..\unidata\
    778 copy 5.0.0\ucd\PropertyAliases.txt ..\unidata\
    779 copy 5.0.0\ucd\PropertyValueAliases.txt ..\unidata\
    780 copy 5.0.0\ucd\SpecialCasing.txt ..\unidata\
    781 copy 5.0.0\ucd\UnicodeData.txt ..\unidata\
    782 
    783 ucdstrip < 5.0.0\ucd\DerivedCoreProperties.txt > ..\unidata\DerivedCoreProperties.txt
    784 ucdstrip < 5.0.0\ucd\DerivedNormalizationProps.txt > ..\unidata\DerivedNormalizationProps.txt
    785 ucdstrip < 5.0.0\ucd\NormalizationTest.txt > ..\unidata\NormalizationTest.txt
    786 ucdstrip < 5.0.0\ucd\PropList.txt > ..\unidata\PropList.txt
    787 ucdstrip < 5.0.0\ucd\Scripts.txt > ..\unidata\Scripts.txt
    788 ucdstrip < 5.0.0\ucd\auxiliary\GraphemeBreakProperty.txt > ..\unidata\GraphemeBreakProperty.txt
    789 ucdstrip < 5.0.0\ucd\auxiliary\SentenceBreakProperty.txt > ..\unidata\SentenceBreakProperty.txt
    790 ucdstrip < 5.0.0\ucd\auxiliary\WordBreakProperty.txt > ..\unidata\WordBreakProperty.txt
    791 ucdstrip < 5.0.0\ucd\EastAsianWidth.txt | ucdmerge > ..\unidata\EastAsianWidth.txt
    792 ucdstrip < 5.0.0\ucd\LineBreak.txt | ucdmerge > ..\unidata\LineBreak.txt
    793 
    794 * update FractionalUCA.txt and UCARules.txt with new canonical closure
    795 
    796 * genpname
    797 - run preparse.pl
    798   + make sure that data.h is writable
    799   + perl preparse.pl \cvs\oss\icu > out.txt
    800 
    801 * uchar.h & uscript.h & uprops.h & uprops.c & genprops
    802 - new block & script values
    803   + script values already added in ICU 3.6 because all of ISO 15924 is now covered
    804 
    805 * build Unicode data source code for hardcoding core data
    806 C:\cvs\oss\icu\source\data>NMAKE /f makedata.mak ICUMAKE=\cvs\oss\icu\source\data\ CFG=debug uni-core-data
    807 
    808 ICU data make path is \cvs\oss\icu\source\data\
    809 ICU root path is \cvs\oss\icu
    810 Information: cannot find "ucmlocal.mk". Not building user-additional converter files.
    811 [etc.]
    812 Creating data file for Unicode Character Properties
    813 Creating data file for Unicode Case Mapping Properties
    814 Creating data file for Unicode BiDi/Shaping Properties
    815 Creating data file for Unicode Normalization
    816 Unicode .icu files built to "\cvs\oss\icu\source\data\out\build\icudt35l"
    817 Unicode .c source files built to "\cvs\oss\icu\source\data\out\tmp"
    818 
    819 - copy the .c source files to C:\cvs\oss\icu\source\common
    820   and rebuild the common library
    821 
    822 *** Unicode version numbers
    823 - makedata.mak
    824 - uchar.h
    825 - configure.in
    826 
    827 *** LayoutEngine script information
    828 * Run ICU4J com.ibm.icu.dev.tool.layout.ScriptNameBuilder. This generates LEScripts.h, LELanguage.h,
    829 ScriptAndLanguageTags.h and ScriptAndLanguageTags.cpp in the working directory. (It also generates
    830 ScriptRunData.cpp, which is no longer needed.)
    831 
    832 The generated files have a current copyright date and "@draft" statement.
    833 
    834 * copy the above files into <icu>/source/layout, replacing the old files.
    835 
    836 Add new default entries to the indicClassTables array in <icu>/source/layout/IndicClassTables.cpp
    837 and the complexTable array in <icu>/source/layoutex/ParagraphLayout.cpp. (This step should be automated...)
    838 
    839 * rebuild the layout and layoutex libraries.
    840 
    841 ---------------------------------------------------------------------------- ***
    842 
    843 Unicode 4.1 update
    844 
    845 *** related Jitterbugs
    846 
    847 4332 RFE: Update to Unicode 4.1
    848 4157 RBBI, TR29 4.1 updates
    849 
    850 *** data files & enums & parser code
    851 
    852 * file preparation
    853 - ucdstrip:
    854     DerivedCoreProperties.txt
    855     DerivedNormalizationProps.txt
    856     NormalizationTest.txt
    857     GraphemeBreakProperty.txt
    858     SentenceBreakProperty.txt
    859     WordBreakProperty.txt
    860 - ucdstrip and ucdmerge:
    861     EastAsianWidth.txt
    862     LineBreak.txt
    863 
    864 * add new files to the repository
    865     GraphemeBreakProperty.txt
    866     SentenceBreakProperty.txt
    867     WordBreakProperty.txt
    868 
    869 * update FractionalUCA.txt and UCARules.txt with new canonical closure
    870 
    871 * genpname
    872 - handle new enumerated properties in sub read_uchar
    873 - run preparse.pl
    874 
    875 * uchar.h & uscript.h & uprops.h & uprops.c & genprops
    876 - new binary properties
    877   + Pattern_Syntax
    878   + Pattern_White_Space
    879 - new enumerated properties
    880   + Grapheme_Cluster_Break
    881   + Sentence_Break
    882   + Word_Break
    883 - new block & script & line break values
    884 
    885 * gencase
    886 - case-ignorable changes
    887   see http://www.unicode.org/versions/Unicode4.1.0/#CaseMods
    888   now: (D47a) Word_Break=MidLetter or Mn, Me, Cf, Lm, Sk
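
The case-ignorable condition quoted above (Word_Break=MidLetter, or general category Mn, Me, Cf, Lm or Sk) can be written as a small predicate. This is a sketch, not gencase code. Python's unicodedata module reflects whatever Unicode version Python ships with rather than 4.1, it has no Word_Break data, and SAMPLE_MIDLETTER is a tiny illustrative subset chosen for this example. A real implementation would read WordBreakProperty.txt.

```python
# Sketch of the D47a case-ignorable test from the notes above.
import unicodedata

CASE_IGNORABLE_CATEGORIES = {"Mn", "Me", "Cf", "Lm", "Sk"}

# Illustrative subset only (assumption); real data: WordBreakProperty.txt.
SAMPLE_MIDLETTER = {0x00B7, 0x05F4, 0x2027}


def is_case_ignorable(char):
    """True if char is Word_Break=MidLetter (sample set) or Mn/Me/Cf/Lm/Sk."""
    if ord(char) in SAMPLE_MIDLETTER:
        return True
    return unicodedata.category(char) in CASE_IGNORABLE_CATEGORIES


if __name__ == "__main__":
    for char in ("a", "\u0301", "\u00b7", "\u02b0", "^"):
        print("U+%04X" % ord(char), is_case_ignorable(char))
```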
    889 
    890 *** Unicode version numbers
    891 - makedata.mak
    892 - uchar.h
    893 - configure.in
    894 
    895 *** tests
    896 - verify that u_charMirror() round-trips
    897 - test all new properties and some new values of old properties
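
The round-trip check can be tried on the data file alone before testing the API. The sketch below assumes BidiMirroring.txt lines of the form "0028; 0029 # LEFT PARENTHESIS" and models a mirror function that returns the character itself when there is no mapping. It does not call u_charMirror(), and the function names are hypothetical.

```python
# Sketch: verify that mirror(mirror(c)) == c for every mapped code point.

def parse_bidi_mirroring(text):
    """Return {code_point: mirrored_code_point} from BidiMirroring.txt text."""
    mapping = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()   # drop comments
        if not line:
            continue
        fields = [field.strip() for field in line.split(";")]
        mapping[int(fields[0], 16)] = int(fields[1], 16)
    return mapping


def mirror(mapping, code_point):
    """Mirrored code point, or the code point itself if it has no mapping."""
    return mapping.get(code_point, code_point)


def non_round_tripping(mapping):
    """Return sorted code points c where mirror(mirror(c)) != c."""
    return sorted(c for c in mapping
                  if mirror(mapping, mirror(mapping, c)) != c)


if __name__ == "__main__":
    # Made-up excerpt: the reverse mapping for 003C is deliberately missing.
    sample = ("0028; 0029 # LEFT PARENTHESIS\n"
              "0029; 0028 # RIGHT PARENTHESIS\n"
              "003C; 003E # LESS-THAN SIGN\n")
    bad = non_round_tripping(parse_bidi_mirroring(sample))
    print(["U+%04X" % c for c in bad])
```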
    898 
    899 *** other code
    900 
    901 * hardcoded Unihan range end/limit
    902 - Unihan range end moves from 9FA5 to 9FBB
    903   search for both 9FA5 (end) and 9FA6 (limit) (regex 9FA[56], case-insensitive)
    904   + do not modify BOCU/BOCSU code because that would change the encoding
    905     and break binary compatibility!
    906   + similarly, do not change the GB 18030 range data (ucnvmbcs.c),
    907     NamePrepProfile.txt
    908   + ignore trietest.c: test data is arbitrary
    909   + ignore tstnorm.cpp: test optimization, not important
    910   + ignore collation: 9FA[56] only appears in comments; swapCJK() uses the whole block up to 9FFF
    911   + do change line_th.txt and word_th.txt
    912     by replacing hardcoded ranges with the new property values
    913   + do change gennames.c
    914 
    915 source\data\brkitr\line_th.txt(229):        \u33E0-\u33FE \u3400-\u4DB5 \u4E00-\u9FA5 \uA000-\uA48C \uA490-\uA4C6
    916 source\data\brkitr\word_th.txt(23):        \u33E0-\u33FE \u3400-\u4DB5 \u4E00-\u9FA5 \uA000-\uA48C \uA490-\uA4C6
    917 source\tools\gennames\gennames.c(971):        0x4e00, 0x9fa5,
    918 
    919 * case mappings
    920 - compare new special casing context conditions with previous ones
    921   see http://www.unicode.org/versions/Unicode4.1.0/#CaseMods
    922 
    923 * genpname
    924 - consider storing only the short name if it is the same as the long name
    925 
    926 *** other reviews
    927 - UAX #29 changes (grapheme/word/sentence breaks)
    928 - UAX #14 changes (line breaks)
    929 - Pattern_Syntax & Pattern_White_Space
    930 
    931 ---------------------------------------------------------------------------- ***
    932 
    933 Unicode 4.0.1 update
    934 
    935 *** related Jitterbugs
    936 
    937 3170 RFE: Update to Unicode 4.0.1
    938 3171 Add new Unicode 4.0.1 properties
    939 3520 use Unicode 4.0.1 updates for break iteration
    940 
    941 *** data files & enums & parser code
    942 
    943 * file preparation
    944 - ucdstrip: DerivedNormalizationProps.txt, NormalizationTest.txt, DerivedCoreProperties.txt
    945 - ucdstrip and ucdmerge: EastAsianWidth.txt, LineBreak.txt
    946 
    947 * file fixes
    948 - fix UnicodeData.txt general categories of Ethiopic digits Nd->No
    949   according to PRI #26
    950   http://www.unicode.org/review/resolved-pri.html#pri26
    951 - undone again because no corrigendum in sight;
    952   instead modified tests to not check consistency on this for Unicode 4.0.1
    953 
    954 * ucdterms.txt
    955 - update from http://www.unicode.org/copyright.html
    956   formatted for plain text
    957 
    958 * uchar.h & uprops.h & uprops.c & genprops
    959 - add UBLOCK_CYRILLIC_SUPPLEMENT because the block is renamed
    960 - add U_LB_INSEPARABLE due to a spelling fix
    961   + put short name comment only on line with new constant
    962     for genpname perl script parser
    963 - new binary properties
    964   + STerm
    965   + Variation_Selector
    966 
    967 * genpname
    968 - fix genpname perl script so that it doesn't choke on more than 2 names per property value
    969 - perl script: correctly calculate the maximum number of fields per row
    970 
    971 * uscript.h
    972 - new script code Hrkt=Katakana_Or_Hiragana
    973 
    974 * gennorm.c: track changes in DerivedNormalizationProps.txt
    975 - "FNC" -> "FC_NFKC"
    976 - single field "NFD_NO" -> two fields "NFD_QC; N" etc.
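
The two format changes tracked in gennorm.c can be pictured as a mapping from old-style property fields to new-style ones. The sketch below is not the gennorm.c parser. upgrade_fields() is a hypothetical helper, and the "_MAYBE" to "M" case is an assumption about what the "etc." above covers.

```python
# Sketch: map old DerivedNormalizationProps.txt property fields to the
# Unicode 4.0.1 style ("FNC" -> "FC_NFKC", "NFD_NO" -> "NFD_QC; N").

def upgrade_fields(fields):
    """Return new-style fields for a list of old-style property fields."""
    name = fields[0]
    if name == "FNC":
        return ["FC_NFKC"] + fields[1:]
    # "_NO" comes from the notes; "_MAYBE" is an assumed sibling case.
    for suffix, value in (("_NO", "N"), ("_MAYBE", "M")):
        if name.endswith(suffix):
            return [name[:-len(suffix)] + "_QC", value] + fields[1:]
    return list(fields)


if __name__ == "__main__":
    print("; ".join(upgrade_fields(["NFD_NO"])))
    print("; ".join(upgrade_fields(["FNC", "0020 0308"])))
```

Fields that are already in the new form pass through unchanged, so a parser built this way could accept both file versions.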
    977 
    978 * genprops/props2.c track changes in DerivedNumericValues.txt
    979 - changed from 3 columns to 2, dropping the numeric type
    980   + assume that the type is always numeric for Han characters,
    981     and that only those are added in addition to what UnicodeData.txt lists
    982 
    983 *** Unicode version numbers
    984 - makedata.mak
    985 - uchar.h
    986 - configure.in
    987 
    988 *** tests
    989 - update test of default bidi classes according to PRI #28
    990   /tsutil/cucdtst/TestUnicodeData
    991   http://www.unicode.org/review/resolved-pri.html#pri28
    992 - bidi tests: change exemplar character for ES depending on Unicode version
    993 - change hardcoded expected property values where they change
    994 
    995 *** other code
    996 
    997 * name matching
    998 - read UCD.html
    999 
   1000 * scripts
   1001 - use new Hrkt=Katakana_Or_Hiragana
   1002 
   1003 * ZWJ & ZWNJ
   1004 - are now part of combining character sequences
   1005 - break iteration used to assume that LB classes did not overlap; now they do for ZWJ & ZWNJ
   1006