      1 * Copyright (C) 2004-2010, International Business Machines
      2 * Corporation and others.  All Rights Reserved.
      3 *
      4 *   file name:  changes.txt
      5 *   encoding:   US-ASCII
      6 *   tab size:   8 (not used)
      7 *   indentation:4
      8 *
      9 *   created on: 2004may06
     10 *   created by: Markus W. Scherer
     11 *
     12 * change log for Unicode updates
     13 
     14 ---------------------------------------------------------------------------- ***
     15 
     16 Unicode 6.0 update
     17 
     18 *** related ICU Trac tickets
     19 
     20 7264 Unicode 6.0 Update
     21 
     22 *** Unicode version numbers
     23 - makedata.mak
     24 - uchar.h
     25   (configure.in & configure have been modified to extract the version from uchar.h)
     26 - com.ibm.icu.util.VersionInfo
     27 
     28 *** data files & enums & parser code
     29 
     30 * file preparation
     31 
     32 ~/svn.icu/tools/trunk/src/unicode/c/genprops/misc$ ./ucdcopy.py ~/uni60/20100720/ucd ~/uni60/processed
     33 - This now prepares both unidata and testdata files in respective output subfolders.
     34 
     35 * PropertyAliases.txt changes
     36 - new Script_Extensions property defined in the new ScriptExtensions.txt file
     37   but not listed in PropertyAliases.txt; reported to unicode.org;
     38   -> added to tools/trunk/src/unicode/c/genpname/SyntheticPropertyAliases.txt
     39     scx; Script_Extensions
     40   -> uchar.h with new UProperty section
     41   -> com.ibm.icu.lang.UProperty, parallel with uchar.h
     42 
     43 * PropertyValueAliases.txt changes
     44 - 12 new block names:
     45   Alchemical_Symbols
     46   Bamum_Supplement
     47   Batak
     48   Brahmi
     49   CJK_Unified_Ideographs_Extension_D
     50   Emoticons
     51   Ethiopic_Extended_A
     52   Kana_Supplement
     53   Mandaic
     54   Miscellaneous_Symbols_And_Pictographs
     55   Playing_Cards
     56   Transport_And_Map_Symbols
     57   -> add to uchar.h
     58   -> add to UCharacter.UnicodeBlock
     59     Eclipse find     UBLOCK_([^ ]+) = [0-9]+, (/.+)
     60             replace  public static final UnicodeBlock \1 = new UnicodeBlock("\1", \1_ID); \2
     61 - Joining_Group (jg) values:
     62   Teh_Marbuta_Goal becomes the new canonical value for the old Hamza_On_Heh_Goal which becomes an alias
     63   -> uchar.h & UCharacter.JoiningGroup
     64 - 3 new scripts:
     65   sc ; Batk      ; Batak
     66   sc ; Brah      ; Brahmi
     67   sc ; Mand      ; Mandaic
     68   -> remove these from SyntheticPropertyValueAliases.txt
     69   -> add alias USCRIPT_MANDAIC to USCRIPT_MANDAEAN
     70   -> fix expectedLong names in cucdapi.c/TestUScriptCodeAPI()
     71       and in com.ibm.icu.dev.test.lang.TestUScript.java
     72 - 13 new script codes from ISO 15924 http://www.unicode.org/iso15924/codechanges.html
     73   (added 2009-11-11..2010-07-18)
     74   Bass        259     Bassa Vah
     75   Dupl        755     Duployan shorthand
     76   Elba        226     Elbasan
     77   Gran        343     Grantha
     78   Kpel        436     Kpelle
     79   Loma        437     Loma
     80   Mend        438     Mende
     81   Merc        101     Meroitic Cursive
     82   Narb        106     Old North Arabian
     83   Nbat        159     Nabataean
     84   Palm        126     Palmyrene
     85   Sind        318     Sindhi
     86   Wara        262     Warang Citi
     87   -> uscript.h
     88   -> com.ibm.icu.lang.UScript
     89     find     USCRIPT_([^ ]+) *= ([0-9]+),(.+)
     90     replace  public static final int \1 = \2;\3
     91   -> SyntheticPropertyValueAliases.txt
     92   -> add to expectedLong and expectedShort names in cintltst/cucdapi.c/TestUScriptCodeAPI()
     93       and in com.ibm.icu.dev.test.lang.TestUScript.java
     94 - ISO 15924 name change
     95   Mero        100     Meroitic Hieroglyphs (was Meroitic)
     96   -> add new alias USCRIPT_MEROITIC_HIEROGLYPHS to USCRIPT_MEROITIC
     97 - property value alias added for Cham, was already moved out of SyntheticPropertyValueAliases.txt
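
The two find/replace recipes above (UBLOCK_ to UCharacter.UnicodeBlock, USCRIPT_ to UScript) can also be applied with a script. A minimal sketch, assuming the patterns are used exactly as written in these notes; the function names and the sample input lines are made up for illustration and are not part of any ICU tool:

```python
import re

# Patterns copied from the notes above. Python's re module accepts the
# same \1-style backreferences as the Eclipse find/replace dialog.
UBLOCK_FIND = re.compile(r"UBLOCK_([^ ]+) = [0-9]+, (/.+)")
UBLOCK_REPLACE = r'public static final UnicodeBlock \1 = new UnicodeBlock("\1", \1_ID); \2'

USCRIPT_FIND = re.compile(r"USCRIPT_([^ ]+) *= ([0-9]+),(.+)")
USCRIPT_REPLACE = r"public static final int \1 = \2;\3"


def block_to_java(c_line):
    """Rewrite one uchar.h block enum line as a Java UnicodeBlock constant."""
    return UBLOCK_FIND.sub(UBLOCK_REPLACE, c_line)


def script_to_java(c_line):
    """Rewrite one uscript.h enum line as a Java int constant."""
    return USCRIPT_FIND.sub(USCRIPT_REPLACE, c_line)


if __name__ == "__main__":
    # Illustrative input lines; the numeric values are placeholders.
    print(block_to_java("    UBLOCK_BATAK = 199, /*[1BC0]*/"))
    print(script_to_java("USCRIPT_BASSA_VAH = 134,/* Bass */"))
```

Lines that do not match a pattern pass through unchanged, so a whole enum section can be fed through line by line and the result reviewed by hand.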
     98 
     99 * UnicodeData.txt changes
    100 - new CJK block:
    101   2B740;<CJK Ideograph Extension D, First>;Lo;0;L;;;;;N;;;;;
    102   2B81D;<CJK Ideograph Extension D, Last>;Lo;0;L;;;;;N;;;;;
    103   -> add to tools/trunk/src/unicode/c/gennames/gennames.c, with new ucdVersion
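
To spot new algorithmic ranges like the one above, the First/Last line pairs in UnicodeData.txt can be collected mechanically. A rough sketch, assuming only the semicolon-separated layout shown in the two lines quoted above; algorithmic_ranges() is a hypothetical helper, not gennames code:

```python
def algorithmic_ranges(lines):
    """Pair '<Name, First>' and '<Name, Last>' lines into (name, start, end)."""
    ranges = []
    pending = None  # (name, start) of a First line still waiting for its Last line
    for line in lines:
        fields = line.strip().split(";")
        if len(fields) < 2:
            continue  # blank or malformed line
        code, name = int(fields[0], 16), fields[1]
        if name.endswith(", First>"):
            # Drop the leading '<' and the trailing ', First>'.
            pending = (name[1:-len(", First>")], code)
        elif name.endswith(", Last>") and pending is not None:
            ranges.append((pending[0], pending[1], code))
            pending = None
    return ranges


if __name__ == "__main__":
    sample = [
        "2B740;<CJK Ideograph Extension D, First>;Lo;0;L;;;;;N;;;;;",
        "2B81D;<CJK Ideograph Extension D, Last>;Lo;0;L;;;;;N;;;;;",
    ]
    for name, start, end in algorithmic_ranges(sample):
        print("%s: %04X..%04X (%d code points)" % (name, start, end, end - start + 1))
```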
    104 
    105 * build Unicode tools using CMake+make
    106 
    107 * run genpname/preparse.pl (on Linux)
    108   + cd ~/svn.icu/tools/trunk/src/unicode/c/genpname
    109   + make sure that data.h is writable
    110   + perl preparse.pl ~/svn.icu/trunk/src > out.txt
    111   + preparse.pl shows no errors, out.txt Info and Warning lines look ok
    112 
    113 * rebuild Unicode tools (at least genpname) using make
    114 - You might first need to "make install" ICU so that the tools build can pick
    115   up the new definitions from the installed header files.
    116 
    117 * run genpname
    118 - ~/svn.icu/tools/trunk/bld/unicode$ c/genpname/genpname -v -d ~/svn.icu/trunk/src/source/data/in
    119 - rebuild ICU & tools
    120 
    121 * update source/data/unidata/norm2/nfkc_cf.txt
    122 - follow the instructions in nfkc_cf.txt for updating it from DerivedNormalizationProps.txt
    123 
    124 * update source/data/unidata/norm2/uts46.txt
    125 - download http://www.unicode.org/Public/idna/6.0.0/IdnaMappingTable.txt
    126   to ~/svn.icu/tools/trunk/src/unicode/py
    127 - adjust idna2nrm.py to handle new disallowed_STD3_valid and disallowed_STD3_mapped values
    128 - ~/svn.icu/tools/trunk/src/unicode/py$ ./idna2nrm.py
    129 - ~/svn.icu/tools/trunk/src/unicode/py$ cp uts46.txt ~/svn.icu/trunk/src/source/data/unidata/norm2
    130 
    131 * update uts46test.cpp and UTS46Test.java if there are new characters that are equivalent to
    132   sequences with non-LDH ASCII (that is, their decompositions contain '=' or similar)
    133 - grep IdnaMappingTable.txt or uts46.txt for "disallowed_STD3_valid" on non-ASCII characters
    134 - Unicode 6.0: U+2260, U+226E, U+226F
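
The grep step above can be narrowed to non-ASCII code points with a few lines of Python. This is only a sketch: it assumes the usual UCD-style layout (code point or range, semicolon, status, optional '#' comment), the function name is made up, and the sample lines are illustrative rather than copied from the real file:

```python
def non_ascii_std3_valid(lines):
    """Return (start, end) ranges with status disallowed_STD3_valid above U+007F."""
    hits = []
    for line in lines:
        data = line.split("#", 1)[0].strip()  # drop the trailing comment
        if not data:
            continue
        fields = [f.strip() for f in data.split(";")]
        if len(fields) < 2 or fields[1] != "disallowed_STD3_valid":
            continue
        first, _, last = fields[0].partition("..")
        start = int(first, 16)
        end = int(last, 16) if last else start
        if end > 0x7F:
            hits.append((start, end))
    return hits


if __name__ == "__main__":
    # Illustrative sample in the assumed format.
    sample = [
        "# comment line",
        "0000..002C    ; disallowed_STD3_valid  # ASCII, filtered out",
        "0041          ; mapped ; 0061          # other status, filtered out",
        "2260          ; disallowed_STD3_valid  # NOT EQUAL TO",
        "226E..226F    ; disallowed_STD3_valid  # NOT LESS-THAN..NOT GREATER-THAN",
    ]
    for start, end in non_ascii_std3_valid(sample):
        print("U+%04X..U+%04X" % (start, end))
```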
    135 
    136 * generate core properties data files
    137 - ~/svn.icu/tools/trunk/src/unicode$ ./makeprops.sh ~/svn.icu/trunk/src ~/svn.icu/trunk/bld
    138 - rebuild ICU & tools
    139 - run makeuca.sh so that genuca picks up the new nfc.nrm:
    140   ~/svn.icu/tools/trunk/src/unicode$ ./makeuca.sh ~/svn.icu/trunk/src ~/svn.icu/trunk/bld
    141 - rebuild ICU & tools
    142 
    143 * implement new Script_Extensions property (provisional)
    144 - parser & generator: genprops & uprops.icu
    145 - uscript.h, uprops.h, uchar.c, uniset_props.cpp and others, plus cintltst/cucdapi.c & intltest/usettest.cpp
    146 - UScript.java, UCharacterProperty.java, UnicodeSet.java, TestUScript.java, UnicodeSetTest.java
    147 
    148 * switch ubidi.icu, ucase.icu and uprops.icu from UTrie to UTrie2
    149 - (one-time change)
    150 - genbidi/gencase/genprops tools changes
    151 - re-run makeprops.sh (see above)
    152 - UCharacterProperty.java, UCharacterTypeIterator.java,
    153   UBiDiProps.java, UCaseProps.java, and several others with minor changes;
    154   UCharacterPropertyReader.java deleted and its code folded into UCharacterProperty.java
    155 
    156 * update Java data files
    157 - refresh just the UCD-related files, just to be safe
    158 - see (ICU4C)/source/data/icu4j-readme.txt
    159 - mkdir /tmp/icu4j
    160 - ~/svn.icu/trunk/bld$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
    161   output:
    162     ...
    163     Unicode .icu files built to ./out/build/icudt45l
    164     mkdir -p ./out/icu4j/com/ibm/icu/impl/data/icudt45b
    165     echo ubidi.icu ucase.icu uprops.icu > ./out/icu4j/add.txt
    166     LD_LIBRARY_PATH=../lib:../stubdata:../tools/ctestfw:$LD_LIBRARY_PATH  ../bin/icupkg ./out/tmp/icudt45l.dat ./out/icu4j/icudt45b.dat -a ./out/icu4j/add.txt -s ./out/build/icudt45l -x '*' -tb -d ./out/icu4j/com/ibm/icu/impl/data/icudt45b
    167     jar cf ./out/icu4j/icudata.jar -C ./out/icu4j com/ibm/icu/impl/data/icudt45b
    168     mkdir -p /tmp/icu4j/main/shared/data
    169     cp ./out/icu4j/icudata.jar /tmp/icu4j/main/shared/data
    170 - copy the big-endian Unicode data files to another location,
    171   separate from the other data files
    172     mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/icudt45b/coll
    173     mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/icudt45b/brkitr
    174     ~/svn.icu/trunk/bld/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt45b/*.icu /tmp/icu4j/com/ibm/icu/impl/data/icudt45b
    175     ~/svn.icu/trunk/bld/data/out/icu4j$ rm /tmp/icu4j/com/ibm/icu/impl/data/icudt45b/cnvalias.icu
    176     ~/svn.icu/trunk/bld/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt45b/*.nrm /tmp/icu4j/com/ibm/icu/impl/data/icudt45b
    177     ~/svn.icu/trunk/bld/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt45b/coll/*.icu /tmp/icu4j/com/ibm/icu/impl/data/icudt45b/coll
    178     ~/svn.icu/trunk/bld/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt45b/brkitr/* /tmp/icu4j/com/ibm/icu/impl/data/icudt45b/brkitr
    179 - refresh ICU4J
    180     ~/svn.icu/trunk/bld/data/out/icu4j$ jar uf ~/svn.icu4j/trunk/src/main/shared/data/icudata.jar -C /tmp/icu4j com/ibm/icu/impl/data/icudt45b
    181 
    182 * refresh Java test .txt files
    183 - copy new .txt files into ICU4J's main/tests/core/src/com/ibm/icu/dev/data/unicode
    184 
    185 * un-hardcode normalization skippable (NF*_Inert) test data
    186 - removes one manual step from the Unicode upgrade, and removes dependency on one of Mark's tools
    187 
    188 * copy updated break iterator test files
    189 - now handled by the ucdcopy.py run during file preparation and by
    190   copying the uni60/processed/testdata files to ~/svn.icu/trunk/src/source/test/testdata
    191   (old instructions:
    192    copy from (Unicode 6.0)/ucd/auxiliary/*BreakTest-6....txt
    193    to ~/svn.icu/trunk/src/source/test/testdata)
    194 - they are not used in ICU4J
    195 
    196 * UCA
    197 
    198 - get output from Mark's tools; look in
    199     http://www.unicode.org/~book/incoming/mark/uca6.0.0/
    200     http://www.macchiato.com/unicode/utc/additional-uca-files
    201     http://www.unicode.org/Public/UCA/6.0.0/
    202     http://www.unicode.org/~mdavis/uca/
    203 - update source/data/unidata/FractionalUCA.txt with FractionalUCA_SHORT.txt
    204 - update source/data/unidata/UCARules.txt with UCA_Rules_SHORT.txt
    205 - update Han-implicit ranges for new CJK extensions:
    206   swapCJK() in ucol.cpp & ImplicitCEGenerator.java
    207 - genuca: allow bytes 02 for U+FFFE, new merge-sort character;
    208   do not add it into invuca so that tailoring primary-after an ignorable works
    209 - genuca: permit space between [variable top] bytes
    210 - ucol.cpp: treat noncharacters like unassigned rather than ignorable
    211 - run makeuca.sh:
    212   ~/svn.icu/tools/trunk/src/unicode$ ./makeuca.sh ~/svn.icu/trunk/src ~/svn.icu/trunk/bld
    213 - rebuild ICU4C
    214 - refresh ICU4J collation data:
    215   (subset of instructions above for properties data refresh, except copies all coll/*)
    216     ~/svn.icu/trunk/bld$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
    217     mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/icudt45b/coll
    218     ~/svn.icu/trunk/bld/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt45b/coll/* /tmp/icu4j/com/ibm/icu/impl/data/icudt45b/coll
    219     ~/svn.icu/trunk/bld/data/out/icu4j$ jar uf ~/svn.icu4j/trunk/src/main/shared/data/icudata.jar -C /tmp/icu4j com/ibm/icu/impl/data/icudt45b
    220 - update (ICU)/source/test/testdata/CollationTest_*.txt
    221   and (ICU4J)/main/tests/collate/src/com/ibm/icu/dev/data/CollationTest_*.txt
    222   with output from Mark's Unicode tools
    223 - run all tests with the *_SHORT.txt or the full files (the full ones have comments)
    224 - note on intltest: if collate/UCAConformanceTest fails, then
    225   utility/MultithreadTest/TestCollators will fail as well;
    226   fix the conformance test before looking into the multi-thread test
    227 
    228 * When refreshing all of ICU4J data from ICU4C
    229 - ~/svn.icu/trunk/bld$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
    230 - cp /tmp/icu4j/main/shared/data/icudata.jar ~/svn.icu4j/trunk/src/main/shared/data
    231 or
    232 - ~/svn.icu/trunk/bld$ make ICU4J_ROOT=~/svn.icu4j/trunk/src icu4j-data-install
    233 
    234 *** LayoutEngine script information
    235 
    236 (For details see the Unicode 5.2 change log below.)
    237 
    238 * Run ICU4J com.ibm.icu.dev.tool.layout.ScriptNameBuilder. This generates LEScripts.h, LELanguages.h,
    239 ScriptAndLanguageTags.h and ScriptAndLanguageTags.cpp in the working directory. (It also generates
    240 ScriptRunData.cpp, which is no longer needed.)
    241 
    242 The generated files have a current copyright date and "@draft" statement.
    243 
    244 * copy the above files into <icu>/source/layout, replacing the old files.
    245 * fix mixed line endings
    246 * review the diffs and fix incorrect @draft and missing aliases;
    247   Unicode-derived script codes should be "born stable" like constants in uchar.h, uscript.h etc.
    248 * manually re-add the "Indic script xyz v.2" tags in ScriptAndLanguageTags.h
    249 
    250 ---------------------------------------------------------------------------- ***
    251 
    252 Unicode 5.2 update
    253 
    254 *** related ICU Trac tickets
    255 
    256 7084 Unicode 5.2
    257 
    258 7167 verify collation bytes
    259 7235 Java test NAME_ALIAS
    260 7236 Java DerivedCoreProperties.txt test
    261 7237 Java BidiTest.txt
    262 7238 UTrie2 in core unidata
    263 7239 test for tailoring gaps
    264 7240 Java fix CollationMiscTest
    265 7243 update layout engine for Unicode 5.2
    266 
    267 *** Unicode version numbers
    268 - makedata.mak
    269 - uchar.h
    270 - configure.in & configure
    271 - update ucdVersion in gennames.c if an algorithmic range changes
    272 
    273 *** data files & enums & parser code
    274 
    275 * file preparation
    276 
    277 python source\tools\genprops\misc\ucdcopy.py "C:\Documents and Settings\mscherer\My Documents\unicode\ucd\5.2.0" C:\svn\icuproj\icu\trunk\source\data\unidata
    278 - includes finding files regardless of version numbers,
    279   copying them, and performing the equivalent processing of the
    280   ucdstrip and ucdmerge tools on the desired set of files
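
"Finding files regardless of version numbers" could look roughly like the following. This is not the ucdcopy.py implementation, just a sketch under the assumption that versioned files are named Name-<major>.<minor>.<update>[d<draft>].<ext>; unversioned_name() is a hypothetical helper:

```python
import re

# Assumed naming scheme: Name-<major>.<minor>.<update>[d<draft>].<ext>
_VERSIONED = re.compile(
    r"^(?P<base>.+?)-\d+\.\d+\.\d+(?:d\d+)?(?P<ext>\.[A-Za-z0-9]+)$")


def unversioned_name(filename):
    """Strip a version suffix from a UCD file name, if there is one."""
    m = _VERSIONED.match(filename)
    return m.group("base") + m.group("ext") if m else filename


if __name__ == "__main__":
    for name in ("UnicodeData-5.2.0.txt", "LineBreak-5.2.0d11.txt", "Blocks.txt"):
        print(name, "->", unversioned_name(name))
```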
    281 
    282 * notes on changes
    283 - PropertyAliases.txt
    284   moved from numeric to enumerated:
    285     ccc       ; Canonical_Combining_Class
    286   new string properties:
    287     NFKC_CF   ; NFKC_Casefold
    288     Name_Alias; Name_Alias
    289   new binary properties:
    290     Cased     ; Cased
    291     CI        ; Case_Ignorable
    292     CWCF      ; Changes_When_Casefolded
    293     CWCM      ; Changes_When_Casemapped
    294     CWKCF     ; Changes_When_NFKC_Casefolded
    295     CWL       ; Changes_When_Lowercased
    296     CWT       ; Changes_When_Titlecased
    297     CWU       ; Changes_When_Uppercased
    298   new CJK Unihan properties (not supported by ICU)
    299 - PropertyValueAliases.txt
    300   new block names
    301   new scripts
    302   one script code change:
    303     sc ; Qaai      ; Inherited
    304     ->
    305     sc ; Zinh      ; Inherited                        ; Qaai
    306   new Line_Break (lb) value:
    307     lb ; CP        ; Close_Parenthesis
    308   new Joining_Group (jg) values: Farsi_Yeh, Nya
    309   other new values:
    310     ccc; 214; ATA  ; Attached_Above
    311 - DerivedBidiClass.txt
    312   new default-R range: U+1E800 - U+1EFFF
    313 - UnicodeData.txt
    314   all of the ISO comments are gone
    315   new CJK block end:
    316     9FC3;<CJK Ideograph, Last> -> 9FCB;<CJK Ideograph, Last>
    317   new CJK block:
    318     2A700;<CJK Ideograph Extension C, First>;Lo;0;L;;;;;N;;;;;
    319     2B734;<CJK Ideograph Extension C, Last>;Lo;0;L;;;;;N;;;;;
    320 
    321 * genpname
    322 - run preparse.pl
    323   + cd \svn\icuproj\icu\trunk\source\tools\genpname
    324   + make sure that data.h is writable
    325   + perl preparse.pl \svn\icuproj\icu\trunk > out.txt
    326   + preparse.pl complains with errors like the following:
    327       Error: sc:Egyp already set to Egyptian_Hieroglyphs, cannot set to Egyp at preparse.pl line 1322, <GEN6> line 34.
    328     This is because ICU 4.0 had scripts from ISO 15924 which are now
    329     added to Unicode 5.2, and the Perl script shows a conflict between SyntheticPropertyValueAliases.txt
    330     and PropertyValueAliases.txt.
    331     -> Removed duplicate script entries from SyntheticPropertyValueAliases.txt:
    332        Egyp, Java, Lana, Mtei, Orkh, Armi, Avst, Kthi, Phli, Prti, Samr, Tavt
    333   + preparse.pl complains with errors about block names missing from uchar.h; add them
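
The conflicting script entries can be listed before running preparse.pl. A small sketch, assuming both files use "sc ; Code ; Name" lines with optional '#' comments (an assumption for SyntheticPropertyValueAliases.txt); the function names are made up:

```python
def script_codes(lines):
    """Collect the short script codes from 'sc ; Code ; Name' lines."""
    codes = set()
    for line in lines:
        data = line.split("#", 1)[0]  # ignore comments
        fields = [f.strip() for f in data.split(";")]
        if len(fields) >= 3 and fields[0] == "sc":
            codes.add(fields[1])
    return codes


def duplicated_scripts(official_lines, synthetic_lines):
    """Script codes defined in both files; these should leave the synthetic file."""
    return sorted(script_codes(official_lines) & script_codes(synthetic_lines))


if __name__ == "__main__":
    # Illustrative excerpts, not the real file contents.
    official = [
        "sc ; Egyp      ; Egyptian_Hieroglyphs",
        "sc ; Java      ; Javanese",
        "lb ; CP        ; Close_Parenthesis",
    ]
    synthetic = ["sc ; Egyp ; Egyp", "sc ; Blis ; Blis"]
    print(duplicated_scripts(official, synthetic))
```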
    334 
    335 * uchar.h & uscript.h & uprops.h & uprops.c & genprops
    336 - new block & script values
    337   + 26 new blocks
    338     copy new blocks from Blocks.txt
    339     MS VC++ 2008 regular expression:
    340       find "^{[0-9A-F]+}\.\.{[0-9A-F]+}; {[A-Z].+}$"
    341       replace with "    UBLOCK_\3 = 172, /*[\1]*/"
    342   + several new script values already added in ICU 4.0 for ISO 15924 coverage
    343     (removed from SyntheticPropertyValueAliases.txt, see genpname notes above)
    344   + 3 new script values added for ISO 15924 and Unicode 5.2 coverage
    345   + 1 new script value added for ISO 15924 coverage (not in Unicode 5.2)
    346     (added to SyntheticPropertyValueAliases.txt)
    347 - new Joining Group (JG) values: Farsi_Yeh, Nya
    348 - new Line_Break (lb) value:
    349     lb ; CP        ; Close_Parenthesis
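
The Blocks.txt regular expression above leaves the block names in mixed case with spaces and assigns the same number to every line, so the result still needs hand editing. A sketch that does both steps in one pass; the constant-name normalization and the first_value parameter are assumptions for illustration, and the numbers printed are placeholders, not the real enum values:

```python
import re

# Same shape as the MS VC++ expression above, in Python syntax.
_BLOCK_LINE = re.compile(r"^([0-9A-F]+)\.\.([0-9A-F]+); ([A-Z].+)$")


def blocks_to_enum(lines, first_value):
    """Turn Blocks.txt lines into uchar.h-style UBLOCK_ enum entries."""
    out = []
    value = first_value
    for line in lines:
        m = _BLOCK_LINE.match(line.rstrip())
        if not m:
            continue  # comment, blank line, or unexpected format
        start, _end, name = m.groups()
        # Assumed naming rule: upper case, runs of other characters become '_'.
        constant = re.sub(r"[^A-Za-z0-9]+", "_", name).upper()
        out.append("    UBLOCK_%s = %d, /*[%s]*/" % (constant, value, start))
        value += 1
    return out


if __name__ == "__main__":
    sample = [
        "# Blocks.txt excerpt",
        "A960..A97F; Hangul Jamo Extended-A",
        "D7B0..D7FF; Hangul Jamo Extended-B",
    ]
    print("\n".join(blocks_to_enum(sample, 172)))
```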
    350 
    351 * hardcoded Unihan range end/limit
    352 - Unihan range end moves from 9FC3 to 9FCB
    353   search for both 9FC3 (end) and 9FC4 (limit) (regex 9FC[34], case-insensitive)
    354   + do change gennames.c
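
The search above is easy to script. A minimal sketch that reports every line mentioning the old end or limit value; find_hardcoded_limits() is a hypothetical helper and the sample text is made up:

```python
import re

# Old Unihan range end (9FC3) and limit (9FC4), matched case-insensitively.
_UNIHAN_END = re.compile(r"9FC[34]", re.IGNORECASE)


def find_hardcoded_limits(text):
    """Return (line_number, line) pairs that mention 9FC3 or 9FC4."""
    return [(n, line) for n, line in enumerate(text.splitlines(), 1)
            if _UNIHAN_END.search(line)]


if __name__ == "__main__":
    sample = "end = 0x9fc3;\nstart = 0x4e00;\nlimit = 0x9FC4;\n"
    for number, line in find_hardcoded_limits(sample):
        print("%d: %s" % (number, line))
```

A plain substring match like this also hits longer hex numbers that happen to contain the digits, so each hit still needs a look.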
    355 
    356 * Compare definitions of new binary properties with what we used to use
    357   in algorithms, to see if the definitions changed.
    358 - Verified that definitions for Cased and Case_Ignorable are unchanged.
    359   The gencase tool now parses the newly public Case_Ignorable values
    360   in case the definition changes in the future.
    361 
    362 * uchar.c & uprops.h & uprops.c & genprops
    363 - new numeric values that didn't exist in Unicode data before:
    364     1/7, 1/9, 1/10, 3/10, 1/16, 3/16
    365   the ones with denominators >9 cannot be supported by uprops.icu formatVersion 5,
    366   therefore redesign the encoding of numeric types and values for formatVersion 6;
    367   design for simple numbers up to at least 144 ("one gross"),
    368   large values up to at least 10^20,
    369   and fractions with numerators -1..17 and denominators 1..16
    370   to cover current and expected future values
    371   (e.g., more Han numeric values, Meroitic twelfths)
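
The ranges stated above can be checked against the new values. This sketch only tests the numerator and denominator limits given in these notes; it does not reproduce the actual formatVersion 6 bit encoding, and the function names are made up:

```python
from fractions import Fraction

# New numeric values listed above.
NEW_VALUES = ["1/7", "1/9", "1/10", "3/10", "1/16", "3/16"]


def fits_format_6(value):
    """True if the fraction is within numerators -1..17 and denominators 1..16."""
    f = Fraction(value)
    return -1 <= f.numerator <= 17 and 1 <= f.denominator <= 16


def needs_format_6(value):
    """True if the denominator is >9, which formatVersion 5 could not store."""
    return Fraction(value).denominator > 9


if __name__ == "__main__":
    for v in NEW_VALUES:
        print(v, "fits v6:", fits_format_6(v), "needs v6:", needs_format_6(v))
```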
    372 
    373 * reimplement Hangul_Syllable_Type for new Jamo characters
    374 - the old code assumed that all Jamo characters are in the 11xx block
    375 - Unicode 5.2 fills holes there and adds new Jamo characters in
    376     A960..A97F; Hangul Jamo Extended-A
    377   and in
    378     D7B0..D7FF; Hangul Jamo Extended-B
    379 - Hangul_Syllable_Type can be trivially derived from a subset of
    380   Grapheme_Cluster_Break values
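
The derivation amounts to a five-entry lookup. A sketch using the short property value names; this is not the ICU code, and hangul_syllable_type() is a name chosen for illustration:

```python
# Grapheme_Cluster_Break values that carry a Hangul_Syllable_Type.
_GCB_TO_HST = {"L": "L", "V": "V", "T": "T", "LV": "LV", "LVT": "LVT"}


def hangul_syllable_type(gcb):
    """Map a GCB short value name to the hst short name; NA for everything else."""
    return _GCB_TO_HST.get(gcb, "NA")


if __name__ == "__main__":
    for gcb in ("L", "LV", "LVT", "EX", "CR"):
        print(gcb, "->", hangul_syllable_type(gcb))
```

Because the lookup depends only on the stored GCB value, it needs no block ranges and keeps working when Jamo are added outside the 11xx block.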
    381 
    382 * build Unicode data source code for hardcoding core data
    383 C:\svn\icuproj\icu\trunk\source\data>NMAKE /f makedata.mak ICUMAKE=\svn\icuproj\icu\trunk\source\data\ CFG=x86\release uni-core-data
    384 
    385 ICU data make path is \svn\icuproj\icu\trunk\source\data\
    386 ICU root path is \svn\icuproj\icu\trunk
    387 Information: cannot find "ucmlocal.mk". Not building user-additional converter files.
    388 Information: cannot find "brklocal.mk". Not building user-additional break iterator files.
    389 Information: cannot find "reslocal.mk". Not building user-additional resource bundle files.
    390 Information: cannot find "collocal.mk". Not building user-additional resource bundle files.
    391 Information: cannot find "rbnflocal.mk". Not building user-additional resource bundle files.
    392 Information: cannot find "trnslocal.mk". Not building user-additional transliterator files.
    393 Information: cannot find "misclocal.mk". Not building user-additional miscellaenous files.
    394 Information: cannot find "spreplocal.mk". Not building user-additional stringprep files.
    395 Creating data file for Unicode Property Names
    396 Creating data file for Unicode Character Properties
    397 Creating data file for Unicode Case Mapping Properties
    398 Creating data file for Unicode BiDi/Shaping Properties
    399 Creating data file for Unicode Normalization
    400 Unicode .icu files built to "\svn\icuproj\icu\trunk\source\data\out\build\icudt43l"
    401 Unicode .c source files built to "\svn\icuproj\icu\trunk\source\data\out\tmp"
    402 
    403 - copy the .c source files to C:\svn\icuproj\icu\trunk\source\common
    404   and rebuild the common library
    405 
    406 *** UCA
    407 
    408 - update FractionalUCA.txt with new canonical closure (output from Mark's Unicode tools)
    409 - update source/data/unidata/UCARules.txt with UCA_Rules_SHORT.txt from Mark's Unicode tools
    410 - update source/test/testdata/CollationTest_*.txt with output from Mark's Unicode tools
    411 [ Begin obsolete instructions:
    412   Starting with UCA 5.2, we use the CollationTest_*_SHORT.txt files not the *_STUB.txt files.
    413     - generate the source/test/testdata/CollationTest_*_STUB.txt files via source/tools/genuca/genteststub.py
    414       on Windows:
    415         python C:\svn\icuproj\icu\trunk\source\tools\genuca\genteststub.py CollationTest_NON_IGNORABLE_SHORT.txt CollationTest_NON_IGNORABLE_STUB.txt
    416         python C:\svn\icuproj\icu\trunk\source\tools\genuca\genteststub.py CollationTest_SHIFTED_SHORT.txt CollationTest_SHIFTED_STUB.txt
    417   End obsolete instructions]
    418 - run all tests with the *_SHORT.txt or the full files (the full ones have comments),
    419   not just the *_STUB.txt files
    420 - note on intltest: if collate/UCAConformanceTest fails, then
    421   utility/MultithreadTest/TestCollators will fail as well;
    422   fix the conformance test before looking into the multi-thread test
    423 
    424 *** Implement Cased & Case_Ignorable properties
    425 - via UProperty; call ucase.h functions ucase_getType() and ucase_getTypeOrIgnorable()
    426 - Problem: These properties should be disjoint, but aren't
    427 - UTC 2009nov decision: skip all Case_Ignorable regardless of whether they are Cased or not
    428 - change ucase.icu to be able to store any combination of Cased and Case_Ignorable
    429 
    430 *** Implement Changes_When_Xyz properties
    431 - without stored data
    432 
    433 *** Implement Name_Alias property
    434 - add it as another name field in unames.icu
    435 - make it available via u_charName() and UCharNameChoice
    436 - consider it in u_charFromName()
    437 
    438 *** Break iterators
    439 
    440 * Update break iterator rules to new UAX versions and new property values
    441 * Update source/test/testdata/<boundary>Test.txt files from <unicode.org ucd>/ucd/auxiliary
    442 
    443 *** new BidiTest file
    444 - review format and data
    445 - copy BidiTest.txt to source/test/testdata
    446 - write test code using this data
    447 - fix ICU code where it fails the conformance test
    448 
    449 *** Java
    450 - generally, find and update code corresponding to C/C++
    451 - UCharacter.UnicodeBlock constants:
    452   a) add an _ID integer per new block, update COUNT
    453   b) add a class instance per new block
    454      Visual Studio regex:
    455         find            UBLOCK_{[^ ]+} = [0-9]+, {/.+}
    456         replace with    public static final UnicodeBlock \1 = new UnicodeBlock("\1", \1_ID); \2
    457 - CHAR_NAME_ALIAS -> UCharacter.getNameAlias() and getCharFromNameAlias()
    458 
    459 - port test changes to Java
    460 
    461 *** LayoutEngine script information
    462 
    463 (For comparison, see the Unicode 5.1 update: http://bugs.icu-project.org/trac/changeset/23833)
    464 
    465 * Run ICU4J com.ibm.icu.dev.tool.layout.ScriptNameBuilder. This generates LEScripts.h, LELanguages.h,
    466 ScriptAndLanguageTags.h and ScriptAndLanguageTags.cpp in the working directory. (It also generates
    467 ScriptRunData.cpp, which is no longer needed.)
    468 
    469 The generated files have a current copyright date and "@draft" statement.
    470 
    471 -> Eric Mader wrote in email on 20090930:
    472     "I think the tool has been modified to update @draft to @stable for
    473      older scripts and to add @draft for new scripts.
    474      (I worked with an intern on this last year.)
    475      You should check the output after you run it."
    476 
    477 * copy the above files into <icu>/source/layout, replacing the old files.
    478 * fix mixed line endings
    479 * review the diffs and fix incorrect @draft and missing aliases
    480 * manually re-add the "Indic script xyz v.2" tags in ScriptAndLanguageTags.h
    481 
    482 Add new default entries to the indicClassTables array in <icu>/source/layout/IndicClassTables.cpp
    483 and the complexTable array in <icu>/source/layoutex/ParagraphLayout.cpp. (This step should be automated...)
    484 
    485 -> Eric Mader wrote in email on 20090930:
    486     "This is just a matter of making sure that all the per-script tables have
    487      entries for any new scripts that were added.
    488      If any new Indic characters were added, then the class tables in
    489      IndicClassTables.cpp should be updated to reflect this.
    490      John Emmons should know how to do this if it's required."
    491 
    492 * rebuild the layout and layoutex libraries.
    493 
    494 *** Documentation
    495 - Update User Guide
    496   + Jamo_Short_Name, sfc->scf, binary property value aliases
    497 
    498 ---------------------------------------------------------------------------- ***
    499 
    500 Unicode 5.1 update
    501 
    502 *** related ICU Trac tickets
    503 
    504 5696 Update to Unicode 5.1
    505 
    506 *** Unicode version numbers
    507 - makedata.mak
    508 - uchar.h
    509 - configure.in & configure
    510 - update ucdVersion in gennames.c if an algorithmic range changes
    511 
    512 *** data files & enums & parser code
    513 
    514 * file preparation
    515 - ucdstrip:
    516     DerivedCoreProperties.txt
    517     DerivedNormalizationProps.txt
    518     NormalizationTest.txt
    519     PropList.txt
    520     Scripts.txt
    521     GraphemeBreakProperty.txt
    522     SentenceBreakProperty.txt
    523     WordBreakProperty.txt
    524 - ucdstrip and ucdmerge:
    525     EastAsianWidth.txt
    526     LineBreak.txt
    527 
    528 * my ucd2unidata.bat (needs to be updated each time with UCD and file version numbers)
    529 copy 5.1.0\ucd\BidiMirroring.txt ..\unidata\
    530 copy 5.1.0\ucd\Blocks.txt ..\unidata\
    531 copy 5.1.0\ucd\CaseFolding.txt ..\unidata\
    532 copy 5.1.0\ucd\DerivedAge.txt ..\unidata\
    533 copy 5.1.0\ucd\extracted\DerivedBidiClass.txt ..\unidata\
    534 copy 5.1.0\ucd\extracted\DerivedJoiningGroup.txt ..\unidata\
    535 copy 5.1.0\ucd\extracted\DerivedJoiningType.txt ..\unidata\
    536 copy 5.1.0\ucd\extracted\DerivedNumericValues.txt ..\unidata\
    537 copy 5.1.0\ucd\NormalizationCorrections.txt ..\unidata\
    538 copy 5.1.0\ucd\PropertyAliases.txt ..\unidata\
    539 copy 5.1.0\ucd\PropertyValueAliases.txt ..\unidata\
    540 copy 5.1.0\ucd\SpecialCasing.txt ..\unidata\
    541 copy 5.1.0\ucd\UnicodeData.txt ..\unidata\
    542 
    543 ucdstrip < 5.1.0\ucd\DerivedCoreProperties.txt > ..\unidata\DerivedCoreProperties.txt
    544 ucdstrip < 5.1.0\ucd\DerivedNormalizationProps.txt > ..\unidata\DerivedNormalizationProps.txt
    545 ucdstrip < 5.1.0\ucd\NormalizationTest.txt > ..\unidata\NormalizationTest.txt
    546 ucdstrip < 5.1.0\ucd\PropList.txt > ..\unidata\PropList.txt
    547 ucdstrip < 5.1.0\ucd\Scripts.txt > ..\unidata\Scripts.txt
    548 ucdstrip < 5.1.0\ucd\auxiliary\GraphemeBreakProperty.txt > ..\unidata\GraphemeBreakProperty.txt
    549 ucdstrip < 5.1.0\ucd\auxiliary\SentenceBreakProperty.txt > ..\unidata\SentenceBreakProperty.txt
    550 ucdstrip < 5.1.0\ucd\auxiliary\WordBreakProperty.txt > ..\unidata\WordBreakProperty.txt
    551 ucdstrip < 5.1.0\ucd\EastAsianWidth.txt | ucdmerge > ..\unidata\EastAsianWidth.txt
    552 ucdstrip < 5.1.0\ucd\LineBreak.txt | ucdmerge > ..\unidata\LineBreak.txt
    553 
    554 * genpname
    555 - run preparse.pl
    556   + cd \svn\icuproj\icu\uni51\source\tools\genpname
    557   + make sure that data.h is writable
    558   + perl preparse.pl \svn\icuproj\icu\uni51 > out.txt
    559   + preparse.pl complains with errors like the following:
    560       Error: sc:Cari already set to Carian, cannot set to Cari at preparse.pl line 1308, <GEN6> line 30.
    561     This is because ICU 3.8 had scripts from ISO 15924 which are now
    562     added to Unicode 5.1, and the script shows a conflict between SyntheticPropertyValueAliases.txt
    563     and PropertyValueAliases.txt.
    564     -> Removed duplicate script entries from SyntheticPropertyValueAliases.txt:
    565        Cari, Cham, Kali, Lepc, Lyci, Lydi, Olck, Rjng, Saur, Sund, Vaii
    566   + PropertyValueAliases.txt now explicitly contains values for boolean properties:
    567       N/Y, No/Yes, F/T, False/True
    568     -> Added N/No and Y/Yes to preparse.pl function read_PropertyValueAliases.
    569        It will use further values from the file if present.
    570 
    571 * uchar.h & uscript.h & uprops.h & uprops.c & genprops
    572 - new block & script values
    573   + 17 new blocks
    574   + 11 new script values already added in ICU 3.8 for ISO 15924 coverage
    575     (removed from SyntheticPropertyValueAliases.txt)
    576   + 14 new script values added for ISO 15924 coverage (not in Unicode 5.1)
    577     (added to SyntheticPropertyValueAliases.txt)
    578 - uprops.icu (uprops.h) only provides 7 bits for script codes.
    579   ICU 4.0 now has USCRIPT_CODE_LIMIT=130 script codes.
    580   No code above 127 is yet the script code of an
    581   assigned Unicode character, so ICU 4.0 uprops.icu does not store any
    582   script code values greater than 127.
    583   However, it does need to store the maximum script value=USCRIPT_CODE_LIMIT-1=129
    584   in a parallel bit field, and that overflows now.
    585   Also, future values >=128 would be incompatible anyway.
    586   uprops.h is modified to move around several of the bit fields
    587   in the properties vector words, and now uses 8 bits for the script code.
    588   Two other bit fields also grow to accommodate future growth:
    589   Block (current count: 172) grows from 8 to 9 bits,
    590   and Word_Break grows from 4 to 5 bits.
    591 - renamed property Simple_Case_Folding (sfc->scf)
    592   + nothing to be done: handled as normal alias
    593 - new property JSN Jamo_Short_Name
    594   + no new API: only contributes to the Name property
    595 - new Grapheme_Cluster_Break (GCB) value: SM=SpacingMark
    596 - new Joining Group (JG) value: Burushashki_Yeh_Barree
    597 - new Sentence_Break (SB) values:
    598     SB ; CR        ; CR
    599     SB ; EX        ; Extend
    600     SB ; LF        ; LF
    601     SB ; SC        ; SContinue
    602 - new Word_Break (WB) values:
    603     WB ; CR        ; CR
    604     WB ; Extend    ; Extend
    605     WB ; LF        ; LF
    606     WB ; MB        ; MidNumLet
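
The bit-field reasoning in the uprops.h item above can be checked with a little
arithmetic. The following is a minimal sketch, not ICU code: the helper names
bits_needed and fits are hypothetical, and only the numbers quoted in this log
(USCRIPT_CODE_LIMIT=130, block count 172) are used.

```python
# Hypothetical helpers for checking bit-field widths; not part of ICU.

def bits_needed(max_value):
    """Return the number of bits needed to store values 0..max_value."""
    return max(1, max_value.bit_length())


def fits(value, width):
    """Return True if value can be stored in an unsigned field of `width` bits."""
    return 0 <= value < (1 << width)


USCRIPT_CODE_LIMIT = 130                 # from this log (ICU 4.0)
MAX_SCRIPT = USCRIPT_CODE_LIMIT - 1      # 129, stored in the parallel bit field
BLOCK_COUNT = 172                        # "current count" quoted in this log

if __name__ == "__main__":
    print("7-bit field holds max script value:", fits(MAX_SCRIPT, 7))
    print("8-bit field holds max script value:", fits(MAX_SCRIPT, 8))
    print("bits needed for block values 0..%d:" % (BLOCK_COUNT - 1),
          bits_needed(BLOCK_COUNT - 1))
```

The same check can be repeated whenever a new Unicode version adds script,
block, or Word_Break values, before deciding whether a field has to move again.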
    607 
    608 * Further changes in the 2008-02-29 update:
    609 - Default_Ignorable_Code_Point: The new file removes Cc, Cs, noncharacters from DICP
    610   because they should not normally be invisible.
    611 - new Joining Group (JG) value Burushashki_Yeh_Barree was renamed to Burushaski_Yeh_Barree (one 'h' removed)
    612 - new Grapheme_Cluster_Break (GCB) value: PP=Prepend
    613 - new Word_Break (WB) value: NL=Newline
    614 
    615 * hardcoded Unihan range end/limit (see Unicode 4.1 update for comparison)
    616 - Unihan range end moves from 9FBB to 9FC3
    617   search for both 9FBB (end) and 9FBC (limit) (regex 9FB[BC], case-insensitive)
    618   + do change gennames.c
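
The search for the old range end and limit can be scripted. This is a minimal
sketch assuming the source lines are already read into a list of strings; the
function name find_hardcoded is hypothetical, and the regex is the one quoted
above (9FB[BC], case-insensitive).

```python
import re

# Old Unihan range end (9FBB, inclusive) and limit (9FBC, exclusive).
OLD_RANGE = re.compile(r"9FB[BC]", re.IGNORECASE)


def find_hardcoded(lines):
    """Yield (line_number, text) for each line that mentions the old end or limit."""
    for number, text in enumerate(lines, start=1):
        if OLD_RANGE.search(text):
            yield number, text.rstrip("\n")


if __name__ == "__main__":
    sample = ["    0x4e00, 0x9fbb,\n", "    unrelated line\n", "    limit=0x9FBC\n"]
    for number, text in find_hardcoded(sample):
        print("%d: %s" % (number, text))
```

Each hit still needs a manual decision; as the Unicode 4.1 notes in this log
explain, some hardcoded ranges must stay unchanged for binary compatibility.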
    619 
    620 * build Unicode data source code for hardcoding core data
    621 C:\svn\icuproj\icu\uni51\source\data>NMAKE /f makedata.mak ICUMAKE=\svn\icuproj\icu\uni51\source\data\ CFG=debug uni-core-data
    622 
    623 ICU data make path is \svn\icuproj\icu\uni51\source\data\
    624 ICU root path is \svn\icuproj\icu\uni51
    625 Information: cannot find "ucmlocal.mk". Not building user-additional converter files.
    626 Information: cannot find "brklocal.mk". Not building user-additional break iterator files.
    627 Information: cannot find "reslocal.mk". Not building user-additional resource bundle files.
    628 Information: cannot find "collocal.mk". Not building user-additional resource bundle files.
    629 Information: cannot find "rbnflocal.mk". Not building user-additional resource bundle files.
    630 Information: cannot find "trnslocal.mk". Not building user-additional transliterator files.
    631 Information: cannot find "misclocal.mk". Not building user-additional miscellaenous files.
    632 Creating data file for Unicode Character Properties
    633 Creating data file for Unicode Case Mapping Properties
    634 Creating data file for Unicode BiDi/Shaping Properties
    635 Creating data file for Unicode Normalization
    636 Unicode .icu files built to "\svn\icuproj\icu\uni51\source\data\out\build\icudt39l"
    637 Unicode .c source files built to "\svn\icuproj\icu\uni51\source\data\out\tmp"
    638 
    639 - copy the .c source files to C:\svn\icuproj\icu\uni51\source\common
    640   and rebuild the common library
    641 
    642 *** Break iterators
    643 
    644 * Update break iterator rules to new UAX versions and new property values
    645 
    646 *** UCA
    647 
    648 * update FractionalUCA.txt and UCARules.txt with new canonical closure
    649 
    650 *** Test suites
    651 - Test that APIs using Unicode property value aliases (like UnicodeSet)
    652   support all of the boolean values N/Y, No/Yes, F/T, False/True
    653   -> TestBinaryValues() tests in both cintltst and intltest
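
What such a test has to cover can be illustrated outside of ICU. The sketch
below is not the TestBinaryValues() code; it is a hypothetical Python parser
that accepts the eight aliases listed above, with a simplified loose-matching
rule (ignore case, spaces, underscores, hyphens) taken as an assumption.

```python
# Hypothetical illustration of binary property value aliases; not ICU code.
BINARY_VALUE_ALIASES = {
    "n": False, "no": False, "f": False, "false": False,
    "y": True, "yes": True, "t": True, "true": True,
}


def loose_key(name):
    """Simplified loose matching (assumption): drop case, spaces, '_' and '-'."""
    return "".join(ch for ch in name.lower() if ch not in " _-")


def parse_binary_value(name):
    """Map a binary property value alias to a bool, or raise ValueError."""
    key = loose_key(name)
    if key not in BINARY_VALUE_ALIASES:
        raise ValueError("not a binary property value alias: %r" % name)
    return BINARY_VALUE_ALIASES[key]


if __name__ == "__main__":
    for alias in ("N", "No", "F", "False", "Y", "Yes", "T", "True"):
        print(alias, parse_binary_value(alias))
```

A test along these lines loops over all eight spellings for at least one
binary property and checks that each pair selects complementary sets.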
    654 
    655 *** LayoutEngine script information
    656 * Run ICU4J com.ibm.icu.dev.tool.layout.ScriptNameBuilder. This generates LEScripts.h, LELanguage.h,
    657 ScriptAndLanguageTags.h and ScriptAndLanguageTags.cpp in the working directory. (It also generates
    658 ScriptRunData.cpp, which is no longer needed.)

    659 
    660 The generated files have a current copyright date and "@draft" statement.
    661 
    662 * copy the above files into <icu>/source/layout, replacing the old files.
    663 
    664 Add new default entries to the indicClassTables array in <icu>/source/layout/IndicClassTables.cpp
    665 and the complexTable array in <icu>/source/layoutex/ParagraphLayout.cpp. (This step should be automated...)
    666 
    667 * rebuild the layout and layoutex libraries.
    668 
    669 *** Documentation
    670 - Update User Guide
    671   + Jamo_Short_Name, sfc->scf, binary property value aliases
    672 
    673 ---------------------------------------------------------------------------- ***
    674 
    675 Unicode 5.0 update
    676 
    677 *** related Jitterbugs
    678 
    679 5084 RFE: Update to Unicode 5.0
    680 
    681 *** data files & enums & parser code
    682 
    683 * file preparation
    684 - ucdstrip:
    685     DerivedCoreProperties.txt
    686     DerivedNormalizationProps.txt
    687     NormalizationTest.txt
    688     PropList.txt
    689     Scripts.txt
    690     GraphemeBreakProperty.txt
    691     SentenceBreakProperty.txt
    692     WordBreakProperty.txt
    693 - ucdstrip and ucdmerge:
    694     EastAsianWidth.txt
    695     LineBreak.txt
    696 
    697 * my ucd2unidata.bat (needs to be updated each time with UCD and file version numbers)
    698 copy 5.0.0\ucd\BidiMirroring.txt ..\unidata\
    699 copy 5.0.0\ucd\Blocks.txt ..\unidata\
    700 copy 5.0.0\ucd\CaseFolding.txt ..\unidata\
    701 copy 5.0.0\ucd\DerivedAge.txt ..\unidata\
    702 copy 5.0.0\ucd\extracted\DerivedBidiClass.txt ..\unidata\
    703 copy 5.0.0\ucd\extracted\DerivedJoiningGroup.txt ..\unidata\
    704 copy 5.0.0\ucd\extracted\DerivedJoiningType.txt ..\unidata\
    705 copy 5.0.0\ucd\extracted\DerivedNumericValues.txt ..\unidata\
    706 copy 5.0.0\ucd\NormalizationCorrections.txt ..\unidata\
    707 copy 5.0.0\ucd\PropertyAliases.txt ..\unidata\
    708 copy 5.0.0\ucd\PropertyValueAliases.txt ..\unidata\
    709 copy 5.0.0\ucd\SpecialCasing.txt ..\unidata\
    710 copy 5.0.0\ucd\UnicodeData.txt ..\unidata\
    711 
    712 ucdstrip < 5.0.0\ucd\DerivedCoreProperties.txt > ..\unidata\DerivedCoreProperties.txt
    713 ucdstrip < 5.0.0\ucd\DerivedNormalizationProps.txt > ..\unidata\DerivedNormalizationProps.txt
    714 ucdstrip < 5.0.0\ucd\NormalizationTest.txt > ..\unidata\NormalizationTest.txt
    715 ucdstrip < 5.0.0\ucd\PropList.txt > ..\unidata\PropList.txt
    716 ucdstrip < 5.0.0\ucd\Scripts.txt > ..\unidata\Scripts.txt
    717 ucdstrip < 5.0.0\ucd\auxiliary\GraphemeBreakProperty.txt > ..\unidata\GraphemeBreakProperty.txt
    718 ucdstrip < 5.0.0\ucd\auxiliary\SentenceBreakProperty.txt > ..\unidata\SentenceBreakProperty.txt
    719 ucdstrip < 5.0.0\ucd\auxiliary\WordBreakProperty.txt > ..\unidata\WordBreakProperty.txt
    720 ucdstrip < 5.0.0\ucd\EastAsianWidth.txt | ucdmerge > ..\unidata\EastAsianWidth.txt
    721 ucdstrip < 5.0.0\ucd\LineBreak.txt | ucdmerge > ..\unidata\LineBreak.txt
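
For readers without the ucdstrip and ucdmerge tools, the two steps can be
approximated as follows. This is a sketch under stated assumptions, not the
tools' actual implementation: it assumes that ucdstrip removes "#" comments
and empty lines, and that ucdmerge merges adjacent lines whose code point
ranges abut and whose values are identical. All function names are
hypothetical.

```python
# Hypothetical approximation of the ucdstrip | ucdmerge pipeline; not ICU code.

def strip_comments(lines):
    """Assumed ucdstrip behavior: drop '#' comments and lines left empty."""
    for line in lines:
        text = line.split("#", 1)[0].rstrip()
        if text:
            yield text


def parse_range(field):
    """Parse 'XXXX' or 'XXXX..YYYY' into an inclusive (first, last) pair."""
    start, _, end = field.strip().partition("..")
    first = int(start, 16)
    last = int(end, 16) if end else first
    return first, last


def format_range(first, last, value):
    """Format a merged entry back into UCD range syntax."""
    if first == last:
        return "%04X;%s" % (first, value)
    return "%04X..%04X;%s" % (first, last, value)


def merge_adjacent(lines):
    """Assumed ucdmerge behavior: merge abutting ranges with identical values."""
    pending = None  # (first, last, value) of the entry not yet written
    for line in lines:
        field, _, value = line.partition(";")
        first, last = parse_range(field)
        value = value.strip()
        if pending is not None and pending[2] == value and pending[1] + 1 == first:
            pending = (pending[0], last, value)
        else:
            if pending is not None:
                yield format_range(*pending)
            pending = (first, last, value)
    if pending is not None:
        yield format_range(*pending)


if __name__ == "__main__":
    sample = [
        "# illustrative input in EastAsianWidth.txt style",
        "0020;Na # SPACE",
        "0021..0023;Na # EXCLAMATION MARK..NUMBER SIGN",
        "",
        "00A1;A # INVERTED EXCLAMATION MARK",
    ]
    for merged in merge_adjacent(strip_comments(sample)):
        print(merged)
```

The sketch ignores details such as preserving the file's leading copyright
comment, so it is meant for understanding the data shape only.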
    722 
    723 * update FractionalUCA.txt and UCARules.txt with new canonical closure
    724 
    725 * genpname
    726 - run preparse.pl
    727   + make sure that data.h is writable
    728   + perl preparse.pl \cvs\oss\icu > out.txt
    729 
    730 * uchar.h & uscript.h & uprops.h & uprops.c & genprops
    731 - new block & script values
    732   + script values already added in ICU 3.6 because all of ISO 15924 is now covered
    733 
    734 * build Unicode data source code for hardcoding core data
    735 C:\cvs\oss\icu\source\data>NMAKE /f makedata.mak ICUMAKE=\cvs\oss\icu\source\data\ CFG=debug uni-core-data
    736 
    737 ICU data make path is \cvs\oss\icu\source\data\
    738 ICU root path is \cvs\oss\icu
    739 Information: cannot find "ucmlocal.mk". Not building user-additional converter files.
    740 [etc.]
    741 Creating data file for Unicode Character Properties
    742 Creating data file for Unicode Case Mapping Properties
    743 Creating data file for Unicode BiDi/Shaping Properties
    744 Creating data file for Unicode Normalization
    745 Unicode .icu files built to "\cvs\oss\icu\source\data\out\build\icudt35l"
    746 Unicode .c source files built to "\cvs\oss\icu\source\data\out\tmp"
    747 
    748 - copy the .c source files to C:\cvs\oss\icu\source\common
    749   and rebuild the common library
    750 
    751 *** Unicode version numbers
    752 - makedata.mak
    753 - uchar.h
    754 - configure.in
    755 
    756 *** LayoutEngine script information
    757 * Run ICU4J com.ibm.icu.dev.tool.layout.ScriptNameBuilder. This generates LEScripts.h, LELanguage.h,
    758 ScriptAndLanguageTags.h and ScriptAndLanguageTags.cpp in the working directory. (It also generates
    759 ScriptRunData.cpp, which is no longer needed.)
    760 
    761 The generated files have a current copyright date and "@draft" statement.
    762 
    763 * copy the above files into <icu>/source/layout, replacing the old files.
    764 
    765 Add new default entries to the indicClassTables array in <icu>/source/layout/IndicClassTables.cpp
    766 and the complexTable array in <icu>/source/layoutex/ParagraphLayout.cpp. (This step should be automated...)
    767 
    768 * rebuild the layout and layoutex libraries.
    769 
    770 ---------------------------------------------------------------------------- ***
    771 
    772 Unicode 4.1 update
    773 
    774 *** related Jitterbugs
    775 
    776 4332 RFE: Update to Unicode 4.1
    777 4157 RBBI, TR29 4.1 updates
    778 
    779 *** data files & enums & parser code
    780 
    781 * file preparation
    782 - ucdstrip:
    783     DerivedCoreProperties.txt
    784     DerivedNormalizationProps.txt
    785     NormalizationTest.txt
    786     GraphemeBreakProperty.txt
    787     SentenceBreakProperty.txt
    788     WordBreakProperty.txt
    789 - ucdstrip and ucdmerge:
    790     EastAsianWidth.txt
    791     LineBreak.txt
    792 
    793 * add new files to the repository
    794     GraphemeBreakProperty.txt
    795     SentenceBreakProperty.txt
    796     WordBreakProperty.txt
    797 
    798 * update FractionalUCA.txt and UCARules.txt with new canonical closure
    799 
    800 * genpname
    801 - handle new enumerated properties in sub read_uchar
    802 - run preparse.pl
    803 
    804 * uchar.h & uscript.h & uprops.h & uprops.c & genprops
    805 - new binary properties
    806   + Pattern_Syntax
    807   + Pattern_White_Space
    808 - new enumerated properties
    809   + Grapheme_Cluster_Break
    810   + Sentence_Break
    811   + Word_Break
    812 - new block & script & line break values
    813 
    814 * gencase
    815 - case-ignorable changes
    816   see http://www.unicode.org/versions/Unicode4.1.0/#CaseMods
    817   now: (D47a) Word_Break=MidLetter or Mn, Me, Cf, Lm, Sk
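
The D47a condition quoted above can be written as a small predicate. This is a
sketch, not the gencase code: Python's unicodedata module supplies the general
category (for the Unicode version built into Python, not 4.1), and since it
has no Word_Break data the MidLetter set is passed in by the caller. The
one-element set in the example is an illustrative assumption, not the full
property.

```python
import unicodedata

# General categories named in the condition quoted above.
CASE_IGNORABLE_CATEGORIES = {"Mn", "Me", "Cf", "Lm", "Sk"}


def is_case_ignorable(ch, mid_letter):
    """Return True if ch has Word_Break=MidLetter or gc in Mn, Me, Cf, Lm, Sk.

    mid_letter is a set of code points with Word_Break=MidLetter; the caller
    must supply it, for example parsed from WordBreakProperty.txt.
    """
    return (ord(ch) in mid_letter
            or unicodedata.category(ch) in CASE_IGNORABLE_CATEGORIES)


if __name__ == "__main__":
    mid_letter = {0x00B7}  # illustrative subset only (MIDDLE DOT)
    for ch in ("\u00b7", "\u0301", "\u00ad", "a"):
        print("U+%04X" % ord(ch), is_case_ignorable(ch, mid_letter))
```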
    818 
    819 *** Unicode version numbers
    820 - makedata.mak
    821 - uchar.h
    822 - configure.in
    823 
    824 *** tests
    825 - verify that u_charMirror() round-trips
    826 - test all new properties and some new values of old properties
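
The round-trip condition for u_charMirror() can also be checked directly on
the data. The sketch below is not the ICU test; it assumes input lines in the
"code; mirror # comment" layout of BidiMirroring.txt and treats a code point
without an entry as mirroring to itself. The function names are hypothetical.

```python
# Hypothetical data-level round-trip check; not the ICU test code.

def parse_mirroring(lines):
    """Parse 'XXXX; YYYY # comment' lines into a {code: mirror} dict."""
    mapping = {}
    for line in lines:
        text = line.split("#", 1)[0].strip()
        if not text:
            continue
        source, mirror = (int(field.strip(), 16) for field in text.split(";")[:2])
        mapping[source] = mirror
    return mapping


def non_round_tripping(mapping):
    """Return the sorted code points c for which mirror(mirror(c)) != c."""
    return sorted(c for c, m in mapping.items() if mapping.get(m, m) != c)


if __name__ == "__main__":
    sample = [
        "0028; 0029 # LEFT PARENTHESIS",
        "0029; 0028 # RIGHT PARENTHESIS",
        "003C; 003E # LESS-THAN SIGN",
        "003E; 003C # GREATER-THAN SIGN",
    ]
    print(non_round_tripping(parse_mirroring(sample)))
```

An empty result means every listed mapping has its inverse in the data.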
    827 
    828 *** other code
    829 
    830 * hardcoded Unihan range end/limit
    831 - Unihan range end moves from 9FA5 to 9FBB
    832   search for both 9FA5 (end) and 9FA6 (limit) (regex 9FA[56], case-insensitive)
    833   + do not modify BOCU/BOCSU code because that would change the encoding
    834     and break binary compatibility!
    835   + similarly, do not change the GB 18030 range data (ucnvmbcs.c),
    836     NamePrepProfile.txt
    837   + ignore trietest.c: test data is arbitrary
    838   + ignore tstnorm.cpp: test optimization, not important
    839   + ignore collation: 9FA[56] only appears in comments; swapCJK() uses the whole block up to 9FFF
    840   + do change line_th.txt and word_th.txt
    841     by replacing hardcoded ranges with the new property values
    842   + do change gennames.c
    843 
    844 source\data\brkitr\line_th.txt(229):        \u33E0-\u33FE \u3400-\u4DB5 \u4E00-\u9FA5 \uA000-\uA48C \uA490-\uA4C6
    845 source\data\brkitr\word_th.txt(23):        \u33E0-\u33FE \u3400-\u4DB5 \u4E00-\u9FA5 \uA000-\uA48C \uA490-\uA4C6
    846 source\tools\gennames\gennames.c(971):        0x4e00, 0x9fa5,
    847 
    848 * case mappings
    849 - compare new special casing context conditions with previous ones
    850   see http://www.unicode.org/versions/Unicode4.1.0/#CaseMods
    851 
    852 * genpname
    853 - consider storing only the short name if it is the same as the long name
    854 
    855 *** other reviews
    856 - UAX #29 changes (grapheme/word/sentence breaks)
    857 - UAX #14 changes (line breaks)
    858 - Pattern_Syntax & Pattern_White_Space
    859 
    860 ---------------------------------------------------------------------------- ***
    861 
    862 Unicode 4.0.1 update
    863 
    864 *** related Jitterbugs
    865 
    866 3170 RFE: Update to Unicode 4.0.1
    867 3171 Add new Unicode 4.0.1 properties
    868 3520 use Unicode 4.0.1 updates for break iteration
    869 
    870 *** data files & enums & parser code
    871 
    872 * file preparation
    873 - ucdstrip: DerivedNormalizationProps.txt, NormalizationTest.txt, DerivedCoreProperties.txt
    874 - ucdstrip and ucdmerge: EastAsianWidth.txt, LineBreak.txt
    875 
    876 * file fixes
    877 - fix UnicodeData.txt general categories of Ethiopic digits Nd->No
    878   according to PRI #26
    879   http://www.unicode.org/review/resolved-pri.html#pri26
    880 - undone again because no corrigendum in sight;
    881   instead modified tests to not check consistency on this for Unicode 4.0.1
    882 
    883 * ucdterms.txt
    884 - update from http://www.unicode.org/copyright.html
    885   formatted for plain text
    886 
    887 * uchar.h & uprops.h & uprops.c & genprops
    888 - add UBLOCK_CYRILLIC_SUPPLEMENT because the block is renamed
    889 - add U_LB_INSEPARABLE due to a spelling fix
    890   + put short name comment only on line with new constant
    891     for genpname perl script parser
    892 - new binary properties
    893   + STerm
    894   + Variation_Selector
    895 
    896 * genpname
    897 - fix genpname perl script so that it doesn't choke on more than 2 names per property value
    898 - perl script: correctly calculate the maximum number of fields per row
    899 
    900 * uscript.h
    901 - new script code Hrkt=Katakana_Or_Hiragana
    902 
    903 * gennorm.c: track changes in DerivedNormalizationProps.txt
    904 - "FNC" -> "FC_NFKC"
    905 - single field "NFD_NO" -> two fields "NFD_QC; N" etc.
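
The change from one field to two can be handled by normalizing both layouts to
the same (property, value) pair. This is a sketch, not the gennorm.c parser:
only the "NFD_NO" -> "NFD_QC; N" example comes from this log, and the other
old-style names accepted by the regex (NFC/NFKC/NFKD with NO or MAYBE) are an
assumption. The function name is hypothetical.

```python
import re

# Old single-field names such as "NFD_NO"; the exact set is an assumption.
OLD_QUICK_CHECK = re.compile(r"^(NFK?[CD])_(NO|MAYBE)$")


def parse_quick_check(fields):
    """Normalize old and new layouts to (property, value), or return None.

    fields is the list of stripped fields that follow the code point range.
    """
    if len(fields) >= 2 and fields[0].endswith("_QC"):
        return fields[0], fields[1]          # new layout: "NFD_QC; N"
    match = OLD_QUICK_CHECK.match(fields[0])
    if match:
        # old layout: "NFD_NO" becomes ("NFD_QC", "N")
        return match.group(1) + "_QC", match.group(2)[0]
    return None                              # some other property


if __name__ == "__main__":
    print(parse_quick_check(["NFD_NO"]))
    print(parse_quick_check(["NFD_QC", "N"]))
```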
    906 
    907 * genprops/props2.c: track changes in DerivedNumericValues.txt
    908 - changed from 3 columns to 2, dropping the numeric type
    909   + assume that the type is always numeric for Han characters,
    910     and that only those are added in addition to what UnicodeData.txt lists
    911 
    912 *** Unicode version numbers
    913 - makedata.mak
    914 - uchar.h
    915 - configure.in
    916 
    917 *** tests
    918 - update test of default bidi classes according to PRI #28
    919   /tsutil/cucdtst/TestUnicodeData
    920   http://www.unicode.org/review/resolved-pri.html#pri28
    921 - bidi tests: change exemplar character for ES depending on Unicode version
    922 - change hardcoded expected property values where they change
    923 
    924 *** other code
    925 
    926 * name matching
    927 - read UCD.html
    928 
    929 * scripts
    930 - use new Hrkt=Katakana_Or_Hiragana
    931 
    932 * ZWJ & ZWNJ
    933 - are now part of combining character sequences
    934 - break iteration used to assume that LB classes did not overlap; now they do for ZWJ & ZWNJ
    935