== Notes on {kddi,docomo,softbank}-*.ucm mappings.

kddi-jisx-208 is a variant of JIS X 0208 used by KDDI, a Japanese cell
phone carrier.

kddi-shift_jis, docomo-shift_jis, and softbank-shift_jis are variants
of Shift_JIS used by KDDI, DoCoMo and SoftBank.

- kddi-jisx-208 contains Emoji (emoticon) code points in
  0x75xx, 0x76xx, 0x77xx, 0x78xx, 0x79xx, 0x7Axx, 0x7Bxx,
  where xx means 21-7E.

- kddi-shift_jis contains Emoji code points in
  0xEBxx, 0xECxx, 0xEDxx, 0xEExx, 0xF3xx, 0xF4xx, 0xF6xx, and 0xF7xx,
  where xx means 40-7E, 80-FC.

- docomo-shift_jis contains Emoji code points in
  0xF8xx and 0xF9xx, where xx means 40-7E, 80-FC.

- softbank-shift_jis contains Emoji code points in
  0xF7xx, 0xF9xx, and 0xFBxx, where xx means 40-7E, 80-FC.

- softbank-jisx-208 contains Emoji code points in
  0x75xx, 0x76xx, 0x77xx, 0x78xx, 0x79xx, 0x7Axx, 0x7Bxx, 0x7Dxx,
  where xx means 21-7E.


== How the -2012.ucm tables were modified in April 2013

The -2012 versions were created by
http://code.google.com/p/emoji4unicode/source/browse/trunk/src/gen_conversion_files.py

using each of the older 2012 versions as the base table files
to avoid non-Emoji changes:

  # gen_google_ucm.sh
  icu_mappings=/google/src/cloud/mscherer/icubranch/google_vendor_src_branch/icu/source/data/mappings
  dest=/home/mscherer/www/no_crawl/emoji
  ./gen_conversion_files.py $icu_mappings/docomo-shift_jis-2012.ucm
  cp ../generated/docomo-shift_jis-2012.ucm $dest
  ./gen_conversion_files.py $icu_mappings/kddi-shift_jis-2012.ucm
  cp ../generated/kddi-shift_jis-2012.ucm $dest
  ./gen_conversion_files.py $icu_mappings/softbank-shift_jis-2012.ucm
  cp ../generated/softbank-shift_jis-2012.ucm $dest
  ./gen_conversion_files.py

The only differences from 2012-sep are in mappings for symbols
that have Unicode Variation Selector (VS) sequences.
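
For illustration: a VS sequence is a base symbol followed by U+FE0E
(VARIATION SELECTOR-15, text presentation) or U+FE0F (VARIATION
SELECTOR-16, emoji presentation). The following is a minimal Python
sketch, not part of the table generator. The helper name classify_vs is
hypothetical, and U+2600 is used only as an example symbol.

```python
# Illustrative sketch only; classify_vs is a hypothetical helper name.
TEXT_VS = "\uFE0E"   # VARIATION SELECTOR-15: requests text presentation
EMOJI_VS = "\uFE0F"  # VARIATION SELECTOR-16: requests emoji presentation


def classify_vs(sequence: str) -> str:
    """Classify a string by the variation selector it ends with, if any."""
    if sequence.endswith(EMOJI_VS):
        return "emoji VS"
    if sequence.endswith(TEXT_VS):
        return "text VS"
    return "no VS"


sun = "\u2600"  # BLACK SUN WITH RAYS, chosen only as an example
for seq in (sun, sun + TEXT_VS, sun + EMOJI_VS):
    code_points = " ".join(f"U+{ord(ch):04X}" for ch in seq)
    print(f"{code_points}: {classify_vs(seq)}")
```

The three strings printed by the loop correspond to the three cases the
tables distinguish: no VS, text VS, and Emoji VS.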

The older tables relied on a hack in the ICU conversion code that
ignored the "use fallback" flag for fallbacks from sequences with VS.

The new tables rely on a new feature in ICU4C 51:
For the relevant symbols that have roundtrip mappings,
- the mappings with Emoji Variation Selector
  use the |0 roundtrip precision
- the other mappings (no VS & text VS)
  use the |4 "good one-way" precision

See http://bugs.icu-project.org/trac/ticket/9602

== How the -2012.ucm tables were created in September 2012

The 2012 versions were created by
http://code.google.com/p/emoji4unicode/source/browse/trunk/src/gen_conversion_files.py

using each of the 2007 versions as the base table files
to avoid non-Emoji changes:

  icu_mappings=~/p4/emoji/google_vendor_src_branch/icu/source/data/mappings
  dest=~/www/no_crawl/emoji
  ./gen_conversion_files.py $icu_mappings/docomo-shift_jis-2007.ucm
  cp ../generated/docomo-shift_jis-2012.ucm $dest
  ./gen_conversion_files.py $icu_mappings/kddi-shift_jis-2007.ucm
  cp ../generated/kddi-shift_jis-2012.ucm $dest
  ./gen_conversion_files.py $icu_mappings/softbank-shift_jis-2007.ucm
  cp ../generated/softbank-shift_jis-2012.ucm $dest
  ./gen_conversion_files.py

The emoji4unicode code uses the mappings that were established during the
Unicode Emoji standardization process.
The new conversion tables round-trip carrier Emoji symbol codes
to and from Unicode 6 standard code points
and also include fallback mappings from the Google PUA code points
to the carrier codes.

The trailing "|0" etc.
on the mapping table lines specify the mapping type:
  |0 round-trip Unicode <-> charset
  |1 fallback Unicode -> charset
  |3 "reverse fallback" Unicode <- charset

For details about the .ucm file format see
http://userguide.icu-project.org/conversion/data#TOC-.ucm-File-Format

== How the -2007.ucm tables were created

So far, we haven't obtained "official" conversion tables from the cell
phone carriers. However, we empirically know their clients support
VDCs in MS932, like U+2460 (CIRCLED DIGIT ONE), etc. Hence we use
MS932 as the base table for them.

kddi-jisx-208-2007.ucm is based on jisx-208.ucm in this directory.
The original table's mappings to codes 0x75xx to 0x7Bxx are excluded
to avoid collisions with emoji.

kddi-shift_jis-2007.ucm is based on windows-932-2000.ucm.
The original table's mappings to codes 0xEBxx to 0xEExx, and 0xF0xx to
0xF9xx (EUDC block), are excluded to avoid collisions with emoji.

docomo-shift_jis-2007.ucm is based on windows-932-2000.ucm.
The original table's mappings to codes 0xF0xx to 0xF9xx (EUDC block)
are excluded to avoid collisions with emoji.

softbank-shift_jis-2007.ucm is based on windows-932-2000.ucm.
The original table's mappings to codes 0xF0xx to 0xF9xx (EUDC block),
and 0xFBxx, are excluded to avoid collisions with emoji.

softbank-jisx-208-2007.ucm is based on jisx-208.ucm in this directory.
The original table's mappings to codes 0x75xx to 0x7Bxx, and 0x7Dxx,
are excluded to avoid collisions with emoji.

== Google Standard Emoji Unicode Mapping

The Google standard emoji Unicode mapping can be found at:

  /home/build/google3/i18n/encodings/emoji/emoji_unicode_mapping.txt



TODO(mscherer): Use <icu:base> to share most standard JIS mappings
among *-shift_jis-2007.ucm files.
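
As an illustration of the mapping lines and trailing "|0"-style flags
described in the sections above, here is a minimal Python sketch that
reads one mapping line and checks whether its bytes fall in the
docomo-shift_jis Emoji ranges listed at the top of this file. It is not
the ICU parser and handles only the simplest line form. The function
names are hypothetical, and the sample line is made up for illustration
rather than quoted from any of the tables.

```python
import re

# Mapping-type flags as described in this file.
PRECISION = {
    0: "round-trip",
    1: "fallback",
    3: "reverse fallback",
    4: "good one-way",
}

# Simplified pattern (an assumption): one or more <Uxxxx> code points,
# whitespace, one or more \xNN bytes, then |N. Real .ucm files have
# headers, comments, and other syntax that this sketch ignores.
LINE_RE = re.compile(
    r"^((?:<U[0-9A-Fa-f]{4,6}>)+)\s+((?:\\x[0-9A-Fa-f]{2})+)\s*\|(\d)"
)


def parse_mapping_line(line):
    """Return (code_points, charset_bytes, flag), or None if no match."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    code_points = [int(h, 16)
                   for h in re.findall(r"<U([0-9A-Fa-f]{4,6})>", m.group(1))]
    charset_bytes = bytes(int(h, 16)
                          for h in re.findall(r"\\x([0-9A-Fa-f]{2})", m.group(2)))
    return code_points, charset_bytes, int(m.group(3))


def is_docomo_emoji_code(charset_bytes):
    """True for 0xF8xx / 0xF9xx codes where xx is in 40-7E or 80-FC."""
    if len(charset_bytes) != 2:
        return False
    lead, trail = charset_bytes
    return lead in (0xF8, 0xF9) and (0x40 <= trail <= 0x7E or 0x80 <= trail <= 0xFC)


# Hypothetical sample line, not copied from a real table.
sample = r"<U2600><UFE0F> \xF8\x9F |0"
parsed = parse_mapping_line(sample)
if parsed is not None:
    code_points, charset_bytes, flag = parsed
    print([f"U+{cp:04X}" for cp in code_points],
          charset_bytes.hex(),
          PRECISION.get(flag, "unknown"),
          is_docomo_emoji_code(charset_bytes))
```

Lines that do not match the simplified pattern, such as comments or
header lines, make parse_mapping_line return None, so a caller can skip
them.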