common/transforms/Hiragana-Katakana.xml

<?xml version="1.0" encoding="UTF-8" ?>
<!DOCTYPE supplementalData SYSTEM "../../common/dtd/ldmlSupplemental.dtd">
<!--
Copyright © 1991-2013 Unicode, Inc.
CLDR data files are interpreted according to the LDML specification (http://unicode.org/reports/tr35/)
For terms of use, see http://www.unicode.org/copyright.html
-->
<supplementalData>
	<version number="$Revision: 12263 $"/>
	<transforms>
		<transform source="Hira" target="Kana" direction="both" alias="Hiragana-Katakana und-Kana-t-und-hira" backwardAlias="Katakana-Hiragana und-Hira-t-und-kana">
			<tRule>
# note: a global filter is more efficient, but MUST include all source chars
:: [\u0000-\u007E  - - -[:Hiragana:] [:Katakana:] [:nonspacing mark:]] ;
:: NFKC ();
# Hiragana-Katakana
# This is largely a one-to-one mapping, but it has a
# few kinks:
# 1. The Katakana va/vi/ve/vo (30F7-30FA) have no
# Hiragana equivalents.  We use Hiragana wa/wi/we/wo
# (308F-3092) with a voicing mark (3099), which is
# semantically equivalent.  However, this is a non-
# roundtripping transformation.
# 2. The Katakana small ka/ke (30F5,30F6) have no
# Hiragana equivalents.  We convert them to normal
# Hiragana ka/ke (304B,3051).  This is a one-way
# information-losing transformation and precludes
# round-tripping of 30F5 and 30F6.
# 3. The combining marks 3099-309C are in the Hiragana
# block, but they apply to Katakana as well, so we
# leave them untouched.
# 4. The Katakana prolonged sound mark 30FC doubles the
# preceding vowel.  This is a one-way information-
# losing transformation from Katakana to Hiragana.
# 5. The Katakana middle dot separates words in foreign
# expressions; we leave this unmodified.
# The above points preclude successful round-trip
# transformations of arbitrary input text.  However,
# they provide naturalistic results that should conform
# to user expectations.
# Combining equivalents va/vi/ve/vo
わ\u3099 ↔ ヷ ;
ゐ\u3099 ↔ ヸ ;
ゑ\u3099 ↔ ヹ ;
を\u3099 ↔ ヺ ;
# One-to-one mappings, main block
# 3041:3094 ↔ 30A1:30F4
# 309D,E ↔ 30FD,E
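ぁ ↔ ァ ;
あ ↔ ア ;
ぃ ↔ ィ ;
い ↔ イ ;
ぅ ↔ ゥ ;
う ↔ ウ ;
ぇ ↔ ェ ;
え ↔ エ ;
ぉ ↔ ォ ;
お ↔ オ ;
か ↔ カ ;
が ↔ ガ ;
き ↔ キ ;
ぎ ↔ ギ ;
く ↔ ク ;
ぐ ↔ グ ;
け ↔ ケ ;
げ ↔ ゲ ;
こ ↔ コ ;
ご ↔ ゴ ;
さ ↔ サ ;
ざ ↔ ザ ;
し ↔ シ ;
じ ↔ ジ ;
す ↔ ス ;
ず ↔ ズ ;
せ ↔ セ ;
ぜ ↔ ゼ ;
そ ↔ ソ ;
ぞ ↔ ゾ ;
た ↔ タ ;
だ ↔ ダ ;
ち ↔ チ ;
ぢ ↔ ヂ ;
っ ↔ ッ ;
つ ↔ ツ ;
づ ↔ ヅ ;
て ↔ テ ;
で ↔ デ ;
と ↔ ト ;
ど ↔ ド ;
な ↔ ナ ;
に ↔ ニ ;
ぬ ↔ ヌ ;
ね ↔ ネ ;
の ↔ ノ ;
は ↔ ハ ;
ば ↔ バ ;
ぱ ↔ パ ;
ひ ↔ ヒ ;
び ↔ ビ ;
ぴ ↔ ピ ;
ふ ↔ フ ;
ぶ ↔ ブ ;
ぷ ↔ プ ;
へ ↔ ヘ ;
べ ↔ ベ ;
ぺ ↔ ペ ;
ほ ↔ ホ ;
ぼ ↔ ボ ;
ぽ ↔ ポ ;
ま ↔ マ ;
み ↔ ミ ;
む ↔ ム ;
め ↔ メ ;
も ↔ モ ;
ゃ ↔ ャ ;
や ↔ ヤ ;
ゅ ↔ ュ ;
ゆ ↔ ユ ;
ょ ↔ ョ ;
よ ↔ ヨ ;
ら ↔ ラ ;
り ↔ リ ;
る ↔ ル ;
れ ↔ レ ;
ろ ↔ ロ ;
ゎ ↔ ヮ ;
わ ↔ ワ ;
ゐ ↔ ヰ ;
ゑ ↔ ヱ ;
を ↔ ヲ ;
ん ↔ ン ;
ゔ ↔ ヴ ;
ゝ ↔ ヽ ;
ゞ ↔ ヾ ;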
# One-way Katakana-Hiragana xform of small K ka/ke to
# normal H ka/ke.
か ← ヵ ;
け ← ヶ ;
# Katakana followed by a prolonged sound mark 30FC has
# its final vowel doubled.  This is a Katakana-Hiragana
# one-way information-losing transformation.  We
# include the small Katakana (e.g., small A 30A1) and
# do not distinguish them from their large
# counterparts.  It doesn't make sense to double a
# small counterpart vowel as a small Hiragana vowel, so
# we don't do so.  In natural text this should never
# occur anyway.  If a 30FC is seen without a preceding
# vowel sound (e.g., after n 30F3) we do not change it.
### $long = ー;
# The following categories are Hiragana, not Katakana
# as might be expected, since by the time we get to the
# 30FC, the preceding character will have already been
# transformed to Hiragana.
# {The following mechanically generated from the
# Unicode 3.0 data:}
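$xa = [ \
ぁ あ か が さ ざ \
た だ な は ば ぱ \
ま ゃ や ら ゎ わ \
];
$xi = [ \
ぃ い き ぎ し じ \
ち ぢ に ひ び ぴ \
み り ゐ \
];
$xu = [ \
ぅ う く ぐ す ず \
っ つ づ ぬ ふ ぶ \
ぷ む ゅ ゆ る ゔ \
];
$xe = [ \
ぇ え け げ せ ぜ \
て で ね へ べ ぺ \
め れ ゑ \
];
$xo = [ \
ぉ お こ ご そ ぞ \
と ど の ほ ぼ ぽ \
も ょ よ ろ を \
];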
あ ← $xa {ー};
い ← $xi {ー};
う ← $xu {ー};
え ← $xe {ー};
お ← $xo {ー};
:: (NFKC) ;
# note: a global filter is more efficient, but MUST include all source chars!!
:: ([\u0000-\u007E  - - -[:Hiragana:] [:Katakana:] [:nonspacing mark:]]);
# eof
			</tRule>
		</transform>
	</transforms>
</supplementalData>