      1 <?xml version="1.0" encoding="US-ASCII"?>
      2 <!DOCTYPE rfc SYSTEM "rfc2629.dtd" [
      3 <!ENTITY RFC4646 SYSTEM "http://xml.resource.org/public/rfc/bibxml/reference.RFC.4646.xml">
      4 <!ENTITY rfc5646 SYSTEM "http://xml.resource.org/public/rfc/bibxml/reference.RFC.5646.xml">
      5 ]>
      6 <?xml-stylesheet type='text/xsl' href='rfc2629.xslt' ?>
      7 <?rfc strict="yes" ?>
      8 <?rfc toc="yes"?>
      9 <?rfc tocdepth="4"?>
     10 <?rfc symrefs="yes"?>
     11 <?rfc sortrefs="yes" ?>
     12 <?rfc compact="yes" ?>
     13 <?rfc subcompact="no" ?>
     14 <rfc category="info" docName="draft-davis-t-langtag-ext-08" ipr="trust200902"
     15 	submissionType="independent"
     16 >
     17 	<front>
     18 
     19 
     20 		<title abbrev="BCP 47 Extension T">BCP 47 Extension T - Transformed Content</title>
     21 
     22 		<author fullname="Mark Davis" initials="M.E." surname="Davis">
     23 			<organization>Google</organization>
     24 			<address>
				<email>mark@macchiato.com</email>
     26 			</address>
     27 		</author>
     28 
     29 		<author fullname="Addison Phillips" initials="A" surname="Phillips">
     30 			<organization>Lab126</organization>
     31 			<address>
				<email>addison@lab126.com</email>
     33 			</address>
     34 		</author>
     35 
     36 		<author initials="Y" surname="Umaoka" fullname="Yoshito Umaoka">
     37 			<organization abbrev="IBM">IBM</organization>
     38 			<address>
				<email>yoshito_umaoka@us.ibm.com</email>
     40 			</address>
     41 		</author>
     42 		
     43         <author initials="C" surname="Falk" fullname="Courtney Falk">
     44             <organization abbrev="Infinite Automata">Infinite Automata</organization>
     45             <address>
                <email>court@infiauto.com</email>
     47             </address>
     48         </author>
     49         
     50 		<date month="December" year="2011" day="6" />
     51 
     52 
     53 
     54 		<!-- Meta-data Declarations -->
     55 
     56 		<area>General</area>
     57 
     58 		<workgroup>Internet Engineering Task Force</workgroup>
     59 
     60 
     61 
     62 		<keyword>locale</keyword>
     63 		<keyword>bcp 47</keyword>
     64 
     65 		<!-- Keywords will be incorporated into HTML output files in a meta tag 
     66 			but they have no effect on text or nroff output. If you submit your draft 
     67 			to the RFC Editor, the keywords will be used for the search engine. -->
     68 
     69 		<abstract>
     70 			<t>
     71 				This document specifies an Extension to BCP 47
     72 				which provides
     73 				subtags
     74 				for specifying the source language or script of transformed
     75 				content,
     76 				including content
     77 				that
     78 				has been transliterated, transcribed, or
     79 				translated, or in some other way influenced by the source. It also provides for additional information used for
     80 				identification.
     81 			</t>
     82 		</abstract>
     83 	</front>
     84 
     85 	<middle>
     86 		<section title="Introduction">
     87 			<t>
     88 				<xref target="BCP47"></xref>
     89 				permits the definition and registration of language tag extensions
     90 				"that contain a language component and are compatible with
     91 				applications that
     92 				understand language tags". This document defines an
     93 				extension for
     94 				specifying the source of content that has been transformed,
     95 				including text that has been transliterated, transcribed, or
     96 				translated, or in some other way influenced by the source.
     97 				It may be used in queries to request content that has been
     98 				transformed.
     99 				The "singleton" identifier for this extension is 't'.
    100 			</t>
    101 			<t>
    102 				Language tags, as defined by
    103 				<xref target="BCP47"></xref>, are useful for identifying the language of content.
    104 				There are
    105 				mechanisms for specifying variant subtags for special purposes.
    106 				However, these variants are insufficient for specifying content that has
    107 				undergone
    108 				transformations,
    109 				including content that has been
    110 				transliterated,
    111 				transcribed, or
    112 				translated.
    113 				The correct interpretation of the content may depend upon knowledge of the conventions used for the transformation.
    114 			</t>
    115 			<t>
    116 			   Suppose that Italian or Russian
    117 			   cities on a map are transcribed for Japanese users. Each name needs to be
    118 			   transliterated into katakana using rules appropriate for the specific
    119 			   source and target language.   When tagging such data, it is important
    120 			   to be able to indicate not only the resulting content language ("ja"
    121 			   in this case), but also the source language.</t>
			<t>Transforms such as transliterations may vary depending not only on the
			   source and target script, but also on the source and target language.
    124 			   Thus the
    125 			   Russian &lt;U+041F U+0443 U+0442 U+0438 U+043D> (which corresponds to
    126 			   the Cyrillic &lt;PE, U, TE, I, EN>) transliterates into "Putin" in
    127 			   English but "Poutine" in French.  The identifier could be used to indicate
    128 			   a desired mechanical transformation in an API, or could be used to tag
    129 			   data that has been converted (mechanically or by hand) according to a
    130 			   transliteration method.</t>
    131 				<t>
    132 				In addition, many different conventions have arisen for how to transform text, even between the same languages and scripts.
    133                 For example, "Gaddafi" is commonly transliterated from Arabic to English as any of (G/Q/K/Kh)a(d/dh/dd/dhdh/th/zz)af(i/y).
				Some examples of standardized conventions used for transcribing or transliterating text include:
    135                 <list style="letters">
    136 					<t>United Nations Group of Experts on Geographical Names (UNGEGN)</t>
    137 					<t>US Library of Congress (LOC)</t>
    138 					<t>US Board on Geographic Names (BGN)</t>
    139 					<t>Korean Ministry of Culture, Sports and Tourism (MCST)</t>
    140 					<t>International Organization for Standardization (ISO)</t>
    141 			     </list>
    142 				</t>
    143 				<t>The usage of this extension is not limited to formal transformations, 
    144 				and may include other instances where the content is in some other way influenced by the source. 
    145 				For example, this extension could be used to designate a request for a speech recognizer 
    146 				that is tailored specifically for 2nd-language speakers who are 
    147 				1st-language speakers of a particular language (e.g. a recognizer for "English spoken with a Chinese accent").</t>
    148 			<section title="Requirements Language">
    149 				<t>The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL
    150 					NOT",
    151 					"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL"
    152 					in this
    153 					document are to be interpreted as described in RFC 2119.</t>
    154 			</section>
    155 		</section>
    156 
    157     
    158 
    159     <?rfc needLines="8" ?>
    160 
    161 		<section title="BCP47 Required Information">
    162             <section title="Overview">
    163 				<t>
    164 					Identification of transformed content can be done using the 't' extension
    165 					defined in this document.
    166 					This extension is formed by the 't'
    167 					singleton followed by a sequence of subtags that would form a
    168 					language tag as defined by
    169 					<xref target="BCP47"></xref>.
    170 					This allows for the source language or script to be specified to
    171 					the degree of precision required.
    172 					There are restrictions on the
    173 					sequence of subtags.
    174 					They MUST form a regular, valid, canonical
    175 					language
    176 					tag, and MUST neither include extensions nor private use
    177 					sequences introduced by the
    178 					singleton
    179 					'x'.
    180 					Where only the script is
    181 					relevant (such as identifying
    182 					a
    183 					script-script
    184 					transliteration) then
    185 					'und' is used for the primary language subtag.
    186 				</t>
    187 				<t>For example:</t>
    188 				<texttable>
    189 					<ttcol>Language Tag</ttcol>
    190 	
    191 					<ttcol>Description</ttcol>
    192 	
    193 					<c>ja-t-it</c>
    194 	
    195 					<c>The content is Japanese, transformed from Italian.</c>
    196 	
    197 					<c>ja-Kana-t-it</c>
    198 	
    199 					<c>The content is Japanese Katakana, transformed from Italian.</c>
    200 	
    201 					<c>und-Latn-t-und-cyrl</c>
    202 	
    203 					<c>The content is in the Latin script, transformed from the Cyrillic
    204 						script.</c>
    205 	
    206 				</texttable>
    207 				<t>
    208 					Note that the sequence of subtags governed by 't' cannot contain a
    209 					singleton (a single-character subtag), because that would start a
    210 					new extension.
    211 					For example, the tag "ja-t-i-ami"
    212 					does not indicate
    213 					that the source is in "i-ami", because "i-ami" is not a
    214 					regular
    215 					language tag in
    216 					<xref target="BCP47"></xref>. That tag would express an empty 't' extension followed by an 'i'
    217 					extension.
    218 				</t>
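				<t>The singleton rule above can be illustrated with a short Python sketch.
				It is not part of this specification: the function name t_extension_subtags
				is hypothetical, and the sketch assumes the input is a well-formed language
				tag. It only shows why "ja-t-i-ami" yields an empty 't' extension.</t>
				<figure><artwork><![CDATA[
```python
def t_extension_subtags(tag):
    """Return the subtags governed by 't', or None if 't' is absent.

    Illustrative helper (hypothetical name); it does not validate
    the tag against BCP 47.
    """
    subtags = tag.lower().split("-")
    # Index 0 is the primary language subtag, so start at index 1.
    for i in range(1, len(subtags)):
        if subtags[i] == "x":
            return None  # private use: later subtags are opaque
        if subtags[i] == "t":
            governed = []
            for s in subtags[i + 1:]:
                if len(s) == 1:
                    break  # a singleton starts a new extension
                governed.append(s)
            return governed
    return None
```
]]></artwork></figure>
				<t>Under these assumptions, "ja-t-it" gives the single source subtag "it",
				while "ja-t-i-ami" gives an empty list, because the singleton 'i'
				ends the 't' extension immediately.</t>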
    219 				<t>The 't' extension is not intended for use in structured data that already provides 
    220 				separate source and target language identifiers.
    221 				For example, this is the case in localization interchange formats such as XLIFF.
    222 				In such cases, it would be inappropriate to use "ja-t-it" for the target language tag because the source language tag
    223 				"it" would already be present in the data. Instead one would use the language tag "ja".
    224 				</t>
    225 				<t>As noted earlier, it is sometimes necessary to indicate additional
    226 					information about a transformation.
    227 					This additional information is optionally supplied after the source in a series of one or more fields,
    228 					where each field consists of a field separator subtag followed by one or more non-separator subtags.
    229 					Each field separator subtag consists of a single letter followed by a single digit.
    230 					</t>
				<t>A transformation mechanism is an optional field that indicates
					the specification used for the transformation, such as "UNGEGN" for
					the United Nations Group of Experts on Geographical Names
					transliterations and transcriptions. It uses the 'm0' field separator
					followed by certain subtags.
				</t>
    240 				<t>For example:</t>
    241 				<texttable>
    242 					<ttcol>Language Tag</ttcol>
    243 	
    244 					<ttcol>Description</ttcol>
    245 	
    246 					<c>und-Cyrl-t-und-latn-m0-ungegn-2007</c>
    247 	
					<c>The content is in Cyrillic, transformed from Latin, according
						to a UNGEGN specification dated 2007.</c>
    251 	
    252 				</texttable>
				<t>The field separator subtags such as 'm0' were chosen because they are
					short, visually distinctive, and cannot occur in a language tag
					except within an extension or after 'x',
					thus eliminating the potential for collision or confusion with the
					source language tag.</t>
    261 				<t>
    262 					The field subtags are defined by
    263 					<eref target="http://unicode.org/reports/tr35/">Section 3</eref>
    264 					of
    265 					<xref target="UTS35">Unicode
    266 						Technical Standard #35: Unicode Locale Data
    267 						Markup Language</xref> (LDML), the main specification for the Unicode
    268                     Common Locale Data Repository (CLDR) project.
    269                     As required by BCP 47, subtags follow the language tag ABNF and
    270 					other rules for the formation of language tags and subtags, are
    271 					restricted to the ASCII letters and digits, are not case sensitive,
    272 					and do not exceed eight characters in length.
    273 				</t>
    274 				<t>
    275 					EDITORIAL NOTE: This new facility has been accepted by the Unicode
    276 				    CLDR committee for incorporation into the next versions of CLDR and LDML, parallel
    277 					with the structure of the 'u' extension
    278 					<xref target="RFC6067"></xref>,
    279 					for which it is already the maintaining authority.
    280 					The data and
    281 					specification will be available by the time this internet
    282 					draft has
    283 					been
    284 					approved.
    285 				</t>
    286 				<t>The LDML specification is available over the Internet and at no cost, and
    287 					is
    288 					available via a royalty-free license at
    289 					http://unicode.org/copyright.html. LDML is versioned, and each
    290 					version of LDML is numbered, dated, and stable. Extension subtags,
    291 					once
    292 					defined by LDML, are never retracted or substantially changed in meaning. </t>
    293 				<t>The maintaining authority for the 't' extension is
    294 					the Unicode
    295 					Consortium:</t>
    296 	
    297 				<texttable>
    298 					<ttcol>Item</ttcol>
    299 	
    300 					<ttcol>Value</ttcol>
    301 	
    302 					<c>Name</c>
    303 	
    304 					<c>Unicode Consortium</c>
    305 	
    306 					<c>Contact Email</c>
    307 	
					<c>cldr-contact@unicode.org</c>
    309 	
    310 					<c>Discussion List Email</c>
    311 	
					<c>cldr-users@unicode.org</c>
    313 	
    314 					<c>URL Location</c>
    315 	
    316 					<c>cldr.unicode.org</c>
    317 	
    318 					<c>Specification</c>
    319 	
    320 					<c>Unicode Technical Standard #35 Unicode Locale Data Markup
    321 						Language (LDML), http://unicode.org/reports/tr35/</c>
    322 					<c>Section</c>
    323 	
    324 					<c>Section 3 Unicode Language and Locale Identifiers</c>
    325 				</texttable>
    326             </section>
    327 			<section title="Structure" anchor="structure">
    328 				<t>The subtags in the 't' extension are of the following form:</t>
    329 <figure>			
    330 <artwork type='abnf'>
    331 t-ext=    "t"                      ; Extension
    332           (("-" lang *("-" field)) ; Source + optional field(s)
    333           / 1*("-" field))         ; Field(s) only (no source)
    334 
    335 lang=     language                 ; BCP47, with restrictions
    336           ["-" script]
    337           ["-" region]
    338           *("-" variant)
    339 
    340 field=    sep 1*("-" 3*8alphanum)  ; With restrictions
    341 
    342 sep=      ALPHA DIGIT              ; Subtag separators
    343 alphanum= ALPHA / DIGIT
    344 </artwork>
    345 </figure>
                <t>where the &lt;language>, &lt;script>, &lt;region>, and &lt;variant> rules are specified in <xref target="BCP47"></xref>,
                and the &lt;ALPHA> and &lt;DIGIT> rules are specified in <xref target="RFC5234"></xref>.</t>
    348 				<t>Description and restrictions:
    349 					<list style="letters">
    350 						<t>The 't' extension MUST have at least one subtag.</t>
    351 						<t>
    352 							The 't' extension normally starts with a source language tag,
    353 							which MUST be a regular, canonical language tag as specified by
    354 							<xref target="BCP47"></xref>.
    355 							Tags described by the 'irregular' production in BCP 47 MUST NOT
    356 							be
    357 							used to form the language tag.
    358 							The source language tag MAY be
    359 							omitted: some field values do not
    360 							require it.
    361 						</t>
    362 						<t>There is optionally a sequence of fields, where each field has a
    363 							separator followed by a sequence of one or more subtags.
    364 							Two identical field
    365 							separators MUST NOT be present in the language tag.</t>
    366 						<t>
    367 							The order of the fields in a 't' extension is not significant. The order of subtags within a field is significant.
    368 							(See
    369 							<xref target='canonicalization' />
    370 							Canonicalization.)
    371 						</t>
    372 		                <t>
    373 		                    The 't' subtag fields are defined by 
    374 		                    <eref target="http://unicode.org/reports/tr35/">Section 3</eref>
    375 		                    of
    376 		                    <xref target="UTS35">Unicode
    377 		                        Technical Standard #35: Unicode Locale
    378 		                        Data Markup Language</xref>.
    379 		                </t>
    380 					</list>
    381 				</t>
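				<t>As a rough illustration of the structure above, the following Python
				sketch splits the subtags that follow 't' into a source part and a set of
				fields. The name parse_t_extension and the returned data shape are
				assumptions made for this example. The sketch checks only the field syntax
				and the duplicate-separator rule; it does not check that the source is a
				valid, canonical BCP 47 language tag.</t>
				<figure><artwork><![CDATA[
```python
import re

SEP = re.compile(r"[a-z][0-9]")          # sep = ALPHA DIGIT
FIELD_SUBTAG = re.compile(r"[a-z0-9]{3,8}")  # 3*8alphanum


def parse_t_extension(subtags):
    """Split the subtags after 't' into (source, fields).

    Returns (source_subtags, {separator: [field subtags]}).
    Raises ValueError when a structural restriction is violated.
    Hypothetical helper; the source tag itself is not validated.
    """
    subtags = [s.lower() for s in subtags]
    if not subtags:
        raise ValueError("'t' extension needs at least one subtag")
    # Everything before the first separator is the source tag.
    first_sep = len(subtags)
    for i, s in enumerate(subtags):
        if SEP.fullmatch(s):
            first_sep = i
            break
    source = subtags[:first_sep]
    fields = {}
    current = None
    for s in subtags[first_sep:]:
        if SEP.fullmatch(s):
            if s in fields:
                raise ValueError("duplicate field separator: " + s)
            fields[s] = current = []
        elif FIELD_SUBTAG.fullmatch(s):
            current.append(s)
        else:
            raise ValueError("invalid field subtag: " + s)
    for sep, values in fields.items():
        if not values:
            raise ValueError("field " + sep + " has no subtags")
    return source, fields
```
]]></artwork></figure>
				<t>For the subtags of "und-Cyrl-t-und-latn-m0-ungegn-2007" that follow 't',
				this sketch would return the source ["und", "latn"] and a single 'm0'
				field holding ["ungegn", "2007"].</t>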
    382 			</section>
    383             <section title="Canonicalization" anchor="canonicalization">
    384                 <t>As required by
    385                     <xref target="BCP47"></xref>, the use of uppercase or lowercase letters is not significant in
    386                     the subtags used in this extension. The canonical form for all
    387                     subtags in the extension is lowercase, with the fields ordered by
    388                     the separators, alphabetically.
    389                     The order of subtags within a field is significant, and MUST NOT be changed in the process of canonicalizing.</t>
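                <t>The canonicalization rule can be sketched in Python as follows. The
                function name canonicalize_t_extension is hypothetical, and the sketch
                assumes its input is the list of subtags that follow 't' in an otherwise
                valid tag. It lowercases every subtag and reorders whole fields by
                separator, leaving the order of subtags inside each field untouched.</t>
                <figure><artwork><![CDATA[
```python
import re

SEP = re.compile(r"[a-z][0-9]")  # field separator: ALPHA DIGIT


def canonicalize_t_extension(subtags):
    """Lowercase the subtags after 't' and order fields by separator.

    Hypothetical helper; assumes the subtags are already valid.
    """
    subtags = [s.lower() for s in subtags]
    source, fields = [], []
    for s in subtags:
        if SEP.fullmatch(s):
            fields.append([s])       # start a new field
        elif fields:
            fields[-1].append(s)     # order inside a field is kept
        else:
            source.append(s)         # still in the source tag
    fields.sort(key=lambda f: f[0])  # alphabetical by separator
    return source + [s for f in fields for s in f]
```
]]></artwork></figure>
                <t>Only 'm0' is defined at present, so reordering matters only if further
                separators are registered; a separator such as 'd0' in a test input is
                purely illustrative.</t>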
    390             </section>
    391             <section title="BCP47 Registration Form" anchor="regform">
    392                 <t>
    393                     Per
    394                     <xref target="BCP47">RFC 5646, Section 3.7</xref>:
    395                 </t>
    396                 <figure>
    397                     <artwork>
    398 %%
    399 Identifier: t
    400 Description: Specifying Transformed Content
    401 Comments: Subtags for the identification of content that has been
    402 transformed, including but not limited to: 
    403 transliteration, transcription, and translation.
    404 Added: 2010-mm-dd
    405 RFC: [TBD]
    406 Authority: Unicode Consortium
Contact_Email: cldr-contact@unicode.org
Mailing_List: cldr-users@unicode.org
    409 URL: http://www.unicode.org/Public/cldr/latest/core.zip
    410 %% </artwork>
    411                 </figure>
    412 
    413             </section>
    414             <section title="Field Definitions" anchor="summary">
    415                 <t>Assignment of 't' field subtags is determined by the Unicode CLDR
    416                     Technical Committee, in accordance with the policies and procedures
    417                     in
    418                     <eref target="http://www.unicode.org/consortium/tc-procedures.html">http://www.unicode.org/consortium/tc-procedures.html</eref>,
    419                     and subject to the Unicode Consortium Policies on
    420                     <eref target="http://www.unicode.org/policies/policies.html">http://www.unicode.org/policies/policies.html</eref>.</t>
    421                 <t>
    422                     Assignments that can be made by successive versions of
    423                     <xref target="UTS35">LDML</xref>
    424                     by the Unicode Consortium without requiring a new RFC include:
    425                     <list style="symbols">
    426                     <t>The
    427                     allocation of new field separator subtags for use after the 't' extension.</t>
    428                     <t>The allocation of subtags valid after a field separator subtag.</t>
    429                     <t>The addition of subtag aliases and descriptions. </t>
    430                     <t>The modification of subtag descriptions.</t>
    431                     </list>
    432                     Changes to the syntax or meaning of the 't' extension would require a new 
    433                     RFC that obsoletes this document; such an RFC would break stability, and
    434                     would thus be contrary to the policies of the Unicode Consortium.
    435                 </t>
    436 				<t>
    437 				  At the time this document was published, one field was specified in 
    438 				  <xref target="UTS35"></xref>: the transform mechanism.
    439                   That field is summarized here:
    440 					<list style="letters">
    441 						<t>
    442 							The transform mechanism consists of a sequence of
    443 							subtags
    444 							starting
    445 							with the 'm0' separator followed by one or more
    446 							mechanism subtags.
    447 							Each mechanism subtag has a length of 3 to 8
    448 							alphanumeric
    449 							characters.
    450 							The sequence as a whole provides an
    451 							identification of the
    452 							specification
    453 							for the transform,
    454 							such as the
    455 							mechanism subtag 'ungegn' in
    456 							"und-Cyrl-t-und-latn-m0-ungegn".
    457 							In
    458 							many cases, only one mechanism subtag is necessary, but
    459 							multiple
    460 							subtags MAY be defined in
    461 							<xref target="UTS35"></xref>
    462 							where necessary.
    463 						</t>
    464 						<t>
    465 							Any purely numeric subtag is a representation of a date in the
    466 							Gregorian calendar.
    467 							It MAY occur in any mechanism field, but it SHOULD only be used where necessary.
    468 							If it does occur:
    469 							<list style="symbols">
    470 								<t>it MUST occur as the final subtag in the field</t>
    471 								<t>it MUST NOT be the only subtag in the field</t>
    472 								<t>it MUST only consist of a sequence of digits of the form YYYY,
    473 									YYYYMM, or YYYYMMDD</t>
    474                                 <t>it SHOULD be as short as possible</t>
    475                             <t>Note: The format is related to that of <xref target="RFC3339"></xref>, but is not the same.
    476                             The RFC 3339 full-date won't work because it uses hyphens. The offset ("Z") is not used
    477                             because the date is a publication date (aka 'floating date'). For more information, see
    478                              Section 3.3, Floating Time in 
    479                              <xref target="W3C-TimeZones"></xref>.</t>
    480 							</list>
    481 							Examples:
    482 							<list style="symbols">
    483 							<t>20110623 represents June 23rd, 2011.</t>
    484 							<t>There are 3 dated versions of the UNGEGN transliteration
    485                             specification for Hebrew to Latin. They can be represented by the following language tags:
    486                             <list style="symbols">
    487                                 <t>und-Hebr-t-und-Latn-m0-ungegn-1972</t>
    488                                 <t>und-Hebr-t-und-Latn-m0-ungegn-1977</t>
    489                                 <t>und-Hebr-t-und-Latn-m0-ungegn-2007</t>
    490                             </list>
    491 							</t>
    492 							<t>Suppose that the BGN transliteration
    493 							specification for Cyrillic to Latin had three versions,
    494 							dated
    495 							June 11th, 1999; Dec 30th, 1999; and May 1st, 2011.
    496 							In that
    497 							case, the corresponding first two DATE subtags would require
    498 							months
    499 							to be distinctive (199906 and 199912), but the last
    500 							subtag
    501 							would only
    502 							require the year (2011).</t>
    503 							</list>
    504 						</t>
    505 						<t>
    506 							Some mechanisms may use a versioning system that is not
    507 							distinguished by date, or not by date alone.
    508 							In the latter case,
    509 							the version will be of a form specified by
    510 							<xref target="UTS35"></xref>
    511 							for that mechanism.
    512 							For example, if the mechanism XXX uses
    513 							versions of the form v21a,
    514 							then a tag could look like
    515 							"ja-t-it-m0-xxx-v21a". If there are
    516 							multiple subversions
    517 							distinguished by date,
    518 							then a tag could look like
    519 							"ja-t-it-m0-xxx-v21a-2007".
    520 						</t>
    521 					</list>
    522 					
    523 				</t>
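				<t>The date rules above lend themselves to a small Python sketch. All
				function names here are hypothetical. The sketch assumes that a
				mechanism field is given as the list of subtags following 'm0', and that
				"as short as possible" means the shortest of YYYY, YYYYMM, or YYYYMMDD
				that no other version date shares as a prefix; that reading is an
				assumption based on the BGN example, not a rule stated by this document.</t>
				<figure><artwork><![CDATA[
```python
import re
from datetime import date

# YYYY, YYYYMM, or YYYYMMDD, ASCII digits only
DATE_SUBTAG = re.compile(r"[0-9]{4}([0-9]{2}([0-9]{2})?)?")


def is_date_subtag(subtag):
    """True for digits of the form YYYY, YYYYMM, or YYYYMMDD."""
    if not DATE_SUBTAG.fullmatch(subtag):
        return False
    year = int(subtag[:4])
    month = int(subtag[4:6] or 1)  # missing month or day: use 1
    day = int(subtag[6:8] or 1)
    try:
        date(year, month, day)     # rejects month 13, day 32, ...
    except ValueError:
        return False
    return True


def check_mechanism_field(subtags):
    """Check the date rules for the subtags that follow 'm0'."""
    last = len(subtags) - 1
    for i, s in enumerate(subtags):
        if re.fullmatch(r"[0-9]+", s):
            # A numeric subtag must be final, must not be the only
            # subtag, and must have one of the three date forms.
            if i != last or i == 0 or not is_date_subtag(s):
                return False
    return True


def shortest_date_subtags(dates):
    """Map each YYYYMMDD string to its shortest distinctive prefix."""
    result = {}
    for d in dates:
        for length in (4, 6, 8):
            prefix = d[:length]
            if all(o == d or not o.startswith(prefix) for o in dates):
                break
        result[d] = prefix
    return result
```
]]></artwork></figure>
				<t>Applied to the hypothetical BGN dates 19990611, 19991230, and 20110501,
				shortest_date_subtags yields 199906, 199912, and 2011, matching the
				example above.</t>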
    524 				<t>A language tag with the 't' extension MAY be used to request a specific transform of content.
    525 				In such a case, the recipient SHOULD return content that corresponds
    526 				as closely as feasible to the requested transform, including the specification of the mechanism.
    527 				For example, if the request is ja-t-it-m0-xxx-v21a-2007,
    528 				and the recipient has content corresponding to both ja-t-it-m0-xxx-v21a and ja-t-it-m0-xxx-v21b-2009, then the v21a version would be preferred.
    529 				As is the case for language matching as discussed in <xref target="BCP47"></xref>,
    530 				different implementations MAY have different measures of "closeness".</t>
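				<t>One possible measure of "closeness", shown here only as an illustration,
				is the number of leading subtags that a candidate shares with the request.
				The Python sketch below uses that measure; the function names are
				hypothetical, and implementations are free to choose a different measure.</t>
				<figure><artwork><![CDATA[
```python
def common_prefix_length(a, b):
    """Count the leading subtags two tags share (case-insensitive)."""
    n = 0
    for x, y in zip(a.lower().split("-"), b.lower().split("-")):
        if x != y:
            break
        n += 1
    return n


def best_match(requested, available):
    """Pick the tag sharing the longest subtag prefix with the request.

    Illustrative measure only; ties go to the earliest candidate.
    """
    return max(available,
               key=lambda tag: common_prefix_length(requested, tag))
```
]]></artwork></figure>
				<t>With the request ja-t-it-m0-xxx-v21a-2007, the candidate
				ja-t-it-m0-xxx-v21a shares six leading subtags and
				ja-t-it-m0-xxx-v21b-2009 shares five, so this measure selects the v21a
				version, consistent with the example above.</t>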
    531 			</section>
    532 			<section title="Registration of Field Subtags" anchor="registration">
    533 				<t>Registration of transform mechanisms is requested by filing a ticket at
    534 					<eref target="http://cldr.unicode.org/">cldr.unicode.org</eref>.
    535 					The proposal in the ticket MUST contain the following information:</t>
    536 				<texttable>
    537                     <ttcol>Item</ttcol>
    538                     <ttcol>Description</ttcol>
    539                     <c>Subtag</c>
    540                     <c>The proposed mechanism subtag (or subtag sequence).</c>
    541                     <c>Description</c>
    542                     <c>A description of the proposed mechanism; that description MUST be sufficient to distinguish it from other mechanisms in use.</c>
    543                     <c>Version</c>
    544                     <c>If versioning for the mechanism is not done according to date, then a description of the versioning conventions used for the mechanism.</c>
    545 				</texttable>
    546                 <t>Proposals for clarifications of descriptions or additional aliases may also be requested by filing a ticket.</t>
    547                 <t>The committee MAY define a template for submissions that requests more information,
    548                  if it is found that such information would be useful in evaluating proposals.</t>
    549 			</section>
    550             <section title="Registration of Additional Fields" anchor="field-registration">
    551                 <t>In the event that it proves necessary to add an additional field (such as 'm2'),
    552                 it can be requested by filing a ticket at
    553                     <eref target="http://cldr.unicode.org/">cldr.unicode.org</eref>.
    554                     The proposal in the ticket MUST contain a full description of the
    555                     proposed field semantics and subtag syntax,
                    and MUST conform to the ABNF syntax for "field" presented in <xref target="structure" />.</t>
    557             </section>
    558             <section title="Committee Responses to Registration Proposals" anchor="committee-responses">
    559                 <t>The committee MUST post each proposal publicly within 2 weeks after reception,
    560                 to allow for comments. The committee must respond publicly to each proposal within 4 weeks after reception.</t>
    561                 <t>The response MAY:
    562                     <list style="symbols">
    563                         <t>request more information or clarification</t>
    564                         <t>accept the proposal, optionally with modifications to the subtag or description</t>
    565                         <t>reject the proposal, because of significant objections raised on the mailing list or 
    566                         due to problems with constraints in this document or in <xref target="UTS35"></xref></t>
    567                     </list>
    568                 </t>
    569                 <t>Accepted tickets result in a new entry in the machine-readable CLDR BCP47 data,
    570                 or in the case of a clarified description,
    571                 modifications to the description attribute value for an existing entry.</t>
    572             </section>
    573             <section title="Machine-Readable Data" anchor="machine-readable">
    574 				<t>
    575 					EDITORIAL NOTE: The following parallels the structure used for the
    576 					'u' extension
    577 					<xref target="RFC6067"></xref>,
    578 					for which the Unicode Consortium is the maintaining authority.
    579 					The
    580 					data and
    581 					specification will be available by the time this internet
    582 					draft has
    583 					been
    584 					approved. The description field is in the process of being added to CLDR.
    585 				</t>
    586 				<t>
    587 					Beginning with CLDR version 1.7.2, machine-readable files are
    588 					available listing the data defined for BCP47 extensions for each
    589 					successive version of
    590 					<xref target="UTS35"></xref>. These releases are listed on
    591 					<eref target="http://cldr.unicode.org/index/downloads">http://cldr.unicode.org/index/downloads</eref>.
    592 					Each release has an associated data directory of the form
					"http://unicode.org/Public/cldr/&lt;version&gt;", where
    594 					"&lt;version&gt;" is replaced by the release number. For example,
    595 					for version 1.7.2, the "core.zip" file is located at
    596 					<eref target="http://unicode.org/Public/cldr/1.7.2/">http://unicode.org/Public/cldr/1.7.2/core.zip</eref>.
    597 					The most
    598                     recent version is always identified by the version "latest" and can
    599                     be accessed by the URL in
    600                     <xref target="regform"></xref>.</t>
    601 			     <t>Inside the "core.zip" file, the directory "common/bcp47" contains the
    602 					data files listing the valid attributes, keys, and types for each successive version of <xref target="UTS35"></xref>.
					Each data file lists the keys and types relevant to that topic. For example, transform.xml contains the subtags (types) for the 't' mechanisms.</t>
    604 					<t>The XML structure lists the keys, such as &lt;key extension="t" name="m0" alias="collation" description="Transliteration extension mechanism">, with subelements for the types, 
    605 					such as &lt;type name="ungegn" description="United Nations Group of Experts on Geographical Names"/>. The currently defined attributes for the mechanisms include:</t>
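					<t>A consumer of this data might read the key and type elements along the
					lines of the following Python sketch. The embedded sample is modeled on
					the structure just described and is an assumption: an actual data file
					may wrap these elements in further markup. The function name
					mechanism_types is hypothetical.</t>
					<figure><artwork><![CDATA[
```python
import xml.etree.ElementTree as ET

# Sample fragment modeled on the key/type structure in the text;
# a real file from core.zip may nest this inside other elements.
SAMPLE = """
<key extension="t" name="m0"
     description="Transliteration extension mechanism">
  <type name="ungegn"
        description="United Nations Group of Experts on Geographical Names"/>
  <type name="bgn" description="US Board on Geographic Names"/>
</key>
"""


def mechanism_types(xml_text):
    """Return {type name: description} for the 't' extension 'm0' key."""
    root = ET.fromstring(xml_text)
    # iter() also visits the root element, so this works whether
    # <key> is the root or is nested deeper in the document.
    for key in root.iter("key"):
        if key.get("extension") == "t" and key.get("name") == "m0":
            return {t.get("name"): t.get("description")
                    for t in key.findall("type")}
    return {}
```
]]></artwork></figure>
					<t>Calling mechanism_types(SAMPLE) returns a mapping from each mechanism
					subtag in the sample to its description.</t>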
				<texttable>
					<ttcol>Attribute</ttcol>
					<ttcol>Description</ttcol>
					<ttcol>Examples</ttcol>

					<c>name</c>
					<c>The name of the mechanism, limited to 3-8 characters (or sequences of them).</c>
					<c>UNGEGN, ALALOC</c>

					<c>description</c>
					<c>A description of the name, containing exactly the information needed to distinguish the name
					from others with which it might be confused. Descriptions are not intended to provide general background information.</c>
					<c>United Nations Group of Experts on Geographical Names; American Library Association-Library of Congress</c>

					<c>since</c>
					<c>Indicates the first version of CLDR where the name appears. (Required for new items.)</c>
					<c>1.9, 2.0.1</c>

					<c>alias</c>
					<c>Alternative name of the key or type, not limited in number of characters. Aliases are intended for backwards compatibility,
					not to provide all possible alternate names or designations. (Optional)</c>
					<c></c>

				</texttable>
				<t>The file for the transform extension is "transform.xml".
				The initial version of that file contains the following information.</t>
				<figure><artwork>
&lt;key extension="t" name="m0" description=
      "Transliteration extension mechanism">
   &lt;type name="ungegn" description=
      "United Nations Group of Experts on Geographical Names"/>
   &lt;type name="alaloc" description=
      "American Library Association-Library of Congress"/>
   &lt;type name="bgn" description=
      "US Board on Geographic Names"/>
   &lt;type name="mcst" description=
      "Korean Ministry of Culture, Sports and Tourism"/>
   &lt;type name="iso" description=
      "International Organization for Standardization"/>
   &lt;type name="din" description=
      "Deutsches Institut fuer Normung"/>
   &lt;type name="gost" description=
      "Euro-Asian Council for Standardization, Metrology
       and Certification"/>
&lt;/key>
				</artwork></figure>
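				<t>The key and type structure can be read with any XML parser. The following is a minimal sketch, assuming Python's standard xml.etree.ElementTree module; the helper name read_types is hypothetical, and the sample is abbreviated to three of the types listed above.</t>
				<figure><artwork><![CDATA[
```python
import xml.etree.ElementTree as ET

# Abbreviated sample in the shape of the transform.xml data.
SAMPLE = """\
<key extension="t" name="m0" description=
      "Transliteration extension mechanism">
   <type name="ungegn" description=
      "United Nations Group of Experts on Geographical Names"/>
   <type name="alaloc" description=
      "American Library Association-Library of Congress"/>
   <type name="bgn" description=
      "US Board on Geographic Names"/>
</key>
"""

def read_types(xml_text):
    """Map each type name under a <key> element to its
    description, in document order."""
    key = ET.fromstring(xml_text)
    return {
        t.get("name"): t.get("description")
        for t in key.findall("type")
    }

types = read_types(SAMPLE)
```
]]></artwork></figure>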
				<t>
					To get the version information in XML when working with the data
					files, the XML parser must be validating. When the 'core.zip' file
					is unzipped, the 'dtd' directory will be at the same level as the
					'bcp47' directory; that is required for correct validation. For
					each release after CLDR 1.8, types introduced in that release are
					also marked in the data files by the XML attribute "since", such as
					in the following example:
					<figure>
						<artwork>&lt;type name="adp" since="1.9"/&gt; </artwork>
					</figure>
				</t>
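				<t>As a hedged sketch of using the "since" attribute, the following Python selects the types introduced in or after a given release. It assumes the attribute is written explicitly on the element: xml.etree.ElementTree is a non-validating parser and does not apply any defaults declared in the DTD. The helper names and the type "xmpl" are hypothetical.</t>
				<figure><artwork><![CDATA[
```python
import xml.etree.ElementTree as ET

def version_tuple(text):
    """Turn a release string such as "2.0.1" into (2, 0, 1)
    so that releases compare numerically."""
    return tuple(int(part) for part in text.split("."))

def types_since(xml_text, minimum):
    """Return the names of <type> elements whose explicit
    "since" attribute is at least `minimum`."""
    root = ET.fromstring(xml_text)
    floor = version_tuple(minimum)
    return [
        t.get("name")
        for t in root.iter("type")
        if t.get("since")
        and version_tuple(t.get("since")) >= floor
    ]

# "xmpl" is an invented type used only for this example.
SAMPLE = """\
<key extension="t" name="m0">
   <type name="ungegn"/>
   <type name="adp" since="1.9"/>
   <type name="xmpl" since="2.0.1"/>
</key>
"""

recent = types_since(SAMPLE, "2.0")
```
]]></artwork></figure>
				<t>Comparing tuples of integers rather than raw strings keeps a release such as "1.10" ordered after "1.9".</t>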
				<t>
					The data is also currently maintained in a source code repository,
					with each release tagged, for viewing directly without unzipping.
					For example, see:
					<list style="symbols">
						<t>http://unicode.org/repos/cldr/tags/release-1-7-2/common/bcp47/</t>
						<t>http://unicode.org/repos/cldr/tags/release-1-8/common/bcp47/</t>
					</list>
				</t>
				<t>For more information, see
				<eref target="http://cldr.unicode.org/index/bcp47-extension">http://cldr.unicode.org/index/bcp47-extension</eref>.</t>
			</section>
		</section>
		<section anchor="Acknowledgements" title="Acknowledgements">
			<t>Thanks to John Emmons and the rest of the Unicode CLDR Technical
				Committee for their work in developing the BCP 47 subtags
				for LDML.</t>
		</section>

		<section anchor="IANA" title="IANA Considerations">
			<t>
				This document will require IANA to insert the record of
				<xref target="regform"></xref>
				into the Language Extensions Registry, according to
				Section 3.7, "Extensions and the Extensions Registry", of
				"Tags for Identifying Languages" in
				<xref target="BCP47"></xref>. Per Section 5.2 of
				<xref target="BCP47"></xref>, there might be occasional (rare) requests by the Unicode
				Consortium (the "Authority" listed in the record) for maintenance of
				this record. Changes that can be submitted to IANA without the
				publication of a new RFC are limited to modification of the
				Comments, Contact_Email, Mailing_List, and URL fields. Any such
				requested changes MUST use the domain 'unicode.org' in any new
				addresses or URIs, MUST explicitly cite this document (so that IANA
				can reference these requirements), and MUST originate from the
				'unicode.org' domain. The domain or authority can only be changed
				via a new RFC.
			</t>
			<t>This document does not require IANA to create or maintain a new
				registry or otherwise impact IANA.</t>
		</section>

		<section anchor="Security" title="Security Considerations">
			<t>
				The security considerations for this extension are the same as those
				for
				<xref target="BCP47"></xref>. See
				<xref target="BCP47">RFC 5646, Section 6, Security Considerations</xref>.
			</t>
		</section>
	</middle>



	<back>
		<references title="Normative References">
			<reference anchor="UTS35" target="http://www.unicode.org/reports/tr35/">
				<front>
					<title abbrev="LDML">
						Unicode Technical Standard #35: Locale Data
						Markup Language (LDML)
					</title>
					<author initials="M" surname="Davis" fullname="Mark Davis">
						<organization>Unicode Consortium</organization>
					</author>
					<date day="21" month="December" year="2007" />
				</front>
			</reference>
			<reference anchor="BCP47">
				<front>
					<title abbrev="BCP47">Tags for Identifying Languages (BCP47)</title>
					<author initials="M.E." surname="Davis" fullname="Mark Davis"
						role="editor">
						<organization>Google</organization>
					</author>
					<author initials="A." surname="Phillips" fullname="Addison Phillips"
						role="editor">
						<organization>Lab126</organization>
					</author>
					<date month="September" year="2009" />
				</front>
			</reference>
			<reference anchor="RFC6067">
				<front>
					<title abbrev="RFC6067">BCP 47 Extension U</title>
					<author initials="M.E." surname="Davis" fullname="Mark Davis"
						role="editor">
						<organization>Google</organization>
					</author>
					<author initials="A." surname="Phillips" fullname="Addison Phillips"
						role="editor">
						<organization>Lab126</organization>
					</author>
					<author initials="Y." surname="Umaoka" fullname="Yoshito Umaoka"
						role="editor">
						<organization>IBM</organization>
					</author>
					<date month="December" year="2010" />
				</front>
			</reference>
			<reference anchor="RFC5234">
				<front>
					<title>Augmented BNF for Syntax Specifications: ABNF</title>
					<author initials="D." surname="Crocker" fullname="Dave Crocker"
						role="editor">
						<organization>Brandenburg InternetWorking</organization>
					</author>
					<author initials="P." surname="Overell" fullname="Paul Overell">
						<organization>THUS plc.</organization>
					</author>
					<date month="January" year="2008" />
					<abstract>
						<t>Internet technical specifications often need to define a formal
						syntax. Over the years, a modified version of Backus-Naur Form
						(BNF), called Augmented BNF (ABNF), has been popular among many
						Internet specifications. The current specification documents ABNF.
						It balances compactness and simplicity with reasonable
						representational power. The differences between standard BNF and
						ABNF involve naming rules, repetition, alternatives,
						order-independence, and value ranges. This specification also supplies
						additional rule definitions and encoding for a core lexical analyzer
						of the type common to several Internet specifications.</t>
					</abstract>
				</front>
			</reference>
		</references>
		<references title="Informative References">
			<reference anchor="ldml-registry">
				<front>
					<title>Registry for Common Locale Data Repository tag elements</title>
					<author fullname="Unicode Consortium"></author>
					<date year="2009" month="September" />
				</front>
			</reference>
			<reference anchor="W3C-TimeZones" target="http://www.w3.org/TR/2011/NOTE-timezone-20110705/">
				<front>
					<title>W3C Working Group Note: Working with Time Zones</title>
					<author surname="Phillips" fullname="Addison Phillips" role="editor">
						<organization>W3C</organization>
					</author>
					<date year="2011" month="July" />
				</front>
			</reference>
			<reference anchor="RFC3339">
				<front>
					<title>Date and Time on the Internet: Timestamps</title>
					<author surname="Klyne" fullname="Graham Klyne"
						role="editor">
						<organization>Clearswift Corporation</organization>
					</author>
					<author surname="Newman" fullname="Chris Newman"
						role="editor">
						<organization>Sun Microsystems</organization>
					</author>
					<date month="July" year="2002" />
					<abstract>
						<t>This document defines a date and time format for use in
						Internet protocols that is a profile of the ISO 8601 standard
						for representation of dates and times using the Gregorian
						calendar.</t>
					</abstract>
				</front>
			</reference>
		</references>


	</back>
</rfc>