<?xml version="1.0" encoding="US-ASCII"?>
<!DOCTYPE rfc SYSTEM "rfc2629.dtd" [
<!ENTITY RFC4646 SYSTEM "http://xml.resource.org/public/rfc/bibxml/reference.RFC.4646.xml">
<!ENTITY rfc5646 SYSTEM "http://xml.resource.org/public/rfc/bibxml/reference.RFC.5646.xml">
]>
<?xml-stylesheet type='text/xsl' href='rfc2629.xslt' ?>

<?rfc strict="yes" ?>
<?rfc toc="yes"?>
<?rfc tocdepth="4"?>
<?rfc symrefs="yes"?>
<?rfc sortrefs="yes" ?>
<?rfc compact="yes" ?>
<?rfc subcompact="no" ?>

<rfc category="bcp" docName="draft-davis-u-langtag-ext-00" ipr="trust200902" submissionType="independent">
  <front>
    <title abbrev="BCP 47 Unicode Locale Extension">BCP 47 Extension U</title>

    <author fullname="Mark Davis" initials="M.E." surname="Davis">
      <organization>Google</organization>

      <address>
        <email>mark@macchiato.com</email>
      </address>
    </author>

    <author fullname="Addison Phillips" initials="A" surname="Phillips">
      <organization>Lab126</organization>

      <address><email>addison@inter-locale.com</email></address>
    </author>

    <author initials="Y" surname="Umaoka" fullname="Yoshito Umaoka">
      <organization abbrev="IBM">IBM</organization>

      <address><email>yoshito_umaoka@us.ibm.com</email></address>
    </author>

    <date month="January" year="2010" day="13"/>

    <!-- Meta-data Declarations -->

    <area>General</area>

    <workgroup>Internet Engineering Task Force</workgroup>

    <keyword>locale</keyword><keyword>bcp 47</keyword>

    <!-- Keywords will be incorporated into HTML output
         files in a meta tag but they have no effect on text or nroff
         output. If you submit your draft to the RFC Editor, the
         keywords will be used for the search engine. -->

    <abstract>
      <t>This document specifies an Extension to BCP 47
      that provides subtags for specifying language and/or locale-based behavior or refinements to language tags, according
      to work done by the Unicode Consortium.</t>
    </abstract>
  </front>

  <middle>
    <section title="Introduction">
      <t><xref target="BCP47"></xref> permits the definition and registration of language tag extensions "that
      contain a language component and are compatible with applications that
      understand language tags". This document defines an extension for identifying Unicode locale-based variations using language tags. The "singleton" identifier for this extension is 'u'.</t>

      <section title="Requirements Language">
        <t>The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
        "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
        document are to be interpreted as described in RFC 2119.</t>
      </section>
    </section>

    <?rfc needLines="8" ?>

    <section title="BCP47 Required Information">
      <t>Language tags, as defined by <xref target="BCP47"></xref>, are useful for identifying the language of content. They are also used as locale identifiers (or can be mapped to locales) in many operating environments and APIs. However, most such locale identifiers also provide additional "tailorings" or options for specific values within a language, culture, region, or other variation. This extension provides a mechanism for using these additional tailorings within language tags for general interchange.</t>

      <t>The maintaining authority for this extension's registry is the Unicode
      Consortium. Unicode defines common locale data and identifiers for this data:</t>

      <texttable>
        <ttcol>Item</ttcol>

        <ttcol>Value</ttcol>

        <c>Name</c>

        <c>Unicode Consortium</c>

        <c>Contact Email</c>

        <c>cldr@unicode.org</c>

        <c>Discussion List Email</c>

        <c>cldr-users@unicode.org</c>

        <c>URL Location</c>

        <c>cldr.unicode.org</c>

        <c>Specification</c>

        <c>Unicode Technical Standard #35 Unicode Locale Data Markup Language
        (LDML), http://unicode.org/reports/tr35/</c>

        <c>Section</c>

        <c>Section 3.2 BCP 47 Tag Conversion</c>
      </texttable>

      <t>The specification of extension subtags is provided by Section 3 of <xref target="LDML">Unicode
      Technical Standard #35 Unicode Locale Data Markup Language</xref>. As required by BCP 47, subtags follow the language tag ABNF and
      other rules for the formation of language tags and subtags, are restricted to the ASCII letters and digits, are not case sensitive, and do not exceed eight characters in length.</t>
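      <t>As a non-normative illustration, the shape constraints above can be checked mechanically. The Python sketch below assumes the extension subtag production from the BCP 47 ABNF (two to eight ASCII letters or digits); the function name is hypothetical, and the sketch does not check whether a subtag is actually defined by LDML.</t>

      <figure><artwork><![CDATA[
```python
import re

# Assumption: an extension subtag is 2 to 8 ASCII letters or digits,
# per the BCP 47 ABNF. Both cases are accepted because subtags are
# not case sensitive.
_SUBTAG = re.compile(r"[A-Za-z0-9]{2,8}")


def is_wellformed_subtag(subtag: str) -> bool:
    """Return True if `subtag` has the shape of an extension subtag.

    This checks shape only; it does not consult the LDML data files.
    """
    return _SUBTAG.fullmatch(subtag) is not None
```
]]></artwork></figure>

      <t>Under these assumptions, "phonebk" and "CO" pass the check, while a nine-character subtag or one containing an underscore does not.</t>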
      <t><xref target="LDML"></xref> specifies a canonical
      representation. LDML is available over the Internet at no cost,
      under a royalty-free license described at
      http://unicode.org/copyright.html. LDML is versioned, and each
      version of LDML is numbered, dated, and stable. Extension subtags, once
      defined by LDML, are never retracted and never change substantially
      in meaning.</t>

      <section title="Summary">
        <t>The subtags available for use in the 'u' extension consist of a set of attributes, keys, and types. Attributes, keys, types, and their respective meanings are defined in Section 3 (Unicode Language and Locale Identifiers) of <xref target="LDML"></xref>. The following is a summary of that definition (for details see Section 3):
          <list style="symbols">
            <t>An 'attribute' is a subtag with a length of three or more characters following the singleton and preceding any 'keyword' sequences. No attributes were defined at the time of this document's publication.</t>
            <t>A 'keyword' is a sequence of subtags consisting of a 'key' subtag, followed by zero or more 'type' subtags. Each 'key' MUST be unique within the extension. The order of the 'type' subtags within a 'keyword' is sometimes significant to their interpretation. Note that a 'key' can appear without a subsequent 'type' subtag.
              <list style="letters">
                <t>A 'key' is a subtag with a length of exactly two characters. Each 'key' is followed by zero or more 'type' subtags.</t>
                <t>A 'type' is a subtag with a length of three or more characters following a key. 'Type' subtags are specific to a particular 'key' and the order of the 'type' subtags MAY be significant to the interpretation of the 'keyword'.</t>
              </list>
            </t>
          </list>
        </t>

        <t>For example, the language tag "de-DE-u-attr-co-phonebk" consists of:
          <list style="symbols">
            <t>The base language tag "de-DE" (German as used in Germany), exactly as defined by <xref target="BCP47"></xref> using subtags from the IANA Language Subtag Registry.</t>
            <t>The singleton 'u', identifying this extension.</t>
            <t>The attribute 'attr', which is an example for illustration (no attributes were defined at the time this document was published).</t>
            <t>The keyword 'co-phonebk', consisting of the key 'co' (Collation) and the type 'phonebk' (Phonebook collation order).</t>
          </list>
        </t>

        <t>With successive versions of <xref target="LDML"></xref>, additional attributes, keys, and types MAY be defined. Once defined, attributes, keys, and types will never be removed. Machine-readable files listing the valid attributes, keys, and types are available in the CLDR repository for each version. For example, for version 1.7.2, the files are located at <eref target="http://unicode.org/repos/cldr/tags/release-1-7-2/common/bcp47/">http://unicode.org/repos/cldr/tags/release-1-7-2/common/bcp47/</eref>. These files can also contain aliases that were used in previous versions of <xref target="LDML"></xref>.</t>

        <section title="Canonicalization" anchor="canonicalization">
          <t>As required by <xref target="BCP47"></xref>, case is not significant. The canonical form for all subtags in the extension is lowercase. The canonical order of attributes is in <xref target="US-ASCII"></xref> order (that is, numbers before letters, with letters sorted as lowercase US-ASCII code points). The canonical order of keywords is in <xref target="US-ASCII"></xref> order by key. The order of subtags within a keyword is significant; the meaning of this extension is altered if those subtags are rearranged. Thus, the canonical form of the extension never reorders the subtags within a keyword.</t>
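          <t>The rules above can be sketched in code. The following non-normative Python example is a minimal illustration, not a reference implementation: it assumes that the input is only the extension portion of a tag (starting at the singleton) and that every subtag is already well-formed. The function name is hypothetical. It splits the extension into attributes and keywords by subtag length, rejects duplicate keys, lowercases everything, sorts attributes and keywords, and leaves the order of types within each keyword untouched.</t>

          <figure><artwork><![CDATA[
```python
def canonicalize_u_extension(ext: str) -> str:
    """Canonicalize a 'u' extension such as 'u-co-phonebk-ca-gregory'.

    Assumption: `ext` holds only the extension (singleton first) and
    every subtag is well-formed; no LDML validity check is done.
    """
    subtags = ext.lower().split("-")  # canonical form is lowercase
    if subtags[0] != "u":
        raise ValueError("not a 'u' extension")

    attributes = []
    keywords = []  # (key, [types]) pairs in their original order
    for subtag in subtags[1:]:
        if len(subtag) == 2:
            # A two-character subtag is a key and starts a keyword.
            keywords.append((subtag, []))
        elif keywords:
            # Longer subtags after a key are types of that key.
            keywords[-1][1].append(subtag)
        else:
            # Longer subtags before any key are attributes.
            attributes.append(subtag)

    keys = [key for key, _ in keywords]
    if len(set(keys)) != len(keys):
        raise ValueError("each key must be unique in the extension")

    # Lowercase ASCII strings sort by code point: digits, then letters.
    attributes.sort()
    keywords.sort(key=lambda keyword: keyword[0])

    parts = ["u"] + attributes
    for key, types in keywords:
        parts.append(key)
        parts.extend(types)  # type order within a keyword is preserved
    return "-".join(parts)
```
]]></artwork></figure>

          <t>For instance, canonicalize_u_extension("U-CO-PhoneBk-CA-Gregory") returns "u-ca-gregory-co-phonebk": the keywords are reordered by key, while the types within each keyword keep their position.</t>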
        </section>
      </section>

      <section title="Registration Form" anchor="regform">
        <t>Per <xref target="RFC5646"></xref>, Section 3.7:</t>

        <figure><artwork>%%
Identifier: u
Description: Unicode Locale
Comments: Subtags for the identification of language and cultural
    variations. Used to set behavior in locale APIs.
Added: 2009-mm-dd
RFC: [TBD]
Authority: Unicode Consortium
Contact_Email: cldr@unicode.org
Mailing_List: cldr-users@unicode.org
URL: http://cldr.unicode.org
%%</artwork></figure>
      </section>
    </section>

    <section anchor="Acknowledgements" title="Acknowledgements">
      <t>Thanks to John Emmons and the rest of the Unicode
      CLDR Technical Committee for their work in developing the BCP 47 subtags
      for LDML.</t>
    </section>

    <section anchor="IANA" title="IANA Considerations">
      <t>This document will require IANA to insert the record in <xref target="regform"></xref> into the Language Extensions Registry, according to
      Section 3.7 ("Extensions and the Extensions Registry") of "Tags for
      Identifying Languages" in <xref target="BCP47"></xref>. There might be occasional maintenance of this record. This document does not require IANA to create or maintain a new registry and has no other impact on IANA.</t>
    </section>

    <section anchor="Security" title="Security Considerations">
      <t>The security considerations for this extension are the same as those
      for <xref target="RFC5646"></xref> (or its successors). See Section 6 ("Security Considerations")
      of <xref target="RFC5646"></xref>.</t>
    </section>
  </middle>

  <back>
    <references title="Normative References">
      &rfc5646;

      <reference anchor="LDML" target="http://www.unicode.org/reports/tr35/">
        <front>
          <title abbrev="LDML">Unicode Technical Standard #35: Locale Data Markup Language (LDML)</title>
          <author initials="M" surname="Davis" fullname="Mark Davis">
            <organization>Unicode Consortium</organization>
          </author>
          <date day="21" month="December" year="2007"/>
        </front>
      </reference>

      <reference anchor="BCP47">
        <front>
          <title abbrev="BCP47">Tags for the Identification of Language (BCP47)</title>
          <author initials="M.E." surname="Davis" fullname="Mark Davis" role="editor">
            <organization><?xm-replace_text {organization}?></organization>
          </author>
          <date month="September" year="2009"/>
        </front>
      </reference>

      <reference anchor="US-ASCII">
        <front>
          <title>ISO/IEC 646:1991, Information technology -- ISO 7-bit coded character set for information interchange.</title>
          <author>
            <organization>International Organization for Standardization</organization>
          </author>
          <date year="1991"/>
          <abstract>
            <t>This standard defines an International Reference Version (IRV) which corresponds exactly to what is widely known as ASCII or US-ASCII. ISO/IEC 646 was based on the earlier standard ECMA-6. ECMA has maintained its standard up to date with respect to ISO/IEC 646 and makes an electronic copy available at http://www.ecma-international.org/publications/standards/Ecma-006.htm. ISO/IEC 646 JTC 1/SC 2</t>
          </abstract>
        </front>
      </reference>
    </references>

    <references title="Informative References">
      <reference anchor="ldml-registry">
        <front>
          <title>Registry for Common Locale Data Repository tag elements</title>
          <author fullname="Unicode Consortium">
            <organization><?xm-replace_text {organization}?></organization>
          </author>
          <date year="2009" month="September"/>
        </front>
      </reference>
    </references>
  </back>
</rfc>