/*
 *******************************************************************************
 * Copyright (C) 2004-2011, International Business Machines Corporation and    *
 * others. All Rights Reserved.                                                *
 * Copyright (C) 2009 , Yahoo! Inc.                                            *
 *******************************************************************************
 */
package com.ibm.icu.text;

import java.io.IOException;
import java.io.ObjectInputStream;
import java.text.FieldPosition;
import java.text.Format;
import java.text.ParsePosition;

import com.ibm.icu.impl.PatternProps;

/**
 * <p><code>SelectFormat</code> supports the creation of internationalized
 * messages by selecting phrases based on keywords. The pattern specifies
 * how to map keywords to phrases and provides a default phrase. The
 * object provided to the format method is a string that's matched
 * against the keywords. If there is a match, the corresponding phrase
 * is selected; otherwise, the default phrase is used.</p>
 *
 * <h4>Using <code>SelectFormat</code> for Gender Agreement</h4>
 *
 * <p>Note: Typically, select formatting is done via <code>MessageFormat</code>
 * with a <code>select</code> argument type,
 * rather than using a stand-alone <code>SelectFormat</code>.</p>
 *
 * <p>The main use case for the select format is gender-based inflection.
 * When names or nouns are inserted into sentences, their gender can affect pronouns,
 * verb forms, articles, and adjectives. Special care needs to be
 * taken for the case where the gender cannot be determined.
 * The impact varies between languages:</p>
 *
 * <ul>
 * <li>English has three genders, and unknown gender is handled as a special
 * case. Names use the gender of the named person (if known), nouns referring
 * to people use natural gender, and inanimate objects are usually neuter.
 * The gender only affects pronouns: "he", "she", "it", "they".
 *
 * <li>German differs from English in that the gender of nouns is rather
 * arbitrary, even for nouns referring to people ("M&#x00E4;dchen", girl, is neuter).
 * The gender affects pronouns ("er", "sie", "es"), articles ("der", "die",
 * "das"), and adjective forms ("guter Mann", "gute Frau", "gutes M&#x00E4;dchen").
 *
 * <li>French has only two genders; as in German the gender of nouns
 * is rather arbitrary - for sun and moon, the genders
 * are the opposite of those in German. The gender affects
 * pronouns ("il", "elle"), articles ("le", "la"),
 * adjective forms ("bon", "bonne"), and sometimes
 * verb forms ("all&#x00E9;", "all&#x00E9;e").
 *
 * <li>Polish distinguishes five genders (or noun classes):
 * human masculine, animate non-human masculine, inanimate masculine,
 * feminine, and neuter.
 * </ul>
 *
 * <p>Some other languages have noun classes that are not related to gender,
 * but similar in grammatical use.
 * Some African languages have around 20 noun classes.</p>
 *
 * <p><b>Note:</b> For the gender of a <i>person</i> in a given sentence,
 * we usually need to distinguish only between female, male and other/unknown.</p>
 *
 * <p>To enable localizers to create sentence patterns that take their
 * language's gender dependencies into consideration, software has to provide
 * information about the gender associated with a noun or name to
 * <code>MessageFormat</code>.
 * Two main cases can be distinguished:</p>
 *
 * <ul>
 * <li>For people, natural gender information should be maintained for each person.
 * Keywords like "male", "female", "mixed" (for groups of people)
 * and "unknown" could be used.
 *
 * <li>For nouns, grammatical gender information should be maintained for
 * each noun and per language, e.g., in resource bundles.
 * The keywords "masculine", "feminine", and "neuter" are commonly used,
 * but some languages may require other keywords.
 * </ul>
 *
 * <p>The resulting keyword is provided to <code>MessageFormat</code> as a
 * parameter separate from the name or noun it's associated with. For example,
 * to generate a message such as "Jean went to Paris", three separate arguments
 * would be provided: the name of the person as argument 0, the gender of
 * the person as argument 1, and the name of the city as argument 2.
 * The sentence pattern for English, where the gender of the person has
 * no impact on this simple sentence, would not refer to argument 1 at all:</p>
 *
 * <pre>{0} went to {2}.</pre>
 *
 * <p><b>Note:</b> The entire sentence should be included (and partially repeated)
 * inside each phrase. Otherwise translators would have to be trained on how to
 * move bits of the sentence in and out of the select argument of a message.
 * (The examples below do not follow this recommendation!)</p>
 *
 * <p>The sentence pattern for French, where the gender of the person affects
 * the form of the participle, uses a select format based on argument 1:</p>
 *
 * <pre>{0} est {1, select, female {all&#x00E9;e} other {all&#x00E9;}} &#x00E0; {2}.</pre>
 *
 * <p>Patterns can be nested, so that it's possible to handle interactions of
 * number and gender where necessary. For example, if the above sentence should
 * allow for the names of several people to be inserted, the following sentence
 * pattern can be used (with argument 0 the list of people's names,
 * argument 1 the number of people, argument 2 their combined gender, and
 * argument 3 the city name):</p>
 *
 * <pre>{0} {1, plural,
 * one {est {2, select, female {all&#x00E9;e} other {all&#x00E9;}}}
 * other {sont {2, select, female {all&#x00E9;es} other {all&#x00E9;s}}}
 * } &#x00E0; {3}.</pre>
 *
 * <h4>Patterns and Their Interpretation</h4>
 *
 * <p>The <code>SelectFormat</code> pattern string defines the phrase output
 * for each user-defined keyword.
 * The pattern is a sequence of (keyword, message) pairs.
 * A keyword is a "pattern identifier": [^[[:Pattern_Syntax:][:Pattern_White_Space:]]]+</p>
 *
 * <p>Each message is a MessageFormat pattern string enclosed in {curly braces}.</p>
 *
 * <p>You always have to define a phrase for the default keyword
 * <code>other</code>; this phrase is returned when the keyword
 * provided to
 * the <code>format</code> method matches no other keyword.
 * If a pattern does not provide a phrase for <code>other</code>, the method
 * it's provided to throws an <code>IllegalArgumentException</code>.
 * <br/>
 * Pattern_White_Space between keywords and messages is ignored.
 * Pattern_White_Space within a message is preserved and output.</p>
 *
 * <p>Example:</p>
 * <pre>
 * MessageFormat msgFmt = new MessageFormat("{0} est " +
 *     "{1, select, female {all&#x00E9;e} other {all&#x00E9;}} &#x00E0; Paris.",
 *     new ULocale("fr"));
 * Object args[] = {"Kirti","female"};
 * System.out.println(msgFmt.format(args));
 * </pre>
 * <p>
 * Produces the output:<br/>
 * <code>Kirti est all&#x00E9;e &#x00E0; Paris.</code>
 * </p>
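 *
 * <p>A stand-alone <code>SelectFormat</code> can also be used directly. The lines
 * below are a minimal sketch rather than a canonical usage pattern: the pattern
 * string, the keywords, and the variable names are illustrative assumptions, and
 * only the constructor and the <code>format(String)</code> method declared in
 * this class are relied on.</p>
 * <pre>
 * // Hypothetical pattern: three (keyword, message) pairs, including the required "other".
 * SelectFormat selFmt = new SelectFormat("female {her} male {his} other {their}");
 * String known = selFmt.format("female");     // "female" matches, so "her" is selected
 * String fallback = selFmt.format("unknown"); // no keyword matches, so the "other" phrase is selected
 * </pre>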
 *
 * @stable ICU 4.4
 */
public class SelectFormat extends Format {
    // Generated by serialver from JDK 1.5
    private static final long serialVersionUID = 2993154333257524984L;

    /*
     * The applied pattern string.
     */
    private String pattern = null;

    /**
     * The MessagePattern which contains the parsed structure of the pattern string.
     */
    private transient MessagePattern msgPattern;

    /**
     * Creates a new <code>SelectFormat</code> for a given pattern string.
     * @param  pattern the pattern for this <code>SelectFormat</code>.
     * @stable ICU 4.4
     */
    public SelectFormat(String pattern) {
        applyPattern(pattern);
    }

    /*
     * Resets the <code>SelectFormat</code> object.
     */
    private void reset() {
        pattern = null;
        if(msgPattern != null) {
            msgPattern.clear();
        }
    }

    /**
     * Sets the pattern used by this select format.
     * Patterns and their interpretation are specified in the class description.
     *
     * @param pattern the pattern for this select format.
     * @throws IllegalArgumentException when the pattern is not a valid select format pattern.
     * @stable ICU 4.4
     */
    public void applyPattern(String pattern) {
        this.pattern = pattern;
        if (msgPattern == null) {
            msgPattern = new MessagePattern();
        }
        try {
            msgPattern.parseSelectStyle(pattern);
        } catch(RuntimeException e) {
            // Leave the object without a pattern rather than with a half-parsed one.
            reset();
            throw e;
        }
    }

    /**
     * Returns the pattern for this <code>SelectFormat</code>.
     *
     * @return the pattern string
     * @stable ICU 4.4
     */
    public String toPattern() {
        return pattern;
    }

    /**
     * Finds the SelectFormat sub-message for the given keyword, or the "other" sub-message.
     * @param pattern A MessagePattern.
     * @param partIndex the index of the first SelectFormat argument style part.
     * @param keyword a keyword to be matched to one of the SelectFormat argument's keywords.
     * @return the sub-message start part index.
     */
    public static int findSubMessage(MessagePattern pattern, int partIndex, String keyword) {
        int count=pattern.countParts();
        // 0 means that no "other" sub-message has been seen yet;
        // a real sub-message start index is always greater than 0.
        int msgStart=0;
        // Iterate over (ARG_SELECTOR, message) pairs until ARG_LIMIT or end of select-only pattern.
        do {
            MessagePattern.Part part=pattern.getPart(partIndex++);
            MessagePattern.Part.Type type=part.getType();
            if(type==MessagePattern.Part.Type.ARG_LIMIT) {
                break;
            }
            assert type==MessagePattern.Part.Type.ARG_SELECTOR;
            // part is an ARG_SELECTOR followed by a message
            if(pattern.partSubstringMatches(part, keyword)) {
                // keyword matches
                return partIndex;
            } else if(msgStart==0 && pattern.partSubstringMatches(part, "other")) {
                // Remember the first "other" message as the fallback.
                msgStart=partIndex;
            }
            partIndex=pattern.getLimitPartIndex(partIndex);
        } while(++partIndex<count);
        return msgStart;
    }

    /**
     * Selects the phrase for the given keyword.
     *
     * @param keyword a phrase selection keyword.
     * @return the string containing the formatted select message.
     * @throws IllegalArgumentException when the given keyword is not a "pattern identifier"
     * @throws IllegalStateException when no valid pattern has been applied
     * @stable ICU 4.4
     */
    public final String format(String keyword) {
        //Check for the validity of the keyword
        if (!PatternProps.isIdentifier(keyword)) {
            throw new IllegalArgumentException("Invalid formatting argument.");
        }
        // If no pattern was applied, throw an exception
        if (msgPattern == null || msgPattern.countParts() == 0) {
            throw new IllegalStateException("Invalid format error.");
        }

        // Get the appropriate sub-message.
        int msgStart = findSubMessage(msgPattern, 0, keyword);
        if (!msgPattern.jdkAposMode()) {
            int msgLimit = msgPattern.getLimitPartIndex(msgStart);
            return msgPattern.getPatternString().substring(msgPattern.getPart(msgStart).getLimit(),
                                                           msgPattern.getPatternIndex(msgLimit));
        }
        // JDK compatibility mode: Remove SKIP_SYNTAX.
        StringBuilder result = null;
        int prevIndex = msgPattern.getPart(msgStart).getLimit();
        for (int i = msgStart;;) {
            MessagePattern.Part part = msgPattern.getPart(++i);
            MessagePattern.Part.Type type = part.getType();
            int index = part.getIndex();
            if (type == MessagePattern.Part.Type.MSG_LIMIT) {
                if (result == null) {
                    return pattern.substring(prevIndex, index);
                } else {
                    return result.append(pattern, prevIndex, index).toString();
                }
            } else if (type == MessagePattern.Part.Type.SKIP_SYNTAX) {
                if (result == null) {
                    result = new StringBuilder();
                }
                result.append(pattern, prevIndex, index);
                prevIndex = part.getLimit();
            } else if (type == MessagePattern.Part.Type.ARG_START) {
                if (result == null) {
                    result = new StringBuilder();
                }
                result.append(pattern, prevIndex, index);
                prevIndex = index;
                i = msgPattern.getLimitPartIndex(i);
                index = msgPattern.getPart(i).getLimit();
                MessagePattern.appendReducedApostrophes(pattern, prevIndex, index, result);
                prevIndex = index;
            }
        }
    }

    /**
     * Selects the phrase for the given keyword
     * and appends the formatted message to the given <code>StringBuffer</code>.
     * @param keyword a phrase selection keyword.
     * @param toAppendTo the selected phrase will be appended to this
     *        <code>StringBuffer</code>.
     * @param pos will be ignored by this method.
     * @throws IllegalArgumentException when the given keyword is not a String
     *         or not a "pattern identifier"
     * @return the string buffer passed in as toAppendTo, with formatted text
     *         appended.
     * @stable ICU 4.4
     */
    @Override
    public StringBuffer format(Object keyword, StringBuffer toAppendTo,
            FieldPosition pos) {
        if (keyword instanceof String) {
            toAppendTo.append(format((String) keyword));
        } else {
            throw new IllegalArgumentException("'" + keyword + "' is not a String");
        }
        return toAppendTo;
    }

    /**
     * This method is not supported by <code>SelectFormat</code>.
     * @param source the string to be parsed.
     * @param pos defines the position where parsing is to begin,
     * and upon return, the position where parsing left off.  If the position
     * has not changed upon return, then parsing failed.
     * @return nothing because this method is not supported.
     * @throws UnsupportedOperationException thrown always.
     * @stable ICU 4.4
     */
    @Override
    public Object parseObject(String source, ParsePosition pos) {
        throw new UnsupportedOperationException();
    }

    /**
     * {@inheritDoc}
     * @stable ICU 4.4
     */
    @Override
    public boolean equals(Object obj) {
        if(this == obj) {
            return true;
        }
        if(obj == null || getClass() != obj.getClass()) {
            return false;
        }
        SelectFormat sf = (SelectFormat) obj;
        return msgPattern == null ? sf.msgPattern == null : msgPattern.equals(sf.msgPattern);
    }

    /**
     * {@inheritDoc}
     * @stable ICU 4.4
     */
    @Override
    public int hashCode() {
        if (pattern != null) {
            return pattern.hashCode();
        }
        return 0;
    }

    /**
     * {@inheritDoc}
     * @stable ICU 4.4
     */
    @Override
    public String toString() {
        return "pattern='" + pattern + "'";
    }

    private void readObject(ObjectInputStream in)
        throws IOException, ClassNotFoundException {
        in.defaultReadObject();
        // msgPattern is transient, so rebuild it from the serialized pattern string.
        if (pattern != null) {
            applyPattern(pattern);
        }
    }
}