/*
 *******************************************************************************
 * Copyright (C) 1996-2004, International Business Machines Corporation and    *
 * others. All Rights Reserved.                                                *
 *******************************************************************************
 */
package com.ibm.icu.text;

/**
 * Interface that defines an API for forward-only iteration
 * on text objects.
 * This is a minimal interface for iteration without random access
 * or backwards iteration. It is especially useful for wrapping
 * streams with converters into an object for collation or
 * normalization.
 *
 * <p>Characters can be accessed in two ways: as code units or as
 * code points.
 * Unicode code points are 21-bit integers and are the scalar values
 * of Unicode characters. ICU uses the type <code>int</code> for them.
 * Unicode code units are the storage units of a given
 * Unicode/UCS Transformation Format (a character encoding scheme).
 * With UTF-16, all code points can be represented with either one
 * or two code units ("surrogates").
 * String storage is typically based on code units, while properties
 * of characters are typically determined using code point values.
 * Some processes may be designed to work with sequences of code units,
 * or it may be known that all characters that are important to an
 * algorithm can be represented with single code units.
 * Other processes will need to use the code point access functions.</p>
 *
 * <p>UForwardCharacterIterator provides <code>next()</code> to access
 * a code unit and advance an internal position into the text object,
 * similar to <code>return text[position++]</code>.<br>
 * It provides <code>nextCodePoint()</code> to access a code point and
 * advance an internal position.</p>
 *
 * <p><code>nextCodePoint()</code> assumes that the current position is
 * that of the beginning of a code point, i.e., of its first code unit.
 * After <code>nextCodePoint()</code>, this will be true again.
 * In general, access to code units and code points in the same
 * iteration loop should not be mixed. In UTF-16, if the current position
 * is on a second code unit (low surrogate), then only that code unit
 * is returned, even by <code>nextCodePoint()</code>.</p>
 *
 * <p>Usage:</p>
 * <pre>
 *  public void function1(UForwardCharacterIterator it) {
 *      int c;
 *      // Read code units until the iterator reports the end of the text.
 *      while ((c = it.next()) != UForwardCharacterIterator.DONE) {
 *          // use c
 *      }
 *  }
 * </pre>
 * @stable ICU 2.4
 *
 */
public interface UForwardCharacterIterator {

    /**
     * Indicator that we have reached the end of the UTF-16 text.
     * @stable ICU 2.4
     */
    public static final int DONE = -1;

    /**
     * Returns the UTF-16 code unit at the current index, and increments
     * to the next code unit (post-increment semantics).  If the index is
     * out of range, DONE is returned, and the iterator is reset to the
     * limit of the text.
     * @return the next UTF-16 code unit, or DONE if the index is at the
     *         limit of the text.
     * @stable ICU 2.4
     */
    public int next();

    /**
     * Returns the code point at the current index, and increments to the
     * next code point (post-increment semantics).  If the index does not
     * point to a valid surrogate pair, the behavior is the same as
     * <code>next()</code>.  Otherwise the iterator is incremented past
     * the surrogate pair, and the code point represented by the pair
     * is returned.
     * @return the next code point in the text, or DONE if the index is at
     *         the limit of the text.
     * @stable ICU 2.4
     */
    public int nextCodePoint();

}
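
// Illustration only, not part of the ICU sources: a minimal sketch of how the
// contract documented above could be satisfied by a String-backed iterator.
// ForwardIteratorSketch is a local stand-in that mirrors the interface so the
// example compiles on its own; StringForwardIterator and ForwardIteratorDemo
// are hypothetical names. The sketch assumes java.lang.String.codePointAt has
// the pairing behavior described for nextCodePoint().

```java
// Local stand-in mirroring the documented interface (assumption for illustration).
interface ForwardIteratorSketch {
    int DONE = -1;
    int next();
    int nextCodePoint();
}

// Hypothetical String-backed implementation; not an ICU class.
final class StringForwardIterator implements ForwardIteratorSketch {
    private final String text;
    private int pos = 0;

    StringForwardIterator(String text) {
        this.text = text;
    }

    // Returns the UTF-16 code unit at the current position and advances by one.
    public int next() {
        if (pos >= text.length()) {
            pos = text.length(); // stay at the limit of the text
            return DONE;
        }
        return text.charAt(pos++);
    }

    // Returns the code point at the current position and advances past it.
    // An unpaired surrogate is returned as a single code unit, like next().
    public int nextCodePoint() {
        if (pos >= text.length()) {
            pos = text.length();
            return DONE;
        }
        int cp = text.codePointAt(pos);
        pos += Character.charCount(cp); // 2 for a surrogate pair, otherwise 1
        return cp;
    }
}

final class ForwardIteratorDemo {
    // Counts how many code units next() yields before DONE.
    static int countCodeUnits(String s) {
        ForwardIteratorSketch it = new StringForwardIterator(s);
        int n = 0;
        while (it.next() != ForwardIteratorSketch.DONE) {
            n++;
        }
        return n;
    }

    // Counts how many code points nextCodePoint() yields before DONE.
    static int countCodePoints(String s) {
        ForwardIteratorSketch it = new StringForwardIterator(s);
        int n = 0;
        while (it.nextCodePoint() != ForwardIteratorSketch.DONE) {
            n++;
        }
        return n;
    }

    public static void main(String[] args) {
        String s = "a\uD83D\uDE00b"; // 'a', U+1F600 as a surrogate pair, 'b'
        System.out.println(countCodeUnits(s) + " code units, "
                + countCodePoints(s) + " code points");
    }
}
```

// Each loop in the sketch uses only one of the two accessors, in line with the
// advice above not to mix code unit and code point access in the same loop.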