/*
 *******************************************************************************
 * Copyright (C) 1996-2004, International Business Machines Corporation and    *
 * others. All Rights Reserved.                                                *
 *******************************************************************************
 */
package com.ibm.icu.text;

/**
 * Interface that defines an API for forward-only iteration
 * on text objects.
 * This is a minimal interface for iteration without random access
 * or backwards iteration. It is especially useful for wrapping
 * streams with converters into an object for collation or
 * normalization.
 *
 * <p>Characters can be accessed in two ways: as code units or as
 * code points.
 * Unicode code points are 21-bit integers and are the scalar values
 * of Unicode characters. ICU uses the type <code>int</code> for them.
 * Unicode code units are the storage units of a given
 * Unicode/UCS Transformation Format (a character encoding scheme).
 * With UTF-16, every code point is represented by either one code unit
 * or two code units (a surrogate pair).
 * String storage is typically based on code units, while properties
 * of characters are typically determined using code point values.
 * Some processes may be designed to work with sequences of code units,
 * or it may be known that all characters that are important to an
 * algorithm can be represented with single code units.
 * Other processes will need to use the code point access functions.</p>
 *
 * <p>UForwardCharacterIterator provides next() to access
 * a code unit and advance an internal position into the text object,
 * similar to <code>return text[position++]</code>.<br>
 * It provides nextCodePoint() to access a code point and advance an internal
 * position.</p>
 *
 * <p>nextCodePoint() assumes that the current position is that of
 * the beginning of a code point, i.e., of its first code unit.
 * After nextCodePoint(), this will be true again.
 * In general, access to code units and code points in the same
 * iteration loop should not be mixed. In UTF-16, if the current position
 * is on a second code unit (low surrogate), then only that code unit
 * is returned, even by nextCodePoint().</p>
 *
 * <p>Usage:
 * <code>
 *  public void function1(UForwardCharacterIterator it) {
 *     int c;
 *     while((c=it.next())!=UForwardCharacterIterator.DONE) {
 *         // use c
 *     }
 *  }
 * </code>
 * </p>
 * @stable ICU 2.4
 *
 */

public interface UForwardCharacterIterator {

    /**
     * Indicator that we have reached the end of the UTF-16 text.
     * @stable ICU 2.4
     */
    public static final int DONE = -1;

    /**
     * Returns the UTF-16 code unit at the current index, and advances to
     * the next code unit (post-increment semantics). If the index is out of
     * range, DONE is returned, and the iterator is reset to the limit
     * of the text.
     * @return the next UTF-16 code unit, or DONE if the index is at the limit
     *         of the text.
     * @stable ICU 2.4
     */
    public int next();

    /**
     * Returns the code point at the current index, and advances to the next
     * code point (post-increment semantics). If the index does not point to a
     * valid surrogate pair, the behavior is the same as
     * <code>next()</code>. Otherwise the iterator is advanced past
     * the surrogate pair, and the code point represented by the pair
     * is returned.
     * @return the next code point in the text, or DONE if the index is at
     *         the limit of the text.
     * @stable ICU 2.4
     */
    public int nextCodePoint();

}
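
// A minimal sketch, not part of ICU, of how the contract documented above might
// be implemented and exercised over a plain java.lang.String. The names
// ForwardIterationSketch, ForwardIterator, StringForwardIterator,
// countCodeUnits and countCodePoints are hypothetical. ForwardIterator is a
// local stand-in that mirrors UForwardCharacterIterator so that the sketch
// compiles without ICU on the classpath; real code would implement
// UForwardCharacterIterator directly.

```java
// Hypothetical illustration only; none of these types belong to ICU.
final class ForwardIterationSketch {

    /** Local stand-in mirroring UForwardCharacterIterator (assumption). */
    interface ForwardIterator {
        int DONE = -1;
        int next();
        int nextCodePoint();
    }

    /** Forward-only iterator over the UTF-16 code units of a String. */
    static final class StringForwardIterator implements ForwardIterator {
        private final String text;
        private int pos = 0;

        StringForwardIterator(String text) {
            this.text = text;
        }

        @Override
        public int next() {
            if (pos >= text.length()) {
                return DONE;              // at the limit of the text
            }
            return text.charAt(pos++);    // post-increment semantics
        }

        @Override
        public int nextCodePoint() {
            if (pos >= text.length()) {
                return DONE;
            }
            // codePointAt combines a valid surrogate pair starting at pos;
            // otherwise it returns the single code unit at pos, which
            // matches the fallback to next() described in the Javadoc.
            int cp = text.codePointAt(pos);
            pos += Character.charCount(cp);
            return cp;
        }
    }

    /** Counts code units by draining the iterator with next(). */
    static int countCodeUnits(String s) {
        ForwardIterator it = new StringForwardIterator(s);
        int count = 0;
        while (it.next() != ForwardIterator.DONE) {
            count++;
        }
        return count;
    }

    /** Counts code points by draining the iterator with nextCodePoint(). */
    static int countCodePoints(String s) {
        ForwardIterator it = new StringForwardIterator(s);
        int count = 0;
        while (it.nextCodePoint() != ForwardIterator.DONE) {
            count++;
        }
        return count;
    }
}
```

// Each loop sticks to one access style (code units or code points), which
// follows the advice above not to mix the two within a single iteration loop.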