com.google.common.base
Class CharMatcher

java.lang.Object
  extended by com.google.common.base.CharMatcher
All Implemented Interfaces:
Predicate<Character>

public abstract class CharMatcher
extends Object
implements Predicate<Character>

Determines a true or false value for any Java char value, just as Predicate does for any Object. Also offers basic text processing methods based on this function. Implementations are strongly encouraged to be side-effect-free and immutable.

Throughout the documentation of this class, the phrase "matching character" is used to mean "any character c for which this.matches(c) returns true".

Note: This class deals only with char values; it does not understand supplementary Unicode code points in the range 0x10000 to 0x10FFFF. Such logical characters are encoded into a String using surrogate pairs, and a CharMatcher treats these just as two separate characters.
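
The following standalone sketch illustrates the note above using only java.lang, without CharMatcher itself. It shows that a supplementary code point occupies two char values, so any per-char test sees a high surrogate followed by a low surrogate. The class name SurrogateSketch and the choice of U+1F600 are made up for illustration.

```java
final class SurrogateSketch {
    // U+1F600 lies above U+FFFF, so it is stored as a surrogate pair.
    static final String SUPPLEMENTARY = new String(Character.toChars(0x1F600));

    public static void main(String[] args) {
        String s = SUPPLEMENTARY;
        // One logical character, but two char values.
        System.out.println(s.codePointCount(0, s.length())); // 1
        System.out.println(s.length());                      // 2
        // A char-based matcher would be asked about each half separately.
        System.out.println(Character.isHighSurrogate(s.charAt(0))); // true
        System.out.println(Character.isLowSurrogate(s.charAt(1)));  // true
    }
}
```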

Since:
2009.09.15 tentative
Author:
Kevin Bourrillion

Nested Class Summary
protected static class CharMatcher.LookupTable
          A bit array with one bit per char value, used by precomputed().
 
Field Summary
static CharMatcher ANY
          Matches any character.
static CharMatcher ASCII
          Determines whether a character is ASCII, meaning that its code point is less than 128.
static CharMatcher BREAKING_WHITESPACE
          Determines whether a character is a breaking whitespace (that is, a whitespace which can be interpreted as a break between words for formatting purposes).
static CharMatcher DIGIT
          Determines whether a character is a digit according to Unicode.
static CharMatcher INVISIBLE
          Determines whether a character is invisible; that is, if its Unicode category is any of SPACE_SEPARATOR, LINE_SEPARATOR, PARAGRAPH_SEPARATOR, CONTROL, FORMAT, SURROGATE, or PRIVATE_USE according to ICU4J.
static CharMatcher JAVA_DIGIT
          Determines whether a character is a digit according to Java's definition.
static CharMatcher JAVA_ISO_CONTROL
          Determines whether a character is an ISO control character according to Character.isISOControl(char).
static CharMatcher JAVA_LETTER
          Determines whether a character is a letter according to Java's definition.
static CharMatcher JAVA_LETTER_OR_DIGIT
          Determines whether a character is a letter or digit according to Java's definition.
static CharMatcher JAVA_LOWER_CASE
          Determines whether a character is lower case according to Java's definition.
static CharMatcher JAVA_UPPER_CASE
          Determines whether a character is upper case according to Java's definition.
static CharMatcher JAVA_WHITESPACE
          Determines whether a character is whitespace according to Java's definition; it is usually preferable to use WHITESPACE.
static CharMatcher NONE
          Matches no characters.
static CharMatcher SINGLE_WIDTH
          Determines whether a character is single-width (not double-width).
static CharMatcher WHITESPACE
          Determines whether a character is whitespace according to the latest Unicode standard.
 
Constructor Summary
CharMatcher()
           
 
Method Summary
 CharMatcher and(CharMatcher other)
          Returns a matcher that matches any character matched by both this matcher and other.
static CharMatcher anyOf(CharSequence sequence)
          Returns a char matcher that matches any character present in the given character sequence.
 boolean apply(Character character)
          Returns true if this matcher matches the given character.
 String collapseFrom(CharSequence sequence, char replacement)
          Returns a string copy of the input character sequence, with each group of consecutive characters that match this matcher replaced by a single replacement character.
 int countIn(CharSequence sequence)
          Returns the number of matching characters found in a character sequence.
static CharMatcher forPredicate(Predicate<? super Character> predicate)
          Returns a matcher with identical behavior to the given Character-based predicate, but which operates on primitive char instances instead.
 int indexIn(CharSequence sequence)
          Returns the index of the first matching character in a character sequence, or -1 if no matching character is present.
 int indexIn(CharSequence sequence, int start)
          Returns the index of the first matching character in a character sequence, starting from a given position, or -1 if no character matches at or after that position.
static CharMatcher inRange(char startInclusive, char endInclusive)
          Returns a char matcher that matches any character in a given range (both endpoints are inclusive).
static CharMatcher is(char match)
          Returns a char matcher that matches only one specified character.
static CharMatcher isNot(char match)
          Returns a char matcher that matches any character except the one specified.
 int lastIndexIn(CharSequence sequence)
          Returns the index of the last matching character in a character sequence, or -1 if no matching character is present.
abstract  boolean matches(char c)
          Determines a true or false value for the given character.
 boolean matchesAllOf(CharSequence sequence)
          Returns true if a character sequence contains only matching characters.
 boolean matchesNoneOf(CharSequence sequence)
          Returns true if a character sequence contains no matching characters.
 CharMatcher negate()
          Returns a matcher that matches any character not matched by this matcher.
static CharMatcher noneOf(CharSequence sequence)
          Returns a char matcher that matches any character not present in the given character sequence.
 CharMatcher or(CharMatcher other)
          Returns a matcher that matches any character matched by either this matcher or other.
 CharMatcher precomputed()
          Returns a char matcher functionally equivalent to this one, but which may be faster to query than the original; your mileage may vary.
 String removeFrom(CharSequence sequence)
          Returns a string containing all non-matching characters of a character sequence, in order.
 String replaceFrom(CharSequence sequence, char replacement)
          Returns a string copy of the input character sequence, with each character that matches this matcher replaced by a given replacement character.
 String replaceFrom(CharSequence sequence, CharSequence replacement)
          Returns a string copy of the input character sequence, with each character that matches this matcher replaced by a given replacement sequence.
 String retainFrom(CharSequence sequence)
          Returns a string containing all matching characters of a character sequence, in order.
protected  void setBits(CharMatcher.LookupTable table)
          For use by implementors; sets the bit corresponding to each character ('\0' to '\uFFFF') that matches this matcher in the given bit array, leaving all other bits untouched.
 String trimAndCollapseFrom(CharSequence sequence, char replacement)
          Collapses groups of matching characters exactly as collapseFrom(java.lang.CharSequence, char) does, except that groups of matching characters at the start or end of the sequence are removed without replacement.
 String trimFrom(CharSequence sequence)
          Returns a substring of the input character sequence that omits all characters this matcher matches from the beginning and from the end of the string.
 String trimLeadingFrom(CharSequence sequence)
          Returns a substring of the input character sequence that omits all characters this matcher matches from the beginning of the string.
 String trimTrailingFrom(CharSequence sequence)
          Returns a substring of the input character sequence that omits all characters this matcher matches from the end of the string.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 
Methods inherited from interface com.google.common.base.Predicate
equals
 

Field Detail

WHITESPACE

public static final CharMatcher WHITESPACE
Determines whether a character is whitespace according to the latest Unicode standard. This is not the same definition used by other Java APIs. See a comparison of several definitions of "whitespace" at (TODO).

Note: as the Unicode definition evolves, we will modify this constant to keep it up to date.


BREAKING_WHITESPACE

public static final CharMatcher BREAKING_WHITESPACE
Determines whether a character is a breaking whitespace (that is, a whitespace which can be interpreted as a break between words for formatting purposes). See WHITESPACE for a discussion of that term.

Since:
2010.01.04 tentative

ASCII

public static final CharMatcher ASCII
Determines whether a character is ASCII, meaning that its code point is less than 128.


DIGIT

public static final CharMatcher DIGIT
Determines whether a character is a digit according to Unicode.


JAVA_WHITESPACE

public static final CharMatcher JAVA_WHITESPACE
Determines whether a character is whitespace according to Java's definition; it is usually preferable to use WHITESPACE. See a comparison of several definitions of "whitespace" at go/white+space.


JAVA_DIGIT

public static final CharMatcher JAVA_DIGIT
Determines whether a character is a digit according to Java's definition. If you only care to match ASCII digits, you can use inRange('0', '9').


JAVA_LETTER

public static final CharMatcher JAVA_LETTER
Determines whether a character is a letter according to Java's definition. If you only care to match letters of the Latin alphabet, you can use inRange('a', 'z').or(inRange('A', 'Z')).


JAVA_LETTER_OR_DIGIT

public static final CharMatcher JAVA_LETTER_OR_DIGIT
Determines whether a character is a letter or digit according to Java's definition.


JAVA_UPPER_CASE

public static final CharMatcher JAVA_UPPER_CASE
Determines whether a character is upper case according to Java's definition.


JAVA_LOWER_CASE

public static final CharMatcher JAVA_LOWER_CASE
Determines whether a character is lower case according to Java's definition.


JAVA_ISO_CONTROL

public static final CharMatcher JAVA_ISO_CONTROL
Determines whether a character is an ISO control character according to Character.isISOControl(char).


INVISIBLE

public static final CharMatcher INVISIBLE
Determines whether a character is invisible; that is, if its Unicode category is any of SPACE_SEPARATOR, LINE_SEPARATOR, PARAGRAPH_SEPARATOR, CONTROL, FORMAT, SURROGATE, or PRIVATE_USE according to ICU4J.


SINGLE_WIDTH

public static final CharMatcher SINGLE_WIDTH
Determines whether a character is single-width (not double-width). When in doubt, this matcher errs on the side of returning false (that is, it tends to assume a character is double-width). Note: as the reference file evolves, we will modify this constant to keep it up to date.


ANY

public static final CharMatcher ANY
Matches any character.


NONE

public static final CharMatcher NONE
Matches no characters.

Constructor Detail

CharMatcher

public CharMatcher()
Method Detail

is

public static CharMatcher is(char match)
Returns a char matcher that matches only one specified character.


isNot

public static CharMatcher isNot(char match)
Returns a char matcher that matches any character except the one specified.

To negate another CharMatcher, use negate().


anyOf

public static CharMatcher anyOf(CharSequence sequence)
Returns a char matcher that matches any character present in the given character sequence.


noneOf

public static CharMatcher noneOf(CharSequence sequence)
Returns a char matcher that matches any character not present in the given character sequence.


inRange

public static CharMatcher inRange(char startInclusive,
                                  char endInclusive)
Returns a char matcher that matches any character in a given range (both endpoints are inclusive). For example, to match any lowercase letter of the English alphabet, use CharMatcher.inRange('a', 'z').

Throws:
IllegalArgumentException - if endInclusive < startInclusive
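
As a rough illustration of the contract described above, here is a minimal standalone sketch of an inclusive range test that rejects a reversed range. It does not use CharMatcher; the class InRangeSketch and the nested CharTest interface are hypothetical stand-ins, and the real implementation may differ.

```java
final class InRangeSketch {
    /** Hypothetical stand-in for a char-based matcher. */
    interface CharTest {
        boolean matches(char c);
    }

    /** Builds a test that accepts every char from startInclusive to endInclusive. */
    static CharTest inRange(final char startInclusive, final char endInclusive) {
        if (endInclusive < startInclusive) {
            throw new IllegalArgumentException("endInclusive < startInclusive");
        }
        return c -> startInclusive <= c && c <= endInclusive;
    }

    public static void main(String[] args) {
        CharTest lower = inRange('a', 'z');
        System.out.println(lower.matches('a')); // true, lower endpoint is included
        System.out.println(lower.matches('z')); // true, upper endpoint is included
        System.out.println(lower.matches('A')); // false
    }
}
```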

forPredicate

public static CharMatcher forPredicate(Predicate<? super Character> predicate)
Returns a matcher with identical behavior to the given Character-based predicate, but which operates on primitive char instances instead.


matches

public abstract boolean matches(char c)
Determines a true or false value for the given character.


negate

public CharMatcher negate()
Returns a matcher that matches any character not matched by this matcher.


and

public CharMatcher and(CharMatcher other)
Returns a matcher that matches any character matched by both this matcher and other.


or

public CharMatcher or(CharMatcher other)
Returns a matcher that matches any character matched by either this matcher or other.
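
The three combinators negate(), and(CharMatcher) and or(CharMatcher) compose like ordinary boolean operators. The sketch below shows that composition on a hypothetical CharTest interface written for this example; it is not the library's code, and all names in it are assumptions.

```java
final class CombinatorSketch {
    /** Hypothetical stand-in for a char-based matcher with boolean combinators. */
    interface CharTest {
        boolean matches(char c);

        default CharTest negate() {
            return c -> !matches(c);
        }

        default CharTest and(CharTest other) {
            return c -> matches(c) && other.matches(c);
        }

        default CharTest or(CharTest other) {
            return c -> matches(c) || other.matches(c);
        }
    }

    static final CharTest LOWER = c -> 'a' <= c && c <= 'z';
    static final CharTest UPPER = c -> 'A' <= c && c <= 'Z';
    static final CharTest VOWEL = c -> "aeiouAEIOU".indexOf(c) >= 0;

    // Union of two ranges, then intersection with a negated set.
    static final CharTest LATIN_LETTER = LOWER.or(UPPER);
    static final CharTest CONSONANT = LATIN_LETTER.and(VOWEL.negate());

    public static void main(String[] args) {
        System.out.println(CONSONANT.matches('b')); // true
        System.out.println(CONSONANT.matches('e')); // false, it is a vowel
        System.out.println(CONSONANT.matches('!')); // false, it is not a letter
    }
}
```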


precomputed

public CharMatcher precomputed()
Returns a char matcher functionally equivalent to this one, but which may be faster to query than the original; your mileage may vary. Precomputation takes time and is likely to be worthwhile only if the precomputed matcher is queried many thousands of times.

This method has no effect (returns this) when called in GWT: it's unclear whether a precomputed matcher is faster, but it certainly consumes more memory, which doesn't seem like a worthwhile tradeoff in a browser.


setBits

protected void setBits(CharMatcher.LookupTable table)
For use by implementors; sets the bit corresponding to each character ('\0' to '\uFFFF') that matches this matcher in the given bit array, leaving all other bits untouched.

The default implementation loops over every possible character value, invoking matches(char) for each one.
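
To make the idea of a one-bit-per-char table concrete, the sketch below fills a java.util.BitSet by testing every char value once and then answers later queries from the table. It is only an illustration of the technique: CharMatcher.LookupTable is not used here, and PrecomputeSketch, CharTest, buildTable and precompute are hypothetical names.

```java
import java.util.BitSet;

final class PrecomputeSketch {
    /** Hypothetical stand-in for a char-based matcher. */
    interface CharTest {
        boolean matches(char c);
    }

    /** Sets one bit for each char value ('\0' to '\uFFFF') that the test accepts. */
    static BitSet buildTable(CharTest test) {
        BitSet table = new BitSet(Character.MAX_VALUE + 1);
        // Loop with an int counter: a char counter would wrap around and never terminate.
        for (int i = Character.MIN_VALUE; i <= Character.MAX_VALUE; i++) {
            if (test.matches((char) i)) {
                table.set(i);
            }
        }
        return table;
    }

    /** Returns a test with the same answers, backed by the precomputed table. */
    static CharTest precompute(CharTest test) {
        final BitSet table = buildTable(test);
        return c -> table.get(c);
    }

    public static void main(String[] args) {
        CharTest digit = c -> '0' <= c && c <= '9';
        CharTest fast = precompute(digit);
        System.out.println(fast.matches('7')); // true
        System.out.println(fast.matches('x')); // false
    }
}
```

Building the table costs 65,536 calls to the original test, which is why precomputation pays off only when the result is queried many times.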


matchesAllOf

public boolean matchesAllOf(CharSequence sequence)
Returns true if a character sequence contains only matching characters.

The default implementation iterates over the sequence, invoking matches(char) for each character, until this returns false or the end is reached.

Parameters:
sequence - the character sequence to examine, possibly empty
Returns:
true if this matcher matches every character in the sequence, including when the sequence is empty

matchesNoneOf

public boolean matchesNoneOf(CharSequence sequence)
Returns true if a character sequence contains no matching characters.

The default implementation iterates over the sequence, invoking matches(char) for each character, until this returns true or the end is reached.

Parameters:
sequence - the character sequence to examine, possibly empty
Returns:
true if this matcher matches no character in the sequence, including when the sequence is empty
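
The two checks, matchesAllOf and matchesNoneOf, can be pictured as a pair of short-circuiting loops. The sketch below is an assumption-labeled illustration on a hypothetical CharTest interface, not the library's source; note that both methods return true for an empty sequence because neither loop body ever runs.

```java
final class MatchesAllNoneSketch {
    /** Hypothetical stand-in for a char-based matcher. */
    interface CharTest {
        boolean matches(char c);
    }

    /** True if every character matches; stops at the first non-matching character. */
    static boolean matchesAllOf(CharTest test, CharSequence sequence) {
        for (int i = 0; i < sequence.length(); i++) {
            if (!test.matches(sequence.charAt(i))) {
                return false;
            }
        }
        return true;
    }

    /** True if no character matches; stops at the first matching character. */
    static boolean matchesNoneOf(CharTest test, CharSequence sequence) {
        for (int i = 0; i < sequence.length(); i++) {
            if (test.matches(sequence.charAt(i))) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        CharTest digit = c -> '0' <= c && c <= '9';
        System.out.println(matchesAllOf(digit, "2009"));  // true
        System.out.println(matchesNoneOf(digit, "abc"));  // true
        System.out.println(matchesAllOf(digit, ""));      // true
        System.out.println(matchesNoneOf(digit, ""));     // true
    }
}
```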

indexIn

public int indexIn(CharSequence sequence)
Returns the index of the first matching character in a character sequence, or -1 if no matching character is present.

The default implementation iterates over the sequence in forward order calling matches(char) for each character.

Parameters:
sequence - the character sequence to examine from the beginning
Returns:
an index, or -1 if no character matches

indexIn

public int indexIn(CharSequence sequence,
                   int start)
Returns the index of the first matching character in a character sequence, starting from a given position, or -1 if no character matches at or after that position.

The default implementation iterates over the sequence in forward order, beginning at start, calling matches(char) for each character.

Parameters:
sequence - the character sequence to examine
start - the first index to examine; must be nonnegative and no greater than sequence.length()
Returns:
the index of the first matching character, guaranteed to be no less than start, or -1 if no character matches
Throws:
IndexOutOfBoundsException - if start is negative or greater than sequence.length()
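
A minimal sketch of the forward search described above, including the bounds check on start. It runs without CharMatcher; IndexInSketch and CharTest are hypothetical names, and the exception message is an assumption. A start equal to sequence.length() is legal and simply yields -1.

```java
final class IndexInSketch {
    /** Hypothetical stand-in for a char-based matcher. */
    interface CharTest {
        boolean matches(char c);
    }

    /** Returns the first index >= start whose character matches, or -1 if there is none. */
    static int indexIn(CharTest test, CharSequence sequence, int start) {
        int length = sequence.length();
        if (start < 0 || start > length) {
            throw new IndexOutOfBoundsException("start: " + start + ", length: " + length);
        }
        for (int i = start; i < length; i++) {
            if (test.matches(sequence.charAt(i))) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        CharTest isA = c -> c == 'a';
        System.out.println(indexIn(isA, "banana", 0)); // 1
        System.out.println(indexIn(isA, "banana", 2)); // 3
        System.out.println(indexIn(isA, "banana", 6)); // -1
    }
}
```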

lastIndexIn

public int lastIndexIn(CharSequence sequence)
Returns the index of the last matching character in a character sequence, or -1 if no matching character is present.

The default implementation iterates over the sequence in reverse order calling matches(char) for each character.

Parameters:
sequence - the character sequence to examine from the end
Returns:
an index, or -1 if no character matches

countIn

public int countIn(CharSequence sequence)
Returns the number of matching characters found in a character sequence.


removeFrom

public String removeFrom(CharSequence sequence)
Returns a string containing all non-matching characters of a character sequence, in order. For example:
   CharMatcher.is('a').removeFrom("bazaar")
... returns "bzr".


retainFrom

public String retainFrom(CharSequence sequence)
Returns a string containing all matching characters of a character sequence, in order. For example:
   CharMatcher.is('a').retainFrom("bazaar")
... returns "aaa".


replaceFrom

public String replaceFrom(CharSequence sequence,
                          char replacement)
Returns a string copy of the input character sequence, with each character that matches this matcher replaced by a given replacement character. For example:
   CharMatcher.is('a').replaceFrom("radar", 'o')
... returns "rodor".

The default implementation uses indexIn(CharSequence) to find the first matching character, then iterates the remainder of the sequence calling matches(char) for each character.

Parameters:
sequence - the character sequence to replace matching characters in
replacement - the character to append to the result string in place of each matching character in sequence
Returns:
the new string

replaceFrom

public String replaceFrom(CharSequence sequence,
                          CharSequence replacement)
Returns a string copy of the input character sequence, with each character that matches this matcher replaced by a given replacement sequence. For example:
   CharMatcher.is('a').replaceFrom("yaha", "oo")
... returns "yoohoo".

Note: If the replacement is a fixed string with only one character, you are better off calling replaceFrom(CharSequence, char) directly.

Parameters:
sequence - the character sequence to replace matching characters in
replacement - the characters to append to the result string in place of each matching character in sequence
Returns:
the new string

trimFrom

public String trimFrom(CharSequence sequence)
Returns a substring of the input character sequence that omits all characters this matcher matches from the beginning and from the end of the string. For example:
 CharMatcher.anyOf("ab").trimFrom("abacatbab")
... returns "cat".

Note that

   CharMatcher.inRange('\0', ' ').trimFrom(str)
... is equivalent to String.trim().
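
The equivalence with String.trim() holds because trim() strips every leading and trailing char whose value is at most ' ' (U+0020). The standalone sketch below checks that against a hand-written two-ended trim; TrimSketch and CharTest are hypothetical names and the loop is only one possible way to implement trimming.

```java
final class TrimSketch {
    /** Hypothetical stand-in for a char-based matcher. */
    interface CharTest {
        boolean matches(char c);
    }

    /** Same set of chars as the range '\0' to ' ' inclusive. */
    static final CharTest UP_TO_SPACE = c -> c <= ' ';

    /** Drops matching characters from both ends and keeps everything in between. */
    static String trimFrom(CharTest test, CharSequence sequence) {
        int first = 0;
        int last = sequence.length() - 1;
        while (first <= last && test.matches(sequence.charAt(first))) {
            first++;
        }
        while (last > first && test.matches(sequence.charAt(last))) {
            last--;
        }
        return sequence.subSequence(first, last + 1).toString();
    }

    public static void main(String[] args) {
        String str = " \t hello world \n";
        System.out.println(trimFrom(UP_TO_SPACE, str).equals(str.trim())); // true
        System.out.println(trimFrom(c -> "ab".indexOf(c) >= 0, "abacatbab")); // cat
    }
}
```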


trimLeadingFrom

public String trimLeadingFrom(CharSequence sequence)
Returns a substring of the input character sequence that omits all characters this matcher matches from the beginning of the string. For example:
 CharMatcher.anyOf("ab").trimLeadingFrom("abacatbab")
... returns "catbab".


trimTrailingFrom

public String trimTrailingFrom(CharSequence sequence)
Returns a substring of the input character sequence that omits all characters this matcher matches from the end of the string. For example:
 CharMatcher.anyOf("ab").trimTrailingFrom("abacatbab")
... returns "abacat".


collapseFrom

public String collapseFrom(CharSequence sequence,
                           char replacement)
Returns a string copy of the input character sequence, with each group of consecutive characters that match this matcher replaced by a single replacement character. For example:
   CharMatcher.anyOf("eko").collapseFrom("bookkeeper", '-')
... returns "b-p-r".

The default implementation uses indexIn(CharSequence) to find the first matching character, then iterates the remainder of the sequence calling matches(char) for each character.

Parameters:
sequence - the character sequence to replace matching groups of characters in
replacement - the character to append to the result string in place of each group of matching characters in sequence
Returns:
the new string
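
One way to picture the collapsing behavior is a single pass that remembers whether the previous character matched. The sketch below takes that simpler route rather than the two-phase default implementation described above, and it uses a hypothetical CharTest interface instead of CharMatcher.

```java
final class CollapseSketch {
    /** Hypothetical stand-in for a char-based matcher. */
    interface CharTest {
        boolean matches(char c);
    }

    /** Replaces each run of consecutive matching characters with one replacement char. */
    static String collapseFrom(CharTest test, CharSequence sequence, char replacement) {
        StringBuilder out = new StringBuilder(sequence.length());
        boolean inGroup = false; // true while we are inside a run of matching characters
        for (int i = 0; i < sequence.length(); i++) {
            char c = sequence.charAt(i);
            if (test.matches(c)) {
                if (!inGroup) {
                    out.append(replacement); // emit once per run
                    inGroup = true;
                }
            } else {
                out.append(c);
                inGroup = false;
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        CharTest eko = c -> "eko".indexOf(c) >= 0;
        System.out.println(collapseFrom(eko, "bookkeeper", '-')); // b-p-r
    }
}
```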

trimAndCollapseFrom

public String trimAndCollapseFrom(CharSequence sequence,
                                  char replacement)
Collapses groups of matching characters exactly as collapseFrom(java.lang.CharSequence, char) does, except that groups of matching characters at the start or end of the sequence are removed without replacement.
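
The difference from collapseFrom shows up only at the ends of the sequence. The following sketch, written against a hypothetical CharTest interface rather than CharMatcher, defers each replacement until a non-matching character follows, so leading and trailing groups produce no output.

```java
final class TrimAndCollapseSketch {
    /** Hypothetical stand-in for a char-based matcher. */
    interface CharTest {
        boolean matches(char c);
    }

    /** Collapses interior runs of matching characters and drops runs at either end. */
    static String trimAndCollapseFrom(CharTest test, CharSequence sequence, char replacement) {
        StringBuilder out = new StringBuilder(sequence.length());
        boolean pendingGroup = false; // a matching run was seen but not yet emitted
        for (int i = 0; i < sequence.length(); i++) {
            char c = sequence.charAt(i);
            if (test.matches(c)) {
                pendingGroup = true;
            } else {
                // Emit the replacement only for a run that has text on both sides.
                if (pendingGroup && out.length() > 0) {
                    out.append(replacement);
                }
                out.append(c);
                pendingGroup = false;
            }
        }
        return out.toString(); // a trailing run is never emitted
    }

    public static void main(String[] args) {
        CharTest space = c -> c == ' ';
        System.out.println(trimAndCollapseFrom(space, "  hello   big  world ", '-')); // hello-big-world
    }
}
```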


apply

public boolean apply(Character character)
Returns true if this matcher matches the given character.

Specified by:
apply in interface Predicate<Character>
Parameters:
character - the input that the predicate should act on
Returns:
the value of this predicate when applied to the given character
Throws:
NullPointerException - if character is null
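
A plausible reason for the NullPointerException is auto-unboxing: passing a Character to a method that takes a primitive char unboxes it, and unboxing null throws. The sketch below demonstrates that mechanism on a hypothetical CharTest interface; whether the library's apply is written exactly this way is an assumption.

```java
final class ApplySketch {
    /** Hypothetical stand-in for a char-based matcher that also accepts boxed input. */
    interface CharTest {
        boolean matches(char c);

        /** Boxed entry point; delegates to the primitive method. */
        default boolean apply(Character character) {
            // Auto-unboxing happens here and throws NullPointerException for null.
            return matches(character);
        }
    }

    public static void main(String[] args) {
        CharTest digit = c -> '0' <= c && c <= '9';
        System.out.println(digit.apply(Character.valueOf('7'))); // true
        try {
            digit.apply(null);
        } catch (NullPointerException expected) {
            System.out.println("null input rejected");
        }
    }
}
```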