org.apache.lucene.analysis
Class CharTokenizer
public abstract class CharTokenizer
extends Tokenizer
An abstract base class for simple, character-oriented tokenizers.
Method Summary
protected abstract boolean | isTokenChar(char c) | Returns true iff a character should be included in a token.
Token | next() | Returns the next token in the stream, or null at EOS.
protected char | normalize(char c) | Called on each token character to normalize it before it is added to the token.
public CharTokenizer(Reader input)
protected abstract boolean isTokenChar(char c)
Returns true iff a character should be included in a token. This
tokenizer emits each run of adjacent characters that satisfy this
predicate as one token. Characters for which this returns false
define token boundaries and are not included in tokens.
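The boundary rule above can be sketched without any Lucene dependency. The following is a minimal illustration, not the library's implementation: the class name TokenBoundarySketch, the static isTokenChar stand-in (letters and digits), and the tokenize helper are all hypothetical names chosen for this example.

```java
import java.util.ArrayList;
import java.util.List;

class TokenBoundarySketch {
    /** Hypothetical stand-in for isTokenChar(char): letters and digits belong to tokens. */
    static boolean isTokenChar(char c) {
        return Character.isLetterOrDigit(c);
    }

    /** Collects each run of adjacent characters that satisfy isTokenChar. */
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (isTokenChar(c)) {
                current.append(c);               // character is part of the current token
            } else if (current.length() > 0) {
                tokens.add(current.toString());  // boundary character ends the token
                current.setLength(0);
            }
        }
        if (current.length() > 0) {
            tokens.add(current.toString());      // flush the token that runs to end of input
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("foo-bar, baz42!")); // prints [foo, bar, baz42]
    }
}
```

Note that the boundary characters ('-', ',', ' ', '!') never appear in the output, and consecutive boundary characters do not produce empty tokens.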
public final Token next()
Returns the next token in the stream, or null at EOS.
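Because next() signals end of stream with null, callers typically loop until null is returned. The sketch below shows that calling pattern with a hypothetical stand-in class (WhitespaceSketch, which returns plain strings rather than Lucene Token objects and treats whitespace as the only boundary); it is an assumption-laden illustration, not Lucene code.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

class NextLoopSketch {
    /** Hypothetical pull-style tokenizer; whitespace characters end a token. */
    static class WhitespaceSketch {
        private final Reader input;

        WhitespaceSketch(Reader input) {
            this.input = input;
        }

        /** Returns the next token text, or null once the reader is exhausted. */
        String next() throws IOException {
            StringBuilder sb = new StringBuilder();
            int ch;
            while ((ch = input.read()) != -1) {
                if (!Character.isWhitespace((char) ch)) {
                    sb.append((char) ch);
                } else if (sb.length() > 0) {
                    break; // boundary reached after a non-empty token
                }
            }
            return sb.length() > 0 ? sb.toString() : null;
        }
    }

    /** Drains the stream by calling next() until it returns null. */
    static List<String> drain(String text) {
        WhitespaceSketch stream = new WhitespaceSketch(new StringReader(text));
        List<String> out = new ArrayList<>();
        try {
            for (String t = stream.next(); t != null; t = stream.next()) {
                out.add(t);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(drain("  one two   three ")); // prints [one, two, three]
    }
}
```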
protected char normalize(char c)
Called on each token character to normalize it before it is added to the
token. The default implementation returns the character unchanged.
Subclasses may override this to, e.g., lowercase tokens.
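To illustrate how the two hooks interact, here is a small self-contained sketch. The Base class only mirrors the isTokenChar and normalize hooks described on this page; it, its firstToken helper, and the Letters and LowerCaseLetters subclasses are hypothetical names for this example and do not come from Lucene.

```java
class NormalizeSketch {
    /** Hypothetical base class with the two hooks; not the Lucene class. */
    static abstract class Base {
        protected abstract boolean isTokenChar(char c);

        /** Default hook: leave the character as it is. */
        protected char normalize(char c) {
            return c;
        }

        /** Returns the first token in text, normalizing each kept character. */
        final String firstToken(String text) {
            StringBuilder sb = new StringBuilder();
            for (int i = 0; i < text.length(); i++) {
                char c = text.charAt(i);
                if (isTokenChar(c)) {
                    sb.append(normalize(c)); // normalize before adding to the token
                } else if (sb.length() > 0) {
                    break;                   // boundary after a non-empty token
                }
            }
            return sb.toString();
        }
    }

    /** Letters form tokens; inherits the default normalize. */
    static class Letters extends Base {
        @Override
        protected boolean isTokenChar(char c) {
            return Character.isLetter(c);
        }
    }

    /** Same boundaries as Letters, but lowercases each token character. */
    static class LowerCaseLetters extends Letters {
        @Override
        protected char normalize(char c) {
            return Character.toLowerCase(c);
        }
    }

    static String keepCase(String text) {
        return new Letters().firstToken(text);
    }

    static String lowerCase(String text) {
        return new LowerCaseLetters().firstToken(text);
    }

    public static void main(String[] args) {
        System.out.println(keepCase("Hello World"));  // prints Hello
        System.out.println(lowerCase("Hello World")); // prints hello
    }
}
```

Overriding only normalize changes the token text but not where tokens begin and end, since boundaries are decided by isTokenChar alone.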
Copyright © 2000-2008 Apache Software Foundation. All Rights Reserved.