org.apache.lucene.analysis.ru

Class RussianLetterTokenizer

public class RussianLetterTokenizer extends CharTokenizer

A RussianLetterTokenizer is a tokenizer that extends the behavior of LetterTokenizer by additionally looking up letters in a given "russian charset". The problem with LetterTokenizer is that it uses the Character.isLetter() method, which doesn't know how to detect letters in encodings like CP1252 and KOI8 (the well-known problems with the 0xD7 and 0xF7 chars).
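The idea described above can be illustrated without Lucene. The following is a minimal sketch, not the class's actual source: CharsetLetterSplitter and its tokenize method are hypothetical names, and the two-character charset is an assumption chosen only because U+00D7 and U+00F7 (the multiplication and division signs) are rejected by Character.isLetter(). A character is treated as part of a token if it is a letter or if it appears in the supplied charset.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-alone illustration; it does not use or reproduce Lucene code.
public class CharsetLetterSplitter {
    // Extra characters to accept, playing the role of the "charset" argument.
    private final char[] charset;

    public CharsetLetterSplitter(char[] charset) {
        this.charset = charset.clone();
    }

    // Accept a character if it is a letter, or if it is listed in the charset.
    boolean isTokenChar(char c) {
        if (Character.isLetter(c)) {
            return true;
        }
        for (char extra : charset) {
            if (c == extra) {
                return true;
            }
        }
        return false;
    }

    // Split the text into maximal runs of accepted characters.
    public List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (isTokenChar(c)) {
                current.append(c);
            } else if (current.length() > 0) {
                tokens.add(current.toString());
                current.setLength(0);
            }
        }
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Assumed charset: the two characters named in the description above.
        char[] extra = {'\u00D7', '\u00F7'};
        System.out.println(new CharsetLetterSplitter(extra).tokenize("ab\u00D7cd 12 ef"));
        System.out.println(new CharsetLetterSplitter(new char[0]).tokenize("ab\u00D7cd 12 ef"));
    }
}
```

With the two extra characters in the charset, the run around U+00D7 stays together as one token. With an empty charset, the same input is split at that character, which is the failure mode the description attributes to relying on Character.isLetter() alone.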

Version: $Id: RussianLetterTokenizer.java 150998 2004-08-16 20:30:46Z dnaber $

Author: Boris Okner, b.okner@rogers.com

Constructor Summary

RussianLetterTokenizer(Reader in, char[] charset)

Method Summary

protected boolean isTokenChar(char c)
Collects only characters which satisfy Character.isLetter(char) or, per the class description, appear in the given charset.

Constructor Detail

RussianLetterTokenizer

public RussianLetterTokenizer(Reader in, char[] charset)

Method Detail

isTokenChar

protected boolean isTokenChar(char c)
Collects only characters which satisfy Character.isLetter(char) or, per the class description, appear in the given charset.

Copyright © 2000-2008 Apache Software Foundation. All Rights Reserved.