public final class HMMChineseTokenizerFactory extends TokenizerFactory
HMMChineseTokenizer
Note: this class will currently emit tokens for punctuation. So you should either add
a WordDelimiterFilter after to remove these (with concatenate off), or use the
SmartChinese stoplist with a StopFilterFactory via:
words="org/apache/lucene/analysis/cn/smart/stopwords.txt"
Modifier and Type | Field and Description |
---|---|
static String |
NAME
SPI name
|
LUCENE_MATCH_VERSION_PARAM, luceneMatchVersion
Constructor and Description |
---|
HMMChineseTokenizerFactory(Map<String,String> args)
Creates a new HMMChineseTokenizerFactory
|
Modifier and Type | Method and Description |
---|---|
Tokenizer |
create(AttributeFactory factory) |
availableTokenizers, create, findSPIName, forName, lookupClass, reloadTokenizers
get, get, get, get, get, getBoolean, getChar, getClassArg, getFloat, getInt, getLines, getLuceneMatchVersion, getOriginalArgs, getPattern, getSet, getSnowballWordSet, getWordSet, isExplicitLuceneMatchVersion, require, require, require, requireBoolean, requireChar, requireFloat, requireInt, setExplicitLuceneMatchVersion, splitAt, splitFileNames
public static final String NAME
public Tokenizer create(AttributeFactory factory)
create
in class TokenizerFactory
Copyright © 2000-2021 Apache Software Foundation. All Rights Reserved.