public class HMMChineseTokenizer extends SegmentingTokenizerBase
Tokenizer that uses probabilistic knowledge to find the optimal word segmentation for Simplified Chinese text. The text is first broken into sentences, then each sentence is segmented into words.
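The two-stage flow described above (sentences first, then words within each sentence) can be illustrated with a minimal sketch that uses only the JDK. This is not the HMM segmenter: the class `TwoStageSegmentationSketch` and its methods are hypothetical names invented for illustration, and `java.text.BreakIterator` stands in for the probabilistic word segmentation step.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical illustration class; not part of Lucene.
public class TwoStageSegmentationSketch {

    /** Stage 1: split the text into sentences with the JDK sentence BreakIterator. */
    static List<String> sentences(String text) {
        BreakIterator it = BreakIterator.getSentenceInstance(Locale.ROOT);
        it.setText(text);
        List<String> out = new ArrayList<>();
        int start = it.first();
        for (int end = it.next(); end != BreakIterator.DONE; start = end, end = it.next()) {
            String sentence = text.substring(start, end).trim();
            if (!sentence.isEmpty()) {
                out.add(sentence);
            }
        }
        return out;
    }

    /**
     * Stage 2 (stand-in): split one sentence into words with the JDK word
     * BreakIterator. The real tokenizer applies a probabilistic model here;
     * this placeholder only keeps segments that start with a letter or digit.
     */
    static List<String> words(String sentence) {
        BreakIterator it = BreakIterator.getWordInstance(Locale.ROOT);
        it.setText(sentence);
        List<String> out = new ArrayList<>();
        int start = it.first();
        for (int end = it.next(); end != BreakIterator.DONE; start = end, end = it.next()) {
            String segment = sentence.substring(start, end);
            if (Character.isLetterOrDigit(segment.codePointAt(0))) {
                out.add(segment);
            }
        }
        return out;
    }

    /** Runs both stages: every sentence is segmented independently. */
    static List<String> segment(String text) {
        List<String> out = new ArrayList<>();
        for (String sentence : sentences(text)) {
            out.addAll(words(sentence));
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(segment("Lucene indexes text. Tokenizers split it into words."));
    }
}
```

Segmenting sentence by sentence keeps the word-level step bounded to short spans of text, which is presumably why the base class hands sentence boundaries to `setNextSentence` before words are produced by `incrementWord`.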
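Nested classes inherited from class AttributeSource: AttributeSource.State
Fields inherited from class SegmentingTokenizerBase: buffer, BUFFERMAX, offset
Fields inherited from class TokenStream: DEFAULT_TOKEN_ATTRIBUTE_FACTORY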
Constructor | Description |
---|---|
HMMChineseTokenizer() | Creates a new HMMChineseTokenizer |
HMMChineseTokenizer(AttributeFactory factory) | Creates a new HMMChineseTokenizer, supplying the AttributeFactory |
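A minimal usage sketch for the no-argument constructor follows. It assumes the Lucene smartcn analyzers module is on the classpath and that the class lives in `org.apache.lucene.analysis.cn.smart`; the wrapper class `HMMChineseTokenizerUsage`, its `tokenize` helper, and the sample text are hypothetical. It follows the usual TokenStream consumer workflow: set the reader, call `reset()`, loop on `incrementToken()`, then call `end()` and `close()`.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

import org.apache.lucene.analysis.Tokenizer;
import org.apache.lucene.analysis.cn.smart.HMMChineseTokenizer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

// Hypothetical wrapper class for illustration only.
public class HMMChineseTokenizerUsage {

    /** Collects the term text of every token the tokenizer emits for the input. */
    static List<String> tokenize(String text) throws IOException {
        List<String> tokens = new ArrayList<>();
        // try-with-resources calls close() once consumption is finished.
        try (Tokenizer tokenizer = new HMMChineseTokenizer()) {
            tokenizer.setReader(new StringReader(text));
            CharTermAttribute term = tokenizer.addAttribute(CharTermAttribute.class);
            tokenizer.reset();                   // required before incrementToken()
            while (tokenizer.incrementToken()) { // advances one word at a time
                tokens.add(term.toString());
            }
            tokenizer.end();                     // finalizes end-of-stream state
        }
        return tokens;
    }

    public static void main(String[] args) throws IOException {
        // Sample input chosen for illustration; actual segmentation depends on the model.
        System.out.println(tokenize("我是中国人"));
    }
}
```

The second constructor takes an `AttributeFactory` and is otherwise consumed the same way.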
Modifier and Type | Method and Description |
---|---|
protected boolean | incrementWord() |
void | reset() |
protected void | setNextSentence(int sentenceStart, int sentenceEnd) |
Methods inherited from class SegmentingTokenizerBase: end, incrementToken, isSafeEnd
Methods inherited from class Tokenizer: close, correctOffset, setReader
Methods inherited from class AttributeSource: addAttribute, addAttributeImpl, captureState, clearAttributes, cloneAttributes, copyTo, endAttributes, equals, getAttribute, getAttributeClassesIterator, getAttributeFactory, getAttributeImplsIterator, hasAttribute, hasAttributes, hashCode, reflectAsString, reflectWith, removeAllAttributes, restoreState, toString
public HMMChineseTokenizer()
public HMMChineseTokenizer(AttributeFactory factory)
protected void setNextSentence(int sentenceStart, int sentenceEnd)
Specified by: setNextSentence in class SegmentingTokenizerBase
protected boolean incrementWord()
Specified by: incrementWord in class SegmentingTokenizerBase
public void reset() throws IOException
Overrides: reset in class SegmentingTokenizerBase
Throws: IOException
Copyright © 2000-2021 Apache Software Foundation. All Rights Reserved.