HMMChineseTokenizer (Lucene 8.9.0 API)

- java.lang.Object
  - org.apache.lucene.util.AttributeSource
    - org.apache.lucene.analysis.TokenStream
      - org.apache.lucene.analysis.Tokenizer
        - org.apache.lucene.analysis.util.SegmentingTokenizerBase
          - org.apache.lucene.analysis.cn.smart.HMMChineseTokenizer

All Implemented Interfaces:

Closeable, AutoCloseable
```
public class HMMChineseTokenizer
extends SegmentingTokenizerBase
```
Tokenizer for Chinese or mixed Chinese-English text.
The tokenizer uses probabilistic knowledge to find the optimal word segmentation for Simplified Chinese text. The text is first broken into sentences, then each sentence is segmented into words.
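
The two-pass structure described above (sentences first, then words within each sentence) can be illustrated with the JDK alone. The sketch below is not the Lucene implementation: the class `TwoPassSegmenter` and its methods are hypothetical names, and the word pass uses `java.text.BreakIterator` as a stand-in for the probabilistic (HMM) segmentation, so it shows only the control flow, not the quality of Chinese segmentation.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical illustration class; not part of Lucene.
final class TwoPassSegmenter {

    // Pass 1: split the text into sentences, then hand each sentence to pass 2.
    static List<List<String>> segment(String text) {
        List<List<String>> result = new ArrayList<>();
        BreakIterator sentences = BreakIterator.getSentenceInstance(Locale.ROOT);
        sentences.setText(text);
        int start = sentences.first();
        for (int end = sentences.next(); end != BreakIterator.DONE;
                start = end, end = sentences.next()) {
            List<String> words = wordsOf(text.substring(start, end));
            if (!words.isEmpty()) {
                result.add(words);
            }
        }
        return result;
    }

    // Pass 2: split one sentence into words.
    // Assumption: a plain BreakIterator replaces the HMM word segmenter here.
    private static List<String> wordsOf(String sentence) {
        List<String> words = new ArrayList<>();
        BreakIterator it = BreakIterator.getWordInstance(Locale.ROOT);
        it.setText(sentence);
        int start = it.first();
        for (int end = it.next(); end != BreakIterator.DONE;
                start = end, end = it.next()) {
            String candidate = sentence.substring(start, end);
            // Keep only spans containing a letter or digit; drop spaces and punctuation.
            if (candidate.codePoints().anyMatch(Character::isLetterOrDigit)) {
                words.add(candidate);
            }
        }
        return words;
    }

    public static void main(String[] args) {
        System.out.println(segment("Hello world! Second one here?"));
    }
}
```

Segmenting per sentence keeps each word-level pass bounded to a short span of text, which is presumably why the base class exposes separate `setNextSentence` and `incrementWord` hooks.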

Nested Class Summary
- Nested classes/interfaces inherited from class org.apache.lucene.util.AttributeSource
  AttributeSource.State

Field Summary
- Fields inherited from class org.apache.lucene.analysis.util.SegmentingTokenizerBase
  buffer, BUFFERMAX, offset
- Fields inherited from class org.apache.lucene.analysis.Tokenizer
  input
- Fields inherited from class org.apache.lucene.analysis.TokenStream
  DEFAULT_TOKEN_ATTRIBUTE_FACTORY

Constructor Summary

Constructors
Constructor and Description
`HMMChineseTokenizer()` Creates a new HMMChineseTokenizer
`HMMChineseTokenizer(AttributeFactory factory)` Creates a new HMMChineseTokenizer, supplying the AttributeFactory

Method Summary

Modifier and Type	Method and Description
`protected boolean`	`incrementWord()`
`void`	`reset()`
`protected void`	`setNextSentence(int sentenceStart, int sentenceEnd)`

Methods inherited from class org.apache.lucene.analysis.util.SegmentingTokenizerBase
end, incrementToken, isSafeEnd

Methods inherited from class org.apache.lucene.analysis.Tokenizer
close, correctOffset, setReader

Methods inherited from class org.apache.lucene.util.AttributeSource
addAttribute, addAttributeImpl, captureState, clearAttributes, cloneAttributes, copyTo, endAttributes, equals, getAttribute, getAttributeClassesIterator, getAttributeFactory, getAttributeImplsIterator, hasAttribute, hasAttributes, hashCode, reflectAsString, reflectWith, removeAllAttributes, restoreState, toString

Methods inherited from class java.lang.Object
clone, finalize, getClass, notify, notifyAll, wait, wait, wait

- Constructor Detail
  - HMMChineseTokenizer
```
public HMMChineseTokenizer()
```
    Creates a new HMMChineseTokenizer
  - HMMChineseTokenizer
```
public HMMChineseTokenizer(AttributeFactory factory)
```
    Creates a new HMMChineseTokenizer that uses the supplied AttributeFactory
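
A brief sketch of the two constructors follows. It assumes the Lucene 8.9.0 smartcn analyzer module is on the classpath; the class name `ConstructorDemo` is hypothetical. Passing `TokenStream.DEFAULT_TOKEN_ATTRIBUTE_FACTORY` explicitly is shown only to illustrate the second signature, and any other `AttributeFactory` could be supplied instead.

```java
import java.io.IOException;

import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.cn.smart.HMMChineseTokenizer;
import org.apache.lucene.util.AttributeFactory;

// Hypothetical demo class; requires the Lucene smartcn analyzer jar.
final class ConstructorDemo {

    // Builds a tokenizer with an explicit factory and reports whether
    // getAttributeFactory() returns that same factory instance.
    static boolean usesSuppliedFactory() throws IOException {
        AttributeFactory factory = TokenStream.DEFAULT_TOKEN_ATTRIBUTE_FACTORY;
        try (HMMChineseTokenizer tokenizer = new HMMChineseTokenizer(factory)) {
            return tokenizer.getAttributeFactory() == factory;
        }
    }

    public static void main(String[] args) throws IOException {
        // No-argument form.
        try (HMMChineseTokenizer plain = new HMMChineseTokenizer()) {
            System.out.println(plain.getAttributeFactory());
        }
        System.out.println(usesSuppliedFactory());
    }
}
```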
- Method Detail
  - setNextSentence
```
protected void setNextSentence(int sentenceStart,
                               int sentenceEnd)
```
    Specified by: setNextSentence in class SegmentingTokenizerBase
  - incrementWord
```
protected boolean incrementWord()
```
    Specified by: incrementWord in class SegmentingTokenizerBase
  - reset
```
public void reset()
           throws IOException
```
    Overrides: reset in class SegmentingTokenizerBase

    Throws: IOException
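
Since `reset()` is part of the standard `TokenStream` consumer workflow, a minimal usage sketch may help. It assumes the Lucene 8.9.0 smartcn analyzer module is on the classpath; the class `HmmTokenizerDemo`, its `tokenize` helper, and the sample sentence are illustrative only. The call order (`setReader`, `reset`, `incrementToken` until it returns false, `end`, `close`) follows the general `TokenStream` contract.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

import org.apache.lucene.analysis.Tokenizer;
import org.apache.lucene.analysis.cn.smart.HMMChineseTokenizer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
import org.apache.lucene.analysis.tokenattributes.OffsetAttribute;

// Hypothetical demo class; requires the Lucene smartcn analyzer jar.
final class HmmTokenizerDemo {

    // Returns each token formatted as "term[start,end)".
    static List<String> tokenize(String text) throws IOException {
        List<String> tokens = new ArrayList<>();
        // try-with-resources calls close() for us (Tokenizer is Closeable).
        try (Tokenizer tokenizer = new HMMChineseTokenizer()) {
            tokenizer.setReader(new StringReader(text));
            CharTermAttribute term = tokenizer.addAttribute(CharTermAttribute.class);
            OffsetAttribute offset = tokenizer.addAttribute(OffsetAttribute.class);

            tokenizer.reset();                 // must precede the first incrementToken()
            while (tokenizer.incrementToken()) {
                tokens.add(term.toString()
                        + "[" + offset.startOffset() + "," + offset.endOffset() + ")");
            }
            tokenizer.end();                   // records end-of-stream state such as the final offset
        }
        return tokens;
    }

    public static void main(String[] args) throws IOException {
        for (String token : tokenize("我购买了道具和服装。")) {
            System.out.println(token);
        }
    }
}
```

The protected methods `setNextSentence` and `incrementWord` are not called directly here; they are invoked internally while `incrementToken()` advances through the input.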

Copyright © 2000-2021 Apache Software Foundation. All Rights Reserved.