public class ICUNormalizer2Filter extends TokenFilter
Normalize token text with ICU's Normalizer2.
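With this filter, you can normalize text either with the default NFKC_Casefold normalizer or with any Normalizer2 instance you supply.
If you use the defaults, this filter is a simple, language-independent way to standardize Unicode text for search: it applies NFKC normalization, folds case, and removes default ignorable characters.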
See Also: Normalizer2, FilteredNormalizer2
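The effect of the default normalization can be approximated with the JDK alone. The sketch below is not what this filter runs: it swaps in java.text.Normalizer (NFKC) plus a locale-independent lowercase step as a rough stand-in for ICU's NFKC_Casefold, and it does not remove default ignorables or apply full case folding. The class and method names are hypothetical.

```java
import java.text.Normalizer;
import java.util.Locale;

// Hypothetical helper, for illustration only. It approximates the default
// behavior with JDK classes; it is NOT equivalent to ICU's NFKC_Casefold
// (no removal of default ignorables, lowercasing instead of case folding).
class NormalizationSketch {

    static String normalizeForSearch(String text) {
        // NFKC maps compatibility forms (full-width letters, ligatures) to
        // their standard equivalents and composes the result.
        String nfkc = Normalizer.normalize(text, Normalizer.Form.NFKC);
        // Locale.ROOT keeps lowercasing independent of the default locale.
        return nfkc.toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        // Full-width letters and digits, and the "fi" ligature.
        System.out.println(normalizeForSearch("ＡＢＣ１２３"));
        System.out.println(normalizeForSearch("ﬁle"));
    }
}
```

Because both the indexed text and the query text pass through the same normalization, variants such as full-width and ASCII spellings end up as the same term.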
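Nested classes inherited from class AttributeSource: AttributeSource.State
Fields inherited from class TokenFilter: input
Fields inherited from class TokenStream: DEFAULT_TOKEN_ATTRIBUTE_FACTORY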
Constructor and Description |
---|
ICUNormalizer2Filter(TokenStream input): Create a new ICUNormalizer2Filter that applies NFKC normalization, case folding, and removal of default ignorables (NFKC_Casefold). |
ICUNormalizer2Filter(TokenStream input, com.ibm.icu.text.Normalizer2 normalizer): Create a new ICUNormalizer2Filter with the specified Normalizer2. |
Modifier and Type | Method and Description |
---|---|
boolean | incrementToken() |
Methods inherited from class TokenFilter: close, end, reset
Methods inherited from class AttributeSource: addAttribute, addAttributeImpl, captureState, clearAttributes, cloneAttributes, copyTo, endAttributes, equals, getAttribute, getAttributeClassesIterator, getAttributeFactory, getAttributeImplsIterator, hasAttribute, hasAttributes, hashCode, reflectAsString, reflectWith, removeAllAttributes, restoreState, toString
public ICUNormalizer2Filter(TokenStream input)
public ICUNormalizer2Filter(TokenStream input, com.ibm.icu.text.Normalizer2 normalizer)
Parameters:
input - stream
normalizer - normalizer to use

public final boolean incrementToken() throws IOException
Specified by: incrementToken in class TokenStream
Throws: IOException
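A minimal usage sketch covering both constructors and the incrementToken() loop follows. It assumes the Lucene ICU analysis module, Lucene's common analyzers (for WhitespaceTokenizer), and ICU4J are on the classpath. The example class name, helper methods, and sample text are hypothetical, and Normalizer2.getNFCInstance() is just one possible custom normalizer.

```java
import java.io.IOException;
import java.io.StringReader;

import com.ibm.icu.text.Normalizer2;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.Tokenizer;
import org.apache.lucene.analysis.core.WhitespaceTokenizer;
import org.apache.lucene.analysis.icu.ICUNormalizer2Filter;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

// Hypothetical example class; not part of Lucene.
public class ICUNormalizer2FilterExample {

    // Builds a whitespace tokenizer over the given text.
    static Tokenizer tokenize(String text) {
        Tokenizer tokenizer = new WhitespaceTokenizer();
        tokenizer.setReader(new StringReader(text));
        return tokenizer;
    }

    // Consumes a stream using the standard TokenStream workflow:
    // reset(), incrementToken() until it returns false, end(), close().
    static void printTokens(TokenStream stream) throws IOException {
        try (TokenStream ts = stream) {
            CharTermAttribute term = ts.addAttribute(CharTermAttribute.class);
            ts.reset();
            while (ts.incrementToken()) {
                System.out.println(term.toString());
            }
            ts.end();
        }
    }

    public static void main(String[] args) throws IOException {
        String text = "Ｆｕｌｌ ＷＩＤＴＨ １２３";

        // Default constructor: NFKC_Casefold.
        printTokens(new ICUNormalizer2Filter(tokenize(text)));

        // Second constructor: caller-supplied Normalizer2 (plain NFC here).
        printTokens(new ICUNormalizer2Filter(tokenize(text), Normalizer2.getNFCInstance()));
    }
}
```

Comparing the two runs shows the difference between the default and a plain NFC normalizer: NFC leaves compatibility characters and case as they are, while the default folds both.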
Copyright © 2000-2021 Apache Software Foundation. All Rights Reserved.