Package | Description |
---|---|
org.apache.lucene.analysis.icu.segmentation | Tokenizer that breaks text into words with the Unicode Text Segmentation algorithm. |
Class | Description |
---|---|
ICUTokenizer | Breaks text into words according to UAX #29: Unicode Text Segmentation (http://www.unicode.org/reports/tr29/) |
ICUTokenizerConfig | Class that allows for tailored Unicode Text Segmentation on a per-writing system basis. |
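
To make "breaks text into words" concrete, the following is a minimal sketch of boundary-based word segmentation. It deliberately does not use `ICUTokenizer`: it relies on the JDK's `java.text.BreakIterator`, whose word rules are not guaranteed to match UAX #29 or ICU's behavior, so treat it as an illustration of the idea only. The class name `WordBreakSketch` and the helper `words` are hypothetical names chosen for this example. `ICUTokenizer` itself is consumed through Lucene's `TokenStream` workflow (`setReader`, `reset`, `incrementToken`, `end`, `close`).

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical illustration class; not part of Lucene or ICU.
public class WordBreakSketch {

    // Splits text at the JDK's word boundaries and keeps only segments
    // that contain at least one letter or digit (drops spaces and punctuation).
    static List<String> words(String text) {
        BreakIterator it = BreakIterator.getWordInstance(Locale.ROOT);
        it.setText(text);
        List<String> out = new ArrayList<>();
        int start = it.first();
        for (int end = it.next(); end != BreakIterator.DONE; start = end, end = it.next()) {
            String segment = text.substring(start, end);
            if (segment.codePoints().anyMatch(Character::isLetterOrDigit)) {
                out.add(segment);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(words("Hello, world"));
    }
}
```

A tokenizer built on this idea walks consecutive boundaries and emits the spans between them as tokens. `ICUTokenizerConfig` exists so that the boundary rules can be tailored per writing system, which this sketch does not attempt.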
Copyright © 2000-2021 Apache Software Foundation. All Rights Reserved.