public final class PatternCaptureGroupTokenFilter extends TokenFilter
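CaptureGroup uses Java regexes to emit multiple tokens, one for each capture group in one or more patterns.
For example, a pattern like: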
"(https?://([a-zA-Z\-_0-9.]+))"
when matched against the string "http://www.foo.com/index" would return the tokens "http://www.foo.com" and "www.foo.com".
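The two tokens correspond to the two capture groups of the pattern. A minimal sketch using only `java.util.regex` (not the Lucene filter itself; the class name `UrlCaptureSketch` and the helper `groupsOfFirstMatch` are hypothetical names chosen for illustration) shows where each one comes from:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Plain java.util.regex illustration; this is NOT the Lucene filter.
class UrlCaptureSketch {

    // The pattern from the example, written as a Java string literal.
    static final Pattern URL = Pattern.compile("(https?://([a-zA-Z\\-_0-9.]+))");

    // Returns the text of every capture group of the first match, in group order.
    static List<String> groupsOfFirstMatch(Pattern pattern, String token) {
        List<String> groups = new ArrayList<>();
        Matcher m = pattern.matcher(token);
        if (m.find()) {
            // Group 0 is the whole match; capture groups start at 1.
            for (int g = 1; g <= m.groupCount(); g++) {
                groups.add(m.group(g));
            }
        }
        return groups;
    }

    public static void main(String[] args) {
        // Group 1 is the scheme plus host, group 2 is the host alone.
        System.out.println(groupsOfFirstMatch(URL, "http://www.foo.com/index"));
    }
}
```

The character class does not include "/", so the match stops before "/index".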
If none of the patterns match, or if preserveOriginal is true, the original token will be preserved.
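The preservation rule can be sketched for a single token as follows. This is an approximation under stated assumptions: `PreserveOriginalSketch` and `tokensFor` are hypothetical names, the code uses only `java.util.regex`, and it ignores everything a real TokenStream tracks (offsets, position increments). Placing the original token first and not removing duplicates are choices of this sketch, not documented behavior.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical single-token model of the rule; NOT the Lucene implementation.
class PreserveOriginalSketch {

    static final Pattern URL = Pattern.compile("(https?://([a-zA-Z\\-_0-9.]+))");

    static List<String> tokensFor(String token, boolean preserveOriginal, Pattern... patterns) {
        // Collect every non-empty capture group from every pattern.
        List<String> captures = new ArrayList<>();
        for (Pattern p : patterns) {
            Matcher m = p.matcher(token);
            while (m.find()) {
                for (int g = 1; g <= m.groupCount(); g++) {
                    String text = m.group(g);
                    if (text != null && !text.isEmpty()) {
                        captures.add(text);
                    }
                }
            }
        }
        List<String> out = new ArrayList<>();
        // Keep the original when asked to, or when nothing was captured.
        if (preserveOriginal || captures.isEmpty()) {
            out.add(token);
        }
        out.addAll(captures);
        return out;
    }

    public static void main(String[] args) {
        System.out.println(tokensFor("http://www.foo.com/index", false, URL));
        System.out.println(tokensFor("http://www.foo.com/index", true, URL));
        System.out.println(tokensFor("plain", false, URL));
    }
}
```

In the third call no pattern matches, so the token "plain" passes through unchanged even though preserveOriginal is false.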
Each pattern is matched as often as it can be, so the pattern "(...)", when matched against "abcdefghi", would produce ["abc","def","ghi"].
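Repeated matching behaves like a loop over `Matcher.find()`, where each successful call resumes after the previous match. The sketch below assumes that model and uses only `java.util.regex`; `RepeatedMatchSketch` and `allMatches` are hypothetical names, and the helper reads only the first capture group.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Plain-regex illustration of "matched as often as it can be".
class RepeatedMatchSketch {

    // Collects capture group 1 of every successive match of regex in token.
    static List<String> allMatches(String regex, String token) {
        List<String> out = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(token);
        // Each find() continues from the end of the previous match.
        while (m.find()) {
            out.add(m.group(1));
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(allMatches("(...)", "abcdefghi"));
    }
}
```

A trailing remainder shorter than the pattern requires is not captured: with the input "abcdefgh", the final "gh" matches nothing.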
A camelCaseFilter could be written as:
"([A-Z]{2,})",
"(?<![A-Z])([A-Z][a-z]+)",
"(?:^|\\b|(?<=[0-9_])|(?<=[A-Z]{2}))([a-z]+)",
"([0-9]+)"
Applied to the token "camelCaseFilter", these patterns would produce "camel", "Case" and "Filter"; if preserveOriginal is true, it would also return "camelCaseFilter".
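Nested classes/interfaces inherited from class AttributeSource: AttributeSource.State
Fields inherited from class TokenFilter: input
Fields inherited from class TokenStream: DEFAULT_TOKEN_ATTRIBUTE_FACTORY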
Constructor and Description |
---|
PatternCaptureGroupTokenFilter(TokenStream input, boolean preserveOriginal, Pattern... patterns) |
Modifier and Type | Method and Description |
---|---|
boolean | incrementToken() |
void | reset() |
Methods inherited from class TokenFilter: close, end
Methods inherited from class AttributeSource: addAttribute, addAttributeImpl, captureState, clearAttributes, cloneAttributes, copyTo, endAttributes, equals, getAttribute, getAttributeClassesIterator, getAttributeFactory, getAttributeImplsIterator, hasAttribute, hasAttributes, hashCode, reflectAsString, reflectWith, removeAllAttributes, restoreState, toString
public PatternCaptureGroupTokenFilter(TokenStream input, boolean preserveOriginal, Pattern... patterns)
Parameters:
input - the input TokenStream
preserveOriginal - set to true to return the original token even if one of the patterns matches
patterns - an array of Pattern objects to match against each token
public boolean incrementToken() throws IOException
Specified by: incrementToken in class TokenStream
Throws: IOException
public void reset() throws IOException
Overrides: reset in class TokenFilter
Throws: IOException
Copyright © 2000-2021 Apache Software Foundation. All Rights Reserved.