public final class MockAnalyzer extends Analyzer
This analyzer is a replacement for Whitespace/Simple/KeywordAnalyzers for unit tests. If you are testing a custom component such as a queryparser or analyzer-wrapper that consumes analysis streams, it's a great idea to test it with this analyzer instead. MockAnalyzer has the following behavior:
The assertions in MockTokenizer are turned on for extra checks that the consumer is consuming properly. These checks can be disabled with setEnableChecks(boolean).
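As an illustration of the consumer workflow these checks guard, the sketch below models a stream that rejects out-of-order calls. It is a self-contained stand-in and not Lucene's MockTokenizer: the class WorkflowCheckedStream and its State enum are hypothetical names, and the real checks may be stricter or looser. The only thing assumed from Lucene is the TokenStream consumer contract: reset(), then incrementToken() until it returns false, then end(), then close().

```java
import java.util.Iterator;
import java.util.List;

/** Hypothetical stand-in that enforces the reset/incrementToken/end/close order. */
final class WorkflowCheckedStream {
    private enum State { CREATED, RESET, EXHAUSTED, ENDED, CLOSED }

    private final Iterator<String> tokens;
    private State state = State.CREATED;
    private String current;

    WorkflowCheckedStream(List<String> tokens) {
        this.tokens = tokens.iterator();
    }

    void reset() {
        require(state == State.CREATED, "reset() must be the first call");
        state = State.RESET;
    }

    boolean incrementToken() {
        require(state == State.RESET, "incrementToken() needs reset() first and no prior exhaustion");
        if (tokens.hasNext()) {
            current = tokens.next();
            return true;
        }
        state = State.EXHAUSTED; // the consumer has now seen the final 'false'
        return false;
    }

    /** Term of the token most recently returned by incrementToken(). */
    String term() {
        return current;
    }

    void end() {
        require(state == State.EXHAUSTED, "end() must follow incrementToken() returning false");
        state = State.ENDED;
    }

    void close() {
        require(state == State.ENDED, "close() must follow end()");
        state = State.CLOSED;
    }

    private static void require(boolean ok, String message) {
        if (!ok) {
            throw new IllegalStateException(message);
        }
    }
}
```

A consumer that skips reset() or stops before incrementToken() returns false fails immediately in this model, which is the kind of mistake the real checks are meant to surface in unit tests.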
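See Also: MockTokenizer
Nested classes/interfaces inherited from class Analyzer: Analyzer.ReuseStrategy, Analyzer.TokenStreamComponents
Fields inherited from class Analyzer: GLOBAL_REUSE_STRATEGY, PER_FIELD_REUSE_STRATEGY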
Constructor and Description |
---|
MockAnalyzer(Random random): Create a Whitespace-lowercasing analyzer with no stopwords removal. |
MockAnalyzer(Random random, CharacterRunAutomaton runAutomaton, boolean lowerCase) |
MockAnalyzer(Random random, CharacterRunAutomaton runAutomaton, boolean lowerCase, CharacterRunAutomaton filter): Creates a new MockAnalyzer. |
Modifier and Type | Method and Description |
---|---|
Analyzer.TokenStreamComponents | createComponents(String fieldName) |
int | getOffsetGap(String fieldName): Get the offset gap between tokens in fields if several fields with the same name were added. |
int | getPositionIncrementGap(String fieldName) |
protected TokenStream | normalize(String fieldName, TokenStream in) |
void | setEnableChecks(boolean enableChecks): Toggle consumer workflow checking: if your test consumes tokenstreams normally you should leave this enabled. |
void | setMaxTokenLength(int length): Toggle maxTokenLength for MockTokenizer. |
void | setOffsetGap(int offsetGap): Set a new offset gap which will then be added to the offset when several fields with the same name are indexed. |
void | setPositionIncrementGap(int positionIncrementGap) |
Methods inherited from class Analyzer: attributeFactory, close, getReuseStrategy, getVersion, initReader, initReaderForNormalization, normalize, setVersion, tokenStream, tokenStream
public MockAnalyzer(Random random, CharacterRunAutomaton runAutomaton, boolean lowerCase, CharacterRunAutomaton filter)
Parameters:
random - Random for payloads behavior
runAutomaton - DFA describing how tokenization should happen (e.g. [a-zA-Z]+)
lowerCase - true if the tokenizer should lowercase terms
filter - DFA describing how terms should be filtered (set of stopwords, etc.)
public MockAnalyzer(Random random, CharacterRunAutomaton runAutomaton, boolean lowerCase)
public MockAnalyzer(Random random)
Calls MockAnalyzer(random, MockTokenizer.WHITESPACE, true, MockTokenFilter.EMPTY_STOPSET).
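To make the constructor parameters concrete, the sketch below mimics the tokenize, lowercase, filter pipeline using java.util.regex and a Set in place of Lucene's CharacterRunAutomaton. It is a simplified model and not the MockAnalyzer implementation: the class RunTokenizerSketch and the method analyze are hypothetical names, and the random parameter (payload behavior) is left out entirely.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical model of the runAutomaton / lowerCase / filter parameters. */
final class RunTokenizerSketch {
    /**
     * @param text      input to tokenize
     * @param runs      regex standing in for runAutomaton; each match becomes one token
     * @param lowerCase whether each token is lowercased
     * @param stopwords set standing in for the filter automaton; matching terms are dropped
     */
    static List<String> analyze(String text, Pattern runs, boolean lowerCase, Set<String> stopwords) {
        List<String> out = new ArrayList<>();
        Matcher m = runs.matcher(text);
        while (m.find()) {
            String term = m.group();
            if (lowerCase) {
                term = term.toLowerCase(Locale.ROOT);
            }
            // Filtering happens after lowercasing, so stopwords are matched in lowercase form.
            if (!stopwords.contains(term)) {
                out.add(term);
            }
        }
        return out;
    }
}
```

For example, analyze("The Quick-Brown fox", Pattern.compile("[a-zA-Z]+"), true, Set.of("the")) returns [quick, brown, fox] in this model: the hyphen ends a run of letters, and "the" is removed by the stopword set.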
public Analyzer.TokenStreamComponents createComponents(String fieldName)
Specified by: createComponents in class Analyzer
protected TokenStream normalize(String fieldName, TokenStream in)
public void setPositionIncrementGap(int positionIncrementGap)
public int getPositionIncrementGap(String fieldName)
Overrides: getPositionIncrementGap in class Analyzer
public void setOffsetGap(int offsetGap)
Parameters:
offsetGap - The offset gap that should be used.
public int getOffsetGap(String fieldName)
Overrides: getOffsetGap in class Analyzer
Parameters:
fieldName - Currently not used; the same offset gap is returned for each field.
public void setEnableChecks(boolean enableChecks)
public void setMaxTokenLength(int length)
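The position increment gap and offset gap setters documented above are easiest to follow with numbers. The sketch below is a simplified model, written under the assumption that an indexer keeps a running position and offset per field name and adds the analyzer's gaps after each value of that field. It is not Lucene's indexing code: the class GapSketch, its Token type, and the index method are hypothetical, and values are split on single spaces only.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical model of how gaps shift positions and offsets across same-named field values. */
final class GapSketch {
    /** One token with its absolute position and character offsets across all values. */
    static final class Token {
        final String term;
        final int position;
        final int startOffset;
        final int endOffset;

        Token(String term, int position, int startOffset, int endOffset) {
            this.term = term;
            this.position = position;
            this.startOffset = startOffset;
            this.endOffset = endOffset;
        }
    }

    static List<Token> index(List<String> values, int positionIncrementGap, int offsetGap) {
        List<Token> out = new ArrayList<>();
        int position = -1;  // so that the first token lands on position 0
        int offsetBase = 0; // length of earlier values plus accumulated offset gaps
        for (String value : values) {
            int cursor = 0;
            for (String term : value.split(" ")) {
                if (term.isEmpty()) {
                    continue; // skip empty pieces from leading or repeated spaces
                }
                int start = value.indexOf(term, cursor);
                int end = start + term.length();
                position += 1; // every token advances the position by one in this sketch
                out.add(new Token(term, position, offsetBase + start, offsetBase + end));
                cursor = end;
            }
            // After each value, apply the gaps before the next value of the same field.
            position += positionIncrementGap;
            offsetBase += value.length() + offsetGap;
        }
        return out;
    }
}
```

With the values "a b" and "c d", a position increment gap of 100 and an offset gap of 1, this model places "a" and "b" at positions 0 and 1 with offsets 0-1 and 2-3, then "c" and "d" at positions 102 and 103 with offsets 4-5 and 6-7. A large position gap like this is what keeps phrase matches from spanning two values of the same field.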
Copyright © 2000-2021 Apache Software Foundation. All Rights Reserved.