public class IntersectBlockReader extends BlockReader
The TermsEnum returned in response to UniformSplitTerms.intersect(CompiledAutomaton, BytesRef), intersecting the terms with an automaton.
By design of the UniformSplit block keys, it is less efficient than org.apache.lucene.codecs.blocktree.IntersectTermsEnum for FuzzyQuery (-37%). It is slightly slower for WildcardQuery (-5%) and slightly faster for PrefixQuery (+5%).
Modifier and Type | Class and Description |
---|---|
protected class | IntersectBlockReader.AutomatonNextTermCalculator: This is mostly a copy of AutomatonTermsEnum. |
protected static class | IntersectBlockReader.BlockIteration: Block iteration order. |
Nested classes inherited from class TermsEnum: TermsEnum.SeekStatus
Modifier and Type | Field and Description |
---|---|
protected Automaton | automaton |
protected IntersectBlockReader.BlockIteration | blockIteration: Block iteration order determined when scanning the terms in the current block. |
protected BytesRef | commonSuffix |
protected boolean | finite |
protected int | minTermLength |
protected IntersectBlockReader.AutomatonNextTermCalculator | nextStringCalculator |
protected int | NUM_CONSECUTIVELY_REJECTED_TERMS_THRESHOLD: Threshold that controls when to attempt to jump to a block away. |
protected int | numConsecutivelyRejectedTerms: Counter of the number of consecutively rejected terms. |
protected int | numMatchedBytes: Number of bytes accepted by the automaton when validating the current term. |
protected ByteRunAutomaton | runAutomaton |
protected BytesRef | seekTerm: Set this when our current mode is seeking to this term. |
protected int[] | states: Automaton states reached when validating the current term, from 0 to numMatchedBytes - 1. |
Fields inherited from class BlockReader: blockDecoder, blockFirstLineStart, blockHeader, blockHeaderReader, blockInput, blockLine, blockLineReader, blockReadBuffer, blockStartFP, dictionaryBrowser, dictionaryBrowserSupplier, fieldMetadata, forcedTerm, lineIndexInBlock, postingsReader, scratchBlockBytes, scratchBlockLine, scratchTermState, termState, termStateForced, termStateSerializer, termStatesReadBuffer
Fields inherited from interface Accountable: NULL_ACCOUNTABLE
Modifier | Constructor and Description |
---|---|
protected | IntersectBlockReader(CompiledAutomaton compiled, BytesRef startTerm, IndexDictionary.BrowserSupplier dictionaryBrowserSupplier, IndexInput blockInput, PostingsReaderBase postingsReader, FieldMetadata fieldMetadata, BlockDecoder blockDecoder) |
Modifier and Type | Method and Description |
---|---|
protected boolean | endsWithCommonSuffix(byte[] termBytes, int termLength): Indicates whether the given term ends with the automaton common suffix. |
protected int | getMinTermLength(): Computes the minimal length of the terms accepted by the automaton. |
BytesRef | next() |
protected boolean | nextBlock(): Opens the next block. |
protected BytesRef | nextTermInBlockMatching(): Finds the next block line that matches (accepted by the automaton), or null when at end of block. |
TermsEnum.SeekStatus | seekCeil(BytesRef text) |
boolean | seekExact(BytesRef text) |
void | seekExact(BytesRef term, TermState state): Positions this BlockReader without re-seeking the term dictionary. |
void | seekExact(long ord): Not supported. |
protected boolean | seekFirstBlock() |
Methods inherited from class BlockReader: clearTermState, compareToMiddleAndJump, createBlockHeaderSerializer, createBlockLineSerializer, createDeltaBaseTermStateSerializer, decodeBlockBytesIfNeeded, docFreq, getOrCreateDictionaryBrowser, impacts, initializeBlockReadLazily, initializeHeader, isBeyondLastTerm, isCurrentTerm, newCorruptIndexException, nextTerm, ord, postings, ramBytesUsed, readHeader, readLineInBlock, readTermState, readTermStateIfNotRead, seekInBlock, seekInBlock, term, termState, totalTermFreq
Methods inherited from class TermsEnum: attributes
Methods inherited from class Object: clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
Methods inherited from interface Accountable: getChildResources
protected final int NUM_CONSECUTIVELY_REJECTED_TERMS_THRESHOLD
Threshold that controls when to attempt to jump to a block away.
The numConsecutivelyRejectedTerms counter is 0 when entering a block. It is incremented each time a term is rejected by the automaton. When the counter is greater than or equal to this threshold, we compute the next term accepted by the automaton with IntersectBlockReader.AutomatonNextTermCalculator, and we jump to a block away if that term is greater than the immediate next term in the block.
A low value, for example 1, improves the performance of automatons requiring many jumps, for example FuzzyQuery and most WildcardQuery. A higher value improves the performance of automatons with few or no jumps, for example PrefixQuery. A threshold of 4 seems to be a good balance.
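The counter-and-threshold rule for NUM_CONSECUTIVELY_REJECTED_TERMS_THRESHOLD can be pictured with a minimal, self-contained sketch. It is not the Lucene implementation: it scans one flat sorted array instead of blocks, and the PrefixMatcher class, its nextAccepted method and the intersect helper are hypothetical stand-ins for the automaton and for AutomatonNextTermCalculator. Resetting the counter after an accepted term is an assumption drawn from the word "consecutively".

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class RejectedTermsThresholdSketch {

  /** Hypothetical stand-in for the automaton: accepts every term starting with a prefix. */
  static final class PrefixMatcher {
    final String prefix;

    PrefixMatcher(String prefix) {
      this.prefix = prefix;
    }

    boolean accepts(String term) {
      return term.startsWith(prefix);
    }

    /** Smallest accepted string that is >= term, or null when no such string exists. */
    String nextAccepted(String term) {
      if (accepts(term)) {
        return term;
      }
      return term.compareTo(prefix) < 0 ? prefix : null;
    }
  }

  /** Scans sortedTerms and returns the accepted ones; examined[0] counts the terms tested. */
  static List<String> intersect(
      String[] sortedTerms, PrefixMatcher matcher, int threshold, int[] examined) {
    List<String> matches = new ArrayList<>();
    int numConsecutivelyRejected = 0;
    int i = 0;
    while (i < sortedTerms.length) {
      String term = sortedTerms[i];
      examined[0]++;
      if (matcher.accepts(term)) {
        matches.add(term);
        numConsecutivelyRejected = 0; // Assumption: an accepted term resets the counter.
        i++;
        continue;
      }
      numConsecutivelyRejected++;
      if (numConsecutivelyRejected >= threshold) {
        String target = matcher.nextAccepted(term);
        if (target == null) {
          break; // Nothing at or after this term can match.
        }
        int pos = Arrays.binarySearch(sortedTerms, i + 1, sortedTerms.length, target);
        int jump = pos >= 0 ? pos : -pos - 1;
        if (jump > i + 1) {
          // The next accepted term lies beyond the immediate next term: jump to it.
          i = jump;
          numConsecutivelyRejected = 0;
          continue;
        }
      }
      i++;
    }
    return matches;
  }

  public static void main(String[] args) {
    String[] terms = {"apple", "apricot", "banana", "bandana", "cherry", "date", "fig", "grape"};
    for (int threshold : new int[] {1, 4}) {
      int[] examined = new int[1];
      List<String> matches = intersect(terms, new PrefixMatcher("gr"), threshold, examined);
      System.out.println(
          "threshold=" + threshold + " matches=" + matches + " examined=" + examined[0]);
    }
  }
}
```

The sketch counts only the terms examined. It does not model the cost of computing the next accepted term, so it cannot by itself reproduce the trade-off between low and high thresholds described above.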
protected final Automaton automaton
protected final ByteRunAutomaton runAutomaton
protected final boolean finite
protected final BytesRef commonSuffix
protected final int minTermLength
protected final IntersectBlockReader.AutomatonNextTermCalculator nextStringCalculator
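protected BytesRef seekTerm
Set this when our current mode is seeking to this term.
protected int numMatchedBytes
Number of bytes accepted by the automaton when validating the current term.
protected int[] states
Automaton states reached when validating the current term, from 0 to numMatchedBytes - 1.
protected IntersectBlockReader.BlockIteration blockIteration
Block iteration order determined when scanning the terms in the current block.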
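To picture how the states and numMatchedBytes fields relate, here is a small sketch under stated assumptions. The step function, the single accept state, the validate method and the sharedPrefixLength parameter are all hypothetical. The sketch also assumes that states[i] holds the state from which byte i was read. Reusing recorded states for a prefix shared with the previous term is one plausible use of these fields, and this page does not confirm it.

```java
import java.nio.charset.StandardCharsets;
import java.util.function.IntBinaryOperator;

final class MatchedStatesSketch {
  // Hypothetical transition function: (state, byte value) -> next state, or -1 to reject.
  private final IntBinaryOperator step;
  private final int acceptState;

  // Assumption: states[i] is the state from which byte i of the current term was read.
  final int[] states;
  // Number of leading bytes of the current term accepted so far.
  int numMatchedBytes;

  MatchedStatesSketch(IntBinaryOperator step, int acceptState, int maxTermLength) {
    this.step = step;
    this.acceptState = acceptState;
    this.states = new int[maxTermLength];
  }

  /**
   * Validates term. sharedPrefixLength is the number of leading bytes that term shares with the
   * previously validated term, so the states recorded for those bytes are reused.
   */
  boolean validate(byte[] term, int sharedPrefixLength) {
    // Resume from the last recorded state that is still valid for this term.
    int index = Math.max(0, Math.min(sharedPrefixLength, numMatchedBytes - 1));
    int state = index == 0 ? 0 : states[index];
    numMatchedBytes = index;
    while (index < term.length) {
      states[index] = state;
      state = step.applyAsInt(state, term[index] & 0xFF);
      if (state < 0) {
        return false; // Rejected at byte index; numMatchedBytes stays at index.
      }
      numMatchedBytes = ++index;
    }
    return state == acceptState;
  }

  public static void main(String[] args) {
    // Toy automaton accepting any number of 'a' followed by a single 'b'.
    IntBinaryOperator step = (s, b) -> s == 0 ? (b == 'a' ? 0 : b == 'b' ? 1 : -1) : -1;
    MatchedStatesSketch sketch = new MatchedStatesSketch(step, 1, 16);
    System.out.println(sketch.validate("aab".getBytes(StandardCharsets.UTF_8), 0));
    System.out.println(sketch.validate("aac".getBytes(StandardCharsets.UTF_8), 2));
    System.out.println(sketch.numMatchedBytes);
  }
}
```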
protected int numConsecutivelyRejectedTerms
Counter of the number of consecutively rejected terms. Depending on NUM_CONSECUTIVELY_REJECTED_TERMS_THRESHOLD, this may trigger a jump to a block away.
protected IntersectBlockReader(CompiledAutomaton compiled, BytesRef startTerm, IndexDictionary.BrowserSupplier dictionaryBrowserSupplier, IndexInput blockInput, PostingsReaderBase postingsReader, FieldMetadata fieldMetadata, BlockDecoder blockDecoder) throws IOException
Throws: IOException
protected int getMinTermLength()
Computes the minimal length of the terms accepted by the automaton.
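As an illustration of what "minimal length of the terms accepted by the automaton" means, the sketch below runs a breadth-first search from the initial state to the nearest accept state. It is a generic technique, not a transcription of getMinTermLength(). The automaton representation (an adjacency array that ignores transition labels, with state 0 as the initial state) and the method name minAcceptedLength are assumptions made for the example.

```java
import java.util.ArrayDeque;
import java.util.Arrays;

final class MinTermLengthSketch {

  /**
   * transitions[s] lists the states reachable from s by consuming one byte. Labels are ignored
   * because only the length of the shortest accepted term matters. Returns -1 if no accept state
   * is reachable from state 0.
   */
  static int minAcceptedLength(int[][] transitions, boolean[] accept) {
    int[] distance = new int[accept.length];
    Arrays.fill(distance, -1);
    ArrayDeque<Integer> queue = new ArrayDeque<>();
    distance[0] = 0;
    queue.add(0);
    while (!queue.isEmpty()) {
      int state = queue.poll();
      if (accept[state]) {
        return distance[state]; // BFS reaches the nearest accept state first.
      }
      for (int target : transitions[state]) {
        if (distance[target] < 0) {
          distance[target] = distance[state] + 1;
          queue.add(target);
        }
      }
    }
    return -1;
  }

  public static void main(String[] args) {
    // Toy automaton accepting "ab" and "abc": 0 -> 1 -> 2 (accept) -> 3 (accept).
    int[][] transitions = {{1}, {2}, {3}, {}};
    boolean[] accept = {false, false, true, true};
    System.out.println(minAcceptedLength(transitions, accept));
  }
}
```

A term shorter than this length cannot be accepted, so a reader can presumably reject it without stepping through the automaton.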
public BytesRef next() throws IOException
Specified by: next in interface BytesRefIterator
Overrides: next in class BlockReader
Throws: IOException
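protected boolean seekFirstBlock() throws IOException
Throws: IOException
protected BytesRef nextTermInBlockMatching() throws IOException
Finds the next block line that matches (accepted by the automaton), or null when at end of block.
Throws: IOException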
protected boolean endsWithCommonSuffix(byte[] termBytes, int termLength)
Indicates whether the given term ends with the automaton common suffix.
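A suffix test of the kind endsWithCommonSuffix performs can be sketched as follows. This is an assumption about the shape of the check, not the actual method body. The real method takes the suffix from the commonSuffix field, whereas the hypothetical endsWithSuffix helper below receives it as a plain byte array. The separate termLength parameter suggests that the term occupies only the first termLength bytes of a possibly larger buffer.

```java
import java.nio.charset.StandardCharsets;

final class CommonSuffixSketch {

  /** Returns true if the first termLength bytes of termBytes end with suffix. */
  static boolean endsWithSuffix(byte[] termBytes, int termLength, byte[] suffix) {
    int offset = termLength - suffix.length;
    if (offset < 0) {
      return false; // The term is shorter than the suffix.
    }
    for (int i = 0; i < suffix.length; i++) {
      if (termBytes[offset + i] != suffix[i]) {
        return false;
      }
    }
    return true;
  }

  public static void main(String[] args) {
    // The buffer is longer than the term: only the first 7 bytes ("running") belong to it.
    byte[] buffer = "running___".getBytes(StandardCharsets.UTF_8);
    byte[] suffix = "ing".getBytes(StandardCharsets.UTF_8);
    System.out.println(endsWithSuffix(buffer, 7, suffix));
    System.out.println(endsWithSuffix(buffer, 10, suffix));
  }
}
```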
protected boolean nextBlock() throws IOException
Opens the next block. Depending on the blockIteration order, it may be the very next block, or a block away that may contain seekTerm.
Throws: IOException
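The choice nextBlock() makes between the very next block and a block farther away can be sketched as a dispatch on the iteration order. Everything here is illustrative. This page does not list the BlockIteration constants, so NEXT, SEEK and END are hypothetical names, and so is the nextBlockIndex helper. The sketch also assumes that each block is keyed by a lower bound of the terms it contains.

```java
import java.util.Arrays;

final class BlockIterationSketch {

  /** Hypothetical constants standing in for IntersectBlockReader.BlockIteration. */
  enum BlockIteration {
    NEXT,
    SEEK,
    END
  }

  /**
   * Returns the index of the block to open, or -1 when the iteration is over. blockKeys is
   * sorted, and blockKeys[i] is assumed to be a lower bound of the terms in block i.
   */
  static int nextBlockIndex(
      String[] blockKeys, int current, BlockIteration order, String seekTerm) {
    switch (order) {
      case NEXT:
        return current + 1 < blockKeys.length ? current + 1 : -1;
      case SEEK: {
        int pos = Arrays.binarySearch(blockKeys, seekTerm);
        // Exact hit: that block. Otherwise the block whose key precedes seekTerm.
        int target = pos >= 0 ? pos : -pos - 2;
        return Math.max(target, 0);
      }
      default:
        return -1; // END: no more blocks to open.
    }
  }

  public static void main(String[] args) {
    String[] blockKeys = {"a", "f", "m", "t"};
    System.out.println(nextBlockIndex(blockKeys, 0, BlockIteration.NEXT, null));
    System.out.println(nextBlockIndex(blockKeys, 0, BlockIteration.SEEK, "grape"));
    System.out.println(nextBlockIndex(blockKeys, 0, BlockIteration.END, null));
  }
}
```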
public boolean seekExact(BytesRef text)
Overrides: seekExact in class BlockReader
public void seekExact(long ord)
Description copied from class: BlockReader
Not supported.
Overrides: seekExact in class BlockReader
public void seekExact(BytesRef term, TermState state)
Description copied from class: BlockReader
Positions this BlockReader without re-seeking the term dictionary.
The block containing the term is not read by this method. It will be read lazily only if needed, for example if BlockReader.next() is called. Calling BlockReader.postings(org.apache.lucene.index.PostingsEnum, int) after this method does not require the block to be read.
Overrides: seekExact in class BlockReader
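The lazy read described for seekExact(BytesRef, TermState) can be pictured with a toy cursor. The class below is hypothetical and keeps its "blocks" in memory. The pair (blockIndex, lineIndex) stands in for whatever position a real TermState carries, and the blockReads counter exists only to make the laziness observable.

```java
final class LazyBlockSketch {
  private final String[][] blocks; // Hypothetical in-memory blocks of sorted terms.
  private String[] loadedBlock;    // Null until the current block is actually read.
  private int blockIndex;
  private int lineIndex;
  private String term;
  int blockReads;                  // Counts simulated block reads.

  LazyBlockSketch(String[][] blocks) {
    this.blocks = blocks;
  }

  /** Positions on a term from previously captured state, without reading the block. */
  void seekExact(String term, int blockIndex, int lineIndex) {
    this.term = term;
    this.blockIndex = blockIndex;
    this.lineIndex = lineIndex;
    this.loadedBlock = null;
  }

  String term() {
    return term;
  }

  /** Moves to the next term of the current block, reading the block first if needed. */
  String next() {
    if (loadedBlock == null) {
      loadedBlock = blocks[blockIndex];
      blockReads++;
    }
    lineIndex++;
    term = lineIndex < loadedBlock.length ? loadedBlock[lineIndex] : null;
    return term;
  }

  public static void main(String[] args) {
    LazyBlockSketch cursor =
        new LazyBlockSketch(new String[][] {{"ant", "bee"}, {"cat", "dog", "eel"}});
    cursor.seekExact("cat", 1, 0);
    System.out.println(cursor.term() + " reads=" + cursor.blockReads);
    System.out.println(cursor.next() + " reads=" + cursor.blockReads);
  }
}
```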
public TermsEnum.SeekStatus seekCeil(BytesRef text)
Overrides: seekCeil in class BlockReader
Copyright © 2000-2021 Apache Software Foundation. All Rights Reserved.