public final class Lucene80DocValuesFormat extends DocValuesFormat
Documents that have a value for the field are encoded so that it is always possible to know the ordinal of the current document within the set of documents that have a value. For instance, say the set of documents that have a value for the field is {1, 5, 6, 11}. When the iterator is on 6, it knows that this is the 3rd item of the set. This way, values can be stored densely and accessed by their index at search time.
If all documents in a segment have a value for the field, the index is the same as the doc ID, so this case is encoded implicitly and is very fast at query time. On the other hand, if some documents are missing a value for the field, then the set of documents that have a value is encoded into blocks. All doc IDs that share the same upper 16 bits are encoded into the same block with the following strategies:
shorts while the upper 16 bits are given by the block ID.
ntz operations while the index is computed by accumulating the bit counts of the visited longs. Advancing >= 512 documents is performed by skipping to the start of the needed 512-document sub-block and iterating to the specific document within that block. The index for the sub-block that is skipped to is retrieved from a rank-table positioned before the bit set. The rank-table holds the origo index numbers for all 512-document sub-blocks, represented as an unsigned short for each 128 blocks.
index sorting.
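The block split and the bit-set index computation described above can be sketched as follows. This is a minimal illustration, not Lucene's implementation: the class `DenseBlockSketch` and all of its method names are hypothetical, the block is modelled as a plain `long[]` of 1024 words (65536 bits), and the rank-table shortcut for long advances is left out, so the index is computed by counting bits from the start of the block.

```java
final class DenseBlockSketch {
    // Upper 16 bits of a doc ID select the block.
    static int blockId(int docId) {
        return docId >>> 16;
    }

    // Lower 16 bits of a doc ID locate the document inside its block.
    static int lowerBits(int docId) {
        return docId & 0xFFFF;
    }

    // Builds a 65536-bit set (1024 longs) with one bit per document that has a value.
    static long[] buildBitSet(int... lowerBits) {
        long[] words = new long[1024];
        for (int b : lowerBits) {
            words[b >>> 6] |= 1L << (b & 63);
        }
        return words;
    }

    // 0-based index of `target` among the set bits: accumulate the bit counts
    // of the longs before it, then count the lower bits of its own long.
    static int index(long[] words, int target) {
        int wordIndex = target >>> 6;
        int count = 0;
        for (int i = 0; i < wordIndex; i++) {
            count += Long.bitCount(words[i]);
        }
        long below = (1L << (target & 63)) - 1;
        return count + Long.bitCount(words[wordIndex] & below);
    }

    // Next set bit at or after `from`, found with a trailing-zero count (ntz); -1 if none.
    static int nextSetBit(long[] words, int from) {
        int wordIndex = from >>> 6;
        if (wordIndex >= words.length) {
            return -1;
        }
        long word = words[wordIndex] & (-1L << (from & 63));
        while (true) {
            if (word != 0) {
                return (wordIndex << 6) + Long.numberOfTrailingZeros(word);
            }
            if (++wordIndex == words.length) {
                return -1;
            }
            word = words[wordIndex];
        }
    }
}
```

With the set {1, 5, 6, 11} from the example above, `index(words, 6)` yields 2, that is, the 3rd item when counting from 1.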
Skipping blocks to arrive at a wanted document is done either iteratively or by using the jump-table stored at the end of the chain of blocks. The jump-table holds the offset as well as the index for all blocks, packed in a single long per block.
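One way to picture a jump-table entry is shown below. The bit layout is an assumption made for illustration (index in the upper 32 bits, offset in the lower 32 bits); the text above only states that both are packed in a single long per block. `JumpTableSketch` and its methods are hypothetical names.

```java
final class JumpTableSketch {
    // Packs the index of the first value in a block and the block's byte offset
    // into one long. Layout is assumed: index high, offset low.
    static long pack(int index, int offset) {
        return ((long) index << 32) | (offset & 0xFFFFFFFFL);
    }

    static int index(long entry) {
        return (int) (entry >>> 32);
    }

    static int offset(long entry) {
        return (int) entry;
    }

    // Builds one entry per block from the number of documents with a value in
    // each block and the encoded size of each block in bytes.
    static long[] build(int[] cardinalities, int[] byteSizes) {
        long[] table = new long[cardinalities.length];
        int index = 0;
        int offset = 0;
        for (int b = 0; b < table.length; b++) {
            table[b] = pack(index, offset);
            index += cardinalities[b];
            offset += byteSizes[b];
        }
        return table;
    }
}
```

A reader that wants block `b` can then look up `table[b]` directly instead of walking through the preceding blocks.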
Then the five per-document value types (Numeric, Binary, Sorted, SortedSet, SortedNumeric) are encoded using the following strategies:
DirectWriter.
SmallFloat), a lookup table is written instead. Each per-document entry is instead the ordinal to this table, and those ordinals are compressed with bitpacking (DirectWriter).
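The lookup-table idea can be sketched as follows. This is an illustration only: `TableCompressionSketch` and its methods are hypothetical, the table is simply the sorted distinct values, and the actual bit packing that DirectWriter performs is replaced by a calculation of how many bits each ordinal would need.

```java
import java.util.Arrays;

final class TableCompressionSketch {
    // The lookup table: every distinct value, sorted.
    static long[] buildTable(long[] values) {
        return Arrays.stream(values).distinct().sorted().toArray();
    }

    // Replaces each per-document value by its ordinal in the table.
    static int[] toOrdinals(long[] values, long[] table) {
        int[] ords = new int[values.length];
        for (int i = 0; i < values.length; i++) {
            ords[i] = Arrays.binarySearch(table, values[i]);
        }
        return ords;
    }

    // Bits needed to store any ordinal in [0, maxOrd], at least 1.
    static int bitsRequired(int maxOrd) {
        return Math.max(1, 64 - Long.numberOfLeadingZeros(maxOrd));
    }
}
```

For the values {100, 5000, 100, 70000, 5000}, the table has three entries, so each document needs only 2 bits for its ordinal instead of the bits required to hold 70000.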
Depending on calculated gains, the numbers might be split into blocks of 16384 values. In that case, a jump-table with block offsets is appended to the blocks for O(1) access to the needed block.
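Because 16384 is 2^14, locating a value in such a blocked layout needs only a shift and a mask. The sketch below assumes a hypothetical `blockOffsets` array standing in for the jump-table; the class and method names are illustrative, not Lucene's.

```java
final class NumericBlockSketch {
    static final int BLOCK_SHIFT = 14;               // 2^14 = 16384 values per block
    static final int BLOCK_SIZE = 1 << BLOCK_SHIFT;

    // Which block holds the value with the given index.
    static int blockOf(long index) {
        return (int) (index >>> BLOCK_SHIFT);
    }

    // Position of the value inside its block.
    static int indexInBlock(long index) {
        return (int) (index & (BLOCK_SIZE - 1));
    }

    // O(1) lookup of the block's start offset in an (assumed) table of block offsets.
    static long blockStart(long[] blockOffsets, long index) {
        return blockOffsets[blockOf(index)];
    }
}
```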
docID * length).
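Addressing by `docID * length` can be illustrated with a plain byte array. The sketch assumes that every value has the same length and that every document has a value, so the doc ID can be used directly as the position; `FixedWidthBinarySketch` and `valueAt` are hypothetical names.

```java
import java.util.Arrays;

final class FixedWidthBinarySketch {
    // Returns the value of `docID` from a concatenation of equally sized values.
    static byte[] valueAt(byte[] data, int length, int docID) {
        int start = docID * length;
        return Arrays.copyOfRange(data, start, start + length);
    }
}
```

With three 3-byte values stored as `aaabbbccc`, `valueAt(data, 3, 1)` returns the bytes of `bbb`.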
Files:
Modifier and Type | Class and Description |
---|---|
static class | Lucene80DocValuesFormat.Mode: Configuration option for doc values. |
Modifier and Type | Field and Description |
---|---|
static String | MODE_KEY: Attribute key for compression mode. |
Constructor and Description |
---|
Lucene80DocValuesFormat(): Default constructor. |
Lucene80DocValuesFormat(Lucene80DocValuesFormat.Mode mode): Constructor |
Modifier and Type | Method and Description |
---|---|
DocValuesConsumer | fieldsConsumer(SegmentWriteState state): Returns a DocValuesConsumer to write docvalues to the index. |
DocValuesProducer | fieldsProducer(SegmentReadState state): Returns a DocValuesProducer to read docvalues from the index. |
availableDocValuesFormats, forName, getName, reloadDocValuesFormats, toString
public static final String MODE_KEY
public Lucene80DocValuesFormat()
public Lucene80DocValuesFormat(Lucene80DocValuesFormat.Mode mode)
public DocValuesConsumer fieldsConsumer(SegmentWriteState state) throws IOException
Description copied from class: DocValuesFormat
Returns a DocValuesConsumer to write docvalues to the index.
Specified by: fieldsConsumer in class DocValuesFormat
Throws: IOException
public DocValuesProducer fieldsProducer(SegmentReadState state) throws IOException
Description copied from class: DocValuesFormat
Returns a DocValuesProducer to read docvalues from the index.
NOTE: by the time this call returns, it must hold open any files it will need to use; else, those files may be deleted. Additionally, required files may be deleted during the execution of this call before there is a chance to open them. Under these circumstances an IOException should be thrown by the implementation. IOExceptions are expected and will automatically cause a retry of the segment opening logic with the newly revised segments.
Specified by: fieldsProducer in class DocValuesFormat
Throws: IOException
Copyright © 2000-2021 Apache Software Foundation. All Rights Reserved.