public class Lucene87StoredFieldsFormat extends StoredFieldsFormat
Principle
This StoredFieldsFormat compresses blocks of documents in order to improve the compression ratio compared to document-level compression. It uses the LZ4 compression algorithm by default in 16KB blocks, which is fast to compress and very fast to decompress. Although the default compression method (BEST_SPEED) focuses more on speed than on compression ratio, it should provide interesting compression ratios for redundant inputs (such as log files, HTML or plain text). For higher compression, you can choose BEST_COMPRESSION, which uses the DEFLATE algorithm with 48KB blocks and shared dictionaries for a better ratio at the expense of slower performance. These two options can be configured like this:
    // the default: for high performance
    indexWriterConfig.setCodec(new Lucene87Codec(Mode.BEST_SPEED));
    // instead for higher compression (but slower):
    // indexWriterConfig.setCodec(new Lucene87Codec(Mode.BEST_COMPRESSION));
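
The benefit of block-level over document-level compression can be illustrated with a minimal sketch that uses only the JDK. It is not the Lucene implementation: `java.util.zip.Deflater` stands in for the codec's compressors, and the class name `BlockCompressionSketch` and the sample log lines are hypothetical. It compares the total size of documents compressed one by one with the size of the same documents compressed as a single block.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.Deflater;

class BlockCompressionSketch {

    /** Compresses the input with the JDK's DEFLATE and returns the compressed size in bytes. */
    static int compressedSize(byte[] input) {
        Deflater deflater = new Deflater(Deflater.BEST_SPEED);
        deflater.setInput(input);
        deflater.finish();
        byte[] scratch = new byte[4096];
        int total = 0;
        while (!deflater.finished()) {
            total += deflater.deflate(scratch);
        }
        deflater.end();
        return total;
    }

    /** Builds hypothetical, highly redundant log-line "documents". */
    static byte[][] sampleDocuments(int count) {
        byte[][] docs = new byte[count][];
        for (int i = 0; i < count; i++) {
            String line = "2021-01-01 12:00:" + (i % 60)
                + " INFO request handled, status=200, id=" + i;
            docs[i] = line.getBytes(StandardCharsets.UTF_8);
        }
        return docs;
    }

    /** Document-level compression: every document is compressed on its own. */
    static int perDocumentSize(byte[][] docs) {
        int total = 0;
        for (byte[] doc : docs) {
            total += compressedSize(doc);
        }
        return total;
    }

    /** Block-level compression: documents are concatenated, then compressed once. */
    static int blockSize(byte[][] docs) {
        ByteArrayOutputStream block = new ByteArrayOutputStream();
        for (byte[] doc : docs) {
            block.write(doc, 0, doc.length);
        }
        return compressedSize(block.toByteArray());
    }

    public static void main(String[] args) {
        byte[][] docs = sampleDocuments(200);
        System.out.println("per-document total: " + perDocumentSize(docs) + " bytes");
        System.out.println("single block:       " + blockSize(docs) + " bytes");
    }
}
```

On redundant inputs such as these, the single block should come out noticeably smaller, because the compressor can reuse patterns across documents instead of starting from scratch for each one.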
File formats
Stored fields are represented by three files:
A fields data file (extension .fdt). This file stores a compact representation of documents in compressed blocks of 16KB or more. When writing a segment, documents are appended to an in-memory byte[] buffer. When its size reaches 16KB or more, some metadata about the documents is flushed to disk, immediately followed by a compressed representation of the buffer using the LZ4 compression format.

Notes: StoredFieldVisitors that are only interested in the first fields of a document do not have to decompress 10MB of data if the document is 10MB, but only 16KB.

A fields index file (extension .fdx). This file stores two monotonic arrays, one for the first doc IDs of each block of compressed documents, and another one for the corresponding offsets on disk. At search time, the array containing doc IDs is binary-searched in order to find the block that contains the expected doc ID, and the associated offset on disk is retrieved from the second array.

A fields meta file (extension .fdm). This file stores metadata about the monotonic arrays stored in the index file.
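
The interplay between the data file and the index file can be sketched as follows. This is a simplified model under stated assumptions, not the on-disk format: the class `ChunkIndexSketch` and its methods are hypothetical, documents are represented only by their sizes, nothing is compressed (so offsets count uncompressed bytes), and the two monotonic arrays are plain in-memory lists.

```java
import java.util.ArrayList;
import java.util.List;

class ChunkIndexSketch {
    /** Flush threshold taken from the text: 16KB. */
    static final int CHUNK_SIZE = 16 * 1024;

    // Stand-ins for the two monotonic arrays of the index file.
    final List<Integer> firstDocIds = new ArrayList<>();
    final List<Long> chunkOffsets = new ArrayList<>();

    private int bufferedBytes = 0;   // bytes currently in the in-memory buffer
    private int bufferedDocs = 0;    // documents currently in the buffer
    private int nextDocId = 0;       // doc ID the next document will receive
    private long dataFileLength = 0; // pretend length of the data file so far

    /** Appends a document of the given size and flushes once the buffer reaches 16KB. */
    void addDocument(int sizeInBytes) {
        bufferedBytes += sizeInBytes;
        bufferedDocs++;
        nextDocId++;
        if (bufferedBytes >= CHUNK_SIZE) {
            flush();
        }
    }

    /** Records the chunk's first doc ID and start offset, then "writes" the buffer. */
    void flush() {
        if (bufferedDocs == 0) {
            return;
        }
        firstDocIds.add(nextDocId - bufferedDocs);
        chunkOffsets.add(dataFileLength);
        dataFileLength += bufferedBytes; // assumption: no compression in this sketch
        bufferedBytes = 0;
        bufferedDocs = 0;
    }

    /** Binary search for the last chunk whose first doc ID is <= docId. */
    int findChunk(int docId) {
        int flushedDocs = nextDocId - bufferedDocs;
        if (docId < 0 || docId >= flushedDocs) {
            throw new IllegalArgumentException("doc ID not flushed: " + docId);
        }
        int lo = 0;
        int hi = firstDocIds.size() - 1;
        while (lo < hi) {
            int mid = (lo + hi + 1) >>> 1;
            if (firstDocIds.get(mid) <= docId) {
                lo = mid;
            } else {
                hi = mid - 1;
            }
        }
        return lo;
    }

    /** Start offset of the chunk that contains docId, read from the second array. */
    long offsetOf(int docId) {
        return chunkOffsets.get(findChunk(docId));
    }

    public static void main(String[] args) {
        ChunkIndexSketch sketch = new ChunkIndexSketch();
        for (int i = 0; i < 10; i++) {
            sketch.addDocument(5000);
        }
        sketch.flush(); // flush the trailing partial chunk
        System.out.println("first doc IDs: " + sketch.firstDocIds);
        System.out.println("offsets:       " + sketch.chunkOffsets);
        System.out.println("doc 5 is in chunk " + sketch.findChunk(5)
            + " at offset " + sketch.offsetOf(5));
    }
}
```

With ten 5000-byte documents, the buffer crosses 16KB on every fourth document, so chunks start at doc IDs 0, 4 and 8, and looking up doc 5 lands in the second chunk.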
Known limitations
This StoredFieldsFormat does not support individual documents larger than (2^31 - 2^14) bytes.
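
As a quick check of the arithmetic, the limit of 2^31 - 2^14 bytes can be computed directly. The class name `MaxDocumentSizeSketch` and the helper `fits` are hypothetical and only illustrate the bound; they are not part of the Lucene API.

```java
class MaxDocumentSizeSketch {
    /** The documented limit: 2^31 - 2^14 bytes, just under 2GB. */
    static final long MAX_DOCUMENT_BYTES = (1L << 31) - (1L << 14);

    /** Returns true if a document of this size is within the documented limit. */
    static boolean fits(long documentBytes) {
        return documentBytes >= 0 && documentBytes <= MAX_DOCUMENT_BYTES;
    }

    public static void main(String[] args) {
        System.out.println(MAX_DOCUMENT_BYTES);             // prints 2147467264
        System.out.println(fits(10L * 1024 * 1024));        // prints true
        System.out.println(fits(MAX_DOCUMENT_BYTES + 1));   // prints false
    }
}
```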
Modifier and Type | Class and Description |
---|---|
static class | Lucene87StoredFieldsFormat.Mode: Configuration option for stored fields. |

Modifier and Type | Field and Description |
---|---|
static CompressionMode | BEST_COMPRESSION_MODE: Compression mode for Lucene87StoredFieldsFormat.Mode.BEST_COMPRESSION |
static CompressionMode | BEST_SPEED_MODE: Compression mode for Lucene87StoredFieldsFormat.Mode.BEST_SPEED |
static String | MODE_KEY: Attribute key for compression mode. |

Constructor and Description |
---|
Lucene87StoredFieldsFormat(): Stored fields format with default options |
Lucene87StoredFieldsFormat(Lucene87StoredFieldsFormat.Mode mode): Stored fields format with specified mode |

Modifier and Type | Method and Description |
---|---|
StoredFieldsReader | fieldsReader(Directory directory, SegmentInfo si, FieldInfos fn, IOContext context): Returns a StoredFieldsReader to load stored fields. |
StoredFieldsWriter | fieldsWriter(Directory directory, SegmentInfo si, IOContext context): Returns a StoredFieldsWriter to write stored fields. |

public static final String MODE_KEY
Attribute key for compression mode.

public static final CompressionMode BEST_COMPRESSION_MODE
Compression mode for Lucene87StoredFieldsFormat.Mode.BEST_COMPRESSION

public static final CompressionMode BEST_SPEED_MODE
Compression mode for Lucene87StoredFieldsFormat.Mode.BEST_SPEED

public Lucene87StoredFieldsFormat()
Stored fields format with default options

public Lucene87StoredFieldsFormat(Lucene87StoredFieldsFormat.Mode mode)
Stored fields format with specified mode

public StoredFieldsReader fieldsReader(Directory directory, SegmentInfo si, FieldInfos fn, IOContext context) throws IOException
Description copied from class: StoredFieldsFormat
Returns a StoredFieldsReader to load stored fields.
Specified by: fieldsReader in class StoredFieldsFormat
Throws: IOException

public StoredFieldsWriter fieldsWriter(Directory directory, SegmentInfo si, IOContext context) throws IOException
Description copied from class: StoredFieldsFormat
Returns a StoredFieldsWriter to write stored fields.
Specified by: fieldsWriter in class StoredFieldsFormat
Throws: IOException

Copyright © 2000-2021 Apache Software Foundation. All Rights Reserved.