public class CompressingStoredFieldsFormat extends StoredFieldsFormat
A StoredFieldsFormat that compresses documents in chunks in order to improve the compression ratio.
For a chunk size of chunkSize bytes, this StoredFieldsFormat does not support documents larger than (2^31 - chunkSize) bytes.
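As a rough illustration of the limit above, the following sketch computes 2^31 - chunkSize for a given chunk size. The class and method names are hypothetical and are not part of Lucene; the only fact taken from the text is the formula itself.

```java
// Hypothetical helper (not a Lucene class): evaluates the documented
// upper bound on document size for a given chunkSize.
class DocumentSizeLimit {

    /** Returns 2^31 - chunkSize, using long arithmetic so 2^31 does not overflow an int. */
    static long maxDocumentBytes(int chunkSize) {
        if (chunkSize < 1) {
            throw new IllegalArgumentException("chunkSize must be >= 1, got " + chunkSize);
        }
        return (1L << 31) - chunkSize;
    }

    public static void main(String[] args) {
        // 16 KiB chunks, chosen only as an example value.
        System.out.println(maxDocumentBytes(1 << 14)); // prints 2147467264
    }
}
```

The computation is done on a long because 2^31 is one larger than Integer.MAX_VALUE.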
For optimal performance, you should use a MergePolicy that returns segments that have the biggest byte size first.
Constructor and Description |
---|
CompressingStoredFieldsFormat(String formatName, CompressionMode compressionMode, int chunkSize, int maxDocsPerChunk, int blockShift) Create a new CompressingStoredFieldsFormat with an empty segment suffix. |
CompressingStoredFieldsFormat(String formatName, String segmentSuffix, CompressionMode compressionMode, int chunkSize, int maxDocsPerChunk, int blockShift) Create a new CompressingStoredFieldsFormat. |
Modifier and Type | Method and Description |
---|---|
StoredFieldsReader | fieldsReader(Directory directory, SegmentInfo si, FieldInfos fn, IOContext context) Returns a StoredFieldsReader to load stored fields. |
StoredFieldsWriter | fieldsWriter(Directory directory, SegmentInfo si, IOContext context) Returns a StoredFieldsWriter to write stored fields. |
String | toString() |
public CompressingStoredFieldsFormat(String formatName, CompressionMode compressionMode, int chunkSize, int maxDocsPerChunk, int blockShift)
Create a new CompressingStoredFieldsFormat with an empty segment suffix.
public CompressingStoredFieldsFormat(String formatName, String segmentSuffix, CompressionMode compressionMode, int chunkSize, int maxDocsPerChunk, int blockShift)
Create a new CompressingStoredFieldsFormat.
formatName is the name of the format. This name will be used in the file formats to perform codec header checks.
segmentSuffix is the segment suffix. This suffix is added to the result file name only if it's not the empty string.
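The conditional use of segmentSuffix can be sketched as follows. This is a hypothetical helper, not Lucene's own file-naming code: the underscore separator and the fdt extension in the example are assumptions made for illustration, and only the "append the suffix when it is non-empty" rule comes from the text above.

```java
// Hypothetical sketch (not a Lucene class) of how a segment suffix
// might be folded into a file name only when it is non-empty.
class SegmentFileNameSketch {

    /**
     * Builds "segmentName[_segmentSuffix].extension".
     * The '_' separator is an assumption made for this illustration.
     */
    static String fileName(String segmentName, String segmentSuffix, String extension) {
        StringBuilder sb = new StringBuilder(segmentName);
        if (!segmentSuffix.isEmpty()) {
            // Suffix is appended only when it is not the empty string.
            sb.append('_').append(segmentSuffix);
        }
        return sb.append('.').append(extension).toString();
    }

    public static void main(String[] args) {
        System.out.println(fileName("_0", "", "fdt"));       // prints _0.fdt
        System.out.println(fileName("_0", "custom", "fdt")); // prints _0_custom.fdt
    }
}
```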
The compressionMode parameter allows you to choose between compression algorithms that have various compression and decompression speeds so that you can pick the one that best fits your indexing and searching throughput. You should never instantiate two CompressingStoredFieldsFormats that have the same name but different CompressionModes.
chunkSize is the minimum byte size of a chunk of documents. A value of 1 can make sense if there is redundancy across fields.
maxDocsPerChunk is an upper bound on how many docs may be stored in a single chunk. This is to bound the CPU costs for highly compressible data.
Higher values of chunkSize should improve the compression ratio but will require more memory at indexing time and might make document loading a little slower (depending on the size of your OS cache compared to the size of your index).
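To see how chunkSize and maxDocsPerChunk interact, the sketch below simulates grouping documents into chunks. It is a simplified model and not Lucene's writer: it assumes a chunk is closed as soon as the buffered bytes reach chunkSize or the buffered document count reaches maxDocsPerChunk, and it ignores compression entirely. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical simulation (not Lucene code) of how documents might be
// grouped into chunks under the two limits described above.
class ChunkingSketch {

    /**
     * Returns the number of documents placed in each chunk.
     * Assumed rule: close the current chunk once it holds at least
     * chunkSize bytes or at least maxDocsPerChunk documents.
     */
    static List<Integer> docsPerChunk(int[] docSizes, int chunkSize, int maxDocsPerChunk) {
        List<Integer> chunks = new ArrayList<>();
        long bufferedBytes = 0;
        int bufferedDocs = 0;
        for (int size : docSizes) {
            bufferedBytes += size;
            bufferedDocs++;
            if (bufferedBytes >= chunkSize || bufferedDocs >= maxDocsPerChunk) {
                chunks.add(bufferedDocs);
                bufferedBytes = 0;
                bufferedDocs = 0;
            }
        }
        if (bufferedDocs > 0) {
            // Remaining documents form a final, possibly undersized chunk.
            chunks.add(bufferedDocs);
        }
        return chunks;
    }

    public static void main(String[] args) {
        int[] sizes = {100, 100, 100, 100, 100};
        System.out.println(docsPerChunk(sizes, 250, 128));   // prints [3, 2]
        System.out.println(docsPerChunk(sizes, 1, 128));     // prints [1, 1, 1, 1, 1]
        System.out.println(docsPerChunk(sizes, 10_000, 2));  // prints [2, 2, 1]
    }
}
```

Under this model, a chunkSize of 1 puts every document in its own chunk, so any compression gain comes only from redundancy within a single document, while a low maxDocsPerChunk caps the chunk even when the byte threshold is far away.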
Parameters:
formatName - the name of the StoredFieldsFormat
compressionMode - the CompressionMode to use
chunkSize - the minimum number of bytes of a single chunk of stored documents
maxDocsPerChunk - the maximum number of documents in a single chunk
blockShift - the log in base 2 of the number of chunks to store in an index block
See Also: CompressionMode
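Because blockShift is a base-2 logarithm, the number of chunks per index block is 2 raised to blockShift. A minimal sketch of that conversion follows; the class name is hypothetical and the range check is an assumption added only to keep the shift within int range, not a documented Lucene constraint.

```java
// Hypothetical helper (not a Lucene class): converts blockShift into
// the number of chunks stored in one index block.
class BlockShiftSketch {

    /** Returns 2^blockShift. The 0..30 bound is an assumption to avoid int overflow. */
    static int chunksPerBlock(int blockShift) {
        if (blockShift < 0 || blockShift > 30) {
            throw new IllegalArgumentException("blockShift out of range: " + blockShift);
        }
        return 1 << blockShift;
    }

    public static void main(String[] args) {
        System.out.println(chunksPerBlock(10)); // prints 1024
    }
}
```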
public StoredFieldsReader fieldsReader(Directory directory, SegmentInfo si, FieldInfos fn, IOContext context) throws IOException
Description copied from class: StoredFieldsFormat
Returns a StoredFieldsReader to load stored fields.
Specified by: fieldsReader in class StoredFieldsFormat
Throws: IOException
public StoredFieldsWriter fieldsWriter(Directory directory, SegmentInfo si, IOContext context) throws IOException
Description copied from class: StoredFieldsFormat
Returns a StoredFieldsWriter to write stored fields.
Specified by: fieldsWriter in class StoredFieldsFormat
Throws: IOException
Copyright © 2000-2021 Apache Software Foundation. All Rights Reserved.