public class DirectoryTaxonomyWriter extends Object implements TaxonomyWriter
A TaxonomyWriter which uses a Directory to store the taxonomy information on disk, and keeps an additional in-memory cache of some or all categories.
In addition to the permanently-stored information in the Directory, efficiency dictates that we also keep an in-memory cache of recently seen or all categories, so that we do not need to go back to disk for every category addition to see which ordinal this category already has, if any. A TaxonomyWriterCache object determines the specific caching algorithm used.
This class offers some hooks for extending classes to control the IndexWriter instance that is used. See openIndexWriter(org.apache.lucene.store.Directory, org.apache.lucene.index.IndexWriterConfig).
Modifier and Type | Class and Description |
---|---|
static class | DirectoryTaxonomyWriter.DiskOrdinalMap: DirectoryTaxonomyWriter.OrdinalMap maintained on file system |
static class | DirectoryTaxonomyWriter.MemoryOrdinalMap: DirectoryTaxonomyWriter.OrdinalMap maintained in memory |
static interface | DirectoryTaxonomyWriter.OrdinalMap: Mapping from old ordinals to new ordinals, used when merging indexes with separate taxonomies. |
Modifier and Type | Field and Description |
---|---|
static String | INDEX_EPOCH: Property name of user commit data that contains the index epoch. |
Constructor and Description |
---|
DirectoryTaxonomyWriter(Directory d): Create this with OpenMode.CREATE_OR_APPEND. |
DirectoryTaxonomyWriter(Directory directory, IndexWriterConfig.OpenMode openMode): Creates a new instance with a default cache as defined by defaultTaxonomyWriterCache(). |
DirectoryTaxonomyWriter(Directory directory, IndexWriterConfig.OpenMode openMode, TaxonomyWriterCache cache): Construct a Taxonomy writer. |
Modifier and Type | Method and Description |
---|---|
int | addCategory(FacetLabel categoryPath): addCategory() adds a category with a given path name to the taxonomy, and returns its ordinal. |
void | addTaxonomy(Directory taxoDir, DirectoryTaxonomyWriter.OrdinalMap map): Takes the categories from the given taxonomy directory, and adds the missing ones to this taxonomy. |
void | close(): Frees used resources as well as closes the underlying IndexWriter, which commits whatever changes were made to it to the underlying Directory. |
protected void | closeResources(): A hook for extending classes to close additional resources that were used. |
long | commit() |
protected IndexWriterConfig | createIndexWriterConfig(IndexWriterConfig.OpenMode openMode): Create the IndexWriterConfig that would be used for opening the internal index writer. |
static TaxonomyWriterCache | defaultTaxonomyWriterCache(): Defines the default TaxonomyWriterCache to use in constructors which do not specify one. |
protected void | ensureOpen(): Verifies that this instance wasn't closed, or throws AlreadyClosedException if it is. |
protected int | findCategory(FacetLabel categoryPath): Look up the given category in the cache and/or the on-disk storage, returning the category's ordinal, or a negative number in case the category does not yet exist in the taxonomy. |
TaxonomyWriterCache | getCache(): Returns the TaxonomyWriterCache in use by this writer. |
Directory | getDirectory(): Returns the Directory of this taxonomy writer. |
Iterable<Map.Entry<String,String>> | getLiveCommitData(): Returns the commit user data iterable that was set on TaxonomyWriter.setLiveCommitData(Iterable). |
int | getParent(int ordinal): getParent() returns the ordinal of the parent category of the category with the given ordinal. |
int | getSize(): getSize() returns the number of categories in the taxonomy. |
long | getTaxonomyEpoch(): Expert: returns current index epoch, if this is a near-real-time reader. |
protected IndexWriter | openIndexWriter(Directory directory, IndexWriterConfig config): Open internal index writer, which contains the taxonomy data. |
long | prepareCommit(): Prepares most of the work needed for a two-phase commit. |
void | replaceTaxonomy(Directory taxoDir): Replaces the current taxonomy with the given one. |
void | rollback(): Rolls back changes to the taxonomy writer and closes the instance. |
void | setCacheMissesUntilFill(int i): Set the number of cache misses before an attempt is made to read the entire taxonomy into the in-memory cache. |
void | setLiveCommitData(Iterable<Map.Entry<String,String>> commitUserData): Sets the commit user data iterable. |
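public static final String INDEX_EPOCH
Property name of user commit data that contains the index epoch. The epoch changes whenever the taxonomy is recreated (i.e. opened with IndexWriterConfig.OpenMode.CREATE).
Applications should not use this property in their commit data because it will be overridden by this taxonomy writer.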
public DirectoryTaxonomyWriter(Directory directory, IndexWriterConfig.OpenMode openMode, TaxonomyWriterCache cache) throws IOException
Construct a Taxonomy writer.
directory - The Directory in which to store the taxonomy. Note that the taxonomy is written directly to that directory (not to a subdirectory of it).
openMode - Specifies how to open a taxonomy for writing: APPEND means open an existing index for append (failing if the index does not yet exist). CREATE means create a new index (first deleting the old one if it already existed). CREATE_OR_APPEND appends to an existing index if there is one, otherwise it creates a new index.
cache - A TaxonomyWriterCache implementation which determines the in-memory caching policy. See for example LruTaxonomyWriterCache and UTF8TaxonomyWriterCache. If null or missing, defaultTaxonomyWriterCache() is used.
CorruptIndexException - if the taxonomy is corrupted.
LockObtainFailedException - if the taxonomy is locked by another writer.
IOException - if another error occurred.
public DirectoryTaxonomyWriter(Directory directory, IndexWriterConfig.OpenMode openMode) throws IOException
Creates a new instance with a default cache as defined by defaultTaxonomyWriterCache().
IOException
public DirectoryTaxonomyWriter(Directory d) throws IOException
Create this with OpenMode.CREATE_OR_APPEND.
IOException
public TaxonomyWriterCache getCache()
Returns the TaxonomyWriterCache in use by this writer.
protected IndexWriter openIndexWriter(Directory directory, IndexWriterConfig config) throws IOException
Open internal index writer, which contains the taxonomy data. Extensions may provide their own IndexWriter implementation or instance.
NOTE: the instance this method returns will be closed upon calling close().
NOTE: the merge policy in effect must not merge non-adjacent segments. See the comment in createIndexWriterConfig(IndexWriterConfig.OpenMode) for the logic behind this.
directory - the Directory on top of which an IndexWriter should be opened.
config - configuration for the internal index writer.
IOException
See also: createIndexWriterConfig(IndexWriterConfig.OpenMode)
protected IndexWriterConfig createIndexWriterConfig(IndexWriterConfig.OpenMode openMode)
Create the IndexWriterConfig that would be used for opening the internal index writer.
Extensions can configure the IndexWriter as they see fit, including setting a merge-scheduler, or deletion-policy, different RAM size etc.
openMode - see IndexWriterConfig.OpenMode
See also: openIndexWriter(Directory, IndexWriterConfig)
public static TaxonomyWriterCache defaultTaxonomyWriterCache()
Defines the default TaxonomyWriterCache to use in constructors which do not specify one.
The current default is UTF8TaxonomyWriterCache, i.e., the entire taxonomy is cached in memory while building it.
public void close() throws IOException
Frees used resources as well as closes the underlying IndexWriter, which commits whatever changes were made to it to the underlying Directory.
close in interface Closeable
close in interface AutoCloseable
IOException
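protected void closeResources() throws IOException
A hook for extending classes to close additional resources that were used. The default implementation closes the IndexReader as well as the TaxonomyWriterCache instances that were used.
NOTE: if you override this method, you should include a super.closeResources() call in your implementation.
IOException
protected int findCategory(FacetLabel categoryPath) throws IOException
Look up the given category in the cache and/or the on-disk storage, returning the category's ordinal, or a negative number in case the category does not yet exist in the taxonomy.
IOException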
public int addCategory(FacetLabel categoryPath) throws IOException
Description copied from interface: TaxonomyWriter
addCategory() adds a category with a given path name to the taxonomy, and returns its ordinal.
Before adding a category, addCategory() makes sure that all its ancestor categories exist in the taxonomy as well. As a result, the ordinal of a category is guaranteed to be smaller than the ordinal of any of its descendants.
addCategory in interface TaxonomyWriter
IOException
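The ancestor-first guarantee of addCategory() can be illustrated with a small in-memory model. This is a minimal sketch, not Lucene's implementation: the class name OrdinalAssignmentSketch, the use of plain strings instead of FacetLabel, and the '/' path separator are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical in-memory model of ordinal assignment; not Lucene's implementation. */
final class OrdinalAssignmentSketch {
    // Path (components joined with '/') -> ordinal. Assumes components contain no '/'.
    private final Map<String, Integer> ordinals = new HashMap<>();
    // Index = ordinal, value = path; the list size is the number of categories.
    private final List<String> paths = new ArrayList<>();

    OrdinalAssignmentSketch() {
        // The root (empty path) always exists and gets ordinal 0.
        ordinals.put("", 0);
        paths.add("");
    }

    /** Adds the category and any missing ancestors; returns the category's ordinal. */
    int addCategory(String... components) {
        String path = String.join("/", components);
        Integer existing = ordinals.get(path);
        if (existing != null) {
            return existing; // already known, including the root
        }
        // Make sure the parent exists first, so it receives a smaller ordinal.
        addCategory(Arrays.copyOf(components, components.length - 1));
        int ordinal = paths.size();
        ordinals.put(path, ordinal);
        paths.add(path);
        return ordinal;
    }

    /** Number of categories, counting the root and automatically added ancestors. */
    int getSize() {
        return paths.size();
    }

    public static void main(String[] args) {
        OrdinalAssignmentSketch taxo = new OrdinalAssignmentSketch();
        System.out.println(taxo.addCategory("Author", "Mark Twain"));
        System.out.println(taxo.addCategory("Author"));
        System.out.println(taxo.getSize());
    }
}
```

In this model a single insertion of Author/Mark Twain creates two categories besides the root, which is also why a size count can exceed the number of categories explicitly inserted.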
protected final void ensureOpen()
Verifies that this instance wasn't closed, or throws AlreadyClosedException if it is.
public long commit() throws IOException
commit in interface TwoPhaseCommit
IOException
public void setLiveCommitData(Iterable<Map.Entry<String,String>> commitUserData)
Description copied from interface: TaxonomyWriter
Sets the commit user data iterable. See IndexWriter.setLiveCommitData(java.lang.Iterable<java.util.Map.Entry<java.lang.String, java.lang.String>>).
setLiveCommitData in interface TaxonomyWriter
public Iterable<Map.Entry<String,String>> getLiveCommitData()
Description copied from interface: TaxonomyWriter
Returns the commit user data iterable that was set on TaxonomyWriter.setLiveCommitData(Iterable).
getLiveCommitData in interface TaxonomyWriter
public long prepareCommit() throws IOException
Prepares most of the work needed for a two-phase commit. See IndexWriter.prepareCommit().
prepareCommit in interface TwoPhaseCommit
IOException
public int getSize()
Description copied from interface: TaxonomyWriter
getSize() returns the number of categories in the taxonomy.
Because categories are numbered consecutively starting with 0, the taxonomy contains ordinals 0 through getSize()-1.
Note that the number returned by getSize() is often slightly higher than the number of categories inserted into the taxonomy; this is because when a category is added to the taxonomy, its ancestors are also added automatically (including the root, which always gets ordinal 0).
getSize in interface TaxonomyWriter
public void setCacheMissesUntilFill(int i)
Set the number of cache misses before an attempt is made to read the entire taxonomy into the in-memory cache.
This taxonomy writer holds an in-memory cache of recently seen categories to speed up operation. On each cache miss, the on-disk index needs to be consulted. When an existing taxonomy is opened, many such slow disk reads are needed until the cache is filled, so it is more efficient to read the entire taxonomy into memory at once. We do this complete read after a certain number (defined by this method) of cache misses.
If the number is set to 0, the entire taxonomy is read into the cache on first use, without fetching individual categories first.
NOTE: it is assumed that this method is called immediately after the taxonomy writer has been created.
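The trade-off between individual disk lookups and one complete read can be sketched with a toy model. Everything here is hypothetical: the class CacheFillSketch, its counters, and the choice to trigger the complete read once the miss count reaches the threshold are assumptions for illustration, and a plain map stands in for the on-disk index.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical model of "fill the cache after N misses"; not Lucene's implementation. */
final class CacheFillSketch {
    private final Map<String, Integer> disk;                    // stands in for the on-disk index
    private final Map<String, Integer> cache = new HashMap<>(); // in-memory cache
    private final int cacheMissesUntilFill;
    private int cacheMisses = 0;
    private boolean cacheIsComplete = false;
    int singleDiskLookups = 0; // individual (slow) lookups performed
    int fullReads = 0;         // complete reads of the taxonomy

    CacheFillSketch(Map<String, Integer> disk, int cacheMissesUntilFill) {
        this.disk = disk;
        this.cacheMissesUntilFill = cacheMissesUntilFill;
    }

    /** Returns the ordinal of the path, or -1 if the category is unknown. */
    int findCategory(String path) {
        Integer cached = cache.get(path);
        if (cached != null) {
            return cached;
        }
        if (cacheIsComplete) {
            return -1; // the cache holds everything, so the category does not exist
        }
        cacheMisses++;
        if (cacheMisses >= cacheMissesUntilFill) {
            // Assumed policy: threshold reached, read the whole taxonomy at once.
            cache.putAll(disk);
            cacheIsComplete = true;
            fullReads++;
            return cache.getOrDefault(path, -1);
        }
        // Below the threshold: consult the "disk" for this one category only.
        singleDiskLookups++;
        Integer onDisk = disk.get(path);
        if (onDisk == null) {
            return -1;
        }
        cache.put(path, onDisk);
        return onDisk;
    }

    public static void main(String[] args) {
        Map<String, Integer> disk = new HashMap<>();
        disk.put("Author", 1);
        disk.put("Author/Mark Twain", 2);
        CacheFillSketch taxo = new CacheFillSketch(disk, 0);
        System.out.println(taxo.findCategory("Author/Mark Twain"));
        System.out.println(taxo.singleDiskLookups + " single lookups, " + taxo.fullReads + " full reads");
    }
}
```

With a threshold of 0 the model never performs an individual lookup, matching the "read on first use" behaviour described above; larger thresholds defer the complete read until enough misses have accumulated.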
public int getParent(int ordinal) throws IOException
Description copied from interface: TaxonomyWriter
getParent() returns the ordinal of the parent category of the category with the given ordinal.
When a category is specified as a path name, finding the path of its parent is as trivial as dropping the last component of the path. getParent() is functionally equivalent to calling getPath() on the given ordinal, dropping the last component of the path, and then calling getOrdinal() to get an ordinal back.
If the given ordinal is the ROOT_ORDINAL, an INVALID_ORDINAL is returned. If the given ordinal is a top-level category, the ROOT_ORDINAL is returned. If an invalid ordinal is given (negative or beyond the last available ordinal), an IndexOutOfBoundsException is thrown. However, it is expected that getParent will only be called for ordinals which are already known to be in the taxonomy.
TODO (Facet): instead of a getParent(ordinal) method, consider having a getCategory(categorypath, prefixlen) which is similar to addCategory except it doesn't add new categories; this method can be used to get the ordinals of all prefixes of the given category, and it can use exactly the same code and cache used by addCategory() so it means less code.
getParent in interface TaxonomyWriter
IOException
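The "get the path, drop the last component, look up the ordinal" equivalence and the three special cases can be sketched as follows. This is an illustrative model only: ParentLookupSketch, its list-backed storage, and the '/' separator are assumptions, and the constant values 0 and -1 are chosen here to mirror ROOT_ORDINAL and INVALID_ORDINAL.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical model of getParent() by path manipulation; not Lucene's implementation. */
final class ParentLookupSketch {
    static final int ROOT_ORDINAL = 0;
    static final int INVALID_ORDINAL = -1;

    // Index = ordinal, value = path with '/'-joined components; the root is "".
    private final List<String> paths = new ArrayList<>();

    ParentLookupSketch(List<String> pathsInOrdinalOrder) {
        paths.addAll(pathsInOrdinalOrder);
    }

    int getOrdinal(String path) {
        return paths.indexOf(path);
    }

    int getParent(int ordinal) {
        if (ordinal < 0 || ordinal >= paths.size()) {
            throw new IndexOutOfBoundsException("invalid ordinal: " + ordinal);
        }
        if (ordinal == ROOT_ORDINAL) {
            return INVALID_ORDINAL; // the root has no parent
        }
        String path = paths.get(ordinal);
        int lastSeparator = path.lastIndexOf('/');
        // Dropping the last component of a top-level category leaves the root path "".
        String parentPath = lastSeparator < 0 ? "" : path.substring(0, lastSeparator);
        return getOrdinal(parentPath);
    }

    public static void main(String[] args) {
        ParentLookupSketch taxo =
            new ParentLookupSketch(Arrays.asList("", "Author", "Author/Mark Twain"));
        System.out.println(taxo.getParent(2));
        System.out.println(taxo.getParent(1));
        System.out.println(taxo.getParent(0));
    }
}
```

The linear indexOf lookup keeps the sketch short; it says nothing about how a real taxonomy stores or retrieves parent ordinals.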
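public void addTaxonomy(Directory taxoDir, DirectoryTaxonomyWriter.OrdinalMap map) throws IOException
Takes the categories from the given taxonomy directory, and adds the missing ones to this taxonomy. Additionally, it fills the given DirectoryTaxonomyWriter.OrdinalMap with a mapping from the original ordinal to the new ordinal.
IOException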
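The idea of merging one taxonomy into another while recording an old-to-new ordinal mapping can be sketched with two lists. This is a simplified model under stated assumptions: TaxonomyMergeSketch is a hypothetical name, taxonomies are represented as lists of '/'-joined paths in ordinal order, and the mapping is returned as an int array rather than through an OrdinalMap.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical model of merging taxonomies with an ordinal map; not Lucene's implementation. */
final class TaxonomyMergeSketch {
    /**
     * Adds to target every category of source that target lacks.
     * Returns an array where map[oldOrdinal] is the category's ordinal in target.
     * Both lists hold paths in ordinal order, so ancestors precede descendants.
     */
    static int[] addTaxonomy(List<String> target, List<String> source) {
        int[] map = new int[source.size()];
        for (int oldOrdinal = 0; oldOrdinal < source.size(); oldOrdinal++) {
            String path = source.get(oldOrdinal);
            int newOrdinal = target.indexOf(path);
            if (newOrdinal < 0) {
                // Missing category: append it, giving it the next free ordinal.
                newOrdinal = target.size();
                target.add(path);
            }
            map[oldOrdinal] = newOrdinal;
        }
        return map;
    }

    public static void main(String[] args) {
        List<String> target = new ArrayList<>(Arrays.asList("", "Author", "Author/Mark Twain"));
        List<String> source = Arrays.asList("", "Year", "Author", "Year/1876");
        int[] map = addTaxonomy(target, source);
        System.out.println(Arrays.toString(map));
        System.out.println(target);
    }
}
```

A mapping like this is what lets ordinals stored in a separately built search index be rewritten to match the merged taxonomy.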
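public void rollback() throws IOException
Rolls back changes to the taxonomy writer and closes the instance. Following this method the instance becomes unusable (calling any of its API methods will yield an AlreadyClosedException).
rollback in interface TwoPhaseCommit
IOException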
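public void replaceTaxonomy(Directory taxoDir) throws IOException
Replaces the current taxonomy with the given one. This method should generally be called in conjunction with IndexWriter.addIndexes(Directory...) to replace both the taxonomy as well as the search index content.
IOException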
public final long getTaxonomyEpoch()
Expert: returns current index epoch, if this is a near-real-time reader. Used by DirectoryTaxonomyReader to support NRT.
Copyright © 2000-2021 Apache Software Foundation. All Rights Reserved.