FeatureField (Lucene 8.9.0 API)

java.lang.Object
- org.apache.lucene.document.Field
- - org.apache.lucene.document.FeatureField

All Implemented Interfaces:

IndexableField
```
public final class FeatureField
extends Field
```
Field that can be used to store static scoring factors into documents. This is largely inspired by the work of Nick Craswell, Stephen Robertson, Hugo Zaragoza and Michael Taylor: "Relevance weighting for query independent evidence", Proceedings of the 28th annual international ACM SIGIR conference on Research and development in information retrieval, August 15-19, 2005, Salvador, Brazil.
Feature values are internally encoded as term frequencies. Adding feature queries as BooleanClause.Occur.SHOULD clauses of a BooleanQuery makes it possible to combine query-dependent scores (e.g. BM25) with query-independent scores using a linear combination. Because feature values are stored as frequencies, search logic can also efficiently skip documents that cannot be competitive when total hit counts are not requested. This makes it a compelling alternative to storing such factors in, e.g., a doc-value field.
This field may only store factors that are positively correlated with the final score, such as pagerank. For factors that are inversely correlated with the score, such as URL length, the inverse of the scoring factor should be stored, i.e. 1/urlLength.
For storage efficiency, this field only keeps the top 9 significant bits of each value, which allows values to be stored in 16 bits internally. In practice, this limitation means that values are stored with a relative precision of 2^-8 = 0.00390625.
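The precision limit can be illustrated with a minimal sketch. It assumes that keeping 9 significant bits amounts to dropping the 15 low-order bits of the IEEE 754 float representation (sign bit, 8 exponent bits and the top 8 explicit mantissa bits are kept); the class and method names are hypothetical and not part of the Lucene API.

```java
final class FeaturePrecisionSketch {
    // Assumption: drop the 15 low-order mantissa bits of the float.
    // For a positive float the sign bit is 0, so the result fits in 16 bits.
    static int encode(float value) {
        return Float.floatToIntBits(value) >>> 15;
    }

    // Restore a float from the truncated bits; the dropped bits become zeros.
    static float decode(int encoded) {
        return Float.intBitsToFloat(encoded << 15);
    }

    public static void main(String[] args) {
        float original = 3.14159f;
        float stored = decode(encode(original));
        double relativeError = (original - stored) / (double) original;
        System.out.println("fits in 16 bits: " + (encode(original) < (1 << 16)));
        System.out.println("stored=" + stored + ", relative error=" + relativeError);
    }
}
```

Under this assumption the decoded value is never larger than the original, and the relative error stays below 2^-8.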
Given a scoring factor S > 0 and its weight w > 0, there are four ways that S can be turned into a score:
- w * log(a + S), with a ≥ 1. This function usually makes sense because the distribution of scoring factors often follows a power law, which is typically the case for pagerank, for instance. However, the paper suggested that the satu and sigm functions give even better results.
- satu(S) = w * S / (S + k), with k > 0. This function is similar to the one used by BM25Similarity to incorporate term frequency into the final score. Before the weight w is applied, it produces values between 0 and 1, and a value of 0.5 is obtained when S and k are equal.
- sigm(S) = w * S^a / (S^a + k^a), with k > 0, a > 0. This function provided even better results than the two above but is also harder to tune because it has two parameters. As with satu, values are in the 0..1 range before the weight is applied, and 0.5 is obtained when S and k are equal.
- w * S. Expert: This function doesn't apply any transformation to an indexed feature value, and the indexed value itself, multiplied by weight, determines the score. Thus, there is an expectation that a feature value is encoded in the index in a way that makes sense for scoring.
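The four functions above can be written out directly in plain Java. This is only a sketch of the formulas as stated in the list, not the code Lucene runs internally; the class and method names are hypothetical.

```java
final class FeatureScoreFunctions {
    // w * log(a + S), with a >= 1
    static double log(double w, double a, double s) {
        return w * Math.log(a + s);
    }

    // w * S / (S + k), with k > 0
    static double satu(double w, double k, double s) {
        return w * s / (s + k);
    }

    // w * S^a / (S^a + k^a), with k > 0 and a > 0
    static double sigm(double w, double k, double a, double s) {
        double sPowA = Math.pow(s, a);
        return w * sPowA / (sPowA + Math.pow(k, a));
    }

    // w * S, no transformation of the indexed value
    static double linear(double w, double s) {
        return w * s;
    }

    public static void main(String[] args) {
        double s = 5.0; // hypothetical feature value
        System.out.println("log    = " + log(1.0, 1.0, s));
        System.out.println("satu   = " + satu(1.0, 5.0, s));
        System.out.println("sigm   = " + sigm(1.0, 5.0, 2.0, s));
        System.out.println("linear = " + linear(1.0, s));
    }
}
```

With S equal to k, both satu and sigm evaluate to half of the weight, which is why k acts as a pivot.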
The constants in the above formulas typically need training in order to compute optimal values. If you don't know where to start, the newSaturationQuery(String, String) method uses 1f as a weight and tries to guess a sensible value for the pivot parameter of the saturation function based on index statistics, which should perform reasonably well. Here is an example, assuming that documents have a FeatureField called 'features' with values for the 'pagerank' feature.
```
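// Query-dependent part: match documents on their text.
Query query = new BooleanQuery.Builder()
    .add(new TermQuery(new Term("body", "apache")), Occur.SHOULD)
    .add(new TermQuery(new Term("body", "lucene")), Occur.SHOULD)
    .build();
// Query-independent part: saturation function over the 'pagerank' feature.
Query boost = FeatureField.newSaturationQuery("features", "pagerank");
// MUST selects the hits; SHOULD only adds the feature score to them.
Query boostedQuery = new BooleanQuery.Builder()
    .add(query, Occur.MUST)
    .add(boost, Occur.SHOULD)
    .build();
// 'searcher' is assumed to be an existing IndexSearcher.
TopDocs topDocs = searcher.search(boostedQuery, 10);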
```
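To make the linear combination concrete, the sketch below adds a query-dependent score to a saturation-based feature score, which is how the scores of a MUST clause and a matching SHOULD clause combine. The text scores, feature values and pivot are made-up numbers, and the class is hypothetical.

```java
final class BoostedScoreSketch {
    // Query-dependent score plus weight * S / (S + pivot).
    static double combined(double textScore, double featureValue, double weight, double pivot) {
        return textScore + weight * featureValue / (featureValue + pivot);
    }

    public static void main(String[] args) {
        // Hypothetical documents: A has the better text match, B the better pagerank.
        double docA = combined(2.2, 0.5, 1.0, 8.0);
        double docB = combined(2.0, 8.0, 1.0, 8.0);
        System.out.println("docA=" + docA + ", docB=" + docB);
    }
}
```

In this made-up case the document with the slightly weaker text match ranks first because its feature contribution is larger.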
WARNING: This API is experimental and might change in incompatible ways in the next release.

Nested Class Summary
- Nested classes/interfaces inherited from class org.apache.lucene.document.Field
  Field.Store

Field Summary
- Fields inherited from class org.apache.lucene.document.Field
  fieldsData, name, tokenStream, type

Constructor Summary

Constructors
Constructor and Description

FeatureField(String fieldName, String featureName, float featureValue)
Create a feature.

Method Summary

Modifier and Type	Method and Description
`static DoubleValuesSource`	`newDoubleValues(String field, String featureName)` Creates a `DoubleValuesSource` instance which can be used to read the values of a feature from a `FeatureField` for documents.
`static SortField`	`newFeatureSort(String field, String featureName)` Creates a SortField for sorting by the value of a feature.
`static Query`	`newLinearQuery(String fieldName, String featureName, float weight)` Return a new `Query` that will score documents as `weight * S` where S is the value of the static feature.
`static Query`	`newLogQuery(String fieldName, String featureName, float weight, float scalingFactor)` Return a new `Query` that will score documents as `weight * Math.log(scalingFactor + S)` where S is the value of the static feature.
`static Query`	`newSaturationQuery(String fieldName, String featureName)` Same as `newSaturationQuery(String, String, float, float)` but `1f` is used as a weight and a reasonably good default pivot value is computed based on index statistics and is approximately equal to the geometric mean of all values that exist in the index.
`static Query`	`newSaturationQuery(String fieldName, String featureName, float weight, float pivot)` Return a new `Query` that will score documents as `weight * S / (S + pivot)` where S is the value of the static feature.
`static Query`	`newSigmoidQuery(String fieldName, String featureName, float weight, float pivot, float exp)` Return a new `Query` that will score documents as `weight * S^a / (S^a + pivot^a)` where S is the value of the static feature and a is the exponent `exp`.
`void`	`setFeatureValue(float featureValue)` Update the feature value of this field.
`TokenStream`	`tokenStream(Analyzer analyzer, TokenStream reuse)` Creates the TokenStream used for indexing this field.

Methods inherited from class org.apache.lucene.document.Field
binaryValue, fieldType, getCharSequenceValue, name, numericValue, readerValue, setBytesValue, setBytesValue, setByteValue, setDoubleValue, setFloatValue, setIntValue, setLongValue, setReaderValue, setShortValue, setStringValue, setTokenStream, stringValue, tokenStreamValue, toString

Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait

- Constructor Detail
  - FeatureField
```
public FeatureField(String fieldName,
                    String featureName,
                    float featureValue)
```
    Create a feature.
    
    Parameters:
    
    fieldName - The name of the field to store the information into. All features may be stored in the same field.
    
    featureName - The name of the feature, e.g. 'pagerank'. It will be indexed as a term.
    
    featureValue - The value of the feature, must be a positive, finite, normal float.
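    
    The constraint on featureValue can be checked before constructing the field. The sketch below is a hypothetical helper, not part of the Lucene API; it assumes that "positive, finite, normal" means a finite float that is at least Float.MIN_NORMAL, and it also shows the 1/urlLength inversion recommended for inversely correlated factors.

```java
final class FeatureValueCheck {
    // Assumption: a valid value is finite and not smaller than the smallest normal float.
    static boolean isValidFeatureValue(float value) {
        return Float.isFinite(value) && value >= Float.MIN_NORMAL;
    }

    // Turn an inversely correlated factor (URL length) into a positively correlated one.
    static float inverseLengthFeature(int urlLength) {
        if (urlLength <= 0) {
            throw new IllegalArgumentException("urlLength must be positive, got " + urlLength);
        }
        return 1f / urlLength;
    }

    public static void main(String[] args) {
        float feature = inverseLengthFeature(20);
        System.out.println(feature + " valid? " + isValidFeatureValue(feature));
        System.out.println("0f valid? " + isValidFeatureValue(0f));
    }
}
```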
- Method Detail
  - setFeatureValue
```
public void setFeatureValue(float featureValue)
```
    Update the feature value of this field.
  - tokenStream
```
public TokenStream tokenStream(Analyzer analyzer,
                               TokenStream reuse)
```
    Description copied from interface: IndexableField
    
    Creates the TokenStream used for indexing this field. If appropriate, implementations should use the given Analyzer to create the TokenStreams.
    
    Specified by:
    
    tokenStream in interface IndexableField
    
    Overrides:
    
    tokenStream in class Field
    
    Parameters:
    
    analyzer - Analyzer that should be used to create the TokenStreams
    
    reuse - TokenStream for a previous instance of this field name. This allows custom field types (like StringField and NumericField) that do not use the analyzer to still have good performance. Note: the passed-in type may be inappropriate, for example if you mix up different types of Fields for the same field name. So it's the responsibility of the implementation to check.
    
    Returns:
    
    TokenStream value for indexing the document. Should always return a non-null value if the field is to be indexed
  - newLinearQuery
```
public static Query newLinearQuery(String fieldName,
                                   String featureName,
                                   float weight)
```
    Return a new Query that will score documents as weight * S where S is the value of the static feature.
    
    Parameters:
    
    fieldName - field that stores features
    
    featureName - name of the feature
    
    weight - weight to give to this feature, must be in (0,64]
    
    Throws:
    
    IllegalArgumentException - if weight is not in (0,64]
  - newLogQuery
```
public static Query newLogQuery(String fieldName,
                                String featureName,
                                float weight,
                                float scalingFactor)
```
    Return a new Query that will score documents as weight * Math.log(scalingFactor + S) where S is the value of the static feature.
    
    Parameters:
    
    fieldName - field that stores features
    
    featureName - name of the feature
    
    weight - weight to give to this feature, must be in (0,64]
    
    scalingFactor - scaling factor applied before taking the logarithm, must be in [1, +Infinity)
    
    Throws:
    
    IllegalArgumentException - if weight is not in (0,64] or scalingFactor is not in [1, +Infinity)
  - newSaturationQuery
```
public static Query newSaturationQuery(String fieldName,
                                       String featureName,
                                       float weight,
                                       float pivot)
```
    Return a new Query that will score documents as weight * S / (S + pivot) where S is the value of the static feature.
    
    Parameters:
    
    fieldName - field that stores features
    
    featureName - name of the feature
    
    weight - weight to give to this feature, must be in (0,64]
    
    pivot - feature value that would give a score contribution equal to weight/2, must be in (0, +Infinity)
    
    Throws:
    
    IllegalArgumentException - if weight is not in (0,64] or pivot is not in (0, +Infinity)
  - newSaturationQuery
```
public static Query newSaturationQuery(String fieldName,
                                       String featureName)
```
    Same as newSaturationQuery(String, String, float, float) but 1f is used as a weight and a reasonably good default pivot value is computed based on index statistics and is approximately equal to the geometric mean of all values that exist in the index.
    
    Parameters:
    
    fieldName - field that stores features
    
    featureName - name of the feature
    
    Throws:
    
    IllegalArgumentException - if weight is not in (0,64] or pivot is not in (0, +Infinity)
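    
    As an illustration of what a geometric-mean pivot looks like, the sketch below computes the geometric mean of a set of feature values directly. It does not show how Lucene derives its approximation from index statistics; the class name and the sample values are hypothetical.

```java
final class GeometricMeanPivot {
    // Geometric mean computed as exp(mean(log(v))) to avoid overflow of the raw product.
    static double geometricMean(double[] values) {
        if (values.length == 0) {
            throw new IllegalArgumentException("at least one value is required");
        }
        double sumOfLogs = 0.0;
        for (double v : values) {
            if (v <= 0.0) {
                throw new IllegalArgumentException("feature values must be positive, got " + v);
            }
            sumOfLogs += Math.log(v);
        }
        return Math.exp(sumOfLogs / values.length);
    }

    public static void main(String[] args) {
        double[] pageranks = {1.0, 100.0}; // made-up feature values
        System.out.println("pivot ~ " + geometricMean(pageranks));
    }
}
```

    A pivot computed this way could also be passed explicitly to newSaturationQuery(String, String, float, float) when a weight other than 1f is needed.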
  - newSigmoidQuery
```
public static Query newSigmoidQuery(String fieldName,
                                    String featureName,
                                    float weight,
                                    float pivot,
                                    float exp)
```
    Return a new Query that will score documents as weight * S^a / (S^a + pivot^a) where S is the value of the static feature and a is the exponent exp.
    
    Parameters:
    
    fieldName - field that stores features
    
    featureName - name of the feature
    
    weight - weight to give to this feature, must be in (0,64]
    
    pivot - feature value that would give a score contribution equal to weight/2, must be in (0, +Infinity)
    
    exp - exponent, higher values make the function grow slower before 'pivot' and faster after 'pivot', must be in (0, +Infinity)
    
    Throws:
    
    IllegalArgumentException - if weight is not in (0,64] or either pivot or exp is not in (0, +Infinity)
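    
    The effect of the exponent can be seen by evaluating the formula on either side of the pivot. This is a plain-Java sketch of the stated formula with hypothetical names and made-up inputs, not Lucene's internal implementation.

```java
final class SigmoidExponentSketch {
    // weight * S^exp / (S^exp + pivot^exp)
    static double sigm(double weight, double pivot, double exp, double s) {
        double sPow = Math.pow(s, exp);
        return weight * sPow / (sPow + Math.pow(pivot, exp));
    }

    public static void main(String[] args) {
        double pivot = 1.0;
        for (double exp : new double[] {1.0, 4.0}) {
            System.out.println("exp=" + exp
                + " below pivot: " + sigm(1.0, pivot, exp, 0.5)
                + " at pivot: " + sigm(1.0, pivot, exp, 1.0)
                + " above pivot: " + sigm(1.0, pivot, exp, 2.0));
        }
    }
}
```

    With a larger exponent the score is lower below the pivot and higher above it, while the value at the pivot stays at weight/2.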
  - newFeatureSort
```
public static SortField newFeatureSort(String field,
                                       String featureName)
```
    Creates a SortField for sorting by the value of a feature.
    This sort orders documents by descending value of a feature. The value returned in FieldDoc for the hits contains a Float instance with the feature value.
    If a document is missing the field, then it is treated as having a value of 0.0f.
    
    Parameters:
    
    field - field name. Must not be null.
    
    featureName - feature name. Must not be null.
    
    Returns:
    
    SortField ordering documents by the value of the feature
    
    Throws:
    
    NullPointerException - if field or featureName is null.
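    
    The ordering described above (descending feature value, missing treated as 0.0f) can be mimicked on an in-memory map. This is only a sketch of the ordering semantics; the class is hypothetical, and a null map value stands in for a document without the feature.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;

final class FeatureSortSketch {
    // A missing (null) feature value is treated as 0.0f.
    static float valueOrZero(Float value) {
        return value == null ? 0.0f : value;
    }

    // Returns document ids ordered by descending feature value.
    static List<String> sortByFeatureDescending(Map<String, Float> featureByDoc) {
        List<String> ids = new ArrayList<>(featureByDoc.keySet());
        Comparator<String> byValue =
            Comparator.comparingDouble(id -> valueOrZero(featureByDoc.get(id)));
        ids.sort(byValue.reversed());
        return ids;
    }

    public static void main(String[] args) {
        Map<String, Float> features = new java.util.HashMap<>();
        features.put("doc-a", 0.5f);
        features.put("doc-b", null); // no feature stored for this document
        features.put("doc-c", 2.0f);
        System.out.println(sortByFeatureDescending(features));
    }
}
```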
  - newDoubleValues
```
public static DoubleValuesSource newDoubleValues(String field,
                                                 String featureName)
```
    Creates a DoubleValuesSource instance which can be used to read the values of a feature from a FeatureField for documents.
    
    Parameters:
    
    field - field name. Must not be null.
    
    featureName - feature name. Must not be null.
    
    Returns:
    
    a DoubleValuesSource which can be used to access the values of the feature for documents
    
    Throws:
    
    NullPointerException - if field or featureName is null.
