public class SweetSpotSimilarity extends ClassicSimilarity
A similarity with a lengthNorm that provides for a "plateau" of equally good lengths, and tf helper functions.
For lengthNorm, a min/max can be specified to define the plateau of lengths that should all have a norm of 1.0. Below the min and above the max, the lengthNorm drops off following a sqrt function.
For tf, baselineTf and hyperbolicTf functions are provided, which subclasses can choose between.
Similarity.SimScorer
discountOverlaps
Constructor and Description |
---|
SweetSpotSimilarity() |
Modifier and Type | Method and Description |
---|---|
float | baselineTf(float freq) - Implemented as: (x <= min) ? base : sqrt(x+(base**2)-min) ...but with a special case check for 0. |
float | hyperbolicTf(float freq) - Uses a hyperbolic tangent function that allows for a hard max... |
float | lengthNorm(int numTerms) - Implemented as: 1/sqrt( steepness * (abs(x-min) + abs(x-max) - (max-min)) + 1 ). |
void | setBaselineTfFactors(float base, float min) - Sets the baseline and minimum function variables for baselineTf. |
void | setHyperbolicTfFactors(float min, float max, double base, float xoffset) - Sets the function variables for the hyperbolicTf functions. |
void | setLengthNormFactors(int min, int max, float steepness, boolean discountOverlaps) - Sets the default function variables used by lengthNorm when no field specific variables have been set. |
float | tf(float freq) - Delegates to baselineTf. |
String | toString() |
idf, idfExplain
computeNorm, getDiscountOverlaps, idfExplain, scorer, setDiscountOverlaps
public void setBaselineTfFactors(float base, float min)
baselineTf(float)
public void setHyperbolicTfFactors(float min, float max, double base, float xoffset)
min - the minimum tf value to ever be returned (default: 0.0)
max - the maximum tf value to ever be returned (default: 2.0)
base - the base value to be used in the exponential for the hyperbolic function (default: 1.3)
xoffset - the midpoint of the hyperbolic function (default: 10.0)
hyperbolicTf(float)
public void setLengthNormFactors(int min, int max, float steepness, boolean discountOverlaps)
lengthNorm(int)
public float lengthNorm(int numTerms)
Implemented as: 1/sqrt( steepness * (abs(x-min) + abs(x-max) - (max-min)) + 1 ).
This degrades to 1/sqrt(x) when min and max are both 1 and steepness is 0.5.
TODO: a potential optimization is to return 1.0f directly if numTerms is between min and max.
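The plateau behaviour of the formula above can be checked with a small standalone sketch. It does not use Lucene: `LengthNormSketch` and its static `lengthNorm` helper are hypothetical names that simply re-evaluate the documented expression, and the plateau bounds (3 to 10) and steepness (0.5) in `main` are illustrative values, not library defaults.

```java
// Standalone sketch of the documented lengthNorm formula (not the Lucene class itself).
final class LengthNormSketch {

    // 1/sqrt( steepness * (abs(x-min) + abs(x-max) - (max-min)) + 1 )
    static float lengthNorm(int numTerms, int min, int max, float steepness) {
        // Zero anywhere inside [min, max]; outside, twice the distance to the nearest edge.
        int excess = Math.abs(numTerms - min) + Math.abs(numTerms - max) - (max - min);
        return (float) (1.0 / Math.sqrt(steepness * excess + 1.0f));
    }

    public static void main(String[] args) {
        // Hypothetical plateau of 3..10 terms with steepness 0.5.
        for (int numTerms : new int[] {1, 3, 5, 10, 12, 50}) {
            System.out.printf("numTerms=%d norm=%.4f%n",
                    numTerms, lengthNorm(numTerms, 3, 10, 0.5f));
        }
    }
}
```

Every length inside the plateau yields a norm of 1.0, and with min and max both 1 and steepness 0.5 the helper reproduces 1/sqrt(x).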
lengthNorm
in class ClassicSimilarity
setLengthNormFactors(int, int, float, boolean), An SVG visualization of this function
public float tf(float freq)
tf
in class ClassicSimilarity
baselineTf(float)
public float baselineTf(float freq)
Implemented as: (x <= min) ? base : sqrt(x+(base**2)-min)
...but with a special case check for 0.
This degrades to sqrt(x) when min and base are both 0.
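As a rough illustration of the expression above, the following standalone sketch evaluates it outside Lucene. `BaselineTfSketch` and its parameter-taking `baselineTf` helper are hypothetical names, and the special case is assumed here to return 0 for a frequency of 0, which the text above does not spell out.

```java
// Standalone sketch of the documented baselineTf formula (not the Lucene class itself).
final class BaselineTfSketch {

    // (x <= min) ? base : sqrt(x + base**2 - min), with a special case for 0.
    static float baselineTf(float freq, float base, float min) {
        if (freq == 0.0f) {
            return 0.0f; // assumption: a term that never occurs contributes nothing
        }
        return (freq <= min)
                ? base
                : (float) Math.sqrt(freq + (base * base) - min);
    }

    public static void main(String[] args) {
        // Hypothetical factors: every frequency up to 3 scores the same baseline of 1.5.
        for (float freq : new float[] {0f, 1f, 3f, 7f, 20f}) {
            System.out.printf("freq=%.0f tf=%.4f%n", freq, baselineTf(freq, 1.5f, 3f));
        }
    }
}
```

Because base**2 is added under the square root, the curve is continuous at min: sqrt(min + base**2 - min) equals base.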
public float hyperbolicTf(float freq)
tf(x)=min+(max-min)/2*(((base**(x-xoffset)-base**-(x-xoffset))/(base**(x-xoffset)+base**-(x-xoffset)))+1)
This code is provided as a convenience for subclasses that want to use a hyperbolic tf function.
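To see the shape of this curve, the formula can be evaluated in a standalone sketch such as the one below. `HyperbolicTfSketch` and its helper are hypothetical names that take the four factors as arguments; `main` uses the documented defaults (min 0.0, max 2.0, base 1.3, xoffset 10.0). The sketch implements only the formula as written and does not guard against overflow for extreme frequencies.

```java
// Standalone sketch of the documented hyperbolicTf formula (not the Lucene class itself).
final class HyperbolicTfSketch {

    // tf(x) = min + (max-min)/2 * ( (b^(x-off) - b^-(x-off)) / (b^(x-off) + b^-(x-off)) + 1 )
    static float hyperbolicTf(float freq, float min, float max, double base, float xoffset) {
        double x = freq - xoffset;
        double up = Math.pow(base, x);
        double down = Math.pow(base, -x);
        // Tanh-shaped term in (-1, 1); it is 0 when freq equals xoffset.
        double tanhLike = (up - down) / (up + down);
        return min + (float) ((max - min) / 2.0 * (tanhLike + 1.0));
    }

    public static void main(String[] args) {
        for (float freq : new float[] {1f, 5f, 10f, 15f, 100f}) {
            System.out.printf("freq=%.0f tf=%.4f%n",
                    freq, hyperbolicTf(freq, 0.0f, 2.0f, 1.3, 10.0f));
        }
    }
}
```

The result sits halfway between min and max at xoffset and flattens out toward max for large frequencies, which is the hard ceiling the method summary refers to.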
public String toString()
toString
in class ClassicSimilarity
Copyright © 2000-2021 Apache Software Foundation. All Rights Reserved.