Package org.elasticsearch.common.util
Class SetBackedScalingCuckooFilter
java.lang.Object
org.elasticsearch.common.util.SetBackedScalingCuckooFilter
- All Implemented Interfaces:
Writeable
An approximate set membership datastructure that scales as more unique values are inserted.
Can definitively say if a member does not exist (no false negatives), but may say an item exists
when it does not (has false positives). Similar in usage to a Bloom Filter.
Internally, the datastructure maintains a Set of hashes up to a specified threshold. This provides 100% accurate membership queries.
When the threshold is breached, a list of CuckooFilters are created and used to track membership. These filters are approximate similar to Bloom Filters.
This datastructure scales as more values are inserted by growing the list of CuckooFilters. Final size is dependent on the cardinality of data inserted, and the precision specified.
-
Nested Class Summary
Nested classes/interfaces inherited from interface org.elasticsearch.common.io.stream.Writeable
Writeable.Reader<V>, Writeable.Writer<V>
-
Constructor Summary
-
Method Summary
Modifier and TypeMethodDescriptionvoid
add(long value)
Add's the provided value to the set for trackingvoid
add(org.apache.lucene.util.BytesRef value)
Add's the provided value to the set for trackingboolean
double
getFpp()
Returns the false-positive rate used for the cuckoo filters.getRng()
Returns the random number generator used for the cuckoo hashing process.long
Get the approximate size of this datastructure.int
Returns the number of distinct values that are tracked before converting to an approximate representation.int
hashCode()
void
Merge `other` cuckoo filter into this cuckoo.boolean
mightContain(long value)
Returns true if the set might contain the provided value, false otherwise.boolean
mightContain(org.apache.lucene.util.BytesRef value)
Returns true if the set might contain the provided value, false otherwise.void
registerBreaker(Consumer<Long> breaker)
Registers a circuit breaker with the datastructure.void
writeTo(StreamOutput out)
Write this into the StreamOutput.
-
Constructor Details
-
SetBackedScalingCuckooFilter
- Parameters:
threshold
- The number of distinct values that should be tracked before converting to an approximate representationrng
- A random number generator needed for the cuckoo hashing processfpp
- the false-positive rate that should be used for the cuckoo filters.
-
SetBackedScalingCuckooFilter
- Throws:
IOException
-
-
Method Details
-
writeTo
Description copied from interface:Writeable
Write this into the StreamOutput.- Specified by:
writeTo
in interfaceWriteable
- Throws:
IOException
-
getThreshold
public int getThreshold()Returns the number of distinct values that are tracked before converting to an approximate representation. -
getRng
Returns the random number generator used for the cuckoo hashing process. -
getFpp
public double getFpp()Returns the false-positive rate used for the cuckoo filters. -
registerBreaker
Registers a circuit breaker with the datastructure. CuckooFilter's can "saturate" and refuse to accept any new values. When this happens, the datastructure scales by adding a new filter. This new filter's bytes will be tracked in the registered breaker when configured. -
mightContain
public boolean mightContain(org.apache.lucene.util.BytesRef value)Returns true if the set might contain the provided value, false otherwise. False values are 100% accurate, while true values may be a false-positive. -
mightContain
public boolean mightContain(long value)Returns true if the set might contain the provided value, false otherwise. False values are 100% accurate, while true values may be a false-positive. -
add
public void add(org.apache.lucene.util.BytesRef value)Add's the provided value to the set for tracking -
add
public void add(long value)Add's the provided value to the set for tracking -
getSizeInBytes
public long getSizeInBytes()Get the approximate size of this datastructure. Approximate because only the Set occupants are tracked, not the overhead of the Set itself. -
merge
Merge `other` cuckoo filter into this cuckoo. After merging, this filter's state will be the union of the two. During the merging process, the internal Set may be upgraded to a cuckoo if it goes over threshold -
hashCode
public int hashCode() -
equals
-