Package org.elasticsearch.search.aggregations
Aggregations
Builds analytic information over all hits in a search request. Aggregations are essentially a tool for summarizing data, and that summary is often used to generate a visualization.
Types of aggregations
There are three main types of aggregations, each in its own sub-package:
- Bucket aggregations - which group documents (e.g. a histogram)
- Metric aggregations - which compute a summary value from several documents (e.g. a sum)
- Pipeline aggregations - which run as a separate step and compute values across buckets
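The three types can be illustrated with plain Java collections. This is only a conceptual sketch: the class `AggregationTypesSketch` and its methods are hypothetical names invented for illustration and do not use or mirror the Elasticsearch API. It assumes a histogram as the bucket step, a sum as the metric step, and a cumulative sum as the pipeline step.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration only; none of these names exist in Elasticsearch.
final class AggregationTypesSketch {

    /** Bucket step: group values into fixed-width buckets keyed by their lower bound. */
    static TreeMap<Long, List<Long>> histogram(long[] values, long interval) {
        TreeMap<Long, List<Long>> buckets = new TreeMap<>();
        for (long v : values) {
            long key = Math.floorDiv(v, interval) * interval;
            buckets.computeIfAbsent(key, k -> new ArrayList<>()).add(v);
        }
        return buckets;
    }

    /** Metric step: compute one summary value (a sum) from the documents in each bucket. */
    static TreeMap<Long, Long> sumPerBucket(TreeMap<Long, List<Long>> buckets) {
        TreeMap<Long, Long> sums = new TreeMap<>();
        for (Map.Entry<Long, List<Long>> e : buckets.entrySet()) {
            long sum = 0;
            for (long v : e.getValue()) {
                sum += v;
            }
            sums.put(e.getKey(), sum);
        }
        return sums;
    }

    /** Pipeline step: runs afterwards, over bucket results only (a running total in key order). */
    static TreeMap<Long, Long> cumulativeSum(TreeMap<Long, Long> sums) {
        TreeMap<Long, Long> out = new TreeMap<>();
        long running = 0;
        for (Map.Entry<Long, Long> e : sums.entrySet()) {
            running += e.getValue();
            out.put(e.getKey(), running);
        }
        return out;
    }

    public static void main(String[] args) {
        TreeMap<Long, Long> sums = sumPerBucket(histogram(new long[] {5, 12, 18, 31}, 10));
        System.out.println(cumulativeSum(sums)); // prints {0=5, 10=35, 30=66}
    }
}
```

Note that the pipeline step never touches the original values; it only reads the output of the bucket and metric steps.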
How Aggregations Work
TODO: Info about search phases goes here
Aggregations operate in general as Map Reduce jobs. The coordinating node for the query dispatches the aggregation to each data node. The data nodes all instantiate an AggregationBuilder of the appropriate type, which in turn builds the Aggregator for that node. This collects the data from that shard, via BucketCollector.getLeafCollector(org.apache.lucene.index.LeafReaderContext) more or less. These values are shipped back to the coordinating node, which performs the reduction on them (partial reductions in place on the data nodes are also possible).
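The map and reduce split described above can be sketched for an average. This is a simplified model under stated assumptions, not the actual Elasticsearch implementation: `MapReduceAvgSketch`, `PartialAvg`, `collectShard`, and `reduce` are hypothetical names, and each "shard" is just an array of values. The point it shows is that a shard ships back mergeable partial state (sum and count) rather than a finished result.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only; none of these names exist in Elasticsearch.
final class MapReduceAvgSketch {

    /** Partial result a shard sends back: enough state to be merged later. */
    static final class PartialAvg {
        final double sum;
        final long count;

        PartialAvg(double sum, long count) {
            this.sum = sum;
            this.count = count;
        }

        PartialAvg merge(PartialAvg other) {
            return new PartialAvg(sum + other.sum, count + other.count);
        }

        double value() {
            return count == 0 ? Double.NaN : sum / count;
        }
    }

    /** "Map" side: runs on a data node and collects the values of one shard. */
    static PartialAvg collectShard(double[] shardValues) {
        double sum = 0;
        for (double v : shardValues) {
            sum += v;
        }
        return new PartialAvg(sum, shardValues.length);
    }

    /** "Reduce" side: runs on the coordinating node and merges all partial results. */
    static double reduce(List<PartialAvg> partials) {
        PartialAvg total = new PartialAvg(0, 0);
        for (PartialAvg p : partials) {
            total = total.merge(p);
        }
        return total.value();
    }

    public static void main(String[] args) {
        List<PartialAvg> partials = Arrays.asList(
            collectShard(new double[] {1, 2, 3}),
            collectShard(new double[] {10}));
        System.out.println(reduce(partials)); // prints 4.0
    }
}
```

Averaging the two per-shard averages (2.0 and 10.0) would give 6.0 instead of 4.0, which is why the sketch keeps the sum and count separate until the final reduction.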
Three modes of operation
When it comes to actually collecting values, there are three ways aggregations operate, in general. Which one we choose depends on limitations in the query and how the data was ingested (e.g. if it is searchable).
The easiest to understand is the Compatible (i.e. usable in all situations) mode, which can be thought of as iterating each query hit and collecting a value from it. This is the least performant way to evaluate aggregations, requiring looking at every hit.
The fastest way to run an aggregation is by looking at the index structures directly. For example, Lucene just stores the minimum and maximum values of fields per segment, so a min aggregation matching all documents in a segment can just look up its result. Generally speaking, this mode can be engaged when there are no queries or sub-aggregations, and is gated by ValuesSourceConfig.getPointReaderOrNull().
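The difference between the compatible mode and the index-structure mode can be sketched for a min aggregation. This is a toy model, not the real code path: `MinModesSketch` and `Segment` are hypothetical names, the per-segment minimum is precomputed in a constructor to stand in for what Lucene stores, and a `null` query stands in for "no query, no sub-aggregations". For simplicity the query here filters on the aggregated value itself.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.LongPredicate;

// Hypothetical illustration only; none of these names exist in Elasticsearch or Lucene.
final class MinModesSketch {

    /** A toy segment that remembers its minimum value, as an index structure would. */
    static final class Segment {
        final long[] values;
        final long min;

        Segment(long[] values) {
            this.values = values;
            long m = Long.MAX_VALUE;
            for (long v : values) {
                m = Math.min(m, v);
            }
            this.min = m;
        }
    }

    /** Compatible mode: visit every hit and collect its value. Works for any query. */
    static long minCompatible(List<Segment> segments, LongPredicate query) {
        long min = Long.MAX_VALUE;
        for (Segment s : segments) {
            for (long v : s.values) {
                if (query.test(v)) {
                    min = Math.min(min, v);
                }
            }
        }
        return min;
    }

    /** Fast mode: read the stored per-segment minimum without visiting any document. */
    static long minFast(List<Segment> segments) {
        long min = Long.MAX_VALUE;
        for (Segment s : segments) {
            min = Math.min(min, s.min);
        }
        return min;
    }

    /** Gate: use the fast mode only when every document matches (modelled as a null query). */
    static long min(List<Segment> segments, LongPredicate queryOrNull) {
        return queryOrNull == null ? minFast(segments) : minCompatible(segments, queryOrNull);
    }

    public static void main(String[] args) {
        List<Segment> segments = Arrays.asList(
            new Segment(new long[] {7, 3, 9}),
            new Segment(new long[] {4, 8}));
        System.out.println(min(segments, null));       // prints 3
        System.out.println(min(segments, v -> v > 3)); // prints 4
    }
}
```

The fast path does work proportional to the number of segments, while the compatible path does work proportional to the number of hits.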
Finally, we can rewrite an aggregation into faster aggregations, or ideally into just a query. Generally, the goal here is to get to "filter by filters" (which is an optimization on the filters aggregation which runs it as a set of filter queries). Often this process will look like rewriting a DateHistogram into a DateRange, and then rewriting the DateRange into Filters. If you see AdaptingAggregator, that's a good clue that the rewrite mode is being used. In general, when we rewrite aggregations, we are able to detect if the rewritten agg can run in a "fast" mode, and decline the rewrite if it can't.
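The rewrite chain described above (histogram to ranges, ranges to filters) can be sketched in a few lines. This is a conceptual model only: `RewriteSketch`, `Range`, and all method names are hypothetical, timestamps are plain longs, and the `maxRanges` limit is an invented stand-in for whatever check decides that the rewritten aggregation cannot run in a fast mode.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.LongPredicate;

// Hypothetical illustration only; none of these names exist in Elasticsearch.
final class RewriteSketch {

    /** A half-open range [from, to). */
    static final class Range {
        final long from;
        final long to;

        Range(long from, long to) {
            this.from = from;
            this.to = to;
        }
    }

    /**
     * Step 1: rewrite a fixed-interval histogram over [min, max] into explicit ranges.
     * Returns null to decline the rewrite when it would need more than maxRanges ranges
     * (an assumed condition standing in for "cannot run in a fast mode").
     */
    static List<Range> histogramToRanges(long min, long max, long interval, int maxRanges) {
        List<Range> ranges = new ArrayList<>();
        long first = Math.floorDiv(min, interval) * interval;
        for (long from = first; from <= max; from += interval) {
            if (ranges.size() == maxRanges) {
                return null;
            }
            ranges.add(new Range(from, from + interval));
        }
        return ranges;
    }

    /** Step 2: rewrite each range into a filter. */
    static List<LongPredicate> rangesToFilters(List<Range> ranges) {
        List<LongPredicate> filters = new ArrayList<>();
        for (Range r : ranges) {
            filters.add(t -> t >= r.from && t < r.to);
        }
        return filters;
    }

    /** Step 3: run every filter and count its matches, one count per original bucket. */
    static long[] countPerFilter(List<LongPredicate> filters, long[] timestamps) {
        long[] counts = new long[filters.size()];
        for (int i = 0; i < counts.length; i++) {
            for (long t : timestamps) {
                if (filters.get(i).test(t)) {
                    counts[i]++;
                }
            }
        }
        return counts;
    }
}
```

A caller would check the result of `histogramToRanges` for `null` and, in that case, fall back to collecting the histogram directly instead of running the filters.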
In general, aggs will try to use one of the fast modes, and if that's not possible, fall back to running in compatible mode.
Interfaces
- An aggregation.
- Compare two buckets by their ordinal.
- Parses the aggregation request and creates the appropriate aggregator factory for it.
- Interface shared by AggregationBuilder and PipelineAggregationBuilder so they can conveniently share the same namespace for XContentParser.namedObject(Class, String, Object).
- Defines behavior for comparing bucket keys to impose a total ordering of buckets of the same type.
Classes
- AbstractAggregationBuilder<AB extends AbstractAggregationBuilder<AB>>: Base implementation of an AggregationBuilder.
- An Aggregator that delegates collection to another Aggregator and then translates its results into the results you'd expect from another aggregation.
- Common xcontent fields that are shared among addAggregation
- A factory that knows how to create an Aggregator of a specific type.
- Common xcontent fields shared among aggregator builders
- Utility class to create aggregations.
- Aggregation phase of a search request, used to collect aggregations
- Represents a set of Aggregations
- An Aggregator.
- Base implementation for concrete aggregators.
- An immutable collection of AggregatorFactories.
- A mutable collection of AggregationBuilders and PipelineAggregationBuilders.
- A Collector that can collect data in separate buckets.
- MultiBucketsAggregation.Bucket ordering strategy.
- Upper bound of how many owningBucketOrds that an Aggregator will have to collect into.
- A wrapper around reducing buckets with the same key that can delay that reduction as long as possible.
- An internal implementation of Aggregation.
- An internal implementation of Aggregations.
- InternalMultiBucketAggregation<A extends InternalMultiBucketAggregation,B extends InternalMultiBucketAggregation.InternalBucket>
- Implementations for MultiBucketsAggregation.Bucket ordering strategies.
- MultiBucketsAggregation.Bucket ordering strategy to sort by a sub-aggregation.
- MultiBucketsAggregation.Bucket ordering strategy to sort by multiple criteria.
- Contains logic for parsing a BucketOrder from an XContentParser.
- Contains logic for reading/writing BucketOrder from/to streams.
- Collects results for a particular segment.
- A LeafBucketCollector that delegates all calls to the sub leaf aggregator and sets the scorer on its source of values if it implements ScorerAware.
- A BucketCollector which allows running a bucket collection with several BucketCollectors.
- An aggregation service that creates instances of MultiBucketConsumerService.MultiBucketConsumer.
- An IntConsumer that throws a MultiBucketConsumerService.TooManyBucketsException when the sum of the provided values is above the limit (`search.max_buckets`).
- An aggregator that is not collected, this can typically be used when running an aggregation over a field that doesn't have a mapping.
- An implementation of Aggregation that is parsed from a REST response.
- A factory that knows how to create a PipelineAggregator of a specific type.
- The aggregation context that is part of the search context.
- Merges many buckets into the "top" buckets as sorted by BucketOrder.
Enum Classes
- A rough count of the number of buckets that Aggregators built by this builder will contain per parent bucket used to validate sorts and pipeline aggregations.
- Aggregation mode for sub aggregations.
Exceptions
- Thrown when failing to execute an aggregation
- Thrown when failing to execute an aggregation