Metric Aggregation in Elasticsearch

Elasticsearch is not only a search engine; it can also perform complex data analytics. Metric aggregations are a central part of this capability. They let you compute metrics such as averages, sums, minimums, and maximums over the fields in your data.

This guide explains what metric aggregations are and how they work, then walks through an example of each type.

What are Metric Aggregations?

Metric aggregations in Elasticsearch compute metrics from the field values of the documents matched by a query. Bucket aggregations group documents into buckets. Metric aggregations instead return one or more computed values, such as an average, a minimum, a maximum, or a sum. Most of them operate on numeric fields. Some, such as value_count and cardinality, also work on keyword fields, and geo_bounds works on geo_point fields. They are essential for summarizing large datasets.

Types of Metric Aggregations

Elasticsearch offers several types of metric aggregations, each serving a different purpose:

  • Average Aggregation: Calculates the average of numeric values.
  • Sum Aggregation: Computes the sum of numeric values.
  • Min Aggregation: Finds the minimum value.
  • Max Aggregation: Finds the maximum value.
  • Stats Aggregation: Provides a summary of statistics (count, min, max, sum, and average).
  • Extended Stats Aggregation: Includes additional statistics such as variance, standard deviation, and sum of squares.
  • Value Count Aggregation: Counts the values extracted from a field (every value, not just distinct ones).
  • Percentiles Aggregation: Estimates percentiles over numeric values.
  • Percentile Ranks Aggregation: Computes the percentile rank of specific values.
  • Cardinality Aggregation: Estimates the count of distinct values.
  • Geo Bounds Aggregation: Computes the bounding box containing all geo-points in the field.

Example Dataset

To make the explanations concrete, let’s assume we have an Elasticsearch index called products that holds ten documents shaped like this one:

{
  "product_id": 1,
  "name": "Laptop",
  "category": "electronics",
  "price": 1000,
  "quantity_sold": 5,
  "rating": 4.5
}

Average Aggregation

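The avg aggregation computes the average value of a numeric field. Let’s calculate the average price of products in our index. Setting "size": 0 tells Elasticsearch not to return the matching documents themselves, so the response contains only the aggregation results.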

Query:

GET /products/_search
{
  "size": 0,
  "aggs": {
    "avg_price": {
      "avg": {
        "field": "price"
      }
    }
  }
}

Output:

{
  "aggregations": {
    "avg_price": {
      "value": 550.0
    }
  }
}

In this example, the average price of products is $550.0.
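To make the semantics concrete, here is a small Python sketch of what the avg aggregation computes, run over an in-memory list instead of an index. It assumes ten products priced 100, 200, ..., 1000, which is consistent with the stats output later in this guide, plus one hypothetical product with no price. avg_agg is an illustrative helper and is not part of any Elasticsearch client. By default, the aggregation ignores documents that lack the field. Its missing parameter can supply a substitute value instead, and the helper imitates that behavior.

```python
# Hypothetical stand-in for the products index: ten products priced
# 100, 200, ..., 1000, plus one product that has no price field at all.
products = [{"product_id": i, "price": 100 * i} for i in range(1, 11)]
products.append({"product_id": 11})


def avg_agg(docs, field, missing=None):
    """Average `field` over the documents that contain it.

    Documents without the field are skipped, unless `missing` supplies a
    substitute value (loosely mirroring the aggregation's `missing` option).
    Returns None when there is nothing to average.
    """
    values = []
    for doc in docs:
        if field in doc:
            values.append(doc[field])
        elif missing is not None:
            values.append(missing)
    if not values:
        return None
    return sum(values) / len(values)


print(avg_agg(products, "price"))             # 550.0 (product 11 is ignored)
print(avg_agg(products, "price", missing=0))  # 500.0 (product 11 counts as 0)
```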

Sum Aggregation

The sum aggregation calculates the total sum of a numeric field. Let’s calculate the total quantity sold for all products.

Query:

GET /products/_search
{
  "size": 0,
  "aggs": {
    "total_quantity_sold": {
      "sum": {
        "field": "quantity_sold"
      }
    }
  }
}

Output:

{
  "aggregations": {
    "total_quantity_sold": {
      "value": 25.0
    }
  }
}

In this example, the total quantity sold for all products is 25.
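Numeric metric aggregations read every value of the field. A multi-valued field therefore contributes each of its elements, and a document without the field contributes nothing. The Python sketch below imitates this behavior for sum. The quantity_sold values are hypothetical numbers chosen to add up to 25, and field_values and sum_agg are illustrative helpers.

```python
def field_values(docs, field):
    """Yield every value of `field`, flattening arrays and skipping docs without it."""
    for doc in docs:
        value = doc.get(field)
        if value is None:
            continue  # missing or null: contributes nothing
        if isinstance(value, list):
            yield from value  # multi-valued field: every element counts
        else:
            yield value


def sum_agg(docs, field):
    """Total of all values of `field`, reported as a float."""
    return float(sum(field_values(docs, field)))


# Hypothetical quantity_sold values for ten products (they total 25).
docs = [{"quantity_sold": q} for q in [5, 3, 2, 1, 4, 2, 3, 1, 2, 2]]
print(sum_agg(docs, "quantity_sold"))  # 25.0

# A multi-valued document contributes each of its values; {} contributes nothing.
print(sum_agg([{"quantity_sold": [2, 3]}, {}], "quantity_sold"))  # 5.0
```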

Min Aggregation

The min aggregation finds the minimum value of a numeric field. Let’s find the minimum price of products.

Query:

GET /products/_search
{
  "size": 0,
  "aggs": {
    "min_price": {
      "min": {
        "field": "price"
      }
    }
  }
}

Output:

{
  "aggregations": {
    "min_price": {
      "value": 100.0
    }
  }
}

In this example, the minimum price of products is $100.0.
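The Python sketch below mimics min locally and assumes the ten prices 100 to 1000. When no value is present there is no minimum, so the helper returns None. The optional missing argument imitates the aggregation's missing parameter by substituting a value for entries without one. min_agg is a hypothetical helper.

```python
def min_agg(values, missing=None):
    """Smallest value as a float, or None if there are no values.

    Entries that are None are replaced by `missing` when it is given
    (loosely mirroring the aggregation's `missing` option); otherwise skipped.
    """
    present = [missing if v is None else v for v in values]
    present = [v for v in present if v is not None]
    return float(min(present)) if present else None


prices = [100 * i for i in range(1, 11)]  # assumed prices 100..1000
print(min_agg(prices))                      # 100.0
print(min_agg(prices + [None]))             # 100.0 (the missing price is skipped)
print(min_agg(prices + [None], missing=0))  # 0.0
print(min_agg([]))                          # None
```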

Max Aggregation

The max aggregation finds the maximum value of a numeric field. Let’s find the maximum price of products.

Query:

GET /products/_search
{
  "size": 0,
  "aggs": {
    "max_price": {
      "max": {
        "field": "price"
      }
    }
  }
}

Output:

{
  "aggregations": {
    "max_price": {
      "value": 1000.0
    }
  }
}

In this example, the maximum price of products is $1000.0.
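max works the same way in the opposite direction. This Python sketch finds the largest value in a single pass over hypothetical documents. It treats an array as several values and skips documents that lack the field. max_agg is an illustrative helper, and the prices 100 to 1000 are assumed.

```python
def max_agg(docs, field):
    """Largest value of `field` across all docs as a float, or None if there is none."""
    best = None
    for doc in docs:
        value = doc.get(field)
        # Treat a scalar as a one-element list so arrays and scalars share one path.
        values = value if isinstance(value, list) else [value]
        for v in values:
            if v is not None and (best is None or v > best):
                best = v
    return None if best is None else float(best)


# Assumed prices 100..1000, plus a document with an array and one with no price.
docs = [{"price": 100 * i} for i in range(1, 11)]
print(max_agg(docs, "price"))                                 # 1000.0
print(max_agg(docs + [{"price": [150, 1200]}, {}], "price"))  # 1200.0
print(max_agg([], "price"))                                   # None
```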

Stats Aggregation

The stats aggregation provides a summary of statistics, including count, sum, min, max, and average. Let’s get the stats for the price field.

Query:

GET /products/_search
{
  "size": 0,
  "aggs": {
    "price_stats": {
      "stats": {
        "field": "price"
      }
    }
  }
}

Output:

{
  "aggregations": {
    "price_stats": {
      "count": 10,
      "min": 100.0,
      "max": 1000.0,
      "avg": 550.0,
      "sum": 5500.0
    }
  }
}

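In this example, a single request returns the values that would otherwise need separate aggregations: 10 prices ranging from 100 to 1000, adding up to 5500, with an average of 550.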
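The Python sketch below shows the same idea locally for the assumed prices 100 to 1000. One pass collects the count, minimum, maximum, and sum, and the average is derived as sum / count. stats_agg is a hypothetical helper, and its output for an empty input is only this sketch's convention.

```python
import math


def stats_agg(values):
    """One pass over the values, returning count, min, max, avg and sum."""
    count, total = 0, 0.0
    lowest, highest = math.inf, -math.inf
    for v in values:
        count += 1
        total += v
        lowest = min(lowest, v)
        highest = max(highest, v)
    if count == 0:
        # No values: min, max and avg are undefined in this sketch.
        return {"count": 0, "min": None, "max": None, "avg": None, "sum": 0.0}
    return {
        "count": count,
        "min": float(lowest),
        "max": float(highest),
        "avg": total / count,
        "sum": total,
    }


prices = [100 * i for i in range(1, 11)]  # assumed prices 100..1000
print(stats_agg(prices))
```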

Extended Stats Aggregation

The extended stats aggregation provides additional statistics such as variance, standard deviation, and sum of squares. Let’s get the extended stats for the price field.

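Query:

GET /products/_search
{
  "size": 0,
  "aggs": {
    "extended_price_stats": {
      "extended_stats": {
        "field": "price"
      }
    }
  }
}

Output:

{
  "aggregations": {
    "extended_price_stats": {
      "count": 10,
      "min": 100.0,
      "max": 1000.0,
      "avg": 550.0,
      "sum": 5500.0,
      "sum_of_squares": 3850000.0,
      "variance": 82500.0,
      "std_deviation": 287.23
    }
  }
}

In this example, we get extended statistics for the price field. The response is abbreviated here, std_deviation is rounded, and the full response also includes fields such as std_deviation_bounds. variance is the population variance, sum_of_squares / count - avg² = 385,000 - 302,500 = 82,500, and std_deviation is its square root, about 287.23.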
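The Python sketch below reproduces the extended stats arithmetic for the assumed prices 100 to 1000. It also computes std_deviation_bounds, which Elasticsearch reports as avg plus or minus sigma standard deviations (sigma defaults to 2). extended_stats is a hypothetical helper, and it expects a non-empty list.

```python
import math


def extended_stats(values, sigma=2.0):
    """Compute extended_stats-style metrics for a non-empty list of numbers.

    `variance` and `std_deviation` are population statistics
    (they divide by count, not count - 1).
    """
    count = len(values)
    total = float(sum(values))
    sum_of_squares = float(sum(v * v for v in values))
    avg = total / count
    # max() guards against tiny negative results from floating-point rounding.
    variance = max(0.0, sum_of_squares / count - avg * avg)
    std = math.sqrt(variance)
    return {
        "count": count,
        "min": float(min(values)),
        "max": float(max(values)),
        "avg": avg,
        "sum": total,
        "sum_of_squares": sum_of_squares,
        "variance": variance,
        "std_deviation": std,
        "std_deviation_bounds": {"upper": avg + sigma * std, "lower": avg - sigma * std},
    }


prices = [100 * i for i in range(1, 11)]  # assumed prices 100..1000
result = extended_stats(prices)
print(result["variance"])                 # 82500.0
print(round(result["std_deviation"], 2))  # 287.23
```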

Value Count Aggregation

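The value_count aggregation counts the values extracted from a field. Let’s count the number of products by counting product_id. This equals the number of products only if every product document has exactly one product_id. A multi-valued field contributes one count per value, and documents without the field contribute none.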

Query:

GET /products/_search
{
  "size": 0,
  "aggs": {
    "product_count": {
      "value_count": {
        "field": "product_id"
      }
    }
  }
}

Output:

{
  "aggregations": {
    "product_count": {
      "value": 10
    }
  }
}

In this example, the number of products is 10.
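The difference between counting values and counting documents is easiest to see with a multi-valued field. The Python sketch below uses hypothetical documents with a tags field that is not part of the products example. value_count is an illustrative helper. Note that it is not a distinct count: "sale" is counted every time it appears.

```python
def value_count(docs, field):
    """Count every value of `field`: arrays count each element, missing fields count zero."""
    total = 0
    for doc in docs:
        value = doc.get(field)
        if value is None:
            continue
        total += len(value) if isinstance(value, list) else 1
    return total


# Hypothetical documents with a multi-valued `tags` field.
docs = [
    {"product_id": 1, "tags": ["sale", "new"]},
    {"product_id": 2, "tags": ["sale"]},
    {"product_id": 3},
    {"product_id": 4, "tags": ["clearance", "sale", "new"]},
]
print(value_count(docs, "product_id"))      # 4: one value per document
print(value_count(docs, "tags"))            # 6: more values than documents
print(sum(1 for d in docs if "tags" in d))  # 3 documents actually have tags
```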

Percentiles Aggregation

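The percentiles aggregation estimates percentiles over numeric values. Elasticsearch uses the TDigest algorithm for this, so the results are approximate. The percents parameter selects which percentiles to return. Let’s calculate the 25th, 50th, and 75th percentiles for the price field.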

Query:

GET /products/_search
{
  "size": 0,
  "aggs": {
    "price_percentiles": {
      "percentiles": {
        "field": "price",
        "percents": [25, 50, 75]
      }
    }
  }
}

Output:

{
  "aggregations": {
    "price_percentiles": {
      "values": {
        "25.0": 275.0,
        "50.0": 550.0,
        "75.0": 825.0
      }
    }
  }
}

In this example, the 25th, 50th, and 75th percentiles of price are 275, 550, and 825. The 50th percentile is the median.
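Even exact percentiles are not unique: on a small dataset, different interpolation rules give different answers. The Python sketch below assumes the ten prices 100 to 1000 and computes exact percentiles with the standard library's statistics.quantiles under two of its methods. The "exclusive" method happens to reproduce the output above, while "inclusive" gives different quartiles. This illustrates the definition only and is not the TDigest approximation Elasticsearch uses. exact_percentiles is a hypothetical helper that accepts whole-number percents from 1 to 99.

```python
import statistics


def exact_percentiles(values, percents, method="exclusive"):
    """Exact percentiles for whole-number percents 1..99.

    statistics.quantiles(..., n=100) returns the 99 cut points between
    hundredths, so cuts[p - 1] is the p-th percentile.
    """
    cuts = statistics.quantiles(values, n=100, method=method)
    return {str(float(p)): cuts[p - 1] for p in percents}


prices = [100 * i for i in range(1, 11)]  # assumed prices 100..1000
print(exact_percentiles(prices, [25, 50, 75]))
# {'25.0': 275.0, '50.0': 550.0, '75.0': 825.0}
print(exact_percentiles(prices, [25, 50, 75], method="inclusive"))
# {'25.0': 325.0, '50.0': 550.0, '75.0': 775.0}
```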

Percentile Ranks Aggregation

The percentile_ranks aggregation works in the opposite direction. Given specific values, it estimates the percentage of the field’s values that fall at or below each one. Let’s calculate the percentile ranks for prices 300 and 600.

Query:

GET /products/_search
{
  "size": 0,
  "aggs": {
    "price_percentile_ranks": {
      "percentile_ranks": {
        "field": "price",
        "values": [300, 600]
      }
    }
  }
}

Output:

{
  "aggregations": {
    "price_percentile_ranks": {
      "values": {
        "300.0": 30.0,
        "600.0": 60.0
      }
    }
  }
}

In this example, about 30% of prices are at or below 300 and about 60% are at or below 600.
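For exact data, a percentile rank is simply a share of a sorted list. The Python sketch below assumes the ten prices 100 to 1000 and uses bisect_right to count how many prices are less than or equal to each target. exact_percentile_ranks is a hypothetical helper, and Elasticsearch's own answer is an approximation that may differ slightly.

```python
from bisect import bisect_right


def exact_percentile_ranks(values, targets):
    """Percentage of values that are less than or equal to each target."""
    ordered = sorted(values)
    n = len(ordered)
    # bisect_right returns how many elements are <= target in a sorted list.
    return {str(float(t)): 100.0 * bisect_right(ordered, t) / n for t in targets}


prices = [100 * i for i in range(1, 11)]  # assumed prices 100..1000
print(exact_percentile_ranks(prices, [300, 600]))
# {'300.0': 30.0, '600.0': 60.0}
```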

Cardinality Aggregation

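The cardinality aggregation estimates the count of distinct values. It is based on the HyperLogLog++ algorithm, so the result is approximate. Counts below the configurable precision_threshold are expected to be close to accurate. Let’s estimate the number of distinct categories. The query targets category.keyword, the keyword sub-field that Elasticsearch’s default dynamic mapping creates for a string field such as category. Aggregating on the text field itself would require fielddata, which is disabled by default.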

Query:

GET /products/_search
{
  "size": 0,
  "aggs": {
    "distinct_categories": {
      "cardinality": {
        "field": "category.keyword"
      }
    }
  }
}

Output:

{
  "aggregations": {
    "distinct_categories": {
      "value": 3
    }
  }
}

In this example, there are 3 distinct categories.
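An exact distinct count needs memory proportional to the number of distinct values, which is why Elasticsearch estimates instead. The Python sketch below contrasts an exact count using a set with a minimal version of the original HyperLogLog estimator, using linear counting for small sets. This is a simplified relative of HyperLogLog++, not Elasticsearch’s implementation. hll_estimate, its SHA-1-based hash, and the register count of 2**12 are all illustrative choices.

```python
import hashlib
import math


def _hash64(value):
    """Deterministic 64-bit hash of a value's string form."""
    return int.from_bytes(hashlib.sha1(str(value).encode("utf-8")).digest()[:8], "big")


def hll_estimate(values, p=12):
    """Estimate the number of distinct values with HyperLogLog (2**p registers)."""
    m = 1 << p
    registers = [0] * m
    tail_bits = 64 - p
    for value in values:
        h = _hash64(value)
        index = h >> tail_bits                    # top p bits pick a register
        tail = h & ((1 << tail_bits) - 1)         # remaining bits
        rank = tail_bits - tail.bit_length() + 1  # position of the leftmost 1-bit
        registers[index] = max(registers[index], rank)

    alpha = 0.7213 / (1 + 1.079 / m)
    raw = alpha * m * m / sum(2.0 ** -r for r in registers)
    zeros = registers.count(0)
    if raw <= 2.5 * m and zeros:
        # Small-range correction: linear counting on the empty registers.
        return m * math.log(m / zeros)
    return raw


categories = ["electronics", "books", "toys", "books", "electronics"]
print(len(set(categories)))  # exact distinct count: 3

estimate = hll_estimate(range(10_000))
print(round(estimate))  # close to 10000, but not necessarily exact
```

Duplicates never change the registers, so repeating values does not inflate the estimate. Memory stays fixed at 2**p small registers however many values arrive.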

Geo Bounds Aggregation

The geo_bounds aggregation computes the bounding box that contains all values of a geo_point field. This example queries a separate locations index and assumes that its location field is mapped as geo_point.

Query:

GET /locations/_search
{
  "size": 0,
  "aggs": {
    "geo_bounds": {
      "geo_bounds": {
        "field": "location"
      }
    }
  }
}

Output:

{
  "aggregations": {
    "geo_bounds": {
      "bounds": {
        "top_left": {
          "lat": 40.73,
          "lon": -74.1
        },
        "bottom_right": {
          "lat": 40.01,
          "lon": -71.12
        }
      }
    }
  }
}

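In this example, the box is described by two corners. top_left holds the northernmost latitude and the westernmost longitude. bottom_right holds the southernmost latitude and the easternmost longitude.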
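For points that do not straddle the 180th meridian, the bounding box is just the extremes of latitude and longitude. The Python sketch below computes it for hypothetical (lat, lon) pairs whose extremes match the output above. geo_bounds here is an illustrative function. It deliberately ignores the antimeridian case, which Elasticsearch handles and controls with the wrap_longitude parameter.

```python
def geo_bounds(points):
    """Bounding box of (lat, lon) pairs, shaped like the geo_bounds response.

    Simplification: boxes crossing the antimeridian (180th meridian) are not handled.
    """
    if not points:
        return None
    lats = [lat for lat, _ in points]
    lons = [lon for _, lon in points]
    return {
        "top_left": {"lat": max(lats), "lon": min(lons)},      # north-west corner
        "bottom_right": {"lat": min(lats), "lon": max(lons)},  # south-east corner
    }


# Hypothetical points; the first two define the corners of the example output.
points = [(40.73, -74.1), (40.01, -71.12), (40.4, -72.5)]
print(geo_bounds(points))
```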

Conclusion

Metric aggregations let you perform statistical analysis directly in Elasticsearch. They compute averages, sums, minimums, maximums, percentiles, distinct counts, and geographic extents over the documents a query matches. Whether you’re summarizing sales data, analyzing user behavior, or exploring any other type of numeric data, metric aggregations are an essential tool in your Elasticsearch toolkit.