Relevance Scoring and Search Relevance in Elasticsearch

Elasticsearch is a powerful search engine that excels at full-text search, among other types of queries. One of its key features is the ability to rank search results by relevance. Relevance scoring determines how well a document matches a given search query, so that the most relevant results appear at the top.

In this article, we will walk through relevance scoring in Elasticsearch with detailed examples and outputs to make the concepts easy to learn.

Introduction to Relevance Scoring

  • Relevance scoring is a mechanism used by Elasticsearch to rank documents according to how well they match a search query.
    When we perform a search, Elasticsearch calculates a relevance score for each document which is then used to sort the search results.
  • The default relevance scoring algorithm in Elasticsearch is BM25 (the default since version 5.0), a refinement of the TF-IDF (Term Frequency-Inverse Document Frequency) model.
  • BM25 considers several factors, including term frequency, inverse document frequency, and field length normalization, to compute a score.

Key Concepts of Relevance Scoring

  • Term Frequency (TF): Measures how often a term appears in a document. The more frequently a term appears, the higher its contribution to the relevance score, although in BM25 each additional occurrence adds less than the one before.
  • Inverse Document Frequency (IDF): Measures how rare a term is across all documents. Terms that appear in many documents have lower IDF values, reducing their impact on the relevance score.
  • Field Length Normalization: Adjusts the score based on the length of the field. A match in a short field counts for more than the same match in a long field, where the term makes up a smaller share of the text.
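The three factors above can be combined in a few lines. The following is a minimal sketch of the textbook BM25 formula for a single query term, not Elasticsearch's actual implementation. The `tokenize` helper is only a rough stand-in for a real analyzer, the function names are made up for illustration, and `k1 = 1.2` and `b = 0.75` are the commonly cited default values. Lucene-based engines may differ from this form by a constant factor, which does not change the ordering of results. The corpus is the set of product descriptions used in the examples later in this article.

```python
import math
import re


def tokenize(text):
    """Lowercase and split on non-alphanumerics (rough stand-in for an analyzer)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def bm25_term_score(term, doc_tokens, corpus_tokens, k1=1.2, b=0.75):
    """Textbook BM25 score of one term for one document (illustrative only)."""
    n_docs = len(corpus_tokens)
    docs_with_term = sum(1 for tokens in corpus_tokens if term in tokens)

    # Inverse document frequency: rare terms get a larger weight.
    idf = math.log(1 + (n_docs - docs_with_term + 0.5) / (docs_with_term + 0.5))

    # Term frequency: raw count of the term in this document.
    tf = doc_tokens.count(term)

    # Field length normalization: longer-than-average documents are penalized.
    avg_len = sum(len(tokens) for tokens in corpus_tokens) / n_docs
    length_norm = 1 - b + b * len(doc_tokens) / avg_len

    # The tf part saturates: each extra occurrence adds less than the last.
    return idf * tf * (k1 + 1) / (tf + k1 * length_norm)


descriptions = [
    "High-quality wireless headphones with noise-canceling technology.",
    "A powerful smartphone with a high-resolution display.",
    "Thin and light laptop with long battery life.",
    "Fitness tracker with heart rate monitor and GPS.",
    "10-inch tablet with quad-core processor.",
]
corpus = [tokenize(d) for d in descriptions]

scores = [bm25_term_score("smartphone", doc, corpus) for doc in corpus]
print([round(s, 3) for s in scores])
```

Only the second description contains "smartphone", so it is the only one with a non-zero score. Scoring a term such as "with", which occurs in every description, yields a much smaller value than a rare term such as "wireless", which is the IDF effect described above.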

Basic Examples of Relevance Scoring and Search Relevance in Elasticsearch

To illustrate relevance scoring and search relevance in Elasticsearch, we will use the following products collection:

[
  {
    "title": "Wireless Headphones",
    "description": "High-quality wireless headphones with noise-canceling technology.",
    "price": 99.99,
    "popularity": 100
  },
  {
    "title": "Smartphone",
    "description": "A powerful smartphone with a high-resolution display.",
    "price": 499.99,
    "popularity": 200
  },
  {
    "title": "Laptop",
    "description": "Thin and light laptop with long battery life.",
    "price": 899.99,
    "popularity": 150
  },
  {
    "title": "Smart Watch",
    "description": "Fitness tracker with heart rate monitor and GPS.",
    "price": 199.99,
    "popularity": 75
  },
  {
    "title": "Tablet",
    "description": "10-inch tablet with quad-core processor.",
    "price": 299.99,
    "popularity": 120
  }
]
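The outputs in the examples below refer to these documents by `_id` values "1" through "5", which suggests they were indexed in the order listed. As a hedged sketch of one way to load them, the following builds a newline-delimited body in the format expected by the Bulk API (`POST /_bulk` with the `application/x-ndjson` content type). The `build_bulk_body` helper and the sequential ID assignment are assumptions made for illustration, not something the original setup confirms.

```python
import json

products = [
    {"title": "Wireless Headphones",
     "description": "High-quality wireless headphones with noise-canceling technology.",
     "price": 99.99, "popularity": 100},
    {"title": "Smartphone",
     "description": "A powerful smartphone with a high-resolution display.",
     "price": 499.99, "popularity": 200},
    {"title": "Laptop",
     "description": "Thin and light laptop with long battery life.",
     "price": 899.99, "popularity": 150},
    {"title": "Smart Watch",
     "description": "Fitness tracker with heart rate monitor and GPS.",
     "price": 199.99, "popularity": 75},
    {"title": "Tablet",
     "description": "10-inch tablet with quad-core processor.",
     "price": 299.99, "popularity": 120},
]


def build_bulk_body(docs, index_name="products"):
    """Return an NDJSON bulk body: one action line plus one source line per doc."""
    lines = []
    for doc_id, doc in enumerate(docs, start=1):
        # Action line: index this document under an explicit, sequential _id.
        lines.append(json.dumps({"index": {"_index": index_name, "_id": str(doc_id)}}))
        # Source line: the document itself.
        lines.append(json.dumps(doc))
    # The bulk format requires a trailing newline after the last line.
    return "\n".join(lines) + "\n"


bulk_body = build_bulk_body(products)
print(bulk_body)
```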

Let’s start with a simple example using a match query to see how relevance scoring works.

Example 1: Match Query

Let’s retrieve all products whose description contains the term “smartphone.”

GET /products/_search
{
  "query": {
    "match": {
      "description": "smartphone"
    }
  }
}

Output:

"hits" : [
  {
    "_id" : "2",
    "_source" : {
      "title" : "Smartphone",
      "description" : "A powerful smartphone with a high-resolution display.",
      "price" : 499.99,
      "popularity" : 200
    }
  }
]

Explanation: This query searches the “products” index for documents whose “description” field contains the term “smartphone.” It returns every document that matches this criterion, ranked by relevance score.

Example 2: Boosting with Multi-Match Query

Let’s search for products with either “smartphone” or “tablet” in the title or description, giving more weight to matches in the title.

GET /products/_search
{
  "query": {
    "multi_match": {
      "query": "smartphone tablet",
      "fields": ["title^2", "description"]
    }
  }
}

Output:

"hits" : [
  {
    "_id" : "2",
    "_source" : {
      "title" : "Smartphone",
      "description" : "A powerful smartphone with a high-resolution display.",
      "price" : 499.99,
      "popularity" : 200
    }
  },
  {
    "_id" : "5",
    "_source" : {
      "title" : "Tablet",
      "description" : "10-inch tablet with quad-core processor.",
      "price" : 299.99,
      "popularity" : 120
    }
  }
]

Explanation: This query searches the “products” index for documents whose “title” or “description” field contains the term “smartphone” or “tablet.” The ^2 notation doubles the weight of matches in the “title” field relative to matches in the “description” field.
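To see how a field boost changes the outcome, here is a simplified sketch of how per-field scores are combined when `multi_match` runs with its default `best_fields` type: each field's score is multiplied by its boost, the best field wins, and the other fields contribute only through an optional `tie_breaker`. The per-field scores below are hypothetical numbers chosen for illustration, and `best_fields_score` is a made-up helper, not an Elasticsearch API.

```python
def best_fields_score(field_scores, boosts, tie_breaker=0.0):
    """Combine per-field scores the way a best_fields multi_match does (sketch)."""
    # Apply each field's boost; fields without an explicit boost use 1.0.
    boosted = [score * boosts.get(field, 1.0) for field, score in field_scores.items()]
    if not boosted:
        return 0.0
    best = max(boosted)
    # The best field dominates; other fields add tie_breaker times their score.
    return best + tie_breaker * (sum(boosted) - best)


# Hypothetical raw scores for one document that matches in both fields.
raw_scores = {"title": 1.4, "description": 1.1}

unboosted = best_fields_score(raw_scores, boosts={})
title_boosted = best_fields_score(raw_scores, boosts={"title": 2.0})
with_tie_breaker = best_fields_score(raw_scores, boosts={"title": 2.0}, tie_breaker=0.3)

print(unboosted, title_boosted, with_tie_breaker)
```

With the title boosted, a title match counts twice as much as it would otherwise, so documents that match in the title tend to move ahead of documents that match only in the description.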

Example 3: Custom Scoring with Function Score Query

Let’s retrieve all products, boosting their relevance based on the popularity of each product. The popularity is used as a factor in the relevance score calculation, with a square root modifier to moderate the boost effect.

GET /products/_search
{
  "query": {
    "function_score": {
      "query": {
        "match_all": {}
      },
      "boost_mode": "multiply",
      "functions": [
        {
          "field_value_factor": {
            "field": "popularity",
            "factor": 1,
            "modifier": "sqrt"
          }
        }
      ]
    }
  }
}

Output:

"hits" : [
  {
    "_id" : "2",
    "_source" : {
      "title" : "Smartphone",
      "description" : "A powerful smartphone with a high-resolution display.",
      "price" : 499.99,
      "popularity" : 200
    },
    "_score" : 14.142136
  },
  {
    "_id" : "3",
    "_source" : {
      "title" : "Laptop",
      "description" : "Thin and light laptop with long battery life.",
      "price" : 899.99,
      "popularity" : 150
    },
    "_score" : 12.247448
  },
  {
    "_id" : "5",
    "_source" : {
      "title" : "Tablet",
      "description" : "10-inch tablet with quad-core processor.",
      "price" : 299.99,
      "popularity" : 120
    },
    "_score" : 10.954451
  },
  {
    "_id" : "1",
    "_source" : {
      "title" : "Wireless Headphones",
      "description" : "High-quality wireless headphones with noise-canceling technology.",
      "price" : 99.99,
      "popularity" : 100
    },
    "_score" : 10
  },
  {
    "_id" : "4",
    "_source" : {
      "title" : "Smart Watch",
      "description" : "Fitness tracker with heart rate monitor and GPS.",
      "price" : 199.99,
      "popularity" : 75
    },
    "_score" : 8.6602545
  }
]

Explanation: This query retrieves all documents in the “products” index and boosts their relevance based on the “popularity” field. Because match_all gives every document the same base score, the ranking is driven entirely by popularity, and the square root modifier moderates the boost so that very popular products do not dominate as strongly.
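The arithmetic behind these scores can be checked by hand. The sketch below is a simplified model rather than Elasticsearch code: it assumes `match_all` scores every document 1.0, that the query score and function score are multiplied together (the `multiply` boost mode), and a factor of 1, which is the combination that reproduces the `_score` values listed in the output. The helper names are made up for illustration, and only a few modifiers and boost modes are modeled.

```python
import math

# A small subset of field_value_factor modifiers, modeled for illustration.
MODIFIERS = {
    "none": lambda x: x,
    "sqrt": math.sqrt,
    "square": lambda x: x * x,
}


def field_value_factor(value, factor=1.0, modifier="none"):
    """Apply the modifier to factor * field value, e.g. sqrt(factor * popularity)."""
    return MODIFIERS[modifier](factor * value)


def apply_boost_mode(query_score, function_score, boost_mode="multiply"):
    """Combine the query score with the function score (subset of modes)."""
    if boost_mode == "multiply":
        return query_score * function_score
    if boost_mode == "sum":
        return query_score + function_score
    if boost_mode == "replace":
        return function_score
    raise ValueError(f"boost_mode not modeled in this sketch: {boost_mode}")


MATCH_ALL_SCORE = 1.0  # assumed base score for every match_all hit

popularity = {
    "Wireless Headphones": 100,
    "Smartphone": 200,
    "Laptop": 150,
    "Smart Watch": 75,
    "Tablet": 120,
}

scores = {
    title: apply_boost_mode(
        MATCH_ALL_SCORE, field_value_factor(value, factor=1.0, modifier="sqrt")
    )
    for title, value in popularity.items()
}

# Print products from highest to lowest score, as the search would rank them.
ranked = sorted(scores, key=scores.get, reverse=True)
for title in ranked:
    print(f"{title}: {scores[title]:.6f}")
```

Without the square root, the Smartphone (popularity 200) would score twice as high as the Wireless Headphones (popularity 100). With it, the ratio drops to roughly 1.41, which is the moderating effect described above.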

Practical Tips for Improving Search Relevance

  • Analyze User Behavior: Monitor how users interact with your search results and adjust relevance parameters based on their behavior.
  • Use Synonyms: Implement a synonym filter to handle different terms that mean the same thing, improving the relevance of search results.
  • Boost Important Fields: Use field boosts to emphasize the importance of certain fields in your documents.
  • Experiment with Scoring Functions: Try different scoring functions and parameters to find the best combination for your specific use case.
  • Optimize Index Settings: Fine-tune index settings such as the BM25 parameters k1 (how quickly term frequency saturates) and b (how strongly field length is normalized) to better align with your data and search requirements.
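As a rough illustration of the synonym and index-settings tips, the sketch below assembles an index definition that could be sent as the body of an index creation request such as `PUT /products`. The filter name `product_synonyms`, the analyzer name `product_text`, and the synonym pairs are hypothetical, and the `k1` and `b` values shown are simply the BM25 defaults as a starting point for tuning. Treat it as a sketch to adapt, and check the documentation for your Elasticsearch version before relying on it.

```python
import json

index_definition = {
    "settings": {
        "index": {
            # BM25 parameters: k1 controls term-frequency saturation,
            # b controls how strongly field length is normalized.
            "similarity": {
                "default": {"type": "BM25", "k1": 1.2, "b": 0.75}
            }
        },
        "analysis": {
            "filter": {
                # Hypothetical synonym list: each entry groups equivalent terms.
                "product_synonyms": {
                    "type": "synonym",
                    "synonyms": ["phone, smartphone", "notebook, laptop"],
                }
            },
            "analyzer": {
                # Hypothetical analyzer that lowercases, then expands synonyms.
                "product_text": {
                    "type": "custom",
                    "tokenizer": "standard",
                    "filter": ["lowercase", "product_synonyms"],
                }
            },
        },
    },
    "mappings": {
        "properties": {
            "title": {"type": "text", "analyzer": "product_text"},
            "description": {"type": "text", "analyzer": "product_text"},
        }
    },
}

request_body = json.dumps(index_definition, indent=2)
print(request_body)
```

With an analyzer like this, a search for "phone" could also match documents that only mention "smartphone". Lowering `b` reduces the penalty on long descriptions, and lowering `k1` makes repeated terms matter less.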

Conclusion

Understanding relevance scoring and search relevance in Elasticsearch is crucial for building effective search applications. By applying the concepts and techniques discussed in this article, you can improve the quality and relevance of your search results and ensure that users find the most relevant information quickly and easily.

Remember, relevance scoring is an iterative process. Continuously monitor, analyze, and adjust your search configurations to adapt to changing data and user behavior.