Default Analyzer and Tokenizer
Elasticsearch uses the standard analyzer by default. It combines the standard tokenizer with a lowercase token filter (and, optionally, a stop-word filter, which is disabled by default). Let’s see an example to understand how the default analyzer works.
Example: Default Analyzer
Consider the following text: “Elasticsearch is a powerful search engine”
GET /_analyze
{
  "text": "Elasticsearch is a powerful search engine"
}
Output:
{
  "tokens": [
    { "token": "elasticsearch", "start_offset": 0, "end_offset": 13, "type": "<ALPHANUM>", "position": 0 },
    { "token": "is", "start_offset": 14, "end_offset": 16, "type": "<ALPHANUM>", "position": 1 },
    { "token": "a", "start_offset": 17, "end_offset": 18, "type": "<ALPHANUM>", "position": 2 },
    { "token": "powerful", "start_offset": 19, "end_offset": 27, "type": "<ALPHANUM>", "position": 3 },
    { "token": "search", "start_offset": 28, "end_offset": 34, "type": "<ALPHANUM>", "position": 4 },
    { "token": "engine", "start_offset": 35, "end_offset": 41, "type": "<ALPHANUM>", "position": 5 }
  ]
}
In this example:
- The text is tokenized into individual words.
- Each token is lowercased by the lowercase filter (“Elasticsearch” becomes “elasticsearch”).
- Each token includes information such as the start and end offsets (character positions in the original text) and the token’s position in the token stream.
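To build intuition for what the standard analyzer does, here is a rough local approximation in Python. This is only a sketch: the real standard tokenizer implements Unicode text segmentation (UAX #29), which the simple word regex below does not fully capture, and the function name `standard_analyze` is our own, not part of any Elasticsearch client.

```python
import re

def standard_analyze(text):
    """Rough approximation of Elasticsearch's standard analyzer:
    find word runs, record their character offsets, and lowercase
    each token. (The real analyzer uses Unicode segmentation rules,
    so edge cases will differ.)"""
    return [
        {
            "token": match.group().lower(),   # lowercase filter
            "start_offset": match.start(),    # first character of the token
            "end_offset": match.end(),        # one past the last character
            "position": position,             # position in the token stream
        }
        for position, match in enumerate(re.finditer(r"\w+", text))
    ]

tokens = standard_analyze("Elasticsearch is a powerful search engine")
for token in tokens:
    print(token)
```

Running this reproduces the token text, offsets, and positions shown in the _analyze output above, which makes it a handy mental model even though it is not the real implementation.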
Full Text Search with Analyzer and Tokenizer
Elasticsearch is renowned for its powerful full-text search capabilities. At the heart of this functionality are analyzers and tokenizers, which play a crucial role in how text is processed and indexed. This guide will help you understand how analyzers and tokenizers work in Elasticsearch, with detailed examples and outputs to make these concepts easy to grasp.
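One practical way to explore this is that the _analyze API also accepts an explicit analyzer name, so you can compare how different analyzers process the same text. For example, naming the standard analyzer explicitly produces the same tokens as the default request shown earlier:

```
GET /_analyze
{
  "analyzer": "standard",
  "text": "Elasticsearch is a powerful search engine"
}
```

Swapping "standard" for another built-in analyzer (such as "whitespace" or "english") is a quick way to see how tokenization and filtering choices change the resulting token stream.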