Default Analyzer and Tokenizer
Elasticsearch uses the standard analyzer by default. It combines the standard tokenizer with a lowercase token filter (and, optionally, a stop-word filter, which is disabled by default). Let’s see an example to understand how the default analyzer works.
Example: Default Analyzer
Consider the following text: “Elasticsearch is a powerful search engine”
GET /_analyze
{
  "text": "Elasticsearch is a powerful search engine"
}
Output:
{
  "tokens": [
    { "token": "elasticsearch", "start_offset": 0, "end_offset": 13, "type": "<ALPHANUM>", "position": 0 },
    { "token": "is", "start_offset": 14, "end_offset": 16, "type": "<ALPHANUM>", "position": 1 },
    { "token": "a", "start_offset": 17, "end_offset": 18, "type": "<ALPHANUM>", "position": 2 },
    { "token": "powerful", "start_offset": 19, "end_offset": 27, "type": "<ALPHANUM>", "position": 3 },
    { "token": "search", "start_offset": 28, "end_offset": 34, "type": "<ALPHANUM>", "position": 4 },
    { "token": "engine", "start_offset": 35, "end_offset": 41, "type": "<ALPHANUM>", "position": 5 }
  ]
}
In this example:
- The text is tokenized into individual words.
- Each token is lowercased by the lowercase filter (“Elasticsearch” becomes “elasticsearch”).
- Each token includes information such as the start and end offsets (character positions in the original text) and the token’s position in the token stream.
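To build intuition for what the standard analyzer does, here is a rough local approximation in Python. This is only a sketch: the real standard tokenizer implements Unicode text segmentation (UAX #29), which the simple word regex below does not fully capture, and the function name `standard_analyze` is our own, not part of any Elasticsearch client.

```python
import re

def standard_analyze(text):
    """Rough approximation of Elasticsearch's standard analyzer:
    find word runs, record their character offsets, and lowercase
    each token. (The real analyzer uses Unicode segmentation rules,
    so edge cases will differ.)"""
    return [
        {
            "token": match.group().lower(),   # lowercase filter
            "start_offset": match.start(),    # first character of the token
            "end_offset": match.end(),        # one past the last character
            "position": position,             # position in the token stream
        }
        for position, match in enumerate(re.finditer(r"\w+", text))
    ]

tokens = standard_analyze("Elasticsearch is a powerful search engine")
for token in tokens:
    print(token)
```

Running this reproduces the token text, offsets, and positions shown in the _analyze output above, which makes it a handy mental model even though it is not the real implementation.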
Full Text Search with Analyzer and Tokenizer
Elasticsearch is renowned for its powerful full-text search capabilities. At the heart of this functionality are analyzers and tokenizers, which play a crucial role in how text is processed and indexed. This guide will help you understand how analyzers and tokenizers work in Elasticsearch, with detailed examples and outputs to make these concepts easy to grasp.
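One practical way to explore this is that the _analyze API also accepts an explicit analyzer name, so you can compare how different analyzers process the same text. For example, naming the standard analyzer explicitly produces the same tokens as the default request shown earlier:

```
GET /_analyze
{
  "analyzer": "standard",
  "text": "Elasticsearch is a powerful search engine"
}
```

Swapping "standard" for another built-in analyzer (such as "whitespace" or "english") is a quick way to see how tokenization and filtering choices change the resulting token stream.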