Custom Analyzers
Sometimes the default analyzer does not fit your needs, and you may need to create a custom analyzer. A custom analyzer lets you combine specific character filters, a tokenizer, and token filters to tailor the text analysis process.
Example: Custom Analyzer
Let’s create a custom analyzer that:
- Converts text to lowercase.
- Uses a whitespace tokenizer.
- Removes English stop words.
PUT /my_index
{
  "settings": {
    "analysis": {
      "analyzer": {
        "custom_analyzer": {
          "type": "custom",
          "tokenizer": "whitespace",
          "filter": [
            "lowercase",
            "stop"
          ]
        }
      }
    }
  }
}
Now, let’s analyze some text using this custom analyzer.
GET /my_index/_analyze
{
  "analyzer": "custom_analyzer",
  "text": "Elasticsearch is a powerful search engine"
}
Output:
{
  "tokens": [
    { "token": "elasticsearch", "start_offset": 0, "end_offset": 13, "type": "word", "position": 0 },
    { "token": "powerful", "start_offset": 19, "end_offset": 27, "type": "word", "position": 3 },
    { "token": "search", "start_offset": 28, "end_offset": 34, "type": "word", "position": 4 },
    { "token": "engine", "start_offset": 35, "end_offset": 41, "type": "word", "position": 5 }
  ]
}
In this example:
- The text is converted to lowercase.
- The text is tokenized based on whitespace.
- Stop words ("is", "a") are removed from the tokens.
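The pipeline above can be sketched in plain Python, without a running Elasticsearch cluster. This is only an illustration of the three stages, not Elasticsearch's actual implementation, and the stop-word set here is a small hand-picked subset rather than the full English list the stop filter uses.

```python
# Illustrative subset of English stop words (the real stop filter
# uses Elasticsearch's built-in "_english_" list).
STOP_WORDS = {"a", "an", "and", "is", "it", "of", "the", "to"}

def custom_analyze(text):
    """Simulate the custom analyzer: whitespace tokenizer,
    lowercase filter, then stop-word filter."""
    tokens = text.split()                        # whitespace tokenizer
    tokens = [t.lower() for t in tokens]         # lowercase token filter
    return [t for t in tokens if t not in STOP_WORDS]  # stop token filter

print(custom_analyze("Elasticsearch is a powerful search engine"))
# ['elasticsearch', 'powerful', 'search', 'engine']
```

The order matters: lowercasing runs before the stop filter, so "Is" and "is" are both removed, which mirrors how the `filter` array in the index settings is applied in sequence.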
Full Text Search with Analyzer and Tokenizer
Elasticsearch is renowned for its powerful full-text search capabilities. At the heart of this functionality are analyzers and tokenizers, which play a crucial role in how text is processed and indexed. This guide will help you understand how analyzers and tokenizers work in Elasticsearch, with detailed examples and outputs to make these concepts easy to grasp.