This tutorial is built for ElasticSearch version 5.2. Version 5 features a bunch of breaking changes in terms of query DSL and mapping.
If you're still running version 2.x, please have a look at the v2 branch of this repository.
I've lined up a bunch of examples to showcase the features and the sheer power of ElasticSearch. A lot of the information is based on "Elasticsearch: The Definitive Guide".
Download ElasticSearch & Kibana here, then follow these simple steps:
./bin/elasticsearch
./bin/kibana
Exercise 1 is very simple: the goal is to get the hang of the ElasticSearch RESTful interface.
Topics:
Load exercise 1
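To give an idea of what talking to the REST interface looks like outside of Kibana, here is a minimal sketch in Python. It assumes a local node on the default port 9200; the index "demo", the type "doc" and the document itself are made up for illustration and are not part of the exercise files.

```python
import json
import urllib.request

ES_URL = "http://localhost:9200"  # assumption: local node on the default HTTP port


def build_request(method, path, body=None):
    """Build (but do not send) an HTTP request for the REST API."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    return urllib.request.Request(
        ES_URL + path,
        data=data,
        headers={"Content-Type": "application/json"},
        method=method,
    )


def send(request):
    """Send the request and decode the JSON response (needs a running node)."""
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


# Hypothetical document: index "demo", type "doc", id 1.
index_req = build_request("PUT", "/demo/doc/1", {"title": "Hello"})
get_req = build_request("GET", "/demo/doc/1")
```

With a node running, `send(index_req)` would store the document and `send(get_req)` would fetch it again. The HTTP verb carries the intent: PUT to store, GET to read, DELETE to remove.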
In exercise 2, we will be indexing a lot of data. To improve performance, we're doing this in bulk.
This data contains information from the Combell blog. I've indexed the following information:
This data will be used in the other exercises.
Load the blog data in bulk
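The bulk API expects newline-delimited JSON: one action line followed by one document line, with a newline at the very end. The sketch below builds such a body; the type name "post" and the two sample documents are assumptions for illustration, not the actual blog data.

```python
import json


def to_bulk_body(index, doc_type, docs):
    """Turn (id, document) pairs into a newline-delimited bulk body."""
    lines = []
    for doc_id, doc in docs:
        # Action line: tells the node where the next document goes.
        action = {"index": {"_index": index, "_type": doc_type, "_id": doc_id}}
        lines.append(json.dumps(action))
        # Source line: the document itself, on a single line.
        lines.append(json.dumps(doc))
    # The bulk body must end with a newline.
    return "\n".join(lines) + "\n"


# Hypothetical sample documents.
posts = [(1, {"title": "First post"}), (2, {"title": "Second post"})]
body = to_bulk_body("blog", "post", posts)
print(body)
```

The resulting string is what you would POST to the `/_bulk` endpoint. Sending many documents in one request avoids the overhead of one HTTP round trip per document.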
In exercise 3 we're performing some basic queries using the ElasticSearch query DSL. The DSL is JSON-based and the queries are full-text searches.
Here are a couple of the searches we're performing:
Load exercise 3
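As an illustration of what a JSON-based full-text query looks like, here is a small sketch that builds two request bodies. The search terms are made up, and these are not necessarily the exact searches used in the exercise.

```python
import json


def match_query(field, text):
    """Full-text search: the text is analyzed, any matching term counts."""
    return {"query": {"match": {field: text}}}


def match_phrase_query(field, text):
    """Full-text search where the terms must appear as a phrase."""
    return {"query": {"match_phrase": {field: text}}}


# Hypothetical search terms on the "title" field.
loose = match_query("title", "managed hosting")
strict = match_phrase_query("title", "managed hosting")
print(json.dumps(loose, indent=2))
print(json.dumps(strict, indent=2))
```

Either body can be sent to the `_search` endpoint of an index, for example `/blog/_search`.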
In exercise 4, we're going to focus on the analysis of full-text and human language. We'll ignore the database capabilities of ElasticSearch and throw some text at it, and see how it tokenizes the data.
Depending on the analyzer you use, ElasticSearch will tokenize and store the data in a different way. Don't worry: the original data remains in the source of the document. It's only the inverted index that changes.
Load exercise 4
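To make the idea of "the analyzer changes the inverted index, not the source" more tangible, here is a toy model in plain Python. It is only a sketch: the two analyzer functions are simplified stand-ins, not the real ElasticSearch analyzers, and the sample sentences are made up.

```python
import re
from collections import defaultdict


def simple_analyze(text):
    """Toy analyzer: lowercase, then split on anything that isn't a-z or 0-9."""
    return [token for token in re.split(r"[^0-9a-z]+", text.lower()) if token]


def whitespace_analyze(text):
    """Toy analyzer: split on whitespace only, keep case and punctuation."""
    return text.split()


def build_inverted_index(docs, analyze):
    """Map every token to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in analyze(text):
            index[token].add(doc_id)
    return index


# Hypothetical documents.
docs = {1: "Quick brown Fox", 2: "The quick-witted fox"}
lowercased = build_inverted_index(docs, simple_analyze)
verbatim = build_inverted_index(docs, whitespace_analyze)
```

With the lowercasing analyzer, a lookup for "fox" finds both documents. With the whitespace-only analyzer, "fox" and "Fox" are different tokens and "quick-witted" stays in one piece. In both cases `docs` itself is untouched, which mirrors how the original text stays available in the source of the document.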
Exercise 5 is all about the schema of an index. ElasticSearch is marketed as being schemaless. In reality, ElasticSearch will guess the schema for you.
I'll show you examples where it guesses successfully and examples where it doesn't.
Load exercise 5
To keep ElasticSearch from guessing the schema wrong, explicit mapping is a good idea. Exercise 6 will set up the right mapping for our blog example and re-insert the data.
Integers and strings will be defined accordingly and the date will have the right format.
The explicit mapping will be used in exercise 7.
Load exercise 6
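An explicit mapping is sent as the body of the request that creates the index. The sketch below shows the general shape for version 5; the type name "post", the field names and the date format are assumptions for illustration and may differ from the mapping in the exercise.

```python
import json


def blog_mapping():
    """Request body for creating an index with an explicit mapping."""
    return {
        "mappings": {
            "post": {  # hypothetical type name
                "properties": {
                    # Hypothetical fields: a full-text string, an integer, a date.
                    "title": {"type": "text"},
                    "comments": {"type": "integer"},
                    "date": {"type": "date", "format": "yyyy-MM-dd HH:mm:ss"},
                }
            }
        }
    }


body = blog_mapping()
print(json.dumps(body, indent=2))
```

Because a field's type cannot be changed once documents have been indexed with it, the usual approach is to delete the index, create it again with the new mapping and re-insert the data.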
The 2 searches in exercise 5 that failed will now be executed again. Thanks to explicit mapping, the output will be correct.
Load exercise 7
In exercise 8, we will define yet another mapping on our blog index. This mapping only treats the "title" field as full-text. The rest of the strings will not be analyzed and tokenized. They will be stored "as is".
This data will be used in exercise 9.
Load exercise 8
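A mapping along these lines could look like the sketch below: one full-text field that also gets a "keyword" sub-field, and every other string stored as an exact value. The type name "post" and the field names "author" and "category" are hypothetical.

```python
import json


def keyword_mapping(full_text_field, exact_fields):
    """Mapping with one analyzed field; all other strings are exact values."""
    # "keyword" fields are not analyzed: the value is indexed as is.
    properties = {name: {"type": "keyword"} for name in exact_fields}
    # The full-text field is analyzed, with an exact-value sub-field next to it.
    properties[full_text_field] = {
        "type": "text",
        "fields": {"keyword": {"type": "keyword"}},
    }
    return {"mappings": {"post": {"properties": properties}}}


# Hypothetical exact-value fields.
body = keyword_mapping("title", ["author", "category"])
print(json.dumps(body, indent=2))
```

The sub-field is addressed with a dot, so the exact value of the title lives in "title.keyword".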
In exercise 9, I'll show you the difference between full-text searches using queries and exact value matches using queries in filter mode.
The mapping from exercise 8 ensures there is now a "keyword" sub-field on the "title" property. This means that queries on "title" are treated as full-text searches, while boolean filters on the "title.keyword" field are treated as exact value matches.
In one of the examples, I'll also show you how to combine multiple queries and filters.
This is what we'll do in this exercise:
Load exercise 9
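The difference between the two modes, and how they combine, can be sketched as follows. The field "category" and all search values are made up for illustration; only "title" and "title.keyword" come from the text above.

```python
import json


def full_text_search(text):
    """Query mode: the text is analyzed and results are scored by relevance."""
    return {"query": {"match": {"title": text}}}


def exact_title_filter(title):
    """Filter mode: yes/no match on the exact, unanalyzed title."""
    return {"query": {"bool": {"filter": [{"term": {"title.keyword": title}}]}}}


def combined_search(text, field, value):
    """Full-text query on the title, narrowed down by an exact-value filter."""
    return {
        "query": {
            "bool": {
                "must": [{"match": {"title": text}}],
                "filter": [{"term": {field: value}}],
            }
        }
    }


# Hypothetical values.
combined = combined_search("hosting", "category", "News")
print(json.dumps(combined, indent=2))
```

Clauses under "must" contribute to the relevance score, while clauses under "filter" only include or exclude documents.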
We will again remap the data. This time, we will treat the "title" property as an analyzed field. By default the "standard" analyzer is used. Because our data is both in Dutch and English, I added 2 fields:
This is the final version of the mapping. The other examples will use this mapping and data.
Load exercise 10
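One way to add language-specific fields is through sub-fields that each use a built-in language analyzer. In the sketch below, the sub-field names "nl" and "en" are assumptions and may not match the names used in the exercise; "dutch" and "english" are analyzers that ship with ElasticSearch.

```python
import json


def title_property():
    """Mapping for a title that is analyzed three ways."""
    return {
        "title": {
            "type": "text",  # no analyzer given, so "standard" is used
            "fields": {
                # Hypothetical sub-field names, one per language.
                "nl": {"type": "text", "analyzer": "dutch"},
                "en": {"type": "text", "analyzer": "english"},
            },
        }
    }


prop = title_property()
print(json.dumps(prop, indent=2))
```

The same title text is indexed once per analyzer, so "title", "title.nl" and "title.en" can each be searched separately.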
Exercise 11 is all about the analysis of text, based on the language. Exercise 4 was a hint towards the analysis of data. Now we'll actually perform searches that depend on language analysis.
Load exercise 11
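A search that takes language analysis into account could look like the sketch below. It assumes language-specific sub-fields called "title.nl" and "title.en", which are hypothetical names, and the search term is made up.

```python
import json


def language_aware_search(text, fields):
    """Search the same text across several differently analyzed fields."""
    return {
        "query": {
            "multi_match": {
                "query": text,
                "fields": fields,
                # Combine the scores of every field that matches.
                "type": "most_fields",
            }
        }
    }


# Hypothetical sub-field names and search term.
body = language_aware_search("servers", ["title", "title.nl", "title.en"])
print(json.dumps(body, indent=2))
```

Each field applies its own analyzer to the search text, so a language-specific field can match word forms that the plain "title" field would miss.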
In exercise 12, we'll create a new "cities" index that contains all the cities located in the West-Vlaanderen province of Belgium. The index stores the name of each city and its geo coordinates.
The explicit mapping and the data will be used in other exercises.
Load exercise 12
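Geo coordinates need an explicit "geo_point" mapping, because ElasticSearch won't detect them on its own. The sketch below shows a possible mapping and document; the type name "city", the field names "name" and "location" and the approximate coordinates for Brugge are assumptions for illustration.

```python
import json


def cities_mapping():
    """Request body for creating an index that stores a name and a location."""
    return {
        "mappings": {
            "city": {  # hypothetical type name
                "properties": {
                    "name": {"type": "keyword"},
                    "location": {"type": "geo_point"},
                }
            }
        }
    }


def city_document(name, lat, lon):
    """Build a city document, rejecting coordinates that are out of range."""
    if not -90 <= lat <= 90 or not -180 <= lon <= 180:
        raise ValueError("latitude or longitude out of range")
    return {"name": name, "location": {"lat": lat, "lon": lon}}


# Approximate coordinates, for illustration only.
brugge = city_document("Brugge", 51.21, 3.22)
print(json.dumps(cities_mapping(), indent=2))
print(json.dumps(brugge))
```

A "geo_point" can be written in several formats. The object form with "lat" and "lon" keys is used here because it leaves no doubt about the order of the two numbers.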
In the previous exercise, we created a new index and indexed some geo data. In exercise 13, we'll actually perform searches on this data.
2 queries will be showcased:
Load exercise 13
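Two common kinds of geo searches are "everything within a certain distance of a point" and "everything inside a rectangle". The sketch below builds both as filters. It is not necessarily what the exercise runs, and the field name "location" and all coordinates are assumptions.

```python
import json


def within_distance(lat, lon, distance):
    """Filter on documents within a given distance of a point."""
    return {
        "query": {
            "bool": {
                "filter": {
                    "geo_distance": {
                        "distance": distance,
                        "location": {"lat": lat, "lon": lon},
                    }
                }
            }
        }
    }


def within_box(top, left, bottom, right):
    """Filter on documents inside a bounding box."""
    return {
        "query": {
            "bool": {
                "filter": {
                    "geo_bounding_box": {
                        "location": {
                            "top_left": {"lat": top, "lon": left},
                            "bottom_right": {"lat": bottom, "lon": right},
                        }
                    }
                }
            }
        }
    }


# Hypothetical search area.
nearby = within_distance(51.21, 3.22, "10km")
box = within_box(51.4, 2.5, 50.7, 3.5)
print(json.dumps(nearby, indent=2))
```

Both bodies go to the `_search` endpoint of the index that holds the geo data. The distance is a string with a unit, such as "10km".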
In exercise 14, we'll load data into yet another index. This index is called "cars" and it contains car sales information. Every transaction keeps track of the following information:
This information will be used in exercise 15.
Load exercise 14
Aggregations are a very powerful feature of ElasticSearch. They're basically like "group by" in SQL, but way more powerful. Aggregations are the reason why ElasticSearch is popular in the big data and data science community.
These are the aggregations we'll execute in this exercise:
Load exercise 15
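To show what the "group by" comparison means in practice, here is a sketch of a request that groups documents by one field and calculates an average per group. The field names "make" and "price" are hypothetical and may not match the fields in the "cars" index.

```python
import json


def average_per_group(group_field, value_field):
    """Group documents by one field and average another field per group."""
    return {
        "size": 0,  # only return the aggregation results, no documents
        "aggs": {
            "groups": {
                # One bucket per distinct value, like GROUP BY.
                "terms": {"field": group_field},
                "aggs": {
                    # Nested aggregation: calculated inside each bucket.
                    "average": {"avg": {"field": value_field}}
                },
            }
        },
    }


# Hypothetical field names.
body = average_per_group("make", "price")
print(json.dumps(body, indent=2))
```

The names "groups" and "average" are labels chosen here and come back as keys in the response. The field used for grouping should hold exact values, such as a "keyword" field, rather than analyzed full-text.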