Elasticsearch Interview Questions and Answers with Examples
This article collects the most frequently asked Elasticsearch interview questions, with answers and examples, for both newcomers and experienced candidates.
1. What is Elasticsearch?
Elasticsearch is a search engine based on Lucene. It offers a distributed, multitenant-capable full-text search engine with an HTTP (HyperText Transfer Protocol) web interface and schema-free JSON (JavaScript Object Notation) documents.
It is developed in Java. Versions up to 7.10 were released as open source under the Apache License 2.0; later releases ship under different license terms.
2. What are the software requirements to install Elasticsearch?
Elasticsearch is built using Java, so older releases need a Java runtime installed on the machine:
The latest version of the Java 8 series
Java version 1.8.0_131 is recommended.
Releases from 7.0 onward bundle their own JDK, so a separate Java installation is optional there.
3. How to start an Elasticsearch server?
Run the following commands in your terminal to start the Elasticsearch server:
cd elasticsearch
./bin/elasticsearch
To check whether the Elasticsearch server is running, use the command curl 'http://localhost:9200/?pretty'.
4. What is a Cluster in Elasticsearch?
A cluster is a collection of one or more nodes (servers) that together hold your complete data and offer federated indexing and search capabilities across all the nodes. It is identified by a unique name, which is "elasticsearch" by default.
This name is important because a node can be part of a cluster only if it is set up to join the cluster by its name.
5. Can you list some companies that use Elasticsearch?
Some of the companies that use Elasticsearch along with Logstash and Kibana are:
Wikipedia
Netflix
Accenture
Stack Overflow
Fujitsu
6. What is an Index?
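An index in Elasticsearch is a collection of documents and is often compared to a table in a relational database (in older, multi-type versions it was compared to a database instead). One difference is that a relational database always stores the actual values, whereas in Elasticsearch storing the original value is optional: an index can hold the actual values, the analyzed (tokenized) values, or both.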
7. What is a Node?
Every running instance of Elasticsearch is a node. A collection of nodes that work together forms an Elasticsearch cluster.
8. What is Mapping?
Mapping is the process that defines how a document and its fields are stored and indexed, including searchable characteristics such as which fields are tokenized and which are searchable.
In older versions of Elasticsearch a single index could contain documents of several "mapping types"; from 6.0 onward an index holds a single type.
9. What is a type in Elasticsearch?
A type in Elasticsearch is a logical category within an index whose semantics are completely up to the user. Types were deprecated in the 6.x and 7.x releases and removed in 8.0.
10. What is Document?
A document in Elasticsearch is similar to a row in a relational database. The difference is that documents in the same index can have different structures or fields, although a field that is common to several documents must have the same data type in all of them. A field can also hold multiple values (an array) within one document.
Fields can also contain other documents (nested objects).
11. What are Shards?
A single machine has limited resources such as RAM, CPU, and disk, so applications scale out by running multiple instances of Elasticsearch on separate machines.
The data in an index can be partitioned into multiple portions, each managed by a separate node or instance of Elasticsearch. Each such portion is called a shard. An index has 5 primary shards by default in versions before 7.0, and 1 from 7.0 onward.
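The way a document ends up on a particular shard can be sketched as a hash of its routing value modulo the number of primary shards. The snippet below is a minimal illustration of that idea; the function name `pick_shard` is hypothetical, and CRC32 is only a stand-in for the hash Elasticsearch actually uses.

```python
import zlib


def pick_shard(routing: str, num_primary_shards: int) -> int:
    """Map a routing value (by default the document ID) to a shard number."""
    # Stand-in hash: CRC32 is chosen only because it is deterministic and
    # in the standard library. It is NOT the hash Elasticsearch uses.
    return zlib.crc32(routing.encode("utf-8")) % num_primary_shards


if __name__ == "__main__":
    for doc_id in ["1", "2", "3", "42"]:
        print(doc_id, "->", pick_shard(doc_id, 5))
```

Because the result depends on the number of primary shards, changing that number would send existing documents to different shards, which is why the primary shard count is fixed when the index is created.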
12. How to add or create an index in an Elasticsearch cluster?
Send a PUT request with the index name as the path to create the index. To add another index, send another PUT request with the new name; index names must be lowercase.
Ex: PUT /website
An index named website is created.
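An index can also be created with explicit settings in the request body. The sketch below only builds the request (method, path, JSON body) and does not send it; the helper name `create_index_request` is an assumption made for illustration, while `number_of_shards` and `number_of_replicas` are the standard index settings.

```python
import json


def create_index_request(name, shards=1, replicas=1):
    """Return (method, path, body) for creating an index with explicit settings."""
    body = {
        "settings": {
            "number_of_shards": shards,      # fixed once the index exists
            "number_of_replicas": replicas,  # can be changed later
        }
    }
    return "PUT", f"/{name}", json.dumps(body)


if __name__ == "__main__":
    method, path, body = create_index_request("website", shards=3, replicas=1)
    print(method, path)
    print(body)
```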
13. What are Replicas?
Each primary shard can have one or more copies, called replicas (one replica per primary shard is the default).
They serve the purpose of fault tolerance and high availability.
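The arithmetic behind primaries and replicas can be sketched as follows. The function names are hypothetical; the second function relies on the rule that a replica is never allocated on the same node as its primary.

```python
def total_shard_copies(primaries: int, replicas_per_primary: int) -> int:
    """Primaries plus all of their replica copies."""
    return primaries * (1 + replicas_per_primary)


def min_nodes_for_all_copies(replicas_per_primary: int) -> int:
    """A primary and each of its replicas must live on different nodes."""
    return replicas_per_primary + 1


if __name__ == "__main__":
    # 5 primaries with 1 replica each means 10 shard copies on at least 2 nodes.
    print(total_shard_copies(5, 1), min_nodes_for_all_copies(1))
```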
14. How to delete an index in Elasticsearch?
To delete an index in Elasticsearch, use the DELETE verb followed by the index name.
Ex: DELETE /website
15. How to add a Mapping in an Index?
Elasticsearch automatically creates the mapping (dynamic mapping) from the data provided in the request body. The bulk API can be used to add more than one JSON object to the index in a single request. A mapping can also be defined explicitly with PUT /website/_mapping.
Ex: POST /website/_bulk
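The automatic mapping described above can be pictured as a type guess made from the first value seen for each field. The following is a simplified sketch of that idea, not Elasticsearch's actual implementation: it ignores date detection and the keyword sub-field that is normally added to strings, and the function names are hypothetical.

```python
def infer_field_type(value):
    """Guess a field type from a JSON value, roughly as dynamic mapping does."""
    if isinstance(value, bool):      # check bool first: bool is a subclass of int
        return "boolean"
    if isinstance(value, int):
        return "long"
    if isinstance(value, float):
        return "float"
    if isinstance(value, str):
        return "text"
    if isinstance(value, dict):
        return "object"
    if isinstance(value, list):      # arrays take the type of their first element
        return infer_field_type(value[0]) if value else None
    return None


def infer_mapping(doc):
    """Infer a field -> type table for one document."""
    return {field: infer_field_type(value) for field, value in doc.items()}


if __name__ == "__main__":
    print(infer_mapping({"name": "Central School", "fees": 2000, "rating": "3.5"}))
```

Note that a coordinate pair such as [31.89, 76.83] is inferred as a plain float array, so a geo_point field has to be mapped explicitly.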
16. How to list all indices of a cluster in ES?
Use GET /_cat/indices?v to list the indices present in the cluster.
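When the listing is consumed by a program, the cat API can return JSON via the format=json parameter. The sketch below parses such a response; the `SAMPLE_RESPONSE` string is made-up illustrative data, not output from a real cluster, and `index_names` is a hypothetical helper.

```python
import json

# Hypothetical response body for GET /_cat/indices?format=json
SAMPLE_RESPONSE = """
[
  {"health": "green",  "status": "open", "index": "website", "docs.count": "2"},
  {"health": "yellow", "status": "open", "index": "schools", "docs.count": "2"}
]
"""


def index_names(cat_indices_json: str):
    """Return the sorted index names from a _cat/indices JSON response."""
    return sorted(row["index"] for row in json.loads(cat_indices_json))


if __name__ == "__main__":
    print(index_names(SAMPLE_RESPONSE))
```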
17. How are relevancy and scoring done in Elasticsearch?
Lucene uses the Boolean model to find matching documents, and a formula called the practical scoring function to calculate relevance.
This formula borrows concepts from term frequency/inverse document frequency (TF/IDF) and the vector space model, and adds features such as a coordination factor and field-length normalization.
score(q, d) is the relevance score of document "d" for query "q". From version 5.0 onward, the default similarity is BM25 rather than classic TF/IDF.
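To make the TF/IDF part concrete, here is a simplified sketch of the classic scoring ingredients: term frequency, inverse document frequency, and field-length normalization. It deliberately leaves out the query normalization, coordination factor, and boosts of the full practical scoring function, and it is not BM25. Documents are assumed to be non-empty lists of tokens.

```python
import math


def tf(term, tokens):
    """Term frequency: square root of how often the term occurs in the field."""
    return math.sqrt(tokens.count(term))


def idf(term, all_docs):
    """Inverse document frequency: rarer terms weigh more."""
    doc_freq = sum(1 for tokens in all_docs if term in tokens)
    return 1.0 + math.log(len(all_docs) / (doc_freq + 1))


def field_norm(tokens):
    """Field-length norm: shorter fields weigh more."""
    return 1.0 / math.sqrt(len(tokens))


def score(query_terms, tokens, all_docs):
    """Simplified score(q, d): sum of tf * idf^2 * norm over the query terms."""
    return sum(
        tf(t, tokens) * idf(t, all_docs) ** 2 * field_norm(tokens)
        for t in query_terms
    )


if __name__ == "__main__":
    docs = [["quick", "brown", "fox"], ["lazy", "dog"], ["quick", "quick", "dog"]]
    for d in docs:
        print(d, round(score(["quick"], d, docs), 3))
```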
18. How can you retrieve a document by ID in ES?
To retrieve a document in Elasticsearch, use the GET verb followed by the _index, _type, and _id.
Ex: GET /computer/blog/123?pretty
On 7.x and later, where types are deprecated, the path is GET /computer/_doc/123.
19. List the different types of queries supported by Elasticsearch.
Queries are divided into two main groups, with multiple queries under each.
Full-text queries: match, match_phrase, multi_match, match_phrase_prefix, common terms, query_string, simple_query_string.
Term-level queries: term, terms, terms_set, range, prefix, wildcard, regexp, fuzzy, exists, type, ids.
20. What are the different ways of searching in Elasticsearch?
We can perform the following searches in Elasticsearch:
Multi-index, multi-type search: All search APIs can be applied across multiple indices, for example by listing several index names or using a wildcard.
We can search for certain tags across all indices as well as across all types.
URI search: A search request is executed purely using a URI by providing request parameters.
Request body search: A search request is executed with the search DSL, which includes the query DSL, in the request body.
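The difference between a URI search and a request body search can be sketched by building both forms of roughly the same query. The helpers below only construct the requests and do not contact a server; their names, and the `schools`/`city` example values, are assumptions for illustration.

```python
import json
from urllib.parse import urlencode


def uri_search(index, field, value):
    """URI search: the whole query travels in the q= request parameter."""
    return "GET", f"/{index}/_search?" + urlencode({"q": f"{field}:{value}"}), None


def body_search(index, field, value):
    """Request body search: the query is expressed in the query DSL."""
    body = {"query": {"match": {field: value}}}
    return "POST", f"/{index}/_search", json.dumps(body)


if __name__ == "__main__":
    print(uri_search("schools", "city", "pune"))
    print(body_search("schools", "city", "pune"))
```

The URI form is handy for quick checks from a browser or curl, while the body form can express everything the query DSL offers.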
21. How does aggregation work in Elasticsearch?
The aggregation framework provides aggregated data based on the search query. It can be seen as a unit
of work that builds analytic information over the set of documents.
There are different types of aggregations with different purposes and outputs.
22. What is the difference between Term-based and Full-text queries?
Term-based queries: Queries like the term query or the fuzzy query are low-level queries that have no analysis phase. A term query for the term Foo looks for that exact term in the inverted index and calculates the TF/IDF relevance score for every document that contains it.
Full-text queries: Queries like the match query or the query_string query are high-level queries that understand the mapping of a field. They analyze the query text with the field's analyzer to produce a list of terms, execute the appropriate low-level query for every term, and finally combine the results to produce the relevance score of every document.
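The practical consequence of the missing analysis phase can be shown with a toy index. This is a simulation under simple assumptions (an analyzer that only lowercases and splits on whitespace, and a hypothetical `ToyIndex` class), not Elasticsearch code.

```python
def analyze(text):
    """Toy analyzer: lowercase, then split on whitespace."""
    return text.lower().split()


class ToyIndex:
    def __init__(self):
        self.postings = {}  # term -> set of document IDs

    def add(self, doc_id, text):
        # Field values are analyzed at index time.
        for token in analyze(text):
            self.postings.setdefault(token, set()).add(doc_id)

    def term_query(self, term):
        # No analysis: the term must match an indexed token exactly.
        return set(self.postings.get(term, set()))

    def match_query(self, text):
        # Analyze first, then run one term lookup per token (OR semantics).
        hits = set()
        for token in analyze(text):
            hits |= self.term_query(token)
        return hits


if __name__ == "__main__":
    index = ToyIndex()
    index.add(1, "Foo Bar")
    print(index.term_query("Foo"))   # no hit: the index holds "foo"
    print(index.match_query("Foo"))  # hit: the query text is lowercased first
```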
23. Can Elasticsearch replace the database?
Yes, for some workloads Elasticsearch can be used in place of a database, since it is very powerful.
It offers features such as multi-tenancy, sharding and replication, distribution, real-time get, refresh, commit, versioning, re-indexing, and many more, which make it a candidate for replacing a database. It does not provide multi-document ACID transactions, however, so it is usually paired with a primary data store when those guarantees matter.
24. Where is Elasticsearch data stored?
Elasticsearch is a distributed document store. It can store and retrieve complex data structures that are serialized as JSON documents in near real time. On each node, the data is written to disk under the directories configured by the path.data setting.
25. How to check that the Elasticsearch server is running?
By default, Elasticsearch binds its HTTP interface to the first free port in the range 9200-9300, which is normally 9200.
So, to check whether it is running on your server, open the host URL followed by the port number.
Ex: localhost:9200
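The same check can be scripted. The sketch below requests the root URL and treats a JSON reply containing a "version" entry as a sign that Elasticsearch answered; the helper names are hypothetical, and a cluster with security enabled would need credentials, which this sketch does not handle.

```python
import json
import urllib.request


def root_url(host="localhost", port=9200):
    """Build the URL of the Elasticsearch root endpoint."""
    return f"http://{host}:{port}/?pretty"


def is_running(url, timeout=2.0):
    """Return True if the URL answers with JSON that has a 'version' entry."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            info = json.load(response)
    except (OSError, ValueError):
        # Connection refused, timeout, HTTP error, or a non-JSON reply.
        return False
    return isinstance(info, dict) and "version" in info


if __name__ == "__main__":
    print(is_running(root_url()))
```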
26. Features of ElasticSearch?
Built on Top of Lucene (A full-text search engine by Apache )
Document-Oriented (Stores data as structured JSON documents)
Full-Text Search (Supports full-text search indexing, which gives faster result retrieval)
Schema-Free (Uses NoSQL)
RESTful API (Supports RESTful APIs for storage and retrieval of records)
Supports Autocompletion & Instant Search
27. Does ElasticSearch have a schema?
Yes, ElasticSearch can have mappings that can be used to enforce a schema on documents.
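One way to enforce a schema is an explicit mapping with dynamic set to "strict", so that documents containing unmapped fields are rejected. The sketch below builds such a create-index body in the typeless form used by 7.x and later; the helper name and the example field choices are assumptions.

```python
import json


def strict_mapping(properties):
    """Build a create-index body whose mapping rejects unknown fields."""
    return json.dumps({
        "mappings": {
            "dynamic": "strict",       # unmapped fields cause an error
            "properties": properties,  # field name -> field definition
        }
    })


if __name__ == "__main__":
    print(strict_mapping({
        "name": {"type": "text"},
        "city": {"type": "keyword"},
        "fees": {"type": "integer"},
        "location": {"type": "geo_point"},
    }))
```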
28. What is indexing in ElasticSearch?
The process of storing data in an index is called indexing. Internally, the data is written into segments that are write-once and read-many. Whenever a document is updated, a new version of the document is written to the index and the old one is marked as deleted.
29. What is an Analyzer in Elasticsearch, and what are its types?
While indexing, Elasticsearch transforms the data internally with the analyzer defined for the index (or field), and then indexes the result.
An analyzer is built from a tokenizer and filters. The following types of analyzers are available in Elasticsearch:
1. STANDARD ANALYZER
2. SIMPLE ANALYZER
3. WHITESPACE ANALYZER
4. STOP ANALYZER
5. KEYWORD ANALYZER
6. PATTERN ANALYZER
7. LANGUAGE ANALYZERS
8. SNOWBALL ANALYZER
9. CUSTOM ANALYZER
30. What is a Tokenizer in ElasticSearch?
A tokenizer breaks down the field values of a document into a stream of tokens. The inverted index is created and updated from these tokens.
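How a tokenizer fits between the raw text and the indexed tokens can be sketched with a toy analysis chain. The functions below imitate the idea of a whitespace tokenizer, a letter tokenizer, a lowercase filter, and a stop filter; they are illustrative stand-ins with a made-up stop word list, not the Elasticsearch implementations.

```python
import re

STOP_WORDS = {"a", "an", "the", "is", "of"}  # tiny illustrative list


def whitespace_tokenizer(text):
    """Split on whitespace only; punctuation stays attached to tokens."""
    return text.split()


def letter_tokenizer(text):
    """Split on anything that is not a letter."""
    return re.findall(r"[A-Za-z]+", text)


def lowercase_filter(tokens):
    return [t.lower() for t in tokens]


def stop_filter(tokens):
    return [t for t in tokens if t not in STOP_WORDS]


def analyze(text, tokenizer=letter_tokenizer):
    """Tokenize, then apply the token filters in order."""
    return stop_filter(lowercase_filter(tokenizer(text)))


if __name__ == "__main__":
    print(whitespace_tokenizer("Google is a good website."))
    print(analyze("Google is a good website."))
```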
31. What is the query language of ElasticSearch?
Elasticsearch uses Query DSL, a JSON-based query language built on top of Apache Lucene.
32. What Is Inverted Index In Elasticsearch?
The inverted index is the heart of a search engine. The primary goal of a search engine is to provide speedy searches while finding the documents in which the search terms occur, even among millions of documents.
The inverted index is a hashmap-like data structure that directs users from a word to a document or a web page.
The index at the back of a book works the same way: based on a word, we can find the pages on which that word appears.
Consider the following statements:
Google is a good website.
Google is one of the good websites.
For indexing purposes, the above text is tokenized into separate terms, and all the unique terms are stored inside the index together with information such as the document in which each term appears and its position in that document.
Numbering the two statements as documents 1 and 2 and lowercasing the terms, the inverted index looks as follows:
Term |Documents
google |1, 2
is |1, 2
a |1
good |1, 2
website |1
one |2
of |2
the |2
websites |2
When you search for the term website OR websites, the query is executed against the inverted index, the terms are looked up, and the documents in which these terms appear are quickly identified.
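The indexing and lookup steps described above can be sketched in a few lines. This is a toy model that treats the two statements as documents 1 and 2 and uses a simple lowercase-and-split tokenization; real analyzers do more, and the function names are hypothetical.

```python
import re
from collections import defaultdict


def build_inverted_index(docs):
    """Map each term to {doc_id: [positions]} for a dict of doc_id -> text."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        terms = re.findall(r"\w+", text.lower())
        for position, term in enumerate(terms):
            index[term].setdefault(doc_id, []).append(position)
    return index


def search_any(index, terms):
    """Return the sorted IDs of documents containing at least one term (OR)."""
    hits = set()
    for term in terms:
        hits.update(index.get(term.lower(), {}))
    return sorted(hits)


if __name__ == "__main__":
    docs = {
        1: "Google is a good website.",
        2: "Google is one of the good websites.",
    }
    index = build_inverted_index(docs)
    print(dict(index))
    print(search_any(index, ["website", "websites"]))
```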
33. What Is Elasticsearch?
Elasticsearch is a search engine based on Lucene. It provides a distributed, multitenant-capable
full-text search engine with an HTTP web interface and schema-free JSON documents.
Elasticsearch is developed in Java. Versions up to 7.10 were released as open source under the terms of the Apache License 2.0; later releases ship under different license terms.
34. What Are The Basic Operations You Can Perform On A Document?
The following operations can be performed on documents:
Indexing a document using Elasticsearch.
Fetching documents using Elasticsearch.
Updating documents using Elasticsearch.
Deleting documents using Elasticsearch.
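The four operations map onto REST requests roughly as sketched below. The helpers only build (method, path, body) tuples in the typeless form used by 7.x and later; their names and the `website` example index are assumptions for illustration.

```python
import json


def index_doc(index, doc_id, doc):
    """Create or replace a document."""
    return "PUT", f"/{index}/_doc/{doc_id}", json.dumps(doc)


def get_doc(index, doc_id):
    """Fetch a document by ID."""
    return "GET", f"/{index}/_doc/{doc_id}", None


def update_doc(index, doc_id, partial):
    """Partially update a document; only the given fields change."""
    return "POST", f"/{index}/_update/{doc_id}", json.dumps({"doc": partial})


def delete_doc(index, doc_id):
    """Delete a document by ID."""
    return "DELETE", f"/{index}/_doc/{doc_id}", None


if __name__ == "__main__":
    print(index_doc("website", 1, {"title": "Hello"}))
    print(get_doc("website", 1))
    print(update_doc("website", 1, {"title": "Hello again"}))
    print(delete_doc("website", 1))
```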
35. Explain Match All Query?
The match_all query is the most basic query; it returns all documents, each with a score of 1.0.
Ex.
POST http://localhost:9200/schools*/_search
{
  "query": {
    "match_all": {}
  }
}
36. Explain the Match query?
The match query is used to match a text or phrase against the values of a single field.
Ex.
POST http://localhost:9200/schools*/_search
{
  "query": {
    "match": {
      "city": "pune"
    }
  }
}
37. Explain Multi_match query?
The multi_match query is used to match a text or phrase against more than one field. For example,
POST http://localhost:9200/schools*/_search
{
  "query": {
    "multi_match": {
      "query": "hyderabad",
      "fields": ["city", "state"]
    }
  }
}
38. Explain Range Query?
The range query is used to search for objects whose values fall within a given range. It uses the following operators:
gte: greater than or equal to
gt: greater than
lte: less than or equal to
lt: less than
For example,
{
  "query": {
    "range": {
      "rating": {
        "gte": 3.5
      }
    }
  }
}
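The operators can be combined to give both a lower and an upper bound. The sketch below is a small builder for such a request body that also rejects operator names outside the four listed above; `range_query` is a hypothetical helper, not part of any client library.

```python
ALLOWED_OPERATORS = {"gte", "gt", "lte", "lt"}


def range_query(field, **bounds):
    """Build a range query body, e.g. range_query("rating", gte=3.5, lt=5)."""
    unknown = set(bounds) - ALLOWED_OPERATORS
    if unknown:
        raise ValueError(f"unsupported range operators: {sorted(unknown)}")
    return {"query": {"range": {field: dict(bounds)}}}


if __name__ == "__main__":
    print(range_query("rating", gte=3.5, lt=5))
```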
39. Explain Geo Queries?
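These queries deal with geo locations and geo points. They help to find schools or any other geographical object near a given location. The field must use the geo_point data type. On 5.0 and later, where the filtered query no longer exists, the filter is wrapped in a bool query. For example,
{
  "query": {
    "bool": {
      "filter": {
        "geo_distance": {
          "distance": "100km",
          "location": [32.052098, 76.649294]
        }
      }
    }
  }
}
Note that when a geo point is written as an array, Elasticsearch reads it as [longitude, latitude].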
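What a distance filter checks can be illustrated with a great-circle calculation. The sketch below uses the haversine formula as a stand-in and assumes the coordinate pairs are (latitude, longitude); Elasticsearch's own distance computation may differ in detail, and the function names are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = math.radians(lat2 - lat1)
    d_lambda = math.radians(lon2 - lon1)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def within(distance_km, origin, point):
    """True if point lies within distance_km of origin; both are (lat, lon)."""
    return haversine_km(*origin, *point) <= distance_km


if __name__ == "__main__":
    origin = (32.052098, 76.649294)
    school = (31.8955385, 76.8380405)
    print(round(haversine_km(*origin, *school), 1), within(100, origin, school))
```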
40. What are Aggregations in ElasticSearch?
Aggregation is a framework that collects all the data selected by the search query.
This framework includes many building blocks to provide support in building complex summaries of the data.
41. How Max aggregation is used?
The max aggregation is used to get the maximum value of a specific numeric field in the aggregated documents. For example,
POST http://localhost:9200/schools/_search
{
  "aggs": {
    "max_fees": { "max": { "field": "fees" } }
  }
}
42. How Avg Aggregation is done?
The avg aggregation is used to find the average of any numeric field that appears in the aggregated documents. For example,
POST http://localhost:9200/schools/_search
{
  "aggs": {
    "avg_fees": { "avg": { "field": "fees" } }
  }
}
43. Min aggregation in Elasticsearch?
The min aggregation is used to find the minimum value of a specific numeric field in the aggregated documents. For example,
POST http://localhost:9200/schools*/_search
{
  "aggs": {
    "min_fees": { "min": { "field": "fees" } }
  }
}
44. Sum aggregation in ElasticSearch.
The sum aggregation is used to calculate the sum of a specific numeric field in the aggregated documents. For example,
POST http://localhost:9200/schools*/_search
{
  "aggs": {
    "total_fees": { "sum": { "field": "fees" } }
  }
}
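The metric aggregations shown so far can be requested together in a single search. The sketch below builds such a body; `metric_aggs` is a hypothetical helper, and "size": 0 is set so that only the aggregation results are returned, without search hits.

```python
import json


def metric_aggs(field, metrics=("min", "max", "avg", "sum")):
    """Build a search body with one metric aggregation per entry in metrics."""
    aggs = {f"{metric}_{field}": {metric: {"field": field}} for metric in metrics}
    return json.dumps({"size": 0, "aggs": aggs})


if __name__ == "__main__":
    print(metric_aggs("fees"))
```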
45. What are the advantages of Elasticsearch?
Elasticsearch is developed in Java, which makes it compatible with almost every platform.
Elasticsearch is near real-time: an added document becomes searchable after about one second.
Elasticsearch is distributed, which makes it easy to scale and integrate into any big organization.
Creating full backups is easy in Elasticsearch using the concept of the gateway.
Handling multi-tenancy is very easy in Elasticsearch when compared to Apache Solr.
Elasticsearch uses JSON objects as responses, which makes it possible to call the Elasticsearch server from a large number of programming languages.
Elasticsearch supports almost every document type except those that do not support text rendering.
Elasticsearch: Disadvantages
Elasticsearch handles request and response data mainly in JSON, unlike Apache Solr, where CSV, XML, and JSON formats are possible.
In rare cases, Elasticsearch can run into split-brain situations.
46. Compare Elasticsearch and RDBMS
An Elasticsearch index is a collection of types, just as a database in an RDBMS (Relational Database Management System) is a collection of tables. Each table is a collection of rows, just as each mapping (type) in Elasticsearch is a collection of JSON objects.
Elasticsearch |RDBMS
Index |Database
Shard |Shard
Mapping |Table
Field |Field
JSON Object |Tuple
47. Create a mapping and add bulk data to an index.
Elasticsearch creates the mapping according to the data provided in the request body, and its bulk functionality adds more than one JSON object to the index in a single request. The bulk body is newline-delimited JSON: each action line and each document must sit on a single line, and the body must end with a newline. On 7.x and later, omit the _type entry.
POST http://localhost:9200/schools/_bulk
{"index":{"_index":"schools", "_type":"school", "_id":"1"}}
{"name":"Central School", "description":"CBSE Affiliation", "street":"Nagan", "city":"paprola", "state":"HP", "zip":"176115", "location":[31.8955385, 76.8380405], "fees":2000, "tags":["Senior Secondary", "beautiful campus"], "rating":"3.5"}
{"index":{"_index":"schools", "_type":"school", "_id":"2"}}
{"name":"Saint Paul School", "description":"ICSE Affiliation", "street":"Dawarka", "city":"Delhi", "state":"Delhi", "zip":"110075", "location":[28.5733056, 77.0122136], "fees":5000, "tags":["Good Faculty", "Great Sports"], "rating":"4.5"}
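Since the bulk API expects newline-delimited JSON, it is easy to get the body wrong by hand. The sketch below generates it from ordinary dictionaries: one action line followed by one document line per entry, with a trailing newline. The helper name `bulk_body` is an assumption, and the typeless form (no `_type`) is used.

```python
import json


def bulk_body(index, docs):
    """Build an NDJSON bulk body from a dict of doc_id -> document."""
    lines = []
    for doc_id, doc in docs.items():
        # Action line: what to do and where.
        lines.append(json.dumps({"index": {"_index": index, "_id": str(doc_id)}}))
        # Source line: the document itself, on a single line.
        lines.append(json.dumps(doc))
    return "\n".join(lines) + "\n"  # the final newline is required


if __name__ == "__main__":
    print(bulk_body("schools", {
        1: {"name": "Central School", "city": "paprola", "fees": 2000},
        2: {"name": "Saint Paul School", "city": "Delhi", "fees": 5000},
    }), end="")
```

When sending this body, set the Content-Type header to application/x-ndjson.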
48. What are the Elasticsearch REST API and use of it?
Elasticsearch provides a very comprehensive and powerful REST API that you can use to interact with your cluster. Among the few things that can be done with the API are as follows:
Check your cluster, node, and index health, status, and statistics
Administer your cluster, node, and index data and metadata
Perform CRUD (Create, Read, Update, and Delete) and search operations against your indexes
Execute advanced search operations viz. aggregations, filtering, paging, scripting, sorting, among many others.
49. What are the Disadvantages of Elasticsearch?
Elasticsearch handles request and response data mainly in JSON and does not offer the range of formats (such as CSV and XML) that Apache Solr does.
In rare cases, it has a problem with split-brain situations.
50. Explain Joins in ElasticSearch.
In a distributed system like Elasticsearch, performing full SQL-style joins is very expensive. Thus, Elasticsearch provides two forms of join which are designed to scale horizontally.
1) nested query
This query is used for the documents containing nested type fields. Using this query, you can query each object as an independent document.
2) has_child & has_parent queries
These queries work on the parent-child relationship between two document types within a single index.
The has_child query returns the matching parent documents, while the has_parent query returns the matching child documents.
The following example shows a simple join query:
POST /my_playlist/_search
{
  "query": {
    "has_child": {
      "type": "kpop",
      "query": {
        "match": {
          "artist": "EXO"
        }
      }
    }
  }
}
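For comparison with the has_child example, the nested query from point 1 can be sketched as a request body as well. The path `tracks` and the field `artist` are hypothetical names chosen for illustration; the path must refer to a field mapped with the nested type.

```python
def nested_query(path, field, value):
    """Build a nested query that matches a field inside a nested object."""
    return {
        "query": {
            "nested": {
                "path": path,  # the nested field to search in
                "query": {"match": {f"{path}.{field}": value}},
            }
        }
    }


if __name__ == "__main__":
    print(nested_query("tracks", "artist", "EXO"))
```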