DOCUMENT APIs

Date: 3rd November, 2019 | By Prwatech

Add Document

Documents in Elasticsearch are stored as JSON objects. Each document is added to an index and has a type. In Elasticsearch versions before 6.0, one index can have many types; from 6.0 onward an index holds a single type, and types are deprecated in 7.x. You can store any number of documents in an index.

In the example below, “information”, “person” and “1” are the index, type and id respectively. Elasticsearch will automatically create the index if it does not exist.

Example:-
POST /information/person/1
{
  "name" : "Paul",
  "lastname" : "Wheeler",
  "job_desc" : "Manager"
}
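
The same request can also be prepared from Python. The sketch below is a minimal illustration that uses only the standard library. The node address `http://localhost:9200` and the helper name `build_index_request` are assumptions made for this example, and the request is only built, not sent.

```python
import json
import urllib.request

# Assumed address of a local node; adjust for your own cluster.
BASE_URL = "http://localhost:9200"


def build_index_request(index, doc_type, doc_id, document):
    """Build (but do not send) the POST request that indexes one document."""
    url = f"{BASE_URL}/{index}/{doc_type}/{doc_id}"
    body = json.dumps(document).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={"Content-Type": "application/json"},
    )


request = build_index_request(
    "information",
    "person",
    1,
    {"name": "Paul", "lastname": "Wheeler", "job_desc": "Manager"},
)
print(request.get_method(), request.full_url)
```

Passing `request` to `urllib.request.urlopen` would send it to a running node.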

Get Document

Now that the document is stored, we can retrieve it with the API below.

Example:- GET /information/person/1

Update Document

To update a document, we use the API below. Only the fields listed under "doc" are changed; the rest of the document is left as it is.

Example:- POST /information/person/1/_update
{
  "doc" : {
    "job_desc" : "Data analyst"
  }
}
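
A partial update merges the fields under "doc" into the stored document instead of replacing the whole document. The sketch below imitates that merge locally in Python. `apply_partial_update` is a hypothetical helper written for illustration, not an Elasticsearch API, and it only approximates the server-side behaviour.

```python
import copy


def apply_partial_update(source, partial_doc):
    """Return a copy of source with partial_doc merged in.

    Nested objects are merged recursively; all other values are replaced.
    """
    merged = copy.deepcopy(source)
    for key, value in partial_doc.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = apply_partial_update(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


stored = {"name": "Paul", "lastname": "Wheeler", "job_desc": "Manager"}
updated = apply_partial_update(stored, {"job_desc": "Data analyst"})
print(updated)
```

A field name that does not yet exist in the stored document is added as a new field rather than changing an existing one, so the name used under "doc" has to match the stored field exactly.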

Delete Document

To delete a document, use the API below.

Example:-

DELETE /information/person/1

Search Document

We can search the stored documents with a URI search of the form “/_search?q=something”.

Example:-

GET localhost:9200/_search?q=Paul
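
When the query text contains spaces or special characters, it has to be URL-encoded. The following sketch builds such search URLs in Python. The node address and the helper name `build_uri_search` are assumptions for illustration.

```python
from urllib.parse import urlencode

# Assumed address of a local node; adjust for your own cluster.
BASE_URL = "http://localhost:9200"


def build_uri_search(query, index=None):
    """Return the URL for a URI search, optionally limited to one index."""
    path = f"/{index}/_search" if index else "/_search"
    query_string = urlencode({"q": query})   # encodes spaces, colons, etc.
    return f"{BASE_URL}{path}?{query_string}"


all_indices_url = build_uri_search("Paul")
one_index_url = build_uri_search("job_desc:Manager", index="information")
print(all_indices_url)
print(one_index_url)
```

The second call restricts the search to the information index and to the job_desc field.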

►Analysis Phase in Elasticsearch
Elasticsearch uses a special data structure called an “inverted index” for very fast searches. An inverted index is a list of all the unique words that appear in the indexed documents and, for each word, a list of the documents in which it appears. It is built from the documents indexed in Elasticsearch. The process of creating an inverted index from a document is called analysis (tokenization and filtering).

Analysis happens both when a document is indexed and when it is searched. We will see how Elasticsearch creates an inverted index and how it is stored in shards, which are later used for searching documents.

►The analysis process is the key phase in creating an inverted index in shards. Whenever you index a document in Elasticsearch, it goes through the analysis phase, where the text is tokenized and then filtered (stemming, synonym detection and stop-word removal).

►For every document, inverted-index data is created and held in a temporary in-memory buffer. Once the buffer is full, it is flushed into segments.

►A segment is the smallest logical unit of a shard: a small block that stores part of the inverted index. A shard is a collection of segments, and segments are filled with flushed inverted-index data.

►Once a segment has been filled with inverted-index data, it becomes eligible for searching. Segments are immutable, so the inverted index they hold is never modified after it is written.
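
The buffer-and-segment flow described above can be pictured with a deliberately simplified toy model. Everything in this sketch is an assumption made for illustration: the class name `ToyShard`, a buffer capacity counted in documents, and segments represented as tuples of term lists. Real shards store far richer index structures.

```python
class ToyShard:
    """Toy model of a shard: an in-memory buffer flushed into immutable segments."""

    def __init__(self, buffer_capacity):
        self.buffer_capacity = buffer_capacity
        self.buffer = []     # analyzed documents that are not yet searchable
        self.segments = []   # each segment is a tuple, so it cannot be modified

    def index(self, terms):
        """Add one analyzed document (a list of terms) to the buffer."""
        self.buffer.append(tuple(terms))
        if len(self.buffer) >= self.buffer_capacity:
            self.flush()

    def flush(self):
        """Write the buffer out as a new immutable segment and clear it."""
        if self.buffer:
            self.segments.append(tuple(self.buffer))
            self.buffer = []

    def search(self, term):
        """Return True if any flushed segment contains the term."""
        return any(term in doc for segment in self.segments for doc in segment)


shard = ToyShard(buffer_capacity=2)
shard.index(["thin", "lifeguard"])
before_flush = shard.search("lifeguard")   # document is still in the buffer
shard.index(["skinny", "lifeguard"])       # buffer is full, so it is flushed
after_flush = shard.search("lifeguard")
print(before_flush, after_flush)
```

In this model a document only becomes searchable once the buffer holding it has been flushed into a segment.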

EXAMPLE: Consider the text of the two documents below for analysis.

Here, we are indexing these documents into Elasticsearch, one request per document.

POST /user/tweets/
{
  "name": "Rohit",
  "comment": "The thin lifeguard was swimming in the lake.",
  "date": "2018-10-27"
}

POST /user/tweets/
{
  "name": "Amol",
  "comment": "Swimmers race with the skinny lifeguard in the lake",
  "date": "2018-10-28"
}

Let’s assume we are interested in the comment field of each document. We have two texts to consider for analysis.
1. The thin lifeguard was swimming in the lake
2. Swimmers race with the skinny lifeguard in the lake
►Tokenization: To create an inverted index, we split the comment of each document into separate words (which we call terms or tokens) and create a sorted list of all the unique terms, along with the list of documents in which each term appears.

Token	Present in Document

Swimmers	2
The	1
in	1,2
lake	1,2
lifeguard	1,2
race	2
skinny	2
swimming	1
the	1,2
thin	1
was	1
with	2
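
The tokenization step can be sketched in a few lines of Python. This is a minimal illustration that splits on whitespace only, which is a simplification of what a real tokenizer does; the function name `build_inverted_index` is chosen for this example.

```python
from collections import defaultdict

documents = {
    1: "The thin lifeguard was swimming in the lake",
    2: "Swimmers race with the skinny lifeguard in the lake",
}


def build_inverted_index(docs):
    """Map each unique token to the sorted list of document ids containing it."""
    postings = defaultdict(set)
    for doc_id, text in docs.items():
        for token in text.split():   # naive whitespace tokenization
            postings[token].add(doc_id)
    # Sort the tokens so the result reads like a term dictionary.
    return {token: sorted(ids) for token, ids in sorted(postings.items())}


inverted_index = build_inverted_index(documents)
for token, doc_ids in inverted_index.items():
    print(token, doc_ids)
```

Because no lowercasing has been applied yet, "The" and "the" are still separate entries at this stage.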

►Filtering: After tokenization, filters are applied to the tokens. Common filters include:

►Removing stop words (common English words such as a, an, the, in)

►Lowercasing (to make search case-insensitive)

►Stemming (swimming to swim)

►Synonyms (thin == skinny)

After these operations, the output, which is an inverted index, is pushed into the buffer.
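
The filters listed above can be chained as in the sketch below. It is an illustration only: the stop-word list, the synonym map and the very rough `toy_stem` function are assumptions made for this example and are much cruder than the filters Elasticsearch ships with.

```python
STOP_WORDS = {"a", "an", "the", "in", "was", "with"}   # assumed small stop list
SYNONYMS = {"skinny": "thin"}                          # map synonyms to one term


def toy_stem(token):
    """Very rough stemmer for illustration: strip "ing"/"ers" endings."""
    for suffix in ("ing", "ers"):
        if token.endswith(suffix) and len(token) > len(suffix) + 2:
            token = token[: -len(suffix)]
            # Collapse a doubled final letter: "swimm" -> "swim".
            if len(token) > 2 and token[-1] == token[-2]:
                token = token[:-1]
            break
    return token


def apply_filters(tokens):
    """Lowercase, drop stop words, stem, then normalize synonyms."""
    filtered = []
    for token in tokens:
        token = token.lower()
        if token in STOP_WORDS:
            continue
        stem = toy_stem(token)
        filtered.append(SYNONYMS.get(stem, stem))
    return filtered


doc1_terms = apply_filters("The thin lifeguard was swimming in the lake".split())
doc2_terms = apply_filters("Swimmers race with the skinny lifeguard in the lake".split())
print(doc1_terms)
print(doc2_terms)
```

After filtering, both documents share the terms thin, lifeguard, swim and lake, so a search for any of them can match either document.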

►ANALYZERS in Elasticsearch
Elasticsearch provides built-in analyzers which can be used in any index without further configuration. Here is a list of Elasticsearch built-in analyzers.

Standard Analyzer (Default)
Simple Analyzer
Whitespace Analyzer
Stop Analyzer
Keyword Analyzer
Pattern Analyzer
Language Analyzers (English, Hindi, French, Spanish & many more)
Custom Analyzer (we can define our own custom analyzer as well)
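
To get a feel for how some of these analyzers differ, the sketch below gives rough Python approximations of four of them. These functions are stand-ins written for illustration, not the actual implementations, and `standard_like_analyzer` in particular only approximates the standard analyzer's word-boundary rules.

```python
import re


def whitespace_analyzer(text):
    """Split on whitespace only; case and punctuation are kept."""
    return text.split()


def simple_analyzer(text):
    """Split on anything that is not a letter, then lowercase."""
    return [t.lower() for t in re.findall(r"[^\W\d_]+", text)]


def keyword_analyzer(text):
    """Emit the whole input as a single token."""
    return [text]


def standard_like_analyzer(text):
    """Rough stand-in for the standard analyzer: word characters, lowercased."""
    return [t.lower() for t in re.findall(r"\w+", text)]


sample = "The 2 thin Lifeguards!"
for analyzer in (whitespace_analyzer, simple_analyzer,
                 keyword_analyzer, standard_like_analyzer):
    print(analyzer.__name__, analyzer(sample))
```

On a running cluster, the `_analyze` API can be used to see the tokens a real analyzer produces for a given text.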

♦Bulk Load in Elasticsearch

Bulk load means indexing more than one document in a single request.

We use the _bulk endpoint to upload bulk data.

Example:

(This command indexes three documents into the vehicles index under the cars type. Each { "index": {} } action line is followed by the source of the document to index.)

POST /vehicles/cars/_bulk
{ "index": {} }
{ "price": 10000, "colour": "white", "make": "Honda", "sold": "2016-10-28", "condition": "okay" }
{ "index": {} }
{ "price": 20000, "colour": "white", "make": "Honda", "sold": "2016-11-05", "condition": "new" }
{ "index": {} }
{ "price": 30000, "colour": "green", "make": "ford", "sold": "2016-05-18", "condition": "new" }
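
The bulk body is newline-delimited JSON: every action and every document sits on its own line, and the body ends with a newline. The sketch below shows one way to assemble such a body in Python from a list of documents. The helper name `build_bulk_body` is an assumption for this example, and the body is only built, not sent.

```python
import json


def build_bulk_body(documents):
    """Return a newline-delimited bulk body with one index action per document."""
    lines = []
    for document in documents:
        lines.append(json.dumps({"index": {}}))   # action line
        lines.append(json.dumps(document))        # source line
    return "\n".join(lines) + "\n"                # body must end with a newline


cars = [
    {"price": 10000, "colour": "white", "make": "Honda", "sold": "2016-10-28", "condition": "okay"},
    {"price": 20000, "colour": "white", "make": "Honda", "sold": "2016-11-05", "condition": "new"},
    {"price": 30000, "colour": "green", "make": "ford", "sold": "2016-05-18", "condition": "new"},
]
bulk_body = build_bulk_body(cars)
print(bulk_body)
```

The resulting string would be sent as the body of POST /vehicles/cars/_bulk with the Content-Type header set to application/x-ndjson.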
