DOCUMENT API’s

  • date 3rd November, 2019 |
  • by Prwatech |
  • 0 Comments
  1. Add Document

Documents in Elasticsearch are stored as a JSON object. Also, documents are added to indices, and documents have a type. One Index can have many types & you can store any number of documents in an Index.

In this example, ” information”, “person” and  “1” are index, type and id respectively, Elasticsearch will automatically create the index if it does not exist.

Exmaple:-






POST /information/person/1
{
“name” : “Paul”,
“lastname” : “Wheeler”,
“job_desc” : “Manager”
}

  1. Get Document

Now that the document is stored, to retrieve the document we use below API.

Example:-       GET  /information/person/1

  1. Update Document

To update a document we use below API.

Example:-       POST  /information/person/1/_update

{

“doc”:{
“job_description” : “Data analyst”
}

}

  1. Delete Document

To delete a document use the below API.

Example:- 

  DELETE  /information/person/1

  1. Search Document

We can search the stored document using either “/_search?q=something”.

Example:-

GET localhost:9200/_search?q=Paul

►Analysis Phase In ElasticSearch






Elasticsearch uses a special data structure called “Inverted index” for very fast searches. An inverted index is a list of all the unique words that a document contains, and for each word, a list of the documents in which it appears. An inverted index is created from a document indexed in elasticsearch. the process of creating an Inverted index from a document called analysis (tokenization and Filterization).

Analysis happens during Indexing a document as well as Searching a document. We will see how elasticsearch creates an Inverted index and how it is stored in shards which later used for searching documents.

►The analysis process is the key phase in creating an inverted index in shards. whenever you index a document in Elasticsearch, it goes through the analysis phase where documents are tokenized, filtered processed (stemming, synonyms detected and remove stop words).

►For every document, this inverted index will be created and stored in a temporary buffer until it becomes full. Once the buffer is full, it is flushed into segments.

►A segment is the smallest logical unit of a Shard basically small blocks where you can store a list of the inverted index. Shard is like a collection of segments. Segments are filled with a flushed inverted index.

►Once a segment is filled completely with an inverted index, shards become eligible for searching. Segments created are an immutable collection of the immutable inverted index.

EXAMPLE: Consider two documents text below for analysis.

Here, we’re indexing these documents to ELS.

POST /user/tweets/

{           “ name”: “Rohit”

“ comment”: “The thin lifeguard was swimming in the lake.”

“ date”: “2018-10-27”     }

{           “ name”: “Amol”

“ comment”: “Swimmers race with the skinny lifeguard in the lake”

“ date”: “2018-10-28”     }

Let’s assume we are interested in the comment fields of a document. We have two texts to consider for analysis.
1. The thin lifeguard was swimming in the lake
2. Swimmers race with the skinny lifeguard in lake






►Tokenization: To create an inverted index, we simply split the comment of each document into separate words (which we call terms or tokens), creates a sorted list of all the unique terms, along with the list in which document each term/token appears.

Token Present in Document
   
Swimmers 2
The 1
in 1,2
lifeguard 1,2
lake 1,2
race 2
skinny 2
swimming 2
the 1,2
thin 1
was 1
with 1

►Filtering: After the tokenization filtering process is applied to these. Filters are such as:

►Removing stop words (the, in, etc. of the English word)

►Lowercasing (To make search case insensitive)

►stemming (swimming to swim)

►synonymous ( thin == skinny )

After these operations, the output which is an inverted index is pushed into buffer.

►ANALYSERS In ElasticSearch






Elasticsearch provides pre-builtin analyzers which can be used in any index without further configuration. Here is a list of elastic search built-in analyzers.

  • Standard Analyzer (Default)
  • Simple Analyzer
  • Whitespace Analyzer
  • Stop Analyzer
  • Keyword Analyzer
  • Pattern Analyzer
  • Language Analyzers (English, Hindi, French, Spanish & many more)
  • Custom Analyser (we can define our own custom analyzer as well)

Bulk Load in ElasticSearch

Bulk load is nothing but indexing/inserting more than one documents at a time.

We have to use the _bulk keyword to upload bulk data.

Example.

(this command will index these 3 documents into vehicles index inside cars type.)

POST /vehicles/cars/_bulk

{ “index”: {}}            //index for doc 1

{ “price” : 10000, “colour” : “white”, “make” : “Honda”, “sold” : “2016-10-28”, “condition”: “okay”}

{ “index”: {}}            //index for doc 2

{ “price” : 20000, “colour” : “white”, “make” : “Honda”, “sold” : “2016-11-05”, “condition”: “new”}

{ “index”: {}}            //index for doc 3

{ “price” : 30000, “colour” : “green”, “make” : “ford”, “sold” : “2016-05-18”, “condition”: “new”}

Leave a Reply

Your email address will not be published. Required fields are marked *

Quick Support

image image