ElasticSearch for .NET Developers

Ben van Mol

SEARCH ENGINE

Why would I need one?

Search is more than text comparison

Search must advise

Search must be intelligent

Search must aggregate

What is ElasticSearch?

“flexible and powerful open-source, distributed (NoSQL), RESTful search engine built on top of Lucene” (http://www.elastic.co)

Features: real-time data, real-time analytics, distributed, high availability, multi-tenancy, full-text search, document-oriented, conflict management, schema-free, RESTful API, per-operation persistence, Apache 2 open-source license, built on top of Apache Lucene.


Installation

Procedure

- Java based, requires Java 7 or later
- The same JVM version is required on all nodes
- Set a bunch of environment variables
- Fill in the ElasticSearch config files
- Streamlined installation available for Windows (local service): https://github.com/rgl/elasticsearch-setup/releases




Scalability & performance

Scalability

NoSQL databases are designed to scale out and often perform better on large data volumes; their data model addresses several issues that the relational model is not designed to address:

- Structured & fixed data model vs. dynamic model

- Efficient, scale-out architecture instead of expensive, monolithic architecture (scale-up)

- Object-oriented programming that is easy to use and flexible

Data representation in JSON

Scalability - Architecture

Cluster: logical grouping of multiple nodes

Node: an ElasticSearch server instance

Master: the node in charge of managing cluster-wide operations
- Only one per cluster
- Not a bottleneck for queries

Shard: low-level worker instance that holds a slice of all data
- Each document belongs to a single primary shard
- The number of primary shards is set during index creation
- Determines the amount of data stored in each shard

Replica: a copy of a primary shard on a different node
- Can be created at any time
- Spreading over nodes is done automatically

POST /<index name>
{
  "settings" : { "number_of_shards" : 3, "number_of_replicas" : 1 }
}

Create an index

1 node

2 nodes

3 nodes

3 nodes, 2 replicas

Having more replica shards on the same number of nodes doesn’t increase performance at all, because each shard has access to a smaller fraction of its node’s resources, but it does add redundancy.
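The trade-off can be checked with simple arithmetic. The sketch below is a language-neutral illustration in Python and not part of the original slides; the function names are hypothetical, and it assumes shards are spread evenly over the nodes.

```python
def total_shards(primaries, replicas):
    """Primary shards plus one full copy of them per replica."""
    return primaries * (1 + replicas)


def shards_per_node(primaries, replicas, nodes):
    """Average number of shards per node (assumes an even spread)."""
    return total_shards(primaries, replicas) / nodes


# 3 primary shards on 3 nodes, with 1 and then 2 replicas
for replicas in (1, 2):
    print(replicas, total_shards(3, replicas), shards_per_node(3, replicas, 3))
```

With the same three nodes, going from one to two replicas raises the load from two to three shards per node, which is why redundancy improves while per-shard resources shrink.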

Default Routing

Hashes the ID of a document and uses that to find a shard (retrieve document). Gives an even distribution of documents across the entire set of shards
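A minimal sketch of this idea in Python, assuming the usual rule "shard = hash(routing value) modulo number of primary shards". CRC32 is only a stand-in for the hash function ElasticSearch uses internally, and `shard_for` is a hypothetical helper name.

```python
import zlib


def shard_for(routing_value, number_of_primary_shards):
    # Assumption: CRC32 stands in for ElasticSearch's internal hash function.
    digest = zlib.crc32(routing_value.encode("utf-8"))
    return digest % number_of_primary_shards


# By default the routing value is the document ID.
counts = [0, 0, 0]
for doc_id in range(1000):
    counts[shard_for(str(doc_id), 3)] += 1

print(counts)  # number of documents that landed on each of the 3 shards
```

Because the shard number depends on the number of primary shards, that number cannot change after index creation without re-indexing.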

But what about search?

Incoming request

Broadcast & query all shards

Aggregate all results & send back
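The broadcast-and-aggregate step can be sketched as follows. This is a simplified Python illustration, not ElasticSearch code: the per-shard hit lists and the function name `scatter_gather` are made up for the example.

```python
import heapq

# Hypothetical local results per shard: (score, document id) pairs.
shard_results = [
    [(0.9, "doc-1"), (0.4, "doc-7")],
    [(0.8, "doc-3")],
    [(0.7, "doc-5"), (0.6, "doc-2")],
]


def scatter_gather(per_shard_hits, size):
    """Merge the local hits of every shard into one global top-`size` list."""
    all_hits = (hit for hits in per_shard_hits for hit in hits)
    return heapq.nlargest(size, all_hits)


top = scatter_gather(shard_results, size=3)
print(top)
```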

Custom Routing

Configure routing for a certain type:

PUT /<index name>/<type>/_mapping
{ "order": { "_routing": { "required": true, "path": "customerID" } } }

Search for the documents of user user123:

GET /<index name>/<type>/_search?routing=user123
{ "query": { "match_all": {} } }

Tell ElasticSearch which property to use to determine routing, e.g. zipcode, age, …

Default routing ensures that distribution is fairly uniform across all shards. Once you start implementing your own custom schemes, it is entirely possible that this uniformity is lost.
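The loss of uniformity can be illustrated with a small Python sketch. The order data is invented, CRC32 is again only a stand-in for the real hash function, and `shard_for` is a hypothetical helper.

```python
import zlib
from collections import Counter


def shard_for(routing_value, shards):
    # Assumption: CRC32 stands in for the real hash function.
    return zlib.crc32(routing_value.encode("utf-8")) % shards


# Hypothetical orders: customer "user123" places 80 of the 100 orders.
orders = [
    {"id": str(i), "customerID": "user123" if i < 80 else "user%d" % i}
    for i in range(100)
]

# Routing on customerID sends all orders of one customer to the same shard.
per_shard = Counter(shard_for(o["customerID"], 3) for o in orders)
hot_shard = shard_for("user123", 3)

print(per_shard, hot_shard)
```

Searches with `routing=user123` only need to visit one shard, but that shard now holds at least 80% of the data.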

Advanced Search Capabilities

Dealing with human language

Indexing

Example: <div>Here is some example text including an extract of 9 poems</div>

Analyzers

Character filters:
- strip HTML and extract the actual text
- convert 9 to nine

Tokenizer: create individual terms or tokens from text, minding commas, whitespace, periods, hyphens, …

Token filters:
- lower-case all words
- remove stopwords like ‘an’, ‘the’, …
- stemming: reduce verbs and words to their stem

{Here} {is} {some} {example} {text} {including} {extract} {nine} {poems}
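The three stages can be sketched in a few lines of Python. This is a toy pipeline for illustration only: the number mapping and the stop list are tiny hypothetical tables, stemming is left out, and none of the function names come from ElasticSearch.

```python
import re

NUMBER_WORDS = {"9": "nine"}           # hypothetical mapping table
STOP_WORDS = {"a", "an", "the", "of"}  # hypothetical, tiny stop list


def char_filter(text):
    """Strip HTML tags, then replace known digits by words."""
    text = re.sub(r"<[^>]+>", " ", text)
    return re.sub(r"\d+", lambda m: NUMBER_WORDS.get(m.group(), m.group()), text)


def tokenize(text):
    """Split the text into word tokens."""
    return re.findall(r"\w+", text)


def token_filter(tokens):
    """Lower-case every token and drop stop words (no stemming here)."""
    lowered = (token.lower() for token in tokens)
    return [token for token in lowered if token not in STOP_WORDS]


def analyze(text):
    return token_filter(tokenize(char_filter(text)))


tokens = analyze("<div>Here is some example text including an extract of 9 poems</div>")
print(tokens)
```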

Text Analysis - Experiments

Whitespace analyzer: whitespace tokenizer, a tokenizer of type whitespace that divides text at whitespace.

Sentence: Convert the title-case text using the ToLower(string) command.

Result: {Convert} {the} {title-case} {text} {using} {the} {ToLower(string)} {command.}


Simple: standard tokenizer, a tokenizer of type standard providing a grammar-based tokenizer that is a good tokenizer for most European language documents, followed by a lower-case token filter.


Result: {convert} {the} {title} {case} {text} {using} {the} {tolower} {string} {command}
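The difference between the two results can be reproduced with a rough Python approximation. The regular expression below only mimics the grammar-based tokenizer for this one sentence; it is an assumption for illustration, not the real implementation.

```python
import re

sentence = "Convert the title-case text using the ToLower(string) command."


def whitespace_tokens(text):
    """Split on whitespace only; punctuation and case are kept."""
    return text.split()


def word_tokens_lowercased(text):
    """Rough stand-in for a grammar-based tokenizer plus a lower-case filter."""
    return re.findall(r"[a-z0-9]+", text.lower())


print(whitespace_tokens(sentence))
print(word_tokens_lowercased(sentence))
```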


Stop analyzer: standard tokenizer, lower-case token filter, stop token filter

A token filter of type stop removes stop words (words that carry no meaning for search) from token streams. Support for multiple languages.


Result: {convert} {title} {case} {text} {using} {tolower} {string} {command}


Snowball: standard tokenizer, lower-case token filter, stop token filter, stemming (Snowball-generated stemmer)

A filter that stems words (reduces a word to its core) using a Snowball-generated stemmer. Support for multiple languages.


Result: {convert} {title} {case} {text} {use} {tolower} {string} {command}

Text Analysis – Adding Custom Analyzers

PUT /my-index/_settings
{
  "index": {
    "analysis": {
      "analyzer": {
        "YourCustomAnalyzer": {
          "type": "custom",
          "char_filter": [ "html_strip" ],
          "tokenizer": "standard",
          "filter": [ "lowercase", "stop", "snowball" ]
        }
      }
    }
  }
}

A list of available analysis tools:
- Character filters: http://bit.ly/1H3hgJF
- Tokenizers: http://bit.ly/1zIU2IO
- Token filters: http://bit.ly/1AJXCO2

Possible to create your own combination!


Text Analysis – Define analyzer

- Create a mapping type (cf. a table)
- Assign fields
- Define field types (string, int, date, …)
- Define the analyzer to be used
- Define the boost value on a field
- Define the routing
- …

PUT /my_index/_mapping/my_type
{
  "my_type": {
    "properties": {
      "english_title": { "type": "string", "analyzer": "english" }
    }
  }
}

ELASTIC AND .NET

Let’s get dirty!

What is NEST?

NEST

• All request & response objects represented
• Strongly typed Query DSL implementation
• Supports fluent syntax
• Uses ElasticSearch.NET

ElasticSearch.NET

• Low-level, dependency-free client
• All ES endpoints are available as methods

ElasticSearch RESTful API

http://nest.azurewebsites.net/



NEST – Connection Initialization

Initialize an ElasticClient:

All actions on the ElasticSearch cluster are performed using the ElasticClient

For example:
- Search
- Index
- DeleteIndex/CreateIndex
- …

Uri node = new Uri("http://192.168.137.73:9200");
ConnectionSettings settings = new ConnectionSettings(node, defaultIndex: "products");
ElasticClient client = new ElasticClient(settings);

Index your content

JSON

PUT /products/product/1

http://localhost:9200/products/product/1

{
  "id": "1",
  "name": "MacBook Air",
  "price": 1099,
  "descr": "Some lengthy never-read description",
  "attributes": { "color": "silver", "display": 13.3, "ram": 4 }
}

.NET

- Index the raw JSON string
- Index a type: automatically infers index, type and ID
- Use ElasticType to define type behavior
- Use ElasticProperty to define field behavior
- Define explicit values for inferred ones

More information: http://nest.azurewebsites.net/nest/index-type-inference.html




Index your Content - .NET

Raw JSON string

client.Raw.Index("products", "product", new JavaScriptSerializer().Serialize(prod));

Type-based indexing

client.Index(product);

Modify out-of-the-box behavior using attributes

[ElasticType(Name = "Product", IdProperty = "id")]
public class Product
{
    public int id { get; set; }

    [ElasticProperty(Name = "name", Index = FieldIndexOption.Analyzed, Type = FieldType.String, Analyzer = "standard")]
    public string name { get; set; }

    // …
}

Query your content – JSON Query

JSON examples

http://localhost:9200/products/product/_search

A term query is not analyzed: it returns nothing for "MacBook Air" if the analyzer lower-cased the field and split it on whitespace at index time!

{ "query" : { "term" : { "name": "MacBook Air" }}}
{ "query" : { "prefix" : { "name": "Mac" }}}
{ "query" : { "range" : { "price" : { "from" : 1000, "to": 2000 } } } }
{ "from": 0, "size": 10, "query" : { "term" : { "name": "MacBook Air" }}}
{ "sort" : { "name" : { "order": "asc" } }, "query" : { "term" : { "name": "MacBook Air" }}}
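Why an exact term lookup can come back empty is easiest to see on a toy inverted index. The Python sketch below is an illustration under simplifying assumptions: the analyzer only lower-cases and splits on whitespace, and all function names are hypothetical.

```python
from collections import defaultdict


def analyze(text):
    """Toy analyzer: lower-case, then split on whitespace."""
    return text.lower().split()


index = defaultdict(set)  # token -> ids of the documents containing it


def index_document(doc_id, name):
    for token in analyze(name):
        index[token].add(doc_id)


def term_query(term):
    """Looks the term up as-is; the query text is NOT analyzed."""
    return index.get(term, set())


def match_query(text):
    """Analyzes the query text first, then looks up every token."""
    hits = set()
    for token in analyze(text):
        hits |= index.get(token, set())
    return hits


index_document(1, "MacBook Air")
print(term_query("MacBook Air"), term_query("macbook"), match_query("MacBook Air"))
```

The index only contains the tokens `macbook` and `air`, so the unanalyzed term "MacBook Air" has nothing to match.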

Query your content – JSON Result

{
  "took": 1,
  "timed_out": false,
  "_shards": { "total": 5, "successful": 5, "failed": 0 },
  "hits": {
    "total": 2,
    "max_score": 0.076713204,
    "hits": [
      {
        "_index": "products",
        "_type": "Product",
        "_id": "1",
        "_score": 0.076713204,
        "_source": {
          "id": 1,
          "name": "MacBook Air",
          "price": 1099.0,
          "descr": "Some lengthy never-read description",
          "attributes": { "color": "silver", "display": 13.300000190734863, "ram": 4 }
        }
      },
      …

Query your content – Query DSL .NET

Retrieve all products from an index using a MatchAll search

result = client.Search<Product>(s => s.MatchAll());

Retrieve all products by using a term query

result = client.Search<Product>(s => s.Query(q => q.Term(t => t.name, "macbook")));
result = client.Search<Product>(s => s.Query(q => q.Term("name", "macbook")));

Search on all fields using the _all built-in property

result = client.Search<Product>(s => s.Query(q => q.Term("_all", "macbook")));

Search on a combination of fields using boolean operators (see Fiddler result)

result = client.Search<Product>(s => s.Query(q => q.Term("name", "macbook") || q.Term("descr", "macbook")));

Query your content – Query DSL

Search on a combination of fields using boolean operators and a range filter on price

result = client.Search<Product>(s => s
    .Query(q => (q.Term("name", "macbook") || q.Term("descr", "macbook"))
        && q.Range(r => r
            .OnField("price")
            .Greater(1000)
            .LowerOrEquals(2000))));

Some more advanced query examples:
- Wildcard query: use wildcards to search for relevant documents
- Span near: search for word combinations within a certain span in the document
- More like this query: finds documents which are ‘like’ a given set of documents using representative terms

More information: http://bit.ly/1A6wpKs


Query your content – Fuzzy searches

Perform a fuzzy search to overcome query string errors

result = client.Search<Product>(s => s
    .Query(q => q
        .Match(m => m
            .Query("makboek")
            .OnField("name")
            .Fuzziness(10)
            .PrefixLength(1))));

Query your content - Paging

Select pages from the full result set using the From & Size parameters

result = client.Search<Product>(s => s
    .Query(q => q.Term("name", "macbook") || q.Term("descr", "macbook"))
    .From(0)
    .Size(1));
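Translating a page number into From and Size values is plain arithmetic. The Python sketch below assumes 1-based page numbers; `page_window` and the sample result list are hypothetical.

```python
def page_window(page, page_size):
    """Translate a 1-based page number into (from, size) values."""
    if page < 1 or page_size < 1:
        raise ValueError("page and page_size must be at least 1")
    return (page - 1) * page_size, page_size


# Hypothetical ranked result set of 25 hits.
hits = ["product-%d" % i for i in range(1, 26)]

start, size = page_window(page=3, page_size=10)
third_page = hits[start:start + size]
print(start, size, third_page)
```

The last page may contain fewer hits than the requested size, as it does here.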

Query your content – Hit Highlighting

Possible to add other pre- and post-tags on specific fields

result = client.Search<Product>(s => s
    .Query(q => q.Term("name", "macbook"))
    .Highlight(h => h
        .PreTags("<b>")
        .PostTags("</b>")
        .OnFields(f => f
            .OnField(e => e.name))));
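What the pre- and post-tags do to a matching field value can be sketched in Python. This is a simplified stand-in that wraps whole-word, case-insensitive matches; the function name `highlight` is hypothetical and the real highlighter works on analyzed tokens.

```python
import re


def highlight(text, term, pre_tag="<b>", post_tag="</b>"):
    """Wrap every whole-word, case-insensitive match of `term` in the tags."""
    pattern = re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)
    return pattern.sub(lambda m: pre_tag + m.group(0) + post_tag, text)


print(highlight("MacBook Air", "macbook"))
print(highlight("MacBook Air", "air", pre_tag="<em>", post_tag="</em>"))
```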

Query your content – Aggregations

Aggregations group documents based on term values

Useful to create a faceted search interface

result = client.Search<Product>(s => s
    .Aggregations(a => a
        .Terms("color", st => st
            .Field(o => o.attributes.color))));
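Conceptually, a terms aggregation counts documents per distinct field value. The Python sketch below illustrates that grouping on invented product data; `terms_aggregation` is a hypothetical helper, and the bucket layout with `key` and `doc_count` only mirrors the shape of the JSON response.

```python
from collections import Counter

# Hypothetical documents.
products = [
    {"name": "MacBook Air", "attributes": {"color": "silver"}},
    {"name": "MacBook Pro", "attributes": {"color": "silver"}},
    {"name": "ThinkPad", "attributes": {"color": "black"}},
]


def terms_aggregation(docs, field_path):
    """Count documents per value of a (possibly nested) field."""
    def value(doc):
        for key in field_path.split("."):
            doc = doc[key]
        return doc

    counts = Counter(value(doc) for doc in docs)
    return [{"key": key, "doc_count": count} for key, count in counts.most_common()]


buckets = terms_aggregation(products, "attributes.color")
print(buckets)
```

Each bucket can be rendered as one facet entry, for example "silver (2)".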

Query your content – Suggesters

Did you mean

Term suggester
- Suggests terms based on edit distance (= the number of edit operations needed to turn one term into another)
- More info: http://bit.ly/1FDFPwr

Phrase suggester
- Adds additional logic on top of the term suggester to select entire corrected phrases instead of individual tokens, weighted based on n-gram language models
- Provides better suggestions because of co-occurrence & frequency
- More info: http://bit.ly/1FbfAKg
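The edit distance behind the term suggester can be sketched with the classic Levenshtein algorithm. The Python code below is an illustration only: the dictionary of indexed terms is invented, `suggest` is a hypothetical helper, and the real suggester also takes term frequencies into account.

```python
def edit_distance(a, b):
    """Levenshtein distance: insertions, deletions and substitutions."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution
            ))
        previous = current
    return previous[-1]


def suggest(term, dictionary, max_distance=2):
    """Return dictionary terms within `max_distance` edits, closest first."""
    candidates = sorted((edit_distance(term, word), word) for word in dictionary)
    return [word for distance, word in candidates if distance <= max_distance]


dictionary = ["macbook", "notebook", "chromebook"]  # hypothetical indexed terms
print(edit_distance("makboek", "macbook"), suggest("makboek", dictionary))
```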




Query your content – Suggesters

Search as you type

Completion suggester
- A so-called prefix suggester
- Does not do spell correction like the term or phrase suggesters, but allows basic auto-complete functionality
- Uses FST (finite state transducer) models and makes them part of the index for faster querying
- More info: http://bit.ly/1HwFKbO

hotel, marriot, mercure, munchen and munich
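Prefix completion over these terms can be sketched in Python. Note the assumption: a sorted list with binary search stands in for the FST that ElasticSearch builds, and `complete` is a hypothetical helper name.

```python
import bisect

terms = sorted(["hotel", "marriot", "mercure", "munchen", "munich"])


def complete(prefix, sorted_terms, limit=5):
    """Return up to `limit` terms that start with `prefix`."""
    start = bisect.bisect_left(sorted_terms, prefix)
    # "\uffff" sorts after any ordinary character, closing the prefix range.
    end = bisect.bisect_right(sorted_terms, prefix + "\uffff")
    return sorted_terms[start:end][:limit]


print(complete("m", terms))
print(complete("mun", terms))
```

Every extra typed character narrows the range, which is what makes search-as-you-type cheap.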




QUESTIONS?
