Global introduction to Elasticsearch presented at the BigData meetup: use cases, getting started, REST CRUD API, mapping, Search API, Query DSL with queries and filters, analyzers, analytics with facets and aggregations, percolator, high availability, clients & integrations, ...
Introduction to Elasticsearch
27th May 2014 - BigData Meetup
Eric Rodriguez @wavyx
About Me
Eric Rodriguez - Founder of data.be
• Web entrepreneur
• Data addict
• Multi-language: PHP, Java/Groovy/Grails, .Net, …
be.linkedin.com/in/erodriguez - github.com/wavyx - @wavyx
Elasticsearch - Company
• Founded in 2012 => http://www.elasticsearch.com
• Professional services
• Training
• Consultancy / Development support
• Production support subscription (3 levels of SLAs)
Enterprises using Elasticsearch
(M)ELK Stack
• Elasticsearch - Search server based on Lucene
• Logstash - Tool for managing events and logs
• Kibana - Visualize logs and time-stamped data
• Marvel - Monitor your cluster’s heartbeat
You Know, for Search…
Logstash
• Collect, parse, index, and search logs
Kibana
• A versatile dashboard to see and interact with your data
Marvel
• Monitor the health of your cluster
• Cluster-wide metrics, overview of all nodes and indices, and events (master election, new nodes)
• Real-time search and analytics engine
• Open-source
• Lucene
• JSON
• Schema-free document store
• RESTful API
• Documentation
• Scalability
• High availability
• Distributed
• Multi-tenancy
• Per-operation persistence
Use Cases
• Full-Text Search
• Data Store
• Analytics
• Alerts
• Ads
• …
Copyright 2014 Elasticsearch Inc / Elasticsearch BV. All rights reserved. Content used with permission from Elasticsearch.
Elasticsearch core
• Apache Lucene is a high-performance, full-featured text search engine library written entirely in Java
• Elasticsearch added value: “Simple is best”
• Simple API (with documentation)
• JSON & RESTful
• Sharding & Replication
• Extensibility: plugins and scripts
• Interoperability: clients and integrations
Terms for DBAs
Elasticsearch => RDBMS
• Index => Database
• Type => Table
• Document => Row
• Field => Column
• Mapping => Schema
Plug & Play
• Zero configuration
• 4 LoC to get started ;)
Alive!
=> http://localhost:9200/?pretty
REST
• Check your cluster, node, and index health, status, and statistics
• Administer your cluster, node, and index data and metadata
• Perform CRUD (Create, Read, Update, and Delete) and search operations against your indexes
• Execute advanced search operations such as paging, sorting, filtering, scripting, faceting, aggregations, and many others
Basic Operations 1/3
• Add a document
• Create index
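The two calls above can be sketched with Python's standard library. The sketch builds the HTTP requests without sending them. It assumes a node on localhost:9200, a hypothetical index `blog`, type `post` and document id `1`. The paths follow the ES 1.x REST conventions `PUT /{index}` and `PUT /{index}/{type}/{id}`.

```python
import json
import urllib.request

BASE_URL = "http://localhost:9200"  # assumption: default local node


def es_request(method, path, body=None):
    """Build (but do not send) a JSON request against the REST API."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    return urllib.request.Request(
        BASE_URL + path,
        data=data,
        method=method,
        headers={"Content-Type": "application/json"},
    )


# Create an index (hypothetical name "blog") with explicit shard settings.
create_index = es_request(
    "PUT", "/blog", {"settings": {"number_of_shards": 1, "number_of_replicas": 0}}
)

# Add (index) a document with id 1 under the hypothetical type "post".
add_doc = es_request(
    "PUT", "/blog/post/1", {"title": "Hello Elasticsearch", "tags": ["intro"]}
)

# Sending is left out on purpose: urllib.request.urlopen(add_doc) would do it.
```

With default settings, indexing into a missing index creates that index automatically. This is why the explicit create step is only needed for custom settings or mappings.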
Basic Operations 2/3
• Modify/Replace a document
• Delete a document
• Delete index
Basic Operations 3/3
• Update a document
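The remaining operations map onto the same URL scheme, sketched below as unsent requests. The `blog`/`post`/`1` names are hypothetical. Replacing re-indexes the whole document under the same id. The `_update` endpoint merges a partial `doc` into the stored source.

```python
import json
import urllib.request

BASE_URL = "http://localhost:9200"  # assumption: default local node


def es_request(method, path, body=None):
    """Build (but do not send) a JSON request against the REST API."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    return urllib.request.Request(BASE_URL + path, data=data, method=method)


# Replace: indexing the same id again overwrites the whole document.
replace_doc = es_request("PUT", "/blog/post/1", {"title": "Hello again"})

# Partial update: only the fields under "doc" are merged into the stored source.
update_doc = es_request("POST", "/blog/post/1/_update", {"doc": {"tags": ["intro", "es"]}})

# Delete a single document, then the whole index.
delete_doc = es_request("DELETE", "/blog/post/1")
delete_index = es_request("DELETE", "/blog")
```

A partial update still re-indexes the merged document internally. It saves the client a get-modify-put round trip, not the indexing work.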
Mapping 1/2
• Define how a document should be mapped (similar to a schema): searchable fields, tokenization, storage, …
• Explicit mapping is defined on an index/type level
• A default mapping is automatically created
Mapping 2/2
• Core types: string, integer/long, float/double, boolean, and null
• Other types: Array, Object, Nested, IP, GeoPoint, GeoShape, Attachment
• Example
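The original example slide was an image. The sketch below is an explicit mapping in ES 1.x syntax, written as a Python dict. The `post` type and all field names are hypothetical. It combines a core type with the "other" types listed above.

```python
import json

# Hypothetical "post" type in a hypothetical "blog" index (ES 1.x mapping syntax).
mapping = {
    "mappings": {
        "post": {
            "properties": {
                "title": {"type": "string", "analyzer": "english"},     # full-text field
                "author": {"type": "string", "index": "not_analyzed"},  # exact values only
                "views": {"type": "integer"},
                "published": {"type": "boolean"},
                "author_ip": {"type": "ip"},
                "location": {"type": "geo_point"},
                "tags": {"type": "string"},  # any field may hold an array of values
                "comments": {                # each nested object is indexed as a hidden sub-document
                    "type": "nested",
                    "properties": {"text": {"type": "string"}},
                },
            }
        }
    }
}

# Sent as the body of PUT /blog when creating the index.
body = json.dumps(mapping, indent=2)
```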
Search API 1/2
• Multi-index, Multi-type
• URI search: Google-like operators (AND/OR), fields, sort, paging, wildcards, …
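A URI search packs the whole query into the URL. The helper below builds a multi-index, multi-type search URL with the standard `q`, `from`, `size` and `sort` parameters. `uri_search` is a hypothetical name, and the index, type and field names are illustrative.

```python
from urllib.parse import urlencode

BASE_URL = "http://localhost:9200"  # assumption: default local node


def uri_search(indices, types, q, sort=None, from_=0, size=10):
    """Build a URI search URL across several indices and (optionally) types."""
    path = "/" + ",".join(indices)
    if types:
        path += "/" + ",".join(types)
    params = {"q": q, "from": from_, "size": size}
    if sort:
        params["sort"] = sort
    return BASE_URL + path + "/_search?" + urlencode(params)


# Wildcard plus boolean operator in the query string, sorted by a numeric field.
url = uri_search(["blog", "news"], ["post"], "title:elastic* AND author:eric", sort="views:desc")
```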
Search API 2/2
• Paging & Sort
• Fields: selection, scripts
• Post filter
• Highlighting
• Rescoring
• Explain
• …
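Most options listed above also exist in the request-body form of the Search API. The sketch below combines them in ES 1.x syntax. The index and field names are hypothetical. Results keep the default `_score` ordering because rescoring is meant to re-rank score-ordered hits.

```python
import json

# Request body for POST /blog/_search (hypothetical index and field names).
search_body = {
    "from": 0,                                   # paging: offset...
    "size": 10,                                  # ...and page size
    "fields": ["title", "views"],                # field selection
    "query": {"match": {"title": "elasticsearch"}},
    "post_filter": {"term": {"tags": "intro"}},  # filters hits after aggregations are computed
    "highlight": {"fields": {"title": {}}},
    "rescore": {
        "window_size": 50,                       # re-rank only the top 50 hits per shard
        "query": {"rescore_query": {"match_phrase": {"title": "elasticsearch"}}},
    },
    "explain": True,                             # include a scoring explanation per hit
}

payload = json.dumps(search_body)
```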
Query DSL
• “SQL” for Elasticsearch
• Queries should be used
  • for full-text search
  • where the result depends on a relevance score
• Filters should be used
  • for binary yes/no searches
  • for queries on exact values
Basic Queries
Basic Filters
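In ES 1.x, the query/filter split above is usually expressed with a `filtered` query. A scored full-text query is wrapped together with filters that only say yes or no. Field names below are hypothetical.

```python
import json

# ES 1.x "filtered" query (hypothetical field names):
# - the match query is scored by relevance (full-text search)
# - the term/range filters are yes/no checks on exact values and are cacheable
filtered_search = {
    "query": {
        "filtered": {
            "query": {"match": {"title": "distributed search"}},
            "filter": {
                "bool": {
                    "must": [
                        {"term": {"status": "published"}},
                        {"range": {"views": {"gte": 100}}},
                    ]
                }
            },
        }
    }
}

payload = json.dumps(filtered_search)
```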
Analysis 1/2
• Analysis is the extraction of “terms” from a given text
• Processing natural language to make it computer-searchable
• Configurable registry of analyzers that can be used
  • to break analyzed fields into terms when a document is indexed
  • to process query strings
Analysis 2/2
• Analyzers are composed of
  • a single Tokenizer (may be preceded by one or more CharFilters)
  • zero or more TokenFilters
• Default analyzers: standard, pattern, whitespace, language, snowball
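The analyzer pipeline above (char filters, then exactly one tokenizer, then token filters) can be modelled in a few lines. This is a toy illustration, not Lucene's implementation. The regex tokenizer is a crude stand-in for the Unicode-aware `standard` tokenizer. The stop-word list is a tiny hypothetical sample.

```python
import re

STOP_WORDS = {"a", "an", "and", "the", "of", "to"}  # assumption: tiny illustrative list


def html_strip_char_filter(text):
    """Char filter: rewrites the raw string before tokenization."""
    return re.sub(r"<[^>]+>", " ", text)


def simple_tokenizer(text):
    """Tokenizer: splits the text into word tokens (stand-in for 'standard')."""
    return re.findall(r"\w+", text)


def lowercase_filter(tokens):
    return [t.lower() for t in tokens]


def stop_filter(tokens):
    return [t for t in tokens if t not in STOP_WORDS]


def analyze(text, char_filters=(html_strip_char_filter,),
            tokenizer=simple_tokenizer,
            token_filters=(lowercase_filter, stop_filter)):
    """Char filters -> exactly one tokenizer -> token filters, applied in order."""
    for char_filter in char_filters:
        text = char_filter(text)
    tokens = tokenizer(text)
    for token_filter in token_filters:
        tokens = token_filter(tokens)
    return tokens


print(analyze("<p>The Quick Brown Fox</p>"))  # ['quick', 'brown', 'fox']
```

In real Elasticsearch, a comparable chain is declared in the index settings. It would be a custom analyzer with `"char_filter": ["html_strip"]`, `"tokenizer": "standard"` and `"filter": ["lowercase", "stop"]`. The same analyzer is then applied both at index time and to query strings.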
Analytics
• Aggregation of information: similar to “group by”
• Facets
  • Aggregated data based on a search query
  • One-dimensional results
  • Ex: “term facets” return facet counts for the various values of a specific field (think color, tag, category, …)
• Aggregations (ES 1.0+)
  • Nested facets
  • Basic stats: mean, min, max, std dev, term counts
  • Significant terms, percentiles, cardinality estimations
Facets
• Not yet deprecated, but use aggregations!
• Various facets: terms, range, histogram, date, statistical, geo distance, …
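The sketch contrasts a terms facet with the equivalent terms aggregation, using a hypothetical `color` field. A toy single-node counter shows what both compute. The toy assumes single-valued fields. On a multi-shard index the real counts can be approximate, because each shard returns only its own top terms.

```python
from collections import Counter

# Legacy terms facet on a hypothetical "color" field...
facet_request = {
    "query": {"match_all": {}},
    "facets": {"colors": {"terms": {"field": "color", "size": 10}}},
}

# ...and the equivalent terms aggregation introduced in 1.0.
agg_request = {
    "query": {"match_all": {}},
    "aggs": {"colors": {"terms": {"field": "color", "size": 10}}},
}


def terms_counts(docs, field, size=10):
    """Toy, single-node version: most frequent values of a field by document count."""
    counts = Counter(doc[field] for doc in docs if field in doc)
    return counts.most_common(size)


docs = [{"color": "red"}, {"color": "blue"}, {"color": "red"}, {"size": "XL"}]
print(terms_counts(docs, "color"))  # [('red', 2), ('blue', 1)]
```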
Aggregations
• A generic, powerful framework that can be divided into 2 main families:
  • Bucketing: each bucket is associated with a key and a document criterion. The aggregation process provides a list of buckets, each one with a set of documents that "belong" to it.
  • Metric: aggregations that keep track of and compute metrics over a set of documents.
• Aggregations can be nested!
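Nesting is what makes the two families combine. In the sketch, a terms bucket aggregation carries a metric sub-aggregation, in ES 1.x syntax with hypothetical `color` and `price` fields. A toy evaluator reproduces the bucket-then-metric logic in memory.

```python
from collections import defaultdict

# Bucket aggregation (terms on "color") with a nested metric aggregation (avg of "price").
request = {
    "size": 0,  # only the aggregation results are of interest
    "aggs": {
        "by_color": {
            "terms": {"field": "color"},
            "aggs": {"avg_price": {"avg": {"field": "price"}}},
        }
    },
}


def bucket_then_metric(docs, bucket_field, metric_field):
    """Toy evaluation: group docs into buckets by key, then average a field per bucket."""
    buckets = defaultdict(list)
    for doc in docs:
        buckets[doc[bucket_field]].append(doc[metric_field])
    return {
        key: {"doc_count": len(values), "avg": sum(values) / len(values)}
        for key, values in buckets.items()
    }


docs = [
    {"color": "red", "price": 10},
    {"color": "red", "price": 30},
    {"color": "blue", "price": 15},
]
print(bucket_then_metric(docs, "color", "price"))
# {'red': {'doc_count': 2, 'avg': 20.0}, 'blue': {'doc_count': 1, 'avg': 15.0}}
```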
Bucket Aggregators
• global
• filter
• missing
• terms
• range
• date range
• ip range
• histogram
• date histogram
• geo distance
• geohash grid
• nested
• reverse nested
• top hits (version 1.3)
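A histogram bucket's key comes from rounding each value down to a multiple of the interval. The sketch applies that rounding to a hypothetical list of prices. This toy returns only non-empty buckets.

```python
import math
from collections import Counter


def histogram_key(value, interval):
    """Bucket key: the value rounded down to the nearest multiple of the interval."""
    return math.floor(value / interval) * interval


def histogram(values, interval):
    """Count values per bucket key, ordered by key."""
    counts = Counter(histogram_key(v, interval) for v in values)
    return sorted(counts.items())


prices = [3, 7, 12, 15, 19, 42]
print(histogram(prices, 10))  # [(0, 2), (10, 3), (40, 1)]
```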
Metrics Aggregators
• count
• stats
• extended stats
• cardinality
• percentiles
• min
• max
• sum
• avg
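Most of the metrics above can be derived from one pass over the values. The figures below match those reported by the stats and extended stats aggregations, using the population variance formula (sum of squares / count minus the squared mean). The helper is a hypothetical name and assumes a non-empty list.

```python
import math


def extended_stats(values):
    """Compute stats / extended stats figures for a non-empty list of numbers."""
    count = len(values)
    total = sum(values)
    sum_of_squares = sum(v * v for v in values)
    avg = total / count
    variance = sum_of_squares / count - avg * avg  # population variance
    return {
        "count": count,
        "min": min(values),
        "max": max(values),
        "avg": avg,
        "sum": total,
        "sum_of_squares": sum_of_squares,
        "variance": variance,
        "std_deviation": math.sqrt(variance),
    }


stats = extended_stats([2, 4, 4, 4, 5, 5, 7, 9])
print(stats["avg"], stats["std_deviation"])  # 5.0 2.0
```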
Search for end users
• Suggesters (“Did you mean?”): terms, phrases, completion, context
• “More like this”: find documents that are "like" a provided text by running it against one or more fields
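The idea behind a term-level "Did you mean?" suggester is to propose the closest existing term by edit distance. The toy below uses plain Levenshtein distance over a hypothetical term list. It is not Elasticsearch's implementation, which works on the index's term dictionary with its own distance and scoring options. Like the default "missing" suggest mode, it suggests nothing when the input term already exists.

```python
def levenshtein(a, b):
    """Classic edit distance: insertions, deletions and substitutions cost 1."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # deletion
                curr[j - 1] + 1,           # insertion
                prev[j - 1] + (ca != cb),  # substitution
            ))
        prev = curr
    return prev[-1]


def did_you_mean(term, dictionary, max_edits=2):
    """Return the closest known term within max_edits, or None."""
    if term in dictionary:
        return None  # term exists: nothing to suggest
    candidates = [(levenshtein(term, t), t) for t in dictionary]
    candidates = [c for c in candidates if c[0] <= max_edits]
    return min(candidates)[1] if candidates else None


print(did_you_mean("serch", ["search", "starch", "march"]))  # search
```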
Percolator
• Classic ES
  1. Add & index documents
  2. Search with queries
  3. Retrieve matching documents
• Percolator
  1. Add & index queries
  2. Percolate documents
  3. Retrieve matching queries
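The inverted flow above can be shown with a toy in-memory percolator that stores queries and returns the ids of the queries a document matches. Each stored query here is a hypothetical "all of these words" rule. In ES 1.x, real percolator queries are full Query DSL, registered under the `.percolator` type and matched through the `_percolate` endpoint.

```python
import re


class ToyPercolator:
    """Stores queries instead of documents; percolating a document returns matching query ids."""

    def __init__(self):
        self.queries = {}

    def register(self, query_id, words):
        """Step 1: add & index a query (here: a set of required words)."""
        self.queries[query_id] = {w.lower() for w in words}

    def percolate(self, doc_text):
        """Steps 2-3: percolate a document, retrieve the ids of matching queries."""
        terms = set(re.findall(r"\w+", doc_text.lower()))
        return sorted(qid for qid, words in self.queries.items() if words <= terms)


p = ToyPercolator()
p.register("bonsai-alert", ["bonsai", "tree"])
p.register("storm-alert", ["storm", "warning"])
print(p.percolate("A new bonsai tree in the office"))  # ['bonsai-alert']
```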
Why Percolate?!
• Alerts: social media mentions, weather forecast, news alerts
• Automatic Monitoring: price monitoring, stock alerts, logs
• Ads: display targeted ads based on user’s search queries
• Enrich: percolate new documents, then add query matches as document tags
High Availability 1/2
• Sharding: write scalability
  • Split logical data over multiple machines & control data flows
  • Each index has a fixed number of shards
  • Improves indexing performance
• Replication: read scalability
  • Each shard can have zero or more replicas (dynamic setting)
  • Removes the SPOF (Single Point Of Failure)
  • Improves search performance
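Why is the number of primary shards fixed while replicas are dynamic? A document's shard comes from `hash(routing) % number_of_primary_shards`, where the routing value defaults to the document id. The sketch uses `zlib.crc32` as a stand-in hash, which is not the function Elasticsearch uses. The helper names are hypothetical.

```python
import zlib


def shard_for(doc_id, number_of_primary_shards):
    """Primary shard for a document: hash(routing) % number_of_primary_shards.

    zlib.crc32 is a stand-in; Elasticsearch uses its own hash function.
    """
    return zlib.crc32(doc_id.encode("utf-8")) % number_of_primary_shards


def total_shard_copies(number_of_primary_shards, number_of_replicas):
    """Every primary shard plus its replicas."""
    return number_of_primary_shards * (1 + number_of_replicas)


# Index creation settings: shard count is fixed here, replica count can change later.
index_settings = {"settings": {"number_of_shards": 5, "number_of_replicas": 1}}

# Changing the modulus would send most existing ids to a different shard,
# which is why the primary shard count cannot change after creation.
assignments = {doc_id: shard_for(doc_id, 5) for doc_id in ["1", "2", "3", "42"]}
```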
High Availability 2/2
• Zen Discovery
  • Automatic discovery of nodes within a cluster and election of a master node
  • Useful for failover and replication
  • Specific modules: Amazon EC2, Microsoft Azure, Google Compute Engine
• Snapshot & Restore module
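Master election is only safe if two halves of a split cluster cannot both elect a master. Zen discovery's `discovery.zen.minimum_master_nodes` setting is commonly set to a strict majority of master-eligible nodes. The sketch computes that quorum. Both helper names are hypothetical.

```python
def minimum_master_nodes(master_eligible_nodes):
    """Quorum of master-eligible nodes: a strict majority, (N // 2) + 1."""
    return master_eligible_nodes // 2 + 1


def can_elect_master(visible_master_eligible, total_master_eligible):
    """A network partition may elect a master only if it still sees a quorum."""
    return visible_master_eligible >= minimum_master_nodes(total_master_eligible)


# Corresponding elasticsearch.yml line for three master-eligible nodes:
#   discovery.zen.minimum_master_nodes: 2
```

With three nodes split 2/1, only the two-node side reaches the quorum of 2. This avoids a "split brain" with two masters.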
Cluster Management
• Marvel - http://www.elasticsearch.org/overview/marvel/
• BigDesk - http://bigdesk.org/
• Paramedic - https://github.com/karmi/elasticsearch-paramedic
• KOPF - https://github.com/lmenezes/elasticsearch-kopf/
• Elastic HQ - http://www.elastichq.org/
Clients & Integration
• Ecosystem: Kibana, Logstash, Marvel, Hadoop integration
• API Clients: Java, Javascript, Groovy, PHP, Perl, Python, .Net, Ruby, Scala, Clojure, Go, Erlang, …
• Integrations: Grails, Django, Play!, Symfony2, Carrot2, Spring, Drupal, Wordpress, …
• Rivers: CouchDB, JDBC, MongoDB, Neo4j, Redis, RabbitMQ, ActiveMQ, Amazon SQS, File System, Twitter, Wikipedia, RSS, …
Fast & Furious Evolution
Version 1.1 - March 25, 2014
• Cardinality Agg
• Percentiles Agg
• Significant Terms Agg
• Search Templates
• Cross fields search
• Alias for indices & templates
Version 1.2 - May 22, 2014
• Java 7
• Indexing & Merging performance
• Aggregations performance
• Context suggester
• Deep scrolling
• Field value factor
Benchmark API coming in 1.3
Version 1.0 - Feb 12, 2014
• Aggregations
• Snapshot & Restore
• Distributed Percolator
• Cat API
• Federated search
• Doc values
• Circuit breaker
Resources
• http://www.elasticsearch.org/guide/
• http://www.elasticsearch.org/videos/
• http://www.elasticsearchtutorial.com/
• http://exploringelasticsearch.com/
• http://joelabrahamsson.com/elasticsearch-101/
• http://belczyk.com/2014/01/elasticsearch-recomended-learning-materials/
• http://www.elasticsearch.org/guide/en/elasticsearch/reference/1.x/modules-plugins.html
Books
• Elasticsearch Server - http://www.packtpub.com/elasticsearch-server-2e/book
• Elasticsearch in Action - http://www.manning.com/hinman/
• Elasticsearch Cookbook - http://www.packtpub.com/elasticsearch-cookbook/book
• Mastering Elasticsearch - http://www.packtpub.com/mastering-elasticsearch-querying-and-data-handling/book
• Elasticsearch: The Definitive Guide - http://www.elasticsearch.org/blog/elasticsearch-definitive-guide/
Thank you!
[email protected] - @wavyx
be.linkedin.com/in/erodriguez - github.com/wavyx
http://www.meetup.com/ElasticSearch-User-Group-Belux-Belgium-Luxembourg/