DESCRIPTION
A short intro to the Elasticsearch technology for young programmers in Bucharest.
Elasticsearch: Scalable Full-Text Search Engine
Thursday, February 27, 14
Goals for this talk
Outline
• What’s full text search and why do we use it?
• What can you do with Elasticsearch?
• Why is Elasticsearch different?
• DEMO TIME!
Text Search: do I really need to explain it?
%LIKE%
• In the beginning there was:
SELECT * FROM tweets WHERE content LIKE '%zuckerberg%'
But that’s not what you usually search for!
• You want:
Search by author
Search by time
Search by sentiment
Search by location
Search by everything!
That’s a lot of metadata!
• You can’t search through all that on the fly if you want realtime results
• You need to index it first!
Inverted Index
• Some documents:
1: ‘Mark Zuckerberg sells Facebook’ [Monday]
2: ‘Facebook buys WhatsApp’ [Tuesday]
3: ‘Mark’s Facebook buys Instagram’ [Monday]
• Inverted index for them:
Facebook: {1, 2, 3}
Mark: {1, 3}
Instagram: {3}
WhatsApp: {2}
[Monday]: {1, 3}
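A minimal sketch of how such an inverted index could be built, assuming a very naive tokenizer (split on words, drop a trailing possessive 's). The names `DOCS`, `tokenize` and `build_index` are made up for illustration; Elasticsearch's real analysis chain is far more elaborate.

```python
import re
from collections import defaultdict

# The three example documents from the slide: id -> (text, day tag).
DOCS = {
    1: ("Mark Zuckerberg sells Facebook", "Monday"),
    2: ("Facebook buys WhatsApp", "Tuesday"),
    3: ("Mark's Facebook buys Instagram", "Monday"),
}


def tokenize(text):
    """Split text into words and strip a trailing possessive 's (a simplification)."""
    words = re.findall(r"[\w'’]+", text)
    return [re.sub(r"['’]s$", "", word) for word in words]


def build_index(docs):
    """Map every term (and every [day] tag) to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, (text, day) in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
        index[f"[{day}]"].add(doc_id)  # metadata is indexed like any other term
    return index


index = build_index(DOCS)
for term in ("Facebook", "Mark", "Instagram", "WhatsApp", "[Monday]"):
    print(term, sorted(index[term]))
```

Looking up a term is now a single dictionary access instead of a scan over every document, which is the whole point of indexing first.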
Ok, now that we have data, we also want some numbers behind it!
• In our previous example:
• Facebook is mentioned 3 times
• There are 2 posts on [Monday]
• The most frequent words are Facebook and Mark
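The numbers above can be read straight off the inverted index. The sketch below assumes the postings from the previous slide are held in a plain dictionary (the variable names are illustrative) and counts in how many documents each term appears.

```python
from collections import Counter

# Postings for the example documents: term -> set of document ids.
index = {
    "Facebook": {1, 2, 3},
    "Mark": {1, 3},
    "Instagram": {3},
    "WhatsApp": {2},
    "[Monday]": {1, 3},
}

# Document frequency: number of documents each term (or tag) appears in.
doc_freq = Counter({term: len(doc_ids) for term, doc_ids in index.items()})

# Keep only real words (not [day] tags) when ranking the most frequent ones.
word_freq = Counter({t: n for t, n in doc_freq.items() if not t.startswith("[")})
top_words = [term for term, _ in word_freq.most_common(2)]

print("Facebook mentions:", doc_freq["Facebook"])
print("Posts on Monday:", doc_freq["[Monday]"])
print("Most frequent words:", top_words)
```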
All 3 put together
Elasticsearch = Search (Content & Metadata) + Analytics (oversimplified)
Let’s look at some search features of Elasticsearch
Features: Complex Queries
• Boolean Operators:
(apple OR pumpkin) AND pie
• Wildcards:
app*: apple, apples, appliance
appl?: apple, apply
• Fuzzy:
back~: back, pack, black, bank
• Ranged:
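To make the wildcard and fuzzy examples concrete, here is a small pure-Python sketch that reproduces their matching behaviour over a tiny vocabulary. It is not how Elasticsearch implements them: `VOCAB`, `wildcard` and `fuzzy` are hypothetical helpers, and the fuzzy match assumes a maximum edit distance of 1, which is enough to cover the examples on the slide.

```python
from fnmatch import fnmatchcase


def edit_distance(a, b):
    """Levenshtein distance computed row by row with dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        curr = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


# Hypothetical vocabulary of indexed terms.
VOCAB = ["apple", "apples", "appliance", "apply",
         "back", "pack", "black", "bank", "pie"]


def wildcard(pattern, vocab=VOCAB):
    """'*' matches any run of characters, '?' matches exactly one."""
    return [word for word in vocab if fnmatchcase(word, pattern)]


def fuzzy(term, vocab=VOCAB, max_edits=1):
    """Terms within max_edits single-character edits of the given term."""
    return [word for word in vocab if edit_distance(term, word) <= max_edits]


print(wildcard("app*"))
print(wildcard("appl?"))
print(fuzzy("back"))
```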
Features: Complex Queries
• Attribute filtering:
apple AND pie AND location:california
• Range filtering: apple AND published:[1393100055 TO 1393427055]
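Expressions like these can be passed to Elasticsearch through its `query_string` query. The sketch below only builds the JSON request bodies; the helper name `query_string_body` and the default field `content` (borrowed from the SQL example earlier) are assumptions, and actually sending the body to a `_search` endpoint is left out.

```python
import json


def query_string_body(expression, default_field="content"):
    """Wrap a query expression in a query_string search request body."""
    return {
        "query": {
            "query_string": {
                "query": expression,
                "default_field": default_field,  # field used when none is named
            }
        }
    }


EXPRESSIONS = [
    "(apple OR pumpkin) AND pie",                             # boolean operators
    "apple AND pie AND location:california",                  # attribute filtering
    "apple AND published:[1393100055 TO 1393427055]",         # range filtering
]

bodies = [query_string_body(expression) for expression in EXPRESSIONS]
for body in bodies:
    print(json.dumps(body))
```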
Features: Geo Queries
• Bounding Box Queries
• Distance Range Queries
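As a rough illustration of what the two geo query types test for, here is a pure-Python sketch. It assumes a spherical Earth with a 6371 km radius and a bounding box that does not cross the antimeridian; the function names and sample coordinates are made up, and this is not Elasticsearch's own implementation.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0  # mean radius, spherical approximation


def in_bounding_box(lat, lon, top, left, bottom, right):
    """True if the point lies inside the box (no antimeridian crossing)."""
    return bottom <= lat <= top and left <= lon <= right


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in kilometres."""
    d_lat = radians(lat2 - lat1)
    d_lon = radians(lon2 - lon1)
    a = (sin(d_lat / 2) ** 2
         + cos(radians(lat1)) * cos(radians(lat2)) * sin(d_lon / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))


def in_distance_range(point, center, min_km, max_km):
    """True if point is between min_km and max_km away from center."""
    return min_km <= haversine_km(*point, *center) <= max_km


# Sample point roughly at Bucharest (coordinates are illustrative).
point = (44.43, 26.10)
print(in_bounding_box(*point, top=45.0, left=25.0, bottom=44.0, right=27.0))
print(in_distance_range(point, center=(44.43, 26.10), min_km=0, max_km=10))
```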
Feature: Built-in analytics
Feature: Built-in tag cloud
What’s special about Elasticsearch?
Distributed
• Distributing data across multiple servers is easy and abstracted away from the developer
Performance/Scalability
• Add and remove nodes on the fly without ever stopping the search service
• Indexing and searching can be scaled independently
• With only a few nodes you can run complex queries on billions of documents
• 3 nodes: 20 million documents with 2 replicas each
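A quick back-of-the-envelope sketch of what the 3-node example implies for storage, assuming document copies are spread evenly across nodes. The helper `copies_per_node` is hypothetical; the only Elasticsearch-specific fact used is that "2 replicas" means two extra copies in addition to the primary.

```python
def copies_per_node(documents, replicas, nodes):
    """Average number of document copies each node holds."""
    total_copies = documents * (1 + replicas)  # primary plus its replicas
    return total_copies / nodes


per_node = copies_per_node(documents=20_000_000, replicas=2, nodes=3)
print(f"{per_node:,.0f} document copies per node")
```

Under these assumptions every node ends up holding the equivalent of the full data set, so any single node can answer a search on its own.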
Easy to back up
• Elasticsearch has a built-in backup solution, so you don’t have to worry about implementing one
Demo time!