Elasticsearch: Much more than search! Alex Brasetvik [email protected] @alexbrasetvik Wednesday, September 11, 2013

Elasticsearch: much more than search! [JavaZone 2013]


DESCRIPTION

Search engines can solve far more challenges than a search box suggests. Perhaps you have a search problem without realizing it? Elasticsearch, an open source search engine built on Lucene, is getting more and more attention, not only because it is excellent at solving typical search problems, but also because it can be used for analytics and "big data" challenges. The talk gives an overview of what search engines are good at, related problems you will run into, how Elasticsearch can help, and how it fits into your technology stack. It is not a tutorial, but with a relatively fast pace and examples of realistic complexity it gives an overview of what is possible. We wrap up with how Elasticsearch can be classified among the multitude of "NoSQL" databases.


Who?

Co-founder of Found AS

7+ years of search, 2+ years of Elasticsearch

Manages hundreds of Elasticsearch clusters


Agenda

0. Elasticsearch

1. Use cases

2. Lingo

3. Data structures

4. Text processing

5. Elasticsearch

6. NOSQL?


Elasticsearch

Open source

Real-time search and analytics

Schema-free

Based on Lucene


$ curl localhost:9200/sample_index/sample_message -XPOST -d '{
  "user": { "name": "DEVOPS_BORAT" },
  "followers": 42000,
  "location": { "lat": 56.78, "lon": 12.34 },
  "tags": [ "questionable", "funny" ],
  "message": "1+1=2 only in legacy system. In modern distributed database with eventual consistent is 1+1=1.",
  "retweets": 123
}'

{"ok":true,"_index":"sample_index","_type":"sample_message","_id":"rjs9KSmPRnqhvs7QjgxJJw","_version":1}


$ curl localhost:9200/sample_index/sample_message/_search -XPOST -d '{
  "query": { "match": { "message": "consistent" } }
}'


{
  "took" : 1,
  "timed_out" : false,
  "_shards" : { "total" : 5, "successful" : 5, "failed" : 0 },
  "hits" : {
    "total" : 1,
    "max_score" : 0.076713204,
    "hits" : [ {
      "_index" : "sample_index",
      "_type" : "sample_message",
      "_id" : "rjs9KSmPRnqhvs7QjgxJJw",
      "_score" : 0.076713204,
      "_source" : {
        "user": { "name": "DEVOPS_BORAT" },
        "message": "1+1=2 only in legacy system. In modern distributed database with eventual consistent is 1+1=1.",
        "retweets": 123,
        ...
      }
    } ]
  }
}


{
  "sample_index" : {
    "sample_message" : {
      "properties" : {
        "followers" : { "type" : "long" },
        "location" : {
          "properties" : {
            "lat" : { "type" : "double" },
            "lon" : { "type" : "double" }
          }
        },
        "message" : { "type" : "string" },
        "retweets" : { "type" : "long" },
        "tags" : { "type" : "string" },
        "user" : {
          "properties" : {
            "name" : { "type" : "string" }
          }
        }
      }
    }
  }
}


{"id"=>12296272736,
 "text"=>
  "An early look at Annotations: http://groups.google.com/group/twitter-api-announce/browse_thread/thread/fa5da2608865453",
 "created_at"=>"Fri Apr 16 17:55:46 +0000 2010",
 "in_reply_to_user_id"=>nil,
 "in_reply_to_screen_name"=>nil,
 "in_reply_to_status_id"=>nil,
 "favorited"=>false,
 "truncated"=>false,
 "user"=>
  {"id"=>6253282,
   "screen_name"=>"twitterapi",
   "name"=>"Twitter API",
   "description"=>
    "The Real Twitter API. I tweet about API changes, service issues and happily answer questions about Twitter and our API. Don't get an answer? It's on my website.",
   "url"=>"http://apiwiki.twitter.com",
   "location"=>"San Francisco, CA",
   "profile_background_color"=>"c1dfee",
   "profile_background_image_url"=>
    "http://a3.twimg.com/profile_background_images/59931895/twitterapi-background-new.png",
   "profile_background_tile"=>false,
   "profile_image_url"=>"http://a3.twimg.com/profile_images/689684365/api_normal.png",
   "profile_link_color"=>"0000ff",
   "profile_sidebar_border_color"=>"87bc44",
   "profile_sidebar_fill_color"=>"e0ff92",
   "profile_text_color"=>"000000",
   "created_at"=>"Wed May 23 06:01:13 +0000 2007",
   "contributors_enabled"=>true,
   "favourites_count"=>1,
   "statuses_count"=>1628,
   "friends_count"=>13,
   "time_zone"=>"Pacific Time (US & Canada)",
   "utc_offset"=>-28800,
   "lang"=>"en",
   "protected"=>false,
   "followers_count"=>100581,
   "geo_enabled"=>true,
   "notifications"=>false,
   "following"=>true,
   "verified"=>true},
 "contributors"=>[3191321],
 "geo"=>nil,
 "coordinates"=>nil,
 "place"=>
  {"id"=>"2b6ff8c22edd9576",
   "url"=>"http://api.twitter.com/1/geo/id/2b6ff8c22edd9576.json",
   "name"=>"SoMa",
   "full_name"=>"SoMa, San Francisco",
   "place_type"=>"neighborhood",
   "country_code"=>"US",
   "country"=>"The United States of America",
   "bounding_box"=>
    {"coordinates"=>
      [[[-122.42284884, 37.76893497],
        [-122.3964, 37.76893497],
        [-122.3964, 37.78752897],
        [-122.42284884, 37.78752897]]],
     "type"=>"Polygon"}},
 "source"=>"web"}

The tweet's unique ID. These IDs are roughly sorted & developers should treat them as opaque (http://bit.ly/dCkppc).

Text of the tweet. Consecutive duplicate tweets are rejected. 140 character max (http://bit.ly/4ud3he).

Tweet's creation date.

DEPRECATED

The ID of an existing tweet that this tweet is in reply to. Won't be set unless the author of the referenced tweet is mentioned.

The screen name & user ID of replied to tweet author.

Truncated to 140 characters. Only possible from SMS.

The author of the tweet. This embedded object can get out of sync.

The author's user ID.

The author's user name.

The author's screen name.

The author's biography.

The author's URL.

The author's "location". This is a free-form text field, and there are no guarantees on whether it can be geocoded.

Rendering information for the author. Colors are encoded in hex values (RGB).

The creation date for this account.

Whether this account has contributors enabled (http://bit.ly/50npuu).

Number of favorites this user has.

Number of tweets this user has.

Number of users this user is following.

The timezone and offset (in seconds) for this user.

The user's selected language.

Whether this user is protected or not. If the user is protected, then this tweet is not visible except to "friends".

Number of followers for this user.

Whether this user has geo enabled (http://bit.ly/4pFY77).

DEPRECATED in this context

Whether this user has a verified badge.

The geo tag on this tweet in GeoJSON (http://bit.ly/b8L1Cp).

The contributors' (if any) user IDs (http://bit.ly/50npuu).

DEPRECATED

The place associated with this Tweet (http://bit.ly/b8L1Cp).

The place ID

The URL to fetch a detailed polygon for this place

The printable names of this place

The type of this place - can be a "neighborhood" or "city"

The country this place is in

The bounding box for this place

The application that sent this tweet

Map of a Twitter Status Object

Raffi Krikorian <[email protected]> 18 April 2010


user:
  name: DEVOPS_BORAT
message: "1+1=2 only in legacy system. In modern distributed database with eventual consistent is 1+1=1."
location:
  lon: 12.34
  lat: 56.78
followers: 42000
retweets: 123
tags: [questionable, funny]


Analysis

whitespace

The quick brown fox had a day off

whitespace-tokenizer
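
As a minimal sketch of what the whitespace tokenizer on this slide does, the Python below splits the example sentence on whitespace and leaves case and punctuation alone. The function name `whitespace_tokenize` is made up for illustration and is not part of Elasticsearch's API.

```python
def whitespace_tokenize(text):
    """Split text into tokens on runs of whitespace, keeping case and punctuation."""
    return text.split()


tokens = whitespace_tokenize("The quick brown fox had a day off")
print(tokens)
```

Each token becomes a candidate index term; further filters (lowercasing, stemming and so on) would be applied after this step.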


Filter / Query

Filter: boolean match

Query: match with a score

Can be composed of other queries


“Search”

The entire information need

Query, filters, facets, pagination, ...


Inverted index

"If you don't find it in the index, look very carefully through the entire catalog."

–Sears, Roebuck, and Co., Consumers' Guide 1897
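
To make the inverted index concrete, here is a small Python sketch, assuming the simplest possible text processing (lowercasing and splitting on whitespace). The sample documents and the names `build_inverted_index` and `search` are hypothetical and not taken from the talk or from Lucene.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each lowercased, whitespace-separated term to the sorted IDs of the documents containing it."""
    postings = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            postings[term].add(doc_id)
    return {term: sorted(ids) for term, ids in postings.items()}


def search(index, term):
    """Look a term up directly in the index instead of scanning every document."""
    return index.get(term.lower(), [])


# Hypothetical sample documents.
docs = {
    1: "the quick brown fox",
    2: "the lazy dog",
    3: "quick brown dogs",
}
index = build_inverted_index(docs)

print(search(index, "quick"))  # [1, 3]
print(search(index, "dog"))    # [2]
```

Note that "dog" does not find document 3, because "dogs" is a different term. Only what is in the index can be found, which is why the text processing applied at index time matters so much.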


AbstractEnterpriseSingletonProxyFactoryBean


xkcd.com/292


camelCase

AbstractSingletonProxyFactoryBean

camelCase-tokenizer

lowercase
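
The chain on this slide (a camelCase tokenizer followed by a lowercase filter) can be sketched in Python as follows. The regular expression and the function names are assumptions chosen for illustration; Elasticsearch's own pattern-based analyzers are configured differently.

```python
import re

# One token per capitalized word, lowercase run, acronym, or digit run.
CAMEL_CASE = re.compile(r"[A-Z]?[a-z]+|[A-Z]+(?![a-z])|[0-9]+")


def camel_case_tokenize(identifier):
    """Split an identifier such as a Java class name at its case boundaries."""
    return CAMEL_CASE.findall(identifier)


def lowercase_filter(tokens):
    """Normalize tokens so that searches become case-insensitive."""
    return [token.lower() for token in tokens]


tokens = camel_case_tokenize("AbstractSingletonProxyFactoryBean")
terms = lowercase_filter(tokens)
print(tokens)  # ['Abstract', 'Singleton', 'Proxy', 'Factory', 'Bean']
print(terms)   # ['abstract', 'singleton', 'proxy', 'factory', 'bean']
```

With these terms in the index, a search for "factory" can find the class without any wildcard matching.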


Prefix problems!


Prefix problems

*suffix → xiffus*

(60.6384, 6.5017) → u4u8gyykk

123 → {1-hundreds, 12-tens, 123} (simplified)
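
The three examples on this slide each turn a different problem into a prefix lookup. The Python sketch below reproduces them under simplifying assumptions: the function names are invented here, the geohash encoder is the standard public algorithm written out by hand, and the numeric encoding follows the slide's simplified decimal version rather than Lucene's actual numeric terms.

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"
UNITS = {1: "tens", 2: "hundreds", 3: "thousands"}


def reversed_term(term):
    """Index the reversed term so a suffix search (*fix) becomes a prefix search (xif*)."""
    return term[::-1]


def geohash(lat, lon, precision=9):
    """Encode a point as a geohash; every point in the same cell shares the cell's prefix."""
    lat_range, lon_range = [-90.0, 90.0], [-180.0, 180.0]
    chars, bits, value, use_lon = [], 0, 0, True
    while len(chars) < precision:
        rng, coord = (lon_range, lon) if use_lon else (lat_range, lat)
        mid = (rng[0] + rng[1]) / 2
        if coord >= mid:
            value = (value << 1) | 1
            rng[0] = mid  # keep the upper half
        else:
            value <<= 1
            rng[1] = mid  # keep the lower half
        use_lon = not use_lon
        bits += 1
        if bits == 5:  # five bits make one base-32 character
            chars.append(BASE32[value])
            bits, value = 0, 0
    return "".join(chars)


def numeric_prefix_terms(number):
    """Simplified decimal version: one term per leading-digit prefix of a non-negative integer."""
    digits = str(number)
    terms = []
    for i in range(1, len(digits) + 1):
        remaining = len(digits) - i
        if remaining == 0:
            terms.append(digits)
        else:
            terms.append(f"{digits[:i]}-{UNITS.get(remaining, f'1e{remaining}')}")
    return terms


print(reversed_term("suffix"))       # xiffus
print(geohash(60.6384, 6.5017))      # u4u8gyykk
print(numeric_prefix_terms(123))     # ['1-hundreds', '12-tens', '123']
```

A range or distance query can then match a handful of coarse prefix terms instead of enumerating every individual value.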


Elasticsearch

Distributed

Cluster of nodes

Self-coordinating


Mapping


So far

Inverted indexes

Text processing

Index terms

Mappings

Index templates


xkcd.com/208


?q={!boost b=div(popularity,price) v=$qq}
&qq={!dismax qf=desc^2,review}cheap
&bq={!lucene df=keywords}lucene solr java
&fq={!geofilt sfield=location pt=10.312,-20.556 d=3.5}
&fq={!term f=$ff v=$vv}&ff=keywords&vv=solr
&sort=query(keywords:lame) asc, score desc


Filters

Cached as bitmaps

Compact

Very fast
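
To illustrate why bitmap-cached filters are compact and fast, the sketch below stores one bit per document in a Python integer and combines filters with bitwise operations. It only illustrates the idea; the documents, field values and function names are hypothetical, and Lucene's real bitset implementations are more sophisticated.

```python
def build_bitmap(docs, predicate):
    """Return an int whose bit i is set when docs[i] matches the predicate."""
    bitmap = 0
    for doc_id, doc in enumerate(docs):
        if predicate(doc):
            bitmap |= 1 << doc_id
    return bitmap


def matching_ids(bitmap):
    """List the document IDs whose bits are set."""
    return [i for i in range(bitmap.bit_length()) if (bitmap >> i) & 1]


# Hypothetical documents; only the field names echo the sample tweet.
docs = [
    {"tags": ["funny"], "retweets": 123},
    {"tags": ["serious"], "retweets": 5},
    {"tags": ["funny", "questionable"], "retweets": 2},
]

# Each filter is evaluated once; the resulting bitmap can be cached and reused.
funny = build_bitmap(docs, lambda d: "funny" in d["tags"])
popular = build_bitmap(docs, lambda d: d["retweets"] >= 100)

print(matching_ids(funny & popular))  # [0]
print(matching_ids(funny | popular))  # [0, 2]
```

Once the bitmaps exist, combining filters costs a few machine-word operations per block of documents, with no scoring involved.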


term: className: "InternalFrameInternalFrameTitlePaneInternalFrameTitlePaneMaximizeButtonWindowNotFocusedState"


Filters

Use filters when you can …

… and queries when you need ranking.


Facets

Summarize the entire result set

Filters + facets: the basis for analytics use


Faceting options

Terms

Histograms

Date histograms

Geo distance

Statistical distribution

Filters/Queries
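
Two of the facet types listed above, terms and histograms, are easy to sketch in plain Python. The code below only shows what such a facet computes over a result set; the documents and the names `terms_facet` and `histogram_facet` are hypothetical and do not mirror Elasticsearch's request or response format.

```python
from collections import Counter


def terms_facet(docs, field):
    """Count how many documents carry each value of a (possibly multi-valued) field."""
    counts = Counter()
    for doc in docs:
        value = doc.get(field, [])
        counts.update(value if isinstance(value, list) else [value])
    return dict(counts)


def histogram_facet(docs, field, interval):
    """Count documents per bucket, where each bucket spans `interval` units of a numeric field."""
    counts = Counter((doc[field] // interval) * interval for doc in docs if field in doc)
    return dict(sorted(counts.items()))


# Hypothetical result set.
docs = [
    {"tags": ["questionable", "funny"], "retweets": 123},
    {"tags": ["funny"], "retweets": 7},
    {"tags": ["serious"], "retweets": 150},
]

print(terms_facet(docs, "tags"))               # {'questionable': 1, 'funny': 2, 'serious': 1}
print(histogram_facet(docs, "retweets", 100))  # {0: 1, 100: 2}
```

Because a facet has to look at every matching document rather than just the top hits, its cost grows with the size of the result set.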


Facets

Resource-intensive

CPU + memory

Important to have enough memory


Caches

There are two hard things in computer science: cache invalidation, naming things, and off-by-one errors.

Filter caches

Field caches: facets, etc.

Page cache


Caches

Now you are thinking with...

Per segment

New segments do not invalidate old ones

Important for (near) real time
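
The per-segment idea can be sketched with a toy index in Python. It assumes segments are immutable once written and ignores deletes and segment merges entirely; the class `SegmentedIndex` and its fields are invented for this illustration.

```python
class SegmentedIndex:
    """Toy index made of immutable segments, with filter results cached per segment."""

    def __init__(self):
        self.segments = []     # each segment is a tuple of documents, never modified
        self.cache = {}        # (segment number, filter name) -> matching documents
        self.computations = 0  # how many times a filter was evaluated over a segment

    def add_segment(self, docs):
        """New documents arrive as a new segment; existing cache entries stay valid."""
        self.segments.append(tuple(docs))

    def filter(self, name, predicate):
        """Return matching documents, evaluating the predicate only on uncached segments."""
        hits = []
        for seg_no, segment in enumerate(self.segments):
            key = (seg_no, name)
            if key not in self.cache:
                self.cache[key] = [d for d in segment if predicate(d)]
                self.computations += 1
            hits.extend(self.cache[key])
        return hits


def is_popular(doc):
    return doc["retweets"] >= 100


index = SegmentedIndex()
index.add_segment([{"retweets": 123}, {"retweets": 5}])
index.filter("popular", is_popular)         # evaluates segment 0

index.add_segment([{"retweets": 900}])
hits = index.filter("popular", is_popular)  # reuses segment 0, evaluates only segment 1
print(len(hits), index.computations)        # 2 2
```

Newly indexed documents become searchable by adding a small segment, while the cached work for older segments is kept, which is what makes near real-time search affordable.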


PostgreSQL

Verifies resource usage

Safe >> fast

Uses disk if it has to


Elasticsearch trusts you

Built for speed

What could possibly go wrong?


OutOfMemoryError

Woah there

I ate all the memories

Your cluster may or may not work any more


NOSQL?

Fast, not robust

Document database

Schema-flexible

No transactions

Easy to scale/distribute

Naive leader election

No auth/authz


?

Slides and relevant links at

found.no/jz13

(Try hosted Elasticsearch free for 6 months)

Solr meetup in the community room tomorrow!
