Ingest Node: (re)indexing and enriching documents within Elasticsearch

@lucacavanna


Agenda

1. Why ingest node?
2. How does it work?
3. Where can it be used?


Why ingest node?


I just want to tail a file.


Logstash: collect, enrich & transport

The file → input → filters (grok, date, mutate) → output → Elasticsearch


Logstash common setup

127.0.0.1 - - [19/Apr/2016:12:00:00 +0200] "GET /robots.txt HTTP/1.1" 200 68
127.0.0.1 - - [19/Apr/2016:12:00:01 +0200] "GET /cgi-bin/try/ HTTP/1.1" 200 3395
127.0.0.1 - - [19/Apr/2016:12:00:04 +0200] "GET / HTTP/1.1" 200 24
127.0.0.1 - - [19/Apr/2016:12:00:07 +0200] "GET /not_found/ HTTP/1.1" 404 7218
127.0.0.1 - - [19/Apr/2016:12:00:09 +0200] "GET /favicon.ico HTTP/1.1" 200 3638
127.0.0.1 - - [19/Apr/2016:12:00:15 +0200] "GET / HTTP/1.1" 200 24
127.0.0.1 - - [19/Apr/2016:12:00:18 +0200] "GET /favicon.ico HTTP/1.1" 200 3638

message


Ingest node setup

127.0.0.1 - - [19/Apr/2016:12:00:00 +0200] "GET /robots.txt HTTP/1.1" 200 68
127.0.0.1 - - [19/Apr/2016:12:00:01 +0200] "GET /cgi-bin/try/ HTTP/1.1" 200 3395
127.0.0.1 - - [19/Apr/2016:12:00:04 +0200] "GET / HTTP/1.1" 200 24
127.0.0.1 - - [19/Apr/2016:12:00:07 +0200] "GET /not_found/ HTTP/1.1" 404 7218
127.0.0.1 - - [19/Apr/2016:12:00:09 +0200] "GET /favicon.ico HTTP/1.1" 200 3638
127.0.0.1 - - [19/Apr/2016:12:00:15 +0200] "GET / HTTP/1.1" 200 24
127.0.0.1 - - [19/Apr/2016:12:00:18 +0200] "GET /favicon.ico HTTP/1.1" 200 3638


Filebeat: collect and ship

127.0.0.1 - - [19/Apr/2016:12:00:04 +0200] "GET / HTTP/1.1" 200 24
127.0.0.1 - - [19/Apr/2016:12:00:07 +0200] "GET /not_found/ HTTP/1.1" 404 7218
127.0.0.1 - - [19/Apr/2016:12:00:09 +0200] "GET /favicon.ico HTTP/1.1" 200 3638

{ "message" : "127.0.0.1 - - [19/Apr/2016:12:00:04 +0200] \"GET / HTTP/1.1\" 200 24" }

{ "message" : "127.0.0.1 - - [19/Apr/2016:12:00:07 +0200] \"GET /not_found/ HTTP/1.1\" 404 7218" }

{ "message" : "127.0.0.1 - - [19/Apr/2016:12:00:09 +0200] \"GET /favicon.ico HTTP/1.1\" 200 3638" }
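The collect-and-ship step can be sketched in a few lines of Python: each line read from the file becomes one JSON event whose `message` field holds the raw line. This is only an illustration of the data shape, not how Filebeat is implemented; the function name `lines_to_events` is made up for the example.

```python
import json

def lines_to_events(lines):
    """Wrap every non-empty log line in a {"message": ...} event (illustrative only)."""
    return [{"message": line.rstrip("\n")} for line in lines if line.strip()]

log_lines = [
    '127.0.0.1 - - [19/Apr/2016:12:00:04 +0200] "GET / HTTP/1.1" 200 24\n',
    '127.0.0.1 - - [19/Apr/2016:12:00:07 +0200] "GET /not_found/ HTTP/1.1" 404 7218\n',
]

events = lines_to_events(log_lines)
for event in events:
    # json.dumps escapes the inner double quotes, as in the events shown above
    print(json.dumps(event))
```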


Elasticsearch: enrich and index

{ "message" : "127.0.0.1 - - [19/Apr/2016:12:00:04 +0200] \"GET / HTTP/1.1\" 200 24" }

{
  "request" : "/",
  "auth" : "-",
  "ident" : "-",
  "verb" : "GET",
  "@timestamp" : "2016-04-19T10:00:04.000Z",
  "response" : "200",
  "bytes" : "24",
  "clientip" : "127.0.0.1",
  "httpversion" : "1.1",
  "rawrequest" : null,
  "timestamp" : "19/Apr/2016:12:00:04 +0200"
}


How does ingest node work?


Ingest pipeline

Pipeline: a set of processors

document → grok → date → remove → enriched document
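The idea of a pipeline as an ordered set of processors can be sketched in Python. This is a conceptual model, not Elasticsearch code: the regular expression is a simplified stand-in for the real COMMONAPACHELOG grok pattern, and the function names are chosen for the example.

```python
import re
from datetime import datetime, timezone

# Simplified stand-in for %{COMMONAPACHELOG}; the real grok pattern is more thorough.
LOG_RE = re.compile(
    r'(?P<clientip>\S+) (?P<ident>\S+) (?P<auth>\S+) \[(?P<timestamp>[^\]]+)\] '
    r'"(?P<verb>\S+) (?P<request>\S+) HTTP/(?P<httpversion>[\d.]+)" '
    r'(?P<response>\d+) (?P<bytes>\d+)'
)

def grok(doc):
    """Extract named fields from the raw message."""
    match = LOG_RE.match(doc["message"])
    if match is None:
        raise ValueError("message does not match the pattern")
    doc.update(match.groupdict())
    return doc

def date(doc):
    """Parse the timestamp field and store it as a UTC @timestamp."""
    parsed = datetime.strptime(doc["timestamp"], "%d/%b/%Y:%H:%M:%S %z")
    utc = parsed.astimezone(timezone.utc)
    doc["@timestamp"] = utc.isoformat(timespec="milliseconds").replace("+00:00", "Z")
    return doc

def remove(doc):
    """Drop the raw message once it has been parsed."""
    del doc["message"]
    return doc

def run_pipeline(processors, doc):
    """Apply each processor in order; the output of one is the input of the next."""
    for processor in processors:
        doc = processor(doc)
    return doc

enriched = run_pipeline(
    [grok, date, remove],
    {"message": '127.0.0.1 - - [19/Apr/2016:12:00:04 +0200] "GET / HTTP/1.1" 200 24'},
)
print(enriched)
```

Order matters here: `date` relies on the `timestamp` field that `grok` extracts, and `remove` must run last because `grok` reads `message`.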


Define a pipeline

PUT /_ingest/pipeline/apache-log
{
  "processors" : [
    {
      "grok" : {
        "field" : "message",
        "pattern" : "%{COMMONAPACHELOG}"
      }
    },
    {
      "date" : {
        "match_field" : "timestamp",
        "match_formats" : ["dd/MMM/YYYY:HH:mm:ss Z"]
      }
    },
    {
      "remove" : {
        "field" : "message"
      }
    }
  ]
}


Index a document

Provide the id of the pipeline to execute

PUT /logs/apache/1?pipeline=apache-log
{
  "message" : "127.0.0.1 - - [19/Apr/2016:12:00:04 +0200] \"GET / HTTP/1.1\" 200 24"
}
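From a client's point of view, the only change is the `pipeline` query parameter on the index request. A minimal sketch with Python's standard library, assuming a node listening on `http://localhost:9200` (that address and the helper name `build_index_request` are assumptions); the request is built but not sent:

```python
import json
import urllib.parse
import urllib.request

def build_index_request(base_url, index, doc_type, doc_id, pipeline, document):
    """Build (but do not send) a PUT request that names the pipeline to run."""
    query = urllib.parse.urlencode({"pipeline": pipeline})
    url = f"{base_url}/{index}/{doc_type}/{doc_id}?{query}"
    return urllib.request.Request(
        url,
        data=json.dumps(document).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )

request = build_index_request(
    "http://localhost:9200",  # assumed address of a local node
    "logs", "apache", "1", "apache-log",
    {"message": '127.0.0.1 - - [19/Apr/2016:12:00:04 +0200] "GET / HTTP/1.1" 200 24'},
)
print(request.get_method(), request.full_url)
# Sending it would be: urllib.request.urlopen(request)
```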


What has actually been indexed

GET /logs/apache/1

{
  "request" : "/",
  "auth" : "-",
  "ident" : "-",
  "verb" : "GET",
  "@timestamp" : "2016-04-19T10:00:04.000Z",
  "response" : "200",
  "bytes" : "24",
  "clientip" : "127.0.0.1",
  "httpversion" : "1.1",
  "rawrequest" : null,
  "timestamp" : "19/Apr/2016:12:00:04 +0200"
}


Pipeline management

Create, Read, Update & Delete

PUT /_ingest/pipeline/apache-log
{ … }

GET /_ingest/pipeline/apache-log

GET /_ingest/pipeline/*

DELETE /_ingest/pipeline/apache-log
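The four management calls map directly onto HTTP methods against `/_ingest/pipeline/<id>`. A small sketch that builds the requests without sending them; the base URL, the helper name `pipeline_request`, and the one-processor definition are assumptions made for the example:

```python
import json
import urllib.request

BASE_URL = "http://localhost:9200"  # assumption: a local node

def pipeline_request(method, pipeline_id, body=None):
    """Build a pipeline management request; body is only needed for PUT."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    return urllib.request.Request(
        f"{BASE_URL}/_ingest/pipeline/{pipeline_id}",
        data=data,
        headers={"Content-Type": "application/json"},
        method=method,
    )

definition = {"processors": [{"remove": {"field": "message"}}]}
requests = [
    pipeline_request("PUT", "apache-log", definition),  # create or update
    pipeline_request("GET", "apache-log"),              # read one pipeline
    pipeline_request("GET", "*"),                       # read all pipelines
    pipeline_request("DELETE", "apache-log"),           # delete
]
for request in requests:
    print(request.get_method(), request.full_url)
```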

grok, remove, attachment, convert, uppercase, foreach, trim, append, gsub, set, split, fail, geoip, join, lowercase, rename, date


Grok processor

Extracts structured fields out of a single text field

{
  "grok" : {
    "field" : "message",
    "pattern" : "%{DATE:date}"
  }
}
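Conceptually, a grok expression is a regular expression assembled from named building blocks: each `%{NAME:field}` reference expands to a pattern whose match is stored under `field`. The sketch below imitates that idea in Python with a tiny, hypothetical pattern library; it is not the grok implementation and its three patterns are deliberately loose.

```python
import re

# Tiny hypothetical pattern library; real grok ships many more patterns.
PATTERNS = {
    "IP": r"\d{1,3}(?:\.\d{1,3}){3}",
    "WORD": r"\w+",
    "NUMBER": r"\d+",
}

def grok_to_regex(expression):
    """Turn %{NAME:field} references into named regex groups."""
    def expand(match):
        name, field = match.group(1), match.group(2)
        return f"(?P<{field}>{PATTERNS[name]})"
    return re.compile(re.sub(r"%\{(\w+):(\w+)\}", expand, expression))

regex = grok_to_regex(r"%{IP:clientip} %{WORD:verb} %{NUMBER:bytes}")
fields = regex.match("127.0.0.1 GET 24").groupdict()
print(fields)
```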


Mutate processors

set, remove, rename, convert, gsub, split, join, lowercase, uppercase, trim, append

{
  "remove" : {
    "field" : "message"
  }
}
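A few of these processors are simple enough to model as dictionary operations. The sketch below does that in Python; apart from `field`, the option names (`value`, `to`, `pattern`, `replacement`, `separator`) are assumptions for the example and may not match the options Elasticsearch accepts, and `split` here uses a plain string separator.

```python
import re

def apply(doc, processor):
    """Apply one processor config of the form {name: options} to doc, in place."""
    (name, opts), = processor.items()
    field = opts["field"]
    if name == "set":
        doc[field] = opts["value"]
    elif name == "remove":
        doc.pop(field)
    elif name == "rename":
        doc[opts["to"]] = doc.pop(field)
    elif name == "lowercase":
        doc[field] = doc[field].lower()
    elif name == "gsub":
        doc[field] = re.sub(opts["pattern"], opts["replacement"], doc[field])
    elif name == "split":
        doc[field] = doc[field].split(opts["separator"])
    else:
        raise ValueError(f"unsupported processor: {name}")
    return doc

doc = {"message": "raw line", "verb": "GET", "tags": "a,b"}
steps = [
    {"remove": {"field": "message"}},
    {"lowercase": {"field": "verb"}},
    {"rename": {"field": "verb", "to": "method"}},
    {"split": {"field": "tags", "separator": ","}},
    {"set": {"field": "source", "value": "apache"}},
]
for step in steps:
    apply(doc, step)
print(doc)
```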


Date processor

Parses a date from a string

{
  "date" : {
    "field" : "timestamp",
    "match_formats" : ["YYYY"]
  }
}
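Since the processor takes a list of formats, one way to picture it is "try each format until one parses". The Python sketch below follows that reading, which is an assumption about the behaviour rather than a description of the implementation. It uses `strptime` directives instead of the Joda-style patterns shown above, and treats values without an offset as UTC.

```python
from datetime import datetime, timezone

def parse_date(value, formats):
    """Return the first successful parse of value as a UTC ISO-8601 string."""
    for fmt in formats:
        try:
            parsed = datetime.strptime(value, fmt)
        except ValueError:
            continue  # this format does not fit, try the next one
        if parsed.tzinfo is None:
            parsed = parsed.replace(tzinfo=timezone.utc)  # assumption: no offset means UTC
        utc = parsed.astimezone(timezone.utc)
        return utc.isoformat(timespec="milliseconds").replace("+00:00", "Z")
    raise ValueError(f"unable to parse date [{value}]")

print(parse_date("19/Apr/2016:12:00:04 +0200", ["%Y", "%d/%b/%Y:%H:%M:%S %z"]))
print(parse_date("2016", ["%Y"]))
```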


Geoip processor

Adds information about the geographical location of IP addresses

{
  "geoip" : {
    "field" : "ip"
  }
}


Foreach processor

Do something for every element of an array

{
  "foreach" : {
    "field" : "values",
    "processors" : [
      {
        "uppercase" : {
          "field" : "_value"
        }
      }
    ]
  }
}
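In plain terms, foreach runs the nested processors once per array element, with the current element taking the place of `_value`. A rough Python equivalent, where the `PROCESSORS` registry and the function names are hypothetical:

```python
def uppercase(value):
    return value.upper()

PROCESSORS = {"uppercase": uppercase}  # hypothetical registry for the sketch

def foreach(doc, field, processors):
    """Run every named processor on each element of doc[field]."""
    result = []
    for value in doc[field]:
        for name in processors:
            value = PROCESSORS[name](value)
        result.append(value)
    doc[field] = result
    return doc

doc = foreach({"values": ["foo", "bar", "baz"]}, "values", ["uppercase"])
print(doc)
```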


Plugins

Introducing new processors is as easy as writing a plugin

{
  "your_plugin" : {
    …
  }
}


Ingest node internals

Default scenario

A client talks to a cluster of three nodes; every node holds the cluster state (CS).

- node1: logs 2P, logs 3R
- node2: logs 3P, logs 1R
- node3: logs 1P, logs 2R

logs index: 3 primary shards, 1 replica each

All nodes are equal:

- node.data: true
- node.master: true
- node.ingest: true

Index request for shard 3:

1. Pre-processing on the coordinating node
2. Indexing on the primary shard
3. Indexing on the replica shard

Ingest dedicated nodes

The same cluster with two extra nodes, node4 and node5, which hold the cluster state (CS) but no shards.

- node1, node2, node3: node.data: true, node.master: true, node.ingest: false
- node4, node5: node.data: false, node.master: false, node.ingest: true

Index request for shard 3:

1. Forward request to an ingest node
2. Pre-processing on the ingest node
3. Indexing on the primary shard
4. Indexing on the replica shard
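The difference between the two scenarios comes down to one decision: does the node that received the request have the ingest role? The Python sketch below models only that decision. The `Node` class and `plan_index_request` are invented for the example, picking the first available ingest node is an assumption (the slides do not say how one is chosen), and node3 is assumed to be the node the client contacted.

```python
from dataclasses import dataclass

@dataclass
class Node:
    name: str
    data: bool
    master: bool
    ingest: bool

def plan_index_request(coordinating, nodes, primary, replica):
    """Return the ordered steps for an index request that names a pipeline."""
    steps = []
    if coordinating.ingest:
        target = coordinating
    else:
        ingest_nodes = [node for node in nodes if node.ingest]
        if not ingest_nodes:
            raise RuntimeError("no ingest node in the cluster")
        target = ingest_nodes[0]  # assumption: simply take the first one
        steps.append(f"forward request from {coordinating.name} to {target.name}")
    steps.append(f"pre-process on {target.name}")
    steps.append(f"index on primary shard ({primary.name})")
    steps.append(f"index on replica shard ({replica.name})")
    return steps

# Default scenario: every node has all three roles.
default = [Node(f"node{i}", True, True, True) for i in (1, 2, 3)]
# Dedicated scenario: node1-3 hold data, node4-5 only do ingest.
dedicated = [Node(f"node{i}", True, True, False) for i in (1, 2, 3)]
dedicated += [Node(f"node{i}", False, False, True) for i in (4, 5)]

# Shard 3: primary on node2, replica on node1.
default_steps = plan_index_request(default[2], default, default[1], default[0])
dedicated_steps = plan_index_request(dedicated[2], dedicated, dedicated[1], dedicated[0])
print(default_steps)
print(dedicated_steps)
```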


Where can ingest pipelines be used?

Index api

PUT /logs/apache/1?pipeline=apache-log
{
  "message" : "…"
}

Bulk api

PUT /logs/_bulk
{ "index": { "_type": "apache", "_id": "1", "pipeline": "apache-log" } }\n
{ "message" : "…" }\n
{ "index": { "_type": "mysql", "_id": "1", "pipeline": "mysql-log" } }\n
{ "message" : "…" }\n
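Because the bulk body is newline-delimited JSON, with one metadata line followed by one document line per action, each item can name its own pipeline. A sketch of assembling such a body in Python; the helper name `build_bulk_body` is made up and the `…` messages are left elided as above:

```python
import json

def build_bulk_body(actions):
    """Serialize (metadata, document) pairs as newline-delimited JSON."""
    lines = []
    for metadata, document in actions:
        lines.append(json.dumps({"index": metadata}, ensure_ascii=False))
        lines.append(json.dumps(document, ensure_ascii=False))
    return "\n".join(lines) + "\n"  # the body must end with a newline

body = build_bulk_body([
    ({"_type": "apache", "_id": "1", "pipeline": "apache-log"}, {"message": "…"}),
    ({"_type": "mysql", "_id": "1", "pipeline": "mysql-log"}, {"message": "…"}),
])
print(body, end="")
```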

Reindex api

Scan/scroll & bulk indexing made easy

POST /_reindex
{
  "source" : {
    "index" : "logs",
    "type" : "apache"
  },
  "dest" : {
    "index" : "apache-logs",
    "pipeline" : "apache-log"
  }
}
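What the reindex call saves the client from writing is roughly the loop below: read the source in pages, run each document through the pipeline, and write the results to the destination in batches. This is an in-memory Python model of that idea, not the reindex implementation; `reindex`, `tag_pipeline` and the `reindexed` field are invented for the example.

```python
def reindex(source_docs, pipeline, batch_size=2):
    """Read source documents in batches, run the pipeline on each, collect the results."""
    dest = []
    for start in range(0, len(source_docs), batch_size):
        batch = source_docs[start:start + batch_size]      # stands in for one scroll page
        dest.extend(pipeline(dict(doc)) for doc in batch)  # stands in for one bulk request
    return dest

def tag_pipeline(doc):
    """Hypothetical pipeline that marks each document it has processed."""
    doc["reindexed"] = True
    return doc

logs = [{"message": "a"}, {"message": "b"}, {"message": "c"}]
apache_logs = reindex(logs, tag_pipeline)
print(apache_logs)
```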


https://www.elastic.co/downloads/elasticsearch

Go get Elasticsearch 5.0.0-alpha3!


Thank you