OCTOBER 13-16, 2016 AUSTIN, TX

Large Scale Log Analytics with Solr: Presented by Rafał Kuć & Radu Gheorghe, Sematext Group Inc

3

01 About Us

Radu Rafał

Logsene

4

02 Agenda

Logstash + Solr

rsyslog + Solr

rsyslog + Redis + Logstash + Solr

Solr

5

03 Logstash High Level

https://www.elastic.co/assets/blte4300041993ef383/Logstash%20Image.png

6

01 Flow in Logstash

input: /var/log/apache.log, redis

codec: plain, {json}

filter: grok parses "Rafał @kucrafal" into { "user": "Rafał", "twitter": "@kucrafal" }; filter workers are set with -w $numberOfWorkers

output: workers => 2

https://cdn2.iconfinder.com/data/icons/gconstruct/2118/gconstruct1-14.png

10

01 Simple Config
https://github.com/sematext/lucene-revolution-samples/tree/master/2015

Input: apache combined logs

input {
  file {
    path => "/opt/logs/example.log"
    start_position => "beginning"
  }
}

output {
  solr_http {
    solr_url => "http://localhost:8983/solr/gettingstarted"
    flush_size => 5000
    workers => 4
  }
}

bin/plugin install logstash-output-solr_http

11

01 Base Result

12

01 Parse JSON

Input: apache combined logs in JSON

input {
  file {
    path => "/opt/logs/example.log.parsed"
    start_position => "beginning"
…

filter {
  json {
    source => "message"
  }
}

output {
  solr_http {
…

bin/logstash -f logstash.conf -w 4 # filterWorkers=4
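
To make the effect of the json filter concrete, here is a minimal Python sketch of the idea: the JSON text in the source field is parsed and its keys become top-level fields of the event. The function name `apply_json_filter` and the dict-based event are assumptions for illustration, not Logstash internals.

```python
import json


def apply_json_filter(event, source="message"):
    """Return a copy of `event` with the JSON object in event[source] merged in."""
    merged = dict(event)
    try:
        parsed = json.loads(event[source])
    except (KeyError, TypeError, ValueError):
        parsed = None
    if not isinstance(parsed, dict):
        # Logstash tags events it cannot parse instead of dropping them.
        merged["tags"] = list(event.get("tags", [])) + ["_jsonparsefailure"]
        return merged
    merged.update(parsed)
    return merged
```

Because the log line is already structured, this step is a plain JSON parse with no pattern matching involved.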

13

01 JSON Result

14

01 Grok

input {
  file {
    path => "/opt/logs/example.log"
    start_position => "beginning"
…

filter {
  grok {
    match => [ "message", "%{COMBINEDAPACHELOG}" ]
  }
}

output {
  solr_http {
…
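
Grok patterns such as COMBINEDAPACHELOG are named regular expressions under the hood. The Python sketch below is a simplified, hand-written approximation of that pattern, not the pattern Logstash ships; the group names follow the field names grok traditionally emits for this pattern.

```python
import re

# Simplified stand-in for %{COMBINEDAPACHELOG}; the real grok pattern is stricter.
COMBINED = re.compile(
    r'(?P<clientip>\S+) (?P<ident>\S+) (?P<auth>\S+) '
    r'\[(?P<timestamp>[^\]]+)\] '
    r'"(?P<verb>\S+) (?P<request>\S+)(?: HTTP/(?P<httpversion>[\d.]+))?" '
    r'(?P<response>\d{3}) (?P<bytes>\d+|-) '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)


def parse_combined(line):
    """Return a dict of named fields, or None when the line does not match."""
    match = COMBINED.match(line)
    return match.groupdict() if match else None
```

Every event pays for this regex match, which is why parsing at the shipper costs more CPU than reading pre-parsed JSON.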

15

01 Grok Result

16

01 Flow Options

https://upload.wikimedia.org/wikipedia/commons/thumb/b/bb/Gorilla-server.svg/2000px-Gorilla-server.svg.png https://www.elastic.co/assets/blt69f6410148efbab8/logstash.png

17

01 Flow Options (cont.)

http://www.hanselman.com/blog/content/binary/Windows-Live-Writer/ef572a4c3e50_13F7B/redis_logo_a83f44f3-708d-4fad-aa6e-6eb0d6f82001.png https://upload.wikimedia.org/wikipedia/commons/thumb/f/f8/Question_mark_alternate.svg/2000px-Question_mark_alternate.svg.png

or  Kafka  or  *MQ  or...  

something  light  here  

18

03 Enter rsyslog

https://cdn1.iconfinder.com/data/icons/amenities/500/socket-512.png

your app

crond

syslog socket

rsyslog   /var/log/messages

19

03 Fast-Forward 11 years

http://www.rsyslog.com/common/images/rsyslog-features-imagemap.png

allows us to write to Solr

21

03 Configuring rsyslog

mail.*    /var/log/mail
  works on ~30-year-old syslogd

if $facility == "mail" then {
  action(type="omfile"
         file="/var/log/mail")
}
  written in the last 4 years
  room for more options in modules + main flow (e.g. local vars)

if you see this kind while googling, it’s likely outdated (5-10 year-old rsyslog versions)

27

01 Flow in rsyslog

input: /var/log/apache.log, syslog socket

main queue (RAM+Disk): queue.type, queue.size, ...

queue.workerThreads (filter, parse and send events)

queue.dequeueBatchSize

action: template {JSON}, sent to rsyslog_solr.py (three instances shown)

33

01 Simple Config (1/2)
https://github.com/sematext/lucene-revolution-samples/tree/master/2015

Input: apache combined logs

module(load="imfile")
module(load="omprog")

input(type="imfile"
  File="/opt/logs/example.log"
  Tag="apache:")

main_queue(
  queue.highWatermark="100000"
  queue.lowWatermark="50000"
  queue.maxDiskSpace="5g"
  queue.fileName="solr_action"
  queue.spoolDirectory="/opt/rsyslog/queues"
  queue.saveOnShutdown="on"
  queue.workerThreads="4"
  queue.dequeueBatchSize="500"
)

34

01 Simple Config (2/2)

template(name="json_lines" type="list" option.json="on") {
  constant(value="{")
  constant(value="\"timestamp\":\"")
  property(name="timereported" dateFormat="rfc3339")
  constant(value="\",\"message\":\"")
  property(name="msg")
  ...
  constant(value="\",\"syslog-tag\":\"")
  property(name="syslogtag")
  constant(value="\"}\n")
}

action(
  type="omprog"
  binary="/opt/rsyslog/rsyslog_solr.py"
  template="json_lines"
)

get rsyslog_solr.py from https://github.com/rsyslog/rsyslog/tree/master/plugins/external/solr
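
omprog starts the configured binary and writes one templated message per line to its stdin. The sketch below shows the general shape of such a consumer: read JSON lines, batch them, and POST each batch to Solr's update handler. It is not the actual rsyslog_solr.py; the function names, the batch size default, and the Solr URL in the usage comment are assumptions for illustration.

```python
import json
import urllib.request


def post_batch(docs, solr_url):
    """POST a list of documents to Solr's JSON update handler."""
    request = urllib.request.Request(
        solr_url + "/update",
        data=json.dumps(docs).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        response.read()


def forward_lines(stream, send, batch_size=500):
    """Read one JSON document per line and pass them to `send` in batches.

    Returns the number of documents handed to `send`.
    """
    batch, sent = [], 0
    for line in stream:
        line = line.strip()
        if not line:
            continue
        try:
            batch.append(json.loads(line))
        except ValueError:
            continue  # skip lines that are not valid JSON
        if len(batch) >= batch_size:
            send(batch)
            sent += len(batch)
            batch = []
    if batch:  # flush whatever is left when the stream ends
        send(batch)
        sent += len(batch)
    return sent


# Hypothetical usage when started by omprog:
#   import sys
#   url = "http://localhost:8983/solr/gettingstarted"
#   forward_lines(sys.stdin, lambda docs: post_batch(docs, url))
```

Passing `send` as a parameter keeps the batching logic testable without a running Solr.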

35

01 Base Result

CPU: 15% rsyslog, 4x1% rsyslog_solr.py

Memory: 125MB rsyslog, 4x15MB rsyslog_solr.py. Depends on queue; here up to 100K events in RAM.

38

01 JSON Config

Input: apache combined logs already parsed in JSON

# same main queue settings and modules

input(type="imfile"
  File="/opt/logs/example.log.parsed"
  Tag="apache:")

module(load="mmnormalize")
action(type="mmnormalize"
  rulebase="/opt/rsyslog/json.rb"
)

template(name="json_lines" type="list") {
  property(name="$!root")
  constant(value="\n")
}

action(type="omprog"
...

/opt/rsyslog/json.rb:

version=2
rule=:%root:json%

39

01 JSON Result

40

01 Normalizing Config

input(type="imfile"
  File="/opt/logs/example.log"
  Tag="apache")

action(type="mmnormalize"
  rulebase="/opt/rsyslog/apache_combined.rb"
)

template(name="json_lines" type="list") {
  property(name="$!all-json")
  constant(value="\n")
}

/opt/rsyslog/apache_combined.rb:

version=2
rule=:%[
  {"type": "word", "name": "clientip"},
  {"type": "literal", "text": " "},
  ...
  {"type": "char-to", "name": "agent", "extradata": "\""},
  {"type": "literal", "text": "\""},
  {"type": "rest", "name": "blob"}
]%
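
A rule like the one above is a list of field descriptors applied left to right, with no backtracking regex involved. The Python sketch below imitates that idea for the four descriptor types shown on the slide; it is a simplified illustration, not liblognorm's algorithm, and `SAMPLE_RULE` is a shortened, hypothetical rule.

```python
def parse(line, fields):
    """Apply field descriptors to `line` in order.

    Returns a dict of extracted values, or None if the line does not match.
    """
    pos, out = 0, {}
    for field in fields:
        kind = field["type"]
        if kind == "literal":  # fixed text that must appear here
            if not line.startswith(field["text"], pos):
                return None
            pos += len(field["text"])
        elif kind == "word":  # everything up to the next space
            end = line.find(" ", pos)
            end = len(line) if end == -1 else end
            if end == pos:
                return None
            out[field["name"]] = line[pos:end]
            pos = end
        elif kind == "char-to":  # everything up to a terminator character
            end = line.find(field["extradata"], pos)
            if end == -1:
                return None
            out[field["name"]] = line[pos:end]
            pos = end
        elif kind == "rest":  # whatever is left on the line
            out[field["name"]] = line[pos:]
            pos = len(line)
        else:
            raise ValueError("unsupported field type: " + kind)
    return out if pos == len(line) else None


# Hypothetical, shortened rule for illustration only.
SAMPLE_RULE = [
    {"type": "word", "name": "clientip"},
    {"type": "literal", "text": " \""},
    {"type": "char-to", "name": "agent", "extradata": "\""},
    {"type": "literal", "text": "\""},
    {"type": "rest", "name": "blob"},
]
```

Each descriptor consumes part of the line in a single pass, so the work in this sketch grows with the length of the line.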

 

41

01 Normalizing Result

42

01 Normalizing “Should Scale”*

systemd, syslog-ng

* performance depends mostly on log length and not on the number of rules: http://blog.gerhards.net/2013/01/performance-of-liblognormrsyslog-parse.html

rule=apache_combined:%[
  {"type": "word", "name": "clientip"},
  ...
  {"type": "char-to", "name": "agent", "extradata": "\""},
  {"type": "literal", "text": "\""},
  {"type": "rest", "name": "blob"}
]%

rule=apache_common:%[
  {"type": "word", "name": "clientip"},
  ...
  {"type": "number", "name": "bytes"},
  {"type": "rest", "name": "blob", "priority": 65535}
]%

...

43

01 Normalizing with Five Rules

input(type="imfile"
  File="/opt/logs/example*"
  Tag="apache")

action(type="mmnormalize"
  rulebase="/opt/rsyslog/multiple_rules.rb"
)

if $!root <> "" then {
  set $.final-json = $!root;
} else {
  set $.final-json = $!all-json;
}

template(name="json_lines" type="list") {
  property(name="$.final-json")
  constant(value="\n")
}

44

01 5 Rules Result

45

01 OK, so this works:

rsyslog

rsyslog

rsyslog

46

01 How about this:

rsyslog

rsyslog

rsyslog

47

01 rsyslog.conf

module(load="imfile")
module(load="omhiredis")

input(type="imfile"
  File="/opt/logs/example.log"
  Tag="apache:")

template(name="json_lines" type="list" option.json="on") {...}

# small&light queue
main_queue(queue.workerthreads="1"
  queue.dequeueBatchSize="100"
  queue.size="10000")

action(type="omhiredis"
  mode="publish"
  key="rsyslog_logstash"
  template="json_lines")

Build rsyslog with: ./configure --enable-omhiredis

48

01 logstash.conf

input {
  redis {
    data_type => "channel"
    key => "rsyslog_logstash"
    batch_count => 100
  }
}

output {
  solr_http {
...
  }
}

JSON codec is implied

49

01 Combined Result

CPU: rsyslog 1%, Redis 2%, Logstash 200%

Memory: rsyslog 10MB (10K queue), Redis 1000MB (configurable), Logstash 380MB

50

01 5-Rule Normalizing Result

CPU: rsyslog 100%, Redis 2%, Logstash 200%

Memory: rsyslog 30MB, Redis 1000MB, Logstash 450MB

51

01 Shipper conclusions

Logstash + Solr: easy setup; flexible. Downside: heavy.

rsyslog + Solr: light; fast. Downside: less flexible and less easy.

rsyslog + Redis + Logstash + Solr: offloads buffers and Logstash processing; flexible and efficient. Downside: setup and maintenance overhead.

52

01 Solr Tuning Agenda

Schema and config adjustments

Time-based collections

Tiered cluster (e.g. hot vs cold nodes)

53

01 Schema: Two Kinds of Fields

message:failed

"docValues": true,
"omitNorms": true,
"omitTermFreqAndPositions": true

+20 to 100% capacity*; 10% faster indexing*

* http://blog.sematext.com/2014/11/17/solr-presentations-lucene-solr-revolution/
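
As a rough illustration, the field properties above could be sent to Solr's Schema API as an "add-field" command. The sketch below only builds the JSON payload; the field name "response", the "string" type, and the helper name are assumptions, and the payload would be POSTed to the collection's /schema endpoint.

```python
import json


def add_field_command(name, field_type="string"):
    """Build a Schema API "add-field" command for a field that is
    sorted or faceted on rather than searched as free text."""
    return {
        "add-field": {
            "name": name,
            "type": field_type,
            "indexed": True,
            "stored": True,
            "docValues": True,
            "omitNorms": True,
            "omitTermFreqAndPositions": True,
        }
    }


# "response" is a hypothetical field name (e.g. the HTTP status code).
payload = json.dumps(add_field_command("response"))
```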

55

01 Commits

"updateHandler.autoSoftCommit.maxTime": 5000
  5s feels near-realtime while searching

"updateHandler.autoCommit.maxTime": 60000
<ramBufferSizeMB>200</ramBufferSizeMB>
  Flush to disk every minute or every 200MB

+10% capacity; 10% faster indexing*
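
The two commit intervals are editable properties in Solr's Config API, so they can be changed with "set-property" commands. The sketch below builds one command per property, to be POSTed to the collection's /config endpoint; the helper name is an assumption. ramBufferSizeMB is not covered here since it is set in solrconfig.xml.

```python
import json

# Values from the slide, in milliseconds.
COMMIT_SETTINGS = {
    "updateHandler.autoSoftCommit.maxTime": 5000,
    "updateHandler.autoCommit.maxTime": 60000,
}


def set_property_commands(settings):
    """Return one JSON "set-property" command per setting, sorted by name."""
    return [
        json.dumps({"set-property": {name: value}})
        for name, value in sorted(settings.items())
    ]
```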

57

01 Time-Based Collections

15 Oct: indexing, merges, most searches

14 Oct, 13 Oct: doesn’t change => cache friendly => can be optimized

12 Oct: delete without triggering merges

20-30x capacity; less indexing degradation*

* http://www.slideshare.net/sematext/side-by-side-with-elasticsearch-solr-part-2
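
A small Python sketch of the bookkeeping behind daily collections: derive the collection name from the date, and work out which collections have aged out and can be dropped whole. The "logs_" prefix, the date format, and the retention value are assumptions; dropping a collection would go through the Collections API DELETE action.

```python
from datetime import date, timedelta


def collection_for(day, prefix="logs"):
    """Name of the collection holding events for `day`."""
    return "{}_{}".format(prefix, day.strftime("%Y-%m-%d"))


def collections_to_keep(today, retention_days, prefix="logs"):
    """Collections for today and the previous retention_days - 1 days."""
    return [
        collection_for(today - timedelta(days=n), prefix)
        for n in range(retention_days)
    ]


def expired(existing, today, retention_days, prefix="logs"):
    """Collections with our prefix that fall outside the retention window."""
    keep = set(collections_to_keep(today, retention_days, prefix))
    return sorted(
        name for name in existing
        if name.startswith(prefix + "_") and name not in keep
    )
```

Deleting a whole collection removes its index files outright, which is why old data disappears without triggering merges.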

59

01 Tiered Cluster

Nodes: hot1, hot2, cold1, cold2, cold3, cold4

1. The current day's collection (13 Oct) has its leader (L) and replica (R) on the hot nodes.

2. ADDREPLICA creates (L) and (R) copies of 13 Oct on cold nodes; afterwards only the cold copies remain.

3. 14 Oct is created on the hot nodes, then moves to cold nodes the same way when 15 Oct takes its place on the hot nodes.

Hot nodes (hot1, hot2): quick recent searches and indexing; buffer for indexing spikes

Cold nodes (cold1 to cold4): rare lengthy requests

fewer shards per collection and the cluster is still balanced

CPU++

RAM++ IO++
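
Moving a shard from a hot node to a cold node can be sketched as two Collections API calls: ADDREPLICA targeting the cold node, then DELETEREPLICA for the copy left on the hot node. The code below only builds the request URLs. The collection, node, and replica names are hypothetical, and using DELETEREPLICA for the second step is an assumption, since the slides name only ADDREPLICA.

```python
from urllib.parse import urlencode


def collections_api_url(base_url, action, **params):
    """Build a Collections API URL for the given action and parameters."""
    query = urlencode(dict(action=action, **params))
    return "{}/admin/collections?{}".format(base_url, query)


def move_to_cold_urls(base_url, collection, shard, cold_node, hot_replica):
    """URLs to copy a shard to `cold_node`, then remove the hot copy."""
    return [
        collections_api_url(base_url, "ADDREPLICA",
                            collection=collection, shard=shard, node=cold_node),
        collections_api_url(base_url, "DELETEREPLICA",
                            collection=collection, shard=shard, replica=hot_replica),
    ]
```

In practice the second call should wait until the new replica is active, otherwise the shard briefly loses a copy.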

69

01 Wrap-Up

DocValues

commits

http://www.funnyshirts.net/media/catalog/product/cache/1/image/9df78eab33525d08d6e5fb8d27136e95/z/o/zombies-hate-fast-food-funny-tshirt-preview.png https://cdn0.iconfinder.com/data/icons/dance-fitness/72/13-512.png https://www.standardlife.co.uk/resources/custom/uk/images/heroes/illustration/easy-box.png

rsyslog

rsyslog

rsyslog

rsyslog

76

01 Questions?

Rafał Kuć @kucrafal

Radu Gheorghe @radu0gheorghe

Sematext @sematext http://sematext.com

we’re hiring, too!