9
01 Flow in Logstash
input: /var/log/apache.log, redis
codec: plain, {json}
filter: grok turns "Rafał @kucrafal" into { "user": "Rafał", "twitter": "@kucrafal" }; -w $numberOfWorkers
output: workers => 2
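The input, codec, filter and output stages above can be sketched in a few lines of Python. This is only an illustration of the flow, not how Logstash is implemented; the regular expression stands in for a grok pattern and is an assumption, as are the function names.

```python
import json
import re

# Hypothetical pattern standing in for a grok expression; named groups
# become event fields, as grok captures do.
PATTERN = re.compile(r"(?P<user>\S+) (?P<twitter>@\S+)")


def plain_codec(raw: bytes) -> dict:
    """Codec stage: decode raw bytes into an event with a 'message' field."""
    return {"message": raw.decode("utf-8").rstrip("\n")}


def grok_filter(event: dict) -> dict:
    """Filter stage: add named captures to the event, or tag the failure."""
    match = PATTERN.fullmatch(event["message"])
    if match:
        event.update(match.groupdict())
    else:
        event.setdefault("tags", []).append("_grokparsefailure")
    return event


def json_output(event: dict) -> str:
    """Output stage: serialize the event as one JSON document."""
    return json.dumps(event, ensure_ascii=False)


line = "Rafał @kucrafal\n".encode("utf-8")
event = grok_filter(plain_codec(line))
print(json_output(event))
```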
10
01 Simple Config
https://github.com/sematext/lucene-revolution-samples/tree/master/2015
bin/plugin install logstash-output-solr_http
input {
  file {
    path => "/opt/logs/example.log"    # apache combined logs
    start_position => "beginning"
  }
}
output {
  solr_http {
    solr_url => "http://localhost:8983/solr/gettingstarted"
    flush_size => 5000
    workers => 4
  }
}
12
01 Parse JSON
input {
  file {
    path => "/opt/logs/example.log.parsed"    # apache combined logs in JSON
    start_position => "beginning"
    …
filter {
  json {
    source => "message"
  }
}
output {
  solr_http {
    …
bin/logstash -f logstash.conf -w 4 # filterWorkers=4
14
01 Grok
input {
  file {
    path => "/opt/logs/example.log"
    start_position => "beginning"
    …
filter {
  grok {
    match => [ "message", "%{COMBINEDAPACHELOG}" ]
  }
}
output {
  solr_http {
    …
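To show what parsing a combined-format line involves, here is a minimal Python sketch. The regular expression is a hand-written approximation and not the actual COMBINEDAPACHELOG pattern; the field names mirror the ones grok commonly produces, and the sample line is made up.

```python
import re

# Approximation of the Apache combined log format (an assumption, simpler
# than the real grok pattern).
COMBINED = re.compile(
    r'(?P<clientip>\S+) (?P<ident>\S+) (?P<auth>\S+) '
    r'\[(?P<timestamp>[^\]]+)\] '
    r'"(?P<verb>\S+) (?P<request>\S+) HTTP/(?P<httpversion>[\d.]+)" '
    r'(?P<response>\d{3}) (?P<bytes>\d+|-) '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)


def parse_combined(line):
    """Return a dict of fields, or None when the line does not match."""
    match = COMBINED.match(line)
    return match.groupdict() if match else None


sample = (
    '127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] '
    '"GET /apache_pb.gif HTTP/1.0" 200 2326 '
    '"http://www.example.com/start.html" "Mozilla/4.08 [en] (Win98; I ;Nav)"'
)
fields = parse_combined(sample)
print(fields["clientip"], fields["response"], fields["agent"])
```

Every line pays for this regular-expression match, which is why the number of filter workers (-w) matters for grok.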
16
01 Flow Options
[Figure: server icon and Logstash logo]
17
01 Flow Options (cont.)
[Figure: Redis logo and a question mark]
or Kafka or *MQ or...
something light here
18
03 Enter rsyslog
your app, crond -> syslog socket -> rsyslog -> /var/log/messages
20
03 Fast-Forward 11 years
[Figure: rsyslog feature map]
allows us to write to Solr
26
03 Configuring rsyslog
mail.* /var/log/mail
if $facility == "mail" then { action(type="omfile" file="/var/log/mail") }
written in the last 4 years
works on ~30-year-old syslogd
if you see this kind while googling, it’s likely outdated (5-10 year-old rsyslog versions)
room for more options in modules + main flow (e.g. local vars)
32
01 Flow in rsyslog
input: /var/log/apache.log, syslog socket
main queue (RAM+Disk): queue.type, queue.size, ...
queue.workerThreads (filter, parse and send events)
queue.dequeueBatchSize
action: template {JSON} -> rsyslog_solr.py (three instances)
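The interplay of queue.workerThreads and queue.dequeueBatchSize can be illustrated with a small Python sketch. It is a simplification and not rsyslog's implementation: the queue is filled up front, lives only in memory, and the function and parameter names are made up for the example.

```python
import queue
import threading


def drain(events, worker_threads=4, dequeue_batch_size=500):
    """Drain a pre-filled queue with several workers, one batch at a time."""
    main_queue = queue.Queue()
    for event in events:
        main_queue.put(event)

    batches = []
    lock = threading.Lock()

    def worker():
        while True:
            batch = []
            try:
                # Take up to dequeue_batch_size events in one go.
                while len(batch) < dequeue_batch_size:
                    batch.append(main_queue.get_nowait())
            except queue.Empty:
                pass
            if not batch:
                return  # queue drained, worker exits
            # A real worker would filter, parse and send the batch here.
            with lock:
                batches.append(batch)

    threads = [threading.Thread(target=worker) for _ in range(worker_threads)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return batches


batches = drain(range(1200), worker_threads=4, dequeue_batch_size=500)
print(sum(len(batch) for batch in batches))
```

Larger batches mean fewer round trips per event; more workers help when filtering and parsing are the bottleneck.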
33
01 Simple Config (1/2)
https://github.com/sematext/lucene-revolution-samples/tree/master/2015
module(load="imfile")
module(load="omprog")
# the input file holds apache combined logs
input(type="imfile"
  File="/opt/logs/example.log"
  Tag="apache:")
main_queue(
  queue.highWatermark="100000"
  queue.lowWatermark="50000"
  queue.maxDiskSpace="5g"
  queue.fileName="solr_action"
  queue.spoolDirectory="/opt/rsyslog/queues"
  queue.saveOnShutdown="on"
  queue.workerThreads="4"
  queue.dequeueBatchSize="500"
)
34
01 Simple Config (2/2)
template(name="json_lines" type="list" option.json="on") {
  constant(value="{")
  constant(value="\"timestamp\":\"")
  property(name="timereported" dateFormat="rfc3339")
  constant(value="\",\"message\":\"")
  property(name="msg")
  ...
  constant(value="\",\"syslog-tag\":\"")
  property(name="syslogtag")
  constant(value="\"}\n")
}
action(
  type="omprog"
  # get from https://github.com/rsyslog/rsyslog/tree/master/plugins/external/solr
  binary="/opt/rsyslog/rsyslog_solr.py"
  template="json_lines"
)
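With omprog, rsyslog starts the binary and writes each templated message to its standard input, one per line. The sketch below shows the general shape of such a consumer; it is not the rsyslog_solr.py script from the rsyslog repository. The update URL, the batch size and the function names are assumptions made for the example.

```python
import io
import json
import urllib.request

# Assumed endpoint, matching the collection used in the Logstash example.
SOLR_UPDATE_URL = "http://localhost:8983/solr/gettingstarted/update"


def post_batch(docs, url=SOLR_UPDATE_URL):
    """Send a list of documents to Solr as one JSON array."""
    body = json.dumps(docs).encode("utf-8")
    request = urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request) as response:
        response.read()


def consume(stream, batch_size=500, send=post_batch):
    """Read JSON lines from stream, send them in batches, return the count."""
    batch, sent = [], 0
    for line in stream:
        line = line.strip()
        if not line:
            continue
        batch.append(json.loads(line))
        if len(batch) >= batch_size:
            send(batch)
            sent += len(batch)
            batch = []
    if batch:  # flush whatever is left when the stream ends
        send(batch)
        sent += len(batch)
    return sent


# Dry run with an in-memory stream and a fake sender instead of Solr.
received = []
demo = io.StringIO('{"message": "a"}\n{"message": "b"}\n{"message": "c"}\n')
total = consume(demo, batch_size=2, send=received.append)
print(total, [len(batch) for batch in received])
```

In production the stream would be sys.stdin and send would be left at its default.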
37
01 Base Result
CPU: 15% rsyslog, 4x1% rsyslog_solr.py
Memory: 125MB rsyslog, 4x15MB rsyslog_solr.py
Memory depends on the queue; here it holds up to 100K events in RAM
38
01 JSON Config
# same main queue settings and modules
# the input file holds apache combined logs already parsed in JSON
input(type="imfile"
  File="/opt/logs/example.log.parsed"
  Tag="apache:")
module(load="mmnormalize")
action(type="mmnormalize"
  rulebase="/opt/rsyslog/json.rb"
)
template(name="json_lines" type="list") {
  property(name="$!root")
  constant(value="\n")
}
action(type="omprog"
  ...
/opt/rsyslog/json.rb:
version=2
rule=:%root:json%
40
01 Normalizing Config
input(type="imfile"
  File="/opt/logs/example.log"
  Tag="apache")
action(type="mmnormalize"
  rulebase="/opt/rsyslog/apache_combined.rb"
)
template(name="json_lines" type="list") {
  property(name="$!all-json")
  constant(value="\n")
}
/opt/rsyslog/apache_combined.rb:
version=2
rule=:%[
  {"type": "word", "name": "clientip"},
  {"type": "literal", "text": " "},
  ...
  {"type": "char-to", "name": "agent", "extradata": "\""},
  {"type": "literal", "text": "\""},
  {"type": "rest", "name": "blob"}
]%
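A rule like the one above is a sequence of small parsers applied left to right. The Python sketch below mimics that idea for the four parser types shown; it is a deliberately simplified illustration and not liblognorm's algorithm. The short rule and the sample line are made up.

```python
def apply_rule(rule, line):
    """Apply parsers in order; return extracted fields or None on mismatch."""
    fields = {}
    pos = 0
    for step in rule:
        kind = step["type"]
        if kind == "literal":
            # A literal must appear exactly at the current position.
            if not line.startswith(step["text"], pos):
                return None
            pos += len(step["text"])
            continue
        if kind == "word":
            end = line.find(" ", pos)  # up to the next space
            end = len(line) if end == -1 else end
        elif kind == "char-to":
            end = line.find(step["extradata"], pos)  # up to the given char
            if end == -1:
                return None
        elif kind == "rest":
            end = len(line)  # everything that is left
        else:
            raise ValueError("unsupported parser type: " + kind)
        fields[step["name"]] = line[pos:end]
        pos = end
    return fields


# Hypothetical, shortened rule in the same notation as the rulebase.
rule = [
    {"type": "word", "name": "clientip"},
    {"type": "literal", "text": " \""},
    {"type": "char-to", "name": "agent", "extradata": "\""},
    {"type": "literal", "text": "\""},
    {"type": "rest", "name": "blob"},
]
parsed = apply_rule(rule, '10.0.0.1 "curl/7.43"')
print(parsed)
```

Each step only scans forward from the current position, so the cost grows with the length of the line.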
42
01 Normalizing “Should Scale”*
* performance depends mostly on log length and not on the number of rules: http://blog.gerhards.net/2013/01/performance-of-liblognormrsyslog-parse.html
rule=apache_combined:%[
  {"type": "word", "name": "clientip"},
  ...
  {"type": "char-to", "name": "agent", "extradata": "\""},
  {"type": "literal", "text": "\""},
  {"type": "rest", "name": "blob"}
]%
rule=apache_common:%[
  {"type": "word", "name": "clientip"},
  ...
  {"type": "number", "name": "bytes"},
  {"type": "rest", "name": "blob", "priority": 65535}
]%
...
43
01 Normalizing with Five Rules
input(type="imfile"
  File="/opt/logs/example*"
  Tag="apache")
action(type="mmnormalize"
  rulebase="/opt/rsyslog/multiple_rules.rb"
)
if $!root <> "" then {
  set $.final-json = $!root;
} else {
  set $.final-json = $!all-json;
}
template(name="json_lines" type="list") {
  property(name="$.final-json")
  constant(value="\n")
}
47
01 rsyslog.conf
# build rsyslog with: ./configure --enable-omhiredis
module(load="imfile")
module(load="omhiredis")
input(type="imfile"
  File="/opt/logs/example.log"
  Tag="apache:")
template(name="json_lines" type="list" option.json="on") {...}
# small&light queue
main_queue(queue.workerthreads="1"
  queue.dequeueBatchSize="100"
  queue.size="10000")
action(type="omhiredis"
  mode="publish"
  key="rsyslog_logstash"
  template="json_lines")
48
01 logstash.conf
input {
  redis {
    data_type => "channel"
    key => "rsyslog_logstash"
    batch_count => 100
    # JSON codec is implied
  }
}
output {
  solr_http {
    ...
  }
}
49
01 Combined Result
CPU: rsyslog 1%, Redis 2%, Logstash 200%
Memory: rsyslog 10MB (10K queue), Redis 1000MB (configurable), Logstash 380MB
50
01 5-Rule Normalizing Result
CPU: rsyslog 100%, Redis 2%, Logstash 200%
Memory: rsyslog 30MB, Redis 1000MB, Logstash 450MB
51
01 Shipper conclusions
Logstash: easy setup; flexible. Downside: heavy
rsyslog: light; fast. Downside: less flexible & easy
rsyslog + Redis + Logstash: offloads buffers and Logstash processing; flexible and efficient. Downside: setup and maintenance overhead
52
01 Solr Tuning Agenda
Schema and config adjustments
Time-based collections
Tiered cluster (e.g. hot vs cold nodes)
54
01 Schema: Two Kinds of Fields
message:failed
"docValues": true
"omitNorms": true,
"omitTermFreqAndPositions": true
+20 to 100% capacity*
10% faster indexing*
* http://blog.sematext.com/2014/11/17/solr-presentations-lucene-solr-revolution/
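One way to apply these flags is through Solr's Schema API, which accepts add-field commands as JSON. The sketch below only builds the payloads. It assumes that the searched text field (message) gets omitNorms and omitTermFreqAndPositions while fields used for sorting and faceting get docValues; that split, the field names and the field types are assumptions, not something the slides spell out.

```python
import json


def add_field_command(name, field_type, searched_as_text):
    """Build a Schema API add-field command for one of the two field kinds."""
    field = {"name": name, "type": field_type}
    if searched_as_text:
        # Full-text field: skip scoring data that log search rarely needs.
        # Note that omitting positions also disables phrase queries.
        field["omitNorms"] = True
        field["omitTermFreqAndPositions"] = True
    else:
        # Field used for sorting, faceting or stats: keep column-wise values.
        field["docValues"] = True
    return {"add-field": field}


# Hypothetical fields; the type names come from Solr's sample configsets.
message = add_field_command("message", "text_general", searched_as_text=True)
response = add_field_command("response", "string", searched_as_text=False)
print(json.dumps(message))
print(json.dumps(response))
```

Each payload would be POSTed to the collection's /schema endpoint.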
56
01 Commits
"updateHandler.autoSoftCommit.maxTime": 5000
5s feels near-realtime while searching
"updateHandler.autoCommit.maxTime": 60000
<ramBufferSizeMB>200</ramBufferSizeMB>
Flush to disk every minute or every 200MB
+10% capacity; 10% faster indexing*
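The two commit intervals use the property names that Solr's Config API accepts in a set-property command, so they can be changed without editing solrconfig.xml by hand. The sketch below builds that request without sending it; the base URL and collection name are assumptions. ramBufferSizeMB is not covered here and stays in solrconfig.xml.

```python
import json
import urllib.request


def commit_settings(soft_commit_ms=5000, hard_commit_ms=60000):
    """Build a Config API set-property command for both commit intervals."""
    return {
        "set-property": {
            "updateHandler.autoSoftCommit.maxTime": soft_commit_ms,
            "updateHandler.autoCommit.maxTime": hard_commit_ms,
        }
    }


def config_request(payload, base="http://localhost:8983/solr/gettingstarted"):
    """Prepare (but do not send) the POST request to the /config endpoint."""
    return urllib.request.Request(
        base + "/config",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )


payload = commit_settings()
request = config_request(payload)
print(request.full_url)
print(json.dumps(payload))
```

Passing the request to urllib.request.urlopen would apply the change on a running node.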
58
01 Time-Based Collections
15 Oct: indexing, merges, most searches
14 Oct, 13 Oct: doesn’t change => cache friendly => can be optimized
12 Oct: delete without triggering merges
20-30x capacity; less indexing degradation*
* http://www.slideshare.net/sematext/side-by-side-with-elasticsearch-solr-part-2
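With one collection per day, routing and retention become name arithmetic. The Python sketch below derives the collection for a given day and lists the ones that fall outside the retention window; the logs prefix, the date format and the three-day retention are assumptions for the example.

```python
from datetime import date, timedelta


def collection_for(day, prefix="logs"):
    """Name of the daily collection that holds events from the given day."""
    return f"{prefix}_{day:%Y%m%d}"


def expired_collections(existing, today, keep_days, prefix="logs"):
    """Collections with the prefix that are older than the retention window."""
    keep = {
        collection_for(today - timedelta(days=n), prefix)
        for n in range(keep_days)
    }
    return sorted(
        name for name in existing
        if name.startswith(prefix + "_") and name not in keep
    )


today = date(2015, 10, 15)
existing = [collection_for(today - timedelta(days=n)) for n in range(4)]
to_delete = expired_collections(existing, today, keep_days=3)
print(collection_for(today), to_delete)
```

Dropping a whole collection removes its index files in one step, so no deleted documents linger waiting for a merge.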
61
01 Tiered Cluster
hot1, hot2
cold1, cold2, cold3, cold4
(L) 13 Oct, (R) 13 Oct -> ADDREPLICA -> (L) 13 Oct, (R) 13 Oct
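ADDREPLICA is an action of Solr's Collections API that takes the collection, the shard and the target node. The sketch below only builds the request URL; the host, collection name, shard name and node name are assumptions for the example.

```python
from urllib.parse import urlencode


def add_replica_url(solr_base, collection, shard, node):
    """Build a Collections API URL that adds a replica on a given node."""
    params = {
        "action": "ADDREPLICA",
        "collection": collection,
        "shard": shard,
        "node": node,
        "wt": "json",
    }
    return f"{solr_base}/admin/collections?{urlencode(params)}"


# Hypothetical names: yesterday's collection gets a replica on a cold node.
url = add_replica_url(
    "http://hot1:8983/solr", "logs_20151013", "shard1", "cold1:8983_solr"
)
print(url)
```

Once the new replica is active on the cold node, the copy on the hot node can be removed to free it up for fresh data.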
68
01 Tiered Cluster
hot nodes (hot1, hot2): quick recent searches and indexing
cold nodes (cold1, cold2, cold3, cold4): rare lengthy requests
collections: (L) 15 Oct, (R) 15 Oct, (L) 14 Oct, (R) 14 Oct, (L) 13 Oct, (R) 13 Oct
buffer for indexing spikes
fewer shards per collection and the cluster is still balanced
CPU++
RAM++ IO++
75
01 Wrap-Up
DocValues
commits
rsyslog
77
01 Questions?
Rafał Kuć, @kucrafal
Radu Gheorghe, @radu0gheorghe
Sematext, @sematext, http://sematext.com
we’re hiring, too!