Search and Analyze Data in Real Time

Prashant Shewale and Rohit Kalsarpe

Page 1: Search and analyze data in real time

Search and Analyze Data in Real Time

Prashant Shewale and Rohit Kalsarpe

Page 2: Search and analyze data in real time

Agenda

1. Problem in validating logs

2. How Logstash can help

3. ELK stack (Elasticsearch, Logstash, Kibana)

4. Some hands-on

5. How we used the ELK stack in our automation framework

6. World beyond

Page 3: Search and analyze data in real time

Problem in validating logs

Active log files have to be followed as they are written.

Logs keep growing and are rotated.

Collating multiline logs into a single event is difficult.

Different kinds of applications produce different kinds of logs, each with its own format.
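The multiline problem can be made concrete with a minimal Python sketch. It assumes that every new event starts with a syslog-style timestamp (for example "Oct 20 03:45:50") and that any other line continues the previous event. The function name `collate_events` and the sample lines are made up for illustration; they are not taken from the talk.

```python
import re
from typing import Iterable, Iterator, List

# Assumption: an event begins with a syslog-style timestamp such as "Oct 20 03:45:50 ".
EVENT_START = re.compile(r"^[A-Z][a-z]{2}\s+\d{1,2} \d{2}:\d{2}:\d{2} ")


def collate_events(lines: Iterable[str]) -> Iterator[str]:
    """Merge continuation lines into the event line that precedes them."""
    current: List[str] = []
    for line in lines:
        line = line.rstrip("\n")
        # A new timestamp closes the event collected so far.
        if EVENT_START.match(line) and current:
            yield "\n".join(current)
            current = []
        current.append(line)
    if current:
        yield "\n".join(current)


# Hypothetical sample: one event with a two-line stack trace, then a second event.
sample = [
    "Oct 20 03:45:50 hostname app: request failed",
    "Traceback (most recent call last):",
    '  File "app.py", line 1, in <module>',
    "Oct 20 03:46:02 hostname app: request ok",
]
events = list(collate_events(sample))
```

Here `events` holds two entries, the first of which spans three lines. Following a file across rotation is a separate problem that this sketch does not address.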

Page 4: Search and analyze data in real time

192.168.198.92 - - [22/Dec/2002:23:08:37 -0400] "GET / HTTP/1.1" 200 6394 www.yahoo.com "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1...)" "-"

192.168.198.92 - - [22/Dec/2002:23:08:38 -0400] "GET /images/logo.gif HTTP/1.1" 200 807 www.yahoo.com "http://www.some.com/" "Mozilla/4.0 (compatible; MSIE 6...)" "-"

192.168.72.177 - - [22/Dec/2002:23:32:14 -0400] "GET /news/sports.html HTTP/1.1" 200 3500 www.yahoo.com "http://www.some.com/" "Mozilla/4.0 (compatible; MSIE ...)" "-"

192.168.72.177 - - [22/Dec/2002:23:32:14 -0400] "GET /favicon.ico HTTP/1.1" 404 1997 www.yahoo.com "-" "Mozilla/5.0 (Windows; U; Windows NT 5.1; rv:1.7.3)..." "-"

192.168.72.177 - - [22/Dec/2002:23:32:15 -0400] "GET /style.css HTTP/1.1" 200 4138 www.yahoo.com "http://www.yahoo.com/index.html" "Mozilla/5.0 (Windows..." "-"

192.168.72.177 - - [22/Dec/2002:23:32:16 -0400] "GET /js/ads.js HTTP/1.1" 200 10229 www.yahoo.com "http://www.search.com/index.html" "Mozilla/5.0 (Windows..." "-"

192.168.72.177 - - [22/Dec/2002:23:32:19 -0400] "GET /search.php HTTP/1.1" 400 1997 www.yahoo.com "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; ...)" "-"

Sample Apache Log

Page 5: Search and analyze data in real time

Feb 4 06:10:09 techy sendmail[5392]: o140e90B005392: from=, size=2434, class=0, nrcpts=1, msgid=<[email protected]>, proto=ESMTP, daemon=MTA, relay=localhost [127.0.0.1]

Feb 4 06:10:09 techy sendmail[5380]: o140e9Mi005380: to=root, ctladdr=root (0/0), delay=00:00:00, xdelay=00:00:00, mailer=relay, pri=32168, relay=[127.0.0.1] [127.0.0.1], dsn=2.0.0, stat=Sent (o140e90B005392 Message accepted for delivery)

Sample Sendmail Log

Page 6: Search and analyze data in real time

Oct 20 03:45:50 hostname kernel: iptables denied: IN=eth0 OUT= MAC=xx:xx:xx:xx:xx:xx:xx:xx:xx:xx:xx:xx:xx:xx SRC=x.x.x.x DST=x.x.x.x LEN=1059 TOS=0x00 PREC=0x00 TTL=115 ID=31368 DF PROTO=TCP SPT=17992 DPT=80 WINDOW=16477 RES=0x00 ACK PSH URGP=0

Oct 20 03:46:02 hostname kernel: iptables denied: IN=eth0 OUT= MAC=xx:xx:xx:xx:xx:xx:xx:xx:xx:xx:xx:xx:xx:xx SRC=x.x.x.x DST=x.x.x.x LEN=52 TOS=0x00 PREC=0x00 TTL=52 ID=763 DF PROTO=TCP SPT=20229 DPT=22 WINDOW=15588 RES=0x00 ACK URGP=0

Oct 20 03:46:14 hostname kernel: iptables denied: IN=eth0 OUT= MAC=xx:xx:xx:xx:xx:xx:xx:xx:xx:xx:xx:xx:xx:xx SRC=x.x.x.x DST=x.x.x.x LEN=324 TOS=0x00 PREC=0x00 TTL=49 ID=64245 PROTO=TCP SPT=47237 DPT=80 WINDOW=470 RES=0x00 ACK PSH URGP=0

Oct 20 03:46:26 hostname kernel: iptables denied: IN=eth0 OUT= MAC=xx:xx:xx:xx:xx:xx:xx:xx:xx:xx:xx:xx:xx:xx SRC=x.x.x.x DST=x.x.x.x LEN=52 TOS=0x00 PREC=0x00 TTL=45 ID=2010 PROTO=TCP SPT=48322 DPT=80 WINDOW=380 RES=0x00 ACK URGP=0

Sample iptables Log
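Unlike the Apache format, the iptables lines are mostly KEY=VALUE pairs, so a small Python sketch can pull the fields out without a format-specific pattern. This is only an illustration of the idea; the function name `parse_iptables_line` is hypothetical, and bare flags such as DF or ACK are deliberately ignored.

```python
import re
from typing import Dict

# KEY=VALUE pairs; the value may be empty, as in "OUT= ".
KEY_VALUE = re.compile(r"(\w+)=(\S*)")


def parse_iptables_line(line: str) -> Dict[str, str]:
    """Return the KEY=VALUE fields of one iptables log line as a dict."""
    return dict(KEY_VALUE.findall(line))


# Second line of the sample iptables log.
line = (
    "Oct 20 03:46:02 hostname kernel: iptables denied: IN=eth0 OUT= "
    "MAC=xx:xx:xx:xx:xx:xx:xx:xx:xx:xx:xx:xx:xx:xx SRC=x.x.x.x DST=x.x.x.x "
    "LEN=52 TOS=0x00 PREC=0x00 TTL=52 ID=763 DF PROTO=TCP SPT=20229 DPT=22 "
    "WINDOW=15588 RES=0x00 ACK URGP=0"
)
fields = parse_iptables_line(line)
```

All values stay strings here; converting fields such as LEN or DPT to integers would be a further step.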

Page 7: Search and analyze data in real time

Use RegEx to parse data

Source: xkcd.com

Page 8: Search and analyze data in real time

Actual RegEx to parse Apache log
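The pattern shown on this slide did not survive in the text, so the following Python sketch is a stand-in written against the sample Apache log lines earlier in the deck, not the authors' regex. The group names (`client`, `host`, `referrer`, and so on) are chosen for illustration, and the pattern assumes the virtual host appears right after the byte count, as it does in the sample.

```python
import re
from typing import Dict, Optional

# Assumed layout, taken from the sample log:
# client ident user [timestamp] "request" status bytes host "referrer" "agent"
APACHE_LINE = re.compile(
    r'(?P<client>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<timestamp>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status>\d{3}) (?P<bytes>\d+|-) '
    r'(?P<host>\S+) "(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)


def parse_apache_line(text: str) -> Optional[Dict[str, object]]:
    """Return the fields of one log line, or None if the line does not match."""
    match = APACHE_LINE.match(text)
    if match is None:
        return None
    event: Dict[str, object] = dict(match.groupdict())
    event["status"] = int(match.group("status"))
    size = match.group("bytes")
    event["bytes"] = 0 if size == "-" else int(size)
    return event


# Fourth line of the sample Apache log.
line = (
    '192.168.72.177 - - [22/Dec/2002:23:32:14 -0400] '
    '"GET /favicon.ico HTTP/1.1" 404 1997 www.yahoo.com "-" '
    '"Mozilla/5.0 (Windows; U; Windows NT 5.1; rv:1.7.3)..." "-"'
)
event = parse_apache_line(line)
```

Even this simplified pattern is hard to read, which is the point the slide makes about hand-written regular expressions.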

Page 9: Search and analyze data in real time

Source: xkcd.com

Page 10: Search and analyze data in real time

How Logstash can help

Logstash is a data pipeline that helps you process logs from a variety of systems.

Logstash allows you to parse data and converge on a common format.

Logstash provides a fast and convenient way to write custom logic for parsing these logs.

Support for many plugins
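What "converging on a common format" means can be sketched in Python: take a source-specific timestamp and emit an event with a uniform shape. The field names `@timestamp`, `source`, and `message` and both function names are assumptions made for this example; the sketch does not reproduce how Logstash itself represents events.

```python
from datetime import datetime
from typing import Dict


def normalize_apache_timestamp(raw: str) -> str:
    """Convert an Apache-style timestamp to ISO 8601, keeping the UTC offset."""
    return datetime.strptime(raw, "%d/%b/%Y:%H:%M:%S %z").isoformat()


def to_common_event(source: str, timestamp: str, message: str) -> Dict[str, str]:
    """Build an event in one shared shape, whatever the original log format."""
    return {"@timestamp": timestamp, "source": source, "message": message}


# Timestamp and request taken from the first line of the sample Apache log.
event = to_common_event(
    "apache",
    normalize_apache_timestamp("22/Dec/2002:23:08:37 -0400"),
    "GET / HTTP/1.1",
)
```

Once every source is reduced to the same fields, events from Apache, Sendmail, and iptables can be searched and sorted together.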

Page 11: Search and analyze data in real time

Logstash

Input Section
• File
• Stdin
• Syslog
• SNMP Traps
• TCP/UDP
• and many more

Filter Section
• Grok
• Mutate
• Geoip
• Drop
• and many more

Output Section
• Elasticsearch
• File
• Email
• and many more

Page 12: Search and analyze data in real time

Logstash Config File

input {
  ...
}

filter {
  ...
}

output {
  ...
}
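The three sections of the config file map onto three pipeline stages. As a rough analogy only, and not a description of how Logstash is implemented, the same flow can be sketched in Python with generators. All names here (`input_stage`, `filter_stage`, `output_stage`, `drop_blank`, `add_length`) are invented for the example.

```python
import json
from typing import Callable, Dict, Iterable, Iterator, List, Optional

Event = Dict[str, object]
Filter = Callable[[Event], Optional[Event]]


def input_stage(lines: Iterable[str]) -> Iterator[Event]:
    """Input: wrap each raw line in an event."""
    for line in lines:
        yield {"message": line.rstrip("\n")}


def drop_blank(event: Event) -> Optional[Event]:
    """Filter: discard events with an empty message."""
    return event if event["message"] else None


def add_length(event: Event) -> Optional[Event]:
    """Filter: enrich the event with a derived field."""
    event["length"] = len(str(event["message"]))
    return event


def filter_stage(events: Iterable[Event], filters: List[Filter]) -> Iterator[Event]:
    """Apply the filters in order; a filter returning None drops the event."""
    for event in events:
        current: Optional[Event] = event
        for apply_filter in filters:
            current = apply_filter(current)
            if current is None:
                break
        else:
            yield current


def output_stage(events: Iterable[Event]) -> List[str]:
    """Output: serialize each event as one JSON line."""
    return [json.dumps(event, sort_keys=True) for event in events]


lines = ["first line\n", "\n", "third"]
out = output_stage(filter_stage(input_stage(lines), [drop_blank, add_length]))
```

The blank line is dropped in the filter stage, so `out` contains two JSON strings.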

Page 13: Search and analyze data in real time

Logstash-forwarder

A tool to collect logs locally for processing elsewhere

Secure, low latency, low resource usage, and reliable.

Another option: Log Courier

[Diagram: Logstash-forwarder and Logstash]

Page 14: Search and analyze data in real time

ELK Stack

Elasticsearch, Logstash and Kibana

End-to-end stack that delivers actionable insights in real time from almost any type of structured and unstructured data source

I. Logstash is used for cooking data

II. Elasticsearch is used for storing this cooked data

III. Kibana gives shape to your data

Each one is packaged as a fully self-contained jar and is easy to use

Page 15: Search and analyze data in real time

What is ELK?

[Diagram: three shippers and the ELK stack]

Page 17: Search and analyze data in real time

Elasticsearch

Real-time search and indexing tool

Easy to set up; RESTful API

Easy to cluster and scale

High Availability

Schema-Free
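To illustrate the RESTful, schema-free points, here is a minimal Python sketch that builds a request body for the Elasticsearch bulk API, which accepts newline-delimited JSON: one action line followed by one document line per event. Nothing is sent over the network. The index name `logs-demo` and the function name `build_bulk_body` are assumptions, and depending on the Elasticsearch version the action line may need additional fields.

```python
import json
from typing import Dict, Iterable


def build_bulk_body(index_name: str, events: Iterable[Dict[str, object]]) -> str:
    """Return a newline-delimited JSON body: an action line, then the document."""
    lines = []
    for event in events:
        lines.append(json.dumps({"index": {"_index": index_name}}))
        lines.append(json.dumps(event))
    # The bulk body must end with a newline.
    return "\n".join(lines) + "\n"


# Two documents with different fields; no mapping is declared up front.
events = [
    {"source": "apache", "status": 404, "path": "/favicon.ico"},
    {"source": "iptables", "PROTO": "TCP", "DPT": "22"},
]
body = build_bulk_body("logs-demo", events)  # "logs-demo" is a hypothetical index
```

The resulting string would be sent as the body of a POST to the cluster's `_bulk` endpoint.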

Page 19: Search and analyze data in real time

Kibana

Seamless Integration with Elasticsearch

Give Shape to Your Data

Sophisticated Analytics

Easy Setup

Simple Data Export

Page 21: Search and analyze data in real time

Demo

Page 22: Search and analyze data in real time

How we used the ELK stack in our automation framework

Page 23: Search and analyze data in real time

[Architecture diagram]

Sources: Automation Box 1, Automation Box 2, ... Automation Box n; Mail Servers

Logstash: Cook, Correlate

Elasticsearch: Index, Store

Flow labels: Mail, Logs, Structured data

Page 24: Search and analyze data in real time

World Beyond

Analytics - count things and summarize your data.

Crawling and document processing

1. For crawling, people are using both Scrapy and Nutch together with Elasticsearch.

A variety of companies are using the ELK stack to power their search infrastructure.

1. Wikimedia

2. GitHub, which provides search for its 4 million members across 8 million+ code repositories.
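"Count things and summarize your data" can be shown on a small scale in Python, using the request paths and status codes from the sample Apache log earlier in the deck. This is a local analogy with `collections.Counter`; it does not show how Elasticsearch or Kibana compute such summaries.

```python
from collections import Counter

# (path, status) pairs taken from the seven lines of the sample Apache log.
requests = [
    ("/", 200),
    ("/images/logo.gif", 200),
    ("/news/sports.html", 200),
    ("/favicon.ico", 404),
    ("/style.css", 200),
    ("/js/ads.js", 200),
    ("/search.php", 400),
]

# Count: how many requests ended with each status code.
status_counts = Counter(status for _, status in requests)

# Summarize: which paths produced a client or server error.
error_paths = [path for path, status in requests if status >= 400]
```

With structured events, questions such as "how many 404s in the last hour" become simple counts instead of ad hoc text searches.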

Page 25: Search and analyze data in real time

Thank You