Scaling ELK Stack - DevOpsDays Singapore

ELK

Log processing at Scale

#DevOpsDays 2015, Singapore

@DevOpsDaysSG

Angad Singh

About me

DevOps at Viki, Inc - a global video streaming site with subtitles.

Previously a Twitter SRE, National University of Singapore

Twitter @angadsg, GitHub @angad

Elasticsearch - Log Indexing and Searching

Logstash - Log Ingestion plumbing

Kibana - Frontend

Metrics vs Logging

Metrics

● Numeric timeseries data

● Actionable

● Counts, Statistical (p90, p99 etc.)

● Scalable, cost-effective solutions already available

Logging

● Useful for debugging

● Catch-all

● Full text searching

● Computationally intensive, harder to scale

Alerting and Monitoring at Viki

Deeper level debugging with application logs

Success Rate Alert for service X

Logs

● Application logs - Stack Traces, Handled Exceptions

● Access Logs - Status codes, URI, HTTP Method at all levels of the stack

● Client Logs - Direct HTTP requests containing log events from client-side JavaScript or mobile applications (Android/iOS)

● Log format standardized to JSON - easy to add / remove fields.

● Request tracing through the various services using a Unique-ID set at the Load Balancer
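A minimal sketch of the JSON log format and Unique-ID idea above, using only Python's standard `logging` module. The field names (`request_id`, `service`, `stack_trace`) and the way the ID reaches the application are assumptions for illustration, not Viki's actual schema.

```python
import json
import logging
import time
import uuid


class JsonFormatter(logging.Formatter):
    """Render each log record as a single JSON object on one line."""

    converter = time.gmtime  # emit timestamps in UTC

    def format(self, record):
        event = {
            "timestamp": self.formatTime(record, "%Y-%m-%dT%H:%M:%S") + "Z",
            "level": record.levelname,
            # Hypothetical fields, passed per call through `extra=`.
            "service": getattr(record, "service", "unknown"),
            "request_id": getattr(record, "request_id", None),
            "message": record.getMessage(),
        }
        if record.exc_info:
            # Stack traces travel inside the same JSON event.
            event["stack_trace"] = self.formatException(record.exc_info)
        return json.dumps(event)


if __name__ == "__main__":
    handler = logging.StreamHandler()
    handler.setFormatter(JsonFormatter())
    log = logging.getLogger("demo")
    log.addHandler(handler)
    log.setLevel(logging.INFO)

    # In a real stack the ID would arrive in a request header set upstream;
    # here one is generated locally just to show the shape of the event.
    request_id = str(uuid.uuid4())
    log.info("fetched subtitles",
             extra={"request_id": request_id, "service": "subtitles"})
```

Because every service writes the same `request_id`, one search on that field returns the whole path of a request through the stack.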

Logstash

● Log aggregator

● Log preprocessing (Filtering etc.)

● 3 stage pipeline

● Input > Filter > Output

Elasticsearch

● Full text searching and indexing

● On top of Apache Lucene

● RESTful web interface

● Horizontally scalable

Kibana

● Frontend

● Visualizations, Dashboards

● Supports Geo visualizations

● Uses ES REST API

Logstash

Input - Any Stream

● local file

● queue

● tcp, udp

● twitter

● etc..

Filter - Mutation

● add/remove field

● parse as json

● ruby code

● parse geoip

● etc..

Output

● elasticsearch

● redis

● queue

● file

● pagerduty

● etc..
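The Input > Filter > Output flow can be modelled as a toy pipeline of Python generators. This is not Logstash configuration and uses no Logstash API. The function names and the `parse_failure` tag are hypothetical, chosen only to show how events pass through the three stages.

```python
import json


def read_lines(lines):
    """Input stage: wrap each raw line from any iterable stream in an event."""
    for line in lines:
        yield {"message": line.rstrip("\n")}


def parse_json(events):
    """Filter stage: merge JSON fields into the event, tag lines that fail."""
    for event in events:
        try:
            parsed = json.loads(event["message"])
        except ValueError:
            parsed = None
        if isinstance(parsed, dict):
            event.update(parsed)
        else:
            event["tags"] = ["parse_failure"]  # hypothetical tag name
        yield event


def remove_field(events, name):
    """Filter stage: drop one field from every event."""
    for event in events:
        event.pop(name, None)
        yield event


def run_pipeline(lines):
    """Output stage: collect events in a list (stand-in for a real sink)."""
    events = read_lines(lines)
    events = parse_json(events)
    events = remove_field(events, "debug")
    return list(events)


if __name__ == "__main__":
    for event in run_pipeline(['{"status": 200, "debug": "x"}', "not json"]):
        print(event)
```

Each stage only sees events from the stage before it, which is why filters can be added or removed without touching inputs or outputs.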

Logstash Forwarder

● Golang program that sits next to log files, lumberjack protocol

● Forwards logs from a file to a logstash server

● Removes the need for a buffer (such as redis, or a queue) for logs pending ingestion to logstash.

● Docker container with volume mounted /var/log. Configuration stored in Consul.

● Application containers with volume mounted /var/log to /var/log/docker/<container>/application.log

Logstash pool with HAProxy

4 x logstash machines, 8 cores, 16 GB RAM

7 x logstash processes per machine, 5 for application logs, 2 for HTTP client logs.

Fronted by HAProxy for both lumberjack protocol as well as HTTP protocol.

Easily scalable by adding more machines and spinning up more logstash processes.

Application Service Container 1

Application Service Container 2

Logstash-Forwarder Container

Mounted /var/log to /var/log/docker/ on host

Elasticsearch Hardware

12 core, 64GB RAM with RAID 0 - 2 x 3TB 7200rpm disks.

20 nodes, 20 shards, 3 replicas (with 1 primary).

~300GB of logs per day x 4 copies (3 replicas + 1 primary) = ~1.2TB per day, so 120TB holds ~3 months of data.

Average 6k-8k logs per second, peak 25k logs per second.

https://www.elastic.co/guide/en/elasticsearch/guide/current/hardware.html
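The storage figures on this slide can be checked with a few lines of arithmetic. The sketch below assumes decimal units (1TB = 1000GB) and ignores filesystem overhead, index overhead and free-space headroom, so the result is an upper bound. The function name is illustrative.

```python
def retention_days(nodes, disks_per_node, disk_tb, daily_gb, copies):
    """Days of logs that fit on the cluster's raw disk capacity."""
    raw_gb = nodes * disks_per_node * disk_tb * 1000  # total raw capacity
    stored_per_day_gb = daily_gb * copies             # primary + replicas
    return raw_gb / stored_per_day_gb


if __name__ == "__main__":
    # 20 nodes x 2 x 3TB disks = 120TB raw; 300GB/day x 4 copies = 1.2TB/day
    days = retention_days(nodes=20, disks_per_node=2, disk_tb=3,
                          daily_gb=300, copies=4)
    print(days)  # 100.0, i.e. roughly 3 months
```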

Elasticsearch Hardware

● < 30.5 GB Heap - Java compressed pointers below 30.5GB heap

● Sweet spot - 64GB of RAM with half available for Lucene file buffers.

● SSD or RAID 0 (or multiple path directories similar to RAID 0).

● If SSD then set I/O scheduler to deadline instead of cfq.

● RAID0 - no need to worry about disks failing as machines can easily be replaced due to multiple copies of data.

● Disable swap.
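The two memory rules above (give about half of RAM to the heap, stay under the compressed-pointer threshold) combine into a small helper. This is a sketch: the 30.0GB default cap is an assumption, picked as a round number below the 30.5GB figure on the slide, and the function name is illustrative.

```python
def split_memory_gb(ram_gb, heap_cap_gb=30.0):
    """Return (heap_gb, file_cache_gb) for an Elasticsearch node.

    The heap gets half of RAM but never more than heap_cap_gb;
    whatever is left stays with the OS for Lucene file buffers.
    """
    heap_gb = min(ram_gb / 2.0, heap_cap_gb)
    return heap_gb, ram_gb - heap_gb


if __name__ == "__main__":
    print(split_memory_gb(64))  # (30.0, 34.0)
    print(split_memory_gb(16))  # (8.0, 8.0)
```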

Hardware Tuning

Elasticsearch Configuration

● 20 days of indexes open based on available memory, rest closed - open on demand

● Field data - cache used while sorting and aggregating data.

● Circuit breaker - cancels requests which require large memory, to prevent OOM. Clear the cache via http://elasticsearch:9200/_cache/clear if field data is very close to the memory limit.

● Shards >= Number of nodes

● Lucene forceMerge - minor performance improvements for older indexes (https://www.elastic.co/guide/en/elasticsearch/client/curator/current/optimize.html)

And also...

Prevent a split brain situation to avoid losing data - set the minimum number of master-eligible nodes (discovery.zen.minimum_master_nodes) to a quorum of (n/2 + 1), e.g. 2 out of 3 master-eligible nodes

Set a higher ulimit for the elasticsearch process

Daily cronjob which deletes data older than 90 days, closes indices older than 20 days, and optimizes (forceMerge) indices older than 2 days
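The daily cronjob's retention policy can be sketched as a pure function that sorts index names into actions by age. It assumes daily indices named `logstash-YYYY.MM.DD` (Logstash's default naming) and only plans the work. It does not call Elasticsearch or Curator, and the function name and one-action-per-index rule are assumptions for illustration.

```python
from datetime import date, datetime


def plan_actions(index_names, today, delete_after=90, close_after=20,
                 merge_after=2, prefix="logstash-"):
    """Group daily indices by the action the cronjob would take on them."""
    plan = {"delete": [], "close": [], "forcemerge": []}
    for name in index_names:
        if not name.startswith(prefix):
            continue  # not a daily log index
        try:
            day = datetime.strptime(name[len(prefix):], "%Y.%m.%d").date()
        except ValueError:
            continue  # suffix is not a date, leave the index alone
        age_days = (today - day).days
        # Most drastic action wins, so each index gets at most one action.
        if age_days > delete_after:
            plan["delete"].append(name)
        elif age_days > close_after:
            plan["close"].append(name)
        elif age_days > merge_after:
            plan["forcemerge"].append(name)
    return plan


if __name__ == "__main__":
    names = ["logstash-2015.10.09", "logstash-2015.10.05",
             "logstash-2015.09.10", "logstash-2015.06.01", ".kibana"]
    print(plan_actions(names, today=date(2015, 10, 10)))
```

Keeping the planning step separate from the API calls makes the policy easy to dry-run before anything is deleted.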

Monitoring

Marvel - Official plugin from Elasticsearch

KOPF - Index management plugin

CAT APIs - REST APIs to view cluster information

Curator - Data management

Thanks

email: angad@viki.com

twitter: @angadsg
