Monitoring Docker Containers and Dockerized Applications


Anantha Padmanabhan CB (@cbananth), Rahul Krishna Upadhyaya (@rakrup_), Satya Sanjivani Routray (@er_sanj007), Meenakshi Sundaram Lakshmanan (@lxmeenakshi1)

Cloud and Network Solutions, Cisco Systems Inc.

Agenda

• Introduction
• Monitoring Containers - Challenges
• Approach
• Design
• Demo
• Q&A

Containers – Introduction

• Containers virtualize the OS, just as hypervisors virtualize the hardware.

• Containers allow any payload to be encapsulated as a lightweight, portable, self-sufficient container that can be manipulated using standard operations and run consistently on any hardware platform.

• A container wraps a piece of software in a complete filesystem that contains everything it needs to run: code, runtime, system tools, libraries, etc. Containers share the OS kernel (and binaries/libraries where needed); otherwise each operates in a self-contained environment.


• Docker and LXC are among the most popular container implementations today.

• Containers can run on any Linux server: VMs, physical hosts, OpenStack, etc.

• Containers can be moved between machines without modification.

• Containers can work together.

Monitoring Containers - Challenges

• Traditionally, monitoring brings to mind monitoring the infrastructure (servers, networks) and the apps that run on it.

• In the world of containers, monitoring the infrastructure alone or the application alone may not provide the full picture.

• Complete monitoring = App + software-defined components/devices + Infra

• Challenges with monitoring tools:
– There is a vast set of monitoring tools, each collecting various statistics
– Each tool gives a different set of attributes in a different format
– Data collection tools may overload the container itself, making the statistics inaccurate
– Metrics are hard to differentiate for containers that are related and share resources
– Above all, a lot of computation is required to draw meaningful inferences from all the collected data


• Categorizing container utilization and statistics for multi-tenant applications is complex

• Different applications produce logs in different formats

• Identifying the failure points of applications is hard

• Analyzing the interconnectivity between applications in different containers, hosts or regions is hard

• Assessing the response time of a web-based cloud application is complicated, since many other parameters (region, internet speed) can influence it

• Clustered applications might require monitoring all the instances to identify the faulty node

Monitoring Containers - Approach

• Apps are embedded within containers, which in turn run within a VM or physical host.

• Containerization requires monitoring at each of these levels in order to collect complete statistics.

• Containers can be linked, so the ability to monitor and make sense of statistics from linked containers becomes critical.

• Ability to intelligently correlate collected data in the context of the App/Container/Host relationship.

• Abstraction of monitoring methods and data to enable integration with any monitoring tool of choice.

• Ability to do proactive, reactive and adaptive monitoring.

Monitoring at different levels

• Host

• Container

• Application

• Cluster
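A small nested data model can make the App/Container/Host/Cluster relationship concrete. The sketch below is hypothetical: the class names, fields and the `context` helper are not from the original design. It shows how one metric for every instance of a clustered app can be lined up with the matching container and host values. That is the kind of correlation needed to spot a faulty node.

```python
from dataclasses import dataclass, field


@dataclass
class App:
    name: str
    metrics: dict[str, float] = field(default_factory=dict)  # e.g. {"cpu_percent": 12.5}


@dataclass
class Container:
    container_id: str
    apps: list[App] = field(default_factory=list)  # apps running inside this container
    metrics: dict[str, float] = field(default_factory=dict)


@dataclass
class Host:
    hostname: str
    containers: list[Container] = field(default_factory=list)
    metrics: dict[str, float] = field(default_factory=dict)


@dataclass
class Cluster:
    name: str
    hosts: list[Host] = field(default_factory=list)

    def locate_app(self, app_name):
        """Yield (host, container, app) for every instance of app_name."""
        for host in self.hosts:
            for container in host.containers:
                for app in container.apps:
                    if app.name == app_name:
                        yield host, container, app

    def context(self, app_name, metric):
        """Line up one metric across the App -> Container -> Host chain, per instance."""
        return [
            {
                "host": host.hostname,
                "container": container.container_id,
                "app_value": app.metrics.get(metric),
                "container_value": container.metrics.get(metric),
                "host_value": host.metrics.get(metric),
            }
            for host, container, app in self.locate_app(app_name)
        ]
```

Suppose `context("web", "cpu_percent")` shows one instance with a high app value on an otherwise quiet host. In that case the fault is more likely inside that container than on the host.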

What to Monitor?

• The major parameters that can be monitored are:
– CPU: total_usage, per_cpu_usage, system_usage, host_usage, load_average, etc.
– Memory: mem_pgfault, mem_usage, mem_cache, mem_kernel, etc.
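One concrete source for the CPU and memory figures above is the Docker Engine API container stats response (`GET /containers/{id}/stats`). It reports cumulative counters whose names differ slightly from the slide's: `total_usage`, `percpu_usage`, `system_cpu_usage`. The minimal sketch below assumes that decoded JSON response is passed in as a dict. It turns the counters into percentages following the formulas in the Docker API documentation. On cgroup v2 hosts the page-cache field is typically absent, so treat the memory figure there as approximate.

```python
def cpu_percent(stats: dict) -> float:
    """CPU usage of a container from one Docker stats response (100.0 = one full core).

    Uses the delta between the current (cpu_stats) and previous (precpu_stats)
    cumulative counters.
    """
    cpu, pre = stats["cpu_stats"], stats["precpu_stats"]
    cpu_delta = cpu["cpu_usage"]["total_usage"] - pre["cpu_usage"]["total_usage"]
    system_delta = cpu.get("system_cpu_usage", 0) - pre.get("system_cpu_usage", 0)
    if cpu_delta <= 0 or system_delta <= 0:
        return 0.0  # first sample, counter reset, or missing data
    # Prefer online_cpus; fall back to the length of the per-CPU list, then to 1.
    ncpus = cpu.get("online_cpus") or len(cpu["cpu_usage"].get("percpu_usage") or []) or 1
    return cpu_delta / system_delta * ncpus * 100.0


def memory_percent(stats: dict) -> float:
    """Memory usage as a percentage of the container's limit, excluding page cache."""
    mem = stats["memory_stats"]
    cache = mem.get("stats", {}).get("cache", 0)  # cgroup v1 field; usually absent on v2
    used = mem["usage"] - cache
    limit = mem.get("limit", 0)
    return used / limit * 100.0 if limit else 0.0
```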

What to Monitor (continued)

– Disk: total_bytes, bytes_read, bytes_written, bytes_async, bytes_sync, etc.
– Network: rxbytes, rxpackets, rxdropped, rxerrors, txbytes, txerrors, etc.
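The disk and network counters above map fairly directly onto fields of the same Docker stats response. Disk comes from `blkio_stats.io_service_bytes_recursive`, a list of entries with an `op` and a `value`. Network comes from the per-interface `networks` object, whose keys are spelled `rx_bytes`, `tx_errors`, etc. The minimal sketch below assumes that response layout and sums the counters per container. On cgroup v2 hosts typically only read/write entries are reported, so the sync/async/total buckets would stay at zero.

```python
def blkio_totals(stats: dict) -> dict[str, int]:
    """Sum block-I/O byte counters across all devices, keyed by the slide's names."""
    names = {"read": "bytes_read", "write": "bytes_written", "sync": "bytes_sync",
             "async": "bytes_async", "total": "total_bytes"}
    totals = dict.fromkeys(names.values(), 0)
    # The API may return null for this list, hence the `or []`.
    entries = stats.get("blkio_stats", {}).get("io_service_bytes_recursive") or []
    for entry in entries:
        key = names.get(entry.get("op", "").lower())  # "Read" (cgroup v1) or "read" (v2)
        if key:
            totals[key] += entry.get("value", 0)
    return totals


def network_totals(stats: dict) -> dict[str, int]:
    """Sum network counters across all interfaces of a container."""
    fields = ("rx_bytes", "rx_packets", "rx_dropped", "rx_errors",
              "tx_bytes", "tx_packets", "tx_dropped", "tx_errors")
    totals = dict.fromkeys(fields, 0)
    for iface in (stats.get("networks") or {}).values():
        for name in fields:
            totals[name] += iface.get(name, 0)
    return totals
```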

• Intelligently correlate the data collected at the different levels mentioned earlier.

• Enable queries and filters to draw meaningful inferences from the raw data.
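The sketch below illustrates querying and cross-level correlation over raw samples. It assumes samples are stored as flat records with hypothetical keys: `level`, `host`, `id`, `metric`, `value`, `ts`. `query` filters them, and `container_share` correlates host-level and container-level values to show how much of each host's memory use its containers account for. Both function names are illustrative, not part of the original design.

```python
from collections import defaultdict


def query(samples, predicate=None, **equals):
    """Return samples matching every field=value filter and, optionally, a predicate."""
    return [
        s for s in samples
        if all(s.get(k) == v for k, v in equals.items())
        and (predicate is None or predicate(s))
    ]


def container_share(samples, metric="mem_usage"):
    """Fraction of each host's value (per timestamp) accounted for by its containers.

    Host and container values must use the same unit (e.g. bytes of memory)
    for the ratio to be meaningful.
    """
    host_value = {}
    container_total = defaultdict(float)
    for s in query(samples, metric=metric):
        key = (s["host"], s["ts"])
        if s["level"] == "host":
            host_value[key] = s["value"]
        elif s["level"] == "container":
            container_total[key] += s["value"]
    return {key: container_total[key] / v for key, v in host_value.items() if v}
```

A low share on a busy host points at load outside the containers, such as host processes. Neither host-only nor container-only monitoring would reveal this on its own.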

How to Monitor?

Monitoring Strategy

• Proactive: prevent failure situations.

• Reactive: raise events and alerts when failures occur.

• Adaptive: automatically monitor new components and model their statistics.

What to use when, and how? Different levels need different types of monitoring strategy.

Design Objectives

• Do not overload the Docker daemon.
• Different monitoring approaches at different levels.
• Modular, driver-based approach for all possible components.
• Run multiple agent drivers simultaneously.
• Added considerations for linked/clustered containers.
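The modular, driver-based objective, with several drivers running at once, can be sketched as a registry of driver classes selected by name. The selected drivers run in a thread pool. Everything here is hypothetical: `Driver`, `register`, `DRIVERS` and `run_drivers` are illustrative names, and the load-average driver uses `os.getloadavg()`, which exists on Unix only.

```python
import os
from abc import ABC, abstractmethod
from concurrent.futures import ThreadPoolExecutor

DRIVERS = {}  # driver name -> Driver subclass (hypothetical registry)


def register(name):
    """Class decorator that makes a driver selectable by name from configuration."""
    def wrap(cls):
        DRIVERS[name] = cls
        return cls
    return wrap


class Driver(ABC):
    level = "host"  # "host", "container" or "app"

    @abstractmethod
    def collect(self):
        """Return a list of raw samples, e.g. [{"metric": ..., "value": ...}]."""


@register("loadavg")
class LoadAverageDriver(Driver):
    """Host-level driver built on os.getloadavg() (Unix only)."""
    level = "host"

    def collect(self):
        one, five, fifteen = os.getloadavg()
        return [
            {"metric": "load_average_1m", "value": one},
            {"metric": "load_average_5m", "value": five},
            {"metric": "load_average_15m", "value": fifteen},
        ]


def run_drivers(names, max_workers=4):
    """Run the configured drivers in parallel and tag each sample with its source."""
    drivers = {name: DRIVERS[name]() for name in names}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {name: pool.submit(d.collect) for name, d in drivers.items()}
    samples = []
    for name, future in futures.items():
        try:
            batch = future.result()
        except Exception:
            continue  # a failing driver should not stop the others
        for sample in batch:
            samples.append({**sample, "driver": name, "level": drivers[name].level})
    return samples
```

The load-average driver reads host counters directly and never calls the Docker daemon. That is one way to honour the first design objective.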

High Level Component Design

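[Figure: High-level component design. The Monitoring Controller groups the Engine, the REST API (used by the CLI, UI or any REST client), Data Storage and the IQ module. Agents on each Host communicate with the Engine through a Queue; each Host runs several containers (C).]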

Functions

[Figure: Monitoring controller functions. Collect data/logs (host, container, apps) → Model & process data → Store → Analyze → Present results and predictions/suggestions.]

Agent

[Figure: Agent internals. On each host, agent drivers collect logs & stats from the host, containers and apps and dump them to a queue, which carries them to the engine.]


• One agent per host.
• The agent monitors the host, the containers on that host and the applications in those containers.
• The agent sends to and receives from the engine asynchronously, using queues.
• Driver-based log/stats collection can be done for hosts, applications and containers.
• Drivers based on the user's tool of choice for stats/log collection can be used for each host, application or container, or for several of them.
• More than one driver can run in parallel to collect an even more diverse set of parameters.
• The agent checks the sanity of the collected data so that it conforms to the engine's data model.
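The agent's sanity check and asynchronous hand-off to the engine can be sketched as follows. The agent validates each raw sample against a data model, stamps it with the host name and puts it on a queue without blocking. The engine drains that queue on its own schedule. The data model (`REQUIRED_FIELDS`, `LEVELS`) and all names are hypothetical. An in-process `queue.Queue` stands in for whatever message broker a real deployment would use between hosts.

```python
import queue
import time

# Hypothetical engine data model: required fields and their types.
REQUIRED_FIELDS = {"host": str, "level": str, "metric": str, "value": (int, float)}
LEVELS = {"host", "container", "app"}


def sanitize(sample):
    """Return a normalized copy of sample, or None if it does not fit the data model."""
    for key, expected in REQUIRED_FIELDS.items():
        if not isinstance(sample.get(key), expected):
            return None
    # bool is a subclass of int, so reject it explicitly.
    if isinstance(sample["value"], bool) or sample["level"] not in LEVELS:
        return None
    clean = dict(sample)
    clean["value"] = float(sample["value"])
    clean.setdefault("ts", time.time())  # timestamp samples that arrive without one
    return clean


class Agent:
    """One agent per host: stamps samples with the host name and queues valid ones."""

    def __init__(self, hostname, outbox):
        self.hostname = hostname
        self.outbox = outbox  # queue shared with the engine
        self.rejected = 0

    def publish(self, raw_samples):
        for raw in raw_samples:
            clean = sanitize({**raw, "host": self.hostname})
            if clean is None:
                self.rejected += 1
            else:
                # Does not block; raises queue.Full only if a bounded queue is full.
                self.outbox.put_nowait(clean)


def drain(inbox, max_items=1000):
    """Engine side: take whatever is queued, up to max_items, without blocking."""
    items = []
    while len(items) < max_items:
        try:
            items.append(inbox.get_nowait())
        except queue.Empty:
            break
    return items
```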

Monitoring controller

• Logical grouping of components
• REST API that can be accessed via the CLI, UI or any other REST client
• Driver-based storage module that can use any columnar database
• IQ module that provides intelligent predictions
• Engine
– Aggregates stats & logs from different Docker hosts
– Integrates with identity providers (like Keystone) to support multi-tenant deployments
– Receives data from agents via asynchronous queues
– Groups & processes data based on use cases

IQ Module

• Logs & stats that are collected and stored make up a lot of unstructured data.
• Meaningful inferences drawn from this data are of more value to the user.
• Analytic tools like pandas and scipy are planned to be used to derive inferences.
• Error predictions, usage/load patterns and capacity planning can be direct outputs.
• Suggestions regarding the infrastructure would also be an output of this module.
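Two of the IQ outputs named above are error prediction and usage/load patterns. The sketch below shows simple stand-ins for them: z-score outlier detection and an hour-of-day usage profile. It uses only the Python standard library so it runs anywhere; pandas or scipy equivalents could replace it, as the slide plans. These are generic techniques chosen for illustration, not the presenters' IQ algorithms, and the function names are hypothetical.

```python
import statistics
from collections import defaultdict


def zscore_anomalies(values, threshold=3.0):
    """Indices of samples more than `threshold` population std-devs from the mean."""
    if len(values) < 2:
        return []
    mean = statistics.fmean(values)
    stdev = statistics.pstdev(values)
    if stdev == 0:
        return []  # perfectly flat series: nothing stands out
    return [i for i, v in enumerate(values) if abs(v - mean) / stdev > threshold]


def hourly_profile(samples):
    """Usage pattern: mean value per hour of day (UTC) from (epoch_seconds, value) pairs."""
    buckets = defaultdict(list)
    for ts, value in samples:
        buckets[int(ts // 3600) % 24].append(value)
    return {hour: statistics.fmean(vals) for hour, vals in sorted(buckets.items())}
```

With the population standard deviation, a single outlier in n samples can reach a z-score of at most sqrt(n - 1). A threshold of 3 therefore needs at least 11 samples before anything can be flagged.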

[Demo screenshots: agent driver configuration; containers monitored; new container spawned; new container adaptively monitored; sample parameters; sample graphs.]

Thank You.
