Monitoring Docker Containers & Dockerized Applications
Anantha Padmanabhan CB (@cbananth)
Rahul Krishna Upadhyaya (@rakrup_)
Satya Sanjivani Routray (@er_sanj007)
Meenakshi Sundaram Lakshmanan (@lxmeenakshi1)
Cloud and Network Solutions
Cisco Systems Inc.
Agenda
• Introduction
• Monitoring Containers - Challenges
• Approach
• Design
• Demo
• Q&A
Containers – Introduction
• Containers virtualize the OS, just as hypervisors virtualize the hardware.
• Containers enable any payload to be encapsulated as a lightweight, portable, self-sufficient unit that can be manipulated using standard operations and run consistently on any hardware platform.
• A container wraps a piece of software in a complete filesystem that contains everything it needs to run: code, runtime, system tools, libraries, etc. Containers share the OS kernel, and bins/libs where needed; otherwise each operates in a self-contained environment.
Containers – Introduction
• Docker and LXC are among the most popular container implementations today.
• Containers can run on any Linux server: VMs, physical hosts, OpenStack, etc.
• Containers can move between machines without any modification.
• Containers can work together.
Monitoring Containers - Challenges
• Traditionally, monitoring brings to mind monitoring the infrastructure (servers, networks) and monitoring the apps that run on it.
• In the world of containers, monitoring the infrastructure alone or the application alone may not provide the full picture.
• Complete Monitoring = (App + software-defined components/devices + Infra)
• Challenges with the monitoring tools:
– A vast set of monitoring tools is needed to collect the various statistics
– Each tool reports a different set of attributes in a different format
– Data collection tools may overload the container itself, making the statistics inaccurate
– Differentiating metrics for containers that are related and share resources
– Above all, a lot of computation is required to draw meaningful inferences from all the data that is collected
Monitoring Containers - Challenges
• Categorizing container utilization and statistics for multitenant applications is complex
• Different applications produce logs in different formats
• Identifying failure points of applications
• Analyzing the interconnectivity between applications in different containers, hosts or regions
• Assessing the response time of an application is complicated in a web-based cloud application, since many other parameters (region, internet speed) can influence response time
• Clustered applications might require monitoring all the instances to identify the faulty node
Monitoring Containers - Approach
• Apps are embedded within containers, which in turn run within a VM or physical host
• Containerization requires monitoring at these different levels in order to collect complete statistics
• Containers can be linked, so the ability to monitor and make sense of statistics from linked containers becomes critical
• Ability to intelligently correlate collected data in the context of the App-Container-Host relationship
• Abstraction of monitoring methods and data in order to enable integration with any monitoring tool of choice.
• Ability to do proactive, reactive and adaptive monitoring.
Monitoring at different levels
• Host
• Container
• Application
• Cluster
What to Monitor?
• The major sets of parameters that can be monitored are:
– CPU
  • total_usage
  • per_cpu_usage
  • system_usage
  • host_usage
  • load_average, etc.
– Memory
  • mem_pgfault
  • mem_usage
  • mem_cache
  • mem_kernel, etc.
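As a rough illustration of how raw counters such as total_usage and system_usage become a usable figure, the sketch below derives a CPU utilization percentage from two consecutive samples. The dict layout and the `ncpus` argument are assumptions made for this example; they are not the schema of any particular collection tool.

```python
def cpu_percent(prev, curr, ncpus=1):
    """Return container CPU utilization (%) between two samples.

    Each sample is a dict of cumulative counters (assumed layout):
      total_usage  - CPU time consumed by the container
      system_usage - CPU time consumed by the whole host
    """
    cpu_delta = curr["total_usage"] - prev["total_usage"]
    system_delta = curr["system_usage"] - prev["system_usage"]
    if system_delta <= 0 or cpu_delta < 0:
        # Counters were reset or no time elapsed: report no usage.
        return 0.0
    # Share of host CPU time used by the container, scaled by CPU count.
    return cpu_delta / system_delta * ncpus * 100.0
```

Working on deltas rather than absolute values matters because the counters are cumulative: a single sample says little about current load.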
What to Monitor?
– Disk
  • total_bytes
  • bytes_read
  • bytes_written
  • bytes_async
  • bytes_sync, etc.
– Network
  • rxbytes
  • rxpackets
  • rxdropped
  • rxerrors
  • txbytes
  • txerrors, etc.
• Intelligently correlate the collected data that is monitored at different levels mentioned earlier.
• Enable queries and filters to make meaningful inferences from the raw data
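One way to picture correlation and querying across levels is the minimal sketch below: every sample is tagged with its host, container and app, so the same raw data can be filtered or rolled up at any level. The sample records, field names and both helper functions are hypothetical and exist only for illustration.

```python
from collections import defaultdict

# Hypothetical samples as an agent might report them.
SAMPLES = [
    {"host": "host-1", "container": "web-1", "app": "nginx", "metric": "mem_usage", "value": 120},
    {"host": "host-1", "container": "db-1", "app": "mysql", "metric": "mem_usage", "value": 300},
    {"host": "host-2", "container": "web-2", "app": "nginx", "metric": "mem_usage", "value": 100},
    {"host": "host-1", "container": "web-1", "app": "nginx", "metric": "rxbytes", "value": 5000},
]


def query(samples, **filters):
    """Return the samples whose fields equal every keyword filter."""
    return [s for s in samples if all(s.get(k) == v for k, v in filters.items())]


def rollup(samples, metric, level):
    """Sum one metric per 'host', 'container' or 'app'."""
    totals = defaultdict(float)
    for s in query(samples, metric=metric):
        totals[s[level]] += s["value"]
    return dict(totals)
```

For example, `rollup(SAMPLES, "mem_usage", "app")` groups memory by application across hosts, while `query(SAMPLES, host="host-1")` narrows the raw data to one machine.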
How to Monitor?
Monitoring Strategy
• Proactive: prevent failure situations
• Reactive: raise events and alerts when failures occur
• Adaptive: automatically monitor new components and model statistics
What to use when? How?
Different levels need different types of monitoring strategy.
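The three strategies can be sketched as a single check applied to each incoming sample. This is a simplified illustration, not the presenters' implementation: the sample fields (`running`, `mem_usage`, `mem_limit`) and the 80% warning threshold are assumptions chosen for the example.

```python
def evaluate(sample, known, warn_at=0.8):
    """Classify one container sample under the three strategies.

    sample: dict with 'container', 'running', 'mem_usage', 'mem_limit'
            (assumed fields)
    known:  set of container names already monitored; updated in place
    Returns a list of (strategy, message) tuples.
    """
    events = []
    name = sample["container"]
    if name not in known:
        # Adaptive: start tracking components that were not seen before.
        known.add(name)
        events.append(("adaptive", f"now monitoring {name}"))
    if not sample["running"]:
        # Reactive: the failure has already happened, so raise an alert.
        events.append(("reactive", f"{name} is down"))
    elif sample["mem_usage"] >= warn_at * sample["mem_limit"]:
        # Proactive: warn before the limit is actually reached.
        events.append(("proactive", f"{name} is nearing its memory limit"))
    return events
```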
Design Objectives
• Do not overload the Docker daemon.
• Use different monitoring approaches at different levels.
• Modular, driver-based approach for all possible components.
• Run multiple agent drivers simultaneously.
• Added considerations for linked/clustered containers.
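A modular, driver-based design could look roughly like the sketch below: every driver implements one small interface and registers itself by name, so collection tools can be swapped without touching the agent. The `Driver` class, the `register` decorator and both example drivers are hypothetical, and the returned values are placeholders rather than real measurements.

```python
from abc import ABC, abstractmethod

DRIVERS = {}  # registry: driver name -> driver class


def register(name):
    """Class decorator that adds a driver to the registry."""
    def wrap(cls):
        DRIVERS[name] = cls
        return cls
    return wrap


class Driver(ABC):
    """Common interface every collection driver implements."""
    level = "host"  # one of: host, container, application

    @abstractmethod
    def collect(self):
        """Return a dict of metric name -> value."""


@register("static-host")
class StaticHostDriver(Driver):
    level = "host"

    def collect(self):
        return {"load_average": 0.5}  # placeholder value


@register("static-container")
class StaticContainerDriver(Driver):
    level = "container"

    def collect(self):
        return {"mem_usage": 1024}  # placeholder value


def collect_all(names):
    """Instantiate the named drivers and gather their stats."""
    return {name: DRIVERS[name]().collect() for name in names}
```

Because the agent only depends on `collect()`, a driver that reads statistics without going through the Docker daemon can be dropped in where daemon load is a concern.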
High Level Component Design
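[Diagram: the Monitoring Controller groups the Engine, Data Storage, IQ and REST API components; CLI, UI and REST clients connect through the API. An Agent on each Host, where each Host runs several containers (C), exchanges data with the Engine through a Queue.]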
Functions
[Diagram: functions applied to Host, Container and Apps data: Collect Data/Logs, Store, Model & Process Data, Analyze, Present Result, Predictions/Suggestions.]
Agent
[Diagram: on each Host, the Agent's Drivers gather logs & stats from the Host, Containers and Apps and dump them to a Queue, which delivers them to the Engine.]
Agent
• One agent per host
• The agent monitors the host, the containers on that host, and the applications in those containers
• The agent sends to and receives from the engine asynchronously, using queues
• Driver-based log/stats collection can be done for hosts, applications and containers
• Drivers built on the user's tool of choice for stats/log collection can be used for each of, or several of, the hosts/applications/containers
• More than one driver can run in parallel to collect even more diverse parameters
• The agent sanity-checks collected data so that it conforms to the data model in the engine
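The agent behaviour described above can be sketched with the standard library: drivers run in parallel, each result is checked against a data model, and conforming records are put on a queue for the engine. The record fields in `REQUIRED`, the `conforms` check and the `run_agent` function are assumptions for illustration; an in-process `queue.Queue` stands in for whatever message queue a real deployment would use.

```python
import queue
from concurrent.futures import ThreadPoolExecutor

# Assumed data-model fields that every record must carry.
REQUIRED = ("host", "level", "source", "stats")


def conforms(record):
    """Sanity check: all fields present and stats is a dict."""
    return all(k in record for k in REQUIRED) and isinstance(record["stats"], dict)


def run_agent(host, drivers, out_queue):
    """Run every driver once, in parallel, and enqueue valid records.

    drivers: dict of name -> (level, callable returning a stats dict)
    """
    def run(item):
        name, (level, collect) = item
        return {"host": host, "level": level, "source": name, "stats": collect()}

    with ThreadPoolExecutor(max_workers=len(drivers) or 1) as pool:
        for record in pool.map(run, drivers.items()):
            if conforms(record):
                out_queue.put(record)  # handed to the engine asynchronously
```

A driver that returns malformed data is dropped at the agent, so the engine only ever sees records that match its model.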
Monitoring controller
• Logical grouping of components
• REST API that can be reached via CLI, UI or any other REST client
• Driver-based storage module that uses any columnar database
• IQ module that provides intelligent predictions
• Engine
– Aggregates stats & logs from different Docker hosts
– Integrates with identity providers (like Keystone) to support multitenant deployments
– Communicates with agents via asynchronous queues
– Groups and processes data based on use cases
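The engine's aggregation step might look like the following sketch, which drains queued agent records and sums their stats per tenant and host. The record layout and the `tenant_of` mapping are assumptions; in a real deployment the tenant lookup would come from the identity provider rather than a plain dict.

```python
import queue
from collections import defaultdict


def drain(q):
    """Yield every record currently waiting on the queue."""
    while True:
        try:
            yield q.get_nowait()
        except queue.Empty:
            return


def aggregate(records, tenant_of):
    """Sum numeric stats per (tenant, host).

    records:   iterable of dicts with 'host' and 'stats' keys
    tenant_of: dict mapping host name -> tenant id (assumed lookup)
    """
    totals = defaultdict(lambda: defaultdict(float))
    for r in records:
        key = (tenant_of.get(r["host"], "unknown"), r["host"])
        for metric, value in r["stats"].items():
            totals[key][metric] += value
    return {k: dict(v) for k, v in totals.items()}
```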
IQ Module
• The logs & stats collected and stored make up a lot of unstructured data.
• Meaningful inferences drawn from this data would be of greater value to the user.
• Analytic tools like pandas and SciPy are planned to be used to derive inferences.
• Error predictions, usage/load patterns and capacity planning can be direct outputs.
• Suggestions regarding infrastructure would also be an output of this module.
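As a small illustration of the kind of capacity-planning inference the IQ module could produce, the sketch below fits a straight line to equally spaced usage samples and estimates when the trend would reach a limit. It deliberately uses only the Python standard library instead of pandas or SciPy so that it stays self-contained, and both function names are hypothetical.

```python
def fit_line(ys):
    """Least-squares slope and intercept for equally spaced samples."""
    n = len(ys)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_x = (n - 1) / 2
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def samples_until(ys, limit):
    """Estimate how many more sampling intervals until the fitted
    trend reaches `limit`; None if usage is flat or falling."""
    slope, intercept = fit_line(ys)
    if slope <= 0:
        return None
    x_hit = (limit - intercept) / slope
    return max(0.0, x_hit - (len(ys) - 1))
```

A linear fit is the simplest possible model; real usage patterns are often periodic, so this should be read as a starting point rather than a forecast to rely on.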
Agent driver configuration
Containers monitored
New container spawned
Adaptively monitored
Sample parameters
Sample graphs
Thank You.