0
Built-in Operational Visibility and Analytics Designed for Cloud
Canturk Isci
IBM Research, NY
@canturkisci
Boston University, Thu Apr 28, 11:00 AM
CloudSight Research
Vulnerability Advisor
1
Cloud Evolution: Greats and Needs
What is Great:
• Density
• Scale
• Portability
• Repeatability
• Speed
What Needs Work:
• Visibility
• Operational Insight
Utility | Cost | Scale | Automation | Agility | (µ)Services | Operational Intelligence
- Modernization of IT infra and SW delivery
- Complex made simple
- Unprecedented efficiency and TTV
- Lots of shiny toys across IT lifecycle
- Visibility into our environments remains an issue
- Also lots of shiny toys for monitoring & analytics
BUT:
- Still based on traditional IT Principles!
2
Our Work: Built-in Op Visibility & Analytics Designed for Cloud
- Provide unmatched deep, seamless visibility into cloud instances
- Drive operational insights to solve real-world pain points
3
Built-in Operational Visibility & Analytics Designed for Cloud
4
Built-in Operational Visibility & Analytics Designed for Cloud
- Provide unmatched deep, seamless and unified visibility into ALL cloud instances
- Drive operational insights to solve real-world pain points
Agentless System Crawler (ASC)
5
Traditional Monitoring vs. Crawlers
[Figure: With traditional monitoring (top), agents (A) run inside the workload stack of every bare-metal server (BMS), VM, and container; with crawlers (bottom), the same BMS, VM, and container stacks carry no in-guest agents.]
6
Some Data Points
From an employee: “This is the BES client agent. I don't know what it does, but it's always at 50%. I would be the first customer to remove this evil thing from my machines.”
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
3515 root 20 0 781m 21m 6272 R 53.8 0.3 51:28.92 BESClient
C. Colohan. The Scariest Outage Ever. CMU SDI Seminar Series, 2012. http://pdl.cmu.edu/SDI/2012/083012b.html
Amazon. Summary of Oct. 22 '12 AWS Service Event in US-East Region. http://aws.amazon.com/message/680342/
7
“Users do not have to do anything to get this visibility. It is already there by default.”
[Figure: A container cloud of Docker hosts, each running app containers. State events, metrics, and logs flow over a Metrics & Logs Bus into a Multitenant Index served by the Logmet service, with tenancy info supplied by provisioning.]
Current State:
• Built into every compute node, in all geos
• Enabled by default for all users in all production environments
• O(10K) metrics/s & logs/s
Seamless: Built-in Monitoring & Logging in Bluemix Containers
8
[Figure: App containers in the container cloud; the monitoring “magic” happens outside them.]
Happy User: “Cool!” Effortless, painless visibility in the user world
Seamless: Built-in Monitoring & Logging in Bluemix Containers
9
Why Agentless System Crawlers
Key Advantages:
• Monitoring built into the platform, not into end-user systems
• No complexity for the end user (they do nothing; all they see is the service)
• No agents/credentials/access (nothing built into the user world)
• Works out of the box
• Makes data consumable* (lower barrier to data collection and analytics)
• Better security* for the end user (no attack surface in the user world)
• Better availability* of monitoring (from birth to death; can inspect even a defunct guest)
• Guest-agnostic (build for the platform, not for each user distro)
• Decoupled* from user context (no overhead/side-effect concerns)
• Monitoring done right for the processes of the cloud OS
10
Deep Visibility: What We Actually Collect (and Annotate)
From Container/VM:
- OS Info
- Processes
- Disk Info
- Metrics
- Network Info
- Packages
- Files
- Config Info
From Docker Runtime:
- Docker metadata (docker inspect)
- CPU metrics (/cgroup/cpuacct/)
- Memory metrics (/cgroup/memory)
- Docker history
Annotators: Config Annotator, Vulnerability Annotator, Compliance Annotator, Password Annotator, SW Annotator, Licence Annotator
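The CPU and memory metrics listed under Docker Runtime come from cgroup accounting files that the host can read without touching the container. The sketch below is a minimal illustration, not the crawler's code: it assumes the cgroup v1 layout, where `cpuacct.usage` holds cumulative CPU time in nanoseconds and `memory.usage_in_bytes` holds current memory usage. The per-container directory (often something like `/sys/fs/cgroup/cpuacct/docker/<container-id>`) varies by host, and all function names are hypothetical.

```python
import os
import time


def _read_int(path: str) -> int:
    with open(path) as f:
        return int(f.read().strip())


def read_cpuacct_usage_ns(cpuacct_dir: str) -> int:
    """Cumulative CPU time (ns) charged to a cgroup v1 cpuacct group."""
    return _read_int(os.path.join(cpuacct_dir, "cpuacct.usage"))


def read_memory_usage_bytes(memory_dir: str) -> int:
    """Current memory usage (bytes) of a cgroup v1 memory group."""
    return _read_int(os.path.join(memory_dir, "memory.usage_in_bytes"))


def cpu_percent_from_samples(usage0_ns: int, usage1_ns: int, elapsed_s: float) -> float:
    """CPU utilization over an interval, as a percentage of one core.

    Values above 100 mean the container used more than one core on average.
    """
    if elapsed_s <= 0:
        raise ValueError("elapsed_s must be positive")
    return (usage1_ns - usage0_ns) / (elapsed_s * 1e9) * 100.0


def sample_cpu_percent(cpuacct_dir: str, interval_s: float = 1.0) -> float:
    """Read the cumulative counter twice, `interval_s` apart, and convert the delta."""
    t0, u0 = time.monotonic(), read_cpuacct_usage_ns(cpuacct_dir)
    time.sleep(interval_s)
    t1, u1 = time.monotonic(), read_cpuacct_usage_ns(cpuacct_dir)
    return cpu_percent_from_samples(u0, u1, t1 - t0)
```

Because the counter is cumulative, a single read says little; utilization comes from the difference between two samples divided by wall-clock time (0.5 s of CPU over 1 s is 50% of one core).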
11
Deep Visibility → Operational Insights/Analytics → Solve Real Problems
[Figure: Crawled data flows over a data bus through annotators into an index (data), which feeds the analytics services: Vuln. & Compl. Analysis, Config Analytics (SecConfig), Cloud Time Machine (Audit/PD), Pipeline Service (DevOps), and Remediation Service.]
* All analytics services work from the same data & pipeline!
Today’s Special: Vulnerability Advisor
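A vulnerability annotator of the kind listed above can be sketched as a pure function over a crawled package list. This is a minimal illustration, not Vulnerability Advisor's implementation: the `Advisory` records, advisory IDs, and package names are hypothetical, and the version comparison is deliberately naive (real dpkg/rpm ordering also handles epochs, `~`, and release suffixes).

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Advisory:
    """Hypothetical security notice: `package` is fixed in `fixed_version`."""
    advisory_id: str
    package: str
    fixed_version: str


def version_key(version: str) -> tuple:
    """Naive sortable key: digit runs compare as ints, letter runs as text."""
    return tuple((0, int(tok)) if tok.isdigit() else (1, tok)
                 for tok in re.findall(r"\d+|[A-Za-z]+", version))


def annotate_vulnerabilities(packages: dict, advisories: list) -> list:
    """One finding per installed package older than an advisory's fixed version."""
    findings = []
    for adv in advisories:
        installed = packages.get(adv.package)
        if installed is not None and version_key(installed) < version_key(adv.fixed_version):
            findings.append({"advisory": adv.advisory_id, "package": adv.package,
                             "installed": installed, "fixed_in": adv.fixed_version})
    return findings
```

A plain string comparison would get this wrong: `"1.2.3" < "1.2.10"` is `False` as text, which is why the sketch compares numeric runs as integers.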
12
Crawler: How it Works for VMs
• Leverage VM Introspection (VMI) techniques to access VM Mem and Disk state
(We built a bunch of our own optimizations that make this very efficient and practical)
• Can even remote both memory and disk access (decoupling everything from the VM and the host)
• Almost no new dependencies on the host
• Currently supports 1000+ distro kernels
[Figure: The hypervisor exposes a memory view and a disk view of each VM. Crawl logic, backed by a knowledge base (KB), reads them through the Memory Crawl API and the Disk Crawl API and produces frames, i.e. structured views of VM state ({…}), consumed by cloud analytics apps.]
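VMI works by interpreting raw guest memory with knowledge of the guest kernel's data-structure layout, which is why supporting many distro kernels matters: each one can place fields at different offsets. The toy sketch below walks a process list out of a memory image using a per-kernel offset profile. Everything in it is simplified and hypothetical (the record layout, the `KernelProfile` fields, and the use of plain image offsets instead of guest virtual addresses); a real Linux `task_struct` walk must also translate virtual addresses through the guest page tables, which is omitted here.

```python
import struct
from dataclasses import dataclass


@dataclass(frozen=True)
class KernelProfile:
    """Hypothetical byte offsets of fields in a simplified task record."""
    pid_off: int
    comm_off: int
    next_off: int
    comm_len: int = 16


def walk_tasks(mem: bytes, head: int, prof: KernelProfile, limit: int = 4096):
    """Follow a circular list of task records starting at image offset `head`.

    `limit` guards against corrupted lists that never loop back to `head`.
    """
    tasks, addr = [], head
    for _ in range(limit):
        (pid,) = struct.unpack_from("<i", mem, addr + prof.pid_off)
        raw = mem[addr + prof.comm_off: addr + prof.comm_off + prof.comm_len]
        comm = raw.split(b"\0", 1)[0].decode("ascii", "replace")
        tasks.append((pid, comm))
        (addr,) = struct.unpack_from("<Q", mem, addr + prof.next_off)
        if addr == head:
            break
    return tasks


def build_image(tasks, prof: KernelProfile, rec_size: int = 64) -> bytes:
    """Lay out (pid, name) records as a fake memory image for experimentation."""
    mem = bytearray(rec_size * len(tasks))
    for i, (pid, comm) in enumerate(tasks):
        base = i * rec_size
        struct.pack_into("<i", mem, base + prof.pid_off, pid)
        name = comm.encode("ascii")[: prof.comm_len - 1]  # leave room for a NUL
        mem[base + prof.comm_off: base + prof.comm_off + len(name)] = name
        struct.pack_into("<Q", mem, base + prof.next_off, ((i + 1) % len(tasks)) * rec_size)
    return bytes(mem)
```

In this toy, parsing an image with the wrong profile silently returns the wrong processes, which illustrates why per-kernel layout knowledge is needed.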
13
Crawler: How it Works for Containers
• Leverage Docker APIs for base container information
• Exploit container abstractions (namespace mapping and cgroups) for deeper insight
• Provide deep state info at scale with no visible overheads to end user
1) Get visibility into the container world by namespace mapping
2) Crawl the container (crawler dependencies are still borrowed from the host; no need to inject anything into the container!)
3) Return to the original namespace
4) Push data to the backend index
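Steps 1 to 3 can be sketched with Linux `setns(2)` on a mount namespace. This is a minimal illustration under stated assumptions, not the crawler's code: it needs Linux, root (`CAP_SYS_ADMIN` and `CAP_SYS_CHROOT`), a process that does not share filesystem attributes with other threads (in practice, single-threaded), and Python 3.12+ for `os.setns`. The function name is hypothetical, and `setns` is injectable only so the logic can be exercised without privileges.

```python
import os

CLONE_NEWNS = getattr(os, "CLONE_NEWNS", 0x00020000)  # value from <sched.h>


def crawl_in_mount_namespace(pid: int, crawl_fn, setns=None):
    """Run `crawl_fn()` inside the mount namespace of process `pid`, then return."""
    setns = setns or os.setns
    own_fd = os.open("/proc/self/ns/mnt", os.O_RDONLY)  # remember where we came from
    try:
        target_fd = os.open(f"/proc/{pid}/ns/mnt", os.O_RDONLY)
        try:
            setns(target_fd, CLONE_NEWNS)  # 1) map into the container's view
            try:
                # 2) crawl: modules already imported from the host stay usable, so
                #    nothing is injected into the container (import them beforehand;
                #    a lazy import here would resolve against the container's files)
                return crawl_fn()
            finally:
                setns(own_fd, CLONE_NEWNS)  # 3) return to the original namespace
        finally:
            os.close(target_fd)
    finally:
        os.close(own_fd)
```

Here `pid` would be the container's init process as seen from the host, which `docker inspect --format '{{.State.Pid}}' <container>` reports. Step 4 (pushing the frame to the backend index) is left out.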
14
Crawler: Typical Deployment
• In a typical deployment, the crawler tracks diverse cloud runtimes with parity
• It need not run on the same host; most crawler functions can even be remoted
15
Crawler: Design
• Same crawler across runtimes for unified operational visibility
• Multiple fanouts as use cases grow
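The design point (one crawler for every runtime, with fan-out to multiple consumers) can be illustrated by separating runtime-specific access from runtime-agnostic crawl logic. In this sketch, which is an assumption rather than the project's plugin API, each runtime only supplies a host-visible root path: `/proc/<pid>/root` for a container, or a hypothetical read-only mount point for a VM disk. The same feature crawler then runs unchanged, and its frame is handed to any number of emitters.

```python
import os
import shlex


def read_guest_file(root: str, rel: str) -> str:
    """Read `rel` from a guest filesystem visible on the host at `root`."""
    path = os.path.join(root, rel)
    # An absolute symlink inside the guest (e.g. etc/os-release -> /usr/lib/os-release)
    # must be re-rooted (one level only here), or it would resolve on the host.
    if os.path.islink(path):
        target = os.readlink(path)
        if os.path.isabs(target):
            path = os.path.join(root, target.lstrip("/"))
    with open(path, encoding="utf-8", errors="replace") as f:
        return f.read()


def parse_os_release(text: str) -> dict:
    """Parse KEY=VALUE lines of an os-release file, honoring shell-style quotes."""
    info = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        try:
            parts = shlex.split(value)
        except ValueError:  # unbalanced quotes: fall back to the raw value
            parts = [value.strip("\"'")]
        info[key] = parts[0] if parts else ""
    return info


def container_root(pid: int) -> str:
    """Container runtime: the host sees a process's root filesystem here."""
    return f"/proc/{pid}/root"


def crawl_and_fan_out(root: str, emitters) -> dict:
    """Same crawl logic for any runtime; one frame delivered to every emitter."""
    frame = {"os": parse_os_release(read_guest_file(root, "etc/os-release"))}
    for emit in emitters:
        emit(frame)
    return frame
```

New use cases then add emitters (or feature crawlers) without touching how each runtime is accessed.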
16
Open Innovation <3
April 13
Open Container Introspection Toolkit
for Security Analysis
17
DEMO TIME
This Session
• Agentless System Crawler
• Bluemix Test Drive (live – ldwave): https://developer.ibm.com/bluemix/2015/11/16/built-in-monitoring-and-logging-for-bluemix-containers/
• LogCrawler and JSON Parsing (live – CanoLibUK3)
• Vanilla LogCrawler (20150619_LogCrawlerDemo)
• Crawl even Non-responsive Systems (oopsRconsole2)
• Out-of-Band SIEM (QRadarDemo)
• TopoLog for Topology Discovery (newTopo)
• RTop for Realtime Monitoring (RtopAnnotatedMOV)
• Crawling for Rootkits with RConsole (RConsoleAnnotatedMOV)
Sunday & Wednesday
• Vulnerability Advisor
• Coming soon…
18
Bluemix Test Drive
Just start a Bluemix Container
(https://console.ng.bluemix.net/)
Go to Container Overview
(Metrics show up in a few minutes)
19
… Bluemix Test Drive
Go to Monitoring and Logs
>> Monitoring
20
… Bluemix Test Drive
Go to Monitoring and Logs
>> Logging
21
Back to: Deep Visibility → Operational Insights/Analytics → Solve Real Problems
How can I identify my vulnerable/non-compliant images before they go live?
How can I detect and block systems with password access configurations and weak passwords?
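One concrete shape for the password question is an annotator over two crawled files, `/etc/ssh/sshd_config` and `/etc/shadow`, that flags password-based SSH access and accounts with empty password fields. This is an illustration, not the product's rule set: the finding strings are hypothetical, `Match` blocks and the `Keyword=value` form are not handled, and detecting genuinely weak (non-empty) passwords would additionally require testing hashes against a dictionary, which is not shown.

```python
def parse_sshd_config(text: str) -> dict:
    """Map lower-cased keywords to lower-cased values; the first value wins, as in sshd."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        parts = line.split(None, 1)
        key = parts[0].lower()
        if key == "match":  # conditional overrides start here; out of scope
            break
        settings.setdefault(key, parts[1].strip().lower() if len(parts) > 1 else "")
    return settings


def annotate_password_access(sshd_config: str, shadow: str) -> list:
    """Flag password-based SSH access and accounts that have no password at all."""
    findings = []
    cfg = parse_sshd_config(sshd_config)
    # sshd_config(5) documents "yes" as the default for PasswordAuthentication.
    if cfg.get("passwordauthentication", "yes") == "yes":
        findings.append("sshd: password authentication enabled")
    if cfg.get("permitrootlogin") == "yes":
        findings.append("sshd: root login permitted")
    for entry in shadow.splitlines():
        fields = entry.split(":")
        if len(fields) > 1 and fields[1] == "":  # empty hash: no password required
            findings.append(f"shadow: account '{fields[0]}' has an empty password")
    return findings
```

Note the first-value-wins rule: a later `PasswordAuthentication yes` does not override an earlier `no`, so reading the last occurrence (as many config parsers do) would produce false positives here.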
22
How can I track, query and analyze my configurations in a simple and robust manner for drift/config analytics?
How can I do better resource management and allocation?
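Drift analytics reduces to comparing two crawled snapshots of the same system. A minimal sketch, assuming config frames shaped as `{file_path: {key: value}}` (a hypothetical format, not the crawler's schema), flattens each snapshot and reports added, removed, and changed settings.

```python
def flatten(frame: dict) -> dict:
    """{path: {key: value}} -> {"path:key": value}, so settings compare one-to-one."""
    return {f"{path}:{key}": value for path, kv in frame.items() for key, value in kv.items()}


def config_drift(before: dict, after: dict) -> dict:
    """Diff two config snapshots into added, removed, and changed settings."""
    b, a = flatten(before), flatten(after)
    return {
        "added": {k: a[k] for k in a.keys() - b.keys()},
        "removed": {k: b[k] for k in b.keys() - a.keys()},
        "changed": {k: (b[k], a[k]) for k in a.keys() & b.keys() if a[k] != b[k]},
    }
```

Run against frames of one system taken at different times, the diff answers "what changed since the last known-good state?"; run against two hosts, it highlights configuration divergence.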
Deep Visibility → Operational Insights/Analytics → Solve Real Problems
23
DEMO TIME
This Session
• Vulnerability Advisor, Policy Mgr
  - Go to Bluemix Catalog
  - See VA Image Status (Safe, Caution, Blocked)
  - Go to Create View
  - Explore Status Details (Vulnerabilities, Policy Violations)
  - Browse Policy Manager (Policy Settings, Deployment Impact)
  - Change Org Policies
  - Override Policies (Don’t do it)
  - See Weak Password Discovery
  - Update Image in Local Dev
  - Fix Policy Violation
Previously
• Built-in Monitoring & Logging
  - We just did that one…
24
Getting Started: Let’s Go to London
Login to Bluemix London
(https://console.eu-gb.bluemix.net/)
25
Deployment Status
Login to Bluemix London
(https://console.eu-gb.bluemix.net/)
Go to Catalog and Look for Containers
Hover over containers to see VA verdict:
Safe to Deploy
26
Deployment Status
Safe to Deploy | Deploy with Caution
27
Deployment Status
Safe to Deploy | Deploy with Caution | Blocked
28
Create View
Click on Image to go to Create View
See Verdict Details and Explore Options
29
Vulnerability Advisor Report
View Vulnerability Advisor Report: Discovered Vulnerabilities | Policy Violations
30
Vulnerability Advisor Report
31
Policy Manager and Deployment Impact
32
Policy Manager and Deployment Impact
Change Org Policy and Observe Impact
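The three catalog verdicts (Safe to Deploy, Deploy with Caution, Blocked) can be modeled as a function of an image's findings and the org policy, which also makes deployment impact computable: re-evaluate every image under the proposed policy and report what would change. The policy fields, the rule that any blocking finding yields Blocked, and treating a one-time override as Deploy with Caution are all assumptions for illustration, not Vulnerability Advisor's actual rules.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OrgPolicy:
    """Hypothetical org settings: which kinds of findings block deployment."""
    block_on_vulnerabilities: bool = True
    block_on_policy_violations: bool = False


def verdict(vulns: int, violations: int, policy: OrgPolicy, override: bool = False) -> str:
    """Map an image's finding counts to one of the three catalog verdicts."""
    if vulns == 0 and violations == 0:
        return "Safe to Deploy"
    blocked = (vulns > 0 and policy.block_on_vulnerabilities) or (
        violations > 0 and policy.block_on_policy_violations)
    if blocked and not override:
        return "Blocked"
    return "Deploy with Caution"  # non-blocking findings, or a one-time override


def deployment_impact(images: dict, old: OrgPolicy, new: OrgPolicy) -> dict:
    """Images whose verdict would change if the org switched from `old` to `new`."""
    impact = {}
    for name, (vulns, violations) in images.items():
        before, after = verdict(vulns, violations, old), verdict(vulns, violations, new)
        if before != after:
            impact[name] = (before, after)
    return impact
```

Computing the impact before saving a policy change lets an org see which images would become blocked, which is the kind of preview the Policy Manager step above describes.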
33
Policy Override
Create View > Click One-time Override
Name your risky container and deploy
34
Also: One-stop Shop “Michael View” for the Purists
35
Some Nostalgia: Big Vision = Systems as Data
Transform systems into documents/frames/data
Crawl the cloud like you crawl the web
Query & mine the cloud like query/mine the web
Learn good & bad system/SW configurations automagically
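“Learn good & bad configurations” can start from a simple observation: across a large fleet, the dominant value of a setting is a candidate norm, and hosts that deviate from it are candidates for review. The sketch below implements that frequency heuristic over hypothetical `{host: {setting: value}}` frames. It is an assumption for illustration, not the authors' learning method, and it has an obvious blind spot: a misconfiguration shared by most of the fleet is learned as the norm.

```python
from collections import Counter


def learn_config_norms(frames: dict, min_support: float = 0.8) -> dict:
    """{setting: value} for settings whose most common value appears on at least
    `min_support` of all hosts; other settings get no learned norm."""
    if not frames:
        return {}
    counts = {}
    for settings in frames.values():
        for key, value in settings.items():
            counts.setdefault(key, Counter())[value] += 1
    norms = {}
    for key, counter in counts.items():
        value, seen = counter.most_common(1)[0]
        if seen / len(frames) >= min_support:
            norms[key] = value
    return norms


def outliers(frames: dict, norms: dict) -> dict:
    """Per host, the settings that differ from the learned norm."""
    report = {}
    for host, settings in frames.items():
        bad = {k: settings[k] for k, v in norms.items() if k in settings and settings[k] != v}
        if bad:
            report[host] = bad
    return report
```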
36
Operational Analytics Data Pipeline [Where We Started]
• Source: Images (Registry)
• Kafka channels: Configuration, Compliance, Vulnerability
• Annotators: Vulnerability Annotator, Compliance Annotator
• Indexers → Elastic indexes: Configuration, Compliance, Vulnerability
37
Operational Analytics Data Pipeline [Where We Are]
• Sources: Images (Registry), Instances (Compute), Security Notices
• Kafka channels: Notification, Configuration, Compliance, Vulnerability, Discovery, SecConfig, Rootkit, Licence
• Annotators/parsers: Vulnerability Annotator, Compliance Annotator, Password Annotator, Config Parser, SecConfig Annotator, SW Discovery, Rootkit Annotator, Licence Discovery, Notification Parser
• Indexers → Elastic indexes: Notification, Configuration, Compliance, Vulnerability, Discovery, SecConfig, Rootkit, Licence, USNs
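The data flow of the original pipeline (crawled frames land on a configuration channel, annotators derive their own channels, and indexers copy each channel into a same-named index) can be mimicked in memory. In this sketch a dict of lists stands in for Kafka topics and a plain dict stands in for Elastic; the channel names and annotator functions are hypothetical, and nothing here uses the Kafka or Elasticsearch client APIs.

```python
from collections import defaultdict


class InMemoryBus:
    """Stand-in for Kafka: each topic is an append-only list of messages."""

    def __init__(self):
        self.topics = defaultdict(list)

    def publish(self, topic: str, message) -> None:
        self.topics[topic].append(message)


def run_pipeline(frames, annotators: dict, bus: InMemoryBus, index: dict) -> dict:
    """Frames -> configuration channel -> one channel per annotator -> index."""
    for frame in frames:
        bus.publish("configuration", frame)
    for channel, annotate in annotators.items():
        # every annotator reads the same configuration data (one shared pipeline)
        for frame in list(bus.topics["configuration"]):
            bus.publish(channel, annotate(frame))
    for channel, messages in bus.topics.items():  # the "indexers"
        index[channel] = list(messages)
    return index
```

Adding an analytics capability then means registering one more annotator and channel, which mirrors the growth from the first pipeline to the second.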
38
Our Other Key Operational Analytics Directions
• Config Analytics
• SW and System Discovery by Examples
• Secure Config Advisor
• Cloud Time Machine
• Risk Analysis
• Licence Discovery (Data Pipeline, Licence Db)
39
Summary & Open Problems
• Summary:
  - Challenges: Operational visibility into complex cloud applications; need for real operational intelligence
  - Opportunities: Transform systems into data; a new line of ops data analytics; so many low-hanging pain points
  - Agentless System Crawler and Vulnerability Advisor as simple ground-floor examples
• Parting Thoughts:
  - Operational Visibility >> Metrics & Logs (metrics and logs are a good start; add state, config, interactions, dependencies, …)
  - Cloud lends itself to novel & elegant “monalytics” designed with cloud-native thinking
  - Everything analytics can be as-a-service when we decouple systems | observations | recommendations | actions
• Open Research Questions:
  - Truly seamless OpVis: no performance impact (~/~) + absolutely no side effects (+/-)
  - Extensibility and configurability: deep visibility into system, application, and infrastructure
  - Scale out across runtimes and scale up to many instances; challenges & limits
  - How do you design DDoS mitigation/admission control/fair sharing in this model of a built-in service?
  - Privacy and data sensitivity with ops data analytics
  - Piecemeal analytics/security solutions → a cloud analytics/security roadmap
  - Rules/annotators → actually smart analytics that learn good and bad configs for security, performance, availability, etc.
  - Cross-silo analytics across time, space, and Dev/Ops [CloudSight Dream]
40
The More You Know
• Papers:
  - Operational Visibility: IC2E’14, Sigmetrics’14, VEE’15, HotCloud’15, ATC’16 (InterConnect’15)
  - Operational Analytics: BigData’14, IBM JRD’16: {SWDisc, NFM, DevOps} (InterConnect’16)
• Blogs:
  - Crawl the Cloud Like You Crawl the Web: https://developer.ibm.com/open/2015/07/18/crawl-cloud-like-crawl-web/
  - Monitoring and Logging for IBM Containers. No configuration needed: https://developer.ibm.com/bluemix/2015/07/06/monitoring-and-logging-for-containers-no-config-required/
  - Test Driving Built-in Monitoring and Logging in IBM Containers: https://developer.ibm.com/bluemix/2015/11/16/built-in-monitoring-and-logging-for-bluemix-containers/
  - Is your Docker container secure? Ask Vulnerability Advisor!: https://developer.ibm.com/bluemix/2015/07/02/vulnerability-advisor/
• Demos:
  - https://www.youtube.com/channel/UCf8Fn8dKQzBCJRgI1jOlGYg
• Open Source:
  - dwOpen Tech Talk: https://developer.ibm.com/open/events/dw-open-tech-talk-agentless-system-crawler/
  - dwOpen Page: https://developer.ibm.com/open/agentless-system-crawler/
  - Agentless System Crawler: http://github.com/cloudviz/agentless-system-crawler
  - PSVMI Introspection Library: https://github.com/cloudviz/psvmi
• Try It:
  - As-a-service today: http://www.bluemix.net
  - Run it yourself: http://github.com/cloudviz/agentless-system-crawler
41
Thank You
Seamless, Unified Operational Visibility and Analytics Designed for Cloud
[feat. Agentless System Crawler & Vulnerability Advisor]
IBM Research
Cloud Monitoring, Operational and DevOps Analytics
http://www.canturkisci.com/blog
@canturkisci