How a Cloud Computing Provider Reached the Holy Grail of Visibility Elad Gotfrid, CloudShare Leena Joshi, Splunk Inc SPO3378 #vmworldsponsor


Page 1: How a Cloud Computing Provider Reached the Holy Grail of Visibility

How a Cloud Computing Provider Reached the Holy Grail of Visibility

Elad Gotfrid, CloudShare

Leena Joshi, Splunk Inc

SPO3378

#vmworldsponsor

Page 2: How a Cloud Computing Provider Reached the Holy Grail of Visibility

CONFIDENTIAL

SPO3378

How a Cloud Computing Provider Reached the Holy Grail of Visibility

Elad Gotfrid, Director of IT, CloudShare

Leena Joshi, Director, Solutions Marketing, Splunk

Page 3: How a Cloud Computing Provider Reached the Holy Grail of Visibility

Company Overview

About:
• Headquartered in San Mateo, CA
• Founded in 2007
• 70,000+ users worldwide
• Backed by leading VCs: Sequoia, CRV, Globespan, Gemini

The Leading Cloud for Pre-Production:
• Focus on the dev/test/pre-production segment
• Many Fortune 500 customers, including McAfee, HP, SAP, Cisco, Dell, Microsoft, IBM and Juniper
• 40% of Microsoft SharePoint MVPs and MCMs have already adopted CloudShare for development, testing and training

Page 4: How a Cloud Computing Provider Reached the Holy Grail of Visibility

Company Platform Benefits

The CloudShare IaaS (infrastructure as a service) platform gives each customer its own private, multi-VM networked environment, including compute resources, networking, IP addresses and a preinstalled OS.

Page 5: How a Cloud Computing Provider Reached the Holy Grail of Visibility

CloudShare Operations Overview

The CloudShare platform is designed to handle high load:
• Runs 150,000 customer virtual machines per month
• During peak hours, the system performs ~500 VM resume/suspend operations per hour
• Robust dynamic assignment of infrastructure resources, including ESX servers, storage units, firewalls, switches, VLANs and public IPs

Page 6: How a Cloud Computing Provider Reached the Holy Grail of Visibility

CloudShare Custom Cloud

CloudShare uses its own patent-pending backend “private cloud” system, designed to handle the full virtual machine and datacenter life cycle:
• Environment operations
• Environment lifecycle
• Self-healing and error correction
• Resource management

It manages large-scale infrastructure:
• 15 VMware Virtual Centers
• 20 storage units
• Hundreds of switch port and gateway configurations

Page 7: How a Cloud Computing Provider Reached the Holy Grail of Visibility

IT/Operations Challenges

Looking for a centralized console for complete IT/Operations visibility

Business Requirements:

Data Aggregation
• Aggregate all IT/infrastructure data into a single console

Data Correlation
• Correlate business data with performance/application data

Data Analysis
• Analyze and search the data
• Find patterns and correlations between events

Page 8: How a Cloud Computing Provider Reached the Holy Grail of Visibility


The Trick Is Finding a Way to Interact

Page 9: How a Cloud Computing Provider Reached the Holy Grail of Visibility


Enter Splunk

Evaluated Splunk for a narrow use initially

Quickly realized it could do a lot more

Eventually standardized on it

Page 10: How a Cloud Computing Provider Reached the Holy Grail of Visibility

Copyright © 2012, Splunk Inc. Listen to your data.

Splunk Collects and Indexes Any Machine Data

Customer Facing Data: click-stream data; shopping cart data; online transaction data

Outside the Datacenter: manufacturing, logistics…; CDRs & IPDRs; power consumption; RFID data; GPS data

Applications: web logs; Log4J, JMS, JMX; .NET events; code and scripts

Networking: configurations; syslog; SNMP; netflow

Databases: configurations; audit/query logs; tables; schemas

Virtualization & Cloud: hypervisor; guest OS, apps; cloud

Linux/Unix: configurations; syslog; file system; ps, iostat, top

Windows: registry; event logs; file system; sysinternals

Logfiles, Configs, Messages, Traps, Alerts, Metrics, Scripts, Tickets, Changes

Page 11: How a Cloud Computing Provider Reached the Holy Grail of Visibility

Splunk Collects and Indexes Any Machine Data

• Any amount, any location, any source
• No upfront schema
• No custom connectors
• No RDBMS

Page 12: How a Cloud Computing Provider Reached the Holy Grail of Visibility

Turn Machine Data Into Operational Intelligence

Business Insights: Gain real-time insight from your machine data to make better-informed business decisions.

Operational Visibility: Gain operational visibility to make better-informed IT decisions.

Proactive Monitoring: Monitor infrastructure to identify issues, problems and attacks before they impact your customers and services.

Search and Investigation: Find and fix problems across the organization using machine data.

Machine Data Operational Intelligence

Page 13: How a Cloud Computing Provider Reached the Holy Grail of Visibility

A Single Solution for Operational Intelligence

Three Primary Capabilities

Real-time Visibility
• Live dashboards
• Event correlation
• Monitoring and alerting
• Performance issues
• Transaction levels
• SLA tracking

Historical Analytics
• Baseline and thresholds
• Trending
• Operational insights
• Historical patterns
• Compliance reports

Search/Navigation
• Data drilldown
• “Needle in a haystack”
• Root cause analysis / troubleshooting
• Incident investigations

Single Data Store, Single UI Across Use Cases

Page 14: How a Cloud Computing Provider Reached the Holy Grail of Visibility

Create and Share Dashboards in Minutes

Deliver new levels of visibility and insight for IT and the business from operational data

• Marketing & Business Analysts
• Executives & Business Owners
• IT Executives
• Auditors
• Other

Page 15: How a Cloud Computing Provider Reached the Holy Grail of Visibility

Splunk Adoption

Splunk adoption was driven by IT and R&D:
• From hundreds of daily email alerts to a few actionable email alerts
• Heavy use in QA for finding anomalies and issues
• Dashboards for performance trends, current system status, capacity planning, root cause analysis and business metrics
• Viral adoption within the organization, from DevOps to IT, R&D, Marketing and Management

Page 16: How a Cloud Computing Provider Reached the Holy Grail of Visibility


Splunk As a Data Aggregator

Network/ GWs/FW

Backend

VMware

App IIS

Google Docs

SQL data

Storage

API Actions Incident Management

Salesforce

Page 17: How a Cloud Computing Provider Reached the Holy Grail of Visibility


Splunk As a Central Platform in CloudShare

Developer Framework

Support/NOC:

• Performance Data

• System Alerts

Compliance, Security, IT Operations

Marketing:

• Tracking – Visits, Leads, Deals, Usage patterns

• Qualifying leads

• A/B Testing

Management:

• SLA

• BI (Cohort analysis, dashboards)

R&D:

• Debug / Error logs

• Health measurements

IT/Ops:

• Capacity Planning

• Performance Monitoring

• System Usage

• Logs


Page 18: How a Cloud Computing Provider Reached the Holy Grail of Visibility

Splunk Provides Operational Intelligence

Splunk allows CloudShare to correlate business data (users, usage) with IT/infrastructure data.

Examples:
• Understand how much of each resource (CPU, memory, network, etc.) each customer consumes, and when
• A customer can have more than one VM or environment; Splunk helps us aggregate the data easily and look at usage at the customer level
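The customer-level roll-up described above can be sketched in a few lines of Python. This is only an illustration of the idea, not CloudShare's actual Splunk search: the sample records, the field names (`customer`, `vm`, `cpu_mhz`, `mem_mb`) and the `usage_by_customer` helper are all hypothetical.

```python
from collections import defaultdict

# Hypothetical per-VM usage samples; field names are illustrative only.
samples = [
    {"customer": "acme", "vm": "vm-1", "cpu_mhz": 1200, "mem_mb": 2048},
    {"customer": "acme", "vm": "vm-2", "cpu_mhz": 800, "mem_mb": 1024},
    {"customer": "globex", "vm": "vm-3", "cpu_mhz": 500, "mem_mb": 4096},
]


def usage_by_customer(rows):
    """Sum every numeric metric over all VMs that belong to the same customer."""
    totals = defaultdict(lambda: defaultdict(int))
    for row in rows:
        for key, value in row.items():
            if key not in ("customer", "vm"):  # skip the identifying fields
                totals[row["customer"]][key] += value
    return {customer: dict(metrics) for customer, metrics in totals.items()}


customer_usage = usage_by_customer(samples)
```

Grouping on the customer rather than on the VM is what turns infrastructure metrics into a business-level view, since one customer may own several VMs or environments.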

Page 19: How a Cloud Computing Provider Reached the Holy Grail of Visibility

Splunk Dashboards

SLA Dashboards
- Measure service level
- Analyze and present statistics according to business guidelines

Capacity Planning
- High-level status for management on capacity
- True visibility into operational data

Management Dashboard: full visibility into business-critical metrics

Page 20: How a Cloud Computing Provider Reached the Holy Grail of Visibility

Splunk Dashboards

Dashboard for high-utilization storage consumers:
• All storage-related data is collected by Splunk
• Lists the number of IOPS per business unit or customer
• We can easily identify our top storage consumers
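A ranking of storage consumers by IOPS, like the one this dashboard presents, could be approximated as follows. The readings and the `top_storage_consumers` helper are hypothetical; the deck does not show the underlying search.

```python
from collections import Counter

# Hypothetical (customer, iops) readings collected from the storage layer.
readings = [("acme", 300), ("globex", 1200), ("acme", 500), ("initech", 100)]


def top_storage_consumers(rows, n=3):
    """Total the IOPS per customer and return the n largest consumers."""
    totals = Counter()
    for customer, iops in rows:
        totals[customer] += iops
    return totals.most_common(n)


top = top_storage_consumers(readings, n=2)
```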

Page 21: How a Cloud Computing Provider Reached the Holy Grail of Visibility

Splunk Dashboards

Dashboard for storage latency:
• All VMware storage-related data is collected by Splunk
• We can easily identify poorly performing storage units
• Latency is calculated whenever the 20 ms threshold is breached
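One way to read the 20 ms rule is to count, per storage unit, the samples that exceed the threshold and rank the units by that count. The sketch below assumes that reading; the sample data, the datastore names and the `breaches_by_unit` helper are hypothetical.

```python
LATENCY_THRESHOLD_MS = 20  # threshold quoted on the slide

# Hypothetical (storage_unit, latency_ms) samples.
samples = [
    ("ds-01", 5.0),
    ("ds-02", 35.0),
    ("ds-02", 22.5),
    ("ds-01", 21.0),
    ("ds-03", 8.0),
]


def breaches_by_unit(rows, threshold_ms=LATENCY_THRESHOLD_MS):
    """Count samples above the threshold per storage unit, worst unit first."""
    counts = {}
    for unit, latency_ms in rows:
        if latency_ms > threshold_ms:
            counts[unit] = counts.get(unit, 0) + 1
    return sorted(counts.items(), key=lambda item: item[1], reverse=True)


worst = breaches_by_unit(samples)
```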

Page 22: How a Cloud Computing Provider Reached the Holy Grail of Visibility

Splunk Dashboards

High-level dashboard of system usage (used by the NOC/Support)

Page 23: How a Cloud Computing Provider Reached the Holy Grail of Visibility

Splunk Dashboards

High-level dashboard of network usage (used by IT/Network Ops)
• Shows drilldown per user on number of connections, packets and traffic
• Full visibility into traffic usage and patterns per customer

Page 24: How a Cloud Computing Provider Reached the Holy Grail of Visibility

BI Usage for Marketing and Management

• Tracking of conversion: visit to lead, lead to deal
• Lead qualification
• A/B testing
• Churn analysis
• Cohort analysis
• User engagement score
• Feature usage
• Customer usage patterns

Page 25: How a Cloud Computing Provider Reached the Holy Grail of Visibility

Splunk App for VMware

Active participants in the beta group for the Splunk App for VMware

The Splunk app collects data from ESX and VC, including:
• ESX logs, VC logs
• Performance
• Tasks/Events
• Inventory
• Topology

Collects metrics from the ESX/ESXi host servers at fine granularity (20-second intervals)

Page 26: How a Cloud Computing Provider Reached the Holy Grail of Visibility

Splunk App for VMware #2

Collecting VMware performance data at large scale is a “Big Data” problem:
• 50,000,000 events per day
• ~2 million events per hour

Five dedicated Splunk FAs (data forwarder appliances) are used to gather all data in real time

Diagram labels: ESX, Forwarder Appliance, Splunk, Real Time Data, Dashboard
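The figures on this slide can be checked with simple arithmetic. The sketch below uses only the numbers quoted in the deck; the even split of events across the five forwarder appliances is an assumption made for illustration, not something the slides state.

```python
EVENTS_PER_DAY = 50_000_000   # VMware events per day, from the slide
FORWARDER_APPLIANCES = 5      # dedicated Splunk FAs, from the slide
SAMPLE_INTERVAL_S = 20        # collection granularity quoted for ESX/ESXi metrics

# 50,000,000 / 24 is roughly 2.08 million, matching the "~2 million per hour" figure.
events_per_hour = EVENTS_PER_DAY / 24

# Assumes the load is spread evenly over the appliances (an assumption).
events_per_fa_per_day = EVENTS_PER_DAY / FORWARDER_APPLIANCES

# One metric sampled every 20 seconds yields this many data points per day.
samples_per_metric_per_day = 24 * 60 * 60 // SAMPLE_INTERVAL_S
```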

Page 27: How a Cloud Computing Provider Reached the Holy Grail of Visibility

Splunk Data Statistics

How Splunk handles CloudShare “Big Data”:
• Total of 6,000,000,000 events stored in the Splunk datastore
• 90,000,000 events per day
• ~3.5 million events per hour

CloudShare deployed a Splunk scale-out architecture:
• 2 indexers
• 1 search head
• 32 GB RAM per server
• 1.5 TB total DB size
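A rough sizing check can be derived from these numbers. The sketch assumes a constant ingest rate and decimal terabytes, neither of which the slide states, so the results are ballpark figures only.

```python
TOTAL_EVENTS = 6_000_000_000   # events in the datastore, from the slide
EVENTS_PER_DAY = 90_000_000    # daily ingest, from the slide
TOTAL_DB_BYTES = 1.5e12        # 1.5 TB, assuming decimal terabytes

# 90,000,000 / 24 = 3.75 million, the same order as the "~3.5 million" on the slide.
events_per_hour = EVENTS_PER_DAY / 24

# Days of data held, assuming the daily rate was constant (an assumption).
days_of_data = TOTAL_EVENTS / EVENTS_PER_DAY

# Average on-disk footprint per stored event.
bytes_per_event = TOTAL_DB_BYTES / TOTAL_EVENTS
```

Under these assumptions the datastore holds a little over two months of data at about 250 bytes per event on disk.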

Page 28: How a Cloud Computing Provider Reached the Holy Grail of Visibility

Best Practices Deploying Splunk For Your Cloud Environments

Splunk App for VMware:
• Take into consideration the large amount of data each ESX host generates
• Properly size the FA hardware (CPU, memory)
• Add more engine processes to each FA if needed
• Consider creating a dedicated indexer for the VMware data in order to reduce the load

Storage monitoring:
• Let Splunk collect both physical storage latency (from storage disks) and VMware vDisk latency to better understand latency problems and find their root cause
• In a linked-clone environment, consider monitoring both volume-level and vDisk-level latency in order to understand whether the problem is on the master disk or the clones

Network monitoring:
• Monitor network traffic for anomalies such as a high rate of open connections
• Use real-time searches in order to react quickly and shut down abuse
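The network-monitoring practice above amounts to counting connection opens per customer in fixed time windows and flagging the outliers. The following sketch shows that idea in plain Python; the event tuples, the 60-second window, the limit of two connections and the `flag_high_connection_rate` helper are all hypothetical values chosen for illustration.

```python
from collections import Counter

# Hypothetical connection-open events: (timestamp_seconds, customer).
events = [
    (0, "acme"),
    (1, "acme"),
    (2, "globex"),
    (3, "acme"),
    (65, "acme"),
    (70, "globex"),
]


def flag_high_connection_rate(rows, window_s=60, max_per_window=2):
    """Return sorted (window_start, customer) pairs that exceed the allowed count."""
    # Bucket each event into the fixed window that contains its timestamp.
    counts = Counter((ts // window_s * window_s, customer) for ts, customer in rows)
    return sorted(key for key, count in counts.items() if count > max_per_window)


flagged = flag_high_connection_rate(events)
```

In practice the limit would be tuned against normal traffic so that only genuine abuse triggers a reaction.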

Page 29: How a Cloud Computing Provider Reached the Holy Grail of Visibility

Summary

• For CloudShare, Splunk is our platform for operational intelligence
• Once all data is placed in a central repository, it's very easy to correlate events and understand patterns
• Use Splunk dashboards to create visibility for other groups in the organization

Page 30: How a Cloud Computing Provider Reached the Holy Grail of Visibility

Experiment with Splunk

All users joining the session will receive a two-week trial account in CloudShare with a dedicated, pre-installed and fully configured Splunk environment.

To get access to your dedicated environment, browse to: http://tinyurl.com/SplunkDemo

Page 31: How a Cloud Computing Provider Reached the Holy Grail of Visibility


Visit Splunk & CloudShare at Booth # 1909

Theater presentations every hour

Learn more: www.splunk.com/goto/vmware [email protected] [email protected]


Page 33: How a Cloud Computing Provider Reached the Holy Grail of Visibility

Company Highlights

Company
• Founded 2004, first software release in 2006
• Headquarters: San Francisco, CA
• Regional headquarters in Hong Kong and London
• Over 580 employees, based in 10 countries
• Q1 revenue: $37.2 million; +80% year-over-year

Business Model / Product
• Free download
• Current release: Splunk Enterprise 4.3

4,000+ Customers
• Customers in over 75 countries
• 54 of the Fortune 100
• Largest customer: 100 terabytes per day

Page 34: How a Cloud Computing Provider Reached the Holy Grail of Visibility

FILL OUT A SURVEY

EVERY COMPLETE SURVEY IS ENTERED INTO A DRAWING FOR A $25 VMWARE COMPANY STORE GIFT CERTIFICATE
