Cleveland Big Data and Hadoop User Group
Great Lakes Science Center
September 14, 2015
Jason Huang
Senior Solutions Architect, Qubole
Company Founding
Qubole founders built the Facebook data platform.
The Facebook model changed the role of data in an enterprise.
• Needed to turn the data assets into a “utility” to make a viable business.
– Collaborative: over 30% of employees use the data directly.
– Accessible: developers, analysts, business analysts, and business users all run queries. This has made the company more data-driven and agile in its use of data.
– Scalable: exabytes of data moving fast
It took the founders a team of over 30 people to build this infrastructure, and the team currently managing it has more than 100 people.
Work at Facebook inspired the founding of Qubole
[Diagram: Data Infrastructure and the roles that use it: Operations Analyst, Marketing Ops Analyst, Data Architect, Business Users, Product Support, Customer Support, Developer, Sales Ops, Product Managers]
[Chart: State of the Big Data Industry (n=417), percentage scale 0% to 80%, covering Hadoop, MapReduce, Pig, Spark, Storm, Presto, Cassandra, HBase and Hive]
Impediments for an Aspiring Data Driven Enterprise
Where Big Data falls short:
• 6-18 month implementation time
• Only 27% of Big Data initiatives were classified as “Successful” in 2014
• Only 13% of organizations achieve full-scale production
• 57% of organizations cite a skills gap as a major inhibitor
Rigid and inflexible infrastructure
Non-adaptive software services
Highly specialized systems
Difficult to build and operate
Impediments for an Aspiring Data Driven Enterprise
What you need to work in the cloud:
• Central Governance & Security
• Internet Scale
• Instant Deployment
• Isolated Multitenancy
• Elastic
• Object Store Underpinnings
Demo
Qubole Case Study
• 1 out of 3 employees leverages Big Data
• Stores 60PB+ of data
• Logs 20TB+ of new data per day
• Processes 3PB+ per day across 2,000+ jobs
Qubole Case Study
Why Hive?
“Qubole has enabled more users within Pinterest to get to the data and has made the data platform [a] lot more scalable and stable”
Mohammad Shahangian, Lead, Data Science and Infrastructure
[Diagram: Pig, Cascading and Hive jobs read metadata from the Hive Metastore and data from HDFS/S3]
Hive’s metastore serves as the canonical source of truth for all Hadoop jobs.
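The point of the diagram is that Pig, Cascading and Hive jobs all resolve table names through one shared catalog instead of each keeping its own copy of schemas and file locations. A minimal Python sketch of that idea follows, under stated assumptions: the `Metastore` class, its methods, the two "engine" functions and the table and bucket names are all hypothetical stand-ins for illustration, not the Hive Metastore API or Qubole's implementation.

```python
from dataclasses import dataclass, field, replace


@dataclass(frozen=True)
class TableInfo:
    """Metadata for one table: where its files live and its column names."""
    location: str   # hypothetical HDFS or S3 path
    columns: tuple  # ordered column names


@dataclass
class Metastore:
    """Hypothetical in-memory stand-in for a shared metadata catalog."""
    tables: dict = field(default_factory=dict)

    def create_table(self, name: str, location: str, columns: tuple) -> None:
        if name in self.tables:
            raise ValueError(f"table {name!r} already exists")
        self.tables[name] = TableInfo(location, columns)

    def set_location(self, name: str, location: str) -> None:
        # One update here is seen by every engine that consults the catalog.
        self.tables[name] = replace(self.tables[name], location=location)

    def get_table(self, name: str) -> TableInfo:
        return self.tables[name]


# Two hypothetical "engines" that never hard-code paths or schemas;
# both look the table up in the shared catalog at run time.
def hive_query_plan(ms: Metastore, table: str) -> str:
    info = ms.get_table(table)
    return f"hive: scan {info.location} columns={','.join(info.columns)}"


def pig_load_statement(ms: Metastore, table: str) -> str:
    info = ms.get_table(table)
    return f"pig: load {info.location} as ({', '.join(info.columns)})"


# Usage: move a table from HDFS to S3 once; both engines pick up the change.
ms = Metastore()
ms.create_table("events", "hdfs:///warehouse/events", ("user_id", "ts"))
ms.set_location("events", "s3://example-bucket/events")  # hypothetical bucket
print(hive_query_plan(ms, "events"))
print(pig_load_statement(ms, "events"))
```

The design choice the sketch illustrates is that metadata and data are separated: the files can move between HDFS and S3 while every job keeps referring to the same table name.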
Qubole Case Study
Ease of use for analysts
• Dozens of data scientist and analyst users
• Produces double-digit TBs of data per day
• Does not have dedicated staff to set up and manage clusters and Hadoop distributions
Qubole Case Study
[Diagram: data pipeline with stages Producers, Continuous Processing, Storage and Analytics. Producers: CDN, Real Time Bidding, Retargeting Platform. Components shown: Kinesis, ETL, S3, Redshift, Machine Learning, Streaming, Customer Data]
Why Spark?
“Qubole put our cluster management, auto-scaling and ad-hoc queries on autopilot. Its higher performance for Big Data queries translates directly into faster and more actionable marketing intelligence for our customers.”
Yekesa Kosuru, VP, Technology
Qubole Case Study
• Designed for scientists & clinicians
• Leveraging massive datasets from institutes, public sources and more…
• Cloud-based product delivered via web
Qubole Case Study
“Our customers have varying needs: clinical researchers might use GenePool to examine genomic data from a single patient, while a major research institution might use the platform to perform analyses over 10,000 patients at once”
Anish Kejariwal, Senior Director of Engineering
• Unified Metadata
• Auto-Scaling
• Spot Optimized
• Policy Keeper
• Cloud Tuned
• Cluster Lifecycle Management
[Diagram: QDS components: Developer Center; Analyst Workbench UI; Policy, Governance & Security Center; QDS Unified Control Panel; QDS Data Engines]
Why Presto?