Power to the People: A Stack to Empower Every User to Make Data-Driven Decisions
Housekeeping
• We will do Q&A at the end.
• You should see a box on the right side of your screen.
• There is a button marked “Q&A” on the bottom menu.
• We are recording this.
• We will send you the recording & slides tomorrow.
Meet Our Presenters
• Zev Lebowitz, Senior Sales Engineer
• Daniel de Sybel, CTO
• Karol Ussher, Head of Technology Partnerships, EMEA
AGENDA
1. Meet Google BigQuery
2. Meet Looker
3. Case Study: Data-driven Decisions at Infectious Media
Meet Google BigQuery
Google confidential Do not distribute
What is Google BigQuery?
Durable and Highly Available
Convenience of SQL
Petabyte-scale Storage and Queries
Fully Managed, Serverless Enterprise Data Warehouse
BigQuery for Enterprise
Features: Flat-rate Pricing, Standard SQL, ODBC & JDBC Connectors, DML, Identity and Access Management, Stackdriver
Google Research in Data Technologies (2002 to 2013)
GFS, MapReduce, BigTable, Colossus, Dremel, Flume, Megastore, Spanner, Millwheel, PubSub, F1
Google Research Publications referenced are available here: http://research.google.com/pubs/papers.html
Now: Typical Big Data Tasks
Resource provisioning, Performance tuning, Monitoring, Reliability, Deployment & configuration, Handling growing scale, Utilization improvements, Analysis and Insights
Next: Big Data with Google
No-Ops, Auto Everything: Analysis and Insights, Understanding
Think about the Data Warehouse
Laura
Dremel BigQuery
Confidential & Proprietary Google Cloud Platform 12
Pipeline stages: Capture, Process, Store, Analyze, Visualize
Capture: Pub/Sub, Logs, BQ Streaming, App Engine
Process: Cloud Dataflow (stream and batch), Cloud Dataproc (Hadoop & Ecosystem)
Store: Cloud Storage (objects), Cloud Datastore (NoSQL), BigQuery Storage (structured), Cloud Bigtable (NoSQL HBase), Cloud SQL (SQL)
Analyze: BigQuery (SQL), Cloud Machine Learning
Visualize: Cloud DataLab (iPython/Jupyter), Looker
Also shown: Cassandra, HBase, MongoDB, RabbitMQ, Kafka; Cloud 2.0, Cloud 3.0
Focus on the Analysis not the Maintenance
"We are very excited about the productivity benefits offered by Cloud Dataflow and Cloud Pub/Sub. It took half a day to rewrite something that had previously taken over six months to build using Spark"
Paul Clarke, Director of Technology, Ocado
http://googlecloudplatform.blogspot.co.uk/2015/08/Announcing-General-Availability-of-Google-Cloud-Dataflow-and-Cloud-Pub-Sub.html
“Spotify chose Google in part because its services for analyzing large amounts of data, tools like BigQuery, are more advanced than data services from other cloud providers.” Nicholas Harteau, VP of Infrastructure, Spotify
https://labs.spotify.com/2016/02/25/spotifys-event-delivery-the-road-to-the-cloud-part-i/
“Right at the start of the partnership we were able to reduce time to insight from 96 hours to 30 minutes by using BigQuery.”
– Gary Sanders, Head of Digital Analytics, Lloyds Banking Group
Meet Looker
A data analytics platform that makes it easy for everyone to find, explore and understand the data that drives your business.
DATA BOTTLENECK
• Which features increase engagement?
• What triggers a customer churn?
• Which web page works best?
• How is the pipeline for Q4?
• Will we meet our revenue targets?
• Which customer is at risk?
• Which campaigns convert best?
• Which rep is converting best?
• Can we speed up our operations?
• Are we investing in the right areas?
• Who are our happiest customers?
• What industries are we doing well in?
• Where should we spend more budget?
DATA CHAOS
IS THERE A WAY TO FIND BALANCE?
Standards
Scalability
Governance
Self-Service
Agility
Flexibility
THE TECHNICAL PILLARS THAT MAKE IT POSSIBLE
• 100% In Database: Leverage all your data. Avoid summarizing or moving it.
• Modern Web Architecture: Access from anywhere. Share and collaborate. Extend to anyone.
• LookML Intelligent Modeling Layer: Describe the data. Create reusable and shareable business logic.
LOOKER: A DATA PLATFORM
• Explore Everything: Find, explore and understand all the data
• Create Standards: Define your data and business metrics
• Any SQL Database: Analyze all of your data where it is stored
• Build a Data Culture: Anyone can ask and answer questions
Looker - BigQuery Integration Highlights
In-Database Architecture
Looker directly leverages the power of BigQuery because all transformation is done in-database.
Support for Native BigQuery Functions
Integration with features unique to BigQuery, in both the product and the modeling layer, makes for a seamless experience.
Highest Level of Looker Features
We’ve invested in bringing Looker features to BigQuery to provide the best possible experience.
Data-Driven Decisions at Infectious Media
OUR BUSINESS
● Founded in 2008
● Leading international programmatic agency
● Covering all biddable media
● Activity live in 30+ markets
● Highly customisable owned-and-operated (O&O) technology stack: DMP & DSP
● Transparent model
Impression Desk: our data-driven advertising platform that provides full access to the fragmented landscape of inventory and data
BIDDER
BIDDERS
Data Processing
• 4k requests/sec @ 1 KB = 4 MB/s (~0.4 TB/day)
• 500k requests/sec @ 1 KB = 0.5 GB/s (~40 TB/day)
RTB: The Data Problem
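The data-processing figures above are plain throughput arithmetic, and they only add up if the units are read as bytes: 4k requests/sec at 1 KB each is 4 MB/s, which is 32 Mbit/s. A minimal sketch of the calculation, assuming "1kb" means 1 kilobyte (1,000 bytes) and decimal units throughout:

```python
# Throughput arithmetic for the RTB data-processing figures.
# Assumption: "1kb" means 1 kilobyte (1,000 bytes); decimal units throughout.
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds
KB = 1_000  # bytes


def daily_volume(requests_per_sec: int, bytes_per_request: int) -> tuple[int, int]:
    """Return (bytes per second, bytes per day) for a steady request rate."""
    bytes_per_sec = requests_per_sec * bytes_per_request
    return bytes_per_sec, bytes_per_sec * SECONDS_PER_DAY


if __name__ == "__main__":
    for rps in (4_000, 500_000):
        per_sec, per_day = daily_volume(rps, 1 * KB)
        print(f"{rps:>7,} req/s: {per_sec / 1e6:,.0f} MB/s "
              f"({per_sec * 8 / 1e6:,.0f} Mbit/s), {per_day / 1e12:.2f} TB/day")
```

Run as a script, this gives about 0.35 TB/day and 43 TB/day, consistent with the slide's rounded 0.4 TB and 40 TB figures.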
Analytics
• Impression-level data is a goldmine
• Anything that doesn’t fit in Excel generally needs techie help
Infobright Community Edition
• Fantastic open source columnar database
• Could be easily installed in Amazon Web Services on a single server
• Used standard SQL for queries
Where we started...
Problems
• Concurrency wasn’t great
• Single threaded
• Could only manage around 1-2 TB of data
• Data load could be slow
Infobright Enterprise Edition
• Simple upgrade path
• Multi-threaded
• Parallel data loads
Up next...
Problems
• Concurrency still wasn’t great
• Not cloud native
• Licence costs grew linearly with data volume
Hadoop
• Everyone else is doing it
• No licence costs
• Perfect for cloud deployment
From there...
Problems
• Analysts had to learn new ways of writing queries
• Concurrency was non-existent
• Server costs were difficult to control
• Took an army of infrastructure engineers to maintain it
Enter BigQuery
Why?
• Probably processes the most data in the world
• No infrastructure engineers required
• Cloud native
• Oh, and…
Some Stats
Before BQ
• 20 mins to query 1 month of data
• Stored < 5 TB of data
• 1 infrastructure engineer to manage the server
• 2 data engineers to manage data
• 3 analysts to query data
After BQ
• 2 mins to query 3 months of data
• Store > 50 TB of data
• 0 infrastructure engineers (no-one cares about the backend)
• 1 data engineer to manage data
• 6 analysts to query data
They cost the same!
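Normalising the before and after query times by the amount of data scanned shows the size of the improvement: 20 minutes for one month of data versus 2 minutes for three months. A small sketch of that arithmetic, assuming query time scales linearly with the number of months scanned (a simplification for illustration, not a property of BigQuery):

```python
# Figures from the "Some Stats" slide (before and after moving to BigQuery).
BEFORE = {"query_minutes": 20, "months_scanned": 1}
AFTER = {"query_minutes": 2, "months_scanned": 3}


def minutes_per_month(stats: dict) -> float:
    """Query time per month of data scanned.

    Assumes query time scales linearly with the number of months scanned.
    """
    return stats["query_minutes"] / stats["months_scanned"]


def speedup(before: dict, after: dict) -> float:
    """How many times faster 'after' scans one month of data than 'before'."""
    return minutes_per_month(before) / minutes_per_month(after)


if __name__ == "__main__":
    print(f"Before: {minutes_per_month(BEFORE):.2f} min per month of data")
    print(f"After:  {minutes_per_month(AFTER):.2f} min per month of data")
    print(f"Speedup: ~{speedup(BEFORE, AFTER):.0f}x")
```

Under that assumption, the move works out to roughly a 30x speedup per month of data scanned, alongside the team shift from infrastructure and data engineering toward analysts at the same cost.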
Something missing
• Optimisation managers still had to go to Analytics to ask questions
• Slowed down campaign optimisations and insights
• Led to impatience and frustration
• Elegant abstraction of our perfect DW via LookML
• Safe data exploration for Optimisers without needing Analysts
• Simple automated queries to email or import into Excel for clients
• Easy extension and evolution of the data model alongside the database
• Wait... user-defined dashboards?
Enter Looker
Optimisers looking to extend travel campaign to Paris
Compared Paris audience with existing London audience
Use insight to create new strategy
Sped up optimal campaign creation by a week
Audience Comparison
Dashboard can pinpoint problems on sites/exchanges
Identifying fraud/brand safety early reduces wasted spend
Problem sites/exchanges added to blocklists
Traders need to keep up in an arms race with fraudsters
Fraud and Brand Safety
Ongoing work
• Costs have increased quickly
  ◦ Built a cost-monitoring dashboard in Looker
  ◦ Investigating flat-rate pricing
• Release of standard SQL
  ◦ Has made queries faster
  ◦ Requires a migration in LookML
• Release of BigQuery regions
  ◦ Allows better data governance
  ◦ But creates problems for querying across regions
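One mechanical part of moving a LookML model from legacy SQL to standard SQL is rewriting table references (for example in a view's `sql_table_name`): legacy SQL writes `[project:dataset.table]`, while standard SQL writes `` `project.dataset.table` ``. The helper `to_standard_table_refs` below is hypothetical and covers only that one step; it assumes references appear in the bracketed legacy form and leaves other legacy-only syntax (such as `TABLE_DATE_RANGE`) untouched, so it is a sketch rather than a migration tool.

```python
import re

# Legacy SQL table reference: [project:dataset.table] or [dataset.table].
# The project part may contain hyphens; dataset and table are word characters.
_LEGACY_REF = re.compile(r"\[(?:([\w-]+):)?(\w+)\.(\w+)\]")


def to_standard_table_refs(sql: str) -> str:
    """Rewrite bracketed legacy-SQL table references as backtick standard-SQL ones.

    Hypothetical helper: only table references are converted; legacy-only
    functions and other dialect differences are left as-is.
    """
    def repl(match: re.Match) -> str:
        project, dataset, table = match.groups()
        parts = [p for p in (project, dataset, table) if p]  # drop missing project
        return "`" + ".".join(parts) + "`"

    return _LEGACY_REF.sub(repl, sql)


if __name__ == "__main__":
    legacy = "SELECT COUNT(*) FROM [my-project:adserver.impressions]"
    print(to_standard_table_refs(legacy))
```

Backticks are used for the whole path because standard SQL requires quoting when a project ID contains hyphens; the names `my-project`, `adserver` and `impressions` are illustrative only.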
Final thoughts
• Scale is the constant enemy
• Scale makes even simple questions require smart solutions
• BigQuery handles the kind of scale most teams use Hadoop for
• Layering on Looker allows your team to get more answers, not more problems
Q&A
THANK YOU FOR JOINING
Recording and slides will be posted.
We will email you the links tomorrow.
Our Next Webinar: Parse.ly & Looker
Beyond the Dashboard: What You Can Learn From Raw Audience Data on Thursday
See how Google BigQuery and Looker work with your data.
Visit cloud.google.com/free-trial and looker.com/free-trial or
email [email protected].
Thank you!