Presto Summit Bangalore 2019
Martin Traverso, Dain Sundstrom, David Phillips
Brief History
• First commit on August 8, 2012
• First production deployment in January 2013
• Open sourced in November 2013
• … bunch of things happened, and then…
• Presto Software Foundation founded in January 2019
In the Beginning…
• “It is a good day when I can run 6 Hive queries” - a data scientist
• Peregrine (https://xrds.acm.org/article.cfm?aid=2331056)
And Presto was Born
• Team of 4 engineers
• Initial goal: “make interactive analytics over Hive data better”
• Vision
• Build a SQL warehouse engine capable of competing with the best commercial engines
• Make it open source
• For the long term (20+ years)
• Fast AND correct
Early Days
• First production version in ~6 months
• Support for SELECT with JOINs and aggregations
• We rewrote everything at least once in between
• Replaced Peregrine by July 2013
Some Missteps
• Modeling imports as materialized views, syntax and all
• Approximate queries (a la BlinkDB)
Some !Missteps
• ANSI SQL
• HTTP
• Plugins
Plugins
• Why?
• Clean separation between engine and storage
• FB was running forked versions of Hive & HDFS
• We wanted to open source Presto eventually
• BEST. DECISION. EVER
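The engine/storage separation described above can be sketched as a small interface that the engine programs against, with each storage system supplying its own implementation. This is only an illustration under stated assumptions: `Connector`, `InMemoryConnector`, and `Engine` are hypothetical names invented here and are not Presto's actual plugin SPI.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical storage-side contract (NOT Presto's real SPI):
// the engine only ever sees this interface.
interface Connector {
    List<String> listTables();

    List<List<Object>> scan(String table);
}

// One possible storage plugin, backed by an in-memory map.
class InMemoryConnector implements Connector {
    private final Map<String, List<List<Object>>> tables = new HashMap<>();

    void addTable(String name, List<List<Object>> rows) {
        tables.put(name, rows);
    }

    @Override
    public List<String> listTables() {
        return new ArrayList<>(tables.keySet());
    }

    @Override
    public List<List<Object>> scan(String table) {
        List<List<Object>> rows = tables.get(table);
        if (rows == null) {
            throw new IllegalArgumentException("No such table: " + table);
        }
        return rows;
    }
}

// Engine-side code: it knows catalog names and the Connector interface,
// and nothing about how any particular storage system works.
class Engine {
    private final Map<String, Connector> catalogs = new HashMap<>();

    void registerCatalog(String name, Connector connector) {
        catalogs.put(name, connector);
    }

    long countRows(String catalog, String table) {
        Connector connector = catalogs.get(catalog);
        if (connector == null) {
            throw new IllegalArgumentException("No such catalog: " + catalog);
        }
        return connector.scan(table).size();
    }
}
```

With this shape, supporting a forked or proprietary storage system means writing another `Connector` implementation, while the engine code stays unchanged and can be open sourced on its own.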
Presto for User-Facing Apps
Website Aggregator
Analytics Front-end
Time-bucketed Summaries
Events
Query
Requirements
• “UPSERT” semantics
• Normalized data set
• 15-way joins
• Large data set, but very selective queries
• Interactive latencies (< 5s)
• 24/7 availability
What Storage?
• Hive/HDFS not a good fit
• First attempt: HBase
• Inconsistent performance
• Breaking the abstraction
• Data mapping impedance
• Auto-scaling and auto-balancing?
• Sharded MySQL
Architecture
Aggregator
Analytics Front-end Presto
Loader
Sharded MySQL
Presto Gained…
• Index JOINs
• Table Layouts
• Latency improvements (a.k.a., avoid check-sleep loops)
• JDBC-based connectors
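For readers unfamiliar with the first item, the general idea of an index join is to avoid scanning the whole joined table: for each probe row, matching rows are fetched by key through an index the storage system already maintains, which suits large but very selective workloads. A minimal sketch of that idea, assuming a hypothetical `KeyLookup` interface standing in for an indexed source such as a MySQL shard; it is not Presto's implementation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for an indexed source: returns the rows matching
// a key without scanning the whole table.
interface KeyLookup {
    List<String> lookup(int key);
}

class IndexJoinSketch {
    // Inner join: for every probe key, fetch matches through the index
    // and emit one "key:value" string per matching row.
    static List<String> indexJoin(List<Integer> probeKeys, KeyLookup index) {
        List<String> output = new ArrayList<>();
        for (int key : probeKeys) {
            for (String match : index.lookup(key)) {
                output.add(key + ":" + match);
            }
        }
        return output;
    }

    // Toy index backed by a map; keys with no entry yield no rows.
    static KeyLookup mapIndex(Map<Integer, List<String>> rowsByKey) {
        return key -> rowsByKey.getOrDefault(key, List.of());
    }
}
```

The cost of this join grows with the number of probe keys rather than with the size of the indexed table, which is the property that matters when queries are highly selective.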
A/B Testing Backend
• Replace analytics backend of A/B testing framework
• Requirements
• Reliable data loads
• 5-10 minute load latency
• Consistent performance
• Seconds to minutes
• Very large data sets
Presto Gained…
• Raptor connector
• Atomic loads
• Data organization
• Balancing, compaction, garbage collection
• INSERT
• DELETE
• Co-located JOINs
Presto for Batch
Challenges
• Capacity and expectations
• Per cluster scalability limits
• Deploying risky changes
• UDFs
• Long running queries vs failures
• High memory queries
Presto Gained…
• Resource groups
• Local scheduling improvements
• Server-controlled session properties
• Lambda expressions
• Grouped execution
Presto for Batch
• By end of 2018…
• > 50% of workload running on Presto
• > 85% of jobs written for Presto
• Largest deployment of Presto at FB
Presto graduates…
Presto Software Foundation
“An independent, non-profit organization with the mission of supporting a community of passionate users and developers devoted to the advancement of the Presto distributed SQL query engine for big data.”
“It is dedicated to preserving the vision of high quality, performant, and dependable software.”
“Ensuring the project remains open, collaborative and independent for decades to come”
Presto Community
• Github: https://github.com/prestosql
• Website: https://prestosql.io
• Blog: https://prestosql.io/blog
• Twitter: @prestosql
• Slack: prestosql.slack.com
Since the Launch…
• Launched on January 31, 2019
• 19 releases (1-2 weeks between releases)
• 1600+ commits
• 200k lines changed (> 20% of the codebase)
• 900+ pull requests closed
• 50+ contributors
• 215+ weekly active members on Slack
Contributors
kokosing
raunaqmorarka
pgagnon
MiguelWeezardo
MarvinCai
Praveen2112
chancez
hustnn
kasiafi
sopel39
stagraqubole
yui-knk
Yaliang
dain
11xor6
Lewuathe
garvit-gupta
VicoWu
qqibrow
findepi
pettyjamesm
martint
electrum
vincentpoon
wyukawa
guyco33
bill-warshaw
vkorukanti
anusudarsan
dilipkasana
sshardool
linxingyuan1102
luohao
zhenxiao
rzeyde-varada
takezoe
kabunchi
ryanrupp
ilfrin
ChethanUK
ebyhr
xumingming
Roadmap
• Dynamic
• Real world priorities and requirements
• What volunteers work on
• Not a wish list
• https://github.com/prestosql/presto/labels/roadmap
Roadmap
• Iceberg Connector
• Complex operation pushdown
• Dynamic filtering
• Dynamically-resolved functions
Getting Involved
• Join Slack
• https://prestosql.io/community.html
• #troubleshooting channel
• File issues/bugs:
• https://github.com/prestosql/presto
• Write blog posts
• https://prestosql.io/blog