Presto Summit Bangalore 2019 Martin Traverso, Dain Sundstrom, David Phillips

Presto Summit Bangalore 2019 - Qubole · Presto Software Foundation “An independent, non-profit organization with the mission of supporting a community of passionate users and developers

Presto Summit Bangalore 2019

Martin Traverso, Dain Sundstrom, David Phillips

Brief History

• First commit on August 8, 2012

• First production deployment in January 2013

• Open sourced in November 2013

• … bunch of things happened, and then…

• Presto Software Foundation founded in January 2019

In the Beginning…

• “It is a good day when I can run 6 Hive queries” - a data scientist

• Peregrine (https://xrds.acm.org/article.cfm?aid=2331056)

And Presto was Born

• Team of 4 engineers

• Initial goal: “make interactive analytics over Hive data better”

• Vision

  • Build a SQL warehouse engine capable of competing with the best commercial engines

  • Make it open source

  • For the long term (20+ years)

  • Fast AND correct

Early Days

• First production version in ~6 months

• Support for SELECT with JOINs and aggregations

• We rewrote everything at least once in between

• Replaced Peregrine by July 2013

Some Missteps

• Modeling imports as materialized views, syntax and all

• Approximate queries (a la BlinkDB)

Some !Missteps

• ANSI SQL

• HTTP

• Plugins

Plugins

• Why?

  • Clean separation between engine and storage

  • FB was running forked version of Hive & HDFS

  • We wanted to open source Presto eventually

• BEST. DECISION. EVER

Presto for User-Facing Apps

[Diagram labels: Website, Aggregator, Analytics Front-end, Time-bucketed Summaries, Events, Query]

Requirements

• “UPSERT” semantics

• Normalized data set

• 15-way joins

• Large data set, but very selective queries

• Interactive latencies (< 5s)

• 24/7 availability
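
The “UPSERT” requirement can be illustrated with a minimal Python sketch. It assumes summaries are keyed by a (site, time bucket) pair, and that an upsert inserts a row when the key is absent and updates the existing row otherwise. The names `bucket_start` and `upsert_summary`, the one-hour bucket width, and the choice to add to the existing count on update are all hypothetical and not taken from the talk.

```python
from typing import Dict, Tuple

BUCKET_SECONDS = 3600  # assumed bucket width (one hour); not from the talk

SummaryKey = Tuple[str, int]  # (site, bucket start as a Unix timestamp)


def bucket_start(timestamp: int, width: int = BUCKET_SECONDS) -> int:
    """Round a Unix timestamp down to the start of its time bucket."""
    return timestamp - (timestamp % width)


def upsert_summary(table: Dict[SummaryKey, int], site: str,
                   timestamp: int, count: int) -> None:
    """Insert a summary row if the key is new, otherwise update it in place."""
    key = (site, bucket_start(timestamp))
    if key in table:
        table[key] += count  # update path (assumed: accumulate, not overwrite)
    else:
        table[key] = count   # insert path


summaries: Dict[SummaryKey, int] = {}
upsert_summary(summaries, "example.com", 7200, 3)   # new bucket: insert
upsert_summary(summaries, "example.com", 7260, 2)   # same bucket: update
upsert_summary(summaries, "example.com", 10800, 1)  # next bucket: insert
```

A real loader might overwrite the stored value instead of accumulating; the point is only that repeated loads for the same key do not create duplicate rows.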

What Storage?

• Hive/HDFS not a good fit

• First attempt: HBase

• Inconsistent performance

• Breaking the abstraction

• Data mapping impedance

• Auto-scaling and auto-balancing?

• Sharded MySQL

Architecture

[Diagram labels: Aggregator, Analytics Front-end, Presto, Loader, Sharded MySQL]

Presto Gained…

• Index JOINs

• Table Layouts

• Latency improvements (a.k.a., avoid check-sleep loops)

• JDBC-based connectors
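
As a rough illustration of the index join idea listed above: when a query is very selective, an engine can look up only the matching keys in an indexed table rather than scanning all of it. The sketch below is plain Python under that assumption, not Presto's implementation; `index_join`, `users_by_id`, and the sample rows are hypothetical.

```python
from typing import Dict, Iterable, Iterator, List, Tuple

Row = Dict[str, object]


def index_join(probe_rows: Iterable[Row], index: Dict[object, List[Row]],
               key: str) -> Iterator[Tuple[Row, Row]]:
    """Join each probe row with indexed rows by direct key lookup (no full scan)."""
    for probe in probe_rows:
        for match in index.get(probe[key], []):
            yield probe, match


# Hypothetical indexed table, standing in for a keyed store.
users_by_id: Dict[object, List[Row]] = {
    1: [{"user_id": 1, "name": "ann"}],
    2: [{"user_id": 2, "name": "bob"}],
    3: [{"user_id": 3, "name": "cat"}],
}

# A selective probe side: two events, so only two lookups are issued.
events: List[Row] = [
    {"user_id": 2, "action": "click"},
    {"user_id": 9, "action": "view"},  # no matching user, so no output row
]

joined = [(e["action"], u["name"])
          for e, u in index_join(events, users_by_id, "user_id")]
```

The cost of this join grows with the number of probe rows rather than with the size of the indexed table, which is why it suits large data sets queried selectively.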

A/B Testing Backend

• Replace analytics backend of A/B testing framework

• Requirements

• Reliable data loads

• 5-10 minute load latency

• Consistent performance

• Seconds to minutes

• Very large data sets

Presto Gained…

• Raptor connector

• Atomic loads

• Data organization

• Balancing, compaction, garbage collection

• INSERT

• DELETE

• Co-located JOINs

Presto for Batch

Challenges

• Capacity and expectations

• Per cluster scalability limits

• Deploying risky changes

• UDFs

• Long running queries vs failures

• High memory queries

Presto Gained…

• Resource groups

• Local scheduling improvements

• Server-controlled session properties

• Lambda expressions

• Grouped execution

Presto for Batch

• By end of 2018…

• > 50% of workload running on Presto

• > 85% of jobs written for Presto

• Largest deployment of Presto at FB

Presto graduates…

Presto Software Foundation

“An independent, non-profit organization with the mission of supporting a community of passionate users and developers devoted to the advancement of the Presto distributed SQL query engine for big data.”

“It is dedicated to preserving the vision of high quality, performant, and dependable software.”

“Ensuring the project remains open, collaborative and independent for decades to come”

Presto Community

• GitHub: https://github.com/prestosql

• Website: https://prestosql.io

• Blog: https://prestosql.io/blog

• Twitter: @prestosql

• Slack: prestosql.slack.com

Since the Launch…

• Launched on January 31, 2019

• 19 releases (1-2 weeks between releases)

• 1600+ commits

• 200k lines changed (> 20% of the codebase)

• 900+ pull requests closed

• 50+ contributors

• 215+ weekly active members on Slack

Contributors

kokosing

raunaqmorarka

pgagnon

MiguelWeezardo

MarvinCai

Praveen2112

chancez

hustnn

kasiafi

sopel39

stagraqubole

yui-knk

Yaliang

dain

11xor6

Lewuathe

garvit-gupta

VicoWu

qqibrow

findepi

pettyjamesm

martint

electrum

vincentpoon

wyukawa

guyco33

bill-warshaw

vkorukanti

anusudarsan

dilipkasana

sshardool

linxingyuan1102

luohao

zhenxiao

rzeyde-varada

takezoe

kabunchi

ryanrupp

ilfrin

ChethanUK

ebyhr

xumingming

Roadmap

Roadmap

• Dynamic

• Real world priorities and requirements

• What volunteers work on

• Not a wish list

• https://github.com/prestosql/presto/labels/roadmap

Roadmap

• Iceberg Connector

• Complex operation pushdown

• Dynamic filtering

• Dynamically-resolved functions
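
Dynamic filtering, as the term is generally used for query engines, means collecting the join keys seen on the smaller (build) side of a join at run time and using them to skip non-matching rows while scanning the larger (probe) side. The sketch below illustrates that idea in plain Python. It is not Presto's design, and `dynamic_filter_join` and the sample tables are hypothetical.

```python
from typing import Dict, List, Tuple

Row = Dict[str, object]


def dynamic_filter_join(build: List[Row], probe: List[Row],
                        key: str) -> Tuple[List[Tuple[Row, Row]], int]:
    """Hash join that filters the probe scan with keys collected from the build side.

    Returns the joined pairs and the number of probe rows skipped by the filter.
    """
    # Build phase: hash the small side and collect its distinct join keys.
    hash_table: Dict[object, List[Row]] = {}
    for row in build:
        hash_table.setdefault(row[key], []).append(row)
    dynamic_filter = set(hash_table)

    # Probe phase: rows whose key is not in the filter are dropped at scan time.
    joined: List[Tuple[Row, Row]] = []
    skipped = 0
    for row in probe:
        if row[key] not in dynamic_filter:
            skipped += 1
            continue
        for match in hash_table[row[key]]:
            joined.append((row, match))
    return joined, skipped


# Hypothetical data: a small dimension table and a larger fact table.
stores = [{"store_id": 7, "city": "Bangalore"}]
sales = [
    {"store_id": 7, "amount": 10},
    {"store_id": 8, "amount": 20},
    {"store_id": 7, "amount": 30},
    {"store_id": 9, "amount": 40},
]

pairs, skipped_rows = dynamic_filter_join(stores, sales, "store_id")
amounts = [sale["amount"] for sale, _store in pairs]
```

In a distributed engine the benefit would come from applying the filter as early as possible, ideally inside the connector's scan, so that skipped rows are never read or shipped across the network.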

Getting Involved

• Join Slack

  • https://prestosql.io/community.html

  • #troubleshooting channel

• File issues/bugs:

  • https://github.com/prestosql/presto

• Write blog posts

  • https://prestosql.io/blog