Derek Gorthy, Senior Software Development Engineer
Yuan Feng, Software Development Engineer
Empowering Zillow’s Developers with Self-Service ETL
Who We Are
Zillow Offers Data Engineering Team @ Zillow
Derek Gorthy, Senior Software Development Engineer, Big Data
Yuan Feng, Software Development Engineer, Big Data
Agenda
● How We Think About Self-Service ETL
● Core Components
● Self-Service ETL in Action at Zillow
  ○ Zetlas
  ○ Zagger
● Next Steps and Takeaways
About Zillow
● Reimagining real estate to make it easier to unlock life’s next chapter
● Offer customers an on-demand experience for selling, buying, renting and financing with transparency and nearly seamless end-to-end service
● Most-visited real estate website in the United States
* As of Q4-2020
How We Think About Self-Service ETL
What Is Self-Service ETL?
[Diagram labels: User Interaction, Pipeline, Configuration File, "?"]
How We Think About Self-Service ETL
[Diagram labels: User Interaction, Interpret, Pipeline Metadata, Render, Pipeline; Configuration File; Opinionated vs. Unopinionated]
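The stages named in the diagram can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Zillow's implementation: the JSON config format, the `PipelineMetadata` fields, the `@daily` default, and the `interpret` and `render` function names are all hypothetical.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class PipelineMetadata:
    """Tool-agnostic description of a pipeline (hypothetical schema)."""
    name: str
    source: str
    sink: str
    schedule: str = "@daily"  # assumed default when the config omits a schedule


def interpret(config_text: str) -> PipelineMetadata:
    """Interpret step: turn a user's configuration file into pipeline metadata."""
    raw = json.loads(config_text)
    missing = {"name", "source", "sink"} - raw.keys()
    if missing:
        raise ValueError(f"config is missing required keys: {sorted(missing)}")
    return PipelineMetadata(
        name=raw["name"],
        source=raw["source"],
        sink=raw["sink"],
        schedule=raw.get("schedule", "@daily"),
    )


def render(meta: PipelineMetadata) -> str:
    """Render step: a real renderer would emit tool-specific code; this emits a text plan."""
    return f"{meta.name}: copy {meta.source} -> {meta.sink} on {meta.schedule}"


# User interaction: the user supplies only a small configuration file.
config = '{"name": "daily_listings", "source": "s3://raw/listings", "sink": "warehouse.listings"}'
plan = render(interpret(config))
print(plan)
```

Keeping the metadata independent of any one tool is what would let the render step be swapped without touching the user-facing config.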
Core Components
● User Interaction
● Interpret User Input
● Pipeline Metadata
● Render Pipeline
● Data Pipeline & Shared Integrations
Self-Service ETL in Action at Zillow
Applied Self-Service ETL - Zetlas
Motivation
● Modernized and reliable self-service tool to automate SQL-based workflows
● No coding experience needed to create ETL workflows
Features
● UI-driven
● Rapid prototyping and deployment
● Job monitoring/alerting
● Automated validation
● Integration with multiple internal services
● Scalable and expandable
Target Users
● Data scientists
● Data analysts
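As one illustration of what automated validation of a SQL-based workflow could look like, the sketch below checks that every step has SQL, that every dependency exists, and that the steps can be ordered without a cycle. The slides do not describe how Zetlas validates workflows; the `steps` structure, the `depends_on` key, and `validate_workflow` are hypothetical. It relies on the standard-library `graphlib` module (Python 3.9+).

```python
from graphlib import CycleError, TopologicalSorter


def validate_workflow(steps: dict[str, dict]) -> list[str]:
    """Return step names in a runnable order, or raise ValueError."""
    for name, step in steps.items():
        if not step.get("sql", "").strip():
            raise ValueError(f"step {name!r} has no SQL")
        unknown = set(step.get("depends_on", [])) - steps.keys()
        if unknown:
            raise ValueError(f"step {name!r} depends on unknown steps: {sorted(unknown)}")

    # Map each step to the set of steps that must run before it.
    graph = {name: set(step.get("depends_on", [])) for name, step in steps.items()}
    try:
        return list(TopologicalSorter(graph).static_order())
    except CycleError as exc:
        # The second element of args holds the nodes that form the cycle.
        raise ValueError(f"workflow has a dependency cycle: {exc.args[1]}") from exc


workflow = {
    "stage": {"sql": "SELECT * FROM raw_events"},
    "aggregate": {"sql": "SELECT day, COUNT(*) FROM stage GROUP BY day", "depends_on": ["stage"]},
    "publish": {"sql": "INSERT INTO report SELECT * FROM aggregate", "depends_on": ["aggregate"]},
}
order = validate_workflow(workflow)
print(order)
```

Running checks like these before deployment is one way a UI-driven tool could reject a broken workflow without the user reading scheduler logs.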
Applied Self-Service ETL - Zagger
Motivation
● Provide a developer-friendly abstraction from ETL tools
● Create a service that automates data engineering ancillary tasks
● Create common processing patterns for fast pipeline development
Features
● Integrates with Terraform
● Exposes create/delete endpoints for other access patterns
● Allows for custom interpreter creation
● Integration with multiple internal services
Target Users
● Data engineers
● Data producer teams
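The create/delete endpoints can be pictured with a small in-memory stand-in. This sketch assumes, without confirmation from the slides, that both operations are safe to repeat, which tends to suit declarative clients; the `PipelineRegistry` class, its method names, and the status strings are hypothetical and do not reflect Zagger's actual API.

```python
class PipelineRegistry:
    """In-memory stand-in for a managed service's pipeline store (hypothetical)."""

    def __init__(self) -> None:
        self._pipelines: dict[str, dict] = {}

    def create(self, name: str, config: dict) -> dict:
        """Register a pipeline; re-creating with an identical config is a no-op."""
        existing = self._pipelines.get(name)
        if existing is not None and existing != config:
            raise ValueError(f"pipeline {name!r} already exists with a different config")
        self._pipelines[name] = dict(config)
        return {"name": name, "status": "created" if existing is None else "unchanged"}

    def delete(self, name: str) -> dict:
        """Remove a pipeline; deleting a missing pipeline is not an error."""
        removed = self._pipelines.pop(name, None) is not None
        return {"name": name, "status": "deleted" if removed else "absent"}


registry = PipelineRegistry()
first = registry.create("listings_ingest", {"kind": "sql", "target": "airflow"})
second = registry.create("listings_ingest", {"kind": "sql", "target": "airflow"})
gone = registry.delete("listings_ingest")
print(first["status"], second["status"], gone["status"])
```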
Zagger Integrations
[Diagram labels: Zagger Pipeline Utilities Package, User Interaction, Zagger Managed Service, Integrations, Execution, Zetlas, DQ Module, API, Parser 1 ... Parser N, Airflow Renderer ... Kafka Renderer]
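The Parser and Renderer boxes in the diagram suggest a plug-in arrangement, which might look roughly like the sketch below. Everything here is assumed for illustration: the `parser` and `renderer` decorators, the `kind` and `target` config keys, and the metadata shape are hypothetical, and the renderers return descriptive strings rather than real Airflow or Kafka artifacts.

```python
from typing import Callable

# Registries keyed by input kind and output target (both hypothetical).
PARSERS: dict[str, Callable[[dict], dict]] = {}
RENDERERS: dict[str, Callable[[dict], str]] = {}


def parser(kind: str):
    """Decorator that registers a parser for one kind of user input."""
    def register(fn):
        PARSERS[kind] = fn
        return fn
    return register


def renderer(target: str):
    """Decorator that registers a renderer for one execution target."""
    def register(fn):
        RENDERERS[target] = fn
        return fn
    return register


@parser("sql")
def parse_sql(config: dict) -> dict:
    # Produce tool-agnostic metadata from a SQL-style config.
    return {"name": config["name"], "tasks": list(config["queries"])}


@renderer("airflow")
def render_airflow(meta: dict) -> str:
    return f"airflow DAG '{meta['name']}' with {len(meta['tasks'])} task(s)"


@renderer("kafka")
def render_kafka(meta: dict) -> str:
    return f"kafka consumer '{meta['name']}' with {len(meta['tasks'])} handler(s)"


def build(config: dict) -> str:
    """Pick a parser by input kind and a renderer by target, then chain them."""
    meta = PARSERS[config["kind"]](config)
    return RENDERERS[config["target"]](meta)


result = build({"kind": "sql", "target": "airflow", "name": "daily_report", "queries": ["SELECT 1"]})
print(result)
```

Because parsers and renderers meet only at the metadata, adding Parser N or another renderer would not require changes to the other side.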
Next Steps and Takeaways
Development Timeline
2019 to 2021
● Pipeler shared Spark processing library development
● Zetlas official launch in Zillow
● Zagger Managed Service and Pipeline Utilities Package library
● User growth for Zagger and Zetlas
● ZETL retirement
● Zetlas and Zagger backend unification
Takeaways
● UI must be designed to meet the needs of its users
● Self-service ETL isn’t just for non-data engineers
● Modular platform design allows capabilities to be developed piecemeal
● Abstraction from tool-specific implementation gives flexibility
More From Zillow
● Democratizing Data Quality Through a Centralized Platform (5/27 @ 3:15 PM PST)
● Scaling AutoML-Driven Anomaly Detection With Luminaire (5/27 @ 5:00 PM PST)
Questions?
Thank you!
https://www.zillow.com/careers/