Empowering Zillow’s Developers with Self-Service ETL


Derek Gorthy, Senior Software Development Engineer

Yuan Feng, Software Development Engineer

Empowering Zillow’s Developers with Self-Service ETL


Who We Are

Zillow Offers Data Engineering Team @ Zillow

Derek Gorthy, Senior Software Development Engineer, Big Data

Yuan Feng, Software Development Engineer, Big Data


Agenda

● How We Think About Self-Service ETL

● Core Components

● Self-Service ETL in Action at Zillow
    ○ Zetlas
    ○ Zagger

● Next Steps and Takeaways

Zillow

About Zillow

● Reimagining real estate to make it easier to unlock life’s next chapter

● Offer customers an on-demand experience for selling, buying, renting and financing with transparency and nearly seamless end-to-end service

● Most-visited real estate website in the United States*

* As of Q4-2020

How We Think About Self-Service ETL

Zagger Integrations

[Diagram. Section labels: User Interaction, Zagger Managed Service, Integrations, Execution. Component labels: Zetlas, API, Parser 1 ... Parser N, Airflow Renderer ... Kafka Renderer, DQ Module, Zagger Pipeline Utilities Package.]

What Is Self-Service ETL?

[Diagram: User Interaction (Configuration File) → ? → Pipeline]

How We Think About Self-Service ETL

[Diagram: User Interaction → Interpret → Pipeline Metadata → Render → Pipeline. Other labels: Configuration File; a scale from Opinionated to Unopinionated.]
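The flow in the diagram (user input is interpreted into pipeline metadata, which is then rendered into a pipeline) can be sketched as follows. This is a minimal illustration only: `PipelineMetadata`, `interpret`, `render`, and the config keys are hypothetical names chosen for the example, not Zillow's actual interfaces, and the "rendered pipeline" here is just a plain-text plan.

```python
from dataclasses import dataclass, field


@dataclass
class PipelineMetadata:
    """Tool-agnostic description of a pipeline (hypothetical shape)."""
    name: str
    schedule: str
    steps: list = field(default_factory=list)


def interpret(config: dict) -> PipelineMetadata:
    """Turn raw user input into pipeline metadata, rejecting incomplete input."""
    missing = [key for key in ("name", "schedule", "steps") if key not in config]
    if missing:
        raise ValueError(f"missing required keys: {missing}")
    return PipelineMetadata(config["name"], config["schedule"], list(config["steps"]))


def render(metadata: PipelineMetadata) -> str:
    """Render metadata into a plain-text plan. A real renderer would emit
    artifacts for an execution tool instead (assumption for illustration)."""
    lines = [f"pipeline {metadata.name} (schedule: {metadata.schedule})"]
    lines += [f"  step {i}: {step}" for i, step in enumerate(metadata.steps, start=1)]
    return "\n".join(lines)


# Stand-in for a parsed configuration file.
config = {"name": "daily_sales", "schedule": "daily", "steps": ["extract", "load"]}
plan = render(interpret(config))
```

Keeping the metadata separate from both the input format and the output format is what lets each side change independently in a design like this.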

Core Components

User Interaction


Interpret User Input


Pipeline Metadata


Render Pipeline


Data Pipeline & Shared Integrations


Self-Service ETL in Action at Zillow

Applied Self-Service ETL - Zetlas

Motivation

● Modernized and reliable self-service tool to automate SQL-based workflows

● No coding experience needed to create ETL workflows

Features

● UI-driven

● Rapid prototyping and deployment

● Job monitoring/alerting

● Automated validation

● Integration with multiple internal services

● Scalable and expandable

Target Users

● Data scientists

● Data analysts

Applied Self-Service ETL - Zagger

Motivation

● Provide a developer-friendly abstraction from ETL tools

● Create a service that automates ancillary data engineering tasks

● Create common processing patterns for fast pipeline development

Features

● Integrates with Terraform

● Exposes create/delete endpoints for other access patterns

● Allows for custom interpreter creation

● Integration with multiple internal services

Target Users

● Data engineers

● Data producer teams
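One common way to support the "custom interpreter creation" feature listed above is a registry that maps an input format to an interpreter function. The sketch below is an assumption about how such a mechanism could look, not Zagger's implementation; `INTERPRETERS`, `register_interpreter`, and the `keyvalue` format are hypothetical.

```python
import json

# Maps a format name to the function that interprets input in that format.
INTERPRETERS = {}


def register_interpreter(fmt):
    """Decorator that registers an interpreter function under a format name."""
    def decorator(func):
        INTERPRETERS[fmt] = func
        return func
    return decorator


@register_interpreter("json")
def interpret_json(text):
    """Built-in interpreter: parse a JSON document into a dict."""
    return json.loads(text)


@register_interpreter("keyvalue")
def interpret_keyvalue(text):
    """Custom interpreter: parse 'key=value' lines, skipping other lines."""
    pairs = (line.split("=", 1) for line in text.splitlines() if "=" in line)
    return {key.strip(): value.strip() for key, value in pairs}


def interpret(fmt, text):
    """Dispatch to whichever interpreter is registered for the format."""
    if fmt not in INTERPRETERS:
        raise KeyError(f"no interpreter registered for format {fmt!r}")
    return INTERPRETERS[fmt](text)
```

With this shape, a team adds a new input format by registering one function; the dispatching code does not change.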

Zagger Integrations

[Diagram. Section labels: User Interaction, Zagger Managed Service, Integrations, Execution. Component labels: Zetlas, API, Parser 1 ... Parser N, Airflow Renderer ... Kafka Renderer, DQ Module, Zagger Pipeline Utilities Package.]

Next Steps and Takeaways

Development Timeline

2019 2020 2021

● Pipeler shared Spark processing library development

● Zetlas official launch in Zillow

● Zagger Managed Service and Pipeline Utilities Package library

● User growth for Zagger and Zetlas

● ZETL retirement

● Zetlas and Zagger backend unification

Takeaways

● UI must be designed to meet the needs of its users

● Self-service ETL isn’t just for non-data engineers

● Modular platform design allows capabilities to be developed piecemeal

● Abstraction from tool-specific implementation gives flexibility
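The last takeaway, abstraction from tool-specific implementation, can be illustrated by rendering one piece of pipeline metadata for two different targets. This is a hedged sketch: the target names `batch` and `stream`, the functions, and the output dictionaries are invented for the example and do not represent real Airflow or Kafka artifacts.

```python
def render_batch(metadata):
    """Render metadata as a scheduled job description (hypothetical format)."""
    return {
        "job": metadata["name"],
        "schedule": metadata["schedule"],
        "tasks": list(metadata["steps"]),
    }


def render_stream(metadata):
    """Render the same metadata as a streaming description (hypothetical format)."""
    return {
        "topic": f"{metadata['name']}.events",
        "processors": list(metadata["steps"]),
    }


# Adding a new execution tool means adding one entry here.
RENDERERS = {"batch": render_batch, "stream": render_stream}


def render(target, metadata):
    """Pick the renderer for the requested target and apply it."""
    if target not in RENDERERS:
        raise KeyError(f"no renderer registered for target {target!r}")
    return RENDERERS[target](metadata)


metadata = {"name": "daily_sales", "schedule": "daily", "steps": ["extract", "load"]}
batch_plan = render("batch", metadata)
stream_plan = render("stream", metadata)
```

Because users only ever describe the pipeline in metadata, swapping or adding an execution tool touches the renderers and nothing upstream of them.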

More From Zillow

Democratizing Data Quality Through a Centralized Platform, 5/27 @ 3:15 PM PST

Scaling AutoML-Driven Anomaly Detection With Luminaire, 5/27 @ 5:00 PM PST

Questions? Thank you!

https://www.zillow.com/careers/
