Empowering Zillow’s Developers with Self-Service ETL

Page 1: Empowering Zillow’s Developers with Self-Service ETL

Derek Gorthy, Senior Software Development Engineer

Yuan Feng, Software Development Engineer

Empowering Zillow’s Developers with Self-Service ETL

Page 2

Who We Are

Zillow Offers Data Engineering Team @ Zillow

Derek Gorthy, Senior Software Development Engineer, Big Data

Yuan Feng, Software Development Engineer, Big Data

Page 3

Agenda

● How We Think About Self-Service ETL

● Core Components

● Self-Service ETL in Action at Zillow

    ○ Zetlas

    ○ Zagger

● Next Steps and Takeaways

Page 4

Zillow

Page 5

About Zillow

● Reimagining real estate to make it easier to unlock life’s next chapter

● Offer customers an on-demand experience for selling, buying, renting and financing with transparency and nearly seamless end-to-end service

● Most-visited real estate website in the United States

* As of Q4-2020

Page 6

How We Think About Self-Service ETL

Page 7

[Diagram labels: User Interaction; Zetlas; API; Zagger Managed Service; Parser 1 ... Parser N; Airflow Renderer ... Kafka Renderer; Execution; Zagger Pipeline Utilities Package; DQ Module; Integrations; Zagger Integrations]

Page 8

What Is Self-Service ETL?

[Diagram labels: User Interaction; Configuration File; ?; Pipeline]

Page 9

How We Think About Self-Service ETL

[Diagram labels: User Interaction; Configuration File; Interpret; Pipeline Metadata; Render; Pipeline; Opinionated / Unopinionated]
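
A minimal sketch of how the stages named on this slide could fit together, assuming the flow runs configuration file, interpret, pipeline metadata, render, pipeline. That ordering is one plausible reading of the diagram, not something the slides state. Every name below (Step, PipelineMetadata, interpret, render, the config keys) is hypothetical and is not Zillow's actual API.

```python
from dataclasses import dataclass, field


@dataclass
class Step:
    """One unit of work in the pipeline (hypothetical shape)."""
    name: str
    sql: str
    depends_on: list[str] = field(default_factory=list)


@dataclass
class PipelineMetadata:
    """Tool-agnostic description of a pipeline (hypothetical shape)."""
    pipeline_id: str
    schedule: str
    steps: list[Step]


def interpret(config: dict) -> PipelineMetadata:
    """Turn a user-facing configuration into pipeline metadata."""
    steps = [
        Step(name=s["name"], sql=s["sql"], depends_on=list(s.get("depends_on", [])))
        for s in config["steps"]
    ]
    return PipelineMetadata(
        pipeline_id=config["name"],
        schedule=config.get("schedule", "@daily"),  # assumed default
        steps=steps,
    )


def render(metadata: PipelineMetadata) -> str:
    """Emit a plain-text plan; a real renderer would target a specific tool."""
    lines = [f"pipeline {metadata.pipeline_id} schedule={metadata.schedule}"]
    for step in metadata.steps:
        deps = ",".join(step.depends_on) or "-"
        lines.append(f"  step {step.name} after [{deps}]")
    return "\n".join(lines)


# Illustrative configuration; the keys are assumptions, not a documented format.
config = {
    "name": "daily_listings",
    "steps": [
        {"name": "extract", "sql": "SELECT * FROM raw_listings"},
        {"name": "aggregate", "sql": "SELECT COUNT(*) FROM extract", "depends_on": ["extract"]},
    ],
}
plan = render(interpret(config))
print(plan)
```

Because render only sees PipelineMetadata, a second renderer for a different tool could be added without touching interpret or the configuration format.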

Page 10

Core Components

Page 11

User Interaction

[Diagram labels: User Interaction; Configuration File; Interpret; Pipeline Metadata; Render; Pipeline; Opinionated / Unopinionated]

Page 12

Interpret User Input

[Diagram labels: User Interaction; Configuration File; Interpret; Pipeline Metadata; Render; Pipeline; Opinionated / Unopinionated]

Page 13

Pipeline Metadata

[Diagram labels: User Interaction; Configuration File; Interpret; Pipeline Metadata; Render; Pipeline; Opinionated / Unopinionated]

Page 14

Render Pipeline

[Diagram labels: User Interaction; Configuration File; Interpret; Pipeline Metadata; Render; Pipeline; Opinionated / Unopinionated]

Page 15

Data Pipeline & Shared Integrations

[Diagram labels: User Interaction; Configuration File; Interpret; Pipeline Metadata; Render; Pipeline; Opinionated / Unopinionated]

Page 16

Self-Service ETL in Action at Zillow

Page 17

Applied Self-Service ETL - Zetlas

Motivation

● Modernized and reliable self-service tool to automate SQL-based workflows

● No coding experience needed to create ETL workflows

Features

● UI-driven

● Rapid prototyping and deployment

● Job monitoring/alerting

● Automated validation

● Integration with multiple internal services

● Scalable and expandable

Target Users

● Data scientists

● Data analysts
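
The slides do not say what Zetlas's automated validation checks. As an illustration only, the sketch below shows the kind of checks a SQL-based workflow tool could run before deployment: unique step names, non-empty SQL, known dependencies, and no dependency cycles. The step format and the function name validate_workflow are assumptions made for this example.

```python
from graphlib import CycleError, TopologicalSorter


def validate_workflow(steps: list[dict]) -> list[str]:
    """Return step names in a runnable order, or raise ValueError.

    Each step is assumed to look like
    {"name": str, "sql": str, "depends_on": [str, ...]}.
    """
    names = [s["name"] for s in steps]
    if len(names) != len(set(names)):
        raise ValueError("duplicate step name")

    graph = {}
    for s in steps:
        if not s.get("sql", "").strip():
            raise ValueError(f"step {s['name']!r} has no SQL")
        deps = s.get("depends_on", [])
        unknown = [d for d in deps if d not in names]
        if unknown:
            raise ValueError(f"step {s['name']!r} depends on unknown step(s): {unknown}")
        # TopologicalSorter expects each node mapped to its predecessors.
        graph[s["name"]] = set(deps)

    try:
        return list(TopologicalSorter(graph).static_order())
    except CycleError as exc:
        raise ValueError("workflow has a dependency cycle") from exc


# Steps are listed out of order on purpose; validation returns a runnable order.
order = validate_workflow([
    {"name": "report", "sql": "SELECT * FROM clean", "depends_on": ["clean"]},
    {"name": "load", "sql": "SELECT * FROM source"},
    {"name": "clean", "sql": "SELECT DISTINCT * FROM load", "depends_on": ["load"]},
])
print(order)
```

Running checks like these on the workflow definition, before anything is scheduled, is one way a UI-driven tool can give users without coding experience fast feedback.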

Page 19

Applied Self-Service ETL - Zagger

Motivation

● Provide a developer-friendly abstraction from ETL tools

● Create a service that automates data engineering ancillary tasks

● Create common processing patterns for fast pipeline development

Features

● Integrates with Terraform

● Exposes create/delete endpoints for other access patterns

● Allows for custom interpreter creation

● Integration with multiple internal services

Target Users

● Data engineers

● Data producer teams
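
"Custom interpreter creation" suggests a plug-in point. The sketch below shows one common way to build such a point, a registry keyed by a "kind" field in the configuration. It is an assumption about the general pattern, not Zagger's real interface; register_interpreter, the "kind" key, and both example interpreters are hypothetical.

```python
from typing import Callable

Interpreter = Callable[[dict], dict]

# Maps a configuration "kind" to the function that interprets it.
_INTERPRETERS: dict[str, Interpreter] = {}


def register_interpreter(kind: str):
    """Decorator that registers a custom interpreter under `kind`."""
    def decorator(func: Interpreter) -> Interpreter:
        if kind in _INTERPRETERS:
            raise ValueError(f"interpreter already registered for {kind!r}")
        _INTERPRETERS[kind] = func
        return func
    return decorator


def interpret(config: dict) -> dict:
    """Dispatch a configuration to the interpreter registered for its kind."""
    kind = config.get("kind")
    try:
        interpreter = _INTERPRETERS[kind]
    except KeyError:
        raise ValueError(f"no interpreter registered for kind {kind!r}") from None
    return interpreter(config)


@register_interpreter("sql_batch")
def interpret_sql_batch(config: dict) -> dict:
    # Hypothetical processing pattern: run a list of SQL queries as a batch.
    return {"mode": "batch", "tasks": [q.strip() for q in config["queries"]]}


@register_interpreter("stream_copy")
def interpret_stream_copy(config: dict) -> dict:
    # Hypothetical processing pattern: copy records from a source to a sink.
    return {"mode": "streaming", "tasks": [f"copy {config['source']} -> {config['sink']}"]}


print(interpret({"kind": "sql_batch", "queries": ["SELECT 1"]}))
```

With a registry like this, a team adds a new processing pattern by registering one function, and callers keep using the same interpret entry point.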

Page 20

[Diagram labels: User Interaction; Zetlas; API; Zagger Managed Service; Parser 1 ... Parser N; Airflow Renderer ... Kafka Renderer; Execution; Zagger Pipeline Utilities Package; DQ Module; Integrations; Zagger Integrations]

Page 21

Next Steps and Takeaways

Page 22

Development Timeline

2019 to 2021

● Pipeler shared Spark processing library development

● Zetlas official launch in Zillow

● Zagger Managed Service and Pipeline Utilities Package library

● User Growth for Zagger and Zetlas

● ZETL retirement

● Zetlas and Zagger backend unification

Page 23

Takeaways

● UI must be designed to meet the needs of its users

● Self-service ETL isn’t just for non-data engineers

● Modular platform design allows capabilities to be developed piecemeal

● Abstraction from tool-specific implementation gives flexibility

Page 24

More From Zillow

Democratizing Data Quality Through a Centralized Platform, 5/27 @ 3:15 PM PST

Scaling AutoML-Driven Anomaly Detection With Luminaire, 5/27 @ 5:00 PM PST

Page 25

Questions? Thank you!

https://www.zillow.com/careers/