© 2019 Snowflake Inc. All Rights Reserved ARCHITECTING MODERN DATA APPLICATIONS MEGAN SCHOENDORF SNOWFLAKE BRAD CULBERSON TWILIO VIKRAM KAPOOR LACEWORK

ARCHITECTING MODERN DATA APPLICATIONS · 2019-12-01 · Lacework automates security across AWS, Azure, GCP, and private clouds, providing a comprehensive view of risks across cloud



ARCHITECTING MODERN DATA APPLICATIONS

MEGAN SCHOENDORF SNOWFLAKE

BRAD CULBERSON TWILIO

VIKRAM KAPOOR LACEWORK


MEGAN SCHOENDORF: Senior Engineer, Snowflake (ex-Stride)

BRAD CULBERSON: Sr Principal Engineer, Twilio SendGrid

VIKRAM KAPOOR: Co-founder and CTO, Lacework


AGENDA

• SOA: Micro (just right) Services

• CI/CD: Deploy early, often, and automatically

• Twilio SendGrid

• Lacework

• Q&A

SERVICE ORIENTED ARCHITECTURE: Micro (just right) Services

MICRO (JUST RIGHT) SERVICES

[Diagram labels: Data, Apps, Services, Internal, Organization]

• Encapsulation of logic

• Reusable components

• Loosely coupled

Independently...

• Architected

• Deployed

• Scaled


MICRO (JUST RIGHT) SERVICES

Why is This Hard?

1. Overhead

2. Database concurrency limitations

3. Independently deployable services

[Diagram labels: Apps, Internal, Data]

SNOWFLAKE


SNOWFLAKE COMPLEMENTS A SOA

• Multi-Cluster Compute: Autoscale the number of warehouses to meet concurrency demand

• Storage and Compute: Tune your services by selecting the right size warehouse for the job

• Software as a Service: Fully hosted + Snowpipe + Tasks = fewer services to manage

• Locking Policy: Reads and inserts are non-blocking. Updates and deletes lock micropartitions

[Diagram labels: Cloud Agnostic Layer, Multi-Cluster Compute, Scale Out Services, Centralized Storage]

CI/CD: Deploy early, often, and automatically

CI/CD

0. Develop locally in a branch

1. Open pull request

2. Review, compile, test, build

3. Merge

4. Deploy to Staging

5. Final tests and approvals

6. Deploy to Prod

[Diagram: steps 0 through 6 on a timeline labeled Continuous Integration (CI) and Continuous Delivery (CD)]

CI/CD: What Slows Down Deployments?

• Anything that needs to happen in step 5 (final tests) that cannot happen in step 2 (review, compile, test, build)

• Schema + data migrations that require coordinated deploys or downtime

How do you fix?

• Require all tests to happen before merging to master

• Never make backwards incompatible schema changes
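One way to apply the last rule is to check proposed schema changes in CI before merging. The following is a minimal sketch, assuming schemas are represented as plain column-name-to-type dictionaries; that representation and the function name `breaking_changes` are illustrative and not from the talk. It treats added columns as safe and dropped or retyped columns as backwards incompatible.

```python
def breaking_changes(old_schema: dict[str, str], new_schema: dict[str, str]) -> list[str]:
    """Return the reasons the new schema would break existing readers.

    Hypothetical rule set: adding a column is allowed; dropping a column
    or changing its type counts as backwards incompatible.
    """
    problems = []
    for column, old_type in old_schema.items():
        if column not in new_schema:
            problems.append(f"column dropped: {column}")
        elif new_schema[column] != old_type:
            problems.append(f"type changed: {column} {old_type} -> {new_schema[column]}")
    return problems


# Illustrative schemas only.
old = {"id": "NUMBER", "email": "STRING"}
additive = {"id": "NUMBER", "email": "STRING", "attributes": "VARIANT"}
destructive = {"id": "STRING"}

print(breaking_changes(old, additive))     # no problems: the change only adds a column
print(breaking_changes(old, destructive))  # lists the retyped and dropped columns
```

A CI job could fail the build whenever the returned list is non-empty, which keeps old and new service versions able to run against the same tables during a rolling deploy.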

[Diagram: CI/CD timeline, steps 0 through 6, with step 2 (Review, compile, test, build) and step 5 (Final tests) highlighted]

SNOWFLAKE


SNOWFLAKE COMPLEMENTS CI/CD

• Unstructured Data: Performant queries on VARIANT columns mean you get the benefits of a NoSQL database

• Zero Copy Clones: Clone an entire database in seconds to run unit, integration, and regression tests in an isolated environment.
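As an illustration of the zero-copy clone point, a CI job could create a throwaway database per build. The sketch below only builds the SQL text using Snowflake's `CREATE DATABASE ... CLONE` statement; the database names and the helper name `clone_database_sql` are assumptions, and how the statement is executed is left to whichever client the pipeline uses.

```python
import re

# Plain unquoted identifiers only, because names are interpolated into SQL.
_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_$]*")


def clone_database_sql(source_db: str, clone_db: str) -> str:
    """Build a zero-copy clone statement for an isolated test database."""
    for name in (source_db, clone_db):
        if not _IDENTIFIER.fullmatch(name):
            raise ValueError(f"unsafe identifier: {name!r}")
    return f"CREATE DATABASE {clone_db} CLONE {source_db}"


# Hypothetical names: a production database and a per-build clone.
print(clone_database_sql("PROD", "CI_BUILD_1234"))
```

Because the clone shares storage with its source until data diverges, the test suite can mutate it freely and drop it afterwards without touching production data.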

[Diagram labels: Cloud Agnostic Layer, Multi-Cluster Compute, Scale Out Services, Centralized Storage]

TWILIO SENDGRID


ABOUT TWILIO SENDGRID


Send more than 50B transactional and marketing emails every month

• Founded in 2009

• IPO in Nov 2017 ~40% Average growth

• Twilio acquisition closed Feb 2019 ~80% Average growth

Marketing & Growth Product

• Product for Marketers to leverage Mail Pipeline for Email and Ad Campaigns

• Store contacts and events for customers - with segmentation

• Schedule and send Email or Ad Campaigns to targeted audiences

• $5.2M in Q4 2018 double over 2017

REQUIREMENTS: Marketing Campaigns

Customer Facing Application with HA

• Store contacts and events with custom attributes

• Flexible and Fast contact segmentation

Goals:

• Highly Durable and Available

• Fast Ingest and Egress

• Sustain growth at 100%

OTHER FACTORS: Marketing Campaigns

• Moving out of our 3 Data Centers

• Very large contract and commit with AWS

• Access to AWS architects, support, and product teams

• Need dramatically faster product velocity

WHY SNOWFLAKE

Storage and Compute Separation

• Large usage variability weekly/seasonally

• Can EASILY add compute w/o redistributing data

• Data at rest is vastly cheaper

• Can add other workloads to database w/o impacting customers: Ingress, Egress, BI, ML

Performance

• Columnar indexes and clustering allow for FAST segmentation

• Supports high concurrency on demand

VARIANT Type

• Allows us to store AND query custom attributes easily
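To make the VARIANT point concrete, here is a sketch of how a segmentation filter over custom attributes might be turned into a query. It uses Snowflake's `column:key::type` path syntax for VARIANT values; the table `contacts`, the columns `contact_id` and `attributes`, the `%s` placeholder style, and the function name are all assumptions for illustration, not details of the SendGrid schema.

```python
import re

# Attribute keys are interpolated into the path expression, so restrict them.
_KEY = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def segment_query(filters: dict[str, str]) -> tuple[str, list[str]]:
    """Build a parameterized query matching contacts on custom attributes."""
    clauses, params = [], []
    for key, value in filters.items():
        if not _KEY.fullmatch(key):
            raise ValueError(f"unsafe attribute key: {key!r}")
        # attributes:<key> walks into the VARIANT; ::string casts the result.
        clauses.append(f"attributes:{key}::string = %s")
        params.append(value)
    where = " AND ".join(clauses) if clauses else "TRUE"
    return f"SELECT contact_id FROM contacts WHERE {where}", params


sql, params = segment_query({"plan": "pro", "country": "DE"})
print(sql)
print(params)
```

New custom attributes need no schema migration in this model: they simply appear as new keys inside the VARIANT column and become filterable immediately.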


WHY SNOWFLAKE

Uptime

• Good enough uptime in west-2 for Beta

Client Libraries

• Native Golang driver very stable

Synchronous Usage

• Could put synchronous load on the warehouses from our application

• Auto scaling of warehouse critical along with the index and clustering performance

ARCHITECTURE

[Architecture diagram with three panels: CONTACT INGEST, EVENT INGEST, SEGMENTATION. Components shown: Prod DB, Kinesis, Kinesis Firehose, S3, Virtual Warehouse, Lambda, API Gateway, SQS, EC2, Dynamo, Kafka]

SURPRISE: Marketing Campaigns

Developers LOVE Snowflake

• SQL Compliant

• It just works

Time Travel

• Truncate table in production

Complements OLTP

• Stores all the long term data

• Available in background to populate DynamoDB “caches”.
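The Time Travel bullet above mentions a table truncated in production. As a hedged sketch of how such a table might be recovered, the helper below builds a Snowflake `CREATE TABLE ... CLONE ... AT(OFFSET => ...)` statement that clones the table as it existed a given number of seconds ago. The table names and the helper name are hypothetical, and recovery only works within the table's Time Travel retention period.

```python
import re

# Plain unquoted identifiers only, because names are interpolated into SQL.
_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_$]*")


def restore_table_sql(table: str, restored: str, seconds_ago: int) -> str:
    """Build a statement cloning `table` as of `seconds_ago` seconds in the past."""
    for name in (table, restored):
        if not _IDENTIFIER.fullmatch(name):
            raise ValueError(f"unsafe identifier: {name!r}")
    if seconds_ago <= 0:
        raise ValueError("seconds_ago must be positive")
    # A negative OFFSET means "this many seconds before the current time".
    return f"CREATE TABLE {restored} CLONE {table} AT(OFFSET => -{seconds_ago})"


# Hypothetical scenario: EVENTS was truncated a few minutes ago.
print(restore_table_sql("EVENTS", "EVENTS_RESTORED", 600))
```

After checking the restored copy, it can be swapped in for the damaged table or used as the source for re-inserting the lost rows.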

RESULTS

Product is in Production with over 2000 Paid Customers

Product Requirements Exceeded

• Ingress and Egress faster

• Segmentation is more capable

• Mail and Custom Events are marginal load and affordable

• Much simpler to develop new features

• Undifferentiated heavy lifting minimized

LACEWORK

Cloud Security: The Shared Responsibility Model

“Through 2022, at least 95% of cloud security failures will be the fault of the customer.” – [attribution not preserved]

“If you can configure it, you need to secure it.” – [attribution not preserved]

• Security *of* the cloud

• Security *in* the cloud

Cloud Security: Challenges

• Dynamic Environment

• Containers & PaaS

• Encryption & Low Network Visibility

• New Attack Surfaces

Lacework’s Unified Cloud Security Platform

Lacework automates security across AWS, Azure, GCP, and private clouds, providing a comprehensive view of risks across cloud workloads and containers. Lacework’s unified cloud security platform provides unprecedented visibility, automates intrusion detection, delivers one-click investigation, and simplifies cloud compliance.

Lacework’s Unified Cloud Security Platform

Observations

• Application launch

• Initiated connection

• External IP calls

• Information exchanged

• Configurations

• File changes

• Threat intelligence

• Scored alerts

• Compliance violations

• Visibility: graphical view

• All data to investigate

POLYGRAPH™

Users

Apps

Processes

Containers

Accounts

Files

Hosts

Connections

• Analyze behaviors

• Define rules of normal behavior

• Detect deviations

• Decide whether anomaly is a threat
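The four steps above describe baseline-and-deviation detection in general terms. The toy sketch below illustrates only that general idea and is not Lacework's Polygraph algorithm: it assumes behavior is observed as (process, destination) pairs, treats every pair seen during a learning window as normal, and flags any later pair outside that set. All names and sample data are hypothetical.

```python
Event = tuple[str, str]  # (process name, destination) - an assumed event shape


def learn_baseline(events: list[Event]) -> set[Event]:
    """Treat every behavior seen in the learning window as normal."""
    return set(events)


def detect_deviations(baseline: set[Event], events: list[Event]) -> list[Event]:
    """Return observed behaviors that were never seen in the baseline."""
    return [event for event in events if event not in baseline]


# Hypothetical observations.
training = [("nginx", "10.0.0.5:5432"), ("worker", "10.0.0.7:6379")]
live = [("nginx", "10.0.0.5:5432"), ("nginx", "10.0.0.9:22")]

baseline = learn_baseline(training)
print(detect_deviations(baseline, live))
```

Deciding whether a flagged deviation is an actual threat, the last step in the list, would need additional context such as threat intelligence or scoring, which this sketch leaves out.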


Cloud security: A big data problem

Incoming Data | Past 30 days

• Compliance resources: 356K

• CloudTrail logs: 14.2M

• Processes: 797M

• Connections: 234B

• Files scanned: 210M

Snowflake

Using Snowflake as a Data warehouse for our application

• We are building a Security App (not a BI tool)

Using it for the last 4 years

• 2 years in production

No planned downtime

• 24x7x365 availability


Architecture

[Architecture diagram. Components shown: Agents, Data collection, Data Processing, Snowflake, Load, Batch, UI, Lacework Query service (SQL Generation), Lacework UI, Web]

How we use Snowflake


DBs

• 1 per customer

Security

• Network ACL

Import/Export

• S3 as staging

Warehouses -- Sharded

• Multi-tenant processing

• Separate different kinds of workloads: Load, Batch, UI

• Helps us better meet SLA @ cost

Sizes vary based on shard needs

How we use Snowflake

[Architecture diagram. Components shown: Data collection, Batch, Load, Batch, UI, Lacework Query Service (SQL Generation), Lacework UI]

How we use Snowflake


21 sharded Warehouses

• 8 Load Warehouses (x-small, medium)

• 8 Batch Warehouses (x-small, medium)

• 4 (Load + Batch) (x-small)

• 1 UI Queries Warehouse (medium - x-large)

Max cluster size

• Typically set to 2

How we use Snowflake


3 unique freedoms of scalability

• We can re-shard anytime by adding or removing WHs.

• We can increase WH size, if needed

• We can add ‘Max Cluster count’ if needed

Shared storage in ‘S3’

• Means we don’t have to move data to scale

We can adjust in seconds!

• Metrics based Auto-scaler implemented now
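The slide does not describe how the metrics-based auto-scaler works, so the following is only a sketch of what one decision step could look like. It emits Snowflake `ALTER WAREHOUSE ... SET` statements for two of the freedoms listed above (warehouse size and max cluster count). The queue-depth metric, the threshold, the warehouse name, and the function names are all assumptions.

```python
from typing import Optional

# Warehouse sizes in increasing order, as accepted by WAREHOUSE_SIZE.
SIZES = ["XSMALL", "SMALL", "MEDIUM", "LARGE", "XLARGE"]


def next_size_sql(warehouse: str, current_size: str, queued_queries: int,
                  queue_threshold: int = 10) -> Optional[str]:
    """Suggest a one-step size increase when the queue is too deep.

    The metric and threshold are hypothetical; returns None when no
    change is warranted or the warehouse is already at the largest size.
    """
    if queued_queries <= queue_threshold:
        return None
    position = SIZES.index(current_size)
    if position == len(SIZES) - 1:
        return None
    return f"ALTER WAREHOUSE {warehouse} SET WAREHOUSE_SIZE = '{SIZES[position + 1]}'"


def max_cluster_sql(warehouse: str, max_clusters: int) -> str:
    """Build a statement that changes the multi-cluster upper bound."""
    if max_clusters < 1:
        raise ValueError("max_clusters must be at least 1")
    return f"ALTER WAREHOUSE {warehouse} SET MAX_CLUSTER_COUNT = {max_clusters}"


print(next_size_sql("LOAD_WH_1", "XSMALL", queued_queries=25))
print(max_cluster_sql("LOAD_WH_1", 3))
```

Because storage is shared, neither statement moves any data, which is consistent with the slide's point that adjustments take effect in seconds.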

How we use Snowflake


Table writes

• All writes for 1 table happen at 1 WH

Table Reads

• Can happen from 2 WHs max

• Maximizes cache efficiency, while reducing contention
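The routing rule above (one warehouse for all writes to a table, at most two for reads) could be implemented in many ways; the talk does not say how Lacework does it. The sketch below assumes a stable hash of the table name picks the warehouse, so the same table always lands on the same warehouses and their caches stay warm. The pool contents and function names are hypothetical.

```python
import hashlib


def _stable_index(table: str, pool_size: int) -> int:
    """Map a table name to a pool index that is stable across processes."""
    digest = hashlib.sha256(table.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % pool_size


def write_warehouse(table: str, warehouses: list[str]) -> str:
    """All writes for one table go to exactly one warehouse."""
    return warehouses[_stable_index(table, len(warehouses))]


def read_warehouses(table: str, warehouses: list[str]) -> list[str]:
    """Reads for one table are spread over at most two warehouses."""
    start = _stable_index(table, len(warehouses))
    picks = [warehouses[start], warehouses[(start + 1) % len(warehouses)]]
    return list(dict.fromkeys(picks))  # drop the duplicate if the pool has one entry


# Hypothetical warehouse pool.
pool = ["WH_A", "WH_B", "WH_C", "WH_D"]
print(write_warehouse("connections", pool))
print(read_warehouses("connections", pool))
```

Python's built-in `hash` is avoided here because it is randomized per process for strings, which would send the same table to different warehouses on different workers.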

Use JSON Support for flexible schemas

Scale today, Ready for 10x

• 15 TB of Data per Day (uncompressed)

• 200k Statements per Hour

• 1 PB Aggregated Data (compressed)

• 5T Aggregated Rows


Q&A


Thank You