Eric Lubow @elubow [email protected] Big Architectures for Big Data

C* Summit 2013: Big Architectures for Big Data by Eric Lubow

DESCRIPTION

Having many different technologies within an organization can be problematic for developers and operations alike. Structuring those systems into discrete modules not only abstracts away a lot of the complexity of a heterogeneous architecture, it also allows the evolution of systems using common access and storage patterns. This session will discuss how to think about, architect, and maintain a service architecture for a big data system.

Page 1: C* Summit 2013: Big Architectures for Big Data by Eric Lubow

Eric Lubow

@elubow

[email protected]

Big Architectures for Big Data

Page 2: C* Summit 2013: Big Architectures for Big Data by Eric Lubow

Big Architectures for Big Data

Eric Lubow @elubow #Cassandra13

Overview

• SimpleReach

• Goals

• Tools

• Architecture Implementation

Page 3: C* Summit 2013: Big Architectures for Big Data by Eric Lubow

The 2 Truths

Page 4: C* Summit 2013: Big Architectures for Big Data by Eric Lubow

The Real Truth

Even with the right tools, 80% of the work of building a big data system is acquiring and refining

Page 5: C* Summit 2013: Big Architectures for Big Data by Eric Lubow

Page 6: C* Summit 2013: Big Architectures for Big Data by Eric Lubow

Page 7: C* Summit 2013: Big Architectures for Big Data by Eric Lubow

SimpleReach

• Millions of URLs per day

• Over 1.25 billion page views per month

• 500M events per day (~6k events/second)

• Auto-scale 125-160 machines depending on traffic
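As a quick sanity check on the figures above, 500 million events spread evenly over a day works out to a little under 6,000 per second. The even spread is an assumption made for the arithmetic; peak traffic would be higher than this average.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def events_per_second(events_per_day: int) -> float:
    """Average rate, assuming events are spread evenly over the day."""
    return events_per_day / SECONDS_PER_DAY


rate = events_per_second(500_000_000)
print(f"{rate:,.0f} events/second")  # a little under 6k, matching "~6k"
```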

Page 8: C* Summit 2013: Big Architectures for Big Data by Eric Lubow

And It Goes Like This...

C*

Vertica

Page 9: C* Summit 2013: Big Architectures for Big Data by Eric Lubow

Goals

• Consistent non-data storage layer access patterns

• Data accuracy across storage engines

• Minimize downtime/Minimize cost of downtime

• High availability

• Allow access to many toolsets (for all languages, DBs, Engines)

• Clients should have minimal architecture knowledge

Page 10: C* Summit 2013: Big Architectures for Big Data by Eric Lubow

Consistent Access Patterns

realtime_score

(‘score’, ‘realtime’)
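The slide pairs the name realtime_score with the tuple ('score', 'realtime') but does not say how the two are used. One possible reading, sketched below, is that every endpoint name splits into a (resource, variant) key that selects a handler. The handler registry, the `call` helper, and the returned fields are hypothetical and exist only for illustration.

```python
from typing import Callable, Dict, Tuple

Key = Tuple[str, str]


def endpoint_to_key(endpoint: str) -> Key:
    """Split 'realtime_score' into ('score', 'realtime')."""
    variant, resource = endpoint.split("_", 1)
    return (resource, variant)


# Hypothetical registry: one handler per (resource, variant) key.
HANDLERS: Dict[Key, Callable[[str], dict]] = {
    ("score", "realtime"): lambda content_id: {"content_id": content_id, "score": 0.0},
}


def call(endpoint: str, content_id: str) -> dict:
    """Look up the handler for an endpoint name and invoke it."""
    handler = HANDLERS[endpoint_to_key(endpoint)]
    return handler(content_id)


print(endpoint_to_key("realtime_score"))  # ('score', 'realtime')
```

With a naming rule like this, a client only needs the endpoint name and never has to know which storage engine answers the call.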

Page 11: C* Summit 2013: Big Architectures for Big Data by Eric Lubow

Authentication, Tracking,

Per service access keys

Track call volume by access key

Prevent internal denial of service

Monitor availability and performance
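The points above (per-service keys, call volume by key, protection against internal denial of service) can be sketched as a small sliding-window throttle. This is a minimal in-memory illustration, not the talk's implementation: the class name, the limits, and the key "score-consumer" are all assumptions.

```python
import time
from collections import defaultdict, deque


class AccessKeyThrottle:
    """Counts calls per access key and refuses keys that exceed a limit
    within a sliding time window."""

    def __init__(self, max_calls, window_seconds, clock=time.monotonic):
        self.max_calls = max_calls
        self.window_seconds = window_seconds
        self.clock = clock
        self.total_calls = defaultdict(int)  # accepted call volume per key
        self._recent = defaultdict(deque)    # timestamps of recent calls per key

    def allow(self, access_key):
        now = self.clock()
        recent = self._recent[access_key]
        # Drop timestamps that have fallen out of the window.
        while recent and now - recent[0] >= self.window_seconds:
            recent.popleft()
        if len(recent) >= self.max_calls:
            return False  # this key is over its budget for the window
        recent.append(now)
        self.total_calls[access_key] += 1
        return True


throttle = AccessKeyThrottle(max_calls=2, window_seconds=60)
print([throttle.allow("score-consumer") for _ in range(3)])  # [True, True, False]
```

Because limits are tracked per key, one misbehaving internal service exhausts only its own budget and the others keep working.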

Page 12: C* Summit 2013: Big Architectures for Big Data by Eric Lubow

Controlled Data Flow

Social Event Collector

Social Data

Batch & Write Processed Data

Batch & Write Raw Data

Calculate Score

Write

NSQ Multicast NSQ NSQ
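The "Batch & Write" boxes in the diagram suggest that events are accumulated and written in groups rather than one at a time. A minimal sketch of that idea follows, assuming a size-based flush; the class name, the batch size, and the `write_batch` callback are hypothetical stand-ins for a real storage write.

```python
class BatchWriter:
    """Buffers events and hands them to `write_batch` in groups."""

    def __init__(self, write_batch, batch_size):
        self.write_batch = write_batch
        self.batch_size = batch_size
        self._pending = []

    def add(self, event):
        self._pending.append(event)
        if len(self._pending) >= self.batch_size:
            self.flush()

    def flush(self):
        """Write whatever is buffered, even if the batch is not full."""
        if self._pending:
            batch, self._pending = self._pending, []
            self.write_batch(batch)


written = []  # stands in for the storage engine
writer = BatchWriter(write_batch=written.append, batch_size=3)
for i in range(7):
    writer.add({"event_id": i})
writer.flush()  # push out the final partial batch
print([len(batch) for batch in written])  # [3, 3, 1]
```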

Page 13: C* Summit 2013: Big Architectures for Big Data by Eric Lubow

NSQ by Bit.ly

• Distributed and de-centralized topology

• At least once delivery guaranteed

• Multicast style message routing

• Runtime discovery for consumers to find producers

• Allow for maintenance windows with no downtime
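Two of the properties above shape how consumers are written: multicast routing means every channel on a topic receives its own copy of each message, and at-least-once delivery means a message can arrive more than once. The sketch below models both in memory. It does not use any NSQ client library; the `Topic` and `IdempotentHandler` classes and the channel names are hypothetical, and one handler per channel is a simplification.

```python
class Topic:
    """In-memory stand-in for a topic: each channel gets a copy of every message."""

    def __init__(self):
        self.channels = {}

    def subscribe(self, channel, handler):
        self.channels[channel] = handler

    def publish(self, message_id, body):
        for handler in self.channels.values():
            handler(message_id, body)


class IdempotentHandler:
    """Ignores message IDs it has already processed, so redelivery is harmless."""

    def __init__(self):
        self.seen = set()
        self.processed = []

    def __call__(self, message_id, body):
        if message_id in self.seen:
            return  # duplicate delivery, already handled
        self.seen.add(message_id)
        self.processed.append(body)


topic = Topic()
raw_writer = IdempotentHandler()
scorer = IdempotentHandler()
topic.subscribe("write_raw", raw_writer)
topic.subscribe("calculate_score", scorer)

topic.publish("m1", "share")
topic.publish("m1", "share")  # simulated redelivery of the same message
topic.publish("m2", "like")

print(raw_writer.processed)  # ['share', 'like']
print(scorer.processed)      # ['share', 'like']
```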

Page 14: C* Summit 2013: Big Architectures for Big Data by Eric Lubow

Path of a Packet

[Diagram labels: Internet, EC, Internal API, Solr, C*, Mongo, Redis, Vertica, API, Fire Hose, SC, Consumers, Queue]

Page 15: C* Summit 2013: Big Architectures for Big Data by Eric Lubow

Evolution Takes Work

• Know your access patterns

• Service Oriented Architecture (Internal API)

• Data accuracy checks: visual and programmatic

• Built framework for testing out engines (Storage, Queueing, etc)
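A programmatic data accuracy check can be as simple as reading the same keys from two storage engines and reporting where they disagree. The sketch below assumes each engine's results have already been loaded into a dictionary of numeric counts; the function name, the tolerance parameter, and the sample data are hypothetical.

```python
def find_mismatches(reference, candidate, tolerance=0):
    """Return {key: (reference_value, candidate_value)} for every key that is
    missing on one side or differs by more than `tolerance`."""
    mismatches = {}
    for key in sorted(reference.keys() | candidate.keys()):
        a = reference.get(key)
        b = candidate.get(key)
        if a is None or b is None or abs(a - b) > tolerance:
            mismatches[key] = (a, b)
    return mismatches


# Hypothetical page-view counts read from two different engines.
store_a = {"url-1": 120, "url-2": 45, "url-3": 9}
store_b = {"url-1": 120, "url-2": 44, "url-4": 3}

print(find_mismatches(store_a, store_b))
# {'url-2': (45, 44), 'url-3': (9, None), 'url-4': (None, 3)}
```

The same comparison works when trialling a new engine: load identical data into both, then diff the results before switching traffic.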

Page 16: C* Summit 2013: Big Architectures for Big Data by Eric Lubow

Homogeneous Machines at Base

Application

Base AMI

Organizational Base

Event Collection

NSQ

Mongos

App Config

Users

Monitoring

Consumer

NSQ

Mongos

App Config

Users

Base Image Layout Producer Consumer

Amazon Linux

Monitoring

Amazon Linux

Application Group

Page 17: C* Summit 2013: Big Architectures for Big Data by Eric Lubow

DevOps Wizardry

• Extensive use of AWS

• Monitor: Nagios, StatsD, and Graphite

• Manage: Chef, OpsWorks, csshX, Vagrant

• Deployments
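For the monitoring side, StatsD accepts metrics as short plain-text datagrams over UDP; a counter has the form `name:value|c`. The sketch below builds and sends one using only the standard library. The metric name, host, and port 8125 (the conventional default) are assumptions and not details from the talk.

```python
import socket


def format_counter(name, value=1):
    """Build a StatsD counter datagram, e.g. b'iapi.calls:1|c'."""
    return f"{name}:{value}|c".encode("ascii")


def send_counter(name, value=1, host="127.0.0.1", port=8125):
    """Fire-and-forget: UDP means the caller never blocks on the metrics server."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(format_counter(name, value), (host, port))


print(format_counter("iapi.calls", 3))  # b'iapi.calls:3|c'
```

Calling `send_counter("iapi.calls")` from a request handler would then record one call per request without adding a network round trip.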

Page 18: C* Summit 2013: Big Architectures for Big Data by Eric Lubow

Evolving Amazon Tools

• Full Featured API

• OpsWorks

• CloudFormation

• S3 / CloudFront

• Elastic Beanstalk

• Elastic MapReduce

Page 19: C* Summit 2013: Big Architectures for Big Data by Eric Lubow

Service

Internal API

Solr

Real-time C*

C*

Vertica

Page 20: C* Summit 2013: Big Architectures for Big Data by Eric Lubow

Service Architecture Machines

Application

Base AMI

Organizational Base

iAPI Front End

nginx

App Config

Users

Monitoring

Data Store

App Config

Users

Base Image Layout Proxy Machines Storage Machines

Amazon Linux

Monitoring

Amazon Linux

Application Group

Page 21: C* Summit 2013: Big Architectures for Big Data by Eric Lubow

Anatomy of an Endpoint

[Diagram labels: querying machines; endpoints "hourly content" and "ten minute content"; data stores Mongo, Vertica, and C*; client library labels, truncated in the source, "Helen", "PyMon", and "PyVertic"]

Page 22: C* Summit 2013: Big Architectures for Big Data by Eric Lubow

Endpoint Breakout

• Availability

• Consistent Access Patterns

• Minimal downtime changes

• Smaller code deploys

• Non-monolithic code base

Page 23: C* Summit 2013: Big Architectures for Big Data by Eric Lubow

Architecture Distribution

US-EAST-1a

MONGO-SHARD-0001-B

MONGO-SHARD-0000-A

CASSANDRA-0001

CASSANDRA-0010

REDIS-0001A

VERTICA-0001

iAPI-0001

US-EAST-1b

MONGO-SHARD-0002-B

MONGO-SHARD-0001-A

CASSANDRA-0002

CASSANDRA-0011

REDIS-0001B

iAPI-0002

US-EAST-1e

MONGO-SHARD-0002-A

MONGO-SHARD-0000-B

CASSANDRA-0003

CASSANDRA-0012

VERTICA-0003

iAPI-0003

VERTICA-0002
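The layout above can be checked mechanically: group the hostnames by service and list which availability zones each service covers. The sketch below copies the hostnames from the slide. VERTICA-0002 is left out because the extracted text does not show which zone it belongs to, and the hostname pattern (service name followed by a four-digit number) is an assumption inferred from the names.

```python
import re
from collections import defaultdict

# Hostnames per availability zone, as listed on the slide.
# VERTICA-0002 is omitted: its zone is not recoverable from the source.
LAYOUT = {
    "US-EAST-1a": ["MONGO-SHARD-0001-B", "MONGO-SHARD-0000-A", "CASSANDRA-0001",
                   "CASSANDRA-0010", "REDIS-0001A", "VERTICA-0001", "iAPI-0001"],
    "US-EAST-1b": ["MONGO-SHARD-0002-B", "MONGO-SHARD-0001-A", "CASSANDRA-0002",
                   "CASSANDRA-0011", "REDIS-0001B", "iAPI-0002"],
    "US-EAST-1e": ["MONGO-SHARD-0002-A", "MONGO-SHARD-0000-B", "CASSANDRA-0003",
                   "CASSANDRA-0012", "VERTICA-0003", "iAPI-0003"],
}


def service_of(hostname):
    """'MONGO-SHARD-0001-B' -> 'MONGO-SHARD'; 'REDIS-0001A' -> 'REDIS'."""
    return re.match(r"(.+?)-\d{4}", hostname).group(1)


def zones_by_service(layout):
    """Map each service name to the set of zones that host it."""
    zones = defaultdict(set)
    for zone, hosts in layout.items():
        for host in hosts:
            zones[service_of(host)].add(zone)
    return dict(zones)


coverage = zones_by_service(LAYOUT)
for service, zones in sorted(coverage.items()):
    print(f"{service}: {sorted(zones)}")
```

A check like this could run whenever machines are added or retired, flagging any service that has collapsed into a single zone.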

Page 24: C* Summit 2013: Big Architectures for Big Data by Eric Lubow

Problems?

Page 25: C* Summit 2013: Big Architectures for Big Data by Eric Lubow

The Schrute of the Problem

Page 26: C* Summit 2013: Big Architectures for Big Data by Eric Lubow

New Service Questions

• Can its host be completely homogeneous?

• Can it accept downtime (and what should downtime look like)?

• Does it fit into an existing service?

• Does it require datacenter distribution?

Page 27: C* Summit 2013: Big Architectures for Big Data by Eric Lubow

Summary

• Solutions Require Evolution

• Build, Use, and Integrate Tools

• Abstraction

• Homogeneous Distribution

• Monitoring & Automation

Page 28: C* Summit 2013: Big Architectures for Big Data by Eric Lubow

We’re (Ask about Food Coma Fridays)

Page 29: C* Summit 2013: Big Architectures for Big Data by Eric Lubow

Questions are guaranteed in life. Answers aren’t.

Eric Lubow

@elubow

[email protected]

Thank you.