Riot Games Scalable Data Warehouse Lecture at UCSB / UCLA

Preview:

Citation preview

MOVINGMOUNTAINS OF

PLAYER DATASEAN MALONEY

RIOT GAMES @SEAN_SEANNERY

SCALABLE INTERNET SERVICESUCLA/UCSB - NOV 2015

SEAN MALONEYBIG DATA ENGINEER

WHO IS THIS GUY?

Lead developer on Riot’s ETL tools

FUN FACT:Was a student in this class 4 years agoIntern at Appfolio

MOVING MOUNTAINS OF DATAINTRODUCTION1.

THE GAME PLATFORM: OUR MAIN DATA SOURCE2.

HOW WE INGEST AND QUERY DATA3.

HOW WE SCALE IN AWS4.

CONCLUSION - SEAN’S PRO TIPS5.

INTRODUCTION

WHAT IS LEAGUE OF LEGENDS?

2009LAUNCH

ONLINEMULTIPLAYER

WINDOWS / OSX

40-50 MIN GAMES

THETEAM

YOUR CHAMP

THE BATTLEGROUND

THE GAME PLATFORM

THE CLIENT.

CHAT

STORE AUDIT

Load Balancers and Firewalls

CHAT

ORACLE COHERENCE (IN MEMORY DB)

STORE AUDIT GAME ETC.

CHAT

CHAT

STORE AUDIT GAME ETC.

STORE AUDIT GAME ETC.

PRIMARY DB

HOT BACKUP DB

2nd BACKUP DB / ETL

OTHER DATA SOURCES

<REST>

DATA INGESTION

PUSH-BASED

PULL-BASED / ETL

BATCH QUERIES

INGESTION STORAGE QUERY / VIEWS VIZ. TOOLS

SINGLE-ROW QUERIES

AGGREGATE QUERIES

FuETL- OLTP game data- External Data Sources

MASTER WAREHOUSE

HONU- Anything pushed to it- Server logs

DATA AUDITING

PUSH-BASED

PULL-BASED / ETL

BATCH QUERIES

INGESTION STORAGE QUERY / VIEWS VIZ. TOOLS

SINGLE-ROW QUERIES

AGGREGATE QUERIES

FuETL- OLTP game data- External Data Sources

MASTER WAREHOUSE

HONU- Anything pushed to it- Server logs

DATA AUDITING

Distributed ETL Software written in Ruby.

Scales Horizontally

Same ETL applied to multiple regions / datacenters

Self-Service UI with SQL query templating.

NA Korea Russia

Create an ETL

Create an ETL

Amazon S3SQS(S)FTPHiveMicrosoft SQL ServerMySQLDynamoDBVerticaRedshiftREST websites

FUETL CAN

CONNECT TO

Create an ETL

Webapp

Core Libraries

Task Service

Tasks

Helper Service

Helpers

Environment Service

Scheduler Process Worker Process Task / Helper / ControllersCommand Line Tool

View - backbone.js - Bootstrap CSS

Task DAO Helper DAOEnvironment DAO

Env. Task DAO Env. Helper DAO

Webapp

Core Libraries

Task Service

Tasks

Helper Service

Helpers

Environment Service

Scheduler Process Worker Process Task / Helper / ControllersCommand Line Tool

View - backbone.js - Bootstrap CSS

Task DAO Helper DAOEnvironment DAO

Env. Task DAO Env. Helper DAO

Webapp

Core Libraries

Task Service

Tasks

Task DAO

Helper Service

Helpers

Helper DAO

Environment Service

Environment DAO

Scheduler Process Worker Process Task / Helper / Controllers

Env. Task DAO Env. Helper DAO

Command Line Tool

View - backbone.js - Bootstrap CSS

Webapp

Core Libraries

Task Service

Tasks

Helper Service

Helpers

Environment Service

Scheduler Process Worker Process Task / Helper / ControllersCommand Line Tool

View - backbone.js - Bootstrap CSS

Task DAO Helper DAOEnvironment DAO

Env. Task DAO Env. Helper DAO

FuETL STATISTICS

14 TBDATA MOVED DAILY

5213ACTIVE REGIONAL

ETLS

23125DAILY ETL RUNS

FuETL SCALING

FuETL SCALING

IdempotencyIdempotent - an operation that will produce the same results if executed once or multiple times

EXAMPLE:Non-Idempotent: - x = x * 5; - Submitting a purchaseIdempotent: - abs( abs(x) ) = abs(X) - Cancelling a purchase

Idempotent?In the transactional OLTP world….

INSERT INTO games_played(SELECT * FROM games_played_na WHERE date >= ‘2015-10-25’)

Idempotent?In the big data / OLAP world….

INSERT INTO games_played(SELECT * FROM games_played_na WHERE date >= ‘2015-10-25’)

KEEPING INTEGRITY

X

Message Queues

ETL2ETL3ETL4ETL5. . .ETLN

ETL1

X

XSCHEDULERakaPRODUCER

WORKER aka CONSUMER

Message Queues● REDUNDANCY● DELIVERY GUARANTEE● SCALABILITY● ASYCH. COMMUNICATION● ABSTRACTION / DECOUPLING

Message Queues● AMAZON SIMPLE QUEUE SERVICE● APACHE ACTIVEMQ● RABBITMQ● HORNETQ● MICROSOFT MQ (MSMQ)

PUSH-BASED

PULL-BASED / ETL

BATCH QUERIES

INGESTION STORAGE QUERY / VIEWS VIZ. TOOLS

SINGLE-ROW QUERIES

AGGREGATE QUERIES

FuETL- OLTP game data- External Data Sources

MASTER WAREHOUSE

HONU- Anything pushed to it- Server logs

DATA AUDITING

Self Service, Custom HTTP Edge Service (Java)

0

Fronted by ELB in front of ~40 autoscaled m1.xlarge instances

Forwards JSON data indirectly to S3

Honu

The batches need to then be unpacked and converted into Hive tables

0

Custom Collector Infrastructure (Java) - Derived from Netflix Suro

0

Deployed in every data center worldwide and also AWS

Self Service, Custom HTTP Edge Service (Java API)

Honu

Honu =

Custom HTTP Edge Service (Java)0

DRADIS Fronted by ELB in front of ~40 m1.xlarge instances

Forwards data indirectly to S3 via Honu Collectors

Honu

JSONJSONJSONJSONJSONJSON

COLLECTORS

REST

ENDPOINT

JSONJSONJSONJSONJSONJSON

JSONJSONJSONJSONJSONJSON

JSONJSONJSONJSONJSONJSON

Honu

JSONJSONJSONJSONJSONJSON

COLLECTORS

REST

ENDPOINT

JSONJSONJSONJSONJSONJSON

JSONJSONJSONJSONJSONJSON

JSONJSONJSONJSONJSONJSON

batchid = 20150512

Honu

JSONJSONJSONJSONJSONJSON

COLLECTORS

REST

ENDPOINT

JSONJSONJSONJSONJSONJSON

GAM1GAM1GAM1GAMXGAM1GAM1

JSONJSONJSONJSONJSONJSON

IdempotencyUse application logic to make idempotentmsg = queue.pop;if (processed_games.contains( msg.game_id ) { return; //do nothingelse { process_game(msg);}

What’s in there?Data team doesn’t know everything that is submitted

ComplianceAre we violating international data laws?

Inconsistent data structureIts formatted however developer submits it

THE DOWN SIDE

User DocumentationNo one likes doing it, but it helps a lot.

Onboard trainingGet new coworkers in-the-know

Familiar ProtocolsUse REST or RPC so developers are on the same page

Focus on UXYour tools need to be easy for non-technical people to use.

SELF SERVICE HOW?

PUSH-BASED

PULL-BASED / ETL

BATCH QUERIES

INGESTION STORAGE QUERY / VIEWS VIZ. TOOLS

SINGLE-ROW QUERIES

AGGREGATE QUERIES

FuETL- OLTP game data- External Data Sources

MASTER WAREHOUSE

HONU- Anything pushed to it- Server logs

DATA AUDITING

AMAZON S3

s3n://datawarehouse/ schema1/ table1/ env/ dt/ time/ table2/ table3/ schema2/

s3n://telemetrydata/ application1/ table1/ env/ dt/ table2/ application2/

AMAZON S3 STRUCTUREHIVE

‣ schema1 table1 env

dt time table2 table3

‣ schema2 table1 ...

‣ schema3‣ schema4

PUSH-BASED

PULL-BASED / ETL

BATCH QUERIES

INGESTION STORAGE QUERY / VIEWS VIZ. TOOLS

SINGLE-ROW QUERIES

AGGREGATE QUERIES

FuETL- OLTP game data- External Data Sources

MASTER WAREHOUSE

HONU- Anything pushed to it- Server logs

DATA AUDITING

REST micro-service built with Java and docker.

Reports and visualizations we can use to find problems.

Source and target comparison.WarehouseAuditingServicePlatform

HOW TO AUDIT

VISUALIZING

VISUALIZING

HOW TO AUDIT

PUSH-BASED

PULL-BASED / ETL

BATCH QUERIES

INGESTION STORAGE QUERY / VIEWS VIZ. TOOLS

SINGLE-ROW QUERIES

AGGREGATE QUERIES

FuETL- OLTP game data- External Data Sources

MASTER WAREHOUSE

HONU- Anything pushed to it- Server logs

DATA AUDITING

BATCH OLAP POINT

SCALING IN AWS

RESOURCE CONTENTION

SCALING

RDS

AWS Infrastructure TodayEMR EC2 Storage

Data Science

Analytics / Hue

ETL Telemetry

PlatforaDynamoDB Loading

Auditing ETL

Telemetry collectors

Data dictionary

Rocana(real time

dashboard)

Solr (real time)

Point Data Service

Metastore

Data Science

Fraud

DYNAMODB

ETL App DB

Point Data Store

S3

Source of “Truth”

Networking

VPCAWS Direct

Connect

AWS Direct Connect

AWS Direct Connect

AWS Direct Connect

CONCLUSION

DON’T

SEAN’S PRO TIPS OF THE DAYDO

➔ Don’t wait. Create S3 permissions and naming standards early

➔ Get an auditing solution for DW accuracy

➔ Allocate time for tuning AWS infrastructure

➔ Don’t forget to track cost. AWS bills can surprise you

➔ Don’t underestimate simple problems in big data.

➔ Prepare for multiple data access patterns

➔ Keep idempotency in mind and use MQ architecture

➔ Don’t stop. Believing

Custom rewards for mastering different champions

Intensive query that spans every game that every player has played

Improves player engagement

CHAMPION MASTERY

Full copy of our data warehouse in DynamoDB

Hive->DynamoDB Dynamic Partition

Support can answer questions faster than ever.

PLAYER SUPPORT

Data science team queries all chat messages in game

Sentiment analysis and classification

Identifies negative, offensive players and mutes them automatically.

OFFENSIVE CHAT

DETECTION

QUESTIONS? SMALONEY@RIOTGAMES.COM

@SEAN_SEANNERYengineering.riotgames.com

ENGINEERING

BLOG

Recommended