MOVING MOUNTAINS OF PLAYER DATA
SEAN MALONEY, RIOT GAMES, @SEAN_SEANNERY
SCALABLE INTERNET SERVICES, UCLA/UCSB, NOV 2015

Riot Games Scalable Data Warehouse Lecture at UCSB / UCLA

WHO IS THIS GUY?

SEAN MALONEY, BIG DATA ENGINEER
- Lead developer on Riot's ETL tools
- Fun fact: was a student in this class 4 years ago; intern at Appfolio

MOVING MOUNTAINS OF DATA

1. INTRODUCTION
2. THE GAME PLATFORM: OUR MAIN DATA SOURCE
3. HOW WE INGEST AND QUERY DATA
4. HOW WE SCALE IN AWS
5. CONCLUSION - SEAN'S PRO TIPS

INTRODUCTION

WHAT IS LEAGUE OF LEGENDS?

- 2009 LAUNCH
- ONLINE MULTIPLAYER
- WINDOWS / OS X
- 40-50 MIN GAMES

THE TEAM

YOUR CHAMP

THE BATTLEGROUND

THE GAME PLATFORM

THE CLIENT

[Diagram: load balancers and firewalls in front of the platform services (chat, store, audit).]

[Diagram: platform data tier. Service nodes (chat; store, audit, game, etc.) use Oracle Coherence, an in-memory DB, alongside a primary DB, a hot backup DB, and a 2nd backup DB that serves ETL.]

OTHER DATA SOURCES

[Diagram: external data sources, e.g., REST APIs]

DATA INGESTION

[Diagram: INGESTION → STORAGE → QUERY / VIEWS → VIZ. TOOLS]

- INGESTION, PUSH-BASED: HONU (anything pushed to it; server logs)
- INGESTION, PULL-BASED / ETL: FuETL (OLTP game data; external data sources)
- STORAGE: MASTER WAREHOUSE
- QUERY / VIEWS: BATCH QUERIES, SINGLE-ROW QUERIES, AGGREGATE QUERIES
- VIZ. TOOLS
- DATA AUDITING

FuETL

- Distributed ETL software written in Ruby
- Scales horizontally
- Same ETL applied to multiple regions / datacenters
- Self-service UI with SQL query templating
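The talk does not show FuETL's templating syntax. The sketch below illustrates "one ETL definition, many regions" using Python's standard `string.Template`. The placeholders `${region}` and `${since}`, the region codes, and the helper `render_for_regions` are all hypothetical. The query shape mirrors the `games_played_na` example in this deck.

```python
from string import Template

# Hypothetical query template; ${region} and ${since} are placeholders,
# not FuETL's actual templating syntax.
GAMES_PLAYED = Template(
    "SELECT * FROM games_played_${region} WHERE date >= '${since}'"
)


def render_for_regions(template, regions, **params):
    """Render the same ETL query once per region.

    Template.substitute raises KeyError if a placeholder is left unfilled,
    so a missing parameter fails loudly instead of producing broken SQL.
    """
    return {r: template.substitute(region=r, **params) for r in regions}


queries = render_for_regions(GAMES_PLAYED, ["na", "kr", "ru"], since="2015-10-25")
for region, sql in queries.items():
    print(region, sql)
```

Plain string templating is only safe when the values come from trusted configuration. Anything user-supplied should go through the database driver's parameter binding instead.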

[Figure: regions NA, Korea, Russia]

Create an ETL

FUETL CAN CONNECT TO

- Amazon S3
- SQS
- (S)FTP
- Hive
- Microsoft SQL Server
- MySQL
- DynamoDB
- Vertica
- Redshift
- REST websites
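The slide lists FuETL's connectors but not its internal interface. The sketch below shows the general idea that any DB-API connection can act as a source or a target. The helper `run_etl` is hypothetical. Python's built-in `sqlite3` stands in for both ends, for example a MySQL source and a Vertica or Redshift target.

```python
import sqlite3


def run_etl(source, target, extract_sql, load_sql, transform=lambda row: row):
    """Copy rows between two DB-API connections (hypothetical helper).

    extract_sql runs on the source; load_sql is a parameterized INSERT run
    on the target; transform is applied to each row in between.
    """
    cur = source.cursor()
    cur.execute(extract_sql)
    rows = [transform(r) for r in cur.fetchall()]
    out = target.cursor()
    try:
        out.executemany(load_sql, rows)
        target.commit()
    except Exception:
        target.rollback()  # leave the target unchanged on failure
        raise
    return len(rows)


src = sqlite3.connect(":memory:")
src.execute("CREATE TABLE games (game_id INTEGER, region TEXT)")
src.executemany("INSERT INTO games VALUES (?, ?)", [(1, "na"), (2, "kr")])

dst = sqlite3.connect(":memory:")
dst.execute("CREATE TABLE games_copy (game_id INTEGER, region TEXT)")

n = run_etl(
    src,
    dst,
    "SELECT game_id, region FROM games",
    "INSERT INTO games_copy VALUES (?, ?)",
    transform=lambda r: (r[0], r[1].upper()),
)
print(n, dst.execute("SELECT * FROM games_copy ORDER BY game_id").fetchall())
```

A few limitations apply to this sketch:
- The `?` placeholders are sqlite3's qmark paramstyle; other drivers use different styles (for example `%s`).
- `fetchall()` holds the whole extract in memory. A connector moving terabytes a day would stream in chunks instead.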

Create an ETL

[Diagram: FuETL architecture]

- Webapp: Task / Helper / Controllers; View (backbone.js, Bootstrap CSS)
- Scheduler Process, Worker Process, Command Line Tool
- Core Libraries:
  - Task Service: Tasks, Task DAO
  - Helper Service: Helpers, Helper DAO
  - Environment Service: Environment DAO, Env. Task DAO, Env. Helper DAO
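In the diagram, each core service sits on top of a DAO. The webapp, scheduler, worker, and command line tool all share those core libraries. The sketch below shows that layering with hypothetical `Task`, `TaskDAO`, and `TaskService` classes. An in-memory dict stands in for the real application database.

```python
from dataclasses import dataclass


@dataclass
class Task:
    name: str
    sql: str
    schedule: str  # e.g. a cron expression; the format is an assumption


class TaskDAO:
    """Persistence only: stores and fetches tasks (in memory here)."""

    def __init__(self):
        self._tasks = {}

    def save(self, task):
        self._tasks[task.name] = task

    def get(self, name):
        return self._tasks.get(name)


class TaskService:
    """Business rules shared by every entry point (webapp, CLI, scheduler)."""

    def __init__(self, dao):
        self._dao = dao

    def create_task(self, name, sql, schedule):
        if not name:
            raise ValueError("task name is required")
        if self._dao.get(name) is not None:
            raise ValueError(f"task {name!r} already exists")
        task = Task(name, sql, schedule)
        self._dao.save(task)
        return task


dao = TaskDAO()
service = TaskService(dao)
service.create_task("games_played_daily", "SELECT * FROM games_played_na", "0 2 * * *")
```

Keeping validation in the service rather than in the webapp controllers means the command line tool and the scheduler enforce the same rules.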

FuETL STATISTICS

- 14 TB DATA MOVED DAILY
- 5,213 ACTIVE REGIONAL ETLS
- 23,125 DAILY ETL RUNS
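The averages below are a back-of-the-envelope reading of these figures. They assume decimal terabytes and load spread evenly across the day; real traffic is almost certainly burstier.

```latex
\begin{align*}
\frac{23{,}125\ \text{runs/day}}{5{,}213\ \text{ETLs}} &\approx 4.4\ \text{runs per ETL per day} \\
\frac{14 \times 10^{12}\ \text{B/day}}{86{,}400\ \text{s/day}} &\approx 1.6 \times 10^{8}\ \text{B/s} \approx 162\ \text{MB/s} \\
\frac{14 \times 10^{12}\ \text{B/day}}{23{,}125\ \text{runs/day}} &\approx 6.1 \times 10^{8}\ \text{B} \approx 605\ \text{MB per run}
\end{align*}
```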

FuETL SCALING

IDEMPOTENCY

Idempotent: an operation that produces the same result whether it is executed once or multiple times.

EXAMPLE:
- Non-idempotent: x = x * 5; submitting a purchase
- Idempotent: abs(abs(x)) = abs(x); cancelling a purchase
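For pure functions, the definition can be checked mechanically: applying f twice should give the same result as applying it once. The example below tests the slide's two arithmetic cases. `is_idempotent_on` is a hypothetical helper, and it only checks sample inputs, so a `True` is evidence, not proof.

```python
def is_idempotent_on(f, samples):
    """Return True if f(f(x)) == f(x) for every sample input."""
    return all(f(f(x)) == f(x) for x in samples)


samples = [-7, -1, 0, 3, 12]
print(is_idempotent_on(abs, samples))              # True: abs(abs(x)) == abs(x)
print(is_idempotent_on(lambda x: x * 5, samples))  # False: applying twice gives x * 25
```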

IDEMPOTENT? In the transactional OLTP world...

INSERT INTO games_played (SELECT * FROM games_played_na WHERE date >= '2015-10-25')

IDEMPOTENT? In the big data / OLAP world...

INSERT INTO games_played (SELECT * FROM games_played_na WHERE date >= '2015-10-25')
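Wherever the target does not enforce a unique key, rerunning this INSERT appends the same rows a second time. That is typical of analytic stores; Hive, for example, does not enforce primary keys. A common remedy is to make each load replace the slice of data it owns.

The sketch below demonstrates this with Python's built-in `sqlite3` standing in for the warehouse; the schema and sample rows are hypothetical. The delete and the insert run in one transaction, so a rerun leaves the table unchanged. Hive's analogous tool is `INSERT OVERWRITE` on a partition.

```python
import sqlite3


def setup():
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE games_played_na (game_id INTEGER, date TEXT)")
    db.execute("CREATE TABLE games_played (game_id INTEGER, date TEXT)")
    db.executemany(
        "INSERT INTO games_played_na VALUES (?, ?)",
        [(1, "2015-10-24"), (2, "2015-10-25"), (3, "2015-10-26")],
    )
    db.commit()
    return db


def naive_load(db, since):
    # Appends every time it runs: not idempotent.
    with db:
        db.execute(
            "INSERT INTO games_played SELECT * FROM games_played_na WHERE date >= ?",
            (since,),
        )


def idempotent_load(db, since):
    # Replace the slice this load owns in one transaction: safe to rerun.
    with db:
        db.execute("DELETE FROM games_played WHERE date >= ?", (since,))
        db.execute(
            "INSERT INTO games_played SELECT * FROM games_played_na WHERE date >= ?",
            (since,),
        )


def count(db):
    return db.execute("SELECT COUNT(*) FROM games_played").fetchone()[0]


db = setup()
naive_load(db, "2015-10-25")
naive_load(db, "2015-10-25")
print("naive, run twice:", count(db))       # 4 rows: duplicates

db = setup()
idempotent_load(db, "2015-10-25")
idempotent_load(db, "2015-10-25")
print("idempotent, run twice:", count(db))  # 2 rows
```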

KEEPING INTEGRITY

MESSAGE QUEUES

[Diagram: the SCHEDULER (aka PRODUCER) puts ETL1, ETL2, ETL3, ... ETLN on a message queue; WORKERS (aka CONSUMERS) take them off.]
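The sketch below models the scheduler/worker split in a single process, using Python's standard `queue.Queue` and threads. It stands in for a real broker such as Amazon SQS or RabbitMQ and has none of their durability or redelivery behavior. The ETL names are placeholders.

```python
import queue
import threading


def scheduler(jobs, etl_names, n_workers):
    """Producer: enqueue one message per ETL run, then one stop signal per worker."""
    for name in etl_names:
        jobs.put(name)
    for _ in range(n_workers):
        jobs.put(None)  # sentinel: tells one worker to exit


def worker(jobs, done, lock):
    """Consumer: pull ETL names until the sentinel arrives."""
    while True:
        name = jobs.get()
        try:
            if name is None:
                return
            with lock:
                done.append(name)  # stand-in for actually running the ETL
        finally:
            jobs.task_done()


jobs = queue.Queue()
done, lock = [], threading.Lock()
n_workers = 3
threads = [threading.Thread(target=worker, args=(jobs, done, lock)) for _ in range(n_workers)]
for t in threads:
    t.start()
scheduler(jobs, [f"ETL{i}" for i in range(1, 11)], n_workers)
for t in threads:
    t.join()
print(sorted(done))
```

Adding workers scales consumption without changing the scheduler, which is the decoupling the next slide lists. Managed queues such as SQS standard queues deliver at least once, so a message can arrive twice. That is why consumers must also be idempotent.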

MESSAGE QUEUES
● REDUNDANCY
● DELIVERY GUARANTEE
● SCALABILITY
● ASYNCHRONOUS COMMUNICATION
● ABSTRACTION / DECOUPLING

MESSAGE QUEUES
● AMAZON SIMPLE QUEUE SERVICE (SQS)
● APACHE ACTIVEMQ
● RABBITMQ
● HORNETQ
● MICROSOFT MESSAGE QUEUING (MSMQ)

HONU

- Self-service, custom HTTP edge service (Java)
- Fronted by an ELB in front of ~40 autoscaled m1.xlarge instances
- Forwards JSON data indirectly to S3
- The batches then need to be unpacked and converted into Hive tables
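The talk does not show Honu's batch format. The sketch below assumes newline-delimited JSON batches and uses hypothetical field names. Messages are packed into a batch; unpacking then flattens them into rows with a fixed column list. Because pushed messages do not all share one structure, any field a developer did not submit becomes `None`.

```python
import json


def pack_batch(messages):
    """Serialize messages as newline-delimited JSON, one object per line."""
    return "\n".join(json.dumps(m, sort_keys=True) for m in messages)


def unpack_batch(batch, columns):
    """Turn a batch back into table rows with a fixed column order.

    Missing fields become None; fields not listed in `columns` are dropped.
    """
    rows = []
    for line in batch.splitlines():
        if not line.strip():
            continue
        record = json.loads(line)
        rows.append(tuple(record.get(c) for c in columns))
    return rows


batch = pack_batch([
    {"game_id": 1, "region": "NA", "duration_s": 2400},
    {"game_id": 2, "region": "KR"},                      # no duration submitted
    {"game_id": 3, "region": "RU", "extra": "ignored"},  # unexpected field
])
rows = unpack_batch(batch, ["game_id", "region", "duration_s"])
print(rows)  # [(1, 'NA', 2400), (2, 'KR', None), (3, 'RU', None)]
```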

HONU

- Self-service, custom HTTP edge service (Java API)
- Custom collector infrastructure (Java), derived from Netflix Suro
- Deployed in every data center worldwide and also in AWS

DRADIS

- Custom HTTP edge service (Java)
- Fronted by an ELB in front of ~40 m1.xlarge instances
- Forwards data indirectly to S3 via Honu collectors

[Diagram: JSON messages are posted to a REST endpoint and forwarded to Honu collectors.]

[Diagram: the collectors write the messages out in batches, e.g., batchid = 20150512.]

[Diagram: a batch can contain the same game message (GAM1) several times alongside other messages (GAMX).]

IDEMPOTENCY

Use application logic to make processing idempotent:

    msg = queue.pop
    if (processed_games.contains(msg.game_id)) {
        return  // already processed: do nothing
    } else {
        process_game(msg)
        processed_games.add(msg.game_id)
    }
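Below is a runnable Python version of the pseudocode above, fed with the GAM1/GAMX duplicate pattern from the batch diagram. `GameMessage`, `consume`, and the in-memory `processed_games` set are illustrative assumptions, not Riot's implementation.

```python
from dataclasses import dataclass


@dataclass
class GameMessage:
    game_id: str
    payload: dict


def consume(messages, processed_games, process_game):
    """Process each game at most once, even if the queue redelivers it."""
    for msg in messages:
        if msg.game_id in processed_games:
            continue  # duplicate delivery: do nothing
        process_game(msg)
        processed_games.add(msg.game_id)  # record only after processing succeeds


handled = []
seen = set()
batch = [
    GameMessage("GAM1", {}),
    GameMessage("GAM1", {}),
    GameMessage("GAMX", {}),
    GameMessage("GAM1", {}),
]
consume(batch, seen, handled.append)
print([m.game_id for m in handled])  # ['GAM1', 'GAMX']
```

An in-memory set is lost on restart, so a production consumer would persist the processed IDs. Ideally it would record an ID in the same transaction that stores the game's results. Otherwise, a crash between the two steps means the game is processed again.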

THE DOWN SIDE

- What's in there? The data team doesn't know everything that is submitted.
- Compliance: are we violating international data laws?
- Inconsistent data structure: it's formatted however the developer submits it.

SELF SERVICE: HOW?

- User documentation: no one likes doing it, but it helps a lot.
- Onboard training: get new coworkers in the know.
- Familiar protocols: use REST or RPC so developers are on the same page.
- Focus on UX: your tools need to be easy for non-technical people to use.

AMAZON S3 STRUCTURE

s3n://datawarehouse/
    schema1/
        table1/
            env/
                dt/
                    time/
        table2/
        table3/
    schema2/

s3n://telemetrydata/
    application1/
        table1/
            env/
                dt/
        table2/
    application2/

HIVE

‣ schema1
    table1: env / dt / time
    table2
    table3
‣ schema2
    table1
    ...
‣ schema3
‣ schema4
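The slide names the levels of the key layout but not their value format. The sketch below builds a partition prefix so every writer follows the same naming standard. It assumes Hive-style `name=value` partition directories and uses hypothetical values; the helper `warehouse_prefix` is also hypothetical.

```python
from datetime import datetime


def warehouse_prefix(schema, table, env, ts):
    """Build the S3 prefix for one schema/table/env/dt/time partition.

    The key=value directory format is an assumption, chosen because Hive
    can map such directories onto partition columns.
    """
    return (
        f"s3n://datawarehouse/{schema}/{table}/"
        f"env={env}/dt={ts:%Y-%m-%d}/time={ts:%H%M}/"
    )


print(warehouse_prefix("schema1", "table1", "prod", datetime(2015, 11, 3, 14, 0)))
# s3n://datawarehouse/schema1/table1/env=prod/dt=2015-11-03/time=1400/
```

Generating every prefix through one function is a cheap way to enforce the "naming standards early" advice, because ad hoc paths never get written.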

DATA AUDITING

- REST micro-service built with Java and Docker
- Source and target comparison
- Reports and visualizations we can use to find problems

[Diagram: the Auditing Service sits between the Platform and the Warehouse]
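The talk does not say which metric the auditing service compares. The sketch below assumes per-day row counts have already been pulled from the platform (source) and the warehouse (target). It reports every day where they disagree, including days missing from one side. The helper `audit` and the counts are made up for illustration.

```python
def audit(source_counts, target_counts):
    """Compare per-day row counts between a source and a target.

    Returns (day, source_count, target_count) for every day that
    disagrees; a day absent on one side counts as 0 rows there.
    """
    mismatches = []
    for day in sorted(set(source_counts) | set(target_counts)):
        s = source_counts.get(day, 0)
        t = target_counts.get(day, 0)
        if s != t:
            mismatches.append((day, s, t))
    return mismatches


# Made-up counts for illustration only.
platform = {"2015-10-24": 1200, "2015-10-25": 1310, "2015-10-26": 1275}
warehouse = {"2015-10-24": 1200, "2015-10-25": 2620}
print(audit(platform, warehouse))
# [('2015-10-25', 1310, 2620), ('2015-10-26', 1275, 0)]
```

A doubled count, as on 2015-10-25 here, is the signature of a non-idempotent load that ran twice. A zero means a load never landed.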

HOW TO AUDIT

VISUALIZING

DATA ACCESS PATTERNS: BATCH, OLAP, POINT

SCALING IN AWS

RESOURCE CONTENTION

SCALING

AWS INFRASTRUCTURE TODAY

[Diagram: AWS infrastructure, organized into EMR, EC2, Storage, and Networking]

- EMR / EC2: Data Science; Analytics / Hue; ETL telemetry; Platfora; DynamoDB loading; auditing ETL; telemetry collectors; data dictionary; Rocana (real-time dashboard); Solr (real time); Point Data Service
- RDS: Metastore; Data Science; Fraud
- DynamoDB: ETL app DB; Point Data Store
- S3: source of "truth"
- Networking: VPC; AWS Direct Connect

CONCLUSION

SEAN'S PRO TIPS OF THE DAY: DO / DON'T

➔ Don't wait. Create S3 permissions and naming standards early.
➔ Get an auditing solution for data warehouse (DW) accuracy.
➔ Allocate time for tuning AWS infrastructure.
➔ Don't forget to track cost. AWS bills can surprise you.
➔ Don't underestimate simple problems in big data.
➔ Prepare for multiple data access patterns.
➔ Keep idempotency in mind and use an MQ architecture.
➔ Don't stop believing.

CHAMPION MASTERY

- Custom rewards for mastering different champions
- Intensive query that spans every game that every player has played
- Improves player engagement

PLAYER SUPPORT

- Full copy of our data warehouse in DynamoDB
- Hive -> DynamoDB dynamic partition
- Support can answer questions faster than ever

OFFENSIVE CHAT DETECTION

- Data science team queries all chat messages in game
- Sentiment analysis and classification
- Identifies negative, offensive players and mutes them automatically

QUESTIONS?

[email protected]
@SEAN_SEANNERY
ENGINEERING BLOG: engineering.riotgames.com