© 2015, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Sean Maloney, Riot Games Data Engineer @SEAN_SEANNERY October 2015 GAM303 Riot Games: Migrating Mountains of Data to AWS

(GAM303) Riot Games: Migrating Mountains of Data to AWS

Page 1: (GAM303) Riot Games: Migrating Mountains of Data to AWS

© 2015, Amazon Web Services, Inc. or its Affiliates. All rights reserved.

Sean Maloney, Riot Games Data Engineer

@SEAN_SEANNERY

October 2015

GAM303

Riot Games: Migrating Mountains of Data to AWS

Page 2: (GAM303) Riot Games: Migrating Mountains of Data to AWS

SEAN MALONEY

BIG DATA ENGINEER

WHO IS THIS GUY?

Lead developer on Riot’s ETL tools

FAVORITE ACTIVITY:

Attempting to grow facial hair but

failing miserably

Page 3: (GAM303) Riot Games: Migrating Mountains of Data to AWS

MOVING MOUNTAINS OF DATA

1. INTRODUCTION

2. WHY WE NEEDED TO MOVE

3. TRY, TRY, TRY AGAIN

4. WHAT WE CAN DO NOW

5. HOW IT IMPACTS OUR USERS

Page 4: (GAM303) Riot Games: Migrating Mountains of Data to AWS

INTRODUCTION

Page 5: (GAM303) Riot Games: Migrating Mountains of Data to AWS

WHAT IS LEAGUE OF LEGENDS?

2009 LAUNCH

ONLINE MULTIPLAYER

WINDOWS / OSX

40-50 MIN GAMES

Page 6: (GAM303) Riot Games: Migrating Mountains of Data to AWS

THE TEAM

YOUR CHAMP

THE BATTLEGROUND

Page 7: (GAM303) Riot Games: Migrating Mountains of Data to AWS
Page 8: (GAM303) Riot Games: Migrating Mountains of Data to AWS
Page 9: (GAM303) Riot Games: Migrating Mountains of Data to AWS

WHY MOVE?

Page 10: (GAM303) Riot Games: Migrating Mountains of Data to AWS
Page 11: (GAM303) Riot Games: Migrating Mountains of Data to AWS

CHAT

STORE AUDIT

Load Balancers and Firewalls

Page 12: (GAM303) Riot Games: Migrating Mountains of Data to AWS
Page 13: (GAM303) Riot Games: Migrating Mountains of Data to AWS

- 30 HADOOP NODES (CDH)

- 250 TB (FULL)

- PARTITIONS: 1.4 MILLION

- HDFS REPL FACTOR: 2 :(

SQOOP + OOZIE

Page 14: (GAM303) Riot Games: Migrating Mountains of Data to AWS

WHY MOVE?

Data center was filling up

Our game was growing!

We own our infrastructure

More game servers > More analytics servers

Page 15: (GAM303) Riot Games: Migrating Mountains of Data to AWS

WHY MOVE?

RESOURCE CONTENTION

Hive .08, pre-YARN: immature resource scheduling

Page 16: (GAM303) Riot Games: Migrating Mountains of Data to AWS

TRANSACTIONAL DATA

a b c d e f

1 2 3 4 5 6

1 2 3 4 5 6

1 2 3 4 5 6

Page 17: (GAM303) Riot Games: Migrating Mountains of Data to AWS

TRANSACTIONAL DATA

a b c d e f
1 2 3 4 5 6
1 2 3 4 5 6
1 2 3 4 5 6

SERVER TELEMETRY

map[‘a’=>1, ‘b’=>2, ‘c’=> ...]  ts
map[‘a’=>1, ‘b’=>2, ‘c’=> ...]  ts
map[‘a’=>1, ‘b’=>2, ‘c’=> ...]  ts
map[‘a’=>1, ‘b’=>2, ‘c’=> ...]  ts

Page 18: (GAM303) Riot Games: Migrating Mountains of Data to AWS

WHY MOVE?

Can’t join the data

Page 19: (GAM303) Riot Games: Migrating Mountains of Data to AWS

HIVE MAP TYPE

Captures upstream schema changes

We have a lot of upstream schema changes!

Slower performance

Page 20: (GAM303) Riot Games: Migrating Mountains of Data to AWS

AMAZON SIMPLE STORAGE SERVICE (AMAZON S3)

Page 21: (GAM303) Riot Games: Migrating Mountains of Data to AWS

TRY, TRY, TRY AGAIN

Page 22: (GAM303) Riot Games: Migrating Mountains of Data to AWS

FIRST ATTEMPT

Page 23: (GAM303) Riot Games: Migrating Mountains of Data to AWS

PROPOSED AMAZON EC2 / AMAZON EMR STRUCTURE

METASTORE (RDS)

AMAZON S3

TELEMETRY EMR | ETL EMR | USER EMR

Page 24: (GAM303) Riot Games: Migrating Mountains of Data to AWS

PROPOSED AMAZON S3 STRUCTURE

HDFS

hdfs://user/hive/warehouse/
    schema1.db/
        table1/
            realm/
                dt/
                    time/
        table2/
        table3/
    schema2.db/
        table1/
    schema3.db/
    schema4.db/

S3

s3n://datawarehouse/
    schema1/
        table1/
            env/
                dt/
                    time/
        table2/
        table3/
    schema2/

s3n://telemetrydata/
    application1/
        table1/
            env/
                dt/
        table2/
    application2/

HIVE

‣ schema1
    table1
        env
        dt
        time
    table2
    table3
‣ schema2
    table1
    ...
‣ schema3
‣ schema4
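
The mapping from the HDFS warehouse layout to the proposed S3 layout can be sketched as a small path-rewriting helper. This is only an illustration: the function name `hdfs_to_s3` is hypothetical, the two prefixes are taken from the slide, and the only rule applied is the one visible in the listing (the `.db` suffix on the schema directory is dropped). The rename of the `realm` partition to `env` is not handled here.

```python
HDFS_WAREHOUSE = "hdfs://user/hive/warehouse/"  # prefix as shown on the slide
S3_WAREHOUSE = "s3n://datawarehouse/"           # bucket as shown on the slide


def hdfs_to_s3(hdfs_path: str) -> str:
    """Map a warehouse path on HDFS to the proposed S3 layout (hypothetical helper)."""
    if not hdfs_path.startswith(HDFS_WAREHOUSE):
        raise ValueError(f"not a warehouse path: {hdfs_path}")
    # Split off the first path segment, which is the schema directory.
    schema_dir, _, rest = hdfs_path[len(HDFS_WAREHOUSE):].partition("/")
    # Hive stores each database under '<name>.db'; the S3 layout drops the suffix.
    schema = schema_dir[:-3] if schema_dir.endswith(".db") else schema_dir
    return f"{S3_WAREHOUSE}{schema}/{rest}"


print(hdfs_to_s3("hdfs://user/hive/warehouse/schema1.db/table1/part-0"))
```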

Page 25: (GAM303) Riot Games: Migrating Mountains of Data to AWS

HOW LONG?

< 6 months | 6 months to 1 year | > 1 year

Page 28: (GAM303) Riot Games: Migrating Mountains of Data to AWS

PROJECT PLANNING

DO IT IN 6 WEEKS

We had one tool that was already storing data in the cloud

Page 29: (GAM303) Riot Games: Migrating Mountains of Data to AWS

PLAN A

1. DISTCP copy -> S3 prod location

2. Create temp table in Hive on staging location

3. Insert overwrite temp table from prod table with map conversion

4. Copy files from staging to prod location

5. Choose cut-over date and repoint incoming data ETLs to S3

Page 30: (GAM303) Riot Games: Migrating Mountains of Data to AWS

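# 1. Copy the raw warehouse files from HDFS to S3
~$ hadoop distcp \
     'hdfs://riothive:54310/user/hive/warehouse/lol_prod.db/store' \
     's3n://warehouse/prod/store' &> output.log

-- 2. Create the temp table on the staging location
hive> CREATE EXTERNAL TABLE copy_stage.store_tmp LIKE prod.store
      LOCATION 's3n://warehouse/temp/store_tmp/';

-- 3. Convert each row to a map; the dynamic partition columns (env, dt, h)
--    must come last in the SELECT, in the same order as the PARTITION clause
hive> INSERT OVERWRITE TABLE copy_stage.store_tmp PARTITION (env, dt, h)
      SELECT MAP( 'id', CAST(id AS string),
                  'type', CAST(type AS string),
                  'date_created', CAST(dt_created AS string)
             ),
             CASE realm_id WHEN '1' THEN 'NA1' WHEN '2' THEN 'KR1' … END AS env,
             dt, h
      FROM lol_prod.store
      WHERE dt = $dt AND h = $h AND realm_id = $realm_id;

# 4. Copy the converted files from staging to the prod location
~$ aws s3 cp --recursive s3://warehouse/temp/store_tmp \
     s3://warehouse/prod/store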

PROPOSED AMAZON S3 STRUCTURE

Page 31: (GAM303) Riot Games: Migrating Mountains of Data to AWS

PLAN A

Over 70 tables x 15 regions to move

Python script to generate SQL

Ran SQL scripts in parallel for each table

DONE! Tell our customers! Celebrate
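
A generator script of the kind described here might look roughly like the sketch below. The talk does not show the actual script, so everything in it is an assumption: the function name, the template text, and the example table and realm lists are hypothetical, and the `SELECT ...` in the template is deliberately left elided.

```python
from itertools import product

# Hypothetical template; the real statements also performed the map conversion.
TEMPLATE = (
    "INSERT OVERWRITE TABLE copy_stage.{table}_tmp PARTITION (env, dt, h) "
    "SELECT ... FROM lol_prod.{table} WHERE realm_id = '{realm_id}';"
)


def generate_scripts(tables, realm_ids, template=TEMPLATE):
    """Return {(table, realm_id): sql}, one statement per table/region pair."""
    return {
        (table, realm_id): template.format(table=table, realm_id=realm_id)
        for table, realm_id in product(tables, realm_ids)
    }


# Example with made-up table names and realm ids.
scripts = generate_scripts(["store", "chat"], ["1", "2"])
for key, sql in sorted(scripts.items()):
    print(key, sql)
```

With 70 tables and 15 regions, the same loop would emit 1,050 statements, which is why the statements were generated rather than written by hand.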

Page 32: (GAM303) Riot Games: Migrating Mountains of Data to AWS

PLAN A IS THE WORST

MISSING PARTITIONS

CORRUPTED PARTITIONS

POOR QUERY PERFORMANCE

Page 33: (GAM303) Riot Games: Migrating Mountains of Data to AWS

LEARN FROM OUR MISTAKES

DO

➔ Use DISTCP tool to move files

➔ Audit every file that gets copied

➔ Allocate time for tuning AWS infrastructure

DON’T

➔ Don’t use Hive .08 to migrate

➔ Don’t deliver until everything is working; lost trust is hard to regain

➔ Don’t underestimate simple problems in big data

Page 34: (GAM303) Riot Games: Migrating Mountains of Data to AWS
Page 35: (GAM303) Riot Games: Migrating Mountains of Data to AWS

WHAT WOULD YOU DO?

Fix the holes with good data?

Wipe out everything, start from scratch

Give up? Move to the woods, become a lumberjack

Page 36: (GAM303) Riot Games: Migrating Mountains of Data to AWS

WHAT WE DID

Fix the holes with good data?

Wipe out everything, start from scratch

Give up? Move to the woods, become a lumberjack

Page 37: (GAM303) Riot Games: Migrating Mountains of Data to AWS

SECOND ATTEMPT

Page 38: (GAM303) Riot Games: Migrating Mountains of Data to AWS

PLAN B

Leverage our ETL tools to repair

Compare rowcounts of iron hive vs cloud hive for each partition

If rowcount bad, run script to re-import the data
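
The per-partition comparison can be sketched as follows. This is a minimal illustration, not the team's tool: the function name is hypothetical, the row counts are assumed to have been collected already into dictionaries, and the three buckets (duplicated, missing, partial) are interpreted as "more rows in the cloud", "partition absent in the cloud", and "fewer rows in the cloud", which is an assumption about what those labels meant.

```python
def classify_partitions(iron_counts, cloud_counts):
    """Bucket partitions whose cloud row count disagrees with the on-premises count.

    Both arguments map a partition key (any hashable value) to a row count.
    """
    result = {"duplicated": [], "missing": [], "partial": []}
    for partition, expected in iron_counts.items():
        actual = cloud_counts.get(partition)
        if actual is None:
            result["missing"].append(partition)      # never arrived in the cloud
        elif actual > expected:
            result["duplicated"].append(partition)   # extra rows: likely copied twice
        elif actual < expected:
            result["partial"].append(partition)      # some rows were lost
    return result


# Made-up example data keyed by (table, env, dt).
iron = {("store", "NA1", "2013-01-01"): 100,
        ("store", "NA1", "2013-01-02"): 100,
        ("store", "KR1", "2013-01-01"): 100,
        ("store", "KR1", "2013-01-02"): 100}
cloud = {("store", "NA1", "2013-01-01"): 100,
         ("store", "NA1", "2013-01-02"): 200,
         ("store", "KR1", "2013-01-01"): 40}
bad = classify_partitions(iron, cloud)
print(bad)
```

Only the partitions that land in one of the three buckets would then be handed to the re-import script.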

Page 39: (GAM303) Riot Games: Migrating Mountains of Data to AWS

ROW COUNTS

Page 40: (GAM303) Riot Games: Migrating Mountains of Data to AWS

PLAN B

duplicated data: 2540

missing partitions: 27777

partial partitions: 10528

total bad partitions: 40844 (>=2013)

10 seconds to fix dupes

10 minutes to fix missing / partial backfill

Page 41: (GAM303) Riot Games: Migrating Mountains of Data to AWS

PLAN B IS THE WORST

We didn’t have statistics enabled on the cloud hive

Row counts in Hive .08 mean MapReduce jobs

Finding bad partitions is expensive

Page 42: (GAM303) Riot Games: Migrating Mountains of Data to AWS

PLAN B IS THE WORST

Fix all tables 2013-01-01 onwards, all regions: 266 days

Fix all tables, all of time, all regions: 787 days
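
The 266-day figure is consistent with the partition counts and per-fix costs quoted on the earlier Plan B slide, if the fixes are assumed to run one at a time. That sequential assumption is not stated in the talk; the sketch below only checks the arithmetic. The 787-day figure cannot be reproduced this way because the all-time partition counts are not given.

```python
# Counts and per-fix costs as quoted on the Plan B slide (2013 onwards).
DUPLICATED = 2540
MISSING = 27777
PARTIAL = 10528

SECONDS_PER_DUPE_FIX = 10        # "10 seconds to fix dupes"
SECONDS_PER_BACKFILL = 10 * 60   # "10 minutes to fix missing / partial backfill"

# Assumption: every fix runs sequentially, with no parallelism.
total_seconds = (
    DUPLICATED * SECONDS_PER_DUPE_FIX
    + (MISSING + PARTIAL) * SECONDS_PER_BACKFILL
)
days = total_seconds / 86_400  # seconds per day
print(f"{days:.1f} days")  # prints "266.3 days"
```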

Page 43: (GAM303) Riot Games: Migrating Mountains of Data to AWS

LEARN FROM OUR MISTAKES

DO

➔ Estimate how long the move will take using extrapolation

➔ Turn on rowcount statistics in hive

➔ Get an auditing solution for DW accuracy

DON’T

➔ Don’t assume repairing is faster than starting fresh

➔ Don’t assume your source data warehouse is 100% accurate

Page 44: (GAM303) Riot Games: Migrating Mountains of Data to AWS

THIRD ATTEMPT

Page 45: (GAM303) Riot Games: Migrating Mountains of Data to AWS

3RD TIME’S THE CHARM

Start over from scratch

Modify Hadoop DISTCP tool to be data driven

MAPRED TOOL TO COPY FILES FROM HDFS->S3

Page 46: (GAM303) Riot Games: Migrating Mountains of Data to AWS

3RD TIME’S THE CHARM

Recursively list all files needed to move

Write that list to a DB table for tracking and auditing
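
The listing step could be sketched like this. It is a stand-in under stated assumptions, not the modified DISTCP tool: a local directory walked with `os.walk` plays the role of HDFS, SQLite plays the role of the tracking database, and the function, table, and column names are hypothetical.

```python
import os
import sqlite3


def build_manifest(conn, source_root, target_prefix):
    """List every file under source_root and record one tracking row per file."""
    conn.execute(
        """CREATE TABLE IF NOT EXISTS copy_manifest (
               source        TEXT PRIMARY KEY,
               target        TEXT NOT NULL,
               source_size   INTEGER NOT NULL,
               copy_status   TEXT NOT NULL DEFAULT 'not_run',
               chksum_status TEXT NOT NULL DEFAULT 'not_run'
           )"""
    )
    prefix = target_prefix.rstrip("/")
    for dirpath, _dirnames, filenames in os.walk(source_root):
        for name in filenames:
            source = os.path.join(dirpath, name)
            # Keep the same relative layout under the target prefix.
            relative = os.path.relpath(source, source_root).replace(os.sep, "/")
            # INSERT OR IGNORE makes the listing safe to re-run.
            conn.execute(
                "INSERT OR IGNORE INTO copy_manifest (source, target, source_size) "
                "VALUES (?, ?, ?)",
                (source, f"{prefix}/{relative}", os.path.getsize(source)),
            )
    conn.commit()
```

Because every file starts in the `not_run` state, the copy job can be driven entirely from the table and restarted without re-listing.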

Page 47: (GAM303) Riot Games: Migrating Mountains of Data to AWS

DATA DRIVEN COPY TOOL

appl_job_id | hdfs_source | s3_target | hdfs_size | s3_size | hdfs_chksum | s3_chksum | copy_status | chksum_status
job_xx_112 | hdfs://mytbl1/file1 | s3://mybkt1/mytbl1/file1 | 132594 | | mlk567lkm5 | | not_run | not_run
job_xx_113 | hdfs://mytbl1/file2 | s3://mybkt1/mytbl1/file2 | 292694 | | 87gf879sdf9 | | not_run | not_run
job_xx_124 | hdfs://mytbl1/file3 | s3://mybkt1/mytbl1/file3 | 3259 | | h43jhak4h5s | | not_run | not_run
job_xx_129 | hdfs://mytbl1/file4 | s3://mybkt1/mytbl1/file4 | 62484 | | fd767a7e7f6 | | not_run | not_run

Page 48: (GAM303) Riot Games: Migrating Mountains of Data to AWS

3RD TIME’S THE CHARM

Query DB for failed files and retry / debug.

Compare file sizes / checksums after copy completes

Store success / fail status for each copy job
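
The audit loop described above might be sketched as below. It is illustrative only: file contents are passed in as bytes, MD5 stands in for whatever checksum the real tool compared (the talk does not say how HDFS and S3 checksums were made comparable), and the table and function names are hypothetical.

```python
import hashlib
import sqlite3


def md5_of(data: bytes) -> str:
    return hashlib.md5(data).hexdigest()


def record_result(conn, job_id, source_bytes, target_bytes):
    """Store copy and checksum status for one file.

    target_bytes is None when the copy itself failed.
    """
    if target_bytes is None:
        copy_status, chksum_status = "failed", "not_run"
    else:
        copy_status = "success"
        same = (len(source_bytes) == len(target_bytes)
                and md5_of(source_bytes) == md5_of(target_bytes))
        chksum_status = "success" if same else "failed"
    conn.execute(
        "INSERT OR REPLACE INTO copy_audit VALUES (?, ?, ?)",
        (job_id, copy_status, chksum_status),
    )


def jobs_to_retry(conn):
    """Return the ids of jobs whose copy or checksum did not succeed."""
    rows = conn.execute(
        "SELECT job_id FROM copy_audit "
        "WHERE copy_status != 'success' OR chksum_status != 'success' "
        "ORDER BY job_id"
    )
    return [row[0] for row in rows]


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE copy_audit ("
    "job_id TEXT PRIMARY KEY, copy_status TEXT, chksum_status TEXT)"
)
record_result(conn, "job_xx_112", b"abc", b"abc")  # clean copy
record_result(conn, "job_xx_113", b"abc", None)    # copy failed
record_result(conn, "job_xx_124", b"abc", b"abd")  # same size, different content
print(jobs_to_retry(conn))  # prints ['job_xx_113', 'job_xx_124']
```

Keeping the status in a table rather than in log files is what makes the retry step a simple query.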

Page 49: (GAM303) Riot Games: Migrating Mountains of Data to AWS

DATA DRIVEN COPY TOOL

appl_job_id | hdfs_source | s3_target | hdfs_size | s3_size | hdfs_chksum | s3_chksum | copy_status | chksum_status
job_xx_112 | hdfs://mytbl1/file1 | s3://mybkt1/mytbl1/file1 | 132594 | 132594 | mlk567lkm5 | mlk567lkm5 | success | success
job_xx_113 | hdfs://mytbl1/file2 | s3://mybkt1/mytbl1/file2 | 292694 | | 87gf879sdf9 | | failed | not_run
job_xx_124 | hdfs://mytbl1/file3 | s3://mybkt1/mytbl1/file3 | 3259 | 3259 | h43jhak4h5s | fg53hj65un | success | failed
job_xx_129 | hdfs://mytbl1/file4 | s3://mybkt1/mytbl1/file4 | 62484 | 62484 | fd767a7e7f6 | fd767a7e7f6 | success | success

Page 50: (GAM303) Riot Games: Migrating Mountains of Data to AWS

LEARN FROM OUR MISTAKES

DO

➔ Make your migration tool repeatable

➔ Create S3 permissions and naming standards early

➔ Upgrade your hive version to more stable releases

➔ Hire people smarter than yourself

DON’T

➔ Don’t wait too long to migrate or else DISTCP might have issues

➔ Don’t forget to clean up temp S3 files

➔ Don’t stop. Believing.

Page 51: (GAM303) Riot Games: Migrating Mountains of Data to AWS

hive> SHOW SCHEMAS;

OK

copy_stage

test_warehouse

prod_warehouse

DELETE_ME_1

DELETE_ME_2

DELETE_ME_3

DELETE_ME_4

DELETE_ME_5

insights_tech

data_science

sand_box

Time taken: 0.457 seconds, Fetched: 11 row(s)

Page 52: (GAM303) Riot Games: Migrating Mountains of Data to AWS

OOPS

NOTE TO SELF:

Even if a database schema is named ‘DELETE_ME_1’, check where Hive managed tables are pointed before running CASCADE DELETE

Also, turn on S3 versioning

Page 53: (GAM303) Riot Games: Migrating Mountains of Data to AWS

WHAT CAN WE DO NOW?

Page 54: (GAM303) Riot Games: Migrating Mountains of Data to AWS

POST-MOVE STRUCTURE

METASTORE (RDS)

AMAZON S3

TELEMETRY EMR | ETL EMR | USER EMR

Page 55: (GAM303) Riot Games: Migrating Mountains of Data to AWS

AWS INFRASTRUCTURE TODAY

[Architecture diagram. Section labels: EMR, EC2, Amazon RDS, Amazon DynamoDB, Storage, Networking.]

[Component labels, in extraction order: Data Science, Analytics / Hue, ETL, Telemetry, Platfora, Loading, Auditing, ETL, Telemetry collectors, Data dictionary, Rocana (real time dashboard), Solr (real time), Point Data Service, Metastore, Data Science, Fraud, ETL App DB, Point Data Store, S3 (Source of “Truth”), VPC, AWS Direct Connect (x4).]

Page 56: (GAM303) Riot Games: Migrating Mountains of Data to AWS

NEW AND IMPROVED

Create ad-hoc EMR clusters

Track billing for teams using our resources

Amazon CloudWatch Monitoring

Page 57: (GAM303) Riot Games: Migrating Mountains of Data to AWS

NEW AND IMPROVED

Easy Metastore Scaling

Don’t have to manage HDFS name nodes

No more debugging hardware issues (just spin up a new instance)

Page 58: (GAM303) Riot Games: Migrating Mountains of Data to AWS

FOR THE USERS

Page 59: (GAM303) Riot Games: Migrating Mountains of Data to AWS

CHAMPION MASTERY

Custom rewards for mastering different champions

Intensive query that spans every game that every player has played

Improves player engagement

Page 60: (GAM303) Riot Games: Migrating Mountains of Data to AWS

PLAYER SUPPORT

Full copy of our data warehouse in DynamoDB

Hive->DynamoDB Dynamic Partition

Support can answer questions faster than ever

Page 61: (GAM303) Riot Games: Migrating Mountains of Data to AWS

OFFENSIVE CHAT DETECTION

Data science team queries all chat messages in game

Sentiment analysis and classification

Identifies negative, offensive players and mutes them automatically

Page 62: (GAM303) Riot Games: Migrating Mountains of Data to AWS

FINAL THOUGHTS...

Page 63: (GAM303) Riot Games: Migrating Mountains of Data to AWS

QUESTIONS?

@SEAN_SEANNERY

ENGINEERING BLOG: engineering.riotgames.com

EAT. DRINK. PLAY

re:Invent After Party

TONIGHT! 6pm-10pm @ Palazzo Tower, 3rd Floor - Palazzo Parlor