Tailored for Spark
Hadoop Summit Dublin 2016
Petr Igrevski, John Scheibmeir, eBay

eBay - Tailored for Spark 2

How to tailor Spark for maximum impact

1. Optimal infrastructure

2. Customized user experience

Outline

1. eBay, Analytics, Hadoop, and Spark
2. Spark Opportunities at eBay
3. Q/A

BACKGROUND

eBay

Q4 2015

Analytics at eBay

Analytics
• BI – Kylin, MicroStrategy, Tableau, R / SAS
• ETL – Ab Initio
• Data Platform – Hadoop, Teradata
• Streaming, Spark

Hadoop at eBay

1. Search Index
2. Log Management
3. Operation Metric Management
4. Analytics

Hadoop Hardware

Multiple Generations

12-18 Cores

72-128GB RAM

24-72TB Storage

Provisioned by cabinet
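Job sizing, one of the Spark challenges on a later slide, depends on node specs like these. The Python sketch below estimates how many executors fit on a node and how to split each container between executor heap and YARN memory overhead. The 1 core and 8 GB reserved per node and the 5 cores per executor are assumptions for illustration, not eBay settings; the overhead rule follows the Spark 1.x YARN default of max(384 MB, 10% of executor memory). NodeSpec, ExecutorPlan and plan_executors are hypothetical names.

```python
import math
from dataclasses import dataclass

# Spark 1.x on YARN default: overhead = max(384 MB, 10% of spark.executor.memory).
OVERHEAD_FRACTION = 0.10
OVERHEAD_MIN_MB = 384


@dataclass(frozen=True)
class NodeSpec:
    cores: int
    ram_gb: int


@dataclass(frozen=True)
class ExecutorPlan:
    executors_per_node: int
    cores_per_executor: int   # would map to spark.executor.cores
    executor_memory_mb: int   # would map to spark.executor.memory
    overhead_mb: int          # would map to spark.yarn.executor.memoryOverhead


def plan_executors(node, cores_per_executor=5, reserved_cores=1, reserved_gb=8):
    """Estimate a per-node executor layout.

    Assumed, not from the talk: reserved_cores and reserved_gb are held back for
    the OS and Hadoop daemons, and every executor gets cores_per_executor cores.
    """
    usable_cores = node.cores - reserved_cores
    usable_mb = (node.ram_gb - reserved_gb) * 1024
    executors = usable_cores // cores_per_executor
    if executors < 1:
        raise ValueError("node cannot host a single executor with these settings")
    container_mb = usable_mb // executors
    # Largest heap such that heap + max(384, 0.10 * heap) fits in the container.
    heap_mb = math.floor(container_mb / (1 + OVERHEAD_FRACTION))
    if heap_mb * OVERHEAD_FRACTION < OVERHEAD_MIN_MB:
        heap_mb = container_mb - OVERHEAD_MIN_MB
    if heap_mb <= 0:
        raise ValueError("not enough memory left for an executor heap")
    overhead_mb = max(OVERHEAD_MIN_MB, int(heap_mb * OVERHEAD_FRACTION))
    return ExecutorPlan(executors, cores_per_executor, heap_mb, overhead_mb)


if __name__ == "__main__":
    # The two ends of the hardware range listed on this slide.
    for node in (NodeSpec(cores=12, ram_gb=72), NodeSpec(cores=18, ram_gb=128)):
        print(node, plan_executors(node))
```

For a 12-core, 72 GB node this yields two 5-core executors of about 29 GB heap each. YARN also normalizes container requests to scheduler-specific increments, which this sketch ignores.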

Spark at eBay

• Versions in use
  – Spark 1.4 to Spark 1.6

• Deployment method
  – YARN

• Current utilization
  – 20% of analytic clusters

• Use Cases
  – Purchase Suggestions
  – Marketing Optimization
  – Customer Interests, Consistency, and Similarity
  – Kylin Cube Building

Spark Challenges

• Capacity Management and Efficiency
  – MapReduce => YARN
  – Job Sizing

• Support
  – Missing vendor support
  – Missing expertise

• Deployment
  – Library conflicts
  – Configuration challenges
  – Distribution sprawl

• Integration
  – Configuration
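Library conflicts often surface as two versions of the same dependency on the driver or executor classpath. The sketch below, assuming jar files follow the Maven artifactId-version.jar naming convention, groups jar file names by artifact and reports any artifact present in more than one version. find_version_conflicts and the example paths are hypothetical, and the check is only a heuristic over file names, not a full dependency resolver.

```python
import re
from collections import defaultdict

# Assumes Maven-style file names: <artifactId>-<version>.jar, version starting with a digit.
JAR_NAME = re.compile(r"^(?P<artifact>.+?)-(?P<version>\d[\w.\-]*)\.jar$")


def find_version_conflicts(jar_paths):
    """Return {artifact: [versions]} for artifacts found in more than one version."""
    versions = defaultdict(set)
    for path in jar_paths:
        filename = path.rsplit("/", 1)[-1]
        match = JAR_NAME.match(filename)
        if match:  # file names that don't fit the convention are skipped
            versions[match.group("artifact")].add(match.group("version"))
    # Versions are sorted as plain strings, not semantically.
    return {artifact: sorted(found) for artifact, found in versions.items() if len(found) > 1}


if __name__ == "__main__":
    classpath = [  # hypothetical paths
        "/opt/hadoop/share/hadoop/common/lib/guava-11.0.2.jar",
        "/opt/jobs/purchase-suggestions/lib/guava-14.0.1.jar",
        "/opt/jobs/purchase-suggestions/lib/jackson-databind-2.4.4.jar",
    ]
    print(find_version_conflicts(classpath))  # {'guava': ['11.0.2', '14.0.1']}
```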

TAILORING SPARK

"Simple things should be simple. Complex things should be possible."
– Alan Kay

We can
• Copy
• Test
• Run

Opportunities for Spark

• Flexibility
• Usability
• Simplicity
• Speed
• Transparency

On YARN

• Security
• Multitenancy
• Reliability
• Experience
• Performance

[Diagram: Spark running on YARN, secured with Kerberos, over HDFS, SWIFT, and NFS storage]

Does it fit?

• Compute – shared compute resources
• Storage – independently scalable storage
• Network – flat network
• Provisioning

Can we make it feel better?

• Standard ADLC
• Test to your level of comfort
• Single-click deployment
• Watch every step
• Certify your job
• Let it run
• Did you say UI?

[Diagram: Development → Test → Packaging → Certification → Runtime, with supporting components labeled Register, Repos, CI, Metadata DB, Provisioning, Runtime farm, and Orchestrator]
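The lifecycle on this slide can be read as a gated state machine: a job advances one stage at a time and reaches Runtime only after passing Certification. A minimal Python sketch of that reading follows; the Stage enum, the SparkJob class and the boolean gate are hypothetical illustrations, not eBay's orchestrator API.

```python
from enum import IntEnum


class Stage(IntEnum):
    # Stage order taken from the slide: Development, Test, Packaging, Certification, Runtime.
    DEVELOPMENT = 1
    TEST = 2
    PACKAGING = 3
    CERTIFICATION = 4
    RUNTIME = 5


class SparkJob:
    """Hypothetical job record that can only advance one stage at a time."""

    def __init__(self, name):
        self.name = name
        self.stage = Stage.DEVELOPMENT
        self.history = [self.stage]

    def promote(self, gate_passed):
        """Advance to the next stage only if the current stage's check passed."""
        if self.stage is Stage.RUNTIME:
            raise ValueError(f"{self.name} is already in {self.stage.name}")
        if not gate_passed:
            raise ValueError(f"{self.name} failed the {self.stage.name} gate")
        self.stage = Stage(self.stage + 1)
        self.history.append(self.stage)
        return self.stage


if __name__ == "__main__":
    job = SparkJob("purchase-suggestions")  # hypothetical job name
    while job.stage is not Stage.RUNTIME:
        job.promote(gate_passed=True)
    print([stage.name for stage in job.history])
```

Because promotion is strictly sequential, there is no path from Packaging to Runtime that skips Certification, which is the property the "Certify your job, let it run" bullets describe.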

Q/A
