Transcript
Page 1: How LinkedIn Democratizes Big Data Visualization

Photo: Courtesy of O'Reilly Conference on Flickr

How LinkedIn Democratizes Big Data Visualization

Page 2: How LinkedIn Democratizes Big Data Visualization

How LinkedIn Democratizes Big Data Visualization

Jonathan Wu

Praveen Neppalli Naga

Chi-Yi Kuan

Page 3: How LinkedIn Democratizes Big Data Visualization

313,000,000 Members

End of Q2 2014

Page 4: How LinkedIn Democratizes Big Data Visualization

25,000,000,000 Page Views

Q2 2014

Page 5: How LinkedIn Democratizes Big Data Visualization

3,000,000+ Endorsements

Page 6: How LinkedIn Democratizes Big Data Visualization

3,500,000+ Companies

Page 7: How LinkedIn Democratizes Big Data Visualization

What can we do with LinkedIn data?

Page 8: How LinkedIn Democratizes Big Data Visualization

Sales

Talent flow between companies

Page 9: How LinkedIn Democratizes Big Data Visualization

Product & engineering

Page 10: How LinkedIn Democratizes Big Data Visualization

Is it simple?

Member attributes

Page View events data

Page 11: How LinkedIn Democratizes Big Data Visualization

Photo Credit: https://www.flickr.com/photos/johnjoh/1060267344

Data is the new vineyard

Page 12: How LinkedIn Democratizes Big Data Visualization

Photo Credit: https://www.flickr.com/photos/johnjoh/1060267344

Data is the new vineyard

Page 13: How LinkedIn Democratizes Big Data Visualization

Data infra: collect & prepare data

Collect & Prepare Data: MySQL, Oracle, Kafka + Hadoop

Serve Data: Pinot

Taste Data: Easy-to-use visualization

Page 14: How LinkedIn Democratizes Big Data Visualization

Data Computation

[Diagram: Kafka and Data Stores feed Hadoop through ETL. Hadoop layers shown: HDFS; YARN; Map-Reduce, Spark, Tez; Pig, Hive, Cubert]

Page 15: How LinkedIn Democratizes Big Data Visualization

Data infra: Serve data

Collect & Prepare Data: Kafka + Hadoop

Serve Data: Pinot

Taste Data: Easy-to-use visualization

Page 16: How LinkedIn Democratizes Big Data Visualization

Categories of interactive analytics products

Products for members/customers with real-time interactive analytics

• Who’s Viewed Your Profile

• Ads Reporting

• Jobs Analytics

Interactive business analytics for internal use

• How feature X is performing

Real-time business monitoring

• Page view changes across mobile devices in different regions

Page 17: How LinkedIn Democratizes Big Data Visualization

Requirements for real-time interactive analytics

Slice and dice billions of records, hundreds of dimensions

End-to-end freshness of minutes, not hours

Sub-second query response times

e.g. Which are the top regions that contribute to my profile views? Which industries in those regions?
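
The example question on this slide is a two-step slice and dice: rank one dimension, then drill into a second dimension within the top values. A minimal Python sketch of that query shape follows. The event records, the field names (`region`, `industry`), and the `top_values` helper are illustrative assumptions, not LinkedIn data or Pinot's query API, and an in-memory list says nothing about the scale or latency targets above.

```python
from collections import Counter

# Hypothetical profile-view events; field names are illustrative only.
PROFILE_VIEWS = [
    {"region": "San Francisco Bay Area", "industry": "Internet"},
    {"region": "San Francisco Bay Area", "industry": "Internet"},
    {"region": "San Francisco Bay Area", "industry": "Finance"},
    {"region": "New York", "industry": "Finance"},
    {"region": "New York", "industry": "Media"},
    {"region": "London", "industry": "Finance"},
]


def top_values(events, dimension, n, **filters):
    """Count events per value of `dimension`, keeping only events matching `filters`."""
    counts = Counter(
        event[dimension]
        for event in events
        if all(event[key] == value for key, value in filters.items())
    )
    return counts.most_common(n)


# Step 1: which regions contribute the most profile views?
top_regions = top_values(PROFILE_VIEWS, "region", 2)

# Step 2: within each of those regions, which industries contribute the most?
drill_down = {
    region: top_values(PROFILE_VIEWS, "industry", 2, region=region)
    for region, _ in top_regions
}
```

Each drill-down is the same aggregation with one extra filter, which is why this kind of workload benefits from indexes on the filtered dimensions.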

Page 18: How LinkedIn Democratizes Big Data Visualization

Pinot

Distributed Analytics Infrastructure that serves Interactive Analytics products at LinkedIn.

Page 19: How LinkedIn Democratizes Big Data Visualization

What is Pinot?

Data Indexes: Compressed columnar indexes (supports mmap and in-memory)

Distributed System: Apache Helix for cluster management

Ingestion: Apache Kafka (for near real-time) and Hadoop

Page 20: How LinkedIn Democratizes Big Data Visualization

Data Indexes

Single Value Index

• Fixed bit length encoding

• Sorted Index

• Secondary Sorted Index

Multi Value Index

• Multi-value Fixed bit length encoding

• BitMap Multi-value Index

Inverted Index

• P4Delta

• Modified P4Delta

• BitMap
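
Two of the index types named on this slide, fixed bit length encoding and a bitmap inverted index, can be illustrated with a small Python sketch. This is a generic textbook version under stated assumptions: the slide does not describe Pinot's on-disk layout, all function names here are hypothetical, and a Python integer stands in for a packed byte buffer.

```python
def build_dictionary(values):
    """Map each distinct value to a small integer id (ids follow sorted order)."""
    return {value: i for i, value in enumerate(sorted(set(values)))}


def bits_needed(cardinality):
    """Fixed number of bits required to store any id in [0, cardinality)."""
    return max(1, (cardinality - 1).bit_length())


def pack(ids, width):
    """Pack ids into one integer, using exactly `width` bits per row."""
    packed = 0
    for row, value_id in enumerate(ids):
        packed |= value_id << (row * width)
    return packed


def unpack(packed, width, row):
    """Read back the id stored for `row`."""
    return (packed >> (row * width)) & ((1 << width) - 1)


def build_bitmap_index(ids):
    """Inverted index: per id, an integer whose bit `row` is set when that row holds the id."""
    bitmaps = {}
    for row, value_id in enumerate(ids):
        bitmaps[value_id] = bitmaps.get(value_id, 0) | (1 << row)
    return bitmaps


def matching_rows(bitmap):
    """Expand a bitmap into the list of row numbers it contains."""
    return [row for row in range(bitmap.bit_length()) if (bitmap >> row) & 1]


column = ["US", "IN", "US", "UK", "IN", "US"]
dictionary = build_dictionary(column)       # {"IN": 0, "UK": 1, "US": 2}
ids = [dictionary[value] for value in column]
width = bits_needed(len(dictionary))        # 3 distinct values fit in 2 bits
packed = pack(ids, width)                   # forward index: row -> id
bitmaps = build_bitmap_index(ids)           # inverted index: id -> rows
us_rows = matching_rows(bitmaps[dictionary["US"]])
```

The forward index answers "what value does row N hold?" in constant time because every row uses the same number of bits, while the inverted index answers "which rows hold value V?" without scanning the column.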

Page 21: How LinkedIn Democratizes Big Data Visualization

Cluster Management

• Create Resources

• Update Resource metadata

• Expand/Contract partitions dynamically

• Query Router

Page 22: How LinkedIn Democratizes Big Data Visualization

Data Ingestion

Kafka for Realtime

Hadoop for Historical
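
When the same metric arrives through both paths, a query has to combine them without counting overlapping periods twice. The Python sketch below shows one common way to do that, a time boundary that splits the query between the two sources. The slide does not spell out this mechanism, so treat the boundary rule, the `count_views` function, and the sample numbers as assumptions for illustration.

```python
def count_views(historical, realtime, boundary_day):
    """Sum views across both sources without double counting.

    Days up to and including `boundary_day` are read from the historical
    (Hadoop-built) data; later days are read from the realtime (Kafka-fed) data.
    """
    old = sum(views for day, views in historical.items() if day <= boundary_day)
    new = sum(views for day, views in realtime.items() if day > boundary_day)
    return old + new


# Hypothetical daily page-view counts keyed by day number.
historical = {1: 100, 2: 120, 3: 90}   # complete, batch-computed days
realtime = {3: 88, 4: 40}              # day 3 overlaps and is still partial here

# Assumption: trust historical data for every day it fully covers.
boundary_day = max(historical)
total = count_views(historical, realtime, boundary_day)
```

In this sketch day 3 comes from the historical data only, so the realtime copy of that day is ignored.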

Page 23: How LinkedIn Democratizes Big Data Visualization

High Level Architecture

[Diagram: PINOT]

Data sources: Hadoop (Historical), Kafka (Realtime)

Cluster manager: Controller, Helix, Zookeeper

Brokers: Broker 1, Broker 2

Servers: Server 1, Server 2, Server 3
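
The broker and server tiers in this diagram suggest a scatter-gather query path: a broker sends the query to each server, and each server answers over its own slice of the data. A minimal Python sketch of that pattern follows. The `Server` and `Broker` classes, their `count_by` method, and the sample rows are hypothetical stand-ins rather than Pinot components or APIs, and threads stand in for network calls.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


class Server:
    """Holds one slice of the rows and answers queries over that slice only."""

    def __init__(self, rows):
        self.rows = rows

    def count_by(self, dimension):
        return Counter(row[dimension] for row in self.rows)


class Broker:
    """Scatters a query to every server, then gathers and merges the partial counts."""

    def __init__(self, servers):
        self.servers = servers

    def count_by(self, dimension):
        merged = Counter()
        with ThreadPoolExecutor(max_workers=len(self.servers)) as pool:
            for partial in pool.map(lambda s: s.count_by(dimension), self.servers):
                merged.update(partial)  # Counter.update adds counts together
        return merged


servers = [
    Server([{"device": "mobile"}, {"device": "desktop"}]),
    Server([{"device": "mobile"}]),
    Server([{"device": "desktop"}, {"device": "mobile"}]),
]
broker = Broker(servers)
page_views_by_device = broker.count_by("device")
```

Counts merge cleanly because they are additive; aggregations such as distinct counts or percentiles would need a different merge step.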

Page 24: How LinkedIn Democratizes Big Data Visualization

Core Features

Low latency and high QPS OLAP Queries with real-time ingestion

Support complex dimensions

Operational simplicity

Data bootstrapping & reconciliation

Page 25: How LinkedIn Democratizes Big Data Visualization

Usage @ LinkedIn

About 18 member-facing products on LinkedIn.com

Internal Reporting

Open source: coming soon

Page 26: How LinkedIn Democratizes Big Data Visualization

Reporting UI: serve & taste data

Collect & Prepare Data: Kafka + Hadoop

Serve Data: Pinot

Taste Data: Easy-to-use visualization

Page 27: How LinkedIn Democratizes Big Data Visualization

Business need

I want to access big data without running SQL

Page 28: How LinkedIn Democratizes Big Data Visualization

Start a new dashboard with one click

Page 29: How LinkedIn Democratizes Big Data Visualization

Select what metrics/dimensions you want

Page 30: How LinkedIn Democratizes Big Data Visualization

Charts are rendered in just a few seconds

Page 31: How LinkedIn Democratizes Big Data Visualization

Zoom into a single chart

Page 32: How LinkedIn Democratizes Big Data Visualization

Filter on various dimensions

Page 33: How LinkedIn Democratizes Big Data Visualization

Access everywhere

Page 34: How LinkedIn Democratizes Big Data Visualization

Enterprise analytics portal

Portal that connects dashboards, internal reports, and internal Wiki Pages

Page 35: How LinkedIn Democratizes Big Data Visualization

Summary

Scale of the data

Pinot for interactive analysis

Self-service visualization for insights

Page 36: How LinkedIn Democratizes Big Data Visualization
Page 37: How LinkedIn Democratizes Big Data Visualization

We are hiring

Jonathan Wu: www.linkedin.com/in/jiyewu, [email protected]

Praveen Neppalli Naga: www.linkedin.com/in/pneppalli, [email protected]

Chi-yi Kuan: www.linkedin.com/in/chiyikuan, [email protected]

650-605-2184

650-962-3299

650-426-6301

