How LinkedIn Democratizes Big Data Visualization

Photo: Courtesy of O'Reilly Conference on Flickr

Jonathan Wu

Praveen Neppalli Naga

Chi-Yi Kuan

313,000,000 members (end of Q2 2014)

25,000,000,000 page views (Q2 2014)

3,000,000+ Endorsements

3,500,000+ Companies

What can we do with LinkedIn data?

Sales

Talent flow between companies

Product & engineering

Is it simple?

Member attributes + page view event data

Photo Credit: https://www.flickr.com/photos/johnjoh/1060267344

Data is the new vineyard

Data infra: collect & prepare data

Collect & prepare data: MySQL, Oracle, Kafka + Hadoop

Serve data: Pinot

Taste data: easy-to-use visualization

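Data computation stack (diagram):

• Sources: Kafka and data stores, loaded via ETL into Hadoop
• Storage: HDFS
• Resource management: YARN
• Execution engines: Map-Reduce, Spark, Tez
• Languages and frameworks: Pig, Hive, Cubert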
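The Map-Reduce layer in this stack is what turns raw event logs into aggregates that are ready to serve. As a minimal sketch in plain Python (not the Hadoop API), assuming hypothetical page-view events with `region` and `device` fields, the map, shuffle and reduce phases of a page-view rollup look like this:

```python
from collections import defaultdict
from typing import Iterable, Iterator

# Hypothetical page-view event: field names are illustrative, not LinkedIn's schema.
Event = dict[str, str]
Key = tuple[str, str]


def map_phase(events: Iterable[Event]) -> Iterator[tuple[Key, int]]:
    """Emit ((region, device), 1) for every page-view event."""
    for event in events:
        yield (event["region"], event["device"]), 1


def shuffle_phase(pairs: Iterable[tuple[Key, int]]) -> dict[Key, list[int]]:
    """Group emitted values by key, as the framework does between map and reduce."""
    groups: dict[Key, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups: dict[Key, list[int]]) -> dict[Key, int]:
    """Sum the counts collected for each key."""
    return {key: sum(values) for key, values in groups.items()}


def rollup_page_views(events: Iterable[Event]) -> dict[Key, int]:
    return reduce_phase(shuffle_phase(map_phase(events)))


if __name__ == "__main__":
    sample = [
        {"region": "us", "device": "ios"},
        {"region": "us", "device": "ios"},
        {"region": "in", "device": "android"},
    ]
    print(rollup_page_views(sample))  # {('us', 'ios'): 2, ('in', 'android'): 1}
```

A real job runs each phase in parallel across the cluster; the sketch only shows that the rollup decomposes into independent per-key sums.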

Data infra: Serve data

Collect & prepare data: Kafka + Hadoop

Serve data: Pinot


Categories of interactive analytics products

Products for members/customers with real-time interactive analytics
• Who's Viewed Your Profile
• Ads Reporting
• Jobs Analytics

Interactive business analytics for internal use
• How feature X is performing

Real-time business monitoring
• Page view changes across mobile devices in different regions

Requirements for real-time interactive analytics

• Slice and dice billions of records across hundreds of dimensions
• End-to-end data freshness of minutes, not hours
• Sub-second query response times

e.g. Which regions contribute most to my profile views? Which industries within those regions?
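The example question is a group-by with a top-k, followed by a drill-down filtered on the winners. Pinot answers this with indexes over compressed columns; the brute-force sketch below, over hypothetical profile-view records with `region` and `industry` fields, only illustrates the shape of the query:

```python
from collections import Counter
from typing import Callable, Optional

# Hypothetical profile-view record: one row per view, with the viewer's dimensions.
Record = dict[str, str]


def top_k(
    records: list[Record],
    dimension: str,
    k: int,
    where: Optional[Callable[[Record], bool]] = None,
) -> list[tuple[str, int]]:
    """Count records per value of `dimension` (after an optional filter); return the k largest."""
    counts = Counter(r[dimension] for r in records if where is None or where(r))
    return counts.most_common(k)


def drill_down(
    records: list[Record], k_regions: int, k_industries: int
) -> dict[str, list[tuple[str, int]]]:
    """Top regions by view count, then the top industries within each of those regions."""
    regions = [region for region, _ in top_k(records, "region", k_regions)]
    return {
        # Bind the current region as a default argument so each filter keeps its own value.
        region: top_k(records, "industry", k_industries, where=lambda r, reg=region: r["region"] == reg)
        for region in regions
    }


if __name__ == "__main__":
    views = [
        {"region": "us", "industry": "software"},
        {"region": "us", "industry": "software"},
        {"region": "us", "industry": "finance"},
        {"region": "in", "industry": "software"},
        {"region": "in", "industry": "software"},
        {"region": "de", "industry": "automotive"},
    ]
    print(top_k(views, "region", 2))  # [('us', 3), ('in', 2)]
    print(drill_down(views, 2, 1))    # {'us': [('software', 2)], 'in': [('software', 2)]}
```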

Pinot

Distributed analytics infrastructure that serves interactive analytics products at LinkedIn.

What is Pinot?

• Data indexes: compressed columnar indexes (supports mmap and in-memory loading)
• Distributed system: Apache Helix for cluster management
• Ingestion: Apache Kafka (for near real-time) and Hadoop

Data Indexes

Single-value index
• Fixed bit length encoding
• Sorted index
• Secondary sorted index

Multi-value index
• Multi-value fixed bit length encoding
• BitMap multi-value index

Inverted index
• P4Delta
• Modified P4Delta
• BitMap
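Several of the single-value and inverted index types above can be illustrated on one dictionary-encoded column. This is a minimal sketch, not Pinot's implementation: Python integers stand in for bit buffers, bitmaps are uncompressed, and P4Delta compression and the multi-value variants are not shown.

```python
def build_dictionary(values: list[str]) -> tuple[list[str], dict[str, int]]:
    """Assign each distinct value a dense integer id, in sorted value order."""
    distinct = sorted(set(values))
    return distinct, {value: i for i, value in enumerate(distinct)}


def bits_needed(cardinality: int) -> int:
    """Smallest bit width that can hold ids 0..cardinality-1 (at least 1 bit)."""
    return max(1, (cardinality - 1).bit_length())


def pack_fixed_bit(ids: list[int], width: int) -> int:
    """Forward index: store each document's id in `width` bits of one bit buffer."""
    packed = 0
    for doc_id, value_id in enumerate(ids):
        packed |= value_id << (doc_id * width)
    return packed


def read_fixed_bit(packed: int, width: int, doc_id: int) -> int:
    """Read back the id stored for one document."""
    return (packed >> (doc_id * width)) & ((1 << width) - 1)


def build_bitmap_index(ids: list[int], cardinality: int) -> list[int]:
    """Inverted index: one bitmap per value id, with bit d set if document d has that value."""
    bitmaps = [0] * cardinality
    for doc_id, value_id in enumerate(ids):
        bitmaps[value_id] |= 1 << doc_id
    return bitmaps


def docs_matching(bitmap: int) -> list[int]:
    """Decode a bitmap into the list of matching document ids."""
    return [d for d in range(bitmap.bit_length()) if (bitmap >> d) & 1]


def build_sorted_index(sorted_ids: list[int]) -> dict[int, tuple[int, int]]:
    """Sorted index: when documents are sorted on this column, keep only each
    value's first and last document id instead of one entry per document."""
    ranges: dict[int, tuple[int, int]] = {}
    for doc_id, value_id in enumerate(sorted_ids):
        start = ranges[value_id][0] if value_id in ranges else doc_id
        ranges[value_id] = (start, doc_id)
    return ranges


if __name__ == "__main__":
    country = ["us", "in", "us", "de", "in", "us"]
    distinct, to_id = build_dictionary(country)   # ['de', 'in', 'us']
    ids = [to_id[c] for c in country]             # [2, 1, 2, 0, 1, 2]
    width = bits_needed(len(distinct))            # 2 bits per document
    packed = pack_fixed_bit(ids, width)
    print(distinct[read_fixed_bit(packed, width, 3)])  # de
    bitmaps = build_bitmap_index(ids, len(distinct))
    print(docs_matching(bitmaps[to_id["us"]]))         # [0, 2, 5]
    print(build_sorted_index(sorted(ids)))             # {0: (0, 0), 1: (1, 2), 2: (3, 5)}
```

The fixed bit width depends only on the column's cardinality, so a low-cardinality dimension such as country costs a few bits per row instead of a full string.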

Cluster Management

• Create Resources

• Update resource metadata

• Expand/Contract partitions dynamically

• Query Router
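Query routing depends on knowing which servers hold which partitions, which is the kind of state a cluster manager tracks. As a hypothetical sketch (the least-loaded balancing rule is an assumption, not Pinot's documented strategy), a router could pick one replica per partition so every partition is covered and no server gets more than its share:

```python
from collections import defaultdict


def build_routing_table(partition_replicas: dict[str, list[str]]) -> dict[str, list[str]]:
    """Choose one replica server per partition so that per-server load stays even.

    partition_replicas maps a partition name to the servers hosting a copy of it.
    Returns server -> partitions that server should be queried for.
    """
    load: dict[str, int] = defaultdict(int)
    routing: dict[str, list[str]] = defaultdict(list)
    for partition in sorted(partition_replicas):
        servers = partition_replicas[partition]
        if not servers:
            raise ValueError(f"partition {partition!r} has no online replica")
        # Least-loaded replica wins; ties go to the lexicographically smallest name.
        chosen = min(servers, key=lambda s: (load[s], s))
        load[chosen] += 1
        routing[chosen].append(partition)
    return dict(routing)


if __name__ == "__main__":
    assignment = {
        "p0": ["server1", "server2"],
        "p1": ["server1", "server2"],
        "p2": ["server2", "server3"],
        "p3": ["server1", "server3"],
    }
    print(build_routing_table(assignment))
    # {'server1': ['p0', 'p3'], 'server2': ['p1'], 'server3': ['p2']}
```

When partitions expand or contract, the router only needs to rebuild this table from the updated assignment.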

Data Ingestion

Kafka for Realtime

Hadoop for Historical
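Serving from both ingestion paths means each time period should be answered from exactly one of them, or overlapping days get counted twice. A minimal sketch, assuming a single time boundary at day granularity (the boundary mechanism, day units and `count_views` helper are illustrative assumptions):

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class TimeRange:
    start: int  # first day included, e.g. days since epoch
    end: int    # first day NOT included

    def __post_init__(self) -> None:
        if self.end <= self.start:
            raise ValueError("TimeRange must be non-empty (start < end)")


def split_by_time_boundary(
    query: TimeRange, boundary: int
) -> tuple[Optional[TimeRange], Optional[TimeRange]]:
    """Days before `boundary` come from historical (Hadoop) data, days from
    `boundary` on come from realtime (Kafka) data. Either part may be None."""
    historical = TimeRange(query.start, min(query.end, boundary)) if query.start < boundary else None
    realtime = TimeRange(max(query.start, boundary), query.end) if query.end > boundary else None
    return historical, realtime


def count_views(
    query: TimeRange,
    boundary: int,
    historical_daily: dict[int, int],
    realtime_daily: dict[int, int],
) -> int:
    """Sum daily view counts, reading each day from exactly one store."""
    historical, realtime = split_by_time_boundary(query, boundary)
    total = 0
    if historical is not None:
        total += sum(historical_daily.get(d, 0) for d in range(historical.start, historical.end))
    if realtime is not None:
        total += sum(realtime_daily.get(d, 0) for d in range(realtime.start, realtime.end))
    return total


if __name__ == "__main__":
    historical_daily = {day: 10 for day in range(0, 16)}  # batch data covers days 0-15
    realtime_daily = {day: 1 for day in range(14, 21)}    # stream data covers days 14-20
    # Days 14 and 15 exist in both stores; the boundary makes each count once.
    print(count_views(TimeRange(10, 20), 15, historical_daily, realtime_daily))  # 55
```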

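High Level Architecture

(Diagram) Hadoop (historical) and Kafka (realtime) feed data into Pinot. A cluster manager (Controller, Helix, ZooKeeper) coordinates the cluster, in which brokers (Broker 1, Broker 2) route queries to servers (Server 1, Server 2, Server 3).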
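A broker fans a query out to the servers and combines their partial answers. A minimal scatter-gather sketch, assuming threads stand in for network calls and each inner list stands in for one server's local data:

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

Row = dict[str, str]


def server_group_by(rows: list[Row], dimension: str) -> Counter:
    """Runs on one server: count its local rows per value of `dimension`."""
    return Counter(row[dimension] for row in rows)


def broker_group_by(servers: list[list[Row]], dimension: str, k: int) -> list[tuple[str, int]]:
    """Scatter the query to every server in parallel, gather the partial counts,
    merge them, and only then pick the global top k."""
    with ThreadPoolExecutor() as pool:
        partials = list(pool.map(lambda rows: server_group_by(rows, dimension), servers))
    merged: Counter = Counter()
    for partial in partials:
        merged.update(partial)  # Counter.update adds counts rather than replacing them
    return merged.most_common(k)


if __name__ == "__main__":
    servers = [
        [{"region": "us"}, {"region": "in"}],
        [{"region": "us"}],
        [{"region": "us"}, {"region": "de"}, {"region": "in"}],
    ]
    print(broker_group_by(servers, "region", 2))  # [('us', 3), ('in', 2)]
```

Merging full partial counts before taking the top k matters: keeping only each server's top k can drop a value that is second everywhere but largest overall.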

Core Features

• Low-latency, high-QPS OLAP queries with real-time ingestion
• Support for complex dimensions
• Operational simplicity
• Data bootstrapping & reconciliation

Usage @ LinkedIn

• About 18 member-facing products on LinkedIn.com
• Internal reporting
• Open source: coming soon

Reporting UI: serve & taste data

Collect & prepare data: Kafka + Hadoop

Serve data: Pinot


Business need

"I want to access big data without running SQL"

Start a new dashboard with one click

Select what metrics/dimensions you want

Charts are rendered in just a few seconds

Zoom into a single chart

Filter on various dimensions

Access everywhere

Enterprise analytics portal

A portal that connects dashboards, internal reports, and internal wiki pages

Summary

• Scale of the data
• Pinot for interactive analysis
• Self-service visualization for insights

We are hiring

Jonathan Wu: www.linkedin.com/in/jiyewu

Praveen Neppalli Naga: www.linkedin.com/in/pneppalli

Chi-Yi Kuan: www.linkedin.com/in/chiyikuan

Phone: 650-605-2184, 650-962-3299, 650-426-6301

Data & Analytics
