go.indeed.com/IndeedEngTalks

[@IndeedEng] Large scale interactive analytics with Imhotep


DESCRIPTION

Link to video: https://www.youtube.com/watch?v=IZ-kC6ut1Lg

In a previous talk, we explained how we developed Imhotep, a distributed system for building decision trees for machine learning. We went on to describe how we build large scale interactive analytics tools using the same platform. This has kept our engineering and product organizations focused on key metrics by analyzing test results. It also gives our marketing organization timely and accurate insight into our data, allowing us to identify opportunities, spot trends, and learn about our job seekers. In this talk, Zak Cocos, who leads our Marketing Sciences team, and Product Manager Tom Bergman will discuss and provide examples of the valuable insights that can be gained by using Imhotep with almost any data set.


Page 2: [@IndeedEng] Large scale interactive analytics with Imhotep

Large Scale Interactive Analytics

with Imhotep

Page 3: [@IndeedEng] Large scale interactive analytics with Imhotep

Tom Bergman, Product Manager

Page 4: [@IndeedEng] Large scale interactive analytics with Imhotep

Zak Cocos, Manager

Marketing Science

Page 5: [@IndeedEng] Large scale interactive analytics with Imhotep

We help people get jobs.

Page 6: [@IndeedEng] Large scale interactive analytics with Imhotep
Page 7: [@IndeedEng] Large scale interactive analytics with Imhotep

What is Imhotep?

Imhotep is a highly scalable analytics architecture for querying faceted datasets

Page 8: [@IndeedEng] Large scale interactive analytics with Imhotep

Open sourcing Imhotep

Imhotep will be an OPEN SOURCE highly scalable analytics architecture for querying faceted datasets

Page 9: [@IndeedEng] Large scale interactive analytics with Imhotep

People

Tools

System

Data

Page 10: [@IndeedEng] Large scale interactive analytics with Imhotep

People

Tools

Data

System

Page 11: [@IndeedEng] Large scale interactive analytics with Imhotep

People

Data

Tools

System

Page 12: [@IndeedEng] Large scale interactive analytics with Imhotep

People

Data

Tools

System

Page 13: [@IndeedEng] Large scale interactive analytics with Imhotep

A Brief History of Analytics

@Indeed

Page 14: [@IndeedEng] Large scale interactive analytics with Imhotep

What's best for the job seeker?

Page 15: [@IndeedEng] Large scale interactive analytics with Imhotep

Test & Measure EVERYTHING

Page 16: [@IndeedEng] Large scale interactive analytics with Imhotep
Page 17: [@IndeedEng] Large scale interactive analytics with Imhotep

Query

Page 18: [@IndeedEng] Large scale interactive analytics with Imhotep

Query Location

Page 19: [@IndeedEng] Large scale interactive analytics with Imhotep

Query Location

Impression

Page 20: [@IndeedEng] Large scale interactive analytics with Imhotep

Title: Front End Software Engineer
Position: 1
Clicked: 0
Country: US
Query: indeed software engineer
Location: austin
Timestamp: 2014-04-30T20:00:00

Organic Impression Log Entry
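As an illustration only, the log entry on this slide can be read into a typed record. The sketch below assumes a simple "Key: value" line format; the `OrganicImpression` class and `parse_impression` function are hypothetical names for this example, not part of Indeed's logging code.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class OrganicImpression:
    """One job shown in search results (hypothetical record type)."""
    title: str
    position: int
    clicked: int
    country: str
    query: str
    location: str
    timestamp: datetime


def parse_impression(lines):
    """Parse 'Key: value' lines into an OrganicImpression."""
    fields = {}
    for line in lines:
        # Split on the first colon only, so the timestamp keeps its colons.
        key, _, value = line.partition(":")
        fields[key.strip().lower()] = value.strip()
    return OrganicImpression(
        title=fields["title"],
        position=int(fields["position"]),
        clicked=int(fields["clicked"]),
        country=fields["country"].lower(),
        query=fields["query"],
        location=fields["location"],
        timestamp=datetime.fromisoformat(fields["timestamp"]),
    )


SAMPLE = [
    "Title: Front End Software Engineer",
    "Position: 1",
    "Clicked: 0",
    "Country: US",
    "Query: indeed software engineer",
    "Location: austin",
    "Timestamp: 2014-04-30T20:00:00",
]

entry = parse_impression(SAMPLE)
```

Lower-casing the country code is a choice made for this sketch so that it lines up with queries such as country:au.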

Page 21: [@IndeedEng] Large scale interactive analytics with Imhotep

Analytics on Raw Logs

Page 22: [@IndeedEng] Large scale interactive analytics with Imhotep

Ramses

Page 23: [@IndeedEng] Large scale interactive analytics with Imhotep

● Search logs
● Extract metrics from matches
● Graph aggregated metrics

Ramses

Page 24: [@IndeedEng] Large scale interactive analytics with Imhotep

● Search logs
● Extract metrics from matches
● Graph aggregated metrics

Input -> Query and Metric
Output -> Aggregated metrics by bucket

Ramses

Page 25: [@IndeedEng] Large scale interactive analytics with Imhotep

How many organic clicks did we have in Australia?

Page 26: [@IndeedEng] Large scale interactive analytics with Imhotep

QUERY

country:au

METRIC

organic_clicks

How many organic clicks did we have in Australia?
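The query/metric pairing can be sketched in a few lines: filter log entries with the query, extract a number from each match, and sum per time bucket. This is a minimal illustration assuming in-memory dictionaries and hourly buckets; `aggregate_by_bucket` and the sample rows are hypothetical and do not reflect how Ramses is implemented.

```python
from collections import defaultdict
from datetime import datetime


def aggregate_by_bucket(entries, matches, metric, bucket_format="%Y-%m-%d %H:00"):
    """Sum metric(entry) over matching entries, keyed by time bucket."""
    totals = defaultdict(int)
    for entry in entries:
        if matches(entry):
            bucket = entry["timestamp"].strftime(bucket_format)
            totals[bucket] += metric(entry)
    return dict(sorted(totals.items()))


# Made-up log entries for illustration.
LOGS = [
    {"country": "au", "clicked": 1, "timestamp": datetime(2014, 4, 30, 20, 5)},
    {"country": "au", "clicked": 0, "timestamp": datetime(2014, 4, 30, 20, 40)},
    {"country": "au", "clicked": 1, "timestamp": datetime(2014, 4, 30, 21, 10)},
    {"country": "us", "clicked": 1, "timestamp": datetime(2014, 4, 30, 21, 15)},
]

# QUERY country:au, METRIC organic_clicks
organic_clicks = aggregate_by_bucket(
    LOGS,
    matches=lambda e: e["country"] == "au",
    metric=lambda e: e["clicked"],
)
```

The result maps each hourly bucket to a total, which is the shape of data a time-series graph needs.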

Page 27: [@IndeedEng] Large scale interactive analytics with Imhotep

How many organic clicks did we have in Australia?

Page 28: [@IndeedEng] Large scale interactive analytics with Imhotep

Does test group A or B have more revenue?

Page 29: [@IndeedEng] Large scale interactive analytics with Imhotep

QUERY

testgroup:A, testgroup:B

METRIC

revenue

Does test group A or B have more revenue?

Page 30: [@IndeedEng] Large scale interactive analytics with Imhotep

Does test group A or B have more revenue?

Page 31: [@IndeedEng] Large scale interactive analytics with Imhotep

How has traffic from Yahoo! changed over time in Great Britain, Germany, and Japan?

Page 32: [@IndeedEng] Large scale interactive analytics with Imhotep

QUERY

from:yahoo AND country:(gb, de, jp)

METRIC

visits

How has traffic from Yahoo! changed over time in Great Britain, Germany, and Japan?

Page 33: [@IndeedEng] Large scale interactive analytics with Imhotep

How has traffic from Yahoo! changed over time in Great Britain, Germany, and Japan?

Page 34: [@IndeedEng] Large scale interactive analytics with Imhotep

● How many unique queries in the US?

● What are the top 50 queries in the US?

● How many clicks did each of those queries receive?

Questions Ramses can’t answer
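What these questions have in common is a group-by over a field's values rather than a single filtered total. A rough sketch of that kind of computation, assuming in-memory impression records (the `query_stats` function and sample rows are invented for illustration):

```python
from collections import Counter


def query_stats(impressions, country, top_n=50):
    """Return (number of unique queries, [(query, impressions, clicks), ...])."""
    counts = Counter()
    clicks = Counter()
    for imp in impressions:
        if imp["country"] != country:
            continue
        counts[imp["query"]] += 1
        clicks[imp["query"]] += imp["clicked"]
    top = [(q, n, clicks[q]) for q, n in counts.most_common(top_n)]
    return len(counts), top


# Made-up impressions for illustration.
IMPRESSIONS = [
    {"country": "us", "query": "nurse", "clicked": 1},
    {"country": "us", "query": "nurse", "clicked": 0},
    {"country": "us", "query": "nurse", "clicked": 1},
    {"country": "us", "query": "driver", "clicked": 0},
    {"country": "us", "query": "driver", "clicked": 1},
    {"country": "us", "query": "architecture", "clicked": 0},
    {"country": "gb", "query": "nurse", "clicked": 1},
]

unique_queries, top_queries = query_stats(IMPRESSIONS, "us", top_n=2)
```

On a single machine this is trivial; the difficulty at Indeed's scale is doing the same grouping across billions of documents interactively.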

Page 35: [@IndeedEng] Large scale interactive analytics with Imhotep

Imhotep

Page 36: [@IndeedEng] Large scale interactive analytics with Imhotep

Began as a distributed iteration and group-by

engine for building click prediction models.

Imhotep Origins

Page 37: [@IndeedEng] Large scale interactive analytics with Imhotep

We use an iterative algorithm to build decision

trees level-by-level.

Decision Tree Builder

Page 38: [@IndeedEng] Large scale interactive analytics with Imhotep
Page 39: [@IndeedEng] Large scale interactive analytics with Imhotep

Began as a distributed iteration and group-by

engine for building click prediction models.

Leveraged ability to do massive group-bys and

aggregates to make real-time analytics engine.

Imhotep Origins

Page 40: [@IndeedEng] Large scale interactive analytics with Imhotep

How many Android App users with accounts

older than 30 days saved at least 1 job in the

past week?

Page 41: [@IndeedEng] Large scale interactive analytics with Imhotep

What titles have the highest click-through rate

for the query “Architecture” in the US?

What about the lowest click-through rate?
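Click-through rate here is clicks divided by impressions, grouped by job title. The following sketch shows the arithmetic on invented data; `ctr_by_title` and the `min_impressions` cutoff are assumptions for this example, not something the slides describe.

```python
from collections import defaultdict


def ctr_by_title(impressions, query, country, min_impressions=1):
    """Rank titles by clicks / impressions for one query in one country."""
    shown = defaultdict(int)
    clicked = defaultdict(int)
    for imp in impressions:
        if imp["query"].lower() == query.lower() and imp["country"] == country:
            shown[imp["title"]] += 1
            clicked[imp["title"]] += imp["clicked"]
    rates = {
        title: clicked[title] / shown[title]
        for title in shown
        if shown[title] >= min_impressions
    }
    # Highest click-through rate first; the last element is the lowest.
    return sorted(rates.items(), key=lambda kv: kv[1], reverse=True)


def _rows(title, clicks, misses):
    """Build made-up impressions for a title: `clicks` clicked, `misses` not."""
    base = {"query": "Architecture", "country": "us", "title": title}
    return [dict(base, clicked=1)] * clicks + [dict(base, clicked=0)] * misses


DATA = (
    _rows("Architect", 3, 1)
    + _rows("Landscape Architect", 1, 1)
    + _rows("Software Architect", 1, 3)
)

ranked = ctr_by_title(DATA, "architecture", "us")
```

A minimum-impressions cutoff is worth considering in practice, since a title shown once and clicked once would otherwise rank at 100%.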

Page 42: [@IndeedEng] Large scale interactive analytics with Imhotep

For job seekers who click on Google jobs in

Ireland, what other company’s jobs do they

click on?

Page 43: [@IndeedEng] Large scale interactive analytics with Imhotep

Zak Cocos, Manager

Marketing Science

Page 44: [@IndeedEng] Large scale interactive analytics with Imhotep

I also help people get jobs.

Page 45: [@IndeedEng] Large scale interactive analytics with Imhotep

Marketing Sciences

Research, analysis, and automation team supporting marketing initiatives

Page 46: [@IndeedEng] Large scale interactive analytics with Imhotep

Imhotep

Imhotep is a highly scalable, [soon to be] open source, analytics architecture for querying faceted datasets

Page 47: [@IndeedEng] Large scale interactive analytics with Imhotep

Imhotep@Indeed

Ad hoc exploration

Page 48: [@IndeedEng] Large scale interactive analytics with Imhotep

Imhotep@Indeed

Ad hoc exploration

Specific analysis

Page 49: [@IndeedEng] Large scale interactive analytics with Imhotep

Imhotep@Indeed

Ad hoc exploration

Specific analysis

Extensible infrastructure

Page 50: [@IndeedEng] Large scale interactive analytics with Imhotep

Ad hoc exploration

Public Crunchbase Dataset

Source: CrunchBase (CrunchBase 2013 Snapshot © 2013)

Page 51: [@IndeedEng] Large scale interactive analytics with Imhotep

Ad hoc exploration

Public Crunchbase Dataset

Document

Source: CrunchBase (CrunchBase 2013 Snapshot © 2013)

Page 52: [@IndeedEng] Large scale interactive analytics with Imhotep

Ad hoc exploration

Public Crunchbase Dataset

Fields

Source: CrunchBase (CrunchBase 2013 Snapshot © 2013)

Page 53: [@IndeedEng] Large scale interactive analytics with Imhotep

Ad hoc exploration

Public Crunchbase Dataset

Metric

Source: CrunchBase (CrunchBase 2013 Snapshot © 2013)

Page 54: [@IndeedEng] Large scale interactive analytics with Imhotep

Interactive tool for exploring Imhotep data

Imhotep Data Explorer

Page 55: [@IndeedEng] Large scale interactive analytics with Imhotep

Interactive tool for exploring Imhotep data

Also: a badass hyperlinked pivot table

Imhotep Data Explorer

Page 56: [@IndeedEng] Large scale interactive analytics with Imhotep

Imhotep is Large Scale

Total size of all indexes: 125TB

Jobsearch index (largest): 30TB

● Over 48 billion documents

Page 57: [@IndeedEng] Large scale interactive analytics with Imhotep
Page 58: [@IndeedEng] Large scale interactive analytics with Imhotep

Query

Page 59: [@IndeedEng] Large scale interactive analytics with Imhotep

Query Location

Page 60: [@IndeedEng] Large scale interactive analytics with Imhotep

Query Location

Organic Impression

Page 61: [@IndeedEng] Large scale interactive analytics with Imhotep

Organic Impression

A job that was displayed as the result of a search

Page 62: [@IndeedEng] Large scale interactive analytics with Imhotep
Page 63: [@IndeedEng] Large scale interactive analytics with Imhotep

Title

Page 64: [@IndeedEng] Large scale interactive analytics with Imhotep

Company Information

Page 65: [@IndeedEng] Large scale interactive analytics with Imhotep

Description

Page 66: [@IndeedEng] Large scale interactive analytics with Imhotep

Job Age

Page 67: [@IndeedEng] Large scale interactive analytics with Imhotep

abredistime acmetime addltime adsc adsdelay adsi badsc badsi boostojc boostoji bsjc bsjcwia bsji bsjindapplies bsjindappviews bsjrev bsjwia ckcnt cksz counts ctkage ctkagedays dayofweek dcpingtime domTotalTime ds-mpo

dsmiss dstime featemp fj freekwac freekwarev freesjc freesjrev frmtime galatdelay iplat iplong jslatdelay jsvdelay kwac kwacdelay kwai kwarev kwcnt lacinsize lacsgsize lmstime mpotime mprtime navTotTime ndxtime

ojc ojclong ojcshort ojcwia oji ojindapplies ojindappviews ojwia oocsc page prcvdlatency primfollowcnt prvwoji prvwojlat prvwojopentime prvwojreq radsc radsi recidlookupbudget rectime redirCount redirTime relfollowcnt respTime returnvisit rojc

roji rqcnt rqlcnt rqqcnt rrsjc rrsji rrsjrev rsavail rsjc rsji rsused rsviable serpsize sjc sjcdelay sjclong sjcnt sjcshort sjcwia sji sjindapplies sjindappviews sjrev sjwia sllat sllong

sqc sqi sugtime svj svjnostar svjstar tadsc tadsi time timeofday totcnt totfollowcnt totrev tottime tsjc tsjcwia tsji tsjindapplies tsjindappviews tsjrev tsjwia unqcnt vp wacinsize wacsgsize

Page 68: [@IndeedEng] Large scale interactive analytics with Imhotep

Organic Impression Document

Title: Front End Software Engineer
Position: 1
Clicked: 0
Country: US
Query: indeed software engineer
Location: austin
Timestamp: 2014-04-30T20:00:00

Page 69: [@IndeedEng] Large scale interactive analytics with Imhotep

Organic Impression Index

Title: Front End Software Engineer
Position: 1
Clicked: 0
Country: US
Query: indeed software engineer
Location: austin
Timestamp: 2014-04-30T20:00:00

Page 70: [@IndeedEng] Large scale interactive analytics with Imhotep

Imhotep Data Explorer can’t...

Combine results from multiple datasets

Page 71: [@IndeedEng] Large scale interactive analytics with Imhotep

Combine results from multiple datasets

Be easily automated

Imhotep Data Explorer can’t...

Page 72: [@IndeedEng] Large scale interactive analytics with Imhotep

Imhotep Query Language (IQL)

Page 73: [@IndeedEng] Large scale interactive analytics with Imhotep

IQL - Imhotep Query Language

Can combine results from multiple datasets

Allows for automation of data tools

Page 74: [@IndeedEng] Large scale interactive analytics with Imhotep

IQL queries - requirements

Index, Date range, Metrics

Page 75: [@IndeedEng] Large scale interactive analytics with Imhotep

IQL queries - optional

Index, Date range, Metrics

Filters, Group by

Page 76: [@IndeedEng] Large scale interactive analytics with Imhotep

IQL - Metrics

select count()

from organic

‘2013-12-05’

‘2013-12-10’

where country=ie

and clicked=1

group by companyid

Metrics

Page 77: [@IndeedEng] Large scale interactive analytics with Imhotep

select count()

from organic

‘2013-12-05’

‘2013-12-10’

where country=ie

and clicked=1

group by companyid

IQL - Indexes

Index

Page 78: [@IndeedEng] Large scale interactive analytics with Imhotep

select count()

from organic

‘2013-12-05’

‘2013-12-10’

where country=ie

and clicked=1

group by companyid

IQL - Date Range

Date Range

Page 79: [@IndeedEng] Large scale interactive analytics with Imhotep

select count()

from organic

‘2013-12-05’

‘2013-12-10’

where country=ie

and clicked=1

group by companyid

IQL - Filters

Filters

Page 80: [@IndeedEng] Large scale interactive analytics with Imhotep

select count()

from organic

‘2013-12-05’

‘2013-12-10’

where country=ie

and clicked=1

group by companyid

IQL - Group By

Groups
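To make the five parts of the query concrete, here is a rough Python equivalent run against a handful of invented documents. It is a sketch of the semantics only, not of how Imhotep executes IQL; the function name, field layout, and the choice to treat the end date as exclusive are all assumptions.

```python
from collections import Counter
from datetime import date


def count_clicks_by_company(docs, start, end, country):
    """Approximate the slide's query in plain Python.

    Index      -> the `docs` iterable (standing in for the organic index)
    Date range -> start <= doc date < end (end treated as exclusive: an assumption)
    Filters    -> country matches and clicked == 1
    Group by   -> companyid
    Metric     -> count() of matching documents per group
    """
    groups = Counter()
    for doc in docs:
        in_range = start <= doc["date"] < end
        if in_range and doc["country"] == country and doc["clicked"] == 1:
            groups[doc["companyid"]] += 1
    return dict(groups)


# Made-up documents for illustration.
ORGANIC = [
    {"date": date(2013, 12, 5), "country": "ie", "clicked": 1, "companyid": 101},
    {"date": date(2013, 12, 6), "country": "ie", "clicked": 1, "companyid": 101},
    {"date": date(2013, 12, 7), "country": "ie", "clicked": 0, "companyid": 101},
    {"date": date(2013, 12, 8), "country": "ie", "clicked": 1, "companyid": 202},
    {"date": date(2013, 12, 8), "country": "us", "clicked": 1, "companyid": 202},
    {"date": date(2013, 12, 10), "country": "ie", "clicked": 1, "companyid": 303},
]

result = count_clicks_by_company(
    ORGANIC, date(2013, 12, 5), date(2013, 12, 10), "ie"
)
```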

Page 81: [@IndeedEng] Large scale interactive analytics with Imhotep

IQL Question

Do companies in Austin that have raised more than $10 million get more clicks on average than those that have raised less than $10 million?

Page 82: [@IndeedEng] Large scale interactive analytics with Imhotep

Methodology

1) organic index: select companies in the US which received organic clicks

Page 83: [@IndeedEng] Large scale interactive analytics with Imhotep

Methodology

1) organic index: select companies in the US which received organic clicks

2) crunchbase index: select companies, and the amount of funding for companies receiving investments in Austin

Page 84: [@IndeedEng] Large scale interactive analytics with Imhotep

Methodology

1) organic index: select companies in the US which received organic clicks

2) crunchbase index: select companies, and the amount of funding for companies receiving investments in Austin

3) Join, segment, and do the math!
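Step 3 can be sketched as follows, assuming the two IQL results have already been fetched into dictionaries keyed by company. All names and figures are made up for illustration, and a company with exactly $10 million lands in the lower segment here because the slides do not say how to treat that case.

```python
from statistics import mean

THRESHOLD_USD = 10_000_000


def average_clicks_by_funding(clicks_by_company, funding_by_company,
                              threshold=THRESHOLD_USD):
    """Join on company, segment by funding, and average clicks per segment."""
    segments = {"more": [], "less": []}
    for company, clicks in clicks_by_company.items():
        funding = funding_by_company.get(company)
        if funding is None:
            continue  # inner join: skip companies missing from the funding data
        key = "more" if funding > threshold else "less"
        segments[key].append(clicks)
    return {k: (mean(v) if v else None) for k, v in segments.items()}


# Hypothetical result of step 1 (organic index): clicks per company.
CLICKS = {"acme": 120, "globex": 80, "initech": 30, "umbrella": 50, "hooli": 999}

# Hypothetical result of step 2 (crunchbase index): funding per Austin company.
FUNDING = {
    "acme": 25_000_000,
    "globex": 15_000_000,
    "initech": 2_000_000,
    "umbrella": 5_000_000,
}

averages = average_clicks_by_funding(CLICKS, FUNDING)
```

Joining on company name is the fragile part of an analysis like this, since the two datasets may not spell names the same way.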

Page 85: [@IndeedEng] Large scale interactive analytics with Imhotep

Tom Bergman, Product Manager

Page 86: [@IndeedEng] Large scale interactive analytics with Imhotep

I still help people get jobs.

Page 87: [@IndeedEng] Large scale interactive analytics with Imhotep

Large Scale Interactive Analytics Platform

● 123 Unique Indexes
● Largest Index 30TB
● Total size ~125TB

Page 88: [@IndeedEng] Large scale interactive analytics with Imhotep

Large Scale Interactive Analytics Platform

IQL -> Largely Programmatic access
● approx 76k queries/day
● Avg time to execute 0.67 seconds

Ramses -> Largely Human
● approx 3,400 queries/day
● Avg time to execute 4.4 seconds

Page 89: [@IndeedEng] Large scale interactive analytics with Imhotep

Large Scale Interactive Analytics Platform

Users
● 198 unique users in past month
● 25,622 unique queries in past month
● Avg 53 queries/user per day

Page 90: [@IndeedEng] Large scale interactive analytics with Imhotep

Large Scale Interactive Analytics Platform

40+ internal clients
● 6 Analytics Webapps
● 5 dashboards
● 10 programming/scripting shells
● 6 monitoring apps
● … and more

Page 91: [@IndeedEng] Large scale interactive analytics with Imhotep

Large Scale Interactive Analytics Platform

One Tool-set for all data
● Website usage
● Operational Monitoring
● Financial Reporting
● Google Analytics
● Internal Webapp Usage
● External Reports

Page 92: [@IndeedEng] Large scale interactive analytics with Imhotep

Solving a real problem

Page 93: [@IndeedEng] Large scale interactive analytics with Imhotep

Providing the Best Results

Show the jobs that are most interesting to our users

Page 94: [@IndeedEng] Large scale interactive analytics with Imhotep

Providing the Best Results

Clicks are a very good indicator of interest

Page 95: [@IndeedEng] Large scale interactive analytics with Imhotep

Providing the Best Results

Clicks are a very good indicator of interest

More clicks -> More Relevant
Fewer clicks -> Less Relevant

Page 96: [@IndeedEng] Large scale interactive analytics with Imhotep

Architecture

Very hard query to serve correctly

Page 97: [@IndeedEng] Large scale interactive analytics with Imhotep

Architecture

Very hard query to serve correctly

Architecture terminology has been co-opted by technology

Page 98: [@IndeedEng] Large scale interactive analytics with Imhotep

Terminology Common to both Software and Architecture

Blueprint, Design, Framework, Infrastructure, Engineer, Project manager

Development, Technical architect, Software, Modeling, Computation, Code reviews

Page 99: [@IndeedEng] Large scale interactive analytics with Imhotep

Architecture vs Software Titles

Architect, CAD Designer, Project Manager

vs

Software Architect, UI Designer, Project Manager

Page 100: [@IndeedEng] Large scale interactive analytics with Imhotep

Query Management

Indeed uses Imhotep to improve matching

Page 101: [@IndeedEng] Large scale interactive analytics with Imhotep

Query Management

Indeed uses Imhotep to improve matching

Automatically detect results that should be added or removed from queries

Page 102: [@IndeedEng] Large scale interactive analytics with Imhotep

Query Management

Indeed uses Imhotep to improve matching

Automatically detect results that should be added or removed from queries

26,790 rules across all countries

Page 103: [@IndeedEng] Large scale interactive analytics with Imhotep

Imhotep Open Source

Imhotep Open Source ETA: August 1, 2014

Page 104: [@IndeedEng] Large scale interactive analytics with Imhotep

Imhotep Open Source

Follow along at our blog: engineering.indeed.com

Sign up for the mailing list to get the latest updates: go.indeed.com/imhotep-announce

Page 105: [@IndeedEng] Large scale interactive analytics with Imhotep

Q & A

Page 106: [@IndeedEng] Large scale interactive analytics with Imhotep

Next @IndeedEng Talk: Launching Indeed Around the World

Davide Novelli, International Director
David Tulig, Tech Lead

May 28, 2014

http://engineering.indeed.com/talks

Page 107: [@IndeedEng] Large scale interactive analytics with Imhotep

More Questions?

Jason, David, James, Jeff