Cache Provisioning for Interactive NLP Services Jaimie Kelley and Christopher Stewart The Ohio State University Sameh Elnikety and Yuxiong He Microsoft Research

Source: web.cse.ohio-state.edu/~stewart.962/Papers/kelleyladis13-pres.pdf


Page 1

Cache Provisioning for Interactive NLP Services

Jaimie Kelley and Christopher Stewart, The Ohio State University

Sameh Elnikety and Yuxiong He, Microsoft Research

Page 2

Interactive NLP

“Watson looks at the language. It tries to understand what is being said and find the most appropriate, relevant answer...”

Rob High, IBM Fellow

● Natural Language Processing (NLP) Service

– Users issue queries via network messages

– Analyzes and understands human text in context

– Response times should be fast (bounded latency)

– Examples: Bing, Google, IBM Watson, OpenEphyra

Page 3

Interactive NLP

NLP Service Layer

Lucene

Scalable, Distributed Main Memory Storage

Memcache-1 Memcache-N

Persistent Disk Storage Layer

MYSQL-1 MYSQL-M

Data Flow

1. NLP processing: Extract Keywords and synonyms

2. Fetch relevant data

inverted indexes

raw content

3. More processing, reply

OpenEphyra

Query: Who volunteered as District 12's tribute in the Hunger Games?

Answer: Katniss Everdeen
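
The three data-flow steps on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the dictionaries stand in for the Memcache/MySQL tiers, the document keys, synonym table, and stopword list are hypothetical, and ranking by simple term-match count is a placeholder for the real NLP processing in OpenEphyra and Lucene.

```python
import re

# Illustrative stand-ins for cached data; keys and contents are hypothetical.
INVERTED_INDEX = {
    "tribute": ["NYT_2006_10.txt", "HG_ch9.txt"],
    "district": ["HG_ch9.txt"],
    "twelve": ["HG_ch9.txt"],
}
CONTENT = {
    "HG_ch9.txt": "... Katniss Everdeen, tribute from District Twelve",
    "NYT_2006_10.txt": "(article text)",
}
SYNONYMS = {"12": ["twelve"]}          # assumed synonym table
STOPWORDS = {"who", "as", "s", "in", "the"}


def extract_keywords(query):
    """Step 1: extract keywords and add their synonyms."""
    terms = [t for t in re.findall(r"[a-z0-9]+", query.lower())
             if t not in STOPWORDS]
    return terms + [s for t in terms for s in SYNONYMS.get(t, [])]


def fetch_and_rank(keywords):
    """Step 2: fetch inverted-index entries and count matches per document."""
    scores = {}
    for term in keywords:
        for doc in INVERTED_INDEX.get(term, []):
            scores[doc] = scores.get(doc, 0) + 1
    # Highest score first; ties broken by document name for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))


def answer(query):
    """Step 3: fetch raw content of the best document and reply."""
    ranked = fetch_and_rank(extract_keywords(query))
    if not ranked:
        return None
    return ranked[0], CONTENT[ranked[0]]
```

Calling `answer("Who volunteered as District 12's tribute in the Hunger Games?")` selects `HG_ch9.txt` here, because three of the expanded keywords point to it.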

Page 4

Interactive NLP Services

NLP Service Layer

Lucene

Scalable, Distributed Main Memory Storage

Memcache-1 Memcache-N

Persistent Disk Storage Layer

MYSQL-1 MYSQL-M

● To compete with Jeopardy champions, IBM Watson had a 3 sec. latency bound

● Our experience: 8th graders -> 4 sec. bound

● Internet services demand sub-second response times

Tight Latency Bounds

● Access DRAM in parallel

● Disk accesses time out!

OpenEphyra
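
A minimal sketch of how a service might access storage in parallel under a latency bound, assuming each fetch is wrapped in a zero-argument callable. The function name `fetch_within_bound` and its parameters are hypothetical; the slides do not describe the actual implementation. Fetches that miss the bound are simply left out of the reply.

```python
import time
from concurrent.futures import ThreadPoolExecutor, wait


def fetch_within_bound(fetchers, bound_s):
    """Run all fetchers in parallel; keep only results ready within bound_s.

    fetchers: dict mapping a key to a zero-argument callable (hypothetical API).
    Returns (results, timed_out_keys).
    """
    pool = ThreadPoolExecutor(max_workers=max(1, len(fetchers)))
    futures = {pool.submit(fn): key for key, fn in fetchers.items()}
    done, not_done = wait(futures, timeout=bound_s)
    # Do not block the reply on slow fetches (for example, disk accesses).
    pool.shutdown(wait=False)
    results = {futures[f]: f.result() for f in done if f.exception() is None}
    timed_out = sorted(futures[f] for f in not_done)
    return results, timed_out


def slow_disk_read():
    """Stand-in for a disk access that exceeds the latency bound."""
    time.sleep(0.5)
    return ["from disk"]
```

For example, `fetch_within_bound({"tribute": lambda: ["HG_ch9.txt"], "gift": slow_disk_read}, 0.1)` returns the in-memory result for `tribute` and reports `gift` as timed out.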

Page 5

Problem: Data Growth

Data is growing too fast to keep in main memory.

[Figure: World Data (Billions of GB), annotated 62% (2008-2009) and 90% (2009-2011)]

Sources: IDC, Radicati Group, Facebook, TR research, Pew Internet

Page 6

Cache Management

● When the data is too large to fit in cache, what should we evict?

● Traditional Compute Workloads:

Every data access is needed to answer a query

Evict least recently used (LRU)

● NLP services:

Only some data are needed (redundant content)

Remove redundant data with little quality loss

Page 7

Quality-Aware Cache Management

NLP Service Layer

Main Memory Storage

Persistent Disk Storage Layer

New Data: Harry Potter, Chapter 1

New York Times, Nov 2006

Hunger Games, Chapter 9

New York Times, Oct 2006

What Data Will LRU Evict? New York Times or Hunger Games

For NLP services, evict data that will cause the least quality loss.
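
One way to make "least quality loss" concrete is to score each cached document by how much of its content is not covered by the rest of the cache, then evict the most redundant one. The sketch below uses that heuristic purely as an assumption; it is not the authors' estimator, and the document names and term sets are hypothetical.

```python
def redundancy_loss(doc, cache):
    """Fraction of doc's terms found in no other cached document.

    cache: dict mapping a document name to its set of terms.
    A value near 0 means the document is largely redundant.
    """
    terms = cache[doc]
    if not terms:
        return 0.0
    others = set().union(*(t for d, t in cache.items() if d != doc))
    return len(terms - others) / len(terms)


def quality_aware_victim(cache):
    """Pick the document whose eviction is estimated to lose the least."""
    return min(sorted(cache), key=lambda d: redundancy_loss(d, cache))


def lru_victim(last_access):
    """Pick the least recently used document (smallest access time)."""
    return min(sorted(last_access), key=last_access.get)


# Hypothetical cache contents, for illustration only.
cache = {
    "NYT_2006_11": {"election", "senate", "vote"},
    "NYT_2006_10": {"election", "senate", "vote", "poll"},
    "HG_ch9": {"katniss", "tribute", "district"},
}
last_access = {"NYT_2006_11": 3, "NYT_2006_10": 2, "HG_ch9": 1}
```

With these inputs LRU would evict `HG_ch9` because it is the oldest entry, while the redundancy heuristic evicts `NYT_2006_11`, whose terms all appear in another cached article.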

Page 8

Main Memory Storage

NLP Service Layer

Our Approach

● Does existing cache management work well for NLP?

NLP Service Layer

Query: Who volunteered as District 12's tribute?

Real System: Main Memory Storage Size < Data Growth

Most Relevant Document: New York Times, Oct 2006

≠

Ideal System: Main Memory Storage Size > Data Growth

Most Relevant Document: Hunger Games, Chapter 9

We measure quality loss, i.e., dissimilarity between real and ideal

Page 9

NLP Service Layer

Main Memory Storage Size: 40 GB

Cache Miss Rate: 20%

Avg. Quality Loss: 15%

Our Approach

● Can quality-aware cache management reduce provisioning costs over time?

NLP Service Layer

Main Memory Storage Size: 20 GB

Cache Miss Rate: 40%

Avg. Quality Loss: 15%

Page 10

Outline

● Introduction

● Defining Quality Loss

– Intuition, base model, full model

● Quality Loss in NLP Services

– Representative queries, data sets, infrastructure, results

● Quality-Aware Cache Provisioning

● Conclusion

Page 11

Intuition: What is quality loss?

Real System:

Query: Who volunteered as District 12's tribute?

Answers:

Katniss Foxface

Harry Potter

Peeta Mellark

Jeanine F. Pirro

Ideal System:

Query: Who volunteered as District 12's tribute?

Answers:

Katniss Everdeen

Peeta Mellark

Prim

Foxface

Page 12

Base Model: Quality Loss

$$
S(w, \hat{w}, D, Q) = 1 - \frac{\sum_{q \in Q} \sum_{k=1}^{K} \Phi\left( \sum_{k_2} \left| R_{q,k}(\hat{w}, D) \cap R_{q,k_2}(w, D) \right| \right)}{|Q| \, K}
$$

[Venn diagram with regions Real, Real & Ideal, and Ideal, containing the answers Harry Potter, Jeanine F. Pirro, Katniss Everdeen, Prim, Foxface, and Peeta Mellark]
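
The base model can be sketched in Python as an overlap count over the top-K answers. Several things here are assumptions, since the slide does not define them: ŵ is taken to be the real configuration and w the ideal one, each R_{q,k} is a single ranked answer, and Φ is an indicator that clamps any positive overlap to 1. The answer lists are illustrative, loosely based on the example names on the slides.

```python
def base_quality_loss(real, ideal, K):
    """Quality loss S = 1 - (matched top-K answers) / (|Q| * K).

    real, ideal: dicts mapping each query to a ranked list of answers.
    Phi is assumed to be an indicator: 1 if the k-th real answer appears
    anywhere in the ideal top-K, else 0.
    """
    def phi(n):
        return 1 if n > 0 else 0

    matched = 0
    for q in ideal:
        real_top = real.get(q, [])[:K]
        ideal_top = ideal[q][:K]
        for ans in real_top:
            overlap = sum(1 for other in ideal_top if other == ans)
            matched += phi(overlap)
    return 1 - matched / (len(ideal) * K)


# Illustrative answer lists for one query (K = 4).
query = "Who volunteered as District 12's tribute?"
real = {query: ["Foxface", "Harry Potter", "Peeta Mellark", "Jeanine F. Pirro"]}
ideal = {query: ["Katniss Everdeen", "Peeta Mellark", "Prim", "Foxface"]}
loss = base_quality_loss(real, ideal, K=4)
```

Two of the four real answers also appear in the ideal list, so the loss under these assumptions is 0.5.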

Page 13

Full Model: Quality Loss

● NLP responses present challenges:

– Synonyms

● Answers from ideal setup fall within categories

● Real setup should match categories

Query: Flowers in Washington State

Answers: florists, gardening, Coast Rhododendron

– Noise Tolerance

● Answers from the real setup can be a superset of answers from ideal setup
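
A rough sketch of how the two adjustments could be applied, assuming a lookup table from answers to categories is available. The function, the category table, and the real-system answers are all hypothetical; only the query and the three ideal answers come from the slide. Answers are compared by category (synonyms), and extra answers from the real setup are not penalized (noise tolerance).

```python
def full_quality_loss(real_answers, ideal_answers, category_of):
    """Fraction of ideal answer categories missing from the real answers.

    category_of: dict mapping an answer to its category; answers without
    an entry form their own category. Extra real answers cost nothing.
    """
    def cat(answer):
        return category_of.get(answer, answer)

    ideal_cats = {cat(a) for a in ideal_answers}
    real_cats = {cat(a) for a in real_answers}
    if not ideal_cats:
        return 0.0
    return len(ideal_cats - real_cats) / len(ideal_cats)


# Hypothetical category table for "Flowers in Washington State".
category_of = {
    "florists": "florists",
    "flower shops": "florists",
    "gardening": "gardening",
    "horticulture": "gardening",
    "Coast Rhododendron": "state flower",
    "Rhododendron macrophyllum": "state flower",
}
ideal_answers = ["florists", "gardening", "Coast Rhododendron"]
real_answers = ["flower shops", "horticulture",
                "Rhododendron macrophyllum", "tulip festival"]
```

Here the real answers cover all three ideal categories, so the loss is 0 even though none of the strings match exactly and one extra answer is present.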

Page 14

Outline

● Introduction

● Defining Quality Loss

– Intuition, base model, full model

● Quality Loss in NLP Services

– Query trace, data sets, infrastructure, results

● Quality-Aware Cache Provisioning

● Conclusion

Page 15

Infrastructure

NLP Service

Lucene

Scalable Main Memory

Redis

Disk Storage

DiskSim

Real NLP Service:

10 sec latency bound

Analyzes keywords

Ranks all indexes requested from storage.

Distributed Cache:

Set max size < | data |

Implemented interface between cache and disk.

Disk Storage:

Two 3-TB hard disks

We used Lucene libraries

Ideal NLP Service:

Exact same processing

Distributed Cache:

No set maximum size

9 GB / cache node

Provision more as needed

Disk Storage:

No timeouts

Key Insight: Ideal setup returns the result created by processing all relevant data without timeouts.

Page 16

Obtaining a Query Trace

● Google Trends

– Trace of most popular queries per category 2004-2013

– Most (over 70%) are multiple word queries

Sept. 2013 Books:

The Bible, Fifty Shades of Grey, The Great Gatsby, Under the Dome, The Hunger Games, Psalms, The Lord of the Rings, Sword Art Online, 1984, The 85 Ways to Tie a Tie

June 2009 Books:

The Bible, Alice's Adventures in ..., The Lord of the Rings, Midnight Sun, 1984, Kama Sutra, Romeo and Juliet, Quran, Diagnostic and ..., Diccionario de la lengua...

Jan. 2004 Books:

The Bible, The Lord of the Rings, The Da Vinci Code, 1984, Kama Sutra, Romeo and Juliet, Hamlet, Macbeth, To Kill a Mockingbird, The World Factbook

Page 17

Data Sets

Wikipedia

January 2001 – March 2013

Total Index size: 4.7 TB

Max Data/Month: 30 GB

New York Times

October 2004 – March 2006

Total Index size: 3 GB

Max Data/Month: 88 MB

Page 18

Types of Data

Content:

Key: HG_ch9.txt

Value: ... Katniss Everdeen, tribute from District Twelve

Terms: Katniss, Everdeen, tribute, District, Twelve

Inverted Index:

Key: Katniss_term

Value: HG_ch2.txt, HG_ch9.txt

Key: Tribute_term

Value: New York Times 10/22, Wikipedia, HG_ch9.txt
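
The two kinds of key-value entries can be produced from raw documents with a short Python sketch. The `_term` key suffix follows the slide; the function name, the tokenization rule, and the text given for `HG_ch2.txt` are assumptions made for illustration.

```python
import re


def build_entries(documents):
    """Return (content, inverted_index) key-value stores for the documents.

    content maps a file name to its text; inverted_index maps a key such
    as "Katniss_term" to the sorted list of files containing that term.
    """
    content = dict(documents)
    inverted_index = {}
    for name, text in documents.items():
        # Normalize case so "tribute" and "Tribute" share one key.
        terms = {t.capitalize() for t in re.findall(r"[A-Za-z]+", text)}
        for term in terms:
            inverted_index.setdefault(f"{term}_term", []).append(name)
    for postings in inverted_index.values():
        postings.sort()
    return content, inverted_index


# HG_ch2.txt text is a hypothetical placeholder.
documents = {
    "HG_ch9.txt": "Katniss Everdeen, tribute from District Twelve",
    "HG_ch2.txt": "Katniss hunts beyond the fence",
}
content, inverted_index = build_entries(documents)
```

After this runs, `inverted_index["Katniss_term"]` lists both chapters, while `inverted_index["Tribute_term"]` lists only `HG_ch9.txt`.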

Page 19

Caching Policies: LRU

● Least Recently Used Cache Management

– Common approach in distributed stores

– Implemented in Redis

● Infrequent search terms are sent to disk, unable to be accessed within latency bounds

[Diagram in three stages: New Data, Eviction, Replacement. Main Memory Storage first holds Tribute (index), Broomstick (content), and Gift (index) as NY Times 11/2013 arrives; afterward it holds Tribute (index), Gift (index), and NY Times 11/2013.]
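
For reference, a minimal LRU cache in pure Python that replays the eviction pictured on this slide. It is a textbook sketch built on `collections.OrderedDict`, not Redis's implementation, and the access order before the new data arrives is assumed.

```python
from collections import OrderedDict


class LRUCache:
    """Fixed-capacity cache that evicts the least recently used key."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.items = OrderedDict()  # oldest entry first

    def get(self, key):
        if key not in self.items:
            return None  # miss: the caller would have to go to disk
        self.items.move_to_end(key)  # mark as most recently used
        return self.items[key]

    def put(self, key, value):
        """Insert or update key; return the evicted key, if any."""
        if key in self.items:
            self.items.move_to_end(key)
        self.items[key] = value
        if len(self.items) > self.capacity:
            evicted, _ = self.items.popitem(last=False)
            return evicted
        return None


cache = LRUCache(capacity=3)
cache.put("Tribute(index)", "...")
cache.put("Broomstick(content)", "...")
cache.put("Gift(index)", "...")
cache.get("Tribute(index)")   # assumed recent accesses
cache.get("Gift(index)")
evicted = cache.put("NY Times 11/2013", "...")
```

Because `Broomstick(content)` is the only entry not touched recently, it is the one evicted when the new article arrives.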

Page 20

Caching Policies: LRU

[Figure: LRU. Quality Loss (%) from 0% to 100% versus Data / DRAM from 1 to 4, for NYT and WIKI]

Quality loss rises as terms are evicted

Multiple word queries and single-word synonyms benefit from redundancy

● Non-evicted data may overlap evicted data

● Answers from ideal system can come from non-evicted terms

In both Wikipedia pages and NYT articles, content related to queries was found in non-evicted terms.

Page 21

Caching Policies: Limit Content

Content:

Key: HG_ch9.txt

Value: ... Katniss Everdeen, tribute from District Twelve

Terms: Katniss, Everdeen, tribute, District, Twelve

Inverted Index:

Key: Katniss_term

Value: HG_ch2.txt, HG_ch9.txt

Key: Tribute_term

Value: New York Times 10/22, Wikipedia, HG_ch9.txt

Page 22

Caching Policies: Limit Content

[Figure: Limit Content. Data / DRAM from 1 to 4 on the x-axis, 0% to 100% on the y-axis, for NYT and WIKI]

Quality loss rises as key documents are excluded from index

Documents may contain content that effectively has the same meaning

Wikipedia pages are less redundant, as a result of their review process

On average 1 of 2 NYT articles can be removed with low quality loss
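
Excluding documents from the cached index can be sketched as a filter over both stores. This is one plausible reading of the Limit Content policy, not a confirmed description of it: the function name is hypothetical, and how the documents to keep are chosen is left open.

```python
def limit_content(inverted_index, content, keep_docs):
    """Drop every document not in keep_docs from both key-value stores.

    Terms whose posting list becomes empty are removed from the index.
    """
    keep = set(keep_docs)
    new_content = {d: t for d, t in content.items() if d in keep}
    new_index = {}
    for term, docs in inverted_index.items():
        kept = [d for d in docs if d in keep]
        if kept:
            new_index[term] = kept
    return new_index, new_content


# Entries follow the slide's example; the kept set is an arbitrary choice.
inverted_index = {
    "Katniss_term": ["HG_ch2.txt", "HG_ch9.txt"],
    "Tribute_term": ["New York Times 10/22", "Wikipedia", "HG_ch9.txt"],
}
content = {
    "HG_ch9.txt": "... Katniss Everdeen, tribute from District Twelve",
}
small_index, small_content = limit_content(
    inverted_index, content, keep_docs=["HG_ch9.txt", "Wikipedia"])
```

Both terms still resolve to `HG_ch9.txt` after the filter, which is the kind of redundancy that lets some documents be dropped with little quality loss.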

Page 23

Outline

● Introduction

● Defining Quality Loss

– Intuition, base model, full model

● Quality Loss in NLP Services

– Representative queries, data sets, infrastructure, results

● Quality-Aware Cache Provisioning

● Conclusion

Page 24

DRAM Trends

Provisioning Cost = #GB RAM * $/GB RAM

Source: http://www.jcmit.com/memoryprice.htm

[Figure: $/GB RAM (0 to 150) by Year (2005 to 2013)]
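
The cost formula is a single multiplication, shown here with the 40 GB and 20 GB cache sizes from the earlier comparison slide. The price of $10 per GB is a placeholder chosen for the arithmetic, not a value read from the chart.

```python
def provisioning_cost(gb_ram, dollars_per_gb):
    """Provisioning Cost = #GB RAM * $/GB RAM."""
    return gb_ram * dollars_per_gb


PRICE = 10.0  # hypothetical $/GB, for illustration only

cost_40gb = provisioning_cost(40, PRICE)
cost_20gb = provisioning_cost(20, PRICE)
relative_saving = 1 - cost_20gb / cost_40gb
```

At any fixed price per GB, halving the cache from 40 GB to 20 GB halves the provisioning cost, so `relative_saving` is 0.5.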

Page 25

Cache Provisioning Policies

● Overprovisioning

– Always have more RAM available than data

● Provision on Data Growth

– Wait for more data to be added

● Provision on Quality Loss

– Wait for quality loss to pass a threshold

[Diagram: Main Memory Storage holds Tribute (index), Broomstick (content), and Gift (index); NY Times 11/2013 arrives.]

Replace or Provision?
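
The three policies can be compared with a small simulation that counts cache nodes month by month. Everything numeric here is an assumption except the 9 GB node size, which appears on the Infrastructure slide: the monthly data sizes, the linear loss curve, and the 20% threshold are hypothetical and are not the measured results.

```python
import math


def assumed_loss(ratio):
    """Hypothetical quality loss as a function of Data / DRAM."""
    return max(0.0, min(1.0, 0.15 * (ratio - 1)))


def nodes_needed(policy, data_gb, nodes, node_gb, loss_fn, threshold):
    """Return the node count after applying one policy for one month."""
    if policy == "overprovision":
        # Always keep strictly more RAM than data: one spare node.
        return max(nodes, math.ceil(data_gb / node_gb) + 1)
    if policy == "data_growth":
        # Add nodes only once the data no longer fits.
        return max(nodes, math.ceil(data_gb / node_gb))
    if policy == "quality_loss":
        # Add nodes only while estimated loss exceeds the threshold.
        while loss_fn(data_gb / (nodes * node_gb)) > threshold:
            nodes += 1
        return nodes
    raise ValueError(f"unknown policy: {policy}")


def simulate(policy, data_by_month, node_gb=9, loss_fn=assumed_loss,
             threshold=0.20):
    """Return the number of provisioned nodes after each month."""
    nodes, history = 1, []
    for data_gb in data_by_month:
        nodes = nodes_needed(policy, data_gb, nodes, node_gb,
                             loss_fn, threshold)
        history.append(nodes)
    return history


data_by_month = [9, 18, 27, 36, 45]  # hypothetical data size in GB
```

Under these assumptions `simulate("quality_loss", data_by_month)` ends with 3 nodes, compared with 5 for `"data_growth"` and 6 for `"overprovision"`.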

Page 26

Cost Savings

[Figure: two panels, LRU and Limit Content. Each plots Relative Cost (0% to 100%) against #Months for Overprovisioning, Provision on Data Growth, and Provision on Quality Loss; the #Months axes span 0 to 8 and 0 to 15.]

Page 27

Conclusion

● Data is growing fast, forcing NLP services to respond to queries after accessing only a portion of the data

● NLP services can remove redundant content and/or terms from distributed caches with little quality loss

● New cache management approach: Wait until quality loss occurs before provisioning DRAM to reduce costs