28
Undo: Update and Futures Aaron Brown ROC Research Group University of California, Berkeley Summer 2003 ROC Retreat 5 June 2003

Undo: Update and Futures Aaron Brown ROC Research Group University of California, Berkeley Summer 2003 ROC Retreat 5 June 2003

Embed Size (px)

Citation preview

Page 1: Undo: Update and Futures Aaron Brown ROC Research Group University of California, Berkeley Summer 2003 ROC Retreat 5 June 2003

Undo: Update and Futures

Aaron Brown

ROC Research GroupUniversity of California, Berkeley

Summer 2003 ROC Retreat5 June 2003

Page 2: Undo: Update and Futures Aaron Brown ROC Research Group University of California, Berkeley Summer 2003 ROC Retreat 5 June 2003

Slide 2

Outline

• Recap of Undo for Operators

• Measurements of e-mail undo

prototype

• Upcoming: human evaluation

• Potential future extensions

Page 3: Undo: Update and Futures Aaron Brown ROC Research Group University of California, Berkeley Summer 2003 ROC Retreat 5 June 2003

Slide 3

Recap: What Is “Operator Undo”?• Give operators and system admins

the ability to “travel in time”– to undo the effects of erroneous actions

» configuration changes» new software deployment» patches and upgrades» problem repairs

– to retroactively repair other problems affecting state» software bugs» viruses» external attacks

Aaron Brown
From 10,000 feet this looks very much like the kind of undo we see in productivity applications like MS Word. But there are a few significant differences that make the problem harder
Aaron Brown
DON"T ENUMERATE IN VOICEOVER
Page 4: Undo: Update and Futures Aaron Brown ROC Research Group University of California, Berkeley Summer 2003 ROC Retreat 5 June 2003

Slide 4

Recap: Three R’s Undo Model

• Time travel for system operators– Rewind: roll back all state, users’ and operator’s

– Replay: re-execute rewound user events» operator timeline must be restored manually, if desired» may cause externally-visible paradoxes for users

– Repair: alter past operator events to avert problems

User timeline

Operator timeline

“Undo!”

Aaron Brown
Paradox problem is impossible to solve in general, and so it's a problem that people in the research community have been unwilling to tackle. But it's absolutely essential to solve if we're going to build undo for operators, and so, undeterred, I set out to find a regime in which the problem was tractable. And it turns out that...
Page 5: Undo: Update and Futures Aaron Brown ROC Research Group University of California, Berkeley Summer 2003 ROC Retreat 5 June 2003

Slide 5

A Simple Solution for a Common Case• Undo for services with human end-users

– centralized state scopes the problem– human users provide flexibility for handling

paradoxes» undo is typically transparent to end-user, but not perfect» worst-case: end-user must reconcile mental model based

on supplied hints

• Applicabilityideally suited to Undo poorly suited to Undo

onlineauctions

missilelaunchcontrol

onlineshopping

sharedcalendaring

e-mail financialapplications

file/blockstorageservice

websearch

Aaron Brown
These (left) are all apps where the state is reasonably self-contained and where the consumer of the state is a human user, so if there's a paradox we can either compensate for it or help the user understand why it happened
Aaron Brown
These (right) either have decentralized state, or expose state to entities that can't deal with paradoxes
Page 6: Undo: Update and Futures Aaron Brown ROC Research Group University of California, Berkeley Summer 2003 ROC Retreat 5 June 2003

Slide 6

Architecture in Brief

• Target– black-box services with

human end-users– single-host, for simplicity

• Approach– rewindable storage– intercept, log, replay

user requests

• Fault assumptions– service can be

arbitrarily incorrect

Operator

Repairs

Application Service

Can include: - user state - application - OS

RewindableStorage

Users

App. Proxy

App. protocol

App. protocol

UserTimeline

Log

User events

Page 7: Undo: Update and Futures Aaron Brown ROC Research Group University of California, Berkeley Summer 2003 ROC Retreat 5 June 2003

Slide 7

Instantiation: E-mail Prototype

Operator

Repairs

E-mail Store Service

Can include: - mailboxes - server code - OS

NetAppFiler

Users

IMAP/SMTPProxy

IMAP/SMTP

IMAP

UserTimeline

Log

E-mail events

• Prototype target– e-mail store service

» leaf node in e-mail delivery network

• Implementation– NetApp filer provides

rewindable storage layer– e-mail-specific proxy

intercepts/replays IMAP & SMTP requests

SMTP

Page 8: Undo: Update and Futures Aaron Brown ROC Research Group University of California, Berkeley Summer 2003 ROC Retreat 5 June 2003

Slide 8

Key Concept: Verbs

• Verbs encode user events– encapsulate application protocol commands

» record of desired user action» context-independent record of parameters» record of externally-visible output

– intended to capture intent of protocol commands, not effects on system state

• Example verbs for e-mail (simplified)

– SMTP: DELIVER {to, from, messageText} {}– IMAP: COPY {srcFolder, msgNum[], dstFolder}

{}FETCH {folder, msgNum[], fetchSpec}

{text}

Page 9: Undo: Update and Futures Aaron Brown ROC Research Group University of California, Berkeley Summer 2003 ROC Retreat 5 June 2003

Slide 9

Role of Verbs

• Verbs enable replay– verb log forms a history of end-user interaction

» dissociated from original system context» annotated with original output to end-user» annotated with external consistency policy and

compensations for consistency violations

• Verbs make it easier to reason about 3R’s– define exactly what user state is preserved by 3R cycle

• Verbs capture key application semantics– consistency model and commutativity of operations

Page 10: Undo: Update and Futures Aaron Brown ROC Research Group University of California, Berkeley Summer 2003 ROC Retreat 5 June 2003

Slide 10

Outline

• Recap of Undo for Operators

• Measurements of e-mail undo

prototype

• Upcoming: human evaluation

• Potential future extensions

Page 11: Undo: Update and Futures Aaron Brown ROC Research Group University of California, Berkeley Summer 2003 ROC Retreat 5 June 2003

Slide 11

E-mail Prototype Details

• Target service: e-mail store service– a leaf node in the Internet e-mail network

• Prototype details– wraps an existing IMAP/SMTP e-mail store

service» not platform-specific» evaluation uses sendmail and the UW IMAP server

– written in Java» ~25K lines (~9K semicolons)» about 1/8 the size of the mail service itself, in LoC

Page 12: Undo: Update and Futures Aaron Brown ROC Research Group University of California, Berkeley Summer 2003 ROC Retreat 5 June 2003

Slide 12

Prototype Measurements

• Experiments– space overhead– time overhead– rewind & replay time

• Evaluation workload– modified SPECmail2000 workload with 10,000

users » simulates traffic seen by ISP mail server» modified to use IMAP instead of POP; all mail kept

local

Page 13: Undo: Update and Futures Aaron Brown ROC Research Group University of California, Berkeley Summer 2003 ROC Retreat 5 June 2003

Slide 13

IMAP SMTP IMAP SMTP

Ses

sio

n L

eng

th (

ms)

0

200

400

600

800

1000

1200Without UndoWith Undo

Null Session Median Session

2.3x

1.8x

1.7x

1.2x

Feasibility: Space & Time Overhead• Space overhead

– 0.45 GB/day/1000 users » uncompressed» Java serialization bug

overhead factored out (>2x bigger)

– ~250,000 user-days of data on one 120GB disk

• Time overhead– IMAP/SMTP session lengths

for SPECmail workload:

– below perceived “sluggishness” threshold for interactive apps.

Aaron Brown
overhead is below threshold for human annoyance
Page 14: Undo: Update and Futures Aaron Brown ROC Research Group University of California, Berkeley Summer 2003 ROC Retreat 5 June 2003

Slide 14Users

Rep

lay

Sp

eed

up

0

5

10

15

20

25

30

1.3x2.6x

10,0005,0001,000

Real-Time

12.8x

29.2x

500

Feasibility: Rewind and Replay• Rewind

– NetApp filer snapshot restore: ~8 seconds

» independent of amount of data to restore

» but not undoable– alternative is O(#files)

» 10 minutes for 10,000 users

• Replay– replay speed: ~9

verbs/sec– with parallel, O-O-O

replay– better connection

management will help– compared to real-time:

Aaron Brown
actually, avg is 8.84 v/s, with stddev 0.49
Page 15: Undo: Update and Futures Aaron Brown ROC Research Group University of California, Berkeley Summer 2003 ROC Retreat 5 June 2003

Slide 15

Outline

• Recap of Undo for Operators

• Measurements of e-mail undo

prototype

• Upcoming: human evaluation

• Potential future extensions

Page 16: Undo: Update and Futures Aaron Brown ROC Research Group University of California, Berkeley Summer 2003 ROC Retreat 5 June 2003

Slide 16

Evaluating Undo: Human Factors• Undo is a recovery tool for human

operators– effectiveness depends on how it is used

» will it address the problems faced by real operators?» will operators know when/how to use it?» does it improve dependability over manual recovery?

• Need methodology that synthesizes systems benchmarking with human studies– include human operators to drive recovery– but focus is on the system and system metrics

» recovery time, dependability, performance

Page 17: Undo: Update and Futures Aaron Brown ROC Research Group University of California, Berkeley Summer 2003 ROC Retreat 5 June 2003

Slide 17

Evaluating Human Factors of Undo• Three-step process

1) survey operators to identify real-world problems» evaluate whether Undo will address them» collect scenarios for step 2

2) controlled laboratory experiments involving humans» evaluate Undo against manual recovery» use scenarios from step 1» evaluate with dependability metrics: recovery time,

correctness, performance

3) long-term ethnographic study of deployed system» evaluate dependability benefits of Undo “in the wild”» requires time and resources beyond the scope of this work

Page 18: Undo: Update and Futures Aaron Brown ROC Research Group University of California, Berkeley Summer 2003 ROC Retreat 5 June 2003

Slide 18

Step 1: Survey Operators

• Online survey of e-mail system operators– questions on daily tasks, challenges, recent problems– 68 responses

• Results

configurationdeployment/upgradeotherundoablenon-undoable

Common Tasks Challenging Tasks Lost e-mail problems

50%56%

25%

26% 17%

25%18%

31%

33%12%1%

6%

(151 total) (68 total) (12 total)

» configuration and deployment issues dominate» Undo potentially useful for majority of tasks, problems

Page 19: Undo: Update and Futures Aaron Brown ROC Research Group University of California, Berkeley Summer 2003 ROC Retreat 5 June 2003

Slide 19

Step 2: Lab Experiments w/Humans• Questions to answer

– do operators know when Undo is appropriate?– does having Undo improve dependability?

• Compare e-mail systems with & without Undo– randomized human trials– each trial structured as a dependability

benchmark

• In progress

Page 20: Undo: Update and Futures Aaron Brown ROC Research Group University of California, Berkeley Summer 2003 ROC Retreat 5 June 2003

Slide 20

Dependability Benchmarks

• Dependability benchmark basics– apply workload– simulate realistic problem scenario– measure recovery time, correctness, performance

recovery time

performability impact(performance, correctness)

start ofscenario

normal behavior

perf

orm

ab

ilit

y end ofscenario

– trial scenarios chosen based on survey results» including scenarios where Undo is unlikely to help

See: Brown, Chung, Patterson, “Including the Human Factor in Dependability Benchmarks”, DSN WDB 2003.Brown, Patterson, “Towards Availability Benchmarks...”, USENIX 2000.

Page 21: Undo: Update and Futures Aaron Brown ROC Research Group University of California, Berkeley Summer 2003 ROC Retreat 5 June 2003

Slide 21

Lab Experiments with Humans• Some key subtleties

– overcoming mental model inertia» select and train less-experienced subjects

– making scenarios tractable» subject plays role of shift-work operator repairing

documented problem from previous shift

• Status: in progress– experimental protocol defined– just received Human Subjects Committee

approval– data collection to begin shortly

Page 22: Undo: Update and Futures Aaron Brown ROC Research Group University of California, Berkeley Summer 2003 ROC Retreat 5 June 2003

Slide 22

Outline

• Recap of Undo for Operators

• Measurements of e-mail undo

prototype

• Upcoming: human evaluation

• Potential future extensions

Page 23: Undo: Update and Futures Aaron Brown ROC Research Group University of California, Berkeley Summer 2003 ROC Retreat 5 June 2003

Slide 23

Extending Undo: Other Apps

• When is undo possible?– state is centralized (or observable)– all output to external entities can be intercepted

» and can be correlated to user requests

– external output is provisional for some time window» e.g., can be cancelled, altered, reissued» or simply doesn’t matter in application’s external

consistency model

ideally suited to Undo poorly suited to Undo

onlineauctions

missilelaunchcontrol

onlineshopping

sharedcalendaring

e-mail financialapplications

file/blockstorageservice

websearch

Page 24: Undo: Update and Futures Aaron Brown ROC Research Group University of California, Berkeley Summer 2003 ROC Retreat 5 June 2003

Slide 24

Extending Undo: Spheres of Undo• Rewindable storage defines a sphere of undo

• All info crossing sphere must be intercepted– input: becomes verbs– output: becomes externalized output

» must be possible to associate output with a verb

Rewindable

Storage

Application ServiceSphere of

Undo

Users

Service

RS

Externalservice(output consumer)

Externaldata source

P

P

P

Page 25: Undo: Update and Futures Aaron Brown ROC Research Group University of California, Berkeley Summer 2003 ROC Retreat 5 June 2003

Slide 25

Further Extensions

• Verb concept may have broader applicability– impact analysis of configuration changes

» use verb log as annotated history to evaluate changes on cloned system

– self-checking data set for self-testing components– general approach to defining & encapsulating

application consistency from end-user point of view?» today, procedural and implicit» can verbs be made declarative? » can verbs be extracted automatically from object

relationships?

Page 26: Undo: Update and Futures Aaron Brown ROC Research Group University of California, Berkeley Summer 2003 ROC Retreat 5 June 2003

Slide 26

More Verb Extensions

• Extending verbs to administrative tasks– in desktop environment

» manage software installations/upgrades» provide “system refresh” using undo techniques» capture configuration changes at intent level

– in server environment» move common tasks into undo framework» dynamically identify and guide ongoing operations

tasks by analyzing verb sequences

– key challenge in either environment is to capture breadth of administrative tasks

Page 27: Undo: Update and Futures Aaron Brown ROC Research Group University of California, Berkeley Summer 2003 ROC Retreat 5 June 2003

Slide 27

Conclusions

• E-mail implementation demonstrates feasibility of Undo– improvements in protocols, base storage

technology would help reduce overhead

• Human experiments to evaluate usefulness about to begin

• Verb construct has significant potential for further research– extending Undo to broader domains– exploring other tools to support human operators

Page 28: Undo: Update and Futures Aaron Brown ROC Research Group University of California, Berkeley Summer 2003 ROC Retreat 5 June 2003

Undo: Update and Futures

• Acknowledgements– ROC Undergraduate Benchmarking Group

» Leonard Chung, Billy Kakes, Calvin Ling

– Berkeley/Stanford ROC Research Group

• For more info:– [email protected]– http://roc.cs.berkeley.edu/