Trust-Sensitive Scheduling on the Open Grid Jon B. Weissman with help from Jason Sonnek and Abhishek Chandra Department of Computer Science University of Minnesota Trends in HPDC Workshop Amsterdam 2006

Trust-Sensitive Scheduling on the Open Grid


Page 1: Trust-Sensitive Scheduling on the Open Grid

Trust-Sensitive Scheduling on the Open Grid

Jon B. Weissman

with help from Jason Sonnek and Abhishek Chandra

Department of Computer Science

University of Minnesota

Trends in HPDC Workshop

Amsterdam 2006

Page 2: Trust-Sensitive Scheduling on the Open Grid

Background

• Public donation-based infrastructures are attractive
– positives: cheap, scalable, fault tolerant (UW-Condor, *@home, ...)
– negatives: “hostile” - uncertain resource availability/connectivity, node behavior, end-user demand => best effort service

Page 3: Trust-Sensitive Scheduling on the Open Grid

Background

• Such infrastructures have been used for throughput-based applications
– just make progress, all tasks equal

• Service applications are more challenging
– all tasks not equal
– explicit boundaries between user requests
– may even have SLAs, QoS, etc.

Page 4: Trust-Sensitive Scheduling on the Open Grid

Service Model

• Distributed Service
– request -> set of independent tasks
– each task mapped to a donated node
– makespan
– E.g. BLAST service
  • user request (input sequence) + chunk of DB form a task

Page 5: Trust-Sensitive Scheduling on the Open Grid

BOINC + BLAST

workunit = input_sequence + chunk of DB

generated when a request arrives
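A minimal sketch of the request-to-task mapping described on the Service Model and BOINC + BLAST slides, assuming the database is already split into chunks. The function names `make_workunits` and `makespan` are hypothetical and are not part of BOINC or BLAST.

```python
def make_workunits(input_sequence, db_chunks):
    """Pair one user request (input sequence) with every DB chunk.

    Each pair is one independent task (workunit), created when the
    request arrives.
    """
    return [(input_sequence, chunk) for chunk in db_chunks]


def makespan(task_finish_times):
    """The request completes when its slowest task completes."""
    return max(task_finish_times)


workunits = make_workunits("ACGT", ["chunk0", "chunk1", "chunk2"])
request_time = makespan([3.0, 7.5, 5.0])
```

Because the makespan is a maximum, one slow or misbehaving node delays the whole request, which is why node behavior matters more for a service than for a throughput-only workload.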

Page 6: Trust-Sensitive Scheduling on the Open Grid

The Challenge

• Nodes are unreliable
– timeliness: heterogeneity, bottlenecks, …
– cheating: hacked, malicious (> 1% of SETI nodes), misconfigured
– failure
– churn

• For a service, this matters

Page 7: Trust-Sensitive Scheduling on the Open Grid

Some data: timeliness

Computation Heterogeneity

- both across and within nodes

Communication Heterogeneity

- both across and within nodes

PlanetLab – lower bound

Page 8: Trust-Sensitive Scheduling on the Open Grid

The Problem for Today

• Deal with node misbehavior

• Result verification
– application-specific verifiers: not general
– redundancy + voting

• Most approaches assume ad-hoc replication
– under-replicate: task re-execution (↑ latency)
– over-replicate: wasted resources (↓ throughput)

• Using information about the past behavior of a node, we can intelligently size the amount of redundancy
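A minimal sketch of "redundancy + voting", assuming results are hashable values and that a result is accepted only when a strict majority of the worker group agrees on it. The function name `majority_vote` is hypothetical.

```python
from collections import Counter


def majority_vote(results):
    """Return the result reported by a strict majority, or None.

    None means no result can be accepted, so the task would have to
    be re-executed (the under-replication cost).
    """
    if not results:
        return None
    value, count = Counter(results).most_common(1)[0]
    return value if count > len(results) / 2 else None


accepted = majority_vote(["hit-42", "hit-42", "bogus"])
rejected = majority_vote(["a", "b", "c"])
```

With a fixed group size, the same amount of redundancy is spent on every task regardless of how reliable the chosen nodes are; sizing the group from past behavior is what the rest of the talk addresses.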

Page 9: Trust-Sensitive Scheduling on the Open Grid

System Model

Page 10: Trust-Sensitive Scheduling on the Open Grid

Problems with ad-hoc replication

Unreliable node

Reliable node

Task x sent to group A

Task y sent to group B

Page 11: Trust-Sensitive Scheduling on the Open Grid

Smart Replication

• Reputation
– ratings based on past interactions with clients
– simple sample-based probability (r_i) over a window
– extend to worker group (assuming no collusion) => likelihood of correctness (LOC)

• Smarter Redundancy
– variable-sized worker groups
– intuition: higher reliability clients => smaller groups
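A minimal sketch of a sample-based rating over a window, assuming each past interaction is recorded as correct or incorrect and that a client with no history starts from a neutral prior. The class name, the default window of 20, and the prior of 0.5 are illustrative assumptions, not values from the talk.

```python
from collections import deque


class WindowedRating:
    """Fraction of correct results among the most recent interactions."""

    def __init__(self, window=20, prior=0.5):
        # deque(maxlen=...) silently drops the oldest outcome when full
        self.outcomes = deque(maxlen=window)
        self.prior = prior  # assumed rating for a client with no history

    def record(self, correct):
        self.outcomes.append(bool(correct))

    @property
    def value(self):
        if not self.outcomes:
            return self.prior
        return sum(self.outcomes) / len(self.outcomes)


rating = WindowedRating(window=4)
for outcome in [True, True, False, True]:
    rating.record(outcome)
```

After the four outcomes above the rating is 3/4; recording one more incorrect result pushes the oldest correct one out of the window and the rating drops to 2/4.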

Page 12: Trust-Sensitive Scheduling on the Open Grid

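Terms

• LOC (Likelihood of Correctness) of group g, LOC_g
– computes the ‘actual’ probability of getting a correct answer from a group of clients (group g)

• Target LOC (target)
– the task success rate that the system tries to ensure while forming client groups
– related to the statistics of the underlying distribution

$$
\mathrm{LOC}_g \;=\; \sum_{m=k+1}^{2k+1} \;\; \sum_{\{\varepsilon \,:\, \sum_{i=1}^{2k+1} \varepsilon_i = m\}} \;\; \prod_{i=1}^{2k+1} r_i^{\varepsilon_i}\,(1 - r_i)^{1-\varepsilon_i}
$$

Here group g has 2k+1 clients, r_i is the rating of client i, and ε_i ∈ {0, 1} marks whether client i returns the correct answer; the outer sum runs over every majority size m.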
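A minimal sketch of the LOC computation, assuming majority voting and independent clients (no collusion). Rather than enumerating every subset of correct clients, it builds the distribution of "number of correct clients" one client at a time, which gives the same probability. The function name is hypothetical.

```python
def likelihood_of_correctness(ratings):
    """Probability that a strict majority of the group is correct.

    ratings: per-client probabilities of returning a correct result,
    assumed independent.
    """
    # dist[j] = probability that exactly j clients seen so far are correct
    dist = [1.0]
    for r in ratings:
        nxt = [0.0] * (len(dist) + 1)
        for j, p in enumerate(dist):
            nxt[j] += p * (1.0 - r)   # this client is wrong
            nxt[j + 1] += p * r       # this client is correct
        dist = nxt
    majority = len(ratings) // 2 + 1
    return sum(dist[majority:])


loc_three_good = likelihood_of_correctness([0.9, 0.9, 0.9])
```

For three clients rated 0.9 this is 3 × 0.9² × 0.1 + 0.9³ = 0.972, so a small group of reliable clients can already exceed a high target.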

Page 13: Trust-Sensitive Scheduling on the Open Grid

Trust Sensitive Scheduling

• Guiding metrics
– throughput: the number of successfully completed tasks in an interval
– success rate s: ratio of throughput to number of tasks attempted
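A minimal sketch of the two guiding metrics, assuming the outcomes of all tasks attempted in one interval are available as a list of booleans. The function name is hypothetical.

```python
def throughput_and_success_rate(outcomes):
    """outcomes: one boolean per task attempted in the interval.

    Returns (throughput, success_rate), where throughput counts the
    successfully completed tasks and success_rate divides that by the
    number of tasks attempted.
    """
    attempted = len(outcomes)
    completed = sum(1 for ok in outcomes if ok)
    success_rate = completed / attempted if attempted else 0.0
    return completed, success_rate


tput, s = throughput_and_success_rate([True, True, False, True])
```

The two metrics pull in opposite directions: larger groups raise the success rate but leave fewer nodes for other tasks, which lowers throughput.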

Page 14: Trust-Sensitive Scheduling on the Open Grid

Scheduling Algorithms

• First-Fit
– attempt to form the first group that satisfies target
• Best-Fit
– attempt to form a group that best satisfies target
• Random-Fit
– attempt to form a random group that satisfies target
• Fixed-size
– randomly form fixed-sized groups; ignore client ratings
• Random and Fixed are our baselines
• Min group size = 3
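A rough sketch of a First-Fit style grouping, under several assumptions that are not stated on the slides: clients are scanned in arrival order, groups are kept odd-sized so a majority always exists, and a group "satisfies target" when its likelihood of correctness under majority voting reaches the target. The names `group_loc` and `first_fit_group` are hypothetical.

```python
def group_loc(ratings):
    """Probability that a strict majority of independent clients is correct."""
    dist = [1.0]  # dist[j] = P(exactly j correct so far)
    for r in ratings:
        nxt = [0.0] * (len(dist) + 1)
        for j, p in enumerate(dist):
            nxt[j] += p * (1.0 - r)
            nxt[j + 1] += p * r
        dist = nxt
    return sum(dist[len(ratings) // 2 + 1:])


def first_fit_group(available, target, min_size=3):
    """available: list of (client_id, rating) in arrival order.

    Returns the shortest odd-sized prefix whose LOC reaches target,
    or None if no such prefix exists.
    """
    for size in range(min_size, len(available) + 1, 2):
        group = available[:size]
        if group_loc([rating for _, rating in group]) >= target:
            return group
    return None


strong = [("a", 0.9), ("b", 0.9), ("c", 0.9), ("d", 0.6), ("e", 0.6)]
weak = [(f"w{i}", 0.6) for i in range(5)]
small_group = first_fit_group(strong, target=0.95)
large_group = first_fit_group(weak, target=0.66)
```

With the reliable clients a group of three is enough, while the weaker clients need five to reach a lower target, matching the intuition that higher-reliability clients allow smaller groups.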

Page 15: Trust-Sensitive Scheduling on the Open Grid

Scheduling Algorithms

Page 16: Trust-Sensitive Scheduling on the Open Grid

Scheduling Algorithms (cont’d)

Page 17: Trust-Sensitive Scheduling on the Open Grid

Different Groupings

target = .5

Page 18: Trust-Sensitive Scheduling on the Open Grid

Evaluation

• Simulated a wide variety of node reliability distributions

• Set target to be the success rate of Fixed
– goal: match success rate of Fixed (which over-replicates) yet achieve higher throughput
– if desired, can drive throughput even higher (but success rate would suffer)

Page 19: Trust-Sensitive Scheduling on the Open Grid

Comparison

gain: 25-250%

open question: how much better could we have done?

Page 20: Trust-Sensitive Scheduling on the Open Grid

Non-stationarity

• Nodes may suddenly shift gears
– deliberately malicious, virus, detach/rejoin
– underlying reliability distribution changes

• Solution
– window-based rating (reduce from infinite)

• Experiment: “blackout” at round 300 (30% affected)
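A small illustration of why the window matters under a blackout, assuming a single node that answers correctly for 300 rounds and then incorrectly for the next 20. The window length of 20 and the function name are illustrative assumptions.

```python
from collections import deque


def ratings_after(outcomes, window):
    """Return (infinite-history rating, windowed rating) after all outcomes."""
    recent = deque(maxlen=window)  # keeps only the last `window` outcomes
    correct_total = 0
    for ok in outcomes:
        recent.append(ok)
        correct_total += ok
    cumulative = correct_total / len(outcomes)
    windowed = sum(recent) / len(recent)
    return cumulative, windowed


# Node behaves well until round 300, then fails for 20 rounds.
history = [True] * 300 + [False] * 20
cumulative, windowed = ratings_after(history, window=20)
```

The infinite-history rating is still 300/320, so the node would keep looking trustworthy, whereas the windowed rating has already fallen to zero.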

Page 21: Trust-Sensitive Scheduling on the Open Grid

Role of target

• Key parameter
• Too large
– groups will be too large (low throughput)
• Too small
– groups will be too small (low success rate)
• Adaptively learn it (parameterless)
– maximizing throughput * s: “goodput”
– or could bias toward throughput or s

Page 22: Trust-Sensitive Scheduling on the Open Grid

Adaptive algorithm

• Multi-objective optimization
– choose target LOC to simultaneously maximize throughput and success rate s: weight 1 × throughput + weight 2 × s
– use weighted combination to reduce multiple objectives to a single objective
– employ hill-climbing and feedback techniques to control dynamic parameter adjustment
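A rough sketch of one way hill-climbing with feedback could adjust the target, assuming the scheduler measures throughput and success rate once per interval. The step-halving rule, the bounds 0.5 and 0.99, and the names `objective` and `adapt_target` are assumptions for illustration; the slides do not specify these details.

```python
def objective(throughput, success_rate, w_tput=1.0, w_succ=1.0):
    """Weighted combination that reduces the two goals to one score."""
    return w_tput * throughput + w_succ * success_rate


def adapt_target(target, step, prev_score, score, lo=0.5, hi=0.99):
    """One hill-climbing step on the target LOC.

    If the last move improved the score, keep moving the same way;
    otherwise reverse direction and halve the step size.
    """
    if score < prev_score:
        step = -step / 2.0
    new_target = min(hi, max(lo, target + step))
    return new_target, step


# The previous interval scored 10.0 and the current one 12.0: keep climbing.
target, step = adapt_target(0.80, 0.05, prev_score=10.0, score=12.0)
```

Setting `w_succ=0` optimizes throughput alone and `w_tput=0` optimizes success rate alone; since throughput and success rate are on different scales, the weights would need to account for that in practice.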

Page 23: Trust-Sensitive Scheduling on the Open Grid

Adapting target

• Blackout example

Page 24: Trust-Sensitive Scheduling on the Open Grid

Throughput (weight 1 = 1, weight 2 = 0)

[Figure: throughput comparison for BF; bars Min, Adapt, and Max for BF-Uniform, BF-NormLow, BF-NormHigh, BF-HeavyLow, BF-HeavyHigh, and BF-Bimodal; y-axis from 0 to 30]

Page 25: Trust-Sensitive Scheduling on the Open Grid

Current/Future Work

• Implementation of reputation-based scheduling framework (BOINC and PL)

• Mechanisms to retain node identities (hence r_i) under node churn

– “node signatures” that capture the characteristics of the node

Page 26: Trust-Sensitive Scheduling on the Open Grid

Current/Future Work (cont’d)

• Timeliness
– extending reliability to encompass time
– a node whose performance is highly variable is less reliable

• Client collusion
– detection: group signatures
– prevention:
  • combine quiz-based tasks with reputation systems
  • form random groupings

Page 27: Trust-Sensitive Scheduling on the Open Grid

Thank you.