Workloads 02 Tutorial


  • Workload Modeling
    and its Effect on
    Performance Evaluation

    Dror Feitelson

    Hebrew University

    Thanks to participants and program committee; thanks to Monien; abuse hospitality to talk about agenda

  • Performance Evaluation

    In system design: selection of algorithms, setting parameter values
    In procurement decisions: value for money, meet usage goals
    For capacity planning

    Important and basic activity

  • The Good Old Days

    The skies were blue
    The simulation results were conclusive
    Our scheme was better than theirs

    Feitelson & Jette, JSSPP 1997

    Focus on system design. Widely different designs lead to conclusive results.

  • But in their papers,

    Their scheme was better than ours!

    But literature is full of contradictory results.

  • How could they be so wrong?

    Leads to question of what is the cause for contradictions.

  • Performance evaluation depends on:

    The system's design

    (What we teach in algorithms and data structures)

    Its implementation

    (What we teach in programming courses)

    The workload to which it is subjected
    The metric used in the evaluation
    Interactions between these factors

    Next: our focus is the workloads.

  • Outline for Today

    Three examples of how workloads affect performance evaluation
    Workload modeling
    Getting data
    Fitting, correlations, stationarity
    Heavy tails, self similarity
    Research agenda

    In the context of parallel job scheduling

    Job scheduling, not task scheduling

  • Example #1

    Gang Scheduling and

    Job Size Distribution

  • Gang What?!?

    Time slicing parallel jobs with coordinated context switching

    Ousterhout matrix

    Ousterhout, ICDCS 1982

  • Gang What?!?

    Time slicing parallel jobs with coordinated context switching

    Ousterhout matrix

    Optimization: alternative scheduling

    Ousterhout, ICDCS 1982
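
    A minimal sketch of the Ousterhout-matrix idea (illustrative only, not
    Ousterhout's original algorithm): rows are time slots, columns are
    processors, and all threads of a job share a row, so a whole row is
    dispatched at once and jobs are context-switched together. The class and
    method names are assumptions made for this sketch.

        # Sketch of an Ousterhout matrix for gang scheduling (illustrative).
        # Rows = time slots, columns = processors; all threads of a job share
        # a row, so they are dispatched and context-switched together.
        class OusterhoutMatrix:
            def __init__(self, processors, slots):
                self.processors = processors
                # matrix[slot][cpu] holds a job id or None
                self.matrix = [[None] * processors for _ in range(slots)]

            def allocate(self, job_id, size):
                """Place all `size` threads of a job in the first time slot
                with enough free processors; return the slot or None."""
                for slot, row in enumerate(self.matrix):
                    free = [cpu for cpu, owner in enumerate(row) if owner is None]
                    if len(free) >= size:
                        for cpu in free[:size]:
                            row[cpu] = job_id
                        return slot
                return None

            def schedule_round(self):
                """One round of coordinated time slicing: dispatch row by row."""
                for slot, row in enumerate(self.matrix):
                    running = sorted({j for j in row if j is not None})
                    print(f"slot {slot}: run jobs {running}")

        m = OusterhoutMatrix(processors=8, slots=3)
        for job_id, size in [("A", 4), ("B", 4), ("C", 6)]:
            m.allocate(job_id, size)
        m.schedule_round()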

  • Packing Jobs

    Use a buddy system for allocating processors

    Feitelson & Rudolph, Computer 1990

  • Packing Jobs

    Use a buddy system for allocating processors

    Start with full system in one block

  • Packing Jobs

    Use a buddy system for allocating processors

    To allocate, repeatedly partition in two to get the desired size

  • Packing Jobs

    Use a buddy system for allocating processors

    Or use existing partition
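
    A minimal sketch of buddy allocation of processors, assuming a
    power-of-two machine size; the recursive halving mirrors the "repeatedly
    partition in two" step above. The class and its bookkeeping are
    illustrative, not taken from the paper, and freeing/merging of buddies
    is omitted for brevity.

        # Buddy allocation of processors: the machine starts as one block and
        # blocks are split in two until the requested size (rounded up to a
        # power of two) is reached. Rounding up is the internal fragmentation.
        def next_pow2(n):
            p = 1
            while p < n:
                p *= 2
            return p

        class BuddyAllocator:
            def __init__(self, total):
                # free_blocks[size] = starting indices of free blocks of that size
                self.free_blocks = {total: [0]}

            def allocate(self, requested):
                size = next_pow2(requested)
                # smallest free block that is large enough
                fits = [s for s, lst in self.free_blocks.items() if s >= size and lst]
                if not fits:
                    return None
                block = min(fits)
                start = self.free_blocks[block].pop()
                while block > size:            # split, keeping the upper half free
                    block //= 2
                    self.free_blocks.setdefault(block, []).append(start + block)
                return (start, size)

        alloc = BuddyAllocator(total=16)
        print(alloc.allocate(5))   # 8-processor block: (0, 8)
        print(alloc.allocate(4))   # 4-processor block from the remainder: (8, 4)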

  • The Question:

    The buddy system leads to internal fragmentation
    But it also improves the chances of alternative scheduling, because processors are allocated in predefined groups

    Which effect dominates the other?

  • The Answer (part 1):

    Feitelson & Rudolph, JPDC 1996

    The answer is a function of the workload, but it is not a full answer because the actual workload is unknown. Dashed lines: provable bounds.

  • The Answer (part 2):

    Note logarithmic Y axis


    Many small jobs
    Many sequential jobs
    Many power of two jobs
    Practically no jobs use full machine

    Conclusion: buddy system should work well

  • Verification

    Feitelson, JSSPP 1996

    Using Feitelson workload

  • Example #2

    Parallel Job Scheduling

    and Job Scaling

  • Variable Partitioning

    Each job gets a dedicated partition for the duration of its execution
    Resembles 2D bin packing
    Packing large jobs first should lead to better performance
    But what about correlation of size and runtime?

    First-fit decreasing is a near-optimal packing heuristic
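
    A minimal sketch of the packing intuition, on a deliberately simplified
    1D version of the problem (processor counts only, ignoring runtime):
    first-fit decreasing sorts jobs by size and places each in the first
    scheduling "slot" with enough free processors. The job sizes and the
    slot abstraction are assumptions for illustration.

        # First-fit decreasing on a 1D simplification: each "slot" is one
        # scheduling round of a P-processor machine; jobs are packed by
        # processor count only.
        def first_fit_decreasing(job_sizes, processors):
            slots = []                                # free processors per slot
            placement = {}
            for job, size in sorted(enumerate(job_sizes),
                                    key=lambda x: x[1], reverse=True):
                for i, free in enumerate(slots):
                    if size <= free:                  # first slot with room
                        slots[i] -= size
                        placement[job] = i
                        break
                else:                                 # no slot fits: open a new one
                    slots.append(processors - size)
                    placement[job] = len(slots) - 1
            return placement, len(slots)

        placement, rounds = first_fit_decreasing([6, 2, 5, 3, 7, 1], processors=8)
        print(placement, rounds)                      # packs into 3 rounds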

  • Scaling Models

    Constant work: parallelism for speedup (Amdahl's Law); large first ≈ SJF
    Constant time: size and runtime are uncorrelated
    Memory bound: large first ≈ LJF; full-size jobs lead to blockout

    Worley, SIAM JSSC 1990

    Question is which model applies within the context of a single machine
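
    To make the three models concrete, a hedged sketch of how runtime might
    behave as job size grows under each one. The functional forms and
    constants below are simple placeholders chosen to show the trends, not
    Worley's formulations.

        # Illustrative runtime-vs-size trends under the three scaling models.
        def constant_work(n, work=1000.0):
            # fixed total work split across n processors: big jobs finish
            # faster, so "large first" behaves like SJF
            return work / n

        def constant_time(n, time=100.0):
            # the problem grows with the machine, runtime stays flat:
            # size and runtime are uncorrelated
            return time

        def memory_bound(n, per_node_work=100.0):
            # the problem grows to fill per-node memory, so runtime grows
            # with size and "large first" behaves like LJF
            return per_node_work * n ** 0.5

        for n in (1, 4, 16, 64):
            print(f"{n:3d} procs: "
                  f"constant work {constant_work(n):7.1f}  "
                  f"constant time {constant_time(n):6.1f}  "
                  f"memory bound {memory_bound(n):7.1f}")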

  • Scan Algorithm

    Keep jobs in separate queues according to size (sizes are powers of 2)
    Serve the queues Round Robin, scheduling all jobs from each queue (they pack perfectly)
    Assuming constant work model, large jobs only block the machine for a short time
    But the memory bound model would lead to excessive queueing of small jobs

    Krueger et al., IEEE TPDS 1994

    Important point: schedule order determined by size
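
    A minimal sketch of the Scan idea, under simplifying assumptions: jobs
    are queued by power-of-two size class and the classes are visited in
    turn, draining one class at a time so equal-size jobs pack the machine
    perfectly. Names and details (batching, tie handling) are illustrative.

        from collections import defaultdict, deque
        from math import ceil, log2

        class ScanScheduler:
            def __init__(self, processors):
                self.processors = processors
                self.queues = defaultdict(deque)    # size class -> waiting jobs

            def submit(self, job_id, size):
                size_class = 2 ** ceil(log2(size))  # round up to a power of two
                self.queues[size_class].append(job_id)

            def scan(self):
                """One scan: visit size classes in order and drain each queue,
                dispatching jobs in machine-filling batches."""
                for size_class in sorted(self.queues):
                    queue = self.queues[size_class]
                    per_batch = max(1, self.processors // size_class)
                    while queue:
                        batch = [queue.popleft()
                                 for _ in range(min(per_batch, len(queue)))]
                        print(f"run {batch} together (size class {size_class})")

        s = ScanScheduler(processors=16)
        for job_id, size in [("a", 3), ("b", 4), ("c", 4), ("d", 16), ("e", 1)]:
            s.submit(job_id, size)
        s.scan()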

  • The Data

    Data: SDSC Paragon, 1995/6

  • The Data

    Data: SDSC Paragon, 1995/6

    Partitions with equal numbers of jobs; many more small jobs.

  • The Data

    Data: SDSC Paragon, 1995/6

    Similar range, different shape; 80th percentile moves from

  • Conclusion

    Parallelism used for better results, not for faster results
    Constant work model is unrealistic
    Memory bound model is reasonable
    Scan algorithm will probably not perform well in practice
  • Example #3

    Backfilling and

    User Runtime Estimation

  • Backfilling

    Variable partitioning can suffer from external fragmentation
    Backfilling optimization: move jobs forward to fill in holes in the schedule
    Requires knowledge of expected job runtimes
  • Variants

    EASY backfilling

    Make reservation for first queued job

    Conservative backfilling

    Make reservation for all queued jobs
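
    A minimal sketch of the difference between the two variants: a waiting
    job may be backfilled now if, using its user-supplied runtime estimate,
    it does not delay the protected reservation(s). EASY protects only the
    reservation of the first queued job; conservative protects all of them.
    The free-processor bookkeeping and the sample numbers are simplifying
    assumptions.

        # Backfilling decision rule under EASY vs. conservative (simplified).
        # A reservation is (start_time, processors) held for a queued job.
        def can_backfill(now, free_procs, candidate, reservations):
            """candidate = (procs, runtime_estimate)."""
            procs, estimate = candidate
            if procs > free_procs:
                return False
            finish = now + estimate
            for start, reserved in reservations:
                # would the candidate still hold processors that the reserved
                # job needs when its reservation comes due?
                if finish > start and free_procs - procs < reserved:
                    return False
            return True

        now, free = 0, 10
        candidates = [(4, 30), (6, 80)]          # (procs, estimate)
        easy = [(100, 8)]                        # protect only the queue head
        conservative = [(100, 8), (60, 6)]       # protect every queued job

        for job in candidates:
            print(job,
                  "EASY:", can_backfill(now, free, job, easy),
                  "conservative:", can_backfill(now, free, job, conservative))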

  • User Runtime Estimates

    Lower estimates improve chance of backfilling and better response time
    Too low estimates run the risk of having the job killed
    So estimates should be accurate, right?
  • They Aren't

    Mualem & Feitelson, IEEE TPDS 2001

    Short=failed; killed typically exceeded runtime estimate, ~15%

  • Surprising Consequences

    Inaccurate estimates actually lead to improved performance
    Performance evaluation results may depend on the accuracy of runtime estimates
    Example: EASY vs. conservative
    Using different workloads
    And different metrics

    Will focus on second bullet

  • EASY vs. Conservative

    Using CTC SP2 workload

  • EASY vs. Conservative

    Using Jann workload model

    Note: Jann model of CTC

  • EASY vs. Conservative

    Using Feitelson workload model

  • Conflicting Results Explained

    Jann uses accurate runtime estimates
    This leads to a tighter schedule
    EASY is not affected too much
    Conservative manages less backfilling of long jobs, because it respects more reservations

    Relative measure: more by EASY = less by conservative

  • Conservative is bad for the long jobs
    Good for short ones, whose reservations are respected



  • Conflicting Results Explained

    Response time sensitive to long jobs, which favor EASY
    Slowdown sensitive to short jobs, which favor conservative
    All this does not happen at CTC, because estimates are so loose that backfill can occur even under conservative
  • Verification

    Run CTC workload with accurate estimates

  • But What About My Model?

    Simply does not have such small long jobs

  • Workload Data Sources

  • No Data

    Innovative, unprecedented systems: wireless, hand-held
    Use an educated guess: self similarity, heavy tails, Zipf distribution
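
    A minimal sketch of turning such educated guesses into a synthetic
    workload: heavy-tailed (Pareto) runtimes and Zipf-like user activity.
    The distributions are the ones named above, but the parameter values
    are arbitrary assumptions, not fitted to any system.

        import random

        def pareto_runtime(alpha=1.5, minimum=10.0):
            # heavy tail: Pr[X > x] ~ (minimum / x) ** alpha
            return minimum * random.paretovariate(alpha)

        def zipf_user(num_users=100, s=1.0):
            # user i is chosen with probability proportional to 1 / i**s
            weights = [1.0 / (i ** s) for i in range(1, num_users + 1)]
            return random.choices(range(1, num_users + 1), weights=weights)[0]

        random.seed(0)
        for _ in range(5):
            print(f"user{zipf_user():03d} runtime {pareto_runtime():8.1f} s")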
  • Serendipitous Data

    Data may be collected for various reasons: accounting logs, audit logs, debugging logs, just-so logs
    Can lead to wealth of information
  • NASA Ames iPSC/860 log

    42050 jobs from Oct-Dec 1993

    user      job     nodes  runtime  date      time
    user4     cmd8       32       70  11/10/93  10:13:17
    user4     cmd8       32       70  11/10/93  10:19:30
    user42    nqs450     32     3300  11/10/93  10:22:07
    user41    cmd342      4       54  11/10/93  10:22:37
    sysadmin  pwd         1        6  11/10/93  10:22:42
    user4     cmd8       32       60  11/10/93  10:25:42
    sysadmin  pwd         1        3  11/10/93  10:30:43
    user41    cmd342      4      126  11/10/93  10:31:32

    Feitelson & Nitzberg, JSSPP 1995
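
    A small parsing sketch, assuming the whitespace-separated fields shown
    above (user, job, nodes, runtime in seconds, date, time). The record
    class and field names are this sketch's own, not part of the log.

        from dataclasses import dataclass
        from datetime import datetime

        @dataclass
        class JobRecord:
            user: str
            job: str
            nodes: int
            runtime: int              # seconds
            start: datetime

        def parse_line(line):
            user, job, nodes, runtime, date, time = line.split()
            start = datetime.strptime(f"{date} {time}", "%m/%d/%y %H:%M:%S")
            return JobRecord(user, job, int(nodes), int(runtime), start)

        rec = parse_line("user42 nqs450 32 3300 11/10/93 10:22:07")
        print(rec.user, rec.nodes, rec.runtime, rec.start)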

  • Distribution of Job Sizes


  • Distribution of Resource Use


  • Degree of Multiprogramming

  • System Utilization

  • Job Arrivals

  • Arriving Job Sizes

  • Distribution of Interarrival Times

  • Distribution of Runtimes

  • User Activity

  • Repeated Execution

  • Application Moldability

    Of jobs run more than once

  • Distribution of Run Lengths

  • Predictability in Repeated Runs

    For jobs run more than 5 times

  • Recurring Findings

    Many small and serial jobs
    Many power-of-two jobs
    Weak correlation of job size and duration
    Job runtimes are bounded but have CV > 1
    Inaccurate user runtime estimates
    Non-stationary arrivals (daily/weekly cycle)
    Power-law user activity, run lengths
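
    Two of these checks as a small sketch, applied to the handful of sample
    log lines shown earlier (so the numbers are illustrative, not the full
    log): the coefficient of variation of runtimes (CV > 1 means more
    variable than an exponential) and the fraction of power-of-two sizes.

        from statistics import mean, pstdev

        def coefficient_of_variation(values):
            return pstdev(values) / mean(values)

        def power_of_two_fraction(sizes):
            return sum(1 for n in sizes if n & (n - 1) == 0) / len(sizes)

        # runtimes and node counts from the NASA iPSC/860 sample lines above
        runtimes = [70, 70, 3300, 54, 6, 60, 3, 126]
        sizes = [32, 32, 32, 4, 1, 32, 1, 4]

        print(f"runtime CV = {coefficient_of_variation(runtimes):.2f}")
        print(f"power-of-two jobs = {power_of_two_fraction(sizes):.0%}")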
  • Instrumentation

    Passive: snoop without interfering
    Active: modify the system
    Collecting the data interferes with system behavior
    Saving or downloading the data causes additional interference
    Partial solution: model the interference
  • Data Sanitation

    Strange things happen
    Leaving them in is safe and faithful to the real data
    But it risks a non-representative situation dominating the evaluation results
  • Arrivals to SDSC SP2

  • Arrivals to LANL CM-5

  • Arrivals to CTC SP2