What is the Cost of Determinism?
Cedomir Segulja, Tarek S. Abdelrahman
University of Toronto


[Images. Sources: Intel, YouTube]

Non-Determinism

Same program + same input ≠ same output

This is bad for:
  • Testing: too many interleavings to test
  • Debugging: hard to debug when behavior is not repeatable
  • Selling: CAD tool users expect each run to produce the same circuit

Determinism

Is good, but costly

  • What is the fundamental cost of determinism?
  • What is this cost across various execution environments?

Determinism in the field

Deterministic scheduler                      Maximum slowdown
DMP [Devietti et al. 2009]                   1.7x
Kendo [Olszewski et al. 2009]                1.6x
Grace [Berger et al. 2009]                   3.6x
CoreDet [Bergan et al. 2010]                 10x
Calvin [Hower et al. 2011]                   1.7x
RCDC [Devietti et al. 2011]                  1.7x
Dthreads [Liu et al. 2011]                   4x
Conversion [Merrifield and Eriksson 2013]    5x
Parrot [Cui et al. 2013]                     3.8x
RFDet [Lu et al. 2014]                       2.6x

Source: [Bergan et al. 2011] and the respective papers
* Only to show that determinism comes at a cost; not to be used for a direct comparison (different features, benchmarks, number of threads, etc.)

What is Determinism?

  • A property that requires observing the same output whenever the program runs with the same input
  • SyncOrder determinism [Lu and Scott 2011]
      • Requires the same program result and the same order of synchronization
      • More flexible than internal determinism
      • Still greatly eases testing [Cui et al. 2013]
  • We assume data-race-freedom
      • Determinism during debugging is needed
      • But the cost of determinism matters the most in production
      • All data races are bugs [Boehm 2008, S. Adve 2010, Marino et al. 2010, Lucia et al. 2010, ...]
      • Data races in general do not help performance [Boehm 2012]

[Figure: levels of determinism. Labels: External, SyncOrder, Internal]
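As a minimal sketch of the SyncOrder definition above, the check below compares two runs of the same program on the same input. The `RunTrace` type, its fields, and the function name are hypothetical and introduced only for illustration; the slides do not specify how a run is represented.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class RunTrace:
    """Hypothetical record of one run: its result and its synchronization order."""
    output: str
    # Global order of synchronization operations as (thread id, operation name).
    sync_order: Tuple[Tuple[int, str], ...]


def is_syncorder_deterministic(a: RunTrace, b: RunTrace) -> bool:
    """True if two runs on the same input agree on the program result
    and on the order of synchronization operations."""
    return a.output == b.output and a.sync_order == b.sync_order


run1 = RunTrace("42", ((1, "lock m"), (2, "lock m")))
run2 = RunTrace("42", ((1, "lock m"), (2, "lock m")))
run3 = RunTrace("42", ((2, "lock m"), (1, "lock m")))  # same result, different order
```

Here `run1` and `run2` satisfy the check, while `run3` produces the same output but would not count as the same execution under SyncOrder determinism.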

What is the impact of enforcing a fixed synchronization order on program execution time?

Schedule-Record-Replay Framework

[Figure: framework diagram. Labels: application, scheduler, serial, round-robin, dynamic-A, dynamic-S, hybrid, recorder, replayer, schedule, thread1, thread2, idle, perturber, architectures, small perturbations, background processes, DVFS, NUMA]

Replayer

  • Force threads to wait only when absolutely necessary under the schedule
  • And do so with as little overhead as possible
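The idea of making a thread wait only when the schedule requires it can be sketched as follows. This is an illustrative sketch, not the authors' replayer: the `Replayer` class, the `sync` method, and the representation of a schedule as a list of thread ids are assumptions, and running each operation while holding the condition lock is a simplification that a low-overhead implementation would avoid.

```python
import threading


class Replayer:
    """Hypothetical replayer: enforces a recorded order of synchronization operations."""

    def __init__(self, schedule):
        self._schedule = list(schedule)  # thread ids, in the order they must synchronize
        self._next = 0                   # index of the next schedule entry to replay
        self._cond = threading.Condition()

    def sync(self, thread_id, operation):
        """Run `operation` once the schedule says it is `thread_id`'s turn."""
        with self._cond:
            # Block only while the schedule names another thread as next.
            while (self._next < len(self._schedule)
                   and self._schedule[self._next] != thread_id):
                self._cond.wait()
            if self._next >= len(self._schedule):
                raise RuntimeError("schedule exhausted")
            result = operation()
            self._next += 1
            self._cond.notify_all()  # wake waiters so the next thread can proceed
            return result


def replay(schedule, ops_per_thread):
    """Run one worker per thread id and return the observed synchronization order."""
    replayer = Replayer(schedule)
    observed = []

    def worker(tid, count):
        for _ in range(count):
            replayer.sync(tid, lambda: observed.append(tid))

    threads = [threading.Thread(target=worker, args=(tid, n))
               for tid, n in ops_per_thread.items()]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return observed
```

Calling `replay([1, 2, 2, 1, 2, 1], {1: 3, 2: 3})` replays the same order on every run, regardless of how the operating system interleaves the two workers. The schedule must be consistent with the number of operations each thread performs, otherwise the workers would block forever.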

Non-deterministic execution vs. non-deterministic execution with the replayer's overhead

Schedules

Deterministic scheduler                      Schedule
Grace [Berger et al. 2009]                   serial
Dthreads [Liu et al. 2011]                   round-robin
Conversion [Merrifield and Eriksson 2013]    round-robin
Parrot [Cui et al. 2013]                     round-robin
Kendo [Olszewski et al. 2009]                dynamic
RCDC [Devietti et al. 2011]                  dynamic
RFDet [Lu et al. 2014]                       dynamic
DMP [Devietti et al. 2009]                   hybrid
CoreDet [Bergan et al. 2010]                 hybrid
Calvin [Hower et al. 2011]                   hybrid

When does a thread pass its turn?
  • At the end: serial
  • After each synchronization operation: round-robin
  • After each instruction/store: dynamic-A/dynamic-S
  • After N instructions: hybrid
      • N = 100,000
      • No reduced serial mode
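The turn-passing rules above can be illustrated with a toy model. The sketch below is an assumption-laden simplification: it serializes all work behind a single turn token (real schedulers let threads run non-synchronizing work in parallel), it models each thread as a list of events where "i" is an ordinary instruction and "s" is a synchronization operation, and it covers only the serial, round-robin, and hybrid rules. The function name `sync_order` and the event encoding are hypothetical.

```python
def sync_order(threads, policy, n=100_000):
    """Return the thread ids in the order their synchronization operations occur.

    threads: list of event lists, "i" = instruction, "s" = synchronization operation.
    policy:  "serial", "round-robin", or "hybrid" (pass the turn after n instructions).
    """
    if policy not in ("serial", "round-robin", "hybrid"):
        raise ValueError(f"unknown policy: {policy}")
    if n < 1:
        raise ValueError("n must be at least 1")

    pos = [0] * len(threads)  # next event index for each thread
    order = []
    current = 0               # thread that currently holds the turn
    while any(p < len(t) for p, t in zip(pos, threads)):
        events = threads[current]
        executed = 0
        while pos[current] < len(events):
            event = events[pos[current]]
            pos[current] += 1
            executed += 1
            if event == "s":
                order.append(current)
                if policy == "round-robin":
                    break  # pass the turn after each synchronization operation
            if policy == "hybrid" and executed == n:
                break      # pass the turn after n instructions
            # "serial": keep the turn until the thread has no events left
        current = (current + 1) % len(threads)
    return order


example = [["i", "s", "i", "s"], ["s", "s", "i"]]
```

For the `example` threads, the three rules yield three different synchronization orders, which is why the choice of schedule affects how long threads wait for their turn.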

Platform

Benchmarks

Benchmark         serial  round-robin  dynamic-S  dynamic-A  hybrid
splash
barnes            1.10    0.98         0.95       0.96       0.99
cholesky          3.39    2.39         1.07       1.05       1.10
fft               4.36    1.02         1.01       1.01       1.02
fmm               6.34    1.33         1.16       1.13       1.19
lu_cb             1.00    1.00         1.00       1.00       1.00
lu_ncb            1.00    0.99         1.01       1.00       1.00
ocean_cp          1.00    1.00         1.00       1.00       1.00
ocean_ncp         1.00    1.00         1.00       1.00       1.00
radiosity         7.58    3.04         1.09       1.08       2.67
radix             1.00    1.00         1.00       1.00       1.00
raytrace          7.72    2.93         1.08       1.02       1.88
volrend           6.12    1.91         1.08       1.02       1.67
water_nsquared    1.00    1.00         1.00       1.00       1.00
water_spatial     1.00    1.00         1.00       1.00       1.00
parsec
blackscholes      1.00    1.00         1.00       1.00       1.00
bodytrack         5.87    1.04         1.05       1.05       1.05
dedup             5.04    1.77         1.63       1.33       1.34
facesim           6.19    1.00         1.00       1.00       1.00
ferret            6.19    3.19         1.58       1.23       1.25
fluidanimate      1.81    0.99         0.99       0.97       0.97
raytrace          7.26    1.52         1.06       1.01       1.01
streamcluster     1.00    1.00         1.00       1.00       1.00
swaptions         1.00    1.00         1.00       1.00       1.00
vips              7.61    5.27         1.31       1.06       1.05
average slowdown  3.61    1.60         1.09       1.04       1.17
maximum slowdown  7.72    5.27         1.63       1.33       2.67

For this set of benchmarks and our platform, and with implementation overhead set aside, the fundamental cost of determinism is small.

What is the performance cost of insisting on the same schedule across different environments?

Schedule-Record-Perturb-Replay Framework

[Figure: framework diagram. Labels: application, scheduler, serial, round-robin, dynamic-A, dynamic-S, hybrid, recorder, replayer, schedule, thread1, thread2, idle, perturber, architectures, small perturbations, background processes, DVFS, NUMA]

Perturber

  • Small perturbations (context switches, thread migrations, page faults)
      • Simulate first-order effects by inserting small delays (µs and ms)
  • Background processes
      • Spawn additional threads and control their work-to-sleep ratio
  • Dynamic voltage and frequency scaling (DVFS)
      • Use Linux's cpufreq system to explore different DVFS policies
  • Non-uniform memory access (NUMA)
      • Spread threads over two NUMA nodes
  • Asymmetric architectures
      • Use DVFS to create asymmetry [Shelepov et al. 2009]

Metric

                  Quiet  Small perturbations   Background proc.      DVFS           NUMA   Asym. arch.
Benchmark                balanced  unbalanced  balanced  unbalanced  auto   manual         4/4   1/7
splash
barnes            0.96   0.95      0.96        0.96      0.97        0.92   0.96    0.91   0.94  0.96
cholesky          1.05   1.05      1.05        1.06      1.25        1.06   1.02    1.08   1.03  1.09
fft               1.01   1.01      1.02        1.07      1.02        1.01   1.01    1.01   1.00  1.01
fmm               1.13   1.13      1.13        1.19      1.24        1.13   1.13    1.14   1.15  0.97
lu_cb             1.00   1.00      1.00        0.99      1.03        1.00   1.00    1.00   1.00  0.98
lu_ncb            1.00   1.01      1.01        1.01      1.03        0.97   1.01    1.03   0.99  0.98
ocean_cp          1.00   1.00      1.00        1.00      1.01        1.00   1.00    1.00   1.00  1.01
ocean_ncp         1.00   1.00      1.00        1.00      1.00        1.00   1.00    1.00   1.00  1.00
radiosity         1.08   1.07      1.08        1.19      1.94        1.13   1.07    1.11   1.46  1.71
radix             1.00   1.00      1.00        1.00      1.00        1.00   1.00    1.00   1.00  1.00
raytrace          1.02   1.03      1.03        1.14      1.92        1.08   1.02    1.03   1.44  1.69
volrend           1.02   1.03      1.03        1.08      1.19        1.06   1.02    1.03   1.38  1.55
water_nsquared    1.00   1.00      1.00        1.00      1.00        1.00   1.00    1.00   1.02  0.97
water_spatial     1.00   1.00      1.01        1.00      1.08        1.00   1.00    1.00   1.00  1.03
parsec
blackscholes      1.00   1.00      1.00        1.01      1.00        1.00   1.00    1.00   1.00  1.00
bodytrack         1.05   1.04      1.06        1.05      1.51        1.04   1.05    1.03   1.33  1.56
dedup             1.33   1.33      1.33        1.35      1.31        1.29   1.33    1.32   1.64  1.31
facesim           1.00   1.00      1.01        1.00      1.00        0.99   1.00    1.00   1.00  1.00
ferret            1.23   1.24      1.25        1.19      1.29        1.21   1.25    1.15   1.37  1.10
fluidanimate      0.97   0.97      0.97        0.97      0.98        0.97   0.97    0.97   0.98  1.01
raytrace          1.01   1.00      1.01        1.07      1.77        1.05   1.01    1.02   1.39  1.63
streamcluster     1.00   1.00      1.00        1.01      1.00        1.00   1.00    1.00   1.00  1.00
swaptions         1.00   1.00      1.00        1.00      1.00        1.00   1.00    1.00   1.00  1.00
vips              1.06   1.06      1.07        1.09      1.09        1.15   1.06    1.06   1.43  1.53
avg. slowdown     1.04   1.04      1.04        1.06      1.19        1.04   1.04    1.04   1.15  1.17
max. slowdown     1.33   1.33      1.33        1.35      1.94        1.29   1.33    1.32   1.64  1.71

Insisting on the same schedule in the presence of skewed conditions can slow down execution by a factor of almost 2x.

Conclusions

  • Employed the schedule-record-replay framework to divorce implementation overhead from the fundamental cost of enforcing deterministic execution
      • The fundamental cost of determinism is small (4% on avg., 33% max.)
      • There is room for lowering overheads in current deterministic systems
  • Measured this fundamental cost across a range of execution environments
      • The cost rises to almost 2x when threads face skewed conditions
      • Do we need a more relaxed definition of determinism?
  • Quantified various sources of non-determinism
      • Deterministic logical clocks are not deterministic (and not only because of imperfections in the performance counters [Weaver et al. 2013])

Thank you!