DIOS - compilers

DESCRIPTION

My DIOS presentation for compilers. This is meant more for a compiler-oriented audience.

Transcript

  • 1. DIOS: Dynamic Instrumentation for (not so) Outstanding Scheduling Blake Sutton & Chris Sosa
  • 2. Motivation
  • 3. Approach: Adaptive Distributed Scheduler
    • Centralized global scheduler and distributed local services
    • Hares monitor machines for undesirable events
    • Hares also gather application-specific info with Pin
    • Rhino schedules jobs and responds to events from Hares
      • Migrate
      • Pause / Resume
      • Kill / Restart
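    The Hare/Rhino split is essentially an event-driven control loop: each Hare pushes events to the central Rhino, which maps an event to one of the responses above. A minimal C++ sketch of that interface follows; the Event struct, the event names, and Rhino::handle are hypothetical illustrations of the design, not actual DIOS code.

      #include <iostream>
      #include <string>

      // Hypothetical event a Hare reports when it observes an undesirable
      // condition (e.g. low free memory) on the machine it monitors.
      struct Event {
          std::string hare_host;  // machine reporting the event
          std::string kind;       // e.g. "LOW_MEMORY"
          double      value;      // measurement that triggered the event
      };

      // Sketch of the central scheduler's response loop.
      class Rhino {
      public:
          void handle(const Event& e) {
              if (e.kind == "LOW_MEMORY") {
                  migrate(e.hare_host);   // move work off the loaded machine
              } else if (e.kind == "THRASHING") {
                  pause(e.hare_host);     // pause/resume, kill/restart analogous
              }
          }
      private:
          void migrate(const std::string& host) { std::cout << "migrate a job away from " << host << "\n"; }
          void pause(const std::string& host)   { std::cout << "pause jobs on " << host << "\n"; }
      };

      int main() {
          Rhino rhino;
          rhino.handle({"realitytv14", "LOW_MEMORY", 0.08});  // Hare saw only 8% memory free
      }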
  • 4. Pinvolvement: What it is
    • Insert new code into apps on the fly
      • No recompile
      • Operates on a copy
      • Code caching
    • Our Pintool
      • Routine-level
      • Instruction-level
    pin -t mytool -- ./myprogram   (Borrowed from Luk et al. 2005.)
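    A Pintool is a shared library written against the Pin API and loaded by the pin launcher as in the command above. For reference, a minimal Pintool looks like the following; this is essentially Pin's classic instruction-counting example, not our actual tool.

      #include "pin.H"
      #include <iostream>

      static UINT64 insCount = 0;

      // Analysis routine: called before every executed instruction.
      VOID CountIns() { insCount++; }

      // Instrumentation routine: Pin calls this for each instruction it
      // translates, and we insert a call to the analysis routine.
      VOID Instruction(INS ins, VOID *v) {
          INS_InsertCall(ins, IPOINT_BEFORE, (AFUNPTR)CountIns, IARG_END);
      }

      VOID Fini(INT32 code, VOID *v) {
          std::cerr << "Executed instructions: " << insCount << std::endl;
      }

      int main(int argc, char *argv[]) {
          if (PIN_Init(argc, argv)) return 1;
          INS_AddInstrumentFunction(Instruction, 0);
          PIN_AddFiniFunction(Fini, 0);
          PIN_StartProgram();   // never returns; runs the target under Pin
          return 0;
      }

    Built as mytool (a shared object), this runs against an unmodified, already-compiled binary, which is what 'no recompile' means above.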
  • 5. Pinvolvement: What it measures
    • No reliance on hardware-specific performance counters
    • Want to capture memory behavior over time
    • Gathered:
      • Ratio of malloc to free calls
      • Wall-clock time to execute 10,000,000 insns
      • Number of memory ops in last 2,000,000 insns
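    A hedged sketch of how such metrics can be gathered: routine-level hooks on malloc and free via Pin's image/routine API, plus an instruction-level hook that counts memory operations. The counter names are illustrative; this shows the shape of the instrumentation rather than our exact Pintool.

      #include "pin.H"
      #include <iostream>

      static UINT64 mallocCalls = 0, freeCalls = 0, memOps = 0;

      VOID OnMalloc() { mallocCalls++; }
      VOID OnFree()   { freeCalls++; }
      VOID OnMemOp()  { memOps++; }

      // Routine-level: hook malloc and free by name in each loaded image.
      VOID Image(IMG img, VOID *v) {
          RTN rtn = RTN_FindByName(img, "malloc");
          if (RTN_Valid(rtn)) {
              RTN_Open(rtn);
              RTN_InsertCall(rtn, IPOINT_BEFORE, (AFUNPTR)OnMalloc, IARG_END);
              RTN_Close(rtn);
          }
          rtn = RTN_FindByName(img, "free");
          if (RTN_Valid(rtn)) {
              RTN_Open(rtn);
              RTN_InsertCall(rtn, IPOINT_BEFORE, (AFUNPTR)OnFree, IARG_END);
              RTN_Close(rtn);
          }
      }

      // Instruction-level: count instructions that read or write memory.
      VOID Instruction(INS ins, VOID *v) {
          if (INS_IsMemoryRead(ins) || INS_IsMemoryWrite(ins))
              INS_InsertCall(ins, IPOINT_BEFORE, (AFUNPTR)OnMemOp, IARG_END);
      }

      VOID Fini(INT32 code, VOID *v) {
          std::cerr << "malloc calls: " << mallocCalls
                    << ", free calls: " << freeCalls
                    << ", memory ops: " << memOps << std::endl;
      }

      int main(int argc, char *argv[]) {
          PIN_InitSymbols();                    // required for RTN_FindByName
          if (PIN_Init(argc, argv)) return 1;
          IMG_AddInstrumentFunction(Image, 0);
          INS_AddInstrumentFunction(Instruction, 0);
          PIN_AddFiniFunction(Fini, 0);
          PIN_StartProgram();
          return 0;
      }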
  • 6. Evaluation
    • Distributed scheduler
      • Rhino on realitytv13, Hares on realitytv13-16
      • heatedplate with modified parameters
      • Hares detect when available memory falls below 10% and inform Rhino to take action
      • Rhino reschedules the youngest job at the affected Hare site (sketched in the code after this list)
      • Baseline: Smallest Queues
    • Pintool
      • 2 applications from SPLASH-2
      • Heatedplate
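    The rescheduling policy under test is simple enough to state as code. A hypothetical sketch follows; the Job struct and helper name are ours, with the 10% threshold and youngest-job choice taken from the description above.

      #include <iostream>
      #include <string>
      #include <vector>

      struct Job {
          int         id;
          std::string host;        // Hare site currently running the job
          double      start_time;  // later start time = younger job
      };

      // When a Hare reports that free memory fell below 10%, Rhino picks
      // the youngest job at that site to reschedule elsewhere.
      int pickJobToReschedule(const std::vector<Job>& jobs, const std::string& lowMemHost) {
          const Job* youngest = nullptr;
          for (const Job& j : jobs) {
              if (j.host != lowMemHost) continue;
              if (!youngest || j.start_time > youngest->start_time) youngest = &j;
          }
          return youngest ? youngest->id : -1;  // -1: nothing to move
      }

      int main() {
          std::vector<Job> jobs = {{1, "realitytv14", 10.0}, {2, "realitytv14", 25.0}};
          std::cout << "reschedule job " << pickJobToReschedule(jobs, "realitytv14") << "\n";
      }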
  • 7. Results: The Good
    • Scheduler shows potential for improvement
    • Lower total runtime with simple policy
  • 8. Results: The Bad
    • Overhead from Pintool is too high to realize gains
      • Pin isn't designed for on-the-fly analysis
      • Could not reattach
      • Code caching isn't enough
    Slowdown relative to native execution (native = 1.00):

      application   native  pin   count only  malloc/free  # mems  latency
      lu            1.00    1.25  6.27        14.51        7.90    7.64
      ocean         1.00    1.48  2.87        7.84         6.04    5.81
      heatedplate   1.00    1.88  2.65        5.43         7.45    7.26
  • 9. Results: The Interesting
    • Pintool does capture intriguing info
  • 10. Other Issues
    • Condor
      • Process migration requires re-linking
      • Doesn't support multithreaded applications
      • Other user-level process migration mechanisms have similar requirements
    • Pin
      • Unable to alternate between low- and high-overhead instrumentation with the Pintool
      • Even the smallest overhead was not negligible
      • Up to almost 2x slowdown just using Pin with heatedplate and no extra instrumentation
    • Scheduling decisions have a bigger impact for long-running jobs
  • 11. Conclusion: the Future of DIOS
    • Overhead is prohibitive (for now)
      • Pin needs to support reattach
      • Lighter instrumentation framework
    • However, instrumentation can capture aspects of application-specific behavior
    • Future Work
      • Pin as a process migration mechanism
  • 12. Preguntas?
  • 13. Wait... hasn't this been solved?
    • Condor
      • popular user-space distributed scheduler
      • process migration
      • tries to keep queues balanced
        • but jobs have different behavior
        • over time
        • from each other
    • LSF (Load Sharing Facility)
      • monitors system, moves processes around based on what they need
      • must input static job information (requires profiling, etc., beforehand)
        • what if something about your job isn't captured by your input?
        • what if you end up giving it margins that are too large? too small?
        • unnecessary inefficiencies?
        • it's not exactly hassle-free...
    • Hardware feedback
      • PAPI
      • Still not very portable (requires an invasive kernel patch to install)
    • Wouldn't it be nice if the scheduler could just..."do the right thing"?