Network Weather Service
Sathish Vadhiyar
Sources / Credits:
• NWS web site: http://nws.cs.ucsb.edu
• NWS papers
Introduction
• "NWS provides accurate forecasts of dynamically changing performance characteristics from a distributed set of metacomputing resources"
• What will be the future load (not current load) when a program is executed?
• Producing short-term performance forecasts based on historical performance measurements
• The forecasts can be used by dynamic scheduling agents
• Resource allocation and scheduling decisions must be based on predictions of resource performance during a timeframe
• NWS takes periodic measurements of performance and, using numerical models, forecasts resource performance
NWS Goals
Components
• Persistent state
• Name server
• Sensors
  • Passive (CPU availability)
  • Active (network measurements)
• Forecaster
Architecture
Performance measurements
• Using sensors
• CPU sensors
  • Measure CPU availability
  • Use:
    • uptime
    • vmstat
    • Active probes
• Network sensors
  • Measure latency and bandwidth
• Each host maintains
  • Current data
  • One-step ahead predictions
  • Time series of data
Network Measurements
Issues with Network Sensors
• Appropriate transfer size for measuring throughput
• Collision of network probes
• Solutions
  • Tokens and hierarchical trees with cliques
Available CPU measurement
• The formulae shown do not take into account job priorities
• Hence, an active probe is run periodically to adjust the estimates
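The slide formulae themselves are not reproduced in this text. As a rough sketch only, the code below assumes the load-average estimate described in the NWS papers (availability ≈ 1 / (load average + 1)) and a hypothetical bias correction taken from an occasional active probe. All function names are illustrative and are not part of NWS.

```python
def availability_from_load(load_average: float) -> float:
    """Fraction of CPU a new process might expect (assumed: 1 / (load + 1))."""
    if load_average < 0:
        raise ValueError("load average cannot be negative")
    return 1.0 / (load_average + 1.0)


def probe_bias(probe_fraction: float, passive_estimate: float) -> float:
    """Hypothetical correction: gap between what an active probe obtained
    and what the passive estimate claimed at the same moment."""
    return probe_fraction - passive_estimate


def adjusted_availability(load_average: float, bias: float) -> float:
    """Passive estimate shifted by the last probe bias, clamped to [0, 1]."""
    return min(1.0, max(0.0, availability_from_load(load_average) + bias))


# One job already running: a newcomer would expect about half the CPU.
estimate = availability_from_load(1.0)
# Suppose a short active probe actually obtained 70% of the CPU,
# for example because the competing job runs at a lower priority.
bias = probe_bias(0.70, estimate)
print(estimate, adjusted_availability(1.0, bias))
```

The bias is recomputed only when a probe runs, so the cheap passive reading can be used between probes.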
Predictions
• To generate a forecast, the forecaster requests persistent state data
• When a forecast is requested, the forecaster makes predictions for existing measurements using different forecast models
• Dynamic choice of forecast models based on the best Mean Absolute Error, Mean Square Prediction Error, or Mean Percentage Prediction Error
• Forecasts requested by:
  • InitForecaster()
  • RequestForecasts()
• Forecasting methods
  • Mean-based
  • Median-based
  • Autoregressive
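The dynamic choice of a forecast model can be sketched as follows. This is a minimal illustration, not the NWS implementation: the three models, the window size, and the function names are assumptions, and only Mean Absolute Error is used as the selection criterion.

```python
from statistics import mean, median


def last_value(history):
    return history[-1]


def running_mean(history):
    return mean(history)


def window_median(history, window=5):
    return median(history[-window:])


# Hypothetical model set; NWS uses a larger family of predictors.
MODELS = {
    "last_value": last_value,
    "running_mean": running_mean,
    "window_median": window_median,
}


def forecast(series):
    """Return (best model name, next-step forecast, per-model MAE)."""
    if len(series) < 2:
        raise ValueError("need at least two measurements")
    abs_error = {name: 0.0 for name in MODELS}
    # Replay the history: each model predicts series[t] from series[:t].
    for t in range(1, len(series)):
        for name, model in MODELS.items():
            abs_error[name] += abs(model(series[:t]) - series[t])
    mae = {name: total / (len(series) - 1) for name, total in abs_error.items()}
    best = min(mae, key=mae.get)
    return best, MODELS[best](series), mae


cpu = [0.50, 0.52, 0.48, 0.95, 0.51, 0.49, 0.50, 0.53]
best, prediction, mae = forecast(cpu)
print(best, round(prediction, 3))
```

Every model is scored on the same history, so the winner can change as new measurements arrive.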
Forecasting Methods
Notations:
Prediction Accuracy:
Mean Absolute Error (MAE) is the average of the absolute errors between each prediction and the measurement it forecasts
Prediction Method:
Forecasting Methods – Mean-based
[Equations 1 to 5: not recoverable from the slides]
Forecasting Methods – Median-based
[Equations 1 to 3: not recoverable from the slides]
Autoregression
1. The coefficients a_i are found such that they minimize the overall error.
r_{i,j} is the autocorrelation function for the series of N measurements.
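One common way to obtain such coefficients from the autocorrelation of a series is the Yule-Walker system. The sketch below assumes that formulation; the slide's own equation is not recoverable, so the exact form used by NWS may differ. The helper names are hypothetical, and a constant series (zero variance) is not handled.

```python
def autocovariance(x, lag):
    """Biased sample autocovariance of x at the given lag."""
    m = sum(x) / len(x)
    return sum((x[t] - m) * (x[t + lag] - m) for t in range(len(x) - lag)) / len(x)


def solve(a, b):
    """Solve a*z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    z = [0.0] * n
    for i in reversed(range(n)):
        z[i] = (m[i][n] - sum(m[i][j] * z[j] for j in range(i + 1, n))) / m[i][i]
    return z


def fit_ar(x, order):
    """Yule-Walker estimate of the AR coefficients a_1..a_order."""
    r = [autocovariance(x, k) for k in range(order + 1)]
    matrix = [[r[abs(i - j)] for j in range(order)] for i in range(order)]
    return solve(matrix, r[1:])


def predict_next(x, coeffs):
    """One-step-ahead forecast around the series mean."""
    m = sum(x) / len(x)
    return m + sum(a * (x[-1 - i] - m) for i, a in enumerate(coeffs))


series = [1.0, -1.0] * 10
coeffs = fit_ar(series, order=1)
print(coeffs, predict_next(series, coeffs))
```

For the alternating series the single coefficient is negative, so the forecast flips the sign of the last measurement.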
Forecasting Methodology
Forecast Results
Forecasting Complexity vs Accuracy
• Semi Non-parametric Time Series Analysis (SNP): an accurate but complicated model
• Model fit using iterative search
• Calculation of conditional expected value using conditional probability density
Sensor Control
• Each sensor connects to the other sensors and performs measurements: O(N^2)
• To reduce the time complexity, sensors are organized in a hierarchy of cliques
• To avoid collisions, tokens are used
• Adaptive control using adaptive token timeouts
• Adaptive time-out discovery and distributed leader election protocol
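The saving from cliques can be checked with simple counting. The sketch below assumes a two-level hierarchy in which hosts probe only inside their own clique and one representative per clique joins a top-level clique; that layout and the function names are illustrative assumptions.

```python
def full_mesh_pairs(n_hosts: int) -> int:
    """Unordered host pairs when every sensor probes every other sensor."""
    return n_hosts * (n_hosts - 1) // 2


def clique_hierarchy_pairs(clique_sizes):
    """Pairs probed when hosts probe only within their own clique, plus one
    top-level clique made of a single representative from each clique."""
    inside = sum(full_mesh_pairs(size) for size in clique_sizes)
    between = full_mesh_pairs(len(clique_sizes))
    return inside + between


print(full_mesh_pairs(100))               # 4950
print(clique_hierarchy_pairs([10] * 10))  # 10 * 45 + 45 = 495
```

With 100 hosts, ten cliques of ten cut the number of probed pairs by a factor of ten, at the cost of measuring cross-clique paths only between representatives.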
Synchronizing network probes
• Consistent periodicity and mutual exclusion
• Token
  • List of hosts to probe
  • Periodicity of probe
  • Parameters to the probe
  • Sequence number
• Leader initiates the token
• A host, after receiving a token:
  • Conducts probes with the other hosts in the token
  • Passes the token to the next host
• Token passed back to the leader
Contd…
• Leader notes the token circuit time and calculates the next token initiation time as (desired periodicity – token circuit time)
• To avoid long delays in token circulation and to have fault tolerance:
  • Each host maintains a timer
  • When the timer times out, the host declares itself as the leader and initiates a new token
  • When a host encounters two tokens, the old token is destroyed
• Calculation of time-outs
  • Each host records the token circuit time and the variance of the time
  • Uses NWS forecasting models to predict the next token arrival time
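The two timing calculations above can be sketched as follows. The leader's delay follows the stated formula, with a floor at zero added here as an assumption. The time-out rule (predicted gap plus a multiple of the standard deviation) is hypothetical, and a plain mean stands in for the NWS forecasting models.

```python
from statistics import mean, pstdev


def next_initiation_delay(periodicity: float, circuit_time: float) -> float:
    """Leader's wait before relaunching the token: periodicity minus the
    last circuit time, never negative (the floor is an assumption)."""
    return max(0.0, periodicity - circuit_time)


def token_timeout(arrival_gaps, slack_stddevs: float = 2.0) -> float:
    """Hypothetical time-out: predicted gap between token arrivals plus a
    margin scaled by the observed spread of past gaps."""
    return mean(arrival_gaps) + slack_stddevs * pstdev(arrival_gaps)


print(next_initiation_delay(240.0, 12.5))  # 227.5
print(token_timeout([240.0, 242.0, 238.0, 240.0]))
```

A host whose timer exceeds the value from `token_timeout` would declare itself leader and start a new token.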
New Protocol
• Compromise between periodicity and mutual exclusion
• NWS administrator specifies periodicity, and an upper range of desired periodicity
  • If network conditions are stable and if tokens are received within the upper range, then mutual exclusion is guaranteed
  • If not, hosts time out and start conducting probes with possible collisions
• Thus the protocol switches between good and bad phases
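The switch between good and bad phases can be illustrated with a small classifier. It assumes that "within the upper range" means a token arriving no later than periodicity + range after the previous one; that reading, the function name, and the sample gaps are assumptions. The 240 s and 5 s values are those of the experimental setup.

```python
def classify_phases(arrival_gaps, periodicity, upper_range):
    """Label each token arrival gap: 'good' if the token arrived before the
    host's time-out (periodicity + upper_range), otherwise 'bad'."""
    deadline = periodicity + upper_range
    return ["good" if gap <= deadline else "bad" for gap in arrival_gaps]


gaps = [240.0, 243.5, 251.0, 239.0]
print(classify_phases(gaps, periodicity=240.0, upper_range=5.0))
# ['good', 'good', 'bad', 'good']
```

In a "bad" interval the host has already timed out and probed on its own, so mutual exclusion is no longer guaranteed for that round.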
Illustration
Comparison of 2 protocols – Experimental setup
• 4 machines – 2 in Lyon, France and 2 in Tennessee, USA
• 240 second periodicity
• 5 second range
Comparison – Periodicity
Comparison – Mutual exclusion
Use of NWS: Scheduling a Jacobi application
The problem: Appropriate partitioning strategy to balance processor efficiencies and communication overheads, i.e. deriving partitions to obtain resource performance
Deriving Partitions for Jacobi
• Notations
• Per-processor execution time
• The goal
• Communication time
• Solution: system of linear equations by Gaussian elimination
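The slide equations are not recoverable, so the following is only a sketch under an assumed model: processor i receives area A_i, takes P_i seconds per element and C_i seconds of communication, and the goal is equal per-processor time A_i·P_i + C_i with the areas summing to the grid size. The symbols and function names are hypothetical. In practice P_i and C_i would come from NWS forecasts.

```python
def solve(a, b):
    """Solve a*z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    z = [0.0] * n
    for i in reversed(range(n)):
        z[i] = (m[i][n] - sum(m[i][j] * z[j] for j in range(i + 1, n))) / m[i][i]
    return z


def balance_partitions(total_area, time_per_element, comm_time):
    """Areas A_i such that A_i*P_i + C_i is equal for all i and sum(A_i) = total."""
    p = len(time_per_element)
    matrix, rhs = [], []
    # Equal-time constraints between neighbours: A_i*P_i - A_{i+1}*P_{i+1} = C_{i+1} - C_i
    for i in range(p - 1):
        row = [0.0] * p
        row[i] = time_per_element[i]
        row[i + 1] = -time_per_element[i + 1]
        matrix.append(row)
        rhs.append(comm_time[i + 1] - comm_time[i])
    # Conservation constraint: the areas cover the whole grid.
    matrix.append([1.0] * p)
    rhs.append(float(total_area))
    return solve(matrix, rhs)


# A processor twice as fast gets twice the area when communication is equal.
print(balance_partitions(300, [1.0, 2.0], [0.0, 0.0]))
```

The solution can contain a negative area when one processor's communication time dominates; a real scheduler would drop that processor and solve again.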
NWS in Jacobi
Resource Selection and Scheduling
References
• Implementing a Performance Forecasting System for Metacomputing: The Network Weather Service. Rich Wolski, Neil Spring, Chris Peterson, in Proceedings of SC97, November, 1997.
• Dynamically Forecasting Network Performance Using the Network Weather Service. Rich Wolski, in Journal of Cluster Computing, Volume 1, pp. 119-132, January, 1998.
• The Network Weather Service: A Distributed Resource Performance Forecasting Service for Metacomputing. Rich Wolski, Neil Spring, and Jim Hayes, Journal of Future Generation Computing Systems, Volume 15, Numbers 5-6, pp. 757-768, October, 1999.
• Synchronizing Network Probes to avoid Measurement Intrusiveness with the Network Weather Service. B. Gaidioz, R. Wolski, and B. Tourancheau, Proceedings of 9th IEEE High-performance Distributed Computing Conference, August, 2000, pp. 147-154.
• Experiences with Predicting Resource Performance On-line in Computational Grid Settings. Rich Wolski, ACM SIGMETRICS Performance Evaluation Review, Volume 30, Number 4, pp. 41-49, March, 2003.
Forecasting Methods Summary
Prediction Accuracy