Yehia El-khatib, Chris Edwards, Michael Mackay and Gareth Tyson. "Providing Grid Schedulers with Passive Network Measurements". In Proceedings of the 18th International IEEE Conference on Computer Communications and Networks: Workshop on Grid and P2P Systems and Applications (GridPeer 2009), San Francisco, CA, USA, August 2-6 2009.
Yehia El-khatib, Christopher Edwards, Michael Mackay, and Gareth Tyson
Computing Department, Lancaster University, UK
www.ec-gin.eu
• Motivation
• Objective
• System Overview
  ◦ The GridMAP Service
  ◦ Monitoring Daemon
• Network Measurement Evaluation
• Conclusions
• Future Work
Why measure the network?
• Public networks are unpredictable.
  ◦ Heterogeneous components
  ◦ Independent standards and protocols
  ◦ IP provides best-effort, “one size fits all” delivery.
• This unpredictability can hinder the performance of any networked application.
• IP networks do not readily provide feedback about their operational performance.
• Hence, numerous network monitoring tools exist for:
  ◦ management, troubleshooting, and/or pre- and post-deployment probing.
• Stand-alone tools are ad hoc and require manual operation.
Why measure the network in grids?
• Grids are dynamic systems that aggregate resources to run very demanding applications.
• High performance is always expected, and hence contention for resources is similarly high.
• Efficient scheduling is only possible if accurate resource information is available.
• Current middleware and Grid Information Systems (GIS) are ineffective and cumbersome:
  ◦ Information is insufficient and/or outdated.
  ◦ It needs to be gathered from different publishers.
• Our aim is to enable schedulers to make more informed decisions on node selection and resource allocation.
• Grid schedulers need a means of learning about changes in the grid:
  ◦ changes in the availability of remote computational resources
  ◦ changes in the network path to those resources
• This requires accurate end-to-end measurements to be provided to schedulers.
• GridMAP (Grid Monitoring, Analysis and Prediction) is a distributed system that collects network performance and resource availability information.
• GridMAP uses this information to provide, analyze and predict performance and availability.
• It is made up of:
  ◦ a grid service
  ◦ a monitoring daemon
• The GridMAP service is a grid service, i.e. a WSRF Web Service that also conforms to the OGSA standard.
• It provides a set of standard interfaces that give schedulers convenient access.
• Schedulers can incorporate the retrieved information into job and data allocation processes, automatically adapting to perceived and foreseeable resource and network status.
• A daemon runs on each grid node to measure resource and network performance.
• These measurements are sent to the GridMAP service to be indexed and stored.
[Figure: an application asks the scheduler to "Run job xyz" with requirements on delay, CPU and memory; the scheduler queries GridMAP, which returns, for each node, the average and predicted delay, CPU and memory.]
[Figure: an application asks the scheduler to "Copy file" with requirements on delay, throughput and disk space; the scheduler queries GridMAP, which returns, for each node, the average and predicted delay, throughput and disk space.]
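To make the query flow concrete, here is a minimal sketch of how a scheduler might pick a node once GridMAP has returned a per-node table of average and predicted values. The slides do not specify GridMAP's interface or how schedulers weigh the metrics, so the `NodeReport` type, the units, and the rule "meet every requirement on predicted values, then prefer the lowest predicted delay" are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Metric:
    """One column pair of a GridMAP reply: an average and a predicted value."""
    average: float
    predicted: float


@dataclass
class NodeReport:
    """Hypothetical per-node row, mirroring the Node / Delay / CPU / Memory table."""
    node: str
    delay_ms: Metric
    cpu_free: Metric      # fraction of CPU available, 0..1 (assumed unit)
    memory_mb: Metric


def select_node(reports: List[NodeReport], max_delay_ms: float,
                min_cpu: float, min_memory_mb: float) -> Optional[str]:
    """Pick a node whose *predicted* values meet every requirement,
    preferring the lowest predicted delay; return None if none qualifies."""
    eligible = [
        r for r in reports
        if r.delay_ms.predicted <= max_delay_ms
        and r.cpu_free.predicted >= min_cpu
        and r.memory_mb.predicted >= min_memory_mb
    ]
    if not eligible:
        return None
    # Ties on predicted delay are broken by the historical average.
    best = min(eligible, key=lambda r: (r.delay_ms.predicted, r.delay_ms.average))
    return best.node


# Made-up values, for illustration only.
reports = [
    NodeReport("node-a", Metric(9.1, 9.4), Metric(0.60, 0.55), Metric(2048, 1900)),
    NodeReport("node-b", Metric(29.0, 28.5), Metric(0.90, 0.85), Metric(4096, 4000)),
    NodeReport("node-c", Metric(48.2, 50.0), Metric(0.95, 0.95), Metric(8192, 8000)),
]
print(select_node(reports, max_delay_ms=40, min_cpu=0.5, min_memory_mb=1024))  # node-a
```

Filtering on predicted rather than average values reflects the slides' point that schedulers should adapt to foreseeable, not just past, status; a "Copy file" request would swap CPU and memory for throughput and disk space.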
• Measurements are accessible via one interface, from one publisher.
• Behind the interface is a distributed application:
  ◦ a distributed repository
    ▪ automatic replication
    ▪ no single point of failure
    ▪ ensures resilience
  ◦ makes it possible to afford the demanding costs of storing, indexing and analyzing the measurements
• Why passive measurements?
  ◦ Active techniques oblige the network to accommodate artificial traffic probes in addition to real traffic, decreasing overall performance.
    ▪ e.g. TTCP, iperf, UDPmon
  ◦ Passive techniques are arguably less accurate.
    ▪ e.g. Sting, Synack, IPTraf
  ◦ ICMP messaging: it is not uncommon for ICMP to be disabled or treated differently from TCP traffic.
    ▪ e.g. ping, fping, traceroute
• Best of both worlds: avoid added network overhead without compromising accuracy.
• Why is passive measurement relevant for grid systems?
  ◦ Grid nodes constantly exchange data sets, job state, result sets, and control signals.
  ◦ Most, if not all, grid traffic is TCP-based.
• We exploit these frequent TCP interactions to extract network metrics (RTT, throughput).
• This technique is not viable in systems other than grids, which lack such constant TCP exchanges; this is partly why other TCP-based passive techniques are usually supplemented with active probes.
• The daemon uses pcap to capture packet headers.
  ◦ The TCP 3-way handshake is used to measure RTT.
  ◦ As connections end, throughput is calculated.
• Metrics are calculated for each flow and stored in a local cache.
• The daemon also measures the availability of local resources (such as CPU and memory).
• On a regular basis, these ‘performance snapshots’ are sent to the GridMAP service.
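The slides do not show the daemon's code, so the following is only a minimal Python sketch of the per-flow logic described above. It assumes packet headers have already been decoded: the `Packet` record stands in for what pcap would deliver, and it and every other name here is hypothetical. RTT is taken from the handshake (SYN to SYN/ACK when the local host initiated the connection, SYN/ACK to the final ACK when it accepted it). Throughput is assumed to be payload bytes in both directions over the time from the SYN to the first FIN or RST; GridMAP's exact formula is not given. Local resource measurement is omitted.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

# TCP flag bits as they appear in the header's flags byte.
FIN, SYN, RST, ACK = 0x01, 0x02, 0x04, 0x10


@dataclass
class Packet:
    """Hypothetical decoded TCP/IP header; a real daemon would fill this from pcap."""
    ts: float      # capture timestamp, seconds
    src: str
    sport: int
    dst: str
    dport: int
    flags: int
    payload: int   # TCP payload length, bytes


@dataclass
class FlowState:
    start: float                      # time the SYN was seen
    local_initiated: bool
    synack_ts: Optional[float] = None
    rtt: Optional[float] = None
    payload_bytes: int = 0


FlowKey = Tuple[Tuple[str, int], Tuple[str, int]]


class FlowMonitor:
    def __init__(self, local_ip: str):
        self.local_ip = local_ip
        self.open: Dict[FlowKey, FlowState] = {}
        self.cache: List[dict] = []    # finished-flow metrics awaiting upload

    @staticmethod
    def _key(p: Packet) -> FlowKey:
        # Same key for both directions of one connection.
        a, b = (p.src, p.sport), (p.dst, p.dport)
        return (a, b) if a <= b else (b, a)

    def on_packet(self, p: Packet) -> None:
        key = self._key(p)
        if p.flags & SYN and not p.flags & ACK:
            # Handshake step 1 (SYN): start tracking the flow.
            self.open[key] = FlowState(p.ts, p.src == self.local_ip)
            return
        flow = self.open.get(key)
        if flow is None:
            return                     # flow began before capture started
        flow.payload_bytes += p.payload
        remote = p.src != self.local_ip
        if p.flags & SYN and p.flags & ACK:
            # Step 2 (SYN/ACK). If we sent the SYN, RTT = SYN -> SYN/ACK.
            flow.synack_ts = p.ts
            if flow.local_initiated and remote:
                flow.rtt = p.ts - flow.start
        elif flow.rtt is None and flow.synack_ts is not None and remote and p.flags & ACK:
            # Step 3 (ACK). If we sent the SYN/ACK, RTT = SYN/ACK -> ACK.
            flow.rtt = p.ts - flow.synack_ts
        if p.flags & (FIN | RST):
            # Connection is ending: compute throughput and cache the flow.
            duration = p.ts - flow.start
            self.cache.append({
                "flow": key,
                "rtt_s": flow.rtt,
                "throughput_bps": 8 * flow.payload_bytes / duration if duration > 0 else None,
            })
            del self.open[key]

    def snapshot(self) -> List[dict]:
        """Drain the local cache, e.g. for sending to the GridMAP service."""
        out, self.cache = self.cache, []
        return out
```

Feeding it the headers of a short outbound transfer (SYN, SYN/ACK, ACK, a data segment, FIN) leaves one cached entry whose `rtt_s` is the SYN-to-SYN/ACK gap; `snapshot()` then empties the cache, mirroring the periodic upload of performance snapshots.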
• Aim: to verify the accuracy of the obtained measurements.
• Setup:
  ◦ 5 connections of varying lengths.
  ◦ Trigger 34 iperf probes of different durations (1-500 seconds).
  ◦ Run the daemon on the sending node.
  ◦ Compare results against those of ping and iperf.
• Experiment 1: Ethernet connection (1 hop, ~0.57 ms)
• Experiment 2: Local DSL connection (4 hops, ~19 ms)
• Experiment 3: Lancaster → Oxford (12 hops, ~9 ms)
• Experiment 4: Lancaster → Munich (15 hops, ~29 ms)
• Experiment 5: Innsbruck → Lancaster (17 hops, ~48 ms)
[Figure panels: Ethernet, Oxford, Munich, Innsbruck]
Note: During the DSL connection test, ping packets did not get through due to disabled ICMP messaging.
[Figure panels: Ethernet, DSL, Oxford, Munich, Innsbruck]
• On average, our measurements were:
  ◦ 1.55% away from the minimum ping values and 2.33% away from the mean ping values
  ◦ 2.20% away from the iperf measurements
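The slides report deviations as percentages but do not define them. One plausible reading, assumed here and not confirmed by the source, is the mean relative difference between each daemon measurement and the reference value from the same probe, where the ping reference is either the minimum or the mean of that probe's ping samples. The function names below are hypothetical.

```python
from statistics import mean
from typing import List, Sequence, Tuple


def pct_away(measured: Sequence[float], reference: Sequence[float]) -> float:
    """Mean of |measured_i - reference_i| / reference_i, expressed in percent."""
    if len(measured) != len(reference) or not measured:
        raise ValueError("need two non-empty series of equal length")
    return 100 * sum(abs(m - r) / r for m, r in zip(measured, reference)) / len(measured)


def compare_to_ping(daemon_rtts: Sequence[float],
                    ping_runs: List[List[float]]) -> Tuple[float, float]:
    """ping_runs[i] holds the ping RTT samples taken during probe i.
    Returns (% away from minimum ping, % away from mean ping)."""
    mins = [min(run) for run in ping_runs]
    means = [mean(run) for run in ping_runs]
    return pct_away(daemon_rtts, mins), pct_away(daemon_rtts, means)
```

Comparing against the minimum ping value isolates propagation delay, while the mean also absorbs queuing variation, which is one reason the two percentages can differ.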
• The daemon works in an entirely passive fashion:
  ◦ no disruption is caused to real traffic
  ◦ measurements cannot be mistaken for threats such as TCP-SYN floods or DoS attacks
• Independent operation:
  ◦ no need for peer coordination/synchronization
  ◦ no reliance on router accounting schemes (e.g. IP accounting, NetFlow)
• Monitoring traffic becomes an automatic process.
• The technique is simple, but it provides a powerful vantage point: the resulting measurements directly reflect the experience of grid traffic.
• Development of the GridMAP grid service is ongoing.
• We will expand the range of metrics.
  ◦ e.g. one-way delay variation, which is important to virtualization applications
• We then plan to test our technique against more active and passive measurement tools.
Yehia El-khatib: yehia@comp.lancs.ac.uk
Christopher Edwards: ce@comp.lancs.ac.uk
Michael Mackay: m.mackay@comp.lancs.ac.uk
Gareth Tyson: g.tyson@comp.lancs.ac.uk
Computing Department, Lancaster University, Lancaster, LA1 4WA, United Kingdom