A Framework for Monitoring and Measuring a Large-Scale Distributed System in Real Time
By Lei ZHAN
Department of Information Engineering
The Chinese University of Hong Kong
Aug 16th, 2013
Outline
• Background
• Framework Design
• Case Study
• Demonstration
• Future Works
Background
• Internet Services on Distributed Infrastructure
  • Content Delivery Network
  • P2P Systems
  • Data Centers
  • Cloud Computing Services
• Monitoring Framework
  • guarantees reliable services and a high quality of user experience
  • monitors and manages the deployed systems
Objectives
• Accuracy
• Real-time
• Visualization
• Scalability
Framework Design
Components – End Hosts
• Refer to peers in a P2P system, processing units in a cloud, data centers in a CDN, etc.
• Deployed in a large-scale, distributed manner
• Source of measurement data
  • unique id for each End Host
  • generate feedback messages periodically
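As a minimal sketch of the End Host behaviour listed above (a unique id per host, and a feedback message generated once per period), the Python example below builds one message per reporting time. The field names (`host_id`, `timestamp`, `upload_kbps`, `download_kbps`), the JSON encoding, and the 10-second period are illustrative assumptions; the slides do not specify the message format.

```python
import json
import uuid
from dataclasses import dataclass, asdict

FEEDBACK_PERIOD_S = 10  # assumed reporting period; not given in the slides


@dataclass
class FeedbackMessage:
    host_id: str          # unique id of the End Host
    timestamp: float      # time the message was generated (seconds)
    upload_kbps: float    # hypothetical measurement field
    download_kbps: float  # hypothetical measurement field

    def encode(self) -> bytes:
        """Serialize to JSON bytes (assumed wire format)."""
        return json.dumps(asdict(self)).encode("utf-8")


class EndHost:
    def __init__(self) -> None:
        # A random UUID is one simple way to obtain a unique id.
        self.host_id = uuid.uuid4().hex

    def make_feedback(self, now: float, up: float, down: float) -> FeedbackMessage:
        return FeedbackMessage(self.host_id, now, up, down)


def feedback_times(start: float, end: float, period: float = FEEDBACK_PERIOD_S):
    """Yield the times at which a host would report in [start, end)."""
    t = start
    while t < end:
        yield t
        t += period
```

Every message carries the same `host_id`, which is what lets later stages match successive messages to the same End Host.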
Components – Coordinator
• Located between End Hosts and Feedback Servers
• Responsible for
  • collecting feedback messages from End Hosts
  • forwarding them to Feedback Servers
• Why a Coordinator?
  • a single target for all End Hosts
  • makes Feedback Servers more flexible
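The Coordinator's role can be illustrated with a small in-memory sketch. End Hosts only ever call the Coordinator, so Feedback Servers can be added behind it without any change on the hosts. The class names, the dictionary message shape, and the hash-by-host-id forwarding policy are assumptions made for illustration; the slides do not say how a Feedback Server is chosen.

```python
import zlib
from typing import Dict, List


class FeedbackServer:
    """In-memory stand-in for a Feedback Server."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.received: List[Dict] = []

    def receive(self, message: Dict) -> None:
        self.received.append(message)


class Coordinator:
    """Single entry point for End Hosts; forwards to Feedback Servers."""

    def __init__(self, servers: List[FeedbackServer]) -> None:
        self.servers = list(servers)

    def add_server(self, server: FeedbackServer) -> None:
        # End Hosts are unaffected: they only ever talk to the Coordinator.
        self.servers.append(server)

    def handle(self, message: Dict) -> FeedbackServer:
        # Assumed policy: hash the host id so that, for a fixed server list,
        # all messages from one host reach the same Feedback Server.
        key = message["host_id"].encode("utf-8")
        target = self.servers[zlib.crc32(key) % len(self.servers)]
        target.receive(message)
        return target
```

Note that with this simple modulo policy, adding a server changes the mapping for some hosts; a production design would need to handle that, which is outside what the slides describe.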
Components – Feedback Server
• Located between the Coordinator and the Monitoring Platform
• Responsible for
  • aggregating feedback messages from the Coordinator
  • responding to data requests from the Monitoring Platform
Components – Monitoring Platform
• Provides
  • measurement data processing and analysis
  • visualization views of data statistics for administrators
• Operates in
  • real-time mode: communicates with the Feedback Server
  • static mode: reads data from local log files
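One way to picture the two operating modes is as two data sources feeding the same analysis code. The sketch below assumes a JSON-lines log format and uses a hypothetical `fetch` callable in place of the real request to a Feedback Server; neither detail comes from the slides.

```python
import json
import os
import tempfile
from typing import Callable, Dict, Iterator, List


def read_static(log_path: str) -> Iterator[Dict]:
    """Static mode: replay records from a local log file (assumed JSON lines)."""
    with open(log_path, "r", encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:  # skip blank lines
                yield json.loads(line)


def read_realtime(fetch: Callable[[], List[Dict]], polls: int) -> Iterator[Dict]:
    """Real-time mode: poll a Feedback Server `polls` times.

    `fetch` is a hypothetical callable standing in for the network request.
    """
    for _ in range(polls):
        for record in fetch():
            yield record


def total_peers(records: Iterator[Dict]) -> int:
    """Example analysis shared by both modes: count distinct host ids."""
    return len({record["host_id"] for record in records})
```

Because both readers yield the same record shape, the processing and visualization code does not need to know which mode is active.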
Framework Design
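[Figure: framework architecture diagram. Arrow labels: "Feedback Messages" (End Hosts to Coordinator), "Feedback Messages" (Coordinator to Feedback Servers), "Request Aggregated Log Files" (Monitoring Platform to Feedback Servers).]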
Case Study
• 2012 London Olympic Games
  • live broadcast through the Internet within HK
• P2P Video Streaming System
  • developed by ASTRI*
  • adopted by i-Cable**
* The Hong Kong Applied Science and Technology Research Institute (ASTRI) was founded by the Government of Hong Kong SAR in 2000 with a mission to enhance Hong Kong’s competitiveness in technology-based industries through applied research.
** i-Cable is an Internet service provider in Hong Kong, and is now one of Hong Kong's leading integrated communications companies.
Real-time Monitoring
• Whole Period
  • 17 days (July 27th – Aug 12th)
• Key Metrics
  • system statistics
    • number of new peers, total number of peers
    • average peer upload rate, average peer download rate
    • average peer contribution ratio
  • system performance
    • peer startup delay, peer continuity
    • quality of experience
Monitoring Platform
• Playback in 2 Modes
• Visualization
  • 4 different views
    • Map View
    • District View
    • Histogram View
    • Timeline View
  • filtering & control
• More in the Demonstration
Measurement Results
Demonstration
• Monitoring Platform
  • operates in static mode
  • the data of Aug 2nd, 2012
  • 4 visualization views
Discussion (I)
• Measurement Result
  • window-based statistics
    • identify each End Host by its id
    • update records upon each new feedback message
    • consider the latest state inside the window as the current state
  • time-window moving average method for analysis
    • window size >= feedback message period
  • data synchronized at Feedback Servers
    • avoids the synchronization problem of feedback messages
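The window-based statistics above can be sketched as follows. This is one possible reading of the slide, not the author's implementation: the class name `WindowStats`, the single numeric `value` per message, and the choice to average the latest value of each host across hosts are all assumptions. The constructor enforces the stated constraint that the window size is at least the feedback message period, so that a live host reports at least once per window.

```python
from typing import Dict, Optional, Tuple


class WindowStats:
    """Keep the latest record per End Host and average over a time window.

    Assumes queries are made with non-decreasing `now`, as in live monitoring.
    """

    def __init__(self, window_s: float, feedback_period_s: float) -> None:
        if window_s < feedback_period_s:
            raise ValueError("window size must be >= feedback message period")
        self.window_s = window_s
        self.latest: Dict[str, Tuple[float, float]] = {}  # id -> (time, value)

    def update(self, host_id: str, timestamp: float, value: float) -> None:
        """Record a new feedback message, keeping only the newest per host."""
        previous = self.latest.get(host_id)
        if previous is None or timestamp >= previous[0]:
            self.latest[host_id] = (timestamp, value)

    def current(self, now: float) -> Dict[str, float]:
        """Latest state of each host seen inside (now - window, now]."""
        start = now - self.window_s
        return {h: v for h, (t, v) in self.latest.items() if start < t <= now}

    def average(self, now: float) -> Optional[float]:
        """Mean of the current per-host values, or None if no host is in the window."""
        values = list(self.current(now).values())
        return sum(values) / len(values) if values else None
```

Calling `average(now)` repeatedly as `now` advances gives a moving average: hosts that stop reporting drop out once their last message falls outside the window.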
Discussion (II)
• Real-time Delay
  • more Feedback Servers
  • log files at Feedback Servers
    • generate more frequently
    • compress before sending to Monitoring Platform
• Scalability
  • multiple Coordinators
  • more Feedback Servers
  • sampling on feedback messages
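Two of the measures listed above, sampling feedback messages and compressing log files before transfer, are easy to sketch with the Python standard library. The per-host hash sampling policy and the function names are assumptions for illustration; the slides do not state how sampling or compression was done.

```python
import gzip
import zlib


def keep_host(host_id: str, sample_percent: int) -> bool:
    """Deterministically keep about sample_percent % of End Hosts.

    Hashing the id (rather than drawing a random number per message) keeps
    either all or none of a given host's messages, so per-host records stay
    complete. This policy is an assumption, not taken from the slides.
    """
    return zlib.crc32(host_id.encode("utf-8")) % 100 < sample_percent


def compress_log(text: str) -> bytes:
    """Compress an aggregated log before sending it to the Monitoring Platform."""
    return gzip.compress(text.encode("utf-8"))


def decompress_log(blob: bytes) -> str:
    """Inverse of compress_log, as the Monitoring Platform would apply it."""
    return gzip.decompress(blob).decode("utf-8")
```

Sampling trades accuracy for load: statistics computed on the kept hosts are estimates, so totals such as the number of peers would have to be scaled by the sampling rate.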
Future Works
• Generalize for other Systems
• IP Geo-location
  • Map View & District View
  • IP -> Physical Address
  • wired IP Geo-location
Q&A