Department of Information Engineering
The Chinese University of Hong Kong

A Framework for Monitoring and Measuring a Large-Scale Distributed System in Real Time
By Lei ZHAN

Aug 16th, 2013

Outline

• Background
• Framework Design
• Case Study
• Demonstration
• Future Work

Background

• Internet Services on Distributed Infrastructure
  • Content Delivery Network
  • P2P Systems
  • Data Centers
  • Cloud Computing Services

• Monitoring Framework
  • guarantees reliable services and a high quality of user experience
  • monitors and manages the deployed systems

Objectives

• Accuracy
• Real-time
• Visualization
• Scalability

Framework Design

Components – End Hosts

• Refers to a peer in a P2P system, a processing unit in a cloud, a data center in a CDN, etc.
• Deployed in a large-scale, distributed manner
• Source of measurement data
  • unique id for each End Host
  • generates feedback messages periodically
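The End Host behaviour above can be sketched as follows. This is a minimal illustration rather than the system's actual message format: the JSON field names, the UUID-based id scheme, and the `read_metrics` and `send` callables are all assumptions made for the example.

```python
import json
import time
import uuid


def make_feedback_message(host_id, metrics, now=None):
    """Build one feedback message as a JSON string (field names are hypothetical)."""
    return json.dumps({
        "host_id": host_id,
        "timestamp": time.time() if now is None else now,
        "metrics": metrics,
    })


def feedback_loop(host_id, read_metrics, send, period_s, rounds):
    """Send `rounds` feedback messages, pausing `period_s` seconds after each one."""
    for _ in range(rounds):
        send(make_feedback_message(host_id, read_metrics()))
        time.sleep(period_s)


# Assumed id scheme: a random UUID generated once per End Host.
host_id = str(uuid.uuid4())
```

In a deployment, `send` would deliver the message to the Coordinator; here any callable works, which keeps the sketch testable without a network.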

Components – Coordinator

• Located between the End Hosts and the Feedback Servers
• Responsible for
  • collecting feedback messages from End Hosts
  • forwarding them to Feedback Servers

• Why a Coordinator?
  • single target for all End Hosts
  • makes the Feedback Servers more flexible
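One way to picture the Coordinator's role is the sketch below. The slides do not say how a Feedback Server is chosen for a message; hashing the End Host id is an assumption made here so that all messages from one host reach the same server. The `Coordinator` class and its methods are hypothetical, and plain lists stand in for network connections to Feedback Servers.

```python
import hashlib


class Coordinator:
    """Single entry point for End Hosts; forwards messages to Feedback Servers."""

    def __init__(self, servers):
        # Each "server" is a list standing in for a connection to a Feedback Server.
        self.servers = list(servers)

    def pick_server(self, host_id):
        """Map a host id to a server with a stable hash (assumed policy)."""
        digest = hashlib.sha256(host_id.encode("utf-8")).digest()
        index = int.from_bytes(digest[:8], "big") % len(self.servers)
        return self.servers[index]

    def forward(self, host_id, message):
        """Forward one feedback message and return the server that received it."""
        server = self.pick_server(host_id)
        server.append(message)  # stand-in for a network send
        return server
```

Because End Hosts only ever address the Coordinator, the list of servers can change without touching the End Hosts, which is the flexibility the slide refers to.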

Components – Feedback Server

• Located between the Coordinator and the Monitoring Platform
• Responsible for
  • aggregating feedback messages from the Coordinator
  • responding to data requests from the Monitoring Platform
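The two responsibilities above can be sketched as a small in-memory store. The record layout `(timestamp, host_id, metrics)` and the time-range request interface are assumptions for illustration; the slides do not specify either.

```python
class FeedbackServer:
    """Aggregates feedback records and answers time-range data requests."""

    def __init__(self):
        self.records = []  # list of (timestamp, host_id, metrics) tuples

    def receive(self, timestamp, host_id, metrics):
        """Store one feedback message forwarded by the Coordinator."""
        self.records.append((timestamp, host_id, metrics))

    def request(self, start, end):
        """Return records with start <= timestamp < end, ordered by timestamp."""
        selected = [r for r in self.records if start <= r[0] < end]
        return sorted(selected, key=lambda r: r[0])
```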

Components – Monitoring Platform

• Provides
  • measurement data processing and analysis
  • visualization views of data statistics for administrators

• Operates in
  • real-time mode: communicates with the Feedback Server
  • static mode: reads data from local log files
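The two operating modes can share one analysis path if both expose records the same way. The sketch below assumes log files hold one JSON record per line and that real-time data arrives through a polling callable named `fetch`; both are hypothetical choices, not details from the slides.

```python
import json


def read_static(log_file):
    """Static mode: yield records from a log file of JSON lines."""
    for line in log_file:
        line = line.strip()
        if line:
            yield json.loads(line)


def read_realtime(fetch, polls):
    """Real-time mode: call fetch() `polls` times and yield every record returned."""
    for _ in range(polls):
        for record in fetch():
            yield record


def count_hosts(records):
    """Example analysis step: number of distinct End Hosts in a record stream."""
    return len({r["host_id"] for r in records})
```

`count_hosts` works unchanged on either source, for example `count_hosts(read_static(open(path)))` for a hypothetical log path.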

Framework Design

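[Figure: framework architecture diagram. Arrow labels: Feedback Messages (End Hosts to Coordinator), Feedback Messages (Coordinator to Feedback Servers), Request / Aggregated Log Files (between Monitoring Platform and Feedback Servers).]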

Case Study

• 2012 London Olympic Games
  • live broadcast through the Internet within HK

• P2P Video Streaming System
  • developed by ASTRI*
  • adopted by i-Cable**

* The Hong Kong Applied Science and Technology Research Institute (ASTRI) was founded by the Government of Hong Kong SAR in 2000 with a mission to enhance Hong Kong’s competitiveness in technology-based industries through applied research.
** i-Cable is an Internet service provider in Hong Kong, and is now one of Hong Kong's leading integrated communications companies.

Real-time Monitoring

• Whole Period
  • 17 days (July 27th – Aug 12th)

• Key Metrics
  • system statistics
    • number of new peers, total number of peers
    • average peer upload rate, average peer download rate
    • average peer contribution ratio
  • system performance
    • peer startup delay, peer continuity
    • quality of experience

Monitoring Platform

• Playback in 2 Modes
• Visualization
  • 4 different views
    • Map View
    • District View
    • Histogram View
    • Timeline View
  • filtering & control
• More in the Demonstration

Measurement Results

Demonstration

• Monitoring Platform
  • operates in static mode
  • the data of Aug 2nd, 2012
  • 4 visualization views

Discussion (I)

• Measurement Result
  • window-based statistics
    • identify each End Host by its id
    • update records upon each new feedback message
    • consider the latest state inside the window as the current state
  • time-window moving average method for analysis
    • window size >= feedback message period
  • data synchronized at Feedback Servers
    • avoids synchronization problems among feedback messages
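The window-based statistics and the time-window moving average can be sketched as below. This is an illustration under stated assumptions, not the presented implementation: the `WindowStats` class, the `moving_average` function, and the choice of a half-open window `(now - window_s, now]` are all hypothetical. The sketch also shows why the window should be at least one feedback period long: with a shorter window, a host that is still alive could fall out of the window between two of its messages.

```python
from collections import deque


class WindowStats:
    """Keep the latest state per End Host and report hosts seen within a time window."""

    def __init__(self, window_s):
        self.window_s = window_s
        self.latest = {}  # host_id -> (timestamp, value)

    def update(self, host_id, timestamp, value):
        """Record a feedback message; only the newest state per host is kept."""
        prev = self.latest.get(host_id)
        if prev is None or timestamp >= prev[0]:
            self.latest[host_id] = (timestamp, value)

    def current(self, now):
        """Latest value of every host whose last message lies in (now - window_s, now]."""
        lo = now - self.window_s
        return {h: v for h, (t, v) in self.latest.items() if lo < t <= now}

    def average(self, now):
        """Mean of the current values, or None if no host is inside the window."""
        values = list(self.current(now).values())
        return sum(values) / len(values) if values else None


def moving_average(samples, window_s):
    """Trailing time-window mean over (timestamp, value) samples sorted by time."""
    out = []
    window = deque()
    total = 0.0
    for t, v in samples:
        window.append((t, v))
        total += v
        # Drop samples that are at least window_s older than the newest one.
        while window[0][0] <= t - window_s:
            _, old = window.popleft()
            total -= old
        out.append((t, total / len(window)))
    return out
```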

Discussion (II)

• Real-time Delay
  • more Feedback Servers
  • log files at Feedback Servers
    • generate more frequently
    • compress before sending to Monitoring Platform

• Scalability
  • multiple Coordinators
  • more Feedback Servers
  • sampling on feedback messages
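Two of the options above, sampling and log compression, can be sketched with the standard library. The slides do not say how sampling is done; keeping a fixed fraction of End Hosts by hashing their ids is an assumption chosen here so that a sampled host keeps its full message stream. The function names and the use of gzip are likewise illustrative.

```python
import gzip
import hashlib


def keep_host(host_id, rate):
    """Deterministically keep roughly a fraction `rate` (0.0 to 1.0) of hosts."""
    digest = hashlib.sha256(host_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big")  # uniform in [0, 2**64)
    return bucket < rate * 2**64


def compress_log(text):
    """Compress an aggregated log before sending it to the Monitoring Platform."""
    return gzip.compress(text.encode("utf-8"))


def decompress_log(blob):
    """Inverse of compress_log, as the Monitoring Platform side would apply it."""
    return gzip.decompress(blob).decode("utf-8")
```

Sampling trades accuracy for load, so statistics computed from a sample (for example, the total number of peers) would need to be scaled by `1 / rate`.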

Future Work

• Generalize for other Systems
• IP Geo-location
  • Map View & District View
  • IP -> Physical Address
  • wired IP Geo-location

Q&A
