A High-Fidelity Temperature Distribution Forecasting System for Data Centers. Guoliang Xing, Assistant Professor, Department of Computer Science and Engineering, Michigan State University. Cyber-Physical Systems.
A High-Fidelity Temperature Distribution Forecasting System for Data Centers
Guoliang Xing
Assistant Professor
Department of Computer Science and Engineering
Michigan State University
Cyber-Physical Systems
• “Cyber-physical systems are engineered systems that are built from and depend upon the synergy of computational and physical components”¹
• Many critical application domains
– Medical, auto, energy, transportation…
• #1 national priority for Networking and IT Research and Development (NITRD)
– NITRD review report by the President's Council of Advisors on Science and Technology (PCAST), “Leadership Under Challenge: Information Technology R&D in a Competitive World”, 2007
¹ NSF Cyber-Physical Systems solicitation 13-502
Our CPS Projects
• Data center thermal monitoring
• Real-time volcano monitoring
• Aquatic process profiling
Robotic fish, Smart Microsystems Lab, MSU
Tungurahua Volcano, Ecuador
Volcano Monitoring Sensors
Data Center Monitoring, HPCC, MSU
Harmful Algal Bloom in Lake Mendota, Wisconsin, 1999
Outline
• Data center thermal monitoring
– Background
– System design
– Testbed evaluation
• Real-time volcano monitoring
• Barcode streaming for smartphones
Motivation
• Data centers are critical computing infrastructure
– 509,147 data centers worldwide, 285 million sq. ft.¹
– 2.8M hours of downtime, 142 billion in direct losses per year¹
• 23% of server outages are heat-induced shutdowns
Figure: an aerial view of EMC's new data center in Durham, North Carolina²; an EMC data center²
¹ Emerson Network Power, State of the Data Centers 2011
² http://www.datacenterknowledge.com/archives/2011/09/15/emc-opens-new-cloud-data-center-in-nc/
Motivation
• Many data centers are overcooled
– Low AC set-points, high server fan speeds
– Excessive cooling energy
• Up to 50% or more of total power consumption
• Rapid increase of energy use in data centers
– From 2005 to 2010, electricity use in data centers grew 36% (US) and 56% (worldwide)¹
– An estimated 2% of the US electricity budget¹
¹ Jonathan G. Koomey, “Growth in Data Center Electricity Use 2005 to 2010”, Analytics Press, 2011.
Temperature Forecasting
• Predict server temperature evolution
– Identify potential hot spots
– Enable high CRAC set-points for energy saving
• Temperature at inlets/outlets indicates hot spots
Figure labels: inlets (cool air), outlets (hot air)
Requirements
• High-fidelity prediction
– 1 °C prediction error
– Long prediction horizons (e.g., 10 minutes)
– Coverage: normal conditions and emergencies (e.g., AC failures)
• Timeliness and low overhead
– Real-time online prediction
– Decoupled from the data center infrastructure
Challenges
• Complex air and thermal dynamics
• Highly dynamic workloads
• Physical failures – ACs, servers, fans
Row 1
Row 2
Raised-floor cold air
Server exhaust
12-day CPU utilization data of one rack (64 servers with 512 CPU cores) in High Performance Computer Center at Michigan State University
Related Work
• Data-driven prediction approach
– Collect in situ sensor data
– Construct prediction model (parameter learning)
• Regression, neural networks, etc.
– Real-time prediction
• Limitation
– Requires extensive training
• Rare but critical physical failures in data centers?
Related Work
• Computational Fluid Dynamics (CFD) modeling
– Spatially discretized geometry model
– Iteratively solve partial differential equations
• Limitation
– Inaccuracy, high computational complexity
System Architecture
• CFD + wireless sensing + data-driven prediction
– Preserve realistic physical characteristics in training data
– Capture dynamics by in situ sensing and real-time prediction
Diagram blocks: Data Center; Sensing (CPU, fan speed, temperature, airflow); Calibration; CFD Modeling with a geometric model (server/rack dimensions and placement); Real-time Prediction
Thermal Sensing
Figure labels: inlet/outlet temperature sensing; air velocity; CRAC temperature; CPU utilization; fan speed; temperature; airflow velocity; LAN
CFD Modeling & Calibration
Diagram blocks: Physical Geometry Model; CFD Modeling (steady/transient CFD); Polynomial Calibration
• Polynomial calibration: $y_i = \sum_{k=1}^{K} a_{i,k} \cdot x_i^{k}$
– $y_i$: sensor reading (training); calibrated temperature (runtime)
– $a_{i,k}$: calibration coefficients
– $x_i$: temperature from CFD
– $K$: calibration order
Figure labels: steady CFD with sensor data; transient CFD at t, t+3 min, t+6 min
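As a rough illustration of the polynomial calibration step, the sketch below fits the coefficients for one sensor by ordinary least squares and then applies them to a new CFD temperature. The function names (`solve`, `fit_calibration`, `calibrate`), the normal-equation solver, and the sample numbers are illustrative assumptions; the slides give only the polynomial form, not the fitting procedure.

```python
def solve(a, b):
    """Solve the square linear system a x = b (Gaussian elimination, partial pivoting)."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_calibration(cfd_temps, sensor_temps, order):
    """Least-squares fit of a_1..a_K in y = sum_k a_k * x**k for one sensor."""
    powers = range(1, order + 1)
    # Normal equations: sum_k a_k * sum(x**(j+k)) = sum(y * x**j) for each j.
    gram = [[sum(x ** (j + k) for x in cfd_temps) for k in powers] for j in powers]
    rhs = [sum(y * x ** j for x, y in zip(cfd_temps, sensor_temps)) for j in powers]
    return solve(gram, rhs)


def calibrate(coeffs, cfd_temp):
    """Apply the fitted polynomial to a raw CFD temperature."""
    return sum(a * cfd_temp ** k for k, a in enumerate(coeffs, start=1))


# Made-up training data: CFD output vs. what a sensor measured at the same spot.
cfd = [18.0, 20.0, 22.0, 24.0, 26.0, 28.0]
measured = [0.9 * x + 0.002 * x ** 2 for x in cfd]
coeffs = fit_calibration(cfd, measured, order=2)
print(coeffs, calibrate(coeffs, 25.0))
```

Solving the normal equations directly is adequate for a low calibration order like this; for higher orders a QR-based least-squares routine would be the more numerically robust choice.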
Real-time Prediction
• Thermal variable vector $\mathbf{X}(t)$
– t: server inlet/outlet temperature
– c: CRAC supply air temperature
– v: CRAC airflow
– u: CPU utilization
– s: server fan speed
– R: amount of historical data
• Prediction with a k-step horizon: $\hat{\mathbf{t}}(t+k) = \mathbf{A}_k \mathbf{X}(t)$
– $\mathbf{A}_k$: linear regression parameter matrix
• Least-squares-based training
Diagram blocks: Training; Linear Prediction Model; Prediction
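The k-step linear predictor can be sketched as follows, under simplifying assumptions that are not in the slides: the feature vector holds only the current temperature, the current CPU utilization, and a constant term (history length of one sample), only one row of the parameter matrix is fitted, and the trace is synthetic. The names `build_pairs`, `train`, and `predict` are hypothetical.

```python
import math


def solve(a, b):
    """Solve the square linear system a x = b (Gaussian elimination, partial pivoting)."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def build_pairs(temps, cpus, k):
    """Pair X(t) = [temp(t), cpu(t), 1] with the target temp(t + k)."""
    xs = [[temps[t], cpus[t], 1.0] for t in range(len(temps) - k)]
    ys = [temps[t + k] for t in range(len(temps) - k)]
    return xs, ys


def train(xs, ys):
    """Least-squares fit of one row of A_k via the normal equations."""
    d = len(xs[0])
    gram = [[sum(x[i] * x[j] for x in xs) for j in range(d)] for i in range(d)]
    rhs = [sum(x[i] * y for x, y in zip(xs, ys)) for i in range(d)]
    return solve(gram, rhs)


def predict(row, x):
    """Predicted temperature k steps ahead: the dot product of one A_k row with X(t)."""
    return sum(a * v for a, v in zip(row, x))


# Synthetic trace: temperature follows a known linear rule driven by CPU load.
cpus = [50.0 + 40.0 * math.sin(0.3 * t) for t in range(200)]
temps = [30.0]
for t in range(199):
    temps.append(0.9 * temps[t] + 0.05 * cpus[t] + 1.0)

xs, ys = build_pairs(temps, cpus, k=1)
a_row = train(xs, ys)
print(a_row)
print(predict(a_row, xs[-1]), ys[-1])
```

With a one-step horizon the synthetic rule is exactly linear in the features, so the fit recovers it; for longer horizons the target also depends on future CPU load, and the fit becomes an approximation whose error grows with the horizon.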
Single-rack Experiment
• Testbed configuration
– 30 temperature sensors
• TelosB, IRIS
– 2 airflow sensors
• AccuSense F333
– 15 servers
• Dell PowerEdge 850
• Western Scientific
• Controlled CPU utilization
Figure labels: temperature sensors, ceiling vent airflow sensor, insulation, airflow sensor, AC inlet
Experiment Results
• Multi-horizon prediction
• CFD-assisted prediction
Error increases with horizon
$\mathrm{RMSE}(\mathbf{t}_i) = \sqrt{\dfrac{\sum_{t=1}^{T} \left[\mathbf{t}_i(t) - \hat{\mathbf{t}}_i(t)\right]^2}{T}}$
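The RMSE metric used in the evaluation can be computed as in the short sketch below. The function name `rmse` and the sample readings are made up for illustration.

```python
import math


def rmse(actual, predicted):
    """Root-mean-square error between measured and predicted temperatures."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("need two non-empty sequences of equal length")
    squared = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    return math.sqrt(squared / len(actual))


# Made-up inlet temperatures (°C) and their predictions.
measured = [22.0, 22.5, 23.1, 23.8]
forecast = [22.2, 22.4, 23.5, 23.6]
print(rmse(measured, forecast))
```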
Production Data Center Experiment
• Testbed configuration
– 5 racks, 229 servers, 2016 cores
– 4 in-row CRAC units
– 35 temperature sensors
– 4 airflow sensors
• Dynamic CPU utilization
Figure labels: airflow sensor, temperature sensor, chained temperature sensor, in-row CRACs
Experiment Results
• Long-term experiment (12 days)
Figure panels: outlet, inlet
Outline
• Data center thermal monitoring
• Real-time volcano monitoring
– Background
– Quality-driven earthquake detection
– Deployment and evaluation
• Barcode streaming for smartphones
Volcano Hazards
• 7% of the world's population lives near active volcanoes
• 20-30 explosive eruptions per year
Eruption in Chile, 6/4/2011: $68 M instant damage, $2.4 B future relief (www.boston.com/bigpicture/2011/06/volcano_erupts_in_chile.html)
Eruptions in Iceland, 2010: a week-long airspace closure [Wikipedia]
Volcano Monitoring
• Traditional seismometer
– Expensive (~$10K), bulky, difficult to install; up to a dozen nodes for most active volcanoes
• Data collection and retrieval
– ~10G of data per month
• Processing
– Detection, timing, localization
– 4D tomography computation
• Real-time, 3D fluid dynamics of a volcano conduit system
– Extremely computation-intensive
VolcanoSRI Project
• Large-scale, long-term deployment
– Up to 500 nodes on an active volcano in Ecuador
– Sampling at 100 Hz, several-month lifetime
• Collaborative in-network processing
– Detection, timing, localization
– 4D tomography computation
Figure: the tentative deployment map in Ecuador (photo credit: Prof. Jonathan Lees)
Challenge 1: Spatial Diversity
• Complicated physical process
– Highly dynamic magnitude
– Dynamic source location
Figure: two earthquakes on Mt. St. Helens
Challenge 2: Frequency Diversity
• Responsive to P-wave within [1 Hz, 10 Hz]
• Frequency spectrum changes with signal magnitude
Signal energy: ×10000 in [1 Hz, 5 Hz]; ×100 in [5 Hz, 10 Hz]
Approach Overview
• Select sensors with the best signal quality
– FFT (computation-intensive)
• Local detection
• Decision fusion
Diagram labels: seismic sensor; FFT; sensor selection; local decisions (‘1’, ‘0’, ‘1’); decision fusion; system decision; avoid raw data transmission
Smartphone-based Node
Seismometer: Geospace Geophone, model GS-11D
Phone: LG GT540, Android 1.6
IOIO board
Amplifier
External GPS
GPS antenna
Field Deployment
• First deployment on Tungurahua, Ecuador
– Six nodes, one week, August 2012
Results
Figure annotations: 19 days; 3.9 months; 5% detection probability
Figure panels: signal collected by permanent seismometer; signal collected by our node
• Centralized processing
– Data collection with compression
• STA/LTA
– Heuristic seismic detection algorithm
• Weighted decision fusion
– No sensor selection
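STA/LTA is named above only as a heuristic detection algorithm used for comparison. For readers unfamiliar with it, the following is a minimal sketch of the classic idea: compare a short-term average of signal energy with a long-term average and trigger when the ratio exceeds a threshold. The window lengths, threshold, and test trace are arbitrary assumptions, not values from the deployment.

```python
def sta_lta(signal, n_sta, n_lta):
    """Per-sample STA/LTA ratio of signal energy (0.0 until the long window fills).

    Assumes n_sta <= n_lta; both windows end at the current sample.
    """
    energy = [s * s for s in signal]
    ratios = [0.0] * len(signal)
    for i in range(n_lta - 1, len(signal)):
        sta = sum(energy[i - n_sta + 1 : i + 1]) / n_sta
        lta = sum(energy[i - n_lta + 1 : i + 1]) / n_lta
        ratios[i] = sta / lta if lta > 0 else 0.0
    return ratios


def detect(signal, n_sta, n_lta, threshold):
    """Indices of samples whose STA/LTA ratio exceeds the threshold."""
    ratios = sta_lta(signal, n_sta, n_lta)
    return [i for i, r in enumerate(ratios) if r > threshold]


# Made-up trace: low-level noise, a five-sample burst starting at index 50, then noise.
trace = [0.1, -0.1] * 25 + [5.0] * 5 + [0.1] * 5
triggers = detect(trace, n_sta=5, n_lta=40, threshold=3.0)
print(triggers)
```

Recomputing both window sums at every sample keeps the sketch short; a streaming implementation would update running sums instead.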
Outline
• Data center thermal monitoring
• Real-time volcano monitoring
• Barcode streaming for smartphones
– Background
– Barcode streaming
– Implementation and evaluation
Barcode-based Communication
• Wireless payment
- Preserve security and privacy
• Advertisement
- Broadcast brochures, coupons and maps (e.g., retail stores, museums)
• Data exchange
- Transfer small pieces of info between smartphones (e.g., contacts, photos)
Figure: PayPal inStore app
Existing 2D Barcodes
• QR code [1]: low capacity (typically 50 chars)
• HCCB (High Capacity Color Barcode) [2]: high decoding overhead
• Not suitable for high-rate streaming
[1] ISO/IEC 18004:2006. Automatic identification and data capture techniques - QR code 2005 bar code symbology specification.
[2] D. Parikh and G. Jancke. Localization and segmentation of a 2D high capacity color barcode. In Applications of Computer Vision, 2008.
COBRA Barcode Design
• High capacity & fast decoding rate
• Smart frame
– Corner Tracker
– Timing Reference Blocks
• Code area
– Blocks with 4 orthogonal colors
Single barcode capacity up to 20 Kbits (4 inch)
Challenges
• Poor image quality
- Low-quality camera
- Small, low-resolution screen
- Relative movement
- Severe blur in captured images
• Perspective distortion
• Limited computation resources
- Need to capture and process up to 30 images per second
Figure labels: original barcode; typical received barcode image; distorted barcode image
Blur-aware Color Ordering
• Blur usually occurs along the border of blocks with different colors
• Goal: group blocks with the same color to reduce border length
Figure labels: typical barcode image captured by smartphone camera; color ordering
Implementation & Evaluation
• Implementation
- Android 2.3.3 Gingerbread
- Sender: 56 KB storage, 5 MB RAM, Nexus S (4-inch screen, 800x480)
- Receiver: 72 KB storage, 3.5~12 MB RAM, HTC Inspire (8 MP camera)
• 200 kbps throughput under various settings
- Block size
- View angle, alignment
- Screen refresh rate, camera resolution
- Mobility, distance
- Ambient lighting, screen brightness
Figure labels: Nexus S, HTC Inspire
Future Work
• Data center monitoring
– Workload scheduling, power optimization
• Volcano monitoring
– Signal processing: timing and localization
– System building: power management and programming interfaces
• Barcode streaming for smartphones
– Security of light channel and user authentication
Acknowledgement
• Group members
– Tian Hao (Ph.D., 2010-), Yu Wang (Ph.D., 2010-), Jun Huang (Ph.D., 2009-), Ruogu Zhou (Ph.D., 2009-), Dennis Philips (Ph.D., 2009-), Jinzhu Chen (Ph.D., 2010-), Mohammad-Mahdi Moazzami (Ph.D., 2011-), Fatme El-Moukaddem (Ph.D., co-supervised with Dr. Eric Torng), Rui Tan (Postdoc)
• National Science Foundation
– CDI, VolcanoSRI, 2011-2015 (in collaboration with WenZhan Song @ Georgia State University, Jonathan Lees @ University of North Carolina, Chapel Hill)
– CAREER, performance-critical sensor networks, PI, 2010-2015
– ECCS, aquatic sensor networks, PI, 2010-2013 (in collaboration with Xiaobo Tan @ MSU)
– CNS, real-time and performance control of networked sensor system, MSU PI, 2012-2015 (in collaboration with Xiaorui Wang @ Ohio State)
– CNS, interference in crowded spectrum, MSU PI, 2009-2012 (in collaboration with Gang Zhou @ William & Mary)
Representative Publications
• J. Chen, R. Tan, Y. Wang, G. Xing, X. Wang, X. Wang, B. Punch, D. Colbry, A High-Fidelity Temperature Distribution Forecasting System for Data Centers, The 33rd IEEE Real-Time Systems Symposium (RTSS), 2012, acceptance ratio: 35/157 = 22%
• R. Tan, G. Xing, J. Chen, W. Song, R. Huang, Quality-driven Volcanic Earthquake Detection using Wireless Sensor Networks, The 31st IEEE Real-Time Systems Symposium (RTSS), 2010.
• T. Hao, R. Zhou, G. Xing, COBRA: Color Barcode Streaming for Smartphone Systems, The 10th International Conference on Mobile Systems, Applications, and Services (MobiSys), 2012, acceptance ratio: 32/182 = 17.5%
• J. Huang, G. Xing, G. Zhou, R. Zhou, Beyond Co-existence: Exploiting WiFi White Space for ZigBee Performance Assurance, The 18th IEEE International Conference on Network Protocols (ICNP), 2010, acceptance ratio: 31/170 = 18.2%, Best Paper Award (1 out of 170 submissions).
• R. Zhou, Y. Xiong, G. Xing, L. Sun, J. Ma, ZiFi: Wireless LAN Discovery via ZigBee Interference Signatures, The 16th Annual International Conference on Mobile Computing and Networking (MobiCom), acceptance ratio: 33/233 = 14.2%.
• S. Liu, G. Xing, H. Zhang, J. Wang, J. Huang, M. Sha, L. Huang, Passive Interference Measurement in Wireless Sensor Networks, The 18th IEEE International Conference on Network Protocols (ICNP), acceptance ratio: 31/170 = 18.2%, Best Paper Candidate (6 out of 170 submissions).
• X. Xu, L. Gu, J. Wang, G. Xing, Negotiate Power and Performance in the Reality of RFID Systems, The 8th Annual IEEE International Conference on Pervasive Computing and Communications (PerCom), acceptance ratio: 27/227 = 12%, Best Paper Candidate (3 out of 227 submissions).
COBRA
Streaming barcodes between screen (sender) and camera (receiver)
Real-time visible light communication (VLC) system for off-the-shelf smartphones
- Encode info into color barcodes
- Stream barcodes from screen to camera
- High communication throughput (70~200 kbps for a 4-inch, 800x480 screen)
Quality-driven Earthquake Detection
• Assured false alarm rate & detection probability
• Real-time detection
– Temporal resolution: 1 s
• Long network lifetime
– Avoid raw data transmission
System Overview
Sender
• CODE GENERATION: encode data into barcodes and display on the screen
- Motion-aware coding
- Blur-aware color ordering
Receiver
• PRE-PROCESSING: select and enhance the received images
- Color enhancement
- Blur assessment
• CODE EXTRACTION: extract data from enhanced images
- Code scan
- Smart Frame detection
Decision Fusion at BS
• Extended majority rule
– Decide 1 if (# of positive local decisions) / (total # of sensors) > threshold
• Closed-form detection performance
– PF = f(PF1, PF2, …, PFN)
– PD = f(PD1, PD2, …, PDN)
– PFi / PDi: false alarm rate / detection probability of sensor i
Block Size
A small block size achieves higher throughput (>200 kbps) at the cost of a lower decoding rate (<80%).
A large block size achieves a higher decoding rate (>99.5%) at the cost of lower throughput (<150 kbps).
Measured on an 800x480 resolution screen
Putting It All Together
Diagram blocks: CFD Transient Modeling; Historical Sensor Data; Training; Prediction Models; Real-time Data Collection; Prediction
Diagram signals: $\mathbf{t}_1(t+k), \mathbf{X}(t)$; $\mathbf{t}_2(t+k)$; $\mathbf{X}(t)$; $\mathbf{X}'(t)$; $[\mathbf{t}_1(t+k); \mathbf{t}_2(t+k)]$
Conclusion and Future Work
• A visible light communication system for off-the-shelf smartphones
- New barcode design optimized for streaming
- Blur-aware color ordering
- Motion-aware barcode layout adaptation
- Transmission rate up to 200 kbps
• Future work
- Two-way communication via front-facing camera
- Leverage motion-level feedback from the receiver
Alignment and View Angle
• Alignment angle: misalignment between sender and receiver shrinks the color blocks
• View angle: an oblique view angle makes the color blocks non-uniform in size
Figure labels: sender, receiver
Motion-aware Coding
• Relative motion causes more blur
Figure labels: decoding bit error rate; real-time sender acceleration
Experiment Results
• Long-term experiment (12 days)
• Multiple prediction horizons
Figure panels: outlet, inlet
Conclusion
• Designed a temperature prediction system
• Novel integration of Computational Fluid Dynamics and real-time data-driven prediction
• Implemented on both a single-rack testbed and a production testbed
• High-fidelity prediction in the presence of highly dynamic server workloads
Backup Slides
Challenges
• Stringent performance requirements
– Real-time, high sensing fidelity
• Complex and dynamic physical processes
– Stochastic noise, unpredictable dynamics
• Constraints on power, bandwidth, sensor quality
• Our methodology: tightly couple physical modeling, sensing, and in-network processing
Sensor Selection for Decision Fusion
• Exclude sensors with low signal quality
– Avoid unnecessary FFT
• Configurable system detection performance
– Given local false alarm rates and detection probabilities {PFi, PDi | i = 1, …, N}, find a sensor subset S that minimizes |S| subject to the required system PF and PD
Sensor Selection Algorithm
• Select sensors every detection period
• Brute-force search: O(2^N)
– Long latency
• Derive PD, PF, PDi, PFi
• PD monotonically increases with Σi(PDi − PFi)
• Sort sensors by (PDi − PFi), include one by one
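The sort-and-include procedure can be sketched as a greedy search. This is an illustration under assumptions the slides do not state: system PD and PF are computed for a plain majority rule with independent sensors, the stopping condition is "PD at least `min_pd` and PF at most `max_pf`", and all names and numbers are invented.

```python
def system_probability(local_probs, threshold=0.5):
    """P(fraction of positive votes > threshold), assuming independent sensors."""
    dist = [1.0]  # dist[m] = P(exactly m positive votes so far)
    for p in local_probs:
        nxt = [0.0] * (len(dist) + 1)
        for m, q in enumerate(dist):
            nxt[m] += q * (1.0 - p)
            nxt[m + 1] += q * p
        dist = nxt
    n = len(local_probs)
    return sum(q for m, q in enumerate(dist) if m / n > threshold)


def select_sensors(sensors, min_pd, max_pf, threshold=0.5):
    """Greedy selection over sensors given as (PD_i, PF_i) pairs.

    Sensors are ranked by PD_i - PF_i and added one at a time; the first prefix
    that meets both requirements is returned as a list of indices, else None.
    """
    order = sorted(range(len(sensors)),
                   key=lambda i: sensors[i][0] - sensors[i][1],
                   reverse=True)
    for size in range(1, len(order) + 1):
        chosen = order[:size]
        pd = system_probability([sensors[i][0] for i in chosen], threshold)
        pf = system_probability([sensors[i][1] for i in chosen], threshold)
        if pd >= min_pd and pf <= max_pf:
            return chosen
    return None


# Made-up local performance figures (PD_i, PF_i) for five sensors.
sensors = [(0.6, 0.4), (0.95, 0.05), (0.55, 0.5), (0.9, 0.1), (0.85, 0.1)]
print(select_sensors(sensors, min_pd=0.97, max_pf=0.05))
```

The search evaluates at most N prefixes instead of the 2^N subsets a brute-force search would visit, which is the latency argument made in the bullets above.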
Basic Idea of Color Ordering
• Original sequence (block labels in the figure: 1 2 3 1 2 3 1 2 3 1 2): border length = 9
• Divide the sequence into groups of equal size (= 3)
• Arrange together the color blocks with the same index in all groups
• New sequence (group size = 3): border length = 5
• Choose the group size that minimizes the border length
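The reordering idea can be sketched in one dimension as follows. The color sequence is invented for illustration (the colors in the slide's figure are not recoverable from the text, so the border lengths here differ from the 9 and 5 quoted above), and the function names are hypothetical.

```python
def border_length(seq):
    """Number of adjacent block pairs whose colors differ."""
    return sum(1 for a, b in zip(seq, seq[1:]) if a != b)


def reorder(seq, group_size):
    """Split seq into groups of group_size, then place together the blocks
    that share the same index within their group."""
    return [seq[i]
            for start in range(group_size)
            for i in range(start, len(seq), group_size)]


def best_group_size(seq):
    """Group size whose reordering gives the shortest border (smallest size on ties)."""
    return min(range(1, len(seq) + 1),
               key=lambda g: border_length(reorder(seq, g)))


# Made-up sequence of block colors: R(ed), G(reen), B(lue).
colors = list("RGBRGBRGBRG")
print(border_length(colors))
print("".join(reorder(colors, 3)), border_length(reorder(colors, 3)))
print(best_group_size(colors))
```

The receiver would need to know the chosen group size to undo the permutation; how that is signaled is not covered by this sketch.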