A High-Fidelity Temperature Distribution Forecasting System for Data Centers
Guoliang Xing
Assistant Professor
Department of Computer Science and Engineering
Michigan State University
Slide 2
Cyber-Physical Systems
- Cyber-physical systems are engineered systems that are built from, and depend upon, the synergy of computational and physical components [1]
- Many critical application domains: medical, automotive, energy, transportation
- #1 national priority for Networking and IT Research and Development (NITRD)
  - NITRD review report by the President's Council of Advisors on Science and Technology (PCAST), titled "Leadership Under Challenge: Information Technology R&D in a Competitive World", 2007
[1] NSF Cyber-Physical Systems solicitation 13502
Slide 3
Our CPS Projects
- Data center thermal monitoring
- Real-time volcano monitoring
- Aquatic process profiling
Photo captions: robotic fish, Smart Microsystems Lab, MSU; Tungurahua Volcano, Ecuador; volcano monitoring sensors; data center monitoring, HPCC, MSU; harmful algal bloom in Lake Mendota, Wisconsin, 1999
Slide 4
Outline
- Data center thermal monitoring
  - Background
  - System design
  - Testbed evaluation
- Real-time volcano monitoring
- Barcode streaming for smartphones
Slide 5
Motivation
- Data centers are critical computing infrastructure
  - 509,147 data centers worldwide, 285 million sq. ft. [1]
  - 2.8M hours of downtime and 142 billion in direct losses per year [1]
  - 23% of server outages are heat-induced shutdowns
Photo captions: an aerial view of EMC's new data center in Durham, North Carolina [2]; an EMC data center [2]
[1] Emerson Network Power, State of the Data Centers 2011.
[2] http://www.datacenterknowledge.com/archives/2011/09/15/emc-opens-new-cloud-data-center-in-nc/
Slide 6
Motivation
- Many data centers are overcooled
  - Low AC set-points, high server fan speeds
  - Excessive cooling energy: up to 50% or more of total power consumption
- Rapid increase of energy use in data centers
  - From 2005 to 2010, electricity use in data centers grew 36% (US) and 56% (worldwide) [1]
  - An estimated 2% of total US electricity use [1]
[1] Jonathan G. Koomey, Growth in Data Center Electricity Use 2005 to 2010, Analytics Press, 2011.
Slide 7
Temperature Forecasting
- Predict server temperature evolution
  - Identify potential hot spots
  - Enable higher CRAC (computer room air conditioning) set-points for energy saving
- Temperature at server inlets/outlets indicates hot spots
Figure labels: inlets (cool air), outlets (hot air)
Slide 8
Requirements
- High-fidelity prediction
  - Prediction error within 1 °C
  - Long prediction horizons (e.g., 10 minutes)
  - Coverage: normal conditions and emergencies (e.g., AC failures)
- Timeliness and low overhead
  - Real-time online prediction
  - Decoupled from the data center infrastructure
Slide 9
Challenges
- Complex air and thermal dynamics
- Highly dynamic workloads
- Physical failures: ACs, servers, fans
Figure labels: Row 1, Row 2, raised-floor cold air, server exhaust
Figure: 12-day CPU utilization data of one rack (64 servers with 512 CPU cores) in the High Performance Computing Center at Michigan State University
Slide 10
Related Work
- Data-driven prediction approach
  - Collect in situ sensor data
  - Construct a prediction model (parameter learning): regression, neural networks, etc.
  - Real-time prediction
- Limitations
  - Requires extensive training
  - Can the training data cover rare but critical physical failures in data centers?
Slide 11
Related Work
- Computational Fluid Dynamics (CFD) modeling
  - Spatially discretized geometry model
  - Iteratively solve partial differential equations
- Limitations: inaccuracy, high computational complexity
Slide 12
System Architecture
- CFD + wireless sensing + data-driven prediction
  - Preserve realistic physical characteristics in training data
  - Capture dynamics by in situ sensing and real-time prediction
Architecture diagram: data center; sensing (CPU, fan speed, temperature, airflow); CFD modeling with a geometric model (server/rack dimensions and placement); calibration; real-time prediction
Slide 13
Thermal Sensing
- Inlet/outlet temperature sensing
- Air velocity sensing
Diagram labels: CRAC temperature, CPU utilization, fan speed, temperature, airflow velocity, LAN
Slide 14
CFD Modeling & Calibration
Architecture diagram: data center; sensing (CPU, fan speed, temperature, airflow); CFD modeling; calibration; real-time prediction
Slide 15
CFD Modeling & Calibration
- Polynomial calibration
- Diagram: physical geometry model → CFD modeling (steady/transient); steady CFD + sensor data → calibration coefficients
- Plot labels: temperature from CFD, calibration order
  - Training: sensor reading
  - Runtime: calibrated temperature
- Transient CFD: snapshots at t, t+3 min, t+6 min
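The polynomial calibration can be read as a least-squares fit between CFD-predicted temperatures and sensor readings. The sketch below is a minimal, pure-Python illustration under stated assumptions: one scalar CFD temperature per sensor location, a user-chosen calibration order, and hypothetical function names (`fit_calibration`, `calibrate`). It is not the system's actual implementation.

```python
from typing import List, Sequence


def solve_linear_system(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented matrix [a | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system: need more distinct CFD temperatures")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_calibration(cfd_temps: Sequence[float], sensor_temps: Sequence[float],
                    order: int) -> List[float]:
    """Hypothetical training step: least-squares coefficients c such that
    sum_k c[k] * T_cfd**k approximates the sensor reading."""
    k = order + 1
    # Normal equations (X^T X) c = X^T y with X[i][j] = T_cfd[i] ** j.
    xtx = [[sum(t ** (i + j) for t in cfd_temps) for j in range(k)] for i in range(k)]
    xty = [sum(t ** i * y for t, y in zip(cfd_temps, sensor_temps)) for i in range(k)]
    return solve_linear_system(xtx, xty)


def calibrate(coeffs: Sequence[float], cfd_temp: float) -> float:
    """Hypothetical runtime step: map one CFD-predicted temperature to a calibrated value."""
    return sum(c * cfd_temp ** i for i, c in enumerate(coeffs))


# Example: CFD output that runs 2 degrees C below the sensors.
coeffs = fit_calibration([20.0, 22.0, 24.0, 26.0], [22.0, 24.0, 26.0, 28.0], order=1)
calibrated = calibrate(coeffs, 25.0)
```

In this reading, `fit_calibration` would run once on paired steady-state CFD outputs and sensor readings, and `calibrate` would then be applied to each transient CFD snapshot (t, t+3 min, t+6 min). For higher calibration orders, centering the CFD temperatures before fitting helps numerical conditioning.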
Slide 16
Real-time Prediction
Architecture diagram: data center; sensing (CPU, fan speed, temperature, airflow); CFD modeling; calibration; real-time prediction
Slide 17
Real-time Prediction
Diagram: training → linear prediction model → prediction
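The slide only names a linear prediction model. One plausible reading, sketched below as an assumption, is a least-squares model that maps the current feature vector (for example temperature and CPU utilization) to the temperature one prediction horizon later. The feature choice, the use of ordinary least squares, and the helper names (`make_training_pairs`, `train_linear_predictor`, `predict`) are all hypothetical.

```python
from typing import List, Sequence, Tuple


def solve_linear_system(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system: features collinear or too few samples")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def make_training_pairs(features: Sequence[Sequence[float]], temps: Sequence[float],
                        horizon: int) -> Tuple[List[List[float]], List[float]]:
    """Hypothetical helper: pair the features at step t with the temperature at t + horizon."""
    n = len(features) - horizon
    return [list(f) for f in features[:n]], list(temps[horizon:horizon + n])


def train_linear_predictor(xs: Sequence[Sequence[float]], ys: Sequence[float]) -> List[float]:
    """Hypothetical helper: weights w with bias w[0], y ~= w[0] + sum_j w[j+1] * x[j]."""
    rows = [[1.0] + list(x) for x in xs]  # prepend a bias column
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(k)]
    return solve_linear_system(xtx, xty)


def predict(weights: Sequence[float], x: Sequence[float]) -> float:
    """Apply the trained linear model to one feature vector."""
    return weights[0] + sum(w * v for w, v in zip(weights[1:], x))
```

Because the model is linear, runtime prediction costs only one dot product per sensor location, which fits the real-time, low-overhead requirement.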
Slide 18
Single-rack Experiment
- Testbed configuration
  - 30 temperature sensors (TelosB, Iris)
  - 2 airflow sensors (AccuSense F333)
  - 15 servers (Dell PowerEdge 850, Western Scientific)
  - Controlled CPU utilization
Photo labels: temperature sensor, ceiling vent, airflow sensor, insulation, AC inlet
Slide 20
Production Data Center Experiment
- Testbed configuration
  - 5 racks, 229 servers, 2016 cores
  - 4 in-row CRAC units
  - 35 temperature sensors
  - 4 airflow sensors
  - Dynamic CPU utilization
Photo labels: airflow sensor, temperature sensor, chained temperature sensors, in-row CRACs
Slide 22
Outline
- Data center thermal monitoring
- Real-time volcano monitoring
  - Background
  - Quality-driven earthquake detection
  - Deployment and evaluation
- Barcode streaming for smartphones
Slide 23
Volcano Hazards
- 7% of the world population lives near active volcanoes
- 20-30 explosive eruptions per year
- Eruption in Chile, June 4, 2011: $68M instant damage, $2.4B future relief (www.boston.com/bigpicture/2011/06/volcano_erupts_in_chile.html)
- Eruptions in Iceland, 2010: a week-long airspace closure [Wikipedia]
Slide 24
Volcano Monitoring
- Traditional seismometers
  - Expensive (~$10K), bulky, difficult to install
  - Only up to a dozen nodes, even on the most active volcanoes
- Data collection and retrieval: ~10 GB of data in a month
- Processing: detection, timing, localization
- 4D tomography computation
  - Real-time 3D fluid dynamics of a volcano conduit system
  - Extremely computation-intensive
Slide 25
VolcanoSRI Project
- Large-scale, long-term deployment
  - Up to 500 nodes on an active volcano in Ecuador
  - Sampling at 100 Hz, lifetime of several months
- Collaborative in-network processing
  - Detection, timing, localization
  - 4D tomography computation
Figure: the tentative deployment map in Ecuador (photo credits: Prof. Jonathan Lees)
Slide 26
Challenge 1: Spatial Diversity
- Complicated physical process
  - Highly dynamic magnitude
  - Dynamic source location
Figure: two earthquakes on Mt. St. Helens
Slide 27
Challenge 2: Frequency Diversity
- Responsive to P-waves within [1 Hz, 10 Hz]
- The frequency spectrum changes with signal magnitude
Figure labels: [1 Hz, 5 Hz] and [5 Hz, 10 Hz] bands; signal energy ×10000 and ×100
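Because the spectrum of a P-wave shifts with magnitude, a detector can compare signal energy in the two bands above. The sketch below computes the energy of one window of samples within a frequency band. It assumes the 100 Hz sampling rate mentioned for VolcanoSRI nodes, uses half-open bands so that 5 Hz is not counted twice, and uses a naive DFT for readability where a real node would use an FFT. `band_energy` is a hypothetical name.

```python
import cmath
import math
from typing import Sequence


def band_energy(samples: Sequence[float], fs: float, f_lo: float, f_hi: float) -> float:
    """Sum of squared DFT magnitudes over bins with f_lo <= freq < f_hi.

    Only non-negative frequency bins are used (one-sided spectrum). The naive
    O(N^2) DFT stands in for the FFT a deployed node would use.
    """
    n = len(samples)
    energy = 0.0
    for k in range(n // 2 + 1):
        freq = k * fs / n  # center frequency of bin k in Hz
        if f_lo <= freq < f_hi:
            coef = sum(samples[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            energy += abs(coef) ** 2
    return energy


# Example: a 1-second window at 100 Hz containing a pure 3 Hz tone.
window = [math.sin(2 * math.pi * 3 * t / 100) for t in range(100)]
low_band = band_energy(window, 100.0, 1.0, 5.0)
high_band = band_energy(window, 100.0, 5.0, 10.0)
```

The 3 Hz tone lands entirely in the [1 Hz, 5 Hz) band, so `low_band` is large while `high_band` is essentially zero.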
Slide 28
Approach Overview
- Select sensors with the best signal qualities
- FFT (computation-intensive)
- Local detection
- Decision fusion
Diagram labels: seismic sensor, FFT, sensor selection, local decisions (1, 0, 1), decision fusion, system decision; avoid raw data transmission
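A minimal sketch of this data flow, under simplifying assumptions: each sensor has a scalar signal-quality score (for example, band energy), sensor selection keeps the k highest-scoring sensors, each selected sensor sends one bit instead of raw samples, and the base station decides 1 when the fraction of positive local decisions exceeds a threshold. The top-k rule and the function names are hypothetical; the actual quality-driven selection may be more involved.

```python
from typing import Dict, List


def select_sensors(quality: Dict[str, float], k: int) -> List[str]:
    """Hypothetical top-k selection by signal-quality score (ties broken by sensor id)."""
    return sorted(quality, key=lambda s: (-quality[s], s))[:k]


def local_decisions(stats: Dict[str, float], selected: List[str],
                    threshold: float) -> Dict[str, int]:
    """Each selected sensor reports one bit: 1 if its local statistic exceeds the threshold."""
    return {s: int(stats[s] > threshold) for s in selected}


def fuse(decisions: Dict[str, int], ratio: float) -> int:
    """Base station: decide 1 if the fraction of positive local decisions exceeds `ratio`."""
    return int(sum(decisions.values()) / len(decisions) > ratio)


# Example with four hypothetical nodes; only one bit per selected node crosses the network.
quality = {"n1": 0.9, "n2": 0.2, "n3": 0.7, "n4": 0.8}
chosen = select_sensors(quality, k=3)
bits = local_decisions({"n1": 5.0, "n2": 9.0, "n3": 0.5, "n4": 3.0}, chosen, threshold=1.0)
system_decision = fuse(bits, ratio=0.5)
```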
Slide 29
Smartphone-based Node
- Seismometer: Geospace geophone, model GS-11D
- LG GT540 smartphone (Android 1.6)
- IOIO board
- Amplifier
- External GPS with GPS antenna
Slide 30
Field Deployment
- First deployment on Tungurahua, Ecuador
- Six nodes, one week, August 2012
Slide 31
Results
- Signal collected by a permanent seismometer vs. signal collected by our node
- Compared approaches: centralized processing; data collection with compression; STA/LTA; heuristic seismic detection algorithm; weighted decision fusion; no sensor selection
Figure annotations: 19 days, 3.9 months, 5% detection probability
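STA/LTA, one of the compared approaches, is a classic seismic trigger: it compares a short-term average (STA) of signal energy with a long-term average (LTA) and declares an event when their ratio exceeds a threshold. The sketch below is a generic trailing-window version for illustration; the window lengths, the use of squared amplitude, and the names `sta_lta` and `triggers` are assumptions, not the configuration used in the evaluation.

```python
from typing import List, Sequence


def sta_lta(samples: Sequence[float], n_sta: int, n_lta: int) -> List[float]:
    """STA/LTA ratio of signal energy over trailing windows ending at each sample.

    Entries before a full long-term window is available are set to 0.0.
    """
    prefix = [0.0]  # prefix[i] = sum of squared samples[0:i], for O(1) window sums
    for x in samples:
        prefix.append(prefix[-1] + x * x)
    ratios = []
    for i in range(len(samples)):
        end = i + 1
        if end < n_lta:
            ratios.append(0.0)
            continue
        sta = (prefix[end] - prefix[end - n_sta]) / n_sta
        lta = (prefix[end] - prefix[end - n_lta]) / n_lta
        ratios.append(sta / lta if lta > 0 else 0.0)
    return ratios


def triggers(ratios: Sequence[float], threshold: float) -> List[int]:
    """Indices where the STA/LTA ratio exceeds the trigger threshold."""
    return [i for i, r in enumerate(ratios) if r > threshold]


# Example: low background noise followed by a short burst of strong shaking.
trace = [0.1] * 50 + [1.0] * 5
ratios = sta_lta(trace, n_sta=5, n_lta=50)
events = triggers(ratios, threshold=3.0)
```

Unlike the approach on these slides, this baseline looks only at raw amplitude and does not exploit the frequency content of the signal.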
Slide 32
Outline
- Data center thermal monitoring
- Real-time volcano monitoring
- Barcode streaming for smartphones
  - Background
  - Barcode streaming
  - Implementation and evaluation
Slide 33
- Wireless payment: preserve security and privacy
- Advertisement: broadcast brochures, coupons, and maps (e.g., retail stores, museums)
- Data exchange: transfer small pieces of information between smartphones (e.g., contacts, photos)
Screenshot: PayPal inStore App
Slide 34
- QR code [1]: low capacity (typically 50 chars)
- HCCB (High Capacity Color Barcode) [2]: high decoding overhead
- Not suitable for high-rate streaming
[1] ISO/IEC 18004:2006. Automatic identification and data capture techniques - QR Code 2005 bar code symbology specification.
[2] D. Parikh and G. Jancke. Localization and segmentation of a 2D high capacity color barcode. In Applications of Computer Vision, 2008.
Slide 35
- High capacity and fast decoding rate
- Smart frame: corner trackers, timing reference blocks, code area
- Blocks with 4 orthogonal colors
- Single barcode capacity up to 20 Kbits (4-inch screen)
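With 4 colors, each block carries 2 bits, so a 20-Kbit barcode needs roughly 10,000 data blocks. The sketch below shows only this symbol mapping. The palette names are hypothetical placeholders (the slide says only "4 orthogonal colors"), and smart-frame elements such as corner trackers and timing reference blocks are not modeled.

```python
from typing import List

# Hypothetical palette: the actual 4 orthogonal colors are not named on the slide.
PALETTE = ["black", "red", "green", "blue"]


def bytes_to_colors(data: bytes) -> List[str]:
    """Split each byte into four 2-bit symbols (most significant first) and map to colors."""
    symbols = []
    for byte in data:
        for shift in (6, 4, 2, 0):
            symbols.append(PALETTE[(byte >> shift) & 0b11])
    return symbols


def colors_to_bytes(symbols: List[str]) -> bytes:
    """Inverse mapping: every four color blocks rebuild one byte."""
    if len(symbols) % 4:
        raise ValueError("symbol count must be a multiple of 4")
    index = {color: i for i, color in enumerate(PALETTE)}
    out = bytearray()
    for i in range(0, len(symbols), 4):
        byte = 0
        for color in symbols[i:i + 4]:
            byte = (byte << 2) | index[color]
        out.append(byte)
    return bytes(out)


def blocks_needed(capacity_bits: int, bits_per_block: int = 2) -> int:
    """Number of data blocks for a given payload (ceiling division)."""
    return -(-capacity_bits // bits_per_block)
```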
Slide 36
Typical received barcode image: original barcode vs. distorted barcode image
- Poor image quality
  - Low-quality camera
  - Small, low-resolution screen
  - Relative movement
  - Perspective distortion
  - Severe blur in captured images
- Limited computation resources
  - Need to capture and process up to 30 images per second
Slide 37
- Blur usually occurs along the borders of blocks with different colors
Figure: typical barcode image captured by a smartphone camera
Slide 38
Color ordering
- Goal: group blocks with the same color to reduce border length
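Since blur concentrates along borders between differently colored blocks, a block layout can be scored by counting those borders. The sketch below computes that metric on a grid of color indices. It does not reproduce the color-ordering algorithm itself, and `border_length` is a hypothetical name.

```python
from typing import Sequence


def border_length(grid: Sequence[Sequence[int]]) -> int:
    """Count unit edges between horizontally or vertically adjacent blocks of different colors."""
    rows, cols = len(grid), len(grid[0])
    count = 0
    for r in range(rows):
        for c in range(cols):
            if c + 1 < cols and grid[r][c] != grid[r][c + 1]:
                count += 1  # border with the right neighbor
            if r + 1 < rows and grid[r][c] != grid[r + 1][c]:
                count += 1  # border with the neighbor below
    return count


# Same number of blocks of each color, two different layouts.
checkerboard = [[(r + c) % 2 for c in range(4)] for r in range(4)]
grouped = [[0, 0, 1, 1] for _ in range(4)]
```

Both layouts contain eight blocks of each color, but grouping same-colored blocks shrinks the border length from 24 to 4, which is the effect the ordering goal aims for.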
Slide 39
Implementation
- Android 2.3.3 (Gingerbread)
- Sender: 56 KB storage, 5 MB RAM, Nexus S (4-inch screen, 800x480)
- Receiver: 72 KB storage, 3.5-12 MB RAM, HTC Inspire (8 MP camera)
- Up to 200 kbps throughput, evaluated under various settings:
  - Block size
  - View angle, alignment
  - Screen refresh rate, camera resolution
  - Mobility, distance
  - Ambient lighting, screen brightness
Photo labels: Nexus S, HTC Inspire
Slide 40
Future Work
- Data center monitoring: workload scheduling, power optimization
- Volcano monitoring
  - Signal processing: timing and localization
  - System building: power management and programming interfaces
- Barcode streaming for smartphones: security of the light channel and user authentication
Slide 41
Acknowledgement
- Group members: Tian Hao (Ph.D., 2010-), Yu Wang (Ph.D., 2010-), Jun Huang (Ph.D., 2009-), Ruogu Zhou (Ph.D., 2009-), Dennis Philips (Ph.D., 2009-), Jinzhu Chen (Ph.D., 2010-), Mohammad-Mahdi Moazzami (Ph.D., 2011-), Fatme El-Moukaddem (Ph.D., co-supervised with Dr. Eric Torng), Rui Tan (Postdoc)
- National Science Foundation
  - CDI, VolcanoSRI, 2011-2015 (in collaboration with WenZhan Song @ Georgia State University and Jonathan Lees @ University of North Carolina, Chapel Hill)
  - CAREER, performance-critical sensor networks, PI, 2010-2015
  - ECCS, aquatic sensor networks, PI, 2010-2013 (in collaboration with Xiaobo Tan @ MSU)
  - CNS, real-time and performance control of networked sensor systems, MSU PI, 2012-2015 (in collaboration with Xiaorui Wang @ Ohio State)
  - CNS, interference in crowded spectrum, MSU PI, 2009-2012 (in collaboration with Gang Zhou @ William & Mary)
Slide 42
Representative Publications
- J. Chen, R. Tan, Y. Wang, G. Xing, X. Wang, X. Wang, B. Punch, D. Colbry, "A High-Fidelity Temperature Distribution Forecasting System for Data Centers," The 33rd IEEE Real-Time Systems Symposium (RTSS), 2012, acceptance ratio: 35/157 = 22%.
- R. Tan, G. Xing, J. Chen, W. Song, R. Huang, "Quality-driven Volcanic Earthquake Detection using Wireless Sensor Networks," The 31st IEEE Real-Time Systems Symposium (RTSS), 2010.
- T. Hao, R. Zhou, G. Xing, "COBRA: Color Barcode Streaming for Smartphone Systems," The 10th International Conference on Mobile Systems, Applications, and Services (MobiSys), 2012, acceptance ratio: 32/182 = 17.5%.
- J. Huang, G. Xing, G. Zhou, R. Zhou, "Beyond Co-existence: Exploiting WiFi White Space for ZigBee Performance Assurance," The 18th IEEE International Conference on Network Protocols (ICNP), 2010, acceptance ratio: 31/170 = 18.2%, Best Paper Award (1 out of 170 submissions).
- R. Zhou, Y. Xiong, G. Xing, L. Sun, J. Ma, "ZiFi: Wireless LAN Discovery via ZigBee Interference Signatures," The 16th Annual International Conference on Mobile Computing and Networking (MobiCom), acceptance ratio: 33/233 = 14.2%.
- S. Liu, G. Xing, H. Zhang, J. Wang, J. Huang, M. Sha, L. Huang, "Passive Interference Measurement in Wireless Sensor Networks," The 18th IEEE International Conference on Network Protocols (ICNP), acceptance ratio: 31/170 = 18.2%, Best Paper Candidate (6 out of 170 submissions).
- X. Xu, L. Gu, J. Wang, G. Xing, "Negotiate Power and Performance in the Reality of RFID Systems," The 8th Annual IEEE International Conference on Pervasive Computing and Communications (PerCom), acceptance ratio: 27/227 = 12%, Best Paper Candidate (3 out of 227 submissions).
Slide 43
Streaming barcodes between screen and camera (figure labels: sender, receiver)
- Real-time visible light communication (VLC) system for off-the-shelf smartphones
  - Encode information into color barcodes
  - Stream barcodes from screen to camera
  - High communication throughput (70~200 kbps for a 4-inch, 800x480 screen)
Slide 44
Quality-driven Earthquake Detection
- Assured false alarm rate and detection probability
- Real-time detection: temporal resolution of 1 s
- Long network lifetime: avoid raw data transmission
Slide 45
Sender, CODE GENERATION: encode data into barcodes and display them on the screen
Pipeline diagram: sender CODE GENERATION (motion-aware coding, blur-aware color ordering); receiver PRE-PROCESSING (color enhancement, blur assessment) and CODE EXTRACTION (smart frame detection, code scan)
Slide 46
Receiver, PRE-PROCESSING: select and enhance the received images
Pipeline diagram: sender CODE GENERATION (motion-aware coding, blur-aware color ordering); receiver PRE-PROCESSING (color enhancement, blur assessment) and CODE EXTRACTION (smart frame detection, code scan)
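Blur assessment lets the receiver pick usable frames before code extraction. The slides do not say which blur metric is used, so the sketch below swaps in a simple, commonly used proxy: the mean squared intensity difference between neighboring pixels of a grayscale image, given as nested lists. The metric and the names `sharpness` and `pick_sharpest` are assumptions.

```python
from typing import List, Sequence

Image = Sequence[Sequence[float]]  # grayscale intensities, row-major


def sharpness(img: Image) -> float:
    """Mean squared difference between horizontal and vertical neighbors.

    Blur smooths intensity transitions, so blurrier frames score lower.
    """
    rows, cols = len(img), len(img[0])
    total, pairs = 0.0, 0
    for r in range(rows):
        for c in range(cols):
            if c + 1 < cols:
                total += (img[r][c + 1] - img[r][c]) ** 2
                pairs += 1
            if r + 1 < rows:
                total += (img[r + 1][c] - img[r][c]) ** 2
                pairs += 1
    return total / pairs if pairs else 0.0


def pick_sharpest(frames: List[Image]) -> int:
    """Index of the frame with the highest sharpness score."""
    return max(range(len(frames)), key=lambda i: sharpness(frames[i]))


# A hard color-block edge vs. the same edge smeared across the row.
sharp = [[0, 0, 255, 255], [0, 0, 255, 255]]
blurred = [[0, 85, 170, 255], [0, 85, 170, 255]]
```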
Slide 47
Receiver, CODE EXTRACTION: extract data from the enhanced images
Pipeline diagram: sender CODE GENERATION (motion-aware coding, blur-aware color ordering); receiver PRE-PROCESSING (color enhancement, blur assessment) and CODE EXTRACTION (smart frame detection, code scan)
Slide 48
Decision Fusion at the Base Station (BS)
- Extended majority rule: decide 1 if
  $$\frac{\#\ \text{of positive local decisions}}{\text{total}\ \#\ \text{of sensors}} > \text{threshold}$$
- Closed-form detection performance:
  $$P_F = f(P_{F1}, P_{F2}, \ldots, P_{FN}), \qquad P_D = f(P_{D1}, P_{D2}, \ldots, P_{DN})$$
  where $P_{Fi}$ / $P_{Di}$ are the false alarm rate / detection probability of sensor $i$
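Assuming local decisions are conditionally independent, the function f can be evaluated exactly: the number of positive decisions follows a Poisson binomial distribution, and the system probability is its tail above threshold × N. Passing the per-sensor false alarm rates yields P_F, and passing the per-sensor detection probabilities yields P_D. This dynamic-programming sketch is one way to compute f under that assumption and may differ from the closed form derived in the paper; the function names are hypothetical.

```python
from typing import List, Sequence


def count_distribution(probs: Sequence[float]) -> List[float]:
    """dist[k] = probability that exactly k sensors report 1 (independent sensors)."""
    dist = [1.0]
    for p in probs:
        new = [0.0] * (len(dist) + 1)
        for k, mass in enumerate(dist):
            new[k] += mass * (1 - p)  # this sensor reports 0
            new[k + 1] += mass * p    # this sensor reports 1
        dist = new
    return dist


def system_probability(probs: Sequence[float], threshold: float) -> float:
    """P(fraction of positive local decisions > threshold) under the extended majority rule.

    Pass the P_Fi values to get the system P_F, or the P_Di values to get P_D.
    """
    n = len(probs)
    dist = count_distribution(probs)
    return sum(mass for k, mass in enumerate(dist) if k / n > threshold)


# Example: three identical sensors with P_Di = 0.9 and P_Fi = 0.1, threshold 0.5.
p_d = system_probability([0.9, 0.9, 0.9], 0.5)
p_f = system_probability([0.1, 0.1, 0.1], 0.5)
```

With three sensors and a threshold of 0.5, at least two positive decisions are needed, so fusion raises the detection probability above any single sensor's 0.9 while pushing the false alarm rate well below 0.1.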
Slide 49
Small block size can achieve higher throughput (>200 kbps) at the cost of lower decoding rate (99.5%) at the cost of lower throughput (