Lab for System Informatics and Data Analytics (SIDA)
Industrial Big Data Analytics for Quality
Improvement in Complex Systems
Department of Industrial and Systems Engineering
University of Wisconsin-Madison
Dr. Kaibo Liu
Background
• A.P. 2013-now, Department of Industrial and Systems Engineering, UW-Madison
• Ph.D. 2013, Industrial Engineering (Minor: Machine Learning), Georgia Institute of Technology
• M.S. 2011, Statistics, Georgia Institute of Technology
• B.S. 2009, Industrial Engineering and Engineering Management, Hong Kong University of Science and Technology, Hong Kong
My Research & Expertise
Research Interests
System informatics and data analytics:
• Complex system modeling and performance assessment
• Data fusion for online process monitoring, diagnosis and prognostics
• Statistical learning, data mining, and decision making
Expertise
Multi-disciplinary Research
• Spatiotemporal Field Modeling and Prediction
• Sensor Measurement and Monitoring Strategy
• System Degradation Analysis and Prognostics
Multidisciplinary approach
• Engineering
• Statistics/Machine Learning
• Operations Research/Control
Overall, my research goal is to make sense of big data for better decision making!
Sensor Measurement and
Monitoring Strategy
Objective-oriented sensor system designs in complex systems
Objective
• Obtain an optimal sensor allocation design at minimum cost under different user-specified quality requirements
Approaches
• A best allocation subsets by intelligent search (BASIS) algorithm that intelligently searches for the optimal sensor allocation solution
• Features:
o Consider the trade-off among detection speed, fault diagnosis accuracy, and cost savings
Results Summary
• Ensure customer satisfaction by optimally designing the sensor allocation strategy
• The average cycle time, cost, and inventory level can be greatly reduced
• Algorithms have been tested in several applications, e.g., the hot forming and the cap alignment processes
• Supported several students
Effectively search for optimal sensor system design solutions
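The objective on this slide (the lowest-cost sensor allocation that still meets a user-specified requirement) can be illustrated with a small exhaustive search. This is only a sketch and is not the BASIS algorithm: the sensor names, costs, fault coverage sets, and the requirement that every fault be detectable by at least one sensor are all hypothetical stand-ins chosen for illustration.

```python
from itertools import combinations

# Hypothetical candidate sensors: installation cost and the faults each can detect.
SENSORS = {
    "s1": {"cost": 4.0, "detects": {"f1", "f2"}},
    "s2": {"cost": 3.0, "detects": {"f2", "f3"}},
    "s3": {"cost": 2.0, "detects": {"f3"}},
    "s4": {"cost": 7.0, "detects": {"f1", "f2", "f3"}},
}
# Hypothetical quality requirement: every fault below must be detectable.
FAULTS = {"f1", "f2", "f3"}


def meets_requirement(subset):
    """Return True if the chosen sensors jointly detect every fault."""
    covered = set()
    for name in subset:
        covered |= SENSORS[name]["detects"]
    return FAULTS <= covered


def cheapest_allocation():
    """Enumerate all sensor subsets and keep the cheapest feasible one."""
    best, best_cost = None, float("inf")
    names = sorted(SENSORS)
    for k in range(1, len(names) + 1):
        for subset in combinations(names, k):
            if not meets_requirement(subset):
                continue
            cost = sum(SENSORS[n]["cost"] for n in subset)
            if cost < best_cost:
                best, best_cost = subset, cost
    return best, best_cost


if __name__ == "__main__":
    print(cheapest_allocation())
```

Exhaustive enumeration visits all 2^n subsets of n candidate sensors, so it is only practical for very small problems; that growth is the motivation for an intelligent search over allocation subsets.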
Causation-based monitoring, diagnosis and control
Objective
• Transform from existing correlation-based techniques into a new causation-based quality control paradigm to achieve effective online quality monitoring and inference, root cause diagnosis, and proactive process control
Approaches
• Features:
o Engineering knowledge enhanced causal modeling
o Causation-based online quality monitoring, inference, and diagnosis
o Causation-based online feed-forward and feed-back process control
Results Summary
• Establish a series of causation-based monitoring, diagnosis and control techniques for quality improvement in complex systems
• Algorithms have been tested in the hot forming, the cap alignment, and the rolling processes
• Supported several students
Improved efficiency, yield, and quality
Online monitoring of Big Data Streams
Objective
• Create a new paradigm of dynamic data-driven modeling, sampling and monitoring schemes for Big Data Streams (e.g., video streams)
Approaches
• A self-updated statistical model to fully characterize the changing background
• A dynamic, data-driven sampling strategy subject to practical resource constraints
• A scalable and robust statistical process control method tailored for Big Data Streams
• Features:
o Scalability: linear complexity that ensures practical implementation
o Adaptability: automatically localize the anomaly regions without any prior knowledge
Results Summary
• Establish a series of real-time monitoring methodologies that are tailored for Big Data Streams for quick anomaly detection (either cyber or physical) and localization
• Algorithms have been tested in various applications, e.g., diaper manufacturing, climate monitoring and solar flare detection
• Supported several students
Maximize the detection capability with practical resource constraints
[Figure: Examples of thermal profiles on the polishing pad during CMP process under different conditions]
Dynamic Data-Driven Modeling, Sampling and
Monitoring for Real-Time Solar Flare Detection
[Figure: DDDAS framework with four panels. (a) Applications: original solar image at time t. (b) Applications modeling: updated solar image. (c) Application measurement systems and methods: dynamic sampling. (d) Mathematical and statistical algorithms: SPC chart. Steps shown in the loop: sample data, update model, update SPC, update sampling.]
• A dynamically updated spatial-temporal statistical model that fully characterizes the changing background
• A dynamic sampling algorithm that actively decides which data streams to observe given the resource constraints
• A scalable and robust SPC that effectively combines the information from significant data streams to produce an overall global monitoring system
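The interplay of the last two bullets (deciding which streams to observe under a sampling budget, then combining the most significant streams into one global statistic) can be sketched as follows. This is a simplified illustration and not the lab's implementation: the one-sided CUSUM form of the local statistics, the compensation term `delta` added to unobserved streams, the function name `monitor`, and every parameter value are assumptions made for the example.

```python
import numpy as np


def monitor(streams, q, r, delta=0.1, shift=1.0, threshold=10.0):
    """Monitor p data streams while observing only q of them per time step.

    streams: array of shape (T, p), assumed standardized to N(0, 1) in control.
    q: number of streams that can be observed at each step (resource limit).
    r: number of largest local statistics summed into the global statistic.
    Returns (alarm_time, flagged_stream_indices) or (None, None).
    """
    n_steps, p = streams.shape
    stats = np.zeros(p)          # one local statistic per stream
    observed = np.arange(q)      # arbitrary initial sampling layout
    for t in range(n_steps):
        mask = np.zeros(p, dtype=bool)
        mask[observed] = True
        # Observed streams: one-sided CUSUM update for an upward mean shift.
        llr = shift * streams[t, mask] - shift ** 2 / 2.0
        stats[mask] = np.maximum(stats[mask] + llr, 0.0)
        # Unobserved streams: add a small compensation so that streams left
        # unobserved for a long time are eventually sampled again.
        stats[~mask] += delta
        # Global statistic: sum of the r largest local statistics.
        order = np.argsort(stats)
        if stats[order[-r:]].sum() > threshold:
            return t, order[-r:]
        # Next step: observe the q streams with the largest local statistics.
        observed = order[-q:]
    return None, None


if __name__ == "__main__":
    data = np.zeros((200, 10))
    data[50:, 7] = 3.0           # hypothetical mean shift in stream 7 at t = 50
    print(monitor(data, q=3, r=2))
```

Each step costs one sort of the p local statistics, so the per-step work grows roughly linearly (up to a log factor) with the number of streams, and the streams returned at the alarm give a rough localization of the anomaly.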
Sensor Measurement and Monitoring Strategy
• Objective-Oriented Optimal Sensor Allocation Strategy: determine the minimum number of sensors needed given user specified requirements
• Adaptive Sensor Allocation Strategy: Adaptively adjust sensor allocation in a Bayesian Network to enhance monitoring and diagnosis
• A Top-r based Adaptive Sampling Strategy: Online monitor normally distributed big data streams in the context of limited resources
• A Nonparametric Adaptive Sampling Strategy: Online monitor non-normal big data streams in the context of limited resources
• Effective Online Data Monitoring and Saving Strategy: intelligently select and record the most informative extreme values in the simulation data
• A Spatial Adaptive Sampling Procedure: leverage the spatial information and adaptively and intelligently integrate two seemingly contradictory ideas (Wide and deep searches)
• A Rank-based Sampling Algorithm by Data Augmentation: automatically augment information for unobservable variables based on the online observations
System Degradation Modeling and
Prognostics
Internet of Things-enabled Condition-based Monitoring, Diagnosis, and Prognostics
Objective
• Leverage condition monitoring signals collected from multiple and heterogeneous sensors to better visualize and assess the current system health status and predict its future behavior in real time
Approaches
• Novel data fusion methods that select the best sensors and combine their information to construct health indices for system performance assessment and visualization, $h_{i,t} = f(\mathbf{x}_{i,\cdot,t})$
• Features:
o Combine data-driven approaches and engineering principles governing the underlying failure mechanism to ensure satisfactory performance
Results Summary
• Establish a series of data fusion methodologies that are tailored for IoT-enabled service systems for health status visualization, characterization and prediction
• Algorithms have been tested in various applications, e.g., engine health monitoring, Alzheimer's disease and forklift management
• Supported several students
[Figure: Aircraft engine diagram]
Better health status characterization
Better fault diagnosis
Better RUL prediction
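To make the step from a health index to a remaining useful life (RUL) prediction concrete, here is a minimal sketch. It assumes a deliberately simple degradation model that is not taken from the slides: the health index grows linearly, h(t) = b·t + noise, with Gaussian noise of known variance and a Gaussian prior on the slope b, so the posterior has a closed form. The function names, the prior, the noise variance, and the failure threshold are all hypothetical.

```python
import numpy as np


def update_slope(times, signal, prior_mean, prior_var, noise_var):
    """Conjugate Bayesian update of the degradation slope b in h(t) = b*t + noise.

    Returns the posterior mean and variance of b given the observed signal.
    """
    times = np.asarray(times, dtype=float)
    signal = np.asarray(signal, dtype=float)
    precision = 1.0 / prior_var + np.sum(times ** 2) / noise_var
    mean = (prior_mean / prior_var + np.sum(times * signal) / noise_var) / precision
    return mean, 1.0 / precision


def predict_rul(slope_mean, threshold, t_now):
    """Point estimate of RUL: time until the mean path reaches the threshold."""
    if slope_mean <= 0.0:
        return float("inf")      # no degradation trend detected yet
    return max(threshold / slope_mean - t_now, 0.0)


if __name__ == "__main__":
    t = np.arange(1, 6)                      # observation times 1..5
    h = 2.0 * t                              # hypothetical noiseless health index
    b_mean, b_var = update_slope(t, h, prior_mean=1.0, prior_var=100.0, noise_var=1.0)
    print(b_mean, b_var, predict_rul(b_mean, threshold=20.0, t_now=5.0))
```

Each new observation only adds one term to the two running sums, so the posterior and the RUL estimate can be refreshed in real time as sensor data arrive.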
Case Study – Engine RUL prediction
• Optimal weights $\mathbf{w}^*$: $h_i(t) = \mathbf{L}_i(t)\,\mathbf{w}^*$

Name   T24    T50    P30    Nf     Ps30   phi    NRf    BPR    htBleed  W31    W32
Value  0.13   0.37   -0.03  -0.05  0.23   -0.21  -0.08  0.16   0.12     -0.05  -0.16

[Diagram: real-time sensor information (T24, ..., W32) is combined into a health index; the stochastic degradation models (Gebraeel, 2006) and Bayesian updating methods then give the remaining life prediction]
• The developed HI-QL improved the RUL prediction accuracy
o by 64.83% compared with the best single sensor
o by 20.7% compared with existing HI-based models
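The health index on this slide is a weighted combination of the sensor signals, $h_i(t) = \mathbf{L}_i(t)\,\mathbf{w}^*$. The sketch below applies the weights listed in the table to a matrix of sensor readings. The sensor names and weights are copied from the slide; the assumption that each row of the input holds the eleven signals in that order (and already on the scale the weights were fitted to), as well as the example readings, are illustrative only.

```python
import numpy as np

# Sensor names and optimal weights w* as listed on the slide.
SENSORS = ["T24", "T50", "P30", "Nf", "Ps30", "phi",
           "NRf", "BPR", "htBleed", "W31", "W32"]
WEIGHTS = np.array([0.13, 0.37, -0.03, -0.05, 0.23, -0.21,
                    -0.08, 0.16, 0.12, -0.05, -0.16])


def health_index(readings):
    """Compute h_i(t) = L_i(t) w* for every time step.

    readings: array of shape (T, 11); row t holds the sensor signals at time t
    in the order given by SENSORS (assumed preprocessing, not from the slide).
    """
    readings = np.asarray(readings, dtype=float)
    if readings.shape[-1] != WEIGHTS.size:
        raise ValueError("expected one column per sensor in SENSORS")
    return readings @ WEIGHTS


if __name__ == "__main__":
    # Hypothetical readings: three time steps with all signals equal to 1.
    print(health_index(np.ones((3, len(SENSORS)))))
```

Because the fusion is linear, the sign and size of each weight show how strongly a sensor pushes the index up or down; here T50 and Ps30 carry the largest positive weights and phi the largest negative one.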
System Degradation Modeling and Prognostics
• Non-parametric data fusion model: does not need to know the parametric form of the degradation signal
• Semi-parametric data fusion model: integrate degradation modeling and prognostics in a unified manner
• SNR-based data fusion model: immune to the heterogeneous sensor challenges in terms of signal scales and measurement units
• Quantile regression-based data fusion model: ensures recovery of the underlying degradation status, with estimated fusion coefficients converging to the true values
• Sensory-based Failure Threshold Estimation: online update the failure threshold estimation of the in-field unit
• Kernel-trick for nonlinear data fusion model
• Generic data fusion model with automatic sensor selection
• Data fusion model for multiple failure modes
• Data fusion model when there are multiple environmental conditions
• Generic data fusion model when multisensor signals are asynchronous
• Dynamic control of degradation speed and RLD via workload adjustment
Smart Monitoring of Alzheimer’s Disease via Data Fusion,
Personalized Prognostics, and Selective Sensing
[Figure: The model of AD trajectory [3]]
Effectiveness of existing screening approaches versus the new methodology:
• Biomarkers (existing): expensive, e.g., $5000 per scan for PiB-PET
• Screening tests (existing): passive information collection; burden and complexity
• Smart monitoring (new methodology): proactive information collection driven by accurate statistical models
[Figure: Proposed Smart Monitoring Method]
Data-Driven Failure Predictive Analytics for
Internet of Things (IoT) enabled Service Systems
Establish a core set of data-driven modeling, failure prognosis, and service decision-making methodologies for emerging Internet of Things (IoT) enabled service systems, particularly in the context of TMHNA
[Figure: Data sources and system architecture. Historical off-line data on multiple units: condition monitoring (CM) data, time-to-failure data (failure or censored), and failure event data (failure cases). Real-time on-line CM data on individual units: CM signal versus time for Car #1, Car #2, ..., Car #i. Architecture labels: units (equipment in the field), communication network, back-office processing center, sensing data, service alert.]
Big data analytics solutions to improve nuclear power
plant efficiency: Online monitoring, visualization,
prognosis, and maintenance decision making
Advance the ability to assess equipment condition and predict the remaining useful life (RUL) to support optimal maintenance decision making in nuclear power plants.
Spatiotemporal Field Modeling
and Prediction
Real-time travel demand modeling and prediction in smart and connected cities
Objective
• Online prediction of the origin-destination (OD) demand in traffic networks
• Existing literature models the demand count data separately for different OD pairs without considering spatial correlations or domain knowledge
Approaches
• Propose a multivariate Poisson log-normal model with specific parametrization tailored to the traffic demand problem
• Capture the spatiotemporal correlations of the traffic demand across different routes and epochs and automatically cluster the routes based on the demand correlations
• The model is estimated using an Expectation-Maximization (EM) algorithm and applied to predict future demand counts at the subsequent epochs
Results Summary
• The proposed method integrates traffic network domain knowledge and achieves a sparse estimation based on clusters of routes
• The developed EM algorithm estimates the model parameters accurately
• Has been applied to a real New York yellow taxi dataset
• Supported several students
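As a rough illustration of why a multivariate Poisson log-normal model can capture correlated demand across OD pairs, the sketch below simulates counts from the generic form of that model: a latent multivariate normal log-rate per epoch, followed by independent Poisson draws. It does not reproduce the tailored parametrization, the route clustering, or the EM estimation described on the slide; the function name, the three OD pairs, and all parameter values are hypothetical.

```python
import numpy as np


def simulate_pln_counts(mu, sigma, n_epochs, seed=0):
    """Simulate demand counts from a multivariate Poisson log-normal model.

    mu: mean vector of the latent log-rates, one entry per OD pair.
    sigma: covariance matrix of the latent log-rates (encodes correlation).
    Returns an integer array of shape (n_epochs, number of OD pairs).
    """
    rng = np.random.default_rng(seed)
    # Latent log-demand rates, correlated across OD pairs.
    latent = rng.multivariate_normal(mu, sigma, size=n_epochs)
    # Observed counts are conditionally independent Poisson given the rates.
    return rng.poisson(np.exp(latent))


if __name__ == "__main__":
    # Hypothetical setup: OD pairs 0 and 1 are strongly correlated, pair 2 is not.
    mu = np.log([20.0, 20.0, 20.0])
    sigma = np.array([[0.25, 0.225, 0.0],
                      [0.225, 0.25, 0.0],
                      [0.0, 0.0, 0.25]])
    counts = simulate_pln_counts(mu, sigma, n_epochs=2000)
    print(np.corrcoef(counts, rowvar=False).round(2))
```

Modeling each OD pair separately amounts to forcing the off-diagonal entries of `sigma` to zero, which discards the dependence visible in the simulated correlation matrix.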
Modeling of dynamic thermal fields via grid-based sensor networks
Objective
• Accurate modeling and estimation of the full-scale grain thermal field based on the grid-based sensor networks
• Challenges:
o Grid-based but sparse sensor data
o Spatiotemporal correlation structures
o Local variability of grain temperature
Approaches
• Integrate physical dynamics model (for global profile) and spatiotemporal stochastic processes (for local profile)
• Develop a spatiotemporal transfer learning technique for 3D field estimation using sensor observations from several homogeneous data sources
• Estimate time-varying parameters in PDE models from the obtained data to acquire a more accurate description of the dynamics
Results Summary
• The proposed methods integrate a physical dynamics model, a spatiotemporal statistical model, and an advanced machine learning technique to achieve an accurate estimation of the 3D thermal fields based on grid-based sensor networks
• Has been tested and verified on several real datasets for grain storage application
[Figure: thermal field snapshots $Y(s, t_1), Y(s, t_2), \ldots, Y(s, t_M)$ along the time axis at $t_1, t_2, \ldots, t_M$]
Other Research Projects
Operator activity index development and performance improvement
Objective
• Propose a generic approach to develop an effective composite index to identify high-performing operators on multiple dimensions
Approaches
• A new nonnegative principal component analysis (NPCA) approach with optimal balance
o Best separation of operators
o Comply with practical interpretation
Results Summary
• Developed an OAI by combining worker metrics information to measure the activity of operators
• OAI by NPCA meaningfully explains the operator activity and also provides guidance for performance improvement
• Algorithms have been tested in the forklift operator activity analyses
• Supported several students
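The idea of a composite index built from a nonnegative principal component can be sketched with a generic heuristic: a power iteration on the sample covariance matrix whose iterate is clipped to be nonnegative at every step. This is a standard stand-in, not the NPCA formulation from the slide, and it does not implement the "optimal balance" criterion; the function names and the synthetic operator metrics are hypothetical.

```python
import numpy as np


def nonnegative_first_component(X, n_iter=500, tol=1e-10):
    """Approximate a leading principal direction with nonnegative loadings.

    Heuristic: projected power iteration on the sample covariance matrix.
    X has one row per operator and one column per metric.
    """
    Xc = X - X.mean(axis=0)
    S = Xc.T @ Xc / (len(X) - 1)
    w = np.full(S.shape[0], 1.0 / np.sqrt(S.shape[0]))   # positive unit start
    for _ in range(n_iter):
        w_new = np.maximum(S @ w, 0.0)                   # project onto w >= 0
        norm = np.linalg.norm(w_new)
        if norm == 0.0:
            break                                        # keep the last valid w
        w_new /= norm
        converged = np.linalg.norm(w_new - w) < tol
        w = w_new
        if converged:
            break
    return w


def composite_index(X, w):
    """Score each operator by projecting centered metrics onto the loadings."""
    return (X - X.mean(axis=0)) @ w


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    base = rng.normal(size=(50, 1))                      # hypothetical activity level
    X = np.hstack([base, 2.0 * base, 0.5 * base]) + 0.1 * rng.normal(size=(50, 3))
    w = nonnegative_first_component(X)
    print(w, composite_index(X, w)[:5])
```

Nonnegative loadings mean that a higher value on any metric can never lower an operator's score, which is one way to keep such an index consistent with practical interpretation.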
Obstructive Sleep Apnea Detection
Retail Site Location Analysis by Business Data Analytics
Objective
• Choose an optimal location for the opening of a new retail site
Approaches
• Estimate the company's new market shares across the country if the new retail site is tentatively opened at different potential locations
Results Summary
• Established a generic guideline on leveraging data analytics tools for resolving business issues when dealing with business big data
• Algorithms have been tested in a real case study involving choosing an optimal location for the opening of a new retail site
• Supported several students
The company of interest runs a gas station equipment repair and replacement business. It provided a dataset of more than 1 million detailed business transactions (about 8 GB) covering the past 5 years.
Research Summary
[Diagram: Industrial Big Data Analytics at the intersection of Engineering, Statistics/Data Mining, and Operations Research/Control]
Acknowledgement
Thank you!
Questions?