Upload
others
View
5
Download
0
Embed Size (px)
Citation preview
IIC Journal of Innovation - 1 -
Causal Analytics in IIoT – AI That Knows What
Causes What, and When
Authors:
Dr. John Galloway Founder Au Sable [email protected] Pieter van Schalkwyk Chief Executive Officer XMPro [email protected] Dann Marian Engineering and Projects Manager [email protected]
Causal Analytics in IIoT – AI That Knows What Causes What, and When
- 2 - June 2018
INTRODUCTION
Finding the “because” behind certain
business or operations events has always
been a key part of any engineering,
maintenance or operations manager’s job in
industrial businesses. “The First Stage
compressor failed because…” or “the supply
tank ran dry because…” are common
phrases in maintenance and operations
departments in industrial businesses.
Finding the “because” traditionally relies on
experienced engineers that can interpret
event, contextual and temporal data to
deduce the likelihood of specific factors
causing others in either a negative or
positive way.
Knowing the real root causes of events is
critical to resolving problems rather than
continuously dealing with the symptoms. It
resulted in popular, formalized approaches
such as “Root Cause Analysis,” or RCA as it is
generally known. The challenge is that there
are often multiple causal factors for these
events, and finding the one “root cause”
may not always be possible. Understanding
other causal factors that may influence the
outcome of industrial processes and the
behavior of equipment need to be
considered.
Au Sable, in collaboration with XMPro,
developed an algorithmic, artificial
intelligence-based, approach for “Reliable
Causal Analytics” (rCA) in industrial IoT
applications. This article demonstrates:
It is possible to perform Reliable Causal
Analytics using industrial IoT data and
Artificial Intelligence (AI) to determine
causality of business and operational
events such as equipment failure or
operational issues
How Reliable Causal Analytics provides
data-driven decision support for
traditional Root Cause Analysis
approaches
The approach to embed this causal
analytics methodology in IoT Process
Management software to be able to
perform this in a repeatable and
automated manner.
rCA is the result of many years of research
and application of causal analytics in real-
world scenarios. Through this, Au Sable
developed rCA that enables cause and effect
relationships to be identified from sensor-
driven data and made known to the analyst
(e.g. wear on part #105 has causally
impacted the performance of device #65
with a causal coefficient of 0.86), as well as
correlation relationships in the data.
This means:
the risk of making false decisions about
what were, or will be predictively, the
causal drivers of an effect is reduced,
and
the potential for costly or disastrous
mistakes is thereby reduced.
This article provides background on
traditional Root Cause Analysis and the
evolution of Causal Analytics. It
demonstrates how to automate the
analytics to scale with an IoT Process
Management platform and how it is applied
in an industrial application. It provides a
practical example of Reliable Causal
Causal Analytics in IIoT – AI That Knows What Causes What, and When
IIC Journal of Innovation - 3 -
Analytics (rCA) applied to a floating
production storage and offloading (FPSO)1
vessel for an Oil & Gas company.
Crude oil, gas and water from the reservoir
are separated on board the FPSO. Oil is
stored on the facility in six pairs of tanks,
before export to trading tankers. The vessel
is designed to store 1.4 million barrels of oil
and processes approximately 170,000
barrels of oil per day (bopd).
Example Floating Production Storage and Offloading Vessel
Au Sable’s rCA has functioned in intelligence,
defense and anti-terrorism applications for
many years. The solution described in this
article is the combination of advanced IoT
Process Management software from XMPro
and the rCA AI software from Au Sable.
1 Floating production storage and offloading https://en.wikipedia.org/wiki/Floating_production_storage_and_offloading
2 http://asq.org/learn-about-quality/root-cause-analysis/overview/roots-of-root-cause.html
3 Wilson, Paul F.; Dell, Larry D.; Anderson, Gaylord F. (1993). Root Cause Analysis: A Tool for Total Quality Management.
Milwaukee, Wisconsin: ASQ Quality Press. pp. 8–17. ISBN 0-87389-163-5.
4 Adapted from https://en.wikipedia.org/wiki/Root_cause_analysis (classification)
ROOT CAUSE ANALYSIS BASED ON
CORRELATION DOESN’T WORK IN
THE IOT ERA
Industrial RCA Background
Formal Root Cause Analysis for industrial
applications started with the Total Quality
Management (TQM)2 movement advocated
by Deming in Japan in the late 1980’s and
early 1990’s.
Paul Wilson et al3 described the root cause
analysis process for Quality Management in
detail during the TQM era. “Root cause
analysis is a method of problem-solving used
for identifying the root causes of faults or
problems. A factor is considered a root cause
if removal thereof from the problem-fault-
sequence prevents the final undesirable
outcome from recurring; whereas a causal
factor is one that affects an event's outcome,
but is not a root cause. Though removing a
causal factor can benefit an outcome, it does
not prevent its recurrence with certainty.”
Even though root cause analysis formally
originated in TQM, it finds many applications
in industrial environments:4
Safety-based Root Cause Analysis arose from the fields of accident analysis and occupational safety and health.
Causal Analytics in IIoT – AI That Knows What Causes What, and When
- 4 - June 2018
Production-based Root Cause Analysis has roots in the field of quality control for industrial manufacturing.
Process-based Root Cause Analysis, a follow-on to production-based RCA, broadens the scope of RCA to include business processes.
Failure-based Root Cause Analysis originates in the practice of failure analysis as employed in engineering and maintenance.
Systems-based Root Cause Analysis has emerged as an amalgam of the preceding schools, incorporating elements from other fields such as change management, risk management and systems analysis.
Root Cause Analysis became popular as an
approach to methodically identify and
correct the root causes of events instead of
addressing symptomatic results of these
events. The objective of root cause analysis
is to prevent problem recurrence. Some
popular root cause analysis techniques
include “Five Whys” and Cause and Effect
(Fishbone) diagrams. These techniques rely
on human interpretation of event
information and data and require
experienced practitioners to conduct the
analysis. It is often limited to a few critical
production assets as the manual process is
time-consuming and laborious. Wilson’s
distinction between root causes and other
causal factors provides some guidance on
the application of causal analytics in an IoT
context for this article. Traditional
techniques focused only on finding the root
causes through manual review. Modern
techniques such as rCA described in this
article, combined with IoT data and
advances in AI, enable engineers to not only
assess root causes, but also find other causal
factors. These causal factors may not lead to
equipment or process failure but may still
impact equipment or process performance.
Recent advances in cloud computing and AI
provide the necessary infrastructure to
analyze event data for IoT and other sources
at massive scale. This means analysts can
have a more expansive view of causal events
rather than a reductionist view where the
scope of an analysis is limited to what a
human can process.
MOTIVATION FOR DATA-DRIVEN,
RELIABLE CAUSAL ANALYTICS
There are three main reasons to find a
reliable, data-driven approach to finding
root causes and causal factors for equipment
failure and operational performance in
industrial environments:
Aging workforce and a large number of experienced engineers retiring soon
Complexity of equipment, making it harder to troubleshoot
Inaccuracy of Root Cause Analysis
Retiring Workforce
With a retiring workforce in many industrial
sectors, the experience needed to conduct
meaningful RCAs is decreasing. As much of
the traditional approaches rely on
observational analysis, the number of
experienced engineers that can provide
reliable analysis is fast reducing.
According to a January 2017 assessment by
the US Department of Energy, 25% of US
employees in electric and natural gas utilities
Causal Analytics in IIoT – AI That Knows What Causes What, and When
IIC Journal of Innovation - 5 -
will be ready to retire within 5 years5. The US
Department of Labor also estimates that the
average age of industry employees is now
over 50 and up to half of the current energy
industry workforce will retire within 5-10
years.6
Manual RCA requires the combination of a
rigorous methodology, fault analysis
technology and experience to evaluate the
possible causes of business events such as
equipment failure, quality problems or
safety incidents. Much of the expertise
needed will be lost with the retiring
workforce. A data-driven, algorithmic
approach provides a viable replacement for
the experience of people to determine
causal relationships between business
events.
Complexity of Industrial Equipment
As industrial equipment becomes
increasingly sophisticated 7 and more
complex, the ability to perform diagnostics
becomes increasingly more difficult. As
equipment becomes more complex and
sophisticated, the number or combinations
and permutations of potential causal factors
for certain events increases exponentially. It
follows a similar pattern to Metcalfe’s law8
5 U.S. Department of Energy, Quadrennial Energy Review (QER) Task Force report second installment titled “Transforming the
Nation’s Electricity System.” Chapter V: Electricity Workforce of the 21st-Century: Changing Needs and New Opportunities.
January 2017. Retrieved from https://energy.gov/epsa/initiatives/quadrennial-energy-review-qer
6 U.S. Department of Labor Employment and Training Administration “Industry Profile – Energy.” Retrieved from
https://www.doleta.gov/brg/indprof/energy_profile.cfm
7 Challenges To Complex Equipment Manufacturers: Managing Complexity, Delivering Flexibility, and Providing Optimal Service
http://www.oracle.com/us/solutions/046249.pdf
8 Metcalfe's law https://en.wikipedia.org/wiki/Metcalfe%27s_law
9 The problem with root cause analysis http://qualitysafety.bmj.com/content/26/5/417
for telecommunication devices that states
“the effect of a telecommunications
network is proportional to the square of the
number of connected users of the system
(n2)”.
Metcalfe’s law, now also used in economics
and business management, provides some
quantification of the impact of the increasing
complexity of equipment to troubleshoot
potential causal relationships between
operational events.
Inaccuracy of Root Cause Analysis
Root Cause Analysis gained popularity in
industrial and other sectors such as
healthcare. One of the main challenges that
emerged centers around the fact that it
requires facilitation and analysis by people
who can process only limited amounts of
information. People are also susceptible to
opinions and organizational influences such
as politics. Peerally 9 et al describe the
problem with Root Cause Analysis with these
8 main challenges:
The unhealthy quest for “the” root cause
Questionable quality of RCA investigations
Political hijack
Causal Analytics in IIoT – AI That Knows What Causes What, and When
- 6 - June 2018
Poorly designed or implemented risk controls
Poorly functioning feedback loops
Disaggregated analysis focused on single organizations and incidents
Confusion about blame
The problem of many hands
Many of these are as a result of the
subjective nature of the people doing
analysis and can be addressed with a more
objective, data-driven approach. People
can’t process all the potential data sources
of event and contextual information.
Modern advances in data, stream and event
processing address some of that challenge
and AI provides a means to make sense of
the data at scale. It removes the reliance on
the subjective nature of human analysis and
opens the opportunity to analyze fact-based
information at scale to derive insights.
The unhealthy quest for “the” root cause
further describes a challenge that can be
better addressed with an algorithmic
approach to Root Cause Analysis. Peerally
states that “the first problem with Root
Cause Analysis is its name. By implying—
even inadvertently—that a single root cause
(or a small number of causes) can be found,
the term ‘root cause analysis’ promotes a
flawed reductionist view.”
An algorithmic approach often provides
more potential causal factors, their
10 How does business analytics contribute to business value? https://onlinelibrary.wiley.com/doi/pdf/10.1111/isj.12101
11 The Age of Analytics: Competing in a data-driven world
https://www.mckinsey.com/~/media/McKinsey/Business%20Functions/McKinsey%20Analytics/Our%20Insights/The%20age%2
0of%20analytics%20Competing%20in%20a%20data%20driven%20world/MGI-The-Age-of-Analytics-Full-report.ashx
relationship to each other and the strength
(causal coefficient) of the relationships. It
offers additional insights into events and
often finds causation that may be
counterintuitive to the views of the people
that do it manually. An algorithmic approach
also provides repeatability and scale. It will
analyze the IoT and contextual data in a
consistent way that is independent of the
person performing the analysis.
USING CAUSAL ANALYTICS TO
PERFORM ALGORITHMIC ROOT
CAUSE ANALYSIS
Correlation is Not Causation
In this era of big data, it is commonly said
that data analytics is a prime driver of value
to enterprises10. This is true, but only if the
analytics performed across the data are well
grounded methodologically and perform
well and efficiently to derive the value.
Big data creates big and complex data
volumes. This is of limited value however, if
it is not accompanied by the best available
analytics to enable the most valuable,
accurate and reliable decisions to occur11 .
Hence, there is an increasing requirement
for the analytics component in industrial IoT
solutions to be fast, reliable and accurate to
identify the problems and opportunities and
Causal Analytics in IIoT – AI That Knows What Causes What, and When
IIC Journal of Innovation - 7 -
ensure that such problems are addressed
correctly and urgently.
Correlation of events and systems is often a
starting point for problem-solving in
industrial environments but “correlation is
not causation”12. Correlation helps to point
the way, helps indicate what might be
candidate causative or driving factors for
some particular effect yet keeping in mind
that correlation is simply a measure of
association not causation.
Introductory statistics courses tell us that it
is not possible to prove causation unless one
conducts an experiment whereby treatment
and control groups are randomized.
This is totally correct but is just not feasible
to conduct an experiment in 99% of real-
world situations. Algorithmic methods have
a probabilistic and contributory approach –
spurred on by big data’s need for
empirically-based data-driven decisions – to
answering questions about what caused
what or will. For example, a causal
coefficient of 0.83 of X as a causative
influence on Y, does not mean that X is
necessarily the sole cause of Y (there may be
multiple causes) nor does it always cause Y.
X is identified however as a contributory
cause of Y. Similarly, smoking is a
contributory cause of lung cancer; it is not
12 Correlation does not imply causation https://en.wikipedia.org/wiki/Correlation_does_not_imply_causation
13 Attaining IoT Value: How To Move from Connecting Things to Capturing Insights
https://www.cisco.com/c/dam/en_us/solutions/trends/iot/docs/iot-data-analytics-white-paper.PDF
14 The IoT’s Potential for Transformation http://e.huawei.com/en-
sa/publications/global/ict_insights/201703141505/focus/201703141643
the sole cause, nor does it always cause the
effect.
Correlations can be misleading. Valuable
results and insights are often found, but the
correlation methods upon which decisions
are made mean that risks are inherent and
could lead to mistaken or sub-optimal
decision-making and outcomes.
The chief analytics tool of most industrial IoT
analytics vendors is correlation. Most
sensor-driven data (IoT and machine-
generated logs) is analyzed using a proven
but older form of statistical methods (even
when operating within a machine learning
framework). Correlational methods are the
dominant form of analytics.
Some examples from IoT vendor
publications and websites demonstrate this
approach:
1. Cisco (Attaining IoT Value): …enable the company’s customers to perform real-time data correlation and, as a result, quickly react to irregularities13
2. Huawei (‘The IoT's Potential for Transformation’): …enables correlation-based process and productivity improvements.14
Causal Analytics in IIoT – AI That Knows What Causes What, and When
- 8 - June 2018
3. ThingWorx's capabilities make it possible for users to correlate data, deriving powerful insights…15
4. Siemens PLM: …quantitative statistical relationships to real-life usage, called customer correlation16
5. Industrial Internet Consortium: …common issue in IIoT systems is correlating data between multiple sensors and process control states17
Correlational methods are established as
powerful aids to decision-making as is
witnessed in the rise of platforms that
provide the capability. Correlations often
vary such that at a given time one entity and
another may be positively related and at
other times only weakly related or not at all.
There is no fact-based causal coefficient that
describes the strength of potential causal
relationships.
The lack of stability in correlations indicates
complexity in the relationships and the
presence of a dynamical system (common in
IIoT). This results in variability according to
the system state and nonlinearity in system
behavior. It means that traditional statistical
methods, correlation included, have
limitations for obtaining precise analytics
and improved decision making about
performance in IIoT.
15 A survey of IoT cloud platforms https://www.sciencedirect.com/science/article/pii/S2314728816300149
16 Customer Correlation Durability Methodology
https://www.plm.automation.siemens.com/en/products/lms/engineering/customer-correlation.shtml
17 Industrial Analytics: The Engine Driving the IIoT Revolution https://www.iiconsortium.org/pdf/Industrial_Analytics-
the_engine_driving_IIoT_revolution_20170321_FINAL.pdf
The result is that an observed correlation
over time may or may not be coincidental;
or, the observed correlation (and any
implied causation) may be the result of one
or more third-party variables (hidden
confounders), e.g. another variable that
influences two events that are seemingly
correlated. An example of this may be ice
cream sales and boating accidents that are
correlated, but both are affected by summer
temperatures, and so a causal inference
would be spurious. In this example summer
temperature is causal, but one may
incorrectly infer causation that an increase
in ice cream sales leads to boating accidents
due to the high correlation factor. More
humorous examples of these erroneous
correlations can be found at Spurious
Correlations.
Mathematically-based causal analytics
attempts to improve on correlation for
causality identification.
The Evolution of Causal Analytics
CAUSALITY FOR REAL-WORLD APPLICATIONS
It is well accepted that causation cannot be
proven statistically unless one conducts an
experiment with randomization to control
Causal Analytics in IIoT – AI That Knows What Causes What, and When
IIC Journal of Innovation - 9 -
for spurious relationships 18 19 , which is
simply not practicable in most real-world
situations. The position of the authors is that
correlational methods have served well and
are proven to provide useful insights, but are
nonetheless prone to producing spurious
relationships and hence mistaken
decisions.20 21
As noted earlier, causality research has been
undertaken to develop different
probabilistic methods and approaches for
identifying cause and effect relationships in
non-experimental or ‘observational’ data.
Causal analytics evolved over the past few
decades from academic studies to practical
solutions such as rCA. A stumbling block
historically in reaching this goal has been to
devise causal algorithms that produce
reliable and accurate results for commercial
and government application.
ADVANCES IN CAUSAL ANALYTICS AND THE
DEVELOPMENT OF RELIABLE CAUSAL ANALYTICS (RCA)
In the 1980s, mathematical advances by
Judea Pearl22 from UCLA showed that causal
relationships can be represented from data
in terms of probabilities and led him later to
declare that “causality has been
mathematized”. The mathematization was
perhaps a little premature, but Pearl’s
18 https://us.sagepub.com/sites/default/files/upm-binaries/14289_BachmanChapter5.pdf
19 http://www.statisticssolutions.com/establishing-cause-and-effect/
20 https://hbr.org/2015/06/beware-spurious-correlations
21 https://en.wikipedia.org/wiki/Spurious_relationship
22 Judea Pearl https://en.wikipedia.org/wiki/Judea_Pearl
23 Clive Granger https://en.wikipedia.org/wiki/Clive_Granger
outstanding work led him to be awarded in
2012 the industry’s equivalent of the Nobel
Prize, the Turing award, for advances in both
machine learning and causality.
Problems remained however, e.g. how to
identify a causal relationship when unknown
delays occur between cause and effect. And,
what are termed hidden confounders, were
difficult to identify and control for. Earlier,
the work of Weiner (1950s) laid the basis for
several information-theoretic measures of
causality (and for well-known data
compression algorithms).
A landmark innovation was that of Clive
Granger 23 , awarded a Nobel prize for
developing a test of causality: X is said to
cause Y, if the past values of X contain
information that helps predict future values
of Y, above and beyond the information
contained in past values of Y – graphically:
Causal Analytics in IIoT – AI That Knows What Causes What, and When
- 10 - June 2018
Researchers extended this framework, e.g.
to allow for analysis of multiple time series
generated by nonlinear models, for lagging
the cause and effect variables and for causal
graphical models for better handling of
latent variables.
Transfer Entropy 24 25 (TE) is a later
implementation of the principle that causes
must precede and predict their effects. TE
improves on Granger in that it directly caters
for nonlinear interactions and helps
minimize problems of noisy data. TE is a
model-free and non-parametric measure of
directed information flow from one variable
to another.
24 Transfer Entropy https://en.wikipedia.org/wiki/Transfer_entropy
25 Transfer entropy between multivariate time series https://www.sciencedirect.com/science/article/pii/S1007570416305020
26 Progress in Root Cause and Fault Propagation Analysis of Large-Scale Industrial Processes
https://www.hindawi.com/journals/jcse/2012/478373/
The application of TE to empirical analytics
has been substantial in areas of biomedicine
and climate science. However, further
developments were needed to help
overcome shortcomings related to
unreliability and a lack of accuracy 26 . Au
Sable’s work on improving the reliability of
TE, combined with other Au Sable
proprietary algorithms, have led to the
development of an algorithmic approach to
causal analytics that can process IoT event
data and provide reliable results. This means
that Causal Analytics can now be applied to
real-world scenarios with non-experimental
data.
Figure 1: Granger causality test
Causal Analytics in IIoT – AI That Knows What Causes What, and When
IIC Journal of Innovation - 11 -
These real-world applications involve
methods that take into account the
complexity of systems (thereby including
analytics of system machine and log data).
The inter-dependencies and dimensionality
of many IIoT system devices mean that
identifying their behavior (causal and
otherwise) can be extremely difficult
depending on the magnitude and nature of
the couplings. One variable may be found to
be a driver of another, but not alone. The
multiple influences that have an impact on a
particular variable must be teased out, such
as the timings, state-dependencies and
multi-dimensionality of other influences that
impact an ‘effect’ of interest, such as a
decrease in pressure or rise in temperature.
These are identified as part of the rCA
process for IIoT.
This approach has led to an area of causal
research from a dynamical systems
perspective. A dynamical system is one in
which a function describes the time
dependence of a point in a geometrical
space.27 28 29 30 A dynamical systems course
at Harvard states that the methods have a
focus on the behavior of systems described
by ordinary differential equations.
Application areas “…are diverse and
multidisciplinary, ranging over areas of
applied science and engineering, including
biology, chemistry, physics, finance, and
27 https://en.wikipedia.org/wiki/Dynamical_system
28 https://en.wikipedia.org/wiki/Dynamical_systems_theory
29 https://mathinsight.org/dynamical_system_idea
30 http://math.huji.ac.il/~mhochman/research-expo.html
31 https://scholar.harvard.edu/siams/am-147-nonlinear-dynamical-systems
industrial applied mathematics."31 This is a
fairly recent set of developments and
especially with respect to incorporating AI
and machine learning where these
algorithms can be applied to IIoT data at
scale.
Automating rCA in Industrial IoT
Applications
Although the rCA approach can be employed
on an ad-hoc basis by an analyst, the real
benefits come from automating the rCA AI
analysis as part of an IoT process. The rCA
function can be executed based on trigger
events such as data changes or exceptions.
The rCA software and algorithms are
embedded in the functions library of the
XMPro IoT Process platform for IIoT
applications.
Causal Analytics in IIoT – AI That Knows What Causes What, and When
- 12 - June 2018
In this example, event data is ingested from
their Honeywell® historian and
contextualized with asset data from their
IBM Maximo® EAM system. Further context
is provided from operational data stores.
This information is passed to the rCA Causal
Analytics AI function that creates the causal
coefficient matrix and other outputs
described later in the article.
This automated, process-based approach
ensures repeatability, consistency and that it
can be done at scale for a large number of
assets in a process stream. The automated
process can process and analyze much larger
volumes of IoT data than human RCA
analysts. In the FPSO example different
analyses are automated at different time
intervals such as daily for high impact
equipment and weekly or monthly for other
areas. This is configurable by the end users
and ad hoc analysis can also be performed.
CUSTOMER EXAMPLE: RCA IN OIL
AND GAS PROCESSING
Background to the Application of rCA in Oil
& Gas
The example demonstrates how rCA can
enable an FPSO to optimize production and
productivity as well as predict and avoid
incidents which threaten health, safety,
environment, community and financial
Figure 2: XMPro IoT Process Stream for rCA
Causal Analytics in IIoT – AI That Knows What Causes What, and When
IIC Journal of Innovation - 13 -
outcomes. The initial field study project was
aimed at three key objectives:
EFFICIENT OPERATIONS AND MAINTENANCE
This project will drive down the costs of
reduced or lost production caused by
unplanned failures. Other operational
efficiency gains will be achieved by reducing
the risks of environmental impact caused by
operational failure and the risks to personnel
safety caused by breaches of operational
standards. Furthermore, the costs of asset
maintenance will be reduced and the
capability of diagnosing asset health in
remote and challenging operating
environments is increased.
SAFETY AND SOCIAL LICENSE TO OPERATE
Equipment failure and/or an unsafe work
environment can potentially result in harm
to humans or the environment, ultimately
increasing operational risk and impacting an
organization’s social license to operate.
This solution will assist in providing a safe
production environment. In addition,
through to the inbuilt predictive analytics,
further eliminate operational risks which
could impact the social license to operate if
undetected and left uninvestigated and
unaddressed.
ENABLING EFFECTIVE COLLABORATION
Traditionally there exists a significant divide
between the operational technology (OT) in
heavy asset sectors like Oil & Gas and the
information technology (IT) arena. Not only
are they typically separated by physical,
geographical and network constraints, they
are also generally isolated philosophically.
The innovative solution and integrated
application suite enables interoperability of
data feeds from sensors and devices, with
the associated referential information from
the asset registry and maintenance
framework. It combines data from both IT
and OT and this new information provides
insights that can be shared collaboratively
between OT, IT and Operations. It makes
new levels of operational excellence,
collaboration and sustained productivity
improvements possible.
The project mirrored an upstream oil and gas
process flow including value-added services
at each stage of the supply chain leveraging
real-time IoT big data, machine learning and
artificial intelligence.
Going beyond the obvious elements that
cause an interruption to production, rCA is
used to find root causes and inter-
dependency which may be overlooked or
not realized with current technology. This
will enable the operations team onboard the
FPSO to keep it in production without
interruption for long periods and, when
down, to be repaired and brought on-stream
faster.
Most importantly, these improvements
reduce the risk of events that impact the
safety of all personnel on the FPSO and
protect the environment on the vessel and in
the geographic vicinity.
Project Background
The FPSO plant had experienced occasional
periods of operational instability. These
were largely unexplained, yet some
significant and costly problems resulted. It
was particularly challenging to identify the
Causal Analytics in IIoT – AI That Knows What Causes What, and When
- 14 - June 2018
actual cause(s) of the problems. Routine
correlational methods of analysis, such as
traditional RCA, had been applied but
provided the operators with only limited
assistance.
The project was conducted in 2 phases. In
the initial phase the FPSO operator wanted
to validate the algorithms through an initial
manual analysis before automating the
process in phase 2.
Au Sable’s rCA system used sensor-driven
and machine data covering a defined period
where these events occurred as the basis for
the analysis. The rCA analysis discovered
cause-effect relationships that were not
previously known by the engineers and that
helped to identify and address the real root
causes of the problem. The results of the rCA
analysis in phase 1 provided new insights
into causal relationships that were
previously not considered in the human
analysis process. This is of significant value
and benefit to the operations and
engineering teams of the Oil & Gas customer
operating the FPSO and it provided the
support to automate the process in phase 2.
The example data shown is that of the phase
1 analysis that provided the insights and
confidence in the output to support the
decision to automate the process for a larger
data set and additional equipment.
rCA on the FPSO
During phase 1 the analysis was done
manually to validate the model and establish
the workflow for the automated steps in
phase 2.
The analysis approach in phase 2 consists of
three main process steps:
Ingest data: Real-time feeds of operational data are collected from intelligent equipment (e.g. submersible pumps, etc.) and from the Honeywell Historian through XMPro’s Listener integration connectors that stream the data to the analysis step;
Perform analysis: The streaming data from the previous step is passed to the rCA algorithms in the XMPro rCA Functions connector where the analysis is performed; and
XMPro Action Agents provide reports and actions on the results of the analysis that identify real causes and not just symptoms of production outages.
Data from sensors at locations across the
plant operations were mapped to locations
on a process flow diagram (PFD) and are
shown as red circles in Figure 3. This provides
a familiar visual reference for the engineers
of the physical process and the data from the
different sources.
Causal Analytics in IIoT – AI That Knows What Causes What, and When
IIC Journal of Innovation - 15 -
The FPSO plant engineers provided one
month’s data at one-minute intervals from
sensors at the above locations. The rCA AI
algorithm processed the data to identify a
limited number of cause and effect
relationships between the equipment or
devices in Figure 3 where there is a high
causation coefficient. This is derived from a
proprietary causal effects matrix. The causal
effects matrix tends to be sparse with a
much small number than in a correlation
matrix, typically about 10-15% for the same
data.
The output from the causal effects matrix
that provided invaluable insight for the
engineers is a graph (Figure 4) that ranks
causal relationship based on causal
coefficient and the confidence level in the
causal relationship. It identifies those
relationships with high causality and high
confidence at a glance and engineers can use
this information to map it back to the
physical process.
It is easy to spot events that have a high
causal coefficient with high confidence
levels and focus on the meaning and impact
of these insights.
Figure 3: Locations of sensors (red circles) on a process diagram of plant operations
Causal Analytics in IIoT – AI That Knows What Causes What, and When
- 16 - June 2018
An enhanced PFD diagram, Figure 5, shows
the top casual relationships between the
equipment/devices overlaid on the process
flow diagram. This approach connects the
analytical model with the physical process
Figure 4: Chart of the top cause and effect relationships
Figure 5: Overlay of causal relationships on a process flow diagram – showing causal coefficient values plus
confidence levels in brackets
Causal Analytics in IIoT – AI That Knows What Causes What, and When
IIC Journal of Innovation - 17 -
model for the engineers who can interpret
the results.
This diagram shows, for example the causal
relationship and confidence level, in
brackets, of the condenser coil pressure and
the flash vessel pressure (nodes I and J) that
correspond to the chart in Figure 4.
The plant engineers were most interested in
examining the main relationships, those with
the strongest measure casual coefficients
and higher levels of confidence.
Figure 6 provides a causal coefficient view in
a traditional graph that removes the
contextual bias that an engineer may have.
This view enabled the engineers to see
causal relationships without the physical
process relationships. It triangulated some
of their findings and observations from the
chart and process flow diagram views.
Findings and Observations from the Phase 1
Analysis
The chart in Figure 4 provided the most
insight to the FPSO operators. The initial
objective to validate the rCA algorithm was
accomplished with physical evidence to
support the outputs of the rCA analysis. The
following three examples describe some of
that validation process.
The high casual coefficient of the condenser
coil pressure and the flash vessel pressure
made sense from an engineering perspective
as it is part of the design. It was expected to
have a high causal relationship and it did.
This proved that the rCA algorithm
performed as expected.
The causal relationship between the
condenser coil pressure and the dewpoint
(middle of the chart) was not obvious prior
Figure 6: The top cause and effect relationships shown as a directed graph
Causal Analytics in IIoT – AI That Knows What Causes What, and When
- 18 - June 2018
to the analysis and it turned out that it had
an impact on operations. It led to further
investigations by the engineering team.
The third example relates to the causal
relationship at the bottom of the chart. It has
a high causal coefficient, but a low
confidence level for ambient temperature to
measure dewpoint. In a physical, isolated
system the 2 should not be related from an
engineering perspective. One of the
engineers found the cause after
investigation. The dewpoint measurement
sensor is exposed to full sun when the vessel
sits in a certain orientation, which affected
the dewpoint measurement and led it to
track the ambient temperature as a measure
of exposure to sunlight. This in turn affected
the efficiency of operations as the rate of lift-
gas being dehydrated was affected.
The three examples provided enough
evidence to proceed with phase 2 to
automate the process for scheduled time
intervals and to include additional data
points. One of the main benefits of the
automated process is that the FPSO
engineers can perform the root cause
analysis without the assistance of a data
scientist to manually perform the analysis.
This is an important requirement for the
FPSO operator to deploy rCA at scale. The
nature of causal analytics requires a
complete re-analysis if any of the causes
found in previous analyses were addressed
or changed. There are often unintended
consequences of making changes that only
show up in a new analysis. These analyses
need to be performed in a consistent
manner which is facilitated by the
automation of the process. Phase 2 is
underway at the time of writing this article
and the benefits of the automation process
will described in a future article by the
authors.
The use case illustrated that an additional
dimension of understanding can be added to
the analysis and troubleshooting of
Industrial IoT problems. The results are
reported in the above formats and able to be
extended to additional desired forms of
output.
CONCLUSION
Most IoT platform vendors use traditional
statistical methods based on correlation and
regression for their analytics functionality.
These are proven to work but also have
limitations that restrict their applicability to
quickly and accurately identify causative
factors that did, or will, cause some effect.
This means the decision making may be sub-
optimal or even inaccurate.
rCA adds a different and complementary
dimension of IoT data analysis and decision
making. rCA is a breakthrough technology
that brings new methods for the application
of cause and effect analytics to real-world
industrial problems. By adding a new and
powerful dimension of analytics, it enables
users to refine their decision-making, reduce
risks, improve safety and reliability, and
reduce costs and equipment downtime.
The culmination of this “Reliable Causal
Analytics” approach is the ability for the
solution to enable operators to more
completely understand:
What caused what (and when)
What will cause what (and when)
Causal Analytics in IIoT – AI That Knows What Causes What, and When
IIC Journal of Innovation - 19 -
What could/should be done to avoid…
Which processes could/should be changed to improve going forward
Continuous improvement of processes
Future work and research is planned to
extend rCA to other use cases and IIoT
applications. Interested parties are invited to
submit use cases for consideration.
Return to IIC Journal of Innovation landing page for more articles and past editions.
The views expressed in the IIC Journal of Innovation are the contributing authors’ views and do
not necessarily represent the views of their respective employers nor those of the Industrial
Internet Consortium.
© 2018 The Industrial Internet Consortium logo is a registered trademark of Object Management
Group®. Other logos, products and company names referenced in this publication are property
of their respective companies.