This document serves to introduce
engineers to the workflow necessary to follow in order to analyze
multivariate process monitoring data via the mvMonitoring
package.
This is the accompanying package to the research published by Kazor et al (2016)
and Odom et al (2018).
The mvMonitoring
package is designed to make simulation of
multi-state multivariate process monitoring statistics easy and
straightforward, as well as streamlining the online process monitoring
component.
mvMonitoring
The mvMonitoring
package can be used to detect outliers
in a correlated multivariate process with non-linear, non-stationary, or
autocorrelated feature behavior. These process outliers are often
indicative of system fault. The common and naive approach to
multivariate process monitoring is to use expert opinion to identify a
few important features to monitor visually, and raise an alarm if these
features travel outside pre-defined normal operating boundaries.
However, this split univariate approach fails to account for the correlated nature of the process features, so some engineers have taken to monitoring the system as a single correlated multivariate process rather than a collection of independent univariate processes. In the literature review of the motivating papers, the authors cite how this approach has benefitted the science of process monitoring as a whole. Unfortunately, this approach has its own shortcomings – namely, monitoring a multivariate process in its original feature space can lead to exorbitant computational costs.
To this end, principal components analysis (PCA) and its many modifications have been employed to reduce the computation necessary to monitor a multivariate process as a whole. This package is an implementation of one such PCA modification.
AD-PCA: adaptive-dynamic PCA, which accounts for non-linearity, non-stationarity, and autocorrelation in the multivariate process. This modification is thoroughly discussed in Kazor et al (2016).
Multi-State Monitoring: a modification of PCA which accounts for multiple process states and models them separately. States can be any mutually exclusive blocking factor, and states do not necessarily follow a strict order.
Multi-State AD-PCA (MSAD-PCA) allows process engineers to account for a few very different process states. Specifically, this modification should be used when features under different process states have different means, correlations, or variances.
msp
FunctionsThese are brief descriptions of the external functions useful for multivariate statistical process monitoring via MSAD- or AD-PCA.
mspProcessData
This function generates three-dimensional multi-state or single-state non-linear, non-stationary, and autocorrelated process observations. We follow the seminal work of Dong and McAvoy (1996) for generation of the foundational stationary and independent features.
So that the simulated features have non-zero correlations, Dong and McAvoy created their three features all as polynomial functions of a single latent variable t(s), where s = 1, …, ω is the observational index of the process.
This t vector will be sinusoidal with period ω = 7 * 24 * 60 (signifying a weekly period in minute-level observations). We then synthesize a t with $$ t^*_s = -\cos\left( \frac{2\pi}{\omega} s \right) + \epsilon_s, $$ and scale t* to $$ t = \frac{(b - a)(t^*_s - \text{min}(t^*_s))}{\text{max}(t^*_s) - \text{min}(t^*_s)} + a. $$ Finally then, the t vector will lie entirely in [a, b].
First mspProcessData
simulates three features, with each
feature operating under k
different states. Let < xk(t), yk(t), zk(t)>
be the notation for State k.
These are the three features under State 1 (normal operating conditions,
or NOC) as three functions of t: where εi ∼ N(0, 0.01).
The mspProcessData
function calls the internal
processNOCdata
function to generate single-state
observations under normal operating conditions (NOC).
The multi-state feature expression is induced by rotation and scaling
of certain sets of observations. To induce a three-state, hourly
switching process (the default), the mspProcessData
function will create a label column that switches from “1” to “2” to “3”
every hour. State “1” will be the features generated under the
single-state assumption, while State “2” and State “3” are generated as
follows. These states will be scaled rotations of the current < x, y, z>
set. The second state is yaw, pitch, and roll rotated by (0, 90, 30)
degrees, and the scales are multiplied by (1, 0.5, 2). The third state
is yaw, pitch, and roll rotated by (90, 0, -30) degrees, and the scales
are multiplied by (0.25, 0.1, 0.75).
𝒮1: X(ts) := ⟨x(ts), y(ts), z(ts)⟩.
𝒮2: X(ts) := ⟨x(ts), y(ts), z(ts)⟩ ⋅ P1Λ1, where $$ \textbf{P}_1 = \begin{bmatrix} 0 & 0.50 & -0.87 \\ 0 & 0.87 & 0.50 \\ 1 & 0 & 0 \end{bmatrix} $$ is the orthogonal rotation matrix for a yaw, pitch and roll degree change of ⟨0∘, 90∘, 30∘⟩, and Λ1 = diag(1, 0.5, 2) is a diagonal scaling matrix.
𝒮3: X(ts) := ⟨x(ts), y(ts), z(ts)⟩ ⋅ P2Λ2, where $$ \textbf{P}_2 = \begin{bmatrix} 0 & 0.87 & -0.50 \\ -1 & 0 & 0 \\ 0 & 0.50 & 0.87 \end{bmatrix} $$ is the orthogonal rotation matrix for a yaw, pitch and roll degree change of ⟨90∘, 0∘, −30∘⟩, and Λ2 = diag(0.25, 0.1, 0.75) is a diagonal scaling matrix.
These rotation matrices P1 and P2 turn the
states in three-dimensional space so that the states are at right angles
to each other in at least one dimension, and the scaling matrices Λ1 and Λ2 inflate or
deflate the process variances along each principal component. The
mspProcessData
function calls the internal function
dataStateSwitch
which splits the observations by state and
applies the state-specific rotation and scaling through the internal
rotateScale3D
function.
Faults can be introduced to single- or multi-state data via the
mspProcessData
function. The default fault start index is
8500, or roughly 84% through the 10,080 observation cycle. These faults
are added through the internal faultSwitch
function.
Fault 1A is a positive shift to all three features before state rotation: X*(ts) = X(ts) + 2, s ≥ 8500.
Fault 1B is a positive shift to the x feature before state rotation: x*(ts) = x(ts) + 2, s ≥ 8500.
Fault 1C is a positive shift to the x and z features in State 3 only and after state rotation: x*(ts) = x(ts) + 2, z*(ts) = z(ts) + 2, s ≥ 8500.
Fault 2A is a positive drift across all the process monitoring features before state rotation: X*(ts) = X(ts) + (s − 8500) × 10−3, s > 8500.
Fault 2B is a positive drift across the y and z process monitoring features before state rotation: y*(ts) = y(ts) + (s − 8500) × 10−3, z*(ts) = z(ts) + (s − 8500) × 10−3, s > 8500.
Fault 2C is a negative drift in the y process monitoring feature in State 2 only and after state rotation: $y^*(t_s) = y(t_s) - 1.5 \times \frac{s - 8500}{10080 - 8500}$, for s > 8500.
Fault 3A is an amplification of the underlying latent variable t for all features. The maximum latent drift of this fault will be 5 + 1: X*(ts) = X(ts*), s > 8500, where $t_s^* = \left[\frac{5(s - 8500)}{\omega - 8500} + 1\right]t_s$.
Fault 3B is a mutation of the underlying latent variable t for the z feature: z*(ts) = z(log ts*), s ≥ 8500. This fault will dampen the underlying latent effect for z if ts > 1 and amplify this effect if ts < 1.
Fault 3C is a polynomial mutation of the error for the y feature in State 2 only and after state rotation: y*(ts) = y(ts) + 2 * e3(s) − 0.25, for s > 8500.
The mspProcessData
function can generate weeks’ worth of
non-linear, non-stationary, autocorrelated, multi-state, multivariate
process data useful to test new process monitoring techniques. Users can
generate observations under NOC to measure false alarm rates, or induce
one of nine pre-built faults to test detection time and consistency with
repeated Monte Carlo sampling. We expect this function will generate
interesting data useful enough to compare new and improved process
monitoring techniques with existing methods.
mspTrain
The mspTrain
function will generate projection matrices
and test statistic thresholds from test data matrices.
xts
Data MatrixAn xts
data matrix is first and foremost a matrix,
not a data frame. For users very familiar with data frame
manipulation (with dplyr
for instance), the slight but
profound differences between manipulating matrices and data frames
become quickly apparent. Because of the class requirements for matrices,
all features must be integer or double objects. The
mspTrain
function cannot train on character
information.
The xts
object class stands for extendable time
series and comes from the package xts
, which is itself
built on the package zoo
. The date and time information
(necessarily as POSIX
objects) are stored as the row
indices of xts
matrices. We recommend the package
lubridate
for manipulating POSIX
objects.
When implementing single-state AD-PCA, this class vector simply needs
to be a numeric column of the same value. However, for MSAD-PCA, the
class vector should be an integer vector of class membership. The
mspTrain
function will split the observations by the class
label, apply single-state AD-PCA to each class, then return the
class-specific projection matrices and thresholds. Because of this
split-apply-combine strategy, users must ensure that one or more classes
are not too “rare” – that is, the class-specific sample size should be
sufficiently large to allow for stable covariance matrix inversion. For
p features (including lags),
covariance inversion requires a class sample size greater than p2/2. Because of this,
pay close attention to model parsimony – don’t block the observations on
a factor unless the factor significantly affects the observations’ mean
vector or covariance matrix.
Because of the non-linear, non-stationary, and autocorrelated nature
of some process monitoring applications, the mspTrain
function allows users to include lags of all feature variables and also
to update the training window over time. Including lags of the features
in the data matrix can significantly reduce the negative effects of
autocorrelation. Re-estimating the projection matrix and test statistic
thresholds at pre-specified time intervals reduces the negative effects
of non-linearity and non-stationarity in the observations.
The idea is to divide a non-linear and non-stationary process along some boundaries (every day, for instance), and then the process becomes locally linear and stationary within the window. As time progresses, the oldest observations are “forgotten” and the newest observations are “learned”, which causes the projection and NOC thresholds to update over time.
After the observations have been split by class, the
mspTrain
function will call the internal function
processMonitor
, which in turn calls the internal function
faultFilter
. This function will calculate a linear
projection matrix of the data by taking the PCA of the training data
matrix. The observations will then be projected linearly into a
reduced-feature subspace which preserves a chosen proportion of the
energy of the training data, where the energy of a matrix is the sum of
eigenvalues of that matrix. The default proportion is 90%. This
projection is calculated by the internal pca
function.
Furthermore, non-parametric threshold values are calculated for the
two process monitoring statistics – Squared Prediction Error
and Hotelling’s T2. These monitoring
statistics are described in the motivating paper. These threshold values
are found in the internal threshold
function, and passed
through the function pipe to be returned by mspTrain
. The
α-level of the nonparametric
threshold is controlled by the user, and its default value is 0.001.
Finally, the mspTrain
function will remove any
observation which would cause an alarm from the training data set. The
alarm free observations will be returned in one xts
matrix,
while the alarmed observations will be returned by another. When
training your process monitor, pay attention to the observations flagged
as alarms. The proportion of observations flagged as faults may be
higher than the α-level
specified, so some tuning may be necessary.
mspMonitor
After training the model with mspTrain
, the projection
matrices and non-parametric monitoring statistic thresholds can be used
to flag incoming observations. This function can check every observation
in a test matrix (useful when analyzing past data), but was instead
designed to test a single incoming observation at a time via a script or
batch file. To this end, the mspMonitor
function projects a
single observation with the class projection matrices returned by
mspTrain
and checks the observation’s SPE and T2 statistics against the
thresholds returned by mspTrain
. The
mspMonitor
will then append the monitoring statistic
values, and indicators if these values exceed normal thresholds, to the
end of the observation row. This new appended observation will be passed
to the mspWarning
function.
mspWarning
If an observation returned by the mspMonitor
function
has positive statistic indicator values for either the SPE or T2 monitoring statistics,
then the mspWarning
function will query the last set r observations for other flags. If
all r observations are
positive for anomalies, then the mspWarning
function will
issue an alarm. This is the parameter “faultsToTriggerAlarm”, and it
defaults to 5. However, this default value depends heavily on the scale
of the data: for continuous observations aggregated and recorded every
five seconds, the number of sequential faults necessary to trigger an
alarm should be very high, perhaps even 50 or more. For observations
aggregated to the 10-minute-scale, only three sequential faults may be
necessary.
In future updates of this package, this function will also have an option to issue an alarm if a critical mass of non-sequential alarms is reached in a set period of observations. This modification may be necessary if the observation level becomes more fine than the 1-minute-level. Additionally, this function will also be equipped to take in a cell phone number and service provider and issue an alarm via SMS through email.
This section provides a fully-commented code walkthrough for the main
msp
functions in the mvMonitoring
package.
First begin by generating multi-state data from a fault scenario.
This code will yield observations under Fault 2A, as described the
Synthetic Fault Induction section. We choose the default options for the
period length (7 days * 24 hours * 60 minutes = 10,080 observations),
the starting index of the fault (8500 out of 10080), and the time stamp
for beginning the data is 16 May of 2016 at 10:00AM (my wedding
anniversary). As we can see from the str()
function, we
have an xts
matrix with 10080 rows and four columns (the
state indicator and the three features).
library(mvMonitoring)
fault1A_xts <- mspProcessData(faults = "A1",
period = 7 * 24 * 60,
faultStartIndex = 8500,
startTime = "2015-05-16 10:00:00 CST")
str(fault1A_xts)
## An xts object on 2015-05-16 10:00:00 / 2015-05-23 09:59:00 containing:
## Data: double [10080, 4]
## Columns: state, x, y, z
## Index: POSIXct,POSIXt [10080] (TZ: "")
Now that these observations are generated and stored in memory, the
mspTrain
function can train the MSAD-PCA model. The last
1620 observations (27 hours’ worth) will be saved for testing. The
mspTrain
function takes in the training data set
partitioned between the observation and the label column. If this
function errors, make sure the label column is not included in the data
matrix – this will cause a singularity in the data. The function will 1.
Train on the first three days’ worth of observations, as set by
trainObs
. 1. Scan the fourth day for anomalies, as set by
updateFreq
. 1. Remove any alarmed observations. 1. “Forget”
the first day’s observations. 1. “Learn” the non-alarmed observations
from the fourth day. 1. Retrain and repeat until the end of the data
matrix
Furthermore, the Dynamic = TRUE
option means that the
mspTrain
function will include the lags specified by the
lagsIncluded
argument. Finally, the number of sequential
anomalous observations necessary to raise an alarm is set at 5 by the
faultsToTriggerAlarm
argument. These last three arguments
are set to their defaults.
train1A_xts <- fault1A_xts[1:8461,]
# This function will run in 13 seconds on the author's machine.
train1A_ls <- mspTrain(data = train1A_xts[,-1],
labelVector = train1A_xts[,1],
trainObs = 3 * 24 * 60,
updateFreq = 1 * 24 * 60,
Dynamic = TRUE,
lagsIncluded = 0:1,
faultsToTriggerAlarm = 5)
The mspTrain
function returns a list of four
objects:
FaultChecks
: An xts
matrix of monitoring
statistics and associated indicators for all observations after the
burn-in of trainObs
. It will have 10080 -
trainObs
number of rows and five columns:
SPE
: the SPE statistic for each observation.SPE_Flag
: an indicator showing if the SPE statistic for
that observation is beyond the calculated threshold; 0 is normal, 1 is
flagged.T2
: the T2 statistic for each
observation.T2_Flag
: an indicator showing if the T2 statistic for that
observation is beyond the calculated threshold; 0 is normal, 1 is
flagged.Alarm
: an indicator showing if the observation is in a
sequence of flagged observations; 0 is normal, 1 is alarmed.Non_Alarmed_Obs
: An xts
matrix will all
observations with an alarm code of 0 from FaultChecks
. Of
note, this matrix contains the data, while the FaultChecks
matrix only contains the monitoring statistics and indicators.Alarms
: An xts
matrix of all the
observations removed from the training data matrix.TrainingSpecs
: a list with length equal to the number
of classes – in this case 3. For each class, this list contains a list
of six objects:
SPE_Threshold
: a named numeric scalar of the 1 - α percentile of the non-parametric
estimate of the SPE statistic density.T2_Threshold
: a named numeric scalar of the 1 - α percentile of the non-parametric
estimate of the T2
statistic density.projectionMatrix
: The p × q matrix of
eigenvectors necessary to project a p-dimensional observation to q dimensions. This is necessary to
reduce the dimension of any test observation, and is used in calculating
the SPE statistic for test observations.LambdaInv
: The inverse of the diagonal q × q matrix of
eigenvalues. This matrix is used to calculate the T2 statistic for test
observations.muTrain
: The mean vector of the training observations.
This is used to center the test observations on the training mean.RootPrecisTrain
: The p × p diagonal matrix of
the inverse square roots of the feature variances. This is used to scale
the test observations into the training scale.The training data summary was given by mspTrain
, so this
information can now be used to monitor incoming observations for system
faults.
First, concatenate the last given observation from the training set
as “row 0” of the test data set. This will enable
mspMonitor
to include lag-1 features. Similarly, one would
include the last k
observations of the training set should the process dictate the need for
any lag-k features. Because
the Fault Start Index was set to 8500, this testing window will show the
change point between observations generated under normal conditions and
those generated under a fault state.
library(xts)
test1A_xts <- fault1A_xts[8460:8520, -1]
lagTest1A_xts <- lag.xts(test1A_xts, 0:1)
lagTest1A_xts <- cbind(fault1A_xts[8461:8520,1],
lagTest1A_xts[-1,])
head(lagTest1A_xts)
## state x y z x.1 y.1
## 2015-05-22 07:00:00 1 0.3393199 -1.335053 0.5732896 0.2987526 0.07828365
## 2015-05-22 07:01:00 1 0.4892595 -1.139139 0.4075442 0.3393199 -1.33505297
## 2015-05-22 07:02:00 1 0.4115161 -1.055267 0.2896119 0.4892595 -1.13913919
## 2015-05-22 07:03:00 1 0.6143472 -1.349866 1.0040817 0.4115161 -1.05526728
## 2015-05-22 07:04:00 1 0.4201376 -1.260279 0.6206731 0.6143472 -1.34986628
## 2015-05-22 07:05:00 1 0.4960031 -1.164435 0.6951802 0.4201376 -1.26027918
## z.1
## 2015-05-22 07:00:00 0.2474009
## 2015-05-22 07:01:00 0.5732896
## 2015-05-22 07:02:00 0.4075442
## 2015-05-22 07:03:00 0.2896119
## 2015-05-22 07:04:00 1.0040817
## 2015-05-22 07:05:00 0.6206731
With the lagged test observations in the working environment, the
mspMonitor
function can be applied. This function
(similarly to mspTrain
) takes in the label information as a
separate argument from the input data. Further, the
mspMonitor
function takes in the TrainingSpecs
object returned in the results list from mspTrain
. Notice
that the first six rows of the monitor matrix are the exact same as the
first six rows of the lagged test matrix, except that the rows of the
monitor matrix have the monitoring statistic and corresponding indicator
columns appended.
monitor1A_xts <- mspMonitor(observations = lagTest1A_xts[,-1],
labelVector = lagTest1A_xts[,1],
trainingSummary = train1A_ls$TrainingSpecs)
head(monitor1A_xts)
## x y z x.1 y.1
## 2015-05-22 07:00:00 0.3393199 -1.335053 0.5732896 0.2987526 0.07828365
## 2015-05-22 07:01:00 0.4892595 -1.139139 0.4075442 0.3393199 -1.33505297
## 2015-05-22 07:02:00 0.4115161 -1.055267 0.2896119 0.4892595 -1.13913919
## 2015-05-22 07:03:00 0.6143472 -1.349866 1.0040817 0.4115161 -1.05526728
## 2015-05-22 07:04:00 0.4201376 -1.260279 0.6206731 0.6143472 -1.34986628
## 2015-05-22 07:05:00 0.4960031 -1.164435 0.6951802 0.4201376 -1.26027918
## z.1 SPE SPE_Flag T2 T2_Flag Alarm
## 2015-05-22 07:00:00 0.2474009 0.9453870 0 25.470972 0 NA
## 2015-05-22 07:01:00 0.5732896 1.8182191 0 7.679800 0 NA
## 2015-05-22 07:02:00 0.4075442 2.6248754 0 8.662934 0 NA
## 2015-05-22 07:03:00 0.2896119 0.5650673 0 7.912892 0 NA
## 2015-05-22 07:04:00 1.0040817 1.6029714 0 6.040021 0 NA
## 2015-05-22 07:05:00 0.6206731 1.7478715 0 7.231092 0 NA
Note that all Alarm
codes in the monitor matrix above
are recorded with NA
values. This is because the
mspMonitor
function does not check the sequential flag
conditions of the monitor matrix. This is the responsibility of the
mspWarning
function. Because the mspWarning
function is designed to test one incoming observation at a time through
a script or batch file, the following example is designed to mimic the
behavior of the mspWarning
function as each new observation
comes online.
alarm1A_xts <- monitor1A_xts
for(i in 1:nrow(alarm1A_xts)){
if(i < (5 + 1)){
alarm1A_xts[1:i,] <- mspWarning(alarm1A_xts[1:i,])
}else{
alarm1A_xts[(i - 5):i,] <- mspWarning(alarm1A_xts[(i - 5):i,])
}
}
The fault was introduced at index 8500, which corresponds to about 40 minutes into the test hour.
The alarm codes are
We have supplied our motivation for this package, and we have discussed implementing a multivariate process monitoring scheme with this package using the example of a decentralized WWT plant in Golden, CO. We believe that this software will provide system engineers with the tools necessary to quickly and accurately detect abnormalities in multivariate, autocorrelated, non-stationary, non-linear, and multi-state water treatment systems. Further, we have given a synthetic example showing how the functions within this package would be implemented and tuned in practice.