Title: | Multi-State Adaptive Dynamic Principal Component Analysis for Multivariate Process Monitoring |
---|---|
Description: | Use multi-state splitting to apply Adaptive-Dynamic PCA (ADPCA) to data generated from a continuous-time multivariate industrial or natural process. Employ PCA-based dimension reduction to extract linear combinations of relevant features, reducing computational burdens. For a description of ADPCA, see <doi:10.1007/s00477-016-1246-2>, the 2016 paper from Kazor et al. The multi-state application of ADPCA is from a manuscript under current revision entitled "Multi-State Multivariate Statistical Process Control" by Odom, Newhart, Cath, and Hering, and is expected to appear in Q1 of 2018. |
Authors: | Melissa Innerst [aut], Gabriel Odom [aut, cre], Ben Barnard [aut], Karen Kazor [aut], Amanda Hering [aut] |
Maintainer: | Gabriel Odom <[email protected]> |
License: | GPL-2 |
Version: | 0.2.4 |
Built: | 2024-11-22 04:37:03 UTC |
Source: | https://github.com/gabrielodom/mvmonitoring |
Split single-state process observations, apply multiple state projections, and combine these observations into a single data frame, arranged by process time or index.
dataStateSwitch( df, angles2 = list(yaw = 0, pitch = 90, roll = 30), scales2 = c(1, 0.5, 2), angles3 = list(yaw = 90, pitch = 0, roll = -30), scales3 = c(0.25, 0.1, 0.75) )
dataStateSwitch( df, angles2 = list(yaw = 0, pitch = 90, roll = 30), scales2 = c(1, 0.5, 2), angles3 = list(yaw = 90, pitch = 0, roll = -30), scales3 = c(0.25, 0.1, 0.75) )
df |
A data frame returned by processNOCdata() or faultSwitch(). |
angles2 |
Change the principal angles for State 2. |
scales2 |
Change the principal scales for State 2. |
angles3 |
Change the principal angles for State 3. |
scales3 |
Change the principal scales for State 3. |
This function splits a process data frame by state, and rotates and scales the observations from states 2 and 3 by the scales and angles specified in the function arguments. After state-specific rotation and scaling, this function combines the observations back together and orders them by process time index. This function takes in data frame returned by processNOCdata() or faultSwitch(). This function calls rotateScale3D() and is called internally by mspProcessData().
A data frame containing the time index, state, and feature values after state-specific rotation and scaling; this data frame also contains the other columns of df that aren't the feature values. This data frame has
a POSIX column of the time stamps for each observation
column of state membership (1, 2, or 3)
the process values for the first feature, corresponding to t + random error
the process values for the second feature, corresponding to t ^ 2 - 3 * t + random error
the process values for the third feature, corresponding to -t ^ 3 + 3 * t ^ 2 + random error
the non-stationary and autocorrelated latent feature
a Gaussian white noise vector
a Gaussian white noise vector
a Gaussian white noise vector
Calls: processNOCdata
, faultSwitch
,
rotateScale3D
. Called by: mspProcessData
nrml <- processNOCdata() dataStateSwitch(nrml)
nrml <- processNOCdata() dataStateSwitch(nrml)
Three-feature, three-state simulated process data including observations under normal operating conditions and observations after a positive shift for each feature in the system.
fault1A_xts
fault1A_xts
An xts data matrix with 10080 rows and four columns, corresponding to one week worth of data recorded at a 1-minute interval. The columns under normal conditions are defined in the help file for normal_switch_xts. The fault is a system shock to each of the three features by 2. The fault starts at row 8500, and the four columns under the fault state are defined here:
the state indicator for the multivariate system, with three levels
x(t) = t + 2 + error
y(t) = t ^ 2 - 3t + 2 + error
z(t) = - t ^ 3 + 3t ^ 2 + 2 + error
where t is a 10080-entry vector of autocorrelated and non-stationary hidden process realizations. The states alternate each hour and are defined as follows:
As presented
Rotated by (yaw = 0, pitch = 90, roll = 30) and scaled by (1 * x, 0.5 * y , 2 * z).
Rotated by (yaw = 90, pitch = 0, roll = -30) and scaled by (0.25 * x, 0.1 * y , 0.75 * z).
See the vignette for more details.
Simulated in R.
Three-feature, three-state simulated process data including observations under normal operating conditions and observations after a positive drift in values for each feature in the system.
fault2A_xts
fault2A_xts
An xts data matrix with 10080 rows and four columns, corresponding to one week worth of data recorded at a 1-minute interval. The columns under normal conditions are defined in the help file for normal_switch_xts. The fault is a drift on each feature by s / 10 ^ 3, where s is the observation index. The fault starts at row 8500, and the four columns under the fault state are defined here:
the state indicator for the multivariate system, with three levels
x(t) = t + drift + error
y(t) = t ^ 2 - 3t + drift + error
z(t) = - t ^ 3 + 3t ^ 2 + drift + error
where t is a 10080-entry vector of autocorrelated and non-stationary hidden process realizations. The states alternate each hour and are defined as follows:
As presented
Rotated by (yaw = 0, pitch = 90, roll = 30) and scaled by (1 * x, 0.5 * y , 2 * z).
Rotated by (yaw = 90, pitch = 0, roll = -30) and scaled by (0.25 * x, 0.1 * y , 0.75 * z).
See the vignette for more details.
Simulated in R.
Three-feature, three-state simulated process data including observations under normal operating conditions and observations after an amplification of the underlying process for each feature in the system.
fault3A_xts
fault3A_xts
An xts data matrix with 10080 rows and four columns, corresponding to one week worth of data recorded at a 1-minute interval. The columns under normal conditions are defined in the help file for normal_switch_xts. The fault is a signal amplification in the underlying determining t vector. The fault starts at row 8500, and the four columns under the fault state are defined here:
the state indicator for the multivariate system, with three levels
x(t_*) = t_* + error
y(t_*) = (t_*) ^ 2 - 3t + error
z(t_*) = - (t_*) ^ 3 + 3(t_*) ^ 2 + error
where t_* = 3 * t * (10080 - s) / (2 * 10080), where s is the observation index, and t is a 10080-entry vector of autocorrelated and non-stationary hidden process realizations. The states alternate each hour and are defined as follows:
As presented
Rotated by (yaw = 0, pitch = 90, roll = 30) and scaled by (1 * x, 0.5 * y , 2 * z).
Rotated by (yaw = 90, pitch = 0, roll = -30) and scaled by (0.25 * x, 0.1 * y , 0.75 * z).
See the vignette for more details.
Simulated in R.
Detect if a single multivariate observation is beyond normal operating conditions.
faultDetect(threshold_object, observation, ...)
faultDetect(threshold_object, observation, ...)
threshold_object |
An object of classes "threshold" and "pca" returned by the internal threshold() function. |
observation |
A single row of an xts data matrix (a 1 x p matrix) to compare against the thresholds |
... |
Lazy dots for additional internal arguments |
This function takes in a threshold object returned by the threshold() function and a single observation as a matrix or xts row. Internally, the function multiplies the observation by the projection matrix returned within the threshold object, calculates the SPE and T2 process monitoring statistics for that observation, and compares these statistics against their corresponding threshold values to determine if the observation lies outside the expected boundaries. The function then returns a row vector of the SPE test statistic, a logical indicator marking if this statistic is beyond the threshold, the Hotelling's T2 statistic, and an indicator if this statistic is beyond the threshold. Observations with monitoring statistics beyond the calculated threshold are marked with a 1, while observations within normal operating conditions are marked with a 0. These threshold values are passed from the threshold() function through this function via a returned threshold object. This object will be used in higher function calls.
This internal function is called by faultFilter().
A named 1 x 4 matrix with the following entries for the single row observation passed to this function:
the SPE statistic value
the SPE fault indicator, where 1 represents a flag and 0 marks that the observation is within the normal operating conditions
the T2 statistic value
the T2 fault indicator, defined the same as SPE_Flag
Called by faultFilter
and mspMonitor
.
nrml <- mspProcessData(faults = "NOC") scaledData <- scale(nrml[,-1]) pca_obj <- pca(scaledData) thresh_obj <- threshold(pca_object = pca_obj) faultDetect(threshold_object = thresh_obj, observation = scaledData[1,])
nrml <- mspProcessData(faults = "NOC") scaledData <- scale(nrml[,-1]) pca_obj <- pca(scaledData) thresh_obj <- threshold(pca_object = pca_obj) faultDetect(threshold_object = thresh_obj, observation = scaledData[1,])
Flag and filter out observations beyond normal operating conditions, then return the observations within normal operating conditions.
faultFilter(trainData, testData, updateFreq, faultsToTriggerAlarm = 5, ...)
faultFilter(trainData, testData, updateFreq, faultsToTriggerAlarm = 5, ...)
trainData |
An xts data matrix of initial training observations |
testData |
The data not included in the training data set |
updateFreq |
The number of observations from the test data matrix that must be returned to update the training data matrix and move it forward. |
faultsToTriggerAlarm |
Specifies how many sequential faults will cause an alarm to trigger. Defaults to 5. |
... |
Lazy dots for additional internal arguments |
This function is essentially a wrapper function to call and organize the output from these other internal functions: faultDetect(), threshold(), and pca(). It is applied over a rolling window, with observation width equal to updateFreq, of the larger full data matrix via the processMonitor() function, wherein the testing and training data sets move forward in time across the entire data matrix.
This internal function is called by processMonitor().
A list of class "fault_ls" with the following:
An xts flagging matrix with the same number of rows as "testData". This flag matrix has the following five columns:
The SPE statistic value for each observation in "testData". This statistic is defined as
where is the
observation vector,
is the reduced-feature projection of the
observation
, and
is the
projection matrix such that
.
A vector of SPE indicators recording 0 if the test statistic is less than or equal to the critical value passed through from the threshold object.
The T2 statistic value for each observation in "testData". This statistic is defined as
where is the reduced-
feature projection of the observation
, and
is the diagonal matrix of eigenvalues.
A vector of T2 fault indicators, defined like SPE_Flag.
A column indicating if there have been five flags in a row for either the SPE or T2 monitoring statistics or both. Alarm states are as follows: 0 = no alarm, 1 = Hotelling's T2 alarm, 2 = Squared Prediction Error alarm, and 3 = both alarms.
An xts matrix of the first updateFreq number of rows of the training data which were not alarmed.
The threshold object returned by the internal threshold() function. See the threshold() function's help file for more details.
Calls: pca
, threshold
,
faultDetect
. Called by: processMonitor
.
nrml <- mspProcessData(faults = "NOC") # Select the data under state 1 data <- nrml[nrml[,1] == 1] faultFilter(trainData = data[1:672, -1], testData = data[673:3360, -1], updateFreq = 336)
nrml <- mspProcessData(faults = "NOC") # Select the data under state 1 data <- nrml[nrml[,1] == 1] faultFilter(trainData = data[1:672, -1], testData = data[673:3360, -1], updateFreq = 336)
Infect the input data frame with a specific fault, then return the infected data frame.
faultSwitch( df, fault, period = 7 * 24 * 60, faultStartIndex = round(0.8433 * period), shift = 2, postStateSplit = FALSE )
faultSwitch( df, fault, period = 7 * 24 * 60, faultStartIndex = round(0.8433 * period), shift = 2, postStateSplit = FALSE )
df |
A data frame returned by the processNOCdata() function. |
fault |
A character string. Options are "NOC", "A1", "B1", "C1", "A2", "B2", "C2", "A3", "B3", or "C3". See "details" of mspProcessData() for more information. |
period |
The observation cycle length. Defaults to one week's worth of minute-level observations (10,080 observations). |
faultStartIndex |
An integer specifying the index at which the faults will start. Defaults to roughly 85 percent through the cycle. |
shift |
The fault parameter for faults "A1" and "B1" corresponding to the positive shock value added to features. Defaults to 2. See "details" of mspProcessData() for more information. |
postStateSplit |
Should we induce faults before or after state-splitting? Defaults to FALSE. Make this argument TRUE for faults 1C, 2C, 3C. |
The faults return data frames as follows:
A data frame with 10080 rows and five columns, corresponding by default to one week worth of data recorded at a 1-minute interval (as defined by the "period" argument of this function and the "increment" argument of the processNOCdata() function). The fault is a system shift to each of the three features by 2 (the "shift" argument). The fault starts at row 8500 (specified by the argument "faultStartIndex"), and the five columns under the fault state are defined here:
a POSIXct column
the state indicator for the multivariate system, with three levels when the argument "multiState" is TRUE and one level otherwise
x(t) = t + shift + error
y(t) = t ^ 2 - 3t + shift + error
z(t) = -t ^ 3 + 3t ^ 2 + shift + error
where t is a 10080-entry vector of autocorrelated and non-stationary hidden process realizations generated within the processNOCdata() function.
A matrix as defined in A1, but with x, y, and z feature columns defined as follows:
x(t) = t + shift + error
y(t) = t ^ 2 - 3t + error
z(t) = -t ^ 3 + 3t ^ 2 + error
A matrix as defined in A1, but with x, y, and z feature columns defined as follows:
x(t) = t + shift / 4 + error
y(t) = t ^ 2 - 3t + error
z(t) = -t ^ 3 + 3t ^ 2 + shift / 4 + error
This shift is applied only in State 3.
The fault is a drift on each feature by (s - faultStartIndex / 10 ^ 3, where s is the observation index. The fault starts at "faultStartIndex", and the x, y, and z feature columns are defined as follows:
x(t) = t + drift + error
y(t) = t ^ 2 - 3t + drift + error
z(t) = -t ^ 3 + 3t ^ 2 + drift + error
The fault is a drift a drift on the "y" and "z" feature by (s - faultStartIndex / 10 ^ 3, where s is the observation index. The fault starts at "faultStartIndex", and the x, y, and z feature columns are defined as follows:
x(t) = t + error
y(t) = t ^ 2 - 3t + drift + error
z(t) = -t ^ 3 + 3t ^ 2 + drift + error
The fault is a negative drift on the "y" feature by 1.5 * (s - faultStartIndex) / (period - faultStartIndex). Thus,
x(t) = t + error
y(t) = t ^ 2 - 3t - drift + error
z(t) = -t ^ 3 + 3t ^ 2 + error
This drift is applied only in State 2.
The fault is a signal amplification in the determining latent t vector. The fault starts at "faultStartIndex", and the x, y, and z features under the fault state are defined here:
x(t_*) = t_* + error
y(t_*) = (t_*) ^ 2 - 3t_* + error
z(t_*) = -(t_*) ^ 3 + 3(t_*) ^ 2 + error
where t_* = 5 x t x (period - s) / (period - faultStartIndex) and s is the observation index.
The fault is a signal amplification in the determining latent t vector for the "z" feature only. The fault starts at "faultStartIndex", and the x, y, and z features under the fault state are defined here:
x(t) = t + error
y(t) = (t) ^ 2 - 3t + error
z(t_*) = -(t_*) ^ 3 + 3(t_*) ^ 2 + error
where t_* = 3 x t x (period - s) / (2 x period) and s is the observation index.
This fault is a change in the error structure of feature "y". We let errorNew = 2 * error - 0.25, so that
x(t) = t + error
y(t) = t ^ 2 - 3t + errorNew
z(t) = -t ^ 3 + 3t ^ 2 + error
This new error structure is applied only in State 2.
A data frame with the same structure as df, but with faults induced across all observations. The mspProcessData() function then subsets the observations necessary to corrupt the normal data frame, and binds them together by row. This function is called by mspProcessData(). See ?mspProcessData for more details.
Called by: mspProcessData
.
nrml <- processNOCdata() faultSwitch(nrml, fault = "NOC")
nrml <- processNOCdata() faultSwitch(nrml, fault = "NOC")
This function plots the contribution value for each variable of a newly monitored observation and compares them to the contribution values of the training data.
mspContributionPlot( trainData, trainLabel, newData, newLabel, var.amnt, trainObs )
mspContributionPlot( trainData, trainLabel, newData, newLabel, var.amnt, trainObs )
trainData |
an xts data matrix containing the training observations |
trainLabel |
Class labels for the training data as a logical (two states only) or finite numeric (two or more states) vector or matrix column (not from a data frame) with length equal to the number of rows in “data." For data with only one state, this will be a vector of 1s. |
newData |
an xts data matrix containing the new observation |
newLabel |
the class label for the new observation |
var.amnt |
the energy proportion to preserve in the projection, which dictates the number of principal components to keep |
trainObs |
the number of observations upon which to train the algorithm. This will be split based on class information by a priori class membership proportions. |
A contribution plot and a list with the following items:
A list vectors containing the contribution values corresponding to each observation in the set of training observations.
The vector of contribution values associated with the new observation
## Not run: # Create some data dataA1 <- mspProcessData(faults = "B1") traindataA1 <- dataA1[1:8567,] # Train on the data that should be in control trainResults <- mspTrain(traindataA1[,-1], traindataA1[,1], trainObs = 4320) # Lag an out of control observation testdataA1 <- dataA1[8567:8568,-1] testdataA1 <- lag.xts(testdataA1,0:1) testdataA1 <- testdataA1[-1,] testdataA1 <- cbind(dataA1[8568,1],testdataA1) tD <- traindataA1[,-1] tL <- traindataA1[,1] nD <- testdataA1[,-1] nL <- testdataA1[,1] tO <- 4320 vA <- 0.95 mspContributionPlot(tD, tL, nD, nL, vA, tO) ## End(Not run)
## Not run: # Create some data dataA1 <- mspProcessData(faults = "B1") traindataA1 <- dataA1[1:8567,] # Train on the data that should be in control trainResults <- mspTrain(traindataA1[,-1], traindataA1[,1], trainObs = 4320) # Lag an out of control observation testdataA1 <- dataA1[8567:8568,-1] testdataA1 <- lag.xts(testdataA1,0:1) testdataA1 <- testdataA1[-1,] testdataA1 <- cbind(dataA1[8568,1],testdataA1) tD <- traindataA1[,-1] tL <- traindataA1[,1] nD <- testdataA1[,-1] nL <- testdataA1[,1] tO <- 4320 vA <- 0.95 mspContributionPlot(tD, tL, nD, nL, vA, tO) ## End(Not run)
Monitor and flag (if necessary) incoming multivariate process observations.
mspMonitor(observations, labelVector, trainingSummary, ...)
mspMonitor(observations, labelVector, trainingSummary, ...)
observations |
an n x p xts matrix. For real-time monitoring via a script within a batch file, n = 1, so this must be a 1 x p matrix. If lags were included at the training step, then these observations will also have lagged features. |
labelVector |
an n x 1 integer vector of class memberships |
trainingSummary |
the TrainingSpecs list returned by the mspTrain() function. This list contains—for each class—the SPE and T2 thresholds, as well the projection matrix. |
... |
Lazy dots for additional internal arguments |
This function is designed to be run at specific time intervals (e.g.every 10 seconds, 30 seconds, 1 minute, 5 minutes, 10 minutes) through a scheduled operating script which calls this function and mspWarning(). We expect this script to be set up in Windows "Task Scheduler" or Macintosh OX "launchd" application suites. This function takes in the specific observations to monitor and their class memberships (if any) and returns an xts matrix of these observation columns concatenated with their monitoring statistic values, flag statuses, and an empty alarm column. Users should then append these rows onto a previously existing matrix of daily observations. The mspWarning() function will then take in the daily observation xts matrix with updated rows returned by this function and check the monitoring statistic flag indicators to see if an alarm status has been reached. For further details, see the mspWarning() function.
This function calls the faultDetect() function, and requires the training information returned by the mspTrain function. This function will return the xts matrix necessary for the mspWarning() function.
An n x (p + 5) xts matrix, where the last five columns are:
the SPE statistic value for each observation in "observations"
a vector of SPE indicators recording 0 if the test statistic is less than or equal to the critical value passed through from the threshold object
the T2 statistic value for each observation in "observations"
a vector of T2 fault indicators, defined like SPE_Flag
a column indicating if there have been five flags in a row for either the SPE or T2 monitoring statistics or both. Alarm states are as follows: 0 = no alarm, 1 = Hotelling's T2 alarm, 2 = Squared Prediction Error alarm, and 3 = both alarms.
Calls: faultDetect
. Pipe flow: mspTrain
into mspMonitor
into mspWarning
.
## Not run: # cut down on R CMD check time nrml <- mspProcessData(faults = "NOC") n <- nrow(nrml) # Calculate the training summary, but save five observations for monitoring. trainResults_ls <- mspTrain(data = nrml[1:(n - 5), -1], labelVector = nrml[1:(n - 5), 1], trainObs = 4320) # While training, we included 1 lag (the default), so we will also lag the # observations we will test. testObs <- nrml[(n - 6):n, -1] testObs <- xts:::lag.xts(testObs, 0:1) testObs <- testObs[-1,] testObs <- cbind(nrml[(n - 5):n, 1], testObs) mspMonitor(observations = testObs[, -1], labelVector = testObs[, 1], trainingSummary = trainResults_ls$TrainingSpecs) ## End(Not run)
## Not run: # cut down on R CMD check time nrml <- mspProcessData(faults = "NOC") n <- nrow(nrml) # Calculate the training summary, but save five observations for monitoring. trainResults_ls <- mspTrain(data = nrml[1:(n - 5), -1], labelVector = nrml[1:(n - 5), 1], trainObs = 4320) # While training, we included 1 lag (the default), so we will also lag the # observations we will test. testObs <- nrml[(n - 6):n, -1] testObs <- xts:::lag.xts(testObs, 0:1) testObs <- testObs[-1,] testObs <- cbind(nrml[(n - 5):n, 1], testObs) mspMonitor(observations = testObs[, -1], labelVector = testObs[, 1], trainingSummary = trainResults_ls$TrainingSpecs) ## End(Not run)
Generate single- or multi-state observations under normal operating conditions or under fault conditions.
mspProcessData( faults, period = 7 * 24 * 60, faultStartIndex = round(0.8433 * period), startTime = "2015-05-16 10:00:00 CST", multiState = TRUE, angles2 = list(yaw = 0, pitch = 90, roll = 30), scales2 = c(1, 0.5, 2), angles3 = list(yaw = 90, pitch = 0, roll = -30), scales3 = c(0.25, 0.1, 0.75), adpcaTest = FALSE, msadpcaTest = FALSE, ... )
mspProcessData( faults, period = 7 * 24 * 60, faultStartIndex = round(0.8433 * period), startTime = "2015-05-16 10:00:00 CST", multiState = TRUE, angles2 = list(yaw = 0, pitch = 90, roll = 30), scales2 = c(1, 0.5, 2), angles3 = list(yaw = 90, pitch = 0, roll = -30), scales3 = c(0.25, 0.1, 0.75), adpcaTest = FALSE, msadpcaTest = FALSE, ... )
faults |
A character vector of faults chosen. Options are "NOC", "A1", "B1", "C1", "A2", "B2", "C2", "A3", "B3", "C3", or "All". See details for more information. |
period |
The observation cycle length. Defaults to one week's worth of minute-level observations (10,080 observations). |
faultStartIndex |
An integer specifying the index at which the faults will start. Defaults to roughly 85 percent through the cycle. |
startTime |
a POSIXct object specifying the day and time for the starting observation. |
multiState |
Should the observations be generated from a multi-state process? Defaults to TRUE. |
angles2 |
Change the principal angles for State 2. Defaults to yaw = 0, pitch = 90, and roll = 30. |
scales2 |
Change the principal scales for State 2. Defaults to 1, 0.5, and 2. |
angles3 |
Change the principal angles for State 3. Defaults to yaw = 90, pitch = 0, and roll = -30. |
scales3 |
Change the principal scales for State 3. Defaults to 0.25, 0.1, and 0.75. |
adpcaTest |
If "multiState" is TRUE, incorrectly label all the states the same. This should only be used to test AD-PCA performance under a true multi-state model. Defaults to FALSE. |
msadpcaTest |
If "multiState" is FALSE, incorrectly label all the states at random. This should only be used to test MSAD-PCA performance under a true single-state model. Defaults to FALSE. |
... |
Lazy dots for internal arguments |
For details on how the faults are induced, see the "details" of the faultSwitch() function. This function also includes AD-PCA versus MSAD-PCA treatment arm testing. There are four possibilities to test:
The true process has one state, and we correctly assume the true process has one state. In this case, AD-PCA and MSAD-PCA are exactly the same. Draw observations from this state by setting the "multiState" argument to FALSE. The "state" label will correctly mark each observation as from the same state.
The true process has one state, but we incorrectly assume the true process has multiple states. In this case, AD-PCA should outperform MSAD-PCA in false alarm rates and waiting time to the first alarm. Draw observations from this state by setting the "multiState" argument to FALSE and the "msadpcaTest" argument to TRUE. The "state" label will be contain randomly generated state values (1, 2, and 3 are all equally likely) for each observation.
The true process has multiple states, but we incorrectly assume the true process has one single states. In this case, MSAD-PCA should outperform AD-PCA in false alarm rates and waiting time to the first alarm. Draw observations from this state by setting the "multiState" argument to TRUE and the "adpcaTest" argument to TRUE. The "state" label will be identical for each observation.
The true process has multiple states, and we correctly assume the true process has multiple states. In this case, MSAD-PCA should outperform AD-PCA in false alarm rates and waiting time to the first alarm. Draw observations from this state by setting the "multiState" argument to TRUE. The "state" label will correctly mark each observation as from the same state.
A list of data frames named with the names of the given faults with the following information:
A POSIXct column of times starting at the user- defined 'startTime' argument, length given by the 'period' argument, and spacing given by the 'increment' argument. For example, if the starting value is "2016-01-10", period is 10080, and the incrementation is in minutes, then this sequence will be one week's worth of observations recorded every minute from midnight on the tenth of January.
An integer column of all 1's (when the 'multiState' argument is FALSE), or a column of the state values (1, 2 or 3).
If either adpcaTest or msadpcaTest are TRUE, this column will contain incorrect state information used for testing the different treatment arms against their respective controls.
A double column of generated values for the first feature.
A double column of generated values for the second feature.
A double column of generated values for the third feature.
If the user only specifies one fault, then this function will return the single xts matrix, instead of a list of one matrix. For details on how these features are defined, see the "details" of the processNOCdata() function.
Calls: processNOCdata
, faultSwitch
,
dataStateSwitch
. Simulation pipe flow: mspProcessData
into mspTrain
into mspMonitor
into
mspWarning
.
## Not run: # cut down on R CMD check time mspProcessData(faults = "All") ## End(Not run)
## Not run: # cut down on R CMD check time mspProcessData(faults = "All") ## End(Not run)
Plots a variation of the squared prediction error (SPE) statistic to visualize the contribution of each variable to a fault.
mspSPEPlot( trainData, trainLabel, trainSPE, newData, newLabel, newSPE, trainObs, var.amnt )
mspSPEPlot( trainData, trainLabel, trainSPE, newData, newLabel, newSPE, trainObs, var.amnt )
trainData |
an xts data matrix containing the training observations |
trainLabel |
Class labels for the training data as a logical (two states only) or finite numeric (two or more states) vector or matrix column (not from a data frame) with length equal to the number of rows in “data." For data with only one state, this will be a vector of 1s. |
trainSPE |
the SPE values corresponding to the newLabel state calculated by mspTrain using the full training data with all variables included |
newData |
an xts data matrix containing the new observation |
newLabel |
the class label for the new observation |
newSPE |
the SPE value returned by mspMonitor using the full new observation with all variables included |
trainObs |
the number of observations upon which to train the algorithm. This will be split based on class information by a priori class membership proportions. |
var.amnt |
the energy proportion to preserve in the projection, which dictates the number of principal components to keep |
## Not run: # Create some data dataA1 <- mspProcessData(faults = "B1") traindataA1 <- dataA1[1:8567,] # Train on the data that should be in control trainResults <- mspTrain(traindataA1[,-1], traindataA1[,1], trainObs = 4320) # Lag an out of control observation testdataA1 <- dataA1[8567:8568,-1] testdataA1 <- lag.xts(testdataA1,0:1) testdataA1 <- testdataA1[-1,] testdataA1 <- cbind(dataA1[8568,1],testdataA1) # Monitor this observation monitorResults <- mspMonitor(observations = testdataA1[,-1], labelVector = testdataA1[,1], trainingSummary = trainResults$TrainingSpecs) tD <- traindataA1[,-1] tL <- traindataA1[,1] nD <- testdataA1[,-1] nL <- testdataA1[,1] tO <- trainObs vA <- 0.95 nSPE <- monitorResults$SPE tSPE <- trainResults$TrainingSpecs[[nL]]$SPE mspSPEPlot(tD,tL,tSPE,nD,nL,nSPE,tO,vA) ## End(Not run)
## Not run: # Create some data dataA1 <- mspProcessData(faults = "B1") traindataA1 <- dataA1[1:8567,] # Train on the data that should be in control trainResults <- mspTrain(traindataA1[,-1], traindataA1[,1], trainObs = 4320) # Lag an out of control observation testdataA1 <- dataA1[8567:8568,-1] testdataA1 <- lag.xts(testdataA1,0:1) testdataA1 <- testdataA1[-1,] testdataA1 <- cbind(dataA1[8568,1],testdataA1) # Monitor this observation monitorResults <- mspMonitor(observations = testdataA1[,-1], labelVector = testdataA1[,1], trainingSummary = trainResults$TrainingSpecs) tD <- traindataA1[,-1] tL <- traindataA1[,1] nD <- testdataA1[,-1] nL <- testdataA1[,1] tO <- trainObs vA <- 0.95 nSPE <- monitorResults$SPE tSPE <- trainResults$TrainingSpecs[[nL]]$SPE mspSPEPlot(tD,tL,tSPE,nD,nL,nSPE,tO,vA) ## End(Not run)
This function separates the data into k subsets, one for each of the k states, containing the subset of the original variables that are of interest for a given state.
mspSubset( data, labelVector = rep(1, nrow(data)), subsetMatrix = matrix(TRUE, nrow = length(unique(labelVector)), ncol = ncol(data)) )
mspSubset( data, labelVector = rep(1, nrow(data)), subsetMatrix = matrix(TRUE, nrow = length(unique(labelVector)), ncol = ncol(data)) )
data |
An xts data matrix |
labelVector |
Class labels as a logical (two states only) or finite numeric (two or more states) vector or matrix column (not from a data frame) with length equal to the number of rows in "data." For data with only one state, this will be a vector of 1s. |
subsetMatrix |
A matrix of logicals with number of rows equal to the number of states and number of columns equal to the number of columns in data. The i,j entry in the matrix indicates whether or not to monitor the jth variable in the ith state. |
This function is designed to be used in conjunction with mspTrain
and to allow the user to monitor a different subset of the variables during each state.
A list with the following components:
Class1Data -- |
an xts data matrix containing the subset of the state 1 data. |
Class2Data -- |
an xts data matrix containing the subset of the state 2 data. |
Class3Data -- |
an xts data matrix containing the subset of the state 3 data. |
nrml <- mspProcessData(faults = "NOC") sub1 <- c(TRUE,TRUE,FALSE) sub2 <- c(TRUE,FALSE,TRUE) sub3 <- c(TRUE,FALSE,FALSE) submatrix <- t(matrix(c(sub1,sub2,sub3),nrow=3,ncol=3)) subsets <- mspSubset(data = nrml[,-1], labelVector = nrml[,1], subsetMatrix = submatrix)
nrml <- mspProcessData(faults = "NOC") sub1 <- c(TRUE,TRUE,FALSE) sub2 <- c(TRUE,FALSE,TRUE) sub3 <- c(TRUE,FALSE,FALSE) submatrix <- t(matrix(c(sub1,sub2,sub3),nrow=3,ncol=3)) subsets <- mspSubset(data = nrml[,-1], labelVector = nrml[,1], subsetMatrix = submatrix)
Plots a variation of the Hotelling's T-squared statistic to visualize the contribution of each variable to a fault.
mspT2Plot( trainData, trainLabel, trainT2, newData, newLabel, newT2, trainObs, var.amnt )
mspT2Plot( trainData, trainLabel, trainT2, newData, newLabel, newT2, trainObs, var.amnt )
trainData |
an xts data matrix containing the training observations |
trainLabel |
Class labels for the training data as a logical (two states only) or finite numeric (two or more states) vector or matrix column (not from a data frame) with length equal to the number of rows in “data." For data with only one state, this will be a vector of 1s. |
trainT2 |
the Hotelling's T-squared values corresponding to the newLabel state calculated by mspTrain using the full training data with all variables included |
newData |
an xts data matrix containing the new observation |
newLabel |
the class label for the new observation |
newT2 |
the Hotelling's T-squared value returned by mspMonitor using the full new observation with all variables included |
trainObs |
the number of observations upon which to train the algorithm. This will be split based on class information by a priori class membership proportions. |
var.amnt |
the energy proportion to preserve in the projection, which dictates the number of principal components to keep |
## Not run: # Create some data dataA1 <- mspProcessData(faults = "B1") traindataA1 <- dataA1[1:8567,] # Train on the data that should be in control trainResults <- mspTrain(traindataA1[,-1], traindataA1[,1], trainObs = 4320) # Lag an out of control observation testdataA1 <- dataA1[8567:8568,-1] testdataA1 <- lag.xts(testdataA1,0:1) testdataA1 <- testdataA1[-1,] testdataA1 <- cbind(dataA1[8568,1],testdataA1) # Monitor this observation monitorResults <- mspMonitor(observations = testdataA1[,-1], labelVector = testdataA1[,1], trainingSummary = trainResults$TrainingSpecs) tD <- traindataA1[,-1] tL <- traindataA1[,1] nD <- testdataA1[,-1] nL <- testdataA1[,1] tO <- 4320 vA <- 0.95 nT2 <- monitorResults$T2 tT2 <- trainResults$TrainingSpecs[[nL]]$T2 mspT2Plot(tD,tL,tT2,nD,nL,nT2,tO,vA) ## End(Not run)
## Not run: # Create some data dataA1 <- mspProcessData(faults = "B1") traindataA1 <- dataA1[1:8567,] # Train on the data that should be in control trainResults <- mspTrain(traindataA1[,-1], traindataA1[,1], trainObs = 4320) # Lag an out of control observation testdataA1 <- dataA1[8567:8568,-1] testdataA1 <- lag.xts(testdataA1,0:1) testdataA1 <- testdataA1[-1,] testdataA1 <- cbind(dataA1[8568,1],testdataA1) # Monitor this observation monitorResults <- mspMonitor(observations = testdataA1[,-1], labelVector = testdataA1[,1], trainingSummary = trainResults$TrainingSpecs) tD <- traindataA1[,-1] tL <- traindataA1[,1] nD <- testdataA1[,-1] nL <- testdataA1[,1] tO <- 4320 vA <- 0.95 nT2 <- monitorResults$T2 tT2 <- trainResults$TrainingSpecs[[nL]]$T2 mspT2Plot(tD,tL,tT2,nD,nL,nT2,tO,vA) ## End(Not run)
This function performs Multi-State Adaptive-Dynamic PCA on a data set with time-stamped observations.
mspTrain( data, labelVector, trainObs, updateFreq = ceiling(0.5 * trainObs), Dynamic = TRUE, lagsIncluded = c(0, 1), faultsToTriggerAlarm = 5, ... )
mspTrain( data, labelVector, trainObs, updateFreq = ceiling(0.5 * trainObs), Dynamic = TRUE, lagsIncluded = c(0, 1), faultsToTriggerAlarm = 5, ... )
data |
An xts data matrix |
labelVector |
Class labels as a logical (two states only) or finite numeric (two or more states) vector or matrix column (not from data frame) with length equal to the number of rows in "data". For data with only one state, this will be a vector of 1s. |
trainObs |
The number of observations upon which to train the algorithm. This will be split based on class information by a priori class membership proportion. |
updateFreq |
The algorithm update frequency. Defaults to half as many observations as the training frequency. |
Dynamic |
Specify if the PCA algorithm should include lagged variables. Defaults to TRUE. |
lagsIncluded |
A vector of lags to include. If Dynamic = TRUE, specify which lags to include. Defaults to c(0, 1), signifying that the Dynamic process observations will include current observations and observations from one time step previous. See "Details" for more information. |
faultsToTriggerAlarm |
The number of sequential faults needed to trigger an alarm. Defaults to 5. |
... |
Lazy dots for additional internal arguments |
This function is designed to identify and sort out sequences of observations which fall outside normal operating conditions. We assume that the process data are time-dependent in both seasonal and non-stationary effects (which necessitate the Adaptive and Dynamic components, respectively). We further assume that this data is drawn from a multivariate process under multiple mutually exclusive states, implying that the linear dimension reduction projection matrices may be different for each state. Therefore, in summary, this function lags the features to account for correlation between sequential observations, splits the data by classes, and re-estimates projection matrices on a rolling window to account for seasonality. Further, this function uses non-parametric density estimation to calculate the 1 - alpha quantiles of the SPE and Hotelling's T2 statistics from a set of training observations, then flags any observation in the testing data set with process monitoring statistics beyond these calculated critical values. Because of natural variability inherent in all real data, we do not remove observations simply because they are have been flagged as outside normal operating conditions. This function records an alarm only for observations having five flags in a row, as set by the default argument value of "faultsToTriggerAlarm". These alarm-positive observations are then removed from the data set and held in a separate xts matrix for inspection.
Concerning the lagsIncluded variable: the argument lagsIncluded = c(0,1) will column concatenate the current data with the same data from one discrete time step back. This will necessarily remove the first row of the data matrix, as we will have NA values under the lagged features. The argument lagsIncluded = 0:2 will column concatenate the current observations with the observations from one step previous and the observations from two steps previous, which will necessarily require the removal of the first two rows of the data matrix. To include only certain lags with the current data, specify lagsIncluded = c(0, lag_1, lag_2, ..., lag_K). This induce NA values in the first max(lag_k) rows, for k = 1, ..., K, and these rows will be removed from consideration. From the lag.xts() function helpfile: "The primary motivation for having methods specific to xts was to make use of faster C-level code within xts. Additionally, it was decided that lag's default behavior should match the common time-series interpretation of that operator — specifically that a value at time ‘t’ should be the value at time ‘t-1’ for a positive lag. This is different than lag.zoo() as well as lag.ts()."
Of note when considering performance: the example has 10080 rows on three features alternating between three states, and trains on 20 percent of the observations, while updating every 1008 (10 percent) observation. On a 2016 Macbook Pro with 16Gb of RAM, this example function call takes 15 second to run. Increasing the update frequency will decrease computation time, but may increase false alarm rates or decrease flagging accuracy. We recommend that you set the update frequency based on the natural and physical designs of your system. For example, if your system has a multi-state process which switches across one of four states every two hours, then test the update frequency at an eight or 12 hour level — enough observations to measure two to three full cycles of the switching process. For observations recorded every five minutes, try updateFreq = (60 / 5) * 8 = 96 or (60 / 5) * 12 = 144.
This user-facing function calls the processMonitor() function, and returns the training arguments necessary to call the mspMonitor() and mspWarning() functions.
For more details, see Kazor et al (2016):
A list with the following components:
an xts flagging matrix with the same number of rows as "data". This flag matrix has the following five columns:
the SPE statistic value for each observation in "data"
a vector of SPE indicators recording 0 if the test statistic is less than or equal to the critical value passed through from the threshold object
the T2 statistic value for each observation in "data"
a vector of T2 fault indicators, defined like SPE_Flag
a column indicating if there have been five flags in a row for either the SPE or T2 monitoring statistics or both. Alarm states are as follows: 0 = no alarm, 1 = Hotelling's T2 alarm, 2 = Squared Prediction Error alarm, and 3 = both alarms.
an xts data matrix of all the non-alarmed observations
an xts data matrix of the features and specific alarms for Alarmed observations with the alarm codes are listed above
a list of k lists, one for each class, with each list containing the specific threshold object returned by the internal threshold() function for that class. See the threshold() function's help file for more details.
Calls: processMonitor
. Pipe flow: mspTrain
into mspMonitor
into mspWarning
.
## Not run: # cut down on R CMD check time nrml <- mspProcessData(faults = "NOC") mspTrain(data = nrml[, -1], labelVector = nrml[, 1], trainObs = 4320) ## End(Not run)
## Not run: # cut down on R CMD check time nrml <- mspProcessData(faults = "NOC") mspTrain(data = nrml[, -1], labelVector = nrml[, 1], trainObs = 4320) ## End(Not run)
Trigger an alarm, if necessary, for incoming multivariate process observations.
mspWarning(mspMonitor_object, faultsToTriggerAlarm = 5)
mspWarning(mspMonitor_object, faultsToTriggerAlarm = 5)
mspMonitor_object |
An xts matrix returned by the mspMonitor() function |
faultsToTriggerAlarm |
Specifies how many sequential faults will cause an alarm to trigger. Defaults to 5. |
This function and the mspMonitor() function are designed to be ran via a scheduled task through Windows "Task Scheduler" or Macintosh OX "launchd" application suites. The file flow is as follows: at each time interval, run the mspMonitor() function on the matrix of daily observations to add a flag status to the most recent incoming observation in the matrix, and return this new xts matrix. Then, pass this updated daily observation matrix to the mspWarning() function, which will check if the process has recorded five or more sequential monitoring statistic flags in a row. Of note, because these functions are expected to be repeatedly called in real time, this function will only check for an alarm within the last row of the xts matrix. To check multiple rows for an alarm state, please use the mspTrain() function, which was designed to check multiple past observations.
This function requires an xts matrix returned by the mspMonitor() function.
An xts matrix of the same dimensions as mspMonitor_object, with a recorded negative or positive and type-specific alarm status. Alarm codes are: 0 = no alarm, 1 = Hotelling's T2 alarm, 2 = Squared Prediction Error alarm, and 3 = both alarms.
Pipe flow: mspTrain
into mspMonitor
into mspWarning
.
## Not run: # cut down on R CMD check time nrml <- mspProcessData(faults = "NOC") n <- nrow(nrml) # Calculate the training summary, but save five observations for monitoring. trainResults_ls <- mspTrain(data = nrml[1:(n - 5), -1], labelVector = nrml[1:(n - 5), 1], trainObs = 4320) # While training, we included 1 lag (the default), so we will also lag the # observations we will test. testObs <- nrml[(n - 6):n, -1] testObs <- xts:::lag.xts(testObs, 0:1) testObs <- testObs[-1,] testObs <- cbind(nrml[(n - 5):n, 1], testObs) # Run the monitoring function. dataAndFlags <- mspMonitor(observations = testObs[, -1], labelVector = testObs[, 1], trainingSummary = trainResults_ls$TrainingSpecs) # Alarm check the last row of the matrix returned by the mspMonitor # function mspWarning(dataAndFlags) ## End(Not run)
## Not run: # cut down on R CMD check time nrml <- mspProcessData(faults = "NOC") n <- nrow(nrml) # Calculate the training summary, but save five observations for monitoring. trainResults_ls <- mspTrain(data = nrml[1:(n - 5), -1], labelVector = nrml[1:(n - 5), 1], trainObs = 4320) # While training, we included 1 lag (the default), so we will also lag the # observations we will test. testObs <- nrml[(n - 6):n, -1] testObs <- xts:::lag.xts(testObs, 0:1) testObs <- testObs[-1,] testObs <- cbind(nrml[(n - 5):n, 1], testObs) # Run the monitoring function. dataAndFlags <- mspMonitor(observations = testObs[, -1], labelVector = testObs[, 1], trainingSummary = trainResults_ls$TrainingSpecs) # Alarm check the last row of the matrix returned by the mspMonitor # function mspWarning(dataAndFlags) ## End(Not run)
The mvMonitoring
package has four main functions for
external use, all of which begin with the string "msp" (for "multivariate
statistical process") followed by the function use. Functions without this
"msp" key are primarily internal functions. They are available to see and
use, but will largely be unnecessary to call in common workflows.
mvMonitoring
external functionsmspProcessData
- A function for synthetic process data generation. Use this data to test new process monitoring methods.
mspTrain
- A function to take in observations for training under normal conditions, and to return the training summary from these observations.
mspMonitor
- A function to take in real-time
process observations and detect system anomalies based on the training
summary returned by mspTrain
.
mspWarning
- A function to take in observations
returned by mspMonitor
and check for alarms by
measuring sequential anomalies. This function will also be equipped to
send SMS notifications to process technicians in future versions.
Three-feature, three-state simulated process data under normal operating conditions as example data for different included functions.
normal_switch_xts
normal_switch_xts
An xts data matrix with 10080 rows and four columns, corresponding to one week worth of data recorded at a 1-minute interval, and four columns as defined here:
the state indicator for the multivariate system, with three levels
x(t) = t + error
y(t) = t ^ 2 - 3t + error
z(t) = - t ^ 3 + 3t ^ 2 + error
where t is a 10080-entry vector of autocorrelated and non-stationary hidden process realizations. The states alternate each hour and are defined as follows:
As presented
Rotated by (yaw = 0, pitch = 90, roll = 30) and scaled by (1 * x, 0.5 * y , 2 * z).
Rotated by (yaw = 90, pitch = 0, roll = -30) and scaled by (0.25 * x, 0.1 * y , 0.75 * z).
See the vignette for more details.
Simulated in R.
Data from the SM-MBR Bioreactor system over 12 hours. This data will be used for testing the mvMonitoring package.
oneDay_clean
oneDay_clean
An xts matrix of 75 rows and 35 features recorded over 2017-01-27 at 00:10 to 2017-01-27 at 12:30.
Kathryn Newhart
Calculate the principal component analysis for a data matrix, and also find the squared prediction error (SPE) and Hotelling's T2 test statistic values for each observation in this data matrix.
pca(data, var.amnt = 0.9, ...)
pca(data, var.amnt = 0.9, ...)
data |
A centred-and-scaled data matrix or xts matrix |
var.amnt |
The energy proportion to preserve in the projection, which dictates the number of principal components to keep. Defaults to 0.90. |
... |
Lazy dots for additional internal arguments |
This function takes in a training data matrix, without the label column, and the energy preservation proportion, which defaults to 95 percent per Kazor et al (2016). This proportion is the sum of the q largest eigenvalues divided by the sum of all p eigenvalues, where q is the number of columns of the p x q projection matrix P. This function then returns the projection matrix P, a diagonal matrix of the reciprocal eigenvalues (LambdaInv), a vector of the SPE test statistic values corresponding to the rows of the data matrix, and a T2 test statistic vector similar to the SPE vector.
This internal function is called by faultFilter().
A list of class "pca" with the following:
the q eigenvectors corresponding to the q largest eigenvalues as a p x q projection matrix
the diagonal matrix of inverse eigenvalues
the vector of SPE test statistic values for each of the n observations contained in "data"
the vector of Hotelling's T2 test statistic for each of the same n observations
Called by: faultFilter
.
nrml <- mspProcessData(faults = "NOC") scaledData <- scale(nrml[,-1]) pca(scaledData)
nrml <- mspProcessData(faults = "NOC") scaledData <- scale(nrml[,-1]) pca(scaledData)
Apply Adaptive-Dynamic PCA to state-specific data matrices.
processMonitor( data, trainObs, updateFreq = ceiling(0.5 * trainObs), faultsToTriggerAlarm = 5, ... )
processMonitor( data, trainObs, updateFreq = ceiling(0.5 * trainObs), faultsToTriggerAlarm = 5, ... )
data |
An xts data matrix |
trainObs |
The number of training observations to be used |
updateFreq |
The number of non-flagged observations to collect before the function updates. Defaults to half as many observations as the number of training observations. |
faultsToTriggerAlarm |
The number of sequential faults needed to trigger an alarm. Defaults to 5. |
... |
Lazy dots for additional internal arguments |
This function is the class-specific implementation of the Adaptive- Dynamic PCA described in the details of the mspTrain() function. See the mspTrain() function's help file for further details.
This internal function is called by mspTrain(). This function calls the faultFilter() function.
A list with the following components:
a class-specific xts flagging matrix with the same number of rows as "data". This flag matrix has the following five columns:
the SPE statistic value for each observation in "data"
a vector of SPE indicators recording 0 if the test statistic is less than or equal to the critical value passed through from the threshold object
the T2 statistic value for each observation in "data"
a vector of T2 fault indicators, defined like SPE_Flag
a column indicating if there have been five flags in a row for either the SPE or T2 monitoring statistics or both. Alarm states are as follows: 0 = no alarm, 1 = Hotelling's T2 alarm, 2 = Squared Prediction Error alarm, and 3 = both alarms.
a class-specific xts data matrix of all the non-alarmed observations (observations with alarm state equal to 0)
a class-specific xts data matrix of the features and specific alarms of Alarmed observations, where the alarm codes are listed above
a threshold object returned by the internal threshold() function. See the threshold() function's help file for more details.
Calls: faultFilter
. Called by: mspTrain
.
nrml <- mspProcessData(faults = "NOC") data <- nrml[nrml[,1] == 1] processMonitor(data = data[,-1], trainObs = 672)
nrml <- mspProcessData(faults = "NOC") data <- nrml[nrml[,1] == 1] processMonitor(data = data[,-1], trainObs = 672)
This function generates data under normal operating conditions from a single-state or multi-state process model.
processNOCdata( startTime = "2015-05-16 10:00:00 CST", period = 7 * 24 * 60, stateDuration = 60, increment = "min", multiState = TRUE, autocorellation = 0.75, tLower = 0.01, tUpper = 2, errVar = 0.01 )
processNOCdata( startTime = "2015-05-16 10:00:00 CST", period = 7 * 24 * 60, stateDuration = 60, increment = "min", multiState = TRUE, autocorellation = 0.75, tLower = 0.01, tUpper = 2, errVar = 0.01 )
startTime |
a POSIXct object specifying the day and time for the starting observation. |
period |
The observation cycle length. Defaults to one week's worth of minute-level observations (10,080 observations). |
stateDuration |
The number of observations generated during a stay in each state. Defaults to 60. |
increment |
The time-sequence base increment. See "Details" of the seq.POSIXt() function options. Defaults to "min" for minutes. |
multiState |
Should the observations be generated from a multi-state process? Defaults to TRUE. |
autocorellation |
The autocorrelation parameter. Must be less than 1 in absolute value, or the process generated will be nonstationary. Defaults to 0.75 in accordance to Kazor et al (2016). |
tLower |
Lower bound of the latent $t$ variable. Defaults to 0.01. |
tUpper |
Upper bound of the latent $t$ variable. Defaults to 2. |
errVar |
Error variance of the normal white noise process on the feature variables. |
This function randomly generates a non-stationary (sinusoidal) and autocorrelated latent variable t with lower and upper bounds given by the arguments "tLower" and "tUpper", respectively, with autocorrelation governed by the "autocorrelation" argument. Necessarily, this coefficient must be less than 1 in absolute value, otherwise the latent variable will be unbounded. Next, this function draws a realization of this random variable t and calculates three functions of it, then jitters these functions with a normal white noise variable (with variance set by "errVar"). These three functions are:
x(t) = t + error
y(t) = t ^ 2 - 3t + error
z(t) = -t ^ 3 + 3t ^ 2 + error
This function is called by the mspProcessData() function. See ?mspProcessData for more details.
An data frame with the following information:
A POSIXct column of times starting at the user-defined 'startTime' argument, length given by the 'period' argument, and spacing given by the 'increment' argument. For example, if the starting value is "2016-01-10", period is 10080, and the incrementation is in minutes, then this sequence will be one week's worth of observations recorded every minute from midnight on the tenth of January.
An integer column of all 1's (when the 'multiState' argument is FALSE), or a column of the state values (1, 2 or 3).
A double column of generated values for the first feature.
A double column of generated values for the second feature.
A double column of generated values for the third feature.
Called by: mspProcessData
.
processNOCdata()
processNOCdata()
Quantiles for objects of class density
## S3 method for class 'density' quantile(x, probs = seq(0.25, 0.75, 0.25), names = TRUE, normalize = TRUE, ...)
## S3 method for class 'density' quantile(x, probs = seq(0.25, 0.75, 0.25), names = TRUE, normalize = TRUE, ...)
x |
a object of class |
probs |
numeric vector of probabilities with values in [0,1]. Note that
elements very close to the boundaries return |
names |
logical; if |
normalize |
logical; if |
... |
further arguments passed to or from other methods (currently unused) |
This function is a near-exact copy of the quantile.density
function from package BMS (https://CRAN.R-project.org/package=BMS).
In spring of 2022, CRAN informed us that the BMS has been orphaned, so we
copied the code (and corresponding documentation) we needed from it. See
doi:10.18637/jss.v068.i04 for their paper.
The function quantile.density()
applies generically to the built-in
class density
(as least for versions where there is no such method
in the pre-configured packages). Note that this function relies on
trapezoidal integration in order to compute the cumulative densities
necessary for the calculation of quantiles.
If x is of class density
(or a list with exactly one element),
a vector with quantiles. If x is a list of densities, then the output is a
matrix of quantiles, with each matrix row corresponding to the respective
density.
Stefan Zeugner, [email protected]
Martin Feldkircher, [email protected]
rNorm_dens <- density(rnorm(100000)) quantile(rNorm_dens)
rNorm_dens <- density(rnorm(100000)) quantile(rNorm_dens)
Render a 3-Dimensional projection matrix given positive or negative degree changes in yaw, pitch, and / or roll.
rotate3D(yaw, pitch, roll)
rotate3D(yaw, pitch, roll)
yaw |
z-axis change in degrees; look left (+) or right (-). Consider this a rotation on the x-y plane. |
pitch |
y-axis change in degrees; look up (-) or down (+). Consider this a rotation on the x-z plane. |
roll |
x-axis change in degrees; this change appears as if you touch head to shoulders: right roll (+) and left roll (-). |
When plotting with the package scatterplot3d, the default perspective is such that the pitch action appears as a roll while the roll action appears as a pitch.
This function is used only in data generation of the package vignette. This function is called by rotateScale3D().
A 3 x 3 projection matrix corresponding to the degree changes entered.
Called by: rotateScale3D
.
rotate3D(yaw = -10, pitch = 0, roll = 15)
rotate3D(yaw = -10, pitch = 0, roll = 15)
Render a 3-Dimensional projection matrix given positive or negative degree changes in yaw, pitch, and / or roll and increment or decrement feature scales.
rotateScale3D(rot_angles = c(0, 0, 0), scale_factors = c(1, 1, 1))
rotateScale3D(rot_angles = c(0, 0, 0), scale_factors = c(1, 1, 1))
rot_angles |
a list or vector containing the rotation angles in the order following: yaw, pitch, roll. Defaults to <0,0,0>. |
scale_factors |
a list or vector containing the values by which to multiply each dimension. Defaults to <1,1,1>. |
See the help file of function rotate_3D() for a brief explanation of how these angles behave in scatterplot3d functionality (from package scatterplot3d).
This function is used only in data generation in the package vignette (version 1) and the dataStateSwitch() function within the mspProcessData() function. This function calls rotate3D().
A 3 x 3 projection matrix corresponding to the degree and scale changes entered.
Calls: rotate3D
. Called by
dataStateSwitch
.
rotateScale3D(rot_angles = list(yaw = -10, pitch = 0, roll = 15), scale_factors = c(0.2, 1, 5))
rotateScale3D(rot_angles = list(yaw = -10, pitch = 0, roll = 15), scale_factors = c(0.2, 1, 5))
Data from the SM-MBR Bioreactor system over ten days. This data will be used for training the mvMonitoring package.
tenDay_clean
tenDay_clean
An xts matrix of 1,299 rows and 35 features recorded over 2017-01-17 at 00:10 to 2017-01-27 at 00:00.
Kathryn Newhart
Calculate the non-parametric critical value threshold estimates for the SPE and T2 monitoring test statistics.
threshold(pca_object, alpha = 0.001, ...)
threshold(pca_object, alpha = 0.001, ...)
pca_object |
A list with class "pca" from the internal pca() function |
alpha |
The upper 1 - alpha quantile of the SPE and T2 densities from the training data passed to this function. Defaults to 0.001. |
... |
Lazy dots for additional internal arguments |
This function takes in a pca object returned by the pca() function and a threshold level defaulting to alpha = 0.1 percent of the observations. This critical quantile is set this low to reduce false alarms, as described in Kazor et al (2016). The function then returns a calculated SPE threshold corresponding to the 1 - alpha critical value, a similar T2 threshold, and the projection and Lambda Inverse (1 / eigenvalues) matrices passed through from the pca() function call.
This internal function is called by faultFilter().
A list with classes "threshold" and "pca" containing:
the 1 - alpha quantile of the estimated SPE density
the 1 - alpha quantile of the estimated Hotelling's T2 density
a projection matrix from the data feature space to the feature subspace which preserves some pre-specified proportion of the energy of the data scatter matrix. This pre-specified energy proportion is user supplied as the var.amnt argument in the pca() function. See the pca() function's help file for more details.
a diagonal matrix of the reciprocal eigenvalues of the data scatter matrix
the vector of Hotelling's T2 test statistic values for each of the n observations in "data"
the vector of SPE test statistic values for each of the n observations in "data"
Called by: faultFilter
. This function uses a port of
the quantile.density()
function from the now-orphaned BMS package.
nrml <- mspProcessData(faults = "NOC") scaledData <- scale(nrml[,-1]) pca_obj <- pca(scaledData) threshold(pca_object = pca_obj)
nrml <- mspProcessData(faults = "NOC") scaledData <- scale(nrml[,-1]) pca_obj <- pca(scaledData) threshold(pca_object = pca_obj)