Monitoring and troubleshooting chemical plants is a challenge on a good day.
When alarms come in huge waves, as they often do, the operator is inundated with a sea of red or amber on their screen and then has to hunt down the trends to help triage the situation. But what if there was another way?
Alarm fatigue is real
DCS/SCADA systems trigger alarms to alert operators of conditions that deserve attention. Typically, alarms are encoded as high and low (and high-high and low-low) limits on individual tags (i.e., sensors). These preset rules are generally written by the engineering company that implements the DCS when the plant is constructed but may also be updated during TARs and other configuration changes over time.
Alarms tend to go off all at once because many sensors are involved in any incident or deteriorating state of the plant. The worst-kept secret in the control room is that most alarms are not actionable and do not necessarily indicate a "bad" situation. Over time, these storms cause alarm fatigue, and operators may pay less attention to the alarms. But if operators let their guard down, a real event can catch them by surprise. This can lead to suboptimal production; unplanned flaring events, which bring unnecessary emissions and fines; or, in the worst-case scenario, trips costing millions of dollars a day.
Anomaly detection vs alarms
What if you could watch the dynamic behavior of the plant and flag alerts based on outliers versus only preset limits? Anomaly detection algorithms can monitor the values of variables and determine when the values are behaving differently than usual.
For example, one simple approach to anomaly detection is to keep statistics over a sliding window of values for a tag and flag an anomaly whenever a new value falls more than two standard deviations from the running average.
These "tag-based" anomaly detection methods are widely used in monitoring applications. While leveraging these dynamic monitoring techniques is better than relying solely on preset rules, these approaches still have many flaws.
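As a concrete sketch of the rolling-statistics idea described above (the window size, warm-up count, threshold and tag values here are illustrative assumptions, not a real DCS configuration):

```python
from collections import deque
import statistics

def make_tag_detector(window=60, n_sigma=2.0, warmup=5):
    """Univariate "tag-based" detector: flag a reading that falls more
    than n_sigma standard deviations from the running window average."""
    values = deque(maxlen=window)

    def check(reading):
        anomaly = False
        if len(values) >= warmup:  # wait for enough history first
            mean = statistics.mean(values)
            stdev = statistics.stdev(values)
            anomaly = stdev > 0 and abs(reading - mean) > n_sigma * stdev
        values.append(reading)
        return anomaly

    return check

# Hypothetical pressure tag hovering near 100, then spiking
check = make_tag_detector(window=10)
readings = [100.0, 100.4, 99.6, 100.2, 99.8, 100.1, 99.9, 115.0]
flags = [check(r) for r in readings]
# only the final spike is flagged
```

Even this small sketch hints at the tuning burden: the window, threshold and warm-up all have to be chosen per tag, which is part of why tag-based methods stay noisy in practice.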
System-based anomalies vs tag-based anomalies
Tag-based anomaly detection is still noisy. If one tag is outside of a dynamically computed operating envelope, it does not necessarily mean that there is an actionable anomaly. For the mathematically inclined, tag-based anomaly detection algorithms are called univariate because they only operate on one variable.
In the pursuit of more signal and less noise, plants can turn to multivariate anomaly detection methods, which model how groups of variables behave with respect to each other. Typically, tags are broken up according to systems and are sometimes organized by unit, by area within a unit or by systems within areas, depending on how many tags there are to model. Generally, you want between 20 and 400 tags together in a model for effective anomaly detection.
System-based anomaly detection algorithms are much less noisy because they model the behavior of multiple variables together. In the process industries, tags behave in a certain "harmony" with each other. Even though individual tags might be changing significantly, they are changing together in a harmonious way. In these circumstances, univariate methods would be noisy, but multivariate methods can discern that these patterns are good. Harmony between tags also helps to surface anomalies when things are not changing: multivariate, system-based methods can catch that some tags should be changing, because related tags are, yet remain suspiciously steady.
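One standard way to put a number on that "harmony" is the Mahalanobis distance, which measures how far a new reading sits from the historical cloud of joint tag behavior. This is a minimal sketch of the multivariate idea using NumPy and two made-up flow tags; it is an illustration, not any vendor's actual algorithm:

```python
import numpy as np

def fit_system_model(X):
    """Learn normal joint behavior of a group of tags:
    mean vector and inverse covariance (rows = samples, cols = tags)."""
    mu = X.mean(axis=0)
    # small ridge keeps the covariance invertible for tightly correlated tags
    cov = np.cov(X, rowvar=False) + 1e-6 * np.eye(X.shape[1])
    return mu, np.linalg.inv(cov)

def mahalanobis(x, mu, inv_cov):
    """Distance of reading x from the learned operating envelope."""
    d = x - mu
    return float(np.sqrt(d @ inv_cov @ d))

# Two hypothetical tags that normally move together,
# e.g. flow into a vessel and flow out of it.
rng = np.random.default_rng(0)
flow_in = rng.normal(100.0, 5.0, size=500)
flow_out = flow_in * 0.5 + rng.normal(0.0, 0.2, size=500)
mu, inv_cov = fit_system_model(np.column_stack([flow_in, flow_out]))

# Both tags individually within their usual ranges, but the
# relationship between them is broken (flow_out should be ~55 here):
broken_harmony = np.array([110.0, 48.0])
normal_point = np.array([110.0, 55.0])
```

A univariate check would pass both points, since each tag value is individually plausible; only the joint model sees that `broken_harmony` violates the learned relationship (its Mahalanobis distance is far larger than the normal point's).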
Benefits of system-based anomalies
Not only are system-based anomalies less noisy, they also serve as an early-warning system for operators. Every second matters when troubleshooting in the moment, and when a system-based anomaly detection system surfaces an anomaly, it usually precedes an alarm storm. Those few minutes can give operators the cushion they need to troubleshoot and take remedial action to stabilize processes before an incident arises.
Even less noise with ensembles
As stated earlier, noise is the operators' nemesis. Even system-based anomaly detection methods can be noisy and alert too frequently. What else can we do, beyond using system-based anomaly detection, to reduce noise and lower false positives? One very effective method of improving accuracy and reducing noise, widely used in the machine learning community, is the use of model ensembles.
An ensemble of models is exactly what it sounds like: multiple algorithms that work together to reach a consensus on whether an anomaly exists. Ensemble-based machine learning is a powerful technique in which multiple algorithms are applied and combined to produce an optimal answer. This further reduces false positives and improves the actionability of anomalies.
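A toy sketch of the consensus idea: three simple detectors (z-score, fixed limits, rate of change) each vote, and an anomaly is raised only when a quorum agrees. The detectors, thresholds and readings are illustrative assumptions, far simpler than a production ensemble:

```python
import statistics

def zscore_detector(n_sigma):
    def detect(history, reading):
        if len(history) < 5:
            return False
        m, s = statistics.mean(history), statistics.stdev(history)
        return s > 0 and abs(reading - m) > n_sigma * s
    return detect

def range_detector(low, high):
    def detect(history, reading):
        return not (low <= reading <= high)
    return detect

def rate_detector(max_step):
    def detect(history, reading):
        return bool(history) and abs(reading - history[-1]) > max_step
    return detect

def ensemble_vote(detectors, history, reading, quorum=2):
    """Raise an anomaly only when at least `quorum` detectors agree,
    trading a little sensitivity for far fewer false positives."""
    return sum(d(history, reading) for d in detectors) >= quorum

detectors = [zscore_detector(3.0), range_detector(90.0, 110.0),
             rate_detector(5.0)]
history = [100.0, 100.2, 99.8, 100.1, 99.9]

# A small wiggle trips the twitchy z-score detector on its own...
lone_vote = zscore_detector(3.0)(history, 100.5)   # True
# ...but the ensemble stays quiet,
quiet = ensemble_vote(detectors, history, 100.5)   # False
# while a genuine excursion wins unanimous agreement.
loud = ensemble_vote(detectors, history, 130.0)    # True
```

The small wiggle that fools one detector never reaches quorum, which is exactly the false-positive reduction the article describes.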
Believe it when you see it
Until recently, machine learning projects were costly and time consuming. They required tons of time from both the process engineers in the plant and a team of data scientists who know how to extract, clean, transform and process data.
ControlRooms.ai has solved this problem by creating a turnkey application that deploys system-based anomaly detection ensembles in minutes. You can upload your historical tag data in files; once the models are trained and the ensemble is ready, you'll get an email inviting you to see anomalies in just a few minutes. After that, you can get a streaming real-time implementation in just a week.
Give your operators the minutes they need to respond and the ability to see around corners with ensembles of system-based anomaly models and turnkey solutions. This AI-powered monitoring and troubleshooting tool for chemical plant operators helps prevent system failures and unplanned downtime.
To detect sooner and troubleshoot faster, visit controlrooms.ai.