What is Root Cause Analysis in manufacturing?

On the production floor, Root Cause Analysis (RCA) is the process of identifying factors that cause defects or quality deviations in the manufactured product.

The term “root cause” refers to the most fundamental reason for a drop in a production line’s quality, or for a decrease in the overall equipment effectiveness (OEE) of an asset.

Common root cause analysis methodologies in manufacturing include the “Fishbone” diagram and the “5 Whys”. The simplicity of these methods is also their strength, but how effective are they in dealing with the complexity of today’s manufacturing processes?

 

“Fishbone Diagram” created by Kaoru Ishikawa (Quality Manager at Kawasaki) in the 1960s.

Root cause analysis is undergoing a new interpretation in light of the Industry 4.0 revolution. With the power of industrial IoT and artificial intelligence at our fingertips, it’s natural that manufacturers progress to more advanced root cause analysis methods.

 

Why do we look for the Root Cause?

While the symptom and immediate cause might be easy and quick to solve, failing to detect and treat the root cause will very likely lead to the problem recurring.

The challenge in RCA is distinguishing a symptom or intermediate cause from the true root cause of a problem.

 

Shortcomings of Traditional Root Cause Analysis

Many manufacturers currently approach root cause analysis by relying on on-site expert knowledge.

Experience is indeed valuable, but some production lines are so complex that being simultaneously aware of every component and sub-process is humanly impossible.

Manufacturers that do collect data from OT and IT systems still need to be able to make sense of it in order to perform RCA. This requires time and a variety of professionals to perform – in most cases, process, quality and maintenance engineers.

It’s natural that even experts can be biased towards certain ideas. And, even if the root cause of the problem is roughly identified, there may be inaccuracies in the definition of the problem, making it difficult to come up with an intelligent and lean solution.

Another disadvantage of manual root cause analysis:

Currently, most RCA information isn’t shared across manufacturing sites, leaving factories/plants of the same company to repeat each other’s mistakes, leading to unplanned downtime that could have been prevented.

 

The Power of Automated Root Cause Analysis

Machine Learning is a subfield of artificial intelligence that focuses on developing and researching algorithms that learn from data. The algorithms exist in the form of models which are trained with historical data in a way that allows them to make predictions and decisions based upon new data.

Thanks to significant advances in machine learning and Big Data analytics, root cause analysis can be performed using automated methods. These methods are unbiased and based purely upon historic and real-time data from the production floor.

Anomaly Detection

To perform RCA using machine learning, we need to be able to detect that something is out of the ordinary, or in other words, that an anomaly is present.

The machine learning model is trained to analyze the equipment’s data output under regular “healthy” operating conditions. An anomaly can take the form of any pattern of deviation in the amplitude, period, or synchronization phase of a signal when compared to normal behavior.

The algorithm then forms a prediction based on the current behavior of the signal. If the predicted values exceed the threshold established during the training phase, an alert is sent.
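To make the idea of a threshold learned from “healthy” data concrete, here is a minimal sketch. It is an assumption-laden stand-in rather than a production model: instead of a trained machine learning model it uses a simple statistical band (mean plus or minus k standard deviations), and the function names, the sensor readings and the value of k are all hypothetical.

```python
from statistics import mean, stdev

def fit_threshold(healthy_values, k=3.0):
    """Learn a 'normal' band (mean +/- k standard deviations) from healthy readings."""
    mu = mean(healthy_values)
    sigma = stdev(healthy_values)
    return mu - k * sigma, mu + k * sigma

def find_anomalies(values, band):
    """Return the indices of readings that fall outside the learned band."""
    low, high = band
    return [i for i, v in enumerate(values) if v < low or v > high]

# Hypothetical readings recorded while the equipment was known to be healthy.
healthy = [10.0, 10.2, 9.8, 10.1, 9.9, 10.0, 10.3, 9.7]
band = fit_threshold(healthy)

# New readings from the production floor; the third one is clearly off.
new_readings = [10.1, 9.9, 12.5, 10.0]
alerts = find_anomalies(new_readings, band)
print(alerts)  # prints [2]
```

A real system would typically replace the fixed band with a model that captures amplitude, period and phase, but the train-on-healthy-data, alert-on-deviation structure stays the same.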

Examples of anomalies detected using automated root cause analysis include:

  • Component failure
  • Abnormal process input parameters (e.g. off-spec material composition)
  • Corrupt sensor values
  • Changes made to the control logic (e.g. via the PLC)
  • Changes in environmental conditions

So, is this the end of industry expertise?

Automated root cause analysis reduces the overall dependency on expert knowledge, but it doesn’t diminish the value of on-site experts who are vital in monitoring, validating and managing the RCA process.

Additionally, automated root cause analysis is powered by machine learning and probabilistic graphical models that need to be trained in order to be able to perform inference. This makes on-site experience critical in ensuring a system that takes into account all relevant parameters.

Mutual Information

Another mathematical tool suited to RCA is the information-theoretic measure known as mutual information. In a manufacturing setting involving a high volume of data and parameters, it can be used to search for statistical patterns across many variables.

Mutual information is an investigative tool that describes the mutual dependency between two random variables. When aiming to identify causal relationships, as in root cause analysis, mutual information helps by quantifying how much can be learned about one variable from data about another.
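For two discrete variables X and Y, mutual information is defined as I(X;Y) = Σ p(x,y) · log2( p(x,y) / (p(x) · p(y)) ), measured in bits. The sketch below estimates it from paired observations. The variable names and the tiny data sets (supplier, shift, defect flag) are hypothetical and chosen only to show the two extremes. Note that a high score indicates statistical dependency, which is a hint for where to look, not proof of causation.

```python
from collections import Counter
from math import log2

def mutual_information(xs, ys):
    """Estimate I(X;Y) in bits from two equally long lists of discrete observations."""
    n = len(xs)
    joint = Counter(zip(xs, ys))   # counts of each (x, y) pair
    count_x = Counter(xs)          # marginal counts of x
    count_y = Counter(ys)          # marginal counts of y
    mi = 0.0
    for (x, y), c in joint.items():
        p_xy = c / n
        # p(x,y) / (p(x) * p(y)), written with counts to avoid extra divisions
        ratio = p_xy * n * n / (count_x[x] * count_y[y])
        mi += p_xy * log2(ratio)
    return mi

# Hypothetical data: 1 means the part was defective.
defect = [0, 0, 1, 1]
supplier = ["A", "A", "B", "B"]          # defects line up perfectly with supplier
shift = ["day", "night", "day", "night"]  # defects are unrelated to shift

mi_supplier = mutual_information(supplier, defect)
mi_shift = mutual_information(shift, defect)
print(mi_supplier)  # 1.0 bit: supplier fully determines the defect flag
print(mi_shift)     # 0.0 bits: shift tells us nothing about defects
```

In practice, ranking many process parameters by their mutual information with a failure flag is one way to shortlist candidates for closer investigation.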

 

The Role of AI in Root Cause Analysis

Artificial intelligence, specifically in the form of machine learning, catapults root cause analysis into another realm of asset management.

It’s all about timing:

The ability of AI to formulate predictions relating to machine performance and health, instead of waiting for disaster to strike, introduces a whole range of benefits that affect the bottom line.

Some examples of the direct benefits of automated root cause analysis in manufacturing are:

  • Early detection of safety issues
  • Reduced emissions due to accurate monitoring of the entire production process
  • Identification of complex process disruptions, e.g. inefficiency of a reactor
  • More efficient electrical consumption through anomaly detection
  • Predicting quality deviations and adjusting processes to prevent the waste of raw materials

If you had to summarize the value of machine learning in root cause analysis, it would be:

Less time spent on figuring out the problem, more time spent on fixing and preventing it.

 

Example of Automated Root Cause Analysis in Manufacturing

A prime example of automated root cause analysis is the use of machine learning to deduce the root cause of asset failures and quality deviations in manufacturing.

We can look at a manufacturing process as consisting of:

  • an input stage – in process manufacturing, feeding the raw materials into the production line;
  • the process – a sequence of steps performed consecutively;
  • a resulting output – the finished product.

For this basic example, we will describe the framework for an automated root cause analysis system by using a Bayes Network (see Figure 1).

Figure 1 - A Bayes Network describing causal correlations between root causes and failures.

The example process consists of six processing steps (S1 – S6), each with a number of causal nodes (the blue circles), and four failure or quality deviation types known as failure nodes (the orange circles).

The failures are the result of errors in one of the six processing steps, although some failures can be the result of errors in more than one processing stage. The causal correlations are represented by the dotted arrows.

Building a Bayesian Network like the one in Figure 1 requires the involvement of relevant process experts since all process stages and failure points need to be carefully defined.

Expert knowledge that includes causal correlations based upon experience, as well as historic data of known causal relationships, can be added to the algorithm, which takes this knowledge into account without being biased by it.

Once the model has been trained, new data can be fed into it to discover the root cause of new failure incidents. With data only about the failure nodes, the machine learning algorithm can infer which causal nodes were likely involved in the failure.

An example of the algorithm’s output:

Probability (Causal Node) A = 0.01
Probability (Causal Node) B = 0.81
Probability (Causal Node) C = 0.03

As you may have noticed, the results don’t add up to 1, as the probabilities of a complete set of mutually exclusive outcomes would. This is because the algorithm takes into consideration the fact that the exact root cause might not be described as an already-defined causal node.
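One way an output of this kind can arise is sketched below. This is not the network from Figure 1: it is a toy model with three causal nodes and a single failure node, combined with a “noisy-OR” rule plus a small “leak” probability that stands for root causes nobody has defined yet. The noisy-OR choice, all names and all numbers are assumptions made purely for illustration.

```python
from itertools import product

# Hypothetical prior probability that each causal node is active.
priors = {"A": 0.01, "B": 0.05, "C": 0.02}
# Hypothetical probability that an active causal node triggers the failure.
strength = {"A": 0.3, "B": 0.9, "C": 0.4}
# Probability of the failure occurring without any defined cause being active.
leak = 0.001

def p_failure(active):
    """Noisy-OR: the failure is avoided only if every active cause and the leak all miss."""
    p_avoided = 1.0 - leak
    for name in active:
        p_avoided *= 1.0 - strength[name]
    return 1.0 - p_avoided

def infer_root_causes():
    """Return P(cause active | failure observed) per cause, plus P(no defined cause | failure)."""
    names = list(priors)
    evidence = 0.0
    joint = {name: 0.0 for name in names}
    p_undefined = 0.0
    # Enumerate every on/off combination of the causal nodes.
    for states in product([0, 1], repeat=len(names)):
        active = [n for n, s in zip(names, states) if s]
        p_state = 1.0
        for n, s in zip(names, states):
            p_state *= priors[n] if s else 1.0 - priors[n]
        p = p_state * p_failure(active)  # P(this combination AND failure)
        evidence += p
        for n in active:
            joint[n] += p
        if not active:
            p_undefined = p
    posterior = {n: joint[n] / evidence for n in names}
    return posterior, p_undefined / evidence

posterior, undefined_share = infer_root_causes()
for name, p in posterior.items():
    print(f"Probability (Causal Node) {name} = {p:.2f}")
print(f"Probability (no defined cause) = {undefined_share:.2f}")
```

In this toy model the leak term keeps some probability reserved for an undefined root cause, so the scores of the defined causal nodes are not forced to account for the whole failure.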

Another important element of this type of model is what is known as a measurement node. Measurement nodes give specific readouts of observable information pertaining to the causal nodes, such as pressure or vibration measurements taken in a specific step of the process.

In this way, measurement nodes add another data layer to the model, allowing for relationships that aren’t yet defined by the model to affect the outcome.

 

The result:

A data-driven automated RCA system that is accurate and predictive, offering actionable insights that can be shared between cooperating facilities.

Patterns and anomalies can be detected pointing to root causes that would normally be very difficult to identify based purely upon expert knowledge.

The fact that root causes of unplanned downtime and quality deviations can be predicted makes these methods of automated root cause analysis perfectly suited to Industry 4.0 use cases.

For an in-depth example of automated root cause analysis in manufacturing, be sure to check out our free case study here.
