Thursday, September 29, 2016

Techniques (Methods) of RCA: 5-Why's and Ishikawa Diagram

Techniques/Methods used to perform RCA:-

There are many techniques that can be used to carry out RCA but “5-WHY’s” and “Ishikawa Diagram” are the most popular ones and hence I am explaining them in detail in this blog post.


5-WHY’s:-

This technique was developed by Sakichi Toyoda, founder of Toyota Industries Co. Ltd. A series of “why” questions, nominally five, is asked in order to reach the root cause. Five is a guideline rather than a rule: some problems need more questions and some need fewer. If the questions are well chosen and stay within context, five are usually enough under normal circumstances. The problem to be analyzed is written down and the first question asked is generally why it happened. The answer is written down, the next “why” is asked about that answer, and the process is repeated until the root cause is reached.

This technique is very basic, takes little time and does not require any software or other materials. All you need is paper and a pen, as no statistical analysis is involved. However, the technique follows a single chain of answers, so it yields only one root cause for the problem being analyzed. Also, two different people analyzing the same problem with this technique may arrive at different root causes. Both issues can be reduced if the answer to each “WHY” question is verified on the spot where the problem occurred.

An example of how to find the root cause using 5-WHY’s:-
1. Why is your computer not operating?
          - Because the operating system crashed.
2. Why did it crash?
          - Because it was infected by a virus.
3. Why was it infected by a virus?
          - Because I had not installed anti-virus software on my computer.
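
The question-and-answer chain above can also be recorded in a few lines of Python. This is only a minimal sketch: the class name FiveWhys and its methods are my own illustrative choices, not part of any standard RCA tool, and the last answer is treated as a candidate root cause that still needs to be verified.

```python
from dataclasses import dataclass, field


@dataclass
class FiveWhys:
    """Records a chain of why-questions and answers for one problem."""
    problem: str
    steps: list[tuple[str, str]] = field(default_factory=list)

    def ask(self, question: str, answer: str) -> None:
        """Append one question/answer pair to the chain."""
        self.steps.append((question, answer))

    def root_cause(self) -> str:
        """Return the last answer, treated here as the candidate root cause."""
        if not self.steps:
            raise ValueError("no questions have been asked yet")
        return self.steps[-1][1]


# The computer example from this post, entered step by step.
analysis = FiveWhys(problem="The computer is not operating.")
analysis.ask("Why is your computer not operating?", "The operating system crashed.")
analysis.ask("Why did it crash?", "It was infected by a virus.")
analysis.ask("Why was it infected by a virus?", "No anti-virus software was installed.")

for number, (question, answer) in enumerate(analysis.steps, start=1):
    print(f"{number}. {question}\n   - {answer}")
print("Candidate root cause:", analysis.root_cause())
```

Because the chain is a simple list, it can stop after three questions, as here, or continue past five if the root cause has not yet been reached.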

Ishikawa Diagram:-

The Ishikawa Diagram is one of the oldest techniques used for RCA and was developed by Kaoru Ishikawa, who used it in the 1960’s. It is also known as cause and effect analysis or the fishbone diagram, because its shape resembles the skeleton of a fish.

In this technique all the possible causes of the problem and their effects are listed. An Ishikawa diagram generates and sorts hypotheses about possible causes of problems within a process by asking participants to list all of the possible causes and effects for the identified problem. The diagram shows the links between the events and their causes, which may be actual or potential. Thus a very large amount of information can be represented using this technique. This information is then used to generate ideas as to why the problem (or cause) occurred and what the possible effects of that problem (or cause) could be.


                      Fig. A sample template showing the layout of the “fishbone diagram”.

A template should be constructed as shown in the figure. The effect (or problem) should be written in the box on the right hand side. The effect should reflect what is happening and must be defined without any ambiguity. The categories are identified and written at the top of the slanting lines that come out from the horizontal line. Many authors give a specific list of categories into which the causes should fall, but in my opinion the categories should depend on the type of problem. Hence you can use the categories that suit your problem best. Once the categories have been identified, the underlying causes in each category are found and written as sub branches of that category. After this step, questions of the form ‘why did this happen?’ are asked for each cause. The answers to these questions form the sub branches of that cause. There is no restriction on how many questions should be asked; the questioning should continue until almost all aspects of why the cause took place have been covered. Once all the causes have been discussed and dissected in detail, the root cause can be traced. The best way to do this is to establish the chain of events in such a way that the answers to the ‘why’ questions trace back to the effect block.
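
The layout described above (effect, categories, causes and their sub branches) maps naturally onto a small tree. The following Python sketch is one possible way to hold such a diagram in code and to list every chain from a deepest cause back to the effect block. The class names Cause and Fishbone, the category names and the sample causes are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Cause:
    """A cause on the diagram; sub_causes answer 'why did this happen?'."""
    description: str
    sub_causes: list["Cause"] = field(default_factory=list)


@dataclass
class Fishbone:
    """The effect box plus the causes grouped under each category."""
    effect: str
    categories: dict[str, list[Cause]] = field(default_factory=dict)

    def add(self, category: str, cause: Cause) -> None:
        """Attach a cause (with any sub branches) to a category."""
        self.categories.setdefault(category, []).append(cause)

    def chains(self) -> list[list[str]]:
        """Return every chain from a deepest cause back to the effect."""
        result = []

        def walk(cause, path):
            path = [cause.description] + path
            if not cause.sub_causes:
                result.append(path)
            for sub in cause.sub_causes:
                walk(sub, path)

        for category, causes in self.categories.items():
            for cause in causes:
                walk(cause, [category, self.effect])
        return result


# Hypothetical diagram for a crashed operating system.
diagram = Fishbone(effect="Operating system crashed")
diagram.add("Software", Cause("Virus infection",
                              [Cause("No anti-virus software installed")]))
diagram.add("Hardware", Cause("Hard disk failure"))

for chain in diagram.chains():
    print(" -> ".join(chain))
```

Each printed line starts at a candidate root cause and ends at the effect, which mirrors the tracing step described above. The candidates still have to be verified with data before any of them is accepted as a root cause.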

The biggest advantage of the Ishikawa diagram is that it shows all the causes that may have an effect on the system. This is useful as it can lead to the discovery of more than one root cause. It is a very useful technique for RCA when a team analyzes the problem, as it keeps everyone involved. The ideas are arranged in logical groups and one cause leads to another. It is also easy to carry out, and to get started you don’t need any software or other tools. On the other hand, this technique only shows what could be the root cause of a problem. Once the causes have been listed, various countermeasures are applied to check whether they solve the problem, which can waste time, money and effort. To establish a root cause, data must be collected and used to verify each cause that is thought to be a root cause. The diagram also tends to be large and complex, and if it is not drawn carefully important causes may be overlooked, which must be avoided.

Besides ‘5-WHY’s’ and ‘Ishikawa Diagram’ there are a number of other methods devised over the years, each with its particular importance for the specific fields in which it can be employed. Other widely used RCA techniques are Cause and Effect Analysis (Tree Diagram), Failure Mode and Effects Analysis, Pareto Analysis, Fault Tree Analysis, Bayesian Inference, Cause Mapping, Barrier Analysis, Change Analysis, Causal Factor Tree and Analysis, TapRooT, Apollo Root Cause Analysis (ARCA), RPR Problem Diagnosis, Kepner-Tregoe Problem Solving and Decision Making, and Management Oversight and Risk Tree (MORT) Analysis.

The next and final post on RCA on my blog will be an in-depth analysis of root cause analysis.

Monday, September 26, 2016

Different Phases of Root Cause Analysis.

Phases of Root Cause Analysis
The method of RCA has seven discrete phases. Each of these phases makes a significant contribution to the analysis. These phases should be followed for successful execution of root cause analysis.

The seven phases are:-
1. Define the problem clearly:-
This is the first and the most critical phase of RCA. The problem to be analyzed must be clearly defined. Ambiguities should be avoided, because if you get this phase wrong the entire process and effort may be wasted. It should be clearly stated what is to be prevented from recurring.

2. Gather the required data or evidence regarding the problem to be analyzed:-
To carry out RCA there must be some evidence or data available to prove that a problem exists within a system. The impact of the problem, along with how long it has existed, should be carefully recorded. The situation should be analyzed fully before the causes that affect the problem are taken into account. With data, the outcome of the actions can be predicted.

3. Identify the causal relationships related to the problem defined in phase 1:-
In this phase all the events that led to the problem are worked out. A sequential relationship is established between all the events that occurred. The conditions which existed before and after the occurrence of the problem should be determined. The problems surrounding the central problem should also be analyzed, because a resulting problem could have been triggered by some other problem.

4. Identify the causes which, if changed or removed, will help to prevent the recurrence:-
The causes that have the biggest impact on the system are identified. The reason for the existence of the causal factors is found out. The causes that occur most often or have the maximum impact on the performance of the system are normally the factors that should be changed or removed.

5. Figure out the corrective actions (or solutions) to prevent the recurrence of the problems or causes:-
The corrective actions (or solutions) needed to bring the problem under control immediately should be determined. These actions should also help to prevent its future occurrence. To implement the actions or solutions, the following factors must be considered:-
a. The actions or solutions must be feasible to implement. This includes feasibility in terms of cost of implementation and resources such as time, effort, manpower etc.
b. The actions must be aligned with the objectives for which RCA is being done.
c. What new risks would be introduced by these actions? Will the system be able to handle these new risks or not?

6. Implementation of the recommended corrective actions (or solutions):-
The corrective actions (or solutions) that would prevent the recurrence of the problem are recommended and implemented. How each solution is implemented must also be considered, as the implementation should not violate any of the factors given in phase 5.

7. The corrective actions (or solutions) recommended should be analyzed to check for their effectiveness:-
This phase is also called the follow up phase. It makes sure that the implemented corrective actions (or solutions) are effective enough to prevent the recurrence of the same type of errors or problems. Periodic tracking is done to review whether the corrective actions have been implemented as desired and are functioning properly. If a problem which was supposedly corrected occurs again, the corrective actions must be analyzed to figure out why they were not effective.
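
Phase 4 singles out the causes that occur most often. As a rough illustration, the Python sketch below ranks causes by how frequently they appear in an incident log and adds a cumulative share, in the spirit of Pareto Analysis. The incident log and the function name rank_causes are made up for this example, and ranking by impact would need extra data that is not modelled here.

```python
from collections import Counter

# Hypothetical incident log: one entry per recorded occurrence of the
# problem, labelled with the cause identified for that occurrence.
incident_causes = [
    "wrong input values", "power supply failure", "wrong input values",
    "procedure not followed", "wrong input values", "power supply failure",
]


def rank_causes(causes):
    """Return (cause, count, cumulative share) tuples, most frequent first."""
    counts = Counter(causes)
    total = sum(counts.values())
    ranked, running = [], 0
    for cause, count in counts.most_common():
        running += count
        ranked.append((cause, count, running / total))
    return ranked


ranked = rank_causes(incident_causes)
for cause, count, share in ranked:
    print(f"{cause}: {count} occurrence(s), cumulative share {share:.0%}")
```

The causes at the top of the list are the first candidates to change or remove. Running the same count again during the follow up phase gives a simple check on whether the corrective actions actually reduced the occurrences.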

Previous Post on Root Cause Analysis Introduction. Stay Tuned for More.

Root Cause Analysis - An Introduction

“Do not look where you fell but where you slipped.”
This proverb is a very significant piece of wisdom. It means that to find the answer to your problem you should look not only at where the problem shows up but also at its source. In today’s world we all face problems and most of them cannot be avoided, but what can definitely be avoided is the recurrence of the same problem. To avoid encountering the same problem again, the reason why the problem came up must be identified, yet the human tendency is to rectify only the obvious effects of the problem. The approach used to look for the root causes is called Root Cause Analysis, popularly known as RCA.

This blog post provides the definition of root cause analysis, its principles and the types of causes. Even though the concept of RCA has universal application, I have written this post from the software engineering perspective only. Even so, it has been written in such a way that someone with no software engineering background can understand it.

In today’s world software development is getting more and more complex. Many software projects fail due to improper requirements, and those which somehow succeed are not always perfect. There is always a need to fix errors in the software. Finding and fixing errors or faults in software is not an easy task, and it is very challenging to find the errors that are critical to the functioning of the software. Very often software engineers tend to look for the errors that are obvious, and in most cases the steps taken to remove them are not enough to eradicate the problem. The obvious problems are removed, but the same problems return later to haunt everyone, including the designers and developers and, most importantly, the stakeholders.

What is needed is a way of eradicating problems such that a problem which has been solved once does not occur again. This is normally accomplished by using Root Cause Analysis, popularly known by its three letter acronym RCA. RCA is a method in which the main focus is to identify the root causes due to which the errors occur or the problems come up. RCA deals with determining what happened, why it happened and how to reduce the possibility of the same kind of problem recurring. This method works on the belief that if problems are to be removed fully, you must concentrate on their root causes. Eliminating or correcting the root causes tends to solve the problem in the long run. Correcting only the obvious symptoms solves the problem for the moment, but it is bound to happen again and could come back in an even worse form.

RCA is a process that is both iterative and reactive. By iterative I mean that RCA has to be repeated a number of times, as doing it once will normally not solve the problem fully and stop its recurrence. Hence RCA is a tool that works on the principle of continuous improvement. By reactive I mean that RCA is done once an event has occurred. Once the desired level of expertise with this technique has been achieved, the method becomes pro-active, which means that RCA can then be used to forecast whether an error or problem is likely to occur. Such an approach helps to prevent the future occurrence of the same kind of problem.

Principles of RCA:-
1. The main aim of RCA is performance improvement by removing the root causes of a problem, as this is more effective than removing the obvious symptoms.
2. For RCA to be effective, a systematic approach must be followed. When conclusions are drawn about the causes (or problems), they must be documented, as the documentation provides the evidence for all the causes.
3. Normally for a given problem there can be one or more root causes.
4. Causal relationships between the root cause (or causes) and the defined problems must be established for the RCA to be effective.

Three basic types of causes:-

  • Physical Causes:- Tangible or material items such as the motherboard, hard disks etc. could have failed.
  • Human Causes:- The people working on the system did something wrong, like entering the wrong values or not following the operating procedures properly.
  • Organizational Causes:- These include the policies or processes of the organization which sometimes contribute to the causes. For example, no one was responsible for the regular check-up of the power supply to the database servers.
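
When recording findings, it can help to tag each cause with one of the three types listed above so that they can be reviewed group by group. The Python sketch below shows one way of doing this. The enum CauseType, the helper group_by_type and the sample findings are assumptions made for illustration, loosely based on the examples in the list.

```python
from collections import defaultdict
from enum import Enum


class CauseType(Enum):
    """The three basic types of causes described in this post."""
    PHYSICAL = "physical"
    HUMAN = "human"
    ORGANIZATIONAL = "organizational"


# Hypothetical findings from one analysis, each tagged with its type.
findings = [
    ("Hard disk failed", CauseType.PHYSICAL),
    ("Operator entered wrong values", CauseType.HUMAN),
    ("No one assigned to check the power supply", CauseType.ORGANIZATIONAL),
    ("Operating procedure was not followed", CauseType.HUMAN),
]


def group_by_type(items):
    """Group cause descriptions under their cause type."""
    grouped = defaultdict(list)
    for description, cause_type in items:
        grouped[cause_type].append(description)
    return dict(grouped)


grouped = group_by_type(findings)
for cause_type in CauseType:
    print(cause_type.value, "->", grouped.get(cause_type, []))
```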
Further reading: RCA Phases and Techniques