Monday, October 3, 2016

Analysis of Root Cause Analysis

“To stumble twice against the same stone is a proverbial disgrace.”
Marcus Tullius Cicero

The statement above teaches a very simple thing: a mistake made more than once is a blunder. Yet we keep making the same mistakes over and over again instead of using a simple and very effective method, root cause analysis (RCA). Repetition of the same problem is especially troublesome in the field of software engineering. In my view RCA is the only technique that can help you identify the major issues behind the toughest problems you may be facing, whatever field you work in, be it engineering, manufacturing or medicine. In my opinion no other tool is as simple as RCA and as effective at the same time.

The ‘5-WHY’s’ and the ‘Ishikawa Diagram’ are the two most popular techniques for carrying out RCA. They have been in use for more than 25 years, but it was with the popularity of Six Sigma and lean manufacturing that RCA really came into vogue and got the recognition that such a fine tool deserves. However, even though much has been written and discussed about RCA, it is still not used intensively and remains underutilized given its benefits and effectiveness.

The successful implementation of RCA is inhibited for two reasons. The first is the lack of organizational support in the form of work processes and policies, and the second is that individuals are not willing to carry out RCA. The latter is traceable to the former: if the organization does not support RCA, it will obviously be resisted by the individuals within the organization. This may not always be true, but the principles and policies of the organization do affect the way individuals approach RCA. Sometimes the organization supports RCA in principle, yet individuals still run away from it because of certain organizational processes and policies. Not knowing how to apply RCA strategically, along with a work culture that does not support its use, discourages most professionals from using RCA. Most organizations train their professionals to carry out RCA, but they do not really have any policies that help in implementing it.

Individuals working in an organization would agree that RCA is a great tool and must be used, but they don’t really use it. Most organizations are task oriented, and solving the problem immediately is given more emphasis than carrying out RCA, which may take more time than the immediate corrective measures. This is why most professionals say that they do not have time for RCA and leave it for some later stage. Hence I attribute the unwillingness of individuals to carry out RCA to the organizational work culture. Sometimes individuals do carry out RCA in full force, but when they report the corrective measures to the management, these measures are either rejected, or accepted but not implemented. Such actions by the management discourage individuals from carrying out RCA in the future.

There is a big misconception among professionals that to implement RCA successfully you need some new tools or a different skill set. Once they undergo training for RCA, they are normally disappointed that it is not something radically different and can be easily learnt. They then attribute RCA to common sense, but I do not view it that way. The reason is that if the same problem is given to two different people with notably different ideas, perceptions, backgrounds and experience, they will very likely identify different root causes for it. Hence RCA is not just common sense.

RCA, if utilized to its full extent, can really do wonders for any type of organization, as it helps in identifying the causes which, if removed, can permanently prevent the recurrence of the problem. The biggest advantage of RCA is that it needs no software to do the analysis and reach the root causes. In fact, the software packages that are available have specific categories under which the causes are to be listed. I do not consider this a valid approach, as it limits the thought process of the individuals. In some cases it also leads to conflict over which category a specific cause belongs to, as it may seem valid under more than one. Some packages allow categories to be added to the existing ones, but using software normally inhibits the thinking process of the individuals carrying out RCA.

In my opinion RCA yields the best results when carried out in a team. A single individual may not be able to figure out all the factors that lead to a problem, because one person has limited knowledge and may also bring in bias. A team brings a greater level of knowledge and experience, and the outcome is more effective. For example, if there is a problem in software, the system design analyst will tend to propose causes that crop up from the design of the software, while the developer will tend to identify causes associated with the code. Thus a team helps to identify all the causes and to select the best candidate as the root cause.


Root Cause Analysis is a very useful method for finding the underlying causes of problems, but organizational support is very important to make it a success. The best part of RCA is that it can be applied to almost anything that is experiencing a problem. My opinion is that RCA should be used extensively, especially in the field of software engineering, but its use is not limited to that domain. It can be applied very successfully to other engineering domains, medical sciences, manufacturing processes, organizational issues and even our daily life. The combination of “5-WHY’s” and “Ishikawa Diagram” gives you a powerful yet easy to use tool for RCA. All that is needed to start with RCA is a pen, a sheet of paper and an open mind.

Here are the links to RCA's Introduction, Phases and Techniques.

Thursday, September 29, 2016

Techniques (Methods) of RCA: 5-Why's and Ishikawa Diagram

Techniques/Methods used to perform RCA:-

There are many techniques that can be used to carry out RCA but “5-WHY’s” and “Ishikawa Diagram” are the most popular ones and hence I am explaining them in detail in this blog post.


5-WHY’s:-

This technique was developed by Sakichi Toyoda, founder of Toyota Industries Co. Ltd. In this technique a series of five questions is asked in order to reach the root cause. In some cases more or fewer than five questions are needed, as five questions may not always be sufficient to lead to the root cause. But if the questions asked are proper and within context, then in normal circumstances 5-WHY’s will lead to the root cause. The problem to be analyzed is written down and the question asked is generally why it happened. The answer to the question is written down, and this process is iterated until the root cause is reached.

This technique is very basic in nature, takes a fairly small amount of time and does not require any software or other materials. All you need is paper and a pen, and you can start, as there is no analysis of statistics involved. However, only one root cause can be found for the problem being analyzed using this technique. Also, if two different people analyze the same problem using this technique, they may point out different root causes. If, however, the answer to each “WHY” question can be verified on the spot where the problem occurred, the issues discussed above can be avoided.

An example of how to find the root cause by using 5-WHY’s:-
1. Why is your computer not operating?
      - Because the operating system crashed.
2. Why did it crash?
      - Because it was infected by a virus.
3. Why was it infected by a virus?
      - Because I had not installed anti-virus software on my computer.
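
The question-and-answer chain above can also be written down as data. The following is only a minimal sketch in Python: the names `Why` and `root_cause` are my own, chosen for illustration, and the code simply treats the last answer in the chain as the root cause, which assumes that the questioning was stopped at the right point.

```python
from dataclasses import dataclass


@dataclass
class Why:
    """One step of the chain: a 'why' question and its answer."""
    question: str
    answer: str


def root_cause(chain):
    """Return the last answer in the chain, which 5-WHY's treats as the root cause."""
    if not chain:
        raise ValueError("at least one question/answer pair is needed")
    return chain[-1].answer


# The computer example from above, one entry per question.
chain = [
    Why("Why is your computer not operating?", "The operating system crashed."),
    Why("Why did it crash?", "It was infected by a virus."),
    Why("Why was it infected by a virus?", "No anti-virus software was installed."),
]

for depth, step in enumerate(chain, start=1):
    print(f"{depth}. {step.question}")
    print(f"      - {step.answer}")

print("Root cause:", root_cause(chain))
```

Keeping the chain as a list makes it easy to add a fourth or fifth question later if the last answer turns out not to be the real root cause.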

Ishikawa Diagram:-

The Ishikawa Diagram is one of the oldest techniques used for RCA and was developed by Kaoru Ishikawa, who used it in the 1960s. It is also known as cause and effect analysis or the fishbone diagram, because its shape resembles the skeleton of a fish.

In this technique all the possible causes of the problem and their effects are listed. An Ishikawa diagram generates and sorts hypotheses about possible causes of problems within a process by asking participants to list all of the possible causes and effects for the identified problem. The diagram shows the links between the events and their causes, which could be actual or potential. Thus a very large amount of information can be represented using this technique. This information is then used to generate ideas as to why the problem (or cause) occurred and what its possible effects could be.


                      Fig. A sample template showing the layout of the “fishbone diagram”.

A template should be constructed as shown in the figure. The effect (or problem) should be written in the box on the right-hand side. It should reflect what is happening and must be defined without any ambiguity. The categories are identified and written at the top of the slanting lines that come out from the horizontal line. Many authors give a specific list of categories into which the causes should fall, but in my opinion the categories should depend on the type of problem, so use the categories that suit your problem best. Once the categories have been identified, the underlying causes in each category are found and written as sub-branches of the categories. After this step, questions of the form ‘why did this happen?’ are asked to identify why each cause took place. The answers to these questions form the sub-branches of the cause. There is no restriction on how many questions should be asked; the questioning should continue until almost all aspects of why the cause took place have been covered. Once all the causes have been discussed and dissected in detail, the root cause can be traced. The best way to do this is to establish the chain of events in such a way that the answers to the ‘why’ questions trace back to the effect block.
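
If you prefer to keep the diagram in a text file, the same structure of effect, categories, causes and ‘why’ sub-branches can be held in nested dictionaries. This is only a sketch: the function name `render_fishbone` and the build-failure data are hypothetical and picked for illustration, and the indented outline it prints is a stand-in for the drawn diagram, not a replacement for it.

```python
def render_fishbone(effect, categories):
    """Return the diagram as an indented text outline, one line per branch."""
    lines = [f"Effect: {effect}"]
    for category, causes in categories.items():
        lines.append(f"  {category}")
        for cause, whys in causes.items():
            lines.append(f"    - {cause}")
            for why in whys:
                lines.append(f"        why? {why}")
    return "\n".join(lines)


# Hypothetical example data; the categories are chosen to suit the problem.
effect = "Nightly build fails"
categories = {
    "Code": {
        "Unit test breaks": ["Test depends on the system clock"],
    },
    "Environment": {
        "Build server runs out of disk space": ["Old build artifacts are never deleted"],
        "Dependency download times out": [],
    },
}

outline = render_fishbone(effect, categories)
print(outline)
```

A cause with an empty list is one for which no ‘why’ question has been answered yet, which makes the gaps in the analysis easy to spot.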

The biggest advantage of the Ishikawa diagram is that it gives all the causes that may have an effect on the system. This is useful, as it can lead to the discovery of more than one root cause. It is a very useful technique when a team analyzes the problem, as it keeps everyone involved. The ideas are arranged in logical groups, and one cause leads to another. On the other hand, the technique only gives what could be the root cause of a problem. If countermeasures are simply applied to each listed cause to check whether they solve the problem, time, money and effort can be wasted. To establish a root cause, data must be collected and used to verify each cause that is thought of as a root cause. As the diagram tends to be complex and large, drawing it carelessly may result in important causes being overlooked, which must be avoided. The best part is that it is easy to carry out, and you don’t need any software or other tools to get started.

Besides ‘5-WHY’s’ and ‘Ishikawa Diagram’ there are a number of other methods devised over the years. All of them have their particular importance for the specific fields in which they can be employed. Cause and Effect Analysis (Tree Diagram), Failure Mode and Effects Analysis, Pareto Analysis, Fault Tree Analysis, Bayesian Inference, Cause Mapping, Barrier Analysis, Change Analysis, Causal Factor Tree and Analysis, Taproot, Apollo Root Cause Analysis (ARCA), RPR Problem Diagnosis, Kepner-Tregoe Problem Solving and Decision Making, and Management Oversight and Risk Tree (MORT) Analysis are the other RCA techniques that are widely used.

The next and final post on RCA on my blog will be an in-depth analysis of root cause analysis.

Monday, September 26, 2016

Different Phases of Root Cause Analysis.

Phases of Root Cause Analysis
The method of RCA has seven discrete phases. Each of these phases makes a significant contribution to the analysis. These phases should be followed for successful execution of root cause analysis.

The seven phases are:-
1. Define the problem clearly:-
This is the first and the most critical phase of RCA. The problem to be analyzed must be clearly defined. Ambiguities should be avoided, because if you get this phase wrong the entire process and effort may be wasted. It should be clearly stated what is to be prevented from recurring.

2. Gather the required data or evidence regarding the problem to be analyzed:-
To carry out RCA there must be some evidence or data available to prove that a problem exists within a system. The impact of the problem, along with how long it has existed, should be carefully recorded. The situation should be analyzed fully before the causes that affect the problem are taken into account. With data, the outcome of the actions can be predicted.

3. Identify the causal relationships related to the problem defined in phase 1:-
In this phase all the events that led to the problem are figured out. A sequential relationship is established between all the events that occurred. The conditions which existed before and after the occurrence of the problem should be determined. The problems surrounding the central problem should also be analyzed, because a resulting problem could have been triggered by some other problem.

4. Identify the causes which, if changed or removed, will help to prevent the recurrence:-
The causes that have the biggest impact on the system are identified. The reason for the existence of the causal factors is found out. The causes that occur most often or have the maximum impact on the performance of the system are normally the ones that should be changed or removed.

5. Figure out the corrective actions (or solutions) to prevent the recurrence of the problems or causes:-
The kind of corrective actions (or solutions) to be taken should be determined. These actions should immediately control the problem and should help to prevent its future occurrence. To implement the actions or solutions, the following factors must be considered:-
a. The actions or solutions must be feasible to implement. This includes the cost of implementation and resources in terms of time, effort, manpower etc.
b. The actions must be aligned with the objectives for which RCA is being done.
c. What new risks would be introduced by these actions? Will the system be able to handle these new risks or not?

6. Implementation of the recommended corrective actions (or solutions):-
The solutions or corrective actions that would prevent the problem from recurring are recommended and implemented. How to implement them is also considered, as the implementation should not violate any of the factors given in phase 5.

7. Analyze the recommended corrective actions (or solutions) to check their effectiveness:-
This phase is also called the follow-up phase. It deals with making sure that the implemented corrective actions (or solutions) are effective enough to prevent the recurrence of the same type of errors or problems. Periodic tracking is done to review whether the corrective actions are implemented as desired and are functioning properly. If a problem which was supposedly corrected occurs again, the corrective actions must be analyzed to figure out why they were not effective.
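
Phase 4 asks for the causes with the most occurrences. If the data gathered in phase 2 records one cause per incident, the counting can be done in a few lines of Python. This is only a sketch under that assumption: the incident list is made-up illustration data and `rank_causes` is a name I have chosen. It ranks by frequency only, so the impact of each cause still has to be judged separately.

```python
from collections import Counter

# Hypothetical incident log: each entry is the cause recorded for one occurrence.
incident_causes = [
    "missing input validation",
    "expired certificate",
    "missing input validation",
    "disk full",
    "missing input validation",
    "expired certificate",
]


def rank_causes(causes):
    """Return (cause, count, share of all incidents) tuples, most frequent first."""
    counts = Counter(causes)
    total = sum(counts.values())
    return [(cause, n, n / total) for cause, n in counts.most_common()]


ranking = rank_causes(incident_causes)
for cause, n, share in ranking:
    print(f"{cause}: {n} occurrences ({share:.0%})")
```

The same ranking can be run again during the follow-up phase to see whether the causes that were addressed have actually dropped out of the list.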

Previous post on Root Cause Analysis: Introduction. Stay tuned for more.

Root Cause Analysis - An Introduction

“Do not look where you fell but where you slipped.”
This proverb is a very significant piece of wisdom. What it means is that to find the answer to your problem you should look not only at where it shows up but also at its source. In today’s world we all face problems, and most of them cannot be avoided, but what can definitely be avoided is the recurrence of the same problem. To avoid encountering the same problem again, the reason why the problem came up must be identified, yet it is human tendency to rectify only the obvious effects of the problem. The approach or method used to look for the root causes is called Root Cause Analysis, popularly known as RCA.

This blog post provides the definition of root cause analysis, its principles and the types of causes. Even though the concept of RCA has universal application, I have written this post with the software engineering perspective in mind. Even so, it has been written in such a way that someone with no software engineering background can understand it.

In today’s world software development is getting more and more complex. Many software projects fail due to improper requirements, and those which somehow succeed are not always perfect. There is always a need to fix errors in the software. Finding and fixing errors or faults in software is not an easy task, and it is very challenging to find the errors that are critical to the functioning of the software. Very often software engineers tend to look for the errors that are obvious, and in most cases the steps taken to remove them are not enough to eradicate the problem. The obvious problems are removed, but the same problems return later to haunt everyone, including the designers and developers and, most importantly, the stakeholders.

What is needed is a way of eradicating problems such that a problem, once solved, does not occur again. This is normally accomplished by using Root Cause Analysis, popularly known by its three-letter acronym RCA. RCA is a method in which the main focus is to identify the root causes due to which the error occurs or the problems come up. RCA deals with determining what happened, why it happened and how to reduce the possibility of the same kind of problem recurring. The method works on the belief that if problems are to be removed fully, you must concentrate on their root causes. Eliminating or correcting the root causes tends to solve the problem in the long run, as opposed to correcting only the obvious symptoms, which solves the problem for now but leaves it bound to happen again, possibly in an even worse form.

RCA is a process that is both iterative and reactive. By iterative I mean that RCA has to be repeated a certain number of times, as doing it once will normally not solve the problem fully and stop its recurrence. Hence RCA is a tool that works on the principle of continuous improvement. By reactive I mean that RCA is done once an event has occurred. Once the desired level of expertise with the technique has been achieved, the method becomes proactive, which means that RCA can then be used to forecast whether an error or problem is likely to occur. Such an approach helps to prevent future occurrences of the same kind of problem.

Principles of RCA:-
1. The main aim of RCA is performance improvement by removing the root causes of a problem, as this is more effective than removing the obvious symptoms.
2. For RCA to be effective, a systematic approach must be followed. When conclusions are drawn about the causes (or problems), they must be documented, as this provides the evidence for all the causes.
3. Normally for a given problem there can be one or more root causes.
4. Causal relationships between the root cause (or causes) and the defined problems must be established for the RCA to be effective.

Three basic types of causes:-

  • Physical Causes:- Tangible or material items such as the motherboard, hard disks etc. could have failed.
  • Human Causes:- The people working on the system did something wrong, like entering wrong values or not following the operating procedures properly.
  • Organizational Causes:- These include the policies or processes of the organization which sometimes contribute to the causes. For example, no one was responsible for the regular check of the power supply to the database servers.
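
When the causes found during an analysis are recorded, it can help to tag each one with its type so that they can be reviewed type by type. The snippet below is a small sketch in Python: the `CauseType` enum and the `group_by_type` helper are names I made up for illustration, and the labels are assigned by hand from the examples in the list above.

```python
from collections import defaultdict
from enum import Enum


class CauseType(Enum):
    PHYSICAL = "physical"
    HUMAN = "human"
    ORGANIZATIONAL = "organizational"


# Causes taken from the examples in the list above, labeled by hand.
causes = [
    ("Hard disk failed", CauseType.PHYSICAL),
    ("Wrong values were entered", CauseType.HUMAN),
    ("Operating procedure was not followed", CauseType.HUMAN),
    ("Nobody was responsible for checking the power supply", CauseType.ORGANIZATIONAL),
]


def group_by_type(labeled_causes):
    """Collect cause descriptions under their cause type."""
    grouped = defaultdict(list)
    for description, cause_type in labeled_causes:
        grouped[cause_type].append(description)
    return dict(grouped)


grouped = group_by_type(causes)
for cause_type in CauseType:
    print(cause_type.value, "->", grouped.get(cause_type, []))
```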
Further reading: RCA Phases and Techniques