How to triage bugs, or: avoiding “just one more quick bug fix”, because that is really, really, really, really, really expensive

What is the problem, exactly?

It is easy to fall into the trap of adding “just one more quick bug fix” to the build before sending the software out to the customer. Doing this repeatedly can cause lengthy delays because, typically, any change can affect other areas of the software, creating a snowball effect of further changes that in turn need to be triaged and handled.

Since time is money, delivery delays cost the company money it did not plan to spend on that part of the project. These delays can also damage your relationship with your customer. I believe the golden question to ask your customer is “how likely are you to recommend us?”, and if they are constantly irritated by unforeseen delays, they may become less and less likely to recommend you, and may even stop doing business with you.

But it will only be quick, and we are really, really sure this is the last one

The real issue, though, is triaging a bug on the strength of its being “just one more quick bug fix”: that description implies it is up for grabs simply because it is “quick” and “just one more”. It says nothing about the criteria that should actually be used when triaging.

But our test coverage is excellent, and code is only merged to the main pipeline after all automated tests are run

Good for you! Then, in theory, your regression tests should catch most of the important issues. And if the tests were added at the time of development, even better. Perhaps this post isn’t for you, then.

So what is the way to triage bugs?

I am a big advocate of agile software development within a Scrum team, because it makes it possible to deliver software regularly. Even if the process is not perfect, a team triage session, which basically means gathering the business, technical and QA perspectives* together in one session, lets you quickly accept or reject the tickets that will be worked on.

To ensure the team is not wasting company time, and with it a whole bucketload of money, it is important to reject tickets (features, improvements, bugs and so on), or otherwise scale the work in each ticket down to only the essentials.

Let me say that again.

It is important to reject tickets: it is important to reject features, improvements, bugs…

How important is it to reject bugs?

Really, really, really, really, really important 🙂

How to quickly triage bugs, to ensure your software release does not become very expensive

In a session with your team, with all three perspectives* present, triage bugs based on the risk of the bug reaching production, which I assess using likelihood and impact. Also triage bugs on the importance of the affected feature to the customer, which I will call customer value. And finally, as with any ticket, consider the QA perspective: the risk of doing development work in this area, again assessed using likelihood and impact.

  1. Risk of this bug in production, “risk of bug”: Likelihood of this happening in production
  2. Risk of this bug in production, “risk of bug”: Impact of this happening in production
  3. Customer value: Is this bug affecting any feature that the customer will be using?
  4. Risk of development work in this area, “risk of bug fix”: Likelihood of software development on this bug fix affecting other areas of the software in production
  5. Risk of development work in this area, “risk of bug fix”: Impact of software development on this bug fix if it affects other areas

If some of these terms sound familiar, that is because they come from the ISTQB Agile certification I obtained.

Let’s go through an example

Bug: Bank statement shows the balance in grey instead of green for positive balances

  1. Risk of bug, likelihood: High
    • 100% likely, as this is repeatable for certain scenarios.
  2. Risk of bug, impact: Low
    • Team assesses and realises this will only happen for a certain type of customer account, and these make up only 2% of the customer base.
    • Team checks production logs and can see that those customers mostly check balances using another screen. In the last 24 hours, no customer has checked a balance using that particular screen.
    • The grey is the default text colour used for all text on the screen, so it does not stand out as unusual.
  3. Customer value: Low
    • This grey text colour does not prevent the customer from checking their balance, so the answer is no.
    • In addition, the customers have been promised new features that will enable the bank to sell them new products through this new app, which will greatly increase the bank’s revenue.
    • The product manager is under a lot of pressure to get that product out to market quickly, and needs the team to focus on it as a priority.
  4. Risk of bug fix, likelihood: High
    • QA states that this is an area of code with little UI test coverage, since it is a new app, so fixing this bug will most likely introduce new bugs in this area.
  5. Risk of bug fix, impact: High
    • The developer states that changing the style here could render some of the balance unreadable, because a lot of business logic sits in the UI instead of the API, as this new app was rushed to market. The balance must always be readable, so the impact is high.

By now the answer should be obvious: this ticket should be rejected.

This shows that triaging on “how long it takes to fix” can give the wrong answer, since this is exactly the kind of bug the development team will typically say is “just five minutes to fix”.
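
To make the five criteria concrete, here is a minimal Python sketch that records the ratings from the grey-balance example above and applies a decision rule to them. The Low/Medium/High scale mapped to 1, 2 and 3, the likelihood × impact risk score, and the accept/reject rule in decide() are my own illustrative assumptions, not an ISTQB formula or a prescribed method; in a real session, the team's judgement makes the call.

```python
from dataclasses import dataclass
from enum import IntEnum


class Level(IntEnum):
    """Assumed three-point rating scale; the numeric values are illustrative."""
    LOW = 1
    MEDIUM = 2
    HIGH = 3


@dataclass(frozen=True)
class BugTriage:
    bug_likelihood: Level  # 1. risk of bug: likelihood of it happening in production
    bug_impact: Level      # 2. risk of bug: impact if it happens in production
    customer_value: Level  # 3. does it affect a feature the customer will use?
    fix_likelihood: Level  # 4. risk of bug fix: likelihood the fix affects other areas
    fix_impact: Level      # 5. risk of bug fix: impact if it does

    def risk_of_bug(self) -> int:
        # Likelihood x impact gives a score from 1 to 9 on this scale.
        return int(self.bug_likelihood) * int(self.bug_impact)

    def risk_of_fix(self) -> int:
        return int(self.fix_likelihood) * int(self.fix_impact)


def decide(t: BugTriage) -> str:
    """Hypothetical rule: reject unless the bug touches something the customer
    values AND leaving it in production is riskier than changing the code.
    Ties are rejected, erring on the side of not touching the code."""
    if t.customer_value == Level.LOW:
        return "reject"
    if t.risk_of_bug() <= t.risk_of_fix():
        return "reject"
    return "accept"


# Ratings from the grey-balance example.
grey_balance = BugTriage(
    bug_likelihood=Level.HIGH,
    bug_impact=Level.LOW,
    customer_value=Level.LOW,
    fix_likelihood=Level.HIGH,
    fix_impact=Level.HIGH,
)
print(grey_balance.risk_of_bug(), grey_balance.risk_of_fix(), decide(grey_balance))
# prints: 3 9 reject
```

Note that “how long it takes to fix” is not one of the inputs at all. A team might also prefer a third outcome, such as scaling the ticket down to its essentials, rather than a strict accept or reject.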

Conclusion

Imagine that you have another five to ten tickets like this up for triage. It is easy, and perhaps tempting, to fall into the trap of accepting them all on the basis that they will “just take five minutes to fix”. In reality, any change has an impact. And even if the “fix” itself takes “just five minutes”, what about the pipeline build, the retesting, and the new cycle if something breaks as a result of that fix?

Multiply that by five or ten, and you will have a frustrated team bogged down by irrelevant tickets, costing the company time and money.
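
The “multiply that” arithmetic can be sketched the same way. The figures below (5 minutes to fix, 40 minutes for a pipeline build, 60 minutes of retesting, and a 50% chance that each round breaks something else) are invented purely for illustration. The sketch also assumes every round costs the same and breaks something independently of the others, so the number of rounds follows a geometric distribution with an expected value of 1 / (1 - p).

```python
def expected_minutes_per_ticket(fix: float, pipeline: float, retest: float,
                                p_break: float) -> float:
    """Expected minutes to land one 'quick' fix.

    Assumes every round (fix, pipeline build, retest) costs the same and, with
    independent probability p_break, breaks something else and forces another
    round. The expected number of rounds is then 1 / (1 - p_break).
    """
    if not 0 <= p_break < 1:
        raise ValueError("p_break must be in [0, 1)")
    round_cost = fix + pipeline + retest
    return round_cost / (1 - p_break)


# Hypothetical figures, for illustration only.
per_ticket = expected_minutes_per_ticket(fix=5, pipeline=40, retest=60, p_break=0.5)

for tickets in (1, 5, 10):
    hours = per_ticket * tickets / 60
    print(f"{tickets} ticket(s): {hours:.1f} hours")
# prints:
# 1 ticket(s): 3.5 hours
# 5 ticket(s): 17.5 hours
# 10 ticket(s): 35.0 hours
```

Even with the fix itself held at five minutes, it is the overhead around it, repeated every time something breaks, that dominates the total.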

 

*Summary of the three perspectives:

  1. a business / customer-facing perspective: “what does the customer need?”,
  2. a technical perspective: “how do we implement a solution to this, which architecture…”,
  3. and a QA perspective: “are we building testable software, what are the acceptance criteria for each user story…”