A framework to fix issues

When I joined Babylon Health back in 2016, the situation was less than ideal. Morale was low, and so was productivity. People were putting in a lot of hours and effort, but it just didn’t seem to be heading anywhere. It was a troublesome codebase touched by many hands, some of them long gone. If this sounds familiar, look out for the following signs:

Bugs are being reported at an alarming pace.
Features sometimes take two or three times as long as expected.
There is little to no discussion on Pull Requests.
Work is not tracked and Pull Requests don’t have a ticket assigned.
No technical discussions are happening.
People are not giving feedback to each other.
Features are not discussed before code materializes.

Good Strategy / Bad Strategy by Richard P. Rumelt offers a good lens for looking at these problems. Although the framework is meant for the macro level (e.g. a department or a whole company), I find it quite helpful even in simpler situations like the one described (e.g. a single team). I admit that I have adapted the goal of each step, but I have nevertheless found them useful in the recent past. Maybe you will too.

1. Diagnose existing problems

This is the hardest step, because sometimes we are the problem. The root cause of a series of seemingly unrelated problems may be a lack of good leadership or management skills. Maybe a manager is not delegating enough and becomes the bottleneck. Or maybe they are micromanaging the team. Perhaps they are demanding accountability without ever giving the team autonomy. Start by identifying problems on your own, then collect the views of the rest of the team. Look for common themes. If you haven’t built enough trust with your team, collect feedback anonymously. Present the problems to the team and get their buy-in. Getting the team to agree that the items presented are real problems can sometimes be quite hard. Above all, listen before jumping to conclusions. Some problems are symptoms of the same root cause, and untangling them is the work you must do as a manager.

At Babylon, although there were a lot of problems, they stemmed from only a few things:

No management layer buffering the team from a constant barrage of work.
No management negotiating time to pay down tech debt.
Little trust between team members, and therefore feedback was either poorly received or not given at all. Communication was broken.

2. Create guiding policies

Define policies that mitigate the identified problems. I like to think of this step as a way to plug each problem. For example, one problem identified was that work was not being tracked. Explore why people resist creating tickets. Is it because the tooling is bad? Or perhaps because tickets would affect sprint velocity and scope? Maybe tracking work is simply not valued across the team. It’s helpful to dig deep into the root cause of a problem when defining the guiding policies. Another problem was that Pull Requests were opened without any context (e.g. no RFC or Product Spec). It might help to show the team what a good RFC looks like and how it gives context to everyone. Also explain that a good RFC doesn’t have to be long and tiresome. Set up rules for when an RFC needs to be approved before coding starts.

3. Take action

In most cases, taking action comes down to one thing: having hard conversations when policies are broken. If an engineer consistently opens untracked work without any context, they need to be held accountable. In other cases, taking action requires getting buy-in from the business to tackle tech debt instead of continuing to build on top of a broken codebase. Some actions are your own responsibility, for example buffering the team from distractions or removing blockers.

Although I didn’t follow this framework at Babylon, all those initial problems, along with their root causes, were eventually addressed. Plenty of mistakes were made, but with the benefit of hindsight it’s possible to address similar problems systematically. You can also measure how successful these changes are:

Engineering churn rate.
Number of bugs reported over time.
Quality of the work released and its impact on the business and the team.
Qualitative feedback from the team (e.g. using something like Peakon).