Engineering Health Essentials

In software development, I believe engineering health is a term that deserves attention and strategic focus. Sustainable software development is not just about the code we write or the features we ship; it’s about a proactive commitment to the underlying health of our engineering processes. Allocating 20-25% of our resources to engineering health has always been my preference. At this point, I feel like it is a necessity for sustainable development. In this article, I am going to explore engineering health. I’ll define it, share my thoughts, and look at why it’s important in the long run. Let’s dive in and see how it really impacts our work in software development.

[Image: the concept of engineering health, shown as a doctor character examining the health of a software system]

Defining Engineering Health

Engineering health is mostly about maintaining and enhancing the operational aspects of our software systems. Its scope extends well beyond merely keeping the lights on. Engineering health is the ongoing effort to keep software systems running smoothly, secure, and up to date while ensuring the software can handle today’s needs and adapt to tomorrow’s challenges.

Engineering health means staying one step ahead of security risks by always checking and improving your defenses. It goes beyond fixing things when they break. It’s also about making good things even better to stop problems before they start. This means making the way we build software faster and more efficient, and keeping our documentation fresh and current. It’s a bit like tuning a car regularly, not just repairing it when it breaks down. Dealing with technical debt is a big part of this too: we clean up old or clunky code so everything runs more smoothly in the long run.

In addition, engineering health includes the often overlooked yet critical keeping-the-lights-on tasks. These range from updating runbooks for smoother operational procedures to following up after incidents to prevent recurrence. With engineering health work, we aim to create a sustainable environment where engineers have the space and resources to tackle underlying issues. If we neglect these issues, they can have a snowball effect.

20-25% Resource Allocation

Setting aside 20-25% of our team’s effort for engineering health is really important. It’s a necessary plan to stop small problems from growing into bigger ones. Think of it like this: if you have a team of 10 engineers working for three months, 2 or 3 of them should focus on fixing and improving things, not just adding new features. Obviously, it shouldn’t always be the same engineers shoveling problems. In this way, your team is better prepared. Instead of rushing to fix emergencies, you are actually making your system better and stronger over time.
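To make the arithmetic concrete, here is a minimal sketch in Python. The function name `health_budget` and the 13-week length for "three months" are my own assumptions for illustration; adjust them to your planning cycle.

```python
def health_budget(engineers: int, weeks: int, percent: int) -> float:
    """Return the engineer-weeks reserved for engineering health.

    `percent` is a whole-number share of total capacity, e.g. 20 for 20%.
    """
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    return engineers * weeks * percent / 100


if __name__ == "__main__":
    # Assumed example: 10 engineers, a 13-week quarter.
    low = health_budget(10, 13, 20)
    high = health_budget(10, 13, 25)
    print(f"{low} to {high} engineer-weeks")  # 26.0 to 32.5 engineer-weeks
```

Out of 130 engineer-weeks in that quarter, 26 to 32.5 go to engineering health, which is where the "2 or 3 engineers out of 10" figure comes from.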

Redefining Resource Allocation

Let’s get creative with how we handle engineering health. It’s not just about giving some of our team’s time to maintenance tasks. Looking at the bigger picture, we should invest in tools that can do the boring, repetitive stuff for us, put money into training so our team gets even smarter, and set aside time for group think-sessions where we can come up with ways to stop problems before they even start. Instead of going in circles, fixing the same problems over and over, we should focus on eliminating these issues at the source.

Creative Scheduling

A smart move is to have team members take turns working on engineering health tasks. It’s like a round-robin DNS approach: everyone gets a turn, which keeps maintenance steady and lets everyone share ideas and responsibility. This way, keeping our systems in good shape becomes a team effort.
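A rotation like this is easy to generate. The sketch below is one possible way to do it, not a prescribed process: the function name `build_rotation`, the example names, and the choice of two people per sprint are all assumptions made for illustration.

```python
from itertools import cycle, islice


def build_rotation(engineers: list[str], sprints: int, per_sprint: int = 2) -> list[list[str]]:
    """Assign engineers to engineering health duty in round-robin order."""
    if not engineers:
        raise ValueError("need at least one engineer")
    if not 1 <= per_sprint <= len(engineers):
        raise ValueError("per_sprint must be between 1 and the team size")
    pool = cycle(engineers)  # endlessly repeats the team in order
    return [list(islice(pool, per_sprint)) for _ in range(sprints)]


if __name__ == "__main__":
    team = ["Ana", "Ben", "Chen", "Dee", "Eli"]  # hypothetical team
    for number, on_duty in enumerate(build_rotation(team, sprints=5), start=1):
        print(f"Sprint {number}: {', '.join(on_duty)}")
```

With five engineers, two per sprint, and five sprints, everyone ends up on duty exactly twice, so nobody is stuck shoveling problems permanently.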

Leveraging Cross-Functional Collaboration

Imagine a scenario where multiple teams need synthetic test data for performance testing, but there’s no clear ownership of this task, so everyone struggles. A clever solution would be to create a cross-team alliance specifically to tackle this issue. This alliance, made up of members from different teams, would take charge of generating and maintaining the synthetic test data. It’s like building a team that takes care of a common hurdle, so all teams can focus better on their specific tasks. In this way, engineering health work spans multiple teams.

I also see high value in building teams that enable others. I generally call them ops teams. They not only enable other teams but also make them more efficient. This is a strategic move to handle engineering health work at an organizational level.

A Manager’s Perspective

In my role, I push for engineering health proactively. This involves encouraging engineers not only to identify problems but also to craft creative solutions and tackle root causes. Consider the time we added a new testing system to our workflow tool, in which we ran void tasks to verify the system’s overall behavior. It helped us become aware of problems before they got big. This significantly improved our alert system in the distributed setup, catching potential issues early on.
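The general idea can be sketched in a few lines of Python. This is not our actual implementation: I am assuming here that a void task is one that does no real work and exists only to prove the pipeline can accept and finish a job. The thread pool is a stand-in for a real workflow tool, and the names `void_task` and `probe` are hypothetical.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout


def void_task() -> str:
    """Do no real work; completing at all is the signal we care about."""
    return "ok"


def probe(executor: ThreadPoolExecutor, timeout_s: float = 1.0) -> dict:
    """Submit a void task and report whether it finished within the deadline."""
    started = time.monotonic()
    future = executor.submit(void_task)
    try:
        healthy = future.result(timeout=timeout_s) == "ok"
    except FutureTimeout:
        healthy = False  # the system is stuck or too slow: raise an alert
    return {"healthy": healthy, "latency_s": time.monotonic() - started}


if __name__ == "__main__":
    with ThreadPoolExecutor(max_workers=1) as pool:
        print(probe(pool))
```

Running a probe like this on a schedule gives an early signal when the system slows down or stalls, before real tasks start failing.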

First, always be on the lookout for minor tweaks that can have major impacts down the line. Encourage your team to find these opportunities, because engineers often have a deeper understanding of the system’s intricacies. Second, help your team understand the significance of these tasks. It’s not just about addressing the problems we see. We should find the root cause and prevent problems from recurring. By developing solutions that not only fix but also enhance our systems, we empower our team. Third, start a feedback loop. Sometimes these efforts aren’t as fruitful as we hoped, and it’s impossible to improve them without a critical eye.

The Long-Term Benefits

Investing in engineering health brings long-lasting benefits to our organizations and projects. It’s like planting seeds that grow into a strong, healthy system over time. From my own experience, here are the lasting rewards we’ve seen from this approach:

  • Increased System Reliability: Regular attention to maintenance tasks means fewer unexpected breakdowns and a more reliable system.
  • Keeping the Lights On: Ensuring that software systems remain reliable and functional at all times means that they are less likely to experience downtime or unexpected failures.
  • Cost Efficiency: Proactive maintenance can be more cost-effective in the long run, preventing larger, more expensive issues.
  • Enhanced Team Morale: When teams are not constantly bogged down by unexpected issues, it leads to higher job satisfaction and motivation.

Allocating 20-25% of our resources to engineering health is a strategic decision that pays dividends in the long run. It’s about building a culture where maintenance and improvement are part of our daily routines. As leaders in the software development sphere, it’s our responsibility to steer our teams towards this sustainable practice, ensuring the longevity and success of our projects.
