Breaking the Stigma of Legacy Codebases

Article by Matt Schultz — Jul 21, 2021

With today’s rapidly evolving software landscape and its emphasis on clean, modern code, the term “legacy” often carries a stigma. From a developer’s perspective, legacy means large, complicated, and outdated, with questionable code quality and tenuous test coverage. Legacy software presents an array of unique problems, and achieving any meaningful productivity early on can be a challenge.

However, the product owner’s perspective is often quite different. A successful legacy application has stood the test of time and has continued to (at the very least) function and generate revenue. The reality is that some very ugly code has generated a lot of revenue. As a developer, it’s easy to lose sight of this. Clean, modern code isn’t the goal; a successful business is. What developers prioritize doesn’t necessarily keep the business afloat.

Now, that’s not to say developers should toss any notion of software craftsmanship out the window—quite the opposite. Instead, developers must learn to thrive and apply their abilities within the constraints of a legacy system. It’s a delicate balancing act of developing new features while simultaneously keeping technical debt under control.

5 Tips for Working with Legacy Codebases

Cut Your Predecessors Some Slack

When inheriting a legacy codebase, it’s easy to look at its previous developers with disdain. The reality is every ugly line of code, unoptimized query, and bloated dependency was a technical solution that was not afforded the luxury of being implemented in a vacuum. This is code that had to be written yesterday or incurred a last-minute change in requirements—it’s real-world, and it’s gritty. There are a multitude of external factors that contribute to the code you see beyond simply the skill of the developer that wrote it. So, cut those developers a bit of slack, because you too may need to implement a less-than-ideal technical solution due to factors out of your control.

Decipher the Past

It’s an obvious thing to state, but using the documentation at your disposal is key to getting up to speed with a legacy codebase. What’s not so obvious is the form that this documentation can take. If you’re lucky enough to inherit a codebase whose documentation describes every complicated, business-critical component in detail, great. However, this is often not the case, and we need to resort to alternative means of figuring out the what and why of the code we’re maintaining.

In the absence of traditional documentation, tests are an excellent fallback because they describe the expected behavior of the part of the system they cover. They are effective enough that many teams now reserve more traditional forms of documentation, such as code comments, for the most complicated bits of logic.

In lieu of a test suite, there are some useful tools provided by our version control software (for the purposes of this example, Git). Having the ability to browse a repository’s commits (snapshots of the codebase) with git-log and git-show is certainly helpful, but more often than not, the current state of a file in our codebase is the result of multiple commits potentially spread across long periods of time. This is where git-blame comes into play. With git-blame, we can see a line-by-line breakdown of the author and commit that last changed each line.
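As a rough illustration of what git-blame reports, the sketch below runs `git blame --line-porcelain` and reduces its output to one (commit, author, code) entry per line. The function names are made up for this example, and the sketch assumes `git` is on the PATH and that the file is tracked in the repository.

```python
import subprocess
from typing import List, Tuple

BlameRow = Tuple[str, str, str]  # (commit hash, author name, source line)


def parse_line_porcelain(output: str) -> List[BlameRow]:
    """Parse the output of `git blame --line-porcelain` into blame rows."""
    rows: List[BlameRow] = []
    commit, author = "", ""
    expect_header = True  # the next non-empty line starts a new entry
    for line in output.split("\n"):
        if line.startswith("\t"):
            # A tab-prefixed line is the source line itself and closes the entry.
            rows.append((commit, author, line[1:]))
            expect_header = True
        elif not line:
            continue
        elif expect_header:
            # Entry header: "<commit hash> <original line> <final line> [<count>]"
            commit = line.split(" ", 1)[0]
            expect_header = False
        elif line.startswith("author "):
            author = line[len("author "):]
    return rows


def blame(path: str, repo: str = ".") -> List[BlameRow]:
    """Run git-blame on one file and return a row per line of that file."""
    result = subprocess.run(
        ["git", "-C", repo, "blame", "--line-porcelain", "--", path],
        capture_output=True, text=True, errors="replace", check=True,
    )
    return parse_line_porcelain(result.stdout)
```

Calling `blame("billing.py")` (a hypothetical file name) inside a repository would return one row per line, which can then be passed to git-show to read the full commit message behind a given line.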

Sometimes, the most recent commit that changed a line doesn’t tell the whole story of why it was changed. Fortunately, with git-diff, we can compare a commit with any previous commit to visualize the changes between them.
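A minimal sketch of that comparison, assuming `git` is on the PATH: the helpers below only build and run `git diff` between two revisions, optionally limited to one file. The helper names are invented for this example; `<commit>~1` is Git's notation for a commit's first parent.

```python
import subprocess
from typing import List, Optional


def diff_command(older: str, newer: str, path: Optional[str] = None) -> List[str]:
    """Build the git-diff invocation that compares two revisions."""
    cmd = ["git", "diff", older, newer]
    if path is not None:
        cmd += ["--", path]  # restrict the comparison to a single file
    return cmd


def diff_against_parent(commit: str, path: Optional[str] = None) -> List[str]:
    """Compare a commit with the commit immediately before it."""
    return diff_command(f"{commit}~1", commit, path)


def run(cmd: List[str], repo: str = ".") -> str:
    """Run a git command inside `repo` and return its output."""
    result = subprocess.run(
        cmd[:1] + ["-C", repo] + cmd[1:],
        capture_output=True, text=True, errors="replace", check=True,
    )
    return result.stdout
```

For instance, `run(diff_against_parent("abc123", "billing.py"))` would show what a (hypothetical) commit `abc123` changed in one file, and `diff_command` accepts any two revisions when the interesting change sits further back in history.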

While it’s a more hands-on approach, these tools can give us some valuable insight into the code we’re looking at when we’re left with no better alternative.

Document the Present

There is a mantra, albeit a somewhat clichéd one, that says one should leave things in a better state than one found them. With regard to legacy software, this means if you didn’t inherit a test suite, create one of your own. If there isn’t documentation, write some. Now, remember, we’re bound by the constraints of the real world (most notably time), so focus on what will give you the most bang for your buck.

When writing tests, ensure that any business-critical components are well covered. For everything else, you can get by with some broad acceptance tests that at least ensure the most common use cases are handled.
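One way to start is with tests that simply pin down what the code does today. The sketch below is an assumption-laden illustration: `legacy_invoice_total` is a hypothetical stand-in for an inherited, business-critical function, and the tests record its current behavior for a common case and an edge case.

```python
import unittest


def legacy_invoice_total(items, tax_percent=20):
    """Hypothetical stand-in for inherited code; amounts are in cents."""
    total = 0
    for price_cents, quantity in items:
        total += price_cents * quantity
    # Integer arithmetic: the tax is rounded down to a whole cent.
    return total + total * tax_percent // 100


class InvoiceTotalTest(unittest.TestCase):
    """Pins down current behavior so later changes can't alter it silently."""

    def test_typical_order(self):
        # 2 x 10.00 + 1 x 2.50 = 22.50, plus 20% tax = 27.00
        self.assertEqual(legacy_invoice_total([(1000, 2), (250, 1)]), 2700)

    def test_empty_order_costs_nothing(self):
        self.assertEqual(legacy_invoice_total([]), 0)

    def test_tax_is_rounded_down(self):
        # 0.99 plus 20% tax is 1.188, which the current code truncates to 1.18
        self.assertEqual(legacy_invoice_total([(99, 1)]), 118)
```

The last test documents a quirk rather than endorsing it; whether truncating the tax is correct is a business question, but once it is captured in a test, any change to it has to be deliberate.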

The same applies to writing documentation: focus on the complicated logic whose behavior can’t adequately be inferred from the test suite or commit history. When writing commits of your own, be as descriptive as possible, because someone years down the road may be relying on your commit messages to figure out how the code you wrote works. There’s nothing worse than a Git log filled with commits that simply say “updates”.

Lastly, don’t break git-blame, or if you must, do it all at once. Code formatters such as Prettier are great tools for ensuring a consistent, clean coding style, which makes them a good fit for a legacy codebase that may lack those things. However, when you run a code formatter for the first time, it touches virtually every file. This means when using git-blame, most lines of code will now show the code formatting commit rather than the commit that actually introduced that logic. If you do introduce a code formatter to an existing codebase, try to perform the initial code formatting in one big commit. This way, we can easily configure git-blame to ignore it using --ignore-revs-file.
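As a sketch of how that might be set up: the ignore file lists one full commit hash per line, with `#` starting a comment. The helper below appends a formatting commit to such a file. The file name `.git-blame-ignore-revs` is a common convention rather than a requirement, and the helper itself is made up for this example.

```python
from pathlib import Path

IGNORE_FILE = ".git-blame-ignore-revs"  # conventional name; any path works


def record_ignored_revision(repo_root, commit_hash, note):
    """Append a full commit hash, with a comment explaining it, to the ignore file."""
    commit_hash = commit_hash.lower()
    is_hex = all(c in "0123456789abcdef" for c in commit_hash)
    if len(commit_hash) not in (40, 64) or not is_hex:
        raise ValueError("expected a full (unabbreviated) commit hash")
    path = Path(repo_root) / IGNORE_FILE
    existing = path.read_text() if path.exists() else ""
    if commit_hash in (line.strip() for line in existing.splitlines()):
        return path  # already listed; nothing to do
    with path.open("a") as handle:
        if existing and not existing.endswith("\n"):
            handle.write("\n")
        handle.write(f"# {note}\n{commit_hash}\n")
    return path


# Afterwards, either pass the file explicitly:
#   git blame --ignore-revs-file .git-blame-ignore-revs <file>
# or configure the repository once so every git-blame picks it up:
#   git config blame.ignoreRevsFile .git-blame-ignore-revs
```

Committing the ignore file alongside the code means every contributor can apply the same configuration.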

Keep Your Dependencies Under Control

As mentioned before, maintaining a legacy codebase is a balancing act of developing new features and managing technical debt. One of the largest contributors to technical debt in a codebase is the introduction of dependencies. When dealing with time crunches, changing requirements, and all of the other factors that come with developing for a business in the real world, a third-party library that saves you from having to reinvent the wheel is a tempting prospect. However, if this becomes habit, it’s easy to get bogged down in a complicated web where your dependencies require different versions of the dependencies they share, making it difficult to upgrade one dependency without breaking others. This is commonly referred to as Dependency Hell, and it can be a serious problem when you really need to upgrade a dependency to patch a security flaw, for example.
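To make the conflict concrete, here is a toy model, not how any real package manager resolves versions. Each dependency states the range of a shared library it accepts, and the function looks for a version range that satisfies all of them. The package names are hypothetical.

```python
from typing import Dict, Optional, Tuple

Version = Tuple[int, ...]
Range = Tuple[Version, Version]  # (minimum, inclusive; maximum, exclusive)


def common_range(requirements: Dict[str, Range]) -> Optional[Range]:
    """Return the version range every requirement accepts, or None if they conflict."""
    low = max(minimum for minimum, _ in requirements.values())
    high = min(maximum for _, maximum in requirements.values())
    return (low, high) if low < high else None


# Two hypothetical dependencies that both rely on the same shared library.
requirements = {
    "report-builder": ((1, 0), (2, 0)),  # needs shared-lib >=1.0,<2.0
    "auth-client": ((2, 1), (3, 0)),     # needs shared-lib >=2.1,<3.0
}
conflict = common_range(requirements)  # None: no version satisfies both

# Only after report-builder ships a release accepting 2.x does a solution exist.
requirements["report-builder"] = ((2, 0), (3, 0))
resolved = common_range(requirements)  # ((2, 1), (3, 0))
```

In the first case, a security fix that only exists in shared-lib 2.x cannot be adopted until report-builder is upgraded too, and that upgrade may drag in changes of its own.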

So, how does one avoid this? Well, for starters, approach introducing new dependencies with extreme caution. This is easier said than done given how quick and easy a solution they are to the task at hand. Before introducing a new dependency, ask yourself: Is it really worth the time savings over rolling your own solution? If so, try to vet it to the best of your ability:

Is it actively maintained?
Does it have a healthy user base?
Are issues being addressed in a timely manner?

Even with the most thorough vetting, sometimes maintainers fall behind or simply abandon their projects. The goal isn’t to scare you away from dependencies—they are a necessary part of virtually every software project—it’s to encourage a more cautious mindset. And regardless of whether or not dependencies are a major pain point in your legacy codebase, using an automated dependency updater like Dependabot simplifies dependency management and helps ensure you don’t miss any critical security patches.

Resist the Urge to Rewrite

It’s easy to look at a legacy codebase and come to the conclusion that it needs to be rewritten. While it’s certainly possible that a codebase can become so laden with technical debt that a rewrite is the only way forward, oftentimes it’s a trap. Rewrites are massive endeavors: they almost always take longer than expected and can bring business operations to a grinding halt as new feature development takes a backseat to the rewrite.

Some might suggest that compartmentalizing old code while new code is written with a new framework or language is a viable option, and it is. But keep in mind that this also means your codebase will incur technical debt in the form of a fractured architecture.
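One possible shape for that compartmentalization, sketched under the assumption that requests can be dispatched by route name: a small router sends migrated routes to new code and everything else to the legacy handler. The class and handler names are invented for this example.

```python
from typing import Callable, Dict

Handler = Callable[[str, dict], str]  # (route, request) -> response


class MigrationRouter:
    """Sends migrated routes to new code and everything else to legacy code."""

    def __init__(self, legacy: Handler) -> None:
        self._legacy = legacy
        self._migrated: Dict[str, Handler] = {}

    def migrate(self, route: str, handler: Handler) -> None:
        """Register new code as the owner of one route."""
        self._migrated[route] = handler

    def handle(self, route: str, request: dict) -> str:
        handler = self._migrated.get(route, self._legacy)
        return handler(route, request)


def legacy_app(route: str, request: dict) -> str:
    return f"legacy:{route}"  # stand-in for the existing application


def new_invoices(route: str, request: dict) -> str:
    return f"new:{route}"  # stand-in for code written with the new framework


router = MigrationRouter(legacy_app)
router.migrate("/invoices", new_invoices)
```

The router is exactly the kind of seam the paragraph above warns about: until every route has moved, the team maintains two architectures plus the glue between them.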

Ultimately, finding success in maintaining a legacy codebase is about understanding and embracing its nature, rather than fighting against it. Time and budget are finite resources, and the perfect technical solution you may want to implement isn’t always possible. If you learn to work within those constraints, the stigma of legacy code might not be as bad as it seems.
