Over the course of my years as a software engineer, I’ve slowly become more ~~curmudgeonly~~ deliberate about how to structure a codebase, and how to gauge its relative success.
In my early days I was myopically focused on what the code could do today. It was all about speed, and cranking out code as fast as possible. Tests were a nice-to-have. “Works on my machine” was a reasonable acceptance criterion. And I’m not sure I even knew the definition of the word “maintainable”.
Those times were great fun. And I considered myself to be a pretty 1337 coder.
Then I watched several codebases I’d worked on grind to an eventual halt because no one could understand them, or they were too hard to extend or debug, or they were so fragile that people were afraid to change anything about them lest they introduce a crazy bug that would explode after being deployed to production.
*Old man yells at cloud*
Now, I’m older and I’ve worked with a lot of different engineers on a lot of different codebases. Now, I have a very different opinion. Now, “maintainability” is one of the most important words in my vocabulary. Now, I care so much more than I did before about what the code will be able to do tomorrow. And perhaps most importantly, I care so much more about what $nextEngineer will be able to make this code do tomorrow than about what I myself might be able to do.
These are the things that allow your software to survive and thrive beyond the early days. The things that ensure that your business will be able to continue to grow and evolve at the same pace a year from now that it can today, that it won’t get bogged down by an un-maintainable, un-extensible code foundation that drags your engineering team’s velocity down towards zero.
The skills, experience and foresight that are required to ensure a maintainable codebase, to be a force multiplier that ensures that a breadth of current and future engineers will be able to achieve sustainable, high velocity working on the code—these are the traits that are at the top of my list when evaluating engineering candidates nowadays. It’s not how many lines of code you can produce nor how quickly—it’s how well those lines of code will hold up as the foundation for your business that grows and evolves over time.
Boiling it down
Occasionally I reflect back on this mindset transition and try to distill my thoughts on maintainability down into something concrete that I can try to communicate to other engineers. And if I had to choose one overarching theme—that I could boil down to a single sentence—that best captures the spirit of what I believe about maintainability these days, it would be this:
Structure your code so that you will catch your bugs at compile time, rather than at run time.
Move your bugs forward in time. There is no single thing that you can do that will have a more sustained impact on the medium-to-long-term velocity of your team than this.
For the rest of this post I’ll list off some more tactical examples of things that you can do towards this goal. Savvy readers will note that these are not novel ideas of my own, and in fact a lot of the things on this list are popular core features in modern languages such as Kotlin, Rust, and Clojure. Kotlin, in particular, has done an amazing job of emphasizing these best practices while still being an extremely practical and approachable language.
So, credit where it is due: the brilliant language designers of these and other languages deserve all of the kudos for bringing these ideas to the foreground of the software engineering zeitgeist. Today, I’m just here to sing their praises. 🙂
(Side note: spending some time writing code in a variety of new languages is a really amazing way to broaden your horizons and challenge your beliefs about software engineering best practices. You’ll find that it’s often much easier than you think to apply lessons learned from a foundational feature of one language to another language that doesn’t explicitly provide or emphasize that feature. I haven’t written Clojure in several years now, but I firmly believe that the time I spent writing in that language did more to improve my skills as a software engineer than anything else I’ve ever done.)
And now, without further ado, let’s get into it.
5 patterns and language features to help catch bugs earlier
1. Static types
This one can be a tough pill to swallow for folks who love Python, Ruby, Clojure, and other dynamically typed languages. And I might never convince some of you of this point. But this is a thing that has burned me enough times over the years that I don’t think I’ll ever change my mind on it.
Part of the allure of dynamically typed languages is that if you don’t have to spend time on all of the ceremony of defining types and declaring them on all of your method signatures, you can code more quickly and spend your time thinking about the business logic instead of the object model. And you can write more flexible, re-usable functions that operate on data rather than operating on types.
There’s some truth to those arguments, especially in the early prototyping phase of a project. But what I’ve repeatedly seen is that once a codebase in a dynamically typed language grows beyond a certain size, it becomes harder and harder to reason about and maintain. Over and over again I’ve seen a well-meaning developer working on one part of the code pass the wrong object type to a function they didn’t write in another part of the code, and when that function call occurs, the app crashes.
The worst part is that this error happens at run time. If you’ve already deployed the code to production without catching the bug, you may have a customer-facing outage on your hands, and now you have to go through a fire drill to roll back the change or push a hotfix. Depending on how bad it was, this may cost you customers. Even in the best-case scenario it’s stressful—and it has a high opportunity cost, as you have to pull some of your engineering team off to fight the fire.
Some argue that if such a bug makes it through to production, that is a sign that you didn’t add enough test coverage to ensure that the function would only be called with the correct argument types. My response to that is: yes, if you are diligent enough about test coverage, and you don’t make any mistakes in the test code itself, you might be able to avoid shipping most bugs of this classification. But testing is an art form in and of itself, and every one of your engineers must achieve a certain level of proficiency at it in order to clear this bar. And even if they do, we all still make mistakes from time to time.
Static types catch this entire class of bug at compile time. No ifs, ands, or buts about it. And it does not rely on the varying experience levels of your engineers: if they write some code that has a bug like this in it, it won’t compile, and it will never even make it into a PR. No matter how good or bad the test coverage is. Let’s offload this category of work to compilers instead of putting it on our engineers!
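As a minimal sketch of the idea in Kotlin (the `Order` type and `formatReceipt` function here are hypothetical, purely for illustration):

```kotlin
// A hypothetical domain type and a function with a statically declared
// parameter type.
data class Order(val id: String, val total: Double)

fun formatReceipt(order: Order): String =
    "Order ${order.id}: $${order.total}"

fun main() {
    println(formatReceipt(Order("A-1", 9.99)))  // Order A-1: $9.99
    // formatReceipt("A-1")  // Won't compile: a String is not an Order.
    // The mistake never reaches a PR, let alone production.
}
```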
I’ve become more and more convinced of this over time, to the point where I won’t even advocate for dynamically typed languages for prototypes anymore. Prototypes very often end up being promoted to products, if for no other reason than the code is already written. But if you’re going to promote a prototype to a product then you really ought to have sufficient test coverage to make sure your product is reliable, and at that point you’ll probably have invested the same amount of engineering effort that you would have put into building the prototype in a statically typed language like Kotlin or Go in the first place.
2. Null safety
Now we’ll get into some (hopefully) less controversial territory. You’ve probably heard the line about null references being a billion-dollar mistake. And if you’ve worked in a language that doesn’t provide compile-time null safety, you’ve surely encountered your fair share of silly bugs resulting in null pointer exception crashes at run time, or code that is littered with boilerplate null checks at the beginning of every single function call, or both.
Thankfully the trend in modern languages is to help us move these bugs forward in time by giving us a way to declare variables that are not allowed to be null. This is a core feature of C#, Kotlin, and TypeScript, among others. In Java, you can use Optional instead of allowing null. So we can let the compilers do this work for us.
In general if you find yourself using nullable variables these days, it might be a code smell. See if there is a different way you can structure the code to avoid it, and if not, see if your language of choice has any mechanism for compile-time/build-time null safety tooling.
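In Kotlin, for instance, this might look like the following (the function names are mine):

```kotlin
// String (non-null) vs. String? (nullable): the compiler tracks the
// difference and refuses to let a possibly-null value be dereferenced.
fun greet(name: String): String = "Hello, $name!"

fun greetIfKnown(name: String?): String =
    if (name != null) greet(name)  // smart-cast: name is a String here
    else "Hello, stranger!"

fun main() {
    println(greetIfKnown("Ada"))  // Hello, Ada!
    println(greetIfKnown(null))   // Hello, stranger!
    // greet(null)  // Won't compile: null is not a String.
}
```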
3. Immutable variables and data structures
This one takes some getting used to and may be hard to believe at first, but consider this: there are precious few places in your code where you actually need any of your variables or types to be mutable.
The first time someone told me this was when I was learning Clojure, where it was a matter of necessity because it’s very difficult to even express a mutable object. I found the idea quite implausible. But once I opened my mind to it and got a little practice, I realized that it was true.
Immutable variables are an incredibly powerful way to improve code maintainability. Here’s why: when you are an engineer working on a code base that you aren’t entirely familiar with, and you encounter a line of code that defines an immutable variable whose type is an immutable data structure, you know all that you will ever need to know about that variable after reading that one line of code. Because it is guaranteed not to be changed anywhere else in the program.
Contrast this with mutable variables and mutable data structures: after reading a line of code where one of these is instantiated, if I want to reason about the state of that variable 100 lines farther down in that same code, there are a ton of things I have to consider:
- Were there any statements in between that might have modified it?
- Was that object passed by reference to any functions that might have mutated it?
- If so, do I need to go examine the source code of all of those functions to make sure I know what the state will be?
- Does my program have more than one thread, and if so, do any other threads have a reference to this object that might have allowed them to mutate it while I was working with it?
There is so much hidden complexity that comes along with mutable state. If I have to do the amount of reasoning described above for every line of code that deals with a mutable object, and I can instead write the same program in a way that only uses immutable variables and data structures, the reduction in complexity is astounding. And the corresponding increase in engineering velocity and maintainability is as well.
Many languages have a way to define immutable local variables these days (e.g. Kotlin val, TypeScript const). Many also have a way to define immutable data structures (e.g. Kotlin data class, C# record). Lean into these where you can.
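A quick Kotlin illustration (the `Point` type is hypothetical):

```kotlin
// All properties are vals, so instances can never change after construction.
data class Point(val x: Int, val y: Int)

fun main() {
    val origin = Point(0, 0)
    // origin = Point(1, 1)  // Won't compile: vals can't be reassigned.
    // origin.x = 5          // Won't compile: x is a val.

    // "Modifying" an immutable value means producing a new one.
    val shifted = origin.copy(x = 5)
    println(origin)   // Point(x=0, y=0) -- unchanged
    println(shifted)  // Point(x=5, y=0)
}
```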
Most engineers I work with are sold on this idea fairly easily, except for when dealing with collections. We are so used to writing loops that build up arrays or maps, it’s really hard to get used to the idea that this can be done without a mutable data structure and without a loop. But it can! Almost all languages these days have some flavor of functional programming tools for operating on collections (map, filter, reduce/fold, etc.). These can take some getting used to but they are well worth the price of admission.
The reduce/fold operation in particular comes with a bit of a learning curve, but it is the key to eliminating the need for mutable collections in your code. It will allow you to re-write code that looks like this:
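For instance, a typical loop that builds up a word-count map by mutation (the word-counting example itself is mine, chosen for illustration):

```kotlin
fun wordCounts(words: List<String>): Map<String, Int> {
    val counts = mutableMapOf<String, Int>()  // mutable state
    for (word in words) {
        counts[word] = (counts[word] ?: 0) + 1
    }
    return counts
}

fun main() {
    println(wordCounts(listOf("to", "be", "or", "not", "to", "be")))
    // {to=2, be=2, or=1, not=1}
}
```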
into code like this, without the mutable map:
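Continuing the same illustrative word-count example, a fold can build the map with no mutable state at all:

```kotlin
fun wordCounts(words: List<String>): Map<String, Int> =
    words.fold(emptyMap()) { counts, word ->
        // Each step produces a new map; nothing is mutated in place.
        counts + (word to ((counts[word] ?: 0) + 1))
    }

fun main() {
    println(wordCounts(listOf("to", "be", "or", "not", "to", "be")))
    // {to=2, be=2, or=1, not=1}
}
```

Note that `Map.plus` here copies entries on every step, so this particular sketch trades efficiency for clarity on large inputs; the persistent collections in the next section solve exactly that problem.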
4. Persistent collections (aka immutable collections)
When a coworker originally told me that I should be using immutable collections, my instinct was that this was impractical due to performance concerns and memory usage. If I represent a map as an immutable collection, and then somewhere in my code I need to add or modify a key in it, doesn’t that mean copying the entire data structure in order to obtain the version that contains my modification? Isn’t that crazy expensive?
Well, it turns out: no. As long as you are using persistent collections.
I first encountered this concept in Clojure, and I highly recommend Rich Hickey’s fantastic talk on the topic. The tl;dr is that:
- A persistent data structure is guaranteed to be immutable, but provides modifier functions (put, add, remove etc.) that will produce another persistent data structure with the same immutability guarantees.
- Under the hood, these data structures are implemented as trees, and when you want to modify a single item, you can do so by creating a new tree that shares almost all of the nodes of the original tree. You only need to copy and replace the small set of nodes in the tree on the path to the item you are modifying. In efficient implementations, this means you almost never need to clone more than about 4 nodes in the tree even if it has millions of nodes. The rest can be shared, which is efficient in terms of both memory usage and performance.
Many languages now have “persistent collections” or “immutable collections” libraries (e.g. Java PCollections, C# Immutable Collections, etc.) that do all of the heavy lifting for you. You interact with them just like normal collections, but you get all of the benefits of immutability while still maintaining great performance.
This concept is amazingly powerful, especially in concurrent programs. It means that you can pass a reference to a collection around anywhere you like in your program, and many threads can consume it at the same time with no locking or any other concerns about the collection being modified by another thread. You’ll be amazed at how much simpler this can make some of your application code! And at how nice it is to stop needing to worry about lock contention.
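As a toy sketch of how structural sharing works, here is a hand-rolled persistent linked list (real libraries use trees as described above, but the sharing principle is the same):

```kotlin
// A minimal persistent list: pushing an element allocates one new node
// and shares the entire existing list as its tail.
sealed class PList<out T> {
    object Empty : PList<Nothing>()
    data class Cons<T>(val head: T, val tail: PList<T>) : PList<T>()
}

fun <T> PList<T>.push(value: T): PList<T> = PList.Cons(value, this)

fun <T> PList<T>.toList(): List<T> {
    val out = mutableListOf<T>()
    var cur: PList<T> = this
    while (cur is PList.Cons) {
        out.add(cur.head)
        cur = cur.tail
    }
    return out
}

fun main() {
    val base: PList<Int> = PList.Empty.push(1).push(2)
    val extended = base.push(3)  // O(1); base is completely untouched
    println(base.toList())       // [2, 1]
    println(extended.toList())   // [3, 2, 1] -- shares base's two nodes
}
```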
5. Algebraic data types and exhaustive pattern matching
These go by many names in different languages. In Kotlin they are called sealed classes. In many languages this may end up being just a special flavor of polymorphism. I think of them as an enumeration of types, where you know at compile time all of the types in the enumeration, but each type in the enumeration can have its own discrete properties, methods, etc.
It’s easiest to explain via a specific example. I’ll use an example from the Momento serverless cache API, since that’s something I’ve been working on a lot lately.
When you make a call to the get method on a Momento client object to retrieve a value from your cache, the response may be one of three very different types:
- A cache hit, in which case you will also get back the value that was retrieved from the cache.
- A cache miss, in which case there will be no cache value.
- An error, if something went wrong with the request, in which case you might get an error code and an error message.
Without algebraic data types, a common way to try to represent this situation in code might be to provide a GetResponse object, with a status enum property that could be used to identify whether the response was a HIT, MISS, or ERROR. The object would also need fields to hold the various data that is relevant for each of those cases: e.g. value, errorCode, errorMessage. Those fields would have to be nullable or optional, because they would only be available conditionally, depending on which type of response we got. Something like this:
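A sketch of what that might look like in Kotlin (the shape is illustrative; the real Momento API may differ):

```kotlin
enum class Status { HIT, MISS, ERROR }

// Everything that applies to only some statuses has to be nullable.
data class GetResponse(
    val status: Status,
    val value: String? = null,        // only meaningful for HIT
    val errorCode: String? = null,    // only meaningful for ERROR
    val errorMessage: String? = null  // only meaningful for ERROR
)

fun main() {
    val hit = GetResponse(Status.HIT, value = "cached-value")
    println(hit.value)  // cached-value
}
```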
This is not an awful way to define this API, but it has one big drawback: it is the developer’s responsibility to write code that checks all of the conditions correctly, and if there is a bug in the code it will only surface at run time. For example, if you write code that assumes the response was a HIT without checking, and you try to access the value property, you will get a null pointer exception at run time if the response was not actually a HIT. (In the Kotlin code snippet above, because of Kotlin’s null-safety rules, you’d be forced to write some code to deal with the possibility of those values being null, but in other languages that wouldn’t necessarily be the case. The point remains that it is the developer’s responsibility to reason about which of these fields might be null and when.)
Algebraic data types provide a much nicer way to specify this API, without exposing any nullable fields at all. Here’s how this might look using Kotlin’s sealed classes:
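Again this is a sketch rather than the exact Momento definitions:

```kotlin
sealed class GetResponse {
    // Each case carries only the data that exists for that case,
    // and none of it is nullable.
    data class Hit(val value: String) : GetResponse()
    object Miss : GetResponse()
    data class Error(val errorCode: String, val errorMessage: String) : GetResponse()
}

fun main() {
    val response: GetResponse = GetResponse.Hit("cached-value")
    println(response)  // Hit(value=cached-value)
}
```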
Now we have a discrete class for each of the three cases, and each of those three classes has only the properties that are relevant to it. And they are no longer nullable.
A developer would access the appropriate class via pattern matching. In Kotlin, this is done via the when expression:
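With a sealed hierarchy like the one described above (names illustrative, and repeated here so the snippet is self-contained), each branch of the when expression smart-casts the response to the matching subtype:

```kotlin
sealed class GetResponse {
    data class Hit(val value: String) : GetResponse()
    object Miss : GetResponse()
    data class Error(val errorCode: String, val errorMessage: String) : GetResponse()
}

fun handle(response: GetResponse): String = when (response) {
    // In each branch, response is smart-cast to the matching subtype,
    // so only the fields that actually exist are accessible.
    is GetResponse.Hit -> "Hit! value: ${response.value}"
    is GetResponse.Miss -> "Miss."
    is GetResponse.Error -> "Error ${response.errorCode}: ${response.errorMessage}"
}

fun main() {
    println(handle(GetResponse.Hit("cached-value")))  // Hit! value: cached-value
    println(handle(GetResponse.Miss))                 // Miss.
}
```

Because `GetResponse` is sealed and `when` is used as an expression, no `else` branch is needed; the compiler knows the three cases are exhaustive.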
This approach is really nice because it removes the burden of knowledge from the developer for questions like “in which cases will value be available?” The value property only exists on the Hit class, so we get compile-time enforcement that it can’t be accessed unless the result was a Hit. We have once again moved our bugs forward in time!
The other great thing about this approach is that, in languages like Kotlin, the pattern matching expression is exhaustive. This means that the compiler is smart enough to know whether you have handled all of the possible cases in your when expression, and fail to compile if you have not. Imagine a scenario where you have several of these when expressions scattered around a large code base, and an engineer is working on a new feature that involves adding an additional type of GetResponse to the sealed class. Without the exhaustive pattern matching, the engineer would be responsible for identifying every place in your code that is interacting with a GetResponse, and making sure that it appropriately handles the new type of response. Otherwise what do we end up with? A bug that isn’t exposed until run time.
But with exhaustive pattern matching, once the new type is added, the code won’t compile until we’ve updated all of the places in the code that need to be updated to account for it. Win!
The key to building a solid foundation for your software and sustaining high velocity for your engineering team for the life of your product is to make sure your codebase is maintainable. It’s crucially important that future engineers are able to ramp up on the code quickly and safely. Thankfully, trends in modern programming languages are giving us more and more tools to achieve that, and to move entire classes of bugs forward in time from run time to compile time. This also saves us a ton of engineering time that we don’t need to spend writing tests to prove that we haven’t introduced these classes of bugs. (Don’t get me wrong: tests are still very important! But it’s so nice not to have to write tests around the behavior of nullable properties or other such mundane things that aren’t actually related to your business.)
The strategies above have proved especially valuable for me in the projects that I’ve worked on in recent years. I hope you’ll find them valuable too!
If you have any other favorite approaches for moving bugs forward in time, I would love to hear about them. Send them my way on Twitter (@cprice404)—or join the Momento Discord and start a discussion!