Exceptions are bugs

Learn how Momento has adopted a philosophy about exceptions in a JVM language.

Kenny Chamberlin

Author

If you are a Rust or Kotlin developer, be careful of neck strain from too much nodding along as we look at patterns that increase verbosity (at the right place, I’d suggest) and promote correctness.

Some backstory

I started out professionally in the salt mines of C# and Java. In those days, generics were new and not widely adopted. A common debate back then was “exceptions: checked vs unchecked?” Passionate and respected people differed greatly over this point, and the outcome, at least in Java, has been a uniform mess. I cannot begin to count the number of bugs I resolved that boiled down to someone forgetting to catch and handle a relevant exception, or a new exception type being added without every consuming call site and stack being carefully inspected. As an early contribution, I wrote and then passionately advocated for years for the removal of the terrible ^{logAndThrow()} function, which codifies one of the worst ways to handle exceptions.

Later, I spent a few years deep in C++. Exceptions in C++ are problematic, and we disabled them altogether to our great benefit. This immersion therapy in thought-out and comprehensive software along with the “default exception”—the omnipresent guillotine of the segfault—set the stage for a pensive return to the JVM at Momento.

The Momento story

At Momento we use gRPC, and for many internal services we use Kotlin. In case you’re not aware, Kotlin is a programming language that compiles down to JVM bytecode—reminiscent of TypeScript’s relationship to JavaScript in a number of ways.

Setting the direction for a greenfield service, I needed a way to ensure that, over time, bugs are “interesting” instead of mundane gotchas for well-meaning developers (like future me) to blunder into. Placing the burden of some mandate like “you must catch the stuff you call into” or “read every line of everything you call to know what you need to catch” seemed like a great way to waste everyone’s time. Developers, as it turns out, are good at writing code, and written code is reasonably well-correlated with value. I want to encourage value, so enabling developers to write better code faster instead of reading and tip-toeing everywhere is a no-brainer for me.

The direction

(At least) 2 key technical maxims need to be agreed on:

Software should lean on the compiler to purge “illegal” code
Common bugs should be “illegal” to write (functional programmers unite!)

One of the simplest examples of this is when a compiler prevents you from using a variable that was not initialized. This is common across JVM, dotnet, Rust and many other ecosystems. While we could instead rely on the compiler generating code that throws an exception when it encounters the use of an uninitialized variable (laughs in C++), we enjoy the guarantee that no uninitialized variables will be read anywhere in our applications (modulo unsafe blocks, creative reflection, and the like—but we’re talking about guard rails and rules, not The Universal Model Of Everything). This guarantee is so valuable and intuitive, developers will fix code that does not comply with it. Here’s an example you’ll want to fix:

var message: String? = null
if (condition) {
  message = "the condition was true"
} else {
  message = "the condition was false"
}

Now, Kotlin types are also aware of possible-nullity so this can be repeatedly simplified to remove cognitive burden for readers and make this code rely more and more on the compiler for static legality verification:

val message: String
if (condition) {
  message = "the condition was true"
} else {
  message = "the condition was false"
}

We can declare ^message as read-only (^val vs ^var in Kotlin), so if I want to use ^message after this conditional expression (yes I’m foreshadowing the next update here), the compiler will guarantee the variable was certainly initialized before my use. Otherwise it will declare the program ^illegal and fail to compile, which is good!

For completeness, since we’re using Kotlin here we should lift the assignment to make this a clean and error-resistant declarative expression instead of an imperative, temporally vulnerable expression sequence:

val message = if (condition) {
  "the condition was true"
} else {
  "the condition was false"
}

On exceptions

Oh yeah, this is what this is supposed to be about. Exceptions are, if you squint (to be honest, I don’t think you really need to squint), ^gotos by another name.

One hilarious trick I’ve seen used by teams that decry ^goto in languages that support it is to use the ^{do {} while (false);} pattern. This gives you both a break label and a continue label you can ^goto at will! This is a terrible thing to do. Future readers will be confused and will have to reason about scopes and original intent—and heaven help them if the original author has implemented a dynamic break counter to bounce out of successive wrapped label-loops.

Now, to be fair, the problem itself is not the Exception; the issue here is the ^throw or ^raise statement. Exceptions themselves are great. A stack trace bundled with helpful debug information? What’s not to love? Assuming we have agreed on the technical maxims above: When you ^goto ^{nearest_wrapping_catch_expression_on_the_stack_which_catches_a_type_contravariant_of_this_exception_type} in your library code, you have just installed a bug in your users’ software. That label is unreadable, but that’s more or less what ^throw does!
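
To make the goto comparison concrete, here is a minimal sketch of a throw that skips its intermediate callers and lands in whichever catch happens to match. The function names parsePort, loadConfig, and startServer are hypothetical and exist only for this illustration.

```kotlin
// Hypothetical three-layer call chain; none of these names come from a real library.
fun parsePort(raw: String): Int {
    // String.toInt() throws NumberFormatException on bad input: the hidden "goto".
    return raw.toInt()
}

fun loadConfig(raw: String): Int {
    val port = parsePort(raw)
    // Skipped entirely when parsePort throws; nothing in the signatures warns the reader.
    return port + 1
}

fun startServer(raw: String): String =
    try {
        "listening on ${loadConfig(raw)}"
    } catch (e: NumberFormatException) {
        // Control lands here, several frames away from where the throw happened.
        "bad port: $raw"
    }
```

Nothing in the signature of loadConfig says that its last line may never run. A reader has to inspect every callee to learn that.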

Consider an API you might vend (to your teammates, to your customers, to ^$future_you) that gives back a random number most of the time, but say we think there’s a bug with the random number generator for very large values:

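import java.util.concurrent.ThreadLocalRandom

// Assumed definition so the example compiles; any exception type would illustrate the point.
class AskMeAgainException(message: String) : RuntimeException(message)

fun randomNumberUsually(): Int {
  val randomNumber = ThreadLocalRandom.current().nextLong()
  if (randomNumber < Long.MAX_VALUE - 32) {
    return 4 // source: xkcd.com/221
  } else {
    throw AskMeAgainException("this function does not trust random values near 2^63")
  }
}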

delay(randomNumberUsually())

Tragically, the ^{delay(randomNumberUsually())} call above ignores that “very important” facet of your API where you distrust large values of ^nextLong(). As a consequence, your users will crash now and then.

Instead, consider how it would look if you were to use your language’s features to help your users correctly call your special function:

sealed interface RandomNumberUsually {
  data class RandomNumber(val n: Int) : RandomNumberUsually
  object AskMeAgainLater : RandomNumberUsually // You could make this a data class and put an exception (unthrown of course) inside if you want
}

fun randomNumberUsually(): RandomNumberUsually {
  val randomNumber = ThreadLocalRandom.current().nextLong()
  return if (randomNumber < Long.MAX_VALUE - 32) {
    RandomNumberUsually.RandomNumber(4) // source: xkcd.com/221
  } else {
    RandomNumberUsually.AskMeAgainLater
  }
}

This takes more code to write, so why bother? Well, as a good API developer who cares about their users more than they care about themselves, you have now exposed all possible valid return values from this function to the type system so your users’ compilers ensure they don’t have bugs while using your function. Their naive ^{delay(randomNumberUsually())} invocation now will not compile, because they have not handled that crucial rare occurrence of large entropy values.

Here’s what your user will do with that function now:

delay(
  when (val random = randomNumberUsually()) {
    is RandomNumberUsually.RandomNumber -> random.n
    RandomNumberUsually.AskMeAgainLater -> 16 // oh well, I don't think this is so bad
    // There are no more possible responses. The compiler will remind me if the return codes are updated and another is added.
  }
)

They must consider both cases in order to use the value. They can opt out of handling many other cases with an ^{else ->} if they only care about 1 result, and it’s guaranteed that no rogue ^{goto dynamic_catch_label} will bomb their application unless there is a real bug that you never considered, like ^{ThreadLocalRandom::current()} failing. If it fails a lot you might consider catching it internally and modeling it instead as a clean, typesafe return value.
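
As a sketch of that opt-out, suppose a third case has been added to the sealed interface. The Throttled case and the delayMillisFor function are hypothetical and not part of the example above. A caller who only cares about the happy path can fold everything else into one branch:

```kotlin
// Same shape as RandomNumberUsually above, plus a hypothetical Throttled case for illustration.
sealed interface RandomNumberUsually {
    data class RandomNumber(val n: Int) : RandomNumberUsually
    object AskMeAgainLater : RandomNumberUsually
    object Throttled : RandomNumberUsually
}

// The caller only cares about one result; every other case gets the same fallback.
fun delayMillisFor(result: RandomNumberUsually): Int =
    when (result) {
        is RandomNumberUsually.RandomNumber -> result.n
        else -> 16 // deliberate opt-out: cases added later will also land here
    }
```

The trade-off is that an else branch gives up the compiler reminder when a new case appears, so it is best kept for call sites that truly treat all other outcomes alike.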

By contrast, with the throwing version, users who hit the exception bug and later realize they need to catch AskMeAgainException would end up with:

delay(
  try {
    randomNumberUsually()
  } catch (e: AskMeAgainException) { // Are there other exceptions I should catch? Remember to check on every version bump...
    16 // this caused an outage because it crashed the server...
  }
)

The rule

Your errors are not special; they are return values. Never raise exceptions across component boundaries.

Nakedly invoke your components without ^try{}, and allow bugs to propagate outward in the manner of a segfault to either crash the program or log an unhandled bug at the very top. Provide components that are safe to invoke without ^try{}.

Within a component you might use tools—language features or libraries—that throw exceptions. That’s fine; you need to catch them and model your component’s return values. You don’t need much more than ^Ok(value) and ^{Error(exception)} in the general case. Returning to where you were called with an error is far kinder to your user than issuing a dynamic ^goto to an unknown label they may not have even provided.
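
A minimal sketch of that shape follows. The names Outcome, parsePort, and describe are hypothetical. The point is only that the exception from the throwing tool (String.toInt here) is caught inside the component and handed back as a value.

```kotlin
// Hypothetical general-purpose result type: a success value or an unthrown exception.
sealed interface Outcome<out T> {
    data class Ok<T>(val value: T) : Outcome<T>
    data class Error(val exception: Exception) : Outcome<Nothing>
}

// The component boundary: String.toInt() may throw, but callers never see a throw.
fun parsePort(raw: String): Outcome<Int> =
    try {
        Outcome.Ok(raw.trim().toInt())
    } catch (e: NumberFormatException) {
        Outcome.Error(e) // returned, not rethrown
    }

// A caller invokes the component without try and must decide what each case means.
fun describe(raw: String): String =
    when (val port = parsePort(raw)) {
        is Outcome.Ok -> "port ${port.value}"
        is Outcome.Error -> "invalid port: ${port.exception.message}"
    }
```

Only NumberFormatException is caught here, on purpose: anything else would be a real bug and is left to propagate to the top.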

(What is a component? I leave that up to your good judgement. It could be a method, a function, a class or something else; but generally, the tighter-scoped your components are, the better the compiler will do at static verification.)

What do popular languages do?

Some languages already have built-in affordances for this; some have fairly recently expanded support for the paradigm. I certainly did not invent this idea, and here’s some evidence of that in addition to the Kotlin above:

In Rust, you cannot escape ^Result<> and ^Option<>. You could call ^.unwrap() and panic your program, but that would be a sad thing to do! Instead just use the Rust ^? operator and cascade a Result upward from your possibly-erroring function. The ^? operator is basically a special case of pattern-matching explicitly intended for error handling. It supports a well-behaved stackwise unrolling of failures until something handles or drops them.

Go has a similar view on error handling as the new multi-language Momento position and it shows up in Go’s idioms. They have an ^error type, and even the linked 11-year-old blog illustrates how useful it is to advertise the error-semantics of one’s frameworks. Notwithstanding the rest of the language, I certainly like Go’s error handling philosophy.

C# added a neat ^switch branch matching expression in 8.0. They have some unfortunate handling of enum values due to int/enum interchangeability but they’re trying, and the language is better for it!

Even Java has embraced pattern matching constructs suitable for safe errors-as-return-values with its pattern matching for the ^instanceof operator. While generally applicable to all kinds of downcasting, this is particularly powerful for ensuring your APIs are predictable and well-behaved in your users’ runtimes.

The C++ community is currently working on pattern matching constructs. The likes of Bjarne Stroustrup and Herb Sutter are talking about this. The writing is on the wall, and soon we’ll have some construct for matching. You could use workarounds available in the language today like variants (^{std::get}, ^{std::holds_alternative}), and I would encourage you to do so in the meantime. The discussion for this feature is subtle but if you’ve got some patience it’s a worthwhile topic. Grab some tea and take in a particularly interesting demonstration. This talk even mentions the C# implementation of switch and can help you understand at a deeper level what the whole idea in this blog is about.

Pros and cons

Pros:

Your users will have fewer bugs (in many languages). Code validation is pushed forward to the compiler to statically verify that every known path is handled. This is “cooperative development” with your users as opposed to exception-style “antagonistic/gotcha development.”
Your users discover and account for edge cases early.
Your users do not litter try/catch everywhere to try to outsmart your throws.
Your users’ call sites (rather, their usage sites) are explicit, clear, and safe; the compiler enforces this for them.

Cons:

You sometimes type more with the Exceptions Are Bugs philosophy (you have to choose sealed types and model your function’s return value domain).
Your users might type more. Common IDEs auto-generate pattern match arms in many common languages though, and if those users need the happy case result they probably should make a choice about what to do with the sad case anyway (see Rust, Kotlin, Go, C++, and Java).
Some languages do not yet have a great answer for pattern matching.
Some people like exceptions. (This philosophy does not wall them off. As an exercise for the reader, can you figure out how you could add a ^getOrThrow() to ^{RandomNumberUsually} above?)
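
For readers who would rather check their answer to the exercise in the last item above, one possible sketch is below. It is only one of several reasonable designs, and the exception type chosen (IllegalStateException) is an assumption rather than anything the original example prescribes.

```kotlin
sealed interface RandomNumberUsually {
    data class RandomNumber(val n: Int) : RandomNumberUsually
    object AskMeAgainLater : RandomNumberUsually
}

// Opt-in escape hatch: the caller chooses to turn the sad case back into a throw.
fun RandomNumberUsually.getOrThrow(): Int =
    when (this) {
        is RandomNumberUsually.RandomNumber -> this.n
        RandomNumberUsually.AskMeAgainLater ->
            throw IllegalStateException("no random number available; ask again later")
    }
```

The difference from the original throwing API is who decides: the throw now appears by name at the call site that asked for it, instead of being imposed from inside the library.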

Closing thoughts

When you’re in the weeds writing some code, a network error or a slow disk might seem like the end of the world to you. That’s natural. However, when you’re done and move on to employ that code to solve a higher-level problem, you may no longer care about those possible issues—or you may have a different opinion about how important those issues are.

In Rust, sometimes people call ^unwrap() on an ^Option which might be ^None: If you do this in a library, you’re forcing a dynamic ^panic onto your users when they might have workarounds just in case your library isn’t working (i.e., your errors are not so special that you should crash your users over them).

For the cost of some additional boilerplate code, your users (remember: peers, external or future you) receive the benefit of “obviously correct” code that, to the greatest degree possible, works if it compiles. Like Rust tries so hard to do. ❤️

Let me know what you think—join the Momento Discord!

Kenny is a versatile engineer who has a background ranging from the AWS DynamoDB storage & control plane teams and video game e-commerce to microcontrollers, display drivers, devices & video ads. After helping to deliver the Steam Deck he has been at Momento building an easy to use serverless cache.