
Are tagged unions overrated?

Recently, Nelson Elhage published a newsletter post, Tagged unions are overrated. In the post, he describes some of the problems with tagged unions and pattern matching in existing languages, and how you can do a good enough job of representing language ASTs and IRs without using tagged unions.

(I recommend reading the linked post before proceeding.)

Now, Nelson is an experienced engineer who has worked on many different things; in particular, he has worked on Sorbet, a type-checker for Ruby written in C++ that is pretty fast.

While I do not have nearly as much experience as Nelson, I’ve been working on a compiler written in C++ for the past ~1.5 years, and my personal experience with C++ has been quite different from Nelson’s. For some rough calibration on my personal preferences: I’d even go so far as to say that language support for pattern matching with binding (that is, binding payloads to names while doing the match) and exhaustivity checking is THE thing that I miss the most when writing C++, and that feeling has not substantially changed over time.
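
To make that concrete, here is a minimal sketch (all names hypothetical) of the workaround idiom that compilers written in C++ typically reach for instead: a kind enum plus a switch. With -Wswitch the compiler effectively checks exhaustiveness over the enum, but each payload has to be recovered with a cast by hand rather than bound by the match itself:

#include <cstdint>
#include <cstdlib>
#include <string>

// Hypothetical mini-AST in the common C++ style: a kind tag plus subclasses.
struct Expr {
  enum class Kind : uint8_t { IntLit, Name };
  Kind kind;
  explicit Expr(Kind k) : kind(k) {}
};
struct IntLit : Expr {
  int64_t value;
  explicit IntLit(int64_t v) : Expr(Kind::IntLit), value(v) {}
};
struct Name : Expr {
  std::string ident;
  explicit Name(std::string s) : Expr(Kind::Name), ident(std::move(s)) {}
};

std::string describe(const Expr &e) {
  // With -Wswitch, the compiler warns if a Kind is missing below, which
  // approximates exhaustivity checking. But each case recovers its
  // payload with a manual cast, and nothing ties that cast to the tag
  // we just matched on.
  switch (e.kind) {
  case Expr::Kind::IntLit:
    return "int " + std::to_string(static_cast<const IntLit &>(e).value);
  case Expr::Kind::Name:
    return "name " + static_cast<const Name &>(e).ident;
  }
  abort(); // unreachable when the switch above is exhaustive
}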

So I figured it would be interesting to dig deeper into why there is a sharp disconnect between my perspective and Nelson’s perspective.

ASTs and IRs

At the beginning of the article, Nelson points out:

At this point I’ve worked on a number of compilers, both toy and production, in a wide range of languages, including Go, C++, Java, Rust, and Haskell. With that experience, my considered opinion is this: Tagged unions and pattern-matching on them are vastly overrated for implementing language ASTs and IRs. They are nice to have, but they rarely make a substantial difference in the development of a compiler, typechecker, or similar.

(Nelson also reiterates this later “I want to pause again to emphasize that this opinion is scoped: I think tagged unions are overrated for implementing IRs.”)

Perhaps surprisingly, I agree with this point! (With some caveats. 😉) For the specific case of ASTs and IRs, here is what I’ve observed:

So far, it might seem like I’m almost entirely in agreement with Nelson. What’s the catch?

Caveat 1: There is more to an AST than the node hierarchy

I think one common simplification people make when talking about ASTs is to picture a tree structure with nodes of different kinds, along with an is-a relationship between nodes, and to call that the AST.

That is true, in some sense, but an AST is more than that! Arguably, most of the tricky code lives inside the nodes themselves. And tricky code is a recipe for disaster in at least two ways:

  1. It is easy to miss an edge case.
  2. It is difficult for both newcomers and experienced people to understand and change the code.

To give you a concrete example, the Swift compiler had the following types for the longest time: (source)

enum class FunctionTypeRepresentation : uint8_t {
  Swift = 0,
  Block,
  Thin,
  CFunctionPointer,
  Last = CFunctionPointer,
};

/// A class which abstracts out some details necessary for
/// making a call.
class ExtInfo {
  // If bits are added or removed, then TypeBase::AnyFunctionTypeBits
  // and NumMaskBits must be updated, and they must match.
  //
  //   |representation|noEscape|throws|
  //   |    0 .. 3    |    4   |   5  |
  //
  enum : unsigned {
    RepresentationMask     = 0xF << 0,
    NoEscapeMask           = 1 << 4,
    ThrowsMask             = 1 << 5,
    NumMaskBits            = 6
  };

  unsigned Bits; // Naturally sized for speed.
};

(In some ways, it’s even more complicated now. 😅)

In case you’re not familiar with C++ or the Swift compiler, let me give you a quick summary of what is going on here: ExtInfo packs several logically distinct fields into a single unsigned integer, with the low 4 bits storing a FunctionTypeRepresentation, bit 4 storing a noEscape flag, and bit 5 storing a throws flag. The masks are maintained by hand, and (per the comment) they must be kept in sync with bit counts defined elsewhere in the compiler.

Apart from the complexity around sometimes storing one thing and transiently storing another, there’s even more complexity here that’s not manifest in the code I’ve shown: some combinations are illegal.
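
The code above doesn’t say which combinations are illegal, so, purely for illustration, suppose the rule were that noEscape only makes sense for the Swift representation (an invented rule, not the Swift compiler’s actual invariant). Any such check has to live outside the type, as a free-floating predicate over the raw bits:

// Hypothetical legality check over the packed bits. The rule here
// (noEscape implies the Swift representation) is invented for
// illustration; the real invariants live in the Swift compiler.
bool isLegalExtInfo(unsigned bits) {
  unsigned repr = bits & 0xFu;              // RepresentationMask
  bool noEscape = (bits & (1u << 4)) != 0;  // NoEscapeMask
  bool isSwiftRepr = (repr == 0);           // FunctionTypeRepresentation::Swift == 0
  return !noEscape || isSwiftRepr;
}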

This brings us to a couple of operational questions: where does the knowledge of which combinations are legal actually live, and how do you make sure that every code path that constructs or mutates an ExtInfo respects it?

If you’re trying to change this structure in a meaningful way, as I was at a certain point, this is (to put it politely) a bit of a headache.

Exacerbating the issue, the API on this type changed one “field” at a time, which meant that going from one legal value to another sometimes forced you to transiently create an illegal ExtInfo value (or use a less ergonomic API). As a result, when I added a centralized assertion to check for illegal values, it would often trip spuriously, and only when compiling very particular funky code!
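
To sketch that failure mode (the struct below is a simplified stand-in, with the with-style method names modeled on the real API), going between two legal values one field at a time can pass through an illegal intermediate state:

#include <cstdint>

enum class Representation : uint8_t { Swift, Block, Thin, CFunctionPointer };

// Simplified stand-in for ExtInfo with a with-style API that changes
// one field per call.
struct MiniExtInfo {
  Representation repr = Representation::Swift;
  bool noEscape = false;

  MiniExtInfo withRepresentation(Representation r) const {
    MiniExtInfo copy = *this;
    copy.repr = r;
    return copy;
  }
  MiniExtInfo withNoEscape(bool b) const {
    MiniExtInfo copy = *this;
    copy.noEscape = b;
    return copy;
  }
};

int main() {
  MiniExtInfo info;                  // {Swift, escaping}: legal
  info = info.withNoEscape(true);    // {Swift, noEscape}: legal
  // Target: {CFunctionPointer, escaping}. Changing one field at a time
  // passes through {CFunctionPointer, noEscape}, which is illegal under
  // the hypothetical rule above, so a centralized assertion run on
  // every intermediate value would trip spuriously right here.
  info = info.withRepresentation(Representation::CFunctionPointer);
  info = info.withNoEscape(false);   // back to a legal value
}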

Now, I concede that one can write confusing code in any language. Language features do not magically stop anyone from writing confusing code. I also concede that this is a somewhat extreme example.

That said, I suspect that if C++ had native support for tagged unions with layout optimizations (as Swift and Rust do), this would probably have been written in a much more understandable manner from the outset. And even if you really do need to hand-optimize the internals, as in this example, the low-level type itself doesn’t need to be part of any APIs: one can expose a higher-level sum type with a worse layout in APIs, with conversions between the two confined to the most low-level functions.
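
Here is a minimal sketch of that second option (all names hypothetical): keep the packed unsigned for storage, expose an unpacked struct at API boundaries, and confine the bit-twiddling to two conversion functions using the masks from the snippet above:

#include <cstdint>

enum class Representation : uint8_t { Swift, Block, Thin, CFunctionPointer };

// Unpacked, self-describing view used in APIs. The layout is worse than
// six packed bits, but every field has a name and a type, and no masks
// appear at use sites.
struct ExtInfoView {
  Representation representation;
  bool noEscape;
  bool doesThrow;
};

// The only two functions that know about the bit layout.
inline unsigned pack(ExtInfoView v) {
  return static_cast<unsigned>(v.representation)  // bits 0..3
       | (v.noEscape  ? 1u << 4 : 0u)             // NoEscapeMask
       | (v.doesThrow ? 1u << 5 : 0u);            // ThrowsMask
}

inline ExtInfoView unpack(unsigned bits) {
  return ExtInfoView{
      static_cast<Representation>(bits & 0xFu),
      (bits & (1u << 4)) != 0,
      (bits & (1u << 5)) != 0,
  };
}

With that split, most of the compiler can manipulate ExtInfoView directly, and any assertion about legal combinations only needs to guard pack and unpack.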

The point I’m trying to make here is that, just as in other domains, one often wants to model small parts of an AST or an IR (say, one field of one particular AST node) using tagged unions. Being unable to define sums and use pattern-matching imposes a small recurring tax on whoever is reading the code or trying to change it. Depending on the complexity of the invariants, this tax can be surprisingly large for otherwise innocuous-looking data types.

There are ways around this to some extent (for example, you can add more asserts that clarify which invariants hold), but the benefits aren’t quite the same, because the barrier to writing understandable code is now higher.

If types are like gravity for programs (paraphrasing Conor McBride), a language’s type system is like gravity for the types that you’re actually writing. Having tagged unions and pattern-matching reduces the friction for one of the most common cases of data representation: “this or that”. Does that mean they should always be the go-to choice when you want to represent different cases of something? No. But they’re very often a good starting point.

Caveat 2: It is okay to want to have nice things

Later in the post, Nelson points out that pattern-matching isn’t nearly as powerful as one would like it to be:

My next comment is on the flip side — the limits of pattern-matching in languages that do support it. In my experience, it often ends up being less powerful and convenient than you’d like.

To me, this is a compelling argument that, even if C++ had language-level pattern-matching, it would be necessary to invent our own pattern-matching system to support these complex and non-type-specific matchers, anyways. So what would we really be gaining from that language support?

I agree with Nelson’s point about ML-style pattern-matching (or its variations in popular languages) typically not being very powerful. However, I don’t agree with the argument in the second paragraph, for a few reasons:

  1. I think that ML-style pattern-matching is valuable enough to cover a significant fraction of day-to-day use cases.
  2. Library solutions often don’t end up providing sufficiently good ergonomics (see the sketch after this list).
  3. Having something as core as pattern matching be a language feature makes it easier for different libraries to interoperate.
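
To make point 2 concrete, here is the standard C++17 library-level substitute: std::variant matched with std::visit and the well-known overloaded helper (the AST types here are hypothetical). It works, and it is even exhaustive in effect, but the match is spelled as a pile of lambdas, and forgetting a case surfaces as a template instantiation error deep inside std::visit rather than as a readable “missing case” diagnostic:

#include <cstdint>
#include <iostream>
#include <string>
#include <variant>

// Hypothetical mini-AST as a variant instead of a class hierarchy.
struct IntLit { int64_t value; };
struct Name   { std::string ident; };
using Expr = std::variant<IntLit, Name>;

// The 'overloaded' helper for std::visit (C++17 deduction guide).
template <class... Ts> struct overloaded : Ts... { using Ts::operator()...; };
template <class... Ts> overloaded(Ts...) -> overloaded<Ts...>;

int main() {
  Expr e = Name{"x"};
  // Dropping one of the lambdas fails to compile, but the error points
  // into std::visit's internals, not at the missing case.
  std::visit(overloaded{
                 [](const IntLit &i) { std::cout << "int " << i.value << "\n"; },
                 [](const Name &n)   { std::cout << "name " << n.ident << "\n"; },
             },
             e);
}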

At the same time, I agree that there is a need for more powerful solutions in case you do need the ability to do complex matches. Thankfully, there is already a bunch of cool work in this area which one can use for inspiration:

I’m sure there’s a bunch of related work that I’m missing here.

Based on these, I think there is considerable headroom for making pattern-matching in mainstream languages more powerful, so that everyone can benefit from it, not just the people willing to hand-roll their own pattern-matching libraries (which often come with gnarly type errors).

Closing thoughts

I think that tagged unions and pattern-matching are powerful tools when working “in the small”. At the moment, they don’t work equally well “in the large”. The node hierarchy in an AST or IR is one example where using classes provides some tangible benefits over tagged unions, even though tagged unions might look like a perfect fit at first glance.

When thinking about the power of language features, I often like to think in terms of a “yield point”. In case you’re unfamiliar with the idea, here’s the description from Wikipedia:

In materials science and engineering, the yield point is the point on a stress-strain curve that indicates the limit of elastic behavior and the beginning of plastic behavior. Below the yield point, a material will deform elastically and will return to its original shape when the applied stress is removed. Once the yield point is passed, some fraction of the deformation will be permanent and non-reversible and is known as plastic deformation.

A yield point captures the idea of a point where gradual degradation starts. It’s not an immediate catastrophe, but you’re stretching the system in a way that you perhaps didn’t intend.

Today, given the pattern-matching facilities (or lack thereof) in mainstream languages, the yield point for tagged unions and pattern-matching is relatively low. But that’s not something to be lamented; rather, it’s a great opportunity. We can push the yield point further by synthesizing pattern-matching facilities from less popular languages and coming up with a design that extends ML-style pattern-matching in a way that provides excellent composability and ergonomics for users of mainstream languages.