Principled Ad-Hoc Polymorphism

Part -1 - Before we start …

When I say “Haskell”, I mean GHC Haskell, not Haskell98 or Haskell2010 or something else. I will also pretend that the Incoherent pragma and IncoherentInstances extension do not exist for the sake of this post as they are used very rarely.

A substantial portion of this post is based off Modular implicits,Modular implicits - Leo White, Frédéric Bour and Jeremy Yallop.

which discusses various tradeoffs in the context of OCaml.

Part 0 - What the heck does the title mean?

Quoting the Haskell wiki:

Ad-hoc polymorphism refers to when a value is able to adopt any one of several types because it, or a value it uses, has been given a separate definition for each of those types. For example, the + operator essentially does something entirely different when applied to floating-point values as compared to when applied to integers – in Python it can even be applied to strings as well.

Okay, that doesn’t sound very “principled”. So what is up with “principled” in the title? Well, “principled” for the sake of this post means 2 things:

  1. Sub-expressions can be assigned types. As a counter-example, in C++, if you had a function plus with the overloads as for Python, it is not meaningful to speak about the type of plus; you can only speak about the types of the individual overloads themselves.

  2. All type-related information that is accessed within a function body (either at compile time or at run time) is reflected in its type signature.

    In other words, principled ad-hoc polymorphism is similar to bounded parametric polymorphism.You might ask: why don’t I just use the term “bounded parametric polymorphism” instead of “principled ad-hoc polymorphism” then? The answer is that we will usually be interested in having some fine-grained way of controlling instances, so parametricity won’t be quite applicable once we do that.

Part 1 - Coherence

There are several ways to go about adding some support for ad-hoc polymorphism to that shiny type checker you’re working on. A natural question to ask is:

What are the possible options in the design space that have already been explored, and what are the ones being considered? What are the tradeoffs involved with each approach?

Before I can try to answer that question, we need to go over a key definition:Type classes: confluence, coherence and global uniqueness - Edward Z. Yang.

Coherence: A type system is coherent if every different valid typing derivation for a program leads to a resulting program that has the same dynamic semantics.Type classes: an exploration of the design space - Simon Peyton Jones, Mark Jones and Eric Meijer, 1997.

1

In looser wording, if there are multiple ways to assign types throughout the program, all the ways lead to a program with the same runtime behaviour. Whether you take the road less traveled by or not, it does not matter.

Notice that this definition of coherence does not involve any notion of type classes or traits or modules.

Hopefully, you will agree that coherence is a nice property to have; it would be odd, if not downright nasty UX, to have the program run differently based on arbitrary choices made by the type checker.

Part 2 - Canonicity

Note: The definitions I’m using in this section are not standard. If you know of some standard definitions for the terms, please let me know!

In Part 0, I said that style of ad-hoc polymorphism we are interested in here is like bounded parametric polymorphism. Usually we’d like to be able to name the bounds corresponding to something sensible. Consider the following pseudocode:The terminology varies from language to language 🙁. Haskell and Purescript use “type class” and “instance”, Rust uses “trait” and “impl”, Swift uses “protocol” and “extension”. I’ve chosen “bound” as there is a direct connection with bounded quantification.

bound Display (T : Type) {
    val display : T -> String
}
instance Display String {
    let display s = s
}
instance Display (List T) where (T : Type, Display T) {
    let display v = List.sepBy "," (List.map display v)
}

Instances correspond to bound-type-constraint triples. In the first instance, there is no constraint, whereas in the second instance, there is a constraint that the type variable (aka generic parameter) T in the type List T must have an instance for Display.

Say you’re working on an application and want to display some error message. You might want the display the message differently to users, compared to the display in the test suite or in logs. Are you allowed to have multiple instances for Display MyErrorMessage? This brings us to canonicity:

Canonicity: At most 1 instance is compiled for a bound-type pair.

When canonicity is enforced, there are two kinds of instances that are of particular interest:

Orphan instances: Instances are usually placed with either the definition of the corresponding bound or the corresponding type; instances present in other modules (I am using the term “module” as shorthand for “aggregations for definitions” here) are called orphan instances.

Overlapping instances: If there are instances for which a substitution of type variables can lead to ambiguity on which instance can be picked, they are overlapping instances.

Different languages make different choices here.

Part 3 - Problems with demanding canonicity

Duplicate functions with implicit and explicit arguments

In Haskell, Purescript and Rust, it is not uncommon for the standard library to have duplicate versions of functions - one which implicitly takes arguments via instances and another _by/By version which explicitly takes a higher-order function instead. For example, Haskell’s base library has:

sort :: Ord a => List a -> List a
sortBy :: (a -> a -> Ordering) -> List a -> List a

If you are a library author, offering just one kind of interface means that library users need to potentially go through more trouble. If you offer the implicit argument version, then a client might have to use single-element records (aka newtype wrappers) to use the API. If you only offer an explicit argument version, then a client has to explicitly pass in the relevant instance functions themselves.

This also doesn’t scale once the number of type variables or constraints increases.

Workaround(s): None.

Non-modular development (orphans or overlap disallowed)

If orphan instances are disallowed, and if package PB defines a bound B, and another package PT defines a type T, then one must know of the other, as a third party cannot write their own instance for the B-T pair. This problem commonly crops up when (de)serialization and random generation of values (such as in QuickCheck-esque libraries) are done using bounds+instances.

Similarly, if overlapping instances are disallowed, then all the logic for instance selection (via instance chains or otherwise) must be concentrated in one place. Once that is handled, there is no wiggle room for adding different logic in an external location (due to canonicity).

Workaround(s):

  1. In the case where orphan instances are disallowed, one can either create newtypes and write instances for them, or one can use feature flags (in either PB or PT) to add instances. Neither of these are without their own problems.

  2. In cases where overlapping instances are disallowed, adding newtypes may or may not be sufficient for writing new instances.

Troublesome Orphans (orphans allowed)

If orphan instances are permitted, then ensuring canonicity is more expensive, as you need a global check to do so. This corresponds to a long open GHC ticket #2356.

If the corresponding checks are skipped (for performance reasons or otherwise), one needs to be particularly careful around using orphan instances as the compiler doesn’t have your back.

Workaround(s): Be particularly careful around orphan instances.

Mental complexity with selection rules (overlap allowed)

  1. Understanding the rules as an end-user - In Haskell, overlapping instances do not work for matching instance heads. This is sometimes a source of confusion for someone trying out overlapping instances (SO 1, SO 2).

  2. Possibly unexpected “time travel” - code written in a compilation unit can change the semantics of code written its dependencies:

    module M1 {
        bound Display (T : Type) {
            val display : T -> String
        }
        instance Display (List T) where (T : Type, Display T) {
            let display v = sepBy "," (map display v)
        }
        val display_m1 : Display T => List T -> String
        let display_m1 v = display v
    }
    
    module M2 {
        import M1 (..)
    
        type X = NewX
        instance Display X        { let display _ = "Generic." }
        instance Display (List X) { let display _ = "Special." }
    
        let main () =
          print (display [NewX]);
          print (display_m1 [NewX])
    }
    
    M2.main ()

    The equivalent Haskell program prints Special.Generic. with ghc 8.6.1 (manual) (giving up coherence), whereas the equivalent Rust programWriting an equivalent Rust version is a bit tricky as Rust doesn’t allow orphan instances (for writing the impl for Vec<T>). So the trait needs to take an extra parameter to get the same effect.

    prints Special.Special. with rustc 1.34.0-nightly (giving up separate compilation).

  3. Working with weaker assumptions in the type-checker - One needs to consider all the possible cases where overlap might affect typing judgements and be careful about edge cases. This becomes more complicated when more features like associated types/associated type families are thrown into the mix.

Workaround(s): For the “time travel” problem, one could use dynamic dispatch (like virtual functions in OO systems) to have both coherence and separate compilation at the cost of reduced efficiency.

Zero cost newtypes are tricky

Newtypes are not always zero cost. If you have a vector of newtypes, then you can’t convert it to a vector of the underlying type with no allocations in a type-safe manner unless you explicitly add in type-safe coercions.

Haskell adds support for type-safe coercions via the Coercible type class. However, this adds a non-trivial amount of complexity to the compiler implementation (look up GHC’s role system if you’re interested in the details), and creates more room for bugs.

Workaround(s): Use unsafe coercions in a controlled manner.

Part 4 - Non-canonicity

As you might already know, not all languages enforce canonicity. Scala’s implicit objects and OCaml’s modular implicits are two examples where coherence is maintained but canonicity is not.

We can still talk about orphans and overlap here:

Part 5 - Problems with eschewing canonicity

Types need to keep track of corresponding instances

Note: This is sometimes called the “Set Ordering problem” or the “Hash table problem”.

If canonicity is guaranteed, the signature of a set union function will look like:

val union : forall a. Ord a => Set a -> Set a -> Set a

However, in the absence of canonicity, this function cannot be implemented as the two sets might have been created with different Ord instances.

OCaml solves this problem using parameterized modules (in the absence of implicits). Set acts as a parameterized module that takes in an Ord module (say IntAscendingOrd) and creates another module (say IntAscendingSet) with a set type and functions (such as union) which operate on that type. If you apply Set to another Ord module (say IntDescendingOrd), that creates a distinct module (say IntDescendingSet). IntAscendingSet.union cannot operate on values of type IntDescendingSet.t, so crisis is averted. This strategy carries over to implicits as-is. Hence, the signature of union becomes:

val union : {O : Ord} -> Set(O).t -> Set(O).t -> Set(O).t

Workaround(s): With a little bit of practice (you must educate your users!), you can get the hang of writing APIs without such soundness issues.

Less well-studied in mainstream languages compared to type classes

If you want to implement type classes, you can read the various papers related to Haskell’s type classes, as well as play around with the implementations in Haskell, Purescript, Rust (traits), Swift (protocols) and more.

On the other hand, if you want to forego canonicity, the two mainstream languages that you can study are Scala and OCaml.Brendan Zabarauskas points out that Idris interfaces, Agda’s instance arguments, Coq type classes and Lean type classes all do not force canonicity.

Moreover, the OCaml implementation is very much work-in-progress (there is a prototype though); there isn’t a paper with the technical details of type checking for it yet.

Workaround(s): None.

Path-independence proofs

Alt heading: Diamonds are a PL designer’s worst enemy.

  1. Picking instances through different paths - This is very similar to the problem that arises in the presence of multiple inheritance. For example, say Traversable and Monad are two bounds that depend on Functor (either via composition or via inclusion/inheritance, it doesn’t matter):

    -- Arguments that come before '=>' are implicitly passed in caller code
    val map : Functor f => (a -> b) -> f a -> f b
    let map Ff f x = Ff.map f x
    
    val contrived_map : Traversable f => Monad f => (a -> b) -> f a -> f b
    let contrived_map Tf Mf f x = map     g x
                                    --^^^--
               Should we pass in Functor from Traversable or Monad?
               How do we assert that both of them should be the same
               in the absence of canonicity?

    While the above example is contrived (for brevity), such a situation comes up very frequently in Haskell code, where Monad is a common subclass across several different constraints.I haven’t written enough Purescript or Rust code to say the same there.

  2. Deriving instances through multiple paths - If bounds/instances utilize/permit subtyping, then derived instances can lead to ambiguities:

    instance Eq (List T) where (T : Type, Eq T) { .. }
    instance Ord (List T) where (T : Type, Ord T) { .. }
    -- instance Eq Int is unnecessary as we can up-cast the Ord instance
    instance Ord Int { .. }
    
    let _ = equal     [1, 2] [2, 3]
                --^^^--
        Should we create Ord (List Int) and then up-cast to Eq (List Int)
        or up-cast (Ord Int) to Eq Int and then create Eq (List Int).

    This problem can be avoided in multiple ways:

    1. Using composition so up-casting doesn’t work. So you’d need to define Eq Int separately and only that gets used to create Eq (List Int).
    2. Getting rid of implicit up-casting for implicit arguments. Again, you can manually write Eq Int, which gets used to create Eq (List Int).

Workaround(s): To solve problem 1, in Scala, one can use composition + implicit conversions. This is called the Scato encoding.The Limitations of Type Classes as Subtyped Implicits (Short Paper) - Adelbert Chang.

In my opinion, this approach cannot work for large hierarchies as the boilerplate required scales super-linearly. On a more fundamental level, what we want to be able to say is “prove that these N derivations lead to the same outcome and give me a type error if that isn’t the case”, whereas the Scato encoding says “here’s the most privileged derivation, please prefer it over other derivations”.

Speculation: It might be possible to have a module system with a merge operation where the merge succeeds only if the common exports (types and terms) are provably equal.This would be similar to MixML’s with operation except that MixML disallows common exported terms.

Equations are propagated through applicative functors, transparent ascription, manually written equalities and common inclusions.

No inference for implicit parameters

Haskell and Purescript support inference of constraints which can be helpful if you’re not exactly sure on what’s needed - you can ask the compiler to fill in the details for you. Implicit parameters cannot be inferred in Scala or OCaml, which can be disappointing.

Workaround(s): None.

Part 5 - Problems independent of canonicity

Nominal vs structural bounds

Bounds usually have a “subclassing” mechanism associated with them. If bounds are nominal, this means the hierarchy cannot be modified without modifying all the code dependent on it. For example, Haskell originally didn’t have Applicative as a superclass of Monad (and Semigroup as a superclass of Monoid), so adding the superclasses were breaking changes. Historical artifacts of this can be seen today in the form of redundant functions (return and pure, *> and >>). The semigroupoids package contains additional classes Apply and Bind, but these cannot be treated as superclasses of Applicative and Monad respectively.

Structural bounds don’t suffer from these issues. Clients are free to “project out” subsets without worry, if one uses subtyping via inheritance/inclusion. However, subtyping comes with its own set of problems as discussed earlier.

Part 6 - Closing thoughts

Debates and discussions on type classes vs modules (or other things) often end up re-iterating the same points in several distinct ways. Newer systems with traits/protocols complicate the design space further (at least, superficially). Formulating the question of “What system do I use?” in terms of key desirable properties and resulting incompatibilities is a useful exercise that highlights the commonalities and distinctions between different systems.

No points for guessing that there isn’t a clear “right answer” here. Middle-of-the-road solutions such as in Dotty #2047 might be worth exploring in other contexts too.

Lastly, there are a couple of references that I intended to talk about but didn’t. You might still want to look at those.Type classes vs. the World [video] - Edward Kmett.

scala/dotty PR #5458 - An Alternative to Implicits - Martin Odersky.


  1. It seems the Rust community uses the same terminology to mean something different here (although as Edward Yang’s blog post shows, there was enough confusion about it in the Haskell community at some point to warrant a blog post). If you look at the explanation for Orphan Rules or Rust RFC 1023 or the Little Orphan Impls blog post, the Rust community uses the term “coherence” akin to Yang’s “global uniqueness of instances” or more succinctly “canonicity” (this post and Modular Implicits1).

    Similarly, Martin Odersky's comments in the context of Scala (Dotty #4234, Dotty #2047) seem to use the term “coherence” as shorthand for “uniqueness of instances”.↩︎