
Async/Await Is Real And Can Hurt You

Async/await in Rust is a couple of years old now. Personally, I was very much into async Rust at first. Over the years, I've slowly come to dislike it. Now, I actively avoid it. In this article I will try to lay out the reasons for that. I have written plenty of async Rust code (about 100k lines of async Rust, 50k non-async). I was partially inspired by many others who have voiced similar thoughts about async/await.

Async/await History

Let's start with a quick refresher on the "story" of async/await and how it came to be (or at least how I understand it): Around the turn of the century, as the web started to grow and more and more people came online, there was a need for faster web servers. More specifically, developers wanted to maximize the number of concurrent clients a web server could serve. This problem was dubbed the C10k problem: "How to serve more than 10 thousand clients simultaneously?"

The most basic web server works like this: it opens a server socket, accepts incoming connections, and starts a thread for each of them. The thread handles the request and stops when it is done. If it wants to do some I/O, like sending or receiving, it will just wait for that operation to complete; it will block. The OS handles the scheduling between these threads. This approach is not optimal. Spawning a thread is somewhat slow, and handling 10k simultaneous connections means doing it a lot.

To get better performance, we need to reach for non-blocking I/O. On Linux, non-blocking I/O is provided by poll and epoll, the latter of which is now the de facto standard (io_uring is more modern and gaining traction). On Windows there's IOCP, and on macOS and the BSDs, there's kqueue. All of these are roughly similar: they are essentially a list of sockets. The programmer adds a socket and the operation it wants to do. Then the program polls the list as a whole and is notified whenever any of the sockets is "ready". This way, many sockets can be managed from one single thread. This is much faster since there is no need to spawn OS threads.
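
For concreteness, here is a minimal sketch of that thread-per-connection model in Rust (an echo-style handler, just to illustrate the blocking style; not production code):

use std::io::{Read, Write};
use std::net::TcpListener;
use std::thread;

fn main() -> std::io::Result<()> {
    let listener = TcpListener::bind("127.0.0.1:8080")?;
    for stream in listener.incoming() {
        let mut stream = stream?;
        // One OS thread per connection. Every read/write blocks, and the OS
        // scheduler puts the thread to sleep until the socket is ready.
        thread::spawn(move || {
            let mut buf = [0u8; 1024];
            if let Ok(n) = stream.read(&mut buf) {
                let _ = stream.write_all(&buf[..n]);
            }
        });
    }
    Ok(())
}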

This paradigm has an issue though: it is much more complicated to program. Before, you had this nice per-request linear control flow. But with non-blocking I/O, the programmer has to manage the state of all these sockets from a single point. A possible solution is the callback-based approach: for each event, you pass in a closure that handles that event, and the main loop calls that callback whenever it has polled an event. This is the approach NodeJS took, and still takes to this day. Unfortunately, it has usability problems of its own, colloquially known as "callback hell". It even has its own website.

That's where async/await comes in. With async/await you can have your cake and eat it too. It allows the programmer to write their code linearly like before, while using these non-blocking I/O mechanisms under the hood. As far as I can find, async/await was first introduced in F#, but it was popularized by C#, which brought it to the masses (the programmer masses, that is). That's also where I was first introduced to it.

Async/await Primer

A quick primer on async/await: async/await is three things: async functions, async tasks and the async runtime. I will use the web server analogy to explain how async/await relates to the C10k problem and non-blocking I/O.

First, the web server starts and creates a socket. A connection comes in. Without async/await, we'd have to register it with epoll (or similar) manually. In the async/await paradigm, we start an async task. This is like a thread, except much faster to create, because it is not managed by the OS (no syscall is needed). We receive data from the socket using an async function. Instead of calling it like a normal function, we await it. A lot happens under the hood: the receive operation is added to the epoll list, and no blocking happens at all. Instead, the async runtime switches to some other async task, one that does have data available. Or maybe it switches to the original async task that is awaiting a new client, since one is available. The async runtime can do all this from a single OS thread.

Using this model, we have the best of both worlds: the code looks and feels like a normal "linear" program, but under the hood we really are using epoll and friends, so there is no "blocking" code (in the OS thread sense). And no threads need to be spawned at all.
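
The same echo-style server in the async/await paradigm might look roughly like this (a sketch using tokio, the de facto standard runtime):

use tokio::io::{AsyncReadExt, AsyncWriteExt};
use tokio::net::TcpListener;

#[tokio::main]
async fn main() -> std::io::Result<()> {
    let listener = TcpListener::bind("127.0.0.1:8080").await?;
    loop {
        let (mut stream, _) = listener.accept().await?;
        // An async task instead of an OS thread; no syscall needed to start it.
        tokio::spawn(async move {
            let mut buf = [0u8; 1024];
            // Awaiting registers the socket with epoll (or similar) and lets
            // the runtime switch to another task until data is ready.
            if let Ok(n) = stream.read(&mut buf).await {
                let _ = stream.write_all(&buf[..n]).await;
            }
        });
    }
}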

There is more. First, we can shoehorn file I/O and timers into this paradigm as well (actually, we kind of have to, for reasons explained later). Second, async functions can be composed. For example, we can do await race(receive_reply, timeout(10)). This composes two async functions into a new one that will either receive_reply or time out after 10 seconds, whichever comes first. Async functions can be composed in many ways, and it is very convenient.
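
In Rust with tokio, the race-against-a-timeout example might look like this (receive_reply is a hypothetical stand-in for any async operation):

use std::time::Duration;
use tokio::time::timeout;

async fn receive_reply() -> String {
    // placeholder for an actual async receive
    String::from("reply")
}

async fn demo() {
    // Compose the receive with a 10-second timeout; whichever finishes first wins.
    match timeout(Duration::from_secs(10), receive_reply()).await {
        Ok(reply) => println!("got reply: {reply}"),
        Err(_) => println!("timed out after 10 seconds"),
    }
}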

If you peel off all the layers of abstraction, async/await is similar to what NodeJS did with its callback-based approach. The only difference is that it looks linear. In reality, async functions together form a complex state machine that allows all of this to happen on a single thread. This state machine is inherently there when using non-blocking I/O. When using epoll or another non-blocking I/O primitive directly, you program the state machine yourself. In the callback-based approach, the state machine is encoded into the callback dependency graph; this is the root cause of callback hell. In async/await, it is hidden away by the interpreter or compiler: it is implicit. Whatever abstraction we use for async/await, in the end they all encode a state machine of some sort.

In the remainder of this post I will focus on Rust. This is for two reasons: first, I know Rust's async/await best. Second, the problems of async/await manifest more strongly in Rust than in other programming languages. I want to emphasize that Rust's async/await is well implemented. Async/await is inherently incompatible with a systems-level programming language; any implementation would be problematic. Regardless, the Rust team did an incredible job of implementing it as best they could. The async working group is incredibly productive and consists of very smart people. I especially enjoy Niko's blog and I'm a big fan of withoutboats, who is probably the single most authoritative person on the matter. That said, most of my arguments apply to other programming languages with async/await as well, in some form or another.

Async/await has good parts

Let's start with the good parts. There are two things about async/await that are good: (1) async/await futures are composable and (2) async/await actually has a very good niche use case in embedded programming.

Futures are composable. Using select, join, race, and friends makes it really easy to manage many async tasks and extract their results in a useful way. This is not impossible with OS threads, but there is no standard library support for it. The easiest way to mimic the same behavior in non-async code is by using channels to communicate between OS threads. The excellent crossbeam library has support for composing channels similar to async/await. This also incentivizes the pattern of initializing a fixed number of threads in a thread pool, scheduling work on them, and using channels to relay results back to where they are needed. This pattern works really well and is faster than async/await since there is no overhead of spawning tasks/threads at all.
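
A minimal sketch of that pattern with crossbeam channels (expensive_job_a/b are hypothetical placeholders):

use crossbeam_channel::{select, unbounded};
use std::thread;

fn expensive_job_a() -> u64 { 1 }
fn expensive_job_b() -> u64 { 2 }

fn main() {
    let (tx1, rx1) = unbounded();
    let (tx2, rx2) = unbounded();

    // Worker threads relay their results back over channels.
    thread::spawn(move || tx1.send(expensive_job_a()).unwrap());
    thread::spawn(move || tx2.send(expensive_job_b()).unwrap());

    // Wait for whichever result arrives first, much like an async select/race.
    select! {
        recv(rx1) -> res => println!("a finished first: {:?}", res),
        recv(rx2) -> res => println!("b finished first: {:?}", res),
    }
}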

Futures make it easy to manage timeouts and cancellation. Wrapping any future in a timeout is trivial. You can use select or race to wait on either an async operation or a signal from another task. The sync Rust alternatives do exist, but I will admit they are not as straightforward. For example, sockets in the Rust standard library have functions like set_read_timeout. You can also use signals to cancel blocking operations, and on Linux there is pthread_kill. These operations are hard to use and easy to get wrong. Async/await wins over threads in this regard. I believe this can be fixed relatively easily, though: by introducing the same composability features for OS threads. There is no fundamental reason why that could not happen.
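
For example, a read timeout on a plain blocking socket (this much is straightforward; it is composing cancellation across arbitrary operations that gets hard):

use std::io::Read;
use std::net::TcpStream;
use std::time::Duration;

fn read_with_timeout(addr: &str) -> std::io::Result<Vec<u8>> {
    let mut stream = TcpStream::connect(addr)?;
    // Blocking reads on this socket now fail with a timeout error after
    // two seconds instead of blocking forever.
    stream.set_read_timeout(Some(Duration::from_secs(2)))?;
    let mut buf = Vec::new();
    stream.read_to_end(&mut buf)?;
    Ok(buf)
}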

Async/await works well for embedded programming. Many of the arguments against async/await rely on the fact that most of what it does is already handled by the OS. What if you did not have an OS? Async tasks work out really well in embedded programming since there are no threads at all. In that case a framework like embassy can provide much of the missing OS functionality without introducing too much overhead. Moreover, a lot of the footguns (the std::sync::Mutex vs tokio::sync::Mutex thing, for example) disappear, since the scheduler is single-threaded (or zero-threaded, really). If you are an embedded programmer, most of this blog post does not apply.

Now onto the bad parts...

Async/await is strictly worse than OS threads

After all this talk about being able to handle thousands of clients from a single thread, the following might come as a bit of a surprise: in practice, async code usually runs on multiple threads. As many as possible, actually. In the default tokio configuration, the async runtime schedules tasks across many threads to maximize performance. The claim to fame of async/await is being able to program non-blocking code linearly. In terms of how the code looks, this is true. But it is not single-threaded code. So all the usual ceremony of multi-threaded programming still applies. You will still need Arc and Mutex everywhere. Even worse, you now need to choose between std::sync::Mutex and tokio::sync::Mutex. Picking the wrong one can adversely affect performance.
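
A sketch of the footgun: hold a std::sync::MutexGuard across an await point and the future is no longer Send, so it cannot be spawned on the default multi-threaded runtime. tokio::sync::Mutex avoids this, but is slower for short, uncontended critical sections.

use std::sync::{Arc, Mutex};

async fn do_io() { /* some awaited I/O */ }

async fn increment(counter: Arc<Mutex<u64>>) {
    let mut guard = counter.lock().unwrap();
    // The guard is held across this await, making the future !Send:
    // tokio::spawn(increment(...)) will fail to compile.
    do_io().await;
    *guard += 1;
}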

When programming async Rust, you must hit an await point at least every 10 milliseconds. Not doing so can easily wipe out any theoretical performance benefit async tasks have over OS threads. The reason is that in Rust, the async runtime can only schedule at await points. By not hitting one, you deprive the scheduler of the ability to do its job. The OS scheduler has no such limitation, since it is the OS and can preempt whatever it wants. Presumably, this is one of the reasons the scheduler was made part of the operating system in the first place. The 10-millisecond rule is a footgun for beginners, since you just have to know about it to avoid it.

Since not awaiting often enough is mostly caused by doing CPU-bound work in an async context, it is generally advised to spawn a separate thread pool for CPU-bound workloads. I find this very interesting, because the original promise of async/await is that you can linearize the flow of your programs while still enjoying the benefits of async I/O primitives. But now, if your program flow includes CPU work anywhere, you still have to deal with all the synchronization and non-linear control flow patterns associated with multi-threaded programming. Right back where we started.
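
With tokio, that means wrapping the CPU-bound part in spawn_blocking (expensive_hash is a hypothetical placeholder):

async fn hash_data(data: Vec<u8>) -> u64 {
    // Move CPU-bound work off the async worker threads, so other tasks
    // keep getting scheduled in the meantime.
    tokio::task::spawn_blocking(move || expensive_hash(&data))
        .await
        .expect("blocking task panicked")
}

fn expensive_hash(data: &[u8]) -> u64 {
    data.iter().map(|&b| b as u64).sum()
}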

Then there is cancellation safety. As it turns out, composing stateful futures can lead to horribly subtle bugs. I have personally had to deal with a couple of these, and they are not fun at all. It also feels very unlike Rust to have such hidden footguns that are not protected against by the language.
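
A classic instance of the problem, based on the cancellation-safety warnings in tokio's own documentation: read_line is not cancel-safe, so whenever the other select! branch completes first, any partially read data is silently dropped.

use tokio::io::{AsyncBufReadExt, BufReader};
use tokio::net::TcpStream;
use tokio::time::{interval, Duration};

async fn run(stream: TcpStream) -> std::io::Result<()> {
    let mut reader = BufReader::new(stream);
    let mut line = String::new();
    let mut ticker = interval(Duration::from_secs(1));
    loop {
        tokio::select! {
            // If the tick branch wins, this read future is dropped and any
            // bytes it had already read are lost: a subtle data-loss bug.
            res = reader.read_line(&mut line) => {
                if res? == 0 { break; }
                print!("got line: {line}");
                line.clear();
            }
            _ = ticker.tick() => {
                println!("tick");
            }
        }
    }
    Ok(())
}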

Cancellation safety has the fun side effect that leaking a future, which is a safe operation in Rust, can now manifest as all kinds of unexpected bugs, such as a mutex that stays locked forever. To be fair, this issue is unlikely to crop up in real code, since you have to go to pretty great lengths to leak a future.

Furthermore, iterators do not work (well) with async functions. You cannot use map and friends directly, but luckily you can make it work with futures::join_all. You'll just need a separate crate for it. And to use it, you will need to understand the underlying concept of what a Future is. This won't be the last time you unexpectedly get a peek under the hood of the "hidden" state machine. Meet Pin. Sooner or later you will be writing code like this: Pin<Box<dyn Future<Output = ()> + Send + '_>> (actual production code).
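
The join_all workaround looks something like this (fetch is a hypothetical async function):

use futures::future::join_all;

async fn fetch(url: &str) -> String {
    // placeholder for an actual async request
    url.to_owned()
}

async fn fetch_all(urls: &[&str]) -> Vec<String> {
    // The async stand-in for urls.iter().map(fetch).collect():
    // build all the futures first, then await them together.
    join_all(urls.iter().map(|u| fetch(u))).await
}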

Standard Rust threads can be "scoped". Tokio tasks do not support this, and that is also likely never going to be fixed, since there is a fundamental issue making it impossible.
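
With scoped threads, borrowing from the enclosing stack frame is fine, because the scope guarantees the threads are joined before it returns; tokio::spawn, by contrast, requires 'static futures:

fn sum_in_parallel(data: &[u64]) -> u64 {
    let (left, right) = data.split_at(data.len() / 2);
    std::thread::scope(|s| {
        // Both threads borrow `data` without Arc or 'static bounds.
        let a = s.spawn(|| left.iter().sum::<u64>());
        let b = s.spawn(|| right.iter().sum::<u64>());
        a.join().unwrap() + b.join().unwrap()
    })
}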

Async drop is still missing, and that makes it very hard to do a proper shutdown in async code. The problem of gracefully shutting down is not exclusive to async Rust; it is hard with OS threads too. But the lack of async drop makes it just that much harder. In normal Rust code you can apply RAII to ensure that resources are properly disposed of. If you have some object whose cleanup requires calling an async function, you are out of luck.

In sync Rust, you can do:

use std::path::PathBuf;

struct TemporaryFile {
    path: PathBuf,
}

impl Drop for TemporaryFile {
    fn drop(&mut self) {
        // Best-effort cleanup; errors cannot propagate out of drop.
        let _ = std::fs::remove_file(&self.path);
    }
}

In async Rust, that is not possible since tokio::fs::remove_file must be called from an async function, and there is no async fn Drop (yet). There is no quick fix for this problem. Actual sane solutions require at least an entire crate of ceremony.

In my own code I used something akin to a CleanupService pattern: a service that runs a background worker in async Rust and executes cleanup jobs. The jobs are callbacks that are registered in the drop handlers of async objects, and they are handed off via a CleanupToken that communicates over a channel with the CleanupService. That implementation alone was more than 100 lines of code and also wins the async bingo:

  • Needs Arc in two places to overcome lifetime restrictions.
  • Needs extra logic to prevent issues with cancellation safety (discovered through a multi-day horrible debugging session).
  • Had a synchronization bug that appeared 1 in 10,000 times and made it stall completely (discovered after a multi-hour, medium-horrible debugging session).
  • Uses impl Future<Output = ()> + Send + 'static twice.
  • Uses std::thread::panicking since it has issues with unwinding (yes, really!).
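
A heavily simplified sketch of the idea (with most of the ceremony listed above stripped out):

use std::future::Future;
use std::pin::Pin;
use tokio::sync::mpsc;

type CleanupJob = Pin<Box<dyn Future<Output = ()> + Send + 'static>>;

// Objects hold a token; dropping it hands the async cleanup job off to a
// background worker, since Drop itself cannot await anything.
struct CleanupToken {
    tx: mpsc::UnboundedSender<CleanupJob>,
    job: Option<CleanupJob>,
}

impl Drop for CleanupToken {
    fn drop(&mut self) {
        if let Some(job) = self.job.take() {
            let _ = self.tx.send(job); // best effort; the service may be gone
        }
    }
}

// Spawned once at startup; awaits cleanup jobs as they come in.
async fn cleanup_worker(mut rx: mpsc::UnboundedReceiver<CleanupJob>) {
    while let Some(job) = rx.recv().await {
        job.await;
    }
}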

Others have solved it somewhat more creatively (by blocking in an async context, which is the one thing you should not do). I don't blame them. It is a hard problem to solve. Unfortunately, it is unlikely that async drop will ever come to Rust.

Async traits are partially there, but with some caveats. These are likely to be fixed in the future though.

The Rust compiler is not that good at determining what state goes into a future. (They are working on it.) Because of this, your futures might get too big, which has serious performance consequences. Did you know Tokio will sometimes put your tasks on the heap when it believes they are too big?

I want to stress here that almost all of these issues are specific to Rust. They are caused by an inherent dichotomy in async Rust: to make async Rust more "ergonomic", more abstractions need to be introduced. But the more abstractions there are, the more hidden the underlying mechanisms become. Explicitly high-level languages such as C# have a much easier time, since they can get away with more abstraction. As Rust becomes more of a systems programming language, it becomes less suitable for async/await; the two goals simply do not align. In its current state, there is already a lot of hidden machinery behind every .await, and that amount will only grow. At the same time, Rust will continue to cater to low-level programmers, and thus all the bits and pieces need to stay exposed.

For each of these issues there are workarounds, but they all require a deep understanding of how async works in Rust. And most of these issues are fundamental; they may not even be possible to solve.

Async/await is almost never faster

On modern systems, the overhead of spawning an OS thread instead of an async task (green thread) is much lower than it was 25 years ago. Benchmarks now show that the performance difference is smaller than you might expect, often somewhere in the 10-30% range. Take into account that most of these benchmarks use "dumb" request handlers that do little work, which makes the cost of thread creation relatively large. We can safely assume that most real software spends more compute per request than these benchmarks do, so the relative overhead of spawning threads is most likely even lower in practice. Even if you could realize a 20% performance benefit by using async/await, does that make up for introducing a new programming paradigm, a scheduler, a runtime and an entirely changed standard library into your code?

So what about the web servers that run the web right now? Interestingly enough, nginx is written in C and does not use async/await. Same for Apache. Those two happen to be the most widely used web servers, together serving about two thirds of all web traffic. Both of them do use non-blocking I/O, but they do not use async/await, and they seem to do pretty well regardless. (Note that my beef is specifically with async/await, not with non-blocking I/O.)

If you really need a custom server and you are a startup, scale-up, or medium-sized business, you are probably maintaining somewhere on the order of 1-100 servers, most likely in the cloud. If you really need that extra 20% to keep your service from going belly-up, you are already severely under-provisioned! Stated differently: if 20% extra performance could somehow be the difference between your service staying up or going down, you are doing something wrong. If you think 20% extra performance is worth it to reduce your cloud/server costs, think again. At this scale you can provide more value by implementing useful features for your customers instead of dealing with Pin<Box<dyn Future<Output = ()> + Send + '_>>.

If you really need a custom server and you are a big business, things are different. You are managing hundreds, thousands or tens of thousands of servers, and the 20% might actually make a difference to your bottom line. So, what web server are you using? At this point I would consider building your infrastructure on top of established server technology, but maybe you have no choice. Also, your workload must be highly I/O bound, or async will not have a measurable effect. If all of this is true, you are part of a very small minority for whom using async/await is a good idea. If you, dear reader, are part of this minority: there you go, go and use async/await.

If you are, for some reason, writing the fastest web server in the world, you surely would not want the abstraction that is async/await between you and the kernel anyway. Better to speak to io_uring directly.

If you are a hyperscaler, you are not using async/await. For a hyperscaler, the cost of managing server infrastructure may run into the billions. Async/await is an abstraction, and you will not want it. To really get the most out of your hardware, you are better off modifying the kernel to your needs. You may even build your own server OS. You are probably working on an in-house port of the Linux kernel with all kinds of bit-twiddling to get that last 0.1%. I don't think these people are using async/await, and for good reasons.

The most-heard counterargument is something along the lines of: "a typical application WILL spend most of it's time in a wait state. [...] And async/await moves all this wait from thread blocking to just callbacks, and you can use 10-20 OS threads to handle thousands of requests". This argument is flawed. If your application spends most of its time in a wait state and is highly I/O bound, then OS threads would work too. When a thread blocks on a read, for example, it is put to sleep and only woken up once data arrives; quite similar to async tasks, actually. OS threads do use more memory, and spawning them is a bit more costly, so if you really need that last 20% of performance, async may make sense. But again, for most use cases the trade-off is not worth it.

I actually see the above misunderstanding a lot in arguments about async/await. Async/await proponents will often present the async scheduler as somehow superior because it switches to a different task when the current async task would block. In reality, your OS scheduler does the exact same thing, and probably better. Schedulers are not a new concept.

I want to emphasize that the above arguments on their own are not enough to kill async/await. Had async/await programming been without costs, none of the above would matter. Even if you did not really need the extra 20%: if it were free, you should take it! But it's not. There are tangible, pervasive and real costs to using async/await. Nonetheless, there is a very small subset of software that benefits from async/await. But it is a small group. Just take a moment to appreciate how small this group of developers is: you have to be (1) working at a large organization, (2) working on a custom web server, (3) whose workload is highly I/O bound. An entire language feature was dedicated to this pretty niche use case. (Note that I am excluding everyone that has fallen victim to async/await because their (server) library forced them. I will get to that later.)

Async/await's preferential treatment

One of the core reasons that async/await remains widely used is this: Proponents of async/await overstate the usefulness of it, and make the group that benefits from this feature out to be much bigger than it really is. I am quite sure that more than 90% of developers do not need async/await in any meaningful way.

And even if we put a very generous percentage on the number of developers that benefit from async/await, let's say 25% (I believe it is much, much lower, but anyway). That's still a really low bar for implementing a language feature. Would the for loop exist if only 25% of developers had any use for it? I would argue that most other language features like functions, structs, generics, if/else, match statements, enums, the standard library, all of them are useful for the vast majority of developers. They are broadly applicable and there is no discussion about that. Async/await stands out in this regard. It is a feature for a very specific subset of developers, but somehow earned its place among other built-in language features. Can you come up with another built-in language feature that only benefits a minority subset of developers?

This is quite unfortunate because the feature also consumes a significant amount of developer resources in the Rust project. Time that could have been spent elsewhere.

So, why has async/await been given preferential treatment? Why is it so likeable? I think part of the reason is that it has a certain computer science aesthetic to it, one that is hard to argue against. That is the reason I liked it in the first place. For someone who is into programming, async/await is (in theory) quite a beautiful idea. It encodes an entire state machine into your code with just a bit of syntactic sugar. Developers are often perfectionists. Being able to squeeze the maximum performance out of your application, using the most advanced non-blocking I/O primitives the OS has to offer, all from a single thread, while the code basically does not have to change at all? Sounds too good to be true. And it is. The problem is that the theory does not translate to practice. The real world is messy, and the internal bits and pieces of that beautiful state machine show up everywhere. Or, as put more eloquently in my favorite programming manifesto:

grug understand all programmer platonists at some level wish music of spheres perfection in code. but danger is here, world is ugly and gronky many times and so also must code be

Async/await is poisonous

By now most are familiar with colored functions. It is a fantastic blog post and you should read it now if you haven't already. I'm not going to reiterate its points here. Instead, I will focus on another way that async/await is poisonous: any time a library adopts async/await, all its consumers must adopt it too. Async/await poisons across project boundaries. There are some nuances here; technically, there are ways to deal with this. In practice, though, poisoning happens all the time.

Take Rust's axum, for example. It is a great library for writing servers, but it forces async/await upon you; it is built entirely with async/await in mind. Even if you are building a small internal tool, you will use async/await, and all of its warts will come with it. Most modern Rust server libraries do this (rocket and actix-web too). Most servers don't actually need async/await, yet they are getting it anyway, and so is anyone using one of those libraries.

For servers it is bad, but at least there is some use case for it. For HTTP clients, though, I hardly see any use case. Yet the de facto standard, reqwest, defaults to its async version. It does have a "blocking" version, but that wraps the async version under the hood, and you have to enable it manually. For most applications doing some kind of request to a server, there is no need for multiplexing and non-blocking I/O at all. It will actually make the code more complicated, since now you have to deal with cancellation, synchronization and the async runtime.
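
For comparison, the blocking API keeps the code a plain function call (the version number in the comment is illustrative):

// Cargo.toml: reqwest = { version = "0.12", features = ["blocking"] }
fn fetch_page() -> Result<String, reqwest::Error> {
    // No runtime, no .await, no Send bounds: just a blocking call.
    reqwest::blocking::get("https://example.com")?.text()
}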

The poisoning continues. Someone builds an OpenAI API crate on top of reqwest. reqwest is async, so naturally, async-openai is too. Someone else builds a general-purpose LLM crate that uses async-openai. Of course, when someone else writes an API for some AI provider and wants it to be part of llm-chain, they make it async too; there's no other option at this point. No one has stopped to ask: why? Your tiny CLI app now has an entire async runtime, an implicit state machine, Send and Sync bounds, async move, Box::pin and a bunch of other async goodies. But what for? Are you going to be sending OpenAI ten thousand simultaneous requests?

As far as I can see, the poisoning has now gone airborne and crates are getting infected with async/await through mere word of mouth. Please see this async chess engine. Or a code editor using async/await. Or this terminal git client.

It has gotten so bad that "async" has become somewhat of a marketing term, synonymous with "fast" and "good", even when it makes no sense. The quite popular (and quite good) terminal file manager yazi, for example, boasts: "Full Asynchronous Support: All I/O operations are asynchronous, CPU tasks are spread across multiple threads, making the most of available resources." We're talking about a file manager here. Maybe the author is not aware that almost the entire tokio::fs API does not actually do any async I/O at all (in the event loop sense). Instead, it uses spawn_blocking for most operations, scheduling them on tokio's built-in thread pool for blocking work. This means the authors of yazi could easily have avoided introducing async/await into their code without losing any performance at all. They could have used rayon or any other thread pool implementation to offload blocking operations to. The code would be much simpler and effectively equivalent to what they are doing now.
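
A sketch of what that could look like: offload the blocking file operation to rayon's thread pool and relay the result back over a plain channel, no async runtime involved.

use std::path::PathBuf;
use std::sync::mpsc;

fn read_in_background(path: PathBuf) -> mpsc::Receiver<std::io::Result<Vec<u8>>> {
    let (tx, rx) = mpsc::channel();
    rayon::spawn(move || {
        // Blocking I/O runs on rayon's pool; the caller picks up the result
        // from the channel whenever it is ready.
        let _ = tx.send(std::fs::read(&path));
    });
    rx
}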

Conclusions

I think async/await is not worth the cost. It is not that there are no benefits at all; there are some. But the trade-off is usually not worth it. Async/await is a complex paradigm that consists of an incompatible parallelism construct, a scheduler and a different standard library. The more ergonomic it becomes, the more of this machinery is hidden away from the user. This makes Rust less suitable for low-level programming, where the programmer must be able to trust that they understand what is happening behind every line of code. The complexity of async/await and its machinery leads to misunderstandings, compile-time and runtime errors that are hard to understand, and hidden performance issues that are very hard to debug. It also introduces extra footguns and hiding places for bugs in your code. For most use cases, it does all this with no measurable performance benefit in return. At the same time, it distracts from more useful parallelism methods. Regardless, it has taken over much of the industry, and many popular crates assume it as a default, forcing programmers to use it even when they do not stand to benefit.

Key takeaways:

  • Ergonomic async/await cannot be achieved in a low-level/systems programming language.
  • Async/await is a dangerously leaky abstraction that hides too much, and shows too much at the same time.
  • Async/await is riddled with little problems, hidden footguns and inconveniences.
  • Async/await will not deliver any meaningful performance benefit for more than 90% of use cases.
  • Async/await is poisonous, even across crate boundaries, and you will be forced to use it even when you don't benefit from it.
  • Async/await has become a cargo cult marketing term, and is now used in projects where it makes no sense.

Recommendations

In an ideal world, I would like to see async/await being removed from Rust. I know this will not happen. Plan B would be:

  • Discourage the use of async/await and properly document the use cases for which there is actual evidence it works.
  • Library authors (especially of popular crates) should provide a sync API first, and optionally an async API behind a feature flag that is off by default.
  • If you already have a project using async/await, challenge yourself to prove that you are benefiting from it: How many of your bugs have something to do with async/await? How many fewer or more lines of code would your project be without async/await? Are you sure your project is faster due to using async I/O (it very likely is not)? How many times have you struggled with async Rust and lost valuable time on a problem that would not exist in sync Rust? Be honest with yourself when answering these questions.
  • Programmers should refrain from using async/await, especially in a systems programming context (with the exception of embedded programming).

Lastly, I have been meaning to get into Zig for a while now, but I have not had the time for it yet. During my research I found that Zig actually removed async/await because they were not able to make it fit nicely. This makes sense, since Zig highly favors the "what you write is what you get" mentality. I believe that if Zig wants to be a serious contender in the low-level/systems programming space, it should avoid async/await like the plague.