Async Rust in Practice: Performance, Pitfalls, Profiling

First Issue Arises

A few weeks ago, an interesting issue appeared on our GitHub tracker. It reported that, despite our care in designing the driver to be efficient, it was noticeably slower than one of the competing drivers, cassandra-cpp, which is a Rust wrapper of a C++ CQL driver. The author of latte, a latency tester for Cassandra (and Scylla), pointed out that switching the back-end from cassandra-cpp to scylla-rust-driver resulted in an unacceptable performance regression. Time to investigate!

Reproducing the Problem

At first, we were unable to reproduce the issue — all experiments seemed to prove that scylla-rust-driver is at least as fast as the other drivers, and often provides better throughput and latency than all the tested alternatives.

Profiling

After we were able to reliably reproduce the results, it was time to look at profiling results — both the ones provided in the original issue and the ones generated by our tests.

cargo flamegraph --bin your-app -- your-params

Hint

The Rust ecosystem makes it easy to test small changes to your project's dependencies, which was invaluable when comparing various fixes applied to scylla-rust-driver without having to publish anything on crates.io. If you want to test a particular change made to one of your dependencies before it is published (or even on your own fork, where you applied some experimental changes yourself!), you can simply provide a git repository URL in Cargo.toml:

scylla = { git = "https://github.com/scylladb/scylla-rust-driver", branch = "some_custom_branch" }

Root Cause №1

Ultimately, the root cause of the original issue was our lack of buffering of reads and writes.
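To illustrate why missing buffering hurts, here is a minimal sketch that uses only the standard library. CountingWriter and send_frames are hypothetical names invented for this example, and the 16-byte frame size and 8 KiB buffer are arbitrary assumptions. The driver itself works with asynchronous sockets, so this shows only the general principle, not the actual fix.

```rust
use std::io::{self, BufWriter, Write};

/// Hypothetical stand-in for a socket: counts how often `write` is called.
/// On a real socket, each such call would typically be a separate syscall.
struct CountingWriter {
    writes: usize,
    bytes: usize,
}

impl Write for CountingWriter {
    fn write(&mut self, buf: &[u8]) -> io::Result<usize> {
        self.writes += 1;
        self.bytes += buf.len();
        Ok(buf.len())
    }

    fn flush(&mut self) -> io::Result<()> {
        Ok(())
    }
}

/// Sends `frames` small frames and returns
/// (number of underlying write calls, total bytes written).
fn send_frames(frames: usize, buffered: bool) -> io::Result<(usize, usize)> {
    let frame = [0u8; 16]; // assumed frame size, for illustration only
    let sink = CountingWriter { writes: 0, bytes: 0 };

    if buffered {
        // Small writes accumulate in memory and reach the sink in bulk.
        let mut writer = BufWriter::with_capacity(8192, sink);
        for _ in 0..frames {
            writer.write_all(&frame)?;
        }
        writer.flush()?;
        let sink = writer.get_ref();
        Ok((sink.writes, sink.bytes))
    } else {
        // Every small frame goes straight to the sink.
        let mut sink = sink;
        for _ in 0..frames {
            sink.write_all(&frame)?;
        }
        Ok((sink.writes, sink.bytes))
    }
}
```

With these assumptions, send_frames(100, false) performs 100 underlying writes, while send_frames(100, true) delivers the same 1600 bytes in a single write.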

Root Cause №2: a Pitfall in Async Rust

That’s not the end of the story at all! In fact, the most interesting bit was uncovered later, after the first fix was already applied.

Quadratic Behavior?

Yes, an experiment performed by one of our engineers hinted that using a combinator for Rust futures, FuturesUnordered, appears to cause a quadratic rise in execution time compared to expressing a similar problem without the combinator, by using Tokio’s spawn utility directly.

The Pitfall

In order to fully grasp the problem one needs to understand how Rust async runtimes work.
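As a rough illustration of the basic contract between a future and an executor, here is a toy sketch built on the standard library only. YieldTimes, CountingWaker and run_to_completion are hypothetical names made up for this example. Real runtimes such as Tokio are far more elaborate: they park tasks until woken, schedule many tasks at once and so on, so this is a simplified model rather than a description of any particular runtime.

```rust
use std::future::Future;
use std::pin::Pin;
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

/// A future that returns `Pending` a fixed number of times before completing.
struct YieldTimes {
    remaining: usize,
}

impl Future for YieldTimes {
    type Output = ();

    fn poll(mut self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<()> {
        if self.remaining == 0 {
            return Poll::Ready(());
        }
        self.remaining -= 1;
        // Signal that this future wants to be polled again, then yield.
        cx.waker().wake_by_ref();
        Poll::Pending
    }
}

/// A waker that only counts how many times it was invoked.
struct CountingWaker {
    wakes: AtomicUsize,
}

impl Wake for CountingWaker {
    fn wake(self: Arc<Self>) {
        self.wakes.fetch_add(1, Ordering::SeqCst);
    }
}

/// Toy executor for a single future. It re-polls immediately instead of
/// sleeping until a wake-up arrives, which a real runtime would do.
/// Returns (number of polls, number of wake-ups).
fn run_to_completion<F: Future>(fut: F) -> (usize, usize) {
    let counter = Arc::new(CountingWaker { wakes: AtomicUsize::new(0) });
    let waker = Waker::from(counter.clone());
    let mut cx = Context::from_waker(&waker);
    let mut fut = Box::pin(fut);

    let mut polls = 0;
    loop {
        polls += 1;
        if fut.as_mut().poll(&mut cx).is_ready() {
            break;
        }
    }
    (polls, counter.wakes.load(Ordering::SeqCst))
}
```

Calling run_to_completion(YieldTimes { remaining: 3 }) polls the future four times and records three wake-ups. The point to take away is that a future makes progress only when something polls it, and each Pending result hands control back to whoever is doing the polling.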

The Solution

Since FuturesUnordered is part of Rust’s futures crate, the issue was reported there directly: https://github.com/rust-lang/futures-rs/issues/2526. It was recognized and triaged very quickly by one of the contributors; the response time was really impressive!

Summary

The world of async programming in Rust is still young, but very actively developed. Investigating and getting rid of bottlenecks and pitfalls is a very useful skill, so don’t hesitate to join the effort, e.g. by becoming a contributor to our brand new, native Scylla driver! https://github.com/scylladb/scylla-rust-driver
