We’re Porting Our Database Drivers to Async Rust
By Piotr Sarna
Our Rust driver started as a humble hackathon project, but it has since grown to become our fastest and safest Cassandra Query Language (CQL) driver. We were happy to see in our benchmarks that the ScyllaDB Rust Driver beats even the reference C++ driver in raw performance, and that gave us an idea: why not unify all drivers to use Rust underneath?
Benefits of a Unified Core
Database drivers are cool and all, but there’s one fundamental problem with them: a huge amount of reinventing the wheel, implementing the same logic over and over for every possible language that people write applications in. Implementing the same thing in so many languages also increases the chances of some of the implementations having subtle bugs, performance issues, bit rot, and so on. It sounds very tempting to deduplicate as much as possible by reusing a common core — in our case, that would be the Rust driver.
When most of the logic is implemented once, maintainers can focus on this central implementation, covering it with extensive tests and proper review. It takes much less manpower to keep a single project up to date than to manage one project per language. Once a new feature is added to ScyllaDB, it’s possible that only the core needs to be updated, and all the derivative implementations benefit from it automatically.
It goes without saying that deduplication helps reduce the number of bugs, because there’s simply less code where a bug can occur. Additionally, backporting urgent fixes also gets substantially easier, because the same fix wouldn’t have to be carefully rewritten in each supported language.
What’s even better is that all existing drivers already have their own test suites for unit tests, integration tests, and so on. A single core implementation would therefore be tested by many independent test harnesses and lots of cases. Sure, the majority of them will overlap, but there’s no such thing as a perfect test suite, so using several of them reduces the probability of missing a bug and generally improves test coverage. And all the tests are already there, for free!
Some drivers are slow due to their outdated design; others are faster because they’re implemented in a language that imposes less overhead. Some, like our Rust driver, are the fastest. Just as Python relies on modules compiled in C to make other modules faster, our CQL drivers could benefit from a Rust core. A lightweight API layer would keep the drivers backward compatible with their previous versions, while new versions would delegate as much work as possible straight to the Rust driver, trusting it to do the job faster and more safely.
Rust’s asynchronous model is a great fit for implementing high-performance, low-latency database drivers, because it scales well and lets applications run many requests concurrently. Unlike many other languages, Rust does not build the layer responsible for running asynchronous tasks into the language itself. This layer is called a runtime, and developers are free to pick one or even implement their own, which is a very powerful option. After careful research, we picked Tokio as our runtime of choice, due to its very active open-source community, focus on performance, rich feature set (including complete implementation for network streams, timers, etc.) and lots of fantastic utilities like tokio-console.
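To make the idea of a pluggable runtime more concrete, here is a minimal sketch of a single-threaded executor written with nothing but the standard library. It is a toy for illustration only and is not how Tokio or the ScyllaDB Rust Driver work; `ThreadWaker`, `block_on`, `fetch_row_count`, and `run_query` are hypothetical names invented for this example.

```rust
use std::future::Future;
use std::pin::pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};
use std::thread::{self, Thread};

/// Wakes the executor by unparking the thread blocked in `block_on`.
struct ThreadWaker(Thread);

impl Wake for ThreadWaker {
    fn wake(self: Arc<Self>) {
        self.0.unpark();
    }
}

/// Toy runtime: drives one future to completion on the current thread.
fn block_on<F: Future>(fut: F) -> F::Output {
    // Pin the future on the stack so that it can be polled.
    let mut fut = pin!(fut);
    let waker = Waker::from(Arc::new(ThreadWaker(thread::current())));
    let mut cx = Context::from_waker(&waker);
    loop {
        match fut.as_mut().poll(&mut cx) {
            Poll::Ready(output) => return output,
            // Sleep until the waker is invoked, then poll again.
            Poll::Pending => thread::park(),
        }
    }
}

/// Hypothetical stand-in for an asynchronous driver call.
async fn fetch_row_count() -> u32 {
    42
}

/// Async functions compose with `.await` regardless of which runtime runs them.
async fn run_query() -> u32 {
    fetch_row_count().await + 1
}
```

Calling `block_on(run_query())` from `main` runs the whole chain to completion. A production runtime such as Tokio adds what this sketch leaves out: a multi-threaded scheduler, I/O readiness notifications, and timers.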
Writing code in one language in order to use it in another is common practice and there are lots of tools available for the job. Rust is no exception — its ecosystem is generally very developer-friendly, and there are many crates that make bindings with other languages effortless.
Binding with C/C++ applications doesn’t actually require much effort anyway. Rust uses LLVM for code generation, and the output executables, libraries and object files are more or less linkable with any C/C++ project. Still, there are a few good practices when using Rust and C/C++ in a single project.
First of all, make sure that name mangling won’t make it hard for the linker to find the functions you compiled in Rust. Anyone who has written functions in C++ and called them from C is familiar with `extern "C"`, and a similar trick applies to Rust: https://doc.rust-lang.org/std/keyword.extern.html. Mark each function you mean to export with `extern "C"` so that it uses the C calling convention, and add the `#[no_mangle]` attribute so that its name is exported unmangled. Then, the linker will have an easier job matching the Rust parts with your C++ object files and executables.
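As a minimal sketch of what such exported functions can look like, consider the two below. The names `driver_add` and `driver_query_len` are hypothetical and exist only for this example. Note that in Rust, `extern "C"` selects the C calling convention, while the `no_mangle` attribute is what keeps the symbol name intact. It is written here as `#[unsafe(no_mangle)]`, the spelling that recent toolchains accept in every edition and that the 2024 edition requires; older code spells it `#[no_mangle]`.

```rust
use std::ffi::{c_char, CStr};

/// Exported under the exact symbol name `driver_add`, using the C ABI.
#[unsafe(no_mangle)]
pub extern "C" fn driver_add(a: i32, b: i32) -> i32 {
    a.wrapping_add(b)
}

/// Returns the length in bytes of a NUL-terminated C string, or 0 for NULL.
///
/// # Safety
/// `query` must be NULL or point to a valid NUL-terminated string.
#[unsafe(no_mangle)]
pub unsafe extern "C" fn driver_query_len(query: *const c_char) -> usize {
    if query.is_null() {
        return 0;
    }
    // SAFETY: the caller guarantees a valid NUL-terminated string.
    unsafe { CStr::from_ptr(query) }.to_bytes().len()
}
```

On the C++ side, the matching declarations would be wrapped in `extern "C" { ... }`, for example `int32_t driver_add(int32_t a, int32_t b);`.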
For even smoother developer experience, the cxx crate can be used to reduce the amount of boilerplate code and make the bindings more robust.
The Python CQL driver is extremely popular among ScyllaDB/Cassandra users, but, well, Python is not exactly well known for its blazing speed or scalability for high concurrency applications (ref: https://en.wikipedia.org/wiki/Global_interpreter_lock).
Fortunately, thanks to its dynamic typing and very lenient interpreter, it’s also really easy to provide bindings to a Python application. The PyO3 crate looks like it has great potential for simplifying the development of native Python modules.
Even though there are lots of advantages for unifying the implementation of multiple drivers, one must also consider the drawbacks. First of all, each tiny bug in the Rust core now has global scope — it would affect all the derivative drivers. Then, the glue code provided to bind our Rust driver with the target language is also a potential place where a bug can hide. Relying on third party libraries for bindings also adds yet another dependency to each driver.
It’s become popular lately to embrace the “driverless” way and expose an interface implemented for a well known protocol like gRPC or HTTP(S). It’s an interesting point and certain applications and developers could definitely benefit from that approach, but going through yet another layer of protocols creates overhead (multiple rounds of serialization/deserialization, parsing the protocol frames, and so on), and users should be able to opt in to the better performance that native CQL drivers provide.
What’s Already Done
Porting the CQL C++ driver was already mostly done during our internal hackathon: https://github.com/hackathon-rust-cpp/cpp-rust-driver. While it’s still a work in progress, it’s also very promising, because the compatibility layer is quite thin for C++, partly because the ABIs of both languages share many similarities.
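To give a feel for how thin such a compatibility layer can be, here is a minimal sketch of the common opaque-handle pattern: Rust owns the object, and C or C++ only ever sees a raw pointer to it. This is an illustration under our own assumptions and is not code from the cpp-rust-driver repository; `Session` and the `demo_session_*` functions are hypothetical names. As in any Rust code exported this way, `#[unsafe(no_mangle)]` is spelled `#[no_mangle]` on older toolchains.

```rust
/// Hypothetical Rust-side object that the C/C++ caller treats as opaque.
pub struct Session {
    queries_run: u64,
}

/// Allocates a session on the Rust heap and hands ownership to the caller.
#[unsafe(no_mangle)]
pub extern "C" fn demo_session_new() -> *mut Session {
    Box::into_raw(Box::new(Session { queries_run: 0 }))
}

/// Pretends to run a query; returns the number of queries run so far,
/// or 0 if `session` is NULL.
///
/// # Safety
/// `session` must be NULL or a live pointer from `demo_session_new`.
#[unsafe(no_mangle)]
pub unsafe extern "C" fn demo_session_execute(session: *mut Session) -> u64 {
    // SAFETY: the caller guarantees the pointer is NULL or valid and unaliased.
    let Some(session) = (unsafe { session.as_mut() }) else {
        return 0;
    };
    session.queries_run += 1;
    session.queries_run
}

/// Takes ownership back from the caller and frees the session.
///
/// # Safety
/// `session` must be NULL or a pointer from `demo_session_new` that has
/// not been freed yet.
#[unsafe(no_mangle)]
pub unsafe extern "C" fn demo_session_free(session: *mut Session) {
    if !session.is_null() {
        // SAFETY: the pointer came from `Box::into_raw` and is freed only once.
        drop(unsafe { Box::from_raw(session) });
    }
}
```

The C++ side would forward-declare `struct Session;` and call these three functions, never touching the struct's fields directly.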
Unifying the drivers is a big and complicated task and we’ve only just begun the journey, but we have very high hopes for the future performance and robustness of all our CQL drivers. Stay tuned for updates!
Watch the Scylla Summit 2022 Talks on Rust!
We had a number of talks at Scylla Summit 2022 on Rust:
- ScyllaDB Rust Driver: One Driver to Rule Them All
- Learning Rust the Hard Way for a Production Kafka + ScyllaDB Pipeline
- ORM and Query Building in Rust