Project Circe April Update

by Peter Corless

Project Circe is ScyllaDB’s year-long initiative to make Scylla, already the best NoSQL database, even better. For the month of April we are going to take a look inside the organization and code base to see what it takes to bring major new features into a project as dynamic as Scylla. Currently there are nearly a half-million lines of code in the scylladb/scylla repository on Github (482.7k as of this writing). Of those thousands of source lines of code so far are dedicated to the library implementing the new Raft consensus protocol.

Raft and the Logical Clock

While Raft is currently being wired up to work with RPC to permit topology changes (add/remove nodes) the breadth of testing in order to make Raft truly resilient goes far beyond the source lines of code of the database commits themselves. “We added over a hundred test cases,” Kostja noted, “unit tests, functional tests using the concept of nemesis, or failure injections, plus a randomized test inspired by the Jepsen approach to testing.”

If you have a keen eye for scouring Github, you may already have come across the scylla/test/raft subdirectory, which includes another 2,853 source lines of code. The most basic of all tests is fsm_test.cc (finite state machine test), which treats Raft server as a “device under test.” It models specific chains of events which must not lead to protocol failure, e.g. receiving an outdated message from a deposed leader. The next level of testing includes a mock network and mock logical clock that is in replication_test. It allows us to test how different combinations of Raft options, such as pre-voting, non-voting members and gracious leader step down work together over a potentially slow network or non-synchronized clocks In etcd_raft.cc you’ll find a port of etcd Raft implementation unit tests. The team studied lots of Raft implementation and found the etcd testing effort one of the most thorough.

This sort of aggressive testing led to some interesting results. “One test found that our library crashes when the network reorders packets and there is a failure of one of the members. Another crash was when the leader fails while trying to bring on board a new cluster member — so there is a leader change — but it shouldn’t lead to a crash.”

Raft, despite being widely considered a simple protocol, has infinitely many protocol states. To be able to radically increase the amount of states our testing explores, the ScyllaDB engineering team came up with an implementation of a logical clock. “It’s a clock that is ticking with the speed at which the computer executes the test, not the speed of a wall clock.” Using this logical clock the team was able to squeeze a lot of things into a single test that runs at CPU speed — millions of events per second.

In Scylla’s implementation, “Every state machine has [its] own instance of logical clock; this enables tests when different state machines run at different clock speeds.”

Recent Interesting Commits

Here’s a few of the more salient commits mentioned this past month:

Compaction Strategies Reshaped

Better Memory Allocation to Further Reduce Latencies and Stalls

  • More code paths can now work with non-contiguous memory for table columns and intermediate values: comparing values, and the CQL write path. This reduces CPU stalls due to memory allocation when large blobs are present.
  • Continuing on the path of allowing non-contiguous allocations for large blobs, memory linearizations have been removed from Change Data Capture (CDC). This reduces CPU stalls when CDC is used in conjunction with large blobs.

Scylla Monitoring Stack 3.7

Learn More in Scylla University

You can start with an overview of Scylla and then, at your own pace, move up into advanced architectural concepts like consensus protocols, learning how these power user-oriented features like Lightweight Transactions and how they work under the hood.

GET STARTED IN SCYLLA UNIVERSITY

The monstrously-fast NoSQL database.