Apache Cassandra 4.0 vs. Scylla 4.4: Comparing Performance

TL;DR Scylla Open Source 4.4 vs. Cassandra 4.0 Results

The detailed results and the fully optimized setup instructions are shared below. We compared two deployment options in the AWS EC2 environment:

  1. The first is an apples-to-apples comparison of 3-node clusters.
  2. The second is a larger-scale setup where we used node sizes optimal for each database. Scylla can utilize very large nodes so we compared a setup of 4 i3.metal machines (288 vCPUs in total) vs. 40 (!) i3.4xlarge Cassandra machines (640 vCPUs in total — almost 2.5x the Scylla’s resources).
  • Cassandra 4.0 has better P99 latency than Cassandra 3.11 by 100x!
  • Cassandra 4.0 speeds up admin operations by up to 34% compared to Cassandra 3.11
  • Scylla has 2x-5x better throughput than Cassandra 4.0 on the same 3-node cluster
  • Scylla has 3x-8x better throughput than Cassandra 4.0 on the same 3-node cluster while P99 <10ms
  • Scylla adds a node 3x faster than Cassandra 4.0
  • Scylla replaces a node 4x faster than Cassandra 4.0
  • Scylla doubles a 3-node cluster capacity 2.5x faster than Cassandra 4.0
  • A 40 TB cluster is 2.5x cheaper with Scylla while providing 42% more throughput under P99 latency of 10 ms
  • Scylla adds 25% capacity to a 40 TB optimized cluster 11x faster than Cassandra 4.0.
  • Scylla finishes compaction 32x faster than Cassandra 4.0
  • Cassandra 4.0 can achieve a better latency with 40 i3.4xlarge nodes than 4 i3.metal Scylla nodes when the throughput is low and the cluster is being underutilized. Explanation below.

Limitations of Our Testing

It’s important to note that this basic Apache Cassandra 4.0 performance analysis does not cover all factors in deciding whether to stay put on Cassandra 3.x, upgrade to Cassandra 4.0, or to migrate to Scylla Open Source 4.4. Users may be wondering if the new features of Cassandra 4.0 are compelling enough, or how changes between implemented features compare between Cassandra and Scylla. For instance, you can read more about the difference in CDC implementations here, and how Scylla’s Lightweight Transactions (LWT) differ from Cassandra’s here. Apart from comparison of basic administrative tasks like adding one or more nodes which is covered below, benchmarking implementation of specific features is beyond the scope of consideration.

Cluster of Three i3.4xlarge Nodes

3-Node Test Setup

The purpose of this test was to compare the performance of Scylla vs. both versions of Cassandra on the exact same hardware. We wanted to use relatively typical current generation servers on AWS so that others could replicate our tests, and reflect a real-world setup.

Throughput and Latencies

The actual benchmarking is a series of simple invocations of cassandra-stress with CL=QUORUM. For 30 minutes we keep firing 10,000 requests per second and monitor the latencies. Then we increase the request rate by another 10,000 for another 30 min, and so on. (20,000 in case of larger throughputs). The procedure repeats until the DB is no longer capable of withstanding the traffic, i.e. until cassandra-stress cannot achieve the desired throughput or until the 90-percentile latencies exceed 1 second.

  1. “Real-life” (Gaussian) distribution, with sensible cache-hit ratios of 30–60%
  2. Uniform distribution, with a close-to-zero cache hit ratio
  3. “In-memory” distribution, expected to yield almost 100% cache hits
  • 100% writes
  • 100% reads
  • 50% writes and 50% reads

“Real-life” (Gaussian) Distribution

In this scenario we issue queries that touch partitions randomly drawn from a narrow Gaussian distribution. We make an Ansatz about the bell curve: we assume that its six-sigma spans the RAM of the cluster (corrected for the replication factor). The purpose of this experiment is to model a realistic workload, with a substantial cache hit ratio but less than 100%, because most of our users observe the figures of 60–90%. We can expect Cassandra to perform well in this scenario because its key cache is denser than Scylla’s, i.e. it efficiently stores data in RAM, though it relies on SSTables stored in the OS page cache which can be heavyweight to look up. By comparison, Scylla uses a row-based cache mechanism. This Gaussian distribution test should indicate which uses the more efficient caching mechanism for reads.

Mixed Workload — 50% reads and 50% writes

Uniform Distribution (disk-intensive, low cache hit ratio)

In this scenario we issue queries that touch random partitions of the entire dataset. In our setup this should result in high disk traffic and/or negligible cache hit rates, i.e. that of a few %.

Writes Workload — Only Writes

Reads Workload — Only Reads

Mixed Workload — 50% reads and 50% writes

Uniform Distribution (memory-intensive, high cache hit ratio)

In this scenario we issue queries touching random partitions from a small subset of the dataset, specifically: one that fits into RAM. To be sure that our subset resides in cache and thus no disk IO is triggered, we choose it to be safely small, at an arbitrarily picked value of 60 GB. The goal here is to evaluate both databases at the other extreme end: where they both serve as pure in-memory datastores.

Writes Workload — Only Writes

Reads Workload — Only Reads

Mixed Workload — 50% reads and 50% writes

Adding Nodes

One New Node

In this benchmark, we measured how long it takes to add a new node to the cluster. The reported times are the intervals between starting a Scylla/Cassandra node and having it fully finished bootstrapping (CQL port open).

Doubling Cluster Size

In this benchmark, we measured how long it takes to double the cluster node count, going from 3 nodes to 6 nodes. Three new nodes are added sequentially, i.e. waiting for the previous one to fully bootstrap before starting the next one. The reported time spans from the instant the startup of the first new node is initiated, all the way until the bootstrap of the third new node finishes.

Replace node

In this benchmark, we measured how long it took to replace a single node. One of the nodes is brought down and another one is started in its place. Throughout this process the cluster is being agitated by a mixed R/W background load of 25,000 ops at CL=QUORUM.

Major Compaction

In this benchmark, we measured how long it takes to perform a major compaction on a single node loaded with roughly 1TB of data. Thanks to Scylla’s sharded architecture, it can perform the major compactions on each shard concurrently, while Cassandra is single-thread bound. The result of major compaction is the same in both Scylla and Cassandra: a read is served by a single SSTable. In the later section of this blogpost we also measure the speed of a major compaction in a case where there are many small Cassandra nodes (which get higher parallelism). We observed worse major compaction performance in Cassandra 4.0.0 with the default num_tokens: 16 parameter.

“4 vs. 40” Benchmark

Now let us compare both databases installed on different hardware, where Scylla gets four powerful 72-core servers, meanwhile Cassandra gets the same i3.4xlarge servers as before, just… forty of them. Why would anyone ever consider such a test? After all, we’re comparing some 4 machines to 40 very different machines. In terms of CPU count, RAM volume or cluster topology these both are like apples and oranges, no?

4 vs. 40 Node Setup

We set up clusters on Amazon EC2 in a single Availability Zone within us-east-2 datacenter, but this time the Scylla cluster consists of 4 i3.metal VMs. The competing Cassandra cluster consisted of 40 i3.4xlarge VMs. Servers are initialized with clean machine images (AMIs) of Ubuntu 20.04 (Cassandra 4.0) or CentOS 7.9 (Scylla 4.4).

Throughput and Latencies

Mixed Workload — 50% reads and 50% writes

  • 4-node Scylla cluster (4 x i3.metal, 288 vCPUs in total)
  • 40-node Cassandra cluster (40 x i3.4xlarge, 640 vCPUs in total).
  • 4-node Scylla cluster (4 x i3.metal, 288 vCPUs in total)
  • 40-node Cassandra cluster (40 x i3.4xlarge, 640 vCPUs in total).

Scaling the cluster up by 25%

In this benchmark, we increase the capacity of the cluster by 25%:

  • By adding a single Scylla node to the cluster (from 4 nodes to 5)
  • By adding 10 Cassandra nodes to the cluster (from 40 nodes to 50 nodes)

Major Compaction

In this benchmark we measure the throughput of a major compaction. To compensate for Cassandra having 10 times more nodes (each having 1/10th of the data), this benchmark measures throughput of a single Scylla node performing major compaction and the collective throughput of 10 Cassandra nodes performing major compactions concurrently.


On identical hardware, Scylla Open Source 4.4.3 withstood up to 5x greater traffic and in almost every tested scenario offered lower latencies than Apache Cassandra 4.0.

Supplementary Information

Here you can check out detailed results of latency/throughput benchmarks, JVM settings and cassandra.yaml from Cassandra 3 and Cassandra 4, as well as cassandra-stress invocations used to run benchmarks. Scylla used default configuration.

Cassandra 3.11 configuration

JVM settingsJVM version: OpenJDK 8
cassandra.yamlOnly settings changed from the default configuration are mentioned here.

Cassandra 4.0 configuration

JVM settings
JVM version: OpenJDK 16

Cassandra-stress parameters

Only the important facts and options are mentioned below.

  • Scylla’s Shard-aware Java driver was used.
  • Background loads were executed in the loop (so duration=5m is not a problem).
  • REPLICATION_FACTOR is 3 (except for major compaction benchmark).
  • COMPACTION_STRATEGY is SizeTieredCompactionStrategy unless stated otherwise.
  • loadgenerator_count is the number of generator machines (3 for “3 vs 3” benchmarks, 15 for “4 vs 40”).
  • BACKGROUND_LOAD_OPS is 1000 in major compaction, 25000 in other benchmarks.
  • DURATION_MINUTES is 10 for in-memory benchmarks, 30 for other benchmarks.



Get the Medium app

A button that says 'Download on the App Store', and if clicked it will lead you to the iOS App store
A button that says 'Get it on, Google Play', and if clicked it will lead you to the Google Play store


The monstrously fast and scalable NoSQL database.