Benchmarking MongoDB vs ScyllaDB: Social Media Workload Deep Dive
benchANT’s comparison of ScyllaDB vs MongoDB in terms of throughput, latency, scalability, and cost for a social media workload
By Daniel Seybold, benchANT Co-Founder and CTO
benchANT recently benchmarked the performance and scalability of the market-leading general-purpose NoSQL database MongoDB and its performance-oriented challenger ScyllaDB. You can read a summary of the results in the blog Benchmarking MongoDB vs ScyllaDB: Performance, Scalability & Cost, see the key takeaways for various workloads in this technical summary, and access all results (including the raw data) from the complete benchANT report.
This blog offers a deep dive into the tests performed for the social workload. This workload is based on the YCSB Workload B. It creates a read-heavy workload, with 95% read operations and 5% update operations. We use two shapes of this workload, which differ in terms of the request distribution patterns, namely uniform and hotspot distribution. These workloads are executed against the small database scaling size with a data set of 500GB and against the medium scaling size with a data set of 1TB.
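To make the two workload shapes concrete, here is a minimal Python sketch of a 95/5 read/update mix with uniform and hotspot key selection. It is not the YCSB implementation. The function names and the hotspot parameters (20% of the keys receiving 80% of the requests) are illustrative assumptions, since the hotspot settings used in the benchmark are not stated here.

```python
import random

READ_PROPORTION = 0.95  # from the workload description: 95% reads, 5% updates


def next_operation(rng):
    """Pick 'read' or 'update' according to the 95/5 mix."""
    return "read" if rng.random() < READ_PROPORTION else "update"


def uniform_key(rng, record_count):
    """Uniform distribution: every record is equally likely to be requested."""
    return rng.randrange(record_count)


def hotspot_key(rng, record_count, hot_data_fraction=0.2, hot_op_fraction=0.8):
    """Hotspot distribution: a small 'hot' slice of the key space gets most requests.

    The 0.2 / 0.8 defaults are illustrative assumptions, not benchmark settings.
    """
    hot_count = max(1, int(record_count * hot_data_fraction))
    if hot_count >= record_count or rng.random() < hot_op_fraction:
        return rng.randrange(min(hot_count, record_count))
    return rng.randrange(hot_count, record_count)


rng = random.Random(42)  # fixed seed so the sketch is repeatable
ops = [next_operation(rng) for _ in range(100_000)]
read_share = ops.count("read") / len(ops)

keys = [hotspot_key(rng, 1_000_000) for _ in range(100_000)]
hot_share = sum(k < 200_000 for k in keys) / len(keys)

print(f"read share: {read_share:.3f}, requests hitting the hot set: {hot_share:.3f}")
```

With a hotspot pattern, most requests land on a small slice of the data, which is why a cache can absorb far more of the reads than under the uniform pattern.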
Before we get into the benchmark details, here is a summary of key insights for this workload.
ScyllaDB outperforms MongoDB with higher throughput and lower latency for all measured configurations of the social workload.
- ScyllaDB provides up to 12 times higher throughput
- ScyllaDB provides significantly lower update latencies than MongoDB (up to 47 times lower)
- ScyllaDB provides lower read latencies (up to 5 times lower)
Throughput Results for MongoDB vs ScyllaDB
The throughput results for the social workload with the uniform request distribution show that the small ScyllaDB cluster is able to serve 60 kOps/s with a cluster CPU utilization of ~85% while the small MongoDB cluster serves only 10 kOps/s under a comparable cluster utilization of 80–90%. For the medium cluster sizes, ScyllaDB achieves an average throughput of 232 kOps/s showing ~85% cluster utilization while MongoDB achieves 42 kOps/s at a CPU utilization of ~85%.
The throughput results for the social workload with the hotspot request distribution show a similar trend, but with higher throughput numbers since the data is mostly read from the cache. The small ScyllaDB cluster serves 152 kOps/s while the small MongoDB serves 14 kOps/s. For the medium cluster sizes, ScyllaDB achieves an average throughput of 587 kOps/s and MongoDB achieves 48 kOps/s.
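The throughput ratios behind these figures can be recomputed from the numbers quoted above. The following Python sketch does only that arithmetic. The dictionary layout and function name are my own, and the inputs are the rounded kOps/s values from the text rather than the raw data.

```python
# Average throughput in kOps/s, as quoted in the text (rounded values).
throughput_kops = {
    ("uniform", "small"): {"scylladb": 60, "mongodb": 10},
    ("uniform", "medium"): {"scylladb": 232, "mongodb": 42},
    ("hotspot", "small"): {"scylladb": 152, "mongodb": 14},
    ("hotspot", "medium"): {"scylladb": 587, "mongodb": 48},
}


def throughput_ratio(result):
    """How many times higher the ScyllaDB throughput is than MongoDB's."""
    return result["scylladb"] / result["mongodb"]


ratios = {config: throughput_ratio(result) for config, result in throughput_kops.items()}

for (distribution, size), ratio in ratios.items():
    print(f"{distribution:8s} {size:6s} ScyllaDB/MongoDB throughput: {ratio:.1f}x")
```

The largest ratio comes from the medium cluster under the hotspot distribution, which is where the "up to 12 times higher throughput" figure originates.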
Scalability Results for MongoDB vs ScyllaDB
These results also enable us to compare the theoretical throughput scalability with the actually achieved throughput scalability. For this, we consider a simplified scalability model that focuses on compute resources. It assumes the scalability factor is reflected by the increased compute capacity from the small to medium cluster size. For ScyllaDB, this means we double the cluster size from 3 to 6 nodes and also double the instance size from 8 cores to 16 cores per instance, resulting in a theoretical scalability of 400%. For MongoDB, we move from one replica set of three data nodes to a cluster with three shards and nine data nodes and increase the instance size from 8 cores to 16 cores, resulting in a theoretical scalability factor of 600%.
The ScyllaDB scalability results for the uniform and hotspot distributions both show that ScyllaDB is close to achieving linear scalability by achieving a throughput scalability of 386% (of the theoretically possible 400%).
With MongoDB, the gap between theoretical throughput scalability and the actually achieved throughput scalability is significantly higher. For the uniform distribution, MongoDB achieves a scaling factor of 420% (of the theoretically possible 600%). For the hotspot distribution, we measure 342% (of the theoretically possible 600%).
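The simplified scalability model can be written down in a few lines. This Python sketch uses the node counts, core counts, and rounded throughput values quoted in this post. The function names are hypothetical. Because the inputs are rounded, the computed percentages can differ from the report's figures by about one percentage point.

```python
def theoretical_scalability(nodes_small, nodes_medium, cores_small, cores_medium):
    """Compute-capacity growth from the small to the medium cluster, as a factor."""
    return (nodes_medium / nodes_small) * (cores_medium / cores_small)


def achieved_scalability(throughput_small, throughput_medium):
    """Measured throughput growth from the small to the medium cluster, as a factor."""
    return throughput_medium / throughput_small


# ScyllaDB: 3 -> 6 nodes, 8 -> 16 cores per instance.
scylla_theoretical = theoretical_scalability(3, 6, 8, 16)
# MongoDB: 3 -> 9 data nodes, 8 -> 16 cores per instance.
mongo_theoretical = theoretical_scalability(3, 9, 8, 16)

measurements = {
    ("scylladb", "uniform"): (60, 232, scylla_theoretical),
    ("scylladb", "hotspot"): (152, 587, scylla_theoretical),
    ("mongodb", "uniform"): (10, 42, mongo_theoretical),
    ("mongodb", "hotspot"): (14, 48, mongo_theoretical),
}

efficiency = {}
for key, (small, medium, theoretical) in measurements.items():
    achieved = achieved_scalability(small, medium)
    efficiency[key] = achieved / theoretical
    print(f"{key[0]:8s} {key[1]:8s} achieved {achieved:.0%} of theoretical "
          f"{theoretical:.0%} (efficiency {efficiency[key]:.0%})")
```

Dividing the achieved factor by the theoretical factor gives a single efficiency number per database and distribution, which makes the gap between the two systems easier to compare.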
Throughput per Cost Ratio
In order to compare the costs/month in relation to the provided throughput, we take the MongoDB Atlas throughput/$ as baseline (i.e. 100%) and compare it with the provided ScyllaDB Cloud throughput/$.
The results for the uniform distribution show that ScyllaDB provides five times more operations/$ compared to MongoDB Atlas for the small scaling size and 5.7 times more operations/$ for the medium scaling size.
For the hotspot distribution, the results show an even better throughput/cost ratio for ScyllaDB, providing 9 times more operations/$ for the small scaling size and 12.7 times more for the medium scaling size.
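The normalization used here can be sketched as follows. The function names are hypothetical, and no actual prices appear in this post, so the sketch takes costs as parameters. The second function inverts the relationship to estimate the relative monthly cost implied by the rounded throughput and operations/$ figures. Treat that output as a rough consistency check, not as pricing data.

```python
def relative_ops_per_dollar(throughput, monthly_cost, base_throughput, base_monthly_cost):
    """Throughput/$ of a system as a percentage of the baseline's throughput/$."""
    return 100.0 * (throughput / monthly_cost) / (base_throughput / base_monthly_cost)


def implied_cost_ratio(throughput_ratio, ops_per_dollar_ratio):
    """Relative monthly cost (system / baseline) implied by the two ratios.

    Follows from ops_per_dollar_ratio = throughput_ratio / cost_ratio.
    """
    return throughput_ratio / ops_per_dollar_ratio


# Uniform distribution, using the rounded kOps/s and operations/$ figures from the text.
small_uniform = implied_cost_ratio(60 / 10, 5.0)
medium_uniform = implied_cost_ratio(232 / 42, 5.7)

print(f"implied cost ratio, small:  {small_uniform:.2f}")
print(f"implied cost ratio, medium: {medium_uniform:.2f}")
```

By construction, the baseline evaluates to 100% when compared with itself, which matches how MongoDB Atlas is treated in the comparison.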
Latency Results for MongoDB vs ScyllaDB
For the uniform distribution, ScyllaDB provides stable and low P99 latencies for the read and update operations for the scaling sizes small and medium. MongoDB generally has higher P99 latencies. Here, the read latencies are 2.8 times higher for the small scaling size and 5.5 times higher for the medium scaling size. The update latencies show an even more distinct difference; MongoDB’s P99 update latency in the small scaling size is 47 times higher compared to ScyllaDB and 12 times higher in the medium scaling size.
For the hotspot distribution, the results show a similar trend of stable and low ScyllaDB latencies. For MongoDB, read and update latencies increase from the small to the medium scaling size. Notably, in contrast to the uniform distribution, the read latency increases only by a factor of 2.8, while the update latency increases by 970%.
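For readers less familiar with tail latencies, the following Python sketch shows one common way to compute a P99 value, the nearest-rank method. It is an assumption that this matches how the benchmark tooling derives its percentiles. The latency samples are synthetic and exist only to show why P99 can differ sharply from the mean.

```python
import math


def percentile(samples, pct):
    """Nearest-rank percentile: the smallest sample with at least pct% of samples <= it."""
    if not samples:
        raise ValueError("samples must not be empty")
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


# Synthetic latencies in milliseconds (illustrative only, not benchmark data):
# 98.5% of requests are fast, 1.5% are slow.
latencies_ms = [1.0] * 985 + [20.0] * 15

mean_ms = sum(latencies_ms) / len(latencies_ms)
p50_ms = percentile(latencies_ms, 50)
p99_ms = percentile(latencies_ms, 99)

print(f"mean={mean_ms:.3f} ms, P50={p50_ms} ms, P99={p99_ms} ms")
```

In this synthetic sample, the mean and median stay close to the fast requests while the P99 reflects the slow ones, which is why P99 is the more demanding metric for comparing databases.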
Technical Nugget — Performance Impact of the Data Model
The default YCSB data model is composed of a primary key and a data item with 10 string fields, which results in a document with 10 attributes for MongoDB and a table with 10 columns for ScyllaDB. We analyze how performance changes if a pure key-value data model is applied for both databases: a table with only one column for ScyllaDB and a document with only one field for MongoDB.
The results show that for ScyllaDB the throughput improves by 24% while for MongoDB the throughput increase is only 5%.
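The two record shapes can be illustrated with a short Python sketch. The field names follow the field0 to field9 convention of the default YCSB data model, and the 100-character field length is YCSB's default. The key format and the payload size of the single-field variant are assumptions, since this post does not state whether the total record size was kept constant.

```python
import random
import string


def random_string(rng, length):
    """Random ASCII payload standing in for a YCSB-generated field value."""
    return "".join(rng.choice(string.ascii_letters) for _ in range(length))


def make_record(rng, key, field_count=10, field_length=100):
    """Build one record: a primary key plus field_count string fields."""
    record = {"key": key}
    for i in range(field_count):
        record[f"field{i}"] = random_string(rng, field_length)
    return record


rng = random.Random(7)

# Default model: 10 fields (10 document attributes or 10 table columns).
default_record = make_record(rng, "user1")

# Pure key-value model: a single field. The field length here is an assumption.
key_value_record = make_record(rng, "user1", field_count=1)

print(sorted(default_record))
print(sorted(key_value_record))
```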
Technical Nugget — Performance Impact of the Consistency Level
All standard benchmarks are run with the MongoDB client consistency writeConcern=majority/readPreference=primary and for ScyllaDB with writeConsistency=QUORUM/readConsistency=QUORUM. Besides these client-consistent configurations, we also analyze the performance impact of weaker read consistency settings. For this, we enable MongoDB to read from the secondaries (readPreference=secondaryPreferred) and set readConsistency=ONE for ScyllaDB.
The results show an expected increase in throughput: for ScyllaDB 56% and for MongoDB 49%.
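The trade-off behind the ScyllaDB settings can be illustrated with the usual quorum arithmetic: a read is guaranteed to overlap with the latest acknowledged write when the number of replicas read plus the number of replicas written exceeds the replication factor. The Python sketch below assumes a replication factor of 3, which is not stated in this post, and the function names are hypothetical.

```python
def replicas_required(level, replication_factor):
    """Number of replicas that must respond for a given consistency level."""
    if level == "ONE":
        return 1
    if level == "QUORUM":
        return replication_factor // 2 + 1
    if level == "ALL":
        return replication_factor
    raise ValueError(f"unsupported consistency level: {level}")


def read_overlaps_latest_write(read_level, write_level, replication_factor):
    """True if every read set must intersect every acknowledged write set (R + W > RF)."""
    r = replicas_required(read_level, replication_factor)
    w = replicas_required(write_level, replication_factor)
    return r + w > replication_factor


RF = 3  # assumed replication factor, not taken from the benchmark description

standard = read_overlaps_latest_write("QUORUM", "QUORUM", RF)
weaker = read_overlaps_latest_write("ONE", "QUORUM", RF)

print(f"QUORUM reads / QUORUM writes overlap: {standard}")
print(f"ONE reads / QUORUM writes overlap:    {weaker}")
```

With readConsistency=ONE, each read waits for a single replica instead of two, which is consistent with the throughput gain reported above, at the cost of possibly returning stale data. Reading from MongoDB secondaries involves a comparable trade-off, because secondaries can lag behind the primary.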
Continue Comparing ScyllaDB vs MongoDB
Here are some additional resources for learning about the differences between MongoDB and ScyllaDB:
- Benchmarking MongoDB vs ScyllaDB: Results from benchANT’s complete benchmarking study that comprises 133 performance and scalability measurements that compare MongoDB against ScyllaDB.
- Benchmarking MongoDB vs ScyllaDB: Caching Workload Deep Dive: benchANT’s comparison of ScyllaDB vs MongoDB in terms of throughput, latency, scalability, and cost for a caching workload (50% read operations and 50% update operations).
- Benchmarking MongoDB vs ScyllaDB: IoT Sensor Workload Deep Dive: benchANT’s comparison of ScyllaDB vs MongoDB in terms of throughput, latency, scalability, and cost for a workload simulating an IoT sensor (90% insert operations and 10% read operations).
- A Technical Comparison of MongoDB vs ScyllaDB: benchANT’s technical analysis of how MongoDB and ScyllaDB compare with respect to their features, architectures, performance, and scalability.
- ScyllaDB’s MongoDB vs ScyllaDB page: Features perspectives from users — like Discord — who have moved from MongoDB to ScyllaDB.