What Do You Mean by a “Distributed Database?”

Defining Distributed Databases

Just as there’s no ANSI or ISO or IETF or W3C definition of what a NoSQL database is, there’s no standard or protocol or consensus on what a “distributed database” is.

Clustering and Distribution

First, let’s just vaguely accept the concept of “database” as a given — even though it can mean many things to many people — and focus instead on the word “distributed.”

Node Roles, High Availability & Failover Strategies

Next, you have the role of the nodes in your database. Are they all peers, each capable of full writes, or are any of them designated as leaders or primaries with others designated as read-only replicas?

Data Replication and Sharding

Now let’s look at these replication and sharding strategies. Like I said, you can make each node a full replica of the entire database. You could have, aay, three full sets of data on three different servers, or you can distribute different pieces across multiple servers, sharded somewhat differently on each server, so that it’s more difficult to lose any one piece of data even if two or more servers in a cluster dies.

Consistency Levels

Consistency levels determine how much you need each of those replicas to be in sync before you allow a read or write to complete. Let’s say that your data is replicated three ways, and you want to make sure that all three are fully in sync at all times. You want a fully transactional strong consistency guarantee. These are common in SQL RDBMS, where you use strong consistency with ACID guarantees.

Manual Sharding vs. Auto-Sharding

Next, for horizontal scalability, how does your system decide how to shard data across nodes? At first that was always a manual process, difficult and problematic to manage. So distributed databases implemented algorithms to automagically shard your data across your nodes. While that is far more prevalent these days, there’s still some distributed databases that haven’t solved for how, specifically, to auto-shard or make auto-sharding an advanced feature you don’t get out of the box.

Topology Awareness

Finally — and this is important for high availability — distributed databases need to understand their own physical deployments. Let’s say you have a local cluster, but it’s all on the same rack in the datacenter. Then, somehow, power is knocked out to it. Whoops! Your whole system is down.

Next Steps to Learn More about Distributed Databases

Now that we’ve set out a baseline to understand what a distributed database is, you can watch the full webinar to see how this definition applies to various distributed databases, both popular SQL and NoSQL systems, and how they each implement these concepts.

--

--

Get the Medium app

A button that says 'Download on the App Store', and if clicked it will lead you to the iOS App store
A button that says 'Get it on, Google Play', and if clicked it will lead you to the Google Play store
ScyllaDB

ScyllaDB

The monstrously fast and scalable NoSQL database.