Building Event Streaming Architectures on Scylla and Confluent with Kafka

Numberly: Combining the Power of Scylla and Kafka

Numberly’s Stack

Numberly’s Scylla + Kafka Use Cases

Use Case #1: Remote State Store for Kafka Pipelining

First Iteration: Beanstalk + RabbitMQ

Second Iteration: Redis + Confluent Kafka

Third Iteration: Scylla + Confluent Kafka

Use Case #1 Takeaways

  • Metrics — “And I say ‘metrics, metrics, metrics!’ Because without metrics there is no successful tuning.”
  • Prometheus client — Othmane recommended the Prometheus client to the audience, describing how it will ease their monitoring. Prometheus is part of the Scylla Monitoring Stack.
  • Tuning — “Tuning is important in such a pipeline. Size the number of partitions regarding your query. That’s important.”
  • Time-to-Recovery — “Mind your time to recovery.”
  • Max Throughput — “Capacity should be able to have at least three times the average of your throughput.”
  • Add Query Caching — “But no more than your query time to maximize consistency.”
  • Shard-Aware Drivers — “Make sure you use a shard-aware client for Scylla.”
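Two of these takeaways translate directly into code. The sketch below is illustrative, not Numberly's implementation: `required_capacity` applies the "at least three times the average throughput" rule, and `BoundedTTLCache` is a hypothetical query cache in front of the remote state store. We read "no more than your query time" as capping how long a cached answer may be served; that interpretation, and every name in the block, is an assumption.

```python
import time
from typing import Any, Callable, Dict, Hashable, Tuple

# Rule of thumb quoted above: provision at least 3x the average throughput.
HEADROOM_FACTOR = 3


def required_capacity(avg_msgs_per_sec: float) -> float:
    """Minimum throughput (msgs/s) the pipeline should be able to absorb."""
    return HEADROOM_FACTOR * avg_msgs_per_sec


class BoundedTTLCache:
    """Hypothetical query cache whose entries expire after `ttl_seconds`.

    Assumption: ttl_seconds is set no higher than the query time window,
    so a cached answer is never served longer than that bound.
    """

    def __init__(self, ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.ttl = ttl_seconds
        self.clock = clock  # injectable clock makes expiry testable
        self._entries: Dict[Hashable, Tuple[float, Any]] = {}

    def get_or_load(self, key: Hashable, loader: Callable[[Hashable], Any]) -> Any:
        now = self.clock()
        cached = self._entries.get(key)
        if cached is not None and now - cached[0] < self.ttl:
            return cached[1]  # fresh hit: skip the round trip to the store
        value = loader(key)   # miss or expired: query the remote state store
        self._entries[key] = (now, value)
        return value
```

Injecting the clock keeps the expiry logic deterministic under test; in production the default monotonic clock avoids surprises from wall-clock adjustments.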

Use Case #2: Synapse

Kafka & Scylla: A Complementary Match

Use Case #2 Takeaways

  • Match Table Models to Queries — “First, define your table models in Scylla to suit your queries. This is really important.”
  • Forecast Data Volume — “Forecast your data volume on your models. Does it suit your volume? Will it suit your use case? Will it work on production? It is important before you go live.”
  • Mind Large Partitions — “Because this can and will damage your cluster performance if you don’t.”
  • Kafka Streams — “Kafka Streams turn out really great for on-the-fly aggregations.”
  • Persist Aggregations to External Data Store — “Those aggregations should be persisted on an external store to give you the ability to go and do multiple time span lookups. Kafka Streams are great, but on real time ‘hot’ data.”
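The last three takeaways fit together: aggregate on the fly, persist the results, and model the persisted table so that no partition grows without bound. The following stdlib-only sketch stands in for a Kafka Streams windowed aggregation and a Scylla table. The one-minute window, the per-day bucket in the partition key, and all names are assumptions chosen for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

WINDOW_SECONDS = 60        # tumbling window size (assumption)
BUCKET_SECONDS = 86_400    # one partition per key per day (assumption)


def window_start(ts: int) -> int:
    return ts - ts % WINDOW_SECONDS


def aggregate(events: Iterable[Tuple[str, int, int]]) -> Dict[Tuple[str, int], int]:
    """On-the-fly sum per (key, window), like a windowed Kafka Streams aggregation."""
    out: Dict[Tuple[str, int], int] = defaultdict(int)
    for key, ts, value in events:
        out[(key, window_start(ts))] += value
    return dict(out)


class WindowStore:
    """Stand-in for a table with PRIMARY KEY ((key, day_bucket), window_start).

    Bucketing by day bounds partition size instead of letting one
    partition per key grow forever.
    """

    def __init__(self) -> None:
        self.partitions: Dict[Tuple[str, int], Dict[int, int]] = defaultdict(dict)

    def persist(self, aggregates: Dict[Tuple[str, int], int]) -> None:
        for (key, ws), total in aggregates.items():
            # Upsert: the latest aggregate for a window replaces the old one.
            self.partitions[(key, ws // BUCKET_SECONDS)][ws] = total

    def range_sum(self, key: str, start: int, end: int) -> int:
        """Sum windows in [start, end), visiting only the day buckets it spans."""
        total = 0
        for bucket in range(start // BUCKET_SECONDS, (end - 1) // BUCKET_SECONDS + 1):
            for ws, v in self.partitions.get((key, bucket), {}).items():
                if start <= ws < end:
                    total += v
        return total
```

Because the lookup is a plain range over persisted windows, the same table answers "last two minutes" and "last two days" alike, which is the multiple-time-span lookup the takeaway describes.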

Scylla’s Change Data Capture (CDC) Implementation

Scylla’s CDC Write Path

  • Row Keys
  • Changes per non-key columns (deltas) — optional
  • Pre-image (prior state data) — optional
  • Post-image (current state of row) — optional
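The list above names the parts a CDC log entry can carry. To make the relationship between them concrete, here is a simplified, hypothetical model (not the actual CDC log table schema): when only the pre-image and the delta were captured, the post-image can be rebuilt by overlaying the delta on the prior state. It deliberately ignores column deletions, collections, and TTLs, which real CDC records also have to express.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional

Row = Dict[str, Any]


@dataclass
class CdcEvent:
    """Simplified view of what one CDC log entry can carry (hypothetical type)."""
    key: Row                          # row keys: always present
    delta: Optional[Row] = None       # changed non-key columns (optional)
    preimage: Optional[Row] = None    # row state before the write (optional)
    postimage: Optional[Row] = None   # row state after the write (optional)


def derive_postimage(event: CdcEvent) -> Optional[Row]:
    """Return the post-image, rebuilding it from pre-image + delta if needed."""
    if event.postimage is not None:
        return event.postimage
    if event.preimage is None or event.delta is None:
        return None  # not enough information captured to rebuild the row
    row = {**event.key, **event.preimage}
    row.update(event.delta)  # changed columns overwrite their prior values
    return row
```

This also shows why each image is optional: storing the post-image costs extra writes, while a consumer that already has the pre-image and delta can often compute it.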

The Power of Confluent and Scylla CDC

Reminder: Kafka is a Log
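As a reminder of what "a log" means here, the toy model below captures the core idea for a single partition: records are appended in order and addressed by offset, and each consumer tracks its own position, so reading never removes data and any consumer can replay from an earlier offset. The class names are hypothetical, and the sketch ignores retention, compaction, replication, and consumer groups.

```python
from typing import Any, List, Tuple


class PartitionLog:
    """Toy model of one Kafka partition: an ordered, append-only sequence."""

    def __init__(self) -> None:
        self._records: List[Any] = []

    def append(self, record: Any) -> int:
        self._records.append(record)
        return len(self._records) - 1  # offset assigned to the new record

    def read(self, offset: int, max_records: int = 100) -> List[Tuple[int, Any]]:
        end = min(offset + max_records, len(self._records))
        return [(o, self._records[o]) for o in range(offset, end)]


class Consumer:
    """Each consumer keeps its own offset; reading does not consume records."""

    def __init__(self, log: PartitionLog, offset: int = 0) -> None:
        self.log = log
        self.offset = offset

    def poll(self, max_records: int = 100) -> List[Any]:
        batch = self.log.read(self.offset, max_records)
        if batch:
            self.offset = batch[-1][0] + 1  # advance past the last record read
        return [record for _, record in batch]
```

Because a CDC source writes change events into such a log, several downstream consumers can process the same changes independently and at their own pace.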

Kafka Connect

Built on Debezium

Configuring Scylla and Kafka Connect

Get Started with Scylla and Confluent Today

  • Get Scylla — If this is the first time you’ve ever tried your hand at Scylla, you have two options. If you’re a self-starter who would love to run it on your own laptop via Docker to learn how it works, you can download Scylla Open Source. Or, if you wish to have professional support and prefer that we set up and manage the cluster for you, feel free to create an account on Scylla Cloud.
  • Get the Confluent Platform — Similarly, you can get started on the Confluent Platform or Confluent Cloud. Kafka Connect is included as a component.
  • Get the Scylla Kafka Connectors — To get these systems to play together, head straight to the Confluent Hub where, depending on your needs, you can download the Scylla Sink Connector, or the Scylla CDC Source Connector for Kafka, or both!
  • Download a Shard-Aware Driver — To noodle around with Scylla itself, outside of the Kafka Connect context, you’ll probably want a Scylla shard-aware driver. Fortunately, we have a number of them, as well as a list of other third-party CQL-compatible drivers.
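What makes a driver "shard-aware" is that it can work out which CPU shard on a node owns a given partition token and send the request straight to that shard's connection. The sketch below follows our understanding of the token-to-shard mapping Scylla describes in its protocol extensions documentation; treat it as illustrative. It assumes the signed 64-bit Murmur3 token has already been computed, and the default of 12 ignored most-significant bits is an assumption (real drivers read the value the node advertises).

```python
MASK64 = (1 << 64) - 1


def shard_of(token: int, nr_shards: int, ignore_msb: int = 12) -> int:
    """Map a signed 64-bit Murmur3 token to the shard (CPU core) that owns it."""
    if not -(1 << 63) <= token < (1 << 63):
        raise ValueError("token must be a signed 64-bit integer")
    if nr_shards <= 0:
        raise ValueError("nr_shards must be positive")
    biased = (token + (1 << 63)) & MASK64     # shift range to unsigned [0, 2^64)
    biased = (biased << ignore_msb) & MASK64  # drop the ignored high bits
    return (biased * nr_shards) >> 64         # scale into [0, nr_shards)
```

For example, with `ignore_msb=0` and 8 shards, the smallest possible token maps to shard 0 and the largest to shard 7. A driver that performs this computation avoids the extra cross-core hop Scylla would otherwise make when a request lands on the wrong shard.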

The monstrously-fast NoSQL database.

ScyllaDB
