Getting the Most out of Lightweight Transactions in Scylla

What is a Lightweight Transaction?

A Practical Example: An LWT Banking App

CREATE TABLE lightest.accounts (
bic TEXT, -- bank identifier code
ban TEXT, -- bank account number within the bank
balance DECIMAL, -- account balance
pending_transfer UUID, -- will be used later
pending_amount DECIMAL, -- will be used later
PRIMARY KEY((bic, ban))
)
INSERT INTO accounts (bic, ban, balance, pending_amount)
VALUES (?, ?, ?, 0) IF NOT EXISTS
$ ./lightest pop --host db1 -n 1000000
Creating 1000000 accounts using 384 workers on 96 cores
0.76% done, RPS 7577, Latency min/max/med: 0.002s/0.141s/0.013s
3.11% done, RPS 23562, Latency min/max/med: 0.002s/0.119s/0.010s

99.98% done, RPS 676, Latency min/max/med: 0.002s/0.026s/0.002s
100.00% done, RPS 174, Latency min/max/med: 0.002s/0.003s/0.002s
Inserted 1000000 accounts, 13 errors, 1637283 duplicates
Total time: 122.042s, 8193 inserts/sec
$ ./lightest pop --host db1 -n 1000000 -w 2000
Creating 1000000 accounts using 2000 workers on 96 cores
0.74% done, RPS 7401, Latency min/max/med: 0.002s/0.186s/0.026s
7.30% done, RPS 65647, Latency min/max/med: 0.003s/0.150s/0.024s
...

Dealing with Failure

cqlsh:lightest> INSERT INTO accounts (bic, ban, balance, pending_transfer) VALUES (...) IF NOT EXISTS; [applied] | bic | ban | balance | pending_amount | pending_transfer
-----------+-----+-----+---------+----------------+------------------
False | ... | ... | 0 | null | null
$ ./lightest pop --host db1 -n 3000000 -c QUORUM -w 8000
Creating 3000000 accounts using 8000 workers on 96 cores
1.70% done, RPS 51048, Latency min/max/med: 0.000s/0.415s/0.002s
11.30% done, RPS 287915, Latency min/max/med: 0.000s/0.045s/0.001s
..
Inserted 3000000 accounts, 0 errors, 0 duplicates
Total time: 19.766s, 151774 t/sec
Latency min/max/avg: 0.000s/0.415s/0.002s
Latency 95/99/99.9%: 0.007s/0.025s/0.180s
  • Step 1, the prepare phase, is necessary to ensure there is only one coordinator making the change. We could avoid selecting a coordinator independently for each transaction. However, it would require maintaining a record of the current leader for each partition key, which is a lot of additional metadata. On the positive side, Scylla LWT availability does not depend on a single leader, and node failures have low impact on cluster availability.
  • Step 2, storing the interim result in system.paxos is necessary to work well with eventually consistent reads and writes. Scylla can’t assume people will not be using QUORUM reads and writes of the same data along with LWT, even though it’s not recommended. So a transaction should not be visible for these until it’s committed, and this is why it’s first made durable in system.paxos. We could have avoided this step by not permitting eventually consistent and conditional writes of the same data. That would break compatibility with Cassandra, so we decided not to do it. Cassandra perhaps followed the same reasoning originally and wanted to simplify adoption of lightweight transactions in existing eventually consistent apps.
  • Step 3 moves the data from system.paxos to the base table after a transaction is known to have succeeded. This one can’t be avoided 🙂
  • Step 4, the pruning step, is distinct to Scylla. We’ve heard a lot of complaints from Cassandra users that system.paxos can grow out of control. There are no nodetool operations for repairing or pruning LWT state, so at least we make an effort to keep it small. Besides, keeping system.paxos small reduces its write amplification during compaction, and hence speeds up LWT performance quite a bit.
  • never lead to an overdraft; the client should get an error about possible overdraft
  • change source and destination accounts atomically
  • be safe to re-try, for example, re-trying should be a no-op if the transfer is already done.
BEGIN
UPDATE accounts
WHERE bic=$bic AND ban=$src_ban
SET balance = $balance - $amount
UPDATE accounts
WHERE bic=$bic AND ban=$dst_ban
SET balance = balance + $amount
COMMIT (*)

Adding transfer history

CREATE TABLE lightest.transfers (
transfer_id UUID, -- transfer UUID
src_bic TEXT, -- source bank identification code
src_ban TEXT, -- source bank account number
dst_bic TEXT, -- destination bank identification code
dst_ban TEXT, -- destination bank account number
amount DECIMAL, -- transfer amount
state TEXT, -- 'new', 'locked', 'complete'
client_id UUID, -- the client performing the transfer
PRIMARY KEY (transfer_id)
)
INSERT INTO transfers
(transfer_id, src_bic, src_ban, dst_bic, dst_ban, amount, state)
VALUES (?, ?, ?, ?, ?, ?, 'new')
IF NOT EXISTS</code
UPDATE transfers USING TTL 30
SET client_id = ?
WHERE transfer_id = ?
IF amount != NULL AND client_id = NULL</code
UPDATE accounts
SET pending_transfer = ?, pending_amount = ?
WHERE bic = ? AND ban = ?
IF balance != NULL AND pending_amount != NULL AND pending_transfer = NULL
  • It ensures the account is not accidentally created if it doesn’t exist
  • In Scylla, it returns back the previous balance and pending amount, which saves one query:
cqlsh:lightest> UPDATE accounts
SET pending_transfer = b22cfef0-9078-11ea-bda5-b306a8f6411c,
pending_amount = -24.12
WHERE bic = 'DCCDIN51' AND ban = '30000000000000'
IF balance != NULL AND pending_amount != NULL
AND pending_transfer = NULL;
[applied] | balance | pending_amount | pending_transfer
----------+---------+----------------+------------------
True | 42716 | 0 | null
cqlsh:lightest> UPDATE accounts
SET pending_transfer = b22cfef0-9078-11ea-bda5-b306a8f6411c,
pending_amount = -24.12
WHERE bic = 'DCCDIN51' AND ban = '30000000000000'
IF balance != NULL AND pending_amount != NULL
AND pending_transfer = NULL;
[applied] | balance | pending_amount | pending_transfer
----------+---------+----------------+-------------------------------------
False | 42716 | -24.12 | b22cfef0-9078-11ea-bda5-b306a8f6411c
UPDATE transfers
SET state = 'locked'
WHERE transfer_id = ?
IF amount != NULL AND client_id = ?
-- run for both debit and credit accounts
UPDATE accounts
SET pending_amount = 0, balance = ?
WHERE bic = ? AND ban = ?
IF balance != NULL AND pending_transfer = ?
UPDATE transfers
SET state = ‘complete’
WHERE transfer_id = ?
IF amount != NULL AND client_id = ?
UPDATE accounts
SET pending_transfer = NULL, pending_amount = 0
WHERE bic = ? AND ban = ?
IF balance != NULL AND pending_transfer = ?

Error handling

CREATE MATERIALIZED VIEW orphaned_transfers AS
SELECT * FROM transfers WHERE client_id=null
PRIMARY KEY (id);
  • An additional challenge: there is a very improbable race if a transfer takes longer than 30 seconds. Drop me a message to kostja@scylladb.com if you guess what it is.
$ ./lightest pay -n 1000000 -w 200 --host db1
Total time: 384.056s, 2515 t/sec
Latency min/max/avg: 0.018s/46.906s/0.077s
Latency 95/99/99.9%: 0.120s/0.166s/0.221s
Errors: 0, Retries: 650, Recoveries: 274,
Not found: 41889, Overdraft: 23086
$ ./lightest pay -n 1000000 -w 4000 --host db1 --zipfian
Total time: 1443.835s, 693 t/sec
Latency min/max/avg: 0.009s/1327.792s/3.378s
Latency 95/99/99.9%: 1.072s/64.676s/551.144s
Errors: 0, Retries: 879726, Recoveries: 11150,
Not found: 178461, Overdraft: 19131

Conclusion

  • Consistency of SELECT statements
  • Excessive contention issues and retries
  • Uncontrolled growth of the Paxos table.

--

--

--

The monstrously-fast NoSQL database.

Love podcasts or audiobooks? Learn on the go with our new app.

Recommended from Medium

Is it easy or difficult to become a Web Developer?

The co-evolution of realtime and back office data stores — the two in a tango

Https, email ready WordPress site on Google Cloud Platform

Auditing Changes in Security Configuration with the Snowflake Grant Report Tool

HTTPs on CloudFront using Certificate Manager and aws-cdk

Building dynamic queries with Ecto

Comprehensive Guide to Mongo DB : Part 2

Schedule Backups for MySQL and MariaDB Databases

Get the Medium app

A button that says 'Download on the App Store', and if clicked it will lead you to the iOS App store
A button that says 'Get it on, Google Play', and if clicked it will lead you to the Google Play store
ScyllaDB

ScyllaDB

The monstrously-fast NoSQL database.

More from Medium

Using the TiDB Upgrade Toolkit to Guarantee a Safe Database Upgrade

Data-Intensive Applications at Scale: What’s on the Agenda for Scylla Summit

Creating an Enterprise Product: Some Learnings

Druid complex Lookup using other dimensions, with native query or Golang