One-Step Streaming Migration from DynamoDB into Scylla

DynamoDB Streams

Introduced in 2014, DynamoDB Streams can be enabled on any DynamoDB table to capture item-level modifications in a stream that can be consumed by user applications. Behind the scenes, a Kinesis stream is created into which modification records are written.
For example, consider a DynamoDB table created and stream-enabled using the following commands:

aws dynamodb create-table \
  --table-name migration_test \
  --attribute-definitions AttributeName=id,AttributeType=S AttributeName=version,AttributeType=N \
  --key-schema AttributeName=id,KeyType=HASH AttributeName=version,KeyType=RANGE \
  --provisioned-throughput ReadCapacityUnits=5,WriteCapacityUnits=5

aws dynamodb update-table \
  --table-name migration_test \
  --stream-specification StreamEnabled=true,StreamViewType=NEW_AND_OLD_IMAGES
{
  "TableDescription": {
    ...
    "StreamSpecification": {
      "StreamEnabled": true,
      "StreamViewType": "NEW_AND_OLD_IMAGES"
    }
  }
}
The stream's ARN can then be extracted from the table description and used to inspect the stream's status:

export STREAM_ARN=$(aws dynamodb describe-table --table-name migration_test | jq -r ".Table.LatestStreamArn")

aws dynamodbstreams describe-stream --stream-arn $STREAM_ARN
{
  "StreamDescription": {
    "StreamArn": "arn:aws:dynamodb:eu-west-1:277356164710:table/migration_test/stream/2020-08-19T19:26:06.164",
    "StreamLabel": "2020-08-19T19:26:06.164",
    "StreamStatus": "ENABLING",
    "StreamViewType": "NEW_IMAGE",
    "CreationRequestDateTime": "2020-08-19T22:26:06.161000+03:00",
    "TableName": "migration_test",
    ...
  }
}
Each record on the stream describes a single item-level modification. For example, a MODIFY record carries both the old and new images of the item:

{
  "awsRegion": "us-west-2",
  "dynamodb": {
    "ApproximateCreationDateTime": 1.46480527E9,
    "Keys": {
      "id": {"S": "id1"},
      "version": {"N": "10"}
    },
    "OldImage": {
      "id": {"S": "id1"},
      "version": {"N": "10"},
      "data": {"S": "foo"}
    },
    "NewImage": {
      "id": {"S": "id1"},
      "version": {"N": "10"},
      "data": {"S": "bar"}
    },
    "SequenceNumber": "400000000000000499660",
    "SizeBytes": 41,
    "StreamViewType": "NEW_AND_OLD_IMAGES"
  },
  "eventID": "4b25bd0da9a181a155114127e4837252",
  "eventName": "MODIFY",
  "eventSource": "aws:dynamodb",
  "eventVersion": "1.0"
}
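The item images in these records use DynamoDB's attribute-value encoding, where every value is wrapped in a type tag ("S" for string, "N" for number, and so on). As a minimal sketch (not the migrator's own code, which handles the full type set), here is how such an image can be decoded into plain values:

```python
# Minimal sketch of decoding DynamoDB attribute-value encoding, as seen in
# stream records, into plain Python values. Only the "S" and "N" type tags
# used in the example record above are handled here.
def decode_attr(attr):
    (type_tag, value), = attr.items()
    if type_tag == "S":
        return value  # plain string
    if type_tag == "N":
        # DynamoDB transmits numbers as strings; convert to int or float
        return int(value) if value.lstrip("-").isdigit() else float(value)
    raise ValueError(f"unhandled type tag: {type_tag}")

def decode_image(image):
    return {name: decode_attr(attr) for name, attr in image.items()}

new_image = {"id": {"S": "id1"}, "version": {"N": "10"}, "data": {"S": "bar"}}
print(decode_image(new_image))
# {'id': 'id1', 'version': 10, 'data': 'bar'}
```

In a real consumer you would typically use boto3's `TypeDeserializer` rather than hand-rolling this.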

Scylla Migrator and DynamoDB Streams

The functionality we are introducing today is aimed at helping you perform live migrations of DynamoDB Tables into your Scylla deployment without application downtime. Here’s a sketch of how this works:

  1. The migrator, on start-up, verifies that the target table exists (or creates it with the same schema as the source table) and enables the DynamoDB Stream of the source table. This causes inserts, modifications and deletions to be recorded on the stream;
  2. A snapshot of the source table is transferred from DynamoDB to Scylla;
  3. When the snapshot transfer completes, the migrator starts consuming the DynamoDB Stream and applies every change to the target table. This runs indefinitely until you stop it.

For the migrator to keep up with the source table's write rate, make sure you have:

  • Provisioned read throughput (or auto-scaling) on the source DynamoDB table;
  • Sufficient executors and resources on the Spark cluster;
  • Sufficient resources on the Scylla cluster.
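The change-replication step (3) can be thought of as a pure mapping from stream events to writes against the target table. Here's an illustrative sketch of that mapping (not the migrator's actual implementation): INSERT and MODIFY events become puts of the new image, while REMOVE events become deletes by key.

```python
# Illustrative sketch of step 3's event-to-write mapping (not the migrator's
# actual code): each stream record is turned into a put or delete against the
# target table.
def to_target_op(record):
    body = record["dynamodb"]
    event = record["eventName"]
    if event in ("INSERT", "MODIFY"):
        # write the full new image of the item to the target
        return ("put_item", body["NewImage"])
    if event == "REMOVE":
        # delete by primary key on the target
        return ("delete_item", body["Keys"])
    raise ValueError(f"unexpected event: {event}")

op, payload = to_target_op({
    "eventName": "MODIFY",
    "dynamodb": {"Keys": {"id": {"S": "id1"}},
                 "NewImage": {"id": {"S": "id1"}, "data": {"S": "bar"}}},
})
print(op)
# put_item
```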

Walkthrough

Let’s walk through how this process is configured in the Scylla Migrator. We need to configure both the source and target sections of the configuration file. Here’s an example:

source:
  type: dynamodb
  table: migration_test
  credentials:
    accessKey:
    secretKey:
  region: us-west-2
  scanSegments: 32
  readThroughput: 1
  throughputReadPercent: 1.0
  maxMapTasks: 8

target:
  type: dynamodb
  table: mutator_table
  endpoint:
    host: http://scylla
    port: 8000
  credentials:
    accessKey: empty
    secretKey: empty
  scanSegments: 8
  streamChanges: true
With the configuration in place, we launch the migrator with spark-submit:

spark-submit --class com.scylladb.migrator.Migrator \
  --master spark://spark-master:7077 \
  --conf spark.driver.host=spark-master \
  --conf spark.scylla.config=./config.yaml.scylla \
  scylla-migrator-assembly-0.0.1.jar
The migrator enables the stream, waits for it to become active, and reports when the snapshot transfer completes:

20/08/19 19:26:05 INFO migrator: Source is a Dynamo table and change streaming requested; enabling Dynamo Stream
20/08/19 19:26:06 INFO DynamoUtils: Stream not yet enabled (status ENABLING); waiting for 5 seconds and retrying
20/08/19 19:26:12 INFO DynamoUtils: Stream enabled successfully
20/08/19 19:26:18 INFO migrator: Done transferring table snapshot. Starting to transfer changes
Once changes are being streamed, the migrator periodically prints a summary of the operations it has applied to the target table:

+---------------+-----+
|_dynamo_op_type|count|
+---------------+-----+
|         DELETE|    2|
|         MODIFY|    5|
+---------------+-----+

A Test Run

We’ve tested this new functionality with a load generation tool that repeatedly applies random mutations to a DynamoDB table on a preset number of keys. You may review the tool here: https://github.com/iravid/migrator-dynamo-mutator.
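As a rough sketch of what such a load generator does (the linked tool's actual implementation differs), each iteration picks a random key from a preset key space and applies a random mutation to it:

```python
import random

# Rough sketch of a mutation generator in the spirit of the tool linked above
# (not its actual implementation): pick a random key from a fixed key space
# and a random operation to apply to it.
KEYS = [f"id{i}" for i in range(10)]

def random_mutation(rng):
    key = rng.choice(KEYS)
    op = rng.choice(["put", "update", "delete"])
    if op == "delete":
        return {"op": "delete", "key": key}
    return {"op": op, "key": key, "data": f"value-{rng.randrange(1000)}"}

rng = random.Random(42)
for _ in range(3):
    print(random_mutation(rng))
```

Running the real tool against the source table while the migrator streams changes is a good way to verify that every mutation arrives on the target.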

Summary

We’ve seen in this post how you may transfer the contents of live DynamoDB tables to Scylla’s Alternator interface using the Scylla Migrator. Please give this a try and let us know how it works!
