It’s becoming a more frequent question in IT departments: Should we migrate from Apache Cassandra to Amazon DynamoDB? It’s not a decision to be taken lightly. Making a move isn’t simply a “lift-and-shift” endeavor.
Migration by itself can be challenging and time-consuming. Migrating between NoSQL databases such as Cassandra and DynamoDB can be even more difficult because of the scale of the data and the rate of change. As such, it requires careful consideration of the advantages and disadvantages.
Cassandra vs. DynamoDB
Cassandra is an open-source, wide-column database. Among its benefits are fast writes and reads. It also offers high availability, cross-data-center replication, linear scalability, and high performance.
Among the downsides of Cassandra is that its architecture requires significant operational overhead. It can also be difficult and expensive to find IT professionals with the necessary expertise.
DynamoDB, a fully managed AWS database service, is a key-value and document-oriented store. There’s no hardware provisioning, setup, or configuration required. Amazon takes care of resiliency with a multi-AZ setup by default, so there are no worries about the durability of the data. All writes are synchronously written to multiple Availability Zones (AZs) and asynchronously replicated to one more. Performance is also designed to stay consistent as data volume increases. Amazon divides data automatically into partitions and spreads the table’s capacity across them.
Migrating from any other datastore to DynamoDB does require an understanding of the corresponding DynamoDB capacity requirements. Amazon offers a simple capacity planning model, based on per-partition limits for read capacity units (RCUs), write capacity units (WCUs), and storage, which makes capacity planning easy.
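As a rough illustration of that planning model, the sketch below estimates the capacity units a workload would need and a lower bound on the number of partitions. It assumes the documented unit sizes (one RCU covers one strongly consistent read per second of an item up to 4 KB, one WCU covers one write per second of an item up to 1 KB) and the commonly cited per-partition limits of 3,000 RCUs, 1,000 WCUs, and 10 GB. The function names are hypothetical, and because DynamoDB manages partitioning internally, the partition figure is only an estimate.

```python
import math

# Documented unit sizes and commonly cited per-partition limits (assumptions
# for this sketch; check current AWS documentation before relying on them).
RCU_ITEM_BYTES = 4 * 1024            # one RCU: one strongly consistent read/s, item <= 4 KB
WCU_ITEM_BYTES = 1 * 1024            # one WCU: one write/s, item <= 1 KB
PARTITION_MAX_RCU = 3000
PARTITION_MAX_WCU = 1000
PARTITION_MAX_BYTES = 10 * 1024**3   # 10 GB


def required_rcu(reads_per_sec, item_bytes, strongly_consistent=False):
    """Read capacity units needed for a steady read rate."""
    units_per_read = math.ceil(item_bytes / RCU_ITEM_BYTES)
    total = reads_per_sec * units_per_read
    # Eventually consistent reads cost half as much as strongly consistent ones.
    return total if strongly_consistent else math.ceil(total / 2)


def required_wcu(writes_per_sec, item_bytes):
    """Write capacity units needed for a steady write rate."""
    return writes_per_sec * math.ceil(item_bytes / WCU_ITEM_BYTES)


def min_partitions(rcu, wcu, table_bytes):
    """Lower bound on partitions implied by the per-partition limits."""
    return max(
        1,
        math.ceil(rcu / PARTITION_MAX_RCU),
        math.ceil(wcu / PARTITION_MAX_WCU),
        math.ceil(table_bytes / PARTITION_MAX_BYTES),
    )
```

For a hypothetical workload, calling `required_wcu(writes_per_sec, item_bytes)` and `required_rcu(reads_per_sec, item_bytes)` and passing the results to `min_partitions` along with the expected table size gives a first-pass figure to compare against provisioned or on-demand pricing.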
DynamoDB is also closely integrated with other AWS services, including Lambda and SQS, which makes it easier to move to a serverless architecture.
The serverless provisioning model of DynamoDB also eliminates the need to overprovision database infrastructure and requires no specialized resourcing or licensing.
According to Amazon, some of its customers report that DynamoDB-backed applications run with total cost of ownership savings of as much as 70% when compared to Cassandra.
Real-World Cassandra to DynamoDB Migration
So, what happens when a company decides to make a move from Cassandra to DynamoDB? At ClearScale, we recently had a customer who was interested in doing just that.
The company had been storing and processing large amounts of data on AWS EC2-hosted Cassandra NoSQL clusters. Its largest cluster was handling more than 15 billion reads and writes per day, with write peaks of 300,000 per second. The company wanted to offload the management of its clusters.
One of ClearScale’s tasks was to develop the architecture design and work plan for a migration to DynamoDB. It would then be implemented as a proof of concept to help the company determine if the migration was the right way to go. The plan outlined three stages.
In the first, the goal was to move the required data to DynamoDB. The data had a well-defined time to live (TTL) of 60 days, so it was sufficient to set up a pipeline writing to DynamoDB that would run in parallel to the existing pipeline. Once the new pipeline had run for 60 days, every record written before it started would have expired, so the data in the source (Cassandra) and target (DynamoDB) data stores would be the same. The dual pipelines would continue operating to maintain ongoing replication between the datasets.
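One way to picture this first stage is the minimal sketch below. It is not the customer's pipeline: the writer callables and the `expires_at` attribute name are hypothetical, and error handling and retries are left out. It relies on the fact that DynamoDB's TTL feature reads an expiry time from a numeric attribute holding epoch seconds.

```python
import time
from typing import Any, Callable, Dict, Optional

Record = Dict[str, Any]

# 60-day retention window described in the article, in seconds.
TTL_SECONDS = 60 * 24 * 60 * 60


def with_expiry(record: Record, now: Optional[float] = None) -> Record:
    """Copy the record and add a TTL attribute in epoch seconds.

    `expires_at` is a hypothetical attribute name; DynamoDB TTL must be
    enabled on whichever attribute the table actually uses.
    """
    now = time.time() if now is None else now
    item = dict(record)
    item["expires_at"] = int(now) + TTL_SECONDS
    return item


def dual_write(
    record: Record,
    write_cassandra: Callable[[Record], None],
    write_dynamodb: Callable[[Record], None],
    now: Optional[float] = None,
) -> None:
    """Send one record down both pipelines."""
    write_cassandra(record)                   # existing pipeline, unchanged
    write_dynamodb(with_expiry(record, now))  # new parallel pipeline


def stores_converged(pipeline_started_at: float, now: float) -> bool:
    """True once everything written before the dual pipeline began has expired."""
    return now - pipeline_started_at >= TTL_SECONDS
```

In a real deployment, `write_dynamodb` might wrap a call such as boto3's `Table.put_item(Item=item)`, and the two writers would more likely be independent consumers of the same stream than two calls in one function.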
The Next Phase
In the next stage, the goal was to validate that the data in DynamoDB was consistent with the original data in Cassandra. To do that, ClearScale conducted:
- Identity and correctness testing. This entailed using a special system to ensure the data written to Cassandra and DynamoDB was consistent, although it wouldn’t be identical due to formatting and other issues unique to each database.
- Load testing. Clustered synthetic load testing was used to prove the hypothesis that DynamoDB would sustain 75,000 reads per second (RPS) at steady state and 250,000 RPS at peak.
- Cost testing. The underlying idea was to show that DynamoDB would cost much less than Cassandra, in terms of both human resources and management costs.
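The identity and correctness check can be sketched roughly as follows. This is an illustrative assumption about how such a comparison might work, not a description of the system ClearScale used: records from both stores are normalized to a common form (numbers to `Decimal`, UUIDs to strings, timestamps to epoch seconds) and then compared by key. The function names, the `id` key, and the choice to treat naive timestamps as UTC are all hypothetical.

```python
from datetime import datetime, timezone
from decimal import Decimal
from uuid import UUID


def normalize(value):
    """Map store-specific types onto one comparable representation."""
    if isinstance(value, bool):
        return value
    if isinstance(value, (int, float, Decimal)):
        return Decimal(str(value))
    if isinstance(value, UUID):
        return str(value)
    if isinstance(value, datetime):
        if value.tzinfo is None:
            # Assumption: naive timestamps are UTC.
            value = value.replace(tzinfo=timezone.utc)
        return Decimal(int(value.timestamp()))
    if isinstance(value, dict):
        return {k: normalize(v) for k, v in value.items()}
    if isinstance(value, (list, tuple)):
        return [normalize(v) for v in value]
    return value


def find_mismatches(cassandra_rows, dynamodb_items, key="id", ignore=()):
    """Report keys that are missing, extra, or different in DynamoDB."""
    def index(records):
        return {
            normalize(r[key]): normalize(
                {k: v for k, v in r.items() if k not in ignore}
            )
            for r in records
        }

    source, target = index(cassandra_rows), index(dynamodb_items)
    shared = source.keys() & target.keys()
    return {
        "missing": sorted(source.keys() - target.keys()),
        "extra": sorted(target.keys() - source.keys()),
        "different": sorted(k for k in shared if source[k] != target[k]),
    }
```

The `ignore` parameter is there for attributes that exist in only one store by design, such as a TTL attribute on the DynamoDB side.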
Throughout the process, the Cassandra source databases remained completely functional. Once the data in DynamoDB was validated, the next step was to cut over from Cassandra to DynamoDB. This required modifying the readers to access data only from DynamoDB and decommissioning the Cassandra writers and nodes. There was minimal downtime resulting from that cutover.
The Proof is in the Results
The project demonstrated that both the on-demand and provisioned capacity modes of DynamoDB were less costly than maintaining EC2 fleets for Cassandra. Administration and infrastructure costs were reduced. The new solution also proved to be resilient, reliable, scalable, secure, and fault-tolerant. For this client, the proof of concept showed that the move to DynamoDB was the right choice.
To determine if a migration — or another solution — is right for your company, contact ClearScale. We can help you develop, implement, and evaluate solutions to meet your specific needs.