An 8 Step Process for Database Migration to AWS PostgreSQL
Dec 6, 2022
By Artem Koval, Director of Data Analytics, ClearScale
There are a lot of good reasons organizations want to modernize and migrate their Oracle or Microsoft SQL Server (MS SQL) databases to AWS (AWS PostgreSQL in particular). Making the business case for doing so isn’t difficult.
The issues often lie in the actual migration itself. It’s not a simple process ─ migrating technologies never is. In the case of moving from MS SQL or Oracle to AWS PostgreSQL or another open-source relational database (RDB), things like data type conversions and differences in procedural languages (T-SQL for MS SQL, PL/SQL for Oracle) can create challenges.
But it’s not an impossible task, particularly since AWS offers a variety of tools to expedite the various steps. The following provides an overview of how a typical database modernization/migration project could work using two specific AWS tools:
- AWS Schema Conversion Tool (AWS SCT), a desktop tool that automates the conversion of database schemas and objects from a source database engine to a different Amazon RDS target engine.
- AWS Database Migration Service (AWS DMS), a service for migrating data to and from AWS database targets.
Database Migration to AWS PostgreSQL Process
While every database modernization/migration project is different, the following eight steps are typical for most of them. The tips included can help make the overall process go more smoothly.
Step 1: Migration Assessment
The first step entails reviewing the architecture of the existing application and then producing an assessment report that includes a network diagram with all the application layers. The report identifies the application and database components that can’t be migrated automatically and estimates the effort required for the manual conversion work.
Step 2: Schema Conversion
Next up is schema conversion. This consists of translating the data definition language (DDL) for tables, partitions, and other database storage objects from the syntax and features of the source database to the syntax and features of the target database. It’s a two-step process that entails converting the schema and then applying the schema to the target database.
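To give a sense of what this translation involves, here is a deliberately simplified sketch of the data type mapping portion of schema conversion. The handful of mappings shown are illustrative only; AWS SCT handles far more types and engine-specific features automatically.

```python
# Simplified illustration of the type mapping involved in schema conversion.
# This small, hand-picked subset of MS SQL -> PostgreSQL mappings is for
# illustration; AWS SCT covers many more types and features.

MSSQL_TO_POSTGRES = {
    "NVARCHAR": "VARCHAR",
    "DATETIME": "TIMESTAMP",
    "BIT": "BOOLEAN",
    "MONEY": "NUMERIC(19,4)",
    "UNIQUEIDENTIFIER": "UUID",
}

def convert_column_type(mssql_type: str) -> str:
    """Translate an MS SQL column type to a PostgreSQL equivalent."""
    base = mssql_type.split("(")[0].upper()
    suffix = mssql_type[len(base):]  # preserve any length spec, e.g. (255)
    return MSSQL_TO_POSTGRES.get(base, base) + suffix

print(convert_column_type("NVARCHAR(255)"))  # VARCHAR(255)
print(convert_column_type("DATETIME"))       # TIMESTAMP
```

Length specifications carry over unchanged here; in practice some types (such as MONEY above) change precision semantics, which is exactly the kind of detail the conversion tooling and manual review need to catch.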
Step 3: Conversion of Embedded SQL and Application Code
The third step addresses any custom scripts with embedded SQL statements, such as ETL scripts and reports, and the application code, so that they work with the new target database. This includes rewriting portions of application code written in Java, C#, C++, Perl, Python, or other languages that relate to JDBC/ODBC driver usage, ORM usage, establishing connections, data retrieval, and iteration.
AWS SCT is used here to scan the folder containing the application code. It extracts embedded SQL statements, converts as many as possible automatically, and flags the remaining statements for manual conversion actions.
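The scan AWS SCT performs is proprietary, but the core idea can be sketched as follows: walk application source files and flag string literals that look like SQL. This is a hypothetical, much-simplified illustration, not SCT's actual logic.

```python
import re

# Hypothetical, much-simplified sketch of the kind of scan AWS SCT performs:
# flag string literals in application code that look like embedded SQL.
SQL_PATTERN = re.compile(
    r'"((?:SELECT|INSERT|UPDATE|DELETE)\b[^"]*)"', re.IGNORECASE
)

def find_embedded_sql(source: str) -> list:
    """Return SQL-like string literals found in application source code."""
    return SQL_PATTERN.findall(source)

code = 'stmt = conn.prepareStatement("SELECT TOP 10 * FROM orders");'
print(find_embedded_sql(code))  # ['SELECT TOP 10 * FROM orders']
```

A statement like the one above would then be flagged, since `SELECT TOP n` is MS SQL syntax that becomes `LIMIT n` in PostgreSQL.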
Step 4: Data Migration
This is where the actual migration from the source database to the target database takes place. There are two ways to migrate the data. One is as a full load of existing data. The other is as a full load of existing data followed by continuous replication of data changes to the target. AWS DMS is extremely beneficial at this stage as it automates much of the database migration.
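The two approaches correspond to the migration type on a DMS replication task. A minimal sketch, assuming placeholder ARNs and a catch-all table mapping; in practice you would pass a dict like this to `boto3.client("dms").create_replication_task(**params)`:

```python
# Sketch of how the two migration modes map onto an AWS DMS replication
# task. ARNs are placeholders; the real call is
# boto3.client("dms").create_replication_task(**params).

def dms_task_params(task_id: str, with_cdc: bool) -> dict:
    """Build parameters for a DMS task: full load only, or full load + CDC."""
    return {
        "ReplicationTaskIdentifier": task_id,
        "SourceEndpointArn": "arn:aws:dms:...:endpoint:SOURCE",    # placeholder
        "TargetEndpointArn": "arn:aws:dms:...:endpoint:TARGET",    # placeholder
        "ReplicationInstanceArn": "arn:aws:dms:...:rep:INSTANCE",  # placeholder
        "MigrationType": "full-load-and-cdc" if with_cdc else "full-load",
        # Catch-all selection rule: include every table in every schema.
        "TableMappings": '{"rules": [{"rule-type": "selection", '
                         '"rule-id": "1", "rule-name": "1", '
                         '"object-locator": {"schema-name": "%", "table-name": "%"}, '
                         '"rule-action": "include"}]}',
    }

print(dms_task_params("orders-migration", with_cdc=True)["MigrationType"])
# full-load-and-cdc
```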
Step 5: Testing Converted Code
After the schema and application code have been converted and the data successfully migrated onto the AWS platform, the code is tested to ensure everything works as expected.
Step 6: Data Replication
The next step is data replication. AWS DMS can be used to manage and monitor the ongoing replication process with minimal load on the source database and without platform-specific technologies or components that need to be installed on either the source or target.
A one-time full load of existing data is relatively simple to set up and run. However, many production applications with large database backends can’t tolerate a downtime window that’s long enough to migrate all the data in a full load. For these databases, AWS DMS can use a proprietary Change Data Capture (CDC) process to implement ongoing replication from the source database to the target database.
There are a couple of things to note.
- For MS SQL Server sources, replication must be enabled on the source server, and a distribution database configured so that the server acts as its own distributor. In addition, the source database must be in the full or bulk-logged recovery model so that transaction log backups are available.
- For PostgreSQL sources, AWS DMS can only capture changes when:
- The wal_level must be set to logical.
- max_replication_slots must be >= 1.
- max_wal_senders must be >= 1.
In addition, tables to be included in the CDC must have a primary key.
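The PostgreSQL prerequisites above translate to a postgresql.conf fragment along these lines (the values shown are illustrative; the minimums are 1 each, but production systems typically provision headroom for additional replication slots and WAL senders):

```ini
# postgresql.conf -- minimal settings for AWS DMS change data capture
wal_level = logical          # required for logical decoding
max_replication_slots = 5    # must be >= 1
max_wal_senders = 5          # must be >= 1
```

A restart of the source PostgreSQL server is needed for a wal_level change to take effect.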
Step 7: Deployment to AWS and Go Live
This step entails testing the production database migration to ensure that all data can be successfully migrated during the cutover window. Monitor the source and target databases to make sure the initial data load is completed, cached transactions are applied, and data has reached a steady state before cutover.
You can also use the “Enable Validation” option available in the Task settings pane of AWS DMS. If you select this option, AWS DMS validates the data migration by comparing the data in the source and the target databases.
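In the task settings JSON, this option corresponds to a ValidationSettings block along these lines (a minimal fragment; the failure-count value is illustrative):

```json
{
  "ValidationSettings": {
    "EnableValidation": true,
    "FailureMaxCount": 10000
  }
}
```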
Create a simple rollback plan in the event that an unrecoverable error occurs during the Go Live window. AWS SCT and AWS DMS work together to preserve the original source database and application. So the rollback plan will mainly consist of scripts to point connection strings back to the original source database.
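One way to keep that rollback to a single configuration change is to have the application resolve its database endpoint from configuration rather than hard-coding it. A hypothetical sketch, with placeholder hostnames and an assumed DB_ROLLBACK environment variable:

```python
import os

# Hypothetical rollback sketch: the application resolves its database host
# from configuration, so reverting to the source database is one config
# change rather than a redeploy. Hostnames below are placeholders.
ENDPOINTS = {
    "target": "new-postgres.cluster-abc.us-east-1.rds.amazonaws.com",
    "source": "legacy-mssql.internal.example.com",
}

def active_endpoint() -> str:
    """Return the DB host the app should use; set DB_ROLLBACK=1 to revert."""
    key = "source" if os.environ.get("DB_ROLLBACK") == "1" else "target"
    return ENDPOINTS[key]

print(active_endpoint())
```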
Step 8: Cutover
You have a few options for the cutover. If you can take your application’s write traffic offline for an extended period, you can use the AWS DMS full-load task settings or one of the offline migration options for your data migration. Read traffic can continue during the migration, but write traffic must be stopped. Because all the data needs to be copied from the source database, source database resources such as I/O and CPU are used.
For flash-cut migration, the objective is to minimize downtime. This relies on continuous data replication (CDC) from the source database to the target database. All read/write traffic will continue on the current database while the data is migrated. Because all the data needs to be copied from the source database, source server resources such as I/O and CPU are used. Make sure the data migration activity doesn’t affect any application performance SLAs.
Active/active database configuration involves setting up a mechanism to keep the source and target databases in sync while both databases are being used for write traffic. This involves more work than offline or flash-cut migration but provides more flexibility during migration.
There’s also incremental migration, in which you migrate your application in smaller parts instead of performing a one-time, full cutover. This strategy could have many variations, based on your current application architecture or the refactoring you’re willing to do in the application.
You can also use the strangler pattern to add new independent microservices to replace parts of an existing, monolithic legacy application. These independent microservices have their own tables that aren’t shared or accessed by any other part of the application.
After deployment, use AWS DMS to monitor the number of rows inserted, deleted, and updated, as well as the number of DDL statements issued per table while a task is running. Additional metrics are available from the Amazon CloudWatch Logs dashboard.
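The per-table counters come back as TableStatistics records from the DMS describe_table_statistics API. A sketch of summarizing them, using hand-written sample records that mimic a subset of the real response fields:

```python
# Sketch of post-deployment monitoring: the sample dicts below mimic a
# subset of the TableStatistics records returned by the AWS DMS
# describe_table_statistics API; here we just total the change counters.

def total_changes(table_stats: list) -> dict:
    """Sum inserts/updates/deletes/DDLs across all migrated tables."""
    totals = {"Inserts": 0, "Updates": 0, "Deletes": 0, "Ddls": 0}
    for stats in table_stats:
        for key in totals:
            totals[key] += stats.get(key, 0)
    return totals

sample = [
    {"TableName": "orders", "Inserts": 1200, "Updates": 40, "Deletes": 3, "Ddls": 0},
    {"TableName": "customers", "Inserts": 300, "Updates": 12, "Deletes": 0, "Ddls": 1},
]
print(total_changes(sample))
# {'Inserts': 1500, 'Updates': 52, 'Deletes': 3, 'Ddls': 1}
```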
The Amazon Aurora Option
There’s another option for database modernization/migration: Amazon Aurora, in conjunction with Amazon Aurora Serverless. Amazon Aurora is a global-scale relational database service built for the cloud and is PostgreSQL compatible. Amazon Aurora Serverless is an on-demand, autoscaling configuration for Amazon Aurora. It enables you to run your database on AWS without managing database capacity.
You create a database, specify the desired database capacity range, and connect your applications. You pay on a per-second basis for the database capacity you use when the database is active, and you can move between standard and serverless configurations with a few steps in the Amazon RDS console.
Among the benefits of this option: it delivers up to three times the throughput of standard PostgreSQL at one-tenth the cost of commercial databases. It offers continuous backup to Amazon S3 and replication across Availability Zones. It also provides up to 64TB per database instance and up to 15 read replicas. Importantly, there are no license fees. And there’s much more.
Learn More, Know More About Database Migration to PostgreSQL
AWS offers numerous resources ─ like those referenced in this blog ─ to make database migration to PostgreSQL easier. But, there’s still a lot that goes into pulling off a successful project. Expertise and experience can play a big role in helping avoid the “gotchas” and dealing with the unexpected. That’s why so many organizations opt to team up with ClearScale for our database migrations.
To learn more, watch this webinar: Getting Technical with AWS Database Migrations. It shows you how ClearScale migrates MySQL or PostgreSQL databases to AWS.
Another great resource: A Guide to Migrating Legacy Databases to AWS. It describes the benefits of migrating and modernizing databases on AWS, and the AWS database technologies needed to get the most value out of the cloud when it comes to database management.