How to Accelerate Data Migrations to the Cloud Using PgBouncer
Dec 2, 2021
A few months ago, we had the opportunity to work with a client that was developing a specialized customer relationship management (CRM) platform. Our team helped the company migrate its data from Oracle to the Amazon Web Services (AWS) cloud, allowing it to take advantage of the latest cloud database technologies, such as PostgreSQL, Amazon Redshift, and AWS Glue.
After reviewing the customer's investments in PL/SQL scripts and stored-procedure code, our team recommended the CRM provider go with a hybrid solution. We suggested using AWS Glue for data extraction, transformation, and loading (ETL) in combination with PgBouncer. PgBouncer is a connection pooler that improves performance when making short-lived connections between PostgreSQL database servers and individual application users.
We used PgBouncer-rr, an extension of PgBouncer that dynamically routes and rewrites SQL queries in middleware, to facilitate dynamic SQL conversion between the customer's legacy Oracle database and the new database servers on the cloud. Our team also built a metadata table to hold the patterns required for the dynamic conversions rather than relying on custom Python code for every query.
In addition, we layered an event-driven architecture on top of the existing PgBouncer-rr architecture to streamline the overall process. Using PgBouncer this way enabled our customer to route queries and rewrite them in transit, which saved the organization's engineers from having to refactor their applications entirely. Below is an in-depth look at how we approached the rewriting part of the work to give you a sense of what's possible with PgBouncer.
Rewriting Queries in Transit
Through PgBouncer, we allowed our customer to alter client queries programmatically before they reached the database server. Doing so created the opportunity to manipulate application queries en route to the server without modifying any application code, which can take significant time, money, and energy depending on the size of the application.
The rewrite function we implemented is written in Python and is dynamically loaded from an external module specified in the configuration:
rewrite_query_py_module_file = ./rewrite_query.py
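For context, here is a minimal sketch of how that setting might sit in a pgbouncer-rr configuration file. The database names, hosts, port, and auth file below are hypothetical placeholders, not values from our engagement:

[databases]
products_db = host=aurora-products.example.com port=5432 dbname=products
users_db = host=aurora-users.example.com port=5432 dbname=users

[pgbouncer]
listen_addr = 0.0.0.0
listen_port = 5439
auth_type = md5
auth_file = users.txt
pool_mode = session
; pgbouncer-rr extension setting: the Python module holding the rewrite hook
rewrite_query_py_module_file = ./rewrite_query.py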
Implementing a query rewrite function is relatively straightforward when queries arrive in fixed formats that are easily detected and manipulated with regular expression search-and-replace logic. For more complex queries, we recommend using an array of patterns, as described later in this post.
To see how this might work, below is a definition of a rewriting query function that would return a new query value based on certain inputs:
import re

def rewrite_query(username, query):
    # Patterns for the fixed-format queries we expect from the application
    q1 = "SELECT name FROM products"
    q2 = "SELECT name FROM users"
    if re.match(q1, query):
        new_query = "SELECT * FROM products ORDER BY name"
    elif re.match(q2, query):
        new_query = "SELECT * FROM users ORDER BY name"
    else:
        # Pass any unrecognized query through to the server unchanged
        new_query = query
    return new_query
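As a quick sanity check outside of PgBouncer, the function can be exercised directly; the username argument is supplied by pgbouncer-rr at runtime and is unused here:

print(rewrite_query("appuser", "SELECT name FROM products"))
# prints: SELECT * FROM products ORDER BY name
print(rewrite_query("appuser", "SELECT price FROM products"))
# prints the query unchanged; no pattern matched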
In this function, we assign the expected query patterns to two variables, "q1" and "q2". We then match the incoming query against those patterns through a series of conditional statements that determine the value of a "new_query" variable. This effectively changes the query that eventually reaches the server.
Along with this function, we would also create a table named "products" with two columns, "name" and "price," in one database, and a table named "users" with two columns, "s.no" and "name," in a second database. We would then set up pattern matching using an array to identify and reconcile syntax differences between incoming on-premises SQL queries and PostgreSQL queries. With this setup, incoming queries can follow a different SQL dialect than the destination's, as long as the application connects through the PostgreSQL JDBC driver and the queries land on the PostgreSQL-compatible database on AWS. A sketch of this metadata-driven approach follows.
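Here is a minimal sketch of what metadata-driven rewriting could look like. The table name (rewrite_patterns), its columns, and the psycopg2 connection details are illustrative assumptions, not part of the pgbouncer-rr API; only the rewrite_query(username, query) hook is prescribed:

import re
import psycopg2

# Hypothetical metadata table: each row pairs a regex that matches a
# legacy (Oracle-style) query with the PostgreSQL text to substitute.
# CREATE TABLE rewrite_patterns (pattern text, replacement text);

def load_patterns():
    conn = psycopg2.connect(host="metadata-db.example.com", dbname="meta",
                            user="pgbouncer", password="secret")
    with conn, conn.cursor() as cur:
        cur.execute("SELECT pattern, replacement FROM rewrite_patterns")
        return cur.fetchall()

# Loaded once when pgbouncer-rr imports the module, so adding a new
# conversion means inserting a row, not writing new Python code.
PATTERNS = load_patterns()

def rewrite_query(username, query):
    # Apply the first matching pattern; fall through unchanged otherwise.
    for pattern, replacement in PATTERNS:
        if re.match(pattern, query, re.IGNORECASE):
            return re.sub(pattern, replacement, query, flags=re.IGNORECASE)
    return query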
Avoiding Application Code Changes With PgBouncer
The solution outlined above is an effective way to avoid making code changes to an application when migrating from on-premises databases to cloud-based databases. PgBouncer creates an intermediary step through which engineers can change queries in transit before they reach the database server. As a result, code changes happen outside of the core application, in a middleware solution.
For those trying to speed up data migrations to the cloud, this approach can be a game-changer.
The AWS Babelfish Option
For those interested in even more functionality, take a look at AWS's new Babelfish for Aurora PostgreSQL. It's a built-in capability of Amazon Aurora, available at no additional cost, that allows Aurora to understand commands from applications written for Microsoft SQL Server.
Babelfish enables Aurora PostgreSQL to understand T-SQL, Microsoft SQL Server's proprietary SQL dialect, so apps originally written for SQL Server can work with Aurora with few code changes.
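As a rough illustration, an application can keep its existing SQL Server driver and T-SQL and simply point at the cluster's Babelfish TDS endpoint, which listens on the standard SQL Server port (1433) by default. The server name, database, and credentials below are placeholders:

import pyodbc

# Point the existing SQL Server connection string at the Babelfish
# TDS endpoint of the Aurora PostgreSQL cluster (placeholder values).
conn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};"
    "SERVER=my-aurora-cluster.cluster-abc123.us-east-1.rds.amazonaws.com,1433;"
    "DATABASE=crm;UID=app_user;PWD=secret"
)
cur = conn.cursor()
# T-SQL syntax (TOP instead of LIMIT) runs as-is through Babelfish.
cur.execute("SELECT TOP 5 name FROM users ORDER BY name")
for row in cur.fetchall():
    print(row.name)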
Babelfish source code is now available on GitHub.
Interested in learning more about how ClearScale helps enterprises accelerate data migrations to the AWS cloud?
Call us at 1-800-591-0442
Send us an email at sales@clearscale.com
Fill out a Contact Form
Read our Customer Case Studies