When it’s time to refresh your data landscape in the name of data modernization, there are several key steps to consider. It doesn’t matter whether you do these yourself, contract the work to a partner, or use some mix of the two. These concepts are portable across clouds, but since we are huge fans of AWS, we’ll list relevant AWS services for each step.

  • Get rid of any bias you have toward your current data landscape. For example, don’t assume that because you’re stuck in an expensive monolithic database today, you need to lift & shift into a cheaper monolithic database. Consider alternatives such as micro-databases.
  • Define the business objectives and quantify the value of delivering them. Your business objectives become the goalposts for measuring success, while the value helps guide budgetary decisions.
  • Identify data producers and data consumers. Understand who needs the data and why. Don’t be afraid to challenge the validity of a downstream consumer. For example, perhaps that TPS report is no longer as critical as the one person who uses it thinks it is.
  • Select your deployment environment(s): on-prem, cloud, or hybrid. Consider the value each environment can bring, and if you choose a hybrid approach, keep in mind the egress cost of moving data between environments. AWS offers a broad ecosystem of tools, services, and partners to assist with everything from data migration and modernization to security, compliance, and specific feature development.
  • Security, security, security: this should be at the forefront of the project and become its foundation. Security cannot be a downstream consideration. There’s the standard stuff, such as in-flight and at-rest encryption and data access controls. Depending on what type of data you have, different regulatory compliance requirements may apply. You can also track and log access for an audit trail. Whatever your security posture, just remember: cloud providers are responsible for the security of the cloud, while you are responsible for security in the cloud.

In the AWS security ecosystem, AWS Management and Governance services offer a single control plane for managing AWS resources at any scale. AWS Identity and Access Management (IAM) and Amazon Cognito let you create and govern access for both internal and external users. AWS Security Hub and Amazon GuardDuty help with proactive and reactive security measures, respectively. AWS CloudTrail keeps track of who does what by logging API activity in your account.
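The following minimal Python sketch makes the in-flight encryption and data access points more concrete. It builds an Amazon S3 bucket policy as a plain dictionary. The deny-unless-HTTPS statement built on the `aws:SecureTransport` condition key is a standard S3 policy pattern. The `build_bucket_policy` helper, bucket name, role ARN, and prefix are hypothetical placeholders. Encryption at rest (for example, S3 default encryption with SSE-KMS) is configured separately and is not shown.

```python
import json


def build_bucket_policy(bucket: str, reader_role_arn: str, prefix: str) -> dict:
    """Return an S3 bucket policy that enforces TLS and grants read-only access to one prefix.

    Hypothetical helper for illustration; all names passed in are placeholders.
    """
    bucket_arn = f"arn:aws:s3:::{bucket}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Encryption in flight: deny any request that is not made over HTTPS.
                "Sid": "DenyInsecureTransport",
                "Effect": "Deny",
                "Principal": "*",
                "Action": "s3:*",
                "Resource": [bucket_arn, f"{bucket_arn}/*"],
                "Condition": {"Bool": {"aws:SecureTransport": "false"}},
            },
            {
                # Least privilege: one consumer role may only read objects under one prefix.
                "Sid": "AllowReadOnlyPrefix",
                "Effect": "Allow",
                "Principal": {"AWS": reader_role_arn},
                "Action": "s3:GetObject",
                "Resource": f"{bucket_arn}/{prefix.strip('/')}/*",
            },
        ],
    }


if __name__ == "__main__":
    # Hypothetical bucket, role, and prefix; 111122223333 is a placeholder account ID.
    policy = build_bucket_policy(
        bucket="example-data-lake",
        reader_role_arn="arn:aws:iam::111122223333:role/example-analytics-reader",
        prefix="curated/sales/",
    )
    print(json.dumps(policy, indent=2))
```

Generating policies in code makes them easier to review and version alongside the rest of your infrastructure. You would still apply them through your usual tooling, such as the AWS CLI, an SDK, or infrastructure as code.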

  • Identify data ingestion needs. Where will the data come from? What is its quality? How is it structured (or is it unstructured)? At what velocity do these sources produce data? Are you aiming for an ETL strategy, an ELT strategy, or a hybrid of both? Do you need batch, micro-batch, or streaming ingestion?

AWS Glue is an ETL powerhouse and includes crawlers for building a data catalog. With AWS Glue DataBrew, we’ve even seen customers hand self-service ETL over to their internal end users! If you have streaming or near-real-time needs, you can build pipelines with Amazon Kinesis. AWS Direct Connect gives you a dedicated network connection for sending data into AWS. For even more fun, AWS Snowball lets you load data onto a portable device and ship it to AWS for import.
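The choice between batch, micro-batch, and streaming often comes down to how long you’re willing to buffer records before writing them. The sketch below is a hypothetical, dependency-free Python `MicroBatcher`, not an AWS API. It flushes a batch when the batch reaches a record count or a maximum age, similar in spirit to the size-or-interval buffering that Amazon Data Firehose applies. The thresholds and the `sink` callback are assumptions; a real sink might write a file to S3 or call a downstream service.

```python
import time
from typing import Callable, List, Optional


class MicroBatcher:
    """Buffer records and hand them to a sink in batches, flushing by size or by age.

    Hypothetical illustration; thresholds and the sink are placeholders.
    """

    def __init__(
        self,
        sink: Callable[[List[dict]], None],
        max_records: int = 500,
        max_age_seconds: float = 60.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        if max_records < 1:
            raise ValueError("max_records must be at least 1")
        self._sink = sink
        self._max_records = max_records
        self._max_age_seconds = max_age_seconds
        self._clock = clock  # injectable so tests can control time
        self._buffer: List[dict] = []
        self._first_seen: Optional[float] = None

    def add(self, record: dict) -> None:
        """Buffer one record, flushing if a size or age threshold is reached."""
        if not self._buffer:
            self._first_seen = self._clock()
        self._buffer.append(record)
        self._maybe_flush()

    def tick(self) -> None:
        """Call periodically (e.g. from a timer) so quiet streams still flush on age."""
        self._maybe_flush()

    def flush(self) -> None:
        """Send whatever is buffered to the sink, then start a new batch."""
        if self._buffer:
            self._sink(self._buffer)
        self._buffer = []
        self._first_seen = None

    def _maybe_flush(self) -> None:
        full = len(self._buffer) >= self._max_records
        stale = (
            self._first_seen is not None
            and self._clock() - self._first_seen >= self._max_age_seconds
        )
        if full or stale:
            self.flush()


if __name__ == "__main__":
    batches: List[List[dict]] = []
    batcher = MicroBatcher(batches.append, max_records=3, max_age_seconds=60.0)
    for i in range(7):
        batcher.add({"event_id": i})
    batcher.flush()  # drain the partial final batch
    print([len(b) for b in batches])  # [3, 3, 1]
```

Raising `max_records` and `max_age_seconds` moves this toward classic batch. Shrinking them toward one record and zero seconds approaches streaming, at the cost of many more, smaller writes.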

  • There’s huge value in how you choose to store your data. Don’t settle for just one technology; choose the right combination and put it behind an abstraction (such as microservices). You can organize your data into data lakes, data marts, data warehouses, data lakehouses, operational data stores, or OLTP systems. Each of these also comes in self-managed, managed, or serverless options.
  • Who will consume your data, when, and how? You can provide raw access to the data via query tools, governed access through APIs or streaming interfaces, or visual access through visualization tools. Exploratory data analysis (EDA) with tools such as Amazon Managed Grafana or Amazon QuickSight can help you quickly gain a deeper understanding of your data. Cold data can also be queried through serverless analytics tools such as S3 Select or Amazon Athena.
  • Finally, how will you migrate into your target architecture? You can rehost (aka lift & shift), replatform (move workloads from technology A to technology B with similar output), refactor (aka modernize), or go totally greenfield and build something new and shiny from the ground up. Whatever your pattern, many AWS tools can help, such as the AWS CloudEndure Migration Factory Solution, Data Transfer Hub, Database Freedom, and AWS Database Migration Service.
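The advice to put storage behind an abstraction can be sketched as a small service layer. In this hypothetical Python example, callers talk only to `OrderService`. An operational store handles point lookups, while an analytics sink receives an append-only copy for reporting. The in-memory classes stand in for whatever you would actually choose (for example, an OLTP database and a data lake on S3). Every name here is illustrative.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Protocol


@dataclass(frozen=True)
class Order:
    order_id: str
    customer_id: str
    total_cents: int


class OperationalStore(Protocol):
    """Fast reads and writes by key (the OLTP role)."""

    def put(self, order: Order) -> None: ...

    def get(self, order_id: str) -> Optional[Order]: ...


class AnalyticsSink(Protocol):
    """Append-only feed for reporting (the lake or warehouse role)."""

    def append(self, order: Order) -> None: ...

    def total_revenue_cents(self) -> int: ...


class InMemoryOperationalStore:
    # Stand-in for a real operational database.
    def __init__(self) -> None:
        self._rows: Dict[str, Order] = {}

    def put(self, order: Order) -> None:
        self._rows[order.order_id] = order

    def get(self, order_id: str) -> Optional[Order]:
        return self._rows.get(order_id)


class InMemoryAnalyticsSink:
    # Stand-in for files landing in a data lake or rows loaded into a warehouse.
    def __init__(self) -> None:
        self._records: List[Order] = []

    def append(self, order: Order) -> None:
        self._records.append(order)

    def total_revenue_cents(self) -> int:
        return sum(o.total_cents for o in self._records)


class OrderService:
    """The only interface callers use; storage choices stay hidden behind it."""

    def __init__(self, store: OperationalStore, sink: AnalyticsSink) -> None:
        self._store = store
        self._sink = sink

    def place_order(self, order: Order) -> None:
        self._store.put(order)  # serve lookups from the operational store
        self._sink.append(order)  # copy the record for analytics

    def lookup(self, order_id: str) -> Optional[Order]:
        return self._store.get(order_id)

    def revenue_cents(self) -> int:
        return self._sink.total_revenue_cents()


if __name__ == "__main__":
    service = OrderService(InMemoryOperationalStore(), InMemoryAnalyticsSink())
    service.place_order(Order("o-1", "c-1", 1250))
    service.place_order(Order("o-2", "c-2", 800))
    print(service.lookup("o-1"))
    print(service.revenue_cents())  # 2050
```

Consumers depend on the service rather than on a specific database. That means you can later swap the operational store or redirect the analytics feed without touching callers.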
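Some migrations must keep the source live until cutover. A common approach for these is a full load followed by change data capture (CDC), which AWS Database Migration Service supports through its full-load-and-CDC migration type. The Python sketch below only illustrates the idea, using hypothetical names (`ChangeEvent`, `full_load`, `apply_changes`) and in-memory dictionaries. It is not how DMS is implemented or configured. Events are applied in log sequence number (LSN) order, and tracking the last applied LSN makes replays safe.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Literal, Optional

Op = Literal["insert", "update", "delete"]


@dataclass(frozen=True)
class ChangeEvent:
    lsn: int  # log sequence number: position in the source's change log
    op: Op
    key: str  # primary key of the affected row
    row: Optional[dict] = None  # new row image; None for deletes


def full_load(source: Dict[str, dict]) -> Dict[str, dict]:
    """Copy every existing row from the source into a new target table."""
    return {key: dict(row) for key, row in source.items()}


def apply_changes(
    target: Dict[str, dict], events: Iterable[ChangeEvent], last_applied_lsn: int = 0
) -> int:
    """Apply CDC events in LSN order and return the highest LSN applied."""
    for event in sorted(events, key=lambda e: e.lsn):
        if event.lsn <= last_applied_lsn:
            continue  # already applied, so replaying the same events is a no-op
        if event.op == "delete":
            target.pop(event.key, None)
        elif event.row is None:
            raise ValueError(f"{event.op} event at LSN {event.lsn} has no row image")
        else:
            target[event.key] = dict(event.row)  # inserts and updates both upsert
        last_applied_lsn = event.lsn
    return last_applied_lsn


if __name__ == "__main__":
    source = {"1": {"name": "Ada"}, "2": {"name": "Grace"}}
    target = full_load(source)
    events = [
        ChangeEvent(11, "delete", "2"),
        ChangeEvent(10, "update", "1", {"name": "Ada L."}),
        ChangeEvent(12, "insert", "3", {"name": "Alan"}),
    ]
    checkpoint = apply_changes(target, events)
    print(checkpoint, target)  # 12 {'1': {'name': 'Ada L.'}, '3': {'name': 'Alan'}}
```

The returned checkpoint is what lets a replication process resume after an interruption without double-applying changes.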

And you certainly don’t have to go it alone. ClearScale’s data and analytics services can help with whatever data modernization project you need. Download our eBook, Leverage Big Data Analytics with AWS, to learn more.

And if you want to learn more about data infrastructure, check out part one of this blog series.