Revolutionizing HealthTech: ClearScale’s AWS-based Data Lake Solution for Enhanced Scalability and Flexibility
Apr 4, 2023
By Mateusz Cebularz, Data Engineering Practice Lead, ClearScale
The HealthTech industry thrives on – and requires – innovation, efficiency, and flexibility. So, it’s not surprising that one of our HealthTech customers is continually seeking better, faster ways to develop and deliver services to meet the changing needs of its customer base.
One of those ways entailed working with ClearScale to optimize its multi-tenant SaaS platform. The endeavor yielded a modernized data analytics platform that can harness digital technologies and implement new capabilities to enable more growth opportunities and enhance overall efficiency.
Specifically, the HealthTech company’s user base had grown rapidly. The company wanted to extend its analytic platform capabilities to process and analyze big data to deliver better insights. The company also wanted to leverage a more modern technology stack (AWS) to increase the availability, scalability, and maintainability of its analytics platform. At the same time, the HealthTech company hoped to optimize costs while building a foundation for future machine learning capabilities.
The Challenges
The project involved several challenges. The company’s rapid growth of users and data resulted in the need to re-examine its technology stack. IT leadership knew that it had to migrate its relational DB-based analytical solution to a scalable data lake on AWS capable of processing and analyzing big data.
The company’s growing user base and product lines were rapidly increasing its volume of both structured and unstructured data. The relational databases at the core of its analytical platform weren’t designed to manage the large, constantly growing volumes of data. This was impeding scalability, flexibility, and performance.
In addition, the company’s legacy, monolithic extract, transform, and load (ETL) processes were tightly coupled to the operational database schema. They took an extensive amount of time to run and could not deliver real-time data, which prevented users from accessing current information for analysis.
Meanwhile, the complexity of the legacy SQL-based ETL process was compounded by the lack of a unified data model and orchestration framework. That affected the ability to automate and streamline data-driven decision-making.
As a result, the ETL processes could not scale out to increase performance, and scaling up provided only a marginal increase in analytics preparation speed. Fault tolerance, disaster recovery, and monitoring capabilities were also inadequate for the HealthTech company’s requirements. On top of that, the ETL processes degraded OLTP database performance, which ultimately affected the core SaaS platform’s release cycles and user experience.
The Solution
ClearScale developed a new analytical platform, using AWS best practices, which leverages a data lake. The advantage of using a data lake in this situation is that it can store both structured and unstructured data. A data lake also allows the company to run different types of analytics — from dashboards and visualizations to big data processing, real-time analytics, and machine learning — to guide better decisions.
Data can be collected from multiple sources, such as application events, relational data from operational databases, and different types of data providers. This allows for scaling to data of any size while saving time in defining data structures, schema, and transformations.
In addition, data lakes offer the ability to understand what data is in the lake through crawling, cataloging, and indexing of data. They also enable the generation of different types of insights including reporting on historical data and using machine learning models to forecast likely outcomes and suggest a range of prescribed actions.
AWS Services Used
The solution uses Amazon Simple Storage Service (S3) object storage at the storage layer for low-cost, virtually unlimited storage of structured and unstructured data. Event streaming is handled by Amazon Kinesis Data Streams (KDS). This scalable, real-time data streaming service can continuously capture gigabytes of data per second from hundreds of thousands of sources and make that data available in milliseconds for real-time analytics.
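As a rough illustration of how application events might be prepared for a stream such as Kinesis Data Streams, the sketch below groups events into batches shaped like the Records argument of the PutRecords API. The event fields (tenant_id, type, seq) and the choice of tenant ID as the partition key are assumptions made for illustration, not details of the project. The cap of 500 records per call follows the documented PutRecords limit.

```python
import json
from typing import Iterable, Iterator

MAX_RECORDS_PER_CALL = 500  # PutRecords accepts at most 500 records per request


def to_record(event: dict) -> dict:
    """Serialize one event into the shape PutRecords expects for each entry."""
    return {
        "Data": json.dumps(event, separators=(",", ":")).encode("utf-8"),
        # Hypothetical choice: all events of one tenant share a partition key,
        # so they are routed to the same shard.
        "PartitionKey": str(event["tenant_id"]),
    }


def batch_records(events: Iterable[dict], size: int = MAX_RECORDS_PER_CALL) -> Iterator[list]:
    """Yield lists of records, each small enough for a single PutRecords call."""
    batch = []
    for event in events:
        batch.append(to_record(event))
        if len(batch) == size:
            yield batch
            batch = []
    if batch:  # emit the final, partially filled batch
        yield batch


if __name__ == "__main__":
    events = [{"tenant_id": i % 3, "type": "page_view", "seq": i} for i in range(1200)]
    for records in batch_records(events):
        # With boto3 installed, each batch could be sent roughly like:
        #   boto3.client("kinesis").put_records(StreamName="<your-stream>", Records=records)
        print(len(records))
```

Keeping the batching separate from the network call makes the logic easy to test without AWS credentials. A production sender would also need to retry any records that PutRecords reports as failed.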
Amazon Kinesis Data Firehose is used to combine application events into batches. It then converts them to Apache Parquet, a compact columnar storage format, before loading them into the data lake, which minimizes the amount of storage used and increases security.
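The batching idea can be sketched in plain Python. The toy buffer below flushes when either a size or an age threshold is reached, which mirrors the general way a delivery stream buffers records before writing them out. The class name, thresholds, and event fields are hypothetical, and the age check only runs when a new event arrives, which is a simplification. The to_columns helper only shows the row-to-column pivot behind columnar formats. It does not write Parquet, which the managed service handles itself.

```python
import json
import time


class EventBuffer:
    """Toy buffer that flushes on a byte-size or age threshold, whichever comes first."""

    def __init__(self, max_bytes=1024, max_age_seconds=60.0, clock=time.monotonic):
        self.max_bytes = max_bytes
        self.max_age_seconds = max_age_seconds
        self.clock = clock
        self._events = []
        self._bytes = 0
        self._started = None  # time the first event of the current batch arrived

    def add(self, event):
        """Add an event; return the flushed batch if a threshold was reached, else None."""
        if self._started is None:
            self._started = self.clock()
        self._events.append(event)
        self._bytes += len(json.dumps(event).encode("utf-8"))
        too_big = self._bytes >= self.max_bytes
        too_old = self.clock() - self._started >= self.max_age_seconds
        return self.flush() if (too_big or too_old) else None

    def flush(self):
        """Return the buffered events and reset the buffer."""
        batch, self._events = self._events, []
        self._bytes = 0
        self._started = None
        return batch


def to_columns(batch):
    """Pivot row-shaped events into one list per field, the layout columnar formats store."""
    fields = sorted({key for event in batch for key in event})
    return {field: [event.get(field) for event in batch] for field in fields}


if __name__ == "__main__":
    buffer = EventBuffer(max_bytes=120)
    for i in range(10):
        batch = buffer.add({"tenant_id": i % 2, "type": "login", "seq": i})
        if batch:
            print(to_columns(batch))
```

Storing each field contiguously is what lets columnar formats compress repeated values well and lets query engines read only the columns a query needs.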
Other AWS services, such as Amazon AppFlow, a fully managed integration service, are used to securely transfer data from other sources such as Salesforce.
Larger binary events are streamed to the data lake through a serverless solution based on Amazon API Gateway and AWS Lambda functions. AWS Glue Data Catalog is used to store metadata. Amazon Athena, a serverless, interactive query service, processes initial data in conjunction with AWS Glue. AWS Glue jobs, which run as either Apache Spark or Python jobs, are used for data processing that involves more complex logic. The orchestration layer for the data processing jobs, which can include several steps, is based on state machines in AWS Step Functions, a serverless orchestration service.
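To make the orchestration layer more concrete, here is a minimal sketch of a two-step state machine definition in Amazon States Language, built as a Python dictionary. It runs an Athena query and then a Glue job, using the Step Functions service integrations for those two services. The state names, job name, query, retry settings, and the order of the steps are all assumptions for illustration and do not describe the customer’s actual pipeline.

```python
import json


def build_state_machine(glue_job_name: str, athena_query: str, workgroup: str = "primary") -> dict:
    """Return an Amazon States Language definition: an Athena query, then a Glue job."""
    return {
        "Comment": "Illustrative two-step data processing pipeline",
        "StartAt": "PrepareWithAthena",
        "States": {
            "PrepareWithAthena": {
                "Type": "Task",
                # The .sync suffix makes the state wait until the query finishes.
                # Assumes the workgroup already has a query result location configured.
                "Resource": "arn:aws:states:::athena:startQueryExecution.sync",
                "Parameters": {"QueryString": athena_query, "WorkGroup": workgroup},
                "Next": "TransformWithGlue",
            },
            "TransformWithGlue": {
                "Type": "Task",
                "Resource": "arn:aws:states:::glue:startJobRun.sync",
                "Parameters": {"JobName": glue_job_name},
                # Hypothetical retry policy: two more attempts with exponential backoff.
                "Retry": [
                    {
                        "ErrorEquals": ["States.TaskFailed"],
                        "IntervalSeconds": 30,
                        "MaxAttempts": 2,
                        "BackoffRate": 2.0,
                    }
                ],
                "Catch": [{"ErrorEquals": ["States.ALL"], "Next": "PipelineFailed"}],
                "Next": "PipelineSucceeded",
            },
            "PipelineSucceeded": {"Type": "Succeed"},
            "PipelineFailed": {
                "Type": "Fail",
                "Error": "PipelineError",
                "Cause": "A processing step failed",
            },
        },
    }


def validate_transitions(definition: dict) -> None:
    """Check that StartAt and every Next or Catch target names an existing state."""
    states = definition["States"]
    targets = [definition["StartAt"]]
    for state in states.values():
        if "Next" in state:
            targets.append(state["Next"])
        targets.extend(catcher["Next"] for catcher in state.get("Catch", []))
    missing = [target for target in targets if target not in states]
    if missing:
        raise ValueError(f"Unknown state(s): {missing}")


if __name__ == "__main__":
    definition = build_state_machine("nightly-rollup", "SELECT 1")
    validate_transitions(definition)
    print(json.dumps(definition, indent=2))
```

The printed JSON is the kind of document that would be supplied as a state machine definition. Keeping retries and failure handling in the definition, rather than inside each job, is one way an orchestration layer can improve fault tolerance and monitoring.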
Amazon Redshift, a fully managed, petabyte-scale cloud data warehouse service, stores the most frequently used pre-aggregated data. The solution also incorporates Amazon QuickSight, a scalable, serverless, embeddable, machine learning-powered BI service. It enables the easy creation and publishing of interactive dashboards that deliver rich data insights.
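The idea of pre-aggregating frequently used data can be illustrated with a small rollup. The sketch below reduces raw events to one row per day, tenant, and feature, the sort of compact table a warehouse can serve to dashboards quickly. The field names (ts, tenant_id, feature) and the daily granularity are hypothetical, and in practice a rollup like this would more likely run as SQL or a Spark job than as plain Python.

```python
from collections import Counter
from datetime import datetime, timezone


def daily_feature_usage(events):
    """Count events per (day, tenant_id, feature) from raw event dictionaries.

    Each event is assumed to carry a Unix timestamp in seconds under "ts".
    """
    counts = Counter()
    for event in events:
        day = datetime.fromtimestamp(event["ts"], tz=timezone.utc).date().isoformat()
        counts[(day, event["tenant_id"], event["feature"])] += 1
    # Sort so the output order is stable regardless of input order.
    return [
        {"day": day, "tenant_id": tenant, "feature": feature, "events": n}
        for (day, tenant, feature), n in sorted(counts.items())
    ]


if __name__ == "__main__":
    raw = [
        {"ts": 0, "tenant_id": 1, "feature": "reports"},
        {"ts": 10, "tenant_id": 1, "feature": "reports"},
        {"ts": 86400, "tenant_id": 2, "feature": "export"},
    ]
    for row in daily_feature_usage(raw):
        print(row)
```

Dashboards that query a rollup like this scan far fewer rows than they would against the raw events in the data lake.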
The Results
The key benefits of ClearScale’s AWS-based data lake solution include:
- Enhanced scalability and maintainability
- The ability to build new data insights across both structured and unstructured data
- The ability to quickly, easily, and cost-effectively build a variety of analytical dashboards or reports
- A more secure data export layer
- Opportunities to securely share analytical and raw data using Amazon Redshift data sharing and AWS Data Exchange
- Opportunities to enable predictive analytics with an ML-based recommendation engine
The HealthTech company’s analytics platform is now more flexible, scalable, and cost-effective. This enables the company to provide enhanced user experiences and better current and future capabilities.
The company also now has access to more detailed data insights. That includes the trends that drive margins and the platform features that tenants of its SaaS platform use. Armed with this information, the organization can continue innovating and adding features and capabilities to better serve its customers.
Working with ClearScale has helped the company ensure its technology ecosystem is designed for long-term success. And, by taking advantage of AWS services, the company’s IT stack will continue evolving to support its growth.