By Artem Koval, Data Analytics and AI/ML Practice Lead, ClearScale
From fraud detection and email filtering to self-driving cars and patient diagnosis, the potential use cases of machine learning seem almost endless. But if you want to deploy a machine learning (ML) application, where do you begin? Do you build it in-house or outsource it? What resources do you need? What platform should you build it on?
In this blog post, we delve into the crucial factors to consider when developing and deploying ML applications.
Understanding Machine Learning: The Basics
Before diving into the intricacies of ML applications, it’s crucial to grasp the fundamental concept of machine learning. While the definition may vary based on the source, machine learning is essentially what the name suggests: the process by which a machine learns from data rather than following explicitly programmed rules.
In layman’s terms, machine learning is an aspect of data science related to artificial intelligence (AI) that involves feeding data into an algorithm. With each iteration, the algorithm learns from the data, enhancing its ability to predict outcomes or solutions. The end product of an ML algorithm processing data is a model. This model encapsulates what the ML algorithm learned, including any rules, numerical values, or other algorithm-specific data structures necessary for making predictions.
The ML Process
Briefly, the ML process typically includes these steps:
- Problem framing. The ML process starts with problem framing. Determine what you want to predict and what kind of observation data you need to make those predictions.
- Data collection. Next, collect data that contains the answer or solution you want to predict. There are a few general requirements for data. It should:
- Comprise large, diverse data sets integrated from multiple sources, covering various business entities and collected across multiple time frames.
- Be supported by a flexible data management infrastructure that may span multiple data platforms, tools, and processing engines. While data can come from many sources, the trend is toward consolidating as much as possible into a data lake, and data lakes are increasingly moving to elastic clouds to facilitate automation, optimization, and cost savings.
- Be labeled where the algorithm requires it, as in supervised learning. (The data labeling process takes raw data, such as images or text files, and adds one or more informative labels to provide context so that an ML algorithm can learn from it.)
- Be cleaned prior to use by employing processes such as deduplication, normalization, and error correction.
- Be transformed into a form that the ML system can understand. ML algorithms operate on numbers, so data in formats such as images or text must first be converted into numerical representations.
- Be split into two portions: a larger one devoted to training and a smaller one that’s reserved for evaluation.
- Algorithm selection. There are various algorithms from which to choose. Finding the right one is partly trial and error. However, the selection is influenced by the size and type of data you work with, the insights you seek from the data, and how those insights are used. (More on this later.) ML algorithms are generally classified into supervised, unsupervised, semi-supervised, and reinforcement learning. Currently, most business-oriented applications use supervised ML algorithms. Supervised ML algorithms apply what they learned in the past to new data using labeled examples in order to predict future events or behavior. Starting from the analysis of a known training data set, the algorithm produces an inferred function to make predictions about the output values. The algorithm can also compare its output with the correct, intended output and find errors in order to modify the model.
- Training. The algorithm is fed the training data, learning patterns and the mapping between features and labels. This yields a model that can make predictions on unseen data. As additional data is fed to the algorithm, it continues to “learn,” producing a more finely tuned model.
- Evaluation. Once training is complete, the model is evaluated on the reserved data to estimate its real-world performance. If it performs well, you can deploy it for predictive analytics on real-world data in various applications.
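The steps above can be sketched in a few lines of Python. This is an illustrative toy, not a production recipe: it uses a simple nearest-centroid classifier on synthetic, pre-labeled data purely to show the collect/split/train/evaluate cycle, and every name and number here is invented for the example.

```python
import random

random.seed(0)

# Data collection: 200 synthetic, already-cleaned numeric examples.
# The (made-up) label is 1 when the two features sum to a positive number.
points = [(random.uniform(-1, 1), random.uniform(-1, 1)) for _ in range(200)]
labels = [1 if x + y > 0 else 0 for x, y in points]

# Split: a larger portion (80%) for training, a smaller one for evaluation.
split = int(len(points) * 0.8)
train_X, test_X = points[:split], points[split:]
train_y, test_y = labels[:split], labels[split:]

# Training: a nearest-centroid classifier "learns" by averaging the
# feature vectors of each class; the two centroids are the model.
def centroid(xs):
    n = len(xs)
    return (sum(p[0] for p in xs) / n, sum(p[1] for p in xs) / n)

model = {
    c: centroid([p for p, y in zip(train_X, train_y) if y == c])
    for c in (0, 1)
}

# Prediction: assign the class whose centroid is closest to the point.
def predict(p):
    def dist2(a, b):
        return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2
    return min(model, key=lambda c: dist2(p, model[c]))

# Evaluation: accuracy on the held-out portion of the data.
accuracy = sum(predict(p) == y for p, y in zip(test_X, test_y)) / len(test_X)
print(f"accuracy: {accuracy:.2f}")
```

A real project would swap the toy classifier for an algorithm suited to the data and the prediction task, but the overall shape of the workflow stays the same.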
ML in the Real World
In theory, the ML process seems simple enough. However, there’s much more involved in using it to develop real-world applications — and in building out a comprehensive solution that meets complex, multi-faceted customer needs.
That’s where cloud-native application development and ML experience, as well as powerful resources like those available from AWS, come into play. A good example is the machine learning services ClearScale performed for an online floral company. This project culminated in a proof of concept for a cost-effective recommendation engine that is efficient to use and deploy.
Drawing on its expertise in using AWS services, ClearScale architected a solution that not only makes recommendations that align with customers’ preferences but also uses the behaviors of past customers to make accurate predictions about new shoppers.
About the Prototype
The prototype includes a search engine component that re-ranks results within a session, so customers are consistently presented with high-quality feeds. It can also deliver personalized notifications. Services such as Amazon Personalize, Amazon Pinpoint, and Amazon Elasticsearch Service figure prominently in the solution. But there’s more to it than ML-driven functions.
ClearScale also used an Infrastructure as Code (IaC) approach. That allows for automatically managing, monitoring, and provisioning resources rather than manually configuring discrete hardware devices and operating systems. If the application should fail for any reason, the online floral company can quickly redeploy the architecture.
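To make the IaC idea concrete, here is a hypothetical CloudFormation fragment (not ClearScale’s actual template; the resource names are invented) declaring a data bucket and an Amazon Personalize dataset group as code:

```yaml
AWSTemplateFormatVersion: "2010-09-09"
Description: Minimal IaC sketch for a recommendation workload
Resources:
  # S3 bucket holding interaction data for the recommendation engine
  InteractionDataBucket:
    Type: AWS::S3::Bucket
  # Dataset group that scopes the Amazon Personalize resources
  RecommendationDatasetGroup:
    Type: AWS::Personalize::DatasetGroup
    Properties:
      Name: floral-recommendations
```

Because the stack is defined in a version-controlled template, it can be recreated with a single `aws cloudformation deploy` command rather than manual reconfiguration.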
It’s these kinds of benefits that make ML application solutions all the more powerful. But they require the expertise and experience of a team like ClearScale that understands the full picture of app development and deployment.