By Artem Koval, Data Analytics and AI/ML Practice Lead, ClearScale
From fraud detection and email filtering to self-driving cars and patient diagnosis, the potential applications of machine learning seem almost endless. But if you want to deploy a machine learning (ML) application, where do you begin? Do you build it in-house or outsource it? What resources do you need? What platform should you build it on?
In the first of this three-blog series, we cover some of the key considerations for developing and deploying ML applications.
The Basics of Machine Learning
Before making any big decisions regarding ML applications, it’s important to understand what ML is and how it works. While the definition may vary based on the source, machine learning is just what the name suggests. It’s the process of a machine learning something.
In simple terms, data is fed into an algorithm. Every time the algorithm processes data, it learns from it and gets better at predicting an answer or solution. The output of an ML algorithm that runs on data is a model. The model represents what the ML algorithm learned, including any rules, numbers, or other algorithm-specific data structures required to make predictions.
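To make the algorithm-versus-model distinction concrete, here is a minimal sketch that assumes a one-feature linear regression fitted by ordinary least squares. The function names `fit_line` and `predict` and the sample numbers are made up for illustration. The fitting procedure plays the role of the algorithm; the model is nothing more than the two numbers it returns.

```python
from statistics import mean


def fit_line(xs, ys):
    """The 'algorithm': ordinary least squares for y = slope * x + intercept."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    # The 'model' is only what was learned: two numbers.
    return {"slope": slope, "intercept": intercept}


def predict(model, x):
    """Apply the learned numbers to an input the algorithm never saw."""
    return model["slope"] * x + model["intercept"]


# Hypothetical observations, e.g. bouquets viewed (x) vs. order value (y).
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]

model = fit_line(xs, ys)
print(model)
print(predict(model, 10.0))
```

Other algorithms produce other kinds of models (a tree of rules, a matrix of weights), but the split between the procedure that learns and the artifact that predicts is the same.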
The ML Process
Briefly, the ML process typically includes these steps:
- Problem framing. The ML process starts with problem framing. Determine what you want to predict and what kind of observation data you need to make those predictions.
- Data collection. Next, collect data that contains the answer or solution you want to predict. There are a few general requirements for data. It should:
- Include large, diverse data sets integrated from multiple sources and concerning various business entities, collected across multiple time frames.
- Be backed by a large, diverse data management infrastructure, with multiple data platforms, tools, and processing engines. While data can come from multiple sources, the trend is toward consolidating as much as possible into a data lake. Data lakes, in turn, are moving to elastic clouds to make automation and optimization easier and to lower costs.
- Be labeled as required by most ML algorithms, particularly in supervised learning. (The data labeling process takes raw data, such as images or text files, and adds one or more informative labels to provide context. That way an ML algorithm can learn from it.)
- Be cleaned prior to use by employing processes such as deduplication, normalization, and error correction.
- Be transformed into a form that the ML system can understand. Machines can’t work directly with raw formats such as images or text, so this data must be converted into numbers.
- Be split into two portions: a larger one devoted to training and a smaller one that’s reserved for evaluation.
- Algorithm selection. There are various algorithms from which to choose. Finding the right one is partly trial and error. However, the selection is influenced by the size and type of data you work with, the insights you seek from the data, and how those insights are used. (More on this later.) ML algorithms are generally classified into supervised, unsupervised, semi-supervised, and reinforcement learning. Currently, most business-oriented applications use supervised ML algorithms. Supervised ML algorithms apply what they learned in the past to new data using labeled examples in order to predict future events or behavior. Starting from the analysis of a known training data set, the algorithm produces an inferred function to make predictions about the output values. The algorithm can also compare its output with the correct, intended output and find errors in order to modify the model.
- Training. The algorithm is fed training data. It learns patterns and a mapping between the features and the label. This yields a model that can make predictions on unseen data. As additional data is fed to the algorithm, it continues to “learn” and outputs a more finely tuned model.
- Evaluation. Once training is completed, the model is evaluated using the remaining data, which it has not seen before, to assess its real-world performance. When the model predicts well enough on that held-out data, you can deploy it for prediction on real-world data in various applications.
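The steps above can be sketched end to end in a few dozen lines. This is an illustration under stated assumptions, not a production pipeline: the rows, the labels (`one_time`, `repeat`), the 75/25 split, and the nearest-centroid classifier are all hypothetical choices made to keep the example dependency-free. The scaling statistics are learned from the training portion only, so the evaluation rows stay unseen.

```python
import random

# Hypothetical labeled observations: (order_value, items_in_cart, label).
RAW_ROWS = [
    (20.0, 1, "one_time"), (25.0, 1, "one_time"), (22.0, 2, "one_time"),
    (30.0, 2, "one_time"), (18.0, 1, "one_time"), (27.0, 1, "one_time"),
    (80.0, 5, "repeat"), (95.0, 6, "repeat"), (70.0, 4, "repeat"),
    (88.0, 5, "repeat"), (76.0, 6, "repeat"), (90.0, 4, "repeat"),
    (20.0, 1, "one_time"),   # exact duplicate, removed during cleaning
    (None, 3, "repeat"),     # missing value, removed during cleaning
]


def clean(rows):
    """Data cleaning: drop rows with missing values and exact duplicates."""
    seen, cleaned = set(), []
    for row in rows:
        if None in row or row in seen:
            continue
        seen.add(row)
        cleaned.append(row)
    return cleaned


def fit_scaler(rows):
    """Transformation: learn min-max scaling from the given rows."""
    columns = list(zip(*(row[:-1] for row in rows)))
    lows = [min(col) for col in columns]
    spans = [max(col) - min(col) for col in columns]

    def scale(row):
        features = [(v - lo) / span for v, lo, span in zip(row[:-1], lows, spans)]
        return features, row[-1]

    return scale


def train(examples):
    """Training: learn one centroid (mean feature vector) per label."""
    grouped = {}
    for features, label in examples:
        grouped.setdefault(label, []).append(features)
    return {
        label: [sum(col) / len(col) for col in zip(*vectors)]
        for label, vectors in grouped.items()
    }


def predict(model, features):
    """Prediction: pick the label whose centroid is closest."""
    def sq_dist(centroid):
        return sum((a - b) ** 2 for a, b in zip(features, centroid))
    return min(model, key=lambda label: sq_dist(model[label]))


def accuracy(model, examples):
    """Evaluation: share of held-out examples predicted correctly."""
    hits = sum(predict(model, f) == label for f, label in examples)
    return hits / len(examples)


rows = clean(RAW_ROWS)
random.Random(0).shuffle(rows)           # fixed seed keeps the split repeatable
cut = int(len(rows) * 0.75)              # larger portion for training
train_rows, eval_rows = rows[:cut], rows[cut:]

scale = fit_scaler(train_rows)           # statistics from training rows only
model = train([scale(r) for r in train_rows])
score = accuracy(model, [scale(r) for r in eval_rows])
print(f"held-out accuracy: {score:.2f}")
```

On real data, each function would be replaced by something far more capable, but the order of operations and the separation between training and evaluation data carry over.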
ML in the Real World
In theory, the ML process seems simple enough. However, there’s much more involved in using it to develop real-world applications — and in building out a comprehensive solution that meets what can often be complex, multi-faceted customer needs.
That’s where cloud-native application development and ML experience, as well as powerful resources like those available from AWS, come into play. A good example is the machine learning work ClearScale did for an online floral company, which culminated in a proof of concept for a cost-effective recommendation engine that is efficient to use and deploy.
Drawing on its expertise in using AWS services, ClearScale architected a solution that not only makes recommendations that are likely to align with customers’ preferences but also uses the behaviors of past customers to make accurate predictions about new shoppers.
About the Prototype
The prototype includes a search engine component that re-ranks results within a session. Therefore, it always presents customers with high-quality feeds. It can also deliver personalized notifications. Services such as Amazon Personalize, Amazon Pinpoint, and Amazon Elasticsearch Service figure prominently in the solution. But there’s more to it than ML-driven functions.
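The idea of re-ranking within a session can be illustrated with a toy heuristic. This is not how Amazon Personalize or the ClearScale prototype computes rankings; the `rerank` function, the `boost` parameter, and the product data are all hypothetical. The sketch only shows the general pattern: take the search engine's base scores and lift items that match what the shopper has clicked so far in the current session.

```python
from collections import Counter


def rerank(results, session_clicks, boost=0.5):
    """Re-order search results using what the shopper clicked this session.

    results: list of (product_id, category, base_score) from the search engine.
    session_clicks: categories of products clicked so far in this session.
    """
    if not session_clicks:
        return list(results)
    clicks = Counter(session_clicks)
    total = sum(clicks.values())

    def score(item):
        _, category, base_score = item
        # Share of this session's clicks that fell in the item's category.
        affinity = clicks[category] / total
        return base_score * (1.0 + boost * affinity)

    return sorted(results, key=score, reverse=True)


results = [
    ("roses-12", "roses", 0.90),
    ("tulips-10", "tulips", 0.85),
    ("orchid-1", "orchids", 0.80),
]
session_clicks = ["tulips", "tulips", "orchids"]

for product_id, _, _ in rerank(results, session_clicks):
    print(product_id)
```

With no clicks yet, the original order is returned unchanged; as the session accumulates clicks, matching categories move up.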
ClearScale designed the recommendation engine to be serverless. With no infrastructure to buy and maintain, the online floral company can quickly scale on demand and only pay for the resources used.
ClearScale also used an Infrastructure as Code (IaC) approach. That allows for automatically managing, monitoring, and provisioning resources rather than manually configuring discrete hardware devices and operating systems. That means if the application should fail for any reason, the online floral company can quickly and easily redeploy the architecture.
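As a rough illustration of what IaC means in practice, the sketch below assumes AWS CloudFormation as the tool (the article does not say which tool ClearScale used) and declares a single hypothetical resource, a pay-per-request DynamoDB table named `ShopperEventsTable`. The point is that the infrastructure is described as data that can be versioned and re-applied, rather than configured by hand.

```python
import json

# Hypothetical stack with one resource; not the ClearScale architecture.
template = {
    "AWSTemplateFormatVersion": "2010-09-09",
    "Description": "Illustrative stack for an IaC example.",
    "Resources": {
        "ShopperEventsTable": {
            "Type": "AWS::DynamoDB::Table",
            "Properties": {
                # Pay only for requests made, with no capacity to provision.
                "BillingMode": "PAY_PER_REQUEST",
                "AttributeDefinitions": [
                    {"AttributeName": "session_id", "AttributeType": "S"}
                ],
                "KeySchema": [
                    {"AttributeName": "session_id", "KeyType": "HASH"}
                ],
            },
        }
    },
}

# Render the template as JSON so it can be stored in version control.
rendered = json.dumps(template, indent=2, sort_keys=True)
print(rendered)
```

Because the template is the single source of truth, redeploying after a failure amounts to applying the same file again instead of repeating manual setup steps.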
It’s these kinds of benefits that make ML application solutions all the more powerful. But they require the expertise and experience of a team that understands the full picture of app development and deployment.