Identifying Student Candidates in Higher Education using Machine Learning
Apr 24, 2018


Higher education institutions have long faced the challenge of not only identifying prospective students but also optimizing recruitment and retention, so that they do not spend resources on prospective students who ultimately decide not to attend. Schools typically track whether each prospective student ultimately decides to attend, and staff have used this information to determine where marketing spend can be reduced.
One organization, however, felt that there had to be a better way to assist institutions in analyzing this historical data to look for potential trends that could better inform administrators and recruiters. Working closely with a number of institutions and executives, they developed a set of success indices that they felt would more clearly define which potential students would ultimately enroll and stay until graduation.
The Challenge – Analyzing Large Volumes of Data
Unsure how best to leverage this information, the company reached out to ClearScale, an AWS Premier Consulting Partner, for help finding a path forward. After an extensive review of the client's situation, proposal, and requirements, ClearScale concluded that the AWS Machine Learning services would be an ideal solution.
The wealth of data on current and former students had to be analyzed in full, and to do so, AWS Machine Learning needed a template against which it could assess success or failure. That template was rooted in the success indices the client had developed. Using regression analysis algorithms, the Machine Learning engine sifted through the entire student dataset and assigned a ranking, or index value, to each student or prospective student. To assign these index values, the algorithms drew on standard academic, co-curricular, and financial-aid variables to build an accurate model.
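The post does not say which regression technique produced the index. As an assumption for illustration only, one common way to turn a binary success/failure template into a per-student index between 0 and 1 is logistic regression, where the index for student $i$ is the modeled probability of success:

$$
\text{index}_i = P(y_i = 1 \mid \mathbf{x}_i) = \sigma(\mathbf{w}^\top \mathbf{x}_i + b) = \frac{1}{1 + e^{-(\mathbf{w}^\top \mathbf{x}_i + b)}}
$$

Here $y_i = 1$ means student $i$ met the success criterion, $\mathbf{x}_i$ holds that student's academic, co-curricular, and financial-aid variables, $\sigma$ is the logistic function, and the weights $\mathbf{w}$ and bias $b$ would be fit to the historical outcomes defined by the success indices.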
The Solution – AWS Machine Learning
To implement the Machine Learning solution, ClearScale created three layers of data processing: accessing student information, cleansing the data, and running the analysis.
AWS Machine Learning Diagram
To access student information, ClearScale implemented a solution that let staff import large datasets as a background process, either through direct integration with the institution's databases or through a manual import. The process then checked the data for errors, identified any duplicate or missing information, and, depending on how the administrator configured it, automatically converted the data from the institution's unique schema into a common set of universal values.
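A minimal sketch of this import step, assuming each record arrives as a Python dictionary. The column names in `FIELD_MAP`, the code translations in `VALUE_MAP`, and the `REQUIRED` fields are hypothetical stand-ins for one institution's schema, not details from the project:

```python
from dataclasses import dataclass, field

# Hypothetical mapping from one institution's column names to the common schema.
FIELD_MAP = {"STU_ID": "student_id", "HS_GPA": "gpa", "FA_AWARD": "aid_amount", "ENROLL_FLAG": "enrolled"}
# Hypothetical translation of institution-specific codes into universal values.
VALUE_MAP = {"enrolled": {"Y": 1, "N": 0}}
# Hypothetical fields every record must have after mapping.
REQUIRED = ("student_id", "gpa", "enrolled")


@dataclass
class ImportReport:
    records: list = field(default_factory=list)     # clean, normalized records
    duplicates: list = field(default_factory=list)  # student IDs seen more than once
    missing: list = field(default_factory=list)     # (student ID, absent fields) pairs


def import_rows(rows):
    """Map raw rows to the common schema, flagging duplicates and missing data."""
    report = ImportReport()
    seen = set()
    for row in rows:
        # Rename known columns; columns with no mapping are dropped.
        rec = {FIELD_MAP[k]: v for k, v in row.items() if k in FIELD_MAP}
        # Translate coded values where a mapping exists; leave others unchanged.
        for col, codes in VALUE_MAP.items():
            if col in rec:
                rec[col] = codes.get(rec[col], rec[col])
        absent = [c for c in REQUIRED if rec.get(c) in (None, "")]
        if absent:
            report.missing.append((rec.get("student_id"), absent))
        elif rec["student_id"] in seen:
            report.duplicates.append(rec["student_id"])
        else:
            seen.add(rec["student_id"])
            report.records.append(rec)
    return report


rows = [
    {"STU_ID": "A1", "HS_GPA": 3.4, "FA_AWARD": 5000, "ENROLL_FLAG": "Y"},
    {"STU_ID": "A1", "HS_GPA": 3.4, "FA_AWARD": 5000, "ENROLL_FLAG": "Y"},
    {"STU_ID": "A2", "HS_GPA": "", "FA_AWARD": 0, "ENROLL_FLAG": "N"},
]
report = import_rows(rows)
print(report.duplicates, report.missing)  # ['A1'] [('A2', ['gpa'])]
```

Keeping flagged rows in the report rather than silently discarding them lets an administrator review and correct them before re-importing.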
Next, ClearScale added a process that reviewed data quality for each student index. This pre-analysis step not only validated the data but also determined whether enough student records existed to meet the configured sample-size threshold. If the threshold was not met, there would not be enough records to build a statistically reliable model to apply to the new set of prospective students.
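The threshold check can be sketched as a per-index count of usable records. The fields each index needs (`INDEX_FIELDS`) and the minimum record count (`MIN_RECORDS`) are hypothetical; the post does not state the actual threshold:

```python
# Hypothetical fields each success index needs; the real indices are the client's own.
INDEX_FIELDS = {
    "enrollment": ("gpa", "aid_amount", "enrolled"),
    "retention": ("gpa", "first_year_credits", "retained"),
}
# Hypothetical minimum number of usable records per index.
MIN_RECORDS = 500


def check_thresholds(records, index_fields=INDEX_FIELDS, min_records=MIN_RECORDS):
    """Return {index: (usable_count, meets_threshold)} for each success index."""
    status = {}
    for name, fields in index_fields.items():
        # A record is usable for an index only if every field it needs is present.
        usable = sum(1 for r in records if all(r.get(f) is not None for f in fields))
        status[name] = (usable, usable >= min_records)
    return status


records = [
    {"gpa": 3.1, "aid_amount": 0, "enrolled": 1, "first_year_credits": 30, "retained": 1},
    {"gpa": 2.8, "aid_amount": 4000, "enrolled": 0},
    {"gpa": 3.6, "aid_amount": 1500, "enrolled": 1, "first_year_credits": None, "retained": 0},
]
status = check_thresholds(records, min_records=2)
print(status)  # {'enrollment': (3, True), 'retention': (1, False)}
```

Under this sketch, only indices whose flag is `True` would move on to model training; the others would wait until more historical data is available.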
Finally, once the threshold had been met, the Machine Learning analysis could begin. ClearScale designed the service to build on existing AWS processes: it created a sample set from the available student data and used it to train the Machine Learning algorithms. The sample set was then evaluated and, finally, assessed for predictive patterns using a binary classification algorithm that predicts the probability of student success.
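To make the train, evaluate, and predict sequence concrete, here is a from-scratch logistic regression (a standard binary classifier that outputs a probability) trained by batch gradient descent, with a held-out split for evaluation. It is an illustrative stand-in, not the AWS Machine Learning API, and the two synthetic features are assumptions in place of real student variables:

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(w, b, x):
    """Probability of the positive class ("success") for one feature vector."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


def train_logistic(X, y, lr=0.5, epochs=500):
    """Fit weights and bias by batch gradient descent on the mean log-loss."""
    w, b, m = [0.0] * len(X[0]), 0.0, len(X)
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(w), 0.0
        for xi, yi in zip(X, y):
            err = predict_proba(w, b, xi) - yi  # derivative of log-loss w.r.t. the logit
            grad_w = [g + err * xj for g, xj in zip(grad_w, xi)]
            grad_b += err
        w = [wj - lr * g / m for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / m
    return w, b


def accuracy(w, b, X, y, cutoff=0.5):
    """Share of held-out records whose thresholded prediction matches the label."""
    hits = sum((predict_proba(w, b, xi) >= cutoff) == (yi == 1) for xi, yi in zip(X, y))
    return hits / len(y)


# Synthetic toy data: two scaled features per "student", labeled by a known rule.
rng = random.Random(42)
X = [[rng.uniform(-1, 1), rng.uniform(-1, 1)] for _ in range(200)]
y = [1 if x0 + 0.5 * x1 > 0 else 0 for x0, x1 in X]
X_train, y_train, X_test, y_test = X[:150], y[:150], X[150:], y[150:]

w, b = train_logistic(X_train, y_train)
print(f"held-out accuracy: {accuracy(w, b, X_test, y_test):.2f}")
```

Evaluating on records the model never saw during training is what makes the accuracy figure a fair estimate of how it would perform on new prospective students.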
Because the data volume, processing, and analysis can be substantial, the three processes outlined above were implemented so that they could run in parallel. Depending on the size and complexity of the data, creating a predictive model could take between 15 minutes and an hour.
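One way to let the three stages run in parallel, sketched here as an assumption rather than a description of ClearScale's implementation, is a thread-per-stage pipeline linked by queues, so that one institution's batch can be analyzed while the next is still being imported or checked. The stage functions at the bottom are hypothetical toy placeholders:

```python
import queue
import threading

_DONE = object()  # unique end-of-stream marker, distinct from any real result


def stage(func, inbox, outbox):
    """Apply func to each item from inbox and forward results until _DONE arrives."""
    while True:
        item = inbox.get()
        if item is _DONE:
            outbox.put(_DONE)
            return
        outbox.put(func(item))


def run_pipeline(batches, import_fn, check_fn, analyze_fn):
    """Run three stages concurrently, one thread each, linked by FIFO queues."""
    q_in, q_imported, q_checked, q_out = (queue.Queue() for _ in range(4))
    threads = [
        threading.Thread(target=stage, args=(import_fn, q_in, q_imported)),
        threading.Thread(target=stage, args=(check_fn, q_imported, q_checked)),
        threading.Thread(target=stage, args=(analyze_fn, q_checked, q_out)),
    ]
    for t in threads:
        t.start()
    for batch in batches:
        q_in.put(batch)
    q_in.put(_DONE)
    results = []
    while (item := q_out.get()) is not _DONE:
        results.append(item)
    for t in threads:
        t.join()
    return results


# Toy placeholder stages (hypothetical): each batch is (institution, list of GPAs).
def import_batch(batch):
    name, values = batch
    return name, [float(v) for v in values]


def check_batch(batch):
    name, values = batch
    return name, values, len(values) >= 2  # stand-in for the sample-size threshold


def analyze_batch(batch):
    name, values, ready = batch
    # Stand-in for model training: report a mean only when the batch passed the check.
    return name, (round(sum(values) / len(values), 2) if ready else None)


results = run_pipeline(
    [("college_a", ["3.2", "2.9", "3.8"]), ("college_b", ["4.0", "3.0"]), ("college_c", ["3.5"])],
    import_batch, check_batch, analyze_batch,
)
print(results)  # [('college_a', 3.3), ('college_b', 3.5), ('college_c', None)]
```

Because each stage has a single worker and FIFO queues, results come out in input order. In CPython, threads mainly overlap I/O-bound work such as imports; CPU-heavy stages would more likely run as separate processes or separate managed jobs.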
The Benefits
With the predictive model, institutions could take a prospective student's information and have the system compare it against the patterns learned from historical data to predict that student's potential for success at the college or university.
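Scoring and ranking new prospects might look like the sketch below. The feature names, coefficients, and bias are hypothetical values standing in for a trained model's output:

```python
import math

# Hypothetical coefficients standing in for a trained model: one academic,
# one co-curricular, and one financial-aid variable.
WEIGHTS = {"gpa": 1.2, "activities": 0.8, "aid_offered": 0.5}
BIAS = -4.0


def success_probability(prospect):
    """Logistic score in (0, 1); features missing from a record count as 0."""
    z = BIAS + sum(w * prospect.get(name, 0.0) for name, w in WEIGHTS.items())
    return 1.0 / (1.0 + math.exp(-z))


def rank_prospects(prospects):
    """Return (id, probability) pairs, most likely to succeed first."""
    scored = [(p["id"], success_probability(p)) for p in prospects]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


prospects = [
    {"id": "P1", "gpa": 3.8, "activities": 2, "aid_offered": 1},
    {"id": "P2", "gpa": 2.5, "activities": 0, "aid_offered": 0},
    {"id": "P3", "gpa": 3.2, "activities": 1, "aid_offered": 1},
]
for pid, prob in rank_prospects(prospects):
    print(pid, f"{prob:.2f}")
```

Sorting by probability gives recruiters an ordered list; where to draw a cutoff for targeted outreach would remain a policy choice for each institution.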
The client used this implementation to work directly with institutions that had sufficient student data, helping them reduce marketing spend by 25% to 35% by adjusting expenditures according to the predictive model and applying it to targeted marketing tactics. This, in turn, allowed staff to focus their efforts on cultivating prospective students' interest earlier in the engagement process and to make better use of the institution's limited resources.
Since 2011, ClearScale has demonstrated an aptitude for taking clients' most complex and challenging requirements and delivering solutions that are scalable and forward-thinking. ClearScale can take ideas from concept to solution rapidly, working closely with our client partners to deliver successfully.