Excelling at data science to keep your business ahead of the competition is finally becoming more affordable and less complex to manage as open source technology becomes commonplace.
In this POV, we’ll explore the emergence of Automated Machine Learning (AutoML), which is making it much more feasible to use machine learning algorithms to develop machine learning algorithms. This is how quickly the AI industry is progressing today.
We are already seeing the data science community explore ways to make analytics and machine learning tasks cheaper, faster, easier, and increasingly autonomous and self-remediating. Business leaders should prepare for automated data science to become commonplace, not necessarily as a way to entirely replace data scientists, but to significantly boost their capabilities and provide a starting point for ML. AutoML is a step in this direction.
Automated data science and AutoML are fast becoming a reality
The data science lifecycle covers a wide range of tasks that data scientists perform to extract insights and knowledge from data. In any project, teams need to explore and clean data, select, test, train, and tune algorithms, and then monitor and maintain performance over time (see Exhibit 1).
Exhibit 1: The Data Science Process
Source: CRISP-DM, 2018
Automated data science (AutoDS) is a dizzying new development in the AI technology industry that promises to automate some or all of these tasks for two key benefits:
Enterprises are struggling to attract and then retain data science talent across global markets. Some of the biggest challenges in the industry are thus around improving the productivity and quality of work of data science teams, who spend an inordinate amount of time on low-value work in areas like data preparation. Consequently, the industry is responding by focusing on automation and technology enablement to improve processes across the lifecycle, get the most out of the available talent, and automate repetitive tasks.
Machine learning is one of the techniques that data scientists can deploy to solve business problems using data, making automated machine learning (or AutoML) a subset of the automated data science movement. Automated data science and AutoML have the potential to accelerate the adoption of these technologies within businesses, with lowered costs and higher productivity of data science teams.
Not every ML task can actually be automated
Automation can be applied to multiple tasks in the data science process. The biggest areas of attention today are the following:
Multiple repetitive tasks in the data science process can be automated to some extent. However, it is important to note that there is no “end-to-end” automated data science solution that can expertly execute all data science activities. Setting up impactful ML and other data science projects requires a meaningful combination of domain knowledge, human judgement, and technology.
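To make the idea of automating a repetitive task concrete, here is a minimal sketch of one such task: trying out several hyperparameter settings and keeping the one that scores best on held-out data. Everything in it is an assumption made for illustration and does not come from any product discussed in this POV: the synthetic data, the k-nearest-neighbours model, the candidate values, and the function names (make_data, knn_predict, validation_mse, automated_search) are all hypothetical. Commercial and open source AutoML tools use far more sophisticated search strategies; this only shows the basic loop they automate.

```python
import random


def make_data(n, seed):
    """Synthetic 1-D regression data (an assumption): y = 2x + 1 plus noise."""
    rng = random.Random(seed)
    xs = [rng.uniform(-3.0, 3.0) for _ in range(n)]
    ys = [2.0 * x + 1.0 + rng.gauss(0.0, 0.5) for x in xs]
    return xs, ys


def knn_predict(train_x, train_y, k, x):
    """Predict y for x as the mean target of the k nearest training points."""
    nearest = sorted(range(len(train_x)), key=lambda i: abs(train_x[i] - x))[:k]
    return sum(train_y[i] for i in nearest) / k


def validation_mse(train, valid, k):
    """Mean squared error on the validation set for a given k."""
    train_x, train_y = train
    valid_x, valid_y = valid
    errors = [
        (knn_predict(train_x, train_y, k, x) - y) ** 2
        for x, y in zip(valid_x, valid_y)
    ]
    return sum(errors) / len(errors)


def automated_search(train, valid, candidate_ks):
    """Score every candidate k and return (best_k, best_mse, all_scores)."""
    scores = {k: validation_mse(train, valid, k) for k in candidate_ks}
    best_k = min(scores, key=scores.get)
    return best_k, scores[best_k], scores


# Hypothetical candidate values; k=200 uses every training point,
# so it always predicts the overall mean and acts as a weak baseline.
candidates = [1, 3, 5, 10, 25, 100, 200]
train = make_data(200, seed=0)
valid = make_data(100, seed=1)

best_k, best_mse, scores = automated_search(train, valid, candidates)
print(f"best k = {best_k}, validation MSE = {best_mse:.3f}")
```

The loop itself is mechanical, which is why it is a natural candidate for automation. Deciding what to predict, which data to trust, and whether the winning model is fit for the business purpose still calls for the domain knowledge and human judgement described above.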
Related advancements in the research community include the concept of transfer learning and the availability of pre-trained models.
The open source community, tech majors, and a select few startups are fueling AutoML development
Enterprises will typically invest in analytics and machine learning platforms and tools that enable data scientists to go through the data science lifecycle and deploy models into production environments. The introduction of automated data science is thus quickly becoming the territory of the major and emerging analytics and ML platforms. Simultaneously, the data science community is advancing AutoML through open source libraries such as auto-sklearn. Some of the key players in this fast-developing market niche are:
Exhibit 2: SigOpt’s Automation-Led Model Improvement
Source: SigOpt, 2018
AutoML has a growing place in AI, with implications for both existing and new adopters
The scope of automated data science and AutoML will only continue to expand, not least with the weight of Google and other technology majors behind the movement. The open source community is constantly updating the multiple libraries available, and more and more commercial AutoML products are making their way into the market. However, the industry has several challenges to overcome for this concept to mature:
Think of AutoML as a way to take rote tasks off data scientists and create a starting point for ML for non-data scientists. Over time, this will enable organizations to build broader, more diversified teams that combine data scientists with other resources. As a result, you might reduce your heavy reliance on data scientists and reserve their talents for the most complex tasks, where they are needed most.
Bottom line: You’re not going to replace data scientists anytime soon, but AutoML will make AI easier over time
Despite the challenges, automated data science and automated machine learning have great potential to advance the democratization of AI tools and speed up the laborious process of deriving insights from data. If your organization has an established AI-focused data science team, you will most likely benefit from investing in AutoML tools such as SigOpt or auto-sklearn that can ease the workload of analysts and data scientists. If you are new to AI-focused data science, AutoML options such as Baidu’s EZDL can help non-programmers start to explore the data science process and create an appetite for AI.