Automating the Machine Learning Workflow - AutoML
- Using machine learning will not require expert knowledge.
- All machine learning tasks follow the same basic flow.
- It is difficult to find the best-fit hyperparameters.
- It is hard to hand-craft features.
Automatic Machine Learning in progress: My motivation for writing a blog post on this topic was Google's new project, AutoML.
Google's AutoML project focuses on deep learning, a technique that involves passing data through the layers of a neural network. Designing these layers is complicated, so Google's idea was to create AI that could do it for them.
There are many other open-source projects, such as AutoML and auto-sklearn, working towards a similar goal.
Goal: The goal is to design the perfect machine learning “black box”, capable of performing all model selection and hyperparameter tuning without any human intervention.
AutoML draws on many disciplines of machine learning, prominently including:
- Bayesian optimization - A sequential design strategy for global optimization of black-box functions.
- Regression models for structured data and big data.
- Meta learning - A field of machine learning where automatic learning algorithms are applied to metadata about machine learning experiments, with the aim of improving the efficiency of existing learning algorithms.
- Transfer learning - It focuses on storing knowledge gained while solving one problem and applying it to a different but related problem.
- Combinatorial optimization.
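To make the Bayesian optimization item above more concrete, here is a minimal sketch in plain Python (standard library only). It is not how any particular AutoML system implements it. It assumes a one-dimensional search space, a Gaussian-process surrogate with a squared-exponential kernel, and a lower-confidence-bound rule for choosing the next point. The names `rbf`, `solve`, `gp_posterior`, `bayes_opt` and `validation_loss` are all hypothetical and chosen only for illustration.

```python
import math

def rbf(a, b, length=0.2):
    """Squared-exponential kernel: similarity of two 1-D inputs."""
    return math.exp(-((a - b) ** 2) / (2 * length ** 2))

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x

def gp_posterior(xs, ys, x_new, jitter=1e-4):
    """Posterior mean and standard deviation of the surrogate at x_new."""
    y_mean = sum(ys) / len(ys)  # constant prior mean (an assumption)
    K = [[rbf(a, b) + (jitter if i == j else 0.0)
          for j, b in enumerate(xs)] for i, a in enumerate(xs)]
    k_star = [rbf(a, x_new) for a in xs]
    alpha = solve(K, [y - y_mean for y in ys])
    mean = y_mean + sum(k * a for k, a in zip(k_star, alpha))
    v = solve(K, k_star)
    var = rbf(x_new, x_new) - sum(k * w for k, w in zip(k_star, v))
    return mean, math.sqrt(max(var, 0.0))

def bayes_opt(objective, low, high, n_init=3, n_iter=10, kappa=2.0):
    """Minimize a black-box function on [low, high]; requires n_init >= 2."""
    xs = [low + (high - low) * i / (n_init - 1) for i in range(n_init)]
    ys = [objective(x) for x in xs]
    grid = [low + (high - low) * i / 100 for i in range(101)]
    for _ in range(n_iter):
        def lower_bound(x):
            mean, std = gp_posterior(xs, ys, x)
            return mean - kappa * std  # low mean or high uncertainty wins
        candidates = [g for g in grid if g not in xs]
        x_next = min(candidates, key=lower_bound)
        xs.append(x_next)
        ys.append(objective(x_next))  # the only expensive call per step
    best = min(range(len(xs)), key=lambda i: ys[i])
    return xs[best], ys[best]

# Toy stand-in for "validation loss as a function of one hyperparameter".
def validation_loss(x):
    return (x - 0.3) ** 2 + 0.1 * math.sin(15 * x)

best_x, best_y = bayes_opt(validation_loss, 0.0, 1.0)
print(best_x, best_y)
```

The point of the sequential design is that each new evaluation is chosen using everything learned from the previous ones, which matters when a single evaluation means training a full model.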
Workflow: The basic flow of a machine learning task, which these tools aim to automate, consists of the following steps:
- Data Preprocessing
  - Converting the data to tabular form.
  - Splitting the data into train, validation and test sets.
- Feature Engineering
  - Label or one-hot encoders for categorical variables.
  - TF-IDF or Bag of Words for text variables.
- Feature Stacking
  - Combining different features.
  - For high-dimensional data, PCA is used.
  - For text data, SVD is applied after converting the text to a sparse matrix.
- Feature Selection
  - Greedy forward selection.
  - Greedy backward elimination.
  - Using models like LASSO or Random Forest for implicit selection.
- Model Selection and Hyperparameter Tuning
  - Grid search.
  - Random search.
  - Bayesian search.
- Evaluation of the model
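A few of the steps listed above can be sketched end to end in plain Python (standard library only). This is a toy illustration under several assumptions, not a real AutoML pipeline: the data is synthetic, the model is a small k-nearest-neighbours classifier chosen only because it fits in a few lines, and every function name (`one_hot`, `split`, `knn_predict`, `accuracy`, `grid_search`, `random_search`, `make_data`) is hypothetical. It covers one-hot encoding, the train/validation/test split, grid and random search over a single hyperparameter `k`, and the final evaluation.

```python
import random

def one_hot(values):
    """Encode each category as a 0/1 indicator vector, one column per category."""
    categories = sorted(set(values))
    return [[1.0 if v == c else 0.0 for c in categories] for v in values]

def split(rows, train_pct=60, valid_pct=20, seed=0):
    """Shuffle a copy of rows and split it into train, validation and test parts."""
    rows = rows[:]
    random.Random(seed).shuffle(rows)
    a = len(rows) * train_pct // 100
    b = len(rows) * (train_pct + valid_pct) // 100
    return rows[:a], rows[a:b], rows[b:]

def knn_predict(train_rows, x, k):
    """Majority vote among the k training points closest to x."""
    def distance(row):
        return sum((p - q) ** 2 for p, q in zip(row[0], x))
    votes = [label for _, label in sorted(train_rows, key=distance)[:k]]
    return max(sorted(set(votes)), key=votes.count)

def accuracy(train_rows, eval_rows, k):
    """Fraction of eval_rows whose label is predicted correctly."""
    hits = sum(knn_predict(train_rows, x, k) == y for x, y in eval_rows)
    return hits / len(eval_rows)

def grid_search(train_rows, valid_rows, ks):
    """Try every candidate k and keep the best one on the validation set."""
    return max(ks, key=lambda k: accuracy(train_rows, valid_rows, k))

def random_search(train_rows, valid_rows, k_max, n_trials, seed=0):
    """Try n_trials randomly drawn values of k in [1, k_max]."""
    rng = random.Random(seed)
    tried = [rng.randint(1, k_max) for _ in range(n_trials)]
    return max(tried, key=lambda k: accuracy(train_rows, valid_rows, k))

def make_data(n=100, seed=1):
    """Synthetic table: one categorical column, one numeric column, binary label."""
    rng = random.Random(seed)
    colors = [rng.choice(["red", "green", "blue"]) for _ in range(n)]
    sizes = [rng.random() for _ in range(n)]
    labels = [int(c == "red" or s > 0.7) for c, s in zip(colors, sizes)]
    # Feature stacking: one-hot columns and the numeric column side by side.
    features = [enc + [s] for enc, s in zip(one_hot(colors), sizes)]
    return list(zip(features, labels))

rows = make_data()
train_rows, valid_rows, test_rows = split(rows)
k_grid = grid_search(train_rows, valid_rows, ks=[1, 3, 5, 7, 9])
k_rand = random_search(train_rows, valid_rows, k_max=15, n_trials=5)
# The test set is used only once, after the hyperparameter has been chosen.
print("grid search:", k_grid, accuracy(train_rows, test_rows, k_grid))
print("random search:", k_rand, accuracy(train_rows, test_rows, k_rand))
```

Hyperparameters are picked on the validation set and the test set is held back for the final score, so that the reported accuracy is not biased by the search itself. With real data, library implementations of these steps would normally replace the hand-written helpers.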