Automating the Machine Learning Workflow - AutoML

Motivation:

  1. Using machine learning should not require expert knowledge.
  2. Most machine learning tasks follow the same basic flow.
  3. Finding the best-fit hyperparameters is difficult.
  4. Hand-crafting features is hard.
  5. It is fun.

Automatic Machine Learning in progress:

My motivation for writing a blog post on this topic was Google's new project, AutoML.
Google's AutoML project focuses on deep learning, a technique that involves passing data through the layers of a neural network. Designing these layers is complicated, so Google's idea was to create an AI that could do it for them.
There are also many open source projects, such as AutoML and auto-sklearn, working towards a similar goal.

Goal:

The goal is to design an ideal machine learning "black box" capable of performing all model selection and hyperparameter tuning without any human intervention.
AutoML draws on many disciplines of machine learning, prominently including:
  • Bayesian optimization - a sequential design strategy for global optimization of black-box functions.
  • Regression models for structured data and big data.
  • Meta learning - a field of machine learning in which learning algorithms are applied to metadata about machine learning experiments, with the aim of improving the efficiency of existing learning algorithms.
  • Transfer learning - storing knowledge gained while solving one problem and applying it to a different but related problem.
  • Combinatorial optimization.

The basic pipeline of a typical AutoML framework:
  • Data Preprocessing
    • Converting the data to tabular form.
    • Splitting the data into training, validation and test sets.
  • Feature Engineering
    • Label or one-hot encoders for categorical variables.
    • TF-IDF or bag of words for text variables.
  • Feature Stacking
    • Combining different features.
  • Decomposition
    • For high-dimensional data, PCA is used.
    • For text data, SVD is applied after converting the text to a sparse matrix.
  • Feature Selection
    • Greedy forward selection
    • Greedy backward elimination
    • Using models like LASSO or Random Forest for implicit selection.
  • Model Selection and Hyperparameter Tuning
    • Grid Search
    • Random Search
    • Bayesian Search
  • Evaluation of the model
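
The splitting, tuning and evaluation steps of the pipeline above can be illustrated with a minimal sketch in plain Python. Everything in it is an assumption made for illustration and is not taken from any AutoML framework: the toy data set, the small k-nearest-neighbours model, the search space, and the helper names (make_data, split, knn_predict, accuracy, grid_search, random_search). Random search is simplified here to drawing from the same discrete lists that grid search enumerates.

```python
import itertools
import random


def make_data(n, rng):
    """Toy 2-D points labelled 1 when x + y > 1 (made-up data for illustration)."""
    rows = []
    for _ in range(n):
        x, y = rng.random(), rng.random()
        rows.append(((x, y), int(x + y > 1.0)))
    return rows


def split(rows, rng, train_pct=60, val_pct=20):
    """Shuffle a copy of the rows, then cut it into train, validation and test sets."""
    rows = list(rows)
    rng.shuffle(rows)
    a = len(rows) * train_pct // 100
    b = a + len(rows) * val_pct // 100
    return rows[:a], rows[a:b], rows[b:]


def knn_predict(train_rows, point, k, p):
    """Majority vote of the k nearest training points under Minkowski order p."""
    def dist(q):
        # The p-th root is skipped because it does not change the ordering.
        return sum(abs(u - v) ** p for u, v in zip(q, point))
    nearest = sorted(train_rows, key=lambda row: dist(row[0]))[:k]
    votes = sum(label for _, label in nearest)
    return int(2 * votes > k)


def accuracy(train_rows, eval_rows, k, p):
    """Fraction of eval_rows whose label the k-NN model predicts correctly."""
    hits = sum(knn_predict(train_rows, x, k, p) == y for x, y in eval_rows)
    return hits / len(eval_rows)


def grid_search(train_rows, val_rows, space):
    """Score every (k, p) combination; return (best_score, best_params)."""
    best_score, best_params = -1.0, None
    for k, p in itertools.product(space["k"], space["p"]):
        score = accuracy(train_rows, val_rows, k, p)
        if score > best_score:
            best_score, best_params = score, {"k": k, "p": p}
    return best_score, best_params


def random_search(train_rows, val_rows, space, n_trials, rng):
    """Score n_trials randomly drawn combinations; return (best_score, best_params)."""
    best_score, best_params = -1.0, None
    for _ in range(n_trials):
        k, p = rng.choice(space["k"]), rng.choice(space["p"])
        score = accuracy(train_rows, val_rows, k, p)
        if score > best_score:
            best_score, best_params = score, {"k": k, "p": p}
    return best_score, best_params


rng = random.Random(0)  # fixed seed so repeated runs give the same result
train_rows, val_rows, test_rows = split(make_data(200, rng), rng)
space = {"k": [1, 3, 5, 7, 9], "p": [1, 2]}  # hypothetical search space

grid_score, grid_params = grid_search(train_rows, val_rows, space)
rand_score, rand_params = random_search(train_rows, val_rows, space, 4, rng)

# Hyperparameters are chosen on the validation set; the test set is used once.
test_score = accuracy(train_rows, test_rows, grid_params["k"], grid_params["p"])
print("grid search  :", grid_params, grid_score)
print("random search:", rand_params, rand_score)
print("test accuracy:", test_score)
```

Grid search evaluates all ten combinations, while random search evaluates only four, so its best validation score can never exceed the grid's on this space. The trade-off is that its cost stays fixed as the search space grows. Bayesian search, the third option in the list, would instead use the scores seen so far to decide which combination to try next.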

