Posts

Showing posts from 2017

Automating the Machine Learning Workflow - AutoML

Motivation: Using machine learning should not require expert knowledge. All machine learning tasks follow the same basic flow. It is difficult to find the best-fit hyperparameters. It is hard to hand-craft features. It is fun. Automatic Machine Learning in progress: My motivation for writing a blog post on this topic was Google's new project, AutoML. Google's AutoML project focuses on deep learning, a technique that involves passing data through layers of neural networks. Creating these layers is complicated, so Google’s idea was to create AI that could do it for them. There are many other open source projects, like AutoML and auto-sklearn, working towards a similar goal. Goal: The goal is to design the perfect machine learning “black box” capable of performing all model selection and hyperparameter tuning without any human intervention. AutoML draws on many disciplines of machine learning, prominently including Bayesian optimization. It is a sequential design strategy for gl

Research Paper Summary - Detecting Outliers in Categorical Data

Original Paper: An Optimization Model for Outlier Detection in Categorical Data. Authors: Zengyou He, Xiaofei Xu, Shengchun Deng. What are outliers? An outlier is an observation that lies at an abnormal distance from other values in a random sample from a population. Before abnormal observations can be singled out, it is necessary to characterize normal observations. There are many ways to detect outliers in continuous variables, but only a few techniques can detect outliers in categorical variables. Example: Suppose you have 1000 people choose between apples and oranges. If 999 choose oranges and only one person chooses an apple, I would say that person is an outlier. We use measurement as a way to detect anomalies. With categorical data, you have to explain why choosing an apple is considered an anomaly (that data point does not behave like the other 99.9% of the population). One technique to detect outliers in categorical variables is using a
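The apples-and-oranges example can be sketched in a few lines of Python. This is only a simple frequency-based illustration, not the optimization model from the paper; the 1% threshold and all variable names are assumptions made for the example.

```python
from collections import Counter

# 999 people choose oranges, one person chooses an apple.
choices = ["orange"] * 999 + ["apple"]

# Count how often each category appears.
counts = Counter(choices)
total = len(choices)

# Assumed cut-off: a category chosen by fewer than 1% of people is "rare".
threshold = 0.01

# Map each rare category to its relative frequency.
rare = {
    value: count / total
    for value, count in counts.items()
    if count / total < threshold
}

print(rare)
```

Here only "apple" falls below the threshold, which matches the intuition that the single apple chooser does not behave like the rest of the population.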

Natural Language Interface for Relational Database - What Is It and How Can It Be Made?

Motivation. Database management systems are systems used for accessing and manipulating information. The data can be manipulated using a set of keywords that follow syntax rules. To perform operations on the database, one must learn the Structured Query Language (SQL). Hence, a user who does not know SQL cannot directly access information from the database. Natural Language Interface for Database (NLIDB). What is it? It is a proposed solution to the problem of accessing information in a database using a natural language like English, without any technical knowledge of database languages like SQL. It is a tool that can understand a user's query in natural language and convert it into an appropriate SQL query, so that the user can get the required information from the database. Various approaches for building NLIDB. Symbolic or rule-based approach. This is the approach in which the translation from natural language to SQL is done using human-crafted

Research Paper Summary - A Robust Modification on K Nearest Neighbor Classifier

Original Paper: A Modification of K Nearest Neighbor Classifier. Authors: Hamid Parvin, Hoseinali Alizadeh, Befrouz Minati. K-Nearest Neighbor Classifier. KNN is one of the simplest classification algorithms and is widely used for problems where there is little or no knowledge of the prior distribution of the data. More about KNN can be learned from this or this. Here is a naive implementation of KNN from scratch. Take a look at it, as it will improve your understanding of the internals of the algorithm. A cup of fact: You can tweet KNN's implementation from scratch. Here is one such implementation (not by me). Modified K-Nearest Neighbor Classifier. The main problem with KNN is that it treats all nearest neighbors alike, although not all points have the same importance. Also, when the dimensionality of the data increases, the accuracy of KNN decreases (curse of dimensionality). In this paper, the authors propose a modification to tackle the above problems by some pre-computation on the dataset and ge
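For readers who want to see the baseline in code, below is a minimal sketch of plain KNN with majority voting. It is not the modified classifier from the paper and not the implementation linked in the post; the function name knn_predict, the toy data, and the choice of Euclidean distance are assumptions made for illustration.

```python
import math
from collections import Counter

def knn_predict(train_points, train_labels, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    # Pair every training point's Euclidean distance to the query with its label.
    distances = [
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    ]
    # Sort by distance only, then keep the labels of the k closest points.
    distances.sort(key=lambda pair: pair[0])
    nearest_labels = [label for _, label in distances[:k]]
    # Every neighbor gets an equal vote, which is the weakness noted above.
    return Counter(nearest_labels).most_common(1)[0][0]

# Toy data (hypothetical): two well-separated clusters.
points = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
labels = ["a", "a", "a", "b", "b", "b"]

print(knn_predict(points, labels, (0.5, 0.5)))
print(knn_predict(points, labels, (5.5, 5.5)))
```

Note that math.dist requires Python 3.8 or later.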

How they work - The 3 Magical Functions of Python : map, filter and lambda

Python provides several features that enable a functional programming approach. These are all convenience features: they can be written in Python fairly easily and can be replaced by custom code with a few more lines of code. As most beginners to Python are confused by these features or do not use them efficiently, I will try to explain them here in a simple manner. LAMBDA: A lambda function, also called an anonymous function or unbound function, can be used like any normal function in Python but has no name. It starts with the keyword lambda, followed by the parameters before the colon and the return value after it. Syntax - lambda arguments : expression Ex: Getting any number modulo 3 (of course, this can be done using a simple %3): lambda x : x % 3 is equivalent to a named function that takes x and returns x % 3. The above Python expression can be stated as "declare a nameless function taking a parameter named x. Perform the operation x%3. The return value of this nameless function will be the result of
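The modulo 3 example can be sketched as follows, together with a quick look at map and filter from the post title. The names mod3, mod3_lambda and numbers are made up for this illustration and do not come from the original post.

```python
# A normal, named function that returns x modulo 3.
def mod3(x):
    return x % 3

# The same logic as a lambda; binding it to a name here is only for comparison.
mod3_lambda = lambda x: x % 3

numbers = [1, 2, 3, 4, 5, 6]

# map applies the function to every element of the list.
remainders = list(map(mod3_lambda, numbers))

# filter keeps only the elements for which the function returns True.
multiples_of_three = list(filter(lambda x: x % 3 == 0, numbers))

print(remainders)          # [1, 2, 0, 1, 2, 0]
print(multiples_of_three)  # [3, 6]
```

In Python 3, map and filter return lazy iterators, so the results are wrapped in list() to see them all at once.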