Showing posts from May, 2017

Research Paper Summary - Detecting Outliers in Categorical Data

Original paper: An Optimization Model for Outlier Detection in Categorical Data
Authors: Zengyou He, Xiaofei Xu, Shengchun Deng

What are outliers?

An outlier is an observation that lies at an abnormal distance from the other values in a random sample drawn from a population. Before abnormal observations can be singled out, normal observations must first be characterized. There are many ways to detect outliers in continuous variables, but only a few techniques can detect outliers in categorical variables.

Example: suppose 1,000 people each choose between apples and oranges. If 999 choose oranges and only one person chooses an apple, I would say that person is an outlier. With continuous data, we can measure how far a value lies from the rest and use that measurement to detect anomalies. Categorical values have no such natural distance, so we have to explain why choosing an apple counts as an anomaly: that data point does not behave like the remaining 99.9% of the population.

One technique to detect outliers in categorical variables is using a