A mathematical outlier, a value vastly different from the majority of the data, skews certain summary statistics of a data set, namely the mean (a measure of central tendency) and the range (a measure of spread), according to About Statistics. The affected mean or range is pulled toward the outlier value. The median and mode, which are other measures of central tendency, are largely unaffected by an outlier.
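The effect can be illustrated with a small sketch. The data values below are hypothetical, chosen only for illustration, and `summarize` is a helper name introduced here, not something from the cited sources. It uses Python's standard `statistics` module to compare the four measures with and without a single extreme value.

```python
import statistics

def summarize(values):
    """Return the mean, median, mode and range of a list of numbers."""
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "mode": statistics.mode(values),
        "range": max(values) - min(values),
    }

# Hypothetical sample: seven ordinary values, then the same values plus one outlier.
typical = [2, 3, 3, 4, 5, 6, 7]
with_outlier = typical + [100]

for label, data in (("typical", typical), ("with outlier", with_outlier)):
    print(label, summarize(data))
```

In this sample the mean rises from about 4.29 to 16.25 and the range from 5 to 98, while the median moves only from 4 to 4.5 and the mode stays at 3.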
One purpose of analyzing a set of numerical data is to find accurate measures of central tendency, also called measures of central location. The Engineering Statistics Handbook defines an outlier as “an observation that lies an abnormal distance from the other values in a random sample from a population.”
Lærd Statistics explains that the mean is the measure most influenced by outliers because its calculation uses every value in the data set. The median, which is the middle score within a data set, is the least affected. The interquartile range, which is the difference between the third and first quartiles of the five-number summary (lowest value, first quartile, median, third quartile and highest value), is used to determine whether an outlier is present. The Engineering Statistics Handbook suggests that outliers should be investigated before being discarded, because they may reveal errors in the data-gathering process.
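As a minimal sketch of how the interquartile range can flag a possible outlier, the example below assumes the common convention of treating any value more than 1.5 times the interquartile range below the first quartile or above the third quartile as suspect. That multiplier, the function names and the sample data are assumptions made for illustration. Quartiles are computed with `statistics.quantiles`, and other quartile conventions can give slightly different cut points.

```python
import statistics

def five_number_summary(values):
    """Return (lowest, first quartile, median, third quartile, highest)."""
    q1, median, q3 = statistics.quantiles(values, n=4)
    return min(values), q1, median, q3, max(values)

def find_outliers(values, multiplier=1.5):
    """Return values lying more than multiplier * IQR outside the quartiles.

    The 1.5 default is an assumed, commonly used convention.
    """
    _, q1, _, q3, _ = five_number_summary(values)
    iqr = q3 - q1  # interquartile range: third quartile minus first quartile
    lower_fence = q1 - multiplier * iqr
    upper_fence = q3 + multiplier * iqr
    return [v for v in values if v < lower_fence or v > upper_fence]

# Hypothetical sample containing one extreme value.
sample = [2, 3, 3, 4, 5, 6, 7, 100]
print(five_number_summary(sample))
print(find_outliers(sample))
```

A flagged value is only a candidate for review. In keeping with the handbook's advice, it should be investigated rather than dropped automatically.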