Categorization is the process in which ideas and objects are recognized, differentiated and understood. Categorization implies that objects are grouped into categories, usually for some specific purpose. Ideally, a category illuminates a relationship between the subjects and objects of knowledge. Categorization is fundamental in language, prediction, inference, decision making and in all kinds of environmental interaction.
There are many categorization theories and techniques. In a broader historical view, however, three general approaches to categorization may be identified:
The classical Aristotelian view claims that categories are discrete entities characterized by a set of properties which are shared by their members. In analytic philosophy, these properties are assumed to establish the conditions which are both necessary and sufficient to capture meaning.
According to the classical view, categories should be clearly defined, mutually exclusive and collectively exhaustive. This way, any entity of the given classification universe belongs unequivocally to one, and only one, of the proposed categories.
Conceptual clustering developed mainly during the 1980s, as a machine paradigm for unsupervised learning. It is distinguished from ordinary data clustering by generating a concept description for each generated category.
Categorization tasks in which category labels are provided to the learner for certain objects are referred to as supervised classification, supervised learning, or concept learning. Categorization tasks in which no labels are supplied are referred to as unsupervised classification, unsupervised learning, or data clustering. The task of supervised classification involves extracting information from the labeled examples that allows accurate prediction of class labels of future examples. This may involve the abstraction of a rule or concept relating observed object features to category labels, or it may not involve abstraction (e.g., exemplar models). The task of clustering involves recognizing inherent structure in a data set and grouping objects together by similarity into classes. It is thus a process of generating a classification structure.
Conceptual clustering is closely related to fuzzy set theory, in which objects may belong to one or more groups, in varying degrees of fitness.
A cognitive approach accepts that natural categories are graded (they tend to be fuzzy at their boundaries) and inconsistent in the status of their constituent members.
Systems of categories are not objectively "out there" in the world but are rooted in people's experience. Conceptual categories are not identical for different cultures, or indeed, for every individual in the same culture.
Categories form part of a hierarchical structure when applied to such subjects as taxonomy in biological classification: higher level: life-form level, middle level: generic or genus level, and lower level: the species level. These can be distinguished by certain traits that put an item in its distinctive category. But even these can be arbitrary and are subject to revision.
Categories at the middle level are perceptually and conceptually the more salient. The generic level of a category tends to elicit the most responses and richest images and seems to be the psychologically basic level. Typical taxonomies in zoology for example exhibit categorization at the embodied level, with similarities leading to formulation of "higher" categories, and differences leading to differentiation within categories.