In information theory and machine learning, information gain is a synonym for Kullback–Leibler divergence.
In particular, the information gain about a random variable X obtained from an observation that a random variable A takes the value A = a is the Kullback–Leibler divergence DKL(p(x|a) || p(x|I)) of the prior distribution p(x|I) for x from the posterior distribution p(x|a) for x given a.
The expected value of the information gain is the mutual information I(X;A) of X and A, that is, the reduction in the entropy of X achieved by learning the state of the random variable A.
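This relationship can be checked numerically. The following is a minimal sketch, assuming an illustrative joint distribution p(x, a) (the numbers are made up): the information gain from observing A = a is computed as the KL divergence of the prior p(x) from the posterior p(x|a), and averaging it over a recovers the mutual information I(X;A).

```python
import math

# Illustrative joint distribution p(x, a) over X in {0, 1} and A in {0, 1}.
p_joint = {(0, 0): 0.4, (0, 1): 0.1,
           (1, 0): 0.1, (1, 1): 0.4}

# Marginals p(x) and p(a).
p_x = {x: sum(p for (xx, _), p in p_joint.items() if xx == x) for x in (0, 1)}
p_a = {a: sum(p for (_, aa), p in p_joint.items() if aa == a) for a in (0, 1)}

def kl(post, prior):
    """Kullback-Leibler divergence D_KL(post || prior), in bits."""
    return sum(q * math.log2(q / prior[x]) for x, q in post.items() if q > 0)

def info_gain(a):
    """Information gain from the observation A = a: D_KL(p(x|a) || p(x))."""
    posterior = {x: p_joint[(x, a)] / p_a[a] for x in (0, 1)}
    return kl(posterior, p_x)

# Expected information gain over a equals the mutual information I(X; A).
mi = sum(p_a[a] * info_gain(a) for a in (0, 1))
```

Comparing `mi` against the direct definition of mutual information, sum over (x, a) of p(x,a) log2(p(x,a) / (p(x)p(a))), gives the same value.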
In machine learning this concept can be used to define a preferred sequence of attributes to investigate in order to most rapidly narrow down the state of X. Such a sequence (which depends at each stage on the outcomes of the previously investigated attributes) is called a decision tree. Usually an attribute with high information gain is preferred to other attributes.
In general terms, the expected information gain is the reduction in information entropy H from a prior state to a state that takes some information as given:

IG(Ex, a) = H(Ex) − H(Ex | a)

Let Attr be the set of all attributes and Ex the set of all training examples; val(x, a) with x ∈ Ex defines the value of a specific example x for attribute a ∈ Attr, and H specifies the entropy. The information gain for an attribute a ∈ Attr is defined as follows:

IG(Ex, a) = H(Ex) − Σ_{v ∈ values(a)} ( |{x ∈ Ex | val(x, a) = v}| / |Ex| ) · H({x ∈ Ex | val(x, a) = v})

where values(a) is the set of values attribute a can take, and the sum is the expected entropy of the class labels after partitioning Ex on attribute a.
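As a minimal sketch of this definition, the function below computes the information gain of an attribute over a set of training examples and then picks the best attribute to split on. The dataset, the attribute names `outlook` and `windy`, and the class label `play` are all hypothetical, chosen only for illustration.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy H (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(examples, attr, label="play"):
    """IG(Ex, a) = H(Ex) minus the weighted entropy of each subset
    Ex_v = {x in Ex : val(x, a) = v}, for each value v of attribute a."""
    labels = [x[label] for x in examples]
    gain = entropy(labels)
    n = len(examples)
    for v in {x[attr] for x in examples}:
        subset = [x[label] for x in examples if x[attr] == v]
        gain -= len(subset) / n * entropy(subset)
    return gain

# Hypothetical training examples (attribute and label names are made up).
examples = [
    {"outlook": "sunny",    "windy": False, "play": "no"},
    {"outlook": "sunny",    "windy": True,  "play": "no"},
    {"outlook": "overcast", "windy": False, "play": "yes"},
    {"outlook": "rainy",    "windy": False, "play": "yes"},
    {"outlook": "rainy",    "windy": True,  "play": "no"},
]

# The attribute with the highest information gain is the preferred split.
best = max(["outlook", "windy"], key=lambda a: information_gain(examples, a))
```

Recursively applying this selection to each resulting subset is the core of decision-tree induction algorithms such as ID3.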