The AIC is not a test of the model in the sense of hypothesis testing; rather, it is a tool for model selection. Given a data set, several competing models may be ranked according to their AIC, with the one having the lowest AIC being the best. From the AIC values one may infer that, e.g., the top three models are in a tie and the rest are far worse, but one should not assign a value above which a given model is 'rejected'.
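The ranking described above can be sketched in a few lines of Python. This is only an illustration: the function name `rank_by_aic` and the AIC values are hypothetical and do not come from any data set discussed here.

```python
def rank_by_aic(aic_by_model):
    """Return (name, aic, delta) tuples sorted from lowest to highest AIC.

    delta is the difference from the best (lowest) AIC; only these
    differences matter when comparing models, not the absolute values.
    """
    best = min(aic_by_model.values())
    ranked = sorted(aic_by_model.items(), key=lambda item: item[1])
    return [(name, aic, aic - best) for name, aic in ranked]


# Hypothetical AIC values, made up for illustration only.
example = {"A": 100.0, "B": 100.5, "C": 101.0, "D": 125.0}
for name, aic, delta in rank_by_aic(example):
    print(f"{name}: AIC={aic:.1f}, delta={delta:.1f}")
```

In this made-up example, models A, B and C have nearly equal AIC values and D is far worse. The code reports differences only and deliberately applies no rejection threshold.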
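For the remainder of this entry, it will be assumed that the model errors are normally and independently distributed. Let n be the number of observations and RSS be the residual sum of squares, $\mathrm{RSS} = \sum_{i=1}^{n} \hat{\varepsilon}_i^2$, where $\hat{\varepsilon}_i$ is the i-th residual of the fitted model.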
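Under the normal-error assumption, the maximized log-likelihood can be written in terms of the residual sum of squares (RSS), so the AIC follows directly from RSS, n and the number of estimated parameters k. The sketch below assumes the standard definition AIC = 2k - 2 ln(L) and a maximum-likelihood variance estimate of RSS/n. The function name `gaussian_aic` is hypothetical.

```python
import math


def gaussian_aic(rss, n, k):
    """AIC for a model with i.i.d. normal errors, computed from RSS.

    Assumes the error variance is estimated by maximum likelihood as rss / n,
    so the maximized log-likelihood is -n/2 * (ln(2*pi*rss/n) + 1).
    k is the number of estimated parameters; whether the error variance
    is counted in k is a convention left to the caller.
    """
    log_likelihood = -0.5 * n * (math.log(2.0 * math.pi * rss / n) + 1.0)
    return 2.0 * k - 2.0 * log_likelihood
```

For fixed RSS and n, each additional parameter raises the value by exactly 2, so an extra parameter only lowers the AIC if it reduces RSS enough to offset that penalty.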
Increasing the number of free parameters to be estimated improves the goodness of fit, regardless of the number of free parameters in the data generating process. Hence AIC not only rewards goodness of fit, but also includes a penalty that is an increasing function of the number of estimated parameters. This penalty discourages overfitting. The preferred model is the one with the lowest AIC value. The AIC methodology attempts to find the model that best explains the data with a minimum of free parameters. By contrast, more traditional approaches to modeling start from a null hypothesis. The AIC penalizes free parameters less strongly than does the Schwarz criterion.
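The comparison with the Schwarz criterion can be made concrete by looking at the penalty terms alone. The sketch below assumes the usual forms, 2k for the AIC and k ln(n) for the Schwarz criterion. The function names are hypothetical.

```python
import math


def aic_penalty(k):
    """Penalty term of the AIC: 2 per estimated parameter."""
    return 2 * k


def schwarz_penalty(k, n):
    """Penalty term of the Schwarz criterion: ln(n) per estimated parameter."""
    return k * math.log(n)


# Compare the two penalties for k = 3 parameters at several sample sizes.
for n in (5, 8, 100, 10000):
    print(f"n={n}: AIC penalty={aic_penalty(3)}, "
          f"Schwarz penalty={schwarz_penalty(3, n):.2f}")
```

Under these forms, the Schwarz penalty exceeds the AIC penalty whenever ln(n) > 2, that is, for n of 8 or more.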
AIC judges a model by how close its fitted values tend to be to the true values, in terms of a certain expected value.
Often, one wishes to select amongst competing models where the likelihood function assumes that the underlying errors are normally distributed. This assumption leads to $\chi^2$ data fitting.
For any set of models where the number of data points, n, is the same, one can use a slightly altered AIC. For the purposes of this article, this will be called $\mathrm{AIC}_{\chi^2}$. It differs from the AIC only through an additive constant, which is a function only of n. As only differences in the AIC are relevant, this constant can be ignored. $\mathrm{AIC}_{\chi^2}$ is given by $\mathrm{AIC}_{\chi^2} = \chi^2 + 2k$, where k is the number of estimated parameters.
This form is often convenient in that data fitting programs produce $\chi^2$ as a statistic for the fit. For models with the same number of data points, the one with the lowest $\mathrm{AIC}_{\chi^2}$ should be preferred.
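As a minimal sketch, and assuming the altered AIC meant here is the chi-squared form (the fit statistic plus 2k), the quantity can be computed directly from the output of a fitting program. The function names and the per-point uncertainties `sigma` are assumptions made for illustration.

```python
def chi_squared(observed, predicted, sigma):
    """Sum of squared residuals, each scaled by its measurement uncertainty."""
    return sum(((y - f) / s) ** 2 for y, f, s in zip(observed, predicted, sigma))


def aic_chi2(chi2, k):
    """Altered AIC for normally distributed errors: chi-squared plus 2k.

    Only meaningful for comparing models fitted to the same n data points,
    because an n-dependent additive constant has been dropped.
    """
    return chi2 + 2 * k
```

For example, two models fitted to the same data with chi-squared values of 12.0 (k = 2) and 11.5 (k = 4) would score 16.0 and 19.5, so the smaller model would be preferred.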
Since AICc converges to AIC as n gets large, AICc should be employed regardless of sample size (Burnham and Anderson, 2004).
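The convergence can be illustrated numerically. The sketch below assumes the commonly quoted form of the correction, AICc = AIC + 2k(k+1)/(n-k-1). That form is an assumption here, since this section does not state it, and the function name `aicc` is hypothetical.

```python
def aicc(aic, n, k):
    """Small-sample corrected AIC, assuming AICc = AIC + 2k(k+1)/(n-k-1)."""
    if n - k - 1 <= 0:
        raise ValueError("the correction requires n > k + 1")
    return aic + 2.0 * k * (k + 1) / (n - k - 1)


# The correction term shrinks toward zero as n grows (here with k = 2).
for n in (10, 100, 10000):
    print(f"n={n}: correction={aicc(0.0, n, 2):.5f}")
```

Because the extra term vanishes for large n and matters only for small n, using the corrected value throughout costs nothing when the sample is large.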
McQuarrie and Tsai (1998: 22) define AICc as:
and propose (p. 32) the closely related measure:
McQuarrie and Tsai ground their high opinion of AICc and AICu on extensive simulation work.
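QAIC (the quasi-AIC) is defined as $\mathrm{QAIC} = 2k - \frac{1}{c}\, 2\ln L$, where L is the maximized likelihood, k is the number of estimated parameters, and c is a variance inflation factor. QAIC adjusts for over-dispersion or lack of fit. The small sample version of QAIC is $\mathrm{QAIC}_c = \mathrm{QAIC} + \frac{2k(k+1)}{n-k-1}$.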
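A minimal sketch of both quantities follows, assuming the usual forms QAIC = 2k - 2 ln(L)/c and QAICc = QAIC + 2k(k+1)/(n-k-1). The function names and the argument `c_hat` (an estimate of the variance inflation factor) are hypothetical.

```python
def qaic(log_likelihood, k, c_hat):
    """Quasi-AIC: the log-likelihood term is divided by the inflation factor.

    Some conventions also add 1 to k to account for estimating c_hat;
    that choice is left to the caller here.
    """
    return -2.0 * log_likelihood / c_hat + 2.0 * k


def qaicc(log_likelihood, k, n, c_hat):
    """Small-sample version: QAIC plus the correction 2k(k+1)/(n-k-1)."""
    return qaic(log_likelihood, k, c_hat) + 2.0 * k * (k + 1) / (n - k - 1)
```

With `c_hat` equal to 1 the first function reduces to the ordinary AIC. Values above 1 shrink the likelihood term, so the parameter penalty weighs relatively more.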