In statistics, a confounding variable (also confounding factor, lurking variable, confound, or confounder) is an extraneous variable in a statistical model that correlates (positively or negatively) with both the dependent variable and the independent variable. The methodology of a scientific study therefore needs to control for such factors to avoid a type I error: a 'false positive' conclusion that the dependent variable is in a causal relationship with the independent variable. Such a relation between two observed variables is termed a spurious relationship. Confounding is thus a major threat to the validity of inferences about cause and effect (internal validity), because the observed effects may be attributable to the confounder rather than to the independent variable.
For example, assume that a child's weight and a country's gross domestic product (GDP) both rise over time. A person carrying out a study could measure weight and GDP and conclude that a higher GDP causes children to gain weight, or that children's weight gain boosts the GDP. However, the confounding variable, time, was not accounted for, and it is the real cause of both rises.
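The weight/GDP example can be reproduced with a small simulation. The sketch below is illustrative only: the linear trends, noise levels, and 200 arbitrary time steps are invented, and "controlling for time" is done here by correlating the residuals of ordinary least-squares fits on time (a partial correlation), which is one common way to adjust for a single measured confounder, not the only one.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def residuals(ys, ts):
    """Residuals of an ordinary least-squares fit of ys on ts."""
    n = len(ts)
    mt, my = sum(ts) / n, sum(ys) / n
    slope = sum((t - mt) * (y - my) for t, y in zip(ts, ys)) / sum((t - mt) ** 2 for t in ts)
    intercept = my - slope * mt
    return [y - (intercept + slope * t) for t, y in zip(ts, ys)]


rng = random.Random(42)
times = list(range(200))  # hypothetical time steps
# Both series depend on time plus independent noise; neither affects the other.
weight = [3.0 + 0.5 * t + rng.gauss(0, 2) for t in times]
gdp = [100.0 + 2.0 * t + rng.gauss(0, 5) for t in times]

crude = pearson(weight, gdp)
partial = pearson(residuals(weight, times), residuals(gdp, times))
print(f"crude r = {crude:.3f}, r controlling for time = {partial:.3f}")
```

The crude correlation is close to 1, while the correlation after removing the time trend from both series is close to 0, which is what one expects when time is the only link between them.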
By definition, a confounding variable is associated with both the probable cause and the outcome. The confounder must not lie on the causal pathway between the cause and the outcome: if A is thought to be the cause of disease C, the confounding variable B must not be caused solely by A, and B must not always lead to C. For example, being female does not always lead to smoking tobacco, and smoking tobacco does not always lead to cancer. Therefore, any study that tries to elucidate the relation between being female and cancer should take smoking into account as a possible confounder. In addition, a confounder is always a risk factor that has a different prevalence in the two groups being compared (e.g. females and males) (Hennekens, Buring & Mayrent, 1987).
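To see how stratifying by smoking changes the apparent female/cancer association, consider the hypothetical counts below. They are invented for illustration: smoking prevalence is set to 20% among females and 50% among males, and cancer risk is set to depend only on smoking (10% for smokers, 2% for non-smokers). The sketch compares the crude risk ratio with the stratum-specific ratios and with the Mantel-Haenszel pooled risk ratio, one standard estimator for stratified 2x2 tables. The names `STRATA`, `crude_risk_ratio`, and `mantel_haenszel_risk_ratio` are made up for this example.

```python
# Hypothetical counts per smoking stratum:
# (cancer cases among females, females, cancer cases among males, males)
STRATA = {
    "smokers": (20, 200, 50, 500),
    "non-smokers": (16, 800, 10, 500),
}


def risk_ratio(a, n1, c, n0):
    """Risk among females divided by risk among males."""
    return (a / n1) / (c / n0)


def crude_risk_ratio(strata):
    """Collapse all strata into a single 2x2 table, ignoring smoking."""
    a, n1, c, n0 = (sum(col) for col in zip(*strata.values()))
    return risk_ratio(a, n1, c, n0)


def mantel_haenszel_risk_ratio(strata):
    """Pooled risk ratio that adjusts for the stratifying variable."""
    num = sum(a * n0 / (n1 + n0) for a, n1, c, n0 in strata.values())
    den = sum(c * n1 / (n1 + n0) for a, n1, c, n0 in strata.values())
    return num / den


print(f"crude RR (female vs male): {crude_risk_ratio(STRATA):.2f}")
for name, counts in STRATA.items():
    print(f"RR within {name}: {risk_ratio(*counts):.2f}")
print(f"Mantel-Haenszel RR: {mantel_haenszel_risk_ratio(STRATA):.2f}")
```

With these numbers, the crude risk ratio is 0.60, suggesting that being female is "protective". Within each smoking stratum, however, the ratio is 1.00, and so is the pooled Mantel-Haenszel estimate. The apparent association comes entirely from the different smoking prevalence in the two groups.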
Although criteria for causality in statistical studies have been researched intensively, Pearl has shown that confounding variables cannot be defined in terms of statistical notions alone; some causal assumptions are necessary. In a 1965 paper, Austin Bradford Hill proposed a set of causal criteria. Many working epidemiologists take these as a good starting point when considering confounding and causation, although they are of heuristic value at best. When causal assumptions are expressed in the form of a causal graph, a simple criterion, the back-door criterion, is available to identify sets of variables whose adjustment removes confounding.
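The back-door criterion can be checked mechanically on a small causal graph. A set Z satisfies it relative to a cause X and an outcome Y if (i) no member of Z is a descendant of X, and (ii) Z blocks every path between X and Y that starts with an arrow pointing into X. The sketch below is a brute-force illustration that enumerates all paths, so it is only suitable for tiny graphs. The graph is hypothetical: B is a common cause of X and Y, M is a mediator, and C is a collider. All function names are made up for this example.

```python
from itertools import chain

# Hypothetical DAG: each node maps to the set of its direct children.
GRAPH = {
    "B": {"X", "Y"},  # B is a common cause (confounder) of X and Y
    "X": {"M", "C"},  # X affects Y only through the mediator M
    "M": {"Y"},
    "Y": {"C"},       # C is a collider: X -> C <- Y
}


def descendants(g, node):
    """All nodes reachable from `node` along directed edges (excluding node)."""
    seen, stack = set(), [node]
    while stack:
        for child in g.get(stack.pop(), ()):
            if child not in seen:
                seen.add(child)
                stack.append(child)
    return seen


def has_edge(g, a, b):
    return b in g.get(a, ())


def paths_ignoring_direction(g, start, end):
    """Yield every simple path from start to end in the undirected skeleton."""
    nodes = set(g) | set(chain.from_iterable(g.values()))
    nbrs = {n: set() for n in nodes}
    for a, children in g.items():
        for b in children:
            nbrs[a].add(b)
            nbrs[b].add(a)

    def walk(path):
        if path[-1] == end:
            yield list(path)
            return
        for nxt in sorted(nbrs[path[-1]]):
            if nxt not in path:
                path.append(nxt)
                yield from walk(path)
                path.pop()

    yield from walk([start])


def is_blocked(g, path, z):
    """Apply the d-separation blocking rules to one path given the set z."""
    for prev, mid, nxt in zip(path, path[1:], path[2:]):
        if has_edge(g, prev, mid) and has_edge(g, nxt, mid):  # collider
            if mid not in z and not (descendants(g, mid) & z):
                return True
        elif mid in z:  # chain or fork whose middle node is conditioned on
            return True
    return False


def satisfies_backdoor(g, x, y, z):
    """Check the back-door criterion for adjustment set z relative to (x, y)."""
    if z & descendants(g, x):  # (i) no descendant of x may be in z
        return False
    for path in paths_ignoring_direction(g, x, y):
        # (ii) every path that starts with an arrow into x must be blocked
        if has_edge(g, path[1], x) and not is_blocked(g, path, z):
            return False
    return True


for z in [set(), {"B"}, {"M"}, {"B", "C"}]:
    print(sorted(z), satisfies_backdoor(GRAPH, "X", "Y", z))
```

In this graph, only {B} qualifies. The empty set leaves the back-door path X <- B -> Y open. {M} and {B, C} are rejected because M and C are descendants of X, and adjusting for them would remove part of the effect or open a collider path.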
There are various ways to modify a study design to actively exclude or control confounding variables:
All these methods have their drawbacks, as the following example shows. A 45-year-old African American from Alaska, an avid football player and vegetarian working in education, suffers from a disease and is enrolled in a case-control study. Proper matching would call for a control with the same characteristics, differing only in being healthy, but finding such a person would be an enormous task. Additionally, there is always a risk of over- and undermatching the study population. In cohort studies, too many people can be excluded; and in stratification, individual strata can become too thin, containing too few subjects to yield statistically significant results.
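The thinning of strata follows from simple arithmetic: the number of strata is the product of the number of categories of each matching variable. The sketch below uses the characteristics from the example, but the category counts and the sample size of 5,000 are hypothetical choices made only to show the growth.

```python
# Hypothetical number of categories for each matching/stratification variable.
LEVELS = {
    "age band (5-year)": 14,
    "ethnicity": 5,
    "state of residence": 50,
    "plays football": 2,
    "vegetarian": 2,
    "occupational sector": 20,
}
SAMPLE_SIZE = 5_000  # hypothetical number of study subjects


def strata_growth(levels, n):
    """Return (variable, cumulative number of strata, mean subjects per stratum)."""
    rows, count = [], 1
    for name, k in levels.items():
        count *= k
        rows.append((name, count, n / count))
    return rows


for name, strata, per_stratum in strata_growth(LEVELS, SAMPLE_SIZE):
    print(f"+ {name:<22} {strata:>8} strata, {per_stratum:10.3f} subjects per stratum")
```

After all six variables, there are 280,000 strata, so even a study of 5,000 subjects averages far fewer than one subject per stratum.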
One major problem is that confounding variables are not always known or measurable. This leads to 'residual confounding', epidemiological jargon for incompletely controlled confounding. Hence, randomization is often the best solution: if it is performed successfully on a sufficiently large number of subjects, it tends to distribute all confounding variables (known and unknown) equally across the study groups.
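A short simulation illustrates why randomization also handles unknown confounders. In the sketch below, the 30% prevalence of a hidden confounder, the population size of 10,000, and the self-selection probabilities (80% versus 20%) are all hypothetical. The randomized split never looks at the confounder, yet the confounder ends up nearly balanced. Letting people choose their own group produces a large imbalance.

```python
import random


def prevalence_gap(has_confounder, treated):
    """Confounder prevalence in the treated group minus that in the untreated group."""
    t = [c for c, g in zip(has_confounder, treated) if g]
    u = [c for c, g in zip(has_confounder, treated) if not g]
    return sum(t) / len(t) - sum(u) / len(u)


rng = random.Random(7)
n = 10_000
# Hypothetical unmeasured confounder present in about 30% of subjects.
hidden = [rng.random() < 0.3 for _ in range(n)]

# Randomized assignment: shuffle the subjects and treat the first half.
order = list(range(n))
rng.shuffle(order)
randomized = [False] * n
for i in order[: n // 2]:
    randomized[i] = True

# Self-selection (hypothetical rule): subjects with the confounder opt in more often.
self_selected = [rng.random() < (0.8 if h else 0.2) for h in hidden]

print(f"randomized gap:    {prevalence_gap(hidden, randomized):+.3f}")
print(f"self-selected gap: {prevalence_gap(hidden, self_selected):+.3f}")
```

The randomized gap is typically within about one percentage point of zero. The self-selected gap is around +0.5, because roughly 63% of the self-selected treated group carries the confounder, compared with about 10% of the rest.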