Following a statistical study, a layman may well ask: "How much confidence
can we have in these conclusions?". A problem immediately arises because a statistician's technical understanding of the term "confidence" can differ radically from a layperson's.
Scope of the question
The question "how much confidence
can we have in these conclusions?" can have several ramifications, some of which are:
- how reliable are the individual items of data being analysed: do the values measure what they are supposed to measure?
- how extensive is the dataset?
- how representative of the target population is the sample selected?
- how accurately can the important quantities (possibly the sizes of the effects of interventions) be estimated from the dataset?
- if testing whether an intervention has an effect, what is the smallest size of effect that could reliably have been detected from a dataset such as the one available?
The last two questions correspond broadly to the outcomes of statistical analyses using confidence intervals and to examining the statistical power of a test, but careful interpretation is needed. Other statistical approaches to these questions are also available.
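As a rough illustration of the last of these questions, the smallest reliably detectable effect can be sketched under simple assumptions: a normal model with known standard deviation, a two-sided 5% significance level, and 80% power. The sample size and standard deviation below are hypothetical values chosen purely for illustration:

```python
import math

def minimum_detectable_effect(sigma, n, z_alpha=1.96, z_power=0.8416):
    """Smallest true effect detectable with ~80% power at a two-sided
    5% significance level, for the mean of n observations with known
    standard deviation sigma (normal approximation).

    z_alpha = 1.96   -> 97.5th percentile of the standard normal
    z_power = 0.8416 -> 80th percentile of the standard normal
    """
    standard_error = sigma / math.sqrt(n)
    return (z_alpha + z_power) * standard_error

# Hypothetical example: 100 observations with standard deviation 1.
mde = minimum_detectable_effect(sigma=1.0, n=100)
print(round(mde, 3))  # about 0.28: smaller true effects will often go undetected
```

A study of this size offers little assurance about effects much smaller than this threshold, which is one reason "no effect detected" is not the same as "no effect exists".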
Meaning of the term confidence
There is a difference in meaning between the common usage of the word 'confidence' and its statistical usage, which is often confusing to the layman. In common usage, a claim of 95% confidence in something is normally taken as indicating virtual certainty. In statistics, a claim of 95% confidence simply means that the researcher has observed something that would happen no more than one time in twenty if chance alone were at work. If one were to roll two dice and get a double six (which happens with probability 1/36, or about 3% of the time), few would claim this as proof that the dice were fixed, although statistically speaking one could have about 97% confidence that they were. Similarly, the finding of a statistical link at 95% confidence is not proof, nor even very good evidence, that there is any real connection between the things linked.
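The dice figure can be checked directly: the probability of rolling a double six with two fair dice is 1/36, so observing one corresponds to roughly 97% 'confidence' in the statistical sense:

```python
from fractions import Fraction

p_double_six = Fraction(1, 36)   # probability of a double six with two fair dice
confidence = 1 - p_double_six    # statistical 'confidence' that the dice are fixed

print(float(p_double_six))  # about 0.028, i.e. under 3%
print(float(confidence))    # about 0.972, i.e. roughly 97%
```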
When a study involves multiple statistical tests, some laymen assume that the confidence associated with the individual tests is the confidence one should have in the results of the study itself. In fact, the results of all the statistical tests conducted during a study must be judged as a whole when determining what confidence one may place in the positive links it produces. If researchers conducting a study perform 40 independent statistical tests of the existence of an effect at a 5% significance level, they can expect about two of the tests to return false positives. If they in fact find 3 tests whose result is "effect detected", this is weak evidence for the conclusion, 'as the result of the survey', that the effect exists: the probability of seeing three or more positives by chance alone is about 32%, so one should expect such an outcome roughly one time in three even if the effect does not exist.
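The 32% figure follows from the binomial distribution: with 40 independent tests, each carrying a 5% false-positive rate under the null hypothesis, the chance of three or more positives arising purely by chance can be computed as:

```python
from math import comb

def prob_at_least(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p), via the complement of P(X < k)."""
    return 1 - sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k))

# 40 independent tests, each with a 5% chance of a false positive.
chance = prob_at_least(3, n=40, p=0.05)
print(round(chance, 3))  # about 0.323, i.e. roughly 32%
```

Because such a pattern arises by chance roughly one time in three, three positive results out of 40 tests carry far less weight than the "95% confidence" attached to each test individually might suggest.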