# Familywise error rate

In statistics, the familywise error rate (FWER) is the probability of making one or more false discoveries, or Type I errors, among all the hypotheses when performing multiple hypothesis tests.

## Classification of m hypothesis tests

The following table defines some random variables related to the $m$ hypothesis tests.

| | Number declared non-significant | Number declared significant | Total |
|---|---|---|---|
| Number of true null hypotheses | $U$ | $V$ | $m_0$ |
| Number of non-true null hypotheses | $T$ | $S$ | $m - m_0$ |
| Total | $m - R$ | $R$ | $m$ |

The $m$ specific hypotheses of interest are assumed to be known, but the number of true null hypotheses, $m_0$, and the number of alternative hypotheses, $m_1 = m - m_0$, are unknown. $V$ is the number of Type I errors (hypotheses declared significant when they actually come from the null distribution). $T$ is the number of Type II errors (hypotheses declared non-significant when they actually come from the alternative distribution). $R$ is an observable random variable, while $S$, $T$, $U$, and $V$ are unobservable random variables.
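To make the table concrete, the sketch below tallies its cells from two boolean sequences: which hypotheses are truly null and which were declared significant. The function `classify_tests` and its argument names are hypothetical, introduced only for illustration. In a real analysis `null_is_true` is unknown, which is exactly why only $R$ can be observed; a simulation, where the truth is known, is the one setting where all cells can be counted.

```python
from typing import Dict, Sequence


def classify_tests(null_is_true: Sequence[bool], rejected: Sequence[bool]) -> Dict[str, int]:
    """Tally the cells of the m-test classification table.

    null_is_true[i]: whether hypothesis i is actually a true null (unknown in practice).
    rejected[i]:     whether test i was declared significant.
    """
    if len(null_is_true) != len(rejected):
        raise ValueError("both sequences must have length m")
    counts = {"U": 0, "V": 0, "T": 0, "S": 0}
    for is_null, is_rejected in zip(null_is_true, rejected):
        if is_null:
            # True null: rejecting it is a Type I error (V), otherwise a correct U
            counts["V" if is_rejected else "U"] += 1
        else:
            # Non-true null: missing it is a Type II error (T), otherwise a correct S
            counts["S" if is_rejected else "T"] += 1
    counts["m"] = len(rejected)
    counts["m0"] = counts["U"] + counts["V"]   # row total for true nulls
    counts["R"] = counts["V"] + counts["S"]    # column total: declared significant
    return counts


# Five tests; the first three null hypotheses are true.
cells = classify_tests(
    null_is_true=[True, True, True, False, False],
    rejected=[False, True, False, True, False],
)
print(cells)  # {'U': 2, 'V': 1, 'T': 1, 'S': 1, 'm': 5, 'm0': 3, 'R': 2}
```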

In terms of random variables,

$\mathrm{FWER} = \Pr(V \ge 1),$

or equivalently,

$\mathrm{FWER} = 1 - \Pr(V = 0).$
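The definition can be checked numerically. The sketch below is a Monte Carlo estimate of $\Pr(V \ge 1)$ under two simplifying assumptions that are not part of the definition itself: all $m$ null hypotheses are true ($m_0 = m$), and the tests are independent, so each p-value is uniform on (0, 1). Under those assumptions $\Pr(V = 0) = (1 - \alpha)^m$, so the estimate can be compared with $1 - (1 - \alpha)^m$. The helper `simulate_fwer` is hypothetical.

```python
import random


def simulate_fwer(m: int, alpha: float, n_sims: int = 50_000, seed: int = 0) -> float:
    """Monte Carlo estimate of FWER = Pr(V >= 1).

    Assumes all m null hypotheses are true and the m tests are independent,
    so each p-value is Uniform(0, 1) and a test rejects when p <= alpha.
    """
    rng = random.Random(seed)
    runs_with_false_discovery = 0
    for _ in range(n_sims):
        # V = number of true nulls rejected in this simulated family of m tests
        v = sum(rng.random() <= alpha for _ in range(m))
        if v >= 1:
            runs_with_false_discovery += 1
    return runs_with_false_discovery / n_sims


m, alpha = 10, 0.05
exact = 1 - (1 - alpha) ** m  # 1 - Pr(V = 0), valid only under independence
print(f"simulated FWER, per-test alpha:     {simulate_fwer(m, alpha):.3f}")
print(f"exact FWER, per-test alpha:         {exact:.3f}")
print(f"simulated FWER, Bonferroni alpha/m: {simulate_fwer(m, alpha / m):.3f}")
```

With $m = 10$ and $\alpha = 0.05$ the exact value is about 0.401, so testing every hypothesis at level α gives roughly a 40% chance of at least one false discovery. Testing each at α/m (a Bonferroni correction) brings the FWER down to $1 - 0.995^{10} \approx 0.049$ under the same assumptions, and the simulated estimate lands close to that.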

## What constitutes a family?

In confirmatory studies (i.e., where one specifies a finite number of a priori inferences), families of hypotheses are defined by which conclusions need to be jointly accurate or by which hypotheses are similar in content/purpose. As noted by Hochberg and Tamhane (1987), "If these inferences are unrelated in terms of their content or intended use (although they may be statistically dependent), then they should be treated separately and not jointly" (p. 6).

For example, one might conduct a randomized clinical trial of a new antidepressant drug using three groups: existing drug, new drug, and placebo. In such a design, one might be interested in whether depressive symptoms (measured, for example, by a Beck Depression Inventory score) decreased to a greater extent for those taking the new drug than for those taking the existing drug. Further, one might be interested in whether any side effects (e.g., hypersomnia, decreased sex drive, and dry mouth) were observed. In such a case, two families would likely be identified: (1) the effect of the drug on depressive symptoms and (2) the occurrence of any side effects.

Thus, one would assign an acceptable Type I error rate, α (usually 0.05), to each family and control the familywise error rate using an appropriate multiple comparison procedure. For the first family, the effect of the antidepressant on depressive symptoms, the pairwise comparisons among groups (here, three possible comparisons) would be jointly controlled using a technique such as Tukey's honestly significant difference (HSD) procedure or a Bonferroni correction.

For the side-effect profile, one would likely want to control the Type I error rate for all side effects considered jointly, so that conclusions about the profile are not undermined by the inflated error rate that results when each side effect and each pairwise comparison among groups receives its own uncorrected α. With 3 side effects and 3 pairwise comparisons per side effect there are 9 tests, and the Bonferroni inequality then bounds the probability of at least one Type I error only by 0.05 × 3 × 3 = 0.45 (i.e., up to a 45% chance of making a Type I error). A more appropriate control of the side-effect familywise error rate might therefore divide α by three (0.05/3 ≈ 0.0167) and allocate 0.0167 to each side effect's multiple comparison procedure. In the case of Tukey's HSD (a procedure that provides strong control of the FWER), one would determine the critical value of Q, the studentized range statistic, at α = 0.0167.
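The arithmetic in this example can be reproduced in a few lines. The sketch below uses a hypothetical helper, `bonferroni_upper_bound`, which simply sums per-test significance levels (capped at 1), as the Bonferroni inequality allows. The independent-tests figure is included only for comparison and rests on an assumption the trial design does not guarantee. The Tukey critical value itself is not computed here.

```python
def bonferroni_upper_bound(alphas):
    """Bonferroni inequality: Pr(at least one Type I error) <= sum of per-test alphas."""
    return min(1.0, sum(alphas))


alpha = 0.05
n_side_effects = 3
comparisons_per_side_effect = 3  # existing vs new, existing vs placebo, new vs placebo

# Uncorrected: each of the 3 x 3 = 9 tests gets its own alpha.
uncorrected = [alpha] * (n_side_effects * comparisons_per_side_effect)
print(round(bonferroni_upper_bound(uncorrected), 2))  # 0.45

# Comparison only: if the 9 tests were independent (an assumption that
# generally does not hold here), the exact FWER would be 1 - (1 - alpha)**9.
print(round(1 - (1 - alpha) ** len(uncorrected), 3))  # 0.37

# Corrected: split the family alpha across the three side effects; each side
# effect's procedure (e.g. Tukey's HSD) then controls its three comparisons at that level.
per_side_effect_alpha = alpha / n_side_effects
print(round(per_side_effect_alpha, 4))  # 0.0167
print(round(bonferroni_upper_bound([per_side_effect_alpha] * n_side_effects), 2))  # 0.05
```

The independent-tests value (about 0.37) sits below the 0.45 bound, which is why the Bonferroni figure is best read as a worst-case ceiling rather than the actual error rate.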