
# Design of experiments

Design of experiments, or experimental design, is the design of all information-gathering exercises where variation is present, whether under the full control of the experimenter or not. (The latter situation is usually called an observational study.) Often the experimenter is interested in the effect of some process or intervention (the "treatment") on some objects (the "experimental units"), which may be people. Design of experiments is thus a discipline that has very broad application across all the natural and social sciences.

## Early example of experimental design

In 1747, while serving as surgeon on HM Bark Salisbury, James Lind carried out a controlled experiment to develop a cure for scurvy.

Lind selected 12 men from the ship, all suffering from scurvy, and divided them into six pairs, giving each pair a different addition to their basic diet for a period of two weeks. The treatments were all remedies that had been proposed at one time or another. They were:

• A quart of cider per day
• Twenty-five gutts (drops) of elixir vitriol three times a day upon an empty stomach
• Half a pint of seawater every day
• A mixture of garlic, mustard and horseradish, in a lump the size of a nutmeg
• Two spoonfuls of vinegar three times a day
• Two oranges and one lemon every day

The men who had been given citrus fruits recovered dramatically within a week. One of them returned to duty after 6 days and the other became nurse to the rest. The others experienced some improvement, but nothing was comparable to the citrus fruits, which were proved to be substantially superior to the other treatments.

In this study his subjects' cases "were as similar as I could have them"; that is, he applied strict entry requirements to reduce extraneous variation. The men were paired, which provided replication. From a modern perspective, the main thing that is missing is randomized allocation of subjects to treatments.

## A formal mathematical theory

The first statistician to consider a formal mathematical methodology for designing experiments was Sir Ronald A. Fisher, in his landmark book The Design of Experiments (1935). As an example, he described how to test the hypothesis that a certain lady could distinguish by flavour alone whether the milk or the tea was first placed in the cup. While this sounds like a frivolous application, it allowed him to illustrate the most important ideas of experimental design:

1. Comparison

In many fields of study it is hard to reproduce measured results exactly. Comparisons between treatments are much more reproducible and are usually preferable. Often one compares against a standard or traditional treatment that acts as baseline.

2. Randomization

There is an extensive body of mathematical theory that explores the consequences of making the allocation of units to treatments by means of some random mechanism such as tables of random numbers, or the use of randomization devices such as playing cards or dice. Provided the sample size is adequate, the risks associated with random allocation (such as failing to obtain a representative sample in a survey, or having a serious imbalance in a key characteristic between a treatment group and a control group) are calculable and hence can be managed down to an acceptable level. Random does not mean haphazard, and great care must be taken that appropriate random methods are used.

3. Replication

Measurements are usually subject to variation, both between repeated measurements and between replicated items or processes. Multiple measurements of replicated items are necessary so the variation can be estimated.

4. Blocking

Blocking is the arrangement of experimental units into groups (blocks) that are similar to one another. Blocking reduces known but irrelevant sources of variation between units and thus allows greater precision in the estimation of the source of variation under study.

5. Orthogonality

Orthogonality concerns the forms of comparison (contrasts) that can be legitimately and efficiently carried out. Contrasts can be represented by vectors, and sets of orthogonal contrasts are uncorrelated and independently distributed if the data are normal. Because of this independence, each orthogonal contrast provides information different from that provided by the others. If there are T treatments and T – 1 orthogonal contrasts, all the information that can be captured from the experiment is obtainable from the set of contrasts.
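As a small illustration of contrasts represented as vectors, the following Python sketch checks that a set of T – 1 = 3 coefficient vectors for T = 4 treatments are contrasts (coefficients summing to zero) and are mutually orthogonal (zero dot products). The particular coefficients and the function names are hypothetical choices made for this example, and equal replication of every treatment is assumed.

```python
from itertools import combinations

def is_contrast(coefficients):
    """A contrast is a coefficient vector whose entries sum to zero."""
    return abs(sum(coefficients)) < 1e-12

def are_orthogonal(c1, c2):
    """With equal replication, two contrasts are orthogonal when their dot product is zero."""
    return abs(sum(a * b for a, b in zip(c1, c2))) < 1e-12

# Hypothetical set of T - 1 = 3 contrasts among T = 4 treatment means.
contrasts = [
    (1, -1, 0, 0),   # treatment 1 versus treatment 2
    (1, 1, -2, 0),   # treatments 1 and 2 versus treatment 3
    (1, 1, 1, -3),   # treatments 1 to 3 versus treatment 4
]

all_contrasts = all(is_contrast(c) for c in contrasts)
mutually_orthogonal = all(
    are_orthogonal(a, b) for a, b in combinations(contrasts, 2)
)
print(all_contrasts, mutually_orthogonal)
```

By comparison, the pair (1, -1, 0, 0) and (1, 0, -1, 0) are both contrasts but are not orthogonal, since their dot product is 1.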

6. Use of factorial experiments instead of the one-factor-at-a-time method. These are efficient at evaluating the effects and possible interactions of several factors (independent variables).
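To make the idea of a factorial experiment concrete, here is a minimal Python sketch of a 2 × 2 full factorial with two factors at coded levels -1 and +1. The response values are invented for illustration, and the names `runs`, `response` and `effect` are hypothetical. Each effect is computed as the average response where the relevant sign is +1 minus the average where it is -1.

```python
from itertools import product

# Full 2 x 2 factorial: every combination of the coded levels -1 and +1
# for two factors, here called A (first entry) and B (second entry).
runs = list(product((-1, 1), repeat=2))

# Hypothetical responses, one per run (invented numbers, not real data).
response = {(-1, -1): 20.0, (-1, 1): 30.0, (1, -1): 40.0, (1, 1): 52.0}

def effect(sign):
    """Mean response where sign(run) is +1 minus mean response where it is -1."""
    high = [response[r] for r in runs if sign(r) > 0]
    low = [response[r] for r in runs if sign(r) < 0]
    return sum(high) / len(high) - sum(low) / len(low)

main_a = effect(lambda r: r[0])                 # main effect of factor A
main_b = effect(lambda r: r[1])                 # main effect of factor B
interaction_ab = effect(lambda r: r[0] * r[1])  # A x B interaction

print(main_a, main_b, interaction_ab)
```

All four runs contribute to every estimate, which is the source of the efficiency: a one-factor-at-a-time plan would use each run for only one comparison and could not estimate the interaction at all.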

Analysis of the design of experiments was built on the foundation of the analysis of variance, a collection of models in which the observed variance is partitioned into components due to different factors which are estimated and/or tested.
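The partition of variation mentioned above can be sketched for the simplest case, a one-way layout, where the total sum of squares splits into a between-treatment part and a within-treatment part. The data below are invented for illustration and the function name `anova_partition` is a hypothetical choice; this is a sketch of the arithmetic, not a full analysis.

```python
def anova_partition(groups):
    """Return (ss_total, ss_between, ss_within) for a one-way layout.

    `groups` is a list of lists, one inner list of observations per treatment.
    """
    all_values = [y for g in groups for y in g]
    grand_mean = sum(all_values) / len(all_values)
    group_means = [sum(g) / len(g) for g in groups]

    ss_total = sum((y - grand_mean) ** 2 for y in all_values)
    ss_between = sum(
        len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means)
    )
    ss_within = sum(
        (y - m) ** 2 for g, m in zip(groups, group_means) for y in g
    )
    return ss_total, ss_between, ss_within

# Hypothetical data: three treatments with three replicates each.
groups = [[4.0, 5.0, 6.0], [7.0, 8.0, 9.0], [1.0, 2.0, 3.0]]
ss_total, ss_between, ss_within = anova_partition(groups)

# Mean squares and the F ratio used to test for treatment differences.
k = len(groups)
n = sum(len(g) for g in groups)
f_ratio = (ss_between / (k - 1)) / (ss_within / (n - k))
print(ss_total, ss_between, ss_within, f_ratio)
```

For these numbers the total sum of squares is 60, of which 54 is between treatments and 6 is within treatments.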

Some efficient designs for estimating several main effects simultaneously were found by Raj Chandra Bose and K. Kishen in 1940 at the Indian Statistical Institute, but remained little known until the Plackett-Burman designs were published in Biometrika in 1946. About the same time, C. R. Rao introduced the concept of orthogonal arrays as experimental designs. This concept played a central role in the development of Taguchi methods by G. Taguchi, which took place during his visit to the Indian Statistical Institute in the early 1950s. His methods were successfully applied and adopted by Japanese and Indian industries and were subsequently also embraced by US industry, albeit with some reservations.

In 1950, Gertrude Mary Cox and William Gemmell Cochran published the book Experimental Designs, which became the major reference work on the design of experiments for statisticians for years afterwards.

Developments of the theory of linear models have encompassed and surpassed the cases that concerned early writers. Today, the theory rests on advanced topics in linear algebra, abstract algebra and combinatorics.

As with all other branches of statistics, there is both classical and Bayesian experimental design.

Some important contributors to the field of experimental designs are R. A. Fisher, R. C. Bose, C. R. Rao, J. Kiefer, Jagdish/Jaya N. Srivastava, Genichi Taguchi, Ravindra Khattree, D. Raghavarao and Raymond Myers.

## Example

This example is attributed to Harold Hotelling. It conveys some of the flavor of those aspects of the subject that involve combinatorial designs.

The weights of eight objects are to be measured using a pan balance and a set of standard weights. Each weighing measures the weight difference between the objects placed in the left pan and any objects placed in the right pan. Each measurement has a random error. The average error is zero; the standard deviation of the probability distribution of the errors is the same number σ on different weighings; and errors on different weighings are independent. Denote the true weights by

$\theta_1, \dots, \theta_8.$

We consider two different experiments:

1. Weigh each object in one pan, with the other pan empty. Let $X_i$ be the measured weight of the $i$th object, for $i = 1, \dots, 8$.
2. Do the eight weighings according to the following schedule and let $Y_i$ be the measured difference for $i = 1, \dots, 8$:


$\begin{matrix}
 & \mbox{left pan} & \mbox{right pan} \\
\mbox{1st weighing:} & 1\ 2\ 3\ 4\ 5\ 6\ 7\ 8 & \mbox{(empty)} \\
\mbox{2nd:} & 1\ 2\ 3\ 8 & 4\ 5\ 6\ 7 \\
\mbox{3rd:} & 1\ 4\ 5\ 8 & 2\ 3\ 6\ 7 \\
\mbox{4th:} & 1\ 6\ 7\ 8 & 2\ 3\ 4\ 5 \\
\mbox{5th:} & 2\ 4\ 6\ 8 & 1\ 3\ 5\ 7 \\
\mbox{6th:} & 2\ 5\ 7\ 8 & 1\ 3\ 4\ 6 \\
\mbox{7th:} & 3\ 4\ 7\ 8 & 1\ 2\ 5\ 6 \\
\mbox{8th:} & 3\ 5\ 6\ 8 & 1\ 2\ 4\ 7
\end{matrix}$

Then the estimated value of the weight $\theta_1$ is

$\widehat{\theta}_1 = \frac{Y_1 + Y_2 + Y_3 + Y_4 - Y_5 - Y_6 - Y_7 - Y_8}{8}.$

Similar estimates can be found for the weights of the other items. For example

$\widehat{\theta}_2 = \frac{Y_1 + Y_2 - Y_3 - Y_4 + Y_5 + Y_6 - Y_7 - Y_8}{8}.$

The question of design of experiments is: which experiment is better?

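The variance of the estimate $X_1$ of $\theta_1$ is $\sigma^2$ if we use the first experiment. But if we use the second experiment, the variance of the estimate given above is $\sigma^2/8$: the numerator is a signed sum of eight independent measurements, so its variance is $8\sigma^2$, and dividing by 8 scales the variance by 1/64. Thus the second experiment gives us 8 times as much precision for the estimate of a single item, and estimates all items simultaneously, with the same precision. What is achieved with 8 weighings in the second experiment would require 64 weighings if items are weighed separately. Note, however, that the estimates obtained in the second experiment are all computed from the same eight measurements and so depend on the same underlying errors; under the stated assumptions they are nonetheless uncorrelated, because any two objects share a pan in exactly four of the eight weighings.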

Many problems of the design of experiments involve combinatorial designs, as in this example.
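The variance comparison in the weighing example can be checked numerically. The Python sketch below encodes the schedule as a matrix of +1 (left pan) and -1 (right pan) entries, verifies that its columns are mutually orthogonal, and simulates both experiments to compare the variance of the two estimates of the first weight. Normally distributed errors, the true weights, the seed, and all function and variable names are assumptions made for this illustration.

```python
import random

# Objects placed in the left pan for each of the eight weighings, copied from
# the schedule; all remaining objects go in the right pan.
LEFT_PAN = [
    {1, 2, 3, 4, 5, 6, 7, 8},
    {1, 2, 3, 8},
    {1, 4, 5, 8},
    {1, 6, 7, 8},
    {2, 4, 6, 8},
    {2, 5, 7, 8},
    {3, 4, 7, 8},
    {3, 5, 6, 8},
]

# design[i][j] is +1 if object j+1 is in the left pan in weighing i+1, else -1.
design = [[1 if obj in left else -1 for obj in range(1, 9)] for left in LEFT_PAN]

def column_dot(j, k):
    """Dot product of columns j and k of the design matrix."""
    return sum(row[j] * row[k] for row in design)

# Distinct columns are orthogonal; each column has squared length 8.
orthogonal = all(
    column_dot(j, k) == (8 if j == k else 0) for j in range(8) for k in range(8)
)

def estimate(y):
    """Estimate every weight as the signed sum of the Y_i divided by 8."""
    return [sum(design[i][j] * y[i] for i in range(8)) / 8 for j in range(8)]

def simulate(theta, sigma, trials, seed=0):
    """Return the empirical error variance of the first weight's estimate
    under experiment 1 (separate weighing) and experiment 2 (the schedule)."""
    rng = random.Random(seed)
    sq_err_separate = 0.0
    sq_err_combined = 0.0
    for _ in range(trials):
        x1 = theta[0] + rng.gauss(0, sigma)  # experiment 1, object 1 only
        y = [
            sum(design[i][j] * theta[j] for j in range(8)) + rng.gauss(0, sigma)
            for i in range(8)
        ]                                    # experiment 2, all eight weighings
        sq_err_separate += (x1 - theta[0]) ** 2
        sq_err_combined += (estimate(y)[0] - theta[0]) ** 2
    return sq_err_separate / trials, sq_err_combined / trials

true_weights = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]  # hypothetical values
var_separate, var_combined = simulate(true_weights, sigma=1.0, trials=5000)
print(orthogonal, var_separate, var_combined)
```

With σ = 1, the first variance should come out close to 1 and the second close to 1/8, up to simulation noise.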

## Statistical control

It is best for a process to be in reasonable statistical control prior to conducting designed experiments. When this is not possible, proper blocking, replication, and randomization allow for the careful conduct of designed experiments.
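One common way to combine blocking, replication and randomization is a randomized complete block design, in which every treatment appears once in each block and the order within each block is randomized independently. The following Python sketch is one possible way to generate such a plan; the block names, unit labels, treatment labels and the function `randomized_block_design` are all hypothetical.

```python
import random

def randomized_block_design(blocks, treatments, seed=None):
    """Assign each treatment to exactly one unit in every block.

    `blocks` maps a block name to its list of units. The treatment order is
    shuffled independently within each block.
    """
    rng = random.Random(seed)
    plan = {}
    for block_name, units in blocks.items():
        if len(units) != len(treatments):
            raise ValueError("each block needs exactly one unit per treatment")
        order = list(treatments)
        rng.shuffle(order)  # random allocation, restricted to this block
        plan[block_name] = dict(zip(units, order))
    return plan

# Hypothetical experiment: three batches (blocks) of three units each.
blocks = {
    "batch 1": ["u1", "u2", "u3"],
    "batch 2": ["u4", "u5", "u6"],
    "batch 3": ["u7", "u8", "u9"],
}
treatments = ["A", "B", "C"]

plan = randomized_block_design(blocks, treatments, seed=42)
for block_name, assignment in plan.items():
    print(block_name, assignment)
```

Each treatment is replicated three times, once per batch, so differences between batches do not bias the treatment comparisons. Fixing the seed makes the allocation reproducible for record keeping.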
