Counterbalancing in the Design of Experiments
Contents
Counterbalanced Orders: Analyzing the Data
An Alternative Analysis
Counterbalancing Stimulus Combinations
Considering Order Effects
Counterbalancing is usually thought of as a method for controlling order effects in a repeated measures design (see the notes on variance and experimental design).
In a counterbalanced design that controls for order effects, we use separate groups of subjects, each group receiving the treatments in a different order. If there are two treatments (say, A and B), Group 1 receives them in the order AB, and Group 2 receives them in the order BA. If you create a group for each possible order, then the variance due to order effects becomes a separate source of variance, making for a more powerful design.
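As a minimal sketch of "a group for each possible order", the following Python snippet enumerates every order of the treatments and deals subjects into the resulting groups in rotation. The function name `counterbalanced_groups` and the subject numbers are illustrative assumptions, not part of these notes; in practice you would shuffle the subject list first so that assignment to groups is random.

```python
from itertools import permutations

def counterbalanced_groups(treatments, subject_ids):
    """Create one group per possible treatment order and fill the groups in rotation."""
    orders = list(permutations(treatments))
    groups = {order: [] for order in orders}
    for i, subject in enumerate(subject_ids):
        # Subject i goes to order i mod (number of orders), keeping group sizes equal.
        groups[orders[i % len(orders)]].append(subject)
    return groups

# Two treatments give two orders (AB and BA); eight hypothetical subjects.
groups = counterbalanced_groups(["A", "B"], range(1, 9))
for order, members in groups.items():
    print("".join(order), members)
```

With three treatments the same function produces six groups, which shows how quickly full counterbalancing grows as conditions are added.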
Counterbalancing is a more general technique, though. You often want to manipulate two or more independent variables within-subjects, but you do not want to use all possible combinations of the variables for every subject. These applications of counterbalancing are described below.
You can counterbalance many features of an experiment. The technique may apply to the design of your own project, solving problems that might otherwise confuse you. In these notes, we explain how to analyze the data from counterbalanced designs, and show you how to use them in situations where other experimental features need to be balanced.
Counterbalanced Orders: Analyzing the Data
Suppose you compare two treatments, A and B, in a repeated measures design, so that all subjects receive both treatments. To avoid confounding due to order effects, one group of subjects receives Treatment A first, and a second group receives Treatment B first.
To analyze the data using SPSS, you will create a data matrix with either three or four columns. The first column (which is optional) will contain the participant ID or subject number. One column should contain the group identifier, either 1 or 2. Two other columns will contain the two dependent variable measures, one for each treatment condition.
You have a mixed design, with one between-groups variable and one within-subjects variable. You can test the main effect for Treatment, which is presumably the effect of primary interest. You can also look at the main effect for Groups, but this will be of no real interest, and if it does occur it will be hard to interpret. However, the interaction between Groups and Treatment is of interest. If it is significant, it tells you that there was an overall order effect.
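These notes run the analysis in SPSS. Purely as an illustration of what the two tests of interest measure, here is a rough Python sketch using only the standard library. It relies on a standard equivalence for a 2 x 2 mixed design with equal group sizes: the tests on the within-subjects factor can be written as t-tests on each subject's difference score (B minus A), with F equal to t squared. The data values and the function name `mixed_2x2_t_tests` are invented for the example.

```python
from math import sqrt
from statistics import mean, variance

# Hypothetical data matrix: (subject, group, score_A, score_B).
# Group 1 received the order AB; Group 2 received the order BA.
data = [
    (1, 1, 10, 12), (2, 1, 11, 12), (3, 1, 9, 12), (4, 1, 12, 13),
    (5, 2, 8, 13), (6, 2, 9, 12), (7, 2, 7, 13), (8, 2, 10, 14),
]

def mixed_2x2_t_tests(rows):
    """Return (t for Treatment, t for Groups x Treatment, df), assuming equal group sizes."""
    d1 = [b - a for _, g, a, b in rows if g == 1]  # B - A differences, Group 1
    d2 = [b - a for _, g, a, b in rows if g == 2]  # B - A differences, Group 2
    n1, n2 = len(d1), len(d2)
    df = n1 + n2 - 2
    # Pooled within-group variance of the difference scores (the error term).
    pooled = ((n1 - 1) * variance(d1) + (n2 - 1) * variance(d2)) / df
    # Treatment main effect: is the average B - A difference zero?
    t_treatment = mean(d1 + d2) / sqrt(pooled / (n1 + n2))
    # Groups x Treatment interaction: does the difference depend on the order?
    t_interaction = (mean(d2) - mean(d1)) / sqrt(pooled * (1 / n1 + 1 / n2))
    return t_treatment, t_interaction, df

t_treatment, t_interaction, df = mixed_2x2_t_tests(data)
print(f"Treatment: t({df}) = {t_treatment:.2f}")
print(f"Groups x Treatment (order effect): t({df}) = {t_interaction:.2f}")
```

The sketch reports test statistics only; a statistics package is still needed for p-values and for the full ANOVA table.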
To see what an interaction means in this context, consider the results shown in the following graph, where the dependent variable is a measure of performance. You can see that there is a main effect for Treatment, with the scores being higher for Treatment B. However, the effect was bigger for Group 2 than for Group 1. Group 2 received Treatment B first and Treatment A second. So, the superiority of Treatment B was greater when Treatment B came first. In other words, there may have been a fatigue effect or some sort of carryover effect.
If you use a counterbalanced design with more than two within-subject conditions, the procedure is much the same. You will have several groups, and several levels for the within-subjects factor, but again an interaction between Groups and Conditions will indicate the presence of overall order effects.
An Alternative Analysis
In the analysis described above, the data matrix contained a column for Treatment A and a column for Treatment B. Sometimes it may be more convenient to organize the data by trials, so that there is a column for Trial 1 and a column for Trial 2.
You can still look at the main effect for Groups, and there will be a main effect for Trials in this case. To determine if the treatments are different, which presumably is the question of primary interest, you must look at the interaction between Groups and Trials. If it is significant, it tells you that there was an overall difference between the treatments.
To understand why the interaction is equivalent to a Treatment effect, the results graphed above are shown again in the following graph, where the horizontal axis represents Trials. You can see that performance was better on Trial 1 for Group 2, who received Treatment B on Trial 1. Performance was better on Trial 2 for Group 1, who received Treatment B on Trial 2. In other words, Treatment B was superior for both groups. Note also that there is a small main effect for trials, scores being higher overall on Trial 1, as discussed in the previous section.
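The equivalence between the Groups by Trials interaction and the Treatment effect can be checked with a little arithmetic. In the sketch below (Python, standard library only; the data and function names are made up for illustration), Group 1 is assumed to have received the order AB and Group 2 the order BA. For Group 1, Trial 2 minus Trial 1 equals B minus A; for Group 2 it equals A minus B. Half the difference between the two group means of these trial differences therefore recovers the average Treatment effect when the groups are of equal size.

```python
from statistics import mean

# Hypothetical treatment-organized rows: (subject, group, score_A, score_B).
by_treatment = [
    (1, 1, 10, 12), (2, 1, 11, 12), (3, 1, 9, 12), (4, 1, 12, 13),
    (5, 2, 8, 13), (6, 2, 9, 12), (7, 2, 7, 13), (8, 2, 10, 14),
]

def to_trial_layout(rows):
    """Convert (subject, group, A, B) rows to (subject, group, trial_1, trial_2)."""
    out = []
    for subject, group, a, b in rows:
        # Group 1 saw A then B; Group 2 saw B then A.
        first, second = (a, b) if group == 1 else (b, a)
        out.append((subject, group, first, second))
    return out

def treatment_effect_from_trials(trial_rows):
    """Groups x Trials interaction contrast, which estimates the mean of B - A."""
    g1 = [t2 - t1 for _, g, t1, t2 in trial_rows if g == 1]  # equals B - A
    g2 = [t2 - t1 for _, g, t1, t2 in trial_rows if g == 2]  # equals A - B
    return (mean(g1) - mean(g2)) / 2

by_trial = to_trial_layout(by_treatment)
direct = mean(b - a for _, _, a, b in by_treatment)
print(treatment_effect_from_trials(by_trial), direct)
```

Both printed values are the same quantity computed from the two layouts, which is why the interaction in one layout answers the main-effect question in the other.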
Counterbalancing Stimulus Combinations
Now we consider other applications of counterbalancing. Sometimes you have two or more within-subject independent variables, but do not want to use all combinations of the variables for every subject. You may find that a better arrangement is to use a balanced combination of variables.
Suppose a researcher wants to look at gender bias when subjects evaluate the qualifications of a job candidate. To do this she will create resumés that describe various qualifications. Sometimes the name on the resumé will be male (e.g., Bill) and sometimes it will be female (e.g., Mary).
The researcher would like to use a repeated measures design to maximize power, but there's an obvious problem. Suppose a resumé is given to a subject with the name Bill on it. The researcher does not want the subject to see the same resumé with the name Mary on it.
You often encounter situations like this. You have a variable that you want to manipulate (sex of name in this case), but you need to combine it with other stimulus features that must be different on each trial (the resumé in this case).
The solution is to use two groups of subjects. We create two different resumés. Each will come with both a male name and a female name, so we have four stimuli altogether, call them 1M, 1F, 2M, and 2F. Group 1 receives one M stimulus and one F stimulus, say, 1M and 2F. Group 2 receives the other two stimuli, 1F and 2M.
Note that no one sees the same resumé twice, but everyone sees one resumé with a male name and one with a female name. The design is balanced.
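The assignment just described can be generated mechanically. The following Python sketch is one way to do it, under the assumption that resumés are numbered from 1 and that the first half carries the male name for Group 1; the function name `stimulus_sets` is hypothetical.

```python
def stimulus_sets(n_resumes):
    """Return the stimulus labels for Group 1 and Group 2 (e.g. '1M', '2F')."""
    if n_resumes % 2 != 0:
        raise ValueError("An even number of resumés is needed for a balanced split.")
    half = n_resumes // 2
    group1, group2 = [], []
    for r in range(1, n_resumes + 1):
        # Group 1 sees the first half with the male name and the rest with the
        # female name; Group 2 sees the complementary versions.
        group1.append(f"{r}{'M' if r <= half else 'F'}")
        group2.append(f"{r}{'F' if r <= half else 'M'}")
    return {1: group1, 2: group2}

print(stimulus_sets(2))  # the two-resumé design described above
```

Because the two groups receive complementary versions, every resumé appears with both names across the experiment, while no subject sees a resumé more than once.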
Ignore order effects for the time being (we will need to deal with them later). We can analyze the data in the same way we did in the previous section. You create a data matrix with one column containing the group identifier, either 1 or 2, and two other columns containing the subjects' responses to the male resumé and the female resumé.
In this mixed design, you can test the main Name effect (Male versus Female). The main effect for Groups will be of no interest. However, the interaction between Groups and Name tells you whether or not there was a significant difference between the two resumés.
To see how this comes about, consider the following graph. There is a main effect for Name, ratings being higher for the Male resumés. However, the effect was bigger for Group 1 than for Group 2. Group 1 saw Resumé 1 with the Male name and Resumé 2 with the Female name. So, the preference for males was greater when the male name was on Resumé 1. If Resumé 1 had a female name, Resumé 2 with the male name was not so strongly preferred.
You might want to use several resumés to make sure that the results are not restricted to only the two resumés you selected. The approach is not very different. If we have eight resumés, each will come with both a male name and a female name, so we have 16 stimuli. Now Group 1 receives four M stimuli and four F stimuli, say, 1M, 2M, 3M, 4M, 5F, 6F, 7F, and 8F. Group 2 receives the other eight stimuli.
The easiest approach to the data analysis is to calculate preliminary averages for each subject, one for the four male resumés and one for the four female resumés. Then the data matrix will be the same as before: one column containing the group identifier, and two other columns containing the subjects' mean responses to the male resumés and the female resumés. The results of the analysis can be interpreted as they were before. You can look at the main Name effect directly. The Group by Name interaction will tell you whether there was any systematic difference between Resumés 1 through 4 and Resumés 5 through 8.
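The preliminary averaging step might look like the following in Python. This is only a sketch: the ratings are invented, and it assumes each stimulus is labelled with the resumé number followed by M or F, as in the 1M, 5F notation used above.

```python
from statistics import mean

def summarize_subject(subject, group, ratings):
    """Collapse one subject's ratings into (subject, group, male_mean, female_mean)."""
    male = [score for label, score in ratings.items() if label.endswith("M")]
    female = [score for label, score in ratings.items() if label.endswith("F")]
    return (subject, group, mean(male), mean(female))

# Hypothetical ratings from one Group 1 subject (stimulus label -> rating).
ratings = {"1M": 6, "2M": 7, "3M": 5, "4M": 6, "5F": 5, "6F": 4, "7F": 6, "8F": 5}
row = summarize_subject(1, 1, ratings)
print(row)
```

Each subject contributes one such row, so the resulting data matrix has the same shape as in the two-resumé case.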
Considering Order Effects
In the discussion so far, we have ignored order effects. The simplest way to handle order effects is to use independently generated random orders for each subject. In fact, in the case where there are eight different resumés, a simple random order would clearly be the best way to proceed.
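Independently generated random orders are easy to produce in software. A minimal Python sketch follows, with a hypothetical function name and an optional seed so that the orders can be reproduced later:

```python
import random

def random_orders(subject_ids, stimuli, seed=None):
    """Give every subject an independently shuffled presentation order."""
    rng = random.Random(seed)
    # sample() with k equal to the list length returns a new shuffled copy,
    # so each subject's order is drawn separately from the others.
    return {s: rng.sample(stimuli, k=len(stimuli)) for s in subject_ids}

group1_stimuli = ["1M", "2M", "3M", "4M", "5F", "6F", "7F", "8F"]
orders = random_orders([1, 2, 3], group1_stimuli, seed=42)
for subject, order in orders.items():
    print(subject, order)
```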
In the case using only two resumés, a counterbalanced order would be preferable to a simple random order. In this case, we will have a double counterbalancing. Here's how it works:
We needed two groups to counterbalance the stimulus combinations (the mixing of resumés and the sex of the name). To counterbalance the orders we also need two groups, making four groups in all. The following table shows how each group is treated. It shows the combination of name with resumé, and the order in which the two resumés are presented.
Group | Male Name: Resumé | Male Name: Trial | Female Name: Resumé | Female Name: Trial |
Group 1 | Resumé 1 | Trial 1 | Resumé 2 | Trial 2 |
Group 2 | Resumé 1 | Trial 2 | Resumé 2 | Trial 1 |
Group 3 | Resumé 2 | Trial 1 | Resumé 1 | Trial 2 |
Group 4 | Resumé 2 | Trial 2 | Resumé 1 | Trial 1 |
Notice how both the trial orders and the resumé-name combinations are balanced.
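One way to convince yourself that the four-group table is balanced is to count. The Python sketch below transcribes the table as tuples and checks that the male name appears equally often on each resumé and on each trial, and that every resumé-by-trial pairing occurs exactly once. The dictionary layout and the function name `is_balanced` are assumptions made for this illustration.

```python
from collections import Counter

# (male_resume, male_trial, female_resume, female_trial) for Groups 1 to 4,
# transcribed from the table above.
design = {
    1: (1, 1, 2, 2),
    2: (1, 2, 2, 1),
    3: (2, 1, 1, 2),
    4: (2, 2, 1, 1),
}

def is_balanced(design):
    """Check the double counterbalancing of resumé-name pairing and trial order."""
    rows = list(design.values())
    male_resume = Counter(r[0] for r in rows)
    male_trial = Counter(r[1] for r in rows)
    cells = Counter((r[0], r[1]) for r in rows)
    return (
        # Within a group, the two names use different resumés and different trials.
        all(r[0] != r[2] and r[1] != r[3] for r in rows)
        # The male name is on each resumé, and on each trial, equally often.
        and male_resume[1] == male_resume[2]
        and male_trial[1] == male_trial[2]
        # Every resumé-by-trial pairing for the male name occurs equally often.
        and len(cells) == 4
        and len(set(cells.values())) == 1
    )

print(is_balanced(design))
```

Dropping Groups 2 and 3 would leave the resumé-name pairing tied to trial order, and the same check would then fail.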
You have two choices when conducting the data analysis. You may proceed exactly as before, using a mixed design with one between-groups factor and one within-subjects factor. You can examine the Name main effect. An interaction between Group and Name will tell you that either there is an order effect, or there is a resumé effect, or both. You will not be able to separate these possibilities.
A more precise analysis will require you to define two between-groups factors, one for Order and one for Resumé-Name combination. The following table shows how these factors are defined.
Order | Combination | Male Name: Resumé | Male Name: Trial | Female Name: Resumé | Female Name: Trial |
1 | 1 | Resumé 1 | Trial 1 | Resumé 2 | Trial 2 |
2 | 1 | Resumé 1 | Trial 2 | Resumé 2 | Trial 1 |
1 | 2 | Resumé 2 | Trial 1 | Resumé 1 | Trial 2 |
2 | 2 | Resumé 2 | Trial 2 | Resumé 1 | Trial 1 |
Now your data matrix will need two columns to represent the groups, one for Order and the other for Combination. Specify that there are two between-groups factors.
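If your records store only the original group number (1 to 4), the two factor columns can be derived from it. The Python sketch below follows the coding in the table above, where Order 1 means the male-name resumé came on Trial 1 and Combination 1 means the male name was on Resumé 1. The function name `factor_codes` and the ratings are illustrative assumptions.

```python
def factor_codes(group):
    """Map Group 1-4 to (Order, Combination) codes as in the table above."""
    if group not in (1, 2, 3, 4):
        raise ValueError("group must be 1, 2, 3, or 4")
    order = 1 if group in (1, 3) else 2        # 1 = male-name resumé on Trial 1
    combination = 1 if group in (1, 2) else 2  # 1 = male name on Resumé 1
    return order, combination

# Hypothetical records: (subject, group, male_name_rating, female_name_rating).
raw = [(1, 1, 6, 5), (2, 2, 7, 5), (3, 3, 5, 5), (4, 4, 6, 4)]

# Data matrix rows: (subject, order, combination, male_rating, female_rating).
data_matrix = [(s, *factor_codes(g), m, f) for s, g, m, f in raw]
for row in data_matrix:
    print(row)
```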
As before, there will be a test of the main effect for Name. The results will also provide tests of several interactions. The Order by Name interaction will tell you whether or not there are any overall Order effects. The Combination by Name interaction will tell you whether or not there are any overall differences between the two resumés. The triple interaction of Order by Combination by Name will indicate whether or not there is any interaction between Order and the resumés (a rather complicated idea to understand, but probably not very important if it does occur).