Query: Analysis of a Factorial Experiment (Partially Confounded)
Author(s): D. R. Cox, Agnes Herzberg, Cuthbert Daniel and D. J. Finney
Source: Technometrics, Vol. 9, No. 1 (Feb., 1967), pp. 67-79
Published by: Taylor & Francis, Ltd. on behalf of American Statistical Association and American Society for Quality
Stable URL: http://www.jstor.org/stable/667
Queries
NORMAN L. JOHNSON, Editor*

QUERY
Analysis of a Factorial Experiment (Partially Confounded)

The query reproduced below (in slightly paraphrased form) seemed to be of such interest, and possibly capable of such varying interpretation, that it was felt to justify rather fuller discussion than usual in this section. Accordingly, solutions were sought from several leading authorities in the field of analysis of experimental data. We were gratified to receive the three replies here reproduced. It is instructive to notice their basic general agreement (in regard, for example, to faults in the design used) as well as the different aspects which they choose to emphasize. We will be pleased to receive further comments on this problem, and to consider them for publication in this section of a later issue of TECHNOMETRICS.

During the development of a soap pad it was desirable to estimate consumer response to a change in level of certain factors. The factors (all quantitative) considered important were amount of detergent (D), coarseness of pad (C), and solubility of the detergent (S). Thirty-two panelists (judges) were available, and it was decided to obtain the desired information during a weekend. It was further considered expedient not to have a given panelist evaluate (compare) more than two differently formulated soap pads on each of two days (Saturday and Sunday).

The three factors were each set at two fixed levels, and each level of a given factor was crossed with each level of the other two factors (a factorial), giving a total of eight soap pad formulations (treatments). The eight formulations were divided among four blocks of two treatments each, and this was replicated four times, as shown in Table One. Two panelists were assigned to each of the sixteen blocks, and each panelist evaluated the two treatments in duplicate.
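The layout just described can be tallied in a few lines. The sketch below enumerates the eight treatment combinations in standard (Yates) order and counts the scores implied by the design. All function and variable names are illustrative assumptions and are not part of the query.

```python
from itertools import product

# Lower-case letters mark the factors held at their upper level:
# d = amount of detergent, c = coarseness, s = solubility.
FACTORS = ("d", "c", "s")


def yates_order(factors=FACTORS):
    """Return the 2^k treatment labels in standard (Yates) order."""
    labels = []
    for bits in product((0, 1), repeat=len(factors)):
        # product() varies the last position fastest; reversing the tuple
        # makes the first factor alternate fastest, as Yates order requires.
        levels = bits[::-1]
        name = "".join(f for f, on in zip(factors, levels) if on)
        labels.append(name or "(1)")
    return labels


# Design constants taken from the description of the experiment.
PANELISTS_PER_BLOCK = 2
TREATMENTS_PER_BLOCK = 2
DAYS = 2
BLOCKS_PER_REPLICATE = 4
REPLICATES = 4

scores_per_block = PANELISTS_PER_BLOCK * TREATMENTS_PER_BLOCK * DAYS
scores_per_replicate = scores_per_block * BLOCKS_PER_REPLICATE
total_scores = scores_per_replicate * REPLICATES
total_panelists = PANELISTS_PER_BLOCK * BLOCKS_PER_REPLICATE * REPLICATES

print(yates_order())
print(scores_per_block, scores_per_replicate, total_scores, total_panelists)
```

The panelist count recovered this way agrees with the thirty-two judges mentioned in the query, which is a useful check on the reading of the design.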
Hence, there were 2 × 2 × 2 = 8 scores for each block and 8 × 4 = 32 scores for each replicate, 32 × 4 = 128 scores in all. Since the eight treatments in a given replicate are divided among four blocks, three degrees of freedom for treatment effects will be confounded with blocks.

Table One shows how the samples (treatments) were arranged. After arranging the treatments in the standard Yates' order, (1), d, c, dc, s, ds, cs, dcs, they were assigned numbers from a table of random digits. The panelists were arranged alphabetically and assigned numbers from 1 to 32. Panelists 1 and 17 evaluated the same kind of samples (that is, they were assigned to the same block in a given replication), and so on for panelists 2 and 18, and so forth to panelists 16 and 32, respectively. This notation was further expanded so that unprimed numbers corresponded with the upper treatment in a block, and the same primed number with the lower treatment in the same block. In addition, "X" and "Y" tags were affixed to the samples so that the order of evaluation would be reversed on the second day. Table Two lists the above assignments.

The respondents were asked eleven questions. We are concerned with responses to the question, "How well did you like each sample from an overall point of view?" A hedonic scale was subsequently used in which the response "Excellent" was given the value unity, and so forth, with the response "Poor" being given the value 5. These scores are also shown in Table Two, where the first two scores are those of panelist 1, the next two scores those of panelist 17, and so forth. Thus, for example, in the first row, "" and "" are the scores assigned by panelist 1 to treatment (1) on Saturday and Sunday respectively.

* Readers are invited to submit queries and comments to Dr. Norman L. Johnson, Department of Statistics, University of North Carolina, Chapel Hill, North Carolina.
TABLE ONE (Yates' Notation)
Reps Blocks and Block Sums Aliases Rep Sums and SS
I ()XdX DC = DS = CS 98; 7 6
D = DCS = CS ; 5 6
III ()dc C = DCS = DS 8;
IV6 8 7 6 6 8 7 6 S = DCS = DC 7; 9
Totals: 87; 9
Total Sum of Squares (SS) 9 - (87)/8-9 - 7.7 = 8.9

The data were analyzed as a mixed model with the three main factors being fixed. The results are summarized in Table Three, and it is seen that the only significant MS is that for panelists. If now one pools the SS for "Blocks within Reps" down throu Panelists," and uses the resulting MS (=.6) with 5 degrees of freedom main the three-factor interaction DCS, and "Reps" are nearly significant. There must be other ways to treat the raw data, and it would be of interest to see if the above conclusions are altered.
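The alias sets listed in Table One can be checked mechanically: an effect is confounded with blocks in a replicate when its contrast takes the same sign on both treatments of every block. The sketch below is a minimal illustration of that rule. The helper names are assumptions, and the block pairings are those readable in the Section I listing of the third reply (Replicates I, II and IV only, since the third replicate is not fully legible in this copy).

```python
from itertools import combinations

FACTORS = "dcs"


def effects(factors=FACTORS):
    """All main effects and interactions: D, C, S, DC, DS, CS, DCS."""
    return [
        "".join(combo).upper()
        for size in range(1, len(factors) + 1)
        for combo in combinations(factors, size)
    ]


def sign(effect, treatment):
    """Sign (+1 or -1) of a treatment combination in the contrast for an effect."""
    present = set() if treatment == "(1)" else set(treatment)
    shared = sum(1 for letter in effect.lower() if letter in present)
    # The sign is + when an even number of the effect's letters are absent.
    return 1 if (len(effect) - shared) % 2 == 0 else -1


def confounded(blocks):
    """Effects whose contrast is constant within every block of a replicate."""
    return [
        e for e in effects()
        if all(len({sign(e, t) for t in block}) == 1 for block in blocks)
    ]


# Block pairings as they appear in the Section I listing (an assumption about
# the damaged tables, not a reconstruction of Table One itself).
REP_I = [("(1)", "dcs"), ("d", "cs"), ("c", "ds"), ("s", "dc")]
REP_II = [("(1)", "cs"), ("d", "dcs"), ("c", "s"), ("dc", "ds")]
REP_IV = [("(1)", "dc"), ("d", "c"), ("s", "dcs"), ("ds", "cs")]

for name, blocks in (("I", REP_I), ("II", REP_II), ("IV", REP_IV)):
    print(name, confounded(blocks))
```

Each replicate confounds exactly three effects, one for each of the three degrees of freedom between its four blocks, and the sets agree with the "Aliases" column of Table One.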
TABLE TWO (Decoding Legend and Raw Scores)
Panelists Sample No. Treatment Scores
+ 7 8 () R ' + 7' dcs E +8 d 5 P ' + 8' 5 cs +9 6 c 5 5 I ' + 9' ds + 7 s ' + ' dc 5 + 8 () R 5' + ' 5 cs E 6 + d 5 P 6' + ' dcs 7 + 6 c 5 II 7' + ' 7 s 5 8 + dc 8' + ' ds 9 +5 8 () R 9' + 5' ds E + 6 d P ' + 6' 7 s + 7 6 c III ' +-- 7' dcs + 8 dc ' + 8' 5 cs +9 8 () R ' + 9' dc E + d P ' + ' 6 c 5 5 + 7 s 5 5 IV 5' + ' dcs 6 + ds 6' +' 5 cs
TABLE THREE Summary Anova Table
Source of Variability SS DF MS MS ratio EMS
C S D DC DS CS DCS Replicates (Reps) Blocks within Reps RD RC RS RDC RDS RCS Among Panelists Within Panelists
6...7.56.5.56 6..65 6.6 6.9.77.9.5.6.5 55.75.5 6 6...7.56.5.56 6..88.5.7.8.95.5.6.5.7.9.7.75.9.5.7 6..5.87.78.99.79.55.9...55
C + r- + 6orD + (8or6) + + + 6 + (8 ) U + ap + 6 s + (8or6) U + op + 87CDc + (6a ) T + c<r + 8RDS + (67DS) + a + 8Cs + (6 ) + p + (cs) o + ar + orj + ar Cr + (7 + 7 7 + r + 6R D Cr + o-p + 6Ra + <p + 6cRS r + + 8Dc <r + r- + 8(5 a + op + 8RCS CT + (r
TOTALS 8.9 7

Answered by: D. R. COX AND AGNES HERZBERG, Department o College, London.

Because of the heavy confounding, it is possible that there is appr the inter-block comparisons, and a full weighted process of recovery be considered. More informally, however, inspection suggests that any of variation is relatively small and that it would be worth looking a means, ignoring confounding. These are compared below with the estimates reconstructed from the contrasts allowing for confounding; these are estimates regarding the experiment as an incomplete block design

Crude treat. mean Adj. treat. mean
d.9.5.5.8 c dc s ds cs dcs..5.88.6..9..5..7.8.7

Both sets of values suggest that cs is better, i.e. has a low treatments. (The high value of the three-factor interaction standard factorial contrasts are unlikely to be a fruitful way The difference between the mean of all treatments and t To be significant at the 5 per cent point, the standard error be less than about.7/. and the error variance therefore.; the multiplier. is obtained from the tables of the s value,., compared with that of.6 in the analysis reported of the difference probably depends appreciably on the choice o the reality of the difference is in doubt; the other treatments themselves.
Answered by: CUTHBERT DANIEL, Engineering Statistician, Rhinebeck, New York.

The experimenter and his advisers assumed that
a. closest comparisons are those made by one panelist on one day, between two soap pads formulated differently,
b. main factorial effects will probably control; 2fi will be less important; 3fi least important,
c. replication can be secured by repeating Saturday's observations on Sunday.

Assumption a appears justified, but b and c are denied, and in unexpected ways. We have a new example showing the necessity of permitting the data to lead us to the "model."

The effects on Sunday (second and fourth columns in the data table) are different from those on Saturday. Effects of D and of -C are large on Saturday; those of -S and DCS are largest on Sunday. This was found by studying each day's data separately. The table of effects by days gives the total effects, the divisors, the average effects (from mean), and their estimated standard errors.

The error term for judging differences within pairs was estimated separately from 6 between-pad within-panelist differences recorded for each day. The two values, .96 .88, each with 5 df, were pooled to give .9. It is clear that this error excludes any difference in general level among panelists. Panelists might well report similar differences between pads even though one panelist is more enthusiastic (gives higher levels to all) than another. There are 6 pairs of panelists (each panelist's data for one day being a block) that will give a measure of these differences in panelist level for each of the two days separately. The pooled SS (among panelists) is then ( + + 5 + * * * )/ = / =.75. The corresponding MS (among panelists) =.75/6 =.9, for Saturday. The value for Sunday is 5/ 6 =.8. This suggests, but hardly proves, that the panelists differed in their general level on Saturday, but became stabilized on Sunday.

The first day's work led the panelists to prefer a higher amount of detergent (+D) on less coarse pads (-C).
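The pooling step described above, in which the two daily within-pair error estimates are combined into one, amounts to a degrees-of-freedom-weighted average of the mean squares. The sketch below illustrates this under stated assumptions: the function name is hypothetical and the inputs are placeholder values, since the printed figures are not fully legible in this copy and none of them is reproduced here.

```python
def pooled_mean_square(estimates):
    """Pool independent variance estimates.

    `estimates` is a sequence of (mean_square, degrees_of_freedom) pairs.
    Returns (pooled_mean_square, total_degrees_of_freedom).
    """
    total_df = sum(df for _, df in estimates)
    if total_df <= 0:
        raise ValueError("total degrees of freedom must be positive")
    # Weight each mean square by its degrees of freedom.
    pooled = sum(ms * df for ms, df in estimates) / total_df
    return pooled, total_df


# Placeholder values only (not the values reported in the reply).
saturday = (1.0, 15)
sunday = (0.8, 15)
pooled, df = pooled_mean_square([saturday, sunday])
print(round(pooled, 3), df)
```

With equal degrees of freedom for the two days the pooled value is simply the mean of the two estimates; the weighted form matters only when the days contribute unequal numbers of differences.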
The least preferred formulation is of course that with low detergent on coarser pads. All other formulations are undistinguishable from the general average.

The second day's use gives apparently different opinions, although we are rather vulnerable in that only eight panelists give evidence about DCS. The combined effects of S (-. and DCS (+.5) single out two test-conditions, namely c and d, as best and two others, cs and ds, as worst. This means that on second day's use, less soluble detergent is preferred if either: lower amount of detergent is present on coarse pads, or, higher amount of detergent is present on less coarse pads.

This sudden change in second-day opinion must have heavy implications for future plans. Surely at least a week's trial of each pair is indicated.

TABLE Effects and Their Standard Errors, by Days
Totals Av. Effects Est. std. errors
Symbol Sat. Sun. Divisors Sat. Sun. s =.9
D 8.*.5. C - 8 -.9*.8. DC 7 -...6 S -7-8 -.5 -.7*. DS - -..6 CS -9 - -.8 -..6 DCS 6 8 6.8.5*. Day X D 8 96.9*.9 Day X C 8 96.9*.9

The choice of this partially confounded plan is not clearly justified. No reason is given
or apparent for the grouping into Replicates or into the double blocks of observations. If the change of opinion with time was foreseen, then the analysis submitted should reflect this. A more comprehensive plan could have been arranged by giving each of 8 panelists a different pair of pads, exhausting all pairs from the eight formulations, on each of the two days. This would have given all effects with equal precision and would possibly also have given a better estimate of error, since each panelist would not be biased on Sunday by his own Saturday score.

Answered by: D. J. FINNEY, University of Aberdeen, Aberdeen, Scotland.

Write T for the factor "difference between days," using t for Sunday, its absence for Saturday. Write U for the factor "difference for order of evaluation within a day," using u for second test on a day, its absence for first test. Then the set of four records for any one subject in the order tabulated, for example for subject 5, 5' appears to correspond to the conditions tu u t, since the order of evaluation was said Denote these four positions in the t Begin by forming four sets of valu

I. (α + β + γ + δ)
These are subject totals. Their analysis gives information on that part of the tr comparisons that does not confound treatments with subjects. In fact main effects are confounded in 1/4 of the experiment, two-factor interactions of these in 1/2 of the experiment, and the interaction DCS in 3/4 of the experiment.

II. (-α - β + γ + δ)
These give information on T and its interactions with D, C, S. An exactly parallel analysis will give the contrasts and squares corresponding to augmentation of each effect sym

III. (-α + β + γ - δ)
These give information on U and its interactions with D, C, S, but complications arise in the analysis of variance unless the ordering of pairs of treatments on has followed a consistent pattern.
It seems extraordinary that the pairs of subjects to the same two d, c, s treatment combinations were not given the treatments in opposite order on the Saturday, with reversal on the Sunday. For example sub have had d followed by cs on Saturday, cs followed by d on Sunday, and subject then have reversed these two orders instead of following the same. This section of will also contain intra-subject information on the interactions of D, C, S with

IV. (α - β + γ - δ)
These give information on TU and its interactions with D, C, S by an analysis that that for III, and similarly the intra-subject information on D, C, S and their interactions

It is to be expected that the variance for section I, being inter-subject, will differ from
(and exceed) variances for II, III, IV. Indeed, the Sections may have different variances, and this possibility must at least be kept in mind during the analysis.

Section I
Replicate Subjects Total
(), dcs d, cs 7 c, ds 9 7 6 s, dc 6 5 98
(), cs d, dcs C, s 9 9 7 6 6 dc, ds 9 5
(), ds d, s, dcs dc, cs 5 9 8 8 8
(), dc d, c s, dcs 8 6 9 6 8 7 ds, cs 6 56 5 7
D C DC S DS CS DCS
- - - 8 -- - - -5-9 - 9 - - -- - - - -
Total - 9 - - 7 -
Divisors 6 6 6 96

Source d.f. Sum of squares Mean squares
Replicates D C DC S DS CS DCS 7.6.8.5...5..9.9
Error I 7.5.56
Total I 65.68
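The four sets of values I to IV defined above are mutually orthogonal combinations of each subject's four records, which is why the four sections can be analysed separately and their sums of squares added. A minimal sketch of that property follows. The coefficient pattern for II is inferred from orthogonality with the other three, because the printed expression is damaged in this copy, and the example records are hypothetical.

```python
# Coefficients applied to one subject's four records (alpha, beta, gamma, delta).
# The row for "II" is an assumption: it is the only +/-1 pattern orthogonal to
# the other three that also starts with a negative sign.
SECTION_CONTRASTS = {
    "I": (1, 1, 1, 1),      # subject total
    "II": (-1, -1, 1, 1),
    "III": (-1, 1, 1, -1),
    "IV": (1, -1, 1, -1),
}


def section_values(records):
    """Return the four section values for one subject's four records."""
    return {
        name: sum(c * x for c, x in zip(coefs, records))
        for name, coefs in SECTION_CONTRASTS.items()
    }


def sum_of_squares_split(records):
    """Split the raw sum of squares of four records into four 1-d.f. pieces."""
    return {
        name: value ** 2 / len(records)
        for name, value in section_values(records).items()
    }


example_records = (2, 3, 1, 4)  # hypothetical scores, not taken from Table Two
split = sum_of_squares_split(example_records)
print(section_values(example_records))
print(split, sum(split.values()), sum(x * x for x in example_records))
```

Because the four coefficient rows are orthogonal and each has squared length 4, the four pieces always add up to the raw sum of squares of the records, whatever the data.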
The description of the experiment in some respects is positive almost certainly were not a these 8 combinations into pairs in different ways, and then resolved to have subjects associated with each of the 6 pairings. If he grouped his subjects into 8 blocks of on age, skill, or any other character and then associated two of these blocks with each set of pairings, the above analysis with 7 d.f. for replicates is correct. If he chose subjects at random for each of the 6 pairings, "Replicates" would disappear from the analysis and error would have d.f. Other possibilities exist. The experiment gives no suggestion of significant differences between replicates, and the actual randomization matters little here. The principle is important, and in other circumstances exact knowledge of how the randomization was conducted would be vital.

Section II
Replicate Subjects Total
(), d, C, des cs ds - - - 8, dc
(), d, cs dcs - - c, s dc, ds -
(), d, C, dc, ds s dcs cs - - - -
(), d, dc c 8, des ds, cs
T DT CT DCT ST DST CST DCST
6 8 - - - _ - - - - -
Total Divisors 7 8 6 - - - 6 6-5 -7 96
Source d.f. Sum of squares Mean squares
T. DT. CT.8 DCT.89 ST. DST. CST. DCST.5
Error II 8.5.55
Total II.75

Fractional Replication

Nothing has yet been said about fract this complication would not enter. The noted earlier. Closer inspection shows IV are based on half-replicates from does not conform to this pattern, pre any one of the pairs of treatments in the half-replicate structure would have have been avoided if the order within th of subjects, so giving one complete rep The half-replicate structure was pre wisely. The flaw in Replicate I was pre of non-orthogonalities that it entails would design. The indications are that factor U here, therefore, the table of crude scores " + 9" and "' + 9'" were completely interchanged. Replicate I is then based on a half-replicate from CTU.

From this discussion, it follows that the main effect C in the Section I analysis can also be interpreted as a CSTU interaction since it is calculated solely from Replicate, and that the interaction DS can be interpreted as partly DTU (Replicate ), partly DCSTU (Replicate ). These aliases might need further examination if other parts of the analysis disclosed any influence of factor U.

The "dishonesty" over subjects, 9 is unlikely to affect the interpretation at all seriously. Indeed the non-orthogonality is so slight that the easiest way of dealing with the situation might be to adjust by covariance on a dummy variate.
Section III
Replicate Subjects Total
dcs, () cs, d - - c, ds dc, s - - - cs, () dcs, d - - - - 8, c ds, dc ds, () s, d dcs, c cs, dc - - dc, () c, d - - dcs, s - - cs, ds - - -6-7
DT CT DCT ST DST CST DCST
- - -9 7 - - - - - 7-9 - -5 -
Total 8-8 8 6-8
Divisors 96 96 6 96 6 6

Source d.f. Sum of squares Mean squares
DT.8 CT DCT ST DST CST DCST.8..8.5..
Error III 5 9..7
Total III 8.75
Section IV
Replicate Subjects Total
dcs, () cs, d - 6-5 -5 c, ds dc, s + - - - - - cs, () dcs, d - - - 8, c - - ds, dc ds, () s, d dcs, c cs, dc - - - - 5 dc, () c, d - dcs, s cs, ds 5 9
D C DC S DS CS DCS
-8 - - - - - - -9 - - -9-7 - 7 - -
Total - -6 -
Divisors 96 96 6 96 6 6

Source d.f. Sum of squares Mean squares
D 6. C. DC.56 S.7 DS.5 CS.56 DCS 6.
Error IV 5.5.8
Total IV 5.75
Conclusions on D, C, S, T

The four sections of the analysis of variance computed separately can be put into a single table. In fact, the totals with,,, d.f. add to 8.9, the sum of squares for all the data, with 7 d.f. However, the indications are that the error mean squares differ to an important extent. It was to be expected that the inter-subject error (Section I) would exceed the intra-subject error, but it is surprising that the error for Section IV exceeds the other two. No explanation is apparent, but the fact must be accepted and mean squares must be tested only against the appropriate error mean squares calculated from comparisons of contrasts of uniform type.

So far, all effects have been named by aliases not involving U. Before discussing U, tentative conclusions can be stated on the assumption that U and its interactions were of no importance. They may need modifying if closer examination shows this assumption to be false.

Section I shows no sign of significant effects of treatments; only large effects could be clearly demonstrated by inter-subject comparisons.

Section II apparently shows a significant DCT interaction, scarcely very meaningful in itself and very possibly a chance result, since in a set of 8 tests of significance one mean square might quite easily happen to exceed the .05 probability level.

Section III shows highly significant DT and CT interactions: the effect of factor D was to increase the Saturday score and that of C was to decrease the Saturday score, but on Sunday neither had much effect!

Section IV shows a significant overall effect of D and a DCS interaction that defies explanation except as a random effect.

Conclusions on U

The analyses could have been organized so as to introduce U into the naming of every contrast. The alias rules must be remembered: CTU = 1 for Replicates, STU = 1 for Replicates, Hence U has CT for an alias in Replicates, and ST for an alias in Replicates,.
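The alias rules quoted above follow from multiplying effect symbols with every squared letter cancelling, so that the alias of an effect is its product with the defining contrast. The sketch below illustrates the rule; the function name and the letter ordering are assumptions made for the illustration, and the identity is written as 1.

```python
def multiply(word_a, word_b, order="DCSTU"):
    """Product of two effect symbols, with squared letters cancelling.

    Letters appearing in both words drop out; the result is written in the
    given factor order, and the identity is returned as "1".
    """
    letters = set(word_a) ^ set(word_b)  # symmetric difference
    return "".join(f for f in order if f in letters) or "1"


# Aliases of U under the two defining contrasts named in the text.
print(multiply("U", "CTU"))   # alias of U where CTU = 1
print(multiply("U", "STU"))   # alias of U where STU = 1

# Aliases discussed under "Fractional Replication".
print(multiply("C", "STU"))
print(multiply("DS", "STU"))
print(multiply("DS", "CTU"))
```

The same function reproduces the reinterpretations given earlier in the reply: C aliased with CSTU, and DS aliased with DTU under one defining contrast and with DCSTU under the other.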
A these arose in Section III, giving as the total contrast for U + + - 9 = -7. The square (-7)/8 clearly makes no important contribution to th for Section III. By use of the aliases, the contrasts already listed for Section III may be renamed as

Replicate DCU U CSU DSU DCSU DU CU SU
- - - -9-7 - - - - - - - 7-9 - - - - -5
Total 7-7 -5 - -5
Divisors 6 8 6 6 96

An analysis of variance constructed from th would simply show an inexplicable DCU interaction U is important, it seems pointless to pursue estimations of the two sets of contrasts asso orthogonal procedures. Section IV may be sim with TU: these are equally unexciting. Anal unrewarding.

Thus the conclusion seems almost inescapable anomaly. If U were totally without effect, t might reasonably have equal expectations, an
Some of these anomalies (interactions following no sensible pattern, many mean squares less than the corresponding error mean squares, error heterogeneities) are the kind of thing that would arise if the data were artificial. One wonders whether the querist has given actual records or has modified or manufactured them for the sake of a simple example. Whatever the facts, no good reason appears for worrying about the "faking" of results for subjects, 9 as an aid to exposition.

Expected mean squares

In the statement of the query, these are totally wrong. If it be recognized that four error variances, σ²_I, σ²_II, σ²_III, σ²_IV easily written down along or

Design

This very long account of t confounding and alias-struct cate" of 8 subjects should no were good reason for wantin basing of everything on half DCSTU = 1 might have been better than the use of two different half-replicate sy insufficient attention was given to the choice of confounding.