Validating a Special Olympics Volleyball Skills Assessment Test


RESEARCH

ADAPTED PHYSICAL ACTIVITY QUARTERLY, 1996, 13, 166-179
© 1996 Human Kinetics Publishers, Inc.

Validating a Special Olympics Volleyball Skills Assessment Test

Steve B. Downs and Terry M. Wood
Oregon State University

This study examined the validity and reliability of a Volleyball Skills Assessment Test (VSAT) as a measure of volleyball skill and as a predictor of team success in Special Olympics International (SOI) volleyball competition. Test-retest reliability data from 130 SOI volleyball players with mental retardation (101 males and 29 females) in the sixth week of an SOI volleyball training program yielded intraclass reliability coefficients (R) above .80 for all VSAT subtests (forearm pass, spike, set, serve) across gender, with the exception of the set test for females (R = .75). Multivariate test battery test-retest reliability, examined using canonical correlation analysis, yielded moderate total redundancy estimates ranging between 62.5% and 66.6%. A high degree of concurrent validity was evidenced when correlating VSAT scores with judges' ratings of performance on the four skills: r = .93 (r² = .86) for the serve, r = .94 (r² = .88) for the pass, r = .98 (r² = .96) for the spike, and r = .86 (r² = .74) for the set. Contingency table analysis, multiple regression, and discriminant function analysis revealed that the predictive validity of the VSAT as the primary determinant for allocating teams to pools of equal ability is questionable.

Special Olympics International (SOI) provides year-round sports training and competition in 22 different individual and team sports, including volleyball, for over 1 million children and adults with mental retardation (SOI, 1994). Philosophically, SOI emphasizes challenge and participation rather than winning. That is, "the spirit that brings participants to the starting line is more important... than the skill that carries Special Olympic athletes across the finish line" (Dunn & Fait, 1989, p. 526). In line with such a philosophy, SOI uses sport skill tests to monitor athletes' progress in sport skill acquisition and to aid in seeding teams and individuals in order to develop fair and competitive ability groupings.

To assist seeding volleyball teams at Special Olympic local, area, or chapter competitions, several criteria are utilized, including information regarding athletes' volleyball skill levels. All volleyball athletes must complete a Special Olympics

The authors are with the Department of Exercise and Sport Science, WB 203D, Oregon State University, Corvallis, OR 97331. Direct correspondence to Terry M. Wood.

Volleyball Skills Assessment Test (VSAT) in order to be eligible to compete on a team. VSAT scores from each team are used by tournament officials, along with the results of pretournament play, to seed teams and to group teams into pools of relatively equal ability.

The use of test scores to predict group membership is not uncommon. In personnel psychology there is a long-standing tradition of predicting job success based on a combination of written and performance aptitude tests, while achievement in such sports as wrestling (Nagle, Morgan, Hellickson, Serfass, & Alexander, 1975) and rowing (Morgan & Johnson, 1978) has been predicted with some success based on a combination of physiological and psychological measures.

Volleyball is just one of 22 sports currently offered by SOI, yet few sports have a documented valid and reliable sports skills assessment test (A. Lynch, Director of Coaches Education, SOI, personal communication, March 8, 1994). Yet to be examined is the validity of using VSAT scores for seeding and grouping Special Olympic volleyball teams for tournament play. Moreover, there is no evidence to support the validity and reliability of the VSAT as a measure of volleyball skill for Special Olympic athletes. The purpose of this study is to examine the validity and reliability of the VSAT as a measure of volleyball skill and as a predictor of team success in Special Olympic volleyball competition.

The Special Olympics Volleyball Skills Assessment Test

The VSAT (see Note 1) is an adaptation of a volleyball skills test battery originally developed for use in college physical education classes at North Carolina State University (NCSU) (Bartlett, Smith, Davis, & Peel, 1991). The NCSU Volleyball Skills Test Battery consists of three tests: serve, forearm pass, and set. Intraclass test-retest reliability coefficients for the NCSU test, based on a sample of 313 college-aged males and females enrolled in beginning coed volleyball classes, were .65, .73, and .88 for the serve, forearm pass, and set tests, respectively (Bartlett et al., 1991). Only one test was administered during a given class period, and testing was done by the physical education teacher. A 2-day interval was allowed between the test and the retest. Evidence for validity of the tests was claimed based on face validity "because the ability to serve a ball, receive a ball with the forearm pass (coming across the net), and set a ball (coming from different angles) are basic volleyball skills" (Bartlett et al., 1991, p. 20). To date, evidence for the concurrent and predictive validity of the NCSU test has not been presented.

Modified from the NCSU Volleyball Skills Test Battery, the Special Olympics VSAT examines four skills: serving, forearm passing, setting, and spiking. The VSAT serve test is unmodified from the NCSU version. Both serve tests require that the volleyball court be marked into five target areas in a W-formation; Bartlett and colleagues (1991) explain that this is "the most common pattern of serve reception of beginning volleyball players" (p. 20). Points are given for successful serves that land in the opponent's court without hitting the net or antenna. Target areas are worth 2 to 4 points, and each athlete serves 10 volleyballs for a maximum of 40 possible points. The serving score is the best of two trials for Special Olympians.
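To make the serve-test arithmetic concrete, here is a minimal sketch of the scoring just described: 10 serves per trial, target zones worth 2 to 4 points, and the better of two trials recorded. The zone names and point assignments are hypothetical, since the W-formation court diagram is not reproduced in the text.

```python
# Hypothetical zone labels and point values; the VSAT marks five target
# areas worth 2-4 points, but the exact layout is not given in the text.
ZONE_POINTS = {"deep_left": 4, "deep_right": 4, "deep_middle": 3,
               "short_left": 2, "short_right": 2}

def score_serve_trial(landings):
    """Sum the points for up to 10 serves; a fault (net, antenna, or out)
    can be recorded as None and scores nothing."""
    return sum(ZONE_POINTS.get(zone, 0) for zone in landings[:10])

def serve_subtest_score(trial_1, trial_2):
    """The VSAT records the better of two 10-serve trials (max 40 points)."""
    return max(score_serve_trial(trial_1), score_serve_trial(trial_2))
```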

The VSAT forearm pass test was modified from the NCSU test. The athlete is required, while standing in the right and left back positions, to underhand pass balls that are tossed from the opponent's side of the court. Athletes attempt to pass five balls from each position toward a target area located near the net (within the 3 m line). The NCSU test requires that the ball pass over a rope hung parallel to and 2.43 m above the attack line before landing in the target area; the VSAT requires only that the ball travel above the height of the net. The VSAT target area is also much larger than the NCSU version. Each of 10 successful pass attempts is scored from 1 to 5 points according to where the ball lands in the target area, for a maximum total of 50 points. For Special Olympians the better of two trials is recorded.

The VSAT did not adopt the NCSU setting test, which requires individuals to receive and overhand set 10 volleyballs from a tosser, placing each ball into a predetermined area for a possible total of 50 points. Rather, the VSAT requires the athlete to alternately bump (forearm pass) and set the ball continuously to him- or herself: the athlete tosses the ball upward and then keeps it in the air with alternating bumps and sets. The athlete must stay within the half-court lines and is provided four trials to gain his or her best score. A maximum of 50 points (25 bumps and 25 sets) is possible.

Unlike the NCSU test, the VSAT tests spiking (attacking) ability. With the athlete starting on the court near the 3 m line, a coach tosses a volleyball in front of the athlete at a height of 2 m above the top of the net. Taking a spiking approach, the athlete attempts to spike the ball within the boundaries of the opponent's court. Each athlete is given 10 attempts, and a score of either 1 or 2 points (depending on where the ball lands in the court) is awarded for successful hits. A maximum of 20 points is possible, and the better of two trials is recorded.

The VSAT score for each athlete is the accumulation of the best trial scores from the four subskills, for a maximum total of 160 points (set, 50 points; forearm pass, 50 points; serve, 40 points; spike, 20 points). To assist seeding in competition, a team score is determined by adding the eight best individual VSAT scores from the team and dividing that total by eight (SOI, 1992).
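The composite and team scores described above reduce to a few lines of code. The sketch below assumes a roster of at least eight tested athletes; the function names are illustrative, not part of the SOI materials.

```python
def athlete_total(serve, forearm_pass, bump_set, spike):
    """Composite VSAT score: sum of best-trial subtest scores
    (serve <= 40, forearm pass <= 50, bump/set <= 50, spike <= 20; max 160)."""
    return serve + forearm_pass + bump_set + spike

def team_seeding_score(athlete_totals):
    """Team score per SOI (1992): the eight best individual VSAT totals,
    summed and divided by eight. Assumes at least eight tested athletes."""
    top_eight = sorted(athlete_totals, reverse=True)[:8]
    return sum(top_eight) / 8.0
```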

Reliability Analysis

Subjects

One hundred and thirty individuals with mental retardation (101 males, mean age 30.0 ± 8.2 years, and 29 females, mean age 26.5 ± 8.5 years) who were in their sixth week (±1 week) of a Special Olympic volleyball training program participated in this study. Subjects were members of 16 Special Olympic volleyball teams training for a statewide volleyball competition. In compliance with the Oregon State University Institutional Review Board for the Protection of Human Subjects, written informed consent was acquired from all subjects and/or legal guardians through the California Special Olympics.

Procedures

For each team, the testing schedule involved two 1-day sessions (test and retest) separated by 4 days. All tests were administered in an indoor gymnasium by the principal investigator. During data collection, subjects were introduced to the purpose of the testing session (most had been tested in previous years). After familiarization (i.e., orientation to the gymnasium, court boundaries, nets, and balls), subjects performed a group 10-min warm-up (light jogging and stretching) followed by two volleyball drills (serving and passing) to prepare for the testing session. The warm-up drills were different from the VSAT serve and pass subtests. Following the warm-up and drilling sessions, each athlete was administered the VSAT (see Note 2). Subjects were instructed on the guidelines and proper techniques of serving and were provided three practice trials. The coach (or coaches) from the team being tested assisted with instructions and helped answer questions. With the assistance of a recorder, the principal investigator evaluated the performance of each athlete. Following serving, subjects were instructed in the forearm pass and were provided a short practice and question/answer period before being tested. These procedures were repeated prior to the administration of each subtest. Subjects were retested 4 days later at the same location with identical procedures and personnel.

Statistical Analysis

SPSS/PC+ Version 5.0 Professional Statistics (Norusis, 1992a) and Advanced Statistics (Norusis, 1992b) packages for the IBM-compatible personal computer were used to analyze all data. Descriptive statistics were calculated for VSAT serving, passing, setting, and spiking scores. Intraclass correlation coefficients (R) were calculated to estimate the reliability of each VSAT subtest separately. In addition, test battery test-retest reliability was estimated using canonical correlation, redundancy estimates, and structure coefficients, following procedures recommended by Wood and Safrit (1984, 1987) and Safrit and Wood (1987).
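The paper does not state which intraclass model was used, so the sketch below estimates R from a two-way (subjects x administrations) ANOVA, one common choice; it returns both the reliability of the mean of the k administrations (the "2 days" coefficients reported later) and of a single administration (the "1 day" coefficients).

```python
import numpy as np

def intraclass_reliability(scores):
    """scores: (subjects x administrations) array for one subtest.
    Returns R for the mean of the k administrations and R for a single
    administration, derived from two-way ANOVA mean squares."""
    n, k = scores.shape
    grand = scores.mean()
    ms_subjects = k * ((scores.mean(axis=1) - grand) ** 2).sum() / (n - 1)
    resid = (scores - scores.mean(axis=1, keepdims=True)
             - scores.mean(axis=0) + grand)
    ms_error = (resid ** 2).sum() / ((n - 1) * (k - 1))
    r_mean = (ms_subjects - ms_error) / ms_subjects
    r_single = (ms_subjects - ms_error) / (ms_subjects + (k - 1) * ms_error)
    return r_mean, r_single
```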

Results

Descriptive Statistics by Test Occasion. Mean scores and standard deviations for males, females, and the total group are reported in Table 1 for each VSAT subtest. Mean scores for males were higher than those for females on all subtests. Variability of scores within each subtest tended to be high; however, with the exception of the spike test, between-day mean differences were negligible.

Table 1 VSAT Means and Standard Deviations by Test Occasion and Gender. (Most cell values did not survive extraction; the recoverable males' test-occasion entries are Serve 19.81 ± 8.44, Pass 13.00, Spike 6.28 ± 5.04, Set 5.31.)

Intraclass Test-Retest Reliability. Intraclass correlation was explored to estimate the stability of scores over 1 and 2 days for males, females, and the total group; these data are presented in Table 2. Stability of test scores over 2 days, with 4 days between test and retest, was .82 or higher with the exception of the set test for females (R = .75). Estimated reliability for a single administration of each subtest ranged from .70 to .82, with the exception of the set test for females (R = .60). These data indicate that the mean of VSAT scores over 2 days is preferable to scores from a single administration of the test. As comparison values, Bartlett et al. (1991) reported the following intraclass test-retest reliability coefficients (2 days between testing) for the NCSU test, using a sample of 313 college-aged males and females enrolled in coed volleyball classes: .65 for the serve test, .73 for the forearm pass test, and .88 for the set test.

Table 2 Intraclass Test-Retest Reliability of VSAT Subtests

                        Serve   Pass   Set    Spike
Total group (n = 130)
  2 days                 .88     .87    .85    .83
  1 day                  .79     .78    .74    .70
Males (n = 101)
  2 days                 .85     .86    .86    .82
  1 day                  .74     .76    .76    .70
Females (n = 29)
  2 days                 .90     .90    .75    .83
  1 day                  .82     .82    .60    .70

Test Battery Reliability. Another approach to estimating the reliability of a battery of tests employs canonical correlation analysis (CCA) to estimate total shared variability between two administrations of the battery (Wood & Safrit, 1984). Although this method is computationally complex, it provides (a) a single index (the total redundancy index) indicating the magnitude of shared variability between test administrations and (b) a means for determining which subtests may be contributing to unreliability (analysis of structure coefficients). Table 3 presents the CCA of the VSAT test-retest data for the total group. (Note that CCA for each gender was not computed due to the small sample size relative to the number of variables in the CCA.) Examination of Table 3 reveals four statistically significant (p < .01) canonical correlations and total shared variability (i.e., redundancy) between the two administrations of the test battery ranging between 62.5% and 66.6% (p < .01). Consistency of the magnitude and algebraic sign of the structure coefficients for each subtest across test administrations within each canonical correlation presents further evidence for the stability of these tests over time (Wood & Safrit, 1984). In general, the CCA confirms the univariate intraclass reliability analysis of the VSAT as moderately high.

Table 3 Canonical Correlations, Redundancy Estimates, and Structure Coefficients. (Individual coefficients did not survive extraction. The table notes report a total redundancy of 62.52% for administration 1 given administration 2 and 66.65% for administration 2 given administration 1, both p < .01 using Miller's (1975) F criteria; all four canonical correlations were significant at p < .01 using Bartlett's sequential chi-square procedure, Marascuilo & Levin, 1983.)
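For readers without the Wood and Safrit (1984) procedure at hand, the total (Stewart-Love) redundancy index that the CCA yields can also be obtained without an explicit CCA: the total redundancy of the retest battery given the first administration equals the average squared multiple correlation from regressing each retest subtest on all first-administration subtests. The sketch below uses that identity as a computational convenience; it is not the authors' SPSS routine.

```python
import numpy as np

def total_redundancy(X, Y):
    """Total redundancy of battery Y given battery X (rows = athletes,
    columns = subtests): the mean R^2 over regressions of each Y column
    on all X columns. For the other direction, swap X and Y."""
    Xd = np.column_stack([np.ones(len(X)), X])   # add an intercept column
    r2 = []
    for j in range(Y.shape[1]):
        beta, *_ = np.linalg.lstsq(Xd, Y[:, j], rcond=None)
        residual = Y[:, j] - Xd @ beta
        ss_res = (residual ** 2).sum()
        ss_tot = ((Y[:, j] - Y[:, j].mean()) ** 2).sum()
        r2.append(1.0 - ss_res / ss_tot)
    return float(np.mean(r2))
```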

Validity Analysis

Validity refers to the appropriateness of test score interpretation. The types of evidence provided for test validation depend in large part on the purposes for using test scores. To date, validation evidence for the NCSU test and the VSAT consists of logical validity, or the degree to which the elements measured by the test match the stated purpose of the test. The VSAT, however, is being employed in Special Olympics volleyball tournaments for seeding teams and grouping teams into pools of relatively equal ability. Therefore, the purpose of this study was to provide (a) criterion-related evidence for the validity of the VSAT as a volleyball skills test for SOI athletes and (b) evidence of the utility of the VSAT as a predictor of team success in SOI tournament play.

Criterion-Related Evidence for Validity

Subjects. Using a table of random numbers, we randomly selected 30 Special Olympic volleyball players (15 males, mean age 29.9 years, range 19-54 years, and 15 females, mean age 27.5 years, range 14-42 years) from the 130 athletes involved in the reliability analysis.

Sample size was based on obtaining a balance between time constraints and achieving stability of results.

Procedures. An independent testing session was conducted with the 30 subjects using the same VSAT protocol described in the reliability section. The criterion measure consisted of judges' ratings of skill. Two volleyball coaches with an average of 6 years of Special Olympic volleyball coaching experience, and two high school varsity volleyball coaches with no experience with Special Olympians but with an average of 8 years of volleyball coaching experience, were asked to observe the testing session and to independently evaluate each subject on each of the VSAT subtests. Prior to the testing session, the principal investigator met collectively with the coaches to describe their responsibilities and to instruct them in using the assessment rating scale (see Figure 1) to evaluate the athletes. The coaches were not informed of the athletes' previous VSAT scores or of the scores received during the validity study.

Serving
Purpose: To serve the ball from the service area into the opponent's court. Each athlete will have 10 attempts. Women will serve on a women's-height net.
Rating:
Effectiveness: How effective is the athlete's serve? Would opponents have a difficult time returning the serve? Trajectory? Speed? Spin? (1 Poor, 2 Fair, 3 Moderate, 4 Good, 5 Excellent)
Placement: Does the athlete demonstrate command of placing the ball? Does the placement make it difficult for the opponents? (1 Poor, 2 Fair, 3 Moderate, 4 Good, 5 Excellent)
Form: How well does the athlete execute the serving motion? Step toward the target? Shift weight? Arm extension? (1 Poor, 2 Fair, 3 Moderate, 4 Good, 5 Excellent)
Overall: How would you rate the athlete's overall serving ability? (1 Poor, 2 Fair, 3 Moderate, 4 Good, 5 Excellent)

Figure 1 - Sample judge's rating scale for the serve.

Coaches were asked to evaluate each athlete in four areas within each VSAT subtest. For example, while an athlete was performing the serve test, each coach was asked to evaluate the athlete on serving form, placement of the serve, effectiveness of the serve, and overall accomplishment in serving. Each area for a given skill was evaluated on a scale of 1 to 5, with a score of 5 representing the highest possible level of skill proficiency. Coaches were not permitted to compare or modify scores based upon the evaluations of the other raters.

Data Analysis. Interrater reliability of judges' ratings was assessed using generalizability theory, with judges as the single facet and subjects as the facet of differentiation. Criterion-related evidence for validity was assessed for each subtest separately via Pearson product-moment correlations between judges' ratings and VSAT scores. To simplify all analyses, judges' ratings were reduced to composite scores for each of the four skills. For example, form, placement, and effectiveness ratings for the serve were averaged to obtain a single serve rating for each subject from each judge. The composite scores were employed throughout the analyses.

Interrater Reliability. Generalizability analysis partitions the total variability of subjects' scores collected over various conditions of measurement (e.g., judges) to estimate sources of measurement error. Variability between subjects (S) is expected and is known as "true-score" or "universe-score" variability. Variability due to conditions of measurement, such as the ratings by different judges (J), is another source of variation in scores. In the present study, variation in scores attributable to differences in judges' ratings is considered undesirable, reflecting unreliability of scores across judges. A third source of variability, the interaction of subjects and judges' ratings, quantifies the portion of the total variability in scores that is not accounted for by universe-score variance and between-judges variance. This third source of variability, labeled S x J in this study, is attributed to random error and/or sources of error not accounted for in the design of the generalizability analysis.

Generalizability analyses typically consist of two stages. In the first stage, known as a generalizability study or G-study, the various components or "facets" of measurement error to be examined are identified, data are collected, and ANOVA procedures are used to estimate the proportion of the total variance in scores attributable to universe-score variance, variance due to the facets, and random measurement error. Facets that account for a large portion of the total variability in scores indicate significant sources of measurement error. In the second stage, known as a decision study or D-study, the variance estimates computed in the G-study are used to compute one or more generalizability coefficients that provide an index of the reliability or dependability of scores across the facets in the design. G-coefficients can be interpreted much like reliability coefficients. G-study (see Table 4) and D-study (see Table 5) results were computed using the GENOVA v2.2 computer program (Crick & Brennan, 1984).

Table 4 G-Study Results for Interrater Reliability

Source of variability   Serve   Pass    Set     Spike
Subjects (S)            76.94   87.36   80.63   87.44
Judges (J)               5.17    5.76    8.15    3.73
S x J                   17.89    6.88   11.22    8.83

Note. Values represent percentage of total variability.

Examination of Table 4 indicates that the percentage of variability in mean ratings due to judges was quite low, ranging from 8% for the set test to 4% for the spike test. Unexplained variability (S x J) was also quite low, ranging from 18% for the serve test to 7% for the pass test.
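The paper's G-study was run in GENOVA; the sketch below shows the same single-facet, fully crossed estimation in generic form, deriving the three variance components from ANOVA mean squares and projecting a D-study G coefficient for any number of judges. It is a plausible reconstruction of the computation, not the authors' code.

```python
import numpy as np

def g_study(ratings):
    """ratings: (athletes x judges) array for one subtest, one rating per
    cell. Returns variance-component estimates for subjects (S), judges (J),
    and the residual S x J term (confounded with random error)."""
    n_s, n_j = ratings.shape
    grand = ratings.mean()
    ms_s = n_j * ((ratings.mean(axis=1) - grand) ** 2).sum() / (n_s - 1)
    ms_j = n_s * ((ratings.mean(axis=0) - grand) ** 2).sum() / (n_j - 1)
    resid = (ratings - ratings.mean(axis=1, keepdims=True)
             - ratings.mean(axis=0) + grand)
    ms_sj = (resid ** 2).sum() / ((n_s - 1) * (n_j - 1))
    var_sj = ms_sj                          # S x J, confounded with error
    var_s = max((ms_s - ms_sj) / n_j, 0.0)  # universe-score variance
    var_j = max((ms_j - ms_sj) / n_s, 0.0)  # between-judges variance
    return var_s, var_j, var_sj

def d_study_g(var_s, var_sj, n_judges):
    """Relative G coefficient when scores are averaged over n_judges judges
    (the quantity tabulated for one to four judges in Table 5)."""
    return var_s / (var_s + var_sj / n_judges)
```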
Generalizability (i.e., dependability or reliability) of ratings across judges was evidenced by substantial generalizability coefficients ranging from .81 to .98, indicating a high degree of interrater consistency. The high degree of interrater consistency provided a rationale for computing mean scores across judges for each subtest.

The mean scores across judges for each subtest were used as the criterion test in the validity analysis.

Table 5 D-Study Generalizability Coefficients for Interrater Reliability. (Coefficients for one to four judges on the serve, pass, set, and spike subtests did not survive extraction; the text reports a range of .81 to .98.)

Criterion-Related Validity. Validity coefficients indicating the degree of relationship between VSAT subtest scores and judges' mean subtest ratings were .93 (r² = .86) for the serve, .94 (r² = .88) for the pass, .98 (r² = .96) for the spike, and .86 (r² = .74) for the set. The coefficients of determination (r²) indicate a high degree of shared variability between the judges' ratings and the VSAT subtests.

Predictive Validity

To determine fair and competitive ability groupings for SOI volleyball competition, SOI tournament directors employ VSAT scores in seeding teams. Specifically, participating athletes are tested on the VSAT prior to tournament play. Teams are then clustered into divisions based on the average VSAT total score for the top eight players on a team (see Note 3); for example, teams with the highest VSAT average total scores are clustered into the top division, and teams with the lowest VSAT average total scores are clustered into the bottom division. To ensure that the seeding system using VSAT scores is effective, it is imperative that the predictive validity of the VSAT be examined.

Subjects and Procedures. For the purposes of this analysis, team data were examined from three SOI state coed volleyball tournaments (two in California and one in Oregon) held in 1993 and 1994. For each team (19 teams in the Oregon tournament, 20 and 19 teams in the two California tournaments), the following data were recorded for the top eight players (see Note 4): average age; male-to-female ratio; average VSAT scores for the serve, pass, spike, and set subtests; and average VSAT total score. In addition, the following team data were recorded: overall team ranking based on team average VSAT score (VRANK), overall team ranking based on the independent observation of qualified judges (JRANK), allocated ability pool based on VSAT total score (i.e., actual seeding during the tournament), and allocated ability pool based on judges' observations. Three to four qualified judges observed the teams during each tournament. At the completion of the tournament, the judges, along with the tournament director, decided as a group the overall ranking of the teams and the appropriate ability pool for each team. The judges' team rankings and ability pool allocations were used as an independent "gold standard" against which to judge the effectiveness of the VSAT total score in predicting the ability pool of teams.

Statistical Analysis. The predictive validity of the VSAT for allocating coed teams to ability pools in SOI volleyball tournaments was examined from several perspectives. The degree of agreement between the allocated ability pool based on average team total VSAT score (VPOOL) and the ability pool based on judges' rankings (JPOOL) was quantified via cross-tabulation of the data and examination of contingency (C) and kappa (K) coefficients (Norusis, 1992c). C is an index of the proportion of teams allocated to the same pool by both methods, while K applies a correction to C for chance agreement. Typically, K values will be smaller in magnitude than C values, since in the former the effect of chance or random allocation is statistically removed. Both indices have an effective range of 0 to 1.00, with higher values indicating greater agreement between allocation methods. However, the upper limit of K will be less than 1.00 if the marginal proportions of the contingency table are not symmetrical. In the case of asymmetrical marginals, the maximum value of K (max K) should be reported to aid interpretation (Looney, 1989).

The Spearman rank order correlation between JRANK and VRANK provided minimal evidence for predictive validity, because the overall team ranking based on VSAT total scores should at least be similar to team rankings determined by qualified judges. To further explore the nature of the relationship of the VSAT subtests, age, and gender composition of teams as predictors of VRANK, JRANK, and JPOOL, multiple-regression analysis using a forward selection procedure (Norusis, 1992c) was employed for each dependent variable separately. Last, discriminant function analysis (Norusis, 1992a) with JPOOL as the grouping variable was used to determine which combination of age, serve test, pass test, spike test, set test, and gender ratio best discriminated among ability pools and most accurately classified teams into pools. Only teams from the Oregon and one California tournament were used in the discriminant analysis, since the second California tournament used a slightly different pooling scheme.

Results. C, K, and max K coefficients from the cross-tabulation analysis (Norusis, 1992c) are presented in Table 6. The magnitude of C reveals that VPOOL agrees with JPOOL for less than two-thirds of the cases. With agreement by chance taken into account, K coefficients range from .37 to .47, supporting the conclusion that using average team total VSAT scores to allocate teams into ability pools is only moderately successful. Spearman rank order correlations between VRANK and JRANK ranged from .77 to .78 over the three tournaments, indicating that overall tournament team rankings based on average team total VSAT scores are moderately related to team rankings based on judges' observations.

Table 6 C and K Coefficients by Tournament Location. (Rows: California 1, California 2, Oregon; columns: C, K, max K. Numeric entries did not survive extraction; the text reports K between .37 and .47.)
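The agreement indices above come from SPSS CROSSTABS; the sketch below computes plausible equivalents from the pooled allocation table. Whether the paper's C is Pearson's contingency coefficient or the raw same-pool proportion is not fully clear from the text, so both are returned, along with Cohen's kappa and the maximum kappa attainable given the marginals.

```python
import numpy as np

def pool_agreement(table):
    """table: square contingency table of counts (rows = JPOOL,
    columns = VPOOL). Returns Pearson's contingency coefficient C, the
    observed same-pool proportion, Cohen's kappa, and max kappa."""
    t = np.asarray(table, dtype=float)
    n = t.sum()
    row, col = t.sum(axis=1), t.sum(axis=0)
    expected = np.outer(row, col) / n
    chi2 = ((t - expected) ** 2 / expected).sum()
    c = np.sqrt(chi2 / (chi2 + n))              # contingency coefficient
    p_obs = np.trace(t) / n                     # same-pool proportion
    p_exp = (row * col).sum() / n ** 2          # chance agreement
    kappa = (p_obs - p_exp) / (1 - p_exp)
    p_max = np.minimum(row, col).sum() / n      # ceiling given marginals
    max_kappa = (p_max - p_exp) / (1 - p_exp)
    return c, p_obs, kappa, max_kappa
```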

Table 7 summarizes the multiple regression analysis (Norusis, 1992c). Using VRANK as the dependent measure lets us explore the contribution of each subtest in predicting team ranking based on team average total VSAT score. Although 93% of the variability in VRANK is explained by the four subtests, the pass subtest (R² = .74) contributes most to the relationship, while the set test adds 10% more explanatory power. It could therefore be argued that administering only the pass and set tests would provide reasonable prediction of team ranking, saving time and energy when administering the VSAT. When JRANK and JPOOL were used as the dependent variables, the regression analysis revealed poor predictive power. A combination of the pass and spike subtests and gender yielded an R² of .53 when predicting JRANK, while the spike test was the lone predictor of JPOOL (R² = .24). These results provide further evidence that the VSAT is a marginal predictor of team volleyball ability as measured by ranking.

Table 7 Predictive Validity Regression Analysis

            VRANK               JRANK               JPOOL
Predictor   Step   R²    R²Δ    Step   R²    R²Δ    Step   R²    R²Δ
Pass        1      .74   .74    1      .42   .42
Spike       3      .90   .06    2      .50   .07    1      .24   .24
Serve       4      .93   .03
Bump/set    2      .84   .10
Age
Gender                          3      .53   .04

Note. Step = step at which the predictor entered the equation; R²Δ = change in R² attributable to the predictor. All partial Fs significant at p < .05.

Discriminant function analysis (Norusis, 1992a), using JPOOL as the grouping variable and age, gender, serve, pass, spike, and bump/set scores as the independent measures, yielded one statistically significant discriminant function characterized by the pass, spike, and serve tests (structure coefficients greater than .5). Table 8 summarizes the classification results from the discriminant analysis. Overall, the discriminant classification function correctly classified 68% of the teams. The greatest classification accuracy (83.3%) occurred with the low-ability teams and the lowest accuracy with the medium-ability teams (57.1%).

Table 8 Discriminant Analysis Classification Using VSAT Subtests, Age, and Gender

                                    Predicted group membership
Actual group         No. of cases     1         2         3
High ability (1)     12               8         4         0
                                      66.7%     33.3%     0.0%
Medium ability (2)   14               4         8         2
                                      28.6%     57.1%     14.3%
Low ability (3)      12               0         2        10
                                      0.0%      16.7%    83.3%

Because the pass and spike tests appeared in both the regression and discriminant analyses as the strongest VSAT predictors of team ability, a second discriminant function classification analysis was run using VPOOL as the grouping variable and the pass and spike tests as the independent measures. The summary of classification results in Table 9 reveals that the pass and spike tests alone provided an overall classification accuracy of 67%.

Table 9 Discriminant Analysis Classification Using the Pass and Spike Tests

                                    Predicted group membership
Actual group         No. of cases     1         2         3
High ability (1)     13               9         4         0
                                      69.2%     30.8%     0.0%
Medium ability (2)   14               5         7         2
                                      35.7%     50.0%     14.3%
Low ability (3)      12               0         2        10
                                      0.0%      16.7%    83.3%
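A present-day reader could reproduce the two-predictor classification along the following lines, with scikit-learn's LinearDiscriminantAnalysis standing in for the SPSS DISCRIMINANT procedure the authors used. Resubstitution accuracy (classifying the same teams used to fit the function) is assumed here, since that is how classification tables of this era were typically reported.

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

def classify_pools(pass_spike_scores, pools):
    """pass_spike_scores: (teams x 2) array of team-average pass and spike
    scores; pools: ability-pool labels (1 = high, 2 = medium, 3 = low).
    Returns the resubstitution accuracy and the predicted pools."""
    lda = LinearDiscriminantAnalysis()
    lda.fit(pass_spike_scores, pools)
    predicted = lda.predict(pass_spike_scores)
    accuracy = (predicted == np.asarray(pools)).mean()
    return accuracy, predicted
```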

Discussion

The VSAT appears to be a valid and reliable volleyball skills test for SOI athletes. Stability reliability estimates of the subtests separately and of the test battery as a whole were acceptable. Correlations between judges' ratings of skill performance and each VSAT subtest revealed a strong relationship, thus providing evidence for criterion-related validity.

It should be noted, however, that the validity and reliability coefficients reported in this investigation were derived from data collected under optimal conditions that may not be realized in practice. For example, data were collected by the volleyball venue director of the California Special Olympics (an individual with ample experience in volleyball, in working with athletes with mental retardation, and with the VSAT) with assistance from team coaches. Normally, the VSAT is administered by team coaches and volunteers who may not have experience administering sport skill tests, may not be familiar with the VSAT, may not be familiar with the sport of volleyball, and/or (as reported by S.E. Miller, 1987) may not feel qualified to train or test athletes with mental retardation. In addition, depending on playing experience and level of mental retardation, the athletes may not be familiar with the VSAT or even understand what performance is expected.

The predictive validity of the VSAT as the primary determinant for allocating teams to pools of equal ability is questionable. At best, the evidence reported in this investigation suggests that the VSAT has a moderate ability to accurately seed teams into ability pools.

Overall, approximately two-thirds of the teams were accurately classified, with the greatest accuracy occurring with the low-ability teams and the least accuracy with the medium-ability teams. Moreover, two VSAT subtests (pass and spike) appear to classify athletes into ability pools as accurately as a combination of the four subtests plus age and gender ratio.

Additional methodologies should be explored for ranking volleyball teams prior to tournament competition. For example, one or a combination of the following approaches could be used in conjunction with VSAT scores:

1. Coaches could submit videotapes of their teams' play prior to a tournament for review by the tournament director and/or a panel of qualified coaches.
2. Pretournament seeding games could be organized to help the tournament director and/or a panel of coaches pool the teams into ability groupings.
3. Coaches could be provided with suitable training materials (e.g., videotapes) to aid in accurately administering the VSAT.

The results of the predictive validity study showed consistently that the spike and pass subtests together classify teams as accurately as all four VSAT subtests used together. This agrees with the findings of Nishijima, Ohsawa, and Matsuura (1987), Cox (1974), and Eom and Schutz (1992), who reported that spiking is a primary predictor of team performance for men involved in high-level volleyball competition. Based on the results of the current investigation, greater efficiency in testing could be achieved by administering only the pass and spike tests instead of the total VSAT battery if the primary use of VSAT scores is allocating teams to pools of equal ability.

Validating a test is an ongoing process. Future research regarding the VSAT should aim at cross-validating the predictive validity of the measure, weighting VSAT subtests in the composite score, and exploring the usefulness of the VSAT in predicting the performance of so-called "unified teams" consisting of team members with and without disabilities.

References

Bartlett, J., Smith, L., Davis, K., & Peel, J. (1991). Development of a valid volleyball skills test battery. Journal of Physical Education, Recreation & Dance, 62(2), 19-21.
Cox, R.H. (1974). Relationship between selected volleyball skill components and team performance of men's Northwest "AA" volleyball teams. Research Quarterly, 45, 441-446.
Crick, J.E., & Brennan, R.L. (1984). GENOVA: A general purpose analysis of variance system [Computer program]. Dorchester, MA: University of Massachusetts at Boston, Computer Facilities.
Dunn, J., & Fait, H. (1989). Special physical education: Adapted, individualized, developmental (6th ed.). Dubuque, IA: Brown.
Eom, H.J., & Schutz, R.W. (1992). Statistical analyses of volleyball team performance. Research Quarterly for Exercise and Sport, 63, 11-18.
Looney, M.A. (1989). Criterion-referenced measurement: Reliability. In M.J. Safrit & T.M. Wood (Eds.), Measurement concepts in physical education and exercise science (pp. 137-152). Champaign, IL: Human Kinetics.

Marascuilo, L.R., & Levin, J.R. (1983). Multivariate statistics in the social sciences: A researcher's guide. Monterey, CA: Brooks/Cole.
Miller, J. (1975). The sampling distribution and a test for the significance of the bimultivariate redundancy statistic: A Monte Carlo study. Multivariate Behavioral Research, 10, 233-244.
Miller, S.E. (1987). Training personnel and procedures for Special Olympics athletes. Education and Training in Mental Retardation, 22, 244-249.
Morgan, W.P., & Johnson, R.W. (1978). Personality characteristics of successful and unsuccessful oarsmen. International Journal of Sport Psychology, 9, 119-133.
Nagle, F.J., Morgan, W.P., Hellickson, R.O., Serfass, R.C., & Alexander, J.P. (1975). Spotting success traits in Olympic contenders. The Physician and Sportsmedicine, 3, 31-34.
Nishijima, T., Ohsawa, S., & Matsuura, Y. (1987). The relationship between the game performance and group skill in volleyball. International Journal of Physical Education, 24, 20-26.
Norusis, M.J. (1992a). SPSS/PC+ professional statistics version 5.0 [Computer program]. Chicago: SPSS, Inc.
Norusis, M.J. (1992b). SPSS/PC+ advanced statistics version 5.0 [Computer program]. Chicago: SPSS, Inc.
Norusis, M.J. (1992c). SPSS/PC+ base system user's guide version 5.0 [Software manual]. Chicago: SPSS, Inc.
Safrit, M.J., & Wood, T.M. (1987). The test battery reliability of the Health-Related Physical Fitness Test. Research Quarterly for Exercise and Sport, 58, 160-167.
Special Olympics International. (1992). Official Special Olympics summer sports rules 1992-1995. Washington, DC: Special Olympics International.
Special Olympics International. (1994). Fact sheet. Washington, DC: Special Olympics International.
Wood, T.M., & Safrit, M.J. (1984). A model for estimating the reliability of psychomotor test batteries. Research Quarterly for Exercise and Sport, 55, 53-63.
Wood, T.M., & Safrit, M.J. (1987). A comparison of three multivariate models for estimating test battery reliability. Research Quarterly for Exercise and Sport, 58, 154-163.

Notes

1. For a copy of the Volleyball Skills Assessment Test, contact Special Olympics International, 1325 G St. NW, Suite 500, Washington, DC 20005-3104.
2. All female Special Olympians were tested on a women's-height net as specified by the VSAT guidelines.
3. When possible, tournament directors employ the results of pretournament play during the 10-week training season prior to the tournament to aid in seeding teams. However, since teams change players and may not play against other tournament teams, use of these records in the seeding process is minimal.
4. Current SOI policy seeds teams based on data from the top eight players of each team.

Acknowledgments

We wish to thank Dave Dill and Mike Jette from Special Olympics and Hopi Hoekstra for their assistance with this study.