Test Equating, Scaling, and Linking: Methods and Practices / Edition 2

by Michael J. Kolen, Robert L. Brennan

Test equating methods are used with many standardized tests in education and psychology to ensure that scores from multiple test forms can be used interchangeably. In recent years, researchers from the education, psychology, and statistics communities have contributed to the rapidly growing statistical and psychometric methodologies used in test equating. This book provides an introduction to test equating which both discusses the most frequently used equating methodologies and covers many of the practical issues involved.

This second edition expands upon the coverage of the first edition by providing a new chapter on test scaling and a second on test linking. Test scaling is the process of developing score scales that are used when scores on standardized tests are reported. In test linking, scores from two or more tests are related to one another. Linking has received much recent attention, due largely to investigations of linking similarly named tests from different test publishers or tests constructed for different purposes. The expanded coverage in the second edition also includes methodology for using polytomous item response theory in equating.

The themes of the second edition include:

* the purposes of equating, scaling, and linking and their practical context

* data collection designs

* statistical methodology

* designing reasonable and useful equating, scaling, and linking studies

* importance of test development and quality control processes to equating

* equating error, and the underlying statistical assumptions for equating

Michael J. Kolen is a Professor of Educational Measurement at the University of Iowa. Robert L. Brennan is E. F. Lindquist Chair in Measurement and Testing and Director of the Center for Advanced Studies in Measurement and Assessment at the University of Iowa. Both authors are acknowledged experts on test equating, scaling, and linking; they have authored numerous publications on these subjects and have taught many workshops and courses on equating. Both authors have been President of the National Council on Measurement in Education (NCME), and both received an NCME award for Outstanding Technical Contributions to Educational Measurement following publication of the first edition of this book. Professor Brennan received an NCME award for Career Contributions to Educational Measurement and authored Generalizability Theory, published by Springer-Verlag.

Product Details

Publisher: Springer New York
Series: Statistics for Social and Behavioral Sciences Series
Edition description: 2nd ed. 2004
Product dimensions: 9.21(w) x 6.14(h) x 1.25(d)

Table of Contents

Contents
Preface
Notation

1 Introduction and Concepts
1.1 Equating and Related Concepts
1.1.1 Test Forms and Test Specifications
1.1.2 Equating
1.1.3 Processes That Are Related to Equating
1.1.4 Equating and Score Scales
1.1.5 Equating and the Test Score Decline of the 1960s and 1970s
1.2 Equating and Scaling in Practice - A Brief Overview of This Book
1.3 Properties of Equating
1.3.1 Symmetry Property
1.3.2 Same Specifications Property
1.3.3 Equity Properties
1.3.4 Observed Score Equating Properties
1.3.5 Group Invariance Property
1.4 Equating Designs
1.4.1 Random Groups Design
1.4.2 Single Group Design
1.4.3 Single Group Design with Counterbalancing
1.4.4 ASVAB Problems with a Single Group Design
1.4.5 Common-Item Nonequivalent Groups Design
1.4.6 NAEP Reading Anomaly - Problems with Common Items
1.5 Error in Estimating Equating Relationships
1.6 Evaluating the Results of Equating
1.7 Testing Situations Considered
1.8 Preview
1.9 Exercises

2 Observed Score Equating Using the Random Groups Design
2.1 Mean Equating
2.2 Linear Equating
2.3 Properties of Mean and Linear Equating
2.4 Comparison of Mean and Linear Equating
2.5 Equipercentile Equating
2.5.1 Graphical Procedures
2.5.2 Analytic Procedures
2.5.3 Properties of Equated Scores in Equipercentile Equating
2.6 Estimating Observed Score Equating Relationships
2.7 Scale Scores
2.7.1 Linear Conversions
2.7.2 Truncation of Linear Conversions
2.7.3 Nonlinear Conversions
2.8 Equating Using Single Group Designs
2.9 Equating Using Alternate Scoring Schemes
2.10 Preview of What Follows
2.11 Exercises

3 Random Groups - Smoothing in Equipercentile Equating
3.1 A Conceptual Statistical Framework for Smoothing
3.2 Properties of Smoothing Methods
3.3 Presmoothing Methods
3.3.1 Polynomial Log-linear Method
3.3.2 Strong True Score Method
3.3.3 Illustrative Example
3.4 Postsmoothing Methods
3.4.1 Illustrative Example
3.5 Practical Issues in Equipercentile Equating
3.5.1 Summary of Smoothing Strategies
3.5.2 Equating Error and Sample Size
3.6 Exercises

4 Nonequivalent Groups - Linear Methods
4.1 Tucker Method
4.1.1 Linear Regression Assumptions
4.1.2 Conditional Variance Assumptions
4.1.3 Intermediate Results
4.1.4 Final Results
4.1.5 Special Cases
4.2 Levine Observed Score Method
4.2.1 Correlational Assumptions
4.2.2 Linear Regression Assumptions
4.2.3 Error Variance Assumptions
4.2.4 Intermediate Results
4.2.5 General Results
4.2.6 Classical Congeneric Model Results
4.3 Levine True Score Method
4.3.1 Results
4.3.2 First-Order Equity
4.4 Illustrative Example and Other Topics
4.4.1 Illustrative Example
4.4.2 Synthetic Population Weights
4.4.3 Mean Equating
4.4.4 Decomposing Observed Differences in Means and Variances
4.4.5 Relationships Among Tucker and Levine Equating Methods
4.4.6 Scale Scores
4.5 Appendix: Proof that $\sigma_s^2(T_X) = \gamma_1^2 \sigma_s^2(T_V)$ Under the Classical Congeneric Model
4.6 Exercises

5 Nonequivalent Groups - Equipercentile Methods
5.1 Frequency Estimation Equipercentile Equating
5.1.1 Conditional Distributions
5.1.2 Frequency Estimation Method
5.1.3 Evaluating the Frequency Estimation Assumption
5.1.4 Numerical Example
5.1.5 Estimating the Distributions
5.2 Braun-Holland Linear Method
5.3 Chained Equipercentile Equating
5.4 Illustrative Example
5.4.1 Illustrative Results
5.4.2 Comparison Among Methods
5.4.3 Practical Issues in Equipercentile Equating with Common Items
5.5 Exercises

6 Item Response Theory Methods
6.1 Some Necessary IRT Concepts
6.1.1 Unidimensionality and Local Independence Assumptions
6.1.2 IRT Models
6.1.3 IRT Parameter Estimation
6.2 Transformations of IRT Scales
6.2.1 Transformation Equations
6.2.2 Demonstrating the Appropriateness of Scale Transformations
6.2.3 Expressing A and B Constants
6.2.4 Expressing A and B Constants in Terms of Groups of Items and/or Persons
6.3 Transforming IRT Scales When Parameters Are Estimated
6.3.1 Designs
6.3.2 Mean/Sigma and Mean/Mean Transformation Methods
6.3.3 Characteristic Curve Transformation Methods
6.3.4 Comparisons Among Scale Transformation Methods
6.4 Equating and Scaling
6.5 Equating True Scores
6.5.1 Test Characteristic Curves
6.5.2 True Score Equating Process
6.5.3 The Newton-Raphson Method
6.5.4 Using True Score Equating with Observed Scores
6.6 Equating Observed Scores
6.7 IRT True Score Versus IRT Observed Score Equating
6.8 Illustrative Example
6.8.1 Item Parameter Estimation and Scaling
6.8.2 IRT True Score Equating
6.8.3 IRT Observed Score Equating
6.8.4 Rasch Equating
6.9 Using IRT Calibrated Item Pools
6.9.1 Common-Item Equating to a Calibrated Pool
6.9.2 Item Preequating
6.9.3 Robustness to Violations of IRT Assumptions
6.10 Equating with Polytomous IRT
6.10.1 Polytomous IRT Models for Ordered Responses
6.10.2 Scoring Function, Item Response Function, and Test Characteristic Curve
6.10.3 Parameter Estimation and Scale Transformation with Polytomous IRT Models
6.10.4 True Score Equating
6.10.5 Observed Score Equating
6.10.6 Example using the Graded Response Model
6.11 Practical Issues and Caveat
6.12 Exercises

7 Standard Errors of Equating
7.1 Definition of Standard Error of Equating
7.2 The Bootstrap
7.2.1 Standard Errors Using the Bootstrap
7.2.2 Standard Errors of Equating
7.2.3 Parametric Bootstrap
7.2.4 Standard Errors of Smoothed Equipercentile Equating
7.2.5 Standard Errors of Scale Scores
7.2.6 Standard Errors of Equating Chains
7.2.7 Mean Standard Error of Equating
7.2.8 Caveat
7.3 The Delta Method
7.3.1 Mean Equating Using Single Group and Random Groups Designs
7.3.2 Linear Equating Using the Random Groups Design
7.3.3 Equipercentile Equating Using the Random Groups Design
7.3.4 Standard Errors for Other Designs
7.3.5 Approximations
7.3.6 Standard Errors for Scale Scores
7.3.7 Standard Errors of Equating Chains
7.3.8 Using Delta Method Standard Errors
7.4 Using Standard Errors in Practice
7.5 Exercises

8 Practical Issues in Equating
8.1 Equating and the Test Development Process
8.1.1 Test Specifications
8.1.2 Characteristics of Common-item Sets
8.1.3 Changes in Test Specifications
8.2 Data Collection: Design and Implementation
8.2.1 Choosing Among Equating Designs
8.2.2 Developing Equating Linkage Plans
8.2.3 Examinee Groups Used in Equating
8.2.4 Sample Size Requirements
8.3 Choosing From Among the Statistical Procedures
8.3.1 Equating Criteria in Research Studies
8.3.2 Characteristics of Equating Situations
8.4 Choosing From Among Equating Results
8.4.1 Equating Versus Not Equating
8.4.2 Use of Robustness Checks
8.4.3 Choosing From Among Results in the Random Groups Design
8.4.4 Choosing From Among Results in the Common-Item Nonequivalent Groups Design
8.4.5 Use of Consistency Checks
8.4.6 Equating and Score Scales
8.4.7 Assessing First and Second Order Equity for Scale Scores
8.5 Importance of Standardization Conditions and Quality Control
8.5.1 Test Development
8.5.2 Test Administration and Standardization Conditions
8.5.3 Quality Control
8.5.4 Reequating
8.6 Conditions Conducive to Satisfactory Equating
8.7 Comparability Issues in Special Circumstances
8.7.1 Comparability Issues with Computer-Based Tests
8.7.2 Comparability of Performance Assessments
8.7.3 Score Comparability with Optional Test Sections
8.8 Conclusion
8.9 Exercises

9 Score Scales
9.1 Scaling Perspectives
9.2 Score Transformations
9.3 Incorporating Normative Information
9.3.1 Linear Transformations
9.3.2 Nonlinear Transformations
9.3.3 Example: Normalized Scale Scores
9.3.4 Importance of Norm Group in Setting the Score Scale
9.4 Incorporating Score Precision Information
9.4.1 Rules of Thumb for Number of Distinct Score Points
9.4.2 Linearly Transformed Score Scales with a Given Standard Error of Measurement
9.4.3 Score Scales with Approximately Equal Conditional Standard Errors of Measurement
9.4.4 Example: Incorporating Score Precision
9.4.5 Evaluating Psychometric Properties of Scale Scores
9.4.6 The IRT θ-Scale as a Score Scale
9.5 Incorporating Content Information
9.5.1 Item Mapping
9.5.2 Scale Anchoring
9.5.3 Standard Setting
9.5.4 Numerical Example
9.5.5 Practical Usefulness
9.6 Maintaining Score Scales
9.7 Scales for Test Batteries and Composites
9.7.1 Test Batteries
9.7.2 Composite Scores
9.7.3 Maintaining Scales for Batteries and Composites
9.8 Vertical Scaling and Developmental Score Scales
9.8.1 Structure of Batteries
9.8.2 Type of Domain Being Measured
9.8.3 Definition of Growth
9.8.4 Designs for Data Collection for Vertical Scaling
9.8.5 Test Scoring
9.8.6 Hieronymus Statistical Methods
9.8.7 Thurstone Statistical Methods
9.8.8 IRT Statistical Methods
9.8.9 Thurstone Illustrative Example
9.8.10 IRT Illustrative Example
9.8.11 Statistics for Comparing Scaling Results
9.8.12 Some Limitations of Vertically Scaled Tests
9.8.13 Research on Vertical Scaling
9.9 Exercises

10 Linking
10.1 Linking Categorization Schemes and Criteria
10.1.1 Types of Linking
10.1.2 Mislevy/Linn Taxonomy
10.1.3 Degrees of Similarity
10.2 Group Invariance
10.2.1 Statistical Methods Using Observed Scores
10.2.2 Statistics for Overall Group Invariance
10.2.3 Statistics for Pairwise Group Invariance
10.2.4 Example: ACT and ITED Science Tests
10.3 Additional Examples
10.3.1 Extended Time
10.3.2 Test Adaptations and Translated Tests
10.4 Discussion
10.5 Exercises

11 Current and Future Challenges
11.1 Score Scales
11.2 Equating
11.3 Vertical Scaling
11.4 Linking
11.5 Summary

References
Appendix A: Answers to Exercises
Appendix B: Computer Programs
Index