Ranganath S

Technology Specialist

The Evolution and Role of Adaptive Tests

Introduction

The concept of tailored testing, first introduced by William Turnbull, has long been part of oral exams. In these exams, an examiner would adjust the difficulty of questions based on the test taker’s responses, continuing until a satisfactory level of understanding and confidence in the test taker’s score was achieved. Over time, this approach has gone by various names, including adaptive testing, branched testing, individualized testing, programmed testing, and sequential item testing.

Computers have played a role in testing since the 1970s, when they were first used for scoring and for processing test reports. It was not until the 1980s that they began administering tests, and only in the 1990s did computing power become sufficient to run the Item Response Theory (IRT) based algorithms behind computer adaptive tests (CAT).

Introduction of Concepts Paving the Way for Adaptive Testing

The first adaptive test, the Binet-Simon test, was age-based (ages 3-13) and compared a child’s performance to that of an average child of the same age. L.L. Thurstone was the first to introduce the concept of item difficulty, and Benjamin’s tailor-made test used item difficulties to decide which items to administer based on responses. The advent of IRT for modelling item responses and estimating a test taker’s proficiency (ability) has led to the sophisticated Computer Adaptive Testing (CAT) systems we use today.
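
For concreteness, a widely used IRT model is the two-parameter logistic (2PL) model, in which the probability of a correct response depends on the test taker’s proficiency θ, the item’s difficulty b, and its discrimination a:

$$P(X = 1 \mid \theta) = \frac{1}{1 + e^{-a(\theta - b)}}$$

Higher-proficiency test takers are more likely to answer an item correctly, and an item is most informative for test takers whose proficiency is near its difficulty (θ ≈ b).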

Advantages of CAT

  1. Flexible Scheduling: Tests can be taken at any convenient time within a specified window.
  2. Test Shortening: Tests can be 30-50% shorter without compromising accuracy.
  3. Relevance: Irrelevant questions are minimized.
  4. Improved Security: Each user receives a unique set of items, reducing the risk of cheating.

How IRT-based Computer Adaptive Testing (CAT) Works

The main components of a CAT system include:

  • Item Pool: A database of potential test items.
  • Initial Ability Estimation Algorithm: Used to estimate the test taker’s proficiency during the early part of the test. Techniques include Maximum A Posteriori (MAP) and Expected A Posteriori (EAP) estimators, Maximum Likelihood Estimation with Fences (MLEF), and Maximum Likelihood Estimation with Truncation (MLET); a sketch of an EAP estimate follows this list.
  • Intermediate Ability Estimation Algorithm: Typically Maximum Likelihood Estimation, applied as responses accumulate during the test.
  • Final Ability Estimation Algorithm: Estimates proficiency at the end of the test for reporting to the test taker.
  • Item Selection Criteria: The criteria by which the next test item is selected, such as Fisher information-based selection or nearest b-value (difficulty) selection.
  • Content Constraint Management: Maintains the required proportion of items from various content areas, using methods like scripting.
  • Rules for Ending the Test: The criteria for stopping the test, such as a fixed test length or a target Standard Error of Measurement (SEM).
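
To make the early-test estimation step concrete, here is a minimal sketch of an Expected A Posteriori (EAP) proficiency estimate under the 2PL model, using numerical quadrature over a standard normal prior. The function names and item parameters are illustrative assumptions, not part of any specific product:

```python
import numpy as np

def p_correct(theta, a, b):
    """2PL probability of a correct response at proficiency theta."""
    return 1.0 / (1.0 + np.exp(-a * (theta - b)))

def eap_estimate(responses, a_params, b_params, n_points=81):
    """EAP proficiency estimate with a standard normal prior.

    responses: 0/1 scored answers
    a_params, b_params: discrimination and difficulty of the answered items
    """
    theta = np.linspace(-4, 4, n_points)      # quadrature grid
    prior = np.exp(-0.5 * theta**2)           # N(0, 1) up to a constant
    likelihood = np.ones_like(theta)
    for x, a, b in zip(responses, a_params, b_params):
        p = p_correct(theta, a, b)
        likelihood *= p if x == 1 else (1.0 - p)
    posterior = likelihood * prior
    posterior /= posterior.sum()              # normalize over the grid
    return float(np.sum(theta * posterior))   # posterior mean = EAP

# Example: two correct answers on easier items, one miss on a hard item.
print(eap_estimate([1, 1, 0], a_params=[1.2, 1.0, 1.5], b_params=[-1.0, 0.0, 1.5]))
```

EAP is a common choice early in the test because, unlike plain maximum likelihood, it yields a finite estimate even when all responses so far are correct (or all incorrect).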

At the start of a CAT, the test taker’s proficiency is unknown, so the test begins with an item of average difficulty. CAT adapts to the test taker, presenting more challenging items after correct responses and easier items after incorrect ones. This process continues until a predefined stopping criterion is met.

The CAT algorithm operates iteratively through these steps:

  1. Evaluate all un-administered items to determine the best one to present next, based on the current proficiency (ability) estimate of the test taker.
  2. Administer the selected item and record the test taker’s response.
  3. Update the proficiency estimate to incorporate the information gained from this latest response.
  4. Repeat steps 1-3 until the stopping criterion is met (see the sketch after this list).
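
Putting the pieces together, the following sketch runs the full loop: it selects each item by maximum Fisher information under the 2PL model, re-estimates proficiency after every response, and stops when the Standard Error of Measurement drops below a target or the maximum length is reached. The item pool, the answer_item callback, and the thresholds are illustrative assumptions:

```python
import numpy as np

def p_correct(theta, a, b):
    """2PL probability of a correct response at proficiency theta."""
    return 1.0 / (1.0 + np.exp(-a * (theta - b)))

def fisher_info(theta, a, b):
    """Fisher information of a 2PL item at proficiency theta."""
    p = p_correct(theta, a, b)
    return a**2 * p * (1.0 - p)

def run_cat(pool, answer_item, max_items=30, target_sem=0.3):
    """pool: list of (a, b) item parameters; answer_item(i) returns 0 or 1."""
    theta = 0.0                                   # start at average difficulty
    administered, responses = [], []
    while len(administered) < max_items:
        # 1. Pick the most informative un-administered item at the current theta.
        remaining = [i for i in range(len(pool)) if i not in administered]
        best = max(remaining, key=lambda i: fisher_info(theta, *pool[i]))
        # 2. Administer it and record the response.
        administered.append(best)
        responses.append(answer_item(best))
        # 3. Re-estimate theta (grid-search MLE keeps the sketch simple).
        grid = np.linspace(-4, 4, 161)
        loglik = np.zeros_like(grid)
        for i, x in zip(administered, responses):
            p = p_correct(grid, *pool[i])
            loglik += np.log(p if x else 1.0 - p)
        theta = float(grid[np.argmax(loglik)])
        # 4. Stop once the standard error of measurement is low enough.
        info = sum(fisher_info(theta, *pool[i]) for i in administered)
        if 1.0 / np.sqrt(info) < target_sem:
            break
    return theta, administered

# Simulated test taker with true proficiency 1.0 and a random 100-item pool.
rng = np.random.default_rng(0)
pool = [(rng.uniform(0.8, 2.0), rng.uniform(-2.0, 2.0)) for _ in range(100)]
answer = lambda i: int(rng.random() < p_correct(1.0, *pool[i]))
theta_hat, items_used = run_cat(pool, answer)
print(f"estimated theta = {theta_hat:.2f} after {len(items_used)} items")
```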

Multistage Testing (MST)

Another adaptive testing design is Multistage Testing (MST), which addresses some limitations of CAT. MST offers advantages such as item review, item skipping, better control over test content, adherence to target content distributions, and consistent item order. While MST sacrifices some adaptivity compared to CAT, it remains more accurate than linear tests.

MST adapts at the sub-test (module) level rather than the item level. Each test stage has multiple modules (easy, medium, difficult). Based on performance in an initial routing module, test takers are directed to subsequent modules, where their performance determines further routing. This adaptivity at each stage continues until the final proficiency or ability estimate is reached.
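
As a simple illustration of module-level adaptivity, the sketch below routes a test taker through the stages of an MST panel using raw-score cut points; the module labels and cut scores are illustrative assumptions, not a prescribed panel design:

```python
def route(stage_modules, raw_score, cuts):
    """Pick the next module based on score thresholds on the previous module.

    stage_modules: dict with 'easy', 'medium', 'difficult' module IDs
    cuts: (low, high) raw-score cut points
    """
    low, high = cuts
    if raw_score < low:
        return stage_modules["easy"]
    elif raw_score < high:
        return stage_modules["medium"]
    return stage_modules["difficult"]

# Example: a 1-3-3 panel (one routing module, then two adaptive stages).
stage2 = {"easy": "2E", "medium": "2M", "difficult": "2D"}
stage3 = {"easy": "3E", "medium": "3M", "difficult": "3D"}

score_on_routing = 7                      # raw score out of 10 on the routing module
next_module = route(stage2, score_on_routing, cuts=(4, 8))
print(next_module)                        # -> "2M"
```

Operational MST programs typically derive these cut points from the modules’ IRT parameters rather than fixing them by hand, but the routing control flow is the same.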

Conclusion

Adaptive testing has revolutionized the way assessments are conducted, making them more personalized, efficient, and secure. From the early concepts of tailored testing to the sophisticated CAT and MST systems available today, the evolution of adaptive testing reflects significant advancements in educational and psychological measurement. With tools like Excelsoft’s Saras™, educators and institutions can leverage cutting-edge technology to deliver accurate and engaging assessments. As adaptive testing continues to evolve, it holds great promise for enhancing learning and evaluation processes across diverse fields.

Excelsoft’s Adaptive Testing Solutions

Excelsoft provides both CAT and MST test drivers. Our CAT solution, Saras™, offers a mix of algorithms to achieve optimal results and includes a simulator for fine-tuning test configurations and algorithm choices. The solution also supports configuration of the number of test panels, stages, and module assemblies, and delivers comprehensive reports on both tests and candidate performance.
