The Evolution and Role of Adaptive Tests


The concept of tailored testing, initially introduced by William Turnbull, has long been a part of oral exams. In these exams, an examiner would adjust the difficulty of questions based on the test taker’s responses, continuing until the examiner was sufficiently confident in the test taker’s score. Over time, this approach has been referred to by various names, including adaptive testing, branched testing, individualized testing, programmed testing, and sequential item testing.

Computers have played a role in testing since the 1970s, initially used for scoring and processing test reports. However, it wasn’t until the 1980s that they began administering tests, and the computing power needed to run Item Response Theory (IRT) based algorithms for computer adaptive tests (CAT) became sufficient in the 1990s.

Introduction of Concepts Paving the Way for Adaptive Testing

The first adaptive test, the Binet-Simon test, was age-based (ages 3-13) and compared a child’s performance to that of an average child of the same age. L.L. Thurstone was the first to introduce the concept of item difficulty, and Benjamin’s tailor-made test utilized item difficulties to determine which items to administer based on responses. The advent of IRT for modelling item responses and estimating a test taker’s proficiency (ability) has led to the sophisticated Computer Adaptive Testing (CAT) systems we use today.

Advantages of CAT

  1. Flexible Scheduling: Tests can be taken at any convenient time within a specified window.
  2. Test Shortening: Tests can be 30-50% shorter without compromising accuracy.
  3. Relevance: Irrelevant questions are minimized.
  4. Improved Security: Each user receives a unique set of items, reducing the risk of cheating.

How IRT-based Computer Adaptive Testing (CAT) Works

The main components of a CAT system include:

  • Item Pool: A database of potential test items.
  • Initial Ability Estimation Algorithm: Used to estimate the test taker’s proficiency during the early part of the test. Techniques such as Maximum A Posteriori (MAP) and Expected A Posteriori (EAP) estimators, Maximum Likelihood Estimation with Fences (MLEF), and Maximum Likelihood Estimation with Truncation (MLET) are used.
  • Intermediate Ability Estimation Algorithm: Typically, Maximum Likelihood Estimation.
  • Final Ability Estimation Algorithm: Estimates proficiency at the end of the test for reporting to the test taker.
  • Item Selection Criteria: The criteria based on which the next test item is selected. Methods such as Fisher information-based selection or nearest b-value selection are used.
  • Content Constraint Management: Maintaining the required proportion of items from various content areas using methods like scripting.
  • Rules for Ending the Test: When to stop the test, based on test length, Standard Error of Measurement (SEM), or similar criteria.

At the start of a CAT, the test taker’s proficiency is unknown, so the test begins with an item of average difficulty. CAT adapts to the test taker, presenting more challenging items after correct responses and easier items after incorrect ones. This process continues until a predefined stopping criterion is met.

The CAT algorithm operates iteratively through these steps:

  1. Evaluate all un-administered items to determine the best one to present next, based on the current proficiency (ability) estimate of the test taker.
  2. Administer the selected item and record the test taker’s response.
  3. Update the estimate of the test taker’s proficiency using the information gained from this additional response.
  4. Repeat steps 1-3 until the stopping criterion is met.
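As a concrete illustration, the loop above can be sketched in Python with a simple Rasch (1PL) model, Fisher-information item selection, a grid-search maximum-likelihood ability estimate, and an SEM-based stopping rule. This is a minimal sketch under those assumptions; the function names, the grid bounds, and the simulated answer function are illustrative, not a description of any particular CAT product.

```python
import math

def p_correct(theta, b):
    """Rasch (1PL) probability of answering an item of difficulty b correctly."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def fisher_info(theta, b):
    """Item information under the Rasch model: P * (1 - P)."""
    p = p_correct(theta, b)
    return p * (1.0 - p)

def mle_theta(responses):
    """Grid-search maximum-likelihood estimate of theta over [-4, 4]."""
    grid = [g / 10.0 for g in range(-40, 41)]
    def log_lik(theta):
        ll = 0.0
        for b, correct in responses:
            p = p_correct(theta, b)
            ll += math.log(p if correct else 1.0 - p)
        return ll
    return max(grid, key=log_lik)

def run_cat(item_bank, answer_fn, max_items=20, sem_target=0.3):
    """item_bank: list of item difficulties; answer_fn simulates the test taker."""
    theta = 0.0                      # start with an average-difficulty assumption
    administered, responses = set(), []
    for _ in range(max_items):
        candidates = [i for i in range(len(item_bank)) if i not in administered]
        if not candidates:
            break
        # 1. pick the un-administered item with maximum information at theta
        best = max(candidates, key=lambda i: fisher_info(theta, item_bank[i]))
        administered.add(best)
        # 2. administer the item and record the response
        correct = answer_fn(item_bank[best])
        responses.append((item_bank[best], correct))
        # 3. update the proficiency estimate with the new response
        theta = mle_theta(responses)
        # 4. stop once the standard error of measurement is small enough
        test_info = sum(fisher_info(theta, b) for b, _ in responses)
        if test_info > 0 and 1.0 / math.sqrt(test_info) < sem_target:
            break
    return theta, len(responses)
```

A grid search keeps the estimator bounded even when every response so far is correct (where an unconstrained MLE would diverge), which is one reason production systems use fenced or Bayesian estimators early in the test.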

Multistage Testing (MST)

Another adaptive testing design is Multistage Testing (MST), which addresses some limitations of CAT. MST offers advantages such as item review, item skipping, better control over test content, adherence to target content distributions, and consistent item order. While MST sacrifices some adaptivity compared to CAT, it remains more accurate than linear tests.

MST adapts at the sub-test (module) level rather than the item level. Each test stage has multiple modules (easy, medium, difficult). Based on performance in an initial routing module, test takers are directed to subsequent modules, where their performance determines further routing. This adaptivity at each stage continues until the final proficiency or ability estimate is reached.
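The stage-level routing described above can be sketched as follows; the score thresholds and the use of the first stage’s medium module as the routing module are illustrative assumptions, not a standard mandated by MST.

```python
def route(score, thresholds=(0.4, 0.7)):
    """Route to the easy / medium / difficult module based on proportion correct."""
    if score < thresholds[0]:
        return "easy"
    if score < thresholds[1]:
        return "medium"
    return "difficult"

def run_mst(stages, answer_fn):
    """stages: list of dicts mapping module level -> list of items.
    The first stage's 'medium' module serves as the routing module."""
    level, results = "medium", []
    for stage in stages:
        module = stage[level]
        correct = sum(answer_fn(item) for item in module)
        score = correct / len(module)
        results.append((level, score))
        level = route(score)          # performance decides the next stage's module
    return results
```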


Adaptive testing has revolutionized the way assessments are conducted, making them more personalized, efficient, and secure. From the early concepts of tailored testing to the sophisticated CAT and MST systems available today, the evolution of adaptive testing reflects significant advancements in educational and psychological measurement. With tools like Excelsoft’s SarasTM, educators and institutions can leverage cutting-edge technology to deliver accurate and engaging assessments. As adaptive testing continues to evolve, it holds great promise for enhancing learning and evaluation processes across diverse fields.

Excelsoft’s Adaptive Testing Solutions

Excelsoft provides both CAT and MST test drivers. Our CAT solution, SarasTM, offers a mix of algorithms to achieve optimal results and includes a simulator for fine-tuning test configurations and algorithm choices. The solution supports configuration of the number of test panels, stages, and module assemblies, and delivers comprehensive reports on both tests and candidate performance.

AI-Assisted Item Grouping and Test Blueprint Development

Test blueprinting is a robust process in the assessment lifecycle that facilitates efficient item grouping and ensures the creation of high-quality, well-balanced tests. It enables you to systematically organize the items based on characteristics such as similarity, meta-data, domain parameters, factors derived from item analytics, and others. This streamlines selecting the appropriate questions and determining the optimal number of questions when creating tests.

A few characteristics used in organizing the items:

  • Similarity: Similarity involves categorizing items that evaluate the same knowledge or skill using comparable content or formats. For instance, it organizes all multiple-choice-single-response, multiple-choice-multiple-response, and fill-in-the-blank items on a particular topic.
  • Meta-data: Organize items based on additional information about them, such as their author, creation date, geography, language, competency, skill, learning outcomes, and learning objectives.
  • Subject/domain parameters: Categorize items by the specific subject area or domain they cover (e.g., Math, Science, History).
  • Item psychometrics: Group items based on item analysis data such as:
    • The difficulty index indicates how difficult an item is for average-ability test-takers.
    • The discrimination index shows how well an item differentiates between high-performing and low-performing test-takers.
    • The guessing factor estimates the likelihood of a test-taker getting an item correct by chance alone.
    • The exposure parameters track the frequency with which an item has been used in past tests to prevent overuse and maintain test security.
    • The aging parameters analyze information currency and remove outdated questions from the pool presented to the test-takers.
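Two of these statistics, the difficulty index and the discrimination index, can be computed directly from classical item-analysis data. The sketch below uses the common upper-lower group method with a 27% group fraction; both the method choice and the fraction are conventional assumptions, not a description of any specific product.

```python
def difficulty_index(item_scores):
    """Proportion of test takers answering the item correctly (the p-value).
    item_scores: 1 for correct, 0 for incorrect, one entry per test taker."""
    return sum(item_scores) / len(item_scores)

def discrimination_index(item_scores, total_scores, frac=0.27):
    """Upper-lower discrimination: p(upper group) - p(lower group),
    using the top and bottom `frac` of test takers ranked by total score."""
    n = max(1, round(frac * len(total_scores)))
    ranked = sorted(range(len(total_scores)), key=lambda i: total_scores[i])
    lower, upper = ranked[:n], ranked[-n:]
    p_upper = sum(item_scores[i] for i in upper) / n
    p_lower = sum(item_scores[i] for i in lower) / n
    return p_upper - p_lower
```

An item everyone in the upper group gets right and everyone in the lower group gets wrong scores the maximum discrimination of 1.0; values near zero (or negative) flag items worth reviewing.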

Test and Item administrators can leverage item grouping to:

  • Construct item groups: Organize the items logically using the various grouping methods.
  • Develop test blueprints: Define a test’s desired structure and content by specifying various item filters.

Leveraging AI for constructing ability-based, well-balanced, and effective test blueprints:

By generating a set of candidate filters, AI can significantly reduce the manual effort required from SMEs and improve test quality. SMEs can focus on refining and fine-tuning the AI-generated test blueprints to ensure they precisely align with the desired test objectives and outcomes.

Construct item groups

  • The AI engine processes pre-defined curriculum and semantic content models to categorize and group items for test blueprints.
  • It identifies metadata constraints to meet test objectives and outcomes.
  • AI flags enemy items (items too similar to appear on the same test) based on similarity scores, and groups items by difficulty and discrimination index for test construction, distinguishing between practice and high-stakes tests.

Develop test blueprints

  • AI can be used to build test blueprints by providing test objectives and outcomes as prompts.
    • E.g. 1: Build a Science test blueprint to deliver tests for students in grade 10, with a difficulty distribution of 40 high-difficulty, 30 average-difficulty, and 30 low-difficulty items.
    • E.g. 2: Build a Math test to validate the knowledge and comprehension level of students in grade 9, with additional questions on the topics of Linear Algebra and Quadratic Equations.
    • E.g. 3: Build a Logical Reasoning test for postgraduate students located in the African region.
  • AI can be used to validate the existing blueprints for their effectiveness and generate quick reports.
  • AI can direct test administrators to build a blueprint by asking leading questions.
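To make the blueprint idea concrete, here is a hypothetical sketch in which a blueprint is a plain dictionary of filters (subject, grade, and a difficulty mix) and items are drawn from a pool to satisfy it. All field names and counts are illustrative, not SarasTM's actual schema.

```python
import random

def build_test(item_pool, blueprint, seed=None):
    """Select items from the pool to satisfy a blueprint's difficulty mix.
    Each item is a dict with 'subject', 'grade', and 'difficulty' fields."""
    rng = random.Random(seed)
    # apply the blueprint's subject and grade filters
    pool = [i for i in item_pool
            if i["subject"] == blueprint["subject"]
            and i["grade"] == blueprint["grade"]]
    test = []
    for level, count in blueprint["difficulty_mix"].items():
        matching = [i for i in pool if i["difficulty"] == level]
        if len(matching) < count:
            raise ValueError(f"not enough {level} items in the pool")
        test.extend(rng.sample(matching, count))
    return test
```

In practice an AI engine would generate the filter set (and validate it against the pool), while the SME adjusts counts and constraints before the blueprint is published.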


AI In Assessments – Automated Item Generation

In the assessment space, items are the basis for building intellectual property. Different industries utilize various processes and techniques to develop these items. But how do they ensure the uniqueness, consistency, and quality of the items they develop? That’s where Artificial Intelligence (AI) plays a vital role, meticulously weaving together elements such as item type, item language, item content (including stimuli, stem, and distractors), item meta-data, item difficulty, and taxonomy levels.

Let us understand the item design, item template, and the power of Artificial Intelligence (AI) in creating new items and refactoring the existing items.

Factors considered for Item design:

  • Interactivity: How the test taker will interact with the Item (e.g., choice-based selection vs. free text response)
  • Response Format: The type of response expected or captured (e.g., multiple choice single response, multiple choice multiple responses, fill-in-the-blank with free text, and a few more)
  • Scoring: How the Item is scored based on the response (e.g., auto scoring, semi-auto scoring, and manual scoring)

Every Item will be crafted with the listed factors in mind, ensuring it captures all parameters for presentation, evaluation, and analysis.

Item template design:

Just as architectural templates guide construction, an item template is the foundation for consistent and high-quality item creation. While each item type has its own template outlining specific mandatory and optional parameters, all items share a set of standard parameters necessary for their organization and analysis.

  • Item Stimuli
  • Item Stem
  • Item Distractors (in case of a predefined list of responses)
  • Response placeholder (in case of open-ended responses)
  • Answer Key (in the context of Objective Items)
  • Model answer (in the context of subjective items)
  • Complexity
  • Taxonomy classifications
  • Meta-data

Once all the required parameters listed above are captured, AI can significantly support authors in building new items and refactoring legacy items.
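As a hypothetical illustration, the standard parameters listed above could be captured in a simple data structure. The field names below are assumptions made for the sketch, not SarasTM's actual item schema.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class ItemTemplate:
    """Standard parameters shared by all item types (illustrative schema)."""
    stem: str                                          # the question text
    stimuli: Optional[str] = None                      # passage, image reference, etc.
    distractors: list = field(default_factory=list)    # predefined response options
    response_placeholder: Optional[str] = None         # for open-ended responses
    answer_key: Optional[str] = None                   # objective items
    model_answer: Optional[str] = None                 # subjective items
    complexity: Optional[str] = None                   # e.g. easy / medium / hard
    taxonomy: list = field(default_factory=list)       # e.g. Bloom's levels
    metadata: dict = field(default_factory=dict)       # author, language, geography, ...
```

A structured template like this is what makes AI-assisted generation tractable: the model fills well-defined slots rather than producing free-form text.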

Leveraging AI for Efficient, Effective, and Rapid Item Generation:

By generating a set of items based on a predefined knowledge model, AI can significantly improve the speed of item generation and reduce the manual effort required from SMEs. SMEs can focus more on refining and fine-tuning the AI-generated items to ensure they precisely align with the desired learning objectives and assessment outcomes. The AI-generated item pools can serve as a springboard for collaboration and fine-tuning.

  • AI can generate a broader range of assessment items from the knowledge bank (a content model of related information about a particular subject or topic), saving time and resources.
  • AI can be used to create distractors for objective item types, reducing the guessing factor and promoting a deeper understanding of the topic. It can also generate model answers for items used in auto or manual marking.
  • AI can analyze existing items and suggest modifications to meet newer objectives and outcomes, such as changing stem content, difficulty levels, and taxonomy levels and introducing newer distractors.
  • AI can suggest alternative item types based on the objective and outcome of an existing item (convert a multiple-choice item into a fill-in-the-blank format while ensuring the same knowledge or skill is assessed).
  • AI can create meta-data for each Item based on the outcome and analytical parameters defined.
  • AI can analyze items to identify potential biases based on language, content, or difficulty level. It can be used to group equivalent items and mark them as enemy items.

The future of assessment lies in the balanced collaboration between human expertise and AI capabilities. By adopting AI in item generation, the assessment space can unlock gains in efficiency, cost-effectiveness, consistency, and speed of development, and build a richer, more impactful item pool for all stakeholders involved.


Unleashing the Power of Artificial Intelligence in Online Assessment Tools

Recent advancements in Artificial Intelligence have opened up new frontiers for the way technology can transform education. One of the key areas where AI can positively improve effectiveness is online assessments. From automated item generation to personalized assessments, AI has the potential to revolutionize the way we teach, assess, and learn. In this blog post, we explore the opportunities and applications AI can bring to online assessment tools.

AI in Automated Item Generation

Text-based generative AI models offer the capability to automate the creation of item or question content for assessments. Additionally, multimedia AI models can be utilized to generate accompanying multimedia assets, enhancing the overall richness of the assessment content.

AI in Adaptive and Dynamic Test Generation

AI enables the creation of adaptive and dynamic assessments that can adjust in real time based on a student’s performance. These assessments provide a personalized learning experience, catering to each student’s individual needs and abilities. AI-powered assessment tools can identify areas where students require additional support and adjust the difficulty level accordingly, ensuring a more engaging and effective learning process.

AI in Practical Assessments

AI-powered bots can engage in role-play scenarios, providing students with a realistic and interactive assessment experience. This technology allows for more authentic and immersive assessments, replicating real-world situations students may encounter in their future careers.

AI in Automated Marking

AI-powered marking tools can analyze subjective responses, including written and spoken responses, with a high degree of accuracy. NLP (Natural Language Processing) models aid in assessing written responses, while speech and linguistic criteria are used to evaluate spoken responses. This automation not only saves time for educators but also ensures consistent and unbiased marking.
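As a toy illustration of the idea (production NLP marking models are far more sophisticated), a written response can be scored against a model answer using bag-of-words cosine similarity. The threshold and mark scale below are arbitrary assumptions for the sketch.

```python
import math
import re
from collections import Counter

def bow(text):
    """Lowercased bag-of-words term-frequency vector."""
    return Counter(re.findall(r"[a-z']+", text.lower()))

def cosine_similarity(a, b):
    """Cosine similarity between the bag-of-words vectors of two texts."""
    va, vb = bow(a), bow(b)
    dot = sum(va[w] * vb[w] for w in va)
    na = math.sqrt(sum(c * c for c in va.values()))
    nb = math.sqrt(sum(c * c for c in vb.values()))
    return dot / (na * nb) if na and nb else 0.0

def auto_mark(model_answer, response, max_marks=5, threshold=0.2):
    """Award marks proportional to similarity, with a minimum pass threshold."""
    sim = cosine_similarity(model_answer, response)
    return round(max_marks * sim) if sim >= threshold else 0
```

Real systems layer on semantic embeddings, rubric-specific criteria, and human moderation; the point of the sketch is only that subjective marking reduces to comparing a response against reference material under an explicit scoring rule.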

AI in Analysis and Reporting

AI can perform statistical and psychometric analyses to provide detailed insights into student performance and assessment quality. It can generate data visualizations and interpret results, enabling educators to identify strengths, weaknesses, and areas for improvement. This data-driven approach helps educators make informed decisions about curriculum, instruction, and assessment strategies.

AI in Plagiarism and Malpractice Detection

Automated marking tools can identify instances of plagiarism in written responses. Remote proctoring solutions powered by AI monitor students during online assessments, analyzing video, audio, and keystroke data to detect suspicious behavior. Post-test forensics can further investigate potential cases of misconduct, ensuring the integrity and credibility of assessments.

AI for Automating Quality Assurance

AI can identify items that are biased, unfair, or ambiguous, ensuring the validity and reliability of assessments. AI-powered quality assurance tools can also provide feedback to item writers, helping them improve the quality of assessment items and reduce the likelihood of errors.


In conclusion, AI is revolutionizing the field of education and assessment. From adaptive testing to automated marking, AI-powered tools are transforming our approach to teaching and learning. AI’s ability to provide personalized, dynamic, and accurate assessments enhances the learning experience for students and allows educators to focus on providing high-quality instruction. As AI continues to advance, we can expect even more innovative and transformative applications of this technology in the realm of education and assessment.

Harnessing Generative AI for Efficient Test Data Generation

In the realm of software development and testing, the availability of high-quality test data is paramount. However, manually creating test data can be a time-consuming and laborious task, often leading to bottlenecks in the testing process. Generative AI, with its ability to produce realistic synthetic data, offers a solution to this challenge. In this article, we explore how generative AI revolutionises test data generation by automating the process, improving data quality, and accelerating the overall testing timeline.

Benefits of Using Generative AI for Test Data Generation:

1. Automation:

Generative AI automates the test data generation process, eliminating the need for manual data entry and reducing the associated time and effort. This automation enables developers and testers to focus on higher-value activities, such as improving the quality of test cases and analysing test results.

2. Improved Data Quality:

Generative AI algorithms can be trained on real-world data, allowing them to generate test data that closely resembles the actual input data. This leads to higher-quality test data that better reflects the scenarios encountered in production environments, improving the overall effectiveness of testing.
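As a minimal stand-in for a trained generative model, the sketch below learns per-field value frequencies from real sample records and then samples synthetic records from those frequencies. Unlike a true generative model it ignores correlations between fields, preserving only the marginal distributions; all names are illustrative.

```python
import random
from collections import Counter

def fit_field_distributions(records):
    """Learn per-field value frequencies from real sample records (list of dicts)."""
    dists = {}
    for field_name in records[0]:
        counts = Counter(rec[field_name] for rec in records)
        values = list(counts)
        weights = [counts[v] for v in values]
        dists[field_name] = (values, weights)
    return dists

def generate_records(dists, n, seed=None):
    """Sample n synthetic records, field by field, from the learned frequencies."""
    rng = random.Random(seed)
    return [{f: rng.choices(vals, weights=w)[0]
             for f, (vals, w) in dists.items()}
            for _ in range(n)]
```

Because the synthetic values are drawn from the observed frequency distribution, the generated data roughly matches the shape of production data without copying any individual record, which is the property that matters for realistic testing.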

3. Increased Data Volume:

Generative AI can generate vast amounts of test data in a short time, addressing the challenge of data scarcity in testing. This enables thorough testing across multiple scenarios and edge cases, ensuring that the software application is robust and reliable under various conditions.

4. Improved Test Coverage:

Generative AI helps achieve broader test coverage by generating a diverse range of test data. This helps identify more defects and ensures that the testing process is thorough and comprehensive, reducing the likelihood of undetected issues in the software application.

5. Cost Reduction:

Automating the test data generation process and improving data quality leads to cost savings in the overall testing effort. By eliminating the need for manual data creation and reducing the time spent on testing, organisations can allocate resources more effectively and focus on innovation.

Generative AI has emerged as a powerful tool for test data generation, offering numerous benefits such as automation, improved data quality, increased data volume, enhanced test coverage, and cost reduction. By harnessing the capabilities of generative AI, organisations can streamline their testing processes, improve software quality, and accelerate the overall development timeline. As generative AI continues to evolve, it is poised to revolutionise testing methodologies and contribute significantly to the delivery of high-quality software applications.