Accuracy and Confidence of Objective Structured Clinical Examination Pass-Fail Decisions
Status
Completed: 22 February 2009
Project Details
A project completed in 2009, undertaken by the University of Otago, to explore the factors underlying decision-making when assessment results need to be aggregated in the OSCE (objective structured clinical examination) format.
Aims:
The main aims of the project were to:
- investigate the accuracy of, and confidence in, decisions made by staff assessors as they were given increasing information on students’ performances
- study how increasing the amount of data on student performance affects decision-making by examination boards and confidence in those decisions
- identify threats to the validity of decision-making in multi-component assessment, thereby providing an example of how good practice might develop.
Methodology:
The project methodology involved:
- staff assessors being shown authentic, anonymised student scores for an increasing number of OSCE stations and asked to make pass-fail decisions and to state their degree of confidence in each decision
- staff assessors being given a fictional anecdote, presented as a single earlier observation, that was deliberately discordant with their views to that point, after which they again gave decisions and confidence ratings
- interviews with the staff assessors, following completion of the forms, regarding the rationale for their decisions.
Team
Dr Mike Tweed (Project Leader), University of Otago
Tim Wilkinson, University of Otago
Mark Thompson-Fawcett, University of Otago
Funding
$10,000.00 (excl GST)
Key Findings
The key findings from the project included:
- The strength of the research was that authentic results were collected and a gold-standard outcome, that of the Board of Examinations (BoE), was available. Faculty staff involved in teaching and/or assessing consultation skills were recruited, so the study reflected a more realistic decision-making process.
- Across the 10 stations, for the candidate who was above the pass threshold at every station, the mean level of confidence in a pass increased from 80% to 90%. For the students who failed the most stations, the level of confidence in a fail varied between 70% and 80%. The confidence-accuracy difference, a measure of overconfidence (see the brief sketch after this list), was greatest for students whose performance was closest to the pass-fail threshold.
- Despite being shown progressively poorer performances, the staff assessors were less confident in assigning a fail. Internal and external factors affecting the decision-making process tended to contribute to their doubts. The anecdotal information changed 12% of the decisions outright, altering a pass to a fail or vice versa.
- Although aiming for high levels of assessor confidence might be desirable, it seems unlikely that levels above 90% would be achieved. Assessors raised concerns about external factors that may contribute to apparent failure, such as cultural background, nervousness and the fairness of a station, as well as internal factors such as the derivation of the pass-mark threshold, the fairness of examiners and the flow of the stations.
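The confidence-accuracy difference mentioned above can be illustrated with a minimal sketch. The project summary does not state the exact formula used, so the Python below assumes the common definition of overconfidence as mean stated confidence minus mean decision accuracy (both as percentages), judged against the BoE outcome as the gold standard; the function name and the example figures are hypothetical.

```python
# Minimal sketch, assuming overconfidence = mean confidence (%) - mean accuracy (%).
# The decisions, confidences and outcomes below are illustrative, not project data.

def confidence_accuracy_difference(decisions, confidences, board_outcomes):
    """Return mean confidence (%) minus mean accuracy (%) for pass-fail decisions,
    where board_outcomes holds the gold-standard (BoE) result for each decision."""
    correct = [d == b for d, b in zip(decisions, board_outcomes)]
    mean_accuracy = 100.0 * sum(correct) / len(correct)
    mean_confidence = sum(confidences) / len(confidences)
    return mean_confidence - mean_accuracy

# Hypothetical example: three assessor decisions for a near-threshold student.
decisions = ["pass", "pass", "fail"]
confidences = [80, 70, 75]                  # assessor-reported confidence, %
board_outcomes = ["fail", "fail", "fail"]   # gold-standard BoE outcome
print(confidence_accuracy_difference(decisions, confidences, board_outcomes))
# 75.0 - 33.3 ≈ 41.7 percentage points, i.e. markedly overconfident
```

A positive value indicates overconfidence; on this definition, the finding that the difference was greatest near the pass-fail threshold means assessors' stated confidence most exceeded their actual accuracy for borderline students.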
Key Recommendations
The key implications from the project included:
Implications for standard setting and decision-making processes | Making decisions on students whose performance is close to decision thresholds is difficult and associated with overconfidence. As the majority of students perform above the threshold, staff assessors are more comfortable assigning a pass than a fail, and in the presence of uncertainty they will tend to pass the student. This has implications for standard setting and decision-making processes.
Decision changes | Some assessors gave the plausible but unreliable anecdote equal or greater weight than the evidence from the OSCE, leading them to change their decision.
Further questions raised | Further questions concerned which factors assessors take into account, consciously or subliminally, and why staff are under-confident in awarding a pass and even more so a fail. A reluctance to award a fail may influence the borderline-group, borderline-regression and similar standard-setting processes and hence final outcomes.
A research report prepared by Wei-Ming Hay (PDF, 182 KB, 22 pages), 16 February 2009.