Please use this identifier to cite or link to this item: https://doi.org/10.6018/edumed.636331

Título: Using Large Language Models to Generate Script Concordance Test in Medical Education: ChatGPT and Claude
Other titles: Uso de modelos de lenguaje de gran tamaño para generar pruebas de concordancia de guiones en la educación médica: ChatGPT y Claude
Publication date: 2025
Publisher: Universidad de Murcia. Servicio de Publicaciones
Bibliographic citation: Revista Española de Educación Médica, Vol. 6 No. 1 (2025)
ISSN: 2660-8529
Related subjects: UDC::6 - Applied sciences::61 - Medicine
Keywords: ChatGPT
Script concordance test
Clinical reasoning
Medical education
Artificial intelligence
Abstract: Yavuz Selim Kıyak (Department of Medical Education and Informatics, Gazi University Faculty of Medicine, Ankara, Turkiye; yskiyak@gazi.edu.tr; ORCID 0000-0002-5026-3234) and Emre Emekli (Department of Radiology, Faculty of Medicine, Eskişehir Osmangazi University, Eskişehir, Turkiye; ORCID 0000-0001-5989-1897). Correspondence: yskiyak@gazi.edu.tr. Received: 4/11/24; Accepted: 2/12/24; Published: 3/12/24.
We aimed to determine the quality of AI-generated (ChatGPT-4 and Claude 3) Script Concordance Test (SCT) items through an expert panel. We generated SCT items on abdominal radiology using a complex prompt in large language model (LLM) chatbots (ChatGPT-4 and Claude 3 Sonnet, in April 2024) and evaluated the items' quality through an expert panel of 16 radiologists. The panel, which was blind to the origin of the items (provided without modification), independently answered each item and assessed it against 12 quality indicators. Data analysis included descriptive statistics, bar charts to compare responses against accepted forms, and a heatmap to show performance on the quality indicators. SCT items generated by the chatbots assess clinical reasoning rather than only factual recall (ChatGPT: 92.50%, Claude: 85.00%). The heatmap indicated that the items were generally acceptable, with most responses favorable across quality indicators (ChatGPT: 71.77%, Claude: 64.23%). Comparison of the bar charts with acceptable and unacceptable forms revealed that 73.33% and 53.33% of the questions in the items can be considered acceptable for ChatGPT and Claude, respectively. The use of LLMs to generate SCT items can help medical educators by reducing the required time and effort. Although the prompt provides a good starting point, it remains crucial to review and revise AI-generated SCT items before educational use. The prompt and the custom GPT, "Script Concordance Test Generator", available at https://chatgpt.com/g/g-RlzW5xdc1-script-concordance-test-generator, can streamline SCT item development.
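For illustration only, below is a minimal sketch of how an SCT item could be generated programmatically with the OpenAI Python SDK. The study itself used the ChatGPT-4 and Claude 3 (Sonnet) chat interfaces with a more complex prompt (published in the article); the model name, prompt wording, and topic framing here are assumptions, not the authors' exact method.

# Minimal sketch (not the authors' exact prompt) of generating one Script
# Concordance Test (SCT) item via the OpenAI Python SDK. Model name and
# prompt text are illustrative assumptions.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

SCT_PROMPT = (
    "Generate one Script Concordance Test item on abdominal radiology. "
    "Include: (1) a short clinical vignette with diagnostic uncertainty; "
    "(2) a hypothesis ('If you were thinking of...'); "
    "(3) a piece of new information ('...and then you find...'); "
    "(4) a 5-point scale from -2 (hypothesis strongly weakened) "
    "to +2 (hypothesis strongly strengthened)."
)

response = client.chat.completions.create(
    model="gpt-4",  # assumed; the study used ChatGPT-4 via the web interface
    messages=[{"role": "user", "content": SCT_PROMPT}],
)
print(response.choices[0].message.content)

As the abstract notes, any item produced this way would still need expert review and revision before educational use.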
Main author(s): Kıyak, Yavuz Selim
Emekli, Emre
URI: http://hdl.handle.net/10201/155793
DOI: https://doi.org/10.6018/edumed.636331
Document type: info:eu-repo/semantics/article
Number of pages / Extent: 8
Rights: info:eu-repo/semantics/openAccess
Attribution-NonCommercial-NoDerivatives 4.0 International
Appears in collections: Vol. 6 No. 1 (2025)

Files in this item:
File: Using+LLMs+to+generate+SCT.pdf | Description: English | Size: 530.86 kB | Format: Adobe PDF


This item is licensed under a Creative Commons License.