JHSM

Journal of Health Sciences and Medicine (JHSM) is an unbiased, peer-reviewed, open-access international medical journal. The journal publishes clinical and experimental research from all fields of medicine, as well as case reports, clinical images, invited reviews, editorials, letters, and comments.

Original Article
Evaluation of artificial intelligence in thoracic surgery internship education: accuracy and usability of AI-generated exam questions
Aims: This study aims to evaluate the usefulness and reliability of artificial intelligence (AI) applications in thoracic surgery internship education and exam preparation.
Methods: Claude 3.7 Sonnet was provided with the core topics covered in the 5th-year thoracic surgery internship and instructed to generate a 20-question multiple-choice exam with an answer key. Four thoracic surgery specialists assessed the AI-generated questions using the Delphi panel method, classifying each as correct, containing a minor error, or containing a major error. Major errors included the absence of the correct answer among the choices, an incorrectly keyed answer, or contradiction of established medical knowledge. A second exam was prepared manually by a thoracic surgery specialist and evaluated with the same methodology. Seven volunteer 5th-year medical students completed both exams, and the correlation between their scores on the two exams was statistically analyzed.
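For illustration, the accuracy comparison between the two 20-question exams could be carried out as in the short Python sketch below. The abstract does not name the statistical test or the exact per-question comparison, so Fisher's exact test on an "any error vs no error" split is an assumption, and the printed p-value need not match the published figure.

from scipy.stats import fisher_exact

# Counts taken from the abstract: the AI-generated exam had 8 questions with
# major errors and 1 with a minor error (9 flawed of 20); the expert-generated
# exam had none. The "flawed vs acceptable" split is an assumed reading.
contingency = [[9, 11],   # AI-generated exam: flawed, acceptable
               [0, 20]]   # expert-generated exam: flawed, acceptable

odds_ratio, p_value = fisher_exact(contingency)  # two-sided by default
print(f"Fisher's exact test: p = {p_value:.3f}")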
Results: Among the 20 AI-generated questions, 8 (40%) contained major errors and 1 (5%) contained a minor error. The expert-generated exam was error-free, whereas the AI-generated exam had significantly lower accuracy (p=0.001). Median scores were 75 (range 67-100) on the AI-generated exam and 85 (range 70-95) on the expert-generated exam. No significant correlation was found between students' scores on the two exams (r=0.042, p=0.929).
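The reported correlation is consistent with a paired analysis of the seven students' scores on the two exams; the sketch below shows how such a coefficient could be computed. Pearson's r and the score lists are assumptions for illustration only, not the study's data.

from scipy.stats import pearsonr

# Hypothetical placeholder scores for the seven students (NOT the study's data):
# one entry per student, paired across the two exams.
ai_exam_scores     = [75, 67, 80, 70, 100, 72, 78]
expert_exam_scores = [85, 70, 90, 95, 80, 88, 75]

r, p = pearsonr(ai_exam_scores, expert_exam_scores)  # assumed: Pearson correlation
print(f"r = {r:.3f}, p = {p:.3f}")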
Conclusion: AI-generated questions had a high error rate (40% major, 5% minor), making them unreliable for unsupervised use in medical education. While AI may provide partial benefits under expert supervision, it currently lacks the accuracy required for independent implementation in thoracic surgery education.


Volume 8, Issue 3, 2025, Pages 524-528