Performance of ChatGPT 3.5 and 4 on U.S. dental examinations: the INBDE, ADAT, and DAT

  • Mahmood Dashti (Dentofacial Deformities Research Center, Research Institute of Dental Sciences, Shahid Beheshti University of Medical Sciences)
  • Shohreh Ghasemi (Department of Trauma and Craniofacial Reconstruction, Queen Mary College)
  • Niloofar Ghadimi (Department of Oral and Maxillofacial Radiology, Dental School, Islamic Azad University of Medical Sciences)
  • Delband Hefzi (School of Dentistry, Tehran University of Medical Science)
  • Azizeh Karimian (Department of Biostatistics, Dental Research Center, Golestan University of Medical Sciences)
  • Niusha Zare (Department of Operative Dentistry, University of Southern California)
  • Amir Fahimipour (Discipline of Oral Surgery, Medicine and Diagnostics, School of Dentistry, Faculty of Medicine and Health, Westmead Centre for Oral Health, The University of Sydney)
  • Zohaib Khurshid (Department of Prosthodontics and Dental Implantology, King Faisal University)
  • Maryam Mohammadalizadeh Chafjiri (Department of Oral and Maxillofacial Pathology, School of Dentistry, Shahid Beheshti University of Medical Sciences)
  • Sahar Ghaedsharaf (Department of Oral and Maxillofacial Radiology, School of Dentistry, Shahid Beheshti University of Medical Sciences)

  • Received: 2024.02.29
  • Accepted: 2024.04.27
  • Published: 2024.09.30

Abstract

Purpose: Recent advancements in artificial intelligence (AI), particularly tools such as ChatGPT developed by OpenAI, a U.S.-based AI research organization, have transformed the healthcare and education sectors. This study investigated the effectiveness of ChatGPT in answering dentistry exam questions, demonstrating its potential to enhance professional practice and patient care.

Materials and Methods: This study assessed the performance of ChatGPT 3.5 and 4 on U.S. dental exams, specifically the Integrated National Board Dental Examination (INBDE), Dental Admission Test (DAT), and Advanced Dental Admission Test (ADAT), with image-based questions excluded. ChatGPT was given customized prompts, and its answers were evaluated against the official answer sheets.

Results: ChatGPT 3.5 and 4 were tested with 253 questions from the INBDE, ADAT, and DAT exams. On the INBDE, both versions achieved 80% accuracy on knowledge-based questions and 66-69% on case history questions. On the ADAT, they scored 66-83% on knowledge-based questions and 76% on case history questions. ChatGPT 4 excelled on the DAT, with 94% accuracy on knowledge-based questions, 57% on mathematical analysis items, and 100% on comprehension questions, surpassing ChatGPT 3.5's rates of 83%, 31%, and 82%, respectively. The difference was significant for knowledge-based questions (P=0.009). Both versions showed similar patterns in incorrect responses.

Conclusion: Both ChatGPT 3.5 and 4 effectively handled knowledge-based, case history, and comprehension questions, with ChatGPT 4 being more reliable and surpassing the performance of 3.5. ChatGPT 4's perfect score on comprehension questions underscores its trainability in specific subjects. However, both versions performed more weakly on mathematical analysis, suggesting this as an area for improvement.
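The evaluation step described in the abstract, comparing a model's answers with an official answer key and reporting accuracy per question type, can be illustrated with a minimal Python sketch. This is not the authors' code: the function name `accuracy_by_category`, the question IDs, the category labels, and the toy answers are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def accuracy_by_category(responses, answer_key, categories):
    """Return {category: fraction of questions answered correctly}.

    responses   -- {question_id: answer given by the model}
    answer_key  -- {question_id: official correct answer}
    categories  -- {question_id: question type, e.g. "knowledge"}
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for qid, expected in answer_key.items():
        cat = categories[qid]
        total[cat] += 1
        # A missing response counts as incorrect; comparison ignores case
        # and surrounding whitespace.
        given = responses.get(qid, "")
        if given.strip().upper() == expected.strip().upper():
            correct[cat] += 1
    return {cat: correct[cat] / total[cat] for cat in total}


# Hypothetical toy data, NOT taken from the study.
answer_key = {"q1": "A", "q2": "C", "q3": "B", "q4": "D"}
categories = {
    "q1": "knowledge",
    "q2": "knowledge",
    "q3": "case_history",
    "q4": "case_history",
}
model_responses = {"q1": "a", "q2": "B", "q3": "B", "q4": "D"}

scores = accuracy_by_category(model_responses, answer_key, categories)
print(scores)
```

Running the same function once per model version would give per-category accuracies that can be compared side by side. The abstract does not state which statistical test produced the reported P value, so no test is sketched here.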
