The transformative impact of large language models on medical writing and publishing: current applications, challenges and future directions

  • Sangzin Ahn (Department of Pharmacology and PharmacoGenomics Research Center, Inje University College of Medicine)
  • Received: 2024.03.20
  • Reviewed: 2024.06.14
  • Published: 2024.09.01

Abstract

Large language models (LLMs) are rapidly transforming medical writing and publishing. This review focuses on experimental evidence to provide a comprehensive overview of the current applications, challenges, and future implications of LLMs across the stages of the academic research and publishing process. Global surveys reveal a high prevalence of LLM use in scientific writing, with both potential benefits and challenges associated with its adoption. LLMs have been successfully applied in literature search, research design, writing assistance, quality assessment, citation generation, and data analysis. They have also been used in peer review and publication processes, including manuscript screening, generating review comments, and identifying potential biases. To ensure the integrity and quality of scholarly work in the era of LLM-assisted research, responsible use of artificial intelligence (AI) is crucial. Researchers should prioritize verifying the accuracy and reliability of AI-generated content, maintain transparency about their use of LLMs, and develop collaborative human-AI workflows. Reviewers should focus on higher-order reviewing skills and be aware that LLMs may have been used in the manuscripts they assess. Editorial offices should develop clear policies and guidelines on AI use and foster open dialogue within the academic community. Future directions include addressing the limitations and biases of current LLMs, exploring innovative applications, and continuously updating policies and practices in response to technological advancements. Collaborative efforts among stakeholders are necessary to harness the transformative potential of LLMs while maintaining the integrity of medical writing and publishing.

Keywords

Acknowledgments

The generative AI chatbot Claude 3 Opus was used to write and revise the outline of the manuscript and to revise its wording and grammar.
