The transformative impact of large language models on medical writing and publishing: current applications, challenges and future directions

Sangzin Ahn

doi:10.4196/kjpp.2024.28.5.393

The Korean Journal of Physiology and Pharmacology

Volume 28, Issue 5 / Pages 393-401 / 2024 / 1226-4512 (pISSN) / 2093-3827 (eISSN)

The Korean Society of Pharmacology

Sangzin Ahn (Department of Pharmacology and PharmacoGenomics Research Center, Inje University College of Medicine)

Received: 2024.03.20
Reviewed: 2024.06.14
Published: 2024.09.01

https://doi.org/10.4196/kjpp.2024.28.5.393

Abstract

Large language models (LLMs) are rapidly transforming medical writing and publishing. This review article focuses on experimental evidence to provide a comprehensive overview of the current applications, challenges, and future implications of LLMs across the stages of the academic research and publishing process. Global surveys reveal a high prevalence of LLM usage in scientific writing, with both potential benefits and challenges associated with their adoption. LLMs have been successfully applied in literature search, research design, writing assistance, quality assessment, citation generation, and data analysis. They have also been used in peer review and publication processes, including manuscript screening, generating review comments, and identifying potential biases. To ensure the integrity and quality of scholarly work in the era of LLM-assisted research, responsible use of artificial intelligence (AI) is crucial. Researchers should prioritize verifying the accuracy and reliability of AI-generated content, maintain transparency about their use of LLMs, and develop collaborative human-AI workflows. Reviewers should focus on higher-order reviewing skills and be aware that LLMs may have been used in the manuscripts they assess. Editorial offices should develop clear policies and guidelines on AI use and foster open dialogue within the academic community. Future directions include addressing the limitations and biases of current LLMs, exploring innovative applications, and continuously updating policies and practices in response to technological advancements. Collaborative efforts among stakeholders are necessary to harness the transformative potential of LLMs while maintaining the integrity of medical writing and publishing.

Acknowledgement

The generative AI chatbot Claude 3 Opus was used in the process of writing and revising the outline of the manuscript, as well as in the process of revising the wording and grammar of the manuscript.

