Acknowledgement
This work was supported by the National Research Foundation of Korea (NRF) grant funded by the Korea government (MSIT) (No. 2022R1A5A7083908).
References
- Abbas, A., Tirumala, K., Simig, D., Ganguli, S. & Morcos, A.S. (2023). SemDeDup: Data-efficient learning at web-scale through semantic deduplication. arXiv preprint. https://arxiv.org/abs/2303.09540
- Agrawal, G., Kumarage, T., Alghamdi, Z. & Liu, H. (2023). Can Knowledge Graphs Reduce Hallucinations in LLMs?: A Survey. arXiv preprint. https://arxiv.org/abs/2311.07914
- Amayuelas, A., Pan, L., Chen, W. & Wang, W. (2023). Knowledge of Knowledge: Exploring Known-Unknowns Uncertainty with Large Language Models. arXiv preprint. https://arxiv.org/abs/2305.13712
- Andriopoulos, K. & Pouwelse, J. (2023). Augmenting LLMs with Knowledge: A survey on hallucination prevention. arXiv preprint. https://arxiv.org/abs/2309.16459
- Augenstein, I., Baldwin, T., Cha, M., Chakraborty, T., Ciampaglia, G. L., Corney, D., DiResta, R., Ferrara, E., Hale, S., Halevy, A., Hovy, E., Ji, H., Menczer, F., Miguez, R., Nakov, P., Scheufele, D., Sharma, S. & Zagni, G. (2023). Factuality Challenges in the Era of Large Language Models. arXiv preprint. https://arxiv.org/abs/2310.05189
- Bai, Y., Jones, A., Ndousse, K., Askell, A., Chen, A., DasSarma, N., Drain, D., Fort, S., Ganguli, D., Henighan, T., Joseph, N., Kadavath, S., Kernion, J., Conerly, T., El-Showk, S., Elhage, N., Hatfield-Dodds, Z., Hernandez, D., Hume, T., Johnston, S., Kravec, S., Lovitt, L., Nanda, N., Olsson, C., Amodei, D., Brown, T.B., Clark, J., McCandlish, S., Olah, C., Mann, B. & Kaplan, J. (2022). Training a Helpful and Harmless Assistant with Reinforcement Learning from Human Feedback. arXiv preprint. https://arxiv.org/abs/2204.05862
- Bang, Y., Cahyawijaya, S., Lee, N., Dai, W., Su, D., Wilie, B., Lovenia, H., Ji, Z., Yu, T., Chung, W., Do, Q. V., Xu, Y. & Fung, P. (2023). A Multitask, Multilingual, Multimodal Evaluation of ChatGPT on Reasoning, Hallucination, and Interactivity. arXiv preprint. https://arxiv.org/abs/2302.04023
- Barrett, C., Boyd, B., Bursztein, E., Carlini, N., Chen, B., Choi, J., Chowdhury, A., Christodorescu, M., Datta, A., Feizi, S., Fisher, K., Hashimoto, T., Hendrycks, D., Jha, S., Kang, D., Kerschbaum, F., Mitchell, E., Mitchell, J., Ramzan, Z., Shams, K., Song, D., Taly, A. & Yang, D. (2023). "Identifying and mitigating the security risks of generative AI." Foundations and Trends® in Privacy and Security, 6(1), 1-52. https://doi.org/10.1561/3300000041
- Beltagy, I., Peters, M.E. & Cohan, A. (2020). Longformer: The Long-Document Transformer. arXiv preprint. https://arxiv.org/abs/2004.05150
- Berglund, L., Tong, M., Kaufmann, M., Balesni, M., Stickland, A. C., Korbak, T. & Evans, O. (2023). The Reversal Curse: LLMs trained on "A is B" fail to learn "B is A". arXiv preprint. https://arxiv.org/abs/2309.12288
- Chen, X., Li, M., Gao, X. & Zhang, X. (2022). "Towards improving faithfulness in abstractive summarization." Advances in Neural Information Processing Systems, 35, 24516-24528.
- Chen, Y., Liu, Y., Meng, F., Chen, Y., Xu, J. & Zhou, J. (2023a). Improving Translation Faithfulness of Large Language Models via Augmenting Instructions. arXiv preprint. https://arxiv.org/abs/2308.12674
- Chen, S., Zhao, Y., Zhang, J., Chern, E., Gao, S., Liu, P. & He, J. (2023b). FELM: Benchmarking Factuality Evaluation of Large Language Models. arXiv preprint. https://arxiv.org/abs/2310.00741
- Chen, B., Zhang, Z., Langrene, N. & Zhu, S. (2023c). Unleashing the potential of prompt engineering in Large Language Models: a comprehensive review. arXiv preprint. https://arxiv.org/abs/2310.14735
- Chen, X., Song, D., Gui, H., Wang, C., Zhang, N., Yong, J., Huang, F., Lv, C., Zhang, D. & Chen, H. (2023d). FactCHD: Benchmarking Fact-Conflicting Hallucination Detection. arXiv preprint. https://arxiv.org/abs/2310.12086
- Chen, Z., Li, D., Zhao, X., Hu, B. & Zhang, M. (2023e). Temporal Knowledge Question Answering via Abstract Reasoning Induction. arXiv preprint. https://arxiv.org/abs/2311.09149
- Chen, Y., Sikka, K., Cogswell, M., Ji, H. & Divakaran, A. (2023f). DRESS: Instructing Large Vision-Language Models to Align and Interact with Humans via Natural Language Feedback. arXiv preprint. https://arxiv.org/abs/2311.10081
- Cheng, Q., Sun, T., Zhang, W., Wang, S., Liu, X., Zhang, M., He, J., Huang, M., Yin, Z., Chen, K. & Qiu, X. (2023a). Evaluating Hallucinations in Chinese Large Language Models. arXiv preprint. https://arxiv.org/abs/2310.03368
- Cheng, D., Huang, S., Bi, J., Zhan, Y., Liu, J., Wang, Y., Sun, H., Wei, F., Deng, D. & Zhang, Q. (2023b). UPRISE: Universal Prompt Retrieval for Improving Zero-Shot Evaluation. arXiv preprint. https://arxiv.org/abs/2303.08518
- Cheng, Q., Sun, T., Liu, X., Zhang, W., Yin, Z., Li, S., Li, L., He, Z., Chen, K. & Qiu, X. (2024). Can AI Assistants Know What They Don't Know? arXiv preprint. https://arxiv.org/abs/2401.13275
- Chiesurin, S., Dimakopoulos, D., Cabezudo, M. A. S., Eshghi, A., Papaioannou, I., Rieser, V. & Konstas, I. (2023). The dangers of trusting stochastic parrots: Faithfulness and trust in open-domain conversational question answering. arXiv preprint. https://arxiv.org/abs/2305.16519
- Cho, J., Hu, Y., Garg, R., Anderson, P., Krishna, R., Baldridge, J., Bansal, M., Pont-Tuset, J. & Wang, S. (2023). Davidsonian Scene Graph: Improving Reliability in Fine-grained Evaluation for Text-to-Image Generation. arXiv preprint. https://arxiv.org/abs/2310.18235
- Chrysostomou, G., Zhao, Z., Williams, M. & Aletras, N. (2023). Investigating Hallucinations in Pruned Large Language Models for Abstractive Summarization. arXiv preprint. https://arxiv.org/abs/2311.09335
- Cohen, R., Hamri, M., Geva, M. & Globerson, A. (2023). LM vs LM: Detecting Factual Errors via Cross Examination. arXiv preprint. https://arxiv.org/abs/2305.13281
- Cotra, A. (2021). "Why AI Alignment Could Be Hard with Modern Deep Learning." https://www.cold-takes.com/why-ai-alignment-could-be-hard-with-modern-deep-learning/. (Retrieved on April 27, 2024).
- Cui, C., Zhou, Y., Yang, X., Wu, S., Zhang, L., Zou, J. & Yao, H. (2023). Holistic Analysis of Hallucination in GPT-4V(ision): Bias and Interference Challenges. arXiv preprint. https://arxiv.org/abs/2311.03287
- Dai, W., Liu, Z., Ji, Z., Su, D. & Fung, P. (2022). Plausible May Not Be Faithful: Probing Object Hallucination in Vision-Language Pre-training. arXiv preprint. https://arxiv.org/abs/2210.07688
- Dai, Y., Lang, H., Zeng, K., Huang, F. & Li, Y. (2023). Exploring Large Language Models for Multi-Modal Out-of-Distribution Detection. arXiv preprint. https://arxiv.org/abs/2310.08027
- Daull, X., Bellot, P., Bruno, E., Martin, V. & Murisasco, E. (2023). Complex QA and Language Models Hybrid Architectures, Survey. arXiv preprint. https://arxiv.org/abs/2302.09051
- Deng, H., Ding, L., Liu, X., Zhang, M., Tao, D. & Zhang, M. (2022). Improving Simultaneous Machine Translation with Monolingual Data. arXiv preprint. https://arxiv.org/abs/2212.01188
- Dhuliawala, S., Komeili, M., Xu, J., Raileanu, R., Li, X., Celikyilmaz, A. & Weston, J. (2023). Chain-of-Verification Reduces Hallucination in Large Language Models. arXiv preprint. https://arxiv.org/abs/2309.11495
- Ding, Y., Wang, Z., Ahmad, W. U., Ramanathan, M. K., Nallapati, R., Bhatia, P., Roth, D. & Xiang, B. (2022). CoCoMIC: Code Completion By Jointly Modeling In-file and Cross-file Context. arXiv preprint. https://arxiv.org/abs/2212.10007
- Dong, G., Yuan, H., Lu, K., Li, C., Xue, M., Liu, D., Wang, W., Yuan, Z., Zhou, C. & Zhou, J. (2023a). How Abilities in Large Language Models are Affected by Supervised Fine-tuning Data Composition. arXiv preprint. https://arxiv.org/abs/2310.05492
- Dong, Z., Tang, T., Li, J., Zhao, W. X. & Wen, J. R. (2023b). BAMBOO: A Comprehensive Benchmark for Evaluating Long Text Modeling Capacities of Large Language Models. arXiv preprint. https://arxiv.org/abs/2309.13345
- Du, L., Wang, Y., Xing, X., Ya, Y., Li, X., Jiang, X. & Fang, X. (2023). Quantifying and Attributing the Hallucination of Large Language Models via Association Analysis. arXiv preprint. https://arxiv.org/abs/2309.05217
- Durante, Z., Huang, Q., Wake, N., Gong, R., Park, J. S., Sarkar, B., Taori, R., Noda, Y., Terzopoulos, D., Choi, Y., Ikeuchi, K., Vo, H., Fei-Fei, L. & Gao, J. (2024). Agent AI: Surveying the Horizons of Multimodal Interaction. arXiv preprint. https://arxiv.org/abs/2401.03568
- Elaraby, M. S., Lu, M., Dunn, J., Zhang, X., Wang, Y. & Liu, S. (2023). Halo: Estimation and Reduction of Hallucinations in Open-Source Weak Large Language Models. arXiv preprint. https://arxiv.org/abs/2308.11764
- Eun, J., & Hwang, S. (2020). "An Exploratory Study on Policy Decision Making with Artificial Intelligence: Applying Problem Structuring Typology on Success and Failure Cases." Informatization Policy, 27(4), 47-66. https://doi.org/10.22693/NIAIP.2020.27.4.047
- Fadeeva, E., Vashurin, R., Tsvigun, A., Vazhentsev, A., Petrakov, S., Fedyanin, K., Vasilev, D., Goncharova, E., Panchenko, A., Panov, M., Baldwin, T. & Shelmanov, A. (2023). LM-Polygraph: Uncertainty Estimation for Language Models. arXiv preprint. https://arxiv.org/abs/2311.07383
- Fan, A., Gokkaya, B., Harman, M., Lyubarskiy, M., Sengupta, S., Yoo, S. & Zhang, J. M. (2023). Large Language Models for Software Engineering: Survey and Open Problems. arXiv preprint. https://arxiv.org/abs/2310.03533
- Farinhas, A., de Souza, J. G. C. & Martins, A. F. T. (2023). An Empirical Study of Translation Hypothesis Ensembling with Large Language Models. arXiv preprint. https://arxiv.org/abs/2310.11430
- Fei, H., Liu, Q., Zhang, M., Zhang, M. & Chua, T. S. (2023). Scene Graph as Pivoting: Inference-time Image-free Unsupervised Multimodal Machine Translation with Visual Scene Hallucination. arXiv preprint. https://arxiv.org/abs/2305.12256
- Feng, H., Fan, Y., Liu, X., Lin, T. E., Yao, Z., Wu, Y., Huang, F., Li, Y. & Ma, Q. (2023). Improving Factual Consistency of Text Summarization by Adversarially Decoupling Comprehension and Embellishment Abilities of LLMs. arXiv preprint. https://arxiv.org/abs/2310.19347
- Ferrara, E. (2023). "Fairness and Bias in Artificial Intelligence: A Brief Survey of Sources, Impacts, and Mitigation Strategies." Sci, 6(1), 3.
- Foster, D. (2022). Generative deep learning. O'Reilly Media, Inc.
- Fung, Y. R., Chakraborty, T., Guo, H., Rambow, O., Muresan, S. & Ji, H. (2022). NormSAGE: Multi-Lingual Multi-Cultural Norm Discovery from Conversations On-the-Fly. arXiv preprint. https://arxiv.org/abs/2210.08604
- Friel, R. & Sanyal, A. (2023). Chainpoll: A High Efficacy Method for LLM Hallucination Detection. arXiv preprint. https://arxiv.org/abs/2310.18344
- Gao, Y., Xiong, Y., Gao, X., Jia, K., Pan, J., Bi, Y., Dai, Y., Sun, J., Guo, Q., Wang, M. & Wang, H. (2023). Retrieval-Augmented Generation for Large Language Models: A Survey. arXiv preprint. https://arxiv.org/abs/2312.10997
- Ghandi, T., Pourreza, H. & Mahyar, H. (2023). Deep Learning Approaches on Image Captioning: A Review. arXiv preprint. https://arxiv.org/abs/2201.12944 https://doi.org/10.1145/3617592
- Gou, Z., Shao, Z., Gong, Y., Shen, Y., Yang, Y., Duan, N. & Chen, W. (2023). CRITIC: Large Language Models Can Self-Correct with Tool-Interactive Critiquing. arXiv preprint. https://arxiv.org/abs/2305.11738
- Guan, J., Dodge, J., Wadden, D., Huang, M. & Peng, H. (2023). Language Models Hallucinate, but May Excel at Fact Verification. arXiv preprint. https://arxiv.org/abs/2310.14564
- Guerreiro, N. M., Colombo, P., Piantanida, P. & Martins, A. F. T. (2022). Optimal Transport for Unsupervised Hallucination Detection in Neural Machine Translation. arXiv preprint. https://arxiv.org/abs/2212.09631
- Gu, D., On, B. & Jeong, D. (2022). "Relevance and Redundancy-based Loss Function of KoBART Model for Improvement of the Factual Inconsistency Problem in Abstractive Summarization." The Journal of Korean Institute of Information Technology, 20(12), 25-36. https://doi.org/10.14801/jkiit.2022.20.12.25
- Gupta, V., Pandya, P., Kataria, T., Gupta, V. & Roth, D. (2023). Multi-Set Inoculation: Assessing Model Robustness Across Multiple Challenge Sets. arXiv preprint. https://arxiv.org/abs/2311.08662
- Ha, D., Dai, A. & Le, Q. V. (2016). Hypernetworks. arXiv preprint. https://arxiv.org/abs/1609.09106
- He, Z., Liang, T., Jiao, W., Zhang, Z., Yang, Y., Wang, R., Tu, Z., Shi, S. & Wang, X. (2023). Exploring Human-Like Translation Strategy with Large Language Models. arXiv preprint. https://arxiv.org/abs/2305.04118
- Hua, W., Xu, S., Ge, Y. & Zhang, Y. (2023). How to Index Item IDs for Recommendation Foundation Models. arXiv preprint. https://arxiv.org/abs/2305.06569 https://doi.org/10.1145/3624918.3625339
- Huang, L., Yu, W., Ma, W., Zhong, W., Feng, Z., Wang, H., Chen, Q., Peng, W., Feng, X., Qin, B. & Liu, T. (2023a). A Survey on Hallucination in Large Language Models: Principles, Taxonomy, Challenges, and Open Questions. arXiv preprint. https://arxiv.org/abs/2311.05232
- Huang, Q., Dong, X., Zhang, P., Wang, B., He, C., Wang, J., Lin, D., Zhang, W. & Yu, N. (2023b). OPERA: Alleviating Hallucination in Multi-Modal Large Language Models via Over-Trust Penalty and Retrospection-Allocation. arXiv preprint. https://arxiv.org/abs/2311.17911
- Jafari, M., Sadeghi, D., Shoeibi, A., Alinejad-Rokny, H., Beheshti, A., Garcia, D. L., Chen, Z., Acharya, U. R. & Gorriz, J. M. (2023). Empowering Precision Medicine: AI-Driven Schizophrenia Diagnosis via EEG Signals: A Comprehensive Review from 2002-2023. arXiv preprint. https://arxiv.org/abs/2309.12202
- Ji, Z., Liu, Z., Lee, N., Yu, T., Wilie, B., Zeng, M. & Fung, P. (2022). RHO (ρ): Reducing Hallucination in Open-domain Dialogues with Knowledge Grounding. arXiv preprint. https://arxiv.org/abs/2212.01588
- Ji, Z., Lee, N., Frieske, R., Yu, T., Su, D., Xu, Y., Ishii, E., Bang, Y., Madotto, A. & Fung, P. (2023a). "Survey of Hallucination in Natural Language Generation." ACM Computing Surveys, 55(12), 1-38. https://doi.org/10.1145/3571730
- Ji, Z., Yu, T., Xu, Y., Lee, N., Ishii, E. & Fung, P. (2023b). Towards Mitigating Hallucination in Large Language Models via Self-Reflection. arXiv preprint. https://arxiv.org/abs/2310.06271
- Jiang, C., Xu, H., Dong, M., Chen, J., Ye, W., Yan, M., Ye, Q., Zhang, J., Huang, F. & Zhang, S. (2023). Hallucination Augmented Contrastive Learning for Multimodal Large Language Model. arXiv preprint. https://arxiv.org/abs/2312.06968
- Jiao, W., Wang, W., Huang, J., Wang, X., Shi, S. & Tu, Z. (2023). Is ChatGPT A Good Translator? Yes With GPT-4 As The Engine. arXiv preprint. https://arxiv.org/abs/2301.08745
- Jha, S., Jha, S. K., Lincoln, P., Bastian, N. D., Velasquez, A. & Neema, S. (2023). Dehallucinating Large Language Models Using Formal Methods Guided Iterative Prompting. Paper presented at 2023 IEEE International Conference on Assured Autonomy (ICAA), June 6-8.
- Kamalloo, E., Dziri, N., Clarke, C. L. A. & Rafiei, D. (2023). Evaluating Open-Domain Question Answering in the Era of Large Language Models. arXiv preprint. https://arxiv.org/abs/2305.06984
- Kanda, N., Yoshioka, T. & Liu, Y. (2023). Factual Consistency Oriented Speech Recognition. arXiv preprint. https://arxiv.org/abs/2302.12369
- Kang, C. & Choi, J. (2023). Impact of Co-occurrence on Factual Knowledge of Large Language Models. arXiv preprint. https://arxiv.org/abs/2310.08256
- Kang, H. & Liu, X. Y. (2023). Deficiency of Large Language Models in Finance: An Empirical Examination of Hallucination. arXiv preprint. https://arxiv.org/abs/2311.15548
- Kang, H., Ni, J. & Yao, H. (2023). Ever: Mitigating Hallucination in Large Language Models through Real-Time Verification and Rectification. arXiv preprint. https://arxiv.org/abs/2311.09114
- Kasai, J., Sakaguchi, K., Takahashi, Y., Le Bras, R., Asai, A., Yu, X.V., Radev, D.R., Smith, N.A., Choi, Y. & Inui, K. (2022). RealTime QA: What's the Answer Right Now? arXiv preprint. https://arxiv.org/abs/2207.13332
- Kasanishi, T., Isonuma, M., Mori, J. & Sakata, I. (2023). SciReviewGen: A Large-scale Dataset for Automatic Literature Review Generation. arXiv preprint. https://arxiv.org/abs/2305.15186
- Khalid, H., Tariq, S., Kim, M. & Woo, S. (2021). FakeAVCeleb: A novel audio-video multimodal deepfake dataset. arXiv preprint. https://arxiv.org/abs/2108.05080
- Koksal, A., Aksitov, R. & Chang, C. (2023). Hallucination Augmented Recitations for Language Models. arXiv preprint. https://arxiv.org/abs/2311.07424
- Ladhak, F., Durmus, E., Suzgun, M., Zhang, T., Jurafsky, D., McKeown, K. & Hashimoto, T. (2023). When Do Pre-Training Biases Propagate to Downstream Tasks? A Case Study in Text Summarization. Proceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics, 3206-3219.
- Lattimer, B. M., Chen, P., Zhang, X. & Yang, Y. (2023). Fast and Accurate Factual Inconsistency Detection Over Long Documents. arXiv preprint. https://arxiv.org/abs/2310.13189
- Lee, N., Ping, W., Xu, P., Patwary, M., Shoeybi, M. & Catanzaro, B. (2022). Factuality Enhanced Language Models for Open-Ended Text Generation. arXiv preprint. https://arxiv.org/abs/2206.04624
- Lee, Z. & Nam, H. (2022). "A Literature Review Study in the Field of Artificial Intelligence (AI) Applications, AI-Related Management, and AI Application Risk." Informatization Policy, 29(2), 3-36. https://doi.org/10.22693/NIAIP.2022.29.2.003
- Lei, D., Li, Y., Hu, M., Wang, M., Yun, V., Ching, E. & Kamal, E. (2023). Chain of Natural Language Inference for Reducing Large Language Model Ungrounded Hallucinations. arXiv preprint. https://arxiv.org/abs/2310.03951
- Lewis, P., Perez, E., Piktus, A., Petroni, F., Karpukhin, V., Goyal, N., Kuttler, H., Lewis, M., Yih, W., Rocktaschel, T., Riedel, S. & Kiela, D. (2020). "Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks." Advances in Neural Information Processing Systems, 33, 9459-9474.
- Li, J., Cheng, X., Zhao, W. X., Nie, J. Y. & Wen, J. R. (2023a). HaluEval: A Large-Scale Hallucination Evaluation Benchmark for Large Language Models. arXiv preprint. https://arxiv.org/abs/2305.11747
- Li, Z., Zhang, S., Zhao, H., Yang, Y. & Yang, D. (2023b). BatGPT: A Bidirectional Autoregressive Talker from Generative Pre-trained Transformer. arXiv preprint. https://arxiv.org/abs/2307.00360
- Li, K., Patel, O., Vi'egas, F., Pfister, H. & Wattenberg, M. (2023c). Inference-Time Intervention: Eliciting Truthful Answers from a Language Model. arXiv preprint. https://arxiv.org/abs/2306.03341
- Li, B., Zhou, B., Wang, F., Fu, X., Roth, D. & Chen, M. (2023d). Deceiving Semantic Shortcuts on Reasoning Chains: How Far Can Models Go without Hallucination? arXiv preprint. https://arxiv.org/abs/2311.09702
- Li, Y., Li, Y., Zhang, M., Su, C., Ren, M., Qiao, X., Zhao, X., Piao, M., Yu, J., Lv, X., Ma, M., Zhao, Y. & Yang, H. (2023e). A Multitask Training Approach to Enhance Whisper with Contextual Biasing and Open-Vocabulary Keyword Spotting. arXiv preprint. https://arxiv.org/abs/2309.09552
- Li, Y., Du, Y., Zhou, K., Wang, J., Zhao, W. X. & Wen, J. R. (2023f). Evaluating Object Hallucination in Large Vision-Language Models. arXiv preprint. https://arxiv.org/abs/2305.10355
- Li, J., Chen, J., Ren, R., Cheng, X., Zhao, W. X., Nie, J. Y. & Wen, J. R. (2024). The Dawn After the Dark: An Empirical Study on Factuality Hallucination in Large Language Models. arXiv preprint. https://arxiv.org/abs/2401.03205
- Lin, S.C., Hilton, J. & Evans, O. (2022). TruthfulQA: Measuring How Models Mimic Human Falsehoods. Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics, 3214-3252.
- Liu, W., Li, G., Zhang, K., Du, B., Chen, Q., Hu, X., Xu, H., Chen, J. & Wu, J. (2023a). Mind's Mirror: Distilling Self-Evaluation Capability and Comprehensive Thinking from Large Language Models. arXiv preprint. https://arxiv.org/abs/2311.09214
- Liu, B., Ash, J.T., Goel, S., Krishnamurthy, A. & Zhang, C. (2023b). Exposing Attention Glitches with Flip-Flop Language Modeling. arXiv preprint. https://arxiv.org/abs/2306.00946
- Liu, G., Wang, X., Yuan, L., Chen, Y. & Peng, H. (2023c). Prudent Silence or Foolish Babble? Examining Large Language Models' Responses to the Unknown. arXiv preprint. https://arxiv.org/abs/2311.09731
- Liu, Y., Wang, K., Shao, W., Luo, P., Qiao, Y., Shou, M. Z., Zhang, K. & You, Y. (2023d). MLLMs-Augmented Visual-Language Representation Learning. arXiv preprint. https://arxiv.org/abs/2311.18765
- Lovenia, H., Dai, W., Cahyawijaya, S., Ji, Z. & Fung, P. (2023). Negative Object Presence Evaluation (NOPE) to Measure Object Hallucination in Vision-Language Models. arXiv preprint. https://arxiv.org/abs/2310.05338
- Luo, J., Xiao, C. & Ma, F. (2023). Zero-Resource Hallucination Prevention for Large Language Models. arXiv preprint. https://arxiv.org/abs/2309.02654
- Luo, J., Li, T., Wu, D., Jenkin, M., Liu, S. & Dudek, G. (2024). Hallucination Detection and Hallucination Mitigation: An Investigation. arXiv preprint. https://arxiv.org/abs/2401.08358
- Ma, W., Liu, S., Wang, W., Hu, Q., Liu, Y., Zhang, C., Nie, L. & Liu, Y. (2023). ChatGPT: Understanding Code Syntax and Semantics. arXiv preprint. https://arxiv.org/abs/2305.12138
- Manakul, P., Liusie, A. & Gales, M.J. (2023). SelfCheckGPT: Zero-Resource Black-Box Hallucination Detection for Generative Large Language Models. arXiv preprint. https://arxiv.org/abs/2303.08896
- Mei, K. & Zhang, Y. (2023). LightLM: A Lightweight Deep and Narrow Language Model for Generative Recommendation. arXiv preprint. https://arxiv.org/abs/2310.17488
- Meng, K., Bau, D., Andonian, A. & Belinkov, Y. (2022). "Locating and Editing Factual Associations in GPT." Advances in Neural Information Processing Systems, 35, 17359-17372.
- Miao, M., Meng, F., Liu, Y., Zhou, X. H. & Zhou, J. (2021). Prevent the Language Model from Being Overconfident in Neural Machine Translation. arXiv preprint. https://arxiv.org/abs/2105.11098
- Miao, N., Teh, Y.W. & Rainforth, T. (2023). SelfCheck: Using LLMs to Zero-Shot Check Their Own Step-by-Step Reasoning. arXiv preprint. https://arxiv.org/abs/2308.00436
- Min, S., Krishna, K., Lyu, X., Lewis, M., Yih, W., Koh, P., Iyyer, M., Zettlemoyer, L. & Hajishirzi, H. (2023). FActScore: Fine-grained Atomic Evaluation of Factual Precision in Long Form Text Generation. arXiv preprint. https://arxiv.org/abs/2305.14251
- Mitchell, E., Lin, C., Bosselut, A., Finn, C. & Manning, C.D. (2022, April 25-29). Fast Model Editing at Scale [Poster Presentation]. The Tenth International Conference on Learning Representations, ICLR 2022 Virtual Event. https://openreview.net/forum?id=0DcZxeWfOPt
- Mohammadshahi, A., Vamvas, J. & Sennrich, R. (2023). Investigating Multi-Pivot Ensembling with Massively Multilingual Machine Translation Models. arXiv preprint. https://arxiv.org/abs/2311.07439
- Montagnese, M., Leptourgos, P., Fernyhough, C., Waters, F., Larøi, F., Jardri, R., McCarthy-Jones, S., Thomas, N., Dudley, R., Taylor, J.-P., Collerton, D. & Urwyler, P. (2021). "A review of multimodal hallucinations: categorization, assessment, theoretical perspectives, and clinical recommendations." Schizophrenia Bulletin, 47(1), 237-248. https://doi.org/10.1093/schbul/sbaa101
- Moses, L. (2024). "OpenAI's Sora's Best Features and Biggest Limitations." Business Insider, April 19.
- Muhlgay, D., Ram, O., Magar, I., Levine, Y., Ratner, N., Belinkov, Y., Abend, O., Leyton-Brown, K., Shashua, A. & Shoham, Y. (2023). Generating Benchmarks for Factuality Evaluation of Language Models. arXiv preprint. https://arxiv.org/abs/2307.06908
- Nathani, D., Wang, D., Pan, L. & Wang, W. Y. (2023). MAF: Multi-Aspect Feedback for Improving Reasoning in Large Language Models. arXiv preprint. https://arxiv.org/abs/2310.12426
- Oh, D. (2024). "Research Trends for Dehallucination of Natural Language Generation Model." Communications of the Korean Institute of Information Scientists and Engineers, 42(1), 15-20.
- Ouyang, L., Wu, J., Jiang, X., Almeida, D., Wainwright, C., Mishkin, P., Zhang, C., Agarwal, S., Slama, K., Ray, A., Schulman, J., Hilton, J., Kelton, F., Miller, L., Simens, M., Askell, A., Welinder, P., Christiano, P., Leike, J. & Lowe, R. (2022). "Training Language Models to Follow Instructions with Human Feedback." Advances in Neural Information Processing Systems, 35, 27730-27744.
- Pan, L., Saxon, M., Xu, W., Nathani, D., Wang, X. & Wang, W. Y. (2023). Automatically Correcting Large Language Models: Surveying the Landscape of Diverse Self-Correction Strategies. arXiv preprint. https://arxiv.org/abs/2308.03188
- Park, D. (2024). Media Artificial Intelligence. Seoul: Yulgokbook Publishing Company.
- Park, D. (2023a). "Journalism Artificial Intelligence Based on Trustworthy Artificial Intelligence : Toward a Commensurability between Media Trust and Trustworthiness of Artificial Intelligence System." Media & Society, 31(4), 5-47. https://doi.org/10.52874/medsoc.2023.11.31.4.5
- Park, D. (2023b). "Topology of Media Bias : Fat-Tailed Distribution as Universal Distribution of Quotation by Analyzing News Source Networks with 16.5 Million Articles." Korean Journal of Journalism & Communication Studies, 67(6), 189-222. https://doi.org/10.20879/kjjcs.2023.67.6.006
- Peng, B., Galley, M., He, P., Cheng, H., Xie, Y., Hu, Y., Huang, Q., Liden, L., Yu, Z., Chen, W. & Gao, J. (2023). Check Your Facts and Try Again: Improving Large Language Models with External Knowledge and Automated Feedback. arXiv preprint. https://arxiv.org/abs/2302.12813
- Press, O., Zhang, M., Min, S., Schmidt, L., Smith, N. A. & Lewis, M. (2022). Measuring and Narrowing the Compositionality Gap in Language Models. arXiv preprint. https://arxiv.org/abs/2210.03350
- Qiu, Z., Liu, W., Xiao, T. Z., Liu, Z., Bhatt, U., Luo, Y., Weller, A. & Scholkopf, B. (2022). Iterative Teaching by Data Hallucination. arXiv preprint. https://arxiv.org/abs/2210.17467
- Radford, A., Wu, J., Child, R., Luan, D., Amodei, D. & Sutskever, I. (2019). "Language Models are Unsupervised Multitask Learners." https://openai.com/research/better-language-models (Retrieved on April 27, 2024).
- Ram, O., Levine, Y., Dalmedigos, I., Muhlgay, D., Shashua, A., Leyton-Brown, K. & Shoham, Y. (2023). In-Context Retrieval-Augmented Language Models. arXiv preprint. https://arxiv.org/abs/2302.00083
- Rawte, V., Chakraborty, S., Pathak, A., Sarkar, A., Tonmoy, S. M. T. I., Chadha, A., Sheth, A. P. & Das, A. (2023a). The Troubling Emergence of Hallucination in Large Language Models - An Extensive Definition, Quantification, and Prescriptive Remediations. arXiv preprint. https://arxiv.org/abs/2310.04988
- Rawte, V., Sheth, A. & Das, A. (2023b). A Survey of Hallucination in Large Foundation Models. arXiv preprint. https://arxiv.org/abs/2309.05922
- Rehman, T., Mandal, R., Agarwal, A. & Sanyal, D. K. (2023). Hallucination Reduction in Long Input Text Summarization. arXiv preprint. https://arxiv.org/abs/2309.16781
- Rejeleene, R., Xu, X. & Talburt, J. (2024). Towards Trustable Language Models: Investigating Information Quality of Large Language Models. arXiv preprint. https://arxiv.org/abs/2401.13086
- Rombach, R., Blattmann, A., Lorenz, D., Esser, P. & Ommer, B. (2022). High-Resolution Image Synthesis with Latent Diffusion Models. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 10684-10695.
- Saha, S., Yu, X. V., Bansal, M., Pasunuru, R. & Celikyilmaz, A. (2022). MURMUR: Modular Multi-Step Reasoning for Semi-Structured Data-to-Text Generation. arXiv preprint. https://arxiv.org/abs/2212.08607
- Sarkar, D., Bali, R. & Ghosh, T. (2018). Hands-On Transfer Learning with Python: Implement Advanced Deep Learning and Neural Network Models using TensorFlow and Keras. Packt Publishing.
- Saunders, W., Yeh, C., Wu, J., Bills, S., Long, O., Ward, J. & Leike, J. (2022). Self-Critiquing Models for Assisting Human Evaluators. arXiv preprint. https://arxiv.org/abs/2206.05802
- Schulman, J. (2023). "Reinforcement Learning from Human Feedback: Progress and Challenges." https://www.youtube.com/watch?v=hhiLw5Q_UFg. (Retrieved on April 27, 2024).
- Shi, W., Han, X., Lewis, M., Tsvetkov, Y., Zettlemoyer, L. & Yih, S. W. T. (2023). Trusting Your Evidence: Hallucinate Less with Context-Aware Decoding. arXiv preprint. https://arxiv.org/abs/2305.14739
- Shi, Z., Wang, Z., Fan, H., Yin, Z., Sheng, L., Qiao, Y. & Shao, J. (2023). ChEF: A Comprehensive Evaluation Framework for Standardized Assessment of Multimodal Large Language Models. arXiv preprint. https://arxiv.org/abs/2311.02692
- Si, C., Gan, Z., Yang, Z., Wang, S., Wang, J., Boyd-Graber, J. L. & Wang, L. (2022). Prompting GPT-3 To Be Reliable. arXiv preprint. https://arxiv.org/abs/2210.09150
- Song, M. & Lee, S. (2024). "What Concerns Does ChatGPT Raise for Us?: An Analysis Centered on CTM (Correlated Topic Modeling) of YouTube Video News Comments." Informatization Policy, 31(1), 3-31. https://doi.org/10.22693/NIAIP.2024.31.1.003
- Sun, Z., Shen, Y., Zhou, Q., Zhang, H., Chen, Z., Cox, D.D., Yang, Y. & Gan, C. (2023a). Principle- Driven Self-Alignment of Language Models from Scratch with Minimal Human Supervision. arXiv preprint. https://arxiv.org/abs/2305.03047
- Sun, Q., Yin, Z., Li, X., Wu, Z., Qiu, X. & Kong, L. (2023b). Corex: Pushing the Boundaries of Complex Reasoning through Multi-Model Collaboration. arXiv preprint. https://arxiv.org/abs/2310.00280
- Tam, D., Mascarenhas, A., Zhang, S., Kwan, S., Bansal, M. & Raffel, C. (2022). Evaluating the Factual Consistency of Large Language Models Through News Summarization. arXiv preprint. https://arxiv.org/abs/2211.08412
- Tian, K., Mitchell, E., Yao, H., Manning, C. D. & Finn, C. (2023). Fine-tuning Language Models for Factuality. arXiv preprint. https://arxiv.org/abs/2311.08401
- Tonmoy, S. M. T. I., Zaman, S. M. M., Jain, V., Rani, A., Rawte, V., Chadha, A. & Das, A. (2024). A Comprehensive Survey of Hallucination Mitigation Techniques in Large Language Models. arXiv preprint. https://arxiv.org/abs/2401.01313
- Touvron, H., Lavril, T., Izacard, G., Martinet, X., Lachaux, M., Lacroix, T., Roziere, B., Goyal, N., Hambro, E., Azhar, F., Rodriguez, A., Joulin, A., Grave, E. & Lample, G. (2023). LLaMA: Open and Efficient Foundation Language Models. arXiv preprint. https://arxiv.org/abs/2302.13971
- Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., Kaiser, L. & Polosukhin, I. (2017). "Attention is All You Need." Advances in Neural Information Processing Systems, 30.
- Verma, S., Goel, T., Tanveer, M., Ding, W. & Sharma, R. (2024). Machine learning techniques for the Schizophrenia diagnosis: A comprehensive review and future research directions. arXiv preprint. https://arxiv.org/abs/2301.07496
- Vu, T., Iyyer, M., Wang, X., Constant, N., Wei, J., Wei, J., Tar, C., Sung, Y., Zhou, D., Le, Q. & Luong, T. (2023). FreshLLMs: Refreshing Large Language Models with Search Engine Augmentation. arXiv preprint. https://arxiv.org/abs/2310.03214
- Wan, D., Zhang, S. & Bansal, M. (2023). HistAlign: Improving Context Dependency in Language Generation by Aligning with History. arXiv preprint. https://arxiv.org/abs/2305.04782
- Wang, Y., Zhong, W., Li, L., Mi, F., Zeng, X., Huang, W., Shang, L., Jiang, X. & Liu, Q. (2023a). Aligning Large Language Models with Human: A Survey. arXiv preprint. https://arxiv.org/abs/2307.12966
- Wang, Z., Mao, S., Wu, W., Ge, T., Wei, F. & Ji, H. (2023b). Unleashing the Emergent Cognitive Synergy in Large Language Models: A Task-Solving Agent through Multi-Persona Self-Collaboration. arXiv preprint. https://arxiv.org/abs/2307.05300
- Wang, J., Wang, Y., Xu, G., Zhang, J., Gu, Y., Jia, H., Yan, M., Zhang, J. & Sang, J. (2023c). An LLM-free Multi-dimensional Benchmark for MLLMs Hallucination Evaluation. arXiv preprint. https://arxiv.org/abs/2311.07397
- Wang, B., Wu, F., Han, X., Peng, J., Zhong, H., Zhang, P., Dong, X., Li, W., Li, W., Wang, J. & He, C. (2023d). VIGC: Visual Instruction Generation and Correction. arXiv preprint. https://arxiv.org/abs/2308.12714
- Wang, J., Zhou, Y., Xu, G., Shi, P., Zhao, C., Xu, H., Ye, Q., Yan, M., Zhang, J., Zhu, J. & Sang, J. (2023e). Evaluation and Analysis of Hallucination in Large Vision-Language Models. arXiv preprint. https://arxiv.org/abs/2308.15126
- Wang, F. (2024). LightHouse: A Survey of AGI Hallucination. arXiv preprint. https://arxiv.org/abs/2401.06792
- Wang, J., Chang, Y., Li, Z., An, N., Ma, Q., Hei, L., Luo, H., Lu, Y. & Ren, F. (2024a). TechGPT-2.0: A large language model project to solve the task of knowledge graph construction. arXiv preprint. https://arxiv.org/abs/2401.04507
- Wang, X., Zhou, Y., Liu, X., Lu, H., Xu, Y., He, F., Yoon, J., Lu, T., Bertasius, G., Bansal, M., Yao, H. & Huang, F. (2024b). Mementos: A Comprehensive Benchmark for Multimodal Large Language Model Reasoning over Image Sequences. arXiv preprint. https://arxiv.org/abs/2401.10529
- Wei, J. W., Huang, D., Lu, Y., Zhou, D. & Le, Q. (2023). Simple Synthetic Data Reduces Sycophancy in Large Language Models. arXiv preprint. https://arxiv.org/abs/2308.03958
- Wilie, B., Xu, Y., Chung, W., Cahyawijaya, S., Lovenia, H. & Fung, P. (2023). PICK: Polished & Informed Candidate Scoring for Knowledge-Grounded Dialogue Systems. arXiv preprint. https://arxiv.org/abs/2309.10413
- Xiong, M., Hu, Z., Lu, X., Li, Y., Fu, J., He, J. & Hooi, B. (2023). Can LLMs Express Their Uncertainty? An Empirical Evaluation of Confidence Elicitation in LLMs. arXiv preprint. https://arxiv.org/abs/2306.13063
- Xu, P., Shao, W., Zhang, K., Gao, P., Liu, S., Lei, M., Meng, F., Huang, S., Qiao, Y. & Luo, P. (2023). LVLM-eHub: A Comprehensive Evaluation Benchmark for Large Vision-Language Models. arXiv preprint. https://arxiv.org/abs/2306.09265
- Xue, T., Wang, Z., Wang, Z., Han, C., Yu, P. & Ji, H. (2023). RCOT: Detecting and Rectifying Factual Inconsistency in Reasoning by Reversing Chain-of-Thought. arXiv preprint. https://arxiv.org/abs/2305.11499
- Yang, Z., Dai, Z., Salakhutdinov, R. & Cohen, W.W. (2018). Breaking the Softmax Bottleneck: A High-Rank RNN Language Model. Paper presented at 6th International Conference on Learning Representations, ICLR 2018, April 30 - May 3, 2018.
- Yang, S., Sun, R. & Wan, X. (2023a). A New Benchmark and Reverse Validation Method for Passage-level Hallucination Detection. arXiv preprint. https://arxiv.org/abs/2310.06498
- Yang, L., Zhang, S., Yu, Z., Bao, G., Wang, Y., Wang, J., Xu, R., Ye, W., Xie, X., Chen, W. & Zhang, Y. (2023c). Supervised Knowledge Makes Large Language Models Better In-context Learners. arXiv preprint. https://arxiv.org/abs/2312.15918
- Yao, J., Ning, K., Liu, Z., Ning, M. & Yuan, L. (2023a). LLM Lies: Hallucinations are not Bugs, but Features as Adversarial Examples. arXiv preprint. https://arxiv.org/abs/2310.01469
- Yao, Y., Xu, X. & Liu, Y. (2023b). Large Language Model Unlearning. arXiv preprint. https://arxiv.org/abs/2310.10683
- Ye, H., Liu, T., Zhang, A., Hua, W. & Jia, W. (2023a). Cognitive Mirage: A Review of Hallucinations in Large Language Models. arXiv preprint. https://arxiv.org/abs/2309.06794
- Yu, X., Cheng, H., Liu, X., Roth, D. & Gao, J. (2023). Automatic Hallucination Assessment for Aligned Large Language Models via Transferable Adversarial Attacks. arXiv preprint. https://arxiv.org/abs/2310.12516
- Yun, H. S., Marshall, I. J., Trikalinos, T. A. & Wallace, B. C. (2023). Appraising the Potential Uses and Harms of LLMs for Medical Systematic Reviews. arXiv preprint. https://arxiv.org/abs/2305.11828
- Zha, Y., Yang, Y., Li, R. & Hu, Z. (2023). AlignScore: Evaluating Factual Consistency with A Unified Alignment Function. Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics, 11328-11348.
- Zhai, Y., Tong, S., Li, X., Cai, M., Qu, Q., Lee, Y. J. & Ma, Y. (2023). Investigating the Catastrophic Forgetting in Multimodal Large Language Models. arXiv preprint. https://arxiv.org/abs/2309.10313
- Zhang, H., Duckworth, D., Ippolito, D. & Neelakantan, A. (2020). Trading off diversity and quality in natural language generation. arXiv preprint. https://arxiv.org/abs/2004.10450
- Zhang, J., Li, Z., Das, K., Malin, B. & Sricharan, K. (2023c). SAC3: Reliable Hallucination Detection in Black-Box Language Models via Semantic-aware Cross-check Consistency. arXiv preprint. https://arxiv.org/abs/2311.01740
- Zhang, M., Press, O., Merrill, W., Liu, A. & Smith, N. A. (2023b). How Language Model Hallucinations Can Snowball. arXiv preprint. https://arxiv.org/abs/2305.13534
- Zhang, S., Pan, L., Zhao, J. & Wang, W.Y. (2023d). The Knowledge Alignment Problem: Bridging Human and External Knowledge for Large Language Models. arXiv preprint. https://arxiv.org/abs/2305.13669
- Zhang, T., Qiu, L., Guo, Q., Deng, C., Zhang, Y., Zhang, Z., Zhou, C., Wang, X. & Fu, L. (2023e). Enhancing Uncertainty-Based Hallucination Detection with Stronger Focus. arXiv preprint. https://arxiv.org/abs/2311.13230
- Zhang, Y., Cui, L., Bi, W. & Shi, S. (2023f). Alleviating Hallucinations of Large Language Models through Induced Hallucinations. arXiv preprint. https://arxiv.org/abs/2312.15710
- Zhang, Y., Li, Y., Cui, L., Cai, D., Liu, L., Fu, T., Huang, X., Zhao, E., Zhang, Y., Chen, Y., Wang, L., Luu, A.T., Bi, W., Shi, F. & Shi, S. (2023a). Siren's Song in the AI Ocean: A Survey on Hallucination in Large Language Models. arXiv preprint. https://arxiv.org/abs/2309.01219
- Zhao, R., Li, X., Joty, S.R., Qin, C. & Bing, L. (2023a). Verify-and-Edit: A Knowledge-Enhanced Chain-of-Thought Framework. Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics, 5823-5840.
- Zhao, Z., Wang, B., Ouyang, L., Dong, X., Wang, J. & He, C. (2023b). Beyond Hallucinations: Enhancing LVLMs through Hallucination-Aware Direct Preference Optimization. arXiv preprint. https://arxiv.org/abs/2311.16839
- Zhong, Z., Wu, Z., Manning, C. D., Potts, C. & Chen, D. (2023). MQuAKE: Assessing Knowledge Editing in Language Models via Multi-Hop Questions. arXiv preprint. https://arxiv.org/abs/2305.14795
- Zhou, C., Liu, P., Xu, P., Iyer, S., Sun, J., Mao, Y., Ma, X., Efrat, A., Yu, P., Yu, L., Zhang, S., Ghosh, G., Lewis, M., Zettlemoyer, L. & Levy, O. (2023a). LIMA: Less Is More for Alignment. arXiv preprint. https://arxiv.org/abs/2305.11206
- Zhou, Y., Cui, C., Yoon, J., Zhang, L., Deng, Z., Finn, C., Bansal, M. & Yao, H. (2023b). Analyzing and Mitigating Object Hallucination in Large Vision-Language Models. arXiv preprint. https://arxiv.org/abs/2310.00754
- Zhu, J., Qi, J., Ding, M., Chen, X., Luo, P., Wang, X., Liu, W., Wang, L. & Wang, J. (2023). Understanding Self-Supervised Pretraining with Part-Aware Representation Learning. arXiv preprint. https://arxiv.org/abs/2301.11915
- Zong, M. & Krishnamachari, B. (2022). A Survey on GPT-3. arXiv preprint. https://arxiv.org/abs/2212.00857