Analysis of Prompt Engineering Methodologies and Research Status to Improve Inference Capability of ChatGPT and Other Large Language Models

  • Sangun Park (Department of Industrial Management Information Engineering, Kyonggi University)
  • Juyoung Kang (e-Business Department, School of Business, Ajou University)
  • Received : 2023.12.10
  • Accepted : 2023.12.18
  • Published : 2023.12.31

Abstract

Since launching its service in November 2022, ChatGPT has rapidly gained users and is affecting nearly every part of society, marking a major turning point in the history of artificial intelligence. In particular, the reasoning ability of large language models such as ChatGPT has been improving at a rapid pace through prompt engineering techniques. This reasoning ability is an important consideration for companies that want to adopt artificial intelligence into their workflows and for individuals who want to make use of it. In this paper, we begin with the in-context learning that enables reasoning in large language models, and then explain the concept of prompt engineering, the types of reasoning involved, and the benchmark datasets used to evaluate them. On this basis, we survey the prompt engineering techniques that have sharply improved the reasoning performance of large language models, how they have developed, and how they relate to one another.
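To make the prompting styles surveyed in the paper concrete, the sketch below (not taken from the paper itself; the example problems and helper names are illustrative assumptions) contrasts a few-shot in-context-learning prompt that includes a worked chain-of-thought exemplar with a zero-shot chain-of-thought prompt that instead appends a reasoning trigger phrase. It only assembles prompt strings, so any LLM client could be used to send them.

```python
# Minimal, illustrative sketch: few-shot in-context learning with a worked
# chain-of-thought exemplar vs. zero-shot chain-of-thought prompting.
# The exemplar and test questions are hypothetical examples, not data from the paper.

FEW_SHOT_EXEMPLARS = [
    # (question, worked solution) pairs shown to the model as in-context examples
    (
        "Roger has 5 tennis balls. He buys 2 more cans of tennis balls. "
        "Each can has 3 tennis balls. How many tennis balls does he have now?",
        "Roger started with 5 balls. 2 cans of 3 tennis balls each is 6 balls. "
        "5 + 6 = 11. The answer is 11.",
    ),
]


def build_few_shot_cot_prompt(question: str) -> str:
    """Few-shot chain-of-thought: worked examples are prepended to the new question."""
    blocks = [f"Q: {q}\nA: {a}" for q, a in FEW_SHOT_EXEMPLARS]
    blocks.append(f"Q: {question}\nA:")
    return "\n\n".join(blocks)


def build_zero_shot_cot_prompt(question: str) -> str:
    """Zero-shot chain-of-thought: a reasoning trigger phrase replaces the examples."""
    return f"Q: {question}\nA: Let's think step by step."


if __name__ == "__main__":
    q = "A store had 120 apples and sold 45 of them. How many apples are left?"
    print(build_few_shot_cot_prompt(q))
    print()
    print(build_zero_shot_cot_prompt(q))
```

In the few-shot case the model is expected to imitate the reasoning pattern shown in the exemplar, whereas the zero-shot variant relies only on the trigger phrase to elicit step-by-step reasoning.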

Acknowledgement

This work was supported by the Ministry of Education of the Republic of Korea and the National Research Foundation of Korea in 2023 (NRF-2021S1A3A2A02089039).
