A Survey of Automatic Code Generation from Natural Language

  • Shin, Jiho (Dept. of Information and Communication Technology, Handong Global University)
  • Nam, Jaechang (School of Computer Science and Electrical Engineering, Handong Global University)
  • Received: 2020.01.31
  • Reviewed: 2020.04.03
  • Published: 2021.06.30

Abstract

Researchers have studied programming languages since the beginning of computer science. Alongside programming with traditional languages (procedural, object-oriented, functional, and so on), a new paradigm is emerging: programming with natural language. Programming with natural language is expected to free programmers' expressiveness, in contrast to programming languages, whose syntax imposes strong constraints. This paper surveys approaches that automatically generate source code from a natural language description and categorizes them by the forms of their input and output. Finally, we analyze current trends among these approaches and suggest future directions for improving automatic code generation from natural language. Based on this analysis, we argue that researchers should work on customizing language models for the domain of source code and explore better representations of source code, such as embedding techniques and pre-trained models, which have proven to work well on natural language processing tasks.

Acknowledgement

This work was supported by the National Research Foundation of Korea (NRF) grant funded by the Korea Government (MSIT) (No. 2018R1C1B6001919).
