http://dx.doi.org/10.3745/JIPS.04.0216

A Survey of Automatic Code Generation from Natural Language  

Shin, Jiho (Dept. of Information and Communication Technology, Handong Global University)
Nam, Jaechang (School of Computer Science and Electrical Engineering, Handong Global University)
Publication Information
Journal of Information Processing Systems, vol. 17, no. 3, 2021, pp. 537-555
Abstract
Many researchers have studied programming languages since the beginning of computer science. Beyond programming with traditional programming languages (e.g., procedural, object-oriented, and functional languages), a new programming paradigm is being pursued: programming with natural language. We expect natural language to give programmers greater freedom of expression than programming languages, which impose strict syntactic constraints. This paper surveys approaches that automatically generate source code from a natural language description and categorizes them by their forms of input and output. Finally, we analyze current trends among these approaches and suggest future directions for improving automatic code generation from natural language. Based on this analysis, we argue that researchers should work on customizing language models for the source code domain and explore better representations of source code, such as embedding techniques and pre-trained models, which have been shown to work well on natural language processing tasks.
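To make the task concrete, here is a toy sketch of the natural-language-in, source-code-out framing. It is not taken from any surveyed system: the pattern table `PATTERNS` and the functions `nl_to_code` and `translate` are hypothetical, and the sketch assumes a tiny controlled subset of English mapped to Python statements. It also shows the trade-off the abstract describes, since any sentence outside the fixed patterns is rejected.

```python
import re

# Hypothetical pattern table (an assumption for illustration): each entry maps
# one restricted English phrasing to a Python code template.
PATTERNS = [
    (re.compile(r"^set (\w+) to (\d+)$"), "{0} = {1}"),
    (re.compile(r"^add (\d+) to (\w+)$"), "{1} += {0}"),
    (re.compile(r"^print (\w+)$"), "print({0})"),
]


def nl_to_code(sentence: str) -> str:
    """Translate one controlled-English sentence into one line of Python."""
    text = sentence.strip().lower().rstrip(".")
    for pattern, template in PATTERNS:
        match = pattern.match(text)
        if match:
            return template.format(*match.groups())
    # Unlike free-form natural language, this toy grammar is very strict.
    raise ValueError(f"unsupported sentence: {sentence!r}")


def translate(description: str) -> str:
    """Translate a multi-sentence description, one statement per sentence."""
    sentences = [s for s in description.split(".") if s.strip()]
    return "\n".join(nl_to_code(s) for s in sentences)


if __name__ == "__main__":
    program = translate("Set total to 5. Add 3 to total. Print total.")
    print(program)  # the three generated Python lines
    exec(program)   # runs the generated code, printing 8
```

Running the script prints the generated program (`total = 5`, `total += 3`, `print(total)`) and then `8`. The systems surveyed in the paper replace this fixed table with richer parsing or learned language models.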
Keywords
Naturalistic Programming; Software Engineering; Survey; Source Code Generation