A Study on Test-Driven Development Method with the Aid of Generative AI in Software Engineering

  • Woochang Shin (Dept. of Computer Science, Seokyeong University)
  • Received: 2024.09.26
  • Accepted: 2024.10.05
  • Published: 2024.11.30

Abstract

This study explores how Generative AI can be integrated into Test-Driven Development (TDD) to efficiently produce code that accurately reflects programmers' requirements. Using an Account class as an example, we analyzed the code generation capabilities of three leading Generative AI tools: OpenAI's ChatGPT, GitHub's Copilot, and Google's Gemini. Our findings indicate that although Generative AI can generate code automatically, the result often fails to capture the programmer's intent, which can lead to functional errors or security vulnerabilities. When we applied TDD principles and supplied detailed test cases to the Generative AI, the generated code aligned more closely with the programmer's intent and passed the specified tests. This approach reduces the need for manual code review and improves development efficiency. We propose a development process that combines TDD with Generative AI, leveraging the strengths of both to produce high-quality software efficiently. Future research will extend this approach to more complex systems and explore techniques for automatic test case generation.
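
As a rough illustration of the test-first idea described in the abstract, the following Python sketch shows the kind of test cases a programmer might write before asking a Generative AI tool for an Account class. It is not the code used in the study: the method names `deposit` and `withdraw`, the `InsufficientFundsError` exception, and the specific rules (no negative deposits, no overdrafts) are assumptions chosen for illustration. The `Account` class stands in for generated code that satisfies the tests.

```python
import unittest


class InsufficientFundsError(Exception):
    """Hypothetical error for a withdrawal that exceeds the balance."""


class Account:
    """Stand-in for generated code; not the implementation from the study."""

    def __init__(self, balance=0):
        if balance < 0:
            raise ValueError("initial balance must not be negative")
        self.balance = balance

    def deposit(self, amount):
        # Reject zero or negative amounts so the balance cannot be corrupted.
        if amount <= 0:
            raise ValueError("deposit amount must be positive")
        self.balance += amount

    def withdraw(self, amount):
        if amount <= 0:
            raise ValueError("withdrawal amount must be positive")
        # Assumed rule: overdrafts are not allowed.
        if amount > self.balance:
            raise InsufficientFundsError("withdrawal exceeds balance")
        self.balance -= amount


class AccountTest(unittest.TestCase):
    """Tests written first; they act as the specification given to the AI tool."""

    def test_deposit_increases_balance(self):
        account = Account(100)
        account.deposit(50)
        self.assertEqual(account.balance, 150)

    def test_negative_deposit_is_rejected(self):
        account = Account(100)
        with self.assertRaises(ValueError):
            account.deposit(-10)
        self.assertEqual(account.balance, 100)  # balance is unchanged

    def test_overdraft_is_rejected(self):
        account = Account(100)
        with self.assertRaises(InsufficientFundsError):
            account.withdraw(150)
        self.assertEqual(account.balance, 100)  # balance is unchanged


def run_account_tests():
    """Run the test cases and return the unittest result object."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(AccountTest)
    return unittest.TextTestRunner(verbosity=0).run(suite)
```

In such a workflow, the test class would be included in the prompt, and the generated class would be accepted only if `run_account_tests().wasSuccessful()` returns true. A prompt without the two rejection tests leaves those rules unstated, which is the kind of gap in captured intent that the abstract describes.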

Acknowledgement

This research was supported by Seokyeong University in 2023.
