Technical Trends in Artificial Intelligence for Robotics Based on Large Language Models

  • J. Lee (이준기, Energy Intelligence Research Lab)
  • S. Park (박상준, Energy Intelligence Research Lab)
  • N.W. Kim (김낙우, Energy Intelligence Research Lab)
  • E. Kim (김에덴, Energy Intelligence Research Lab)
  • S.K. Ko (고석갑, Energy Intelligence Research Lab)
  • Published: 2024.02.01

Abstract

In natural language processing, large language models such as GPT-4 have recently been in the spotlight. Performance has advanced dramatically, driven by growth in model size (the number of parameters) and in the number of input tokens a model can accept. Research on multimodal models that process natural language and image data together is also being actively conducted. Moreover, the natural-language and image-based reasoning capabilities of large language models are being explored for robot artificial intelligence. We discuss research and related patent trends in robot task planning and in code generation for robot control using large language models.
