Technical Trends in On-device Small Language Model Technology Development

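  • G. Kim (Edge Computing Application Service Research Section) ;
  • K. Yoon (Edge Computing Application Service Research Section) ;
  • R. Kim (Edge Computing Application Service Research Section) ;
  • J. H. Ryu (Edge Computing Application Service Research Section) ;
  • S. C. Kim (Edge Computing Application Service Research Section)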
  • Published : 2024.08.01


This paper surveys technology development trends in on-device small language models (SLMs). Large language models (LLMs) based on the transformer architecture gained global attention with the emergence of ChatGPT. They provide detailed and sophisticated responses across many knowledge domains, and their influence on society continues to grow. While major global tech companies keep announcing new LLMs or enhancing existing ones, development of SLMs, which are lightweight versions of LLMs, is also advancing rapidly. SLMs can run as on-device AI on smartphones or edge devices with limited memory and computing resources, which makes them commercially applicable in a wide range of fields. This paper examines the technical features of SLM development, model lightweighting techniques, semiconductor technology trends for on-device AI, and potential applications across industries.



This research was supported by the Ministry of Trade, Industry and Energy (MOTIE) and the Korea Institute of Energy Technology Evaluation and Planning (KETEP) [No. 2021202090053B].

