Integrated Search | Korea Science

이재홍;김덕수
- Journal of Broadcast Engineering (방송공학회논문지) / Vol. 28, No. 1 / pp. 21-30 / 2023
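Computer-generated holography imposes a greater computational load and memory requirement than ordinary images. This paper applies low-precision and mixed-precision computation, which trade precision for speed, to diffraction calculation, and analyzes how hologram generation speed and quality change with precision. We compare diffraction calculations in double precision, single precision, and bfloat16, and find that the bfloat16 diffraction calculation is up to 5.94 times faster than double precision and 1.52 times faster than single precision. We also measure the error of the diffraction calculation in terms of MSE, PSNR, and SSIM, and confirm that hologram quality degrades as precision decreases. However, we find no meaningful effect on qualitative image quality. These results show that low-precision arithmetic such as bfloat16 can be applied to hologram computation.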
https://doi.org/10.5909/JBE.2023.28.1.21
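
The abstract above reports error metrics for reduced-precision diffraction but does not show the computation itself. As a rough illustration only, the following sketch emulates bfloat16 rounding in pure Python and measures the resulting error with MSE and PSNR on a synthetic ramp of values. The helper names (`to_bfloat16`, `mse`, `psnr`), the test data, and the peak value of 1.0 are assumptions made for this example; this is not the authors' diffraction kernel, and it handles finite inputs only.

```python
import math
import struct


def to_bfloat16(x: float) -> float:
    """Round a finite value to the nearest bfloat16 (ties to even).

    bfloat16 keeps the upper 16 bits of an IEEE 754 single-precision
    value, so we round the float32 bit pattern and zero the lower half.
    """
    bits = struct.unpack(">I", struct.pack(">f", x))[0]
    # Round to nearest even on the 16 bits that will be discarded.
    rounding_bias = 0x7FFF + ((bits >> 16) & 1)
    bits = (((bits + rounding_bias) >> 16) << 16) & 0xFFFFFFFF
    return struct.unpack(">f", struct.pack(">I", bits))[0]


def mse(ref, test):
    """Mean squared error between two equal-length sequences."""
    return sum((r - t) ** 2 for r, t in zip(ref, test)) / len(ref)


def psnr(ref, test, peak=1.0):
    """Peak signal-to-noise ratio in dB; infinite when the inputs match."""
    err = mse(ref, test)
    return math.inf if err == 0 else 10 * math.log10(peak ** 2 / err)


# Hypothetical data: a ramp of intensities in [0, 1] stored as doubles.
reference = [i / 255 for i in range(256)]
reduced = [to_bfloat16(v) for v in reference]
print("MSE :", mse(reference, reduced))
print("PSNR:", psnr(reference, reduced), "dB")
```

Lower precision raises the MSE and lowers the PSNR, which is the direction of change the abstract describes; SSIM is omitted here because it needs a windowed image comparison.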

김혜지;한진호;권영수
- Electronics and Telecommunications Trends (전자통신동향분석) / Vol. 37, No. 1 / pp. 53-62 / 2022
As transformer-based neural networks grow in size, lightweight algorithms and efficient AI accelerators have been developed to train these huge networks within a practical time. In this article, we survey state-of-the-art research on low-precision computational algorithms, especially for floating-point formats, and on their hardware accelerators. We describe the trends by focusing on the work of two leading research groups, IBM and Seoul National University, which have deep knowledge of both AI algorithms and hardware architecture. For low-precision algorithms, we summarize two efficient floating-point formats (hybrid FP8 and radix-4 FP4) together with the accuracy-preserving algorithms used for training in the main research stream. Moreover, we describe an AI processor architecture that supports low-bit mixed-precision computing units, including an integer engine.
https://doi.org/10.22648/ETRI.2022.J.370106

이종남;박종화;신경욱
- Institute of Electronics Engineers of Korea (대한전자공학회): Conference Proceedings / Proceedings of the 2000 Summer Conference (2) / pp. 149-152 / 2000
A dual-mode multiplier (DMM) that performs single- and double-precision multiplications has been designed. We propose an algorithm for efficiently implementing double-precision multiplication with a single-precision multiplier: it partitions a double-precision multiplication into four single-precision sub-multiplications and computes them with sequential accumulations. Compared with conventional double-precision multipliers, our approach reduces hardware complexity by about one third, resulting in a small silicon area and low power dissipation at the expense of increased latency and throughput cycles.
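
The partitioning idea in this abstract can be sketched in software on integer significands. The example below splits each wide operand into a high and a low half, forms the four narrow partial products one at a time, and accumulates them with the appropriate shifts. The function name `split_multiply` and the 32-bit half width are assumptions chosen for illustration; the abstract does not give the paper's actual operand widths, datapath, or accumulation order.

```python
def split_multiply(a: int, b: int, half_bits: int = 32) -> int:
    """Multiply two non-negative integers of up to 2 * half_bits bits
    using four half_bits-wide multiplications and sequential accumulation.
    """
    mask = (1 << half_bits) - 1
    a_lo, a_hi = a & mask, a >> half_bits
    b_lo, b_hi = b & mask, b >> half_bits

    # Each tuple is one narrow sub-multiplication plus its weight (shift).
    partial_products = (
        (a_lo, b_lo, 0),
        (a_hi, b_lo, half_bits),
        (a_lo, b_hi, half_bits),
        (a_hi, b_hi, 2 * half_bits),
    )

    acc = 0
    for x, y, shift in partial_products:
        # One pass through the narrow multiplier, then accumulate.
        acc += (x * y) << shift
    return acc


# Example with two 53-bit values, the significand width of a double.
a = (1 << 53) - 1
b = 0x1ABCDEF0123456
print(split_multiply(a, b) == a * b)
```

Running the four sub-multiplications in sequence on one narrow multiplier is what trades extra cycles for reduced hardware, which matches the latency cost the abstract mentions.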