• Title/Abstract/Keywords: neural processing unit


GPU implementation of neural networks (Rapid and Brief Communication)

  • Oh, Kyoung-Su;Jung, Kee-Chul
    • 한국HCI학회:학술대회논문집 / Proceedings of the 2007 HCI Korea Conference, Part 3 / pp.322-325 / 2007
  • A graphics processing unit (GPU) is used to accelerate an artificial neural network: it implements the network's matrix multiplication to improve the time performance of a text-detection system. Preliminary results produced a 20-fold performance enhancement using an ATI RADEON 9700 PRO board. The parallelism of the GPU is fully utilized by accumulating many input feature vectors and weight vectors, then converting the many inner-product operations into one matrix operation. Further research areas include benchmarking the performance on various hardware and developing GPU-aware learning algorithms. (c) 2004 Pattern Recognition Society. Published by Elsevier Ltd. All rights reserved.
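The batching idea this abstract describes can be sketched in NumPy: instead of computing each neuron's inner product separately, the accumulated feature vectors and weight vectors form two matrices, and a single matrix multiplication produces all outputs at once. The sizes below are illustrative, not taken from the paper.

```python
import numpy as np

# Hypothetical sizes: 256 accumulated input feature vectors of dimension 64,
# and 32 weight vectors (one per neuron).
rng = np.random.default_rng(0)
features = rng.standard_normal((256, 64))   # accumulated input vectors
weights = rng.standard_normal((32, 64))     # one weight vector per neuron

# Naive approach: many individual inner products.
naive = np.array([[f @ w for w in weights] for f in features])

# Batched approach: one matrix multiplication, which maps well to GPU hardware.
batched = features @ weights.T

assert np.allclose(naive, batched)
```

The single `features @ weights.T` call is the operation that a GPU shader (or, today, a GEMM library) can execute with full parallelism.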

A layer-wise frequency scaling for a neural processing unit

  • Chung, Jaehoon;Kim, HyunMi;Shin, Kyoungseon;Lyuh, Chun-Gi;Cho, Yong Cheol Peter;Han, Jinho;Kwon, Youngsu;Gong, Young-Ho;Chung, Sung Woo
    • ETRI Journal / Vol. 44, No. 5 / pp.849-858 / 2022
  • Dynamic voltage and frequency scaling (DVFS) has been widely adopted for run-time power management of various processing units. In the case of neural processing units (NPUs), power management of neural network applications requires adjusting the frequency and voltage for every layer to account for each layer's power behavior and performance. Unfortunately, DVFS is inappropriate for layer-wise run-time power management of NPUs because voltage scaling takes long compared with each layer's execution time. Because frequency scaling is fast enough to keep up with each layer, we propose a layer-wise dynamic frequency scaling (DFS) technique for an NPU. Our proposed DFS exploits the highest frequency under the power limit of an NPU for each layer. To determine the highest allowable frequency, we build a power model that predicts the power consumption of an NPU based on real measurements of the fabricated NPU. Our evaluation results show that the proposed DFS improves frames per second (FPS) by 33% and saves energy by 14% on average, compared with DVFS.
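A minimal sketch of the layer-wise selection step this abstract describes: for each layer, choose the highest available frequency whose predicted power stays under the NPU's limit. The frequency steps, the power limit, and the linear power model below are all hypothetical placeholders, not the paper's measured model.

```python
# Layer-wise DFS sketch: pick the highest frequency under the power budget.
FREQS_MHZ = [200, 400, 600, 800, 1000]  # assumed available frequency steps
POWER_LIMIT_W = 4.0                     # assumed NPU power limit

def predicted_power(freq_mhz, layer_activity):
    # Toy power model: static term plus dynamic power that grows with
    # frequency and with the layer's switching activity.
    return 0.5 + 0.004 * freq_mhz * layer_activity

def pick_frequency(layer_activity):
    """Highest allowable frequency for this layer under the power limit."""
    allowed = [f for f in FREQS_MHZ
               if predicted_power(f, layer_activity) <= POWER_LIMIT_W]
    return max(allowed) if allowed else min(FREQS_MHZ)

# A compute-heavy layer is clocked down; a light layer runs at full speed.
print(pick_frequency(1.5))  # 400
print(pick_frequency(0.5))  # 1000
```

Because only the frequency changes (the voltage stays fixed), this decision can be applied at every layer boundary without waiting for a slow voltage transition.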

A Method for Accelerating Training of a Convolutional Neural Network

  • 최세진;정준모
    • 문화기술의 융합 / Vol. 3, No. 4 / pp.171-175 / 2017
  • Recently, the structure of CNNs (convolutional neural networks) has grown more complex and networks have become deeper, increasing both the amount of computation and the time required for training. Research on accelerating neural network training with GPGPUs and FPGAs is actively under way. This paper presents a method for accelerating the computations of the feature-extraction and classification stages of a CNN using CUDA, which controls NVIDIA GPGPUs. The operations of the feature-extraction and classification stages are assigned to GPGPU blocks and threads and processed in parallel. We compared the training speed of the proposed method with that of training the CNN on a conventional CPU. After training for a total of 5 epochs on the MNIST dataset, the proposed method improved training speed by about 314% over the CPU-based method.
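The per-window dot products of the feature-extraction stage map naturally onto independent GPU threads; one common way to expose that parallelism is im2col, which unfolds the convolution into a single large matrix multiplication. The sketch below shows the data layout with illustrative sizes, not the paper's actual CUDA kernels.

```python
import numpy as np

def im2col(x, k):
    """Unfold every k x k patch of a 2-D input into one row (im2col)."""
    h, w = x.shape
    out_h, out_w = h - k + 1, w - k + 1
    cols = np.empty((out_h * out_w, k * k))
    for i in range(out_h):
        for j in range(out_w):
            cols[i * out_w + j] = x[i:i + k, j:j + k].ravel()
    return cols

rng = np.random.default_rng(1)
image = rng.standard_normal((28, 28))       # e.g. one MNIST digit
kernels = rng.standard_normal((8, 3 * 3))   # 8 filters of size 3x3

# One big matmul replaces many small per-pixel dot products; on a GPU each
# output element would be computed by an independent thread.
feature_maps = (im2col(image, 3) @ kernels.T).T.reshape(8, 26, 26)
print(feature_maps.shape)  # (8, 26, 26)
```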

On the Digital Implementation of the Sigmoid Function

  • 이호선;홍봉화
    • 정보학연구 / Vol. 4, No. 3 / pp.155-163 / 2001
  • In implementing digital neural networks, the sigmoid function is very complex and difficult to realize. This paper therefore proposes a design method for processing the sigmoid function, a known obstacle in digital neural network implementation. The proposed method uses the residue number system so that MAC (multiplier and accumulator) operations can run at high speed without carry propagation, allowing the sigmoid function to be processed at high speed as well. Simulation results show a speed of 4.6 ns or better for each neural process, so the design is expected to be applicable to high-speed digital neural network implementations.
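The carry-free MAC idea can be sketched as follows: in a residue number system, each modulus channel multiplies and accumulates independently modulo its own base, so no carry chain links the channels. The moduli and operands below are illustrative, not those of the paper's design.

```python
from math import prod

MODULI = (7, 11, 13, 15)   # pairwise coprime; dynamic range = 15015

def to_rns(x):
    """Represent an integer as a tuple of residues, one per modulus."""
    return tuple(x % m for m in MODULI)

def rns_mac(acc, a, b):
    # Channel-wise (acc + a*b) mod m: no carry propagates between channels,
    # which is what makes a hardware RNS MAC fast.
    return tuple((r + p * q) % m for r, p, q, m in zip(acc, a, b, MODULI))

def from_rns(r):
    """Chinese-remainder-theorem reconstruction back to an integer."""
    M = prod(MODULI)
    x = 0
    for ri, mi in zip(r, MODULI):
        Mi = M // mi
        x += ri * Mi * pow(Mi, -1, mi)   # modular inverse of Mi mod mi
    return x % M

acc = to_rns(0)
for a, b in [(12, 9), (5, 7), (20, 3)]:
    acc = rns_mac(acc, to_rns(a), to_rns(b))
print(from_rns(acc))  # 12*9 + 5*7 + 20*3 = 203
```

In hardware, each channel is a small independent modular multiplier-adder, so the critical path is set by the widest modulus rather than by a full-width carry chain.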


Use of High-performance Graphics Processing Units for Power System Demand Forecasting

  • He, Ting;Meng, Ke;Dong, Zhao-Yang;Oh, Yong-Taek;Xu, Yan
    • Journal of Electrical Engineering and Technology / Vol. 5, No. 3 / pp.363-370 / 2010
  • Load forecasting has always been essential to the operation and planning of power systems in deregulated electricity markets. Various methods have been proposed for load forecasting, and the neural network is one of the most widely accepted and used techniques. However, to obtain more accurate results, more information is needed as input variables, resulting in huge computational costs in the learning process. In this paper, to reduce training time in multi-layer perceptron-based short-term load forecasting, a graphics processing unit (GPU)-based computing method is introduced. The proposed approach is tested using the Korea electricity market historical demand data set. Results show that GPU-based computing greatly reduces computational costs.

The Implementation of a Back-Propagation Neural Network Using the Residue Number System

  • 홍봉화;이호선
    • 정보학연구 / Vol. 2, No. 2 / pp.145-161 / 1999
  • This paper proposes a design method for a back-propagation neural network that can operate at high speed by using the residue number system, which permits fast arithmetic without carry propagation. The designed network consists of an RNS-based MAC unit and a sigmoid-function unit based on mixed-radix conversion; the circuit was described in VHDL and synthesized with the Compass tool. Experimental results showed a delay of about 19 ns on the worst-case path, and the hardware size was reduced by about 40% compared with a conventional floating-point arithmetic unit. The designed neural network is expected to be applicable to parallel distributed processing systems that require real-time processing.


Role of Carbon Monoxide in Neurovascular Repair Processing

  • Choi, Yoon Kyung
    • Biomolecules & Therapeutics / Vol. 26, No. 2 / pp.93-100 / 2018
  • Carbon monoxide (CO) is a gaseous molecule produced from heme by heme oxygenase (HO). Endogenous CO production occurring at low concentrations is thought to have several useful biological roles. In mammals, especially humans, a proper neurovascular unit comprising endothelial cells, pericytes, astrocytes, microglia, and neurons is essential for the homeostasis and survival of the central nervous system (CNS). In addition, the regeneration of neurovascular systems from neural stem cells and endothelial precursor cells after CNS diseases is responsible for functional repair. This review focused on the possible role of CO/HO in the neurovascular unit in terms of neurogenesis, angiogenesis, and synaptic plasticity, ultimately leading to behavioral changes in CNS diseases. CO/HO may also enhance cellular networks among endothelial cells, pericytes, astrocytes, and neural stem cells. This review highlights the therapeutic effects of CO/HO on CNS diseases involved in neurogenesis, synaptic plasticity, and angiogenesis. Moreover, the cellular mechanisms and interactions by which CO/HO are exploited for disease prevention and their therapeutic applications in traumatic brain injury, Alzheimer's disease, and stroke are also discussed.

Forecasting of Erythrocyte Sedimentation Rate Using a Gated Recurrent Unit (GRU) Neural Network

  • 이재진;홍현지;송재민;염은섭
    • 한국가시화정보학회지 / Vol. 19, No. 1 / pp.57-61 / 2021
  • To determine the erythrocyte sedimentation rate (ESR), which indicates acute-phase inflammation, the Westergren method has been widely used because it is cheap and easy to implement. However, the Westergren method takes a long time, about 1 hour. In this study, a gated recurrent unit (GRU) neural network was used to reduce the measurement time of ESR evaluation. Sedimentation sequences of the erythrocytes were acquired by a camera, and data processed through image processing were used as input to the neural network models. The performance of the proposed models was evaluated based on mean absolute error. The results show that the GRU model provides more accurate predictions than the other models within 30 minutes.
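The GRU cell's gating, which lets a model summarize a sedimentation sequence frame by frame, can be sketched in plain NumPy. The weight shapes and random inputs below are placeholders, not the paper's trained model.

```python
import numpy as np

rng = np.random.default_rng(2)
n_in, n_hid = 4, 8   # illustrative input-feature and hidden sizes
Wz, Wr, Wh = (rng.standard_normal((n_hid, n_in + n_hid)) * 0.1
              for _ in range(3))

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_step(h, x):
    """One GRU time step: update gate z, reset gate r, candidate state."""
    xh = np.concatenate([x, h])
    z = sigmoid(Wz @ xh)                                # update gate
    r = sigmoid(Wr @ xh)                                # reset gate
    h_tilde = np.tanh(Wh @ np.concatenate([x, r * h]))  # candidate state
    return (1 - z) * h + z * h_tilde                    # gated blend

h = np.zeros(n_hid)
for t in range(30):                 # e.g. 30 frames of sedimentation features
    x = rng.standard_normal(n_in)   # placeholder per-frame feature vector
    h = gru_step(h, x)
print(h.shape)  # (8,)
```

The final hidden state `h` is the sequence summary; a linear readout on top of it would produce the ESR estimate.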

Design of a Digital Neuron Processor

  • 홍봉화;이호선;박화세
    • 전자공학회논문지 IE / Vol. 44, No. 3 / pp.12-22 / 2007
  • This paper proposes a high-speed digital neural network based on the residue number system (RNS) and designs the high-speed digital neuron processor that is its key arithmetic unit. The designed neuron processor consists of an RNS-based MAC unit and a sigmoid-function unit based on mixed-radix conversion; the circuit was described in VHDL and synthesized with the Compass tool. Experimental results show that the designed digital neuron processor operates at 19.2 ns and reduces hardware size by about 50% compared with a neuron processor built from floating-point arithmetic units. The designed neuron processor is expected to be applicable to parallel distributed processing systems that require real-time processing.

Cycle-accurate NPU Simulator and Performance Evaluation According to Data Access Strategies

  • 권구윤;박상우;서태원
    • 대한임베디드공학회논문지 / Vol. 17, No. 4 / pp.217-228 / 2022
  • Currently, there are increasing demands for applying deep neural networks (DNNs) in the embedded domain such as classification and object detection. The DNN processing in embedded domain often requires custom hardware such as NPU for acceleration due to the constraints in power, performance, and area. Processing DNN models requires a large amount of data, and its seamless transfer to NPU is crucial for performance. In this paper, we developed a cycle-accurate NPU simulator to evaluate diverse NPU microarchitectures. In addition, we propose a novel technique for reducing the number of memory accesses when processing convolutional layers in convolutional neural networks (CNNs) on the NPU. The main idea is to reuse data with memory interleaving, which recycles the overlapping data between previous and current input windows. Data memory interleaving makes it possible to quickly read consecutive data in unaligned locations. We implemented the proposed technique to the cycle-accurate NPU simulator and measured the performance with LeNet-5, VGGNet-16, and ResNet-50. The experiment shows up to 2.08x speedup in processing one convolutional layer, compared to the baseline.
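The overlap-reuse idea can be sketched simply: with stride 1, consecutive K-wide windows share K-1 elements, so only the stride's worth of new data must be fetched per window; interleaved memory then lets those unaligned consecutive reads proceed quickly. The sketch below counts fetches under illustrative sizes, not the simulator's actual configuration.

```python
# Window-overlap reuse sketch: count memory fetches with and without reuse.
row = list(range(16))        # one row of an input feature map (illustrative)
K, STRIDE = 3, 1             # window width and stride

window = row[:K]
fetched = K                  # initial window: K fresh fetches
windows = [list(window)]
for j in range(K, len(row), STRIDE):
    window = window[STRIDE:] + [row[j]]  # reuse overlap, fetch STRIDE new
    fetched += STRIDE
    windows.append(list(window))

naive_fetches = len(windows) * K         # re-fetching every window in full
print(fetched, naive_fetches)            # 16 42
```

With reuse, fetches grow with the input size rather than with the number of windows times the window width, which is where the reported per-layer speedup comes from.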