• Title/Summary/Keyword: pipelining

Search Result 141, Processing Time 0.027 seconds

Hardware Implementation for MLP Based Text Detection (MLP 기반의 문자 추출을 위한 하드웨어 구현)

  • Kyoung, Dong-Wuk;Jung, Kee-Chul
    • 한국HCI학회:학술대회논문집
    • /
    • 2006.02a
    • /
    • pp.766-771
    • /
    • 2006
  • 현재 많은 신경망의 하드웨어 구현은 부동 소수점 연산에 비해서 적은 면적과 빠른 수행시간을 가지는 고정소수점 연산을 많이 사용하지만, 소프트웨어에서는 일반적으로 높은 정확도를 가지는 부동소수점 연산을 사용한다. 신경망의 하드웨어 구현에서 많이 사용하는 고정소수점 연산은 부동소수점 연산에 비해서 빠른 처리속도와 적은 면적으로써 쉽게 하드웨어 구현에 용이하지만, 부동소수점 연산에 비해서 낮은 정확도와 기존의 부동소수점 연산을 사용하는 소프트웨어 신경망을 쉽게 적용할 수 없는 단점을 가진다. 본 논문에서는 부동소수점 연산을 사용하여 문자 추출 MLP의 데이터 변환 없이 적용할 수 있는 전체 파이프라이닝 설계 구조를 제안한다. 제안된 설계방법은 신경망의 전체 구조를 입력층과 은닉층을 링크 병렬화 방법과 은닉층과 출력층을 뉴런 병렬화 방법을 개선하여 쉽게 파이프라이닝 구조로 설계함으로써 신경망 처리는 은닉층 뉴런수와 동일한 주기로 처리되며, 기존의 문자추출 소프트웨어 신경망을 제안된 하드웨어 설계방법으로 구현하였을 때 11배의 빠른 성능을 나타낸다.

  • PDF

CMOS Clockless Wave Pipelined Adder Using Edge-Sensing Completion Detection (에지완료 검출을 이용한 클럭이 없는 CMOS 웨이브파이프라인 덧셈기 설계)

  • Ahn, Yong-Sung;Kang, Jin-Ku
    • Journal of IKEEE
    • /
    • v.8 no.2 s.15
    • /
    • pp.161-165
    • /
    • 2004
  • In this paper, an 8bit wave pipelined adder using the static CMOS plus Edge-Sensing Completion Detection Logic is presented. The clockless wave-pipelining algorithm was implemented in the circuit design. The Edge-Sensing Completion Detection (ESCD) in the algorithm is consisted of edge-sensing circuits and latches. Using the algorithm, skewed data at the output of 8bit adder could be aligned. Simulation results show that the adder operates at 1GHz in $0.35{\mu}m$ CMOS technology with 3.3V supply voltage.

  • PDF

Implementation of the ACELP/MPMLQ-Based Dual-Rate Voice Coder Using DSP (ACELP/MP-MLQ에 기초한 dual-rate 음성 코더의 DSP 구현)

  • Lee Jae-Sik;Son Yong-Ki;Jeon Il;Chang Tae-Gyu;Min Byoung-Ki
    • Proceedings of the Acoustical Society of Korea Conference
    • /
    • spring
    • /
    • pp.51-54
    • /
    • 2000
  • This paper describes the fixed-point DSP implementation of a CELP(code-excited linear prediction)-based speech coder. The effective realization methodologies to maximize the utilization of the DSP's architectural features, specifically Parallel movement and pipelining are also presented together with the implementation results targeted for the ITU-T standard G.723.1 using Motorola DSP56309. The operation of the implemented speech coder is verified using the test vectors offered by the standard as well as using the peripheral interface circuits designed for the coder's real-time operation.

  • PDF

High-Speed Intra Prediction VLSI Implementation for HEVC (HEVC 용 고속 인트라 예측 VLSI 구현)

  • Jo, Hyeonsu;Hong, Youpyo;Jang, Hanbeyoul
    • The Journal of Korean Institute of Communications and Information Sciences
    • /
    • v.41 no.11
    • /
    • pp.1502-1506
    • /
    • 2016
  • HEVC (High Efficiency Video Coding) is a recently proposed video compression standard that has a two times greater coding efficiency than previous video compression standards. The key factors of high compression performance and increasement of computational complexity are the various types of block partitions and modes of intra prediction in HEVC. This paper presents an intra prediction hardware architecture for HEVC utilizing pipelining and interleaving techniques to increase the efficiency and performance while reducing the requirement for hardware resources.

DSP Performance Maximization with Multisample Technique

  • Lee, Hosun;Lawrence K.W. Law;Youngyearl Han
    • Proceedings of the IEEK Conference
    • /
    • 2000.09a
    • /
    • pp.471-474
    • /
    • 2000
  • In this paper, we present multisample DSP coding technique for StarCore, SC 140 DSP. The multisample programming is a pipelining technique that exploits operand reuse both coefficients and variables within kernel. A coefficient or operand is loaded once from memory and then the value may be used by multiple ALUs. It is possible to evaluate one intermediate product from each of four output sample calculations in parallel . Therefore, parallelization has been achieved by processing multiple samples in parallel rather than multiple intermediate products belonging to only one sample. The benefits of decreasing the number of memory moves per sample is to increase the algorithm perforomance. In this paper, the multisample technique has been implemented in FIR filter calculation using Motorola StarCore DSP development tool.

  • PDF

High-Speed Hardware Architectures for ARIA with Composite Field Arithmetic and Area-Throughput Trade-Offs

  • Lee, Sang-Woo;Moon, Sang-Jae;Kim, Jeong-Nyeo
    • ETRI Journal
    • /
    • v.30 no.5
    • /
    • pp.707-717
    • /
    • 2008
  • This paper presents two types of high-speed hardware architectures for the block cipher ARIA. First, the loop architectures for feedback modes are presented. Area-throughput trade-offs are evaluated depending on the S-box implementation by using look-up tables or combinational logic which involves composite field arithmetic. The sub-pipelined architectures for non-feedback modes are also described. With loop unrolling, inner and outer round pipelining techniques, and S-box implementation using composite field arithmetic over $GF(2^4)^2$, throughputs of 16 Gbps to 43 Gbps are achievable in a 0.25 ${\mu}m$ CMOS technology. This is the first sub-pipelined architecture of ARIA for high throughput to date.

  • PDF

FPGA Design of High-Speed Motion Estimator (고속 움직임 예측기의 FPGA 설계)

  • Lim, Jeong-Hun;Seo, Young-Ho;Choi, Hyun-Jun;Kim, Dong-Wook
    • Proceedings of the Korean Society of Broadcast Engineers Conference
    • /
    • 2010.07a
    • /
    • pp.104-107
    • /
    • 2010
  • 본 논문은 H.264/AVC 디코더의 하드웨어 구현 시 가장 많은 시간을 소비하는 부분이 움직임 추정기를 하드웨어로 구현하였다. 움직임 추정을 함에 있어서 외부메모리 Access 량을 줄이고, SAD연산을 수행할 때 Clock의 손실 없이 계산을 하는 움직임 예측기를 제안한다. 제안한 구조는 재탐색 구간에서 이전 탐색 범위와 공통부분을 이루는 부분을 레지스터에 따로 저장해 두었다가, 재탐색시에 이전 Data를 사용하는 방법을 이용하였다. 움직임 추정을 수행할 때의 SAD (Sum of absolute differences)연산 부분과 Adder-tree를 묶은 PU Array와 SAD 누적기, 선택기를 Pipelining을 통하여 Clock의 손실 없이 연속적으로 계산하는 움직임 예측기를 설계하였다. 구현한 하드웨어는 최대 446.43MHz의 주파수에서 동작할 수 있었고, 탐색영역 64${\times}$64, 참조 프레임 3, 그리고 영상크기 1920${\times}$1080 기준으로 구현한 결과 50 프레임을 처리할 수 있는 성능을 보였다.

  • PDF

A New Two-Level Index Mapping Scheme for Pipelined Implementation of Multidimensional DFT (새로운 이중 색인 사상에 의한 다차원 DFT의 파이프라인 구조 개발)

  • Yu, Sung-Wook
    • The Transactions of The Korean Institute of Electrical Engineers
    • /
    • v.56 no.4
    • /
    • pp.790-794
    • /
    • 2007
  • This paper presents a new index mapping method for DFT (Discrete Fourier Transform) and its application to multidimensional DFT. Unlike conventional index mapping methods such as DIT (Decimation in Time) or DIF (Decimation in Frequency) algorithms, the proposed method is based on two levels of decomposition and it can be very efficiently used for implementing multidimensional DFT as well as 1-dimensional DFT. The proposed pipelined architecture for multidimensional DFT is very flexible so that it can lead to the best tradeoff between performance and hardware requirements. Also, it can be easily extended to higher dimensional DFTs since the number of CEs (Computational Elements) and DCs (Delay Commutators) increase only linearly with the dimension. Various implementation options based on different radices and different pipelining depths will be presented.

A Datapath Scheduling Under Resource Constraints (자원제약조건 하에서의 데이터패스 스케듈링)

  • 이근만;임인칠
    • The Journal of Korean Institute of Communications and Information Sciences
    • /
    • v.17 no.4
    • /
    • pp.424-432
    • /
    • 1992
  • This paper deals with the scheduling ploblems, which are the most important subtasks in High-level syntheses. IP(integer programming) formulations is used as the scheduling problem approach. This paper describes a new resource-constraints scheduling algorithm. We have concentrated our attentions on the multicycle operations and the structural pipelining, and we fully analyze the characteristics of operators to achieve the maximal performance and the maximal resource sharing. For experiment results, we choose the 5-th order digital wave filter as a benchmark and do the schedule, Finally, we can obtain near-optimal scheduling results.

  • PDF

A Low-Noise and Small-Size DC Reference Circuit for High Speed CMOS A/D Converters

  • Hwang, Sang-Hoon;Song, Min-Kyu
    • JSTS:Journal of Semiconductor Technology and Science
    • /
    • v.7 no.1
    • /
    • pp.43-50
    • /
    • 2007
  • In a high-speed flash style or a pipelining style analog-to-digital converter (A/D converter), the DC reference fluctuation caused by external noises becomes serious, as the sampling frequency is increased. To reduce the fluctuations in conventional A/D converters, capacitors have been simply used, but the layout area was large. Instead of capacitors, a low-noise and small-size DC reference circuit based on transmission gate (TG) is proposed in this paper. In order to verify the proposed technique, we designed and manufactured a 6-bit 2GSPS CMOS A/D converter. The A/D converter is designed with a 0.18um 1-poly 6-metal n-well CMOS technology, and it consumes 145mW at 1.8V power supply. It occupies the chip area of 977um by 1040um. The measured result shows that SNDR is 36.25 dB and INL/DNL is within 0.5LSB, even though the DC reference fluctuation is serious.