• Title/Summary/Keyword: pipelined memory

Search Result 79, Processing Time 0.026 seconds

Development of a Flash ADC with an Analog Memory (아날로그메모리를 이용한 플레쉬 ADC)

  • Chai, Yong-Yoong
    • The Journal of the Korea institute of electronic communication sciences
    • /
    • v.6 no.4
    • /
    • pp.545-552
    • /
    • 2011
  • In this article, reference voltages in a general flash ADC are not obtained from a series of resistors but floating gates. When a behavior model simulation was performed in a pipelined ADC including the suggested flash ADC as a result of an ADC's overall function, it showed results that SNR is approximately 77 dB and resolution is 12 bit. And more than almost 90% showed INL within ${\pm}0.5$ LSB, and like INL, more than 90% showed DNL within ${\pm}0.5$ LSB.

A study on Chip Design for Hageul Type Classification using Content Addressable Memory (메모리(CAM)를 이용한 한글 유형 분리용 칩 설계에 관한 연구)

  • Park, Noh-Kyung;Koo, Chang-Mo;Jeong, Chang-Won
    • The Journal of the Acoustical Society of Korea
    • /
    • v.15 no.6
    • /
    • pp.16-25
    • /
    • 1996
  • In this paper, we designed the chip which can classify the Korean characters using CAM(Content Addressable Memory). A high-speed OCR has been implemented by software to recognize the characters. However, it is difficult to process in real-time. The pipelined hardware implementation is one of the solution to recognize the characters in real-time by using the parallel processing techniques. We used the CAM which has the function of high-speed parallel-match to implement easily and twenty reference patterns are used for comparison. The chip has been evaluated result using DLAB of DAZIX. The simulation results have shown that the process speed was $1.6{\mu}s$ per character. Also, we programed using C-language and compared the results.

  • PDF

Low-latency SAO Architecture and its SIMD Optimization for HEVC Decoder

  • Kim, Yong-Hwan;Kim, Dong-Hyeok;Yi, Joo-Young;Kim, Je-Woo
    • IEIE Transactions on Smart Processing and Computing
    • /
    • v.3 no.1
    • /
    • pp.1-9
    • /
    • 2014
  • This paper proposes a low-latency Sample Adaptive Offset filter (SAO) architecture and its Single Instruction Multiple Data (SIMD) optimization scheme to achieve fast High Efficiency Video Coding (HEVC) decoding in a multi-core environment. According to the HEVC standard and its Test Model (HM), SAO operation is performed only at the picture level. Most realtime decoders, however, execute their sub-modules on a Coding Tree Unit (CTU) basis to reduce the latency and memory bandwidth. The proposed low-latency SAO architecture has the following advantages over picture-based SAO: 1) significantly less memory requirements, and 2) low-latency property enabling efficient pipelined multi-core decoding. In addition, SIMD optimization of SAO filtering can reduce the SAO filtering time significantly. The simulation results showed that the proposed low-latency SAO architecture with significantly less memory usage, produces a similar decoding time as a picture-based SAO in single-core decoding. Furthermore, the SIMD optimization scheme reduces the SAO filtering time by approximately 509% and increases the total decoding speed by approximately 7% compared to the existing look-up table approach of HM.

Design of an Area-Efficient Survivor Path Unit for Viterbi Decoder Supporting Punctured Codes (천공 부호를 지원하는 Viterbi 복호기의 면적 효율적인 생존자 경로 계산기 설계)

  • Kim, Sik;Hwang, Sun-Young
    • The Journal of Korean Institute of Communications and Information Sciences
    • /
    • v.29 no.3A
    • /
    • pp.337-346
    • /
    • 2004
  • Punctured convolutional codes increase transmission efficiency without increasing hardware complexity. However, Viterbi decoder supporting punctured codes requires long decoding length and large survivor memory to achieve sifficiently low bit error rate (BER), when compared to the Viterbi decoder for a rate 1/2 convolutional code. This Paper presents novel architecture adopting a pipelined trace-forward unit reducing survivor memory requirements in the Viterbi decoder. The proposed survivor path architecture reduces the memory requirements by removing the initial decoding delay needed to perform trace-back operation and by accelerating the trace-forward process to identify the survivor path in the Viterbi decoder. Experimental results show that the area of survivor path unit has been reduced by 16% compared to that of conventional hybrid survivor path unit.

Design and Implementation of An I/O System for Irregular Application under Parallel System Environments (병렬 시스템 환경하에서 비정형 응용 프로그램을 위한 입출력 시스템의 설계 및 구현)

  • No, Jae-Chun;Park, Seong-Sun;;Gwon, O-Yeong
    • Journal of KIISE:Computer Systems and Theory
    • /
    • v.26 no.11
    • /
    • pp.1318-1332
    • /
    • 1999
  • 본 논문에서는 입출력 응용을 위해 collective I/O 기법을 기반으로 한 실행시간 시스템의 설계, 구현 그리고 그 성능평가를 기술한다. 여기서는 모든 프로세서가 동시에 I/O 요구에 따라 스케쥴링하며 I/O를 수행하는 collective I/O 방안과 프로세서들이 여러 그룹으로 묶이어, 다음 그룹이 데이터를 재배열하는 통신을 수행하는 동안 오직 한 그룹만이 동시에 I/O를 수행하는 pipelined collective I/O 등의 두 가지 설계방안을 살펴본다. Pipelined collective I/O의 전체 과정은 I/O 노드 충돌을 동적으로 줄이기 위해 파이프라인된다. 이상의 설계 부분에서는 동적으로 충돌 관리를 위한 지원을 제공한다. 본 논문에서는 다른 노드의 메모리 영역에 이미 존재하는 데이터를 재 사용하여 I/O 비용을 줄이기 위해 collective I/O 방안에서의 소프트웨어 캐슁 방안과 두 가지 모형에서의 chunking과 온라인 압축방안을 기술한다. 그리고 이상에서 기술한 방안들이 입출력을 위해 높은 성능을 보임을 기술하는데, 이 성능결과는 Intel Paragon과 ASCI/Red teraflops 기계 상에서 실험한 것이다. 그 결과 응용 레벨에서의 bandwidth는 peak point가 55%까지 측정되었다.Abstract In this paper we present the design, implementation and evaluation of a runtime system based on collective I/O techniques for irregular applications. We present two designs, namely, "Collective I/O" and "Pipelined Collective I/O". In the first scheme, all processors participate in the I/O simultaneously, making scheduling of I/O requests simpler but creating a possibility of contention at the I/O nodes. In the second approach, processors are grouped into several groups, so that only one group performs I/O simultaneously, while the next group performs communication to rearrange data, and this entire process is pipelined to reduce I/O node contention dynamically. In other words, the design provides support for dynamic contention management. Then we present a software caching method using collective I/O to reduce I/O cost by reusing data already present in the memory of other nodes. Finally, chunking and on-line compression mechanisms are included in both models. We demonstrate that we can obtain significantly high-performance for I/O above what has been possible so far. The performance results are presented on an Intel Paragon and on the ASCI/Red teraflops machine. Application level I/O bandwidth up to 55% of the peak is observed.he peak is observed.

Design of Pipeline Bus and the Performance Evaluation in Multiprocessor System (다중프로세서 시스템에서 파이프라인 전송 버스의 설계 및 성능 평가)

  • 윤용호;임인칠
    • The Journal of Korean Institute of Communications and Information Sciences
    • /
    • v.18 no.2
    • /
    • pp.288-299
    • /
    • 1993
  • This paper proposes the new bus protocol in the tightly coupled multiprocessor system. The bus protocol uses the pipelined data transfer and block transfer scheme to increase the bus bandwidth, The bus also has the independent transfer lines for the address and data respectively, and it can transfer the data up to maximum 264 Mbytes /sec. This paper also models the multiprocessor system where each processor boards have the private cache. Simulation evaluates the bus and system performance according to hit ratio of the reference data in cache memory, In the case of using this bus, the bus is evaluated not to be saturated when up to 10 processor boards are connected to the bus. As for up to 4 memory interleavng, the performance increases linearly.

  • PDF

Design and Implementation of Asynchronous Memory for Pipelined Bus (파이프라인 방식의 버스를 위한 비 동기식 주 기억장치의 설계 및 구현)

  • Hahn, Woo-Jong;Kim, Soo-Won
    • Journal of the Korean Institute of Telematics and Electronics B
    • /
    • v.31B no.11
    • /
    • pp.45-52
    • /
    • 1994
  • In recent days low cost, high performance microprocessors have led to construction of medium scale shared memory multiprocessor systems with shared bus. Such multiprocessor systems are heavily influenced by the structures of memory systems and memory systems become more important factor in design space as microprocessors are getting faster. Even though local cache memories are very common for such systems, the latency on access to the shared memory limits throughput and scalability. There have been many researches on the memory structure for multiprocessor systems. In this paper, an asynchronous memory architecture is proposed to utilize the bandwith of system bus effectively as well as to provide flexibility of implementation. The effect of the proposed architecture if shown by simulation. We choose, as our model of the shared bus is HiPi+Bus which is designed by ETRI to meet the requirements of the High-Speed Midrange Computer System. The simulation is done by using Verilog hardware decription language. With this simulation, it is explored that the proposed asynchronous memory architecture keeps the utilization of system bus low enough to provide better throughput and scalibility. The implementation trade-offs are also described in this paper. The asynchronous memory is implemented and tested under the prototype testing environment by using test program. This intensive test has validated the operation of the proposed architecture.

  • PDF

The VLSI implementation of RS Decoder using the Modified Euclidean Algorithm (변형 유클리디안 알고리즘을 이용한 리드 - 솔로몬 디코더의 VLSI 구현)

  • 최광석;김수원
    • Proceedings of the IEEK Conference
    • /
    • 1998.10a
    • /
    • pp.679-682
    • /
    • 1998
  • This paper presents the VLSI implementation of RS(reed-solomon) decoder using the Modified Euclidean Algorithm(hereafter MEA) for DVD(Digital Versatile Disc) and CD(Compact Disc). The decoder has a capability of correcting 8-error or 16-erasure for DVD and 2-error or 4-erasure for CD. The technique of polynomial evaluation is introduced to realize syndrome calculation and a polynomial expansion circuit is developed to calculate the Forney syndrome polynomial and the erasure locator polynomial. Due to the property of our system with buffer memory, the MEA architecture can have a recursive structure which the number of basic operating cells can be reduced to one. We also proposed five criteria to determine an uncorrectable codeword in using the MEA. The overall architecture is a simple and regular and has a 4-stage pipelined structure.

  • PDF

A VLSI implementation of 32-bit RISC embedded controller (내장형 32비트 RISC 콘트롤러의 VLSI 구현)

  • 이문기;최병윤;이승호
    • Journal of the Korean Institute of Telematics and Electronics A
    • /
    • v.31A no.10
    • /
    • pp.141-151
    • /
    • 1994
  • this paper describes the design and implementation of a RISC processor for embedded control systems. This RISC processor integrates a register file, a pipelined execution unit, a FPU interface, a memory interface, and an instruction prefetcher. Its characteristics include both single cycle executions of most instructions in a 2 phase 20 MHz frequency and the worst case interrupt latency of 7 cycles with the vectored interrupt handling that makes it possible to be applicable to the real time processing system. For efficient handling of multi-cycle instructions, data stationary hardwired control scheme equippedwith cycle counter was used. This chip integrates about 139K transistors and occupies 9.1mm$\times$9.1mm in a 1.0um DLM CMOS technology. The power dissipation is 0.8 Watts from a 5V supply at 20 MHz operation.

  • PDF

A Design of 8192-point FFT Processor using a new CBFP Scaling Method (새로운 CBFP 스케일링 방법을 적용한 8192점 FFT프로세서 설계)

  • 이승기;양대성;박광호;신경욱
    • Proceedings of the IEEK Conference
    • /
    • 2002.06b
    • /
    • pp.113-116
    • /
    • 2002
  • This paper describes a design of 8192-Point pipelined FFT/IFFT processor (PFFTSk) core for DVB-T and DMT-based VBSL modems. A novel two-step convergent block floating -point (75_CBFP) scaling method is proposed to improve the signal- to-quantization-noise ratio (SeNR) of FFT/IFFT results. Our approach reduces about 80% of memory when compared with conventional CBFP methods. The PFFTSk core, which is designed in VHDL and synthesized using 0.25-${\mu}{\textrm}{m}$ CMOS library, has about 76,300 gates, 390k bits RAM, and Twiddle factor ROM of 39k bits. Simulation results show that it can safely operate up to 50-MHz clock frequency at 2.5-V supply, resulting that a 8192-point FFT/IFFT can be computed every 164-$mutextrm{s}$. The SQNR of about 60-dB is achieved.

  • PDF