• Title/Summary/Keyword: memory interface

Search Result 512, Processing Time 0.03 seconds

A Link Layer Design for DisplayPort Interface

  • Jin, Hyun-Bae;Yoon, Kwang-Hee;Kim, Tae-Ho;Jang, Ji-Hoon;Song, Byung-Cheol;Kang, Jin-Ku
    • Journal of IKEEE
    • /
    • v.14 no.4
    • /
    • pp.297-304
    • /
    • 2010
  • This paper presents a link layer design of DisplayPort interface with a state machine based on packet processing. The DisplayPort link layer provides isochronous video/audio transport service, link service, and device service. The merged video, audio main link, and AUX channel controller are implemented with 7,648 LUTs(Loop Up Tables), 6020 register, and 821,760 of block memory bits synthesized using a FPGA board and it operates at 203.32MHz.

FPGA Implementation of Real-time 2-D Wavelet Image Compressor (실시간 2차원 웨이블릿 영상압축기의 FPGA 구현)

  • 서영호;김왕현;김종현;김동욱
    • The Journal of Korean Institute of Communications and Information Sciences
    • /
    • v.27 no.7A
    • /
    • pp.683-694
    • /
    • 2002
  • In this paper, a digital image compression codec using 2D DWT(Discrete Wavelet Transform) is designed using the FPGA technology for real time operation The implemented image compression codec using wavelet decomposition consists of a wavelet kernel part for wavelet filtering process, a quantizer/huffman coder for quantization and huffman encoding of wavelet coefficients, a memory controller for interface with external memories, a input interface to process image pixels from A/D converter, a output interface for reconstructing huffman codes, which has irregular bit size, into 32-bit data having regular size data, a memory-kernel buffer to arrage data for real time process, a PCI interface part, and some modules for setting timing between each modules. Since the memory mapping method which converts read process of column-direction into read process of the row-direction is used, the read process in the vertical-direction wavelet decomposition is very efficiently processed. Global operation of wavelet codec is synchronized with the field signal of A/D converter. The global hardware process pipeline operation as the unit of field and each field and each field operation is classified as decomposition levels of wavelet transform. The implemented hardware used FPGA hardware resource of 11119(45%) LAB and 28352(9%) ESB in FPGA device of APEX20KC EP20k600CB652-7 and mapped into one FPGA without additional external logic. Also it can process 33 frames(66 fields) per second, so real-time image compression is possible.

0.11-2.5 GHz All-digital DLL for Mobile Memory Interface with Phase Sampling Window Adaptation to Reduce Jitter Accumulation

  • Chae, Joo-Hyung;Kim, Mino;Hong, Gi-Moon;Park, Jihwan;Ko, Hyeongjun;Shin, Woo-Yeol;Chi, Hankyu;Jeong, Deog-Kyoon;Kim, Suhwan
    • JSTS:Journal of Semiconductor Technology and Science
    • /
    • v.17 no.3
    • /
    • pp.411-424
    • /
    • 2017
  • An all-digital delay-locked loop (DLL) for a mobile memory interface, which runs at 0.11-2.5 GHz with a phase-shift capability of $180^{\circ}$, has two internal DLLs: a global DLL which uses a time-to-digital converter to assist fast locking, and shuts down after locking to save power; and a local DLL which uses a phase detector with an adaptive phase sampling window (WPD) to reduce jitter accumulation. The WPD in the local DLL adjusts the width of its sampling window adaptively to control the loop bandwidth, thus reducing jitter induced by UP/DN dithering, input clock jitter, and supply/ground noise. Implemented in a 65 nm CMOS process, the DLL operates over 0.11-2.5 GHz. It locks within 6 clock cycles at 0.11 GHz, and within 17 clock cycles at 2.5 GHz. At 2.5 GHz, the integrated jitter is $954fs_{rms}$, and the long-term jitter is $2.33ps_{rms}/23.10ps_{pp}$. The ratio of the RMS jitter at the output to that at the input is about 1.17 at 2.5 GHz, when the sampling window of the WPD is being adjusted adaptively. The DLL consumes 1.77 mW/GHz and occupies $0.075mm^2$.

Performance Analysis of NVMe SSDs and Design of Direct Access Engine on Virtualized Environment (가상화 환경에서 NVMe SSD 성능 분석 및 직접 접근 엔진 개발)

  • Kim, Sewoog;Choi, Jongmoo
    • KIISE Transactions on Computing Practices
    • /
    • v.24 no.3
    • /
    • pp.129-137
    • /
    • 2018
  • NVMe(Non-Volatile Memory Express) SSD(Solid State Drive) is a high-performance storage that makes use of flash memory as a storage cell, PCIe as an interface and NVMe as a protocol on the interface. It supports multiple I/O queues which makes it feasible to process parallel-I/Os on multi-core environments and to provide higher bandwidth than SATA SSDs. Hence, NVMe SSD is considered as a next generation-storage for data-center and cloud computing system. However, in the virtualization system, the performance of NVMe SSD is not fully utilized due to the bottleneck of the software I/O stack. Especially, when it uses I/O stack of the hypervisor or the host operating system like Xen and KVM, I/O performance degrades seriously due to doubled-I/O stack between host and virtual machine. In this paper, we propose a new I/O engine, called Direct-AIO (Direct-Asynchronous I/O) engine, that can access NVMe SSD directly for I/O performance improvements on QEMU emulator. We develop our proposed I/O engine and analyze I/O performance differences between the existed I/O engine and Direct-AIO engine.

Decrease of Interface Trap Density of Deposited Tunneling Layer Using CO2 Gas and Characteristics of Non-volatile Memory for Low Power Consumption (CO2가스를 이용하여 증착된 터널층의 계면포획밀도의 감소와 이를 적용한 저전력비휘발성 메모리 특성)

  • Lee, Sojin;Jang, Kyungsoo;Nguyen, Cam Phu Thi;Kim, Taeyong;Yi, Junsin
    • Journal of the Korean Institute of Electrical and Electronic Material Engineers
    • /
    • v.29 no.7
    • /
    • pp.394-399
    • /
    • 2016
  • The silicon dioxide ($SiO_2$) was deposited using various gas as oxygen and nitrous oxide ($N_2O$) in nowadays. In order to improve electrical characteristics and the interface state density ($D_{it}$) in low temperature, It was deposited with carbon dioxide ($CO_2$) and silane ($SiH_4$) gas by inductively coupled plasma chemical vapor deposition (ICP-CVD). Each $D_{it}$ of $SiO_2$ using $CO_2$ and $N_2O$ gas was $1.30{\times}10^{10}cm^{-2}{\cdot}eV^{-1}$ and $3.31{\times}10^{10}cm^{-2}{\cdot}eV^{-1}$. It showed $SiO_2$ using $CO_2$ gas was about 2.55 times better than $N_2O$ gas. After 10 years when the thin film was applied to metal/insulator/semiconductor(MIS)-nonvolatile memory(NVM), MIS NVM using $SiO_2$($CO_2$) on tunneling layer had window memory of 2.16 V with 60% retention at bias voltage from +16 V to -19 V. However, MIS NVM applied $SiO_2$($N_2O$) to tunneling layer had 2.48 V with 61% retention at bias voltage from +20 V to -24 V. The results show $SiO_2$ using $CO_2$ decrease the $D_{it}$ and it improves the operating voltage.

Design of a scalable general-purpose parallel associative processor using content-addressable memory (Content-Addressable Memory를 이용한 확장 가능한 범용 병렬 Associative Processor 설계)

  • Park, Tae-Geun
    • Journal of the Institute of Electronics Engineers of Korea SD
    • /
    • v.43 no.2 s.344
    • /
    • pp.51-59
    • /
    • 2006
  • Von Neumann architecture suffers from the interface between the central processing unit and the memory, which is called 'Von Neumann bottleneck' In this paper, we propose a scalable general-purpose associative processor (AP) based on content-addressable memory (CAM) which solves this problem and is suitable for the search-oriented applications. We propose an efficient instruction set and a structural scalability to extend for larger applications. We define twelve instructions and provide some reduced instructions to speed up which execute two instructions in a single instruction cycle. The proposed AP performs in a bit-serial, word-parallel fashion and can be considered as a 32-bit general-purpose parallel processor with a massively parallel SIMD structure. We design and simulate a maximum/minumum search greater-than/less-than search, and parallel addition to verify the proposed architecture. The algorithms are executed in a constant time O(k) regardless of the number of input data.

Design of a DMA Controller for Augmented Reality in Embedded System (증강현실을 위한 임베디드 시스템의 DMA 컨트롤러 설계)

  • Jang, Su Yeon;Oh, Jung Hwan;Yoon, Young Hyun;Lee, Seong Mo;Lee, Seung Eun
    • Journal of the Korea Institute of Information and Communication Engineering
    • /
    • v.23 no.7
    • /
    • pp.822-828
    • /
    • 2019
  • An Augmented Reality(AR) provides virtual information with a real environment, and the processor needs to access the memory for the AR system. However, the processor has the heavy workload as the technology improvement leads to increase the size of data. We need a specific module to reduce the workload to overcome the limitation. In this paper, we propose a Direct Memory Access(DMA) controller displaying image instead of the processor. We implemented the proposed DMA controller on a Field Programmable Gate Array(FPGA) and demonstrated the functionality of the DMA controller based on an Avalon Memory Mapped(Avalon-MM) interface. Also, the DMA controller is fabricated by using Magnachip/Hynix 0.35um CMOS technology and verified the feasibility of the embedded system.

The Design of Hardware MPI Units for MPSoC (MPSoC를 위한 저비용 하드웨어 MPI 유닛 설계)

  • Jeong, Ha-Young;Chung, Won-Young;Lee, Yong-Surk
    • The Journal of Korean Institute of Communications and Information Sciences
    • /
    • v.36 no.1B
    • /
    • pp.86-92
    • /
    • 2011
  • In this paper, we propose a novel hardware MPI(Message Passing Interface) unit which supports message passing in multiprocessor system which use distributed memory architecture. MPI Hardware unit processes data synchronization, transmission and completion, and it supports processor non-blocking operation so it reduces overhead according to synchronization. Additionally, MPI hardware unit combines ready entry, request entry, reserve entry which save and manage the synchronized messages and performs the multiple outstanding issue and out of order completion. According to BFM(Bus Functional Model) simulation result, the performance is increased by 25% on many to many communication. After we designed MPI unit using HDL, with synopsys design compiler we synthesized, and for synthesis library we used MagnaChip $0.18{\mu}m$. And then we making prototype chip. The proposed message transmission interface hardware shows high performance for its increase in size. Thus, as we consider low-cost design and scalability, MPI hardware unit is useful in increasing overall performance of embedded MPSoC(Multi-Processor System-on-Chip).

Implementation of Adaptive Multi Rate (AMR) Vocoder for the Asynchronous IMT-2000 Mobile ASIC (IMT-2000 비동기식 단말기용 ASIC을 위한 적응형 다중 비트율 (AMR) 보코더의 구현)

  • 변경진;최민석;한민수;김경수
    • The Journal of the Acoustical Society of Korea
    • /
    • v.20 no.1
    • /
    • pp.56-61
    • /
    • 2001
  • This paper presents the real-time implementation of an AMR (Adaptive Multi Rate) vocoder which is included in the asynchronous International Mobile Telecommunication (IMT)-2000 mobile ASIC. The implemented AMR vocoder is a multi-rate coder with 8 modes operating at bit rates from 12.2kbps down to 4.75kbps. Not only the encoder and the decoder as basic functions of the vocoder are implemented, but VAD (Voice Activity Detection), SCR (Source Controlled Rate) operation and frame structuring blocks for the system interface are also implemented in this vocoder. The DSP for AMR vocoder implementation is a 16bit fixed-point DSP which is based on the TeakLite core and consists of memory block, serial interface block, register files for the parallel interface with CPU, and interrupt control logic. Through the implementation, we reduce the maximum operating complexity to 24MIPS by efficiently managing the memory structure. The AMR vocoder is verified throughout all the test vectors provided by 3GPP, and stable operation in the real-time testing board is also proved.

  • PDF

A VIA-based RDMA Mechanism for High Performance PC Cluster Systems (고성능 PC 클러스터 시스템을 위한 VIA 기반 RDMA 메커니즘 구현)

  • Jung In-Hyung;Chung Sang-Hwa;Park Sejin
    • Journal of KIISE:Computer Systems and Theory
    • /
    • v.31 no.11
    • /
    • pp.635-642
    • /
    • 2004
  • The traditional communication protocols such as TCP/IP are not suitable for PC cluster systems because of their high software processing overhead. To eliminate this overhead, industry leaders have defined the Virtual Interface Architecture (VIA). VIA provides two different data transfer mechanisms, a traditional Send/Receive model and the Remote Direct Memory Access (RDMA) model. RDMA is extremely efficient way to reduce software overhead because it can bypass the OS and use the network interface controller (NIC) directly for communication, also bypass the CPU on the remote host. In this paper, we have implemented VIA-based RDMA mechanism in hardware. Compared to the traditional Send/Receive model, the RDMA mechanism improves latency and bandwidth. Our RDMA mechanism can also communicate without using remote CPU cycles. Our experimental results show a minimum latency of 12.5${\mu}\textrm{s}$ and a maximum bandwidth of 95.5MB/s. As a result, our RDMA mechanism allows PC cluster systems to have a high performance communication method.