• Title/Summary/Keyword: 라이브러리 2.0

Search Result 238, Processing Time 0.026 seconds

Implementation of a Parallel Viterbi Decoder for High Speed Multimedia Communications (멀티미디어 통신용 병렬 아키텍쳐 고속 비터비 복호기 설계)

  • Lee, Byeong-Cheol
    • Journal of the Institute of Electronics Engineers of Korea SD
    • /
    • v.37 no.2
    • /
    • pp.78-84
    • /
    • 2000
  • The Viterbi decoders can be classified into serial Viterbi decoders and parallel Viterbi decoders. Parallel Viterbi decoders can handle higher data rates than serial Viterbl decoders. This paper designs and implements a fully parallel Viterbi decoder for high speed multimedia communications. For high speed operations, the ACS (Add-Compare-Select) module consisting of 64 PEs (Processing Elements) can compute one stage in a clock. In addition, the systolic away structure with 32 pipeline stages is developed for the TB (traceback) module. The implemented Viterbi decoder can support code rates 1/2, 2/3, 3/4, 5/6 and 7/8 using punctured codes. We have developed Verilog HDL models and performed logic synthesis. The 0.6 ${\mu}{\textrm}{m}$ SAMSUNG KG75000 SOG cell library has been used. The implemented Viterbi decoder has about 100,400 gates, and is running at 70 MHz in the worst case simulation.

  • PDF

Efficient systolic VLSI architecture for division in $GF(2^m)$ ($GF(2^m)$ 상에서의 나눗셈연산을 위한 효율적인 시스톨릭 VLSI 구조)

  • Kim, Ju-Young;Park, Tae-Geun
    • Journal of the Institute of Electronics Engineers of Korea SD
    • /
    • v.44 no.3 s.357
    • /
    • pp.35-42
    • /
    • 2007
  • The finite-field division can be applied to the elliptic curve cryptosystems. However, an efficient algorithm and the hardware design are required since the finite-field division takes much time to compute. In this paper, we propose a radix-4 systolic divider on $GF(2^m)$ with comparative area and performance. The algorithm of the proposed divide, is mathematically developed and new counter structure is proposed to map on low-cost systolic cells, so that the proposed systolic architecture is suitable for YLSI design. Compared to the bit-parallel, bit-serial and digit-serial dividers, the proposed divider has relatively effective high performance and low cost. We design and synthesis $GF(2^{193})$ finite-field divider using Dongbuanam $0.18{\mu}m$ standard cell library and the maximum clock frequency is 400MHz.

A design of LDPC decoder supporting multiple block lengths and code rates of IEEE 802.11n (다중 블록길이와 부호율을 지원하는 IEEE 802.11n용 LDPC 복호기 설계)

  • Kim, Eun-Suk;Park, Hae-Won;Na, Young-Heon;Shin, Kyung-Wook
    • Proceedings of the Korean Institute of Information and Commucation Sciences Conference
    • /
    • 2011.05a
    • /
    • pp.132-135
    • /
    • 2011
  • This paper describes a multi-mode LDPC decoder which supports three block lengths(648, 1296, 1944) and four code rates(1/2, 2/3, 3/4, 5/6) of IEEE 802.11n WLAN standard. To minimize hardware complexity, it adopts a block-serial (partially parallel) architecture based on the layered decoding scheme. A novel memory reduction technique devised using the min-sum decoding algorithm reduces the size of check-node memory by 47% as compared to conventional method. The designed LDPC decoder is verified by FPGA implementation, and synthesized with a $0.18-{\mu}m$ CMOS cell library. It has 219,100 gates and 45,036 bits RAM, and the estimated throughput is about 164~212 Mbps at 50 MHz@2.5v.

  • PDF

Social Network Approach for Sharing Knowledge: How Can the Structure and Characteristics of Social Networks Support for Sharing Knowledge? (지식 공유에 대한 소셜 네트워크 접근법 : 어떻게 소셜 네트워크의 구조와 특징이 지식 공유를 지원하는가?)

  • Lee, Jeong-Soo
    • Journal of the Korean Society for information Management
    • /
    • v.27 no.2
    • /
    • pp.61-74
    • /
    • 2010
  • The knowledge sharing in a knowledge management process is much affecting generation and distribution of knowledge. Especially, the knowledge distribution is being revitalized with the center of social media service like twitter and library service 2.0 in the knowledge-based IT (Information Technology) environment. The present research analyzed the structure and characteristics of a social network inside an organization that is growing like an organism through self-organization through tools for SNA (Social Network Analysis) and multiple regression analysis of independent variables such as 1) a relationship between social network's structure and knowledge sharing, 2) a relationship between structural holes and knowledge sharing influence of centrality, 3) a relationship between individual ability and knowledge sharing of information technology and work recognition.

VLSI Design of DWT-based Image Processor for Real-Time Image Compression and Reconstruction System (실시간 영상압축과 복원시스템을 위한 DWT기반의 영상처리 프로세서의 VLSI 설계)

  • Seo, Young-Ho;Kim, Dong-Wook
    • The Journal of Korean Institute of Communications and Information Sciences
    • /
    • v.29 no.1C
    • /
    • pp.102-110
    • /
    • 2004
  • In this paper, we propose a VLSI structure of real-time image compression and reconstruction processor using 2-D discrete wavelet transform and implement into a hardware which use minimal hardware resource using ASIC library. In the implemented hardware, Data path part consists of the DWT kernel for the wavelet transform and inverse transform, quantizer/dequantizer, the huffman encoder/huffman decoder, the adder/buffer for the inverse wavelet transform, and the interface modules for input/output. Control part consists of the programming register, the controller which decodes the instructions and generates the control signals, and the status register for indicating the internal state into the external of circuit. According to the programming condition, the designed circuit has the various selective output formats which are wavelet coefficient, quantization coefficient or index, and Huffman code in image compression mode, and Huffman decoding result, reconstructed quantization coefficient, and reconstructed wavelet coefficient in image reconstructed mode. The programming register has 16 stages and one instruction can be used for a horizontal(or vertical) filtering in a level. Since each register automatically operated in the right order, 4-level discrete wavelet transform can be executed by a programming. We synthesized the designed circuit with synthesis library of Hynix 0.35um CMOS fabrication using the synthesis tool, Synopsys and extracted the gate-level netlist. From the netlist, timing information was extracted using Vela tool. We executed the timing simulation with the extracted netlist and timing information using NC-Verilog tool. Also PNR and layout process was executed using Apollo tool. The Implemented hardware has about 50,000 gate sizes and stably operates in 80MHz clock frequency.

Design and Implementation of Baseband Modem Receiver for MIMO-OFDM Based WLANs (MIMO-OFDM 기반 무선 LAN 시스템을 위한 기저대역 모뎀 수신부 설계 및 구현)

  • Jang, Soo-Hyun;Roh, Jae-Young;Jung, Yun-Ho
    • Journal of Advanced Navigation Technology
    • /
    • v.14 no.3
    • /
    • pp.328-335
    • /
    • 2010
  • In this paper, an efficient algorithm and area-efficient hardware architecture have been proposed for $2{\times}2$ MIMO-OFDM based WLAN baseband modem with two transmit and two receive antennas. To enhance the performance of the receiver, the efficient timing synchronization algorithm and symbol detector based on MML algorithm are presented. Also, by sharing the hardware block with multi-stage pipeline structure and using the complex multiplier based on polar-coordinate, the complexity of the proposed architecture is dramatically decreased. The proposed area-efficient hardware design was designed in hardware description language (HDL) and synthesized to gate-level circuits using 0.13um CMOS standard cell library. As a result, the complexity of the proposed modem receiver is reduced by 56% over the conventional architecture.

Design of Efficient FFT Processor for IEEE 802.16e Mobile WiMax Systems (IEEE 802.16e Mobile WiMax 시스템을 위한 효율적인 FFT 프로세서 설계)

  • Park, Youn-Ok;Park, Jong-Won
    • The Journal of the Institute of Internet, Broadcasting and Communication
    • /
    • v.10 no.2
    • /
    • pp.97-102
    • /
    • 2010
  • In this paper, an area-efficient FFT processor is proposed for IEEE 802.16e mobile WiMax systems. The proposed scalable FFT processor can support the variable length of 128, 512, 1024 and 2048. By reducing the required number of non-trivial multipliers with mixed-radix (MR) and multi-path delay commutator (MDC) architecture, the complexity of the proposed FFT processor is dramatically decreased without sacrificing system throughput. The proposed FFT processor was designed in hardware description language (HDL) and synthesized to gate-level circuits using 0.18um CMOS standard cell library. With the proposed architecture, the gate count for the processor is 46K and the size of memory is 64Kbits, which are reduced by 16% and 27%, respectively, compared with those of the 4-channel radix-2 MDC (R2MDC) FFT processor.

A Benchmark of Hardware Acceleration Technology for Real-time Simulation in Smart Farm (CUDA vs OpenCL) (스마트 시설환경 실시간 시뮬레이션을 위한 하드웨어 가속 기술 분석)

  • Min, Jae-Ki;Lee, DongHoon
    • Proceedings of the Korean Society for Agricultural Machinery Conference
    • /
    • 2017.04a
    • /
    • pp.160-160
    • /
    • 2017
  • 자동화 기술을 통한 한국형 스마트팜의 발전이 비약적으로 이루어지고 있는 가운데 무인화를 위한 지능적인 스마트 시설환경 관찰 및 분석에 대한 요구가 점점 증가 하고 있다. 스마트 시설환경에서 취득 가능한 시계열 데이터는 온도, 습도, 조도, CO2, 토양 수분, 환기량 등 다양하다. 시스템의 경계가 명확함에도 해당 속성의 특성상 타임도메인과 공간도메인 상에서 정확한 추정 또는 예측이 난해하다. 시설 환경에 접목이 증가하고 있는 지능형 관리 기술 구현을 위해선 시계열 공간 데이터에 대한 신속하고 정확한 정량화 기술이 필수적이라 할 수 있다. 이러한 기술적인 요구사항을 해결하고자 시도되는 다양한 방법 중에서 공간 분해능 향상을 위한 다지점 계측 메트릭스를 실험적으로 구성하였다. $50m{\times}100m$의 단면적인 연동 딸기 온실을 대상으로 $3{\times}3{\times}3$의 3차원 환경 인자 계측 매트릭스를 설치하였다. 1 Hz의 주기로 4가지 환경인자(온도, 습도, 조도, CO2)를 계측하였으며, 계측 하는 시점과 동시에 병렬적으로 공간통계법을 이용하여 미지의 지점에 대한 환경 인자들을 실시간으로 추정하였다. 선행적으로 50 cm 공간 분해능에 대응하기 위하여 Kriging interpolation법을 횡단면에 대하여 분석한 후 다시 종단면에 대하여 분석하였다. 3 Ghz에 해당하는 연산 능력을 보유한 컴퓨터에서 1초 동안 획득한 데이터에 대한 분석을 마치는데 소요되는 시간이 15초 내외로 나타났다. 이는 해당 알고리즘의 매우 높은 시간 복잡도(Order of $O=O^3$)에 기인하는 것으로 다양한 시설 환경의 관리 방법론에 적절히 대응하기에 한계가 있다 할 수 있다. 실시간으로 시간 복잡도가 높은 연산을 수행하기 위한 기술적인 과제를 해결하고자, 근래에 관심이 증가하고 있는 NVIDIA 사에서 제공하는 CUDA 엔진과 Apple사의 제안을 시작으로 하여 공개 소프트웨어 개발 컨소시엄인 크로노스 그룹에서 제공하는 OpenCL 엔진을 비교 분석하였다. CUDA 엔진은 GPU(Graphics Processing Unit)에서 정보 분석 프로그램의 연산 집약적인 부분만을 담당하여 신속한 결과를 산출할 수 있는 라이브러리이며 해당 하드웨어를 구비하였을 때 사용이 가능하다. 반면, OpenCL은 CUDA 엔진이 특정 하드웨어에서 구동이 되는 한계를 극복하고자 하드웨어에 비의존적인 라이브러리를 제공하는 것이 다르며 클러스터링 기술과 연계를 통해 낮은 하드웨어 성능으로 인한 단점을 극복하고자 하였다. 본 연구에서는 CUDA 8.0(https://developer.nvidia.com/cuda-downloads)버전과 Pascal Titan X(NVIDIA, CA, USA)를 사용한 방법과 OpenCL 1.2(https://www.khronos.org/opencl/)버전과 Samsung Exynos5422 칩을 장착한 ODROID-XU4(Hardkernel, AnYang, Korea)를 사용한 방법을 비교 분석하였다. 50 cm의 공간 분해능에 대응하기 위한 4차원 행렬($100{\times}200{\times}5{\times}4$)에 대하여 정수 지수화를 위한 Quantization을 거쳐 CUDA 엔진과 OpenCL 엔진을 적용한 비교한 결과, CUDA 엔진은 1초 내외, OpenCL 엔진의 경우 5초 내외의 연산 속도를 보였다. CUDA 엔진의 경우 비용측면에서 약 10배, 전력 소모 측면에서 20배 이상 소요되었다. 따라서 우선적으로 OpenCL 엔진 기반 하드웨어 가속 기술 최적화 연구를 통해 스마트 시설환경 실시간 시뮬레이션 기술 도입을 위한 기술적 과제를 풀어갈 것이다.

  • PDF

A Small-Area Hardware Implementation of Hash Algorithm Standard HAS-160 (해쉬 알고리듬 표준 HAS-l60의 저면적 하드웨어 구현)

  • Kim, Hae-Ju;Jeon, Heung-Woo;Shin, Kyung-Wook
    • Journal of the Korea Institute of Information and Communication Engineering
    • /
    • v.14 no.3
    • /
    • pp.715-722
    • /
    • 2010
  • This paper describes a hardware design of hash function processor which implements Korean Hash Algorithm Standard HAS-160. The HAS-160 processor compresses a message with arbitrary lengths into a hash code with a fixed length of 160-bit. To achieve high-speed operation with small-area, arithmetic operation for step-operation is implemented by using a hybrid structure of 5:3 and 3:2 carry-save adders and carry-select adder. It computes a 160-bit hash code from a message block of 512 bits in 82 clock cycles, and has 312 Mbps throughput at 50 MHz@3.3-V clock frequency. The designed HAS-160 processor is verified by FPGA implementation, and it has 17,600 gates on a layout area of about $1\;mm^2$ using a 0.35-${\mu}m$ CMOS cell library.

A Design of Pipeline Chain Algorithm Based on Circuit Switching for MPI Broadcast Communication System (MPI 브로드캐스트 통신을 위한 서킷 스위칭 기반의 파이프라인 체인 알고리즘 설계)

  • Yun, Heejun;Chung, Wonyoung;Lee, Yong-Surk
    • The Journal of Korean Institute of Communications and Information Sciences
    • /
    • v.37B no.9
    • /
    • pp.795-805
    • /
    • 2012
  • This paper proposes an algorithm and a hardware architecture for a broadcast communication which has the worst bottleneck among multiprocessor using distributed memory architectures. In conventional system, The pipelined broadcast algorithm is an algorithm which takes advantage of maximum bandwidth of communication bus. But unnecessary synchronization process are repeated, because the pipelined broadcast sends the data divided into many parts. In this paper, the MPI unit for pipeline chain algorithm based on circuit switching removing the redundancy of synchronization process was designed, the proposed architecture was evaluated by modeling it with systemC. Consequently, the performance of the proposed architecture was highly improved for broadcast communication up to 3.3 times that of systems using conventional pipelined broadcast algorithm, it can almost take advantage of the maximum bandwidth of transmission bus. Then, it was implemented with VerilogHDL, synthesized with TSMC 0.18um library and implemented into a chip. The area of synthesis results occupied 4,700 gates(2 input NAND gate) and utilization of total area is 2.4%. The proposed architecture achieves improvement in total performance of MPSoC occupying relatively small area.