Search | Korea Science

The Architecture of the Frame Memory in MPEG-2 Video Encoder (MPEG-2 비디오 인코더의 프레임 메모리 구조)

Seo, Gi-Beom;Jeong, Jeong-Hwa
- Journal of the Institute of Electronics Engineers of Korea SD / v.37 no.3 / pp.55-61 / 2000
This paper presents an efficient hardware architecture for the frame memory interface of an MPEG-2 video encoder. To reduce the size of the memory buffers between the SDRAM and the frame memory module, the number of clock cycles needed for each memory access is minimized through dual-bank operation and burst-length changes. By allocating the remaining cycles not used for SDRAM access to random-access cycles, the internal buffer size, the data bus width, and the size of the control logic can be minimized. The proposed architecture operates with a 54 MHz clock and is designed with the VTI™ 0.5 μm CMOS TLM standard cell library. It is verified by comparing test vectors generated by a C-code model with the simulation results of the synthesized circuit. The buffer area of the proposed architecture is reduced to 40% of that of the existing architecture.
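The cycle budget behind this kind of design can be estimated with simple arithmetic. The sketch below assumes a 720 × 480 frame at 30 frames/s (or 720 × 576 at 25 frames/s), which the abstract does not state; at 54 MHz either format leaves about 1333 clock cycles per 16 × 16 macroblock. `burst_cycles` is a hypothetical model in which every burst pays a fixed overhead, so longer bursts and dual-bank interleaving (modelled crudely as a smaller overhead) shrink the scheduled SDRAM traffic, and `spare_cycles` is what remains for random accesses. The word count and overhead values are illustrative, not figures from the paper.

```python
import math

CLOCK_HZ = 54_000_000  # 54 MHz system clock, from the abstract
MB_SIZE = 16           # MPEG-2 macroblock: 16 x 16 luma samples


def cycles_per_macroblock(width, height, fps, clock_hz=CLOCK_HZ):
    """Clock cycles available per macroblock for real-time encoding."""
    mbs_per_frame = (width // MB_SIZE) * (height // MB_SIZE)
    return clock_hz / (mbs_per_frame * fps)


def burst_cycles(words, burst_len, overhead):
    """Cycles to transfer `words` words in bursts of `burst_len` words.

    `overhead` is a hypothetical fixed cost per burst (commands, latency).
    Dual-bank interleaving is modelled crudely as a smaller overhead,
    because the other bank's row activation overlaps the current burst.
    """
    bursts = math.ceil(words / burst_len)
    return bursts * (burst_len + overhead)


def spare_cycles(budget, used):
    """Cycles left for random accesses after the scheduled SDRAM traffic."""
    return max(0.0, budget - used)


if __name__ == "__main__":
    budget = cycles_per_macroblock(720, 480, 30)
    print(f"{budget:.1f} cycles per macroblock")  # prints "1333.3 cycles per macroblock"
    # Hypothetical traffic: 384 words per macroblock in 8-word bursts.
    for overhead in (4, 1):
        used = burst_cycles(384, 8, overhead)
        print(overhead, used, round(spare_cycles(budget, used), 1))
```

In this toy model, cutting the per-burst overhead from 4 to 1 cycle frees 144 cycles per macroblock, which is the kind of slack the abstract describes reassigning to random access.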

A Benchmark of Micro Parallel Computing Technology for Real-time Control in Smart Farm (MPICH vs OpenMP) (스마트 시설환경 실시간 제어를 위한 마이크로 병렬 컴퓨팅 기술 분석)

Min, Jae-Ki;Lee, DongHoon
- Proceedings of the Korean Society for Agricultural Machinery Conference / 2017.04a / pp.161-161 / 2017
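The control elements of a smart facility environment comprise a complex mix of factors directly involved in regulating the environment, such as heaters, window opening/closing, water and nutrient-solution valve opening/closing, ventilation fans, and dehumidifiers, and factors indirectly related to control, such as communication for information exchange and user interfaces. Control based on mathematical logic, such as PID control, can coexist with control by nonlinear learning models built on the knowledge of expert managers. Conventional sequence-based control may be limited in integrating these diverse elements. The conventional approach of determining the amount and timing of control from sufficient data acquired over a time series may have difficulty coping with exceptional situations. Such exceptions fall into those that arise unavoidably from changes in natural conditions and those caused by system errors. In this study, we investigated a way to complement control systems built on mathematical, predictable logic by analyzing the various environmental factors in the facility in real time and performing the corresponding control. High-performance computing (HPC) was formerly an advanced technology that increased computing power by linking many computers over high-speed networks, requiring large investment in cost and scale. With the development of mobile phones and mobile devices, small microprocessors have advanced, and application processors (APs) with clock speeds reaching 2 GHz have recently appeared. We studied how to apply APs, which despite relatively low performance offer low power consumption and a small platform, to real-time control of facility environments. We compared the performance of AP-based micro-clustering using three systems that differ in CPU clock, memory size, and core count: 1) 1.5 GHz, 8 processors, 32 cores, 1 GB/processor, 32-bit Linux (ARMv7l); 2) 2.0 GHz, 4 processors, 32 cores, 2 GB/processor, 32-bit Linux (ARMv7l); 3) 1.5 GHz, 8 processors, 32 cores, 2 GB/processor, 64-bit Linux (AArch64). MPICH (www.mpich.org) and OpenMP (www.openmp.org) were used as parallel computing libraries. Finding the primes among the integers up to 2,500,000,000 took 1) 17 s, 2) 13 s, and 3) 3 s, and a 2-D FFT of a $12800{\times}12800$ matrix took 1) 10 s, 2) 8 s, and 3) 2 s, respectively. Case 3 can be judged faster than a commercial desktop PC with a clock speed of up to 3 GHz. The results were approximately the same for both libraries. When 3-D linear interpolation at 1-second intervals was applied to 3-D measurement data acquired in a previous study, the results were nearly identical, differing only slightly, with 4 or fewer cores, whereas with 8 or more cores they showed a trend similar to the preceding results. We will continue to study AP-based micro-clustering, taking into account field deployability, construction cost, and power consumption.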
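The abstract does not say how the prime search was divided among cores. A minimal sketch, assuming a simple block decomposition of the integer range: each hypothetical worker (an MPI rank under MPICH, or a thread under OpenMP) sieves its own contiguous segment with the base primes up to √n, and the partial counts are summed the way an MPI sum-reduction would combine them. It runs serially here, and the function names are illustrative.

```python
import math


def base_primes(limit):
    """Sieve of Eratosthenes: all primes <= limit."""
    if limit < 2:
        return []
    is_prime = bytearray([1]) * (limit + 1)
    is_prime[0] = is_prime[1] = 0
    for i in range(2, math.isqrt(limit) + 1):
        if is_prime[i]:
            is_prime[i * i::i] = bytearray(len(range(i * i, limit + 1, i)))
    return [i for i in range(2, limit + 1) if is_prime[i]]


def count_segment(lo, hi, primes):
    """Count primes in [lo, hi) by sieving the segment with base primes."""
    lo = max(lo, 2)
    if hi <= lo:
        return 0
    seg = bytearray([1]) * (hi - lo)
    for p in primes:
        if p * p >= hi:
            break
        start = max(p * p, -(-lo // p) * p)  # first multiple of p >= lo, >= p*p
        seg[start - lo::p] = bytearray(len(range(start, hi, p)))
    return sum(seg)


def count_primes_below(n, workers):
    """Count primes < n, giving each hypothetical worker one contiguous chunk."""
    primes = base_primes(math.isqrt(n - 1)) if n > 2 else []
    chunk = -(-n // workers)  # ceiling division
    partials = [
        count_segment(r * chunk, min(n, (r + 1) * chunk), primes)
        for r in range(workers)
    ]
    return sum(partials)  # stands in for an MPI reduce (sum) across ranks
```

Because every segment only needs the small shared list of base primes, the chunks are independent, which is why this workload scales well across cluster nodes. Note that equal-width chunks are not perfectly balanced, since lower segments need slightly more sieving work.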

Implementation of an LLF Scheduler for the Hard Real-time OS, RT-eCos3.0 (경성 실시간 운영체제 RT-eCos3.0을 위한 LLF 스케줄러의 구현)

Yoo, Hwee-Jae;Kim, Jung-Guk
- Proceedings of the Korean Information Science Society Conference / 2011.06b / pp.395-397 / 2011
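RT-eCos3.0 is an ultra-lightweight hard real-time embedded operating system developed from the open-source eCos3.0 to support the execution of TMO (Time-triggered Message-triggered Object), a representative distributed real-time object model. Until now, RT-eCos3.0 has supported EDF and FIFO schedulers, which do not require the worst-case execution time of a thread as input. In this paper, we implemented an LLF (Least Laxity First) scheduler inside the clock interrupt handler: the worst-case execution time is supplied when TMO's time-triggered and message-triggered threads are registered, and the scheduler uses the laxity, that is, the time remaining until the deadline minus the remaining execution time. We also made it possible for each thread to select its own scheduling policy.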
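The LLF rule described above can be sketched in a few lines. This is a Python illustration, not RT-eCos3.0 code (which is a C kernel); it assumes all threads are released at tick 0, time advances in unit clock ticks, and laxity is re-evaluated on every tick, loosely mirroring a decision made in the clock interrupt handler. Breaking ties by earliest deadline is an assumption.

```python
from dataclasses import dataclass


@dataclass
class Thread:
    name: str
    deadline: int      # absolute deadline, in clock ticks
    wcet: int          # worst-case execution time supplied at registration
    executed: int = 0  # ticks of CPU time consumed so far

    def laxity(self, now):
        """Time until the deadline minus the remaining execution time."""
        remaining = self.wcet - self.executed
        return self.deadline - now - remaining


def llf_pick(ready, now):
    """Ready thread with the least laxity (ties: earliest deadline)."""
    return min(ready, key=lambda t: (t.laxity(now), t.deadline))


def simulate(threads, ticks):
    """Run one tick per step, re-selecting the thread on every tick."""
    schedule = []
    for now in range(ticks):
        ready = [t for t in threads if t.executed < t.wcet]
        if not ready:
            schedule.append(None)  # idle tick
            continue
        t = llf_pick(ready, now)
        t.executed += 1
        schedule.append(t.name)
    return schedule
```

With two threads A (deadline 4, WCET 2) and B (deadline 5, WCET 3), the sketch produces the schedule A, B, A, B, B, and both threads meet their deadlines. Re-evaluating every tick is what lets LLF react to shrinking laxity, at the cost of more frequent context switches than EDF.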

A Study on Remotely Located Synchronization System using GPS Common-View Method (GPS Common-View 방식에 의한 원격지 동기 시스템 연구)

김영범;정낙삼;박동철
- The Journal of Korean Institute of Electromagnetic Engineering and Science / v.12 no.4 / pp.644-650 / 2001
A remotely located synchronization system locked to a remote master clock has been implemented using the GPS common-view technique. The measurement results showed that the accuracy of the remote synchronization system could be kept within a few parts in $10^{12}$ and that its MTIE (Maximum Time Interval Error) met ITU-T Recommendation G.811. A prototype system with fully automatic operation has been realized and is expected to be used for network synchronization in the near future.
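The common-view technique rests on a simple difference: each site measures its local clock against the same GPS satellite at the same epoch, so the satellite clock error cancels in (remote - GPS) - (master - GPS). The sketch below shows that differencing and a direct MTIE computation over a time-error sequence. The function names, sample units, and the window convention (a window of `window` samples spans `window - 1` sampling periods) are assumptions, and real common-view processing also corrects propagation delays that are ignored here.

```python
def common_view(remote_minus_gps, master_minus_gps):
    """Remote-minus-master clock difference from simultaneous GPS readings.

    Both stations observe the same satellite at the same epoch, so the
    satellite's own clock error appears in both readings and cancels.
    """
    return [r - m for r, m in zip(remote_minus_gps, master_minus_gps)]


def mtie(time_error, window):
    """Maximum Time Interval Error for windows of `window` samples.

    The largest peak-to-peak time error found in any window of that length.
    """
    if window < 1 or window > len(time_error):
        raise ValueError("window must be between 1 and len(time_error)")
    return max(
        max(time_error[i:i + window]) - min(time_error[i:i + window])
        for i in range(len(time_error) - window + 1)
    )
```

Evaluating `mtie` for increasing window lengths gives the MTIE curve that is compared against the G.811 mask; the mask values themselves are not reproduced here.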

Robust Control of Input/state Asynchronous Machines with Uncertain State Transitions (불확실한 상태 천이를 가진 입력/상태 비동기 머신을 위한 견실 제어)

Yang, Jung-Min
- Journal of the Institute of Electronics Engineers of Korea SC / v.46 no.4 / pp.39-48 / 2009
Asynchronous sequential machines, or clockless logic circuits, have several advantages over synchronous machines, such as high operating speed and low power consumption. In this paper, we propose a novel robust controller for input/state asynchronous sequential machines with uncertain state transitions. Because of model uncertainties or internal failures, the state transition function of the asynchronous machine under consideration is not completely known. We present a formulation for modeling this kind of asynchronous machine and, using generalized reachability matrices, give the condition for the existence of an appropriate controller such that the closed-loop behavior matches that of a prescribed model. Based on previous research results, we sketch the design procedure of the proposed controller and analyze the stable-state operation of the closed-loop system.
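Reachability under uncertain transitions can be illustrated with a generic sketch. It does not reproduce the paper's generalized reachability matrices; it assumes an uncertain transition is modelled as a map from (state, input) to a set of possible next states, and it ignores asynchronous-machine details such as stable states and fundamental-mode operation. `reachability` computes the ordinary Boolean reachability closure (some resolution of the uncertainty reaches state j), while `robust_attractor` computes the states from which some choice of inputs reaches a target set for every resolution, which is the flavor of condition a robust controller needs.

```python
def one_step_matrix(n_states, transitions):
    """Boolean matrix R[i][j]: some input may drive state i to state j.

    `transitions` maps (state, input) -> set of possible next states, so an
    uncertain transition is a set with more than one element.
    """
    R = [[False] * n_states for _ in range(n_states)]
    for (src, _inp), targets in transitions.items():
        for dst in targets:
            R[src][dst] = True
    return R


def reachability(R):
    """Reflexive-transitive closure of R via Warshall's algorithm."""
    n = len(R)
    C = [row[:] for row in R]
    for i in range(n):
        C[i][i] = True
    for k in range(n):
        for i in range(n):
            if C[i][k]:
                for j in range(n):
                    if C[k][j]:
                        C[i][j] = True
    return C


def robust_attractor(transitions, target):
    """States that can reach `target` whatever the uncertain outcome is.

    Backward fixed point: add a state if some input leads only into the set.
    """
    win = set(target)
    changed = True
    while changed:
        changed = False
        for (src, _inp), targets in transitions.items():
            if src not in win and targets and targets <= win:
                win.add(src)
                changed = True
    return win
```

The gap between the two results is the point: a state may reach the target in `reachability` yet fall outside `robust_attractor` if one possible outcome of its uncertain transition leads somewhere it cannot recover from.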

An Efficient Matrix Multiplier Available in Multi-Head Attention and Feed-Forward Network of Transformer Algorithms (트랜스포머 알고리즘의 멀티 헤드 어텐션과 피드포워드 네트워크에서 활용 가능한 효율적인 행렬 곱셈기)

Seok-Woo Chang;Dong-Sun Kim
- Journal of IKEEE / v.28 no.1 / pp.53-64 / 2024
With the advancement of NLP (natural language processing) models, conversational AI such as ChatGPT is becoming increasingly popular. To increase processing speed and reduce power consumption, it is important to implement the Transformer algorithm, which underlies the latest natural language processing models, in hardware. In particular, multi-head attention and the feed-forward network, which analyze the relationships between the words of a sentence through matrix multiplication, are the most computationally intensive core algorithms of the Transformer. In this paper, we propose a new variable systolic array, configured according to the number of input words, to speed up matrix multiplication. Quantization is also applied to improve memory efficiency and speed while maintaining the Transformer's accuracy. For evaluation, we measure the clock cycles required by multi-head attention and the feed-forward network and compare the performance with that of other multipliers.
https://doi.org/10.7471/ikeee.2024.28.1.53
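A systolic array for matrix multiplication can be modelled at the register level. The sketch assumes an output-stationary M × N array; the paper's variable array, sized by the number of input words, may be organized differently. Each PE(i, j) holds the accumulator for C[i][j], A values shift right, B values shift down, and the inputs are skewed so that PE(i, j) multiplies the pair with index k = t - i - j at cycle t. The returned cycle count, M + N + K - 2, is the time until the last PE consumes its last operand pair; it ignores result readout and other overheads of a real design.

```python
def systolic_matmul(A, B):
    """Register-level model of an output-stationary M x N systolic array.

    Returns (C, cycles) where C = A @ B for an M x K matrix A and a
    K x N matrix B, both given as lists of lists.
    """
    M, K = len(A), len(A[0])
    if len(B) != K:
        raise ValueError("inner dimensions must match")
    N = len(B[0])
    C = [[0] * N for _ in range(M)]
    a_reg = [[0] * N for _ in range(M)]  # A operand held in each PE
    b_reg = [[0] * N for _ in range(M)]  # B operand held in each PE
    cycles = M + N + K - 2  # last PE sees k = K-1 at t = (M-1)+(N-1)+(K-1)
    for t in range(cycles):
        new_a = [[0] * N for _ in range(M)]
        new_b = [[0] * N for _ in range(M)]
        for i in range(M):
            for j in range(N):
                if j == 0:  # row i of A enters from the left, delayed by i
                    k = t - i
                    new_a[i][j] = A[i][k] if 0 <= k < K else 0
                else:       # otherwise take the left neighbour's value
                    new_a[i][j] = a_reg[i][j - 1]
                if i == 0:  # column j of B enters from the top, delayed by j
                    k = t - j
                    new_b[i][j] = B[k][j] if 0 <= k < K else 0
                else:       # otherwise take the upper neighbour's value
                    new_b[i][j] = b_reg[i - 1][j]
                C[i][j] += new_a[i][j] * new_b[i][j]  # MAC in PE(i, j)
        a_reg, b_reg = new_a, new_b
    return C, cycles
```

In this model, fewer rows M (for example, one row per input word of a short sentence) directly lowers the M + N + K - 2 cycle count, which is one way to read the motivation for a variable array size.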

Implementation of a Branch Predictor and Its Cost Per Performance Analysis for a High Performance Embedded Microprocessor (고성능 내장형 마이크로프로세서의 분기 예측기 구현 및 성능 대비 비용 분석)

Shin, Sang-Hoon;Choi, Lynn
- Proceedings of the Korean Information Science Society Conference / 2003.10a / pp.202-204 / 2003
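To improve the performance of the AE64000, a 64-bit high-performance embedded microprocessor based on the EISC ISA, we adopted a branch prediction technique with a good performance gain relative to its cost, designed an additional branch predictor suited to the AE64000 pipeline, and determined the optimal predictor structure through performance-analysis simulations with the SPECint benchmarks and other embedded benchmarks. The unique IFU added to the AE64000 pipeline to handle the LERI instruction adds complexity, but by predicting branches in the instruction-prefetch stage using the PrePC of the IFU stage instead of the PC of the IF stage, the branch penalty can be eliminated when a prediction is correct. The optimal branch predictor finally selected was implemented in Verilog and synthesized together with the AE64000 processor core model; we then verified the function and timing of the predictor, designed to be suitable for production, through analysis of the added area and of timing at the final target clock. The final branch predictor achieved a performance improvement of up to 12% at a cost of less than 1% of the total processor chip, showing high efficiency in performance per area.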
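The abstract does not name the prediction scheme. As an illustration only, a bimodal predictor (a PC-indexed table of 2-bit saturating counters) is a common low-cost choice. The table size and the PC shift, which assumes 16-bit instruction alignment, are assumptions rather than details of the AE64000 predictor.

```python
class BimodalPredictor:
    """PC-indexed table of 2-bit saturating counters.

    Counter values 0-1 predict not-taken, 2-3 predict taken.
    """

    def __init__(self, index_bits=10):
        self.mask = (1 << index_bits) - 1
        self.table = [1] * (1 << index_bits)  # start weakly not-taken

    def _index(self, pc):
        # Assumption: 16-bit aligned instructions, so drop the lowest PC bit.
        return (pc >> 1) & self.mask

    def predict(self, pc):
        return self.table[self._index(pc)] >= 2

    def update(self, pc, taken):
        i = self._index(pc)
        if taken:
            self.table[i] = min(3, self.table[i] + 1)
        else:
            self.table[i] = max(0, self.table[i] - 1)


def accuracy(predictor, trace):
    """Fraction of correct predictions over a list of (pc, taken) pairs."""
    correct = 0
    for pc, taken in trace:
        if predictor.predict(pc) == taken:
            correct += 1
        predictor.update(pc, taken)
    return correct / len(trace)
```

On a loop branch that is taken nine times and then falls through, repeated ten times, this predictor mispredicts the very first branch and every loop exit, giving 89% accuracy; the 2-bit hysteresis keeps a single not-taken outcome from flipping the prediction.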

Implementation of a Parallel Viterbi Decoder for High Speed Multimedia Communications (멀티미디어 통신용 병렬 아키텍쳐 고속 비터비 복호기 설계)

Lee, Byeong-Cheol
- Journal of the Institute of Electronics Engineers of Korea SD / v.37 no.2 / pp.78-84 / 2000
Viterbi decoders can be classified into serial and parallel Viterbi decoders. Parallel Viterbi decoders can handle higher data rates than serial Viterbi decoders. This paper designs and implements a fully parallel Viterbi decoder for high-speed multimedia communications. For high-speed operation, the ACS (Add-Compare-Select) module, consisting of 64 PEs (Processing Elements), computes one trellis stage per clock cycle. In addition, a systolic array structure with 32 pipeline stages is developed for the TB (traceback) module. The implemented Viterbi decoder supports code rates of 1/2, 2/3, 3/4, 5/6, and 7/8 using punctured codes. We developed Verilog HDL models and performed logic synthesis using the 0.6 μm SAMSUNG KG75000 SOG cell library. The implemented Viterbi decoder has about 100,400 gates and runs at 70 MHz in worst-case simulation.
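The one-ACS-per-state organization can be illustrated with a hard-decision Viterbi decoder. A 64-state trellis corresponds to constraint length K = 7; the generator polynomials 171 and 133 (octal), rate 1/2 without puncturing, and zero-flushing of the encoder are assumptions for this sketch. The loop over destination states plays the role of the 64 PEs, each performing add-compare-select for one trellis stage, and the survivor table is traced back at the end rather than by a pipelined systolic TB module.

```python
K = 7                      # constraint length -> 2**(K-1) = 64 states
N_STATES = 1 << (K - 1)
G = (0o171, 0o133)         # assumed rate-1/2 generator polynomials


def parity(x):
    return bin(x).count("1") & 1


def encode(bits):
    """Rate-1/2 convolutional encoder, flushed back to state 0."""
    state, out = 0, []
    for u in bits + [0] * (K - 1):
        reg = (u << (K - 1)) | state   # newest bit in the MSB
        out.extend(parity(reg & g) for g in G)
        state = reg >> 1
    return out


def decode(symbols):
    """Hard-decision Viterbi decoding with full traceback."""
    INF = float("inf")
    pm = [0] + [INF] * (N_STATES - 1)   # encoder starts in state 0
    survivors = []
    for t in range(0, len(symbols), 2):
        r = symbols[t:t + 2]
        new_pm = [INF] * N_STATES
        prev = [0] * N_STATES
        for ns in range(N_STATES):       # one "PE" per destination state
            u = ns >> (K - 2)            # input bit that leads into ns
            for b in (0, 1):             # the two predecessor states
                s = ((ns << 1) & (N_STATES - 1)) | b
                reg = (u << (K - 1)) | s
                bm = sum(parity(reg & g) != ri for g, ri in zip(G, r))
                cand = pm[s] + bm        # add
                if cand < new_pm[ns]:    # compare and select
                    new_pm[ns], prev[ns] = cand, s
        pm = new_pm
        survivors.append(prev)
    state, bits = 0, []                  # flushed trellis ends in state 0
    for prev in reversed(survivors):     # traceback
        bits.append(state >> (K - 2))
        state = prev[state]
    bits.reverse()
    return bits[:-(K - 1)]               # drop the flush bits
```

Supporting the punctured rates listed above would mean deleting encoder outputs according to a puncturing pattern and skipping those positions in the branch metric; that step is not shown.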

Integrated Circuit Implementation and Characteristic Analysis of a CMOS Chaotic Neuron for Chaotic Neural Networks (카오스 신경망을 위한 CMOS 혼돈 뉴런의 집적회로 구현 및 특성 해석)

Song, Han-Jeong;Gwak, Gye-Dal
- Journal of the Institute of Electronics Engineers of Korea CI / v.37 no.5 / pp.45-53 / 2000
This paper presents an analysis of the dynamical behavior of a chaotic neuron fabricated in 0.8 μm single-poly CMOS technology. Approximate empirical equation models for the sigmoid output function and the chaos-generating block of the chaotic neuron are extracted from measurement data. The dynamical responses of the chaotic neuron, such as the bifurcation diagram, frequency responses, Lyapunov exponent, and average firing rate, are then calculated by numerical analysis. In addition, we construct a chaotic neural network composed of two chaotic neurons with four synapses and obtain its bifurcation diagram as the synaptic weights vary. Experimental results for the single chaotic neuron and for the two-neuron chaotic neural network, with a ±2.5 V power supply and a 10 kHz sampling clock, are presented and compared with the simulated results.
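The paper extracts its neuron equations empirically from chip measurements, and those equations are not given in the abstract. As a stand-in, the sketch below uses the standard discrete-time Aihara chaotic neuron, y(t+1) = k·y(t) - α·f(y(t)) + a with a sigmoid f of steepness ε, and estimates the Lyapunov exponent as the average of ln|k - α·f'(y)| along an orbit. Parameter values are left to the caller and are not taken from the paper; a positive exponent indicates chaotic behavior.

```python
import math


def sigmoid(y, eps):
    """Output function f(y) = 1 / (1 + exp(-y / eps)), computed stably."""
    z = y / eps
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def step(y, k, alpha, a, eps):
    """Internal-state update y(t+1) = k*y(t) - alpha*f(y(t)) + a."""
    return k * y - alpha * sigmoid(y, eps) + a


def orbit(k, alpha, a, eps, y0=0.1, transient=500, n=200):
    """Outputs x(t) = f(y(t)) after discarding a transient."""
    y = y0
    for _ in range(transient):
        y = step(y, k, alpha, a, eps)
    xs = []
    for _ in range(n):
        y = step(y, k, alpha, a, eps)
        xs.append(sigmoid(y, eps))
    return xs


def lyapunov(k, alpha, a, eps, y0=0.1, transient=500, n=10000):
    """Average of ln|dy(t+1)/dy(t)| along the orbit."""
    y = y0
    for _ in range(transient):
        y = step(y, k, alpha, a, eps)
    total = 0.0
    for _ in range(n):
        f = sigmoid(y, eps)
        slope = k - alpha * f * (1.0 - f) / eps  # derivative of step() at y
        total += math.log(max(abs(slope), 1e-300))
        y = step(y, k, alpha, a, eps)
    return total / n
```

Sweeping `a` and plotting the outputs of `orbit` against it yields a bifurcation diagram of the kind described in the abstract; with α = 0 the map is linear and the estimate reduces to ln k, which is a convenient sanity check.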

Compact CNN Accelerator Chip Design with Optimized MAC And Pooling Layers (MAC과 Pooling Layer을 최적화시킨 소형 CNN 가속기 칩)

Son, Hyun-Wook;Lee, Dong-Yeong;Kim, HyungWon
- Journal of the Korea Institute of Information and Communication Engineering / v.25 no.9 / pp.1158-1165 / 2021
This paper proposes a CNN accelerator that reduces memory size by incorporating the pooling-layer operation into the multiply-and-accumulate (MAC) operation. To optimize the memory and datapath circuits, quantized 8-bit integer weights are used instead of the 32-bit floating-point weights obtained by pre-training on the MNIST dataset. To reduce chip area, the proposed CNN model is reduced to one convolutional layer, a 4×4 max-pooling layer, and two fully connected layers, and all operations use a dedicated MAC with approximate adders and multipliers. Performing the convolution and pooling operations simultaneously in the proposed architecture reduces the internal memory size by 94%. The accelerator chip is designed in a TSMC 65 nm GP CMOS process and occupies 0.8 × 0.9 = 0.72 mm², about half the size of the chip in our previous paper. The presented CNN accelerator chip achieves 94% accuracy and an inference time of 77 μs per MNIST image.
https://doi.org/10.6109/jkiice.2021.25.9.1158
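The memory saving from merging pooling into the MAC can be seen in a small model. Instead of storing the whole convolution feature map and pooling it afterwards, `fused_conv_pool` computes the p × p convolution outputs that belong to one pooling window and keeps only a running maximum. The sketch uses integer inputs to mirror the 8-bit quantized weights but does not model the approximate adders and multipliers; all function names are illustrative.

```python
def conv2d_valid(img, kernel):
    """'Valid' 2-D convolution (cross-correlation, as used in CNNs)."""
    kh, kw = len(kernel), len(kernel[0])
    H, W = len(img) - kh + 1, len(img[0]) - kw + 1
    return [[sum(img[r + i][c + j] * kernel[i][j]
                 for i in range(kh) for j in range(kw))
             for c in range(W)] for r in range(H)]


def max_pool(fmap, p):
    """Non-overlapping p x p max pooling (partial windows are dropped)."""
    return [[max(fmap[r + i][c + j] for i in range(p) for j in range(p))
             for c in range(0, len(fmap[0]) - p + 1, p)]
            for r in range(0, len(fmap) - p + 1, p)]


def fused_conv_pool(img, kernel, p):
    """Convolution and max pooling in one pass, without a feature-map buffer."""
    kh, kw = len(kernel), len(kernel[0])
    H, W = len(img) - kh + 1, len(img[0]) - kw + 1
    out = []
    for pr in range(0, H - p + 1, p):
        row = []
        for pc in range(0, W - p + 1, p):
            best = None                      # running maximum for this window
            for r in range(pr, pr + p):
                for c in range(pc, pc + p):
                    acc = 0                  # MAC accumulator
                    for i in range(kh):
                        for j in range(kw):
                            acc += img[r + i][c + j] * kernel[i][j]
                    if best is None or acc > best:
                        best = acc
            row.append(best)
        out.append(row)
    return out
```

For a 28 × 28 MNIST image and an assumed 3 × 3 kernel, the unfused path buffers a 26 × 26 feature map per channel, whereas the fused path needs only one accumulator and one running maximum at a time.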
