• Title/Abstract/Keywords: Data Processor

A Study on the Design of a One-Dimensional Systolic Array Processor for Matrix-Vector Operations Using Sub-Matrices (A Study On Improving the Performance of One Dimensional Systolic Array Processor for Matrix.Vector Operation using Sub-Matrix)

  • 김용성
    • 정보학연구 (Informatics Research) / Vol. 10, No. 3 / pp. 33-45 / 2007
  • Systolic array processors are used to design special-purpose processors for digital signal processing, computer graphics, neural network applications, and similar fields, because they offer parallelism, pipelined processing, and a regular architecture. However, with the conventional design method, the array has an initial waiting period of up to (number of PEs - 1) clock cycles. In addition, if the connected system needs parallel, simultaneous outputs, performance suffers, because the processor generates only one output per clock in the output stage. To improve performance, this paper therefore proposes a one-dimensional systolic array processor designed according to the data and operation dependences of a partitioned sub-matrix formulation. A 1-D systolic array using four partitioned sub-matrices handles both of these problems efficiently.
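The initial waiting period and the one-output-per-clock behavior described above can be seen in a small cycle-level simulation. The sketch below assumes a conventional output-stationary linear array (one PE per row of A, with x shifted one PE to the right each clock); it illustrates the baseline design the abstract criticizes, not the authors' sub-matrix architecture, and the function name `systolic_matvec` is hypothetical.

```python
from typing import List, Optional, Tuple


def systolic_matvec(a: List[List[float]], x: List[float]) -> Tuple[List[float], List[int], List[int]]:
    """Simulate a 1-D output-stationary systolic array computing y = A x (hypothetical sketch).

    PE i keeps the accumulator y[i]. x[j] enters PE 0 at cycle j and moves one
    PE to the right per cycle, so PE i multiplies by x[j] at cycle i + j.
    Returns y, the first busy cycle of each PE, and the cycle each PE finishes.
    """
    n = len(x)
    if len(a) != n or any(len(row) != n for row in a):
        raise ValueError("expected an n x n matrix and a length-n vector")
    y = [0.0] * n
    regs: List[Optional[float]] = [None] * n  # x value latched in each PE
    first_busy = [-1] * n
    done = [-1] * n
    for cycle in range(2 * n - 1):
        # Shift x one PE to the right; PE 0 takes the next input, if any.
        for i in range(n - 1, 0, -1):
            regs[i] = regs[i - 1]
        regs[0] = x[cycle] if cycle < n else None
        for i, xv in enumerate(regs):
            if xv is None:
                continue  # this PE is still waiting (or already drained)
            j = cycle - i  # index of the x element PE i holds this cycle
            y[i] += a[i][j] * xv
            if first_busy[i] < 0:
                first_busy[i] = cycle
            if j == n - 1:
                done[i] = cycle  # PE i's output is complete
    return y, first_busy, done
```

For a 3 × 3 example, the last PE first does useful work at cycle 2 (the PE - 1 initial wait), and the three results complete at cycles 2, 3 and 4, one per clock.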

Efficient Signal Reordering Unit Implementation for FFT

  • 양승원;이종열
    • The Transactions of the Korean Institute of Electrical Engineers (전기학회논문지) / Vol. 58, No. 6 / pp. 1241-1245 / 2009
  • FFT (Fast Fourier Transform) processors are used in OFDM (Orthogonal Frequency Division Multiplexing) systems. As requirements for mobility and broadband grow, research on low-power, low-area FFT processors is needed, and work on reducing memory size and complex-multiplier cost is in progress. Increasing the number of FFT points increases the memory area of an FFT processor, and the SRU (Signal Reordering Unit) in particular accounts for the most memory in an FFT processor. In this paper, we propose a method for reducing the memory size of the SRU in an FFT processor. SRUs for 64-point and 1024-point FFT processors were implemented in Verilog HDL and verified by simulation. The target device is the APEX20KE family EP20k1000EPC672-3 from Altera Corp., and the SRU was synthesized with the Quartus tool. The data width is 24 bits: 12 bits each for the real and imaginary parts. The proposed SRUs for the 64-point and 1024-point FFTs achieve area reductions of more than 28% and 24%, respectively.
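A common job for a signal reordering unit is converting the bit-reversed output order of a radix-2 FFT back into natural order; the abstract does not state the exact reordering, so that role is an assumption here. The sketch below shows the index mapping in software. A straightforward hardware version buffers a whole N-sample frame, which is consistent with the abstract's observation that memory grows with the number of FFT points. The function names are hypothetical.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def bit_reverse_index(i: int, bits: int) -> int:
    """Reverse the lowest `bits` bits of i (e.g. 0b001 -> 0b100 for bits=3)."""
    r = 0
    for _ in range(bits):
        r = (r << 1) | (i & 1)
        i >>= 1
    return r


def reorder_bit_reversed(samples: Sequence[T]) -> List[T]:
    """Move each sample from position i to position bit_reverse(i).

    Applied to radix-2 FFT output in bit-reversed order, this yields natural order.
    """
    n = len(samples)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a non-zero power of two")
    bits = n.bit_length() - 1
    out: List[T] = list(samples)
    for i, s in enumerate(samples):
        out[bit_reverse_index(i, bits)] = s
    return out
```

With 24-bit samples (12-bit real and imaginary parts, as in the abstract), a naive full-frame buffer for a 1024-point FFT would hold 1024 × 24 = 24,576 bits; the paper's reduction method is not reproduced here.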

A Study on the Development of On Machine Measuring System using 3-Dimensional solid model

  • 구본권;류제구;김세윤
    • Korean Society for Technology of Plasticity (한국소성가공학회): Conference Proceedings / 2002 Mold Machining Symposium / pp. 3-10 / 2002
  • In this study, an on-machine measuring system based on solid features was developed. The system was verified by applying it to an injection mold modeled with a 3D solid modeler. The developed program consists of a pre-processor, a main processor, and a post-processor. The pre-processor provides functions that check for intersections, simulate probe motion, and calculate measuring time. The main processor generates the measuring path and outputs NC code in Unigraphics. The post-processor provides functions that evaluate undercut and overcut and display the measuring procedure. In addition, an analysis module for quality control of measured data on manufactured products was developed using geometric and dimensional tolerancing concepts. As a result, compared with the previous measuring process, the developed program improved system stability and product precision while making the manufacturing process faster and cheaper.
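The undercut/overcut evaluation in the post-processor can be illustrated by comparing each probed point with its nominal CAD point along the surface normal. The sign convention (positive deviation along the outward normal means material was left, i.e. undercut) and all names below are assumptions for illustration, not the system's actual implementation.

```python
import math
from dataclasses import dataclass
from typing import Tuple

Vec3 = Tuple[float, float, float]


@dataclass
class PointResult:
    deviation: float  # signed distance from nominal along the outward normal (coordinate units)
    status: str       # "ok", "undercut" or "overcut"


def evaluate_point(measured: Vec3, nominal: Vec3, normal: Vec3, tol: float) -> PointResult:
    """Classify one probed point against its nominal CAD point (hypothetical helper).

    Assumed convention: positive deviation along the outward normal means material
    was left behind (undercut); negative means too much was removed (overcut).
    """
    length = math.sqrt(sum(c * c for c in normal))
    if length == 0:
        raise ValueError("normal must be non-zero")
    # Project the error vector onto the unit normal.
    deviation = sum((m - p) * n for m, p, n in zip(measured, nominal, normal)) / length
    if deviation > tol:
        status = "undercut"
    elif deviation < -tol:
        status = "overcut"
    else:
        status = "ok"
    return PointResult(deviation, status)
```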

Advanced Multimedia Processor Architecture

  • 박춘명
    • Korea Institute of Information and Communication Engineering (한국정보통신학회): Conference Proceedings / 2013 Fall Conference / pp. 664-665 / 2013
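  • This paper proposes one way of organizing a multimedia processor. The proposed multimedia processor can handle text, sound, and video on a single chip, and it supports interactivity, a defining feature of multimedia. In particular, the proposed multimedia processor can perform addressing on the memory map without software. The proposed multimedia processor is applicable to virtual reality.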

A Fast SIFT Implementation Based on Integer Gaussian and Reconfigurable Processor

  • Su, Le Tran;Lee, Jong Soo
    • Journal of Korea Institute of Information, Electronics, and Communication Technology (한국정보전자통신기술학회논문지) / Vol. 2, No. 3 / pp. 39-52 / 2009
  • Scale Invariant Feature Transform (SIFT) is an effective algorithm for object recognition, panorama stitching, and image matching; however, because of its complexity, real-time processing is difficult to achieve with software approaches. This paper proposes using a reconfigurable hardware processor with an integer half kernel. The integer half-kernel Gaussian reduces the complexity of the Gaussian pyramid by about half [], and the reconfigurable processor carries out a parallel implementation of a full-search Fast SIFT algorithm. We use a low-memory, fine-grain single instruction stream, multiple data stream (SIMD) pixel processor that is currently under development. This implementation fully exposes the available parallelism of the SIFT algorithm and exploits the processing and I/O capabilities of the processor, resulting in a system that can perform real-time image and video compression. We apply this novel implementation to images and measure its effectiveness. Experimental simulation results indicate that the proposed implementation is capable of real-time applications.
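One plausible reading of an "integer half kernel" is a symmetric Gaussian stored as integer taps for the center and one side only, with mirrored pixels added before each multiply. The sketch below implements that reading for a single image row; it is an assumption, not the paper's exact kernel, and the 8-bit fixed-point scale, edge clamping and function names are illustrative choices.

```python
import math
from typing import List


def integer_half_kernel(sigma: float, radius: int, scale_bits: int = 8) -> List[int]:
    """Center tap plus one side of a Gaussian, quantized to integers.

    Taps are scaled so the full (mirrored) kernel sums to about 2**scale_bits.
    """
    weights = [math.exp(-(i * i) / (2.0 * sigma * sigma)) for i in range(radius + 1)]
    total = weights[0] + 2.0 * sum(weights[1:])
    return [round(w / total * (1 << scale_bits)) for w in weights]


def convolve_row(row: List[int], half: List[int], scale_bits: int = 8) -> List[int]:
    """Blur one row with a symmetric kernel given only its half."""
    n = len(row)
    radius = len(half) - 1

    def px(i: int) -> int:
        return row[min(max(i, 0), n - 1)]  # clamp at the image border

    out = []
    for x in range(n):
        acc = half[0] * px(x)
        for k in range(1, radius + 1):
            # Mirrored pixels share one tap: one multiply per pair.
            acc += half[k] * (px(x - k) + px(x + k))
        out.append(acc >> scale_bits)  # fixed-point rescale by 2**scale_bits
    return out
```

For a radius-3 kernel this needs 4 multiplications per output instead of 7, in line with the roughly halved cost the abstract mentions. The division becomes a right shift because the rounded taps sum to 2^8, as they do for sigma = 1 and radius 3.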

AB9: A neural processor for inference acceleration

  • Cho, Yong Cheol Peter;Chung, Jaehoon;Yang, Jeongmin;Lyuh, Chun-Gi;Kim, HyunMi;Kim, Chan;Ham, Je-seok;Choi, Minseok;Shin, Kyoungseon;Han, Jinho;Kwon, Youngsu
    • ETRI Journal / Vol. 42, No. 4 / pp. 491-504 / 2020
  • We present AB9, a neural processor for inference acceleration. AB9 consists of a systolic tensor core (STC) neural network accelerator designed to accelerate artificial intelligence applications by exploiting the data reuse and parallelism inherent in neural networks while providing fast access to a large on-chip memory. Complementing the hardware is an intuitive, user-friendly development environment that includes a simulator and an implementation flow, offering a high degree of programmability with a short development time. Along with a 40-TFLOP STC that includes 32k arithmetic units and over 36 MB of on-chip SRAM, our baseline implementation of AB9 consists of a 1-GHz quad-core setup with various other industry-standard peripheral intellectual property (IP) blocks. Acceleration performance and power efficiency were evaluated using YOLOv2, and the results show that AB9 outperforms a general-purpose graphics processing unit (GPU) implementation in both performance and power efficiency. AB9 has been taped out in the TSMC 28-nm process with a chip size of 17 × 23 mm². Delivery is expected later this year.

Performance Analysis of Caching Instructions on SVLIW Processor and VLIW Processor

  • 지승현;박노광;김석일
    • Journal of IKEEE (전기전자학회논문지) / Vol. 1, No. 1 / pp. 101-110 / 1997
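  • The SVLIW processor architecture, which schedules VLIW instructions at run time, can resolve resource conflicts and data dependences on its own by inserting LNOPs (long NOP instructions) during execution. As a result, LNOP instructions can be removed from the object code loaded into memory or the cache of an SVLIW processor. Consequently, an SVLIW processor incurs fewer cache misses during program execution than a VLIW processor with a cache of the same size. Fewer cache misses shorten the average memory access time, which in turn reduces the number of cycles needed to execute a program. This effect can offset the cost of the additional instruction pipeline stages, so overall performance can improve. In this paper, we establish and compare models that predict the execution cycles required to run an application program on each of the two processor architectures. Simulation results also confirm that the longer the memory access time on a cache miss, the fewer execution cycles the SVLIW processor needs relative to the VLIW processor.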
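The trade-off described above, fewer instruction-cache misses in exchange for a deeper pipeline, can be expressed with a simple cycle-count model. The model below is a simplified stand-in, not the paper's prediction model; the function name and all parameter values in the example are hypothetical.

```python
def execution_cycles(issue_cycles: int, fetches: int, miss_rate: float,
                     miss_penalty: float, pipeline_overhead: float = 0.0) -> float:
    """Simplified execution-time model (hypothetical).

    issue_cycles: cycles spent issuing instructions, including NOP cycles
    fetches: instruction-cache fetches; LNOPs stored in the code add fetches
    miss_rate, miss_penalty: instruction-cache behaviour
    pipeline_overhead: extra stall cycles caused by additional pipeline stages
    """
    stall_cycles = fetches * miss_rate * miss_penalty
    return issue_cycles + stall_cycles + pipeline_overhead


# Hypothetical numbers: the VLIW binary stores LNOPs, so it fetches more and
# misses more often; the SVLIW binary is denser but pays for its extra stage.
for penalty in (10, 40):
    vliw = execution_cycles(1000, fetches=1000, miss_rate=0.05, miss_penalty=penalty)
    svliw = execution_cycles(1000, fetches=700, miss_rate=0.03, miss_penalty=penalty,
                             pipeline_overhead=100)
    print(penalty, vliw, svliw)
```

With these made-up numbers, the SVLIW advantage grows from 190 to 1,060 cycles as the miss penalty rises from 10 to 40 cycles, the same trend the abstract's simulation results describe.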

Implementation and Verification of a Multi-Core Processor including Multimedia Specific Instructions

  • 서준상;김종면
    • IEMEK Journal of Embedded Systems and Applications (대한임베디드공학회논문지) / Vol. 8, No. 1 / pp. 17-24 / 2013
  • In this paper, we present a multi-core processor with multimedia-specific instructions for processing multimedia data efficiently in the mobile environment. The multimedia-specific instructions exploit subword-level parallelism (SLP), while the multi-core processor exploits data-level parallelism (DLP). Combining these two forms of parallelism improves the performance of multimedia processing applications. The proposed multi-core processor was implemented and tested using the Xilinx ISE 10.1 tool and a SoCMaster3 testbed system with a Virtex-4 FPGA. Experimental results using a fire detection algorithm show that, on the same multi-core architecture, the multimedia-specific instructions outperform the baseline instructions in performance (1.2x), energy efficiency (1.37x), and area efficiency (1.23x).
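Subword-level parallelism means one instruction operates on several narrow values packed into one register, for example four 8-bit pixels in a 32-bit word. The sketch below emulates a packed 8-bit add in software with the well-known SWAR masking trick; it only illustrates the idea and is not the paper's instruction set. The function names are hypothetical.

```python
from typing import List

LANES = 4  # four 8-bit lanes in a 32-bit word


def pack_u8(values: List[int]) -> int:
    """Pack four unsigned bytes into one 32-bit word (lane 0 in the low byte)."""
    return sum((v & 0xFF) << (8 * i) for i, v in enumerate(values[:LANES]))


def unpack_u8(word: int) -> List[int]:
    """Split a 32-bit word back into its four 8-bit lanes."""
    return [(word >> (8 * i)) & 0xFF for i in range(LANES)]


def packed_add_u8(a: int, b: int) -> int:
    """Add four 8-bit lanes at once, wrapping modulo 256 with no carry between lanes."""
    low7 = 0x7F7F7F7F
    high = 0x80808080
    # Add the low 7 bits of each lane (this cannot carry into the next lane),
    # then fold in each lane's top bit with XOR.
    return (((a & low7) + (b & low7)) ^ ((a ^ b) & high)) & 0xFFFFFFFF
```

A dedicated multimedia instruction performs this in one operation, whereas the emulation needs several scalar operations. Saturating variants, which are common for pixel arithmetic, would clamp at 255 instead of wrapping.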

A New Architecture of Call Processor Based On Data flow System

  • 임인택;이성규;한영철
    • Korean Institute of Electrical Engineers (대한전기학회): Conference Proceedings / 1987 Electrical and Electronics Engineering Conference Proceedings (II) / pp. 965-968 / 1987
  • Conventional major electronic switching systems based on stored program control employ a von Neumann-style control processor. Such a processor has strict limitations: it essentially lacks concurrency in instruction execution, which has caused the software bottleneck problem, and its call-processing capability is restricted as the system's capacity expands. In this paper, a new call control processor architecture based on the data flow system is proposed, aiming at a fundamental resolution of these limitations. The processor has a number of advantages, such as expandability of system capacity and parallel processing of calls.
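The key difference from a von Neumann control processor is the data-driven firing rule: a node executes as soon as all of its operand tokens have arrived, with no program counter. The sketch below simulates that rule sequentially; in a dataflow machine, nodes that become ready at the same time (for example, work belonging to independent calls) could fire in parallel. The node structure and names are hypothetical.

```python
import operator
from collections import deque
from dataclasses import dataclass, field
from typing import Callable, Deque, Dict, List, Tuple


@dataclass
class Node:
    op: Callable[..., int]
    n_inputs: int
    dests: List[Tuple[str, int]]                         # (destination node, input port)
    slots: Dict[int, int] = field(default_factory=dict)  # operand tokens received so far


def run_dataflow(nodes: Dict[str, Node], tokens_in: List[Tuple[str, int, int]]) -> Dict[str, int]:
    """Deliver tokens and fire each node once all of its operands are present."""
    tokens: Deque[Tuple[str, int, int]] = deque(tokens_in)  # (node, port, value)
    results: Dict[str, int] = {}
    while tokens:
        name, port, value = tokens.popleft()
        node = nodes[name]
        node.slots[port] = value
        if len(node.slots) < node.n_inputs:
            continue  # firing rule not met: still waiting for operands
        args = [node.slots[p] for p in range(node.n_inputs)]
        node.slots.clear()
        out = node.op(*args)
        if not node.dests:
            results[name] = out  # sink node: collect the result
        for dest, dest_port in node.dests:
            tokens.append((dest, dest_port, out))
    return results


# Example graph: (a + b) * (c - d) with a=2, b=3, c=10, d=4.
graph = {
    "add": Node(operator.add, 2, [("mul", 0)]),
    "sub": Node(operator.sub, 2, [("mul", 1)]),
    "mul": Node(operator.mul, 2, []),
}
print(run_dataflow(graph, [("add", 0, 2), ("add", 1, 3), ("sub", 0, 10), ("sub", 1, 4)]))
```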

Development of an Image Signal Processing System Using a RISC Processor and a CMOS Image Sensor (Development of the Image Capture System Using and RISC Type CPU)

  • 윤수정;김우식;김응석
    • Korean Institute of Electrical Engineers (대한전기학회): Conference Proceedings / 2005 36th Summer Conference Proceedings D / pp. 2664-2666 / 2005
  • In this paper, we develop an on-board image processing system using a CMOS image sensor and a RISC-type main processor. The main processor transmits raw YUV 4:2:2 data captured by the CMOS image sensor to another processor (such as a motion controller or a PC) via serial communication (RS-232, SPI, I2C, etc.). That processor detects lines and obstacles in the image data received from the image processing board developed in this paper.
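YUV 4:2:2 stores a luma (Y) sample for every pixel but shares one U and one V sample between each horizontal pair of pixels, so a frame averages 2 bytes per pixel. The sketch below splits such a frame into planes, assuming the common YUYV byte order (Y0 U Y1 V). The sensor's actual order may differ (UYVY, for example), and the function name is hypothetical. A line or obstacle detector could work on the Y plane alone.

```python
from typing import Tuple


def split_yuyv(frame: bytes, width: int, height: int) -> Tuple[bytes, bytes, bytes]:
    """Split a packed YUYV (YUV 4:2:2) frame into Y, U and V planes.

    Assumes the byte order Y0 U0 Y1 V0 for each pair of pixels.
    """
    if width % 2:
        raise ValueError("YUV 4:2:2 needs an even width")
    if len(frame) != width * height * 2:  # 2 bytes per pixel on average
        raise ValueError("frame size does not match width * height * 2")
    y = frame[0::2]  # every other byte is luma: width * height samples
    u = frame[1::4]  # one U per pixel pair
    v = frame[3::4]  # one V per pixel pair
    return y, u, v
```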
