Integrated Search | Korea Science

An Implementation of a Memory Operation System Architecture for Memory Latency Penalty Reduction in SIMT Based Stream Processor

이광엽
- 전기전자학회논문지
- Vol. 18, No. 3
- pp.392-397
- 2014
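This paper proposes a memory operation system architecture for a SIMT-based stream processor that reduces the memory latency penalty. The proposed architecture applies a non-blocking cache architecture to reduce the cache miss penalty incurred by a conventional blocking cache architecture. The performance gain of a stream processor using the proposed architecture was verified by comparing the processing speed of various algorithms. The experiments measured the improvement according to the ratio of memory instructions in each algorithm and confirmed that stream processor performance improved by a minimum of 8.2% and a maximum of 46.5%.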
https://doi.org/10.7471/ikeee.2014.18.3.392
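
The difference between a blocking and a non-blocking cache described in the abstract above can be illustrated with a toy cycle model. This is only a sketch under stated assumptions, not the paper's architecture: the function names, the fixed miss penalty, the one-access-per-cycle issue rate, and the `mshrs` limit on outstanding misses are all hypothetical, and data dependencies between accesses are ignored.

```python
def blocking_cycles(accesses, penalty):
    """Blocking cache: every miss stalls the pipeline for the full penalty.

    `accesses` is a sequence of booleans, True meaning a cache miss.
    """
    return sum(1 + (penalty if miss else 0) for miss in accesses)


def nonblocking_cycles(accesses, penalty, mshrs):
    """Non-blocking cache: later accesses issue while misses are outstanding.

    At most `mshrs` misses may be in flight (assumed limit); when that limit
    is reached, the next miss waits for the earliest one to complete.
    """
    cycle = 0
    inflight = []   # completion times of outstanding misses
    finish = 0      # completion time of the latest miss
    for miss in accesses:
        inflight = [t for t in inflight if t > cycle]
        if miss and len(inflight) >= mshrs:
            cycle = min(inflight)                      # stall until a slot frees up
            inflight = [t for t in inflight if t > cycle]
        cycle += 1                                     # issue this access
        if miss:
            done = cycle + penalty
            inflight.append(done)
            finish = max(finish, done)
    return max(cycle, finish)


if __name__ == "__main__":
    pattern = [True, False, False, True]
    print(blocking_cycles(pattern, 10), nonblocking_cycles(pattern, 10, 2))
```

In this simplified model the benefit grows with the share of memory accesses that miss, which is consistent in spirit with the abstract's observation that the gain depends on the ratio of memory instructions.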

Trends in AI Computing Processor Semiconductors Including ETRI's Autonomous Driving AI Processor

양정민;권영수;강성원
- 전자통신동향분석
- Vol. 32, No. 6
- pp.57-65
- 2017
Neural-network-based AI computing is a promising technology that reflects the recognition and decision operations of human beings. Early AI computing processors were composed of GPUs and CPUs; however, the dramatic increase in floating-point operations requires an energy-efficient AI processor with a highly parallelized architecture. In this paper, we analyze the trends in processor architectures for AI computing. Some architectures are still built from GPUs, but they reduce the size of each processing unit by allowing half-precision operations and thereby raise the processing unit density. Other architectures concentrate on matrix multiplication and require dedicated hardware for fast vector operations. Finally, we propose our own inAB processor architecture and introduce domestic cutting-edge processor design capabilities.
https://doi.org/10.22648/ETRI.2017.J.320607

On the Implementation of the Digital Neuron Processor

홍봉화;이지영
- 한국컴퓨터정보학회논문지
- Vol. 4, No. 2
- pp.27-38
- 1999
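This paper proposes a high-speed digital neuron processor using the residue number system (RNS), which permits fast arithmetic because it has no carry propagation. The proposed neuron processor consists of a MAC (multiply-and-accumulate) unit, a quotient operation unit, and a sigmoid function unit, and was designed in a 0.8 $\mu$m CMOS process. In experiments, the implemented digital neuron processor showed a speed of 19.2 nsec and reduced the hardware size by about half compared with a neuron processor built with a real-number arithmetic unit.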
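
The carry-free property of residue arithmetic mentioned in the abstract above can be sketched as follows. The moduli set `(7, 11, 13, 15)` and all function names are assumptions chosen for illustration; the abstract does not state which moduli or word lengths the processor uses, and this sketch handles non-negative integers only.

```python
from math import prod

# Hypothetical pairwise-coprime moduli (not taken from the paper).
# Dynamic range: 7 * 11 * 13 * 15 = 15015.
MODULI = (7, 11, 13, 15)


def to_rns(x, moduli=MODULI):
    """Encode a non-negative integer as one residue per modulus."""
    return tuple(x % m for m in moduli)


def rns_mac(acc, a, b, moduli=MODULI):
    """Multiply-accumulate acc + a*b, channel by channel.

    Each residue channel is updated independently, so no carry has to
    propagate from one channel to another.
    """
    return tuple((r + x * y) % m for r, x, y, m in zip(acc, a, b, moduli))


def from_rns(residues, moduli=MODULI):
    """Decode with the Chinese Remainder Theorem."""
    big_m = prod(moduli)
    total = 0
    for r, m in zip(residues, moduli):
        m_i = big_m // m
        total += r * m_i * pow(m_i, -1, m)   # modular inverse (Python 3.8+)
    return total % big_m


def dot(weights, inputs):
    """Weighted sum of a neuron computed entirely in the residue domain."""
    acc = to_rns(0)
    for w, x in zip(weights, inputs):
        acc = rns_mac(acc, to_rns(w), to_rns(x))
    return from_rns(acc)


if __name__ == "__main__":
    print(dot([3, 5, 2], [4, 6, 7]))
```

The result is only valid while the true sum stays below the product of the moduli, which is one reason a hardware design has to pick its moduli set to match the expected dynamic range.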

VLSI Design of Processor IP for TCP/IP Protocol Stack

최병윤;박성일;하창수
- 대한전자공학회: Conference Proceedings
- 대한전자공학회 2003 Summer Conference Proceedings II
- pp.927-930
- 2003
This paper describes the design of a processor IP for the TCP/IP protocol stack. The processor consists of input and output buffer memories with a dual-bank structure, a 32-bit RISC microprocessor core, and a DMA unit with on-the-fly checksum capability. To handle the various modes of the TCP/IP protocol, a hardware/software co-design approach is used rather than a conventional state-machine-based design. To eliminate the delay caused by data transfer and checksum operations, a DMA module is adopted that executes the checksum operation on the fly alongside the data transfer. By programming the on-chip code ROM of the RISC processor differently, the designed stack processor can support the packet format conversion operations required by the various TCP/IP protocols.
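
The idea of computing a checksum on the fly during a data transfer, as described in the abstract above, can be sketched in software. The function name is hypothetical, and the sketch assumes the standard 16-bit one's-complement Internet checksum used by IP, TCP, and UDP; the abstract does not describe the internals of the actual DMA module.

```python
def copy_with_checksum(src):
    """Copy `src` (bytes) 16 bits at a time while accumulating its checksum.

    Returns (copied_bytes, checksum). The checksum is the 16-bit
    one's-complement of the one's-complement sum of big-endian 16-bit words,
    with an odd trailing byte padded with zero.
    """
    dst = bytearray()
    total = 0
    for i in range(0, len(src), 2):
        chunk = src[i:i + 2]
        dst += chunk                                   # the data transfer itself
        word = chunk[0] << 8
        if len(chunk) == 2:
            word |= chunk[1]
        total += word                                  # checksum in the same pass
        total = (total & 0xFFFF) + (total >> 16)       # end-around carry
    return bytes(dst), (~total) & 0xFFFF


if __name__ == "__main__":
    payload = bytes.fromhex("0001f203f4f5f6f7")
    copied, checksum = copy_with_checksum(payload)
    print(copied == payload, hex(checksum))
```

Because the sum is updated as each word passes through, no second pass over the buffer is needed, which is the delay the abstract says the hardware module removes.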

Implementation of GA Processor for Efficient Sequence Generation

전성모;김태선;이종호
- 대한전기학회: Conference Proceedings
- 대한전기학회 2003 Conference Proceedings, Information and Control Section B
- pp.376-379
- 2003
DNA computing operates on DNA sequences through biological experiments. Used as operators, these experiments can cause illegal reactions through shifted, mismatched, or otherwise undesired hybridization of the DNA sequences, so it is essential to design DNA sequences that minimize the potential errors. This paper proposes a DNA sequence generation method based on an evolutionary operation processor. A genetic algorithm was used for the evolutionary operation, and dedicated hardware, namely a genetic algorithm processor, was implemented to handle the repeated evolutionary process, which takes much computation time. To show the efficiency of the proposed processor, the fitness of randomly formed DNA sequences was compared with that of sequences formed by the genetic algorithm processor, and the latter gave the better result. The proposed genetic algorithm processor can reduce the time and expense of preparing the DNA sequences that DNA computing requires. It can also be applied to oligomer design for the development of DNA chips or oligo chips.
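
A software sketch of the kind of evolutionary loop the abstract above describes is given below. Everything specific in it is an assumption for illustration: the fitness function (GC content near 50% with a penalty for repeated adjacent bases), the selection scheme, and the parameter values are hypothetical and are not the paper's hybridization-based criteria or its hardware design.

```python
import random

BASES = "ACGT"


def fitness(seq):
    """Hypothetical fitness: prefer ~50% GC content and few adjacent repeats."""
    gc = sum(b in "GC" for b in seq) / len(seq)
    repeats = sum(a == b for a, b in zip(seq, seq[1:]))
    return -abs(gc - 0.5) - 0.1 * repeats


def evolve(length=20, pop_size=30, generations=40, seed=0):
    """Return (best random sequence, best evolved sequence)."""
    rng = random.Random(seed)
    pop = ["".join(rng.choice(BASES) for _ in range(length))
           for _ in range(pop_size)]
    initial_best = max(pop, key=fitness)
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        parents = pop[: pop_size // 2]           # truncation selection, elites kept
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, length)
            child = list(a[:cut] + b[cut:])      # one-point crossover
            i = rng.randrange(length)
            child[i] = rng.choice(BASES)         # point mutation
            children.append("".join(child))
        pop = parents + children
    return initial_best, max(pop, key=fitness)


if __name__ == "__main__":
    random_best, evolved_best = evolve()
    print(random_best, fitness(random_best))
    print(evolved_best, fitness(evolved_best))
```

Because the best half of each generation is carried over unchanged, the evolved sequence is never worse than the best random one under this fitness, which mirrors the random-versus-GA comparison reported in the abstract.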

A Prototype of Three Dimensional Operations for GIS

Chi, Jeong-Hee;Lee, Jin-Yul;Kim, Dae-Jung;Ryu, Keun-Ho;Kim, Kyong-Ho
- 대한원격탐사학회: Conference Proceedings
- 대한원격탐사학회 2002 Proceedings of International Symposium on Remote Sensing
- pp.880-884
- 2002
With the development of computer technology, especially in 3D graphics and visualization, interest in 3D GIS has been increasing. Several commercial GIS software packages already add 3D functions to their traditional 2D GIS. However, most of these systems focus on visualizing 3D objects and support few analysis functions. In this paper, we therefore design a spatial operation processor that supports spatial analysis functions as well as 3D visualization, and implement a prototype to run them. To support interoperability between existing models, the proposed spatial operation processor supports 3D spatial operations based on a 3D geometry object model, which is designed as an extension of the 2D geometry model of the OGIS consortium, and supports an index based on the R$^*$-tree. The proposed spatial operation processor can be applied in 3D GIS to support 3D analysis functions.

Implementation of an ASIP for acceleration SAD operation

조정현;정하영
- 대한전자공학회: Conference Proceedings
- 대한전자공학회 2006 Summer Conference
- pp.809-810
- 2006
The H.264 algorithm is commonly used for video compression. It requires a large number of data computations, for example the sum of absolute differences (SAD) operation. We analyzed H.264 reference encoding workloads and found that SAD operations account for 8.78% of the encoding program. The SAD operation sums 16 difference values in an H.264 $4{\times}4$ sub-block. To accelerate SAD operations, we implemented an application-specific instruction-set processor (ASIP) that can execute SAD and data transfer instructions. The proposed coprocessor has an absolute value generator and a carry-save adder (CSA) unit that sum 8 difference values per clock cycle, so a SAD operation completes in 2 clock cycles. Experimental results show that total execution time improves by 34%.
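
The arithmetic in the abstract above (16 difference values per $4{\times}4$ sub-block, 8 summed per clock cycle, hence 2 cycles) can be modeled in a few lines. The function name and the cycle counter are illustrative assumptions; the sketch models only the arithmetic, not the coprocessor's instruction set or its carry-save adder.

```python
LANES = 8   # difference values summed per modeled clock cycle (from the abstract)


def sad_4x4(cur, ref):
    """Sum of absolute differences of two 4x4 blocks.

    Returns (sad, cycles), where `cycles` counts how many 8-wide
    accumulation steps were needed.
    """
    cur_flat = [p for row in cur for p in row]
    ref_flat = [p for row in ref for p in row]
    abs_diffs = [abs(c - r) for c, r in zip(cur_flat, ref_flat)]

    total, cycles = 0, 0
    for i in range(0, len(abs_diffs), LANES):
        total += sum(abs_diffs[i:i + LANES])   # one batch of 8 values per cycle
        cycles += 1
    return total, cycles


if __name__ == "__main__":
    current = [[10, 10, 10, 10] for _ in range(4)]
    reference = [[7, 7, 7, 7] for _ in range(4)]
    print(sad_4x4(current, reference))
```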

Simulation on a test vector Implementation of a pipeline processor using a HDL

박두열
- 한국컴퓨터정보학회논문지
- Vol. 5, No. 3
- pp.16-28
- 2000
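In this study, a 16-bit pipeline processor was described and implemented at the functional level using an HDL, and its operation was verified. To simulate the implemented pipeline processor, the test vectors executed within the processor were first specified as symbolic instructions, and the implemented instruction set was then programmed and entered as input. The simulation method using the test vectors presented in this study made it easy to verify the processor's operation and allowed accurate simulation, and using an HDL made it simple to document the processor's operation during implementation.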

Implementation of a 3D Graphics Hardwired T&L Accelerator based on a SoC Platform for a Mobile System

이광엽;구용서
- 대한전자공학회논문지SD
- Vol. 44, No. 9
- pp.59-70
- 2007
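This paper studies an effective T&L (Transform & Lighting) processor architecture for implementing an SoC (System on a Chip) with improved real-time 3D graphics acceleration for portable information devices. The IPs needed for the T&L process were designed and then verified on an SoC platform. The designed T&L processor mixes a 24-bit floating-point format and a 16-bit fixed-point format as appropriate and exploits the parallelism of the calculations as far as possible to balance the latency of the transform and lighting operations, so that it runs at the same speed whether it processes the transform stage alone or transform together with lighting. Performance measurement and verification on the SoC platform confirmed an operating frequency of 80 MHz on a Xilinx Virtex-4 FPGA and a throughput of 20M vertices per second.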

An Implementation of Digital Neural Network Using Systolic Array Processor

윤현식;조원경
- 전자공학회논문지B
- Vol. 30B, No. 2
- pp.44-50
- 1993
In this paper, we present an array processor for implementing digital neural networks. The back-propagation model can be formulated as a sequence of matrix-vector multiplications with a prespecified thresholding operation. This procedure suits an array processor because it can be executed recursively and repeatedly. A systolic array circuit architecture using the residue number system is suggested to realize an efficient arithmetic circuit for matrix-vector multiplication and sigmoid function computation. The proposed design method is expected to be adopted in neural network applications because it can be realized with currently available VLSI technology.
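
The formulation in the abstract above, a matrix-vector multiplication followed by a thresholding operation, can be sketched as a layer computation in which each processing element (PE) accumulates one output while the inputs are streamed in one per step. The function name and the streaming order are assumptions for illustration; the sketch uses ordinary floating-point arithmetic and does not reproduce the paper's residue-number-system circuits or its exact systolic dataflow.

```python
import math


def layer_forward(weights, inputs):
    """Compute sigmoid(W @ x) with one accumulator per output neuron.

    `weights` is a list of rows (one row per PE / output neuron) and
    `inputs` is the input vector. On each step one input value is
    broadcast and every PE performs a single multiply-accumulate.
    """
    acc = [0.0] * len(weights)
    for step, x in enumerate(inputs):            # one input arrives per step
        for pe, row in enumerate(weights):       # all PEs work on the same step
            acc[pe] += row[step] * x
    return [1.0 / (1.0 + math.exp(-a)) for a in acc]   # sigmoid thresholding


if __name__ == "__main__":
    w = [[0.0, 0.0], [1.0, -1.0], [1.0, 1.0]]
    print(layer_forward(w, [2.0, 2.0]))
```

The same routine can be applied layer after layer, which is the repeated, regular structure the abstract cites as the reason the computation maps well onto an array processor.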

613 search results (processing time: 0.029 s)
