• Title/Summary/Keyword: 특화 프로세서 구조

Search Result 7, Processing Time 0.021 seconds

The Implementation of the IPv4 Router on IXP1200 Network Processor (IXP1200 네트워크 프로세서를 이용한 IPv4 라우터의 구현)

  • 정영환;박우진;황광섭;배국동;안순신
    • Proceedings of the Korean Information Science Society Conference
    • /
    • 2003.04d
    • /
    • pp.340-342
    • /
    • 2003
  • 인터넷의 급격한 성장으로 요구되는 고속의 데이터 처리 능력과 시장의 급격한 변화에 빠르게 대응하기 위하여 기존의 범용 프로세서를 사용한 방법과 주문형 반도체를 이용한 네트워크 라우터/스위치 시스템의 단점을 보완하고, 두 방식의 장점만을 취합한 네트워크 프로세서가 개발되었다. 네트워크 프로세서는 네트워크 관련 기능에 특화된 구조를 채택하면서 프로그램이 가능하여 고속의 데이터 처리와 동시에 다양한 응용 프로그램의 개발을 가능하게 한다. 본 논문에서는 인텔사의 IXP1200 네트워크 프로세서를 이용하여 IPv4 라우터를 구현하여 네트워크 프로세서가 가지는 특징을 평가해 본다.

  • PDF

Regular Expression Matching Processor Architecture Supporting Character Class Matching (문자클래스 매칭을 지원하는 정규표현식 매칭 프로세서 구조)

  • Yun, SangKyun
    • Journal of KIISE
    • /
    • v.42 no.10
    • /
    • pp.1280-1285
    • /
    • 2015
  • Many hardware-based regular expression matching architectures are proposed for high performance matching. In particular, regular expression processors such as ReCPU and SMPU perform pattern matching in a similar approach to that used in general purpose processors, which provide the flexibility when updating patterns. However, these processors are inefficient in performing class matching since they do not provide character class matching capabilities. This paper proposes an instruction set and architecture of a regular expression matching processor, which can support character class matching. The proposed processor can efficiently perform character class matching since it includes character class, character range, and negated character class matching capabilities.

A Structure of Hardware Abstraction Layer for Improving OS Portability (운영체제의 이식성 향상을 위한 하드웨어 추상화 계층 구조 설계)

  • Lee, Dong-ju;Kim, Jimin;Ryu, Minsoo
    • Proceedings of the Korea Information Processing Society Conference
    • /
    • 2012.04a
    • /
    • pp.3-6
    • /
    • 2012
  • 최근 응용 특화된 다양한 구조의 프로세서가 확산됨에 따라 기존 운영체제를 다른 구조의 플랫폼으로 이식하는 비용이 증가하고 있다. 기존 운영체제에서는 소스 코드 수준에서 하드웨어 의존적인 부분을 HAL(hardware abstraction layer)로 구분하여 관리함으로써 이기종 플랫폼간의 이식성을 높이고자 하였다. 그러나 기존 HAL 구조는 대부분 하드웨어의 물리적인 구조만을 고려하여 설계되어 체계적인 이식 작업이 어렵다는 문제점을 가지고 있다. 이를 위해 본 논문에서는 하드웨어의 물리적인 구조와 운영체제의 기능적인 요소를 함께 고려한 HAL 구조를 제안한다. 제안하는 HAL 구조의 효용성은 S3C2410 에서 실행하는 운영체제를 Cell BE 플랫폼으로 이식하는 사례 연구를 통해 검증하였다.

Hardware Design of High Performance Arithmetic Unit with Processing of Complex Data for Multimedia Processor (복소수 데이터 처리가 가능한 멀티미디어 프로세서용 고성능 연산회로의 하드웨어 설계)

  • Choi, Byeong-yoon
    • Journal of the Korea Institute of Information and Communication Engineering
    • /
    • v.20 no.1
    • /
    • pp.123-130
    • /
    • 2016
  • In this paper, a high-performance arithmetic unit which can efficiently accelerate a number of algorithms for multimedia application was designed. The 3-stage pipelined arithmetic unit can execute 38 operations for complex and fixed-point data by using efficient configuration for four 16-bit by 16-bit multipliers, new sign extension method for carry-save data, and correction constant scheme to eliminate sign-extension in compression operation of multiple partial multiplication results. The arithmetic unit has about 300-MHz operating frequency and about 37,000 gates on 45nm CMOS technology and its estimated performance is 300 MCOPS(Million Complex Operations Per Second). Because the arithmetic unit has high processing rate and supports a number of operations dedicated to various applications, it can be efficiently applicable to multimedia processors.

Analysis of Programming Techniques for Creating Optimized CUDA Software (최적화된 CUDA 소프트웨어 제작을 위한 프로그래밍 기법 분석)

  • Kim, Sung-Soo;Kim, Dong-Heon;Woo, Sang-Kyu;Ihm, In-Sung
    • Journal of KIISE:Computing Practices and Letters
    • /
    • v.16 no.7
    • /
    • pp.775-787
    • /
    • 2010
  • Unlike general-purpose CPUs, the GPUs have been specialized as many-core streaming processors, and are frequently replacing the CPUs in an increasing range of computations thanks to their outstanding parallel computing capacity. In order to respond to such trend, NVIDIA has recently issued a new parallel computing architecture called CUDA(Compute Unified Device Architecture), offering a flexible GPU programming environment for GPGPU(General Purpose GPU) computing. In general, when programmers use the CUDA API, they should clearly understand many aspects of GPU's computing architecture to produce efficient parallel software. In this article, we explain several optimization techniques for CUDA programming that we have verified through a lot of experiment and trial and error, and review how those techniques affect the performance of code execution. In particular, we use a specific problem as an example to analyze several elements that affect performances, such as effective accesses to hierarchical memory system, processor occupancy, and latency hiding. In conclusion, we present several directions that may be utilized effectively in CUDA-based parallel programming.

Analysis on Memory Characteristics of Graphics Processing Units for Designing Memory System of General-Purpose Computing on Graphics Processing Units (범용 그래픽 처리 장치의 메모리 설계를 위한 그래픽 처리 장치의 메모리 특성 분석)

  • Choi, Hongjun;Kim, Cheolhong
    • Smart Media Journal
    • /
    • v.3 no.1
    • /
    • pp.33-38
    • /
    • 2014
  • Even though the performance of microprocessor is improved continuously, the performance improvement of computing system becomes hard to increase, in order to some drawbacks including increased power consumption. To solve the problem, general-purpose computing on graphics processing units(GPGPUs), which execute general-purpose applications by using specialized parallel-processing device representing graphics processing units(GPUs), have been focused. However, the characteristics of applications related with graphics is substantially different from the characteristics of general-purpose applications. Therefore, GPUs cannot exploit the outstanding computational resources sufficiently due to various constraints, when they execute general-purpose applications. When designing GPUs for GPGPU, memory system is important to effectively exploit the GPUs since typically general-purpose applications requires more memory accesses than graphics applications. Especially, external memory access requiring long latency impose a big overhead on the performance of GPUs. Therefore, the GPU performance must be improved if hierarchical memory architecture which can reduce the number of external memory access is applied. For this reason, we will investigate the analysis of GPU performance according to hierarchical cache architectures in executing various benchmarks.

The Design of a Structure of Network Co-processor for SDR(Software Defined Radio) (SDR(Software Defined Radio)에 적합한 네트워크 코프로세서 구조의 설계)

  • Kim, Hyun-Pil;Jeong, Ha-Young;Ham, Dong-Hyeon;Lee, Yong-Surk
    • The Journal of Korean Institute of Communications and Information Sciences
    • /
    • v.32 no.2A
    • /
    • pp.188-194
    • /
    • 2007
  • In order to become ubiquitous world, the compatibility of wireless machines has become the significant characteristic of a communication terminal. Thus, SDR is the most necessary technology and standard. However, among the environment which has different communication protocol, it's difficult to make a terminal with only hardware using ASIC or SoC. This paper suggests the processor that can accelerate several communication protocol. It can be connected with main-processor, and it is specialized PHY layer of network The C-program that is modeled with the wireless protocol IEEE802.11a and IEEE802.11b which are based on widely used modulation way; OFDM and CDM is compiled with ARM cross compiler and done simulation and profiling with Simplescalar-Arm version. The result of profiling, most operations were Viterbi operations and complex floating point operations. According to this result we suggested a co-processor which can accelerate Viterbi operations and complex floating point operations and added instructions. These instructions are simulated with Simplescalar-Arm version. The result of this simulation, comparing with computing only one ARM core, the operations of Viterbi improved as fast as 4.5 times. And the operations of complex floating point improved as fast as twice. The operations of IEEE802.11a are 3 times faster, and the operations of IEEE802.11b are 1.5 times faster.