• Title/Summary/Keyword: high-performance processor (고성능 프로세서)

A Study of a Task Mapping Technique for heterogeneous MPSoCs (이기종 MPSoC 를 위한 태스크 매핑 기법 연구)

  • Cho, Jungseok;Jung, Youjin;Cho, Doosan
    • Proceedings of the Korea Information Processing Society Conference / 2014.04a / pp.18-19 / 2014
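  • Multiprocessor system-on-chip (MPSoC) platforms are a key component of high-performance embedded systems. Each processing element (PE) in an MPSoC should be optimized for the computational characteristics of the tasks assigned to it. Driven by ever-increasing performance demands, homogeneous MPSoCs have evolved into heterogeneous MPSoCs, which contain diverse PEs, each optimized for the computational characteristics of particular tasks. Accordingly, the cores of a heterogeneous MPSoC are designed with application-specific custom instruction sets. However, this heterogeneity burdens tools such as compilers with deciding how to map applications, each composed of various tasks, onto PEs with different characteristics so as to obtain an optimal system. A poor mapping can significantly degrade system performance. In this study, we propose a technique that analyzes the computation patterns of multimedia application tasks to determine an optimal task mapping.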
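The abstract does not say how the mapping is computed. As a minimal sketch of the problem it describes, assuming each task has an independently profiled execution time on every PE, a greedy heuristic can place each task on the PE that would finish it earliest. This is a generic list-scheduling heuristic, not the authors' pattern-analysis technique; the task names, PE names, and timings are hypothetical.

```python
from typing import Dict

# Hypothetical profiled execution times: EXAMPLE_TIMES[task][pe].
EXAMPLE_TIMES: Dict[str, Dict[str, float]] = {
    "idct": {"dsp": 2.0, "risc": 8.0},
    "vlc": {"dsp": 6.0, "risc": 3.0},
    "me": {"dsp": 5.0, "risc": 12.0},
}


def greedy_map(exec_time: Dict[str, Dict[str, float]]) -> Dict[str, str]:
    """Assign each independent task to the PE on which it would finish earliest."""
    pes = {pe for times in exec_time.values() for pe in times}
    load = {pe: 0.0 for pe in pes}  # accumulated busy time per PE
    mapping: Dict[str, str] = {}
    # Place the tasks with the largest best-case cost first.
    order = sorted(exec_time, key=lambda t: min(exec_time[t].values()), reverse=True)
    for task in order:
        best = min(exec_time[task], key=lambda pe: (load[pe] + exec_time[task][pe], pe))
        mapping[task] = best
        load[best] += exec_time[task][best]
    return mapping


def makespan(mapping: Dict[str, str], exec_time: Dict[str, Dict[str, float]]) -> float:
    """Finish time of the busiest PE under a given mapping."""
    load: Dict[str, float] = {}
    for task, pe in mapping.items():
        load[pe] = load.get(pe, 0.0) + exec_time[task][pe]
    return max(load.values())


if __name__ == "__main__":
    m = greedy_map(EXAMPLE_TIMES)
    print(m, makespan(m, EXAMPLE_TIMES))
```

Placing the costliest tasks first is a common greedy choice; the sketch ignores inter-task communication and dependences, which a real MPSoC mapper would have to model.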

Enhanced Processor-Architecture for the Faster Processing of Genetic Algorithm (유전 알고리즘 처리속도 향상을 위한 강화 프로세서 구조)

  • Yoon, Han-Ul;Sim, Kwee-Bo
    • Journal of the Korean Institute of Intelligent Systems / v.15 no.2 / pp.224-229 / 2005
  • Genetic algorithms (GAs) generally have high time and space complexity when run on a typical processor, which forces the use of expensive high-performance processors. This is also a barrier to implementing GAs on real devices, such as small mobile robots, that require only simple rules. To solve this problem, this paper proposes an enhanced processor architecture for faster GA processing. A typical processor architecture can be enhanced and specialized in two ways: with a sorting network and with a residue number system (RNS). A sorting network reduces the time complexity of comparing the fitness values of the population. An RNS reduces the largest bit width that the arithmetic units must handle, which dictates the speed of arithmetic operations. Consequently, the total logic size becomes smaller and arithmetic operations become faster.
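Both enhancements can be illustrated in software. The sketch below, assuming a population size that is a power of two, builds a bitonic sorting network (one common sorting-network construction; the paper may use another) whose compare-and-swap steps within each stage are independent and could run in parallel in hardware. It then performs RNS addition and multiplication on small residues and reconstructs the result with the Chinese remainder theorem. The moduli (7, 11, 13) are a hypothetical choice.

```python
from math import prod
from typing import List, Sequence, Tuple


# ---- Sorting network: bitonic construction for n = 2**k inputs ----
def bitonic_network(n: int) -> List[Tuple[int, int, bool]]:
    """Return comparators (i, j, ascending); those in the same (k, j) step are independent."""
    if n <= 0 or n & (n - 1):
        raise ValueError("n must be a power of two")
    comparators = []
    k = 2
    while k <= n:
        j = k // 2
        while j > 0:
            for i in range(n):
                partner = i ^ j
                if partner > i:
                    comparators.append((i, partner, (i & k) == 0))
            j //= 2
        k *= 2
    return comparators


def run_network(values: Sequence[float], comparators) -> List[float]:
    """Apply the compare-and-swap elements in order; the output is sorted ascending."""
    v = list(values)
    for i, j, ascending in comparators:
        if (v[i] > v[j]) == ascending:
            v[i], v[j] = v[j], v[i]
    return v


# ---- Residue number system with hypothetical pairwise-coprime moduli ----
MODULI = (7, 11, 13)  # dynamic range M = 1001
M = prod(MODULI)


def to_rns(x: int) -> Tuple[int, ...]:
    return tuple(x % m for m in MODULI)


def rns_add(a, b):
    # Each channel is independent: no carries cross between residues.
    return tuple((x + y) % m for x, y, m in zip(a, b, MODULI))


def rns_mul(a, b):
    return tuple((x * y) % m for x, y, m in zip(a, b, MODULI))


def from_rns(r) -> int:
    """Chinese-remainder reconstruction of the integer in [0, M)."""
    total = 0
    for ri, m in zip(r, MODULI):
        mi = M // m
        total += ri * mi * pow(mi, -1, m)  # modular inverse; needs Python 3.8+
    return total % M
```

Because each residue channel only handles numbers below its modulus, carries never propagate between channels, which is the property that lets RNS hardware use narrower adders and multipliers.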

Implementation of an Optimal Many-core Processor for Beamforming Algorithm of Mobile Ultrasound Image Signals (모바일 초음파 영상신호의 빔포밍 기법을 위한 최적의 매니코어 프로세서 구현)

  • Choi, Byong-Kook;Kim, Jong-Myon
    • Journal of the Korea Society of Computer and Information / v.16 no.8 / pp.119-128 / 2011
  • This paper presents a design space exploration of many-core processors that meet the high-performance and low-power requirements of the beamforming algorithm for mobile ultrasound image signals. For the exploration, we mapped different amounts of ultrasound image data to each processing element (PE) of the many-core processor and then determined the optimal many-core architecture in terms of execution time, energy efficiency, and area efficiency. Experimental results indicate that PE=4096 and PE=1024 provide the highest energy efficiency and area efficiency, respectively. In addition, PE=4096 achieves 46x higher energy efficiency and 10x higher area efficiency than the TI DSP C6416, which is widely used in ultrasound imaging devices.
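The abstract does not specify the beamforming variant. Delay-and-sum is the classic receive beamformer, so the sketch below assumes it: for each image point, each channel is sampled at its two-way time of flight and the samples are summed. It also assumes a plane-wave transmit, a sound speed of 1540 m/s, and a hypothetical 40 MHz sampling rate. The `pixels_for_pe` helper shows one hypothetical way to split pixels evenly across PEs, the kind of data-per-PE choice the exploration varies.

```python
import math
from typing import Sequence

SPEED_OF_SOUND = 1540.0  # m/s, common soft-tissue assumption
SAMPLE_RATE = 40e6       # Hz, hypothetical ADC sampling rate


def das_pixel(channels: Sequence[Sequence[float]], element_x: Sequence[float],
              x: float, z: float, c: float = SPEED_OF_SOUND,
              fs: float = SAMPLE_RATE) -> float:
    """Delay-and-sum value of the image point (x, z), in metres.

    Assumes a plane wave transmitted straight down (path length z) and a
    receive path from the point back to each element at lateral position
    element_x[k]; delays are rounded to the nearest sample.
    """
    total = 0.0
    for samples, ex in zip(channels, element_x):
        t = (z + math.hypot(x - ex, z)) / c  # two-way time of flight
        idx = round(t * fs)
        if 0 <= idx < len(samples):
            total += samples[idx]
    return total


def pixels_for_pe(pe: int, num_pe: int, num_pixels: int) -> range:
    """Contiguous block of pixel indices for one PE (hypothetical even split)."""
    per_pe = -(-num_pixels // num_pe)  # ceiling division
    start = pe * per_pe
    return range(start, min(start + per_pe, num_pixels))
```

Every pixel is computed independently, which is why beamforming maps naturally onto a large array of PEs; the trade-off explored in the paper is how many pixels each PE should handle.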

Efficient Indirect Branch Predictor Based on Data Dependence (효율적인 데이터 종속 기반의 간접 분기 예측기)

  • Paik Kyoung-Ho;Kim Eun-Sung
    • Journal of the Institute of Electronics Engineers of Korea CI / v.43 no.4 s.310 / pp.1-14 / 2006
  • Indirect branch instructions are one of the most significant obstacles to exploiting the ILP of modern high-performance processors. The target address of an indirect branch is polymorphic and changes dynamically, so it is very difficult to predict accurately. Consequently, the performance of a processor that relies on speculation drops significantly, because each misprediction costs many cycles of delay. We propose a novel, highly accurate indirect branch prediction scheme called data-dependence-based prediction. The predictor achieves a prediction accuracy of 98.92% with 1K entries and 99.95% with 8K entries. However, all proposed indirect branch predictors, including ours, have a large hardware overhead for storing predicted target addresses as well as the tags used to alleviate aliasing. Hence, we also propose a scheme that minimizes this hardware overhead without sacrificing prediction accuracy. Our experimental results show that, for the tag overhead, hardware is reduced by about 60% with no performance loss, and by about 80% at a performance loss of only 0.1%. For the overhead of storing target addresses, about 35% of the hardware can be saved with no performance loss, and about 45% at a performance loss of only 1.11%.
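The data-dependence-based predictor itself is not described in enough detail to reproduce. The sketch below is a generic, direct-mapped, tagged indirect target cache that illustrates where the two overheads discussed above come from: every entry stores a partial tag (to limit aliasing) and a full target address. The `key` argument stands in for whatever value the prediction is correlated with; hashing it with the branch PC by XOR is a hypothetical choice.

```python
from typing import List, Optional


class TaggedTargetCache:
    """Direct-mapped indirect-branch target cache with partial tags (generic sketch)."""

    def __init__(self, entries: int = 1024, tag_bits: int = 8, target_bits: int = 32):
        if entries <= 0 or entries & (entries - 1):
            raise ValueError("entries must be a power of two")
        self.entries = entries
        self.index_bits = entries.bit_length() - 1
        self.tag_bits = tag_bits
        self.target_bits = target_bits
        self.tags: List[Optional[int]] = [None] * entries
        self.targets: List[int] = [0] * entries

    def _slot(self, pc: int, key: int):
        h = (pc >> 2) ^ key  # hypothetical hash of branch PC and correlating value
        index = h & (self.entries - 1)
        tag = (h >> self.index_bits) & ((1 << self.tag_bits) - 1)
        return index, tag

    def predict(self, pc: int, key: int) -> Optional[int]:
        """Predicted target, or None on a tag miss."""
        index, tag = self._slot(pc, key)
        return self.targets[index] if self.tags[index] == tag else None

    def update(self, pc: int, key: int, target: int) -> None:
        index, tag = self._slot(pc, key)
        self.tags[index] = tag
        self.targets[index] = target

    def storage_bits(self) -> int:
        """Bits spent on tags plus stored target addresses."""
        return self.entries * (self.tag_bits + self.target_bits)
```

Shrinking `tag_bits` or the stored target width reduces `storage_bits()` but raises the chance that two branches alias to the same entry, which is the trade-off the abstract quantifies.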

Design and Analysis of a Class of Fault Tolerant Multistage Interconnection Networks: the Augmented Modified Delta (AMD) Network (AMD 고장감내 다단계 상호 연결망의 설계 및 분석)

  • Kim, Jung-Sun
    • The Transactions of the Korea Information Processing Society / v.4 no.9 / pp.2259-2268 / 1997
  • Multistage interconnection networks (MINs) provide high-bandwidth communication between processors and/or memory modules in a cost-effective way. In this paper, we propose a class of multipath MINs, called the Augmented Modified Delta (AMD) network, and analyze its performance and reliability. The salient features of the AMD network include fault tolerance, a modular structure, and high performance, which are essential for real-time parallel/distributed processing environments. The AMD network class retains the well-known characteristics of the Kappa network, but its design procedure is more systematic. Like Delta networks, all AMD networks are topologically equivalent to one another.
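The AMD construction itself is not given in the abstract. As background, the sketch below routes a packet through an omega network, a standard member of the delta class of MINs, using destination-tag routing: at each stage of 2x2 switches, one bit of the destination address selects the output port. It also shows why a single-path network is fragile: one faulty switch on the unique path blocks the connection, which is what multipath designs such as AMD aim to avoid. Modelling faults as a set of `(stage, switch)` pairs is an assumption.

```python
from typing import List, Set, Tuple


def omega_route(src: int, dst: int, n: int) -> List[Tuple[int, int, int]]:
    """Destination-tag routing in an n-stage omega network (2**n ports, 2x2 switches).

    Returns (stage, switch, output_port) for each hop.
    """
    mask = (1 << n) - 1
    addr = src
    hops = []
    for stage in range(n):
        addr = ((addr << 1) | (addr >> (n - 1))) & mask  # perfect shuffle (rotate left)
        bit = (dst >> (n - 1 - stage)) & 1               # destination bit, MSB first
        hops.append((stage, addr >> 1, bit))             # switch id = address without LSB
        addr = (addr & ~1) | bit                          # exit on upper (0) or lower (1) port
    assert addr == dst, "routing must end at the destination"
    return hops


def path_blocked(src: int, dst: int, n: int, faulty: Set[Tuple[int, int]]) -> bool:
    """True if the unique src->dst path crosses a faulty (stage, switch)."""
    return any((stage, sw) in faulty for stage, sw, _ in omega_route(src, dst, n))
```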

Development of WLAN AP based on IBM 405GP (IBM PowerPC 405GP를 이용한 Wireless LAN Access Point 개발에 관한 연구)

  • Kim Do-Gyu
    • The Journal of Information Technology / v.6 no.3 / pp.65-73 / 2003
  • An embedded Linux evaluation board for a wireless LAN access point (AP) is implemented. The board consists of an IBM PowerPC 405GP processor, the PPCBoot-1.2.1 boot loader, the Linux-2.4.21 kernel, and a root file system. It has two flash memories: a 512 KB boot flash and a 16 MB application flash. It supports IEEE 802.11a, which provides a maximum data rate of 54 Mbps in the 5.2 GHz frequency band. MTD (Memory Technology Device) and JFFS2 (Journalling Flash File System version 2) are adopted to package the system software (boot loader, kernel, and root file system) efficiently. To optimize the root file system, the BusyBox package and TinyLogin are used. The Linux kernel and root file system are combined with the mkimage utility.
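As an illustration of how flash partitions can be described to the MTD layer, the sketch below builds a `mtdparts=` kernel command-line string in the syntax of the Linux command-line partition parser (`cmdlinepart`). The abstract does not say how the board's partitions were defined (board-specific map drivers with hard-coded partition tables were also common in 2.4-era kernels). The MTD id `appflash` and the 1 MB kernel partition are hypothetical; only the 16 MB application-flash size comes from the text.

```python
from typing import List, Optional, Tuple


def mtdparts(mtd_id: str, parts: List[Tuple[str, Optional[int]]], total_size: int) -> str:
    """Build a `mtdparts=` string; a size of None means "rest of the flash" (last entry only)."""
    specs = []
    used = 0
    for idx, (name, size) in enumerate(parts):
        if size is None:
            if idx != len(parts) - 1:
                raise ValueError("only the last partition may take the remainder")
            if used >= total_size:
                raise ValueError("no space left for the remainder partition")
            specs.append(f"-({name})")
            continue
        if size <= 0 or size % 1024:
            raise ValueError("sizes must be positive multiples of 1 KiB")
        used += size
        specs.append(f"{size // 1024}k({name})")
    if used > total_size:
        raise ValueError("partitions exceed the flash size")
    return f"mtdparts={mtd_id}:" + ",".join(specs)


# Hypothetical layout for the 16 MB application flash: kernel, then a JFFS2 root file system.
print(mtdparts("appflash", [("kernel", 1 << 20), ("rootfs", None)], 16 << 20))
```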

Sequential and Selective Recovery Mechanism for Value Misprediction (값 예측 오류를 위한 순차적이고 선택적인 복구 방식)

  • 이상정;전병찬
    • Journal of KIISE:Computer Systems and Theory / v.31 no.1_2 / pp.67-77 / 2004
  • Value prediction is a technique that obtains performance gains by using the predicted value of an instruction to supply source values to its data-dependent instructions earlier. To fully exploit the potential of value speculation, however, an efficient recovery mechanism is needed for value mispredictions. In this paper, we propose a sequential and selective recovery mechanism for value misprediction. It searches the data dependency chain of the mispredicted instruction sequentially, without pipeline stalls and without adverse impact on clock cycle time. In our scheme, only the instructions that depend on the predicted instruction are selectively squashed and reissued on a value misprediction.
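The selective part of the scheme can be modelled at the instruction-window level. The sketch below walks the window sequentially from the mispredicted instruction and marks only the instructions that transitively consume its value; an independent write to a tainted register ends the chain for that register. It assumes architectural register names in program order rather than the renamed tags real hardware would track, and it is not the authors' hardware mechanism.

```python
from typing import List, NamedTuple, Optional, Tuple


class Instr(NamedTuple):
    dest: Optional[str]     # destination register (None for stores/branches)
    srcs: Tuple[str, ...]   # source registers


def selective_reissue(window: List[Instr], mispredicted: int) -> List[int]:
    """Indices of younger instructions that transitively depend on the mispredicted one."""
    tainted = {window[mispredicted].dest} - {None}
    reissue = []
    for i in range(mispredicted + 1, len(window)):
        ins = window[i]
        if tainted.intersection(ins.srcs):
            reissue.append(i)            # consumed a wrong value: squash and reissue
            if ins.dest is not None:
                tainted.add(ins.dest)
        elif ins.dest in tainted:
            tainted.discard(ins.dest)    # overwritten by an independent value
    return reissue


WINDOW = [
    Instr("r1", ("r9",)),        # 0: value-predicted, later found wrong
    Instr("r2", ("r1", "r3")),   # 1: depends on r1
    Instr("r4", ("r5", "r6")),   # 2: independent
    Instr("r5", ("r2",)),        # 3: depends via r2
    Instr("r1", ("r7",)),        # 4: redefines r1 independently
    Instr("r8", ("r1", "r4")),   # 5: reads the new r1, so independent
]
```

Only instructions 1 and 3 need reissuing here; a full squash would also discard the independent work in instructions 2, 4, and 5.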

Design of Battery Charge-Discharge Controller for Renewable Energy System -Focusing on Solar Battery Charge-Discharge Controller - (신재생 에너지 시스템을 위한 축전지 충방전 컨트롤러 설계 -태양광 발전 축전지 충방전 컨트롤러를 중심으로-)

  • Lee, Jae-Min
    • Journal of the Korea Academia-Industrial cooperation Society / v.8 no.6 / pp.1363-1368 / 2007
  • To utilize renewable energy such as solar and wind power, a high-performance battery charge-discharge controller is essential. In this paper, a PIC microcontroller-based battery charge-discharge controller for a solar power system is designed and implemented. The PIC16C711 microcontroller and the CCS-C compiler are used to realize stable and accurate operation of the battery controller. The proposed controller is designed so that the charged battery power can also be used during the daytime, for the user's convenience. A current-control function is included so that the controller can cope with the various new-material energy systems expected in the near future.
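The controller firmware is written in CCS-C and is not reproduced in the abstract. As a Python model of typical charge-discharge logic, the sketch below uses hysteresis: charging stops at a full-charge voltage and resumes only below a lower threshold, and the load is disconnected at a low-voltage limit and reconnected only above a higher one. All thresholds and the panel-voltage margin are hypothetical values for a nominal 12 V lead-acid battery, not figures from the paper, and the current-control function is not modelled.

```python
from dataclasses import dataclass

# Hypothetical thresholds for a nominal 12 V lead-acid battery.
V_FULL = 14.4        # stop charging at or above this voltage
V_RESUME = 13.2      # resume charging at or below this voltage
V_LVD = 11.5         # low-voltage disconnect of the load
V_LVR = 12.6         # reconnect the load at or above this voltage
PANEL_MARGIN = 0.5   # panel must exceed the battery by this much to charge


@dataclass(frozen=True)
class ControllerState:
    charge_on: bool = False
    load_on: bool = True


def control_step(s: ControllerState, v_batt: float, v_panel: float) -> ControllerState:
    """One control-loop iteration with hysteresis on both the charge and load switches."""
    panel_ok = v_panel > v_batt + PANEL_MARGIN
    if s.charge_on:
        charge = panel_ok and v_batt < V_FULL
    else:
        charge = panel_ok and v_batt <= V_RESUME
    if s.load_on:
        load = v_batt > V_LVD
    else:
        load = v_batt >= V_LVR
    return ControllerState(charge_on=charge, load_on=load)
```

The gap between each pair of thresholds keeps the switches from chattering when the battery voltage hovers near a single limit.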

On parallel computation for 3-d analysis of flow/wave field (3차원 유동/파동장 해석을 위한 병렬계산에 관한 고찰)

  • Lee, Woo-Dong;Hur, Dong-Soo
    • Proceedings of the Korea Water Resources Association Conference / 2019.05a / pp.88-88 / 2019
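  • Owing to improvements in computer performance and advances in numerical techniques, 3D flow/wave field analyses using numerical models based on the Navier-Stokes equations are increasingly common. However, the computational load of Navier-Stokes models is still too heavy for a PC. Moreover, when the computational domain extends beyond laboratory scale to an actual field site, the amount of computation increases enormously. Overcoming this requires parallel computation. In this study, we build a large-scale parallel computing system using LES-WASS-3D, a computationally demanding 3D numerical model based on the Navier-Stokes equations. We then examine the performance and applicability of this parallel computing system for detailed or wide-area 3D flow/wave field analyses. Because all PCs currently on the market are equipped with multiple processors, parallel computation can be performed on them easily. However, detailed or wide-area analyses require large-scale parallel computers. Therefore, in this study, we build a high-performance parallel computing system in a shared-memory environment equipped with coprocessors. We also convert LES-WASS-3D, an existing 3D Navier-Stokes model written as sequential Fortran code, into parallel code. Numerical analyses are performed to examine the parallel computing performance and applicability. Through this process, we confirmed the performance and applicability of the parallel computing system built in this study. This work also lays the groundwork not only for improving accuracy in 3D flow/wave field analysis but also for enlarging the computational domain. Furthermore, we expect the system to be fully applicable to topographic change analysis, which requires more computation time than flow/wave analysis.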
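The abstract does not say which parallel programming model was used for the Fortran code. The sketch below illustrates the general idea of shared-memory domain decomposition on a simple 2D Jacobi (diffusion) update: the interior rows are split into contiguous slabs, each worker updates its slab from the shared previous-step array, and the result matches the serial update exactly. Python threads are used only to show the decomposition; because of the global interpreter lock, they would not speed up this pure-Python loop.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List, Tuple

Grid = List[List[float]]


def update_rows(u: Grid, rows: List[int]) -> List[Tuple[int, List[float]]]:
    """Jacobi update of the given interior rows, reading only the old grid u."""
    nx = len(u[0])
    out = []
    for i in rows:
        new_row = u[i][:]  # boundary columns are copied unchanged
        for j in range(1, nx - 1):
            new_row[j] = 0.25 * (u[i - 1][j] + u[i + 1][j] + u[i][j - 1] + u[i][j + 1])
        out.append((i, new_row))
    return out


def jacobi_step(u: Grid, workers: int = 1) -> Grid:
    """One step with the interior rows split into contiguous slabs, one per worker."""
    interior = list(range(1, len(u) - 1))
    size = max(1, -(-len(interior) // workers))  # ceiling division
    slabs = [interior[k:k + size] for k in range(0, len(interior), size)]
    new = [row[:] for row in u]  # boundary rows are kept
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for result in pool.map(lambda rows: update_rows(u, rows), slabs):
            for i, row in result:
                new[i] = row
    return new
```

Because each slab reads only the previous step and writes disjoint rows, no locking is needed; a compiled Fortran code would typically get the same property from a parallel loop over the outer index.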

An Architecture of a high efficient ALU for 3D Graphics Shader Processor (3D 그래픽 쉐이더 프로세서를 위한 고효율 연산기 구조)

  • Kim, Woo-Young;Lee, Bo-Haeng;Lee, Kwang-Yeob;Park, Tae-ryung
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference / 2009.05a / pp.229-232 / 2009
  • In this paper, we propose a new programmable shader architecture based on an efficient ALU. Today's mobile devices need programmable shader processors for three-dimensional (3D) graphics. Programmable shader processors require a larger ALU than the fixed-pipeline ALUs used previously. The proposed ALU architecture can execute two different arithmetic operations at the same time: two instructions that need mutually exclusive ALU operations are fed into the instruction decoders in parallel. Experimental results show that the number of instruction cycles can be reduced by up to 40%.
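The abstract leaves the pairing rule implicit. Assuming that "exclusive ALU operations" means two instructions that use different ALU units and have no data dependence on each other, the sketch below counts issue cycles with and without pairing adjacent instructions. The unit classes and the instruction stream are hypothetical and do not reproduce the paper's 40% figure.

```python
from typing import List, NamedTuple, Tuple

# Hypothetical mapping of shader opcodes to ALU units.
UNIT = {"add": "adder", "mul": "multiplier", "mad": "multiplier", "rcp": "special"}


class Op(NamedTuple):
    opcode: str
    dest: str
    srcs: Tuple[str, ...]


def can_pair(a: Op, b: Op) -> bool:
    """Two ops can issue together if they use different units and are independent."""
    return (UNIT[a.opcode] != UNIT[b.opcode]
            and a.dest not in b.srcs
            and b.dest not in a.srcs
            and a.dest != b.dest)


def issue_cycles(program: List[Op], dual_issue: bool = True) -> int:
    """Count issue cycles, pairing adjacent ops in program order when allowed."""
    cycles, i = 0, 0
    while i < len(program):
        if dual_issue and i + 1 < len(program) and can_pair(program[i], program[i + 1]):
            i += 2
        else:
            i += 1
        cycles += 1
    return cycles


PROGRAM = [
    Op("mul", "r0", ("r1", "r2")),
    Op("add", "r3", ("r4", "r5")),  # different unit, independent: pairs with the mul
    Op("add", "r6", ("r0", "r3")),
    Op("mul", "r7", ("r6", "r1")),  # reads r6, so it cannot pair with the add above
]
```

Pairing only adjacent instructions keeps the decoder simple; a compiler could reorder independent instructions to create more pairs.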
