• Title/Summary/Keyword: parallel processor (병렬프로세서)


Implementation of a Client Display Interface for Mobile Devices via Serial Transfer (모바일 직렬 전송방식의 클라이언트 디스플레이 인터페이스 구현)

  • Park Sang-Woo;Lee Yong-Hwan
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference
    • /
    • 2006.05a
    • /
    • pp.522-525
    • /
    • 2006
  • Mobile devices now support many functions, such as 3D games, wireless internet, video playback, DMB, GPS, and PMP. These functions require larger displays and a faster interface. However, conventional parallel interfaces between the processor and the display module cannot sustain such high-speed transfers, and high-speed serial interfaces are emerging as an alternative. The advantages of a serial interface are high bandwidth, few interconnections, low power consumption, and good electromagnetic interference (EMI) characteristics. In this paper, we implement a serial interface and use it for a display module. LVDS is used for the PHY layer, and a defined packet format is used for the link layer. The main feature of the implemented serial interface is a reduced number of interconnections with sufficient bandwidth.


A Hybrid Value Predictor using Speculative Update of the Predictor Table and Static Classification for the Pattern of Executed Instructions in Superscalar Processors (슈퍼스칼라 프로세서에서 예상 테이블의 모험적 갱신과 명령어 실행 유형의 정적 분류를 이용한 혼합형 결과값 예측기)

  • Park, Hong-Jun;Jo, Young-Il
    • Journal of KIISE:Computing Practices and Letters
    • /
    • v.8 no.1
    • /
    • pp.107-115
    • /
    • 2002
  • We propose a new hybrid value predictor that achieves high performance by combining several predictors. Because the proposed hybrid value predictor can update the prediction table speculatively, it efficiently reduces the number of instructions mispredicted due to stale data. The proposed predictor also improves prediction accuracy and reduces the hardware cost of the predictor, because it assigns each instruction to the best-suited predictor during the instruction fetch stage, using static classification information obtained from a profile-based compiler implementation. For a 16-issue superscalar processor, simulation results based on the SimpleScalar/PISA tool set show an average prediction rate of 73% with speculative update and an average prediction rate of 88% when static classification is added, for the SPECint95 benchmark programs.

Embedding Mesh-Like Networks into Petersen-Torus(PT) Networks (메쉬 부류 네트워크를 피터슨-토러스(PT) 네트워크에 임베딩)

  • Seo, Jung-Hyun;Lee, Hyeong-Ok;Jang, Moon-Suk
    • The KIPS Transactions:PartA
    • /
    • v.15A no.4
    • /
    • pp.189-198
    • /
    • 2008
  • In this paper, we prove that mesh-like networks can be embedded into Petersen-Torus (PT) networks. Once an interconnection network G is embedded in H, a parallel algorithm designed for G can be applied to the interconnection network H. The torus is embedded into PT with dilation 5, link congestion 5, and expansion 1 using a one-to-one embedding. The honeycomb mesh is embedded into PT with dilation 5, link congestion 2, and expansion 5/3 using a one-to-one embedding. Additionally, we derive the average dilation. Because the widely known torus and honeycomb mesh networks are embedded into PT with dilation and congestion of 5 or less, the embedding algorithm can be used in both wormhole routing and store-and-forward routing systems, and, because the embedding is one-to-one, the processor throughput needed during simulation can be minimized.
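The dilation figure quoted in the abstract above is the largest host-network distance between the images of two adjacent guest-network nodes. The following is a minimal sketch of that definition on a toy case (a 4-node ring embedded into a 4-node path), not on the Petersen-Torus network itself. The graphs, mappings, and function names are illustrative assumptions.

```python
from collections import deque


def distances_from(graph, source):
    """Breadth-first search: hop count from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in graph[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist


def dilation(guest, host, mapping):
    """Largest host distance between the images of two adjacent guest nodes."""
    worst = 0
    for u, neighbors in guest.items():
        dist = distances_from(host, mapping[u])
        for v in neighbors:
            worst = max(worst, dist[mapping[v]])
    return worst


# Toy graphs (assumed for illustration): a 4-node ring and a 4-node path.
ring = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}

identity = {0: 0, 1: 1, 2: 2, 3: 3}      # ring edge (3, 0) is stretched across the path
interleaved = {0: 0, 1: 1, 2: 3, 3: 2}   # folds the ring so no edge spans more than 2 hops

identity_dilation = dilation(ring, path, identity)
interleaved_dilation = dilation(ring, path, interleaved)
```

Both mappings are one-to-one with expansion 1 (same node count in guest and host), yet they differ in dilation, which is why the choice of mapping matters.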

2-Level Adaptive Branch Prediction Based on Set-Associative Cache (세트 연관 캐쉬를 사용한 2단계 적응적 분기 예측)

  • Shim, Won
    • The KIPS Transactions:PartA
    • /
    • v.9A no.4
    • /
    • pp.497-502
    • /
    • 2002
  • Conditional branches can severely limit instruction-level parallelism by causing branch penalties. 2-level adaptive branch predictors were developed to obtain accurate branch prediction in high-performance superscalar processors. Although 2-level adaptive branch predictors achieve very high prediction accuracy, they tend to be very costly. In this paper, set-associative cached correlated 2-level branch predictors are proposed to overcome the cost problem of conventional 2-level adaptive branch predictors. According to simulation results, cached correlated predictors deliver higher prediction accuracy than conventional predictors at a significantly lower cost. The best misprediction rates of the global and local cached correlated predictors using set-associative caches are 5.99% and 6.28%, respectively. They achieve 54% and 17% improvements over those of the conventional 2-level adaptive branch predictors.
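As background for the abstract above, a conventional global 2-level adaptive predictor can be sketched as a history register (first level) that indexes a table of 2-bit saturating counters (second level). This is a generic textbook sketch, not the set-associative cached variant the paper proposes. The class name, the 4-bit history length, and the test pattern are assumptions.

```python
class TwoLevelPredictor:
    """Global 2-level predictor: branch history indexes a table of 2-bit counters."""

    def __init__(self, history_bits=4):
        self.history_bits = history_bits
        self.history = 0                             # first level: recent outcomes as bits
        self.counters = [1] * (1 << history_bits)    # second level: start weakly not-taken

    def predict(self):
        return self.counters[self.history] >= 2      # True means "taken"

    def update(self, taken):
        # Saturating 2-bit counter update for the current history pattern.
        count = self.counters[self.history]
        self.counters[self.history] = min(3, count + 1) if taken else max(0, count - 1)
        # Shift the outcome into the history register.
        mask = (1 << self.history_bits) - 1
        self.history = ((self.history << 1) | int(taken)) & mask


def misprediction_rate(predictor, outcomes):
    misses = 0
    for taken in outcomes:
        if predictor.predict() != taken:
            misses += 1
        predictor.update(taken)
    return misses / len(outcomes)


# Assumed workload: a loop branch taken three times, then not taken, repeated.
pattern = [True, True, True, False] * 50
rate = misprediction_rate(TwoLevelPredictor(history_bits=4), pattern)
```

After a short warm-up, each history pattern maps to a single next outcome, so the predictor stops missing on this periodic branch. The table grows as 2^history_bits, which is the cost that motivates cheaper organizations.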

Efficiency Low-Power Signal Processing for Multi-Channel LiDAR Sensor-Based Vehicle Detection Platform (멀티채널 LiDAR 센서 기반 차량 검출 플랫폼을 위한 효율적인 저전력 신호처리 기법)

  • Chong, Taewon;Park, Daejin
    • Journal of the Korea Institute of Information and Communication Engineering
    • /
    • v.25 no.7
    • /
    • pp.977-985
    • /
    • 2021
  • The LiDAR sensor is attracting attention as a key sensor for autonomous vehicles. A LiDAR sensor uses a laser to measure three-dimensional distances within its range. However, because it delivers a large volume of data to the external system, that data is difficult to process in an external system or in the vehicle's processor. To resolve this issue, we develop an integrated processing system for LiDAR sensors. The system is configured so that each client receives and processes data from a LiDAR sensor, while the server gathers data from the clients and transmits the integrated data in real time. To verify real-time processing, we tested the system while varying the data acquisition method, the processing method, and the way the processes are run. In the experiment, when data was received from four LiDAR sensors and the client and server processes ran in the background or with multi-core processing, the system response time was about 13.2 ms for each client and about 12.6 ms for the server.

High-Speed Implementation to CHAM-64/128 Counter Mode with Round Key Pre-Load Technique (라운드 키 선행 로드를 통한 CHAM-64/128 카운터 모드 고속 구현)

  • Kwon, Hyeok-dong;Jang, Kyoung-bae;Park, Jae-hoon;Seo, Hwa-jeong
    • Journal of the Korea Institute of Information Security & Cryptology
    • /
    • v.30 no.6
    • /
    • pp.1217-1223
    • /
    • 2020
  • CHAM is a lightweight block cipher for low-end processors, developed by the National Security Research Institute of Korea. A block cipher needs a mode of operation to be used efficiently; among the modes, counter (CTR) mode is efficient because it is easy to implement and supports parallel operation. In this paper, we propose an optimized implementation of CHAM in CTR mode. The proposed implementation skips some rounds through pre-computation, so it runs faster than the existing CHAM implementation. It also pre-loads some of the round keys into registers before entering the round functions, which reduces round-key loading time by 160 cycles. As a result, the proposed implementation achieves about 6.8% and 4.5% higher performance in the fixed-key and variable-key scenarios, respectively.
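The parallel-friendly structure of CTR mode mentioned in the abstract above can be sketched as follows. The block function here is a hash-based placeholder and is NOT CHAM; the 4-byte nonce, 4-byte counter layout, and all names are assumptions for illustration only. Only the 64-bit block size is taken from the CHAM-64/128 name.

```python
import hashlib

BLOCK_BYTES = 8  # 64-bit block, as in CHAM-64/128; everything else is a stand-in


def toy_block_function(key: bytes, block: bytes) -> bytes:
    """Placeholder for the block cipher's encryption routine; NOT CHAM."""
    return hashlib.sha256(key + block).digest()[:BLOCK_BYTES]


def keystream_block(key: bytes, nonce: bytes, counter: int) -> bytes:
    # Each keystream block depends only on (key, nonce, counter), so blocks
    # can be computed independently, in any order, or in parallel.
    counter_block = nonce + counter.to_bytes(BLOCK_BYTES - len(nonce), "big")
    return toy_block_function(key, counter_block)


def ctr_xor(key: bytes, nonce: bytes, data: bytes) -> bytes:
    """Encrypt or decrypt: XOR the data with the keystream (same operation both ways)."""
    out = bytearray()
    for offset in range(0, len(data), BLOCK_BYTES):
        stream = keystream_block(key, nonce, offset // BLOCK_BYTES)
        chunk = data[offset:offset + BLOCK_BYTES]
        out.extend(b ^ s for b, s in zip(chunk, stream))
    return bytes(out)


key = bytes(range(16))
nonce = b"\x00\x01\x02\x03"
message = b"counter mode needs no padding"
ciphertext = ctr_xor(key, nonce, message)
recovered = ctr_xor(key, nonce, ciphertext)
```

Because only the counter portion of the input block changes between blocks, part of the cipher's early computation on the fixed portion can in principle be done once and reused, which is the kind of pre-computation the abstract refers to.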

Performance Analysis of Implementation on Image Processing Algorithm for Multi-Access Memory System Including 16 Processing Elements (16개의 처리기를 가진 다중접근기억장치를 위한 영상처리 알고리즘의 구현에 대한 성능평가)

  • Lee, You-Jin;Kim, Jea-Hee;Park, Jong-Won
    • Journal of the Institute of Electronics Engineers of Korea CI
    • /
    • v.49 no.3
    • /
    • pp.8-14
    • /
    • 2012
  • Faster image processing is in great demand as high-quality visual media and data-intensive image applications such as 3D TV, movies, and augmented reality (AR) spread. A SIMD computer attached to a host computer can accelerate various image processing and massive data operations. MAMS is a multi-access memory system that, together with multiple processing elements (PEs), is well suited to building a high-performance pipelined SIMD machine. MAMS supports simultaneous access to pq data elements within a horizontal, a vertical, or a block subarray with a constant interval at an arbitrary position in an $M \times N$ array of data elements, where the number of memory modules (MMs), m, is a prime number greater than pq. MAMS-PP4 is the first realization of the MAMS architecture and consists of four PEs in a single chip and five MMs. This paper presents the implementation of image processing algorithms and a performance analysis for MAMS-PP16, which consists of 16 PEs with 17 MMs and extends the prior work, MAMS-PP4. The newly designed MAMS-PP16 has a 64-bit instruction format and an application-specific instruction set. The authors develop a simulator of the MAMS-PP16 system on which the implemented algorithms can be executed, and carry out the performance analysis by running the implemented image processing algorithms on this simulator. Using the pyramid operation and comparing against a Pentium-based serial processor, the analysis shows that the processing time of MAMS-PP16 is consistent, whereas the response time of the serial processor varies randomly.

Performance Evaluation of VBR MPEG Video Storage and Retrieval Schemes in a VOD System (VOD 시스템에서의 가변 비트율 MPEG 비디오 저장 및 검색 기법의 성능 평가)

  • 전용희;박정숙
    • Journal of Korea Multimedia Society
    • /
    • v.4 no.1
    • /
    • pp.13-28
    • /
    • 2001
  • In a VOD (Video-On-Demand) system, video data are generally stored in a magnetic disk array. To meet the real-time requirements of data retrieval, video streams must be delivered to clients continuously so that timely delivery of continuous media can be guaranteed. Compared with the gains in processor and network performance, the performance of magnetic disk systems has improved only modestly. Disk array systems have been proposed and used to improve storage performance: they improve I/O performance by placing disks in parallel and retrieving data concurrently. In this paper, two approaches to accessing video data in a VOD system are considered: the CTL (Constant Time Length) and CDL (Constant Data Length) access policies. Disk scheduling policies are also classified into two categories and compared in terms of the maximum number of allowable video streams at different degrees of disk array synchronization, in mixed environments where both the data access policy and the disk scheduling policy are taken into account. Among the scheduling policies compared, LOOK showed the best performance. More gain was achieved with a higher degree of disk synchronization. Comparing CTL and CDL, CTL performed slightly better in terms of the maximum number of allowable streams.
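The LOOK policy named in the abstract above services pending requests in one sweep direction up to the last pending request, then reverses. A minimal sketch, comparing it with first-come-first-served on an assumed request queue (the cylinder numbers and starting head position are illustrative, not data from the paper):

```python
def look_order(head, requests, direction="up"):
    """Service order under LOOK: sweep to the last pending request, then reverse."""
    lower = sorted(r for r in requests if r < head)
    upper = sorted(r for r in requests if r >= head)
    if direction == "up":
        return upper + lower[::-1]
    return lower[::-1] + upper


def total_seek(head, order):
    """Total head movement, in cylinders, when servicing requests in the given order."""
    distance = 0
    for cylinder in order:
        distance += abs(cylinder - head)
        head = cylinder
    return distance


# Assumed queue of pending cylinder requests, head starting at cylinder 53.
requests = [98, 183, 37, 122, 14, 124, 65, 67]
fcfs_seek = total_seek(53, requests)                    # service in arrival order
look_seek = total_seek(53, look_order(53, requests))    # service in LOOK order
```

On this queue, LOOK moves the head 299 cylinders against 640 for arrival order, which illustrates why sweep-based policies can admit more concurrent streams per disk.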


A Pipelined Hash Join Method for Load Balancing (부하 균형 유지를 고려한 파이프라인 해시 조인 방법)

  • Moon, Jin-Gue;Park, No-Sang;Kim, Pyeong-Jung;Jin, Seong-Il
    • The KIPS Transactions:PartD
    • /
    • v.9D no.5
    • /
    • pp.755-768
    • /
    • 2002
  • We investigate the effect of data skew in join attributes on the performance of a pipelined multi-way hash join method, and propose two new hash join methods with load balancing capabilities. The first proposed method allocates buckets statically in round-robin fashion, and the second allocates buckets adaptively using a frequency distribution. With hash-based joins, multiple joins can be pipelined so that early results from a join are sent to the next join before the whole join completes, without being written to disk. Unless the pipelined execution of multiple hash joins includes a load balancing mechanism, the skew effect can severely degrade system performance. In this paper, we derive an execution model of the pipeline segment and a cost model, and develop a simulator for the study. As our simulation over a wide range of parameters shows, join selectivities and relation sizes degrade system performance more as the degree of data skew increases. However, the proposed method, using a large number of buckets and a tuning technique, offers substantial robustness against a wide range of skew conditions.
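The two allocation styles in the abstract above (static round-robin versus adaptive, frequency-based) can be contrasted with a small sketch. The greedy "largest bucket to the least-loaded node" rule, the modulo hash, and the contrived skewed key set are all assumptions for illustration; the paper's actual allocation and tuning rules are not given in the abstract.

```python
import heapq
from collections import Counter


def bucket_sizes(keys, num_buckets):
    # Stand-in hash function: integer join keys modulo the bucket count.
    counts = Counter(k % num_buckets for k in keys)
    return [counts[b] for b in range(num_buckets)]


def round_robin_loads(sizes, num_nodes):
    """Static allocation: bucket i goes to node i mod num_nodes, ignoring bucket size."""
    loads = [0] * num_nodes
    for bucket, size in enumerate(sizes):
        loads[bucket % num_nodes] += size
    return loads


def frequency_based_loads(sizes, num_nodes):
    """Adaptive allocation (assumed greedy rule): largest bucket to the least-loaded node."""
    heap = [(0, node) for node in range(num_nodes)]
    heapq.heapify(heap)
    for size in sorted(sizes, reverse=True):
        load, node = heapq.heappop(heap)
        heapq.heappush(heap, (load + size, node))
    return [load for load, _ in sorted(heap, key=lambda entry: entry[1])]


# Contrived skew: four hot key values plus a uniform background.
keys = [0, 4, 8, 12] * 100 + list(range(64))
sizes = bucket_sizes(keys, num_buckets=16)
static_loads = round_robin_loads(sizes, num_nodes=4)
adaptive_loads = frequency_based_loads(sizes, num_nodes=4)
```

The key set is chosen so that all hot buckets collide on one node under static dealing. With less adversarial skew, a large number of round-robin buckets already spreads the load, and the frequency-based rule mainly helps when a few buckets dominate.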

Implementation of RTOS Simulator With Execution Time Estimation (실행시간 추정 가능한 RTOS 시뮬레이터의 구현)

  • 김방현;류성준;김종현;남영광;이광용
    • Proceedings of the Korea Society for Simulation Conference
    • /
    • 2002.05a
    • /
    • pp.125-129
    • /
    • 2002
  • An RTOS simulator, one of the tools provided by real-time operating system (RTOS) development environments, offers a target simulation environment in which applications can be developed and debugged on the host even when the target hardware is not connected to it. This lets developers build applications quickly, even before hardware development is complete. For this reason, most commercial RTOS development environments now provide an RTOS simulator. However, most current commercial RTOS simulators implement only the functional aspects of the RTOS on the host, so they cannot estimate the actual time an RTOS or an RTOS application would take when run on the real target. Given that a real-time system must produce its results within a fixed time, this is the most serious shortcoming of an RTOS simulator, and an RTOS simulator that supports execution-time estimation and is also practical is therefore needed. In this study, we address this problem by developing a machine instruction-based RTOS simulator that can estimate the execution time of the RTOS and of RTOS applications on the actual target and that is suitable for commercialization. Furthermore, we improve the accuracy of the execution-time estimate by also accounting for the effects of the pipeline and the cache, which are major factors in execution time. The RTOS used in this study is Q+, developed in 2000 by the Electronics and Telecommunications Research Institute (ETRI), and the target hardware on which Q+ runs is the EBSA-285 board, equipped with a StrongARM SA-110 microprocessor of the ARM family and a 21285 main controller.
