• Title/Abstract/Keywords: parallel communication

Search results: 1,114 items (processing time: 0.027 s)

Low-Power CMOS image sensor with multi-column-parallel SAR ADC

  • Hyun, Jang-Su;Kim, Hyeon-June
    • 센서학회지 / Vol. 30, No. 4 / pp. 223-228 / 2021
  • This work presents a low-power CMOS image sensor (CIS) with a multi-column-parallel (MCP) readout structure, focusing on improving its performance over previous works. A delta readout scheme that exploits image characteristics is optimized for the MCP readout structure. By simply alternating the MCP readout direction for each row selection, no additional memory is required for the row-to-row delta readout, which reduces the occupied area compared to the previous work. In addition, the bias current of the pre-amplifier in the successive approximation register (SAR) analog-to-digital converter (ADC) changes according to the operating period to improve power efficiency. The prototype CIS chip was fabricated using a 0.18-µm CMOS process. A 160 × 120 pixel array with a 4.4-µm pitch was implemented with a 10-bit SAR ADC. The prototype CIS demonstrated a frame rate of 120 fps with a total power consumption of 1.92 mW.
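The delta readout idea in the abstract above can be illustrated in software. The following is a minimal sketch under one plausible reading of the abstract, not the authors' circuit: pixels are read in alternating (serpentine) direction row by row, so each row starts next to the pixel where the previous row ended, and only the difference from the previously read pixel is kept. The function names and the sample image are hypothetical.

```python
def serpentine_delta_encode(image):
    """Encode a 2-D image as deltas between consecutively read pixels.

    Even rows are read left to right and odd rows right to left
    (an assumed readout order), so only the last value read is remembered.
    """
    deltas = []
    prev = 0  # reference value before the first pixel
    for r, row in enumerate(image):
        order = row if r % 2 == 0 else row[::-1]
        for pixel in order:
            deltas.append(pixel - prev)
            prev = pixel
    return deltas


def serpentine_delta_decode(deltas, width):
    """Rebuild the image from the delta stream produced above."""
    image, row, prev = [], [], 0
    for d in deltas:
        prev += d
        row.append(prev)
        if len(row) == width:
            # Odd rows were read right to left, so flip them back.
            image.append(row if len(image) % 2 == 0 else row[::-1])
            row = []
    return image


sample = [[10, 11, 13],
          [14, 12, 11],
          [11, 12, 12]]
encoded = serpentine_delta_encode(sample)
restored = serpentine_delta_decode(encoded, width=3)
```

Because neighbouring pixels in natural images tend to have similar values, most deltas after the first are small, which is the property a delta readout relies on.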

Accelerating Soft-Decision Reed-Muller Decoding Using a Graphics Processing Unit

  • Uddin, Md. Sharif;Kim, Cheol Hong;Kim, Jong-Myon
    • 예술인문사회 융합 멀티미디어 논문지 / Vol. 4, No. 2 / pp. 369-378 / 2014
  • The Reed-Muller code is one of the efficient codes for multiple-bit error correction; however, the high computational cost inherent in its decoding process prohibits its use in practical applications. To solve this problem, this paper proposes a graphics processing unit (GPU)-based parallel error control approach using Reed-Muller R(r, m) coding for real-time wireless communication systems. The GPU offers a high-throughput parallel computing platform that can achieve the desired high-performance decoding by exploiting the massive parallelism inherent in the algorithm. In addition, we compare the performance of the GPU-based approach with that of the equivalent sequential approach running on a traditional CPU. The experimental results indicate that the proposed GPU-based approach substantially outperforms the sequential approach in execution time, yielding a speedup of over 70×.
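To make soft-decision Reed-Muller decoding concrete, here is a minimal CPU-side sketch restricted to the first-order code R(1, m), decoded with a fast Hadamard transform. This is a standard textbook technique swapped in for illustration; the paper targets general R(r, m) on a GPU, and its actual algorithm is not given in the abstract. All function names are hypothetical, and the mapping bit 0 → +1, bit 1 → -1 is an assumption.

```python
def encode_rm1(msg_bits, m):
    """Encode m + 1 bits [a0, a1, ..., am] into a 2**m-bit R(1, m) codeword."""
    a0, coeffs = msg_bits[0], msg_bits[1:]
    codeword = []
    for x in range(1 << m):
        bit = a0
        for i in range(m):
            if (x >> i) & 1:
                bit ^= coeffs[i]
        codeword.append(bit)
    return codeword


def fast_hadamard_transform(values):
    """Walsh-Hadamard transform in natural order (length must be a power of 2)."""
    v = list(values)
    h = 1
    while h < len(v):
        for start in range(0, len(v), 2 * h):
            for j in range(start, start + h):
                a, b = v[j], v[j + h]
                v[j], v[j + h] = a + b, a - b
        h *= 2
    return v


def decode_rm1_soft(received, m):
    """Pick the codeword with the largest correlation to the soft values."""
    spectrum = fast_hadamard_transform(received)
    u = max(range(len(spectrum)), key=lambda k: abs(spectrum[k]))
    a0 = 0 if spectrum[u] >= 0 else 1
    return [a0] + [(u >> i) & 1 for i in range(m)]


message = [1, 0, 1, 1]
soft = [1.0 - 2.0 * b for b in encode_rm1(message, 3)]  # bit 0 -> +1, bit 1 -> -1
soft[2] = -soft[2]  # simulate one corrupted symbol
decoded = decode_rm1_soft(soft, 3)
```

Each butterfly stage of the transform operates on independent pairs of values, which is the kind of data parallelism a GPU implementation could exploit.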

Optimization of Parallel Code for Noise Prediction in an Axial Fan Using MPI One-Sided Communication

  • 권오경;박근태;최해천
    • 정보처리학회논문지:컴퓨터 및 통신 시스템 / Vol. 7, No. 3 / pp. 67-72 / 2018
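  • An axial fan is a fluid machine that moves a large volume of air by producing a small pressure rise as it rotates, and reducing axial-fan noise has recently become an important concern. This study addresses an MPI parallel programming approach and optimization techniques for analyzing flow-induced noise around a fan. To analyze tens of thousands of noise-source points on a grid of several hundred million cells or more, the code was parallelized with MPI using a two-dimensional domain decomposition. In large-scale runs, however, heavy communication between MPI processes severely degrades performance. To overcome this, MPI one-sided communication was applied. In addition, communication and memory optimizations improved performance by up to 2.97 times. Finally, on the KISTI Tachyon2 supercomputer, the full simulation achieved performance improvements of up to 12 times on 6,144 cores for the flow computation and up to 6 times on 128 cores for the noise computation.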
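The two-dimensional domain decomposition mentioned in the abstract above can be sketched as an index calculation. This is a minimal illustration, not the authors' code: it assumes a row-major layout of ranks on a `px` × `py` process grid and makes no MPI calls. In a real code, each rank would exchange boundary data with the neighbours computed here, for example through MPI one-sided operations such as `MPI_Put` or `MPI_Get`. All function names are hypothetical.

```python
def split_1d(n, parts, index):
    """Return the half-open range [start, stop) of block `index` when n
    cells are split into `parts` nearly equal contiguous blocks."""
    base, extra = divmod(n, parts)
    start = index * base + min(index, extra)
    size = base + (1 if index < extra else 0)
    return start, start + size


def decompose_2d(nx, ny, px, py, rank):
    """Return the ((x0, x1), (y0, y1)) cell ranges owned by `rank`."""
    ix, iy = rank % px, rank // px  # assumed row-major rank layout
    return split_1d(nx, px, ix), split_1d(ny, py, iy)


def neighbors(px, py, rank):
    """Return the ranks that share a subdomain edge with `rank`."""
    ix, iy = rank % px, rank // px
    result = {}
    if ix > 0:
        result["west"] = rank - 1
    if ix < px - 1:
        result["east"] = rank + 1
    if iy > 0:
        result["south"] = rank - px
    if iy < py - 1:
        result["north"] = rank + px
    return result


# A 10 x 6 grid on a 3 x 2 process grid: which cells does rank 4 own?
owned = decompose_2d(10, 6, 3, 2, rank=4)
adjacent = neighbors(3, 2, rank=4)
```

Splitting in two dimensions rather than one keeps each subdomain's boundary short relative to its interior, which limits the amount of data exchanged per rank.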

A CDMA-Based Communication Network for a Multiprocessor SoC

  • 천익재;김보관
    • 대한전자공학회:학술대회논문집 / 대한전자공학회 2005년도 추계종합학술대회 / pp. 707-710 / 2005
  • In this paper, we propose a new communication network for on-chip communication. The network is based on a direct sequence code division multiple access (DS-CDMA) technique. The new communication network is suitable for a parallel processing system and also drastically reduces the I/O pin count. The network architecture consists mainly of a CDMA-based network interface (CNI), a communication channel, and a synchronizer. The network includes a reverse communication channel to reduce latency. The CNI decouples computation tasks from communication tasks. An extreme truncation is considered to simplify the communication link. For scalability, we use a PN-code reuse method and a hierarchical structure. The network elements have a modular architecture. The communication network is implemented in fully synthesizable Verilog HDL to enhance portability between process technologies.
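The way DS-CDMA lets several senders share one channel can be shown with a small numerical model. This sketch is an illustration only: it uses orthogonal Walsh codes rather than the PN codes named in the abstract, assumes chip-synchronous senders, and models the shared channel as a plain sum of chips. All names are hypothetical.

```python
def walsh_codes(n):
    """Return at least n mutually orthogonal +1/-1 codes (count is a power of 2)."""
    codes = [[1]]
    while len(codes) < n:
        codes = ([row + row for row in codes] +
                 [row + [-c for c in row] for row in codes])
    return codes


def spread(bits, code):
    """Multiply each data bit (mapped to +1/-1) by the sender's chip code."""
    return [(1 if b else -1) * c for b in bits for c in code]


def channel_sum(signals):
    """Model the shared channel as the chip-wise sum of all senders."""
    return [sum(chips) for chips in zip(*signals)]


def despread(channel, code):
    """Correlate the channel with one code to recover that sender's bits."""
    n = len(code)
    bits = []
    for i in range(0, len(channel), n):
        corr = sum(s * c for s, c in zip(channel[i:i + n], code))
        bits.append(1 if corr > 0 else 0)
    return bits


codes = walsh_codes(4)
data = {1: [1, 0, 1], 2: [0, 0, 1], 3: [1, 1, 0]}  # sender id -> bits
shared = channel_sum([spread(bits, codes[s]) for s, bits in data.items()])
recovered = {s: despread(shared, codes[s]) for s in data}
```

Because the codes are orthogonal, the contributions of the other senders cancel in the correlation, so all senders can transmit simultaneously over the same wires.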

Suppression of Circulating Current in Parallel Operation of Three-Level AC/DC Converters

  • 손영광;지승준;이영기;설승기
    • 전력전자학회논문지 / Vol. 21, No. 4 / pp. 312-319 / 2016
  • Zero-sequence Circulating Current (ZSCC) flows inevitably in parallel converters that share common DC and AC sources. The ZSCC commonly flowing in all converters increases loss and decreases the overall capacity of parallel converters. This paper proposes a simple and effective ZSCC suppression method based on the Space Vector PWM (SVPWM) with the ZSCC controller. The zero-sequence voltage for the proposed SVPWM is calculated on the basis of the grid voltage and not on the phase voltage references. The limit of the linear modulation region of the converters with the proposed method is analyzed and compared with other methods, thereby proving that the limit of the region can be extended with the proposed method. The effectiveness of the proposed method has been verified through the experimental setup comprising four parallel three-level converters. The ZSCC is confirmed to be well suppressed, and the linear modulation region is extended simultaneously with the proposed method. Moreover, the proposed control method does not require any communication between the converters to suppress the ZSCC unlike other conventional methods.

New High Speed Parallel Multiplier for Real Time Multimedia Systems

  • 조병록;이명옥
    • 정보처리학회논문지A / Vol. 10A, No. 6 / pp. 671-676 / 2003
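  • To speed up a high-speed parallel multiplier, this paper proposes a new first partial product addition (FPA) scheme that applies a new compressor to the CSA (Carry Select Adder) tree formed while adding the partial products. It improves the speed of computing the partial products by about 20% compared with a conventional parallel adder built from full adders. The new circuit uses the new FPA structure to reduce the number of bits of the final-sum CLA to N/2. A 16 × 16 multiplier fabricated in a 2.5 V, 0.25 µm CMOS technology achieved a multiplication time of 5.14 ns. The multiplier structure is well suited to pipelined design and delivers high performance.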

Design of a Single-chip Multiprocessor with On-chip Learning for Large Scale Neural Network Simulation

  • 김종문;송윤선;김명원
    • 전자공학회논문지B / Vol. 33B, No. 2 / pp. 149-158 / 1996
  • In this paper we describe the design and implementation of a digital neural chip and a parallel neural machine for simulating large-scale neural networks. The chip is a single-chip multiprocessor that has four digital neural processors (DNP-II) of the same architecture. Each DNP-II has program memory and data memory, and the chip operates as an MIMD (multi-instruction, multi-data) parallel processor. The DNP-II has an instruction set tailored to neural computation, which can be used to effectively simulate various neural network models, including on-chip learning. The DNP-II facilitates four-way data-driven communication, supporting the extensibility of parallel systems. The parallel neural machine consists of a host computer, processor boards, a buffer board, and an interface board. Each processor board consists of an 8*8 array of DNP-II (equivalently 2*2 neural chips). Each processor board can be built in various configurations, including linear array, 2-D mesh, and 2-D torus. This flexibility supports efficient mapping from neural network models onto the parallel structure. The neural system achieves a maximum performance of 40 GCPS (giga connections per second) with 16 processor boards.

Speedup Analysis Model for High Speed Network based Distributed Parallel Systems

  • 김화성
    • 한국통신학회논문지 / Vol. 26, No. 12C / pp. 218-224 / 2001
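  • The goal of distributed parallel processing is to solve computation-intensive problems that exhibit diverse forms of inherent parallelism by making the most of the differing capabilities of multiple high-performance and parallel computers connected by a high-speed network. This paper proposes a computation model, including a general graph representation, for analyzing speedup in distributed parallel systems, and analyzes which factors contribute to speedup when a program is scheduled for execution. The proposed representation applies to both homogeneous and heterogeneous systems. To obtain greater speedup through scheduling in a distributed parallel system, the match between the parallel characteristics of the tasks and those of the parallel computers must be handled carefully, and the communication overhead caused by moving tasks must be minimized.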
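The trade-off described in the abstract above, between matching tasks to suitable machines and paying communication overhead, can be illustrated with a toy task-graph model. This is a generic sketch, not the model proposed in the paper: the task graph, execution times, communication costs, and machine names (`vec`, `mimd`) are all invented for illustration.

```python
def makespan(tasks, deps, exec_time, comm_cost, assign):
    """Finish time of a task graph under a given task-to-machine assignment.

    tasks     : task names in topological order
    deps      : {task: [predecessor tasks]}
    exec_time : {(task, machine): execution time}
    comm_cost : {(pred, task): transfer time if run on different machines}
    assign    : {task: machine}
    """
    finish, machine_free = {}, {}
    for t in tasks:
        m = assign[t]
        ready = 0.0
        for p in deps.get(t, []):
            # Data from a predecessor on another machine arrives late.
            delay = comm_cost.get((p, t), 0.0) if assign[p] != m else 0.0
            ready = max(ready, finish[p] + delay)
        start = max(ready, machine_free.get(m, 0.0))
        finish[t] = start + exec_time[(t, m)]
        machine_free[m] = finish[t]
    return max(finish.values())


tasks = ["A", "B", "C", "D"]
deps = {"B": ["A"], "C": ["A"], "D": ["B", "C"]}
exec_time = {("A", "vec"): 2, ("A", "mimd"): 2,
             ("B", "vec"): 1, ("B", "mimd"): 4,   # B suits the vector machine
             ("C", "vec"): 4, ("C", "mimd"): 1,   # C suits the MIMD machine
             ("D", "vec"): 2, ("D", "mimd"): 2}
comm_cost = {("A", "C"): 0.5, ("C", "D"): 0.5, ("A", "B"): 0.5, ("B", "D"): 0.5}

single = makespan(tasks, deps, exec_time, comm_cost,
                  {t: "vec" for t in tasks})
mixed = makespan(tasks, deps, exec_time, comm_cost,
                 {"A": "vec", "B": "vec", "C": "mimd", "D": "vec"})
speedup = single / mixed
```

In this toy case, moving task C to the machine that suits it shortens the schedule even after two transfers are paid for; raising the values in `comm_cost` shows how quickly that advantage erodes.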

Parallel Computation of a Flow Field Using FEM and Domain Decomposition Method

  • 최형권;김범준;강성우;유정열
    • 대한기계학회:학술대회논문집 / 대한기계학회 2002년도 학술대회지 / pp. 55-58 / 2002
  • A parallel finite element code has recently been developed for the analysis of the incompressible Navier-Stokes equations using the domain decomposition method. The Metis and MPI libraries are used for the domain partitioning of an unstructured mesh and for the data communication between sub-domains, respectively. For unsteady computation of the incompressible Navier-Stokes equations, a 4-step splitting method is combined with a P1P1 finite element formulation. The Smagorinsky and dynamic models are implemented for the simulation of turbulent flows. For the validation and performance estimation of the developed parallel code, the three-dimensional Laplace equation has been solved. A speed-up of 40 was obtained with the present parallel code for the benchmark problem. Lastly, the turbulent flows around the MIRA model and the Tiburon model have been solved using 32 processors on an IBM SMP cluster with an unstructured mesh. The computed drag coefficient agrees better with the existing experiment as the mesh resolution increases in the region where the variation of pressure is severe.

Performance Analysis of the Parallel CUPID Code in Distributed Memory System Based Ethernet and InfiniBand Network

  • 전병진;최형권
    • 한국전산유체공학회지 / Vol. 19, No. 2 / pp. 24-29 / 2014
  • In this study, the parallel performance of the CUPID code has been investigated on both Ethernet and InfiniBand network systems to examine the effect of cache memory and network speed. The bi-conjugate gradient solver of the CUPID code has been parallelized using the domain decomposition method and the message passing interface (MPI). The parallel performance of the Ethernet system is shown to be worse than that of the InfiniBand system because of its slower network and smaller cache memory. The parallel performance of each system also deteriorates for a small problem because of communication overhead, but the InfiniBand system still performs better than the Ethernet system thanks to its much faster network. For a large problem, the parallel performance depends less on the network system.