• Title/Summary/Keyword: parallel communication

Low-Power CMOS image sensor with multi-column-parallel SAR ADC

  • Hyun, Jang-Su;Kim, Hyeon-June
    • Journal of Sensor Science and Technology / v.30 no.4 / pp.223-228 / 2021
  • This work presents a low-power CMOS image sensor (CIS) with a multi-column-parallel (MCP) readout structure, with a focus on improving its performance over previous works. A delta readout scheme that exploits the image characteristics is optimized for the MCP readout structure. By simply alternating the MCP readout direction for each row selection, no additional memory is required for the row-to-row delta readout, resulting in a smaller occupied area than in the previous work. In addition, the bias current of the pre-amplifier in the successive approximation register (SAR) analog-to-digital converter (ADC) changes according to the operating period to improve the power efficiency. The prototype CIS chip was fabricated using a 0.18-µm CMOS process. A 160 × 120 pixel array with a 4.4-µm pitch was implemented with a 10-bit SAR ADC. The prototype CIS demonstrated a frame rate of 120 fps with a total power consumption of 1.92 mW.
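
One possible reading of the alternating-direction delta readout can be sketched in Python. This is an illustration under assumptions, not the circuit described in the paper: it assumes a serpentine scan in which the last pixel read in one row is vertically adjacent to the first pixel read in the next row, so a single "previous sample" register is enough and no line memory is needed. The function names are hypothetical.

```python
def serpentine_order(rows, cols):
    """Yield (row, col) pairs, reversing the column direction on every other row."""
    for r in range(rows):
        cs = range(cols) if r % 2 == 0 else range(cols - 1, -1, -1)
        for c in cs:
            yield r, c


def delta_encode(image):
    """Encode each pixel as its difference from the previously read pixel."""
    rows, cols = len(image), len(image[0])
    prev = 0  # single register holding the last sample (assumed reset value 0)
    deltas = []
    for r, c in serpentine_order(rows, cols):
        deltas.append(image[r][c] - prev)
        prev = image[r][c]
    return deltas


def delta_decode(deltas, rows, cols):
    """Rebuild the image by accumulating deltas along the same scan order."""
    image = [[0] * cols for _ in range(rows)]
    prev = 0
    for (r, c), d in zip(serpentine_order(rows, cols), deltas):
        prev += d
        image[r][c] = prev
    return image


if __name__ == "__main__":
    frame = [[10, 11, 12],
             [15, 14, 13]]
    encoded = delta_encode(frame)
    print(encoded)
    print(delta_decode(encoded, 2, 3) == frame)
```

Because neighbouring pixels tend to have similar values, the deltas along such a scan stay small at the row boundary as well, which is the property a delta readout relies on.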

Accelerating Soft-Decision Reed-Muller Decoding Using a Graphics Processing Unit

  • Uddin, Md. Sharif;Kim, Cheol Hong;Kim, Jong-Myon
    • Asia-pacific Journal of Multimedia Services Convergent with Art, Humanities, and Sociology / v.4 no.2 / pp.369-378 / 2014
  • The Reed-Muller code is an efficient code for multiple-bit error correction; however, the high computational cost of its decoding process limits its use in practical applications. To solve this problem, this paper proposes a graphics processing unit (GPU)-based parallel error control approach using Reed-Muller R(r, m) coding for real-time wireless communication systems. A GPU offers a high-throughput parallel computing platform that can achieve the desired high-performance decoding by exploiting the massive parallelism inherent in the algorithm. In addition, we compare the performance of the GPU-based approach with that of the equivalent sequential approach running on a traditional CPU. The experimental results indicate that the proposed GPU-based approach substantially outperforms the sequential approach in terms of execution time, yielding over 70× speedup.

Optimization of Parallel Code for Noise Prediction in an Axial Fan Using MPI One-Sided Communication (MPI 일방향통신을 이용한 축류 팬 주위 소음해석 병렬프로그램 최적화)

  • Kwon, Oh-Kyoung;Park, Keuntae;Choi, Haecheon
    • KIPS Transactions on Computer and Communication Systems / v.7 no.3 / pp.67-72 / 2018
  • Recently, reducing noise in axial fans, a type of turbomachine that produces a small pressure rise and a large flow rate, has been recognized as essential. This study describes the design and optimization of an MPI parallel program that simulates flow-induced noise in an axial fan. To run the simulation with 100 million grid points for the flow and 70,000 points for the noise sources, we parallelize the code using 2D domain decomposition. However, when many computing cores are involved, the code slows down because of MPI communication overhead among nodes, especially in the noise simulation. We therefore adopt one-sided communication to reduce this overhead. Moreover, memory allocation and inter-core communication are optimized, yielding a 2.97x improvement over the original code. Finally, the flow and noise simulations run 12x and 6x faster on 6,144 and 128 computing cores of KISTI Tachyon2 than on 256 and 16 computing cores, respectively.
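
The scaling figures quoted in this abstract can be converted to relative parallel efficiency, that is, the measured speedup divided by the factor by which the core count grew. The short Python sketch below does only that arithmetic; the function name is hypothetical, and the efficiency figures are derived here from the reported numbers, not stated in the abstract.

```python
def relative_efficiency(speedup, base_cores, scaled_cores):
    """Speedup relative to the baseline run, divided by the core-count ratio."""
    return speedup / (scaled_cores / base_cores)


if __name__ == "__main__":
    # Flow simulation: 12x faster on 6,144 cores than on 256 cores.
    print(relative_efficiency(12, 256, 6144))
    # Noise simulation: 6x faster on 128 cores than on 16 cores.
    print(relative_efficiency(6, 16, 128))
```

Under this definition, the flow simulation keeps half of the ideal scaling over a 24-fold increase in cores, and the noise simulation keeps three quarters over an 8-fold increase.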

A CDMA-Based Communication Network for a Multiprocessor SoC (다중 프로세서를 갖는 SoC 를 위한 CDMA 기술에 기반한 통신망 설계)

  • Chun, Ik-Jae;Kim, Bo-Gwan
    • Proceedings of the IEEK Conference / 2005.11a / pp.707-710 / 2005
  • In this paper, we propose a new communication network for on-chip communication. The network is based on a direct sequence code division multiple access (DS-CDMA) technique. It is suitable for a parallel processing system and also drastically reduces the I/O pin count. The network architecture consists mainly of a CDMA-based network interface (CNI), a communication channel, and a synchronizer. The network includes a reverse communication channel to reduce latency. The CNI decouples computation tasks from communication tasks. An extreme truncation is considered to simplify the communication link. For scalability, we use a PN-code reuse method and a hierarchical structure. The network elements have a modular architecture. The communication network is implemented in fully synthesizable Verilog HDL to enhance portability between process technologies.
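
The general DS-CDMA idea behind such a network, several senders sharing one channel and each receiver recovering its own stream by correlating with a spreading code, can be illustrated with a small behavioral sketch in Python. This is not the paper's design: it assumes orthogonal Walsh codes in place of the PN codes mentioned above, ideal chip synchronization, and no truncation, and all function names are hypothetical.

```python
def walsh_codes(n):
    """Return n mutually orthogonal +1/-1 codes of length n (n a power of two)."""
    if n < 1 or n & (n - 1):
        raise ValueError("n must be a power of two")
    codes = [[1]]
    while len(codes) < n:
        # Sylvester-style doubling: [row, row] and [row, -row]
        codes = ([row + row for row in codes]
                 + [row + [-x for x in row] for row in codes])
    return codes


def spread(bits, code):
    """Map each bit to +1/-1 and multiply it by every chip of the code."""
    return [(1 if b else -1) * chip for b in bits for chip in code]


def combine(signals):
    """Model the shared channel as the chip-wise sum of all senders."""
    return [sum(chips) for chips in zip(*signals)]


def despread(channel, code):
    """Correlate each symbol period with the code and take the sign."""
    n = len(code)
    bits = []
    for i in range(0, len(channel), n):
        corr = sum(s * c for s, c in zip(channel[i:i + n], code))
        bits.append(1 if corr > 0 else 0)
    return bits


if __name__ == "__main__":
    codes = walsh_codes(4)
    a, b = [1, 0, 1], [0, 0, 1]
    channel = combine([spread(a, codes[1]), spread(b, codes[2])])
    print(despread(channel, codes[1]), despread(channel, codes[2]))
```

Because the codes are orthogonal, the contribution of the other sender cancels in each correlation, so both bit streams are recovered from the single summed channel.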

Suppression of Circulating Current in Parallel Operation of Three-Level AC/DC Converters (병렬 3레벨 AC/DC 전력변환 시스템의 영상분 순환전류 억제)

  • Son, Young-Kwang;Chee, Seung-Jun;Lee, Younggii;Sul, Seung-Ki
    • The Transactions of the Korean Institute of Power Electronics / v.21 no.4 / pp.312-319 / 2016
  • Zero-sequence Circulating Current (ZSCC) flows inevitably in parallel converters that share common DC and AC sources. The ZSCC commonly flowing in all converters increases loss and decreases the overall capacity of parallel converters. This paper proposes a simple and effective ZSCC suppression method based on the Space Vector PWM (SVPWM) with the ZSCC controller. The zero-sequence voltage for the proposed SVPWM is calculated on the basis of the grid voltage and not on the phase voltage references. The limit of the linear modulation region of the converters with the proposed method is analyzed and compared with other methods, thereby proving that the limit of the region can be extended with the proposed method. The effectiveness of the proposed method has been verified through the experimental setup comprising four parallel three-level converters. The ZSCC is confirmed to be well suppressed, and the linear modulation region is extended simultaneously with the proposed method. Moreover, the proposed control method does not require any communication between the converters to suppress the ZSCC unlike other conventional methods.

New High Speed Parallel Multiplier for Real Time Multimedia Systems (실시간 멀티미디어 시스템을 위한 새로운 고속 병렬곱셈기)

  • Cho, Byung-Lok;Lee, Mike-Myung-Ok
    • The KIPS Transactions:PartA / v.10A no.6 / pp.671-676 / 2003
  • In this paper, we propose a new First Partial product Addition (FPA) architecture with a new compressor (or parallel counter) for the CSA tree that adds the partial products in a fast parallel multiplier. It improves the speed of partial-product computation by about 20% compared with an existing parallel counter built from full adders. Using the novel FPA architecture, the new circuit reduces the number of CLA bits needed to find the final sum by N/2. A multiplication time of 5.14 ns for the $16 \times 16$ multiplier is obtained using $0.25\,\mu\mathrm{m}$ CMOS technology. The multiplier architecture is easily adapted to pipelined design and demonstrates high-speed performance.

Design of a Single-chip Multiprocessor with On-chip Learning for Large Scale Neural Network Simulation (대규모 신경망 시뮬레이션을 위한 칩상 학습가능한 단일칩 다중 프로세서의 구현)

  • 김종문;송윤선;김명원
    • Journal of the Korean Institute of Telematics and Electronics B / v.33B no.2 / pp.149-158 / 1996
  • In this paper, we describe the design and implementation of a digital neural chip and a parallel neural machine for simulating large-scale neural networks. The chip is a single-chip multiprocessor that has four digital neural processors (DNP-II) of the same architecture. Each DNP-II has program memory and data memory, and the chip operates as an MIMD (multiple-instruction, multiple-data) parallel processor. The DNP-II has an instruction set tailored to neural computation, which can be used to effectively simulate various neural network models, including on-chip learning. The DNP-II provides four-way data-driven communication, supporting the extensibility of parallel systems. The parallel neural machine consists of a host computer, processor boards, a buffer board, and an interface board. Each processor board consists of 8*8 array of DNP-II (equivalently 2*2 neural chips). Each processor board can be configured in several topologies, including linear array, 2-D mesh, and 2-D torus. This flexibility supports efficient mapping of neural network models onto the parallel structure. The neural system achieves a maximum performance of 40 GCPS (giga connections per second) with 16 processor boards.

Speedup Analysis Model for High Speed Network based Distributed Parallel Systems (고속 네트웍 기반의 분산병렬시스템에서의 성능 향상 분석 모델)

  • 김화성
    • The Journal of Korean Institute of Communications and Information Sciences / v.26 no.12C / pp.218-224 / 2001
  • The objective of distributed parallel computing is to solve computationally intensive problems that have several types of parallelism on a suite of high-performance parallel machines in a manner that best utilizes the capabilities of each machine. In this paper, we propose a computational model for speedup analysis, including a generalized graph representation method for distributed parallel systems, and analyze how super-linear speedup is achieved when scheduling programs with diverse embedded parallelism modes onto a distributed heterogeneous supercomputing network environment. The proposed representation method can also be applied to simple homogeneous systems or to heterogeneous systems whose components differ only in processor speed. To obtain the core speedup, the parallelism characteristics of tasks and parallel machines should be matched carefully while minimizing the communication overhead.

Parallel Computation of a Flow Field Using FEM and Domain Decomposition Method (영역분할법과 유한요소해석을 이용한 유동장의 병렬계산)

  • Choi Hyounggwon;Kim Beomjun;Kang Sungwoo;Yoo Jung Yul
    • Proceedings of the KSME Conference / 2002.08a / pp.55-58 / 2002
  • A parallel finite element code has recently been developed for the analysis of the incompressible Navier-Stokes equations using the domain decomposition method. The Metis and MPI libraries are used for the domain partitioning of an unstructured mesh and for the data communication between sub-domains, respectively. For unsteady computation of the incompressible Navier-Stokes equations, a 4-step splitting method is combined with a P1P1 finite element formulation. The Smagorinsky and dynamic models are implemented for the simulation of turbulent flows. For validation and performance estimation of the developed parallel code, the three-dimensional Laplace equation has been solved. A speed-up of 40 was obtained with the present parallel code for the benchmark problem. Lastly, the turbulent flows around the MIRA model and the Tiburon model have been solved using 32 processors on an IBM SMP cluster with an unstructured mesh. The computed drag coefficient agrees better with the existing experiment as the mesh resolution increases in the region where the pressure variation is severe.

PERFORMANCE ANALYSIS OF THE PARALLEL CUPID CODE IN DISTRIBUTED MEMORY SYSTEM BASED ETHERNET AND INFINIBAND NETWORK (이더넷과 인피니밴드 네트워크 기반의 분산 메모리 시스템에서 병렬성능 분석)

  • Jeon, B.J.;Choi, H.G.
    • Journal of computational fluids engineering / v.19 no.2 / pp.24-29 / 2014
  • In this study, the parallel performance of the CUPID code is investigated on both Ethernet and InfiniBand network systems to examine the effects of cache memory and network speed. The bi-conjugate gradient solver of the CUPID code is parallelised using the domain decomposition method and the message passing interface (MPI). The parallel performance of the Ethernet system is shown to be worse than that of the InfiniBand system because of its slower network and smaller cache memory. The parallel performance of each system deteriorates for a small problem because of communication overhead, but the InfiniBand system still performs better than the Ethernet system owing to its much faster network. For a large problem, the parallel performance depends less on the network system.