• Title/Summary/Keyword: parallel communication

A Reconfigurable Load and Performance Balancing Scheme for Parallel Loops in a Clustered Computing Environment (클러스터 컴퓨팅 환경에서 병렬루프 처리를 위한 재구성 가능한 부하 및 성능 균형 방법)

  • 김태형
    • Journal of KIISE: Computing Practices and Letters / v.10 no.1 / pp.49-56 / 2004
  • Load imbalance is a serious impediment to achieving good performance in parallel processing. Global load balancing schemes cannot adequately balance the parallel tasks generated from a single application. Dynamic loop scheduling methods are known to be useful for balancing parallel loops on shared-memory multiprocessor machines. However, their centralized nature causes a bottleneck even for the relatively small number of processors in a network of workstations, because of order-of-magnitude differences in communication overheads. Moreover, improvements to basic loop scheduling methods have not effectively dealt with irregularly distributed workloads in parallel loops, which commonly occur in applications for a network of workstations. In this paper, we present a new reconfigurable and decentralized balancing method for parallel loops on a network of workstations. Because our method supplements traditional load balancing methods with performance balancing, it minimizes the overall execution time.
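
As a rough illustration of the centralized dynamic loop scheduling that this abstract contrasts itself with, the sketch below computes chunk sizes for guided self-scheduling, a commonly cited dynamic scheme. The abstract does not name the specific methods it considers, so the choice of scheme and the function name `guided_chunks` are assumptions made here for illustration only.

```python
def guided_chunks(total_iterations, num_workers):
    """Yield (start, size) chunks of a loop under guided self-scheduling.

    Each request to the central queue receives ceil(remaining / num_workers)
    iterations, so chunks shrink as the loop nears completion.
    This is an illustrative sketch, not the method proposed in the paper.
    """
    start = 0
    remaining = total_iterations
    while remaining > 0:
        size = -(-remaining // num_workers)  # integer ceiling division
        yield start, size
        start += size
        remaining -= size


chunks = list(guided_chunks(100, 4))
sizes = [size for _, size in chunks]
print(sizes)
```

Every chunk request goes through one shared counter, which is the centralized step that becomes costly when each request is a network message rather than a shared-memory access.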

Design of Bit-Parallel Multiplier over Finite Field $GF(2^m)$ (유한체 $GF(2^m)$상의 비트-병렬 곱셈기의 설계)

  • Seong, Hyeon-Kyeong
    • Journal of the Korea Institute of Information and Communication Engineering / v.12 no.7 / pp.1209-1217 / 2008
  • In this paper, we present a new bit-parallel multiplier for performing the bit-parallel multiplication of two polynomials in the finite field $GF(2^m)$. Before constructing the multiplier circuits, we design a vector code generator (VCG) that generates the result of bit-parallel multiplication with one coefficient of the multiplier polynomial, after performing the parallel multiplication of the multiplicand polynomial with an irreducible polynomial. Each basic cell of the VCG has two AND gates and two XOR gates. Using these VCGs, we obtain the result of the bit-parallel multiplication of two polynomials. Extending this process, we show the design of the generalized circuits for degree m and a simple example of constructing the multiplier circuit over the finite field $GF(2^4)$. The presented multiplier is also simulated with PSpice. The multiplier presented in this paper uses VCGs built from repeated basic cells, is easy to extend to the multiplication of two polynomials in finite fields of very large degree m, and is suitable for VLSI.
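
The arithmetic that such a multiplier implements can be checked against a small software reference. The sketch below is a sequential shift-and-add multiplication in $GF(2^m)$, not the VCG circuit from the paper; the irreducible polynomial $x^4 + x + 1$ used in the example is an assumption, since the abstract does not state which polynomial the $GF(2^4)$ example uses.

```python
def gf2m_multiply(a, b, m, irreducible):
    """Multiply two elements of GF(2^m).

    Polynomials are stored as integers: bit i is the coefficient of x^i.
    `irreducible` includes the x^m term (for example 0b10011 for x^4 + x + 1).
    AND selects partial products and XOR adds them, as in a gate-level design.
    """
    result = 0
    for _ in range(m):
        if b & 1:               # current coefficient of b is 1
            result ^= a         # add (XOR) the shifted multiplicand
        b >>= 1
        a <<= 1                 # multiply the multiplicand by x
        if a & (1 << m):        # degree reached m: reduce modulo the polynomial
            a ^= irreducible
    return result


POLY = 0b10011  # x^4 + x + 1, assumed for this example
product = gf2m_multiply(0b0010, 0b1000, 4, POLY)  # x * x^3 = x^4 = x + 1
print(bin(product))
```

A hardware bit-parallel design computes all of these partial products at once; the loop here only serves as a reference model for verifying outputs.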

Hybrid Atmospheric Compensation in Free-Space Optical Communication

  • Wang, Tingting;Zhao, Xiaohui
    • Journal of the Optical Society of Korea / v.20 no.1 / pp.13-21 / 2016
  • Because the direct-gradient (DG) method uses the Shack-Hartmann wave-front sensor (SH-WFS), based on the phase-conjugation principle, for atmospheric compensation in free-space optical (FSO) communication, it cannot effectively correct high-order aberrations. Although the stochastic parallel gradient descent (SPGD) algorithm can compensate the distorted wave front, it requires more calculations, which is sometimes undesirable for an FSO system. A hybrid compensation (HC) method is proposed that combines the DG method and the SPGD algorithm to improve the performance of FSO communication. Simulations show that this method compensates wave-front aberrations well and improves the coupling efficiency with few computations, better correction results, and a rapid convergence rate.
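
For readers unfamiliar with SPGD, the sketch below shows the basic two-sided update on a toy problem. It is not the hybrid method from the paper: the quadratic `toy_metric` stands in for a measured quantity such as coupling efficiency, and the gain, perturbation size, iteration count, and all names are assumptions chosen for illustration.

```python
import random


def spgd_maximize(metric, start, gain=0.5, perturbation=0.1,
                  iterations=500, seed=0):
    """Maximize `metric` with stochastic parallel gradient descent.

    Every control value is perturbed at the same time by +/- `perturbation`;
    the metric difference between the two perturbed states drives the update.
    """
    rng = random.Random(seed)
    u = list(start)
    for _ in range(iterations):
        delta = [perturbation * rng.choice((-1.0, 1.0)) for _ in u]
        j_plus = metric([ui + di for ui, di in zip(u, delta)])
        j_minus = metric([ui - di for ui, di in zip(u, delta)])
        dj = j_plus - j_minus
        u = [ui + gain * dj * di for ui, di in zip(u, delta)]
    return u


TARGET = [1.0, -2.0, 0.5]  # hypothetical ideal control values


def toy_metric(u):
    """Toy performance metric: largest (zero) when u equals TARGET."""
    return -sum((ui - ti) ** 2 for ui, ti in zip(u, TARGET))


initial = [0.0, 0.0, 0.0]
final = spgd_maximize(toy_metric, initial)
print(toy_metric(initial), toy_metric(final))
```

Each iteration needs two metric evaluations regardless of the number of control channels, which is why the iteration count, rather than the per-step cost, dominates the computation.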

A Reconfigurable Digital Signal Processing Architecture for the Evolvable Hardware System (진화 하드웨어 시스템을 위한 재구성 가능한 디지털 신호처리 구조)

  • Lee, Han-Ho;Choi, Chang-Seok;Lee, Yong-Min;Choi, Jin-Tack;Lee, Chong-Ho;Chung, Duk-Jin
    • Proceedings of the IEEK Conference / 2006.06a / pp.663-664 / 2006
  • This paper presents a reconfigurable digital signal processing (rDSP) architecture that is effective for implementing adaptive digital signal processing in smart health care system applications. The rDSP architecture gives FIR filters an evolution capability by using a genetic algorithm. The parallel genetic algorithm based rDSP architecture evolves FIR filters to explore the optimal configuration of filter combination, associated parameters, and feature-space structure, adapting to noisy environments for adaptive signal processing. The proposed DSP architecture is implemented using a Xilinx Virtex4 FPGA device and SMIC 0.18 µm CMOS technology.

Research Trends for Improving MPI Collective Communication Performance (MPI 집합통신 성능 향상 연구 동향)

  • H.Y., Ahn;Y.M., Park;S.Y., Kim;W.J., Han
    • Electronics and Telecommunications Trends / v.37 no.6 / pp.43-53 / 2022
  • Message Passing Interface (MPI) collective communication has been applied to various science and engineering areas such as physics, chemistry, biology, and astronomy. The parallel computing performance of data-intensive workloads in these research fields depends on collective communication performance. To address this dependence, MPI collective communication technology has been extensively researched over the last several decades to improve communication performance. In this paper, we provide a comprehensive survey of state-of-the-art research on MPI collective communication and examine the trends in recently developed technologies. We also discuss future research directions for providing high performance and scalability to large-scale MPI applications.

Design and Performance Evaluation of MIN for Nonuniform Traffic (비균등 트래픽을 위한 MIN의 설계 및 성능 평가)

  • Choe, Chang-Hun;Kim, Seong-Cheon
    • Journal of the Institute of Electronics Engineers of Korea CI / v.37 no.6 / pp.1-9 / 2000
  • This paper presents a Cluster Oriented Multistage Interconnection Network called COMR. COMR can be constructed to suit parallel applications with localized communication by providing a shortcut path inside each processor-memory cluster that has frequent data communication. We evaluate the performance of COMR with respect to probability of acceptance, bandwidth, cost-effectiveness, and average distance under varying degrees of localized communication. According to the performance analysis, COMR outperforms regular MINs of the same network size under highly localized communication. In the worst case, the diameter of an $N \times N$ COMR is only $n+1$, just one stage more than a MIN of the same network size. Therefore, COMR is an attractive interconnection network for parallel applications with not only a localized communication distribution but also a uniform distribution in shared-memory multiprocessor systems.

The PALM system : Architecture and Network Performance (PALM시스템의 구조와 네트웍 성능)

  • Kim, Suk-Il
    • The Transactions of the Korea Information Processing Society / v.1 no.1 / pp.105-113 / 1994
  • This paper introduces the Parallel Advanced Loosely coupled Multiprocessor (PALM) architecture, which is based on HCH(m,p), where m is the number of links per communication processor (CP) and p is the number of application processors (APs) connected to each CP. Communication links between a pair of CPs, and between a CP and an AP, are made of dual-port RAMs, which provide fast and reliable word-parallel communication between processors. Among the wide spectrum of HCH networks, HCH(m,2) is known to be a cost-optimal topology, in that it contains the largest number of APs while retaining the minimal number of CPs and communication links. We also implement a testbed based on HCH(2,2). The experimental results show that the small communication/computation ratio of the PALM system can realize fine-grain parallelism on message-passing MIMD systems.

Data Transmission System Applying An Adaptive Threshold Based Multi-channel Sound (적응적 임계치를 적용한 멀티 채널 소리 기반의 데이터 전송 시스템)

  • Gang, Hyun-Mo;Jung, Jin-Woo;Choi, Chun-Yong;Kwon, Young-Hun;Lee, Sung-Koo
    • Journal of Digital Contents Society / v.15 no.1 / pp.93-99 / 2014
  • Wireless communication among short-distance devices has recently drawn attention because of the spread of smartphones. However, because the availability of NFC is limited, there is a need for communication technology that provides wireless communication in S/W rather than by installing additional H/W. Accordingly, short-distance wireless communication technology that makes use of the microphone and speaker installed in every device is drawing attention. This paper proposes improving acoustic transmission speed by applying multi-channel parallel transmission, and improving the transmission rate, which differs according to each microphone's characteristics, by optimizing an adaptive threshold. The result applies not only to specific and limited settings such as corporate promotion and payment systems but also to fast and convenient data transmission systems for general users.

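Modeling and Performance Evaluation of a Multiprocessor Operating System Using the DEVS Formalism (DEVS 형식론을 이용한 다중프로세서 운영체제의 모델링 및 성능평가)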

  • 홍준성
    • Proceedings of the Korea Society for Simulation Conference / 1994.10a / pp.32-32 / 1994
  • In this example, a message-passing-based multicomputer system with a general interconnection network is considered. Since multicomputer systems have been developed with wormhole routing networks, the topology of the interconnection network is no longer a major consideration for process management and resource sharing. There is an independent operating system kernel on each node, and it communicates with the other kernels using a message passing mechanism. Based on this architecture, the problem is how much performance degradation occurs when processors are shared on multicomputer systems. Processor sharing between application programs is a very important decision for system performance. In most cases, application programs running on massively parallel computer systems are not very user-interactive, so the main performance index is system throughput. Each application program has its own communication patterns, and processor sharing causes serious performance degradation in the worst case, in which one processor is shared by two processes and other processes are waiting for messages from those processes. Considering this problem is therefore important, since it gives the grounds for deciding whether the system should allow processor sharing. The input data in this simulation have many parameters: the number of threads per task, the communication patterns between threads, and data generation. Random input data also have defects. Many parallel application programs have their own specific communication patterns, with computation and communication phases, and this phase information cannot be obtained from random input data. If we obtained trace data from real applications, we could simulate the problem more realistically. On the other hand, simulation results would be wasteful unless sufficient trace data with various communication patterns were gathered. In this project, random input data are used for the simulation. The only controllable data are the number of threads of each task and the mapping strategy. First, each task runs independently. After that, each task shares one or more processors with other tasks. As more processors are shared, performance degrades. From this degradation rate, we can determine the overhead of processor sharing. The process scheduling policy can affect the simulation results. For process scheduling, a FIFO queue and a priority queue are implemented to support round-robin scheduling and priority scheduling.
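
The two queue disciplines mentioned at the end of this abstract can be sketched in a few lines. The code below is a generic illustration, not the DEVS model from the paper: the process names, burst times, time quantum, and the convention that a smaller number means higher priority are all assumptions.

```python
import heapq
from collections import deque


def round_robin(bursts, quantum):
    """Return the order in which processes finish under round-robin.

    `bursts` maps a process name to its required CPU time. A FIFO queue
    holds the ready processes; an unfinished process rejoins at the back.
    """
    queue = deque(bursts.items())
    finished = []
    while queue:
        name, remaining = queue.popleft()
        if remaining > quantum:
            queue.append((name, remaining - quantum))
        else:
            finished.append(name)
    return finished


def priority_order(priorities):
    """Return process names in dispatch order under priority scheduling.

    `priorities` maps a process name to a number; smaller runs first
    (an assumed convention). A binary heap serves as the priority queue.
    """
    heap = [(prio, name) for name, prio in priorities.items()]
    heapq.heapify(heap)
    return [heapq.heappop(heap)[1] for _ in range(len(heap))]


rr_result = round_robin({"A": 5, "B": 2, "C": 3}, quantum=2)
prio_result = priority_order({"A": 2, "B": 0, "C": 1})
print(rr_result, prio_result)
```

When two tasks share a processor, the choice between these disciplines changes how long a process waiting on a message from a descheduled peer stays blocked, which is the effect the simulation measures.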

A Comparative Study on Parallel Import between Korea and China- Focused on Intellectual Property Rights (한국과 중국의 병행수입제도에 관한 비교연구- 지적재산권을 중심으로)

  • Huang, Yi-Qing;Cho, Hyun-Sook
    • International Commerce and Information Review / v.16 no.4 / pp.79-102 / 2014
  • A parallel import is a non-counterfeit product imported from another country without the permission of the intellectual property owner. It is caused by price differences between countries. Therefore, parallel importation is implicated in issues of international trade and intellectual property rights (hereafter referred to as IPR). This paper examines parallel importation issues in Korea and China under IPR laws such as patent, trademark, and copyright law, and analyzes the differences between the two countries. In China, patent law, unlike trademark law and copyright law, regulates for the first time the exhaustion of rights, which is the theoretical basis of parallel import. On the other hand, Korea regulates parallel importing under Korean customs regulations. In conclusion, neither country has provisions that support parallel import under its IPR laws. This paper suggests some improvements to overcome the limitations of the current regulatory system and avoid trade friction between the two countries. First, the two countries should establish clear rules on parallel import in IPR law, covering the definition of parallel importation, genuine goods, permission conditions, the import process, penalties, remedies, and so on. Second, to expand parallel imports, the two countries should prohibit abuse of an exclusive import agent's rights and manage parallel importers so that they do not cause consumer complaints about goods. Finally, the two countries should cooperate through a communication channel so that this issue does not cause disputes.
