• Title/Summary/Keyword: 병렬시스템 (parallel system)


Parallel Computation for Extended Edit Distances Using the Shared Memory on GPU (GPU의 공유메모리를 활용한 확장편집거리 병렬계산)

  • Kim, Youngho;Na, Joong Chae;Sim, Jeong Seop
    • KIPS Transactions on Computer and Communication Systems / v.4 no.7 / pp.213-218 / 2015
  • Given two strings X and Y (|X|=m, |Y|=n) over an alphabet $\Sigma$, the extended edit distance between X and Y can be computed using dynamic programming in O(mn) time and space. Recently, a parallel algorithm was presented that computes the extended edit distance between X and Y in O(m+n) time and O(mn) space using m threads. In this paper, we present an improved parallel algorithm that uses the shared memory on a GPU. The experimental results show that our parallel algorithm runs about 19 to 25 times faster than the previous parallel algorithm.
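
The O(m+n) bound with m threads follows from filling the dynamic-programming table one anti-diagonal at a time, since cells on the same anti-diagonal do not depend on each other. The sketch below illustrates that ordering only. The abstract does not define the extra operations of the extended edit distance, so the recurrence is assumed to be plain Levenshtein distance, and the function name `edit_distance_wavefront` is hypothetical. The code runs sequentially; the inner loop marks the work that one thread per cell could do in parallel.

```python
def edit_distance_wavefront(x: str, y: str) -> int:
    """Levenshtein distance computed in anti-diagonal (wavefront) order."""
    m, n = len(x), len(y)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i          # delete i characters of x
    for j in range(n + 1):
        d[0][j] = j          # insert j characters of y

    # Anti-diagonal k holds all cells (i, j) with i + j == k.
    # There are m + n - 1 such diagonals with interior cells.
    for k in range(2, m + n + 1):
        lo = max(1, k - n)
        hi = min(m, k - 1)
        # Cells on one diagonal only read diagonals k-1 and k-2,
        # so this loop could be executed by up to m threads at once.
        for i in range(lo, hi + 1):
            j = k - i
            cost = 0 if x[i - 1] == y[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # match or substitution
    return d[m][n]
```

On a GPU, the two previous diagonals are the only data each step reads, which is presumably what makes a small, fast shared-memory buffer attractive; the paper's actual memory layout is not given in the abstract.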

Checkpoint/Resimulation Overhead Minimization with Sporadic Synchronization in Prediction-Based Parallel Logic Simulation (간헐적 동기화를 통한 예측기반 병렬 로직 시뮬레이션에서의 체크포인트/재실행 오버헤드 최소화)

  • Kwak, Doohwan;Yang, Seiyang
    • KIPS Transactions on Computer and Communication Systems / v.4 no.5 / pp.147-152 / 2015
  • In general, parallel event-driven simulation uses one of two synchronization methods: the pessimistic approach or the optimistic approach. In this paper, we propose a new approach, sporadic synchronization, which combines both for prediction-based parallel event-driven logic simulation. We claim that this hybrid solution effectively minimizes both checkpoint overhead and restart overhead, two related problems that arise from frequent false predictions and limit the performance of prediction-based parallel event-driven logic simulation. The experiment shows the advantage of the proposed approach.

Compiler Optimization for Parallelism and Locality Improvement (병렬성 및 지역성 증진을 위한 컴파일러 최적화)

  • Jim, Jin-Mi;Byeon, Seok-U;Pyo, Chang-U;Lee, Man-Ho
    • The Transactions of the Korea Information Processing Society / v.6 no.2 / pp.307-314 / 1999
  • In this paper, we study techniques for transforming sequential programs in order to exploit parallelism and improve locality. After analyzing the loops of sequential programs with respect to dependency and locality, we apply two transformation techniques to them: loop distribution and loop fusion. The transformed programs can easily be expressed as parallel programs with thread notation, and they have coarse-grain parallelism and improved locality. These transformations can therefore be useful tools for building optimizing and automatically parallelizing compilers. Applying these techniques to SPEC95 on a Solaris machine with four SPARC processors shows an improvement in execution time.
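
As a minimal illustration of the two transformations named in the abstract, the sketch below applies loop distribution and loop fusion by hand to toy loops. The function names and the loop bodies are hypothetical and are not taken from the paper. Python is used only to show that the transformed loops compute the same results; the benefit in parallelism or locality would appear in a compiled, multithreaded setting.

```python
def original(a, b):
    """One loop mixing a sequential recurrence with independent work."""
    n = len(a)
    prefix = [0] * n
    prod = [0] * n
    for i in range(n):
        # Loop-carried dependence: iteration i reads prefix[i - 1].
        prefix[i] = a[i] + (prefix[i - 1] if i > 0 else 0)
        # No dependence between iterations.
        prod[i] = a[i] * b[i]
    return prefix, prod


def distributed(a, b):
    """Loop distribution: split the loop so the independent part stands alone."""
    n = len(a)
    prefix = [0] * n
    prod = [0] * n
    for i in range(n):            # stays sequential
        prefix[i] = a[i] + (prefix[i - 1] if i > 0 else 0)
    for i in range(n):            # iterations could run on separate threads
        prod[i] = a[i] * b[i]
    return prefix, prod


def unfused(a):
    """Two passes over the same data."""
    sq = [v * v for v in a]       # first pass writes sq
    total = 0
    for v in sq:                  # second pass re-reads sq
        total += v
    return sq, total


def fused(a):
    """Loop fusion: one pass, so each sq[i] is used while still in cache."""
    sq = [0] * len(a)
    total = 0
    for i, v in enumerate(a):
        sq[i] = v * v
        total += sq[i]
    return sq, total
```

Distribution is legal here because the second statement never reads a value written by a later iteration of the first; fusion is legal because `total` only consumes `sq[i]` after it is written in the same iteration.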


Causal Replay for Cyclic Debugging of MPI Parallel Programs (MPI 병렬 프로그램의 순환 디버깅을 위한 인과관계 재실행)

  • Hong, Cheol-Eui;Kim, Yeong-Joon
    • Journal of KIISE:Computer Systems and Theory / v.28 no.9 / pp.424-433 / 2001
  • The cyclic debugging approach often fails for message-passing parallel programs because they behave non-deterministically due to message race conditions. This paper identifies the MPI events that affect non-deterministic executions, and then converts the concurrent execution into a controlled sequential one that is equivalent to a reference execution, by keeping the order of events identical in the two executions. This paper also presents an efficient algorithm for the causal distributed breakpoint, which is initiated by any sequential breakpoint in one process and restores each process to the earliest state that reflects all events that happened causally before the sequential breakpoint. As a result, cyclic debugging can be used for MPI parallel programs just as it is used in sequential programming environments.


Parallel Modular Multiplication Algorithm to Improve Time and Space Complexity in Residue Number System (RNS상에서 시간 및 공간 복잡도 향상을 위한 병렬 모듈러 곱셈 알고리즘)

  • 박희주;김현성
    • Journal of KIISE:Computer Systems and Theory / v.30 no.9 / pp.454-460 / 2003
  • In this paper, we present a novel method of parallelizing the modular multiplication algorithm to improve time and space complexity on an RNS (Residue Number System). The parallel algorithm performs modular reduction using a new table-lookup-based reduction method. An MRS (Mixed Radix number System) is used because magnitude comparison is difficult in an RNS, which has a non-weighted number representation. Conversion from a residue number system to a certain MRS is relatively fast on a residue computer, so magnitude comparison is easily performed in the MRS. Analysis of the algorithm shows that it requires only half the table size of the previous approach, and that it requires $O(l)$ arithmetic operations using $2l$ processors.
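
The sketch below illustrates why an RNS suits parallel multiplication and why a mixed-radix conversion is needed for comparison. It is a generic textbook illustration, not the paper's algorithm: the table-lookup reduction is not reproduced, and the moduli `(3, 5, 7)` and all function names are assumptions chosen for the example. It needs Python 3.8 or later for `pow(a, -1, m)`.

```python
MODULI = (3, 5, 7)  # assumed pairwise-coprime moduli; represents 0..104


def to_rns(x, moduli=MODULI):
    """Represent x by its residues modulo each modulus."""
    return tuple(x % m for m in moduli)


def rns_mul(a, b, moduli=MODULI):
    """Multiply channel by channel; no carries cross between channels,
    so each channel could be handled by its own processor."""
    return tuple((ai * bi) % m for ai, bi, m in zip(a, b, moduli))


def to_mixed_radix(residues, moduli=MODULI):
    """Return digits d with x = d[0] + d[1]*m0 + d[2]*m0*m1 + ..."""
    r = list(residues)
    digits = []
    for i, mi in enumerate(moduli):
        d = r[i] % mi
        digits.append(d)
        # Remove the digit and divide by mi in every remaining channel.
        for j in range(i + 1, len(moduli)):
            mj = moduli[j]
            r[j] = ((r[j] - d) * pow(mi, -1, mj)) % mj
    return digits


def rns_less(a, b, moduli=MODULI):
    """Compare magnitudes via mixed-radix digits, most significant first."""
    da = to_mixed_radix(a, moduli)[::-1]
    db = to_mixed_radix(b, moduli)[::-1]
    return da < db
```

Residues alone give no indication of size, for example 52 is `(1, 2, 3)` and 17 is `(2, 2, 3)`, whereas mixed-radix digits are weighted and can be compared lexicographically.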

Equivalent Design Parameter Determination for Effective Numerical Modeling of Pre-reinforced Zones in Tunnel (터널 사전보강 영역의 효과적 수치해석을 위한 등가 물성치 결정 기법)

  • Song, Ki-Il;Cho, Gye-Chun
    • Journal of Korean Tunnelling and Underground Space Association / v.8 no.2 / pp.151-163 / 2006
  • Although various methods for effective modeling of pre-reinforced zones have been suggested for numerical analysis of large-section tunnels, tunnel designers rely on empirical cases and literature reviews rather than engineering methods, because users of commercial programs are generally unfamiliar with the macro-scale approach. Therefore, this paper suggests a simple micro-scale approach combined with the macro-scale approach to determine equivalent design parameters for effective numerical modeling of pre-reinforced zones in tunnels. The new approach determines the equivalent stiffness of pre-reinforced zones by combining ground, bulb, and steel in series and/or in parallel. For verification, 3-D numerical results from the suggested approach are compared with those of a realistic model. The comparison suggests that two cases best approximate the realistic solution: one is the series-parallel stiffness system (hereafter SPSS), in which bulb and steel are coupled in parallel and then connected to the ground in series, and the other is the series stiffness system (hereafter SSS), in which only bulb and steel are coupled in series. The SPSS is recommended for stiffness calculation of pre-reinforced zones because the SSS is inconvenient and time-consuming. The SPSS gives slightly larger vertical displacement at the tunnel crown in weathered rock than the other cases, and gives almost identical results to the realistic model for horizontal displacement at the tunnel spring line and for ground surface settlement. Displacement trends in weathered rock and weathered soil are similar. The SPSS suggested in this paper effectively represents the behavior mechanism of the pre-reinforced area.
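
The two couplings described in the abstract can be written with the standard spring analogy. This is an illustrative sketch only: the symbols $k_g$, $k_b$, and $k_s$ for the ground, bulb, and steel stiffnesses are assumed here, and the paper's own formulation (for example, any area or length weighting of the components) is not given in the abstract.

```latex
% Components in parallel add their stiffnesses; components in series add
% their compliances (reciprocal stiffnesses).

% SPSS: bulb and steel in parallel, then in series with the ground
\frac{1}{k_{\mathrm{SPSS}}} = \frac{1}{k_g} + \frac{1}{k_b + k_s}

% SSS: only bulb and steel, coupled in series
\frac{1}{k_{\mathrm{SSS}}} = \frac{1}{k_b} + \frac{1}{k_s}
```

Under this analogy, a series combination is never stiffer than its softest member, while a parallel combination is at least as stiff as its stiffest member.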


Optimal Interference Rejection Weight for Multistage Parallel Nulling-Partial PIC Receiver for MIMO MC-CDMA Systems (MIMO MC-CDMA 시스템을 위한 다단계 병렬 널링 및 부분 간섭 제거 수신기를 위한 최적 가중치 결정)

  • 구정회;김경연;심세준;이충용
    • Journal of the Institute of Electronics Engineers of Korea TC / v.41 no.11 / pp.9-15 / 2004
  • We propose an optimal interference rejection weight for the multistage parallel nulling (MPN) partial parallel interference cancellation (PPIC) receiver, which was previously proposed to enhance the performance of V-BLAST for downlink multiple-input multiple-output (MIMO) multicarrier (MC) code division multiple access (CDMA) systems. The MPN-PPIC method proposed in [1] was based on parallel interference cancellation (PIC) with a fixed interference rejection weight obtained experimentally. However, a fixed weight cannot be adapted efficiently to various systems, so we propose a method for determining the optimal interference rejection weight based on the received signal to interference and noise ratio (SINR). The performance of the proposed method was evaluated through computer simulation and compared with the previous method. We obtained performance gains of 2.5 dB to 5 dB at a BER of $10^{-3}$.

Smartphone Real Time Streaming Service using Parallel TCP Transmission (병렬 TCP 통신을 이용한 스마트폰 실시간 스트리밍 서비스)

  • Kim, Jang-Young
    • Journal of the Korea Institute of Information and Communication Engineering / v.20 no.5 / pp.937-941 / 2016
  • This paper proposes an efficient multiple-TCP mechanism using Android smartphones for real-time, remote-control video stream transmission over Wi-Fi. The wireless video stream transmission mechanism can be applied in various areas such as real-time server stream transmission, movable drones, disaster robotics, and real-time security monitoring systems. Moreover, data must be transmitted in a timely fashion in situations such as medical emergencies, security surveillance, and disaster prevention, and our parallel TCP transmission system can play an important role in these areas. Therefore, we designed and implemented a parallel TCP transmission (parallel stream) for efficient real-time video streaming services. Finally, we evaluated the proposed mechanism using parallel TCP transmission under various environments with performance analysis.

A Novel VLSI Architecture for Parallel Adaptive Dictionary-Base Text Compression (가변 적응형 사전을 이용한 텍스트 압축방식의 병렬 처리를 위한 VLSI 구조)

  • Lee, Yong-Doo;Kim, Hie-Cheol;Kim, Jung-Gyu
    • The Transactions of the Korea Information Processing Society / v.4 no.6 / pp.1495-1507 / 1997
  • Among a number of approaches to text compression, adaptive dictionary schemes based on a sliding window have been used very frequently due to their high performance. The LZ77 algorithm is the most efficient algorithm that implements such adaptive schemes for practical text compression. This paper presents a VLSI architecture designed for processing the LZ77 algorithm in parallel. Compared with other VLSI architectures developed so far, the proposed architecture provides a more viable solution to high performance with regard to its throughput, efficient implementation of the VLSI systolic arrays, and hardware scalability. Indeed, without being affected by the size of the sliding window, our system has a complexity of O(N) for both compression and decompression and also requires a small wafer area, where N is the size of the input text.
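
For reference, the sketch below is a minimal software version of sliding-window LZ77 with (offset, length, next character) tokens. It is a plain sequential illustration, not the paper's VLSI design; the function names and the default window size of 16 are assumptions. The loop over window positions `j` is the search whose cost grows with the window in software, and it is the kind of comparison work that parallel hardware could spread across cells; how the paper's systolic array actually does so is not described in the abstract.

```python
def lz77_compress(data: str, window: int = 16):
    """Encode data as a list of (offset, length, next_char) tokens."""
    tokens = []
    i = 0
    while i < len(data):
        best_len, best_off = 0, 0
        # Search every start position inside the sliding window.
        for j in range(max(0, i - window), i):
            k = 0
            # Stop one character early so a literal always follows the match.
            while i + k < len(data) - 1 and data[j + k] == data[i + k]:
                k += 1
            if k > best_len:
                best_len, best_off = k, i - j
        tokens.append((best_off, best_len, data[i + best_len]))
        i += best_len + 1
    return tokens


def lz77_decompress(tokens) -> str:
    """Rebuild the text by copying from already decoded output."""
    buf = []
    for off, length, ch in tokens:
        start = len(buf) - off
        # Copy one character at a time so overlapping matches work.
        for k in range(length):
            buf.append(buf[start + k])
        buf.append(ch)
    return "".join(buf)
```

This naive search takes time proportional to the window size for each input position, which is the dependence on window size that the abstract says the proposed architecture avoids.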
