• Title/Summary/Keyword: parallel algorithms

Parallel Approximate String Matching with k-Mismatches for Multiple Fixed-Length Patterns in DNA Sequences on Graphics Processing Units (GPU을 이용한 다중 고정 길이 패턴을 갖는 DNA 시퀀스에 대한 k-Mismatches에 의한 근사적 병열 스트링 매칭)

  • Ho, ThienLuan;Kim, HyunJin;Oh, SeungRohk
    • The Transactions of The Korean Institute of Electrical Engineers, v.66 no.6, pp.955-961, 2017
  • In this paper, we propose a parallel approximate string matching algorithm with k-mismatches for multiple fixed-length patterns (PMASM) in DNA sequences. PMASM is developed from parallel single-pattern approximate string matching algorithms to efficiently calculate the Hamming distances for multiple patterns of a fixed length. In the preprocessing phase of PMASM, all target patterns are binary encoded and stored in a look-up memory. With each input character from the input string, the Hamming distances between a substring and all patterns are updated at the same time based on the binary encoding information in the look-up memory. Moreover, PMASM uses graphics processing units (GPUs) to process the data computations in parallel. This paper presents three PMASM implementation methods on GPUs: thread PMASM, block-thread PMASM, and shared-mem PMASM. The shared-mem PMASM method illustrates how to make effective use of the GPU's parallel capacity, and it also exploits special features of the CUDA (Compute Unified Device Architecture) memory structure to optimize performance. In the experiments with DNA sequences, the proposed PMASM on GPU is 385, 77, and 64 times faster than the traditional naive algorithm, the shift-add algorithm, and the single-thread PMASM implementation on CPU, respectively. On the same NVIDIA GPU model, the performance of the proposed approach is enhanced by up to 44% and 21% compared with the naive and shift-add algorithms, respectively.
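
The look-up idea in the abstract above can be illustrated with a small sequential sketch. This is not the authors' PMASM or its GPU kernels: the function names, the table layout, and the assumption that the text contains only the characters A, C, G, and T are all hypothetical choices made for illustration. The table stores, for each character and pattern position, one mismatch flag per pattern, so a single input character updates the running Hamming distances of all patterns at once.

```python
def build_lookup(patterns, alphabet="ACGT"):
    """Precompute mismatch flags (hypothetical layout, not the paper's encoding).

    lookup[c][j] is a tuple with one 0/1 flag per pattern:
    1 if that pattern's character at position j differs from c.
    """
    m = len(patterns[0])
    if any(len(p) != m for p in patterns):
        raise ValueError("all patterns must have the same length")
    return {
        c: [tuple(int(p[j] != c) for p in patterns) for j in range(m)]
        for c in alphabet
    }


def k_mismatch_search(text, patterns, k):
    """Return (start, pattern_index, distance) for every window within k mismatches."""
    m = len(patterns[0])
    n_pat = len(patterns)
    lookup = build_lookup(patterns)
    windows = []  # running mismatch counts of the open windows, oldest first
    matches = []
    for i, c in enumerate(text):
        windows.append([0] * n_pat)  # a new window starts at text position i
        row = lookup[c]  # assumes c is one of A, C, G, T
        # The window opened j characters ago sees c at its position j.
        for j, counts in enumerate(reversed(windows)):
            flags = row[j]
            for p in range(n_pat):
                counts[p] += flags[p]
        if len(windows) == m:
            done = windows.pop(0)  # the oldest window is now complete
            start = i - m + 1
            for p, d in enumerate(done):
                if d <= k:
                    matches.append((start, p, d))
    return matches


if __name__ == "__main__":
    print(k_mismatch_search("ACGTACGT", ["ACGT", "ACGA"], k=1))
```

In a GPU version, the per-pattern and per-window updates are the natural candidates for parallel threads; how the paper actually maps them to threads, blocks, and shared memory is not specified in the abstract.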

A Study on the Highly Parallel Multiple-Valued Logic Circuit Design with DTG Properties (DTG의 性質을 갖는 高速竝列多値論理回路의 設計에 관한 硏究)

  • Na, Gi-Su;Shin, Boo-Sik;Choi, Jai-Sok;Park, Chun-Myoung;Kim, Heung-Soo
    • Journal of the Korean Institute of Telematics and Electronics C, v.36C no.6, pp.27-36, 1999
  • This paper proposes algorithms for designing highly parallel multiple-valued logic circuits for a DTG (Directed Tree Graph), which is represented by the tree-structured relationship between the inputs and outputs of its nodes. Because the conventional Nakajima algorithms have some problems, this paper introduces a mathematical analysis based on the tree structure to design optimized, locally computable circuits. With the proposed algorithms, it is possible to design circuits for a DTG with any number of nodes, which Nakajima's algorithms cannot do. In addition, by comparing circuit designs produced by Nakajima's algorithms with those produced by the proposed algorithms, we verify that the proposed algorithms optimize the circuit design in all cases of DTG. Some examples are shown to demonstrate the usefulness of the circuit design algorithms.

Parallel Computing Environment for R with on Supercomputer Systems (빅데이터 분석을 위한 슈퍼컴퓨터 환경에서 R의 병렬처리)

  • Lee, Sang Yeol;Won, Joong Ho
    • Journal of the Korean Operations Research and Management Science Society, v.39 no.4, pp.19-31, 2014
  • We study parallel processing techniques for the R programming language using high-performance computing technology. In this study, we used a massively parallel computing system with 25,408 CPU cores. We evaluated the performance of a distributed memory system using MPI and of a shared memory system using OpenMP. Our findings are summarized as follows. First, for some particular algorithms, parallel processing is about 150 times faster than serial processing in R. Second, the distributed memory system gets faster as the number of nodes increases, while the shared memory system is limited in its performance improvement by the number of CPUs in a single system.

A Parallel Control of Full-bridge Converter for Fuel Cell Generation (연료전지 발전용 풀-브리지 컨버터의 병렬제어)

  • Na, Jae-Hyeong;Jang, Su-Jin;Park, Chan-Heung;Won, Chung-Yuen;Lee, Byoung-Kuk
    • Proceedings of the Korean Institute of Illuminating and Electrical Installation Engineers Conference, 2007.05a, pp.235-240, 2007
  • A high-power fuel cell generation system requires parallel operation of DC-DC boost converters. This paper therefore proposes a parallel operation algorithm for the DC-DC boost converters of a large-scale 250 kW fuel cell generation system and describes the operating principle and control method in detail. A maximum current sharing method is used for parallel operation, and a phase-shift full-bridge DC-DC converter serves as the DC-DC boost converter. Simulation and experimental results on two 500 W prototype converter modules show that the parallel operation method can be applied to the 250 kW power converter.

Low-Complexity Triple-Error-Correcting Parallel BCH Decoder

  • Yeon, Jaewoong;Yang, Seung-Jun;Kim, Cheolho;Lee, Hanho
    • JSTS: Journal of Semiconductor Technology and Science, v.13 no.5, pp.465-472, 2013
  • This paper presents a low-complexity triple-error-correcting parallel Bose-Chaudhuri-Hocquenghem (BCH) decoder architecture and its efficient design techniques. A novel modified step-by-step (m-SBS) decoding algorithm, which significantly reduces computational complexity, is proposed for the parallel BCH decoder. In addition, a determinant calculator and an error locator are proposed to reduce hardware complexity. Specifically, a sharing syndrome factor calculator and a self-error detection scheme are proposed. The multi-channel multi-parallel BCH decoder using the proposed m-SBS algorithm and design techniques has considerably less hardware complexity and latency than decoders using conventional algorithms. For a 16-channel 4-parallel (1020, 990) BCH decoder over GF($2^{12}$), the proposed design reduces complexity by at least 23% compared with conventional architectures.

Efficient m-step Generalization of Iterative Methods

  • Kim, Sun-Kyung
    • Journal of Korea Society of Industrial Information Systems, v.11 no.5, pp.163-169, 2006
  • To use parallel computers in specific applications, algorithms need to be developed and mapped onto parallel computer architectures. Main memory access in shared memory systems and global communication in message passing systems degrade computation speed. In this paper, it is found that the m-step generalization of the block Lanczos method enhances parallel properties by forming blocks of search direction vectors simultaneously. QR factorization, which lowers the speed on parallel computers, is not necessary in the m-step block Lanczos method. The m-step method minimizes synchronization points, which in turn minimizes global communication and main memory access compared with the standard methods.

Parallel Implementation of Nonlinear Analysis Program of PSC Frame Using MPI (MPI를 이용한 PSC 프레임 비선형해석 프로그램의 병렬화)

  • 이재석;최규천
    • Proceedings of the Computational Structural Engineering Institute Conference, 2001.04a, pp.61-68, 2001
  • A parallel nonlinear analysis program for prestressed concrete frames is migrated to a PC cluster system and to a massively parallel processing system, the CRAY T3E, using MPI. The PC cluster system is configured with Pentium Ⅲ class PCs and Fast Ethernet. The CRAY T3E system is composed of a set of nodes, each containing one Processing Element (PE), a memory subsystem, and its distributed memory interconnect network. Parallel computing algorithms are implemented for the element-wise processing parts, including the calculation of the stiffness matrix and element stresses, determination of material states, checking for material failure, and calculation of unbalanced loads. The parallel performance of the migrated program is evaluated through typical numerical examples.

Hybrid Parallel Genetic Algorithm for Traveling Salesman Problem (순회 판매원 문제를 위한 하이브리드 병렬 유전자 알고리즘)

  • Kim, Ki-Tae;Jeo, Geon-Wook
    • Journal of the Korea Safety Management & Science, v.13 no.3, pp.107-114, 2011
  • The traveling salesman problem is to minimize the total cost for a traveling salesman who, given a finite number of cities and the cost of travel between each pair of them, wants to make a tour that visits each city exactly once before returning home. The traveling salesman problem is known to be NP-hard, and finding the optimal solution takes a great deal of computing time, so heuristics are developed more frequently than optimal algorithms. This study suggests a hybrid parallel genetic algorithm (HPGA) for the traveling salesman problem. The suggested algorithm combines a parallel genetic algorithm, nearest neighbor search, and 2-opt. The suggested algorithm has been tested on 7 problems in TSPLIB and compared with the results of existing methods (heuristics, meta-heuristics, hybrid, and parallel). Experimental results show that HPGA can obtain good solutions in terms of total travel distance minimization.
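
Two of the components named in the abstract above, nearest neighbor search and 2-opt, can be sketched as follows. This is a minimal illustration under stated assumptions, not the HPGA itself: the parallel genetic algorithm is omitted, the function names are hypothetical, and `dist` is assumed to be a symmetric distance matrix given as a list of lists.

```python
import math


def tour_length(tour, dist):
    """Total length of the closed tour, including the return to the start."""
    n = len(tour)
    return sum(dist[tour[i]][tour[(i + 1) % n]] for i in range(n))


def nearest_neighbor_tour(dist, start=0):
    """Greedy construction: always move to the closest unvisited city."""
    unvisited = set(range(len(dist))) - {start}
    tour = [start]
    while unvisited:
        last = tour[-1]
        # Ties are broken by city index so the result is deterministic.
        nxt = min(unvisited, key=lambda c: (dist[last][c], c))
        tour.append(nxt)
        unvisited.remove(nxt)
    return tour


def two_opt(tour, dist):
    """Local search: reverse a segment whenever that shortens the tour."""
    tour = tour[:]
    n = len(tour)
    improved = True
    while improved:
        improved = False
        for i in range(n - 1):
            for j in range(i + 2, n):
                if i == 0 and j == n - 1:
                    continue  # these two edges share a city
                a, b = tour[i], tour[i + 1]
                c, d = tour[j], tour[(j + 1) % n]
                # Replace edges (a,b) and (c,d) with (a,c) and (b,d).
                delta = dist[a][c] + dist[b][d] - dist[a][b] - dist[c][d]
                if delta < -1e-12:
                    tour[i + 1:j + 1] = reversed(tour[i + 1:j + 1])
                    improved = True
    return tour


if __name__ == "__main__":
    points = [(0, 0), (1, 1), (1, 0), (0, 1)]
    dist = [[math.dist(p, q) for q in points] for p in points]
    start_tour = nearest_neighbor_tour(dist)
    best = two_opt(start_tour, dist)
    print(best, tour_length(best, dist))
```

In a hybrid scheme of this kind, a constructive heuristic typically seeds the initial population and a local search such as 2-opt refines individuals; how HPGA combines the three components is not detailed in the abstract.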

A Master and Slave Control Strategy for Parallel Operation of Three-Phase UPS Systems with Different Ratings (다른 정격용량을 가진 3상 UPS 시스템의 병렬운전을 위한 주종제어 기법)

  • 이우철;현동석
    • The Transactions of the Korean Institute of Power Electronics, v.9 no.4, pp.341-349, 2004
  • Parallel operation of Uninterruptible Power Supply (UPS) systems is used to increase the power capacity of the system or to secure higher reliability for critical loads. In conventional parallel operation, load-sharing control to maintain the current balance is the most important issue, since load sharing is very sensitive to mismatches between the components of each module, amplitude/phase differences, line impedance, the output LC filter, and so on. Various control algorithms are being researched to solve these problems. However, these methods cannot be applied to UPS systems with different ratings. In that case, a master and slave control algorithm for parallel operation is adequate. However, if the UPS ratings are different, the values of the passive filter components L and C are different, which affects the sharing of current. This paper presents general problems of conventional parallel operation systems and a control strategy for parallel operation with different ratings. The validity of the proposed control strategy is investigated through simulation and experiment in a parallel operation system with two 3-phase UPS systems.

High Speed Turbo Product Code Decoding Algorithm (고속 Turbo Product 부호 복호 알고리즘 및 구현에 관한 연구)

  • Choi Duk-Gun;Lee In-Ki;Jung Ji-Won
    • The Journal of Korean Institute of Communications and Information Sciences, v.30 no.6C, pp.442-449, 2005
  • In this paper, we introduce three simplified high-speed decoding algorithms for a turbo product decoder. First, a parallel decoder structure, in which the row and column decoders operate in parallel, is proposed. Second, the HAD (Hard Decision Aided) algorithm is used as an early-stopping algorithm. Lastly, the P-Parallel TPC decoder is a parallel decoding scheme that processes P rows and P columns in parallel instead of decoding them one by one as in the original scheme.
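
The early-stopping idea mentioned above can be sketched generically. The abstract does not describe the paper's exact rule, so this assumes the common hard-decision-aided criterion: stop iterating once the hard decisions of two consecutive iterations agree. The function name, the `iterate` callback, and the sign convention (a negative soft value means bit 1) are all hypothetical.

```python
def decode_with_early_stop(soft_in, iterate, max_iters=8):
    """Run an iterative decoder until its hard decisions stop changing.

    soft_in   : list of soft values (assumed convention: negative means bit 1)
    iterate   : hypothetical callback performing one full decoding iteration
    max_iters : upper bound on the number of iterations
    Returns (hard_decisions, iterations_used).
    """
    soft = list(soft_in)
    prev_bits = None
    bits = [int(v < 0) for v in soft]
    for it in range(1, max_iters + 1):
        soft = iterate(soft)
        bits = [int(v < 0) for v in soft]
        if bits == prev_bits:
            return bits, it  # decisions unchanged since the last iteration
        prev_bits = bits
    return bits, max_iters


if __name__ == "__main__":
    # Toy stand-in for a decoder iteration: it only scales the soft values,
    # so the hard decisions are stable and the loop stops at iteration 2.
    print(decode_with_early_stop([0.5, -1.2, 0.1, -0.3], lambda s: [2 * v for v in s]))
```

The benefit of such a criterion is that easy blocks finish in a few iterations, which raises average throughput without changing the decoder's worst-case iteration count.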