• Title/Abstract/Keywords: Parallel Algorithms

Search results: 655 items (processing time: 0.029 s)

Parallel Approximate String Matching with k-Mismatches for Multiple Fixed-Length Patterns in DNA Sequences on Graphics Processing Units

  • 호 티엔 루안;김현진;오승록
    • 전기학회논문지 / Vol. 66, No. 6 / pp.955-961 / 2017
  • In this paper, we propose a parallel approximate string matching algorithm with k-mismatches for multiple fixed-length patterns (PMASM) in DNA sequences. PMASM is developed from parallel single-pattern approximate string matching algorithms to efficiently calculate the Hamming distances for multiple fixed-length patterns. In the preprocessing phase of PMASM, all target patterns are binary encoded and stored in a look-up memory. With each input character from the input string, the Hamming distances between a substring and all patterns can be updated at the same time, based on the binary encoding information in the look-up memory. Moreover, PMASM adopts graphics processing units (GPUs) to process the data computations in parallel. This paper presents three PMASM implementation methods on GPUs: thread PMASM, block-thread PMASM, and shared-mem PMASM. The shared-mem PMASM method shows how to make effective use of the GPU's parallel capacity, and it also exploits special features of the CUDA (Compute Unified Device Architecture) memory structure to optimize performance. In experiments with DNA sequences, the proposed PMASM on GPU is 385, 77, and 64 times faster than the traditional naive algorithm, the shift-add algorithm, and the single-thread PMASM implementation on CPU, respectively. On the same NVIDIA GPU model, the proposed approach improves performance by up to 44% and 21% compared with the naive and shift-add algorithms, respectively.
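
The look-up idea in the abstract above can be illustrated with a small sequential sketch. This is not the PMASM algorithm or its GPU implementation. It is a plain CPU illustration that assumes an A/C/G/T alphabet and stores, for every character and pattern position, a bitmask marking which patterns mismatch there. The names `build_lookup` and `k_mismatch_search` are hypothetical.

```python
from typing import Dict, List, Tuple

ALPHABET = "ACGT"  # assumption: input contains only these four characters


def build_lookup(patterns: List[str]) -> Dict[str, List[int]]:
    """Preprocess patterns: lookup[c][j] has bit p set if patterns[p][j] != c."""
    m = len(patterns[0])
    if any(len(p) != m for p in patterns):
        raise ValueError("all patterns must have the same length")
    lookup = {c: [0] * m for c in ALPHABET}
    for c in ALPHABET:
        for j in range(m):
            mask = 0
            for p, pat in enumerate(patterns):
                if pat[j] != c:
                    mask |= 1 << p
            lookup[c][j] = mask
    return lookup


def k_mismatch_search(text: str, patterns: List[str], k: int) -> List[Tuple[int, int, int]]:
    """Return (start, pattern_index, distance) for every window within k mismatches."""
    m = len(patterns[0])
    lookup = build_lookup(patterns)
    hits = []
    for i in range(len(text) - m + 1):
        dist = [0] * len(patterns)
        for j in range(m):
            # One table read yields the mismatch bit for every pattern at once.
            mask = lookup[text[i + j]][j]
            for p in range(len(patterns)):
                dist[p] += (mask >> p) & 1
        hits.extend((i, p, d) for p, d in enumerate(dist) if d <= k)
    return hits


print(k_mismatch_search("ACGTACGA", ["ACGT", "ACGA"], 1))
# → [(0, 0, 0), (0, 1, 1), (4, 0, 1), (4, 1, 0)]
```

Each text window is independent of the others, which is what makes this kind of computation a natural candidate for one-thread-per-window parallelism on a GPU.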

A Study on the Highly Parallel Multiple-Valued Logic Circuit Design with DTG Properties

  • 나기수;신부식;최재석;박춘명;김흥수
    • 전자공학회논문지C / Vol. 36C, No. 6 / pp.27-36 / 1999
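  • This paper proposes an algorithm for designing highly parallel multiple-valued logic circuits based on a DTG, in which the relationship between inputs and outputs is represented as a tree structure. After identifying the problems of the algorithm proposed by Nakajima et al., the paper introduces a mathematical analysis based on the tree structure for the design of optimized partitioned operation circuits. The proposed algorithm has the advantage that it can design circuits even for DTGs with arbitrary nodes, which could not be designed with the algorithm of Nakajima et al. By comparing the algorithm of Nakajima et al. with the proposed algorithm from the viewpoint of circuit design, the paper shows that the proposed algorithm yields a more optimized design for every DTG. The usefulness of the proposed algorithm is also demonstrated through examples.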

Parallel Computing Environment for R with on Supercomputer Systems

  • 이상열;원중호
    • 한국경영과학회지 / Vol. 39, No. 4 / pp.19-31 / 2014
  • We study parallel processing techniques for the R programming language using high-performance computing technology. In this study, we used a massively parallel computing system with 25,408 CPU cores. We conducted a performance evaluation of a distributed memory system using MPI and of a shared memory system using OpenMP. Our findings are summarized as follows. First, for some particular algorithms, parallel processing is about 150 times faster than serial processing in R. Second, the distributed memory system gets faster as the number of nodes increases, while the shared memory system is limited in its performance improvement by the number of CPUs available in a single system.

A Parallel Control of Full-bridge Converter for Fuel Cell Generation

  • 나재형;장수진;박찬흥;원충연;이병국
    • 한국조명전기설비학회: Conference Proceedings / 한국조명전기설비학회 2007 Spring Conference Proceedings / pp.235-240 / 2007
  • A large-power fuel cell generation system needs parallel operation of DC-DC boost converters. Therefore, this paper proposes parallel operation algorithms of DC-DC boost converters for a large-scale 250 kW fuel cell generation system and describes the operating principle and the control method in detail. The paper uses a maximum current sharing method as the parallel operation method and a phase-shift full-bridge DC-DC converter as the DC-DC boost converter. Simulation and experimental results on two 500 W prototype converter modules show that the parallel operation method can be applied to the 250 kW power converter.

Low-Complexity Triple-Error-Correcting Parallel BCH Decoder

  • Yeon, Jaewoong;Yang, Seung-Jun;Kim, Cheolho;Lee, Hanho
    • JSTS:Journal of Semiconductor Technology and Science / Vol. 13, No. 5 / pp.465-472 / 2013
  • This paper presents a low-complexity triple-error-correcting parallel Bose-Chaudhuri-Hocquenghem (BCH) decoder architecture and its efficient design techniques. A novel modified step-by-step (m-SBS) decoding algorithm, which significantly reduces computational complexity, is proposed for the parallel BCH decoder. In addition, a determinant calculator and an error locator are proposed to reduce hardware complexity. Specifically, a sharing syndrome factor calculator and a self-error detection scheme are proposed. The multi-channel multi-parallel BCH decoder using the proposed m-SBS algorithm and design techniques has considerably less hardware complexity and latency than decoders using conventional algorithms. For a 16-channel 4-parallel (1020, 990) BCH decoder over GF($2^{12}$), the proposed design can reduce complexity by at least 23% compared to conventional architectures.

Efficient m-step Generalization of Iterative Methods

  • 김선경
    • 한국산업정보학회논문지 / Vol. 11, No. 5 / pp.163-169 / 2006
  • In order to use parallel computers in specific applications, algorithms need to be developed and mapped onto parallel computer architectures. Main memory access in shared memory systems and global communication in message passing systems degrade computation speed. In this paper, it is found that the m-step generalization of the block Lanczos method enhances parallel properties by forming blocks of search direction vectors simultaneously. QR factorization, which lowers the speed on parallel computers, is not necessary in the m-step block Lanczos method. The m-step method minimizes synchronization points, which in turn minimizes global communication and main memory access compared with the standard methods.

Parallel Implementation of Nonlinear Analysis Program of PSC Frame Using MPI

  • 이재석;최규천
    • 한국전산구조공학회: Conference Proceedings / 한국전산구조공학회 2001 Spring Conference Proceedings / pp.61-68 / 2001
  • A parallel nonlinear analysis program for prestressed concrete frames is migrated to a PC cluster system and to a massively parallel processing system, the CRAY T3E, using MPI. The PC cluster system is configured with Pentium Ⅲ class PCs and Fast Ethernet. The CRAY T3E system is composed of a set of nodes, each containing one Processing Element (PE), a memory subsystem, and its distributed memory interconnect network. Parallel computing algorithms are implemented in the element-wise processing parts, including the calculation of the stiffness matrix and element stresses, the determination of material states, the check of material failure, and the calculation of unbalanced loads. The parallel performance of the migrated program is evaluated through typical numerical examples.

Hybrid Parallel Genetic Algorithm for Traveling Salesman Problem

  • 김기태;전건욱
    • 대한안전경영과학회지 / Vol. 13, No. 3 / pp.107-114 / 2011
  • The traveling salesman problem is to minimize the total cost for a traveling salesman who wants to make a tour of a finite number of cities, given the cost of travel between each pair of them, visiting each city exactly once before returning home. The traveling salesman problem is known to be NP-hard, and it needs a lot of computing time to find the optimal solution, so heuristics are developed more frequently than optimal algorithms. This study suggests a hybrid parallel genetic algorithm (HPGA) for the traveling salesman problem. The suggested algorithm combines a parallel genetic algorithm, nearest neighbor search, and 2-opt. The suggested algorithm has been tested on 7 problems in TSPLIB and compared with the results of existing methods (heuristics, meta-heuristics, hybrid, and parallel). Experimental results show that HPGA could obtain good solutions in total travel distance minimization.
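
Two of the ingredients named in the abstract above, nearest neighbor search and 2-opt, can be sketched briefly. This is a minimal illustration only, assuming Euclidean distances between 2D points. It does not reproduce the paper's HPGA, and the parallel genetic algorithm part is not shown. The function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def tour_length(tour: Sequence[int], pts: Sequence[Point]) -> float:
    """Total length of the closed tour, including the return to the start."""
    n = len(tour)
    return sum(math.dist(pts[tour[i]], pts[tour[(i + 1) % n]]) for i in range(n))


def nearest_neighbor_tour(pts: Sequence[Point], start: int = 0) -> List[int]:
    """Greedy construction: always move to the closest unvisited city."""
    unvisited = set(range(len(pts)))
    unvisited.remove(start)
    tour = [start]
    while unvisited:
        last = pts[tour[-1]]
        # Ties are broken by the lower city index to keep the result deterministic.
        nxt = min(unvisited, key=lambda c: (math.dist(last, pts[c]), c))
        unvisited.remove(nxt)
        tour.append(nxt)
    return tour


def two_opt(tour: Sequence[int], pts: Sequence[Point]) -> List[int]:
    """Local search: reverse segments while doing so shortens the tour."""
    best = list(tour)
    improved = True
    while improved:
        improved = False
        for i in range(1, len(best) - 1):
            for j in range(i + 1, len(best)):
                candidate = best[:i] + best[i:j + 1][::-1] + best[j + 1:]
                if tour_length(candidate, pts) < tour_length(best, pts) - 1e-12:
                    best = candidate
                    improved = True
    return best


cities = [(0.0, 0.0), (1.0, 1.0), (1.0, 0.0), (0.0, 1.0)]
initial = nearest_neighbor_tour(cities)
print(initial, tour_length(two_opt(initial, cities), cities))
```

In a hybrid scheme of the kind the abstract describes, a greedy tour like this could seed part of the initial population and 2-opt could refine offspring, but how the paper actually combines them is not stated here.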

A Master and Slave Control Strategy for Parallel Operation of Three-Phase UPS Systems with Different Ratings

  • 이우철;현동석
    • 전력전자학회논문지 / Vol. 9, No. 4 / pp.341-349 / 2004
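  • This paper analyzes the problems that arise in the parallel operation of UPS systems, which is used to increase total system capacity and to improve reliability according to load importance, and surveys the various control techniques that have been studied to solve them. In addition, a current-sharing control technique for three-phase UPS systems using master-slave control is studied in order to solve the problems caused by the L, C output filters when units with different ratings are operated in parallel, a case to which existing methods could not be applied. Simulations and experiments were carried out to verify the validity of the proposed control algorithm.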

High Speed Turbo Product Code Decoding Algorithm

  • 최덕군;이인기;정지원
    • 한국통신학회논문지 / Vol. 30, No. 6C / pp.442-449 / 2005
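  • Recently, interest has been growing in the Turbo Product Code (TPC), which is less complex to implement than turbo codes and comes close to the Shannon limit at high code rates. This paper proposes three algorithms for high-speed TPC decoding, intended for very-high-speed communication systems. First, it proposes a Turbo Product Code decoder whose decoding structure operates in parallel, instead of decoding rows and columns serially as in conventional Turbo Product Code decoders. Second, it proposes an iteration-stopping algorithm. Finally, a P-Parallel algorithm decodes P rows and P columns in parallel. Simulation results show that decoding delay is reduced compared with the conventional scheme, while performance is nearly the same as that of the serial scheme. In addition, a VHDL model was built on the basis of the high-speed algorithms, and timing simulation was used to analyze the memory requirements and the improvement in decoding speed.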