• 제목/요약/키워드: parallel algorithm

검색결과 2,007건 처리시간 0.028초

Parallel Prefix Computation and Sorting on a Recursive Dual-Net

  • Li, Yamin;Peng, Shietung;Chu, Wanming
    • Journal of Information Processing Systems
    • /
    • 제7권2호
    • /
    • pp.271-286
    • /
    • 2011
  • In this paper, we propose efficient algorithms for parallel prefix computation and sorting on a recursive dual-net. The recursive dual-net $RDN^k$(B) for k > 0 has $(2n_o)^{2K}/2$ nodes and $d_0$ + k links per node, where $n_0$ and $d_0$ are the number of nod es and the node-degree of the base-network B, respectively. Assume that each node holds one data item, the communication and computation time complexities of the algorithm for parallel prefix computation on $RDN^k$(B), k > 0, are $2^{k+1}-2+2^kT_{comm}(0)$ and $2^{k+1}-2+2^kT_{comp}(0)$, respectively, where $T_{comm}(0)$ and $T_{comp}(0)$ are the communication and computation time complexities of the algorithm for parallel prefix computation on the base-network B, respectively. The algorithm for parallel sorting on $RDN^k$(B) is restricted on B = $Q_m$ where $Q_m$ is an m-cube. Assume that each node holds a single data item, the sorting algorithm runs in $O((m2^k)^2)$ computation steps and $O((km2^k)^2)$ communication steps, respectively.

병렬계산을 위한 부하분산 알고리즘의 병렬화 (Parallelization of A Load balancing Algorithm for Parallel Computations)

  • In-Jae Hwang
    • 융합신호처리학회논문지
    • /
    • 제5권3호
    • /
    • pp.236-242
    • /
    • 2004
  • 본 논문에서는 병렬프로그램을 효율적으로 수행하는데 필수적인 부하분산을 위한 기존 알고리즘의 부하분산 오버헤드를 최소화하기 위하여 이 알고리즘의 병렬화 방법을 제시한다. 병렬계산 모델로는 동적으로 변하는 트리구조를 들었으며 이러한 계산은 많은 응용분야에서 찾아볼 수 있다. 부하분산 알고리즘은 통신비용을 정해진 한도 이내로 유지하면서 프로세서간 계산부하를 최대한 균등하게 분산시키고자 시도한다. 이 알고리즘이 메쉬와 하이퍼큐브 구조에서 어떻게 병렬화 될 수 있는가를 상세히 보이고 각각의 경우에 대하여 시간상 복잡도를 분석하여 기존의 알고리즘보다 여러가지 오버헤드가 개선되었음을 증명한다.

  • PDF

FPGA 상에서 OpenCL을 이용한 병렬 문자열 매칭 구현과 최적화 방향 (Parallel String Matching and Optimization Using OpenCL on FPGA)

  • 윤진명;최강일;김현진
    • 전기학회논문지
    • /
    • 제66권1호
    • /
    • pp.100-106
    • /
    • 2017
  • In this paper, we propose a parallel optimization method of Aho-Corasick (AC) algorithm and Parallel Failureless Aho-Corasick (PFAC) algorithm using Open Computing Language (OpenCL) on Field Programmable Gate Array (FPGA). The low throughput of string matching engine causes the performance degradation of network process. Recently, many researchers have studied the string matching engine using parallel computing. FPGA's vendors offer a parallel computing platform using OpenCL. In this paper, we apply the AC and PFAC algorithm on DE1-SoC board with Cyclone V FPGA, where the optimization that considers FPGA architecture is performed. Experiments are performed considering global id, local id, local memory, and loop unrolling optimizations using PFAC algorithm. The performance improvement using loop unrolling is 129 times greater than AC algorithm that not adopt loop unrolling. The performance improvements using loop unrolling are 1.1, 0.2, and 1.5 times greater than those using global id, local id, and local memory optimizations mentioned above.

ON A SECOND ORDER PARALLEL VARIABLE TRANSFORMATION APPROACH

  • Pang, Li-Ping;Xia, Zun-Quan;Zhang, Li-Wei
    • Journal of applied mathematics & informatics
    • /
    • 제11권1_2호
    • /
    • pp.201-213
    • /
    • 2003
  • In this paper we present a second order PVT (parallel variable transformation) algorithm converging to second order stationary points for minimizing smooth functions, based on the first order PVT algorithm due to Fukushima (1998). The corresponding stopping criterion, descent condition and descent step for the second order PVT algorithm are given.

전력 조류 계산의 분산 병렬처리기법에 관한 연구 (A Development of Distributed Parallel Processing algorithm for Power Flow analysis)

  • 이춘모;이해기
    • 대한전기학회:학술대회논문집
    • /
    • 대한전기학회 2001년도 학술대회 논문집 전문대학교육위원
    • /
    • pp.134-140
    • /
    • 2001
  • Parallel processing has the potential to be cost effectively used on computationally intense power system problems. But this technology is not still available is not only parallel computer but also parallel processing scheme. Testing these algorithms to ensure accuracy, and evaluation of their performance is also an issue. Although a significant amount of parallel algorithms of power system problem have been developed in last decade, actual testing on processor architectures lies in the beginning stages. This paper presents the parallel processing algorithm to supply the base being able to treat power flow by newton's method by the distributed memory type parallel computer. This method is to assign and to compute teared blocks of sparse matrix at each parallel processors. The testing to insure accuracy of developed method have been done on serial computer by trying to simulate a parallel environment.

  • PDF

문화재 검색을 위한 병렬처리기 구조 (A Parallel Processor System for Cultural Assets Image Retrieval)

  • 윤희준;이형;한기선;박종원
    • 한국멀티미디어학회논문지
    • /
    • 제1권2호
    • /
    • pp.154-161
    • /
    • 1998
  • 본 연구에서는 영상 데이터를 실시간으로 처리하기 위해 병렬처리기 및 병렬 기억장치 구조를 제안하였으며, 많은 영상 데이터 중에서 문화재 영상을 대상으로 하였다. 기존의 영상 인식 및 검색 알고리즘은 병렬화하기에 적합하지 않아서 병렬화 가능한 알고리즘을 제안하였고, 제안된 알고리즘을 부분적으로 병렬화하고, 적합한 병렬 기억장치 및 병렬처리기 구조를 제안한 다음 CADENCE사의 모의실험 패키지인 Verilog-XL을 이용해서 모의실험 하였다. 그 결과 81배의 속도향상을 볼 수 있었다.

  • PDF

비 압축 블록으로 구성된 제어 헤더 삽입을 통한 압축 해제 호환성 있는 병렬 처리 Deflate 알고리즘 제안 (Proposal for Decoding-Compatible Parallel Deflate Algorithm by Inserting Control Header Composed of Non-Compressed Blocks)

  • 김정훈
    • 정보처리학회논문지:소프트웨어 및 데이터공학
    • /
    • 제12권5호
    • /
    • pp.207-216
    • /
    • 2023
  • 본 연구에서는 압축 해제 호환성을 갖춘 병렬 처리 Deflate 압축 알고리즘을 구현하기 위하여 병렬 압축 및 압축 해제에 필수적인 정보를 복수의 비 압축 블록(Non-Compression Block)내의 버려지는 영역(Disposed Bit Area)에 저장하는 방식으로 구성한 컨트롤 헤더를 삽입하는 새로운 방식을 제안하였다. 이를 통해 기존 압축 해제 프로그램과 완벽한 호환성을 유지하면서도 병렬 압축 및 병렬 압축 해제가 가능하도록 하였다. 또한 순차 처리방식 대비 압축 시간을 최대 71.2% 절감하였고 병렬 압축해제 시간을 65.7%까지 절감하였다. 특히 Deflate 알고리즘의 구조적 제약으로 인해 병렬 압축 해제는 불가능하다고 알려져 있으나, 제안하는 방식을 탑재한 디코더로 알고리즘 수준에서 고속의 병렬 압축 해제가 가능하고, 호환성을 유지하여 동일한 압축 데이터를 기존의 압축 해제 프로그램으로도 정상적 압축 해제가 가능함을 확인하였다.

분산형 FP트리를 활용한 병렬 데이터 마이닝 (Parallel Data Mining with Distributed Frequent Pattern Trees)

  • 조두산;김동승
    • 대한전자공학회:학술대회논문집
    • /
    • 대한전자공학회 2003년도 하계종합학술대회 논문집 V
    • /
    • pp.2561-2564
    • /
    • 2003
  • Data mining is an effective method of the discovery of useful information such as rules and previously unknown patterns existing in large databases. The discovery of association rules is an important data mining problem. We have developed a new parallel mining called Distributed Frequent Pattern Tree (abbreviated by DFPT) algorithm on a distributed shared nothing parallel system to detect association rules. DFPT algorithm is devised for parallel execution of the FP-growth algorithm. It needs only two full disk data scanning of the database by eliminating the need for generating the candidate items. We have achieved good workload balancing throughout the mining process by distributing the work equally to all processors. We implemented the algorithm on a PC cluster system, and observed that the algorithm outperformed the Improved Count Distribution scheme.

  • PDF

모듈형 3상 무정전 전원장치의 병렬 운전을 위한 주종 제어 알고리즘 (A Master and Slave Control Algorithm for Parallel Operation of Modular 3-Phase UPS System)

  • 이태영;조영훈;임승범;안창헌
    • 전력전자학회:학술대회논문집
    • /
    • 전력전자학회 2016년도 전력전자학술대회 논문집
    • /
    • pp.479-480
    • /
    • 2016
  • This paper introduces a master and slave control algorithm for parallel operation of UPS system. If each module of UPS system control the output voltage and filter inductor current in parallel operation, it occur unbalanced output power each module. To operate UPS system parallel, it need a algorithm that control output power of modules. A master and slave control algorithm is helpful to balance output power of modules by controlling output current. The effect of a master and slave control algorithm is proved by simulations.

  • PDF

세포 외곽선 추출 알고리즘의 병렬화 (Parallelization of Cell Contour Line Extraction Algorithm)

  • 이호석;유숙현;권희용
    • 한국멀티미디어학회논문지
    • /
    • 제18권10호
    • /
    • pp.1180-1188
    • /
    • 2015
  • In this paper, a parallel cell contour line extraction algorithm using CUDA, which has no inner contour lines, is proposed. The contour of a cell is very important in a cell image analysis. It could be obtained by a conventional serial contour tracing algorithm or parallel morphology operation. However, the cell image has various damages in acquisition or dyeing process. They could be turn into several inner contours, which make a cell image analysis difficult. The proposed algorithm introduces a min-max coordinates table into each CUDA thread block, and removes the inner contour in parallel. It is 4.1 to 7.6 times faster than a conventional serial contour tracing algorithm.