• Title/Summary/Keyword: parallel algorithm (병렬 알고리즘)

Search Result 1,326, Processing Time 0.029 seconds

Design of Parallel Processing of Lane Detection System Based on Multi-core Processor (멀티코어를 이용한 차선 검출 병렬화 시스템 설계)

  • Lee, Hyo-Chan;Moon, Dai-Tchul;Park, In-hag;Heo, Kang
    • Journal of the Korea Institute of Information and Communication Engineering / v.20 no.9 / pp.1778-1784 / 2016
  • We improved performance by parallelizing lane detection algorithms. Lane detection, as an intelligent assistance system, sounds an alarm or corrects the steering in response to lane departure. Four algorithms are implemented in the following order: Gaussian filtering to remove interference, gray conversion to simplify images, Sobel edge detection to find the lane regions, and the Hough transform to detect straight lines. Among parallelization methods, data-level parallelism is easy to design but still suffers from a bottleneck. A high-speed data-level parallelism scheme is proposed to reduce this bottleneck, which resulted in a noticeable performance improvement. When the parallel algorithm was applied to actual black-box road video, the single-core measurement was approximately 30 frames/sec. With octa-core parallelism, data-level parallelism reached approximately 100 frames/sec, and the highest performance came close to 150 frames/sec.
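
To illustrate what data-level parallelism means for one stage of such a pipeline, the following is a minimal sketch that splits the rows of a grayscale image into bands and runs a Sobel filter on each band in a separate worker. It is not the authors' implementation: the function names `sobel_rows` and `sobel_parallel`, the list-of-lists image format, and the `|gx| + |gy|` magnitude are assumptions made for illustration. Python threads are used only to show the decomposition; because of the interpreter lock they will not reproduce the multi-core speedups reported in the abstract.

```python
from concurrent.futures import ThreadPoolExecutor


def sobel_rows(gray, r0, r1):
    """Sobel magnitude (|gx| + |gy|) for rows r0..r1-1; border pixels stay 0."""
    h, w = len(gray), len(gray[0])
    out = []
    for r in range(r0, r1):
        row = [0] * w
        if 0 < r < h - 1:
            for c in range(1, w - 1):
                # Horizontal gradient: right column minus left column.
                gx = (gray[r - 1][c + 1] + 2 * gray[r][c + 1] + gray[r + 1][c + 1]
                      - gray[r - 1][c - 1] - 2 * gray[r][c - 1] - gray[r + 1][c - 1])
                # Vertical gradient: bottom row minus top row.
                gy = (gray[r + 1][c - 1] + 2 * gray[r + 1][c] + gray[r + 1][c + 1]
                      - gray[r - 1][c - 1] - 2 * gray[r - 1][c] - gray[r - 1][c + 1])
                row[c] = abs(gx) + abs(gy)
        out.append(row)
    return out


def sobel_parallel(gray, workers=4):
    """Split a non-empty image into row bands and filter each band in its own worker."""
    h = len(gray)
    step = (h + workers - 1) // workers  # rows per band, rounded up
    bands = [(r, min(r + step, h)) for r in range(0, h, step)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Each band only reads the shared input image, so no locking is needed.
        parts = list(pool.map(lambda b: sobel_rows(gray, b[0], b[1]), bands))
    return [row for part in parts for row in part]
```

Each band reads one row above and below its own range from the shared input, so the bands can be processed independently and concatenated in order.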

Parallel Computation for Extended Edit Distances Using the Shared Memory on GPU (GPU의 공유메모리를 활용한 확장편집거리 병렬계산)

  • Kim, Youngho;Na, Joong Chae;Sim, Jeong Seop
    • KIPS Transactions on Computer and Communication Systems / v.4 no.7 / pp.213-218 / 2015
  • Given two strings X and Y (|X|=m, |Y|=n) over an alphabet ${\Sigma}$, the extended edit distance between X and Y can be computed using dynamic programming in O(mn) time and space. Recently, a parallel algorithm that takes O(m+n) time and O(mn) space using m threads to compute the extended edit distance between X and Y was presented. In this paper, we present an improved parallel algorithm using the shared memory on GPU. The experimental results show that our parallel algorithm runs about 19~25 times faster than the previous parallel algorithm.
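
One common way to reach an O(m+n) parallel time bound for this kind of dynamic programming table is to process it by anti-diagonals, since every cell on one anti-diagonal depends only on the two previous ones. The sketch below shows that ordering for the plain (Levenshtein) edit distance, run sequentially. It is an assumption that the cited algorithms follow this scheme, the extended edit distance itself is not implemented here, and the name `edit_distance_wavefront` is hypothetical.

```python
def edit_distance_wavefront(x: str, y: str) -> int:
    """Plain edit distance, filling the DP table one anti-diagonal at a time."""
    m, n = len(x), len(y)
    # d[i][j] = edit distance between x[:i] and y[:j]
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i
    for j in range(n + 1):
        d[0][j] = j
    # Cells with i + j == k depend only on diagonals k-1 and k-2, so the
    # inner loop could be distributed over threads; there are m + n - 1 diagonals.
    for k in range(2, m + n + 1):
        for i in range(max(1, k - n), min(m, k - 1) + 1):
            j = k - i
            cost = 0 if x[i - 1] == y[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # match or substitution
    return d[m][n]
```

For example, `edit_distance_wavefront("kitten", "sitting")` returns 3.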

Join Operation of Parallel Database System with Large Main Memory (대용량 메모리를 가진 병렬 데이터베이스 시스템의 조인 연산)

  • Park, Young-Kyu
    • Journal of the Korea Society of Computer and Information / v.12 no.3 / pp.51-58 / 2007
  • The shared-nothing multiprocessor architecture scales well and has therefore been adopted in many multiprocessor database systems. However, if the data are not uniformly distributed across the processors, the load becomes unbalanced and overall system performance deteriorates. This is the data skew problem, which usually occurs in parallel hash joins. Balancing the load before performing the join resolves this problem efficiently and improves overall system performance. In this paper, we present an algorithm that exploits very large memory to reduce disk access overhead during load balancing and to solve the data skew problem efficiently. We also present an analytical model of the new algorithm and the results of a performance study comparing it with other algorithms for handling data skew.

Load Balancing Using Mean-Field Annealing and Genetic Algorithms in Parallel Processing (병렬처리에서 평균장 어닐링과 유전자 알고리즘을 이용한 부하균형)

  • 홍철의;박경모
    • Proceedings of the Korean Information Science Society Conference / 2003.10a / pp.364-366 / 2003
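  • This paper introduces a new solution to the load balancing problem, which is important in parallel processing. The proposed mapping algorithm is a heuristic load balancing technique that combines mean-field annealing and genetic algorithms. We developed a performance evaluation simulation that measures the improvement ratio of the combined algorithm against three other algorithms, and the experimental results show that our method improves on existing ones in both solution quality and execution time.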

Implementation of Parallel Volume Rendering Using the Sequential Shear-Warp Algorithm (순차 Shear-Warp 알고리즘을 이용한 병렬볼륨렌더링의 구현)

  • Kim, Eung-Kon
    • The Transactions of the Korea Information Processing Society / v.5 no.6 / pp.1620-1632 / 1998
  • This paper presents a fast parallel algorithm for volume rendering and its implementation in C and MPL (MasPar Programming Language) on the 4,096-processor MasPar MP-2 machine. The algorithm is a parallelization of Lacroute's sequential shear-warp algorithm, currently acknowledged to be the fastest sequential volume rendering algorithm. It reduces communication overhead by using a sheared-space partitioning scheme and a load balancing technique based on load estimates from the previous iteration, and it reduces the number of voxels to be processed by using a run-length encoded volume data structure. Actual performance is 3 to 4 frames/second on a human brain scan dataset of $128\times128\times128$ voxels. Because the algorithm is scalable, performance of 12-16 frames/second is expected on the 16,384-processor MasPar MP-2 machine. Implementation on more current SIMD or MIMD architectures is expected to provide 30~60 frames/second on large volumes.

Construction of a CPU Cluster and Implementation of a 3-D Domain Decomposition Parallel FDTD Algorithm (CPU 클러스터 구축 및 3차원 공간분할 병렬 FDTD 알고리즘 구현)

  • Park, Sungmin;Chu, Kwang-Uk;Ju, Saehoon;Park, Yoon-Mi;Kim, Ki-Baek;Jung, Kyung-Young
    • The Journal of Korean Institute of Electromagnetic Engineering and Science / v.25 no.3 / pp.357-364 / 2014
  • In this work, we construct a CPU cluster to implement a parallel finite-difference time-domain (FDTD) algorithm for fast electromagnetic analyses. Compared to a single-processor FDTD counterpart, this parallel FDTD algorithm can significantly reduce the computational time and can also analyze electrically larger structures. The parallel FDTD algorithm needs communication between neighboring processors, which is performed with the MPI (Message Passing Interface) library, and a 3-D domain decomposition is employed to decrease the communication time between neighboring processors. We investigate the speedup factor of the CPU-cluster-based parallel FDTD algorithm relative to a single-processor FDTD for the normal mode and the hypermode, and finally analyze an electrically large concrete structure with the developed parallel algorithm.

A study on HFC-based GA (HFC 기반 유전자알고리즘에 관한 연구)

  • Kim, Gil-Seong;Choe, Jeong-Nae;O, Seong-Gwan;Kim, Hyeon-Gi
    • Proceedings of the Korean Institute of Intelligent Systems Conference / 2007.04a / pp.341-344 / 2007
  • In this paper, we apply the hierarchical fair competition concept to parallel genetic algorithms to implement the Hierarchical Fair Competition Genetic Algorithm (HFCGA). We also propose an improved parallel genetic algorithm by applying and analyzing various crossover operators that perform well in the Real-Coded Genetic Algorithm (RCGA), such as arithmetic crossover, modified simple crossover, and UNDX (unimodal normal distribution crossover). The UNDX operator uses multiple parents and generates offspring that follow a normal distribution close to the geometric center of the parents. We implement an HFCGA model using UNDX and demonstrate its superior performance by applying it to functions commonly used in function parameter optimization problems.
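
Of the crossover operators named in this abstract, arithmetic crossover is the simplest to state: each child gene is a weighted average of the corresponding parent genes. The following is a minimal sketch of the textbook form of that operator only. The function name, the `alpha` parameter, and the choice to return two complementary children are assumptions for illustration, not details taken from the paper, and UNDX and the HFC hierarchy are not shown.

```python
import random


def arithmetic_crossover(parent1, parent2, alpha=None):
    """Blend two real-coded parents gene by gene with weight alpha in [0, 1]."""
    if alpha is None:
        alpha = random.random()  # assumption: one random weight per crossover
    child1 = [alpha * a + (1 - alpha) * b for a, b in zip(parent1, parent2)]
    child2 = [(1 - alpha) * a + alpha * b for a, b in zip(parent1, parent2)]
    return child1, child2
```

With `alpha=0.25`, the parents `[0.0, 4.0]` and `[4.0, 0.0]` produce the children `[3.0, 1.0]` and `[1.0, 3.0]`; every child gene lies between the two parent genes.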

Efficient Implementation of an Extreme Eigenvalue Problem on Cray T3E (Cray T3E에서 극한 고유치문제의 효과적인 수행)

  • 김선경
    • Proceedings of the Korean Institute of Intelligent Systems Conference / 2000.11a / pp.480-483 / 2000
  • Many engineering applications require the smallest or largest eigenvalues of large sparse matrices, and projection onto a Krylov subspace is commonly used for this purpose. The Lanczos method can be used for symmetric matrices and the biorthogonal Lanczos method for nonsymmetric matrices. These existing algorithms are not effective on newly proposed parallel processing systems. Among parallel computers with many processors, distributed memory systems in particular need to reduce the time required for data communication between processors. In this paper, we modify the existing Lanczos algorithm to reduce the number of synchronization points and increase the granularity for parallelization, which reduces the data communication time on the Cray T3E, an MPP. When many processors are used, the modified algorithm shows better speedup than the existing algorithm.

A Parallel Task Allocation Algorithm improved Duplication Steps (중복 단계를 개선한 병렬 타스크 할당 알고리즘)

  • 이재관;김창수
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference / 1998.05a / pp.342-347 / 1998
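  • Among scheduling techniques for parallel programs, task duplication algorithms are a relatively new approach compared with list scheduling algorithms. Task duplication schedules a program by duplicating critical tasks assigned to one processor onto other processors, so that those tasks are executed redundantly. The critical tasks then reside on the same processor, which reduces the start time of other tasks and ultimately shortens the schedule length of the whole program. The goal of scheduling a parallel program is to minimize the schedule length and to reduce the complexity of scheduling. However, schedule length and complexity trade off against each other. Compared with existing duplication algorithms, this paper presents an algorithm with equal or lower complexity that does not increase the schedule length, with the aim of improving compile time.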

Obtaining 1-pixel Width Line Using an Enhanced Parallel Thinning Algorithm (병렬 세선화 알고리즘을 이용한 1-화소 굵기의 선 구하기)

  • Kwon, Jun-Sik
    • Journal of the Institute of Electronics Engineers of Korea SP / v.46 no.1 / pp.1-6 / 2009
  • A thinning algorithm is a very important factor in recognizing characters, figures, and drawings. Until comparatively recently, thinning algorithms have been proposed using various methods. In this paper, we examine the problems of the ZS (Zhang and Suen), LW (Lu and Wang), and WHF (Wang, Hui and Fleming) algorithms, which are parallel thinning algorithms. In a parallel thinning algorithm, the first processing step does not influence the second. The ZS algorithm loses pixels on slanted lines, and the LW algorithm does not produce one-pixel width on slanted lines. We therefore propose an advanced parallel thinning algorithm that keeps the pixels connected to each other and preserves end points.
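
As a reference point for the ZS algorithm discussed in this abstract, the following is a minimal sketch of the classic Zhang and Suen two-subiteration thinning scheme. It is not the enhanced algorithm the paper proposes. The function name `zhang_suen_thin`, the list-of-lists binary image (1 = foreground), and the assumption that the outermost border is background are choices made for illustration. Within each subiteration all deletion decisions are made from the same image state and applied together, which is what makes the scheme parallel.

```python
def zhang_suen_thin(image):
    """Thin a binary image (lists of 0/1) with the Zhang-Suen scheme; returns a copy."""
    img = [row[:] for row in image]
    rows, cols = len(img), len(img[0])

    def neighbours(r, c):
        # P2..P9, clockwise starting from the pixel directly above (r, c).
        return [img[r - 1][c], img[r - 1][c + 1], img[r][c + 1], img[r + 1][c + 1],
                img[r + 1][c], img[r + 1][c - 1], img[r][c - 1], img[r - 1][c - 1]]

    changed = True
    while changed:
        changed = False
        for step in (0, 1):
            to_clear = []
            for r in range(1, rows - 1):
                for c in range(1, cols - 1):
                    if img[r][c] != 1:
                        continue
                    p = neighbours(r, c)
                    b = sum(p)  # number of foreground neighbours
                    # Number of 0 -> 1 transitions around the ring P2, ..., P9, P2.
                    a = sum(p[i] == 0 and p[(i + 1) % 8] == 1 for i in range(8))
                    p2, p4, p6, p8 = p[0], p[2], p[4], p[6]
                    if step == 0:
                        ok = p2 * p4 * p6 == 0 and p4 * p6 * p8 == 0
                    else:
                        ok = p2 * p4 * p8 == 0 and p2 * p6 * p8 == 0
                    if 2 <= b <= 6 and a == 1 and ok:
                        to_clear.append((r, c))
            # Apply all deletions of this subiteration at once.
            for r, c in to_clear:
                img[r][c] = 0
            changed = changed or bool(to_clear)
    return img
```

The function never adds pixels, and running it again on its own output leaves the image unchanged.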