• Title/Summary/Keyword: Parallel Processing Algorithm


Parallel Design and Implementation of Shot Boundary Detection Algorithm (샷 경계 탐지 알고리즘의 병렬 설계와 구현)

  • Lee, Joon-Goo;Kim, SeungHyun;You, Byoung-Moon;Hwang, DooSung
    • Journal of the Institute of Electronics and Information Engineers, v.51 no.2, pp.76-84, 2014
  • As the number of high-density videos increases, parallel processing approaches are necessary to process large-scale video data. When processing video data requires thousands of simple operations, GPU-based parallel processing is preferred over CPU-based parallel processing because it reduces the time and space complexity of the computation. This paper studies the parallel design and implementation of a shot-boundary detection algorithm. The proposed algorithm uses pixel brightness comparisons and global histogram data among the blocks of frames, and computing these data is highly parallel. To exploit this parallelism, the pixel-brightness and histogram computations are designed in parallel and implemented on an NVIDIA GPU. The GPU-based shot detection method is tested with 10 videos from the collection of the National Archives of Korea. In the experiments, the detection rate is similar to that of the CPU-based algorithm, while the computation is about 10 times faster.
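
The abstract names two per-frame features, block brightness and a global histogram, but not the exact decision rule. The following is a minimal sequential sketch under stated assumptions: the block size, the 16-bin histogram, all three thresholds, and the rule that both tests must fire are hypothetical. It shows why the work parallelizes well: every block mean and every pixel's histogram increment is independent, which is what a GPU version would map to threads (with atomic adds for the histogram bins).

```python
def block_means(frame, bs):
    """Mean brightness of every bs x bs block; each block is independent work."""
    h, w = len(frame), len(frame[0])
    return [sum(frame[y][x] for y in range(by, by + bs) for x in range(bx, bx + bs)) / (bs * bs)
            for by in range(0, h - bs + 1, bs) for bx in range(0, w - bs + 1, bs)]

def histogram(frame, bins=16):
    """Global brightness histogram of 8-bit pixels (0-255)."""
    hist = [0] * bins
    for row in frame:
        for p in row:
            hist[p * bins // 256] += 1
    return hist

def detect_cuts(frames, bs=4, block_thresh=30, block_ratio=0.5, hist_thresh=0.5):
    """Return indices t where a cut between frames t-1 and t is detected.

    Hypothetical rule: a cut needs both many changed blocks and a large
    histogram difference; all thresholds are illustrative assumptions.
    """
    n_pixels = len(frames[0]) * len(frames[0][0])
    prev_means, prev_hist = block_means(frames[0], bs), histogram(frames[0])
    cuts = []
    for t in range(1, len(frames)):
        means, hist = block_means(frames[t], bs), histogram(frames[t])
        # fraction of blocks whose mean brightness changed noticeably
        changed = sum(abs(a - b) > block_thresh for a, b in zip(means, prev_means)) / len(means)
        # normalized histogram difference in [0, 1]
        hist_diff = sum(abs(a - b) for a, b in zip(hist, prev_hist)) / (2 * n_pixels)
        if changed > block_ratio and hist_diff > hist_thresh:
            cuts.append(t)
        prev_means, prev_hist = means, hist
    return cuts
```

For three dark 8x8 frames followed by two bright ones, `detect_cuts` reports a single cut at index 3.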

Parallel Prefix Computation and Sorting on a Recursive Dual-Net

  • Li, Yamin;Peng, Shietung;Chu, Wanming
    • Journal of Information Processing Systems, v.7 no.2, pp.271-286, 2011
  • In this paper, we propose efficient algorithms for parallel prefix computation and sorting on a recursive dual-net. The recursive dual-net $RDN^k(B)$ for $k > 0$ has $(2n_0)^{2^k}/2$ nodes and $d_0 + k$ links per node, where $n_0$ and $d_0$ are the number of nodes and the node degree of the base network $B$, respectively. Assuming that each node holds one data item, the communication and computation time complexities of the parallel prefix algorithm on $RDN^k(B)$, $k > 0$, are $2^{k+1} - 2 + 2^k T_{comm}(0)$ and $2^{k+1} - 2 + 2^k T_{comp}(0)$, respectively, where $T_{comm}(0)$ and $T_{comp}(0)$ are the communication and computation time complexities of parallel prefix computation on the base network $B$. The parallel sorting algorithm on $RDN^k(B)$ is restricted to $B = Q_m$, where $Q_m$ is an $m$-cube. Assuming that each node holds a single data item, the sorting algorithm runs in $O((m2^k)^2)$ computation steps and $O((km2^k)^2)$ communication steps.
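
The complexity bounds above are stated in terms of a prefix algorithm on the base network $B$. As a sketch of that base case for $B = Q_m$ (the case to which sorting is restricted), the standard hypercube prefix algorithm below simulates one exchange per dimension, so it uses $m$ exchange steps. This is the textbook hypercube scan, not the paper's $RDN^k(B)$ algorithm; `hypercube_prefix` is a hypothetical name.

```python
import operator

def hypercube_prefix(values, op=operator.add):
    """Inclusive prefix on a simulated hypercube Q_m; node i holds values[i].

    op only needs to be associative (it may be non-commutative).
    """
    n = len(values)
    if n == 0 or n & (n - 1):
        raise ValueError("node count must be a power of two")
    prefix = list(values)   # running prefix held by each node
    total = list(values)    # combined value of the subcube each node belongs to
    d = 1
    while d < n:            # one communication step per dimension: log2(n) = m steps
        new_prefix, new_total = prefix[:], total[:]
        for i in range(n):
            partner = i ^ d                  # neighbour across this dimension
            if partner < i:                  # partner's subcube precedes ours
                new_prefix[i] = op(total[partner], prefix[i])
                new_total[i] = op(total[partner], total[i])
            else:
                new_total[i] = op(total[i], total[partner])
        prefix, total = new_prefix, new_total
        d <<= 1
    return prefix
```

With `hypercube_prefix([1, 2, 3, 4])` the nodes end up holding `[1, 3, 6, 10]`; passing strings shows that operand order is preserved.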

Parallelization of A Load balancing Algorithm for Parallel Computations (병렬계산을 위한 부하분산 알고리즘의 병렬화)

  • In-Jae Hwang
    • Journal of the Institute of Convergence Signal Processing, v.5 no.3, pp.236-242, 2004
  • In this paper, we propose an approach to parallelizing a load balancing algorithm that has been shown to be very effective in distributing workload for parallel computations. Load balancing algorithms are required to execute parallel programs efficiently. As the parallel computation model, we use a dynamically growing tree structure, which arises in many application problems. The load balancing algorithm tries to balance the workload among processors while keeping the communication cost under a certain limit. We show how the load balancing algorithm is effectively parallelized on mesh and hypercube interconnection networks, and analyze the time complexity for each case to show that the parallel algorithm actually reduces the various overheads.
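
The abstract does not give the algorithm itself. To illustrate what balancing workload on a hypercube looks like, here is a sketch of the generic dimension-exchange scheme, which is a swapped-in, well-known technique and not the paper's method: in each of the $\log_2 n$ steps, neighbours across one dimension split their combined integer load. It ignores the communication-cost limit the paper imposes; `dimension_exchange` is a hypothetical name.

```python
def dimension_exchange(loads):
    """One sweep of dimension-exchange balancing on a simulated hypercube.

    loads[i] is the integer task count on node i; len(loads) must be 2^m.
    """
    n = len(loads)
    if n == 0 or n & (n - 1):
        raise ValueError("number of nodes must be a power of two")
    loads = list(loads)
    d = 1
    while d < n:                      # one exchange step per dimension
        for i in range(n):
            j = i ^ d                 # neighbour across this dimension
            if i < j:
                total = loads[i] + loads[j]
                loads[i] = total - total // 2   # lower node takes the odd task
                loads[j] = total // 2
        d <<= 1
    return loads
```

Starting from `[8, 0, 0, 0]`, one sweep yields `[2, 2, 2, 2]`; the total number of tasks is always preserved.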

Full Search Equivalent Motion Estimation Algorithm for General-Purpose Multi-Core Architectures

  • Park, Chun-Su
    • Journal of the Semiconductor & Display Technology, v.12 no.3, pp.13-18, 2013
  • Motion estimation is a key technique of modern video processing that significantly improves coding efficiency by exploiting the temporal redundancy between successive frames. Thread-level parallelism is a promising way to accelerate motion estimation on multithreaded general-purpose processors. In this paper, we propose a parallel motion estimation algorithm that parallelizes the motion search process of the current H.264/AVC encoder. The proposed algorithm is implemented using the OpenMP application programming interface (API) and can be easily integrated into the current encoder. Experimental results show that the proposed parallel algorithm reduces the processing time of motion estimation by up to 65.08% without any penalty in rate-distortion (RD) performance.
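
Why can a parallel search remain "full search equivalent"? Each block's exhaustive SAD search is deterministic and independent of the others, so distributing blocks across threads cannot change the chosen vectors. The sketch below assumes block-level parallelism with a thread pool standing in for an OpenMP `parallel for`. The paper may partition the work differently, and the block size and search range are illustrative. In CPython the GIL prevents real speedup for this pure-Python loop, so it only mirrors the structure.

```python
from concurrent.futures import ThreadPoolExecutor

def sad(cur, ref, bx, by, dx, dy, bs):
    """Sum of absolute differences between a current block and a displaced reference block."""
    return sum(abs(cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x])
               for y in range(bs) for x in range(bs))

def full_search(cur, ref, bx, by, bs, search_range):
    """Exhaustive search; returns (best_sad, dx, dy), first minimum in scan order."""
    h, w = len(ref), len(ref[0])
    best = None
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            if not (0 <= by + dy <= h - bs and 0 <= bx + dx <= w - bs):
                continue                              # candidate falls outside the frame
            cost = sad(cur, ref, bx, by, dx, dy, bs)
            if best is None or cost < best[0]:
                best = (cost, dx, dy)
    return best

def estimate_frame(cur, ref, bs=4, search_range=2, workers=4):
    """Search all blocks in parallel; keys are (bx, by) block origins."""
    h, w = len(cur), len(cur[0])
    blocks = [(bx, by) for by in range(0, h - bs + 1, bs) for bx in range(0, w - bs + 1, bs)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = pool.map(lambda b: full_search(cur, ref, b[0], b[1], bs, search_range), blocks)
        return dict(zip(blocks, results))
```

Because every block search is independent, the parallel result is identical to calling `full_search` block by block in serial order.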

FPGA Design of a Parallel Canny Edge Detector with Optimized Local Buffers (로컬 버퍼 최적화를 통한 병렬 처리 캐니 경계선 검출기의 FPGA 설계)

  • Ingi Min;Suhyun Sim;Seungwon Hwang;Sunhee Kim
    • Journal of the Semiconductor & Display Technology, v.22 no.4, pp.59-65, 2023
  • Edge detection is one of the most fundamental operations in image processing and computer vision. The Canny edge detection algorithm performs well and is widely used; however, its complexity makes real-time processing difficult. In this study, the equations required by the algorithm were simplified to facilitate hardware implementation, and the calculation speed was increased by using a parallel structure. In particular, the size and management of the local buffers were chosen with parallel processing and the filter size in mind, so that data could be processed without bottlenecks. The design was written in Verilog and implemented on an FPGA to verify its operation and performance.
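
The abstract does not state which equations were simplified. Two common hardware-friendly simplifications are replacing $\sqrt{G_x^2 + G_y^2}$ with $|G_x| + |G_y|$ and quantizing the gradient direction with integer comparisons instead of `atan2`. The sketch below assumes both, with the ratios 13/32 and 77/32 as assumed fixed-point stand-ins for tan 22.5° and tan 67.5°, and streams 3x3 windows from a buffer that holds only three rows, mimicking line buffers. It is an illustration, not the authors' design.

```python
from collections import deque

SOBEL_X = ((-1, 0, 1), (-2, 0, 2), (-1, 0, 1))
SOBEL_Y = ((-1, -2, -1), (0, 0, 0), (1, 2, 1))

def windows_3x3(rows):
    """Yield (y, x, window) for interior pixels while buffering only 3 rows."""
    buf = deque(maxlen=3)                       # stands in for hardware line buffers
    for y, row in enumerate(rows):
        buf.append(row)
        if len(buf) == 3:
            r0, r1, r2 = buf
            for x in range(1, len(row) - 1):
                yield y - 1, x, (r0[x - 1:x + 2], r1[x - 1:x + 2], r2[x - 1:x + 2])

def quantize_direction(gx, gy):
    """Integer-only direction bins: 0 (horizontal gradient), 90, or a diagonal.

    Which diagonal is called 45 vs 135 depends on the image axis convention.
    """
    ax, ay = abs(gx), abs(gy)
    if 32 * ay <= 13 * ax:                      # |gy/gx| <= ~tan(22.5 deg)
        return 0
    if 32 * ay >= 77 * ax:                      # |gy/gx| >= ~tan(67.5 deg)
        return 90
    return 45 if (gx > 0) == (gy > 0) else 135

def gradient(win):
    """Sobel gradient with the |gx| + |gy| magnitude approximation."""
    gx = sum(SOBEL_X[i][j] * win[i][j] for i in range(3) for j in range(3))
    gy = sum(SOBEL_Y[i][j] * win[i][j] for i in range(3) for j in range(3))
    return abs(gx) + abs(gy), quantize_direction(gx, gy)
```

On a vertical step edge every interior window yields a purely horizontal gradient (direction bin 0).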

A Genetic Algorithm for Minimizing Total Tardiness with Non-identical Parallel Machines (이종 병렬설비 공정의 납기지연시간 최소화를 위한 유전 알고리즘)

  • Choi, Yu-Jun
    • Journal of Korean Society of Industrial and Systems Engineering, v.38 no.1, pp.65-73, 2015
  • This paper considers a parallel-machine scheduling problem with dedicated and common processing machines and solves it with a genetic algorithm (GA). Setup times, processing times, and order lot sizes are assumed to differ from machine to machine. The GA is proposed to minimize total tardiness. The GA is compared with heuristic algorithms including EDD (Earliest Due Date), SPT (Shortest Processing Time), and LPT (Longest Processing Time). The effectiveness and suitability of the GA are evaluated through computational experiments.
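
A minimal GA sketch for this setting follows, under stated assumptions that are not taken from the paper: chromosomes are job permutations, decoded by assigning each job to the eligible machine that completes it earliest. Dedicated jobs have a single eligible machine. Operators are order crossover, swap mutation, tournament selection, and two-member elitism, and lot sizes are folded into the processing times. Seeding the population with the EDD order plus elitism guarantees the result is never worse than that EDD decoding.

```python
import random

def total_tardiness(perm, proc, setup, due, eligible):
    """Decode a job permutation by list scheduling and return total tardiness.

    proc[m][j] and setup[m][j] are machine-dependent; eligible[j] lists the
    machines allowed to run job j (one machine for a dedicated job).
    """
    free = [0] * len(proc)                  # time at which each machine becomes idle
    tardiness = 0
    for j in perm:
        m = min(eligible[j], key=lambda k: free[k] + setup[k][j] + proc[k][j])
        free[m] += setup[m][j] + proc[m][j]
        tardiness += max(0, free[m] - due[j])
    return tardiness

def order_crossover(p1, p2, rng):
    """OX: copy a slice of p1, fill the remaining positions in p2's order."""
    n = len(p1)
    a, b = sorted(rng.sample(range(n), 2))
    kept = set(p1[a:b + 1])
    child = [None] * n
    child[a:b + 1] = p1[a:b + 1]
    rest = (g for g in p2 if g not in kept)
    for i in range(n):
        if child[i] is None:
            child[i] = next(rest)
    return child

def genetic_algorithm(proc, setup, due, eligible, pop_size=30, generations=200,
                      mutation_rate=0.2, seed=0):
    rng = random.Random(seed)
    n = len(due)

    def fitness(p):
        return total_tardiness(p, proc, setup, due, eligible)

    def tournament(pop):
        return min(rng.sample(pop, 3), key=fitness)

    edd = sorted(range(n), key=lambda j: due[j])       # seed with the EDD order
    pop = [edd] + [rng.sample(range(n), n) for _ in range(pop_size - 1)]
    for _ in range(generations):
        pop.sort(key=fitness)
        nxt = pop[:2]                                  # elitism: keep the two best
        while len(nxt) < pop_size:
            child = order_crossover(tournament(pop), tournament(pop), rng)
            if rng.random() < mutation_rate:           # swap mutation
                i, k = rng.sample(range(n), 2)
                child[i], child[k] = child[k], child[i]
            nxt.append(child)
        pop = nxt
    best = min(pop, key=fitness)
    return best, fitness(best)
```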

A Disk Allocation Scheme for High-Performance Parallel File System (고성능 병렬화일 시스템을 위한 디스크 할당 방법)

  • Park, Kee-Hyun
    • The Transactions of the Korea Information Processing Society, v.7 no.9, pp.2827-2835, 2000
  • In recent years, much attention has been focused on improving the processing speed of I/O devices, which is essential in data-intensive areas such as multimedia data processing, and research on high-performance parallel file systems is one such effort. In this paper, an efficient disk allocation scheme is proposed for high-performance parallel file systems. Specifically, the parallelism of a parallel disk file is defined using the data declustering characteristics of that file. Using this concept, the proposed scheme calculates the appropriate degree of data declustering across disks for each parallel file, so as to obtain the maximum throughput when more than one parallel file is used at the same time. Because computing the maximum throughput becomes too complex as the number of parallel files increases, an approximate disk allocation algorithm is also proposed. The approximate algorithm is very simple and gives especially good results when the I/O workload is high. In addition, it is shown that the approximate algorithm provides the optimal disk allocation for maximum throughput when the arrival rate of I/O requests is infinite.
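
To make "degree of data declustering" concrete, here is a small sketch assuming simple round-robin placement of a file's blocks over `degree` consecutive disks. The placement policy and function names are hypothetical, and the paper's throughput model is not reproduced. A request for consecutive blocks can proceed on at most `degree` disks in parallel.

```python
def decluster(num_blocks, degree, start_disk, total_disks):
    """Round-robin placement of a file's blocks over `degree` consecutive disks."""
    if not 1 <= degree <= total_disks:
        raise ValueError("degree must be between 1 and total_disks")
    return [(start_disk + i % degree) % total_disks for i in range(num_blocks)]

def request_parallelism(placement, first_block, length):
    """Number of distinct disks touched when reading blocks [first_block, first_block + length)."""
    return len(set(placement[first_block:first_block + length]))
```

For example, `decluster(6, 3, 6, 8)` places the blocks on disks `[6, 7, 0, 6, 7, 0]`, so a 4-block read touches 3 disks. A larger degree raises per-file parallelism but leaves fewer disks free for other files used at the same time, which is the trade-off the allocation scheme balances.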

A Ray-Tracing Algorithm Based On Processor Farm Model (프로세서 farm 모델을 이용한 광추적 알고리듬)

  • Lee, Hyo Jong
    • Journal of the Korea Computer Graphics Society, v.2 no.1, pp.24-30, 1996
  • The ray tracing method, one of many photorealistic rendering techniques, requires heavy computation to synthesize images. Parallel processing can be used to reduce this computation time. A parallel ray tracing algorithm has been implemented and executed for various images on transputer systems. To develop a scalable parallel algorithm, a processor farming technique is used. Because each image is divided and distributed to the farming processors, scalability and load balancing are achieved naturally in the proposed algorithm. The efficiency of the parallel algorithm reaches up to 95% on nine processors. However, the best size of a distributed task is much larger for simple images, because each pixel requires less computation. Efficiency degrades for large-granularity tasks because of the load imbalance they cause. Overall, transputer systems behave as scalable parallel processing systems with a good cost-performance ratio.
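
The processor-farm idea can be sketched as a shared task pool that idle workers pull from. Load balances itself because fast workers simply take more tasks, and `rows_per_task` is the granularity knob whose trade-off the abstract describes. In this sketch, threads stand in for transputer nodes, and the `shade(x, y)` callback is a hypothetical placeholder for tracing one pixel's ray.

```python
import queue
import threading

def farm_render(shade, width, height, rows_per_task, workers=4):
    """Processor-farm sketch: row bands go into a task pool, and workers pull them on demand."""
    tasks = queue.Queue()
    for y0 in range(0, height, rows_per_task):          # task granularity = rows_per_task
        tasks.put((y0, min(y0 + rows_per_task, height)))
    image = [None] * height

    def worker():
        while True:
            try:
                y0, y1 = tasks.get_nowait()             # ask the farmer for the next task
            except queue.Empty:
                return                                  # no work left
            for y in range(y0, y1):                     # bands are disjoint, so writes never collide
                image[y] = [shade(x, y) for x in range(width)]

    threads = [threading.Thread(target=worker) for _ in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return image
```

Large bands mean fewer task requests but a worse tail imbalance at the end of the frame, matching the efficiency loss reported for large-granularity tasks.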

A Parallel Processor System for Cultural Assets Image Retrieval (문화재 검색을 위한 병렬처리기 구조)

  • Yoon, Hee-Jun;Lee, Hyung;Han, Ki-Sun;Park, Jong-Won
    • Journal of Korea Multimedia Society, v.1 no.2, pp.154-161, 1998
  • This paper proposes a parallel processor system that executes a cultural-asset image recognition and retrieval algorithm in real time. A serial algorithm developed for the parallel processor system is parallelized. The parallel processor system consists of a control unit, 100 PEs (processing elements), and 10 Park's multi-access memory systems, each with 11 memory modules. The parallel processor system is simulated with CADENCE Verilog-XL, a hardware simulation package. The simulation produces the same results as the serial algorithm, and the parallel algorithm is 81 times faster than the serial one. The proposed parallel processor system is quite effective for cultural-asset image processing.
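
Why use 11 memory modules for groups of 10 PEs? With more modules than simultaneous accesses, a skewed address-to-module mapping can keep every access pattern of interest conflict-free. The sketch assumes the skewed form $\mu(i, j) = (iq + j) \bmod m$ with $m = 11$ and a hypothetical skew $q = 5$, and checks rows of 10, columns of 10, and 2x5 blocks. Both the mapping details and the pattern set are assumptions, not necessarily the exact design used in the paper.

```python
M = 11  # memory modules per memory system (from the abstract)
Q = 5   # assumed skew factor

def module(i, j, m=M, q=Q):
    """Skewed mapping of pixel (i, j) to a memory module."""
    return (i * q + j) % m

def access_patterns(i, j):
    """Ten-element access patterns starting at (i, j) (assumed pattern set)."""
    return {
        "row": [(i, j + k) for k in range(10)],
        "col": [(i + k, j) for k in range(10)],
        "block_2x5": [(i + a, j + b) for a in range(2) for b in range(5)],
    }

def conflict_free(cells):
    """True if every cell of one parallel access lives in a different module."""
    mods = [module(i, j) for i, j in cells]
    return len(set(mods)) == len(mods)
```

Rows hit consecutive residues, columns hit multiples of 5 (invertible mod 11), and a 2x5 block hits offsets 0 to 9, so all three patterns are conflict-free from any starting pixel.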

Efficient Parallel Block-layered Nonbinary Quasi-cyclic Low-density Parity-check Decoding on a GPU

  • Thi, Huyen Pham;Lee, Hanho
    • IEIE Transactions on Smart Processing and Computing, v.6 no.3, pp.210-219, 2017
  • This paper proposes a modified min-max algorithm (MMMA) for nonbinary quasi-cyclic low-density parity-check (NB-QC-LDPC) codes and an efficient parallel block-layered decoder architecture for the algorithm on a graphics processing unit (GPU) platform. The algorithm removes multiplications over the Galois field (GF) in the merger step to reduce decoding latency without any performance loss. Implementing NB-QC-LDPC decoding on a GPU improves both flexibility and scalability. To perform the decoding on the GPU, data and memory structures suitable for parallel computing are designed. Implementation results for NB-QC-LDPC codes over GF(32) and GF(64) show that, compared with existing methods, parallel block-layered decoding on a GPU provides a faster decoding runtime and a higher coding gain at a bit error rate as low as $10^{-10}$ and a frame error rate as low as $10^{-7}$.
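
For context on the check-node computation the modified algorithm targets, here is a sketch of the standard (unmodified) min-max check-node update over GF($2^p$), where field addition is XOR. The update uses a forward-backward pass, so each outgoing message combines all other incoming ones. Assumptions: messages are non-negative reliability vectors (0 = most likely symbol) that have already been permuted by the parity-check coefficients, so no GF multiplications appear. This is not the paper's MMMA.

```python
INF = float("inf")

def minmax_combine(l1, l2):
    """Min-max convolution over GF(2^p): out[a] = min over a1 of max(l1[a1], l2[a ^ a1])."""
    q = len(l1)
    return [min(max(l1[a1], l2[a ^ a1]) for a1 in range(q)) for a in range(q)]

def check_node_update(msgs):
    """Forward-backward min-max update; output k excludes incoming message k."""
    d, q = len(msgs), len(msgs[0])
    identity = [0] + [INF] * (q - 1)      # neutral element for non-negative messages
    fwd = [identity]                       # fwd[k] combines msgs[0..k-1]
    for m in msgs[:-1]:
        fwd.append(minmax_combine(fwd[-1], m))
    bwd = [identity]                       # built from the end, then reversed
    for m in reversed(msgs[1:]):
        bwd.append(minmax_combine(bwd[-1], m))
    bwd.reverse()                          # bwd[k] combines msgs[k+1..d-1]
    return [minmax_combine(fwd[k], bwd[k]) for k in range(d)]
```

Over GF(4), `minmax_combine([0, 1, 2, 3], [0, 3, 1, 2])` gives `[0, 1, 1, 1]`. The GPU decoder's parallelism comes from running many such independent check-node updates, and the per-symbol loops, concurrently.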