• Title/Summary/Keyword: sequential and parallel algorithms

Accelerating Soft-Decision Reed-Muller Decoding Using a Graphics Processing Unit

  • Uddin, Md. Sharif;Kim, Cheol Hong;Kim, Jong-Myon
    • Asia-pacific Journal of Multimedia Services Convergent with Art, Humanities, and Sociology / v.4 no.2 / pp.369-378 / 2014
  • The Reed-Muller code is an efficient scheme for multiple-bit error correction; however, the heavy computation inherent in its decoding process hinders its use in practical applications. To solve this problem, this paper proposes a graphics processing unit (GPU)-based parallel error control approach using Reed-Muller R(r, m) coding for real-time wireless communication systems. The GPU offers a high-throughput parallel computing platform that can achieve the desired high-performance decoding by exploiting the massive parallelism inherent in the algorithm. In addition, we compare the performance of the GPU-based approach with that of an equivalent sequential approach running on a traditional CPU. The experimental results indicate that the proposed GPU-based approach substantially outperforms the sequential approach in execution time, yielding a speedup of over 70×.
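For readers unfamiliar with soft-decision Reed-Muller decoding, a minimal sequential sketch follows. It covers only the first-order code R(1, m), which can be decoded by maximum-likelihood correlation using a fast Walsh-Hadamard transform. This restriction is an assumption made for illustration: the paper treats general R(r, m) codes, and the abstract does not describe its GPU kernels. The function names and the BPSK mapping (bit 0 -> +1.0, bit 1 -> -1.0) are hypothetical choices. Every butterfly within one transform stage is independent of the others, and that is the kind of parallelism a GPU implementation could exploit.

```python
def fwht(values):
    """Unnormalized fast Walsh-Hadamard transform of a length-2^m sequence."""
    a = list(values)
    n = len(a)
    h = 1
    while h < n:
        # Butterflies in the same stage touch disjoint index pairs, so they
        # could run concurrently (e.g. one GPU thread per butterfly).
        for start in range(0, n, 2 * h):
            for j in range(start, start + h):
                x, y = a[j], a[j + h]
                a[j], a[j + h] = x + y, x - y
        h *= 2
    return a


def rm1_encode(info, m):
    """Encode [u0, u1, ..., um] with the first-order Reed-Muller code R(1, m)."""
    u0, lin = info[0], info[1:]
    code = []
    for pos in range(1 << m):
        bit = u0
        for i in range(m):
            bit ^= lin[i] & (pos >> i) & 1  # add u_i * (bit i of pos) mod 2
        code.append(bit)
    return code


def rm1_soft_decode(received, m):
    """Maximum-likelihood soft decoding of R(1, m) via the Hadamard spectrum.

    `received` holds real-valued channel outputs under the assumed BPSK
    mapping bit 0 -> +1.0, bit 1 -> -1.0.
    """
    spectrum = fwht(received)
    # The largest-magnitude coefficient identifies the linear part u1..um;
    # its sign identifies the constant bit u0.
    k = max(range(1 << m), key=lambda i: abs(spectrum[i]))
    u0 = 0 if spectrum[k] >= 0 else 1
    return [u0] + [(k >> i) & 1 for i in range(m)]
```

For example, encoding `[1, 0, 1, 1]` with `m = 3`, mapping the bits to BPSK, and then weakening and flipping one symbol (`received[2] *= -0.3`) still decodes to `[1, 0, 1, 1]`, since R(1, 3) has minimum distance 4.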

A Study on the IC, Implementation of High Speed Multiplier for Real Time Digital Signal Processing (실시간 디지털 신호 처리용 고속 MULTIPLIER 단일칩화에 관한 연구)

  • 문대철;차균현
    • The Journal of Korean Institute of Communications and Information Sciences / v.15 no.7 / pp.628-637 / 1990
  • In this paper, we present an architecture for a high-speed CMOS multiplier that can be used for real-time digital signal processing. A synthesis method for designing highly parallel algorithms in VLSI is also presented. The parallel multiplier design is based on the modified Booth algorithm and Ling's algorithm. This paper addresses the design of a multiplier capable of accepting both data and coefficients in two's complement notation. The multiplier consists of an iterative array of sequential cells; such arrays are well suited to VLSI implementation because of their modularity and regularity. The Booth decoders can be fully tested using a relatively small number of test vectors.
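The modified Booth algorithm mentioned above recodes a two's-complement multiplier into radix-4 digits in {-2, -1, 0, 1, 2}, which halves the number of partial products. The following is a hypothetical behavioral model, assuming an even word length. It does not model Ling's adder, the cell array, or any hardware timing; the function names are illustrative.

```python
def booth_radix4_digits(y, bits):
    """Modified Booth (radix-4) recoding of a `bits`-bit two's-complement y.

    Digit d_i = -2*y[2i+1] + y[2i] + y[2i-1], with the implicit bit y[-1] = 0,
    so every digit lies in {-2, -1, 0, 1, 2}.
    """
    if bits % 2:
        raise ValueError("bits must be even")
    pattern = y & ((1 << bits) - 1)  # two's-complement bit pattern of y
    digits, prev = [], 0
    for i in range(0, bits, 2):
        b0 = (pattern >> i) & 1
        b1 = (pattern >> (i + 1)) & 1
        digits.append(-2 * b1 + b0 + prev)
        prev = b1
    return digits


def booth_multiply(x, y, bits=8):
    """Sum the partial products d_i * x * 4**i; each one is 0, +/-x or +/-2x."""
    return sum((d * x) << (2 * i)
               for i, d in enumerate(booth_radix4_digits(y, bits)))
```

In hardware, each digit selects one of the five partial products with a small mux and shift instead of a full multiplication, which is what makes the scheme attractive for a regular VLSI array.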

Parallel Algorithms for Finding δ-approximate Periods and γ-approximate Periods of Strings over Integer Alphabets (정수문자열의 δ-근사주기와 γ-근사주기를 찾는 병렬알고리즘)

  • Kim, Youngho;Sim, Jeong Seop
    • Journal of KIISE / v.44 no.8 / pp.760-766 / 2017
  • Repetitive strings have been studied in diverse fields such as data compression and bioinformatics. Recently, two problems concerning approximate periods of strings over integer alphabets were introduced: finding minimum $\delta$-approximate periods and finding minimum $\gamma$-approximate periods. Both problems can be solved in $O(n^2)$ time, where $n$ is the length of the string. In this paper, we present two parallel algorithms that solve these problems in $O(n)$ time using $O(n^2)$ threads. The experimental results show that our parallel algorithms for finding minimum $\delta$-approximate (resp. $\gamma$-approximate) periods run approximately 19.7 (resp. 40.08) times faster than the sequential algorithms when $n = 10{,}000$.
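The abstract does not define the two matching notions. Assuming the usual definitions from integer-alphabet string matching (two equal-length sequences $\delta$-match if every position differs by at most $\delta$, and $\gamma$-match if the sum of absolute differences is at most $\gamma$), the brute-force sketch below shows where an $O(n^2)$ sequential bound and abundant independent work can come from. As a further simplifying assumption, candidate periods are restricted to prefixes of the string, so this is a hypothetical variant and not the authors' exact problem or algorithm. For each candidate length q, every block comparison is independent, so each (q, block) pair could be handed to its own thread.

```python
def delta_match(u, v, delta):
    """Assumed delta-match: every position differs by at most delta."""
    return all(abs(a - b) <= delta for a, b in zip(u, v))


def gamma_match(u, v, gamma):
    """Assumed gamma-match: the total absolute difference is at most gamma."""
    return sum(abs(a - b) for a, b in zip(u, v)) <= gamma


def is_prefix_period(x, q, matches):
    """Hypothetical simplified test: x[:q] is an approximate period of x if every
    later length-q block (the last one possibly shorter) matches the
    corresponding prefix of x[:q]."""
    p = x[:q]
    for s in range(q, len(x), q):
        block = x[s:s + q]
        if not matches(block, p[:len(block)]):
            return False
    return True


def min_prefix_period(x, matches):
    """Smallest such q. Each q costs O(n) comparisons, so O(n^2) overall."""
    for q in range(1, len(x) + 1):
        if is_prefix_period(x, q, matches):
            return q
    return len(x)
```

For `x = [1, 2, 3, 2, 3, 4, 1, 2, 3]`, this variant returns 3 with $\delta = 1$ but 6 with $\delta = 0$.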

Debugging of Parallel Programs using Distributed Cooperating Components

  • Mrayyan, Reema Mohammad;Al Rababah, Ahmad AbdulQadir
    • International Journal of Computer Science & Network Security / v.21 no.12spc / pp.570-578 / 2021
  • Recently, in engineering and scientific computing, mathematical modeling, and real-time problems, there has been a tendency to abandon sequential solutions for single-processor computers. Almost all modern application packages in these areas target parallel or distributed computing environments. This is primarily due to the ever-increasing requirements for the reliability and accuracy of results, and hence the rapidly growing volumes of processed data [2,17,41]. In addition, new methods and algorithms are emerging whose implementation on single-processor systems would simply be impossible because of their performance demands. The ubiquity of various types of parallel systems also contributes to this trend. Alongside the growing demand for parallel programs and the proliferation of multiprocessor, multicore, and cluster technologies, parallel program development is becoming increasingly urgent, since users want to make the most of the capabilities of their modern computing equipment [14,39]. It is generally accepted that developing parallel programs is highly complex, and this complexity often prevents the efficient use of high-performance computers [23,31].

Evaluation of the different genetic algorithm parameters and operators for the finite element model updating problem

  • Erdogan, Yildirim Serhat;Bakir, Pelin Gundes
    • Computers and Concrete / v.11 no.6 / pp.541-569 / 2013
  • There is a wide variety of Genetic Algorithm (GA) operators and parameters in the literature. However, no single technique performs best across different classes of optimization problems. Hence, these operators and parameters, which influence the effectiveness of the search process, must be evaluated on a per-problem basis. This paper compares the influence of GA operators and parameters on performance in a damage identification problem solved with the finite element model updating (FEMU) method. Damage is defined as a reduction in the bending rigidity of the finite elements of a reinforced concrete beam. A specific damage scenario is adopted and identified with different GA operators by minimizing the differences between experimental and analytical modal parameters. In this study, different selection, crossover, and mutation operators are compared based on reliability, accuracy, and efficiency criteria, and the exploration and exploitation capabilities of the operators are evaluated. Parallel and sequential GAs with different population sizes are also compared, and the effect of using some crossover operators multiple times is investigated. The results show that roulette-wheel selection combined with real-valued encoding gives the best results. The results also indicate that Non-uniform Mutation and Parent Centric Normal Crossover can be used with confidence in the damage identification problem. In addition, parallel GAs increase both the computation speed and the efficiency of the method.
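Two of the operators named above can be sketched as follows: roulette-wheel (fitness-proportional) selection, and a non-uniform mutation in the style of Michalewicz, whose perturbation shrinks as the generation counter t approaches the maximum generation count. The shape parameter `b`, the function names, and the assumption of non-negative fitness values are illustrative choices, not details from the paper; a minimization objective such as a modal-parameter error would first have to be transformed into a fitness.

```python
import random


def roulette_select(population, fitness, rng):
    """Pick one individual with probability proportional to its fitness.

    Assumes non-negative fitness values; zero-fitness individuals are never
    picked unless every fitness is zero.
    """
    total = sum(fitness)
    pick = rng.uniform(0.0, total)
    acc = 0.0
    for individual, f in zip(population, fitness):
        acc += f
        if acc > pick:
            return individual
    return population[-1]  # guard for pick == total or an all-zero fitness


def non_uniform_mutation(x, lower, upper, t, max_gen, rng, b=2.0):
    """Michalewicz-style non-uniform mutation of a real-valued gene in [lower, upper]."""
    def delta(y):
        # Returns a value in [0, y] that tends to 0 as t approaches max_gen.
        r = rng.random()
        return y * (1.0 - r ** ((1.0 - t / max_gen) ** b))

    if rng.random() < 0.5:
        return min(upper, x + delta(upper - x))  # clamp guards round-off
    return max(lower, x - delta(x - lower))
```

Early generations therefore explore the whole interval, while late generations only fine-tune, which is the exploration and exploitation trade-off the study evaluates.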

Parallel Algorithms for Finding Consensus of Circular Strings (환형문자열에 대한 대표문자열을 찾는 병렬 알고리즘)

  • Kim, Dong Hee;Sim, Jeong Seop
    • Journal of KIISE / v.42 no.3 / pp.289-294 / 2015
  • The consensus problem is to find a representative string, called a consensus, of a given set $S$ of $k$ strings. Circular strings differ from linear strings in that the last symbol precedes the first symbol. Given a set $S$ of circular strings of length $n$ over an alphabet $\Sigma$, we first present an $O(|\Sigma| n \log n)$-time parallel algorithm that uses $O(n)$ threads to find a consensus of $S$ minimizing both the radius and the distance sum when $k = 3$. We then present an $O(|\Sigma| n^2 \log n)$-time parallel algorithm that uses $O(n)$ threads to find a consensus of $S$ minimizing the distance sum when $k = 4$. Finally, we compare the execution times of our algorithms, implemented in CUDA, with those of the corresponding sequential algorithms.
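To make the problem statement concrete, here is an exponential-time brute-force sketch for tiny inputs. It is not the authors' parallel algorithm. It assumes that the distance between a candidate string s and a circular string c is the minimum Hamming distance between s and any rotation of c. The radius is then the maximum of these distances over S, and the distance sum is their total. Candidates are ranked by (radius, distance sum) lexicographically, which is a tie-breaking assumption of this sketch; all function names are hypothetical.

```python
from itertools import product


def rotations(c):
    """All rotations of circular string c, written as linear strings."""
    return [c[i:] + c[:i] for i in range(len(c))]


def circ_dist(s, c):
    """Assumed distance: minimum Hamming distance between s and a rotation of c."""
    return min(sum(a != b for a, b in zip(s, r)) for r in rotations(c))


def brute_force_consensus(strings, alphabet):
    """Try every string in alphabet^n; return ((radius, distance_sum), consensus)."""
    n = len(strings[0])
    best = None
    for cand in product(alphabet, repeat=n):
        s = "".join(cand)
        dists = [circ_dist(s, c) for c in strings]
        key = (max(dists), sum(dists))  # radius first, then distance sum
        if best is None or key < best[0]:
            best = (key, s)
    return best
```

For example, `brute_force_consensus(["abc", "bca", "cab"], "abc")` returns `((0, 0), "abc")`, because all three inputs are rotations of the same circular string.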

A Study on the Performance of Optimization Techniques on the Selection of Control Source Positions in an Active Noise Barrier System (능동방음벽 시스템의 제어 음원 위치 선정에 미치는 최적화 기법 성능에 관한 고찰)

  • Im, Hyoung-Jin;Baek, Kwang-Hyun
    • Proceedings of the Korean Society for Noise and Vibration Engineering Conference / 2004.11a / pp.1012-1015 / 2004
  • There have been several attempts to actively control the deflected noise behind a noise barrier. Omoto's 1993 work is one of the fundamental studies; he placed the control sources uniformly, parallel to the noise barrier. Following this study, Yang pointed out that the average distance between the noise source and the control sources matters more than their arrangement, such as a straight-line or an arc-type distribution. In 2004, Baek attempted to find an optimal arrangement of control sources while maintaining the average distance between the noise source and the control sources. He used simulated annealing, a nature-inspired algorithm, to select the optimal control source positions, but to cope with the very long search time, the search technique was a hybrid of simulated annealing and sequential searching. This study compares the performance of pure sequential searching with that of the hybrid approach. The simulation results show very similar performance; a pure simulated annealing search would be more beneficial for noise reduction performance, but at the cost of computing time.
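A minimal simulated-annealing sketch for choosing k control-source positions from a set of candidates is shown below, to illustrate the search technique discussed above. The swap-based neighbor move, the geometric cooling schedule, all parameter values, and the function name are illustrative assumptions. A real study would replace the cost function with a simulated noise-reduction measure; the usage example uses a hypothetical toy cost instead.

```python
import math
import random


def anneal_positions(candidates, k, cost, rng, t0=1.0, alpha=0.95, steps=2000):
    """Simulated annealing over k-subsets of candidate positions (sketch).

    Requires k < len(candidates). Returns (best_positions, best_cost).
    """
    current = rng.sample(range(len(candidates)), k)
    cur_cost = cost([candidates[i] for i in current])
    best, best_cost = list(current), cur_cost
    temp = t0
    for _ in range(steps):
        # Neighbor move: swap one selected position for an unselected one.
        unused = [i for i in range(len(candidates)) if i not in current]
        trial = list(current)
        trial[rng.randrange(k)] = rng.choice(unused)
        trial_cost = cost([candidates[i] for i in trial])
        # Metropolis rule: always accept improvements; accept worse moves
        # with probability exp(-increase / temperature).
        if trial_cost <= cur_cost or rng.random() < math.exp((cur_cost - trial_cost) / temp):
            current, cur_cost = trial, trial_cost
            if cur_cost < best_cost:
                best, best_cost = list(current), cur_cost
        temp = max(temp * alpha, 1e-9)  # geometric cooling with a floor
    return [candidates[i] for i in best], best_cost
```

With candidates `0..9`, `k = 3`, and the toy cost `sum((p - 5) ** 2 for p in ps)`, the search should settle on positions 4, 5, and 6. The high-temperature phase accepts uphill moves to escape local minima, which is where the extra computing time of a pure annealing search comes from.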

Gene Expression Data Analysis Using Parallel Processor based Pattern Classification Method (병렬 프로세서 기반의 패턴 분류 기법을 이용한 유전자 발현 데이터 분석)

  • Choi, Sun-Wook;Lee, Chong-Ho
    • Journal of the Institute of Electronics Engineers of Korea CI / v.46 no.6 / pp.44-55 / 2009
  • Diagnosing diseases using gene expression data obtained from microarray chips has recently become an active research area. Because such data are difficult to analyze directly, general machine learning algorithms have been used. However, recent research indicates that analysis based on interactions between genes is essential for gene expression analysis, which means that analysis using traditional machine learning algorithms has limitations. In this paper, we classify gene expression data using the hypernetwork model, which considers higher-order correlations between features, and compare the classification accuracies. We also present a new hypernetwork model that overcomes a drawback of the existing model, and we compare the processing performance of the existing hypernetwork model on a general sequential processor with that of the improved model implemented on parallel processors. The experimental results show that our model achieves improved and competitive classification performance compared with traditional machine learning methods as well as the existing hypernetwork model, and that performance is maximized when the hypernetwork model is implemented on our parallel processors.

Extracting Maximum Parallelism for Parallel Computing (병렬 계산을 위한 최대 병렬성 추출 방법)

  • Park, Doo-Soon
    • The Journal of Korean Association of Computer Education / v.8 no.1 / pp.93-103 / 2005
  • Since most program execution time is spent in loops, extracting parallelism from sequential loop programs is critical for faster program execution. Conventional studies on extracting parallelism focus mostly on uniform data dependence distances. In this paper, we propose a data dependency elimination method for nested loops and extend it to extract parallelism from loops containing procedure calls. Both the data dependency elimination method and its extended version can be applied to uniform and non-uniform data dependence distances. We compared our methods with conventional methods on a CRAY-T3E for performance evaluation. The results show that the proposed algorithms are very effective.
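The notion of a uniform dependence distance can be illustrated with a single loop. In the hypothetical loop below, iteration i reads the value written d iterations earlier, so the iterations fall into d independent chains, one per residue of i modulo d, and these chains can run concurrently. This is the classic partitioning idea for uniform distances, included only to illustrate the terminology. It is not the paper's data dependency elimination method, which also handles non-uniform distances and procedure calls. Python threads are used only to demonstrate that the chains are independent, not to obtain a speedup.

```python
from concurrent.futures import ThreadPoolExecutor


def sequential_loop(a, b, d):
    """Reference loop with a flow dependence of uniform distance d."""
    a = list(a)
    for i in range(d, len(a)):
        a[i] = a[i - d] + b[i]
    return a


def partitioned_loop(a, b, d):
    """Same computation, split into d independent chains (requires d >= 1)."""
    a = list(a)
    n = len(a)

    def run_chain(r):
        # Iterations with i % d == r read and write only elements of this chain,
        # so different chains never touch the same index.
        for i in range(d + r, n, d):
            a[i] = a[i - d] + b[i]

    with ThreadPoolExecutor(max_workers=d) as pool:
        list(pool.map(run_chain, range(d)))  # list() surfaces worker exceptions
    return a
```

With a non-uniform distance (for example, `a[i] = a[i // 2] + b[i]`), no fixed d exists, which is why that case needs more elaborate analysis.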

A survey on parallel training algorithms for deep neural networks (심층 신경망 병렬 학습 방법 연구 동향)

  • Yook, Dongsuk;Lee, Hyowon;Yoo, In-Chul
    • The Journal of the Acoustical Society of Korea / v.39 no.6 / pp.505-514 / 2020
  • Since training Deep Neural Networks (DNNs) typically requires a large amount of data, a parallel training approach is needed. The Stochastic Gradient Descent (SGD) algorithm is one of the most widely used methods for training DNNs. However, because SGD is an inherently sequential process, some form of approximation is required to parallelize it. In this paper, we review various efforts to parallelize the SGD algorithm and analyze their computational overhead, communication overhead, and the effects of the approximations.
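The distinction between exact and approximate parallelization can be sketched on a toy one-parameter least-squares model; the model and all names are assumptions for illustration. In synchronous data-parallel SGD, workers compute gradients on shards of a mini-batch and the size-weighted average is applied once, which reproduces the sequential mini-batch step up to rounding. Model averaging after several local steps (often called local SGD) cuts communication but generally follows a different trajectory. It is one example of the kind of approximation such surveys analyze.

```python
def grad(w, batch):
    """Gradient of the mean squared error for the toy model y ~ w * x."""
    return sum(2 * (w * x - y) * x for x, y in batch) / len(batch)


def sequential_step(w, batch, lr):
    """One ordinary mini-batch SGD step."""
    return w - lr * grad(w, batch)


def data_parallel_step(w, batch, lr, workers):
    """Synchronous data parallelism: shard, compute gradients, average, update."""
    shards = [batch[i::workers] for i in range(workers)]  # needs len(batch) >= workers
    grads = [grad(w, s) for s in shards]                   # one gradient per worker
    sizes = [len(s) for s in shards]
    avg = sum(g * n for g, n in zip(grads, sizes)) / sum(sizes)  # stands in for all-reduce
    return w - lr * avg


def model_averaging_round(w, batch, lr, workers, local_steps):
    """Local SGD: each worker takes several steps on its shard, then models are averaged."""
    shards = [batch[i::workers] for i in range(workers)]
    local = []
    for s in shards:
        wk = w
        for _ in range(local_steps):
            wk -= lr * grad(wk, s)
        local.append(wk)
    return sum(local) / workers  # unweighted mean; assumes equal-sized shards
```

Only `data_parallel_step` is exact. `model_averaging_round` communicates once per `local_steps` updates, trading fidelity to sequential SGD for lower communication overhead.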