• Title/Summary/Keyword: Speedup

A Physical Data Design and Query Routing Technique of High Performance BLAST on E-Cluster (고성능 BLAST구현을 위한 E-Cluster 기반 데이터 분할 및 질의 라우팅 기법)

  • Kim, Tae-Kyung;Cho, Wan-Sup
    • Journal of the Korea Society of Computer and Information / v.14 no.2 / pp.139-147 / 2009
  • BLAST (Basic Local Alignment Search Tool) is one of the best-known tools in bioinformatics. BLAST quickly compares input sequences against huge annotated sequence databases and predicts their functions. It helps biologists annotate newly found sequences with reduced experimental time, scope, and cost. However, as the volume of sequences grows remarkably with advances in sequencing machines, BLAST performance has become a critical issue, and several alternatives have been tried to address it. In this paper, we propose a new PC-based cluster system (E-Cluster), a new physical data design methodology (logical partitioning technique), and a query routing technique (intra-query routing). To verify our system, we measure response time, speedup, and efficiency for various sizes of sequences in the NR (Non-Redundant) database. Experimental results show that the proposed system has better speedup and efficiency (maximum 600%) than those of conventional approaches such as SMP machines, clusters, and grids.

Parallel Processing of k-Means Clustering Algorithm for Unsupervised Classification of Large Satellite Images: A Hybrid Method Using Multicores and a PC-Cluster (대용량 위성영상의 무감독 분류를 위한 k-Means Clustering 알고리즘의 병렬처리: 다중코어와 PC-Cluster를 이용한 Hybrid 방식)

  • Han, Soohee;Song, Jeong Heon
    • Journal of the Korean Society of Surveying, Geodesy, Photogrammetry and Cartography / v.37 no.6 / pp.445-452 / 2019
  • In this study, parallel processing codes for the k-means clustering algorithm were developed and implemented on a PC-cluster for unsupervised classification of large satellite images. We implemented intra-node code using the multiple cores of a CPU (Central Processing Unit) based on OpenMP (Open Multi-Processing), inter-node code using a PC-cluster based on the Message Passing Interface, and hybrid code using both. The PC-cluster consists of one master node and eight slave nodes, and each node is equipped with eight cores. Two operating systems, Microsoft Windows and Canonical Ubuntu, were installed on the PC-cluster in turn and tested to compare parallel processing performance. Two multispectral satellite images were tested: a medium-capacity LANDSAT 8 OLI (Operational Land Imager) image and a high-capacity Sentinel 2A image. To evaluate the performance of parallel processing, speedup and efficiency were measured. Overall, the speedup was over N/2 and the efficiency was over 0.5. In the comparison of the two operating systems, the Ubuntu system was two to three times faster. To confirm that the results of sequential and parallel processing coincide with each other, the center value of each band and the number of classified pixels were compared, and the result images were examined by pixel-to-pixel comparison. It was found that care should be taken to avoid false sharing in the OpenMP intra-node implementation. To process large satellite images on a PC-cluster, code and hardware should be designed to reduce the performance degradation caused by file I/O. It was also found that performance can differ depending on the operating system installed on a PC-cluster.
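
The two metrics reported above are related by a simple ratio, which the following minimal sketch illustrates. It assumes the usual definitions, speedup S = T_serial / T_parallel and efficiency E = S / N, and takes N to be the number of processing units, which the abstract does not define explicitly. The timings are hypothetical and are not taken from the paper.

```python
def speedup(t_serial: float, t_parallel: float) -> float:
    """Speedup S = T_serial / T_parallel."""
    if t_serial <= 0 or t_parallel <= 0:
        raise ValueError("times must be positive")
    return t_serial / t_parallel


def efficiency(s: float, n_units: int) -> float:
    """Efficiency E = S / N, with N the number of processing units (assumed)."""
    if n_units <= 0:
        raise ValueError("n_units must be positive")
    return s / n_units


# Hypothetical timings in seconds, chosen only for illustration.
t_serial, t_parallel, n_units = 120.0, 20.0, 8
s = speedup(t_serial, t_parallel)
e = efficiency(s, n_units)

# Because E = S / N, "S > N/2" and "E > 0.5" are the same condition.
print(s > n_units / 2, e > 0.5)
```

Under these definitions, a speedup above N/2 and an efficiency above 0.5 are two statements of the same result.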

PARALLEL DYNAMIC CODING METHOD OF HANGUL TEXT

  • Min, Yong-Sik
    • Journal of applied mathematics & informatics / v.3 no.2 / pp.157-168 / 1996
  • This paper describes an efficient coding method for Korean characters (alphabet) using a three-state transition graph. The Parallel Hangul Dynamic Coding Method (PHDCM) saves about 3.5 bits per Korean character compared with other coding techniques. When run on a MasPar machine, the method achieved a 49.314-fold speedup with 64 processors on 10 million Korean characters.

PARALLEL DYNAMIC OCTAL COMPACT MAPPING

  • Min, Yong-Sik
    • Journal of applied mathematics & informatics / v.3 no.1 / pp.35-46 / 1996
  • This paper suggests a new coding method for parallel machines that compresses data by reducing redundancy. Parallel Dynamic Octal Compact Mapping (PDOCM) saves at least 1 byte per word compared with other coding techniques and achieves a 54.188-fold speedup with 64 processors when transmitting 10 million characters.

Parallel Structural Analysis and Design for Large-Scale Structures (대형구조물을 위한 병렬 구조해석 및 설계)

  • 박효선
    • Computational Structural Engineering / v.9 no.3 / pp.47-53 / 1996
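  • This article introduces the basic concepts of parallel computing methods, which have been developed and used in various forms across engineering, and the classification of parallel computers. It then presents the process of parallelizing the equation solver, the most time-consuming step in structural analysis, using the preconditioned conjugate gradient method, together with the resulting parallel algorithm. Finally, the parallel equation solver is applied to the analysis and design of large-scale structures, and the efficiency of parallel computation is charted in terms of speedup.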
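
For readers unfamiliar with the solver named above, here is a minimal serial sketch of the preconditioned conjugate gradient method. It is not the article's algorithm: the Jacobi (diagonal) preconditioner, the dense list-of-lists matrix, and the names `pcg`, `dot`, and `matvec` are all assumptions made for illustration. In a parallel version, the matrix-vector product and the dot products are the kernels that would typically be distributed.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def dot(u: Vector, v: Vector) -> float:
    """Inner product of two vectors."""
    return sum(a * b for a, b in zip(u, v))


def matvec(a: Matrix, x: Vector) -> Vector:
    """Dense matrix-vector product."""
    return [dot(row, x) for row in a]


def pcg(a: Matrix, b: Vector, tol: float = 1e-10, max_iter: int = 1000) -> Vector:
    """Solve A x = b for symmetric positive definite A.

    Uses a Jacobi preconditioner (M = diag(A)), an assumed choice.
    """
    n = len(b)
    inv_diag = [1.0 / a[i][i] for i in range(n)]
    x = [0.0] * n
    r = list(b)                                   # residual b - A x for x = 0
    z = [inv_diag[i] * r[i] for i in range(n)]    # preconditioned residual
    p = list(z)                                   # first search direction
    rz = dot(r, z)
    for _ in range(max_iter):
        if dot(r, r) ** 0.5 < tol:                # converged
            break
        ap = matvec(a, p)
        alpha = rz / dot(p, ap)                   # step length along p
        x = [x[i] + alpha * p[i] for i in range(n)]
        r = [r[i] - alpha * ap[i] for i in range(n)]
        z = [inv_diag[i] * r[i] for i in range(n)]
        rz_new = dot(r, z)
        beta = rz_new / rz                        # direction update factor
        p = [z[i] + beta * p[i] for i in range(n)]
        rz = rz_new
    return x


# Small SPD system used only as a check; its exact solution is [1/11, 7/11].
solution = pcg([[4.0, 1.0], [1.0, 3.0]], [1.0, 2.0])
print(solution)
```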

Speculative Parallelism Characterization Profiling in General Purpose Computing Applications

  • Wang, Yaobin;An, Hong;Liu, Zhiqin;Li, Li;Yu, Liang;Zhen, Yilu
    • Journal of Computing Science and Engineering / v.9 no.1 / pp.20-28 / 2015
  • Procedure-level speculation has not yet been thoroughly explored for general-purpose computing applications, especially with lightweight profiling. This paper proposes a lightweight profiling mechanism to characterize speculative parallelism in several classic general-purpose computing applications from the SPEC CPU2000 benchmark. By comparing the key performance factors in loop-level and procedure-level speculation, it reports new findings on the behavior of loop-level and procedure-level parallelism in these applications. The experimental results are as follows. The best loop-level case, gzip, achieves only a 2.4X speedup, while the best procedure-level case, mcf, achieves almost a 3.5X speedup. This shows that the lightweight profiling method is effective. Between loop-level and procedure-level TLS, the latter is better in several cases, contrary to the conventional perception. This is especially evident in applications whose 'hot' procedure bodies are made up of 'hot' loops.

Parallel processing in structural reliability

  • Pellissetti, M.F.
    • Structural Engineering and Mechanics / v.32 no.1 / pp.95-126 / 2009
  • The present contribution addresses the parallelization of advanced simulation methods for structural reliability analysis, which have recently been developed for large-scale structures with a high number of uncertain parameters. In particular, the Line Sampling method and the Subset Simulation method are considered. The proposed parallel algorithms exploit the parallelism that comes from performing independent FE analyses simultaneously. For the Line Sampling method, a parallelization scheme is proposed both for the actual sampling process and for the statistical gradient estimation method used to identify the so-called important direction of the Line Sampling scheme. Two parallelization strategies are investigated for the Subset Simulation method. The first consists of the embarrassingly parallel advancement of distinct Markov chains; in this case the speedup is bounded by the number of chains advanced simultaneously. The second parallel Subset Simulation algorithm uses the concept of speculative computing. Speedup measurements with the FE model of a multistory building (24,000 DOFs) show the reduction of the wall-clock time to a very viable amount (<10 minutes for Line Sampling and $\approx 1$ hour for Subset Simulation). The measurements, conducted on clusters of multi-core nodes, also indicate a strong sensitivity of the parallel performance to the load level of the nodes, in terms of the number of simultaneously used cores. This performance degradation is related to memory bottlenecks during the modal analysis required in each FE analysis.
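
The bound on speedup for the first Subset Simulation strategy can be illustrated with a simple idealized model. This is a sketch under stated assumptions, not the paper's measurement method: every chain step is assumed to cost the same, communication overhead is ignored, and the function name `ideal_speedup` is hypothetical.

```python
import math


def ideal_speedup(n_chains: int, n_workers: int) -> float:
    """Idealized speedup when n_chains independent chains share n_workers.

    Assumes equal cost per chain and no overhead, so the parallel run
    takes ceil(n_chains / n_workers) rounds instead of n_chains.
    """
    if n_chains <= 0 or n_workers <= 0:
        raise ValueError("counts must be positive")
    rounds = math.ceil(n_chains / n_workers)
    return n_chains / rounds


# Adding workers beyond the number of chains brings no further gain.
for workers in (4, 8, 16):
    print(workers, ideal_speedup(8, workers))
```

In this model the speedup never exceeds the smaller of the chain count and the worker count, which matches the bound stated in the abstract.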

A Parallel Speech Recognition Model on Distributed Memory Multiprocessors (분산 메모리 다중프로세서 환경에서의 병렬 음성인식 모델)

  • 정상화;김형순;박민욱;황병한
    • The Journal of the Acoustical Society of Korea / v.18 no.5 / pp.44-51 / 1999
  • This paper presents a massively parallel computational model for the efficient integration of speech and natural language understanding. The phoneme model is based on a continuous Hidden Markov Model with context-dependent phonemes, and the language model is based on a knowledge base approach. To construct the knowledge base, we adopt a hierarchically structured semantic network and a memory-based parsing technique that employs parallel marker-passing as an inference mechanism. Our parallel speech recognition algorithm is implemented on a multi-Transputer system using distributed-memory MIMD multiprocessors. Experimental results show that the parallel speech recognition system achieves better recognition accuracy than a word network-based speech recognition system. The recognition accuracy is further improved by applying code-phoneme statistics. In addition, speedup experiments demonstrate the possibility of constructing a real-time parallel speech recognition system.

High-speed Integer Fuzzy Controller without Multiplications

  • Lee Sang-Gu
    • International Journal of Fuzzy Logic and Intelligent Systems / v.6 no.3 / pp.223-231 / 2006
  • In high-speed fuzzy control systems applied to intelligent systems such as robot control, one of the most important problems is improving the execution speed of fuzzy inference. High-speed operation is particularly important in the consequent part and the defuzzification stage. To speed up fuzzy controllers for intelligent systems, this paper presents an integer line mapping algorithm that converts the [0, 1] real values of the fuzzy membership functions in the consequent part to a $400 \times 30$ grid of integer values. In addition, this paper presents a method of eliminating the unnecessary operations on zero items in the defuzzification stage. With this representation, a center of gravity (COG) method can be implemented with only integer additions and one integer division. The proposed system is analyzed for execution speed and COG in an air conditioner control system, and applied to a truck backer-upper control system. The proposed system shows a significant increase in speed compared with conventional methods, with minimal error; simulations indicate a speedup of an order of magnitude. This system can be applied to real-time high-speed intelligent systems such as robot arm control.
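
One way a center of gravity can be computed with only integer additions and a single integer division is the running-sum trick sketched below. This is an illustrative assumption, not necessarily the paper's algorithm: the function name `cog_integer` is hypothetical, the input is taken to be membership values already quantized to integers on a discrete axis, and the paper's zero-item elimination is not reproduced.

```python
from typing import Sequence


def cog_integer(mu: Sequence[int]) -> int:
    """Center of gravity sum(x * mu[x]) // sum(mu[x]) without multiplying.

    Scanning from the highest index down, `running` holds the suffix sum
    of mu. Adding `running` to `weighted` once for each index x >= 1
    counts mu[x] exactly x times, which equals the weighted sum.
    """
    running = 0    # suffix sum of membership values
    weighted = 0   # accumulates sum(x * mu[x]) by additions only
    for x in range(len(mu) - 1, -1, -1):
        running += mu[x]
        if x > 0:
            weighted += running
    if running == 0:
        raise ValueError("all membership values are zero")
    return weighted // running   # the single integer division


# Quantized membership values on a five-point axis (illustrative data).
print(cog_integer([0, 1, 3, 1, 0]))
```

The result is the floor of the exact center of gravity, so a small rounding error relative to a floating-point computation is expected.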

Effects of inflow turbulence and slope on turbulent boundary layer over two-dimensional hills

  • Wang, Tong;Cao, Shuyang;Ge, Yaojun
    • Wind and Structures / v.19 no.2 / pp.219-232 / 2014
  • The characteristics of turbulent boundary layers over hilly terrain depend strongly on the hill slope and the upstream condition, especially inflow turbulence. Numerical simulations are carried out to investigate the neutrally stratified turbulent boundary layer over two-dimensional hills. Two hill shapes, a steep one with stable separation and a low one without stable separation, and two inflow conditions, laminar and turbulent, are considered. An auxiliary simulation, based on the local differential quadrature method and a recycling technique, is performed to generate the inflow turbulence to be imposed at the inlet boundary for the turbulent inflow case; this turbulence is preserved very well in the computational domain. A large separation bubble is established on the leeside of the steep hill with laminar inflow, while the reattachment point moves upstream under the turbulent inflow condition. There is stable separation on the side of the low hill with laminar inflow, but not with turbulent inflow. Besides increasing turbulence intensity, inflow turbulence can efficiently enhance the speedup around hills. In practice, therefore, it is unreasonable to study wind flow over hilly terrain without considering inflow turbulence.