• Title/Summary/Keyword: string processing

High Throughput Parallel KMP Algorithm Considering CPU-GPU Memory Hierarchy

  • Park, Soeun;Kim, Daehee;Lee, Myungho;Park, Neungsoo
    • The Transactions of The Korean Institute of Electrical Engineers / v.67 no.5 / pp.656-662 / 2018
  • Pattern matching algorithms are widely used in many application fields, such as bioinformatics and intrusion detection. Among string matching algorithms, the KMP (Knuth-Morris-Pratt) algorithm is commonly used because of its fast execution on large texts. However, even KMP's processing speed becomes a bottleneck when the text size increases significantly. In this paper, we propose a high-throughput parallel KMP algorithm that considers the CPU-GPU memory hierarchy, based on OpenCL for GPGPU (General-Purpose computing on Graphics Processing Units). We focus on optimizing the allocation of work-items and work-groups, copying the pattern data and the failure table into local memory, and overlapping data transfers with the string matching operations. Experimental results show that the optimized parallel KMP algorithm runs about 3.6 times faster than the non-optimized parallel KMP algorithm.
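
The two KMP building blocks the abstract relies on, the failure table and the linear scan, can be sketched in Python. This is a CPU-only illustration, not the paper's OpenCL kernel: the text is split into chunks that are searched independently, the way each chunk might be assigned to one work-item, and each chunk's window is extended by len(pattern) - 1 characters so that matches straddling a chunk boundary are not lost. The function names and the chunking scheme are assumptions made for illustration.

```python
from typing import List


def build_failure_table(pattern: str) -> List[int]:
    """failure[i] = length of the longest proper prefix of pattern[:i + 1]
    that is also a suffix of it (the classic KMP failure function)."""
    failure = [0] * len(pattern)
    k = 0  # length of the prefix matched so far
    for i in range(1, len(pattern)):
        while k > 0 and pattern[i] != pattern[k]:
            k = failure[k - 1]  # fall back to the next shorter border
        if pattern[i] == pattern[k]:
            k += 1
        failure[i] = k
    return failure


def kmp_search(text: str, pattern: str, failure: List[int]) -> List[int]:
    """Return the start index of every (possibly overlapping) match."""
    matches, k = [], 0
    for i, ch in enumerate(text):
        while k > 0 and ch != pattern[k]:
            k = failure[k - 1]
        if ch == pattern[k]:
            k += 1
        if k == len(pattern):
            matches.append(i - k + 1)
            k = failure[k - 1]  # keep going so overlapping matches are found
    return matches


def chunked_kmp(text: str, pattern: str, num_chunks: int) -> List[int]:
    """Search chunks independently (hypothetical chunk-per-work-item mapping)."""
    if not pattern:
        raise ValueError("pattern must be non-empty")
    failure = build_failure_table(pattern)  # built once, shared read-only by all chunks
    m = len(pattern)
    size = max(1, -(-len(text) // num_chunks))  # ceiling division
    results = []
    for start in range(0, len(text), size):
        end = min(start + size, len(text))
        # Extending the window by m - 1 characters catches boundary-straddling
        # matches; any match found in it still starts inside [start, end).
        window = text[start:end + m - 1]
        results.extend(start + pos for pos in kmp_search(window, pattern, failure))
    return results
```

For example, chunked_kmp("ABABDABACDABABCABAB", "ABABCABAB", 4) reports the single match at index 10, the same answer as a sequential scan. The pattern and failure table are read by every chunk but never written, which is why they are natural candidates for the local-memory copy the abstract describes.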

Wine Label Character Recognition in Mobile Phone Images using a Lexicon-Driven Post-Processing

  • Lim, Jun-Sik;Kim, Soo-Hyung;Lee, Chil-Woo;Lee, Guee-Sang;Yang, Hyung-Jung;Lee, Myung-Eun
    • Journal of KIISE:Computing Practices and Letters / v.16 no.5 / pp.546-550 / 2010
  • In this paper, we propose a method for post-processing cursive script recognition in wine label images. The proposed method consists of three main steps: combination matrix generation, character combination filtering, and string matching. First, the combination matrix generation step detects all possible combinations from the recognition result for each of the pieces. Second, unnecessary information in the combination matrix is removed by comparison with the bigrams of the words in the lexicon. Finally, the string matching step selects, as the result, the lexicon word that best matches based on the Levenshtein distance. Experimental results show a recognition accuracy of 85.8%.
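
The filtering and matching steps can be sketched as follows. This is a simplified illustration, not the authors' implementation: it assumes one set of candidate characters per position (the paper's combination matrix also handles pieces that may merge), and the wine names and function names are hypothetical.

```python
from itertools import product
from typing import List, Sequence


def levenshtein(a: str, b: str) -> int:
    """Edit distance with unit-cost insertion, deletion and substitution."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute (free if equal)
        prev = cur
    return prev[-1]


def postprocess(candidates: Sequence[Sequence[str]], lexicon: List[str]) -> str:
    # 1) enumerate every combination of per-position character hypotheses
    strings = ["".join(p) for p in product(*candidates)]
    # 2) drop combinations containing a character bigram never seen in the lexicon
    #    (fall back to all combinations if nothing survives)
    bigrams = {w[i:i + 2] for w in lexicon for i in range(len(w) - 1)}
    kept = [s for s in strings
            if all(s[i:i + 2] in bigrams for i in range(len(s) - 1))] or strings
    # 3) return the lexicon word closest (Levenshtein) to any surviving combination
    return min((levenshtein(s, w), w) for s in kept for w in lexicon)[1]
```

With a lexicon of ["Merlot", "Malbec", "Shiraz"], the ambiguous per-position hypotheses [["M", "N"], ["e", "c"], ["r"], ["l", "1"], ["o", "0"], ["t"]] resolve to "Merlot". Enumerating all combinations grows exponentially with the number of ambiguous positions, which is why pruning by lexicon bigrams before the distance computation matters.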

A Simple Real-Time DMPPT Algorithm for PV Systems Operating under Mismatch Conditions

  • Aniruddha, Kamath M.;Jayanta, Biswas;Anjana, K.G.;Mukti, Barai
    • Journal of Power Electronics / v.18 no.3 / pp.826-840 / 2018
  • This paper presents a distributed maximum power point tracking (DMPPT) algorithm based on the reference voltage perturbation (RVP) method for the PV modules of a series PV string. The proposed RVP-DMPPT algorithm is developed to accurately track the maximum power point (MPP) of each PV module under all atmospheric conditions with reduced hardware overhead. To study the influence of parameters such as the controller reference voltage ($V_{ref}$) and the PV current ($I_{pv}$) on the PV string voltage, a small-signal model of a unidirectional differential power processing (DPP) based PV-Bus architecture is developed. The steady-state and dynamic performance of the proposed RVP-DMPPT algorithm and the small-signal model of the unidirectional DPP-based PV-Bus architecture are demonstrated with simulation and experimental results. The accuracy of the RVP-DMPPT algorithm is demonstrated by a tracking efficiency of 99.4% obtained in the experiment.
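
The abstract does not spell out the RVP update rule, so the following is only a generic perturb-and-observe loop on a reference voltage, run independently for each module to show what distributed tracking under mismatch means. The single-diode current model, its parameter values, the step size and all function names are hypothetical; the DPP converter and the paper's small-signal model are not modeled.

```python
import math


def pv_current(v: float, isc: float, i0: float = 1e-9, vt: float = 1.2) -> float:
    """Toy single-diode module model: I = Isc - I0 * (exp(V / Vt) - 1).
    Series/shunt resistance are ignored; all parameter values are hypothetical."""
    return max(isc - i0 * (math.exp(v / vt) - 1.0), 0.0)


def perturb_and_observe(isc: float, v_start: float = 15.0, step: float = 0.1,
                        iterations: int = 300) -> float:
    """Generic perturb-and-observe on a reference voltage for ONE module."""
    v_ref = v_start
    p_prev = v_ref * pv_current(v_ref, isc)
    direction = 1.0
    for _ in range(iterations):
        v_ref += direction * step           # perturb the reference voltage
        p = v_ref * pv_current(v_ref, isc)  # observe the resulting power
        if p < p_prev:
            direction = -direction          # power fell: reverse the perturbation
        p_prev = p
    return v_ref


def grid_mpp(isc: float, v_max: float = 30.0, n: int = 30001) -> float:
    """Brute-force MPP voltage of the toy model, used only to check the tracker."""
    return max((v_max * k / (n - 1) for k in range(n)),
               key=lambda v: v * pv_current(v, isc))


# Mismatch: each module has its own short-circuit current (different irradiance)
# and runs its own tracker, which is what makes the tracking "distributed".
modules = {"module_1": 5.0, "module_2": 2.5}
tracked = {name: perturb_and_observe(isc) for name, isc in modules.items()}
```

After convergence each module's reference voltage oscillates within a step or two of its own MPP, and the shaded module settles at a lower voltage than the fully lit one, which a single string-level tracker could not honor for both modules at once.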

An Analysis System for Whole Genomic Sequence Using String B-Tree

  • Choe, Jeong-Hyeon;Jo, Hwan-Gyu
    • The KIPS Transactions:PartA / v.8A no.4 / pp.509-516 / 2001
  • As a result of many genome projects, the genomic sequences of many organisms have been revealed. Various methods, such as global alignment and local alignment, are used to analyze these sequences, and k-mer analysis is one such method. k-mer analysis explores the frequencies of all k-mers, or their symmetry, where a k-mer is a sequence of k bases. However, existing in-memory algorithms are not applicable to k-mer analysis because a whole genomic sequence is usually a large text, so efficient data structures and algorithms are needed. The string B-tree is a data structure that supports external memory and is well suited to pattern matching. In this paper, we improve the string B-tree so that it can be applied efficiently to k-mer analysis, and we present the results of k-mer analysis for C. elegans and 30 other genomic sequences. We also present a visualization system that lets users investigate the distribution and symmetry of the frequencies of all k-mers using CGR (Chaos Game Representation). Finally, we describe a method to find the signature, i.e., the part of the sequence that is similar to the whole genomic sequence.

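
An in-memory sketch of the two analyses mentioned above: counting every k-mer and mapping a sequence to CGR points. It deliberately ignores the external-memory setting that motivates the string B-tree. Reading "symmetry" as reverse-complement symmetry, the particular CGR corner layout, and all function names are assumptions.

```python
from collections import Counter
from typing import Dict, List, Tuple

VALID = frozenset("ACGT")
COMPLEMENT = str.maketrans("ACGT", "TGCA")
# One common CGR corner convention (an assumption; other layouts exist).
CORNERS = {"A": (0.0, 0.0), "C": (0.0, 1.0), "G": (1.0, 1.0), "T": (1.0, 0.0)}


def kmer_counts(seq: str, k: int) -> Counter:
    """Count every length-k window that contains only A/C/G/T."""
    seq = seq.upper()
    windows = (seq[i:i + k] for i in range(len(seq) - k + 1))
    return Counter(w for w in windows if set(w) <= VALID)


def reverse_complement(kmer: str) -> str:
    return kmer.translate(COMPLEMENT)[::-1]


def symmetry_pairs(counts: Counter) -> Dict[str, Tuple[int, int]]:
    """Pair each k-mer's count with the count of its reverse complement."""
    return {km: (c, counts[reverse_complement(km)]) for km, c in counts.items()}


def cgr_points(seq: str) -> List[Tuple[float, float]]:
    """Chaos game: start at the centre and move halfway to each base's corner."""
    x, y = 0.5, 0.5
    points = []
    for base in seq.upper():
        if base not in CORNERS:
            continue  # skip N and other ambiguity codes
        cx, cy = CORNERS[base]
        x, y = (x + cx) / 2, (y + cy) / 2
        points.append((x, y))
    return points
```

For example, kmer_counts("ACGTACGT", 2) counts AC, CG and GT twice and TA once. A useful property of CGR is that all points produced by sequences ending in the same k-mer fall into the same sub-square of side 2^-k, so a CGR image doubles as a picture of k-mer frequencies.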

Implementation of Parallel Processor for Sound Synthesis of Guitar

  • Choi, Ji-Won;Kim, Yong-Min;Cho, Sang-Jin;Kim, Jong-Myon;Chong, Ui-Pil
    • The Journal of the Acoustical Society of Korea / v.29 no.3 / pp.191-199 / 2010
  • Physical modeling is a method for synthesizing high-quality sound that closely resembles the real sound of musical instruments. However, because physical modeling requires many parameters to synthesize the sound of an instrument, it prevents real-time processing for instruments that produce many sounds simultaneously. To solve this problem, this paper proposes a single instruction multiple data (SIMD) parallel processor that supports real-time sound synthesis for the guitar, a representative plucked string instrument. To control the six strings of the guitar, we used a SIMD parallel processor consisting of six processing elements (PEs), each of which models the corresponding string. The proposed SIMD processor can generate the synthesized sounds of all six strings simultaneously when a parallel synthesis algorithm receives the excitation signals and parameters of each string as input. Experimental results using a sampling rate of 44.1 kHz and 16-bit quantization indicate that the sounds synthesized by the proposed parallel processor were very similar to the original sound. In addition, the proposed parallel processor outperforms TI's commercial TMS320C6416 in execution time (8.9x better) and energy efficiency (39.8x better).
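
The abstract does not state which physical model the processor implements. As a stand-in, the sketch below uses Karplus-Strong synthesis, a classic plucked-string model: each string is a delay line whose length sets the pitch, and the same per-sample update is applied to all six strings, loosely mirroring the one-PE-per-string SIMD design. The decay factor, the noise excitation and the function names are assumptions.

```python
import random
from typing import List, Sequence

# Standard guitar tuning, low E to high E (Hz).
STANDARD_TUNING_HZ = [82.41, 110.00, 146.83, 196.00, 246.94, 329.63]


def pluck_six_strings(freqs: Sequence[float], fs: int = 44100, seconds: float = 1.0,
                      decay: float = 0.996, seed: int = 0) -> List[float]:
    """Karplus-Strong plucked-string synthesis, one delay line per string.
    Every string runs the same per-sample update, mirroring one PE per string."""
    rng = random.Random(seed)
    # Excitation: a burst of white noise; the line length (~fs / f0) sets the pitch.
    lines = [[rng.uniform(-1.0, 1.0) for _ in range(max(2, round(fs / f)))]
             for f in freqs]
    heads = [0] * len(lines)
    out = []
    for _ in range(int(fs * seconds)):
        mix = 0.0
        for s, line in enumerate(lines):  # identical "instruction" for each string
            h = heads[s]
            nxt = (h + 1) % len(line)
            sample = line[h]
            # Two-point average (low-pass) plus loss factor in the feedback loop.
            line[h] = decay * 0.5 * (sample + line[nxt])
            heads[s] = nxt
            mix += sample
        out.append(mix / len(lines))  # averaging keeps the mix within [-1, 1]
    return out


def to_int16(samples: Sequence[float]) -> List[int]:
    """Quantize to 16-bit PCM values, clipping at the int16 limits."""
    return [max(-32768, min(32767, round(x * 32767))) for x in samples]
```

Because every string executes exactly the same read, average, write and advance sequence, only the data (delay-line contents and length) differs between strings, which is the property that makes this kind of model a good fit for SIMD hardware.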

A Space-Efficient Inverted Index Technique using Data Rearrangement for String Similarity Searches

  • Im, Manu;Kim, Jongik
    • Journal of KIISE / v.42 no.10 / pp.1247-1253 / 2015
  • An inverted index structure is widely used for efficient string similarity search. One of the main requirements of similarity search is a fast response time; to this end, most techniques use an in-memory index structure. However, since an inverted index is usually very large, it is not practical to assume that it will fit into main memory. To alleviate this problem, we propose a novel technique that reduces the size of an inverted index. The proposed technique rearranges the data strings so that strings containing the same q-grams are placed close to one another, and then encodes those multiple strings as a range. Through an experimental study using real data sets, we show that our technique significantly reduces the size of an inverted index without sacrificing query processing time.
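
A small sketch of the idea: reorder the strings, build q-gram posting lists, and collapse runs of consecutive string IDs into (first, last) ranges; a query then counts shared q-grams per candidate. The abstract does not describe the paper's rearrangement strategy, so plain lexicographic sorting is used here as a hypothetical stand-in, and all function names are illustrative.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple


def qgrams(s: str, q: int) -> Set[str]:
    return {s[i:i + q] for i in range(len(s) - q + 1)}


def build_range_index(strings: List[str], q: int = 2):
    ordered = sorted(strings)  # hypothetical rearrangement: lexicographic order
    postings: Dict[str, List[int]] = defaultdict(list)
    for sid, s in enumerate(ordered):
        for g in qgrams(s, q):
            postings[g].append(sid)  # IDs arrive in increasing order
    index: Dict[str, List[Tuple[int, int]]] = {}
    for g, ids in postings.items():
        runs: List[List[int]] = []
        for sid in ids:
            if runs and runs[-1][1] == sid - 1:
                runs[-1][1] = sid  # extend the current run of consecutive IDs
            else:
                runs.append([sid, sid])  # start a new (first, last) range
        index[g] = [(lo, hi) for lo, hi in runs]
    return ordered, index


def candidates(query: str, ordered: List[str], index, q: int = 2,
               min_shared: int = 1) -> List[str]:
    """Count filtering: strings sharing at least min_shared q-grams with the query."""
    shared: Dict[int, int] = defaultdict(int)
    for g in qgrams(query, q):
        for lo, hi in index.get(g, []):
            for sid in range(lo, hi + 1):  # decode the range back into IDs
                shared[sid] += 1
    return sorted(ordered[sid] for sid, n in shared.items() if n >= min_shared)
```

For the strings "ample", "apple", "apply" and "maple", the posting list of "pl" covers all four IDs and is stored as the single pair (0, 3) instead of four entries; the better the rearrangement clusters strings with common q-grams, the more postings collapse this way.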

Signal Peptide Cleavage Site Prediction Using a String Kernel with Real Exponent Metric

  • Chi, Sang-Mun
    • Journal of KIISE:Software and Applications / v.36 no.10 / pp.786-792 / 2009
  • A kernel in support vector machines can be described as a similarity measure between data, and this measure is used to find an optimal hyperplane that classifies patterns. It is therefore important to incorporate the characteristics of the data effectively into the similarity measure. To find an optimal similarity between amino acid sequences, we propose a real-exponent exponential form of two metrics, derived from the evolutionary relationships of amino acids and from their hydrophobicity. We prove that the proposed metric satisfies the conditions of a metric, and we establish a relation between the proposed metric and the metrics in the string kernels widely used for processing amino acid and DNA sequences. In experiments predicting the cleavage site of the signal peptide, an optimal metric was found among the proposed metrics.
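
A toy illustration of a real-exponent metric inside an exponential kernel. It uses a single residue metric (the absolute difference of Kyte-Doolittle hydropathy values for a handful of residues) rather than the paper's combination of evolutionary and hydrophobicity metrics; the exponent p, the scale gamma, the position-by-position comparison of equal-length windows and the function names are assumptions. Raising a metric to a power 0 < p ≤ 1 preserves the triangle inequality, which is why p is restricted to that interval here.

```python
import math

# Kyte-Doolittle hydropathy values for a few residues (illustrative subset).
HYDROPATHY = {"A": 1.8, "L": 3.8, "V": 4.2, "S": -0.8, "G": -0.4, "R": -4.5}


def residue_distance(a: str, b: str) -> float:
    """A simple metric between residues: absolute hydropathy difference."""
    return abs(HYDROPATHY[a] - HYDROPATHY[b])


def real_exponent_kernel(x: str, y: str, p: float = 0.5, gamma: float = 0.1) -> float:
    """exp(-gamma * sum_i d(x_i, y_i) ** p) for equal-length windows.
    For 0 < p <= 1, d ** p is still a metric when d is one."""
    if len(x) != len(y):
        raise ValueError("windows must have equal length")
    if not 0 < p <= 1:
        raise ValueError("p must lie in (0, 1]")
    total = sum(residue_distance(a, b) ** p for a, b in zip(x, y))
    return math.exp(-gamma * total)
```

Identical windows get similarity 1.0 and the value decays toward 0 as residue differences accumulate; smaller p compresses large differences, so a few very dissimilar positions dominate the score less than they would with p = 1.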

The syllable recovery rule-based system and the application of a morphological analysis method for the post-processing of a continuous speech recognition

  • 박미성;김미진;김계성;최재혁;이상조
    • Journal of the Korean Institute of Telematics and Electronics C / v.36C no.3 / pp.47-56 / 1999
  • Various phonological alterations occur when Korean is spoken continuously, and these alterations are one of the major reasons why Korean speech recognition is difficult. This paper presents a rule-based system that converts a character string produced by speech recognition into a text-based character string. The recovery results are morphologically analyzed so that only a correct text string is generated. Recovery is performed according to four kinds of rules: a syllable-boundary final-consonant/initial-consonant recovery rule, a vowel-process recovery rule, a last-syllable final-consonant recovery rule, and a monosyllable process rule. We use x-clustering information for efficient recovery and postfix-syllable frequency information to restrict the recovery candidates passed to the morphological analyzer. Because the system is rule-based, it does not require a large pronunciation dictionary or phoneme dictionary, and it has the advantage of being able to use an existing text-based morphological analyzer.

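
As an illustration of one rule family, syllable-boundary final-consonant/initial-consonant recovery, the sketch below undoes liaison (연음): when a syllable without a final consonant is followed by a syllable whose initial consonant could have been a final, it proposes moving that consonant back, so the pronounced form 이거슨 yields the candidate 이것은. It relies only on the standard Unicode Hangul syllable composition formula. The rule is simplified, the x-clustering and postfix-syllable frequency filtering are not modeled, and all function names are hypothetical; in the paper's pipeline the candidates would then go to a morphological analyzer.

```python
from itertools import product
from typing import List, Optional, Set

HANGUL_BASE, HANGUL_LAST = 0xAC00, 0xD7A3
# Initial-consonant index -> final-consonant index, for consonants that can end a
# syllable (ㄸ, ㅃ, ㅉ cannot be finals; the silent initial ㅇ is skipped).
INITIAL_TO_FINAL = {0: 1, 1: 2, 2: 4, 3: 7, 5: 8, 6: 16, 7: 17, 9: 19, 10: 20,
                    12: 22, 14: 23, 15: 24, 16: 25, 17: 26, 18: 27}
SILENT_INITIAL = 11  # ㅇ


def is_hangul(ch: str) -> bool:
    return HANGUL_BASE <= ord(ch) <= HANGUL_LAST


def decompose(ch: str):
    """Split a precomposed syllable into (initial, medial, final) indices."""
    code = ord(ch) - HANGUL_BASE
    return code // (21 * 28), (code % (21 * 28)) // 28, code % 28


def compose(initial: int, medial: int, final: int) -> str:
    return chr(HANGUL_BASE + (initial * 21 + medial) * 28 + final)


def undo_liaison(chars: List[str], i: int) -> Optional[List[str]]:
    """Move the initial of chars[i + 1] back to the final of chars[i], if possible."""
    a, b = chars[i], chars[i + 1]
    if not (is_hangul(a) and is_hangul(b)):
        return None
    ia, ma, fa = decompose(a)
    ib, mb, fb = decompose(b)
    if fa != 0 or ib not in INITIAL_TO_FINAL:
        return None
    out = list(chars)
    out[i] = compose(ia, ma, INITIAL_TO_FINAL[ib])
    out[i + 1] = compose(SILENT_INITIAL, mb, fb)
    return out


def recovery_candidates(recognized: str) -> Set[str]:
    """Apply the rule at every subset of syllable boundaries, left to right."""
    results = set()
    for choice in product((False, True), repeat=max(len(recognized) - 1, 0)):
        chars: Optional[List[str]] = list(recognized)
        for i, apply_rule in enumerate(choice):
            if apply_rule:
                chars = undo_liaison(chars, i)
                if chars is None:
                    break  # rule not applicable at this boundary
        if chars is not None:
            results.add("".join(chars))
    return results
```

For 이거슨 this yields four candidates (이거슨, 이것은, 익어슨, 익엇은); only the morphological analysis step can tell that 이것은 is the intended text, which is why the abstract pairs the recovery rules with an analyzer and with frequency information that limits how many candidates reach it.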

A Study of Digital Music Element for Music Plagiarism Analysis

  • Shin, Mi-Hae;Jo, Jin-Wan;Lee, Hye-Seung;Kim, Young-Chul
    • Journal of the Korea Society of Computer and Information / v.18 no.8 / pp.43-52 / 2013
  • The purpose of this paper is to research the musical elements needed to analyze plagiarism between two sources. We first identify the digital music elements used to analyze a music source and examine how to use them in plagiarism analysis with compiler techniques. In addition, we used the open-source Java API JFugue to process complex MIDI music data simply. We designed a music plagiarism analysis system using MusicString, which JFugue supports, and, after investigating the syntax-processing elements of MusicString, we construct an AST so that plagiarism analysis can be performed efficiently. Until now, music plagiarism has been evaluated emotionally and subjectively; this paper suggests a first step toward systematic plagiarism analysis. If this research is well utilized, it could be very meaningful for systematically standardizing whether a piece of music is plagiarized.
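
The paper builds an AST from MusicString; the sketch below stops well short of that. It tokenizes only a simplified subset of note tokens (a letter A to G, an optional # or b, an optional single-digit octave, optional duration letters) and compares melodies by their pitch-interval sequences, a common transposition-invariant heuristic that is not necessarily the paper's analysis. The regular expression, the default octave and the function names are assumptions; the pitch number 12 * octave + semitone is only a relative scale, and intervals do not depend on its offset.

```python
import re
from typing import List, Sequence, Tuple

# Simplified note-token subset (assumption): letter, accidental, octave, durations.
NOTE_RE = re.compile(r"^([A-G])([#b]?)(\d?)([whqistxo]*)$")
SEMITONE = {"C": 0, "D": 2, "E": 4, "F": 5, "G": 7, "A": 9, "B": 11}
ACCIDENTAL = {"#": 1, "b": -1, "": 0}
DEFAULT_OCTAVE = 5  # assumption for notes written without an octave


def parse_notes(music_string: str) -> List[Tuple[int, str]]:
    """Return (relative pitch, duration letters) for each recognized note token."""
    notes = []
    for token in music_string.split():
        m = NOTE_RE.match(token)
        if not m:
            continue  # rests, tempo, voice, instrument tokens etc. are ignored here
        letter, accidental, octave, duration = m.groups()
        pitch = 12 * int(octave or DEFAULT_OCTAVE) + SEMITONE[letter] + ACCIDENTAL[accidental]
        notes.append((pitch, duration or "q"))
    return notes


def intervals(notes: Sequence[Tuple[int, str]]) -> List[int]:
    """Successive pitch differences: unchanged when a melody is transposed."""
    return [b[0] - a[0] for a, b in zip(notes, notes[1:])]


def longest_common_run(a: Sequence[int], b: Sequence[int]) -> int:
    """Length of the longest common contiguous run (dynamic programming)."""
    best = 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best
```

A melody and its transposition a fifth higher ("C5q D5q E5q F5q G5h" and "G5q A5q B5q C6q D6h") share all four intervals, so comparing interval runs flags the copy even though no individual note matches.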

A study on the genetic algorithms for the scheduling of parallel computation

  • 성기석;박지혁
    • Proceedings of the Korean Operations and Management Science Society Conference / 1997.10a / pp.166-169 / 1997
  • For parallel processing, the compiler partitions a loaded program into a set of tasks and makes a schedule for the tasks that minimizes the parallel processing time of the program. Building an optimal schedule for a given set of partitioned tasks is known to be NP-complete. In this paper, we introduce a GA (Genetic Algorithm)-based scheduling method in which a chromosome is a string with two parts, which determine the number and the order of tasks on each processor. An additional computation is used to enforce the feasibility constraint on the chromosome. By granularity theory, a partitioned program is categorized as coarse-grain or fine-grain. Good heuristic algorithms exist for coarse-grain partitioning, and we suggest another GA adapted to coarse-grain partitioning. The infeasibility of chromosomes is overcome by the encoding and the operators. The number of processors is determined while the GA finds the minimum parallel processing time.

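
A minimal sketch of the two-part chromosome described above, under stated assumptions: part one holds the number of tasks per processor, part two a global task order that is always kept topological, so every decoded schedule respects precedence. The prefix crossover (which preserves topological order), the count-shifting mutation, truncation selection, zero communication cost and all names are hypothetical choices, not the paper's operators. Because a processor may receive zero tasks, the GA effectively also chooses how many of the n_procs processors to use.

```python
import random
from typing import Dict, List, Tuple

Chromosome = Tuple[List[int], List[str]]  # (tasks per processor, global task order)


def random_topological_order(preds: Dict[str, List[str]], rng: random.Random) -> List[str]:
    remaining = {t: set(p) for t, p in preds.items()}
    order = []
    while remaining:
        ready = sorted(t for t, p in remaining.items() if not p)  # assumes a DAG
        t = rng.choice(ready)
        order.append(t)
        del remaining[t]
        for p in remaining.values():
            p.discard(t)
    return order


def random_counts(n_tasks: int, n_procs: int, rng: random.Random) -> List[int]:
    cuts = sorted(rng.randint(0, n_tasks) for _ in range(n_procs - 1))
    bounds = [0] + cuts + [n_tasks]
    return [bounds[i + 1] - bounds[i] for i in range(n_procs)]


def makespan(chrom: Chromosome, cost: Dict[str, float], preds: Dict[str, List[str]]) -> float:
    counts, order = chrom
    proc_of, k = {}, 0
    for proc, c in enumerate(counts):  # part 1: the next c tasks go to this processor
        for t in order[k:k + c]:
            proc_of[t] = proc
        k += c
    ready = [0.0] * len(counts)
    finish: Dict[str, float] = {}
    for t in order:  # part 2 is topological, so predecessors are already finished
        start = max([ready[proc_of[t]]] + [finish[q] for q in preds[t]])
        finish[t] = start + cost[t]
        ready[proc_of[t]] = finish[t]
    return max(finish.values())


def crossover(a: Chromosome, b: Chromosome, rng: random.Random) -> Chromosome:
    cut = rng.randint(0, len(a[1]))
    head = a[1][:cut]
    taken = set(head)
    # A prefix of one topological order plus the rest in the other's order stays topological.
    order = head + [t for t in b[1] if t not in taken]
    return list(a[0] if rng.random() < 0.5 else b[0]), order


def mutate(chrom: Chromosome, rng: random.Random) -> Chromosome:
    counts = list(chrom[0])
    src, dst = rng.randrange(len(counts)), rng.randrange(len(counts))
    if src != dst and counts[src] > 0:  # move one task slot; the total is unchanged
        counts[src] -= 1
        counts[dst] += 1
    return counts, list(chrom[1])


def ga_schedule(cost, preds, n_procs, pop_size=30, generations=60, seed=0):
    rng = random.Random(seed)

    def fitness(c: Chromosome) -> float:
        return makespan(c, cost, preds)

    pop = [(random_counts(len(cost), n_procs, rng), random_topological_order(preds, rng))
           for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness)
        elite = pop[:pop_size // 2]  # truncation selection, keeps the best half
        pop = elite + [mutate(crossover(rng.choice(elite), rng.choice(elite), rng), rng)
                       for _ in range(pop_size - len(elite))]
    best = min(pop, key=fitness)
    return best, fitness(best)
```

On a four-task diamond graph (a before b and c, both before d, with costs 2, 3, 3, 2) and two processors, the best schedule runs b and c in parallel and reaches the critical-path length of 7. Keeping the order part topological by construction is one way to obtain the feasibility the abstract attributes to "the encoding and operators", since no chromosome ever needs repair.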