• Title/Summary/Keyword: string algorithms


An Analysis System for Whole Genomic Sequence Using String B-Tree (스트링 B-트리를 이용한 게놈 서열 분석 시스템)

  • Choe, Jeong-Hyeon;Jo, Hwan-Gyu
    • The KIPS Transactions:PartA / v.8A no.4 / pp.509-516 / 2001
  • As a result of many genome projects, the genomic sequences of many organisms have been revealed. Various methods, such as global and local alignment, are used to analyze these sequences, and k-mer analysis is one of them. k-mer analysis explores the frequencies of all k-mers, or their symmetry, where a k-mer is a sequence of k bases. However, existing in-memory algorithms are not applicable to k-mer analysis because a whole genomic sequence is usually a very large text, so efficient data structures and algorithms are needed. The string B-tree is a data structure that supports external memory and is well suited to pattern matching. In this paper, we improve the string B-tree so that it can be applied efficiently to k-mer analysis, and we show the results of k-mer analysis for C. elegans and 30 other genomic sequences. We present a visualization system that lets users investigate the distribution and symmetry of the frequencies of all k-mers using CGR (Chaos Game Representation). We also describe a method for finding the signature, which is the part of a sequence that is similar to the whole genomic sequence.
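
To make the k-mer and CGR terminology concrete, here is a minimal in-memory sketch in Python. It is not the paper's string B-tree method (which targets external memory); the function names `kmer_counts` and `cgr_point` and the corner assignment in `CORNERS` are assumptions chosen for illustration, and other CGR corner conventions exist.

```python
from collections import Counter

# Assumed corner convention for the unit square; conventions vary.
CORNERS = {"A": (0.0, 0.0), "C": (0.0, 1.0), "G": (1.0, 1.0), "T": (1.0, 0.0)}


def kmer_counts(sequence: str, k: int) -> Counter:
    """Count every overlapping k-mer of an in-memory DNA string."""
    sequence = sequence.upper()
    return Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))


def cgr_point(kmer: str) -> tuple:
    """Map a k-mer to a point in the unit square by the chaos game:
    start at the centre and move halfway toward each base's corner."""
    x, y = 0.5, 0.5
    for base in kmer.upper():
        cx, cy = CORNERS[base]
        x, y = (x + cx) / 2, (y + cy) / 2
    return x, y


if __name__ == "__main__":
    counts = kmer_counts("ACGTACGT", 2)
    for kmer, n in sorted(counts.items()):
        print(kmer, n, cgr_point(kmer))
```

Plotting each k-mer's frequency at its `cgr_point` gives the kind of frequency picture a CGR-based viewer would show; for a whole genome the counting step is the part that no longer fits in memory.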


An Efficient Suffix Tree Reconstructing Algorithm for Biological Sequence Analysis (DNA 분석에 효율적인 서픽스 트리 재구성 알고리즘)

  • Choi, Hae-Won;Jung, Young-Seok;Kim, Sang-Jin
    • Journal of Digital Convergence / v.12 no.12 / pp.265-275 / 2014
  • This paper introduces a new algorithm for reconstructing the suffix tree of a character string when a substring is deleted from the string or a string is inserted into it as a substring. The algorithm has two main functions, delete-structure and insert-structure. The main objective of the algorithm is to save the time needed to construct the suffix tree of an edited string when the suffix tree of the original string is available. We tested the performance of the algorithm on several DNA sequences. The tests show that delete-reconstructing saves time when the deleted subsequence is shorter than 30% of the original sequence, and that insert-reconstructing takes less time with regard to the length of the inserted sequence.

Performance analysis of symbiotic evolutionary algorithms according to partner selection strategies (공생 파트너 선택전략에 따른 공생진화 알고리듬의 성능 분석)

  • 김재윤;김여근;곽재승;김동묵
    • Proceedings of the Korean Operations and Management Science Society Conference / 2000.04a / pp.239-242 / 2000
  • Symbiotic evolutionary algorithms are stochastic search algorithms that imitate the biological coevolution process through symbiotic interactions. In these algorithms, evaluating the fitness of an individual requires first selecting its symbiotic partners. The symbiotic partners affect the change in the individual's fitness and its search direction. In this study, we analyze how much partnering strategies influence the performance of the algorithms. To this end, extensive experiments are carried out to compare the performance of partnering strategies. The NKC model and the binary string covering problem are used as test-bed problems. The experimental results indicate that there is no statistically significant difference in their performance.


Efficient Construction of Generalized Suffix Arrays by Merging Suffix Arrays (써픽스 배열 합병을 이용한 일반화된 써픽스 배열의 효율적인 구축 알고리즘)

  • Jeon, Jeong-Eun;Park, Heejin;Kim, Dong-Kyue
    • Journal of KIISE:Computer Systems and Theory / v.32 no.6 / pp.268-278 / 2005
  • We consider constructing the generalized suffix array of strings A and B when the suffix arrays of A and B are given, i.e., merging the two suffix arrays of A and B. There are efficient algorithms for merging some special suffix arrays, such as the odd array and the even array. However, for the general case in which A and B are arbitrary strings, no efficient merging algorithms have been developed. Thus, one has had to construct the generalized suffix array of A and B by constructing the suffix array of $A\#B\$$ from scratch, even though the suffix arrays of A and B are given. In this paper, we present efficient merging algorithms for the suffix arrays of two arbitrary strings A and B drawn from constant and integer alphabets. The experimental results show that merging the two suffix arrays of A and B is about 5 times faster than constructing the suffix array of $A\#B\$$ from scratch for constant alphabets. Our algorithms include searching for all suffixes of string B in the suffix array of A. To do this, we use suffix links in suffix arrays, and we developed efficient algorithms for computing the suffix links. Efficient computation of suffix links is another contribution of this paper, because it can be used to solve other problems arising in bioinformatics that require searching for all suffixes of a given string in the suffix array of another string, such as computing matching statistics and finding longest common substrings. The experimental results show that our methods for computing suffix links are about 3-4 times faster than the previous fastest methods.
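
For orientation, the "from scratch" baseline that the abstract compares against can be sketched as follows. This is a naive sort-based construction over the concatenation A#B$, not the paper's merging algorithm; the function names and the `("A", offset)` / `("B", offset)` labelling are assumptions made for illustration, and the sketch assumes that `#` and `$` do not occur in A or B.

```python
def suffix_array(text: str) -> list:
    """Naive suffix array: sort suffix start positions by the suffix text.
    Simple but slow (roughly O(n^2 log n)); fine for small examples."""
    return sorted(range(len(text)), key=lambda i: text[i:])


def generalized_suffix_array(a: str, b: str) -> list:
    """Build the suffix array of a + '#' + b + '$' from scratch and
    relabel each entry as (source string, offset within that string)."""
    text = a + "#" + b + "$"
    result = []
    for pos in suffix_array(text):
        if pos < len(a):
            result.append(("A", pos))
        elif len(a) < pos < len(text) - 1:
            result.append(("B", pos - len(a) - 1))
        # positions of '#' and '$' themselves are dropped
    return result


if __name__ == "__main__":
    print(suffix_array("banana"))
    print(generalized_suffix_array("ab", "b"))
```

A merging algorithm would instead take the two already sorted arrays as input and avoid re-sorting the suffixes of the concatenated text.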

Parallel Algorithms for Finding δ-approximate Periods and γ-approximate Periods of Strings over Integer Alphabets (정수문자열의 δ-근사주기와 γ-근사주기를 찾는 병렬알고리즘)

  • Kim, Youngho;Sim, Jeong Seop
    • Journal of KIISE / v.44 no.8 / pp.760-766 / 2017
  • Repetitive strings have been studied in diverse fields such as data compression and bioinformatics. Recently, two problems on approximate periods of strings over integer alphabets were introduced: finding minimum $\delta$-approximate periods and finding minimum $\gamma$-approximate periods. Both problems can be solved in $O(n^2)$ time, where $n$ is the length of the string. In this paper, we present two parallel algorithms that solve these two problems in $O(n)$ time using $O(n^2)$ threads. The experimental results show that our parallel algorithms for finding minimum $\delta$-approximate (resp. $\gamma$-approximate) periods run approximately 19.7 (resp. 40.08) times faster than the sequential algorithms when $n = 10{,}000$.
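
The abstract does not define the two distances. The sketch below assumes the definitions commonly used for integer strings of equal length: the δ-distance is the largest position-wise absolute difference and the γ-distance is the sum of the position-wise absolute differences. The function names are hypothetical, and the period-finding algorithms themselves are not reproduced here.

```python
def delta_distance(x, y):
    """Largest |x[i] - y[i]| over all positions (assumed definition)."""
    if len(x) != len(y):
        raise ValueError("strings must have equal length")
    return max((abs(a - b) for a, b in zip(x, y)), default=0)


def gamma_distance(x, y):
    """Sum of |x[i] - y[i]| over all positions (assumed definition)."""
    if len(x) != len(y):
        raise ValueError("strings must have equal length")
    return sum(abs(a - b) for a, b in zip(x, y))


if __name__ == "__main__":
    x, y = [3, 5, 7], [4, 5, 5]
    print(delta_distance(x, y), gamma_distance(x, y))
```

Under these definitions, two strings δ-match when `delta_distance` is at most δ and γ-match when `gamma_distance` is at most γ.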

Genetic Algorithm Using-Floating Point Representation for Steiner Tree (스타이너 트리를 구하기 위한 부동소수점 표현을 이용한 유전자 알고리즘)

  • 김채주;성길영;우종호
    • Journal of the Korea Institute of Information and Communication Engineering / v.8 no.5 / pp.1089-1095 / 2004
  • Because generating the optimal Steiner tree from a given network is an NP-hard problem, genetic algorithms have been used to obtain near-optimal solutions. To solve this problem, the chromosomes in the genetic algorithm are represented with floating-point numbers instead of the existing binary strings. A spanning tree is first obtained from the given network using Prim's algorithm. Then, a new Steiner point is computed using the genetic algorithm with floating-point chromosomes, and it is added to the tree to approach the result. After repeating these evolving steps, a near-optimal Steiner tree is obtained. With this method, the tree approaches the near-optimal Steiner tree more quickly and more accurately than with the existing genetic algorithms that use binary strings.
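
The first step mentioned above, obtaining a spanning tree with Prim's algorithm, can be sketched as follows. The adjacency-dictionary graph format and the name `prim_mst` are assumptions for illustration; the genetic search for Steiner points is not shown.

```python
import heapq


def prim_mst(graph, start):
    """Prim's algorithm on an undirected weighted graph.

    graph: dict mapping node -> dict of neighbor -> edge weight.
    Returns the tree edges as (from_node, to_node, weight) tuples.
    """
    visited = {start}
    heap = [(w, start, v) for v, w in graph[start].items()]
    heapq.heapify(heap)
    tree = []
    while heap and len(visited) < len(graph):
        w, u, v = heapq.heappop(heap)  # cheapest edge leaving the tree
        if v in visited:
            continue
        visited.add(v)
        tree.append((u, v, w))
        for nxt, w2 in graph[v].items():
            if nxt not in visited:
                heapq.heappush(heap, (w2, v, nxt))
    return tree


if __name__ == "__main__":
    g = {
        "a": {"b": 1, "c": 4},
        "b": {"a": 1, "c": 2},
        "c": {"a": 4, "b": 2, "d": 3},
        "d": {"c": 3},
    }
    print(prim_mst(g, "a"))
```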

Efficient Inverted List Search Technique using Bitmap Filters (비트맵 필터를 이용한 효율적인 역 리스트 탐색 기법)

  • Kwon, In-Teak;Kim, Jong-Ik
    • The KIPS Transactions:PartD / v.18D no.6 / pp.415-422 / 2011
  • Finding similar strings is an important operation because textual data can contain errors, duplications, and inconsistencies by nature. Many algorithms have been developed for approximate string search, and most of them use inverted lists to find similar strings. These algorithms essentially perform merge operations on inverted lists. In this paper, we develop a bitmap representation of an inverted list and propose an efficient search algorithm that uses bitmap filters to skip unnecessary inverted lists without searching them. Experimental results show that the proposed technique consistently improves search performance.
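
The general idea of pairing each inverted list with a bitmap can be illustrated with a simplified sketch. This is not the paper's algorithm: the q-gram index, the `NUM_BITS` width, the hashing of string ids by `sid % NUM_BITS`, and the count threshold are all assumptions made for the example. A string can share at least `threshold` q-grams with the query only if its bit is set in at least `threshold` of the query's bitmaps, so any list whose bitmap has no such bit can be skipped without being scanned.

```python
from collections import defaultdict

NUM_BITS = 64  # hypothetical bitmap width


def qgrams(s, q=2):
    """Set of overlapping q-grams of s (a set, so duplicates are ignored)."""
    return {s[i:i + q] for i in range(len(s) - q + 1)}


def build_index(strings, q=2):
    """Inverted lists (q-gram -> string ids) plus one bitmap per list."""
    lists, bitmaps = defaultdict(list), defaultdict(int)
    for sid, s in enumerate(strings):
        for g in qgrams(s, q):
            lists[g].append(sid)
            bitmaps[g] |= 1 << (sid % NUM_BITS)
    return lists, bitmaps


def search(query, lists, bitmaps, threshold, q=2):
    """Ids of strings sharing at least `threshold` q-grams with the query."""
    grams = [g for g in qgrams(query, q) if g in lists]
    # Per bit position, count how many of the query's bitmaps set it.
    hits = [sum((bitmaps[g] >> b) & 1 for g in grams) for b in range(NUM_BITS)]
    live_mask = sum(1 << b for b, c in enumerate(hits) if c >= threshold)
    counts = defaultdict(int)
    for g in grams:
        if bitmaps[g] & live_mask == 0:
            continue  # no candidate can occur in this list: skip it
        for sid in lists[g]:
            if (live_mask >> (sid % NUM_BITS)) & 1:
                counts[sid] += 1
    return sorted(sid for sid, c in counts.items() if c >= threshold)


if __name__ == "__main__":
    lists, bitmaps = build_index(["abcd", "abce", "xyz"])
    print(search("abcxy", lists, bitmaps, threshold=2))
```

In the usage example, the list for the q-gram `xy` is skipped entirely because only one query bitmap sets its bit, which is below the threshold of 2.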

Learning of Adaptive Behavior of artificial Ant Using Classifier System (분류자 시스템을 이용한 인공개미의 적응행동의 학습)

  • 정치선;심귀보
    • Proceedings of the Korean Institute of Intelligent Systems Conference / 1998.10a / pp.361-367 / 1998
  • The two main applications of genetic algorithms (GA) are optimization and machine learning. Machine learning has two objectives: making a complex system learn its environment and producing the proper output of the system. Machine learning using genetic algorithms is called GA machine learning or genetic-based machine learning (GBML). Machine learning differs from optimization in that it must find a rule set. In optimization problems, the GA population should converge to the best individual because the objective is to produce an individual near the optimal solution. In contrast, machine learning systems need to find a set of cooperative rules. There are two methods in GBML, the Michigan method and the Pittsburgh method. In the former, each rule is expressed as a string; in the latter, the whole set of rules is coded into a string. The classifier system of Holland is the representative model of the Michigan method. The classifier system arranges the strengths of the classifiers in the classifier list using the message list. With this method, real-time processing and on-line learning are possible because the set of rules is adjusted on-line. A classifier system has three major components: the performance system, the apportionment of credit system, and the rule discovery system. In this paper, we solve the food search problem through the learning and evolution of an artificial ant using the learning classifier system.


Accuracy Improvement Methods for String Similarity Measurement in POI(Point Of Interest) Data Retrieval (POI(Point Of Interest) 데이터 검색에서 문자열 유사도 측정 정확도 향상 기법)

  • Ko, EunByul;Lee, JongWoo
    • KIISE Transactions on Computing Practices / v.20 no.9 / pp.498-506 / 2014
  • With the development of smart transportation, people increasingly find their routes using navigation and map applications. However, existing retrieval systems cannot return correct results when the query is inaccurate. To remedy this problem, a set-based POI search algorithm was proposed. Subsequently, a method for measuring POI name similarity and a POI search algorithm that supports classifying duplicate characters were proposed. These algorithms tried to compensate for the shortcomings of the set-based POI search algorithm. In this paper, accuracy improvement methods for measuring string similarity in POI data retrieval systems are proposed. By formulization, the similarity measurement scheme is systematized and generalized with the development of transportation. As a result, the accuracy of the retrieval results improves. The experimental results show that our accuracy improvement methods perform better than the previous algorithms.

Applications of Micro Genetic Algorithms to Engineering Design Optimization (마이크로 유전알고리듬의 최적설계 응용에 관한 연구)

  • Kim, Jong-Hun;Lee, Jong-Soo;Lee, Hyung-Joo;Koo, Bon-Heung
    • Transactions of the Korean Society of Mechanical Engineers A / v.27 no.1 / pp.158-166 / 2003
  • The paper describes the development and application of advanced evolutionary computing techniques referred to as micro genetic algorithms ($\mu$GA) in the context of engineering design optimization. The basic concept behind $\mu$GA is the use of a small population size irrespective of the bit string length used to represent the design variables. Such strategies also demonstrate faster convergence and greater savings in computational resources than simple genetic algorithms (SGA). The paper first explores ten-bar truss design problems to compare the optimization performance of $\mu$GA and SGA. Subsequently, $\mu$GA is applied to a realistic engineering design problem in injection molding process optimization.