• Title/Summary/Keyword: DNA strings

Fast Construction of Suffix Arrays for DNA Strings (DNA 스트링에 대하여 써픽스 배열을 구축하는 빠른 알고리즘)

  • Jo, Jun-Ha;Kim, Nam-Hee;Kwon, Ki-Ryong;Kim, Dong-Kyue
    • Journal of KIISE:Computer Systems and Theory / v.34 no.8 / pp.319-326 / 2007
  • To perform fast searches in massive data such as DNA strings, the most efficient method is to construct full-text index data structures for the given strings. The most widely used full-text index structures are suffix trees and suffix arrays. Since the suffix array uses less space than the suffix tree, it is well suited to DNA strings. Previously developed suffix array construction algorithms are not suitable for DNA strings because they are designed for integer alphabets. We propose a fast algorithm that constructs suffix arrays for DNA strings, whose alphabet size is fixed at 4. We reduce the construction time by improving the encoding and merging steps of Kim et al.'s algorithm [1]. Experimental results show that our algorithm constructs suffix arrays for DNA strings 1.3-1.6 times faster than Kim et al.'s algorithm, and also faster than other algorithms in most cases.
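
As an illustration of what a suffix array is (not of the authors' algorithm or of Kim et al.'s, whose encoding and merging steps are not described here), the sketch below builds one by simple prefix doubling, using a 2-bit code per base as the initial rank. The function name `suffix_array` and the restriction of the input to the characters A, C, G, T are assumptions of this sketch.

```python
# 2-bit codes for the DNA alphabet (assumption: the input contains only A, C, G, T).
CODE = {"A": 0, "C": 1, "G": 2, "T": 3}


def suffix_array(dna: str) -> list[int]:
    """Suffix array by prefix doubling; O(n log^2 n) because each round sorts."""
    n = len(dna)
    if n == 0:
        return []
    rank = [CODE[c] for c in dna]  # initial ranks are the 2-bit base codes
    sa = list(range(n))
    k = 1
    while True:
        # Sort by (rank of the first k chars, rank of the next k chars);
        # -1 marks "past the end", so a shorter suffix sorts first.
        def key(i: int) -> tuple[int, int]:
            return (rank[i], rank[i + k] if i + k < n else -1)

        sa.sort(key=key)
        new_rank = [0] * n
        for a, b in zip(sa, sa[1:]):
            new_rank[b] = new_rank[a] + (key(a) != key(b))
        rank = new_rank
        if rank[sa[-1]] == n - 1:  # all ranks distinct: suffixes fully sorted
            return sa
        k *= 2
```

For `"ACGTACG"` this returns `[4, 0, 5, 1, 6, 2, 3]`, the starting positions of the suffixes in lexicographic order. Linear-time construction algorithms exist that avoid the repeated sorting used here.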

A GENERALIZED 4-STRING SOLUTION TANGLE OF DNA-PROTEIN COMPLEXES

  • Kim, Soo-Jeong
    • Journal of the Korean Society for Industrial and Applied Mathematics / v.15 no.3 / pp.161-175 / 2011
  • An n-string tangle is a three-dimensional ball with n strings properly embedded in it. A tangle model of a DNA-protein complex was first introduced by C. Ernst and D. Sumners in the 1980s. They modeled the protein-bound DNA as strings and the protein as a three-dimensional ball. Using tangle analysis, one can predict the topology of the DNA within the complex. S. Kim and I. Darcy developed biologically reasonable 4-string tangle equations and determined a solution tangle, called the R-standard tangle. The author further examines the simple solution tangles of these equations and finds a generalized R-standard tangle solution.
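
The 4-string equations of this paper do not fit in a short example. As background, the sketch below shows the 2-string rational tangle arithmetic on which tangle models of DNA-protein complexes are built: the fraction of a rational tangle from its twist sequence, the effect of adding an integer tangle, and the numerator closure N(p/q), the 2-bridge link b(p, q), which is a knot when p is odd and a two-component link when p is even. Sign and ordering conventions differ between sources; the ones used here are assumptions, and all function names are hypothetical.

```python
from fractions import Fraction


def tangle_fraction(conway: list[int]) -> Fraction:
    """Fraction of the rational tangle with Conway notation a1 a2 ... an.

    Assumed convention: F = an + 1/(a_{n-1} + ... + 1/a1).
    Entries must be non-zero; an intermediate value of 0 (which would
    produce the infinity tangle) raises ZeroDivisionError here.
    """
    if not conway or 0 in conway:
        raise ValueError("expected a non-empty list of non-zero integers")
    value = Fraction(conway[0])
    for a in conway[1:]:
        value = a + 1 / value
    return value


def add_integer_tangle(f: Fraction, n: int) -> Fraction:
    """Adding the integer tangle [n] (n horizontal half-twists) adds n to the fraction."""
    return f + n


def numerator_closure(f: Fraction) -> tuple[int, int]:
    """Return (|p|, number of components) for N(p/q) = b(p, q).

    Fraction keeps p/q in lowest terms, so |p| is read directly.
    """
    p = abs(f.numerator)
    return p, (1 if p % 2 else 2)
```

For example, `tangle_fraction([2, 2])` gives `Fraction(5, 2)`, whose numerator closure is a knot (`(5, 1)`), while the integer tangle `Fraction(2)` closes to a two-component link (`(2, 2)`).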

Approximate Periods of Strings based on Distance Sum for DNA Sequence Analysis (DNA 서열분석을 위한 거리합기반 문자열의 근사주기)

  • Jeong, Ju Hui;Kim, Young Ho;Na, Joong Chae;Sim, Jeong Seop
    • KIPS Transactions on Software and Data Engineering / v.2 no.2 / pp.119-122 / 2013
  • Repetitive strings such as periods have been studied extensively in fields as diverse as data compression, computer-assisted music analysis, and bioinformatics. In bioinformatics, periods are closely related to repetitive patterns in DNA sequences, the so-called tandem repeats. In some cases, patterns that are similar but not identical are repeated, so approximate string matching algorithms are needed to study tandem repeats in DNA sequences. In this paper, we propose a new definition of approximate periods of strings based on distance sum. Given two strings $p$ ($|p| = m$) and $x$ ($|x| = n$), we propose an algorithm that computes the minimum approximate period distance based on distance sum. Our algorithm runs in $O(mn^2)$ time for the weighted edit distance, $O(mn)$ time for the edit distance, and $O(n)$ time for the Hamming distance.
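
The abstract does not spell out the definition, so the sketch below assumes one plausible reading: x is cut into consecutive blocks, every block except the last is compared with p, the last block is compared with the closest non-empty prefix of p, and the distance sum is minimised over all cuts. With unit-cost edit distance, running one edit-distance table per block start gives the straightforward $O(mn^2)$ bound; the paper's $O(mn)$ and $O(n)$ algorithms are not reproduced. The function name is hypothetical.

```python
import math


def approx_period_distance(p: str, x: str) -> int:
    """Minimum distance sum over all splits of x into blocks (assumed definition).

    Every block except the last is compared with p; the last block is compared
    with the closest non-empty prefix of p. Uses unit-cost edit distance.
    """
    m, n = len(p), len(x)
    if m == 0:
        raise ValueError("p must be non-empty")
    if n == 0:
        return 0
    best = [math.inf] * (n + 1)  # best[j]: min sum for x[:j] cut into blocks compared with p
    best[0] = 0
    answer = math.inf
    for i in range(n):  # a block starts at position i
        col = list(range(m + 1))  # col[r] = edit distance(p[:r], x[i:j]), starting at j = i
        for j in range(i + 1, n + 1):
            new = [j - i] + [0] * m
            for r in range(1, m + 1):
                new[r] = min(col[r] + 1,          # x[j-1] is an extra character
                             new[r - 1] + 1,      # p[r-1] is unmatched
                             col[r - 1] + (p[r - 1] != x[j - 1]))  # match or substitute
            col = new
            best[j] = min(best[j], best[i] + col[m])  # x[i:j] as a full block
        # col now describes x[i:n]: treat it as the final block against a prefix of p
        answer = min(answer, best[i] + min(col[1:]))
    return int(answer)
```

Under this reading, `approx_period_distance("ab", "aba")` is 0 (blocks `ab` and the prefix `a`), while `"abaab"` costs 1.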

DNA coding-Based Fuzzy System Modeling for Chaotic Systems (DNA 코딩 기반 카오스 시스템의 퍼지 모델링)

  • Kim, Jang-Hyun;Joo, Young-Hoon;Park, Jin-Bae
    • Proceedings of the KIEE Conference / 1999.11c / pp.524-526 / 1999
  • In constructing successful fuzzy models and/or controllers for nonlinear systems, identifying a good fuzzy inference system is an important yet difficult problem that is traditionally solved by a time-consuming trial-and-error process. In this paper, we propose a systematic identification procedure for complex multi-input single-output nonlinear systems using the DNA coding method. Like conventional genetic algorithms (GAs), the DNA coding method is an optimization algorithm inspired by biological DNA. However, it works with variable-length strings, whereas standard GAs use a fixed-length coding scheme. The DNA coding method is well suited to learning because it allows a flexible representation of a fuzzy inference system. We also propose a new coding method for applying the DNA coding method to the identification of fuzzy models. This coding scheme can effectively represent the zero-order Takagi-Sugeno (TS) fuzzy model. To obtain an optimal TS fuzzy model with higher accuracy and an economical size, we use the DNA coding method to optimize both the parameters and the size of the fuzzy inference system. To demonstrate the superiority and efficiency of the proposed scheme, we finally apply it to a Duffing forced oscillation system.
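
To make the zero-order Takagi-Sugeno model concrete, the sketch below evaluates one for a multi-input single-output system: each rule has a constant consequent, and the output is the firing-strength-weighted average of those constants. Gaussian membership functions, the product t-norm, and all names are assumptions; the DNA coding that would tune the centres, widths, consequents, and number of rules is not shown.

```python
import math
from dataclasses import dataclass


@dataclass
class Rule:
    """IF x1 is A1 and ... and xk is Ak THEN y = consequent (zero-order TS rule)."""
    centers: list[float]  # Gaussian membership centres, one per input (assumed shape)
    widths: list[float]   # Gaussian membership widths, one per input
    consequent: float     # constant output of the rule


def firing_strength(rule: Rule, x: list[float]) -> float:
    """Product of Gaussian membership degrees (product t-norm)."""
    w = 1.0
    for xi, c, s in zip(x, rule.centers, rule.widths):
        w *= math.exp(-((xi - c) / s) ** 2)
    return w


def ts0_output(rules: list[Rule], x: list[float]) -> float:
    """Weighted average of the rule consequents, weighted by firing strength."""
    weights = [firing_strength(r, x) for r in rules]
    total = sum(weights)
    if total == 0.0:
        raise ValueError("no rule fires for this input")
    return sum(w * r.consequent for w, r in zip(weights, rules)) / total
```

With two single-input rules centred at 0 and 2 (consequents 0 and 4), the input 1.0 fires both equally and the output is 2.0.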

Automatic Reading System for On-off Type DNA Chip

  • Ryu, Mun-Ho;Kim, Jong-Dae;Kim, Jong-Won
    • Journal of Information Processing Systems / v.2 no.3 s.4 / pp.189-193 / 2006
  • In this study, we propose an automatic reading system for diagnostic DNA chips. We define a general specification for an automatic reading system and propose a possible implementation method. The proposed system performs the whole reading process automatically, covering image acquisition, image analysis, and report generation, without any user intervention. We applied the system to automatic report generation for a commercialized DNA chip for cervical cancer detection. The fluorescence image of the hybridization result was acquired with a GenePix™ scanner, using the scanner's library from HTML pages. The acquired image was processed and the report was generated by a component object module programmed in Microsoft Visual C++ 6.0. To generate the report document, we created an HWP 2002 document template containing marker strings that are searched for and replaced with the corresponding information, such as patient information and diagnosis results. The system generates the report document by reading the template and replacing the marker strings with the results. The system is expected to facilitate the use of diagnostic DNA chips for mass screening by automating the conventional manual reading process, shortening its processing time, and quantifying the reading criteria.
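
The marker-replacement step can be sketched on a plain-text template. The `{{NAME}}` marker syntax, the function name, and the use of plain text instead of an HWP 2002 document are assumptions for illustration; the paper's actual markers are not given.

```python
import re

# Hypothetical marker syntax: {{NAME}} with upper-case letters and underscores.
MARKER = re.compile(r"\{\{([A-Z_]+)\}\}")


def fill_template(template: str, values: dict[str, str]) -> str:
    """Replace every {{NAME}} marker with values[NAME]; fail loudly on unknown markers."""
    def replace(match: re.Match) -> str:
        name = match.group(1)
        if name not in values:
            raise KeyError(f"no value supplied for marker {name!r}")
        return values[name]  # returned text is inserted literally

    return MARKER.sub(replace, template)
```

Raising on an unknown marker, rather than leaving it in place, keeps an incomplete report from being produced silently.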

Global Optimum Searching Technique of Multi-Modal Function Using DNA Coding Method (DNA 코딩을 이용한 multi-modal 함수의 최적점 탐색방법)

  • Paek, Dong-Hwa;Kang, Hwan-Il;Kim, Kab-Il;Han, Seung-Soo
    • Proceedings of the Korean Institute of Intelligent Systems Conference / 2001.12a / pp.225-228 / 2001
  • DNA computing has been applied to the problem of finding optimal solutions since Adleman's experiment. DNA computing uses variable-length strings over four types of bases, which makes it useful for finding globally optimal solutions of complex multi-modal problems. This paper presents a DNA coding method for finding the optimal solution of a multi-modal function and compares its efficiency with that of genetic algorithms (GAs). A GA effectively searches for an optimal solution through the artificial evolution of a population of binary strings, whereas the DNA coding method uses DNA molecules as a tool for computation or information storage, with four types of bases denoted by the symbols A (adenine), C (cytosine), G (guanine), and T (thymine). The same operators (selection, crossover, and mutation) are applied to both the DNA coding algorithm and the genetic algorithms. The results show that the DNA-based algorithm performs better than the GA.
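
The sketch below is a generic evolutionary search with a quaternary (A, C, G, T) encoding that uses the selection, crossover, and mutation operators named in the abstract. It is not the papers' method: fixed-length chromosomes, the base-4 decoding, binary tournament selection, elitism, and the test function x sin(10πx) + 1 on [-1, 2] are all assumptions, and the names are hypothetical.

```python
import math
import random

DNA_BASES = "ACGT"


def decode(chrom: str, lo: float, hi: float, alphabet: str = DNA_BASES) -> float:
    """Read chrom as a base-len(alphabet) integer and map it linearly onto [lo, hi]."""
    radix = len(alphabet)
    value = 0
    for symbol in chrom:
        value = value * radix + alphabet.index(symbol)
    return lo + (hi - lo) * value / (radix ** len(chrom) - 1)


def objective(x: float) -> float:
    """A common multi-modal test function (illustrative choice)."""
    return x * math.sin(10 * math.pi * x) + 1.0


def evolve(length: int = 10, pop_size: int = 40, generations: int = 100,
           p_mut: float = 0.02, lo: float = -1.0, hi: float = 2.0,
           alphabet: str = DNA_BASES, seed: int = 0) -> float:
    """Return the best x found by a simple elitist GA over strings from alphabet."""
    rng = random.Random(seed)

    def score(chrom: str) -> float:
        return objective(decode(chrom, lo, hi, alphabet))

    def tournament(pop: list[str]) -> str:  # binary tournament selection
        a, b = rng.sample(pop, 2)
        return a if score(a) >= score(b) else b

    pop = ["".join(rng.choice(alphabet) for _ in range(length)) for _ in range(pop_size)]
    best = max(pop, key=score)
    for _ in range(generations):
        children = [best]  # elitism: keep the best individual unchanged
        while len(children) < pop_size:
            p1, p2 = tournament(pop), tournament(pop)
            cut = rng.randrange(1, length)  # one-point crossover
            child = p1[:cut] + p2[cut:]
            # point mutation: replace a symbol with a different one from the alphabet
            child = "".join(rng.choice(alphabet.replace(s, "")) if rng.random() < p_mut else s
                            for s in child)
            children.append(child)
        pop = children
        best = max(pop, key=score)
    return decode(best, lo, hi, alphabet)
```

Calling `evolve(length=20, alphabet="01")` runs the same search on binary strings at the same resolution (4^10 = 2^20 grid points), which is one way to set up a GA-versus-DNA-coding comparison; any outcome depends on the seed and parameters.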

TOPOLOGICAL ANALYSIS OF MU-TRANSPOSITION

  • Kim, Soojeong
    • Journal of the Korean Society for Industrial and Applied Mathematics / v.17 no.2 / pp.87-102 / 2013
  • An n-string tangle is a three-dimensional ball with n strings properly embedded in it. In the early 1990s, C. Ernst and D. Sumners first used a tangle to describe a DNA-protein complex. In this model, DNA is represented by strings and the protein by a ball. Mu is a protein that binds to DNA at three sites, and the DNA-Mu complex is called the Mu transpososome. Knowing the DNA topology within the Mu transpososome is very important for understanding DNA transposition by the Mu protein. In 2002, Pathania et al. determined that the DNA configuration within the Mu transpososome is three-branched and five-noded [12]. In 2007, Darcy et al. analyzed this configuration using mathematical tangles and concluded that the three-branched, five-noded DNA configuration is the only biologically reasonable solution [4]. In this paper, based on the results of Pathania et al. and Darcy et al., the author determines the DNA topology within the DNA-Mu complex after the whole Mu transposition process. Furthermore, a new experiment is designed that can support Pathania et al.'s result. The outcome of this new experiment is predicted using mathematical knot theory.

Suffix Tree Constructing Algorithm for Large DNA Sequences Analysis (대용량 DNA서열 처리를 위한 서픽스 트리 생성 알고리즘의 개발)

  • Choi, Hae-Won
    • Journal of Korea Society of Industrial Information Systems / v.15 no.1 / pp.37-46 / 2010
  • A suffix tree is an efficient data structure that exposes the internal structure of a string and allows efficient solutions to a wide range of complex string problems, particularly in computational biology. However, as biological data grow explosively, it becomes impossible to construct suffix trees in main memory, so an efficient technique is needed to construct them in secondary storage. In this paper, we present a method for constructing a suffix tree on disk for a large set of DNA strings using a new index scheme. We also show a typical application of the disk-based suffix tree.
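
For orientation, the sketch below builds a suffix tree in main memory by inserting each suffix into a compressed trie (O(n^2) time) and then uses it for substring search. It does not reproduce the paper's disk-based construction or index scheme; the `$` terminator and all names are assumptions.

```python
class Node:
    """Suffix tree node; each edge is stored as (start, end, child) indices into the text."""

    def __init__(self, suffix_index: int = -1) -> None:
        self.children: dict[str, tuple[int, int, "Node"]] = {}
        self.suffix_index = suffix_index  # >= 0 only for leaves


def build_suffix_tree(dna: str) -> tuple[Node, str]:
    """Naive O(n^2) construction: insert every suffix into a compressed trie."""
    text = dna + "$"  # unique terminator (assumed absent from the input)
    root = Node()
    for i in range(len(text)):
        node, pos = root, i
        while True:
            c = text[pos]
            if c not in node.children:  # no edge starts with c: attach a new leaf
                node.children[c] = (pos, len(text), Node(i))
                break
            start, end, child = node.children[c]
            k = 0  # length of the common prefix of the edge label and text[pos:]
            while start + k < end and text[start + k] == text[pos + k]:
                k += 1
            if start + k == end:  # whole edge matched: descend
                node, pos = child, pos + k
                continue
            # mismatch inside the edge: split it with a new internal node
            mid = Node()
            node.children[c] = (start, start + k, mid)
            mid.children[text[start + k]] = (start + k, end, child)
            mid.children[text[pos + k]] = (pos + k, len(text), Node(i))
            break
    return root, text


def occurrences(root: Node, text: str, pattern: str) -> list[int]:
    """Sorted start positions of pattern in the indexed text."""
    node, i = root, 0
    while i < len(pattern):  # follow the pattern down from the root
        edge = node.children.get(pattern[i])
        if edge is None:
            return []
        start, end, child = edge
        j = start
        while j < end and i < len(pattern):
            if text[j] != pattern[i]:
                return []
            i, j = i + 1, j + 1
        node = child
    # every leaf below the reached node marks one occurrence
    found, stack = [], [node]
    while stack:
        current = stack.pop()
        if current.suffix_index >= 0:
            found.append(current.suffix_index)
        stack.extend(child for _, _, child in current.children.values())
    return sorted(found)
```

For `"ACGTACG"`, searching for `"ACG"` returns `[0, 4]`; once the tree exists, a query costs time proportional to the pattern length plus the number of occurrences.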

An Efficient Local Alignment Algorithm for DNA Sequences including N and X (N과 X를 포함하는 DNA 서열을 위한 효율적인 지역정렬 알고리즘)

  • Kim, Jin-Wook
    • Journal of KIISE:Computing Practices and Letters / v.16 no.3 / pp.275-280 / 2010
  • Given two strings, a local alignment algorithm finds a pair of substrings, one from each string, that are similar to each other. A DNA sequence may consist not only of A, C, G, and T but also of N and X, which are used when the original base information has been lost for various reasons. In this paper, we present an efficient local alignment algorithm for two DNA sequences that include N and X, using the affine gap penalty metric. Our algorithm is an extended version of the Kim-Park algorithm and can be further extended to handle other characters with properties similar to N and X.
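
Affine-gap local alignment can be sketched with the Smith-Waterman recurrences in Gotoh's three-matrix form. How the paper scores N and X is not stated, so this sketch assumes, hypothetically, that N or X scores 0 against any character; it returns only the best score and is neither the Kim-Park algorithm nor its extension.

```python
import math

# Hypothetical scoring: N and X are "unknown" and score 0 against any character.
UNKNOWN = frozenset("NX")


def pair_score(a: str, b: str, match: int, mismatch: int) -> int:
    if a in UNKNOWN or b in UNKNOWN:
        return 0
    return match if a == b else mismatch


def local_align_score(a: str, b: str, match: int = 2, mismatch: int = -1,
                      gap_open: int = 3, gap_extend: int = 1) -> int:
    """Best local alignment score with affine gaps (Smith-Waterman, Gotoh recurrences).

    A gap of length L costs gap_open + (L - 1) * gap_extend.
    """
    neg = -math.inf
    m = len(b)
    prev_h = [0] * (m + 1)    # H: best score of an alignment ending at (i, j)
    prev_f = [neg] * (m + 1)  # F: alignments ending with a[i-1] against a gap
    best = 0
    for i in range(1, len(a) + 1):
        cur_h = [0] * (m + 1)
        cur_f = [neg] * (m + 1)
        e = neg  # E: alignments ending with b[j-1] against a gap
        for j in range(1, m + 1):
            e = max(cur_h[j - 1] - gap_open, e - gap_extend)
            cur_f[j] = max(prev_h[j] - gap_open, prev_f[j] - gap_extend)
            diag = prev_h[j - 1] + pair_score(a[i - 1], b[j - 1], match, mismatch)
            cur_h[j] = max(0, diag, e, cur_f[j])  # 0 lets a local alignment restart
            best = max(best, cur_h[j])
        prev_h, prev_f = cur_h, cur_f
    return best
```

With the default parameters, `"ACGT"` against `"ACNT"` scores 6, because the N contributes 0 instead of a mismatch penalty. Only two rows of each matrix are kept, so memory is linear in the length of `b`.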

Global Optimum Searching Technique Using DNA Coding and Evolutionary Computing (DNA 코딩과 진화연산을 이용한 함수의 최적점 탐색방법)

  • Paek, Dong-Hwa;Kang, Hwan-Il;Kim, Kab-Il;Han, Seung-Soo
    • Journal of the Korean Institute of Intelligent Systems / v.11 no.6 / pp.538-542 / 2001
  • DNA computing has been applied to the problem of finding optimal solutions since Adleman's experiment. DNA computing uses variable-length strings over four types of bases, which makes it useful for finding globally optimal solutions of complex multi-modal problems. This paper presents a DNA coding method for finding the optimal solution of a multi-modal function and compares its efficiency with that of genetic algorithms (GAs). A GA effectively searches for an optimal solution through the artificial evolution of a population of binary strings, whereas the DNA coding method uses DNA molecules and four types of bases denoted by A (adenine), C (cytosine), G (guanine), and T (thymine). Selection, crossover, and mutation operators are applied to both the DNA coding algorithm and the genetic algorithms, and their performance is compared. The results show that the DNA-based algorithm performs better than the GA.