• Title/Summary/Keyword: DNA sequence alignment

Search Result 157, Processing Time 0.019 seconds

A DNA Sequence Alignment Algorithm Using Quality Information and a Fuzzy Inference Method (품질 정보와 퍼지 추론 기법을 이용한 DNA 염기 서열 배치 알고리즘)

  • Kim, Kwang-Baek
    • Journal of Intelligence and Information Systems
    • /
    • v.13 no.2
    • /
    • pp.55-68
    • /
    • 2007
  • DNA sequence alignment algorithms in computational molecular biology have been improved by diverse methods. In this paper, we proposed a DNA sequence alignment algorithm utilizing quality information and a fuzzy inference method utilizing characteristics of DNA sequence fragments and a fuzzy logic system in order to improve conventional DNA sequence alignment methods using DNA sequence quality information. In conventional algorithms, DNA sequence alignment scores were calculated by the global sequence alignment algorithm proposed by Needleman-Wunsch applying quality information of each DNA fragment. However, there may be errors in the process for calculating DNA sequence alignment scores in case of low quality of DNA fragment tips, because overall DNA sequence quality information are used. In the proposed method, exact DNA sequence alignment can be achieved in spite of low quality of DNA fragment tips by improvement of conventional algorithms using quality information. And also, mapping score parameters used to calculate DNA sequence alignment scores, are dynamically adjusted by the fuzzy logic system utilizing lengths of DNA fragments and frequencies of low quality DNA bases in the fragments. From the experiments by applying real genome data of NCBI (National Center for Biotechnology Information), we could see that the proposed method was more efficient than conventional algorithms using quality information in DNA sequence alignment.

  • PDF

Sequence Alignment Algorithm using Quality Information (품질 정보를 이용한 서열 배치 알고리즘)

  • Na, Joong-Chae;Roh, Kang-Ho;Park, Kun-Soo
    • Journal of KIISE:Computer Systems and Theory
    • /
    • v.32 no.11_12
    • /
    • pp.578-586
    • /
    • 2005
  • In this Paper we consider the problem of sequence alignment with quality scores. DNA sequences produced by a base-calling program (as part of sequencing) have quality scores which represent the confidence level for individual bases. However, previous sequence alignment algorithms do not consider such quality scores. To solve sequence alignment with quality scores, we propose a measure of an alignment of two sequences with orality scores. We show that an optimal alignment in this measure can be found by dynamic programming.

Multiple Sequence Aligmnent Genetic Algorithm (진화 알고리즘을 사용한 복수 염기서열 정렬)

  • Kim, Jin;Song, Min-Dong;Choi, Hong-Sik;Chang, Yeon-Ah
    • Korean Journal of Microbiology
    • /
    • v.35 no.2
    • /
    • pp.115-120
    • /
    • 1999
  • Multiple Sequence Alignment of DNA and protem sequences is a imnport'mt tool in the study of molecular evolution, gene regulation. and prolein suucture-function relationships. Progressive pairwise alignment method generates multiple sequence alignment fast but not necessarily with optimal costs. Dynamic programming generates multiple sequence alig~~menl with optimal costs in most cases but long execution time. In this paper. we suggest genetlc algorithm lo improve the multiple sequence alignment generated from the cnlent methods, describe the design of the genetic algorithm, and compare the multiple sequence alignments from 0111 method and current methods.

  • PDF

A Pattern Summary System Using BLAST for Sequence Analysis

  • Choi, Han-Suk;Kim, Dong-Wook;Ryu, Tae-W.
    • Genomics & Informatics
    • /
    • v.4 no.4
    • /
    • pp.173-181
    • /
    • 2006
  • Pattern finding is one of the important tasks in a protein or DNA sequence analysis. Alignment is the widely used technique for finding patterns in sequence analysis. BLAST (Basic Local Alignment Search Tool) is one of the most popularly used tools in bio-informatics to explore available DNA or protein sequence databases. BLAST may generate a huge output for a large sequence data that contains various sequence patterns. However, BLAST does not provide a tool to summarize and analyze the patterns or matched alignments in the BLAST output file. BLAST lacks of general and robust parsing tools to extract the essential information out from its output. This paper presents a pattern summary system which is a powerful and comprehensive tool for discovering pattern structures in huge amount of sequence data in the BLAST. The pattern summary system can identify clusters of patterns, extract the cluster pattern sequences from the subject database of BLAST, and display the clusters graphically to show the distribution of clusters in the subject database.

A Multiple Sequence Alignment Algorithm using Clustering Divergence (콜러스터링 분기를 이용한 다중 서열 정렬 알고리즘)

  • Lee Byung-ll;Lee Jong-Yun;Jung Soon-Key
    • Journal of the Korea Society of Computer and Information
    • /
    • v.10 no.5 s.37
    • /
    • pp.1-10
    • /
    • 2005
  • Multiple sequence alignment(MSA) is a fundamental technique of DNA and Protein sequence analysis. Biological sequences are aligned vertically in order to show the similarities and differences among them. In this Paper, we Propose an effcient group alignment method, which is based on clustering divergency, to Perform the alignment between two groups of sequences. The Proposed algorithm is a clustering divergence(CDMS)-based multiple sequence alignment and a top-down approach. The algorithm builds the tree topology for merging. It is so based on the concept that two sequences having the longest distance should be spilt into two clusters. We expect that our sequence alignment algorithm improves its qualify and speeds up better than traditional algorithm Clustal-W.

  • PDF

Implementation of Parallel Local Alignment Method for DNA Sequence using Apache Spark (Apache Spark을 이용한 병렬 DNA 시퀀스 지역 정렬 기법 구현)

  • Kim, Bosung;Kim, Jinsu;Choi, Dojin;Kim, Sangsoo;Song, Seokil
    • The Journal of the Korea Contents Association
    • /
    • v.16 no.10
    • /
    • pp.608-616
    • /
    • 2016
  • The Smith-Watrman (SW) algorithm is a local alignment algorithm which is one of important operations in DNA sequence analysis. The SW algorithm finds the optimal local alignment with respect to the scoring system being used, but it has a problem to demand long execution time. To solve the problem of SW, some methods to perform SW in distributed and parallel manner have been proposed. The ADAM which is a distributed and parallel processing framework for DNA sequence has parallel SW. However, the parallel SW of the ADAM does not consider that the SW is a dynamic programming method, so the parallel SW of the ADAM has the limit of its performance. In this paper, we propose a method to enhance the parallel SW of ADAM. The proposed parallel SW (PSW) is performed in two phases. In the first phase, the PSW splits a DNA sequence into the number of partitions and assigns them to multiple nodes. Then, the original Smith-Waterman algorithm is performed in parallel at each node. In the second phase, the PSW estimates the portion of data sequence that should be recalculated, and the recalculation is performed on the portions in parallel at each node. In the experiment, we compare the proposed PSW to the parallel SW of the ADAM to show the superiority of the PSW.

Effective Biological Sequence Alignment Method using Divide Approach

  • Choi, Hae-Won;Kim, Sang-Jin;Pi, Su-Young
    • Journal of Korea Society of Industrial Information Systems
    • /
    • v.17 no.6
    • /
    • pp.41-50
    • /
    • 2012
  • This paper presents a new sequence alignment method using the divide approach, which solves the problem by decomposing sequence alignment into several sub-alignments with respect to exact matching subsequences. Exact matching subsequences in the proposed method are bounded on the generalized suffix tree of two sequences, such as protein domain length more than 7 and less than 7. Experiment results show that protein sequence pairs chosen in PFAM database can be aligned using this method. In addition, this method reduces the time about 15% and space of the conventional dynamic programming approach. And the sequences were classified with 94% of accuracy.

An Efficient Local Alignment Algorithm for DNA Sequences including N and X (N과 X를 포함하는 DNA 서열을 위한 효율적인 지역정렬 알고리즘)

  • Kim, Jin-Wook
    • Journal of KIISE:Computing Practices and Letters
    • /
    • v.16 no.3
    • /
    • pp.275-280
    • /
    • 2010
  • A local alignment algorithm finds a substring pair of given two strings where two substrings of the pair are similar to each other. A DNA sequence can consist of not only A, C, G, and T but also N and X where N and X are used when the original bases lose their information for various reasons. In this paper, we present an efficient local alignment algorithm for two DNA sequences including N and X using the affine gap penalty metric. Our algorithm is an extended version of the Kim-Park algorithm and can be extended in case of including other characters which have similar properties to N and X.

Molecular Cloning of a cDNA Encoding a Ferritin Subunit from the Spider, Araneus ventricosus

  • Jin, Byung-Rea;Han, Ji-Hee;Kim, Seong-Ryul;Sohn, Hung-Dae
    • International Journal of Industrial Entomology and Biomaterials
    • /
    • v.4 no.2
    • /
    • pp.163-168
    • /
    • 2002
  • We report for the first time the cDNA sequence encoding a ferritin subunit from the spiders Araneus ventricosus. The complete cDNA sequence of A. ventricosus ferritin subunit comprised 516 bp with 172 amino acid residues. The A. ventricosus ferritin subunit cDNA contained a conserved iron responsive element sequence in the 5 untranslated region. An alignment of the deduced protein sequence of the A. ventricosus ferritin subunit gene to that of other heavy chain ferritin molecules showed that A. ventricosus ferritin subunit is most similar to the great pond snail, Lymnaea stagnalis, ferritin with 70.2% of protein sequence identity.

A Local Alignment Algorithm using Normalization by Functions (함수에 의한 정규화를 이용한 local alignment 알고리즘)

  • Lee, Sun-Ho;Park, Kun-Soo
    • Journal of KIISE:Computer Systems and Theory
    • /
    • v.34 no.5_6
    • /
    • pp.187-194
    • /
    • 2007
  • A local alignment algorithm does comparing two strings and finding a substring pair with size l and similarity s. To find a pair with both sufficient size and high similarity, existing normalization approaches maximize the ratio of the similarity to the size. In this paper, we introduce normalization by functions that maximizes f(s)/g(l), where f and g are non-decreasing functions. These functions, f and g, are determined by experiments comparing DNA sequences. In the experiments, our normalization by functions finds appropriate local alignments. For the previous algorithm, which evaluates the similarity by using the longest common subsequence, we show that the algorithm can also maximize the score normalized by functions, f(s)/g(l) without loss of time.