• Title/Summary/Keyword: sequence alignment

Sequence Alignment Algorithm using Quality Information (품질 정보를 이용한 서열 배치 알고리즘)

  • Na, Joong-Chae;Roh, Kang-Ho;Park, Kun-Soo
    • Journal of KIISE:Computer Systems and Theory / v.32 no.11_12 / pp.578-586 / 2005
  • In this paper we consider the problem of sequence alignment with quality scores. DNA sequences produced by a base-calling program (as part of sequencing) have quality scores that represent the confidence level for individual bases. However, previous sequence alignment algorithms do not consider such quality scores. To solve sequence alignment with quality scores, we propose a measure of an alignment of two sequences with quality scores. We show that an optimal alignment in this measure can be found by dynamic programming.
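
The abstract does not state the proposed measure, so the following is only a minimal sketch under assumptions: quality scores are taken to be Phred-style (error probability 10^(-q/10)), and each match or mismatch score is scaled by the probability that both bases were called correctly. The function names, the weighting scheme, and the scoring parameters are hypothetical and are not taken from the paper.

```python
def phred_to_confidence(q):
    """Probability that a base call with Phred-style score q is correct."""
    return 1.0 - 10.0 ** (-q / 10.0)


def quality_weighted_score(a, qa, b, qb, match=1.0, mismatch=-1.0, gap=-1.0):
    """Global alignment score by dynamic programming.

    Assumed weighting (not from the paper): each substitution score is
    multiplied by the product of the two bases' confidences, so
    low-quality positions contribute less to the total.
    """
    n, m = len(a), len(b)
    dp = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = i * gap          # prefix of a aligned against gaps only
    for j in range(1, m + 1):
        dp[0][j] = j * gap          # prefix of b aligned against gaps only
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            weight = phred_to_confidence(qa[i - 1]) * phred_to_confidence(qb[j - 1])
            sub = match if a[i - 1] == b[j - 1] else mismatch
            dp[i][j] = max(
                dp[i - 1][j - 1] + weight * sub,  # align a[i-1] with b[j-1]
                dp[i - 1][j] + gap,               # gap in b
                dp[i][j - 1] + gap,               # gap in a
            )
    return dp[n][m]
```

With this weighting, a mismatch between two low-quality bases is penalized less than the same mismatch between two high-quality bases, which is one plausible way to let confidence influence the optimal alignment.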

A Multiple Sequence Alignment Algorithm using Clustering Divergence (콜러스터링 분기를 이용한 다중 서열 정렬 알고리즘)

  • Lee Byung-Il;Lee Jong-Yun;Jung Soon-Key
    • Journal of the Korea Society of Computer and Information / v.10 no.5 s.37 / pp.1-10 / 2005
  • Multiple sequence alignment (MSA) is a fundamental technique of DNA and protein sequence analysis. Biological sequences are aligned vertically in order to show the similarities and differences among them. In this paper, we propose an efficient group alignment method, based on clustering divergence, to perform the alignment between two groups of sequences. The proposed algorithm is a clustering divergence (CDMS)-based multiple sequence alignment and a top-down approach. The algorithm builds the tree topology for merging. It is based on the concept that the two sequences having the longest distance should be split into two clusters. We expect that our sequence alignment algorithm improves alignment quality and runs faster than the traditional algorithm Clustal-W.
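
The top-down idea described here (the two most distant sequences seed two clusters) can be sketched as follows. This is an illustration under assumptions, not the paper's CDMS algorithm: the `distance` function is a hypothetical stand-in, and assigning each remaining sequence to the nearer seed is an assumed rule that the abstract does not specify.

```python
from itertools import combinations


def distance(a, b):
    """Hypothetical distance: mismatches over the shared prefix plus the length difference."""
    return sum(x != y for x, y in zip(a, b)) + abs(len(a) - len(b))


def split_cluster(seqs):
    """Split seqs into two clusters seeded by the two most distant sequences."""
    seed1, seed2 = max(combinations(seqs, 2), key=lambda pair: distance(*pair))
    left, right = [], []
    for s in seqs:
        # Assumed rule: each sequence joins the nearer seed (ties go left).
        if distance(s, seed1) <= distance(s, seed2):
            left.append(s)
        else:
            right.append(s)
    return left, right


def build_tree(seqs):
    """Recursively split top-down; returns the tree topology as nested tuples."""
    if len(set(seqs)) == 1:
        return seqs[0]              # a single (or repeated) sequence is a leaf
    left, right = split_cluster(seqs)
    return (build_tree(left), build_tree(right))
```

The resulting nested tuples give a merge order: an aligner would align the leaves of each subtree first and then merge the two groups at each internal node.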

Implementation and Application of Multiple Local Alignment (다중 지역 정렬 알고리즘 구현 및 응용)

  • Lee, Gye Sung
    • The Journal of the Convergence on Culture Technology / v.5 no.3 / pp.339-344 / 2019
  • Global sequence alignment in search of similarity or homology favors longer sequences, because it keeps looking for further similar sections between the two sequences in the hope of adding to the score from matched parts in the rest of the sequence. If a substantial mismatched section exists in the middle of the sequence, it greatly reduces the total alignment score. In this case it is better to divide the whole sequence into multiple sections. The overall alignment score over the multiple sections would then be higher than the global alignment score. This method is called multiple local alignment. In this paper, we implement a multiple local alignment algorithm, an extension of the Smith-Waterman algorithm, and show experimental results for the algorithm, which is able to search for sub-optimal sequences.
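
For reference, the standard Smith-Waterman recurrence that the paper extends can be sketched as below. This covers only the basic single local alignment, not the paper's multiple local alignment extension; the scoring parameters and the linear gap penalty are illustrative assumptions.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-1):
    """Return (best_score, end_i, end_j) of the best local alignment of a and b.

    end_i and end_j are 1-based positions in a and b where the best
    local alignment ends; (0, 0, 0) means no positive-scoring alignment.
    """
    n, m = len(a), len(b)
    h = [[0] * (m + 1) for _ in range(n + 1)]
    best = (0, 0, 0)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            # Clamping at 0 lets an alignment restart after a poor region,
            # which is what makes the alignment local rather than global.
            h[i][j] = max(
                0,
                h[i - 1][j - 1] + sub,
                h[i - 1][j] + gap,
                h[i][j - 1] + gap,
            )
            if h[i][j] > best[0]:
                best = (h[i][j], i, j)
    return best
```

Because scores never drop below zero, a long mismatched stretch in the middle of a sequence does not drag down the score of a well-matched section elsewhere, which is the behavior the abstract contrasts with global alignment.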

Development of an efficient sequence alignment algorithm and sequence analysis software

  • Kim, Jin;Hwang, Jae-Joon;Kim, Dong-Hoi;Saangyong Uhmn
    • Proceedings of the Korean Institute of Intelligent Systems Conference / 2003.09a / pp.264-267 / 2003
  • Multiple sequence alignment is a useful tool to identify the relationships among protein sequences. Dynamic programming is the most widely used algorithm to obtain multiple sequence alignment with optimal cost. However, dynamic programming cannot be applied to certain cost functions due to its drawback and cannot be used to produce optimal multiple sequence alignment. We proposed a sub-alignment refinement algorithm to overcome this problem of dynamic programming and implemented the algorithm as a module of our MS Windows-based sequence alignment program.

A DNA Sequence Alignment Algorithm Using Quality Information and a Fuzzy Inference Method (품질 정보와 퍼지 추론 기법을 이용한 DNA 염기 서열 배치 알고리즘)

  • Kim, Kwang-Baek
    • Journal of Intelligence and Information Systems / v.13 no.2 / pp.55-68 / 2007
  • DNA sequence alignment algorithms in computational molecular biology have been improved by diverse methods. In this paper, we propose a DNA sequence alignment algorithm that uses quality information and a fuzzy inference method, drawing on characteristics of DNA sequence fragments and a fuzzy logic system, in order to improve conventional DNA sequence alignment methods that use DNA sequence quality information. In conventional algorithms, DNA sequence alignment scores are calculated by the global sequence alignment algorithm proposed by Needleman and Wunsch, applying the quality information of each DNA fragment. However, errors may occur when calculating DNA sequence alignment scores if the quality of the DNA fragment tips is low, because the overall DNA sequence quality information is used. In the proposed method, exact DNA sequence alignment can be achieved in spite of low-quality DNA fragment tips by improving the conventional algorithms that use quality information. In addition, the mapping score parameters used to calculate DNA sequence alignment scores are dynamically adjusted by the fuzzy logic system, using the lengths of the DNA fragments and the frequencies of low-quality DNA bases in the fragments. In experiments on real genome data from NCBI (National Center for Biotechnology Information), the proposed method was more efficient than conventional algorithms using quality information in DNA sequence alignment.

An Efficient Method for Multiple Sequence Alignment using Subalignment Refinement (부분서열정렬 개선 기법을 사용한 효율적인 복수서열정렬에 관한 알고리즘)

  • Kim, Jin;Jung, Woo-Cheol;Uhmn, Saang-Yong
    • Journal of KIISE:Software and Applications / v.30 no.9 / pp.803-811 / 2003
  • Multiple sequence alignment is a useful tool to identify the relationships among protein sequences. Dynamic programming is the most widely used algorithm to obtain multiple sequence alignment with optimal cost. However, dynamic programming cannot be applied to certain cost functions due to its drawback and cannot be used to produce optimal multiple sequence alignment. We propose a sub-alignment refinement algorithm to overcome this problem of dynamic programming. We also show that the proposed algorithm can solve the problem of dynamic programming efficiently.

Multiple Sequence Alignment Genetic Algorithm (진화 알고리즘을 사용한 복수 염기서열 정렬)

  • Kim, Jin;Song, Min-Dong;Choi, Hong-Sik;Chang, Yeon-Ah
    • Korean Journal of Microbiology / v.35 no.2 / pp.115-120 / 1999
  • Multiple sequence alignment of DNA and protein sequences is an important tool in the study of molecular evolution, gene regulation, and protein structure-function relationships. The progressive pairwise alignment method generates multiple sequence alignments quickly but not necessarily with optimal costs. Dynamic programming generates multiple sequence alignments with optimal costs in most cases but has a long execution time. In this paper, we suggest a genetic algorithm to improve the multiple sequence alignment generated by the current methods, describe the design of the genetic algorithm, and compare the multiple sequence alignments from our method and current methods.

A new method to predict the protein sequence alignment quality (단백질 서열정렬 정확도 예측을 위한 새로운 방법)

  • Lee, Min-Ho;Jeong, Chan-Seok;Kim, Dong-Seop
    • Bioinformatics and Biosystems / v.1 no.1 / pp.82-87 / 2006
  • The most popular protein structure prediction method is comparative modeling. To guarantee accurate comparative modeling, the sequence alignment between a query protein and a template should be accurate. Although choosing the best template based on the protein sequence alignments is critical for accurate fold recognition in comparative modeling, the sequence alignment quality is even more critical. In contrast to the considerable attention paid to developing methods for choosing the best template, prediction of alignment accuracy has not gained much interest. Here, we develop a method for prediction of the shift score, a recently proposed measure for alignment quality. We apply support vector regression (SVR) to predict the shift score. The alignment between a query protein and a template protein of length n in our own library is transformed into an input vector of length n + 2. Structural alignments are assumed to be the best alignments, and SVR is trained to predict the shift score between the structural alignment and the profile-profile alignment of a query protein to a template protein. The performance is assessed by the Pearson correlation coefficient. The trained SVR predicts the shift score with a correlation of 0.80 between observed and predicted shift scores.
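
The evaluation metric named here, the Pearson correlation coefficient between observed and predicted values, can be computed as in the following minimal sketch. It illustrates the metric only; the function name is hypothetical, and no data or model from the paper is reproduced.

```python
import math


def pearson_r(observed, predicted):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(observed) != len(predicted) or len(observed) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_o = sum(observed) / len(observed)
    mean_p = sum(predicted) / len(predicted)
    # Sums of centered products; the 1/n factors cancel in the ratio.
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    if var_o == 0 or var_p == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_o * var_p)
```

A value near 1 means the predicted scores rise and fall with the observed ones; the coefficient measures linear association and does not by itself show that the predictions are close in absolute value.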

A Python-based educational software tool for visualizing bioinformatics alignment algorithms

  • Elis Khatizah;Hee-Jo Nam;Hyun-Seok Park
    • Genomics & Informatics / v.21 no.1 / pp.15.1-15.4 / 2023
  • Bioinformatics education can be defined as the teaching and learning of how to use software tools, along with mathematical and statistical analysis, to solve biological problems. Although many resources are available, most students still struggle to understand even the simplest sequence alignment algorithms. Applying visualizations to these topics benefits both lecturers and students. Unfortunately, educational software for visualizing step-by-step processes in the user experience of sequence alignment algorithms is rare. In this article, an educational visualization tool for biological sequence alignment is presented, and the source code is released in order to encourage the collaborative power of open-source software, with the expectation of further contributions from the community in the future. Two different modules are integrated to enable a student to investigate the characteristics of alignment algorithms.

The Sequence Labeling Approach for Text Alignment of Plagiarism Detection

  • Kong, Leilei;Han, Zhongyuan;Qi, Haoliang
    • KSII Transactions on Internet and Information Systems (TIIS) / v.13 no.9 / pp.4814-4832 / 2019
  • Plagiarism detection is increasingly exploiting text alignment. Text alignment involves extracting the plagiarized passages from a pair consisting of a suspicious document and its source document. Heuristics have achieved excellent performance in text alignment. However, further improvement of the heuristic methods depends mainly on the experience of experts, which leaves the heuristics without the capacity for continuous improvement. To address this problem, machine learning may be a suitable approach. Considering the position relations and the context of text segment pairs, we formalize the text alignment task as a sequence labeling problem, improving the current methods at the model level. In particular, this paper proposes to use a probabilistic graphical model to tag the observed sequence of pairs of text segments. Hence we present the sequence labeling approach for text alignment in plagiarism detection based on Conditional Random Fields. The proposed approach is evaluated on the PAN@CLEF 2012 artificial high obfuscation plagiarism corpus and the simulated paraphrase plagiarism corpus, and compared with the methods that achieved the best performance in PAN@CLEF 2012, 2013 and 2014. Experimental results demonstrate that the proposed approach significantly outperforms the state-of-the-art methods.