Sequence Alignment Algorithm using Quality Information

Korean title: 품질 정보를 이용한 서열 배치 알고리즘

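  • 나중채 (School of Electrical Engineering and Computer Science, Seoul National University)
  • 노강호 (School of Electrical Engineering and Computer Science, Seoul National University)
  • 박근수 (School of Electrical Engineering and Computer Science, Seoul National University)
  • Published: 2005.12.01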

Abstract

In this paper we consider the problem of sequence alignment with quality scores. DNA sequences produced by a base-calling program (as part of sequencing) carry quality scores that represent the confidence level of each individual base. However, previous sequence alignment algorithms do not take these quality scores into account. To address this, we propose a measure for evaluating an alignment of two sequences with quality scores. We show that an optimal alignment under this measure can be found by dynamic programming.
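
The abstract does not spell out the proposed measure, so the following is only a minimal sketch of how quality scores might enter a dynamic-programming alignment, not the authors' method. It assumes Phred-style quality values (error probability 10^(-Q/10)), a global Needleman-Wunsch recurrence, and a hypothetical weighting in which each match or mismatch score is scaled by the probability that both bases were called correctly. The function names and the default score parameters are illustrative assumptions.

```python
import math
from typing import List, Tuple


def phred_to_confidence(q: int) -> float:
    """Probability that a base with Phred quality q was called correctly."""
    return 1.0 - 10.0 ** (-q / 10.0)


def quality_aware_align(
    a: str, qa: List[int], b: str, qb: List[int],
    match: float = 1.0, mismatch: float = -1.0, gap: float = -1.0,
) -> Tuple[float, str, str]:
    """Global alignment where each column score is scaled by base confidence.

    ASSUMPTION: the weight (confidence of a[i] * confidence of b[j]) is an
    illustrative choice, not the measure defined in the paper.
    """
    n, m = len(a), len(b)

    def column_score(i: int, j: int) -> float:
        # Score for aligning a[i-1] with b[j-1], weighted by joint confidence.
        w = phred_to_confidence(qa[i - 1]) * phred_to_confidence(qb[j - 1])
        return w * (match if a[i - 1] == b[j - 1] else mismatch)

    # score[i][j] = best score of aligning a[:i] with b[:j]
    score = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + column_score(i, j),  # align two bases
                score[i - 1][j] + gap,                     # gap in b
                score[i][j - 1] + gap,                     # gap in a
            )

    # Trace back from the bottom-right cell to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and math.isclose(
            score[i][j], score[i - 1][j - 1] + column_score(i, j), abs_tol=1e-9
        ):
            out_a.append(a[i - 1]); out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and math.isclose(
            score[i][j], score[i - 1][j] + gap, abs_tol=1e-9
        ):
            out_a.append(a[i - 1]); out_b.append("-")
            i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))
```

Under this assumed weighting, a mismatch involving a low-quality base is penalized less than one between two high-confidence bases, which is the kind of behavior a quality-aware measure is meant to capture. The table takes O(nm) time and space for sequences of lengths n and m, as in standard global alignment.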
