Haplotype Assembly from Weighted SNP Fragments and Related Genotype Information

Haplotype Assembly from SNP Fragments with Confidence Values and Genotypes (translated from the Korean title)

  • Published : 2008.12.15

Abstract

The Minimum Letter Flips (MLF) model and the Weighted Minimum Letter Flips (WMLF) model are two models for the haplotype assembly problem. However, both are effective only when the error rate in the SNP fragments is low. In this paper, we first establish a new computational model that improves on the WMLF model by incorporating the related genotype information, and we show that the resulting problem is NP-hard. We then propose an efficient genetic algorithm to solve the haplotype assembly problem. Experiments on a random data set and a real data set indicate that adding genotype information to the WMLF model is effective in improving the reconstruction rate, especially when the error rate in the SNP fragments is high. The results also show that genotype information increases the convergence speed of the genetic algorithm.

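The Minimum Letter Flips (MLF) model and the Weighted Minimum Letter Flips (WMLF) model are models for solving the haplotype assembly problem. However, the MLF and WMLF models are effective only when the SNP (Single Nucleotide Polymorphism) fragments contain few missing values and errors. To improve the WMLF model, this paper presents the WMLF/GI model and its associated problem, which add genotype information. We prove that the newly presented problem is NP-hard and design a genetic algorithm to solve it accurately and efficiently. The experimental results show that the new model achieves higher accuracy than the existing models even when the SNP fragments contain many missing values and errors, and that genotype information greatly improves the convergence speed of the genetic algorithm.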
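As an illustration of the kind of objective the abstract refers to, the following is a minimal sketch, not the authors' implementation. It assumes the commonly used formulation in which each fragment entry is 0, 1, or a gap, each entry carries a confidence weight, and the cost of a candidate haplotype pair is the total weight of the entries that must be flipped so that every fragment agrees with one of the two haplotypes. The genotype encoding (0 and 1 for homozygous sites, 2 for heterozygous sites) and all function names are assumptions made for this example.

```python
from typing import Optional, Sequence

# A fragment entry is 0, 1, or None (gap: the site is not covered).
Fragment = Sequence[Optional[int]]


def fragment_cost(frag: Fragment, weights: Sequence[float],
                  hap: Sequence[int]) -> float:
    """Total weight of the covered sites where the fragment disagrees with hap."""
    return sum(w for a, w, h in zip(frag, weights, hap)
               if a is not None and a != h)


def wmlf_cost(fragments: Sequence[Fragment],
              weights: Sequence[Sequence[float]],
              h1: Sequence[int], h2: Sequence[int]) -> float:
    """Weighted flip cost: each fragment is charged against its closer haplotype."""
    return sum(min(fragment_cost(f, w, h1), fragment_cost(f, w, h2))
               for f, w in zip(fragments, weights))


def compatible_with_genotype(h1: Sequence[int], h2: Sequence[int],
                             genotype: Sequence[int]) -> bool:
    """Assumed encoding: 0 or 1 = homozygous for that allele, 2 = heterozygous."""
    for a, b, g in zip(h1, h2, genotype):
        if g == 2:
            if a == b:            # heterozygous site needs two different alleles
                return False
        elif a != g or b != g:    # homozygous site needs both alleles equal to g
            return False
    return True


# Hypothetical toy instance: three fragments over three SNP sites.
fragments = [[0, 1, None], [1, 0, 1], [0, 1, 1]]
weights = [[0.9, 0.8, 0.0], [0.9, 0.9, 0.9], [0.9, 0.9, 0.3]]
h1, h2 = [0, 1, 0], [1, 0, 1]
genotype = [2, 2, 2]

cost = wmlf_cost(fragments, weights, h1, h2)
feasible = compatible_with_genotype(h1, h2, genotype)
```

In a genetic algorithm, a function like `wmlf_cost` could serve as the fitness to minimize, while a check like `compatible_with_genotype` restricts the search to haplotype pairs that agree with the genotype. Shrinking the search space in this way is one plausible reading of why genotype information would speed up convergence.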
