
FASIM: Fragments Assembly Simulation using Biased-Sampling Model and Assembly Simulation for Microbial Genome Shotgun Sequencing  

Hur Cheol-Goo (Division of Genomics and Proteomics, Korea Research Institute of Bioscience and Biotechnology (KRIBB), Bioinformatics Cooperative Course, Pusan National University)
Kim Sunny (Division of Genomics and Proteomics, Korea Research Institute of Bioscience and Biotechnology (KRIBB))
Kim Chang-Hoon (Division of Genomics and Proteomics, Korea Research Institute of Bioscience and Biotechnology (KRIBB))
Yoon Sung-Ho (Division of Genomics and Proteomics, Korea Research Institute of Bioscience and Biotechnology (KRIBB))
In Yong-Ho (Bioinfomatix, Inc.)
Kim Cheol-Min (Bioinformatics Cooperative Course, Pusan National University)
Cho Hwan-Gue (Bioinformatics Cooperative Course, Pusan National University)
Publication Information
Journal of Microbiology and Biotechnology / v.16, no.5, 2006, pp. 683-688
Abstract
We have developed a program that generates shotgun data sets from known genome sequences. Synthetic data sets generated by computer are a useful alternative to real data, to which students and researchers have limited access. The uniformly distributed clone sampling adopted by previous programs cannot account for the real situation, in which sampled reads tend to come from particular regions of the target genome. To reflect this situation, a probabilistic model of the biased sampling distribution was developed using an experimental data set derived from a microbial genome project. Among the experimental parameters tested (varied fragment or read lengths, chimerism, and sequencing error), the extent of sequencing error was the most critical factor hampering sequence assembly. We propose that an optimum sequencing strategy employing different insert lengths and redundancy can be established by performing a variety of simulations.
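To make the contrast between uniform and biased sampling concrete, the following is a minimal Python sketch of shotgun read generation from a known sequence. It is not FASIM's implementation: the function names (`simulate_reads`, `add_errors`, `coverage`), the per-position weight vector standing in for the biased-sampling model, and the substitution-only error model are all assumptions made for illustration. Chimerism and variable insert lengths are left out.

```python
import random

BASES = "ACGT"

def add_errors(read, error_rate, rng):
    """Substitute each base with a different base with probability error_rate.
    (Assumed error model: substitutions only, no insertions or deletions.)"""
    out = []
    for base in read:
        if rng.random() < error_rate:
            out.append(rng.choice([b for b in BASES if b != base]))
        else:
            out.append(base)
    return "".join(out)

def simulate_reads(genome, n_reads, read_len, weights=None, error_rate=0.0, seed=0):
    """Return a list of (start, read) pairs sampled from genome.

    weights: optional list with one non-negative weight per valid start
    position (a hypothetical stand-in for a biased-sampling model).
    None means every start position is equally likely.
    """
    rng = random.Random(seed)
    starts = range(len(genome) - read_len + 1)
    chosen = rng.choices(starts, weights=weights, k=n_reads)
    return [(s, add_errors(genome[s:s + read_len], error_rate, rng)) for s in chosen]

def coverage(genome_len, reads, read_len):
    """Per-position read depth for a list of (start, read) pairs."""
    depth = [0] * genome_len
    for start, _ in reads:
        for i in range(start, start + read_len):
            depth[i] += 1
    return depth

if __name__ == "__main__":
    gen = random.Random(1)
    genome = "".join(gen.choice(BASES) for _ in range(2000))
    n_starts = len(genome) - 50 + 1
    # Assumed bias: start positions in the first quarter are 5x more likely.
    biased = [5.0 if i < n_starts // 4 else 1.0 for i in range(n_starts)]
    uniform_reads = simulate_reads(genome, 400, 50)
    biased_reads = simulate_reads(genome, 400, 50, weights=biased, error_rate=0.01)
    print("uncovered bases (uniform):", coverage(len(genome), uniform_reads, 50).count(0))
    print("uncovered bases (biased): ", coverage(len(genome), biased_reads, 50).count(0))
```

Comparing the number of uncovered positions under the two weightings at the same redundancy gives a rough feel for why biased sampling leaves more gaps than a uniform model predicts. The reads could then be passed to an assembler to study the effect of read length or error rate.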
Keywords
Fragments assembly simulation; sampling model; genome; shotgun sequencing