misMM: An Integrated Pipeline for Misassembly Detection Using Genotyping-by-Sequencing and Its Validation with BAC End Library Sequences and Gene Synteny

  • Ko, Young-Joon (Department of Bioinformatics and Life Science, Soongsil University) ;
  • Kim, Jung Sun (Genomics Division, Department of Agricultural Biotechnology, National Institute of Agricultural Sciences, Rural Development Administration) ;
  • Kim, Sangsoo (Department of Bioinformatics and Life Science, Soongsil University)
  • Received : 2017.10.24
  • Accepted : 2017.11.02
  • Published : 2017.12.31

Abstract

As next-generation sequencing technologies have advanced, enormous amounts of whole-genome sequence data have been released for a wide range of species. However, assembling a whole genome precisely remains difficult because of the inherent limitations of short-read sequencing. Plant genomes are particularly hard to assemble compared with those of microorganisms or animals, owing to features such as whole-genome duplications, repeat insertions, and insertions of nuclear mitochondrial DNA segments (Numts). In this study, we describe a new method for detecting misassembled regions of the Brassica rapa genome using genotyping-by-sequencing followed by MadMapper clustering. The candidate misassembled regions were cross-checked against paired-end sequences from a BAC clone library that had been mapped to the reference genome. The results were further verified using gene synteny relationships between Brassica rapa and Arabidopsis thaliana. We conclude that this method will help detect misassembled regions and should be applicable to incompletely assembled reference genomes from a variety of species.
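The general idea of combining linkage-based clustering with BAC end evidence can be illustrated with a minimal Python sketch. This is not the misMM implementation and does not call MadMapper; it assumes that each marker already carries a linkage-group label from some clustering step, and it uses a simple majority vote as a stand-in rule. The names `Marker`, `BacEndPair`, `flag_discordant_markers`, `is_discordant_pair`, and `supported_candidates`, as well as the 300 kb and 100 kb thresholds, are hypothetical and chosen only for illustration.

```python
from collections import Counter, defaultdict
from typing import NamedTuple


class Marker(NamedTuple):
    name: str
    chrom: str          # chromosome assigned by the reference assembly
    pos: int            # position on that chromosome (bp)
    linkage_group: int  # cluster label from a genetic-linkage clustering step


class BacEndPair(NamedTuple):
    clone: str
    chrom1: str
    pos1: int
    chrom2: str
    pos2: int


def flag_discordant_markers(markers):
    """Return markers whose assembled chromosome differs from the most
    common chromosome in their linkage group (assumed majority-vote rule)."""
    by_group = defaultdict(list)
    for m in markers:
        by_group[m.linkage_group].append(m)

    flagged = []
    for group in by_group.values():
        majority_chrom, _ = Counter(m.chrom for m in group).most_common(1)[0]
        flagged.extend(m for m in group if m.chrom != majority_chrom)
    return sorted(flagged, key=lambda m: (m.chrom, m.pos))


def is_discordant_pair(pair, max_span=300_000):
    """A BAC end pair is treated as discordant if its two ends map to
    different chromosomes or lie farther apart than max_span (hypothetical)."""
    if pair.chrom1 != pair.chrom2:
        return True
    return abs(pair.pos2 - pair.pos1) > max_span


def supported_candidates(flagged, pairs, window=100_000):
    """Keep flagged markers that have at least one discordant BAC end
    mapped within `window` bp on the same chromosome (hypothetical window)."""
    ends = []
    for p in pairs:
        if is_discordant_pair(p):
            ends.append((p.chrom1, p.pos1))
            ends.append((p.chrom2, p.pos2))
    return [
        m for m in flagged
        if any(c == m.chrom and abs(e - m.pos) <= window for c, e in ends)
    ]


# Toy data, invented purely for illustration.
markers = [
    Marker("m1", "A01", 1_000, 1),
    Marker("m2", "A01", 50_000, 1),
    Marker("m3", "A02", 200_000, 1),   # clusters with A01 markers
    Marker("m4", "A02", 900_000, 2),
    Marker("m5", "A02", 950_000, 2),
]
pairs = [
    BacEndPair("b1", "A02", 210_000, "A01", 60_000),     # ends on two chromosomes
    BacEndPair("b2", "A02", 900_000, "A02", 1_020_000),  # plausible span
]

flagged = flag_discordant_markers(markers)
supported = supported_candidates(flagged, pairs)
print([m.name for m in supported])  # prints ['m3']
```

In this toy example, marker `m3` sits on A02 in the assembly but clusters with A01 markers, and a nearby BAC clone has its two ends on different chromosomes, so both lines of evidence point to the same candidate region. A real analysis would also need to handle ties in the majority vote, mapping quality, and the actual insert-size distribution of the BAC library.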
