Comparison of Normalization Methods for Defining Copy Number Variation Using Whole-genome SNP Genotyping Data

  • Kim, Ji-Hong (Integrated Research Center for Genome Polymorphism) ;
  • Yim, Seon-Hee (Department of Microbiology, The Catholic University of Korea, College of Medicine) ;
  • Jeong, Yong-Bok (Integrated Research Center for Genome Polymorphism) ;
  • Jung, Seong-Hyun (Integrated Research Center for Genome Polymorphism) ;
  • Xu, Hai-Dong (Integrated Research Center for Genome Polymorphism) ;
  • Shin, Seung-Hun (Integrated Research Center for Genome Polymorphism) ;
  • Chung, Yeun-Jun (Integrated Research Center for Genome Polymorphism)
  • Published : 2008.12.31

Abstract

Precise and reliable identification of copy number variation (CNV) remains important for fully understanding the effect of CNV on genetic diversity and on the background of complex diseases. SNP markers have been used frequently to detect CNVs, but the analysis of SNP chip data for identifying CNVs has not been well established. We compared various normalization methods for CNV analysis and suggest an optimal normalization procedure for reliable CNV calling. Samples from four normal Koreans and the NA10851 HapMap male were genotyped using the Affymetrix Genome-Wide Human SNP Array 5.0. We evaluated the effect of median and quantile normalization to find the optimal normalization for CNV detection based on SNP array data. We also explored the effect of Robust Multichip Average (RMA) background correction on each normalization process. In total, the following 4 combinations of normalization were tried: 1) median normalization without RMA background correction, 2) quantile normalization without RMA background correction, 3) median normalization with RMA background correction, and 4) quantile normalization with RMA background correction. CNVs were called using the SW-ARRAY algorithm. We compared the effects of the 4 combinations using intensity ratio profiles, box plots, and MA plots. Without RMA background correction, median and quantile normalization showed similar normalization effects, and the final CNV calls were also similar in number and size. With both median and quantile normalization, RMA background correction widened the range of the intensity ratio distribution, which suggests that RMA background correction may help to detect more CNVs than no correction.
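
The two normalization schemes compared in the abstract, and the quantities shown in an MA plot, can be illustrated with a minimal Python sketch. This is not the authors' pipeline: it assumes log2 probe intensities arranged as a probes × samples NumPy array, the function names are hypothetical, and RMA background correction and SW-ARRAY calling are not shown.

```python
import numpy as np


def median_normalize(log_intensity):
    """Shift each sample (column) so that all samples share one median."""
    col_medians = np.median(log_intensity, axis=0)
    target = np.median(col_medians)  # common reference level (an assumption)
    return log_intensity - col_medians + target


def quantile_normalize(log_intensity):
    """Give every sample (column) the same empirical distribution.

    Ties within a column are broken by sort order rather than averaged.
    """
    order = np.argsort(log_intensity, axis=0)
    sorted_vals = np.take_along_axis(log_intensity, order, axis=0)
    mean_quantiles = sorted_vals.mean(axis=1)  # average of each rank across samples
    normalized = np.empty_like(log_intensity, dtype=float)
    np.put_along_axis(normalized, order, mean_quantiles[:, None], axis=0)
    return normalized


def ma_values(test_log, ref_log):
    """Return M (log ratio) and A (mean log intensity) for an MA plot."""
    m = test_log - ref_log
    a = 0.5 * (test_log + ref_log)
    return m, a


# Toy data: 4 probes x 3 samples of log2 intensities (made-up numbers).
toy = np.array([[5.0, 4.0, 3.0],
                [2.0, 1.0, 4.0],
                [3.0, 4.0, 6.0],
                [4.0, 2.0, 8.0]])
toy_median = median_normalize(toy)
toy_quantile = quantile_normalize(toy)
m, a = ma_values(toy_quantile[:, 0], toy_quantile[:, 1])
```

Median normalization only aligns the centers of the samples, whereas quantile normalization also forces their spreads and shapes to match; comparing the range of `m` across schemes mirrors the intensity ratio comparison described in the abstract.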
