Rank-Based Nonlinear Normalization of Oligonucleotide Arrays

  • Park, Peter J. (Children's Hospital Informatics Program, Children's Hospital, Harvard Medical School) ;
  • Kohane, Isaac S. (Children's Hospital Informatics Program, Children's Hospital, Harvard Medical School) ;
  • Kim, Ju Han (SNUBI: Seoul National University Biomedical Informatics, Seoul National University College of Medicine)
  • Published : 2003.12.01

Abstract

Motivation: Many investigators have observed a nonlinear relationship between signal intensity and transcript abundance in microarray data. The first step in analyzing such data is to normalize them properly, and this step should include a correction for the nonlinearity. The commonly used linear normalization schemes do not address this problem.

Results: Nonlinearity is present in both cDNA and oligonucleotide arrays, but we concentrate on the latter in this paper. Across a set of chips, we identify those genes whose within-chip ranks are relatively constant compared to other genes of similar intensity. As our statistic, we compute for each gene the sum of the squared differences in its within-chip ranks between every pair of chips, and we select a small fraction of the genes with the smallest changes in rank at each intensity level. These genes are the most likely to be non-differentially expressed and are subsequently used in the normalization procedure. This method generalizes the rank-invariant normalization (Li and Wong, 2001): it uses all available chips rather than two at a time to gather more information, and it takes as the reference the chip that is least likely to be affected by nonlinear effects. The method assumes that at least a small number of genes are non-differentially expressed across the intensity range. The normalized expression values can differ substantially from the unnormalized values and may alter downstream analysis.
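
The selection step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the use of mean intensity across chips to define an "intensity level", the equal-size binning, and the default values of `n_bins` and `fraction` are all assumptions made for illustration. The sketch covers only the rank statistic and the per-level selection, not the choice of reference chip or the subsequent normalization curve.

```python
from itertools import combinations

import numpy as np


def rank_change_statistic(expr):
    """Per-gene sum, over all chip pairs, of squared within-chip rank differences.

    expr: array of shape (n_genes, n_chips) holding signal intensities.
    """
    # Within-chip ranks (0 = lowest intensity); ties are broken by row order.
    ranks = expr.argsort(axis=0, kind="stable").argsort(axis=0, kind="stable")
    ranks = ranks.astype(float)
    stat = np.zeros(expr.shape[0])
    for j, k in combinations(range(expr.shape[1]), 2):
        stat += (ranks[:, j] - ranks[:, k]) ** 2
    return stat


def select_rank_stable_genes(expr, n_bins=10, fraction=0.1):
    """Return indices of the most rank-stable genes within each intensity bin.

    n_bins and fraction are illustrative defaults, not values from the paper.
    """
    stat = rank_change_statistic(expr)
    # Assumption: a gene's intensity level is its mean intensity across chips.
    order = np.argsort(expr.mean(axis=1), kind="stable")
    selected = []
    for bin_idx in np.array_split(order, n_bins):
        if len(bin_idx) == 0:
            continue
        # Keep at least one gene per non-empty bin.
        n_keep = max(1, int(round(fraction * len(bin_idx))))
        keep = bin_idx[np.argsort(stat[bin_idx], kind="stable")[:n_keep]]
        selected.extend(keep.tolist())
    return np.array(sorted(selected))
```

Calling `select_rank_stable_genes(expr)` on a genes-by-chips intensity matrix returns the row indices of the candidate non-differentially expressed genes; a normalization curve against a reference chip would then be fitted through those genes only.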

References

  1. Alizadeh, A. A., Eisen, M. B., Davis, R. E., Ma, C., Lossos, I. S., Rosenwald, A., Boldrick, J. G., Sabet, H., Tran, T., Yu, X. et al. (2000). Distinct types of diffuse large B-cell lymphoma identified by gene expression profiling. Nature 403, 503-511 https://doi.org/10.1038/35000501
  2. Cho, R. J., Campbell, M. J., Winzeler, E. A. et al. (1998). A genome-wide transcriptional analysis of the mitotic cell cycle. Molecular Cell 2, 65-73 https://doi.org/10.1016/S1097-2765(00)80114-8
  3. Chudin, E., Walker, R., Kosaka, A., Wu, S. X., Rabert, D., Chang, T. K., and Kreder, D. E. (2001). The relationship between signal intensities and transcript concentration for Affymetrix GeneChips. Genome Biology 3, research0005.1-0005.10
  4. Cleveland, W. S. (1979). Robust locally weighted regression and smoothing scatterplots. Journal of the American Statistical Association 74, 829-836 https://doi.org/10.2307/2286407
  5. Collins, F. S. (1999). Microarrays and macroconsequences. Nature Genetics 21 (Supp), 2 https://doi.org/10.1038/4425
  6. DeRisi, J. L., Iyer, V. R., and Brown, P. O. (1997). Exploring the metabolic and genetic control of gene expression on a genomic scale. Science 278, 680-686 https://doi.org/10.1126/science.278.5338.680
  7. Golub, T. R., Slonim, D. K., Tamayo, P., Huard, C., Gaasenbeek, M., Mesirov, J. P., Coller, H., Loh, M. L., Downing, J. R., Caligiuri, M. A., Bloomfield, C. D., and Lander, E. S. (1999). Molecular classification of cancer: Class discovery and class prediction by gene expression monitoring. Science 286, 531-537 https://doi.org/10.1126/science.286.5439.531
  8. Hartemink, A. J., Gifford, D. K., Jaakkola, T. S., and Young, R. A. (2001). Maximum likelihood estimation of optimal scaling factors for expression array normalization. In SPIE BiOS 2001
  9. Hill, A. A., Brown, E. L., Whitley, M. Z., Tucker-Kellogg, G., Hunter, C. P., and Slonim, D. K. (2001). Evaluation of normalization procedures for oligonucleotide array data based on spiked cRNA controls. Genome Biology 2, research0055.1-0055.13
  10. Hoffmann, R., Seidl, T., and Dugas, M. (2002). Profound effect of normalization on detection of differentially expressed genes in oligonucleotide microarray data analysis. Genome Biology 3, research0033.1-0033.11
  11. Kepler, T. B., Crosby, L., and Morgan, K. T. (2002). Normalization and analysis of DNA microarray data by self-consistency and local regression. Genome Biology 3, research0037.1-0037.12
  12. Kroll, T. C. and Wolfl, S. (2002). Ranking: a closer look on globalisation methods for normalisation of gene expression arrays. Nucleic Acids Research 30, e50-e55 https://doi.org/10.1093/nar/30.11.e50
  13. Li, C. and Wong, W. H. (2001). Model-based analysis of oligonucleotide arrays: model validation, design issues and standard error application. Genome Biology 2, research0032.1-0032.11
  14. Lockhart, D. J., Dong, H., Byrne, M. C., Follettie, M. T., Gallo, M. V., Chee, M. S., Mittmann, M., Wang, C., Kobayashi, M., Horton, H., and Brown, E. L. (1996). DNA expression monitoring by hybridization of high density oligonucleotide arrays. Nature Biotechnology 14, 1675-1680 https://doi.org/10.1038/nbt1296-1675
  15. Quackenbush, J. (2002). Microarray data normalization and transformation. Nature Genetics (Suppl), 496-501
  16. Ramdas, L., Coombes, K. R., Baggerly, K., Abruzzo, L., Highsmith, W. E., Krogmann, T., Hamilton, S. R., and Zhang, W. (2001). Sources of nonlinearity in cDNA microarray expression measurements. Genome Biology 2, research0047.1-0047.7
  17. Shmulevich, I. and Zhang, W. (2002). Binary analysis and optimization-based normalization of gene expression data. Bioinformatics 18, 555-565 https://doi.org/10.1093/bioinformatics/18.4.555
  18. Tseng, G. C., Oh, M., Rohlin, L., Liao, J. C., and Wong, W. H. (2001). Issues in cDNA microarray analysis: quality filtering, channel normalization, models of variation and assessment of gene effects. Nucleic Acids Research 29, 2549-2557
  19. Workman, C., Jensen, L. J., Jarmer, H., Berka, R., Gautier, L., Nielsen, H. B., Saxild, H. H., Nielsen, C., Brunak, S., and Knudsen, S. (2002). A new non-linear normalization method for reducing variability in DNA microarray experiments. Genome Biology 3, research0048.1-0048.16
  20. Yang, Y. H., Dudoit, S., Luu, P., Lin, D. M., Peng, V., Ngai, J., and Speed, T. P. (2002). Normalization for cDNA microarray data: a robust composite method addressing single and multiple slide systematic variation. Nucleic Acids Research 30, e15 https://doi.org/10.1093/nar/30.4.e15
  21. Yang, Y. H., Dudoit, S., Luu, P., and Speed, T. P. (2001). Normalization for cDNA microarray data. Technical Report 589, Statistics Dept, UC Berkeley