http://dx.doi.org/10.22937/IJCSNS.2021.21.3.4

Dimensionality Reduction of RNA-Seq Data  

Al-Turaiki, Isra (College of Computer and Information Sciences, Information Technology Department King Saud University)
Publication Information
International Journal of Computer Science & Network Security / v.21, no.3, 2021, pp. 31-36
Abstract
RNA sequencing (RNA-Seq) is a technology that facilitates transcriptome analysis using next-generation sequencing (NGS) tools. Information on the quantity and sequences of RNA is vital for relating our genomes to functional protein expression. RNA-Seq data are high-dimensional: the number of variables (i.e., transcripts) far exceeds the number of observations (e.g., experiments). Given the wide range of dimensionality reduction techniques, it is not clear which is best for RNA-Seq data analysis. In this paper, we study the effect of three dimensionality reduction techniques on the classification of an RNA-Seq dataset. In particular, we use principal component analysis (PCA), singular value decomposition (SVD), and self-organizing maps (SOM) to obtain a reduced feature space. We built nine classification models for a cancer dataset and compared their performance. Our experimental results indicate that better classification performance is obtained with PCA and SOM than with SVD. Overall, the combinations PCA+KNN (k-nearest neighbors), SOM+RF (random forest), and SOM+KNN give the most favorable results.
Keywords
Principal Component Analysis (PCA); Singular Value Decomposition (SVD); Self-Organizing Maps (SOM); RNA-Seq; Dimensionality Reduction;
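
The kind of pipeline the abstract describes (reduce a feature space with far more variables than observations, then classify in the reduced space) can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: the data are synthetic rather than the cancer dataset, the number of components and the value of k are arbitrary, and the helper names `pca_fit`, `pca_transform`, and `knn_predict` are hypothetical. PCA is computed here from the SVD of the centered training matrix, and only the PCA+KNN combination is shown.

```python
import numpy as np

def pca_fit(X, n_components):
    """Fit PCA on X (samples x features); return the mean and principal axes."""
    mean = X.mean(axis=0)
    # Economy SVD of the centered data: rows of Vt are the principal axes.
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, Vt[:n_components]

def pca_transform(X, mean, components):
    """Project X onto the principal axes learned from the training data."""
    return (X - mean) @ components.T

def knn_predict(X_train, y_train, X_test, k=3):
    """Classify each test row by majority vote among its k nearest training rows."""
    dists = np.linalg.norm(X_test[:, None, :] - X_train[None, :, :], axis=2)
    nearest = np.argsort(dists, axis=1)[:, :k]
    return np.array([np.bincount(votes).argmax() for votes in y_train[nearest]])

# Synthetic stand-in for expression data (assumption): 60 samples, 2000 features,
# with the two classes differing only in the first 100 features.
rng = np.random.default_rng(0)
n_per_class, n_features = 30, 2000
X = rng.normal(0.0, 1.0, (2 * n_per_class, n_features))
X[n_per_class:, :100] += 3.0
y = np.array([0] * n_per_class + [1] * n_per_class)

# Hold out 20 samples; fit PCA on the training split only to avoid leakage.
idx = rng.permutation(len(y))
train, test = idx[:40], idx[40:]
mean, comps = pca_fit(X[train], n_components=5)
Z_train = pca_transform(X[train], mean, comps)
Z_test = pca_transform(X[test], mean, comps)

pred = knn_predict(Z_train, y[train], Z_test, k=3)
accuracy = (pred == y[test]).mean()
print(Z_train.shape, Z_test.shape, accuracy)
```

Swapping in a different reduction step (for example a truncated SVD without centering, or a SOM) or a different classifier would change only the fit/transform and predict helpers; the train/test split and the evaluation stay the same.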