A MA-plot-based Feature Selection by MRMR in SVM-RFE in RNA-Sequencing Data

  • Received : 2018.09.05
  • Accepted : 2018.11.26
  • Published : 2018.12.31

Abstract

Methods for constructing a Gene Regulatory Network (GRN) from RNA-Sequencing (RNA-Seq) data are severely lacking and urgently needed, because in the era of Big Data, GRNs have attracted substantial attention as models of the interactions among relevant feature genes and their regulation. We propose a new computational method for selecting comparative feature patterns. It applies a minimum-redundancy maximum-relevancy (MRMR) filter within support vector machine recursive feature elimination (SVM-RFE), with intensity-dependent normalization (DEGSEQ) as a preprocessor to ensure equal precision across RNA-Seq data in Big Data. We found that the proposed algorithm may be more scalable and convenient, since all of its components are available as R packages. It also improves both the computation time on Big Data and the minimum-redundancy maximum-relevancy of the selected set of feature patterns.
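Intensity-dependent normalization works on the MA-plot: for each gene, M is the log2 ratio of its counts in two samples and A is their mean log2 intensity, and the normalization removes any trend of M along A. The following is a minimal sketch under stated assumptions, not the DEGSEQ implementation, which the abstract does not detail. The function names `ma_values` and `intensity_normalize` and the parameters `pseudocount` and `window` are hypothetical. The trend is estimated with a running median of M over genes with similar A, a simplified stand-in for the loess fit of M on A that is commonly used for this step.

```python
import math
from statistics import median


def ma_values(x, y, pseudocount=1.0):
    """Return (M, A) lists for per-gene counts x and y from two samples.

    M = log2(x / y) is the log fold change and A = (log2 x + log2 y) / 2 is
    the mean log2 intensity. The pseudocount (an assumption) avoids log2(0).
    """
    m_vals, a_vals = [], []
    for xi, yi in zip(x, y):
        lx = math.log2(xi + pseudocount)
        ly = math.log2(yi + pseudocount)
        m_vals.append(lx - ly)
        a_vals.append(0.5 * (lx + ly))
    return m_vals, a_vals


def intensity_normalize(m_vals, a_vals, window=5):
    """Remove the intensity-dependent trend from M.

    For each gene, subtract the median M of up to `window` genes centred on
    it in A order. This running median is a simplified stand-in (assumption)
    for a loess fit of M on A; it is robust to a few strongly changed genes.
    """
    order = sorted(range(len(a_vals)), key=lambda i: a_vals[i])
    half = window // 2
    normalized = [0.0] * len(m_vals)
    for rank, i in enumerate(order):
        lo = max(0, rank - half)
        hi = min(len(order), rank + half + 1)
        trend = median([m_vals[j] for j in order[lo:hi]])
        normalized[i] = m_vals[i] - trend
    return normalized


# Usage: gene 3 is the only gene whose count changes between the samples.
x = [10, 20, 40, 80, 160, 320]
y = [10, 20, 40, 800, 160, 320]
m, a = ma_values(x, y, pseudocount=0.0)
print(intensity_normalize(m, a, window=5))
```

In this example gene 3 keeps its M of about -3.32 while every other gene is normalized to 0. A median rather than a mean is used for the trend so that a single strongly changed gene does not pull the baseline of its neighbours.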

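Constructing a gene regulatory network (GRN) from RNA-Sequencing data is quite difficult, because the GRN must be built from the highly relevant genes among the traits that arise from interactions between genes and their environment. In this study, we apply an MRMR filter on top of support vector machine recursive feature elimination (SVM-RFE) to RNA-Sequencing data from Big Data. The filter retains highly relevant genes (maximum relevancy) and removes redundant genes (minimum redundancy), and the data are first made more precise with intensity-dependent normalization (DEGSEQ), so that only a small number of highly relevant genes are identified. The proposed method is convenient because it uses R packages. Compared with other existing methods, we confirmed that it makes more efficient use of time on Big Data while reliably extracting only highly relevant genes.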
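The selection step can be pictured as SVM-RFE whose elimination score mixes the linear SVM weight of each gene with an MRMR term. The following is a minimal pure-Python sketch under stated assumptions, not the authors' published criterion. The linear SVM is trained with Pegasos-style subgradient descent, and relevance and redundancy use absolute Pearson correlation in place of mutual information. The combined score `beta * w_j^2 / max(w^2) + (1 - beta) * (relevance - redundancy)` with a hypothetical weight `beta` is an illustrative choice. The helper names `pearson`, `train_linear_svm` and `mrmr_svm_rfe` and the parameters `lam` and `epochs` are likewise hypothetical. In each round, the feature with the lowest score is removed.

```python
import math


def pearson(u, v):
    """Pearson correlation of two equal-length sequences (0.0 if one is constant)."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    su = math.sqrt(sum((a - mu) ** 2 for a in u))
    sv = math.sqrt(sum((b - mv) ** 2 for b in v))
    if su == 0 or sv == 0:
        return 0.0
    return cov / (su * sv)


def train_linear_svm(X, y, lam=0.01, epochs=200):
    """Linear SVM without bias, trained by Pegasos-style subgradient descent.

    Labels must be +1/-1. Samples are visited in a fixed order, so runs are
    deterministic.
    """
    w = [0.0] * len(X[0])
    t = 0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            t += 1
            eta = 1.0 / (lam * t)
            margin = yi * sum(wj * xj for wj, xj in zip(w, xi))
            w = [(1.0 - eta * lam) * wj for wj in w]  # shrink (L2 regularizer)
            if margin < 1.0:  # hinge loss is active: step toward the sample
                w = [wj + eta * yi * xj for wj, xj in zip(w, xi)]
    return w


def mrmr_svm_rfe(X, y, n_keep, beta=0.5):
    """Recursively drop the feature with the lowest combined SVM/MRMR score.

    Returns (selected feature indices, eliminated indices in removal order).
    """
    n_features = len(X[0])
    columns = [[row[j] for row in X] for j in range(n_features)]
    # Relevance: |correlation| with the labels (stand-in for mutual information).
    relevance = [abs(pearson(col, y)) for col in columns]
    remaining = list(range(n_features))
    eliminated = []
    while len(remaining) > n_keep:
        sub_X = [[row[j] for j in remaining] for row in X]
        w = train_linear_svm(sub_X, y)
        w2 = [wj * wj for wj in w]
        w2_max = max(w2) or 1.0  # avoid dividing by zero if all weights vanish
        scores = []
        for k, j in enumerate(remaining):
            others = [i for i in remaining if i != j]
            # Redundancy: mean |correlation| with the other remaining features.
            redundancy = (
                sum(abs(pearson(columns[j], columns[i])) for i in others) / len(others)
                if others else 0.0
            )
            mrmr = relevance[j] - redundancy
            # Hypothetical combined score (assumption), not the authors' criterion.
            scores.append(beta * w2[k] / w2_max + (1.0 - beta) * mrmr)
        worst = min(range(len(remaining)), key=lambda k: scores[k])
        eliminated.append(remaining.pop(worst))
    return remaining, eliminated


# Usage: feature 0 equals the label; features 1 and 2 are uncorrelated noise.
X = [[1, 1, 1], [1, -1, 1], [1, 0, -2], [-1, 1, 1], [-1, -1, 1], [-1, 0, -2]]
y = [1, 1, 1, -1, -1, -1]
selected, eliminated = mrmr_svm_rfe(X, y, n_keep=1)
print(selected, eliminated)
```

Here only feature 0 is kept, and features 1 and 2 are eliminated. Dropping one feature per round keeps the sketch short. With thousands of genes, SVM-RFE is often run by removing a fraction of the remaining features per round to reduce training time.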
