http://dx.doi.org/10.9708/jksci.2018.23.12.043

A Implementation of Optimal Multiple Classification System using Data Mining for Genome Analysis  

Jeong, Yu-Jeong (IT Convergence University, Chosun University)
Choi, Gwang-Mi (IT Convergence University, Chosun University)
Abstract
This paper proposes the HMSV algorithm, which combines a Hidden Markov Model (HMM) with a Support Vector Machine (SVM) and applies the combination to gene expression data, simulating the stochastic flow of the gene data and clustering it to obtain more efficient classification results. We verify the HMSV algorithm, which combines independently trained algorithms. To compare it with existing methods, we measured sensitivity and specificity, the most commonly used classification criteria. K-means scored 71% and SOM scored 68%, while the proposed HMSV algorithm scored 85%, a result that is both stable and high and that indicates better classification than general-purpose classification algorithms. The proposed algorithm stochastically models the process that generates the characteristics contained in the signal and achieves a good recognition rate with a small amount of computation. Because it clusters nodes by simulating the stochastic flow of gene data through data mining of big data, delivering fast and effective performance improvement, it should be useful for studying the relationship between gene data and diseases.
Keywords
BigData; Data Mining; Gene Data; Classification System
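The abstract evaluates classifiers by sensitivity and specificity. As a minimal sketch of how these two criteria are usually computed for a binary problem, the following Python function counts true and false positives and negatives and derives both rates. The function name `sensitivity_specificity` and the toy labels are illustrative assumptions; they are not taken from the paper, and the paper's own multi-class evaluation procedure may differ.

```python
import math


def sensitivity_specificity(y_true, y_pred, positive=1):
    """Return (sensitivity, specificity) for binary labels.

    sensitivity = TP / (TP + FN)  (true positive rate)
    specificity = TN / (TN + FP)  (true negative rate)
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    # Return NaN when a class is absent, rather than dividing by zero.
    sensitivity = tp / (tp + fn) if (tp + fn) else math.nan
    specificity = tn / (tn + fp) if (tn + fp) else math.nan
    return sensitivity, specificity


# Toy labels for illustration only (hypothetical, not the paper's data).
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 0]
sens, spec = sensitivity_specificity(y_true, y_pred)
print(f"sensitivity={sens:.3f}, specificity={spec:.3f}")
```

For a multi-class task such as tissue classification, one common convention (assumed here, not confirmed by the source) is to compute these values per class in a one-vs-rest fashion and then average them.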