Application of Random Forest Algorithm for the Decision Support System of Medical Diagnosis with the Selection of Significant Clinical Test

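  • 윤태균 (School of Engineering, Information and Communications University)
  • 이관수 (School of Engineering, Information and Communications University)
  • Published: 2008.06.01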

Abstract

In a clinical decision support system (CDSS), an appropriate data-driven machine learning method, unlike a rule-based expert system, can readily provide information about how each individual feature (clinical test) contributes to disease classification. However, existing methods focus on improving classification accuracy for diagnosis. By analyzing feature importance in classification, one may also infer novel sets of clinical tests that clearly differentiate specific diseases or disease states. Against this background, we introduce a novel CDSS that integrates a classifier and a feature selection module. The random forest algorithm serves both as the classifier and as the feature importance measure. The system selects the significant clinical tests that discriminate between diseases by examining the classification error during backward elimination of features. The performance of the random forest algorithm in clinical classification was compared against an artificial neural network and a decision tree algorithm on the breast cancer, diabetes, and heart disease data sets from the UCI Machine Learning Repository, and random forest performed better. Tests on the same data sets show that the proposed system successfully selects a significant set of clinical tests for each disease.
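The selection loop described in the abstract (rank features by importance, drop them one at a time, and watch the classification error) can be sketched as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: the `evaluate` callback, the `tolerance` rule for picking the final subset, and the feature names `t1` to `t5` are all hypothetical. In a real system `evaluate` would wrap a random forest and return, for example, its out-of-bag error estimate and a per-feature importance such as mean decrease in accuracy; here a synthetic scoring function with made-up numbers stands in for the trained model.

```python
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical evaluator interface (an assumption, not from the paper): given a
# feature subset, fit a classifier such as a random forest and return
# (estimated error, importance score per feature).
Evaluator = Callable[[Sequence[str]], Tuple[float, Dict[str, float]]]


def backward_elimination(features: Sequence[str],
                         evaluate: Evaluator) -> List[Tuple[List[str], float]]:
    """Repeatedly drop the least important feature, recording the error of
    every subset visited (largest subset first)."""
    current = list(features)
    history: List[Tuple[List[str], float]] = []
    while current:
        # Re-evaluate on the reduced set so importances are recomputed.
        error, importance = evaluate(current)
        history.append((list(current), error))
        if len(current) == 1:
            break
        # min() returns the first of several equal minima, so ties are
        # broken by the order of `current`.
        weakest = min(current, key=lambda f: importance[f])
        current.remove(weakest)
    return history


def select_subset(history: List[Tuple[List[str], float]],
                  tolerance: float = 0.0) -> List[str]:
    """Smallest visited subset whose error is within `tolerance` of the best
    (hypothetical selection rule)."""
    best = min(err for _, err in history)
    candidates = [subset for subset, err in history if err <= best + tolerance]
    return min(candidates, key=len)


# Synthetic stand-in for a trained model: t1..t3 carry signal, t4/t5 are
# noise, and every extra feature adds a small error penalty. Made-up numbers.
USEFUL = {"t1": 0.25, "t2": 0.10, "t3": 0.05, "t4": 0.0, "t5": 0.0}


def toy_evaluate(subset: Sequence[str]) -> Tuple[float, Dict[str, float]]:
    error = 0.5 - sum(USEFUL[f] for f in subset) + 0.01 * len(subset)
    return error, {f: USEFUL[f] for f in subset}


if __name__ == "__main__":
    history = backward_elimination(list(USEFUL), toy_evaluate)
    for subset, err in history:
        print(f"{len(subset)} features, error {err:.2f}: {subset}")
    print(select_subset(history))                  # ['t1', 't2', 't3']
    print(select_subset(history, tolerance=0.05))  # ['t1', 't2']
```

Recomputing importances after each removal, rather than ranking once up front, lets the ranking adapt when correlated tests share predictive signal; the abstract does not state whether the original system re-ranks at every step or how it chooses the final subset, so both choices here are assumptions.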
