Cancer Diagnosis System using Genetic Algorithm and Multi-boosting Classifier

  • 온승엽 (Department of Computer Engineering, Korea Aerospace University);
  • 지승도 (Department of Computer Engineering, Korea Aerospace University)
  • Published: 2011.06.30

Abstract

It is widely believed that anomalies and diseases of human organs can be identified by analyzing patterns in the serum proteome. This paper proposes a new classification technique for identifying cancer from proteome patterns obtained by two-dimensional polyacrylamide gel electrophoresis (2-D PAGE). In the proposed method, three classifiers, namely the support vector machine (SVM), the multi-layer perceptron (MLP), and the k-nearest neighbor (k-NN) classifier, are each extended by multi-boosting into an array of subclassifiers, and the outputs of the subclassifiers are merged by an ensemble method. A genetic algorithm is applied to find an optimal feature set for each subclassifier. We applied the method to an empirical data set from cancer research, where it showed higher accuracy and more stable performance than a single classifier.
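The boosting-and-voting structure described above can be sketched as follows. This is only an illustrative sketch under stated assumptions, not the authors' implementation: weighted decision stumps stand in for the SVM, MLP, and k-NN subclassifiers, the boosting step is plain AdaBoost (the paper's exact multi-boosting variant is not specified here), and the feature subsets in `members` are hard-coded, hypothetical stand-ins for subsets that a genetic algorithm would select. All function names and the toy data are invented for illustration.

```python
import math

def train_stump(X, y, w):
    """Weighted decision stump: choose (feature, threshold, sign) with lowest weighted error."""
    best = None
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X}):
            for sign in (1, -1):
                pred = [sign if row[j] >= t else -sign for row in X]
                err = sum(wi for wi, p, yi in zip(w, pred, y) if p != yi)
                if best is None or err < best[0]:
                    best = (err, j, t, sign)
    _, j, t, sign = best
    return lambda row: sign if row[j] >= t else -sign

def adaboost(X, y, rounds=5):
    """Build an array of (alpha, subclassifier) pairs; labels must be +1 or -1."""
    n = len(X)
    w = [1.0 / n] * n
    array = []
    for _ in range(rounds):
        h = train_stump(X, y, w)
        err = sum(wi for wi, row, yi in zip(w, X, y) if h(row) != yi)
        err = min(max(err, 1e-10), 1 - 1e-10)  # keep the logarithm finite
        alpha = 0.5 * math.log((1 - err) / err)
        array.append((alpha, h))
        # Increase the weight of misclassified samples, then renormalize.
        w = [wi * math.exp(-alpha * yi * h(row)) for wi, row, yi in zip(w, X, y)]
        total = sum(w)
        w = [wi / total for wi in w]
    return array

def boosted_predict(array, row):
    """Weighted vote of the subclassifiers in one boosted array."""
    score = sum(alpha * h(row) for alpha, h in array)
    return 1 if score >= 0 else -1

def project(X, cols):
    """Keep only the selected feature columns."""
    return [[row[j] for j in cols] for row in X]

def ensemble_predict(members, row):
    """Majority vote over several boosted arrays, each with its own feature subset."""
    votes = sum(boosted_predict(a, [row[j] for j in cols]) for cols, a in members)
    return 1 if votes >= 0 else -1

# Toy data (hypothetical): +1 = cancer, -1 = normal; the third feature is noise.
X = [[1, 5, 0], [2, 4, 1], [3, 6, 0], [6, 1, 1], [7, 2, 0], [8, 3, 1]]
y = [-1, -1, -1, 1, 1, 1]

# Hypothetical feature subsets, standing in for GA-selected ones.
subsets = [[0, 2], [1, 2], [0, 1]]
members = [(cols, adaboost(project(X, cols), y)) for cols in subsets]
predictions = [ensemble_predict(members, row) for row in X]
```

Replacing `train_stump` with a trainer for a stronger base classifier would bring the sketch closer to the composite classifier in the abstract, at the cost of needing a learner that accepts sample weights or weighted resampling.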

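In biology and medicine, it is believed that, among bioinformatics data, data extracted from the serum proteome carry information relevant to disease diagnosis, and that diseases can be diagnosed early by classifying and analyzing these data. This paper proposes a new composite classifier that discriminates cancer from normal samples using serum proteome data (2-D PAGE: two-dimensional polyacrylamide gel electrophoresis). The composite classifier integrates a support vector machine (SVM), a multi-layer perceptron (MLP), and a k-nearest neighbor (k-NN) classifier by an ensemble method and, at the same time, extends each classifier by multi-boosting, so that the composite classifier is built as an array of subclassifiers. In each subclassifier, a genetic algorithm (GA) is applied to search for an optimal feature set. To measure its performance, the composite classifier was applied to clinical data obtained from cancer research, and it showed higher classification accuracy and stability than a single classifier.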
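The genetic-algorithm search for a feature set can be illustrated with a small sketch. It rests on assumptions that are not taken from the paper: each chromosome is a bit mask over the features, fitness is the leave-one-out accuracy of a 3-NN classifier, and the GA uses truncation selection, one-point crossover, bit-flip mutation, and elitism. The function names, parameter values, and synthetic data are hypothetical.

```python
import random

def knn_loo_accuracy(X, y, mask, k=3):
    """Leave-one-out accuracy of k-NN using only the features where mask is 1."""
    cols = [j for j, bit in enumerate(mask) if bit]
    if not cols:
        return 0.0  # an empty feature set cannot classify anything
    correct = 0
    for i in range(len(X)):
        # Squared Euclidean distance from sample i to every other sample.
        dists = sorted(
            (sum((X[i][j] - X[m][j]) ** 2 for j in cols), y[m])
            for m in range(len(X)) if m != i
        )
        votes = [label for _, label in dists[:k]]
        pred = max(set(votes), key=votes.count)
        correct += pred == y[i]
    return correct / len(X)

def ga_select_features(X, y, pop_size=20, generations=15, p_mut=0.1, seed=0):
    """Search for a feature mask that maximizes the k-NN fitness."""
    rng = random.Random(seed)
    n = len(X[0])

    def fitness(mask):
        return knn_loo_accuracy(X, y, mask)

    pop = [[rng.randint(0, 1) for _ in range(n)] for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(pop, key=fitness, reverse=True)
        parents = ranked[: pop_size // 2]      # truncation selection
        children = [ranked[0][:]]              # elitism: keep the best mask
        while len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n)          # one-point crossover
            child = a[:cut] + b[cut:]
            child = [bit ^ (rng.random() < p_mut) for bit in child]  # bit-flip mutation
            children.append(child)
        pop = children
    best = max(pop, key=fitness)
    return best, fitness(best)

# Synthetic data (hypothetical): feature 0 separates the classes, the rest is noise.
data_rng = random.Random(1)
labels = [0] * 10 + [1] * 10
samples = [[10 * c + data_rng.random()] + [data_rng.random() for _ in range(5)]
           for c in labels]

best_mask, best_acc = ga_select_features(samples, labels)
```

In the setting described above, one such search would run per subclassifier, with the fitness function replaced by the accuracy of that subclassifier on held-out data.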
