Development of Decision Tree Software and Protein Profiling using Surface-Enhanced Laser Desorption/Ionization - Time of Flight - Mass Spectrometry (SELDI-TOF-MS) in Papillary Thyroid Cancer

Development of a Decision Tree Program and Protein Pattern Analysis Using Mass Spectrometry in Papillary Thyroid Cancer

  • Yoon, Joon-Kee (Department of Nuclear Medicine & Molecular Imaging, Ajou University School of Medicine) ;
  • Lee, Jun (Department of Computer Engineering, Konkuk University) ;
  • An, Young-Sil (Department of Nuclear Medicine & Molecular Imaging, Ajou University School of Medicine) ;
  • Park, Bok-Nam (Department of Nuclear Medicine & Molecular Imaging, Ajou University School of Medicine) ;
  • Yoon, Seok-Nam (Department of Nuclear Medicine & Molecular Imaging, Ajou University School of Medicine)
  • 윤준기 (아주대학교 의과대학 핵의학과학교실) ;
  • 이준 (건국대학교 컴퓨터공학과) ;
  • 안영실 (아주대학교 의과대학 핵의학과학교실) ;
  • 박복남 (아주대학교 의과대학 핵의학과학교실) ;
  • 윤석남 (아주대학교 의과대학 핵의학과학교실)
  • Published : 2007.08.31

Abstract

Purpose: The aim of this study was to develop bioinformatics software and to test it on serum samples from patients with papillary thyroid cancer using surface-enhanced laser desorption/ionization time-of-flight mass spectrometry (SELDI-TOF-MS). Materials and Methods: The 'Protein analysis' software, which performs decision tree analysis, was developed by customizing C4.5. Sixty-one serum samples (27 papillary thyroid cancer, 17 autoimmune thyroiditis, and 17 controls) were applied to two types of protein chips, CM10 (weak cation exchange) and IMAC3 (metal binding, Cu). Mass spectrometry was performed to reveal the protein expression profiles. Decision trees were generated with the 'Protein analysis' software, which automatically detected biomarker candidates. Validation analysis was performed for the CM10 chip by random sampling. Results: Decision tree software that can perform training and validation on profiling data was developed. For the CM10 and IMAC3 chips, 23 of 113 and 8 of 41 protein peaks, respectively, differed significantly among the three groups (p<0.05). The decision trees classified the three groups with error rates of 3.3% for CM10 and 2.0% for IMAC3, and 4 and 7 biomarker candidates were detected, respectively. In two-group comparisons, all cancer samples were correctly discriminated from non-cancer samples (error rate = 0%), by a single node for CM10 and by multiple nodes for IMAC3. Validation on 5 test sets showed that SELDI-TOF-MS combined with decision tree analysis correctly differentiated cancers from non-cancers (54/55, 98%), whereas predictive accuracy was moderate for three-group classification (36/55, 65%). Conclusion: Our in-house software successfully built decision trees and detected biomarker candidates; it could therefore be useful for biomarker discovery and clinical follow-up of papillary thyroid cancer.
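The abstract reports that a single decision tree node was enough to separate cancer from non-cancer samples on the CM10 chip. The sketch below illustrates, in simplified form, how a C4.5-style learner could choose such a node: it scans candidate thresholds on one peak's intensity and scores each binary split by gain ratio. This is not the 'Protein analysis' software itself. The function names, the use of midpoints as thresholds, and the toy intensities are assumptions made for illustration, and the actual C4.5 algorithm differs in several details (threshold selection, pruning, handling of missing values).

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def best_threshold(values, labels):
    """Return (threshold, gain_ratio) for the best split 'value <= threshold'.

    Simplified C4.5-style scoring: information gain divided by split
    information. Thresholds are midpoints between adjacent distinct values
    (an assumption made for this sketch).
    """
    n = len(values)
    base = entropy(labels)
    pairs = sorted(zip(values, labels))
    best = (None, 0.0)
    for i in range(1, n):
        lo, hi = pairs[i - 1][0], pairs[i][0]
        if lo == hi:
            continue  # no threshold can separate identical intensities
        left = [lab for _, lab in pairs[:i]]
        right = [lab for _, lab in pairs[i:]]
        gain = (base
                - (len(left) / n) * entropy(left)
                - (len(right) / n) * entropy(right))
        split_info = entropy(["L"] * len(left) + ["R"] * len(right))
        ratio = gain / split_info
        if ratio > best[1]:
            best = ((lo + hi) / 2, ratio)
    return best


# Hypothetical peak intensities, not data from the study.
intensities = [8.1, 7.9, 9.0, 2.0, 3.1, 2.5]
groups = ["cancer"] * 3 + ["non-cancer"] * 3
print(best_threshold(intensities, groups))
```

With these toy values the chosen threshold falls in the gap between the two groups, which is the situation a single-node tree with 0% error describes.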

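Purpose: The aim of this study was to develop a bioinformatics program that generates decision trees and to test it on mass spectrometry data from the sera of patients with papillary thyroid cancer. Materials and Methods: A program named 'Protein analysis', which performs decision tree analysis, was developed by customizing C4.5. Sixty-one serum samples (27 papillary thyroid cancer, 17 autoimmune thyroiditis, 17 controls) were frozen sequentially over a period of time, then thawed together at room temperature and used for analysis. All samples were prepared by delipidation, after which 60 and 50 samples were applied to two types of protein chips, CM10 and IMAC3, respectively. The protein chips were analyzed with a mass spectrometer to find protein patterns characteristic of papillary thyroid cancer. Using the 'Protein analysis' program, decision trees were built from the protein profile data and biomarker candidates were detected. The biomarker candidates found on the CM10 chip were validated by random sampling. Results: A decision tree program capable of training and validation on protein profile data was developed; it consists of three windows that display the tree structure, the node information, and the tree construction process. On the CM10 chip, 23 of 113 protein peaks differed significantly among the three groups, and on the IMAC3 chip, 8 of 41 protein peaks differed significantly among the three groups. In the three-group analysis, the decision trees classified the 60 and 50 samples from the CM10 and IMAC3 protein profile data with high accuracy (error rates of 3.3% and 2.0%, respectively) and detected 4 and 7 biomarker candidates, respectively. In the two-group analysis separating cancer from non-cancer samples, the decision trees correctly identified all cancer samples (error rate = 0% for both chips), using a single node for the CM10 chip and multiple nodes for the IMAC3 chip. In validation of the CM10 protein profile data by five rounds of random sampling, accuracy was high for distinguishing cancer from non-cancer samples (98%, 54/55) but moderate for distinguishing the three groups (65%, 36/55). Conclusion: The program we developed successfully generated decision trees from mass spectrometry data and detected biomarker candidates. It could therefore be useful for biomarker discovery from serum samples and for the follow-up of papillary thyroid cancer.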
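The validation step above (five rounds of random sampling on the CM10 data, 55 test predictions in total) can be sketched as repeated random hold-out. The abstract does not describe the sampling scheme in detail, so the following is only one plausible reading: it assumes each of the 5 test sets holds 11 of the 60 CM10 samples (5 × 11 = 55), with the remaining samples used for training. The function names, the seed, and the per-round layout are hypothetical.

```python
import random


def random_test_sets(n_samples, n_rounds=5, test_size=11, seed=0):
    """Draw n_rounds random hold-out splits as (train_indices, test_indices).

    Assumption for this sketch: each round samples test_size indices
    without replacement; the remaining indices form the training set.
    """
    rng = random.Random(seed)
    splits = []
    for _ in range(n_rounds):
        test = sorted(rng.sample(range(n_samples), test_size))
        held_out = set(test)
        train = [i for i in range(n_samples) if i not in held_out]
        splits.append((train, test))
    return splits


def pooled_accuracy(correct_per_round, test_size):
    """Accuracy pooled over all rounds: total correct / total tested."""
    total = test_size * len(correct_per_round)
    return sum(correct_per_round) / total


splits = random_test_sets(n_samples=60)
n_tested = sum(len(test) for _, test in splits)
print(n_tested)                   # total number of test predictions
print(round(100 * 54 / 55, 1))    # two-group accuracy reported as 98%
print(round(100 * 36 / 55, 1))    # three-group accuracy reported as 65%
```

Pooling the counts over all rounds, rather than averaging per-round percentages, matches the way the abstract reports the results as single fractions (54/55 and 36/55).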
