http://dx.doi.org/10.9723/jksiis.2012.17.6.059

Protein Disorder/Order Region Classification Using EPs-TFP Mining Method  

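Lee, Heon Gyu (Convergence Technology Research Division, Electronics and Telecommunications Research Institute)
Shin, Yong Ho (School of Business, Yeungnam University)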
Publication Information
Journal of Korea Society of Industrial Information Systems / v.17, no.6, 2012, pp. 59-72
Abstract
A protein performs its specific function when a disorder region of its sequence transitions to an order region, triggering a biological reaction. Separating disorder regions from order regions in sequence data is therefore essential for predicting the three-dimensional structure and characteristics of the protein. To classify disorder and order regions efficiently, this paper proposes a classification/prediction method that works on sequence data, yields results that are not biased toward any specific characteristic of the protein, and improves classification speed. The emerging-pattern-based EPs-TFP method uses only the essential emerging patterns, with redundant emerging patterns removed. The method finds the sequence patterns of the disorder region, that is, patterns that appear frequently in disorder regions but relatively infrequently in order regions. To improve the performance of the proposed algorithm, we extend the TFP method, which is built on the P-tree and T-tree structures, into a classification/prediction method. We evaluated the EPs-TFP technique on DisProt 4.9 and CASP 7 data; the order/disorder classification results show a sensitivity of 73.6, a specificity of 69.51, and an accuracy of 74.2.
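The idea of an emerging pattern described in the abstract can be illustrated with a minimal sketch. It is not the EPs-TFP algorithm: it does not build a P-tree or T-tree, and it makes several assumptions that the abstract does not state. Patterns are taken to be contiguous k-mers, "frequent in disorder but infrequent in order" is expressed as a growth rate (support in disorder segments divided by support in order segments), and a pattern is treated as redundant when a shorter emerging pattern occurs inside it. The function names, thresholds, and toy segments are all hypothetical.

```python
from collections import Counter


def kmers(seq, k):
    """Return the set of distinct length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def support(segments, k):
    """Fraction of segments that contain each k-mer at least once."""
    counts = Counter()
    for seg in segments:
        counts.update(kmers(seg, k))
    n = len(segments)
    return {p: c / n for p, c in counts.items()}


def emerging_patterns(disorder, order, max_k=3, min_support=0.5, min_growth=2.0):
    """Patterns frequent in disorder segments and comparatively rare in order segments.

    Growth rate = support in disorder / support in order (assumed definition).
    """
    found = {}
    for k in range(1, max_k + 1):
        sup_d = support(disorder, k)
        sup_o = support(order, k)
        for p, sd in sup_d.items():
            if sd < min_support:
                continue  # not frequent enough in the disorder class
            so = sup_o.get(p, 0.0)
            growth = float("inf") if so == 0 else sd / so
            if growth >= min_growth:
                found[p] = growth
    return found


def drop_redundant(patterns):
    """Keep only patterns that contain no other emerging pattern as a substring."""
    return {p: g for p, g in patterns.items()
            if not any(q != p and q in p for q in patterns)}


# Toy segments, invented for illustration only (not DisProt or CASP data).
disorder = ["SPEEKS", "GSPEEQ", "PEEKSS"]
order = ["VILFAW", "LIVSPA", "WFEKLV"]

eps = emerging_patterns(disorder, order, max_k=3, min_support=0.5, min_growth=3.5)
essential = drop_redundant(eps)
print(sorted(essential))
```

On these toy segments the longer patterns such as `SPE` and `EEK` also qualify as emerging patterns, but each contains a shorter one (`PE`, `EE`, or `KS`), so the filter discards them. A classifier in this style would then score a new segment by the emerging patterns it contains; that step is omitted here.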
Keywords
Emerging Patterns; TFP-tree; Protein Disorder/Order Region