Inferring Disease-related Genes using Title and Body in Biomedical Text

A Method for Inferring Disease-Related Genes Using the Title and Body of Biological Literature Data

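  • 김정우 (Department of Computer Science, Yonsei University) ;
  • 김현진 (Department of Computer Science, Yonsei University) ;
  • 여윤구 (Department of Computer Science, Yonsei University) ;
  • 신민철 (Department of Computer Science, Yonsei University) ;
  • 박상현 (Department of Computer Science, Yonsei University)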
  • Received : 2016.06.24
  • Accepted : 2016.10.21
  • Published : 2017.01.15

Abstract

Since the genome projects of the 1990s, a vast number of gene studies have been stored in online databases, and many biological relationships can be inferred from these databases. In this study, we propose a method for inferring disease-gene relationships using the title and body of biomedical texts. The title of each article is used to extract hub genes, whereas the body is used to extract sub genes that are related to those hub genes. Through these steps, we construct a local gene network for each article. By integrating the local gene networks, we then construct a global gene network. Subsequent analysis of the global gene network allowed us to infer highly ranked disease-related genes. We validated the proposed method by comparing it with previous methods. The results indicate that the proposed method is a meaningful approach to inferring disease-related genes.

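Since the genome projects of the 1990s, a great deal of gene-related research has been conducted. With advances in data storage technology, the results of this research are recorded in a large body of literature, which serves as useful data for inferring new biological relationships. For this reason, this study proposes a methodology for inferring disease-related genes from biological literature. Each article is divided into its title and body, and the genes appearing in each part are extracted. Genes extracted from the title are classified as hub genes, and genes extracted from the body are classified as sub genes that are related to the genes extracted from the title. This process is applied to each article to construct a local gene network. The local gene networks are then all connected to construct a global gene network. We inferred disease-related genes by analyzing the constructed network, and comparative experiments demonstrated that the proposed methodology is a useful way to infer disease-related genes.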
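
The pipeline described in the abstract (hub genes from titles, sub genes from bodies, local networks merged into a global network, then ranking) can be illustrated with a minimal sketch. This is not the authors' implementation: the gene lexicon, the whole-token matching, the toy articles, and the weighted-degree ranking score are all assumptions made for illustration, since the abstract does not specify how genes are recognized or how the global network is analyzed.

```python
import re
from collections import defaultdict

# Hypothetical gene lexicon (assumption); a real pipeline would load a
# curated list of gene symbols instead of this hand-picked set.
GENE_LEXICON = {"EGFR", "KRAS", "TP53", "STAT3", "CD44"}


def extract_genes(text, lexicon=GENE_LEXICON):
    """Return the lexicon symbols that occur as whole tokens in the text."""
    tokens = set(re.findall(r"[A-Za-z0-9]+", text))
    return tokens & lexicon


def local_network(title, body):
    """Link every title (hub) gene to every body (sub) gene of one article."""
    hubs = extract_genes(title)
    subs = extract_genes(body) - hubs
    return {(hub, sub) for hub in hubs for sub in subs}


def global_network(articles):
    """Merge local networks; an edge's weight is the number of articles
    whose local network contains that hub-sub pair (in either direction)."""
    weights = defaultdict(int)
    for title, body in articles:
        for hub, sub in local_network(title, body):
            weights[frozenset((hub, sub))] += 1
    return dict(weights)


def rank_genes(weights):
    """Rank genes by weighted degree (an assumed score, not the paper's)."""
    score = defaultdict(int)
    for edge, weight in weights.items():
        for gene in edge:
            score[gene] += weight
    # Highest score first; ties are broken alphabetically for determinism.
    return sorted(score.items(), key=lambda item: (-item[1], item[0]))


# Toy (title, body) pairs, invented purely for illustration.
articles = [
    ("EGFR expression in colon cancer",
     "EGFR signalling activates KRAS and STAT3."),
    ("KRAS mutations and prognosis",
     "KRAS status correlated with EGFR and TP53."),
]

ranking = rank_genes(global_network(articles))
print(ranking)
```

In this toy run, EGFR and KRAS rank highest because their edge appears in both local networks. Treating edges as undirected and scoring by weighted degree is only one possible design; a propagation-based score over the global network would be a drop-in replacement for `rank_genes`.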

Acknowledgement

Supported by: National Research Foundation of Korea
