• Title/Abstract/Keywords: Biomedical Interaction Extraction

10 results found (processing time: 0.026 seconds)

Utilizing Various Natural Language Processing Techniques for Biomedical Interaction Extraction

  • Park, Kyung-Mi;Cho, Han-Cheol;Rim, Hae-Chang
    • Journal of Information Processing Systems
    • /
    • Vol. 7, No. 3
    • /
    • pp.459-472
    • /
    • 2011
  • The vast body of biomedical literature is an important source for discovering biomedical interaction information. However, obtaining interaction information from it is difficult because most of it is not easily machine-readable. In this paper, we present a method for extracting biomedical interaction information, assuming that the biomedical Named Entities (NEs) are already identified. The proposed method labels all possible pairs of given biomedical NEs as INTERACTION or NO-INTERACTION by using a Maximum Entropy (ME) classifier. The features used for the classifier are obtained by applying various NLP techniques such as POS tagging, base phrase recognition, parsing, and predicate-argument recognition. In particular, specific verb predicates (activate, inhibit, diminish, etc.) and their biomedical NE arguments are very useful features for identifying interacting NE pairs. Based on this, we devised a two-step method: 1) an interaction verb extraction step to find biomedically salient verbs, and 2) an argument relation identification step to generate partial predicate-argument structures between extracted interaction verbs and their NE arguments. In the experiments, we analyzed how much each applied NLP technique improves the performance. Overall, the proposed method improves on the baseline method by more than 2%. The use of external contextual features, which are obtained from outside the NEs, is crucial for the performance improvement. We also compare the performance of the proposed method against co-occurrence-based and rule-based methods. The results demonstrate that the proposed method considerably improves the performance.
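
The pair-labeling setup described in this abstract can be illustrated with a minimal Python sketch. It only enumerates candidate NE pairs in a sentence and derives one toy feature (interaction verbs found between the two mentions). The verb list, the `candidate_pairs` helper, the token-index representation, and the example sentence with placeholder protein names are all assumptions for illustration; no Maximum Entropy classifier is trained here, and this is not the authors' implementation.

```python
from itertools import combinations

# Hypothetical seed list; the abstract names activate, inhibit, and diminish as examples.
INTERACTION_VERBS = {
    "activate", "activates", "inhibit", "inhibits", "diminish", "diminishes",
}


def candidate_pairs(tokens, entity_positions):
    """Enumerate every unordered pair of named-entity positions in one sentence.

    tokens: list of word strings.
    entity_positions: token indices of the already-identified NEs.
    Returns a list of (i, j, verbs_between) tuples, where verbs_between holds
    the interaction verbs found strictly between the two entities.
    """
    pairs = []
    for i, j in combinations(sorted(entity_positions), 2):
        between = [t.lower() for t in tokens[i + 1:j]]
        verbs = [t for t in between if t in INTERACTION_VERBS]
        pairs.append((i, j, verbs))
    return pairs


# Made-up sentence with placeholder entity names.
sentence = "ProtA activates ProtB and inhibits ProtC".split()
pairs = candidate_pairs(sentence, [0, 2, 5])
for i, j, verbs in pairs:
    print(sentence[i], sentence[j], verbs)
```

In a full system, each tuple would be turned into a feature vector and passed to a classifier that outputs INTERACTION or NO-INTERACTION; the verb feature here stands in for the much richer predicate-argument features the abstract describes.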

PubMiner: Machine Learning-based Text Mining for Biomedical Information Analysis

  • Eom, Jae-Hong;Zhang, Byoung-Tak
    • Genomics & Informatics
    • /
    • Vol. 2, No. 2
    • /
    • pp.99-106
    • /
    • 2004
  • In this paper, we introduce PubMiner, an intelligent machine learning-based text mining system for mining biological information from the literature. PubMiner employs natural language processing techniques and machine learning-based data mining techniques to mine useful biological information, such as protein-protein interactions, from the massive literature. The system recognizes biological terms such as genes, proteins, and enzymes and extracts their interactions described in the document through natural language processing. The extracted interactions are further analyzed with a set of features of each entity, collected from the related public databases, to infer more interactions from the original interactions. Both the interactions inferred by this analysis and the original interactions are provided to the user with links to the literature sources. The performance of entity and interaction extraction was tested with selected MEDLINE abstracts. The inference was evaluated using the protein interaction data of S. cerevisiae (baker's yeast) from MIPS and SGD.

A Protein-Protein Interaction Extraction Approach Based on Large Pre-trained Language Model and Adversarial Training

  • Tang, Zhan;Guo, Xuchao;Bai, Zhao;Diao, Lei;Lu, Shuhan;Li, Lin
    • KSII Transactions on Internet and Information Systems (TIIS)
    • /
    • Vol. 16, No. 3
    • /
    • pp.771-791
    • /
    • 2022
  • Protein-protein interaction (PPI) extraction from original text is important for revealing the molecular mechanism of biological processes. With the rapid growth of biomedical literature, manually extracting PPI has become more time-consuming and laborious. Therefore, the automatic PPI extraction from the raw literature through natural language processing technology has attracted the attention of the majority of researchers. We propose a PPI extraction model based on the large pre-trained language model and adversarial training. It enhances the learning of semantic and syntactic features using BioBERT pre-trained weights, which are built on large-scale domain corpora, and adversarial perturbations are applied to the embedding layer to improve the robustness of the model. Experimental results showed that the proposed model achieved the highest F1 scores (83.93% and 90.31%) on two corpora with large sample sizes, namely, AIMed and BioInfer, respectively, compared with the previous method. It also achieved comparable performance on three corpora with small sample sizes, namely, HPRD50, IEPA, and LLL.
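
The abstract states that adversarial perturbations are applied to the embedding layer but does not give the formula. One common choice, assumed here purely for illustration, is the fast gradient method, which adds a perturbation of fixed L2 norm in the direction of the loss gradient: r = epsilon * g / ||g||. The sketch below shows that step in plain Python; `fgm_perturbation`, `epsilon`, and the numeric values are hypothetical and are not taken from the paper.

```python
import math


def fgm_perturbation(grad, epsilon=1.0):
    """Return epsilon * grad / ||grad||_2 for one embedding vector.

    grad: gradient of the loss with respect to the embedding (list of floats).
    A zero gradient yields a zero perturbation to avoid division by zero.
    """
    norm = math.sqrt(sum(g * g for g in grad))
    if norm == 0.0:
        return [0.0 for _ in grad]
    return [epsilon * g / norm for g in grad]


embedding = [0.5, -1.0, 2.0]
grad = [3.0, 0.0, -4.0]  # made-up gradient values
delta = fgm_perturbation(grad, epsilon=0.1)
perturbed = [e + d for e, d in zip(embedding, delta)]
print(perturbed)
```

During adversarial training, the model would be evaluated a second time on the perturbed embeddings and the two losses combined, so that small shifts in embedding space do not change the prediction.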

An Experimental Study on the Relation Extraction from Biomedical Abstracts using Machine Learning

  • 최성필
    • 한국문헌정보학회지
    • /
    • Vol. 50, No. 2
    • /
    • pp.309-336
    • /
    • 2016
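  • In this paper, we propose a biomedical relation extraction system that uses a machine learning module based on Support Vector Machines (SVM) to automatically identify and classify the relation between two entities within a given sentence. The distinguishing feature of the proposed system is that it includes a variety of functions that maximize its performance by extracting rich linguistic features from sentences containing the entities and using them for training. To measure the performance of the proposed system, we conducted in-depth experiments on three standard biomedical relation extraction collections that are widely used around the world; the system achieved high performance on all of them, demonstrating its effectiveness. In conclusion, the extensive and in-depth experimental study of biomedical relation extraction carried out in this paper is expected to offer many implications for future research on machine learning-based biomedical text analysis.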

A Study on the Semiautomatic Construction of Domain-Specific Relation Extraction Datasets from Biomedical Abstracts - Mainly Focusing on a Genic Interaction Dataset in Alzheimer's Disease Domain -

  • 최성필;유석종;조현양
    • 한국도서관정보학회지
    • /
    • Vol. 47, No. 4
    • /
    • pp.289-307
    • /
    • 2016
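  • In this paper, we introduce a system that can efficiently build relation extraction training corpora specialized for a particular subfield of biomedicine. Given a glossary for the target field (names of genes, proteins, diseases, and so on), the system first generates associations among these terms using a large-scale interaction database, then searches an academic literature database for the generated set of associations, and finally extracts the sentences that contain them. To verify the usefulness of the developed system, we applied it to building a training corpus of gene-gene interactions in the Alzheimer's disease domain: from an input set of 140 genes, we extracted 3,510 items, consisting of gene pairs and sentences containing their interactions, as a training set specialized for this domain. Using the system proposed in this paper can raise the efficiency of building training corpora for relation extraction, which has so far been done entirely by hand, and can help build training corpora suited to a variety of subfields.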
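
The three-stage pipeline in this abstract (term list, then associations from an interaction database, then sentences from a literature database) can be sketched as follows. This is a minimal illustration under stated assumptions: the in-memory set `known_interactions` stands in for a large interaction database, the list of strings stands in for a literature search, and `build_corpus` and the toy data are hypothetical rather than the authors' system.

```python
from itertools import combinations


def build_corpus(terms, known_interactions, sentences):
    """Collect sentences that mention both members of a known interacting pair.

    terms: gene/protein names for the target domain.
    known_interactions: set of frozensets, a stand-in for an interaction database.
    sentences: iterable of strings, a stand-in for a literature database.
    """
    # Stage 1: keep only term pairs that the interaction database knows about.
    pairs = [
        (a, b)
        for a, b in combinations(sorted(terms), 2)
        if frozenset((a, b)) in known_interactions
    ]
    # Stages 2 and 3: find sentences that contain both terms of a pair.
    corpus = []
    for a, b in pairs:
        for s in sentences:
            words = set(s.replace(".", " ").replace(",", " ").split())
            if a in words and b in words:
                corpus.append((a, b, s))
    return corpus


# Toy inputs, made up for illustration.
terms = {"APP", "PSEN1", "APOE"}
known = {frozenset(("APP", "PSEN1"))}
sentences = [
    "PSEN1 mutations alter APP processing.",
    "APOE is a risk factor.",
]
corpus = build_corpus(terms, known, sentences)
print(corpus)
```

Exact token matching is the simplest possible lookup; a real system would need synonym handling and proper sentence segmentation, which the abstract does not detail.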

TAKES: Two-step Approach for Knowledge Extraction in Biomedical Digital Libraries

  • Song, Min
    • Journal of Information Science Theory and Practice
    • /
    • Vol. 2, No. 1
    • /
    • pp.6-21
    • /
    • 2014
  • This paper proposes a novel knowledge extraction system, TAKES (Two-step Approach for Knowledge Extraction System), which integrates advanced techniques from Information Retrieval (IR), Information Extraction (IE), and Natural Language Processing (NLP). In particular, TAKES adopts a novel keyphrase extraction-based query expansion technique to collect promising documents. It also uses a Conditional Random Field-based machine learning technique to extract important biological entities and relations. TAKES is applied to biological knowledge extraction, particularly retrieving promising documents that contain Protein-Protein Interaction (PPI) and extracting PPI pairs. TAKES consists of two major components: DocSpotter, which is used to query and retrieve promising documents for extraction, and a Conditional Random Field (CRF)-based entity extraction component known as FCRF. The present paper investigated research problems addressing the issues with a knowledge extraction system and conducted a series of experiments to test our hypotheses. The findings from the experiments are as follows: First, the author verified, using three different test collections to measure the performance of our query expansion technique, that DocSpotter is robust and highly accurate when compared to Okapi BM25 and SLIPPER. Second, the author verified that our relation extraction algorithm, FCRF, is highly accurate in terms of F-Measure compared to four other competitive extraction algorithms: Support Vector Machine, Maximum Entropy, Single POS HMM, and Rapier.

Natural language processing techniques for bioinformatics

  • Tsujii, Jun-ichi
    • 한국생물정보학회:학술대회논문집
    • /
    • Proceedings of the 2nd Annual Conference of the Korean Society for Bioinformatics and Systems Biology (한국생물정보시스템생물학회), 2003
    • /
    • pp.3-3
    • /
    • 2003
  • With biomedical literature expanding so rapidly, there is an urgent need to discover and organize knowledge extracted from texts. Although factual databases contain crucial information, the overwhelming amount of new knowledge remains in textual form (e.g. MEDLINE). In addition, new terms are constantly coined as the relationships linking new genes, drugs, proteins etc. As the size of biomedical literature is expanding, more systems are applying a variety of methods to automate the process of knowledge acquisition and management. In my talk, I focus on GENIA, the project of our group at the University of Tokyo, the objective of which is to construct an information extraction system for protein-protein interaction from MEDLINE abstracts. The talk includes: (1) techniques we use for named entity recognition: (1-a) SOHMM (Self-organized HMM), (1-b) Maximum Entropy Model, (1-c) Lexicon-based Recognizer; (2) treatment of term variants and acronym finders; (3) event extraction using a full parser; (4) linguistic resources for text mining (GENIA corpus): (4-a) semantic tags, (4-b) structural annotations, (4-c) co-reference tags, (4-d) GENIA ontology. I will also talk about a possible extension of our work that links the findings of molecular biology with clinical findings, and claim that text-based or concept-based biology would be a viable alternative to systems biology, which tends to emphasize the role of simulation models in bioinformatics.

Effect of Chlorella Growth Factor on the Proliferation of Human Skin Keratinocyte

  • Yong-Ho Kim;Yoo-Kyeong Hwang;Yu-Yon Kim;Su-Mi Ko;Jung-Min Hwang;Yong-Woo Lee
    • 대한의생명과학회지
    • /
    • Vol. 8, No. 4
    • /
    • pp.229-234
    • /
    • 2002
  • Chlorella is rich in chlorella growth factor (CGF). The literature reports that CGF improves Th1-based immunity and has anticancer, antioxidant, and antibacterial activity as well as growth-promoting and wound-healing effects, among others, but its effect on the metabolism and proliferation of human skin keratinocytes has not been studied. The aim of this study was to examine the effect of CGF on the metabolism and proliferation of human skin keratinocytes in vitro. CGF was extracted from dried chlorella with an autoclaving method, which is a modified hot-water extraction method, and confirmed by an absorbance of 0.22 at 260 nm. We measured the extracellular acidification rate (ECAR) induced by CGF with a Cytosensor® Microphysiometer and evaluated the dose-dependent responsiveness of HaCaT cells. The ECAR at CGF concentrations of 0.15, 1.5, 15, and 150 µg/ml increased to 103.6%, 128.2%, 149.0%, and 423.9%, respectively, compared to control (0.0 µg/ml, 100% ECAR). The ECAR for ErbB1 tyrosine kinase inhibited by the 4-anilinoquinazoline $\mathrm{C_{16}H_{14}BrN_3O_2 \cdot HCl}$ on the HaCaT cells with 10 µg/ml of CGF was compared with that for 100 µg/ml of rhEGF. The conclusion of the study is that CGF might increase human epidermal keratinocyte proliferation through an interaction between the epidermal growth factor receptor and itself.

Evaluation of Cardiac Function Analysis System Using Magnetic Resonance Images

  • Tae, Ki-Sik;Suh, Tae-Suk;Choe, Bo-Young;Lee, Hyoung-Koo;Shinn, Kyung-Sub;Jung, Seung-Eun;Lee, Jae-Moon
    • 한국의학물리학회지:의학물리
    • /
    • Vol. 10, No. 3
    • /
    • pp.159-168
    • /
    • 1999
  • Cardiac disease is one of the leading causes of death in Korea. In quantitative analysis of cardiac function and morphological information by three-dimensional reconstruction of magnetic resonance images, the left ventricle plays an important role functionally and physiologically. However, existing procedures mostly rely on extensive human interaction and are seldom evaluated in clinical applications. In this study, we developed a system that automatically extracts the epicardial and endocardial contours and analyzes cardiac function, and we evaluated the reliability and stability of each system by comparing it with the results of the ARGUS system provided with a 1.5T Siemens MRI system and with a manual method performed by clinicians. We investigated the reliability of each system by comparing the left ventricular contour, end-diastolic volume (EDV), end-systolic volume (ESV), stroke volume (SV), ejection fraction (EF), cardiac output (CO), and wall thickness (WT). Compared with the manual method, both the developed process, which uses the minimum error threshold (MET) method to automatically extract contours from cardiac MR images, and the ARGUS system achieved a 90% success rate in contour extraction. When cardiac function parameters were calculated from the endocardial and epicardial contours extracted with MET and compared using correlation coefficient analysis, the values from the automatic and ARGUS methods agreed with the manual values within ±3% average error. This demonstrated that the automatic method using a threshold technique has high potential for assessing each parameter with relatively high reliability compared with the manual method. Because it uses a simple threshold technique, the developed method could also reduce processing time compared with the ARGUS and manual methods. This method is useful for diagnosing cardiac disease and for simulating the physiological function and blood flow of the left ventricle. In addition, it could be valuable in developing automatic systems that can be applied to other deformable image models.
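
The cardiac function parameters named in this abstract are related by standard definitions: SV = EDV - ESV, EF = SV / EDV x 100, and CO = SV x heart rate. The short Python sketch below applies those definitions. The function name `cardiac_function`, the units, and the input values are assumptions for illustration; the abstract does not describe how the authors' system computes these quantities from the extracted contours.

```python
def cardiac_function(edv_ml, esv_ml, heart_rate_bpm):
    """Derive SV (ml), EF (%), and CO (L/min) from ventricular volumes.

    edv_ml: end-diastolic volume in millilitres.
    esv_ml: end-systolic volume in millilitres.
    heart_rate_bpm: heart rate in beats per minute.
    """
    if edv_ml <= 0:
        raise ValueError("EDV must be positive")
    sv = edv_ml - esv_ml                    # stroke volume
    ef = 100.0 * sv / edv_ml                # ejection fraction, percent
    co = sv * heart_rate_bpm / 1000.0       # cardiac output, ml/min -> L/min
    return sv, ef, co


# Made-up example values, not data from the study.
sv, ef, co = cardiac_function(edv_ml=120.0, esv_ml=50.0, heart_rate_bpm=70)
print(sv, ef, co)
```

Because EF and CO both depend on SV, any error in the contour-derived EDV or ESV propagates to all three parameters, which is why the abstract compares each of them against the manual values.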

A Systemic Review of Pulse Contour Analysis and Fourier Spectrum Analysis on the Photoplethysmography of Digit

  • 남동현;박영배;박영재;신상훈
    • 대한한의진단학회지
    • /
    • Vol. 11, No. 1
    • /
    • pp.48-60
    • /
    • 2007
  • Palpation of the pulse has been used in Korean traditional medicine since ancient times to assess physical health. The pulse wave contour may be obtained by measuring arterial pressure or the blood volume change of the skin. The latter is called photoplethysmography (PPG) or the digital volume pulse (DVP). The PPG signal is measured by a device comprising an infrared light source and a photodetector. Although less widely used, this technique deserves further consideration because of its simplicity and ease of use. The contour of the PPG is formed as a result of a complex interaction between the left ventricle and the systemic circulation. It usually exhibits an early systolic peak and an early diastolic peak. The first peak is formed mainly by pressure transmitted along a direct path from the left ventricle to the finger. The second peak is formed in part by pressure transmitted along the aorta and large arteries to sites of impedance mismatch in the lower body. The contour of the PPG is sensitive to changes in arterial tone and is influenced by ageing and large artery stiffness. Measurements taken directly from the PPG or from its second derivative can be used to assess these properties. In some mathematical approaches, the extraction of periodic components using frequency analysis has been tried for analyzing the PPG. However, it is not yet understood which factors in the cardiovascular system or human body are related to the specific Fourier components of the PPG. This review describes the background to measurement principles, representative contour, contour analysis and frequency domain analysis of PPG, and current and future.
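
The frequency-domain analysis mentioned in this abstract amounts to decomposing the periodic pulse waveform into Fourier components. The sketch below computes discrete Fourier transform magnitudes for a synthetic waveform using only the Python standard library. The signal (a fundamental plus a weaker second harmonic), the helper `dft_magnitudes`, and the sample count are assumptions for illustration and do not represent real PPG data or the reviewed methods.

```python
import cmath
import math


def dft_magnitudes(signal):
    """Return the normalized magnitude of each DFT bin up to the Nyquist bin.

    Uses the naive O(n^2) definition, which is adequate for short traces.
    """
    n = len(signal)
    mags = []
    for k in range(n // 2 + 1):
        acc = sum(
            x * cmath.exp(-2j * math.pi * k * t / n)
            for t, x in enumerate(signal)
        )
        mags.append(abs(acc) / n)
    return mags


# Synthetic stand-in for a PPG trace: two cycles of a fundamental
# plus a half-amplitude second harmonic over 64 samples.
n = 64
signal = [
    math.sin(2 * math.pi * 2 * t / n) + 0.5 * math.sin(2 * math.pi * 4 * t / n)
    for t in range(n)
]
mags = dft_magnitudes(signal)
# Index of the strongest non-DC component.
dominant = max(range(1, len(mags)), key=lambda k: mags[k])
print(dominant)
```

On a real recording, the dominant bin would correspond to the heart rate and the higher bins to its harmonics; relating those components to specific cardiovascular factors is the open question the abstract points out.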
