• Title/Summary/Keyword: biological entity

OryzaGP 2021 update: a rice gene and protein dataset for named-entity recognition

  • Larmande, Pierre; Liu, Yusha; Yao, Xinzhi; Xia, Jingbo
    • Genomics & Informatics / v.19 no.3 / pp.27.1-27.4 / 2021
  • Due to the rapid evolution of high-throughput technologies, a tremendous amount of data is being produced in the biological domain, which poses a challenging task for information extraction and natural language understanding. Biological named entity recognition (NER) and named entity normalisation (NEN) are two common tasks aiming at identifying and linking biologically important entities such as genes or gene products mentioned in the literature to biological databases. In this paper, we present an updated version of OryzaGP, a gene and protein dataset for rice species created to help natural language processing (NLP) tools in processing NER and NEN tasks. To create the dataset, we selected more than 15,000 abstracts associated with articles previously curated for rice genes. We developed four dictionaries of gene and protein names associated with database identifiers. We used these dictionaries to annotate the dataset. We also annotated the dataset using pretrained NLP models. Finally, we analysed the annotation results and discussed how to improve OryzaGP.
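
The dictionary-based annotation step mentioned in this abstract can be illustrated with a minimal sketch. It is not the OryzaGP pipeline: the dictionary entries, the identifiers, and the function names `build_pattern` and `annotate` are hypothetical, and longest-first, case-insensitive matching is an assumed strategy.

```python
import re
from typing import Dict, Iterable, List, Tuple

# Hypothetical toy dictionary mapping surface names to database identifiers.
# These names and IDs are placeholders for illustration, not OryzaGP entries.
GENE_DICT: Dict[str, str] = {
    "gene alpha": "DB:0001",
    "GA1": "DB:0001",
    "protein beta": "DB:0002",
}


def build_pattern(names: Iterable[str]) -> "re.Pattern[str]":
    """Compile one alternation over all names, longest first."""
    # Sorting by length makes a longer name win over a shorter one
    # that starts at the same position.
    ordered = sorted(names, key=len, reverse=True)
    alternation = "|".join(re.escape(name) for name in ordered)
    return re.compile(r"\b(?:" + alternation + r")\b", re.IGNORECASE)


def annotate(text: str, dictionary: Dict[str, str]) -> List[Tuple[int, int, str, str]]:
    """Return (start, end, surface form, identifier) for each dictionary hit."""
    lookup = {name.lower(): ident for name, ident in dictionary.items()}
    pattern = build_pattern(dictionary.keys())
    return [
        (m.start(), m.end(), m.group(0), lookup[m.group(0).lower()])
        for m in pattern.finditer(text)
    ]


abstract = "Expression of GA1 increased, while protein beta was unchanged."
annotations = annotate(abstract, GENE_DICT)
for start, end, surface, ident in annotations:
    print(start, end, surface, ident)
```

Each hit carries character offsets and an identifier, so the same record covers both the recognition (NER) and the normalisation (NEN) side of the task. A real dictionary would also need to handle ambiguous names that map to more than one identifier, which this sketch ignores.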

Named Entity Boundary Recognition Using Hidden Markov Model and Hierarchical Information (은닉 마르코프 모델과 계층 정보를 이용한 개체명 경계 인식)

  • Lim, Heui-Seok
    • Journal of the Korea Academia-Industrial cooperation Society / v.7 no.2 / pp.182-187 / 2006
  • This paper proposes a method for recognizing named entity boundaries using a hidden Markov model (HMM) and ontology information about biological named entities. We use a smoothing method based on 31 word features and hierarchical information to alleviate the sparse-data problem in the HMM. The GENIA corpus version 2.1 was used to train and evaluate the proposed boundary recognition system. The experimental results show that the proposed system outperforms the previous system, which used neither hierarchical ontology information nor the smoothing technique. The system also shows improved execution time for boundary recognition.

PubMiner: Machine Learning-based Text Mining for Biomedical Information Analysis

  • Eom, Jae-Hong; Zhang, Byoung-Tak
    • Genomics & Informatics / v.2 no.2 / pp.99-106 / 2004
  • In this paper we introduce PubMiner, an intelligent machine learning-based text mining system for mining biological information from the literature. PubMiner employs natural language processing techniques and machine learning-based data mining techniques to mine useful biological information, such as protein-protein interactions, from the massive literature. The system recognizes biological terms such as genes, proteins, and enzymes and extracts the interactions described in the document through natural language processing. The extracted interactions are further analyzed with a set of features of each entity, collected from the related public databases, to infer more interactions from the original ones. Inferred interactions from the interaction analysis and the native interactions are provided to the user with links to the literature sources. The performance of entity and interaction extraction was tested with selected MEDLINE abstracts. The inference was evaluated using the protein interaction data of S. cerevisiae (baker's yeast) from MIPS and SGD.

Improving classification of low-resource COVID-19 literature by using Named Entity Recognition

  • Lithgow-Serrano, Oscar; Cornelius, Joseph; Kanjirangat, Vani; Mendez-Cruz, Carlos-Francisco; Rinaldi, Fabio
    • Genomics & Informatics / v.19 no.3 / pp.22.1-22.5 / 2021
  • Automatic document classification for highly interrelated classes is a demanding task that becomes more challenging when there is little labeled data for training. Such is the case of the coronavirus disease 2019 (COVID-19) clinical repository, a repository of classified and translated academic articles related to COVID-19 and relevant to clinical practice, where a 3-way classification scheme is being applied to COVID-19 literature. During the 7th Biomedical Linked Annotation Hackathon (BLAH7), we performed experiments to explore the use of named-entity recognition (NER) to improve the classification. We processed the literature with OntoGene's Biomedical Entity Recogniser (OGER) and used the resulting identified Named Entities (NE) and their links to major biological databases as extra input features for the classifier. We compared the results with a baseline model without the OGER-extracted features. In these proof-of-concept experiments, we observed a clear gain on COVID-19 literature classification. In particular, the NE's origin was useful for classifying document types and the NE's type for clinical specialties. Due to the limitations of the small dataset, we can only conclude that our results suggest that NER would benefit this classification task. To estimate this benefit accurately, further experiments with a larger dataset would be needed.

The Estrogen Receptor Negative-Progesterone Receptor Positive Breast Carcinoma is a Biological Entity and not a Technical Artifact

  • Ng, Char Hong; Pathy, Nirmala Bhoo; Taib, Nur Aishah; Mun, Kein Seong; Rhodes, Anthony; Yip, Cheng Har
    • Asian Pacific Journal of Cancer Prevention / v.13 no.4 / pp.1111-1113 / 2012
  • The ER-/PR+ breast tumor may be the result of a false ER negative result. The aim of this study was to investigate whether there is a difference in patient and tumor characteristics of the ER-/PR+ phenotype in an Asian setting. A total of 2629 breast cancer patients were categorized on the basis of their age, ethnicity, tumor hormonal receptor phenotype, grade and histological type. There were 1230 (46.8%) ER+/PR+, 306 (11.6%) ER+/PR-, 122 (4.6%) ER-/PR+ and 972 (37%) ER-/PR-. Patients with ER-/PR+ tumors were 2.5 times more likely to be younger than 50 years at diagnosis (OR: 2.52; 95% CI: 1.72-3.67). Compared to ER+/PR+ tumors, the ER-/PR+ phenotype was twice as likely to be associated with grade 3 tumors (OR: 2.02; 95% CI: 1.00-4.10). In contrast, compared to ER-/PR- tumors, the ER-/PR+ phenotype was 90% less likely to be associated with a grade 3 tumor (OR: 0.12; 95% CI: 0.05-0.26), and more likely to have invasive lobular than invasive ductal histology (OR: 3.66; 95% CI: 1.47-9.11). These results show that the ER-/PR+ phenotype occurs in a younger age group and is associated with intermediate histopathological characteristics compared to ER+/PR+ and ER-/PR- tumors. This may imply that it is a distinct entity and not a technical artifact.

OryzaGP: rice gene and protein dataset for named-entity recognition

  • Larmande, Pierre; Do, Huy; Wang, Yue
    • Genomics & Informatics / v.17 no.2 / pp.17.1-17.3 / 2019
  • Text mining has become an important research method in biology, with its original purpose being to extract biological entities, such as genes, proteins and phenotypic traits, to extend knowledge from scientific papers. However, few thorough studies on text mining and application development for plant molecular biology data have been performed, especially for rice, resulting in a lack of datasets available to solve named-entity recognition tasks for this species. Since few benchmarks are available for rice, we faced various difficulties in exploiting advanced machine learning methods for accurate analysis of the rice literature. To evaluate several approaches to automatically extract information from gene/protein entities, we built a new dataset for rice as a benchmark. This dataset is composed of a set of titles and abstracts extracted from scientific papers focusing on the rice species and downloaded from PubMed. During the 5th Biomedical Linked Annotation Hackathon, a portion of the dataset was uploaded to PubAnnotation for sharing. Our ultimate goal is to offer a shared task of rice gene/protein name recognition through the BioNLP Open Shared Tasks framework using the dataset, to facilitate an open comparison and evaluation of different approaches to the task.

A conceptual understanding of macroeconomic interrelationships among science, engineering, technology, industry and national economy

  • Hyun, Jae-Chun
    • Korea-Australia Rheology Journal / v.18 no.1 / pp.31-38 / 2006
  • A systematic approach is employed to elucidate the interrelationships among macroeconomic entities such as science, engineering, technology, industry and national economy. Specifically, a conceptual, sequential method has been developed to clearly identify the essential ingredients each macroeconomic entity needs, starting from science, to transform into the next one, all the way to the national economy, where the production of added value is of overriding importance. The results thus obtained can then be used by macroeconomists to readily apply engineering theory and knowledge to various macroeconomic situations, while engineers can likewise use the results, on top of the microeconomic knowledge already prevalent in many engineering fields, to get a better grasp of the nation's seemingly difficult macroeconomic picture. Other peripheral concepts and issues, such as the evolutionary development of industry, the perspectives of 21st-century civilization, an analogy between macroeconomics and chemical engineering, and national policies for each macroeconomic entity, are also presented in this study.

Databases and tools for constructing signal transduction networks in cancer

  • Nam, Seungyoon
    • BMB Reports / v.50 no.1 / pp.12-19 / 2017
  • Traditionally, biologists have devoted their careers to studying individual biological entities of their own interest, partly due to a lack of available data regarding those entities. Large, high-throughput data, too complex for conventional processing methods (i.e., "big data"), have accumulated in cancer biology and are freely available in public data repositories. Such challenges urge biologists to inspect their biological entities of interest using novel approaches, starting with repository data retrieval. Essentially, these revolutionary changes demand new interpretations of huge datasets at a systems level, by so-called "systems biology". One of the representative applications of systems biology is to generate a biological network from high-throughput big data, providing a global map of molecular events associated with specific phenotype changes. In this review, we introduce the repositories of cancer big data and cutting-edge systems biology tools for network generation and improved identification of therapeutic targets.

Four Cases of Late-Onset Schizophrenia (만발성 정신분열증 4례)

  • Park, Jong Deuk; Yoon, Doh Joon
    • Korean Journal of Biological Psychiatry / v.2 no.2 / pp.295-300 / 1995
  • Late-onset schizophrenia (LOS) is a controversial entity. It has been thought that the onset of schizophrenia is limited to early adulthood, but many European psychiatrists have reported the occurrence of schizophrenia in late life. DSM-III restricted the diagnosis of schizophrenia to patients with onset of illness before age 45 years. However, DSM-III-R, DSM-IV, and ICD-10 recognize no upper limit to the age at onset of schizophrenia. Patients with LOS have more visual, tactile, and olfactory hallucinations. Patients with LOS have more persecutory delusions and premorbid schizoid personality traits, and less affective blunting. The course of illness was favorable in LOS. We present four cases of LOS. Their detailed clinical features are reported here with a brief review.

Brain MRI Findings for the Patient with the Late Onset Schizophrenia : Comparison among Patients with the Early Onset Schizophrenia, Progressive Schizophrenia, Senile Dementia and Controls (후기발병 정신분열병 환자에서의 뇌자기공명촬영 소견에 관한 연구 : 조기발병 정신분열병, 진행성 정신분열병, 노인성 치매 및 대조군과의 비교)

  • Park, Doo Sung; Lee, Young Ho; Choi, Young Hee; Park, Young Soo; Chung, Young Cho
    • Korean Journal of Biological Psychiatry / v.4 no.1 / pp.74-83 / 1997
  • With the increasing incidence of, and interest in, late-onset schizophrenia, concern has also grown about whether this disorder is an etiologically or phenomenologically distinct entity. To clarify the disease entity of late-onset schizophrenia and the role of structural brain changes in its etiology, the authors examined the following questions: Is there any evidence of structural brain changes in late-onset schizophrenia? If present, do they differ from those of early-onset schizophrenia or progressive schizophrenia? And do they differ from those of senile dementia? Subjects were 6 patients with late-onset schizophrenia, 6 patients with early-onset schizophrenia, 6 patients with progressive schizophrenia, 6 patients with Alzheimer's dementia, and 6 controls. We measured regions of interest on the magnetic resonance images by computer-assisted planimetry using AutoCAD and a digitizer. Our results may suggest that third ventricular enlargement and a reversal of the normal difference between the left and right temporal lobes and left-right difference in the posterior lateral ventricle are common brain pathology for all types of schizophrenia, including late-onset schizophrenia. They also suggest that the structural brain changes of late-onset schizophrenia are related to neurodevelopmental abnormality rather than degenerative change.
