• Title/Abstract/Keywords: Agricultural natural language processing

9 search results (processing time: 0.028 s)

OryzaGP 2021 update: a rice gene and protein dataset for named-entity recognition

  • Larmande, Pierre;Liu, Yusha;Yao, Xinzhi;Xia, Jingbo
    • Genomics & Informatics
    • /
    • Vol. 19, No. 3
    • /
    • pp.27.1-27.4
    • /
    • 2021
  • Due to the rapid evolution of high-throughput technologies, a tremendous amount of data is being produced in the biological domain, which poses a challenging task for information extraction and natural language understanding. Biological named entity recognition (NER) and named entity normalisation (NEN) are two common tasks aiming at identifying and linking biologically important entities such as genes or gene products mentioned in the literature to biological databases. In this paper, we present an updated version of OryzaGP, a gene and protein dataset for rice species created to help natural language processing (NLP) tools in processing NER and NEN tasks. To create the dataset, we selected more than 15,000 abstracts associated with articles previously curated for rice genes. We developed four dictionaries of gene and protein names associated with database identifiers. We used these dictionaries to annotate the dataset. We also annotated the dataset using pretrained NLP models. Finally, we analysed the annotation results and discussed how to improve OryzaGP.
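The dictionary-based annotation step described in this abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: the function name `annotate`, the example names `geneA`/`geneB`, and the identifiers `DB:0001`/`DB:0002` are hypothetical placeholders, and the matching rule (case-insensitive, whole-word) is an assumption.

```python
import re

def annotate(text, dictionary):
    """Return (start, end, surface form, identifier) for each dictionary hit.

    `dictionary` maps an entity name to a database identifier.
    Matching is case-insensitive and restricted to whole words (an assumption).
    """
    spans = []
    for name, identifier in dictionary.items():
        pattern = r"\b" + re.escape(name) + r"\b"
        for match in re.finditer(pattern, text, flags=re.IGNORECASE):
            spans.append((match.start(), match.end(), match.group(0), identifier))
    return sorted(spans)  # order hits by position in the text

# Hypothetical dictionary and sentence, for illustration only.
example_dictionary = {"geneA": "DB:0001", "geneB": "DB:0002"}
example_text = "Expression of geneA and GeneB increased."
annotations = annotate(example_text, example_dictionary)
```

Each returned span carries the character offsets needed for NER and the identifier needed for NEN, which is why a single dictionary lookup can serve both tasks in a sketch like this.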

The Definition of Data Structure for Design Knowledge Database and Development of the Interface Program for using Natural Language Processing

  • 이정재;이민호;윤성수
    • Journal of the Korean Society of Agricultural Engineers
    • /
    • Vol. 43, No. 6
    • /
    • pp.187-196
    • /
    • 2001
  • In this study, automated indexing was performed using natural language processing, a field of artificial intelligence, and the Natural Language Processing Interface for knowledge representation (NALPI) was developed. The DEsign KnOwledge DataBase (DEKODB), which is designed to interlock with the knowledge base, was also developed. DEKODB processes both documented design data, such as a concrete standard specification, and design knowledge from an expert. DEKODB also simulates the design space of structures in accordance with production rules, and it was therefore judged that DEKODB can be used as an engine to retrieve new knowledge and to implement the knowledge base necessary for developing an automatic design system. The application field of the system developed in this study can be expanded by supplementing the design knowledge in DEKODB and by developing dictionaries for foreign languages. Furthermore, full automation of data accumulation and the development of an automatic rule generator should benefit unified design automation.


A Protein-Protein Interaction Extraction Approach Based on Large Pre-trained Language Model and Adversarial Training

  • Tang, Zhan;Guo, Xuchao;Bai, Zhao;Diao, Lei;Lu, Shuhan;Li, Lin
    • KSII Transactions on Internet and Information Systems (TIIS)
    • /
    • Vol. 16, No. 3
    • /
    • pp.771-791
    • /
    • 2022
  • Protein-protein interaction (PPI) extraction from original text is important for revealing the molecular mechanism of biological processes. With the rapid growth of biomedical literature, manually extracting PPI has become more time-consuming and laborious. Therefore, the automatic PPI extraction from the raw literature through natural language processing technology has attracted the attention of the majority of researchers. We propose a PPI extraction model based on the large pre-trained language model and adversarial training. It enhances the learning of semantic and syntactic features using BioBERT pre-trained weights, which are built on large-scale domain corpora, and adversarial perturbations are applied to the embedding layer to improve the robustness of the model. Experimental results showed that the proposed model achieved the highest F1 scores (83.93% and 90.31%) on two corpora with large sample sizes, namely, AIMed and BioInfer, respectively, compared with the previous method. It also achieved comparable performance on three corpora with small sample sizes, namely, HPRD50, IEPA, and LLL.

A Study on the Construction of Database contains Knowledge for the Structural Design using the Natural Language Processing

  • 이민호;이정재;김한중;윤성수
    • Korean Society of Agricultural Engineers: Conference Proceedings
    • /
    • Proceedings of the 1999 Annual Conference of the Korean Society of Agricultural Engineers
    • /
    • pp.245-251
    • /
    • 1999
  • In this study, automated indexing was performed using natural language processing, a field of artificial intelligence, and the Natural Language Processor for Constructing Database (NALPDB) was developed. The Design knowledge Information Relational DataBase (DIREDB), which is designed to interlock with the knowledge base, was also developed. DIREDB processes both documented design data, such as a concrete standard specification, and design knowledge from an expert. DIREDB also simulates the design space of structures in accordance with production rules, and it was therefore judged that DIREDB can be used as an engine to retrieve new knowledge and to implement the knowledge base necessary for developing an automatic design system.


An effective automated ontology construction based on the agriculture domain

  • Deepa, Rajendran;Vigneshwari, Srinivasan
    • ETRI Journal
    • /
    • Vol. 44, No. 4
    • /
    • pp.573-587
    • /
    • 2022
  • The agricultural sector differs from other sectors in that it relies entirely on natural and climatic factors. Climate change affects land and agriculture in many ways, including reduced annual rainfall, pests, heat waves, changes in sea level, and global ozone/atmospheric CO2 fluctuation. Climate change also affects the environment. Based on these factors, farmers choose their crops to increase productivity in their fields. Many existing agricultural ontologies are either domain-specific or have been created with minimal vocabulary, and no proper evaluation framework has been implemented. A new agricultural ontology focused on subdomains is designed to assist farmers using the Jaccard relative extractor (JRE) and the Naïve Bayes algorithm. The JRE is used to find the similarity between two sentences and words in the agricultural documents, and the relationship between two terms is identified via the Naïve Bayes algorithm. In the proposed method, the data are preprocessed through natural language processing techniques, and the tags whose dimensions are reduced are subjected to rule-based formal concept analysis and mapping. The subdomain ontologies of weather, pest, and soil are built separately, and the overall agricultural ontology is built around them. The gold standard for the lexical layer is used to evaluate the proposed technique, and its performance is analyzed by comparing it with different state-of-the-art systems. Precision, recall, F-measure, Matthews correlation coefficient, receiver operating characteristic curve area, and precision-recall curve area are the performance metrics used. The proposed methodology gives a precision score of 94.40%, compared with the decision tree (83.94%) and K-nearest neighbor (86.89%) algorithms, for agricultural ontology construction.
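The abstract does not specify how the JRE is computed, so the following is only a minimal sketch of the plain Jaccard similarity between two sentences, on which such a measure is presumably based. The whitespace tokenizer, the function names, and the example sentences are assumptions for illustration, not the authors' method.

```python
def tokenize(sentence):
    """Lowercase and split on whitespace; returns the set of distinct tokens."""
    return set(sentence.lower().split())

def jaccard_similarity(sentence_a, sentence_b):
    """Jaccard similarity |A ∩ B| / |A ∪ B| over the token sets of two sentences."""
    tokens_a = tokenize(sentence_a)
    tokens_b = tokenize(sentence_b)
    union = tokens_a | tokens_b
    if not union:  # both sentences empty: define similarity as 0.0
        return 0.0
    return len(tokens_a & tokens_b) / len(union)

# Two hypothetical sentences sharing 2 of 4 distinct tokens.
score = jaccard_similarity("rice pest control", "rice pest damage")
```

Here the two sentences share `rice` and `pest` out of four distinct tokens, so the score is 2/4.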

LitCovid-AGAC: cellular and molecular level annotation data set based on COVID-19

  • Ouyang, Sizhuo;Wang, Yuxing;Zhou, Kaiyin;Xia, Jingbo
    • Genomics & Informatics
    • /
    • Vol. 19, No. 3
    • /
    • pp.23.1-23.7
    • /
    • 2021
  • The coronavirus disease 2019 (COVID-19) literature has been growing dramatically, and the increased amount of text makes it possible to perform large-scale text mining and knowledge discovery. Curation of these texts has therefore become a crucial issue for the Biomedical Natural Language Processing (BioNLP) community, so as to retrieve important information about the mechanism of COVID-19. PubAnnotation is an aligned annotation system that provides an efficient platform for biological curators to upload their annotations or merge other external annotations. Inspired by the integration of multiple useful COVID-19 annotations, we merged three annotation resources into the LitCovid data set and constructed a cross-annotated corpus, LitCovid-AGAC. This corpus consists of 12 labels, including Mutation, Species, Gene, and Disease from PubTator; GO and CHEBI from OGER; and Var, MPA, CPA, NegReg, PosReg, and Reg from AGAC, over 50,018 COVID-19 abstracts in LitCovid. The corpus contains abundant information that may help unveil hidden knowledge about the pathological mechanism of COVID-19.

Question Similarity Measurement of Chinese Crop Diseases and Insect Pests Based on Mixed Information Extraction

  • Zhou, Han;Guo, Xuchao;Liu, Chengqi;Tang, Zhan;Lu, Shuhan;Li, Lin
    • KSII Transactions on Internet and Information Systems (TIIS)
    • /
    • Vol. 15, No. 11
    • /
    • pp.3991-4010
    • /
    • 2021
  • The Question Similarity Measurement of Chinese Crop Diseases and Insect Pests (QSM-CCD&IP) aims to judge the user's tendency to ask questions regarding input problems. The measurement is the basis of the Agricultural Knowledge Question and Answering (Q&A) system, information retrieval, and other tasks. However, the corpus and measurement methods available in this field have some deficiencies. In addition, error propagation may occur when word boundary features and local context information are ignored as the general method embeds sentences. These factors make the task challenging. To solve the above problems and tackle the Question Similarity Measurement task, a corpus on Chinese crop diseases and insect pests (CCDIP), which contains 13 categories, was established. Then, taking the CCDIP as the research object, this study proposes a Chinese agricultural text similarity matching model, namely, the AgrCQS. This model is based on mixed information extraction. Specifically, the hybrid embedding layer can enrich character information and improve the model's ability to recognize word boundaries. Multi-scale local information can be extracted by a multi-core convolutional neural network based on multi-weight (MM-CNN). The self-attention mechanism can enhance the model's ability to fuse global information. In this research, the performance of the AgrCQS is verified on the CCDIP and on three benchmark datasets, namely, AFQMC, LCQMC, and BQ. The accuracy rates are 93.92%, 74.42%, 86.35%, and 83.05%, respectively, which are higher than those of baseline systems without using any external knowledge. Additionally, the proposed method module can be extracted separately and applied to other models, thus providing a reference for related research.

Audio Guidance Application For Commodity Prices Using Public Data And AI Chatbot

  • 이재선;강경돈;박태억;정덕길
    • Korea Institute of Information and Communication Engineering: Conference Proceedings
    • /
    • 2018 Spring Conference of the Korea Institute of Information and Communication Engineering
    • /
    • pp.251-253
    • /
    • 2018
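  • Recently, as price instability has intensified fluctuations in the prices of agricultural, fishery, and livestock products, consumers have increasingly tended to make purchases without a precise benchmark, relying instead on market promotion or on their own experience and intuition. The core function of the application implemented in this paper is to help users easily check price indices for agricultural, fishery, and livestock products in real time using public data. In addition, by introducing an AI chatbot and speech recognition into the app while providing conveniences such as natural language processing and hands-free operation, the main goal is to give consumers who are swayed by unstable prices accurate and convenient consumption indicators.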


Development of an Energy Model of Rice Processing Complex(II) -Simulation Model Development and Analysis of Energy Requirement-

  • 장홍희;장동일;김만수
    • Journal of Biosystems Engineering
    • /
    • Vol. 20, No. 3
    • /
    • pp.275-287
    • /
    • 1995
  • The rice processing complex (RPC) consists of the rice handling, drying, storage, and milling processes. RPCs had been established at 83 locations domestically by April 1994, and 200 more are to be built throughout the country. This study was therefore performed to achieve the following two objectives: 1) development of mathematical models that can assess the electricity, fuel, and labor requirements of four model systems of the rice processing complex; 2) development of a computer simulation model that produces improved RPC designs based on the evaluated energy requirements of the four RPC models. The results of this study are summarized as follows: 1) Mathematical models were developed on the basis of the results of mass balance analysis and the required power of the machines for each process. 2) A computer simulation model was developed that can produce improved RPC designs based on the evaluated energy requirements. The simulation model was written in Borland C++. 3) The simulation results showed that total energy requirements ranged from 75.94 kWh/t to 124.30 kWh/t. 4) From the computer analysis of energy requirements classified by drying type, it was found that the energy requirement of drying type A {paddy rice (PR) for storage: natural air drying (15%); PR for milling: heated air drying (16%)} was less than that of drying type B {step 1: natural air drying (PR for storage: 18%, PR for milling: 20%); step 2: heated air drying (PR for storage: 15%, PR for milling: 16%)}. 5) The most energy-efficient drying method is to dry all rough rice arriving at the RPC with natural air drying systems. If the incoming amount exceeds the capacity of the natural air drying system, it is recommended that the surplus rough rice be dried by the heated air drying method.
