• Title/Summary/Keyword: Agricultural natural language processing

Search Result 9

OryzaGP 2021 update: a rice gene and protein dataset for named-entity recognition

  • Larmande, Pierre;Liu, Yusha;Yao, Xinzhi;Xia, Jingbo
    • Genomics & Informatics / v.19 no.3 / pp.27.1-27.4 / 2021
  • Due to the rapid evolution of high-throughput technologies, a tremendous amount of data is being produced in the biological domain, which poses a challenging task for information extraction and natural language understanding. Biological named entity recognition (NER) and named entity normalisation (NEN) are two common tasks aiming at identifying and linking biologically important entities such as genes or gene products mentioned in the literature to biological databases. In this paper, we present an updated version of OryzaGP, a gene and protein dataset for rice species created to help natural language processing (NLP) tools in processing NER and NEN tasks. To create the dataset, we selected more than 15,000 abstracts associated with articles previously curated for rice genes. We developed four dictionaries of gene and protein names associated with database identifiers. We used these dictionaries to annotate the dataset. We also annotated the dataset using pretrained NLP models. Finally, we analysed the annotation results and discussed how to improve OryzaGP.
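
The dictionary-based annotation step described in this abstract can be pictured with a minimal Python sketch. It assumes a dictionary that maps surface names to database identifiers. The names `GENE_DICT` and `annotate`, and the entries `GeneA`, `GeneB protein`, `DB:0001`, and `DB:0002`, are hypothetical placeholders and are not taken from OryzaGP.

```python
import re

# Hypothetical dictionary: surface form -> database identifier.
# These entries are illustrative placeholders, not real OryzaGP content.
GENE_DICT = {
    "GeneA": "DB:0001",
    "GeneB protein": "DB:0002",
}


def annotate(text, dictionary):
    """Return (start, end, mention, identifier) for every dictionary hit.

    Longer names are tried first so that a multi-word name wins over
    a shorter name that it contains.
    """
    names = sorted(dictionary, key=len, reverse=True)
    pattern = re.compile(
        r"\b(?:" + "|".join(re.escape(name) for name in names) + r")\b"
    )
    return [
        (m.start(), m.end(), m.group(), dictionary[m.group()])
        for m in pattern.finditer(text)
    ]


if __name__ == "__main__":
    for hit in annotate("GeneA interacts with GeneB protein.", GENE_DICT):
        print(hit)
```

Exact string matching like this only performs the lookup; linking ambiguous or variant names to the right identifier (the NEN task) would need additional rules or a trained model.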

The Definition of Data Structure for Design Knowledge Database and Development of the Interface Program for using Natural Language Processing (설계지식 데이터베이스의 자료구조 규명과 자연어처리를 이용한 인터페이스 프로그램 개발)

  • 이정재;이민호;윤성수
    • Magazine of the Korean Society of Agricultural Engineers / v.43 no.6 / pp.187-196 / 2001
  • In this study, automated indexing was performed using natural language processing, a technique from the field of artificial intelligence, and the Natural Language Processing Interface for knowledge representation (NALPI) was developed. The DEsign KnOwledge DataBase (DEKODB), which is designed to interlock with the knowledge base, was also developed. DEKODB processes both documented design data, such as a concrete standard specification, and design knowledge from an expert. DEKODB also simulates the design space of structures in accordance with production rules, so it can be used as an engine to retrieve new knowledge and to implement the knowledge base needed to develop an automatic design system. The application field of the system developed in this study can be expanded by supplementing the design knowledge in DEKODB and by developing dictionaries for foreign languages. Furthermore, full automation of data accumulation and the development of an automatic rule generator should benefit unified design automation.


A Protein-Protein Interaction Extraction Approach Based on Large Pre-trained Language Model and Adversarial Training

  • Tang, Zhan;Guo, Xuchao;Bai, Zhao;Diao, Lei;Lu, Shuhan;Li, Lin
    • KSII Transactions on Internet and Information Systems (TIIS) / v.16 no.3 / pp.771-791 / 2022
  • Protein-protein interaction (PPI) extraction from original text is important for revealing the molecular mechanism of biological processes. With the rapid growth of biomedical literature, manually extracting PPI has become more time-consuming and laborious. Therefore, the automatic PPI extraction from the raw literature through natural language processing technology has attracted the attention of the majority of researchers. We propose a PPI extraction model based on the large pre-trained language model and adversarial training. It enhances the learning of semantic and syntactic features using BioBERT pre-trained weights, which are built on large-scale domain corpora, and adversarial perturbations are applied to the embedding layer to improve the robustness of the model. Experimental results showed that the proposed model achieved the highest F1 scores (83.93% and 90.31%) on two corpora with large sample sizes, namely, AIMed and BioInfer, respectively, compared with the previous method. It also achieved comparable performance on three corpora with small sample sizes, namely, HPRD50, IEPA, and LLL.
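
The abstract says that adversarial perturbations are applied to the embedding layer but does not name the scheme. One common choice is the fast gradient method, in which the loss gradient with respect to the embedding is rescaled to a fixed L2 norm and added to the embedding. The sketch below assumes that scheme; the function names and the `epsilon` parameter are hypothetical and are not taken from the paper.

```python
import math


def fgm_perturbation(grad, epsilon=1.0):
    """Scale a gradient vector to L2 norm `epsilon` (fast gradient method).

    Assumption: the paper's exact perturbation scheme is not stated in
    the abstract; this is one widely used variant.
    """
    norm = math.sqrt(sum(g * g for g in grad))
    if norm == 0.0:
        # A zero gradient gives no direction to perturb in.
        return [0.0] * len(grad)
    return [epsilon * g / norm for g in grad]


def perturb_embedding(embedding, grad, epsilon=1.0):
    """Return the embedding shifted along the normalized gradient."""
    delta = fgm_perturbation(grad, epsilon)
    return [e + d for e, d in zip(embedding, delta)]


if __name__ == "__main__":
    # Toy two-dimensional embedding and gradient, for illustration only.
    print(perturb_embedding([1.0, 1.0], [3.0, 4.0], epsilon=0.5))
```

In training, the model would be run on both the clean and the perturbed embeddings and the two losses combined, so that small shifts in embedding space do not change the prediction.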

A Study on the Construction of Database contains Knowledge for the Structural Design using the Natural Language Processing (자연어처리를 이용한 구조물 설계지식정보 데이터베이스 구축에 관한연구)

  • 이민호;이정재;김한중;윤성수
    • Proceedings of the Korean Society of Agricultural Engineers Conference / 1999.10c / pp.245-251 / 1999
  • In this study, automated indexing was performed using natural language processing, a technique from the field of artificial intelligence, and the Natural Language Processor for Constructing Database (NALPDB) was developed. The Design knowledge Information Relational DataBase (DIREDB), which is designed to interlock with the knowledge base, was also developed. DIREDB processes both documented design data, such as a concrete standard specification, and design knowledge from an expert. DIREDB also simulates the design space of structures in accordance with production rules, so it can be used as an engine to retrieve new knowledge and to implement the knowledge base needed to develop an automatic design system.


An effective automated ontology construction based on the agriculture domain

  • Deepa, Rajendran;Vigneshwari, Srinivasan
    • ETRI Journal / v.44 no.4 / pp.573-587 / 2022
  • The agricultural sector differs from other sectors because it relies entirely on natural and climatic factors. Climate change affects land and agriculture in many ways, including reduced annual rainfall, pests, heat waves, changes in sea level, and fluctuations in global ozone and atmospheric CO2. Climate change also affects the environment. Based on these factors, farmers choose their crops to increase productivity in their fields. Many existing agricultural ontologies are either domain-specific or built with a minimal vocabulary, and no proper evaluation framework has been implemented. A new agricultural ontology focused on subdomains is designed to assist farmers, using the Jaccard relative extractor (JRE) and the Naïve Bayes algorithm. The JRE is used to find the similarity between two sentences and between words in agricultural documents, and the relationship between two terms is identified via the Naïve Bayes algorithm. In the proposed method, the data are preprocessed with natural language processing techniques, and the dimension-reduced tags are subjected to rule-based formal concept analysis and mapping. The subdomain ontologies of weather, pest, and soil are built separately, and the overall agricultural ontology is built around them. The gold standard for the lexical layer is used to evaluate the proposed technique, and its performance is compared with that of different state-of-the-art systems. The performance metrics are precision, recall, F-measure, Matthews correlation coefficient, receiver operating characteristic curve area, and precision-recall curve area. The proposed methodology achieves a precision of 94.40% for agricultural ontology construction, compared with 83.94% for the decision tree and 86.89% for the K-nearest neighbor algorithm.
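
Two ingredients of this abstract have simple textbook forms: Jaccard similarity between two sentences, and the precision, recall, F-measure, and Matthews correlation coefficient (MCC) computed from confusion-matrix counts. The sketch below shows the plain versions of both. It is not the authors' Jaccard relative extractor; the function names, the whitespace tokenization, and the counts in the usage lines are illustrative assumptions.

```python
import math


def jaccard(sentence_a, sentence_b):
    """Jaccard similarity of the lowercase token sets of two sentences."""
    set_a = set(sentence_a.lower().split())
    set_b = set(sentence_b.lower().split())
    if not set_a and not set_b:
        return 1.0  # convention: two empty sentences are identical
    return len(set_a & set_b) / len(set_a | set_b)


def binary_metrics(tp, fp, fn, tn):
    """Precision, recall, F1, and MCC from binary confusion counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"precision": precision, "recall": recall, "f1": f1, "mcc": mcc}


if __name__ == "__main__":
    # Made-up sentences and counts, for illustration only.
    print(jaccard("rice pest control", "pest control methods"))
    print(binary_metrics(tp=8, fp=2, fn=2, tn=8))
```

The two example sentences share two of four distinct tokens, so their Jaccard similarity is 0.5.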

LitCovid-AGAC: cellular and molecular level annotation data set based on COVID-19

  • Ouyang, Sizhuo;Wang, Yuxing;Zhou, Kaiyin;Xia, Jingbo
    • Genomics & Informatics / v.19 no.3 / pp.23.1-23.7 / 2021
  • The coronavirus disease 2019 (COVID-19) literature is currently growing dramatically, and the increased volume of text makes it possible to perform large-scale text mining and knowledge discovery. Curation of these texts has therefore become a crucial issue for the biomedical natural language processing (BioNLP) community, in order to retrieve important information about the mechanism of COVID-19. PubAnnotation is an aligned annotation system that provides an efficient platform for biological curators to upload their annotations or merge other external annotations. Inspired by the integration among multiple useful COVID-19 annotations, we merged three annotation resources into the LitCovid data set and constructed a cross-annotated corpus, LitCovid-AGAC. This corpus carries 12 labels over 50,018 COVID-19 abstracts in LitCovid: Mutation, Species, Gene, and Disease from PubTator; GO and CHEBI from OGER; and Var, MPA, CPA, NegReg, PosReg, and Reg from AGAC. The corpus contains abundant information that may help unveil hidden knowledge about the pathological mechanism of COVID-19.

Question Similarity Measurement of Chinese Crop Diseases and Insect Pests Based on Mixed Information Extraction

  • Zhou, Han;Guo, Xuchao;Liu, Chengqi;Tang, Zhan;Lu, Shuhan;Li, Lin
    • KSII Transactions on Internet and Information Systems (TIIS) / v.15 no.11 / pp.3991-4010 / 2021
  • The Question Similarity Measurement of Chinese Crop Diseases and Insect Pests (QSM-CCD&IP) aims to judge the user's tendency to ask questions regarding input problems. The measurement is the basis of the Agricultural Knowledge Question and Answering (Q & A) system, information retrieval, and other tasks. However, the corpus and measurement methods available in this field have some deficiencies. In addition, error propagation may occur when general sentence-embedding methods ignore word boundary features and local context information. These factors make the task challenging. To address these problems, a corpus on Chinese crop diseases and insect pests (CCDIP), which contains 13 categories, was established. Then, taking the CCDIP as the research object, this study proposes a Chinese agricultural text similarity matching model, AgrCQS, which is based on mixed information extraction. Specifically, the hybrid embedding layer enriches character information and improves the model's ability to recognize word boundaries. Multi-scale local information is extracted by a multi-core convolutional neural network based on multi-weight (MM-CNN). The self-attention mechanism enhances the model's ability to fuse global information. The performance of AgrCQS is verified on the CCDIP and on three benchmark datasets, namely, AFQMC, LCQMC, and BQ. The accuracy rates are 93.92%, 74.42%, 86.35%, and 83.05%, respectively, which are higher than those of baseline systems that do not use any external knowledge. Additionally, the modules of the proposed method can be extracted separately and applied to other models, thus providing a reference for related research.

Audio Guidance Application For Commodity Prices Using Public Data And AI Chatbot (공공데이터와 AI챗봇을 이용한 물가 음성안내 앱 서비스)

  • Lee, Jae-Seon;Kang, Kyeong-Don;Park, Tae-Yok;Jung, Deok-Gil
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference / 2018.05a / pp.251-253 / 2018
  • As the prices of agricultural, fishery, and dairy products have been fluctuating due to recent instability in commodity prices, consumers have become more inclined to make purchases without specific criteria, relying on marketing or on their personal experience and sense of the market. The core function of this application is to tell consumers affected by unstable commodity prices the consumption index precisely and conveniently, helping them easily understand the price index of agricultural, fishery, and dairy products in real time using public data. It also includes an AI chatbot and a voice recognition function, offering the convenience of natural language processing and hands-free use.


Development of an Energy Model of Rice Processing Complex(II) -Simulation Model Development and Analysis of Energy Requirement- (미곡종합처리장의 에너지 모델 개발(II) -시뮬레이션 모델 개발 및 소요 에너지 분석-)

  • 장홍희;장동일;김만수
    • Journal of Biosystems Engineering / v.20 no.3 / pp.275-287 / 1995
  • The rice processing complex (RPC) consists of rice handling, drying, storage, and milling processes. RPCs had been established at 83 locations domestically by April 1994, and 200 more will be built throughout the country. This study was therefore performed with two objectives: 1) to develop mathematical models that can assess the electricity, fuel, and labor requirements of four model RPC systems; and 2) to develop a computer simulation model that produces improved RPC designs from the evaluated energy requirements of the four RPC models. The results are summarized as follows: 1) Mathematical models were developed on the basis of the mass balance analysis and the required power of the machines for each process. 2) A computer simulation model was developed that can produce improved RPC designs from the evaluated energy requirements; it was written in Borland C++. 3) The simulation results showed that total energy requirements ranged from 75.94 kWh/t to 124.30 kWh/t. 4) The computer analysis of energy requirements by drying type found that drying type A (paddy rice (PR) for storage: natural air drying (15%); PR for milling: heated air drying (16%)) required less energy than drying type B (step 1: natural air drying (PR for storage: 18%; PR for milling: 20%); step 2: heated air drying (PR for storage: 15%; PR for milling: 16%)). 5) The energy-efficient drying method is to dry all incoming rough rice at the RPC with natural air drying systems. If the incoming amount exceeds the capacity of the natural air drying system, the surplus rough rice should be dried by the heated air drying method.
