• Title/Summary/Keyword: Entity name recognition

Search Result 21, Processing Time 0.027 seconds

A Study on the Performance Analysis of Entity Name Recognition Techniques Using Korean Patent Literature

  • Gim, Jangwon
    • Journal of Advanced Information Technology and Convergence
    • /
    • v.10 no.2
    • /
    • pp.139-151
    • /
    • 2020
  • Entity name recognition is a part of information extraction that extracts entity names from documents and classifies the types of extracted entity names. Entity name recognition technologies are widely used in natural language processing, such as information retrieval, machine translation, and query response systems. Various deep learning-based models exist to improve entity name recognition performance, but studies that compared and analyzed these models on Korean data are insufficient. In this paper, we compare and analyze the performance of CRF, LSTM-CRF, BiLSTM-CRF, and BERT, which are actively used to identify entity names using Korean data. Also, we compare and evaluate whether embedding models, which are variously used in recent natural language processing tasks, can affect the entity name recognition model's performance improvement. As a result of experiments on patent data and Korean corpus, it was confirmed that the BiLSTM-CRF using FastText method showed the highest performance.

Performance Comparison and Error Analysis of Korean Bio-medical Named Entity Recognition (한국어 생의학 개체명 인식 성능 비교와 오류 분석)

  • Jae-Hong Lee
    • The Journal of the Korea institute of electronic communication sciences
    • /
    • v.19 no.4
    • /
    • pp.701-708
    • /
    • 2024
  • The advent of transformer architectures in deep learning has been a major breakthrough in natural language processing research. Object name recognition is a branch of natural language processing and is an important research area for tasks such as information retrieval. It is also important in the biomedical field, but the lack of Korean biomedical corpora for training has limited the development of Korean clinical research using AI. In this study, we built a new biomedical corpus for Korean biomedical entity name recognition and selected language models pre-trained on a large Korean corpus for transfer learning. We compared the name recognition performance of the selected language models by F1-score and the recognition rate by tag, and analyzed the errors. In terms of recognition performance, KlueRoBERTa showed relatively good performance. The error analysis of the tagging process shows that the recognition performance of Disease is excellent, but Body and Treatment are relatively low. This is due to over-segmentation and under-segmentation that fails to properly categorize entity names based on context, and it will be necessary to build a more precise morphological analyzer and a rich lexicon to compensate for the incorrect tagging.

Bi-directional LSTM-CNN-CRF for Korean Named Entity Recognition System with Feature Augmentation (자질 보강과 양방향 LSTM-CNN-CRF 기반의 한국어 개체명 인식 모델)

  • Lee, DongYub;Yu, Wonhee;Lim, HeuiSeok
    • Journal of the Korea Convergence Society
    • /
    • v.8 no.12
    • /
    • pp.55-62
    • /
    • 2017
  • The Named Entity Recognition system is a system that recognizes words or phrases with object names such as personal name (PS), place name (LC), and group name (OG) in the document as corresponding object names. Traditional approaches to named entity recognition include statistical-based models that learn models based on hand-crafted features. Recently, it has been proposed to construct the qualities expressing the sentence using models such as deep-learning based Recurrent Neural Networks (RNN) and long-short term memory (LSTM) to solve the problem of sequence labeling. In this research, to improve the performance of the Korean named entity recognition system, we used a hand-crafted feature, part-of-speech tagging information, and pre-built lexicon information to augment features for representing sentence. Experimental results show that the proposed method improves the performance of Korean named entity recognition system. The results of this study are presented through github for future collaborative research with researchers studying Korean Natural Language Processing (NLP) and named entity recognition system.

A Method to Solve the Entity Linking Ambiguity and NIL Entity Recognition for efficient Entity Linking based on Wikipedia (위키피디아 기반의 효과적인 개체 링킹을 위한 NIL 개체 인식과 개체 연결 중의성 해소 방법)

  • Lee, Hokyung;An, Jaehyun;Yoon, Jeongmin;Bae, Kyoungman;Ko, Youngjoong
    • Journal of KIISE
    • /
    • v.44 no.8
    • /
    • pp.813-821
    • /
    • 2017
  • Entity Linking find the meaning of an entity mention, which indicate the entity using different expressions, in a user's query by linking the entity mention and the entity in the knowledge base. This task has four challenges, including the difficult knowledge base construction problem, multiple presentation of the entity mention, ambiguity of entity linking, and NIL entity recognition. In this paper, we first construct the entity name dictionary based on Wikipedia to build a knowledge base and solve the multiple presentation problem. We then propose various methods for NIL entity recognition and solve the ambiguity of entity linking by training the support vector machine based on several features, including the similarity of the context, semantic relevance, clue word score, named entity type similarity of the mansion, entity name matching score, and object popularity score. We sequentially use the proposed two methods based on the constructed knowledge base, to obtain the good performance in the entity linking. In the result of the experiment, our system achieved 83.66% and 90.81% F1 score, which is the performance of the NIL entity recognition to solve the ambiguity of the entity linking.

Rule-based Named Entity (NE) Recognition from Speech (음성 자료에 대한 규칙 기반 Named Entity 인식)

  • Kim Ji-Hwan
    • MALSORI
    • /
    • no.58
    • /
    • pp.45-66
    • /
    • 2006
  • In this paper, a rule-based (transformation-based) NE recognition system is proposed. This system uses Brill's rule inference approach. The performance of the rule-based system and IdentiFinder, one of most successful stochastic systems, are compared. In the baseline case (no punctuation and no capitalisation), both systems show almost equal performance. They also have similar performance in the case of additional information such as punctuation, capitalisation and name lists. The performances of both systems degrade linearly with the number of speech recognition errors, and their rates of degradation are almost equal. These results show that automatic rule inference is a viable alternative to the HMM-based approach to NE recognition, but it retains the advantages of a rule-based approach.

  • PDF

Topic conversation performance improvement technology through game domain entity name recognition and deep learning intention classification (게임 도메인 개체명인식과 딥러닝 의도분류를 통한 주제대화 성능향상 기술)

  • Yun, Jae-Min;Jee, Min-Seong;Shin, Dong-Chun;Ko, Yeon-Jeong
    • Proceedings of the Korean Society of Computer Information Conference
    • /
    • 2021.01a
    • /
    • pp.241-242
    • /
    • 2021
  • 대화시스템에서 게임설명요청과 같은 주제대화의 경우, 입력문장의 의도를 정확하게 분류하는 것이 대화시스템 성능과 직결되므로 매우 중요하다. 본 논문에서는 개체명 인식 방법과 머신러닝 방법을 결합한 하이브리드 방법을 제안하여, 머신러닝 방법을 단독으로 사용하는 방법보다 주제대화의 의도 분류 성능을 향상시켰다.

  • PDF

Constructing for Korean Traditional culture Corpus and Development of Named Entity Recognition Model using Bi-LSTM-CNN-CRFs (한국 전통문화 말뭉치구축 및 Bi-LSTM-CNN-CRF를 활용한 전통문화 개체명 인식 모델 개발)

  • Kim, GyeongMin;Kim, Kuekyeng;Jo, Jaechoon;Lim, HeuiSeok
    • Journal of the Korea Convergence Society
    • /
    • v.9 no.12
    • /
    • pp.47-52
    • /
    • 2018
  • Named Entity Recognition is a system that extracts entity names such as Persons(PS), Locations(LC), and Organizations(OG) that can have a unique meaning from a document and determines the categories of extracted entity names. Recently, Bi-LSTM-CRF, which is a combination of CRF using the transition probability between output data from LSTM-based Bi-LSTM model considering forward and backward directions of input data, showed excellent performance in the study of object name recognition using deep-learning, and it has a good performance on the efficient embedding vector creation by character and word unit and the model using CNN and LSTM. In this research, we describe the Bi-LSTM-CNN-CRF model that enhances the features of the Korean named entity recognition system and propose a method for constructing the traditional culture corpus. We also present the results of learning the constructed corpus with the feature augmentation model for the recognition of Korean object names.

A Study on Named Entity Recognition for Effective Dialogue Information Prediction (효율적 대화 정보 예측을 위한 개체명 인식 연구)

  • Go, Myunghyun;Kim, Hakdong;Lim, Heonyeong;Lee, Yurim;Jee, Minkyu;Kim, Wonil
    • Journal of Broadcast Engineering
    • /
    • v.24 no.1
    • /
    • pp.58-66
    • /
    • 2019
  • Recognition of named entity such as proper nouns in conversation sentences is the most fundamental and important field of study for efficient conversational information prediction. The most important part of a task-oriented dialogue system is to recognize what attributes an object in a conversation has. The named entity recognition model carries out recognition of the named entity through the preprocessing, word embedding, and prediction steps for the dialogue sentence. This study aims at using user - defined dictionary in preprocessing stage and finding optimal parameters at word embedding stage for efficient dialogue information prediction. In order to test the designed object name recognition model, we selected the field of daily chemical products and constructed the named entity recognition model that can be applied in the task-oriented dialogue system in the related domain.

A Study on the Identification Method of Security Threat Information Using AI Based Named Entity Recognition Technology (인공지능 기반 개체명 인식 기술을 활용한 보안 위협 정보 식별 방안 연구)

  • Taehyeon Kim;Joon-Hyung Lim;Taeeun Kim;Ieck-chae Euom
    • Journal of the Korea Institute of Information Security & Cryptology
    • /
    • v.34 no.4
    • /
    • pp.577-586
    • /
    • 2024
  • As new technologies are developed, new security threats such as the emergence of AI technologies that create ransomware are also increasing. New security equipment such as XDR has been developed to cope with these security threats, but when using various security equipment together rather than a single security equipment environment, there is a difficulty in creating numerous regular expressions for identifying and classifying essential data. To solve this problem, this paper proposes a method of identifying essential information for identifying threat information by introducing artificial intelligence-based entity name recognition technology in various security equipment usage environments. After analyzing the security equipment log data to select essential information, the storage format of information and the tag list for utilizing artificial intelligence were defined, and the method of identifying and extracting essential data is proposed through entity name recognition technology using artificial intelligence. As a result of various security equipment log data and 23 tag-based entity name recognition tests, the weight average of f1-score for each tag is 0.44 for Bi-LSTM-CRF and 0.99 for BERT-CRF. In the future, we plan to study the process of integrating the regular expression-based threat information identification and extraction method and artificial intelligence-based threat information and apply the process based on new data.

OryzaGP: rice gene and protein dataset for named-entity recognition

  • Larmande, Pierre;Do, Huy;Wang, Yue
    • Genomics & Informatics
    • /
    • v.17 no.2
    • /
    • pp.17.1-17.3
    • /
    • 2019
  • Text mining has become an important research method in biology, with its original purpose to extract biological entities, such as genes, proteins and phenotypic traits, to extend knowledge from scientific papers. However, few thorough studies on text mining and application development, for plant molecular biology data, have been performed, especially for rice, resulting in a lack of datasets available to solve named-entity recognition tasks for this species. Since there are rare benchmarks available for rice, we faced various difficulties in exploiting advanced machine learning methods for accurate analysis of the rice literature. To evaluate several approaches to automatically extract information from gene/protein entities, we built a new dataset for rice as a benchmark. This dataset is composed of a set of titles and abstracts, extracted from scientific papers focusing on the rice species, and is downloaded from PubMed. During the 5th Biomedical Linked Annotation Hackathon, a portion of the dataset was uploaded to PubAnnotation for sharing. Our ultimate goal is to offer a shared task of rice gene/protein name recognition through the BioNLP Open Shared Tasks framework using the dataset, to facilitate an open comparison and evaluation of different approaches to the task.