• Title/Summary/Keyword: Korean Named Entity Recognition

Named Entity and Event Annotation Tool for Cultural Heritage Information Corpus Construction (문화유산정보 말뭉치 구축을 위한 개체명 및 이벤트 부착 도구)

  • Choi, Ji-Ye;Kim, Myung-Keun;Park, So-Young
    • Journal of the Korea Society of Computer and Information / v.17 no.9 / pp.29-38 / 2012
  • In this paper, we propose a named entity and event annotation tool for constructing a cultural heritage information corpus. Focusing on time, location, person, and event, which suit cultural heritage information management, the annotator marks named entities and events with the proposed tool. To make annotation easier, the tool automatically records location information such as the line number and word number, and displays the corresponding string in the raw text in bold italics. To reduce the cost of manual annotation, the tool uses patterns to recognize named entities automatically. Because the training corpus is very small, the tool extracts only simple rule patterns, and to avoid error propagation, these patterns are extracted directly from the raw text without any additional processing. Experimental results show that the proposed tool reduces manual annotation costs by more than half.
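
The abstract does not specify the pattern format. As a minimal sketch, assume the simplest raw-text pattern: an exact surface string taken from already-annotated examples, used to pre-fill annotations together with their line and word numbers. The function names `learn_patterns` and `pre_annotate`, the tag names, and the prefix-matching rule are hypothetical choices for illustration, not the tool's actual design.

```python
from collections import defaultdict


def learn_patterns(annotations):
    """Collect surface-string patterns from a small annotated sample.

    annotations: iterable of (surface_string, tag) pairs copied verbatim from
    the raw text. No morphological analysis is applied, so errors from such a
    step cannot propagate into the patterns.
    """
    tags_by_surface = defaultdict(set)
    for surface, tag in annotations:
        tags_by_surface[surface].add(tag)
    # Keep only unambiguous patterns: a string seen with two tags is dropped.
    return {s: next(iter(t)) for s, t in tags_by_surface.items() if len(t) == 1}


def pre_annotate(raw_text, patterns):
    """Propose (line_no, word_no, surface, tag) tuples, both numbers 1-based."""
    # Try longer patterns first so they win over shorter ones sharing a prefix.
    ordered = sorted(patterns, key=len, reverse=True)
    proposals = []
    for line_no, line in enumerate(raw_text.splitlines(), start=1):
        for word_no, word in enumerate(line.split(), start=1):
            for surface in ordered:
                # Korean attaches particles to the stem, so match word prefixes.
                if word.startswith(surface):
                    proposals.append((line_no, word_no, surface, patterns[surface]))
                    break
    return proposals


patterns = learn_patterns([("경복궁", "LOCATION"), ("세종", "PERSON"), ("1395년", "TIME")])
text = "경복궁은 1395년에 완공되었다.\n세종이 경복궁에서 회의를 열었다."
for proposal in pre_annotate(text, patterns):
    print(proposal)
# (1, 1, '경복궁', 'LOCATION')
# (1, 2, '1395년', 'TIME')
# (2, 1, '세종', 'PERSON')
# (2, 2, '경복궁', 'LOCATION')
```

The annotator would then only confirm or correct these proposals instead of typing each annotation, which is where the cost saving comes from.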

A Method to Solve the Entity Linking Ambiguity and NIL Entity Recognition for efficient Entity Linking based on Wikipedia (위키피디아 기반의 효과적인 개체 링킹을 위한 NIL 개체 인식과 개체 연결 중의성 해소 방법)

  • Lee, Hokyung;An, Jaehyun;Yoon, Jeongmin;Bae, Kyoungman;Ko, Youngjoong
    • Journal of KIISE / v.44 no.8 / pp.813-821 / 2017
  • Entity linking determines the meaning of an entity mention in a user's query (a mention may refer to an entity through different expressions) by linking the mention to an entity in a knowledge base. The task poses four challenges: the difficulty of constructing the knowledge base, the multiple surface forms of an entity mention, the ambiguity of entity linking, and NIL entity recognition. In this paper, we first construct an entity name dictionary based on Wikipedia to build the knowledge base and address the multiple-surface-form problem. We then propose various methods for NIL entity recognition, and resolve entity linking ambiguity by training a support vector machine on several features: context similarity, semantic relevance, clue word score, named entity type similarity of the mention, entity name matching score, and entity popularity score. Applying the two proposed methods sequentially on the constructed knowledge base yields good entity linking performance. In experiments, our system achieved F1 scores of 83.66% for NIL entity recognition and 90.81% for entity linking disambiguation, respectively.
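
The disambiguation step can be pictured as scoring each knowledge-base candidate with a linear function of features, which is the form of a linear SVM's decision function, and answering NIL when no candidate scores above zero. This is a minimal sketch under several assumptions: only three of the listed features are modeled (context similarity as bag-of-words cosine, entity name matching, and popularity), the weights are hand-set placeholders rather than learned ones, and thresholding the decision value is only one possible NIL strategy, not necessarily the paper's. `Candidate`, `link`, and the example entries are hypothetical.

```python
import math
from collections import Counter
from dataclasses import dataclass


@dataclass
class Candidate:
    title: str          # knowledge-base entry, e.g. a Wikipedia title
    context: Counter    # bag of words describing the entry
    popularity: float   # prior in [0, 1], e.g. a normalized inlink count


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def name_match(mention: str, title: str) -> float:
    """1.0 for an exact (case-insensitive) match, 0.5 for containment, else 0.0."""
    m, t = mention.lower(), title.lower()
    if m == t:
        return 1.0
    return 0.5 if m in t or t in m else 0.0


# Hypothetical weights and bias standing in for a trained linear SVM.
WEIGHTS = {"context": 2.0, "name": 1.0, "popularity": 0.5}
BIAS = -1.0


def link(mention: str, query_words: list[str], candidates: list[Candidate]) -> str:
    """Return the best candidate's title, or "NIL" if no decision value is positive."""
    query = Counter(w.lower() for w in query_words)
    best_title, best_score = "NIL", 0.0
    for cand in candidates:
        features = {
            "context": cosine(query, cand.context),
            "name": name_match(mention, cand.title),
            "popularity": cand.popularity,
        }
        score = BIAS + sum(WEIGHTS[k] * v for k, v in features.items())
        if score > best_score:
            best_title, best_score = cand.title, score
    return best_title


apple_inc = Candidate("Apple Inc.", Counter(["iphone", "company", "released"]), 0.9)
apple_fruit = Candidate("Apple", Counter(["fruit", "tree", "sweet"]), 0.6)
print(link("Apple", "apple released a new iphone".split(), [apple_inc, apple_fruit]))  # Apple Inc.
print(link("Apple", "apple pie recipe".split(), [apple_inc]))  # NIL
```

In the second call the only candidate has no context overlap with the query, so its decision value stays below zero and the mention is reported as NIL rather than forced onto a wrong entry.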

The partial matching method for effective recognizing HLA entities (효과적인 HLA개체인식을 위한 부분매칭기법)

  • Chae, Jeong-Min;Jung, Young-Hee;Lee, Tae-Min;Chae, Ji-Eun;Oh, Heung-Bum;Jung, Soon-Young
    • The Journal of Korean Association of Computer Education / v.14 no.2 / pp.83-94 / 2011
  • In the biomedical domain, the longest matching method is frequently used to recognize named entities in the literature. This method uses a dictionary as its resource for named entity recognition. If an appropriate dictionary exists for the target domain, longest matching can recognize the entities of that domain quickly and accurately. However, it has difficulty recognizing enumerated named entities, because some words are frequently omitted when such entities are listed. To resolve this problem, we propose a dictionary-based partial matching method. The proposed method generates several candidate entities on the assumption that ellipses (omitted words) may be present, and then selects the most valid candidate through an optimization algorithm. We tested the longest and partial matching methods on HLA entities (HLA genes, antigens, and alleles), which are frequently enumerated among biomedical entities. To prepare for named entity recognition, we built two new resources for HLA entities, an extended dictionary and a tag-based dictionary, and then applied the longest and partial matching methods with each dictionary. In our experiments, longest matching was effective for HLA antigen entities, in which ellipses are rare, whereas partial matching was effective for HLA gene and allele entities, in which ellipses are frequent. In particular, partial matching achieved a high F-score of 95.59% on HLA alleles.

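Longest matching and partial matching can be contrasted in a small sketch. It assumes whitespace tokens, a dictionary given as a set of entity strings, and one specific kind of ellipsis: an enumerated entity that drops a prefix shared with the entity before it (as in "HLA-DRB1*0401 and *0404"). The paper chooses among candidates with an optimization algorithm that the abstract does not describe; this sketch swaps in a simple rule that prefers the candidate restoring the longest prefix. The function name `recognize` and the example dictionary are hypothetical.

```python
def recognize(tokens, dictionary, max_len=3):
    """Longest matching, with a partial-matching fallback for elided prefixes.

    Returns (token_index, entity_string, method) tuples.
    """
    entities = []
    prev = None  # most recently recognized entity string
    i = 0
    while i < len(tokens):
        # 1) Longest matching: try the widest token window first.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            cand = " ".join(tokens[i:i + n])
            if cand in dictionary:
                entities.append((i, cand, "longest"))
                prev = cand
                i += n
                break
        else:
            # 2) Partial matching: assume this token lost a prefix that the
            # previous entity still carries, and try restoring it, longest first.
            restored = None
            if prev is not None:
                for k in range(len(prev), 0, -1):
                    cand = prev[:k] + tokens[i]
                    if cand in dictionary:
                        restored = cand
                        break
            if restored is not None:
                entities.append((i, restored, "partial"))
                prev = restored
            i += 1
    return entities


hla_dict = {"HLA-DRB1*0401", "HLA-DRB1*0404"}
print(recognize("HLA-DRB1*0401 and *0404 were associated".split(), hla_dict))
# [(0, 'HLA-DRB1*0401', 'longest'), (2, 'HLA-DRB1*0404', 'partial')]
```

Plain longest matching would stop after the first allele, because "*0404" alone is not a dictionary entry; the fallback recovers it by borrowing the prefix "HLA-DRB1" from the preceding entity.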

A Comparative Research on End-to-End Clinical Entity and Relation Extraction using Deep Neural Networks: Pipeline vs. Joint Models (심층 신경망을 활용한 진료 기록 문헌에서의 종단형 개체명 및 관계 추출 비교 연구 - 파이프라인 모델과 결합 모델을 중심으로 -)

  • Sung-Pil Choi
    • Journal of the Korean Society for Library and Information Science / v.57 no.1 / pp.93-114 / 2023
  • Information extraction can facilitate in-depth document analysis by providing semantic triples, which consist of the named entities recognized in a text and the relations between them. However, most research so far has treated named entity recognition and relation extraction as separate studies, so the performance of complete information extraction systems has not been properly evaluated. This paper introduces two end-to-end information extraction models, a pipeline model and a joint model, that extract various named entities in clinical records and their relations as semantic triples, and compares their performance in depth. The pipeline model consists of an entity recognition subsystem based on bidirectional GRU-CRFs and a relation extraction module using a multiple encoding scheme, whereas the joint model is implemented as a single bidirectional GRU-CRF equipped with a multi-head labeling method. In experiments on i2b2/VA 2010, the pipeline model scored 5.5% higher in F-measure. In addition, a comparison with existing state-of-the-art systems that use large-scale neural language models and manually constructed features established the objective performance level of the end-to-end models implemented in this paper.
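
The two designs differ in how entities reach the relation step. In a pipeline, the NER stage's BIO output is decoded into entity spans and each type-compatible pair becomes a relation candidate, so NER errors carry over into relation extraction; a joint model instead predicts entities and relations in one labeling pass. The sketch below covers only this pipeline hand-off, assuming BIO tags from some NER model. The GRU-CRF models and the relation classifier are not shown, and `bio_to_spans`, `relation_candidates`, and `ALLOWED_PAIRS` are hypothetical names.

```python
from itertools import permutations


def bio_to_spans(tags):
    """Convert BIO tags into (start, end_exclusive, entity_type) spans."""
    spans, start, etype = [], None, None
    for i, tag in enumerate(list(tags) + ["O"]):  # sentinel "O" closes a final span
        if tag.startswith("I-") and start is not None and tag[2:] == etype:
            continue  # the current entity continues
        if start is not None:
            spans.append((start, i, etype))
            start, etype = None, None
        if tag != "O":  # "B-x", or a stray "I-x" treated as a new entity
            start, etype = i, tag[2:]
    return spans


# Hypothetical type constraints modeled on the i2b2/VA 2010 relation
# categories: treatment-problem, test-problem, and problem-problem.
ALLOWED_PAIRS = {("treatment", "problem"), ("test", "problem"), ("problem", "problem")}


def relation_candidates(tokens, tags):
    """Ordered entity pairs that a pipeline's relation classifier would label."""
    entities = [(" ".join(tokens[s:e]), t) for s, e, t in bio_to_spans(tags)]
    return [
        (head, tail)
        for head, tail in permutations(entities, 2)
        if (head[1], tail[1]) in ALLOWED_PAIRS
    ]


tokens = "aspirin was given for chest pain".split()
tags = ["B-treatment", "O", "O", "O", "B-problem", "I-problem"]
print(relation_candidates(tokens, tags))
# [(('aspirin', 'treatment'), ('chest pain', 'problem'))]
```

Had the NER stage missed "chest pain", no candidate pair would reach the relation module at all, which is the error-propagation risk the joint design tries to avoid.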

Named Entity Recognition based on CRF reflecting relative weight (상대적 가중치 자질을 반영한 CRF 기반의 개체명 인식)

  • Jeong, Jin-Wook
    • 한국어정보학회:학술대회논문집 (Korean Language Information Society: Conference Proceedings) / 2017.10a / pp.338-339 / 2017
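  • In this paper, we perform classification with a CRF model for named entity recognition. Identifying named entity candidates as actual named entities requires resolving ambiguity. To address this ambiguity, we first select as many named entity candidates as possible by considering patterns and morphological characteristics from the training set, and then, to reduce the ambiguity and raise the accuracy of the selected candidates, we resolve the ambiguity using surrounding contextual features and CRF, a discriminative probabilistic model.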

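The "surrounding contextual features" fed to a CRF are usually produced by a feature template applied to every token. The abstract does not list the paper's features, so the sketch below assumes a generic window template: the token itself, its last two characters (for Korean often a particle), a digit flag, and the neighboring tokens within a fixed window. It emits one feature dict per token, the shape that toolkits such as sklearn-crfsuite take as training input. The candidate-selection step is not shown, and the function names are hypothetical.

```python
def token_features(tokens, i, window=1):
    """Feature dict for token i: the token itself plus its neighbors."""
    tok = tokens[i]
    feats = {
        "bias": 1.0,
        "word": tok,
        "suffix2": tok[-2:],       # last two characters, often a Korean particle
        "is_digit": tok.isdigit(),
    }
    for offset in range(-window, window + 1):
        if offset == 0:
            continue
        j = i + offset
        key = f"word[{offset:+d}]"  # e.g. "word[-1]", "word[+1]"
        if j < 0:
            feats[key] = "<BOS>"
        elif j >= len(tokens):
            feats[key] = "<EOS>"
        else:
            feats[key] = tokens[j]
    return feats


def sentence_features(tokens, window=1):
    """One feature dict per token, in sentence order."""
    return [token_features(tokens, i, window) for i in range(len(tokens))]


for feats in sentence_features(["서울", "에서", "2017"]):
    print(feats)
```

Widening `window` lets the CRF see more context around an ambiguous candidate, at the cost of sparser features on a small training set.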

English-Korean Cross-lingual Link Discovery Using Link Probability and Named Entity Recognition (링크확률과 개체명 인식을 이용한 영-한 교차언어 링크 탐색)

  • Kang, Shin-Jae
    • Journal of the Korean Institute of Intelligent Systems / v.23 no.3 / pp.191-195 / 2013
  • This paper proposes an automatic method for discovering cross-lingual links from English Wikipedia documents to Korean ones, in order to increase connectivity among vast web resources. Unlike existing methods, which only roughly estimate the link probability of phrases, the proposed method selects candidate anchors from English documents using various information, such as title lists and link probabilities extracted from Wikipedia dumps together with named entity recognition results. It then translates the anchors into Korean words and selects the most suitable Korean documents for those words as cross-lingual links. Experimental results showed a MAP of 0.375.
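
The link probability of a phrase is commonly estimated from a Wikipedia dump as the number of times the phrase appears as a link anchor divided by the number of times it appears at all. The sketch below computes that ratio and then selects anchors. It assumes several things the abstract leaves open: named entities are kept regardless of link probability (an OR rule), translation is a plain English-to-Korean dictionary lookup, and a link is made only when a Korean article has exactly the translated title, whereas the paper ranks Korean documents. All function names, counts, and dictionaries are hypothetical.

```python
def link_probability(anchor_counts, occurrence_counts):
    """P(link | phrase) = times the phrase is an anchor / times it occurs."""
    return {
        phrase: anchor_counts.get(phrase, 0) / n
        for phrase, n in occurrence_counts.items()
        if n > 0
    }


def select_anchors(phrases, link_prob, ne_phrases, threshold=0.5):
    """Keep phrases with high link probability, or any phrase NER marked as an entity."""
    return [p for p in phrases if link_prob.get(p, 0.0) >= threshold or p in ne_phrases]


def cross_lingual_links(anchors, en_to_ko, ko_titles):
    """Translate each anchor and link it if a Korean article with that title exists."""
    links = {}
    for anchor in anchors:
        ko = en_to_ko.get(anchor)
        if ko in ko_titles:
            links[anchor] = ko
    return links


lp = link_probability({"Seoul": 80, "the city": 4, "Hangul": 15},
                      {"Seoul": 100, "the city": 400, "Hangul": 20})
anchors = select_anchors(["Seoul", "the city", "Hangul", "Busan"], lp, {"Busan"})
print(anchors)  # ['Seoul', 'Hangul', 'Busan']
print(cross_lingual_links(anchors, {"Seoul": "서울", "Hangul": "한글", "Busan": "부산"},
                          {"서울", "한글"}))  # {'Seoul': '서울', 'Hangul': '한글'}
```

The NER signal matters for phrases like "Busan" in this toy data: it has no anchor statistics, yet it is still proposed as an anchor because it was recognized as an entity.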

Named Entity Recognition Using Bidirectional LSTM CRFs Based on the POS Tag Embedding and the Named Entity Distribution of Syllables (품사 임베딩과 음절 단위 개체명 분포 기반의 Bidirectional LSTM CRFs를 이용한 개체명 인식)

  • Yu, Hongyeon;Ko, Youngjoong
    • 한국어정보학회:학술대회논문집 (Korean Language Information Society: Conference Proceedings) / 2016.10a / pp.105-110 / 2016
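  • Named entity recognition is the task of extracting named entities with unique meanings, such as person names, organization names, location names, times, and dates, from a document and determining their types. In recent named entity recognition research, bidirectional LSTM CRFs have shown the best performance. However, because LSTM-based deep learning models depend on the word representations they receive as input, many studies have investigated ways to extend those input representations. In this paper, we use a bidirectional LSTM CRFs model for Korean named entity recognition, and to extend its input word representation we use pre-trained word embedding vectors, part-of-speech embedding vectors, and word embedding vectors extended from syllables. To extend syllable-level information into word-level embedding vectors, we use a bidirectional LSTM whose input is the named entity distribution extracted from the training data. As a result, performance improved by 4.93% over using pre-trained word embedding vectors alone.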

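The syllable-level named entity distribution can be read as an estimate of P(tag | syllable) counted over the training data. The sketch below builds that table and turns a word into one fixed-order distribution vector per syllable, which is the kind of sequence the paper then feeds to a bidirectional LSTM (not shown) before concatenating the result with the pre-trained word and POS embeddings. It assumes word-level (word, tag) training pairs with "O" for non-entities, and relies on each precomposed (NFC) Hangul syllable being a single Python character. The function names and toy data are hypothetical.

```python
from collections import Counter, defaultdict


def syllable_ne_distribution(tagged_words):
    """Estimate P(tag | syllable) from (word, tag) pairs; non-entities use "O"."""
    counts = defaultdict(Counter)
    for word, tag in tagged_words:
        for syllable in word:  # one precomposed Hangul syllable per character
            counts[syllable][tag] += 1
    dist = {}
    for syllable, tag_counts in counts.items():
        total = sum(tag_counts.values())
        dist[syllable] = {tag: n / total for tag, n in tag_counts.items()}
    return dist


def syllable_features(word, dist, tags):
    """One vector per syllable in a fixed tag order; unseen syllables get zeros."""
    return [[dist.get(s, {}).get(t, 0.0) for t in tags] for s in word]


train = [("서울", "LOC"), ("서울대학교", "ORG"), ("김철수", "PER"), ("학교", "O")]
dist = syllable_ne_distribution(train)
print(dist["서"])  # {'LOC': 0.5, 'ORG': 0.5}
print(syllable_features("서학", dist, ["PER", "LOC", "ORG", "O"]))
# [[0.0, 0.5, 0.5, 0.0], [0.0, 0.0, 0.5, 0.5]]
```

Because these vectors come from tag statistics rather than from the word itself, they still carry a signal for words that the pre-trained embedding has never seen.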

Token-Based Classification and Dataset Construction for Detecting Modified Profanity (변형된 비속어 탐지를 위한 토큰 기반의 분류 및 데이터셋)

  • Sungmin Ko;Youhyun Shin
    • The Transactions of the Korea Information Processing Society / v.13 no.4 / pp.181-188 / 2024
  • Traditional profanity detection methods have difficulty identifying intentionally altered profanity. This paper introduces a new method based on named entity recognition, a subfield of natural language processing. We developed a profanity detection technique using sequence labeling; for this, we constructed a dataset by labeling some of the profanities in Korean malicious comments and conducted experiments. To further improve the model's performance, we augmented the dataset by using ChatGPT, a large language model, to label parts of a Korean hate speech dataset, and trained on the result. In this process, we confirmed that human filtering of the LLM-generated data alone could improve performance. This suggests that human oversight is still necessary when augmenting datasets.
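
Framing profanity detection as NER means each labeled profanity span becomes a BIO-tagged entity. The paper labels "tokens" without saying which tokenizer it uses; the sketch below assumes character-level tokens, since altered profanity often inserts symbols mid-word, and a hypothetical label name `PROF`. It converts labeled character spans into training labels and masks the predicted spans. A neutral placeholder word stands in for real profanity.

```python
def spans_to_bio(text, spans, label="PROF"):
    """Character-level BIO labels for (start, end_exclusive) spans."""
    tags = ["O"] * len(text)
    for start, end in spans:
        tags[start] = f"B-{label}"
        for i in range(start + 1, end):
            tags[i] = f"I-{label}"
    return list(zip(text, tags))


def mask(labelled):
    """Replace every character tagged as part of a span with '*'."""
    return "".join("*" if tag != "O" else ch for ch, tag in labelled)


text = "what a d*mn day"
start = text.index("d*mn")
labelled = spans_to_bio(text, [(start, start + len("d*mn"))])
print(labelled[7:11])
# [('d', 'B-PROF'), ('*', 'I-PROF'), ('m', 'I-PROF'), ('n', 'I-PROF')]
print(mask(labelled))  # what a **** day
```

Unlike a word-list filter, a sequence labeler trained on such data can tag a span whose exact spelling it has never seen, which is the motivation the abstract gives for the NER framing.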

A Study on Automatic Discovery and Summarization Method of Battlefield Situation Related Documents using Natural Language Processing and Collaborative Filtering (자연어 처리 및 협업 필터링 기반의 전장상황 관련 문서 자동탐색 및 요약 기법연구)

  • Kunyoung Kim;Jeongbin Lee;Mye Sohn
    • Journal of Internet Computing and Services / v.24 no.6 / pp.127-135 / 2023
  • With the development of information and communication technology, the amount of information produced and shared on the battlefield, and stored and managed in systems, has increased dramatically. This means that more information is available to support commanders' situational awareness and decision making, but it also hinders rapid decision making by increasing commanders' information overload. To overcome this limitation, this study proposes a method that automatically searches for, selects, and summarizes documents that can help commanders understand the battlefield situation reports they receive. First, named entities are discovered in the battlefield situation report using a named entity recognition method. Second, documents related to each named entity are discovered. Third, a language model and collaborative filtering are used to select documents: the language model calculates the similarity between the received report and the discovered documents, and collaborative filtering reflects the commander's document reading history. Finally, sentences containing each named entity are selected from the documents and sorted. The experiment used academic papers, because their characteristics are similar to those of military documents, and the validity of the proposed method was verified.
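
The selection and summarization steps can be sketched as one score that blends content similarity with a collaborative-filtering signal, followed by extraction of the sentences that mention an entity. This is a minimal sketch under loudly stated assumptions: bag-of-words cosine replaces the paper's language-model similarity, the CF term is a simple user-based score (other commanders' reads weighted by the Jaccard similarity of reading histories), the weight `alpha` is arbitrary and the CF term is not normalized, and sentences are split naively on periods. All names and data are hypothetical.

```python
import math
from collections import Counter


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def cf_score(doc_id, user, history):
    """Sum, over other users who read doc_id, of their Jaccard similarity to user."""
    mine = history.get(user, set())
    score = 0.0
    for other, docs in history.items():
        if other == user or doc_id not in docs:
            continue
        score += len(mine & docs) / len(mine | docs)  # union is non-empty here
    return score


def rank_documents(report, docs, user, history, alpha=0.5):
    """docs: {doc_id: text}. Score = alpha * similarity + (1 - alpha) * CF."""
    rep = Counter(report.lower().split())
    scores = {
        d: alpha * cosine(rep, Counter(text.lower().split()))
        + (1 - alpha) * cf_score(d, user, history)
        for d, text in docs.items()
    }
    return sorted(scores, key=scores.get, reverse=True)


def entity_sentences(text, entity):
    """Sentences (split on '.') that mention the entity."""
    return [s.strip() for s in text.split(".") if entity in s]


docs = {"d1": "tanks crossed the river",
        "d2": "supply convoy delayed",
        "d3": "river crossing by tanks observed"}
history = {"cmdA": {"d1"}, "cmdB": {"d1", "d3"}, "cmdC": {"d2"}}
print(rank_documents("enemy tanks crossed the river at dawn", docs, "cmdA", history))
# ['d1', 'd3', 'd2']
print(entity_sentences("Tanks crossed the river. Supply was delayed. The river was cold.", "river"))
# ['Tanks crossed the river', 'The river was cold']
```

Here "d3" gains from commander B, whose reading history overlaps commander A's; with `alpha=1.0` the ranking would rest on content similarity alone.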

A Collaborative Framework for Discovering the Organizational Structure of Social Networks Using NER Based on NLP (NLP기반 NER을 이용해 소셜 네트워크의 조직 구조 탐색을 위한 협력 프레임 워크)

  • Elijorde, Frank I.;Yang, Hyun-Ho;Lee, Jae-Wan
    • Journal of Internet Computing and Services / v.13 no.2 / pp.99-108 / 2012
  • Many methods have been developed to improve the accuracy of extracting information from vast amounts of data. This paper combines several natural language processing methods, such as NER (named entity recognition), sentence extraction, and part-of-speech tagging, to carry out text analysis. The data source consists of texts obtained from the web by a domain-specific data extraction agent. Using these natural language processing methods, we developed a framework for extracting information from unstructured data, and we simulated its performance in extracting and analyzing texts to detect organizational structures. The simulation shows that our approach outperformed other NER classifiers, such as MUC and CoNLL, in information extraction.