• Title/Summary/Keyword: Korean Named Entity Recognition

Named Entity Recognition Using Distant Supervision and Active Bagging (원거리 감독과 능동 배깅을 이용한 개체명 인식)

  • Lee, Seong-hee;Song, Yeong-kil;Kim, Hark-soo
    • Journal of KIISE / v.43 no.2 / pp.269-274 / 2016
  • Named entity recognition is the process of extracting named entities from sentences and determining their categories. Previous studies on named entity recognition have relied primarily on supervised learning. Supervised learning requires a large training corpus manually annotated with named entity categories, and constructing such a corpus is time-consuming and labor-intensive. We propose a semi-supervised learning method that minimizes the cost of training corpus construction and rapidly improves named entity recognition performance. The proposed method uses distant supervision to construct the initial training corpus. It then effectively removes noisy sentences from the initial training corpus through active bagging, an ensemble method that combines bagging and active learning. In the experiments, the proposed method improved the F1-score of named entity recognition from 67.36% to 76.42% after 15 rounds of active bagging.
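
The abstract does not say how the distant supervision step is implemented. As a minimal sketch of the general idea, the snippet below tags tokens by longest-match lookup in an entity dictionary and emits BIO labels. The function name `distant_label`, the dictionary format, and the category names are assumptions made for illustration, not details from the paper.

```python
def distant_label(tokens, gazetteer):
    """Assign BIO tags by longest-match lookup.

    `gazetteer` maps a tuple of tokens to a category string
    (hypothetical format chosen for this sketch).
    """
    max_len = max((len(key) for key in gazetteer), default=0)
    tags = ["O"] * len(tokens)
    i = 0
    while i < len(tokens):
        matched = False
        # Try the longest span first so multi-token names win over their parts.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            key = tuple(tokens[i:i + n])
            if key in gazetteer:
                category = gazetteer[key]
                tags[i] = "B-" + category
                for j in range(i + 1, i + n):
                    tags[j] = "I-" + category
                i += n
                matched = True
                break
        if not matched:
            i += 1
    return tags


example_gazetteer = {("Kim", "Hark-soo"): "PER", ("Seoul",): "LOC"}
example_tags = distant_label(["Kim", "Hark-soo", "visited", "Seoul"], example_gazetteer)
```

Labels produced this way are noisy, because a dictionary match ignores context. That noise is what a later filtering step, such as the active bagging the abstract describes, would need to reduce.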

Named Entity Recognition and Dictionary Construction for Korean Title: Books, Movies, Music and TV Programs (한국어 제목 개체명 인식 및 사전 구축: 도서, 영화, 음악, TV프로그램)

  • Park, Yongmin;Lee, Jae Sung
    • KIPS Transactions on Software and Data Engineering / v.3 no.7 / pp.285-292 / 2014
  • Named entity recognition is used to improve the performance of information retrieval systems, question answering systems, machine translation systems, and so on. The targets of named entity recognition are usually PLOs (persons, locations, and organizations). These are usually proper nouns or unregistered words, and traditional named entity recognizers use these characteristics to find named entity candidates. The titles of books, movies, and TV programs have different characteristics from PLO entities. They sometimes consist of multiple phrases, a whole sentence, or special characters, which makes it difficult to find named entity candidates. In this paper we propose a method to quickly extract title named entities from news articles and automatically build a named entity dictionary for the titles. For candidate identification, phrases enclosed in special symbols in a sentence are first extracted and then verified by an SVM using feature words and their distances. For classification of the extracted title candidates, an SVM is used with the mutual information of word contexts.
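
The first step described above, pulling out phrases enclosed in special symbols, can be sketched with regular expressions. The abstract does not list which symbols are used, so the bracket pairs in `BRACKET_PAIRS` are an assumption, and the SVM verification step is not shown.

```python
import re

# Assumed bracket pairs; the paper's actual symbol set is not given in the abstract.
BRACKET_PAIRS = [("《", "》"), ("〈", "〉"), ("「", "」"), ("『", "』"), ("'", "'"), ('"', '"')]


def extract_title_candidates(sentence):
    """Return phrases enclosed by any bracket pair, in order of appearance."""
    candidates = []
    for open_sym, close_sym in BRACKET_PAIRS:
        # Non-greedy match so each pair captures the shortest enclosed phrase.
        pattern = re.escape(open_sym) + r"(.+?)" + re.escape(close_sym)
        for match in re.finditer(pattern, sentence):
            candidates.append((match.start(), match.group(1)))
    return [text for _, text in sorted(candidates)]


example_candidates = extract_title_candidates("He read 《토지》 and watched '기생충' last week.")
```

Every phrase returned here is only a candidate. Quotation marks also enclose ordinary quotes, which is why a verification classifier is needed afterwards.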

Constructing for Korean Traditional culture Corpus and Development of Named Entity Recognition Model using Bi-LSTM-CNN-CRFs (한국 전통문화 말뭉치구축 및 Bi-LSTM-CNN-CRF를 활용한 전통문화 개체명 인식 모델 개발)

  • Kim, GyeongMin;Kim, Kuekyeng;Jo, Jaechoon;Lim, HeuiSeok
    • Journal of the Korea Convergence Society / v.9 no.12 / pp.47-52 / 2018
  • Named entity recognition extracts entity names that can have a unique meaning, such as persons (PS), locations (LC), and organizations (OG), from a document and determines the categories of the extracted names. Recently, Bi-LSTM-CRF has shown excellent performance in deep-learning-based named entity recognition. It combines a Bi-LSTM, which considers the forward and backward directions of the input, with a CRF that uses the transition probabilities between output labels. Models that use CNN and LSTM to create embedding vectors efficiently at the character and word level have also performed well. In this research, we describe the Bi-LSTM-CNN-CRF model that augments the features of the Korean named entity recognition system and propose a method for constructing a traditional culture corpus. We also present the results of training the feature augmentation model on the constructed corpus for Korean named entity recognition.

Named Entity Recognition with Structural SVMs and Pegasos algorithm (Structural SVMs 및 Pegasos 알고리즘을 이용한 한국어 개체명 인식)

  • Lee, Chang-Ki;Jang, Myun-Gil
    • Korean Journal of Cognitive Science / v.21 no.4 / pp.655-667 / 2010
  • The named entity recognition task is one of the most important subtasks in information extraction. In this paper, we describe a Korean named entity recognition method using structural Support Vector Machines (structural SVMs) and a modified Pegasos algorithm. Using the proposed approach, we achieved F1 scores of 85.43% and 86.79% for 15 named entity types on the TV domain and the sports domain, respectively. Moreover, we reduced the training time to 4% of that required by Conditional Random Fields (CRFs) without loss of performance.
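
For readers unfamiliar with Pegasos, the sketch below shows the standard stochastic sub-gradient update for a binary linear SVM with step size 1/(λt). It is not the modified algorithm or the structural SVM formulation from the paper, whose details the abstract does not give. The function names and default hyperparameters are assumptions.

```python
import random


def pegasos_train(samples, labels, lam=0.1, n_iters=1000, seed=0):
    """Train a linear SVM (no bias term) with the basic Pegasos update.

    `labels` must be +1 or -1. The hyperparameter defaults are illustrative only.
    """
    rng = random.Random(seed)
    w = [0.0] * len(samples[0])
    for t in range(1, n_iters + 1):
        i = rng.randrange(len(samples))
        x, y = samples[i], labels[i]
        eta = 1.0 / (lam * t)  # step size decays as 1 / (lambda * t)
        margin = y * sum(wj * xj for wj, xj in zip(w, x))
        # Shrink step from the L2 regulariser.
        w = [(1.0 - eta * lam) * wj for wj in w]
        # Hinge-loss step, applied only when the margin is violated.
        if margin < 1.0:
            w = [wj + eta * y * xj for wj, xj in zip(w, x)]
    return w


def predict(w, x):
    """Return +1 or -1 according to the sign of the linear score."""
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) >= 0 else -1


train_x = [(2.0, 1.0), (1.0, 2.0), (-2.0, -1.0), (-1.0, -2.0)]
train_y = [1, 1, -1, -1]
weights = pegasos_train(train_x, train_y)
```

Each iteration touches a single training example, which is the main reason this family of solvers trains quickly on large corpora.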

A Study on Named Entity Recognition for Effective Dialogue Information Prediction (효율적 대화 정보 예측을 위한 개체명 인식 연구)

  • Go, Myunghyun;Kim, Hakdong;Lim, Heonyeong;Lee, Yurim;Jee, Minkyu;Kim, Wonil
    • Journal of Broadcast Engineering / v.24 no.1 / pp.58-66 / 2019
  • Recognizing named entities such as proper nouns in conversational sentences is the most fundamental and important area of study for efficient conversational information prediction. The most important part of a task-oriented dialogue system is recognizing what attributes an object in a conversation has. The named entity recognition model recognizes named entities in a dialogue sentence through preprocessing, word embedding, and prediction steps. This study aims to use a user-defined dictionary in the preprocessing stage and to find optimal parameters in the word embedding stage for efficient dialogue information prediction. To test the designed named entity recognition model, we selected the field of daily chemical products and constructed a named entity recognition model that can be applied in a task-oriented dialogue system in the related domain.

HMM-based Korean Named Entity Recognition (HMM에 기반한 한국어 개체명 인식)

  • Hwang, Yi-Gyu;Yun, Bo-Hyun
    • The KIPS Transactions:PartB / v.10B no.2 / pp.229-236 / 2003
  • Named entity recognition is a process indispensable to question answering and information extraction systems. This paper presents an HMM-based named entity (NE) recognition method using the construction principles of compound words. In Korean, many named entities can be decomposed into more than one word. Moreover, there are contextual relationships among the nouns in an NE, and between an NE and its surrounding words. In this paper, we classify each word as an NE in itself, a word in an NE, and/or a word adjacent to an NE, and train an HMM based on NE-related word types and parts of speech. The proposed named entity recognition (NER) system uses a trigram HMM to handle the variable length of NEs. However, the trigram HMM has a serious data sparseness problem. To solve this problem, we use multi-level back-offs. Experimental results show that our NER system can achieve an F-measure of 87.6% on economic articles.
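
The abstract does not specify the back-off levels or any discounting scheme. As a rough illustration of how backing off eases trigram sparseness, the sketch below falls back from trigram to bigram to unigram relative frequencies. It applies no discounting, so the values are scores rather than a properly normalised distribution. All names are hypothetical.

```python
from collections import Counter


def build_counts(tag_sequences):
    """Count unigrams, bigrams, and trigrams over sequences of state labels."""
    uni, bi, tri = Counter(), Counter(), Counter()
    for seq in tag_sequences:
        for i, tag in enumerate(seq):
            uni[tag] += 1
            if i >= 1:
                bi[(seq[i - 1], tag)] += 1
            if i >= 2:
                tri[(seq[i - 2], seq[i - 1], tag)] += 1
    return uni, bi, tri


def backoff_score(t1, t2, t3, uni, bi, tri):
    """Score t3 given the context (t1, t2), backing off when counts are zero."""
    if tri[(t1, t2, t3)] > 0:
        # Trigram level: relative frequency against the bigram context count.
        return tri[(t1, t2, t3)] / bi[(t1, t2)]
    if bi[(t2, t3)] > 0:
        # Bigram level: drop the oldest context symbol.
        return bi[(t2, t3)] / uni[t2]
    # Unigram level: ignore the context entirely.
    return uni[t3] / sum(uni.values())


uni, bi, tri = build_counts([["B", "I", "O"], ["B", "O", "O"]])
seen_trigram = backoff_score("B", "I", "O", uni, bi, tri)
bigram_fallback = backoff_score("O", "B", "I", uni, bi, tri)
unigram_fallback = backoff_score("O", "O", "B", uni, bi, tri)
```

A production model would normally add discounting and back-off weights so that the probabilities for each context sum to one.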

Towards Effective Entity Extraction of Scientific Documents using Discriminative Linguistic Features

  • Hwang, Sangwon;Hong, Jang-Eui;Nam, Young-Kwang
    • KSII Transactions on Internet and Information Systems (TIIS) / v.13 no.3 / pp.1639-1658 / 2019
  • Named entity recognition (NER) is an important technique for improving the performance of data mining and big data analytics. In previous studies, NER systems have identified named entities using statistical methods based on prior information or linguistic features; however, such methods are limited in that they cannot recognize unregistered or unlearned objects. In this paper, a method is proposed to extract objects, such as technologies, theories, or person names, by analyzing the collocation relationship between certain words that simultaneously appear around specific words in the abstracts of academic journals. The method is executed as follows. First, the data is preprocessed using data cleaning and sentence detection to separate the text into single sentences. Then, part-of-speech (POS) tagging is applied to the individual sentences. After this, the appearance and collocation information of the other POS tags is analyzed, excluding the entity candidates, such as nouns. Finally, an entity recognition model is created based on analyzing and classifying the information in the sentences.

Bi-directional LSTM-CNN-CRF for Korean Named Entity Recognition System with Feature Augmentation (자질 보강과 양방향 LSTM-CNN-CRF 기반의 한국어 개체명 인식 모델)

  • Lee, DongYub;Yu, Wonhee;Lim, HeuiSeok
    • Journal of the Korea Convergence Society / v.8 no.12 / pp.55-62 / 2017
  • A named entity recognition system recognizes words or phrases in a document that are named entities, such as personal names (PS), place names (LC), and group names (OG), and labels them with the corresponding entity type. Traditional approaches to named entity recognition include statistical models trained on hand-crafted features. Recently, deep-learning models such as Recurrent Neural Networks (RNN) and long short-term memory (LSTM) have been proposed to construct the features representing a sentence in order to solve the sequence labeling problem. In this research, to improve the performance of the Korean named entity recognition system, we used hand-crafted features, part-of-speech tagging information, and pre-built lexicon information to augment the features representing a sentence. Experimental results show that the proposed method improves the performance of the Korean named entity recognition system. The results of this study are made available on GitHub for future collaborative research with researchers studying Korean Natural Language Processing (NLP) and named entity recognition systems.

Re-defining Named Entity Type for Personal Information De-identification and A Generation method of Training Data (개인정보 비식별화를 위한 개체명 유형 재정의와 학습데이터 생성 방법)

  • Choi, Jae-hoon;Cho, Sang-hyun;Kim, Min-ho;Kwon, Hyuk-chul
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference / 2022.05a / pp.206-208 / 2022
  • As the big data industry has grown significantly in recent years, concern about privacy violations caused by personal information leakage has increased. There have been attempts to automate de-identification through named entity recognition in natural language processing. In this paper, named entity recognition data is constructed semi-automatically by identifying sentences that contain de-identification information in Korean Wikipedia. Compared with using general named entity recognition data, this can reduce the cost of learning about information that is not subject to de-identification. It also has the advantage of minimizing the additional rule-based and statistical systems needed to classify de-identification information in the output. The named entity recognition data proposed in this paper is classified into twelve categories, which include de-identification information such as medical records and family relationships. In the experiment using the generated dataset, KoELECTRA showed a performance of 0.87796 and RoBERTa of 0.88.

Rule-based Named Entity (NE) Recognition from Speech (음성 자료에 대한 규칙 기반 Named Entity 인식)

  • Kim Ji-Hwan
    • MALSORI / no.58 / pp.45-66 / 2006
  • In this paper, a rule-based (transformation-based) NE recognition system is proposed. This system uses Brill's rule inference approach. The performance of the rule-based system is compared with that of IdentiFinder, one of the most successful stochastic systems. In the baseline case (no punctuation and no capitalisation), both systems show almost equal performance. They also perform similarly when given additional information such as punctuation, capitalisation, and name lists. The performance of both systems degrades linearly with the number of speech recognition errors, and their rates of degradation are almost equal. These results show that automatic rule inference is a viable alternative to the HMM-based approach to NE recognition while retaining the advantages of a rule-based approach.
