• Title/Summary/Keyword: entity name

A Study on the Performance Analysis of Entity Name Recognition Techniques Using Korean Patent Literature

  • Gim, Jangwon
    • Journal of Advanced Information Technology and Convergence / v.10 no.2 / pp.139-151 / 2020
  • Entity name recognition is a part of information extraction that extracts entity names from documents and classifies their types. Entity name recognition technologies are widely used in natural language processing tasks such as information retrieval, machine translation, and question answering. Various deep learning-based models have been proposed to improve entity name recognition performance, but few studies have compared and analyzed these models on Korean data. In this paper, we use Korean data to compare and analyze the performance of CRF, LSTM-CRF, BiLSTM-CRF, and BERT, which are actively used for entity name recognition. We also evaluate whether the embedding models widely used in recent natural language processing tasks improve the performance of entity name recognition models. Experiments on patent data and a Korean corpus confirmed that BiLSTM-CRF with FastText embeddings achieved the highest performance.

Topic conversation performance improvement technology through game domain entity name recognition and deep learning intention classification (게임 도메인 개체명인식과 딥러닝 의도분류를 통한 주제대화 성능향상 기술)

  • Yun, Jae-Min;Jee, Min-Seong;Shin, Dong-Chun;Ko, Yeon-Jeong
    • Proceedings of the Korean Society of Computer Information Conference / 2021.01a / pp.241-242 / 2021
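  • In a dialogue system, for topic conversations such as requests for a game description, accurately classifying the intent of the input sentence is very important because it directly determines the performance of the dialogue system. In this paper, we propose a hybrid method that combines entity name recognition with a machine learning method, and show that it improves intent classification performance for topic conversations compared with using the machine learning method alone.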

An Introduction to the Study of the Outlook on Highest Ruling Entity in Daesoonjinrihoe (I) - Focusing on Descriptions for Highest Ruling Entity and Its Meanings - (대순진리회 상제관 연구 서설 (I) - 최고신에 대한 표현들과 그 의미들을 중심으로 -)

  • Cha, Seon-keun
    • Journal of the Daesoon Academy of Sciences / v.21 / pp.99-156 / 2013
  • This paper reviews research trends on faith in Daesoonjinrihoe and their points of controversy, and considers the outlook on Sangje, defined here as the theological understanding and explanation of Gu-Cheon-Sang-Je (the Highest Ruling Entity, who is the object of devotion in Daesoonjinrihoe). As a first introduction to this work, the various descriptions of Sangje are arranged and their meanings are analyzed. In brief: First, the name Gu-Cheon-Eung-Won-Nweh-Seong-Bo-Hwa-Cheon-Jon expresses the authority of Sangje (the Supreme Entity) through a spatial concept, namely that Sangje dwells in the Ninth Heaven. This can be compared with the doctrines that Allah in Islam and Jehovah in Christianity each dwell in the Seventh Heaven. The name also shows that Sangje is the ruler who reigns over the universe by using yin and yang. Second, the name Gu-Cheon-Eung-Won-Nweh-Seong-Bo-Hwa-Cheon-Jon appears to be imported from Chinese Taoism because it is found in Ok-Chu-Gyeong (the Gaoshang shenlei yushu). In fact, however, its root is in Korea, because the source of the name lies in Buyeo and Goguryeo, the ancient Korean nations. While the name does not denote the Supreme Entity in Chinese Taoism, it does in Daesoonjinrihoe. This is an important difference. Third, arbitrarily or not, the name Gu-Cheon-Eung-Won-Nweh-Seong-Bo-Hwa-Cheon-Jon carries the image of 'resolution of grievances'. The reason is that many people in Korea and China have called on the name for about 1,000 years to improve their fortunes and escape predicaments. Fourth, not only Gu-Cheon-Eung-Won-Nweh-Seong-Bo-Hwa-Cheon-Jon but also the names Three Pure Ones and Ok-Cheon-Jin-Wang (Yuqingzhenwang) from Chinese Taoism are used for the Highest Ruling Entity in Daesoonjinrihoe. However, the relations among the Three Pure Ones, Ok-Cheon-Jin-Wang, and Gu-Cheon-Eung-Won-Nweh-Seong-Bo-Hwa-Cheon-Jon in Daesoonjinrihoe differ from those in Chinese Taoism. Fifth, Sangje is associated with Tae-Eul, the divinity of Polaris, a view of God in Oriental cosmology. The description Tae-Eul, like Gu-Cheon-Eung-Won-Nweh-Seong-Bo-Hwa-Cheon-Jon, indicates that Sangje is linked to the faith of Buyeo and Goguryeo. Sixth, Sangje is not only Mugeuk-Sin (the God of the Endless), who supervises the Endless, but also Taegeuk-Ji-Cheon-Jon (the God of the Ultimate Reality), who supervises the Ultimate Reality. These descriptions directly show that Sangje is a creator. Seventh, explaining Sangje requires a perspective that encompasses both views: that Sangje 'was' a Hidden God (deus otiosus) and 'is' an Unhidden God after the Incarnation. Eighth, Sangje is Cheon-Ju as in Donghak, but differs from it. Cheon-Ju in Donghak holds transcendence and immanence in taut tension, whereas Cheon-Ju in Daesoonjinrihoe emphasizes transcendence over immanence. This difference results from the fact that Cheon-Ju in Donghak was a being who gave revelation to a man, whereas Cheon-Ju in Daesoonjinrihoe was a being who incarnated after giving revelation to a man. Ninth, Sangje is Gae-Byeok-Jang, the manager who transforms and orders the Three Realms of the World by the Great Do, which is the mutual beneficence of all life, and Hae-Won-Sin, the God of the resolution of grievances.

A Method to Solve the Entity Linking Ambiguity and NIL Entity Recognition for efficient Entity Linking based on Wikipedia (위키피디아 기반의 효과적인 개체 링킹을 위한 NIL 개체 인식과 개체 연결 중의성 해소 방법)

  • Lee, Hokyung;An, Jaehyun;Yoon, Jeongmin;Bae, Kyoungman;Ko, Youngjoong
    • Journal of KIISE / v.44 no.8 / pp.813-821 / 2017
  • Entity linking finds the meaning of an entity mention, which may refer to an entity using different expressions, in a user's query by linking the mention to the corresponding entity in a knowledge base. This task has four challenges: the difficulty of constructing a knowledge base, multiple representations of an entity mention, ambiguity of entity linking, and NIL entity recognition. In this paper, we first construct an entity name dictionary based on Wikipedia to build a knowledge base and solve the multiple representation problem. We then propose various methods for NIL entity recognition and resolve the ambiguity of entity linking by training a support vector machine on several features, including context similarity, semantic relevance, clue word score, named entity type similarity of the mention, entity name matching score, and object popularity score. We apply the two proposed methods sequentially on the constructed knowledge base to obtain good entity linking performance. In the experiments, our system achieved F1 scores of 83.66% and 90.81% for NIL entity recognition and entity linking disambiguation.

A Study on the Description of Archives Name by Controlled Access Point in Ontology (기록물 생산기관명 접근점 제어 온톨로지 기술에 관한 연구)

  • Kang, Hyen Min
    • Journal of the Korean Society for Information Management / v.35 no.3 / pp.147-164 / 2018
  • This study defined the name of a records-producing institution as the unique preferred form of access point, sharing the same identity and entity, by using the Standard Administration Code. It also described the various name forms of a records-producing institution as formal-name forms of access point that share the same identity and entity. This makes it possible to identify and access all records produced by an institution with the same identity and entity. The ontology-based mechanism designed in this study would reinforce 'the principle of provenance' and 'respect for original order', and would improve user satisfaction with the usability of archives and with expanded retrieval results.

Bi-directional LSTM-CNN-CRF for Korean Named Entity Recognition System with Feature Augmentation (자질 보강과 양방향 LSTM-CNN-CRF 기반의 한국어 개체명 인식 모델)

  • Lee, DongYub;Yu, Wonhee;Lim, HeuiSeok
    • Journal of the Korea Convergence Society / v.8 no.12 / pp.55-62 / 2017
  • A named entity recognition system recognizes words or phrases in a document that denote named entities, such as personal names (PS), place names (LC), and group names (OG), and labels them with the corresponding entity type. Traditional approaches to named entity recognition include statistical models trained on hand-crafted features. Recently, deep learning models such as Recurrent Neural Networks (RNN) and long short-term memory (LSTM) have been proposed to construct the features that represent a sentence and thereby solve the sequence labeling problem. In this research, to improve the performance of a Korean named entity recognition system, we used hand-crafted features, part-of-speech tagging information, and pre-built lexicon information to augment the features that represent a sentence. Experimental results show that the proposed method improves the performance of the Korean named entity recognition system. The results of this study are released on GitHub for future collaborative research with researchers studying Korean Natural Language Processing (NLP) and named entity recognition systems.

KONG-DB: Korean Novel Geo-name DB & Search and Visualization System Using Dictionary from the Web (KONG-DB: 웹 상의 어휘 사전을 활용한 한국 소설 지명 DB, 검색 및 시각화 시스템)

  • Park, Sung Hee
    • Journal of the Korean Society for Information Management / v.33 no.3 / pp.321-343 / 2016
  • This study aimed to design a semi-automatic web-based pilot system 1) to build a Korean novel geo-name database, 2) to update the database using automatic geo-name extraction so that it is scalable, and 3) to retrieve and visualize the usage of an old geo-name on a map. In particular, extracting geo-names from novels is difficult when the names are now obsolete, because obtaining a corpus to use as a training dataset is burdensome. To build a training corpus, an admin tool consisting of an HTML crawler and parser written in Python crawled geo-names and their usages from a vocabulary dictionary for the Korean New Novel, collecting enough data to train a named entity tagger that can extract even novel geo-names that do not appear in the training corpus. With the training corpus and the automatic extraction tool, the geo-name database was made scalable. In addition, the system can visualize a geo-name on the map. This study also designed and implemented the prototype and empirically verified the validity of the pilot system. Lastly, items to be improved are discussed.

Korean-Chinese Person Name Translation for Cross Language Information Retrieval

  • Wang, Yu-Chun;Lee, Yi-Hsun;Lin, Chu-Cheng;Tsai, Richard Tzong-Han;Hsu, Wen-Lian
    • Proceedings of the Korean Society for Language and Information Conference / 2007.11a / pp.489-497 / 2007
  • Named entity translation plays an important role in many applications, such as information retrieval and machine translation. In this paper, we focus on translating person names, the most common type of named entity in Korean-Chinese cross-language information retrieval (KCIR). Unlike other languages, Chinese uses characters (ideographs), which makes person name translation difficult because one syllable may map to several Chinese characters. We propose an effective hybrid person name translation method to improve the performance of KCIR. First, we use Wikipedia as a translation tool based on the inter-language links between the Korean edition and the Chinese or English editions. Second, we adopt the Naver people search engine to find the query name's Chinese or English translation. Third, we extract Korean-English transliteration pairs from Google snippets, and then search for the English-Chinese transliteration in the database of Taiwan's Central News Agency or in Google. The performance of KCIR using our method is over five times better than that of a dictionary-based system. The mean average precision is 0.3490 and the average recall is 0.7534. The method handles the translation of Chinese, Japanese, Korean, and non-CJK person names from Korean to Chinese. Hence, it substantially improves the performance of KCIR.
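
For readers unfamiliar with the two retrieval metrics quoted in this abstract, the sketch below shows how mean average precision and recall are commonly computed. It is a minimal illustration of the standard definitions, not the authors' evaluation code; the function names and the document IDs in the usage note are hypothetical.

```python
def average_precision(ranked_ids, relevant_ids):
    """Average precision for one query: the mean of precision@k
    taken at each rank k where a relevant document appears."""
    relevant = set(relevant_ids)
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, doc_id in enumerate(ranked_ids, start=1):
        if doc_id in relevant:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / len(relevant)


def mean_average_precision(runs):
    """runs: list of (ranked_ids, relevant_ids) pairs, one per query."""
    if not runs:
        return 0.0
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)


def recall(ranked_ids, relevant_ids):
    """Fraction of the relevant documents that were retrieved at all."""
    relevant = set(relevant_ids)
    if not relevant:
        return 0.0
    return len(relevant & set(ranked_ids)) / len(relevant)
```

With a hypothetical ranking `["d1", "d2", "d3", "d4"]` and relevant set `["d1", "d3"]`, `average_precision` averages the precision at ranks 1 and 3, that is (1/1 + 2/3) / 2.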

Development of Tourism Information Named Entity Recognition Datasets for the Fine-tune KoBERT-CRF Model

  • Jwa, Myeong-Cheol;Jwa, Jeong-Woo
    • International Journal of Internet, Broadcasting and Communication / v.14 no.2 / pp.55-62 / 2022
  • A smart tourism chatbot is needed as a user interface to efficiently provide tourists with smart tourism services such as recommended travel products, tourist information, my travel itinerary, and tour guide service. We have developed a smart tourism app and a smart tourism information system that provide smart tourism services to tourists. We also developed a smart tourism chatbot service consisting of the khaiii morpheme analyzer, rule-based intention classification, and a tourism information knowledge base built on the Neo4j graph database. In this paper, we develop the Korean and English smart tourism Named Entity (NE) datasets required to develop an NER model using pre-trained language models (PLMs) for the smart tourism chatbot system. We create the tourism information NER datasets by collecting source data through the smart tourism app, the visitJeju website of the Jeju Tourism Organization (JTO), and web search, and by preprocessing it using Korean and English tourism information Named Entity dictionaries. We train the KoBERT-CRF NER model on the developed Korean and English tourism information NER datasets. The weighted-average precision, recall, and F1 scores are 0.94, 0.92, and 0.94 on the Korean and English tourism information NER datasets.
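
The abstract reports weighted-average precision, recall, and F1. As a rough illustration of what such an average usually means, the sketch below computes per-class scores and weights each class by its support (the number of true instances). This is an assumption about the averaging scheme, not the authors' evaluation code; the function names and the labels in the usage note are hypothetical.

```python
def prf(tp, fp, fn):
    """Precision, recall, and F1 for one class from its confusion counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1


def weighted_prf(counts):
    """counts: {label: (tp, fp, fn)}.
    Each class is weighted by its support, tp + fn."""
    total = sum(tp + fn for tp, _, fn in counts.values())
    if total == 0:
        return 0.0, 0.0, 0.0
    sums = [0.0, 0.0, 0.0]
    for tp, fp, fn in counts.values():
        weight = (tp + fn) / total
        for i, score in enumerate(prf(tp, fp, fn)):
            sums[i] += weight * score
    return tuple(sums)
```

For example, `weighted_prf({"LOC": (8, 2, 2), "PER": (3, 1, 7)})` weights both hypothetical classes equally because each has a support of 10. Since F1 is averaged per class, the weighted F1 need not equal the harmonic mean of the weighted precision and recall.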

An OSI and SN Based Persistent Naming Approach for Parametric CAD Model Exchange (기하공간정보(OSI)와 병합정보(SN)을 이용한 고유 명칭 방법)

  • Han S.H.;Mun D.H.
    • Korean Journal of Computational Design and Engineering / v.11 no.1 / pp.27-40 / 2006
  • The exchange of parameterized feature-based CAD models is important for product data sharing among different organizations and automation systems. The role of feature-based modeling is to generate the shape of a product and capture design intent in a CAD system. A feature is generated by referring to topological entities in a solid. Identifying the referenced topological entities of a feature is essential for exchanging feature-based CAD models through a neutral format. If the CAD data contains the modification history in addition to the construction history, a matching mechanism is also required to find the entity in the new model (post-edit model) that corresponds to the entity in the old model (pre-edit model). This problem is known as the persistent naming problem. There are additional problems arising from the exchange of parameterized feature-based CAD models. The authors have analyzed previous studies on persistent naming and the characteristics of exchanging parameterized feature-based CAD models, and propose a solution to the persistent naming problem. This solution comprises two parts: (a) naming of topological entities based on the object space information (OSI) and secondary name (SN); and (b) name matching under the proposed naming.