• Title/Summary/Keyword: Entity


Named Entity Boundary Recognition Using Hidden Markov Model and Hierarchical Information (은닉 마르코프 모델과 계층 정보를 이용한 개체명 경계 인식)

  • Lim, Heui-Seok
    • Journal of the Korea Academia-Industrial cooperation Society / v.7 no.2 / pp.182-187 / 2006
  • This paper proposes a method for recognizing named entity boundaries using a hidden Markov model (HMM) and ontology information for biological named entities. We use a smoothing method based on 31 word features and hierarchical information to alleviate the sparse data problem in the HMM. The GENIA corpus version 2.1 was used to train and evaluate the proposed boundary recognition system. The experimental results show that the proposed system outperforms the previous system, which used neither the hierarchical ontology information nor the smoothing technique. The system also improves the execution time of boundary recognition.


Towards Effective Entity Extraction of Scientific Documents using Discriminative Linguistic Features

  • Hwang, Sangwon;Hong, Jang-Eui;Nam, Young-Kwang
    • KSII Transactions on Internet and Information Systems (TIIS) / v.13 no.3 / pp.1639-1658 / 2019
  • Named entity recognition (NER) is an important technique for improving the performance of data mining and big data analytics. In previous studies, NER systems have identified named entities using statistical methods based on prior information or linguistic features; however, such methods cannot recognize unregistered or unlearned objects. This paper proposes a method for extracting objects, such as technologies, theories, or person names, by analyzing the collocation relationships between words that appear together around specific words in the abstracts of academic journals. The method proceeds as follows. First, the data is preprocessed using data cleaning and sentence detection to separate the text into single sentences. Then, part-of-speech (POS) tagging is applied to the individual sentences. Next, the appearance and collocation information of the other POS tags is analyzed, excluding the entity candidates, such as nouns. Finally, an entity recognition model is created by analyzing and classifying the information in the sentences.

Named Entity Recognition and Dictionary Construction for Korean Title: Books, Movies, Music and TV Programs (한국어 제목 개체명 인식 및 사전 구축: 도서, 영화, 음악, TV프로그램)

  • Park, Yongmin;Lee, Jae Sung
    • KIPS Transactions on Software and Data Engineering / v.3 no.7 / pp.285-292 / 2014
  • Named entity recognition is used to improve the performance of information retrieval systems, question answering systems, machine translation systems, and so on. The targets of named entity recognition are usually PLOs (persons, locations, and organizations). They are usually proper nouns or unregistered words, and traditional named entity recognizers use these characteristics to find named entity candidates. The titles of books, movies, and TV programs have different characteristics from PLO entities. They sometimes consist of multiple phrases, a full sentence, or special characters, which makes it difficult to find the named entity candidates. In this paper, we propose a method to quickly extract title named entities from news articles and automatically build a named entity dictionary for the titles. To identify candidates, word phrases enclosed in special symbols in a sentence are first extracted and then verified by an SVM using feature words and their distances. To classify the extracted title candidates, an SVM is used with the mutual information of word contexts.
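The candidate-identification step described above (extracting phrases enclosed in special symbols) might look roughly like the following Python sketch. It is illustrative only: the abstract does not say which symbols are used, so the pairs in `BRACKET_PAIRS`, the function name `extract_title_candidates`, and the `max_len` cutoff are all assumptions, and the SVM verification and classification steps are not shown.

```python
import re

# Hypothetical symbol pairs; the abstract only says "special symbols".
BRACKET_PAIRS = [("<", ">"), ("《", "》"), ("「", "」"), ('"', '"')]


def extract_title_candidates(sentence, max_len=40):
    """Return phrases enclosed in any of the symbol pairs, in reading order.

    max_len is an assumed cutoff that drops implausibly long spans.
    """
    found = []
    for open_sym, close_sym in BRACKET_PAIRS:
        # Non-greedy match of the text between one opening and closing symbol.
        pattern = re.escape(open_sym) + r"(.+?)" + re.escape(close_sym)
        for match in re.finditer(pattern, sentence):
            text = match.group(1).strip()
            if text and len(text) <= max_len:
                found.append((match.start(1), text))
    # Sort by position so candidates follow the order of the sentence.
    found.sort()
    return [text for _, text in found]
```

For the sentence `The film <Parasite> and the novel "The Vegetarian" were reviewed.` this returns `["Parasite", "The Vegetarian"]`. In the described method, each such candidate would then be passed to a verifier rather than accepted directly.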

Using Non-Local Features to Improve Named Entity Recognition Recall

  • Mao, Xinnian;Xu, Wei;Dong, Yuan;He, Saike;Wang, Haila
    • Proceedings of the Korean Society for Language and Information Conference / 2007.11a / pp.303-310 / 2007
  • Named Entity Recognition (NER) is always limited by its lower recall, which results from the asymmetric data distribution in which the NONE class dominates the entity classes. This paper presents an approach that exploits non-local information to improve NER recall. Several kinds of non-local features encoding entity token occurrence, entity boundary, and entity class are explored under the Conditional Random Fields (CRFs) framework. Experiments on the SIGHAN 2006 MSRA (CityU) corpus indicate that non-local features can effectively enhance the recall of state-of-the-art NER systems. When the non-local features are incorporated into NER systems that use local features alone, our best system achieves a 23.56% (25.26%) relative error reduction in recall and a 17.10% (11.36%) relative error reduction in F1 score; the improved F1 score of 89.38% (90.09%) is significantly superior to that of the best NER system that participated in the closed track, which had an F1 of 86.51% (89.03%).
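The "relative error reduction" figures above can be read with the usual definition, in which the error is the gap between a score and 100%. The abstract does not define the term or state the baseline scores, so the Python sketch below is an assumption about the metric, and the function name `relative_error_reduction` is hypothetical.

```python
def relative_error_reduction(baseline_score, improved_score):
    """Percentage of the baseline error removed by the improved system.

    Both scores are percentages (0-100), e.g. recall or F1.
    Assumes the common definition: error = 100 - score.
    """
    baseline_error = 100.0 - baseline_score
    improved_error = 100.0 - improved_score
    if baseline_error <= 0:
        raise ValueError("baseline score leaves no error to reduce")
    return 100.0 * (baseline_error - improved_error) / baseline_error
```

With made-up numbers (not taken from the paper), raising a score from 80% to 85% cuts the error from 20 points to 15, so `relative_error_reduction(80.0, 85.0)` gives 25.0.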


A Case Study on Recordkeeping Metadata Standard Applying Multiple Entities (다중 개체 모형을 적용한 기록관리 메타데이터 표준 사례분석)

  • Lee, Ju-Yeon
    • Journal of Korean Society of Archives and Records Management / v.10 no.2 / pp.193-214 / 2010
  • The multiple-entity data model, which contains metadata associating two or more entities, has been applied to recordkeeping metadata standards in recent years. This paper describes and analyzes recordkeeping metadata standards that apply multiple entities, such as ISO 23081 and the recordkeeping metadata standards of Australia, New Zealand, New South Wales, Queensland, and South Australia, focusing on scope, the number of entities, the categories within each entity, and metadata elements. It also shows some examples of the relationship entity, which is the key to the multiple-entity model. Based on the analysis, this paper suggests some considerations for revising recordkeeping metadata standards that apply multiple entities.

A Study on Multi-Dimensional Entity Clustering Using the Objective Function of Centroids (중심체 목적함수를 이용한 다차원 개체 CLUSTERING 기법에 관한 연구)

  • Rhee, Chul;Kang, Suk-Ho
    • Journal of the Korean Operations Research and Management Science Society / v.15 no.2 / pp.1-15 / 1990
  • A mathematical definition of a cluster is suggested. A nonlinear 0-1 integer programming formulation for the multi-dimensional entity clustering problem is developed. A heuristic method named MDEC (Multi-Dimensional Entity Clustering), which uses centroids and binary partitioning, is developed, and numerical examples are shown. This method has the advantage of providing information on bottleneck entities.


Mining Search Keywords for Improving the Accuracy of Entity Search (엔터티 검색의 정확성을 높이기 위한 검색 키워드 마이닝)

  • Lee, Sun Ku;On, Byung-Won;Jung, Soo-Mok
    • KIPS Transactions on Software and Data Engineering / v.5 no.9 / pp.451-464 / 2016
  • Entity search services such as Google Product Search and Yahoo Pipes have recently been in the spotlight. Entity search engines are used to retrieve web pages relevant to a particular entity. However, if an entity (e.g., Chinatown movie) has various meanings (e.g., Chinatown movies, Chinatown restaurants, and Incheon Chinatown), the accuracy of the search results decreases significantly. To address this problem, we propose a novel method that quantifies the importance of search queries and then offers the best query for the entity search, based on a Frequent Pattern (FP)-Tree and considering the correlation between entity relevance and the frequency of web pages. According to the experimental results presented in this paper, the proposed method (59% average precision) improved accuracy five times compared to traditional query terms (less than 10% average precision).

A case-based DSS to support enterprise data model development (전사적 데이터 모델 개발을 지원하는 사례기반 의사결정지원시스템)

  • 박동진
    • Proceedings of the Korean Operations and Management Science Society Conference / 1996.10a / pp.218-221 / 1996
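  • To develop an enterprise data model, the key entities that the company must manage carefully have to be identified first. Determining entities is an important decision that strongly affects every stage of system development, yet it remains highly subjective and depends heavily on the decision maker's experience and expertise. Determining entities also sometimes takes more time than necessary. In this study, we designed and developed a decision support system that adopts case-based reasoning to support decision makers faced with entity determination. The system derives new conclusions suited to the company's situation from past cases judged to have determined entities successfully, and thereby supports the decision maker effectively.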


ER_Modeler: A Logical Database Design Tool based on Entity-Relationship Model (ER_Modeler: 개체 관계 모델 기반 논리적 데이터베이스 설계 도구)

  • Jung, In-Hwan;Kim, Young-Ung
    • The Journal of the Institute of Internet, Broadcasting and Communication / v.11 no.5 / pp.11-17 / 2011
  • In this paper, we propose ER_Modeler, a logical database design tool based on the entity-relationship model. ER_Modeler lets users build entity-relationship diagrams graphically on windows and translates the diagrams into the appropriate data definition language for creating relational database tables. Furthermore, ER_Modeler provides import/export functions using XML to guarantee interoperability with ERwin, which is one of the most popular commercial products.
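As a rough illustration of translating an entity definition into data definition language, the Python sketch below turns a single entity into a `CREATE TABLE` statement. It is not ER_Modeler's implementation: the `Attribute` and `Entity` classes and the `entity_to_ddl` function are hypothetical names, and relationships, foreign keys, and the XML import/export are left out.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Attribute:
    name: str
    sql_type: str
    primary_key: bool = False  # True if the attribute is part of the key


@dataclass
class Entity:
    name: str
    attributes: List[Attribute] = field(default_factory=list)


def entity_to_ddl(entity):
    """Build a CREATE TABLE statement for one entity (assumed mapping:
    one entity becomes one table, one attribute becomes one column)."""
    lines = [f"  {a.name} {a.sql_type}" for a in entity.attributes]
    key_columns = [a.name for a in entity.attributes if a.primary_key]
    if key_columns:
        lines.append(f"  PRIMARY KEY ({', '.join(key_columns)})")
    return f"CREATE TABLE {entity.name} (\n" + ",\n".join(lines) + "\n);"
```

For example, an entity `student` with a key attribute `id INTEGER` and an attribute `name VARCHAR(50)` produces a three-line table body ending in `PRIMARY KEY (id)`.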

A Study of the Performance on EJB Entity Bean with Value Object (Value Object를 이용한 EJB 엔티티빈의 성능에 관한 연구)

  • 최은희;이남용
    • Proceedings of the Korean Information Science Society Conference / 2001.10a / pp.403-405 / 2001
  • In the EJB 1.1 specification, every method call made to an Enterprise JavaBean is potentially a remote call. Such remote invocations use the network layer regardless of the proximity of the client to the bean, creating network overhead. Because entity beans suffer a more noticeable performance loss from remote calls than session beans do, session beans are used far more often than entity beans in production work. We focus on how to improve the performance of entity beans with the Value Object pattern, one of the J2EE patterns suggested by Sun Microsystems. We present related design issues for performance testing, the test results compared with the original entity bean, and our findings.
