• Title/Summary/Keyword: entity name

An Effect of Semantic Relatedness on Entity Disambiguation: Using Korean Wikipedia (개체중의성해소에서 의미관련도 활용 효과 분석: 한국어 위키피디아를 사용하여)

  • Kang, In-Su
    • Journal of the Korean Institute of Intelligent Systems / v.25 no.2 / pp.111-118 / 2015
  • Entity linking is the task of linking entity name mentions occurring in text to the corresponding entities in knowledge bases. Since the same entity mention may refer to different entities depending on its context, entity linking needs to deal with entity disambiguation. Most recent works on entity disambiguation focus on semantic relatedness between entities and attempt to integrate semantic relatedness with entity prior probabilities and term co-occurrence. To the best of my knowledge, however, it is hard to find studies that analyze and present the pure effects of semantic relatedness on entity disambiguation. Through experiments on a Korean Wikipedia data set, this article empirically evaluates entity disambiguation approaches using semantic relatedness in terms of the following aspects: (1) the difference among semantic relatedness measures such as NGD, PMI, Jaccard, Dice, and Simpson, (2) the influence of ambiguities in the set of co-occurring entity mentions, and (3) the difference between individual and collective disambiguation approaches.
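
The five measures named in this abstract can be sketched as follows. This is a minimal illustration, not the paper's implementation: it assumes each entity is represented by the set of articles that link to it, and that `total_articles` is the size of the collection. The function name and this representation are hypothetical. Note that NGD is a distance (lower means more related), while the other four are similarities.

```python
import math

def relatedness(links_a, links_b, total_articles):
    """Return co-occurrence based relatedness measures for two entities.

    links_a, links_b: sets of article ids linking to each entity (assumed representation).
    total_articles: number of articles in the collection (assumed to be known).
    """
    a, b = len(links_a), len(links_b)
    both = len(links_a & links_b)     # articles linking to both entities
    either = len(links_a | links_b)   # articles linking to at least one
    if both == 0:
        # No shared articles: similarities are 0, PMI and NGD diverge.
        return {"jaccard": 0.0, "dice": 0.0, "simpson": 0.0,
                "pmi": float("-inf"), "ngd": float("inf")}
    # Normalized Google Distance; guard the degenerate case where the
    # rarer entity is linked from every article.
    ngd_denominator = math.log(total_articles) - math.log(min(a, b))
    ngd_numerator = math.log(max(a, b)) - math.log(both)
    ngd = ngd_numerator / ngd_denominator if ngd_denominator > 0 else 0.0
    return {
        "jaccard": both / either,
        "dice": 2 * both / (a + b),
        "simpson": both / min(a, b),  # overlap coefficient
        "pmi": math.log(total_articles * both / (a * b)),
        "ngd": ngd,
    }

scores = relatedness({1, 2, 3, 4}, {3, 4, 5, 6}, total_articles=100)
print(scores["dice"], scores["simpson"])  # 0.5 0.5
```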

Protein Named Entity Identification Based on Probabilistic Features Derived from GENIA Corpus and Medical Text on the Web

  • Sumathipala, Sagara;Yamada, Koichi;Unehara, Muneyuki;Suzuki, Izumi
    • International Journal of Fuzzy Logic and Intelligent Systems / v.15 no.2 / pp.111-120 / 2015
  • Protein named entity identification is an essential and fundamental prerequisite for extracting information about protein-protein interactions from biomedical literature. In this paper, we explore the use of abstracts of biomedical literature in MEDLINE for protein name identification and present the results of our experiments. We present a robust and effective approach to classifying biomedical named entities into protein and non-protein classes, based on a rich set of features: orthographic, keyword, morphological, and newly introduced Protein-Score features. In experiments on the GENIA corpus using Random Forest, our procedure achieves its highest values of 92.7% precision, 91.7% recall, and 92.2% F-measure for protein identification, while significantly reducing training and testing time.
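
As a consistency check on the figures reported in this abstract, and assuming the F-measure is the balanced harmonic mean of precision P and recall R (the abstract does not state the weighting), the reported values agree:

```latex
F = \frac{2PR}{P + R}
  = \frac{2 \times 0.927 \times 0.917}{0.927 + 0.917}
  = \frac{1.700118}{1.844}
  \approx 0.922
```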

Database Model for Korea Plant Name Index (데이터베이스 모델링 기법을 이용한 국가표준식물목록 전산화 연구)

  • Lee, You-Mi;Kim, Hui
    • Korean Journal of Plant Taxonomy / v.37 no.3 / pp.309-321 / 2007
  • The Korea National Arboretum has worked with the Plant Taxonomic Society of Korea to make the first fully electronic floristic checklist in Korea. The result is an ever-expanding online plant name index containing scientifically authoritative, up-to-date information on approximately 7,000 taxa, including cultivars. With 37 contributing taxonomists, KPNI is the largest collaborative research project ever assembled in Korea. A comprehensive database model for taxonomic data from literature and other sources is presented, which was devised for the Korea National Plant Index database project (KPNI). The Gwangreung database model is based on an entity-relationship diagram approach. It encompasses taxa of all ranks, nothotaxa and hybrid formulae, cultivars, full synonymy, basionyms, Korean names, and other nomenclatural information. This paper presents an analysis of KPNI work processes and an overview of how we are approaching the construction of the Gwangreung database model. It can help the system engineers of other biological information systems develop their databases on the basis of an accurate and integrative taxonomic database.

A Study on the Metadata Authority Description Schema for the Interoperability of Authority Data (MADS를 기반으로 한 전거데이터 상호운용성에 관한 연구)

  • Lee, Hyewon
    • Journal of the Korean BIBLIA Society for library and Information Science / v.23 no.4 / pp.25-44 / 2012
  • This study analyzed the current state of authority control and introduced MADS (Metadata Authority Description Schema) for authority data. MADS supports encoding an authority description for an agent (person, organization), event, or term (topic, temporal entity, genre, geographic entity, hierarchical geographic entity, occupation) and defines main elements, subelements, and attributes using the XML schema language. Lastly, drawing on the characteristics of MADS, this study proposed plans for using it effectively to make authority data interoperable.

Rule-based Named Entity (NE) Recognition from Speech (음성 자료에 대한 규칙 기반 Named Entity 인식)

  • Kim Ji-Hwan
    • MALSORI / no.58 / pp.45-66 / 2006
  • In this paper, a rule-based (transformation-based) NE recognition system is proposed. This system uses Brill's rule inference approach. The performance of the rule-based system is compared with that of IdentiFinder, one of the most successful stochastic systems. In the baseline case (no punctuation and no capitalisation), both systems show almost equal performance. They also perform similarly when additional information such as punctuation, capitalisation, and name lists is available. The performance of both systems degrades linearly with the number of speech recognition errors, and their rates of degradation are almost equal. These results show that automatic rule inference is a viable alternative to the HMM-based approach to NE recognition, while retaining the advantages of a rule-based approach.
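
The tagging side of a transformation-based system can be sketched as below. This is an illustration under stated assumptions, not the system described in the abstract: the single rule template ("retag when the previous word is W"), the tag names, and the example rules are all hypothetical, and the rule learning step of Brill's approach is omitted. The input is lowercase and unpunctuated, in the spirit of the baseline case.

```python
def apply_rules(tokens, tags, rules):
    """Apply an ordered list of transformation rules to an initial tagging.

    Each rule is (from_tag, to_tag, prev_word): a token tagged from_tag is
    retagged to_tag when the preceding word equals prev_word.
    This is one hypothetical rule template among many possible ones.
    """
    tags = list(tags)  # work on a copy so the initial tagging is preserved
    for from_tag, to_tag, prev_word in rules:
        # Rules are applied in order, each over the whole sequence.
        for i in range(1, len(tokens)):
            if tags[i] == from_tag and tokens[i - 1] == prev_word:
                tags[i] = to_tag
    return tags

tokens = ["mr", "brown", "visited", "paris"]
initial = ["O", "O", "O", "O"]
rules = [("O", "PERSON", "mr"), ("O", "LOCATION", "visited")]
print(apply_rules(tokens, initial, rules))
# ['O', 'PERSON', 'O', 'LOCATION']
```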

Constructing for Korean Traditional culture Corpus and Development of Named Entity Recognition Model using Bi-LSTM-CNN-CRFs (한국 전통문화 말뭉치구축 및 Bi-LSTM-CNN-CRF를 활용한 전통문화 개체명 인식 모델 개발)

  • Kim, GyeongMin;Kim, Kuekyeng;Jo, Jaechoon;Lim, HeuiSeok
    • Journal of the Korea Convergence Society / v.9 no.12 / pp.47-52 / 2018
  • Named Entity Recognition is a task that extracts entity names that can have a unique meaning, such as Persons (PS), Locations (LC), and Organizations (OG), from a document and determines the categories of the extracted entity names. Recently, Bi-LSTM-CRF, which combines a bidirectional LSTM that considers the forward and backward directions of the input with a CRF that uses transition probabilities between output labels, has shown excellent performance in deep-learning studies of named entity recognition, as have models that use CNN and LSTM to build efficient character- and word-level embedding vectors. In this research, we describe a Bi-LSTM-CNN-CRF model that augments the features of the Korean named entity recognition system and propose a method for constructing a traditional culture corpus. We also present the results of training the feature-augmented model on the constructed corpus for Korean named entity recognition.
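
A sequence labeller of this kind emits one tag per token, and the tags then have to be grouped into entity spans. The sketch below shows that grouping step only. It assumes a BIO tagging scheme over the PS/LC/OG categories, which the abstract does not specify; the function name and the example sentence are hypothetical.

```python
def decode_bio(tokens, tags):
    """Group BIO tags (e.g. B-PS, I-PS, O) into (entity text, category) pairs."""
    entities = []
    current_tokens, current_type = [], None
    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            # A new entity starts; close any entity still open.
            if current_tokens:
                entities.append((" ".join(current_tokens), current_type))
            current_tokens, current_type = [token], tag[2:]
        elif tag.startswith("I-") and current_type == tag[2:]:
            # Continuation of the entity that is currently open.
            current_tokens.append(token)
        else:
            # "O" or an inconsistent I- tag: close the open entity, skip this token.
            if current_tokens:
                entities.append((" ".join(current_tokens), current_type))
            current_tokens, current_type = [], None
    if current_tokens:
        entities.append((" ".join(current_tokens), current_type))
    return entities

tokens = ["Sejong", "visited", "Gyeongbok", "Palace"]
tags = ["B-PS", "O", "B-LC", "I-LC"]
print(decode_bio(tokens, tags))
# [('Sejong', 'PS'), ('Gyeongbok Palace', 'LC')]
```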

A Study on Named Entity Recognition for Effective Dialogue Information Prediction (효율적 대화 정보 예측을 위한 개체명 인식 연구)

  • Go, Myunghyun;Kim, Hakdong;Lim, Heonyeong;Lee, Yurim;Jee, Minkyu;Kim, Wonil
    • Journal of Broadcast Engineering / v.24 no.1 / pp.58-66 / 2019
  • Recognition of named entities such as proper nouns in conversational sentences is a fundamental and important area of study for efficient conversational information prediction. The most important part of a task-oriented dialogue system is recognizing what attributes an object in a conversation has. The named entity recognition model recognizes named entities in a dialogue sentence through preprocessing, word embedding, and prediction steps. This study aims to use a user-defined dictionary in the preprocessing stage and to find optimal parameters in the word embedding stage for efficient dialogue information prediction. To test the designed named entity recognition model, we selected the domain of daily chemical products and constructed a named entity recognition model that can be applied in a task-oriented dialogue system in that domain.

Linguistic Characteristics of Domestic Men's Formal Wear Brand Names

  • Kwon, Hae-Sook
    • Journal of Fashion Business / v.14 no.6 / pp.11-22 / 2010
  • The main purpose of this research was to examine the linguistic characteristics of domestic men's formal wear brand names. Four linguistic characteristics were investigated (language type, combined structure type, word class, and length of brand name), and differences between brand types were also examined. For sample selection, 209 men's fashion brands were selected from the '2009 Korea Fashion Yearbook', and 25 brands for which adequate information about the brand name or its naming could not be collected were excluded. Among the remaining 184 men's brand names, 66 men's formal wear brands were selected and studied. For data analysis, quantitative evaluation of frequency and qualitative evaluation were used. The results are as follows. (1) Seven language types were found in domestic men's formal wear brand names. English was used the most, followed by Italian and French. (2) For combined structure type, the single word was used the most, followed by the separately combined word type, the artificially combined word type, and the unified word type. (3) The most frequently used word class was the noun, followed by the phrase, adjective, and verb. Among nouns, 6 different types were found, expressing a person, concrete entity, abstract entity, place, acronym, or neologism. Among phrases, only noun phrases appeared; however, 6 out of 20 phrases were abbreviated. All eight adjective brand names implied an attributive character of the brand, such as 'Dainty' or 'Solus(Solo)'. (4) Long names were used the most, followed by normal and short names. By number of syllables, 4-syllable names appeared the most, followed by 3, 5, and 6 syllables, then 2 and 7 syllables at the same rate, and finally 8 syllables. (5) Comparison across brand types showed differences in language type, combined structure type, and word class, but not in length of brand name.

OryzaGP: rice gene and protein dataset for named-entity recognition

  • Larmande, Pierre;Do, Huy;Wang, Yue
    • Genomics & Informatics / v.17 no.2 / pp.17.1-17.3 / 2019
  • Text mining has become an important research method in biology; its original purpose is to extract biological entities, such as genes, proteins, and phenotypic traits, in order to extend knowledge from scientific papers. However, few thorough studies on text mining and application development for plant molecular biology data have been performed, especially for rice, resulting in a lack of datasets available for named-entity recognition tasks for this species. Since few benchmarks are available for rice, we faced various difficulties in exploiting advanced machine learning methods for accurate analysis of the rice literature. To evaluate several approaches to automatically extracting information from gene/protein entities, we built a new dataset for rice as a benchmark. This dataset is composed of a set of titles and abstracts, extracted from scientific papers focusing on the rice species and downloaded from PubMed. During the 5th Biomedical Linked Annotation Hackathon, a portion of the dataset was uploaded to PubAnnotation for sharing. Our ultimate goal is to offer a shared task of rice gene/protein name recognition through the BioNLP Open Shared Tasks framework using the dataset, to facilitate an open comparison and evaluation of different approaches to the task.

Substantivalism and Relationism in the 4 Dimensional Interpretation of Newtonian Space-Time (뉴턴 시공간의 4차원 해석에서의 실체론과 관계론 간의 논쟁)

  • Yang, Kyoung-Eun
    • Journal for History of Mathematics / v.30 no.2 / pp.87-100 / 2017
  • The ontological status of Newtonian space-time has been debated under the name of the substantivalism-relationism controversy. The debates between the two parties concern the nature of the existence of space-time. Substantivalism maintains that the points of space-time have an existence analogous to that of material substance. Relationism claims that space-time should be understood as the framework of possible spatio-temporal relations between bodies. Newtonian space is considered a three-dimensional entity in accordance with our geometric common sense. Yet given that motion is defined as the change of position over time, it is possible to interpret space-time as a four-dimensional entity. In this essay, the substantivalist-relationist debate is considered within the context of non-relativistic four-dimensional space-time theory. This essay attempts to clarify the dispute over the ontology of space-time by elucidating the relationship between the ontology of space-time, motion, and space-time symmetry.