• Title/Summary/Keyword: knowledge entity embedding

Knowledge Embedding Method for Implementing a Generative Question-Answering Chat System (생성 기반 질의응답 채팅 시스템 구현을 위한 지식 임베딩 방법)

  • Kim, Sihyung; Lee, Hyeon-gu; Kim, Harksoo
    • Journal of KIISE, v.45 no.2, pp.134-140, 2018
  • A chat system is a computer program that understands a user's miscellaneous utterances and generates appropriate responses. A chat system sometimes needs to answer users' simple information-seeking questions. However, previous generative chat systems do not consider how to embed knowledge entities (i.e., the subjects and objects of knowledge triples), which are essential elements for question answering. As a result, previous chat models generate the same responses even when the knowledge entities in users' utterances change. To alleviate this problem, we propose a knowledge entity embedding method that improves the question-answering accuracy of a generative chat system. The proposed method uses a Siamese recurrent neural network to embed knowledge entities and their synonyms. For the experiments, we implemented a sequence-to-sequence model in which subjects and predicates are encoded and objects are decoded. The proposed embedding method achieved 12.48% higher accuracy than a conventional embedding method based on a convolutional neural network.
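The Siamese recurrent encoder described in this abstract can be pictured with a small PyTorch sketch: a weight-shared GRU maps an entity string and one of its synonyms to nearby vectors. Everything below (class name, dimensions, the cosine-embedding loss, the dummy data) is an illustrative assumption, not the authors' implementation.

```python
# Illustrative sketch, not the authors' code: a Siamese GRU encoder that maps an
# entity string and one of its synonyms to nearby vectors, so a generative QA model
# can treat synonymous entity mentions as the same knowledge entity.
import torch
import torch.nn as nn

class SiameseEntityEncoder(nn.Module):
    def __init__(self, vocab_size: int, emb_dim: int = 64, hidden_dim: int = 128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim, padding_idx=0)
        self.rnn = nn.GRU(emb_dim, hidden_dim, batch_first=True)

    def encode(self, token_ids: torch.Tensor) -> torch.Tensor:
        # token_ids: (batch, seq_len) character/wordpiece ids of an entity string
        _, last_hidden = self.rnn(self.embed(token_ids))
        return last_hidden.squeeze(0)            # (batch, hidden_dim) entity vector

    def forward(self, entity_a, entity_b):
        # Both branches share the same weights (the "Siamese" part).
        return self.encode(entity_a), self.encode(entity_b)

# Train so that synonym pairs (label +1) end up close and random pairs (label -1) far.
model = SiameseEntityEncoder(vocab_size=1000)
loss_fn = nn.CosineEmbeddingLoss(margin=0.5)     # expects labels in {+1, -1}
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

entity_a = torch.randint(1, 1000, (8, 12))       # dummy batch of entity strings
entity_b = torch.randint(1, 1000, (8, 12))       # dummy batch of candidate synonyms
labels = torch.tensor([1, 1, 1, 1, -1, -1, -1, -1])

vec_a, vec_b = model(entity_a, entity_b)
loss = loss_fn(vec_a, vec_b, labels)
loss.backward()
optimizer.step()
```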

CR-M-SpanBERT: Multiple embedding-based DNN coreference resolution using self-attention SpanBERT

  • Joon-young Jung
    • ETRI Journal, v.46 no.1, pp.35-47, 2024
  • This study introduces CR-M-SpanBERT, a coreference resolution (CR) model that utilizes multiple embedding-based SpanBERT (span bidirectional encoder representations from transformers) for antecedent recognition in natural language (NL) text. Information extraction studies aim to extract knowledge from NL text autonomously and cost-effectively. However, the extracted information may not represent knowledge accurately owing to the presence of ambiguous entities. Therefore, we propose a CR model that identifies mentions referring to the same entity in NL text. CR requires understanding both the syntax and the semantics of the NL text simultaneously, so multiple embeddings that can carry syntactic and semantic information for each word are generated for CR. We evaluate the effectiveness of CR-M-SpanBERT by comparing it to a model that uses SpanBERT as the language model, as in previous CR studies. The results demonstrate that our proposed deep neural network model achieves high recognition accuracy for extracting antecedents from NL text. Additionally, it requires fewer epochs than the conventional SpanBERT approach to achieve an average F1 accuracy greater than 75%.
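As a rough picture of the "multiple embedding" idea, the sketch below concatenates SpanBERT's contextual vectors with an extra syntactic embedding (hypothetical POS-tag ids) and scores a candidate mention pair with a small feed-forward network. The model identifier, tokenizer choice, dimensions, and scorer are assumptions for illustration, not the CR-M-SpanBERT architecture.

```python
# Illustrative sketch of combining multiple embeddings, not the CR-M-SpanBERT model:
# SpanBERT contextual vectors are concatenated with an extra embedding carrying
# syntactic information (hypothetical POS-tag ids), and a candidate mention pair is
# scored with a small feed-forward network.
import torch
import torch.nn as nn
from transformers import AutoTokenizer, AutoModel

# SpanBERT reuses the cased BERT wordpiece vocabulary, so the BERT tokenizer is used.
tokenizer = AutoTokenizer.from_pretrained("bert-base-cased")
spanbert = AutoModel.from_pretrained("SpanBERT/spanbert-base-cased")

NUM_POS_TAGS, POS_DIM = 50, 32
pos_embedding = nn.Embedding(NUM_POS_TAGS, POS_DIM)
pair_scorer = nn.Sequential(                      # scores (mention, candidate antecedent)
    nn.Linear(4 * (spanbert.config.hidden_size + POS_DIM), 256),
    nn.ReLU(),
    nn.Linear(256, 1),
)

text = "The president said she would visit the lab."
inputs = tokenizer(text, return_tensors="pt")
with torch.no_grad():
    contextual = spanbert(**inputs).last_hidden_state          # (1, seq_len, hidden)

pos_ids = torch.zeros(contextual.shape[:2], dtype=torch.long)  # dummy POS-tag ids
tokens = torch.cat([contextual, pos_embedding(pos_ids)], dim=-1)

# Represent a span by its start and end token vectors, then score a mention pair.
mention = torch.cat([tokens[0, 1], tokens[0, 2]])   # roughly "The president"
pronoun = torch.cat([tokens[0, 4], tokens[0, 4]])   # roughly "she"
score = pair_scorer(torch.cat([mention, pronoun]).unsqueeze(0))
print("coreference score:", score.item())
```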

Improving methods for normalizing biomedical text entities with concepts from an ontology with (almost) no training data at BLAH5: the CONTES method

  • Ferre, Arnaud; Ba, Mouhamadou; Bossy, Robert
    • Genomics & Informatics, v.17 no.2, pp.20.1-20.5, 2019
  • Entity normalization, or entity linking in the general domain, is an information extraction task that aims to annotate or bind words and expressions in raw text to semantic references, such as the concepts of an ontology. An ontology consists minimally of a formally organized vocabulary or hierarchy of terms that captures the knowledge of a domain. Presently, machine-learning methods, often coupled with distributional representations, achieve good performance. However, they require large training datasets, which are not always available, especially for tasks in specialized domains. CONTES (CONcept-TErm System) is a supervised method that addresses entity normalization with ontology concepts using small training datasets. CONTES has some limitations: it does not scale well to very large ontologies, it tends to overgeneralize predictions, and it lacks valid representations for out-of-vocabulary words. Here, we assess different methods for reducing the dimensionality of the ontology representation. We also propose calibrating parameters to make the predictions more accurate and addressing the out-of-vocabulary problem with a dedicated method.
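A minimal sketch of the CONTES-style normalization step described above, under the assumption that terms are represented by pre-trained word vectors and ontology concepts by compressed ancestry vectors: a linear projection is fitted on a handful of annotated pairs, and new mentions are normalized to the nearest concept. All data and the PCA/cosine choices below are toy assumptions, not the system's actual representations.

```python
# Rough sketch of a CONTES-style normalization step: learn a linear map from
# word-vector space to the ontology's concept space from a few annotated
# (term, concept) pairs, then normalize a new term to its nearest concept.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression
from sklearn.metrics.pairwise import cosine_similarity

rng = np.random.default_rng(0)

# Toy ontology: 20 concepts, each with a dense "ancestry" vector, compressed with
# PCA to illustrate reducing the dimensionality of the ontology representation.
concept_raw = rng.random((20, 200))
concept_vectors = PCA(n_components=10).fit_transform(concept_raw)

# Tiny training set: word vectors for 8 terms and the ids of their gold concepts.
term_vectors = rng.random((8, 50))               # e.g., pre-trained embeddings
gold_concepts = np.array([0, 1, 2, 3, 4, 5, 6, 7])

# CONTES-style step: a linear projection from term space into concept space.
projection = LinearRegression().fit(term_vectors, concept_vectors[gold_concepts])

# Normalize an unseen mention: project it, then pick the closest concept.
new_term = rng.random((1, 50))
predicted = projection.predict(new_term)
best_concept = cosine_similarity(predicted, concept_vectors).argmax()
print("predicted concept id:", best_concept)
```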

Entity Matching Method Using Semantic Similarity and Graph Convolutional Network Techniques (의미적 유사성과 그래프 컨볼루션 네트워크 기법을 활용한 엔티티 매칭 방법)

  • Duan, Hongzhou; Lee, Yongju
    • The Journal of the Korea institute of electronic communication sciences, v.17 no.5, pp.801-808, 2022
  • Research on embedding knowledge in large-scale Linked Data and applying neural network models to entity matching is relatively scarce. The most fundamental problem is that differing labels lead to lexical heterogeneity. In this paper, we propose an extended GCN (Graph Convolutional Network) model that incorporates a re-align structure to solve this lexical heterogeneity problem. The proposed model improved performance by 53% and 40% over the existing embedding-based MTransE and BootEA models, respectively, and by 5.1% over the GCN-based RDGCN model.
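To make the GCN-based matching idea concrete, here is a minimal sketch (not the paper's extended model): one graph-convolution layer produces structural embeddings for two toy knowledge graphs, and entity matches are ranked by cosine similarity. The re-align structure and the handling of heterogeneous labels in the proposed model are not reproduced here.

```python
# Minimal sketch of GCN-based entity matching on two toy graphs; all data is made up.
import numpy as np

def gcn_layer(adj: np.ndarray, features: np.ndarray, weights: np.ndarray) -> np.ndarray:
    """One GCN propagation step: normalized adjacency @ features @ weights, then ReLU."""
    adj_hat = adj + np.eye(adj.shape[0])                 # add self-loops
    deg_inv_sqrt = np.diag(1.0 / np.sqrt(adj_hat.sum(axis=1)))
    norm_adj = deg_inv_sqrt @ adj_hat @ deg_inv_sqrt     # symmetric normalization
    return np.maximum(norm_adj @ features @ weights, 0.0)

rng = np.random.default_rng(42)
n_entities, feat_dim, out_dim = 5, 16, 8

# Two toy knowledge graphs with (partly) corresponding entities.
adj_g1 = rng.integers(0, 2, (n_entities, n_entities))
adj_g1 = np.maximum(adj_g1, adj_g1.T)                    # make the graph undirected
adj_g2 = adj_g1.copy()                                   # structurally similar graph
feats_g1 = rng.random((n_entities, feat_dim))
feats_g2 = rng.random((n_entities, feat_dim))
shared_w = rng.random((feat_dim, out_dim))               # weights shared across graphs

emb_g1 = gcn_layer(adj_g1, feats_g1, shared_w)
emb_g2 = gcn_layer(adj_g2, feats_g2, shared_w)

# Match each entity in G1 to its most similar entity in G2 by cosine similarity.
def normalize(x):
    return x / (np.linalg.norm(x, axis=1, keepdims=True) + 1e-9)

similarity = normalize(emb_g1) @ normalize(emb_g2).T
print("best G2 match for each G1 entity:", similarity.argmax(axis=1))
```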

Linking Korean Predicates to Knowledge Base Properties (한국어 서술어와 지식베이스 프로퍼티 연결)

  • Won, Yousung; Woo, Jongseong; Kim, Jiseong; Hahm, YoungGyun; Choi, Key-Sun
    • Journal of KIISE, v.42 no.12, pp.1568-1574, 2015
  • Relation extraction plays a role in the process of transforming a sentence into a knowledge base form. In this paper, we focus on the predicates in a sentence and aim to identify the relevant knowledge base properties required to elucidate the relationship between entities, which enables a computer to understand the meaning of a sentence more clearly. Distant supervision is a well-known approach to relation extraction; it performs lexicalization of knowledge base properties by automatically generating a large amount of labeled data. In other words, the predicate in a sentence is linked or mapped to the possible properties defined by ontologies in the knowledge base. This lexical and ontological linking of information provides a way of generating structured information and a basis for enriching the knowledge base.
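The distant-supervision step described above can be illustrated with a toy sketch: whenever both entities of a knowledge-base triple co-occur in a sentence, the sentence's predicate is taken to lexicalize the triple's property. The sentences, triples, and the assumption that predicates are already extracted are illustrative, not the authors' data or pipeline.

```python
# Toy illustration of distant supervision for linking predicates to KB properties.
from collections import defaultdict

# Knowledge-base triples: (subject, property, object), e.g. from a DBpedia-style KB.
kb_triples = [
    ("김연아", "birthPlace", "부천"),
    ("세종대왕", "creator", "훈민정음"),
]

# Sentences paired with a pre-identified predicate; predicate extraction is assumed done.
sentences = [
    ("김연아는 부천에서 태어났다.", "태어나다"),      # "Kim Yuna was born in Bucheon."
    ("세종대왕은 훈민정음을 창제했다.", "창제하다"),  # "King Sejong created Hunminjeongeum."
    ("김연아는 부천을 방문했다.", "방문하다"),        # noisy match: "visited", not "born in"
]

# Distant supervision: count how often each predicate co-occurs with each KB property.
predicate_to_properties = defaultdict(lambda: defaultdict(int))
for subject, prop, obj in kb_triples:
    for text, predicate in sentences:
        if subject in text and obj in text:
            predicate_to_properties[predicate][prop] += 1

# The resulting counts link predicates to candidate KB properties (noise included).
for predicate, props in predicate_to_properties.items():
    print(predicate, dict(props))
```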