• Title/Summary/Keyword: Pronominalization

Search Result 3, Processing Time 0.019 seconds

Generation of Zero Pronouns using Center Transition of Preceding Utterances (선행 발화의 중심 전이를 이용한 영형 생성)

  • Roh, Ji-Eun;Na, Seung-Hoon;Lee, Jong-Hyeok
    • Journal of KIISE:Software and Applications
    • /
    • v.32 no.10
    • /
    • pp.990-1002
    • /
    • 2005
  • To generate coherent texts, it is important to produce appropriate pronouns to refer to previously-mentioned things in a discourse. Specifically, we focus on pronominalization by zero pronouns which frequently occur in Korean. This paper investigates zero pronouns in Korean based on the cost-based centering theory, especially focusing on the center transitions of adjacent utterances. In previous centering works, only one type of nominal entity has been considered as the target of pronominalization, even though other entities are frequently pronominalized as zero pronouns. To resolve this problem, and explain the reference phenomena of real texts, four types of nominal entity (Npair, Ninter, Nintra, and Nnon) from centering theory are defined with the concept of inter-, intra-, and pairwise salience. For each entity type, a case study of zero phenomena is performed through analyzing corpus and building a pronominalization model. This study shows that the zero phenomena of entities which have been neglected in previous centering works are explained via the renter transition of the second previous utterance. We also show that in Ninter, Nintra, and Nnon, pronominalization accuracy achieved by complex combination of several types of features is completely or nearly achieved by using the second previous utterance's transition across genres.

Generation of Natural Referring Expressions by Syntactic Information and Cost-based Centering Model (구문 정보와 비용기반 중심화 이론에 기반한 자연스러운 지시어 생성)

  • Roh Ji-Eun;Lee Jong-Hyeok
    • Journal of KIISE:Software and Applications
    • /
    • v.31 no.12
    • /
    • pp.1649-1659
    • /
    • 2004
  • Text Generation is a process of generating comprehensible texts in human languages from some underlying non-linguistic representation of information. Among several sub-processes for text generation to generate coherent texts, this paper concerns referring expression generation which produces different types of expressions to refer to previously-mentioned things in a discourse. Specifically, we focus on pronominalization by zero pronouns which frequently occur in Korean. To build a generation model of referring expressions for Korean, several features are identified based on grammatical information and cost-based centering model, which are applied to various machine learning techniques. We demonstrate that our proposed features are well defined to explain pronominalization, especially pronominalization by zero pronouns in Korean, through 95 texts from three genres - Descriptive texts, News, and Short Aesop's Fables. We also show that our model significantly outperforms previous ones with a 99.9% confidence level by a T-test.

A Study on the Application of Machine Learning in Literary Texts - Focusing on Rule Selection for Speaker Directive Analysis - (문학 텍스트의 머신러닝 활용방안 연구 - 화자 지시어 분석을 위한 규칙 선별을 중심으로 -)

  • Kwon, Kyoungah;Ko, Ilju;Lee, Insung
    • The Journal of the Convergence on Culture Technology
    • /
    • v.7 no.4
    • /
    • pp.313-323
    • /
    • 2021
  • The purpose of this study is to propose rules that can identify the speaker referred by the speaker directive in the text for the realization of a machine learning-based virtual character using a literary text. Through previous studies, we found that when applying literary texts to machine learning, the machine did not properly discriminate the speaker without any specific rules for the analysis of speaker directives such as other names, nicknames, pronouns, and so on. As a way to solve this problem, this study proposes 'nine rules for finding a speaker indicated by speaker directives (including pronouns)': location, distance, pronouns, preparatory subject/preparatory object, quotations, number of speakers, non-characters directives, word compound form, dispersion of speaker names. In order to utilize characters within a literary text as virtual ones, the learning text must be presented in a machine-comprehensible way. We expect that the rules suggested in this study will reduce trial and error that may occur when using literary texts for machine learning, and enable smooth learning to produce qualitatively excellent learning results.