• Title/Summary/Keyword: Eojeol

A Comparative study on the Effectiveness of Segmentation Strategies for Korean Word and Sentence Classification tasks (한국어 단어 및 문장 분류 태스크를 위한 분절 전략의 효과성 연구)

  • Kim, Jin-Sung;Kim, Gyeong-min;Son, Jun-young;Park, Jeongbae;Lim, Heui-seok
    • Journal of the Korea Convergence Society / v.12 no.12 / pp.39-47 / 2021
  • Constructing high-quality input features through effective segmentation is essential for improving a language model's sentence comprehension, and the quality of those features directly affects downstream task performance. This paper compares segmentation strategies that reflect the linguistic characteristics of Korean on word and sentence classification tasks. Four segmentation types are defined (eojeol, morpheme, syllable, and subchar), and each is used to pre-train a model with the RoBERTa architecture. Tasks are divided into a sentence group and a word group, and we analyze the tendencies within each group and the differences between the groups. In sentence classification, the model with subchar-level segmentation outperforms the other strategies by up to +0.62% on NSMC, +2.38% on KorNLI, and +2.41% on KorSTS; in word classification, the model with syllable-level segmentation outperforms the others by up to +0.7% on NER and +0.61% on SRL. These results confirm the effectiveness of those schemes.
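
    To make three of the four granularities concrete, here is a minimal Python sketch. It is not the authors' preprocessing code. The function names are hypothetical, and it assumes that "subchar" means decomposing each Hangul syllable into its conjoining jamo, which may differ from the paper's exact definition. Morpheme-level segmentation is left out because it needs an external morphological analyzer.

    ```python
    # Illustrative sketch only; names and the jamo-based reading of "subchar"
    # are assumptions, not taken from the paper.

    # Precomposed Hangul syllables occupy U+AC00..U+D7A3 and follow
    # code = 0xAC00 + (lead * 21 + vowel) * 28 + tail   (Unicode Standard).
    HANGUL_BASE = 0xAC00
    HANGUL_COUNT = 19 * 21 * 28  # 11172 syllables

    LEADS = [chr(0x1100 + i) for i in range(19)]          # initial consonants
    VOWELS = [chr(0x1161 + i) for i in range(21)]         # medial vowels
    TAILS = [""] + [chr(0x11A8 + i) for i in range(27)]   # final consonants (index 0 = none)


    def eojeol_segments(text):
        """Eojeol level: split on whitespace."""
        return text.split()


    def syllable_segments(text):
        """Syllable level: one unit per non-space character."""
        return [ch for ch in text if not ch.isspace()]


    def subchar_segments(text):
        """Subchar level (assumed): decompose each Hangul syllable into jamo."""
        units = []
        for ch in syllable_segments(text):
            offset = ord(ch) - HANGUL_BASE
            if 0 <= offset < HANGUL_COUNT:
                lead, rest = divmod(offset, 21 * 28)
                vowel, tail = divmod(rest, 28)
                units.append(LEADS[lead])
                units.append(VOWELS[vowel])
                if tail:
                    units.append(TAILS[tail])
            else:
                units.append(ch)  # non-Hangul characters pass through unchanged
        return units


    if __name__ == "__main__":
        sentence = "나는 학교에 간다"
        print(eojeol_segments(sentence))
        print(syllable_segments(sentence))
        print(subchar_segments(sentence))
    ```

    The finer the unit, the smaller the vocabulary and the longer the resulting sequence, which is the trade-off the abstract's comparison turns on.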

Lexico-semantic interactions during the visual and spoken recognition of homonymous Korean Eojeols (한국어 시·청각 동음동철이의 어절 재인에 나타나는 어휘-의미 상호작용)

  • Kim, Joonwoo;Kang, Kathleen Gwi-Young;Yoo, Doyoung;Jeon, Inseo;Kim, Hyun Kyung;Nam, Hyeomin;Shin, Jiyoung;Nam, Kichun
    • Phonetics and Speech Sciences / v.13 no.1 / pp.1-15 / 2021
  • The present study investigated the mental representation and processing of ambiguous words in the bimodal processing system by manipulating the lexical ambiguity of a visually or auditorily presented word. Homonyms (e.g., '물었다') with more than two meanings and control words (e.g., '고통을') with a single meaning were used in the experiments. The lemma frequency of the words was manipulated, while the relative frequency of the multiple meanings of each homonym was balanced. In both experiments, which used the lexical decision task, a robust frequency effect and a critical interaction of word type by frequency were found. In Experiment 1, spoken homonyms yielded faster latencies than control words (i.e., an ambiguity advantage) in the low-frequency condition, whereas an ambiguity disadvantage was found in the high-frequency condition. A similar interaction was found for visually presented homonyms in Experiment 2. Taken together, the first key finding is that interdependent lexico-semantic processing occurs in both the visual and auditory processing systems, which in turn suggests that semantic processing is not modality dependent but rather takes place on the basis of general lexical knowledge. The second is that multiple semantic candidates provide facilitative feedback only when the lemma frequency of the word is relatively low.