• Title/Summary/Keyword: Homograph


An Analysis of Korean Dependency Relation by Homograph Disambiguation (동형이의어 분별에 의한 한국어 의존관계 분석)

  • Kim, Hong-Soon;Ock, Cheol-Young
    • KIPS Transactions on Software and Data Engineering / v.3 no.6 / pp.219-230 / 2014
  • Dependency relation analysis determines the governor and the dependent among the words in a sentence. The dependency relation of a predicate is established by the patterns and selectional restrictions of the predicate's subcategorization. This paper proposes a method for analyzing Korean dependency relations using homograph predicates disambiguated in the morphological analysis phase. Each disambiguated homograph predicate has a different pattern. In particular, by reusing the stage transition training dictionary used during POS and homograph tagging, we propose a method of fixing the dependency relation of {noun+postposition, predicate} pairs, and we analyze the accuracy and the effect of homographs on dependency relation analysis. We used the Sejong Phrase Structured Corpus for the experiments, transforming the phrase-structured corpus into a dependency relation structure and tagging homographs. In the experiments, the accuracy of dependency relation analysis with disambiguated homographs is 80.38%, an increase of 0.42% over the accuracy with undisambiguated homographs. The Z-value in statistical hypothesis testing at the 1% significance level is $|Z| = 4.63 \geq z_{0.01} = 2.33$. We therefore conclude that homograph disambiguation affects dependency relation analysis, and that the stage transition training dictionary used in POS and homograph tagging contributes 7.14% to the accuracy of dependency relation analysis.
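
The significance check in the abstract above can be partly reproduced. The following is a minimal sketch, assuming a one-sided test on a standard normal statistic. The abstract does not give the sample sizes, so the reported Z-value of 4.63 is taken as given and only the critical value is recomputed.

```python
from statistics import NormalDist

# Reported in the abstract; not recomputed here because the counts are not given.
z_observed = 4.63
alpha = 0.01  # 1% significance level

# One-sided critical value z_alpha of the standard normal distribution (assumption).
z_critical = NormalDist().inv_cdf(1 - alpha)

# The null hypothesis is rejected when |Z| >= z_alpha.
reject_null = abs(z_observed) >= z_critical
print(f"z_critical = {z_critical:.2f}, reject H0: {reject_null}")
# prints: z_critical = 2.33, reject H0: True
```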

Disambiguation of Homograph Suffixes using Lexical Semantic Network(U-WIN) (어휘의미망(U-WIN)을 이용한 동형이의어 접미사의 의미 중의성 해소)

  • Bae, Young-Jun;Ock, Cheol-Young
    • KIPS Transactions on Software and Data Engineering / v.1 no.1 / pp.31-42 / 2012
  • To process Korean suffix-derived nouns, most Korean processing systems register them in a dictionary. However, this approach is limited because suffixes are highly productive. It is therefore necessary to analyze unregistered suffix-derived nouns semantically. In this paper, we propose a method to disambiguate homograph suffixes using the Korean lexical semantic network (U-WIN) for the semantic analysis of suffix-derived nouns. The experiments used 33,104 suffix-derived nouns containing homograph suffixes from the morphologically and semantically tagged Sejong Corpus. We first semantically tagged the homograph suffixes, extracted the roots of the suffix-derived nouns, and mapped the roots to nodes in U-WIN. We then assigned distance weights to the U-WIN nodes that can combine with each homograph suffix and used these weights to disambiguate the homograph suffixes. Experiments on the 35 homograph suffixes that occur in the Sejong Corpus, out of 49 homograph suffixes in a Korean dictionary, yielded 91.01% accuracy.

Improvement of Korean Homograph Disambiguation using Korean Lexical Semantic Network (UWordMap) (한국어 어휘의미망(UWordMap)을 이용한 동형이의어 분별 개선)

  • Shin, Joon-Choul;Ock, Cheol-Young
    • Journal of KIISE / v.43 no.1 / pp.71-79 / 2016
  • Disambiguation of homographs is an important task in Korean semantic processing and has been researched for a long time. Recently, machine learning approaches have demonstrated good results in accuracy and speed, while knowledge-based approaches are being researched for untrained words. This paper proposes a hybrid method based on the machine learning approach that uses a lexical semantic network. The hybrid approach creates an additional corpus from subcategorization information and trains on it. The homograph tagging phase uses the hypernym of the homograph together with the additional corpus. Experiments with the Sejong Corpus and UWordMap show that the hybrid method is effective, increasing accuracy from 96.51% to 96.52%.

An Evaluation of Translation Quality by Homograph Disambiguation in Korean-X Neural Machine Translation Systems (한-X 신경기계번역시스템에서 동형이의어 분별에 따른 변역질 평가)

  • Nguyen, Quang-Phuoc;Shin, Joon-Choul;Ock, Cheol-Young
    • Annual Conference on Human and Language Technology / 2018.10a / pp.504-509 / 2018
  • Neural machine translation (NMT) has recently achieved state-of-the-art performance. However, it is reported to fail at word sense disambiguation (WSD) for several popular language pairs. In this paper, we explore the extent to which NMT systems are able to disambiguate Korean homographs. Homographs, words with different meanings but the same written form, cause word choice problems for NMT systems. Consistent with the popular language pairs, we find that NMT systems fail to translate Korean homographs correctly. We provide a Korean word sense disambiguation tool, UTagger, to improve NMT translation quality. We conducted translation experiments on the Korean-English and Korean-Vietnamese language pairs. The experimental results show that UTagger can significantly improve the translation quality of NMT in terms of the BLEU, TER, and DLRATIO evaluation metrics.


Semantic Information Retrieval Based on User-Word Intelligent Network (U-WIN 기반의 의미적 정보검색 기술)

  • Im, Ji-Hui;Choi, Ho-Seop;Ock, Cheol-Young
    • Proceedings of the Korea Contents Association Conference / 2006.11a / pp.547-550 / 2006
  • An information retrieval system is judged by how accurately it retrieves the information that the user wants. A search that uses only a homograph returns either a variety of documents related to each meaning of the word or documents concentrated on one specific meaning. In this paper, we therefore propose a semantic information retrieval technique that uses relations within the User-Word Intelligent Network (U-WIN) to resolve query ambiguity. In our experiment, queries are divided into two classes, homographs used in terminology and general homographs, and the expanded query takes the form "query + hypernym". On two portal sites, the average precision was 73.5% for web document search alone and 70% for integrated search. This suggests that the U-WIN-based semantic information retrieval technique can be used efficiently in an IR system.
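
The "query + hypernym" expansion described in the abstract above can be illustrated with a short sketch. The `HYPERNYMS` table and the `expand_query` function are hypothetical stand-ins for a U-WIN lookup, and the English entries are illustrative only; none of them come from the paper.

```python
# Hypothetical hypernym table standing in for U-WIN lookups (illustrative entries only).
HYPERNYMS = {
    ("bank", "finance"): "financial institution",
    ("bank", "river"): "slope",
}


def expand_query(query: str, sense: str) -> str:
    """Return "query + hypernym" for the chosen sense, or the bare query if none is known."""
    hypernym = HYPERNYMS.get((query, sense))
    return f"{query} {hypernym}" if hypernym else query


print(expand_query("bank", "finance"))  # prints: bank financial institution
print(expand_query("bank", "unknown"))  # prints: bank
```

Appending the hypernym narrows the search toward one meaning of the homograph, which is the effect the abstract attributes to the expanded query form.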


A Study on the Description of Relationships and Homographs in Terms of Creator and Work in the Korean Thesaurus (한글 시소러스에서 저자와 저작에 대한 관계 설정과 동형 이의어의 기술)

  • Han, Sang-Kil;Choi, Suk-Doo
    • Journal of the Korean Society for Library and Information Science / v.45 no.4 / pp.139-155 / 2011
  • Failing to distinguish homographs when describing relations among individual authors and relations of authorship (i.e., distinguishing persons with the same name, or persons sharing the same literary author's name) makes it difficult to retrieve exact information, because relations of automorphism cannot be formed between the two sets mentioned above. It is therefore necessary to establish criteria or tools for distinguishing homographs in order to retrieve more exact information. In the past, individual libraries, documents, and portal sites in Korea made some efforts to develop authority data to solve the homograph problem. Developing authority data at an individual institution was very difficult without criteria or rules at the national level for clarifying the homograph problem. This study develops ways of recognizing individual names, including subject words and proper nouns. The results present methods of distinguishing and describing homographs between individual author sets and authorship sets, focusing particularly on the arts and popular culture.

Korean Homograph Tagging Model based on Sub-Word Conditional Probability (부분어절 조건부확률 기반 동형이의어 태깅 모델)

  • Shin, Joon Choul;Ock, Cheol Young
    • KIPS Transactions on Software and Data Engineering / v.3 no.10 / pp.407-420 / 2014
  • In general, Korean morphological analysis is divided into two steps. In the first step, the ambiguity generation step, an eojeol is analyzed into many candidate morpheme sequences. In the second step, one appropriate candidate is chosen using contextual information. A Hidden Markov Model (HMM) is typically applied in the second step. This paper proposes the Sub-word Conditional Probability (SCP) model as an alternative algorithm. SCP first uses sub-word information from adjacent eojeols. If that fails, SCP uses morpheme information in a restricted way. In a comparative test of accuracy and speed, HMM's accuracy is 96.49% and SCP's accuracy is only 0.07% lower, while SCP reduces processing time by 53%.
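
The idea of choosing a candidate from adjacent sub-word evidence, with a fallback when that evidence is missing, can be sketched as a toy count-based lookup. This is not the SCP model itself: the abstract does not specify its features or its fallback, so the `train` and `choose` functions, the back-off to overall frequency, and the English placeholder data are all assumptions made for illustration.

```python
from collections import Counter


def train(pairs):
    """Count (adjacent sub-word, analysis) pairs and analyses alone from tagged examples."""
    joint, prior = Counter(), Counter()
    for neighbor_subword, analysis in pairs:
        joint[(neighbor_subword, analysis)] += 1
        prior[analysis] += 1
    return joint, prior


def choose(candidates, neighbor_subword, joint, prior):
    """Pick the candidate most often seen with this neighbor; back off to overall frequency."""
    # For a fixed neighbor, the joint count is proportional to P(candidate | neighbor).
    seen = {c: joint[(neighbor_subword, c)] for c in candidates}
    if any(seen.values()):
        return max(candidates, key=lambda c: seen[c])
    # Assumed fallback: the neighbor was never observed with any candidate.
    return max(candidates, key=lambda c: prior[c])


# Placeholder training data (hypothetical, not from the Sejong Corpus).
pairs = [("pen", "write"), ("pen", "write"), ("hat", "wear"), ("hat", "wear"), ("hat", "wear")]
joint, prior = train(pairs)
print(choose(["write", "wear"], "pen", joint, prior))     # prints: write
print(choose(["write", "wear"], "unseen", joint, prior))  # prints: wear
```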

The Prosodic Characteristics of Utterance of Sentences with Ambiguous Word in Patients with Neurogenic Communication Disorders (어휘적 중의성 문장 발화 시 신경언어장애인의 운율 특성)

  • Lee, Myoung-Soon;Kwon, Do-Ha
    • Phonetics and Speech Sciences / v.1 no.1 / pp.87-91 / 2009
  • The purpose of this study was to examine the prosodic characteristics of ambiguous sentences uttered by patients with neurogenic communication disorders. Ambiguous words on which prosody may have an impact were used to investigate this matter. Tone duration, pitch, and intensity were analyzed to examine the characteristics of prosody in patients with lesions in the left or right hemisphere and in normal controls. The whole process was recorded using Praat 4.3.14, and for statistical analysis, two-way ANOVA and multiple comparison analyses were carried out using SPSS 10.0 for Windows. The conclusions of this study are as follows. The length of the vowel in Korean homographs differed depending on the meaning, and vowel duration was longest in patients with lesions in the left hemisphere. This agrees with the finding that such patients have problems with the timing of prosody (Danly & Shapiro, 1982). On the other hand, patients with lesions in the right hemisphere were found to have a deficiency in pitch variability. Among various acoustic parameters, this study focused on duration, which is closely related to the suprasegmental characteristics of prosody. More acoustic parameters should be taken into account in future studies.


Study on 3 DoF Image and Video Stitching Using Sensed Data

  • Kim, Minwoo;Chun, Jonghoon;Kim, Sang-Kyun
    • KSII Transactions on Internet and Information Systems (TIIS) / v.11 no.9 / pp.4527-4548 / 2017
  • This paper proposes a method to generate panoramic images by combining conventional feature extraction algorithms (e.g., SIFT, SURF, MPEG-7 CDVS) with sensed data from inertial sensors to enhance the stitching results. Image stitching becomes more challenging when the images are taken from two different mobile phones with no posture calibration. Using inertial sensor data obtained by the mobile phone, images with different yaw, pitch, and roll angles are preprocessed and adjusted before the stitching process. The stitching performance (e.g., feature extraction time, number of inlier points, stitching accuracy) of the conventional feature extraction algorithms is reported, along with the stitching performance with and without the inertial sensor data. In addition, the stitching accuracy of video data was improved using the same sensed data, with discrete calculation of the homography matrix. The experimental results for stitching accuracy and speed using sensed data are presented in this paper.

Word sense disambiguation using dynamic sized context and distance weighting (가변 크기 문맥과 거리가중치를 이용한 동형이의어 중의성 해소)

  • Lee, Hyun Ah
    • Journal of Advanced Marine Engineering and Technology / v.38 no.4 / pp.444-450 / 2014
  • Most research on word sense disambiguation has used a static-sized context regardless of sentence patterns. This paper proposes using a dynamic-sized context that considers sentence patterns and the distance between words for word sense disambiguation. We evaluated our system on 12 words in 32,735 sentences from the Sejong POS- and sense-tagged corpus, and the dynamic-sized context showed 92.2% average accuracy for predicates, which is better than the accuracy of the static-sized context.
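
Distance weighting over a context window can be sketched as follows. The abstract does not give the paper's weighting formula or how the window size is derived from sentence patterns, so the 1/distance weight, the `score_senses` function, the cue-word sets, and the English example sentence are all assumptions for illustration. The caller supplies `window`, which is where a dynamic size would be plugged in.

```python
def score_senses(tokens, target_index, sense_cues, window):
    """Score each sense by summing 1/distance over its cue words inside the window."""
    scores = {sense: 0.0 for sense in sense_cues}
    lo = max(0, target_index - window)
    hi = min(len(tokens), target_index + window + 1)
    for i in range(lo, hi):
        if i == target_index:
            continue
        weight = 1.0 / abs(i - target_index)  # assumed weighting: closer words count more
        for sense, cues in sense_cues.items():
            if tokens[i] in cues:
                scores[sense] += weight
    return scores


# Hypothetical example: "bank" at index 4, with illustrative cue words per sense.
tokens = ["deposit", "money", "at", "the", "bank", "near", "the", "river"]
cues = {"finance": {"deposit", "money"}, "river": {"river"}}

wide = score_senses(tokens, 4, cues, window=4)
narrow = score_senses(tokens, 4, cues, window=2)
print(max(wide, key=wide.get))  # prints: finance
print(narrow)                   # prints: {'finance': 0.0, 'river': 0.0}
```

With the narrow window no cue word is in range and both scores are zero, which shows why the choice of window size matters for the result.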