• Title/Summary/Keyword: Morpheme


Syntactic Morpheme Generation Using Morpheme Dictionary (형태소 사전 기반 구문 형태소 생성)

  • Park, In-Cheol
    • Journal of the Korea Computer Industry Society / v.6 no.5 / pp.725-734 / 2005
  • Syntactic morphemes were proposed to reduce the number of morpheme units generated by a Korean morpheme analyzer, and they have been shown to remarkably diminish the overhead of the syntactic analyzer. However, in the existing system, syntactic morpheme generation is separated from the morpheme analysis phase, so it needs extra execution time. Moreover, the existing method does not consider spacing-free statements. In this paper, we propose a syntactic morpheme generation method using a morpheme dictionary in order to resolve these problems. Experiments show that the proposed method reduces generation time by a factor of more than one hundred compared with the existing one.


Modeling Cross-morpheme Pronunciation Variation for Korean LVCSR (한국어 연속음성인식을 위한 형태소 경계에서의 발음 변화 현상 모델링)

  • Lee Kyong-Nim;Chung Minhwa
    • Proceedings of the KSPS conference / 2003.05a / pp.75-78 / 2003
  • In this paper, we describe a cross-morpheme pronunciation variation model which is especially useful for constructing a morpheme-based pronunciation lexicon for Korean LVCSR. Many pronunciation variations occur at morpheme boundaries in continuous speech. Since phonemic context, together with morphological category and morpheme boundary information, affects Korean pronunciation variations, we have distinguished pronunciation variation rules according to their location: within a morpheme, across a morpheme boundary in a compound noun, across a morpheme boundary in an eojeol, and across an eojeol boundary. In a 33K-morpheme Korean CSR experiment, an absolute improvement of 1.16% in WER from the baseline performance of 23.17% WER is achieved by modeling cross-morpheme pronunciation variations with a context-dependent multiple pronunciation lexicon.
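The WER figures above are stated as an absolute reduction. As a rough illustration of what that means, the sketch below derives the resulting WER and the corresponding relative reduction from the two numbers in the abstract. The function name `wer_after_reduction` is hypothetical and is not taken from the paper.

```python
def wer_after_reduction(baseline_wer, absolute_reduction):
    """Return (new WER, relative reduction in percent) for an absolute WER drop.

    Both inputs are percentages, e.g. 23.17 for a WER of 23.17%.
    """
    new_wer = baseline_wer - absolute_reduction
    relative = 100.0 * absolute_reduction / baseline_wer
    return new_wer, relative


# Figures reported in the abstract: 23.17% baseline WER, 1.16% absolute reduction.
new_wer, relative = wer_after_reduction(23.17, 1.16)
print(f"new WER: {new_wer:.2f}%, relative reduction: {relative:.1f}%")
```

An absolute reduction of about one point on a baseline above 20% corresponds to a relative reduction of roughly 5%, which is the more useful figure when comparing against systems with different baselines.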


Modeling Cross-morpheme Pronunciation Variations for Korean Large Vocabulary Continuous Speech Recognition (한국어 연속음성인식 시스템 구현을 위한 형태소 단위의 발음 변화 모델링)

  • Chung Minhwa;Lee Kyong-Nim
    • MALSORI / no.49 / pp.107-121 / 2004
  • In this paper, we describe a cross-morpheme pronunciation variation model which is especially useful for constructing a morpheme-based pronunciation lexicon to improve the performance of Korean LVCSR. Many pronunciation variations occur at morpheme boundaries in continuous speech. Since phonemic context, together with morphological category and morpheme boundary information, affects Korean pronunciation variations, we have distinguished the phonological rules that apply to phonemes within a morpheme from those that apply across morpheme boundaries. The results of 33K-morpheme Korean CSR experiments show that an absolute reduction of 1.45% in WER from the baseline performance of 18.42% WER was achieved by modeling the proposed pronunciation variations with a context-dependent multiple pronunciation lexicon.


A Morpheme Analyzer based on Transformer using Morpheme Tokens and User Dictionary (사용자 사전과 형태소 토큰을 사용한 트랜스포머 기반 형태소 분석기)

  • DongHyun Kim;Do-Guk Kim;ChulHui Kim;MyungSun Shin;Young-Duk Seo
    • Smart Media Journal / v.12 no.9 / pp.19-27 / 2023
  • Since morphemes are the smallest units of meaning in Korean, an accurate morpheme analyzer is necessary to improve the performance of Korean language models. However, most existing analyzers produce morpheme analysis results by learning from word-unit tokens as input. Because Korean words consist of postpositions and affixes attached to a root, the meaning of words with the same root tends to change depending on the postpositions or affixes. Therefore, learning morphemes from word-unit tokens can lead to misclassification of postpositions or affixes. In this paper, we use morpheme-level tokens to grasp the inherent meaning of Korean sentences and propose a morpheme analyzer based on a sequence generation method using Transformer. In addition, a user dictionary is constructed from corpus data to solve the out-of-vocabulary problem. In the experiment, the morphemes and morpheme tags output by each morpheme analyzer were compared with the correct answer data, and the results showed that the morpheme analyzer presented in this paper performed better than the existing morpheme analyzers.

Korean Broadcast News Transcription Using Morpheme-based Recognition Units

  • Kwon, Oh-Wook;Alex Waibel
    • The Journal of the Acoustical Society of Korea / v.21 no.1E / pp.3-11 / 2002
  • Broadcast news transcription is one of the hardest tasks in speech recognition because broadcast speech signals vary widely in speech quality, channel, and background conditions. We developed a Korean broadcast news speech recognizer. We used a morpheme-based dictionary and language model to reduce the out-of-vocabulary (OOV) rate. We concatenated original morpheme pairs of short length or high frequency in order to reduce insertion and deletion errors due to short morphemes. We used a lexicon with multiple pronunciations to reflect inter-morpheme pronunciation variations without severe modification of the search tree. By using the merged morphemes as recognition units, we achieved an OOV rate of 1.7%, comparable to European languages with a 64k vocabulary. We implemented a hidden Markov model-based recognizer with vocal tract length normalization and online speaker adaptation by maximum likelihood linear regression. Experimental results showed that the recognizer yielded a 21.8% morpheme error rate for anchor speech and 31.6% for mostly noisy reporter speech.
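The merging of short or frequent morpheme pairs described above can be sketched roughly as follows. This is a minimal illustration only, not the authors' procedure: the function `merge_morpheme_pairs`, the thresholds `max_len` and `min_count`, and the greedy left-to-right strategy are all assumptions made for the example.

```python
from collections import Counter


def merge_morpheme_pairs(sentences, max_len=1, min_count=2):
    """Greedily concatenate adjacent morpheme pairs that are short or frequent.

    sentences: list of morpheme lists (assumed already segmented).
    max_len:   a pair counts as "short" if either morpheme has at most
               this many characters (hypothetical threshold).
    min_count: a pair counts as "frequent" if it occurs at least this
               many times in the corpus (hypothetical threshold).
    """
    # Count every adjacent morpheme pair over the whole corpus.
    pair_counts = Counter(
        (a, b) for sent in sentences for a, b in zip(sent, sent[1:])
    )
    merged_corpus = []
    for sent in sentences:
        out, i = [], 0
        while i < len(sent):
            if i + 1 < len(sent):
                a, b = sent[i], sent[i + 1]
                short = len(a) <= max_len or len(b) <= max_len
                frequent = pair_counts[(a, b)] >= min_count
                if short or frequent:
                    out.append(a + b)  # merged recognition unit
                    i += 2
                    continue
            out.append(sent[i])
            i += 1
        merged_corpus.append(out)
    return merged_corpus


# Toy example: one-character morphemes are attached to a neighbor.
print(merge_morpheme_pairs([["학교", "에", "갔", "다"]]))
```

Merging in this way lengthens the recognition units, which is the property the abstract relies on to reduce insertion and deletion errors; a real system would also have to regenerate the pronunciation lexicon and language model over the merged units.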

A Comparative Study of the Trisyllabic Words with same form-morpheme and same meaning in Modern Chinese and the Trisyllabic Korean Words Written in Chinese Characters with same form-morpheme and same meaning (현대 중국어의 삼음사(三音詞)와 현용 한국 삼음절(三音節) 한자어(漢字語)의 동형(同形) 동소어(同素語) 비교 연구)

  • CHOE, GEUM DAN
    • Cross-Cultural Studies / v.25 / pp.743-773 / 2011
  • In this research, the author compares 4,791 trisyllabic modern Chinese words from "a dictionary for trisyllabic modern Chinese word" with the corresponding Korean words written in Chinese characters, drawn from the 170,000 entries collected in "new age new Korean dictionary". As a result, a total of 407 corresponding pairs were found, falling into the following 3 types: 1) Chinese : Korean 3(2) : 3 syllable Chinese-character words with completely the same form-morpheme and the same meaning, use, and class (376 pairs, 92.38% of 407); 2) Chinese : Korean 3 : 3 syllable Chinese-character words with completely the same form-morpheme and partly the same meaning, use, and class (18 pairs, 4.42% of 407); 3) Chinese : Korean 3 : 3 syllable Chinese-character words with completely the same form-morpheme and different meaning, use, and class (13 pairs, 3.19% of 407).

A Comparative Study of New HSK and Entry-Level of TOPIK Written in Sino-Korean in the same form and morpheme of vocabularies (신(新)HSK와 초급용(初級用) TOPIK 어휘 중의 중한(中韓) 동형(同形) 동소(同素) 한자(漢字) 어휘의 비교 연구)

  • Choe, Geum Dan
    • Cross-Cultural Studies / v.30 / pp.187-222 / 2013
  • In this study, 702 Sino-Korean words, which account for 45% of the whole TOPIK vocabulary, are selected from the 1,560 standard entry-level TOPIK words. In addition, the words with the same form and morpheme are sorted out by comparing them with the 5,000 words of the New HSK vocabulary in terms of Sino-Korean morpheme, arrangement of morphemes, meaning, and usage. These are categorized into three types: completely the same form-morpheme and the same meaning, use, and class (189 pairs); completely the same form-morpheme and partly the same meaning, use, and class (28 pairs); and completely the same form-morpheme and different meaning, use, and class (10 pairs). Words of the first type, which account for 83.26% of them, are used in exactly the same way in both Chinese and Korean. With an accurate understanding of these words, both Chinese-speaking learners of Korean and Korean-speaking learners of Chinese can apply the words of their mother tongue to the acquisition of the target language and prepare more effectively for language proficiency tests.

Part-Of-Speech Tagging and the Recognition of the Korean Unknown-words Based on Machine Learning (기계학습에 기반한 한국어 미등록 형태소 인식 및 품사 태깅)

  • Choi, Maeng-Sik;Kim, Hark-Soo
    • The KIPS Transactions:PartB / v.18B no.1 / pp.45-50 / 2011
  • Unknown morpheme errors in Korean morphological analysis are divided into two types: errors in which a morphological analyzer entirely fails to return any morpheme sequence, and errors in which a morphological analyzer returns incorrect combinations of known morphemes. Most previous unknown morpheme estimation techniques have focused only on the former. This paper proposes an unknown morpheme estimation method which can handle both types of unknown morpheme errors. The proposed method detects Eojeols (Korean spacing units) that may include unknown morpheme errors using an SVM (Support Vector Machine). Then, using CRFs (Conditional Random Fields), it segments morphemes from the detected Eojeols and annotates the segmented morphemes with new POS tags. In the experiments, the proposed method outperformed the conventional method based on the longest matching of functional words. Based on the experimental results, we found that the second type of error should be dealt with in order to increase the performance of Korean morphological analysis.

Effects of orthographic and morphological frequency of a syllable in Korean word recognition (한국어 음절의 표기빈도와 형태소빈도가 단어인지에 미치는 효과)

  • Yi, Kwang-Oh;Bae, Sung-Bong
    • Korean Journal of Cognitive Science / v.20 no.3 / pp.309-333 / 2009
  • Two experiments were conducted to examine the role of Kulja and morphemes in processing two-syllable Sino-Korean words. In Experiment 1, the effects of morphemic frequency were not significant at the initial and final positions of a word, while Kulja frequency and Kulja-morpheme correspondence at both positions in a word had a significant impact on the processing of nonwords. Lexical decision times were longer for nonwords with high-frequency Kulja and for nonwords with ambiguous Kulja-morpheme correspondence, whose Kulja can go with many different morphemes. In Experiment 2, Kulja-morpheme correspondence was examined for words as well as nonwords. Lexical decisions were slower for stimuli with ambiguous Kulja-morpheme correspondence. The effect was more stable for nonwords, which replicated the result of Experiment 1. In sum, the results of this study suggest that words with ambiguous Kulja-morpheme correspondence activate many different morphemes, and competition among these morphemic candidates slows down the lexical selection process. Kulja frequency, Kulja neighborhood, morphemic frequency, morphological neighborhood, and Kulja-morpheme correspondence in Korean word recognition are also discussed.


Building a Morpheme-Based Pronunciation Lexicon for Korean Large Vocabulary Continuous Speech Recognition (한국어 대어휘 연속음성 인식용 발음사전 자동 생성 및 최적화)

  • Lee Kyong-Nim;Chung Minhwa
    • MALSORI / v.55 / pp.103-118 / 2005
  • In this paper, we describe a morpheme-based pronunciation lexicon useful for Korean LVCSR. The phonemic-context-dependent multiple pronunciation lexicon improves recognition accuracy when cross-morpheme pronunciation variations are distinguished from within-morpheme pronunciation variations. Since adding all possible pronunciation variants to the lexicon increases the lexicon size and the confusability between lexical entries, we have developed a lexicon pruning scheme for optimal selection of pronunciation variants to improve the performance of Korean LVCSR. With the proposed pronunciation lexicon, an absolute reduction of 0.56% in WER from the baseline performance of 27.39% WER is achieved by a cross-morpheme pronunciation variation model with a phonemic-context-dependent multiple pronunciation lexicon. At the best performance, an additional reduction of the lexicon size by 5.36% is achieved for the same lexical entries.
