• Title/Summary/Keyword: morphemes

A Method to Classify and Recognize Spelling Changes between Morphemes of a Korean Word (한국어 어절의 철자변화 현상 분류와 인식 방법)

  • 김덕봉
    • Journal of KIISE:Software and Applications / v.30 no.5_6 / pp.476-486 / 2003
  • Part-of-speech tagged corpora of Korean contain no explicit spelling-change information. This makes it difficult to acquire data for studying Korean morphology, e.g., to construct a dictionary for morphological analysis automatically or to collect spelling-change phenomena from the corpora systematically. To solve this problem, this paper presents a method that recognizes spelling changes between the morphemes of a Korean word in tagged corpora using only string matching, without a dictionary or phonological rules. Because it uses no phonological rules, the method recognizes spelling changes robustly, and it can be implemented at low cost. In experiments on a large tagged Korean corpus, the method accurately recognized 100% of the spelling changes in the corpus.
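
The abstract does not spell out the matching procedure. Below is a minimal sketch of how string matching alone can locate a spelling change, assuming syllable-level comparison (the paper may instead compare at the jamo level): concatenate the tagged morphemes, strip the longest prefix and suffix shared with the surface eojeol, and what remains on each side is the changed span. The function name `find_spelling_change` is hypothetical.

```python
from typing import Optional, Sequence, Tuple


def find_spelling_change(
    surface: str, morphemes: Sequence[str]
) -> Optional[Tuple[str, str]]:
    """Compare an eojeol with the plain concatenation of its tagged morphemes.

    Returns (surface_span, morpheme_span) covering the region where the two
    strings differ, or None when the eojeol is a plain concatenation.
    Assumption: comparison is done syllable by syllable, not jamo by jamo.
    """
    joined = "".join(morphemes)
    if surface == joined:
        return None
    limit = min(len(surface), len(joined))
    # Longest common prefix of the two strings.
    p = 0
    while p < limit and surface[p] == joined[p]:
        p += 1
    # Longest common suffix that does not overlap the prefix.
    s = 0
    while s < limit - p and surface[-1 - s] == joined[-1 - s]:
        s += 1
    return surface[p:len(surface) - s], joined[p:len(joined) - s]


print(find_spelling_change("했다", ["하", "였", "다"]))  # ('했', '하였')
print(find_spelling_change("먹었다", ["먹", "었", "다"]))  # None
```

An empty surface span signals elision (e.g., 가 analyzed as 가 + 아 yields `('', '아')`), so contraction and deletion fall out of the same comparison without any phonological rule.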

Effects of orthographic and morphological frequency of a syllable in Korean word recognition (한국어 음절의 표기빈도와 형태소빈도가 단어인지에 미치는 효과)

  • Yi, Kwang-Oh;Bae, Sung-Bong
    • Korean Journal of Cognitive Science / v.20 no.3 / pp.309-333 / 2009
  • Two experiments were conducted to examine the roles of the Kulja (the written syllable) and the morpheme in processing two-syllable Sino-Korean words. In Experiment 1, morphemic frequency had no significant effect at either the initial or the final position of a word, whereas Kulja frequency and Kulja-morpheme correspondence at both positions significantly affected the processing of nonwords. Lexical decision times were longer for nonwords with high-frequency Kulja and for nonwords with ambiguous Kulja-morpheme correspondence, i.e., whose Kulja can map to many different morphemes. In Experiment 2, Kulja-morpheme correspondence was examined for words as well as nonwords. Lexical decisions were slower for stimuli with ambiguous Kulja-morpheme correspondence, and the effect was more stable for nonwords, replicating the result of Experiment 1. In sum, the results suggest that words with ambiguous Kulja-morpheme correspondence activate many different morphemes, and that competition among these morphemic candidates slows down lexical selection. The roles of Kulja frequency, Kulja neighborhood, morphemic frequency, morphological neighborhood, and Kulja-morpheme correspondence in Korean word recognition are also discussed.

Postprocessing of A Speech Recognition using the Morphological Analysis Technique (형태소 분석 기법을 이용한 음성 인식 후처리)

  • 박미성;김미진;김계성;김성규;이문희;최재혁;이상조
    • Journal of the Korean Institute of Telematics and Electronics C / v.36C no.4 / pp.65-77 / 1999
  • Two problems must be solved before continuous speech recognition results can be passed to natural language processing techniques. First, spoken units do not coincide with the spacing units of written text. Second, phonological alternations occur within and between morphemes when words are pronounced. In this paper, we implement a postprocessing system for continuous speech recognition that first solves these two problems with an eojeol generator and a syllable recoverer, then morphologically analyzes the generated results, and finally corrects failed results with a corrector. We evaluated the system on two kinds of speech corpus: primary school textbooks and editorials. The success rate was 93.72% for the former and 92.26% for the latter, which indicates that the system performs stably regardless of the type of corpus.

A High-Speed Korean Morphological Analysis Method based on Pre-Analyzed Partial Words (부분 어절의 기분석에 기반한 고속 한국어 형태소 분석 방법)

  • Yang, Seung-Hyun;Kim, Young-Sum
    • Journal of KIISE:Software and Applications / v.27 no.3 / pp.290-301 / 2000
  • Most morphological analysis methods repeatedly perform input character code conversion, segmentation and lemmatization of constituent morphemes, and filtering of candidate results through lexicon look-ups, which makes them inefficient at run time. To alleviate this, many systems have introduced the notion of 'pre-analysis' of words. However, methods based on a pre-analysis dictionary of surface words have a critical practical drawback: the dictionary grows indefinitely as it tries to cover all words. This paper hybridizes the two extreme approaches to overcome the problems of both, and presents a morphological analysis method based on pre-analysis of partial words. Under this hybrid scheme, most of the computational overhead, such as segmentation and lemmatization of morphemes, is shifted to the process of building the pre-analysis dictionaries, and run-time dictionary look-ups are greatly reduced, which improves the run-time performance of the system. Moreover, overheads such as input character code conversion are avoided because the method involves no grapheme-level processing.
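
A minimal sketch of run-time analysis with pre-analyzed partial words, under stated assumptions: each word is split into a head and a tail, each part is looked up in a toy pre-analysis dictionary (`HEAD_DICT` and `TAIL_DICT`, both hypothetical), and tags follow the Sejong tagset. The abstract does not give the paper's actual dictionary layout or split strategy.

```python
from typing import Dict, List, Tuple

Analysis = List[Tuple[str, str]]  # list of (morpheme, tag)

# Hypothetical toy dictionaries. A real system would build these offline
# from a tagged corpus; that offline step is where segmentation and
# lemmatization cost is moved.
HEAD_DICT: Dict[str, Analysis] = {
    "학교": [("학교", "NNG")],
    "먹": [("먹", "VV")],
}
TAIL_DICT: Dict[str, Analysis] = {
    "": [],  # empty tail: the head is the whole word
    "에서": [("에서", "JKB")],
    "었다": [("었", "EP"), ("다", "EF")],
}


def analyze(word: str) -> List[Analysis]:
    """Return every analysis obtained by splitting `word` into a pre-analyzed
    head and a pre-analyzed tail, using only dictionary look-ups."""
    results: List[Analysis] = []
    for i in range(1, len(word) + 1):
        head, tail = word[:i], word[i:]
        if head in HEAD_DICT and tail in TAIL_DICT:
            results.append(HEAD_DICT[head] + TAIL_DICT[tail])
    return results


print(analyze("먹었다"))  # [[('먹', 'VV'), ('었', 'EP'), ('다', 'EF')]]
```

Each split point costs two hash look-ups and no character-code conversion, while the two tables stay far smaller than a dictionary of whole surface words.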

A Filtering Method of Malicious Comments Through Morpheme Analysis (형태소 분석을 통한 악성 댓글 필터링 방안)

  • Ha, Yeram;Cheon, Junseok;Wang, Inseo;Park, Minuk;Woo, Gyun
    • The Journal of the Korea Contents Association / v.21 no.9 / pp.750-761 / 2021
  • Although reply comments on Internet articles have positive effects on discussion and communication, malicious comments remain a serious problem, at times even driving people to their deaths. Automatic detection of malicious comments is therefore important. However, current filtering methods based on forbidden words are not very effective, especially for comments written in Korean. This paper proposes a new filtering approach based on morphological analysis that identifies coarse and polite morphemes. From these two groups of morphemes, the soundness of a comment can be calculated. The paper further proposes various impact measures for comments based on soundness. Experiments on malicious comments show that one of these impact measures is effective for detecting them. Compared with the clean-bot of a portal site, the method improves recall by 37.93 percentage points and F-measure by up to 47.66 points. These results suggest that filtering based on morphological analysis is a promising alternative to filtering based on forbidden words.
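
The abstract does not give the soundness formula. The sketch below assumes a simple normalized difference between polite and coarse morpheme counts over an already-analyzed comment; the morpheme lists `COARSE` and `POLITE` and the threshold rule `is_malicious` are hypothetical, and the paper's impact measures are not reproduced.

```python
from typing import Iterable

# Hypothetical morpheme groups; the paper builds its own lists.
COARSE = {"놈", "꺼지"}
POLITE = {"습니다", "ㅂ니다", "요", "시"}


def soundness(morphemes: Iterable[str]) -> float:
    """Assumed score in [-1, 1]: +1 if only polite morphemes occur,
    -1 if only coarse ones, 0 if neither group occurs."""
    coarse = polite = 0
    for m in morphemes:
        if m in COARSE:
            coarse += 1
        elif m in POLITE:
            polite += 1
    total = coarse + polite
    return 0.0 if total == 0 else (polite - coarse) / total


def is_malicious(morphemes: Iterable[str], threshold: float = 0.0) -> bool:
    """Hypothetical rule: flag comments whose soundness is below threshold."""
    return soundness(morphemes) < threshold


print(soundness(["감사", "하", "ㅂ니다"]))  # 1.0
print(is_malicious(["꺼지", "어"]))  # True
```

Because the input is a morpheme sequence rather than raw text, inflected or spaced-out variants of a coarse stem still map to the same morpheme, which is the advantage the abstract claims over forbidden-word matching.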

Two-Path Language Modeling Considering Word Order Structure of Korean (한국어의 어순 구조를 고려한 Two-Path 언어모델링)

  • Shin, Joong-Hwi;Park, Jae-Hyun;Lee, Jung-Tae;Rim, Hae-Chang
    • The Journal of the Acoustical Society of Korea / v.27 no.8 / pp.435-442 / 2008
  • The n-gram model suits languages such as English, whose word order is grammatically rigid, but it is less suitable for Korean, whose word order is relatively free. Previous work proposed a two-ply HMM that reflected characteristics of Korean but did not capture the word-order structure between words. In this paper, we define a new segment unit that combines two words in order to capture the word-order characteristics of adjacent words that appear with verbal morphemes. We also propose a two-path language model that estimates probabilities according to context, based on the proposed segment unit. Experimental results show that the proposed model improves perplexity by 25.68% over previous Korean language models and reduces perplexity by 94.03% when predicting verbal morphemes where words are combined.
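
The figures above are relative perplexity improvements. As a worked sketch of those two quantities (not of the two-path model itself), perplexity over N predicted tokens is exp(-(1/N) Σ log p_i), and a relative improvement compares a proposed model's perplexity against a baseline's. The function names are illustrative.

```python
import math
from typing import Sequence


def perplexity(token_probs: Sequence[float]) -> float:
    """exp of the average negative log-probability of the predicted tokens."""
    n = len(token_probs)
    return math.exp(-sum(math.log(p) for p in token_probs) / n)


def relative_improvement(baseline_pp: float, proposed_pp: float) -> float:
    """Percentage by which the proposed model lowers perplexity."""
    return 100.0 * (baseline_pp - proposed_pp) / baseline_pp


# A model that assigns probability 1/4 to every token has perplexity 4.
print(perplexity([0.25, 0.25, 0.25, 0.25]))
print(relative_improvement(200.0, 100.0))  # 50.0
```

Under this reading, a 25.68% improvement means the proposed model's perplexity is about 0.74 times the baseline's on the same test data.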

Korean Broadcast News Transcription Using Morpheme-based Recognition Units

  • Kwon, Oh-Wook;Alex Waibel
    • The Journal of the Acoustical Society of Korea / v.21 no.1E / pp.3-11 / 2002
  • Broadcast news transcription is one of the hardest tasks in speech recognition because broadcast speech varies widely in speech quality, channel, and background conditions. We developed a Korean broadcast news speech recognizer. We used a morpheme-based dictionary and language model to reduce the out-of-vocabulary (OOV) rate. We merged morpheme pairs that were short or frequent in order to reduce the insertion and deletion errors caused by short morphemes. We used a lexicon with multiple pronunciations to reflect inter-morpheme pronunciation variations without severely modifying the search tree. By using merged morphemes as recognition units, we achieved an OOV rate of 1.7% with a 64k vocabulary, comparable to European languages. We implemented a hidden Markov model-based recognizer with vocal tract length normalization and online speaker adaptation by maximum likelihood linear regression. Experimental results showed a morpheme error rate of 21.8% for anchor speech and 31.6% for mostly noisy reporter speech.
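
A minimal sketch of the unit-merging idea and of the OOV rate. It assumes a hypothetical selection rule: an adjacent morpheme pair is merged if both morphemes are at most `max_len` syllables long or the pair occurs at least `min_count` times. The abstract states neither the paper's thresholds nor its exact merging procedure.

```python
from collections import Counter
from typing import List, Sequence, Set, Tuple

Pair = Tuple[str, str]


def select_merge_pairs(corpus: Sequence[Sequence[str]],
                       max_len: int = 1, min_count: int = 3) -> Set[Pair]:
    """Hypothetical rule: merge an adjacent pair if both morphemes are short
    (<= max_len syllables) or the pair is frequent (>= min_count)."""
    counts = Counter(
        (a, b) for sent in corpus for a, b in zip(sent, sent[1:])
    )
    return {
        (a, b) for (a, b), c in counts.items()
        if (len(a) <= max_len and len(b) <= max_len) or c >= min_count
    }


def merge(sent: Sequence[str], pairs: Set[Pair]) -> List[str]:
    """Greedily replace selected pairs by single units, left to right."""
    out: List[str] = []
    i = 0
    while i < len(sent):
        if i + 1 < len(sent) and (sent[i], sent[i + 1]) in pairs:
            out.append(sent[i] + sent[i + 1])
            i += 2
        else:
            out.append(sent[i])
            i += 1
    return out


def oov_rate(tokens: Sequence[str], vocab: Set[str]) -> float:
    """Fraction of test tokens missing from the recognition vocabulary."""
    return sum(t not in vocab for t in tokens) / len(tokens)


print(merge(["가", "았", "다"], {("았", "다")}))  # ['가', '았다']
```

Merging trades a slightly larger vocabulary for longer acoustic units, which is why the abstract reports both the OOV rate and the reduction in insertion and deletion errors.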

Grammatical morphemes' effect on Korean word vector generation (형식형태소가 한국어 단어 벡터 생성에 미치는 영향)

  • Youn, Junyoung;Kim, Dowon;Min, Tae Hong;Lee, Jae Sung
    • Annual Conference on Human and Language Technology / 2017.10a / pp.179-183 / 2017
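  • Word vectors not only allow relations between words to be computed with vector operations but are also widely used as pretraining data for higher-level neural network programs. Because Korean eojeols contain productive postpositional particles and endings, it is difficult to generate word vectors from them efficiently, so Korean word vectors are usually generated from content morphemes only. In this paper, we propose a method that uses both content and grammatical morphemes but classifies the grammatical morphemes appropriately to improve word-vector performance. An evaluation of extraction performance on a word-relation test set that we built ourselves showed that performance improved when grammatical morphemes were used with the proposed method.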
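
A minimal preprocessing sketch for the approach above. It assumes Sejong-style tags and a hypothetical grouping table `TAG_CLASS`; the abstract does not specify how the paper classifies grammatical morphemes. Content morphemes stay as `morpheme/tag` tokens, and grouped grammatical morphemes become class tokens.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical grouping of grammatical (functional) tags into class tokens.
# Content morphemes, and any grammatical tag not listed here, keep the
# "morpheme/tag" form.
TAG_CLASS: Dict[str, str] = {
    "JKS": "<SUBJ>", "JKO": "<OBJ>", "JKB": "<ADV_CASE>",
    "EP": "<PREFINAL>", "EF": "<FINAL>", "EC": "<CONNECTIVE>",
}


def to_tokens(tagged: Sequence[Tuple[str, str]]) -> List[str]:
    """Turn a tagged sentence into training tokens for a word-vector model."""
    tokens: List[str] = []
    for morph, tag in tagged:
        if tag in TAG_CLASS:
            tokens.append(TAG_CLASS[tag])
        else:
            tokens.append(f"{morph}/{tag}")
    return tokens


sent = [("학생", "NNG"), ("이", "JKS"), ("책", "NNG"), ("을", "JKO"),
        ("읽", "VV"), ("었", "EP"), ("다", "EF")]
print(to_tokens(sent))
```

The resulting token lists could then be passed to any skip-gram or CBOW trainer, for example gensim's `Word2Vec`, so that content words are learned alongside compact grammatical context tokens instead of without them.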

Recent R&D Trends for Pretrained Language Model (딥러닝 사전학습 언어모델 기술 동향)

  • Lim, J.H.;Kim, H.K.;Kim, Y.K.
    • Electronics and Telecommunications Trends / v.35 no.3 / pp.9-19 / 2020
  • Recently, pretraining a deep learning language model on a large corpus and then fine-tuning it for each application task has become a widely used language processing technique. Pretrained language models achieve higher performance and better generalization than existing methods. This paper introduces the major research trends in deep learning pretrained language models in the field of language processing. We describe in detail the motivation, model, training methods, and results of BERT, which significantly influenced subsequent studies. We then introduce language models developed after BERT, focusing on SpanBERT, RoBERTa, ALBERT, BART, and ELECTRA. Finally, we introduce the KorBERT pretrained language model, which performs well on Korean. We also describe techniques for applying pretrained language models to Korean, an agglutinative language whose words combine content and functional morphemes, unlike English, an inflectional language whose word endings change with grammatical use.

Generating a Category Set of Words Using a Hierarchical Part-of-speech System and Tagged Corpus

  • Kojima, Takeyuki;Kotani, Yoshiyuki
    • Proceedings of the Korean Society for Language and Information Conference / 2002.02a / pp.217-226 / 2002
  • In this paper, we propose a method for generating a proper categorization of morphemes, given a hierarchical part-of-speech system and a corpus tagged with that system. Our method uses hierarchical information from the part-of-speech system and statistical information from the corpus to generate a category set. The statistical information is based on the contexts in which categories occur. First, we specify the format of the given information. Then, we describe an algorithm that generates a proper categorization. Finally, we present the results of experiments applying this method. We obtained a moderately proper categorization and identified several candidates for improvement.
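
The abstract does not describe the algorithm in detail, so what follows is only a hypothetical sketch of combining hierarchical and contextual information: walk the part-of-speech tree top-down, keep a node as a single category when all of its children have similar context distributions (cosine similarity at or above `threshold`), and otherwise split it. The tree, context counts, and threshold are invented for illustration.

```python
from collections import Counter
from math import sqrt
from typing import Dict, List


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    na = sqrt(sum(v * v for v in a.values()))
    nb = sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def context_of(node: str, children: Dict[str, List[str]],
               leaf_contexts: Dict[str, Counter]) -> Counter:
    """Context counts of a node: the sum of the counts of its leaves."""
    kids = children.get(node, [])
    if not kids:
        return leaf_contexts.get(node, Counter())
    total: Counter = Counter()
    for k in kids:
        total += context_of(k, children, leaf_contexts)
    return total


def categorize(node: str, children: Dict[str, List[str]],
               leaf_contexts: Dict[str, Counter],
               threshold: float = 0.9) -> List[str]:
    """Return the category set rooted at `node`."""
    kids = children.get(node, [])
    if not kids:
        return [node]
    ctx = [context_of(k, children, leaf_contexts) for k in kids]
    alike = all(cosine(ctx[i], ctx[j]) >= threshold
                for i in range(len(kids)) for j in range(i + 1, len(kids)))
    if alike:
        return [node]  # children behave alike: keep the coarser category
    out: List[str] = []
    for k in kids:
        out.extend(categorize(k, children, leaf_contexts, threshold))
    return out


tree = {"POS": ["Noun", "Verb"], "Noun": ["CommonNoun", "ProperNoun"],
        "Verb": ["ActionVerb", "StativeVerb"]}
contexts = {"CommonNoun": Counter(particle=10, det=2),
            "ProperNoun": Counter(particle=9, det=1),
            "ActionVerb": Counter(ending=10, adverb=3),
            "StativeVerb": Counter(ending=8, det=5)}
print(categorize("POS", tree, contexts))  # ['Noun', 'ActionVerb', 'StativeVerb']
```

Here the two noun subtypes occur in nearly identical contexts and collapse into `Noun`, while the verb subtypes differ enough (cosine about 0.81) to stay separate.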