• Title/Summary/Keyword: Morpheme Information

Modeling Cross-morpheme Pronunciation Variation for Korean LVCSR (한국어 연속음성인식을 위한 형태소 경계에서의 발음 변화 현상 모델링)

  • Lee Kyong-Nim;Chung Minhwa
    • Proceedings of the KSPS conference / 2003.05a / pp.75-78 / 2003
  • In this paper, we describe a cross-morpheme pronunciation variation model that is especially useful for constructing a morpheme-based pronunciation lexicon for Korean LVCSR. Many pronunciation variations occur at morpheme boundaries in continuous speech. Since phonemic context, together with morphological category and morpheme boundary information, affects Korean pronunciation variation, we distinguish pronunciation variation rules according to where they apply: within a morpheme, across a morpheme boundary in a compound noun, across a morpheme boundary in an eojeol, and across an eojeol boundary. In a 33K-morpheme Korean CSR experiment, modeling cross-morpheme pronunciation variations with a context-dependent multiple pronunciation lexicon yields an absolute WER improvement of 1.16% over the baseline WER of 23.17%.

An Efficient Korean Morpheme Analyzer and Synthesizer using Dictionary Information and Chart Data Structure (사전 정보와 차트 자료 구조를 이용한 효율적인 형태소 분석기 및 합성기(KoMAS))

  • 김정해;이상조
    • Journal of the Korean Institute of Telematics and Electronics B / v.31B no.3 / pp.123-131 / 1994
  • This paper describes the analysis and synthesis of the morphemes that make up Korean word phrases. To analyze morphemes, we propose introducing "morph" for morpheme features in the lexicon and using chart data structures. The analyzer suppresses the generation of unnecessary morphemes and extracts every possible morpheme unit in a word phrase, minimizing lexicon lookups by using heuristic information. To synthesize morphemes, the system combines every possible analyzed morpheme in a word phrase, taking advantage of the speech and union information available to the program. The synthesis of analyzed morphemes is thus designed to aid syntactic analysis, the next step of natural language processing. The system generates a word phrase by unifying the syntactic and semantic features of the analyzed morphemes in the lexicon, and it was implemented in C on a personal computer.

Modeling Cross-morpheme Pronunciation Variations for Korean Large Vocabulary Continuous Speech Recognition (한국어 연속음성인식 시스템 구현을 위한 형태소 단위의 발음 변화 모델링)

  • Chung Minhwa;Lee Kyong-Nim
    • MALSORI / no.49 / pp.107-121 / 2004
  • In this paper, we describe a cross-morpheme pronunciation variation model that is especially useful for constructing a morpheme-based pronunciation lexicon to improve the performance of Korean LVCSR. Many pronunciation variations occur at morpheme boundaries in continuous speech. Since phonemic context, together with morphological category and morpheme boundary information, affects Korean pronunciation variation, we distinguish phonological rules that apply to phonemes within a morpheme from those that apply across morpheme boundaries. The results of 33K-morpheme Korean CSR experiments show that modeling the proposed pronunciation variations with a multiple context-dependent pronunciation lexicon achieved an absolute WER reduction of 1.45% from the baseline WER of 18.42%.
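
The reported figures can be put in context with a little arithmetic. The sketch below is only an illustration, not part of the paper: it derives the resulting WER and the relative reduction from the baseline and the absolute reduction quoted in the abstract. The function name `wer_summary` is hypothetical.

```python
def wer_summary(baseline_wer: float, absolute_reduction: float) -> tuple[float, float]:
    """Return (WER after the change, relative reduction), both in percent."""
    new_wer = baseline_wer - absolute_reduction
    relative = 100.0 * absolute_reduction / baseline_wer
    # Round to two decimals to match the precision used in the abstract.
    return round(new_wer, 2), round(relative, 2)


# Figures quoted in the abstract: 18.42% baseline WER, 1.45% absolute reduction.
new_wer, relative = wer_summary(18.42, 1.45)
print(f"WER after modeling: {new_wer}% (relative reduction: {relative}%)")
```

With these inputs the resulting WER is 16.97%, a relative reduction of about 7.9%.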

Morphological analysis of spoken Korean using Viterbi search (Viterbi 검색 기법을 이용한 한국어 음성 언어의 형태소 분석)

  • 김병창
    • Proceedings of the Acoustical Society of Korea Conference / 1995.06a / pp.200-203 / 1995
  • This paper proposes a spoken Korean processing model that is extensible to a large vocabulary continuous spoken Korean system. Integrating phoneme-level speech recognition with natural language processing can support sophisticated phonological and morphological analysis. The model consists of a diphone speech recognizer, a Viterbi dictionary searcher, and a morpheme connectivity information checker. Two-level hierarchical TDNNs recognize newly defined Korean diphones. The Viterbi dictionary searcher segments the diphone sequences and converts them into the most probable morpheme sequences. The morpheme connectivity information checker then examines these morpheme sequences, and the correct morpheme sequence with the greatest probability is selected. Experiments show that morphological analysis of spoken Korean achieves an 80.6% success rate on 328 Eojeols.
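
The dictionary search step can be illustrated with a generic Viterbi-style segmentation. The sketch below is not the paper's implementation: it assumes a toy lexicon that maps symbol strings (Latin letters standing in for diphone sequences) to probabilities, and it omits the TDNN recognizer and the morpheme connectivity checker. The name `viterbi_segment` and the lexicon contents are hypothetical.

```python
import math


def viterbi_segment(symbols, lexicon):
    """Split `symbols` into lexicon entries with the highest total probability.

    `lexicon` maps a symbol string to its probability (assumed independent).
    Returns a list of entries, or None if no full segmentation exists.
    """
    n = len(symbols)
    max_len = max(map(len, lexicon), default=0)
    best = [-math.inf] * (n + 1)  # best log-probability of a segmentation ending at i
    back = [None] * (n + 1)       # start index of the last entry in that segmentation
    best[0] = 0.0
    for end in range(1, n + 1):
        for start in range(max(0, end - max_len), end):
            piece = symbols[start:end]
            if piece in lexicon and best[start] > -math.inf:
                score = best[start] + math.log(lexicon[piece])
                if score > best[end]:
                    best[end], back[end] = score, start
    if best[n] == -math.inf:
        return None
    pieces, pos = [], n
    while pos > 0:  # follow back-pointers from the end of the input
        pieces.append(symbols[back[pos]:pos])
        pos = back[pos]
    return pieces[::-1]


toy_lexicon = {"a": 0.3, "b": 0.3, "ab": 0.4, "c": 0.5, "bc": 0.2}
print(viterbi_segment("abc", toy_lexicon))
```

A connectivity check like the one described in the abstract could be added by rejecting or penalizing adjacent entries whose categories may not combine; that step is left out here.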

Classification of Education Video by Subtitle Analysis (자막 분석을 통한 교육 영상의 카테고리 분류 방안)

  • Lee, Ji-Hoon;Lee, Hyeon Sup;Kim, Jin-Deog
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference / 2021.05a / pp.88-90 / 2021
  • This paper introduces a method that analyzes subtitles extracted from lecture videos with a Korean morpheme analyzer and classifies video categories according to the extracted morpheme information. In some cases incorrect information is entered due to human error and reflected in the characteristics of the items, affecting the accuracy of the recommendation system. To prevent this, we generate a keyword table for each category using morpheme information extracted from pre-classified videos, compare the morphemes of a lecture video against each category's keyword table, and assign the category whose keyword table is most similar. By reducing human intervention, the system classifies videos directly and aims to increase the accuracy of the system.
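
The keyword-table comparison can be sketched as follows. The abstract does not say which similarity measure is used, so cosine similarity over morpheme counts is an assumption made here for illustration. The morpheme analyzer is out of scope, so the inputs are already lists of morphemes. All function names and the sample data are hypothetical.

```python
import math
from collections import Counter


def build_keyword_tables(labeled_videos):
    """Build one morpheme-count table per category from (category, morphemes) pairs."""
    tables = {}
    for category, morphemes in labeled_videos:
        tables.setdefault(category, Counter()).update(morphemes)
    return tables


def cosine(a, b):
    """Cosine similarity between two count tables (0.0 if either is empty)."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def classify(morphemes, tables):
    """Return the category whose keyword table is most similar to the video's morphemes."""
    query = Counter(morphemes)
    return max(tables, key=lambda category: cosine(query, tables[category]))


tables = build_keyword_tables([
    ("math", ["integral", "derivative", "limit"]),
    ("history", ["dynasty", "war", "treaty"]),
])
print(classify(["limit", "derivative", "proof"], tables))
```

Other measures, such as Jaccard overlap or TF-IDF weighting, would slot into `classify` in the same way.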

Implementation of A Morphological Analyzer Based on Pseudo-morpheme for Large Vocabulary Speech Recognizing (대어휘 음성인식을 위한 의사형태소 분석 시스템의 구현)

  • 양승원
    • Journal of Korea Society of Industrial Information Systems / v.4 no.2 / pp.102-108 / 1999
  • Choosing the processing unit is important in a large vocabulary speech recognition system. We propose the pseudo-morpheme as the recognition unit to resolve the problems of recognition systems that use the phrase or the general morpheme. We implement a morphological analysis system and a tagger for pseudo-morphemes. A speech processing system using the pseudo-morpheme can obtain better results than systems using the phrase or the general morpheme, so the quality of the whole spoken language translation system can be improved. The analysis ratio of our implemented system is similar to that of common morphological analysis systems.

Error Correction in Korean Morpheme Recovery using Deep Learning (딥 러닝을 이용한 한국어 형태소의 원형 복원 오류 수정)

  • Hwang, Hyunsun;Lee, Changki
    • Journal of KIISE / v.42 no.11 / pp.1452-1458 / 2015
  • Korean morphological analysis is a difficult process. Because Korean is an agglutinative language, one of the most important steps in morphological analysis is morpheme recovery. Methods using heuristic rules and pre-analyzed partial words have been examined for this step, but their performance is limited because they do not use contextual information. In this study, we built a Korean morpheme recovery system using deep learning, with word embeddings to exploit contextual information. In recovering the morphemes '들/VV' and '듣/VV', the system showed 97.97% accuracy, outperforming an SVM (Support Vector Machine), which showed 96.22% accuracy.

Part-Of-Speech Tagging and the Recognition of the Korean Unknown-words Based on Machine Learning (기계학습에 기반한 한국어 미등록 형태소 인식 및 품사 태깅)

  • Choi, Maeng-Sik;Kim, Hark-Soo
    • The KIPS Transactions:PartB / v.18B no.1 / pp.45-50 / 2011
  • Unknown morpheme errors in Korean morphological analysis are divided into two types: errors in which a morphological analyzer entirely fails to return any morpheme sequence, and errors in which it returns incorrect combinations of known morphemes. Most previous unknown morpheme estimation techniques have focused only on the former. This paper proposes an unknown morpheme estimation method that can handle both types. The proposed method uses an SVM (Support Vector Machine) to detect Eojeols (Korean spacing units) that may include unknown morpheme errors. Then, using CRFs (Conditional Random Fields), it segments the detected Eojeols into morphemes and annotates the segmented morphemes with new POS tags. In the experiments, the proposed method outperformed the conventional method based on the longest matching of functional words. The experimental results show that the second type of error should be dealt with in order to increase the performance of Korean morphological analysis.

Morpheme Conversion for Korean Text-to-Sign Language Translation System (한국어-수화 번역시스템을 위한 형태소 변환)

  • Park, Su-Hyun;Kang, Seok-Hoon;Kwon, Hyuk-Chul
    • The Transactions of the Korea Information Processing Society / v.5 no.3 / pp.688-702 / 1998
  • In this paper, we propose sign language morpheme generation rules corresponding to morpheme analysis for each part of speech. Korean natural sign language has an extremely limited vocabulary, and the number of grammatical components currently in use is limited, too. Therefore, we define a natural sign language grammar corresponding to Korean grammar in order to translate natural Korean sentences into the corresponding sign language. Each phrase should define a sign language morpheme generation grammar, which is different from the Korean analysis grammar. This grammar is then applied to the morpheme analysis/combination rules and the sentence structure analysis rules. Defining this grammar lets us generate the most natural sign language.

A Stochastic Word-Spacing System Based on Word Category-Pattern (어절 내의 형태소 범주 패턴에 기반한 통계적 자동 띄어쓰기 시스템)

  • Kang, Mi-Young;Jung, Sung-Won;Kwon, Hyuk-Chul
    • Journal of KIISE:Software and Applications / v.33 no.11 / pp.965-978 / 2006
  • This paper implements an automatic Korean word-spacing system based on word-recognition using morpheme unigrams and the pattern that the categories of those morpheme unigrams share within a candidate word. Although previous work on Korean word-spacing models has produced the advantages of easy construction and time efficiency, there still remain problems, such as data sparseness and critical memory size, which arise from the morpho-typological characteristics of Korean. In order to cope with both problems, our implementation uses the stochastic information of morpheme unigrams, and their category patterns, instead of word unigrams. A word's probability in a sentence is obtained based on morpheme probability and the weight for the morpheme's category within the category pattern of the candidate word. The category weights are trained so as to minimize the error means between the observed probabilities of words and those estimated by words' individual-morphemes' probabilities weighted according to their categories' powers in a given word's category pattern.
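
One way to picture the scoring step is shown below. The abstract does not give the exact formula, so this sketch assumes that a candidate word's log-probability is the sum of its morphemes' unigram log-probabilities, each multiplied by a weight for the morpheme's category within the word's category pattern. The training of the weights is omitted. The function name, the romanized morphemes, the category labels, and all numbers are hypothetical.

```python
import math


def word_log_prob(morphemes, unigram_prob, category_weight):
    """Score a candidate word given as a list of (morpheme, category) pairs.

    Assumed form: sum over morphemes of weight * log(unigram probability),
    where the weight is looked up by (category pattern, category).
    """
    pattern = tuple(category for _, category in morphemes)
    total = 0.0
    for morpheme, category in morphemes:
        weight = category_weight[(pattern, category)]
        total += weight * math.log(unigram_prob[morpheme])
    return total


# Hypothetical data: a noun followed by a particle.
unigram_prob = {"hakgyo": 0.01, "eul": 0.05}
category_weight = {(("N", "J"), "N"): 1.0, (("N", "J"), "J"): 0.5}
score = word_log_prob([("hakgyo", "N"), ("eul", "J")], unigram_prob, category_weight)
print(score)
```

A word-spacing system built on this idea would compare such scores across the candidate segmentations of a sentence and keep the best-scoring one.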