• 제목/요약/키워드: Pronunciation modeling

검색결과 25건 처리시간 0.024초

한국어 연속음성인식 시스템 구현을 위한 형태소 단위의 발음 변화 모델링 (Modeling Cross-morpheme Pronunciation Variations for Korean Large Vocabulary Continuous Speech Recognition)

  • 정민화;이경님
    • 대한음성학회지:말소리
    • /
    • 제49호
    • /
    • pp.107-121
    • /
    • 2004
  • In this paper, we describe a cross-morpheme pronunciation variation model which is especially useful for constructing morpheme-based pronunciation lexicon to improve the performance of a Korean LVCSR. There are a lot of pronunciation variations occurring at morpheme boundaries in continuous speech. Since phonemic context together with morphological category and morpheme boundary information affect Korean pronunciation variations, we have distinguished phonological rules that can be applied to phonemes in within-morpheme and cross-morpheme. The results of 33K-morpheme Korean CSR experiments show that an absolute reduction of 1.45% in WER from the baseline performance of 18.42% WER was achieved by modeling proposed pronunciation variations with a possible multiple context-dependent pronunciation lexicon.

  • PDF

한국인 화자의 외래어 발음 변이 양상과 음절 기반 외래어 자소-음소 변환 (Pronunciation Variation Patterns of Loanwords Produced by Korean and Grapheme-to-Phoneme Conversion Using Syllable-based Segmentation and Phonological Knowledge)

  • 류혁수;나민수;정민화
    • 말소리와 음성과학
    • /
    • 제7권3호
    • /
    • pp.139-149
    • /
    • 2015
  • This paper aims to analyze pronunciation variations of loanwords produced by Korean and improve the performance of pronunciation modeling of loanwords in Korean by using syllable-based segmentation and phonological knowledge. The loanword text corpus used for our experiment consists of 14.5k words extracted from the frequently used words in set-top box, music, and point-of-interest (POI) domains. At first, pronunciations of loanwords in Korean are obtained by manual transcriptions, which are used as target pronunciations. The target pronunciations are compared with the standard pronunciation using confusion matrices for analysis of pronunciation variation patterns of loanwords. Based on the confusion matrices, three salient pronunciation variations of loanwords are identified such as tensification of fricative [s] and derounding of rounded vowel [ɥi] and [$w{\varepsilon}$]. In addition, a syllable-based segmentation method considering phonological knowledge is proposed for loanword pronunciation modeling. Performance of the baseline and the proposed method is measured using phone error rate (PER)/word error rate (WER) and F-score at various context spans. Experimental results show that the proposed method outperforms the baseline. We also observe that performance degrades when training and test sets come from different domains, which implies that loanword pronunciations are influenced by data domains. It is noteworthy that pronunciation modeling for loanwords is enhanced by reflecting phonological knowledge. The loanword pronunciation modeling in Korean proposed in this paper can be used for automatic speech recognition of application interface such as navigation systems and set-top boxes and for computer-assisted pronunciation training for Korean learners of English.

한국어 연속음성인식을 위한 형태소 경계에서의 발음 변화 현상 모델링 (Modeling Cross-morpheme Pronunciation Variation for Korean LVCSR)

  • 이경님;정민화
    • 대한음성학회:학술대회논문집
    • /
    • 대한음성학회 2003년도 5월 학술대회지
    • /
    • pp.75-78
    • /
    • 2003
  • In this paper, we describe a cross-morpheme pronunciation variation model which is especially useful for constructing morpheme-based pronunciation lexicon for Korean LVCSR. There are a lot of pronunciation variations occurring at morpheme boundaries in continuous speech. Since phonemic context together with morphological category and morpheme boundary information affect Korean pronunciation variations, we have distinguished pronunciation variation rules according to the locations such as within a morpheme, across a morpheme boundary in a compound noun, across a morpheme boundary in an eojeol, and across an eojeol boundary. In 33K-morpheme Korean CSR experiment, an absolute improvement of 1.16% in WER from the baseline performance of 23.17% WER is achieved by modeling cross-morpheme pronunciation variations with a context-dependent multiple pronunciation lexicon.

  • PDF

영어 발음교정시스템을 위한 발음사전 구축 (Pronunciation Dictionary for English Pronunciation Tutoring System)

  • 김효숙;김선주
    • 대한음성학회:학술대회논문집
    • /
    • 대한음성학회 2003년도 5월 학술대회지
    • /
    • pp.168-171
    • /
    • 2003
  • This study is about modeling pronunciation dictionary necessary for PLU(phoneme like unit) level word recognition. The recognition of nonnative speakers' pronunciation enables an automatic diagnosis and an error detection which are the core of English pronunciation tutoring system. The above system needs two pronunciation dictionaries. One is for representing standard English pronunciation. The other is for representing Korean speakers' English Pronunciation. Both dictionaries are integrated to generate pronunciation networks for variants.

  • PDF

한국어 대어휘 연속음성 인식용 발음사전 자동 생성 및 최적화 (Building a Morpheme-Based Pronunciation Lexicon for Korean Large Vocabulary Continuous Speech Recognition)

  • 이경님;정민화
    • 대한음성학회지:말소리
    • /
    • 제55권
    • /
    • pp.103-118
    • /
    • 2005
  • In this paper, we describe a morpheme-based pronunciation lexicon useful for Korean LVCSR. The phonemic-context-dependent multiple pronunciation lexicon improves the recognition accuracy when cross-morpheme pronunciation variations are distinguished from within-morpheme pronunciation variations. Since adding all possible pronunciation variants to the lexicon increases the lexicon size and confusability between lexical entries, we have developed a lexicon pruning scheme for optimal selection of pronunciation variants to improve the performance of Korean LVCSR. By building a proposed pronunciation lexicon, an absolute reduction of $0.56\%$ in WER from the baseline performance of $27.39\%$ WER is achieved by cross-morpheme pronunciation variations model with a phonemic-context-dependent multiple pronunciation lexicon. On the best performance, an additional reduction of the lexicon size by $5.36\%$ is achieved from the same lexical entries.

  • PDF

조음자질을 이용한 한국인 학습자의 영어 발화 자동 발음 평가 (Automatic pronunciation assessment of English produced by Korean learners using articulatory features)

  • 류혁수;정민화
    • 말소리와 음성과학
    • /
    • 제8권4호
    • /
    • pp.103-113
    • /
    • 2016
  • This paper aims to propose articulatory features as novel predictors for automatic pronunciation assessment of English produced by Korean learners. Based on the distinctive feature theory, where phonemes are represented as a set of articulatory/phonetic properties, we propose articulatory Goodness-Of-Pronunciation(aGOP) features in terms of the corresponding articulatory attributes, such as nasal, sonorant, anterior, etc. An English speech corpus spoken by Korean learners is used in the assessment modeling. In our system, learners' speech is forced aligned and recognized by using the acoustic and pronunciation models derived from the WSJ corpus (native North American speech) and the CMU pronouncing dictionary, respectively. In order to compute aGOP features, articulatory models are trained for the corresponding articulatory attributes. In addition to the proposed features, various features which are divided into four categories such as RATE, SEGMENT, SILENCE, and GOP are applied as a baseline. In order to enhance the assessment modeling performance and investigate the weights of the salient features, relevant features are extracted by using Best Subset Selection(BSS). The results show that the proposed model using aGOP features outperform the baseline. In addition, analysis of relevant features extracted by BSS reveals that the selected aGOP features represent the salient variations of Korean learners of English. The results are expected to be effective for automatic pronunciation error detection, as well.

한국어 연속음성 인식을 위한 발음열 자동 생성 (Automatic Generation of Pronunciation Variants for Korean Continuous Speech Recognition)

  • 이경님;전재훈;정민화
    • 한국음향학회지
    • /
    • 제20권2호
    • /
    • pp.35-43
    • /
    • 2001
  • 음성 인식이나 음성 합성시 필요한 발음열을 수작업으로 작성할 경우 작성자의 음운변화 현상에 대한 전문적 언어지식을 비롯하여 많은 시간과 노력이 요구되며 일관성을 유지하기도 쉽지 않다. 또한 한국어의 음운 변화 현상은 단일 형태소의 내부와 복합어에서 결합된 형태소의 경계점, 여러 형태소가 결합해서 한 어절을 이룰 경우 그 어절 내부의 형태소의 경계점, 여러 어절이 한 어절을 이룰 때 구성 어절의 경계점에서 서로 다른 적용 양상을 보인다. 본 논문에서는 이러한 문제를 해결하기 위해서 형태음운론적 분석에 기반하여 문자열을 자동으로 발음열로 변환하는 발음 생성 시스템을 제안하였다. 이 시스템은 한국어에서 빈번하게 발생하는 음운변화 현상의 분석을 통해 정의된 음소 변동 규칙과 변이음 규칙을 다단계로 적용하여 가능한 모든 발음열을 생성한다. 각 음운변화 규칙을 포함하는 대표적인 언절 리스트를 이용하여 구성된 시스템의 안정성을 검증하였고, 발음사전 구성과 학습용 발음열의 유용성을 인식 실험을 통해 평가하였다. 그 결과 표제어 사이의 음운변화 현상을 반영한 발음사전의 경우 5-6% 정도 나은 단어 인식률을 얻었으며, 생성된 발음열을 학습에 사용한 경우에서도 향상된 결과를 얻을 수 있었다.

  • PDF

운율 정보를 이용한 한국어 위치 정보 데이타의 발음 모델링 (Pronunciation Variation Modeling for Korean Point-of-Interest Data Using Prosodic Information)

  • 김선희;박전규;나민수;전재훈;정민화
    • 한국정보과학회논문지:소프트웨어및응용
    • /
    • 제34권2호
    • /
    • pp.104-111
    • /
    • 2007
  • 본 논문은 두 가지의 구조적 운율 정보, 즉 운율어와 음절수를 이용하여 한국어 위치 정보 데이타의 발음모델링을 수행할 경우에 음성인식기의 성능을 평가하는 것을 목표로 하는 이다. 먼저, 위치 정보 데이타가 운율어로 구성되어 있다는 전제 하에 운율어를 이용하여 위치 정보 데이타의 가능한 모든 발음을 생성하고, 다시 음절수를 기준으로 발음변이 수를 조절하는 방법을 제시하였다. 제안한 방법에 의하여 9개의 테스트 세트와 9개의 학습 세트로 총 81개의 실험을 통하여 음성인식의 성능을 평가하였다. 실험 결과 운율어를 이용하여 발음 사전을 제작한 모든 경우에 베이스라인과 비교하여 성능이 향상되었다. 음절수에 따라서 발음 변이의 수를 조절한 결과도 전체적으로는 3음절로 그 수를 제한한 경우에 가장 좋은 인식 성능을 얻을 수 있어서, 음절수에 따른 발음 변이 수의 조절이 효과적임을 알 수 있었다. 제안한 방법과 같이 운율어와 음절수를 이용한 경우에 베이스라인의 WER 4.63%에서 최대 8.4%의 WER가 감소하였다.

타언어권 화자 음성 인식을 위한 혼잡도에 기반한 다중발음사전의 최적화 기법 (Optimizing Multiple Pronunciation Dictionary Based on a Confusability Measure for Non-native Speech Recognition)

  • 김민아;오유리;김홍국;이연우;조성의;이성로
    • 대한음성학회지:말소리
    • /
    • 제65호
    • /
    • pp.93-103
    • /
    • 2008
  • In this paper, we propose a method for optimizing a multiple pronunciation dictionary used for modeling pronunciation variations of non-native speech. The proposed method removes some confusable pronunciation variants in the dictionary, resulting in a reduced dictionary size and less decoding time for automatic speech recognition (ASR). To this end, a confusability measure is first defined based on the Levenshtein distance between two different pronunciation variants. Then, the number of phonemes for each pronunciation variant is incorporated into the confusability measure to compensate for ASR errors due to words of a shorter length. We investigate the effect of the proposed method on ASR performance, where Korean is selected as the target language and Korean utterances spoken by Chinese native speakers are considered as non-native speech. It is shown from the experiments that an ASR system using the multiple pronunciation dictionary optimized by the proposed method can provide a relative average word error rate reduction of 6.25%, with 11.67% less ASR decoding time, as compared with that using a multiple pronunciation dictionary without the optimization.

  • PDF

Creation of the Conversion Table from Hangeul to the Roman Alphabet

  • Kim, Kyoung-Jing;Rhee, Sang-Burm
    • 대한전자공학회:학술대회논문집
    • /
    • 대한전자공학회 2002년도 ITC-CSCC -1
    • /
    • pp.321-324
    • /
    • 2002
  • For a rule-based conversion of Hangout into the Roman alphabet rather than a word-for-word conversion, one must come up with a faultless model for the Korean standard pronunciation rules, which are the basis of the Romanization. It is on this foundation that the Korean-Roman alphabet conversion table can be created. For linguistic modeling using PetriNet, modeling boundary and notation of modeling can be defined. In order to describe PetriNet, which is a dynamic modeling tool, as a static one, one can model the standard Korean pronunciation rules and the Hangout-Roman alphabet notation by conversion into incident matrix Thus, this research attempts to develop a mathematical modeling tool for a natural language using PetriNet, and create a Korean-Roman alphabet conversion table.

  • PDF