Search | Korea Science

On-line model compensation using noise masking effect for robust speech recognition (잡음 차폐를 이용한 온라인 모델 보상)

Jung Gue-Jun;Cho Hoon-Young;Oh Yung-Hwan
- Proceedings of the KSPS conference
- /
- 2003.05a
- /
- pp.215-218
- /
- 2003
In this paper, we apply PMC (parallel model combination) to an online speech recognition system. As a representative model-based noise compensation technique, PMC compensates for environmental mismatch by combining pretrained clean speech models with noise information estimated in real time. This approach is very effective for compensating extreme environmental mismatch, but its heavy computational cost makes it unsuitable for online systems. To reduce the computational cost and apply PMC online, we use the noise masking effect (the energy in a frequency band is dominated either by clean speech energy or by noise energy) in the model compensation process. Experiments on artificially produced noisy speech data confirm that the proposed technique is fast and effective for online model compensation.
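The noise masking idea can be illustrated with a minimal sketch. It assumes model means are expressed as per-band log energies and compares the exact log-domain sum of speech and noise with the masking approximation, which keeps only the dominant source in each band. The function names and the numeric values are hypothetical, and the abstract does not give the paper's actual compensation procedure (for example, how variances are handled).

```python
import math

def log_add_means(speech_log_energy, noise_log_energy):
    """Exact log-domain sum of speech and noise energy in each band."""
    return [math.log(math.exp(s) + math.exp(n))
            for s, n in zip(speech_log_energy, noise_log_energy)]

def masked_means(speech_log_energy, noise_log_energy):
    """Noise masking approximation: the louder source dominates each band."""
    return [max(s, n) for s, n in zip(speech_log_energy, noise_log_energy)]

# Hypothetical per-band log energies (illustrative values only).
clean_mean = [8.0, 6.5, 3.0, 1.0]
noise_mean = [2.0, 6.0, 5.5, 4.0]

exact = log_add_means(clean_mean, noise_mean)
approx = masked_means(clean_mean, noise_mean)

# The approximation error per band is log(1 + exp(-|s - n|)), at most log(2).
max_gap = max(e - a for e, a in zip(exact, approx))
```

The masking version needs only a comparison per band instead of exponentials and a logarithm, which is the kind of saving the abstract attributes to the masking effect.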

A Corpus Selection Based Approach to Language Modeling for Large Vocabulary Continuous Speech Recognition (대용량 연속 음성 인식 시스템에서의 코퍼스 선별 방법에 의한 언어모델 설계)

Oh, Yoo-Rhee;Yoon, Jae-Sam;Kim, Hong-Kook
- Proceedings of the KSPS conference
- /
- 2005.11a
- /
- pp.103-106
- /
- 2005
In this paper, we propose a language modeling approach to improve the performance of a large vocabulary continuous speech recognition system. The proposed approach is based on an active learning framework that helps to select a text corpus from the large amount of text data required for language modeling. Perplexity is used as the measure for corpus selection in the active learning. In recognition experiments on a continuous Korean speech task, the speech recognition system using the language model built with the proposed approach reduces the word error rate by about 6.6% and requires less computation than a system using a language model built from randomly selected texts.
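Perplexity-based corpus selection can be sketched as follows. This is a minimal illustration, not the paper's procedure: it assumes an add-one smoothed unigram model trained on a small seed corpus stands in for the real n-gram language model, and it assumes that candidate sentences with lower perplexity are the ones to keep (the abstract does not state the selection direction). All names and example sentences are hypothetical.

```python
import math
from collections import Counter

def train_unigram(sentences):
    """Count words in the seed corpus."""
    counts = Counter(w for s in sentences for w in s.split())
    return counts, sum(counts.values())

def perplexity(sentence, counts, total, vocab_size):
    """Perplexity of one sentence under an add-one smoothed unigram model."""
    words = sentence.split()
    log_prob = sum(math.log((counts[w] + 1) / (total + vocab_size))
                   for w in words)
    return math.exp(-log_prob / len(words))

def select_by_perplexity(seed, pool, keep):
    """Keep the `keep` pool sentences with the lowest perplexity."""
    counts, total = train_unigram(seed)
    vocab_size = len(set(counts) | {w for s in pool for w in s.split()})
    ranked = sorted(pool,
                    key=lambda s: perplexity(s, counts, total, vocab_size))
    return ranked[:keep]

seed = ["turn on the light", "turn off the radio"]
pool = ["turn on the radio",
        "stock prices fell sharply",
        "switch off the light"]
selected = select_by_perplexity(seed, pool, keep=2)
```

In an active learning loop, the selected sentences would be added to the training text and the model retrained before the next round of selection.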

Age and Sex Differences in Acoustic Parameters of Adult Voice. (성인기 이후 연령과 성에 따른 음향음성학적 특성)

Lee, Hyo-Jin;Kim, Soo-Jin
- Proceedings of the KSPS conference
- /
- 2005.11a
- /
- pp.141-144
- /
- 2005
The purpose of this study is to identify acoustic changes with age and to provide evaluation criteria for the elderly voice. A total of 120 Korean adults (three age groups × two sex groups) produced three sustained vowels, read a part of 'Taking a walk', and described a picture. The data were analyzed acoustically with MDVP of CSL. The results showed that 1) F0 showed the most statistically significant changes with sex and age among the parameters, whereas shimmer showed no significant change, and 2) the acoustic parameters changed from young adulthood to old age. Different patterns of change with aging were observed in men and women.

Standardization of XML based Meta-data for Industrial Speech Databases (산업용 음성 DB 메타데이터 표준화)

Joo, Young-Hee;Hong, Ki-Hyung
- Proceedings of the KSPS conference
- /
- 2005.11a
- /
- pp.211-214
- /
- 2005
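This paper introduces the current status of, and standardization activities for, XML-based metadata for industrial speech databases. Building an industrial speech DB requires a great deal of time and money, and developing high-quality speech processing systems (recognition/synthesis/authentication) requires as much speech data as possible. To facilitate the sharing and reuse of speech DBs built by different organizations, the standardization of industrial speech DB metadata began with requirements analysis in September 2004, and a draft was completed in March 2005. The draft standard defines the structure of speech DB metadata in XML and does not cover standardization targets other than structure, such as speech file names, speaker identifiers, and phoneme symbols. ETRI and SiTEC [5] have already proposed an XML-based standard for metadata structure and content. However, the structure proposed in [5] is flat and has drawbacks such as redundant content, so the speech DB data model here was designed in an object-oriented manner to address this.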
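The difference between a flat and an object-oriented metadata layout can be sketched as below. This is only an illustration of the redundancy argument: every element and attribute name (`SpeechDB`, `Speaker`, `Utterance`, `speakerRef`, and so on) is hypothetical and is not taken from the draft standard or from [5]. Speaker information is stored once and referenced by each utterance instead of being repeated on every record.

```python
import xml.etree.ElementTree as ET

def build_metadata(speakers, utterances):
    """Build hierarchical metadata: speakers defined once, then referenced."""
    root = ET.Element("SpeechDB")  # hypothetical root element name
    speakers_el = ET.SubElement(root, "Speakers")
    for speaker_id, info in speakers.items():
        ET.SubElement(speakers_el, "Speaker", id=speaker_id, **info)
    utterances_el = ET.SubElement(root, "Utterances")
    for file_name, speaker_id in utterances:
        # Each utterance points to its speaker rather than copying the fields.
        ET.SubElement(utterances_el, "Utterance",
                      file=file_name, speakerRef=speaker_id)
    return root

# Hypothetical example data.
speakers = {"spk001": {"sex": "F", "ageGroup": "20s"}}
utterances = [("u0001.wav", "spk001"), ("u0002.wav", "spk001")]

root = build_metadata(speakers, utterances)
xml_text = ET.tostring(root, encoding="unicode")
```

In a flat layout, the `sex` and `ageGroup` values would appear on both utterance records; here they appear once regardless of how many utterances the speaker contributes.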

Vocal Tract Area Estimation from Deaf and Normal Children's Speech (청각장애아 및 건청아 음성으로부터 성도 면적 추정)

Kim, Se-Hwan;Kwon, Oh-Wook
- Proceedings of the KSPS conference
- /
- 2005.11a
- /
- pp.51-54
- /
- 2005
This paper analyzes the vocal tract area estimation algorithm used as part of a speech analysis program that helps deaf children correct their pronunciation by comparing their vocal tract shape with that of normal-hearing children. Assuming that the vocal tract is a concatenation of cylindrical tubes with different cross sections, we compute the relative vocal tract area of each tube using the reflection coefficients obtained from linear predictive coding. We then obtain the absolute vocal tract area by computing the height of the lip opening with a formula modified for children's speech. Using speech data for five Korean vowels (/a/, /e/, /i/, /o/, and /u/), we investigate the effects of the sampling frequency, frame size, and model order. We compare the vocal tract shapes obtained from the speech of deaf and normal-hearing children.
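The step from linear prediction to relative tube areas can be sketched as follows. This is a minimal illustration under stated assumptions: reflection coefficients come from the Levinson-Durbin recursion on an autocorrelation sequence, the first tube is given a relative area of 1, and adjacent areas are related by the ratio (1 - k) / (1 + k). The sign convention and the direction of the recursion along the tract differ between textbooks, and the paper's lip-opening formula for absolute areas is not reproduced here.

```python
def reflection_coefficients(autocorr, order):
    """Levinson-Durbin recursion; returns reflection coefficients k_1..k_order."""
    a = [1.0] + [0.0] * order      # predictor polynomial A(z) = 1 + sum a_j z^-j
    error = autocorr[0]            # prediction error energy
    ks = []
    for i in range(1, order + 1):
        acc = autocorr[i] + sum(a[j] * autocorr[i - j] for j in range(1, i))
        k = -acc / error
        ks.append(k)
        prev = a[:]
        for j in range(1, i):
            a[j] = prev[j] + k * prev[i - j]
        a[i] = k
        error *= 1.0 - k * k
    return ks

def relative_areas(ks, first_area=1.0):
    """Relative tube areas, assuming A_next = A_current * (1 - k) / (1 + k)."""
    areas = [first_area]
    for k in ks:
        areas.append(areas[-1] * (1.0 - k) / (1.0 + k))
    return areas

# Hypothetical autocorrelation of a first-order process (illustrative only).
autocorr = [1.0, 0.5, 0.25]
ks = reflection_coefficients(autocorr, order=2)
areas = relative_areas(ks)
```

Because the areas are only relative, a separate measurement such as the lip opening mentioned in the abstract is needed to scale them to absolute values.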

Performance Evaluation of English word Pronunciation Correction system (한국인을 위한 영어 발음 교정 시스템에 대한 성능 평가)

Kim Mujung;Kim Hyosook;Kim Byunggi
- Proceedings of the KSPS conference
- /
- 2003.05a
- /
- pp.71-74
- /
- 2003
In this paper, we present experimental results for a computer-based English pronunciation correction system for Korean speakers. The aim of the system is to detect incorrectly pronounced phonemes in spoken words and to give corrective comments to users. Speech data were collected from 254 native speakers and 411 Koreans and then used for phoneme modeling and testing. We built two types of acoustic phoneme models: a native speaker model and a Korean speaker model. We also built language models to reflect mispronunciations commonly made by Koreans. The detection rate was over 90% for insertion, deletion, and replacement of phonemes, but under 75% for diphthong splitting and accents.
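The three error types named above (insertion, deletion, replacement) can be illustrated with a simple sequence alignment. This sketch is not the system described in the abstract, which relies on acoustic and language models. It assumes the target and recognized phoneme sequences are already available as lists and uses a standard edit-distance alignment to label the differences. The function name and phoneme symbols are hypothetical.

```python
def classify_errors(target, spoken):
    """Align two phoneme lists by edit distance and label each difference."""
    n, m = len(target), len(spoken)
    cost = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        cost[i][0] = i
    for j in range(1, m + 1):
        cost[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = cost[i - 1][j - 1] + (target[i - 1] != spoken[j - 1])
            cost[i][j] = min(diag, cost[i - 1][j] + 1, cost[i][j - 1] + 1)

    # Trace back from the end to recover one optimal alignment.
    errors = []
    i, j = n, m
    while i > 0 or j > 0:
        mismatch = i > 0 and j > 0 and target[i - 1] != spoken[j - 1]
        if i > 0 and j > 0 and cost[i][j] == cost[i - 1][j - 1] + mismatch:
            if mismatch:
                errors.append(("replacement", target[i - 1], spoken[j - 1]))
            i, j = i - 1, j - 1
        elif i > 0 and cost[i][j] == cost[i - 1][j] + 1:
            errors.append(("deletion", target[i - 1], None))
            i -= 1
        else:
            errors.append(("insertion", None, spoken[j - 1]))
            j -= 1
    errors.reverse()
    return errors

# Hypothetical examples: an inserted vowel, then a replacement plus a deletion.
inserted = classify_errors(["s", "t", "r", "ay", "k"],
                           ["s", "u", "t", "r", "ay", "k"])
mixed = classify_errors(["r", "ay", "t"], ["l", "ay"])
```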

Language Model Adaptation for Conversational Speech Recognition (대화체 연속음성 인식을 위한 언어모델 적응)

Park Young-Hee;Chung Minhwa
- Proceedings of the KSPS conference
- /
- 2003.05a
- /
- pp.83-86
- /
- 2003
This paper presents our style-based language model adaptation for Korean conversational speech recognition. Compared with written text corpora, Korean conversational speech exhibits various characteristics of content and style, such as filled pauses, word omission, and contraction. For style-based language model adaptation, we report two approaches. Our approaches focus on improving the estimation of domain-dependent n-gram models by relevance weighting of out-of-domain text data, where style is represented by n-gram based tf*idf similarity. In addition to relevance weighting, we use disfluencies as predictors of the neighboring words. The best result reduces the word error rate by 6.5% absolute and shows that n-gram based relevance weighting reflects style differences well and that disfluencies are good predictors.
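Relevance weighting by n-gram tf*idf similarity can be sketched as follows. This is a minimal illustration with several assumptions: bigrams are used as the n-gram unit, idf is computed as log(N / df) + 1 (one common variant), and the relevance weight of each out-of-domain document is its cosine similarity to the in-domain text. The abstract does not specify these details, and the example sentences are hypothetical.

```python
import math
from collections import Counter

def bigrams(text):
    """Word bigrams of a whitespace-tokenized text."""
    words = text.split()
    return list(zip(words, words[1:]))

def tfidf_vectors(docs):
    """Bigram tf*idf vector for each document."""
    tfs = [Counter(bigrams(d)) for d in docs]
    df = Counter(g for tf in tfs for g in tf)
    n = len(docs)
    return [{g: c * (math.log(n / df[g]) + 1.0) for g, c in tf.items()}
            for tf in tfs]

def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(g, 0.0) for g, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def relevance_weights(in_domain, out_docs):
    """Similarity of each out-of-domain document to the in-domain text."""
    vectors = tfidf_vectors([in_domain] + out_docs)
    return [cosine(vectors[0], v) for v in vectors[1:]]

in_domain = "uh i mean i wanna go uh you know"
out_docs = ["uh you know i wanna say",
            "the committee approved the annual budget"]
weights = relevance_weights(in_domain, out_docs)
```

The resulting weights could then scale the n-gram counts of each out-of-domain document before they are merged with the in-domain counts, so that conversational-style text contributes more than formal text.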

Readability Enhancement of English Speech Recognition Output Using Automatic Capitalisation Classification (자동 대소문자 식별을 이용한 영어 음성인식 결과의 가독성 향상)

Kim, Ji-Hwan
- MALSORI
- /
- no.61
- /
- pp.101-111
- /
- 2007
A modified speech recogniser has been proposed for automatic capitalisation generation to improve the readability of English speech recognition output. In this modified speech recogniser, every word in its vocabulary is duplicated: once in a de-capitalised form and again in the capitalised forms. In addition, its language model is re-trained on mixed-case texts. To evaluate the performance of the proposed system, automatic capitalisation generation experiments were performed on 3 hours of Broadcast News (BN) test data using the modified HTK BN transcription system. The proposed system produced an F-measure of 0.7317 for automatic capitalisation generation, with an SER of 48.55, a precision of 0.7736, and a recall of 0.6942.
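The reported scores can be cross-checked with a short calculation. The sketch assumes the F-measure is the balanced harmonic mean of precision and recall (the abstract does not state the weighting). The function name is hypothetical.

```python
def f_measure(precision, recall):
    """Balanced F-measure: harmonic mean of precision and recall."""
    return 2.0 * precision * recall / (precision + recall)

# Precision and recall as reported in the abstract.
f1 = f_measure(0.7736, 0.6942)
```

Under this assumption, the recomputed value agrees with the reported 0.7317 to within the rounding of the published precision and recall.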

A Study on the Sentence Final Tonal Patterns and the Meaning of English Wh-Questions (영어 의문사 의문문의 문미 억양 실현 양상과 의미 해석에 관한 연구)

Kim, Hwa-Young;Lee, Dong-Wha;Kim, Kee-Ho;Lee, Yong-Jae
- Speech Sciences
- /
- v.10 no.2
- /
- pp.319-338
- /
- 2003
The aim of this paper is to examine the sentence-final tonal patterns of English wh-questions through phonetic experiments, based on Intonational Phonology, and to explain the meaning of the final phrase tones of English wh-questions. Pierrehumbert and Hirschberg (1990) suggested that it is pitch accents rather than boundary tones that play a crucial role in the meaning of a sentence, and that most general questions have H-H% tonal patterns in sentence-final position. However, they could not explain why wh-questions have final falling tonal patterns (L-L%). Bartels (1999) suggested that the L phrase tone has the meaning of 'ASSERTION', which could explain the final tonal patterns of wh-questions. However, her suggestions are only theoretical explanations without any experimental support. In this paper, based on Bartels (1999), the data were classified into the following three classes: 1) echo wh-questions, 2) reference questions, and 3) common wh-questions. Using these data, a production test with three native English speakers was conducted. The results show that reference questions and common wh-questions have L phrase tones in sentence-final position at a high rate, and echo wh-questions have H phrase tones in sentence-final position at a high rate.

Developing a Korean Standard Speech DB (한국인 표준 음성 DB 구축)

Shin, Jiyoung;Jang, Hyejin;Kang, Younmin;Kim, Kyung-Wha
- Phonetics and Speech Sciences
- /
- v.7 no.1
- /
- pp.139-150
- /
- 2015
The purpose of this study is to develop a speech corpus for standard Korean speech. For the samples to viably represent the state of spoken Korean, demographic factors were considered to ensure a balanced spread of age, gender, and dialects. Nine separate regional dialects were categorized, and five age groups were established, covering individuals in their 20s to 60s. A speech-sample collection protocol was developed for this study in which each speaker performs five tasks: two reading tasks, two semi-spontaneous speech tasks, and one spontaneous speech task. This configuration allows rich and well-balanced speech samples to be gathered across various speech types and is expected to improve the utility of the speech corpus developed in this study. Samples from 639 individuals were collected using the protocol. Speech samples were also collected from other sources, for a combined total of samples from 1,012 individuals. The data accumulated in this database will be used to develop a speaker identification system. It may also be applied to, but is not limited to, the fields of phonetic studies, sociolinguistics, and language pathology. We plan to supplement the large-scale speech corpus next year, in terms of research methodology and content, to better answer the needs of diverse fields.
https://doi.org/10.13064/KSSS.2015.7.1.139
