• Title/Summary/Keyword: speech database

Speech rate in Korean across region, gender and generation (한국어 발화 속도의 지역, 성별, 세대에 따른 특징 연구)

  • Lee, Nara; Shin, Jiyoung; Yoo, Doyoung; Kim, KyungWha
    • Phonetics and Speech Sciences / v.9 no.1 / pp.27-39 / 2017
  • This paper examines how speech rate in Korean is affected by sociolinguistic factors such as region, gender, and generation. Speech rate was quantified as articulation rate (excluding physical pauses) and speaking rate (including physical pauses), both expressed in syllables per second (sps). Other measures such as pause frequency and duration were also examined. Four hundred twelve subjects were selected from the Korean Standard Speech Database according to age, gender, and region. The results show that generation has a significant effect on both speaking rate and articulation rate: younger speakers speak with significantly faster speaking and articulation rates than older speakers, and the mean duration of total pause intervals and the total number of pauses also differ significantly between the two groups. Gender has a significant effect only on articulation rate: male speakers' speech is characterized by a faster articulation rate together with longer and more frequent pauses. Finally, region has no effect on either speaking rate or articulation rate.
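The two rate measures above reduce to simple ratios; a minimal sketch with made-up numbers (the paper's actual syllable counts and pause segmentations come from the Korean Standard Speech Database):

```python
# Hypothetical illustration of the two speech-rate measures (sps = syllables/second).

def speaking_rate(n_syllables, total_duration_s):
    """Speaking rate: syllables per second, including physical pauses."""
    return n_syllables / total_duration_s

def articulation_rate(n_syllables, total_duration_s, pause_durations_s):
    """Articulation rate: syllables per second, excluding physical pauses."""
    return n_syllables / (total_duration_s - sum(pause_durations_s))

# Example: 30 syllables over 6 s of speech containing 1.0 s of total pause time.
sr = speaking_rate(30, 6.0)                   # 5.0 sps
ar = articulation_rate(30, 6.0, [0.6, 0.4])   # 6.0 sps
```

By construction the articulation rate is always at least the speaking rate, since removing pause time shrinks the denominator.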

Feature Compensation Method Based on Parallel Combined Mixture Model (병렬 결합된 혼합 모델 기반의 특징 보상 기술)

  • 김우일;이흥규;권오일;고한석
    • The Journal of the Acoustical Society of Korea / v.22 no.7 / pp.603-611 / 2003
  • This paper proposes an effective feature compensation scheme based on a speech model for robust speech recognition. Conventional model-based methods require off-line training with a noisy speech database and are not suitable for online adaptation. The proposed scheme relaxes this requirement by employing the parallel model combination technique to estimate correction factors. Applying the model combination process to the mixture model alone, rather than to the entire HMM, makes online model combination possible. Exploiting the availability of a noise model from off-line sources, online adaptation is accomplished via MAP (Maximum A Posteriori) estimation, and an online channel estimation procedure is derived within the same framework. For more efficient implementation, a selective model combination is proposed that reduces the computational complexity. Representative experimental results indicate that the suggested algorithm is effective in realizing robust speech recognition under the combined adverse conditions of additive background noise and channel distortion.
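The core of parallel model combination can be sketched in the log-spectral domain: map the speech and noise mixture means into the linear domain, add the energies, and map back. This is a heavy simplification with hypothetical values — the paper works on cepstral-domain mixture models and also handles variances, gain factors, and the channel, none of which appear here:

```python
import numpy as np

def combine_log_spectral_means(mu_speech, mu_noise, g=1.0):
    """Parallel model combination, means only: combine a clean-speech mixture
    mean with a noise-model mean by adding their linear-domain energies
    (g is an illustrative noise gain factor)."""
    return np.log(np.exp(mu_speech) + g * np.exp(mu_noise))

mu_s = np.array([2.0, 1.0, 0.5])   # hypothetical clean-speech mixture mean
mu_n = np.array([0.0, 0.5, 1.5])   # hypothetical noise-model mean
mu_y = combine_log_spectral_means(mu_s, mu_n)
```

A sanity property: since energies add, each combined component is at least the larger of the speech and noise components.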

An Optimization of Speech Database in Corpus-based Speech Synthesis System (코퍼스기반 음성합성기의 데이터베이스 최적화 방안)

  • Jang Kyung-Ae; Chung Min-Hwa
    • Proceedings of the KSPS conference / 2002.11a / pp.209-213 / 2002
  • This paper describes reducing the database (DB) of a corpus-based Korean speech synthesizer without degrading speech quality. First, it is proposed that the frequency of every unit in the reduced DB reflect the frequency of units in the Korean language, so the target population of every unit is set proportional to its frequency in a large Korean corpus (780K sentences, 45M phonemes). Second, instances frequently used during synthesis should also be retained in the reduced DB. Finally, it is proposed that the frequency of every instance be reflected in the clustering criterion and used as a criterion for selecting representative instances. Evaluation shows that the proposed methods yield better quality than conventional methods.
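The frequency-proportional target population can be sketched as a simple quota computation; the unit names, frequencies, and budget below are made up for illustration, not taken from the paper's corpus:

```python
# Hypothetical sketch: each unit's quota in the reduced DB is proportional
# to its frequency in the large corpus, with at least one instance kept
# so rare units are not dropped entirely.

def proportional_quota(unit_freq, budget):
    total = sum(unit_freq.values())
    return {u: max(1, round(budget * f / total)) for u, f in unit_freq.items()}

freq = {"ka": 500, "na": 300, "ta": 200}     # made-up unit frequencies
quota = proportional_quota(freq, budget=100)  # {"ka": 50, "na": 30, "ta": 20}
```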

The Pattern Recognition Methods for Emotion Recognition with Speech Signal (음성신호를 이용한 감성인식에서의 패턴인식 방법)

  • Park Chang-Hyun; Sim Kwee-Bo
    • Journal of Institute of Control, Robotics and Systems / v.12 no.3 / pp.284-288 / 2006
  • In this paper, we apply several pattern recognition algorithms to an emotion recognition system based on speech signals and compare the results. First, emotional speech databases are needed, and the speech features for emotion recognition are determined in the database analysis step. Second, recognition algorithms are applied to these features: an artificial neural network, Bayesian learning, Principal Component Analysis, and the LBG algorithm. The performance gap between these methods is then presented in the experimental results. Emotion recognition technology is not yet mature: the selection of emotion features and of an appropriate classification method are both open problems, and we hope this paper serves as a reference in that discussion.
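Of the classifiers listed, the LBG algorithm is the most self-contained to sketch: grow a vector-quantization codebook by binary splitting with k-means-style refinement. The data, codebook size, and parameters below are illustrative stand-ins, not the paper's experimental setup:

```python
import numpy as np

def lbg_codebook(data, size, eps=0.01, iters=10):
    """LBG vector quantization: start from the global mean, repeatedly split
    each codeword into a perturbed pair, then refine by nearest-neighbor
    assignment and centroid update."""
    codebook = data.mean(axis=0, keepdims=True)
    while len(codebook) < size:
        codebook = np.vstack([codebook * (1 + eps), codebook * (1 - eps)])
        for _ in range(iters):
            # assign each vector to its nearest codeword
            d = ((data[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
            labels = d.argmin(axis=1)
            for k in range(len(codebook)):
                if (labels == k).any():
                    codebook[k] = data[labels == k].mean(axis=0)
    return codebook

rng = np.random.default_rng(0)
feats = rng.normal(size=(200, 2))   # stand-in for emotional speech features
cb = lbg_codebook(feats, size=4)
```

For classification, one codebook would typically be trained per emotion and a test utterance assigned to the codebook with the lowest quantization distortion.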

A Reduction of Speech Database in Corpus-based Speech Synthesis System (코퍼스기반 음성합성기의 데이터베이스 감축방안)

  • Jang Kyung-Ae; Chung Min-Hwa; Kim Jae-In; Koo Myoung-Wan
    • MALSORI / no.44 / pp.145-156 / 2002
  • This paper describes reducing the database (DB) of a corpus-based Korean speech synthesizer without degrading speech quality. First, it is proposed that the frequency of every unit in the reduced DB reflect the frequency of units in the Korean language, so the target population of every unit is set proportional to its frequency in a large Korean corpus (780K sentences, 45M phones). Second, instances frequently used during synthesis should also be retained in the reduced DB. Finally, it is proposed that the frequency of every instance be reflected in the clustering criteria and used as another important criterion for selecting representative instances. Evaluation shows that the proposed methods yield better quality than conventional methods.
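One way to read the instance-frequency criterion is as a frequency-weighted choice of cluster representative: instances that are synthesized often pull the choice toward themselves. This is a hypothetical sketch of that idea, not the paper's exact clustering criterion:

```python
import numpy as np

def weighted_representative(instances, freqs):
    """Pick a cluster's representative as the instance minimizing the
    frequency-weighted distortion to all instances in the cluster."""
    inst = np.asarray(instances, dtype=float)
    w = np.asarray(freqs, dtype=float)
    # pairwise squared distances between instances
    d = ((inst[:, None, :] - inst[None, :, :]) ** 2).sum(-1)
    # weight each target instance by how often it is used in synthesis
    cost = (d * w[None, :]).sum(axis=1)
    return int(cost.argmin())

# three hypothetical instances; the last one is used far more often
idx = weighted_representative([[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]], [1, 1, 10])
```

Without the weights the middle instance would be chosen as the medoid; the frequency weighting shifts the choice toward the heavily used instance.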

Weighted filter bank analysis and model adaptation for improving the recognition performance of partially corrupted speech (부분 손상된 음성의 인식성능 향상을 위한 가중 필터뱅크 분석 및 모델 적응)

  • Cho Hoon-Young; Oh Yung-Hwan
    • MALSORI / no.44 / pp.157-169 / 2002
  • We propose a weighted filter bank analysis and model adaptation (WFBA-MA) scheme to improve the utilization of uncorrupted or less severely corrupted frequency regions for robust speech recognition. A weighted mel-frequency cepstral coefficient is obtained by weighting the log filter bank energies with reliability coefficients, and the hidden Markov models are also modified to reflect the local reliabilities. Experimental results on the TIDIGITS database corrupted by band-limited noises and car noise indicate that the proposed WFBA-MA scheme exploits the uncorrupted speech information well, significantly improving recognition performance compared with multi-band speech recognition systems.
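The feature side of the scheme can be sketched as scaling each log filter bank energy by its reliability before the DCT. The energies, weights, and band choices below are hypothetical, and the model-adaptation half of WFBA-MA is omitted:

```python
import numpy as np

def weighted_mfcc(log_fbank, reliability, n_ceps=13):
    """Weighted filter bank analysis: scale each log filter bank energy by a
    reliability coefficient (near 0 for bands assumed corrupted), then apply
    the DCT-II to obtain cepstral coefficients."""
    e = np.asarray(log_fbank) * np.asarray(reliability)
    n = len(e)
    k = np.arange(n_ceps)[:, None]
    m = np.arange(n)[None, :]
    basis = np.cos(np.pi * k * (m + 0.5) / n)   # DCT-II basis vectors
    return basis @ e

logE = np.linspace(1.0, 2.0, 20)   # hypothetical log filter bank energies
w = np.ones(20)
w[5:10] = 0.1                      # down-weight bands assumed noise-corrupted
cep = weighted_mfcc(logE, w)
```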

The Pattern Recognition Methods for Emotion Recognition with Speech Signal (음성신호를 이용한 감성인식에서의 패턴인식 방법)

  • Park Chang-Hyeon; Sim Gwi-Bo
    • Proceedings of the Korean Institute of Intelligent Systems Conference / 2006.05a / pp.347-350 / 2006
  • In this paper, we apply several pattern recognition algorithms to an emotion recognition system based on speech signals and compare the results. First, emotional speech databases are needed, and the speech features for emotion recognition are determined in the database analysis step. Second, recognition algorithms are applied to these features: an artificial neural network, Bayesian learning, Principal Component Analysis, and the LBG algorithm. The performance gap between these methods is then presented in the experimental results. Emotion recognition technology is not yet mature: the selection of emotion features and of an appropriate classification method are both open problems, and we hope this paper serves as a reference in that discussion.

Recognition experiment of Korean connected digit telephone speech using the temporal filter based on training speech data (훈련데이터 기반의 temporal filter를 적용한 한국어 4연숫자 전화음성의 인식실험)

  • Jung Sung Yun; Kim Min Sung; Son Jong Mok; Bae Keun Sung; Kang Jeom Ja
    • Proceedings of the KSPS conference / 2003.10a / pp.149-152 / 2003
  • In this paper, data-driven temporal filter methods [1] are investigated for robust feature extraction. A principal component analysis technique is applied to the time trajectories of the feature sequences of training speech data to obtain appropriate temporal filters. Recognition experiments with these data-driven temporal filters were performed on the Korean connected digit telephone speech database released by SITEC, and the results are discussed together with our findings.
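A common form of this data-driven approach takes the principal eigenvector of windowed feature trajectories as an FIR temporal filter. The trajectories, filter length, and data below are made up for illustration; the paper's exact procedure may differ:

```python
import numpy as np

def pca_temporal_filter(trajectories, length=5):
    """Derive a temporal (FIR) filter from data: stack sliding windows of the
    feature time trajectories and take the eigenvector of their covariance
    with the largest eigenvalue as the filter taps."""
    segs = []
    for traj in trajectories:
        for t in range(len(traj) - length + 1):
            segs.append(traj[t:t + length])
    X = np.array(segs, dtype=float)
    X -= X.mean(axis=0)
    cov = X.T @ X / len(X)
    vals, vecs = np.linalg.eigh(cov)   # eigenvalues in ascending order
    return vecs[:, -1]                 # principal eigenvector

rng = np.random.default_rng(1)
trajs = [rng.normal(size=100) for _ in range(10)]  # made-up feature trajectories
h = pca_temporal_filter(trajs)
filtered = np.convolve(trajs[0], h, mode="same")   # apply along the time axis
```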

A Noise Reduction Method Combined with HMM Composition for Speech Recognition in Noisy Environments

  • Shen, Guanghu; Jung, Ho-Youl; Chung, Hyun-Yeol
    • IEMEK Journal of Embedded Systems and Applications / v.3 no.1 / pp.1-7 / 2008
  • In this paper, an MSS-NOVO method that combines HMM composition with noise reduction is proposed for speech recognition in noisy environments. The method first applies modified spectral subtraction (MSS) to enhance the noisy input speech; the noise and voice composition (NOVO) method is then applied to build noise-adapted models using the noise in the non-utterance regions of the enhanced speech. To evaluate its effectiveness, the MSS-NOVO method is compared with the SS-NOVO and MWF-NOVO methods. For the test data, white noise was added to the KLE 452 database at SNRs ranging from 0 dB to 15 dB in 5 dB steps. In these tests, the MSS-NOVO method shows average improvements of 66.5% and 13.6% over the existing SS-NOVO and MWF-NOVO methods, respectively, with especially large improvements at low SNRs.
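The spectral-subtraction front end can be sketched as the classic magnitude rule with an oversubtraction factor and a spectral floor; the paper's MSS modifies this baseline, and the spectra and parameters below are purely illustrative:

```python
import numpy as np

def spectral_subtraction(noisy_mag, noise_mag, alpha=2.0, beta=0.01):
    """Basic magnitude spectral subtraction: subtract an oversubtracted noise
    estimate (alpha) and clamp to a spectral floor (beta) to avoid negative
    magnitudes and musical-noise artifacts."""
    sub = noisy_mag - alpha * noise_mag
    return np.maximum(sub, beta * noisy_mag)

noisy = np.array([1.0, 0.5, 0.2])   # hypothetical noisy magnitude spectrum
noise = np.array([0.1, 0.1, 0.1])   # noise estimate from non-utterance frames
clean = spectral_subtraction(noisy, noise)
```

The floored enhanced spectrum then feeds NOVO-style model composition, with the residual noise re-estimated from the enhanced signal's non-utterance regions.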

Emotional Speaker Recognition using Emotional Adaptation (감정 적응을 이용한 감정 화자 인식)

  • Kim, Weon-Goo
    • The Transactions of The Korean Institute of Electrical Engineers / v.66 no.7 / pp.1105-1110 / 2017
  • Speech with various emotions degrades the performance of a speaker recognition system. In this paper, a speaker recognition method using emotional adaptation is proposed to improve performance on affective speech. For emotional adaptation, an emotional speaker model is generated from the emotion-free speaker model using a small amount of affective training speech and a speaker adaptation method. Since it is not easy to obtain sufficient affective speech for training from a speaker, using only a small amount of affective speech is very practical in real situations. The proposed method was evaluated on a Korean database containing four emotions. Experimental results show that the proposed method outperforms conventional methods in both speaker verification and speaker recognition.
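A standard speaker adaptation step of this kind is MAP adaptation of the model means: interpolate between the neutral model's mean and the mean of the small emotional adaptation set. The relevance factor, features, and dimensions below are hypothetical, and the paper's specific adaptation method may differ:

```python
import numpy as np

def map_adapt_mean(prior_mean, adapt_data, tau=16.0):
    """MAP adaptation of a Gaussian mean: blend the neutral speaker model's
    mean with the emotional adaptation data's mean, where the relevance
    factor tau controls how strongly the prior is trusted when little
    adaptation data is available."""
    x = np.asarray(adapt_data, dtype=float)
    n = len(x)
    return (tau * np.asarray(prior_mean) + n * x.mean(axis=0)) / (tau + n)

prior = np.array([0.0, 0.0])                    # neutral-speech model mean
emo = np.array([[1.0, 2.0]] * 16)               # a few emotional utterance features
adapted = map_adapt_mean(prior, emo, tau=16.0)  # halfway between prior and data
```

With n = tau the adapted mean sits exactly halfway; with very few emotional utterances (n << tau) it stays close to the neutral model, which is the behavior that makes small adaptation sets practical.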