• Title/Summary/Keyword: segmental algorithm

A Tree-based Reduction of Speech DB in a Large Corpus-based Korean TTS (대용량 한국어 TTS의 결정트리기반 음성 DB 감축 방안)

  • Lee, Jung-Chul
    • Journal of the Korea Society of Computer and Information / v.15 no.7 / pp.91-98 / 2010
  • Large corpus-based concatenative text-to-speech (TTS) systems can generate natural synthetic speech without additional signal processing. Because improving the naturalness, personality, speaking style, and emotion of synthetic speech requires a larger speech DB, redundant speech segments in a large segmental speech DB need to be pruned. In this paper, we propose a new method for constructing a segmental speech DB for a Korean TTS system that uses a clustering algorithm to downsize the DB. For the performance test, synthetic speech was generated with a Korean TTS system consisting of a language processing module, a prosody processing module, a segment selection module, a speech concatenation module, and the segmental speech DB. An MOS test was then conducted on sets of synthetic speech generated with four different segmental speech DBs, constructed by combining the CM1 or CM2 tree clustering method with the full or reduced DB. Experimental results show that the proposed method reduces the size of the speech DB by 23% while achieving a high MOS in the perception test. The proposed method can therefore be applied to build a small-sized TTS system.

A Speech Enhancement Algorithm based on Human Psychoacoustic Property (심리음향 특성을 이용한 음성 향상 알고리즘)

  • Jeon, Yu-Yong;Lee, Sang-Min
    • The Transactions of The Korean Institute of Electrical Engineers / v.59 no.6 / pp.1120-1125 / 2010
  • In speech systems such as hearing aids and speech communication devices, speech quality is degraded by environmental noise. In this study, to enhance speech quality degraded by environmental noise, we propose an algorithm that reduces the noise and reinforces the speech. The minima controlled recursive averaging (MCRA) algorithm is used to estimate the noise spectrum, and a spectral weighting factor is used to reduce the noise. The partial masking effect, one of the properties of human hearing, is introduced to reinforce the speech. We then compared the waveform, spectrogram, Perceptual Evaluation of Speech Quality (PESQ) score, and segmental signal-to-noise ratio (segSNR) of the original speech, the noisy speech, the noise-reduced speech, and the speech enhanced by the proposed method. The speech enhanced by the proposed method is reinforced in the high-frequency range that was degraded by noise, and both PESQ and segSNR improve, which indicates that the speech quality is enhanced.
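
The segSNR measure named in this abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `segmental_snr`, the 256-sample frame length, and the clamping range of -10 to 35 dB are assumptions chosen for the example (clamping per-frame values is a common convention, but the abstract does not state which settings were used).

```python
import math

def segmental_snr(clean, processed, frame_len=256, lo_db=-10.0, hi_db=35.0):
    """Mean of per-frame SNR values (dB) between a clean reference and a processed signal.

    frame_len, lo_db and hi_db are illustrative assumptions, not values from the paper.
    """
    n = min(len(clean), len(processed))
    frame_snrs = []
    # Walk over non-overlapping frames; a trailing partial frame is ignored.
    for start in range(0, n - frame_len + 1, frame_len):
        ref = clean[start:start + frame_len]
        out = processed[start:start + frame_len]
        signal_energy = sum(x * x for x in ref)
        noise_energy = sum((x - y) ** 2 for x, y in zip(ref, out))
        if signal_energy == 0.0:
            continue  # skip silent reference frames, where SNR is undefined
        if noise_energy == 0.0:
            snr_db = hi_db  # identical frame: use the upper clamp
        else:
            snr_db = 10.0 * math.log10(signal_energy / noise_energy)
        # Clamp so that a few extreme frames do not dominate the average.
        frame_snrs.append(min(max(snr_db, lo_db), hi_db))
    if not frame_snrs:
        raise ValueError("no usable frames")
    return sum(frame_snrs) / len(frame_snrs)
```

Unlike a single global SNR, this average gives quiet and loud frames equal weight, which is one reason segmental SNR is often reported alongside PESQ in speech enhancement studies.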

The Evaluation of Usefulness of Wide Beam Reconstruction Method on Segmental Perfusion and Regional Wall Motion in Myocardial Perfusion SPECT (심근관류 SPECT의 분절별 관류 및 국소벽 운동에서 Wide Beam Reconstruction기법의 유용성 평가)

  • Seong, Yong-Joon;Kim, Tae-Yeob;Moon, Il-Sang;Cho, Seong-Wook;Woo, Jae-Ryong
    • The Korean Journal of Nuclear Medicine Technology / v.15 no.1 / pp.51-57 / 2011
  • Purpose: The aim of this study is to assess the clinical usefulness of Wide Beam Reconstruction (WBR), also called Xpress.cardiac$^{TM}$, by confirming its agreement with the conventional OSEM method on segmental perfusion and regional wall motion in the myocardium. Materials and Methods: Subjects were divided into two groups. The first group was a normal control group of 20 subjects. The second group (abnormal group) consisted of 10 patients with coronary artery disease. Subjects underwent myocardial perfusion SPECT ($^{201}$Tl rest and $^{99m}$Tc-MIBI stress). Images were acquired at 30 and 15 seconds per step in the rest stage and at 25 and 13 seconds per step in the stress stage, and the OSEM and WBR methods were applied for reconstruction. Segmental perfusion and regional wall motion were evaluated with the 20-segment model of the QPS and QGS algorithms in AutoQuant. Perfusion status was graded on a 5-point scoring system (0=normal, 1=mild, 2=moderate, 3=severe hypokinesia, 4=dyskinesia). Regional wall motion status was also graded on a 5-point scoring system (0=normal, 1=mild, 2=moderate, 3=severe hypokinesia, 4=dyskinesia). We evaluated the agreement between conventional OSEM and WBR using the automatic quantification values. Results: In the normal group, the agreement between conventional OSEM and WBR for rest segmental perfusion was 99% (396/400, k=0.662, p<0.0001) and for rest regional wall motion was 83.8% (335/400, k=0.283); the agreement for stress segmental perfusion was 95.8% (383/400, k=0.656) and for stress regional wall motion was 87.3% (349/400, k=0.390). In the abnormal group, the agreement for rest segmental perfusion was 83% (166/200, k=0.605, p<0.0001) and for rest regional wall motion was 55.5% (111/200, k=0.385); the agreement for stress segmental perfusion was 79.5% (159/200, k=0.682) and for stress regional wall motion was 63.5% (127/200, k=0.486). Conclusion: Compared with conventional OSEM, the WBR method showed good agreement for segmental perfusion in the myocardium in both the normal and abnormal groups. However, regional wall motion showed meaningfully low agreement. Although WBR offers high resolution and contrast ratio, it is not a useful method for gated myocardial perfusion SPECT.
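
The agreement figures reported above pair a raw match rate (for example 396/400) with a k value. The sketch below shows how both could be computed from two lists of per-segment scores, assuming that k denotes Cohen's kappa, which the abstract does not state explicitly. The function names and the example scores are hypothetical and are not the study's data.

```python
def percent_agreement(scores_a, scores_b):
    """Percentage of segments that received the same score from both methods."""
    if len(scores_a) != len(scores_b) or not scores_a:
        raise ValueError("score lists must be non-empty and of equal length")
    matches = sum(1 for a, b in zip(scores_a, scores_b) if a == b)
    return 100.0 * matches / len(scores_a)

def cohen_kappa(scores_a, scores_b):
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    if len(scores_a) != len(scores_b) or not scores_a:
        raise ValueError("score lists must be non-empty and of equal length")
    n = len(scores_a)
    labels = set(scores_a) | set(scores_b)
    # Observed proportion of matching scores.
    p_o = sum(1 for a, b in zip(scores_a, scores_b) if a == b) / n
    # Agreement expected by chance from each method's marginal score frequencies.
    p_e = sum((scores_a.count(l) / n) * (scores_b.count(l) / n) for l in labels)
    if p_e == 1.0:
        return 1.0  # both methods used one identical score throughout
    return (p_o - p_e) / (1.0 - p_e)

# Hypothetical scores for 400 segments (20 subjects x 20 segments).
osem = [0] * 400
wbr = [0] * 396 + [1] * 4
rate = percent_agreement(osem, wbr)
kappa = cohen_kappa(osem, wbr)
```

With these made-up scores the raw agreement is 99% while kappa is 0, because one method never leaves the "normal" category. This illustrates why a high match rate and a modest kappa can appear together.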

A Study on the Voice Dialing using HMM and Post Processing of the Connected Digits (HMM과 연결 숫자음의 후처리를 이용한 음성 다이얼링에 관한 연구)

  • Yang, Jin-Woo;Kim, Soon-Hyob
    • The Journal of the Acoustical Society of Korea / v.14 no.5 / pp.74-82 / 1995
  • This paper presents a study on voice dialing using HMM and post processing of connected digits. The HMM (Hidden Markov Model) algorithm is widely used in speech recognition with good results. However, maximum likelihood estimation in HMM training does not lead to parameter values that maximize the recognition rate. To address this problem, we applied post processing to the segmental K-means procedure in the recognition experiment. Korean connected digits are affected by prolongation more than English connected digits. To decrease the segmentation error in the level building algorithm, some word models that can be produced by prolongation are added. Rules for the added models are then applied to the recognition result to update it. The recognition system was implemented with a DSP board having a TMS320C30 processor and an IBM PC. The reference patterns were made by 3 male speakers in a noisy laboratory. The recognition experiment was performed on 21 kinds of telephone numbers, 252 data items in total. The recognition rate was $6\%$ in the speaker-dependent test and $80.5\%$ in the speaker-independent test.

The Study of Comparison between RPE-LTP and VSELP Speech Coder (RPE-LTP와 VSELP 음성부호화기의 비교에 관한 연구)

  • 박대덕;김화준;심재훈;유재희;정하봉;서정하
    • The Journal of Korean Institute of Communications and Information Sciences / v.19 no.9 / pp.1838-1847 / 1994
  • Until recently, standards for digital mobile communication speech coding have been decided, and more detailed techniques competitively developed, in North America, Europe, Japan, and elsewhere. We, however, have not yet decided on a standard. In this paper, we compare the RPE-LTP speech coding algorithm, the standard in Europe, with the VSELP speech coding algorithm, the standard in North America, with respect to source coding. We describe a comprehensive verification and comparison of the two speech coders and discuss plans for improvement. We also compare the number of computations, which seriously affects real-time processing. Moreover, we implemented the algorithm of each speech coder and ran simulations with Korean speech data. Finally, we compared the performance of the two speech coders using segmental SNR and a 5-point MOS. Counting the computations showed that the VSELP speech encoder required the largest number of multiplications. With 26 speech data items, the segmental SNR of VSELP was larger than that of RPE-LTP. In the 5-point MOS test, the basic speech quality of VSELP was equivalent to or better than that of RPE-LTP.

Tracking Regional Left Ventricular Wall Motion With Color Kinesis in Echocardiography (심초음파에서 국소 좌심실벽 운동 추적을 위한 Color Kinesis 구현에 관한 연구)

  • Shin, D.K.;Kim, D.Y.;Choi, K.H.
    • Proceedings of the KOSOMBE Conference / v.1997 no.11 / pp.579-582 / 1997
  • Two-dimensional echocardiography is widely used to evaluate regional wall motion abnormality because of its ability to depict left ventricular wall motion. Color kinesis is a new technology for echocardiographic assessment of left ventricular wall motion. In this paper, we propose an algorithm for color kinesis that is based on acoustic quantification and automatically detects endocardial motion during systole on a frame-by-frame basis. The echocardiograms were obtained in short-axis views in normal subjects. An automated edge detection and endocardial contour tracing algorithm was applied to each frame, quantitative analysis based on segmentation was performed, and pre-defined color overlays were superimposed on the gray-scale images. Segmental analysis of color kinesis provided automated, quantitative diagnosis of regional wall motion abnormality.

Performance improvement of text-dependent speaker verification system using blind speech segmentation and energy weight (Blind speech segmentation과 에너지 가중치를 이용한 문장 종속형 화자인식기의 성능 향상)

  • Kim Jung-Gon;Kim Hyung Soon
    • MALSORI / no.47 / pp.131-140 / 2003
  • We propose a new method of generating client models for an HMM-based text-dependent speaker verification system with only a small amount of training data. To make a client model, statistical methods such as the segmental K-means algorithm are widely used, but they do not guarantee the quality or reliability of a model when only limited data are available. In this paper, we propose blind speech segmentation based on the level building DTW algorithm as an alternative method of making a client model with limited data. In addition, considering that voiced sounds carry much more speaker-specific information than unvoiced sounds and that the energy of the former is higher than that of the latter, we also propose a new score evaluation method that uses the observation probability raised to the power of a weighting factor estimated from the normalized log energy. Our experiment shows that the proposed methods are superior to a conventional HMM-based speaker verification system.
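
The score evaluation idea in this abstract, raising each frame's observation probability to the power of an energy-derived weight, becomes a weighted sum of log probabilities in the log domain. The sketch below illustrates that idea only. The min-max normalization in `energy_weights` and the division by the total weight in `weighted_log_score` are assumptions made for the example; the abstract does not say how the weighting factor is actually estimated.

```python
import math

def energy_weights(frame_energies, floor=1e-10):
    """Map per-frame energies to weights in [0, 1] via min-max normalized log energy.

    The min-max mapping is a hypothetical choice, not the paper's estimator.
    """
    log_e = [math.log(max(e, floor)) for e in frame_energies]
    lo, hi = min(log_e), max(log_e)
    if hi == lo:
        return [1.0] * len(log_e)  # flat energy: weight all frames equally
    return [(v - lo) / (hi - lo) for v in log_e]

def weighted_log_score(frame_log_probs, weights):
    """Weighted average of frame log probabilities.

    Since log(p ** w) == w * log(p), raising a probability to the power w
    is the same as scaling its log probability by w.
    """
    if len(frame_log_probs) != len(weights):
        raise ValueError("need one weight per frame")
    total = sum(weights)
    if total == 0.0:
        raise ValueError("all weights are zero")
    return sum(w * lp for w, lp in zip(weights, frame_log_probs)) / total
```

Under this scheme high-energy frames, which the abstract associates with voiced sounds, contribute more to the verification score than low-energy frames.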

Quantitative Evaluation of the Performance of Monaural FDSI Beamforming Algorithm using a KEMAR Mannequin (KEMAR 마네킹을 이용한 단이 보청기용 FDSI 빔포밍 알고리즘의 정량적 평가)

  • Cho, Kyeongwon;Nam, Kyoung Won;Han, Jonghee;Lee, Sangmin;Kim, Dongwook;Hong, Sung Hwa;Jang, Dong Pyo;Kim, In Young
    • Journal of Biomedical Engineering Research / v.34 no.1 / pp.24-33 / 2013
  • To enhance the speech perception of hearing aid users in noisy environments, most hearing aid devices adopt beamforming algorithms, such as the first-order differential microphone (DM1) and the two-stage directional microphone (DM2) algorithms, that preserve sounds from the direction of the interlocutor and reduce ambient sounds from other directions. However, these conventional algorithms show poor directionality in the low-frequency range. To enhance the speech perception of hearing aid users in the low-frequency range, our group suggested a fractional delay subtraction and integration (FDSI) algorithm and estimated its theoretical performance by computer simulation in a previous article. In this study, we performed a KEMAR test in a non-reverberant room to compare the performance of the DM1, DM2, broadband beamforming (BBF), and proposed FDSI algorithms using several objective indices: signal-to-noise ratio (SNR) improvement, segmental SNR (segSNR) improvement, perceptual evaluation of speech quality (PESQ), and the Itakura-Saito measure (IS). Experimental results showed that the performance of the FDSI algorithm was -3.26 to 7.16 dB in SNR improvement, -1.94 to 5.41 dB in segSNR improvement, 1.49 to 2.79 in PESQ, and 0.79 to 3.59 in IS, which demonstrated that the FDSI algorithm showed the highest improvement in SNR and segSNR and the lowest IS. We believe that the proposed FDSI algorithm has potential as a beamformer for digital hearing aid devices.

Performance Improvement of Speech Enhancement Using Independent Component Analysis and Perceptual Filtering (독립 성분 분석과 지각 필터를 이용한 음질 개선)

  • Koo, Kyo-Sik;Cha, Hyung-Tai
    • The Journal of the Acoustical Society of Korea / v.29 no.4 / pp.270-277 / 2010
  • In this paper, we propose an algorithm that improves the tone quality of noisy audio signals by using the ICA (Independent Component Analysis) algorithm and perceptual filters. Many algorithms have been proposed to eliminate noise from audio signals, such as the spectral subtraction method and the perceptual filter. The perceptual filter uses a noise estimate acquired from silent ranges in the input signal. In this case, the improvement in tone quality decreases if the noise energy changes with environmental variation within a signal frame. The proposed method instead uses the ICA algorithm to estimate noise that changes at each frame, and the estimated noise is applied to the perceptual filter. To show the performance of the proposed algorithm, several tests were performed on various input signals. With the proposed algorithm, we confirmed the enhancement of tone quality in terms of segmental SNR (SSNR), noise-to-mask ratio (NMR), and the Degradation Category Rating (DCR) test.

Performance Improvement in Speech Recognition by Weighting HMM Likelihood (은닉 마코프 모델 확률 보정을 이용한 음성 인식 성능 향상)

  • 권태희;고한석
    • The Journal of the Acoustical Society of Korea / v.22 no.2 / pp.145-152 / 2003
  • In this paper, assuming that the score of a speech utterance is the product of the HMM log likelihood and an HMM weight, we propose a new method in which HMM weights are adapted iteratively, as in general MCE training. The proposed method adjusts HMM weights for better performance using a delta coefficient defined in terms of the misclassification measure. The parameter estimation and Viterbi algorithms of the conventional HMM can therefore be easily applied to the proposed model by constraining the sum of HMM weights to the number of HMMs in an HMM set. Compared with the general segmental MCE training approach, computing time decreases because the number of parameters to estimate is reduced and gradient calculation through the optimal state sequence is avoided. To evaluate the performance of the HMM-based speech recognizer with weighted HMM likelihood, we perform Korean isolated digit recognition experiments. The experimental results show better performance than the MCE algorithm with state weighting.
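
Two details of this abstract can be illustrated with a short sketch: the utterance score as the product of an HMM's log likelihood and its weight, and the constraint that the weights sum to the number of HMMs in the set. The function names, the model names, and the numbers below are hypothetical, and the iterative MCE-style weight update itself is not shown because the abstract does not give its form.

```python
def constrain_weights(weights):
    """Rescale per-model weights so that they sum to the number of models."""
    total = sum(weights.values())
    if total == 0.0:
        raise ValueError("weights must not sum to zero")
    n = len(weights)
    return {name: w * n / total for name, w in weights.items()}

def weighted_scores(log_likelihoods, weights):
    """Score each model as (HMM weight) x (HMM log likelihood)."""
    return {name: weights[name] * ll for name, ll in log_likelihoods.items()}

def recognize(log_likelihoods, weights):
    """Return the model name with the highest weighted score."""
    scores = weighted_scores(log_likelihoods, weights)
    return max(scores, key=scores.get)

# Hypothetical two-model example; names and values are illustrative only.
weights = constrain_weights({"model_a": 1.0, "model_b": 3.0})
log_likelihoods = {"model_a": -10.0, "model_b": -8.0}
winner = recognize(log_likelihoods, weights)
```

With all weights equal to 1 the constraint is satisfied trivially and the scores reduce to the plain log likelihoods, so an unweighted recognizer is the special case of this scheme.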