• Title/Summary/Keyword: speech rates


Verification and estimation of a posterior probability and probability density function using vector quantization and neural network (신경회로망과 벡터양자화에 의한 사후확률과 확률 밀도함수 추정 및 검증)

  • 고희석;김현덕;이광석
    • The Transactions of the Korean Institute of Electrical Engineers / v.45 no.2 / pp.325-328 / 1996
  • In this paper, we propose a method for estimating a posterior probability and a probability density function (PDF) using a feed-forward neural network and vector quantization (VQ) codebooks. We estimate the posterior probability and the PDF, which together with the well-known Mel cepstrum compose a new parameter, and verify its performance on five vowels taken from syllables using a neural network (NN) and a probabilistic neural network (PNN). The new parameter gave its best result with the PNN, with an average recognition rate of 83.02%.

On a Reduction of Pitch Searching Time by Preliminary Pitch in the CELP Vocoder

  • Bae, Seong-Gyun;Kim, Hyung-Rae;Kim, Dae-Sik;Bae, Myung-Jin
    • Proceedings of the Acoustical Society of Korea Conference / 1994.06a / pp.1104-1111 / 1994
  • Code Excited Linear Prediction (CELP) speech coders exhibit good performance at data rates below 4.8 kbps. The major drawback of CELP-type coders is their large amount of computation. In this paper, we propose a new pitch search method that preserves the quality of the CELP vocoder with reduced complexity. The basic idea is to restrict the pitch search range by first estimating preliminary pitches. Applying the proposed method to the CELP vocoder yields approximately 87% complexity reduction in the pitch search.

Embedded Waveform Coding of Speech (음성 파형의 Embedded 부호화에 관한 연구)

  • 이형호;은종관
    • Journal of the Korean Institute of Telematics and Electronics / v.21 no.3 / pp.73-83 / 1984
  • The performance of embedded adaptive differential pulse code modulation (ADPCM), embedded adaptive delta modulation (ADM), and the same systems with a delayed-decision scheme has been studied with real speech over a wide dynamic range. The embedded ADPCM and ADM coders were obtained by modifying the conventional ADPCM and ADM coders. The basic scheme of the embedded ADPCM coder is based on the ADPCM originally proposed by Cummiskey et al. For the embedded ADM systems, we modified continuously variable slope DM (CVSD) and hybrid companding DM (HCDM) systems. Among these embedded coders, the embedded HCDM outperforms the other coders over a wide range of transmission rates from 16 to 64 kbits/s. When the delayed-decision scheme is applied to the embedded ADPCM, performance improves significantly at all transmission rates. In the embedded ADM systems with a 16 kHz sampling rate, however, the improvement resulting from delayed decision is not as drastic as in the embedded ADPCM with the same number of delayed samples.

A Study on the Channel Normalized Pitch Synchronous Cepstrum for Speaker Recognition (채널에 강인한 화자 인식을 위한 채널 정규화 피치 동기 켑스트럼에 관한 연구)

  • 김유진;정재호
    • The Journal of the Acoustical Society of Korea / v.23 no.1 / pp.61-74 / 2004
  • In this paper, a contort- and speaker-dependent cepstrum extraction method and a channel normalization method that minimizes the loss of speaker characteristics in the cepstrum are proposed for robust speaker recognition over a channel. The proposed extraction method creates a cepstrum by pitch synchronous analysis using the inherent pitch of the speaker. The resulting cepstrum, called the "pitch synchronous cepstrum" (PSC), therefore represents the impulse response of the vocal tract more accurately in voiced speech. The PSC can also compensate for channel distortion, because the pitch is more robust in a channel environment than the spectrum of speech. The proposed channel normalization method, the "formant-broadened pitch synchronous CMS" (FBPSCMS), applies the Formant-Broadened CMS to the PSC and improves the accuracy of the intraframe processing. We compared text-independent closed-set speaker identification on 56 female and 112 male speakers using the TIMIT and NTIMIT databases. The results show that the PSC improves the error reduction rate by up to 7.7% compared with the conventional short-time cepstrum, and that the error rates of the FBPSCMS are more stable and lower than those of pole-filtered CMS.
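
As a rough illustration of the channel normalization idea in the abstract above, the following sketch applies plain cepstral mean subtraction (CMS) to a small set of cepstral frames. It is not the FBPSCMS method of the paper, whose details the abstract does not give. The function name `cepstral_mean_subtraction`, the frame layout, and the toy values are assumptions made for illustration only. The sketch relies on the standard observation that a fixed channel adds an approximately constant offset to every cepstral frame, so subtracting the long-term mean of each coefficient removes it.

```python
from typing import List

Frames = List[List[float]]


def cepstral_mean_subtraction(frames: Frames) -> Frames:
    """Subtract each coefficient's mean over all frames (plain CMS)."""
    num_frames = len(frames)
    num_coeffs = len(frames[0])
    # Long-term average of every cepstral coefficient across the utterance.
    means = [sum(frame[k] for frame in frames) / num_frames
             for k in range(num_coeffs)]
    return [[frame[k] - means[k] for k in range(num_coeffs)]
            for frame in frames]


# Toy data (hypothetical): three frames of two cepstral coefficients.
clean = [[1.0, 2.0], [3.0, 0.0], [2.0, 4.0]]
# A fixed channel shows up as a constant additive offset in the cepstrum.
channel_offset = [0.5, -1.0]
observed = [[c + o for c, o in zip(frame, channel_offset)] for frame in clean]

normalized_clean = cepstral_mean_subtraction(clean)
normalized_observed = cepstral_mean_subtraction(observed)
print(normalized_clean == normalized_observed)
```

With these toy values the normalized clean and channel-distorted frames coincide, which is the property CMS-style normalization relies on. Real channels are only approximately constant over an utterance, so the cancellation is approximate in practice.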

Voice Recognition Performance Improvement using the Convergence of Bayesian method and Selective Speech Feature (베이시안 기법과 선택적 음성특징 추출을 융합한 음성 인식 성능 향상)

  • Hwang, Jae-Chun
    • Journal of the Korea Convergence Society / v.7 no.6 / pp.7-11 / 2016
  • Voice recognition systems operating in environments with white noise do not recognize speech correctly when the voice mixture varies. This paper therefore proposes a method that combines a Bayesian technique with selective voice feature extraction for effective voice recognition. We use bank frequency response coefficients for selective voice extraction, using the variables observed for every possible pair of observations; speech features are then extracted selectively from the noisy voice signal according to the energy ratio at the output. This removes noise, and combining it with Bayesian voice recognition improves the recognition rate. The results confirm that the recognition rate in vocabulary recognition is 2.3% higher than that of the HMM and CHMM methods.

Effects of Neonatal Hearing Screening Program (NHSP) Information on Parental Satisfaction (신생아 청각선별검사 프로그램에 관한 정보제공이 부모 만족도에 미치는 영향)

  • Ahn, Hyun-Sook;Cho, Soo-Jin
    • Phonetics and Speech Sciences / v.1 no.2 / pp.51-59 / 2009
  • This study was designed to investigate the effects of neonatal hearing screening program (NHSP) information on parental satisfaction, using the Parent Satisfaction Questionnaire with Neonatal Hearing Screening Program (PSQ-NHSP) by Mazlan et al. (2006). The PSQ-NHSP consists of four aspects: information, personnel in charge of the hearing test, appointment activity, and overall satisfaction with the neonatal hearing screening program. A total of 106 parents (50 in the experimental group and 56 in the control group) participated in this study at one general hospital and two delivery clinics. The fifty parents in the experimental group received information and counseling with educational materials before filling out the PSQ-NHSP, whereas the fifty-six parents in the control group received no counseling or educational materials before completing it. The PSQ-NHSP demonstrated excellent internal consistency reliability ($\sigma = 0.914$). The results of the study were as follows. First, the overall satisfaction ($3.77 \pm 0.81$) and personnel in charge of hearing test ($3.52 \pm 0.79$) aspects showed higher satisfaction than the appointment activity aspect ($3.51 \pm 0.80$) for all subjects. Second, the overall parental satisfaction of the experimental group ($4.15 \pm 0.50$) was significantly higher than that of the control group ($3.09 \pm 0.53$) in all items. Lastly, thirty-two participants (30%) made at least one comment in response to the open-set items. A total of 29 comments were related to satisfaction with participating in the NHSP and 11 comments were related to dissatisfaction. In conclusion, to improve parental satisfaction it is important to provide parents with education and information about the NHSP before the test. In addition, the PSQ-NHSP was found to be a useful instrument for identifying the benefits and shortfalls of the NHSP.

A Study on the Neural Networks for Korean Phoneme Recognition (한국어 음소 인식을 위한 신경회로망에 관한 연구)

  • Choi, Young-Bae;Yang, Jin-Woo;Lee, Hyung-Jun;Kim, Soon-Hyob
    • The Journal of the Acoustical Society of Korea / v.13 no.1 / pp.5-13 / 1994
  • This paper presents a study on neural networks for phoneme recognition and performs phoneme recognition using a TDNN (Time Delay Neural Network). It also proposes a training algorithm for neural-network speech recognition that is suited to large-scale TDNNs. Because phoneme recognition is indispensable for continuous speech recognition, this paper uses a TDNN to obtain accurate phoneme recognition results. The proposed training algorithm can bring the TDNN to an optimal state regardless of the number of phonemes to be recognized. The recognition experiment was performed with the new TDNN training algorithm, which combines backpropagation with the Cauchy algorithm using a stochastic approach. Recognition experiments on three phoneme classes for two speakers show a recognition rate of 98.1%. The results indicate that the proposed algorithm is an efficient method that achieves higher recognition performance and shorter convergence time than TDNN.

Palatal Mucoperiosteal Island Flaps for Palate Reconstruction

  • Kim, Hong Youl;Hwang, Jin;Lee, Won Jai;Roh, Tai Suk;Lew, Dae Hyun;Yun, In Sik
    • Archives of Craniofacial Surgery / v.15 no.2 / pp.70-74 / 2014
  • Background: Many options are available to cover a palatal defect, including local or free flaps. The objective of this study was to evaluate the usefulness of the palatal mucoperiosteal island flap in covering a palatal defect after tumor excision. Methods: Between October 2006 and July 2013, we identified 19 patients who underwent palatal reconstruction using a palatal mucoperiosteal island flap after tumor excision. All cases were retrospectively analyzed by defect location, size, tumor pathology, type of reconstruction, and functional outcomes. Speech and swallowing functions were evaluated using a 7-point visual analog scale (VAS) score. Results: Among the 19 patients, there were 7 men and 12 women with an age range of 25 to 74 years (mean, $52.5 \pm 14.3$ years). The flap size was 2-16 cm$^2$ (mean, $9.4 \pm 4.2$ cm$^2$). Either unilateral or bilateral palatal island flaps were used, depending on the size of the defect. During the follow-up period (mean, $32.7 \pm 21.4$ months), four patients developed a temporary oronasal fistula, which healed without further operative intervention. The donor sites were well re-epithelialized. Speech and swallowing function scores were $6.63 \pm 0.5$ and $6.58 \pm 0.69$ on the 7-point VAS, indicating the ability to eat solid foods and communicate verbally without significant disability. Conclusion: The palatal mucoperiosteal island flap is a good reconstruction modality for palatal defects if used under appropriate indications. The complication rates and donor site morbidity are low, with good functional outcomes.

Audio Segmentation and Classification Using Support Vector Machine and Fuzzy C-Means Clustering Techniques (서포트 벡터 머신과 퍼지 클러스터링 기법을 이용한 오디오 분할 및 분류)

  • Nguyen, Ngoc;Kang, Myeong-Su;Kim, Cheol-Hong;Kim, Jong-Myon
    • The KIPS Transactions:PartB / v.19B no.1 / pp.19-26 / 2012
  • The rapid increase of information imposes new demands on content management. The purpose of automatic audio segmentation and classification is to meet the rising need for efficient content management. For this reason, this paper proposes a high-accuracy algorithm that segments audio signals and classifies them into classes such as speech, music, silence, and environmental sounds. The proposed algorithm uses a support vector machine (SVM) on the parameter sequence to detect audio-cuts, which are the boundaries between different kinds of sounds. We then extract feature vectors composed of statistical data and use them as input to a fuzzy c-means (FCM) classifier that partitions the audio segments into classes. To evaluate the proposed SVM-FCM based algorithm, we use precision and recall rates for segmentation and classification accuracy for classification. Furthermore, we compare the segmentation performance of the proposed algorithm with that of other methods, including binary and FCM classifiers. Experimental results show that the proposed algorithm outperforms the other methods in both precision and recall rates.
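
The precision and recall rates used for segmentation in the abstract above can be illustrated with a short sketch. The abstract does not say how detected audio-cuts were matched to reference boundaries, so the greedy one-to-one matching, the `tolerance` parameter (in seconds), the function name `boundary_precision_recall`, and the example boundary times are all assumptions made for illustration.

```python
from typing import List, Tuple


def boundary_precision_recall(detected: List[float],
                              reference: List[float],
                              tolerance: float = 0.5) -> Tuple[float, float]:
    """Score detected boundaries against reference boundaries.

    A detection counts as a hit if it lies within `tolerance` seconds of a
    reference boundary; each detection may match at most one reference.
    """
    unmatched = sorted(detected)
    hits = 0
    for ref in sorted(reference):
        # Closest detection that has not been matched yet, if any remain.
        best = min(unmatched, key=lambda d: abs(d - ref), default=None)
        if best is not None and abs(best - ref) <= tolerance:
            hits += 1
            unmatched.remove(best)
    precision = hits / len(detected) if detected else 0.0
    recall = hits / len(reference) if reference else 0.0
    return precision, recall


# Hypothetical boundary times in seconds.
reference_cuts = [5.0, 12.0, 20.0, 31.0]
detected_cuts = [5.2, 11.9, 25.0]

precision, recall = boundary_precision_recall(detected_cuts, reference_cuts)
print(f"precision={precision:.3f} recall={recall:.3f}")
```

In this toy case two of the three detections fall within the tolerance of a reference boundary, so precision is 2/3, and two of the four reference boundaries are found, so recall is 1/2. A wider tolerance raises both figures, which is why the tolerance should be reported alongside the rates.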

A Study On Generation and Reduction of the Notation Candidate for the Notation Restoration of Korean Phonetic Value (한국어 음가의 표기 복원을 위한 표기 후보 생성 및 감소에 관한 연구)

  • Rhee, Sang-Burm;Park, Sung-Hyun
    • The KIPS Transactions:PartB / v.11B no.1 / pp.99-106 / 2004
  • Syllable restoration is the process of restoring a phonetic value recognized by a speech recognition device to the notation form it had before vocalization. In this paper, syllable restoration rules based on standard pronunciation were composed for the syllable restoration process. Using these rules, a method for generating a set of notation candidates was studied. We also studied how to reduce the number of generated notation candidates and propose a three-phase reduction process that targets non-notation syllables, non-vocabulary syllables, and non-stem syllables. In experiments, the number of notation candidates decreased by 74% on average.