• Title/Summary/Keyword: Connected speech

Extraction of MFCC feature parameters based on the PCA-optimized filter bank and Korean connected 4-digit telephone speech recognition (PCA-optimized 필터뱅크 기반의 MFCC 특징파라미터 추출 및 한국어 4연숫자 전화음성에 대한 인식실험)

  • 정성윤;김민성;손종목;배건성
    • Journal of the Institute of Electronics Engineers of Korea SP / v.41 no.6 / pp.279-283 / 2004
  • In general, triangular filters are used in the filter bank when MFCC feature parameters are extracted from the spectrum of a speech signal. Lee et al. proposed a different approach to improve the recognition rate: the filter shapes in the filter bank are optimized to the spectrum of the training speech data, with principal component analysis (PCA) used to obtain the optimized filter coefficients. In this paper, using a large 4-digit telephone speech database, we extract MFCCs based on the PCA-optimized filter bank and compare their recognition performance with conventional MFCCs and with MFCCs based on the direct weighted filter bank. Experimental results show that the MFCCs based on the PCA-optimized filter bank give a slight improvement in recognition rate over the conventional MFCCs but fail to outperform the MFCCs based on direct weighted filter bank analysis. The experimental results are discussed along with our findings.
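
For orientation, the conventional triangular filter bank that the abstract takes as its baseline can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the common 2595/700 mel formula, and the settings (23 filters, 256-point FFT, 8 kHz sampling) are all assumptions. The PCA-optimized variant would replace each row of triangular weights with coefficients learned from training spectra; that step is not shown.

```python
import math

def hz_to_mel(f):
    # Common mel-scale approximation (an assumption; the paper may use another).
    return 2595.0 * math.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def triangular_filter_bank(n_filters, n_fft, sample_rate):
    """Return n_filters rows of weights over the n_fft // 2 + 1 spectrum bins."""
    n_bins = n_fft // 2 + 1
    top = hz_to_mel(sample_rate / 2.0)
    # n_filters + 2 edge frequencies, equally spaced on the mel scale.
    edges_hz = [mel_to_hz(top * i / (n_filters + 1)) for i in range(n_filters + 2)]
    # Convert edge frequencies to (fractional) FFT bin positions.
    edges = [f * n_fft / sample_rate for f in edges_hz]
    bank = []
    for j in range(1, n_filters + 1):
        lo, mid, hi = edges[j - 1], edges[j], edges[j + 1]
        row = []
        for k in range(n_bins):
            if lo < k <= mid:
                row.append((k - lo) / (mid - lo))   # rising slope
            elif mid < k < hi:
                row.append((hi - k) / (hi - mid))   # falling slope
            else:
                row.append(0.0)
        bank.append(row)
    return bank

def log_filter_bank_energies(power_spectrum, bank):
    """Log energy per filter; a DCT of these values would give the MFCCs."""
    eps = 1e-10  # avoids log(0) for empty bands
    return [math.log(sum(w * p for w, p in zip(row, power_spectrum)) + eps)
            for row in bank]

# Illustrative settings only (hypothetical, not taken from the paper).
bank = triangular_filter_bank(n_filters=23, n_fft=256, sample_rate=8000)
```

Any alternative filter shape, PCA-derived or otherwise, can be dropped in by swapping the rows of `bank` while leaving `log_filter_bank_energies` unchanged.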

Feature Extraction by Optimizing the Cepstral Resolution of Frequency Sub-bands (주파수 부대역의 켑스트럼 해상도 최적화에 의한 특징추출)

  • 지상문;조훈영;오영환
    • The Journal of the Acoustical Society of Korea / v.22 no.1 / pp.35-41 / 2003
  • Feature vectors for conventional speech recognition are usually extracted from the full frequency band, so each sub-band contributes equally to the final recognition result. In this paper, feature vectors are extracted independently in each sub-band, and the cepstral resolution of each sub-band feature is controlled for optimal speech recognition. For this purpose, sub-band cepstral vectors of different dimensions are extracted based on the multi-band approach, which extracts a feature vector independently for each sub-band. Speech recognition rate and clustering quality are suggested as criteria for finding the optimal combination of sub-band vector dimensions. In connected digit recognition experiments using the TIDIGITS database, the proposed method gave a string accuracy of 99.125%, 99.775% correct, and 99.705% accuracy, which are error rate reductions of 38%, 32%, and 37%, respectively, relative to the baseline full-band feature vector.
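
The "error rate reduction relative to baseline" quoted above is a relative measure, which the following small sketch makes explicit. The function names are hypothetical, and the baseline figure it derives is only implied by the rounded numbers in the abstract, not reported there.

```python
def relative_error_reduction(baseline_acc, new_acc):
    """Relative error-rate reduction (in %) between two accuracies given in %."""
    baseline_err = 100.0 - baseline_acc
    new_err = 100.0 - new_acc
    return 100.0 * (baseline_err - new_err) / baseline_err

def implied_baseline_accuracy(new_acc, reduction):
    """Baseline accuracy (in %) implied by a reported accuracy and reduction."""
    new_err = 100.0 - new_acc
    return 100.0 - new_err / (1.0 - reduction / 100.0)

# Figures quoted in the abstract: 99.125% string accuracy, 38% reduction.
baseline = implied_baseline_accuracy(99.125, 38.0)
```

Under these assumptions the implied baseline string accuracy is roughly 98.6%; because the published percentages are rounded, this is an approximation only.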

The Perceptual and Consonant Analysis for the Voice with Hypothyroidism (갑상선 기능저하 음성에 대한 청지각적 및 파열음 분석에 대한 연구)

  • Han, Baek Hwa;Lee, Dahae;Kim, Joon Sun;Hong, Ki Hwan
    • Journal of the Korean Society of Laryngology, Phoniatrics and Logopedics / v.27 no.2 / pp.95-101 / 2016
  • Background and Objectives: The main purpose of this study is to clarify the perceptual and acoustic characteristics of the voice in patients with hypothyroidism after thyroidectomy, focusing on speech articulation with special reference to consonant production. Materials and Methods: The subjects were 40 adults (5 males, 35 females), all of whom received radioactive iodine treatment after total thyroidectomy. Voice samples were collected at three stages: after surgery, pre-radioisotope treatment (RIT), and post-RIT. Voice onset time (VOT) was measured, and the acoustic analysis was conducted using Praat (ver. 5.2.21). The subjective evaluation of the voices used CAPE-V. Results: Overall severity on the CAPE-V decreased significantly following RIT, which may be connected to the change in voice following RIT. Loudness on the CAPE-V also decreased significantly following RIT, which may be connected to the decrease in vocal intensity following RIT. The comparison of VOT across the three periods revealed no statistically significant differences for any plosive. Conclusion: Perceptually, the overall severity of the voice with hypothyroidism changed significantly between before and after RIT. Even though VOT did not change significantly, it tended to decrease in patients with hypothyroidism.

Lip-Synch System Optimization Using Class Dependent SCHMM (클래스 종속 반연속 HMM을 이용한 립싱크 시스템 최적화)

  • Lee, Sung-Hee;Park, Jun-Ho;Ko, Han-Seok
    • The Journal of the Acoustical Society of Korea / v.25 no.7 / pp.312-318 / 2006
  • The conventional lip-synch system has a two-step process: speech segmentation and recognition. However, the difficulty of the speech segmentation procedure and the inaccuracy of the training data set caused by segmentation lead to significant performance degradation in the system. To cope with this, a connected vowel recognition method using the Head-Body-Tail (HBT) model is proposed. The HBT model, which is appropriate for relatively small vocabulary tasks, reflects the co-articulation effect efficiently. Moreover, the 7 vowels are merged into 3 classes with similar lip shapes, and the system is optimized by employing a class-dependent SCHMM structure. Additionally, at both ends of each word, where variation is large, an 8-component Gaussian mixture model is used directly to improve the representation ability. Although the proposed method shows performance similar to that of the CHMM based on the HBT structure, the number of parameters is reduced by 33.92%. This reduction makes the method computationally efficient, enabling real-time operation.

Implementation of Internet Terminal using G.729.1 Wideband Speech Codec for Next Generation Network (차세대 통신망을 위한 G.729.1 광대역 음성 코덱을 활용한 인터넷 단말 구현)

  • So, Woon-Seob;Kim, Dae-Young
    • The Journal of Korean Institute of Communications and Information Sciences / v.33 no.10B / pp.939-945 / 2008
  • In this paper we describe the process and results of implementing an Internet terminal using the G.729.1 wideband speech codec for the next generation network. For this purpose we first chose a high-performance RISC application processor with DSP features for speech codec processing and an enhanced Multimedia Accelerator (eMMA) function for the video codec. In the implementation we used the G.729.1 codec, recently standardized by ITU-T, which is a new scalable speech and audio codec that extends the G.729 speech coding standard. To adapt the G.729.1 codec to this terminal, we converted most of the more complex fixed-point C code into assembly code to minimize processing time on the processor. As a result, we reduced the execution time of the original C code by about 80%, and the codec operated in real time on the terminal. For video we used the H.263/MPEG-4 codec, which the eMMA supports in hardware. In a SIP call processing test connected to a real network, we obtained an end-to-end delay under 100 ms and a MOS value of 3.8 measured with a PESQ instrument. In addition, the terminal interoperated well with commercial terminals.

Korean Continuous Speech Recognition Using Discrete Duration Control Continuous HMM (이산 지속시간제어 연속분포 HMM을 이용한 연속 음성 인식)

  • Lee, Jong-Jin;Kim, Soo-Hoon;Hur, Kang-In
    • The Journal of the Acoustical Society of Korea / v.14 no.1 / pp.81-89 / 1995
  • In this paper, we report a continuous speech recognition system using the continuous HMM with discrete duration control and regression coefficients. We also perform recognition experiments using the One Pass DP method (for 25 sentences of robot control commands) with finite state automata context control. In the experiment on 4 connected spoken digits, the recognition rate is 93.8% when the discrete duration control and the regression coefficients are included and 80.7% when they are not. In the experiment on 25 sentences of robot control commands, the recognition rate is 90.9% when the FSN is not included and 98.4% when it is included.

A Study of Creole Languages' Pronunciation in the West Indies - Centering on Central American Garífuna and Cuban Patois (서인도제도의 로망스어 관련 혼성어 발음에 관한 고찰 - 중미의 Garífuna어와 큐바내 Patois어를 중심으로 -)

  • Kim, Woo-Joong
    • Speech Sciences / v.5 no.2 / pp.93-107 / 1999
  • This study presents a general review of Garífuna and Patois, creole languages that developed out of the sociohistorical situation of the last centuries and are mainly spoken in the West Indies and on the Caribbean coasts. In this paper, I present some notes and ideas on the linguistic developments and features of these languages. In particular, I describe their function as connected with a variety of social circumstances and their phonetic/phonological changes from the base languages. This is the result of fieldwork conducted in Honduras, Belize, Cuba and Mexico from January 1996 to February 1998, using surveys and collecting words from different materials and texts. I hope this paper will contribute to research on 'mixed' languages as well as to historical linguistics. I am very grateful to Mr. Mauricio Tomás, the only university student in Travesía, a small town in northern Honduras, and to Mr. Carlos Marcos, a medical student from a Haitian family in Santiago de Cuba. Without their cooperation, I could not have conducted this research.

A Pre-Selection of Candidate Units Using Accentual Characteristic In a Unit Selection Based Japanese TTS System (일본어 악센트 특징을 이용한 합성단위 선택 기반 일본어 TTS의 후보 합성단위의 사전선택 방법)

  • Na, Deok-Su;Min, So-Yeon;Lee, Kwang-Hyoung;Lee, Jong-Seok;Bae, Myung-Jin
    • The Journal of the Acoustical Society of Korea / v.26 no.4 / pp.159-165 / 2007
  • In this paper, we propose a new pre-selection of candidate units that is suitable for a unit selection based Japanese TTS system. The general pre-selection method is performed by calculating a context-dependent cost within the IP (intonation phrase). Unlike other languages, however, Japanese has an accent represented as the height of a relative pitch, and several words form a single accentual phrase (AP). Also, the prosody in Japanese changes in accentual phrase units. By reflecting such prosodic change in pre-selection, the quality of synthesized speech can be improved. Furthermore, calculating the context-dependent cost within the accentual phrase makes synthesis faster than calculating it within the intonation phrase. The proposed method defines the AP, analyzes the AP in context, and performs pre-selection using accentual phrase matching, which calculates the CCL (connected context length) of the candidates for each phoneme to be synthesized in each accentual phrase. The baseline system used for the proposed method is VoiceText, a synthesizer from Voiceware. Evaluations were made on perceptual error (intonation error, concatenation mismatch error) and synthesis time. Experimental results showed that the proposed method improved the quality of synthesized speech as well as shortened the synthesis time.

A Preliminary Study on the Determining Indicatory Factors for Frenulotomy: Maximum Lingual Length-Protrusion of 3-6 Year Old Normal Children with Boley Gauge (Digimatic Caliper®) (설소대 절단술의 결정 요인에 관한 기초 연구: Boley gauge를 이용한 3~6세 정상 아동의 혀의 최대 신장 길이 계측)

  • Choi, Jae-Nam;Pyo, Hwa-Young;Sim, Hyun-Sub;Choi, Hong-Shik
    • Speech Sciences / v.8 no.3 / pp.161-172 / 2001
  • Ankyloglossia (tongue-tie) limits movement of the tongue connected with feeding and has adverse impacts on both dental health and speech. For patients with ankyloglossia, surgical intervention is recommended as the primary treatment. This study proposes an efficient tool for determining indicatory factors for frenulotomy by quantifying Maximum Lingual Length-Protrusion (MLL-P) with a Boley gauge and, as a preliminary study, presents measurement results from normal children obtained with the tool. The subjects were 61 normal children, and the distance (MLL-P) between the mandibular central incisor and the tongue tip during tongue protrusion was measured with a Boley gauge (Digimatic Caliper®). The results of this study can be summarized as follows: (1) The mean value of MLL-P (N=61 normal children) was 21.44 mm. (2) The mean value of MLL-P was 20.69 mm in males (N=33) and 21.91 mm in females (N=28), with no statistically significant difference between males and females. (3) The mean value of MLL-P was 19.34 mm, 21.19 mm, 22.33 mm, and 22.61 mm for 3-, 4-, 5- and 6-year-old children, respectively. (4) The mean value of MLL-P differed significantly between 3- and 5-year-old children and between 3- and 6-year-old children.

A Segmentation Algorithm of the Connected Word Speech by Statistical Method (統計的인 方法에 依한 連結音의 音素分割 알고리듬)

  • Cho, Jeong-Ho;Hong, Jae-Keun;Kim, Soo-Joong
    • Journal of the Korean Institute of Telematics and Electronics / v.26 no.4 / pp.151-163 / 1989
  • A statistical approach to the segmentation of speech signals is described in this paper. The main idea of the algorithm is the use of three AR models. Two fixed models are identified on the stationary parts of the signal before and after the spectral change, and a change is detected when the distance between these two models is large. A third model is located between the two fixed models and is used to estimate the time of the spectral change. The segmentation algorithm has been tested on connected words and compared with classical methods. The results show that it provides more accurate segment boundary locations and reduces oversegmentation.
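
The two-model comparison described above can be sketched as follows. This is a rough illustration under stated assumptions, not the paper's algorithm: the abstract does not name the distance measure, so an Itakura-style log ratio of prediction residuals is assumed here, the third (intermediate) model used to refine the change time is omitted, and all function names, window sizes, and the synthetic test signal are hypothetical.

```python
import math
import random

def autocorr(x, order):
    """Biased autocorrelation estimates r[0..order]."""
    n = len(x)
    return [sum(x[i] * x[i + lag] for i in range(n - lag)) / n
            for lag in range(order + 1)]

def levinson(r, order):
    """Levinson-Durbin recursion; returns prediction-error filter a (a[0] == 1).
    Assumes a non-silent window, i.e. r[0] > 0."""
    a = [1.0] + [0.0] * order
    err = r[0]
    for m in range(1, order + 1):
        acc = sum(a[k] * r[m - k] for k in range(m))
        k_m = -acc / err                      # reflection coefficient
        prev = a[:]
        for k in range(1, m + 1):
            a[k] = prev[k] + k_m * prev[m - k]
        err *= (1.0 - k_m * k_m)
    return a

def residual_energy(x, a):
    """Mean squared prediction residual of x under the AR filter a."""
    order = len(a) - 1
    total = 0.0
    for n in range(order, len(x)):
        e = sum(a[k] * x[n - k] for k in range(order + 1))
        total += e * e
    return total / (len(x) - order)

def ar_distance(before, after, order=4):
    """Assumed distance: how much worse the 'before' model predicts the
    'after' window than the 'after' window's own model does (log ratio)."""
    a_before = levinson(autocorr(before, order), order)
    a_after = levinson(autocorr(after, order), order)
    return math.log(residual_energy(after, a_before) /
                    residual_energy(after, a_after))

def detect_change(x, win=200, step=20, order=4):
    """Slide two adjacent windows and return the position of largest distance."""
    best_pos, best_d = None, float("-inf")
    for pos in range(win, len(x) - win + 1, step):
        d = ar_distance(x[pos - win:pos], x[pos:pos + win], order)
        if d > best_d:
            best_pos, best_d = pos, d
    return best_pos, best_d

# Synthetic example (hypothetical data): the spectrum changes at sample 500.
rng = random.Random(0)
signal = [math.sin(2 * math.pi * (0.05 if n < 500 else 0.2) * n) + rng.gauss(0, 0.1)
          for n in range(1000)]
change_pos, change_dist = detect_change(signal)
```

In practice a threshold on the distance, rather than a single global maximum, would be needed to find several boundaries in connected words; the threshold value is a tuning choice not given in the abstract.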
