Search | Korea Science

A Study on Measuring the Speaking Rate of Speaking Signal by Using Line Spectrum Pair Coefficients

Jang, Kyung-A;Bae, Myung-Jin
- The Journal of the Acoustical Society of Korea / v.20 no.3E / pp.18-24 / 2001
Speaking rate represents how many phonemes a speech signal contains within a limited time. It varies with the speaker and with the characteristics of each phoneme. Current speech recognition systems need preprocessing to remove the effect of speaking-rate variation before recognition, so estimating the speaking rate in advance can improve recognition performance. Conventional speech vocoders, however, determine the transmission rate by analyzing a fixed period regardless of how quickly the phonemes vary. If the speaking rate can be estimated in advance, it is also very important information for the speech coding part: it can improve the sound quality of the vocoder and allow a variable transmission rate. In this paper, we propose a method for representing the speaking rate as a parameter in a speech vocoder. To estimate the speaking rate, the variation of phonemes is estimated using Line Spectrum Pairs. Comparing the speaking rate obtained by the proposed algorithm with that obtained by a manual method performed by eye, the error between the two methods is 5.38% for fast utterances and 1.78% for slow utterances, and the agreement between the two methods is 98% for slow utterances and 94% for fast utterances at 30 dB SNR and 10 dB SNR, respectively.

An Amplitude Warping Approach to Intra-Speaker Normalization for Speech Recognition (음성인식에서 화자 내 정규화를 위한 진폭 변경 방법)

Kim Dong-Hyun;Hong Kwang-Seok
- Journal of Internet Computing and Services / v.4 no.3 / pp.9-14 / 2003
Vocal tract normalization is a successful method for improving the accuracy of inter-speaker normalization. In this paper, we present an intra-speaker warping factor estimation based on pitch-altered utterances. The feature space distributions of untransformed speech from pitch-altered utterances of a single speaker vary because of acoustic differences in the speech produced by the glottis and the vocal tract. Utterances vary in two ways: in frequency and in amplitude. Among inter-speaker normalization methods, vocal tract normalization is a frequency normalization. We therefore have to consider amplitude variation as well, and it may be possible to determine the amplitude warping factor by calculating the inverse ratio of input to reference pitch. In the recognition results, the error-rate reduction ranges from 0.4% to 2.3% for digit and word decoding.
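The "inverse ratio of input to reference pitch" mentioned in the abstract can be illustrated with a minimal sketch. The function names, the use of Hz, and the idea of applying the factor as a plain scalar gain are assumptions made for illustration only; the abstract does not state how the authors apply the factor.

```python
def amplitude_warping_factor(input_pitch_hz, reference_pitch_hz):
    """Inverse ratio of input to reference pitch, i.e. reference / input.

    Hypothetical helper: the name and the Hz units are assumptions.
    """
    if input_pitch_hz <= 0 or reference_pitch_hz <= 0:
        raise ValueError("pitch values must be positive")
    return reference_pitch_hz / input_pitch_hz


def warp_amplitude(samples, factor):
    """Scale every sample by the factor (an assumed way of applying it)."""
    return [sample * factor for sample in samples]


# An utterance spoken at twice the reference pitch gets a factor of 0.5.
factor = amplitude_warping_factor(200.0, 100.0)
warped = warp_amplitude([0.2, -0.4, 0.8], factor)
```

Under these assumptions, an utterance at the reference pitch is left unchanged (factor 1.0), while higher-pitched utterances are attenuated and lower-pitched ones amplified.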

A Study on Automatic Expansion of Dialogue Examples Using Logs of a Dialogue System (대화시스템의 로그를 이용한 대화예제의 자동 확충에 관한 연구)

Hong, Gum-Won;Lee, Jeong-Hoon;Shin, Jung-Hwi;Lee, Do-Gil;Rim, Hae-Chang
- Proceedings of the HCI Society of Korea Conference (한국HCI학회:학술대회논문집) / 2009.02a / pp.257-262 / 2009
This paper studies the automatic expansion of dialogue examples using the logs of an example-based dialogue system. Conventional approaches to example-based dialogue systems construct dialogue examples between humans and a chatbot manually, which is labor-intensive and time-consuming. The proposed method automatically classifies natural utterance pairs and adds them to the dialogue example database. Experimental results show that lexical, POS, and modality features are useful for classifying natural utterance pairs, and that dialogue examples can be expanded automatically using the logs of a dialogue system.

Prediction of Domain Action Using a Neural Network (신경망을 이용한 영역 행위 예측)

Lee, Hyun-Jung;Seo, Jung-Yun;Kim, Hark-Soo
- Korean Journal of Cognitive Science / v.18 no.2 / pp.179-191 / 2007
In a goal-oriented dialogue, speakers' intentions can be represented by domain actions, each consisting of a pair of a speech act and a concept sequence. Predicting the domain action of a user's utterance is useful for correcting errors that occur during speech recognition, and predicting the domain action of a system's utterance is useful for generating flexible responses. In this paper, we propose a model that predicts the domain action of the next utterance using a neural network. The proposed model predicts the next domain action by using a dialogue history vector and the current domain action as inputs to the neural network. In the experiment, the proposed model showed a precision of 80.02% in speech act prediction and a precision of 82.09% in concept sequence prediction.

A Korean Mobile Conversational Agent System (한국어 모바일 대화형 에이전트 시스템)

Hong, Gum-Won;Lee, Yeon-Soo;Kim, Min-Jeoung;Lee, Seung-Wook;Lee, Joo-Young;Rim, Hae-Chang
- Journal of the Korea Society of Computer and Information / v.13 no.6 / pp.263-271 / 2008
This paper presents a Korean conversational agent system for a mobile environment that uses natural language processing techniques. The aim of a conversational agent in a mobile environment is to provide a natural language interface and enable more natural interaction between a human and an agent. Constructing such an agent requires developing various natural language understanding components and effective utterance generation methods. To understand spoken-style utterances, we perform morphosyntactic analysis and shallow semantic analysis, including modality classification and predicate-argument structure analysis. To generate a system utterance, we perform an example-based search that considers lexical, syntactic, and semantic similarity.

An Adaptive Utterance Verification Framework Using Minimum Verification Error Training

Shin, Sung-Hwan;Jung, Ho-Young;Juang, Biing-Hwang
- ETRI Journal / v.33 no.3 / pp.423-433 / 2011
This paper introduces an adaptive and integrated utterance verification (UV) framework using minimum verification error (MVE) training as a new set of solutions suitable for real applications. UV is traditionally considered an add-on procedure to automatic speech recognition (ASR) and thus treated separately from the ASR system model design. This traditional two-stage approach often fails to cope with a wide range of variations, such as a new speaker or a new environment which is not matched with the original speaker population or the original acoustic environment that the ASR system is trained on. In this paper, we propose an integrated solution to enhance the overall UV system performance in such real applications. The integration is accomplished by adapting and merging the target model for UV with the acoustic model for ASR based on the common MVE principle at each iteration in the recognition stage. The proposed iterative procedure for UV model adaptation also involves revision of the data segmentation and the decoded hypotheses. Under this new framework, remarkable enhancement in not only recognition performance, but also verification performance has been obtained.
https://doi.org/10.4218/etrij.11.0110.0489

SVM-based Utterance Verification Using Various Confidence Measures (다양한 신뢰도 척도를 이용한 SVM 기반 발화검증 연구)

Kwon, Suk-Bong;Kim, Hoi-Rin;Kang, Jeom-Ja;Koo, Myong-Wan;Ryu, Chang-Sun
- MALSORI / no.60 / pp.165-180 / 2006
In this paper, we present several confidence measures (CMs) that speech recognition systems can use to evaluate the reliability of recognition results. We propose heuristic CMs, such as the mean log-likelihood score, the N-best word log-likelihood ratio, and likelihood sequence fluctuation, as well as likelihood ratio testing (LRT)-based CMs using several types of anti-models. Furthermore, we propose new algorithms that add weighting terms to phone-level log-likelihood ratios when merging them into word-level log-likelihood ratios. These weighting terms are computed from the distance between acoustic models and from knowledge-based phoneme classifications. LRT-based CMs perform considerably better than heuristic CMs, and LRT-based CMs using phonetic information achieve a relative reduction in equal error rate of 8% to 13% compared with the baseline LRT-based CMs. We use a support vector machine to fuse several CMs and improve the performance of utterance verification. Our experiments show that selecting CMs with low correlation is more effective than selecting CMs with high correlation.

Study on the discourse functions of Ranhou in Mandarin Chinese - Focused on radio call-in programme (현대중국어 '연후(然後)'의 담화기능 소고 - 전화참여 라디오 프로그램을 대상으로)

Park, Chan Wook
- Cross-Cultural Studies / v.22 / pp.329-354 / 2011
This paper probes the meaning of Ranhou in Mandarin Chinese and accounts for its discourse functions in radio call-in programmes. To this end, the study first investigates the meanings of Ran and Hou respectively and explains the change in the meaning of Ranhou, on the assumption that Ranhou is a compound of Ran and Hou and that its core meaning derives from the compounded meaning. We then examine which time category Ranhou belongs to most, based on the concepts of time (reference, event, discourse) in Schiffrin (1987), and where it is located within a turn. Following this examination, we analyze and explain its discourse functions according to where it is situated. From this we find that 1) Ran means 'agreement with or confirmation of the preceding utterance' and therefore has anaphoric meaning, and Hou means 'after' on the meaning cline back of body-back part-behind-after-retarded (proposed by Heine et al. 1991), so that Ranhou means 'after agreement with or confirmation of my preceding utterance', extends further to 'on the premise of the preceding utterance or event', and can therefore take on various functions; 2) Ranhou has various functions in natural language in spite of the institutional setting. It can indicate (1) a temporal relation between events, (2) a logical relation between two (or more) events, e.g. causality, elaboration, concession, list, (3) turn maintenance, acquisition, and management, (4) a verbal filler.

Statistical Speech Feature Selection for Emotion Recognition

Kwon Oh-Wook;Chan Kwokleung;Lee Te-Won
- The Journal of the Acoustical Society of Korea / v.24 no.4E / pp.144-151 / 2005
We evaluate the performance of emotion recognition from speech signals when an ordinary speaker talks to an entertainment robot. For each frame of a speech utterance, we extract frame-based features: pitch, energy, formant, band energies, mel frequency cepstral coefficients (MFCCs), and the velocity and acceleration of pitch and MFCCs. For discriminative classifiers, a fixed-length utterance-based feature vector is computed from the statistics of the frame-based features. Using a speaker-independent database, we evaluate the performance of two promising classifiers: the support vector machine (SVM) and the hidden Markov model (HMM). For angry/bored/happy/neutral/sad emotion classification, the SVM and HMM classifiers yield 42.3% and 40.8% accuracy, respectively. We show that this accuracy is significant compared with the performance of foreign human listeners.
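The step of turning variable-length frame-based features into a fixed-length utterance-based vector can be sketched as follows. This is only an illustration: the abstract does not say which statistics are used, so the choice of mean, standard deviation, minimum, and maximum, along with the function name, is an assumption.

```python
from statistics import mean, pstdev


def utterance_feature_vector(frames):
    """Summarize per-frame features into one fixed-length vector.

    frames: list of equal-length lists, one list of feature values per frame.
    The four statistics per feature dimension are an assumed choice.
    """
    if not frames:
        raise ValueError("at least one frame is required")
    vector = []
    # zip(*frames) regroups the values by feature dimension across frames.
    for values in zip(*frames):
        vector.extend([mean(values), pstdev(values), min(values), max(values)])
    return vector


# Two utterances of different lengths, each with 2 features per frame.
short_utterance = [[1.0, 10.0], [3.0, 10.0]]
long_utterance = [[1.0, 10.0], [3.0, 10.0], [2.0, 12.0], [2.0, 8.0]]
short_vector = utterance_feature_vector(short_utterance)
long_vector = utterance_feature_vector(long_utterance)
```

Both vectors have 8 entries (4 statistics for each of the 2 feature dimensions) regardless of the number of frames, which is what lets a classifier with a fixed input size, such as an SVM, consume them.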

Dialogue Strategies to Overcome Speech Recognition Errors in Form-Filling Dialogue (양식 채우기 대화에서 음성 인식 오류의 보완을 위한 대화 전략)

Kang Sang-Woo;Lee Song-Wook;Seo Jung-Yun
- Korean Journal of Cognitive Science / v.17 no.2 / pp.139-150 / 2006
Speech recognition errors cause fatal results in a spoken dialogue system. When the system cannot determine the speech act of an utterance because of speech recognition errors, it has difficulty continuing the conversation. In this paper, we propose strategies for sub-dialogue generation that infer the speech act of an utterance from patterns of recognition errors in the domain of form-filling dialogue. Applying the proposed method to a plan-based dialogue model, we corrected 27% of incomplete tasks and achieved an overall task completion rate of 89%.

