• Title/Summary/Keyword: speech speed

A Study on the Relation Between the LSF's and Spectral Distribution of Speech Signals (Line Spectral Frequency와 음성신호의 주파수 분포에 관한 연구)

  • 이동수;김영화
    • Journal of the Korean Institute of Telematics and Electronics / v.25 no.4 / pp.430-436 / 1988
  • The LSF (Line Spectral Frequency) representation derived from LPC is known as a very useful transmission parameter for speech signals because it has good linear interpolation characteristics and low spectral distortion in low-bit-rate coding. This paper shows that the formant frequencies of speech signals can be extracted directly from the LSF parameters, without applying an FFT algorithm, by comparing the distribution of the LSF parameters with the frequency distribution of the analysis filter. It also proposes an improved algorithm that speeds up the convergence of the analytic solution method. In addition, for flexibility in the choice of parameters, a procedure for converting LSF to LPC is presented.
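
The LSF-to-LPC conversion mentioned in this abstract can be illustrated with the textbook construction, which is not necessarily the procedure used in the paper. The sketch below assumes an even LPC order, LSFs given in radians and sorted in ascending order, and the sign convention A(z) = 1 + a1 z^-1 + ... + ap z^-p. The function names `poly_mul` and `lsf_to_lpc` are hypothetical.

```python
import math


def poly_mul(p, q):
    """Multiply two polynomials given as coefficient lists in powers of z^-1."""
    out = [0.0] * (len(p) + len(q) - 1)
    for i, pi in enumerate(p):
        for j, qj in enumerate(q):
            out[i + j] += pi * qj
    return out


def lsf_to_lpc(lsf):
    """Convert sorted LSFs (radians, even count) to LPC coefficients [1, a1, ..., ap]."""
    if len(lsf) % 2 != 0:
        raise ValueError("this sketch assumes an even LPC order")
    sym = [1.0, 1.0]    # symmetric polynomial P(z), fixed root at z = -1
    anti = [1.0, -1.0]  # antisymmetric polynomial Q(z), fixed root at z = +1
    for k, w in enumerate(lsf):
        # Each LSF contributes a conjugate root pair on the unit circle.
        factor = [1.0, -2.0 * math.cos(w), 1.0]
        if k % 2 == 0:
            sym = poly_mul(sym, factor)    # 1st, 3rd, ... LSFs belong to P(z)
        else:
            anti = poly_mul(anti, factor)  # 2nd, 4th, ... LSFs belong to Q(z)
    # A(z) = (P(z) + Q(z)) / 2; the z^-(p+1) terms cancel, so drop the last entry.
    return [(s + a) / 2.0 for s, a in zip(sym, anti)][:-1]


# Second-order example: LSFs chosen so that A(z) = 1 - 0.9 z^-1 + 0.5 z^-2.
example_lpc = lsf_to_lpc([math.acos(0.7), math.acos(0.2)])
```

Because the two polynomials are built from interleaved LSFs, closely spaced neighbouring LSFs correspond to sharp peaks of the synthesis filter, which is the property that makes formant estimation from LSFs plausible.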

On a study on PSOLA coding technique based on the measurement of formant similarity (포만트 유사도 측정에 의한 PSOLA 음성 부호화에 관한 연구)

  • 나덕수;이희원;김규홍;배명진
    • Proceedings of the IEEK Conference / 1998.06a / pp.607-610 / 1998
  • The major objectives of speech coding include a high compression ratio for transmission over band-limited channels, high synthesized speech quality in terms of intelligibility and naturalness, and fast processing speed. In general, speech coding methods are classified into three categories: waveform coding, source coding, and hybrid coding. In this paper, we propose a new waveform coding method using the PSOLA (pitch-synchronous overlap-add) technique. First, we fix one basic waveform per pitch period and measure the formant similarity between the basic waveform and its neighboring waveforms. Second, if the similarity satisfies the threshold values, we compress the neighboring waveform for each pitch period and then store or transmit it. At a compression rate of about 45%, we obtained a MOS of about 4.

Measuring Acoustical Parameters of English Words by the Position in the Phrases (영어어구의 위치에 따른 단어의 음향 변수 측정)

  • Yang, Byung-Gon
    • Speech Sciences / v.14 no.4 / pp.115-128 / 2007
  • The purposes of this paper were to develop an automatic script that collects acoustic parameters such as duration, intensity, pitch and the first two formant values of English words produced by two native Canadian speakers, either alone or in a two-word phrase at a normal speed, and to compare those values by position in the phrase. A Praat script was proposed to obtain comparable parameters at evenly divided time points of the target word. Results showed that the total duration of a word in a phrase was shorter than that of the word produced alone. This was attributed to the pronunciation style of the native speakers, who generally placed the primary stress on the first word of the phrase. Also, the reduction ratio of the male speaker depended on the word position in the phrase, while that of the female speaker did not. Moreover, the intensity and pitch contours differed by the position of the target word in the phrase, while the formant patterns were almost the same. Further studies would be desirable to examine these parameters in words from authentic speech materials.

A comparative study between French schwa and Korean [i] - An experimental phonetic and phonological perspective -

  • Lee, Eun-Yung;Kim, Seon-Jung
    • Speech Sciences / v.7 no.1 / pp.171-186 / 2000
  • The aim of this paper is to investigate the acoustic characteristics of the French schwa [ə] and Korean [i] and to seek a way of understanding them from a phonological point of view. These two vowels have similar distributional properties, i.e. they alternate with zero in some contexts. Therefore, in both languages, they are not found when immediately followed by a nucleus with phonetic content or in word-final position. We first compare the two vowels by measuring the actual formant frequencies, pitch and energy using CSL. We also consider whether the realisation of the two vowels is affected by the speed of speech. In order to show that the realisation of the two vowels in both languages is not arbitrary but predictable, we introduce the notion of proper government, proposed and developed by Kaye (1987, 1990) and Charette (1991).

Pitch Detection Using Variable LPF

  • Hong KEUM
    • Proceedings of the Acoustical Society of Korea Conference / 1994.06a / pp.963-970 / 1994
  • In speech signal processing, it is very important to detect the pitch accurately. The pitch extraction algorithms proposed so far are not adequate for detecting the fine pitch of a speech signal. We therefore propose a new algorithm that takes advantage of G-peak extraction. The method finds the MZCI (maximum zero-crossing interval), which is defined as the cut-off bandwidth rate of the LPF (low-pass filter), and detects the pitch period of voiced signals. The algorithm performs robustly, with a gross error rate of 3.63% even in a 0 dB SNR environment. The gross error rate for clean speech is only 0.18%. It can also carry out the entire process at high speed.
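
The general idea of combining a low-pass filter with zero-crossing intervals can be sketched as follows. This is not the G-peak/MZCI algorithm of the paper: it is a much simpler stand-in that uses a fixed-width moving-average filter (an assumption; the paper varies the cut-off) and estimates the pitch from the mean spacing of upward zero crossings. The function names and the synthetic test signal are hypothetical.

```python
import math


def moving_average(x, width):
    """Simple FIR low-pass filter; returns only the fully overlapped outputs."""
    total = sum(x[:width])
    out = [total / width]
    for i in range(width, len(x)):
        total += x[i] - x[i - width]
        out.append(total / width)
    return out


def pitch_from_zero_crossings(x, fs, width):
    """Estimate pitch in Hz from the mean interval between upward zero crossings."""
    y = moving_average(x, width)
    crossings = [i for i in range(1, len(y)) if y[i - 1] < 0.0 <= y[i]]
    if len(crossings) < 2:
        return None  # not enough crossings to measure an interval
    mean_interval = (crossings[-1] - crossings[0]) / (len(crossings) - 1)
    return fs / mean_interval


# Synthetic voiced-like signal: 100 Hz fundamental plus a weaker 300 Hz harmonic.
fs = 8000
signal = [math.sin(2 * math.pi * 100 * n / fs) + 0.5 * math.sin(2 * math.pi * 300 * n / fs)
          for n in range(1600)]
estimated_f0 = pitch_from_zero_crossings(signal, fs, width=27)
```

A width of 27 samples at 8 kHz places the first null of the moving average near 296 Hz, so the harmonic is strongly attenuated while the fundamental passes almost unchanged.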

A Comparative Study on French Intonation between French and Korean Learners (불어 원어민과 한국인 불어 학습자의 억양 비교 연구)

  • Kim, Hyun-gi
    • Speech Sciences / v.1 / pp.27-38 / 1997
  • The differences in French intonation between native French speakers and Korean learners can be applied to teaching French intonation. One native French speaker and three native Korean speakers who had learned French in high school were selected for this study. The subjects spoke test phrases based on different syntactic structures. A high-speed speech analysis system (RILP) was used for the experiment. The intonation curves of the French speaker and the Korean learners differed at the end and at the beginning of phrases. At the end of phrases, French intonation showed rising and descending pitch contours in wh-questions, exclamations and finality. However, the Korean learners' intonation showed only rising pitch contours. At the beginning of phrases, French intonation showed descending pitch contours in minor continuation and command, whereas the Korean learners' intonation showed rising pitch contours. A new PC-based intonation training system could be highly effective in teaching French as a second language.

Design of Multimodal User Interface using Speech and Gesture Recognition for Wearable Watch Platform (착용형 단말에서의 음성 인식과 제스처 인식을 융합한 멀티 모달 사용자 인터페이스 설계)

  • Seong, Ki Eun;Park, Yu Jin;Kang, Soon Ju
    • KIISE Transactions on Computing Practices / v.21 no.6 / pp.418-423 / 2015
  • As technology develops at exceptional speed, the functions of wearable devices are becoming more diverse and complicated, and many users find some of them difficult to use. The main aim of this paper is to provide the user with an interface that is friendlier and easier to use. Speech recognition is easy to use and makes it easy to enter commands. However, speech recognition is problematic on a wearable device with limited computing power and battery capacity. The wearable device cannot predict when the user will give a command through speech recognition. Speech recognition would therefore have to be active at all times, but because of the battery constraint, waiting continuously for the user to give a command is impractical. To solve this problem, we use gesture recognition. This paper describes how to combine speech and gesture recognition in a multimodal interface to increase the user's comfort.

Speech Recognition of the Korean Vowel 'ㅡ' based on Neural Network Learning of Bulk Indicators (벌크 지표의 신경망 학습에 기반한 한국어 모음 'ㅡ'의 음성 인식)

  • Lee, Jae Won
    • KIISE Transactions on Computing Practices / v.23 no.11 / pp.617-624 / 2017
  • Speech recognition is now one of the most widely used technologies in HCI. Many applications where speech recognition may be used (such as home automation, automatic speech translation, and car navigation) are now under active development. In addition, the demand for speech recognition systems in mobile environments is rapidly increasing. This paper is intended to present a method for instant recognition of the Korean vowel 'ㅡ', as a part of a Korean speech recognition system. The proposed method uses bulk indicators (which are calculated in the time domain) instead of the frequency domain and consequently, the computational cost for the recognition can be reduced. The bulk indicators representing predominant sequence patterns of the vowel 'ㅡ' are learned by neural networks and final recognition decisions are made by those trained neural networks. The results of the experiment show that the proposed method can achieve 88.7% recognition accuracy, and recognition speed of 0.74 msec per syllable.

A Study on the Speech Transmission Index Method for Estimating Articulation of Loudspeaking Telephony (음성전송지수를 이용한 확성전화기의 명료도 평가 방법)

  • Jang, Dae-Young;Kang, Seong-Hoon;Sim, Dong-Yeon;Kim, Chun-Duck
    • The Journal of the Acoustical Society of Korea / v.13 no.5 / pp.32-39 / 1994
  • Speech transmission quality in telephony is quantified in terms of loudness rating, but this method has been validated only for handset telephony. The transmission quality of loudspeaking telephony in a room must be evaluated not only in terms of speech transmission but also in terms of background noise, echo and reverberation, since the effect of room acoustics is much stronger for loudspeaking telephony. A better approach is therefore required to specify the quality of loudspeaking telephony. Steeneken proposed a physical method for measuring the quality of speech transmission by calculating the speech transmission index (STI). In this paper, the application of an STI method for estimating the articulation of loudspeaking telephony is discussed. An STI measurement system with high-speed calculation was also applied in three rooms having different reverberation times. The results show that the STI decreases as the reverberation time of the room increases. This suggests that the speech transmission index method can be useful for evaluating the articulation of loudspeaking telephony, including the sound field characteristics.
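
The reported trend, that STI falls as reverberation time rises, can be reproduced with a simplified calculation. The sketch below is not the measurement system of the paper: it assumes an ideal exponential reverberant decay with no background noise, a single frequency band, and equal weighting of the 14 standard modulation frequencies. The function names are hypothetical.

```python
import math

# The 14 modulation frequencies (Hz) used in the STI method, in 1/3-octave steps.
MODULATION_FREQS_HZ = [0.63, 0.8, 1.0, 1.25, 1.6, 2.0, 2.5,
                       3.15, 4.0, 5.0, 6.3, 8.0, 10.0, 12.5]


def modulation_reduction(f_mod, rt60):
    """Modulation transfer value for an ideal exponential decay with no noise."""
    return 1.0 / math.sqrt(1.0 + (2.0 * math.pi * f_mod * rt60 / 13.8) ** 2)


def sti_from_reverberation(rt60):
    """Single-band, equally weighted STI estimate for reverberation time rt60 (s)."""
    indices = []
    for f_mod in MODULATION_FREQS_HZ:
        m = modulation_reduction(f_mod, rt60)
        if m >= 1.0:
            snr_db = 15.0  # no modulation loss: apparent SNR saturates
        else:
            snr_db = 10.0 * math.log10(m / (1.0 - m))
        snr_db = max(-15.0, min(15.0, snr_db))  # clip to the +/-15 dB range
        indices.append((snr_db + 15.0) / 30.0)  # map to a 0..1 transmission index
    return sum(indices) / len(indices)


# Three hypothetical rooms with increasing reverberation time.
sti_by_room = {rt60: sti_from_reverberation(rt60) for rt60 in (0.3, 1.0, 3.0)}
```

A full STI calculation additionally weights seven octave bands and accounts for noise, so the absolute values here are only indicative.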

A Study on Design and Implementation of Speech Recognition System Using ART2 Algorithm

  • Kim, Joeng Hoon;Kim, Dong Han;Jang, Won Il;Lee, Sang Bae
    • International Journal of Fuzzy Logic and Intelligent Systems / v.4 no.2 / pp.149-154 / 2004
  • In this research, we chose speech recognition as the means of controlling an electric wheelchair system using speech alone, and we used DTW (Dynamic Time Warping), which is speaker-dependent and has a relatively high recognition rate among speech recognition methods. However, a real-time system requires small memory and fast processing. We therefore introduced VQ (Vector Quantization), which is widely used as a compression algorithm in speaker-independent recognition, to secure fast recognition and a small memory footprint. However, we found that the recognition rate decreased after introducing VQ. To improve it, we applied the ART2 (Adaptive Resonance Theory 2) algorithm as a post-processing step and obtained an improvement of about 5% in the recognition rate. To use ART2, an error range has to be applied. The error range is applied when the difference between the second distance and the first distance, among the distances obtained by DTW, is 20 or more. With ART2 applied in this way, we obtained fast processing and a high recognition rate. Moreover, since the system is a moving object, it should be implemented as an embedded system. We therefore selected the TMS320C32 chip, which can perform a large number of calculations relatively quickly, to implement the embedded system. Considering that the stored data is speech, we used 128 KB of RAM and 64 KB of ROM to hold a large amount of data. For speech input, we used a 16-bit stereo audio codec to secure relatively accurate data through its high resolution.
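
The DTW matching step described in this abstract can be sketched as follows. This is a minimal illustration on one-dimensional sequences, not the authors' implementation: real systems compare feature vectors per frame, and the VQ and ART2 stages are omitted. The function names, the template data and the use of an absolute-difference local cost are assumptions.

```python
def dtw_distance(a, b):
    """Return the DTW distance between two 1-D numeric sequences."""
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] is the minimal accumulated cost of aligning a[:i] with b[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            cost[i][j] = local + min(cost[i - 1][j],      # stretch b
                                     cost[i][j - 1],      # stretch a
                                     cost[i - 1][j - 1])  # advance both
    return cost[n][m]


def rank_templates(utterance, templates):
    """Return (label, distance) pairs sorted from best to worst match."""
    scored = [(label, dtw_distance(utterance, ref)) for label, ref in templates.items()]
    return sorted(scored, key=lambda item: item[1])


# Hypothetical command templates and a time-stretched version of "go".
templates = {"go": [1, 3, 4, 3, 1], "stop": [5, 5, 1, 1, 0]}
ranking = rank_templates([1, 3, 3, 4, 3, 1], templates)
# Gap between the second-best and best distances, comparable to the
# threshold of 20 mentioned in the abstract (the scale here is arbitrary).
margin = ranking[1][1] - ranking[0][1]
```

The gap between the two best distances gives a simple confidence measure: a small gap means the two closest templates are nearly tied, which is where a post-processing stage is most likely to help.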