• Title/Summary/Keyword: speech waveform

A Comparative Study on the Pronunciations of Korean and Vietnamese on Korean Syllable Final Double Consonants (베트남인 한국어 학습자와 한국인의 한국어 겹받침 발음 비교 연구)

  • Jang, Kyungnam;You, Kwang-Bock
    • The Journal of the Convergence on Culture Technology / v.8 no.6 / pp.637-646 / 2022
  • This paper compares the pronunciation of Korean syllable-final double consonants by Vietnamese learners and by native Korean speakers. Using speech signal processing analysis tools, the study confirms the pronunciation errors and suggested teaching methods previously investigated and analyzed in linguistic research, and on that basis proposes a new educational method. The pronunciations of Vietnamese learners and Koreans were compared using an SVM, which is widely used in machine learning. If a decision hyperplane of the SVM can be obtained, the Vietnamese learners' pronunciation of the Korean syllable-final double consonants is quite different from that of Koreans; otherwise, the two are fairly similar. The new teaching method presented in this paper does not consist only of writing and listening but also includes material that can be visualized for learners, such as the speech signal waveform in the time domain and its corresponding energy.
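
The time-domain energy mentioned in the abstract above can be illustrated with a short-time energy computation. This is a minimal sketch and not the authors' implementation: the function name `short_time_energy`, the frame length, the hop size, and the synthetic test signal are all assumptions chosen for illustration.

```python
import math

def short_time_energy(samples, frame_len=160, hop=80):
    """Return the energy (sum of squared samples) of each frame.

    frame_len and hop are illustrative values (20 ms / 10 ms at 8 kHz),
    not parameters taken from the paper.
    """
    energies = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        frame = samples[start:start + frame_len]
        energies.append(sum(x * x for x in frame))
    return energies

# Synthetic signal: 0.1 s of silence followed by 0.1 s of a 200 Hz tone at 8 kHz.
fs = 8000  # assumed sampling rate in Hz
silence = [0.0] * 800
tone = [0.5 * math.sin(2 * math.pi * 200 * n / fs) for n in range(800)]
energies = short_time_energy(silence + tone)
```

Plotting `energies` against frame index next to the waveform would give the kind of visual cue the abstract describes: the energy stays at zero during silence and rises once the tone begins.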

A Study on Multi-Pulse Speech Coding Method by Using V/S/TSIUVC (V/S/TSIUVC를 이용한 멀티펄스 음성부호화 방식에 관한 연구)

  • Lee, See-Woo
    • Journal of Korea Multimedia Society / v.7 no.9 / pp.1233-1239 / 2004
  • In a speech coding system that uses voiced and unvoiced excitation sources, speech quality is distorted when voiced and unvoiced consonants coexist in a frame. This paper presents a new multi-pulse coding method that uses V/S/TSIUVC switching, individual pitch pulses, and a TSIUVC approximation-synthesis method in order to limit this distortion. The TSIUVC is extracted using the zero crossing rate and individual pitch pulses. The TSIUVC extraction rate was 91% for female voices and 96.2% for male voices. Importantly, a high-quality synthesis waveform can be produced within the TSIUVC using the frequency information below 0.347kHz and above 2.813kHz. I evaluated the MPC using V/UV and the FBD-MPC using V/S/TSIUVC. As a result, the synthesized speech of the FBD-MPC was better in speech quality than that of the MPC.
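
The zero crossing rate named in the abstract above can be sketched as follows. This shows only the generic zero crossing rate measure, not the paper's TSIUVC extraction, which also relies on individual pitch pulses. The function name `zero_crossing_rate`, the sampling rate, and the two synthetic frames are assumptions for illustration.

```python
import math

def zero_crossing_rate(frame):
    """Return the fraction of adjacent sample pairs whose signs differ."""
    if len(frame) < 2:
        return 0.0
    crossings = sum(
        1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0)
    )
    return crossings / (len(frame) - 1)

fs = 8000  # assumed sampling rate in Hz
# Low-frequency tone as a stand-in for a voiced frame.
voiced_like = [math.sin(2 * math.pi * 100 * n / fs + 0.5) for n in range(160)]
# Sign-alternating samples as a stand-in for a noisy, unvoiced-like frame.
unvoiced_like = [(-1.0) ** n * 0.1 for n in range(160)]

zcr_voiced = zero_crossing_rate(voiced_like)
zcr_unvoiced = zero_crossing_rate(unvoiced_like)
```

The low-frequency frame yields a much smaller rate than the sign-alternating frame, which is why the measure is commonly used to separate voiced from unvoiced segments. Any decision threshold would have to be tuned on real data and is not given in the abstract.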

Real-time Implementation of Variable Transmission Bit Rate Vocoder Improved Speech Quality in SOLA-B Algorithm & G.729A Vocoder Using on the TMS320C5416 (TMS320C5416을 이용한 SOLA-B 알고리즘과 G.729A 보코더의 음질 향상된 가변 전송률 보코더의 실시간 구현)

  • Ham, Myung-Kyu;Bae, Myung-Jin
    • Speech Sciences / v.10 no.3 / pp.241-250 / 2003
  • In this paper, we implemented a variable-rate vocoder in real time on the TMS320C5416 by applying the SOLA-B algorithm to G.729A. With the SOLA-B algorithm, the duration of the speech is shortened at encoding, and the speech is played back at normal speed by extending its duration at decoding. However, simply combining the existing G.729A with the SOLA-B algorithm causes a loss of speech quality, because G.729A does not account for the change in speech length. The proposed method therefore modifies the structure of the LSP quantization table to suit the speech shortened by the SOLA-B algorithm before encoding. The variable-rate vocoder combining G.729A and the SOLA-B algorithm has a maximum complexity of 10.2MIPS for the encoder and 2.8MIPS for the decoder at a transmission rate of 8kbps. The corresponding figures are 17.3MIPS for the encoder and 9.9MIPS for the decoder at 6kbps, and 18.5MIPS for the encoder and 11.1MIPS for the decoder at 4kbps. Memory usage is about 9.7kwords of program ROM, 4.69kwords of table ROM, and 5.2kwords of RAM. The output waveform is presented using the results of the C simulator and a bit-exact comparison. In a MOS test evaluating the speech quality of the real-time variable-rate vocoder, the score was about 3.68 at 4kbps.

Speech Transition Detection and approximate-synthesis Method for Speech Signal Compression and Recovery (음성신호 압축 및 복원을 위한 음성 천이구간 검출과 근사합성 방식)

  • Lee, Kwang-Seok;Kim, Bong-Gi;Kang, Seong-Soo;Kim, Hyun-Deok
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference / 2008.05a / pp.763-767 / 2008
  • In a speech coding system that uses voiced and unvoiced excitation sources, speech quality is distorted when voiced and unvoiced consonants coexist in a frame. We therefore propose a method for searching for and extracting the TS (Transition Segment), which includes unvoiced consonants, so that voiced and unvoiced consonants do not coexist in a frame. This research presents a new TS approximate-synthesis method that uses Least Mean Square and frequency band division. As a result, the method obtains high-quality approximation-synthesis waveforms within the TS by using the frequency information below 0.547kHz and above 2.813kHz. Importantly, the maximum error signal can be produced with a low-distortion approximation-synthesis waveform within the TS. The method can be applied to a new Voiced/Silence/TS speech coding scheme, as well as to speech analysis and speech synthesis.

Speech Signal Compression and Recovery Using Transition Detection and Approximate-Synthesis (천이구간 추출 및 근사합성에 의한 음성신호 압축과 복원)

  • Lee, Kwang-Seok;Lee, Byeong-Ro
    • Journal of the Korea Institute of Information and Communication Engineering / v.13 no.2 / pp.413-418 / 2009
  • In a speech coding system that uses voiced and unvoiced excitation sources, speech quality is distorted when voiced and unvoiced consonants coexist in a frame. We therefore propose a method for searching for and extracting the TS (Transition Segment), which includes unvoiced consonants, so that voiced and unvoiced consonants do not coexist in a frame. This research presents a new TS approximate-synthesis method that uses Least Mean Square and frequency band division. As a result, the method obtains high-quality approximation-synthesis waveforms within the TS by using the frequency information below 0.547kHz and above 2.813kHz. Importantly, the maximum error signal can be produced with a low-distortion approximation-synthesis waveform within the TS. The method can be applied to a new Voiced/Silence/TS speech coding scheme, as well as to speech analysis and speech synthesis.

Speech synthesis engine for TTS (TTS 적용을 위한 음성합성엔진)

  • 이희만;김지영
    • The Journal of Korean Institute of Communications and Information Sciences / v.23 no.6 / pp.1443-1453 / 1998
  • This paper presents a speech synthesis engine that converts character strings held in computer memory into synthesized speech, enhancing intelligibility and naturalness by adopting a waveform processing method. The engine, which uses demisyllable speech segments, receives command streams for pitch modification and for duration and energy control. The command-based engine separates the high-level processing of text normalization, letter-to-sound conversion, and lexical analysis from the low-level processing of signal filtering and pitch processing. The TTS (Text-to-Speech) system implemented with the speech synthesis engine has three independent object modules, the Text-Normalizer, the Commander, and the Speech Synthesis Engine, each of which can easily be replaced by another compatible module. The architecture separating high-level and low-level processing offers expandability and portability because of its mix-and-match nature.

A study on Speech Coding Method using V/S/TSIUVC Switching (V/S/TSIUVC 스위칭을 이용한 음성부호화 방식에 관한 연구)

  • Lee, See-Woo
    • Journal of the Korea Academia-Industrial cooperation Society / v.7 no.6 / pp.1180-1184 / 2006
  • In a speech coding system that uses voiced and unvoiced excitation sources, speech quality is distorted when voiced and unvoiced consonants coexist in a frame. In this paper, I propose a new multi-pulse coding method that uses V/S/TSIUVC switching and a TSIUVC approximation-synthesis method in order to limit this distortion. The TSIUVC is extracted using the zero crossing rate and individual pitch pulses. The TSIUVC extraction rate was 91% for female voices and 96.2% for male voices. Importantly, a high-quality synthesis waveform can be produced within the TSIUVC using the frequency information below 0.547kHz and above 2.813kHz. I evaluated the MPC using V/UV and the FBD-MPC using V/S/TSIUVC. As a result, the synthesized speech of the FBD-MPC was better in speech quality than that of the MPC.

Speech Recognition of the Korean Vowel 'ㅐ', Based on Time Domain Sequence Patterns (시간 영역 시퀀스 패턴에 기반한 한국어 모음 'ㅐ'의 음성 인식)

  • Lee, Jae Won
    • KIISE Transactions on Computing Practices / v.21 no.11 / pp.713-720 / 2015
  • As computing and network technologies are further developed, communication equipment continues to become smaller, and as a result, mobility is now a predominant feature of current technology. Therefore, demand for speech recognition systems in mobile environments is rapidly increasing. This paper proposes a novel method to recognize the Korean vowel 'ㅐ' as a part of a phoneme-based Korean speech recognition system. The proposed method works by analyzing a sequence of patterns in the time domain instead of the frequency domain, and consequently, its use can markedly reduce computational costs. Three algorithms are presented to detect typical sequence patterns of 'ㅐ', and these are combined to produce the final decision. The results of the experiment show that the proposed method has an accuracy of 89.1% in recognizing the vowel 'ㅐ'.

A Study on APC-MPC in 8kbps of Convergence System (융복합 시스템의 8kbps에 있어서 APC-MPC에 관한 연구)

  • Lee, See-Woo
    • Journal of Digital Convergence / v.13 no.7 / pp.177-182 / 2015
  • In MPC (Multi-Pulse Coding) using voiced and unvoiced excitation sources, the voice waveform is distorted. This is caused by the normalization of the synthesized voiced speech waveform during restoration. To solve this problem, this paper presents APC-MPC, which applies amplitude-position compensation to the multi-pulses in each pitch interval in order to reduce distortion of the synthesized waveform. I implemented APC-MPC in a coding system and evaluated its SNRseg under the 8kbps coding condition of a convergence system. As a result, the SNRseg of APC-MPC was 13.9dB for female voices and 14.3dB for male voices. I therefore expect this method to be applicable to cellular phones and smartphones that use low-bit-rate excitation sources.
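
SNRseg (segmental signal-to-noise ratio) is commonly computed as the mean of per-frame SNR values in dB. The sketch below follows that common definition; the abstract does not state the exact variant used, so the function name `snr_seg`, the frame length, and the rule of skipping frames with zero signal or zero error energy are assumptions.

```python
import math

def snr_seg(reference, synthesized, frame_len=160):
    """Return the mean per-frame SNR in dB between two signals.

    Frames with zero signal energy or zero error energy are skipped
    (an assumption made here to avoid log(0) and division by zero).
    """
    n = min(len(reference), len(synthesized))
    snrs = []
    for start in range(0, n - frame_len + 1, frame_len):
        ref = reference[start:start + frame_len]
        syn = synthesized[start:start + frame_len]
        signal_energy = sum(r * r for r in ref)
        error_energy = sum((r - s) ** 2 for r, s in zip(ref, syn))
        if signal_energy > 0 and error_energy > 0:
            snrs.append(10 * math.log10(signal_energy / error_energy))
    return sum(snrs) / len(snrs) if snrs else float("nan")

# Synthetic check: a tone and a copy scaled by 0.9 (error is 10% of the amplitude).
fs = 8000  # assumed sampling rate in Hz
ref = [0.5 * math.sin(2 * math.pi * 200 * n / fs) for n in range(800)]
syn = [0.9 * r for r in ref]
value = snr_seg(ref, syn)
```

With a 10% amplitude error in every frame, the signal-to-error energy ratio is 100, so `value` comes out at about 20 dB. Averaging in the dB domain makes quiet frames count as much as loud ones, which is the usual reason for preferring SNRseg over a single global SNR in speech coding evaluation.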

A Simple Pitch Tracking Algorithm based on the Energy Operator (에너지 연산자에 기초한 간단한 피치 추적 방법)

  • Tai-Ho Lee
    • Journal of the Institute of Convergence Signal Processing / v.5 no.1 / pp.1-5 / 2004
  • A new method for estimating the pitch-frequency contour of voiced speech is presented. The method is based on the double application of Kaiser's energy operator[1], which can extract the amplitude and frequency of a sinusoidal waveform. According to the modulation model, a vowel can be represented by a combination of damped sinusoids representing formants, modulated by pitch pulses. Therefore, the amplitude envelope of each of the components gives a pitch-like waveform, and the pitch can be obtained by averaging the frequencies of this waveform. The first part is the same as Gopalan's approach[9], but by replacing the LPC-based spectral analysis with a second application of the energy operator, the algorithm becomes very simple and can be processed on-line. Although the estimation is rather coarse, the suggested algorithm can be useful for getting a general sketch of the pitch contour on-line.
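
The discrete energy operator referred to above is usually written as ψ[x(n)] = x(n)² − x(n−1)·x(n+1). For a sinusoid A·cos(Ωn + φ) it evaluates to A²·sin²(Ω), which is how it captures amplitude and frequency together. The sketch below demonstrates only this single-operator property on a synthetic sinusoid; it is not the paper's two-stage pitch tracker, and the function name `teager_kaiser` and the signal parameters are assumptions.

```python
import math

def teager_kaiser(x):
    """Apply the discrete energy operator x[n]^2 - x[n-1]*x[n+1].

    The output is two samples shorter than the input because the
    first and last samples have no neighbor on one side.
    """
    return [x[n] * x[n] - x[n - 1] * x[n + 1] for n in range(1, len(x) - 1)]

# Synthetic sinusoid: amplitude 0.8, 200 Hz at an assumed 8 kHz sampling rate.
amplitude = 0.8
omega = 2 * math.pi * 200 / 8000  # digital frequency in radians per sample
x = [amplitude * math.cos(omega * n + 0.3) for n in range(100)]

psi = teager_kaiser(x)
expected = (amplitude * math.sin(omega)) ** 2
```

Every element of `psi` matches `expected` up to floating-point rounding, regardless of the phase. Because the operator needs only three neighboring samples per output value, it suits the on-line processing the abstract mentions.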
