• Title/Summary/Keyword: speech waveform

Search Result 135, Processing Time 0.024 seconds

A Study on VTS Speech Waveform Compensation for Drinking Probability Judgement (음주가능성 판단을 위한 VTS 음성파형 보상에 관한 연구)

  • Lee, Won-Hee;Bae, Seong-Geon;Bae, Myung-Jin
    • Proceedings of the Korean Society of Computer Information Conference
    • /
    • 2017.01a
    • /
    • pp.191-192
    • /
    • 2017
  • 해상에서는 도로가 아닌 해상 위라는 환경 때문에 음주단속을 실시하여 음주운항을 예방하기엔 어려움이 존재한다. 원거리로 신체정보를 보낼 수 있는 음성을 통하여 음주단속을 한다면 거리가 얼마나 떨어져 있더라도 실시간으로 측정이 가능하다. VTS 무선 교신 환경을 이용하여 음성을 통신할 때도 무선 환경이 고르지 못할 경우에 클리핑이 일어나 음주가능성 판단율이 저하될 수 있다. 따라서 본 논문에서는 음성신호가 왜곡이 되어 음주 가능성 여부의 판단율 오차를 줄이기 위해 신호를 보상하는 방법을 제안하였다.

  • PDF

Coding History Detection of Speech Signal using Deep Neural Network (심층 신경망을 이용한 음성 신호의 부호화 이력 검출)

  • Cho, Hyo-Jin;Jang, Won;Shin, Seong-Hyeon;Park, Hochong
    • Journal of Broadcast Engineering
    • /
    • v.23 no.1
    • /
    • pp.86-92
    • /
    • 2018
  • In this paper, we propose a method for coding history detection of digital speech signal. In digital speech communication and storage, the signal is encoded to reduce the number of bits. Therefore, when a speech signal waveform is given, we need to detect its coding history so that we can determine whether the signal is an original or an coded one, and if coded, determine the number of times of coding. In this paper, we propose a coding history detection method for 12.2kbps AMR codec in terms of original, single coding, and double coding. The proposed method extracts a speech-specific feature vector from the given speech, and models the feature vector using a deep neural network. We confirm that the proposed feature vector provides better performance in coding history detection than the feature vector computed from the general spectrogram.

Context-adaptive Smoothing for Speech Synthesis (음성 합성기를 위한 문맥 적응 스무딩 필터의 구현)

  • 이기승;김정수;이재원
    • The Journal of the Acoustical Society of Korea
    • /
    • v.21 no.3
    • /
    • pp.285-292
    • /
    • 2002
  • One of the problems that should be solved in Text-To-Speech (TTS) is discontinuities at unit-joining points. To cope with this problem, a smoothing method using a low-pass filter is employed in this paper, In the proposed soothing method, a filter coefficient that controls the amount of smoothing is determined according to contort information to be synthesized. This method efficiently reduces both discontinuities at unit-joining points and artifacts caused by undesired smoothing. The amount of smoothing is determined with discontinuities around unit-joins points in the current synthesized speech and discontinuities predicted from context. The discontinuity predictor is implemented by CART that has context feature variables. To evaluate the performance of the proposed method, a corpus-based concatenative TTS was used as a baseline system. More than 6075 of listeners realized that the quality of the synthesized speech through the proposed smoothing is superior to that of non-smoothing synthesized speech in both naturalness and intelligibility.

Enhaced 2.4 kbps Harmonic Stochastic Excitation Coding for Time/Frequency Transitional Speech (시간/주파수 전이신호를 위한 향상된 2.4 kbps 하모닉 스토케스틱 여기 음성 부호화 방법)

  • 김종학;이인성
    • The Journal of the Acoustical Society of Korea
    • /
    • v.19 no.7
    • /
    • pp.53-58
    • /
    • 2000
  • 본 논문은 주파수 전이신호와 시간 전이 신호에 대해서 고조파 잡음 여기 방법과 시간 분리 여기 방법을 적용한 2.4 kbps 음성부호화 방법을 제안한다. 혼합 여기 부호화 방법은 주기 신호와 비 주기 신호를 효과적으로 표현하기 위해 하모닉 잡음 모델을 사용한다. 혼합신호에 대한 잡음 성분은 캡스트럴 분석 방법을 사용함으로써 추출되고, AR (Autoregressive Model) 모델에 의해 표현된다. 시간 전이구간 신호에서의 모호한 음성을 효과적으로 제거하기 위한 또 다른 방법이 제안된다. 제안된 시간 분리 방법은 시간 에너지 변화정도를 관찰함으로써 전이 시점을 감지하고 다른 시간 길이를 가지는 두 블록으로 분리하여 분석한다. 시간 분리 방법은 분석을 위한 비대칭 윈도우와 합성에서의 위상 합성 방법을 포함한다. 제안된 방법을 사용한 2.4 kbps 음성부호화 방법은 주관적 음질 평가에서 전이구간에서의 지각적 음질의 향상을 보여주었으며, 원본 음성 스펙트럼과의 고조파 비 매칭에 의한 윙윙거리는 기계적인 잡음을 감소시킨다.

  • PDF

An acoustic feature [noise] in the sound pattern of Korean and other languages (소리체제에서 음향 자질[noise]: 한국어와 기타 언어들에서의 한 예증)

  • Rhee, Seok-Chae
    • Speech Sciences
    • /
    • v.6
    • /
    • pp.103-117
    • /
    • 1999
  • This paper suggests that the onset-coda asymmetry found in languages like Korean and others should be dealt with in terms of one acoustic feature rather than other articulatory features, claiming that the acoustic feature involved here is [noise], i.e., 'aperiodic waveform energy'. It determines the structural well-formedness of the languages in question whether a coda ends in [noise] or not, regardless of the intensity, the frequency, and the time duration of the [noise]. Fricatives, affricates, aspirated stops, tense stops, and released stops are all disallowed in the coda position due to the acoustic feature [noise] they, commonly end with if they were, posited in the coda. The proposal implies that the three seemingly separate prohibitions of consonants in the coda position -- i) no fricatives/affricates, ii) no aspirated/tense stops, and iii) no released stops -- are directly correlated with each other. Incorporation of the one acoustic feature [noise] in the feature theory enables us to see that the aspects of onset-coda asymmetry are derived from one single source: ban, of [noise] in the coda.

  • PDF

The Computation Reduction Algorithm Independent of the Language for CELP Vocoders (각국 언어 특성에 독립적인 CELP 계열 보코더에서의 계산량 단축 알고리즘)

  • 민소연;배명진
    • Proceedings of the IEEK Conference
    • /
    • 2003.07e
    • /
    • pp.2451-2454
    • /
    • 2003
  • In this paper, we propose the computation reduction methods of LSP(Line spectrum pairs) transformation that is mainly used in CELP vocoders. In order to decrease the computational time in real root method the characteristic of four proposed algorithms is as the following. First, scheme to reduce the LSP transformation time uses met scale. Developed the second scheme is the control of searching order by the distribution characteristic of LSP parameters. Third, scheme to reduce the LSP transformation time uses voice characteristics. Developed the fourth scheme is the control of searching interval and order by the distribution characteristic of LSP parameters. As a result of searching time, computational amount, transformed LSP parameters, SNR, MOS test, waveform of synthesized speech, speech, spectrogram analysis, searching time reduced about 37.5%, 46.21%, 46.3%, 51.29% in average, computational amount is reduced about 44.76%, 49.44%, 47.03%, 57.40%. But the transformed LSP parameters of the proposed methods were the same as those of real root method.

  • PDF

A study on application of the statistic model about an utterance of the speaker (화자의 발음에 대한 통계적 모델의 적용에 관한 연구)

  • Kim, Dae-Sik;Bae, Myong-Jin;Yoon, Jae-Gang
    • Proceedings of the KIEE Conference
    • /
    • 1988.07a
    • /
    • pp.25-28
    • /
    • 1988
  • A speech that play a part of important mediation in the man's conversation is the sound of representation to man's emotion and thought, then voice sound could be verified and identified a speaker's speech by individual property. This study indicates as distribution of pitch in searching for sample number of each pitch with eye in the sound waveform of speaker. We propose the algorithm that judge speaker's emotion state, personality, regional group, age, sex distinction, e.t.c., according to the deviation degree.

  • PDF

ON A REDUCTION OF PITCH SEARCHING TIME BY PREPROCESSING IN THE CELP VOCODER

  • Kim, Daesik;Bae, Myungjin;Kim, Jongjae;Byun, Kyungjin;Han, Kichun;Yoo, Hahyoung
    • Proceedings of the Acoustical Society of Korea Conference
    • /
    • 1994.06a
    • /
    • pp.904-911
    • /
    • 1994
  • Code Excited Linear Prediction (CELP) speech coders exhibit good performance at data rates below 4.8 kbps. The major drawback to CELP type coders is their many computation. In this paper, we propose a new pitch search method that preserves the quality of the CELP vocoder with reducing complexity. The basic idea is to apply the preprocessing technique beforehand grasping the autocorrelation property of speech waveform. By using the proposed method, we can get approximately 77% complexity reduction in the pitch search.

  • PDF

Speech Synthesis for the Korean large Vocabulary Through the Waveform Analysis in Time Domains and Evauation of Synthesized Speech Quality (시간영역에서의 파형분석에 의한 무제한 어휘 합성 및 음절 유형별 규칙합성음 음질평가)

  • Kang, Chan-Hee;Chin, Yong-Ohk
    • The Journal of the Acoustical Society of Korea
    • /
    • v.13 no.1
    • /
    • pp.71-83
    • /
    • 1994
  • This paper deals with the improvement of the synthesized speech quality and naturality in the Korean TTS(Text-to-Speech) system. We had extracted the parameters(table2) such as its amplitude, duration and pitch period in a syllable through the analysis of speech waveforms(table1) in the time domain and synthesized syllables using them. To the frequencies of the Korean pronunciation large vocabulary dictionary we had synthesized speeches selected 229 syllables such as V types are 19, CV types are 80. VC types are 30 and CVC types are 100. According to the 4 Korean syllable types from the data format dictionary(table3) we had tested each 15 syllables with the objective MOS(Mean Opinion Score) evaluation method about the 4 items i.e., intelligibility, clearness, loudness, and naturality after selecting random group without the knowledge of them. As the results of experiments the qualities of them are very clear and we can control the prosodic elements such as durations, accents and pitch periods (fig9, 10, 11, 12).

  • PDF

A Study on Variation and Determination of Gaussian function Using SNR Criteria Function for Robust Speech Recognition (잡음에 강한 음성 인식에서 SNR 기준 함수를 사용한 가우시안 함수 변형 및 결정에 관한 연구)

  • 전선도;강철호
    • The Journal of the Acoustical Society of Korea
    • /
    • v.18 no.7
    • /
    • pp.112-117
    • /
    • 1999
  • In case of spectral subtraction for noise robust speech recognition system, this method often makes loss of speech signal. In this study, we propose a method that variation and determination of Gaussian function at semi-continuous HMM(Hidden Markov Model) is made on the basis of SNR criteria function, in which SNR means signal to noise ratio between estimation noise and subtracted signal per frame. For proving effectiveness of this method, we show the estimation error to be related with the magnitude of estimated noise through signal waveform. For this reason, Gaussian function is varied and determined by SNR. When we test recognition rate by computer simulation under the noise environment of driving car over the speed of 80㎞/h, the proposed Gaussian decision method by SNR turns out to get more improved recognition rate compared with the frequency subtracted and non-subtracted cases.

  • PDF