• Title/Summary/Keyword: Speech signal processing

A Study on the Reconstruction of a Frame Based Speech Signal through Dictionary Learning and Adaptive Compressed Sensing (Adaptive Compressed Sensing과 Dictionary Learning을 이용한 프레임 기반 음성신호의 복원에 대한 연구)

  • Jeong, Seongmoon;Lim, Dongmin
    • The Journal of Korean Institute of Communications and Information Sciences / v.37A no.12 / pp.1122-1132 / 2012
  • Compressed sensing has been applied in many fields, including images, speech signals, and radar. It has mainly been applied to stationary signals, and the reconstruction error can grow as the compression ratio is increased by reducing the number of measurements. To resolve this problem, speech signals are divided into frames and processed in parallel. The frames are made sparse by dictionary learning, and adaptive compressed sensing is applied, which designs the compressed sensing reconstruction matrix adaptively using the difference between the sparse coefficient vector and its reconstruction. With the proposed method, fast and accurate reconstruction of non-stationary signals is shown to be possible with compressed sensing.
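
The frame-based pipeline in this abstract can be summarized with the standard compressed sensing model. The symbols below ($x_i$, $D$, $\alpha_i$, $\Phi$, $y_i$) are illustrative assumptions rather than the paper's notation, and the $\ell_1$ program is the textbook recovery formulation, not the authors' adaptive scheme.

```latex
% Frame i of the speech signal, x_i \in \mathbb{R}^N, is sparse in a learned dictionary D:
x_i = D\,\alpha_i, \qquad \|\alpha_i\|_0 \ll N
% M < N linear measurements are taken with a sensing matrix \Phi \in \mathbb{R}^{M \times N}:
y_i = \Phi\,x_i = \Phi D\,\alpha_i
% Standard sparse recovery, followed by synthesis of the frame:
\hat{\alpha}_i = \arg\min_{\alpha} \|\alpha\|_1 \quad \text{s.t.} \quad y_i = \Phi D\,\alpha,
\qquad \hat{x}_i = D\,\hat{\alpha}_i
```

A smaller $M$ means a higher compression ratio, which is where the reconstruction error mentioned in the abstract tends to grow.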

A New Statistical Voice Activity Detector Based on UMP Test (UMP 테스트에 근거한 새로운 통계적 음성검출기)

  • Jang, Keun-Won;Chang, Joon-Hyuk;Kim, Dong-Kook
    • The Journal of the Acoustical Society of Korea / v.26 no.1 / pp.16-24 / 2007
  • Voice activity detectors (VADs) are important in wireless communication and speech signal processing. In conventional VAD methods, an expression for the likelihood ratio test (LRT) is derived from statistical models, and speech or noise is decided by comparing the value of that expression with a threshold. We propose a new method with a modified decision rule based on the Gaussian distribution and the uniformly most powerful (UMP) test. The method requires the distribution of the absolute value of the incoming speech signal; the final decision is then obtained through the relation between the Rayleigh distributions. This VAD method can detect speech without the a priori signal-to-noise ratio (SNR) that conventional VAD algorithms require. In addition, various VAD performance tests show the proposed method to be more effective than the traditional scheme.
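
The conventional LRT decision described in this abstract can be sketched as follows. This is a minimal illustration of the widely used Gaussian-model log-likelihood ratio, not the proposed UMP-based detector. The function names, the fixed a priori SNR `xi`, the threshold, and the toy spectra are all assumptions chosen for the example.

```python
import math


def log_likelihood_ratio(gamma, xi):
    """Per-bin log likelihood ratio under a complex Gaussian model.

    gamma: a posteriori SNR (noisy power / noise power)
    xi:    a priori SNR (assumed known here; the conventional test needs it)
    """
    return gamma * xi / (1.0 + xi) - math.log(1.0 + xi)


def is_speech(power_spectrum, noise_power, xi=1.0, threshold=0.5):
    """Declare speech when the mean log LR over frequency bins exceeds a threshold."""
    llrs = [
        log_likelihood_ratio(p / n, xi)
        for p, n in zip(power_spectrum, noise_power)
    ]
    return sum(llrs) / len(llrs) > threshold


noise = [1.0, 1.0, 1.0, 1.0]             # hypothetical noise power per bin
noise_only_frame = [1.0, 1.0, 1.0, 1.0]  # same power as the noise estimate
speech_frame = [10.0, 10.0, 10.0, 10.0]  # 10x the noise power in every bin
print(is_speech(noise_only_frame, noise))
print(is_speech(speech_frame, noise))
```

The dependence on `xi` is the requirement that the abstract says the proposed method avoids.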

UA Tree-based Reduction of Speech DB in a Large Corpus-based Korean TTS (대용량 한국어 TTS의 결정트리기반 음성 DB 감축 방안)

  • Lee, Jung-Chul
    • Journal of the Korea Society of Computer and Information / v.15 no.7 / pp.91-98 / 2010
  • Large corpus-based concatenative Text-to-Speech (TTS) systems can generate natural synthetic speech without additional signal processing. Because improving the naturalness, personality, speaking style, and emotion of synthetic speech requires a larger speech DB, it is necessary to prune redundant speech segments from a large speech segment DB. In this paper, we propose a new method of constructing a segmental speech DB for a Korean TTS system, based on a clustering algorithm that downsizes the DB. For the performance test, synthetic speech was generated using a Korean TTS system consisting of a language processing module, a prosody processing module, a segment selection module, a speech concatenation module, and the segmental speech DB. A MOS test was carried out on a set of synthetic speech generated with 4 different segmental speech DBs, which were constructed by combining the CM1 (or CM2) tree clustering method with the full DB (or the reduced DB). Experimental results show that the proposed method can reduce the size of the speech DB by 23% while achieving a high MOS in the perception test. The proposed method can therefore be applied to build a small-sized TTS.

The Flattening Algorithm of Speech Spectrum by Quadrature Mirror Filter (QMF에 의한 음성스펙트럼의 평탄화 알고리즘)

  • Min, So-Yeon
    • Journal of the Korea Academia-Industrial cooperation Society / v.7 no.5 / pp.907-912 / 2006
  • Pre-emphasizing speech compensates for the falloff at high frequencies. The most common form of pre-emphasis is $y(n) = s(n) - A \cdot s(n-1)$, where A typically lies between 0.9 and 1.0 for voiced signals. This value reflects the degree of pre-emphasis and, in the conventional method, equals R(1)/R(0). This paper proposes a new flattening method that compensates for the high-frequency components weakened by vocal cord characteristics. We used a QMF (Quadrature Mirror Filter) to minimize the output signal distortion. After the QMF is used to compensate the high-frequency components, flattening by R(1)/R(0) is applied to each frame. Experimental results show that the proposed method flattens the weakened high-frequency components more effectively than the autocorrelation method. The flattening algorithm can therefore be applied to speech signal processing tasks such as speech recognition, speech analysis, and synthesis.
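
The conventional per-frame pre-emphasis described in this abstract can be sketched as follows. The sketch covers only the first-order filter with A = R(1)/R(0) and does not include the QMF stage of the proposed method. The function names and the toy frame are assumptions, and R(k) is taken to be the unnormalized frame autocorrelation.

```python
def autocorrelation(frame, lag):
    """Unnormalized autocorrelation R(lag) of one frame."""
    return sum(frame[n] * frame[n - lag] for n in range(lag, len(frame)))


def preemphasis_coefficient(frame):
    """A = R(1)/R(0); returns 0.0 for an all-zero frame."""
    r0 = autocorrelation(frame, 0)
    return autocorrelation(frame, 1) / r0 if r0 > 0.0 else 0.0


def preemphasize(frame, a):
    """y(n) = s(n) - a*s(n-1), taking s(-1) = 0 (an assumption at the frame edge)."""
    previous = 0.0
    out = []
    for sample in frame:
        out.append(sample - a * previous)
        previous = sample
    return out


frame = [1.0, 1.0, 1.0, 1.0]         # toy frame, not real speech
a = preemphasis_coefficient(frame)   # R(1)/R(0) = 3/4 for this frame
print(a, preemphasize(frame, a))
```

For a strongly low-pass (voiced-like) frame, R(1) is close to R(0), so A approaches 1 and the filter removes most of the slowly varying component.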


A Novel Approach for Blind Estimation of Reverberation Time using Gamma Distribution Model

  • Hamza, Amad;Jan, Tariqullah;Jehangir, Asiya;Shah, Waqar;Zafar, Haseeb;Asif, M.
    • Journal of Electrical Engineering and Technology / v.11 no.2 / pp.529-536 / 2016
  • In this paper we propose an unsupervised algorithm to estimate the reverberation time (RT) directly from the reverberant speech signal. For the estimation we use maximum likelihood estimation (MLE), a well-known, state-of-the-art estimation method in signal processing. All existing RT estimation methods are based on the decay rate distribution. The decay rate can be obtained either from the energy envelope decay curve of a noise source after it is switched off or from the decay curve of the impulse response of an enclosure. The proposed method builds on the analysis of an existing RT estimation method. In one state-of-the-art method, the reverberation decay is modeled as a Laplacian distribution. The proposed method instead models the reverberation decay as a Gamma distribution and incorporates an effective technique for spotting free decays in reverberant speech. MLE is then used to estimate the RT from the free decays. The method was motivated by our observation that when the RT of a reverberant signal falls in a specific range, the decay rate of the signal follows a Gamma distribution. Experiments are carried out on different reverberant speech signals to measure the accuracy of the proposed method. The experimental results show that the proposed method is more accurate than the state-of-the-art method.

Application of Preemphasis FIR Filtering To Speech Detection and Phoneme Segmentation (프리엠퍼시스 FIR 필터링의 음성 검출 및 음소 분할에의 응용)

  • Lee, Chang-Young
    • The Journal of the Korea institute of electronic communication sciences / v.8 no.5 / pp.665-670 / 2013
  • In this paper, we propose a new method of speech detection and phoneme segmentation. We investigate the effect of applying preemphasis FIR filtering to the speech signal before the usual speech detection, which uses the energy profile to discriminate signals from background noise. With this procedure, only the speech sections of low energy and frequency become distinct in the energy profile. It is verified experimentally that the filtering makes the silence/speech boundary sharper than the conventional method does. Applying this procedure is also found to make phoneme segmentation much easier.
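
The general idea of filtering before energy-based detection can be sketched as follows. This is a minimal sketch under stated assumptions, not the paper's procedure: the first-order filter coefficient `a`, the frame length, the threshold, and all function names are hypothetical choices made for the example.

```python
def preemphasis_fir(signal, a=0.95):
    """First-order FIR filter y(n) = s(n) - a*s(n-1), with s(-1) = 0."""
    return [s - a * p for s, p in zip(signal, [0.0] + list(signal[:-1]))]


def frame_energies(signal, frame_len):
    """Mean energy of consecutive non-overlapping frames."""
    return [
        sum(x * x for x in signal[i:i + frame_len]) / frame_len
        for i in range(0, len(signal) - frame_len + 1, frame_len)
    ]


def detect_speech(signal, frame_len=4, threshold=1.0, a=0.95):
    """Flag frames whose energy after preemphasis exceeds the threshold."""
    filtered = preemphasis_fir(signal, a)
    return [e > threshold for e in frame_energies(filtered, frame_len)]


# Toy input: silence, a burst of alternating samples, then silence.
signal = [0.0] * 8 + [1.0, -1.0] * 4 + [0.0] * 8
print(detect_speech(signal))
```

In practice the threshold would be set relative to an estimate of the background-noise energy rather than fixed.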

Adaptive Noise Canceller for Speech Enhancement Using 2-D Binary Mask (2차원 이진 마스크를 이용한 적응형 음성향상 잡음 제거기)

  • Lee, Gihyoun;Lee, Jyung Hyun;Cho, Jin-Ho;Kim, Myoung Nam
    • Journal of Korea Multimedia Society / v.19 no.7 / pp.1127-1136 / 2016
  • Speech enhancement algorithms play an important role in numerous speech signal processing applications. Over the last few decades, many speech enhancement algorithms have been studied, based on spectral subtraction, the Wiener filter, subspace methods, and so on. They enhance speech well, but their performance can deteriorate for specific noise types or in low-SNR environments. In this paper, a new speech enhancement algorithm based on an adaptive noise canceller is proposed. The proposed algorithm improves the performance of adaptive noise cancelling by using a 2-D binary mask. Objective experimental indices confirm that the proposed algorithm is useful and outperforms recently proposed speech enhancement algorithms.
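
The adaptive noise canceller that the proposed algorithm builds on can be sketched with a plain LMS filter. This is a generic textbook structure, not the paper's 2-D binary mask method. The tap count, the step size `mu`, the function name, and the synthetic signals are assumptions chosen for the example.

```python
import math


def lms_noise_canceller(primary, reference, num_taps=4, mu=0.05):
    """Return e(n) = primary(n) - w^T x(n), which serves as the enhanced output."""
    weights = [0.0] * num_taps
    history = [0.0] * num_taps          # most recent reference samples, newest first
    enhanced = []
    for d, r in zip(primary, reference):
        history = [r] + history[:-1]
        noise_estimate = sum(w * x for w, x in zip(weights, history))
        e = d - noise_estimate
        # LMS update: move the weights along the instantaneous error gradient.
        weights = [w + mu * e * x for w, x in zip(weights, history)]
        enhanced.append(e)
    return enhanced


# Toy check: no speech present, so the primary input is only a scaled copy
# of the reference noise and the residual should decay toward zero.
reference = [math.sin(0.3 * n) for n in range(2000)]
primary = [0.5 * r for r in reference]
residual = lms_noise_canceller(primary, reference)
print(max(abs(e) for e in residual[:50]), max(abs(e) for e in residual[-50:]))
```

When speech is present in the primary input and uncorrelated with the reference, the error signal retains the speech while the correlated noise is cancelled.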

A Comparative Study on the Pronunciations of Korean and Vietnamese on Korean Syllable Final Double Consonants (베트남인 한국어 학습자와 한국인의 한국어 겹받침 발음 비교 연구)

  • Jang, Kyungnam;You, Kwang-Bock
    • The Journal of the Convergence on Culture Technology / v.8 no.6 / pp.637-646 / 2022
  • This paper presents a comparative study of how Vietnamese learners and Koreans pronounce Korean syllable-final double consonants. The many errors and suggested teaching methods for the pronunciation of Korean syllable-final double consonants that have been investigated and analyzed in linguistic research were confirmed by the results of this study using the analysis tools of speech signal processing. On this basis, we suggest a new educational method. The pronunciation of Vietnamese learners and that of Koreans were compared using a support vector machine (SVM), which is widely used in machine learning. If a decision hyperplane of the SVM can be obtained, the Vietnamese learners' pronunciation of Korean syllable-final double consonants is quite different from that of Koreans; otherwise, the two are quite similar. The new teaching method presented in this paper consists not only of writing and listening but also includes material that can be visualized for learners, such as the time-domain speech signal waveform and its corresponding energy.

Performance analysis of speaker verification system adopting the ACHARF ANC (ACHARF ANC를 채용한 화자인증시스템의 성능분석)

  • Lee Hyun Seung;Choi Hong Sub;Shin Yoon Ki
    • Proceedings of the KSPS conference / 2002.11a / pp.179-182 / 2002
  • The development of noise-robust speech processing systems is becoming increasingly important as speech technology is widely applied in real-world applications. To address the noise problem, the adaptive noise canceller (ANC), which is based on adaptive filters, is frequently used. Adaptive recursive filters perform better than adaptive non-recursive filters because of the added poles, but their stability may be severely threatened. These problems of adaptive recursive filters were solved by the ACHARF algorithm. This paper presents a method that combines a speaker verification system with an ANC using the ACHARF algorithm. In the front-end stage, the ANC is adopted to suppress the additive noise imposed on the speech signal. The results show that the performance of the speaker verification system improves.


Robust Speech Recognition Using Independent Component Analysis (독립성분분석을 이용한 강인한 음성인식)

  • 임형규;이창기
    • Journal of the Korea Computer Industry Society / v.5 no.2 / pp.269-274 / 2004
  • Noisy speech recognition is one of the most important problems in speech recognition. In this paper, a method that efficiently removes noise mixed with speech is proposed. The proposed method uses independent component analysis (ICA) to separate the mixed noise. ICA is a signal processing technique whose goal is to express a set of random variables as linear combinations of components that are as statistically independent of each other as possible.
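
The linear-combination model behind ICA in this abstract can be written compactly. The symbols $x$, $s$, $A$, $W$, and $y$ are the usual textbook notation and are assumed here, not taken from the paper.

```latex
% Observed mixtures x (e.g., microphone signals) as linear combinations of sources s:
x = A\,s, \qquad x \in \mathbb{R}^{m},\; s \in \mathbb{R}^{n}
% ICA estimates an unmixing matrix W so that the outputs are as independent as possible:
y = W\,x \approx s \quad \text{(up to permutation and scaling of the components)}
```

In the noisy speech setting, one recovered component would ideally correspond to the speech and the others to the noise.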
