• Title/Summary/Keyword: speech parameter

Modified AWSSDR method for frequency-dependent reverberation time estimation (주파수 대역별 잔향시간 추정을 위한 변형된 AWSSDR 방식)

  • Min Sik Kim;Hyung Soon Kim
    • Phonetics and Speech Sciences / v.15 no.4 / pp.91-100 / 2023
  • Reverberation time (T60) is a typical acoustic parameter that provides information about reverberation. Since the impacts of reverberation vary depending on the frequency bands even in the same space, frequency-dependent (FD) T60, which offers detailed insights into the acoustic environments, can be useful. However, most conventional blind T60 estimation methods, which estimate the T60 from speech signals, focus on fullband T60 estimation, and a few blind FDT60 estimation methods commonly show poor performance in the low-frequency bands. This paper introduces a modified approach based on Attentive pooling based Weighted Sum of Spectral Decay Rates (AWSSDR), previously proposed for blind T60 estimation, by extending its target from fullband T60 to FDT60. The experimental results show that the proposed method outperforms conventional blind FDT60 estimation methods on the acoustic characterization of environments (ACE) challenge evaluation dataset. Notably, it consistently exhibits excellent estimation performance in all frequency bands. This demonstrates that the mechanism of the AWSSDR method is valuable for blind FDT60 estimation because it reflects the FD variations in the impact of reverberation, aggregating information about FDT60 from the speech signal by processing the spectral decay rates associated with the physical properties of reverberation in each frequency band.
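
The link between a spectral decay rate and T60 mentioned in the abstract above can be illustrated with a small sketch. This is not the AWSSDR method (its attentive pooling and weighting are not reproduced here); it only shows the physical relation that a band whose level falls at a constant rate of r dB/s has T60 = 60 / |r|. The function names and the synthetic band energies are hypothetical.

```python
import math

def decay_rate_db_per_s(band_energies, frame_rate_hz):
    """Least-squares slope, in dB/s, of 10*log10(energy) over time."""
    times = [i / frame_rate_hz for i in range(len(band_energies))]
    levels = [10.0 * math.log10(e) for e in band_energies]
    mean_t = sum(times) / len(times)
    mean_l = sum(levels) / len(levels)
    num = sum((t - mean_t) * (l - mean_l) for t, l in zip(times, levels))
    den = sum((t - mean_t) ** 2 for t in times)
    return num / den

def t60_from_decay_rate(rate_db_per_s):
    """T60 is the time needed for a 60 dB drop at the given (negative) slope."""
    if rate_db_per_s >= 0:
        raise ValueError("a decaying band must have a negative slope")
    return -60.0 / rate_db_per_s

# Synthetic check (assumed data): one band whose energy falls by 60 dB in 0.5 s.
true_t60 = 0.5
frame_rate = 100.0
energies = [10.0 ** (-6.0 * (i / frame_rate) / true_t60) for i in range(50)]
estimated_t60 = t60_from_decay_rate(decay_rate_db_per_s(energies, frame_rate))
print(estimated_t60)
```

Running the same fit separately on each frequency band would give one decay rate, and hence one T60 value, per band.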

A Study on Wavelet Application for Signal Analysis (신호 해석을 위한 웨이브렛 응용에 관한 연구)

  • Bae, Sang-Bum;Ryu, Ji-Goo;Kim, Nam-Ho
    • Proceedings of the Korea Institute of Convergence Signal Processing / 2005.11a / pp.302-305 / 2005
  • Many signal analysis methods have been proposed recently, the most representative being the Fourier transform and the wavelet transform. The Fourier transform represents a signal as a combination of cosines and sines at all locations in the frequency domain. However, it provides no information about when a particular frequency occurs in the signal and depends only on the signal's global features. To address these limitations, the wavelet transform, which is capable of multiresolution analysis, has been applied in many fields such as speech processing, image processing, and computer vision. The wavelet transform uses a window that changes with the scale parameter and therefore provides time-frequency localization. In this paper, we propose a new approach that uses a wavelet of cosine and sine type and analyze the features of a signal at a limited point of the frequency-time plane.
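
The idea of measuring a signal at a single point of the time-frequency plane can be sketched as follows. The abstract does not specify the authors' wavelet, so this example assumes a Gaussian-windowed cosine/sine pair (a Gabor or Morlet-style atom); the function name, the window width, and the test signal are all hypothetical.

```python
import cmath
import math

def local_coefficient(signal, sample_rate, center_time, freq_hz, width_s):
    """Magnitude of the inner product between the signal and a
    Gaussian-windowed cosine/sine pair centred at (center_time, freq_hz)."""
    total = 0j
    for n, x in enumerate(signal):
        t = n / sample_rate - center_time
        window = math.exp(-0.5 * (t / width_s) ** 2)
        # exp(-j*2*pi*f*t) carries the cosine (real) and sine (imaginary) parts.
        total += x * window * cmath.exp(-2j * math.pi * freq_hz * t)
    return abs(total) / sample_rate

# Assumed test signal: 50 Hz during the first half second, 120 Hz afterwards.
sample_rate = 1000
signal = [
    math.cos(2 * math.pi * (50 if n < 500 else 120) * n / sample_rate)
    for n in range(1000)
]
early_50 = local_coefficient(signal, sample_rate, 0.25, 50, 0.05)
early_120 = local_coefficient(signal, sample_rate, 0.25, 120, 0.05)
late_120 = local_coefficient(signal, sample_rate, 0.75, 120, 0.05)
print(early_50, early_120, late_120)
```

A plain Fourier magnitude would report both 50 Hz and 120 Hz without saying when each occurs; the windowed coefficient is large only where the frequency is actually present.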

Robust Speaker Identification using Independent Component Analysis (독립성분 분석을 이용한 강인한 화자식별)

  • Jang, Gil-Jin;Oh, Yung-Hwan
    • Journal of KIISE:Software and Applications / v.27 no.5 / pp.583-592 / 2000
  • This paper proposes a feature parameter transformation method using independent component analysis (ICA) for speaker identification. The proposed method assumes that the cepstral vectors from speech under various channel conditions are constructed as a linear combination of some characteristic functions with random channel noise added, and transforms them into new vectors using ICA. The resulting vector space can emphasize the repetitive speaker information and suppress the random channel distortions. Experimental results show that the transformation method is effective in improving speaker identification systems.

A Study on Annoyance of Interior Noise on Town-Bus

  • Park, Hyung Woo;Kim, Sung Han
    • International Journal of Internet, Broadcasting and Communication / v.9 no.2 / pp.42-47 / 2017
  • Cities are growing in size and their functions are becoming more complex, and many people live in them. Urban life brings people into closer contact with their neighbors in many respects. In particular, people are exposed to noise while using public transportation, often without being aware of it. Seoul is the most crowded place in Korea, and village buses serve its narrow streets. People who use these buses want a comfortable ride, good air quality, and low noise inside the vehicle. In this paper, we determine the degree of annoyance caused by the noise inside the town bus on a dB scale and examine how the noise affects annoyance. Interior noise did not differ greatly between new and old vehicles. Annoyance, however, was confirmed to differ according to the skill of the bus driver.

Parameter Generation Algorithm for LSTM-RNN-based Speech Synthesis (LSTM-RNN 기반 음성합성을 위한 파라미터 생성 알고리즘)

  • Park, Sangjun;Hahn, Minsoo
    • Proceedings of the Korean Society of Broadcast Engineers Conference / 2017.06a / pp.105-106 / 2017
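  • This paper proposes a method that applies a maximum-likelihood parameter generation algorithm to improve the accuracy and naturalness of the acoustic parameter sequence output by an artificial neural network. Both static and dynamic feature vectors are used as network outputs, and precomputed parameter variances are used in parameter generation. The maximum-likelihood parameters can be estimated by applying the means and variances of the estimated static and dynamic feature vectors to the EM algorithm. By using the dynamic feature vectors and variances together during parameter generation, the proposed algorithm improves naturalness along the time axis. MCD and F0 RMSE were measured for objective evaluation, and a preference test was conducted for subjective evaluation. The results verify that both objective and subjective performance improve over the conventional algorithm.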
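
How static and dynamic features jointly constrain a generated trajectory can be sketched with a minimal example. This is not the authors' algorithm: it is the standard closed-form maximum-likelihood solution that applies when per-frame means and variances are fixed, shown for a single feature dimension. The delta window 0.5 * (c[t+1] - c[t-1]) with clamped edges, and all function names, are assumptions made for illustration.

```python
def solve_linear(A, b):
    """Gaussian elimination with partial pivoting for a small dense system."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for k in range(col, n + 1):
                M[r][k] -= factor * M[col][k]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        rest = sum(M[i][k] * x[k] for k in range(i + 1, n))
        x[i] = (M[i][n] - rest) / M[i][i]
    return x

def generate_trajectory(static_mean, delta_mean, static_var, delta_var):
    """Solve (W' P W) c = W' P mu, where W stacks the static and delta
    windows and P holds the inverse variances (precisions)."""
    T = len(static_mean)
    rows = []  # each row: (window coefficients, target mean, precision)
    for t in range(T):
        rows.append(({t: 1.0}, static_mean[t], 1.0 / static_var[t]))
        delta = {}
        nxt, prev = min(t + 1, T - 1), max(t - 1, 0)  # clamp at the edges
        delta[nxt] = delta.get(nxt, 0.0) + 0.5
        delta[prev] = delta.get(prev, 0.0) - 0.5
        rows.append((delta, delta_mean[t], 1.0 / delta_var[t]))
    A = [[0.0] * T for _ in range(T)]
    b = [0.0] * T
    for coeffs, mean, precision in rows:
        for i, wi in coeffs.items():
            b[i] += wi * precision * mean
            for j, wj in coeffs.items():
                A[i][j] += wi * precision * wj
    return solve_linear(A, b)

# Assumed example: a step in the static means, with delta means asking for
# no change; the generated trajectory trades the two off.
smoothed = generate_trajectory([0, 0, 1, 1], [0, 0, 0, 0], [1] * 4, [1] * 4)
print(smoothed)
```

Shrinking the delta variances pulls the result toward a flatter trajectory, while shrinking the static variances pulls it back toward the frame-wise means.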

A New Distance Measure for a Variable-Sized Acoustic Model Based on MDL Technique

  • Cho, Hoon-Young;Kim, Sang-Hun
    • ETRI Journal / v.32 no.5 / pp.795-800 / 2010
  • Embedding a large vocabulary speech recognition system in mobile devices requires a reduced acoustic model obtained by eliminating redundant model parameters. In conventional optimization methods based on the minimum description length (MDL) criterion, a binary Gaussian tree is built at each state of a hidden Markov model by iteratively finding and merging similar mixture components. An optimal subset of the tree nodes is then selected to generate a downsized acoustic model. To obtain a better binary Gaussian tree by improving the process of finding the most similar Gaussian components, this paper proposes a new distance measure that exploits the difference in likelihood values for cases before and after two components are combined. The mixture weight of Gaussian components is also introduced in the component merging step. Experimental results show that the proposed method outperforms MDL-based optimization using either a Kullback-Leibler (KL) divergence or weighted KL divergence measure. The proposed method could also reduce the acoustic model size by 50% with less than a 1.5% increase in error rate compared to a baseline system.
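
The step of finding and merging similar mixture components can be sketched for one-dimensional Gaussians. The abstract does not give the formula of the proposed likelihood-difference measure, so this example shows only the symmetric KL divergence named as a baseline, together with a moment-matched merge that uses the mixture weights. All function names and component values are hypothetical.

```python
import math

def kl_gauss(m1, v1, m2, v2):
    """KL(N(m1, v1) || N(m2, v2)) for 1-D Gaussians with variances v1, v2."""
    return 0.5 * (math.log(v2 / v1) + (v1 + (m1 - m2) ** 2) / v2 - 1.0)

def symmetric_kl(a, b):
    """Symmetrised KL distance between two (weight, mean, var) components."""
    return kl_gauss(a[1], a[2], b[1], b[2]) + kl_gauss(b[1], b[2], a[1], a[2])

def merge(a, b):
    """Moment-matched merge: preserves total weight, mean, and variance."""
    w = a[0] + b[0]
    m = (a[0] * a[1] + b[0] * b[1]) / w
    v = (a[0] * (a[2] + (a[1] - m) ** 2) + b[0] * (b[2] + (b[1] - m) ** 2)) / w
    return (w, m, v)

def closest_pair(components):
    """Indices of the two components with the smallest symmetric KL distance."""
    pairs = [
        (i, j)
        for i in range(len(components))
        for j in range(i + 1, len(components))
    ]
    return min(pairs, key=lambda p: symmetric_kl(components[p[0]], components[p[1]]))

# Assumed mixture of three (weight, mean, variance) components.
mixture = [(0.3, 0.0, 1.0), (0.3, 0.1, 1.0), (0.4, 5.0, 1.0)]
i, j = closest_pair(mixture)
merged = merge(mixture[i], mixture[j])
print((i, j), merged)
```

Repeating the find-and-merge step until one component remains yields a binary tree of Gaussians per state, from which a smaller subset of nodes can then be selected.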

Performance Improvement of Classification Between Pathological and Normal Voice Using HOS Parameter (HOS 특징 벡터를 이용한 장애 음성 분류 성능의 향상)

  • Lee, Ji-Yeoun;Jeong, Sang-Bae;Choi, Hong-Shik;Hahn, Min-Soo
    • MALSORI / no.66 / pp.61-72 / 2008
  • This paper proposes a method to improve pathological and normal voice classification performance by combining multiple features such as auditory-based and higher-order features. Their performance is measured using Gaussian mixture models (GMMs) and linear discriminant analysis (LDA). The combination of multiple features with the frame-based LDA method is shown to be effective for pathological and normal voice classification, with an 87.0% classification rate. This is a noticeable improvement of 17.72% compared to the MFCC-based GMM algorithm in terms of error reduction.

Reduction of Number of Free Parameters in Segmental-feature HMM (분절 특징 HMM의 매개 변수 수의 감소에 관한 연구)

  • 윤영선;오영환
    • The Journal of the Acoustical Society of Korea / v.19 no.7 / pp.48-52 / 2000
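  • The segmental-feature HMM, which uses segmental features to improve the hidden Markov model (HMM) widely used in speech recognition, has been reported to perform well. However, as the segment length increases and the regression order becomes higher, the number of parameters needed to represent the segmental-feature HMM also increases. This study therefore attempts to reduce the number of parameters without degrading performance by means of a fixed-variance method, in which the variance of the segments observable in a state is shared by all frames within the segment. Experimental results show that, with two mixture densities, there is almost no difference in performance between the segmental-feature HMM using the fixed variance and the one using a time-varying variance, which demonstrates the validity of the proposed method.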

Sensibility Classification Algorithm of EEGs using Multi-template Method (다중 템플릿 방법을 이용한 뇌파의 감성 분류 알고리즘)

  • Kim Dong-Jun
    • The Transactions of the Korean Institute of Electrical Engineers D / v.53 no.12 / pp.834-838 / 2004
  • This paper proposes an algorithm for EEG pattern classification using the multi-template method, which is a kind of speaker adaptation method used in speech signal processing. 10-channel EEG signals are collected in various environments. The linear prediction coefficients of the EEGs are extracted as the feature parameters of human sensibility. The human sensibility classification algorithm is developed using neural networks. Using EEGs recorded on comfortable and uncomfortable seats, the proposed algorithm achieved a classification performance of about 75% in the subject-independent test. In the tests using EEG signals recorded under room temperature and humidity variations, the proposed algorithm tracked changes in pleasantness well, and the subject-independent tests performed similarly to the subject-dependent ones.
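
Extracting linear prediction coefficients as features can be sketched with the standard autocorrelation method and the Levinson-Durbin recursion. The abstract does not state the prediction order, windowing, or frame length used for the EEGs, so the order and the test signal below are assumptions, and the function names are hypothetical.

```python
def autocorrelation(x, max_lag):
    """Autocorrelation r[0..max_lag] of a finite frame (no normalisation)."""
    return [
        sum(x[n] * x[n - lag] for n in range(lag, len(x)))
        for lag in range(max_lag + 1)
    ]

def levinson_durbin(r, order):
    """Predictor coefficients a[1..order] such that x[n] is approximated by
    sum(a[k] * x[n - k]), plus the final prediction error energy."""
    a = [0.0] * (order + 1)
    error = r[0]
    for i in range(1, order + 1):
        acc = r[i] - sum(a[k] * r[i - k] for k in range(1, i))
        reflection = acc / error
        updated = a[:]
        updated[i] = reflection
        for k in range(1, i):
            updated[k] = a[k] - reflection * a[i - k]
        a = updated
        error *= 1.0 - reflection * reflection
    return a[1:], error

# Assumed test frame: the decaying sequence 0.9**n, which a first-order
# predictor with coefficient 0.9 describes exactly.
frame = [0.9 ** n for n in range(200)]
coefficients, residual = levinson_durbin(autocorrelation(frame, 2), 2)
print(coefficients, residual)
```

In a classifier of the kind described above, the coefficient vector computed per channel and per frame would serve as the input feature vector.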

Comparative Study on the Acoustic Characteristics of the Korean Vowel /a/ before and after LMS (후두미세수술 전후 /아/의 음향적 특성 비교)

  • Hwang, Yeon-Sin;Seong, Cheol-Jae
    • MALSORI / no.67 / pp.33-60 / 2008
  • The aim of this study is to show the differences in acoustic parameters between a pathological voice /a/ caused by a vocal polyp and a normal voice /a/ produced after LMS (Laryngeal Microscopic Surgery). It was expected that the two kinds of voices could be analyzed more effectively in terms of HNR in specific frequency bands than in all frequency bands. For this study, the voices of 10 patients were recorded before and after LMS and were then manipulated in terms of four acoustic parameters. It was found that (a) frequency bands of 500Hz in the range of 1,000Hz to 4,000Hz were very useful for obtaining HNR values; (b) frequency bands in the range of 1,248Hz to 5,500Hz on a log scale were very useful for obtaining HNR values; (c) F0 dropped after LMS but not significantly; (d) the bandwidth of the second formant (B2) decreased significantly after LMS, while that of the first formant (B1) decreased after LMS but not significantly.
