• Title/Summary/Keyword: speech distortion

Search Result 227, Processing Time 0.025 seconds

A Study on Real Time Pitch Alteration of Speech Signal (음성신호의 실시간 피치변경에 관한 연구)

  • 김종국;박형빈;배명진
    • The Journal of the Acoustical Society of Korea / v.23 no.1 / pp.82-89 / 2004
  • This paper describes how to reduce the effect of the occupation threshold by controlling the transformation of the mixture components of HMM parameters in a hierarchical tree structure, which prevents over-adaptation. To reduce correlations between data elements and to remove elements with low variance, we employ PCA (principal component analysis) and ICA (independent component analysis), which give as good a representation as possible and reduce the effect of over-adaptation. When the occupation threshold is lowered and the number of transformation functions is increased, the ordinary MLLR adaptation algorithm yields a lower recognition rate than the SI models, whereas the proposed MLLR adaptation algorithm improves the word recognition rate by over 2% compared with the SI models.

A Novel Covariance Matrix Estimation Method for MVDR Beamforming In Audio-Visual Communication Systems (오디오-비디오 통신 시스템에서 MVDR 빔 형성 기법을 위한 새로운 공분산 행렬 예측 방법)

  • You, Gyeong-Kuk;Yang, Jae-Mo;Lee, Jinkyu;Kang, Hong-Goo
    • The Journal of the Acoustical Society of Korea / v.33 no.5 / pp.326-334 / 2014
  • This paper proposes a novel covariance matrix estimation scheme for minimum variance distortionless response (MVDR) beamforming. By accurately tracking direction-of-sound source arrival (DoA) information using audio-visual sensors, the covariance matrix is efficiently estimated by adopting a variable forgetting factor. The variable forgetting factor is determined by considering signal-to-interference ratio (SIR). Experimental results verify that the performance of the proposed method is superior to that of the conventional one in terms of interference/noise reduction and speech distortion.
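
The recursive covariance update and the MVDR weights described in this abstract can be sketched as follows. This is a minimal illustration for a two-microphone array in pure Python, not the authors' implementation: the logistic mapping from SIR to forgetting factor, the function names, and the phase values are all assumptions made for the example. The abstract only states that the forgetting factor depends on the SIR.

```python
import cmath
import math

def forgetting_factor(sir_db, lam_min=0.90, lam_max=0.99):
    # Hypothetical mapping (an assumption, not from the paper): a low SIR gives
    # a smaller factor, so the covariance tracks the interference quickly.
    s = 1.0 / (1.0 + math.exp(-sir_db / 5.0))
    return lam_min + (lam_max - lam_min) * s

def update_covariance(R, x, lam):
    # Recursive estimate R <- lam * R + (1 - lam) * x x^H for a 2-mic snapshot x.
    return [[lam * R[i][j] + (1.0 - lam) * x[i] * x[j].conjugate()
             for j in range(2)] for i in range(2)]

def mvdr_weights(R, d):
    # w = R^{-1} d / (d^H R^{-1} d), using the closed-form 2x2 inverse.
    (a, b), (c, e) = R
    det = a * e - b * c
    r_inv_d = [(e * d[0] - b * d[1]) / det, (-c * d[0] + a * d[1]) / det]
    denom = d[0].conjugate() * r_inv_d[0] + d[1].conjugate() * r_inv_d[1]
    return [v / denom for v in r_inv_d]

def response(w, d):
    # Beamformer response w^H d toward a direction with steering vector d.
    return w[0].conjugate() * d[0] + w[1].conjugate() * d[1]

def steering_vector(phase):
    # Inter-microphone phase shift (radians) for a given arrival direction.
    return [1 + 0j, cmath.exp(-1j * phase)]

d_target = steering_vector(0.0)   # assumed target direction
d_interf = steering_vector(2.0)   # assumed interferer direction
R = [[1 + 0j, 0j], [0j, 1 + 0j]]  # start from an identity covariance
lam = forgetting_factor(sir_db=-10.0)
for _ in range(50):               # interference-dominated snapshots
    R = update_covariance(R, d_interf, lam)
w = mvdr_weights(R, d_target)
```

With these weights, `response(w, d_target)` stays at unity (the distortionless constraint), while `response(w, d_interf)` is strongly attenuated.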

A Weighted Feature Voting Approach for Robust and Real-Time Voice Activity Detection

  • Moattar, Mohammad Hossein;Homayounpour, Mohammad Mehdi
    • ETRI Journal / v.33 no.1 / pp.99-109 / 2011
  • This paper concerns a robust real-time voice activity detection (VAD) approach which is easy to understand and implement. The proposed approach employs several short-term speech/nonspeech discriminating features in a voting paradigm to achieve reliable performance in different environments. This paper mainly focuses on improving the performance of a recently proposed approach which uses spectral peak valley difference (SPVD) as a feature for silence detection. The main contribution of this paper is to apply a set of features together with SPVD to improve VAD robustness. The proposed approach uses a weighted voting scheme in order to take the discriminative power of the employed feature set into account. The experiments show that the proposed approach is more robust than the baseline approach from different points of view, including channel distortion and threshold selection. The proposed approach is also compared with some other VAD techniques to further confirm its achievements. Using the proposed weighted voting approach, the average VAD performance is increased to 89.29% for 5 different noise types and 8 SNR levels. The resulting performance is 13.79% higher than the approach based only on SPVD and 2.25% higher than the unweighted voting scheme.
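
The weighted voting idea in this abstract can be sketched as below. This is an illustrative sketch only: apart from SPVD, the feature names, thresholds, weights, and the 0.5 decision ratio are hypothetical values chosen for the example and are not taken from the paper.

```python
def weighted_vote(features, thresholds, weights, decision_ratio=0.5):
    """Return True (speech) when the weighted share of features voting
    'speech' exceeds decision_ratio. A feature votes 'speech' when its
    value is above its threshold (an assumption of this sketch)."""
    score = sum(w for name, w in weights.items()
                if features[name] > thresholds[name])
    return score / sum(weights.values()) > decision_ratio

# Hypothetical weights reflecting each feature's discriminative power.
weights = {"spvd": 0.5, "energy": 0.3, "periodicity": 0.2}
thresholds = {"spvd": 2.0, "energy": 0.1, "periodicity": 0.5}

speech_frame = {"spvd": 3.1, "energy": 0.4, "periodicity": 0.2}
noise_frame = {"spvd": 0.7, "energy": 0.05, "periodicity": 0.9}

is_speech = weighted_vote(speech_frame, thresholds, weights)
is_noise_speech = weighted_vote(noise_frame, thresholds, weights)
```

Setting all weights equal reduces this to the unweighted voting scheme that the abstract uses as a comparison point.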

A Study on Channel Mis-match Compensation Technique for Robust Speaker Verification System (강인한 화자확인 시스템을 위한 채널 불일치 보상 기법에 관한 연구)

  • 강철호;정희석
    • The Journal of the Acoustical Society of Korea / v.23 no.3 / pp.228-234 / 2004
  • In this paper, we propose a compensation technique that overcomes the limitations of conventional approaches by summing the bias terms between the world codebook and the individual codebook vectors of the feature parameters. However, unconditional mean compensation can increase false acceptance. Therefore, the proposed technique compensates for the channel mismatch with a weighted bias sum, using a nonlinear function of the distortion between speech and silence. The simulation results show that the FRR (false reject rate) decreased by 14.95% when the proposed algorithm was applied.

Effect of the Nasal Cavity Resonance on the Acoustic Characteristics of Korean Vowels (비강 공명이 한국어 모음에 미치는 음향학적 영향)

  • 성명훈;오승하;강명구;고태용;김광현;김진영
    • Journal of the Korean Society of Laryngology, Phoniatrics and Logopedics / v.4 no.1 / pp.24-32 / 1991
  • Cleft palate or velopharyngeal incompetence causes many disorders and disabilities affecting speech transmission, including distortion, substitution, and the nasalization of vowels. Nasalized vowels are produced primarily by lowering of the velum, which opens a side passage for air flow through the nasal cavity. These abnormal movements give rise to complex modification of the physical properties of the sound or of the sound spectrum. The authors employed the Sonagraph® as a sound analyzer in order to ascertain the features that characterize the nasalization of vowels. Twenty healthy Korean male adult volunteers were analyzed under artificial conditions of anterior and posterior nasal obstruction and velopharyngeal incompetence (VPI). The results were as follows: 1) Fundamental frequency was not changed by nasal obstruction or velopharyngeal incompetence. 2) There was no significant difference in formant intensity between normal and nasal vowels. 3) In VPI, a decrease in the frequency of $F_2$ was observed in the /e/ and /i/ vowels (p<0.001). 4) In VPI, $F_2$ was frequently missing in the /o/ and /u/ vowels. 5) In the consonant spectra of VPI, the 'release burst' was usually not observed.


The establishment of sending loudness rating for digital telephone using the input level of CODEC (코덱 입력레벨을 이용한 디지털 전화기의 송화음량정격 설계)

  • 홍진우;장대영
    • The Journal of Korean Institute of Communications and Information Sciences / v.21 no.2 / pp.326-332 / 1996
  • In this paper, a method to design the sending loudness rating (SLR) is proposed, and the desirable transmission characteristics are considered in order to specify the transmission quality, based on loudness ratings, for the digital telephone system, which is a terminal for digital speech communication. To specify the desirable SLR for a digital telephone system, a subjective test defining the preferred range of input level for the CODEC was performed. From the test results, it was identified that the optimal input level for the CODEC is -15 dB and that the range that causes neither quantization noise nor CODEC distortion falls between -12 dB and -18 dB.


Reconstruction of a small defect of the lower vermilion adjacent to white roll using a modified O-Z flap

  • Kim, Hong Il;Kim, Ho Sung;Park, Jin Hyung;Yi, Hyung Suk;Kim, Yoon Soo;Kim, Hyo Young
    • Archives of Craniofacial Surgery / v.22 no.3 / pp.164-167 / 2021
  • Reconstruction of lip defects is important because the lips play an important role in maintaining aesthetic facial balance, facial expressions, and speech. There are various methods of lip reconstruction such as primary repair, skin grafting, and utilization of local and free flaps. It is important to select a proper reconstruction method according to the size and location of lip defect. Failure to select an appropriate method may result in distortion, color mismatch, sensory loss, and aesthetic imbalance. Herein we present a case of successful aesthetic reconstruction of the lower vermilion. We removed a venous malformation, which was limited to the lower vermilion and adjacent to the white roll, and repaired the defect using the modified O-Z flap.

Front-End Processing for Speech Recognition in the Telephone Network (전화망에서의 음성인식을 위한 전처리 연구)

  • Jun, Won-Suk;Shin, Won-Ho;Yang, Tae-Young;Kim, Weon-Goo;Youn, Dae-Hee
    • The Journal of the Acoustical Society of Korea / v.16 no.4 / pp.57-63 / 1997
  • In this paper, we study efficient feature vector extraction and front-end processing methods to improve the performance of a speech recognition system, using the KT (Korea Telecommunication) database collected through various telephone channels. First, we compare the recognition performance of feature vectors known to be robust to noise and environmental variation, and verify the performance gain obtained with weighted cepstral distance measures. The experimental results show that the recognition rate increases with both PLP (Perceptual Linear Prediction) and MFCC (Mel Frequency Cepstral Coefficient) features compared with the LPC cepstrum used in the KT recognition system. Among cepstral distance measures, weighted functions such as RPS (Root Power Sums) and BPL (Band-Pass Lifter) improve recognition. Applying the spectral subtraction method decreases the recognition rate because of the distortion it introduces. However, RASTA (RelAtive SpecTrAl) processing, CMS (Cepstral Mean Subtraction), and SBR (Signal Bias Removal) enhance recognition performance. In particular, the CMS method is simple yet yields a large improvement. Finally, modified methods for the real-time implementation of CMS are compared, and an improved method is suggested to prevent performance degradation.
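
As a brief illustration of why CMS is described as simple, the basic per-utterance form can be sketched as follows. This is a minimal sketch assuming the textbook definition of cepstral mean subtraction. It does not reproduce the real-time variants compared in the paper, and the feature values are made up for the example.

```python
def cepstral_mean_subtraction(frames):
    """Subtract the per-utterance mean from every cepstral coefficient.
    `frames` is a list of equal-length cepstral vectors (one per frame)."""
    n = len(frames)
    dim = len(frames[0])
    mean = [sum(f[k] for f in frames) / n for k in range(dim)]
    return [[f[k] - mean[k] for k in range(dim)] for f in frames]

# A stationary channel adds a constant offset in the cepstral domain.
clean = [[1.0, 2.0], [3.0, 0.0], [2.0, 4.0]]
channel = [0.5, -1.5]  # hypothetical channel offset
distorted = [[c + h for c, h in zip(f, channel)] for f in clean]

norm_clean = cepstral_mean_subtraction(clean)
norm_distorted = cepstral_mean_subtraction(distorted)
```

After normalization, `norm_clean` and `norm_distorted` match up to rounding, because the constant channel offset is absorbed into the subtracted mean.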


A Method For Improvement Of Split Vector Quantization Of The ISF Parameters Using Adaptive Extended Codebook (적응적인 확장된 코드북을 이용한 분할 벡터 양자화기 구조의 ISF 양자화기 개선)

  • Lim, Jong-Ha;Jeong, Gyu-Hyeok;Hong, Gi-Bong;Lee, In-Sung
    • The Journal of the Acoustical Society of Korea / v.30 no.1 / pp.1-8 / 2011
  • This paper presents a method for improving the performance of the ISF coefficient quantizer by compensating for a defect of split vector quantization using the ordering property of ISF coefficients, and designs an ISF coefficient quantizer for a wideband speech codec using the proposed method. To reduce complexity and codebook size, the wideband speech codec uses a split vector quantizer, which cannot fully exploit the correlation between ISF coefficients. The proposed algorithm uses the ordering property of ISF coefficients to overcome this defect. Using the ordering property, the codebook redundancy can be identified. The redundant part of the codebook is replaced by an adaptive-extended codebook, built using the ordering property, ISF coefficient prediction, and interpolation of the existing codebook, to improve quantizer performance. As a result, the adaptive-extended codebook algorithm gains about 2 bits over the existing split-structure ISF quantizer of AMR-WB (G.722.2) in terms of spectral distortion.
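
One way to read the codebook redundancy mentioned here is sketched below. It assumes that ISF coefficients are strictly increasing within a frame, so that any code vector in a later split that starts at or below the last quantized ISF of the previous split can never be selected. The function name, the frequency values, and the codebook are hypothetical, and the paper's construction of the adaptive-extended codebook is not reproduced.

```python
def split_by_ordering(codebook, prev_last_isf):
    """Partition code vector indices of a later split into usable and
    redundant ones, given the last quantized ISF of the previous split."""
    usable = [i for i, vec in enumerate(codebook) if vec[0] > prev_last_isf]
    redundant = [i for i, vec in enumerate(codebook) if vec[0] <= prev_last_isf]
    return usable, redundant

# Hypothetical second-split codebook (ISF values in Hz).
codebook = [
    [900.0, 1200.0],
    [1100.0, 1500.0],
    [1400.0, 1800.0],
    [1700.0, 2100.0],
]
usable, redundant = split_by_ordering(codebook, prev_last_isf=1150.0)
```

In this toy case, half of the indices are redundant for the given frame, which is the kind of unused capacity the abstract proposes to reassign to additional code vectors.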

Analysis on Filter Bubble reinforcement of SNS recommendation algorithm identified in the Russia-Ukraine war (러시아-우크라이나 전쟁에서 파악된 SNS 추천알고리즘의 필터버블 강화현상 분석)

  • CHUN, Sang-Hun;CHOI, Seo-Yeon;SHIN, Seong-Joong
    • The Journal of the Institute of Internet, Broadcasting and Communication / v.22 no.3 / pp.25-30 / 2022
  • This study examines the filter bubble reinforcement phenomenon of SNS recommendation algorithms such as YouTube's, which characterizes the Russia-Ukraine war (2022), and the factors that decide victory or defeat in a hybrid war. This war is identified as a hybrid war, and the use of new media based on SNS recommendation algorithms is emerging as a factor that determines the outcome of the war beyond political leverage. For this reason, the filter bubble phenomenon goes beyond the dictionary meaning of confirmation bias that limits the information exposed to viewers. A YouTube video of Ukrainian President Zelensky encouraging protests in Kyiv garnered 7.02 million views, but Putin's speech only 800,000, which is evidence that his speech was not surfaced by the recommendation algorithm. The war of these SNS recommendation algorithms tends to develop into an algorithm war between US (YouTube, Twitter, Facebook) and Chinese (TikTok) big tech companies. Under the influence of US companies, Ukraine is now able to receive international support, while in Russia, under the influence of Chinese companies, Putin's approval rating is over 80%, producing contrasting outcomes. Since this algorithmic empowerment is based on the confirmation bias of public opinion created by the 'filter bubble', the Russia-Ukraine war is drawing attention to the argument that new guidelines for this distortion phenomenon should be presented soon.