Search | Korea Science

Performance comparison evaluation of real and complex networks for deep neural network-based speech enhancement in the frequency domain (주파수 영역 심층 신경망 기반 음성 향상을 위한 실수 네트워크와 복소 네트워크 성능 비교 평가)

Hwang, Seo-Rim;Park, Sung Wook;Park, Youngcheol
The Journal of the Acoustical Society of Korea, v.41 no.1, pp.30-37, 2022
This paper compares and evaluates model performance from two perspectives, the learning target and the network structure, for training Deep Neural Network (DNN)-based speech enhancement models in the frequency domain. Spectral mapping and Time-Frequency (T-F) masking were used as learning targets, and a real network and a complex network were used as network structures. The performance of the speech enhancement models was evaluated at different dataset scales using two objective metrics: Perceptual Evaluation of Speech Quality (PESQ) and Short-Time Objective Intelligibility (STOI). Test results show that the appropriate size of the training data differs depending on the type of network and the type of dataset. In addition, because the real network performs comparatively better than the complex network in some cases, depending on the data size and the learning target, a real network may be the more practical choice when the total number of parameters is considered.
https://doi.org/10.7776/ASK.2022.41.1.030
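The two masking-style learning targets compared above can be illustrated on a single STFT frame. This is a minimal sketch, assuming each time-frequency bin is a Python complex number and that ideal (oracle) masks are computed from known clean speech; the function names are hypothetical and nothing here reproduces the paper's networks. A real-valued magnitude mask rescales each noisy bin but keeps its phase, whereas a complex ratio mask can correct both magnitude and phase. With spectral mapping, a network would instead output the enhanced spectrum directly, so no mask product is involved.

```python
# Hypothetical sketch: one STFT frame, each T-F bin a Python complex number.
# Illustrates real vs. complex masking targets only, not the paper's models.

def magnitude_mask(clean, noisy, cap=1.0):
    """Real-valued T-F mask |S|/|Y| per bin, capped at `cap` (assumed 1.0).

    With the cap, bins where |S| > |Y| are not fully restored.
    """
    return [min(abs(s) / abs(y), cap) if abs(y) > 0 else 0.0
            for s, y in zip(clean, noisy)]

def complex_ratio_mask(clean, noisy):
    """Complex-valued T-F mask S/Y per bin (corrects magnitude and phase)."""
    return [s / y if y != 0 else 0j for s, y in zip(clean, noisy)]

def apply_mask(mask, noisy):
    """Element-wise product; a real mask leaves the noisy phase unchanged."""
    return [m * y for m, y in zip(mask, noisy)]

if __name__ == "__main__":
    clean = [1 + 1j, 0.5j]
    noisy = [2 + 0j, 1 + 0.5j]
    real_est = apply_mask(magnitude_mask(clean, noisy), noisy)
    cplx_est = apply_mask(complex_ratio_mask(clean, noisy), noisy)
    print([round(abs(e), 3) for e in real_est])  # magnitudes match |clean|
    print(cplx_est)                              # approximately equal to clean
```

In this toy frame the magnitude-masked estimate matches the clean magnitudes but keeps the noisy phase, which is the limitation complex networks and complex masks aim to remove.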

Complex nested U-Net-based speech enhancement model using a dual-branch decoder (이중 분기 디코더를 사용하는 복소 중첩 U-Net 기반 음성 향상 모델)

Seorim Hwang;Sung Wook Park;Youngcheol Park
The Journal of the Acoustical Society of Korea, v.43 no.2, pp.253-259, 2024
This paper proposes a new speech enhancement model based on a complex nested U-Net with a dual-branch decoder. The proposed model uses a complex nested U-Net to estimate the magnitude and phase components of the speech signal simultaneously, and its decoder performs spectral mapping and time-frequency masking in separate branches. Compared with a single-branch decoder, the dual-branch structure removes noise effectively while minimizing the loss of speech information. Experiments were conducted on the VoiceBank + DEMAND database, which is commonly used for training speech enhancement models, and the model was evaluated with various objective metrics. The proposed model improved the Perceptual Evaluation of Speech Quality (PESQ) score by about 0.13 over the baseline and achieved higher objective scores than recently proposed speech enhancement models.
https://doi.org/10.7776/ASK.2024.43.2.253

The Effect of An Increase of Closed Quotient on Improvement of Voice Quality after Type I Thyroplasty in Patients with Unilateral Vocal Cord Paralysis (일측 성대마비 환자에서 성대내전술 후 성대접촉율의 증가가 음질 개선에 미치는 영향)

Kim, Han-Su;Choi, Seung-Hee;Lim, Jae-Yol;Choi, Hong-Shik
Journal of the Korean Society of Laryngology, Phoniatrics and Logopedics, v.15 no.1, pp.16-20, 2004
Purpose: To assess perceptual, acoustic, and aerodynamic measures of voice quality in patients with unilateral vocal cord paralysis before and after type I thyroplasty. Methods: The clinical records of patients who underwent type I thyroplasty in the Department of Otorhinolaryngology, Yongdong Severance Hospital, from November 2001 to November 2003 were reviewed. All patients underwent a vocal function evaluation including perceptual, acoustic, and aerodynamic measures of voice preoperatively and on the 60th postoperative day. The perceptual and acoustic measures were obtained from recordings of patients reading the 'Sanchak' passage. The perceptual evaluation was performed by two speech pathologists using a 4-point rating scale. Acoustic parameters (voice range profile low (RAL), voice range profile high (RAH), average fundamental frequency (AFx), closed quotient, harmonic-to-noise ratio, jitter, and shimmer) were measured with Lx Speech Studio. Mean flow rate (MFR), subglottic pressure (Psub), and intensity were measured using the Phonatory Function Analyzer. The maximum phonation time (MPT) was also measured. A paired t-test (p < 0.1) was used to compare preoperative and postoperative results, and multiple regression analysis was used to identify the parameters most correlated with improvement in postoperative voice quality. Results: Among aerodynamic parameters, Psub (88.11 → 58.7 mmH₂O), MPT (7.87 → 12.53 s), and MFR (359.8 → 161.06 ml/s) improved with statistical significance. Among acoustic parameters, AFx (205.5 → 163.27 Hz), AQx (23.9% → 48.3%), RAL, RAH, jitter, and shimmer improved. In the multiple regression analysis, AFx and AQx were the two parameters most correlated with improvement in postoperative breathiness, whereas the general grade of voice quality was more correlated with Psub and shimmer. Conclusion: Vocal fold medialization procedures effectively reduce the glottic gap. Increasing the contact area of both vocal folds improved aerodynamic parameters and stabilized vocal fold vibration. This in turn improved acoustic parameters (shimmer, jitter, signal-to-noise ratio, voice range profile) and voice quality.
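The preoperative/postoperative comparison above relies on a paired t-test. Below is a minimal standard-library sketch of the statistic, assuming one measurement per patient before and one after surgery; the input values are hypothetical toy numbers, not the study's data, and the function name is illustrative. Turning t into a p-value requires the t distribution with n - 1 degrees of freedom, which the Python standard library does not provide; `scipy.stats.ttest_rel` computes both the statistic and the p-value.

```python
import math
import statistics

def paired_t(pre, post):
    """Paired t statistic and degrees of freedom for the differences post - pre."""
    if len(pre) != len(post) or len(pre) < 2:
        raise ValueError("need two equal-length samples with n >= 2")
    diffs = [b - a for a, b in zip(pre, post)]
    n = len(diffs)
    mean_d = statistics.mean(diffs)
    sd_d = statistics.stdev(diffs)  # sample standard deviation (n - 1 denominator)
    if sd_d == 0:
        raise ValueError("all differences are identical; t is undefined")
    t = mean_d / (sd_d / math.sqrt(n))
    return t, n - 1

if __name__ == "__main__":
    # Hypothetical toy values, NOT the study's patient measurements.
    pre = [1.0, 2.0, 3.0]
    post = [2.0, 4.0, 5.0]
    print(paired_t(pre, post))
```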

A Hybrid Image Coding Using BTC and DPCM with Performance Evaluation (BTC와 DPCM을 결합한 영상신호의 복합 부호화와 성능평가)

고형화;이충웅
Journal of the Korean Institute of Telematics and Electronics, v.25 no.4, pp.447-452, 1988
This paper proposes a hybrid image coding scheme that improves coding performance by combining BTC with DPCM. In addition, a new objective image quality evaluation method that exploits human perceptual characteristics is proposed; its results agree well with subjective quality evaluation. The hyb-1 method, which combines DPCM and AMBTC, retained good picture quality at a bit rate of 1.5 bits/pel. The hyb-3 method, which combines EBTC-3 with DPCM, showed hardly any degradation in picture quality relative to the original image at a bit rate of 2.1 bits/pel. The newly proposed picture quality evaluation method, which selectively accumulates blocky noise in edge blocks and impulsive noise in flat blocks, was consistent with subjective quality evaluation.
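AMBTC (absolute moment block truncation coding), one of the BTC variants combined with DPCM above, represents each block by a bitmap and two reconstruction levels: the mean of the pixels at or above the block mean, and the mean of those below it. The sketch below assumes a block given as a flat list of gray levels and also shows a generic first-order DPCM coder. The abstract does not say which quantities the hybrid scheme codes with DPCM, so applying DPCM to a sequence of block levels is only an illustrative assumption; the function names and the quantizer step are hypothetical.

```python
def ambtc_encode(block):
    """AMBTC for one block (flat list of pixels): returns (low, high, bitmap)."""
    m = sum(block) / len(block)
    bitmap = [1 if x >= m else 0 for x in block]
    hi_px = [x for x, b in zip(block, bitmap) if b]      # never empty: max >= mean
    lo_px = [x for x, b in zip(block, bitmap) if not b]
    high = sum(hi_px) / len(hi_px)
    low = sum(lo_px) / len(lo_px) if lo_px else high     # flat block: one level
    return low, high, bitmap

def ambtc_decode(low, high, bitmap):
    """Rebuild the block from the two levels and the bitmap."""
    return [high if b else low for b in bitmap]

def dpcm_encode(values, step=1.0):
    """First-order DPCM with a uniform quantizer (hypothetical step size)."""
    codes, pred = [], 0.0
    for v in values:
        q = round((v - pred) / step)   # quantized prediction residual
        codes.append(q)
        pred = pred + q * step         # encoder tracks the decoder's reconstruction
    return codes

def dpcm_decode(codes, step=1.0):
    out, pred = [], 0.0
    for q in codes:
        pred = pred + q * step
        out.append(pred)
    return out

if __name__ == "__main__":
    low, high, bits = ambtc_encode([2, 4, 6, 8])
    print(low, high, bits, ambtc_decode(low, high, bits))
    print(dpcm_encode([3.0, 7.0, 6.0]))
```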

Study on the pronunciation correction in English words (영어 단어 학습시의 발성 교정 기술에 관한 연구)

Beack, Seung-Kwon;Choi, Jung-Kyu;Hahn, Min-Soo
Speech Sciences, v.7 no.2, pp.245-253, 2000
In this paper, we implement an elementary system to correct the accent and pronunciation of English words spoken by non-native speakers. For accent evaluation, energy and pitch information are used to find stressed syllables, and the segment information of input patterns is then extracted with dynamic time warping (DTW) to identify and evaluate the accent position. For pronunciation evaluation, we obtain segment information with the same algorithm used for accent evaluation and measure the spectral distance for each phoneme between the input patterns and the reference patterns. Based on these spectral distances, the system decides whether to recommend pronunciation correction. Our results show that 98 percent of accent evaluations and 71 percent of pronunciation evaluations agree with the perceptual measure.
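Dynamic time warping, used above to align input and reference patterns, fills a cumulative-cost table and backtracks the cheapest alignment path; the path is what lets segment boundaries in the reference be mapped onto the input. A minimal sketch, assuming scalar features and an absolute-difference local cost (a real system would compare per-frame feature vectors such as energy, pitch, or spectra); the function name is hypothetical.

```python
def dtw(a, b, dist=lambda x, y: abs(x - y)):
    """Return (total cost, alignment path) between sequences a and b."""
    n, m = len(a), len(b)
    inf = float("inf")
    # D[i][j]: cheapest cost aligning a[:i] with b[:j]
    D = [[inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = dist(a[i - 1], b[j - 1])
            D[i][j] = cost + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    # Backtrack from the end, preferring the diagonal step on ties.
    i, j, path = n, m, []
    while (i, j) != (0, 0):
        path.append((i - 1, j - 1))
        steps = [(i - 1, j - 1), (i - 1, j), (i, j - 1)]
        i, j = min(steps, key=lambda p: D[p[0]][p[1]])
    path.reverse()
    return D[n][m], path

if __name__ == "__main__":
    cost, path = dtw([1, 2, 3], [1, 2, 2, 3])
    print(cost, path)  # the repeated "2" in the reference maps to one input frame
```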

A Study of the Characteristics of Perception According to Gender in the Image Evaluation of Cafe Facades (카페 파사드의 이미지평가에 나타난 성별 지각특성에 관한 연구)

Son, Kwang-Ho;Choi, Gae-Young
Korean Institute of Interior Design Journal, v.23 no.2, pp.99-107, 2014
Façade design influences which commercial spaces customers select and remember. Therefore, if the perceptual characteristics revealed in image evaluation can be identified, the identity and differentiation of a commercial space can make it easier to visit and motivate customers to select it again. In this study, cafes were selected among commercial facilities in order to identify differentiated design features that attract customers' attention, through an evaluation of space image by gender. The conclusions are as follows. First, when the mean and deviation of [Factor 1] were used to examine the perceptual characteristics of both men and women, customers clearly regarded the façade design as coarse, even though they were unsure whether it was a "straightened-up" design. Second, through [Factor 2], customers perceived the façade design as bright but not unique; in selecting adjectives to describe it, men's responses on "vivid but interesting" were dispersed and women's responses on "bright" were also dispersed, but women perceived the façade as "opaque". Third, [Factor 3] was perceived as "warm but boorish" by men and "warm but crude" by women. Fourth, most (80%) of the adjectives both genders used to describe their perception were consistent.
https://doi.org/10.14774/JKIID.2014.23.2.099

Online blind source separation and dereverberation of speech based on a joint diagonalizability constraint (공동 행렬대각화 조건 기반 온라인 음원 신호 분리 및 잔향제거)

Yu, Ho-Gun;Kim, Do-Hui;Song, Min-Hwan;Park, Hyung-Min
The Journal of the Acoustical Society of Korea, v.40 no.5, pp.503-514, 2021
Reverberation in speech signals tends to degrade the performance of Blind Source Separation (BSS) systems significantly, and the degradation is especially severe in online systems. Methods based on joint diagonalizability constraints have recently been developed to tackle this problem. To improve the quality of the separated speech in reverberant environments, this paper adds a proposed dereverberation method to the online BSS algorithm based on these constraints. In experiments on the WSJCAM0 corpus, the proposed method was compared with the existing online BSS algorithm. Evaluated with the Signal-to-Distortion Ratio (SDR) and the Perceptual Evaluation of Speech Quality (PESQ), SDR improved from 1.23 dB to 3.76 dB and PESQ from 1.15 to 2.12 on average.
https://doi.org/10.7776/ASK.2021.40.5.503
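As a rough illustration of what an SDR figure in dB measures, the simplified ratio below compares the energy of the reference signal with the energy of the estimation error. It assumes the estimate is already time-aligned and correctly scaled, and the function name is hypothetical. The BSS Eval SDR usually reported in BSS papers additionally allows a short distortion filter on the reference before measuring the error, so this sketch will not reproduce the values in the abstract.

```python
import math

def simple_sdr(reference, estimate):
    """Simplified SDR in dB: 10*log10(||s||^2 / ||s - s_hat||^2).

    No projection or distortion filter is applied (unlike BSS Eval).
    """
    signal = sum(s * s for s in reference)
    error = sum((s - e) ** 2 for s, e in zip(reference, estimate))
    if error == 0:
        return math.inf  # perfect reconstruction
    return 10.0 * math.log10(signal / error)

if __name__ == "__main__":
    print(simple_sdr([1.0, 0.0], [1.0, 0.1]))  # error energy is 1% of signal energy
```

Under this definition, each tenfold reduction in error energy adds 10 dB.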

Packet Loss Concealment Algorithm Using Pitch Harmonic Motion Estimation and Adaptive Signal Scale Estimation (피치 하모닉 움직임 예측과 적응적 신호 크기 예측을 이용한 패킷 손실 은닉 알고리즘)

Kim, Tae-Ha;Lee, In-Sung
The Journal of Korea Institute of Information, Electronics, and Communication Technology, v.14 no.4, pp.247-256, 2021
In this paper, we propose a packet loss concealment (PLC) algorithm using pitch harmonic motion prediction and adaptive signal amplitude prediction. The spectral motion prediction method divides the spectral motion of the previous usable frame into predetermined sub-bands to predict and restore the motion of the lost signal. In the proposed algorithm, the speech signal is classified as voiced or unvoiced. For voiced speech, the signal is further divided into pitch harmonics using the pitch frequency, and the pitch harmonic motion of the lost frame is predicted and restored; for unvoiced speech, the lost frame is restored with the spectral motion prediction method. For consecutive frame losses, a gain adjustment method using a least mean square (LMS) predictor is proposed. The proposed algorithm was evaluated with the objective Perceptual Evaluation of Speech Quality (PESQ) measure and showed an MOS improvement of 0.1 over the conventional method.
https://doi.org/10.17661/jkiiect.2021.14.4.247
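The gain adjustment for consecutive losses is described only as using an LMS predictor. The sketch below shows a generic LMS linear predictor trained on past frame gains that then predicts the next gain. The predictor order, the step size `mu`, and the notion of "frame gain" are all assumptions, since the abstract does not specify them, and the function name is hypothetical.

```python
def lms_gain_predictor(gains, order=2, mu=0.1):
    """Train an LMS linear predictor on past frame gains.

    Returns (weights, predicted next gain). Order and step size are
    illustrative assumptions, not values from the paper.
    """
    if len(gains) <= order:
        raise ValueError("need more than `order` gain values")
    w = [0.0] * order
    for n in range(order, len(gains)):
        x = gains[n - order:n][::-1]                     # most recent gain first
        y_hat = sum(wi * xi for wi, xi in zip(w, x))     # prediction of gains[n]
        e = gains[n] - y_hat                             # prediction error
        w = [wi + mu * e * xi for wi, xi in zip(w, x)]   # LMS weight update
    x_last = gains[-order:][::-1]
    return w, sum(wi * xi for wi, xi in zip(w, x_last))

if __name__ == "__main__":
    weights, next_gain = lms_gain_predictor([1.0] * 60)
    print(weights, next_gain)  # converges toward predicting a constant gain of 1.0
```

LMS only converges when `mu` is small relative to the input power, so in practice the step size would be chosen (or normalized, as in NLMS) with the gain range in mind.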

Speech Enhancement Based on Minima Controlled Recursive Averaging Technique Incorporating Conditional MAP (조건 사후 최대 확률 기반 최소값 제어 재귀평균기법을 이용한 음성향상)

Kum, Jong-Mo;Park, Yun-Sik;Chang, Joon-Hyuk
The Journal of the Acoustical Society of Korea, v.27 no.5, pp.256-261, 2008
In this paper, we propose a novel approach, based on the conditional maximum a posteriori criterion, to improve the performance of minima controlled recursive averaging (MCRA). A crucial component of a practical speech enhancement system is the estimation of the noise power spectrum, and MCRA is one state-of-the-art approach. In MCRA, the noise estimate is obtained by averaging past spectral power values with a smoothing parameter that is adjusted by the signal presence probability in frequency subbands. We improve MCRA by using a speech presence probability that is the a posteriori probability conditioned on both the current observation and the speech presence or absence in the previous frame. Using the ITU-T P.862 Perceptual Evaluation of Speech Quality (PESQ) and a subjective evaluation of speech quality, we show that the proposed algorithm yields better results than the conventional MCRA-based scheme.
https://doi.org/10.7776/ASK.2008.27.5.256
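The recursive averaging step described above is usually written per frequency bin as $\tilde{\alpha}_d = \alpha_d + (1-\alpha_d)p$ and $\hat{\lambda}_d \leftarrow \tilde{\alpha}_d \hat{\lambda}_d + (1-\tilde{\alpha}_d)|Y|^2$: when speech is likely present ($p \to 1$) the noise estimate is frozen, and when it is absent the estimate tracks the noisy power. In this sketch the speech presence probability `p` is simply an input. Conventional MCRA derives it from minima tracking and the paper from a conditional MAP criterion; neither is implemented here. The default `alpha_d = 0.95` and the function name are assumptions.

```python
def mcra_noise_update(noise_psd, noisy_power, speech_prob, alpha_d=0.95):
    """One frame of MCRA-style recursive noise PSD averaging, per frequency bin.

    noise_psd:   current noise estimates lambda_d(k)
    noisy_power: |Y(k)|^2 for the current frame
    speech_prob: speech presence probability p(k), supplied externally here
    """
    updated = []
    for lam, y2, p in zip(noise_psd, noisy_power, speech_prob):
        alpha_tilde = alpha_d + (1.0 - alpha_d) * p   # time-varying smoothing factor
        updated.append(alpha_tilde * lam + (1.0 - alpha_tilde) * y2)
    return updated

if __name__ == "__main__":
    # Bin 0: speech certainly present (estimate held); bin 1: speech absent.
    print(mcra_noise_update([1.0, 1.0], [3.0, 3.0], [1.0, 0.0]))
```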

A study on speech enhancement using complex-valued spectrum employing Feature map Dependent attention gate (특징 맵 중요도 기반 어텐션을 적용한 복소 스펙트럼 기반 음성 향상에 관한 연구)

Jaehee Jung;Wooil Kim
The Journal of the Acoustical Society of Korea, v.42 no.6, pp.544-551, 2023
Speech enhancement, used to improve the perceptual quality and intelligibility of noisy speech, has progressed from methods using the magnitude spectrum to methods using the complex-valued spectrum, which can improve both magnitude and phase. In this paper, we study how to apply an attention mechanism to complex-valued spectrum-based speech enhancement systems to further improve the intelligibility and quality of noisy speech. The attention is based on additive attention, and the attention weights are calculated with the complex-valued spectrum taken into account. In addition, global average pooling is used to account for the importance of each feature map. Complex-valued spectrum-based speech enhancement was built on the Deep Complex U-Net (DCUNET) model, and additive attention followed the method proposed in the Attention U-Net model. In experiments on noisy speech in a living-room environment, the proposed method outperformed the baseline model on Source-to-Distortion Ratio (SDR), Perceptual Evaluation of Speech Quality (PESQ), and Short-Time Objective Intelligibility (STOI), and improved performance consistently across various background noise environments and low Signal-to-Noise Ratio (SNR) conditions. These results demonstrate the effectiveness of the proposed system in improving the intelligibility and quality of noisy speech.
https://doi.org/10.7776/ASK.2023.42.6.544
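The additive attention gate referred to above (from Attention U-Net) computes a scalar weight per position from a skip-connection feature `x` and a gating signal `g`, then rescales `x`. The sketch below uses real-valued plain-Python vectors at a single position, together with a global average pooling helper. How the paper incorporates the complex-valued spectrum and the pooled feature-map importance is not given in the abstract; treating real and imaginary parts as separate channels would be one possible assumption and is not implemented here. All weights and names are hypothetical.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def matvec(W, v):
    """Matrix-vector product for lists of rows."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]

def additive_attention_gate(x, g, W_x, W_g, b, psi, b_psi=0.0):
    """Additive gate at one position:
    alpha = sigmoid(psi . relu(W_x x + W_g g + b) + b_psi); output = alpha * x.
    """
    hidden = [max(0.0, p + q + c) for p, q, c in zip(matvec(W_x, x), matvec(W_g, g), b)]
    alpha = sigmoid(sum(w * h for w, h in zip(psi, hidden)) + b_psi)
    return alpha, [alpha * xi for xi in x]

def global_average_pool(feature_map):
    """Per-channel mean over positions; feature_map is [positions][channels]."""
    n = len(feature_map)
    return [sum(pos[c] for pos in feature_map) / n for c in range(len(feature_map[0]))]

if __name__ == "__main__":
    alpha, gated = additive_attention_gate(
        x=[1.0, 2.0], g=[0.5], W_x=[[1.0, 0.0]], W_g=[[1.0]], b=[0.0], psi=[1.0])
    print(alpha, gated)
    print(global_average_pool([[1.0, 2.0], [3.0, 4.0]]))
```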
