Search | Korea Science

A study on loss combination in time and frequency for effective speech enhancement based on complex-valued spectrum (효과적인 복소 스펙트럼 기반 음성 향상을 위한 시간과 주파수 영역 손실함수 조합에 관한 연구)

Jung, Jaehee;Kim, Wooil
- The Journal of the Acoustical Society of Korea / v.41 no.1 / pp.38-44 / 2022
Speech enhancement aims to improve the intelligibility and quality of noise-corrupted speech. In this paper, speech enhancement performance is compared across different loss functions in the time and frequency domains. The study proposes a combination of loss functions that exploits the advantages of each domain by considering both the details of the spectrum and the speech waveform. The Scale-Invariant Source-to-Noise Ratio (SI-SNR) is used as the time-domain loss function, and the Mean Squared Error (MSE), calculated over the complex-valued spectrum and the magnitude spectrum, is used in the frequency domain. The phase loss is obtained using the sine function. Enhancement results are evaluated using Source-to-Distortion Ratio (SDR), Perceptual Evaluation of Speech Quality (PESQ), and Short-Time Objective Intelligibility (STOI), and the resulting spectrograms are also compared to confirm the results. Experiments on the TIMIT database show the highest performance when SI-SNR is combined with the magnitude loss function.
https://doi.org/10.7776/ASK.2022.41.1.038
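
The loss combination described in this abstract can be illustrated with a minimal sketch. It assumes plain Python lists for the waveform and for the complex spectrum bins; the function names and the weight `alpha` are hypothetical and do not come from the paper, which does not state its weighting here.

```python
import math

def si_snr(estimate, target, eps=1e-8):
    """Scale-invariant source-to-noise ratio in dB for two equal-length waveforms."""
    n = len(target)
    # Remove the mean so the measure ignores any DC offset.
    est_mean = sum(estimate) / n
    tgt_mean = sum(target) / n
    est = [e - est_mean for e in estimate]
    tgt = [t - tgt_mean for t in target]
    # Project the estimate onto the target to get the scaled target component.
    dot = sum(e * t for e, t in zip(est, tgt))
    tgt_energy = sum(t * t for t in tgt) + eps
    scale = dot / tgt_energy
    s_target = [scale * t for t in tgt]
    e_noise = [e - s for e, s in zip(est, s_target)]
    num = sum(s * s for s in s_target)
    den = sum(x * x for x in e_noise) + eps
    return 10.0 * math.log10(num / den + eps)

def magnitude_mse(est_spec, tgt_spec):
    """MSE between the magnitudes of two complex spectra given as flat lists of bins."""
    return sum((abs(e) - abs(t)) ** 2 for e, t in zip(est_spec, tgt_spec)) / len(tgt_spec)

def combined_loss(est_wave, tgt_wave, est_spec, tgt_spec, alpha=1.0):
    # Negative SI-SNR, so minimizing the loss maximizes SI-SNR.
    # alpha is a hypothetical weight on the frequency-domain term.
    return -si_snr(est_wave, tgt_wave) + alpha * magnitude_mse(est_spec, tgt_spec)
```

Because SI-SNR projects the estimate onto the target before comparing energies, rescaling the estimate leaves the time-domain term essentially unchanged, while the magnitude term ignores phase entirely.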

A study on combination of loss functions for effective mask-based speech enhancement in noisy environments (잡음 환경에 효과적인 마스크 기반 음성 향상을 위한 손실함수 조합에 관한 연구)

Jung, Jaehee;Kim, Wooil
- The Journal of the Acoustical Society of Korea / v.40 no.3 / pp.234-240 / 2021
In this paper, mask-based speech enhancement is improved for effective speech recognition in noisy environments. In mask-based speech enhancement, the enhanced spectrum is obtained by multiplying the noisy speech spectrum by a mask. The VoiceFilter (VF) model is used for mask estimation, and the Spectrogram Inpainting (SI) technique is used to remove residual noise from the enhanced spectrum. We propose a combined loss to further improve speech enhancement: to remove the residual noise in the speech effectively, the positive part of the triplet loss is used together with the component loss. For the experiments, the TIMIT database is reconstructed using NOISEX92 noise and background music samples under various Signal-to-Noise Ratio (SNR) conditions. Source-to-Distortion Ratio (SDR), Perceptual Evaluation of Speech Quality (PESQ), and Short-Time Objective Intelligibility (STOI) are used as evaluation metrics. When the VF was trained with the mean squared error and the SI model with the combined loss, SDR, PESQ, and STOI improved by 0.5, 0.06, and 0.002, respectively, compared with the system trained only with the mean squared error.
https://doi.org/10.7776/ASK.2021.40.3.234
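
The masking step mentioned in this abstract (enhanced spectrum = noisy spectrum multiplied by a mask) can be sketched as follows. This is an illustration only: the spectrogram is assumed to be a list of frames, each a list of magnitude bins, and the mask values are made up rather than estimated by a VoiceFilter model. The combined triplet and component loss is not shown because the abstract does not specify its form.

```python
def apply_mask(noisy_spec, mask):
    """Element-wise product of a noisy magnitude spectrogram and a mask (frames x bins)."""
    return [[m * x for m, x in zip(mask_row, spec_row)]
            for mask_row, spec_row in zip(mask, noisy_spec)]

def mse(a, b):
    """Mean squared error over two equally shaped 2-D lists."""
    diffs = [(x - y) ** 2 for ra, rb in zip(a, b) for x, y in zip(ra, rb)]
    return sum(diffs) / len(diffs)

# Hypothetical toy values: two frames with two frequency bins each.
noisy = [[1.0, 2.0], [4.0, 0.5]]
mask = [[1.0, 0.5], [0.25, 0.0]]
enhanced = apply_mask(noisy, mask)
```

A mask value near 1 passes a time-frequency bin through unchanged, and a value near 0 suppresses it.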

A Study on Hyper Parameters of Graph Neural Network (그래프 신경망 하이퍼 파라미터 연구)

Youn-A Min;Jin-Young Jun
- Proceedings of the Korean Society of Computer Information Conference / 2023.07a / pp.517-518 / 2023
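To examine how the hyperparameters of artificial neural networks affect the performance of graph neural network models, this paper implements a Graph Convolution Network model that predicts a binary classification problem on large-scale graph data. Among the model's many hyperparameters, various combinations of loss functions and activation functions were applied in training and prediction experiments. The experiments confirmed that the choice of loss function has a somewhat greater effect on the model's prediction performance than the choice of activation function.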

Performance measures for correlated multiple characteristics in parameter design (다특성치 파라미터 설계의 평가척도에 관한 연구)

김욱일;강창욱
- Proceedings of the Korean Operations and Management Science Society Conference / 1994.04a / pp.367-369 / 1994
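Until now, the Taguchi method has handled multiple-characteristic problems by ignoring the relationships among characteristics: assuming the characteristics are mutually independent, it finds the optimal process conditions for each characteristic and then extends them to the multiple-characteristic case. In practice, however, correlations among characteristics exist in many multiple-characteristic problems. This study therefore proposes a new performance measure that takes the correlations among characteristics into account. The approach assigns weights to each characteristic and to the correlations among characteristics. The multiple-characteristic loss function is divided into six models according to the combination of single-characteristic types, and in each model the loss function is split into the loss caused by the characteristics themselves and the loss caused by the relationships among them. The new performance measure is the expected loss of the multiple characteristics, defined as the sum of the expected losses caused by each term of the multiple-characteristic loss function. The validity of the study was examined by analyzing existing data and comparing the results with a previous paper.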

Performance comparison evaluation of speech enhancement using various loss functions (다양한 손실 함수를 이용한 음성 향상 성능 비교 평가)

Hwang, Seo-Rim;Byun, Joon;Park, Young-Cheol
- The Journal of the Acoustical Society of Korea / v.40 no.2 / pp.176-182 / 2021
This paper evaluates and compares the performance of Deep Neural Network (DNN)-based speech enhancement models under various loss functions. A complex network that can take the phase information of speech into account is used as the baseline model. Two basic loss functions are considered, the Mean Squared Error (MSE) and the Scale-Invariant Source-to-Noise Ratio (SI-SNR), together with two perceptually motivated loss functions, the Perceptual Metric for Speech Quality Evaluation (PMSQE) and the Log Mel Spectra (LMS). Performance was compared through objective evaluation and listening tests on outputs obtained with various combinations of the loss functions. The results show that combining a perceptual loss function with MSE or SI-SNR improves overall performance, and that the perceptual loss functions performed better in the listening test even when they produced lower objective scores.
https://doi.org/10.7776/ASK.2021.40.2.176

Comparative Analysis of VT-ADL Model Performance Based on Variations in the Loss Function (Loss Function 변화에 따른 VT-ADL 모델 성능 비교 분석)

Namjung Kim;Changjoon Park;Junhwi Park;Jaehyun Lee;Jeonghwan Gwak
- Proceedings of the Korean Society of Computer Information Conference / 2024.01a / pp.41-43 / 2024
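This study focuses on the Vision Transformer based Anomaly Detection and Localization (VT-ADL) model and compares how changing the loss function affects anomaly detection and localization performance on the MVTec dataset. The original loss function was replaced with a VAE loss, a combination of KL divergence and log-likelihood loss, and the resulting change in performance was investigated in depth. The experiments confirmed that switching to the VAE loss markedly improves the anomaly detection capability of the VT-ADL model, with an improvement of about 5% in PRO-score over the original. These results suggest that optimizing the loss function can have an important effect on the overall performance of the VT-ADL model. The study also highlights the importance of loss function selection for anomaly detection and localization with Vision Transformer based models and is expected to provide a useful reference for future related research.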
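
A VAE loss of the kind named in this abstract (KL divergence plus a log-likelihood term) can be sketched as below. The abstract does not give the exact form used with VT-ADL, so this sketch assumes a diagonal Gaussian posterior, a standard normal prior, and a Gaussian likelihood with fixed `sigma`; the function names and the weight `beta` are hypothetical.

```python
import math

def kl_to_standard_normal(mu, log_var):
    """KL( N(mu, exp(log_var)) || N(0, 1) ), summed over latent dimensions."""
    return 0.5 * sum(math.exp(lv) + m * m - 1.0 - lv for m, lv in zip(mu, log_var))

def gaussian_nll(x, x_hat, sigma=1.0):
    """Negative log-likelihood of x under N(x_hat, sigma^2 I), summed over elements."""
    const = 0.5 * math.log(2.0 * math.pi * sigma * sigma)
    return sum(const + (a - b) ** 2 / (2.0 * sigma * sigma) for a, b in zip(x, x_hat))

def vae_loss(x, x_hat, mu, log_var, beta=1.0):
    # Reconstruction term plus a weighted KL regularizer (beta is a hypothetical weight).
    return gaussian_nll(x, x_hat) + beta * kl_to_standard_normal(mu, log_var)
```

The KL term is zero when the posterior already equals the prior, so the loss then reduces to the reconstruction term alone.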

Reconfiguration of Distribution System Using Simulated Annealing (시뮬레이티드 어닐링을 이용한 배전 계통 재구성)

전영재;김재철
- Proceedings of the Korea Database Society Conference / 1999.06a / pp.195-202 / 1999
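This paper describes a reconfiguration method that applies the simulated annealing algorithm to loss reduction and load balancing in a distribution system, taking load constraints and operating constraints into account. Network reconfiguration is a combinatorial optimization problem because it is carried out through combinations of many tie switches and sectionalizing switches. With so many combinations plus constraints, a solution is hard to find and the search is likely to fall into a local optimum. The distribution feeders were therefore reconfigured using simulated annealing, which, among neural network techniques, converges to the global minimum without the constraints affecting the network structure. Simulated annealing theoretically guarantees the optimal solution but requires infinite time, so in practice the rule for exploring the solution space and the cooling schedule that lowers the temperature appropriately are important. In this paper, the constraints are divided into those whose violation can be checked within the algorithm and those reflected in the objective function through a penalty factor, so that every candidate solution is feasible. In place of the conventional Kirkpatrick cooling schedule, a polynomial-time cooling schedule, which lowers the temperature based on statistics of the candidate solutions, is used to shorten the run time and improve convergence. To demonstrate the effectiveness of the proposed algorithm, it was tested on 32-bus and 69-bus example systems.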
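
The general idea of simulated annealing with a penalty term in the objective can be sketched on a toy switching problem. This is not the paper's algorithm: it uses a simple geometric cooling schedule rather than the polynomial-time schedule, it handles the constraint only through a penalty, and the loss values, constraint, and penalty factor are all hypothetical.

```python
import math
import random

def simulated_annealing(cost, neighbor, start, t0=5.0, cooling=0.95, steps=2000, seed=0):
    """Generic simulated annealing with geometric cooling; returns the best state seen."""
    rng = random.Random(seed)
    current, current_cost = start, cost(start)
    best, best_cost = current, current_cost
    temp = t0
    for _ in range(steps):
        candidate = neighbor(current, rng)
        cand_cost = cost(candidate)
        delta = cand_cost - current_cost
        # Always accept improvements; accept worse moves with probability exp(-delta / temp).
        if delta <= 0 or rng.random() < math.exp(-delta / temp):
            current, current_cost = candidate, cand_cost
            if current_cost < best_cost:
                best, best_cost = current, current_cost
        temp = max(temp * cooling, 1e-9)
    return best, best_cost

LOSSES = [4.0, 1.0, 3.0, 2.0, 5.0]  # hypothetical loss contributed by each closed switch
REQUIRED_CLOSED = 2                 # hypothetical operating constraint
PENALTY = 10.0                      # hypothetical penalty factor

def cost(state):
    """Total loss of closed switches plus a penalty for violating the constraint."""
    loss = sum(l for l, s in zip(LOSSES, state) if s)
    violation = abs(sum(state) - REQUIRED_CLOSED)
    return loss + PENALTY * violation

def flip_one(state, rng):
    """Neighbor move: toggle one randomly chosen switch."""
    i = rng.randrange(len(state))
    return state[:i] + (1 - state[i],) + state[i + 1:]

best, best_cost = simulated_annealing(cost, flip_one, (1, 0, 0, 0, 1))
```

Tracking the best state seen guarantees the returned cost is no worse than the starting cost, but a finite run with this schedule does not guarantee the global optimum.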

Reconfiguration of Distribution System Using Simulated Annealing (시뮬레이티드 어닐링을 이용한 배전 계통 재구성)

전영재;김재철
- Proceedings of the Korea Inteligent Information System Society Conference / 1999.03a / pp.195-202 / 1999
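This paper describes a reconfiguration method that applies the simulated annealing algorithm to loss reduction and load balancing in a distribution system, taking load constraints and operating constraints into account. Network reconfiguration is a combinatorial optimization problem because it is carried out through combinations of many tie switches and sectionalizing switches. With so many combinations plus constraints, a solution is hard to find and the search is likely to fall into a local optimum. The distribution feeders were therefore reconfigured using simulated annealing, which, among neural network techniques, converges to the global minimum without the constraints affecting the network structure. Simulated annealing theoretically guarantees the optimal solution but requires infinite time, so in practice the rule for exploring the solution space and the cooling schedule that lowers the temperature appropriately are important. In this paper, the constraints are divided into those whose violation can be checked within the algorithm and those reflected in the objective function through a penalty factor, so that every candidate solution is feasible. In place of the conventional Kirkpatrick cooling schedule, a polynomial-time cooling schedule, which lowers the temperature based on statistics of the candidate solutions, is used to shorten the run time and improve convergence. To demonstrate the effectiveness of the proposed algorithm, it was tested on 32-bus and 69-bus example systems.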

Development of Biofilter System to Ammonia Removal exhausted from Livestock Facilities (축사내 암모니아 제거를 위한 바이오필터 시스템 개발)

조성인;김명락;여운영
- Proceedings of the Korean Society for Agricultural Machinery Conference / 2002.02a / pp.383-388 / 2002
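The biofilter system built in this study was evaluated for ammonia gas under various conditions, and the relationships among airflow rate, temperature, moisture content, pressure drop, and retention time, which are key factors in filter design, were determined. Temperature changes inside the filter had almost no effect on retention time or pressure loss. As moisture content increased, retention time decreased and pressure loss increased, a result attributed to changes in porosity inside the filter. Airflow rate has a decisive effect on biofilter efficiency: as airflow increases, retention time decreases and the initial removal rate also drops. Without microbial inoculation, the removal rate was initially high because of adsorption and then gradually fell below 90% over time, whereas with an inoculated strain the removal performance stayed close to 100% throughout the trial operation period. This study was conducted in the laboratory using ammonia gas only. Performance verification and improvement for the diverse odor components and concentrations that occur in actual livestock housing therefore need to be studied over a longer period. Research on improving biofilter performance should also proceed on several fronts, including maintenance aspects such as reducing energy consumption and operating costs, combining biofilters with other methods, and developing various pretreatment methods.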

A study on end-to-end speaker diarization system using single-label classification (단일 레이블 분류를 이용한 종단 간 화자 분할 시스템 성능 향상에 관한 연구)

Jaehee Jung;Wooil Kim
- The Journal of the Acoustical Society of Korea / v.42 no.6 / pp.536-543 / 2023
Speaker diarization, which labels "who spoke when" in speech with multiple speakers, has been studied with deep neural network-based end-to-end methods in order to label overlapping speech and optimize diarization models. Most deep neural network-based end-to-end speaker diarization systems treat the task as a multi-label classification problem that predicts the labels of all speakers active in each frame of speech. However, the performance of multi-label models varies greatly depending on the threshold setting. This paper studies a speaker diarization system using single-label classification so that diarization can be performed without thresholds. The proposed model converts the speaker labels into a single label and estimates labels from the model output. To account for speaker label permutations during training, the proposed model uses a combination of Permutation Invariant Training (PIT) loss and cross-entropy loss. In addition, ways of adding residual connection structures to the model are studied for effective training of deep diarization models. The experiments used simulated noisy two-speaker data generated from the LibriSpeech database. Compared with the baseline model in terms of Diarization Error Rate (DER), the proposed method can label without a threshold and improves performance by about 20.7 %.
https://doi.org/10.7776/ASK.2023.42.6.536
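
The single-label idea and the permutation-invariant cross-entropy described in this abstract can be sketched for two speakers. The encoding below (one class per combination of active speakers) is an assumption about how the labels might be merged, not the paper's confirmed scheme, and all function names are hypothetical.

```python
import math
from itertools import permutations

def to_single_label(frame):
    """Encode a tuple of per-speaker 0/1 activities as one class index (bit i = speaker i)."""
    return sum(active << i for i, active in enumerate(frame))

def cross_entropy(probs, labels):
    """Mean negative log-probability of the reference class in each frame."""
    return -sum(math.log(p[y]) for p, y in zip(probs, labels)) / len(labels)

def pit_cross_entropy(probs, multi_labels, num_speakers=2):
    """Cross-entropy minimized over all speaker orderings of the reference labels."""
    losses = []
    for perm in permutations(range(num_speakers)):
        permuted = [to_single_label(tuple(frame[i] for i in perm)) for frame in multi_labels]
        losses.append(cross_entropy(probs, permuted))
    return min(losses)

# Hypothetical example: the model's speaker order is swapped relative to the reference.
reference = [(1, 0), (0, 1), (1, 1), (0, 0)]
probs = [
    [0.1, 0.1, 0.7, 0.1],
    [0.1, 0.7, 0.1, 0.1],
    [0.1, 0.1, 0.1, 0.7],
    [0.7, 0.1, 0.1, 0.1],
]
loss = pit_cross_entropy(probs, reference)
```

With one class per speaker combination, the predicted label for a frame is the most probable class, so no per-speaker threshold is needed, and taking the minimum over orderings keeps the loss from penalizing a model that merely numbers the speakers differently.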

