• Title/Summary/Keyword: perceptual loss


Performance comparison evaluation of speech enhancement using various loss functions (다양한 손실 함수를 이용한 음성 향상 성능 비교 평가)

  • Hwang, Seo-Rim;Byun, Joon;Park, Young-Cheol
    • The Journal of the Acoustical Society of Korea / v.40 no.2 / pp.176-182 / 2021
  • This paper evaluates and compares the performance of Deep Neural Network (DNN)-based speech enhancement models trained with various loss functions. A complex network that can take the phase information of speech into account is used as the baseline model. As loss functions, we consider two basic ones, the Mean Squared Error (MSE) and the Scale-Invariant Signal-to-Noise Ratio (SI-SNR), and two perceptual ones, the Perceptual Metric for Speech Quality Evaluation (PMSQE) and the Log Mel Spectra (LMS). Performance was compared through objective evaluation and listening tests on outputs obtained with various combinations of the loss functions. The results show that combining a perceptual loss function with MSE or SI-SNR improves overall performance, and that the perceptual loss functions performed better in the listening test even when they yielded lower objective scores.
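The two basic losses named in this abstract can be illustrated with a short sketch. It assumes time-domain waveforms given as plain lists of floats and uses the common definition of SI-SNR (mean removal, projection of the estimate onto the reference, then a target-to-residual energy ratio in dB); the `eps` guard and function names are illustrative choices, not taken from the paper. PMSQE and LMS are not sketched here.

```python
import math

def mse_loss(est, ref):
    """Mean squared error between estimated and reference waveforms."""
    return sum((e - r) ** 2 for e, r in zip(est, ref)) / len(ref)

def si_snr(est, ref, eps=1e-8):
    """Scale-invariant signal-to-noise ratio in dB (higher is better)."""
    # Remove the mean of each signal so constant offsets are ignored.
    mean_e = sum(est) / len(est)
    mean_r = sum(ref) / len(ref)
    est = [e - mean_e for e in est]
    ref = [r - mean_r for r in ref]
    # Project the estimate onto the reference: s_target = (<est, ref> / ||ref||^2) * ref
    dot = sum(e * r for e, r in zip(est, ref))
    ref_energy = sum(r * r for r in ref) + eps
    s_target = [dot / ref_energy * r for r in ref]
    # Whatever is left after the projection counts as noise.
    e_noise = [e - t for e, t in zip(est, s_target)]
    num = sum(t * t for t in s_target)
    den = sum(n * n for n in e_noise) + eps
    return 10.0 * math.log10(num / den + eps)

def si_snr_loss(est, ref):
    """Training loss: negative SI-SNR, so that minimizing it maximizes SI-SNR."""
    return -si_snr(est, ref)
```

Because the estimate is projected onto the reference, rescaling the estimate leaves SI-SNR (almost exactly) unchanged, whereas MSE penalizes any gain mismatch.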

Perceptual Quality-based Video Coding with Foveated Contrast Sensitivity (Foveated Contrast Sensitivity를 이용한 인지품질 기반 비디오 코딩)

  • Ryu, Jiwoo;Sim, Donggyu
    • Journal of Broadcast Engineering / v.19 no.4 / pp.468-477 / 2014
  • This paper proposes a novel perceptual quality-based (PQ-based) video coding method with foveated contrast sensitivity (FCS). Conventional PQ-based video coding methods with FCS minimize the loss of perceptual quality in compressed video by exploiting a property of the human visual system (HVS): its sensitivity varies with the spatial frequency of visual stimuli. PQ-based video coding with foveated masking (FM), on the other hand, exploits the difference in HVS sensitivity between central and peripheral vision. In this study, a novel FCS model is proposed that considers both the conventional DCT-based just-noticeable-difference (JND) model and the FM model. A psychological study was conducted to construct the proposed FCS model, and the model was applied to a PQ-based video coding algorithm implemented on the HM10.0 reference software. Experimental results show that the proposed method reduces the bitrate by 10% on average without loss of perceptual quality.
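To illustrate how sensitivity falls off away from the fixation point, the sketch below uses the widely cited Geisler and Perry foveation model, CT(f, e) = CT0 · exp(α f (e + e2) / e2), as a stand-in. It is not the FCS model fitted in this paper; the constants, the viewing-distance parameter, and all function names are assumptions for illustration.

```python
import math

# Constants of the Geisler & Perry foveation model (a stand-in, NOT the paper's FCS model).
ALPHA = 0.106     # spatial-frequency decay constant
E2 = 2.3          # half-resolution eccentricity in degrees
CT0 = 1.0 / 64.0  # minimum contrast threshold

def eccentricity_deg(px, py, fx, fy, viewing_distance_px):
    """Visual angle (degrees) between pixel (px, py) and the fixation point (fx, fy).

    viewing_distance_px is a hypothetical parameter: viewer distance expressed in pixels.
    """
    d = math.hypot(px - fx, py - fy)
    return math.degrees(math.atan(d / viewing_distance_px))

def contrast_threshold(freq_cpd, ecc_deg):
    """Contrast threshold at freq_cpd cycles/degree and ecc_deg degrees of eccentricity."""
    return CT0 * math.exp(ALPHA * freq_cpd * (ecc_deg + E2) / E2)

def foveal_jnd_scale(freq_cpd, ecc_deg):
    """Factor by which a foveal JND threshold could be raised at this eccentricity."""
    return contrast_threshold(freq_cpd, ecc_deg) / contrast_threshold(freq_cpd, 0.0)
```

In a coder, a DCT-based JND threshold for a block could be multiplied by `foveal_jnd_scale` at that block's eccentricity, permitting coarser quantization in the periphery; how the paper actually combines its JND and FM terms is not shown here.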

No-Referenced Video-Quality Assessment for H.264 SVC with Packet Loss (패킷 손실시 H.264 SVC의 무기준법 영상 화질 평가 방법)

  • Kim, Hyun-Tae;Kim, Yo-Han;Shin, Ji-Tae;Won, Seok-Ho
    • The Journal of Korean Institute of Communications and Information Sciences / v.36 no.11C / pp.655-661 / 2011
  • Transmission issues for H.264 SVC, the scalable video coding extension of H.264/AVC, have been widely studied. In this paper, we propose a no-reference objective video-quality assessment metric for H.264 SVC that uses scalability information. The proposed metric estimates perceptual video quality under error conditions by considering motion vectors, error propagation patterns in the hierarchical prediction structure, quantization parameters, and the number of frames damaged by packet loss. The metric reflects human perception of video quality, and we evaluate its performance by the correlation between the differential mean opinion score (DMOS), used as the subjective quality measure, and the proposed metric.
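The evaluation step described above, correlating an objective metric with DMOS, can be sketched as follows. The abstract does not say which correlation coefficient is used; Pearson's linear correlation is assumed here (Spearman rank correlation is another common choice in quality-assessment studies).

```python
import math

def pearson(x, y):
    """Pearson linear correlation between metric scores x and subjective scores y."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)
```

Since DMOS grows with perceived degradation, a metric that predicts quality (higher is better) should correlate negatively with it; the magnitude of the coefficient is what indicates agreement.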

Adaptive Importance Channel Selection for Perceptual Image Compression

  • He, Yifan;Li, Feng;Bai, Huihui;Zhao, Yao
    • KSII Transactions on Internet and Information Systems (TIIS) / v.14 no.9 / pp.3823-3840 / 2020
  • Recently, auto-encoders have emerged as the most popular approach to convolutional neural network (CNN)-based image compression and have achieved impressive performance. In traditional auto-encoder-based image compression models, the encoder simply sends the features of the last layer to the decoder, which cannot allocate bits efficiently across different spatial regions. In addition, these methods do not fully exploit contextual information under different receptive fields to improve reconstruction. To address these issues, this paper designs a novel auto-encoder model for image compression that effectively transmits the hierarchical features of the encoder to the decoder. Specifically, we first propose an adaptive bit-allocation strategy that adaptively selects importance channels. We then multiply the generated importance mask with the features of the last layer of the proposed encoder to achieve efficient bit allocation. Moreover, we present an additional perceptual loss function to recover image details more accurately. Extensive experiments demonstrate that the proposed model significantly outperforms JPEG and JPEG2000 in both subjective and objective quality. Our model also achieves higher PSNR than state-of-the-art CNN-based image compression methods.
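The masking step (multiplying an importance mask with the last-layer features) can be illustrated with a common importance-map scheme: each spatial position gets an importance value in [0, 1], and only the first ceil(importance · C) of its C channels are kept. This is an assumed, simplified scheme with hypothetical function names; the paper's adaptive channel-selection rule may differ.

```python
import math

def channel_mask(importance, num_channels):
    """Binary channel mask for one spatial position (hypothetical scheme).

    importance is in [0, 1]; the first ceil(importance * num_channels) channels are kept.
    """
    keep = math.ceil(importance * num_channels)
    return [1.0 if c < keep else 0.0 for c in range(num_channels)]

def apply_importance_mask(features, importance_map):
    """Mask features[c][y][x] with a per-position importance_map[y][x]."""
    num_channels = len(features)
    height, width = len(importance_map), len(importance_map[0])
    out = [[[0.0] * width for _ in range(height)] for _ in range(num_channels)]
    for y in range(height):
        for x in range(width):
            mask = channel_mask(importance_map[y][x], num_channels)
            for c in range(num_channels):
                # Zeroed channels carry no information, so they cost (almost) no bits.
                out[c][y][x] = features[c][y][x] * mask[c]
    return out
```

Regions with high importance keep more channels and therefore receive more bits, which is the spatially varying bit allocation the abstract describes.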

Conversational Quality Measurement System for Mobile VoIP Speech Communication (모바일 VoIP 음성통신을 위한 대화음질 측정 시스템)

  • Cho, Jae-Man;Kim, Hyoung-Gook
    • The Journal of The Korea Institute of Intelligent Transport Systems / v.10 no.4 / pp.71-77 / 2011
  • In this paper, we propose a conversational quality measurement (CQM) system for objectively assessing the QoS of high-quality mobile VoIP voice communication. To measure conversational quality, a VoIP telecommunication system is implemented on two smartphones connected via VoIP. The system consists of echo cancellation, noise reduction, speech encoding/decoding, packet generation with RTP (Real-time Transport Protocol), jitter buffer control, and POS (Play-out Schedule) with LC (Loss Concealment). The CQM system is connected to the microphone and speaker of each smartphone. The voice signal of each talker is recorded and used to measure CE (Conversational Efficiency), CS (Conversational Symmetry), PESQ (Perceptual Evaluation of Speech Quality), and the CE-CS-PESQ correlation. We validate the CQM system by measuring CE, CS, and PESQ under various SNR, delay, and loss conditions in the IP network.

Perceptual Generative Adversarial Network for Single Image De-Snowing (단일 영상에서 눈송이 제거를 위한 지각적 GAN)

  • Wan, Weiguo;Lee, Hyo Jong
    • KIPS Transactions on Software and Data Engineering / v.8 no.10 / pp.403-410 / 2019
  • Image de-snowing aims to eliminate the negative influence of snow particles and improve scene understanding in images. In this paper, a single-image snow removal method based on a perceptual generative adversarial network is proposed. A residual U-Net is designed as the generator to produce the snow-free image. To handle snow particles of various sizes, an inception module with different filter kernels is adopted to extract multi-resolution features of the input snow image. In addition to the adversarial loss, a perceptual loss and a total variation loss are employed to improve the quality of the resulting image. Experimental results indicate that the method achieves excellent performance on both synthetic and realistic snow images in terms of visual observation and commonly used visual quality indices.
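The total variation term mentioned above penalizes differences between neighbouring pixels, discouraging noisy residue in the output. The sketch below uses the anisotropic (L1) form averaged over pixels; squared-difference variants are also common. The weighted sum in `generator_loss` and its default weights are hypothetical placeholders, not the paper's settings.

```python
def tv_loss(img):
    """Anisotropic total variation of a 2-D image (list of rows), averaged over pixels."""
    h, w = len(img), len(img[0])
    tv = 0.0
    for y in range(h):
        for x in range(w):
            if y + 1 < h:  # vertical neighbour
                tv += abs(img[y + 1][x] - img[y][x])
            if x + 1 < w:  # horizontal neighbour
                tv += abs(img[y][x + 1] - img[y][x])
    return tv / (h * w)

def generator_loss(adv, perceptual, tv, lambda_p=1.0, lambda_tv=1e-4):
    """Hypothetical weighted sum of precomputed adversarial, perceptual, and TV terms."""
    return adv + lambda_p * perceptual + lambda_tv * tv
```

A flat image has zero total variation, while every edge contributes, so the weight on this term must stay small enough not to blur genuine scene structure.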

Performance Improvement of Packet Loss Concealment Algorithm in G.711 Using Adaptive Signal Scale Estimation (적응적 신호 크기 예측을 이용한 G.711 패킷 손실 은닉 알고리즘의 성능향상)

  • Kim, Tae-Ha;Lee, In-Sung
    • The Journal of the Acoustical Society of Korea / v.34 no.5 / pp.403-409 / 2015
  • In this paper, we propose a Packet Loss Concealment (PLC) method using adaptive signal scale estimation to improve the performance of G.711 PLC. The conventional method controls the gain with a 20% attenuation factor when consecutive losses occur. However, this method degrades quality because it does not account for changes in the signal. We therefore propose gain control by adaptive signal scale estimation, using information from the frames before and after the loss with a Least Mean Square (LMS) predictor. The performance of the proposed algorithm is evaluated with Perceptual Evaluation of Speech Quality (PESQ).
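The contrast between a fixed attenuation and an adaptive, LMS-based scale estimate can be sketched as below. Both parts are simplifications under stated assumptions: the fixed rule applies a 20% reduction per consecutive lost frame, and the predictor is a normalized LMS filter that forecasts the next frame's amplitude scale from past scales only. The paper also uses information from the frame after the loss, which this sketch omits; `order`, `mu`, and the class name are hypothetical.

```python
def fixed_attenuation_gains(num_lost, factor=0.2):
    """Gain for each consecutive lost frame with a fixed 20% reduction per frame (simplified)."""
    gains = []
    g = 1.0
    for _ in range(num_lost):
        gains.append(g)
        g *= 1.0 - factor
    return gains

class LMSScalePredictor:
    """Normalized LMS predictor of the next frame's amplitude scale (hypothetical sketch)."""

    def __init__(self, order=4, mu=0.5, eps=1e-9):
        self.w = [1.0 / order] * order  # filter weights
        self.hist = [0.0] * order       # most recent scale first
        self.mu = mu
        self.eps = eps

    def predict(self):
        return sum(w * h for w, h in zip(self.w, self.hist))

    def update(self, actual_scale):
        """Adapt the weights with the observed scale of a correctly received frame."""
        err = actual_scale - self.predict()
        norm = sum(h * h for h in self.hist) + self.eps
        self.w = [w + self.mu * err * h / norm for w, h in zip(self.w, self.hist)]
        self.hist = [actual_scale] + self.hist[:-1]
```

During a loss burst, the predicted scale would replace the fixed `0.8` multiplier, so the concealed signal can follow a rising or falling envelope instead of always decaying.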

A Novel Multi-Channel Hearing Aid Algorithm with SMR(signal-to-masking ratio) Improvement (신호 대 마스킹 비 개선을 통한 다채널 보청 알고리즘)

  • 김헌중;홍민철;차형태
    • The Journal of the Acoustical Society of Korea / v.19 no.8 / pp.12-21 / 2000
  • In this paper, we propose a novel hearing aid algorithm for restoring sensorineural hearing loss using multi-channel (multi-band) dynamic range compression and psychoacoustics, so that the impaired listener is presented with a normal perceptual condition. The proposed algorithm uses a loudness scaling function to achieve the proper loudness level, analyzes the masking properties of the signal as it will be perceived by the impaired listener, and then restores normal spectral contrast using the SMR (signal-to-masking ratio), defined as the distance between the level at each frequency and the masking threshold.
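The SMR definition above (level at each frequency minus the masking threshold, both in dB) can be written out directly. This sketch assumes per-band power values and already computed masking thresholds in dB; how the thresholds are derived (spreading functions, hearing-loss model) is outside its scope, and the helper names are illustrative.

```python
import math

def level_db(power, ref=1.0):
    """Power level in dB relative to ref; tiny powers are floored to avoid log(0)."""
    return 10.0 * math.log10(max(power, 1e-12) / ref)

def smr_db(band_powers, mask_thresholds_db):
    """Signal-to-masking ratio per band: signal level minus masking threshold, in dB."""
    return [level_db(p) - t for p, t in zip(band_powers, mask_thresholds_db)]
```

A positive SMR means the component lies above its masking threshold; comparing such per-band values is one way to quantify the spectral contrast the algorithm aims to restore.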

Packet Loss Concealment Algorithm Using Pitch Harmonic Motion Estimation and Adaptive Signal Scale Estimation (피치 하모닉 움직임 예측과 적응적 신호 크기 예측을 이용한 패킷 손실 은닉 알고리즘)

  • Kim, Tae-Ha;Lee, In-Sung
    • The Journal of Korea Institute of Information, Electronics, and Communication Technology / v.14 no.4 / pp.247-256 / 2021
  • In this paper, we propose a packet loss concealment (PLC) algorithm using pitch harmonic motion prediction and adaptive signal amplitude prediction. The spectral motion prediction method divides the spectral motion of the previous usable frame into predetermined sub-bands to predict and restore the motion of the lost signal. In the proposed algorithm, the speech signal is classified as voiced or unvoiced. For voiced sounds, the spectrum is further divided into pitch harmonics using the pitch frequency to predict and restore the pitch harmonic motion of the lost frame; for unvoiced sounds, the lost frame is restored with the spectral motion prediction method. When consecutive speech frames are lost, the gain is adjusted with a least mean square (LMS) predictor. The performance of the proposed algorithm was evaluated with the objective measure PESQ (Perceptual Evaluation of Speech Quality) and showed a MOS improvement of 0.1 over the conventional method.

A high-density gamma white spots-Gaussian mixture noise removal method for neutron images denoising based on Swin Transformer UNet and Monte Carlo calculation

  • Di Zhang;Guomin Sun;Zihui Yang;Jie Yu
    • Nuclear Engineering and Technology / v.56 no.2 / pp.715-727 / 2024
  • In fast neutron imaging, besides the dark current noise and readout noise of the CCD camera, the main noise comes from high-energy gamma rays generated by neutron nuclear reactions in and around the experimental setup. These gamma rays produce high-density gamma white spots (GWS) in the fast neutron image. In addition, owing to the microscopic quantum characteristics of the neutron beam and environmental scattering effects, fast neutron images typically exhibit a mixture of Gaussian noise. Existing neutron-image denoising methods have difficulty handling a mixture of GWS and Gaussian noise. Herein we put forward a deep learning approach based on the Swin Transformer UNet (SUNet) model to remove high-density GWS-Gaussian mixture noise from fast neutron images. The improved denoising model is trained with a customized loss function that combines perceptual loss and mean squared error loss to avoid the grid-like artifacts caused by using perceptual loss alone. To address the high cost of acquiring real fast neutron images, this study uses the Monte Carlo method to simulate noise data with GWS characteristics by computing the interaction between gamma rays and sensors, based on the principle of GWS generation. Experiments on both simulated neutron noise images and real fast neutron images demonstrate that the proposed method not only improves the quality and signal-to-noise ratio of fast neutron images but also preserves the details of the original images during denoising.
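A loss that combines pixel-wise MSE with a perceptual (feature-space) term can be sketched as below. A real perceptual loss compares activations of a pretrained CNN such as VGG; here a hypothetical gradient-based feature extractor stands in so the example stays self-contained, and the weight `lam` is an assumed value rather than the paper's setting.

```python
from typing import List

Image = List[List[float]]

def mse(a: Image, b: Image) -> float:
    """Mean squared error between two equally sized 2-D arrays."""
    diffs = [(x - y) ** 2 for ra, rb in zip(a, b) for x, y in zip(ra, rb)]
    return sum(diffs) / len(diffs)

def gradient_features(img: Image) -> List[Image]:
    """Stand-in feature extractor (hypothetical): horizontal and vertical differences.

    Requires images of at least 2x2 pixels; a real perceptual loss would use CNN activations.
    """
    gx = [[row[x + 1] - row[x] for x in range(len(row) - 1)] for row in img]
    gy = [[img[y + 1][x] - img[y][x] for x in range(len(img[0]))] for y in range(len(img) - 1)]
    return [gx, gy]

def perceptual_loss(pred: Image, target: Image, features=gradient_features) -> float:
    """Average MSE between corresponding feature maps of prediction and target."""
    fp, ft = features(pred), features(target)
    return sum(mse(a, b) for a, b in zip(fp, ft)) / len(fp)

def combined_loss(pred: Image, target: Image, lam: float = 0.1, features=gradient_features) -> float:
    """Pixel-wise MSE plus a weighted perceptual term."""
    return mse(pred, target) + lam * perceptual_loss(pred, target, features)
```

With this stand-in, a prediction that differs from the target only by a constant offset has zero perceptual loss but nonzero MSE. This illustrates, in a simplified way, why a feature-space term alone can leave errors unconstrained and why pairing it with MSE helps.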