• Title/Summary/Keyword: Mean Opinion Score (MOS)

Adaptive Enhancement Algorithm of Perceptual Filter Using Variable Threshold (가변 임계값을 이용한 지각 필터의 적응적인 음질 개선 알고리즘)

  • 차형태
    • The Journal of the Acoustical Society of Korea / v.23 no.6 / pp.446-453 / 2004
  • In this paper, a new adaptive perceptual filter using a variable threshold is proposed to enhance audio signals degraded by additive nonstationary noise. The adaptive perceptual filter updates the variable threshold at each step according to the signal power and the effect of noise variation, so the noisy audio signal is enhanced by a method that effectively controls residual noise. The proposed algorithm uses a perceptual filter that transforms the time-domain signal into the frequency domain and calculates an intensity energy and an excitation energy in the Bark domain. In this method, the stage at which the filter response is updated is decided by the threshold. The proposed algorithm using the variable threshold effectively controls residual noise using the energy difference of audio signals degraded by additive nonstationary noise. The proposed method is tested with noisy audio signals degraded by nonstationary noise at various signal-to-noise ratios (SNR). We carry out NMR and MOS tests at input SNRs of 15 dB, 20 dB, 25 dB, and 30 dB. Approximate improvements of 17.4 dB, 15.3 dB, 12.8 dB, and 9.8 dB in NMR and enhancements of 2.9, 2.5, 2.3, and 1.7 in the MOS test are achieved for these input signals, respectively.
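
The abstract above mentions computing energies "in the Bark domain" without giving formulas. As a rough illustration only, the sketch below maps frequencies to the Bark scale with the widely used Zwicker-Terhardt approximation and sums spectral power into unit-width Bark bands. The function names, the 25-band layout, and the choice of approximation are assumptions made here, not details taken from the paper, and the sketch does not reproduce the proposed filter or its threshold update.

```python
import math


def hz_to_bark(freq_hz):
    """Zwicker-Terhardt approximation of the Bark scale (assumed here)."""
    return (13.0 * math.atan(0.00076 * freq_hz)
            + 3.5 * math.atan((freq_hz / 7500.0) ** 2))


def bark_band_energies(freqs_hz, powers, n_bands=25):
    """Sum spectral power into unit-width Bark bands.

    freqs_hz and powers are parallel sequences (non-negative bin
    frequencies and their power values). n_bands=25 is a hypothetical
    choice that covers the audible range up to about 20 kHz.
    """
    if len(freqs_hz) != len(powers):
        raise ValueError("freqs_hz and powers must have the same length")
    energies = [0.0] * n_bands
    for freq, power in zip(freqs_hz, powers):
        # Band index is the integer part of the Bark value, clamped to range.
        band = min(int(hz_to_bark(freq)), n_bands - 1)
        energies[band] += power
    return energies


# Illustrative values only, not data from the paper.
example = bark_band_energies([100.0, 150.0, 1000.0], [1.0, 2.0, 4.0])
```

Grouping by the integer part of the Bark value is the simplest possible banding; a perceptual model would normally also apply a spreading function to obtain an excitation pattern, which is omitted here.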

Salient Region Detection Algorithm for Music Video Browsing (뮤직비디오 브라우징을 위한 중요 구간 검출 알고리즘)

  • Kim, Hyoung-Gook;Shin, Dong
    • The Journal of the Acoustical Society of Korea / v.28 no.2 / pp.112-118 / 2009
  • This paper proposes a rapid salient-region detection algorithm for a music video browsing system that can be applied to mobile devices and digital video recorders (DVR). The input music video is decomposed into music and video tracks. For the music track, the music highlight, including the musical chorus, is detected by structure analysis using energy-based peak position detection. Using emotional models generated by the SVM-AdaBoost learning algorithm, the music signal of each music video is automatically classified into one of the predefined emotional classes. For the video track, face scenes including the singer or actor/actress are detected using a boosted cascade of simple features. Finally, the salient region is generated by aligning the boundaries of the music highlight and the visual face scene. Users first select their favorite music videos from those on the mobile device or DVR using the emotion information of each music video, and can then quickly browse the 30-second salient region produced by the proposed algorithm. A mean opinion score (MOS) test with a database of 200 music videos is conducted to compare the detected salient region with the predefined manual part. The MOS test results show that the salient region detected by the proposed method performed much better than the predefined manual part without audiovisual processing.

Adaptive QoS Study for Video Streaming Service In MMT Protocol (비디오 스트리밍 서비스를 위한 MMT 기반 적응적 QoS 연구)

  • Jo, Bokyun;Lee, Doohyun;Suh, Doug Young
    • Journal of Broadcast Engineering / v.20 no.1 / pp.40-47 / 2015
  • This paper discusses QoS enhancement for the best-effort service in the service plan provided by MPEG Media Transport (MMT) systems for video streaming applications. Among the MMT services, i.e., per-flow, per-class, and best-effort, only the best-effort service has no bandwidth guaranteed by the server. Therefore, in the best-effort service, a bandwidth access priority is defined for the various services, with the lowest priority assigned to low-level video services. To alleviate the bandwidth limitation of the best-effort service, this paper investigates the transmission of low-resolution, low-bitrate video combined with up-sampling. Our experimental results demonstrate the superiority of the proposed method in terms of delivered video quality.

A Study on an Improvement of the Performance by Spectrum Analysis with Variable Window in CELP Vocoder (CELP 부호화기에서 가변 윈도우 스펙트럼 분석에 의한 성능 향상에 관한 연구)

  • Min So-Yeon;Kim Eun-Hwan;Bae Myung-Jin
    • Journal of the Korea Society of Computer and Information / v.10 no.6 s.38 / pp.233-238 / 2005
  • In general, CELP (Code Excited Linear Prediction) type vocoders provide good speech quality at around 4.8 kbps. Among them, G.723.1, developed for Internet phone and video conferencing, includes two vocoders: 5.3 kbps ACELP (Algebraic CELP) and 6.3 kbps MP-MLQ (Multi-Pulse Maximum Likelihood Quantization). To improve the speech quality of CELP vocoders, this paper proposes a new spectrum analysis algorithm with a variable window. In a CELP vocoder, the spectrum of the synthesized speech signal is distorted because a fixed-size window is used for spectrum analysis. We therefore measured the spectral leakage and adjusted the window size to minimize it. Applying this method to G.723.1 ACELP, we obtained an SD (Spectral Distortion) reduction of 0.084 dB, a residual energy reduction of 6.3%, and an MOS (Mean Opinion Score) improvement of 0.1.
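
The abstract reports an SD (Spectral Distortion) figure in dB but does not define the measure. One common definition, assumed here for illustration, is the root-mean-square difference between two power spectra expressed in dB. The sketch below computes that quantity for a single frame; the function name and the small `eps` guard are hypothetical, and the paper may have used a different variant (for example, one evaluated on LPC envelopes or averaged over frames).

```python
import math


def spectral_distortion_db(power_ref, power_test, eps=1e-12):
    """RMS log-spectral distortion in dB between two power spectra.

    power_ref and power_test are parallel sequences of non-negative
    power values on the same frequency grid. eps avoids log(0) and is
    an assumption of this sketch.
    """
    if len(power_ref) != len(power_test) or len(power_ref) == 0:
        raise ValueError("spectra must be non-empty and of equal length")
    squared_diffs = [
        (10.0 * math.log10((ref + eps) / (test + eps))) ** 2
        for ref, test in zip(power_ref, power_test)
    ]
    return math.sqrt(sum(squared_diffs) / len(squared_diffs))


# Illustrative values only: a spectrum compared with a copy scaled by 10
# differs by 10 dB in every bin.
sd = spectral_distortion_db([1.0, 2.0, 4.0], [10.0, 20.0, 40.0])
```

Under this definition, identical spectra give 0 dB, and the value grows with the average per-bin mismatch, so a reduction such as the one reported indicates a closer match between the analyzed and reference spectra.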

Service Quality Management Based on Quality of Experience (체감품질을 고려한 서비스 품질의 관리)

  • Shin, Minsoo;Kim, Dohoon
    • Korean Management Science Review / v.33 no.3 / pp.19-30 / 2016
  • This study provides a framework for assessing network design under the regime of QoE (Quality of Experience). Our approach is expected to reveal the necessity of developing QoE measures and applying this notion to network design, particularly in the mobile environment. Furthermore, our model shows ample potential for both users and network providers to attain a win-win outcome by shifting the focus of network design and service operations from QoS (Quality of Service) to QoE. Since QoS considers only the relevant technological specifications, it may fail to capture critical factors surrounding users, such as the context in which the user is working. For example, according to one study [13], the bit rate, a widely employed QoS measure, performs worse in provisioning network resources than the MOS (Mean Opinion Score), a representative QoE measure. Our framework develops this idea and constructs a prototype to systematically assess network design and operations in terms of QoE. The proposed prototype aims to achieve a higher level of efficiency without severely degrading users' satisfaction. We also provide simulation results that support our idea: reducing the chance of over-provisioning on the basis of the QoE paradigm yields great flexibility, which may allow price cuts for users, postponement of network investment for providers, or both. Our simulation results also appear robust irrespective of the form of the QoS-QoE relationship.

Service Quality Criteria for Voice Services over a WiBro Network (와이브로 네트워크를 통한 음성 서비스의 측정 기반 품질 기준 수립)

  • Kim, Beom-Joon
    • The Journal of the Korea institute of electronic communication sciences / v.6 no.6 / pp.823-829 / 2011
  • This paper covers the service quality of packet-based voice service provided over a wireless broadband (WiBro) network. Using measurement software developed in the course of preparing an advanced service quality management scheme for packet-based voice service over a wireless network [2][3], a large-scale experiment is conducted to measure the real quality of the voice service. Based on our analysis of the measurement results, the service quality of the voice service over WiBro networks is judged to be quite good. In addition, another experiment investigating the effect of degraded wireless transmission conditions on the service quality of the voice service shows the values of the wireless service metrics at which the mean opinion score (MOS) starts to decrease.

Matching Pursuit Sinusoidal Modeling with Damping Factor (Damping 요소를 첨가한 매칭 퍼슈잇 정현파 모델링)

  • Jeong, Gyu-Hyeok;Kim, Jong-Hark;Lim, Joung-Woo;Joo, Gi-Ho;Lee, In-Sung
    • Journal of the Institute of Electronics Engineers of Korea SP / v.44 no.1 / pp.105-113 / 2007
  • In this paper, we propose matching pursuit with damping factors, a new sinusoidal model that improves on matching pursuit, for codecs based on the sinusoidal model. The proposed model defines damping factors using the correlation of parameters between the current and adjacent frames, estimates the sinusoidal parameters in the analysis frame more accurately by applying matching pursuit according to the damping factor, and synthesizes the final signal. This makes efficient modeling possible without interpolation schemes. The proposed sinusoidal model achieves better speech quality, without additional delay, than the conventional sinusoidal model with interpolation methods. We compare the performance of our model with that of matching pursuit using interpolation methods in terms of SNR (signal-to-noise ratio), MOS (Mean Opinion Score), LR (Itakura-Saito likelihood ratio), and CD (cepstral distance).

Service Quality Criteria for Voice Services over a HSDPA System (HSDPA 시스템을 통한 음성 서비스의 측정 기반 품질 기준 수립)

  • Kim, Beom-Joon
    • The Journal of the Korea institute of electronic communication sciences / v.7 no.2 / pp.249-255 / 2012
  • This paper covers the service quality of packet-based voice service provided over a high speed downlink packet access (HSDPA) system. Using measurement software developed in the course of preparing an advanced service quality management scheme for packet-based voice service over a wireless network [2][3], a large-scale experiment is conducted to measure the real quality of the voice service. Based on our analysis of the measurement results, the service quality of the voice service over the HSDPA system is judged to be quite good. In addition, another experiment investigating the effect of degraded wireless transmission conditions on the service quality of the voice service shows the values of the wireless service metrics at which the mean opinion score (MOS) starts to decrease.

Speech Synthesis for the Korean large Vocabulary Through the Waveform Analysis in Time Domains and Evaluation of Synthesized Speech Quality (시간영역에서의 파형분석에 의한 무제한 어휘 합성 및 음절 유형별 규칙합성음 음질평가)

  • Kang, Chan-Hee;Chin, Yong-Ohk
    • The Journal of the Acoustical Society of Korea / v.13 no.1 / pp.71-83 / 1994
  • This paper deals with improving the quality and naturalness of synthesized speech in a Korean TTS (Text-to-Speech) system. We extracted parameters (Table 2) such as the amplitude, duration, and pitch period within a syllable by analyzing speech waveforms (Table 1) in the time domain, and synthesized syllables from them. Based on frequencies in a large-vocabulary Korean pronunciation dictionary, we selected and synthesized 229 syllables: 19 V types, 80 CV types, 30 VC types, and 100 CVC types. For each of the four Korean syllable types in the data format dictionary (Table 3), we tested 15 syllables with the objective MOS (Mean Opinion Score) evaluation method on four items, namely intelligibility, clearness, loudness, and naturalness, using a randomly selected group with no prior knowledge of them. The experimental results show that the synthesized syllables are very clear and that prosodic elements such as duration, accent, and pitch period can be controlled (Figs. 9, 10, 11, 12).

Voice-to-voice conversion using transformer network (Transformer 네트워크를 이용한 음성신호 변환)

  • Kim, June-Woo;Jung, Ho-Young
    • Phonetics and Speech Sciences / v.12 no.3 / pp.55-63 / 2020
  • Voice conversion can be applied to various voice processing applications. It can also play an important role in data augmentation for speech recognition. The conventional method uses a voice conversion architecture combined with speech synthesis, with the Mel filter bank as the main parameter. The Mel filter bank is well suited to fast neural network computation but cannot be converted into a high-quality waveform without the aid of a vocoder. Further, it is not effective for obtaining data for speech recognition. In this paper, we focus on performing voice-to-voice conversion using only the raw spectrum. We propose a deep learning model based on the transformer network, which quickly learns the voice conversion properties using an attention mechanism between source and target spectral components. The experiments were performed on TIDIGITS data, a series of numbers spoken by an English speaker. The converted voices were evaluated for naturalness and similarity using mean opinion scores (MOS) obtained from 30 participants. Our final results yielded 3.52±0.22 for naturalness and 3.89±0.19 for similarity.
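
Scores such as 3.52±0.22 are typically a mean rating with an uncertainty margin, although the abstract does not say how its margin was computed. As a minimal sketch, assuming the margin is a 95% confidence interval under a normal approximation, the code below computes an MOS and its half-width from a list of 1-to-5 ratings. The function name, the `z=1.96` default, and the sample ratings are hypothetical and are not taken from the paper.

```python
import math
import statistics


def mos_with_ci(ratings, z=1.96):
    """Return (mean, half_width) for a list of opinion ratings.

    half_width is z * s / sqrt(n), where s is the sample standard
    deviation. z=1.96 assumes a 95% interval under a normal
    approximation; with few raters a t-distribution quantile would
    give a slightly wider interval.
    """
    if len(ratings) < 2:
        raise ValueError("need at least two ratings")
    mean = statistics.fmean(ratings)
    sd = statistics.stdev(ratings)  # n - 1 in the denominator
    half_width = z * sd / math.sqrt(len(ratings))
    return mean, half_width


# Hypothetical ratings from 30 listeners, not data from the paper.
ratings = [4, 3, 4, 5, 3, 4, 4, 3, 2, 4] * 3
mean, half_width = mos_with_ci(ratings)
print(f"MOS = {mean:.2f} ± {half_width:.2f}")
```

Reporting the half-width alongside the mean makes it easier to judge whether two systems' MOS values differ by more than the rating noise.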