• Title/Summary/Keyword: Mean Opinion Score (MOS)

The analysis of the impact of the wireless channel quality on the quality of experience (QoE) through statistical analysis (통계적 분석을 통한 무선 채널 품질이 사용자 체감 품질에 미치는 영향 분석)

  • Kim, Beom-Joon
    • The Journal of the Korea institute of electronic communication sciences / v.9 no.4 / pp.491-498 / 2014
  • As internet services are increasingly provided over wireless access, quality of experience (QoE), defined as the quality a user actually perceives when a service is provided, is gaining importance. Unlike quality of service (QoS), which can be expressed as a numerical value, QoE is difficult to represent objectively. When an internet service is delivered over a wireless channel, its QoE can be affected by a number of factors such as fading and mobility. This paper therefore attempts to specify the relationship between QoE and QoS through practical measurements of voice service over a 3G high speed packet access (HSPA) network. Analysis of the measured results shows that the received signal strength indicator (RSSI) has a great influence on the mean opinion score (MOS) through transmission delay.

Transmission Performance of Lattice Structure Ad-Hoc Network under Intrusions (침해가 있는 격자구조 애드-혹 네트워크의 전송성능)

  • Kim, Young-Dong
    • The Journal of the Korea institute of electronic communication sciences / v.9 no.7 / pp.767-772 / 2014
  • As a temporary network, an ad-hoc network is affected by the structure and deployment environment of the network. In this paper, the transmission performance of a lattice-structure ad-hoc network, which is expected to be used in sensor networks and the IoT (Internet of Things), is analyzed with respect to intrusions, and a countermeasure against intrusion is suggested. Computer simulation based on NS-2 is used for the performance analysis, and VoIP (Voice over Internet Protocol), a widely used service, is chosen as the service to be measured. MOS (Mean Opinion Score) and call connection rate are used as performance parameters. The results show that, under intrusion, the random network outperforms the lattice network in MOS, whereas the lattice network outperforms the random network in call connection rate.

Resource Allocation for Guaranteeing QoE in Mobile Communication Networks

  • Lee, Moon-Ho;Lee, Jong-Chan
    • Journal of the Korea Society of Computer and Information / v.22 no.2 / pp.45-50 / 2017
  • This paper proposes a novel resource allocation scheme that guarantees a certain level of user-perceived service quality in mobile networks for various high-quality mobile multimedia services, such as interactive games, tactile internet services, remote emergency medical services, and remote control of disaster-handling robots. In the proposed scheme, the Mean Opinion Score (MOS), which represents the degree of user satisfaction with perceived quality, is determined based on the delay limit allowable for each service. Resources are then allocated in consideration of this MOS. Simulation results show that the proposed scheme decreases the outage probability in comparison with existing schemes and increases the total throughput as well.

The implementation of database for high quality Embedded Text-to-speech system (고품질 내장형 음성합성 시스템을 위한 음성합성 DB구현)

  • Kwon, Oh-Il
    • Journal of the Institute of Electronics Engineers of Korea SP / v.42 no.4 s.304 / pp.103-110 / 2005
  • The speech database is one of the most important parts of a text-to-speech (TTS) system. In particular, an embedded TTS system needs a smaller database than a server TTS system, so compression and statistical reduction of the database is a very important factor in an embedded TTS system. However, this compression and statistical reduction always causes a loss of quality in the synthesised speech. In this paper, we propose a method of constructing a database for a high-quality embedded TTS system and verify the quality of the synthesised speech with a MOS (Mean Opinion Score) test.

BS-PLC(Both Side-Packet Loss Concealment) for CELP Coder (CELP 부호화기를 위한 양방향 패킷 손실 은닉 알고리즘)

  • Lee In-Sung;Hwang Jeong-Joon;Jeong Gyu-Hyeok
    • Journal of the Institute of Electronics Engineers of Korea TC / v.42 no.12 / pp.127-134 / 2005
  • Robustness to lost packets is one of the most important quality measures for voice over IP (VoIP) networks, and recovering a lost packet from the received information is crucial to achieving it. This paper therefore proposes a method for recovering lost packets from received information in real-time communication with a CELP coder. The proposed BS-PLC (Both Side Packet Loss Concealment), based on WSOLA (Waveform Similarity Overlap-Add), allows a lost packet to be recovered from both the 'previous' and 'next' good packets, with the LP parameters and the excitation signal recovered separately. Bursts of packet loss are modeled by the Gilbert model. The proposed scheme is applied to G.729, which is widely used in VoIP, and is evaluated through SNR (signal-to-noise ratio) and MOS (Mean Opinion Score) tests. In simulation, the proposed scheme achieves a MOS 0.3 higher and an SNR 2 dB higher than the error concealment procedure in the G.729 decoder at a 20% average packet loss rate.
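
    The Gilbert burst-loss model named in the abstract above can be illustrated with a minimal sketch. This is not the authors' simulation code: it assumes the simplest two-state form (no loss in the good state, every packet lost in the bad state), and the function names and transition probabilities are hypothetical choices for illustration.

```python
import random


def gilbert_loss_pattern(n_packets, p_good_to_bad, p_bad_to_good, seed=0):
    """Simulate a simple two-state Gilbert channel.

    Assumption (not from the paper): packets are never lost in the
    good state and always lost in the bad state.
    Returns a list of booleans, True meaning the packet was lost.
    """
    rng = random.Random(seed)
    in_bad_state = False  # start in the good state
    lost = []
    for _ in range(n_packets):
        # Update the channel state, then record whether this packet is lost.
        if in_bad_state:
            if rng.random() < p_bad_to_good:
                in_bad_state = False
        else:
            if rng.random() < p_good_to_bad:
                in_bad_state = True
        lost.append(in_bad_state)
    return lost


def stationary_loss_rate(p_good_to_bad, p_bad_to_good):
    """Long-run fraction of lost packets: p / (p + r)."""
    return p_good_to_bad / (p_good_to_bad + p_bad_to_good)


if __name__ == "__main__":
    # Hypothetical parameters: 0.1 / (0.1 + 0.4) = 0.2, i.e. 20% average loss.
    pattern = gilbert_loss_pattern(20, 0.1, 0.4, seed=1)
    print("".join("x" if is_lost else "." for is_lost in pattern))
    print(stationary_loss_rate(0.1, 0.4))
```

    With these illustrative parameters the mean burst length is 1 / 0.4 = 2.5 packets, so losses arrive in clusters rather than independently, which is the situation a concealment scheme that uses both neighbouring good packets has to handle.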

2.4kbps Speech Coding Algorithm Using the Sinusoidal Model (정현파 모델을 이용한 2.4kbps 음성부호화 알고리즘)

  • 백성기;배건성
    • Proceedings of the IEEK Conference / 2000.09a / pp.123-126 / 2000
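  • Sinusoidal Transform Coding (STC) is a method that models the spectral peaks of a speech signal in the frequency domain as sinusoids and synthesizes speech from them. In low-bit-rate STC, to reduce the amount of transmitted information, speech is synthesized by modeling as sinusoids the harmonic components obtained from the spectral envelope and pitch information of the speech signal, instead of the spectral peaks. In this paper, we propose a speech coding algorithm with a bit rate of 2.4 kbps based on the sinusoidal model of the speech signal. As experimental results, we compare and analyze the waveform, spectral characteristics, and phase characteristics of the synthesized speech, as well as its quality as measured by a MOS (Mean Opinion Score) test.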

One-shot multi-speaker text-to-speech using RawNet3 speaker representation (RawNet3를 통해 추출한 화자 특성 기반 원샷 다화자 음성합성 시스템)

  • Sohee Han;Jisub Um;Hoirin Kim
    • Phonetics and Speech Sciences / v.16 no.1 / pp.67-76 / 2024
  • Recent advances in text-to-speech (TTS) technology have significantly improved the quality of synthesized speech, reaching a level where it can closely imitate natural human speech. In particular, TTS models offering various voice characteristics and personalized speech are widely used in fields such as artificial intelligence (AI) tutors, advertising, and video dubbing. Accordingly, in this paper, we propose a one-shot multi-speaker TTS system that can ensure acoustic diversity and synthesize personalized voices by generating speech from the utterances of unseen target speakers. The proposed model integrates a speaker encoder into a TTS model consisting of the FastSpeech2 acoustic model and the HiFi-GAN vocoder. The speaker encoder, based on the pre-trained RawNet3, extracts speaker-specific voice features. The proposed approach covers not only an English one-shot multi-speaker TTS but also a Korean one. We evaluate the naturalness and speaker similarity of the generated speech using objective and subjective metrics. In the subjective evaluation, the proposed Korean one-shot multi-speaker TTS obtained a naturalness mean opinion score (NMOS) of 3.36 and a similarity MOS (SMOS) of 3.16. In the objective evaluation, the proposed English and Korean one-shot multi-speaker TTS systems obtained a prediction MOS (P-MOS) of 2.54 and 3.74, respectively. These results indicate that the proposed model improves over the baseline models in terms of both naturalness and speaker similarity.
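
    Subjective scores such as the NMOS and SMOS above are averages of listener ratings. The following is a minimal sketch of that calculation, not the evaluation code used in the paper: it assumes the common five-point rating scale (1 to 5) and a normal-approximation 95% confidence interval, and the function name and sample ratings are hypothetical.

```python
import math
import statistics


def mean_opinion_score(ratings):
    """Return (MOS, half-width of an approximate 95% confidence interval).

    Assumptions (not from the paper): ratings lie on a 1-5 scale and the
    interval uses the normal approximation 1.96 * s / sqrt(n).
    """
    if len(ratings) < 2:
        raise ValueError("need at least two ratings to estimate the spread")
    if any(not 1 <= r <= 5 for r in ratings):
        raise ValueError("ratings must lie between 1 and 5")
    mos = statistics.mean(ratings)
    sample_sd = statistics.stdev(ratings)  # sample standard deviation (n - 1)
    half_width = 1.96 * sample_sd / math.sqrt(len(ratings))
    return mos, half_width


if __name__ == "__main__":
    # Hypothetical listener ratings for one system.
    mos, half_width = mean_opinion_score([3, 4, 3, 4, 2, 5, 3, 4])
    print(f"MOS = {mos:.2f} +/- {half_width:.2f}")
```

    Reporting the interval alongside the mean helps when comparing systems whose scores are close, since a small listener panel can produce wide intervals.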

2.4kbps Speech Coding Algorithm Using the Sinusoidal Model (정현파 모델을 이용한 2.4kbps 음성부호화 알고리즘)

  • 백성기;배건성
    • The Journal of Korean Institute of Communications and Information Sciences / v.27 no.3A / pp.196-204 / 2002
  • Sinusoidal Transform Coding (STC) is a vocoding scheme based on a sinusoidal model of the speech signal. Low-bit-rate speech coding based on the sinusoidal model represents and synthesizes speech in the frequency domain using the fundamental frequency and its harmonics, the spectral envelope, and the phase. In this paper, we propose a 2.4 kbps low-rate speech coding algorithm using the sinusoidal model of the speech signal. In the proposed coder, the pitch frequency is estimated by choosing the frequency that minimizes the mean squared error between speech synthesized from all spectral peaks and speech synthesized from the chosen frequency and its harmonics. The spectral envelope is estimated using the SEEVOC (Spectral Envelope Estimation VOCoder) algorithm and the discrete all-pole model. The phase information is obtained using the time of pitch pulse occurrence, i.e., the onset time, as well as the phase of the vocal tract system. Experimental results show that the synthetic speech preserves both the formant and phase information of the original speech very well. The performance of the coder was evaluated with a MOS test based on informal listening tests, and it achieved a MOS above 3.1.

Proposed Assessment for Quality of Experience of Live IPTV in Home Environments

  • Jeong, Jongpil;Choi, Jae-Young
    • International journal of advanced smart convergence / v.4 no.1 / pp.18-30 / 2015
  • As the speed of networks available to subscribers has greatly increased, demand for high-quality broadcast content, such as Internet Protocol Television (IPTV) and Video on Demand (VoD), is likewise increasing. Therefore, while broadcasters are increasing content and channels, they are also striving to improve consumer quality of experience (QoE) to differentiate themselves from competitors, including by producing content of higher physical quality. Recently, subjective measurement methods have been internationally standardized as the most reliable approach for measuring and evaluating IPTV QoE. However, a majority of these methods are performed in experimental environments and are based on an extremely brief viewing period of approximately ten seconds using original reference videos. It is difficult to apply standard evaluation methods based on a ten-second viewing interval to real broadcast viewing of IPTV or other services that involve a longer time (i.e., more than thirty minutes). In this paper, we therefore propose a method that accommodates actual viewing environments. Using the mean opinion score, we experimentally analyze the effects of changes in the evaluation interval under actual conditions in which IPTV service is provided. In addition, we propose improvements by applying the results to an actual live broadcast IPTV service and by analyzing consumer service QoE.

An end-to-end synthesis method for Korean text-to-speech systems (한국어 text-to-speech(TTS) 시스템을 위한 엔드투엔드 합성 방식 연구)

  • Choi, Yeunju;Jung, Youngmoon;Kim, Younggwan;Suh, Youngjoo;Kim, Hoirin
    • Phonetics and Speech Sciences / v.10 no.1 / pp.39-48 / 2018
  • A typical statistical parametric speech synthesis (text-to-speech, TTS) system consists of separate modules, such as a text analysis module, an acoustic modeling module, and a speech synthesis module. This causes two problems: 1) expert knowledge of each module is required, and 2) errors generated in each module accumulate as they pass through subsequent modules. An end-to-end TTS system could avoid such problems by synthesizing voice signals directly from an input string. In this study, we implemented an end-to-end Korean TTS system using Google's Tacotron, which is an end-to-end TTS system based on a sequence-to-sequence model with an attention mechanism. We used 4392 utterances spoken by a Korean female speaker, an amount that corresponds to 37% of the dataset Google used for training Tacotron. Our system obtained a mean opinion score (MOS) of 2.98 and a degradation mean opinion score (DMOS) of 3.25. We discuss the factors that affected training of the system. Experiments demonstrate that the post-processing network needs to be designed with the output language and input characters in mind, and that, depending on the amount of training data, the maximum value of n for the n-grams modeled by the encoder should be kept sufficiently small.