• Title/Summary/Keyword: speech quality

A Novel Approach to a Robust A Priori SNR Estimator in Speech Enhancement (음성 향상에서 강인한 새로운 선행 SNR 추정 기법에 관한 연구)

  • Park, Yun-Sik;Chang, Joon-Hyuk
    • The Journal of the Acoustical Society of Korea / v.25 no.8 / pp.383-388 / 2006
  • This paper presents a novel approach to single-channel microphone speech enhancement in noisy environments. Widely used noise reduction techniques based on spectral subtraction are generally expressed as a spectral gain that depends on the signal-to-noise ratio (SNR). The well-known decision-directed (DD) estimator of Ephraim and Malah efficiently reduces musical noise under background noise conditions, but it delays the a priori SNR estimate because the DD weights the speech spectrum component of the previous frame. As a result, the noise suppression gain computed from the DD-estimated a priori SNR matches the previous frame rather than the current one, which degrades noise reduction performance during speech transient periods. We propose a computationally simple but effective speech enhancement technique that uses a sigmoid-type function for the weight parameter of the DD. The proposed approach solves the delay problem in the main parameter, the a priori SNR of the DD, while maintaining the benefits of the DD. The performance of the proposed enhancement algorithm is evaluated using ITU-T P.862 Perceptual Evaluation of Speech Quality (PESQ), the Mean Opinion Score (MOS), and speech spectrograms under various noise environments, and it yields better results than the DD with a fixed weight parameter.
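
The delay described in this abstract can be illustrated with a minimal sketch of the decision-directed recursion for a single frequency bin. The abstract does not give the form of the sigmoid weight, so `sigmoid_alpha` and its parameters (`alpha_min`, `slope`, `threshold_db`) are hypothetical stand-ins, not the authors' rule; the Wiener gain is also an assumption made only to close the recursion.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def fixed_alpha(gamma_prev, gamma_cur, alpha=0.98):
    # Conventional DD: constant weight on the previous frame.
    return alpha

def sigmoid_alpha(gamma_prev, gamma_cur, alpha_max=0.98, alpha_min=0.1,
                  slope=1.0, threshold_db=6.0):
    # Hypothetical rule (not from the paper): trust the previous frame less
    # when the a posteriori SNR changes sharply between frames.
    ratio = max(gamma_cur, 1e-12) / max(gamma_prev, 1e-12)
    change_db = abs(10.0 * math.log10(ratio))
    return alpha_min + (alpha_max - alpha_min) * sigmoid(-slope * (change_db - threshold_db))

def dd_a_priori_snr(gamma, alpha_fn, xi_floor=1e-3):
    """A priori SNR track for one bin.

    gamma: a posteriori SNR per frame, |Y|^2 / noise variance (linear scale).
    alpha_fn: maps (gamma_prev, gamma_cur) to a weight in [0, 1].
    """
    xi_track = []
    prev_speech_snr = 0.0          # |S_hat(l-1)|^2 / noise variance
    gamma_prev = gamma[0]
    for g in gamma:
        alpha = alpha_fn(gamma_prev, g)
        xi = alpha * prev_speech_snr + (1.0 - alpha) * max(g - 1.0, 0.0)
        xi = max(xi, xi_floor)
        gain = xi / (1.0 + xi)     # Wiener gain (assumed suppression rule)
        prev_speech_snr = gain * gain * g
        xi_track.append(xi)
        gamma_prev = g
    return xi_track

# Five noise-only frames followed by a speech onset.
gamma = [1.0] * 5 + [100.0] * 5
fixed = dd_a_priori_snr(gamma, fixed_alpha)
adaptive = dd_a_priori_snr(gamma, sigmoid_alpha)
print(round(fixed[5], 2), round(adaptive[5], 2))
```

With the fixed weight, the estimate at the onset frame stays far below the instantaneous SNR, while the adaptive weight follows the onset in the same frame and reverts to heavy smoothing in stationary noise.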

Effects of oral Health on Health-related Quality of life in Cancer Patients and Cancer Survivors: The 7th Korea National Health and Nutrition Examination Survey (암환자와 암생존자의 구강건강이 건강 관련 삶의 질에 미치는 영향: 제7기 국민건강영양조사)

  • Kyung-Yi Chung
    • Journal of the Health Care and Life Science / v.11 no.2 / pp.431-439 / 2023
  • This study investigated the effect of oral health on health-related quality of life in cancer patients and cancer survivors using data from the 7th Korea National Health and Nutrition Examination Survey. The data were analyzed with complex sample analysis using SPSS/WIN 26.0. Complex sample logistic regression analysis was conducted to investigate factors affecting the quality of life of cancer patients and cancer survivors; the average quality of life was 1.18 points for cancer patients and 1.16 points for cancer survivors. The quality of life of cancer patients was significantly lower when they had chewing discomfort or speech difficulties, or needed dentures. The quality of life of cancer survivors was significantly lower when they had speech difficulties. Therefore, it is necessary to recognize the importance of oral care for cancer patients and cancer survivors, and to develop and use oral health care programs that give them continuous, professional access to oral health care.

Speech Animation Synthesis based on a Korean Co-articulation Model (한국어 동시조음 모델에 기반한 스피치 애니메이션 생성)

  • Jang, Minjung;Jung, Sunjin;Noh, Junyong
    • Journal of the Korea Computer Graphics Society / v.26 no.3 / pp.49-59 / 2020
  • In this paper, we propose a speech animation synthesis method specialized for Korean, based on a rule-based co-articulation model. Speech animation is widely used in the cultural industry, in movies, animations, and games that require natural and realistic motion. However, because techniques for audio-driven speech animation have been developed mainly for English, the animation results for domestic content are often visually very unnatural. For example, a voice actor's dubbing is played with no mouth motion at all or, at best, with an unsynchronized loop of simple mouth shapes. Although there are language-independent speech animation models, they are not specialized for Korean and do not yet ensure the quality needed for domestic content production. Therefore, we propose a natural speech animation synthesis method that reflects the linguistic characteristics of Korean and is driven by input audio and text. Reflecting the fact that vowels mostly determine the mouth shape in Korean, we define a co-articulation model that separates the lips and the tongue to solve the earlier problems of lip distortion and occasionally missing phoneme characteristics. Our model also reflects differences in prosodic features to improve the dynamics of the speech animation. Through user studies, we verify that the proposed model can synthesize natural speech animation.

Personalized Speech Classification Scheme for the Smart Speaker Accessibility Improvement of the Speech-Impaired people (언어장애인의 스마트스피커 접근성 향상을 위한 개인화된 음성 분류 기법)

  • SeungKwon Lee;U-Jin Choe;Gwangil Jeon
    • Smart Media Journal / v.11 no.11 / pp.17-24 / 2022
  • With the spread of smart speakers based on voice recognition and deep learning technology, not only non-disabled people but also the blind and the physically handicapped can easily control home appliances such as lights and TVs by voice through linked home network services. This has greatly improved their quality of life. However, speech-impaired people cannot use the smart speaker's services because their pronunciation is inaccurate due to articulation or speech disorders. In this paper, we propose a personalized voice classification technique that lets the speech-impaired use some of the functions provided by the smart speaker. The goal is to increase the recognition rate and accuracy for sentences spoken by speech-impaired people, even with a small amount of data and a short learning time, so that the smart speaker's services can actually be used. Data augmentation and the one-cycle learning rate optimization technique were applied while fine-tuning a ResNet18 model. In an experiment in which each of 30 smart speaker commands was recorded 10 times and the model was trained within 3 minutes, the speech classification recognition rate was about 95.2%.
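
The one-cycle learning rate technique mentioned in this abstract can be sketched as a schedule that warms up to a peak rate and then anneals well below the starting rate. The abstract gives no hyperparameters, so `max_lr`, `pct_start`, `div_factor`, and `final_div_factor` below are illustrative assumptions, and the cosine shape is one common choice rather than the authors' confirmed setting.

```python
import math

def one_cycle_lr(step, total_steps, max_lr=1e-2, pct_start=0.3,
                 div_factor=25.0, final_div_factor=1e4):
    """Learning rate at `step` (0-based) for a cosine one-cycle schedule."""
    initial_lr = max_lr / div_factor          # rate at step 0
    min_lr = initial_lr / final_div_factor    # rate at the last step
    warmup_steps = max(1, int(pct_start * total_steps))
    if step < warmup_steps:
        # Warm-up phase: rise from initial_lr to max_lr.
        t = step / warmup_steps
        start, end = initial_lr, max_lr
    else:
        # Annealing phase: fall from max_lr to min_lr.
        t = (step - warmup_steps) / max(1, total_steps - warmup_steps - 1)
        start, end = max_lr, min_lr
    # Cosine interpolation: t = 0 gives `start`, t = 1 gives `end`.
    return end + (start - end) * 0.5 * (1.0 + math.cos(math.pi * t))

schedule = [one_cycle_lr(s, 100) for s in range(100)]
print(schedule[0], max(schedule), schedule[-1])
```

A short schedule like this suits the brief fine-tuning budget the abstract describes, since the high-rate phase covers most of the training progress and the final annealing stabilizes the weights.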

Modification of pitch Algorithm and Its Application to Noise (피치 알고리즘의 수정 및 소음에의 적용)

  • Shin, Sung-Hwan;Ih, Jeong-Guon
    • Proceedings of the Korean Society for Noise and Vibration Engineering Conference / 2002.11b / pp.511-516 / 2002
  • Pitch is a percept related to subjective frequency and is one of the psychological attributes of tones. Together with loudness and timbre, it is also an important factor in determining sound quality. Although pitch has been studied actively in the field of speech communication, it has not yet been applied sufficiently to product sound quality. In this study, the empirical data of Zwicker are used to modify a currently available pitch extraction model based on the place theory. The applicability of the modified model is shown by applying it to various sound samples composed of tonal or banded components. As a demonstration, the algorithm is used for the sound quality analysis of a product noise that has a fundamental frequency and harmonics. The result shows that pitch should be regarded as an important subjective cue in sound quality analysis.

Problems Judicial Liability of On-Line Service Providers under the Infringement of Copyright in Internet (인터넷 상에서 저작권침해에 따른 온라인서비스 제공자의 책임문제)

  • 박종삼
    • Journal of Arbitration Studies / v.12 no.1 / pp.123-169 / 2002
  • The advent of the global information structure and the so-called digital revolution raises countless new issues and questions. Expression in cyberspace is not subject to limitations because of the internet's qualities of anonymity, diversity, and spontaneity. Therefore, freedom of speech has expanded in both time and space, which was impossible with the old communication systems. The rapid development of the internet might not have occurred without the techniques of linking and framing, which give users flexible and easy access to other websites. These techniques have enabled internet users to navigate the internet efficiently and sort through the products, services, and information available on it. Although online technology raises many new legal issues, the law available to help us resolve them, at least today, is largely based on the world as it existed before online commerce became a reality. Thus the challenge is to predict how these new legal issues may be resolved using current law. In particular, the damage from these side effects in cyberspace can be much more serious than in the real world because of its promptness, reach, and anonymity. Therefore, regulating and controlling freedom of speech in cyberspace has become necessary, and there are two opinions: one is that the laws of the real world should be applied to cyberspace, and the other is that freedom of speech in cyberspace should be regulated and controlled by its users rather than by laws, because cyberspace is a free space that must not be interfered with. In this study, the current judicial regulation of cyberspace, the side effects of cyberspace, and the limitations of freedom of speech are examined to solve the above problems, and the liabilities of on-line service providers are discussed with respect to defamation, the distribution of obscene pictures and information, and infringement of copyright.

On a Pitch Alteration Method using Scaling the Harmonics Compensated with the Phase for Speech Synthesis (위상 보상된 고조파 스케일링에 의한 음성합성용 피치변경법)

  • Bae, Myung-Jin
    • The Journal of the Acoustical Society of Korea / v.13 no.6 / pp.91-97 / 1994
  • In speech processing, waveform coding is concerned simply with preserving the waveform of the signal through a redundancy reduction process. In speech synthesis, high-quality waveform coding is used mainly for synthesis by analysis. Because the parameters of this coding are not separated into excitation and vocal tract components, it is difficult to apply waveform coding to synthesis by rule. Thus, in order to apply waveform coding to synthesis by rule, it is necessary to alter the pitch. In this paper, we propose a new pitch alteration method that can change the pitch period in waveform coding by dividing the speech signal into vocal tract and excitation parameters. This is a time-frequency domain method that preserves the phase component of the waveform in the time domain and the magnitude component in the frequency domain. Thus, waveform coding can be used for synthesis by rule in speech processing. With the proposed algorithm, we obtain a spectrum distortion of 2.94%. That is, the spectrum distortion is 5.06% lower than that of the time-domain pitch alteration method.

Real-time Implementation of the G.729 Annex A Using ARM9 Thumb® Processor Core (ARM9 Thumb® 프로세서 코어를 이용한 G.729A의 실시간 구현)

  • 성호상;이동원
    • The Journal of the Acoustical Society of Korea / v.20 no.7 / pp.63-68 / 2001
  • This paper describes the implementation of the ITU-T SG15 G.729A speech coder on the ARM9 Thumb® processor core and the techniques used in the optimization process. The ITU-T G.729 speech coder is the standard for toll-quality 8 kbit/s speech coding. The input to the speech encoder is assumed to be a 16-bit PCM signal at a sampling rate of 8000 samples per second. G.729A is a reduced-complexity version of the G.729 coder and is bit-stream interoperable with the full version. The implemented coder requires 34.8 MIPS for the encoder and 8.1 MIPS for the decoder, and uses 36.5 kBytes of program ROM and 6.3 kBytes of data RAM. The implemented coder was tested against the set of 9 test vectors provided by ITU-T for bit-exact implementation.
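
The figures in this abstract imply a few derived quantities, worked through below as simple arithmetic. The 10 ms frame length comes from the G.729 Recommendation rather than from the abstract, and summing the encoder and decoder loads assumes both run on the same core for a full-duplex call; both are labeled assumptions.

```python
SAMPLE_RATE_HZ = 8000      # from the abstract
BITS_PER_SAMPLE = 16       # from the abstract
CODED_BIT_RATE = 8000      # 8 kbit/s, from the abstract
FRAME_MS = 10              # assumption: G.729 frame length per the Recommendation

ENCODER_MIPS = 34.8        # from the abstract
DECODER_MIPS = 8.1         # from the abstract

pcm_bit_rate = SAMPLE_RATE_HZ * BITS_PER_SAMPLE          # uncompressed input rate
compression_ratio = pcm_bit_rate / CODED_BIT_RATE
samples_per_frame = SAMPLE_RATE_HZ * FRAME_MS // 1000
bits_per_frame = CODED_BIT_RATE * FRAME_MS // 1000
# Assumption: encoder and decoder share one core (full duplex).
full_duplex_mips = round(ENCODER_MIPS + DECODER_MIPS, 1)

print(f"PCM input rate:     {pcm_bit_rate} bit/s")
print(f"Compression ratio:  {compression_ratio:.0f}:1")
print(f"Samples per frame:  {samples_per_frame}")
print(f"Coded bits/frame:   {bits_per_frame}")
print(f"Full-duplex load:   {full_duplex_mips} MIPS")
```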

A Study on 8kbps FBD-MPC Method Considering Low Bit Rate (Low Bit Rate을 고려한 8kbps FBD-MPC 방식에 관한 연구)

  • Lee, See-Woo
    • Journal of Digital Convergence / v.12 no.6 / pp.271-276 / 2014
  • In a speech coding system that uses voiced and unvoiced excitation sources, speech quality is distorted when voiced and unvoiced consonants coexist in a frame. In this paper, I propose an 8 kbps multi-pulse speech coding method (FBD-MPC: Frequency Band Division MPC) that uses TSIUVC (Transition Segment Including Unvoiced Consonant) searching, extraction, and approximation-synthesis in the frequency domain. I evaluate the 8 kbps MPC and FBD-MPC. As a result, the SNRseg of FBD-MPC improved by 0.5 dB for a female voice and by 0.2 dB for a male voice. Because the SNRseg of FBD-MPC improved relative to MPC, the distortion of the speech waveform could be controlled. I therefore expect this method to be applicable to cellular phones and smartphones that use a low-bit-rate excitation source.
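
The SNRseg figure of merit reported in this abstract is commonly computed as the average of per-frame SNRs in dB. The sketch below follows that common definition; the frame length, the clamping limits, and the skipping of silent frames are assumptions, since the abstract does not state how SNRseg was measured.

```python
import math

def segmental_snr(reference, decoded, frame_len=160, min_db=-10.0, max_db=35.0):
    """Mean per-frame SNR in dB between a reference and a decoded waveform."""
    n = min(len(reference), len(decoded))
    frame_snrs = []
    for start in range(0, n - frame_len + 1, frame_len):
        ref = reference[start:start + frame_len]
        dec = decoded[start:start + frame_len]
        signal_energy = sum(x * x for x in ref)
        error_energy = sum((x - y) ** 2 for x, y in zip(ref, dec))
        if signal_energy == 0.0:
            continue  # assumption: silent frames are excluded from the average
        if error_energy == 0.0:
            snr_db = max_db
        else:
            snr_db = 10.0 * math.log10(signal_energy / error_energy)
        # Clamp so that a few extreme frames do not dominate the mean.
        frame_snrs.append(min(max(snr_db, min_db), max_db))
    if not frame_snrs:
        raise ValueError("no non-silent full frames to evaluate")
    return sum(frame_snrs) / len(frame_snrs)

# Example: a 200 Hz tone sampled at 8 kHz, "decoded" with a 10% amplitude error.
tone = [math.sin(2.0 * math.pi * 200.0 * i / 8000.0) for i in range(320)]
scaled = [0.9 * x for x in tone]
print(round(segmental_snr(tone, scaled), 2))
```

Averaging in the dB domain weights quiet frames as heavily as loud ones, which is why transition segments such as those targeted by the TSIUVC step can move SNRseg noticeably.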

Research on Open Source Encoding Technology for MPEG Unified Speech and Audio Coding (MPEG 통합 음성/오디오 코덱을 위한 오픈 소스 부호화 기술에 관한 연구)

  • Song, Jeongook;Lee, Joonil;Kang, Hong-Goo
    • Journal of the Institute of Electronics and Information Engineers / v.50 no.1 / pp.86-96 / 2013
  • Unified Speech and Audio Coding (USAC) is the speech/audio codec with the best quality, approved as a Final Draft International Standard (FDIS) at the MPEG meeting in 2011. Since MPEG conventionally standardizes only the decoder, it is not easy to study the encoder technologies. Furthermore, the Reference Model (RM) shows extremely poor performance. To solve these problems, the open source project (JAME) proposes methods that improve the performance of the main encoder technologies in USAC. In particular, this paper introduces the following encoder modules: the signal classifier for selective operation between two coders, the psychoacoustic model in the frequency domain, and the window transition technology. Finally, the results of the verification test for FDIS and the performance of the Common Encoder are appended.