• Title/Summary/Keyword: Perceptual evaluation


A Scalable Audio Coder for High-quality Speech and Audio Services

  • Lee, Gil-Ho;Lee, Young-Han;Kim, Hong-Kook;Kim, Do-Young;Lee, Mi-Suk
    • MALSORI / no.61 / pp.75-86 / 2007
  • In this paper, we propose a scalable audio coder whose bandwidth varies from the narrowband speech bandwidth to the audio bandwidth and whose bit rate ranges from 8 to 320 kbit/s, in order to adapt the quality of service (QoS) to the network load. The proposed coder first splits the input audio into a narrowband component, up to around 4 kHz, and a component above it. The narrowband signal is then compressed by a speech coding method compatible with an existing standard speech coder such as G.729, and the signal above the narrowband is compressed on the basis of a psychoacoustic model. Objective quality tests using the signal-to-noise ratio (SNR) and the perceptual evaluation of audio quality (PEAQ) show that the proposed scalable audio coder provides quality comparable to that of the MPEG-1 Layer III (MP3) audio coder.


Speech enhancement based on reinforcement learning (강화학습 기반의 음성향상기법)

  • Park, Tae-Jun;Chang, Joon-Hyuk
    • Proceedings of the Korea Information Processing Society Conference / 2018.05a / pp.335-337 / 2018
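  • Speech enhancement is a technique for removing noise and reverberation from speech. Because speech signals captured by a microphone are distorted by noise and reverberation, it is a core technology for speech signal processing applications such as speech recognition and speech communication. Previously, statistical model-based speech enhancement, which exploits statistical information about the speech and noise signals, was mainly used, but its performance degrades significantly in non-stationary noise environments, unlike in stationary noise environments. Recently, deep neural networks (DNNs), a machine learning technique, have been introduced and have shown excellent performance in speech enhancement. DNN-based speech enhancement uses multiple hidden layers and hidden nodes to model the nonlinear relationship between noisy speech and clean, noise-free speech. In this work, reinforcement learning, one way to improve DNN-based speech enhancement, is applied and improves performance over the conventional DNN. Reinforcement learning, best known for its use in Google's AlphaGo, is a method that learns, over a very large number of cases, which action to take under a policy in a given state in order to move to the next state and obtain the highest reward, so that the optimal action can be selected. In this paper, the reward is designed on the basis of a composite measure, which improves speech recognition performance compared with an existing technique whose reward is based on PESQ (Perceptual Evaluation of Speech Quality).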

The Correlation between Speech Intelligibility and Acoustic Measurements in Children with Speech Sound Disorders (말소리장애 아동의 말명료도와 음향학적 측정치 간 상관관계)

  • Kang, Eunyeong
    • Journal of The Korean Society of Integrative Medicine / v.6 no.4 / pp.191-206 / 2018
  • Purpose : This study investigated the correlation between speech intelligibility and acoustic measurements of speech sounds produced by children with speech sound disorders and children without any diagnosed speech sound disorder. Methods : A total of 60 children with and without speech sound disorders were the subjects of this study. Speech samples were obtained by having the subjects speak meaningful words. Acoustic measurements were analyzed on a spectrogram using the Multi-speech 3700 program. Speech intelligibility was determined according to a listener's perceptual judgment. Results : Children with speech sound disorders had significantly lower speech intelligibility than those without speech sound disorders. The intensity of the vowel /u/, the duration of the vowel /ω/, and the second formant of the vowel /ω/ differed significantly between the two groups. There was no difference in voice onset time between the groups. There was a correlation between acoustic measurements and speech intelligibility. Conclusion : The results of this study showed that the speech intelligibility of children with speech sound disorders was affected by intensity, word duration, and formant frequency. Results obtained in clinical settings should be complemented with acoustic measurements in addition to the evaluation of speech intelligibility.

MLSE-Net: Multi-level Semantic Enriched Network for Medical Image Segmentation

  • Di Gai;Heng Luo;Jing He;Pengxiang Su;Zheng Huang;Song Zhang;Zhijun Tu
    • KSII Transactions on Internet and Information Systems (TIIS) / v.17 no.9 / pp.2458-2482 / 2023
  • Medical image segmentation techniques based on convolutional neural networks rely heavily on feature extraction, which leads to parameter redundancy and unsatisfactory target localization and, in turn, to less accurate segmentation results for assisting doctors in diagnosis. In this paper, we propose a multi-level semantic-rich encoding-decoding network, which consists of a Pooling-Conv-Former (PCFormer) module and a Cbam-Dilated-Transformer (CDT) module. The PCFormer module is used to tackle the issue of parameter explosion in the conventional transformer and to compensate for the feature loss in the down-sampling process. In the CDT module, the Cbam attention module is adopted to highlight the feature regions by implicitly blending the intersection of attention mechanisms, and the Dilated convolution-Concat (DCC) module is designed as a parallel concatenation of multiple atrous convolution blocks to explicitly expand the perceptual field. In addition, a MultiHead Attention-DwConv-Transformer (MDTransformer) module is utilized to clearly distinguish the target region from the background region. Extensive experiments on medical image segmentation with the Glas, SIIM-ACR, ISIC and LGG datasets demonstrated that our proposed network outperforms existing advanced methods in terms of both objective evaluation and subjective visual performance.
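
The effect of running several atrous (dilated) convolution blocks in parallel can be illustrated with a small calculation. The sketch below is not the authors' DCC module: the kernel size of 3 and the dilation rates 1, 2 and 4 are assumptions chosen for illustration, and the function names are hypothetical. It only shows how dilation widens the span of input positions that a single kernel covers, which is what the abstract calls the expanded perceptual field (more commonly, the receptive field).

```python
def effective_kernel(kernel_size, dilation):
    """Span, in input positions, covered by one dilated (atrous) kernel."""
    return kernel_size + (kernel_size - 1) * (dilation - 1)


def parallel_branch_spans(kernel_size, dilations):
    """Span of each branch when several dilated convolutions run in parallel."""
    return {d: effective_kernel(kernel_size, d) for d in dilations}


# Hypothetical configuration: 3-tap kernels with dilation rates 1, 2 and 4.
spans = parallel_branch_spans(3, [1, 2, 4])
print(spans)                # {1: 3, 2: 5, 4: 9}
print(max(spans.values()))  # 9, the widest span among the concatenated branches
```

Concatenating the branch outputs lets later layers see fine detail (small dilation) and wider context (large dilation) at once, without adding weights per branch beyond those of an ordinary 3-tap kernel.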

Evaluation of Factors Affecting the Use of the Accounting Information System Using the TAM Model: A Field Study in Algerian Firms

  • Widad Benzine;Ahcene Tiar
    • Asia pacific journal of information systems / v.32 no.2 / pp.435-459 / 2022
  • The accounting literature abounds with studies concerning the organizational and technical aspects of the AIS to simulate progress in the business environment. However, few studies have focused on the role of individual factors in overcoming resistance to change and maximizing the value of using the system. Therefore, this study aims to shed light on user beliefs by evaluating the factors that affect the use of the AIS using an extended TAM. A total of 132 subjects participated in this study, in which a questionnaire was used as the data collection tool and AMOS was used to test the model. The results showed that subjective norm, training and experience were the most important antecedent factors affecting the perceptual factors of usefulness, ease of use and the inevitability of change, all of which had an impact on the continuance intention to use the AIS among users in Algerian firms. This study sheds light on the importance of assessing individual factors, rather than focusing only on ways to develop the AIS or on searching for new technologies and the costs of this investment, because doing so will increase the chances of success in using the system.

Driving State of the stroke patients after Cognitive Perceptual Assessment for Driving evaluation at the National Rehabilitation Center (국립재활원에서 운전인지평가를 받은 뇌졸중 환자의 운전 실태조사)

  • Lee, J.A.;Choi, H.;Lee, S.
    • Journal of rehabilitation welfare engineering & assistive technology / v.7 no.2 / pp.117-124 / 2013
  • Objective : To investigate the actual driving state of stroke patients who had undergone the CPAD. Methods : We conducted a follow-up survey of 48 stroke patients who had undergone the CPAD. We first reviewed the medical charts and then carried out a telephone survey. Results : Of the 48 subjects, 12 were driving and 36 were not driving. The current drivers' CPAD score (54.13) was higher than the non-drivers' CPAD score (p<0.05). Those who passed the CPAD were more likely to be driving than those who failed (OR=8.3, 95% CI=1.931-35.558). Conclusion : The group that passed the CPAD had a higher chance of driving and a lower chance of car accidents than the group that failed. Thus, the CPAD can be applied as a cognitive evaluation test for driving.


A Study on Development of a Hearing Impairment Simulator considering Frequency Selectivity and Asymmetrical Auditory Filter of the Hearing Impaired (난청인의 주파수 선택도와 비대칭적 청각 필터를 고려한 난청 시뮬레이터 개발에 관한 연구)

  • Joo, Sang-Ick;Kang, Hyun-Deok;Song, Young-Rok;Lee, Sang-Min
    • The Transactions of The Korean Institute of Electrical Engineers / v.59 no.4 / pp.831-840 / 2010
  • In this paper, we propose a hearing impairment simulator that takes into account the reduced frequency selectivity and asymmetrical auditory filters of the hearing impaired, and we verify through experiments how reduced frequency selectivity and asymmetrical auditory filters affect speech perception. Reduced frequency selectivity was implemented by spectral smearing using LPC (linear predictive coding). The shapes of the auditory filters are asymmetrical and differ with each center frequency. The auditory filters of people with hearing loss differ from those of people with normal hearing, so speech passed through them differs in quality. The experiments comprised subjective and objective tests. The subjective experiments consisted of four tests: a pure tone test, an SRT (speech reception threshold) test, a WRS (word recognition score) test without spectral smearing, and a WRS test with spectral smearing. The hearing impairment simulator was tested with 9 subjects with normal hearing. The amount of spectral smearing was controlled by the LPC order. The asymmetrical auditory filter of the proposed hearing impairment simulator was simulated, and tests were then performed to estimate the filter's performance objectively. To evaluate the simulated auditory filter, the objective experiment applied PESQ (perceptual evaluation of speech quality) and LLR (log likelihood ratio) to speech passed through the filter, and the objective quality and distortion of the processed speech were assessed from the PESQ and LLR values. When hearing loss was simulated, the PESQ and LLR values differed greatly depending on the asymmetrical auditory filter in the hearing impairment simulator.
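
The abstract names LLR as one of its objective measures but gives no formula. The sketch below assumes one commonly used LPC-based definition for a single frame, LLR = log((a_p R_c a_p) / (a_c R_c a_c)), where a_c and a_p are the prediction-error filters of the clean and processed frames and R_c is the autocorrelation matrix of the clean frame. The function names, the LPC order of 10 and the synthetic test signals are illustrative assumptions, not the authors' setup.

```python
import math
import random


def autocorr(frame, order):
    """Autocorrelation values r[0..order] of one frame."""
    n = len(frame)
    return [sum(frame[t] * frame[t + k] for t in range(n - k))
            for k in range(order + 1)]


def levinson(r):
    """Levinson-Durbin recursion; returns a = [1, a1, ..., ap]."""
    order = len(r) - 1
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err                      # reflection coefficient
        prev = a[:]
        for j in range(1, i):
            a[j] = prev[j] + k * prev[i - j]
        a[i] = k
        err *= 1.0 - k * k                  # updated prediction error
    return a


def quad_form(a, r):
    """a^T R a, with R the Toeplitz autocorrelation matrix built from r."""
    n = len(a)
    return sum(a[i] * a[j] * r[abs(i - j)] for i in range(n) for j in range(n))


def llr(clean, processed, order=10):
    """Log likelihood ratio of a processed frame against a clean frame."""
    r_clean = autocorr(clean, order)
    a_clean = levinson(r_clean)
    a_proc = levinson(autocorr(processed, order))
    return math.log(quad_form(a_proc, r_clean) / quad_form(a_clean, r_clean))


# Synthetic frames for illustration only: a noisy tone and a further degraded copy.
rng = random.Random(0)
clean = [math.sin(0.3 * t) + 0.1 * rng.gauss(0, 1) for t in range(400)]
processed = [c + 0.5 * rng.gauss(0, 1) for c in clean]
print(llr(clean, clean))      # 0.0: identical frames show no spectral distortion
print(llr(clean, processed))  # larger values mean a larger spectral-envelope mismatch
```

In practice the measure is usually computed per short frame and averaged over an utterance; this sketch treats the whole signal as one frame for brevity.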

Noise evaluation method of DC motor according to change of load (부하에 따른 DC모터 소음 평가법)

  • Cha, Su-Ho;Shin, Sung-Hwan
    • The Journal of the Acoustical Society of Korea / v.39 no.2 / pp.113-119 / 2020
  • With the increased use of motors in passenger cars, motor noise has become a major concern for improving the perceptual quality of car interior sound. The purpose of this study is to propose factors that can represent the acoustic performance of motor noise as the load changes. To this end, it is first shown that the power spectrum and total loudness are not suitable for representing noise performance. Then PNB, the partial loudness related to the brush friction component, and PNR, the partial loudness related to the torque ripple component, are investigated as factors representing motor noise. A performance curve of motor noise based on PNB and PNR is proposed to identify trends in motor noise according to the load. The curve could serve as a guide for noise control, motor selection, and system improvement.

Quantitative Evaluation of the Performance of Monaural FDSI Beamforming Algorithm using a KEMAR Mannequin (KEMAR 마네킹을 이용한 단이 보청기용 FDSI 빔포밍 알고리즘의 정량적 평가)

  • Cho, Kyeongwon;Nam, Kyoung Won;Han, Jonghee;Lee, Sangmin;Kim, Dongwook;Hong, Sung Hwa;Jang, Dong Pyo;Kim, In Young
    • Journal of Biomedical Engineering Research / v.34 no.1 / pp.24-33 / 2013
  • To enhance the speech perception of hearing aid users in noisy environments, most hearing aid devices adopt beamforming algorithms, such as the first-order differential microphone (DM1) and the two-stage directional microphone (DM2) algorithms, that maintain sounds from the direction of the interlocutor and reduce ambient sounds from other directions. However, these conventional algorithms exhibit poor directivity at low frequencies. Therefore, to enhance the speech perception of hearing aid users in the low frequency range, our group suggested a fractional delay subtraction and integration (FDSI) algorithm and estimated its theoretical performance using computer simulation in a previous article. In this study, we performed a KEMAR test in a non-reverberant room to compare the performance of the DM1, DM2, broadband beamforming (BBF), and proposed FDSI algorithms using several objective indices: signal-to-noise ratio (SNR) improvement, segmental SNR (seg-SNR) improvement, perceptual evaluation of speech quality (PESQ), and the Itakura-Saito measure (IS). Experimental results showed that the performance of the FDSI algorithm was -3.26 to 7.16 dB in SNR improvement, -1.94 to 5.41 dB in seg-SNR improvement, 1.49 to 2.79 in PESQ, and 0.79 to 3.59 in IS, which demonstrated that the FDSI algorithm showed the highest improvement in SNR and seg-SNR and the lowest IS. We believe that the proposed FDSI algorithm has potential as a beamformer for digital hearing aid devices.
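
Two of the objective indices named above, SNR and segmental SNR, can be sketched in a few lines. This is a minimal illustration under stated assumptions rather than the authors' evaluation code: the frame length of 256 samples, the clamping of per-frame values to the range -10 to 35 dB (a common convention), the small constant eps and the function names are all assumptions. An "improvement" figure would be the value measured on the beamformer output minus the value measured on the unprocessed input.

```python
import math


def snr_db(clean, processed, eps=1e-12):
    """Global SNR in dB, treating (clean - processed) as the noise."""
    signal = sum(c * c for c in clean)
    noise = sum((c - p) ** 2 for c, p in zip(clean, processed))
    return 10.0 * math.log10((signal + eps) / (noise + eps))


def seg_snr_db(clean, processed, frame_len=256, lo=-10.0, hi=35.0):
    """Segmental SNR: mean of per-frame SNRs, each clamped to [lo, hi] dB."""
    values = []
    for start in range(0, len(clean) - frame_len + 1, frame_len):
        frame_snr = snr_db(clean[start:start + frame_len],
                           processed[start:start + frame_len])
        values.append(min(max(frame_snr, lo), hi))
    return sum(values) / len(values)


# Synthetic check: scaling by 0.9 leaves an error of 0.1 * clean, i.e. 20 dB.
clean = [math.sin(0.05 * t) for t in range(1024)]
processed = [0.9 * c for c in clean]
print(round(snr_db(clean, processed), 2))      # 20.0
print(round(seg_snr_db(clean, processed), 2))  # 20.0
```

Averaging per-frame values keeps loud segments from dominating the result, which is why segmental SNR is often reported alongside global SNR for speech.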

Efficacy of laughing voice treatment (SKMVTT) in benign vocal fold lesions (양성성대질환의 웃음 음성치료(SKMVTT))

  • Jung, Dae-Yong;Wi, Joon-Yeol;Kim, Seong-Tae
    • Phonetics and Speech Sciences / v.10 no.4 / pp.155-161 / 2018
  • The purpose of this study was to evaluate the efficacy of a multiple voice therapy technique (SKMVTT®) using laughter for the treatment of various benign vocal fold lesions. To this end, 23 female patients diagnosed with vocal nodules, vocal polyps, or muscle tension dysphonia through videostroboscopy were enrolled in vocal hygiene and SKMVTT®. All of the patients were treated once a week for 4 to 12 sessions. The GRBAS scale was used to confirm the changes in voice quality before and after the treatment. Acoustic analysis was performed to evaluate jitter, shimmer, NHR, fundamental frequency variation, amplitude variation, PFR, and dB range. Videostroboscopy was performed to confirm the changes in the laryngeal features before and after the treatment. After the SKMVTT®, the results of the perceptual evaluation demonstrated that the G, R, and B scales had significantly improved. The acoustic evaluation also demonstrated that jitter, shimmer, NHR, vAm, vFo, PFR, and dB range had significantly improved after the SKMVTT®. In the comparison of videostroboscopic findings, the vocal nodules and vocal polyps decreased in size or disappeared after the treatment. In addition, the size of the cuneiform tubercles decreased, the aryepiglottic folds became longer, and the laryngeal findings of supraglottic compression improved after the SKMVTT®. These results suggest that the SKMVTT® is effective in improving the vocal quality of patients with benign vocal fold lesions. In conclusion, laughter and inspiratory phonation appear to have suppressed abnormal laryngeal elevation and lowered laryngeal height, which seems to improve hyperfunctional phonation.