• Title/Summary/Keyword: Voice quality

Spectral and Cepstral Analyses of Esophageal Speakers (식도발성화자 음성의 spectral & cepstral 분석)

  • Shim, Hee-Jeong;Jang, Hyo-Ryung;Shin, Hee-Baek;Ko, Do-Heung
    • Phonetics and Speech Sciences / v.6 no.2 / pp.47-54 / 2014
  • The purpose of this study was to analyze spectral versus cepstral measurements in esophageal speakers. Measurements from thirteen male esophageal speakers were compared with those from a control group of thirteen normal speakers using the sustained vowel /a/. The main results can be summarized as follows: (a) the CPP and L/H ratio of the esophageal group were significantly lower than those of the control group; (b) the CPP was significantly correlated with spectral parameters such as jitter, shimmer, NHR, and VTI; and (c) the ROC analysis showed that a CPP threshold of 10.25 dB classified esophageal speakers with 100% sensitivity and specificity. Thus, cepstral-based acoustic measures such as CPP may be more reliable predictors than spectral-based acoustic measures such as jitter and shimmer, and they were effective in distinguishing esophageal voice quality from normal voice quality. This research will contribute to establishing a baseline of speech characteristics for voice rehabilitation of laryngectomees.
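As a rough illustration of result (c), the sketch below computes sensitivity and specificity for a single CPP cutoff. The CPP values are invented for illustration and are not the study's data, the function name is hypothetical, and the decision rule (a speaker is flagged as esophageal when CPP falls below the cutoff) is an assumption based on the abstract's statement that the esophageal group had lower CPP.

```python
def sensitivity_specificity(cpp_patient, cpp_control, threshold_db):
    """Return (sensitivity, specificity) for the rule 'CPP < threshold => esophageal'."""
    # True positives: esophageal speakers correctly flagged (CPP below the cutoff).
    true_pos = sum(1 for v in cpp_patient if v < threshold_db)
    # True negatives: control speakers correctly not flagged (CPP at or above the cutoff).
    true_neg = sum(1 for v in cpp_control if v >= threshold_db)
    return true_pos / len(cpp_patient), true_neg / len(cpp_control)


# Hypothetical CPP values in dB, invented for this example only.
esophageal_cpp = [4.1, 5.3, 6.8, 7.2, 9.0]
control_cpp = [11.4, 12.0, 13.5, 14.2, 15.1]

sens, spec = sensitivity_specificity(esophageal_cpp, control_cpp, 10.25)
print(f"sensitivity={sens:.2f}, specificity={spec:.2f}")
```

An ROC analysis repeats this calculation over many candidate cutoffs and picks the one with the best trade-off; when the two groups do not overlap, as in the invented data here, some cutoff separates them perfectly.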

Transmission of Channel Error Information over Voice Packet (음성 패킷을 이용한 채널의 에러 정보 전달)

  • 박호종;차성호
    • The Journal of the Acoustical Society of Korea / v.21 no.4 / pp.394-400 / 2002
  • In digital speech communications, the quality of service can be improved by a speech coding scheme that adapts to the error rate of voice packet transmission. However, current communication protocols in cellular and Internet communications do not provide a function for transmitting channel error information. To solve this problem, this paper proposes a new method for real-time transmission of channel error information, in which the channel error information is embedded in the voice packet. The proposed method uses the pulse positions of the codevector in the ACELP speech codec, which results in little degradation in speech quality and a low false alarm rate. Simulations with various speech data show that the proposed method meets the requirements for speech quality, detection rate, and false alarm rate.

Deep Learning based Singing Voice Synthesis Modeling (딥러닝 기반 가창 음성합성(Singing Voice Synthesis) 모델링)

  • Kim, Minae;Kim, Somin;Park, Jihyun;Heo, Gabin;Choi, Yunjeong
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference / 2022.10a / pp.127-130 / 2022
  • This paper studies singing voice synthesis modeling using a generator loss function. It analyzes various factors that may arise when applying BEGAN, a deep learning algorithm optimized for image generation, to the audio domain, and conducts experiments to derive optimal quality. We focus on the problem that the L1 loss proposed in BEGAN-based models degrades the meaning of the hyperparameter gamma (γ), which was defined to control the diversity and quality of generated audio samples. Experiments show that the proposed method, together with finding optimal values through tuning, can contribute to improving the quality of the synthesized singing.
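For readers unfamiliar with the role of γ, the sketch below shows the balance update from the original BEGAN formulation (Berthelot et al.), not the modified generator loss this paper proposes. The function name and the default values of `gamma` and `lambda_k` are illustrative assumptions; `loss_real` and `loss_fake` stand for the autoencoder reconstruction losses on real and generated samples.

```python
def began_step(loss_real, loss_fake, k, gamma=0.5, lambda_k=0.001):
    """One BEGAN bookkeeping step on scalar reconstruction losses.

    loss_real: autoencoder loss L(x) on real samples
    loss_fake: autoencoder loss L(G(z)) on generated samples
    k:         current balance term k_t in [0, 1]
    """
    d_loss = loss_real - k * loss_fake   # discriminator objective
    g_loss = loss_fake                   # generator objective
    # gamma sets the target ratio L(G(z)) / L(x); lower gamma favours
    # sample quality, higher gamma favours diversity.
    balance = gamma * loss_real - loss_fake
    k_next = min(max(k + lambda_k * balance, 0.0), 1.0)  # keep k in [0, 1]
    convergence = loss_real + abs(balance)               # global convergence measure
    return d_loss, g_loss, k_next, convergence


d_loss, g_loss, k_next, m_global = began_step(1.0, 0.25, k=0.0, gamma=0.5, lambda_k=0.1)
print(d_loss, g_loss, k_next, m_global)
```

Because γ only enters through the balance term, any extra loss added to the generator objective changes the ratio that γ is meant to control, which is one way to read the problem the abstract describes.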

Speech Intelligibility of Alaryngeal Voices and Pre/Post Operative Evaluation of Voice Quality using the Speech Recognition Program(HUVOIS) (음성인식프로그램을 이용한 무후두 음성의 말 명료도와 병적 음성의 수술 전후 개선도 측정)

  • Kim, Han-Su;Choi, Seong-Hee;Kim, Jae-In;Lee, Jae-Yol;Choi, Hong-Shik
    • Journal of the Korean Society of Laryngology, Phoniatrics and Logopedics / v.15 no.2 / pp.92-97 / 2004
  • Background and Objectives: The purpose of this study was to objectively examine pre- and postoperative voice quality and the intelligibility of alaryngeal voice using the speech recognition program HUVOIS. Materials and Methods: Two laryngologists and one speech pathologist evaluated 'G', 'R', and 'B' on the GRBAS scale and rated speech intelligibility on the NTID rating scale from a standard paragraph. Acoustic estimates such as jitter, shimmer, and HNR were also obtained with Lx Speech Studio. Results: The speech recognition rate did not differ significantly between pre- and postoperative pathological voice samples, although voice quality (G, B) and acoustic values (jitter, HNR) improved significantly after the operation. Among alaryngeal voices, the reed type electrolarynx 'Moksori' scored highest in both speech intelligibility and speech recognition rate, whereas esophageal speech scored lowest. Speech intelligibility and speech recognition rate were correlated in alaryngeal voices, but not in pathological voices. Conclusion: The current study did not show that the speech recognition program HUVOIS, used during a telephone program, is an objective and efficient method for assisting the subjective GRBAS scale.

The Effect of Voice Therapy in Vocal Polyp Patients (성대용종 환자의 음성치료 효과)

  • Kim, Seong-Tae;Jeong, Go-Eun;Kim, Sang-Yoon;Choi, Seung-Ho;Lim, Gil-Chai;Han, Ju-Hee;Nam, Soon-Yuhl
    • Phonetics and Speech Sciences / v.1 no.2 / pp.43-49 / 2009
  • Vocal polyps are benign phonotraumatic lesions that are traditionally treated using phonomicrosurgical techniques. In cases of hyperfunctional voice use, voice therapy is effective and results in voice improvement. However, evidence for the utility of voice therapy in vocal polyps is much needed. The purpose of this study was to evaluate the effects of voice therapy in patients with vocal polyps. The authors reviewed the medical records of 193 patients with vocal nodules or vocal polyps, and 64 patients (31 with nodules and 33 with polyps) were enrolled. All of the subjects had received an explanation of their problems and vocal hygiene education, and had been treated with SKMVTT® (Seong-Tae Kim's multiple voice therapy technique) for 4 to 16 sessions (mean: 8.6 sessions). All subjects were examined by perceptual assessment, acoustic and aerodynamic measures, and VRP (voice range profile). In perceptual assessment, patients with vocal nodules had more breathy and strained voices than the vocal polyp group. Both groups showed significantly reduced roughness and breathiness after voice therapy. In acoustic measures, patients with vocal polyps had worse voice quality than patients with nodules. Both groups showed reduced jitter and shimmer after voice therapy. In aerodynamic measures, MPT and Psub increased and MFR decreased (p<.05). Participants' frequency range and intensity range increased after voice therapy, but only the change in frequency range was significant (p<.05). In conclusion, the therapeutic effect of voice therapy in patients with vocal nodules and polyps was demonstrated perceptually and acoustically. We suggest that voice therapy, including advice, vocal hygiene, and SKMVTT®, is useful as an initial choice of treatment for patients with vocal polyps before a surgical approach is considered.

Development of Cannula-typed Silicone Voice Prosthesis (So-Mang®) (Cannula-typed Silicone Voice Prosthesis(소망®)의 개발)

  • 최홍식;정은주;전희선;문인석;김영호;김광문
    • Journal of the Korean Society of Laryngology, Phoniatrics and Logopedics / v.12 no.2 / pp.152-157 / 2001
  • Background: Electrolarynx, esophageal voice, and silicone voice prosthesis with tracheoesophageal (T-E) fistula have been used as vocal rehabilitation methods for post-laryngectomy patients. Prosthetic rehabilitation of voice after total laryngectomy has gained wide acceptance and has become common practice in many clinics since the pioneering work of Singer and Blom in 1979. Since the introduction of tracheoesophageal puncture and the application of the Blom-Singer® voice prosthesis in 1980, several reliable voice prostheses have been developed and are being used successfully. Objectives: Even though the quality of voice produced by a silicone voice prosthesis with T-E fistula is superior to that of other modalities, it still has some disadvantages. We devised a new cannula-typed silicone voice prosthesis. Methods: 1) Devising a new prototype of a cannula-typed silicone voice prosthesis. 2) Application of the prototype in a canine animal model (laryngectomized dog) and a fitting trial on a human patient whose previously inserted silicone voice prosthesis was not functioning due to presumed fungal infection. Discussion: The final form of the prototype was made after several major and minor modifications. Insertion of the newly developed cannula-typed silicone voice prosthesis in the canine animal model and the human trial were carried out without any difficulty. There was no serious leakage of saliva or food during swallowing. Conclusion: The newly developed cannula-typed silicone voice prosthesis (So-Mang®) and the modified replacement method will further improve the results of prosthetic voice rehabilitation after laryngectomy. A long-term animal study and human trial are planned in the near future.

Preliminary Study for Comparison of Subjective Voice Evaluations among Vocal and Applied Music Major Students (성악과 실용음악 보컬 전공 대학생들의 주관적 음성평가 비교 예비연구)

  • Lee, Dahye;Hwang, Youngjin;Kim, Jaeock
    • Phonetics and Speech Sciences / v.6 no.2 / pp.37-45 / 2014
  • The purpose of this study was to determine whether the Korean Singing Voice Handicap Index (K-SVHI) is suitable for singers in genres other than vocal music to assess their vocal problems subjectively. Twenty-six college students majoring in vocal music and twenty-six students majoring in applied music were included in the study. They were divided into G0 and G1 in voice quality using the GRBAS scale during the tasks of singing. K-SVHI was divided into three sub-areas (Physical, Functional, and Emotional). In the singing task, neither group showed a significant difference in K-SVHI scores by G scale. In the reading task, the vocal music group had significantly higher K-SVHI scores in G0 than in G1, while the applied vocal music group had significantly higher K-SVHI scores in G1 than in G0. The two groups did not differ significantly in G0 or G1 in the singing task, while the vocal music group showed higher K-SVHI scores than the applied vocal music group in G0 in the reading task. In addition, the vocal music group had higher K-SVHI scores than the applied vocal music group in G1 in both tasks. When the groups were compared across the three sub-areas of K-SVHI, significant differences were found in the Emotional and Functional areas. These results showed that singers perceive their voice problems differently depending on musical genre, which means that K-SVHI may not be a proper tool for evaluating the voice handicap of singers across diverse vocal music genres.

A Study about Voice of Patients with Chronic Obstructive Pulmonary Disease/Asthma before & after β₂-agonist (β₂-촉진제 사용전후에 따른 만성폐쇄성폐질환/천식 환자의 음성 연구)

  • Kang, Young-Ae;Kim, Se-Hun;Jong, Seong-Su;Lee, Tae-Yong;Seong, Cheol-Jae
    • Phonetics and Speech Sciences / v.2 no.2 / pp.101-108 / 2010
  • Inhaled salbutamol and salmeterol have been used worldwide for chronic obstructive pulmonary disease (COPD) and asthma, but there have been few studies of the voice changes that follow medication. To evaluate how short-acting and long-acting β₂-agonists influence the voice, two experiments were carried out: experiment 1 used salbutamol with eight patients, and experiment 2 used salmeterol with six patients. Experiment 1 had two stages: premedication and postmedication. Experiment 2 had four stages: stage I was premedication, stage II was postmedication and pre-gargling, stage III was postmedication and post-gargling (with 100 ml of water), and stage IV was 30 minutes after medication. The measured parameters were F0, F0_SD, Jitter_rap, Shimmer_apq11, HNR, BW (1, 2, 3), Intensity, and H1-H2. The mean of three repetitions of each measurement was analyzed with the Wilcoxon signed rank test for experiment 1 and repeated measures ANOVA for experiment 2. In experiment 1, a significant difference was found in Jitter_rap (Z = -2.10, p = 0.036), indicating that the postmedication voice was worse than the premedication voice. In experiment 2, there was no significant difference, but the parameters related to voice quality (Jitter_rap, Shimmer_apq11, HNR, and H1-H2) changed toward stage IV; that is, voice quality was worse under medication.
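The Jitter_rap parameter above is commonly defined, for example in Praat, as the relative average perturbation: the mean absolute difference between each glottal period and the average of that period and its two neighbours, divided by the mean period. The sketch below assumes that definition and a hypothetical list of period durations; the abstract does not state which software or exact formula the authors used.

```python
def jitter_rap(periods):
    """Relative average perturbation of a sequence of glottal periods (seconds)."""
    n = len(periods)
    if n < 3:
        raise ValueError("RAP needs at least three periods")
    # Deviation of each interior period from its three-point local average.
    deviations = [
        abs(periods[i] - (periods[i - 1] + periods[i] + periods[i + 1]) / 3)
        for i in range(1, n - 1)
    ]
    mean_deviation = sum(deviations) / (n - 2)
    mean_period = sum(periods) / n
    return mean_deviation / mean_period


# Hypothetical period durations in seconds (roughly 100 Hz), invented for illustration.
example_periods = [0.0100, 0.0102, 0.0099, 0.0101, 0.0100]
print(f"RAP = {jitter_rap(example_periods):.4%}")
```

A perfectly regular voice gives a RAP of zero, so a higher postmedication value corresponds to the less stable voicing the abstract reports for experiment 1.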

Automatic severity classification of dysarthria using voice quality, prosody, and pronunciation features (음질, 운율, 발음 특징을 이용한 마비말장애 중증도 자동 분류)

  • Yeo, Eun Jung;Kim, Sunhee;Chung, Minhwa
    • Phonetics and Speech Sciences / v.13 no.2 / pp.57-66 / 2021
  • This study focuses on the automatic severity classification of dysarthric speakers based on speech intelligibility. Speech intelligibility is a complex measure that is affected by features of multiple speech dimensions. However, most previous studies are restricted to features from a single speech dimension. To capture the characteristics of the speech disorder effectively, we extracted features from multiple speech dimensions: voice quality, prosody, and pronunciation. Voice quality consists of jitter, shimmer, Harmonic to Noise Ratio (HNR), number of voice breaks, and degree of voice breaks. Prosody includes speech rate (total duration, speech duration, speaking rate, articulation rate), pitch (F0 mean/std/min/max/median/25th percentile/75th percentile), and rhythm (%V, deltas, Varcos, rPVIs, nPVIs). Pronunciation contains Percentage of Correct Phonemes (Percentage of Correct Consonants/Vowels/Total phonemes) and degree of vowel distortion (Vowel Space Area, Formant Centralized Ratio, Vowel Articulatory Index, F2-Ratio). Experiments were conducted using various feature combinations. The experimental results indicate that using features from all three speech dimensions gives the best result, an F1-score of 80.15, compared with using features from just one or two speech dimensions. The result implies that voice quality, prosody, and pronunciation features should all be considered in automatic severity classification of dysarthria.
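Two of the rhythm features listed above, rPVI and nPVI, have simple closed forms over a sequence of interval durations. The sketch below uses the standard Pairwise Variability Index definitions; the function names and the sample durations are hypothetical, and the abstract does not say which intervals (vocalic or consonantal) the study measured.

```python
def rpvi(durations):
    """Raw PVI: mean absolute difference between successive interval durations."""
    if len(durations) < 2:
        raise ValueError("PVI needs at least two intervals")
    diffs = [abs(a - b) for a, b in zip(durations, durations[1:])]
    return sum(diffs) / len(diffs)


def npvi(durations):
    """Normalized PVI: each difference is divided by the pair's mean duration, times 100."""
    if len(durations) < 2:
        raise ValueError("PVI needs at least two intervals")
    terms = [abs(a - b) / ((a + b) / 2) for a, b in zip(durations, durations[1:])]
    return 100 * sum(terms) / len(terms)


# Hypothetical interval durations in milliseconds, invented for illustration.
intervals_ms = [80.0, 120.0, 95.0, 140.0]
print(rpvi(intervals_ms), npvi(intervals_ms))
```

The normalization in nPVI makes the value less sensitive to overall speaking rate, which matters when comparing speakers whose speech rate differs with severity.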

Analysis of Correlation between Sleep Interval Length and Jitter Buffer Size for QoS of IPTV and VoIP Audio Service over Mobile WiMAX (Mobile WiMAX에서 IPTV 및 VoIP 음성서비스 품질을 고려한 수면구간 길이와 지터버퍼 크기의 상관관계 분석)

  • Kim, Hyung-Suk;Kim, Tae-Hyoun;Hwang, Ho-Young
    • The KIPS Transactions: Part C / v.17C no.3 / pp.299-306 / 2010
  • IPTV and VoIP services are considered killer applications for the Mobile WiMAX network, which provides high mobility and a high data rate. Among the factors that affect voice quality in those services, the jitter buffer, or playout buffer, can compensate for the poor voice quality caused by packet drops due to frequent route changes or differences among routes between service endpoints. In this paper, we analyze the correlation between the sleep interval length and the jitter buffer size in order to guarantee a predefined level of voice quality. For this purpose, we present an end-to-end delay model that considers the additional delay incurred by the WiMAX PSC-II sleep mode, together with a VoIP service quality requirement based on delay constraints. Through extensive simulation experiments, we also show that increasing the jitter buffer size may degrade voice quality, since it can introduce additional packet drops in the jitter buffer under WiMAX power saving mode.