• Title/Summary/Keyword: voice frequency

Search Result 544, Processing Time 0.035 seconds

The Effect of a Portable Voice Feedback Device on the Hyperfunctional Voice Behaviors of Children with Vocal Nodules (휴대용 음성 피드백 도구의 사용이 과기능적 음성 행동의 발생 빈도에 미치는 영향)

  • Lee, Moo-Kyung
    • Phonetics and Speech Sciences
    • /
    • v.1 no.2
    • /
    • pp.31-36
    • /
    • 2009
  • This study attempted to examine the effects of a portable voice feedback device on the hyperfunctional voice behaviors of children with vocal nodules when they wore the device in their daily lives. The device could set fundamental frequency and intensity at optimal levels for the subjects, It produces an audible alarm for inappropriate hyperfunctional voices beyond the preset levels, In addition, the frequency of hyperfunctional voice behaviors was recorded by the device, therefore the users were able to chart their number of hyperfunctional voice behaviors per day, According to results acquired after having subjects wear the device for 12 weeks, the subjects' frequency of hyperfunctional voice behaviors decreased significantly (p < .01). Especially from the first to fourth week, the frequency of their hyperfunctional voice behaviors declined significantly.

  • PDF

Voice transformation for HTS using correlation between fundamental frequency and vocal tract length (기본주파수와 성도길이의 상관관계를 이용한 HTS 음성합성기에서의 목소리 변환)

  • Yoo, Hyogeun;Kim, Younggwan;Suh, Youngjoo;Kim, Hoirin
    • Phonetics and Speech Sciences
    • /
    • v.9 no.1
    • /
    • pp.41-47
    • /
    • 2017
  • The main advantage of the statistical parametric speech synthesis is its flexibility in changing voice characteristics. A personalized text-to-speech(TTS) system can be implemented by combining a speech synthesis system and a voice transformation system, and it is widely used in many application areas. It is known that the fundamental frequency and the spectral envelope of speech signal can be independently modified to convert the voice characteristics. Also it is important to maintain naturalness of the transformed speech. In this paper, a speech synthesis system based on Hidden Markov Model(HMM-based speech synthesis, HTS) using the STRAIGHT vocoder is constructed and voice transformation is conducted by modifying the fundamental frequency and spectral envelope. The fundamental frequency is transformed in a scaling method, and the spectral envelope is transformed through frequency warping method to control the speaker's vocal tract length. In particular, this study proposes a voice transformation method using the correlation between fundamental frequency and vocal tract length. Subjective evaluations were conducted to assess preference and mean opinion scores(MOS) for naturalness of synthetic speech. Experimental results showed that the proposed voice transformation method achieved higher preference than baseline systems while maintaining the naturalness of the speech quality.

Voice Boosting Filter Design in Frequency Domain for Relief of Husky Voice (쉰목소리 완화를 위한 주파수 영역 음성 강조 필터 설계)

  • Kim, Hyuntae;Lee, Sanghyeop
    • Journal of Korea Multimedia Society
    • /
    • v.19 no.12
    • /
    • pp.1919-1926
    • /
    • 2016
  • The people who complain of pain due to voice causes such as vocal cord nodules is increasing year by year. If the voice is changed, it is possible to give to colleagues discomfort or inconvenience during conversation. In this paper, we propose a way to reduce discomfort by improving the husky voice during the conversation. A VBF (voice boosting filter) is firstly designed to improve the husky voices. This filter may further emphasize the formant frequency components than the frequency components around the formant frequency, because the value is relatively greater than the other frequency. And a fixed-point type DSP chipset, TMS320F2812 is applied to the system, the operating frequency is 150MHz. The system was implemented as a compact for use as a portable, its size is $2.5cm{\times}10cm$. Through the test using three husky voices with some type of statement, it was satisfactory in processing speed and sound quality improvement.

A Study on the Acoustic Characteristics of Sexy Voice (섹시한 음성의 음향학적 특징 연구)

  • Jeong Ok-Ran;Jo Sung-Mi
    • MALSORI
    • /
    • no.57
    • /
    • pp.73-84
    • /
    • 2006
  • The purpose of this study was to explore the acoustic characteristics of sexy voice. In this study, we measured acoustic parameters (fundamental frequency, jitter, shimmer, and nasalance) of a sustained vowel sound produced by 40 actors (20 males and 20 females) and 40 non-actors (20 males and 20 females). Digital audio recordings were made in the sustained vowel |a| for acoustic analyses using Praat (version 4.1.9) and Nasal View (version 4.5). Twenty voice pathologists participated in the listening experiment and judged the degree of sexiness on a 7-point scale. The results showed that fundamental frequency, shimmer and nasalance had significant differences between actors and non-actors. The acoustic parameters of sexy voice matched perceptual aspects of a previous study: Low fundamental frequency-low pitch and high shimmer-husky voice. On the other hand, the nasalance score did not match that of the previous study: Decreased nasalance had a higher score on sexiness scale judged by the listeners. It would be desirable to study the voice quality by analyzing and controlling more acoustic and auditory parameters for practical applications in the future.

  • PDF

Bilingual Voice Conversion Using Frequency Warping on Formant Space (포만트 공간에서의 주파수 변환을 이용한 이중 언어 음성 변환 연구)

  • Chae, Yi-Geun;Yun, Young-Sun;Jung, Jin Man;Eun, Seongbae
    • Phonetics and Speech Sciences
    • /
    • v.6 no.4
    • /
    • pp.133-139
    • /
    • 2014
  • This paper describes several approaches to transform a speaker's individuality to another's individuality using frequency warping between bilingual formant frequencies on different language environments. The proposed methods are simple and intuitive voice conversion algorithms that do not use training data between different languages. The approaches find the warping function from source speaker's frequency to target speaker's frequency on formant space. The formant space comprises four representative monophthongs for each language. The warping functions can be represented by piecewise linear equations, inverse matrix. The used features are pure frequency components including magnitudes, phases, and line spectral frequencies (LSF). The experiments show that the LSF-based voice conversion methods give better performance than other methods.

Characteristics of Cow´s Voices in Time and Frequency domains for Recognition

  • Ikeda, Yoshio;Ishii, Y.
    • Agricultural and Biosystems Engineering
    • /
    • v.2 no.1
    • /
    • pp.15-23
    • /
    • 2001
  • On the assumption that the voices of the cows are produced by the linear prediction filter, we characterized the cows’voices. The order of this filter was determined by examining the voice characteristics both in time and frequency domains. The proposed order of the linear prediction filter is 15 for modeling voice production of the cow. The characteristics of the amplitude envelope of the voice signal was investigated by analyzing the sequence of the short time variance both in time and frequency domains, and the new parameters were defined. One of the coefficients o the linear prediction filter generating the voice signal, the fundamental frequency, the slope of the straight line regressed from the log-log spectra of the short time variance and the coefficients of the linear prediction filter generating the sequence of the short time variance of the voice signal can differentiate the two cows.

  • PDF

Voice Frequency Synthesis using VAW-GAN based Amplitude Scaling for Emotion Transformation

  • Kwon, Hye-Jeong;Kim, Min-Jeong;Baek, Ji-Won;Chung, Kyungyong
    • KSII Transactions on Internet and Information Systems (TIIS)
    • /
    • v.16 no.2
    • /
    • pp.713-725
    • /
    • 2022
  • Mostly, artificial intelligence does not show any definite change in emotions. For this reason, it is hard to demonstrate empathy in communication with humans. If frequency modification is applied to neutral emotions, or if a different emotional frequency is added to them, it is possible to develop artificial intelligence with emotions. This study proposes the emotion conversion using the Generative Adversarial Network (GAN) based voice frequency synthesis. The proposed method extracts a frequency from speech data of twenty-four actors and actresses. In other words, it extracts voice features of their different emotions, preserves linguistic features, and converts emotions only. After that, it generates a frequency in variational auto-encoding Wasserstein generative adversarial network (VAW-GAN) in order to make prosody and preserve linguistic information. That makes it possible to learn speech features in parallel. Finally, it corrects a frequency by employing Amplitude Scaling. With the use of the spectral conversion of logarithmic scale, it is converted into a frequency in consideration of human hearing features. Accordingly, the proposed technique provides the emotion conversion of speeches in order to express emotions in line with artificially generated voices or speeches.

An Aerodynamic and Acoustic Analysis of the Breathy Voice of Thyroidectomy Patients (갑상선 수술 후 성대마비 환자의 기식 음성에 대한 공기역학적 및 음향적 분석)

  • Kang, Young-Ae;Yoon, Kyu-Chul;Kim, Jae-Ock
    • Phonetics and Speech Sciences
    • /
    • v.4 no.2
    • /
    • pp.95-104
    • /
    • 2012
  • Thyroidectomy patients may have vocal paralysis or paresis, resulting in a breathy voice. The aim of this study was to investigate the aerodynamic and acoustic characteristics of a breathy voice in thyroidectomy patients. Thirty-five subjects who have vocal paralysis after thyroidectomy participated in this study. According to perceptual judgements by three speech pathologists and one phonetic scholar, subjects were divided into two groups: breathy voice group (n = 21) and non-breathy voice group (n = 14). Aerodynamic analysis was conducted by three tasks (Voicing Efficiency, Maximum Sustained Phonation, Vital Capacity) and acoustic analysis was measured during Maximum Sustained Phonation task. The breathy voice group had significantly higher subglottal pressure and more pathological voice characteristics than the non breathy voice group. Showing 94.1% classification accuracy in result logistic regression of aerodynamic analysis, the predictor parameters for breathiness were maximum sound pressure level, sound pressure level range, phonation time of Maximum Sustained Phonation task and Pitch range, peak air pressure, and mean peak air pressure of Voicing Efficiency task. Classification accuracy of acoustic logistic regression was 88.6%, and five frequency perturbation parameters were shown as predictors. Vocal paralysis creates air turbulence at the glottis. It fluctuates frequency-related parameters and increases aspiration in high frequency areas. These changes determine perceptual breathiness.

Voice Recognition Based on Adaptive MFCC and Neural Network (적응 MFCC와 Neural Network 기반의 음성인식법)

  • Bae, Hyun-Soo;Lee, Suk-Gyu
    • IEMEK Journal of Embedded Systems and Applications
    • /
    • v.5 no.2
    • /
    • pp.57-66
    • /
    • 2010
  • In this paper, we propose an enhanced voice recognition algorithm using adaptive MFCC(Mel Frequency Cepstral Coefficients) and neural network. Though it is very important to extract voice data from the raw data to enhance the voice recognition ratio, conventional algorithms are subject to deteriorating voice data when they eliminate noise within special frequency band. Differently from the conventional MFCC, the proposed algorithm imposed bigger weights to some specified frequency regions and unoverlapped filterbank to enhance the recognition ratio without deteriorating voice data. In simulation results, the proposed algorithm shows better performance comparing with MFCC since it is robust to variation of the environment.

Acoustic and Stroboscopic Characteristics in Teachers, Clergies and Telephone Operators (교사, 목사 및 교환수들의 음성발성에 대한 음향분석학적 특징)

  • 진성민;박상욱;이정우;이경철;이용배
    • Journal of the Korean Society of Laryngology, Phoniatrics and Logopedics
    • /
    • v.9 no.1
    • /
    • pp.53-58
    • /
    • 1998
  • Objectives : To compare the voice quality and voice problems of untrained professional voice user groups with that of normal control group without voice problem. Materials and Methods : The sustained vowel sounds of 13 male and 36 female teachers, 46 clergies and 15 telephone operators, and 40 normal male and 20 normal female persons were analyzed, using a videostroboscopy and acoustic analyzer. Together with these analyses, a questionnaire associated with risk factors for current and past voice problems was handed over to the patients. Results : The most common symptom in subjective groups was the voice fatigue. In stroboscopic examination, the professional voice user groups shelved functional voice disorder findings regardless of the Intensity of voice use. In the clergy and teacher using loud voice, vocal polyp, vocal nodule and hyperfunction of laryngeal muscle were frequently observed. In the clergy and telephone operator, jitter and shimmer were significantly increased. In the female teacher, the value of jitter, fundamental frequency variation and fundamental frequency were statiscally significant. However, the voice of male teacher showed no significant findings in the acoustic and aerodynamic studies. Conclusion : In the management of voice problems for untrained professional voice user groups, it is important to find the exact causes and patterns of voice problems, and to be individualized the management according to the causes.

  • PDF