• Title/Abstract/Keywords: voice frequency

Search results: 545 items (processing time: 0.025 s)

The Effect of a Portable Voice Feedback Device on the Hyperfunctional Voice Behaviors of Children with Vocal Nodules

  • 이무경
    • Phonetics and Speech Sciences / Vol. 1, No. 2 / pp.31-36 / 2009
  • This study examined the effects of a portable voice feedback device on the hyperfunctional voice behaviors of children with vocal nodules when they wore the device in their daily lives. The device could set fundamental frequency and intensity at optimal levels for each subject, and it produced an audible alarm for inappropriate hyperfunctional voices beyond the preset levels. In addition, the device recorded the frequency of hyperfunctional voice behaviors, so users were able to chart their number of hyperfunctional voice behaviors per day. After the subjects wore the device for 12 weeks, their frequency of hyperfunctional voice behaviors decreased significantly (p < .01). In particular, the frequency declined significantly from the first to the fourth week.
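
The alarm-and-count behavior described in this abstract can be sketched as follows. This is a minimal illustration, not the device's actual firmware: the `Frame` type, the function name, the threshold values, and the rule that exceeding either preset level counts as hyperfunctional are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Frame:
    """One analysis frame of speech (hypothetical structure)."""
    f0_hz: float         # estimated fundamental frequency
    intensity_db: float  # estimated intensity


def count_hyperfunctional_events(frames, max_f0_hz, max_intensity_db):
    """Count runs of consecutive frames that exceed either preset level.

    Each run counts once, which mirrors one audible alarm per episode
    (an assumption; the abstract does not say how episodes are delimited).
    """
    events = 0
    in_event = False
    for frame in frames:
        exceeded = (frame.f0_hz > max_f0_hz
                    or frame.intensity_db > max_intensity_db)
        if exceeded and not in_event:
            events += 1  # a real device would sound its alarm here
        in_event = exceeded
    return events


# Illustrative data: one episode of high pitch, one of high intensity.
frames = [Frame(220, 60), Frame(310, 62), Frame(320, 80),
          Frame(230, 61), Frame(225, 85)]
daily_count = count_hyperfunctional_events(
    frames, max_f0_hz=300, max_intensity_db=75)
print(daily_count)  # -> 2
```

A per-day chart like the one the abstract mentions would simply store one such count for each day of wear.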

Voice transformation for HTS using correlation between fundamental frequency and vocal tract length

  • 유효근;김영관;서영주;김회린
    • Phonetics and Speech Sciences / Vol. 9, No. 1 / pp.41-47 / 2017
  • The main advantage of statistical parametric speech synthesis is its flexibility in changing voice characteristics. A personalized text-to-speech (TTS) system can be implemented by combining a speech synthesis system and a voice transformation system, and it is widely used in many application areas. It is known that the fundamental frequency and the spectral envelope of a speech signal can be modified independently to convert voice characteristics. It is also important to maintain the naturalness of the transformed speech. In this paper, a speech synthesis system based on the Hidden Markov Model (HMM-based speech synthesis, HTS) using the STRAIGHT vocoder is constructed, and voice transformation is conducted by modifying the fundamental frequency and spectral envelope. The fundamental frequency is transformed by scaling, and the spectral envelope is transformed through frequency warping to control the speaker's vocal tract length. In particular, this study proposes a voice transformation method using the correlation between fundamental frequency and vocal tract length. Subjective evaluations were conducted to assess preference and mean opinion scores (MOS) for the naturalness of the synthetic speech. Experimental results showed that the proposed voice transformation method achieved higher preference than baseline systems while maintaining the naturalness of the speech quality.

Voice Boosting Filter Design in Frequency Domain for Relief of Husky Voice

  • 김현태;이상협
    • Journal of Korea Multimedia Society / Vol. 19, No. 12 / pp.1919-1926 / 2016
  • The number of people who complain of pain due to voice problems such as vocal cord nodules is increasing year by year. A changed voice can cause discomfort or inconvenience to colleagues during conversation. In this paper, we propose a way to reduce this discomfort by improving a husky voice during conversation. A voice boosting filter (VBF) is first designed to improve husky voices. This filter may emphasize the formant frequency components more than the frequency components around them, because their values are relatively greater than those at other frequencies. The system uses a fixed-point DSP chipset, the TMS320F2812, with an operating frequency of 150 MHz. It was implemented in a compact, portable form measuring 2.5 cm × 10 cm. In tests using three husky voices with several types of statements, the system was satisfactory in processing speed and sound quality improvement.

A Study on the Acoustic Characteristics of Sexy Voice

  • 정옥란;조성미
    • Malsori (Journal of the Korean Society of Phonetic Sciences and Speech Technology) / No. 57 / pp.73-84 / 2006
  • The purpose of this study was to explore the acoustic characteristics of a sexy voice. We measured acoustic parameters (fundamental frequency, jitter, shimmer, and nasalance) of a sustained vowel produced by 40 actors (20 males and 20 females) and 40 non-actors (20 males and 20 females). Digital audio recordings of the sustained vowel /a/ were made for acoustic analyses using Praat (version 4.1.9) and Nasal View (version 4.5). Twenty voice pathologists participated in the listening experiment and judged the degree of sexiness on a 7-point scale. The results showed that fundamental frequency, shimmer, and nasalance differed significantly between actors and non-actors. The acoustic parameters of a sexy voice matched the perceptual aspects reported in a previous study: low fundamental frequency corresponded to low pitch, and high shimmer to a husky voice. On the other hand, the nasalance score did not match that of the previous study: decreased nasalance received a higher score on the sexiness scale judged by the listeners. It would be desirable to study voice quality by analyzing and controlling more acoustic and auditory parameters for practical applications in the future.
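
Two of the parameters named in this abstract, jitter and shimmer, are commonly computed as local perturbation ratios: the mean absolute difference between consecutive glottal periods (or peak amplitudes), divided by their mean. The sketch below follows that common definition; it is not taken from the paper, and the period and amplitude values are invented for illustration.

```python
def local_perturbation(values):
    """Mean absolute difference of consecutive values, divided by the mean.

    Applied to glottal periods this gives local jitter; applied to
    per-cycle peak amplitudes it gives local shimmer.
    """
    if len(values) < 2:
        raise ValueError("need at least two cycles")
    diffs = [abs(b - a) for a, b in zip(values, values[1:])]
    return (sum(diffs) / len(diffs)) / (sum(values) / len(values))


# Invented measurements from four consecutive cycles of a sustained vowel.
periods_s = [0.0100, 0.0102, 0.0099, 0.0101]  # cycle durations in seconds
amplitudes = [0.80, 0.78, 0.82, 0.80]         # per-cycle peak amplitudes

jitter = local_perturbation(periods_s)
shimmer = local_perturbation(amplitudes)
mean_f0_hz = len(periods_s) / sum(periods_s)  # reciprocal of the mean period

print(f"jitter = {jitter:.2%}, shimmer = {shimmer:.2%}, F0 = {mean_f0_hz:.1f} Hz")
```

Extracting the periods and amplitudes from a recording is the hard part and is left to a tool such as Praat, which the study used.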

Bilingual Voice Conversion Using Frequency Warping on Formant Space

  • 채의근;윤영선;정진만;은성배
    • Phonetics and Speech Sciences / Vol. 6, No. 4 / pp.133-139 / 2014
  • This paper describes several approaches to transforming one speaker's individuality into another's using frequency warping between bilingual formant frequencies in different language environments. The proposed methods are simple and intuitive voice conversion algorithms that do not use training data between different languages. The approaches find the warping function from the source speaker's frequency to the target speaker's frequency in formant space. The formant space comprises four representative monophthongs for each language. The warping functions can be represented by piecewise linear equations or an inverse matrix. The features used are pure frequency components, including magnitudes, phases, and line spectral frequencies (LSF). The experiments show that the LSF-based voice conversion methods perform better than the other methods.
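
A piecewise linear warping function of the kind mentioned in this abstract can be sketched as a mapping that passes through paired source and target formant frequencies. This is an illustrative assumption, not the authors' algorithm: the function name, the anchoring of 0 Hz and the Nyquist frequency to themselves, and the formant values are all hypothetical.

```python
from bisect import bisect_right


def make_piecewise_linear_warp(source_hz, target_hz, nyquist_hz):
    """Build a warp f -> f' that maps each source anchor to its target anchor.

    The anchors must be paired and strictly increasing. 0 Hz and the
    Nyquist frequency map to themselves (an assumption of this sketch).
    """
    xs = [0.0, *source_hz, nyquist_hz]
    ys = [0.0, *target_hz, nyquist_hz]
    if len(xs) != len(ys):
        raise ValueError("source and target need the same number of anchors")
    for seq in (xs, ys):
        if any(b <= a for a, b in zip(seq, seq[1:])):
            raise ValueError("anchors must be strictly increasing")

    def warp(f_hz):
        f = min(max(f_hz, 0.0), nyquist_hz)            # clamp to valid range
        i = min(bisect_right(xs, f) - 1, len(xs) - 2)  # segment containing f
        slope = (ys[i + 1] - ys[i]) / (xs[i + 1] - xs[i])
        return ys[i] + (f - xs[i]) * slope

    return warp


# Hypothetical formant anchors (Hz) for a source and a target speaker.
warp = make_piecewise_linear_warp(
    source_hz=[500.0, 1500.0, 2500.0],
    target_hz=[600.0, 1700.0, 2800.0],
    nyquist_hz=8000.0,
)
print(warp(500.0))   # -> 600.0 (an anchor maps exactly to its target)
print(warp(1000.0))  # -> 1150.0 (linear interpolation between anchors)
```

Applying such a function to the frequency axis of each spectral frame shifts the source speaker's formants toward the target's without requiring parallel training data.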

Characteristics of Cow's Voices in Time and Frequency domains for Recognition

  • Ikeda, Yoshio;Ishii, Y.
    • Agricultural and Biosystems Engineering / Vol. 2, No. 1 / pp.15-23 / 2001
  • On the assumption that the voices of cows are produced by a linear prediction filter, we characterized the cows' voices. The order of this filter was determined by examining the voice characteristics in both the time and frequency domains. The proposed order of the linear prediction filter for modeling the voice production of the cow is 15. The characteristics of the amplitude envelope of the voice signal were investigated by analyzing the sequence of the short-time variance in both the time and frequency domains, and new parameters were defined. One of the coefficients of the linear prediction filter generating the voice signal, the fundamental frequency, the slope of the straight line regressed from the log-log spectra of the short-time variance, and the coefficients of the linear prediction filter generating the sequence of the short-time variance of the voice signal can differentiate the two cows.

Voice Frequency Synthesis using VAW-GAN based Amplitude Scaling for Emotion Transformation

  • Kwon, Hye-Jeong;Kim, Min-Jeong;Baek, Ji-Won;Chung, Kyungyong
    • KSII Transactions on Internet and Information Systems (TIIS) / Vol. 16, No. 2 / pp.713-725 / 2022
  • For the most part, artificial intelligence does not show definite changes in emotion, which makes it hard to demonstrate empathy in communication with humans. If frequency modification is applied to neutral emotions, or if a different emotional frequency is added to them, it is possible to develop artificial intelligence with emotions. This study proposes emotion conversion using Generative Adversarial Network (GAN) based voice frequency synthesis. The proposed method extracts a frequency from the speech data of twenty-four actors and actresses. In other words, it extracts the voice features of their different emotions, preserves linguistic features, and converts only the emotions. After that, it generates a frequency with a variational auto-encoding Wasserstein generative adversarial network (VAW-GAN) in order to model prosody and preserve linguistic information, which makes it possible to learn speech features in parallel. Finally, it corrects the frequency by employing Amplitude Scaling. Using spectral conversion on a logarithmic scale, the result is converted into a frequency that takes human hearing features into account. Accordingly, the proposed technique provides emotion conversion of speech in order to express emotions in artificially generated voices or speech.

An Aerodynamic and Acoustic Analysis of the Breathy Voice of Thyroidectomy Patients

  • 강영애;윤규철;김재옥
    • Phonetics and Speech Sciences / Vol. 4, No. 2 / pp.95-104 / 2012
  • Thyroidectomy patients may have vocal paralysis or paresis, resulting in a breathy voice. The aim of this study was to investigate the aerodynamic and acoustic characteristics of a breathy voice in thyroidectomy patients. Thirty-five subjects who had vocal paralysis after thyroidectomy participated in this study. Based on perceptual judgements by three speech pathologists and one phonetic scholar, the subjects were divided into two groups: a breathy voice group (n = 21) and a non-breathy voice group (n = 14). Aerodynamic analysis was conducted with three tasks (Voicing Efficiency, Maximum Sustained Phonation, Vital Capacity), and acoustic analysis was performed during the Maximum Sustained Phonation task. The breathy voice group had significantly higher subglottal pressure and more pathological voice characteristics than the non-breathy voice group. Logistic regression on the aerodynamic measures showed 94.1% classification accuracy; the predictor parameters for breathiness were maximum sound pressure level, sound pressure level range, and phonation time from the Maximum Sustained Phonation task, and pitch range, peak air pressure, and mean peak air pressure from the Voicing Efficiency task. The classification accuracy of the acoustic logistic regression was 88.6%, with five frequency perturbation parameters as predictors. Vocal paralysis creates air turbulence at the glottis, which causes frequency-related parameters to fluctuate and increases aspiration in high-frequency regions. These changes determine perceptual breathiness.

Voice Recognition Based on Adaptive MFCC and Neural Network

  • 배현수;이석규
    • IEMEK Journal of Embedded Systems and Applications / Vol. 5, No. 2 / pp.57-66 / 2010
  • In this paper, we propose an enhanced voice recognition algorithm using adaptive MFCC (Mel Frequency Cepstral Coefficients) and a neural network. Although extracting voice data from the raw data is very important for improving the voice recognition rate, conventional algorithms tend to degrade the voice data when they eliminate noise within a particular frequency band. Unlike the conventional MFCC, the proposed algorithm applies larger weights to specified frequency regions and uses a non-overlapping filterbank to improve the recognition rate without degrading the voice data. In simulation results, the proposed algorithm performs better than the conventional MFCC because it is robust to variation in the environment.

Acoustic and Stroboscopic Characteristics in Teachers, Clergies and Telephone Operators

  • 진성민;박상욱;이정우;이경철;이용배
    • Journal of the Korean Society of Laryngology, Phoniatrics and Logopedics / Vol. 9, No. 1 / pp.53-58 / 1998
  • Objectives: To compare the voice quality and voice problems of untrained professional voice user groups with those of a normal control group without voice problems. Materials and Methods: The sustained vowel sounds of 13 male and 36 female teachers, 46 clergy, 15 telephone operators, and 40 normal male and 20 normal female persons were analyzed using videostroboscopy and an acoustic analyzer. Together with these analyses, a questionnaire on risk factors for current and past voice problems was given to the subjects. Results: The most common symptom in the subject groups was voice fatigue. In stroboscopic examination, the professional voice user groups showed functional voice disorder findings regardless of the intensity of voice use. In clergy and teachers who used a loud voice, vocal polyps, vocal nodules, and hyperfunction of the laryngeal muscles were frequently observed. In clergy and telephone operators, jitter and shimmer were significantly increased. In female teachers, the values of jitter, fundamental frequency variation, and fundamental frequency were statistically significant. However, the voices of male teachers showed no significant findings in the acoustic and aerodynamic studies. Conclusion: In the management of voice problems for untrained professional voice user groups, it is important to find the exact causes and patterns of the voice problems and to individualize management according to those causes.