• Title/Summary/Keyword: Human voice


The Recognition of Korean Syllables using Parameter Based on Principal Component Analysis (PCA 기반 파라메타를 이용한 숫자음 인식)

  • 박경훈;표창수;김창근;허강인
    • Proceedings of the Korea Institute of Convergence Signal Processing
    • /
    • 2000.12a
    • /
    • pp.181-184
    • /
    • 2000
  • A new feature-extraction method is proposed that, unlike conventional voice feature extraction, takes the statistical characteristics of the human voice into account by applying PCA (Principal Component Analysis). PCA removes redundancy in the data by finding the axis directions of greatest variance in the input dimensions. The new method is then applied to real voice recognition to assess its performance. Compared with the conventional Mel-Cepstrum feature parameters, the digit-recognition results in this paper differ in recognition rate by 0.5%. Because the proposed extraction converges faster than the conventional method, better performance is expected for word or sentence recognition, and a further improvement in recognition rate is expected when the optimum vector is chosen from the statistical characteristics of the data. (A minimal PCA sketch follows this entry.)

  • PDF
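
The PCA step described in this abstract is standard dimensionality reduction over frame-level voice features. Below is a minimal Python sketch of that idea, assuming a hypothetical feature matrix (e.g. filter-bank energies) and a 12-dimensional output chosen purely for illustration; it is not the authors' pipeline.

```python
import numpy as np

def pca_features(frames, n_components=12):
    """Project per-frame voice features onto their top principal axes.

    frames: (n_frames, n_dims) array of raw features, e.g. log filter-bank energies.
    Returns a (n_frames, n_components) array of decorrelated features.
    """
    centered = frames - frames.mean(axis=0)
    # Covariance of the input dimensions; its eigenvectors are the axes of greatest variance.
    cov = np.cov(centered, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)
    # eigh returns eigenvalues in ascending order; keep the top n_components axes.
    order = np.argsort(eigvals)[::-1][:n_components]
    return centered @ eigvecs[:, order]

# Example: 200 frames of 26-dimensional features (random stand-in data).
frames = np.random.randn(200, 26)
print(pca_features(frames).shape)  # (200, 12)
```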

Voice Frequency Synthesis using VAW-GAN based Amplitude Scaling for Emotion Transformation

  • Kwon, Hye-Jeong;Kim, Min-Jeong;Baek, Ji-Won;Chung, Kyungyong
    • KSII Transactions on Internet and Information Systems (TIIS)
    • /
    • v.16 no.2
    • /
    • pp.713-725
    • /
    • 2022
  • Artificial intelligence mostly does not show any definite change in emotion, which makes it hard to demonstrate empathy in communication with humans. If frequency modification is applied to neutral emotion, or a different emotional frequency is added to it, artificial intelligence with emotions becomes possible. This study proposes emotion conversion using voice frequency synthesis based on a Generative Adversarial Network (GAN). The proposed method extracts a frequency from the speech data of twenty-four actors and actresses; that is, it extracts the voice features of their different emotions, preserves the linguistic features, and converts only the emotion. It then generates a frequency with a variational auto-encoding Wasserstein generative adversarial network (VAW-GAN) to shape prosody while preserving linguistic information, which makes it possible to learn speech features in parallel. Finally, it corrects the frequency with amplitude scaling: using spectral conversion on a logarithmic scale, the frequency is adjusted to account for human hearing characteristics. The proposed technique therefore provides emotion conversion of speech so that emotions can be expressed in line with artificially generated voices or speech.
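
The final correction step works on a logarithmic spectral scale. A minimal sketch of amplitude scaling in the log (dB) domain, assuming an STFT representation and a uniform gain rather than the paper's learned, emotion-dependent correction:

```python
import numpy as np
from scipy.signal import stft, istft

def scale_amplitude_log(signal, sr, gain_db=3.0, nperseg=1024):
    """Scale the magnitude spectrum on a logarithmic (dB) scale, keeping the phase.

    Working in dB roughly follows the ear's logarithmic response to level;
    the uniform gain_db here is a stand-in for a learned correction.
    """
    _, _, Z = stft(signal, fs=sr, nperseg=nperseg)
    mag, phase = np.abs(Z), np.angle(Z)
    log_mag = 20.0 * np.log10(mag + 1e-10) + gain_db   # scale in the dB domain
    new_mag = 10.0 ** (log_mag / 20.0)
    _, y = istft(new_mag * np.exp(1j * phase), fs=sr, nperseg=nperseg)
    return y

# Example on one second of synthetic, speech-like noise.
sr = 16000
y = scale_amplitude_log(np.random.randn(sr), sr)
```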

Formant Frequency as a Measure of Physical Fatigue

  • Ha, Wook Hyun;Kim, Hong Tae;Park, Sung Ha
    • Journal of the Ergonomics Society of Korea
    • /
    • v.32 no.1
    • /
    • pp.139-144
    • /
    • 2013
  • Objective: The current study investigated a non-obtrusive measure for detecting physical fatigue based on the analysis of formant frequencies of the human voice. Background: Fatigue has been considered a main cause of industrial and traffic accidents; it is therefore critical to detect workers' fatigue for accident prevention. Method: After running exercises on a treadmill, participants were instructed to read a sentence, and their voices were recorded under four different physical fatigue levels. The Korean vowels "아", "어", "오", "우", and "이" in the recorded voice were then used to collect formant 1 frequencies. Results: Separate ANOVAs showed a significant main effect of physical fatigue on the formant 1 frequency of "아", "어", and "이". Furthermore, post-hoc comparisons revealed that the formant 1 frequency of "아" was most sensitive to the physical fatigue levels employed in this experiment. Conclusion: Formant 1 frequencies of some vowels significantly decrease as the physical fatigue level increases. Application: Potential applications of this study include the development of a measure of physical fatigue that is free from sensor attachment and requires little preparation.
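
Formant 1 is commonly estimated from the resonances of an LPC model fitted to a short voiced frame. A hedged sketch of that approach (LPC root-finding via librosa; the frame length, LPC order, and 90 Hz floor are illustrative assumptions, not the study's settings):

```python
import numpy as np
import librosa

def first_formant(frame, sr, order=10):
    """Estimate the first formant (F1) of a voiced frame via LPC root-finding."""
    # Pre-emphasis and windowing sharpen the spectral-envelope estimate.
    emphasized = np.append(frame[0], frame[1:] - 0.97 * frame[:-1])
    windowed = emphasized * np.hamming(len(emphasized))
    a = librosa.lpc(windowed, order=order)
    roots = [r for r in np.roots(a) if np.imag(r) > 0]    # positive-frequency roots only
    freqs = sorted(np.angle(roots) * sr / (2 * np.pi))
    for f in freqs:                                       # lowest resonance above ~90 Hz
        if f > 90:
            return f
    return None

# Example: a synthetic 30 ms "vowel" with a resonance near 700 Hz plus a little noise.
sr = 16000
t = np.arange(int(0.03 * sr)) / sr
frame = np.sin(2 * np.pi * 700 * t) * np.exp(-30 * t) + 0.01 * np.random.randn(len(t))
print(first_formant(frame, sr))
```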

Comparison and Analysis of Response of Premature Infants to Auditory Stimulus (일변량 분산 분석과 이변량 시계열 분석을 이용한 미숙아의 목소리 자극에 대한 심박동수와 호흡수 반응의 비교)

  • Lee, Hye-Jung
    • Child Health Nursing Research
    • /
    • v.15 no.3
    • /
    • pp.261-270
    • /
    • 2009
  • Purpose: The purpose of this study was to compare the results of one-way ANOVA with those of cross-correlation time series analysis in order to evaluate the physiologic responses of premature infants to human voices. Methods: Four premature infants born before 32 weeks gestational age were included in the study. The Gould 4000TA Recording System recorded each preterm infant's heart and respiratory rate while they listened to a pre-recorded voice. Each infant listened to both male and female voices (1 min each) at each testing session. Results: The results of the one-way ANOVA and the cross-correlation time series analysis of the heart and respiratory rate data were not consistent for some of the premature infants. The cross-correlation time series analysis revealed that the responses of the premature infants to vocal stimulation occurred a varying number of seconds after the stimulus was presented and lasted for over 20-30 sec. Conclusion: The results indicate that a time series analysis can provide more detailed information on the rapidly changing physiologic status of premature infants exposed to an auditory stimulus. In addition, the results provide insight into the auditory responsiveness of premature infants to a naturally occurring sound, the human voice, in the neonatal intensive care unit. (A cross-correlation sketch follows this entry.)

  • PDF
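
A lagged cross-correlation between a stimulus indicator and a once-per-second physiologic series can be sketched as below; the synthetic data, lag range, and sampling assumptions are for illustration only and are not taken from the study.

```python
import numpy as np

def lagged_cross_correlation(stimulus, response, max_lag=30):
    """Pearson correlation between stimulus and response at lags 0..max_lag seconds.

    stimulus: 0/1 indicator of whether the voice was playing at each second.
    response: heart rate (or respiratory rate) sampled once per second.
    Returns a list of (lag, r) pairs.
    """
    out = []
    for lag in range(max_lag + 1):
        n = len(stimulus) - lag
        out.append((lag, np.corrcoef(stimulus[:n], response[lag:lag + n])[0, 1]))
    return out

# Example: heart rate drifts up ~10 s after a 60 s voice stimulus begins (synthetic data).
seconds = 300
stimulus = np.zeros(seconds)
stimulus[60:120] = 1.0
response = 150 + np.random.randn(seconds)
response[70:140] += 5.0
peak = max(lagged_cross_correlation(stimulus, response), key=lambda p: p[1])
print(f"peak correlation {peak[1]:.2f} at lag {peak[0]} s")
```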

Implementation of Speech Recognizer using DSP(Digital Signal Processor) (DSP를 이용한 음성인식기 구현)

  • 임창환;문철홍;전경남
    • Proceedings of the IEEK Conference
    • /
    • 2000.11d
    • /
    • pp.187-190
    • /
    • 2000
  • In this paper, a speech recognizer system that operates separately from a personal computer is implemented. By using a DSP, the work aims to extend voice recognition beyond the PC, where it has been limited by the amount of data and computation involved. To this end, the system uses a real-time endpoint detector and requires no additional device between the human and the system. For the feature vectors, it detects endpoints and voiced speech from absolute energy and the zero-crossing rate (ZCR), uses 12 differential cepstrum coefficients derived from LPC, and applies a method that compensates for the pattern-separation process and the limitations of the pre-computed reference patterns. (A rough endpoint-detection sketch follows this entry.)

  • PDF
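
Endpoint detection from absolute energy and zero-crossing rate can be sketched roughly as follows; the frame length, thresholds, and noise-floor estimate are illustrative assumptions, not the values used in the paper.

```python
import numpy as np

def detect_endpoints(signal, sr, frame_ms=20, energy_ratio=4.0, zcr_thresh=0.25):
    """Locate the start and end of speech from per-frame energy and zero-crossing rate.

    Frames whose energy rises well above the noise floor and whose ZCR is low
    (typical of voiced speech) are marked as speech; the first and last such
    frames bound the utterance.
    """
    frame_len = int(sr * frame_ms / 1000)
    n_frames = len(signal) // frame_len
    frames = signal[: n_frames * frame_len].reshape(n_frames, frame_len)

    energy = np.sum(np.abs(frames), axis=1)                                # absolute energy
    zcr = np.mean(np.abs(np.diff(np.sign(frames), axis=1)) > 0, axis=1)    # zero-crossing rate

    noise_floor = np.median(energy[:5])                                    # assume leading silence
    speech = (energy > energy_ratio * noise_floor) & (zcr < zcr_thresh)
    if not speech.any():
        return None
    idx = np.where(speech)[0]
    return idx[0] * frame_len, (idx[-1] + 1) * frame_len                   # sample indices

# Example: 0.5 s of low-level noise, 1 s of a 200 Hz tone, 0.5 s of noise (synthetic).
sr = 8000
sig = np.concatenate([0.01 * np.random.randn(sr // 2),
                      np.sin(2 * np.pi * 200 * np.arange(sr) / sr),
                      0.01 * np.random.randn(sr // 2)])
print(detect_endpoints(sig, sr))
```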

Design of Intelligent Emotion Recognition Model (지능형 감정인식 모델설계)

  • 김이곤;김서영;하종필
    • Proceedings of the Korean Institute of Intelligent Systems Conference
    • /
    • 2001.12a
    • /
    • pp.46-50
    • /
    • 2001
  • Voice is one of the most efficient communication media, and it carries several kinds of information about the speaker, the context, emotion, and so on. Human emotion is expressed in speech, gesture, and physiological phenomena (breathing, pulse rate, etc.). In this paper, a method for recognizing emotion from a speaker's voice signals is presented and simulated using a neuro-fuzzy model. (A toy fuzzy-inference sketch follows this entry.)

  • PDF
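
As a toy illustration of fuzzy inference over prosodic features (not the authors' neuro-fuzzy model, whose membership functions and rule weights would be learned from labelled speech rather than set by hand):

```python
import numpy as np

def gaussian_mf(x, center, width):
    """Gaussian fuzzy membership of x in a set centered at `center`."""
    return np.exp(-0.5 * ((x - center) / width) ** 2)

def fuzzy_emotion(mean_pitch_hz, mean_energy):
    """Hand-set fuzzy rules over two prosodic features (illustrative only).

    High pitch and high energy suggest an excited emotion, low pitch and low
    energy suggest sadness, and mid values suggest a neutral state. Rule firing
    strength is the minimum of the antecedent memberships.
    """
    scores = {
        "excited": min(gaussian_mf(mean_pitch_hz, 280.0, 60.0),
                       gaussian_mf(mean_energy, 0.8, 0.2)),
        "sad":     min(gaussian_mf(mean_pitch_hz, 140.0, 40.0),
                       gaussian_mf(mean_energy, 0.2, 0.15)),
        "neutral": min(gaussian_mf(mean_pitch_hz, 200.0, 50.0),
                       gaussian_mf(mean_energy, 0.5, 0.2)),
    }
    return max(scores, key=scores.get), scores

print(fuzzy_emotion(mean_pitch_hz=300.0, mean_energy=0.9))   # -> ("excited", ...)
```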

A research on man-robot cooperative interaction system

  • Ishii, Masaru
    • Institute of Control, Robotics and Systems: Conference Proceedings
    • /
    • 1992.10b
    • /
    • pp.555-557
    • /
    • 1992
  • Recently, the realization of an intelligent cooperative interaction system between humans and robot systems has become necessary. In this paper, HyperCard with voice control is used for such a system because of its easy handling and excellent human interface. Clicking buttons in the HyperCard stack with a mouse or issuing a voice command controls each joint of the robot system. A robot teaching operation, grasping a bin and pouring the liquid in it into a cup, is carried out. This robot teaching method using HyperCard provides a foundation for realizing a user-friendly cooperative interaction system.

  • PDF

Design of Emotion Recognition Using Speech Signals (음성신호를 이용한 감정인식 모델설계)

  • 김이곤;김서영;하종필
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference
    • /
    • 2001.10a
    • /
    • pp.265-270
    • /
    • 2001
  • Voice is one of the most efficient communication media, and it carries several kinds of information about the speaker, the context, emotion, and so on. Human emotion is expressed in speech, gesture, and physiological phenomena (breathing, pulse rate, etc.). In this paper, a method for recognizing emotion from a speaker's voice signals is presented and simulated using a neuro-fuzzy model.

  • PDF

Application and Technology of Voice Synthesis Engine for Music Production (음악제작을 위한 음성합성엔진의 활용과 기술)

  • Park, Byung-Kyu
    • Journal of Digital Contents Society
    • /
    • v.11 no.2
    • /
    • pp.235-242
    • /
    • 2010
  • Unlike past instruments that merely synthesized sounds and tones, voice synthesis engines for music production have reached the level of creating music as if actual artists were singing. They use samples of human voices, naturally connected at the level of individual phonemes across the frequency range. Voice synthesis engines are not limited to music production; they are changing the cultural paradigm through derivative creations of new music types, including character music concerts, media productions, albums, and mobile services. Current voice synthesis engine technology lets users enter pitch, lyrics, and musical expression parameters through a score editor, and the engine then mixes and connects voice samples drawn from a database so that the result sings. The new music types arising from this development of computer music have had a large cultural impact. Accordingly, this paper examines specific case studies and the underlying synthesis technologies so that users can understand voice synthesis engines more easily, contributing to a greater variety of music production.
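
The score-editor workflow described here (selecting database samples by phoneme, shifting them to the written pitch, and joining them) can be sketched crudely as below. The sample database, crossfade length, and naive resampling pitch shift are illustrative assumptions, not how a production engine works.

```python
import numpy as np

def shift_pitch(sample, ratio):
    """Crude pitch shift by resampling (also changes duration; real engines use PSOLA or vocoders)."""
    idx = np.arange(0, len(sample), ratio)
    return np.interp(idx, np.arange(len(sample)), sample)

def sing(score, samples, sr=44100, fade_ms=10):
    """Concatenate database samples for each (phoneme, target_hz) note with short crossfades.

    score: list of (phoneme, target_hz) pairs taken from the score editor.
    samples: dict mapping phoneme -> (recorded_hz, waveform) from the voice database.
    """
    fade = int(sr * fade_ms / 1000)
    out = np.zeros(0)
    for phoneme, target_hz in score:
        recorded_hz, wave = samples[phoneme]
        note = shift_pitch(wave, target_hz / recorded_hz)
        if len(out) >= fade and len(note) >= fade:          # crossfade at the joint
            ramp = np.linspace(0.0, 1.0, fade)
            out[-fade:] = out[-fade:] * (1 - ramp) + note[:fade] * ramp
            note = note[fade:]
        out = np.concatenate([out, note])
    return out

# Example with two made-up phoneme samples recorded at 220 Hz, sung at C4 and E4.
sr = 44100
t = np.arange(int(0.3 * sr)) / sr
samples = {"a": (220.0, np.sin(2 * np.pi * 220 * t)),
           "o": (220.0, np.sin(2 * np.pi * 220 * t))}
voice = sing([("a", 261.6), ("o", 329.6)], samples, sr)
```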

A Study On Intelligent Robot Control Based On Voice Recognition For Smart FA (스마트 FA를 위한 음성인식 지능로봇제어에 관한 연구)

  • Sim, H.S.;Kim, M.S.;Choi, M.H.;Bae, H.Y.;Kim, H.J.;Kim, D.B.;Han, S.H.
    • Journal of the Korean Society of Industry Convergence
    • /
    • v.21 no.2
    • /
    • pp.87-93
    • /
    • 2018
  • This study proposes a new approach to implementing intelligent robot control based on voice recognition for smart factory automation. Since humans usually communicate with each other by voice, it is very convenient to command humanoid robots or other robot systems by voice, and a great deal of research on voice recognition systems has been performed for this purpose. The Hidden Markov Model is a robust statistical methodology for efficient voice recognition in noisy environments and has been tested in a wide range of applications. Prediction by Partial Matching, a prediction approach traditionally applied to text compression and coding, is a finite-context statistical modeling technique that predicts the next characters from context; it has shown great potential for novel solutions to several language modeling problems in speech recognition. The reliability of the voice recognition was illustrated by experiments on a humanoid robot with 26 joints, aimed at application to the manufacturing process.
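
The HMM scoring step for isolated command words can be sketched with a discrete-observation forward algorithm; the toy two-state models and three-symbol codebook below are hand-set assumptions for illustration, not the paper's trained recognizer (and PPM-based language modeling is omitted entirely).

```python
import numpy as np

def log_forward(log_pi, log_A, log_B, obs):
    """Forward algorithm in the log domain: returns log P(obs | HMM).

    log_pi: (S,) initial-state log probabilities, log_A: (S, S) transition log
    probabilities, log_B: (S, V) emission log probabilities over a discrete
    codebook, obs: sequence of symbol indices (e.g. vector-quantized frames).
    """
    alpha = log_pi + log_B[:, obs[0]]
    for o in obs[1:]:
        alpha = log_B[:, o] + np.array(
            [np.logaddexp.reduce(alpha + log_A[:, j]) for j in range(len(log_pi))])
    return np.logaddexp.reduce(alpha)

def recognize(obs, word_models):
    """Score an utterance against one HMM per command word and pick the best."""
    scores = {word: log_forward(*model, obs) for word, model in word_models.items()}
    return max(scores, key=scores.get)

# Toy example: two 2-state left-to-right HMMs over a 3-symbol codebook (hand-set, not trained).
ln = np.log
models = {
    "start": (ln([0.9, 0.1]),
              ln([[0.7, 0.3], [1e-12, 1.0]]),
              ln([[0.8, 0.1, 0.1], [0.1, 0.1, 0.8]])),
    "stop":  (ln([0.9, 0.1]),
              ln([[0.7, 0.3], [1e-12, 1.0]]),
              ln([[0.1, 0.8, 0.1], [0.1, 0.1, 0.8]])),
}
print(recognize([0, 0, 2, 2], models))   # scores "start" higher
```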