• Title/Summary/Keyword: Voice analysis


Real time instruction classification system

  • Sang-Hoon Lee;Dong-Jin Kwon
    • International Journal of Internet, Broadcasting and Communication / v.16 no.3 / pp.212-220 / 2024
  • With the recent advancement of society, AI technology has made significant strides, especially in the fields of computer vision and voice recognition. This study introduces a system that leverages these technologies to recognize users through a camera and relay commands within a vehicle based on voice commands. The system uses the YOLO (You Only Look Once) machine learning algorithm, widely used for object and entity recognition, to identify specific users. For voice command recognition, a machine learning model based on spectrogram voice analysis is employed to identify specific commands. This design aims to enhance security and convenience by preventing unauthorized access to vehicles and IoT devices by anyone other than registered users. The system converts camera input data into YOLO inputs to determine whether the subject is a person. Additionally, it collects voice data through a microphone embedded in the device or computer and converts it into time-domain spectrogram data to be used as input for the voice recognition machine learning system. The input camera image data and voice data undergo inference through pre-trained models, enabling the recognition of simple commands within a limited space based on the inference results. This study demonstrates the feasibility of constructing a device management system within a confined space that enhances security and user convenience through a simple real-time system model. Finally, our work aims to provide practical solutions in various application fields, such as smart homes and autonomous vehicles.
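The abstract above mentions converting microphone audio into spectrogram data before inference. As a rough illustration only, the following sketch computes a magnitude spectrogram with a Hann-windowed short-time DFT. The frame length, hop size, sample rate, and test tone are assumptions chosen for the example and are not taken from the paper, which does not describe its preprocessing in this listing.

```python
import cmath
import math


def spectrogram(samples, frame_len=64, hop=32):
    """Return one list of DFT magnitudes (bins 0..frame_len//2) per frame."""
    # A Hann window reduces spectral leakage at the frame edges.
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / (frame_len - 1))
              for n in range(frame_len)]
    frames = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        frame = [samples[start + n] * window[n] for n in range(frame_len)]
        # Naive O(N^2) DFT for clarity; a real system would use an FFT library.
        mags = []
        for k in range(frame_len // 2 + 1):
            acc = sum(frame[n] * cmath.exp(-2j * math.pi * k * n / frame_len)
                      for n in range(frame_len))
            mags.append(abs(acc))
        frames.append(mags)
    return frames


# Hypothetical input: a 1000 Hz tone sampled at 8000 Hz (assumed values).
SAMPLE_RATE = 8000
tone = [math.sin(2 * math.pi * 1000 * t / SAMPLE_RATE) for t in range(256)]
spec = spectrogram(tone)

# Each bin spans SAMPLE_RATE / frame_len = 125 Hz, so the tone falls in bin 8.
peak_bin = max(range(len(spec[0])), key=lambda k: spec[0][k])
```

Each row of `spec` is one time frame and each column one frequency bin, which is the two-dimensional layout a command classifier would typically consume.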

The Change of the Voice Parameters in Long-term Sensorineural Hearing Loss Patients (장기간의 양측 감각신경성 난청환자에서 음성지표의 변화)

  • 윤자복;조경래;정상원;최정환;유영삼;우훈영;이강수
    • Journal of the Korean Society of Laryngology, Phoniatrics and Logopedics / v.12 no.2 / pp.140-144 / 2001
  • Background & Objectives: Prolonged hearing loss has been considered one of the factors with the potential to cause vocal changes. However, the quality of phonation in patients with hearing loss has not been sufficiently analyzed. The purpose of the study was to evaluate the difference in objective acoustic parameters between patients with long-term hearing impairment and a normal control group. Material & Methods: The study group comprised 20 patients (M : F = 10 : 10) with moderate or profound hearing loss (over 50 dB). The duration of hearing loss was over 1 year in all patients. All of them underwent acoustic examinations comprising electroglottography, the multidimensional voice program, and formant analysis during phonation of the vowels /a/ with a free, comfortable tone and /i/ with a voluntary high tone. The results of the acoustic examinations were compared with those of a control group composed of 20 sex- and age-matched normal-hearing subjects. Results: In the male hearing loss subjects, a significant increase was detected in pitch and shimmer during phonation of /a/ and in pitch during phonation of /i/. In addition, this group was characterized by decreased fundamental frequency during phonation of /i/. In females, there was no difference between the hearing loss group and the normal control group except a decreased formant 1 frequency. Conclusion: Long-term moderate and profound sensorineural hearing loss could affect the objective voice parameters.


Acoustic Characteristics of the Smoking Patients in the Voice Disorders (흡연환자 음성의 음향학적 특성에 관한 연구)

  • Lee, Myeong-Hee;Lee, Seung-Rho;Moon, Seung-Young;Lim, Sang-Ho;Cho, Young-Joo;Hong, Ki-Hwan
    • Journal of the Korean Society of Laryngology, Phoniatrics and Logopedics / v.19 no.2 / pp.123-127 / 2008
  • Background and Objectives: Smoking has been identified as one of the main determinants of negative changes in laryngeal histology. The purpose of this study is to investigate the voice characteristics of smokers with vocal polyps, nodules, or both, and the correlations between their voice parameters. Materials and Method: MPT, $F_0$, jitter, shimmer, and NHR of the Korean vowel /a/ from 54 smokers and 50 nonsmokers diagnosed with vocal polyps or nodules were analyzed. A Computerized Speech Lab (4400) was used for the analysis of each voice sample, and statistical analysis was done by one-way ANOVA and the Pearson correlation coefficient. Result and Conclusion: The smoker and nonsmoker groups differed in MPT, $F_0$, jitter, and shimmer, but not in NHR. The groups also differed in the correlation coefficients among MPT, $F_0$, jitter, and shimmer.
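Jitter and shimmer recur throughout these abstracts as cycle-to-cycle perturbation measures of period and amplitude. The sketch below uses the common "local" definition (mean absolute difference between consecutive cycles, as a percentage of the mean). This is an assumption for illustration: the exact formulas used by the Computerized Speech Lab or MDVP in the studies above are not stated in this listing, and the cycle values are made up.

```python
def local_perturbation(values):
    """Mean absolute difference of consecutive values, as a percent of the mean."""
    if len(values) < 2:
        raise ValueError("need at least two cycles")
    diffs = [abs(b - a) for a, b in zip(values, values[1:])]
    mean_diff = sum(diffs) / len(diffs)
    mean_value = sum(values) / len(values)
    return 100.0 * mean_diff / mean_value


# Hypothetical glottal-cycle measurements (not data from any study above).
periods_ms = [5.0, 5.1, 4.9, 5.0, 5.0]        # cycle durations in milliseconds
amplitudes = [1.00, 0.90, 1.10, 1.00, 1.00]   # peak amplitude of each cycle

jitter_percent = local_perturbation(periods_ms)    # period perturbation
shimmer_percent = local_perturbation(amplitudes)   # amplitude perturbation
mean_f0_hz = 1000.0 / (sum(periods_ms) / len(periods_ms))
```

With these made-up cycles, jitter comes out near 2%, shimmer near 10%, and mean $F_0$ near 200 Hz; the same function serves both measures because only the input sequence differs.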


Voice quality of normal elderly people after a 3oz water-swallow test: An acoustic analysis (3온스 물 삼킴검사 이후 정상 노년층의 음질 변화: 음향학적 분석)

  • Lee, Sol Hee;Choi, Hong-Shik;Choi, Seong-Hee;Kim, HyangHee
    • Phonetics and Speech Sciences / v.10 no.2 / pp.69-76 / 2018
  • The elderly are at increased risk of developing dysphagia due to aging and illnesses. The aim of the current study was to analyze, via an acoustic study, the change in the voice quality of normal elderly people after a 3oz water-swallow test. Subjects included a group of 60 normal elderly people (age: mean $\pm$ SD = $76.9 \pm 6.66$) and 60 healthy young adults (age: mean $\pm$ SD = $25.1 \pm 2.36$). Every participant produced a five-second /a/ phonation pre- and post-swallowing, and the fractioned two-second sections were analyzed using the MDVP (Multi-Dimensional Voice Program). The elderly group demonstrated a post-swallowing increase in the following acoustic parameters: fundamental frequency, fundamental frequency variation, amplitude variation, and noise in both two-second sections. However, the younger group showed an increase only in frequency-related acoustic parameters (i.e., STD) in the first two-second section. The significant changes in the post-swallowing parameters might indicate temporary irregularities in pitch and amplitude along with higher amounts of noise in the voice. The results could be attributed to water residues in the vocal fold and vocal tract, as well as a deterioration of the motor and sensory functions caused by anatomical and physiological changes that result from aging.

Shimmer Change According to Fundamental Frequency Variation of Korean Normal Adults

  • Pyo, Hwa-Young;Sim, Hyun-Sub
    • Speech Sciences / v.10 no.1 / pp.143-152 / 2003
  • The present study was performed to investigate precisely how shimmer changes with $F_{0}$ variation, and to offer suggestions for clinical application. The analysis used the fundamental frequency ($F_{0}$) and shimmer measurements from the previous voice study of 120 normal Korean adults by Pyo et al. (2002), which used three vowels, /i/, /a/, and /u/. Through the analysis of 60 female samples from the previous study, we found that $F_{0}$ was highest in /u/ and lowest in /a/, whereas shimmer was highest in /a/ and lowest in /u/. Thirty of the 60 subjects showed such an inverse relationship between $F_{0}$ and shimmer as a whole. In the vowel /a/, 47 of 60 subjects showed increased $F_{0}$ and decreased shimmer; in /i/, 32 subjects, and in /u/, 33 subjects showed the same results. A decrease in shimmer means an improvement in voice quality, so with these results we expect to answer the question of why patients with spasmodic dysphonia can improve their voice quality with higher-pitched voice production.


Robust Pitch Detection Algorithm for Pathological Voice inducing Pitch Halving and Doubling (피치 반감 배가를 유발하는 병적인 음성 분석을 위한 강인한 피치 검출 알고리즘)

  • Jang, Seung-Jin;Choi, Seong-Hee;Kim, Hyo-Min;Choi, Hong-Shik;Yoon, Young-Ro
    • Proceedings of the KIEE Conference / 2007.07a / pp.1797-1798 / 2007
  • In the field of voice pathology, diverse statistics extracted from pitch estimation are commonly used to assess voice quality. In this study, we propose a robust pitch detection algorithm that can estimate the pitch of pathological voices in benign vocal fold lesions. We also compare the proposed algorithm with three established pitch detection algorithms: autocorrelation, the simplified inverse filtering technique, and nonlinear state-space embedding. Using a database of 99 pathological voices and 30 normal voices, pitch detection errors were analyzed between pathological and normal voices and among the types of pathological voices. The gross pitch error increased somewhat for pathological voices, and increased excessively for the PDA based on nonlinear time series. In an analysis of pathological voice types classified by aperiodicity and degree of chaos, the more aperiodic and chaotic the voice, the more the pitch errors increased. Consequently, the severity of the tested voice needs to be assessed in order to obtain accurate pitch estimates.
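To make the terms in the abstract above concrete, here is a minimal sketch of the autocorrelation baseline it names, plus a helper that labels an estimate as pitch halving, pitch doubling, or another gross error. This is not the authors' proposed algorithm. The search range, the 20% tolerance (a common convention for gross pitch error), and the synthetic 200 Hz signal are all assumptions for illustration.

```python
import math


def autocorr_f0(frame, sample_rate, f0_min=60.0, f0_max=500.0):
    """Estimate F0 as sample_rate / (lag with the largest autocorrelation)."""
    min_lag = int(sample_rate / f0_max)
    max_lag = int(sample_rate / f0_min)
    best_lag, best_score = min_lag, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        # Unnormalized autocorrelation at this lag.
        score = sum(frame[n] * frame[n + lag] for n in range(len(frame) - lag))
        if score > best_score:
            best_lag, best_score = lag, score
    return sample_rate / best_lag


def classify_pitch_error(estimate_hz, reference_hz, tolerance=0.2):
    """Label an estimate relative to a reference F0 (tolerance is an assumption)."""
    ratio = estimate_hz / reference_hz
    if abs(ratio - 1.0) <= tolerance:
        return "ok"
    if abs(ratio - 0.5) <= tolerance * 0.5:
        return "halving"
    if abs(ratio - 2.0) <= tolerance * 2.0:
        return "doubling"
    return "gross error"


def gross_pitch_error_rate(estimates, references):
    """Fraction of frames whose estimate falls outside the tolerance band."""
    labels = [classify_pitch_error(e, r) for e, r in zip(estimates, references)]
    return sum(label != "ok" for label in labels) / len(labels)


# Hypothetical clean input: a 200 Hz sine sampled at 8000 Hz.
SAMPLE_RATE = 8000
frame = [math.sin(2 * math.pi * 200 * n / SAMPLE_RATE) for n in range(400)]
estimated_f0 = autocorr_f0(frame, SAMPLE_RATE)
```

On a clean periodic signal the strongest autocorrelation peak sits at the true period. Aperiodic pathological voices can shift that peak to half or twice the period, which is the halving and doubling behaviour the paper's title refers to.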


Voice Analysis before and after Radioactive Iodine Ablation in Patients with Total Thyroidectomy (적갑상선 전절제술 환자의 방사성 동위원소치료 전.후 음성의 변화에 대한 연구)

  • Hong, Ki Hwan;Seo, Eun Ji;Lee, Hyun Doo;Yoon, Yun Sub;Lim, Seok Tae
    • Journal of the Korean Society of Laryngology, Phoniatrics and Logopedics / v.24 no.1 / pp.33-40 / 2013
  • Background and Objectives: This study objectively compares and analyzes the acoustic changes in patients with total thyroidectomy before and after RI therapy. Subjects and Methods: A total of 50 patients with total thyroidectomy participated as subjects. Voice samples were obtained at the time of post-operation (Post-OP), before high-dose radioactive iodine therapy (Pre-RIT), and after high-dose radioactive iodine therapy (Post-RIT). Acoustic analysis, the maximum phonation time, and K-VHI (Korea-Voice Handicap Index) were used for subjective evaluation. Results: According to the comparison of the three periods, mFo (Hz) was significantly reduced in both vowels /a/ and /i/ as the hormone was discontinued. This can be related to the reduction in vocal range. As thyroid hormone was discontinued, Shim (%) and APQ (%) values, which are the parameters related to the degree of aggressiveness, showed a significant increase in the middle vowel /a/. As thyroid hormone was discontinued, the emotional index of the VHI (Voice Handicap Index) was significantly decreased. Conclusion: These results suggest that thyroid hormone suspension is related to increased changes in vocal intensity, an increase in noise, and a reduction in vocal range. Emotionally, these data suggest that the responsive factors of one's own voice disorders were significantly decreased in the patients with vocal handicap.


Acoustic Characteristics on the Adolescent Period Aged from 16 to 18 Years (16~18세 청소년기 음성의 음향음성학적 특성)

  • Ko, Hye-Ju;Kang, Min-Jae;Kwon, Hyuk-Jae;Choi, Yaelin;Lee, Mi-Geum;Choi, Hong-Shik
    • Phonetics and Speech Sciences / v.5 no.1 / pp.81-90 / 2013
  • During adolescence, the mutational period is characterized by changes in the laryngeal structure, the length of the vocal cords, and the tone of voice. Usually, adolescents reach the adult voice at 15 or 16, but the mutational period is sometimes delayed. Therefore, studies on the voice of adolescents aged 16 to 18, right after the mutational period, are required. Accordingly, this paper attempted to provide basic data on the normal standard for patients with voice disorders during this period by evaluating the vocal characteristics of males and females aged 16 to 18 with an objective device and by comparing and analyzing them by sex and age. The study was conducted on a total of 60 subjects, with 10 subjects of each sex at each age. The vocal analysis was conducted by MPT (Maximum Phonation Time) measurement, sustained vowels, and sentence reading. For the sustained vowel /a/, fundamental frequency ($F_0$), jitter, shimmer, and noise-to-harmonic ratio (NHR) were measured using the Multi-Dimensional Voice Program (MDVP) in the Multi-Speech program of the Computerized Speech Lab (Kay Elemetrics). For sentence reading, mean $F_0$, maximum $F_0$, and minimum $F_0$ were measured using the Real-Time Pitch (RTP) Model 5121 in the Multi-Speech program of the Computerized Speech Lab (Kay Elemetrics). As a result, there were statistically significant differences by sex in $F_0$, jitter, shimmer, mean $F_0$, maximum $F_0$, and minimum $F_0$, and statistically significant differences by age in MPT. In conclusion, the voice of adolescents aged 16 to 18 reached the maturity level of adults, but the voice quality, which can be considered on the scale of voice disorders, showed a transition to the voice of an adult during the mutational period.

Analysis of Voice Color Similarity for the development of HMM Based Emotional Text to Speech Synthesis (HMM 기반 감정 음성 합성기 개발을 위한 감정 음성 데이터의 음색 유사도 분석)

  • Min, So-Yeon;Na, Deok-Su
    • Journal of the Korea Academia-Industrial cooperation Society / v.15 no.9 / pp.5763-5768 / 2014
  • Maintaining voice color is important when a single synthesizer combines a normal voice with various emotional voices. When a synthesizer is developed using recordings in which emotions are expressed too strongly, the voice color cannot be maintained and each synthetic speech can sound like the voice of a different speaker. In this paper, speech data was recorded and the change in voice color was analyzed in order to develop an emotional HMM-based speech synthesizer. To realize a speech synthesizer, a voice was recorded and a database was built. The recording process is very important, particularly when realizing an emotional speech synthesizer. Monitoring is needed because it is quite difficult to define an emotion and maintain it at a particular level. In the realized synthesizer, a normal voice and three emotional voices (Happiness, Sadness, Anger) were used, and each emotional voice consists of two levels, High and Low. To analyze the voice color of the normal voice and the emotional voices, the average spectrum, which is the measured accumulated spectrum of vowels, was used, and the F1 (first formant) calculated from the average spectrum was compared. The voice similarity of the Low-level emotional data was higher than that of the High-level emotional data, and the proposed method enables recordings to be monitored through the change in voice similarity.

Evaluation of the Feasibility of a Voice Alarm in a Highway Work Zone (음성 경고의 도로 공사구간 적용 가능성 평가)

  • Moon, Jae-Pil;Park, Hyun-jin;Oh, Cheol
    • The Journal of The Korea Institute of Intelligent Transport Systems / v.15 no.5 / pp.83-94 / 2016
  • Providing a voice alarm to drivers approaching a work zone could be an effective alternative for mitigating the potential safety problems of the work zone. This study devised a voice alarm with a directional sound speaker and conducted a field test to evaluate the feasibility of the voice alarm at a highway work zone. During the field study, we carried out on-site driver surveys to obtain drivers' perceptions and preferences, collected approach speeds, and measured sound levels during a two-hour off-peak period on each of two days. The results showed that while the voice alarm has the potential to be an effective tool for improving safety, it also appeared to have the negative effect of noise. Further refinement of a voice alarm with a directional speaker is required to improve feasibility, and the results are expected to serve as basic data for that refinement.