• Title/Summary/Keyword: Voice Analysis

Laryngeal Cancer Screening using Cepstral Parameters (켑스트럼 파라미터를 이용한 후두암 검진)

  • 이원범;전경명;권순복;전계록;김수미;김형순;양병곤;조철우;왕수건
    • Journal of the Korean Society of Laryngology, Phoniatrics and Logopedics / v.14 no.2 / pp.110-116 / 2003
  • Background and Objectives: Laryngeal cancer discrimination using voice signals is a non-invasive method that allows rapid and simple examination without causing discomfort to patients. If appropriate analysis parameters and classifiers are developed, this method can be used effectively in various applications, including telemedicine. This study examines the voice analysis parameters used for laryngeal disease discrimination in order to help discriminate laryngeal diseases by voice signal analysis. The study also evaluates the laryngeal cancer discrimination performance of a Gaussian mixture model (GMM) classifier based on statistical modelling of the voice analysis parameters. Materials and Methods: The Multi-Dimensional Voice Program (MDVP) parameters, which have been widely used to analyze the voices of laryngeal cancer patients, sometimes fail on voices whose periodicity is seriously damaged. Accordingly, a new method is needed that can reliably analyze voice signals that the MDVP cannot. For the laryngeal cancer discrimination experiments, the authors used three types of voices collected at the Department of Otorhinolaryngology, Pusan National University Hospital: 50 voices of normal males, 50 voices of males with benign laryngeal diseases, and 105 voices of males with laryngeal cancer. The experiment also included 11 voices of males with laryngeal cancer that could not be analyzed by the MDVP. Only the monosyllabic vowel /a/ was used as voice data. Because there were only 11 such voices, they were used only for discrimination. This study examined the linear predictive cepstral coefficients (LPCC) and the mel-frequency cepstral coefficients (MFCC), the two major cepstrum analysis methods in the area of acoustic recognition.
Results: The results showed that the mel frequency scaling process was effective in acoustic recognition but not useful for laryngeal cancer discrimination. Accordingly, the linear frequency cepstral coefficients (LFCC), which omit the mel frequency scaling from the MFCC, were introduced. The LFCC discriminated laryngeal cancer better than the MFCC. Conclusion: The parameters applied in this study could accurately discriminate even terminal laryngeal cancer in which periodicity is disturbed. Future studies on various classification algorithms and on parameters representing the pathophysiology of the vocal cords are expected to make it possible to discriminate benign laryngeal diseases as well as laryngeal cancer.
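To make the difference between MFCC and LFCC concrete, here is a minimal sketch of linear-frequency cepstral coefficients for a single frame. It is not the authors' implementation: the frame length, window, filter count, number of coefficients, and the synthetic test tone are all assumptions chosen for illustration. An MFCC pipeline would differ only in spacing the filter edges on the mel scale instead of evenly in Hz.

```python
import cmath
import math


def power_spectrum(frame):
    """Hamming-window the frame and return |DFT|^2 for bins 0..N/2 (naive O(N^2) DFT)."""
    n = len(frame)
    windowed = [x * (0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)))
                for i, x in enumerate(frame)]
    spectrum = []
    for k in range(n // 2 + 1):
        acc = sum(windowed[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        spectrum.append(abs(acc) ** 2)
    return spectrum


def linear_filterbank_energies(spectrum, n_filters):
    """Triangular filters whose centres are evenly spaced in frequency (no mel warping)."""
    n_bins = len(spectrum)
    # n_filters + 2 edge points, evenly spaced along the bin axis
    edges = [i * (n_bins - 1) / (n_filters + 1) for i in range(n_filters + 2)]
    energies = []
    for m in range(1, n_filters + 1):
        lo, mid, hi = edges[m - 1], edges[m], edges[m + 1]
        total = 0.0
        for k, p in enumerate(spectrum):
            if lo < k <= mid:
                total += p * (k - lo) / (mid - lo)   # rising slope
            elif mid < k < hi:
                total += p * (hi - k) / (hi - mid)   # falling slope
        energies.append(total)
    return energies


def lfcc(frame, n_filters=20, n_ceps=12):
    """Log filterbank energies followed by a DCT-II; coefficient 0 is skipped."""
    energies = linear_filterbank_energies(power_spectrum(frame), n_filters)
    log_e = [math.log(e + 1e-12) for e in energies]  # small floor avoids log(0)
    return [sum(log_e[m] * math.cos(math.pi * q * (m + 0.5) / n_filters)
                for m in range(n_filters))
            for q in range(1, n_ceps + 1)]


# Hypothetical input: 256 samples of a 150 Hz tone plus one harmonic at 8 kHz sampling
frame = [math.sin(2 * math.pi * 150 * t / 8000) + 0.5 * math.sin(2 * math.pi * 300 * t / 8000)
         for t in range(256)]
coeffs = lfcc(frame)
```

Because coefficient 0 is skipped, multiplying the frame by a constant gain leaves the returned coefficients essentially unchanged, which is one reason cepstral features are convenient for comparing recordings made at different levels.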

Pilot Study on the Classification for Sasangin by the Voice Analysis (음성분석에 의한 체질진단에 관한 연구)

  • Lee Eui-Ju;Song Kwang-Bin;Choi Hwan-Soo;Yoo Jung-Hee;Kwak Chang-Kyu;Sohn Eun-Hae;Koh Byung-Hee
    • The Journal of Korean Medicine / v.26 no.1 s.61 / pp.93-102 / 2005
  • Objective: This research was conducted to evaluate the method of sasangin classification by voice analysis. Two pilot tests were designed to answer the following questions: 'What are the conditions for classifying sasangin by voice analysis?' and 'What are the important variables among the /a/ parameters?' Methods: 122 disease-free, healthy volunteers were diagnosed for sasangin type by QSCC II. First, they said /a/ three times for 2 seconds in their usual voice. Second, they said /a/ for 2 seconds in a high tone, a mid tone, and a low tone. The sounds were collected with a recording program (Cool Edit 2000) through a Sony microphone (ecm-26l). We analyzed the voices with MATLAB, a simulation tool. Results: When /a/ was said three times for 2 seconds in the usual voice, there were no differences among the repetitions and they were correlated. When /a/ was said in the high, usual, and low tones, some parameters were correlated and the others were not. We evaluated the sasangin classification method using /a/ voice analysis alone. The average hit ratio was 66.3%: soyangin 67.9%, taeumin 68.0%, soeumin 63.9%. Conclusion: The conditions for using sasangin classification by voice analysis must be established. Sasangin classification using /a/ voice analysis alone achieved a hit ratio of 66.3%.

Voice Features Extraction of Lung Diseases Based on the Analysis of Speech Rates and Intensity (발화속도 및 강도 분석에 기반한 폐질환의 음성적 특징 추출)

  • Kim, Bong-Hyun;Cho, Dong-Uk
    • The KIPS Transactions:PartB / v.16B no.6 / pp.471-478 / 2009
  • Lung diseases, classified as one of the six incurable diseases of modern times, are caused mostly by smoking and air pollution. These causes damage lung function and impair the exchange of carbon dioxide and oxygen in the alveoli, so interest is growing in these diseases as a risk to longevity. In this paper, we propose a method for diagnosing lung diseases by applying voice analysis parameters to extract voice features. First, we sampled voice data from patients and from normal persons of the same age and sex, forming two sample groups. We then analyzed the collected voice data using various voice analysis parameters. Among the analyzed parameters, speech rate and intensity showed a significant relationship between the patient and normal groups. In conclusion, the patient group showed slower speech rates and higher intensity than the normal group. On this basis, we propose a method of voice feature extraction for lung diseases.

Performance Analysis of a Statistical Packet Voice/Data Multiplexer (통계적 패킷 음성 / 데이터 다중화기의 성능 해석)

  • 신병철;은종관
    • The Journal of Korean Institute of Communications and Information Sciences / v.11 no.3 / pp.179-196 / 1986
  • In this paper, the performance of a statistical packet voice/data multiplexer is studied. We assume that the packet voice/data multiplexer uses two separate finite queues for voice and data traffic, and that voice traffic has priority over data. For the performance analysis we divide the output link of the multiplexer into a sequence of time slots. The voice signal is modeled as an (M+1)-state Markov process, M being the packet generation period in slots. The data traffic is modeled by a simple Poisson process. In our discrete-time analysis, the queueing behavior of voice traffic is little affected by the data traffic, since voice has priority over data. Therefore, we first analyze the queueing behavior of voice traffic and then use the result to study the queueing behavior of data traffic. For the packet voice multiplexer, the input state and the voice buffer occupancy are formulated as a two-dimensional Markov chain. For the integrated voice/data multiplexer we use a three-dimensional Markov chain that represents the input voice state and the buffer occupancies of voice and data. With these models, numerical results for the performance have been obtained by the Gauss-Seidel iteration method. The analytical results have been verified by computer simulation. From the results we have found that there are tradeoffs among the number of voice users, output link capacity, voice queue size, and overflow probability for the voice traffic, and also tradeoffs among traffic load, data queue size, and overflow probability for the data traffic. There is also a tradeoff between the performance of voice and data traffic for given input traffic and link capacity. In addition, it has been found that the average queueing delay of data traffic is longer than the maximum buffer size when the gain of time assignment speech interpolation (TASI) is more than two and the number of voice users is small.
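As an illustration of the numerical step, the sketch below applies Gauss-Seidel sweeps to the balance equations pi = pi P of a small discrete-time Markov chain. It is a toy example under stated assumptions, not the paper's model: the 3-state matrix `P` is hypothetical, and a multi-dimensional chain such as (voice input state, voice buffer, data buffer) is assumed to have been flattened into a single state index beforehand.

```python
def stationary_gauss_seidel(P, tol=1e-12, max_iter=10_000):
    """Approximate the stationary distribution of a row-stochastic matrix P.

    Each sweep solves the balance equation of one state at a time, reusing the
    freshest values of the other components (the Gauss-Seidel idea), and then
    renormalizes. Assumes an irreducible chain with P[j][j] < 1 for every j.
    """
    n = len(P)
    pi = [1.0 / n] * n
    for _ in range(max_iter):
        delta = 0.0
        for j in range(n):
            # Probability flowing into state j from all other states
            inflow = sum(pi[i] * P[i][j] for i in range(n) if i != j)
            new = inflow / (1.0 - P[j][j])
            delta = max(delta, abs(new - pi[j]))
            pi[j] = new
        total = sum(pi)
        pi = [p / total for p in pi]  # keep the probabilities summing to 1
        if delta < tol:
            break
    return pi


# Hypothetical 3-state birth-death chain (for illustration only)
P = [
    [0.50, 0.50, 0.00],
    [0.25, 0.50, 0.25],
    [0.00, 0.50, 0.50],
]
pi = stationary_gauss_seidel(P)
```

Quantities such as an overflow probability would then be read off by summing the stationary probabilities of the states in which the buffer is full.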

An analysis of a statistical difference of acoustic Parameters' distribution between normal voice and pathological voice (병적 음성과 정상 음성의 음향학적 파라미터 분포에 대한 통계적 분석)

  • 김용주;권순복;김기련;신민철;조철우;왕수건
    • Proceedings of the IEEK Conference / 2001.06d / pp.249-252 / 2001
  • Voice is the most basic means of communication among humans. Even setting voice technologies aside, using the voice is important and convenient in everyday life. However, a speech recognition system cannot always expect a normal voice as its input signal. A pathological voice, that is, a voice affected by a problem in the larynx, as opposed to a normal voice, can also be a special case of input. Distortion of the speech signal by environmental effects such as noise or the transmission channel has already been raised as a problem; here we take up pathological voices caused by laryngeal disease, which is an intrinsic distortion factor in the voice. In this research, we use a statistical method to find the differences in the distribution of acoustic parameters between normal and pathological voices.

The Efficiency of Voice Therapy for the Patients with Mutational Falsetto (변성발성장애 환자에 대한 음성치료의 효과)

  • 표화영
    • Journal of the Korean Society of Laryngology, Phoniatrics and Logopedics / v.9 no.2 / pp.134-141 / 1998
  • Mutational falsetto is a voice disorder caused by failure to acquire a proper low-pitched voice during puberty. Patients with mutational falsetto can produce a normal low-pitched voice after surgical treatment, such as type III thyroplasty, or after voice therapy. The present study focuses on the latter and considers the efficiency of voice therapy for mutational falsetto. Seven patients who were diagnosed with mutational falsetto by laryngologists and treated by a voice therapist were selected as subjects. Their pre-therapy and post-therapy voices were analyzed acoustically and aerodynamically. Acoustic analysis was done with the MDVP (Multi-Dimensional Voice Program) of the CSL (Computerized Speech Lab, Kay Elemetrics, Co.), and aerodynamic analysis with the Maximum Sustained Phonation protocol of the Aerophone II (Kay Elemetrics, Co.). These measurements showed that the fundamental frequency (F0) was significantly lowered, by 65 Hz on average. Maximum phonation time (MPT) increased by 4.57 seconds and shimmer decreased by 1.644%, and both changes were also statistically significant. On average, jitter decreased by 0.499%, mean flow rate (MFR) decreased by 27.71 ml/sec, and NHR increased by 0.023, making NHR the only parameter that did not show improvement. However, the changes in jitter, MFR, and NHR were not statistically significant.
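For readers unfamiliar with the perturbation measures reported above, the sketch below computes jitter and shimmer using the commonly cited "local" definition: the mean absolute difference between consecutive cycles divided by the mean value, in percent. Whether this matches the MDVP computation exactly is an assumption, and the cycle lengths and amplitudes are made-up values, not data from the study.

```python
def local_perturbation_percent(values):
    """Mean absolute cycle-to-cycle difference relative to the mean value, in percent."""
    if len(values) < 2:
        raise ValueError("need at least two cycles")
    mean_abs_diff = sum(abs(b - a) for a, b in zip(values, values[1:])) / (len(values) - 1)
    return 100.0 * mean_abs_diff / (sum(values) / len(values))


# Hypothetical glottal cycle lengths (ms) and peak amplitudes (arbitrary units)
periods_ms = [8.0, 8.2, 7.9, 8.1, 8.0]
amplitudes = [0.80, 0.78, 0.83, 0.79, 0.81]

jitter_percent = local_perturbation_percent(periods_ms)    # period perturbation
shimmer_percent = local_perturbation_percent(amplitudes)   # amplitude perturbation
mean_f0_hz = 1000.0 / (sum(periods_ms) / len(periods_ms))  # F0 from the mean period
```

Lower jitter and shimmer indicate more regular vocal fold vibration, which is why decreases in these values are read as improvement.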

The Efficiency of Voice Therapy on Various Laryngeal Disorders (각종 후두질환에서 음성치료의 효과)

  • 왕수건;권순복;노환중;고의경;전경명
    • Journal of the Korean Society of Laryngology, Phoniatrics and Logopedics / v.10 no.1 / pp.17-23 / 1999
  • Vocal hyperfunction is considered the most significant characteristic of laryngeal disorders and is found in many patients who present with hoarseness as the chief complaint. At Pusan National University Hospital, we gave voice therapy to 28 patients (17 female and 11 male) who visited the Voice & Speech Therapy Clinic because of a voice disorder, and then compared their voices before and after therapy using acoustic and aerodynamic tests. The results were as follows. By local findings, improvement was seen in 88% of patients with vocal nodules, 75% with mutational falsetto, 75% with functional dysphonia, 75% with vocal cord palsy, 50% with vocal polyps, and 50% with dysphonia plica ventricularis. For the acoustic analysis, F0, jitter, shimmer, and NHR were measured. In patients with mutational falsetto, F0, jitter, and NHR improved significantly, and in patients with vocal nodules, shimmer improved significantly. In patients with vocal polyps, F0 improved significantly. In patients with vocal cord palsy, jitter and NHR improved significantly. In patients with dysphonia plica ventricularis, shimmer and NHR improved significantly, and patients with functional dysphonia showed improvement in F0, jitter, and shimmer. For the aerodynamic analysis, MPT was measured. It improved significantly in patients with vocal nodules and also improved in patients with vocal polyps, vocal cord palsy, and functional dysphonia.

The Comparison of the Acoustic and Aerodynamic Characteristics of PROVOX® Voice and Esophageal Voice Produced by the Same Laryngectomee (동일 후적자가 산출하는 기관식도 발성(PROVOX® 발성)과 식도 발성에 대한 음향학적 및 공기역학적 특성 비교)

  • Pyo, H.Y.;Choi, H.S.;Lim, S.E.;Choi, S.H.
    • Speech Sciences / v.5 no.1 / pp.121-139 / 1999
  • Our experimental subject was a laryngectomee who had undergone total laryngectomy with PROVOX® insertion and learned esophageal speech after the surgery, so he could produce both PROVOX® voice and esophageal voice. Using this subject's PROVOX® and esophageal voice, we compared the acoustic and aerodynamic characteristics of the two voices under the same physical conditions in the same person. The fundamental frequency of esophageal voice was 137.2 Hz, and that of PROVOX® voice was 97.5 Hz. PROVOX® voice showed lower jitter, shimmer, and NHR than esophageal voice, which means that PROVOX® voice had better voice quality. In spectrographic analysis, formants and pseudoformants were more distinct in esophageal voice, while several temporal acoustic features such as VOT and closure duration were closer to normal voice in PROVOX® voice. During sentence utterance, esophageal voice showed longer pause or silence durations than PROVOX® voice. Maximum phonation time was much longer and mean flow rate much larger in PROVOX® voice than in esophageal voice, but the mean and range of sound pressure level, subglottic pressure, and voice efficiency were similar in the two voices. Glottal resistance was much larger in esophageal voice than in PROVOX® voice, which in turn showed larger glottal resistance than normal voice.

A Study on Voice Color Control Rules for Speech Synthesis System (음성합성시스템을 위한 음색제어규칙 연구)

  • Kim, Jin-Young;Eom, Ki-Wan
    • Speech Sciences / v.2 / pp.25-44 / 1997
  • When listening to the various speech synthesis systems developed and used in our country, we find that although their quality has improved, they lack naturalness. Moreover, since the voice color of each system is limited to a single recorded speech DB, another speech DB must be recorded to create a different voice color. 'Voice color' is an abstract concept that characterizes voice personality, so speech synthesis systems need a voice color control function to create various voices. The aim of this study is to examine several factors of voice color control rules for a text-to-speech system that produces natural and varied voice types in synthetic speech. To find such rules from natural speech, glottal source parameters and the frequency characteristics of the vocal tract were studied for several voice colors. In this paper, voice colors were catalogued as deep, sonorous, thick, soft, harsh, high tone, shrill, and weak. The LF-model was used as the voice source model, and the formant frequencies, bandwidths, and amplitudes were used for the frequency characteristics of the vocal tract. These acoustic parameters were tested through multiple regression analysis to derive the general relation between the parameters and the voice colors.

Development of a Voice Command-Based Mobile Application to Improve Document Editing Accessibility (문서 편집 접근성 향상을 위한 음성 명령 기반 모바일 어플리케이션 개발)

  • Park, Joo Hyun;Park, Seah;Lee, Muneui;Lim, Soon-Bum
    • Journal of Korea Multimedia Society / v.21 no.11 / pp.1342-1352 / 2018
  • Voice command systems are an important means of ensuring accessibility to digital devices in situations where both hands are not free and for people with disabilities. Interest in services using speech recognition technology has been increasing. In this study, we developed a mobile writing application using voice recognition and voice command technology that helps people create and edit documents easily. The application is characterized by minimal touching of the screen and by writing memos by voice. We systematically designed a mode that distinguishes voice writing from voice commands so that writing and command execution can be used together in one voice interface. It provides a shortcut function for controlling the cursor by voice, which makes document editing as convenient as possible. This allows people to conveniently access writing applications by voice under both physical and environmental constraints.