• Title/Summary/Keyword: utterance condition

The Study of Breath Competence Depending on Utterance Condition by Healthy Speakers: a Preliminary Study (발화조건에 따른 정상 성인의 호흡 능력 차이 비교: 예비연구)

  • Lee, In-Ae; Lee, Hye-Eun; Hwang, Young-Jin
    • Phonetics and Speech Sciences / v.4 no.2 / pp.115-120 / 2012
  • This study compared breath competence across three utterance conditions: reading a passage aloud, producing spontaneous speech, and singing. We tested 15 normal females (mean age 24 ± 4.4 years) and measured breath competence with an objective aero-mechanical instrument, the PAS (Phonatory Aerodynamic System, Model 6600, KAY Electronics, Inc.). Breathing sets of inspiration and expiration were quantified by breath group number, breath group duration, and the ratio of inspiration to expiration. The results led to the following conclusion: breath group number and breath group duration showed no significant differences across conditions; the only significant difference was found in the ratio of inspiration to expiration, which varied most in singing, followed by reading a text aloud and spontaneous speech. Average frequency and maximum intensity levels also varied with utterance condition, showing that breath competence and phonation competence are closely interrelated; the three breath measures are sketched below.
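
To make the three breath measures concrete, here is a minimal Python sketch, assuming hypothetical (phase, start, end) interval annotations rather than actual PAS output; the definition of breath group duration used here is one plausible reading.

```python
# Hypothetical inspiration/expiration intervals, in seconds.
intervals = [
    ("inspiration", 0.0, 0.4), ("expiration", 0.4, 3.1),
    ("inspiration", 3.1, 3.5), ("expiration", 3.5, 6.8),
]

insp = [end - start for phase, start, end in intervals if phase == "inspiration"]
expi = [end - start for phase, start, end in intervals if phase == "expiration"]

# One breath group = one inspiration plus the expiration it supports.
breath_group_number = len(expi)
breath_group_duration = sum(i + e for i, e in zip(insp, expi)) / breath_group_number
insp_exp_ratio = sum(insp) / sum(expi)

print(breath_group_number, round(breath_group_duration, 2), round(insp_exp_ratio, 3))
```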

Variance characteristics of speaking fundamental frequency and vocal intensity depending on utterance conditions (발화조건에 따른 기본주파수 및 음성강도 변동의 특징)

  • Lee, Moo-Kyung
    • Phonetics and Speech Sciences / v.4 no.1 / pp.111-118 / 2012
  • The purpose of this study was to characterize the variances of speaking fundamental frequency and vocal intensity as a function of gender and three utterance conditions (spontaneous speech, reading, and counting). A total of 65 undergraduate students (32 male, 33 female) attending universities in Daegu, South Korea, all in their 20s, participated. KayPENTAX's Visi-Pitch IV (Model 3950) was used to measure the variances of speaking fundamental frequency (SFF0) and vocal intensity (VI). The study reached the following conclusions. First, neither males nor females showed a significant difference in SFF0 or vocal intensity among the three utterance conditions. Second, in spontaneous speech, females showed significantly higher levels than males on four measured variances (SFF0 SD**, SFF0 range***, Min SFF0***, and Max SFF0***); however, there was no significant sex difference in SFF0 range on reading, or in SFF0 SD and SFF0 range on counting, and no significant sex difference in any measured variance of vocal intensity under any utterance condition. Finally, comparing the utterance conditions, the measured variances of SFF0 in males were reduced most markedly in spontaneous speech, followed by reading and counting respectively (SFF0 SD: p<.001; SFF0 range: p<.05; Max SFF0: p<.05), whereas females showed no significant differences in the measured variances of SFF0 across the three conditions. Conversely, the measured variances of vocal intensity in females were reduced most markedly in spontaneous speech, followed by reading and counting (VI SD: p<.001; VI range: p<.001; Min VI: p<.01; Max VI: p<.05), while males showed no significant differences in the measured variances of vocal intensity across the three conditions. In sum, these findings suggest that the variances of SFF0 in males, and the variances of vocal intensity in females, are affected by utterance condition; the variance measures themselves are sketched below.
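
As an illustration of the spread measures reported above, here is a minimal sketch (not Visi-Pitch output; the per-frame tracks are hypothetical, with unvoiced frames assumed removed):

```python
import numpy as np

sff0 = np.array([198.0, 205.3, 231.9, 187.4, 176.2, 210.8])  # pitch track, Hz
vi = np.array([62.1, 65.4, 70.2, 58.9, 61.3, 66.7])          # intensity track, dB

def variance_measures(track):
    """The four spread measures used in the study: SD, range, min, max."""
    return {
        "SD": float(np.std(track, ddof=1)),
        "range": float(np.ptp(track)),  # max - min
        "min": float(track.min()),
        "max": float(track.max()),
    }

print("SFF0:", variance_measures(sff0))
print("VI:  ", variance_measures(vi))
```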

Deep neural networks for speaker verification with short speech utterances (짧은 음성을 대상으로 하는 화자 확인을 위한 심층 신경망)

  • Yang, Il-Ho; Heo, Hee-Soo; Yoon, Sung-Hyun; Yu, Ha-Jin
    • The Journal of the Acoustical Society of Korea / v.35 no.6 / pp.501-509 / 2016
  • We propose a method to improve the robustness of speaker verification on short test utterances. The accuracy of state-of-the-art i-vector/probabilistic linear discriminant analysis systems degrades when test utterances are short. The proposed method compensates for the utterance variation of short test feature vectors using deep neural networks. We design three types of DNN (Deep Neural Network) structures, trained with different target output vectors; each DNN is trained to minimize the discrepancy between its feed-forward output for a short-utterance feature and the corresponding original long-utterance feature. We evaluate the method on the short 2-10 s condition of the NIST (National Institute of Standards and Technology, U.S.) 2008 SRE (Speaker Recognition Evaluation) corpus. The experimental results show that the proposed method reduces the minimum detection cost relative to the baseline system; the compensation idea is sketched below.
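
The compensation idea can be sketched as follows: a feed-forward network maps a short-utterance representation toward its long-utterance counterpart by minimizing their discrepancy. This is a minimal PyTorch sketch with assumed dimensions and random stand-in data, not the paper's exact architectures.

```python
import torch
import torch.nn as nn

DIM = 600  # a common i-vector dimensionality (assumption)

net = nn.Sequential(
    nn.Linear(DIM, 1024), nn.ReLU(),
    nn.Linear(1024, 1024), nn.ReLU(),
    nn.Linear(1024, DIM),
)
opt = torch.optim.Adam(net.parameters(), lr=1e-4)
mse = nn.MSELoss()

# Hypothetical paired data: features of short utterances and of the same
# speakers' long utterances.
short_feat = torch.randn(32, DIM)
long_feat = torch.randn(32, DIM)

for step in range(100):
    opt.zero_grad()
    loss = mse(net(short_feat), long_feat)  # minimize the discrepancy
    loss.backward()
    opt.step()

# At test time a short utterance's feature would be passed through `net`
# before PLDA scoring against the enrollment representation.
```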

Adaptive Channel Normalization Based on Infomax Algorithm for Robust Speech Recognition

  • Jung, Ho-Young
    • ETRI Journal / v.29 no.3 / pp.300-304 / 2007
  • This paper proposes a new data-driven method in the family of high-pass approaches, which suppress slowly varying noise components. Conventional high-pass approaches are based on the idea of decorrelating the feature vector sequence and strive to remain effective across various conditions. The proposed method performs temporal local decorrelation based on information-maximization theory for each utterance; because it operates on an utterance-by-utterance basis, it provides an adaptive channel normalization filter for each condition. The performance of the proposed method is evaluated in isolated-word recognition experiments with channel distortion, and the results show that it yields an outstanding improvement for channel-distorted speech recognition; a simplified per-utterance decorrelation sketch follows.
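
The paper trains its filter with an infomax algorithm; as a simpler stand-in that illustrates the idea of a per-utterance adaptive high-pass (decorrelating) filter, the sketch below fits a first-order linear predictor to each feature trajectory of one utterance and applies the prediction-error filter. All data are hypothetical.

```python
import numpy as np

# Hypothetical features (frames x coefficients) with slowly varying drift,
# mimicking a channel component.
feats = np.random.randn(200, 13).cumsum(axis=0) * 0.1

def adaptive_highpass(c):
    """Filter one trajectory with y[t] = c[t] - a*c[t-1], where the
    coefficient a = r1/r0 is estimated from this utterance alone."""
    c = c - c.mean()                 # remove the utterance mean (as in CMN)
    r0 = float(np.dot(c, c))
    r1 = float(np.dot(c[1:], c[:-1]))
    a = r1 / r0 if r0 > 0 else 0.0
    y = np.empty_like(c)
    y[0] = c[0]
    y[1:] = c[1:] - a * c[:-1]       # decorrelates adjacent frames
    return y

normalized = np.apply_along_axis(adaptive_highpass, 0, feats)
print(normalized.shape)
```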

Study of Developing SOP for Extracting Stable Vocal Features for Accurate Diagnosis (음성의 안정적 변수 추출을 위한 SOP 개발 연구)

  • Kim, Keun-Ho; Jang, Jun-Su; Kim, Young-Su; Kim, Jong-Yeol
    • Journal of Physiology & Pathology in Korean Medicine / v.25 no.6 / pp.1108-1112 / 2011
  • In traditional Korean medicine and Western medicine, the voice can be widely used to classify the four constitution types and to assess a person's health condition by extracting meaningful features as physical quantities. In this paper, we propose a method for updating the standard operating procedure (SOP) for acquiring and recording voices so that stable vocal features can be extracted, since such features are sensitive to variation in a subject's utterance. First, we obtained pitch frequencies from vowels and a sentence, and intensity from the sentence, as features from voices acquired under several utterance conditions; we then computed the deviation ratio of each feature from its median value across the utterance conditions and selected the condition that minimized this ratio as the new SOP. As a result, in consideration of the deviation and qualitative requirements, we settled on an SOP in which a subject, after practice, utters vowels with a length of 2 s~1 s and sentences with an interval of over 2 s between them. Stable voice features obtained from the updated SOP yield accurate diagnosis, and the procedure will be further developed and simplified for use in a u-Healthcare system for personalized medicine; the deviation-ratio selection step is sketched below.
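
The deviation-ratio selection step might look like the sketch below, with hypothetical condition names and feature values and a simple mean-absolute-deviation-from-median definition of the ratio.

```python
import numpy as np

# condition -> [vowel pitch (Hz), sentence pitch (Hz), sentence intensity (dB)]
features = {
    "vowels_1s_interval_1s": np.array([142.0, 138.0, 61.0]),
    "vowels_2s_interval_2s": np.array([140.5, 139.2, 62.3]),
    "vowels_3s_interval_2s": np.array([146.8, 133.1, 58.7]),
}

median = np.median(np.stack(list(features.values())), axis=0)

def deviation_ratio(values):
    """Mean absolute deviation from the per-feature median, relative to it."""
    return float(np.mean(np.abs(values - median) / median))

best = min(features, key=lambda c: deviation_ratio(features[c]))
print(best)  # the condition that would be adopted as the new SOP
```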

Preceded Utterance Conversational Agent's Effect on User Experience with User's Task Performance and Conversational Agent's Self-Disclosure (선제 발화하는 대화형 에이전트가 사용자 경험에 미치는영향: 사용자 과제 수행과 대화형 에이전트의 자기노출을 중심으로)

  • Shin, Hyorim; Lee, Soyeon; Kang, Hyunmin
    • The Journal of the Convergence on Culture Technology / v.8 no.1 / pp.565-576 / 2022
  • The scope and functions of conversational agents are gradually expanding. In particular, research and technology development are under way on conversational agents that can speak first, without being called by the user. However, the field is still in its early stages, and there is little research on how such a preceded-utterance (agent-initiated) conversational agent affects users. Accordingly, this study used a 2×3 mixed design with the user's task-performance condition and the agent's self-disclosure as independent variables, and measured intimacy, functional satisfaction, psychological reactance, and workload as dependent variables, to identify the effects of a preceded-utterance conversational agent on user experience; a sketch of how such a design could be analyzed follows.
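
For readers unfamiliar with the design, a 2×3 mixed analysis could be run as in the sketch below; the column names and ratings are made up, and pingouin's mixed_anova is used as one convenient tool, not as the authors' actual analysis script.

```python
import pandas as pd
import pingouin as pg

# Hypothetical data: 6 subjects, between factor "task" (2 levels),
# within factor "disclosure" (3 levels), one example DV (intimacy).
df = pd.DataFrame({
    "subject": [1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4, 5, 5, 5, 6, 6, 6],
    "task": ["busy"] * 9 + ["idle"] * 9,
    "disclosure": ["none", "low", "high"] * 6,
    "intimacy": [3.1, 3.8, 4.2, 2.9, 3.5, 4.0, 3.0, 3.2, 3.9,
                 2.8, 3.6, 4.1, 3.3, 3.7, 4.4, 2.7, 3.4, 3.8],
})

aov = pg.mixed_anova(data=df, dv="intimacy", within="disclosure",
                     between="task", subject="subject")
print(aov[["Source", "F", "p-unc"]])
```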

A Study on Laryngeal Behavior of Persons Who Stutter with Fiber-Optic Nasolaryngoscope (후두 내시경(Fiber-Optic Nasolaryngoscope)을 이용한 말더듬인의 후두양상에 관한 연구)

  • Jung, Hun; Ahn, Jong-Bok; Choi, Byung-Heun; Kwon, Do-Ha
    • Speech Sciences / v.15 no.3 / pp.159-173 / 2008
  • The purpose of this study was to use a fiber-optic nasolaryngoscope to identify differences in laryngeal behavior during utterance between persons who stutter (PS) and persons who do not stutter (NS). To this end, 5 PS and 5 NS were recruited to take part in the experiment. The study reached the following findings. First, there was no significant difference between the stuttering group and the control group in laryngeal behavior during the utterance of fluent speech. Second, there were differences between the two groups in the laryngeal behavior accompanying repetition and prolongation, the types of disfluency revealed in nonfluent utterances. Third, as reported in prior studies, laryngeal behavior during the stuttering group's nonfluent utterances differed according to stuttering type. A variety of laryngeal behaviors unreported in prior studies were also observed, and notably, stutterers showed different laryngeal behaviors depending on their individual stuttering types. In the block condition, Subject 1 showed fAB, INT, and fAD; Subject 2 showed fAB, fAD, and rAD; Subject 3 showed fAD and rAD; Subject 4 showed only fAD; and Subject 5 showed fAB, fAD, and rAD. Summing up, these findings imply that when stutterers utter nonfluent words, they may show a variety of laryngeal behaviors depending on their individual stuttering types, and that there are some differences in nonfluent utterance between NS and PS. In particular, it is notable that one common trait of the nonfluent utterances of PS, whichever type of stuttering they show, is evidently excessive laryngeal stress.

Speaker Verification with the Constraint of Limited Data

  • Kumari, Thyamagondlu Renukamurthy Jayanthi; Jayanna, Haradagere Siddaramaiah
    • Journal of Information Processing Systems / v.14 no.4 / pp.807-823 / 2018
  • The performance of a speaker verification system depends on the utterances of each speaker: to verify the speaker, the important information has to be captured from the utterance. Under the constraint of limited data, where training and testing data amount to only a few seconds, speaker verification becomes a challenging task. The feature vectors extracted by single frame size and rate (SFSR) analysis are not sufficient for training and testing speakers, which leads to poor speaker modeling during training and may not support good decisions during testing. We address this by increasing the number of feature vectors extracted from the same duration of training and testing data, using multiple frame size (MFS), multiple frame rate (MFR), and multiple frame size and rate (MFSR) analysis for speaker verification under the limited-data condition. These analysis techniques extract relatively more feature vectors during training and testing, yielding improved modeling and testing for limited data. To demonstrate this, we use mel-frequency cepstral coefficients (MFCC) and linear prediction cepstral coefficients (LPCC) as features, Gaussian mixture models (GMM) and the GMM-universal background model (GMM-UBM) for speaker modeling, and the NIST-2003 database. The experimental results indicate that MFS, MFR, and MFSR analysis perform radically better than SFSR analysis, and that LPCC-based MFSR analysis performs best among the analysis and feature extraction techniques compared; an MFSR-style extraction sketch follows.
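
MFSR-style extraction can be sketched as follows: the same utterance is analyzed at several frame sizes and several frame rates, and the resulting MFCC frames are pooled, yielding many more feature vectors than any single frame size and rate. Window and hop values are illustrative, not the paper's settings, and a random signal stands in for a real utterance.

```python
import numpy as np
import librosa

sr = 16000
y = np.random.randn(3 * sr).astype(np.float32)  # stand-in for a 3 s utterance

frame_sizes = [256, 512, 1024]  # multiple frame sizes (samples): MFS
hop_lengths = [80, 160, 240]    # multiple frame rates (hop sizes): MFR

pooled = []
for n_fft in frame_sizes:       # combining both dimensions gives MFSR
    for hop in hop_lengths:
        mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13, n_mels=26,
                                    n_fft=n_fft, hop_length=hop)
        pooled.append(mfcc.T)   # frames x 13 coefficients

features = np.vstack(pooled)    # far more vectors than one setting alone
print(features.shape)
```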

Implementation of Speaker Verification Security System Using DSP Processor(TMS320C32) (DSP Processor(TMS320C32)를 이용한 화자인증 보안시스템의 구현)

  • Haam, Young-Jun; Kwon, Hyuk-Jae; Choi, Soo-Young; Jeong, Ik-Joo
    • Journal of Industrial Technology / v.21 no.B / pp.107-116 / 2001
  • Speech carries various kinds of information when a person communicates with others: linguistic information, speaker information, affective state, health condition, the utterance environment, and so on. The technologies that process this speech for use in real life are collectively called speech technology. Among them, the technology that exploits the speaker information contained in speech is known as speaker recognition. DTW (Dynamic Time Warping) is a speaker recognition technique that uses dynamic programming to measure the degree of similarity between a standard (reference) speech signal pattern and an input speech signal. In this study, we implement DTW on a TMS320C32 DSP processor and construct a speaker verification security system; a sketch of the DTW matching step follows.
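
The DTW matching step can be sketched as follows; the feature sequences and decision threshold are hypothetical, and the fixed-point TMS320C32 implementation is not modeled here.

```python
import numpy as np

def dtw_distance(ref, inp):
    """Length-normalized dynamic-programming alignment cost between a stored
    reference pattern and an input feature sequence (frames x dims)."""
    n, m = len(ref), len(inp)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = np.linalg.norm(ref[i - 1] - inp[j - 1])  # local distance
            D[i, j] = cost + min(D[i - 1, j],      # insertion
                                 D[i, j - 1],      # deletion
                                 D[i - 1, j - 1])  # match
    return D[n, m] / (n + m)

template = np.random.randn(80, 13)  # enrolled speaker's reference pattern
test = np.random.randn(75, 13)      # features of the input utterance

THRESHOLD = 1.0                     # hypothetical decision threshold
print("accepted" if dtw_distance(template, test) < THRESHOLD else "rejected")
```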

Cross-speaker anaphora in dynamic semantics

  • Yeom, Jae-Il
    • Language and Information / v.14 no.2 / pp.103-129 / 2010
  • In this paper, I show that anaphora across speakers has both dynamic and static sides. To capture them all formally, I adopt a semantics based on the assumption that variables range over individual concepts that connect epistemic alternatives. As information increases, a variable can take a different range of possible individual concepts. This is captured by the notion of a virtual individual (= vi), a set of individual concepts which are indistinguishable in an information state. The use of a pronoun involves two information states: one for the antecedent, which is always part of the common ground, and the other for the pronoun. Information increase changes the vis for variables in the common ground. A pronoun can be used felicitously if there is a unique virtual individual in the information state for the antecedent which does not split into two or more distinct virtual individuals in the information state for the pronoun. The felicity condition for cross-speaker anaphora can be satisfied in a rather less demanding way in declaratives involving modality, in interrogatives, and in imperatives, because in these cases the utterance does not necessarily require non-trivial personal information for the proper use of a pronoun; a toy model of the felicity condition follows.
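
As a toy reconstruction (not the paper's formal system), the felicity condition can be modeled with individual concepts as world-to-individual maps; a vi groups concepts that agree on all worlds of an information state.

```python
from itertools import groupby

def partition_into_vis(concepts, state):
    """Group concepts by their restriction to the worlds of the state."""
    key = lambda c: tuple(c[w] for w in sorted(state))
    return [list(g) for _, g in groupby(sorted(concepts, key=key), key=key)]

def felicitous(concepts, antecedent_state, pronoun_state):
    vis = partition_into_vis(concepts, antecedent_state)
    if len(vis) != 1:  # no unique vi for the antecedent
        return False
    # Felicitous iff the unique vi does not split in the pronoun's state.
    return len(partition_into_vis(vis[0], pronoun_state)) == 1

# Two hypothetical concepts agreeing in worlds w1, w2 but diverging in w3.
c1 = {"w1": "ann", "w2": "ann", "w3": "ann"}
c2 = {"w1": "ann", "w2": "ann", "w3": "bea"}

print(felicitous([c1, c2], {"w1", "w2"}, {"w1", "w2"}))  # True: no split
print(felicitous([c1, c2], {"w1", "w2"}, {"w2", "w3"}))  # False: the vi splits
```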
