• Title/Summary/Keyword: Utterance condition

Search results: 16 (processing time: 0.021 s)

발화조건에 따른 정상 성인의 호흡 능력 차이 비교: 예비연구 (The Study of Breath Competence Depending on Utterance Condition by Healthy Speakers: a Preliminary Study)

  • 이인애;이혜은;황영진
    • 말소리와 음성과학 / Vol. 4, No. 2 / pp.115-120 / 2012
  • This study compared breath competence across three utterance conditions: reading a passage aloud, producing spontaneous speech, and singing. We tested 15 normal females (mean age 24 ± 4.4 years) and measured breath competence with an objective aero-mechanical instrument, the Phonatory Aerodynamic System (PAS, Model 6600, KAY Electronics, Inc.). Breathing sets of inspiration and expiration were measured by breath group number, breath group duration, and the ratio of inspiration to expiration. The breath group number and breath group duration showed no significant difference across conditions; the only significant difference was in the ratio of inspiration to expiration. Singing produced the most varied inspiration-to-expiration ratio, followed by reading aloud and then spontaneous speech. Average frequency and maximum intensity also varied with utterance condition, which suggests that breath competence and phonation competence are closely interrelated.

발화조건에 따른 기본주파수 및 음성강도 변동의 특징 (Variance characteristics of speaking fundamental frequency and vocal intensity depending on utterance conditions)

  • 이무경
    • 말소리와 음성과학 / Vol. 4, No. 1 / pp.111-118 / 2012
  • The purpose of this study was to characterize and determine variances of speaking fundamental frequency and vocal intensity depending on gender and three utterance conditions (spontaneous speech, reading, and counting). A total of 65 undergraduate students (32 male, 33 female), all in their 20s and attending universities in Daegu, South Korea, participated. The study used KayPENTAX's Visi-Pitch IV (Model 3950) to measure the variances of speaking fundamental frequency (SFF0) and vocal intensity (VI), and reached the following conclusions. First, neither males nor females showed a significant difference in SFF0 or vocal intensity among the three utterance conditions. Second, in analyzing differences in the variances of SFF0 between males and females, females showed significantly higher levels on four measured variances (SFF0 SD**, SFF0 range***, Min SFF0***, and Max SFF0***) than males in spontaneous speech; however, there was no significant difference between males and females in SFF0 range in reading, or in SFF0 SD and SFF0 range in counting. There was no significant difference between males and females in the measured variances of vocal intensity under any utterance condition. Finally, comparing the variances of SFF0 and vocal intensity among utterance conditions, all the measured variances of SFF0 in males were most significantly reduced in spontaneous speech, followed by reading and counting respectively (SFF0 SD: p<.001, SFF0 range: p<.05, Max SFF0: p<.05), while females showed no significant difference in the measured variances of SFF0 across the three conditions. The measured variances of vocal intensity in females were likewise most significantly reduced in spontaneous speech, followed by reading and counting (VI SD: p<.001, VI range: p<.001, Min VI: p<.01, Max VI: p<.05), while males showed no significant difference in the measured variances of vocal intensity across the three conditions. In sum, these findings suggest that the variances of SFF0 in males, and the variances of vocal intensity in females, are affected by utterance condition.

짧은 음성을 대상으로 하는 화자 확인을 위한 심층 신경망 (Deep neural networks for speaker verification with short speech utterances)

  • 양일호;허희수;윤성현;유하진
    • 한국음향학회지 / Vol. 35, No. 6 / pp.501-509 / 2016
  • This paper proposes a method for improving speaker verification performance on short test utterances. When test utterances are short, the performance of i-vector/probabilistic linear discriminant analysis-based speaker verification systems degrades. The proposed method compensates for duration-dependent variability by transforming feature vectors extracted from short utterances with a deep neural network. We propose three ways of using the deep neural network, depending on the output labels used in training; each network is trained to reduce the difference between its output for features from a short utterance and the features extracted from the original long utterance. We evaluate the proposed method under the short 2-10 s condition of the NIST (National Institute of Standards and Technology, USA) 2008 SRE (Speaker Recognition Evaluation) corpus. Experimental results show that the minimum detection cost is reduced compared with conventional methods based on within-class covariance normalization and linear discriminant analysis. We also compare performance against a method based on short-utterance variance normalization.
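The compensation idea in this abstract, training a network so that its output for short-utterance features approaches the features of the original long utterance, can be sketched generically. The following is a minimal numpy illustration with made-up dimensions and toy data; it is not the authors' architecture or i-vector front end.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in data (dimensions and values are invented for illustration;
# the paper works with i-vector front-end features on NIST 2008 SRE):
# x_short holds features extracted from truncated utterances, x_long the
# matching features from the original full-length utterances.
x_short = rng.normal(size=(64, 20))
x_long = x_short + rng.normal(scale=0.1, size=(64, 20))

# One-hidden-layer compensation network f(x_short) ~ x_long
w1 = rng.normal(scale=0.1, size=(20, 32)); b1 = np.zeros(32)
w2 = rng.normal(scale=0.1, size=(32, 20)); b2 = np.zeros(20)

def forward(x):
    h = np.tanh(x @ w1 + b1)       # hidden representation
    return h, h @ w2 + b2          # compensated feature estimate

h, pred = forward(x_short)
loss_before = ((pred - x_long) ** 2).mean()

# One gradient-descent step on the mean squared difference between the
# network's output for the short utterance and the long-utterance target.
lr = 0.1
g_out = 2 * (pred - x_long) / pred.size
g_h = (g_out @ w2.T) * (1 - h ** 2)            # backprop through tanh
w2 -= lr * (h.T @ g_out); b2 -= lr * g_out.sum(axis=0)
w1 -= lr * (x_short.T @ g_h); b1 -= lr * g_h.sum(axis=0)

loss_after = ((forward(x_short)[1] - x_long) ** 2).mean()
```

Each training step pulls the transformed short-utterance features toward the long-utterance targets, which is the shared objective of the three network variants the abstract mentions.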

Adaptive Channel Normalization Based on Infomax Algorithm for Robust Speech Recognition

  • Jung, Ho-Young
    • ETRI Journal / Vol. 29, No. 3 / pp.300-304 / 2007
  • This paper proposes a new data-driven method for high-pass approaches, which suppresses slow-varying noise components. Conventional high-pass approaches are based on the idea of decorrelating the feature vector sequence, and are trying for adaptability to various conditions. The proposed method is based on temporal local decorrelation using the information-maximization theory for each utterance. This is performed on an utterance-by-utterance basis, which provides an adaptive channel normalization filter for each condition. The performance of the proposed method is evaluated by isolated-word recognition experiments with channel distortion. Experimental results show that the proposed method yields outstanding improvement for channel-distorted speech recognition.
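For orientation, the simplest member of the high-pass family this paper improves on is per-utterance cepstral mean normalization: a stationary channel adds a roughly constant offset to each cepstral coefficient, so subtracting the utterance mean suppresses the slow-varying channel component. This sketch shows only that conventional baseline, not the paper's infomax-learned adaptive filter.

```python
import numpy as np

def cepstral_mean_normalization(features):
    """Per-utterance channel normalization baseline.

    A stationary channel adds a near-constant offset to each cepstral
    coefficient, so subtracting the utterance mean acts as a crude
    high-pass filter over the feature sequence. (The paper's infomax
    method instead *learns* a decorrelating filter per utterance;
    this is only the conventional baseline it is compared against.)
    """
    features = np.asarray(features, dtype=float)
    return features - features.mean(axis=0, keepdims=True)

# Simulated cepstral sequence (frames x coefficients) plus channel offset
rng = np.random.default_rng(0)
clean = rng.normal(size=(100, 13))
channel = np.full(13, 0.7)                      # constant channel bias
normalized = cepstral_mean_normalization(clean + channel)
```

Because the operation is computed from each utterance alone, it shares the utterance-by-utterance character of the proposed method, though without its adaptivity.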


음성의 안정적 변수 추출을 위한 SOP 개발 연구 (Study of Developing SOP for Extracting Stable Vocal Features for Accurate Diagnosis)

  • 김근호;장준수;김영수;김종열
    • 동의생리병리학회지 / Vol. 25, No. 6 / pp.1108-1112 / 2011
  • In traditional Korean medicine, as in Western medicine, the voice can be used to classify the four constitution types and to assess a subject's health condition by extracting meaningful features as physical quantities. In this paper, we propose a method for updating the standard operating procedure (SOP) for acquiring and recording voices so that stable vocal features can be extracted, since such features are sensitive to variation in a subject's utterance. We first obtained pitch frequencies from vowels and a sentence, and intensity from the sentence, as features from voices acquired under several utterance conditions; we then computed the deviation ratio of each feature from its median value under each utterance condition and selected the condition that minimized this ratio as the new SOP. Considering the deviation and qualitative requirements, we adopted an SOP in which, after practice, a subject utters vowels lasting 1~2 s and sentences separated by intervals of more than 2 s. Stable voice features obtained from the updated SOP support accurate diagnosis, and the procedure will be further developed and simplified for use in a u-Healthcare system for personalized medicine.
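The selection rule described above, picking the utterance condition whose repeated feature values deviate least from their median, can be sketched as follows. The feature values and condition names here are hypothetical, not the study's data.

```python
import numpy as np

def deviation_ratio(values):
    """Mean absolute deviation from the median, relative to the median.

    This mirrors the criterion in the abstract: for each candidate
    utterance condition, measure how far repeated feature values
    (e.g. pitch in Hz) scatter around their median; the condition
    with the smallest ratio is adopted as the new SOP.
    """
    x = np.asarray(values, dtype=float)
    med = np.median(x)
    return float(np.mean(np.abs(x - med)) / med)

# Hypothetical pitch measurements (Hz) under two recording conditions
conditions = {
    "A": [120.0, 122.0, 119.0, 121.0],   # stable utterances
    "B": [110.0, 135.0, 118.0, 128.0],   # unstable utterances
}
best = min(conditions, key=lambda c: deviation_ratio(conditions[c]))
```

Here condition "A" wins because its measurements cluster tightly around the median, which is exactly the stability the SOP is meant to guarantee.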

선제 발화하는 대화형 에이전트가 사용자 경험에 미치는영향: 사용자 과제 수행과 대화형 에이전트의 자기노출을 중심으로 (Preceded Utterance Conversational Agent's Effect on User Experience with User's Task Performance and Conversational Agent's Self-Disclosure)

  • 신효림;이소연;강현민
    • 문화기술의 융합 / Vol. 8, No. 1 / pp.565-576 / 2022
  • The range of use and the functionality of conversational agents are steadily expanding. In particular, research and development are moving beyond conversational agents that speak only when called by the user toward agents that can initiate utterances proactively, without a user call. Because this work is still at an early stage, however, little is known about how such proactively speaking conversational agents affect users. To examine their effect on user experience, this study measured intimacy, functional satisfaction, psychological reactance, and workload in a 2×3 mixed design with the user's task-performance condition and the presence or absence of the agent's self-disclosure as independent variables.

후두 내시경(Fiber-Optic Nasolaryngoscope)을 이용한 말더듬인의 후두양상에 관한 연구 (A Study on Laryngeal Behavior of Persons Who Stutter with Fiber-Optic Nasolaryngoscope)

  • 정훈;안종복;최병흔;권도하
    • 음성과학 / Vol. 15, No. 3 / pp.159-173 / 2008
  • The purpose of this study was to use a fiber-optic nasolaryngoscope to identify differences in laryngeal behavior during utterance between persons who stutter (PS) and persons who do not stutter (NS). To this end, the study sampled 5 PS and 5 NS, all of whom participated in the experiment. The findings were as follows. First, there was no significant difference between the stuttering group and the control group in laryngeal behavior during the utterance of spoken language. Second, there were some differences between the two groups in laryngeal behavior during repetitions and prolongations, forms of disfluency revealed in nonfluent spoken language. Third, as reported in prior studies, laryngeal behavior during the stuttering group's nonfluent speech differed depending on stuttering type. This study also found a variety of laryngeal behaviors not reported in prior studies, and it was notable that stutterers showed different laryngeal behaviors depending on their individual stuttering types. In the block condition, Subject 1 showed laryngeal behaviors fAB, INT, and fAD; Subject 2 showed fAB, fAD, and rAD; Subject 3 showed fAD and rAD; Subject 4 showed only fAD; and Subject 5 showed fAB, fAD, and rAD. In sum, these findings imply that when stutterers utter nonfluent words, they may show a variety of laryngeal behaviors depending on their individual stuttering types, and that the utterance of nonfluent spoken language differs somewhat between NS and PS. In particular, it is notable that excessive laryngeal stress is a common trait of nonfluent spoken language uttered by PS, regardless of stuttering type.


Speaker Verification with the Constraint of Limited Data

  • Kumari, Thyamagondlu Renukamurthy Jayanthi;Jayanna, Haradagere Siddaramaiah
    • Journal of Information Processing Systems / Vol. 14, No. 4 / pp.807-823 / 2018
  • Speaker verification system performance depends on each speaker's utterance, from which important information must be captured. Under the constraint of limited data, where training and testing data amount to only a few seconds, speaker verification becomes a challenging task. The feature vectors extracted by single frame size and rate (SFSR) analysis are not sufficient for training and testing speakers, which leads to poor speaker modeling during training and unreliable decisions during testing. The problem can be addressed by increasing the number of feature vectors obtained from training and testing data of the same duration. For this we use multiple frame size (MFS), multiple frame rate (MFR), and multiple frame size and rate (MFSR) analysis techniques for speaker verification under the limited-data condition. These techniques extract relatively more feature vectors during training and testing and thereby yield improved modeling and testing with limited data. To demonstrate this we used mel-frequency cepstral coefficients (MFCC) and linear prediction cepstral coefficients (LPCC) as features, with Gaussian mixture models (GMM) and GMM-universal background models (GMM-UBM) for speaker modeling, on the NIST-2003 database. The experimental results indicate that MFS, MFR, and MFSR analysis perform markedly better than SFSR analysis, and that LPCC-based MFSR analysis performs best among the analysis and feature extraction techniques considered.
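The core of MFS/MFR/MFSR analysis is that framing the same short utterance at several frame sizes and frame shifts yields many more feature vectors than a single frame size and rate. A minimal sketch of the frame arithmetic (the frame sizes and shifts here are illustrative, not the paper's exact settings):

```python
def num_frames(n_samples, frame_len, frame_shift):
    """Number of complete analysis frames in a signal."""
    if n_samples < frame_len:
        return 0
    return 1 + (n_samples - frame_len) // frame_shift

# 3 s of speech at 8 kHz: a "limited data" condition
n = 3 * 8000

# SFSR: one frame size (25 ms) and one shift (10 ms)
sfsr = num_frames(n, 200, 80)

# MFSR: pool frames from several frame sizes and shifts, so the same
# short utterance contributes many more feature vectors for modeling.
sizes = [160, 200, 240]      # 20, 25, 30 ms
shifts = [40, 80, 120]       # 5, 10, 15 ms
mfsr = sum(num_frames(n, s, sh) for s in sizes for sh in shifts)
```

Each pooled frame would then be passed through MFCC or LPCC extraction before GMM/GMM-UBM modeling, which is where the extra vectors pay off under limited data.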

DSP Processor(TMS320C32)를 이용한 화자인증 보안시스템의 구현 (Implementation of Speaker Verification Security System Using DSP Processor(TMS320C32))

  • 함영준;권혁재;최수영;정익주
    • 산업기술연구 / Vol. 21, No. B / pp.107-116 / 2001
  • Speech carries various kinds of information when a person communicates with others: linguistic information, speaker information, affect, health condition, utterance environment, and so on. The technologies that process this speech for use in real life are collectively called speech technology; among them, the technology that exploits the speaker information contained in speech is known as speaker recognition. DTW (Dynamic Time Warping) is a speaker-recognition technique that measures the degree of similarity between the pattern of a standard reference speech signal and an input speech signal using dynamic programming. In this study, we implement DTW on a TMS320C32 DSP processor and build a speaker verification security system with it.
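As an illustration of the DTW pattern matching described above, here is a plain Python sketch of the dynamic-programming recurrence; it is not the authors' TMS320C32 implementation, and the template values are invented.

```python
def dtw_distance(a, b):
    """Dynamic Time Warping distance between two 1-D feature sequences.

    Classic O(len(a) * len(b)) dynamic programming: cell (i, j) holds
    the cost of the best alignment of a[:i] with b[:j], where a step
    may consume an element of a, of b, or of both.
    """
    n, m = len(a), len(b)
    cost = [[float("inf")] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])               # local distance
            cost[i][j] = d + min(cost[i - 1][j],       # skip in a
                                 cost[i][j - 1],       # skip in b
                                 cost[i - 1][j - 1])   # match both
    return cost[n][m]

# A claimed speaker would be accepted when the DTW distance between the
# input utterance and the stored reference template is below a threshold.
reference = [1.0, 2.0, 3.0, 2.0, 1.0]
attempt = [1.0, 2.0, 2.0, 3.0, 2.0, 1.0]   # similar, slightly stretched
score = dtw_distance(reference, attempt)    # 0.0: warps to a perfect match
```

Because the warping path can stretch or compress time, the slightly longer attempt still aligns perfectly with the reference, which is exactly why DTW suits utterances spoken at varying speeds.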


Cross-speaker anaphora in dynamic semantics

  • Yeom, Jae-Il
    • 한국언어정보학회지:언어와정보 / Vol. 14, No. 2 / pp.103-129 / 2010
  • In this paper, I show that anaphora across speakers shows both dynamic and static sides. To capture them all formally, I will adopt semantics based on the assumption that variables range over individual concepts that connect epistemic alternatives. As information increases, a variable can take a different range of possible individual concepts. This is captured by the notion of virtual individual (= vi), a set of individual concepts which are indistinguishable in an information state. The use of a pronoun involves two information states, one for the antecedent, which is always part of the common ground, and the other for the pronoun. Information increase changes vis for variables in the common ground. A pronoun can be used felicitously if there is a unique virtual individual in the information state for the antecedent which does not split in two or more distinctive virtual individuals in the information state for the pronoun. The felicity condition for cross-speaker anaphora can be satisfied in declaratives involving modality, interrogatives and imperatives in a rather less demanding way, because in these cases the utterance does not necessarily require non-trivial personal information for proper use of a pronoun.
