• Title/Summary/Keyword: Utterance


Characteristics of Right Hemispheric Damaged Patients in Korean Focused Prosodic Sentences (한국어 초점 발화 시 우반구 손상인의 초점 운율 특성)

  • Lee, Myung Soon; Park, Hyun
    • Therapeutic Science for Rehabilitation / v.8 no.3 / pp.69-81 / 2019
  • Objective: The purpose of this study was to examine the prosodic characteristics of ambiguous sentences produced by patients with right hemisphere damage (RHD). Methods: Sentences in which each word in turn received prosodic focus were used as stimuli. Acoustic parameters (intensity, F0, and duration) were measured to characterize prosody in patients with right hemisphere lesions and in normal controls. All speech samples were recorded using Praat 4.3.14. Data were analyzed with independent-samples t-tests in SPSS 18.0. Results: First, the intensity of the first syllable of the focus word differed between the two groups in several sentences. Second, F0 differed between the two groups in all sentences. Third, duration differed between the groups in several sentences. Accordingly, prosody varied and the values of the acoustic parameters differed depending on the focus of the utterance. The group with right hemisphere damage showed restricted prosody. Conclusions: Intensity, duration, and F0 are all used as elements of prosody to emphasize structural and pragmatic meaning, and, depending on the focus, intensity and duration were related to F0. F0 carries a significant linguistic distinction and also differed significantly between the RHD group and normal controls, so F0 may serve as a discriminating factor in the prosody evaluation of patients with right hemisphere damage; stronger evidence needs to be accumulated through future research.

Development and validation of Speech Range Profile task (발화범위 프로파일 과제 개발 및 타당성 검증)

  • Kim, Jaeock; Lee, Seung Jin
    • Phonetics and Speech Sciences / v.11 no.3 / pp.77-87 / 2019
  • The study aimed to develop a Speech Range Profile (SRP) task and to examine and validate its clinical application. Forty-five participants without voice disorders, aged 18-29 years, were assessed with both the SRP and the Voice Range Profile (VRP). The authors developed the "Fire!" paragraph as an SRP task comprising 14 sentences that include all Korean spoken phonemes and sentence types. To compare SRP and VRP results, the participants read the paragraph (reading) and counted from 21 to 30 (counting) as SRP tasks, and produced the vowel /a/ from low to high frequencies (gliding) and a shortened form of the VRP as VRP tasks. $F0_{max}$, $F0_{min}$, $F0_{range}$, $I_{max}$, $I_{min}$, and $I_{range}$ were measured for each task and compared, showing that $F0_{max}$, $F0_{min}$, $F0_{range}$, $I_{max}$, and $I_{range}$ did not differ between reading and gliding. $I_{min}$ had the lowest value in counting. It is concluded that the newly developed SRP task, reading the "Fire!" paragraph, can yield a maximum phonation range similar to that found by the VRP. Therefore, it is expected that voice evaluation can be performed effectively in a relatively short time by applying the SRP with the "Fire!" paragraph, a functional utterance task, in place of the VRP, which may be difficult to measure over a long period or in cases of severe voice disorders.
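
The six measures named in this abstract can be derived from an F0 track and an intensity track. The following is a minimal sketch, assuming the tracks have already been extracted by an acoustic analysis tool as lists of Hz and dB values with unvoiced frames removed. The function name and dictionary keys are hypothetical, and the ranges are taken as plain max minus min in the original units, since the abstract does not say whether the authors used Hz or semitones.

```python
from typing import Dict, Sequence


def range_profile(f0_hz: Sequence[float], intensity_db: Sequence[float]) -> Dict[str, float]:
    """Summarize one task (e.g. reading, counting, gliding) as max, min, and range.

    Assumption: f0_hz holds voiced-frame F0 values in Hz and intensity_db holds
    intensity values in dB; both come from some external analysis step.
    """
    if not f0_hz or not intensity_db:
        raise ValueError("both tracks must contain at least one value")
    f0_max, f0_min = max(f0_hz), min(f0_hz)
    i_max, i_min = max(intensity_db), min(intensity_db)
    return {
        "F0_max": f0_max,
        "F0_min": f0_min,
        "F0_range": f0_max - f0_min,  # range as a simple difference (assumption)
        "I_max": i_max,
        "I_min": i_min,
        "I_range": i_max - i_min,
    }


# Toy values for illustration only; they are not data from the study.
profile = range_profile([110.0, 180.0, 240.0], [55.0, 62.0, 78.0])
```

Computing one such profile per task makes the reading-versus-gliding comparison a comparison of two dictionaries with the same keys.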

Classification of muscle tension dysphonia (MTD) female speech and normal speech using cepstrum variables and random forest algorithm (켑스트럼 변수와 랜덤포레스트 알고리듬을 이용한 MTD(근긴장성 발성장애) 여성화자 음성과 정상음성 분류)

  • Yun, Joowon; Shim, Heejeong; Seong, Cheoljae
    • Phonetics and Speech Sciences / v.12 no.4 / pp.91-98 / 2020
  • This study investigated the acoustic characteristics of the sustained vowel /a/ and of sentence utterances produced by patients with muscle tension dysphonia (MTD), using cepstrum-based acoustic variables. Thirty-six women diagnosed with MTD and the same number of women with normal voices participated in the study, and the data were recorded and measured with ADSV™. The results demonstrated that, among all of the variables, cepstral peak prominence (CPP) and CPP_F0 were statistically significantly lower in the MTD group than in the control group. On the GRBAS scale, overall severity (G) was the most prominent, followed in order by the roughness (R), breathiness (B), and strain (S) indices, in the voice quality of MTD patients. As these characteristics increased, a statistically significant negative correlation with CPP was observed. We tried to classify the MTD and control groups using the CPP and CPP_F0 variables. Statistical modeling with a Random Forest machine learning algorithm yielded much higher classification accuracy (100% on training data and 83.3% on test data) in the sentence reading task, with CPP playing the more crucial role in both the vowel and sentence reading tasks.

The relationship between fluency levels and suprasegmentals according to the sentence types in the English read speech by Korean middle school English learners (한국 중학생의 영어 읽기 발화에서 문장유형에 따른 유창성 등급과 초분절 요소의 관계)

  • Kim, Hwa-Young
    • Phonetics and Speech Sciences / v.14 no.3 / pp.51-66 / 2022
  • This study aims to help Korean learners of English with their English pronunciation by revealing which suprasegmentals bring their reading of English sentences closer to that of native English speakers. To this end, Korean middle school English learners were selected as subjects and research data were gathered through sentence types (declarative, interrogative, imperative, and exclamative), as well as syllables. Among the suprasegmentals, speech rate, pause frequency, pause duration, F0 range, and rhythm were used to analyze these English sentence utterances. Mean analysis, correlation analysis, and regression analysis were performed. The results showed that speech rate, pause frequency, pause duration, and F0 range affected the evaluation of fluency levels. In the regression analysis between all suprasegmentals and fluency levels, the suprasegmentals that most affected fluency levels were speech rate and F0 range. Rhythm had no meaningful relation with fluency levels. Therefore, when teaching English pronunciation, it is necessary to teach students to increase their speech rate and F0 range. In addition, students should be trained to reduce both the number and the duration of pauses during utterance to improve their fluency. It is noteworthy that, of the four sentence types, exclamative sentences were produced with a faster speech rate, fewer pauses, shorter pause duration, and higher rhythm values.

Perceptual discrimination of wh-scopes in Gyeongsang Korean (경상 방언 의문문 작용역의 지각 구분)

  • Yun, Weonhee
    • Phonetics and Speech Sciences / v.14 no.2 / pp.1-10 / 2022
  • A wh-phrase positioned in an embedded clause can be interpreted as having a matrix scope if the sentence is produced with proper prosodic structures such as the wh-intonation. In a previous experiment, a sentence with a wh-phrase in an embedded clause was given to 40 speakers of Gyeongsang Korean. A script containing the sentence was provided to induce a matrix scope interpretation for the wh-phrase. These 40 utterances were prepared as stimuli for a perception test to verify whether the wh-phrases in the stimuli were perceived as having matrix scopes. Each utterance was played thrice to 24 subjects. The results showed that more than half of the 72 responses indicated a preference for an embedded scope rather than a matrix scope in 20 of the utterances. A multiple linear regression analysis showed that the matrix scope responses were best predicted by the magnitude of the pitch prominence in a prosodic word consisting of an embedded verb and a complementizer. The pitch prominence was calculated by subtracting the fundamental frequency (F0) at the right edge of the prosodic word from the peak F0 in the same prosodic word. The smaller the magnitude, the more matrix responses there were. These results suggest that the categorical perception of wh-scopes is based on the magnitude of pitch prominence.
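
The pitch prominence measure described in this abstract (peak F0 in the prosodic word minus F0 at its right edge) can be sketched as follows. This is a minimal illustration, assuming the F0 contour of the prosodic word is available as a list of Hz values in time order, with the last sample standing in for the right edge. The function name and the sample values are hypothetical.

```python
from typing import Sequence


def pitch_prominence(f0_hz: Sequence[float]) -> float:
    """Peak F0 within the prosodic word minus F0 at its right edge.

    Assumption: f0_hz is ordered in time and its last element is the F0 value
    at the right edge of the prosodic word.
    """
    if not f0_hz:
        raise ValueError("the F0 contour must contain at least one value")
    return max(f0_hz) - f0_hz[-1]


# Toy contours for illustration only; they are not data from the study.
falling = pitch_prominence([150.0, 210.0, 190.0, 160.0])  # peak well above the edge
level = pitch_prominence([150.0, 170.0, 185.0, 200.0])    # peak at the right edge
```

Under the finding reported above, an utterance like the second one, with a smaller magnitude, would be expected to draw more matrix scope responses.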

The effects of speakers' age on temporal features of speech among healthy young, middle-aged, and older adults (연령세대에 따른 말 산출의 시간적 특성: 말속도와 쉼을 중심으로)

  • Kim, Yeji; Lee, Song-min; Choi, Min-kyung; Jung, Sang-min; Sung, Jee Eun; Lee, Youngmee
    • Phonetics and Speech Sciences / v.14 no.1 / pp.37-47 / 2022
  • The purpose of this study is to observe the effects of healthy adults' age on temporal features of speech and to identify which features could differentiate older and young adults. We examined speech rates (i.e., overall speaking rate and articulation rate), occurrence of pauses, and duration of pauses per utterance, using the National Institute of Korean Language's open corpus. We selected a total of 30 healthy adults (10 young, 10 middle-aged, and 10 older adults) for this study. There were significant differences among the groups in overall speaking rate, articulation rate, total occurrence of pauses, occurrence of pauses between syntactic words, total duration of pauses, and duration of pauses between syntactic words. The older and middle-aged adults showed slower speech rates and longer and more frequent pauses than the young adults. However, there were no significant differences among the three groups in pauses within syntactic words. The overall speaking rate significantly differentiated older adults from young adults. These findings suggest that the effect of speakers' age is reflected in gradual changes in the temporal features of their speech.
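
The two speech rate measures named in this abstract differ in how they treat pauses. The sketch below uses a common convention, which is an assumption here because the abstract does not give the authors' exact definitions: overall speaking rate divides the syllable count by total utterance time including pauses, while articulation rate divides it by the time left after pauses are removed. The function and parameter names are hypothetical.

```python
from typing import Sequence, Tuple


def speech_rates(
    n_syllables: int,
    total_duration_s: float,
    pause_durations_s: Sequence[float],
) -> Tuple[float, float]:
    """Return (overall speaking rate, articulation rate) in syllables per second.

    Assumed definitions (not taken from the study):
      speaking rate     = syllables / total time, pauses included
      articulation rate = syllables / (total time - summed pause time)
    """
    pause_total = sum(pause_durations_s)
    speaking_time = total_duration_s - pause_total
    if total_duration_s <= 0 or speaking_time <= 0:
        raise ValueError("durations must leave a positive amount of speaking time")
    return n_syllables / total_duration_s, n_syllables / speaking_time


# Toy utterance: 20 syllables in 5 seconds with two 0.5-second pauses.
speaking_rate, articulation_rate = speech_rates(20, 5.0, [0.5, 0.5])
```

With these definitions, longer or more frequent pauses lower the overall speaking rate while leaving the articulation rate unchanged, which is why the two are usually reported together.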

Extending StarGAN-VC to Unseen Speakers Using RawNet3 Speaker Representation (RawNet3 화자 표현을 활용한 임의의 화자 간 음성 변환을 위한 StarGAN의 확장)

  • Bogyung Park; Somin Park; Hyunki Hong
    • KIPS Transactions on Software and Data Engineering / v.12 no.7 / pp.303-314 / 2023
  • Voice conversion, a technology that allows an individual's speech data to be regenerated with the acoustic properties (tone, cadence, gender) of another, has countless applications in education, communication, and entertainment. This paper proposes an approach based on the StarGAN-VC model that generates realistic-sounding speech without requiring parallel utterances. To overcome the constraints of the existing StarGAN-VC model, which uses one-hot vectors for original and target speaker information, this paper extracts feature vectors of target speakers using a pre-trained version of RawNet3. This results in a latent space where voice conversion can be performed without direct speaker-to-speaker mappings, enabling an any-to-any structure. In addition to the loss terms used in the original StarGAN-VC model, the Wasserstein distance is used as a loss term to ensure that generated voice segments match the acoustic properties of the target voice. The Two Time-Scale Update Rule (TTUR) is also used to facilitate stable training. Experimental results show that the proposed method outperforms previous methods, including the StarGAN-VC network on which it was based.
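
The core idea of the Two Time-Scale Update Rule mentioned in this abstract is that the generator and the discriminator are updated with separate learning rates. The following is a toy sketch of that idea on scalar parameters, not the authors' training code: the function name, the default learning rates, and the choice to give the discriminator the larger rate are all illustrative assumptions, since the abstract does not report them.

```python
from typing import Tuple


def ttur_step(
    gen_param: float,
    disc_param: float,
    gen_grad: float,
    disc_grad: float,
    gen_lr: float = 1e-4,   # illustrative value, not from the paper
    disc_lr: float = 4e-4,  # illustrative value, not from the paper
) -> Tuple[float, float]:
    """One gradient step for each network, each on its own loss and its own rate.

    gen_grad and disc_grad are the gradients of the generator loss and the
    discriminator loss with respect to the corresponding parameter.
    """
    new_gen = gen_param - gen_lr * gen_grad
    new_disc = disc_param - disc_lr * disc_grad
    return new_gen, new_disc


# Same starting point and same gradient, but different step sizes.
g, d = ttur_step(1.0, 1.0, 1.0, 1.0, gen_lr=0.1, disc_lr=0.4)
```

In a real framework the same effect is obtained by creating two optimizers, one per network, with different learning rates.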

Detection of video editing points using facial keypoints (얼굴 특징점을 활용한 영상 편집점 탐지)

  • Joshep Na; Jinho Kim; Jonghyuk Park
    • Journal of Intelligence and Information Systems / v.29 no.4 / pp.15-30 / 2023
  • Recently, various services using artificial intelligence (AI) have been emerging in the media field as well. However, most video editing, which involves finding editing points and attaching video segments, is still carried out manually, requiring a lot of time and human resources. Therefore, this study proposes a methodology that uses Video Swin Transformer to detect the edit points of a video according to whether the person in the video is speaking. To this end, the proposed structure first detects facial keypoints through face alignment. Through this process, the temporal and spatial changes of the face are reflected from the input video data. Then, the Video Swin Transformer-based model proposed in this study classifies the behavior of the person in the video. Specifically, after the feature map generated by Video Swin Transformer from the video data is combined with the facial keypoints detected through face alignment, utterance is classified through convolution layers. In conclusion, the video editing point detection model using facial keypoints proposed in this paper improved performance from 87.46% to 89.17% compared to the model without facial keypoints.

Funds of Knowledge and Features of Teaching and Learning in the Hybrid Space of Middle School Science Class: Focus on 7th grade Biology (과학 수업의 혼성공간에서 드러나는 중학생의 지식자본 및 교수학습 특성: 7학년 생명 영역을 중심으로)

  • Lee, Minjoo; Kim, Heui-Baik
    • Journal of The Korean Association For Science Education / v.34 no.8 / pp.731-744 / 2014
  • Drawing on students' own culture and resources as main sources in science class, we began research to explore teaching and learning settings that are more responsive to adolescents. This study was designed to explore the funds of knowledge that students bring into middle school science class. It also focused on the features of teaching and learning settings that stimulated the autonomous inflow of students' funds of knowledge as resources for science learning. Data from participant observations and in-depth interviews with 7th grade students were qualitatively analyzed based on grounded theory. We found that students' funds of knowledge were formed from their family life, neighborhood communities, peer groups, and pop culture. The funds of knowledge based on peer culture emerged as the most salient factor in students' enhanced participation and utterance. The common features of classes that stimulated the inflow of funds of knowledge were identified as: (1) hybrid spaces for learning designed in advance; (2) sharing and enlargement of the funds of knowledge brought into the class; and (3) a common orientation of the community of practice toward knowledge co-construction and shared outcomes. Based on these findings, this paper discusses the educational implications for turning students' potential resources into actual sources for science class. It also discusses the development of participation, specifically among generally marginalized students. Science classes based on students' funds of knowledge offer an increased possibility of knowledge co-construction through the hybridized interaction of students' everyday lives and science knowledge, and lead to more meaningful learning experiences.

Automatic Speech Style Recognition Through Sentence Sequencing for Speaker Recognition in Bilateral Dialogue Situations (양자 간 대화 상황에서의 화자인식을 위한 문장 시퀀싱 방법을 통한 자동 말투 인식)

  • Kang, Garam; Kwon, Ohbyung
    • Journal of Intelligence and Information Systems / v.27 no.2 / pp.17-32 / 2021
  • Speaker recognition is generally divided into speaker identification and speaker verification. Speaker recognition plays an important role in automatic voice systems, and its importance is becoming more prominent as portable devices, voice technology, and audio content continue to develop. Previous speaker recognition studies have aimed to determine automatically who the speaker is from voice files and to improve accuracy. Speech style is an important sociolinguistic subject: it contains very useful information that reveals the speaker's attitude, conversational intention, and personality, and this can be an important clue for speaker recognition. The sentence-final ending used in a speaker's speech determines the type of sentence and carries information such as the speaker's intention, psychological attitude, or relationship to the listener. The use of sentence-final endings varies in probability depending on the characteristics of the speaker, so the type and distribution of the sentence-final endings of a specific unidentified speaker will be helpful in recognizing the speaker. However, few existing text-based speaker recognition studies have considered speech style, and if speech style information is added to speech signal-based speaker recognition techniques, the accuracy of speaker recognition can be further improved. Hence, the purpose of this paper is to propose a novel method that uses speech style, expressed as a sentence-final ending, to improve the accuracy of Korean speaker recognition. To this end, a method called sentence sequencing is proposed, which generates vector values from the type and frequency of the sentence-final endings appearing in the utterances of a specific person. To evaluate the performance of the proposed method, learning and performance evaluation were conducted with an actual drama script. The method proposed in this study can be used as a means to improve the performance of Korean speech recognition service.
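
The idea of turning the type and frequency of a speaker's sentence-final endings into a vector can be sketched as follows. This is one possible reading of the description above, not the authors' sentence sequencing method: the ending inventory, the suffix-matching rule, and the normalization to relative frequencies are all assumptions made for illustration, and the function and constant names are hypothetical.

```python
from collections import Counter
from typing import List, Sequence

# Hypothetical, tiny inventory of sentence-final endings (assumption).
ENDINGS = ("습니다", "어요", "냐", "자")


def ending_vector(utterances: Sequence[str], inventory: Sequence[str] = ENDINGS) -> List[float]:
    """Relative frequency of each sentence-final ending, in inventory order."""
    counts: Counter = Counter()
    for utterance in utterances:
        # Drop trailing punctuation and spaces before matching the ending.
        text = utterance.rstrip(" .?!")
        for ending in inventory:
            if text.endswith(ending):
                counts[ending] += 1
                break  # count at most one ending per utterance
    total = sum(counts.values())
    if total == 0:
        return [0.0] * len(inventory)
    return [counts[ending] / total for ending in inventory]


# Toy lines for one speaker; they are not taken from the drama script.
speaker_lines = ["밥을 먹었습니다.", "회의를 시작했습니다.", "지금 가자!", "어디 가냐?"]
vector = ending_vector(speaker_lines)
```

Vectors built this way for each speaker could then be compared or fed to a classifier; the abstract does not specify which model the authors used for that step.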