• Title/Summary/Keyword: 말소리 (speech sounds)

Modified AWSSDR method for frequency-dependent reverberation time estimation (주파수 대역별 잔향시간 추정을 위한 변형된 AWSSDR 방식)

  • Min Sik Kim;Hyung Soon Kim
    • Phonetics and Speech Sciences / v.15 no.4 / pp.91-100 / 2023
  • Reverberation time (T60) is a typical acoustic parameter that provides information about reverberation. Since the impact of reverberation varies across frequency bands even within the same space, frequency-dependent (FD) T60, which offers detailed insight into the acoustic environment, can be useful. However, most conventional blind T60 estimation methods, which estimate T60 from speech signals, focus on fullband T60 estimation, and the few existing blind FDT60 estimation methods commonly perform poorly in the low-frequency bands. This paper introduces a modified approach based on Attentive pooling based Weighted Sum of Spectral Decay Rates (AWSSDR), previously proposed for blind T60 estimation, by extending its target from fullband T60 to FDT60. The experimental results show that the proposed method outperforms conventional blind FDT60 estimation methods on the Acoustic Characterization of Environments (ACE) challenge evaluation dataset. Notably, it consistently exhibits excellent estimation performance in all frequency bands. This demonstrates that the mechanism of the AWSSDR method is valuable for blind FDT60 estimation because it reflects the FD variations in the impact of reverberation, aggregating information about FDT60 from the speech signal by processing the spectral decay rates associated with the physical properties of reverberation in each frequency band.
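
As a rough illustration of the link between decay rate and T60 that the abstract relies on, the sketch below fits a straight line to an energy decay curve in dB and converts its slope to T60 (the time needed for a 60 dB drop). This is not the AWSSDR method, which according to the abstract aggregates spectral decay rates with attentive pooling; the function name `t60_from_decay` and the sample values are hypothetical. For a frequency-dependent estimate, the same fit would be applied to the decay curve of each band separately.

```python
def t60_from_decay(times_s, levels_db):
    """Estimate T60 from a least-squares line fit to a decay curve in dB."""
    n = len(times_s)
    if n < 2 or n != len(levels_db):
        raise ValueError("need two or more matching time/level samples")
    mean_t = sum(times_s) / n
    mean_l = sum(levels_db) / n
    sxx = sum((t - mean_t) ** 2 for t in times_s)
    if sxx == 0:
        raise ValueError("time samples must not all be identical")
    sxy = sum((t - mean_t) * (l - mean_l) for t, l in zip(times_s, levels_db))
    slope_db_per_s = sxy / sxx  # negative for a decaying signal
    if slope_db_per_s >= 0:
        raise ValueError("levels do not decay over time")
    return -60.0 / slope_db_per_s


# Hypothetical decay curve: the level drops 10 dB every 0.1 s.
t60 = t60_from_decay([0.0, 0.1, 0.2, 0.3], [0.0, -10.0, -20.0, -30.0])
print(f"T60 = {t60:.2f} s")
```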

Machine-learning-based out-of-hospital cardiac arrest (OHCA) detection in emergency calls using speech recognition (119 응급신고에서 수보요원과 신고자의 통화분석을 활용한 머신 러닝 기반의 심정지 탐지 모델)

  • Jong In Kim;Joo Young Lee;Jio Chung;Dae Jin Shin;Dong Hyun Choi;Ki Hong Kim;Ki Jeong Hong;Sunhee Kim;Minhwa Chung
    • Phonetics and Speech Sciences / v.15 no.4 / pp.109-118 / 2023
  • Cardiac arrest is a critical medical emergency where immediate response is essential for patient survival. This is especially true for out-of-hospital cardiac arrest (OHCA), for which the actions of emergency medical services in the early stages significantly impact outcomes. However, in Korea, a challenge arises from a shortage of dispatchers, who handle a large volume of emergency calls. In such situations, the implementation of a machine-learning-based OHCA detection program can assist responders and improve patient survival rates. In this study, we address this challenge by developing a machine-learning-based OHCA detection program. This program analyzes transcripts of conversations between responders and callers to identify instances of cardiac arrest. The proposed model includes an automatic transcription module for these conversations, a text-based cardiac arrest detection model, and the necessary server and client components for program deployment. Importantly, the experimental results demonstrate the model's effectiveness, achieving an F1 score of 79.49% and reducing the time needed for cardiac arrest detection by 15 seconds compared with dispatchers. Despite working with a limited dataset, this research highlights the potential of a cardiac arrest detection program as a valuable tool for responders, ultimately enhancing cardiac arrest survival rates.
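
For readers unfamiliar with the F1 metric reported above, the following minimal sketch shows how it is computed as the harmonic mean of precision and recall. The counts are hypothetical and are not taken from the study; the function name `f1_score` is chosen only for this illustration.

```python
def f1_score(tp: int, fp: int, fn: int) -> float:
    """Return F1, the harmonic mean of precision and recall.

    tp: calls correctly flagged as cardiac arrest
    fp: calls flagged as cardiac arrest that were not
    fn: cardiac arrest calls that were missed
    """
    if tp == 0:
        return 0.0  # no correct detections, so precision and recall are 0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical confusion counts, for illustration only.
example_f1 = f1_score(tp=8, fp=2, fn=2)
print(f"F1 = {example_f1:.2%}")
```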

Change in acoustic characteristics of voice quality and speech fluency with aging (노화에 따른 음질과 구어 유창성의 음향학적 특성 변화)

  • Hee-June Park;Jin Park
    • Phonetics and Speech Sciences / v.15 no.4 / pp.45-51 / 2023
  • Voice issues such as voice weakness that arise with age can have social and emotional impacts, potentially leading to feelings of isolation and depression. This study aimed to investigate the changes in acoustic characteristics resulting from aging, focusing on voice quality and spoken fluency. To this end, tasks involving sustained vowel phonation and paragraph reading were recorded for 20 elderly and 20 young participants. Voice-quality-related variables, including F0, jitter, shimmer, and Cepstral Peak Prominence (CPP) values, were analyzed along with speech-fluency-related variables, such as average syllable duration (ASD), articulation rate (AR), and speech rate (SR). The results showed that, in the voice-quality-related measurements, F0 was higher for the elderly and voice quality was diminished, as indicated by increased jitter and shimmer and lower CPP levels. Speech fluency analysis also demonstrated that the elderly spoke more slowly, as indicated by all three measures (ASD, AR, and SR). Correlation analysis between voice quality and speech fluency showed a significant relationship between shimmer and CPP values and between ASD and SR values. This suggests that changes in spoken fluency can be identified early by measuring the variations in voice quality. This study further highlights the reciprocal relationship between voice quality and spoken fluency, emphasizing that deterioration in one can affect the other.
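
The jitter and shimmer measures mentioned above are commonly defined as cycle-to-cycle perturbations of pitch period and peak amplitude, respectively. The sketch below assumes the common "local" definition (mean absolute difference between consecutive cycles divided by the mean value); the abstract does not say which variant the authors used, and the helper name `local_perturbation` and the sample values are hypothetical.

```python
def local_perturbation(values):
    """Mean absolute difference of consecutive values, relative to the mean value."""
    if len(values) < 2:
        raise ValueError("need at least two cycles")
    mean_value = sum(values) / len(values)
    if mean_value == 0:
        raise ValueError("mean value must be non-zero")
    diffs = [abs(b - a) for a, b in zip(values, values[1:])]
    return (sum(diffs) / len(diffs)) / mean_value


# Hypothetical per-cycle measurements from a sustained vowel.
periods_s = [0.0100, 0.0102, 0.0099, 0.0101]   # glottal cycle durations
peak_amplitudes = [0.80, 0.76, 0.82, 0.78]     # peak amplitude per cycle

jitter_percent = 100 * local_perturbation(periods_s)
shimmer_percent = 100 * local_perturbation(peak_amplitudes)
print(f"jitter = {jitter_percent:.2f}%, shimmer = {shimmer_percent:.2f}%")
```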

Spectral moment analysis of distortion errors in alveolar fricatives in Korean children (치조 마찰음 왜곡 오류 유무에 따른 아동 발화 적률분석 비교)

  • Yunju Han;Do Hyung Kim;Ja Eun Hwang;Dae-Hyun Jang;Jae Won Kim
    • Phonetics and Speech Sciences / v.16 no.1 / pp.33-40 / 2024
  • This study investigated acoustic features in spectral moment analysis, comparing accurate articulations with distortions of alveolar fricatives such as dentalization, palatalization, and lateralization. A retrospective analysis was conducted on speech samples from 61 children (mean age: 5.6±1.5 years, 19 females, 42 males) using the Assessment of Phonology & Articulation for Children (APAC) and Urimal-test of Articulation and Phonology I (U-TAP I). Spectral moment analysis was applied to 169 speech samples. The results revealed that the center of gravity of accurate articulations was higher than that of palatalization, while that of palatalization was lower than that of dentalization. The variance of dentalization was higher than that of both accurate articulations and palatalization. The skewness of both dentalization and palatalization was higher than that of accurate articulations. The kurtosis of palatalization was higher than that of both accurate articulations and dentalization. No significant differences were observed for the position of fricatives (initial, medial) and tense type (plain, tense) across all variables of spectral moment analysis for each distortion type. This study confirmed distinct patterns in center of gravity, variance, skewness, and kurtosis depending on the type of alveolar fricative distortion. The objective values provided in this study will serve as foundational data for diagnosing alveolar fricative distortions in children with speech sound disorders.
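
The four spectral moments named above can be sketched as follows, treating a normalized power spectrum as a probability distribution over frequency. This is a minimal illustration, not the authors' analysis pipeline: it assumes the power spectrum of a fricative frame has already been computed, reports excess kurtosis (normal distribution = 0) as an assumed convention, and uses a hypothetical function name and toy input.

```python
import math


def spectral_moments(freqs_hz, power):
    """Return (center of gravity, variance, skewness, excess kurtosis) of a spectrum."""
    if len(freqs_hz) != len(power):
        raise ValueError("freqs_hz and power must have the same length")
    total = sum(power)
    if total <= 0:
        raise ValueError("spectrum must contain positive power")
    weights = [p / total for p in power]  # normalize to sum to 1

    # First moment: power-weighted mean frequency.
    cog = sum(f * w for f, w in zip(freqs_hz, weights))

    def central_moment(k):
        return sum(((f - cog) ** k) * w for f, w in zip(freqs_hz, weights))

    variance = central_moment(2)
    if variance == 0:
        raise ValueError("spectrum has zero spread; skewness and kurtosis are undefined")
    sd = math.sqrt(variance)
    skewness = central_moment(3) / sd ** 3
    kurtosis = central_moment(4) / sd ** 4 - 3.0  # excess kurtosis (assumed convention)
    return cog, variance, skewness, kurtosis


# Toy spectrum: equal power at 1000 Hz and 3000 Hz.
cog, variance, skewness, kurtosis = spectral_moments([1000.0, 3000.0], [1.0, 1.0])
print(cog, variance, skewness, kurtosis)
```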

Evaluation of the readability of self-reported voice disorder questionnaires (자기보고식 음성장애 설문지 문항의 가독성 평가)

  • HyeRim Kwak;Seok-Chae Rhee;Seung Jin Lee;HyangHee Kim
    • Phonetics and Speech Sciences / v.16 no.1 / pp.41-48 / 2024
  • The significance of self-reported voice assessments concerning patients' chief complaints and quality of life has increased. Therefore, readability assessments of questionnaire items are essential. In this study, readability analyses were performed based on text grade and complexity, vocabulary frequency and grade, and lexical diversity of the 11 Korean versions of self-reported voice disorder questionnaires (KVHI, KAVI, KVQOL, K-SVHI, K-VAPP, K-VPPC, TVSQ, K-VDCQ, K-VFI, K-VTDS, and K-VoiSS). Additionally, a comparative readability assessment was conducted on the original versions of these questionnaires to discern the differences between their Korean counterparts and the questionnaires for children. Consequently, it was determined that voice disorder questionnaires could be used without difficulty for populations with lower literacy levels. Evaluators should consider subjects' reading levels when conducting assessments, and future developments and revisions should consider their reading difficulties.

Characteristics of accurate token and all token diadochokinesis in patients with normal pressure hydrocephalus (정상압 수두증 환자와 정상 노인의 조음교대운동 수행력 비교)

  • Seong Hee Yoon;Ki-Su Park;Kyunghun Kang;Janghyeok Yoon;Ji-Wan Ha
    • Phonetics and Speech Sciences / v.16 no.1 / pp.57-65 / 2024
  • Normal pressure hydrocephalus (NPH) is a condition wherein the cerebrospinal pressure in the brain is within the normal range, but the cerebrospinal fluid increases above the normal level, causing ventriculomegaly. In patients with NPH, the articulatory system exhibits reduced mobility and range, which may affect diadochokinesis (DDK) and speech intelligibility. In this study, we investigated the characteristics of DDK, both accurate-token DDK and all-token DDK (which also counts inaccurate tokens), in patients with NPH and healthy elderly adults (HE). We also examined the classification accuracy of DDK between the two groups. Finally, we investigated whether there was a correlation between speech intelligibility and DDK in the NPH group. The results showed that the NPH and HE groups differed significantly in both accurate-token DDK and all-token DDK, and that the classification accuracy was relatively high. However, there was no correlation between speech intelligibility and DDK. The findings suggest that DDK is a useful method for sensitively assessing speech motor performance in patients with NPH.

One-shot multi-speaker text-to-speech using RawNet3 speaker representation (RawNet3를 통해 추출한 화자 특성 기반 원샷 다화자 음성합성 시스템)

  • Sohee Han;Jisub Um;Hoirin Kim
    • Phonetics and Speech Sciences / v.16 no.1 / pp.67-76 / 2024
  • Recent advances in text-to-speech (TTS) technology have significantly improved the quality of synthesized speech, reaching a level where it can closely imitate natural human speech. In particular, TTS models offering various voice characteristics and personalized speech are widely utilized in fields such as artificial intelligence (AI) tutors, advertising, and video dubbing. Accordingly, in this paper, we propose a one-shot multi-speaker TTS system that can ensure acoustic diversity and synthesize personalized voices by generating speech using unseen target speakers' utterances. The proposed model integrates a speaker encoder into a TTS model consisting of the FastSpeech2 acoustic model and the HiFi-GAN vocoder. The speaker encoder, based on the pre-trained RawNet3, extracts speaker-specific voice features. Furthermore, the proposed approach not only includes an English one-shot multi-speaker TTS but also introduces a Korean one-shot multi-speaker TTS. We evaluate the naturalness and speaker similarity of the generated speech using objective and subjective metrics. In the subjective evaluation, the proposed Korean one-shot multi-speaker TTS obtained a naturalness mean opinion score (NMOS) of 3.36 and a similarity MOS (SMOS) of 3.16. The objective evaluation of the proposed English and Korean one-shot multi-speaker TTS showed prediction MOS (P-MOS) values of 2.54 and 3.74, respectively. These results indicate that the performance of our proposed model is improved over the baseline models in terms of both naturalness and speaker similarity.

Voice range differences in vowels by voice classification among male students of popular music vocals (대중가요 보컬 전공 남학생의 성종에 따른 모음 간 음역 차이)

  • Il-Song Ji;Jaeock Kim
    • Phonetics and Speech Sciences / v.16 no.2 / pp.37-47 / 2024
  • This study was conducted on 27 male students majoring in or preparing for popular music vocals to determine whether they were aware of their voice classification and vocal range. Additionally, differences in the fundamental frequency and average speaking fundamental frequency were compared among the voice classifications. Moreover, considering that they may differ in their ability to produce high frequencies depending on the vowel, differences in voice ranges among the cardinal vowels, /a/, /i/, and /u/, were examined, and differences in voice ranges between vowels were compared by voice classification. The results showed that more than half of the male students majoring in or preparing for popular music vocals were not accurately aware of their voice types. In addition, statistically significant differences were found in the maximum fundamental frequency and frequency range among vowels, indicating differences in the voice range that can be produced depending on the vowel type. In particular, the voice range decreased in the following order: /a/>/u/>/i/. This suggests that while the vowel /a/ is easier to articulate in the high register compared to other vowels, vowels /u/ and /i/ as high vowels involve narrowing of the oral cavity due to the raised position of the tongue, accompanied by raising of the larynx, resulting in a decrease in voice range and difficulty in vocalizing in the high register.

Phonological retrieval and phonological memory skills in children with dyslexia and poor comprehension (난독증 아동과 읽기이해부진 아동의 음운인출과 음운기억 능력)

  • Hyojin Yoon
    • Phonetics and Speech Sciences / v.16 no.2 / pp.83-90 / 2024
  • This study aimed to explore phonological retrieval and phonological memory skills in second to third graders with dyslexia, poor comprehension, and typical development. The participants included 17 children with dyslexia, 17 children with poor comprehension, and 24 typically developing children. Children with dyslexia scored below 85 on the word decoding test; poor comprehenders scored above 90 on the word decoding test and below 85 on the reading comprehension test; and typically developing children scored above 90 on both reading tests. All participants were assessed on rapid automatized naming (RAN) and nonword repetition (NWR). The results indicated that children with dyslexia performed significantly worse on the RAN and NWR tasks than the other groups. However, there were no significant differences between poor comprehenders and typically developing children. Furthermore, only RAN was significantly correlated with word decoding and reading comprehension in children with dyslexia. For typically developing children, RAN was correlated with word decoding and reading comprehension, while NWR had a significant correlation with reading comprehension. No correlations were found between these variables for poor comprehenders. The findings suggest that children with dyslexia have difficulties with phonological retrieval and phonological memory, which are essential for reading development, whereas poor comprehenders do not have difficulties with phonological processing skills. Phonological processing deficits may underlie word decoding difficulties in dyslexia.

Comparison on knowledge and practice of vocal hygiene among students majoring in classical and popular music vocals (성악전공 대학생과 실용음악전공 대학생의 음성위생 지식과 수행 비교)

  • Choung Seo Park;Jaeock Kim
    • Phonetics and Speech Sciences / v.16 no.2 / pp.59-69 / 2024
  • Due to differences in singing styles and voice production between classical and popular music singers, their knowledge and practice regarding vocal hygiene may differ. This study compared the knowledge and practice of vocal hygiene among 121 university undergraduate students (58 classical and 63 popular music vocal majors). Additionally, the correlation between the level of knowledge and practice of vocal hygiene and the subjective voice evaluation was examined. The results revealed that both knowledge and practice of vocal hygiene were significantly higher in classical than in popular music vocal majors, and that vocal hygiene practice was significantly higher than knowledge in the entire group. In addition, there was a weak positive correlation between knowledge and practice of vocal hygiene, and a weak negative correlation between vocal hygiene practice and subjective voice evaluation. This study suggests that popular music vocal majors have relatively lower levels of knowledge and practice in vocal hygiene than classical music vocal majors. It also highlights the need to provide tailored vocal hygiene education programs for both classical and popular music vocal majors, as both groups show low levels of knowledge and practice in certain aspects of vocal hygiene.