• Title/Summary/Keyword: 말소리 (speech sounds)

Voice range profile in premutation, mutation, and postmutation of men (변성이전, 변성 및 변성이후 남성의 발성범위 프로파일)

  • Kim, Jaeock;Lee, Seung Jin
    • Phonetics and Speech Sciences / v.13 no.4 / pp.89-100 / 2021
  • This study compared voice range profiles (VRPs) obtained with the glissando and simplified VRP methods in 57 men in the premutation (8-13 years), mutation (11-16 years), and postmutation (10-24 years) stages. The difference between the modal and falsetto areas measured with the two VRP methods was also compared. The results showed that the average fundamental frequency (F0) was ordered premutation > mutation > postmutation. The maximum F0 (F0max), the F0 range (F0range), the maximum intensity (Imax), and the intensity range (Irange) were lowest in the mutation stage, and these variables were higher in the falsetto area than in the modal area in both methods. In addition, most VRP variables were higher with the glissando method than with the simplified VRP, but the differences were not significant. This study showed that, in men in the mutation stage, temporary anatomical and physiological changes of the larynx alter the mechanism of vocal fold vibration, so the VRP shows a different pattern from that of the other age groups. Both the glissando VRP and the simplified VRP are suitable for clinical practice when administered by experienced examiners. It is also necessary to measure not only the falsetto area but also the modal area when measuring VRP.

Temperament characteristics of children with persistent and recovered stuttering: A longitudinal study (말더듬이 지속된 아동과 회복된 아동의 기질 특성 비교: 종단연구)

  • Chon, HeeCheong
    • Phonetics and Speech Sciences / v.13 no.4 / pp.101-114 / 2021
  • The purpose of this study was to investigate the temperament characteristics associated with stuttering subtypes (persistent and recovered) over time and the relationship between those characteristics and stuttering severity. This four-year longitudinal study covered 41 preschool children who stutter (CWS) and 30 preschool children who do not stutter (the CWNS group). At the final visit, 27 CWS were classified as the Recovered group and 14 CWS were classified as the Persistent group. Using the Children's Behavior Questionnaire-Short Form, each participant's temperament characteristics were measured twice: at one year and two years after the initial visit. The three subscale scores (Extraversion, Negative Affectivity, and Effortful Control) and the 15 component scores were analyzed, and they were used for between-group and between-visit comparisons. The Persistent group showed a significantly higher Negative Affectivity subscale score at every visit than the Recovered and CWNS groups. Within this subscale, significant group differences were found in the 'Fear' and 'Anger/Frustration' components, demonstrating that the Persistent group scored higher than the Recovered and CWNS groups. There was no significant correlation between the subscale and component scores and the stuttering severity scores within the Persistent group at any visit. These results support the proposition that these two stuttering subtypes have different temperament characteristics; they also imply that temperament might be influenced by stuttering experience over time.

Feasibility of hearing aid gain self-adjustment using speech recognition (말소리 인지를 이용한 보청기 이득 자가 조절의 실현)

  • Yun, Donghyeon;Shen, Yi;Zhang, Zhuohuang
    • The Journal of the Acoustical Society of Korea / v.41 no.1 / pp.76-86 / 2022
  • Personal hearing devices, such as hearing aids, may be fine-tuned by allowing users to conduct self-adjustment. Two self-adjustment procedures were developed to collect listener-preferred gains in six octave-frequency bands from 0.25 kHz to 8 kHz. These procedures were designed to allow rapid exploration of a multi-dimensional parameter space using a simple, one-dimensional user control interface (i.e., a programmable knob). The two procedures differ in whether the user interface controls the gains in all frequency bands simultaneously (Procedure A) or only the gain in one frequency band (Procedure B) on a given trial. Monte Carlo simulations suggested that, for both procedures, the gain preference identified by simulated listeners rapidly converged to the ground-truth preferred gain profile over the first 20 trials. Initial behavioral evaluations of the self-adjustment procedures, in terms of test-retest reliability, were conducted with 20 young, normal-hearing listeners. Each estimate of the preferred gain profile took less than 20 minutes. The deviation between two separate estimates of the preferred gain profile, conducted at least a week apart, was about 10 dB to 15 dB.

Cyber Threats Analysis of AI Voice Recognition-based Services with Automatic Speaker Verification (화자식별 기반의 AI 음성인식 서비스에 대한 사이버 위협 분석)

  • Hong, Chunho;Cho, Youngho
    • Journal of Internet Computing and Services / v.22 no.6 / pp.33-40 / 2021
  • Automatic Speech Recognition (ASR) is a technology that analyzes human speech as speech signals and automatically converts them into character strings that humans can read. Speech recognition technology has evolved from the basic level of recognizing a single word to the advanced level of recognizing sentences consisting of multiple words. In real-time voice conversation, a high recognition rate makes natural information delivery more convenient and expands the scope of voice-based applications. On the other hand, as speech recognition technology is applied more widely, concerns about related cyber attacks and threats are also increasing. Existing research has focused on developing the technology itself, such as designing Automatic Speaker Verification (ASV) techniques and improving their accuracy. However, few studies analyze attacks and threats in depth or across a variety of cases. In this study, we propose a cyber attack model that bypasses voice authentication by simply manipulating voice frequency and voice speed against AI voice recognition services equipped with automated identification technology, and we analyze cyber threats by conducting extensive experiments on the automated identification systems of commercial smartphones. Through this work, we aim to raise awareness of the seriousness of these cyber threats and to stimulate interest in research on effective countermeasures.

A study on the predictability of acoustic power distribution of English speech for English academic achievement in a Science Academy (과학영재학교 재학생 영어발화 주파수 대역별 음향 에너지 분포의 영어 성취도 예측성 연구)

  • Park, Soon;Ahn, Hyunkee
    • Phonetics and Speech Sciences / v.14 no.3 / pp.41-49 / 2022
  • The average acoustic distribution of American English speakers was statistically compared with the English-speaking patterns of gifted students in a Science Academy in Korea. By analyzing speech recordings that are much longer in duration than those used in previous studies, this research identified the degree of acoustic proximity between the two groups and the predictability of the English academic achievement of gifted high school students. Long-term spectral acoustic power distribution vectors were obtained for 2,048 center frequencies in the range of 20 Hz to 20,000 Hz by applying a long-term average speech spectrum (LTASS) MATLAB code. Three more variables were statistically compared to discover additional indices that can predict future English academic achievement: the receptive vocabulary size test, the cumulative vocabulary scores of English formative assessment, and the English Speaking Proficiency Test scores. Linear regression and correlational analyses among the four variables showed that the receptive vocabulary size test and the low-frequency vocabulary formative assessments, which require both lexical and domain-specific science background knowledge, are relatively more significant predictors of gifted students' academic achievement than basic suprasegmental-level English fluency.
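As a rough illustration of what a long-term average speech spectrum (LTASS) computes, the sketch below averages the power spectrum of Hann-windowed, half-overlapping frames. It is a standard-library Python sketch, not the MATLAB code used in the study. The function names `fft` and `ltass_db`, the frame length, and the 50% overlap are all assumptions chosen for illustration.

```python
import cmath
import math


def fft(x):
    """Recursive radix-2 FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even = fft(x[0::2])
    odd = fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out


def ltass_db(signal, sample_rate, n_fft=512):
    """Return (center frequencies in Hz, long-term average power in dB).

    Assumed settings (hypothetical, not from the study): Hann window,
    50% frame overlap, n_fft a power of two.
    """
    hop = n_fft // 2
    window = [0.5 - 0.5 * math.cos(2 * math.pi * i / (n_fft - 1))
              for i in range(n_fft)]
    n_frames = 1 + (len(signal) - n_fft) // hop
    if n_frames < 1:
        raise ValueError("signal is shorter than one analysis frame")
    power = [0.0] * (n_fft // 2 + 1)
    for f in range(n_frames):
        start = f * hop
        frame = [signal[start + i] * window[i] for i in range(n_fft)]
        spectrum = fft(frame)
        for k in range(len(power)):
            power[k] += abs(spectrum[k]) ** 2
    freqs = [k * sample_rate / n_fft for k in range(len(power))]
    # Small offset avoids log10(0) for empty bins.
    levels = [10 * math.log10(p / n_frames + 1e-12) for p in power]
    return freqs, levels


# Usage: a synthetic 1 kHz tone stands in for a speech recording.
rate = 8000
tone = [math.sin(2 * math.pi * 1000 * t / rate) for t in range(4096)]
freqs, levels = ltass_db(tone, rate)
peak_freq = freqs[levels.index(max(levels))]
print(peak_freq)
```

A longer frame gives finer frequency resolution; reproducing the 2,048 center frequencies mentioned in the abstract would need a correspondingly longer frame, and the exact analysis settings are not stated there.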

The relationship between fluency levels and suprasegmentals according to the sentence types in the English read speech by Korean middle school English learners (한국 중학생의 영어 읽기 발화에서 문장유형에 따른 유창성 등급과 초분절 요소의 관계)

  • Kim, Hwa-Young
    • Phonetics and Speech Sciences / v.14 no.3 / pp.51-66 / 2022
  • This study aims to help Korean learners of English improve their pronunciation by revealing which suprasegmentals bring their reading of English sentences closer to that of native English speakers. To this end, Korean middle school English learners were selected as subjects, and research data were gathered by sentence type (declarative, interrogative, imperative, and exclamative), as well as by syllables. Among the suprasegmentals, speech rate, pause frequency, pause duration, F0 range, and rhythm were used to analyze these English sentence utterances. Mean analysis, correlation analysis, and regression analysis were performed. The results showed that speech rate, pause frequency, pause duration, and F0 range affected the evaluation of fluency levels. In the regression analysis between all suprasegmentals and fluency levels, the suprasegmentals that most affected fluency levels were speech rate and F0 range. Rhythm had no meaningful relation with fluency levels. Therefore, when teaching English pronunciation, it is necessary to teach students to increase their speech rate and F0 range. In addition, students should be trained to reduce both the number and the duration of pauses during utterance to improve their fluency. It is noteworthy that, of the four sentence types, exclamative sentences were produced with a faster speech rate, fewer pauses, shorter pause duration, and higher rhythm values.

Development and validation of a Korean Affective Voice Database (한국형 감정 음성 데이터베이스 구축을 위한 타당도 연구)

  • Kim, Yeji;Song, Hyesun;Jeon, Yesol;Oh, Yoorim;Lee, Youngmee
    • Phonetics and Speech Sciences / v.14 no.3 / pp.77-86 / 2022
  • In this study, we report the validation results of the Korean Affective Voice Database (KAV DB), an affective voice database available for scientific and clinical use, comprising a total of 113 validated affective voice stimuli. The KAV DB includes audio recordings of two actors (one male and one female), each uttering 10 semantically neutral sentences with the intention of conveying six different affective states (happiness, anger, fear, sadness, surprise, and neutral). The database was organized into three separate voice stimulus sets in order to validate the KAV DB. Participants rated the stimuli on six rating scales corresponding to the six targeted affective states by using a 100 horizontal visual analog scale. The KAV DB showed high internal consistency for voice stimuli (Cronbach's α=.847). The database had high sensitivity (mean=82.8%) and specificity (mean=83.8%). The KAV DB is expected to be useful for both academic research and clinical purposes in the field of communication disorders. The KAV DB is available for download at https://kav-db.notion.site/KAV-DB-7539a36abe2e414ebf4a50d80436b41a.
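For readers unfamiliar with the reported statistics, the sketch below shows the textbook definitions of Cronbach's α, sensitivity, and specificity in standard-library Python. The abstract does not say how the authors arranged their rating data or what they counted as a correct identification, so the data layout (one row per respondent, one column per item) and the function names are assumptions, and the numbers in the usage lines are made up.

```python
from statistics import variance


def cronbach_alpha(ratings):
    """Cronbach's alpha for a table with one row per respondent
    and one column per item (assumed layout, not from the study)."""
    k = len(ratings[0])
    if k < 2 or len(ratings) < 2:
        raise ValueError("need at least two items and two respondents")
    # Sample variance of each item across respondents.
    item_vars = [variance([row[j] for row in ratings]) for j in range(k)]
    # Sample variance of each respondent's total score.
    total_var = variance([sum(row) for row in ratings])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


def sensitivity(true_pos, false_neg):
    """Share of target cases that were correctly identified."""
    return true_pos / (true_pos + false_neg)


def specificity(true_neg, false_pos):
    """Share of non-target cases that were correctly rejected."""
    return true_neg / (true_neg + false_pos)


# Usage with made-up numbers (not data from the KAV DB).
toy_ratings = [[1, 1, 1], [2, 2, 2], [3, 3, 3]]
alpha = cronbach_alpha(toy_ratings)
sens = sensitivity(true_pos=8, false_neg=2)
spec = specificity(true_neg=9, false_pos=1)
print(alpha, sens, spec)
```

Items that rise and fall together across respondents, as in the toy table, push α toward 1; unrelated items push it toward 0.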

A comparative analysis of metadata structures and attributes of Samsung smartphone voice recording files for forensic use (법과학적 활용을 위한 삼성 스마트폰 음성 녹음 파일의 메타데이터 구조 및 속성 비교 분석 연구)

  • Ahn, Seo-Yeong;Ryu, Se-Hui;Kim, Kyung-Wha;Hong, Ki-Hyung
    • Phonetics and Speech Sciences / v.14 no.3 / pp.103-112 / 2022
  • With the popularization of smartphones, most recorded speech files recently submitted as criminal evidence are produced on smartphones, and the integrity of such files (whether they have been forged) is emerging as a major issue in investigations and trials. Samsung smartphones, which have the highest domestic market share, are distributed with built-in speech recording applications that can record calls and voice and can edit recorded speech. Unlike editing with third-party speech (audio) applications, editing with the built-in applications yields files whose metadata structures and attributes closely resemble those of the original file, so more precise analysis techniques are needed to verify integrity. In this study, we constructed a metadata database of speech files recorded by 34 Samsung smartphones (original files) and of speech files edited with their built-in speech recording applications. We compared the metadata structures and attributes of the original files with those of the edited ones and found significant metadata differences between them.

A preliminary study of acoustic measures in male musical theater students by laryngeal height (뮤지컬 전공 남학생에서 후두 높이에 따른 음향학적 측정치에 대한 예비 연구)

  • Lee, Kwang Yong;Lee, Seung Jin
    • Phonetics and Speech Sciences / v.14 no.2 / pp.55-65 / 2022
  • This study aimed to compare acoustic measurements by the high, middle, and low laryngeal heights of male musical theater students. Furthermore, the correlation between the relative height of the larynx and the acoustic measurements was examined, along with the predictability of the relative height (vertical position) of the larynx from acoustic measurements. The participants included five male students majoring in musical theater singing, and acoustic analysis was performed by having them produce the /a/ vowel 10 times each at the laryngeal positions of high, middle, and low. The relative vertical positions of the laryngeal prominence in each position were measured based on the resting position. Results indicated that the relative position of the larynx varied significantly according to laryngeal height, such that as the larynx descended, the first three formant frequencies decreased while the spectral energy at the same frequencies increased. Formant frequencies showed a weak to moderate positive correlation with the relative height of the larynx, while the spectral energy showed a moderate negative correlation. The relative height of the larynx was predicted by eight acoustic measures (adjusted R² = .829). In conclusion, the predictability of the relative height of the larynx was partially confirmed in a non-invasive manner.
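The adjusted R² reported above penalizes the ordinary R² for the number of predictors in the regression. A minimal Python sketch of the standard formula follows; the function name `adjusted_r2` and the numbers in the usage line are illustrative assumptions and are not taken from the study.

```python
def adjusted_r2(r2, n_obs, n_predictors):
    """Standard adjusted R-squared:
    1 - (1 - R^2) * (n - 1) / (n - p - 1)."""
    dof = n_obs - n_predictors - 1
    if dof <= 0:
        raise ValueError("need more observations than predictors + 1")
    return 1 - (1 - r2) * (n_obs - 1) / dof


# Usage with made-up values: R^2 = 0.5, 11 observations, 2 predictors.
example = adjusted_r2(0.5, n_obs=11, n_predictors=2)
print(example)
```

With few observations relative to the number of predictors, the adjustment is large, which is why the adjusted value is usually reported for models with several predictors.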

Voice onset time in children with bilateral cochlear implants (양측 인공와우이식 아동의 성대진동시작시간 특성)

  • Jeon, Yesol;Lee, Youngmee
    • Phonetics and Speech Sciences / v.14 no.2 / pp.77-86 / 2022
  • This study aimed to investigate the voice onset time (VOT) of plosives in VCV syllables, by place of articulation and phonation type, as spoken by children with bilateral cochlear implants (CIs) in comparison with children with typical hearing (TH). In all, 15 children with bilateral CIs and 15 children with TH, aged between 5 and 10 years, participated in this study. All children produced 9 VCV syllables, and their VOTs were analyzed using the Praat software. There was no significant difference in mean VOT between children with bilateral CIs and children with TH. However, there was a significant difference in mean VOT by place of articulation, such that the VOTs for velars were longer than those for bilabials and alveolars. Additionally, there was a significant difference in mean VOT by phonation type, such that the VOTs of aspirated consonants were longer than those of lenis and fortis consonants. The results of this study suggest that children with bilateral CIs can distinguish the acoustic properties of plosive consonants and control the speech timing between the structures of the larynx and the oral cavity at a level similar to that of children with TH.