• Title/Summary/Keyword: 음성공학 (speech engineering)


Hi, KIA! Classifying Emotional States from Wake-up Words Using Machine Learning (Hi, KIA! 기계 학습을 이용한 기동어 기반 감성 분류)

  • Kim, Taesu;Kim, Yeongwoo;Kim, Keunhyeong;Kim, Chul Min;Jun, Hyung Seok;Suk, Hyeon-Jeong
    • Science of Emotion and Sensibility / v.24 no.1 / pp.91-104 / 2021
  • This study explored whether users' emotional states can be identified from the wake-up words "Hi, KIA!" using a machine learning algorithm, with the voice user interface of passenger cars in mind. We targeted four emotional states (excited, angry, desperate, and neutral) and created a total of 12 emotional scenarios in the context of car driving. Nine college students participated and recorded sentences as guided by the visualized scenarios. The wake-up words were extracted from the whole sentences, resulting in two data sets. We used the soundgen package and the svmRadial method of the caret package in open-source R to collect acoustic features of the recorded voices, and performed a machine learning-based analysis to determine the predictability of the modeled algorithm. We compared the accuracy of wake-up words (60.19%: 22%~81%) with that of whole sentences (41.51%) for all nine participants in relation to the four emotional categories. Individual differences in accuracy and sensitivity were noticeable, while the selected features were relatively constant. This study provides empirical evidence regarding the potential application of wake-up words in the practice of emotion-driven user experience in communication between users and artificial intelligence systems.

Change of Extracellular Matrix of Human Vocal Fold Fibroblasts by Vibratory Stimulation (진동이 성대세포주의 세포외기질 변화에 대한 연구)

  • Kim, Ji Min;Shin, Sung-Chan;Kwon, Hyun-Keun;Cheon, Yong-Il;Ro, Jung Hoon;Lee, Byung-Joo
    • Journal of the Korean Society of Laryngology, Phoniatrics and Logopedics / v.32 no.1 / pp.15-23 / 2021
  • Background and Objectives: During speech, the vocal folds oscillate at frequencies ranging from 100 to 200 Hz with amplitudes of a few millimeters. Mechanical stimulation is an essential factor affecting the metabolism of human vocal folds. The effect of mechanical vibration on the cellular response of human vocal fold fibroblasts (hVFFs) was evaluated. Materials and Method: We created a culture system device capable of generating vibratory stimulation at human phonation frequencies. To establish the optimal cell culture condition, cellular proliferation and viability assays were performed. Quantitative real-time polymerase chain reaction was used to assess the expression of extracellular matrix (ECM)-related genes and growth factors in response to changes in vibratory frequency and amplitude. Western blot was used to investigate the activation of ECM- and inflammation-related transcription factors and the related cellular signal transduction pathway. Results: Cell viability was stable under vibratory stimulation within 24 h. A statistically significant increase in ECM genes (collagen type I alpha 1 and collagen type I alpha 2) and growth factors [transforming growth factor β1 (TGF-β1) and fibroblast growth factor 1 (FGF-1)] was observed under the experimental conditions. Vibratory stimulation induced transcriptional activation of NF-κB in hVFFs through phosphorylation of the p65 subunit, mediated by mitogen-activated protein kinase (MAPK) activation via phosphorylation of extracellular signal-regulated kinase and p38 MAPK. Conclusion: This study confirmed that vibratory stimulation enhanced the synthesis of collagen, TGF-β1, and FGF in hVFFs. This mechanism is thought to be due to the activation of NF-κB and MAPKs. Taken together, these results demonstrate that a vibratory bioreactor may be a suitable alternative for studying the cellular response of hVFFs to vibratory vocalization.

Cross-sectional perception studies of children's monosyllabic word by naive listeners (일반 청자의 아동 발화 단음절에 대한 교차 지각 분석)

  • Ha, Seunghee;So, Jungmin;Yoon, Tae-Jin
    • Phonetics and Speech Sciences / v.14 no.1 / pp.21-28 / 2022
  • Previous studies have provided important findings on children's speech production development. They have revealed that essentially all aspects of children's speech shift toward adult-like characteristics over time. Nevertheless, few studies have examined the perceptual aspects of children's speech tokens, as perceived by naive adult listeners. To fill the gap between children's production and adults' perception, we conducted cross-sectional perceptual studies of monosyllabic words produced by children aged two to six years. Monosyllabic words in the consonant-vowel-consonant form were extracted from children's speech samples and presented aurally to five listener groups (20 listeners in total). In general, the agreement rate between children's production of target words and adult listeners' responses increased with age. The perceptual responses to tokens produced by two-year-old children showed the largest discrepancies, and the responses to words produced by six-year-olds agreed the most. Further analyses were conducted to identify the sources of disagreement, including the types of segments and syllable structure. This study makes an important contribution to our understanding of the development and perception of children's speech across age groups.

An Analysis Study on the Current Status and Integration Methods of the Domestic Early Warning System (국내 재난 예경보 시스템 현황 및 통합 방안에 대한 분석 연구)

  • Hwang, Woosuk;Pyo, Kyungsoo
    • Journal of Broadcast Engineering / v.27 no.1 / pp.80-90 / 2022
  • Currently, domestic early warnings are issued differently for each disaster, and the systems are operated independently by the relevant organizations, from the central government to local governments. Representative domestic disaster warning systems include disaster broadcasting using CBS (Cell Broadcasting Service) and the DMB (Digital Multimedia Broadcasting) Automatic Emergency Alert Service, DITS (Disaster Information Transform System) transmitted and displayed on TV screens, the automatic response system, the automated rainfall warning system, and the disaster message board. However, because each emergency alert is issued by a different method at the site of a disaster, the alerts are issued at different times for each medium, and the delivered content is not integrated. If these systems are integrated, damage to people's property and lives is expected to be minimized through the sharing and integrated management of disaster information such as voice, video, and data, which allows disaster situations to be judged comprehensively and decisions to be made accordingly. Therefore, in this study, we present a plan for integrating the disaster warning systems along with an analysis of the operation status of the domestic early warning system.

Investigation of acoustic performances of the creative convergence classrooms in elementary schools (초등학교 창의융합교실의 음향성능 조사)

  • A-Hyeon Jo;Chan-Hoon Haan
    • The Journal of the Acoustical Society of Korea / v.42 no.4 / pp.285-297 / 2023
  • The present study aims to investigate the acoustic performance of the creative convergence classrooms in Korea, which were introduced through the school space innovation project and are used by elementary school students under the age of 9. To this end, the acoustic performance of three creative convergence classrooms was measured. The measured acoustic parameters were background noise level, Reverberation Time (RT), D50, Speech Transmission Index (STI), and Inter-Aural Cross Correlation (IACC). Transmission Loss (TL) and standardized level difference (DnT) were also measured to analyze the sound insulation performance of the walls. In addition, the noise level was measured according to the opening conditions of doors and windows in the classroom. As a result, the background noise level averaged 28.0 dB(A) to 32.8 dB(A) when the air conditioner was not operating, and the RT did not exceed 0.6 s. IACC differed according to the desk layout, and IACC values were high on the center line and at the seats near the sound source. In particular, higher IACC was measured at the seats on the center line facing the source squarely. Regarding the noise level in the classroom according to the opening conditions of doors and windows, the standards were exceeded when all windows were opened, or when the windows and doors fronting onto the corridor were opened.
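Of the parameters listed in the abstract above, D50 ("definition") is commonly computed from a measured room impulse response as the fraction of sound energy arriving within the first 50 ms. The following is a minimal sketch of that standard definition only; the function name `d50`, the toy impulse response, and the sample rate are illustrative assumptions and do not come from the study.

```python
def d50(impulse_response, sample_rate):
    """Ratio of energy in the first 50 ms to total energy (0.0 to 1.0).

    Assumes the list starts at the arrival of the direct sound.
    """
    split = int(round(0.050 * sample_rate))  # number of samples in 50 ms
    energy = [s * s for s in impulse_response]
    total = sum(energy)
    if total == 0:
        raise ValueError("impulse response contains no energy")
    return sum(energy[:split]) / total


# Hypothetical toy response at 1000 Hz: 50 ms of amplitude 1.0,
# followed by 50 ms of amplitude 0.5 (not measured data).
toy_response = [1.0] * 50 + [0.5] * 50
ratio = d50(toy_response, 1000)
print(f"D50 = {ratio:.2f}")
```

A higher ratio means more of the energy arrives early, which is generally associated with clearer speech.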

A Performance Improvement Method using Variable Break in Corpus Based Japanese Text-to-Speech System (가변 Break를 이용한 코퍼스 기반 일본어 음성 합성기의 성능 향상 방법)

  • Na, Deok-Su;Min, So-Yeon;Lee, Jong-Seok;Bae, Myung-Jin
    • The Journal of the Acoustical Society of Korea / v.28 no.2 / pp.155-163 / 2009
  • In text-to-speech systems, the conversion of text into prosodic parameters is necessarily composed of three steps: the placement of prosodic boundaries, the determination of segmental durations, and the specification of fundamental frequency contours. Prosodic boundaries, as the most important and basic parameter, affect the estimation of durations and fundamental frequency. Break prediction is an important step in text-to-speech systems, as break indices (BIs) have a great influence on how correctly prosodic phrase boundaries are represented. However, accurate prediction is difficult, since BIs are often chosen according to the meaning of a sentence or the reading style of the speaker. In Japanese, the prediction of an accentual phrase boundary (APB) and a major phrase boundary (MPB) is particularly difficult. Thus, this paper presents a method to compensate for the prediction errors of APBs and MPBs. First, we define a subtle BI, for which it is difficult to decide clearly between an APB and an MPB, as a variable break (VB), and an explicit BI as a fixed break (FB). The VB is chosen using a classification and regression tree, and multiple prosodic targets in relation to pitch and duration are then generated. Finally, unit selection is conducted using the multiple prosodic targets. In the MOS test, the original speech scored 4.99, while the proposed method scored 4.25 and the conventional method scored 4.01. The experimental results show that the proposed method improves the naturalness of synthesized speech.

Diagnosis of Scoliosis Using Chest Radiographs with a Semi-Supervised Generative Adversarial Network (준지도학습 방법을 이용한 흉부 X선 사진에서 척추측만증의 진단)

  • Woojin Lee;Keewon Shin;Junsoo Lee;Seung-Jin Yoo;Min A Yoon;Yo Won Choi;Gil-Sun Hong;Namkug Kim;Sanghyun Paik
    • Journal of the Korean Society of Radiology / v.83 no.6 / pp.1298-1311 / 2022
  • Purpose: To develop and validate a deep learning-based screening tool for the early diagnosis of scoliosis using chest radiographs with a semi-supervised generative adversarial network (GAN). Materials and Methods: Using a semi-supervised learning framework with a GAN, a screening tool for diagnosing scoliosis was developed and validated on the chest PA radiographs of patients at two different tertiary hospitals. Our proposed method trained the GAN in a semi-supervised manner on mild to severe scoliosis only, as an upstream task to learn scoliosis representations, followed by a downstream task that performs simple classification to sensitively differentiate between normal and scoliosis states. Results: The area under the receiver operating characteristic curve, negative predictive value (NPV), positive predictive value, sensitivity, and specificity were 0.856, 0.950, 0.579, 0.985, and 0.285, respectively. Conclusion: Our deep learning-based artificial intelligence software, trained in a semi-supervised manner, achieved excellent performance in diagnosing scoliosis using the chest PA radiographs of young individuals; thus, it could be used as a screening tool with high NPV and sensitivity and could reduce the burden on radiologists of diagnosing scoliosis from health screening chest radiographs.
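The screening metrics reported in the abstract above all derive from the four cells of a binary confusion matrix. The sketch below shows the standard definitions; the function name `screening_metrics` and the counts are hypothetical and are not the study's data.

```python
def screening_metrics(tp, fp, tn, fn):
    """Compute standard screening metrics from confusion-matrix counts."""
    return {
        "sensitivity": tp / (tp + fn),  # scoliosis cases correctly flagged
        "specificity": tn / (tn + fp),  # normal cases correctly cleared
        "ppv": tp / (tp + fp),          # flagged cases that are truly scoliosis
        "npv": tn / (tn + fn),          # cleared cases that are truly normal
    }


# Hypothetical counts chosen for easy arithmetic (not from the paper).
metrics = screening_metrics(tp=90, fp=60, tn=40, fn=10)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

A tool tuned for screening typically accepts low specificity in exchange for high sensitivity and NPV, which matches the pattern of values the abstract reports.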

A Survey on Awareness and Availability on Items of 2018 Assistive Devices Distribution Program for the Disabled in the Occupational Therapists (2018년도 장애인 보조기기 교부사업 품목에 대한 작업치료사의 인식도와 활용도 조사)

  • Kim, Jeong-Eun;Park, Je-Min;Bae, Su-Yeong;Jung, Nam-hae
    • Korean Journal of Occupational Therapy / v.26 no.4 / pp.85-95 / 2018
  • Objective: The purpose of this study was to investigate occupational therapists' awareness and use of the items in the 2018 assistive devices distribution program for people with disabilities. Methods: A total of 132 occupational therapists participated in the survey from May 1 to May 31. Results: 96.2% of the occupational therapists responded that assistive devices are helpful in the lives of people with disabilities. In particular, they responded that assistive devices are most helpful for 'movement and mobility'. Awareness was highest for an angled spoon/fork with a built-up handle and a universal cuff, and lowest for a visual signaling indicator. Availability was highest for an air cushion, and lowest for a visual signaling indicator and a voice guidance system. 67.4% responded that they 'sometimes' use assistive devices, and 77.3% responded that they will use assistive devices. To improve awareness and availability, 43.2% cited a need for financial support, 32.6% a need for an additional insurance billing item, and 22.7% a need for related education. Conclusion: These results can serve as basic data for educating occupational therapists about assistive devices.

Antibacterial and Proteolytic Activities of Bacterial Isolates from Ethnic Fermented Seafoods in the East Coast of Korea (동해안 특산 수산발효식품에서 분리된 균주의 항균 및 단백질 가수분해 활성)

  • Park, Woo Jung;Lee, Seung Hwan;Lee, Hyungjae
    • Food Engineering Progress / v.21 no.1 / pp.88-92 / 2017
  • We investigated the antibacterial and proteolytic activities of bacteria isolated from three ethnic fermented seafoods from the east coast of South Korea: gajami sikhae, squid jeotgal, and fermented jinuari (Grateloupia filicina). Bacillus cereus ATCC 14579, Listeria monocytogenes ATCC 15313, Staphylococcus aureus KCTC 1916, Escherichia coli O157:H7 ATCC 43895, and Salmonella enterica serovar Typhimurium ATCC 4931 were selected as indicators to determine the antibacterial activity of the bacterial isolates. Among 233 isolates from the three foods, 36 isolates (15.5%) showed antibacterial activity against B. cereus ATCC 14579, the highest incidence of inhibition, followed by S. aureus KCTC 1916 (7.7%) and L. monocytogenes ATCC 15313 (6.0%). However, only five and three strains among the isolates exhibited inhibitory activity against the Gram-negative indicators E. coli ATCC 43895 and Sal. enterica ATCC 4931, respectively. The proteolytic activity of the isolates was determined via hydrolysis of skim milk after 24, 48, and 72 h of incubation. After 72 h of incubation, 72 out of 233 isolates (30.9%) showed proteolytic activity, and the isolates from fermented jinuari exhibited the highest incidence of proteolytic activity (60%, 36 isolates). These results suggest that ethnic fermented seafoods from the east coast of South Korea might be a promising source of bacterial strains producing antibacterial and proteolytic compounds.

A Semi-Automatic Semantic Mark Tagging System for Building Dialogue Corpus (대화 말뭉치 구축을 위한 반자동 의미표지 태깅 시스템)

  • Park, Junhyeok;Lee, Songwook;Lim, Yoonseob;Choi, Jongsuk
    • KIPS Transactions on Software and Data Engineering / v.8 no.5 / pp.213-222 / 2019
  • Determining the meaning of a keyword in a speech dialogue system is an important technology for the future implementation of an intelligent speech dialogue interface. After keywords are extracted from a user's utterance to grasp its intention, the intention of the utterance is determined using the semantic marks of the keywords. One keyword can have several semantic marks, and we regard the task of attaching the semantic mark that matches the user's intention to each keyword as a word sense disambiguation problem. In this study, about 23% of all keywords in the corpus are manually tagged to build a semantic mark dictionary, a synonym dictionary, and a context vector dictionary, and the remaining 77% of the keywords are then tagged automatically. The semantic mark of a keyword is determined by calculating the context vector similarity against the context vector dictionary. For an unregistered keyword, the semantic mark of the most similar keyword is attached using the synonym dictionary. We compare the performance of the system with a manually constructed training set and with a semi-automatically expanded training set by selecting 3 high-frequency keywords and 3 low-frequency keywords in the corpus. In experiments, we obtained an accuracy of 54.4% with the manually constructed training set and 50.0% with the semi-automatically expanded training set.
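The tagging step described in the abstract above, choosing a semantic mark by context vector similarity, can be sketched as follows. The abstract does not state which similarity measure is used, so cosine similarity is an assumption here, and the mark names, vectors, and function names are hypothetical.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def pick_semantic_mark(context_vector, mark_vectors):
    """Return the semantic mark whose stored context vector is most similar."""
    return max(
        mark_vectors,
        key=lambda mark: cosine_similarity(context_vector, mark_vectors[mark]),
    )


# Hypothetical context vector dictionary for one ambiguous keyword.
context_vector_dictionary = {
    "FOOD": [1.0, 0.0, 1.0],
    "PLACE": [0.0, 1.0, 0.0],
}
best_mark = pick_semantic_mark([0.9, 0.1, 0.8], context_vector_dictionary)
print(best_mark)
```

In this toy case the utterance's context vector points almost entirely along the "FOOD" entry, so that mark is selected.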