• Title/Summary/Keyword: Speech sound


Decision Tree for Likely phoneme model schema support (유사 음소 모델 스키마 지원을 위한 결정 트리)

  • Oh, Sang-Yeob
    • Journal of Digital Convergence
    • /
    • v.11 no.10
    • /
    • pp.367-372
    • /
    • 2013
  • In speech recognition systems, problems with phonemes during model training force regeneration of the stored models, which incurs additional time and cost. In this paper, we propose a likely phoneme model schema method using decision tree clustering. The proposed system applies decision tree clustering to the generated models to build a robust and accurate acoustic model; this reduces the regeneration process and enables retrieval of phoneme units from the probability model. The proposed system also provides additional likely phoneme models, yielding a robust and accurate acoustic model. Performance results show a vocabulary-dependent recognition rate of 98.3% and a vocabulary-independent recognition rate of 98.4%.
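The clustering idea in the abstract can be sketched as follows. This is a minimal illustration, not the paper's implementation: phonetic yes/no "questions" recursively split a pool of context-dependent phones, the leaves become shared (tied) models, and an unseen context can be routed to an existing leaf instead of forcing model regeneration. All question names and contexts here are illustrative.

```python
# Minimal sketch of decision-tree clustering for tying phoneme models.
# Each question splits the pool; each leaf is one shared (tied) model.

QUESTIONS = [
    ("is_left_vowel",  lambda ctx: ctx["left"] in {"a", "e", "i", "o", "u"}),
    ("is_right_nasal", lambda ctx: ctx["right"] in {"m", "n", "ng"}),
]

def cluster(phones, questions):
    """Recursively split phone contexts by phonetic questions; return leaves."""
    if not questions or len(phones) <= 1:
        return [phones]                       # a leaf: one tied model
    name, ask = questions[0]
    yes = [p for p in phones if ask(p)]
    no = [p for p in phones if not ask(p)]
    if not yes or not no:                     # question does not split the pool
        return cluster(phones, questions[1:])
    return cluster(yes, questions[1:]) + cluster(no, questions[1:])

def find_leaf(ctx, questions):
    """Route an unseen context down the tree; the answer tuple names its leaf."""
    return tuple(ask(ctx) for _, ask in questions)

# Illustrative triphone contexts for one centre phone.
phones = [
    {"left": "a", "right": "n"},
    {"left": "a", "right": "t"},
    {"left": "s", "right": "m"},
    {"left": "s", "right": "t"},
]
leaves = cluster(phones, QUESTIONS)
print(len(leaves))  # each leaf is one tied model
```

An unseen context such as `{"left": "e", "right": "n"}` answers the same questions and lands in an existing leaf, which is what avoids the regeneration step.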

A Study on the Gender and Age Classification of Speech Data Using CNN (CNN을 이용한 음성 데이터 성별 및 연령 분류 기술 연구)

  • Park, Dae-Seo;Bang, Joon-Il;Kim, Hwa-Jong;Ko, Young-Jun
    • The Journal of Korean Institute of Information Technology
    • /
    • v.16 no.11
    • /
    • pp.11-21
    • /
    • 2018
  • This research categorizes voices using deep learning. The study examines neural-network-based sound classification work and proposes an improved neural network for voice classification. Related studies addressed urban sound classification but performed poorly with shallow neural networks. Therefore, in this paper the voice data are first preprocessed to extract feature values; the features are then fed into both the previous sound classification network and the proposed network, and the classification performance of the two networks is compared and evaluated. The proposed network is organized deeper and wider so that learning proceeds more effectively. Performance results showed 84.8% accuracy for the related studies' network and 91.4% for the proposed network, about 6 percentage points higher.
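The pipeline described above (preprocess, extract features, feed a convolutional classifier) can be sketched in miniature. This is an illustrative pure-Python forward pass, not the paper's network; a real implementation would use a deep-learning framework, and the filters, weights, and feature values below are made up.

```python
# Illustrative sketch of a 1-D conv -> ReLU -> global-max-pool -> linear
# classifier over an audio feature sequence (e.g. one MFCC band).

def conv1d(x, kernel):
    """Valid-mode 1-D convolution (no padding)."""
    k = len(kernel)
    return [sum(x[i + j] * kernel[j] for j in range(k))
            for i in range(len(x) - k + 1)]

def relu(x):
    return [max(0.0, v) for v in x]

def classify(features, kernels, weights):
    # One pooled activation per filter, then a linear score per class.
    pooled = [max(relu(conv1d(features, k))) for k in kernels]
    scores = [sum(p * w for p, w in zip(pooled, row)) for row in weights]
    return scores.index(max(scores))

features = [0.1, 0.9, 0.2, 0.8, 0.1, 0.7]   # toy feature sequence
kernels = [[1.0, -1.0], [0.5, 0.5]]          # two "learned" filters
weights = [[1.0, 0.0], [0.0, 1.0]]           # two classes (e.g. male/female)
label = classify(features, kernels, weights)
```

Making the network "deeper and wider", as the paper does, amounts to stacking more such conv layers with more filters per layer.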

Acoustic design principles and the acoustical performance analysis of Incheon International Airport (인천국제공항의 음향설계원리 및 성능분석)

  • Haan, Chan-Hoon
    • The Journal of the Acoustical Society of Korea
    • /
    • v.38 no.3
    • /
    • pp.275-282
    • /
    • 2019
  • In an airport terminal, aural information, including announcements, background music, and emergency control, is transmitted 24 hours a day, so clear sound must reach the passengers. IIA (Incheon International Airport), built since 2001, is one of the largest airports, accommodating 45 million people a year. There are currently three passenger terminals, including Terminal 1 & 2 and the boarding concourse. The 2nd passenger terminal is under construction to expand the spaces, with completion scheduled for 2020. The present work explains the design principles of both architectural acoustics and electro-acoustics applied to all the terminal buildings in IIA, including the ticketing counter, great hall, departure concourse, and transportation center, and examines the acoustical performance of those spaces. As a result, acoustic standards for airports were suggested, architectural concepts for designing ceiling spaces and sound absorption treatments were proposed, and electro-acoustic design principles were discussed.

CNN based dual-channel sound enhancement in the MAV environment (MAV 환경에서의 CNN 기반 듀얼 채널 음향 향상 기법)

  • Kim, Young-Jin;Kim, Eun-Gyung
    • Journal of the Korea Institute of Information and Communication Engineering
    • /
    • v.23 no.12
    • /
    • pp.1506-1513
    • /
    • 2019
  • Recently, as the industrial scope of multi-rotor unmanned aerial vehicles (UAVs) has greatly expanded, the demand for data collection, processing, and analysis using UAVs is also increasing. However, acoustic data collected by a UAV is heavily corrupted by the UAV's motor noise and wind noise, which makes it difficult to process and analyze. Therefore, we studied a method to enhance the target sound in the acoustic signal received through microphones attached to the UAV. In this paper, we extend the densely connected dilated convolutional network, an existing single-channel acoustic enhancement technique, to consider the inter-channel characteristics of the acoustic signal. As a result, the extended model outperformed the existing model on all evaluation measures, including SDR, PESQ, and STOI.
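The two ingredients the abstract combines, dilated convolution and inter-channel mixing, can be sketched as below. This is a hedged illustration of the operations only, not the paper's densely connected network; kernel values and the channel-mixing rule are made up for demonstration.

```python
# Dilated 1-D convolution: inserting gaps of size `dilation` between
# kernel taps enlarges the receptive field without adding parameters.

def dilated_conv1d(x, kernel, dilation):
    k = len(kernel)
    span = (k - 1) * dilation + 1           # input samples covered
    return [sum(x[i + j * dilation] * kernel[j] for j in range(k))
            for i in range(len(x) - span + 1)]

def dual_channel_conv(ch1, ch2, k1, k2, dilation):
    """Inter-channel extension: each output sums contributions from both mics."""
    a = dilated_conv1d(ch1, k1, dilation)
    b = dilated_conv1d(ch2, k2, dilation)
    return [u + v for u, v in zip(a, b)]

mic1 = [1.0, 2.0, 3.0, 4.0, 5.0]
mic2 = [1.0, 1.0, 1.0, 1.0, 1.0]
out = dual_channel_conv(mic1, mic2, [1.0, 1.0], [1.0, 1.0], dilation=2)
```

Stacking such layers with increasing dilation (1, 2, 4, ...) is the standard way a dilated network covers long noise contexts such as sustained motor noise.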

A Mobile Newspaper Application Interface to Enhance Information Accessibility of the Visually Impaired (시각장애인의 정보 접근성 향상을 위한 모바일 신문 어플리케이션 인터페이스)

  • Lee, Seung Hwan;Hong, Seong Ho;Ko, Seung Hee;Choi, Hee Yeon;Hwang, Sung Soo
    • Journal of the HCI Society of Korea
    • /
    • v.11 no.3
    • /
    • pp.5-12
    • /
    • 2016
  • The number of visually impaired people using smartphones is increasing with the help of Text-to-Speech (TTS). TTS converts text data in a mobile application into sound and allows only sequential search; for this reason, the locations of buttons and contents inside an application should be determined carefully. However, little attention has been paid to the TTS service environment during the development of mobile newspaper applications, which makes these applications difficult for visually impaired people to use. Furthermore, a mobile application interface that also reflects the needs of people with low vision is necessary. Therefore, this paper presents a mobile newspaper interface that considers the accessibility and needs of various visually impaired users. To this end, the proposed interface locates buttons with consideration of the TTS service environment and provides search functionality. It also enables visually impaired people to use the application smoothly by filtering out words that are pronounced improperly and providing a proper explanation for every button. Finally, several functions, such as font enlargement and color reversal, are implemented for low-vision users. Evaluation results show that the proposed interface achieves better performance than other applications in terms of search speed and usability.

Comparisons of voice quality parameter values measured with MDVP, Praat, and TF32 (MDVP, Praat, TF32에 따른 음향학적 측정치에 대한 비교)

  • Ko, Hye-Ju;Woo, Mee-Ryung;Choi, Yaelin
    • Phonetics and Speech Sciences
    • /
    • v.12 no.3
    • /
    • pp.73-83
    • /
    • 2020
  • Measured values may differ between the Multi-Dimensional Voice Program (MDVP), Praat, and Time-Frequency Analysis software (TF32), all of which are widely used in voice quality analysis, due to differences in the algorithms each analyzer uses. Therefore, this study compared the parameter values of normal voices measured with each analyzer. Tokens of the vowel sound /a/ were collected from 35 normal adult subjects (19 male and 16 female) and analyzed with MDVP, Praat, and TF32. The mean values obtained from Praat for the jitter variables (J local, J abs, J rap, and J ppq), shimmer variables (S local, S dB, and S apq), and noise-to-harmonics ratio (NHR) were significantly lower than those from MDVP in both males and females (p<.01). The mean values of J local, J abs, and S local decreased significantly in the order MDVP, Praat, TF32 in both genders. In conclusion, the measured values differed across voice analyzers because of the differences in their algorithms. Therefore, it is important for clinicians to understand the normal criteria used by each analyzer before analyzing pathologic voices with it in clinical practice.
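The parameters compared above have simple definitions once the period and amplitude sequences are extracted; analyzers diverge mainly in how they extract those sequences. A minimal sketch of the standard "local" definitions (mean absolute difference of consecutive values over the mean value), with illustrative numbers, not data from the study:

```python
# Jitter (local): cycle-to-cycle period perturbation, as a fraction.
# Shimmer (local): cycle-to-cycle amplitude perturbation, as a fraction.

def jitter_local(periods):
    diffs = [abs(a - b) for a, b in zip(periods, periods[1:])]
    return (sum(diffs) / len(diffs)) / (sum(periods) / len(periods))

def shimmer_local(amps):
    diffs = [abs(a - b) for a, b in zip(amps, amps[1:])]
    return (sum(diffs) / len(diffs)) / (sum(amps) / len(amps))

periods = [0.0100, 0.0102, 0.0099, 0.0101]   # seconds, ~100 Hz voice
jitter_pct = round(jitter_local(periods) * 100, 2)
print(jitter_pct)  # jitter expressed as a percentage
```

Because each analyzer runs its own pitch extraction before this step, the same recording can yield different period sequences, and hence the systematically different values the study reports.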

Machine-learning-based out-of-hospital cardiac arrest (OHCA) detection in emergency calls using speech recognition (119 응급신고에서 수보요원과 신고자의 통화분석을 활용한 머신 러닝 기반의 심정지 탐지 모델)

  • Jong In Kim;Joo Young Lee;Jio Chung;Dae Jin Shin;Dong Hyun Choi;Ki Hong Kim;Ki Jeong Hong;Sunhee Kim;Minhwa Chung
    • Phonetics and Speech Sciences
    • /
    • v.15 no.4
    • /
    • pp.109-118
    • /
    • 2023
  • Cardiac arrest is a critical medical emergency in which immediate response is essential for patient survival. This is especially true for out-of-hospital cardiac arrest (OHCA), for which the actions of emergency medical services in the early stages significantly impact outcomes. In Korea, however, a shortage of dispatchers handling a large volume of emergency calls poses a challenge. In such situations, a machine-learning-based OHCA detection program can assist responders and improve patient survival rates. In this study, we address this challenge by developing such a program, which analyzes transcripts of conversations between responders and callers to identify instances of cardiac arrest. The proposed system includes an automatic transcription module for these conversations, a text-based cardiac arrest detection model, and the server and client components needed for deployment. The experimental results demonstrate the model's effectiveness, achieving an F1 score of 79.49% and reducing the time needed for cardiac arrest detection by 15 seconds compared to dispatchers. Despite working with a limited dataset, this research highlights the potential of a cardiac arrest detection program as a valuable tool for responders, ultimately enhancing cardiac arrest survival rates.
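The 79.49% figure is an F1 score. As a reminder of what that metric measures on a detector's per-call decisions, here is the standard computation with illustrative labels (1 = cardiac arrest, 0 = not); this is not the study's data.

```python
# F1 = harmonic mean of precision and recall over binary decisions.

def f1_score(y_true, y_pred):
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return (2 * precision * recall / (precision + recall)
            if precision + recall else 0.0)

y_true = [1, 1, 0, 0, 1, 0, 1, 0]   # ground-truth call labels (toy)
y_pred = [1, 0, 0, 1, 1, 0, 1, 0]   # detector decisions (toy)
print(f1_score(y_true, y_pred))
```

F1 is the usual choice here because cardiac-arrest calls are rare, so plain accuracy would reward a detector that never fires.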

A Study on the Characteristics of the Korean Adult Male Sound According to Sasang Constitution Using PSSC with a Sentence (사상체질음성분석기(四象體質音聲分析機)(PSSC)를 통한 한국인 성인남성(成人男性)의 체질별(體質別) 음향특성연구(音響特性硏究) - 단문(短文)을 중심으로 -)

  • Choi, Jae-Wan;Song, Hak-Soo;Han, Dong-Youn;Cho, Sung-Eon;Wang, Hyang-Lan;Jeon, Jong-Weon;Kim, Dal-Rae;Yoo, Jun-Sang
    • Journal of Sasang Constitutional Medicine
    • /
    • v.18 no.3
    • /
    • pp.64-74
    • /
    • 2006
  • 1. Objectives and Methods: This study examines the acoustic characteristics of the Korean adult male voice according to Sasang constitution, using the PSSC with a sentence. Sasang Constitutional Medicine (SCM) is one of the traditional Korean medicines. It classifies people into four categories: Taeyangin, Soyangin, Taeumin, and Soeumin. Classification is based on appearance and body shape, facial appearance and speech, character and talents, and diseases and medications. This study investigated the relationships between voice and sound parameters using the PSSC (Phonetic System of Sasang Constitution) on a sentence. Participants were 195 Korean adult males: 1 Taeyangin, 37 Soyangin, 105 Taeumin, and 52 Soeumin. A Sasang constitutional specialist used the PSSC and Korean medical diagnosis to classify participants into the four constitutions. 2. Results: In the pitch segment, Soyangin's center freq. (4) was significantly higher than in the Taeyangin and Taeumin groups, and Soyangin's and Soeumin's center freq. (6) were significantly higher than in the Taeyangin and Taeumin groups. In the APQ and octave segments, there were no significant differences among the four groups. In the shimmer segment, Taeumin's F shimmer (1) and F shimmer (2) were significantly higher than in the Taeyangin and Soyangin groups. In the energy segment, Taeyangin's 2k-4k total sum, 2k-4k dev., C dev., C# dev., and D S.D. were significantly higher than in the other groups. In the recording time segment, there was no significant difference among the four groups. More Taeyangin cases and additional parameters are needed to determine constitution using the PSSC and to make it effective. 3. Conclusions: From the above results, voice analysis shows potential as an efficient standard guide for constitution diagnosis.


An Adaptive Background Sound Mixing Algorithm Based on Energy and LP Analysis of Speech Signal (음성신호 에너지 및 LP 분석 기반 적응적 배경음혼합 알고리즘)

  • Kang, Jin Ah;Chun, Chan Jun;Kim, Hong Kook;Kim, Myeong Bo;Kim, Ji Woon
    • Proceedings of the Korean Society of Broadcast Engineers Conference
    • /
    • 2010.11a
    • /
    • pp.260-261
    • /
    • 2010
  • In this paper, we propose a background sound mixing algorithm that adaptively adjusts the background sound energy by analyzing the recorded (foreground) signal, so that background sound can be mixed into produced content easily and effectively. To this end, the proposed algorithm determines the perceived loudness of the foreground signal and the presence of speech based on the equal-loudness curve and linear prediction (LP) analysis. When speech is present in the foreground, the energy of the mixed background sound is lowered so that the speech is heard clearly; conversely, when no speech is present, the background sound energy is raised so that the background sound is heard clearly. To verify the effectiveness of the proposed algorithm, a sound quality preference test was conducted against mixing with a fixed weight, and the proposed algorithm showed a strong preference.
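The duck/boost rule described above can be sketched per frame. This is a hedged illustration only: the paper's decision uses equal-loudness weighting and LP analysis, whereas the speech-presence test, threshold, and gains below are simple stand-ins.

```python
# Adaptive mixing sketch: per frame, estimate foreground energy as a
# crude speech-presence flag, then duck or boost the background gain.

SPEECH_GAIN, NO_SPEECH_GAIN = 0.3, 1.0   # illustrative duck/boost gains

def frame_energy(frame):
    return sum(s * s for s in frame) / len(frame)

def mix(foreground, background, frame_len=4, threshold=0.1):
    out = []
    for i in range(0, len(foreground), frame_len):
        fg = foreground[i:i + frame_len]
        bg = background[i:i + frame_len]
        # Speech present -> duck the background; silence -> let it through.
        gain = SPEECH_GAIN if frame_energy(fg) > threshold else NO_SPEECH_GAIN
        out.extend(f + gain * b for f, b in zip(fg, bg))
    return out

mixed = mix([1.0] * 4 + [0.0] * 4, [0.1] * 8)
```

A real implementation would also smooth the gain across frame boundaries to avoid audible pumping, a detail this sketch omits.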


A Novel Covariance Matrix Estimation Method for MVDR Beamforming In Audio-Visual Communication Systems (오디오-비디오 통신 시스템에서 MVDR 빔 형성 기법을 위한 새로운 공분산 행렬 예측 방법)

  • You, Gyeong-Kuk;Yang, Jae-Mo;Lee, Jinkyu;Kang, Hong-Goo
    • The Journal of the Acoustical Society of Korea
    • /
    • v.33 no.5
    • /
    • pp.326-334
    • /
    • 2014
  • This paper proposes a novel covariance matrix estimation scheme for minimum variance distortionless response (MVDR) beamforming. By accurately tracking the direction of arrival (DoA) of the sound source using audio-visual sensors, the covariance matrix is efficiently estimated with a variable forgetting factor, which is determined by considering the signal-to-interference ratio (SIR). Experimental results verify that the proposed method outperforms the conventional one in terms of interference/noise reduction and speech distortion.
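The core recursion behind such schemes is the exponentially weighted covariance update R[n] = λ·R[n-1] + (1-λ)·x·xᴴ, where a high SIR argues for a large λ (trust the history) and a low SIR for a small λ (adapt quickly). The sketch below illustrates that recursion for two microphones; the SIR-to-λ mapping and all numbers are illustrative, not the paper's.

```python
# Recursive covariance update with a variable forgetting factor,
# using plain Python complex numbers for a 2-microphone array.

def forgetting_factor(sir_db, lam_min=0.90, lam_max=0.999):
    # Illustrative mapping: clamp SIR to [0, 30] dB, map onto [lam_min, lam_max].
    s = min(max(sir_db, 0.0), 30.0) / 30.0
    return lam_min + s * (lam_max - lam_min)

def update_covariance(R, x, lam):
    """One rank-1 update: R <- lam*R + (1-lam) * x x^H (lists of complex)."""
    n = len(x)
    return [[lam * R[i][j] + (1 - lam) * x[i] * x[j].conjugate()
             for j in range(n)] for i in range(n)]

R = [[1.0 + 0j, 0j], [0j, 1.0 + 0j]]   # initial estimate, 2 mics
x = [1.0 + 1.0j, 0.5 - 0.5j]           # one array snapshot
lam = forgetting_factor(15.0)
R = update_covariance(R, x, lam)
```

The MVDR weights are then computed from this R and the steering vector implied by the tracked DoA; keeping R well-conditioned via λ is exactly what the variable forgetting factor controls.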