• Title/Summary/Keyword: 음성 분석 (speech analysis)

Voice Activity Detection Using Ellipse Fitting of the Oral Cavity Region (구강 영역에 대한 타원 근사법을 이용한 음성 구간 검출법)

  • Ryu, Jewoong;Choo, Sung Kwon;Kim, Gibak;Cho, Namik
    • Proceedings of the Korean Society of Broadcast Engineers Conference / 2012.07a / pp.271-274 / 2012
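  • Voice activity detection, which is widely used in speech signal processing, usually decides whether speech is present in an acoustic signal by analyzing that signal. Methods based on the acoustic signal have the drawback that their performance is determined by speech or non-speech noise and by the surrounding acoustic environment. To make voice activity detection robust to changes in the acoustic environment, methods that use visual information have recently been studied. Existing methods detect speech intervals either by estimating changes in lip shape with a lip model or similar, or by using changes in the number of pixels belonging to the oral cavity region. These methods either require complex computation to estimate the lip shape or, because they use only the oral-cavity pixel count without estimating lip shape, are somewhat less accurate. In this paper, we propose a method that estimates the shape of the visible oral cavity region by ellipse fitting in order to track changes in lip shape, and that detects speech intervals from changes in the area and height of the ellipse. Comparative experiments confirmed that the proposed method outperforms the method that uses only changes in the oral-cavity pixel count.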
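As a rough illustration of the ellipse-based feature described above, the following Python sketch fits an ellipse to the oral-cavity pixels of each frame and flags a frame as speech when the ellipse area or height changes by more than a threshold. The moment-based, axis-aligned fit, the function names `fit_ellipse` and `detect_speech`, and the thresholds are assumptions made for illustration; the abstract does not specify the paper's fitting procedure or decision rule.

```python
import math

def fit_ellipse(pixels):
    """Moment-based, axis-aligned ellipse fit (an assumed fitting method).

    pixels: list of (x, y) coordinates labelled as oral cavity.
    Returns (area, height) of the fitted ellipse.
    """
    n = len(pixels)
    if n < 2:
        return 0.0, 0.0
    mean_x = sum(x for x, _ in pixels) / n
    mean_y = sum(y for _, y in pixels) / n
    var_x = sum((x - mean_x) ** 2 for x, _ in pixels) / n
    var_y = sum((y - mean_y) ** 2 for _, y in pixels) / n
    # A uniformly filled ellipse with semi-axes a, b has variances a^2/4 and b^2/4.
    a = 2.0 * math.sqrt(var_x)
    b = 2.0 * math.sqrt(var_y)
    return math.pi * a * b, 2.0 * b

def detect_speech(frames, area_thresh, height_thresh):
    """Flag a frame as speech when area or height changes enough between frames.

    frames: list of pixel lists, one per video frame.
    The thresholds are hypothetical tuning parameters.
    """
    feats = [fit_ellipse(f) for f in frames]
    decisions = [False] if feats else []  # first frame has no predecessor
    for (a0, h0), (a1, h1) in zip(feats, feats[1:]):
        decisions.append(abs(a1 - a0) > area_thresh or abs(h1 - h0) > height_thresh)
    return decisions
```

In use, each frame's pixel list would come from whatever segmentation step isolates the dark mouth interior; that step is outside the scope of this sketch.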

Statistical Model-Based Voice Activity Detection Using the Second-Order Conditional Maximum a Posteriori Criterion with Adapted Threshold (적응형 문턱값을 가지는 2차 조건 사후 최대 확률을 이용한 통계적 모델 기반의 음성 검출기)

  • Kim, Sang-Kyun;Chang, Joon-Hyuk
    • The Journal of the Acoustical Society of Korea / v.29 no.1 / pp.76-81 / 2010
  • In this paper, we propose a novel approach to improving the performance of statistical model-based voice activity detection (VAD), based on the second-order conditional maximum a posteriori (CMAP) criterion. In our approach, the VAD decision rule compares the geometric mean of the likelihood ratios (LRs) against a threshold that is adapted according to the speech presence probability conditioned on both the current observation and the speech activity decisions in the previous two frames. Experimental results show that the proposed approach outperforms both the conventional statistical model-based VAD and the CMAP-based VAD that use the LR test.
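The decision rule above can be sketched as follows. This is a minimal illustration that assumes the commonly used Gaussian statistical model, in which the per-bin likelihood ratio is exp(γξ/(1+ξ))/(1+ξ) for a posteriori SNR γ and a priori SNR ξ. The threshold table keyed by the two previous decisions, and the names `log_geometric_mean_lr` and `vad_decisions`, are hypothetical; the abstract does not give the actual threshold values or how they are derived.

```python
import math

def log_geometric_mean_lr(gamma, xi):
    """Log of the geometric mean of per-bin likelihood ratios.

    gamma: a posteriori SNR per frequency bin.
    xi: a priori SNR per frequency bin.
    Assumes the Gaussian model LR_k = exp(gamma_k * xi_k / (1 + xi_k)) / (1 + xi_k).
    """
    total = 0.0
    for g, x in zip(gamma, xi):
        total += g * x / (1.0 + x) - math.log(1.0 + x)
    return total / len(gamma)

def vad_decisions(frames, thresholds):
    """Frame-by-frame VAD with a threshold chosen from the two previous decisions.

    frames: list of (gamma, xi) pairs, one per frame.
    thresholds: dict mapping (decision at t-1, decision at t-2) to a log threshold
                (hypothetical values, to be tuned or derived elsewhere).
    """
    prev1 = prev2 = 0
    decisions = []
    for gamma, xi in frames:
        eta = thresholds[(prev1, prev2)]
        d = 1 if log_geometric_mean_lr(gamma, xi) > eta else 0
        decisions.append(d)
        prev1, prev2 = d, prev1
    return decisions
```

Giving lower thresholds to histories that contain recent speech decisions makes the detector less likely to drop out in the middle of an utterance, which is the intuition behind conditioning on previous frames.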

Speech Synthesis Based on CVC Speech Segments Extracted from Continuous Speech (연속 음성으로부터 추출한 CVC 음성세그먼트 기반의 음성합성)

  • 김재홍;조관선;이철희
    • The Journal of the Acoustical Society of Korea / v.18 no.7 / pp.10-16 / 1999
  • In this paper, we propose a concatenation-based speech synthesizer that uses CVC (consonant-vowel-consonant) speech segments extracted from an undesigned continuous speech corpus. Natural synthetic speech can be generated by properly modeling coarticulation effects between phonemes and by using natural prosodic variations. In general, CVC synthesis units cause less acoustic degradation of speech quality because the concatenation points lie in the consonant region, and they can properly model the coarticulation of vowels that are affected by surrounding consonants. We analyze the characteristics and the number of required synthesis units for four types of speech synthesis methods that use CVC synthesis units. We then compare the speech quality of the four types and propose a new synthesis method based on the most promising type in terms of speech quality and implementability. We implement the method using the speech corpus and synthesize various examples. CVC speech segments that are not in the speech corpus are replaced by other speech segments. Experiments demonstrate that CVC speech segments extracted from a continuous speech corpus of about 100 MB can produce high-quality synthetic speech.

Pitch Detection by the Analysis of Speech and EGG Signals (2-채널 (음성 및 EGG) 신호 분석에 의한 피치검출)

  • Shin, Mu-Yong;Kim, Jeong-Cheol;Bae, Keun-Sung
    • The Journal of the Acoustical Society of Korea / v.15 no.5 / pp.5-12 / 1996
  • We propose a two-channel (speech and EGG) pitch detection algorithm. The EGG (electroglottograph) signal tracks the vibratory motion of the vocal folds very well. Using the EGG signal together with the speech signal therefore yields a reliable and robust pitch detection algorithm that minimizes the problems occurring in pitch detection from speech alone. The proposed algorithm gives precise pitch markers that are synchronized to the speech in the time domain. Experimental results demonstrate the superiority of the two-channel algorithm over the conventional method, and it can be used to obtain reference pitch for evaluating other pitch detection algorithms.

Development of energy expenditure measurement device based on voice and body activity (음성과 활동량을 이용한 에너지 소모량 측정기기 개발)

  • Im, Jae Joong
    • The Journal of the Institute of Internet, Broadcasting and Communication / v.12 no.6 / pp.303-309 / 2012
  • Energy expenditure was estimated from voice signals and body activity. Voice signals were obtained with a PVDF contact vibration sensor and body activity with a 3-axis accelerometer. Voice-induced vibration, activity signals, and actual energy consumption were recorded using a data acquisition system and a gas analyzer. With the power of the voice signal and body weight as independent variables, the model reached its highest R-squared value, 0.918. For the activity outputs, using signal vector magnitude, body mass index, height, and age as independent variables gave the highest correlation with actual energy expenditure. Estimating energy expenditure from both voice and activity gives more accurate results than estimating it from activity alone.

Analysis of Eigenvalues of Covariance Matrices of Speech Signals in Frequency Domain for Various Bands (음성 신호의 주파수 영역에서의 주파수 대역별 공분산 행렬의 고유값 분석)

  • Kim, Seonil
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference / 2016.05a / pp.293-296 / 2016
  • Speech signals consist of consonants and vowels, but vowels last much longer than consonants. The correlation between signal blocks within a speech signal can therefore be assumed to be very high, although the correlation between blocks can differ considerably across frequency bands. Each speech signal is divided into blocks of 128 samples, and an FFT is applied to each block. Various frequency regions of the FFT output are selected, the covariance matrix between the blocks of a speech signal is computed, and the eigenvalues of that matrix are obtained. The eigenvalues for the various frequency bands are then examined to determine which band gives more reliable results.
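The processing chain above (blocking, transform, band selection, covariance, eigenvalues) can be sketched in plain Python. Several choices here are assumptions, since the abstract leaves them open: spectral magnitudes are used as features, the covariance between two blocks is taken across the frequency bins of the chosen band, a direct DFT stands in for the FFT (it gives the same values), and power iteration returns only the largest eigenvalue. All function names are hypothetical.

```python
import cmath
import math

def band_magnitudes(block, lo, hi):
    """DFT magnitudes of one block for bins lo..hi-1 (direct DFT, same result as an FFT)."""
    n = len(block)
    mags = []
    for k in range(lo, hi):
        s = sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(block))
        mags.append(abs(s))
    return mags

def block_covariance(signal, block_len, lo, hi):
    """Covariance matrix between blocks, computed across the bins of one band (assumed)."""
    starts = range(0, len(signal) - block_len + 1, block_len)
    feats = [band_magnitudes(signal[i:i + block_len], lo, hi) for i in starts]
    means = [sum(f) / len(f) for f in feats]
    m = hi - lo
    return [[sum((fi[k] - mi) * (fj[k] - mj) for k in range(m)) / m
             for fj, mj in zip(feats, means)]
            for fi, mi in zip(feats, means)]

def dominant_eigenvalue(matrix, iters=200):
    """Largest eigenvalue of a symmetric positive semidefinite matrix by power iteration.

    Starts from the all-ones vector, so it assumes that vector is not
    orthogonal to the dominant eigenvector.
    """
    n = len(matrix)
    v = [1.0] * n
    lam = 0.0
    for _ in range(iters):
        w = [sum(matrix[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            return 0.0
        v = [x / norm for x in w]
        lam = sum(v[i] * sum(matrix[i][j] * v[j] for j in range(n)) for i in range(n))
    return lam
```

Calling `block_covariance(signal, 128, lo, hi)` for several `(lo, hi)` ranges and comparing the resulting eigenvalues mirrors the band-by-band comparison the abstract describes; a full eigendecomposition would need a numerical library.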

Noise Reduction Algorithm in Speech by Wiener Filter (위너필터에 의한 음성 중의 잡음제거 알고리즘)

  • Choi, Jae-Seung
    • The Journal of the Korea Institute of Electronic Communication Sciences / v.8 no.9 / pp.1293-1298 / 2013
  • This paper proposes a noise reduction algorithm that uses a Wiener filter to remove noise components from noisy speech and thereby improve the speech signal. The proposed algorithm first removes the white-noise spectrum from the noisy signal at each frame using a noise reshaping and reduction method. It then enhances the speech signal with a Wiener filter based on linear predictive coding analysis. The algorithm is evaluated on speech and noise data from a Japanese male speaker. Measured by spectral distortion (SD), the experiments confirm that the proposed algorithm is effective for speech contaminated by white noise. The maximum improvement in output SD for white noise was 4.94 dB relative to a conventional Wiener filter.
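For orientation, the textbook frequency-domain Wiener gain can be sketched as below. This is not the paper's LPC-based variant or its noise reshaping step; it assumes that a per-bin noise power estimate is already available and uses the simple gain G = max(P_y − P_n, 0) / P_y. The function names are hypothetical.

```python
def wiener_gains(noisy_power, noise_power):
    """Per-bin Wiener gain G = max(P_y - P_n, 0) / P_y.

    noisy_power: power spectrum of the noisy frame.
    noise_power: estimated noise power spectrum (assumed to be given).
    """
    gains = []
    for p_y, p_n in zip(noisy_power, noise_power):
        if p_y <= 0.0:
            gains.append(0.0)  # nothing to keep in an empty bin
        else:
            gains.append(max(p_y - p_n, 0.0) / p_y)
    return gains

def apply_gains(spectrum, gains):
    """Scale each complex spectral bin by its gain; the phase is left unchanged."""
    return [g * s for g, s in zip(gains, spectrum)]
```

Bins where the estimated noise power meets or exceeds the observed power are zeroed, while bins dominated by speech pass almost unchanged.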

Performance Analysis of Speech Recognition Model based on Neuromorphic Architecture of Speech Data Preprocessing Technique (음성 데이터 전처리 기법에 따른 뉴로모픽 아키텍처 기반 음성 인식 모델의 성능 분석)

  • Cho, Jinsung;Kim, Bongjae
    • The Journal of the Institute of Internet, Broadcasting and Communication / v.22 no.3 / pp.69-74 / 2022
  • A spiking neural network (SNN) running on a neuromorphic architecture is modeled on human neural networks. Neuromorphic computing requires relatively less power than typical GPU-based deep learning techniques. For this reason, research on supporting various artificial intelligence models with neuromorphic architectures is active. This paper analyzes how the speech data preprocessing technique affects the performance of a speech recognition model based on a neuromorphic architecture. In the experiments, preprocessing the speech data with the Fourier transform yielded a speech recognition accuracy of up to 84%. This confirms that speech recognition services based on neuromorphic architectures can be used effectively.

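A Study on the Effect of the Service Value of Age-Friendly AI Voice O2O Services on Attitude and Intention to Use (고령친화 AI음성 O2O 서비스의 서비스가치가 태도와 이용의도에 미치는 영향에 관한 연구)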

  • Lee, Myeong-Suk;Go, In-Gon
    • Proceedings of the Korean Society of Business Venturing Conference (한국벤처창업학회:학술대회논문집) / 2021.11a / pp.125-128 / 2021
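  • Korea is projected to become a super-aged society in 2025, with people aged 65 and over exceeding 20% of the total population, which calls for age-friendly products and services suited to users' level of aging. In particular, services with interfaces that senior consumers find easy to use are needed. Seniors are willing to pay to address their concerns about aging and show consumption patterns similar to those of younger consumers. At the same time, social problems such as health maintenance and health anxiety at each stage of aging, gaps in care, and growing social isolation are intensifying in combination, so a supply of age-friendly, smart aging services is required. Coinciding with this, in the "with-COVID" era, efforts in AI (Artificial Intelligence) and information and communication technology, the core of the Fourth Industrial Revolution, are becoming visible in products and services with interfaces that senior consumers find easy to use. Products and services that build on IT and carry AI speech recognition functions matching seniors' needs are therefore expected to lead the growth of the age-friendly industry. To analyze whether the service value of the "age-friendly AI voice O2O service" affects attitude and intention to use, this study derives a definition of the service through an expert Delphi method grounded in prior theory, and empirically examines the causal relationships between its service value (context-based provision, immediate connectivity, location accuracy) and attitude and intention to use.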

A MAC Protocol for the Integrated Voice/Data Services in Packet CDMA Network (패킷 CDMA 망에서 음성/데이타 통합 서비스를 위한 MAC 프로토콜)

  • Lim, In-Taek
    • Journal of KIISE:Information Networking / v.27 no.1 / pp.68-75 / 2000
  • In this paper, a media access control protocol is proposed for integrated voice/data services in a packet CDMA network, and its performance is analyzed. The proposed protocol uses spreading code sensing and reservation schemes, and gives delay-sensitive voice traffic higher priority than data traffic. A voice terminal can reserve an available spreading code for the duration of a talkspurt in order to transmit multiple voice packets. In contrast, whenever a data packet is generated, the data terminal transmits it through one of the available spreading codes not in use by voice terminals. As a result, voice packets do not collide with data packets. The numerical results show that the protocol can increase the maximum number of voice terminals. Data traffic performance degrades as the voice traffic load increases, because data has lower priority, but it improves in proportion to the number of spreading codes.
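The code-sharing rule described above can be illustrated with a one-slot toy simulation. It is a sketch under stated assumptions, not the paper's protocol: voice reservations are given as a ready-made mapping, each data terminal picks uniformly at random among the unreserved codes, and two data packets on the same code are treated as a collision in which both are lost. The function name `simulate_slot` and its parameters are hypothetical.

```python
import random

def simulate_slot(num_codes, voice_reservations, data_terminals, rng):
    """Simulate one slot of a shared pool of spreading codes.

    num_codes: total number of spreading codes (numbered 0..num_codes-1).
    voice_reservations: dict mapping voice terminal id to its reserved code.
    data_terminals: ids of data terminals that have a packet to send this slot.
    rng: a random.Random instance used for the data terminals' code choice.
    Returns the voice terminals that transmit and the data packets delivered.
    """
    reserved = set(voice_reservations.values())
    free = [c for c in range(num_codes) if c not in reserved]
    picks = {}
    if free:
        for terminal in data_terminals:
            # Data terminals only ever use codes not held by voice terminals.
            picks.setdefault(rng.choice(free), []).append(terminal)
    # A data packet gets through only if it is alone on its code (assumed collision model).
    delivered = sorted(ts[0] for ts in picks.values() if len(ts) == 1)
    return {"voice": sorted(voice_reservations), "data": delivered}
```

Because data terminals never touch a reserved code, every voice packet goes through regardless of data load, while data throughput shrinks as voice reservations consume the pool and grows when more codes are available, which matches the qualitative behaviour reported in the abstract.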
