• Title/Summary/Keyword: speech source

Search Result 281, Processing Time 0.022 seconds

멀티펄스의 진폭보정에 관한 연구 (A Study on Compensation of Amplitude in Multi Pulse)

  • 이시우
    • 한국산학기술학회논문지
    • /
    • 제12권9호
    • /
    • pp.4119-4124
    • /
    • 2011
  • 유성음원과 무성음원을 사용하는 멀티펄스 음성부호화 방식에 있어서, 음성신호의 진폭이 증가하거나 감소하는 경우에 음성 파형에 일그러짐이 나타난다. 이것은 대표구간의 멀티펄스를 피치구간마다 복원하는 과정에서 재생 음성신호가 정규화되는 것이 원인으로 작용한다. 이것을 해결하기위하여 본 논문에서는 피치구간마다 멀티펄스의 진폭을 보정하는 방법(AC-MPC)을 제시하였으며, 기존의 MPC와 멀티펄스 진폭을 보정한 AC-MPC의 SNRseg를 평가한 결과, AC-MPC의 남자음성에서 0.7dB, 여자음성에서 0.7dB 개선된 것을 확인할 수 있었다. 결국, MPC에 비해 AC-MPC의 SNRseg가 개선되어 음성파형의 일그러짐을 제어할 수 있었으며, 본 방법은 셀룰러폰이나 스마트폰과 같이 Low Bit Rate의 음원을 사용하여 음성신호를 부호화하는 방식에 활용할 수 있을 것으로 기대된다.

목소리 특성의 청취 평가에 기초한 사상체질과 음성 특징의 상관관계 분석 (Analysis of the Relationship Between Sasang Constitutional Groups and Speech Features Based on a Listening Evaluation of Voice Characteristics)

  • 권철홍;김종열;김근호;장준수
    • 말소리와 음성과학
    • /
    • 제4권4호
    • /
    • pp.71-77
    • /
    • 2012
  • Sasang constitution experts utilize voice characteristics as an auxiliary measure for deciding a person's constitutional group. This study aims at establishing a relationship between speech features and the constitutional groups by subjective listening evaluation of voice characteristics. A speech database of 841 speakers whose constitutional groups have been already diagnosed by Sasang constitution experts was constructed. Speech features related to speech source and vocal tract filter were extracted from five vowels and one sentence. Statistically significant speech features for classifying the groups were analyzed using SPSS. The features contributed to constitution classification were speaking rate, Energy, A1, A2, A3, H1, H2, H4, CPP for males in their 20s, F0_mean, CPP, SPI, HNR, Shimmer, Energy, A1, A2, A3, H1, H2, H4 for females in their 20s, Energy, A1, A2, A3, H1, H2, H4, CPP for male in the 60s, and Jitter, HNR, CPP, SPI for females in their 60s. Experimental results show that speech technology is useful in classifying constitutional groups.

English-Korean speech translation corpus (EnKoST-C): Construction procedure and evaluation results

  • Jeong-Uk Bang;Joon-Gyu Maeng;Jun Park;Seung Yun;Sang-Hun Kim
    • ETRI Journal
    • /
    • 제45권1호
    • /
    • pp.18-27
    • /
    • 2023
  • We present an English-Korean speech translation corpus, named EnKoST-C. End-to-end model training for speech translation tasks often suffers from a lack of parallel data, such as speech data in the source language and equivalent text data in the target language. Most available public speech translation corpora were developed for European languages, and there is currently no public corpus for English-Korean end-to-end speech translation. Thus, we created an EnKoST-C centered on TED Talks. In this process, we enhance the sentence alignment approach using the subtitle time information and bilingual sentence embedding information. As a result, we built a 559-h English-Korean speech translation corpus. The proposed sentence alignment approach showed excellent performance of 0.96 f-measure score. We also show the baseline performance of an English-Korean speech translation model trained with EnKoST-C. The EnKoST-C is freely available on a Korean government open data hub site.

음성 신호의 다구간 에너지 차를 이용한 새로운 프리엠퍼시스 방법에 관한 연구 (A Study on a New Pre-emphasis Method Using the Short-Term Energy Difference of Speech Signal)

  • 김동준;김주리
    • 대한전기학회논문지:시스템및제어부문D
    • /
    • 제50권12호
    • /
    • pp.590-596
    • /
    • 2001
  • The pre-emphasis is an essential process for speech signal processing. Widely used two methods are the typical method using a fixed value near unity and te optimal method using the autocorrelation ratio of the signal. This study proposes a new pre-emphasis method using the short-term energy difference of speech signal, which can effectively compensate the glottal source characteristics and lip radiation characteristics. Using the proposed pre-emphasis, speech analysis, such as spectrum estimation, formant detection, is performed and the results are compared with those of the conventional two pre-emphasis methods. The speech analysis with 5 single vowels showed that the proposed method enhanced the spectral shapes and gave nearly constant formant frequencies and could escape the overlapping of adjacent two formants. comparison with FFT spectra had verified the above results and showed the accuracy of the proposed method. The computational complexity of the proposed method reduced to about 50% of the optimal method.

  • PDF

KMSAV: Korean multi-speaker spontaneous audiovisual dataset

  • Kiyoung Park;Changhan Oh;Sunghee Dong
    • ETRI Journal
    • /
    • 제46권1호
    • /
    • pp.71-81
    • /
    • 2024
  • Recent advances in deep learning for speech and visual recognition have accelerated the development of multimodal speech recognition, yielding many innovative results. We introduce a Korean audiovisual speech recognition corpus. This dataset comprises approximately 150 h of manually transcribed and annotated audiovisual data supplemented with additional 2000 h of untranscribed videos collected from YouTube under the Creative Commons License. The dataset is intended to be freely accessible for unrestricted research purposes. Along with the corpus, we propose an open-source framework for automatic speech recognition (ASR) and audiovisual speech recognition (AVSR). We validate the effectiveness of the corpus with evaluations using state-of-the-art ASR and AVSR techniques, capitalizing on both pretrained models and fine-tuning processes. After fine-tuning, ASR and AVSR achieve character error rates of 11.1% and 18.9%, respectively. This error difference highlights the need for improvement in AVSR techniques. We expect that our corpus will be an instrumental resource to support improvements in AVSR.

휴머노이드 로봇을 위한 원거리 음성 인터페이스 기술 연구 (Distant-talking of Speech Interface for Humanoid Robots)

  • 이협우;육동석
    • 대한음성학회:학술대회논문집
    • /
    • 대한음성학회 2007년도 한국음성과학회 공동학술대회 발표논문집
    • /
    • pp.39-40
    • /
    • 2007
  • For efficient interaction between human and robots, speech interface is a core problem especially in noisy and reverberant conditions. This paper analyzes main issues of spoken language interface for humanoid robots, such as sound source localization, voice activity detection, and speaker recognition.

  • PDF

A Frequency-Domain Normalized MBD Algorithm with Unidirectional Filters for Blind Speech Separation

  • Kim Hye-Jin;Nam Seung-Hyon
    • The Journal of the Acoustical Society of Korea
    • /
    • 제24권2E호
    • /
    • pp.54-60
    • /
    • 2005
  • A new multichannel blind deconvolution algorithm is proposed for speech mixtures. It employs unidirectional filters and normalization of gradient terms in the frequency domain. The proposed algorithm is shown to be approximately nonholonomic. Thus it provides improved convergence and separation performances without whitening effect for nonstationary sources such as speech and audio signals. Simulations using real world recordings confirm superior performances over existing algorithms and its usefulness for real applications.

장애인의 상용치료원 보유가 환자 중심 의사소통에 미치는 영향 (The Effect of Having a Usual Source of Care on Patient-Centered Communication among Persons with Disabilities)

  • 전보영;이민영;안은미
    • 보건행정학회지
    • /
    • 제31권4호
    • /
    • pp.518-530
    • /
    • 2021
  • Background: This study examined the effect of having a usual source of care on the degree of patient-centered communication among persons with disability. The role of the usual source of care has been emphasized to improve patient experience, especially for patients with complex health conditions. Methods: This study used the 2017-2018 Korean Health Panel data, and the final study observations were 22,475 (20,806 people without disability and 1,669 people with disability). We applied generalized estimating equation model to show the effect of having a usual source of care on patient-centered communication, and subgroup analysis considering the types and severity of disabilities. Results: Persons who have disabilities, compared with ones without it, significantly had more usual sources of care (32.4% vs. 24.6%). By type of disability, persons with mental (51.4%), internal organ (43.8%), visual (37%), and physical disabilities (31.6%) had more usual sources of care than hearing/speech (26.6%), and developmental disabilities (18.6%). The average score of patient-centered communication was higher among who had a usual sources of care (3.2 vs. 2.7), and the regression analysis showed that having a usual sources of care was positively associated with higher patient-centered communication score (𝛽=0.476, p<0.05). However, the positive effects of usual sources of care was not observed among persons with severe hearing/speech, developmental, and mental disabilities. Conclusion: This study showed that role of patient-centered communication was limited in persons with severe hearing/speech disabilities, developmental, and mental disabilities. The education programs and supports are needed to improve communication skills between medical staff and persons with specific types of disabilities.

특징 맵 중요도 기반 어텐션을 적용한 복소 스펙트럼 기반 음성 향상에 관한 연구 (A study on speech enhancement using complex-valued spectrum employing Feature map Dependent attention gate)

  • 정재희;김우일
    • 한국음향학회지
    • /
    • 제42권6호
    • /
    • pp.544-551
    • /
    • 2023
  • 잡음 음성의 지각적 품질과 명료도 향상을 위해 활용되는 음성 향상은 크기 스펙트럼을 이용한 방법에서 크기와 위상을 같이 향상시킬 수 있는 복소 스펙트럼을 이용한 방법으로 연구되어왔다. 본 논문에서는 잡음 음성의 명료도와 품질을 더욱 향상시키기 위해 복소 스펙트럼 기반 음성 향상 시스템에 어텐션 기법을 적용하는 방안에 관해 연구를 수행하였다. 어텐션 기법은 additive attention을 기반으로 수행하며 복소 스펙트럼의 특성을 고려하여 어텐션 가중치를 계산할 수 있도록 하였다. 또한 특징 맵의 중요도를 고려하기 위해 전역 평균 풀링 연산을 같이 사용하였다. 복소 스펙트럼 기반 음성 향상은 Deep Complex U-Net(DCUNET) 모델을 기반으로 수행하였으며, additive attention은 Attention U-Net 모델에서 제안된 방법을 기반으로 연구를 수행하였다. 거실 환경의 잡음 데이터에 대해 음성 향상을 수행한 결과, 제안한 방법이 Source to Distortion Ratio(SDR), Perceptual Evaluation of Speech Quality(PESQ), Short Time Objective Intelligibility(STOI) 평가 지표에서 기준 모델보다 개선된 성능을 보였으며, 낮은 Signal-to-Noise Ratio(SNR) 조건의 다양한 배경 잡음 환경에 대해서도 일관된 성능 향상을 보였다. 이를 통해 제안한 음성 향상 시스템이 효과적으로 잡음 음성의 명료도와 품질을 향상시킬 수 있음을 보여주었다.

멀티펄스의 위치보정 방법을 이용한 8kbps PC-MPC에 관한 연구 (A Study on 8kbps PC-MPC by Using Position Compensation Method of Multi-Pulse)

  • 이시우
    • 디지털융복합연구
    • /
    • 제11권5호
    • /
    • pp.285-290
    • /
    • 2013
  • 유성음원과 무성음원을 사용하는 멀티펄스 음성부호화 방식에 있어서, 대표구간의 멀티펄스를 사용하는 경우에 유성음의 합성음성파형에서 일그러짐이 나타난다. 이것은 대표구간의 멀티펄스를 피치구간마다 복원하는 과정에서 재생 음성신호가 정규화되는 것이 원인으로 작용한다. 이것을 해결하기위하여 본 논문에서는 피치구간마다 멀티펄스의 위치를 보정하는 방법(PC-MPC)을 제시하였으며, 기존의 MPC와 멀티펄스 위치를 보정한 PC-MPC의 $SNR_{seg}$를 평가한 결과, PC-MPC의 남자음성에서 0.4dB, 여자음성에서 0.5dB 개선된 것을 확인할 수 있었다. 결국, MPC에 비해 PC-MPC의 $SNR_{seg}$가 개선되어 음성파형의 일그러짐을 제어할 수 있었으며, 본 방법은 셀룰러폰이나 스마트폰과 같이 Low Bit Rate의 음원을 사용하여 음성신호를 부호화하는 방식에 활용할 수 있을 것으로 기대된다.