Search | Korea Science

A Probabilistic Combination Method of Minimum Statistics and Soft Decision for Robust Noise Power Estimation in Speech Enhancement (강인한 음성향상을 위한 Minimum Statistics와 Soft Decision의 확률적 결합의 새로운 잡음전력 추정기법)

Park, Yun-Sik;Chang, Joon-Hyuk
- The Journal of the Acoustical Society of Korea / v.26 no.4 / pp.153-158 / 2007
This paper presents a new approach to noise estimation that improves speech enhancement in non-stationary noisy environments. The proposed method combines two separate noise power estimates, one provided by minimum statistics (MS) for speech presence and one provided by soft decision (SD) for speech absence, according to the speech absence probability (SAP) in each frequency bin. The performance of the proposed algorithm is evaluated by a subjective test under various noise environments, and it yields better results than the conventional MS-based or SD-based schemes.
https://doi.org/10.7776/ASK.2007.26.4.153
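The abstract above does not give the combination rule, so the following is only a minimal sketch of one plausible reading: in each frequency bin, the SD estimate is weighted by the speech absence probability and the MS estimate by its complement. The function name `combine_noise_estimates` and the linear weighting are assumptions for illustration, not the authors' published formula.

```python
def combine_noise_estimates(ms_est, sd_est, sap):
    """Blend per-bin noise power estimates using speech absence probability.

    ms_est: minimum-statistics (MS) noise power estimates, one per frequency bin
    sd_est: soft-decision (SD) noise power estimates, one per frequency bin
    sap:    speech absence probability per bin, each value in [0, 1]

    Assumption: a simple linear weighting; the paper may use a different rule.
    """
    if not (len(ms_est) == len(sd_est) == len(sap)):
        raise ValueError("all inputs must have one value per frequency bin")

    combined = []
    for ms, sd, p in zip(ms_est, sd_est, sap):
        if not 0.0 <= p <= 1.0:
            raise ValueError("SAP must lie in [0, 1]")
        # SD is trusted when speech is likely absent, MS when speech is likely present.
        combined.append(p * sd + (1.0 - p) * ms)
    return combined
```

With this weighting, a bin where speech is almost certainly absent follows the SD estimate, and a bin where speech is almost certainly present falls back to the MS estimate.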

A Robust Speech/Non-Speech Decision Using Voiced Characteristics of Speech (음성의 유성음 특성을 이용한 음성/비음성 판별 방법)

Lee, Sung-Joo;Jung, Ho-Young;Lee, Yun-Keun;Kim, Hyung-Soon
- Proceedings of the Korea Information Processing Society Conference / 2007.05a / pp.411-412 / 2007
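From the user's point of view, the Push-To-Talk (PTT) scheme, which requires pressing a button every time speech is input to an automatic speech recognition system, is quite cumbersome. In speech recognition applications where PTT itself is impractical, such as when the user speaks from a distance, a Non-Push-To-Talk (NON-PTT) scheme becomes necessary. Speech preprocessing for NON-PTT requires a speech/non-speech decision technique that separates only the speech signal from the input signal, but doing so in everyday noisy environments is very difficult. This paper proposes a speech/non-speech decision method that is robust to everyday home noise environments. The method exploits the voiced characteristics of speech: a speech signal longer than a certain duration contains voiced segments longer than a certain duration, so if voiced segments can be detected reliably even in noise, this property can be used to decide whether a detected signal is speech. To this end, we propose 11 voiced-speech features designed to detect voiced sounds well even in home noise environments, together with a speech/non-speech decision method based on them. To evaluate its performance, the proposed method was integrated with a speech endpoint detection method and tested on speech/non-speech decisions; it rejected more than 80% of non-speech in adverse noise environments.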

Speech Recognition Accuracy Measure using Deep Neural Network for Effective Evaluation of Speech Recognition Performance (효과적인 음성 인식 평가를 위한 심층 신경망 기반의 음성 인식 성능 지표)

Ji, Seung-eun;Kim, Wooil
- Journal of the Korea Institute of Information and Communication Engineering / v.21 no.12 / pp.2291-2297 / 2017
This paper describes an algorithm for extracting speech measures used to evaluate a speech database, and presents a method of generating a speech quality measure using a DNN (Deep Neural Network). In our previous study, to produce an effective speech quality measure, we proposed a combination of various speech measures that are highly correlated with WER (Word Error Rate). The new combination of speech quality measures in this study predicts speech recognition performance more effectively than each speech measure alone. In this paper, we describe how the measure is extracted with a DNN, and we replace one of the combined measures, the GMM (Gaussian Mixture Model) score used in the previous study, with a DNN score. The combination with the DNN score shows a higher correlation with WER than the combination with the GMM score.
https://doi.org/10.6109/jkiice.2017.21.12.2291
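The abstract above judges a speech quality measure by how strongly it correlates with WER. As a rough illustration of those two quantities only, the sketch below computes WER with the standard word-level edit distance and a Pearson correlation coefficient. The function names are hypothetical, and the abstract does not state which correlation statistic the authors used, so Pearson is an assumption here.

```python
import math


def word_error_rate(reference, hypothesis):
    """WER = (substitutions + deletions + insertions) / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the processed reference prefix and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1] / len(ref)


def pearson_correlation(xs, ys):
    """Pearson correlation between two equal-length sequences of numbers."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)
```

In this setting, `xs` would hold a candidate quality measure per test condition and `ys` the corresponding WER values; a larger absolute correlation suggests the measure tracks recognition performance more closely.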

Collection of Korean Emotional Speech Database from Actors (배우에 의한 한국어 정서음성 데이터베이스 수집)

Jo Cheolwoo;Bak Il-suh;Lee Yongju;Kim Bongwan
- Proceedings of the Acoustical Society of Korea Conference / spring / pp.45-48 / 2004
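This paper describes the process of collecting a Korean emotional speech database and discusses the characteristics of the database. The database was collected from actors and assessed by subjective evaluation. Six actors, three male and three female, uttered 10 sentences in each of six emotional states, and 20 evaluators independently judged the emotional state conveyed in the speech. In subjective evaluation using random presentation, the database achieved an agreement rate of over 80%.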

Objective parameter extraction in perceptual dysphonia assessment (청지각적 음성장애평가에서의 객관적인 파라미터 추출)

Jang, Seung-Jin;Choe, Ye-Rin;Kim, Eun-Yeon;Kim, Won-Sik
- Proceedings of the Korean Society for Emotion and Sensibility Conference / 2009.05a / pp.181-182 / 2009
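The GRBAS (G: grade, R: rough, B: breathy, S: strained, A: asthenic) voice disorder assessment is widely used as a scale for evaluating patients with vocal fold abnormalities, dysarthria, and similar conditions. However, many problems have been raised about its reliance on subjective human judgment, and there have been many attempts to develop objective perceptual voice disorder assessment tools based on automated algorithms. The purpose of this study is to obtain objective parameters for phoneme classification and consistency judgment, which generally must precede such development.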

A Study on the Generation of Multi-syllable Nonsense Wordset for the Assessment of Synthetic Speech (합성음성평가를 위한 다음절 무의미단어 생성과 이용에 관한 연구)

Jo, Cheol-Woo;Kim, Kyung-Tae;Lee, Yong-Ju
- The Journal of the Acoustical Society of Korea / v.13 no.5 / pp.51-58 / 1994
Recently, many kinds of man-machine interfaces using speech signals, such as speech recognizers and speech synthesizers, have been proposed and put to practical use. Speech synthesis systems in particular are widely used in daily life, but methods for assessing them are still at an early stage. In this paper we propose a method of generating a multi-syllable nonsense wordset for synthetic speech assessment and apply the wordset to one commercial text-to-speech system. Experimental results are presented, and they verify that the wordset generation method can be used to assess the intelligibility of a synthesizer at the phoneme level or at the level of phonemic environment.

Quality Assessment of Telephone Speech with ATM Circuit Emulation Services (ATM 망을 통한 Circuit Emulation 서비스에서 전화음성의 품질평가)

Cho, Young-Soon;Seo, Jeong-Wook;Bae, Keun-Sung
- Journal of the Korean Institute of Telematics and Electronics S / v.35S no.6 / pp.156-163 / 1998
The ATM network provides ATM CES (Circuit Emulation Services) over AAL1 for CBR (constant bit rate) services such as telephone speech. In this study, the quality of telephone speech carried by CES over ATM was assessed and discussed. For this purpose, interoperability between the ATM network and a structured/unstructured DS1 link was modeled for simulation. SNR and MOS were used as the objective and subjective quality measures, respectively. Experimental results show that an MOS of 4 and an SNR of 30 dB can be obtained for speech signals at a CLR of $10^{-3}$ or below.
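The abstract above uses SNR as its objective quality measure but does not say how it was computed. The sketch below shows the conventional definition, treating the sample-wise difference between a clean reference and the degraded signal as noise. The function name `snr_db` and the choice of a whole-signal (rather than segmental) SNR are assumptions for illustration.

```python
import math


def snr_db(clean, degraded):
    """Signal-to-noise ratio in dB between a clean signal and its degraded copy.

    Noise is taken to be the sample-wise difference clean - degraded.
    Assumption: whole-signal SNR; the paper may have used a segmental variant.
    """
    if len(clean) != len(degraded) or not clean:
        raise ValueError("signals must be non-empty and of equal length")

    signal_power = sum(s * s for s in clean)
    noise_power = sum((s - d) ** 2 for s, d in zip(clean, degraded))

    if signal_power == 0:
        raise ValueError("clean signal has zero power")
    if noise_power == 0:
        return math.inf  # degraded signal is identical to the clean one
    return 10.0 * math.log10(signal_power / noise_power)
```

Under this definition, the 30 dB figure quoted in the abstract corresponds to a signal power 1,000 times the noise power.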

Subtitle generation using Speech recognition (음성인식기술을 이용한 자막생성 연구)

AHN, Chung Hyun;Jang, In Sun
- Proceedings of the Korean Society of Broadcast Engineers Conference / 2016.06a / pp.48-49 / 2016
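This paper proposes subtitle generation using speech recognition technology, which generates subtitles from video and podcast audio to improve media accessibility for hearing-impaired people. We also evaluated the accuracy of subtitles generated from a reference speech DB and from drama and podcast audio. Subtitles generated from audio were rated somewhat less accurate for historical dramas, but overall accuracy was found to be about 80% or higher.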

From Clarity To Human Voice (명료도에서 사람 목소리로 - TTS에 관하여)

권철홍
- Proceedings of the Acoustical Society of Korea Conference / 1998.06c / pp.139-142 / 1998
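Until now, clarity and naturalness have served as the evaluation criteria for TTS speech synthesis. We think the criteria for evaluating synthesized speech should now be human voice and comprehensibility. Of these two criteria, this paper addresses human voice. To this end, we built a speech DB whose synthesis units are based on the CVC type and supplemented with CV and VC types. As the synthesis algorithm, we propose the PS-RELP algorithm, which preserves timbre and makes pitch modification easy.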

Opinion Evaluation of Optimum Speech Transmission Quality (최적 통화품질에 관한 오피니언 평가)

Gang, Seong-Hun;Gang, Gyeong-Ok;Jang, Dae-Yeong;Gwon, Yun-Ju
- Electronics and Telecommunications Trends / v.6 no.3 / pp.92-100 / 1991
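To set a standard for the speech transmission quality of telecommunication networks, this article reports a series of subjective evaluations of two speech quality degradation factors, loudness rating and sidetone masking rating. It derives the correlation between loudness rating and mean opinion score and between sidetone masking rating and mean opinion score, derives user percentages for speech quality, and describes a speech quality standard based on user opinion.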
https://doi.org/10.22648/ETRI.1991.J.060308

