Search | Korea Science

A Modelling of segmental Duration based on Regression Tree of the Normalized Duration (정규화 지속시간 회귀트리를 기반으로 한 음운지속시가 모델화)

정지혜
- Proceedings of the Acoustical Society of Korea Conference / 1998.06e / pp.278-281 / 1998
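To generate general speech synthesis rules from natural speech by statistical methods, this paper builds a speech database from sentence speech data of 200 sentences uttered by one male and one female speaker, annotated with phoneme segmentation, phoneme labeling, per-phoneme part-of-speech tagging, and grammatical information tagging. Pause durations in this database were analyzed and classified into long and short pauses, and the contexts in which these pauses occur were investigated. To predict segmental duration more precisely, segmental duration was modeled with a regression tree on the normalized duration, which excludes the effect of each phoneme's intrinsic duration, taking the two pause classes (long, short) into account. In the evaluation of the proposed model, the multiple correlation coefficient between predicted and observed values was about 0.82 for the male speaker and 0.84 for the female speaker.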

Assessment of Telephone Speech Transmission Quality by Opinion Test (오피니언 테스트에 의한 전화 음성품질 평가)

Kwon, Yoon-Ju;Jang, Dae-Young;Kang, Kyeong-Ok;Kang, Seong-Hoon
- The Journal of the Acoustical Society of Korea / v.11 no.1 / pp.14-21 / 1992
In order to establish the speech transmission quality of networks, a series of subjective tests was carried out for loudness rating (LR) and sidetone masking rating (STMR) among transmission impairments. From these tests, the relationships of mean opinion score (MOS) with LR and with STMR were obtained. We also obtained the cumulative MOS characteristics, which represent the percentage of scores that subjects voted. These results make it easier to pursue the strategic objective of customer satisfaction for present networks and new services.

Noise Processing for Speech Recognition in the Telephone Line (음성 인식을 위한 전화망에서의 잡음처리)

전원석;신원호;양태영;김원구;윤대희
- The Journal of the Acoustical Society of Korea / v.17 no.1 / pp.4-8 / 1998
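This paper studies methods for improving the performance of a speech recognition system by removing the noise and channel distortion contained in speech data collected over various telephone line channels. As methods for removing channel noise and distortion from speech that has passed through a telephone line, the performance of three signal compensation methods was compared: CMS (Cepstral Mean Subtraction), SBR (Signal Bias Removal), and SM (Stochastic Matching). To evaluate the noise removal methods, speaker-independent isolated word recognition was performed using phoneme-unit semi-continuous HMMs. In the recognition experiments with the mel cepstrum, CMS performed best, followed by SM and then SBR. Recognition performance improved further when cepstral coefficients with weighting functions (RPS, BPL), which make the feature vector robust to ambient noise, were used together with the noise removal methods.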
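As a rough illustration of the CMS idea named in this abstract, the sketch below subtracts the per-coefficient mean over time from a sequence of cepstral frames. It is a minimal textbook-style sketch, not the authors' implementation; the function name `cepstral_mean_subtraction` and the toy data are assumptions made for illustration. It relies on the common modeling assumption that a fixed channel adds a constant offset in the cepstral domain.

```python
def cepstral_mean_subtraction(frames):
    """Subtract the per-coefficient mean over all frames.

    `frames` is a list of equal-length lists, one cepstral vector per frame.
    (Hypothetical helper for illustration; not taken from the paper.)
    """
    n_frames = len(frames)
    n_coeffs = len(frames[0])
    # Mean of each cepstral coefficient across the utterance.
    means = [sum(f[d] for f in frames) / n_frames for d in range(n_coeffs)]
    return [[f[d] - means[d] for d in range(n_coeffs)] for f in frames]


# Toy data (assumed): three frames with two cepstral coefficients each.
clean = [[1.0, 2.0], [3.0, 0.0], [2.0, 4.0]]
# A fixed channel is modeled here as a constant cepstral offset.
channel_offset = [0.5, -1.5]
observed = [[c + o for c, o in zip(f, channel_offset)] for f in clean]

normalized_clean = cepstral_mean_subtraction(clean)
normalized_observed = cepstral_mean_subtraction(observed)
```

Under that assumption, `normalized_observed` matches `normalized_clean`, because the constant offset is absorbed into the mean that is subtracted.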

Complex nested U-Net-based speech enhancement model using a dual-branch decoder (이중 분기 디코더를 사용하는 복소 중첩 U-Net 기반 음성 향상 모델)

Seorim Hwang;Sung Wook Park;Youngcheol Park
- The Journal of the Acoustical Society of Korea / v.43 no.2 / pp.253-259 / 2024
This paper proposes a new speech enhancement model based on a complex nested U-Net with a dual-branch decoder. The proposed model uses a complex nested U-Net to simultaneously estimate the magnitude and phase components of the speech signal, and its decoder has a dual-branch structure that performs spectral mapping and time-frequency masking in its respective branches. Compared to a single-branch decoder, the dual-branch decoder allows noise to be removed effectively while minimizing the loss of speech information. The experiment was conducted on the VoiceBank + DEMAND database, commonly used for training speech enhancement models, and the model was evaluated with various objective metrics. In the experiment, the proposed model increased the Perceptual Evaluation of Speech Quality (PESQ) score by about 0.13 compared to the baseline, and showed higher objective evaluation scores than recently proposed speech enhancement models.
https://doi.org/10.7776/ASK.2024.43.2.253

A Automated Method for Training Keyword Spotter based on Speech Synthesis (키워드 음성인식을 위한 음성합성 기반 자동 학습 기법)

Lim, Jaebong;Lee, Jongsoo;Cho, Yonghun;Baek, Yunju
- Proceedings of the Korea Information Processing Society Conference / 2021.05a / pp.494-496 / 2021
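Lightweight deep-learning-based keyword spotting has recently attracted attention as a technology that makes it easy to add a voice interface to applications such as home appliances, toys, and kiosks. Keyword spotting is a speech recognition technology that recognizes only a limited set of keywords, and it has the advantage of being usable on low-performance devices. Its drawback is that speech data must be collected again for the keywords each application requires, and a new model must be trained on that data. This study therefore proposes a speech-synthesis-based automatic training method that trains a keyword spotting model using only speech generated by speech synthesis, without collecting speech data. Attempts to use generated speech data are being actively pursued, but existing studies are limited in that collected real speech data is still required to maintain accuracy. The proposed method applies a composite data augmentation technique to the generated speech data, raising keyword spotting accuracy without real speech data. The method was evaluated using a Korean keyword dataset collected through a commercial speech synthesis service. In experiments on 20 Korean keywords, the keyword spotting model trained with the proposed method reached an accuracy of 86.44%.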
https://doi.org/10.3745/PKIPS.y2021m05a.494

An automatic pronunciation evaluation system using non-native teacher's speech model (비원어민 교수자 음성모델을 이용한 자동발음평가 시스템)

Park, Hye-bin;Kim, Dong Heon;Joung, Jinoo
- The Journal of the Institute of Internet, Broadcasting and Communication / v.16 no.2 / pp.131-136 / 2016
Appropriate evaluation of a learner's pronunciation is an important part of foreign language education. Learners should be evaluated and receive proper feedback to improve their pronunciation. Because of the cost and consistency problems of human evaluation, automatic pronunciation evaluation systems have been studied. Most current automatic evaluation systems rely on Automatic Speech Recognition (ASR) technology. In this work, we propose evaluating a learner's pronunciation accuracy and fluency at the word level using ASR and a non-native teacher's speech model. Through a performance evaluation of our system, we confirm that the overall evaluation of pronunciation accuracy and fluency reflects the learner's English skill level quite accurately.
https://doi.org/10.7236/JIIBC.2016.16.2.131

Auditory-Perceptual and Acoustic Assessment in Measuring Dysphonia Severity of Vocal Fold Nodules (성대결절 환자의 음성장애에 대한 청지각적 및 음향학적 평가)

Kim, Geun-Hyo;Kwon, Soon-Bok
- The Journal of the Korea Contents Association / v.18 no.1 / pp.108-116 / 2018
The purpose of this study was to investigate differences in, and the relationship between, acoustic measurements (AVQI) and auditory-perceptual assessments (GRBAS, CAPE-V) for normal voices and voices with vocal fold nodules. To this end, a total of 335 voice samples were analyzed acoustically, and three raters performed auditory-perceptual assessments. The AVQI, G, and OS scores of the normal group were lower than those of the vocal fold nodules group. The G scale and the OS scale were highly correlated, and the correlation between the AVQI and the auditory-perceptual results (G and OS) was also high. The threshold values of AVQI, G, and OS for discriminating between the two groups were $\leq 4.06$, $\leq 1$, and $\leq 26$, respectively, and the predictive diagnostic power was 0.840, 0.860, and 0.848. In conclusion, AVQI and auditory-perceptual evaluation can potentially improve the screening of vocal fold nodules and help determine the diagnosis and treatment plan for voice disorders.
https://doi.org/10.5392/JKCA.2018.18.01.108

An Implementation of VoiceXML Test Environment Using IIS (IIS를 이용한 VoiceXML 실험 환경 구현)

Kwon, Hyung-Joon;Kim, Jung-Hyun;Hong, Kwang-Seok
- Proceedings of the Korea Institute of Convergence Signal Processing / 2006.06a / pp.73-76 / 2006
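Speech recognition and synthesis, regarded as one of the key technologies in ubiquitous computing, are the most convenient and universal means of human-computer interaction. Developing applications for such speech-based interaction with the Voice Extensible Markup Language (VoiceXML) has the advantage that applications can be built easily without expert knowledge of speech recognition and synthesis. For this reason, and with the aim of building infrastructure for these technologies and broadening their user base, some Korean companies provide VoiceXML test environments in which voice applications can be built and tested. This paper introduces the existing publicly available test environments and, to offer a wider variety of test environments, proposes and implements a VoiceXML test environment that uses IIS (Internet Information Service) on Windows NT, in contrast to the existing Linux-based environments. As a result, we built an environment in which VoiceXML voice applications using ASP (Active Server Page) and ADO (ActiveX Data Object) can be tested, and a user evaluation confirmed that the proposed method is effective.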

Analysis of the Time Delayed Effect for Speech Feature (음성 특징에 대한 시간 지연 효과 분석)

Ahn, Young-Mok
- The Journal of the Acoustical Society of Korea / v.16 no.1 / pp.100-103 / 1997
In this paper, we analyze the time-delayed effect of speech features. Here, the time-delayed effect means that the current feature vector of speech is influenced by the previous feature vectors. We use a set of LPC-derived cepstral coefficients and evaluate the time-delayed effect of the cepstrum through the performance of a speech recognition system. For the experiments, we used a speech database of 22 words uttered by 50 male speakers. The speech of 25 of the speakers was used for training, and the remainder was used for testing. The experimental results show that the time-delayed effect is large in the lower orders of the feature vector but small in the higher orders.

A Performance of a Remote Speech Input Unit in Speech Recognition System (음성인식 시스템에서의 원격 음성입력기의 성능평가)

Lee, Gwang-seok
- Proceedings of the Korean Institute of Information and Communication Sciences Conference / 2009.10a / pp.723-726 / 2009
In this research, we simulated an error reduction algorithm for speech signals based on microphone-array beamforming in a speech recognition system and analyzed its performance. We also processed the speech signals acquired from the microphone array, obtained the maximum signal-to-noise ratio for each channel, and compared these with the signal-to-noise ratio of the speech signal. The speech recognition rate improved from 54.2% to 61.4% in case 1, and from 41.2% to 50.5% in case 2, which has the lower signal-to-noise ratio. The average reduction rate is therefore 15.7% in case 1.
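The 15.7% figure appears consistent with a relative error reduction computed from the reported recognition rates. The short sketch below works through that arithmetic. The interpretation (error rate taken as 100% minus the recognition rate) and the helper name `relative_error_reduction` are assumptions for illustration, not something the abstract states explicitly.

```python
def relative_error_reduction(rate_before, rate_after):
    """Relative reduction in error rate, in percent.

    Assumes error rate = 100 - recognition rate (both in percent).
    Hypothetical helper for illustration only.
    """
    error_before = 100.0 - rate_before
    error_after = 100.0 - rate_after
    return 100.0 * (error_before - error_after) / error_before


# Recognition rates reported in the abstract.
case1 = relative_error_reduction(54.2, 61.4)  # roughly 15.7
case2 = relative_error_reduction(41.2, 50.5)  # roughly 15.8

print(f"case 1: {case1:.1f}%  case 2: {case2:.1f}%")
```

For case 1 the error rate falls from 45.8% to 38.6%, a drop of 7.2 points, and 7.2 / 45.8 is about 0.157, which matches the reported value under this reading.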

Search Result 1,645, Processing Time 0.027 seconds
