Search | Korea Science

Voice Recognition using a Phoneme based Similarity Algorithm in Home Networks (음소 기반의 유사율 알고리즘을 이용한 Home Network 환경에서의 음성 인식)

Lee, Chang-Sub;Yu, Jae-Bong;Park, Joon-Seok;Yang, Soo-Ho;Kim, Yu-Seop;Park, Chan-Young
- Proceedings of the Korea Information Processing Society Conference / 2005.05a / pp.767-770 / 2005
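Voice data transmitted over a network suffers data loss in transit because of external factors such as noise. When voice data transmitted in this way is passed through a speech recognizer, the recognition rate is lower than when speech is fed to the recognizer directly. To improve the speech recognition rate for home network control, this study proposes a method that takes voice data as input, applies a phoneme-based similarity algorithm to measure its similarity to the words registered in a prebuilt dictionary of home network terms, and controls the home network with the extracted result. Using the phoneme-based similarity algorithm together with multiple utterances at a threshold of 85%, the recognition rate for words matched against the prebuilt dictionary was 100%, and the misrecognition rate for words not in the dictionary fell to 2%.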
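
The dictionary-matching idea in this abstract can be illustrated with a minimal Python sketch. The abstract does not define the similarity measure, so this sketch makes two assumptions: Korean words are compared as sequences of Hangul jamo (obtained with the standard Unicode syllable arithmetic), and `difflib.SequenceMatcher.ratio()` stands in for the paper's similarity rate. The function names and the sample dictionary are hypothetical.

```python
from difflib import SequenceMatcher

# Jamo tables for precomposed Hangul syllables (U+AC00..U+D7A3).
CHOSEONG = "ㄱㄲㄴㄷㄸㄹㅁㅂㅃㅅㅆㅇㅈㅉㅊㅋㅌㅍㅎ"            # 19 initial consonants
JUNGSEONG = "ㅏㅐㅑㅒㅓㅔㅕㅖㅗㅘㅙㅚㅛㅜㅝㅞㅟㅠㅡㅢㅣ"        # 21 vowels
JONGSEONG = [""] + list("ㄱㄲㄳㄴㄵㄶㄷㄹㄺㄻㄼㄽㄾㄿㅀㅁㅂㅄㅅㅆㅇㅈㅊㅋㅌㅍㅎ")  # 28 finals


def to_jamo(word):
    """Split each Hangul syllable into its jamo; pass other characters through."""
    out = []
    for ch in word:
        idx = ord(ch) - 0xAC00
        if 0 <= idx < 11172:
            out.append(CHOSEONG[idx // 588])
            out.append(JUNGSEONG[(idx % 588) // 28])
            final = JONGSEONG[idx % 28]
            if final:
                out.append(final)
        else:
            out.append(ch)
    return out


def similarity(a, b):
    """Assumed stand-in for the paper's similarity rate, in [0, 1]."""
    return SequenceMatcher(None, to_jamo(a), to_jamo(b)).ratio()


def match_command(heard, dictionary, threshold=0.85):
    """Return the closest dictionary word, or None if it scores below the threshold."""
    best = max(dictionary, key=lambda w: similarity(heard, w))
    return best if similarity(heard, best) >= threshold else None


if __name__ == "__main__":
    commands = ["전등", "거실", "텔레비전"]  # hypothetical home network terms
    print(match_command("전등", commands))
    print(match_command("xyz", commands))
```

Comparing at the jamo level gives partial credit when a single vowel or consonant is misrecognized, whereas a syllable-level comparison would count the whole syllable as wrong. The 0.85 default mirrors the 85% threshold reported in the abstract.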

An Implementation of VoiceXML Test Environment Using IIS (IIS를 이용한 VoiceXML 실험 환경 구현)

Kwon, Hyung-Joon;Kim, Jung-Hyun;Hong, Kwang-Seok
- Proceedings of the Korea Institute of Convergence Signal Processing / 2006.06a / pp.73-76 / 2006
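Speech recognition and synthesis, regarded as one of the key technologies in ubiquitous computing, is the most convenient and universal means of human-computer interaction. Using the Voice Extensible Markup Language (VoiceXML) to develop human-computer interaction applications based on speech recognition and synthesis has the advantage that applications can be built easily without expert knowledge of speech recognition and synthesis. For this reason, and with the aim of building infrastructure for these technologies and broadening their user base, some Korean companies provide VoiceXML test environments in which voice applications can be built and tested. This paper introduces the previously released test environments and, to offer a wider range of test environments, proposes and implements a VoiceXML test environment that uses IIS (Internet Information Services) on Windows NT, in contrast to the existing Linux-based environments. As a result, we built an environment in which VoiceXML voice applications using ASP (Active Server Pages) and ADO (ActiveX Data Objects) can be tested, and a user evaluation confirmed that the proposed method is effective.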

Voice Activity Detection Using Global Speech Absence Probability Based on Teager Energy in Noisy Environments (잡음환경에서 Teager Energy 기반의 전역 음성부재확률을 이용하는 음성검출)

Park, Yun-Sik;Lee, Sang-Min
- Journal of the Institute of Electronics Engineers of Korea SP / v.49 no.1 / pp.97-103 / 2012
In this paper, we propose a novel voice activity detection (VAD) algorithm that effectively distinguishes speech from nonspeech in various noisy environments. The global speech absence probability (GSAP), derived from a likelihood ratio (LR) based on a statistical model, is widely used as the feature parameter for VAD. However, the feature parameter based on conventional GSAP is not sufficient to distinguish speech from noise at low signal-to-noise ratios (SNRs). The presented VAD algorithm uses GSAP based on Teager energy (TE) as the feature parameter to improve the detection of speech segments in noisy environments. The proposed VAD algorithm is evaluated with objective tests under various environments and yields better results than conventional methods.
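
For readers unfamiliar with Teager energy, the following Python sketch shows the standard discrete Teager-Kaiser operator, psi[n] = x[n]^2 - x[n-1] * x[n+1]. It only illustrates the operator itself; the abstract does not say how TE is combined with GSAP, so nothing here should be read as the authors' VAD. The function name and test signal are illustrative.

```python
import math


def teager_energy(x):
    """Discrete Teager-Kaiser energy for the interior samples of x.

    psi[n] = x[n]^2 - x[n-1] * x[n+1], defined for n = 1 .. len(x) - 2.
    """
    return [x[n] * x[n] - x[n - 1] * x[n + 1] for n in range(1, len(x) - 1)]


if __name__ == "__main__":
    # For a pure tone A*cos(w*n), the operator is constant: A^2 * sin(w)^2.
    amplitude, omega = 2.0, 0.3
    tone = [amplitude * math.cos(omega * n) for n in range(50)]
    psi = teager_energy(tone)
    print(psi[0], (amplitude * math.sin(omega)) ** 2)
```

Because the output depends on both amplitude and frequency, the operator responds differently to tonal speech components than a plain squared-magnitude energy does, which is the usual motivation for TE-based features.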

Robust speech quality enhancement method against background noise and packet loss at voice-over-IP receiver (배경잡음 및 패킷손실에 강인한 voice-over-IP 수신단 기반 음질향상 기법)

Kim, Gee Yeun;Kim, Hyoung-Gook
- The Journal of the Acoustical Society of Korea / v.37 no.6 / pp.512-517 / 2018
Improving voice quality is a major concern in telecommunications. In this paper, we propose a speech quality enhancement method for the VoIP (Voice-over-IP) receiver that is robust to background noise and packet loss. The proposed method combines network jitter estimation based on a hybrid Markov chain, adaptive playout scheduling using the estimated jitter, and speech enhancement based on restoration of amplitude and phase to enhance the quality of the speech signal arriving at the VoIP receiver over an IP network. The experimental results show that the proposed method removes the background noise added to the speech signal before encoding at the sender side and provides enhanced speech quality in an unstable network environment.
https://doi.org/10.7776/ASK.2018.37.6.512

Attention based multimodal model for Korean speech recognition post-editing (한국어 음성인식 후처리를 위한 주의집중 기반의 멀티모달 모델)

Jeong, Yeong-Seok;Oh, Byoung-Doo;Heo, Tak-Sung;Choi, Jeong-Myeong;Kim, Yu-Seop
- Annual Conference on Human and Language Technology / 2020.10a / pp.145-150 / 2020
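End-to-end neural network models have recently been proposed in the field of speech recognition. These models take speech directly as input and generate the transcribed sentence. Because they take speech directly as input, data quality strongly affects model performance. To address this problem of end-to-end models, this paper proposes a multimodal model for post-editing speech recognition results. The proposed model takes both the speech and the transcribed sentence as input. Features are extracted from each input by an encoder, and the extracted information is passed to the decoder through an attention mechanism. The decoder generates post-edited tokens based on the attention output it receives. We used word error rate to evaluate the post-editing model, and experiments confirmed that the word error rate decreased by 8% compared with the Google Cloud Speech-to-Text model.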
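
The evaluation metric named in this abstract, word error rate (WER), is conventionally computed as the word-level edit distance between a reference and a hypothesis, divided by the number of reference words. The Python sketch below shows that conventional definition under the assumption of whitespace tokenization; the abstract does not describe the authors' tokenization or scoring tool, and the function name is illustrative.

```python
def word_error_rate(reference, hypothesis):
    """WER = (substitutions + deletions + insertions) / number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the reference prefix seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, 1):
        cur = [i]
        for j, hyp_word in enumerate(hyp, 1):
            cur.append(min(
                prev[j] + 1,                           # deletion
                cur[j - 1] + 1,                        # insertion
                prev[j - 1] + (ref_word != hyp_word),  # substitution or match
            ))
        prev = cur
    return prev[-1] / len(ref)


if __name__ == "__main__":
    print(word_error_rate("turn on the light", "turn on light"))  # one deletion in four words
```

Because insertions count as errors, WER can exceed 1.0 when the hypothesis is much longer than the reference.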

A Parallel Speech Recognition System based on Hidden Markov Model (은닉 마코프 모델 기반 병렬음성인식 시스템)

Jeong, Sang-Hwa;Park, Min-Uk
- Journal of KIISE:Computer Systems and Theory / v.27 no.12 / pp.951-959 / 2000
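The parallel speech recognition model in this paper consists of a parallel phoneme recognition module based on the continuous hidden Markov model (HMM) and a parallel sentence recognition module based on a hierarchically structured knowledge base. The parallel phoneme recognition module distributes several thousand HMMs across parallel processors and then handles the output probability computation and the Viterbi algorithm for the assigned HMMs. The knowledge-base-driven parallel sentence recognition module takes the phoneme sequence supplied by the phoneme module and [...] The proposed parallel speech recognition algorithm was implemented on a multi-transputer system with a distributed-memory MIMD architecture and on a Parsytec CC. The experiments showed improved execution time from the parallel phoneme recognition module and an improved recognition rate from the parallel sentence recognition module, and confirmed that a real-time implementation of the parallel speech recognition system is feasible.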
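
The Viterbi step mentioned in this abstract can be sketched in Python as follows. This is a simplified, serial illustration with discrete emission tables; the paper uses continuous HMMs distributed over parallel processors, which this sketch does not attempt to reproduce. All names and the toy model are hypothetical.

```python
import math


def viterbi(obs, states, log_start, log_trans, log_emit):
    """Return the most likely state path and its log-probability."""
    # best[s] is the log-probability of the best path that ends in state s.
    best = {s: log_start[s] + log_emit[s][obs[0]] for s in states}
    back = []  # one back-pointer table per observation after the first
    for o in obs[1:]:
        ptr, nxt = {}, {}
        for s in states:
            prev = max(states, key=lambda p: best[p] + log_trans[p][s])
            ptr[s] = prev
            nxt[s] = best[prev] + log_trans[prev][s] + log_emit[s][o]
        back.append(ptr)
        best = nxt
    # Trace the best path backwards from the best final state.
    last = max(states, key=lambda s: best[s])
    path = [last]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    path.reverse()
    return path, best[last]


if __name__ == "__main__":
    states = ["A", "B"]
    log_start = {s: math.log(0.5) for s in states}
    log_trans = {p: {s: math.log(0.5) for s in states} for p in states}
    log_emit = {
        "A": {"x": math.log(0.9), "y": math.log(0.1)},
        "B": {"x": math.log(0.1), "y": math.log(0.9)},
    }
    print(viterbi(["x", "x", "y"], states, log_start, log_trans, log_emit))
```

Working in log space avoids numerical underflow on long observation sequences. The per-state inner loop is independent across states, which is the kind of work a parallel implementation could divide among processors.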

Design and Implementation of UEEIS(University Entrance Examination Information System) Based on Voice Application of VoiceXML (VoiceXML 음성 애플리케이션에 기반한 입시정보시스템 설계 및 구현)

Ha, Man-Seok;Yoon, Young-Keun;Park, Soo-Hyun
- Korea Society of IT Services: Conference Proceedings / 2002.06a / pp.268-274 / 2002
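Most university entrance examination information systems currently offer both ARS (automatic response system) and web-based services. The drawbacks of existing ARS-based systems are that input is limited to telephone keypad buttons and that the systems are hard to build and maintain. To solve these problems, we introduced a VoiceXML voice application that accepts input through speech recognition as well as keypad buttons. Designing and implementing the entrance examination information system with VoiceXML and a voice application resolved these problems to a considerable extent. In addition, registering related keywords in advance to offer a variety of input options made natural language processing somewhat easier. This improves the extensibility and applicability that are XML's greatest strengths and gives users a much better user interface than the existing system. The system is also easy to integrate with existing web-based services and easier to maintain than the existing system.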

A Technique to Improve the Practicality of SVM-based Speech/Music Classifiers Through Hierarchical Classification (계층구조의 분류를 통한 서포트벡터머신 기반의 음성/음악 분류기의 실용도 향상기법)

Choi, Seokhwan;Cho, Youngok;Cho, Jiu;Lim, Chungsoo;Lee, Yeonwoo;Lee, Seong Ro
- Proceedings of the Korea Information Processing Society Conference / 2012.04a / pp.1033-1034 / 2012
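This paper proposes a technique for making support vector machine (SVM) based speech/music classifiers more practical. These classifiers were proposed for variable-bit-rate codecs that aim to use limited bandwidth efficiently. SVM-based speech/music classifiers classify accurately, but their heavy computational load makes them ill-suited to real-time use. We therefore propose a technique that improves the practicality of SVM-based speech/music classifiers through hierarchical classification.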
https://doi.org/10.3745/PKIPS.y2012m04a.1033

Error Correction for Korean Speech Recognition using a LSTM-based Sequence-to-Sequence Model

Jin, Hye-won;Lee, A-Hyeon;Chae, Ye-Jin;Park, Su-Hyun;Kang, Yu-Jin;Lee, Soowon
- Journal of the Korea Society of Computer and Information / v.26 no.10 / pp.1-7 / 2021
Most research on correcting speech recognition errors is based on English, so research on Korean speech recognition remains insufficient. Compared with English speech recognition, however, Korean speech recognition produces many errors because of linguistic characteristics of Korean such as Korean Fortis and Korean Liaison, so research on Korean speech recognition is needed. Furthermore, earlier works focused primarily on edit distance algorithms and syllable restoration rules, which makes it difficult to correct Fortis and Liaison error types. In this paper, we propose a context-sensitive post-processing model for speech recognition that uses an LSTM-based sequence-to-sequence model with the Bahdanau attention mechanism to correct Korean speech recognition errors caused by pronunciation. Experiments showed that the model improved speech recognition performance from 64% to 77% for Fortis, from 74% to 90% for Liaison, and from 69% to 84% on average. Based on these results, it seems possible to apply the proposed model to real-world applications based on speech recognition.
https://doi.org/10.9708/jksci.2021.26.10.001

Visual Voice Activity Detection and Adaptive Threshold Estimation for Speech Recognition (음성인식기 성능 향상을 위한 영상기반 음성구간 검출 및 적응적 문턱값 추정)

Song, Taeyup;Lee, Kyungsun;Kim, Sung Soo;Lee, Jae-Won;Ko, Hanseok
- The Journal of the Acoustical Society of Korea / v.34 no.4 / pp.321-327 / 2015
In this paper, we propose an algorithm for robust Visual Voice Activity Detection (VVAD) for enhanced speech recognition. Conventional VVAD algorithms detect visual speech frames by measuring the motion of the lip region with optical flow or Chaos-inspired measures. Optical flow-based VVAD is difficult to adopt in driving scenarios because of its computational complexity. The Chaos theory-based VVAD method, while invariant to illumination changes, is sensitive to translations caused by the driver's head movements. The proposed Local Variance Histogram (LVH) is robust to pixel intensity changes from both illumination change and translation. For improved performance under environmental changes, we also adopt a novel threshold estimation using total variance change. In the experimental results, the proposed VVAD algorithm achieves robustness in various driving situations.
https://doi.org/10.7776/ASK.2015.34.4.321

Search Result 2,233, Processing Time 0.033 seconds
