Performance Improvement Methods of a Spoken Chatting System Using SVM

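  • 안혁주 (Program of Computer and Communications Engineering, Kangwon National University) ;
  • 이성희 (Program of Computer and Communications Engineering, Kangwon National University) ;
  • 송영길 (Program of Computer and Communications Engineering, Kangwon National University) ;
  • 김학수 (Program of Computer and Communications Engineering, Kangwon National University)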
  • Received : 2015.01.07
  • Accepted : 2015.04.28
  • Published : 2015.06.30

Abstract

In spoken chatting systems, users' spoken queries are converted to text queries by automatic speech recognition (ASR) engines. If the top-1 result of an ASR engine is incorrect, the error propagates to the spoken chatting system. To improve the top-1 accuracy of ASR engines, we propose a post-processing model that reranks the top-n outputs of an ASR engine using a ranking support vector machine (RankSVM). Separately, training a chatting system requires a large number of chatting sentences. If new chatting sentences are not added to the training data frequently, the system's responses soon become outdated. To resolve this problem, we propose a data collection model that automatically selects chatting sentences from TV and movie scenarios (scripts) using a support vector machine (SVM). In the experiments, the post-processing model achieved 4.4% higher precision and 6.4% higher recall than the baseline model (without post-processing). The data collection model achieved a high precision of 98.95% and a recall of 57.14%.
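The reranking idea behind the post-processing model can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not list the features used, so the two-dimensional feature vectors below (an ASR confidence and a language-model score) are hypothetical, as are the example sentences and the function names `train_pairwise_ranker` and `rerank`. The sketch also swaps a dedicated RankSVM solver for plain subgradient descent on the same pairwise hinge loss, so that it runs with the Python standard library alone.

```python
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def dot(w: Vector, x: Vector) -> float:
    """Inner product of two equal-length vectors."""
    return sum(wi * xi for wi, xi in zip(w, x))


def train_pairwise_ranker(
    queries: List[List[Tuple[Vector, int]]],
    epochs: int = 50,
    lr: float = 0.1,
    reg: float = 0.01,
) -> List[float]:
    """Fit a linear scorer with a pairwise hinge loss (the RankSVM objective).

    Each query is one utterance: a list of (features, relevance) pairs,
    one per ASR hypothesis. Only hypotheses of the same utterance are compared.
    """
    dim = len(queries[0][0][0])
    w = [0.0] * dim
    for _ in range(epochs):
        for candidates in queries:
            for x_good, r_good in candidates:
                for x_bad, r_bad in candidates:
                    if r_good <= r_bad:
                        continue  # keep only (better, worse) pairs
                    diff = [a - b for a, b in zip(x_good, x_bad)]
                    # L2 regularisation: shrink the weights slightly.
                    w = [wi * (1.0 - lr * reg) for wi in w]
                    # Hinge loss: update only if the margin is violated.
                    if dot(w, diff) < 1.0:
                        w = [wi + lr * di for wi, di in zip(w, diff)]
    return w


def rerank(w: Vector, hypotheses: List[Tuple[str, Vector]]) -> List[Tuple[str, Vector]]:
    """Sort (text, features) hypotheses by descending model score."""
    return sorted(hypotheses, key=lambda h: dot(w, h[1]), reverse=True)


# Hypothetical training data: features are [asr_confidence, lm_score];
# relevance 1 marks the correct transcription, 0 an incorrect one.
train_queries = [
    [([0.9, 0.2], 0), ([0.8, 0.9], 1), ([0.5, 0.1], 0)],
    [([0.7, 0.3], 0), ([0.6, 0.8], 1), ([0.4, 0.4], 0)],
]
weights = train_pairwise_ranker(train_queries)

# Hypothetical top-3 list, in the ASR engine's original order.
n_best = [
    ("what time is eat", [0.9, 0.1]),
    ("what time is it", [0.7, 0.9]),
    ("what dime is it", [0.4, 0.2]),
]
reranked = rerank(weights, n_best)
print(reranked[0][0])
```

In this toy setting the ASR engine's own top-1 hypothesis is wrong, and the learned scorer promotes the second hypothesis because the training pairs reward a high language-model score. A real system would use a much richer feature set and a tuned regularisation constant.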
