Decision of the Korean Speech Act using Feature Selection Method

자질 선택 기법을 이용한 한국어 화행 결정

  • Published : 2003.04.01

Abstract

A speech act is the speaker's intention as expressed through an utterance. Identifying it is important for understanding natural language dialogues and for generating responses. This paper proposes a two-stage method that improves the performance of Korean speech act decision. The first stage selects highly informative features from two sources: sentence features extracted using only the morphological analysis (part-of-speech, POS) results of the utterance, and context features extracted from the previous speech acts. In this stage, a morphological analysis system is used to build the full feature set, and the $\chi^2$ statistic (CHI), which has shown high performance for feature selection in text categorization, is used to select features. The second stage determines the speech act using the selected features and a neural network. The proposed method shows that speech acts can be decided automatically using only POS results, and it improves both accuracy and speed by reducing the number of features and keeping the more informative ones. We tested the method on a Korean dialogue corpus transcribed from recordings collected in real domains; the corpus consists of 10,285 utterances and 17 speech acts. We trained the system on 8,349 utterances and tested it on 1,936 utterances, obtaining the correct speech act for 1,709 of them (88.3%). This accuracy is about 8% higher than that obtained without feature selection.
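The first stage described in the abstract, scoring features with the $\chi^2$ statistic and keeping the most informative ones, can be sketched as follows. This is a minimal illustration, not the authors' implementation. The feature strings, the speech act labels, the function names `chi_square_scores` and `select_top_k`, and the choice to score each feature by its maximum $\chi^2$ over all speech acts are assumptions made for the example. The statistic itself is the usual one for a 2x2 contingency table, $\chi^2 = N(AD - CB)^2 / ((A+C)(B+D)(A+B)(C+D))$.

```python
from collections import Counter


def chi_square_scores(utterances, labels):
    """Score each feature by its maximum chi-square value over all speech acts.

    utterances: list of feature lists (one list per utterance)
    labels:     list of speech act labels, aligned with utterances
    """
    n = len(utterances)
    label_count = Counter(labels)
    feat_count = Counter()   # number of utterances containing each feature
    joint = Counter()        # number of utterances with (feature, label)
    for feats, label in zip(utterances, labels):
        for f in set(feats):
            feat_count[f] += 1
            joint[(f, label)] += 1

    scores = {}
    for f, n_f in feat_count.items():
        best = 0.0
        for act, n_act in label_count.items():
            a = joint[(f, act)]    # has feature, has this act
            b = n_f - a            # has feature, other act
            c = n_act - a          # lacks feature, has this act
            d = n - a - b - c      # lacks feature, other act
            denom = (a + c) * (b + d) * (a + b) * (c + d)
            if denom:              # skip degenerate tables
                best = max(best, n * (a * d - c * b) ** 2 / denom)
        scores[f] = best
    return scores


def select_top_k(scores, k):
    """Return the k highest-scoring features (ties broken alphabetically)."""
    return sorted(scores, key=lambda f: (-scores[f], f))[:k]


# Hypothetical toy data: POS-style sentence features plus a context
# feature naming the previous speech act. Not taken from the paper's corpus.
utterances = [
    ["NOUN", "Q_END", "prev=greeting"],
    ["NOUN", "Q_END", "prev=inform"],
    ["NOUN", "D_END", "prev=ask-ref"],
    ["NOUN", "D_END", "prev=ask-ref"],
]
labels = ["ask-ref", "ask-ref", "inform", "inform"]

scores = chi_square_scores(utterances, labels)
selected = select_top_k(scores, 3)
print(selected)
```

In this toy data the feature `NOUN` occurs in every utterance, so it carries no information about the speech act and receives a score of 0, while the sentence-ending features separate the two acts perfectly and rank highest. Only the selected features would then be passed to the second-stage classifier.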
