On-Line Topic Segmentation Using Convolutional Neural Networks

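  • 이경호 (Department of Electronics, Radio and Information Communications Engineering, Chungnam National University)
  • 이공주 (Department of Radio and Information Communications Engineering, Chungnam National University)
  • Received: 2016.10.05
  • Reviewed: 2016.10.13
  • Published: 2016.11.30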

Abstract

Topic segmentation divides a text or conversation into units that each cover a single topic. Until now, work on topic segmentation has mostly sought an optimal set of segments for a complete document considered as a whole. Some applications, however, need to segment a text or conversation while it is still in progress. In this paper, we propose a supervised learning model based on a convolutional neural network that performs topic segmentation as the sentences arrive. To evaluate the proposed model, we conducted one experiment that assumes an on-line setting and another, off-line experiment that combines the model with the existing C99 algorithm. The two experiments yielded Pk scores of 17.8 and 11.95, respectively, which indicates that the proposed model can be applied to topic segmentation in on-line settings.
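The Pk scores quoted above are error rates, so lower is better. As a rough illustration of how such a score is usually computed, the sketch below slides a window of width k over the text and counts how often the reference and the hypothesis disagree on whether the two window ends lie in the same segment. The function name `pk`, the segment-length input format, and the default choice of k (half the mean reference segment length) are assumptions made for this example; the abstract does not state the exact evaluation settings, and the result here is a fraction, so 17.8 would correspond to 0.178.

```python
def pk(reference, hypothesis, k=None):
    """Pk error between two segmentations of the same text.

    Both arguments are lists of segment lengths in sentences,
    e.g. [5, 5] means two segments of five sentences each.
    (This input format is an assumption for the illustration.)
    """
    def labels(seg_lengths):
        # Expand segment lengths into one segment id per sentence.
        out = []
        for seg_id, n in enumerate(seg_lengths):
            out.extend([seg_id] * n)
        return out

    ref = labels(reference)
    hyp = labels(hypothesis)
    if len(ref) != len(hyp):
        raise ValueError("segmentations must cover the same number of sentences")

    if k is None:
        # Common convention: half the average reference segment length.
        k = max(1, round(len(ref) / (2 * len(reference))))

    n_windows = len(ref) - k
    if n_windows <= 0:
        raise ValueError("text is too short for the chosen window size")

    errors = 0
    for i in range(n_windows):
        same_in_ref = ref[i] == ref[i + k]
        same_in_hyp = hyp[i] == hyp[i + k]
        # An error is any window where the two segmentations disagree.
        if same_in_ref != same_in_hyp:
            errors += 1
    return errors / n_windows


if __name__ == "__main__":
    # A hypothesis that misses the only boundary is penalised
    # in every window that straddles that boundary.
    print(pk([5, 5], [10], k=2))
    print(pk([5, 5], [5, 5], k=2))
```

Passing k explicitly, as in the usage lines, keeps scores comparable across systems that are evaluated against the same reference segmentation.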
