Automatic Meeting Summary System using Enhanced TextRank Algorithm

  • Bae, Young-Jun (Department of Computer Software Engineering, Kumoh National Institute of Technology) ;
  • Jang, Ho-Taek (Department of Computer Software Engineering, Kumoh National Institute of Technology) ;
  • Hong, Tae-Won (Department of Computer Software Engineering, Kumoh National Institute of Technology) ;
  • Lee, Hae-Yeoun (Department of Computer Software Engineering, Kumoh National Institute of Technology)
  • Received: 2018.05.31
  • Reviewed: 2018.06.25
  • Published: 2018.10.30

Abstract

Organizing and documenting the content of meetings and discussions is important across many kinds of work, yet until now this has been done by hand. This paper describes the development of a system that automatically generates meeting minutes using the TextRank algorithm. The proposed system records every utterance of each speaker in real time and computes the similarity between sentences based on occurrence frequency. It then generates the minutes by extracting important words or sentences with an unsupervised learning algorithm that finds the relations between sentences in the document data. In particular, we sought to improve performance by introducing a keyword weight-adjustment technique into the TextRank algorithm, which adapts the PageRank algorithm to words and sentences.
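
The pipeline outlined in the abstract (sentence similarity, graph ranking, keyword weighting) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the similarity function or the weighting scheme, so the sketch assumes the word-overlap similarity of the original TextRank formulation and a hypothetical `keyword_weights` dictionary that makes selected words count more than 1.0 in the overlap. All function names are illustrative.

```python
import math
import re


def tokenize(sentence):
    """Lowercase a sentence and split it into word tokens."""
    return re.findall(r"\w+", sentence.lower())


def similarity(tokens_a, tokens_b, keyword_weights):
    """Weighted word overlap, normalised by log sentence lengths.

    Assumption: each shared word contributes keyword_weights[word]
    (default 1.0) instead of a flat count of 1.
    """
    if len(tokens_a) < 2 or len(tokens_b) < 2:
        return 0.0  # keeps the log-length denominator strictly positive
    shared = set(tokens_a) & set(tokens_b)
    overlap = sum(keyword_weights.get(word, 1.0) for word in shared)
    return overlap / (math.log(len(tokens_a)) + math.log(len(tokens_b)))


def textrank(sentences, keyword_weights=None, damping=0.85, iterations=50):
    """Score sentences with a PageRank-style iteration on a weighted graph."""
    keyword_weights = keyword_weights or {}
    tokens = [tokenize(s) for s in sentences]
    n = len(sentences)

    # Symmetric edge weights between every pair of sentences.
    weights = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            w = similarity(tokens[i], tokens[j], keyword_weights)
            weights[i][j] = weights[j][i] = w

    out_sum = [sum(row) for row in weights]
    scores = [1.0] * n
    for _ in range(iterations):
        scores = [
            (1 - damping) + damping * sum(
                weights[j][i] / out_sum[j] * scores[j]
                for j in range(n) if out_sum[j] > 0
            )
            for i in range(n)
        ]
    return scores


def summarize(sentences, top_k=2, keyword_weights=None):
    """Return the top_k highest-scoring sentences in their original order."""
    scores = textrank(sentences, keyword_weights)
    ranked = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)
    return [sentences[i] for i in sorted(ranked[:top_k])]


if __name__ == "__main__":
    utterances = [
        "The budget review is scheduled for Monday morning.",
        "The team will finalize the budget before the review.",
        "Lunch was served at noon.",
    ]
    for line in summarize(utterances, top_k=2, keyword_weights={"budget": 3.0}):
        print(line)
```

In this sketch, raising the weight of a word such as `budget` strengthens the edges between sentences that share it, which in turn raises their rank. How the paper actually selects and adjusts keyword weights is not stated in the abstract.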
