http://dx.doi.org/10.30693/SMJ.2022.11.3.31

Improving the effectiveness of document extraction summary based on the amount of sentence information  

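Kim, Eun Hee (Department of Computer Engineering, Chosun University)
Lim, Myung Jin (Department of Computer Engineering, Chosun University)
Shin, Ju Hyun (Department of New Industry Convergence, Chosun University)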
Publication Information
Smart Media Journal, vol. 11, no. 3, 2022, pp. 31-38
Abstract
Research on extractive document summarization has proposed various methods for selecting important sentences based on the relationships between sentences. In Korean document summarization using the summation similarity of sentences, the summation similarity of each sentence was regarded as its amount of information, and the summary was produced by selecting important sentences on that basis. However, that approach does not take into account the differing degrees of importance with which each sentence contributes to the document as a whole. In this study, we therefore propose an extractive summarization method that selects important sentences based on both the quantitative and the semantic information in each sentence. In our experiments, the extracted-sentence agreement was 58.56% and the ROUGE-L score was 34, which is better than the method using only summation similarity. The extractive method is lighter than deep learning-based methods, yet its performance is similar. These results confirm that compressing information based on the semantic similarity between sentences is an important approach in extractive summarization. In addition, the quickly extracted summary can serve as the basis for effectively performing a subsequent abstractive (generative) summarization step.
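As a rough illustration of the idea in the abstract, the sketch below scores each sentence by combining a quantitative term (the sum of its TF-IDF weights) with a semantic term (the sum of its cosine similarities to every other sentence) and keeps the top-scoring sentences. The abstract does not give the actual formula, so everything here is an assumption for illustration: the function names, the whitespace tokenizer, the max-normalization, and the mixing weight `alpha` are hypothetical, and TF-IDF vectors stand in for sentence embeddings unless an `embed` function is supplied.

```python
import math
from collections import Counter


def tfidf_vectors(sentences):
    """Dense TF-IDF vectors, treating each sentence as one 'document'."""
    tokenized = [s.lower().split() for s in sentences]
    n = len(tokenized)
    df = Counter(t for tokens in tokenized for t in set(tokens))
    vocab = sorted(df)
    index = {t: i for i, t in enumerate(vocab)}
    # Smoothed IDF (an assumption; other IDF variants would also work).
    idf = {t: math.log((1 + n) / (1 + df[t])) + 1.0 for t in vocab}
    vectors = []
    for tokens in tokenized:
        vec = [0.0] * len(vocab)
        for term, count in Counter(tokens).items():
            vec[index[term]] = count * idf[term]
        vectors.append(vec)
    return vectors


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def normalize(values):
    """Scale values into [0, 1] by dividing by the maximum."""
    top = max(values, default=0.0)
    return [x / top if top else 0.0 for x in values]


def summarize(sentences, k=2, alpha=0.5, embed=None):
    """Return the k highest-scoring sentences in their original order.

    alpha weights the quantitative score against the semantic score
    (hypothetical parameter, not taken from the paper). embed, if given,
    maps a sentence to a list of floats, e.g. a sentence-embedding model.
    """
    n = len(sentences)
    tfidf = tfidf_vectors(sentences)
    # Quantitative information: total TF-IDF weight carried by the sentence.
    quantity = [sum(vec) for vec in tfidf]
    # Semantic information: summed similarity to all other sentences.
    vecs = [embed(s) for s in sentences] if embed else tfidf
    semantic = [
        sum(cosine(vecs[i], vecs[j]) for j in range(n) if j != i)
        for i in range(n)
    ]
    q, s = normalize(quantity), normalize(semantic)
    scores = [alpha * a + (1 - alpha) * b for a, b in zip(q, s)]
    top = sorted(range(n), key=lambda i: scores[i], reverse=True)[:k]
    return [sentences[i] for i in sorted(top)]
```

Passing a real sentence encoder as `embed` would make the second term genuinely semantic; with the default TF-IDF fallback it only measures lexical overlap.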
Keywords
extractive summary; TF-IDF; sentence embedding; sentence similarity
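The two evaluation figures quoted in the abstract, extracted-sentence agreement and ROUGE-L, can be approximated as in the following sketch. It assumes ROUGE-L is the balanced F-measure over the longest common subsequence of whitespace tokens, and that agreement is the fraction of reference sentences that the system also selected; the paper's exact definitions may differ, and the function names are hypothetical.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0]
        for j, y in enumerate(b, 1):
            cur.append(prev[j - 1] + 1 if x == y else max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


def rouge_l(candidate, reference):
    """Balanced ROUGE-L F-measure on whitespace tokens (assumed variant)."""
    c, r = candidate.split(), reference.split()
    lcs = lcs_length(c, r)
    if lcs == 0:
        return 0.0
    precision, recall = lcs / len(c), lcs / len(r)
    return 2 * precision * recall / (precision + recall)


def sentence_agreement(extracted, gold):
    """Fraction of gold sentence indices that were also extracted."""
    if not gold:
        return 0.0
    return len(set(extracted) & set(gold)) / len(gold)
```

Multiplying either value by 100 gives a figure on the same scale as the percentages reported above.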