• Title/Summary/Keyword: Document Summary

Analysis and Comparison of Query focused Korean Document Summarization using Word Embedding (워드 임베딩을 이용한 질의 기반 한국어 문서 요약 분석 및 비교)

  • Heu, Jee-Uk
    • The Journal of the Institute of Internet, Broadcasting and Communication / v.19 no.6 / pp.161-167 / 2019
  • Recently, the amount of information being created has risen rapidly with the spread of state-of-the-art technology and the development of various ICT-based web services. As a result, users must spend a great deal of time and effort to find the information they need within this mass of information. Document summarization is a technique that efficiently produces a summary of a given document by analyzing it and extracting its key sentences and words. However, it is hard to apply existing word embedding techniques to documents written in Korean because of the characteristics of the language. In this paper, we propose a new query-focused Korean document summarization method that exploits word embedding techniques such as Word2Vec and FastText, and we compare the performance of the two.
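The query-focused idea in this abstract can be sketched roughly as follows: represent the query and each sentence as the average of their word vectors, then rank sentences by cosine similarity to the query. This is only an illustrative sketch, not the authors' method. The vectors in `TOY_VECTORS` are made-up stand-ins for embeddings that a trained Word2Vec or FastText model would supply, and whitespace tokenization is an assumption (Korean text would normally need morphological analysis first).

```python
import math

# Hypothetical 3-dimensional word vectors, invented for illustration only.
# In practice these would come from a trained Word2Vec or FastText model.
TOY_VECTORS = {
    "summary": [0.9, 0.1, 0.0],
    "document": [0.8, 0.2, 0.1],
    "embedding": [0.1, 0.9, 0.2],
    "word": [0.2, 0.8, 0.1],
    "weather": [0.0, 0.1, 0.9],
    "rain": [0.1, 0.0, 0.8],
}


def mean_vector(tokens):
    """Average the vectors of known tokens; return None if none are known."""
    known = [TOY_VECTORS[t] for t in tokens if t in TOY_VECTORS]
    if not known:
        return None
    dim = len(known[0])
    return [sum(v[i] for v in known) / len(known) for i in range(dim)]


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def rank_sentences(query, sentences):
    """Return sentences sorted by similarity to the query, most similar first."""
    q = mean_vector(query.lower().split())
    scored = []
    for sentence in sentences:
        v = mean_vector(sentence.lower().split())
        score = cosine(q, v) if q is not None and v is not None else 0.0
        scored.append((score, sentence))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [sentence for _, sentence in scored]


ranked = rank_sentences(
    "summary document",
    ["rain weather", "document summary", "word embedding"],
)
```

A summary would then keep the top few entries of `ranked`; how many to keep is a design choice the abstract does not specify.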

A Document Summarization System Using Dynamic Connection Graph (동적 연결 그래프를 이용한 자동 문서 요약 시스템)

  • Song, Won-Moon;Kim, Young-Jin;Kim, Eun-Ju;Kim, Myung-Won
    • Journal of KIISE:Software and Applications / v.36 no.1 / pp.62-69 / 2009
  • The purpose of document summarization is to make documents produced by various application programs quick and easy to understand by extracting summary information from them. In this paper, we propose a document summarization method that builds and analyzes a connection graph representing the similarity between the keyword lists of the sentences in a document, taking into account the mean sentence length (the number of keywords) of the document. We implemented a system that automatically generates a summary from a document using the proposed method. To evaluate the method, we used a set of 20 documents paired with their correct summaries and measured precision, recall, and F-measure. The experimental results show that the proposed method is more efficient than existing methods.
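The evaluation measures named in this abstract can be illustrated with a minimal sketch for extractive summaries. It assumes that a summary is represented as a set of sentence indices and that the balanced F-measure (F1) is meant; the abstract does not state either detail, and the function name is hypothetical.

```python
def precision_recall_f1(selected, reference):
    """Compare system-selected sentence indices with a reference summary.

    precision = |selected ∩ reference| / |selected|
    recall    = |selected ∩ reference| / |reference|
    F1        = harmonic mean of precision and recall
    """
    selected, reference = set(selected), set(reference)
    overlap = len(selected & reference)
    precision = overlap / len(selected) if selected else 0.0
    recall = overlap / len(reference) if reference else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# The system picked sentences 0, 2 and 5; the reference summary has 0, 1, 2, 3.
p, r, f = precision_recall_f1({0, 2, 5}, {0, 1, 2, 3})
```

Here two of the three selected sentences are correct, so precision is 2/3, recall is 2/4, and F1 is 4/7.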

User-based Document Summarization using Non-negative Matrix Factorization and Wikipedia (비음수행렬분해와 위키피디아를 이용한 사용자기반의 문서요약)

  • Park, Sun;Jeong, Min-A;Lee, Seong-Ro
    • Journal of the Institute of Electronics Engineers of Korea SP / v.49 no.2 / pp.53-60 / 2012
  • In this paper, we propose a new document summarization method that uses a query expanded with Wikipedia and semantic features representing the inherent structure of a document set. The proposed method expands the user's initial query through relevance feedback based on Wikipedia in order to reflect the user's requirements. It represents the inherent structure of the documents well using semantic features obtained by non-negative matrix factorization (NMF). In addition, by extracting meaningful sentences with the expanded query and the semantic features, it reduces the semantic gap between the user's requirements and the summarization result. The experimental results demonstrate that the proposed method achieves better performance than other document summarization methods.

A Proposal on Analyzing Operational Mission Summary/Mission Profile and RAM Goal Setting from Operational Concepts on the Next-MILSATCOM (차기 군 위성통신체계 OMS/MP 분석 및 운용개념으로부터의 RAM 목표값 산출 제안)

  • Park, Heung-Soon;Kwon, Tae-Wook;Lee, Chul-Hwa;Park, Dae-Hyun
    • Journal of the Korea Institute of Military Science and Technology / v.16 no.3 / pp.295-303 / 2013
  • The Operational Mode Summary/Mission Profile (OMS/MP) is a document that describes how a system or training device will be used in wartime and/or peacetime at the time it is fielded, with a focus on the future. The OMS/MP is also typically used for RAM goal setting in the early phase of weapon system development. This paper provides the OMS/MP and RAM goal for the Next-MILSATCOM, the military satellite system that will follow ANASIS. We propose operational concepts, a user-side OMS/MP model, and a RAM goal.

An Efficient Machine Learning-based Text Summarization in the Malayalam Language

  • P Haroon, Rosna;Gafur M, Abdul;Nisha U, Barakkath
    • KSII Transactions on Internet and Information Systems (TIIS) / v.16 no.6 / pp.1778-1799 / 2022
  • Automatic text summarization is a procedure that condenses a large body of content into a shorter text that retains the significant information. Malayalam is one of the most difficult languages spoken in certain regions of India, most commonly in Kerala and Lakshadweep. Natural language processing work on Malayalam is relatively limited because of the complexity of the language and the scarcity of available resources. In this paper, an approach to text summarization of Malayalam documents is proposed that trains a model based on the Support Vector Machine classification algorithm. Different features of the text are taken into account in training so that the system can output the most important content of the input text. The classifier sorts sentences into separate classes (most important, important, average, and least significant), and based on this classification the system creates a summary of the input document. The user can select a compression ratio so that the system outputs a summary of the corresponding length. Model performance is measured using Malayalam documents of different genres as well as documents from the same domain. The model is evaluated with the content evaluation measures precision, recall, F-score, and relative utility. The precision and recall values obtained show that the model is reliable and produces more relevant summaries than the other summarizers.

The Information Filtering Agent System with a Customized Document Summary (사용자 맞춤의 문서 요약을 제공하는 정보 여과 에이전트 시스템)

  • 조영희;김교정
    • Proceedings of the Korea Intelligent Information System Society Conference / 2000.04a / pp.377-386 / 2000
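  • In the current state of information overload, tools are badly needed that help users with their requests for relevant information amid large volumes of data and shield them from unnecessary information. Information retrieval systems such as web search engines, the most widely used of these tools, have two drawbacks: the user must choose suitable search terms, and no efficient summary of the results is provided. To compensate for these shortcomings of search engines and to protect users from unnecessary information under information overload, this paper proposes an information filtering agent that delivers information together with a personalized summary based on the user's profile: "The Information Filtering Agent System with a Customized Document Summary."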

Automatic Video Management System Using Face Recognition and MPEG-7 Visual Descriptors

  • Lee, Jae-Ho
    • ETRI Journal / v.27 no.6 / pp.806-809 / 2005
  • The main goal of this research is automatic video analysis using a face recognition technique. In this paper, an automatic video management system is introduced that provides a variety of functions, such as indexing, editing, summarizing, and retrieving multimedia data. The automatic management tool uses MPEG-7 visual descriptors to generate a video index for creating a summary. The resulting index generates a preview of a movie and allows non-linear access through thumbnails. In addition, the index supports searching saved video sequences for shots similar to a desired one. Moreover, a face recognition technique is used for person-based video summarization and indexing of stored video data.

Context-based Annotation for Pen Input Device Environment (펜 입력 장치 환경을 고려한 컨텍스트 기반 Annotation)

  • 김재경;손원성;임순범;최윤철
    • Journal of KIISE:Software and Applications / v.30 no.5_6 / pp.559-569 / 2003
  • Annotation is used to record personal opinions, explanations, and summaries. Various methods for processing annotations efficiently in digital document environments are being studied. However, previous studies placed much emphasis on the function of annotation, so they either did not support an intuitive paper-like input interface or, where they did support one, still suffered from low reusability because the relation between an annotation and the original document was not explicit. In this study, we therefore define a context-based annotation model for digital document environments and propose an annotation interface based on that model. To design the annotation model, we define annotation types, the context information of a document, and the relationship between an annotation and the original document. We also implement a system based on the model that supports pen-based annotation and an annotation DTD. As a result, unlike previous studies, it is possible to explicitly define context-based annotation in pen-based input environments. We present various functions that use the model and various possible applications.

Performance Improvement of Topic Modeling using BART based Document Summarization (BART 기반 문서 요약을 통한 토픽 모델링 성능 향상)

  • Eun Su Kim;Hyun Yoo;Kyungyong Chung
    • Journal of Internet Computing and Services / v.25 no.3 / pp.27-33 / 2024
  • The environment of academic research is continuously changing as the volume of information grows, which raises the need for an effective way to analyze and organize large numbers of documents. In this paper, we propose a method for improving topic modeling performance using document summarization based on BART (Bidirectional and Auto-Regressive Transformers). The proposed method uses a BART-based document summarization model to extract the core content and improves the performance of topic modeling with the LDA (Latent Dirichlet Allocation) algorithm. We suggest an approach to improving the performance and efficiency of LDA topic modeling through document summarization and validate it through experiments. The experimental results show that the BART-based model for summarizing article data captures the important information of the original articles, with F1 scores of 0.5819, 0.4384, and 0.5038 in ROUGE-1, ROUGE-2, and ROUGE-L evaluations, respectively. In addition, topic modeling on the summarized documents performs about 8.08% better than topic modeling on the full text when compared using the perplexity metric. This contributes to reducing data throughput and improving efficiency in the topic modeling process.
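To make the ROUGE-1 F1 score reported in this abstract concrete, here is a simplified sketch of how such a score is computed from unigram overlap. It is not the official ROUGE toolkit and not the authors' evaluation code: it assumes lowercased whitespace tokenization and omits stemming and other options, and Korean text would typically need a proper tokenizer first.

```python
from collections import Counter


def rouge1_f1(candidate, reference):
    """Simplified ROUGE-1 F1: clipped unigram overlap between two texts."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    # Counter intersection keeps the minimum count of each shared token,
    # so a word repeated in the candidate is not credited more often
    # than it appears in the reference.
    overlap = sum((cand & ref).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


score = rouge1_f1("the cat sat", "the cat sat on the mat")
```

In this example all three candidate tokens appear in the six-token reference, so precision is 1, recall is 1/2, and the F1 score is 2/3. ROUGE-2 follows the same pattern with bigrams, and ROUGE-L uses the longest common subsequence instead.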

Automatic Meeting Summary System using Enhanced TextRank Algorithm (향상된 TextRank 알고리즘을 이용한 자동 회의록 생성 시스템)

  • Bae, Young-Jun;Jang, Ho-Taek;Hong, Tae-Won;Lee, Hae-Yeoun
    • The Journal of Korea Institute of Information, Electronics, and Communication Technology / v.11 no.5 / pp.467-474 / 2018
  • Organizing and documenting the contents of meetings and discussions is very important in many tasks. In the past, however, people had to organize the contents manually. In this paper, we describe the development of a system that generates meeting minutes automatically using the TextRank algorithm. The proposed system records all of the speaker's utterances in real time and calculates similarity based on the appearance frequency of the sentences. Then, to create the meeting minutes, it extracts important words or phrases through an unsupervised learning algorithm that finds the relations between the sentences in the document data. In particular, we improved performance by introducing a keyword weighting technique into the TextRank algorithm, which adapts the PageRank algorithm to words and sentences.
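The basic TextRank idea mentioned in this abstract, PageRank applied to a graph of sentences, can be sketched as below. This is a plain baseline under stated assumptions, not the enhanced system described: the paper's keyword weighting is not reproduced because the abstract gives no details, and Jaccard word overlap is used here as an assumed similarity measure in place of whatever the authors used.

```python
def similarity(a, b):
    """Jaccard overlap between the word sets of two sentences (an assumption)."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    union = wa | wb
    return len(wa & wb) / len(union) if union else 0.0


def textrank(sentences, damping=0.85, iterations=50):
    """Score sentences by weighted PageRank over a sentence similarity graph."""
    n = len(sentences)
    if n == 0:
        return []
    # Edge weights between every pair of distinct sentences.
    weights = [
        [similarity(sentences[i], sentences[j]) if i != j else 0.0 for j in range(n)]
        for i in range(n)
    ]
    out_sum = [sum(row) for row in weights]
    scores = [1.0 / n] * n
    for _ in range(iterations):
        new_scores = []
        for i in range(n):
            # Each neighbour j passes on its score in proportion to the
            # share of its total edge weight that points at sentence i.
            incoming = sum(
                weights[j][i] / out_sum[j] * scores[j]
                for j in range(n)
                if out_sum[j] > 0
            )
            new_scores.append((1 - damping) / n + damping * incoming)
        scores = new_scores
    return scores


utterances = [
    "the budget was approved",
    "the budget review was approved today",
    "lunch is at noon",
]
scores = textrank(utterances)
# Keep the highest-scoring utterance as a one-line summary.
top = max(range(len(utterances)), key=lambda i: scores[i])
```

The third utterance shares no words with the others, so it receives only the baseline term, while the two budget-related utterances reinforce each other and score higher.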