• Title/Summary/Keyword: summarization

Search Result 375, Processing Time 0.029 seconds

Automatic Extraction Techniques of Topic-relevant Visual Shots Using Realtime Brainwave Responses (실시간 뇌파반응을 이용한 주제관련 영상물 쇼트 자동추출기법 개발연구)

  • Kim, Yong Ho;Kim, Hyun Hee
    • Journal of Korea Multimedia Society
    • /
    • v.19 no.8
    • /
    • pp.1260-1274
    • /
    • 2016
  • To obtain good summarization algorithms, we need first understand how people summarize videos. 'Semantic gap' refers to the gap between semantics implied in video summarization algorithms and what people actually infer from watching videos. We hypothesized that ERP responses to real time videos will show either N400 effects to topic-irrelevant shots in the 300∼500ms time-range after stimulus on-set or P600 effects to topic-relevant shots in the 500∼700ms time range. We recruited 32 participants in the EEG experiment, asking them to focus on the topic of short videos and to memorize relevant shots to the topic of the video. After analysing real time videos based on the participants' rating information, we obtained the following t-test result, showing N400 effects on PF1, F7, F3, C3, Cz, T7, and FT7 positions on the left and central hemisphere, and P600 effects on PF1, C3, Cz, and FCz on the left and central hemisphere and C4, FC4, P8, and TP8 on the right. A further 3-way MANOVA test with repeated measures of topic-relevance, hemisphere, and electrode positions showed significant interaction effects, implying that the left hemisphere at central, frontal, and pre-frontal positions were sensitive in detecting topic-relevant shots while watching real time videos.

Multi-Document Summarization Method of Reviews Using Word Embedding Clustering (워드 임베딩 클러스터링을 활용한 리뷰 다중문서 요약기법)

  • Lee, Pil Won;Hwang, Yun Young;Choi, Jong Seok;Shin, Young Tae
    • KIPS Transactions on Software and Data Engineering
    • /
    • v.10 no.11
    • /
    • pp.535-540
    • /
    • 2021
  • Multi-document refers to a document consisting of various topics, not a single topic, and a typical example is online reviews. There have been several attempts to summarize online reviews because of their vast amounts of information. However, collective summarization of reviews through existing summary models creates a problem of losing the various topics that make up the reviews. Therefore, in this paper, we present method to summarize the review with minimal loss of the topic. The proposed method classify reviews through processes such as preprocessing, importance evaluation, embedding substitution using BERT, and embedding clustering. Furthermore, the classified sentences generate the final summary using the trained Transformer summary model. The performance evaluation of the proposed model was compared by evaluating the existing summary model, seq2seq model, and the cosine similarity with the ROUGE score, and performed a high performance summary compared to the existing summary model.

Construction of Text Summarization Corpus in Economics Domain and Baseline Models

  • Sawittree Jumpathong;Akkharawoot Takhom;Prachya Boonkwan;Vipas Sutantayawalee;Peerachet Porkaew;Sitthaa Phaholphinyo;Charun Phrombut;Khemarath Choke-mangmi;Saran Yamasathien;Nattachai Tretasayuth;Kasidis Kanwatchara;Atiwat Aiemleuk;Thepchai Supnithi
    • Journal of information and communication convergence engineering
    • /
    • v.22 no.1
    • /
    • pp.33-43
    • /
    • 2024
  • Automated text summarization (ATS) systems rely on language resources as datasets. However, creating these datasets is a complex and labor-intensive task requiring linguists to extensively annotate the data. Consequently, certain public datasets for ATS, particularly in languages such as Thai, are not as readily available as those for the more popular languages. The primary objective of the ATS approach is to condense large volumes of text into shorter summaries, thereby reducing the time required to extract information from extensive textual data. Owing to the challenges involved in preparing language resources, publicly accessible datasets for Thai ATS are relatively scarce compared to those for widely used languages. The goal is to produce concise summaries and accelerate the information extraction process using vast amounts of textual input. This study introduced ThEconSum, an ATS architecture specifically designed for Thai language, using economy-related data. An evaluation of this research revealed the significant remaining tasks and limitations of the Thai language.

Automatic Text Summarization based on Selective Copy mechanism against for Addressing OOV (미등록 어휘에 대한 선택적 복사를 적용한 문서 자동요약)

  • Lee, Tae-Seok;Seon, Choong-Nyoung;Jung, Youngim;Kang, Seung-Shik
    • Smart Media Journal
    • /
    • v.8 no.2
    • /
    • pp.58-65
    • /
    • 2019
  • Automatic text summarization is a process of shortening a text document by either extraction or abstraction. The abstraction approach inspired by deep learning methods scaling to a large amount of document is applied in recent work. Abstractive text summarization involves utilizing pre-generated word embedding information. Low-frequent but salient words such as terminologies are seldom included to dictionaries, that are so called, out-of-vocabulary(OOV) problems. OOV deteriorates the performance of Encoder-Decoder model in neural network. In order to address OOV words in abstractive text summarization, we propose a copy mechanism to facilitate copying new words in the target document and generating summary sentences. Different from the previous studies, the proposed approach combines accurate pointing information and selective copy mechanism based on bidirectional RNN and bidirectional LSTM. In addition, neural network gate model to estimate the generation probability and the loss function to optimize the entire abstraction model has been applied. The dataset has been constructed from the collection of abstractions and titles of journal articles. Experimental results demonstrate that both ROUGE-1 (based on word recall) and ROUGE-L (employed longest common subsequence) of the proposed Encoding-Decoding model have been improved to 47.01 and 29.55, respectively.

Question and Answering System through Search Result Summarization of Q&A Documents (Q&A 문서의 검색 결과 요약을 활용한 질의응답 시스템)

  • Yoo, Dong Hyun;Lee, Hyun Ah
    • KIPS Transactions on Software and Data Engineering
    • /
    • v.3 no.4
    • /
    • pp.149-154
    • /
    • 2014
  • A user should pick up relevant answers by himself from various search results when using user participation question answering community like Knowledge-iN. If refined answers are automatically provided, usability of question answering community must be improved. This paper divides questions in Q&A documents into 4 types(word, list, graph and text), then proposes summarizing methods for each question type using document statistics. Summarized answers for word, list and text type are obtained by question clustering and calculating scores for words using frequency, proximity and confidence of answers. Answers for graph type is shown by extracting user opinion from answers.

Korean Summarization System using Automatic Paragraphing (단락 자동 구분을 이용한 문서 요약 시스템)

  • 김계성;이현주;이상조
    • Journal of KIISE:Software and Applications
    • /
    • v.30 no.7_8
    • /
    • pp.681-686
    • /
    • 2003
  • In this paper, we describes a system that extracts important sentences from Korean newspaper articles using automatic paragraphing. First, we detect repeated words between sentences. Through observation of the repeated words, this system compute Closeness Degree between Sentences(CDS ) from the degree of morphological agreement and the change of grammatical role. And then, it automatically divides a document into meaningful paragraphs using the number of paragraph defined by the user´s need. Finally. it selects one representative sentence from each paragraph and it generates summary using representative sentences. Though our system doesn´t utilize some features such as title, sentence position, rhetorical structure, etc., it is able to extract meaningful sentences to be included in the summary.

Analysis of Research status based on Citation Context

  • Kim, Byungkyu;Choi, Seon-heui;Kang, Muyeong;Kang, Ji-Hoon
    • International Journal of Contents
    • /
    • v.11 no.2
    • /
    • pp.63-68
    • /
    • 2015
  • A citation analysis utilizes the relations among citations and is the most popular bibliometric method. This analysis is based on 1) the evaluation by paper, journal and researcher of the research output, 2) the identification of emerging research topics, 3) the production of a map of the intellectual structure of the research domain and 4) various services for academic information. However, this approach has a limitation in that a citation is treated in a very simple manner, even though the purpose of citation can vary greatly. To address this problem, new approaches have been studied that take into account the citation context. This research separates the citations according to the citation functions and tries to conduct an analysis according to the newly classified citations. Furthermore, research on the citation summarization and visualization based on both the citation context and the citation function of the citations was also attempted. However, since there are very few studies related to citation context in South Korea, more research and development is needed in this area. This study analyzes the status of the research in terms of the citation context. For this, we utilized social network analysis methods.

Design of Metadata Retrieval Structure for Efficient Browsing of Personalized Broadcasting Contents (개인화된 방송 컨텐츠의 효율적 검색을 위한 메타데이터 검색 구조 설계)

  • Lee, Hye-Gyu;Park, Sung-Han
    • Journal of the Institute of Electronics Engineers of Korea CI
    • /
    • v.46 no.2
    • /
    • pp.100-105
    • /
    • 2009
  • In this paper, we Propose a browsing system to reduce the contents retrieval time in the personalized broadcasting system. For this purpose, we add a subgenre table between Classification DS and Summarization DS of MPEG-7 MDS. The subgenre includes a subgenre list to shorten the searching time of contents which we want to find. In addition we divide metadata into event and object using Summarization DS of the MPEG-7 MDS. In this way, the hierarchical browsing of broadcasting contents is made possible. This structure may reduce the complexity of search by storing event and object separately. Our simulation results show that the search time of the proposed system is shorter than that of the previous works.

Multi-Topic Meeting Summarization using Lexical Co-occurrence Frequency and Distribution (어휘의 동시 발생 빈도와 분포를 이용한 다중 주제 회의록 요약)

  • Lee, Byung-Soo;Lee, Jee-Hyong
    • Proceedings of the Korean Society of Computer Information Conference
    • /
    • 2015.07a
    • /
    • pp.13-16
    • /
    • 2015
  • 본 논문에서는 어휘의 동시 발생 (co-occurrence) 빈도와 분포를 이용한 회의록 요약방법을 제안한다. 회의록은 일반 문서와 달리 문서에 여러 세부적인 주제들이 나타나며, 잘못된 형식의 문장, 불필요한 잡담들을 포함하고 있기 때문에 이러한 특징들이 문서요약 과정에서 고려되어야 한다. 기존의 일반적인 문서요약 방법은 하나의 주제를 기반으로 문서 전체에서 가장 중요한 문장으로 요약하기 때문에 다중 주제 회의록 요약에는 적합하지 않다. 제안한 방법은 먼저 어휘의 동시 발생 (co-occurrence) 빈도를 이용하여 회의록 분할 (segmentation) 과정을 수행한다. 다음으로 주제의 구분에 따라 분할된 각 영역 (block)의 중요 단어 집합 생성, 중요 문장 추출 과정을 통해 회의록의 중요 문장들을 선별한다. 마지막으로 추출된 중요 문장들의 위치, 종속 관계를 고려하여 최종적으로 회의록을 요약한다. AMI meeting corpus를 대상으로 실험한 결과, 제안한 방법이 baseline 요약 방법들보다 요약 비율에 따른 평가 및 요약문의 세부 주제별 평가에서 우수한 요약 성능을 보임을 확인하였다.

  • PDF