• Title/Summary/Keyword: content-based summary


Automatic Music Summarization Using Similarity Measure Based on Multi-Level Vector Quantization (다중레벨 벡터양자화 기반의 유사도를 이용한 자동 음악요약)

  • Kim, Sung-Tak;Kim, Sang-Ho;Kim, Hoi-Rin
    • The Journal of the Acoustical Society of Korea
    • /
    • v.26 no.2E
    • /
    • pp.39-43
    • /
    • 2007
  • Music summarization refers to a technique that automatically extracts the most important and representative segments of music content. In this paper, we propose and evaluate a technique that provides the repeated part of the music content as the music summary. To extract a repeated segment, the proposed algorithm uses the weighted sum of similarity measures based on multi-level vector quantization, for either a fixed-length or an optimal-length summary. Two similarity measures are proposed: a count-based measure and a distance-based measure. The count-based measure uses the number of identical codewords, and the distance-based measure uses the Mahalanobis distance between features that have the same codeword at the same position in the segments. The fixed-length music summary is evaluated by measuring the overlapping ratio between hand-made repeated parts and automatically generated ones. The optimal-length music summary is evaluated by calculating how much of the repeated parts of the music content the automatically generated summary includes. From experiments, we observed that the optimal-length summary captures the repeated parts of the music content more effectively, in terms of summary length, than the fixed-length summary.
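
As a rough illustration of the two similarity measures described in this abstract, the sketch below compares two equal-length segments that have already been vector-quantized. It is not the authors' implementation. The function names, the position-wise counting, the diagonal covariance (`variances`), the mapping from distance to a similarity in (0, 1], and the weights `alpha` and `level_weights` are all assumptions made for the example.

```python
import math

def count_similarity(codes_a, codes_b):
    """Fraction of positions where both segments carry the same codeword."""
    matches = sum(1 for a, b in zip(codes_a, codes_b) if a == b)
    return matches / len(codes_a)

def distance_similarity(codes_a, codes_b, feats_a, feats_b, variances):
    """Similarity derived from the mean Mahalanobis distance at matching positions.

    Assumption: the covariance is diagonal, so the Mahalanobis distance
    reduces to a variance-normalised Euclidean distance.
    """
    dists = []
    for ca, cb, fa, fb in zip(codes_a, codes_b, feats_a, feats_b):
        if ca == cb:  # only frames quantized to the same codeword contribute
            d2 = sum((x - y) ** 2 / v for x, y, v in zip(fa, fb, variances))
            dists.append(math.sqrt(d2))
    if not dists:
        return 0.0
    # Assumed mapping: distance 0 gives similarity 1, large distances approach 0.
    return 1.0 / (1.0 + sum(dists) / len(dists))

def combined_similarity(levels, feats_a, feats_b, variances, level_weights, alpha=0.5):
    """Weighted sum of both measures over several quantization levels.

    `levels` holds one (codes_a, codes_b) pair per codebook level.
    """
    total = 0.0
    for (codes_a, codes_b), w in zip(levels, level_weights):
        c = count_similarity(codes_a, codes_b)
        d = distance_similarity(codes_a, codes_b, feats_a, feats_b, variances)
        total += w * (alpha * c + (1.0 - alpha) * d)
    return total
```

A repeated segment would then be the candidate whose combined similarity to the other segments of the song is highest. How candidates are enumerated is left out of this sketch.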

Automatic Summary Method of Linguistic Educational Video Using Multiple Visual Features (다중 비주얼 특징을 이용한 어학 교육 비디오의 자동 요약 방법)

  • Han Hee-Jun;Kim Cheon-Seog;Choo Jin-Ho;Ro Yong-Man
    • Journal of Korea Multimedia Society
    • /
    • v.7 no.10
    • /
    • pp.1452-1463
    • /
    • 2004
  • The demand for automatic video summarization is growing as bi-directional broadcasting content increases, along with the variety of user requests and preferences in the bi-directional broadcast environment. Automatic video summarization is also needed so that service providers can manage and use large amounts of content efficiently. In this paper, we propose a method to automatically generate a content-based summary of linguistic educational videos. First, shot boundaries and keyframes are generated from the linguistic educational video, and then multiple (low-level) visual features are extracted. Next, the semantic parts (Explanation part, Dialog part, Text-based part) of the video are generated using the extracted visual features. Lastly, an XML document describing the summary information is produced based on the Hierarchical Summary architecture of the MPEG-7 MDS (Multimedia Description Scheme). Experimental results show that the proposed algorithm provides reasonable performance for automatic summarization of linguistic educational videos. We verified that the proposed method is useful for video summary systems that provide various services, as well as for the management of educational content.


An Efficient Machine Learning-based Text Summarization in the Malayalam Language

  • P Haroon, Rosna;Gafur M, Abdul;Nisha U, Barakkath
    • KSII Transactions on Internet and Information Systems (TIIS)
    • /
    • v.16 no.6
    • /
    • pp.1778-1799
    • /
    • 2022
  • Automatic text summarization is a procedure that condenses a large body of content into a shorter text that retains the significant information. Malayalam is one of the toughest languages, used in certain areas of India, most commonly in Kerala and Lakshadweep. Natural language processing research on Malayalam is relatively scarce due to the complexity of the language as well as the scarcity of available resources. In this paper, an approach is proposed for summarizing Malayalam documents by training a model based on the Support Vector Machine classification algorithm. Different features of the text are taken into account during training so that the system can output the most important information from the input text. The classifier sorts sentences into separate classes (most important, important, average, and least significant), and based on this the system creates a summary of the input document. The user can select a compression ratio so that the system outputs the corresponding fraction of the document as the summary. Model performance is measured using Malayalam documents of different genres as well as documents from the same domain. The model is evaluated with the content evaluation measures precision, recall, F-score, and relative utility. The precision and recall values obtained show that the model is reliable and more relevant than the other summarizers.
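
The last step described above, turning per-sentence importance classes into a summary of a user-chosen size, can be sketched as follows. This is a minimal illustration, not the paper's system. It assumes the classifier has already assigned each sentence a label from 0 (most important) to 3 (least significant). The function name `summarize`, the tie-breaking by document position, and the rounding rule are hypothetical choices.

```python
import math

def summarize(sentences, importance, compression_ratio):
    """Keep the best-ranked fraction of sentences, returned in document order.

    importance[i] is the assumed class label of sentences[i]:
    0 = most important, 1 = important, 2 = average, 3 = least significant.
    """
    if not 0 < compression_ratio <= 1:
        raise ValueError("compression_ratio must be in (0, 1]")
    # Assumed rounding rule: always keep at least one sentence.
    keep = max(1, math.ceil(len(sentences) * compression_ratio))
    # Rank by class first, then by position so earlier sentences win ties.
    ranked = sorted(range(len(sentences)), key=lambda i: (importance[i], i))
    chosen = sorted(ranked[:keep])
    return [sentences[i] for i in chosen]
```

For example, with four sentences labelled `[2, 0, 3, 0]` and a ratio of 0.5, the sketch keeps the second and fourth sentences.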

Korean Information Summary System for National R&D Project Information Summary (국가R&D과제정보 요약을 위한 한국어 정보요약 시스템)

  • Lee, Jong-Won;Kim, Tae-Hyun;Shin, Dong-Gu;Jo, Woo-Seung
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference
    • /
    • 2022.10a
    • /
    • pp.72-74
    • /
    • 2022
  • The National Science and Technology Knowledge Information Service (NTIS) provides information on national R&D projects. Project information consists of meta-information such as 'project name', 'project performance institution', and 'research manager name', and of text explaining the project, such as 'research goal', 'research content', and 'expected effect'. Finding the desired project information is time-consuming, because it requires checking every 'research goal' or 'research content' entry in the list of results returned from a search over 1 million project records. To solve this problem, this paper proposes a project information summary system that summarizes the long-text parts of national R&D project information. A preprocessor was built by analyzing the linguistic characteristics of the Korean language, and a project information summary model based on natural language processing technology was developed to process the preprocessed text. Through this, project information composed of long sentences is provided in a compressed and summarized form, which helps users infer the overall content easily and quickly from the summary alone.


Dynamic Summarization and Summary Description Scheme for Efficient Video Browsing (효율적인 비디오 브라우징을 위한 동적 요약 및 요약 기술구조)

  • 김재곤;장현성;김문철;김진웅;김형명
    • Journal of Broadcast Engineering
    • /
    • v.5 no.1
    • /
    • pp.82-93
    • /
    • 2000
  • Recently, the capability of efficient access to the desired video content is of growing importance because more digital video data are available at an increasing rate. A video summary abstracting the gist from the entirety enables the efficient browsing as well as the fast skimming of the video contents. In this paper, we discuss a novel dynamic summarization method based on the detection of highlights which represent semantically significant content and the description scheme (DS) proposed to MPEG-7 aiming to provide summary description. The summary DS proposed to MPEG-7 allows for efficient navigation and browsing to the contents of interest through the functionalities of multi-level highlights, hierarchical browsing and user-customized summarization. In this paper, we also show the validation and the usefulness of the methodology for dynamic summarization and the summary DS in real applications with soccer video sequences.


Implementation of TV-Anytime Compliant STB for Personalized TV Services

  • Lee Hee Kyung;Yang Seung Jun;Kim Jae Gon;Hong Jin Woo
    • Proceedings of the IEEK Conference
    • /
    • 2004.08c
    • /
    • pp.576-580
    • /
    • 2004
  • In this paper, we present the design and implementation of a TV-Anytime compliant STB that provides personalized content consumption according to user preferences and various terminal/network conditions. This paper mainly deals with a metadata engine, which consists of metadata de-multiplexing, metadata decoding, and metadata-based content browsing. For personalized content consumption, the proposed metadata engine provides the following key functionalities: advanced EPG, non-linear segment navigation with Tables-of-Content and/or event-based summary, automatic recommendation of user-preferred programs, etc. The implemented STB employing the metadata engine was successfully tested with a set of service scenarios in an end-to-end broadcasting test-bed.


A Study on Music Summarization (음악요약 생성에 관한 연구)

  • Kim Sung-Tak;Kim Sang-Ho;Kim Hoi-Rin;Choi Ji-Hoon;Lee Han-Kyu;Hong Jin-Woo
    • Journal of Broadcast Engineering
    • /
    • v.11 no.1 s.30
    • /
    • pp.3-14
    • /
    • 2006
  • Music summarization is a technique that automatically generates the most important and representative part or parts of music content. Music summarization techniques have been studied in two categories according to summary characteristics. The first provides the repeated part as the music summary, and the second provides a combination of segments with different characteristics as the music summary. In this paper, we propose and evaluate two kinds of music summarization techniques. The algorithm using multi-level vector quantization, which provides a repeated part as the music summary, produces a fixed-length music summary that is evaluated by the overlapping ratio between hand-made repeated parts and the automatically generated summary. The overlapping ratios of the conventional methods are 42.2% and 47.4%, whereas that of the proposed method with a fixed-length summary is 67.1%. The optimal-length music summary is evaluated by the portion of overlap between the summary and the repeated part, whose length differs according to the music content, and the results show that the automatically generated optimal-length summary captures the repeated part more effectively than the fixed-length summary. The cluster-based algorithm, which uses a 2-D similarity matrix and the k-means algorithm, provides the combined segments as the music summary. To evaluate this algorithm, we use an MOS test consisting of two questions (How many similar segments are in the summarized music? How many segments are included in the same structure?), and the results show good performance.
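
The abstract does not define the overlapping ratio precisely. One plausible reading, sketched below under that assumption, is the share of the generated summary interval that falls inside the hand-labelled repeated parts. The function names, the (start, end) interval representation in seconds, and the normalization by summary length are all hypothetical.

```python
def overlap_seconds(a, b):
    """Length of the intersection of two (start, end) intervals, in seconds."""
    start = max(a[0], b[0])
    end = min(a[1], b[1])
    return max(0.0, end - start)

def overlapping_ratio(summary, reference_parts):
    """Share of the summary interval covered by hand-labelled repeated parts.

    Assumption: the reference parts do not overlap one another, so their
    individual overlaps with the summary can simply be added up.
    """
    length = summary[1] - summary[0]
    covered = sum(overlap_seconds(summary, part) for part in reference_parts)
    return covered / length
```

For a summary spanning 30 s to 50 s and repeated parts labelled at 20 s to 40 s and 45 s to 60 s, this reading gives 15 s of overlap out of 20 s, a ratio of 0.75.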

Empirical Study for Automatic Evaluation of Abstractive Summarization by Error-Types (오류 유형에 따른 생성요약 모델의 본문-요약문 간 요약 성능평가 비교)

  • Seungsoo Lee;Sangwoo Kang
    • Korean Journal of Cognitive Science
    • /
    • v.34 no.3
    • /
    • pp.197-226
    • /
    • 2023
  • Generative text summarization is a natural language processing task that produces a short summary while preserving the content of a long text. ROUGE is a widely used lexical-overlap-based metric for text summarization models in generative summarization benchmarks. Although models achieve very high scores on it, studies report that 30% of generated summaries are still inconsistent with the source text. This paper proposes a methodology for evaluating the performance of a summarization model without using a reference summary. AggreFACT is a human-annotated dataset that classifies the types of errors made by neural text summarization models. Among all the test candidates, two cases showed the highest correlation: generated summaries, and summaries in which errors occurred throughout. We observed that the proposed evaluation score showed a high correlation for models fine-tuned from BART and PEGASUS, which are pretrained with a large-scale Transformer structure.
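
To make the "lexical-overlap" character of ROUGE concrete, here is a minimal sketch of a unigram-overlap score in the style of ROUGE-1. It is a simplified illustration with whitespace tokenization, not the official ROUGE toolkit and not the evaluation score proposed in the paper. The function name `rouge_1` is an assumption.

```python
from collections import Counter

def rouge_1(candidate, reference):
    """Unigram-overlap precision, recall, and F1 between two texts."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    # Multiset intersection counts each shared word at most as often
    # as it appears in both texts.
    overlap = sum((cand & ref).values())
    precision = overlap / max(1, sum(cand.values()))
    recall = overlap / max(1, sum(ref.values()))
    if overlap == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)
```

Because only word counts are compared, "the dog chased the cat" and "the cat chased the dog" receive a perfect unigram score even though they state opposite facts. This is one way a summary can score well on lexical overlap while being inconsistent with its source.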

Application of Topic Modeling Techniques in Arabic Content: A Systematic Review

  • Maram Alhmiyani;Huda Alhazmi
    • International Journal of Computer Science & Network Security
    • /
    • v.23 no.6
    • /
    • pp.1-12
    • /
    • 2023
  • With the rapid increase of user-generated data on digital platforms, categorizing and classifying this huge volume of data has become difficult. Topic modeling is an unsupervised machine learning technique that can be used to obtain a summary of a large collection of documents. Topic modeling has been widely used on English content, yet its application to the Arabic language is limited. Therefore, the aim of this paper is to provide a systematic review of the application of topic modeling algorithms to Arabic content. Searching well-known and trusted databases, including ScienceDirect, IEEE Xplore, Springer Link, and Google Scholar, for publications dated from 2012 to 2022, we found 60 papers. After refining the papers based on predefined criteria, 32 papers remained. Our results show that the application of topic modeling techniques to Arabic content is limited.

Automatic Quality Evaluation with Completeness and Succinctness for Text Summarization (완전성과 간결성을 고려한 텍스트 요약 품질의 자동 평가 기법)

  • Ko, Eunjung;Kim, Namgyu
    • Journal of Intelligence and Information Systems
    • /
    • v.24 no.2
    • /
    • pp.125-148
    • /
    • 2018
  • Recently, as the demand for big data analysis increases, cases of analyzing unstructured data and using the results are also increasing. Among the various types of unstructured data, text is used as a means of communicating information in almost all fields. In addition, many analysts are interested in text because the amount of data is very large and it is relatively easy to collect compared to other unstructured and structured data. Among the various text analysis applications, the following have been actively studied: document classification, which classifies documents into predetermined categories; topic modeling, which extracts major topics from a large number of documents; sentiment analysis or opinion mining, which identifies emotions or opinions contained in texts; and text summarization, which summarizes the main contents of one or several documents. In particular, text summarization is actively applied in business through news summary services, privacy policy summary services, etc. In addition, much research has been done in academia on the extraction approach, which selectively provides the main elements of the document, and the abstraction approach, which extracts the elements of the document and composes new sentences by combining them. However, techniques for evaluating the quality of automatically summarized documents have not made as much progress as techniques for automatic text summarization itself. Most existing studies on the quality evaluation of summarization manually summarized the documents, used these manual summaries as reference documents, and measured the similarity between the automatic summary and the reference document. Specifically, automatic summarization is performed on the full text through various techniques, and the result is compared with the reference document, which is an ideal summary, to measure the quality of the automatic summarization.
Reference documents are provided in two major ways. The most common is manual summarization, in which a person creates an ideal summary by hand. Since this method requires human intervention in preparing the summary, writing the summary takes a lot of time and cost, and the evaluation result may differ depending on the subjectivity of the summarizer. Therefore, to overcome these limitations, attempts have been made to measure the quality of summary documents without human intervention. As a representative attempt, a method has recently been devised that reduces the size of the full text and measures the similarity between the reduced full text and the automatic summary. In this method, the more often the frequent terms of the full text appear in the summary, the better the quality of the summary is judged to be. However, since summarization essentially means reducing a large amount of content while minimizing content omissions, it is unreasonable to say that a "good summary" based only on frequency is always a "good summary" in the essential sense. To overcome the limitations of this previous work on summarization evaluation, this study proposes an automatic quality evaluation method for text summarization based on the essential meaning of summarization. Specifically, succinctness is defined as an element indicating how little duplicated content there is among the sentences of the summary, and completeness is defined as an element indicating how little of the content is left out of the summary. In this paper, we propose a method for automatic quality evaluation of text summarization based on the concepts of succinctness and completeness.
To evaluate the practical applicability of the proposed methodology, 29,671 sentences were extracted from TripAdvisor's hotel reviews, the reviews were summarized for each hotel, and the quality of the summaries was evaluated according to the proposed methodology. The paper also provides a way to integrate completeness and succinctness, which are in a trade-off relationship, into an F-score, and proposes a method for performing optimal summarization by changing the threshold of the sentence similarity.
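
The abstract gives only verbal definitions of completeness and succinctness, so the sketch below is one possible operationalization rather than the authors' formulas. It assumes Jaccard word overlap as the sentence similarity, counts a source sentence as covered when some summary sentence reaches the threshold, and counts a summary sentence as a duplicate when it reaches the threshold against an earlier summary sentence. All function names are hypothetical.

```python
def jaccard(a, b):
    """Word-set overlap between two sentences (assumed similarity measure)."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta and not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)

def completeness(source, summary, threshold):
    """Fraction of source sentences that some summary sentence covers."""
    covered = sum(
        1 for s in source if any(jaccard(s, t) >= threshold for t in summary)
    )
    return covered / len(source)

def succinctness(summary, threshold):
    """Fraction of summary sentences that do not duplicate an earlier one."""
    unique = 0
    for i, t in enumerate(summary):
        if all(jaccard(t, earlier) < threshold for earlier in summary[:i]):
            unique += 1
    return unique / len(summary)

def f_score(c, s):
    """Harmonic mean of completeness and succinctness."""
    return 0.0 if c + s == 0 else 2 * c * s / (c + s)
```

Under these assumptions the trade-off is visible directly: adding sentences to a summary can raise completeness but tends to lower succinctness, and sweeping the similarity threshold changes both, so the F-score can be used to pick a threshold.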