• Title/Summary/Keyword: Summarization Model

Semantic Event Detection in Golf Video Using Hidden Markov Model (은닉 마코프 모델을 이용한 골프 비디오의 시멘틱 이벤트 검출)

  • Kim Cheon Seog;Choo Jin Ho;Bae Tae Meon;Jin Sung Ho;Ro Yong Man
    • Journal of Korea Multimedia Society / v.7 no.11 / pp.1540-1549 / 2004
  • In this paper, we propose an algorithm that detects semantic events in golf video using a hidden Markov model (HMM). The goal is to identify and classify golf events in order to support highlight-based video indexing and summarization. We first define four semantic events and then design an HMM whose states correspond to these events. To estimate the HMM parameters for each event, we use ten visual features based on MPEG-7 visual descriptors. Experimental results show that the proposed algorithm achieves reasonable detection performance across a variety of golf events.

Keyword Network Visualization for Text Summarization and Comparative Analysis (문서 요약 및 비교분석을 위한 주제어 네트워크 가시화)

  • Kim, Kyeong-rim;Lee, Da-yeong;Cho, Hwan-Gue
    • Journal of KIISE / v.44 no.2 / pp.139-147 / 2017
  • Most information on the Internet is textual. One of the main topics in the large-scale document analysis required in the "big data" era is therefore the development of systems that automatically understand textual data, and automated keyword extraction for text summarization and abstraction is a typical research problem. However, simply listing a few keywords is insufficient to reveal the complex semantic structure of general texts. In this paper, we develop a text-visualization method that constructs a graph by computing the degree of relatedness among keywords selected from the target text. Two models are proposed for computing this relatedness and defining the edges: the influence-interval model and the word-distance model. The resulting graph is more flexible and useful for displaying the meaning structure of the target text, and it enables fast and easy understanding of the text. Our experiment showed that the proposed abstract-graph model is superior to a keyword list for semantic and comparative understanding of text.
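
The abstract does not define the word-distance model, so the following is only a minimal sketch of the general idea of weighting keyword edges by proximity. It assumes (this is not taken from the paper) that two keywords are linked when they occur within a fixed token window and that each co-occurrence contributes 1/distance to the edge weight. The function name `word_distance_graph` and the `window` parameter are hypothetical.

```python
from collections import defaultdict

def word_distance_graph(tokens, keywords, window=5):
    """Return {(kw_a, kw_b): weight} for keyword pairs co-occurring within `window` tokens.

    Assumption (not from the paper): each co-occurrence adds 1/distance to the edge.
    """
    keywords = set(keywords)
    # Positions of keyword occurrences, in text order.
    positions = [(i, t) for i, t in enumerate(tokens) if t in keywords]
    edges = defaultdict(float)
    for a in range(len(positions)):
        i, u = positions[a]
        for b in range(a + 1, len(positions)):
            j, v = positions[b]
            dist = j - i
            if dist > window:
                break  # later occurrences are even farther away
            if u != v:
                edges[tuple(sorted((u, v)))] += 1.0 / dist
    return dict(edges)

tokens = "graph models link keywords so the graph shows how keywords relate".split()
edges = word_distance_graph(tokens, ["graph", "keywords"], window=3)
print(edges)
```

The resulting weighted edge dictionary could be handed to any graph-drawing tool; closer and more frequent co-occurrences produce heavier edges.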

Online-Based Local Government Image Typology: A Case Study on Jakarta Provincial Government Official YouTube Videos

  • Pratama, Arif Budy
    • Journal of Contemporary Eastern Asia / v.16 no.1 / pp.1-21 / 2017
  • The Jakarta Provincial Government uses its YouTube channel to interact with citizens and enhance transparency. The purpose of this study is to explore how online audiences perceive the image of local government through the YouTube platform. The concepts of organizational image and credibility in political image are adapted to analyze online public perceptions of the Jakarta Provincial Government's image. Using a video summarization approach on 346 official YouTube videos uploaded from 1 March 2016 to 31 May 2016, together with content analysis of 8,237 comments, this study shows that political and bureaucratic images emerge concurrently in the Jakarta Provincial Government case. A typology model is proposed to describe and explain the four image variations that occurred in the case study. Practical recommendations are offered for managing a YouTube channel as one of the social media used in the local government context.

An Automatic Summarization System Based On a Probabilistic Model Using Document Structure Information (문서 구조 정보를 이용한 확률 모델 기반 자동요약 시스템)

  • Jang, Dong-Hyun;Myaeng, Sung-Hyon
    • Annual Conference on Human and Language Technology / 1997.10a / pp.15-22 / 1997
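  • With the development of the Internet and information service technologies, the amount of information available to the general public is growing exponentially, yet it is becoming harder for users to obtain the information they want, and even when the needed information is found, its sheer volume means that much time is spent grasping the overall content. To address this problem, this study developed a system that uses a statistical model to extract sentences from a document, composes a summary from them, and presents it to the user. The method used to build the summarization system extracts important sentences from a document collection, learns from them the features and important words that are likely to appear in a summary, and then uses what was learned to produce summaries. The documents used for system development and evaluation are a collection of papers in the field of information science; they were divided into training data and test data, with the necessary information obtained from the training data and the evaluation performed on the test data.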

Korean Pre-trained Model KE-T5-based Automatic Paper Summarization (한국어 사전학습 모델 KE-T5 기반 자동 논문 요약)

  • Seo, Hyeon-Tae;Shin, Saim;Kim, San
    • Annual Conference on Human and Language Technology / 2021.10a / pp.505-506 / 2021
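  • Research on automatically summarizing the vast amount of text that is growing exponentially on the Internet has recently been very active. Automatic text summarization has advanced considerably with the emergence of various pre-trained models. In particular, models based on T5 (Text-to-Text Transfer Transformer) show very strong performance on automatic text summarization and achieve the SOTA (state of the art) in the field. In this paper, we perform and evaluate automatic paper summarization using KE-T5, a pre-trained model trained on a large amount of Korean text.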

Comparative Analysis of Language Model Performance in News Domain Summarization (언어 모델의 뉴스 도메인 요약 성능 비교 분석)

  • Sangwon Ryu;Yunsu Kim;Gary Geunbae Lee
    • Annual Conference on Human and Language Technology / 2023.10a / pp.131-136 / 2023
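  • This paper compares the performance of the encoder-decoder models mainly used in existing summarization tasks with that of decoder-based language models. The ROUGE score, the principal evaluation metric for summarization, is computed from the words that overlap between the reference summary and the model-generated summary. As a result, language models that generate abstractive summaries tend to receive lower ROUGE scores than encoder-decoder models. In addition, recent studies have raised concerns about the low quality of the reference summaries themselves, which in turn reduces the reliability of evaluating model-generated summaries with ROUGE. This paper therefore evaluates the summarization performance of language models from a wider range of perspectives and shows that language models generate better summaries than existing encoder-decoder models.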
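
To illustrate why a word-overlap metric can penalize abstractive summaries, here is a minimal sketch of ROUGE-1 (unigram overlap with clipped counts). It assumes plain whitespace tokenization with no stemming, which is a simplification, and the function name `rouge1` and the example sentences are hypothetical rather than taken from the paper.

```python
from collections import Counter

def rouge1(reference, candidate):
    """Return (precision, recall, f1) based on clipped unigram overlap."""
    ref = Counter(reference.split())
    cand = Counter(candidate.split())
    # Counter intersection keeps the minimum count of each shared word.
    overlap = sum((ref & cand).values())
    precision = overlap / max(sum(cand.values()), 1)
    recall = overlap / max(sum(ref.values()), 1)
    f1 = 2 * precision * recall / (precision + recall) if overlap else 0.0
    return precision, recall, f1

reference = "the cat sat on the mat"
extractive = "the cat sat on the mat"        # copies the reference wording
abstractive = "a cat was sitting on a rug"   # same meaning, different words

print(rouge1(reference, extractive))
print(rouge1(reference, abstractive))
```

The paraphrased candidate shares only two unigrams with the reference, so its score drops sharply even though a human reader might judge it an acceptable summary.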

A Comparative Study on the Korean Text Extractive Summarization using Pre-trained Language Model (사전 학습 언어 모델을 이용한 한국어 문서 추출 요약 비교 분석)

  • Young-Rae Cho;Kwang-Hyun Baek;Min-Ji Park;Byung Hoon Park;Sooyeon Shin
    • Proceedings of the Korea Information Processing Society Conference / 2023.11a / pp.518-521 / 2023
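  • Amid today's excessive volume of information, efficiently obtaining the important information in digital documents has become a key requirement in terms of cost efficiency. Document summarization is a field of natural language processing concerned with extracting or generating important sentences while preserving the core information of the original document. Of the two approaches, extractive summarization can produce summaries while reducing the risk of information loss and of generating incorrect information. However, comparisons that would guide the appropriate choice among the many available tokenizers and embedding models remain insufficient. In this paper, we select pre-trained Korean language models for extractive summarization, train them on additional datasets, evaluate their performance, and compare and analyze the results.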

An Automatic Summarization of Call-For-Paper Documents Using a 2-Phase hidden Markov Model (2단계 은닉 마코프 모델을 이용한 논문 모집 공고의 자동 요약)

  • Kim, Jeong-Hyun;Park, Seong-Bae;Lee, Sang-Jo;Park, Se-Young
    • Journal of the Korean Institute of Intelligent Systems / v.18 no.2 / pp.243-250 / 2008
  • This paper proposes a system that extracts necessary information from call-for-paper (CFP) documents using a hidden Markov model (HMM). Although a CFP does not follow a strict form, most CFPs present their information in a relatively fixed sequence. An HMM, which is well suited to processing sequential data, is therefore adopted to analyze CFPs. However, when CFPs are modeled intuitively with a single HMM, the boundaries of the information are not recognized accurately. To solve this problem, this paper proposes a two-phase hidden Markov model. In the first phase, the P-HMM (phrase hidden Markov model), which models a document in terms of phrases, recognizes CFP documents locally. Then the D-HMM (document hidden Markov model) captures the overall structure and information flow of the document. Experiments on 400 CFP documents gathered from the Web yield an F-score of 0.49, an improvement of 0.15 in F-measure over the intuitively modeled HMM.
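
As background for how an HMM assigns information types to a sequence, the sketch below shows standard Viterbi decoding on a toy model. It is not the paper's two-phase P-HMM/D-HMM design; the states (`TITLE`, `DATE`), the observation symbols, and all probabilities are made-up values chosen only for illustration.

```python
def viterbi(obs, states, start_p, trans_p, emit_p):
    """Return the most probable state sequence for the observation list `obs`."""
    # best[t][s] = (probability of the best path ending in state s at time t, previous state)
    best = [{s: (start_p[s] * emit_p[s][obs[0]], None) for s in states}]
    for t in range(1, len(obs)):
        row = {}
        for s in states:
            row[s] = max(
                (best[t - 1][p][0] * trans_p[p][s] * emit_p[s][obs[t]], p)
                for p in states
            )
        best.append(row)
    # Backtrack from the most probable final state.
    path = [max(states, key=lambda s: best[-1][s][0])]
    for t in range(len(obs) - 1, 0, -1):
        path.append(best[t][path[-1]][1])
    return path[::-1]

# Hypothetical toy model: a title made of words, followed by a date made of numbers.
states = ["TITLE", "DATE"]
start_p = {"TITLE": 0.8, "DATE": 0.2}
trans_p = {"TITLE": {"TITLE": 0.6, "DATE": 0.4}, "DATE": {"TITLE": 0.1, "DATE": 0.9}}
emit_p = {"TITLE": {"word": 0.9, "number": 0.1}, "DATE": {"word": 0.2, "number": 0.8}}

labels = viterbi(["word", "word", "number"], states, start_p, trans_p, emit_p)
print(labels)
```

Because the decoder scores whole sequences rather than individual tokens, a relatively fixed ordering of fields, as described for CFPs, is captured by the transition probabilities.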

Summarizing the Differences in Chinese-Vietnamese Bilingual News

  • Wu, Jinjuan;Yu, Zhengtao;Liu, Shulong;Zhang, Yafei;Gao, Shengxiang
    • Journal of Information Processing Systems / v.15 no.6 / pp.1365-1377 / 2019
  • Summarizing the differences in Chinese-Vietnamese bilingual news plays an important supporting role in the comparative analysis of news views between China and Vietnam. To address the cross-language problems that arise when analyzing differences between Chinese and Vietnamese news, we propose a new method of summarizing the differences based on an undirected graph model. The method extracts elements to represent the sentences and builds a bridge between the two languages using Wikipedia's multilingual concept description pages. First, we calculate the similarity between Chinese and Vietnamese news sentences and filter the bilingual sentences accordingly. Then we use the filtered sentences as nodes and the similarity grade as the edge weight to construct an undirected graph model. Finally, using a random walk algorithm, the weight of each node is calculated from the edge weights, and the sentences with the highest weights are extracted as the difference summary. The experimental results show that the proposed approach achieved the highest score of 0.1837 on the annotated test set, outperforming state-of-the-art summarization models.
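
The node-weighting step can be sketched as a PageRank-style random walk over a weighted undirected sentence graph. The abstract does not specify the exact random walk, so the damping factor, the iteration count, the function name `rank_nodes`, and the toy similarity values below are all assumptions made for illustration.

```python
def rank_nodes(weights, damping=0.85, iterations=50):
    """Score nodes of a weighted undirected graph by a PageRank-style random walk.

    `weights` is a symmetric dict of dicts: weights[u][v] == weights[v][u].
    """
    nodes = list(weights)
    n = len(nodes)
    score = {u: 1.0 / n for u in nodes}
    # Total edge weight leaving each node, used to normalize transitions.
    out_sum = {u: sum(weights[u].values()) for u in nodes}
    for _ in range(iterations):
        new_score = {}
        for u in nodes:
            incoming = sum(
                score[v] * weights[v][u] / out_sum[v]
                for v in weights[u]
                if out_sum[v] > 0
            )
            new_score[u] = (1 - damping) / n + damping * incoming
        score = new_score
    return score

# Hypothetical sentence graph: s2 is similar to both s1 and s3.
weights = {
    "s1": {"s2": 0.9},
    "s2": {"s1": 0.9, "s3": 0.4},
    "s3": {"s2": 0.4},
}
scores = rank_nodes(weights)
top = max(scores, key=scores.get)
print(top, scores)
```

Sentences would then be sorted by score and the highest-ranked ones taken as the summary, mirroring the final extraction step described in the abstract.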

An Innovative Approach of Bangla Text Summarization by Introducing Pronoun Replacement and Improved Sentence Ranking

  • Haque, Md. Majharul;Pervin, Suraiya;Begum, Zerina
    • Journal of Information Processing Systems / v.13 no.4 / pp.752-777 / 2017
  • This paper proposes an automatic method for summarizing Bangla news documents. In the proposed approach, pronoun replacement is applied for the first time to minimize dangling pronouns in the summary. After pronoun replacement, sentences are ranked using term frequency, sentence frequency, numerical figures, and title words. If two sentences have a cosine similarity of at least 60%, the frequency of the larger sentence is increased and the smaller sentence is removed to eliminate redundancy. Moreover, the first sentence is always included in the summary if it contains any title word. In Bangla text, numerical figures can be written in both words and digits, in a variety of forms; all of these forms are identified in order to assess the importance of sentences. A rule-based system is used in this approach, together with a hidden Markov model and a Markov chain model. To derive the rules, we analyzed 3,000 Bangla news documents and studied several Bangla grammar books. A series of experiments was performed on 200 Bangla news documents and 600 summaries (three summaries per document). The evaluation results demonstrate the effectiveness of the proposed technique over the four latest methods.
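
The redundancy rule (remove the smaller of two sentences whose cosine similarity is at least 60%) can be sketched as follows. This is a simplified illustration, not the authors' implementation: it assumes whitespace tokenization and raw term counts, uses English toy sentences, and omits the step that increases the frequency of the retained sentence. The function names are hypothetical.

```python
import math
from collections import Counter

def cosine(a, b):
    """Cosine similarity between two sentences using raw term counts."""
    va, vb = Counter(a.split()), Counter(b.split())
    dot = sum(va[w] * vb[w] for w in va)
    norm = math.sqrt(sum(c * c for c in va.values())) * math.sqrt(
        sum(c * c for c in vb.values())
    )
    return dot / norm if norm else 0.0

def drop_redundant(sentences, threshold=0.6):
    """Keep one sentence from each near-duplicate group, preferring the longer one."""
    kept = []
    for s in sentences:
        for i, k in enumerate(kept):
            if cosine(s, k) >= threshold:
                # Near-duplicate found: retain whichever sentence has more words.
                if len(s.split()) > len(k.split()):
                    kept[i] = s
                break
        else:
            kept.append(s)
    return kept

sentences = [
    "the match ended in a draw",
    "the match ended in a draw yesterday",
    "rain delayed the start",
]
result = drop_redundant(sentences)
print(result)
```

A threshold of 0.6 corresponds to the 60% figure in the abstract; lowering it removes more sentences at the risk of discarding distinct content.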