• Title/Summary/Keyword: Document Summary

Summary Generation of a Document with Out-of-vocabulary Words (어휘 사전에 없는 단어를 포함한 문서의 요약문 생성 방법)

  • Lee, Tae-seok;Kang, Seung-Shik
    • Annual Conference on Human and Language Technology
    • /
    • 2018.10a
    • /
    • pp.530-531
    • /
    • 2018
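  • Automatic document summarization works either by extracting key words or sentences or by generating new sentences. Recent research has moved toward generating the summary itself by applying deep learning to large document collections. For both extractive and abstractive summarization, recognizing key words is very important. During training, a model recognizes meaning from the patterns in which each word appears in sentences, and selects words for the summary. A machine learning model therefore summarizes using only the vocabulary that appeared in its training documents. As a result, existing models struggle to summarize new documents that contain vocabulary absent from the training documents. This paper proposes a neural network model that can recognize the importance of words that did not appear during training and still generate a summary.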
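The vocabulary limitation described in this abstract can be illustrated with a minimal sketch. It is not the authors' model: the `<unk>` placeholder, the whitespace tokenizer, and the function names `build_vocab` and `encode` are all assumptions made for illustration. It only shows why a word unseen during training carries no information for a model with a fixed vocabulary.

```python
from collections import Counter

UNK = "<unk>"  # assumed placeholder for any word outside the training vocabulary


def build_vocab(train_docs, min_count=1):
    """Collect every whitespace-separated token seen at least min_count times."""
    counts = Counter(tok for doc in train_docs for tok in doc.split())
    return {tok for tok, n in counts.items() if n >= min_count}


def encode(doc, vocab):
    """Replace every token that was never seen in training with UNK."""
    return [tok if tok in vocab else UNK for tok in doc.split()]


train_docs = [
    "the model summarizes the document",
    "the document has key words",
]
vocab = build_vocab(train_docs)

# "transformer" never appeared in training, so it collapses to the placeholder.
encoded = encode("the model summarizes the transformer", vocab)
print(encoded)
```

Once a salient word has been mapped to the placeholder, a summarizer that can only emit vocabulary tokens has no way to reproduce it, which is the gap the proposed model is meant to close.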


A Wartime·Peacetime OMS/MP Analysis Model for a Naval Ship and Case Study (함정 전·평시 OMS/MP 설정 방법론 연구 및 사례)

  • Ha, Sungchul;Kook, Jungho
    • Journal of the Korea Institute of Military Science and Technology
    • /
    • v.15 no.5
    • /
    • pp.660-666
    • /
    • 2012
  • Weapon systems are becoming more expensive, more complex, and smarter. It is therefore important to operate them efficiently and effectively. An OMS/MP is a document that quantifies operational factors such as environment, mission, and mode. It provides important data for performing RAM analysis in the early weapon development phase and for operating a weapon system better. This paper presents a process and framework for the OMS/MP of a naval ship, based on a deep analysis of relevant domestic and foreign case studies. It proposes an OMS/MP analysis framework based on wartime scenarios and mission area analysis. The results will contribute not only to improving the availability of naval ships but also to enhancing the RAM analysis process.

Comparative study of legal document summary method based on pre-trained model (사전학습 기반의 법률문서 요약 방법 비교연구)

  • Kim, EuiSoon;Lim, HeuiSeok
    • Proceedings of the Korea Information Processing Society Conference
    • /
    • 2021.11a
    • /
    • pp.614-617
    • /
    • 2021
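  • Legal documents are written in terminology that general users find hard to understand, and many of them are long, so even people who work in the legal system have difficulty reading large volumes of them. We therefore compare extractive and abstractive summarization methods that apply deep-learning-based pre-trained models against the key-sentence extraction methods that predate deep learning, in order to evaluate summarization performance on legal terminology. As future work, we intend to build a summarization model specialized for legal documents.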

An Experimental Study on Automatic Summarization of Multiple News Articles (복수의 신문기사 자동요약에 관한 실험적 연구)

  • Kim, Yong-Kwang;Chung, Young-Mee
    • Journal of the Korean Society for Information Management
    • /
    • v.23 no.1 s.59
    • /
    • pp.83-98
    • /
    • 2006
  • This study proposes a template-based method for automatically summarizing multiple news articles using the semantic categories of sentences. First, the semantic categories for the core information to be included in a summary are identified from a training set of documents and their summaries. Then, cue words for each slot of the template are selected for later classification of news sentences into the relevant slots. When a news article is input, its event/accident category is identified, and key sentences are extracted from the article and filled into the relevant slots. The template, filled with simple sentences rather than the original long sentences, is used to generate a summary of the event/accident. In a user evaluation of the generated summaries, the method achieved a 54.1% recall ratio and a 58.1% precision ratio in essential information extraction, and an 11.6% redundancy ratio.
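The slot-filling step described in this abstract can be sketched roughly as follows. The slot names, the cue words, and the rule "first matching sentence wins" are hypothetical choices made for illustration; the paper derives its own categories and cue words from training data, and its sentence simplification step is not reproduced here.

```python
import re

# Hypothetical template slots and cue words (not taken from the paper).
CUE_WORDS = {
    "event": {"fire", "collision", "explosion"},
    "casualties": {"injured", "killed", "died"},
    "cause": {"caused", "because", "due"},
}


def classify(sentence):
    """Return the slot whose cue words match the sentence most often, or None."""
    tokens = set(re.findall(r"[a-z]+", sentence.lower()))
    scores = {slot: len(tokens & cues) for slot, cues in CUE_WORDS.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


def fill_template(article):
    """Fill each slot with the first sentence classified into it."""
    template = {slot: None for slot in CUE_WORDS}
    for sentence in re.split(r"(?<=[.!?])\s+", article.strip()):
        slot = classify(sentence)
        if slot is not None and template[slot] is None:
            template[slot] = sentence
    return template


article = (
    "A fire broke out at a warehouse on Monday. "
    "Three workers were injured. "
    "Police said the weather was clear. "
    "Investigators believe faulty wiring caused the blaze."
)
template = fill_template(article)
for slot, sentence in template.items():
    print(f"{slot}: {sentence}")
```

Sentences that match no cue word, such as the one about the weather, are left out of the template, which is how this kind of approach keeps a summary focused on the core information.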

A Study on the Online Arbitration Rules in China (중국 온라인중재규칙에 관한 연구)

  • Choi, Seok-Beom
    • Journal of Arbitration Studies
    • /
    • v.21 no.2
    • /
    • pp.47-64
    • /
    • 2011
  • The China International Economic and Trade Arbitration Commission (CIETAC) released online arbitration rules that apply to the resolution of disputes over electronic commerce transactions, as well as to other economic and trade disputes in which the parties agree to apply them. The evidence submitted by the parties may be electronic evidence created, sent, received, or stored by electronic, optical, or magnetic means. Electronic evidence with a reliable electronic signature carries the same effect and probative force as a document with a handwritten signature. Where a case is tried before a tribunal, the arbitration tribunal conducts an online hearing using Internet video conferencing or other electronic or computer communication means. Unless the parties have agreed otherwise, summary procedure applies to cases where the amount in dispute exceeds RMB 100,000 but is no more than RMB 1 million, or where the amount in dispute exceeds RMB 1 million and a party submits a written application for summary procedure after obtaining the written consent of the other party. Unless the parties have agreed otherwise, fast-track procedure applies to cases where the amount in dispute does not exceed RMB 100,000, or where the amount in dispute exceeds RMB 100,000 and a party submits a written application for fast-track procedure after obtaining the written consent of the other party. Notable features of the Online Rules are as follows. First, there is no detailed consideration of online arbitration. Second, communications between the parties and the tribunal are allowed only through the Secretariat. Third, elaborate provisions are made for the electronic submission and transmission of documents. Fourth, the tribunal must consider various factors in deciding the reliability of evidence. Fifth, CIETAC is required to use reasonable endeavours to keep data communications secure and encrypted. Sixth, the tribunal has the right to investigate and collect relevant evidence. Finally, different procedures are provided in consideration of the various types of e-commerce.


Building a Korean Text Summarization Dataset Using News Articles of Social Media (신문기사와 소셜 미디어를 활용한 한국어 문서요약 데이터 구축)

  • Lee, Gyoung Ho;Park, Yo-Han;Lee, Kong Joo
    • KIPS Transactions on Software and Data Engineering
    • /
    • v.9 no.8
    • /
    • pp.251-258
    • /
    • 2020
  • A training dataset for text summarization consists of pairs of a document and its summary. Because conventional approaches to building text summarization datasets are labor-intensive, it is not easy to construct large datasets for text summarization. A collection of news articles is one of the most popular resources for text summarization because it provides easily accessible, large-scale, high-quality text. From social media news services, we can collect not only the headlines and subheads of news articles but also the summary descriptions that human editors write about them. Approximately 425,000 pairs of news articles and their summaries were collected from social media. We implemented an automatic extractive summarizer and trained it on the dataset. The performance of the summarizer was compared with that of unsupervised models, and it achieved better ROUGE scores than they did.

A Document Collection Method for More Accurate Search Engine (정확도 높은 검색 엔진을 위한 문서 수집 방법)

  • Ha, Eun-Yong;Gwon, Hui-Yong;Hwang, Ho-Yeong
    • The KIPS Transactions:PartA
    • /
    • v.10A no.5
    • /
    • pp.469-478
    • /
    • 2003
  • Internet search engines use web robots that visit servers connected to the Internet periodically or non-periodically. They extract and classify the collected data according to their own methods and build the databases that underlie web search. These procedures are repeated very frequently on the Web. Many search engine sites run this processing strategically in order to become popular Internet portal sites that give users ways to find information on the web. A web search engine contacts thousands upon thousands of web servers, maintains its existing databases, and navigates to gather data about newly connected web servers. These jobs, however, are decided and conducted by the search engines alone. They run web robots to collect data from web servers without knowing the state of those servers. Each search engine issues many requests to web servers and receives their responses, which is one cause of increased Internet traffic. If each web server notified web robots with a summary of its public documents, and each web robot then used this summary to collect only the corresponding documents, unnecessary Internet traffic would be eliminated and the accuracy of search engine data would improve. The processing overhead of web-related jobs on web servers and search engines would also decrease. In this paper, we design and implement a monitoring system for the web server that monitors the state of its documents, summarizes changes to modified documents, and sends the summary to web robots that want to retrieve documents from the server. We also design and implement an efficient web robot for the search engine that uses the notified summary to fetch the corresponding documents from the web servers, extract index terms, and update its databases.
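The idea of a server-side change summary driving the robot can be sketched as below. Everything here is an assumption for illustration: the class names `ServerMonitor` and `Robot`, the use of SHA-256 digests to detect changes, and the in-memory dictionary standing in for a web server. The abstract does not say how the actual system detects or transmits changes.

```python
import hashlib


def digest(text):
    """Content fingerprint used to detect changed documents (an assumed choice)."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


class ServerMonitor:
    """Runs on the web server and reports what changed since the last report."""

    def __init__(self):
        self.known = {}  # path -> digest at the time of the previous summary

    def change_summary(self, documents):
        current = {path: digest(body) for path, body in documents.items()}
        summary = {
            # New documents count as modified because their digest is unknown.
            "modified": sorted(p for p, d in current.items() if self.known.get(p) != d),
            "deleted": sorted(p for p in self.known if p not in current),
        }
        self.known = current
        return summary


class Robot:
    """Runs at the search engine and fetches only what the summary lists."""

    def __init__(self):
        self.index = {}
        self.fetch_count = 0

    def apply(self, summary, fetch):
        for path in summary["modified"]:
            self.index[path] = fetch(path)
            self.fetch_count += 1
        for path in summary["deleted"]:
            self.index.pop(path, None)


site = {"/a.html": "alpha", "/b.html": "beta"}  # stand-in for a web server
monitor, robot = ServerMonitor(), Robot()

robot.apply(monitor.change_summary(site), site.__getitem__)  # first visit: fetch all

site["/b.html"] = "beta v2"
del site["/a.html"]
second = monitor.change_summary(site)
robot.apply(second, site.__getitem__)  # second visit: fetch only the changed page

print(second, robot.fetch_count)
```

In this sketch the second visit costs one fetch instead of two, which is the traffic saving the abstract argues for; a robot crawling without the summary would have to request every document again.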

The Classification System of the Official Documents in the Colonial Period (일제하 조선총독부의 공문서 분류방식)

  • Park, Sung-jin
    • The Korean Journal of Archival Studies
    • /
    • no.5
    • /
    • pp.179-208
    • /
    • 2002
  • In this paper, I explain the dominating/dominated relationship between Japan and colonized Korea by analysing the management system of official documents. I examine the theory and practice of the classification that the office of the Governor-General used to preserve official documents whose production and circulation had ended. In summary, first, the office of the Governor-General and its municipal authorities classified and filed documents according to the nature of the organizations and the regulations on the apportionment of their functions. The apportionment of the central and local organs was not fixed throughout the colonial period but changed over time. The organization and apportionment of the central and local organs reflected changes in colonial policy. As a result, even within the same organ, the composition of documents differed from one period to another. The essential way of classifying documents in the colonial period was to sort the official documents that were to be preserved serially and successively according to each function of the colonial authorities. The filing of documents took place as a direct reflection of how functions were organized and apportioned among the branches of the office of the Governor-General and other governmental organs. However, because the filing of documents was directed at the level of the organ, the members of each organ who were responsible for documents could hardly form a filing unit as a sub-category of the organ itself. Second, Japan constructed the infrastructure of colonial rule through the management system of official documents. After the Kabo Reform, the management system of official documents followed the same principles as those of Japan proper. The office of the Governor-General not only adopted several regulations on the management of official documents but also controlled the arrangement and state of document management in local governmental organizations through constant censorship. The management system of documents was fundamentally based on the reality of colonial rule and neglected many principles of archival science. For example, the office of the Governor-General labelled many policy documents as classified and burnt them solely for administrative and managerial purposes. Those practices were inherited by the document management system of post-colonial Korea and resulted in the scrapping of official documents in large quantities, because the system produced too many "classified documents".

Web Document Transcoding Technique for Small Display Devices (소형 화면 단말기를 위한 웹 문서 변환 기법)

  • Shin, Hee-Sook;Mah, Pyeong-Soo;Cho, Soo-Sun;Lee, Dong-Woo
    • The KIPS Transactions:PartD
    • /
    • v.9D no.6
    • /
    • pp.1145-1156
    • /
    • 2002
  • We propose a web document transcoding technique that translates existing web pages designed for desktop computers into a form appropriate for hand-held devices connected to the wireless Internet. By defining a content block based on visual separation and using it as the minimum unit for the analysis and conversion processes, we can convert web pages more accurately. We also reallocate the content blocks and generate a new index in order to provide a convenient interface without left-right scrolling on small-screen devices. Compared with existing approaches such as text-level summarization or partial extraction, these methods provide efficient navigation and full recognition of web documents. To obtain these transcoding benefits, we propose the Layout-Forming Tag Analysis Algorithm, which analyzes the structural tags that give rise to visual separation, and the Component Grouping Algorithm, which extracts the content blocks. We also classify and rearrange the content blocks and generate the new index to produce a form of web page appropriate for small display devices. We designed and implemented our transcoding system in a proxy server and evaluated the methods and algorithms through an analysis of the transcoded results. Our transcoding system produced good results on most popular web pages with complicated structures.

Automatic Text Summarization based on Selective Copy mechanism against for Addressing OOV (미등록 어휘에 대한 선택적 복사를 적용한 문서 자동요약)

  • Lee, Tae-Seok;Seon, Choong-Nyoung;Jung, Youngim;Kang, Seung-Shik
    • Smart Media Journal
    • /
    • v.8 no.2
    • /
    • pp.58-65
    • /
    • 2019
  • Automatic text summarization is the process of shortening a text document by either extraction or abstraction. Recent work applies abstractive approaches based on deep learning methods that scale to large document collections. Abstractive text summarization relies on pre-generated word embedding information. Low-frequency but salient words, such as technical terms, are seldom included in the dictionary; this is known as the out-of-vocabulary (OOV) problem. OOV words degrade the performance of neural encoder-decoder models. To address OOV words in abstractive text summarization, we propose a copy mechanism that facilitates copying new words in the target document while generating summary sentences. Unlike previous studies, the proposed approach combines accurate pointing information with a selective copy mechanism based on a bidirectional RNN and a bidirectional LSTM. In addition, a neural network gate model that estimates the generation probability and a loss function that optimizes the entire abstraction model are applied. The dataset was constructed from a collection of abstracts and titles of journal articles. Experimental results demonstrate that both ROUGE-1 (based on word recall) and ROUGE-L (based on the longest common subsequence) of the proposed encoder-decoder model improved to 47.01 and 29.55, respectively.
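The two metrics named in this abstract can be sketched in their simplest recall form. This is an illustrative simplification that assumes whitespace tokenization and a single reference summary; published ROUGE scores usually come from a standard toolkit that may add stemming and report an F-measure, so this sketch would not reproduce the reported 47.01 and 29.55.

```python
from collections import Counter


def rouge1_recall(reference, candidate):
    """Fraction of reference words that also appear in the candidate (clipped counts)."""
    ref, cand = Counter(reference.split()), Counter(candidate.split())
    overlap = sum(min(n, cand[tok]) for tok, n in ref.items())
    return overlap / max(sum(ref.values()), 1)


def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0]
        for j, y in enumerate(b, start=1):
            cur.append(prev[j - 1] + 1 if x == y else max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


def rouge_l_recall(reference, candidate):
    """LCS length divided by the reference length."""
    ref, cand = reference.split(), candidate.split()
    return lcs_length(ref, cand) / max(len(ref), 1)


reference = "the cat sat on the mat"
candidate = "the cat lay on a mat"
r1 = rouge1_recall(reference, candidate)
rl = rouge_l_recall(reference, candidate)
print(round(r1, 3), round(rl, 3))
```

ROUGE-1 rewards shared words regardless of position, while ROUGE-L rewards words that appear in the same order, so a copy mechanism that restores OOV words in the right place in the summary can raise both.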