• Title/Summary/Keyword: Similar Document

Search Result 244, Processing Time 0.02 seconds

News Recommendation Exploiting Document Summarization based on Deep Learning (딥러닝 기반의 문서요약기법을 활용한 뉴스 추천)

  • Heu, Jee-Uk
    • The Journal of the Institute of Internet, Broadcasting and Communication
    • /
    • v.22 no.4
    • /
    • pp.23-28
    • /
    • 2022
  • Recently smart device(such as smart phone and tablet PC) become a role as an information gateway, using of the web news by multiple users from the web portal has been more important things. However, the quantity of creating web news on the web makes hard to catch the information which the user wants and confuse the users cause of the similar and repeated contents. In this paper, we propose the news recommend system using the document summarization based on KoBART which gives the selected news to users from the candidate news on the news portal. As a result, our proposed system shows higher performance and recommending the news efficiently by pre-training and fine-tuning the KoBART using collected news data.

Firm Classification based on MBTI Organizational Character Type: Using Firm Review Big Data (MBTI 조직성격유형화에 따른 기업분류: 기업리뷰 빅데이터를 활용하여)

  • Lee, Hanjun;Shin, Dongwon;An, Byungdae
    • Asia-Pacific Journal of Business
    • /
    • v.12 no.3
    • /
    • pp.361-378
    • /
    • 2021
  • Purpose - The purpose of this study is to classify KOSPI listed companies according to their organizational character type based on MBTI. Design/methodology/approach - This study collected 109,989 reviews from an online firm review website, Jobplanet. Using these reviews and the descriptions about organizational character, we conducted document similarity analysis. Doc2Vec technique was hired for the analysis. Findings - First, there are more companies belonging to Extraversion(E), Intuition(N), Feeling(F), and Judging(J) than Introversion(I), Sensing(S), Thinking(T), and Perceiving(P) as organizational character types of MBTI. Second, more companies have EJ and EP as the behavior type and NT and NF as the decision-making type. Third, the top-3 organizational character type of which firms have among 16 types are ENTJ, ENFP, and ENFJ. Finally, companies belonging to the same industry group were found to have similar organizational character. Research implications or Originality - This study provides a noble way to measure organizational character type using firm review big data and document similarity analysis technique. The research results can be practically used for firms in their organizational diagnosis and organizational management, and are meaningful as a basic study for various future studies to empirically analyze the impact of organizational character.

A study on the current archival system of government document in korea (한국의 현행 정부기록보존제도에 대한 고찰)

  • 김상호
    • Journal of Korean Library and Information Science Society
    • /
    • v.25
    • /
    • pp.225-256
    • /
    • 1996
  • The purpose of this study is to find the ways that will be helpful to improve archival system and to establish National Archives. The contents of this study were focused on comparing the characteristics of archival system in governmental administration,: Executive, Legislature, and Judiciary. They were also focused on analyzing the problems of those current system. This research was basically conducted depending on the detailed articles of the legislation and regulation pertinent to the archives. Two major results of the research are 1) There are much differences among the governmental administrations in structuring and organizing for archival administrations. Archival works of government document are divided primarily according to the period of conservation and it is necessary to establish the regional archives and central management system, and to employ archivists as an expert staff. 2) The principles and methods of archival process, such as transferences, classifications, preservations, access, and destructions are similar to each other. In order to improve and co-ordinate current systems, it is necessary to constitute several councils endowed with consultative and decision-making power.

  • PDF

A Study on the application of International Transport Law to electronic bill of lading (전자식(電子式) 선하증권(船荷證券)과 국제운송규칙(國際運送規則))

  • Yang, Jung-Ho
    • THE INTERNATIONAL COMMERCE & LAW REVIEW
    • /
    • v.20
    • /
    • pp.369-385
    • /
    • 2003
  • Contracts of carriage evidenced by bill of lading which are made between carrier and unidentified number of the shipper are to a large extent regulated by statute law such as Hague-Visby Rules and Hamburg Rules. These rules qualifies the contractual liberty of parties and especially restrains the carrier from introducing exemption from his liability beyond those admitted by the Rules. However, these Rules are applied only to goods in respect of which a bill of lading or similar document of title has been issued. In this reason, it is possible that liability of carrier in respect of goods shipped could become an issue where electronic bill of lading is used instead of paper bill of lading because electronic bill of lading is not generally recognised document of title in existing rule. Thus, this article discuss the relation between the carrier who create electronic bill of lading and the Rules regulating liability of carrier. Also, new Rules which has been examining in UNCITRAL will be introduced.

  • PDF

A Feature -Based Word Spotting for Content-Based Retrieval of Machine-Printed English Document Images (내용기반의 인쇄체 영문 문서 영상 검색을 위한 특징 기반 단어 검색)

  • Jeong, Gyu-Sik;Gwon, Hui-Ung
    • Journal of KIISE:Software and Applications
    • /
    • v.26 no.10
    • /
    • pp.1204-1218
    • /
    • 1999
  • 문서영상 검색을 위한 디지털도서관의 대부분은 논문제목과/또는 논문요약으로부터 만들어진 색인에 근거한 제한적인 검색기능을 제공하고 있다. 본 논문에서는 영문 문서영상전체에 대한 검색을 위한 단어 영상 형태 특징기반의 단어검색시스템을 제안한다. 본 논문에서는 검색의 효율성과 정확도를 높이기 위해 1) 기존의 단어검색시스템에서 사용된 특징들을 조합하여 사용하며, 2) 특징의 개수 및 위치뿐만 아니라 특징들의 순서를 포함하여 매칭하는 방법을 사용하며, 3) 특징비교에 의해 검색결과를 얻은 후에 여과목적으로 문자인식을 부분적으로 적용하는 2단계의 검색방법을 사용한다. 제안된 시스템의 동작은 다음과 같다. 문서 영상이 주어지면, 문서 영상 구조가 분석되고 단어 영역들의 조합으로 분할된다. 단어 영상의 특징들이 추출되어 저장된다. 사용자의 텍스트 질의가 주어지면 이에 대응되는 단어 영상이 만들어지며 이로부터 영상특징이 추출된다. 이 참조 특징과 저장된 특징들과 비교하여 유사한 단어를 검색하게 된다. 제안된 시스템은 IBM-PC를 이용한 웹 환경에서 구축되었으며, 영문 문서영상을 이용하여 실험이 수행되었다. 실험결과는 본 논문에서 제안하는 방법들의 유효성을 보여주고 있다. Abstract Most existing digital libraries for document image retrieval provide a limited retrieval service due to their indexing from document titles and/or the content of document abstracts. This paper proposes a word spotting system for full English document image retrieval based on word image shape features. In order to improve not only the efficiency but also the precision of a retrieval system, we develop the system by 1) using a combination of the holistic features which have been used in the existing word spotting systems, 2) performing image matching by comparing the order of features in a word in addition to the number of features and their positions, and 3) adopting 2 stage retrieval strategies by obtaining retrieval results by image feature matching and applying OCR(Optical Charater Recognition) partly to the results for filtering purpose. The proposed system operates as follows: given a document image, its structure is analyzed and is segmented into a set of word regions. Then, word shape features are extracted and stored. Given a user's query with text, features are extracted after its corresponding word image is generated. This reference model is compared with the stored features to find out similar words. The proposed system is implemented with IBM-PC in a web environment and its experiments are performed with English document images. Experimental results show the effectiveness of the proposed methods.

An effective detection method for hiding data in compound-document files (복합문서 파일에 은닉된 데이터 탐지 기법에 대한 연구)

  • Kim, EunKwang;Jeon, SangJun;Han, JaeHyeok;Lee, MinWook;Lee, Sangjin
    • Journal of the Korea Institute of Information Security & Cryptology
    • /
    • v.25 no.6
    • /
    • pp.1485-1494
    • /
    • 2015
  • Traditionally, data hiding has been done mainly in such a way that insert the data into the large-capacity multimedia files. However, the document files of the previous versions of Microsoft Office 2003 have been used as cover files as their structure are so similar to a File System that it is easy to hide data in them. If you open a compound-document file which has a secret message hidden in it with MS Office application, it is hard for users who don't know whether a secret message is hidden in the compound-document file to detect the secret message. This paper presents an analysis of Compound-File Binary Format features exploited in order to hide data and algorithms to detect the data hidden with these exploits. Studying methods used to hide data in unused area, unallocated area, reserved area and inserted streams led us to develop an algorithm to aid in the detection and examination of hidden data.

Semantic Clustering Model for Analytical Classification of Documents in Cloud Environment (클라우드 환경에서 문서의 유형 분류를 위한 시맨틱 클러스터링 모델)

  • Kim, Young Soo;Lee, Byoung Yup
    • The Journal of the Korea Contents Association
    • /
    • v.17 no.11
    • /
    • pp.389-397
    • /
    • 2017
  • Recently semantic web document is produced and added in repository in a cloud computing environment and requires an intelligent semantic agent for analytical classification of documents and information retrieval. The traditional methods of information retrieval uses keyword for query and delivers a document list returned by the search. Users carry a heavy workload for examination of contents because a former method of the information retrieval don't provide a lot of semantic similarity information. To solve these problems, we suggest a key word frequency and concept matching based semantic clustering model using hadoop and NoSQL to improve classification accuracy of the similarity. Implementation of our suggested technique in a cloud computing environment offers the ability to classify and discover similar document with improved accuracy of the classification. This suggested model is expected to be use in the semantic web retrieval system construction that can make it more flexible in retrieving proper document.

An Experimental Study on Feature Selection Using Wikipedia for Text Categorization (위키피디아를 이용한 분류자질 선정에 관한 연구)

  • Kim, Yong-Hwan;Chung, Young-Mee
    • Journal of the Korean Society for information Management
    • /
    • v.29 no.2
    • /
    • pp.155-171
    • /
    • 2012
  • In text categorization, core terms of an input document are hardly selected as classification features if they do not occur in a training document set. Besides, synonymous terms with the same concept are usually treated as different features. This study aims to improve text categorization performance by integrating synonyms into a single feature and by replacing input terms not in the training document set with the most similar term occurring in training documents using Wikipedia. For the selection of classification features, experiments were performed in various settings composed of three different conditions: the use of category information of non-training terms, the part of Wikipedia used for measuring term-term similarity, and the type of similarity measures. The categorization performance of a kNN classifier was improved by 0.35~1.85% in $F_1$ value in all the experimental settings when non-learning terms were replaced by the learning term with the highest similarity above the threshold value. Although the improvement ratio is not as high as expected, several semantic as well as structural devices of Wikipedia could be used for selecting more effective classification features.

Gathering Common-word and Document Reclassification to improve Accuracy of Document Clustering (문서 군집화의 정확률 향상을 위한 범용어 수집과 문서 재분류 알고리즘)

  • Shin, Joon-Choul;Ock, Cheol-Young;Lee, Eung-Bong
    • The KIPS Transactions:PartB
    • /
    • v.19B no.1
    • /
    • pp.53-62
    • /
    • 2012
  • Clustering technology is used to deal efficiently with many searched documents in information retrieval system. But the accuracy of the clustering is satisfied to the requirement of only some domains. This paper proposes two methods to increase accuracy of the clustering. We define a common-word, that is frequently used but has low weight during clustering. We propose the method that automatically gathers the common-word and calculates its weight from the searched documents. From the experiments, the clustering error rates using the common-word is reduced to 34% compared with clustering using a stop-word. After generating first clusters using average link clustering from the searched documents, we propose the algorithm that reevaluates the similarity between document and clusters and reclassifies the document into more similar clusters. From the experiments using Naver JiSikIn category, the accuracy of reclassified clusters is increased to 1.81% compared with first clusters without reclassification.

Design and Implementation of XML Document Transformation System based on Structured Differences Analysis (구조적 상이성 분석에 기반한 XML 문서 변환 시스템의 설계 및 구현)

  • Jo, Jeong-Gil;Jo, Yun-Gi;Gu, Yeon-Seol
    • The KIPS Transactions:PartD
    • /
    • v.9D no.2
    • /
    • pp.297-306
    • /
    • 2002
  • This paper handles the design and implementation of the system for transforming the XML document bated on XML Schema being different in syntax but similar in logic, with using structured differences analysis. In the system, the merge data is generated from the source and destination documents by utilizing data registry and structured differences analysis, and then XML document is generated from the generated merge data. The XML document transformation system is designed that transformation process to the present application system from the different application system gains advantage in the aspect of time, cost, and reliability. The implementation environment of the system is that it is run on IBM compatible PC and it is developed using the software of visual basic 6.0 with the Platform of Windows 2000.