• Title/Summary/Keyword: document analysis


Design of E-Document Management System Using Dynamic Group Key based on OOXML (OOXML기반의 동적 그룹키를 이용한 전자문서 관리 시스템의 설계)

  • Lee, Young-Gu;Kim, Hyun-Chul;Jung, Taik-Yeong;Jun, Moon-Seog
    • The Journal of Korean Institute of Communications and Information Sciences
    • /
    • v.34 no.12B
    • /
    • pp.1407-1417
    • /
    • 2009
  • We propose an e-document management system that can provide segmented page information of a document according to different levels of authority in an access control environment. The proposed system creates hierarchy identifiers using a one-way hash chain and therefore, unlike existing systems, does not need to hold key information for every user. In addition, by creating group keys that combine the hash-chain hierarchy identifier with a randomly generated group identifier, the system can respond flexibly to dynamic changes caused by group members joining or leaving, while also resolving the key generation and management problems of document encryption techniques that use a symmetric key for each page. Finally, a comparative experiment with existing e-document management systems showed that the proposed system is superior in document encryption and decryption efficiency and in per-page encryption and decryption speed.
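The key hierarchy described above can be sketched as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: SHA-256 as the one-way function, plain concatenation as the way the hierarchy identifier and the random group identifier are combined, and the per-page key derivation are all assumptions, and every function name is hypothetical.

```python
import hashlib
import secrets

def h(data: bytes) -> bytes:
    # One-way function for the chain; SHA-256 is an assumed choice.
    return hashlib.sha256(data).digest()

def hierarchy_ids(seed: bytes, levels: int) -> list:
    # ids[0] belongs to the highest authority level; ids[k] = H(ids[k-1]).
    ids = [seed]
    for _ in range(levels - 1):
        ids.append(h(ids[-1]))
    return ids

def derive_lower(own_id: bytes, own_level: int, target_level: int) -> bytes:
    # Hashing forward reaches any lower level; going up would require inverting H.
    if target_level < own_level:
        raise PermissionError("a lower level cannot derive a higher-level identifier")
    value = own_id
    for _ in range(target_level - own_level):
        value = h(value)
    return value

def group_key(hier_id: bytes, group_id: bytes) -> bytes:
    # Hypothetical combination of a hierarchy identifier and a random group identifier.
    return h(hier_id + group_id)

def page_key(gkey: bytes, page_no: int) -> bytes:
    # Hypothetical per-page symmetric key derived from the group key.
    return h(gkey + page_no.to_bytes(4, "big"))

seed = secrets.token_bytes(32)
ids = hierarchy_ids(seed, levels=4)
group_id = secrets.token_bytes(16)

# A level-1 user derives the level-2 identifier and therefore the level-2 group key.
k2 = group_key(derive_lower(ids[1], 1, 2), group_id)
assert k2 == group_key(ids[2], group_id)

# Membership change: draw a fresh group identifier; the hash chain itself is unchanged.
new_group_id = secrets.token_bytes(16)
assert group_key(ids[2], new_group_id) != k2
```

In this sketch only the chain seed and the current group identifiers have to be stored, so key material does not grow with the number of users. After a member leaves, the new group identifier would have to be distributed only to the remaining members; how that distribution works is outside this sketch.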

Study on History Tracking Technique of the Document File through RSID Analysis in MS Word (MS 워드의 RSID 분석을 통한 문서파일 이력 추적 기법 연구)

  • Joun, Jihun;Han, Jaehyeok;Jung, Doowon;Lee, Sangjin
    • Journal of the Korea Institute of Information Security & Cryptology
    • /
    • v.28 no.6
    • /
    • pp.1439-1448
    • /
    • 2018
  • Electronic document files, including Microsoft Office Word (MS Word) files, have become a major issue in various legal disputes involving privacy, contract forgery, and trade secret leakage. The internal metadata of the OOXML (Office Open XML) format, used since MS Word 2007, stores unique Revision Identifiers (RSIDs). An RSID is a distinct value assigned to a word, sentence, or paragraph that has been created, modified, or deleted since the document was last saved. Document history, such as the addition, correction, or deletion of content and the order of creation, can therefore be tracked using RSIDs. In this paper, we propose a methodology that uses the changes in RSIDs resulting from user behavior to distinguish an original document from a copy and to investigate possible document file leakage.
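To make the RSID idea concrete, the sketch below reads RSIDs from a .docx package using only Python's standard library. The part names `word/document.xml` and `word/settings.xml`, the WordprocessingML namespace, and the `w:rsidR`/`w:rsidRPr`/`w:rsidRDefault`/`w:rsidP`/`w:rsidDel` attributes come from the OOXML format; the comparison heuristic in the hypothetical `rsids_added` function is an assumption for illustration, not the paper's methodology.

```python
import zipfile
from collections import Counter
import xml.etree.ElementTree as ET

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
RSID_ATTRS = ("rsidR", "rsidRPr", "rsidRDefault", "rsidP", "rsidDel")

def rsid_usage(docx) -> Counter:
    # Count how many RSID attributes in the main document part carry each value.
    with zipfile.ZipFile(docx) as zf:
        root = ET.fromstring(zf.read("word/document.xml"))
    counts = Counter()
    for elem in root.iter():
        for attr in RSID_ATTRS:
            value = elem.get(f"{{{W}}}{attr}")
            if value is not None:
                counts[value] += 1
    return counts

def declared_rsids(docx) -> list:
    # RSIDs listed under <w:rsids> in the settings part, in document order.
    with zipfile.ZipFile(docx) as zf:
        if "word/settings.xml" not in zf.namelist():
            return []
        root = ET.fromstring(zf.read("word/settings.xml"))
    rsids = root.find(f"{{{W}}}rsids")
    if rsids is None:
        return []
    return [child.get(f"{{{W}}}val") for child in rsids]

def rsids_added(original, suspect) -> set:
    # Heuristic: RSIDs declared in the suspect file but not in the original
    # suggest editing sessions that happened after the original was saved.
    return set(declared_rsids(suspect)) - set(declared_rsids(original))
```

Both functions accept a path or a binary file object, so a call such as `rsid_usage("contract.docx").most_common(5)` (hypothetical file name) would show which editing sessions touched the most paragraphs and runs.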

Records Management Business Analysis of Certified Electronic Document Center as the 'External Electronic Records Storage Facilities' ('외부전자기록물저장시설'로서 공인전자문서센터의 업무 분석)

  • Lee, Kyungnam
    • The Korean Journal of Archival Studies
    • /
    • no.47
    • /
    • pp.227-254
    • /
    • 2016
  • The purpose of this study is to review whether the Certified Electronic Document Center can perform electronic records preservation work as an 'external electronic records storage facility'. The study begins by describing the current state of the Certified Electronic Document Center. It then reviews the key concepts of archival science in the archival literature and the meaning of electronic records preservation. The review showed that, during the revision of the Archives Law, these concepts were used inconsistently depending on each designated community's interests. Finally, we examined whether the Certified Electronic Document Center is able to perform electronic records preservation work. To this end, we compared and analyzed the Archives Law, the National Archives' standards, and the regulations related to the Certified Electronic Document Center. As a result, we confirmed that the current Certified Electronic Document Center does not have the capability for records management.

A Study on the Influencing Factors of Continuous Usage Intention of Electronic Official Document 24 System (문서24시스템의 지속사용의도에 영향을 미치는 영향에 관한 실증적 연구)

  • Lee, Hong-Jae;Kim, San-Hae;Han, Kyeong-Seok;Han, Sang-Ung
    • Journal of Digital Contents Society
    • /
    • v.19 no.6
    • /
    • pp.1081-1090
    • /
    • 2018
  • This study empirically analyzes the factors that influence the continuous usage intention of the Electronic Official Document 24 System. A total of 236 questionnaires from users who had used or experienced the system were analyzed. The results are as follows. First, Accuracy, Convenience, and Security have a positive (+) effect on Ease of Use, whereas Compatibility and Innovation have no effect on Ease of Use. Second, Accuracy, Convenience, Security, and Innovation have a positive (+) effect on Perceived Usefulness, whereas Compatibility has no effect on Perceived Usefulness. Third, Behavioral Costs have a positive (+) effect on Continuous Intention, and perceived ease of use positively affects both perceived usefulness and Continuous Intention. Finally, perceived usefulness also has a positive effect on Continuous Intention.

Design and Implementation of XML Document Transformation System based on Structured Differences Analysis (구조적 상이성 분석에 기반한 XML 문서 변환 시스템의 설계 및 구현)

  • Jo, Jeong-Gil;Jo, Yun-Gi;Gu, Yeon-Seol
    • The KIPS Transactions:PartD
    • /
    • v.9D no.2
    • /
    • pp.297-306
    • /
    • 2002
  • This paper describes the design and implementation of a system that uses structured differences analysis to transform XML documents based on XML Schemas that differ in syntax but are similar in logic. The system generates merge data from the source and destination documents by using a data registry and structured differences analysis, and then generates the XML document from that merge data. The system is designed so that transforming documents from a different application system to the current application system is advantageous in terms of time, cost, and reliability. It runs on an IBM-compatible PC under Windows 2000 and was developed in Visual Basic 6.0.
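The abstract does not specify how the structured differences analysis or the data registry work internally, so the following is only a minimal sketch of the general idea under simple assumptions: documents are compared by their leaf-element paths, the "data registry" is a plain dictionary mapping source paths to target paths, and all function names are hypothetical. It also assumes each path occurs once, so repeated sibling elements would collapse.

```python
import xml.etree.ElementTree as ET

def leaf_paths(elem, prefix=""):
    # Yield (path, text) for each leaf element, e.g. ("/order/qty", "3").
    path = f"{prefix}/{elem.tag}"
    children = list(elem)
    if not children:
        yield path, (elem.text or "").strip()
    for child in children:
        yield from leaf_paths(child, path)

def structural_diff(source_root, target_root):
    # Compare the leaf-path structure of a source document and a target template.
    src = {p for p, _ in leaf_paths(source_root)}
    dst = {p for p, _ in leaf_paths(target_root)}
    return {"only_in_source": src - dst, "only_in_target": dst - src, "common": src & dst}

def build_merge_data(source_root, registry):
    # Map each source leaf to its target path; unmapped paths are kept unchanged.
    return {registry.get(path, path): text for path, text in leaf_paths(source_root)}

def generate_document(merge_data):
    # Rebuild a target document from path -> text pairs (one element per path).
    root = None
    for path, text in merge_data.items():
        tags = path.strip("/").split("/")
        if root is None:
            root = ET.Element(tags[0])
        elif tags[0] != root.tag:
            raise ValueError(f"path {path!r} does not share root <{root.tag}>")
        node = root
        for tag in tags[1:]:
            child = node.find(tag)
            if child is None:
                child = ET.SubElement(node, tag)
            node = child
        node.text = text
    return root

source = ET.fromstring("<order><cust>Kim</cust><qty>3</qty></order>")
target_template = ET.fromstring("<order><customer/><qty/></order>")
print(structural_diff(source, target_template))
registry = {"/order/cust": "/order/customer"}  # hypothetical registry entry
result = generate_document(build_merge_data(source, registry))
print(ET.tostring(result, encoding="unicode"))  # <order><customer>Kim</customer><qty>3</qty></order>
```

The diff step shows which paths need a registry entry (here `cust` versus `customer`); paths common to both structures pass through unchanged.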

Analysis on Sequence of Ball-pen and Pencil by using Digital Infrared Photography -with Emphasis on the Documents Authentication- (적외선 사진술을 이용한 볼펜과 연필의 선후 관계 분석 -문서감정을 중심으로-)

  • Kim, Yoo-Jin;Youn, Sung-Bin;Har, Dong-Hwan
    • The Journal of the Korea Contents Association
    • /
    • v.11 no.5
    • /
    • pp.481-488
    • /
    • 2011
  • Generally speaking, a document is a mutual promise between two parties and serves as a legally binding basis of trust for a transaction. A document should be produced by mutual agreement, and its credibility is established when the transparency of its production is ensured. Sequence analysis of the steps in producing a document is therefore very important in document appraisal. The purpose of this research is to determine the sequence of erased pencil carbon and ball-point pen ink, and thereby to suggest a method for determining whether mutual agreement was reached when an insurance policy was signed. The method uses infrared photography to analyze whether pencil carbon remains beneath the ball-point pen strokes. If pencil carbon remains beneath a pen stroke, that area absorbs infrared radiation and appears as a dense concentration. Because the method uses a relatively simple infrared photography system, it should be useful for private document appraisal offices.

Web Document Classification Based on Hangeul Morpheme and Keyword Analyses (한글 형태소 및 키워드 분석에 기반한 웹 문서 분류)

  • Park, Dan-Ho;Choi, Won-Sik;Kim, Hong-Jo;Lee, Seok-Lyong
    • The KIPS Transactions:PartD
    • /
    • v.19D no.4
    • /
    • pp.263-270
    • /
    • 2012
  • With the development of high-speed Internet and massive database technology, the number of web documents is increasing rapidly, making automatic classification of these documents increasingly important. In this study, we propose an effective method that extracts document features based on Hangeul morpheme and keyword analyses and automatically classifies unstructured documents by predicting their subjects. To extract document features, we first select terms using a morpheme analyzer, form a keyword set based on term frequency and subject-discriminating power, and score each keyword using its discriminating power. We then generate classification models using commercial software that implements decision trees, neural networks, and SVMs (support vector machines). Experimental results show that the proposed feature extraction method achieves considerable performance in classifying web documents by subject, with an average precision of 0.90 and recall of 0.84 for the decision tree.
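The keyword-selection step above can be illustrated with a small sketch. The abstract does not give the paper's formula for subject-discriminating power, so this uses a hypothetical stand-in: the share of a term's occurrences that fall in its most frequent subject. Pre-tokenized word lists stand in for the Hangeul morpheme analyzer's output, the thresholds are arbitrary, the training data is toy data, and the classifier stage (commercial software in the paper) is not shown.

```python
from collections import Counter, defaultdict

def discriminating_power(docs_by_subject):
    """Return (power, total_tf); power[t] = share of t's occurrences in its dominant subject."""
    per_subject = defaultdict(Counter)
    total = Counter()
    for subject, docs in docs_by_subject.items():
        for tokens in docs:  # tokens stand in for the morpheme analyzer's output
            per_subject[subject].update(tokens)
            total.update(tokens)
    power = {t: max(c[t] for c in per_subject.values()) / n for t, n in total.items()}
    return power, total

def select_keywords(power, total, min_tf=2, min_power=0.7):
    """Keep terms that are frequent enough and concentrated in one subject."""
    return sorted(t for t, n in total.items() if n >= min_tf and power[t] >= min_power)

def keyword_scores(tokens, keywords, power):
    """Score each keyword in one document as term count x discriminating power."""
    counts = Counter(tokens)
    return [counts[k] * power[k] for k in keywords]

training = {
    "sports": [["soccer", "game", "goal"], ["soccer", "player", "game"]],
    "economy": [["stock", "game", "rate"], ["stock", "exchange"]],
}
power, total = discriminating_power(training)
keywords = select_keywords(power, total)  # "game" is excluded: it appears in both subjects
print(keywords)  # ['soccer', 'stock']
print(keyword_scores(["soccer", "soccer", "game"], keywords, power))  # [2.0, 0.0]
```

The resulting score vectors are the kind of input a decision tree, neural network, or SVM would then be trained on.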

The Geometric Layout Analysis of the Document Image Using Connected Components Method and Median Filter (연결요소 방법과 메디안 필터를 이용한 문서영상 기하학적 구조분석)

  • Jang, Dae-Geun;Hwang, Chan-Sik
    • The Journal of Korean Institute of Communications and Information Sciences
    • /
    • v.27 no.8A
    • /
    • pp.805-813
    • /
    • 2002
  • To convert paper documents automatically into electronic documents, document images must be classified into detailed regions such as text, pictures, and tables through geometric layout analysis. However, complex document layouts and pictures of widely varying size and density make the geometric layout of document images difficult to analyze. In this paper, we propose a method that outperforms commercial software and previous methods in region segmentation and classification and in line extraction from table regions. The proposed method can segment a document into detailed regions using a connected components method even when its layout is complex. It also classifies text and pictures using a separable median filter, even when their size and density vary. In addition, it extracts lines from tables by applying a one-dimensional median filter in each of the horizontal and vertical directions, even when the lines are deformed or text is attached to them.
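The table line-extraction step described above can be sketched on a binary image (1 = ink, 0 = background). This is a minimal sketch under assumptions, not the paper's implementation: it uses zero padding and an odd window length chosen by the caller, and it covers only the one-dimensional median filtering, not the connected-components segmentation or text/picture classification.

```python
from statistics import median

def median_filter_1d(values, window):
    """1-D median filter with zero (background) padding; window must be odd."""
    if window % 2 == 0:
        raise ValueError("window must be odd")
    half = window // 2
    padded = [0] * half + list(values) + [0] * half
    return [median(padded[i:i + window]) for i in range(len(values))]

def table_lines(img, window=5):
    """Keep horizontal and vertical runs; isolated runs shorter than window // 2 + 1 are removed."""
    horizontal = [median_filter_1d(row, window) for row in img]
    columns = [median_filter_1d(col, window) for col in zip(*img)]
    vertical = [list(row) for row in zip(*columns)]
    return [[h | v for h, v in zip(hr, vr)] for hr, vr in zip(horizontal, vertical)]

img = [
    [0, 1, 1, 0, 0, 0, 1],  # a short text stroke at columns 1-2
    [0, 0, 0, 0, 0, 0, 1],
    [1, 1, 1, 1, 1, 1, 1],  # horizontal table line
    [0, 0, 0, 0, 0, 0, 1],
    [0, 0, 0, 0, 0, 0, 1],  # column 6 is a vertical table line
]
for row in table_lines(img):
    print(row)
```

Filtering each direction separately keeps long straight runs even where text touches them, because a short stroke never supplies a majority of ink pixels inside the window; the window length trades off removing longer text strokes against losing short line segments.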

Step-by-step Approach for Effective Korean Unknown Word Recognition (한국어 미등록어 인식을 위한 단계별 접근방법)

  • Park, So-Young
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference
    • /
    • 2009.05a
    • /
    • pp.369-372
    • /
    • 2009
  • Recently, newspapers as well as web documents include many newly coined words such as "mid" (meaning "American drama", since "mi" means "America" in Korean and "d" refers to the "d" of drama) and "anseup" (meaning "pathetic", since "an" and "seup" literally mean eyeball and moist, respectively). However, these words degrade the performance of Korean language analysis systems. To recognize such unknown words automatically, this paper proposes a step-by-step approach consisting of an unknown noun recognition phase based on full-text analysis, an unknown verb recognition phase based on web document frequency, and an unknown noun recognition phase based on web document frequency. The full-text analysis phase accurately recognizes unknown words that occur repeatedly in a document, while the two phases based on web document frequency broadly recognize unknown words that occur only once in the document. In addition, the proposed approach separates unknown noun recognition from unknown verb recognition in order to recognize a variety of unknown words. Experimental results show that the proposed approach improves precision by 1.01% and recall by 8.50% compared with a previous approach.
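The three-phase order described above (noun by full-text analysis, then verb by web document frequency, then noun by web document frequency) can be sketched as a simple cascade. Everything concrete here is an assumption: the `web_df` dictionary stands in for a web search document-frequency lookup, the verb endings and the "stem + 하다" test are hypothetical simplifications of real morphological analysis, the thresholds are arbitrary, and the tokens are toy data.

```python
from collections import Counter

# Hypothetical verb endings; a real system would rely on a morphological analyzer.
VERB_ENDINGS = ("했다", "하는", "해서")

def recognize_unknown(tokens, dictionary, web_df, min_repeat=2, min_df=100):
    """Label out-of-dictionary tokens in three phases (thresholds are assumptions)."""
    counts = Counter(tokens)
    labels = {}
    for tok in tokens:
        if tok in dictionary or tok in labels:
            continue
        # Phase 1: unknown noun by full-text analysis (token repeats in this document).
        if counts[tok] >= min_repeat:
            labels[tok] = "noun/full-text"
            continue
        # Phase 2: unknown verb by web document frequency of the stem's "-하다" form.
        verb = next(
            (tok[: -len(e)] + "하다" for e in VERB_ENDINGS
             if tok.endswith(e) and len(tok) > len(e)
             and web_df.get(tok[: -len(e)] + "하다", 0) >= min_df),
            None,
        )
        if verb is not None:
            labels[tok] = f"verb/web ({verb})"
            continue
        # Phase 3: unknown noun by web document frequency of the token itself.
        if web_df.get(tok, 0) >= min_df:
            labels[tok] = "noun/web"
    return labels

tokens = ["미드", "를", "보다", "미드", "안습", "멘붕했다", "xyz"]
dictionary = {"를", "보다"}
web_df = {"안습": 500, "멘붕하다": 300}  # toy document frequencies
print(recognize_unknown(tokens, dictionary, web_df))
```

Running the cheap in-document check first means repeated coinages such as "미드" never need a web lookup, and trying the verb phase before the web noun phase keeps inflected forms from being mislabeled as nouns.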


Feature-selection algorithm based on genetic algorithms using unstructured data for attack mail identification (공격 메일 식별을 위한 비정형 데이터를 사용한 유전자 알고리즘 기반의 특징선택 알고리즘)

  • Hong, Sung-Sam;Kim, Dong-Wook;Han, Myung-Mook
    • Journal of Internet Computing and Services
    • /
    • v.20 no.1
    • /
    • pp.1-10
    • /
    • 2019
  • Because big-data text mining extracts many features and large amounts of data, clustering and classification can suffer from high computational complexity and low reliability of the analysis results. In particular, the term-document matrix obtained through text mining represents term-document features but is a sparse matrix. We designed an advanced genetic algorithm (GA) that extracts features in text mining for a detection model. Term frequency-inverse document frequency (TF-IDF) is used to reflect document-term relationships in feature extraction, and a predetermined number of features is selected through an iterative process. We also used a sparsity score to improve the performance of the detection model: if a spam mail dataset is highly sparse, the detection model performs poorly and it is difficult to search for an optimal detection model. In addition, we find a low-sparsity model that also has a high TF-IDF score by using s(F) as the numerator of the fitness function. We verified the performance of the proposed algorithm by applying it to text classification, and found that it achieves higher performance (speed and accuracy) in attack mail classification.
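A minimal sketch of GA-based feature selection in the spirit of the abstract follows. The paper's exact fitness function is not given beyond s(F) being its numerator, so this assumes s(F) is the summed TF-IDF of the selected columns and divides it by (1 + sparsity), where sparsity is the fraction of zero cells in the selected submatrix. The tournament selection, union-based crossover, swap mutation, and elitism are generic GA choices, not necessarily the authors'; all names are hypothetical.

```python
import math
import random
from collections import Counter

def tfidf_matrix(docs):
    """Rows = documents, columns = sorted vocabulary; tf = count / doc length, idf = log(N / df)."""
    vocab = sorted({t for d in docs for t in d})
    df = Counter(t for d in docs for t in set(d))
    n = len(docs)
    rows = []
    for d in docs:
        c = Counter(d)
        rows.append([c[t] / len(d) * math.log(n / df[t]) for t in vocab])
    return vocab, rows

def fitness(subset, rows):
    """Assumed fitness: s(F) = summed TF-IDF of the subset, divided by (1 + sparsity)."""
    cells = [rows[i][j] for i in range(len(rows)) for j in subset]
    s_f = sum(cells)
    sparsity = sum(1 for v in cells if v == 0) / len(cells)
    return s_f / (1.0 + sparsity)

def tournament(pop, rows, rng):
    # Binary tournament: the fitter of two random individuals wins.
    a, b = rng.sample(pop, 2)
    return a if fitness(a, rows) >= fitness(b, rows) else b

def select_features(rows, k, pop_size=20, generations=30, p_mut=0.2, seed=0):
    """Return k column indices chosen by a simple elitist GA."""
    rng = random.Random(seed)
    n_feat = len(rows[0])
    if not 1 <= k <= n_feat:
        raise ValueError("k must be between 1 and the number of features")
    pop = [frozenset(rng.sample(range(n_feat), k)) for _ in range(pop_size)]
    for _ in range(generations):
        elite = max(pop, key=lambda s: fitness(s, rows))
        children = [elite]  # elitism: keep the best subset found so far
        while len(children) < pop_size:
            p1, p2 = tournament(pop, rows, rng), tournament(pop, rows, rng)
            child = set(rng.sample(sorted(p1 | p2), k))  # crossover: k genes from both parents
            outside = [j for j in range(n_feat) if j not in child]
            if outside and rng.random() < p_mut:  # mutation: swap one selected feature out
                child.remove(rng.choice(sorted(child)))
                child.add(rng.choice(outside))
            children.append(frozenset(child))
        pop = children
    return sorted(max(pop, key=lambda s: fitness(s, rows)))

docs = [["free", "win", "prize"], ["free", "win", "now"], ["meeting", "agenda", "notes"]]
vocab, rows = tfidf_matrix(docs)
print([vocab[j] for j in select_features(rows, k=2)])
```

Fixing the subset size at k matches the abstract's "predetermined number of features"; dividing by (1 + sparsity) rather than by sparsity alone avoids division by zero when the selected columns are fully dense.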