• Title/Summary/Keyword: document analysis

Use Analysis and Evaluation of MEDLIS(MEDical Library Information System) Document Delivery Service (의학학술지종합정보시스템(MEDLIS)의 원문제공서비스 이용 분석과 평가)

  • Chang, Hye-Rhan;Kim, Jeong-A
    • Journal of the Korean Society for Library and Information Science / v.46 no.3 / pp.233-250 / 2012
  • This study assesses the development, current state, and problems of the MEDLIS document delivery service. Analyzing MEDLIS transaction data from 2001 to 2011, we identified a continuous decline in usage, unbalanced contributions by type of institution, high dependence on back issues, differences in use among subfields of medicine, a relatively low success rate, and various reasons for failure. Based on these results, we recommend maintaining the union catalog database, providing technical support for search capability enhancements, establishing a back issue archiving policy, offering user training and publicity, and expanding membership to promote the service.

Fax Sender Verification Technique Based on Pattern Analysis for Preventing Falsification of FAX Documents (팩스 문서 위·변조 방지를 위한 패턴 분석 기반의 팩스 송신처 검증 기법)

  • Kim, Youngho;Choi, Hwangkyu
    • Journal of Digital Contents Society / v.15 no.4 / pp.547-558 / 2014
  • Abuse of fax documents in business processes has recently become common in corporations, government agencies, and financial institutions. Solving this problem requires a technique that prevents falsification of fax documents. In this paper, we propose a new fax sender verification technique based on pattern analysis that prevents falsification using only the received fax document. The proposed technique verifies the fax sender by analyzing the communication signal patterns between the fax sender and receiver and the image patterns in the received fax document. We apply the technique to real-world fax systems, and the experimental results confirm its tamper-proofing effect.

Active Rule Language for XML Document Management (XML 문서 관리를 위한 능동 규칙 언어)

  • Hwang, Jeong-Hee;Ryu, Keun-Ho
    • The KIPS Transactions:PartD / v.10D no.1 / pp.33-44 / 2003
  • XML is the standard for storing and exchanging information on the Web. As applications of XML become more widespread, work on rule-based technology to support reactive functionality for XML documents and XML repositories is progressing rapidly. Active rules consist of event-condition-action triples that automatically perform actions in response to changes in database state, so they meet the new needs of the XML setting. In this paper, we propose both an XML-based active rule language for managing XML documents automatically and an active rule analysis method that guarantees rule termination. Finally, we demonstrate some examples of active rules defined in the proposed language and verify the efficiency of our analysis method by comparing it with another method.
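
To illustrate what a termination analysis for active rules can look like, here is a minimal Python sketch of the classic triggering-graph check: if no rule can (transitively) trigger itself, rule processing must stop. This is not the analysis method proposed in the paper, whose details the abstract does not give. The `Rule` class, its fields, and the event names such as `insert:order` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    """Hypothetical stand-in for an event-condition-action rule."""
    name: str
    event: str          # event that triggers the rule
    raises: tuple = ()  # events the rule's action may generate


def may_not_terminate(rules):
    """Return True if the triggering graph of the rules contains a cycle."""
    # Edge r -> s whenever an event raised by r's action triggers s.
    graph = {r.name: [s.name for s in rules if s.event in r.raises]
             for r in rules}
    WHITE, GREY, BLACK = 0, 1, 2
    color = {name: WHITE for name in graph}

    def visit(node):
        # Depth-first search; reaching a GREY node means a cycle.
        color[node] = GREY
        for nxt in graph[node]:
            if color[nxt] == GREY:
                return True
            if color[nxt] == WHITE and visit(nxt):
                return True
        color[node] = BLACK
        return False

    return any(color[n] == WHITE and visit(n) for n in graph)


acyclic = [
    Rule("r1", "insert:order", ("update:stock",)),
    Rule("r2", "update:stock", ("insert:log",)),
]
cyclic = acyclic + [Rule("r3", "insert:log", ("insert:order",))]
```

An acyclic triggering graph is a sufficient condition for termination, not a necessary one: a cycle only means the rules might loop, since rule conditions are ignored in this sketch.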

Groupware Current Status Analysis Ⅰ (그룹웨어의 현황 분석 Ⅰ)

  • Kim, Sun-Uk;Gim, Bong-Jin
    • IE interfaces / v.10 no.3 / pp.75-93 / 1997
  • Unlike with individual applications, it is extremely hard to obtain user requirements for group systems, because groups have very complicated dynamics. This has led to a proliferation of products with a broad range of contents. This study therefore presents a comparative analysis of groupware products. The products were categorized into three areas: cooperation/document management systems, collaborative writing systems, and decision-making/meeting systems. The systems reviewed here belong to the first area, cooperation/document management systems; the other two areas will be dealt with in detail in Part Ⅱ. The first area divides into two large categories, proprietary groupware products and intranet groupware products, although a natural convergence between the two has been observed. Consequently, the comparative analysis was performed in terms of the functions provided in the two categories and in a combined category. Each group of functions was divided into three parts, basic functions, quasi-basic functions, and others, based on how frequently the functions are provided in the products. Under a stricter rule, the basic functions comprise electronic mail, sanction, bulletin board, document management, scheduling, security, Web browser, and Internet connectivity. This study also provides a framework for an integrated functional model of groupware systems. The basic functions are merged into the model, but the model is flexible enough to include some of the quasi-basic functions as well. In the future, a large number of products are expected to stem from modifications of the functional model.

Firm Classification based on MBTI Organizational Character Type: Using Firm Review Big Data (MBTI 조직성격유형화에 따른 기업분류: 기업리뷰 빅데이터를 활용하여)

  • Lee, Hanjun;Shin, Dongwon;An, Byungdae
    • Asia-Pacific Journal of Business / v.12 no.3 / pp.361-378 / 2021
  • Purpose - The purpose of this study is to classify KOSPI-listed companies according to their organizational character type based on MBTI. Design/methodology/approach - This study collected 109,989 reviews from an online firm review website, Jobplanet. Using these reviews and descriptions of organizational character, we conducted a document similarity analysis with the Doc2Vec technique. Findings - First, more companies belong to Extraversion(E), Intuition(N), Feeling(F), and Judging(J) than to Introversion(I), Sensing(S), Thinking(T), and Perceiving(P) as organizational character types of MBTI. Second, more companies have EJ and EP as the behavior type and NT and NF as the decision-making type. Third, the three most common organizational character types among the 16 types are ENTJ, ENFP, and ENFJ. Finally, companies belonging to the same industry group were found to have similar organizational character. Research implications or Originality - This study provides a novel way to measure organizational character type using firm review big data and a document similarity analysis technique. The results can be used in practice by firms for organizational diagnosis and management, and they serve as a basis for future studies that empirically analyze the impact of organizational character.

Analyzing the Effect of Characteristics of Dictionary on the Accuracy of Document Classifiers (용어 사전의 특성이 문서 분류 정확도에 미치는 영향 연구)

  • Jung, Haegang;Kim, Namgyu
    • Management & Information Systems Review / v.37 no.4 / pp.41-62 / 2018
  • As the volume of unstructured data grows through social media, Internet news articles, and blogs, text analysis and research on it are becoming more important. Since text analysis is mostly performed on a specific domain or topic, constructing and applying a domain-specific dictionary has become increasingly important. The quality of the dictionary has a direct impact on the results of unstructured data analysis, and it matters all the more because the dictionary presents a perspective for the analysis. In the literature, most studies on text analysis have emphasized the importance of dictionaries for obtaining clean, high-quality results. However, the effects of dictionaries have not been rigorously verified, even though the dictionary is known to be the most essential factor in text analysis. In this paper, we generate three dictionaries in different ways from 39,800 news articles and, by defining the concept of the Intrinsic Rate, analyze and verify the effect of each dictionary on the accuracy of document classification. The three construction methods are: 1) a batch method that builds a dictionary based on the frequency of terms across all documents; 2) a method that extracts terms by category and integrates them; and 3) a method that extracts features for each category and integrates them. We compared the accuracy of three artificial neural network-based document classifiers to evaluate the quality of the dictionaries. In the experiment, accuracy tended to increase when the Intrinsic Rate was high, which suggests that the accuracy of document classification can be improved by increasing the Intrinsic Rate of the dictionary.
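
The first two construction methods in the abstract above can be sketched in a few lines of Python. This is only an illustration under simple assumptions (whitespace tokenization, raw term counts, hypothetical function names and toy data); the paper's actual preprocessing, its third feature-based method, and its definition of the Intrinsic Rate are not given in the abstract and are not reproduced here.

```python
from collections import Counter


def batch_dictionary(docs, size):
    """Method 1 (sketch): top terms by frequency over all documents."""
    counts = Counter(tok for text, _ in docs for tok in text.lower().split())
    return {term for term, _ in counts.most_common(size)}


def per_category_dictionary(docs, size_per_category):
    """Method 2 (sketch): top terms within each category, then merged."""
    by_category = {}
    for text, category in docs:
        by_category.setdefault(category, Counter()).update(text.lower().split())
    merged = set()
    for counts in by_category.values():
        merged.update(t for t, _ in counts.most_common(size_per_category))
    return merged


# Toy corpus of (text, category) pairs; purely illustrative.
docs = [
    ("stock market stock", "economy"),
    ("goal match goal goal", "sports"),
]
```

With a dictionary size of one, the batch method keeps only the globally most frequent term, while the per-category method keeps one term for every category. This is the kind of difference in dictionary composition that the study compares.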

Word Extraction from Table Regions in Document Images (문서 영상 내 테이블 영역에서의 단어 추출)

  • Jeong, Chang-Bu;Kim, Soo-Hyung
    • The KIPS Transactions:PartB / v.12B no.4 s.100 / pp.369-378 / 2005
  • A document image is segmented and classified into text, picture, or table regions by document layout analysis, and the words in table regions are significant for keyword spotting because they are more meaningful than the words in other regions. This paper proposes a method to extract words from table regions in document images. Since word extraction from table regions amounts in practice to extracting words from the cell regions that compose the table, the cells must be extracted correctly. In the cell extraction module, the table frame is extracted first by analyzing connected components, and then the intersection points are extracted from the table frame. We correct false intersections using the correlation between neighboring intersections and extract the cells using the intersection information. Text regions in the individual cells are located using the connected component information obtained during cell extraction, and they are segmented into text lines using projection profiles. Finally, we divide the segmented lines into words using gap clustering and special symbol detection. The experiment, performed on table images extracted from Korean documents, shows 99.16% accuracy of word extraction.
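
The last two steps of the abstract above, line segmentation by projection profiles and word splitting by gaps, can be sketched as follows. This is a simplified illustration, not the paper's implementation: the cell is assumed to be an already binarized grid of 0/1 pixels, a fixed gap threshold (`word_gap`, a hypothetical parameter) stands in for gap clustering, and special symbol detection is omitted.

```python
def runs(profile, min_gap=1):
    """Return (start, end) index pairs of non-zero runs in a profile.

    Runs separated by fewer than min_gap zero entries are merged.
    """
    segments = []
    start = None
    gap = 0
    for i, value in enumerate(profile):
        if value:
            if start is None:
                start = i
            gap = 0
        elif start is not None:
            gap += 1
            if gap >= min_gap:
                segments.append((start, i - gap))
                start = None
                gap = 0
    if start is not None:
        segments.append((start, len(profile) - 1 - gap))
    return segments


def segment_words(cell, word_gap=2):
    """Split a binary cell image into word boxes (top, bottom, left, right)."""
    # Horizontal projection profile: ink count per row gives text lines.
    row_profile = [sum(row) for row in cell]
    words = []
    for top, bottom in runs(row_profile):
        line = cell[top:bottom + 1]
        # Vertical projection profile of the line: ink count per column.
        col_profile = [sum(col) for col in zip(*line)]
        for left, right in runs(col_profile, min_gap=word_gap):
            words.append((top, bottom, left, right))
    return words


cell = [
    [0, 0, 0, 0, 0, 0, 0],
    [1, 1, 0, 1, 0, 0, 1],
    [1, 0, 0, 1, 0, 0, 1],
    [0, 0, 0, 0, 0, 0, 0],
    [0, 1, 1, 0, 0, 0, 0],
]
```

In this toy cell, the one-column gap inside the first line is treated as a gap between characters, while the two-column gap separates words.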

A Study on Extracting the Document Text for Unallocated Areas of Data Fragments (비할당 영역 데이터 파편의 문서 텍스트 추출 방안에 관한 연구)

  • Yoo, Byeong-Yeong;Park, Jung-Heum;Bang, Je-Wan;Lee, Sang-Jin
    • Journal of the Korea Institute of Information Security & Cryptology / v.20 no.6 / pp.43-51 / 2010
  • Investigating data in unallocated space is meaningful because it allows deleted data to be examined. File carving can recover contiguous, complete files from unallocated areas, but it cannot recover noncontiguous or incomplete data. Analysis of data fragments is nevertheless needed, because they can contain large amounts of information. The text of Microsoft Word, Excel, PowerPoint, and PDF document files is stored using compression or a specific document format. If part of such a document file remains in an unallocated data fragment, its text can be extracted by exploiting that document format. In this paper, we suggest a method for extracting the text of particular document file types from unallocated data fragments.
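
As a rough illustration of recovering compressed content from a raw fragment, the Python sketch below scans for plausible zlib stream headers and tries to inflate from each one. It is an assumption-laden sketch, not the method proposed in the paper. It relies on the fact that PDF `FlateDecode` streams use the zlib format, and the function name, the `min_output` threshold, and the sample fragment are all hypothetical. Formats that store raw deflate data without a zlib header would need a different probe.

```python
import zlib


def inflate_candidates(fragment, min_output=16):
    """Return (offset, data) pairs for zlib streams found in a byte fragment."""
    results = []
    for offset in range(len(fragment) - 1):
        cmf, flg = fragment[offset], fragment[offset + 1]
        # zlib header check: compression method 8 (deflate) in the low
        # nibble, and the two header bytes form a multiple of 31.
        if cmf & 0x0F != 8 or ((cmf << 8) | flg) % 31 != 0:
            continue
        try:
            # decompressobj tolerates trailing bytes after the stream.
            data = zlib.decompressobj().decompress(fragment[offset:])
        except zlib.error:
            continue  # header matched by chance; not a valid stream
        if len(data) >= min_output:
            results.append((offset, data))
    return results


# Hypothetical fragment: junk bytes, a compressed PDF-like text stream, junk.
payload = b"BT (Hello fragment text) Tj ET " * 4
fragment = b"\x00junk\xff" + zlib.compress(payload) + b"\x00\x00"
```

Random bytes occasionally pass the header check, so a real tool would validate the inflated output further, for example by checking that it looks like text or like PDF content operators.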

Usability Analysis of Structured Abstracts in Journal Articles for Document Clustering (문서 클러스터링을 위한 학술지 논문의 구조적 초록 활용성 연구)

  • Choi, Sang-Hee;Lee, Jae-Yun
    • Journal of the Korean Society for information Management / v.29 no.1 / pp.331-349 / 2012
  • Structured abstracts have been regarded as an essential source of information for representing the topics of journal articles. This study offers an unconventional way of using structured abstracts through an in-depth analysis of their subfields. A structured abstract was segmented into four fields: purpose, design, findings, and values/implications. The fields were compared in terms of document clustering performance. As a result, the purpose statement of an abstract affected the performance of journal article clustering more than any other field. Furthermore, certain types of keywords were identified that should be excluded from document clustering to improve clustering performance, especially with the within-group average clustering method. These keywords were related more strongly to a specific abstract field, such as research design, than to the topic of an article.

Analysis of Massive Scholarly Keywords using Inverted-Index based Bottom-up Clustering (역인덱스 기반 상향식 군집화 기법을 이용한 대규모 학술 핵심어 분석)

  • Oh, Heung-Seon;Jung, Yuchul
    • Journal of the Korea Academia-Industrial cooperation Society / v.19 no.11 / pp.758-764 / 2018
  • Digital documents such as patents, scholarly papers, and research reports have author keywords that summarize the topics of the documents. Different documents are likely to describe the same topic if they share the same keywords. Document clustering groups documents with similar topics using an unsupervised learning method. However, although document clustering is used in various kinds of data analysis, it is difficult to apply to a large number of documents because of its computational complexity. In this case, keywords can be used to cluster and connect massive document collections efficiently. Existing bottom-up hierarchical clustering requires a huge amount of computation and time to cluster a large number of keywords. This paper proposes an inverted-index-based bottom-up clustering method for keywords and analyzes the results of clustering massive keyword sets extracted from scholarly papers and research reports.
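
The general idea of using an inverted index to limit the work of bottom-up keyword clustering can be sketched as follows. This is an illustrative simplification, not the algorithm from the paper: it compares only keyword pairs that co-occur in at least one document, scores them by Jaccard overlap of their posting sets, and merges pairs above a hypothetical `threshold` with a union-find structure (single-link merging). All names and the toy data are assumptions.

```python
from collections import defaultdict
from itertools import combinations


def cluster_keywords(doc_keywords, threshold=0.5):
    """Group keywords whose document sets overlap strongly."""
    # Inverted index: keyword -> set of ids of documents that carry it.
    index = defaultdict(set)
    for doc_id, keywords in doc_keywords.items():
        for kw in keywords:
            index[kw].add(doc_id)

    # Only keywords sharing a document can overlap, so candidate pairs
    # come from each document's keyword list rather than from all pairs.
    candidates = set()
    for keywords in doc_keywords.values():
        candidates.update(combinations(sorted(set(keywords)), 2))

    parent = {kw: kw for kw in index}

    def find(kw):
        # Union-find lookup with path halving.
        while parent[kw] != kw:
            parent[kw] = parent[parent[kw]]
            kw = parent[kw]
        return kw

    scored = sorted(
        ((len(index[a] & index[b]) / len(index[a] | index[b]), a, b)
         for a, b in candidates),
        reverse=True,
    )
    for score, a, b in scored:  # merge the most similar pairs first
        if score < threshold:
            break
        parent[find(a)] = find(b)

    clusters = defaultdict(set)
    for kw in index:
        clusters[find(kw)].add(kw)
    return sorted(sorted(members) for members in clusters.values())


# Toy data: document id -> author keywords.
doc_keywords = {
    1: ["svm", "kernel"],
    2: ["svm", "kernel"],
    3: ["lda", "topic model"],
    4: ["lda", "svm"],
}
```

Restricting comparisons to co-occurring pairs avoids scoring every pair of keywords, which is where naive bottom-up clustering spends most of its time on large keyword sets.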