Search | Korea Science

Document Classification Model Using Web Documents for Balancing Training Corpus Size per Category

Park, So-Young;Chang, Juno;Kihl, Taesuk
- Journal of information and communication convergence engineering
- /
- v.11 no.4
- /
- pp.268-273
- /
- 2013
In this paper, we propose a document classification model using Web documents as a part of the training corpus in order to resolve the imbalance of the training corpus size per category. For the purpose of retrieving the Web documents closely related to each category, the proposed document classification model calculates the matching score between word features and each category, and generates a Web search query by combining the higher-ranked word features and the category title. Then, the proposed document classification model sends each combined query to the open application programming interface of the Web search engine, and receives the snippet results retrieved from the Web search engine. Finally, the proposed document classification model adds these snippet results as Web documents to the training corpus. Experimental results show that the method that considers the balance of the training corpus size per category exhibits better performance in some categories with small training sets.
https://doi.org/10.6109/jicce.2013.11.4.268
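
The query-generation step described in the abstract above can be illustrated with a minimal sketch. The abstract does not give the matching-score formula, so the score used here (a word's relative frequency in a category divided by its relative frequency in the whole corpus) is an assumption, as are the function names and the `search` callable, which stands in for whatever Web search API is used.

```python
from collections import Counter


def matching_scores(docs_by_category):
    """Score each (word, category) pair.

    Assumed score (not from the paper): the word's relative frequency
    inside the category divided by its relative frequency overall.
    """
    cat_counts = {
        cat: Counter(w for doc in docs for w in doc.lower().split())
        for cat, docs in docs_by_category.items()
    }
    total = Counter()
    for counts in cat_counts.values():
        total.update(counts)
    total_n = sum(total.values())

    scores = {}
    for cat, counts in cat_counts.items():
        n = sum(counts.values())
        scores[cat] = {w: (k / n) / (total[w] / total_n) for w, k in counts.items()}
    return scores


def build_queries(docs_by_category, top_k=2):
    """Combine the category title with its top_k highest-scoring words."""
    queries = {}
    for cat, word_scores in matching_scores(docs_by_category).items():
        # Sort by descending score; break ties alphabetically for determinism.
        ranked = sorted(word_scores, key=lambda w: (-word_scores[w], w))
        queries[cat] = " ".join([cat] + ranked[:top_k])
    return queries


def add_web_snippets(docs_by_category, search, top_k=2):
    """Append the snippets returned by search(query) to each category.

    `search` is a hypothetical callable: query string -> iterable of snippets.
    """
    queries = build_queries(docs_by_category, top_k)
    return {
        cat: docs + list(search(queries[cat]))
        for cat, docs in docs_by_category.items()
    }


corpus = {
    "sports": ["the team won the match", "the match ended in a draw"],
    "finance": ["the bank raised the rate"],
}
queries = build_queries(corpus)
```

To balance the corpus as the abstract describes, a caller would presumably request more snippets for the categories with fewer training documents; that policy is left out of the sketch.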

INTEGRATION OF SSM AND IDEF TECHNIQUES FOR ANALYZING DOCUMENT MANAGEMENT PROCESSES

Vachara Peansupap;Udtaporn Theingkuen
- International conference on construction engineering and project management
- /
- 2009.05a
- /
- pp.725-731
- /
- 2009
Construction documents are recognized as an essential component for making decisions and supporting construction processes. In construction, managing project documents is complex because of factors such as document types, stakeholder involvement, document flow, and document flow processes. Inappropriate management of project documents can therefore affect construction work processes, for example by causing delays or poor-quality work. Several information and communication technologies (ICT) have been proposed to overcome problems in document management practice in construction projects. However, the adoption of ICT may be limited by its compatibility with specific document workflows, and a lack of understanding in designing a document system may cause many problems during the use and implementation phases. This paper therefore proposes a framework that integrates the Soft System Methodology (SSM) concept and the Integrated Definition Modeling Technique (IDEF) for analyzing document management systems in construction projects. The research follows a case study methodology: five main building construction projects are selected as case studies. Qualitative data on problems and processes are collected by interviewing construction project participants such as main contractors, owners, consultants, and designers. The findings from the case studies show the benefits of using SSM and IDEF. SSM helps identify the problems in managing construction documents in a rich-picture view, whereas IDEF illustrates the document flow in a construction project in detail. In addition, integrating the two concepts can be used to identify the root causes of process problems at the information level. As a result, this idea can be applied to analyze and design Web-based document management systems in the future.

A Study on Building Structures and Processes for Intelligent Web Document Classification (지능적인 웹문서 분류를 위한 구조 및 프로세스 설계 연구)

Jang, Young-Cheol
- Journal of Digital Convergence
- /
- v.6 no.4
- /
- pp.177-183
- /
- 2008
This paper aims to offer a solution based on intelligent document classification for building a user-centric information retrieval system that allows user-centric linguistic expression. To this end, it proposes structures that express user intention and a fine-grained document classification process that uses EBL, similarity, a knowledge base, and user intention. To overcome the need for large amounts of exact semantic information, a hybrid process is designed that integrates keyword, thesaurus, probability, and user intention information. A user intention tree hierarchy is built, and a method for extracting group intention between keywords and user intentions is proposed. These structures and processes are implemented in the HDCI (Hybrid Document Classification with Intention) system. HDCI consists of two stages: analyzing user intention and classifying Web documents. The classifying stage is composed of a knowledge base process, a similarity process, and a hybrid coordinating process. With the help of the user-intention-related structures and the hybrid coordinating process, HDCI can efficiently categorize Web documents according to a user's complex linguistic expression with little prior information.

Document Retrieval using Concept Network (개념 네트워크를 이용한 정보 검색 방법)

Hur, Won-Chang;Lee, Sang-Jin
- Asia pacific journal of information systems
- /
- v.16 no.4
- /
- pp.203-215
- /
- 2006
The advent of the KM (knowledge management) concept has led many organizations to seek an effective way to make use of their knowledge. However, the absence of the right tools for systematically handling unstructured information makes it difficult to automatically retrieve and share relevant information that exactly meets users' needs. We propose a systematic method that enables content-based information retrieval from a corpus of unstructured documents. In our method, a document is represented by several key terms that are automatically selected based on their quantitative relevance to the document. The relevance is calculated with the traditional TFIDF measure, which is widely accepted in related research, but to improve the effectiveness of the measure we exploit a 'concept network' that represents term-term relationships. In particular, in constructing the concept network, we also consider the relative positions of terms occurring in a document. A prototype system has been implemented for experiments. The experimental results show that our approach can achieve higher performance than the conventional TFIDF method.
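
As a point of reference for the baseline mentioned in the abstract above, the following is a minimal sketch of selecting key terms per document with a plain TF-IDF weight. It covers only the conventional measure; the paper's concept network and its use of term positions are not described in enough detail here to reproduce, so they are not modeled. The function name, the whitespace tokenization, and the sample documents are assumptions made for illustration.

```python
import math
from collections import Counter


def key_terms(docs, top_k=3):
    """Return the top_k terms of each document ranked by TF-IDF.

    tf  = term count / document length
    idf = log(number of documents / number of documents containing the term)
    """
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for tokens in tokenized for term in set(tokens))

    result = []
    for tokens in tokenized:
        tf = Counter(tokens)
        weights = {
            term: (count / len(tokens)) * math.log(n_docs / df[term])
            for term, count in tf.items()
        }
        # Sort by descending weight; break ties alphabetically.
        ranked = sorted(weights, key=lambda t: (-weights[t], t))
        result.append(ranked[:top_k])
    return result


docs = [
    "knowledge management tools share knowledge",
    "retrieval tools rank documents",
    "concept network links terms",
]
terms = key_terms(docs, top_k=2)
```

A term such as "tools", which appears in two of the three sample documents, receives a lower weight than terms unique to one document, which is the behavior a term-term network would then try to refine.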

Machine Learning Based Automatic Categorization Model for Text Lines in Invoice Documents

Shin, Hyun-Kyung
- Journal of Korea Multimedia Society
- /
- v.13 no.12
- /
- pp.1786-1797
- /
- 2010
Automatic understanding of the contents of a document image is a very hard problem because it involves mathematically challenging problems that originate mainly from the over-determined system induced by the document segmentation process. In both academia and industry, there have been continual and varied efforts to improve the core parts of content retrieval technologies by separating out segmentation-related issues using semi-structured documents, e.g., invoices. In this paper, we proposed classification models for text lines in invoice documents, in which text lines were grouped into five categories according to their contents: purchase order header, invoice header, summary header, surcharge header, and purchase items. Our investigation concentrated on the performance of machine-learning-based models in terms of linear discriminant analysis (LDA) and non-LDA (logic-based) approaches. In the LDA group, naïve Bayesian, k-nearest neighbor, and SVM classifiers were used; in the non-LDA group, decision tree, random forest, and boosting were used. We described the details of feature vector construction and of the model and parameter selection processes, including training and validation. We also presented experimental results comparing the training and classification error levels of the models employed.

Rectification of Document Image on Smartphone Using MSER-b Binarization (MSER-b 이진화 기법을 이용한 스마트폰 문서 이미지 보정 기법)

Yu, Young-Jung;Moon, Sang-Ho;Park, Seong-Ho
- Journal of the Korea Institute of Information and Communication Engineering
- /
- v.19 no.1
- /
- pp.201-207
- /
- 2015
A smartphone with a camera can easily generate a document image in place of a scanner. However, a document image captured with a smartphone can have distortions related to rotation or perspective. In this paper, we propose a method that generates a document image with reduced distortions from a document image captured with a smartphone. The captured image is first preprocessed with the MSER-b technique to reduce lighting effects. Then, the contour of the text area is extracted using the characteristics of the document image. Lastly, rotation or perspective distortions are reduced using the extracted text area contour. In the experiments, the proposed method is compared with two other products. The experiments show that the distortions in a document image captured with a smartphone can be effectively reduced.
https://doi.org/10.6109/jkiice.2015.19.1.201

Current Status and Challenges of Electronic Document Delivery Services

이경호
- Journal of Korean Library and Information Science Society
- /
- v.29
- /
- pp.171-212
- /
- 1998
In this study, the concept, development, and present situation of electronic document delivery services, projects, and systems are examined. The implications of electronic document delivery services for the library and the future of the services are also studied. Some conclusions and a few suggestions derived from the study are as follows: (1) Electronic document delivery services, one of the most innovative methods for delivering needed materials to a researcher, are now becoming an important part of today's information industries. (2) Technological developments have made it possible to deliver nearly all document formats electronically, with a turnaround time as short as 30 minutes. The technology has also made it possible to develop user-friendly document delivery services by providing various methods of requesting, delivering, and charging for the materials. (3) Different types of institutions have researched, tested, developed, and implemented electronic document delivery techniques with different features. (4) The issues of copyright and standards involved in electronic document delivery remain to be solved. (5) The growth and development of patron-initiated document delivery services have had, and will continue to have, an impact on library services, because materials can be delivered directly to end users without the intermediation of librarians. (6) The library could take outside electronic document delivery services as an opportunity. Accordingly, in order to incorporate these services into interlibrary loan, collection development, and other library services, the library should establish appropriate policies, guidelines, and management strategies related to their operation. (7) In order to maximize the use of electronic document delivery services, the library should provide appropriate education so that librarians and users acquire knowledge and skills regarding the changing techniques of electronic document delivery and the various features and changing mechanisms of each system and service.

Design of the Access Control System for MS-WORD Document System (MS-Word 문서 접근 제어시스템 설계)

Jang, Seung-Ju
- Journal of the Korea Institute of Information and Communication Engineering
- /
- v.22 no.10
- /
- pp.1405-1411
- /
- 2018
This paper designs an access control system for the MS-Word (Microsoft Word) document system. The system uses document-related information obtained by analyzing the MS-Word document structure. It blocks users without access rights by partially modifying the MS-Word document information so that the document cannot be read normally. As a result, only users who have access to the MS-Word document can read it, which makes it possible to control access to the document. A user with access to the MS-Word document can restore the modified information to the original information so that the document can be read normally. We design the system and evaluate it experimentally. In the experiments, we attempted document access after the MS-Word document information had been modified. The experimental results show that the MS-Word access control system operates normally.
https://doi.org/10.6109/jkiice.2018.22.10.1405

Clustering of Web Document Exploiting with the Co-link in Hypertext (동시링크를 이용한 웹 문서 클러스터링 실험)

김영기;이원희;권혁철
- Journal of Korean Library and Information Science Society
- /
- v.34 no.2
- /
- pp.233-253
- /
- 2003
Knowledge organization is the way we humans understand the world. Two types of information organization mechanisms are studied in information retrieval: classification and clustering. Classification organizes entities by pigeonholing them into predefined categories, whereas clustering organizes information by grouping similar or related entities together. Internet information resource systems extract keywords from the words that appear in a Web document and build an inverted file. Term clustering, which groups related terms, has not proved very successful, however, and was mostly abandoned for documents written in different languages or for doorway pages composed only of anchor text. This study examines informetric analysis and the possibility of clustering Web documents based on the co-link topology of Web pages.
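
To make the idea of clustering by co-links concrete, here is a minimal sketch. It assumes that two pages are "co-linked" when the same source pages link to both, measures that overlap with a Jaccard coefficient, and merges pages whose overlap reaches a threshold. The similarity measure, the threshold, the single-link merging, and all names and sample data are illustrative assumptions, not the procedure of the study above.

```python
from itertools import combinations


def colink_similarity(inlinks, a, b):
    """Jaccard overlap of the sets of pages that link to a and to b."""
    union = inlinks[a] | inlinks[b]
    return len(inlinks[a] & inlinks[b]) / len(union) if union else 0.0


def cluster_by_colinks(inlinks, threshold=0.5):
    """Group pages whose co-link similarity is at least `threshold`.

    Uses union-find, so clusters are the connected components of the
    graph whose edges are the sufficiently similar page pairs.
    """
    parent = {page: page for page in inlinks}

    def find(page):
        while parent[page] != page:
            parent[page] = parent[parent[page]]  # path halving
            page = parent[page]
        return page

    for a, b in combinations(sorted(inlinks), 2):
        if colink_similarity(inlinks, a, b) >= threshold:
            parent[find(a)] = find(b)

    clusters = {}
    for page in inlinks:
        clusters.setdefault(find(page), set()).add(page)
    return sorted(sorted(members) for members in clusters.values())


# Hypothetical data: target page -> set of source pages linking to it.
inlinks = {
    "A": {"p1", "p2", "p3"},
    "B": {"p1", "p2"},
    "C": {"p9"},
    "D": {"p8", "p9"},
}
clusters = cluster_by_colinks(inlinks)
```

Because the grouping depends only on link structure, it does not rely on the page text, which is the situation the abstract raises for multilingual documents and anchor-only doorway pages.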

Hierarchical Automatic Classification of News Articles based on Association Rules (연관규칙을 이용한 뉴스기사의 계층적 자동분류기법)

Joo, Kil-Hong;Shin, Eun-Young;Lee, Joo-Il;Lee, Won-Suk
- Journal of Korea Multimedia Society
- /
- v.14 no.6
- /
- pp.730-741
- /
- 2011
With the development of the Internet and computer technology, the amount of information available through the Internet is increasing rapidly, and it is managed in document form. For this reason, research into methods for effectively managing large numbers of documents is necessary. Conventional document categorization methods used only the keywords of related documents for classification. In contrast, this paper proposes a keyword extraction method based on association rules. The method extracts a set of related keywords that are associated with a document's category and classifies representative keywords using the classification rules proposed in this paper. In addition, this paper proposes a preprocessing method for efficient keyword generation and predicts the category of a new document. The classifier is designed and its performance is measured through experiments in order to improve the classification performance of a profile. When predicting the category, applying all the classification rules one by one is the major cause of reduced processing performance in a profile. Finally, this paper suggests an automatic categorization scheme that can be applied to a hierarchical category architecture, extended from a simple category architecture.
https://doi.org/10.9717/kmms.2011.14.6.730
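
The general idea of classifying documents with association rules over keywords can be sketched as follows. This is a generic illustration under stated assumptions, not the rule format or algorithm of the paper above: rules map a keyword set to a category when the set occurs often enough (support) and mostly in that category (confidence). The thresholds, the voting scheme, the function names, and the training data are all hypothetical, and the hierarchical extension is not modeled.

```python
from collections import Counter
from itertools import combinations


def mine_rules(labeled_docs, min_support=2, min_confidence=0.8, max_size=2):
    """Return {keyword_tuple: category} for rules meeting both thresholds.

    With min_confidence above 0.5, at most one category can qualify
    for any given keyword set, so rules never overwrite each other.
    """
    itemset_count = Counter()
    itemset_cat_count = Counter()
    for keywords, category in labeled_docs:
        unique = sorted(set(keywords))
        for size in range(1, max_size + 1):
            for combo in combinations(unique, size):
                itemset_count[combo] += 1
                itemset_cat_count[(combo, category)] += 1

    rules = {}
    for (combo, category), n in itemset_cat_count.items():
        if n >= min_support and n / itemset_count[combo] >= min_confidence:
            rules[combo] = category
    return rules


def classify(rules, keywords):
    """Predict the category with the most matching rules, or None."""
    present = set(keywords)
    votes = Counter(
        category for combo, category in rules.items() if present.issuperset(combo)
    )
    return votes.most_common(1)[0][0] if votes else None


train = [
    (["election", "vote", "party"], "politics"),
    (["election", "party", "policy"], "politics"),
    (["match", "goal", "team"], "sports"),
    (["match", "team", "party"], "sports"),
]
rules = mine_rules(train)
```

In this toy data, "party" alone does not become a rule because it appears in both categories, whereas the pair ("election", "party") does. Checking every rule against every new document, as `classify` does, is the kind of one-by-one rule application the abstract identifies as a performance cost.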

Search Result 613, Processing Time 0.032 seconds
