• Title/Summary/Keyword: engineering document

A Machine-Learning Based Approach for Extracting Logical Structure of a Styled Document

  • Kim, Tae-young;Kim, Suntae;Choi, Sangchul;Kim, Jeong-Ah;Choi, Jae-Young;Ko, Jong-Won;Lee, Jee-Huong;Cho, Youngwha
    • KSII Transactions on Internet and Information Systems (TIIS) / v.11 no.2 / pp.1043-1056 / 2017
  • A styled document is a document that contains diverse decorating functions such as different fonts, colors, tables and images, generally authored in a word processor (e.g., MS-WORD, Open Office). Compared to a plain-text document, a styled document enables a human to easily recognize its logical structure, such as sections, subsections and contents. However, it is difficult for a computer to recognize the structure if the writer does not explicitly specify the type of each element by using the styling functions of the word processor. This is one of the obstacles to enhancing document version management systems, because they currently manage a document with the file, not the document element, as the unit of management. This paper proposes a machine-learning based approach to analyzing the logical structure of a styled document composed of sections, subsections and contents. We first suggest a feature vector for characterizing document elements from a styled document, composed of eight features such as font size, indentation and period, each of which is a frequently discovered item in a styled document. Then, we train machine learning classifiers such as Random Forest and Support Vector Machine using the suggested feature vector. The trained classifiers are used to automatically identify the logical structure of a styled document. Our experiment obtained 92.78% precision and 94.02% recall in analyzing the logical structure of 50 styled documents.
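
The feature-vector idea in the abstract above can be illustrated with a minimal sketch. It covers only the three features the abstract names (font size, indentation, period); the other five are not listed there and are left out. The `Element` type, the `feature_vector` function, and the normalization by body font size are assumptions made for illustration, not the authors' implementation.

```python
from dataclasses import dataclass


@dataclass
class Element:
    """One paragraph-level element of a styled document (hypothetical type)."""
    text: str
    font_size: float  # font size in points
    indent: float     # left indentation in points


def feature_vector(el: Element, body_font_size: float) -> list[float]:
    """Map an element to three of the features named in the abstract."""
    return [
        el.font_size / body_font_size,                    # relative font size (assumed scaling)
        el.indent,                                        # indentation
        1.0 if el.text.rstrip().endswith(".") else 0.0,   # ends with a period
    ]


# Illustrative input: a heading-like element and a body-like element.
elements = [
    Element("1. Introduction", 16.0, 0.0),
    Element("Styled documents are common.", 11.0, 12.0),
]
vectors = [feature_vector(e, body_font_size=11.0) for e in elements]
```

Vectors of this shape, paired with labels such as section, subsection, or contents, could then be passed to a classifier of the kind the abstract mentions (for example a Random Forest).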

Keyword Analysis Based Document Compression System

  • Cao, Kerang;Lee, Jongwon;Jung, Hoekyung
    • Journal of information and communication convergence engineering / v.16 no.1 / pp.48-51 / 2018
  • Traditional document analysis centered on word-based systems implemented using a morpheme analyzer. These traditional systems can classify the words used in a document, but they cannot help the user understand or analyze the document. To solve this problem, a system needs to extract the most valuable paragraphs, which can help the user understand the document. In this paper, we propose a system that extracts paragraphs from a normalized XML document. The user enters the filename of the XML document to be analyzed. The system then searches for the keywords of the document and shows the keywords found. When the user selects and enters a desired keyword, the system extracts the paragraphs that include the keyword. After extracting the paragraphs, the system maintains the paragraph sequence and checks for duplication. If duplicates exist, the system deletes the duplicate paragraphs. Finally, the system reports to the user the frequency and weight of each keyword, together with the sorted paragraphs.
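
The extraction steps described in the abstract above (select paragraphs containing a keyword, keep document order, drop duplicates, count keyword frequency) can be sketched roughly as follows. The `extract_paragraphs` function, the `<p>` tag name, and the case-insensitive substring matching are assumptions for illustration. The abstract does not say how keyword weights are computed, so weighting is omitted.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def extract_paragraphs(xml_text: str, keywords: list[str], tag: str = "p"):
    """Return unique keyword-bearing paragraphs in document order, plus keyword counts."""
    root = ET.fromstring(xml_text)
    seen = set()
    selected = []
    counts = Counter()
    for node in root.iter(tag):
        # Collapse whitespace so identical paragraphs compare equal.
        text = " ".join("".join(node.itertext()).split())
        lowered = text.lower()
        # Skip duplicates and paragraphs that contain none of the keywords.
        if text in seen or not any(k.lower() in lowered for k in keywords):
            continue
        seen.add(text)
        selected.append(text)
        for k in keywords:
            counts[k] += lowered.count(k.lower())
    return selected, counts


# Illustrative input; the element names are hypothetical.
sample = """<doc>
  <p>XML stores structured data.</p>
  <p>Weather was fine.</p>
  <p>XML stores structured data.</p>
  <p>A parser reads XML and checks the data.</p>
</doc>"""

paragraphs, counts = extract_paragraphs(sample, ["xml", "data"])
```

In this sketch the duplicate third paragraph is dropped and the paragraph without keywords is skipped, so `paragraphs` keeps the first and fourth paragraphs in their original order.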

Combining Distributed Word Representation and Document Distance for Short Text Document Clustering

  • Kongwudhikunakorn, Supavit;Waiyamai, Kitsana
    • Journal of Information Processing Systems / v.16 no.2 / pp.277-300 / 2020
  • This paper presents a method for clustering short text documents, such as news headlines, social media statuses, or instant messages. Due to the characteristics of these documents, which are usually short and sparse, an appropriate technique is required to discover hidden knowledge. The objective of this paper is to identify the combination of document representation, document distance, and document clustering that yields the best clustering quality. Document representations are expanded by external knowledge sources represented by a Distributed Representation. To cluster documents, a K-means partitioning-based clustering technique is applied, where the similarities of documents are measured by word mover's distance. To validate the effectiveness of the proposed method, experiments were conducted to compare the clustering quality against several leading methods. The proposed method produced clusters of documents that resulted in higher precision, recall, F1-score, and adjusted Rand index for both real-world and standard data sets. Furthermore, manual inspection of the clustering results was conducted to observe the efficacy of the proposed method. The topics of each document cluster are undoubtedly reflected by members in the cluster.

The Design and Implementation of SGML Document Editing System Using Document Structure Information (문서 구조정보를 이용한 SGML 문서 편집 시스템의 설계 및 구현)

  • Kim, Chang-Su;Jo, In-June;Jung, Hoe-Kyung
    • The Journal of Engineering Research / v.3 no.1 / pp.21-27 / 1998
  • This paper describes the design and implementation of a system for editing SGML document instances using the document structure information of an SGML DTD. The system uses a structure window that expresses the logical structure of the document, so that SGML documents can be edited without user editing mistakes and updated easily. It provides tools that support the editing of elements, attributes, and entities, and it validates the produced document using an SGML parser. It also supports Korean and English text using KS 5601. The proposed SGML document editing system uses the common controls of Windows 95 for its window user interface.

Design of Document-HTML Generation Technique for Authorized Electronic Document Communication (공인전자문서 소통을 위한 Document-HTML 문서 생성 기법의 설계)

  • Hwang, Hyun-Cheon;Kim, Woo-Je
    • Journal of Korean Society of Industrial and Systems Engineering / v.44 no.1 / pp.51-59 / 2021
  • Electronic document communication based on a digital channel is becoming increasingly important with the advent of the paperless age. An electronic document in PDF format replaces a paper document by providing content integrity and independence from various devices and software, but it does not provide a powerful customer experience for a mobile device user. On the other hand, an electronic document in HTML5 format offers an enhanced customer experience, such as responsive web technology for a mobile device user, but it is weak in content integrity because there is no HTML5 specification for content integrity. In this paper, we design the Document-HTML, which provides both content integrity and a powerful customer experience by declaring HTML5 constraint rules and extended tags that contain a digital signature based on PKI. We analyze existing electronic documents that have been used in a major financial enterprise to develop a sample. We also verify the Document-HTML by experimenting with the sample of HTML electronic communication documents and analyzing the PKI equation. In the future, the Document-HTML document can be used for authorized electronic document communication and can provide a powerful customer experience in the mobile environment between an enterprise and a user.

Exploratory Study on BIM-based Information Breakdown Structure for Construction Document Management

  • Lee, Dong Gun;Cha, Hee Sung
    • Journal of Construction Engineering and Project Management / v.5 no.1 / pp.32-39 / 2015
  • The construction industry is an aggregate of information in which diverse information is integrated and controlled. Information management is therefore very important for implementing construction projects successfully. In particular, because information on construction sites is controlled in the form of documents, the importance of document management in construction has increased. However, controlling information through documents makes it difficult to write and classify the documents and to preserve and utilize the information. In addition, incomplete information management systems make systematic information management difficult. For this reason, this study suggests a document information breakdown structure for efficiently controlling the document information generated at construction sites. To this end, through an examination of preceding studies, it establishes the concepts of the document information breakdown structure, the space breakdown structure, and the information breakdown structure, with the aim of heightening the availability of document information.

Document Image Binarization Technique using MSER (MSER을 이용한 문서 이미지 이진화 기법)

  • Yu, Young-Jung
    • Journal of the Korea Institute of Information and Communication Engineering / v.18 no.8 / pp.1941-1947 / 2014
  • Document image binarization is widely used as a preliminary stage of document recognition, and the result of document recognition is strongly affected by the result of binarization. There have been many studies on binarizing document images, but their results vary according to the state of the document images. In this paper, we propose a technique for document image binarization using MSER, which is applied to extract objects from an image. First, raw MSER objects are extracted from a document image. Because the raw MSER objects cannot be used as-is for document image binarization, they are modified. The final MSER objects are then used for document image binarization together with the contrast image extracted from the document image. Experimental results show that the proposed technique is useful for document image binarization.

INTEGRATION OF SSM AND IDEF TECHNIQUES FOR ANALYZING DOCUMENT MANAGEMENT PROCESSES

  • Vachara Peansupap;Udtaporn Theingkuen
    • International conference on construction engineering and project management / 2009.05a / pp.725-731 / 2009
  • Construction documents are recognized as an essential component for making decisions and supporting construction processes. In construction, the management of project documents is a complex process due to factors such as document types, stakeholder involvement, document flow, and document flow processes. Inappropriate management of project documents can therefore affect construction work processes, causing delays or poor quality of work. Several information and communication technologies (ICT) have been proposed to overcome problems in document management practice in construction projects. However, the adoption of ICT may be limited by its compatibility with a specific document workflow. A lack of understanding of document system design may cause many problems during the use and implementation phases. Thus, this paper proposes a framework that integrates the Soft System Methodology (SSM) concept and the Integrated Definition Modeling Technique (IDEF) for analyzing document management systems in construction projects. The research methodology is a case study. Five main building construction projects were selected as case studies. Qualitative data on problems and processes were collected by interviewing construction project participants such as main contractors, owners, consultants, and designers. The findings from the case studies show the benefits of using SSM and IDEF. SSM can help identify the problems in managing construction documents in a rich picture view, whereas IDEF can illustrate the document flow in a construction project in detail. In addition, integrating these two concepts can help identify the root causes of process problems at the information level. As a result, this idea can be applied to analyze and design web-based document management systems in the future.

Electronic Document Management System based on JAVA,CORBA (분산환경에서의 JAVA,CORBA를 이용한 전자문서관리시스템 구현)

  • 김형선;한성배
    • Journal of Korean Society of Industrial and Systems Engineering / v.21 no.48 / pp.193-199 / 1998
  • An electronic document management system (EDMS) is a tool, based on the document life cycle concept, for the structured management of various documents within an organization. In this paper, we describe the development process of an electronic document management system based on pure JAVA and CORBA. We have developed an electronic document management system that can support a variety of platforms in a heterogeneous distributed environment. The EDMS can serve as an integration platform for industries that require handling of massive documents and data, such as the construction and engineering, automobile, and shipbuilding industries. Using the developed system, users can access documents in the system through an internet browser, and can also add documents or modify existing ones.

Development of Common Document Structure based on XML for Representing Mechanical Part and Assembly Information (기계 조립품 정보의 표현을 위한 XML기반 공용문서 구조)

  • 정태형;박승현;윤성원
    • Journal of the Korean Society for Precision Engineering / v.20 no.9 / pp.180-187 / 2003
  • In an engineering design environment, it is hard to link design data and systems because their types are disparate. Therefore, the importance of metadata has increased. Some research has been carried out to develop metadata, but the resulting metadata cannot interact with other metadata and are difficult to extend. The purpose of this paper is to develop a common document structure that represents the general information of a mechanical part assembly using XML, and to use it as the base documents for integrating design data and systems. It is composed of part, assembly and user documents. The part document represents the information of a part independently of part type. The assembly document represents the location of its constituent part documents. The user document represents the user's information. Common documents can be used as a broker between design data and systems, and can improve the interpretability and reusability of documents. We applied the developed common document structure to a 2-stage spur gear drive.