• Title/Abstract/Keywords: engineering document

1,248 search results (processing time: 0.033 seconds)

A Machine-Learning Based Approach for Extracting Logical Structure of a Styled Document

  • Kim, Tae-young;Kim, Suntae;Choi, Sangchul;Kim, Jeong-Ah;Choi, Jae-Young;Ko, Jong-Won;Lee, Jee-Huong;Cho, Youngwha
    • KSII Transactions on Internet and Information Systems (TIIS)
    • /
    • Vol. 11, No. 2
    • /
    • pp.1043-1056
    • /
    • 2017
  • A styled document is a document that contains diverse decorative features, such as different fonts, colors, tables, and images, and is generally authored in a word processor (e.g., MS-WORD, Open Office). Compared to a plain-text document, a styled document enables a human to easily recognize its logical structure, such as sections, subsections, and contents. However, it is difficult for a computer to recognize the structure if the writer does not explicitly specify the type of each element by using the styling functions of the word processor. This is one of the obstacles to enhancing document version management systems, because they currently manage a document with the file as the unit rather than the document element. This paper proposes a machine-learning-based approach to analyzing the logical structure of a styled document composed of sections, subsections, and contents. We first suggest a feature vector for characterizing document elements in a styled document, composed of eight features such as font size, indentation, and period, each of which is frequently found in styled documents. Then, we train machine learning classifiers such as Random Forest and Support Vector Machine using the suggested feature vector. The trained classifiers are used to automatically identify the logical structure of a styled document. Our experiment obtained 92.78% precision and 94.02% recall in analyzing the logical structure of 50 styled documents.
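The feature-vector idea in the abstract above can be sketched as follows. This is only an illustration under stated assumptions: the abstract names three of the eight features (font size, indentation, period), so the `Element` type and the two extra features (bold flag, word count) are hypothetical and are not taken from the paper. Classifier training is left out; the resulting vectors would be passed to a classifier such as a Random Forest.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Element:
    """One paragraph-like element of a styled document (hypothetical type)."""
    text: str
    font_size: float      # in points
    indent: float         # left indentation, in points
    bold: bool = False    # assumed feature, not named in the abstract


def feature_vector(el: Element) -> list[float]:
    """Map an element to numeric features that a classifier could consume."""
    stripped = el.text.strip()
    return [
        el.font_size,                             # feature named in the abstract
        el.indent,                                # feature named in the abstract
        1.0 if stripped.endswith(".") else 0.0,   # the "period" feature
        1.0 if el.bold else 0.0,                  # assumed extra feature
        float(len(stripped.split())),             # assumed extra feature: word count
    ]


heading = Element("1. Introduction", font_size=16.0, indent=0.0, bold=True)
body = Element("A styled document contains fonts and tables.", font_size=10.0, indent=12.0)

print(feature_vector(heading))
print(feature_vector(body))
```

A heading tends to have a larger font, no trailing period, and few words, which is the kind of signal such features are meant to expose to the classifier.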

Keyword Analysis Based Document Compression System

  • Cao, Kerang;Lee, Jongwon;Jung, Hoekyung
    • Journal of information and communication convergence engineering
    • /
    • Vol. 16, No. 1
    • /
    • pp.48-51
    • /
    • 2018
  • Traditional document analysis has centered on words, with systems implemented using a morpheme analyzer. These traditional systems can classify the words used in a document, but they do not help the user understand or analyze the document. To solve this problem, a system needs to extract the most valuable paragraphs, which can help the user understand the document. In this paper, we propose a system that extracts paragraphs from a normalized XML document. The user enters the filename of the XML document to be analyzed. The system then searches for the keywords of the document and shows the keywords it found. When the user selects and enters the desired keywords, the system extracts the paragraphs that include them. After extracting the paragraphs, the system maintains the paragraph sequence and checks for duplication. If duplicates exist, the system deletes the duplicated paragraphs. Finally, the system reports the result to the user: the frequency and weight of each keyword, and the sorted paragraphs.
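The extraction steps described in the abstract above (select paragraphs containing a keyword, keep their order, drop duplicates, count keyword frequency) can be sketched roughly as below. The `<p>` tag name, the sample document, and the function name are assumptions made for illustration. The abstract does not define how the keyword weight is computed, so it is omitted here.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical sample input; the paper's XML schema is not given in the abstract.
XML_DOC = """<doc>
  <p>XML documents carry structure.</p>
  <p>A keyword search finds XML paragraphs.</p>
  <p>XML documents carry structure.</p>
  <p>Morpheme analyzers classify words.</p>
</doc>"""


def extract_paragraphs(xml_text, keywords, tag="p"):
    """Return (paragraphs, frequency) for paragraphs containing any keyword."""
    root = ET.fromstring(xml_text)
    seen = set()      # paragraph texts already kept, for duplicate removal
    kept = []         # matching paragraphs in document order
    freq = Counter()  # occurrences of each keyword in the kept paragraphs
    for node in root.iter(tag):
        text = "".join(node.itertext()).strip()
        lowered = text.lower()
        hits = {k: lowered.count(k.lower()) for k in keywords}
        if not any(hits.values()):
            continue  # paragraph contains none of the keywords
        if text in seen:
            continue  # duplicate paragraph, drop it
        seen.add(text)
        kept.append(text)
        freq.update(hits)
    return kept, freq


paragraphs, frequency = extract_paragraphs(XML_DOC, ["XML", "keyword"])
for p in paragraphs:
    print(p)
print(dict(frequency))
```

Matching here is a case-insensitive substring count, which is a simplification; a real system would likely match on the morpheme or token level.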

Combining Distributed Word Representation and Document Distance for Short Text Document Clustering

  • Kongwudhikunakorn, Supavit;Waiyamai, Kitsana
    • Journal of Information Processing Systems
    • /
    • Vol. 16, No. 2
    • /
    • pp.277-300
    • /
    • 2020
  • This paper presents a method for clustering short text documents, such as news headlines, social media statuses, or instant messages. Due to the characteristics of these documents, which are usually short and sparse, an appropriate technique is required to discover hidden knowledge. The objective of this paper is to identify the combination of document representation, document distance, and document clustering that yields the best clustering quality. Document representations are expanded by external knowledge sources represented by a Distributed Representation. To cluster documents, a K-means partitioning-based clustering technique is applied, where the similarities of documents are measured by word mover's distance. To validate the effectiveness of the proposed method, experiments were conducted to compare the clustering quality against several leading methods. The proposed method produced clusters of documents that resulted in higher precision, recall, F1-score, and adjusted Rand index for both real-world and standard data sets. Furthermore, manual inspection of the clustering results was conducted to observe the efficacy of the proposed method. The topics of each document cluster are undoubtedly reflected by members in the cluster.

The Design and Implementation of SGML Document Editing System Using Document Structure Information

  • 김창수;조인준;정회경
    • 공학논문집
    • /
    • Vol. 3, No. 1
    • /
    • pp.21-27
    • /
    • 1998
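  • In this paper, we design and implement a system for editing SGML document instances using the document structure information of an SGML DTD (Document Type Definition). SGML documents are edited through a structure window that represents the logical structure of the document, so even users unfamiliar with SGML can create documents without editing errors. Tools that support elements, attributes, and entities make it easy to modify elements and related items, and the system is designed so that the created documents can be validated with an SGML parser. The system also uses the KS 5601 code to support both Korean and English text. The SGML document editing system designed in this paper was implemented in the Windows 95 environment to provide a windowed user interface.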

Design of Document-HTML Generation Technique for Authorized Electronic Document Communication

  • 황현천;김우제
    • 산업경영시스템학회지
    • /
    • Vol. 44, No. 1
    • /
    • pp.51-59
    • /
    • 2021
  • Electronic document communication over digital channels is becoming increasingly important with the advent of the paperless age. An electronic document in PDF format replaces a paper document by providing content integrity and independence from devices and software, but it does not provide a powerful customer experience for mobile device users. An electronic document in HTML5 format, on the other hand, offers an enhanced customer experience, such as responsive web technology for mobile device users, but is weak in content integrity because the HTML5 specification has no provision for it. In this paper, we design Document-HTML, which provides both content integrity and a powerful customer experience by declaring HTML5 constraint rules and extended tags that contain a PKI-based digital signature. We analyze existing electronic documents used in major financial enterprises to develop a sample. We also verify Document-HTML by experimenting with the sample HTML electronic communication documents and analyze the PKI equation. In the future, a Document-HTML document can be used for authorized electronic document communication and provide a powerful customer experience in the mobile environment between an enterprise and a user.

Exploratory Study on BIM-based Information Breakdown Structure for Construction Document Management

  • Lee, Dong Gun;Cha, Hee Sung
    • Journal of Construction Engineering and Project Management
    • /
    • Vol. 5, No. 1
    • /
    • pp.32-39
    • /
    • 2015
  • The construction industry is an aggregate in which diverse information is integrated and controlled, so information management is very important to successful construction projects. In particular, because information at construction sites is controlled in the form of documents, the importance of document management in construction has increased. However, controlling information through documents creates difficulties in writing and classifying the documents and in preserving and utilizing the information. In addition, the incompleteness of information management systems makes systematic information management difficult. For this reason, this study suggests a document information breakdown structure for efficiently controlling the document information generated at construction sites. To this end, it examines preceding studies and establishes the concepts of the document information breakdown structure, the space breakdown structure, and the information breakdown structure, with the aim of heightening the availability of document information.

Document Image Binarization Technique using MSER

  • 유영중
    • 한국정보통신학회논문지
    • /
    • Vol. 18, No. 8
    • /
    • pp.1941-1947
    • /
    • 2014
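  • Binarization of document images is mainly used as a step preceding document recognition, and it is an important step because its success affects the recognition results. Various techniques for binarizing document images have been studied so far, but their results vary with the condition of the document image. In this paper, we propose a technique for binarizing document images using MSER (Maximally Stable Extremal Region), which is widely used for object extraction. First, MSER objects are extracted from the document image. Because the extracted MSER objects are difficult to use directly for document image binarization, they go through a process that changes them into a suitable form. Then, the final binary image is computed using the final MSER objects and a contrast binary image extracted from the document image. The experimental results show that the proposed method is useful for binarizing document images.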

INTEGRATION OF SSM AND IDEF TECHNIQUES FOR ANALYZING DOCUMENT MANAGEMENT PROCESSES

  • Vachara Peansupap;Udtaporn Theingkuen
    • 국제학술발표논문집
    • /
    • The 3rd International Conference on Construction Engineering and Project Management
    • /
    • pp.725-731
    • /
    • 2009
  • Construction documents are recognized as an essential component for making decisions and supporting construction processes. In construction, the management of project documents is a complex process due to factors such as document types, stakeholder involvement, document flow, and document flow processes. Inappropriate management of project documents can therefore affect construction work processes in several ways, such as delays or poor quality of work. Several information and communication technologies (ICT) have been proposed to overcome problems in document management practice in construction projects. However, the adoption of ICT may be limited by its compatibility with specific document workflows, and a lack of understanding in designing a document system may cause many problems during the use and implementation phases. Thus, this paper proposes a framework that integrates the Soft System Methodology (SSM) concept and the Integrated Definition Modeling Technique (IDEF) for analyzing document management systems in construction projects. The research methodology is a case study. Five main building construction projects are selected as case studies. Qualitative data on problems and processes are collected by interviewing construction project participants such as main contractors, owners, consultants, and designers. The findings from the case studies show the benefits of using SSM and IDEF. SSM can help identify the problems in managing construction documents in a rich picture view, whereas IDEF can illustrate the document flow in a construction project in detail. In addition, integrating these two concepts can help identify the root causes of process problems at the information level. As a result, this idea can be applied to analyze and design web-based document management systems in the future.

Electronic Document Management System based on JAVA, CORBA

  • 김형선;한성배
    • 산업경영시스템학회지
    • /
    • Vol. 21, No. 48
    • /
    • pp.193-199
    • /
    • 1998
  • An electronic document management system (EDMS) is a tool, based on the document life cycle concept, for the structured management of various documents within an organization. In this paper, we describe the development process of an electronic document management system based on pure JAVA and CORBA. We have developed an electronic document management system that can support a variety of platforms in a heterogeneous distributed environment. The EDMS can serve as an integration platform for industries that require the handling of massive amounts of documents and data, such as the construction and engineering, automobile, and shipbuilding industries. Using the developed system, users can access documents in the system through an internet browser, and can also add documents or modify existing ones.

Development of Common Document Structure based on XML for Representing Mechanical Part and Assembly Information

  • 정태형;박승현;윤성원
    • 한국정밀공학회지
    • /
    • Vol. 20, No. 9
    • /
    • pp.180-187
    • /
    • 2003
  • In an engineering design environment, it is hard to link design data and systems because they are of disparate types. Therefore, the importance of metadata has increased. Some research has been carried out to develop metadata, but the resulting metadata cannot interact with other metadata and are difficult to extend. The purpose of this paper is to develop a common document structure that represents the general information of mechanical parts and assemblies using XML, and to use it as a set of base documents for integrating design data and systems. It is composed of part, assembly, and user documents. A part document represents the information of a part independently of the part type. An assembly document represents the locations of its constituent part documents. A user document represents the user's information. Common documents can be used as a broker between design data and systems, and they can improve the interpretability and reusability of documents. We applied the developed common document structure to a 2-stage spur gear drive.
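The part and assembly documents described in the abstract above might look something like the following sketch. The element names (`PartDocument`, `AssemblyDocument`, `PartRef`), the attribute names, and the file paths are all hypothetical, since the abstract does not give the paper's actual schema. The sketch only illustrates the stated idea that an assembly document records the locations of its constituent part documents.

```python
import xml.etree.ElementTree as ET


def part_document(part_id, name):
    """Build a part document holding type-independent part information."""
    part = ET.Element("PartDocument", id=part_id)
    ET.SubElement(part, "Name").text = name
    return part


def assembly_document(assembly_id, part_locations):
    """Build an assembly document that points at its part documents."""
    asm = ET.Element("AssemblyDocument", id=assembly_id)
    for part_id, href in part_locations.items():
        # Each reference stores where the constituent part document lives.
        ET.SubElement(asm, "PartRef", id=part_id, href=href)
    return asm


gear = part_document("gear-1", "Spur gear")
asm = assembly_document(
    "gear-drive",
    {"gear-1": "parts/gear-1.xml", "shaft-1": "parts/shaft-1.xml"},
)

print(ET.tostring(gear, encoding="unicode"))
print(ET.tostring(asm, encoding="unicode"))
```

Keeping part data in separate documents and referencing them by location is one way for an assembly to reuse the same part document in several places.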