• Title/Abstract/Keywords: Business Document

Search Result 489, Processing Time 0.035 seconds

A Study on Implementing EDMS in Windows Explorer (EDMS의 Windows 탐색기 상에서의 구현방안 연구)

  • Jang, Man-Cheol;Bang, Gyeong-Sik;Kim, Jong-Bae
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference / 2015.05a / pp.744-746 / 2015
  • An EDMS (Enterprise Document Management System) is generally deployed as an office business management solution in the form of an application running in a web browser, and has been used as an electronic document management system. Because users must access the system through the browser, it is inconvenient to use. To improve ease of use, this study presents a way to implement a system that lets users operate the EDMS from within Windows Explorer.


A Methodology for Automatic Multi-Categorization of Single-Categorized Documents (단일 카테고리 문서의 다중 카테고리 자동확장 방법론)

  • Hong, Jin-Sung;Kim, Namgyu;Lee, Sangwon
    • Journal of Intelligence and Information Systems / v.20 no.3 / pp.77-92 / 2014
  • Recently, numerous documents including unstructured data and text have been created due to the rapid increase in the usage of social media and the Internet. Each document is usually assigned a specific category for the convenience of users. In the past, categorization was performed manually. However, manual categorization cannot guarantee accuracy and requires a large amount of time and cost. Many studies have therefore been conducted on automatic categorization to overcome the limitations of manual categorization. Unfortunately, most of these methods cannot be applied to complex documents with multiple topics because they assume that one document can be assigned to only one category. To overcome this limitation, some studies have attempted to categorize each document into multiple categories. However, they are also limited in that their learning process requires training on a multi-categorized document set. These methods therefore cannot be applied to multi-categorization of most documents unless multi-categorized training sets are provided. To remove the requirement of a multi-categorized training set imposed by traditional multi-categorization algorithms, we propose a new methodology that can extend the category of a single-categorized document to multiple categories by analyzing relationships among categories, topics, and documents. First, we find the relationship between documents and topics by using the result of topic analysis for single-categorized documents. Second, we construct a correspondence table between topics and categories by investigating the relationship between them. Finally, we calculate the matching scores of each document for multiple categories.
The results imply that a document can be classified into a certain category if and only if the matching score is higher than the predefined threshold. For example, we can classify a document into three categories whose matching scores exceed the predefined threshold. The main contribution of our study is that our methodology can improve the applicability of traditional multi-category classifiers by generating multi-categorized documents from single-categorized documents. Additionally, we propose a module for verifying the accuracy of the proposed methodology. For performance evaluation, we performed intensive experiments with news articles. News articles are clearly categorized by theme, and they contain less vulgar language and slang than other typical text documents. We collected news articles from July 2012 to June 2013. The number of articles varies widely across categories. This is because readers have different levels of interest in each category, and because events occur with different frequency in each category. To minimize the distortion caused by the differing numbers of articles per category, we extracted 3,000 articles from each of the eight categories, so the total number of articles used in our experiments was 24,000. The eight categories were "IT Science," "Economy," "Society," "Life and Culture," "World," "Sports," "Entertainment," and "Politics." Using the collected news articles, we calculated the document/category correspondence scores from the topic/category and document/topic correspondence scores. The document/category correspondence score indicates the degree to which each document corresponds to a certain category. As a result, we could present two additional categories for each of the 23,089 documents.
Precision, recall, and F-score were revealed to be 0.605, 0.629, and 0.617 respectively when only the top 1 predicted category was evaluated, whereas they were revealed to be 0.838, 0.290, and 0.431 when the top 1 - 3 predicted categories were considered. It was very interesting to find a large variation between the scores of the eight categories on precision, recall, and F-score.
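The scoring step described in this abstract can be illustrated with a minimal sketch. It assumes the document/category score is a weighted sum over topics of the document/topic and topic/category scores; the abstract does not give the exact formula, and the function names, weights, and threshold below are hypothetical.

```python
def document_category_scores(doc_topics, topic_categories):
    """Combine document/topic and topic/category scores.

    doc_topics: topic weights for one document.
    topic_categories: category name -> per-topic weights.
    Assumption: the score is a weighted sum over topics.
    """
    return {
        category: sum(d * t for d, t in zip(doc_topics, weights))
        for category, weights in topic_categories.items()
    }


def assign_categories(scores, threshold):
    """Return categories whose score exceeds the threshold, best first."""
    selected = [c for c, s in scores.items() if s > threshold]
    return sorted(selected, key=lambda c: scores[c], reverse=True)


def f_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Hypothetical document spread over three topics.
doc_topics = [0.6, 0.3, 0.1]
topic_categories = {
    "Economy": [0.8, 0.1, 0.0],
    "Politics": [0.1, 0.7, 0.2],
    "Sports": [0.0, 0.1, 0.9],
}
scores = document_category_scores(doc_topics, topic_categories)
labels = assign_categories(scores, threshold=0.2)
```

With these made-up weights the document receives "Economy" and "Politics". The F-scores reported in the abstract agree with the harmonic mean of the reported precision and recall: `f_score(0.605, 0.629)` rounds to 0.617 and `f_score(0.838, 0.290)` rounds to 0.431.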

A New Approach to Automatic Keyword Generation Using Inverse Vector Space Model (키워드 자동 생성에 대한 새로운 접근법: 역 벡터공간모델을 이용한 키워드 할당 방법)

  • Cho, Won-Chin;Rho, Sang-Kyu;Yun, Ji-Young Agnes;Park, Jin-Soo
    • Asia pacific journal of information systems / v.21 no.1 / pp.103-122 / 2011
  • Recently, numerous documents have been made available electronically. Internet search engines and digital libraries commonly return query results containing hundreds or even thousands of documents. In this situation, it is virtually impossible for users to examine complete documents to determine whether they might be useful. For this reason, some online documents are accompanied by a list of keywords specified by the authors to guide users and facilitate the filtering process. A set of keywords is thus often considered a condensed version of the whole document and therefore plays an important role in document retrieval, Web page retrieval, document clustering, summarization, text mining, and so on. Since many academic journals ask authors to provide a list of five or six keywords on the first page of an article, keywords are most familiar in the context of journal articles. However, many other types of documents could also benefit from the use of keywords, including Web pages, email messages, news reports, magazine articles, and business papers. Although the potential benefit is large, the implementation itself is the obstacle: manually assigning keywords to all documents is a daunting or even impractical task, because it is extremely tedious and time-consuming and requires a certain level of domain knowledge. Therefore, it is highly desirable to automate the keyword generation process. There are two main approaches to achieving this aim: the keyword assignment approach and the keyword extraction approach. Both approaches use machine learning methods and require, for training purposes, a set of documents with keywords already attached. In the former approach, a vocabulary is given, and the aim is to match its terms to the texts. In other words, the keyword assignment approach seeks to select the words from a controlled vocabulary that best describe a document.
Although this approach is domain dependent and is not easy to transfer and expand, it can generate implicit keywords that do not appear in a document. In the latter approach, on the other hand, the aim is to extract keywords according to their relevance in the text without a prior vocabulary. In this approach, automatic keyword generation is treated as a classification task, and keywords are commonly extracted using supervised learning techniques. Thus, keyword extraction algorithms classify candidate keywords in a document into positive or negative examples. Several systems, such as Extractor and Kea, were developed using the keyword extraction approach. The most indicative words in a document are selected as keywords for that document, and as a result keyword extraction is limited to terms that appear in the document. Therefore, keyword extraction cannot generate implicit keywords that are not included in a document. According to the experimental results of Turney, about 64% to 90% of keywords assigned by the authors can be found in the full text of an article. Conversely, this means that 10% to 36% of the keywords assigned by the authors do not appear in the article and cannot be generated by keyword extraction algorithms. Our preliminary experiment also shows that 37% of keywords assigned by the authors are not included in the full text. This is the reason why we decided to adopt the keyword assignment approach. In this paper, we propose a new approach for automatic keyword assignment, namely IVSM (Inverse Vector Space Model). The model is based on the vector space model, a conventional information retrieval model that represents documents and queries as vectors in a multidimensional space. IVSM generates an appropriate keyword set for a specific document by measuring the distance between the document and the keyword sets.
The keyword assignment process of IVSM is as follows: (1) calculating the vector length of each keyword set based on each keyword weight; (2) preprocessing and parsing a target document that does not have keywords; (3) calculating the vector length of the target document based on the term frequency; (4) measuring the cosine similarity between each keyword set and the target document; and (5) generating keywords that have high similarity scores. Two keyword generation systems were implemented using IVSM: an IVSM system for a Web-based community service and a stand-alone IVSM system. The first is implemented in a community service for sharing knowledge and opinions on current trends such as fashion, movies, social problems, and health information. The stand-alone IVSM system is dedicated to generating keywords for academic papers, and it has been tested on a number of academic papers, including those published by the Korean Association of Shipping and Logistics, the Korea Research Academy of Distribution Information, the Korea Logistics Society, the Korea Logistics Research Association, and the Korea Port Economic Association. We measured the performance of IVSM by the number of matches between the IVSM-generated keywords and the author-assigned keywords. According to our experiment, the precision of IVSM applied to the Web-based community service and to academic journals was 0.75 and 0.71, respectively. The performance of both systems is much better than that of baseline systems that generate keywords based on simple probability. IVSM also shows performance comparable to Extractor, a representative keyword extraction system developed by Turney. As the number of electronic documents increases, we expect that the IVSM proposed in this paper can be applied to many electronic documents in Web-based communities and digital libraries.
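The five steps listed in this abstract can be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' implementation: each candidate keyword is assumed to carry a term-weight profile learned from already-tagged documents, the document is tokenized by whitespace, and all names and sample data are hypothetical.

```python
import math
from collections import Counter


def vector_length(weights):
    """Euclidean length of a sparse term-weight vector (steps 1 and 3)."""
    return math.sqrt(sum(w * w for w in weights.values()))


def cosine_similarity(a, b):
    """Cosine similarity between two sparse term-weight vectors (step 4)."""
    dot = sum(w * b.get(term, 0.0) for term, w in a.items())
    norm = vector_length(a) * vector_length(b)
    return dot / norm if norm else 0.0


def generate_keywords(text, keyword_profiles, top_n=2):
    """Rank candidate keywords by similarity to the document (step 5)."""
    # Step 2: naive preprocessing, then term frequencies as the document vector.
    doc_vector = Counter(text.lower().split())
    ranked = sorted(
        keyword_profiles,
        key=lambda k: cosine_similarity(keyword_profiles[k], doc_vector),
        reverse=True,
    )
    return ranked[:top_n]


# Hypothetical keyword profiles (assumed to come from tagged training documents).
keyword_profiles = {
    "logistics": {"shipping": 2.0, "port": 1.0, "cargo": 1.0},
    "fashion": {"dress": 2.0, "style": 1.0},
    "health": {"diet": 1.0, "exercise": 2.0},
}
text = "port cargo shipping delays raise shipping costs"
best = generate_keywords(text, keyword_profiles, top_n=1)
```

Because the keyword comes from the profile set rather than from the document text, this kind of assignment can return a keyword such as "logistics" even though that word never appears in the document.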

Multi-Vector Document Embedding Using Semantic Decomposition of Complex Documents (복합 문서의 의미적 분해를 통한 다중 벡터 문서 임베딩 방법론)

  • Park, Jongin;Kim, Namgyu
    • Journal of Intelligence and Information Systems / v.25 no.3 / pp.19-41 / 2019
  • With the rapidly increasing demand for text data analysis, research and investment in text mining are being actively conducted not only in academia but also in various industries. Text mining is generally conducted in two steps. In the first step, the text of the collected document is tokenized and structured to convert the original document into a computer-readable form. In the second step, tasks such as document classification, clustering, and topic modeling are conducted according to the purpose of analysis. Until recently, text mining studies focused on applications of the second step, such as document classification, clustering, and topic modeling. However, with the discovery that the text structuring process substantially influences the quality of the analysis results, various embedding methods have been actively studied to improve that quality by preserving the meaning of words and documents when representing text data as vectors. Unlike structured data, which can be directly applied to a variety of operations and traditional analysis techniques, unstructured text must first go through a structuring task that transforms the original document into a form that the computer can understand. Mapping arbitrary objects into a space of a specific dimension while maintaining algebraic properties, in order to structure the text data, is called "embedding." Recently, attempts have been made to embed not only words but also sentences, paragraphs, and entire documents. In particular, as the demand for document embedding analysis increases rapidly, many algorithms have been developed to support it. Among them, doc2Vec, which extends word2Vec and embeds each document into one vector, is the most widely used. However, the traditional document embedding method represented by doc2Vec generates a vector for each document using the entire text of the document.
As a result, the document vector is affected not only by core words but also by miscellaneous words. Additionally, traditional document embedding schemes usually map each document to a single corresponding vector. Therefore, it is difficult to accurately represent a complex document with multiple subjects as a single vector using the traditional approach. In this paper, we propose a new multi-vector document embedding method to overcome these limitations of traditional document embedding methods. This study targets documents that explicitly separate body content and keywords. For a document without keywords, the method can be applied after extracting keywords through various analysis methods. However, since this is not the core subject of the proposed method, we describe the process of applying it to documents that predefine keywords in the text. The proposed method consists of (1) Parsing, (2) Word Embedding, (3) Keyword Vector Extraction, (4) Keyword Clustering, and (5) Multiple-Vector Generation. The specific process is as follows. First, all text in a document is tokenized and each token is represented as an N-dimensional real-valued vector through word embedding. After that, to overcome the limitation of the traditional document embedding method, which is affected not only by core words but also by miscellaneous words, the vectors corresponding to the keywords of each document are extracted to form a set of keyword vectors for each document. Next, clustering is conducted on the set of keywords for each document to identify the multiple subjects included in the document. Finally, a multi-vector is generated from the vectors of the keywords constituting each cluster. Experiments on 3,147 academic papers revealed that the single-vector-based traditional approach cannot properly map complex documents because of interference among subjects in each vector.
With the proposed multi-vector based method, we ascertained that complex documents can be vectorized more accurately by eliminating the interference among subjects.
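Steps (3) to (5) of the proposed method can be sketched as follows. The abstract does not name the clustering algorithm or how a cluster is turned into a vector, so this sketch assumes a simple greedy cosine-threshold clustering and uses the mean of each cluster as its vector. The word vectors are hypothetical 2-dimensional stand-ins for real word embeddings.

```python
import math


def cosine(u, v):
    """Cosine similarity between two dense vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def mean_vector(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def multi_vector_embedding(keywords, word_vectors, threshold=0.8):
    """Return one vector per keyword cluster (assumed greedy clustering)."""
    clusters = []  # each cluster is a list of keyword vectors
    for keyword in keywords:
        vector = word_vectors[keyword]  # step 3: keyword vector extraction
        for cluster in clusters:  # step 4: join the first similar cluster
            if cosine(vector, mean_vector(cluster)) >= threshold:
                cluster.append(vector)
                break
        else:
            clusters.append([vector])
    # Step 5: one vector per cluster, i.e. per subject of the document.
    return [mean_vector(cluster) for cluster in clusters]


# Hypothetical word embeddings for the keywords of one document.
word_vectors = {
    "embedding": [1.0, 0.0],
    "vector": [0.9, 0.1],
    "logistics": [0.0, 1.0],
    "shipping": [0.1, 0.9],
}
doc_vectors = multi_vector_embedding(list(word_vectors), word_vectors)
```

In this toy case the document ends up with two vectors, one per subject, instead of a single averaged vector that would sit between the two subjects.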

Message Interoperability in e-Logistics System (e-Logistics시스템의 메시지 상호운용성)

  • Seo Sungbo;Lee Young Joon;Hwang Jaegak;Ryu Keun Ho
    • Journal of KIISE:Computing Practices and Letters / v.11 no.5 / pp.436-450 / 2005
  • Existing B2B and B2C computer systems and applications that execute business transactions have used a client-server architecture consisting of heterogeneous hardware and software, including personal computers and mainframes. With the boom in electronic business, the integration and compatibility of exchanged data, applications, and hardware have emerged as pressing issues. This paper designs and implements a message transport system and a document transformation system in order to solve the interoperability problem of an integrated logistics system in electronic business. The message transport system integrates ebMS 2.0, the standard business message exchange format of ebXML (the international standard electronic commerce framework), with JMS of J2EE to ensure reliable messaging. The document transformation system converts non-standard XML documents into standard XML documents and provides web services once integrated with the message system. Using the suggested business scenario and various test data, our message-oriented system proved to be interoperable and stable. We participated in the ebXML messaging interoperability test organized by the ebXML Asia Committee ITG in order to evaluate and certify the suitability of the message system.
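The idea of converting a non-standard XML document into a standard one can be illustrated with a small sketch. The abstract does not describe how the transformation is performed, so this example simply renames elements according to a lookup table; the tag names and the mapping are hypothetical, and a production system would more likely rely on XSLT or schema-driven mapping.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping from partner-specific tag names to standard ones.
TAG_MAP = {"ord": "Order", "cust": "Buyer", "amt": "Amount"}


def to_standard_xml(source_xml, tag_map=TAG_MAP):
    """Rename every element whose tag appears in tag_map; keep the rest."""
    root = ET.fromstring(source_xml)
    for element in root.iter():
        element.tag = tag_map.get(element.tag, element.tag)
    return ET.tostring(root, encoding="unicode")


non_standard = "<ord><cust>ACME</cust><amt>120</amt></ord>"
standard = to_standard_xml(non_standard)
```

Tags that are not in the mapping pass through unchanged, so a partially standard document is not damaged by the conversion.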

Construction Business Automation System (건설사업 자동화 시스템)

  • Lee, Dong-Eun
    • Proceedings of the Korean Institute Of Construction Engineering and Management / 2007.11a / pp.95-102 / 2007
  • This paper presents the core technology of Construction Business Process Automation for modeling and automating construction business processes. Business Process Reengineering (BPR) and Automation (BPA) have been recognized as important aspects of construction business management. However, BPR requires a lot of effort to identify, document, implement, execute, maintain, and keep track of the thousands of business processes needed to deliver a project. Moreover, the BPA technologies used in existing Enterprise Resource Planning (ERP) systems do not scale effectively for construction business process management. Applying workflow and object technologies would be quite effective in implementing a scalable enterprise application for construction business processes by addressing how: 1) automated construction management tasks are developed as software components, 2) process modeling is facilitated by dragging and dropping task components in a network, 3) business requests are raised and the corresponding process instances are instantiated, and 4) business process instances are executed using workflow technology based on a real-time simulation engine. This paper shows how construction business process automation is achieved, using intentionally simplified equipment reservation and cancellation processes as an example.


A study on business process modeling using MRF/RFID Technology (MRF/RFID 기술을 활용한 Business Process Modeling에 관한 연구)

  • Jung, H.C.;Jang, D.H.;Jung, C.D.
    • Proceedings of the Korea Society of Information Technology Applications Conference / 2006.04a / pp.155-161 / 2006
  • These days, ubiquitous computing and RFID are often mentioned in the mass media. This paper introduces an RFID-aided Baggage Tracking System that has been implemented under a government project, identifies areas for improvement based on its operation to date, and finally proposes a research and development approach to address these issues. The paper also discusses the mobile RFID R&D business that our research center is carrying out, and suggests technical requirements for a successful RFID business from the perspective of a corporate research center. Last but not least, it discusses opportunities that require cooperation among government, business, and academia to ensure that the relevant industry advances the RFID business in Korea.


Design and Implementation of ebXML BD Authoring Tool (XDocBuilder) (ebXML 비즈니스 문서 저작도구(XDocBuilder) 설계 및 구현)

  • Park, Cheon-Shu;Kang, Sang-Seung;Han, Woo-Yong;Sohn, Joo-Chan
    • Proceedings of the Korea Information Processing Society Conference / 2003.05b / pp.1293-1296 / 2003
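  • The ebXML Core Component is a general building block that is reusable across industries and diverse environments and is not affected by context; it is the most basic element for composing business documents. Core components form BIEs (Business Information Entities) through a business context and are expressed in forms such as XML Schema and DTD through syntax binding. A tool is therefore needed that makes it easy to author (create, validate, and edit) various kinds of XML Schema, DTD, and XML-related documents, including the business documents used in an ebXML environment. In this paper, we design and implement XDocBuilder, an ebXML CC-based BD (Business Document) authoring tool that meets these requirements and consists of a general-purpose Schema/DTD-based XML document editor, a Schema document editor, and a DTD document editor.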


Design and Implementation of Web-based Electronic Bidding System using XML (웹 기반의 XML을 활용한 전자 입찰 시스템의 설계 및 구현)

  • 윤선희
    • The Journal of Information Systems / v.10 no.1 / pp.127-146 / 2001
  • The range of business applications on the Internet has expanded enormously as a result of the rapid development of computing and communication technologies, the growth of Internet use, and the use of intranets/extranets in enterprise information systems. With the widespread use of the Internet, various applications have emerged for Business to Business (B to B) and Business to Customer (B to C) models based on intranets or extranets. This paper designs and implements a Web-based Electronic Bidding System for the Business to Business (B to B) model. The technical issues of an electronic bidding system on the Internet involve the connection between web client and server, electronic data interchange for the contract document, and a security solution for the bidding and contracting processes. The web-based electronic bidding system in this paper is implemented using a Java applet and servlet as the connection interface between web client and server, XML/EDI-based documents for bids and contracts, and a bidding server and a notary server that enhance security using PKI (Public Key Infrastructure)-based public key cryptography, digital signatures, and a Certification Authority (CA).


Increasing Business Service Interoperation through the WSDL Extension

  • Lee, Jong-Ok;Jung, Min-Ho
    • Journal of Information Technology Applications and Management / v.15 no.3 / pp.243-259 / 2008
  • To support business service interoperation, the BSD (Business Service Document) was designed as an extension of the Web Service Description Language (WSDL), the web service specification of the World Wide Web Consortium (W3C). WSDL presents comprehensive standards for the interoperability of software components, and the W3C delegates extensions of WSDL to users for their own purposes and objectives. In this article, BSD Creator, which can generate well-formed and valid BSDs, was designed and implemented. In addition, the BSD Operation Pilot System, where service providers can publish BSD specification documents and service users can search for services, was implemented and presented. BSD Creator and the BSD Operation Pilot System, which are the outcomes of this study, were assessed for quality and usefulness using ISO/IEC 9126. The outcomes of this paper provide a basis on which industry groups can construct a Business Services Interoperation System, and are expected to contribute to the revitalization of business service interoperation.
