• Title/Summary/Keyword: Document Model

A Study on eDocument Management Using Professional Terminologies (전문용어기반 eDocument 관리 방안에 관한 연구)

  • 김명옥
    • The Journal of Society for e-Business Studies / v.7 no.2 / pp.21-38 / 2002
  • Document retrieval (DR) has long been a serious issue in the field of office information management. Nowadays, our daily work depends heavily on information collected from the Internet, and DR methods on the Web have become an important issue, studied by many researchers more than any other topic. The main purpose of this study is to develop a model for managing business documents by integrating three major methodologies used in the fields of electronic libraries and information retrieval: Metadata, Thesaurus, and Index/Inverted Index. In addition, we add the new concept of an eDocument, which consists of metadata about unit documents and/or the unit documents themselves. The eDocument is introduced as a way to utilize existing document sources. The core concepts and structures of the model are introduced, and an architecture for the eDocument management system is proposed. Test (simulation) results of the model and directions for future studies are also discussed.
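
As a rough illustration of how a thesaurus and an inverted index can work together in retrieval, the following minimal sketch builds an index over a few sample documents and expands a query term with its synonyms. The sample documents, the thesaurus entry, and the function names are hypothetical and are not taken from the paper.

```python
from collections import defaultdict

def build_inverted_index(docs):
    """Map each lowercase term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index

def search(index, thesaurus, term):
    """Look up a term together with its thesaurus synonyms."""
    terms = {term.lower()} | set(thesaurus.get(term.lower(), []))
    hits = set()
    for t in terms:
        hits |= index.get(t, set())
    return sorted(hits)

# Hypothetical sample data, assumed for illustration only.
docs = {
    "d1": "purchase order for office supplies",
    "d2": "invoice for consulting services",
    "d3": "bill for office rent",
}
thesaurus = {"invoice": ["bill"]}

index = build_inverted_index(docs)
result = search(index, thesaurus, "invoice")
print(result)  # ['d2', 'd3']
```

Without the thesaurus entry, the query "invoice" would match only d2; the synonym expansion is what brings in d3.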

Document Summarization Model Based on General Context in RNN

  • Kim, Heechan;Lee, Soowon
    • Journal of Information Processing Systems / v.15 no.6 / pp.1378-1391 / 2019
  • In recent years, automatic document summarization has been widely studied in the field of natural language processing thanks to the remarkable developments made using deep learning models. When decoding a word, existing models for abstractive summarization usually represent the context of a document using the weighted hidden states of each input word. Because the weights change at each decoding step, these weights reflect only the local context of a document. Therefore, it is difficult to generate a summary that reflects the overall context of a document. To solve this problem, we introduce the notion of a general context and propose a model for summarization based on it. The general context reflects the overall context of the document and is independent of each decoding step. Experimental results using the CNN/Daily Mail dataset show that the proposed model outperforms existing models.
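
The difference between a step-dependent context and a step-independent one can be sketched as follows. The attention context is a softmax-weighted sum of encoder hidden states, so it changes with the decoder state. The mean of the hidden states is used here only as an assumed stand-in for a step-independent context; the abstract does not say how the paper computes its general context, and all names and values below are hypothetical.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def weighted_context(hidden_states, weights):
    """Weighted sum of hidden-state vectors."""
    dim = len(hidden_states[0])
    return [sum(w * h[i] for w, h in zip(weights, hidden_states))
            for i in range(dim)]

def attention_context(hidden_states, decoder_state):
    """Local context: weights depend on the current decoder state."""
    scores = [sum(a * b for a, b in zip(h, decoder_state))
              for h in hidden_states]
    return weighted_context(hidden_states, softmax(scores))

def mean_context(hidden_states):
    """Assumed stand-in for a step-independent context (not the paper's method)."""
    n = len(hidden_states)
    return weighted_context(hidden_states, [1.0 / n] * n)

# Hypothetical encoder hidden states and two decoder states.
hidden = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
local_a = attention_context(hidden, [2.0, 0.0])
local_b = attention_context(hidden, [0.0, 2.0])
general = mean_context(hidden)
```

Here `local_a` and `local_b` differ because the attention weights follow the decoder state, while `general` is the same at every decoding step.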

Document Classification Model Using Web Documents for Balancing Training Corpus Size per Category

  • Park, So-Young;Chang, Juno;Kihl, Taesuk
    • Journal of information and communication convergence engineering / v.11 no.4 / pp.268-273 / 2013
  • In this paper, we propose a document classification model using Web documents as a part of the training corpus in order to resolve the imbalance of the training corpus size per category. For the purpose of retrieving the Web documents closely related to each category, the proposed document classification model calculates the matching score between word features and each category, and generates a Web search query by combining the higher-ranked word features and the category title. Then, the proposed document classification model sends each combined query to the open application programming interface of the Web search engine, and receives the snippet results retrieved from the Web search engine. Finally, the proposed document classification model adds these snippet results as Web documents to the training corpus. Experimental results show that the method that considers the balance of the training corpus size per category exhibits better performance in some categories with small training sets.
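
The query generation step described above can be sketched roughly as follows. The scoring formula (squared in-category count divided by overall count) is an assumption chosen for illustration, since the abstract does not give the paper's matching score, and the sample corpus is hypothetical. No search engine is called; the sketch only builds the query strings.

```python
from collections import Counter

def word_counts(docs):
    """Count lowercase whitespace-separated words across documents."""
    return Counter(w for d in docs for w in d.lower().split())

def matching_scores(category_docs, all_docs):
    """Assumed score: frequent in the category and rare elsewhere."""
    in_cat = word_counts(category_docs)
    overall = word_counts(all_docs)
    return {w: in_cat[w] ** 2 / overall[w] for w in in_cat}

def build_query(category, corpus, top_k=2):
    """Combine the category title with its top-ranked word features."""
    all_docs = [d for docs in corpus.values() for d in docs]
    scores = matching_scores(corpus[category], all_docs)
    # Sort by descending score, then alphabetically for determinism.
    ranked = sorted(scores, key=lambda w: (-scores[w], w))
    return " ".join([category] + ranked[:top_k])

# Hypothetical training corpus with an under-represented category.
corpus = {
    "sports": ["team wins match", "team loses final match"],
    "politics": ["election results final"],
}
print(build_query("sports", corpus))    # sports match team
print(build_query("politics", corpus))  # politics election results
```

The resulting strings would then be sent to a Web search API, and the returned snippets added to the smaller category's training set.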

The study on a plan for applying UNeDocs to Maritime Logistics to achieve its paperless logistics (Paperless 해운 물류를 위한 UNeDocs 적용 방안 연구)

  • Ahn, Kyeong Rim
    • Journal of Korea Society of Digital Industry and Information Management / v.5 no.2 / pp.199-208 / 2009
  • Most export/import cargo is moved by maritime transport. Korea has pursued a system automation project using EDI documents since the mid-1990s. However, this automation covers only about 40-50% of the overall maritime business process, and manual or paper document processing still remains. The international e-business environment has also been changing from paper document-based transactions to electronic document transactions. UN/CEFACT, an international standardization organization, proposed UNeDocs for paperless transactions. UNeDocs is a specification that defines an XML data model as well as electronic forms. With UNeDocs, it is not necessary to generate duplicate data, and it can support user convenience and guarantee flexibility. This paper defines a UNeDocs data model for EDI and offline processing in current maritime business. The XML syntax and structure of the defined data model are then checked through a document quality check system. The paper also explains a plan for applying the defined UNeDocs data model. Paperless transactions can be supported by defining a UNeDocs-based standard data model and converting it into paper, XML, and EDI documents.
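
A syntax and structure check of the kind mentioned above might look like the following minimal sketch. The element names (`Shipper`, `Consignee`, `PortOfLoading`) and the root element are hypothetical placeholders, not taken from the UNeDocs specification, and a real quality check system would validate against the actual schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical required child elements, assumed for illustration only.
REQUIRED = ("Shipper", "Consignee", "PortOfLoading")

def check_document(xml_text, required=REQUIRED):
    """Return a list of problems; an empty list means the checks passed."""
    try:
        root = ET.fromstring(xml_text)  # syntax (well-formedness) check
    except ET.ParseError as exc:
        return [f"not well-formed: {exc}"]
    # Structure check: every required direct child must be present.
    return [f"missing element: {name}"
            for name in required if root.find(name) is None]

good = ("<BillOfLading><Shipper>A</Shipper><Consignee>B</Consignee>"
        "<PortOfLoading>Busan</PortOfLoading></BillOfLading>")
incomplete = "<BillOfLading><Shipper>A</Shipper><Consignee>B</Consignee></BillOfLading>"
broken = "<BillOfLading><Shipper>A</BillOfLading>"

print(check_document(good))        # []
print(check_document(incomplete))  # ['missing element: PortOfLoading']
```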

Query Space Exploration Model Using Genetic Algorithm

  • Lee, Jae-Hoon;Lee, Sung-Joo
    • International Journal of Fuzzy Logic and Intelligent Systems / v.3 no.2 / pp.222-226 / 2003
  • Information retrieval must be able to find, from a document set, the documents most suitable for the user's need. If document suitability is predicted by the degree of similarity between a document and the query language (QL), documents that the searcher does not require are also retrieved. This paper shows that the most suitable documents for a user's request can be found by searching the whole document space with a genetic algorithm, and uses a knowledge-based operator to address the problems of various models.

Fine-Grained Mobile Application Clustering Model Using Retrofitted Document Embedding

  • Yoon, Yeo-Chan;Lee, Junwoo;Park, So-Young;Lee, Changki
    • ETRI Journal / v.39 no.4 / pp.443-454 / 2017
  • In this paper, we propose a fine-grained mobile application clustering model using retrofitted document embedding. To automatically determine the clusters and their numbers with no predefined categories, the proposed model initializes the clusters based on title keywords and then merges similar clusters. For improved clustering performance, the proposed model distinguishes between an accurate clustering step with titles and an expansive clustering step with descriptions. During the accurate clustering step, an automatically tagged set is constructed as a result. This set is utilized to learn a high-performance document vector. During the expansive clustering step, more applications are then classified using this document vector. Experimental results showed that the purity of the proposed model increased by 0.19, and the entropy decreased by 1.18, compared with the K-means algorithm. In addition, the mean average precision improved by more than 0.09 in a comparison with a support vector machine classifier.
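
The idea of initializing clusters from title keywords and then merging similar clusters, so that the number of clusters is not fixed in advance, can be sketched as follows. Jaccard similarity over cluster members and the 0.5 threshold are assumptions made for illustration; the paper's retrofitted document embedding and its actual merge criterion are not reproduced, and the app titles are hypothetical.

```python
from itertools import combinations

def init_clusters(titles):
    """One initial cluster per title keyword, holding the titles that contain it."""
    clusters = {}
    for title in titles:
        for kw in title.lower().split():
            clusters.setdefault(kw, set()).add(title)
    return clusters

def jaccard(a, b):
    """Overlap of two non-empty sets, between 0 and 1."""
    return len(a & b) / len(a | b)

def merge_similar(clusters, threshold=0.5):
    """Repeatedly merge the first pair of clusters whose members overlap enough."""
    groups = [({kw}, set(apps)) for kw, apps in sorted(clusters.items())]
    merged = True
    while merged:
        merged = False
        for i, j in combinations(range(len(groups)), 2):
            if jaccard(groups[i][1], groups[j][1]) >= threshold:
                keywords = groups[i][0] | groups[j][0]
                apps = groups[i][1] | groups[j][1]
                groups = [g for k, g in enumerate(groups) if k not in (i, j)]
                groups.append((keywords, apps))
                merged = True
                break  # restart the scan with the updated group list
    return groups

# Hypothetical application titles.
titles = ["photo editor", "photo editor pro", "chess game", "chess puzzle game"]
groups = merge_similar(init_clusters(titles))
print(len(groups))  # 2
```

Six keyword clusters collapse into two groups here, one for the photo applications and one for the chess applications, without the cluster count being specified beforehand.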

A probabilistic information retrieval model by document ranking using term dependencies (용어간 종속성을 이용한 문서 순위 매기기에 의한 확률적 정보 검색)

  • You, Hyun-Jo;Lee, Jung-Jin
    • The Korean Journal of Applied Statistics / v.32 no.5 / pp.763-782 / 2019
  • This paper proposes a probabilistic document ranking model incorporating term dependencies. Document ranking is a fundamental information retrieval task. The task is to sort documents in a collection according to their relevance to the user query (Qin et al., Information Retrieval Journal, 13, 346-374, 2010). A probabilistic model computes the conditional probability of the relevance of each document given a query. Most of the widely used models assume term independence because it is challenging to compute the joint probabilities of multiple terms. However, words in natural language texts are obviously highly correlated. In this paper, we assume a multinomial distribution model to calculate the relevance probability of a document by considering the dependency structure of words, and propose an information retrieval model that ranks documents by estimating the probability with the maximum entropy method. The results of the ranking simulation experiment in various multinomial situations show better retrieval results than a model that assumes the independence of words. The results of document ranking experiments using the real-world LETOR OHSUMED dataset also show better retrieval results.

Document Clustering with Relational Graph Of Common Phrase and Suffix Tree Document Model (공통 Phrase의 관계 그래프와 Suffix Tree 문서 모델을 이용한 문서 군집화 기법)

  • Cho, Yoon-Ho;Lee, Sang-Keun
    • The Journal of the Korea Contents Association / v.9 no.2 / pp.142-151 / 2009
  • The previous document clustering method, NSTC, measures the similarity between document pairs using TF-IDF during web document clustering. In this paper, we propose a new similarity measure that uses a common phrase-based relational graph instead of TF-IDF. The method weights common phrases using a relational graph that represents the relationships among common phrases in the document collection. Experimental results indicate that the proposed method is more effective than NSTC at clustering a document collection.

A Study on Electronic Document Delivery Service in Academic Libraries (대학도서관의 전자문헌제공 모형구축에 관한 연구)

  • Lee, Hwa-Yeon
    • Journal of Information Management / v.28 no.1 / pp.34-61 / 1997
  • This study suggests a model of electronic document delivery service to be carried out by academic libraries step by step, based on an analysis of the domestic and foreign status. The model consists of a user request step, an electronic document building step, and an electronic document transfer step. Academic libraries should implement the electronic document delivery service step by step so that it satisfies users' information needs.

Electronic Document Automation System Model for Improving Productivity in maintenance work - in Inspection Process of Construction Equipment Maintenance - (정비작업의 생산성 향상을 위한 전자문서자동화시스템 모형 - 건설장비 정비작업을 중심으로 -)

  • Kong, Myung-Dal
    • Journal of the Korea Safety Management & Science / v.19 no.3 / pp.49-58 / 2017
  • This paper suggests a specific model that could efficiently improve the interaction and interface between an MES (Manufacturing Execution System) server and a POP (Point of Production) terminal through an electronic document server, an electronic pen, a Bluetooth receiver, and form paper in disassembly and process inspection work. The proposed model shows that the new method, based on an electronic document automation system, reduces the processing time for maintenance work more efficiently than the current approach based on handwritten processing. The effects of the proposed model are as follows: (a) While the processing time per piece of equipment for maintenance by the current method was 300 minutes, the processing time by the new method was 50 minutes. (b) While the processing error ratio by the current method was 20%, the error ratio by the new method was 1%.
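
The reported figures can be restated as relative improvements with a short calculation. The function name below is hypothetical; the input values are the ones quoted in the abstract.

```python
def relative_reduction(before, after):
    """Fraction by which a quantity dropped, relative to its starting value."""
    return (before - after) / before

# Values quoted in the abstract above.
time_reduction = relative_reduction(300, 50)   # processing time in minutes
error_reduction = relative_reduction(20, 1)    # error ratio in percent
speedup = 300 / 50

print(f"time reduced by {time_reduction:.1%}, speedup {speedup:.0f}x")
print(f"error ratio reduced by {error_reduction:.0%} (19 percentage points)")
```

In other words, the new method takes one sixth of the original processing time, and the error ratio falls by 95% in relative terms.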