• Title/Summary/Keyword: Documents Generation

Search Result 155, Processing Time 0.028 seconds

AutoCor: A Query Based Automatic Acquisition of Corpora of Closely-related Languages

  • Dimalen, Davis Muhajereen D.;Roxas, Rachel Edita O.
    • Proceedings of the Korean Society for Language and Information Conference
    • /
    • 2007.11a
    • /
    • pp.146-154
    • /
    • 2007
  • AutoCor is a method for the automatic acquisition and classification of corpora of documents in closely-related languages. It is an extension and enhancement of CorpusBuilder, a system that automatically builds specific minority language corpora from a closed corpus, since some Tagalog documents retrieved by CorpusBuilder are actually documents in other closely-related Philippine languages. AutoCor used the query generation method odds ratio, and introduced the concept of common word pruning to differentiate between documents of closely-related Philippine languages and Tagalog. The performance of the system using with and without pruning are compared, and common word pruning was found to improve the precision of the system.

  • PDF

Design and Implementation of BADA-IV/XML Query Processor Supporting Efficient Structure Querying (효율적 구조 질의를 지원하는 바다-IV/XML 질의처리기의 설계 및 구현)

  • 이명철;김상균;손덕주;김명준;이규철
    • The Journal of Information Technology and Database
    • /
    • v.7 no.2
    • /
    • pp.17-32
    • /
    • 2000
  • As XML emerging as the Internet electronic document language standard of the next generation, the number of XML documents which contain vast amount of Information is increasing substantially through the transformation of existing documents to XML documents or the appearance of new XML documents. Consequently, XML document retrieval system becomes extremely essential for searching through a large quantity of XML documents that are storied in and managed by DBMS. In this paper we describe the design and implementation of BADA-IV/XML query processor that supports content-based, structure-based and attribute-based retrieval. We design XML query language based upon XQL (XML Query Language) of W3C and tightly-coupled with OQL (a query language for object-oriented database). XML document is stored and maintained in BADA-IV, which is an object-oriented database management system developed by ETRI (Electronics and Telecommunications Research Institute) The storage data model is based on DOM (Document Object Model), therefore the retrieval of XML documents is executed basically using DOM tree traversal. We improve the search performance using Node ID which represents node's hierarchy information in an XML document. Assuming that DOW tree is a complete k-ary tree, we show that Node ID technique is superior to DOM tree traversal from the viewpoint of node fetch counts.

  • PDF

RISK FRAMEWORK FOR NEXT GENERATION NUCLEAR POWER PLANT CONSTRUCTION

  • John Walewski ;Stuart Anderson;Jaeheum Yeon;Amy Kim
    • International conference on construction engineering and project management
    • /
    • 2013.01a
    • /
    • pp.451-458
    • /
    • 2013
  • This research documents the initial findings and recommendations for developing a risk management tool to assess and quantify the risks associated with the construction of the next generation of nuclear power plants. The proposed tool builds upon the Construction Industry Institute's International Project Risk Assessment (IPRA) Best Practice. This paper provides an overview of the investigation to assess the unique risk elements pertaining to nuclear power plant construction and documents the preliminary findings from historical project performance data to better understand the function and use of the IPRA's Relative Impact value.

  • PDF

An RDBMS-based Inverted Index Technique for Path Queries Processing on XML Documents with Different Structures (상이한 구조의 XML문서들에서 경로 질의 처리를 위한 RDBMS기반 역 인덱스 기법)

  • 민경섭;김형주
    • Journal of KIISE:Databases
    • /
    • v.30 no.4
    • /
    • pp.420-428
    • /
    • 2003
  • XML is a data-oriented language to represent all types of documents including web documents. By means of the advent of XML-based document generation tools and grow of proprietary XML documents using those tools and translation from legacy data to XML documents at an accelerating pace, we have been gotten a large amount of differently-structured XML documents. Therefore, it is more and more important to retrieve the right documents from the document set. But, previous works on XML have mainly focused on the storage and retrieval methods for a large XML document or XML documents had a same DTD. And, researches that supported the structural difference did not efficiently process path queries on the document set. To resolve the problem, we suggested a new inverted index mechanism using RDBMS and proved it outperformed the previous works. And especially, as it showed the higher efficiency in indirect containment relationship, we argues that the index structure is fit for the differently-structured XML document set.

Study on the Generation Methods of Composition Noun for Efficient Index Term Extraction (효율적인 색인어 추출을 위한 합성명사 생성 방안에 대한 연구)

  • Kim, Mi-Jin;Park, Mi-Seong;Choe, Jae-Hyeok;Lee, Sang-Jo
    • The Transactions of the Korea Information Processing Society
    • /
    • v.7 no.4
    • /
    • pp.1122-1131
    • /
    • 2000
  • The efficiency of thesytem depends upon an accurate extraction capability of index terms in the system of information search or in that of automatic index. Therefore, extraction of accurate index terms is of utmost importance. This report presents the generation methods of composition noun for efficient index term extraction by using words of high frequency appearance, so that the right documents can be found during information search. For the sake of presentation of this method, index terms of composition noun shall be extracted by applying the rule of composition and disintegration to the nouns with high frequency of appearance in the documents, such as those with upper 30%∼40% of frequency ratio. In addition, for he purpose of effecting an inspection of validity in relation to a composition of high frequency nouns such as those with upper 30∼40% of frequency ratio as presented in this report, it proposes an adequate frquency ratio during noun composition. Based upon the proposed application, in this short documents with less than 300 syllables, low frequency omissions were noticed, when composed with nouns in the upper 30% of frequency ratio; whereas the documents with more than 30 syllables, when composed with nouns in he upper 40% of frequency ration, had a considerable reduction of low frequency omissions. Thus, total number of index terms has decreased to 57.7% of these existing and an accurate extraction of index terms with an 85.6% adequacy ratio became possible.

  • PDF

Design and Implementation of an Automated Privacy Protection System over TPM and File Virtualization (TPS: TPM 및 파일 가상화를 통한 개인정보보호 자동화 시스템 디자인 및 구현)

  • Jeong, Hye-Lim;Ahn, Sung-Kyu;Kim, Mun Sung;Park, Ki-Woong
    • The Journal of Korean Institute of Next Generation Computing
    • /
    • v.13 no.2
    • /
    • pp.7-17
    • /
    • 2017
  • In this paper, we propose the TPS (TPM-enhanced Privacy Protection System) which is an automated privacy protection system enhanced with a TPM (Trusted Platform Module). The TPS detects documents including personal information by periodic scanning the disk of clients at regular intervals and encrypts them. Hence, system manages the encrypted documents in the server. In particular, the security of TPS was greatly enhanced by limiting the access of documents including the personal information with regard to the client in an abnormal state through the TPM-based platform verification mechanism of the client system. In addition, we proposed and implemented a VTF (Virtual Trusted File) interface to provide users with the almost identical user interface as general document access even though documents containing personal information are encrypted and stored on the remote server. Consequently, the TPS automates the compliance of the personal information protection acts without additional users' interventions.

Development of A Web Mining System Based On Document Similarity (문서 유사도 기반의 웹 마이닝 시스템 개발)

  • 이강찬;민재홍;박기식;임동순;우훈식
    • The Journal of Society for e-Business Studies
    • /
    • v.7 no.1
    • /
    • pp.75-86
    • /
    • 2002
  • In this study, we proposed design issues and structure of a web mining system and develop a system for the purpose of knowledge integration under world wide web environments resulted from our developing experiences. The developed system consists of three main functions: 1) gathering documents utilizing a search agent; 2) determining similarity coefficients between any two documents from term frequencies; 3) clustering documents based on similarity coefficients. It is believed that the developed system can be utilized for discovery of knowledge in relatively narrow domains such as news classification, index term generation in knowledge management.

  • PDF

Outcomes and Impacts of Smart City Policies in Japan

  • Yamashita, Jun
    • World Technopolis Review
    • /
    • v.8 no.2
    • /
    • pp.92-103
    • /
    • 2019
  • The first generation of Japan's smart city policies began around 2010. However, the latest trends in smart city policies and the impacts of the first generation on the latter one were not fully covered in either official documents or academic literature. In such circumstances, the purposes of this study were firstly to identify outcomes derived from the smart city projects in the first generation, and then, to reveal the present situation of the latest smart city policies, including the influence of the first generation on these state of the art policies. The present study was also intended to evaluate the validity of a conceptual framework presented by Fernandez-Anez et al. (2018) for smart city policies. As a result, it was revealed that (1) policy outputs and outcomes derived from the smart city policies in the first generation were highly regarded, (2) the conceptual framework of smart city policies was evaluated as valid, and (3) the second generation of smart city policies after Society 5.0 was characterized by the establishment of smart city platforms.

A New Approach to Automatic Keyword Generation Using Inverse Vector Space Model (키워드 자동 생성에 대한 새로운 접근법: 역 벡터공간모델을 이용한 키워드 할당 방법)

  • Cho, Won-Chin;Rho, Sang-Kyu;Yun, Ji-Young Agnes;Park, Jin-Soo
    • Asia pacific journal of information systems
    • /
    • v.21 no.1
    • /
    • pp.103-122
    • /
    • 2011
  • Recently, numerous documents have been made available electronically. Internet search engines and digital libraries commonly return query results containing hundreds or even thousands of documents. In this situation, it is virtually impossible for users to examine complete documents to determine whether they might be useful for them. For this reason, some on-line documents are accompanied by a list of keywords specified by the authors in an effort to guide the users by facilitating the filtering process. In this way, a set of keywords is often considered a condensed version of the whole document and therefore plays an important role for document retrieval, Web page retrieval, document clustering, summarization, text mining, and so on. Since many academic journals ask the authors to provide a list of five or six keywords on the first page of an article, keywords are most familiar in the context of journal articles. However, many other types of documents could not benefit from the use of keywords, including Web pages, email messages, news reports, magazine articles, and business papers. Although the potential benefit is large, the implementation itself is the obstacle; manually assigning keywords to all documents is a daunting task, or even impractical in that it is extremely tedious and time-consuming requiring a certain level of domain knowledge. Therefore, it is highly desirable to automate the keyword generation process. There are mainly two approaches to achieving this aim: keyword assignment approach and keyword extraction approach. Both approaches use machine learning methods and require, for training purposes, a set of documents with keywords already attached. In the former approach, there is a given set of vocabulary, and the aim is to match them to the texts. In other words, the keywords assignment approach seeks to select the words from a controlled vocabulary that best describes a document. Although this approach is domain dependent and is not easy to transfer and expand, it can generate implicit keywords that do not appear in a document. On the other hand, in the latter approach, the aim is to extract keywords with respect to their relevance in the text without prior vocabulary. In this approach, automatic keyword generation is treated as a classification task, and keywords are commonly extracted based on supervised learning techniques. Thus, keyword extraction algorithms classify candidate keywords in a document into positive or negative examples. Several systems such as Extractor and Kea were developed using keyword extraction approach. Most indicative words in a document are selected as keywords for that document and as a result, keywords extraction is limited to terms that appear in the document. Therefore, keywords extraction cannot generate implicit keywords that are not included in a document. According to the experiment results of Turney, about 64% to 90% of keywords assigned by the authors can be found in the full text of an article. Inversely, it also means that 10% to 36% of the keywords assigned by the authors do not appear in the article, which cannot be generated through keyword extraction algorithms. Our preliminary experiment result also shows that 37% of keywords assigned by the authors are not included in the full text. This is the reason why we have decided to adopt the keyword assignment approach. In this paper, we propose a new approach for automatic keyword assignment namely IVSM(Inverse Vector Space Model). The model is based on a vector space model. which is a conventional information retrieval model that represents documents and queries by vectors in a multidimensional space. IVSM generates an appropriate keyword set for a specific document by measuring the distance between the document and the keyword sets. The keyword assignment process of IVSM is as follows: (1) calculating the vector length of each keyword set based on each keyword weight; (2) preprocessing and parsing a target document that does not have keywords; (3) calculating the vector length of the target document based on the term frequency; (4) measuring the cosine similarity between each keyword set and the target document; and (5) generating keywords that have high similarity scores. Two keyword generation systems were implemented applying IVSM: IVSM system for Web-based community service and stand-alone IVSM system. Firstly, the IVSM system is implemented in a community service for sharing knowledge and opinions on current trends such as fashion, movies, social problems, and health information. The stand-alone IVSM system is dedicated to generating keywords for academic papers, and, indeed, it has been tested through a number of academic papers including those published by the Korean Association of Shipping and Logistics, the Korea Research Academy of Distribution Information, the Korea Logistics Society, the Korea Logistics Research Association, and the Korea Port Economic Association. We measured the performance of IVSM by the number of matches between the IVSM-generated keywords and the author-assigned keywords. According to our experiment, the precisions of IVSM applied to Web-based community service and academic journals were 0.75 and 0.71, respectively. The performance of both systems is much better than that of baseline systems that generate keywords based on simple probability. Also, IVSM shows comparable performance to Extractor that is a representative system of keyword extraction approach developed by Turney. As electronic documents increase, we expect that IVSM proposed in this paper can be applied to many electronic documents in Web-based community and digital library.

Development of an Integrated EDI Document Generation System (통합적 EDI 문서생성 시스템의 개발)

  • Lee, Seung-Ik;Cho, Sung-Bae
    • Journal of KIISE:Computing Practices and Letters
    • /
    • v.6 no.3
    • /
    • pp.339-347
    • /
    • 2000
  • There are increasing needs for an individual or enterprise to interchange documents electronically through communication network to enhance the efficiency of business, according to the rapid progress in the construction of communication network. UN has established and distributed the standards for electronic documents to facilitate rapid and accurate processing of business. In this paper, we present the design and implementation of an integrated system for processing documents that conform to the electronic document standards. This system consists of three subsystems, which are an EDI parser for checking syntax, an EDI document editor, and an EDI directory viewer for referencing syntax of an EDI document. Integrated usage of these tools can afford the composition of correct EDI documents more rapidly and conveniently.

  • PDF