• Title/Summary/Keyword: Document Databases

Integrated Information Retrieval with Metadata Interface for Heterogeneous Distributed XML Documents (메타정보 인터페이스를 이용한 이질 구조 분석 XML문서 통합 검색)

  • 류성준;황재문;김태훈;남영광
    • Journal of KIISE: Software and Applications / v.31 no.11 / pp.1505-1518 / 2004
  • We propose an extremely lightweight DDXMI approach for the semi-automated integration of structurally and semantically heterogeneous distributed XML documents. In the proposed prototype, a DDXMI (Distributed Documents XML Metadata Interface) is defined and a user interface generator is developed. The prototype takes the sources' DTDs as input and generates a friendly graphical user interface for application users. The user can easily describe the semantic mapping between the integrated virtual database DTD and the sources' DTDs by assigning index numbers and specifying associated function names, so that the DDXMI based on these mappings is generated automatically. Quilt is selected as the XML query language, and user queries are processed according to the DDXMI. It is assumed that application users know what they want from the different sources; that is, they have their own integrated database schema in mind and know the semantics of the XML databases involved. A small global DTD and a mid-size global DTD are generated to verify query generation and retrieval results with three XML document databases: master's/Ph.D. theses, research reports, and journals. The system has been developed with JavaCC and Java Servlets.

An XML Data Management System and Its Application to Genome Databases (XML 데이타 관리시스템과 유전체 데이타베이스에의 응용)

  • 이경희;김태경;김선신;이충세;조완섭
    • Journal of KIISE: Databases / v.31 no.4 / pp.432-443 / 2004
  • As XML data has become widely used on the Internet, it is necessary to store and retrieve XML data using DBMSs. However, relational DBMSs suffer from the model mismatch between the graph structure of XML data and the table form of relational databases. We propose Xing, an ORDBMS-based, DTD-dependent XML data management system. Xing stores XML data in a DTD-dependent form in an object database. Since the object database schema has a graph structure and supports multi-valued attributes, mapping the XML data model and queries into an object data model and OQL queries is straightforward. For rapid storing of large quantities of XML data, we use a SAX parser with a customized Xing-tree, which requires little memory compared with a DOM tree. Xing also returns query results in the form of an XML document. We have implemented the Xing system on top of the UniSQL object-relational DBMS for validity checking and performance comparison. For XML genome data from GenBank, an experimental evaluation shows that Xing can provide a significant performance improvement (up to 10 times) compared with the relational approach.
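
The memory argument for SAX over DOM in the abstract above can be illustrated with a minimal sketch. It is not Xing's implementation, and the Xing-tree itself is not described here; the names `PathCounter`, `count_paths`, and `SAMPLE` are hypothetical. The sketch only shows that a streaming handler needs to hold the stack of currently open elements, not the whole document tree.

```python
import xml.sax


class PathCounter(xml.sax.ContentHandler):
    """Counts element paths while streaming.

    Only the stack of currently open elements is kept in memory,
    in contrast to a DOM parser, which materializes every node.
    """

    def __init__(self):
        super().__init__()
        self.stack = []   # names of the currently open elements
        self.counts = {}  # element path -> number of occurrences

    def startElement(self, name, attrs):
        self.stack.append(name)
        path = "/".join(self.stack)
        self.counts[path] = self.counts.get(path, 0) + 1

    def endElement(self, name):
        self.stack.pop()


def count_paths(xml_text):
    """Parse xml_text with SAX and return the path counts."""
    handler = PathCounter()
    xml.sax.parseString(xml_text.encode("utf-8"), handler)
    return handler.counts


# Hypothetical sample document, not taken from GenBank.
SAMPLE = (
    "<entries>"
    "<entry><gene>a</gene><gene>b</gene></entry>"
    "<entry><gene>c</gene></entry>"
    "</entries>"
)

if __name__ == "__main__":
    print(count_paths(SAMPLE))
```

A DTD-dependent loader would presumably replace the counting logic with inserts into the target schema, but that step depends on details the abstract does not give.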

A Study on System Requirements for Integrated Electronic Document Management System (IEDMS) (통합전자문서체계구현을 위한 요구기능 분석 연구 -A사의 전자문서관리 사례를 중심으로-)

  • 권택문
    • Journal of Information Technology Application / v.2 no.1 / pp.55-81 / 2000
  • An Electronic Document Management System (EDMS) is an electronic system solution used to create, capture, distribute, edit, store, and manage documents and related structured data repositories throughout an organization. Recently, documents of any type, such as text, images, and video files, as well as structured databases, can be controlled and managed by an office automation system and an EDMS. Thus, many organizations already use these information technologies to reduce process cycle times. What these organizations are missing, however, is an integrated system that integrates the current workflow or office automation system and provides immediate access to and automatic routing of the organization's mission-critical information. This study sought to identify user requirements for integrating the current information system with a relatively new technology, the electronic document management system, in order to improve business operations, productivity, and quality and to reduce waste. Integration of the EDMS with the office automation system, and proper use of these technologies, will improve an organization's processes and shorten process cycle times. For this study, a case study was conducted by a project team in cooperation with a government organization (referred to as company A). Through this case study, valuable requirements for electronic document management and office automation systems were identified and reported, providing a system model for designing an Integrated EDMS (IEDMS).

Integration of the PubAnnotation ecosystem in the development of a web-based search tool for alternative methods

  • Neves, Mariana
    • Genomics & Informatics / v.18 no.2 / pp.18.1-18.5 / 2020
  • Finding publications that propose alternative methods to animal experiments is an important but time-consuming task, since researchers need to run various queries against literature databases and screen many articles to assess two important aspects: the relevance of the article to the research question, and whether the article's proposed approach qualifies as an alternative method. We are currently developing a Web application to support finding alternative methods to animal experiments. The current (under development) version of the application utilizes external tools and resources for document processing, and relies on the PubAnnotation ecosystem for annotation querying, annotation storage, dictionary-based tagging of cell lines, and annotation visualization. Currently, our two PubAnnotation repositories for discourse elements contain annotations for more than 110k PubMed documents. Further, we created an annotator for cell lines that contains more than 196k terms from Cellosaurus. Finally, we are experimenting with TextAE for annotation visualization and for user feedback.

Implementation and Evaluation of an Integrated Viewer for Displaying Text and TIFF Image Materials in the Internet Environment (인터넷상에서 텍스트와 TIFF 이미지 자료 디스플레이를 위한 뷰어 구현 및 평가)

  • 최흥식
    • Journal of the Korean Society for Information Management / v.17 no.1 / pp.67-87 / 2000
  • The purpose of this study is to develop an integrated viewer that can display both text and image files in the Internet environment. Until now, most full-text databases could display documents only through image or graphic viewers. The newly developed system converts document files from commercial word processors (e.g., 한글™, Word™, Excel™, PowerPoint™, HunminJungum™, Arirang™, CAD™), as well as conventional TIFF image files, into the DVI (DeVice Independent) file format, compressing them to a smaller size, and displays them on the computer screen. IDoc Viewer was evaluated by a user group consisting of 5 system developers, 5 librarians, and 10 end users. IDoc Viewer was rated good or excellent on 20 of the 26 checklist items.

An Efficient Index Structure Supporting Structure Queries for Video Documents (비디오 문서의 구조 질의를 위한 효율적 인덱스 구조)

  • Lee, Yong-Kyu
    • The Transactions of the Korea Information Processing Society / v.5 no.5 / pp.1109-1118 / 1998
  • Recently, much attention has been focused on video databases. Video documents also have a hierarchical logical structure like text documents. By exploiting this structure using structure queries, users can obtain greater benefits than by using only content queries. In order to process structure queries efficiently, an index structure supporting fast video element access must be provided. However, there has been little attention to the index structure for video documents. In this paper, we present a tree-structured video document model and a new inverted index structure for video documents. We evaluate the storage requirement and the disk access time of the scheme and present the analytical results.

Comparison of graph clustering methods for analyzing the mathematical subject classification codes

  • Choi, Kwangju;Lee, June-Yub;Kim, Younjin;Lee, Donghwan
    • Communications for Statistical Applications and Methods / v.27 no.5 / pp.569-578 / 2020
  • Various graph clustering methods have been introduced to identify communities in social or biological networks. This paper studies entropy-based and Markov chain-based methods for clustering undirected graphs. We compare the performance of these two clustering methods with conventional methods based on clustering quality measures. For a real application, we collect the mathematical subject classification (MSC) codes of research papers from published mathematical databases and construct the weighted code-to-document matrix for applying graph clustering methods. We aim to group MSC codes into the same cluster if they appear together in many papers. We compare the MSC clustering results based on several assessment measures and conclude that the Markov chain-based method is suitable for clustering the MSC codes.
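
The idea of linking MSC codes that appear together in many papers can be sketched as follows. This is only an illustration of how co-occurrence edge weights might be derived from paper-to-code assignments; it does not implement the entropy-based or Markov chain-based clustering, and the weighting used in the paper may differ. The function name `cooccurrence_weights` and the sample codes in `PAPERS` are hypothetical.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_weights(papers):
    """Return edge weights of an undirected co-occurrence graph.

    papers: iterable of collections of MSC codes, one per paper.
    The weight of edge (a, b), with a < b, is the number of papers
    that carry both codes.
    """
    weights = Counter()
    for codes in papers:
        # Sort so each unordered pair has one canonical key.
        for a, b in combinations(sorted(set(codes)), 2):
            weights[(a, b)] += 1
    return weights


# Hypothetical assignments; real data would come from a mathematical database.
PAPERS = [
    {"05C", "68R"},
    {"05C", "68R", "60J"},
    {"60J", "62H"},
]

if __name__ == "__main__":
    for edge, weight in sorted(cooccurrence_weights(PAPERS).items()):
        print(edge, weight)
```

The resulting weighted edges could then be passed to any graph clustering routine; which one performs best is the question the paper itself addresses.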

IDREF-ID Attribute Reference Modeling of DTD for Legacy Database (Legacy 데이터베이스를 위한 DTD의 IDREF-ID 속성 관계 모델링)

  • 김정희;곽호영
    • Journal of Internet Computing and Services / v.3 no.3 / pp.31-38 / 2002
  • A DTD generation procedure is suggested for applying XML technology to data extracted from legacy databases. IDREF-ID attribute reference modeling is used to represent the complex relationships between tables and to eliminate the separate preliminary step of ID insertion. The ID insertion procedure is performed in parallel with investigating the relationships between tables and the frequent search directions among the table data. As a result, ID insertion can be performed while the IDREF-ID relationships between tables are being identified, and the DTD is generated as well.

A Study on Document Retrieval Using Bibliographic Citations (인용문헌을 이용한 검색에 관한 연구)

  • Kim, Young-Min
    • Journal of the Korean Society for Information Management / v.2 no.1 / pp.136-163 / 1985
  • A user who has retrieved relevant documents from existing commercial databases may not always be satisfied with the results of traditional bibliographic searches using subject index terms. On the assumption that the user wants more relevant documents in such instances, this thesis presents an expanded search strategy, tested in an experiment that uses bibliographic citations as an additional content indicator alongside index terms.

PubMiner: Machine Learning-based Text Mining for Biomedical Information Analysis

  • Eom, Jae-Hong;Zhang, Byoung-Tak
    • Genomics & Informatics / v.2 no.2 / pp.99-106 / 2004
  • In this paper, we introduce PubMiner, an intelligent machine learning-based text mining system for mining biological information from the literature. PubMiner employs natural language processing techniques and machine learning-based data mining techniques to mine useful biological information, such as protein-protein interactions, from the massive literature. The system recognizes biological terms such as genes, proteins, and enzymes and extracts the interactions described in a document through natural language processing. The extracted interactions are further analyzed with a set of features of each entity, collected from related public databases, to infer more interactions from the original ones. Both the interactions inferred by this analysis and the original interactions are provided to the user with links to the literature sources. The performance of entity and interaction extraction was tested with selected MEDLINE abstracts. The inference was evaluated using the protein interaction data of S. cerevisiae (baker's yeast) from MIPS and SGD.