• Title/Summary/Keyword: relational schema

Search Result 145, Processing Time 0.019 seconds

Efficient Linear Path Query Processing using Information Retrieval Techniques for Large-Scale Heterogeneous XML Documents (정보 검색 기술을 이용한 대규모 이질적인 XML 문서에 대한 효율적인 선형 경로 질의 처리)

  • 박영호;한욱신;황규영
    • Journal of KIISE:Databases
    • /
    • v.31 no.5
    • /
    • pp.540-552
    • /
    • 2004
  • We propose XIR-Linear, a novel method for processing partial match queries on large-scale heterogeneous XML documents using information retrieval (IR) techniques. XPath queries are written in path expressions on a tree structure representing an XML document. An XPath query in its major form is a partial match query. The objective of XIR-Linear is to efficiently support this type of queries for large-scale documents of heterogeneous schemas. XIR-Linear has its basis on the schema-level methods using relational tables and drastically improves their efficiency and scalability using an inverted index technique. The method indexes the labels in label paths as key words in texts, and allows for finding the label paths that match the queries far more efficiently than string match used in conventional methods. We demonstrate the efficiency and scalability of XIR-Linear by comparing it with XRel and XParent using XML documents crawled from the Internet. The results show that XIR-Linear is more efficient than both XRel and XParent by several orders of magnitude for linear path expressions as the number of XML documents increases.

A Nonunique Composite Foreign Key-Based Approach to Fact Table Modeling and MDX Query Composing (비유일 외래키 조합 복합키 기반의 사실테이블 모델링과 MDX 쿼리문 작성법)

  • Yu, Han-Ju;Lee, Duck-Sung;Choi, In-Soo
    • KSCI Review
    • /
    • v.14 no.2
    • /
    • pp.185-197
    • /
    • 2006
  • A star schema consists of a central fact table, which is surrounded by one or more dimension tables. Each row int the fact table contains a multi-part primary key(or a composite foreign key) along with one or more columns containing various facts about the data stored in the row Each of the composit foreign key components is related to a dimensional table. The combination of keys in the fact table creates a composite foreign key that is unique to the fact table record. The composite foreign key, however, is rarely unique to the fact table record in real-world applications, particularly in financial applications. In order to make the composite foreign key be the determinant in real-world application, some precalculation might be performed in the SQL relational database, and cached in the OLAP database. However, there are many drawbacks to this approach. In some cases, this approach might give users the wrong results. In this paper, an approach to fact table modeling and related MDX query composing, which can be used in real-world applications without performing any precalculation and gives users the correct results, is proposed.

  • PDF

A Nonunique Composite Foreign Key-Based Approach to Fact Table Modeling and MDX Query Composing (비유일 외래키 조합 복합키 기반의 사실테이블 모델링과 MDX 쿼리문 작성법)

  • Yu, Han-Ju;Lee, Duck-Sung;Choi, In-Soo
    • Journal of the Korea Society of Computer and Information
    • /
    • v.12 no.1 s.45
    • /
    • pp.177-188
    • /
    • 2007
  • A star schema consists of a central fact table, which is surrounded by one or more dimension tables. Each row in the fact table contains a multi-part primary key(or a composite foreign key) along with one or more columns containing various facts about the data stored in the row. Each of the composit foreign key components is related to a dimensional table. The combination of keys in the fact table creates a composite foreign key that is unique to the fact table record. The composite foreign key, however, is rarely unique to the fact table retold in real-world applications, particularly in financial applications. In order to make the composite foreign key be the determinant in real-world application, some precalculation might be performed in the SQL relational database, and cached in the OLAP database. However, there are many drawbacks to this approach. In some cases, this approach might give users the wrong results. In this paper, an approach to fact table modeling and related MDX query composing, which can be used in real-world applications without performing any precalculation and gives users the correct results, is proposed.

  • PDF

Branching Path Query Processing for XML Documents using the Prefix Match Join (프리픽스 매취 조인을 이용한 XML 문서에 대한 분기 경로 질의 처리)

  • Park Young-Ho;Han Wook-Shin;Whang Kyu-Young
    • Journal of KIISE:Databases
    • /
    • v.32 no.4
    • /
    • pp.452-472
    • /
    • 2005
  • We propose XIR-Branching, a novel method for processing partial match queries on heterogeneous XML documents using information retrieval(IR) techniques and novel instance join techniques. A partial match query is defined as the one having the descendent-or-self axis '//' in its path expression. In its general form, a partial match query has branch predicates forming branching paths. The objective of XIR-Branching is to efficiently support this type of queries for large-scale documents of heterogeneous schemas. XIR-Branching has its basis on the conventional schema-level methods using relational tables(e.g., XRel, XParent, XIR-Linear[21]) and significantly improves their efficiency and scalability using two techniques: an inverted index technique and a novel prefix match join. The former supports linear path expressions as the method used in XIR-Linear[21]. The latter supports branching path expressions, and allows for finding the result nodes more efficiently than containment joins used in the conventional methods. XIR-Linear shows the efficiency for linear path expressions, but does not handle branching path expressions. However, we have to handle branching path expressions for querying more in detail and general. The paper presents a novel method for handling branching path expressions. XIR-Branching reduces a candidate set for a query as a schema-level method and then, efficiently finds a final result set by using a novel prefix match join as an instance-level method. We compare the efficiency and scalability of XIR-Branching with those of XRel and XParent using XML documents crawled from the Internet. The results show that XIR-Branching is more efficient than both XRel and XParent by several orders of magnitude for linear path expressions, and by several factors for branching path expressions.

Twitter Issue Tracking System by Topic Modeling Techniques (토픽 모델링을 이용한 트위터 이슈 트래킹 시스템)

  • Bae, Jung-Hwan;Han, Nam-Gi;Song, Min
    • Journal of Intelligence and Information Systems
    • /
    • v.20 no.2
    • /
    • pp.109-122
    • /
    • 2014
  • People are nowadays creating a tremendous amount of data on Social Network Service (SNS). In particular, the incorporation of SNS into mobile devices has resulted in massive amounts of data generation, thereby greatly influencing society. This is an unmatched phenomenon in history, and now we live in the Age of Big Data. SNS Data is defined as a condition of Big Data where the amount of data (volume), data input and output speeds (velocity), and the variety of data types (variety) are satisfied. If someone intends to discover the trend of an issue in SNS Big Data, this information can be used as a new important source for the creation of new values because this information covers the whole of society. In this study, a Twitter Issue Tracking System (TITS) is designed and established to meet the needs of analyzing SNS Big Data. TITS extracts issues from Twitter texts and visualizes them on the web. The proposed system provides the following four functions: (1) Provide the topic keyword set that corresponds to daily ranking; (2) Visualize the daily time series graph of a topic for the duration of a month; (3) Provide the importance of a topic through a treemap based on the score system and frequency; (4) Visualize the daily time-series graph of keywords by searching the keyword; The present study analyzes the Big Data generated by SNS in real time. SNS Big Data analysis requires various natural language processing techniques, including the removal of stop words, and noun extraction for processing various unrefined forms of unstructured data. In addition, such analysis requires the latest big data technology to process rapidly a large amount of real-time data, such as the Hadoop distributed system or NoSQL, which is an alternative to relational database. We built TITS based on Hadoop to optimize the processing of big data because Hadoop is designed to scale up from single node computing to thousands of machines. Furthermore, we use MongoDB, which is classified as a NoSQL database. In addition, MongoDB is an open source platform, document-oriented database that provides high performance, high availability, and automatic scaling. Unlike existing relational database, there are no schema or tables with MongoDB, and its most important goal is that of data accessibility and data processing performance. In the Age of Big Data, the visualization of Big Data is more attractive to the Big Data community because it helps analysts to examine such data easily and clearly. Therefore, TITS uses the d3.js library as a visualization tool. This library is designed for the purpose of creating Data Driven Documents that bind document object model (DOM) and any data; the interaction between data is easy and useful for managing real-time data stream with smooth animation. In addition, TITS uses a bootstrap made of pre-configured plug-in style sheets and JavaScript libraries to build a web system. The TITS Graphical User Interface (GUI) is designed using these libraries, and it is capable of detecting issues on Twitter in an easy and intuitive manner. The proposed work demonstrates the superiority of our issue detection techniques by matching detected issues with corresponding online news articles. The contributions of the present study are threefold. First, we suggest an alternative approach to real-time big data analysis, which has become an extremely important issue. Second, we apply a topic modeling technique that is used in various research areas, including Library and Information Science (LIS). Based on this, we can confirm the utility of storytelling and time series analysis. Third, we develop a web-based system, and make the system available for the real-time discovery of topics. The present study conducted experiments with nearly 150 million tweets in Korea during March 2013.