• Title/Summary/Keyword: OLAP Cubes

Multidimensional Analysis of XML Documents using XML Cubes (XML 큐브를 이용한 다차원 XML 문서 분석)

  • Park, Byung-Kwon
    • Proceedings of the Korea Association of Information Systems Conference / 2005.05a / pp.65-78 / 2005
  • Nowadays, large amounts of XML documents are available on the Internet. Thus, we need to analyze them multidimensionally in the same way as relational data. In this paper, we propose a new framework for multidimensional analysis of XML documents, which we call XML-OLAP. We base XML-OLAP on XML warehouses in which all fact data and dimension data are stored as XML documents. We build XML cubes from XML warehouses. We propose a new multidimensional expression language for XML cubes, which we call XML-MDX. XML-MDX statements target XML cubes and use XQuery expressions to designate the measure data. They specify text mining operators for aggregating the text that constitutes the measure data. We evaluate XML-OLAP by applying it to a U.S. patent XML warehouse, using XML-MDX queries to demonstrate that XML-OLAP is effective for multidimensional analysis of U.S. patents.

A Multidimensional Analysis Framework for XML Warehouses (XML 웨어하우스에 대한 다차원 분석 프레임워크)

  • Park, Byung-Kwon;Lee, Jong-Hak
    • Asia Pacific Journal of Information Systems / v.15 no.4 / pp.153-164 / 2005
  • Nowadays, large amounts of XML documents are available on the Internet. Thus, we need to analyze them multidimensionally in the same way as relational data. In this paper, we propose a new framework for multidimensional analysis of XML documents, which we call XML-OLAP. We base XML-OLAP on XML warehouses in which all fact and dimension data are stored as XML documents. We build XML cubes from XML warehouses. We propose a new OLAP language for XML cubes, which we call XML-MDX. XML-MDX statements target XML cubes and use XQuery expressions to designate the measure, axis, and slicer. They incorporate text mining operations for aggregating text data. We apply XML-OLAP to a United States patent XML warehouse to demonstrate multidimensional analysis of XML documents.

Design of an Inference Control Process in OLAP Data Cubes (OLAP 데이터 큐브에서의 추론통제 프로세스 설계)

  • Lee, Duck-Sung;Choi, In-Soo
    • Journal of the Korea Society of Computer and Information / v.14 no.5 / pp.183-193 / 2009
  • Both On-Line Analytical Processing (OLAP) data cubes and Statistical Databases (SDBs) deal with multidimensional data sets, and both are concerned with statistical summarizations over the dimensions of the data sets. However, there is a distinction between the two. While SDBs are usually derived from other base data, OLAP data cubes often represent the base data directly. In other words, the base data of SDBs are macro-data, whereas the core cuboid data in OLAP data cubes are micro-data. The base table in OLAP is used to populate the data cube with values of the measure attribute, and each record in the base table is used to populate a cell of the core cuboid. Because OLAP data cubes mostly represent micro-data, some records may be absent from the base table. If the corresponding records are absent from the base table, some cells of the core cuboid remain empty. Wang and others proposed a method for securing OLAP data cubes against privacy breaches. They assert that the proposed method does not depend on specific types of aggregation functions. In this paper, however, it is found that their assertion about aggregation functions is wrong whenever any cell of the core cuboid remains empty. The objective of this study is to design an inference control process for OLAP data cubes that rectifies Wang's error.
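
As a rough illustration of how a base table populates the core cuboid and how absent records leave cells empty, here is a minimal Python sketch. The table contents, the dimension names (`quarters`, `regions`), and the helper `rollup_sum` are all hypothetical and are not taken from the paper; the sketch shows neither Wang's method nor the proposed inference control process.

```python
from itertools import product

# Hypothetical base table of (quarter, region, sales) records; not from the paper.
base_table = [
    ("Q1", "East", 10),
    ("Q1", "West", 20),
    ("Q2", "East", 30),
    # There is no record for ("Q2", "West"), so that cell stays empty.
]

quarters = ["Q1", "Q2"]
regions = ["East", "West"]

# Core cuboid: one cell per combination of dimension values.
# None marks an empty cell, which is different from a measure value of 0.
core = {cell: None for cell in product(quarters, regions)}
for quarter, region, sales in base_table:
    core[(quarter, region)] = sales

empty_cells = [cell for cell, value in core.items() if value is None]


def rollup_sum(cells):
    """SUM over the non-empty cells only."""
    return sum(value for value in cells if value is not None)


# Aggregate along the region dimension: one total per quarter.
by_quarter = {q: rollup_sum(core[(q, r)] for r in regions) for q in quarters}
```

Here both quarters roll up to the same total even though one of them is missing a cell, so an aggregate alone does not reveal whether a cell was empty.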

A Strategy for Inference Control of Official Statistics - Centering around the Patent Application Expense Support Project - (공식통계의 추론통제 전략 - 정부의 특허경비지원사업 사례를 중심으로 -)

  • Lee, Duck-Sung;Choi, In-Soo
    • Journal of the Korea Society of Computer and Information / v.14 no.11 / pp.199-211 / 2009
  • Official statistics, which are collected for governments and the community, can be used to assess the effectiveness of governments' policies and programs. Thus, official statistics should be collected and presented based on correct findings. Erroneous official statistics will lead to lower-quality results in assessing those policies and programs. Many statistical agencies today use on-line analytical processing (OLAP) data cubes, which support OLAP tasks like aggregation and subtotals, as a key part of their strategy for disseminating official statistics. Confidentiality in data cubes should also be protected. However, sensitive parts of data cubes, including micro-data, may be disclosed through malicious inferences. The authors have suggested an inference control process for OLAP data cubes that prevents erroneous cube creation and secures cubes against privacy breaches. The objective of this study is to establish a strategy for inference control of official statistics using the inference control process, taking the Patent Application Expense Support Project as a case.

Applying an Aggregate Function AVG to OLAP Cubes (OLAP 큐브에서의 집계함수 AVG의 적용)

  • Lee, Seung-Hyun;Lee, Duck-Sung;Choi, In-Soo
    • Journal of the Korea Society of Computer and Information / v.14 no.1 / pp.217-228 / 2009
  • Data analysis applications typically aggregate data across many dimensions, looking for unusual patterns in the data. Even though such applications are usually possible with standard structured query language (SQL) queries, the queries may become very complex. A complex query may result in many scans of the base table, leading to poor performance. Because online analytical processing (OLAP) queries are usually complex, it is desirable to define a new operator for aggregation, called the data cube or simply the cube. The data cube supports OLAP tasks like aggregation and subtotals. Many aggregate functions can be used to construct a data cube. Those functions can be classified into three categories: distributive, algebraic, and holistic. It has been thought that distributive functions such as SUM, COUNT, MAX, and MIN can be used to construct a data cube, and that an algebraic function such as AVG can also be used if it is replaced by an intermediate function. It is believed that even though AVG is not distributive, the intermediate function (SUM, COUNT) is distributive, and AVG can certainly be computed from (SUM, COUNT). In this paper, however, it is found that the intermediate function (SUM, COUNT) cannot be applied to OLAP cubes, and consequently the function leads to erroneous conclusions and decisions. The objective of this study is to identify some problems in applying the aggregate function AVG to OLAP cubes, and to design a process for solving these problems.
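
The conventional view described in this abstract, that AVG is algebraic and can be derived from the distributive pair (SUM, COUNT), can be sketched in Python as follows. The partition values and the helper names `summarize`, `merge`, and `finalize` are hypothetical. The sketch illustrates only the commonly held belief; it does not reproduce the problems the paper reports or the process it proposes.

```python
# Hypothetical measure values split over two partitions of a base table.
partition_a = [10, 20, 30]
partition_b = [40]


def summarize(values):
    """Intermediate state for AVG: the pair (SUM, COUNT)."""
    return (sum(values), len(values))


def merge(left, right):
    """(SUM, COUNT) is distributive: states combine by component-wise addition."""
    return (left[0] + right[0], left[1] + right[1])


def finalize(state):
    """Derive AVG from (SUM, COUNT); an empty group has no average."""
    total, count = state
    return total / count if count else None


state_a = summarize(partition_a)
state_b = summarize(partition_b)

# AVG over all four values, obtained by merging the intermediate pairs.
avg_from_pairs = finalize(merge(state_a, state_b))

# Averaging the two partition averages instead gives a different number,
# which is why AVG itself is not distributive.
avg_of_avgs = (finalize(state_a) + finalize(state_b)) / 2
```

With these values the merged pairs give 25.0, while the average of averages gives 30.0.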

Analysis of Multiple Dimension Hierarchies of OLAP Cubes (OLAP 큐브의 다중 차원계층구조에 대한 분석)

  • 박영선;김지현;임윤선;김명
    • Proceedings of the Korean Information Science Society Conference / 2004.10b / pp.115-117 / 2004
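  • Roll-up and drill-down are key operations for multidimensional data analysis. Through the hierarchies defined on each dimension, they provide analysts with information that is progressively summarized from detailed data. To speed up these operations, OLAP systems create aggregate tables in advance. Each dimension may have multiple hierarchies, and in that case creating all of the aggregate tables causes a data explosion. In this study, we classify multiple hierarchies and establish a model for calculating the sizes of aggregate tables and data cubes. With this model, analysts can predict in advance the cube size that results from multiple hierarchies, and can control the volume of data by changing the shape and number of hierarchies.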

An Approach to Navigating Data Cubes with a Hierarchical Visualization Technique (계층적 시각화 기법을 활용한 데이터 큐브의 탐색 방안)

  • Oh, Mi-Hwa;Hwang, Man-Mo;Choi, Jung-Woo;Choi, In-Soo
    • Journal of the Korea Society of Computer and Information / v.16 no.2 / pp.289-305 / 2011
  • To efficiently analyze complex and voluminous data, OLAP systems increasingly provide functionality for visual exploration of the data, allowing end users to navigate to the desired view of the data cube. This paper deals only with data cubes whose schemas are represented as exclusive symmetric hierarchies, which are not addressed by current OLAP implementations. This paper presents a conceptual classification of abstraction hierarchies, and an approach to navigating data cubes with a hierarchical visualization technique. The hierarchical visualization technique is developed using the transitive closure of a binary relation. The approach is exemplified using a real-world study from the domain of national license administration.
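
The abstract mentions the transitive closure of a binary relation. A minimal Python sketch of that operation on a hypothetical child-to-parent relation is given below. The relation `edges` and its place names are invented for illustration and do not come from the paper's license administration study, and the paper's own visualization technique is not shown.

```python
def transitive_closure(relation):
    """Return the transitive closure of a binary relation given as a set of pairs."""
    closure = set(relation)
    while True:
        # Compose the relation with itself: (a, b) and (b, d) imply (a, d).
        new_pairs = {(a, d) for (a, b) in closure for (c, d) in closure if b == c}
        if new_pairs <= closure:
            return closure
        closure |= new_pairs


# Hypothetical child -> parent edges of a dimension hierarchy.
edges = {("Seoul", "Korea"), ("Busan", "Korea"), ("Korea", "Asia")}

closure = transitive_closure(edges)

# Every ancestor reachable from "Seoul", i.e. every level it can roll up to.
ancestors_of_seoul = sorted(parent for (child, parent) in closure if child == "Seoul")
```

The closure adds the indirect pairs ("Seoul", "Asia") and ("Busan", "Asia"), which is the information a navigation tool would need to jump across more than one level at a time.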

Design and implementation of a Web-based OLAP metadata interchange system (웹 기반의 OLAP 메타데이터 교환 시스템의 설계 및 구현)

  • Lee, In-Gi;Lee, Min-Soo;Yong, Hwan-Seung
    • The KIPS Transactions:PartD / v.9D no.6 / pp.971-980 / 2002
  • As the importance of knowledge management is being recognized, interest in data warehousing has increased significantly. On-Line Analytical Processing (OLAP) systems can make effective use of data warehouses. Although there are many commercial OLAP products, they have been developed without any kind of standard, resulting in poor data exchange and difficulty in interfacing among the OLAP products. In this paper, we propose an OLAP metadata interchange model that can be used among different OLAP products, and we have implemented an OLAP metadata interchange system that can interchange the cubes created from the metadata. XML is used for the OLAP metadata model and the user interface is Web-based, which makes it easier to interchange metadata among different OLAP products. Users can experience the analysis environments of different products without needing to learn the complex cube creation process of each product. If this research is extended to design a common query language usable across OLAP products, those products should be able to communicate with one another more easily.

Multi-Dimensional Keyword Search and Analysis of Hotel Review Data Using Multi-Dimensional Text Cubes (다차원 텍스트 큐브를 이용한 호텔 리뷰 데이터의 다차원 키워드 검색 및 분석)

  • Kim, Namsoo;Lee, Suan;Jo, Sunhwa;Kim, Jinho
    • Journal of Information Technology and Architecture / v.11 no.1 / pp.63-73 / 2014
  • With the advance of the WWW, unstructured data, including text, are attracting more and more interest from users. These unstructured data created by WWW users represent the users' subjective opinions, so we can obtain very useful information, such as users' personal tastes or perspectives, if we analyze them appropriately. In this paper, we provide various analyses of unstructured text documents efficiently by taking advantage of OLAP (On-Line Analytical Processing) multidimensional cube technology. OLAP cubes have been widely used for multidimensional analysis of structured data, such as simple alphabetic and numeric data, but they have not been used for unstructured data consisting of long texts. To provide multidimensional analysis of unstructured text data, however, the Text Cube model has recently been proposed. It incorporates term frequency and the inverted index, which play key roles in information retrieval, as measures for searching and analyzing text databases. The primary goal of this paper is to apply this text cube model to a real data set from an Internet site that shares hotel information, and to provide multidimensional analysis of users' written hotel reviews. To achieve this goal, we first build text cubes for the hotel review data. Using the text cubes, we design and implement a system that provides multidimensional keyword search features for searching and analyzing review texts along various dimensions. This system can help users easily obtain valuable summary information reflecting guests' subjective opinions. Furthermore, this paper evaluates the proposed system through various experiments, which reveal the effectiveness of the system.
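
To make the idea of term frequency and an inverted index as cube measures more concrete, here is a minimal Python sketch. The review records, the two dimensions (city and rating), and the `search` helper are hypothetical and are not the paper's data set or system; the sketch only assumes that each cube cell keeps its own term counts and posting sets.

```python
from collections import Counter, defaultdict

# Hypothetical hotel reviews: (doc_id, city, rating, text). Not the paper's data.
reviews = [
    (1, "Seoul", 5, "clean room friendly staff"),
    (2, "Seoul", 5, "clean lobby"),
    (3, "Busan", 3, "noisy room"),
]

# Two text measures per cube cell, where a cell is a (city, rating) pair.
term_freq = defaultdict(Counter)                        # cell -> term -> count
inverted_index = defaultdict(lambda: defaultdict(set))  # cell -> term -> doc ids

for doc_id, city, rating, text in reviews:
    cell = (city, rating)
    for term in text.split():
        term_freq[cell][term] += 1
        inverted_index[cell][term].add(doc_id)


def search(term, city=None):
    """Return ids of documents containing `term`, optionally sliced to one city."""
    hits = set()
    for (cell_city, _rating), index in inverted_index.items():
        if city is None or cell_city == city:
            hits |= index.get(term, set())
    return sorted(hits)
```

For example, `search("room")` looks across all cells, while `search("room", city="Seoul")` restricts the keyword search to a slice of the city dimension.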

Overlapped-Subcube: A Lossless Compression Method for Prefix-Sum Cubes (중첩된-서브큐브: 전위-합 큐브를 위한 손실 없는 압축 방법)

  • 강흠근;민준기;전석주;정진완
    • Journal of KIISE:Databases / v.30 no.6 / pp.553-560 / 2003
  • Range-sum queries are very popular and are becoming important for finding trends and discovering relationships between attributes in diverse database applications. A range-sum query sums over the selected cells of an OLAP data cube, where the target cells are determined by the specified query ranges. Accessing the data cube directly forces too many cells to be read, and therefore incurs severe overhead. The prefix-sum cube was proposed for the efficient processing of range-sum queries in OLAP environments. However, the prefix-sum cube has been criticized for its space requirement. In this paper, we propose a lossless compression method called the overlapped-subcube, which is developed for compressing prefix-sum cubes. A distinguishing feature of the overlapped-subcube is that searches can be done without decompression. The overlapped-subcube reduces the space required to store prefix-sum cubes and improves query performance.
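
The prefix-sum cube mentioned in this abstract can be sketched for the two-dimensional case as follows. This is a minimal Python sketch of the general prefix-sum idea only; the function names and the 3x3 example cube are hypothetical, and the overlapped-subcube compression proposed in the paper is not shown.

```python
def build_prefix_sum(cube):
    """Build a 2-D prefix-sum array with an extra zero row and column.

    prefix[i][j] holds the sum of cube[0..i-1][0..j-1].
    """
    rows, cols = len(cube), len(cube[0])
    prefix = [[0] * (cols + 1) for _ in range(rows + 1)]
    for i in range(rows):
        for j in range(cols):
            prefix[i + 1][j + 1] = (
                cube[i][j] + prefix[i][j + 1] + prefix[i + 1][j] - prefix[i][j]
            )
    return prefix


def range_sum(prefix, r1, c1, r2, c2):
    """Sum of cells in rows r1..r2 and columns c1..c2 (inclusive), in constant time."""
    return (
        prefix[r2 + 1][c2 + 1]
        - prefix[r1][c2 + 1]
        - prefix[r2 + 1][c1]
        + prefix[r1][c1]
    )


# Hypothetical 3x3 data cube of measure values.
cube = [
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9],
]
prefix = build_prefix_sum(cube)
bottom_right_block = range_sum(prefix, 1, 1, 2, 2)  # cells 5, 6, 8, 9
```

Each query reads four prefix cells instead of every cell in the range, at the cost of storing a second array as large as the cube, which is the space requirement the abstract refers to.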