• Title/Summary/Keyword: Multidimensional Data Cube

Dense Sub-Cube Extraction Algorithm for a Multidimensional Large Sparse Data Cube (다차원 대용량 저밀도 데이타 큐브에 대한 고밀도 서브 큐브 추출 알고리즘)

  • Lee Seok-Lyong;Chun Seok-Ju;Chung Chin-Wan
    • Journal of KIISE:Databases / v.33 no.4 / pp.353-362 / 2006
  • A data warehouse is a data repository that enables users to store large volumes of data and analyze them effectively. In this research, we investigate an algorithm for building a multidimensional data cube, which is a powerful tool for analyzing the contents of data warehouses and databases. A multidimensional data cube incurs an inevitable retrieval overhead due to the sparsity of the cube. In this paper, we propose a dense sub-cube extraction algorithm that identifies dense regions in a large sparse data cube and constructs sub-cubes from the dense regions found. It reduces the retrieval overhead remarkably by retrieving those small dense sub-cubes instead of scanning a large sparse cube. The algorithm uses bitmap- and histogram-based techniques to extract dense sub-cubes from the data cube, and its effectiveness is demonstrated experimentally.
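
The abstract above does not spell out the extraction procedure, so the following is only a minimal sketch of the histogram idea, not the authors' algorithm. It assumes a cube given as a set of non-empty cell coordinates and a single dense region. The names `dense_ranges`, `density`, and `min_count` are hypothetical.

```python
from collections import Counter

def dense_ranges(cells, ndim, min_count):
    """Per dimension, return the (lo, hi) span of coordinate values whose
    histogram count is at least min_count (assumes one dense region)."""
    ranges = []
    for d in range(ndim):
        hist = Counter(cell[d] for cell in cells)
        dense = [v for v, n in hist.items() if n >= min_count]
        if not dense:
            return None  # no dense coordinate value in this dimension
        ranges.append((min(dense), max(dense)))
    return ranges

def density(cells, ranges):
    """Fraction of cells inside the candidate sub-cube that are non-empty."""
    inside = sum(
        all(lo <= c <= hi for c, (lo, hi) in zip(cell, ranges))
        for cell in cells
    )
    volume = 1
    for lo, hi in ranges:
        volume *= hi - lo + 1
    return inside / volume

# Hypothetical data: a 3x3 dense block plus three scattered outliers.
cells = {(x, y) for x in range(10, 13) for y in range(20, 23)}
cells |= {(0, 0), (50, 5), (99, 99)}
ranges = dense_ranges(cells, ndim=2, min_count=2)
print(ranges, density(cells, ranges))
```

A query that falls inside the returned ranges can then be answered from the small sub-cube alone, which is the retrieval saving the abstract describes.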

A Z-Index based MOLAP Cube Storage Scheme (Z-인덱스 기반 MOLAP 큐브 저장 구조)

  • Kim, Myung;Lim, Yoon-Sun
    • Journal of KIISE:Databases / v.29 no.4 / pp.262-273 / 2002
  • MOLAP is a technology that accelerates multidimensional data analysis by storing data in a multidimensional array and accessing the data by position. Depending on the scheme used to map a multidimensional array onto disk, the speed of MOLAP operations such as slice and dice varies significantly. [1] proposed a MOLAP cube storage scheme that divides a cube into small chunks with equal side length, compresses sparse chunks, and stores the chunks in row-major order of their chunk indexes. This type of cube storage scheme gives a fair chance to all dimensions of the input data. Here, we develop a variant of that cube storage scheme by placing chunks in a different order. Our scheme accelerates slice and dice operations by aligning chunks to physical disk block boundaries and clustering neighboring chunks. Z-indexing is used for chunk clustering. The efficiency of the proposed scheme is evaluated through experiments. We show that the proposed scheme is efficient for 3- to 5-dimensional cubes, which are frequently used to analyze business data.
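
As a rough illustration of the Z-indexing mentioned above, the sketch below computes a Z-order (Morton) key by interleaving the bits of the chunk coordinates and then sorts chunks by that key. It shows the general technique only. The disk-block alignment and compression of the paper are not modeled, and the function names are assumptions.

```python
from itertools import product

def z_index(coords, bits):
    """Interleave the low `bits` bits of each coordinate into one Morton key."""
    key = 0
    ndim = len(coords)
    for bit in range(bits):
        for d, c in enumerate(coords):
            key |= ((c >> bit) & 1) << (bit * ndim + d)
    return key

def chunks_in_z_order(side, ndim):
    """List all chunk coordinates of a side^ndim chunk grid in Z-order."""
    bits = (side - 1).bit_length()
    return sorted(product(range(side), repeat=ndim),
                  key=lambda c: z_index(c, bits))

# In a 4x4 chunk grid the first four chunks form one 2x2 neighborhood,
# so spatially close chunks end up close together in storage order.
print(chunks_in_z_order(4, 2)[:4])
```

Row-major order would instead place the four chunks of a 2x2 neighborhood a full row apart, which is the clustering difference the abstract points to.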

A Bitmap Index for Chunk-Based MOLAP Cubes (청크 기반 MOLAP 큐브를 위한 비트맵 인덱스)

  • Lim, Yoon-Sun;Kim, Myung
    • Journal of KIISE:Databases / v.30 no.3 / pp.225-236 / 2003
  • MOLAP systems store data in a multidimensional array called a 'cube' and access the data using array indexes. When a cube is placed on disk, it can be partitioned into a set of chunks of the same side length. Such a cube storage scheme is called the chunk-based MOLAP cube storage scheme. It clusters the data so that all dimensions are guaranteed a fair chance in terms of query processing speed. In order to achieve high space utilization, sparse chunks are further compressed. Because of this compression, the relative position of a chunk cannot be obtained in constant time without an index. In this paper, we propose a bitmap index for chunk-based MOLAP cubes. The index can be constructed while the corresponding cube is generated. The relative position of chunks is retained in the index so that chunk retrieval can be done in constant time. We place as many chunks as possible in an index block so that the number of index searches is minimized for OLAP operations such as range queries. We show that the proposed index is efficient by comparing it with multidimensional indexes such as the UB-tree and the grid file in terms of time and space.
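
The abstract does not describe the index layout, so the following is only a sketch of one common way a bitmap can give constant-time positions: one bit per chunk marks whether the chunk is stored, and a precomputed prefix count gives its rank among the stored chunks. The class and method names are hypothetical, and the index-block packing of the paper is not modeled.

```python
class ChunkBitmapIndex:
    """Bitmap over chunk numbers plus prefix counts for rank lookups."""

    def __init__(self, non_empty_flags):
        self.bits = [bool(b) for b in non_empty_flags]
        # prefix[i] = number of stored chunks with chunk number < i
        self.prefix = [0]
        for b in self.bits:
            self.prefix.append(self.prefix[-1] + (1 if b else 0))

    def position(self, chunk_no):
        """Relative position of a stored chunk, or None if it is empty."""
        if not self.bits[chunk_no]:
            return None
        return self.prefix[chunk_no]

# Hypothetical cube with seven chunks, four of which are stored.
index = ChunkBitmapIndex([1, 0, 0, 1, 1, 0, 1])
print(index.position(3), index.position(1))
```

Both the bit test and the prefix lookup are single list accesses, so finding a chunk does not depend on how many empty chunks precede it.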

A Method for Engineering Change Analysis by Using OLAP (OLAP를 이용한 설계변경 분석 방법에 관한 연구)

  • Do, Namchul
    • Korean Journal of Computational Design and Engineering / v.19 no.2 / pp.103-110 / 2014
  • Engineering changes are indispensable engineering and management activities for manufacturers to develop competitive products and to maintain the consistency of their product data. Analysis of engineering changes provides core functionality to support decision making in engineering change management. This study aims to develop a method for analyzing engineering changes based on On-Line Analytical Processing (OLAP), a proven database analysis technology that has been applied to various business areas. This approach automates data processing for engineering change analysis from product databases that follow an international standard for product data management (PDM), and enables analysts to examine various aspects of engineering changes with OLAP operations. The study consists of modeling a standard PDM database and a multidimensional data model for engineering change analysis, implementing the standard and multidimensional models with PDM and data cube systems, and applying the implemented data cube to two core functions of engineering change management: the evaluation and propagation of engineering changes.

An Approximate Query Answering Method using a Knowledge Representation Approach (지식 표현 방식을 이용한 근사 질의응답 기법)

  • Lee, Sun-Young;Lee, Jong-Yun
    • Journal of the Korea Academia-Industrial cooperation Society / v.12 no.8 / pp.3689-3696 / 2011
  • In decision support systems, knowledge workers require aggregation operations over large data and are more interested in trend analysis than in exact point values. Therefore, it is necessary to provide fast approximate answers rather than exact answers, and to research approximate query answering techniques. In this paper, we propose a new approximate query answering method based on the Fuzzy C-means clustering (FCM) method and the Adaptive Neuro-Fuzzy Inference System (ANFIS). The proposed FCM-ANFIS method can compute aggregate queries without accessing the massive multidimensional data cube by producing a knowledge representation (KR) model of the cube. In our experiments, we show that our method using the KR model outperforms the NMF method.

Sort-Based Distributed Parallel Data Cube Computation Algorithm using MapReduce (맵리듀스를 이용한 정렬 기반의 데이터 큐브 분산 병렬 계산 알고리즘)

  • Lee, Suan;Kim, Jinho
    • Journal of the Institute of Electronics and Information Engineers / v.49 no.9 / pp.196-204 / 2012
  • Recently, many applications perform OLAP (On-Line Analytical Processing) over very large volumes of data. The multidimensional data cube is regarded as a core tool in OLAP analysis. This paper focuses on how to compute data cubes efficiently in parallel by using a popular parallel processing tool, MapReduce. We investigate efficient ways to implement the PipeSort algorithm, a well-known data cube computation method, on the MapReduce framework. PipeSort computes several (descendant) cuboids that share the same sort order at the same time, as a pipeline, by scanning one (ancestor) cuboid once. This paper proposes four ways of implementing the PipeSort pipeline on a MapReduce framework running across 20 servers. Our experiments show that the PipeMap-NoReduce algorithm outperforms the other algorithms for high-dimensional data. In contrast, Post-Pipe stands out above the others for low-dimensional data.
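
The pipeline idea described above can be sketched in a single process: when rows are sorted by (A, B, C), every prefix group-by (ABC, AB, A, and the grand total) can be aggregated in one scan, because each group is contiguous. This is an illustrative sketch only. It does not reproduce the four MapReduce variants of the paper, and `pipeline_aggregate` is a hypothetical name.

```python
def pipeline_aggregate(sorted_rows, ndim):
    """One scan over rows sorted by key; returns sums for every key prefix.

    sorted_rows: iterable of (key_tuple, measure), sorted by key_tuple.
    Result maps prefix length k to a list of (prefix, sum) pairs.
    """
    results = {k: [] for k in range(ndim + 1)}
    current = [None] * (ndim + 1)  # group currently open at each level
    totals = [0] * (ndim + 1)
    started = False
    for key, value in sorted_rows:
        for k in range(ndim + 1):
            prefix = key[:k]
            if started and prefix != current[k]:
                # The group at level k just ended, so emit it.
                results[k].append((current[k], totals[k]))
                totals[k] = 0
            current[k] = prefix
            totals[k] += value
        started = True
    if started:
        for k in range(ndim + 1):
            results[k].append((current[k], totals[k]))
    return results

# Hypothetical fact rows with two dimensions, already sorted.
rows = [(("a", "x"), 1), (("a", "x"), 2), (("a", "y"), 3), (("b", "x"), 4)]
for level, groups in pipeline_aggregate(rows, ndim=2).items():
    print(level, groups)
```

Only one open group per level is held in memory, which is why a single sorted scan can feed several cuboids at once.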

An Efficient ROLAP Cube Generation Scheme (효율적인 ROLAP 큐브 생성 방법)

  • Kim, Myung;Song, Ji-Sook
    • Journal of KIISE:Databases / v.29 no.2 / pp.99-109 / 2002
  • ROLAP (Relational Online Analytical Processing) is a process and methodology for multidimensional data analysis that is essential for extracting desired data and deriving value-added information from an enterprise data warehouse. In order to speed up query processing, most ROLAP systems pre-compute summary tables. This process is called 'cube generation', and it mostly involves intensive table sorting stages. (1) showed that it is much faster to generate ROLAP summary tables indirectly using a MOLAP (multidimensional OLAP) cube generation algorithm. In this paper, we present such an indirect ROLAP cube generation algorithm that is fast and scalable. High memory utilization is achieved by slicing the input fact table along one or more dimensions before generating summary tables. High speed is achieved by producing summary tables from their smallest parents. We show the efficiency of our algorithm through experiments.
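
The "smallest parent" rule mentioned above can be sketched as follows: to build a summary table over some set of dimensions, pick the already-built table with exactly one more dimension that has the fewest rows. This is a minimal sketch under that assumption. The function name and the row counts are hypothetical.

```python
def smallest_parent(target, table_sizes):
    """Return the smallest table whose dimensions are `target` plus one more.

    table_sizes maps a frozenset of dimension names to a row count.
    Returns None when no parent table is available.
    """
    target = frozenset(target)
    parents = [dims for dims in table_sizes
               if target < dims and len(dims) == len(target) + 1]
    if not parents:
        return None
    return min(parents, key=lambda dims: table_sizes[dims])

# Hypothetical row counts for the already-generated summary tables.
sizes = {
    frozenset("ABC"): 1000,
    frozenset("AB"): 200,
    frozenset("AC"): 50,
    frozenset("BC"): 400,
}
print(sorted(smallest_parent("A", sizes)))  # aggregate A from AC, not AB
```

Aggregating from the smallest parent reads the fewest rows, which is where the speed gain in the abstract comes from.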

Multi-Dimensional Keyword Search and Analysis of Hotel Review Data Using Multi-Dimensional Text Cubes (다차원 텍스트 큐브를 이용한 호텔 리뷰 데이터의 다차원 키워드 검색 및 분석)

  • Kim, Namsoo;Lee, Suan;Jo, Sunhwa;Kim, Jinho
    • Journal of Information Technology and Architecture / v.11 no.1 / pp.63-73 / 2014
  • With the advance of the WWW, unstructured data including text is attracting more and more interest. Such unstructured data created by WWW users expresses the users' subjective opinions, so we can obtain very useful information, such as users' personal tastes or perspectives, if we analyze it appropriately. In this paper, we provide various analyses of unstructured text documents efficiently by taking advantage of OLAP (On-Line Analytical Processing) multidimensional cube technology. OLAP cubes have been widely used for the multidimensional analysis of structured data such as simple alphabetic and numeric data, but they have not been used for unstructured data consisting of long texts. To provide multidimensional analysis of unstructured text data, the Text Cube model has recently been proposed. It incorporates term frequency and inverted index, which play key roles in information retrieval, as measures for searching and analyzing text databases. The primary goal of this paper is to apply this text cube model to a real data set from an Internet site that shares hotel information and to provide multidimensional analysis of users' written hotel reviews. To achieve this goal, we first build text cubes for the hotel review data. Using the text cubes, we design and implement a system that provides multidimensional keyword search features to search and analyze review texts along various dimensions. This system will help users easily obtain valuable summary information that reflects guests' subjective opinions. Furthermore, this paper evaluates the proposed system through various experiments, which reveal the effectiveness of the system.
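
To make the two measures concrete, the sketch below builds a tiny text cube in which each cell, identified by its dimension values, holds a term-frequency counter and an inverted index from term to document ids. It illustrates the idea only and is not the system described above. The reviews, the dimension names `city` and `stars`, and `build_text_cube` are invented for the example.

```python
from collections import Counter, defaultdict

def build_text_cube(reviews, dims):
    """Group reviews by their dimension values and index the text per cell."""
    cube = defaultdict(lambda: {"tf": Counter(), "inv": defaultdict(set)})
    for doc_id, review in enumerate(reviews):
        cell = cube[tuple(review[d] for d in dims)]
        for term in review["text"].lower().split():
            cell["tf"][term] += 1          # term frequency measure
            cell["inv"][term].add(doc_id)  # inverted index measure
    return cube

# Hypothetical hotel reviews.
reviews = [
    {"city": "Seoul", "stars": 5, "text": "clean room clean lobby"},
    {"city": "Seoul", "stars": 5, "text": "noisy room"},
    {"city": "Busan", "stars": 3, "text": "clean beach"},
]
cube = build_text_cube(reviews, dims=("city", "stars"))
cell = cube[("Seoul", 5)]
print(cell["tf"]["clean"], sorted(cell["inv"]["room"]))
```

A keyword search restricted to one cell then becomes a lookup in that cell's inverted index, and the term frequencies summarize what reviewers in that cell talk about.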

A Multidimensional Analysis Framework for XML Warehouses (XML 웨어하우스에 대한 다차원 분석 프레임워크)

  • Park, Byung-Kwon;Lee, Jong-Hak
    • Asia Pacific Journal of Information Systems / v.15 no.4 / pp.153-164 / 2005
  • Nowadays, large numbers of XML documents are available on the Internet. Thus, we need to analyze them multidimensionally in the same way as relational data. In this paper, we propose a new framework for multidimensional analysis of XML documents, which we call XML-OLAP. We base XML-OLAP on XML warehouses where all fact and dimension data are stored as XML documents. We build XML cubes from XML warehouses. We propose a new OLAP language for XML cubes, which we call XML-MDX. XML-MDX statements target XML cubes and use XQuery expressions to designate the measure, axis, and slicer. They incorporate text mining operations for aggregating text data. We apply XML-OLAP to the United States patent XML warehouse to demonstrate multidimensional analysis of XML documents.

Web Information Extraction and Multidimensional Analysis Using XML (XML을 이용한 웹 정보 추출 및 다차원 분석)

  • Park, Byung-Kwon
    • Journal of Korea Multimedia Society / v.11 no.5 / pp.567-578 / 2008
  • To analyze the huge number of web pages available on the Internet, we need to extract the information encoded in them. In this paper, we propose a method to extract web information from web pages and convert it into XML documents for multidimensional analysis. For extracting information from web pages, we propose two languages: one for describing web information extraction rules based on the object-oriented model, and another for describing regular expressions of HTML tag patterns to search for target information. For multidimensional analysis of XML documents, we propose a method for constructing an XML warehouse and various XML cubes from it, in the same way as for relational data. Finally, we show the validity of our method by applying it to US patent web pages.
