Search | Korea Science

Development of the Design Methodology for Large-scale Data Warehouse based on MongoDB

Lee, Junho;Joo, Kyungsoo
- Journal of the Korea Society of Computer and Information / v.23 no.3 / pp.49-54 / 2018
A data warehouse is a system that collectively manages and integrates a company's data and provides the basis for decision making on management strategy. Nowadays, the volume of analysis data is reaching a critical size that challenges traditional data warehousing approaches. Currently implemented solutions are mainly based on relational databases, which are no longer suited to these data volumes. NoSQL solutions allow us to consider new approaches to data warehousing, especially from the multidimensional data management point of view. In this paper, we extend the data warehouse design methodology based on relational databases using the star schema, and develop a consistent design methodology, from information requirement analysis to data warehouse construction, for building large-scale data warehouses on MongoDB, a NoSQL database.
https://doi.org/10.9708/jksci.2018.23.03.049
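
The abstract above does not spell out how a star schema is carried over to MongoDB, so the following is only a minimal sketch of one common approach: embedding the referenced dimension rows inside each fact record to form a nested document. The table names, fields, and the `to_document` helper are hypothetical and are not taken from the paper; plain Python dictionaries stand in for MongoDB documents.

```python
# Hypothetical star-schema rows: one fact table and two dimension tables.
dim_date = {1: {"date": "2018-03-01", "quarter": "Q1"}}
dim_product = {10: {"name": "widget", "category": "hardware"}}
fact_sales = [{"date_id": 1, "product_id": 10, "amount": 250.0}]


def to_document(fact, dims):
    """Build one nested document from a fact row.

    Measures (keys not ending in "_id") are copied as-is; each foreign key
    "<name>_id" is replaced by an embedded copy of the dimension row.
    """
    doc = {k: v for k, v in fact.items() if not k.endswith("_id")}
    for name, table in dims.items():
        doc[name] = dict(table[fact[name + "_id"]])
    return doc


documents = [
    to_document(f, {"date": dim_date, "product": dim_product})
    for f in fact_sales
]
```

Embedding trades storage for join-free reads; a design that references dimension documents by key instead would make the opposite trade.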

TATS: an Efficient Technique for Computing Temporal Aggregates for Data Warehousing

Shin, Young-Ok;Park, Sung-Kong;Baik, Doo-Kwon;Ryu, Keun-Ho
- ETRI Journal / v.22 no.3 / pp.41-51 / 2000
An important use of data warehousing is to provide temporal views over the history of source data. It is significant that nearly all data warehouses are dependent on relational database technology, yet relational databases provide little or no real support for temporal data. Therefore, it is difficult to obtain accurate information for time-varying data. In this paper, we design a temporal data warehouse to support time-varying data efficiently. For this purpose, we present a method to support temporal queries by combining a temporal query processing layer with the relational database that is used as a source database in an existing data warehouse. We introduce the Temporal Aggregate Tree Strategy (TATS) and present its algorithm for aggregating time-varying data that has changed by the time the temporal view is created. In addition, the TATS and the materialized view creation method of the existing data warehouse have been evaluated. As a result, the TATS reduces the size of the fact table and shows good performance on the comparison factors when processing queries for time-varying data.
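
To make the notion of a temporal aggregate concrete, here is a minimal sketch that computes a running SUM over the constant intervals induced by record validity periods. It uses a simple endpoint sweep, not the tree-based TATS algorithm, whose details the abstract does not give; the `temporal_sum` name and the half-open `[start, end)` convention are assumptions made for illustration.

```python
from collections import defaultdict


def temporal_sum(records):
    """Aggregate (start, end, value) records valid on [start, end).

    Returns (start, end, total) for every maximal stretch between two
    consecutive endpoints during which at least one record is valid.
    """
    # For each endpoint, accumulate the change in the sum and in the
    # number of valid records.
    deltas = defaultdict(lambda: [0, 0])
    for start, end, value in records:
        deltas[start][0] += value
        deltas[start][1] += 1
        deltas[end][0] -= value
        deltas[end][1] -= 1

    points = sorted(deltas)
    result = []
    total = count = 0
    for left, right in zip(points, points[1:]):
        total += deltas[left][0]
        count += deltas[left][1]
        if count > 0:  # skip gaps where no record is valid
            result.append((left, right, total))
    return result


intervals = temporal_sum([(1, 5, 10), (3, 8, 20)])
```

With the two sample records, the aggregate changes at times 3 and 5, so the result has three constant intervals.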

Modeling and Implementation for Generic Spatio-Temporal Incorporated Information (시간 공간 통합 본원적 데이터 모델링 및 그 구현에 관한 연구)

Lee Wookey
- Journal of Information Technology Applications and Management / v.12 no.1 / pp.35-48 / 2005
An architectural framework is developed for integrating geospatial and temporal data with relational information from which a spatio-temporal data warehouse (STDW) system is built. In order to implement the STDW, a generic conceptual model was designed that accommodated six dimensions: spatial (map object), temporal (time), agent (contractor), management (e.g. planting) and tree species (specific species) that addressed the 'where', 'when', 'who', 'what', 'why' and 'how' (5W1H) of the STDW information, respectively. A formal algebraic notation was developed based on a triplet schema that corresponded with spatial, temporal, and relational data type objects. Spatial object structures and spatial operators (spatial selection, spatial projection, and spatial join) were defined and incorporated along with other database operators having interfaces via the generic model.

Mining Information in Automated Relational Databases for Improving Reliability in Forest Products Manufacturing

Young, Timothy M.;Guess, Frank M.
- International Journal of Reliability and Applications / v.3 no.4 / pp.155-164 / 2002
This paper focuses on how modern data mining can be integrated with real-time relational databases and commercial data warehouses to improve reliability in real time. An important issue for many manufacturers is the development of relational databases that link key product attributes with real-time process parameters. Helpful data for key product attributes in manufacturing may be derived from destructive reliability testing. Destructive samples are taken at periodic time intervals during manufacturing, which might create a long time gap between key product attributes and real-time process data. A case study is briefly summarized for the medium density fiberboard (MDF) industry. MDF is a wood composite that is used extensively by the home building and furniture manufacturing industries around the world. The cost of unacceptable MDF was as large as 5% to 10% of total manufacturing costs. Prevention can result in millions of US dollars saved by using better information systems.

A Study on Selecting Bitmap Join Index to Speed up Complex Queries in Relational Data Warehouses (관계형 데이터 웨어하우스의 복잡한 질의의 처리 효율 향상을 위한 비트맵 조인 인덱스 선택에 관한 연구)

An, Hyoung-Geun;Koh, Jae-Jin
- The KIPS Transactions:PartD / v.19D no.1 / pp.1-14 / 2012
Because data warehouses are large, the selection of indices affects the efficiency of query processing. Indices lower the query processing cost, but they occupy large storage areas and incur index maintenance costs when the database is updated. Bitmap join indices are well suited to optimizing star join queries, which join a fact table with many dimension tables and apply selections on the dimension tables. Although bitmap join indices, with their binary representation, have a low storage cost, selecting the indexing attributes from the huge number of candidate attributes is difficult. Index selection first reduces the number of candidate attributes and then selects the indexing attributes from them. In this paper, we address the bitmap join index selection problem by reducing the number of candidate attributes with data mining techniques. Whereas existing techniques reduce the candidates using the frequencies of attributes, we consider the frequencies of attributes together with the size of the dimension tables, the size of their tuples, and the disk page size. We use frequent itemset mining to reduce the large number of candidate attributes. We then build the bitmap join indices that have the lowest cost and the smallest storage area within the storage constraints, using cost functions applied to the bitmap join indices of the candidate attributes. We compare our technique with existing techniques and analyze the results to evaluate its efficiency.
https://doi.org/10.3745/KIPSTD.2012.19D.1.001
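
For readers unfamiliar with bitmap join indices, the sketch below shows the basic idea on a toy star schema: for each value of a dimension attribute, a bitmap marks the fact rows that join to it, and a star join with selections on several dimensions reduces to a bitwise AND. It does not implement the paper's index selection method; all table contents and the `build_bitmap_join_index` helper are hypothetical, and Python integers are used as bitmaps.

```python
# Hypothetical dimension and fact tables.
dim_customer = {1: {"city": "Seoul"}, 2: {"city": "Busan"}}
dim_product = {10: {"category": "book"}, 11: {"category": "toy"}}
fact_sales = [
    {"customer_id": 1, "product_id": 10, "amount": 5},
    {"customer_id": 2, "product_id": 10, "amount": 7},
    {"customer_id": 1, "product_id": 11, "amount": 3},
    {"customer_id": 1, "product_id": 10, "amount": 4},
]


def build_bitmap_join_index(facts, dim, fk, attr):
    """Map each value of dim[attr] to a bitmap over fact row numbers.

    Bit i is set when fact row i joins (via foreign key fk) to a
    dimension row whose attribute equals that value.
    """
    index = {}
    for row_no, fact in enumerate(facts):
        value = dim[fact[fk]][attr]
        index[value] = index.get(value, 0) | (1 << row_no)
    return index


city_idx = build_bitmap_join_index(fact_sales, dim_customer, "customer_id", "city")
cat_idx = build_bitmap_join_index(fact_sales, dim_product, "product_id", "category")

# Star join: sales of books to customers in Seoul, without touching the
# dimension tables at query time.
hits = city_idx["Seoul"] & cat_idx["book"]
total = sum(f["amount"] for i, f in enumerate(fact_sales) if (hits >> i) & 1)
```

Each indexed attribute adds one bitmap per distinct value, which is why choosing which attributes to index matters under a storage constraint.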

Supporting XML Materialized Views Using Materialized Views of RDBMS (관계 DBMS의 실체뷰 기능을 이용한 XML 실체뷰 지원)

Kim, Seung-Hun;Kang, Hyun-Chul
- The Journal of Society for e-Business Studies / v.11 no.4 / pp.33-48 / 2006
Since the emergence of XML as the standard for data exchange on the Web, XML warehousing technology has been required to efficiently support Web business applications such as e-Commerce. When an RDBMS is employed as the storage for an XML warehouse, XML materialized views of the XML warehouse can be provided by leveraging the materialized views of the RDBMS. Because XML documents are mapped into relational tuples, an XML query defining an XML materialized view needs to be transformed into SQL. If relational materialized views are defined with the transformed SQL statements, the XML materialized view can be obtained just by XML-tagging the tuples of the corresponding relational materialized views. The foremost advantage of such a scheme is that the RDBMS takes care of XML materialized view consistency, except for XML tagging, whenever the source XML documents are updated. In this paper, we proposed such a scheme for providing XML materialized views, and implemented it in Java in a Windows 2000 Professional environment using a commercial RDBMS equipped with a materialized view facility. XML documents in TPC-W, a Web e-Commerce benchmark, were used in performance experiments. The experimental results showed that our proposed scheme for XML materialized views was very effective.

An Efficient Search Space Generation Technique for Optimal Materialized Views Selection in Data Warehouse Environment (데이타 웨어하우스 환경에서 최적 실체뷰 구성을 위한 효율적인 탐색공간 생성 기법)

Lee Tae-Hee;Chang Jae-young;Lee Sang-goo
- Journal of KIISE:Databases / v.31 no.6 / pp.585-595 / 2004
Query processing is a critical issue in a data warehouse environment, since queries on data warehouses often involve hundreds of complex operations over large volumes of data. Data warehouses therefore build a large number of materialized views to increase system performance. Which views to materialize is an important factor in the view maintenance cost as well as the query performance. The goal of the materialized view selection problem is to select an optimal set of views that minimizes total query response time in addition to the view maintenance cost. In this paper, we present an efficient solution for the materialized view selection problem. Although the optimal selection of materialized views is an NP-hard problem, we developed a feasible solution by utilizing the characteristics of relational operators such as join, selection, and grouping.

Web Information Extraction and Multidimensional Analysis Using XML (XML을 이용한 웹 정보 추출 및 다차원 분석)

Park, Byung-Kwon
- Journal of Korea Multimedia Society / v.11 no.5 / pp.567-578 / 2008
To analyze the huge number of web pages available on the Internet, we need to extract the information encoded in web pages. In this paper, we propose a method to extract and convert web information from web pages into XML documents for multidimensional analysis. For extracting information from web pages, we propose two languages: one for describing web information extraction rules based on the object-oriented model, and another for describing regular expressions of HTML tag patterns to search for target information. For multidimensional analysis of XML documents, we propose a method for constructing an XML warehouse and various XML cubes from it, in the same way as for relational data. Finally, we show the validity of our method by applying it to US patent web pages.

Optimized Entity Attribute Value Model: A Search Efficient Representation of High Dimensional and Sparse Data

Paul, Razan;Latiful Hoque, Abu Sayed Md.
- Interdisciplinary Bio Central / v.3 no.3 / pp.9.1-9.5 / 2011
Entity Attribute Value (EAV) is a widely used solution for representing high dimensional and sparse data, but EAV is not search efficient for knowledge extraction. In this paper, we propose a search efficient data model, Optimized Entity Attribute Value (OEAV), for the physical representation of high dimensional and sparse data as an alternative to the widely used EAV. We have implemented both the EAV and OEAV models in a data warehousing environment and performed different relational and warehouse queries on both models. The experimental results show that OEAV is dramatically more search efficient and occupies less storage space than EAV.
https://doi.org/10.4051/ibc.2011.3.3.0009
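
As background for the entry above, this minimal sketch shows the plain EAV representation that OEAV is positioned against: a sparse, wide record is stored as (entity, attribute, value) triples, with missing values simply omitted. The OEAV layout itself is not described in the abstract and is not attempted here; the sample attributes and the `to_eav` and `find_entities` helpers are hypothetical.

```python
def to_eav(entity_id, record):
    """Convert one sparse record into EAV triples, dropping missing values."""
    return [
        (entity_id, attr, value)
        for attr, value in record.items()
        if value is not None
    ]


def find_entities(triples, attr, predicate):
    """Return the sorted ids of entities whose attribute satisfies predicate.

    Every query scans all triples, which illustrates why plain EAV is
    considered inefficient for search.
    """
    return sorted({e for e, a, v in triples if a == attr and predicate(v)})


# Two hypothetical sparse records.
rows = to_eav(1, {"glucose": 5.4, "hb": None, "age": 42})
rows += to_eav(2, {"glucose": None, "hb": 13.1, "age": 37})

older_than_40 = find_entities(rows, "age", lambda v: v > 40)
```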

Multidimensional Analysis of XML Documents using XML Cubes (XML 큐브를 이용한 다차원 XML 문서 분석)

Park, Byung-Kwon
- Proceedings of the Korea Association of Information Systems Conference / 2005.05a / pp.65-78 / 2005
Nowadays, large numbers of XML documents are available on the Internet. Thus, we need to analyze them multidimensionally in the same way as relational data. In this paper, we propose a new framework for multidimensional analysis of XML documents, which we call XML-OLAP. We base XML-OLAP on XML warehouses where all fact data as well as dimension data are stored as XML documents. We build XML cubes from XML warehouses. We propose a new multidimensional expression language for XML cubes, which we call XML-MDX. XML-MDX statements target XML cubes and use XQuery expressions to designate the measure data. They specify text mining operators for aggregating the text constituting the measure data. We evaluate XML-OLAP by applying it to a U.S. patent XML warehouse. We use XML-MDX queries, which demonstrate that XML-OLAP is effective for multidimensional analysis of the U.S. patents.

