• Title/Summary/Keyword: Column-oriented Storage


Column-aware Polarization Scheme for High-Speed Database Systems (고속 데이터베이스 시스템을 위한 컬럼-인지 양분화 기법)

  • Byun, Si-Woo
    • Journal of Internet Computing and Services
    • /
    • v.13 no.3
    • /
    • pp.83-91
    • /
    • 2012
  • Recently, column-oriented storage has become a progressive model for high-speed database systems because of its superior I/O performance. In this paper, we analyze the traditional row-oriented storage model and then propose a new column-aware storage management model that uses a flash memory drive and an assist drive to improve the effective performance of high-speed column-oriented database systems. Our storage management scheme, called column-aware polarization, improves the performance of update operations by dividing and compressing table columns into active columns or inactive columns, and by offloading congested update operations to an assist drive during high-workload periods. The results of our experimental tests show that the scheme improves the update throughput of column-oriented storage by 19 percent and the response time by up to 49 percent.
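
The abstract gives enough of the polarization idea for a rough illustration. The Python sketch below is a hypothetical reading, not the paper's implementation: columns are split by observed update frequency into active (kept raw) and inactive (compressed) sets, and an assist-drive queue absorbs congested updates for a later flush. All names (Column, polarize, AssistDrive, UPDATE_THRESHOLD) are invented for the demo.

```python
import zlib

UPDATE_THRESHOLD = 100          # updates/period separating active from inactive (assumed)

class Column:
    def __init__(self, name, values):
        self.name = name
        self.values = values
        self.update_count = 0   # observed update frequency
        self.compressed = None  # set when the column is polarized as inactive

def polarize(columns):
    """Split columns into active (hot, kept raw) and inactive (cold, compressed)."""
    active, inactive = [], []
    for col in columns:
        if col.update_count >= UPDATE_THRESHOLD:
            active.append(col)          # stays uncompressed for cheap updates
        else:
            payload = ",".join(map(str, col.values)).encode()
            col.compressed = zlib.compress(payload)
            inactive.append(col)        # compressed; updates are rare here
    return active, inactive

class AssistDrive:
    """Overflow queue: absorbs congested updates during high-workload periods."""
    def __init__(self):
        self.pending = []
    def defer(self, column_name, row_id, value):
        self.pending.append((column_name, row_id, value))
    def flush(self, columns):
        by_name = {c.name: c for c in columns}
        for name, row_id, value in self.pending:
            by_name[name].values[row_id] = value
        self.pending.clear()

cols = [Column("price", [10, 20, 30]), Column("note", ["a", "b", "c"])]
cols[0].update_count = 500
active, inactive = polarize(cols)
assert [c.name for c in active] == ["price"]
```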

Cross Compressed Replication Scheme for Large-Volume Column Storages (대용량 컬럼 저장소를 위한 교차 압축 이중화 기법)

  • Byun, Siwoo
    • Journal of the Korea Academia-Industrial cooperation Society
    • /
    • v.14 no.5
    • /
    • pp.2449-2456
    • /
    • 2013
  • The column-oriented database storage is a very advanced model for large-volume data analysis systems because of its superior I/O performance. Traditional data storages exploit row-oriented storage, where the attributes of a record are placed contiguously on hard disk for fast write operations. However, for search-mostly data warehouse systems, column-oriented storage has become the more appropriate model because of its superior read performance. Recently, solid state drives using MLC flash memory have been widely recognized as the preferred storage media for high-speed data analysis systems. In this paper, we introduce a fast column-oriented data storage model and then propose a new storage management scheme that uses cross compressed replication for high-speed column-oriented data warehouse systems. Our storage management scheme, which is based on two MLC SSDs, achieves superior performance and reliability by cross-replicating the uncompressed segment and the compressed segment under high CPU and I/O workloads. Based on the results of the performance evaluation, we conclude that our storage management scheme outperforms the traditional scheme with respect to update throughput and response time of the column segments.
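
As a rough illustration of the cross compressed replication described above, the sketch below keeps each segment uncompressed on one of two simulated drives and compressed on the other, alternating by segment ID so neither drive carries all the compression work. CrossReplicatedStore and its methods are hypothetical names; this is a reading of the abstract, not the paper's code.

```python
import zlib

class CrossReplicatedStore:
    def __init__(self):
        self.drives = ({}, {})                       # stand-ins for two MLC SSDs

    def write_segment(self, seg_id, data: bytes):
        raw_drive = self.drives[seg_id % 2]          # alternate which drive gets the raw copy
        zip_drive = self.drives[(seg_id + 1) % 2]    # the mirror copy is compressed
        raw_drive[seg_id] = data
        zip_drive[seg_id] = zlib.compress(data)

    def read_segment(self, seg_id, prefer_raw=True):
        raw_drive = self.drives[seg_id % 2]
        if prefer_raw and seg_id in raw_drive:
            return raw_drive[seg_id]                 # fast path: no decompression cost
        return zlib.decompress(self.drives[(seg_id + 1) % 2][seg_id])

store = CrossReplicatedStore()
store.write_segment(0, b"col-a-values")
store.write_segment(1, b"col-b-values")
assert store.read_segment(0) == b"col-a-values"
```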

A Column-Aware Index Management Using Flash Memory for Read-Intensive Databases

  • Byun, Si-Woo;Jang, Seok-Woo
    • Journal of Information Processing Systems
    • /
    • v.11 no.3
    • /
    • pp.389-405
    • /
    • 2015
  • Most traditional database systems exploit a record-oriented model, where the attributes of a record are placed contiguously on a hard disk to achieve high-performance writes. However, for read-mostly data warehouse systems, the column-oriented database has become the more appropriate model because of its superior read performance. Today, flash memory is widely recognized as the preferred storage medium for high-speed database systems. In this paper, we introduce a column-oriented database model based on flash memory and then propose a new column-aware flash indexing scheme for high-speed column-oriented data warehouse systems. Our index management scheme, which uses an enhanced B+-tree, achieves superior search performance by indexing an embedded segment and packing the unused space in internal and leaf nodes. Based on the performance results of two test databases, we conclude that column-aware flash index management outperforms the traditional scheme with respect to mixed-operation throughput and response time.
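
The abstract's "indexing an embedded segment and packing the unused space" suggests leaf nodes that carry a column segment and squeeze out their slack before being written to flash. Below is a minimal sketch of that packing step, with invented names (pack_leaf, unpack_leaf, NODE_SIZE) and a generic zlib codec standing in for whatever the paper actually uses.

```python
import struct
import zlib

NODE_SIZE = 4096  # bytes per flash-resident node (assumed)

def pack_leaf(keys, segment: bytes) -> bytes:
    """Serialize keys plus an embedded segment, compress, and pad to the node size."""
    header = ",".join(map(str, keys)).encode() + b"|"
    body = zlib.compress(header + segment)          # squeezes out the unused space
    assert 4 + len(body) <= NODE_SIZE, "segment too large for one node"
    # 4-byte length prefix so unpacking ignores the zero padding
    return struct.pack("<I", len(body)) + body + b"\x00" * (NODE_SIZE - 4 - len(body))

def unpack_leaf(node: bytes):
    (n,) = struct.unpack("<I", node[:4])
    raw = zlib.decompress(node[4:4 + n])
    header, _, segment = raw.partition(b"|")
    keys = [int(k) for k in header.decode().split(",")]
    return keys, segment

node = pack_leaf([10, 20, 30], b"embedded column segment bytes")
assert unpack_leaf(node) == ([10, 20, 30], b"embedded column segment bytes")
```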

Search Performance Improvement of Column-oriented Flash Storages using Segmented Compression Index (분할된 압축 인덱스를 이용한 컬럼-지향 플래시 스토리지의 검색 성능 개선)

  • Byun, Siwoo
    • Journal of the Korea Academia-Industrial cooperation Society
    • /
    • v.14 no.1
    • /
    • pp.393-401
    • /
    • 2013
  • Most traditional databases exploit a record-oriented storage model, where the attributes of a record are placed contiguously on hard disk to achieve high-performance writes. However, for search-mostly data warehouse systems, column-oriented storage has become the more appropriate model because of its superior read performance. Today, flash memory is widely recognized as the preferred storage medium for high-speed database systems. In this paper, we introduce a fast column-oriented database model and then propose a new column-aware index management scheme for high-speed column-oriented data warehouse systems. Our index management scheme, which is based on an enhanced B+-tree, achieves high search performance through an embedded flash index and compression of the unused space in internal and leaf nodes. Based on the results of the performance evaluation, we conclude that our index management scheme outperforms the traditional scheme with respect to search throughput and response time.
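
The abstract does not spell out the on-disk layout of the segmented compression index; one plausible reading, sketched below, is zone-map style: the column is cut into segments, each segment is compressed, and per-segment min/max keys let a search skip whole segments without decompressing them. All identifiers here are hypothetical.

```python
import zlib

SEG_ROWS = 4  # tiny segment size, just for the demo

def build_segmented_index(values):
    segments = []
    for i in range(0, len(values), SEG_ROWS):
        chunk = values[i:i + SEG_ROWS]
        blob = zlib.compress(",".join(map(str, chunk)).encode())
        segments.append({"min": min(chunk), "max": max(chunk), "blob": blob})
    return segments

def search(segments, key):
    """Decompress only the segments whose [min, max] range can contain key."""
    hits = []
    for base, seg in enumerate(segments):
        if seg["min"] <= key <= seg["max"]:
            chunk = [int(v) for v in zlib.decompress(seg["blob"]).decode().split(",")]
            hits += [base * SEG_ROWS + i for i, v in enumerate(chunk) if v == key]
    return hits

idx = build_segmented_index([3, 1, 4, 1, 5, 9, 2, 6, 5, 3])
assert search(idx, 9) == [5]   # only the middle segment is decompressed
```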

Shadow Recovery for Column-based Databases (컬럼-기반 데이터베이스를 위한 그림자 복구)

  • Byun, Si-Woo
    • Journal of the Korea Academia-Industrial cooperation Society
    • /
    • v.16 no.4
    • /
    • pp.2784-2790
    • /
    • 2015
  • The column-oriented database storage is a very advanced model for large-volume data transactions because of its superior I/O performance. Traditional data storages exploit row-oriented storage, where the attributes of a record are placed contiguously on hard disk for fast write operations. However, for search-mostly data warehouse systems, column-oriented storage has become the more appropriate model because of its superior read performance. Recently, solid state drives using flash memory have been widely recognized as the preferred storage media for high-speed data analysis systems. In this research, we propose a new transaction recovery scheme for a column-oriented database environment based on a flash media file system. We improve traditional shadow paging schemes by reusing old data pages that would otherwise be invalidated in the course of writing a new data page in the flash file system environment. To reuse these data pages, we exploit a reused-shadow list structure in our column-oriented shadow recovery (CoSR) scheme. CoSR minimizes the additional storage overhead for keeping shadow pages and minimizes the I/O performance degradation caused by the column data compression of traditional recovery schemes. Based on the results of the performance evaluation, we conclude that CoSR outperforms the traditional schemes by 17%.
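
Below is a minimal sketch of the reuse idea as described, under the assumption that "reusing old data pages" means keeping the stale copy of an out-of-place-written page as the shadow for rollback instead of invalidating it. ShadowStore and its methods are invented for illustration; the paper's actual structures may differ.

```python
class ShadowStore:
    def __init__(self):
        self.pages = {}        # page_id -> current (possibly uncommitted) data
        self.shadows = {}      # page_id -> old page kept as the shadow copy

    def write(self, page_id, data):
        # Flash writes are out-of-place anyway; instead of invalidating the
        # stale page, record it (once per transaction) as the shadow.
        if page_id in self.pages and page_id not in self.shadows:
            self.shadows[page_id] = self.pages[page_id]
        self.pages[page_id] = data

    def commit(self):
        self.shadows.clear()   # shadows are no longer needed after commit

    def abort(self):
        self.pages.update(self.shadows)  # roll back to the shadow versions
        self.shadows.clear()

s = ShadowStore()
s.write("seg1", b"v1"); s.commit()
s.write("seg1", b"v2"); s.abort()
assert s.pages["seg1"] == b"v1"
```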

Column-aware Transaction Management Scheme for Column-Oriented Databases (컬럼-지향 데이터베이스를 위한 컬럼-인지 트랜잭션 관리 기법)

  • Byun, Si-Woo
    • Journal of Internet Computing and Services
    • /
    • v.15 no.4
    • /
    • pp.125-133
    • /
    • 2014
  • The column-oriented database storage is a very advanced model for large-volume data analysis systems because of its superior I/O performance. Traditional data storages exploit row-oriented storage, where the attributes of a record are placed contiguously on hard disk for fast write operations. However, for search-mostly data warehouse systems, column-oriented storage has become the more appropriate model because of its superior read performance. Recently, solid state drives using MLC flash memory have been widely recognized as the preferred storage media for high-speed data analysis systems. Non-volatility, low power consumption, and fast read access times are sufficient grounds to adopt flash memory as a major storage component of modern database servers. However, the traditional transaction management scheme needs improvement because column compression and flash operations are relatively slow compared to RAM. In this research, we propose a new scheme called Column-aware Multi-Version Locking (CaMVL) for efficient transaction processing. CaMVL improves transaction performance by using compression locks and multi-version reads to efficiently handle slow flash write/erase operations in the lock management process. We also propose a simulation model to show the performance of CaMVL. Based on the results of the performance evaluation, we conclude that the CaMVL scheme outperforms the traditional scheme.
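
A toy sketch of the behavior the abstract describes: the writer takes an exclusive lock covering the slow compression-and-flash path, while readers keep serving the last committed version without blocking. VersionedColumn is a hypothetical stand-in; the real CaMVL manages locks and versions at much finer grain than this.

```python
import threading
import zlib

class VersionedColumn:
    def __init__(self, data):
        self.committed = data               # version visible to all readers
        self.write_lock = threading.Lock()  # stand-in for the compression lock

    def read(self):
        return self.committed               # never blocks behind the writer

    def update(self, new_data, slow_compress):
        with self.write_lock:               # covers compression + flash write
            compressed = slow_compress(new_data)
            self.committed = compressed     # atomic swap publishes the new version

col = VersionedColumn(zlib.compress(b"old"))
col.update(b"new", zlib.compress)           # readers saw b"old" until this returned
assert zlib.decompress(col.read()) == b"new"
```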

A Study of Column-oriented Storage Method on Harddisks and Flash SSDs (하드디스크와 플래시SSD상에서 열-지향 저장 모델 고찰)

  • Park, Ji-Young;Kang, Woon-Hak;Lee, Sang-Won
    • Proceedings of the Korea Information Processing Society Conference
    • /
    • 2012.04a
    • /
    • pp.1121-1124
    • /
    • 2012
  • Unlike many commercial database systems, the column-oriented database system C-Store stores data by column rather than by row, and it has shown high performance in read-I/O-heavy environments such as data warehouses by reducing the volume of data transferred. This paper examines the differences between C-Store, a representative column-oriented DBMS, and conventional DBMSs that use a row-oriented storage structure, and analyzes the advantages and disadvantages that can arise when C-Store runs on a hard disk versus on a flash SSD (Solid State Disk), which is attracting attention as a next-generation storage device.
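
For readers unfamiliar with the row/column distinction the paper builds on, here is a tiny self-contained illustration: scanning one attribute in a row layout drags every other attribute along, while a column layout reads only the array it needs, which is the core of C-Store's read advantage.

```python
# Toy data: 1000 records with three attributes.
rows = [(i, f"name{i}", i * 1.5) for i in range(1000)]   # row-oriented layout

columns = {                                              # column-oriented layout
    "id":    [r[0] for r in rows],
    "name":  [r[1] for r in rows],
    "price": [r[2] for r in rows],
}

# Row store: the scan drags 'id' and 'name' along with 'price'.
total_row = sum(r[2] for r in rows)
# Column store: only the contiguous 'price' array is touched.
total_col = sum(columns["price"])
assert total_row == total_col
```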

Comparison of Storage Structures for RDF Data in Semantic Web (시맨틱 웹에서 RDF 데이터 저장구조들의 성능비교)

  • Kim, KyungHo;Back, WooHyoun;Son, JiEun;Kim, KyungChang
    • Proceedings of the Korea Information Processing Society Conference
    • /
    • 2013.05a
    • /
    • pp.881-884
    • /
    • 2013
  • RDF (Resource Description Framework) is the foundation of the Semantic Web and a standard that lets web users access information more accurately and efficiently. The need to store and access RDF data efficiently grows by the day. The basic storage structure for storing and retrieving RDF data is the relational database. Recently, as the volume of RDF data has grown enormously, the column-oriented database, which is optimized for the simple lookup queries of large databases, has been proposed as an alternative. In this paper, we compare and analyze relational databases and column-oriented databases as storage structures for RDF data. A performance analysis using the Berlin SPARQL Benchmark demonstrates the efficiency of the column-oriented database as a storage structure for RDF data.
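
A toy contrast between the two RDF storage structures the paper compares: a relational-style triple table scanned row by row versus subject/predicate/object kept as parallel columns, where a predicate filter touches only the predicate column first. This is purely illustrative; the benchmarked systems are far more sophisticated.

```python
triples = [
    ("alice", "knows", "bob"),
    ("alice", "age",   "30"),
    ("bob",   "knows", "carol"),
]

# Relational-style triple table: each query scans whole rows.
knows_rel = [(s, o) for (s, p, o) in triples if p == "knows"]

# Column-store view: three parallel arrays; the predicate filter scans only
# the P column before touching S and O at the matching positions.
S, P, O = (list(c) for c in zip(*triples))
match = [i for i, p in enumerate(P) if p == "knows"]
knows_col = [(S[i], O[i]) for i in match]

assert knows_rel == knows_col == [("alice", "bob"), ("bob", "carol")]
```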

A Low Cost IBM PC/AT Based Image Processing System for Satellite Image Analysis: A New Analytical Tool for the Resource Managers

  • Yang, Young-Kyu;Cho, Seong-Ik;Lee, Hyun-Woo;Miller, Lee-D.
    • Korean Journal of Remote Sensing
    • /
    • v.4 no.1
    • /
    • pp.31-40
    • /
    • 1988
  • Low-cost microcomputer systems can be assembled that possess computing power, color display, memory, and storage capacity approximately equal to graphic workstations. A low-cost, flexible, and user-friendly IBM PC/XT/AT-based image processing system has been developed and named KMIPS (KAIST (Korea Advanced Institute of Science & Technology) Map and Image Processing Station). It can be easily utilized by resource managers who are not computer specialists. The system can: * directly access Landsat MSS and TM, SPOT, NOAA AVHRR, MOS-1 satellite imagery and other imagery from different sources via a magnetic tape drive connected to the IBM PC; * extract images up to 1024 lines by 1024 columns and display them at up to 480 lines by 672 columns with 512 colors simultaneously available; * digitize photographs using a frame grabber subsystem (512 by 512 picture elements); * perform a variety of image analyses, GIS and terrain analyses, and display functions; and * generate maps and hard copies at various scales. All raster data input to the microcomputer system is geographically referenced to the topographic map series at any raster cell size selected by the user. This map-oriented, georeferenced approach enables the user to create very accurately registered (±1 picture element), multivariable, multitemporal data sets that can subsequently be subjected to various analyses and display functions.

Design and Implementation of MongoDB-based Unstructured Log Processing System over Cloud Computing Environment (클라우드 환경에서 MongoDB 기반의 비정형 로그 처리 시스템 설계 및 구현)

  • Kim, Myoungjin;Han, Seungho;Cui, Yun;Lee, Hanku
    • Journal of Internet Computing and Services
    • /
    • v.14 no.6
    • /
    • pp.71-84
    • /
    • 2013
  • Log data, which record the wealth of information created when operating computer systems, are utilized in many processes, from computer system inspection and process optimization to customized user optimization. In this paper, we propose a MongoDB-based unstructured log processing system in a cloud environment for processing the massive amounts of log data produced by banks. Most of the log data generated during banking operations come from handling clients' business; therefore, a separate system is needed to gather, store, categorize, and analyze them. However, existing computing environments make it difficult to realize flexible storage expansion for massive amounts of unstructured log data and to execute the many functions needed to categorize and analyze the stored data. Thus, in this study, we use cloud computing technology to build a cloud-based log data processing system for unstructured log data that are difficult to process with the existing infrastructure's analysis tools and management systems. The proposed system uses an IaaS (Infrastructure as a Service) cloud environment to provide flexible expansion of computing resources such as storage space and memory under conditions such as extended storage or a rapid increase in log data. Moreover, to overcome the processing limits of existing analysis tools when real-time analysis of the aggregated unstructured log data is required, the proposed system includes a Hadoop-based analysis module for fast and reliable parallel-distributed processing of massive log data. Furthermore, because HDFS (Hadoop Distributed File System) stores data by replicating the block units of the aggregated log data, the proposed system offers automatic restore functions that keep the system operating after recovery from a malfunction. Finally, by establishing a distributed database using the NoSQL-based MongoDB, the proposed system provides methods for effectively processing unstructured log data. Relational databases such as MySQL have rigid schemas that are inappropriate for unstructured log data, and such strict schemas cannot expand across nodes when rapidly growing data must be distributed. NoSQL does not provide the complex computations that relational databases do, but it can easily scale out through node dispersion when data grows rapidly; it is a non-relational database with a structure suited to unstructured data. NoSQL data models are usually classified as key-value, column-oriented, or document-oriented. Of these, MongoDB, the representative document-oriented model with a free schema structure, is used in the proposed system because its flexible schema makes unstructured log data easy to process, it facilitates node expansion when data grows rapidly, and it provides an auto-sharding function that automatically expands storage.
The proposed system is composed of a log collector module, a log graph generator module, a MongoDB module, a Hadoop-based analysis module, and a MySQL module. When the log data generated over each bank's entire client business process are sent to the cloud server, the log collector module collects and classifies the data according to log type and distributes them to the MongoDB module and the MySQL module. The log graph generator module generates the results of the log analysis performed by the MongoDB module, the Hadoop-based analysis module, and the MySQL module, per analysis time and type of the aggregated log data, and provides them to the user through a web interface. Log data that require real-time analysis are stored in the MySQL module and served in real time by the log graph generator module. The log data aggregated per unit time are stored in the MongoDB module, plotted in graphs according to the user's analysis conditions, and parallel-distributed and processed by the Hadoop-based analysis module. A comparative evaluation against a log data processing system that uses only MySQL, covering log insertion and query performance, demonstrates the proposed system's superiority. Moreover, an optimal chunk size is confirmed through a log data insert performance evaluation of MongoDB over various chunk sizes.
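
Since the proposed system leans on MongoDB's schema-free documents for unstructured logs, a minimal sketch of that insert path using the pymongo driver may help. The connection string, database, collection, and field names below are assumptions for the demo rather than the paper's configuration, and a MongoDB instance is assumed to be running locally.

```python
from datetime import datetime, timezone
from pymongo import MongoClient, ASCENDING

client = MongoClient("mongodb://localhost:27017")  # assumed local instance
logs = client["bankdb"]["client_logs"]             # hypothetical db/collection names

# Schema-free documents: the two log records need not share one structure.
logs.insert_one({"ts": datetime.now(timezone.utc), "type": "transfer",
                 "branch": "seoul-01", "amount": 150000})
logs.insert_one({"ts": datetime.now(timezone.utc), "type": "login",
                 "client_id": "c-4821", "channel": "mobile"})

# Index the fields a graph-generator-style module would aggregate on.
logs.create_index([("type", ASCENDING), ("ts", ASCENDING)])

for doc in logs.find({"type": "transfer"}):
    print(doc["branch"], doc["amount"])
```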