• Title/Summary/Keyword: Big data storage


Performance Comparison of Spatial Split Algorithms for Spatial Data Analysis on Spark (Spark 기반 공간 분석에서 공간 분할의 성능 비교)

  • Yang, Pyoung Woo;Yoo, Ki Hyun;Nam, Kwang Woo
    • Journal of Korean Society for Geospatial Information Science / v.25 no.1 / pp.29-36 / 2017
  • In this paper, we implement a spatial big data analysis prototype based on Spark, an in-memory system, and on this basis compare the performance of spatial split algorithms. In cluster computing environments, big data is divided into blocks of a certain size in order to balance the computing load. Existing research showed that, for Hadoop-based spatial big data systems, spatial splitting is more effective than general sequential splitting. A Hadoop-based spatial data system stores raw data as-is in spatially divided blocks. In the proposed Spark-based spatial analysis system, by contrast, spatial data is converted into an in-memory data structure and stored in spatial blocks for search efficiency. Therefore, in this paper, we propose an in-memory spatial big data prototype and a spatial split block storage method. We also compare the performance of existing spatial split algorithms in the proposed prototype and present an appropriate spatial split strategy for Spark-based big data systems. In the experiment, we compared the query execution times of the spatial split algorithms and confirmed that the BSP algorithm shows the best performance.
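To make the idea of a spatial split concrete, here is a minimal Python sketch of recursive binary space partitioning over 2-D points. It assumes that "BSP" refers to binary space partitioning with median cuts along the wider axis, and that a block is bounded by a hypothetical point count `capacity`; the paper's actual split rule, block size, and in-memory index structure are not described here and may differ.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def bsp_split(points: List[Point], capacity: int) -> List[List[Point]]:
    """Recursively split points into blocks of at most `capacity` points.

    Assumption: each cut is made along the axis with the larger extent,
    at the median, so both halves receive roughly the same number of points.
    """
    if capacity < 1:
        raise ValueError("capacity must be at least 1")
    if len(points) <= capacity:
        return [points]
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    # Choose the wider axis: 0 = x, 1 = y.
    axis = 0 if (max(xs) - min(xs)) >= (max(ys) - min(ys)) else 1
    ordered = sorted(points, key=lambda p: p[axis])
    mid = len(ordered) // 2  # median cut; both halves are non-empty
    return bsp_split(ordered[:mid], capacity) + bsp_split(ordered[mid:], capacity)
```

Median cuts keep sibling blocks roughly equal in size, which is the load-balancing property the abstract attributes to dividing data into blocks.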

An adaptive fault tolerance strategy for cloud storage

  • Xiai, Yan;Dafang, Zhang;Jinmin, Yang
    • KSII Transactions on Internet and Information Systems (TIIS) / v.10 no.11 / pp.5290-5304 / 2016
  • With the growth of massive amounts of data, the failure probability of cloud storage nodes is becoming increasingly high. A single fault tolerance strategy, such as replication or erasure codes, has some unavoidable disadvantages and cannot meet today's fault tolerance needs. Therefore, an adaptive hybrid redundant fault tolerance strategy based on file access frequency and size is proposed, which can dynamically switch between the replication scheme and the erasure code scheme throughout a file's lifecycle. The experimental results show that the proposed scheme not only saves storage space (a 32% reduction compared with replication) but also ensures fast recovery from node failures (a 42% improvement compared with erasure codes).
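The switching policy described above can be sketched as a function of access frequency and file size. The thresholds `HOT_ACCESS_THRESHOLD` and `LARGE_FILE_BYTES`, the 3-way replication factor, and the 6+3 erasure-code layout are illustrative assumptions, not values taken from the paper.

```python
from dataclasses import dataclass


@dataclass
class FileStats:
    size_bytes: int
    accesses_per_day: float


# Hypothetical tuning knobs; the paper's actual thresholds are not given here.
HOT_ACCESS_THRESHOLD = 10.0
LARGE_FILE_BYTES = 64 * 1024 * 1024


def choose_scheme(stats: FileStats) -> str:
    """Return 'replication' for hot or small files, 'erasure' for cold large files."""
    if stats.accesses_per_day >= HOT_ACCESS_THRESHOLD:
        return "replication"  # frequently read: keep full copies for fast access
    if stats.size_bytes < LARGE_FILE_BYTES:
        return "replication"  # assumption: small files stay replicated in this sketch
    return "erasure"  # cold and large: trade recovery speed for space


def storage_overhead(scheme: str, replicas: int = 3, k: int = 6, m: int = 3) -> float:
    """Raw bytes stored per logical byte for the assumed scheme parameters."""
    if scheme == "replication":
        return float(replicas)
    if scheme == "erasure":
        return (k + m) / k  # k data fragments plus m parity fragments
    raise ValueError(f"unknown scheme: {scheme}")
```

Calling `choose_scheme` periodically on fresh statistics is one way to let a file move between schemes over its lifecycle. With these assumed parameters, replication stores 3.0 raw bytes per logical byte and the 6+3 code stores 1.5.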

The Study on the Design and Optimization of Storage for the Recording of High Speed Astronomical Data (초고속 관측 데이터 수신 및 저장을 위한 기록 시스템 설계 및 성능 최적화 연구)

  • Song, Min-Gyu;Kang, Yong-Woo;Kim, Hyo-Ryoung
    • The Journal of the Korea institute of electronic communication sciences / v.12 no.1 / pp.75-84 / 2017
  • Storage that supports high-speed recording and stable access from network environments is becoming increasingly important. VLBI (Very Long Baseline Interferometry), a field of basic science that produces massive amounts of astronomical data, now demands higher data-writing performance, which is directly related to astronomical observation with high resolution and sensitivity. However, most existing storage systems are based on the cloud model, designed for the high throughput of general IT, finance, and administrative services, and are therefore not the best choice for recording big stream data. In this study, we therefore design a storage system optimized for high I/O performance and concurrency. To this end, we implement packet read and write modules using the libpcap and pf_ring APIs in a multi-core CPU environment, and build a scalable storage system based on software RAID (Redundant Array of Inexpensive Disks) to efficiently process data arriving from the external network.

Application of lipidomics in food science (식품분야에서 lipidomics 분석 기술의 활용)

  • Kim, Hyun-Jin;Jang, Gwang-Ju;Lee, Hyeon-Jeong;Kim, Bo-Min;Oh, Juhong
    • Food Science and Industry / v.50 no.1 / pp.16-25 / 2017
  • There is no doubt that the accumulation of big data using multi-omics technologies will help solve humanity's long-standing problems, such as the development of personalized diets and medicine, overcoming diseases, and longevity. In the food industry, however, omics-based big data has scarcely been accumulated. In particular, comprehensive analysis of molecular lipid metabolites directly associated with food quality, such as taste, flavor, and texture, has been very limited. Moreover, most food lipidomics studies have been applied to analyzing lipid components and discriminating the authenticity and freshness of a limited range of foods, including vegetable and fish oils. However, if lipid big data can be accumulated through lipidomics research on various foods and materials, lipidomics can be used to optimize food processing, production, delivery systems, food safety, and storage, as well as functional foods.

A Method to Manage Local Storage Capacity Using Data Locality Mechanism (데이터 지역성 메커니즘을 이용한 지역 스토리지 용량 관리 방법)

  • Kim, Baul;Ku, Mino;Min, Dugki
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference / 2013.10a / pp.324-327 / 2013
  • Recently, owing to evolving cloud computing technology, we can easily and transparently use both local and remote computing resources in daily life. In particular, advances in smart device technologies and network infrastructures are increasing the need to share files between local smart devices and cloud storage. However, since smart devices have limited storage space, sharing files with cloud storage can cause local storage starvation. This means that users can run out of storage even though the cloud storage service provides huge file storage space. In this research, we propose a method to manage files between smart devices and cloud storage. Our approach calculates file usage patterns based on the most recent date of use and then determines which local files should be migrated. As a result, our approach is sufficient for handling data synchronization between a big data storage farm and a local thin client with limited storage space.
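A recency-based migration rule like the one described can be sketched as follows. The `LocalFile` record, the 80% target fill ratio, and the oldest-first selection are hypothetical choices for illustration; the paper's actual usage-pattern calculation may weigh other factors.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List


@dataclass
class LocalFile:
    name: str
    size_bytes: int
    last_used: datetime


def files_to_migrate(files: List[LocalFile], capacity_bytes: int,
                     target_ratio: float = 0.8) -> List[str]:
    """Pick least-recently-used files to move to cloud storage.

    Files are taken oldest-first until local usage is at or below
    target_ratio * capacity_bytes (an assumed policy, not the paper's).
    """
    used = sum(f.size_bytes for f in files)
    limit = int(capacity_bytes * target_ratio)
    selected: List[str] = []
    for f in sorted(files, key=lambda f: f.last_used):  # oldest first
        if used <= limit:
            break
        selected.append(f.name)
        used -= f.size_bytes
    return selected
```

For example, with a capacity of 100 bytes, files `a` (40 bytes, last used Jan 1), `c` (25 bytes, Jan 3), and `b` (30 bytes, Jan 5) occupy 95 bytes; migrating only `a` brings usage to 55, below the 80-byte target, so the function returns `["a"]`.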


Development Status and Prospect of New Memory Devices (신 메모리 소자의 개발 현황 및 전망)

  • Jeong, Hongsik
    • Vacuum Magazine / v.1 no.3 / pp.4-8 / 2014
  • Since Von Neumann proposed the modern computer architecture in 1945, computers have become indispensable in our lives. This remarkable growth has been driven by the device miniaturization trend known as Moore's law. In particular, the explosive growth of memory devices such as DRAM and flash has played a key role in greatly expanding the use of computers. However, the abrupt increase in data used by many applications in the big data era is causing excessive energy consumption in data centers, which results from the inefficiency of the conventional memory-storage hierarchy. As a solution, the application of new memory devices has been proposed to create an innovative memory-storage hierarchy. This paper discusses the current development status and prospects of new memory devices.

Space-Efficient Compressed-Column Management for IoT Collection Servers (IoT 수집 서버를 위한 공간효율적 압축-칼럼 관리)

  • Byun, Siwoo
    • Asia-pacific Journal of Multimedia Services Convergent with Art, Humanities, and Sociology / v.9 no.1 / pp.179-187 / 2019
  • With the recent development of small computing devices, IoT sensor networks can be widely deployed and are now readily available with sensing, computation, and communication functions at low cost. Sensor data management is a major component of the Internet of Things environment. The huge volume of data produced and transmitted by sensing devices can provide a great deal of useful information and is often considered the next big data for businesses. New column-wise compression technology is being adopted in large data servers because of its superior space efficiency. Since sensor nodes have narrow bandwidth and fault-prone wireless channels, sensor-based storage systems are subject to incomplete data services. In this study, we provide a short overview and analysis of IoT sensor networks and propose a new storage management scheme for IoT data. Our management scheme is based on a RAID storage model that uses column-wise segmentation and compression to improve space efficiency without sacrificing I/O performance. Computer performance simulation shows that the proposed storage control scheme outperforms the previous RAID control.
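Column-wise segmentation and compression can be illustrated with the standard-library `zlib` and `json` modules: records are pivoted into columns and each column is compressed on its own, so similar values sit next to each other. This is a hypothetical sketch that assumes every record has the same keys and JSON-serializable values; the RAID placement of the compressed segments is not modeled.

```python
import json
import zlib
from typing import Dict, List


def compress_columns(rows: List[Dict[str, object]]) -> Dict[str, bytes]:
    """Pivot records into columns and compress each column separately."""
    columns: Dict[str, List[object]] = {}
    for row in rows:
        for key, value in row.items():
            columns.setdefault(key, []).append(value)
    # One compressed segment per column (assumption: all rows share the same keys).
    return {key: zlib.compress(json.dumps(values).encode("utf-8"))
            for key, values in columns.items()}


def decompress_columns(segments: Dict[str, bytes]) -> List[Dict[str, object]]:
    """Rebuild the original records from the compressed column segments."""
    columns = {key: json.loads(zlib.decompress(blob).decode("utf-8"))
               for key, blob in segments.items()}
    n_rows = len(next(iter(columns.values()), []))
    return [{key: columns[key][i] for key in columns} for i in range(n_rows)]
```

Sensor readings from one device tend to repeat or change slowly, so each column often compresses better than the interleaved row form; the round trip through `decompress_columns` returns the original records.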

HTSC and FH HTSC: XOR-based Codes to Reduce Access Latency in Distributed Storage Systems

  • Shuai, Qiqi;Li, Victor O.K.
    • Journal of Communications and Networks / v.17 no.6 / pp.582-591 / 2015
  • A massive distributed storage system is the foundation for big data operations. Access latency is a key metric in distributed storage systems because it greatly affects user experience, whereas existing codes mainly focus on improving other metrics such as storage overhead and repair cost. By generating parity nodes from parity nodes, this paper designs new XOR-based erasure codes, the hierarchical tree structure code (HTSC) and the high failure tolerant HTSC (FH HTSC), to reduce access latency in distributed storage systems. Through comparison with other popular and representative codes, we show that, under the same repair cost, HTSC and FH HTSC can reduce access latency while maintaining favorable performance in other metrics. In particular, under the same repair cost, FH HTSC achieves lower access latency, higher or equal failure tolerance, and lower computation cost than the representative codes while having similar storage overhead. Accordingly, FH HTSC is a superior choice for applications that require both low access latency and outstanding failure tolerance.
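The building block of XOR-based erasure codes, and the idea of generating parity from parity, can be shown with a small sketch. This is a simplified two-level illustration assumed for exposition, not the actual HTSC or FH HTSC construction: data blocks are grouped, each group gets a local XOR parity, and a top-level parity is computed from the local parities.

```python
from typing import List, Tuple


def xor_blocks(blocks: List[bytes]) -> bytes:
    """Bytewise XOR of one or more equal-length blocks."""
    if not blocks or len({len(b) for b in blocks}) != 1:
        raise ValueError("need at least one block, all of equal length")
    out = bytearray(len(blocks[0]))
    for block in blocks:
        for i, byte in enumerate(block):
            out[i] ^= byte
    return bytes(out)


def build_parities(data: List[bytes], group_size: int) -> Tuple[List[bytes], bytes]:
    """First-level parity per group, then a second-level parity from those parities."""
    groups = [data[i:i + group_size] for i in range(0, len(data), group_size)]
    local = [xor_blocks(g) for g in groups]
    top = xor_blocks(local)  # parity generated from parity nodes
    return local, top


def recover_in_group(surviving: List[bytes], local_parity: bytes) -> bytes:
    """Rebuild the single missing block of a group from its survivors and parity."""
    return xor_blocks(surviving + [local_parity])
```

Because XOR is associative, the top-level parity equals the XOR of all data blocks, so a lost local parity can also be rebuilt from the top-level parity and the remaining local parities.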

Development of Big-data Management Platform Considering Docker Based Real Time Data Connecting and Processing Environments (도커 기반의 실시간 데이터 연계 및 처리 환경을 고려한 빅데이터 관리 플랫폼 개발)

  • Kim, Dong Gil;Park, Yong-Soon;Chung, Tae-Yun
    • IEMEK Journal of Embedded Systems and Applications / v.16 no.4 / pp.153-161 / 2021
  • Handling continuous and unstructured data requires real-time access and flexible management under dynamic conditions. A platform for data collection, storage, and processing can be built on a local server or on multiple servers. The former, centralized method is easy to control but creates an overload problem because all processing is performed in one unit; the latter, distributed method performs parallel processing, so it responds quickly and can easily scale system capacity, but its design is complex. This paper follows the latter approach and provides data collection and processing on one platform to derive significant insights from the various data held by an enterprise or agency; the results are intuitively available on dashboards, and Spark is used to improve distributed processing performance. All services are deployed and managed with Docker. All data used in this study were collected from Kafka; with a file size of 4.4 gigabytes, the data processing time in Spark cluster mode was 2 minutes 15 seconds, about 3 minutes 19 seconds faster than in local mode.

A method for optimizing lifetime prediction of a storage device using the frequency of occurrence of defects in NAND flash memory (낸드 플래시 메모리의 불량 발생빈도를 이용한 저장장치의 수명 예측 최적화 방법)

  • Lee, Hyun-Seob
    • Journal of Internet of Things and Convergence / v.7 no.4 / pp.9-14 / 2021
  • In computing systems that require high reliability, predicting the lifetime of a storage device is an important factor in system management because it can maximize usability as well as data protection. The lifetime of a solid state drive (SSD), which has recently been used as the storage device in many storage systems, is tied to the lifetime of the NAND flash memory it contains. Therefore, a storage system built on SSDs requires a method for accurately and efficiently predicting the lifespan of NAND flash memory. In this paper, we propose a method for optimizing the lifetime prediction of a flash memory-based storage device using the frequency of NAND flash memory failures. For this purpose, we design a cost matrix that collects the frequency of defects occurring when data are processed in units of Drive Writes Per Day (DWPD). In addition, we propose a method that uses gradient descent to predict the remaining cost up to the point on the slope where end of life occurs. Finally, we demonstrate the effectiveness of the proposed method through simulation in which defects occur.