• Title/Summary/Keyword: Big data storage

A Study on the Development of Phased Big Data Distribution Model Based on Big Data Distribution Ecology (빅데이터 유통 생태계에 기반한 단계별 빅데이터 유통 모델 개발에 관한 연구)

  • Kim, Shinkon;Lee, Sukjun;Kim, Jeonggon
    • Journal of Digital Convergence / v.14 no.5 / pp.95-106 / 2016
  • This research develops a phased big data distribution model based on the big data ecosystem. The model consists of three phases. In phase 1, data intermediaries participate in the model and transaction functions are provided; the system at this stage consists of a general control system, a registration system, and a transaction management system. In phase 2, trading support systems with data storage, analysis, supply, and customer relationship management functions are designed. In phase 3, transaction support systems and a linked big data distribution portal system are developed. New data distribution models and systems, built on new technologies and data science processes, are emerging and replacing earlier data management systems. The proposed model may serve as a reference for establishing future industrial standards for big data distribution and transaction models.
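The phase 1 registration and transaction management functions can be pictured with a very small data intermediary. This is a minimal sketch under assumptions, not the authors' system: the `DataBroker` class, its `register`/`purchase` methods, and the flat per-dataset price are hypothetical, and the later phases (storage, analysis, CRM, portal) are not modeled.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class DataBroker:
    """Hypothetical phase-1 data intermediary: a registry plus a transaction ledger."""

    # dataset_id -> (provider, price); a flat price per dataset is an assumption
    catalog: Dict[str, Tuple[str, float]] = field(default_factory=dict)
    # Each completed trade: (dataset_id, provider, buyer, price)
    ledger: List[Tuple[str, str, str, float]] = field(default_factory=list)

    def register(self, dataset_id: str, provider: str, price: float) -> None:
        """Registration: a provider lists a dataset for sale."""
        if dataset_id in self.catalog:
            raise ValueError(f"dataset already registered: {dataset_id}")
        if price < 0:
            raise ValueError("price must be non-negative")
        self.catalog[dataset_id] = (provider, price)

    def purchase(self, dataset_id: str, buyer: str) -> float:
        """Transaction management: record a sale and return the price charged."""
        if dataset_id not in self.catalog:
            raise KeyError(f"unknown dataset: {dataset_id}")
        provider, price = self.catalog[dataset_id]
        self.ledger.append((dataset_id, provider, buyer, price))
        return price


broker = DataBroker()
broker.register("traffic-2016", "city-a", 100.0)
print(broker.purchase("traffic-2016", "firm-b"))  # 100.0
```

Keeping the ledger append-only makes every trade auditable, which is the kind of record a transaction management system would have to retain.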

Performance analysis and prediction through various over-provision on NAND flash memory based storage (낸드 플래시 메모리기반 저장 장치에서 다양한 초과 제공을 통한 성능 분석 및 예측)

  • Lee, Hyun-Seob
    • Journal of Digital Convergence / v.20 no.3 / pp.343-348 / 2022
  • With the rapid development of technology, the amount of data generated by various systems is increasing, and enterprise servers and data centers that handle large volumes of big data need highly stable, high-performance storage devices even at higher cost. Such systems often use SSDs (solid-state drives), which offer high read/write performance. However, because NAND flash memory reads and writes in units of pages, erases in units of blocks, and must erase before it can rewrite, performance degrades when data is overwritten. To delay this degradation, SSDs internally apply over-provisioning. Over-provisioning, however, trades storage space for performance, so provisioning beyond the performance level actually required wastes capacity and raises costs. In this paper, we propose a method that measures the performance and cost of an SSD under various over-provisioning ratios and, based on these measurements, predicts the over-provisioning ratio best suited to a given system. Through this research, we expect to identify the cost trade-off needed to meet the performance requirements of systems that process big data.
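The capacity and cost side of this trade-off is simple arithmetic. A minimal sketch, assuming the common definition OP = (physical capacity - user capacity) / user capacity and a simple selection rule (take the smallest measured OP ratio that meets a throughput target); the paper's own prediction method is not reproduced, and the measurement values below are placeholders, not results from the study.

```python
from typing import List, Optional, Tuple


def op_ratio(physical_gb: float, user_gb: float) -> float:
    """Over-provisioning ratio: hidden spare capacity relative to user-visible capacity."""
    if not 0 < user_gb <= physical_gb:
        raise ValueError("need 0 < user_gb <= physical_gb")
    return (physical_gb - user_gb) / user_gb


def cost_per_usable_gb(price_per_raw_gb: float, op: float) -> float:
    """Each usable GB also pays for `op` GB of spare flash the host never sees."""
    return price_per_raw_gb * (1.0 + op)


def smallest_sufficient_op(
    measurements: List[Tuple[float, float]], required_iops: float
) -> Optional[float]:
    """Return the cheapest OP ratio whose measured IOPS meets the requirement, or None."""
    feasible = [op for op, iops in measurements if iops >= required_iops]
    return min(feasible) if feasible else None


# Placeholder measurements: (op_ratio, sustained random-write IOPS)
measured = [(0.07, 18_000.0), (0.28, 31_000.0), (0.50, 34_000.0)]
best = smallest_sufficient_op(measured, required_iops=30_000.0)
print(best)  # 0.28
print(op_ratio(1024, 800))
print(cost_per_usable_gb(0.10, best))
```

Choosing the minimum feasible ratio, rather than the best-performing one, is what avoids paying for performance the system does not need.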

A Safety IO Throttling Method Inducting Differential End of Life to Improving the Reliability of Big Data Maintenance in the SSD based RAID (SSD기반 RAID 시스템에서 빅데이터 유지 보수의 신뢰성을 향상시키기 위한 차등 수명 마감을 유도하는 안전한 IO 조절 기법)

  • Lee, Hyun-Seob
    • Journal of Digital Convergence / v.20 no.5 / pp.593-598 / 2022
  • Data production has grown explosively, and storage systems that store big data safely and quickly are evolving in various ways. A typical configuration groups SSDs, which process data quickly, into a RAID group that keeps data reliable. However, the NAND flash memory inside an SSD deteriorates once it has been written more than a certain number of times, which increases the likelihood that multiple SSDs in the same RAID group fail simultaneously. Such simultaneous failures can cause serious reliability problems in which data cannot be recovered. To solve this problem, we propose a method of throttling I/O so that each SSD within a RAID group reaches the end of its life at a different time. The proposed technique uses SMART to monitor the state of each SSD and adjusts the number of I/Os allocated to it step by step, according to the data pattern in use. Because it induces staggered end-of-life points across the SSDs, this method helps prevent many concurrent failures within a RAID group.
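One way to picture staggered end-of-life is to rank drives by wear and throttle any drive that is too close to the drive ranked just ahead of it. This is an illustration under assumptions, not the paper's algorithm: wear percentages are assumed to come from a SMART wear attribute (for example, the Percentage Used field of the NVMe SMART/Health log), wear is assumed to grow with the writes a drive receives, and `target_gap` and `throttle` are hypothetical tuning parameters.

```python
from typing import List


def io_shares(wear: List[float], target_gap: float = 10.0, throttle: float = 0.5) -> List[float]:
    """Return the fraction of write I/O to send to each SSD in a RAID group.

    wear: per-SSD wear level in percent (e.g. read from a SMART wear indicator).
    Drives are ranked from most to least worn. A drive closer than `target_gap`
    points to the drive ranked just ahead of it is throttled, so wear levels
    drift apart and the SSDs reach end of life at different times.
    """
    if not wear:
        return []
    order = sorted(range(len(wear)), key=lambda i: wear[i], reverse=True)
    weights = [0.0] * len(wear)
    weights[order[0]] = 1.0  # most-worn drive keeps the full share and retires first
    for rank in range(1, len(order)):
        ahead, cur = order[rank - 1], order[rank]
        factor = 1.0 if wear[ahead] - wear[cur] >= target_gap else throttle
        # A less-worn drive never gets more than the drive ahead of it,
        # so existing wear gaps are not closed
        weights[cur] = weights[ahead] * factor
    total = sum(weights)
    return [w / total for w in weights]


print(io_shares([40.0, 38.0, 20.0, 19.0]))
```

In a real RAID layer these shares would still have to be reconciled with striping and parity placement; the sketch only shows the ranking and throttling decision.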

Performance Optimization of Big Data Center Processing System - Big Data Analysis Algorithm Based on Location Awareness

  • Zhao, Wen-Xuan;Min, Byung-Won
    • International Journal of Contents / v.17 no.3 / pp.74-83 / 2021
  • This study proposes a location-aware algorithm to improve the performance of distributed systems for big data processing, which suffer from low data reliability and poor application performance. Compared with previous algorithms, the location-aware data block placement algorithm combines a data block placement strategy with a node data recovery strategy to improve both application performance and data reliability. Simulations and tests on a real cluster showed that the proposed algorithm greatly improves data reliability and shortens the I/O processing time of applications in real time.
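The abstract does not spell out the location-aware algorithm itself. As a point of reference, the sketch below implements a generic rack-aware replica placement in the spirit of HDFS's default policy (first replica on the writer's node, second on a different rack, third on another node of the second replica's rack). The rack map and node names are hypothetical, and candidates are taken in sorted order instead of randomly so the result is deterministic.

```python
from typing import Dict, Iterable, List


def place_replicas(writer: str, racks: Dict[str, str], replicas: int = 3) -> List[str]:
    """Choose nodes for a block's replicas using rack (location) information.

    racks maps node name -> rack id.
    """
    chosen = [writer]  # 1st replica: the writer's own node, no network hop

    def pick(candidates: Iterable[str]) -> bool:
        # Take the first unused candidate in sorted order (deterministic for the demo)
        for node in sorted(candidates):
            if node not in chosen:
                chosen.append(node)
                return True
        return False

    # 2nd replica: a different rack, so one rack failure cannot lose every copy
    if len(chosen) < replicas:
        pick(n for n, r in racks.items() if r != racks[writer])
    # 3rd replica: same rack as the 2nd, keeping that copy's transfer intra-rack
    if 1 < len(chosen) < replicas:
        pick(n for n, r in racks.items() if r == racks[chosen[1]])
    # Remaining replicas (or a single-rack cluster): any unused node
    while len(chosen) < replicas and pick(racks):
        pass
    return chosen


cluster = {"n1": "r1", "n2": "r1", "n3": "r2", "n4": "r2", "n5": "r3"}
print(place_replicas("n1", cluster))  # ['n1', 'n3', 'n4']
```

When a node fails, the same rack map can guide recovery: a lost replica is best re-created on a rack that does not already hold a copy.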

RDP: A storage-tier-aware Robust Data Placement strategy for Hadoop in a Cloud-based Heterogeneous Environment

  • Muhammad Faseeh Qureshi, Nawab;Shin, Dong Ryeol
    • KSII Transactions on Internet and Information Systems (TIIS) / v.10 no.9 / pp.4063-4086 / 2016
  • Cloud computing is a robust technology that helps resolve many parallel and distributed computing issues in the modern big data environment. Hadoop is an ecosystem that processes large datasets in a distributed computing environment, and HDFS is the Hadoop file system, which distributes data blocks to the cluster nodes. Data block placement has become a bottleneck for overall performance in a Hadoop cluster. The current placement policy assumes that all DataNodes have equal computing capacity to process data blocks, meaning the same storage media and the same processing performance. As a result, Hadoop cluster performance is affected by unbalanced workloads, inefficient use of storage tiers, network traffic congestion, and HDFS integrity issues. This paper proposes a storage-tier-aware Robust Data Placement (RDP) scheme that systematically resolves unbalanced workloads, reduces network congestion to an optimal state, makes effective use of storage tiers, and minimizes HDFS integrity issues. The experimental results show that the proposed approach reduced the unbalanced workload issue to 72%. Moreover, it resolved the storage-tier compatibility problem to an extent of 81% by predicting storage for block jobs, and improved overall data block placement by 78% through pre-calculated computing capacity allocations and the execution of map files over the respective NameNode and DataNodes.
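A storage-tier-aware placement decision can be illustrated with a small scoring function. This is a hypothetical sketch, not the RDP scheme itself: the `DataNode` fields, the tier labels (loosely modeled on HDFS storage types such as SSD and DISK), and the score (pre-calculated computing capacity discounted by queued work) are assumptions chosen only to show how tier, free space, and node capacity can be weighed together instead of treating all DataNodes as equal.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class DataNode:
    name: str
    tier: str              # storage tier label, e.g. "SSD" or "DISK" (assumed)
    free_gb: float         # free space on that tier
    capacity_score: float  # pre-calculated computing capacity; higher is better
    queued_tasks: int      # work already assigned to the node


def choose_node(nodes: List[DataNode], required_tier: str, block_gb: float) -> DataNode:
    """Prefer nodes on the requested tier; fall back to any tier with room."""
    fits = [n for n in nodes if n.free_gb >= block_gb]
    candidates = [n for n in fits if n.tier == required_tier] or fits
    if not candidates:
        raise RuntimeError("no DataNode has room for the block")
    # Balance workload: discount each node's capacity by the work queued on it
    return max(candidates, key=lambda n: n.capacity_score / (1 + n.queued_tasks))


nodes = [
    DataNode("dn1", "SSD", 100.0, 8.0, 3),
    DataNode("dn2", "SSD", 50.0, 6.0, 0),
    DataNode("dn3", "DISK", 500.0, 10.0, 0),
]
print(choose_node(nodes, "SSD", 1.0).name)    # dn2
print(choose_node(nodes, "SSD", 200.0).name)  # dn3
```

The fallback keeps a block from being rejected outright when the preferred tier is full, at the cost of placing it on slower media.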

Design of Distributed Cloud System for Managing large-scale Genomic Data

  • Seine Jang;Seok-Jae Moon
    • International Journal of Internet, Broadcasting and Communication / v.16 no.2 / pp.119-126 / 2024
  • The volume of genomic data is constantly increasing in various modern industries and research fields. This growth presents new challenges and opportunities in terms of the quantity and diversity of genetic data. In this paper, we propose a distributed cloud system for integrating and managing large-scale gene databases. By introducing a distributed data storage and processing system based on the Hadoop Distributed File System (HDFS), genomic data of various formats and sizes can be integrated efficiently. Furthermore, Spark on YARN is used to manage distributed cloud computing tasks efficiently and to allocate resources optimally. This establishes a foundation for the rapid processing and analysis of large-scale genomic data. Additionally, machine learning models built with BigQuery ML support genetic search and prediction, enabling researchers to use the data more effectively. This system is expected to contribute to innovative advances in genetic research and its applications.

Research on the Analysis System based on the Big Data for Matlab (빅데이터 기반의 생체신호 수집 및 저장소 설계, i.e., Design of Big Data-Based Biosignal Collection and Storage)

  • Joo, Moon-il;Seo, Young-woo;Kim, Hee-cheol
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference / 2018.05a / pp.472-474 / 2018
  • The recent rapid growth of data has driven the development of big data technologies. In particular, with the spread of wearable devices that measure biological signals, the volume of diverse biosignal data is growing exponentially. Storage technologies are therefore needed that identify the characteristics of these rapidly growing biosignals and store them systematically. In this paper, we study techniques for collecting biosignals and a storage design that stores them according to their identified characteristics.
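The abstract does not give the storage design itself. As a hedged illustration of storing biosignals according to their characteristics (continuous, timestamped, per device and per signal type), the sketch below uses Python's built-in `sqlite3` module with a hypothetical `samples` table keyed by device, signal, and timestamp, so that a time window of one signal can be read back in order. The table layout and signal labels are assumptions; a large deployment would likely use a distributed store instead.

```python
import sqlite3
from typing import Iterable, List, Tuple


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    """Create (if needed) a table keyed by device, signal type, and timestamp."""
    conn = sqlite3.connect(path)
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS samples (
            device_id TEXT    NOT NULL,
            signal    TEXT    NOT NULL,  -- e.g. 'ECG', 'PPG' (hypothetical labels)
            ts_ms     INTEGER NOT NULL,  -- sample time in milliseconds
            value     REAL    NOT NULL,
            PRIMARY KEY (device_id, signal, ts_ms)
        )
        """
    )
    return conn


def ingest(conn: sqlite3.Connection, device_id: str, signal: str,
           samples: Iterable[Tuple[int, float]]) -> None:
    """Store a batch of (timestamp_ms, value) pairs; re-sent samples overwrite."""
    conn.executemany(
        "INSERT OR REPLACE INTO samples VALUES (?, ?, ?, ?)",
        [(device_id, signal, ts, v) for ts, v in samples],
    )
    conn.commit()


def window(conn: sqlite3.Connection, device_id: str, signal: str,
           start_ms: int, end_ms: int) -> List[Tuple[int, float]]:
    """Read one signal for one device over [start_ms, end_ms), in time order."""
    return conn.execute(
        "SELECT ts_ms, value FROM samples "
        "WHERE device_id = ? AND signal = ? AND ts_ms >= ? AND ts_ms < ? "
        "ORDER BY ts_ms",
        (device_id, signal, start_ms, end_ms),
    ).fetchall()


store = open_store()
ingest(store, "watch-01", "ECG", [(0, 0.10), (4, 0.20), (8, 0.15)])
print(window(store, "watch-01", "ECG", 0, 8))  # [(0, 0.1), (4, 0.2)]
```

Putting the timestamp last in the primary key means a window query for one device and signal reads a contiguous range of the index.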

Prototype Design of Mass Distributed Storage System based on PC using Ceph for SMB

  • Cha, ByungRae;Kim, Yongil
    • Smart Media Journal / v.4 no.3 / pp.62-67 / 2015
  • The trend keywords in the ICT sector will be Big Data, the Internet of Things, and Cloud Computing. The back end that supports these technologies requires low-cost, large-capacity storage. We therefore propose a prototype of a low-cost, mass distributed storage system built from PCs using the open-source CephFS, for SMB.
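For a low-cost PC-based cluster, usable capacity depends on how data is protected. A rough back-of-the-envelope sketch, assuming either n-way replication (Ceph pools default to three replicas) or a k+m erasure-coded pool, and assuming the cluster should not be filled beyond a chosen ratio (0.85 here, matching Ceph's default near-full warning threshold). The node and disk counts are hypothetical.

```python
def raw_tb(nodes: int, disks_per_node: int, disk_tb: float) -> float:
    """Total raw capacity contributed by the PCs' disks."""
    return nodes * disks_per_node * disk_tb


def usable_replicated_tb(raw: float, replicas: int = 3, fill_ratio: float = 0.85) -> float:
    """Each object is stored `replicas` times; keep headroom below fill_ratio."""
    return raw / replicas * fill_ratio


def usable_ec_tb(raw: float, k: int, m: int, fill_ratio: float = 0.85) -> float:
    """Erasure coding stores k data chunks plus m coding chunks per object."""
    return raw * k / (k + m) * fill_ratio


raw = raw_tb(nodes=4, disks_per_node=2, disk_tb=4.0)  # 32 TB raw
print(usable_replicated_tb(raw))    # 3 replicas
print(usable_ec_tb(raw, k=2, m=1))  # 2+1 erasure code
```

Three replicas survive the loss of two copies of an object while a 2+1 code survives only one lost chunk, so the extra usable space of erasure coding comes with less redundancy and more CPU work for encoding.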

Big IoT Healthcare Data Analytics Framework Based on Fog and Cloud Computing

  • Alshammari, Hamoud;El-Ghany, Sameh Abd;Shehab, Abdulaziz
    • Journal of Information Processing Systems / v.16 no.6 / pp.1238-1249 / 2020
  • Throughout the world, aging populations and doctor shortages have helped drive the increasing demand for smart healthcare systems. Recently, these systems have benefited from the evolution of the Internet of Things (IoT), big data, and machine learning. However, these advances generate large amounts of data, making healthcare data analysis a major issue. These data have a number of complex properties, such as high dimensionality, irregularity, and sparsity, which make efficient processing difficult. Big data analytics can address these challenges. In this paper, we propose an innovative analytic framework for big healthcare data collected either from IoT wearable devices or from archived patient medical images. The proposed method addresses the data heterogeneity problem efficiently by placing middleware between the heterogeneous data sources and Hadoop MapReduce clusters. Furthermore, the proposed framework uses both fog computing and cloud platforms to handle online and offline data processing, data storage, and data classification. Additionally, it guarantees robust and secure knowledge of patient medical data.
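The middleware role described above, turning heterogeneous inputs into one record format and deciding whether each record is handled online at the fog layer or offline in the cloud, could be sketched as follows. The source names, field names, and routing rule (streaming vital signs to the fog, archived images to the cloud) are hypothetical simplifications, not the authors' framework.

```python
from dataclasses import dataclass
from typing import Any, Dict


@dataclass
class Record:
    patient_id: str
    kind: str      # "vitals" (wearable stream) or "image" (archived scan)
    payload: Any


def normalize(source: str, raw: Dict[str, Any]) -> Record:
    """Map each source's own field names onto one common schema."""
    if source == "wearable":
        return Record(raw["pid"], "vitals", {"heart_rate": raw["hr"], "ts": raw["t"]})
    if source == "image_archive":
        return Record(raw["patient"], "image", {"path": raw["file"]})
    raise ValueError(f"unknown source: {source}")


def route(record: Record) -> str:
    """Latency-sensitive streams go to the fog node; bulky archives go to the cloud."""
    return "fog" if record.kind == "vitals" else "cloud"


reading = normalize("wearable", {"pid": "p-17", "hr": 72, "t": 1_000})
scan = normalize("image_archive", {"patient": "p-17", "file": "scans/p-17/001.dcm"})
print(route(reading), route(scan))  # fog cloud
```

Once every source is mapped to the same `Record` shape, downstream MapReduce jobs only need to understand one format.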

Comparing the Results of Big-Data with Questionnaire Survey (빅데이터 분석결과와 실증조사 결과의 비교)

  • Kim, Do-Goan;Shin, Seong-Yoon
    • Journal of the Korea Institute of Information and Communication Engineering / v.20 no.11 / pp.2027-2032 / 2016
  • The rapid diffusion of smartphones and the development of data storage and analysis technology have made big data a promising future industry. In marketing, big data analysis of social data can serve as an effective and efficient tool for understanding consumer needs. Before the age of big data, companies relied on traditional methods such as questionnaire surveys and marketing tests in which a small number of consumers participated, and these methods are still in use. Although both big data analysis and traditional methods are useful for understanding consumers, it is necessary to check whether their results carry similar implications. This study therefore compares the results of big data analysis with those of a questionnaire survey for several cosmetics brands. The results show that the big data analysis and the questionnaire survey yield similar implications.
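The abstract does not say how the two sets of results were compared. One simple, common way to check whether two sources rank brands similarly is Spearman's rank correlation. The sketch assumes no tied scores and uses made-up brand scores purely for illustration; they are not data from the study.

```python
from typing import List, Sequence


def ranks(values: Sequence[float]) -> List[int]:
    """Rank 1 = highest value. Assumes no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    result = [0] * len(values)
    for rank, idx in enumerate(order, start=1):
        result[idx] = rank
    return result


def spearman_rho(x: Sequence[float], y: Sequence[float]) -> float:
    """rho = 1 - 6 * sum(d^2) / (n * (n^2 - 1)), valid only when there are no ties."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    if len(set(x)) != len(x) or len(set(y)) != len(y):
        raise ValueError("tied values are not handled by this simple formula")
    n = len(x)
    d2 = sum((a - b) ** 2 for a, b in zip(ranks(x), ranks(y)))
    return 1 - 6 * d2 / (n * (n * n - 1))


# Hypothetical scores for four brands, listed in the same brand order in both lists
buzz_share = [0.42, 0.31, 0.15, 0.12]  # e.g. share of positive social-media mentions
survey_mean = [4.1, 3.6, 3.8, 2.9]     # e.g. mean preference on a 5-point scale
print(spearman_rho(buzz_share, survey_mean))
```

A value near 1 means the two methods order the brands almost identically; comparing rankings rather than raw scores sidesteps the fact that mention shares and survey ratings are on different scales.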