• Title/Summary/Keyword: Big data storage

Search Result 205, Processing Time 0.03 seconds

Text Mining and Visualization of Unstructured Data Using Big Data Analytical Tool R (빅데이터 분석 도구 R을 이용한 비정형 데이터 텍스트 마이닝과 시각화)

  • Nam, Soo-Tai;Shin, Seong-Yoon;Jin, Chan-Yong
    • Journal of the Korea Institute of Information and Communication Engineering
    • /
    • v.25 no.9
    • /
    • pp.1199-1205
    • /
    • 2021
  • In the era of big data, not only structured data well organized in databases, but also the Internet, social network services, it is very important to effectively analyze unstructured big data such as web documents, e-mails, and social data generated in real time in mobile environment. Big data analysis is the process of creating new value by discovering meaningful new correlations, patterns, and trends in big data stored in data storage. We intend to summarize and visualize the analysis results through frequency analysis of unstructured article data using R language, a big data analysis tool. The data used in this study was analyzed for total 104 papers in the Mon-May 2021 among the journals of the Korea Institute of Information and Communication Engineering. In the final analysis results, the most frequently mentioned keyword was "Data", which ranked first 1,538 times. Therefore, based on the results of the analysis, the limitations of the study and theoretical implications are suggested.

Cost-Effective MapReduce Processing in the Cloud (클라우드 환경에서의 비용 효율적인 맵리듀스 처리)

  • Ryu, Wooseok
    • Proceedings of the Korean Institute of Information and Commucation Sciences Conference
    • /
    • 2018.10a
    • /
    • pp.114-115
    • /
    • 2018
  • This paper studies a mechanism for cost-effective analysis of big data in the cloud environment. Recently, as a storage of electronic medical records can be managed outside the hospital, there is a growing demand for cloud-based big data analysis in small-and-medium hospitals. This paper firstly analyze the Amazon Elastic MapReduce which is a popular cloud framework for big data analysis, and proposes a cost model for analyzing big data using Amazon EMR with less cost. Using the proposed model, the user can construct a cost-effective computing cluster, which maximize the effectiveness of the analysis per operational cost.

  • PDF

A Study on Construction of Aids to Navigation Big Data Based on S-201

  • Kim, Yunjee;Oh, Se-woong;Jeon, Minsu
    • Journal of Navigation and Port Research
    • /
    • v.46 no.5
    • /
    • pp.409-417
    • /
    • 2022
  • The International Association of Lighthouse Authorities (IALA) utilizes a questionnaire to investigate the status of Aids to Navigation (AtoN) around the world. However, results of the IALA questionnaire have limited use because respondent understanding is inconsistent. In addition, there is uncertainty regarding the appropriateness of the questionnaire content. Furthermore, the overall response rate is low. Therefore, the status of AtoN is not clearly understood. AtoN data from around the world are generated hourly. Thus, big data solutions are required to effectively exploit the information. Digitization of analog data is an important component of building big data. Hence, the IALA has developed a Maritime Resource Name (MRN) scheme and an information exchange standard. Here, we used the AtoN information exchange standard and designed an S-201-based big data construction process that could collect and manage global AtoN information. In this study, construction of an IALA AtoN portal was proposed as the core of the construction of the AtoN big data. The process was divided into three stages. IALA AtoN portal is developed by IALA with the goal to provide various meaningful statistical analysis results based on AtoN data while managing AtoN information around the world based on S-201. If an AtoN portal capable of constructing S-201-based big data is developed, then a data collection and storage system that can gather basic S-201 AtoN data from the IALA and global AtoN management agencies could be achieved. Furthermore, insightful statistical analysis of AtoN status worldwide and changes in manufacturing technology will be possible.

A Design of DBaaS-Based Collaboration System for Big Data Processing

  • Jung, Yean-Woo;Lee, Jong-Yong;Jung, Kye-Dong
    • International journal of advanced smart convergence
    • /
    • v.5 no.2
    • /
    • pp.59-65
    • /
    • 2016
  • With the recent growth in cloud computing, big data processing and collaboration between businesses are emerging as new paradigms in the IT industry. In an environment where a large amount of data is generated in real time, such as SNS, big data processing techniques are useful in extracting the valid data. MapReduce is a good example of such a programming model used in big data extraction. With the growing collaboration between companies, problems of duplication and heterogeneity among data due to the integration of old and new information storage systems have arisen. These problems arise because of the differences in existing databases across the various companies. However, these problems can be negated by implementing the MapReduce technique. This paper proposes a collaboration system based on Database as a Service, or DBaaS, to solve problems in data integration for collaboration between companies. The proposed system can reduce the overhead in data integration, while being applied to structured and unstructured data.

A Study on implementation model for security log analysis system using Big Data platform (빅데이터 플랫폼을 이용한 보안로그 분석 시스템 구현 모델 연구)

  • Han, Ki-Hyoung;Jeong, Hyung-Jong;Lee, Doog-Sik;Chae, Myung-Hui;Yoon, Cheol-Hee;Noh, Kyoo-Sung
    • Journal of Digital Convergence
    • /
    • v.12 no.8
    • /
    • pp.351-359
    • /
    • 2014
  • The log data generated by security equipment have been synthetically analyzed on the ESM(Enterprise Security Management) base so far, but due to its limitations of the capacity and processing performance, it is not suited for big data processing. Therefore the another way of technology on the big data platform is necessary. Big Data platform can achieve a large amount of data collection, storage, processing, retrieval, analysis, and visualization by using Hadoop Ecosystem. Currently ESM technology has developed in the way of SIEM (Security Information & Event Management) technology, and to implement security technology in SIEM way, Big Data platform technology is essential that can handle large log data which occurs in the current security devices. In this paper, we have a big data platform Hadoop Ecosystem technology for analyzing the security log for sure how to implement the system model is studied.

A Study on Design of Real-time Big Data Collection and Analysis System based on OPC-UA for Smart Manufacturing of Machine Working

  • Kim, Jaepyo;Kim, Youngjoo;Kim, Seungcheon
    • International Journal of Internet, Broadcasting and Communication
    • /
    • v.13 no.4
    • /
    • pp.121-128
    • /
    • 2021
  • In order to design a real time big data collection and analysis system of manufacturing data in a smart factory, it is important to establish an appropriate wired/wireless communication system and protocol. This paper introduces the latest communication protocol, OPC-UA (Open Platform Communication Unified Architecture) based client/server function, applied user interface technology to configure a network for real-time data collection through IoT Integration. Then, Database is designed in MES (Manufacturing Execution System) based on the analysis table that reflects the user's requirements among the data extracted from the new cutting process automation process, bush inner diameter indentation measurement system and tool monitoring/inspection system. In summary, big data analysis system introduced in this paper performs SPC (statistical Process Control) analysis and visualization analysis with interface of OPC-UA-based wired/wireless communication. Through AI learning modeling with XGBoost (eXtream Gradient Boosting) and LR (Linear Regression) algorithm, quality and visualization analysis is carried out the storage and connection to the cloud.

Optimization of Data Placement using Principal Component Analysis based Pareto-optimal method for Multi-Cloud Storage Environment

  • Latha, V.L. Padma;Reddy, N. Sudhakar;Babu, A. Suresh
    • International Journal of Computer Science & Network Security
    • /
    • v.21 no.12
    • /
    • pp.248-256
    • /
    • 2021
  • Now that we're in the big data era, data has taken on a new significance as the storage capacity has exploded from trillion bytes to petabytes at breakneck pace. As the use of cloud computing expands and becomes more commonly accepted, several businesses and institutions are opting to store their requests and data there. Cloud storage's concept of a nearly infinite storage resource pool makes data storage and access scalable and readily available. The majority of them, on the other hand, favour a single cloud because of the simplicity and inexpensive storage costs it offers in the near run. Cloud-based data storage, on the other hand, has concerns such as vendor lock-in, privacy leakage and unavailability. With geographically dispersed cloud storage providers, multicloud storage can alleviate these dangers. One of the key challenges in this storage system is to arrange user data in a cost-effective and high-availability manner. A multicloud storage architecture is given in this study. Next, a multi-objective optimization problem is defined to minimise total costs and maximise data availability at the same time, which can be solved using a technique based on the non-dominated sorting genetic algorithm II (NSGA-II) and obtain a set of non-dominated solutions known as the Pareto-optimal set.. When consumers can't pick from the Pareto-optimal set directly, a method based on Principal Component Analysis (PCA) is presented to find the best answer. To sum it all up, thorough tests based on a variety of real-world cloud storage scenarios have proven that the proposed method performs as expected.

Advanced Resource Management with Access Control for Multitenant Hadoop

  • Won, Heesun;Nguyen, Minh Chau;Gil, Myeong-Seon;Moon, Yang-Sae
    • Journal of Communications and Networks
    • /
    • v.17 no.6
    • /
    • pp.592-601
    • /
    • 2015
  • Multitenancy has gained growing importance with the development and evolution of cloud computing technology. In a multitenant environment, multiple tenants with different demands can share a variety of computing resources (e.g., CPU, memory, storage, network, and data) within a single system, while each tenant remains logically isolated. This useful multitenancy concept offers highly efficient, and cost-effective systems without wasting computing resources to enterprises requiring similar environments for data processing and management. In this paper, we propose a novel approach supporting multitenancy features for Apache Hadoop, a large scale distributed system commonly used for processing big data. We first analyze the Hadoop framework focusing on "yet another resource negotiator (YARN)", which is responsible for managing resources, application runtime, and access control in the latest version of Hadoop. We then define the problems for supporting multitenancy and formally derive the requirements to solve these problems. Based on these requirements, we design the details of multitenant Hadoop. We also present experimental results to validate the data access control and to evaluate the performance enhancement of multitenant Hadoop.

High Efficiency Life Prediction and Exception Processing Method of NAND Flash Memory-based Storage using Gradient Descent Method (경사하강법을 이용한 낸드 플래시 메모리기반 저장 장치의 고효율 수명 예측 및 예외처리 방법)

  • Lee, Hyun-Seob
    • Journal of Convergence for Information Technology
    • /
    • v.11 no.11
    • /
    • pp.44-50
    • /
    • 2021
  • Recently, enterprise storage systems that require large-capacity storage devices to accommodate big data have used large-capacity flash memory-based storage devices with high density compared to cost and size. This paper proposes a high-efficiency life prediction method with slope descent to maximize the life of flash memory media that directly affects the reliability and usability of large enterprise storage devices. To this end, this paper proposes the structure of a matrix for storing metadata for learning the frequency of defects and proposes a cost model using metadata. It also proposes a life expectancy prediction policy in exceptional situations when defects outside the learned range occur. Lastly, it was verified through simulation that a method proposed by this paper can maximize its life compared to a life prediction method based on the fixed number of times and the life prediction method based on the remaining ratio of spare blocks, which has been used to predict the life of flash memory.

Spatial Big Data Query Processing System Supporting SQL-based Query Language in Hadoop (Hadoop에서 SQL 기반 질의언어를 지원하는 공간 빅데이터 질의처리 시스템)

  • Joo, In-Hak
    • The Journal of Korea Institute of Information, Electronics, and Communication Technology
    • /
    • v.10 no.1
    • /
    • pp.1-8
    • /
    • 2017
  • In this paper we present a spatial big data query processing system that can store spatial data in Hadoop and query the data with SQL-based query language. The system stores large-scale spatial data in HDFS-based storage system, and supports spatial queries expressed in SQL-based query language extended for spatial data processing. It supports standard spatial data types and functions defined in OGC simple feature model in the query language. This paper presents the development of core functions of the system including query language parsing, query validation, query planning, and connection with storage system. We compares the performance of the suggested system with an existing system, and our experiments show that the system shows about 58% performance improvement of query execution time over the existing system when executing region query for spatial data stored in Hadoop.