• Title/Summary/Keyword: big data cluster

Asymmetric data storage management scheme to ensure the safety of big data in multi-cloud environments based on deep learning (딥러닝 기반의 다중 클라우드 환경에서 빅 데이터의 안전성을 보장하기 위한 비대칭 데이터 저장 관리 기법)

  • Jeong, Yoon-Su
    • Journal of Digital Convergence / v.19 no.3 / pp.211-216 / 2021
  • The volume of information produced by heterogeneous devices in distributed cloud environments is steadily increasing, driven by high-speed networks and high-capacity multimedia data. However, how to minimize information errors in the big data sent and received by heterogeneous devices remains an open research question. In this paper, we propose a deep learning-based asymmetric storage management technique that minimizes the network bandwidth and data errors caused by information sent and received in cloud environments. The proposed technique applies deep learning to optimize load balance after asymmetrically hashing the big data generated by each device. It tolerates errors in the big data collected from each device while preserving the connectivity of the data by grouping it into multiple clusters. In particular, the technique minimizes information errors when storing and managing big data asymmetrically because it uses a loss function that extracts similar values between big data as seeds.

An Analysis of Causes of Marine Incidents at sea Using Big Data Technique (빅데이터 기법을 활용한 항해 중 준해양사고 발생원인 분석에 관한 연구)

  • Kang, Suk-Young;Kim, Ki-Sun;Kim, Hong-Beom;Rho, Beom-Seok
    • Journal of the Korean Society of Marine Environment & Safety / v.24 no.4 / pp.408-414 / 2018
  • Various studies have been conducted to reduce marine accidents, but research on marine incidents remains marginal. Many marine incident reports exist, but existing studies have been mainly qualitative, which makes quantitative analysis difficult. Quantitative analysis is nevertheless necessary to reduce marine incidents. The purpose of this paper is to analyze marine incident data quantitatively by applying big data techniques, in order to predict marine incident trends and reduce marine accidents. To accomplish this, about 10,000 marine incident reports were converted into a unified format through preprocessing. Using the preprocessed data, we first derived major keywords for marine incidents at sea using text mining techniques. Second, we applied time series and cluster analysis to the major keywords and predicted trends for possible marine incidents. The results confirmed that quantified data and statistical analysis can be used to address this topic. We also confirmed that big data techniques can capture objective tendencies in marine incidents that may occur in the future, and can therefore provide information on preventive measures.

Cluster Management Scheme for Safety Message Dissemination in a VANET Environment (VANET 환경에서 안전 메시지 배포를 위한 클러스터 관리 기법)

  • Pyun, Do-Woong;Lim, Jongtae;Bok, Kyoung-Soo;Yoo, Jae-Soo
    • The Journal of the Korea Contents Association / v.22 no.5 / pp.26-36 / 2022
  • Recently, studies have been conducted on clustering vehicles and disseminating safety messages in a VANET environment for driver safety and smooth traffic. This paper proposes a cluster management scheme for safety message dissemination through V2V and V2I communication in a VANET environment with high vehicle density and mobility. The proposed scheme reduces packet loss in two ways: it selects the cluster head (CH) by considering reception quality, the total data owned by each vehicle, moving speed, and the number of connected vehicles, and it maintains cluster head candidates, the main agents of message dissemination, to cope with vehicles frequently leaving and joining clusters. In addition, the proposed scheme reduces duplicate messages by having clusters collaborate with a roadside unit (RSU). To demonstrate the effectiveness of the proposed scheme, we evaluate its performance in terms of message packet loss and the number of requests processed by the RSU. The evaluation shows that the proposed cluster management scheme outperforms the existing scheme.
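
As a rough illustration of the cluster head selection described in this abstract, the sketch below scores each vehicle on the four factors the abstract names and keeps the runners-up as cluster head candidates. The weights, the normalization, the field names, and the direction of each factor (for example, that a slower vehicle is a better head) are assumptions made for illustration; the abstract does not give the paper's actual formula.

```python
from dataclasses import dataclass


@dataclass
class Vehicle:
    vid: str
    reception_quality: float  # assumed to be normalized to 0..1, higher is better
    data_owned: float         # assumed to be normalized to 0..1
    speed: float              # m/s
    neighbors: int            # number of connected vehicles


# Hypothetical weights; the abstract does not state how the factors are combined.
WEIGHTS = {"quality": 0.4, "data": 0.2, "speed": 0.2, "neighbors": 0.2}


def score(v, max_speed, max_neighbors):
    """Weighted sum of the four factors; slower vehicles score higher (assumption)."""
    speed_term = 1.0 - v.speed / max_speed if max_speed > 0 else 1.0
    neighbor_term = v.neighbors / max_neighbors if max_neighbors > 0 else 0.0
    return (WEIGHTS["quality"] * v.reception_quality
            + WEIGHTS["data"] * v.data_owned
            + WEIGHTS["speed"] * speed_term
            + WEIGHTS["neighbors"] * neighbor_term)


def select_cluster_head(vehicles, n_candidates=2):
    """Return (head, candidates): the best-scoring vehicle and the next best ones."""
    max_speed = max(v.speed for v in vehicles)
    max_neighbors = max(v.neighbors for v in vehicles)
    ranked = sorted(vehicles,
                    key=lambda v: score(v, max_speed, max_neighbors),
                    reverse=True)
    return ranked[0], ranked[1:1 + n_candidates]
```

Keeping a ranked candidate list means that when the current head leaves the cluster, the next candidate can take over without rescoring every member.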

Survey on Distributed Graph Processing Systems (분산 그래프 처리 시스템에 대한 연구 조사)

  • Ko, Seongyun;Seo, In;Shin, Hyungyu;Lee, Jinsoo;Han, Wook-Shin
    • Proceedings of the Korea Information Processing Society Conference / 2017.04a / pp.58-59 / 2017
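  • Graph data models objects and the relationships between them, and is used to represent and store data such as social network services, the Internet of Things, and brain networks. In the era of big data, demand for processing big graphs is rising steeply. Distributed graph processing systems make it possible to process big graphs by partitioning very large graph data across the memory of multiple machines in a cluster. In this paper, we compare the characteristics of recent distributed graph processing systems.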
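
To make the idea of spreading a graph across the memory of several machines concrete, here is a minimal single-process sketch. It assumes integer vertex ids and a simple modulo placement rule, and the function names are hypothetical; the systems the survey compares may use very different partitioning strategies.

```python
from collections import defaultdict


def partition_graph(edges, num_machines):
    """Place each vertex on machine (vertex id mod num_machines).

    Each partition maps a source vertex to its list of out-neighbors,
    mimicking one machine's in-memory share of the adjacency lists.
    """
    partitions = [defaultdict(list) for _ in range(num_machines)]
    for src, dst in edges:
        partitions[src % num_machines][src].append(dst)
    return partitions


def count_cut_edges(edges, num_machines):
    """Count edges whose endpoints are placed on different machines."""
    return sum(1 for src, dst in edges
               if src % num_machines != dst % num_machines)
```

The number of cut edges is a rough proxy for communication cost, since every cut edge requires a message between machines during processing.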

Management of Distributed Nodes for Big Data Analysis in Small-and-Medium Sized Hospital (중소병원에서의 빅데이터 분석을 위한 분산 노드 관리 방안)

  • Ryu, Wooseok
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference / 2016.05a / pp.376-377 / 2016
  • The performance of Hadoop, a distributed data processing framework for big data analysis, is affected by characteristics of each node in the distributed cluster, such as processing power and network bandwidth. This paper analyzes previous approaches to heterogeneous Hadoop clusters and presents several requirements for distributed node clustering in small and medium-sized hospitals, taking the hospitals' computing environments into account.

Performance Comparison of Spatial Split Algorithms for Spatial Data Analysis on Spark (Spark 기반 공간 분석에서 공간 분할의 성능 비교)

  • Yang, Pyoung Woo;Yoo, Ki Hyun;Nam, Kwang Woo
    • Journal of Korean Society for Geospatial Information Science / v.25 no.1 / pp.29-36 / 2017
  • In this paper, we implement a spatial big data analysis prototype based on Spark, an in-memory system, and use it to compare the performance of spatial split algorithms. In cluster computing environments, big data is divided into blocks of a certain size in order to balance the computing load. Existing research showed that, for Hadoop-based spatial big data systems, spatial splitting is more effective than general sequential splitting. A Hadoop-based spatial data system stores raw data as-is in spatially divided blocks. In the proposed Spark-based spatial analysis system, by contrast, spatial data is converted into an in-memory data structure and stored in spatial blocks for search efficiency. We therefore propose an in-memory spatial big data prototype and a storage method for spatially split blocks, and we compare the performance of existing spatial split algorithms on the prototype to identify an appropriate spatial split strategy for a Spark-based big data system. In the experiments, we compared the query execution times of the spatial split algorithms and confirmed that the BSP algorithm performs best.
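
The abstract does not expand "BSP"; assuming it refers to binary space partitioning, a spatial split can be sketched as below. The function name, the median-split rule, and the choice to cut along the axis with the larger extent are illustrative assumptions, not the paper's implementation, and the sketch runs in a single process rather than on Spark.

```python
def bsp_split(points, capacity):
    """Recursively halve a list of (x, y) points until each block fits.

    At each step the points are cut at the median along the axis with the
    larger extent, so every block holds at most `capacity` points.
    """
    if capacity < 1:
        raise ValueError("capacity must be at least 1")
    if len(points) <= capacity:
        return [points]
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    # Split along the wider dimension to keep blocks roughly square.
    axis = 0 if (max(xs) - min(xs)) >= (max(ys) - min(ys)) else 1
    ordered = sorted(points, key=lambda p: p[axis])
    mid = len(ordered) // 2
    return bsp_split(ordered[:mid], capacity) + bsp_split(ordered[mid:], capacity)
```

Because each split is at the median, the resulting blocks hold similar numbers of points, which is the load-balancing property the abstract attributes to splitting data into blocks.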

KI Cloud: Design and Implementation of BigData Analysis and Machine Learning Applications on Supercomputer (KI Cloud: 슈퍼컴퓨터를 통한 빅데이터 분석 및 머신 러닝 서비스 구축 방안)

  • Park, Ju-Won;Lee, Seungmin;Jeong, Kimoon;Hong, TaeYoung
    • Proceedings of the Korea Information Processing Society Conference / 2020.11a / pp.80-82 / 2020
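  • Traditionally, large-scale workloads in the basic sciences have been run on large cluster systems such as supercomputers. Recently, however, as demand for computing resources from new fields such as big data and machine learning grows and the requirements of existing users diversify, the existing cluster operating environment faces many difficulties. To address these problems, the Korea Institute of Science and Technology Information (KISTI) has developed the KI (KISTI Intelligent) Cloud service and has been offering it since last March. The KI Cloud service has the following features. First, it provides interactive development environments such as Jupyter and RStudio over the web, so users can easily use the service anytime, anywhere. Second, it uses container technology to configure and provide, in real time, the development and execution environments that users request. Third, it configures users' service environments dynamically, which improves the efficiency of computing resources.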

Comparison analysis of big data integration models (빅데이터 통합모형 비교분석)

  • Jung, Byung Ho;Lim, Dong Hoon
    • Journal of the Korean Data and Information Science Society / v.28 no.4 / pp.755-768 / 2017
  • As big data becomes the core of the fourth industrial revolution, big data processing and analysis capabilities are expected to influence companies' future competitiveness. RHadoop and RHIPE, which integrate R with the Hadoop environment, have each been discussed separately, but few comparative studies of the two exist. In this paper, we built RHadoop and RHIPE big data platforms applicable to large-scale data and implemented machine learning algorithms such as multiple regression and logistic regression on the MapReduce framework. We studied the performance and scalability of these implementations for various sample sizes of real and simulated data. The experiments demonstrated that our RHadoop and RHIPE platforms scale well and efficiently process large data sets on commodity hardware. RHIPE was faster than RHadoop on almost all of the data.
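
One common way to fit multiple regression in a MapReduce style is to have each mapper compute the sufficient statistics X^T X and X^T y for its chunk, sum them in the reducer, and solve the normal equations once. The sketch below imitates those steps in a single Python process. It is not the RHadoop or RHIPE API and is not taken from the paper; all function names are hypothetical.

```python
from functools import reduce


def map_chunk(rows):
    """Map step: rows is a list of (features, y); returns (XtX, Xty).

    An intercept column of ones is prepended to each feature vector.
    """
    p = len(rows[0][0]) + 1
    xtx = [[0.0] * p for _ in range(p)]
    xty = [0.0] * p
    for features, y in rows:
        x = [1.0] + list(features)
        for i in range(p):
            xty[i] += x[i] * y
            for j in range(p):
                xtx[i][j] += x[i] * x[j]
    return xtx, xty


def reduce_stats(a, b):
    """Reduce step: element-wise sum of two (XtX, Xty) pairs."""
    xtx = [[u + v for u, v in zip(ra, rb)] for ra, rb in zip(a[0], b[0])]
    xty = [u + v for u, v in zip(a[1], b[1])]
    return xtx, xty


def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting.

    Assumes a is non-singular (the design matrix has full column rank).
    """
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [vr - f * vc for vr, vc in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit(chunks):
    """Return [intercept, coef_1, ..., coef_k] for data split into chunks."""
    xtx, xty = reduce(reduce_stats, map(map_chunk, chunks))
    return solve(xtx, xty)
```

The statistics passed between the map and reduce steps have a fixed size that depends only on the number of features, so the approach scales with the number of chunks rather than the number of rows.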

Study on the method of acquiring GPU usage statistics information in cluster system (클러스터 시스템에서 GPU 사용 통계정보 획득 방안에 대한 연구)

  • Kwon, Min-Woo;Kim, Sung-Jun;Yoon, JunWeon;Hong, TaeYoung
    • Proceedings of the Korea Information Processing Society Conference / 2018.10a / pp.476-477 / 2018
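  • To meet recent demand for research infrastructure for big data and artificial intelligence, the Korea Institute of Science and Technology Information operates a GPU cluster that serves as the auxiliary accelerator system of its fourth supercomputer. The GPU cluster system uses the SLURM job scheduler to distribute work efficiently among users. This paper introduces a method for obtaining per-job GPU usage statistics for user jobs run through the SLURM job scheduler.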

An Empirical Evaluation Analysis of the Performance of In-memory Bigdata Processing Platform (메모리 기반 빅데이터 처리 프레임워크의 성능개선 연구)

  • Lee, Jae hwan;Choi, Jun;Koo, Dong hun
    • Journal of Korea Society of Industrial Information Systems / v.21 no.3 / pp.13-19 / 2016
  • Spark, an in-memory big data processing framework, is popular for real-time processing workloads. Spark can store all intermediate data in cluster memory, which minimizes I/O access. However, when the resident memory of a workload is larger than the physical memory of the cluster, overall performance can drop dramatically. In this paper, we experimentally analyze the bottleneck factors of a PageRank application, which requires a large amount of memory, and configure the Spark cluster with the Tachyon file system so that memory is used in a way that resolves the bottleneck, improving performance by about 18%.
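
For readers unfamiliar with the workload, the sketch below shows a plain PageRank iteration in a single Python process. It is not the paper's Spark code; the damping factor, iteration count, and handling of dangling nodes are conventional choices assumed here. Each iteration builds a full new set of per-vertex contributions and ranks, which gives a sense of why the workload holds so much intermediate data in memory.

```python
def pagerank(links, damping=0.85, iterations=20):
    """links maps each page to the list of pages it links to.

    Returns a dict of page -> rank; the ranks sum to 1.
    """
    nodes = set(links) | {d for dsts in links.values() for d in dsts}
    n = len(nodes)
    ranks = {v: 1.0 / n for v in nodes}
    for _ in range(iterations):
        contrib = {v: 0.0 for v in nodes}  # intermediate data, rebuilt every pass
        dangling = 0.0
        for v in nodes:
            out = links.get(v, [])
            if out:
                share = ranks[v] / len(out)
                for d in out:
                    contrib[d] += share
            else:
                # Pages without out-links spread their rank evenly over all pages.
                dangling += ranks[v]
        ranks = {v: (1 - damping) / n + damping * (contrib[v] + dangling / n)
                 for v in nodes}
    return ranks
```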