• Title/Summary/Keyword: Big Data Cluster

An Analysis of Voters' Political Tendency Using Big Data (빅데이터를 활용한 유권자의 정치성향 분석)

  • Eum, Yeong-Cheol
    • Proceedings of the Korean Society of Computer Information Conference / 2015.01a / pp.319-320 / 2015
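  • This study presents three approaches to identifying voters' political tendencies using big data. First, cluster analysis is a method for identifying voters' basic tendencies, and it requires each party to secure a voter database. Second, regression analysis examines how independent variables affect a dependent variable, and it is needed to formulate policies that match voters' needs. Third, association analysis infers voters' political tendencies by identifying their preferences for particular items.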

Performance Comparison of Spatial Split Algorithms for Spatial Data Analysis on Spark (Spark 기반 공간 분석에서 공간 분할의 성능 비교)

  • Yang, Pyoung Woo;Yoo, Ki Hyun;Nam, Kwang Woo
    • Journal of Korean Society for Geospatial Information Science / v.25 no.1 / pp.29-36 / 2017
  • In this paper, we implement a spatial big data analysis prototype based on Spark, an in-memory system, and compare the performance of spatial split algorithms on it. In cluster computing environments, big data is divided into blocks of a certain size in order to balance the computing load. Existing research has shown that, for Hadoop-based spatial big data systems, spatial splitting is more effective than general sequential splitting. A Hadoop-based spatial data system stores raw data as is in spatially divided blocks. The proposed Spark-based spatial analysis system differs in that spatial data is converted into an in-memory data structure and stored in spatial blocks for search efficiency. We therefore propose an in-memory spatial big data prototype and a storage method for spatially split blocks, compare the performance of existing spatial split algorithms in the proposed prototype, and present an appropriate spatial split strategy for Spark-based big data systems. In the experiments, we compared the query execution times of the spatial split algorithms and confirmed that the BSP algorithm performs best.
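
The abstract does not describe how its BSP split is implemented. As a rough illustration only, assuming BSP here means binary space partitioning, the sketch below recursively halves a set of 2-D points at the median of the wider axis until each block holds at most `capacity` points. The function name `bsp_split`, the `capacity` parameter, and the sample points are all hypothetical and are not taken from the paper.

```python
def bsp_split(points, capacity):
    """Split 2-D points into blocks of at most `capacity` points each.

    Illustrative binary space partitioning, not the paper's implementation.
    """
    if capacity < 1:
        raise ValueError("capacity must be at least 1")
    if len(points) <= capacity:
        return [points]
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    # Split along the axis with the larger spatial extent.
    axis = 0 if (max(xs) - min(xs)) >= (max(ys) - min(ys)) else 1
    ordered = sorted(points, key=lambda p: p[axis])
    mid = len(ordered) // 2  # median split keeps the two halves balanced
    return bsp_split(ordered[:mid], capacity) + bsp_split(ordered[mid:], capacity)


# Hypothetical sample data.
sample_points = [(0, 0), (1, 0), (10, 1), (11, 1), (5, 5)]
blocks = bsp_split(sample_points, capacity=2)
```

Splitting at the median rather than at the spatial midpoint keeps block sizes balanced when the data is skewed, which is the load-balancing concern the abstract raises.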

Apriori Based Big Data Processing System for Improve Sensor Data Throughput in IoT Environments (IoT 환경에서 센서 데이터 처리율 향상을 위한 Apriori 기반 빅데이터 처리 시스템)

  • Song, Jin Su;Kim, Soo Jin;Shin, Young Tae
    • KIPS Transactions on Computer and Communication Systems / v.10 no.10 / pp.277-284 / 2021
  • The smart home environment is expected to become a platform that collects, integrates, and utilizes various data through convergence with wireless information and communication technology. The number of smart devices with various sensors inside smart homes is increasing. The amount of data these devices generate is increasing as well, and big data processing systems are actively being introduced to handle it effectively. However, traditional big data processing systems direct all requests to cluster drivers before allocating them to distributed nodes, which reduces cluster-wide performance as the cluster drivers managing segmentation tasks become bottlenecks. In particular, delays are greater for smart home devices that constantly request processing of small amounts of data. Thus, in this paper, we design an Apriori-based big data system for effective data processing in smart home environments where frequent requests occur at the same time. According to the performance evaluation of the proposed system, data processing time was reduced by at least 19.2% and by up to 38.6% compared to the existing system. This result is related to the type of data being measured. Because the amount of data collected in a smart home environment is large, cache servers play a major role in data processing, and association analysis with the Apriori algorithm stores highly relevant sensor data in the cache.
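
The abstract gives no detail on how the association analysis feeds the cache. As a minimal sketch of the general Apriori technique only, the code below finds itemsets of sensors that are requested together at least `min_support` times; such sets would be candidates for caching. The function name `frequent_itemsets`, the sensor names, and the threshold are hypothetical and do not come from the paper.

```python
from collections import Counter
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Return {itemset: support count} for every itemset that occurs in at
    least `min_support` transactions (level-wise Apriori search)."""
    transactions = [frozenset(t) for t in transactions]
    # Level 1: count individual items.
    counts = Counter(item for t in transactions for item in t)
    frequent = {frozenset([i]): c for i, c in counts.items() if c >= min_support}
    current = set(frequent)
    k = 2
    while current:
        # Join frequent (k-1)-itemsets into k-item candidates.
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Apriori pruning: every (k-1)-subset of a candidate must be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in current for s in combinations(c, k - 1))
        }
        level = {}
        for c in candidates:
            support = sum(1 for t in transactions if c <= t)
            if support >= min_support:
                level[c] = support
        frequent.update(level)
        current = set(level)
        k += 1
    return frequent


# Hypothetical request log: each set lists sensors queried in one request.
requests = [
    {"temp", "humidity"},
    {"temp", "humidity", "light"},
    {"temp", "light"},
    {"humidity", "temp"},
]
result = frequent_itemsets(requests, min_support=3)
```

In this sample, `temp` and `humidity` co-occur in three of the four requests, so the pair is reported as frequent, while `light` falls below the threshold and is dropped before any pair containing it is counted.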

Cost-Effective MapReduce Processing in the Cloud (클라우드 환경에서의 비용 효율적인 맵리듀스 처리)

  • Ryu, Wooseok
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference / 2018.10a / pp.114-115 / 2018
  • This paper studies a mechanism for cost-effective analysis of big data in the cloud environment. Now that electronic medical records can be stored and managed outside the hospital, demand for cloud-based big data analysis is growing among small and medium-sized hospitals. This paper first analyzes Amazon Elastic MapReduce (EMR), a popular cloud framework for big data analysis, and proposes a cost model for analyzing big data with Amazon EMR at lower cost. Using the proposed model, the user can construct a cost-effective computing cluster that maximizes the effectiveness of the analysis per unit of operational cost.

A Study on the Upper Body Shapes of Late Elementary Schoolgirls (학령후기 여아의 상반신 체형 연구)

  • Jang, Jeong-Ah
    • Fashion & Textile Research Journal / v.8 no.1 / pp.107-112 / 2006
  • This study classifies the upper body shapes of late elementary schoolgirls. The sample consisted of girls aged 11 to 12 residing in Busan and Kyungnam. Based on their somatometric characteristics, 33 anthropometric and 7 photographic measurements were acquired from every girl. These data were statistically analyzed using factor analysis, cluster analysis, and discriminant analysis. The factor analysis showed that 8 factors explain 79.95% of the total variance. The cluster analysis yielded 3 types of upper body shape: Type I has average horizontal size, large vertical size, and a strongly protruding chest; Type III has large horizontal size, average vertical size, and a large upper back angle; Type II has small horizontal and vertical size and a long upper body surface length. The discriminant analysis showed that the highly discriminative items, those with large coefficient values, are as follows: upper chest circumference, arm length, and waist front length in discriminant function I; and waist depth, front length, back breadth, nipple-to-nipple breadth, and upper chest circumference in discriminant function II.

The Study of Head type Analysis for Milinary (모자 디자인을 위한 성인여성의 두부형태 분석)

  • 문남원
    • Journal of the Korean Society of Costume / v.37 / pp.181-190 / 1998
  • The purpose of this study was to provide basic information on women's head types for millinery. The subjects were 141 college women aged 19 to 23. Data were collected from direct anthropometric measurements and 4 indices. The data were analyzed using correlation coefficients, factor analysis, cluster analysis, and analysis of variance in the SAS package. The results were as follows: 4 factors were extracted from the 20 anthropometric measurements and the index data, explaining 60.0% of the variance. The subjects were classified into 4 clusters using 11 measurements and 4 index data. Clustering by the measurements yielded flat, big, thick, and small head types. Clustering by the index data yielded types that were mostly flat in head thickness and wide, medium, narrow, or very wide in the face.

Real-Time Indexing Performance Optimization of Search Platform Based on Big Data Cluster (빅데이터 클러스터 기반 검색 플랫폼의 실시간 인덱싱 성능 최적화)

  • Nayeon Keum;Dongchul Park
    • Journal of Platform Technology / v.11 no.6 / pp.89-105 / 2023
  • With the development of information technology, most information has been converted into digital form, leading to the Big Data era. Demand for search platforms has increased in order to enhance the accessibility and usability of information in databases. Big data search software platforms consist of two main components: (1) an indexing component that generates and stores data indices for fast and efficient data search and (2) a searching component that looks up the given data quickly. As the amount of data has increased explosively, data indexing performance has become a key bottleneck of big data search platforms. Although many companies have adopted big data search platforms, relatively little research has been done to improve indexing performance. This study employs Elasticsearch, one of the most widely used enterprise big data search platforms, and builds a physical cluster of 3 nodes to investigate optimal indexing performance configurations. Our comprehensive experiments demonstrate that the proposed optimal Elasticsearch configuration improves indexing performance by an average of 3.13 times.

On Implementing a Learning Environment for Big Data Processing using Raspberry Pi (라즈베리파이를 이용한 빅 데이터 처리 학습 환경 구축)

  • Hwang, Boram;Kim, Seonggyu
    • Journal of Digital Convergence / v.14 no.4 / pp.251-258 / 2016
  • Big data processing is a broad term for processing data sets so large or complex that traditional data processing applications are inadequate. The widespread use of smart devices has had a huge impact on the way we process data, and many organizations are contemplating how to integrate those devices into their enterprise data systems. We propose a way to process big data by integrating Raspberry Pi devices into a Hadoop cluster as a computational grid. Through several experiments, we show the efficiency of the proposed system and the ease with which it scales.

Analysis of Development Priority Using Regional Assets (지역자산을 활용한 개발우선순위 분석)

  • Choi, Min-Ju;Lee, Sang-Ho
    • The Journal of the Korea Contents Association / v.19 no.6 / pp.359-367 / 2019
  • Efficient use of regional assets is becoming increasingly important as a strategy for strengthening local competitiveness, because local assets are the key to regional identity and competitiveness. The purpose of this study is to derive priority regions for development by evaluating local assets. The analysis methods used are Geographic Information System (GIS) analysis, big data trend analysis, and Analytic Hierarchy Process (AHP) analysis. To assess the potential of local assets, asset preference, historical value, clustering of resources, wide-area transport accessibility, and population density were set as analysis indicators, and itemized weights were applied using AHP to reflect the importance of each item. The analysis of Yeongju city in Gyeongsangbuk-do derived eight major points: Buseoksa Temple, Sosu Seowon, Huibangsa Temple, Punggi Hot Spring Resort, Punggi Station, the National Center for Forest Therapy, the Yeongju east region, and Museom Village.

Effects of Hypervisor on Distributed Big Data Processing in Virtualizated Cluster Environment (가상화 클러스터 환경에서 빅 데이터 분산 처리 성능에 하이퍼바이저가 미치는 영향)

  • Chung, Haejin;Nah, Yunmook
    • KIISE Transactions on Computing Practices / v.22 no.2 / pp.89-94 / 2016
  • Cluster computing environments have recently been shifting toward virtualized cluster environments. This change has a great impact on the performance of large-volume distributed processing, and many domestic and international IT companies have therefore invested heavily in research on cluster environments. In this paper, we show how the hypervisor affects the performance of distributed processing of a large volume of data. We present a performance comparison of MapReduce processing in two virtualized cluster environments, one built using the Xen hypervisor and the other built using the container-based Docker. Our results show that Docker is faster than Xen.