• Title/Summary/Keyword: Big Data Cluster

Incidence of Online Public Opinion on Guangzhou Simultaneous Renting and Purchasing Policy - A data mining application

  • Wang, Yancheng;Li, Haixian
    • Asian Journal for Public Opinion Research / v.5 no.4 / pp.266-284 / 2018
  • This paper adopts a big data research method and draws 491 data items from the Tianya Forum about Guangzhou's Simultaneous Renting and Purchasing policy. The qualitative analysis software NVivo 11 is used to cluster the main questions raised in the forum about the policy. Thirty-six high-frequency words are obtained through text clustering. Through grounded theory analysis, the main driving factors behind people's doubts are summarized into 9 main categories and 3 core categories, and a model of the driving factors for online forums is established. The study finds that resource factors are the key drivers, economic factors are important drivers, and policy-guiding factors are secondary drivers.

Manchester coding of compressed binary clusters for reducing IoT healthcare device's digital data transfer time (IoT기반 헬스케어 의료기기의 디지털 데이터 전송시간 감소를 위한 압축 바이너리 클러스터의 맨체스터 코딩 전송)

  • Kim, Jung-Hoon
    • The Journal of Korea Institute of Information, Electronics, and Communication Technology / v.8 no.6 / pp.460-469 / 2015
  • This study aims to reduce the big data transfer time of IoT healthcare devices. Binary data are first compressed in two steps into primary and secondary compressed binary clusters, with each binary cluster yielding a compression benefit of 1 or 2 bits. The bits are then modulated into Manchester code in which a zero-voltage idle signal carries the boundary (compartment) information of the secondary compressed binary clusters. The study proposes that inserting the idle signal into the Manchester code as boundary information reduces transfer time when a binary cluster is compressed into a secondary compressed binary cluster with a 2-bit benefit: although the idle costs 1 clock, the remaining 1-bit benefit still saves 1 clock of transfer time. The idle signal is never consecutive, because it only marks the boundary between two adjacent secondary compressed binary clusters. The voltage transitions required by the basic rule of Manchester code are retained when the idle signal is inserted, so DC balance is guaranteed. Simulation results indicate that even binary data already compressed by other compression algorithms could be transferred up to about 12.6 percent faster with this method.
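
The idle-as-delimiter idea in the abstract above can be sketched as follows. This is a minimal illustration, not the paper's implementation: it assumes the IEEE 802.3 Manchester convention (1 is low-to-high, 0 is high-to-low), models the idle as one clock at zero voltage, and inserts an idle between every pair of adjacent clusters. The names `encode_clusters` and `net_clock_saving` are hypothetical.

```python
IDLE = (0, 0)  # one clock at zero voltage, used here as a cluster delimiter


def manchester_bit(bit):
    """Map one bit to two half-clock voltage levels (IEEE 802.3 convention, assumed)."""
    return (-1, +1) if bit == "1" else (+1, -1)


def encode_clusters(clusters):
    """Encode bit-string clusters, placing one idle clock between neighbours.

    Because the idle only separates adjacent clusters, two idles never
    appear back to back as long as every cluster holds at least one bit.
    """
    symbols = []
    for index, cluster in enumerate(clusters):
        if index > 0:
            symbols.append(IDLE)
        symbols.extend(manchester_bit(b) for b in cluster)
    return symbols


def net_clock_saving(bits_saved, idle_clocks=1):
    """Clocks saved by compressing a cluster once the idle delimiter is paid for."""
    return bits_saved - idle_clocks


signal = encode_clusters(["10", "1"])
print(signal)
print(net_clock_saving(2), net_clock_saving(1))
```

Under these assumptions a 2-bit compression benefit leaves a net saving of one clock after the idle is paid for, while a 1-bit benefit only breaks even, which matches the reasoning given in the abstract.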

Cluster-head-selection-algorithm in Wireless Sensor Networks by Considering the Distance (무선 센서네트워크에서 거리를 고려한 클러스터 헤드 선택 알고리즘)

  • Kim, Byung-Joon;Yoo, Sang-Shin
    • Journal of the Korea Society of Computer and Information / v.13 no.4 / pp.127-132 / 2008
  • Wireless sensor network technologies applicable to various industrial fields are growing rapidly. Because it is difficult to replace batteries once a wireless sensor network has been deployed, energy-efficient design is critical. To this end, a number of studies have examined energy-efficient routing protocols. A sensor network consumes energy in proportion to the data transmission distance and the amount of data sent. Cluster-based routing protocols such as LEACH-C achieve energy efficiency by minimizing the data transmission distance. In LEACH-C, however, the total distance between the nodes that make up each cluster is the main consideration when constructing clusters. This paper examines a cluster-head selection algorithm that reflects the distance between the base station and the cluster head, which has a large influence on energy consumption. The proposed method improved performance over LEACH-C by 4 to 7% on average when the base station is located beyond a certain distance. This result shows that the distance between the cluster head and the base station has a substantial influence on lifetime performance in cluster-based routing protocols.
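
A minimal sketch of distance-aware cluster-head selection is given below. The abstract does not specify the paper's actual cost function, so the one used here (total intra-cluster distance plus a weighted distance to the base station) is an assumption, as are the names `select_cluster_head` and `bs_weight`.

```python
import math


def distance(a, b):
    """Euclidean distance between two (x, y) points."""
    return math.hypot(a[0] - b[0], a[1] - b[1])


def select_cluster_head(nodes, base_station, bs_weight=1.0):
    """Pick the node with the lowest assumed transmission-distance cost.

    cost = sum of distances from the candidate to all cluster members
           + bs_weight * distance from the candidate to the base station
    With bs_weight=0 only intra-cluster distance counts.
    """
    def cost(candidate):
        intra = sum(distance(candidate, node) for node in nodes)
        return intra + bs_weight * distance(candidate, base_station)

    return min(nodes, key=cost)


cluster = [(0, 0), (1, 0), (2, 0)]
far_base_station = (10, 0)
print(select_cluster_head(cluster, far_base_station, bs_weight=0.0))
print(select_cluster_head(cluster, far_base_station, bs_weight=2.0))
```

In this toy cluster, ignoring the base station selects the central node, while weighting the base-station distance shifts the choice to the node nearest the base station.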

Integrated Verification of Hadoop Cluster Prototypes and Analysis Software for SMB (중소기업을 위한 하둡 클러스터의 프로토타입과 분석 소프트웨어의 통합된 검증)

  • Cha, Byung-Rae;Kim, Nam-Ho;Lee, Seong-Ho;Ji, Yoo-Kang;Kim, Jong-Won
    • Journal of Advanced Navigation Technology / v.18 no.2 / pp.191-199 / 2014
  • Recently, research to help small and medium businesses (SMBs) use cloud computing and the big data paradigm, both of which are being rapidly adopted in the IT field, has been increasing. As one of these efforts, in this paper we design and implement a prototype for tentatively building a Hadoop cluster in a private cloud infrastructure environment. The prototype is implemented on each hardware type (single board, PC, and server), and performance is measured. We also present integrated verification results for the data analysis performance of the analysis software system running on top of the realized prototypes, using the ASA (American Standard Association) Dataset. For this, we implement the analysis software system using several open-source tools such as R, Python, D3, and Java, and perform a test.

Utilization of Social Media Analysis using Big Data (빅 데이터를 이용한 소셜 미디어 분석 기법의 활용)

  • Lee, Byoung-Yup;Lim, Jong-Tae;Yoo, Jaesoo
    • The Journal of the Korea Contents Association / v.13 no.2 / pp.211-219 / 2013
  • Analysis methods using Big Data have evolved on the basis of Big Data management technology. Quite a few research institutions anticipate a new era of data analysis using Big Data, and IT vendors have sided with them by launching standardized technologies for Big Data management. Big Data is also affected by improvements in IT gadgets and the IT environment. Led by social media, methods for analyzing unstructured data are being developed with a focus on diversity of analysis methods, prediction, and optimization. In the past, data analysis methods were confined to the optimization of structured data through data mining, OLAP, and statistical analysis, and such analysis was used solely for decision making by chief officers. In the new era of data analysis, however, technologies are evolving in various respects, including diverse analysis methods based on new paradigms and the emergence of new data analysis experts. In addition, new patterns of data analysis will be found with the development of high-performance computing environments and Big Data management techniques. Accordingly, this paper defines possible methods for analyzing social media using Big Data and proposes a practical analysis approach for social media based on data mining methodology.

LDBAS: Location-aware Data Block Allocation Strategy for HDFS-based Applications in the Cloud

  • Xu, Hua;Liu, Weiqing;Shu, Guansheng;Li, Jing
    • KSII Transactions on Internet and Information Systems (TIIS) / v.12 no.1 / pp.204-226 / 2018
  • Big data processing applications have gradually been migrated to the cloud, due to the advantages of cloud computing. Hadoop Distributed File System (HDFS) is one of the fundamental support systems for big data processing on MapReduce-like frameworks, such as Hadoop and Spark. Since HDFS is not aware of the co-location of virtual machines in the cloud, its default block allocation scheme fits cloud environments poorly in two respects: data reliability loss and performance degradation. In this paper, we present a novel location-aware data block allocation strategy (LDBAS). LDBAS jointly optimizes data reliability and performance for upper-layer applications by allocating data blocks according to the locations and different processing capacities of virtual nodes in the cloud. We apply LDBAS to two stages of data allocation of HDFS in the cloud (the initial data allocation and data recovery), and design the corresponding algorithms. Finally, we implement LDBAS in an actual Hadoop cluster and evaluate its performance with the benchmark suite BigDataBench. The experimental results show that LDBAS can guarantee the designed data reliability while reducing the job execution time of I/O-intensive applications in Hadoop by 8.9% on average and up to 11.2% compared with the original Hadoop in the cloud.
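
The location-aware idea can be illustrated with a small sketch. It is not the LDBAS algorithm itself, whose details the abstract does not give: it only assumes that replicas of a block should land on virtual machines hosted by different physical machines, preferring VMs with higher processing capacity. The function `place_replicas` and the `(vm_name, physical_host, capacity)` tuple layout are hypothetical.

```python
def place_replicas(vms, replicas=3):
    """Choose VMs for one block's replicas, at most one per physical host.

    vms: list of (vm_name, physical_host, capacity) tuples.
    Higher-capacity VMs are preferred. If there are fewer distinct
    physical hosts than requested replicas, fewer VMs are returned.
    """
    chosen = []
    used_hosts = set()
    # sorted() is stable, so VMs with equal capacity keep their input order.
    for name, host, _capacity in sorted(vms, key=lambda vm: -vm[2]):
        if host in used_hosts:
            continue  # a co-located VM would share the same failure domain
        chosen.append(name)
        used_hosts.add(host)
        if len(chosen) == replicas:
            break
    return chosen


vms = [
    ("vm1", "hostA", 5),
    ("vm2", "hostA", 4),
    ("vm3", "hostB", 3),
    ("vm4", "hostC", 1),
]
print(place_replicas(vms))
```

Here `vm2` is skipped even though it has the second-highest capacity, because it shares a physical host with `vm1` and a single host failure would otherwise lose two replicas.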

A study on unstructured text mining algorithm through R programming based on data dictionary (Data Dictionary 기반의 R Programming을 통한 비정형 Text Mining Algorithm 연구)

  • Lee, Jong Hwa;Lee, Hyun-Kyu
    • Journal of Korea Society of Industrial Information Systems / v.20 no.2 / pp.113-124 / 2015
  • Unlike structured data, which are gathered and saved in a predefined structure, unstructured text data, mostly written in natural language, have recently found wider application due to the emergence of Web 2.0. Text mining, which extracts meaningful information from text, is one of the most important big data analysis techniques, not only because the amount of text data has increased but also because text directly expresses human emotion. In this study, we used R, an open-source software environment for statistical analysis, and studied the implementation of algorithms for analyses such as frequency analysis, cluster analysis, word clouds, and social network analysis. In particular, to keep to our research scope, we used a keyword extraction method based on a data dictionary. By applying the method to real cases, we found that R is very useful as statistical analysis software that runs on a variety of operating systems and interfaces with other languages.
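
Dictionary-based keyword extraction followed by frequency analysis can be sketched in a few lines. The paper works in R; the sketch below uses Python purely for illustration, and the tokenizer, the sample documents, and the name `keyword_frequencies` are assumptions rather than the authors' method.

```python
import re
from collections import Counter


def keyword_frequencies(documents, dictionary):
    """Count only the tokens that appear in a predefined data dictionary."""
    terms = {term.lower() for term in dictionary}
    counts = Counter()
    for document in documents:
        # Simple lowercase alphanumeric tokenizer (an assumption for this sketch).
        for token in re.findall(r"[a-z0-9]+", document.lower()):
            if token in terms:
                counts[token] += 1
    return counts


documents = [
    "Battery life is great, battery charges fast",
    "Screen is dim; battery ok",
]
frequencies = keyword_frequencies(documents, ["battery", "screen"])
print(frequencies.most_common())
```

Restricting counts to dictionary terms keeps function words and off-topic vocabulary out of the frequency table, which is the scoping effect the abstract attributes to the data dictionary.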

A Study on the Classification of Chinese Major Ports based on Competitiveness Level

  • Lee, Hong-Girl;Yeo, Ki-Tae;Ryu, Hyung-Geun
    • Journal of Navigation and Port Research / v.27 no.3 / pp.315-320 / 2003
  • Since the beginning of its open-door policy, China has grown rapidly, with economic growth averaging 10% a year. Owing to this rapid growth, cargo volumes through its ports have also increased rapidly, and the Chinese government has accordingly invested intensively in port development. These development projects are significantly larger in scale than those of Korea and Japan. Thus, China is beginning to threaten Korean ports, especially Busan port, which is trying to become a hub port in Northeast Asia. For this reason, it has become very important for Korea and Busan port to investigate and analyze Chinese ports based on empirical data. In particular, although various studies of Shanghai and Hong Kong have been conducted, the competitiveness of major Chinese ports overall has been little studied. In this paper, we analyzed the competitiveness level of eight Chinese ports with container terminal capabilities, based on reliable sources. The data analysis classified the eight ports into four groups according to competitiveness level. Ranked by competitiveness level, the four clusters are cluster B (Hong Kong), cluster C (Shanghai), cluster A (Qingdao, Tianjin, and Yantian), and cluster D (Dalian, Shekou, and Xiamen).

Outlier detection of main engine data of a ship using ensemble method (앙상블 기법을 이용한 선박 메인엔진 빅데이터의 이상치 탐지)

  • KIM, Dong-Hyun;LEE, Ji-Hwan;LEE, Sang-Bong;JUNG, Bong-Kyu
    • Journal of the Korean Society of Fisheries and Ocean Technology / v.56 no.4 / pp.384-394 / 2020
  • This paper proposes a machine-learning-based outlier detection model that can diagnose the presence or absence of abnormalities in major engine parts through unsupervised learning analysis of a ship's main engine big data. Engine data were collected for more than seven months, and expert knowledge and correlation analysis were used to select features closely related to the operation of the main engine. For the unsupervised learning analysis, an ensemble model, in which many predictive models are strategically combined to increase model performance, is used for anomaly detection. As a result, the proposed model successfully distinguished anomalous engine status from normal status. To validate the approach, clustering analysis was conducted to identify the different patterns among the anomalous points. By examining the distribution of each cluster, we could successfully find the patterns of anomalies.
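
The ensemble principle can be illustrated with a small voting sketch. The abstract does not name the models the authors combined, so the three simple univariate detectors below (z-score, interquartile range, and median absolute deviation) are stand-ins chosen for illustration, and the thresholds and the sample readings are assumptions.

```python
import statistics


def zscore_flags(values, threshold=3.0):
    """Flag points more than `threshold` population standard deviations from the mean."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return [sd > 0 and abs(v - mean) / sd > threshold for v in values]


def iqr_flags(values, k=1.5):
    """Flag points outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _median, q3 = statistics.quantiles(values, n=4)
    low, high = q1 - k * (q3 - q1), q3 + k * (q3 - q1)
    return [v < low or v > high for v in values]


def mad_flags(values, threshold=3.5):
    """Flag points whose modified z-score (based on the MAD) exceeds `threshold`."""
    median = statistics.median(values)
    mad = statistics.median([abs(v - median) for v in values])
    return [mad > 0 and 0.6745 * abs(v - median) / mad > threshold for v in values]


def ensemble_outliers(values, min_votes=2):
    """Return indices flagged by at least `min_votes` of the three detectors."""
    votes = zip(zscore_flags(values), iqr_flags(values), mad_flags(values))
    return [index for index, vote in enumerate(votes) if sum(vote) >= min_votes]


readings = [10, 11, 9, 10, 12, 10, 11, 9, 10, 50]
print(ensemble_outliers(readings))
```

In this sample the extreme reading inflates the standard deviation enough that the z-score detector alone misses it, while the two robust detectors still flag it. Majority voting therefore recovers a point that a single detector would overlook, which is the motivation for combining models.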

Survival Analysis of Battalion-Level Commanders(leaders) Using Big Data as Results of Brigade-Level KCTC Training - Focused on Infantry Battalion Defensive Operations - (여단급 KCTC 훈련 결과 빅데이터를 활용한 대대급 이하 지휘관(자)의 생존분석 - 보병대대 방어작전을 중심으로 -)

  • Jinseong Yun;Hoseok Moon
    • Journal of the Korea Institute of Military Science and Technology / v.27 no.1 / pp.94-106 / 2024
  • In this study, we conducted a survival analysis of battalion-level commanders (leaders), focusing on infantry battalion defensive operations, using big data from brigade-level KCTC (Korea Combat Training Center) training results. Unlike previous studies, we used brigade-level KCTC training result data for survival analysis for the first time, and the research subjects were battalion-level commanders (leaders), who can affect the battle. Battle results were first defined; then, through cluster analysis, infantry battalions were divided into excellent, average, and insufficient units, and the difference in commander survival rates was analyzed using Kaplan-Meier survival analysis. This provided an opportunity to objectively compare excellent and insufficient units. Subsequently, factors affecting commander survival were derived using the Cox proportional hazards model, and the survival tree model was also used to confirm the influencing factors from various angles. The significance and limitations identified during the research are presented as policy suggestions and future research directions.
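
For readers unfamiliar with the Kaplan-Meier estimator mentioned above, the following sketch computes the product-limit estimate S(t) as the product over event times t_i <= t of (1 - d_i / n_i), where d_i is the number of events at t_i and n_i the number still at risk. The durations and event flags are made-up illustrative values, not data from the study.

```python
def kaplan_meier(durations, events):
    """Return [(time, survival_probability)] at each distinct event time.

    events[i] is True if the event was observed at durations[i] and
    False if the observation was censored at that time.
    """
    data = sorted(zip(durations, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        time = data[i][0]
        observed = 0
        leaving = 0
        # Group all subjects that share the same time.
        while i < len(data) and data[i][0] == time:
            observed += int(data[i][1])
            leaving += 1
            i += 1
        if observed:
            survival *= 1 - observed / at_risk
            curve.append((time, survival))
        # Both events and censored subjects leave the risk set afterwards.
        at_risk -= leaving
    return curve


durations = [1, 2, 2, 3, 4]
events = [True, True, False, True, False]
for time, probability in kaplan_meier(durations, events):
    print(time, round(probability, 3))
```

Censored subjects contribute to the at-risk count up to their censoring time but never trigger a drop in the curve, which is what lets the estimator use incomplete observations.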