• Title/Summary/Keyword: 하둡 환경

Search Result 95, Processing Time 0.029 seconds

An Efficient Data Replacement Algorithm for Performance Optimization of MapReduce in Non-dedicated Distributed Computing Environments (비-전용 분산 컴퓨팅 환경에서 맵-리듀스 처리 성능 최적화를 위한 효율적인 데이터 재배치 알고리즘)

  • Ryu, Eunkyung;Son, Ingook;Park, Junho;Bok, Kyoungsoo;Yoo, Jaesoo
    • The Journal of the Korea Contents Association
    • /
    • v.13 no.9
    • /
    • pp.20-27
    • /
    • 2013
  • In recently years, with the growth of social media and the development of mobile devices, the data have been significantly increased. MapReduce is an emerging programming model that processes large amount of data. However, since MapReduce evenly places the data in the dedicated distributed computing environment, it is not suitable to the non-dedicated distributed computing environment. The data replacement algorithms were proposed for performance optimization of MapReduce in the non-dedicated distributed computing environments. However, they spend much time for date replacement and cause the network load for unnecessary data transmission. In this paper, we propose an efficient data replacement algorithm for the performance optimization of MapReduce in the non-dedicated distributed computing environments. The proposed scheme computes the ratio of data blocks in the nodes based on the node availability model and reduces the network load by transmitting the data blocks considering the data placement. Our experimental results show that the proposed scheme outperforms the existing scheme.

An Efficient Data Transmission to Cloud Storage using USB Hijacking (USB 하이재킹을 이용한 클라우드 스토리지로의 효율적인 데이터 전송 기법)

  • Eom, Hyun-Chul;No, Jae-Chun
    • Journal of the Institute of Electronics Engineers of Korea CI
    • /
    • v.48 no.6
    • /
    • pp.47-55
    • /
    • 2011
  • The performance of data transmission from mobile devices to cloud storages is limited by the amount of data being transferred, communication speed and battery consumption of mobile devices. Especially, when the large-scale data communication takes place using mobile devices, such as smart phones, the performance turbulence and power consumption become an obstacle to establish the reliable communication environment. In this paper, we present an efficient data transmission method using USB Hijacking. In our approach, the synchronization to transfer a large amount of data between mobile devices and user PC is executed by using USB Hijacking. Also, there is no need to concern about data capacity and battery consumption in the data communication. We presented several experimental results to verify the effectiveness and suitability of our approach.

Decombined Distributed Parallel VQ Codebook Generation Based on MapReduce (맵리듀스를 사용한 디컴바인드 분산 VQ 코드북 생성 방법)

  • Lee, Hyunjin
    • Journal of Digital Contents Society
    • /
    • v.15 no.3
    • /
    • pp.365-371
    • /
    • 2014
  • In the era of big data, algorithms for the existing IT environment cannot accept on a distributed architecture such as hadoop. Thus, new distributed algorithms which apply a distributed framework such as MapReduce are needed. Lloyd's algorithm commonly used for vector quantization is developed using MapReduce recently. In this paper, we proposed a decombined distributed VQ codebook generation algorithm based on a distributed VQ codebook generation algorithm using MapReduce to get a result more fast. The result of applying the proposed algorithm to big data showed higher performance than the conventional method.

The Distributed Encryption Processing System for Large Capacity Personal Information based on MapReduce (맵리듀스 기반 대용량 개인정보 분산 암호화 처리 시스템)

  • Kim, Hyun-Wook;Park, Sung-Eun;Euh, Seong-Yul
    • Journal of the Korea Institute of Information and Communication Engineering
    • /
    • v.18 no.3
    • /
    • pp.576-585
    • /
    • 2014
  • Collecting and utilizing have a huge amount of personal data have caused severe security issues such as leakage of personal information. Several encryption algorithms for collected personal information have been widely adopted to prevent such problems. In this paper, a novel algorithm based on MapReduce is proposed for encrypting such private information. Furthermore, test environment has been built for the performance verification of the distributed encryption processing method. As the result of the test, average time efficiency has improved to 15.3% compare to encryption processing of token server and 3.13% compare to parallel processing.

Design and Implementation of Hadoop-based Big-data processing Platform for IoT Environment (사물인터넷 환경을 위한 하둡 기반 빅데이터 처리 플랫폼 설계 및 구현)

  • Heo, Seok-Yeol;Lee, Ho-Young;Lee, Wan-Jik
    • Journal of Korea Multimedia Society
    • /
    • v.22 no.2
    • /
    • pp.194-202
    • /
    • 2019
  • In the information society represented by the Fourth Industrial Revolution, various types of data and information that are difficult to see are produced, processed, and processed and circulated to enhance the value of existing goods. The IoT(Internet of Things) paradigm will change the appearance of individual life, industry, disaster, safety and public service fields. In order to implement the IoT paradigm, several elements of technology are required. It is necessary that these various elements are efficiently connected to constitute one system as a whole. It is also necessary to collect, provide, transmit, store and analyze IoT data for implementation of IoT platform. We designed and implemented a big data processing IoT platform for IoT service implementation. Proposed platform system is consist of IoT sensing/control device, IoT message protocol, unstructured data server and big data analysis components. For platform testing, fixed IoT devices were implemented as solar power generation modules and mobile IoT devices as modules for table tennis stroke data measurement. The transmission part uses the HTTP and the CoAP, which are based on the Internet. The data server is composed of Hadoop and the big data is analyzed using R. Through the emprical test using fixed and mobile IoT devices we confirmed that proposed IoT platform system normally process and operate big data.

Anomalous Pattern Analysis of Large-Scale Logs with Spark Cluster Environment

  • Sion Min;Youyang Kim;Byungchul Tak
    • Journal of the Korea Society of Computer and Information
    • /
    • v.29 no.3
    • /
    • pp.127-136
    • /
    • 2024
  • This study explores the correlation between system anomalies and large-scale logs within the Spark cluster environment. While research on anomaly detection using logs is growing, there remains a limitation in adequately leveraging logs from various components of the cluster and considering the relationship between anomalies and the system. Therefore, this paper analyzes the distribution of normal and abnormal logs and explores the potential for anomaly detection based on the occurrence of log templates. By employing Hadoop and Spark, normal and abnormal log data are generated, and through t-SNE and K-means clustering, templates of abnormal logs in anomalous situations are identified to comprehend anomalies. Ultimately, unique log templates occurring only during abnormal situations are identified, thereby presenting the potential for anomaly detection.

Parallel k-Modes Algorithm for Spark Framework (스파크 프레임워크를 위한 병렬적 k-Modes 알고리즘)

  • Chung, Jaehwa
    • KIPS Transactions on Software and Data Engineering
    • /
    • v.6 no.10
    • /
    • pp.487-492
    • /
    • 2017
  • Clustering is a technique which is used to measure similarities between data in big data analysis and data mining field. Among various clustering methods, k-Modes algorithm is representatively used for categorical data. To increase the performance of iterative-centric tasks such as k-Modes, a distributed and concurrent framework Spark has been received great attention recently because it overcomes the limitation of Hadoop. Spark provides an environment that can process large amount of data in main memory using the concept of abstract objects called RDD. Spark provides Mllib, a dedicated library for machine learning, but Mllib only includes k-means that can process only continuous data, so there is a limitation that categorical data processing is impossible. In this paper, we design RDD for k-Modes algorithm for categorical data clustering in spark environment and implement an algorithm that can operate effectively. Experiments show that the proposed algorithm increases linearly in the spark environment.

Service Delivery Time Improvement using HDFS in Desktop Virtualization (데스크탑 가상화에서 HDFS를 이용한 서비스 제공시간 개선 연구)

  • Lee, Wan-Hee;Lee, Bong-Hwan
    • Journal of the Korea Institute of Information and Communication Engineering
    • /
    • v.16 no.5
    • /
    • pp.913-921
    • /
    • 2012
  • The current PC-based desktop environment is being converted into server-based virtual desktop environment due to security, mobility, and low upgrade cost. In this paper, a desktop virtualization system is implemented using an open source-based cloud computing platform and hypervisor. The implemented system is applied to the virtualziation of computer in university. In order to reduce the image transfer time, we propose a solution using HDFS. In addition, an image management structure needed for desktop virtualization is designed and implemented, and applied to a real computer lab which accommodates 30 PCs. The performance of the proposed system is evaluated in various aspects including implementation cost, power saving rate, reduction rate of license cost, and management cost. The experimental results showed that the proposed system considerably reduced the image transfer time for desktop service.

Scalable RDFS Reasoning using Logic Programming Approach in a Single Machine (단일머신 환경에서의 논리적 프로그래밍 방식 기반 대용량 RDFS 추론 기법)

  • Jagvaral, Batselem;Kim, Jemin;Lee, Wan-Gon;Park, Young-Tack
    • Journal of KIISE
    • /
    • v.41 no.10
    • /
    • pp.762-773
    • /
    • 2014
  • As the web of data is increasingly producing large RDFS datasets, it becomes essential in building scalable reasoning engines over large triples. There have been many researches used expensive distributed framework, such as Hadoop, to reason over large RDFS triples. However, in many cases we are required to handle millions of triples. In such cases, it is not necessary to deploy expensive distributed systems because logic program based reasoners in a single machine can produce similar reasoning performances with that of distributed reasoner using Hadoop. In this paper, we propose a scalable RDFS reasoner using logical programming methods in a single machine and compare our empirical results with that of distributed systems. We show that our logic programming based reasoner using a single machine performs as similar as expensive distributed reasoner does up to 200 million RDFS triples. In addition, we designed a meta data structure by decomposing the ontology triples into separate sectors. Instead of loading all the triples into a single model, we selected an appropriate subset of the triples for each ontology reasoning rule. Unification makes it easy to handle conjunctive queries for RDFS schema reasoning, therefore, we have designed and implemented RDFS axioms using logic programming unifications and efficient conjunctive query handling mechanisms. The throughputs of our approach reached to 166K Triples/sec over LUBM1500 with 200 million triples. It is comparable to that of WebPIE, distributed reasoner using Hadoop and Map Reduce, which performs 185K Triples/sec. We show that it is unnecessary to use the distributed system up to 200 million triples and the performance of logic programming based reasoner in a single machine becomes comparable with that of expensive distributed reasoner which employs Hadoop framework.

Development of Information Technology Infrastructures through Construction of Big Data Platform for Road Driving Environment Analysis (도로 주행환경 분석을 위한 빅데이터 플랫폼 구축 정보기술 인프라 개발)

  • Jung, In-taek;Chong, Kyu-soo
    • Journal of the Korea Academia-Industrial cooperation Society
    • /
    • v.19 no.3
    • /
    • pp.669-678
    • /
    • 2018
  • This study developed information technology infrastructures for building a driving environment analysis platform using various big data, such as vehicle sensing data, public data, etc. First, a small platform server with a parallel structure for big data distribution processing was developed with H/W technology. Next, programs for big data collection/storage, processing/analysis, and information visualization were developed with S/W technology. The collection S/W was developed as a collection interface using Kafka, Flume, and Sqoop. The storage S/W was developed to be divided into a Hadoop distributed file system and Cassandra DB according to the utilization of data. Processing S/W was developed for spatial unit matching and time interval interpolation/aggregation of the collected data by applying the grid index method. An analysis S/W was developed as an analytical tool based on the Zeppelin notebook for the application and evaluation of a development algorithm. Finally, Information Visualization S/W was developed as a Web GIS engine program for providing various driving environment information and visualization. As a result of the performance evaluation, the number of executors, the optimal memory capacity, and number of cores for the development server were derived, and the computation performance was superior to that of the other cloud computing.