• Title/Summary/Keyword: MapReduce

A Security Protection Framework for Cloud Computing

  • Zhu, Wenzheng;Lee, Changhoon
    • Journal of Information Processing Systems / v.12 no.3 / pp.538-547 / 2016
  • Cloud computing is a new style of computing in which dynamically scalable and reconfigurable resources are provided as a service over the internet. The MapReduce framework is currently the dominant programming model in cloud computing, so the integrity of MapReduce data processing services must be protected. Malicious workers, who can be divided into collusive and non-collusive workers, try to generate bad results in order to attack the cloud computing service. Detecting malicious workers efficiently has therefore become very important, as existing solutions are not effective enough in defeating malicious behavior. In this paper, we propose a security protection framework to detect malicious workers and ensure computation integrity in the map phase of MapReduce. Our simulation results show that the proposed framework can efficiently detect both collusive and non-collusive workers and guarantee high computation accuracy.

Sort-Based Distributed Parallel Data Cube Computation Algorithm using MapReduce (맵리듀스를 이용한 정렬 기반의 데이터 큐브 분산 병렬 계산 알고리즘)

  • Lee, Suan;Kim, Jinho
    • Journal of the Institute of Electronics and Information Engineers / v.49 no.9 / pp.196-204 / 2012
  • Recently, many applications perform OLAP (On-Line Analytical Processing) over very large volumes of data. The multidimensional data cube is regarded as a core tool in OLAP analysis. This paper focuses on how to efficiently compute data cubes in parallel using a popular parallel processing tool, MapReduce. We investigate efficient ways to implement the PipeSort algorithm, a well-known data cube computation method, on the MapReduce framework. PipeSort computes several (descendant) cuboids that share the same sort order at the same time, as a pipeline, by scanning one (ancestor) cuboid once. This paper proposes four ways of implementing the PipeSort pipeline on a MapReduce framework running across 20 servers. Our experiments show that the PipeMap-NoReduce algorithm outperforms the other algorithms for high-dimensional data, whereas Post-Pipe outperforms the others for low-dimensional data.
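
The one-scan pipelining idea in this abstract can be illustrated in plain Python, outside MapReduce. The sketch below is not one of the paper's four implementations: the function name `pipeline_cuboids`, the row layout, and the choice of SUM as the aggregate are assumptions made for illustration. It sorts the rows once and then fills every prefix cuboid (for dimensions A, B, C: cuboids ABC, AB, and A) in a single pass, flushing a group whenever its prefix changes.

```python
def pipeline_cuboids(rows, n_dims=3):
    """Aggregate every prefix cuboid in one scan over sorted rows.

    Each row is (dim_1, ..., dim_n, measure); the aggregate is SUM.
    Returns {prefix_length: [(group_key, total), ...]}.
    """
    rows = sorted(rows, key=lambda r: r[:n_dims])
    results = {k: [] for k in range(1, n_dims + 1)}
    current = {k: None for k in results}   # open group key per cuboid
    totals = {k: 0 for k in results}       # running SUM per cuboid
    for row in rows:
        dims, measure = tuple(row[:n_dims]), row[n_dims]
        for k in results:
            key = dims[:k]
            # The sort order guarantees a prefix never reappears once it
            # has changed, so the finished group can be flushed right away.
            if current[k] is not None and key != current[k]:
                results[k].append((current[k], totals[k]))
                totals[k] = 0
            current[k] = key
            totals[k] += measure
    for k in results:                      # flush the last open groups
        if current[k] is not None:
            results[k].append((current[k], totals[k]))
    return results
```

Because all three cuboids share the sort order (A, B, C), no cuboid needs its own scan or a hash table of open groups; only one running total per cuboid is held at a time.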

UX Analysis for Mobile Devices Using MapReduce on Distributed Data Processing Platform (MapReduce 분산 데이터처리 플랫폼에 기반한 모바일 디바이스 UX 분석)

  • Kim, Sungsook;Kim, Seonggyu
    • KIPS Transactions on Software and Data Engineering / v.2 no.9 / pp.589-594 / 2013
  • As web characteristics such as openness and sharing grow more and more popular, device log data generated by both users and developers has become increasingly complicated. For this reason, a log data processing mechanism that automatically produces meaningful data sets from large amounts of log records has become necessary for mobile device UX (User eXperience) analysis. In this paper, we define the attributes of the log data to be analyzed so that they reflect the characteristics of a mobile device, and we collect real log data from mobile device users. Using the MapReduce programming paradigm on the Hadoop platform, we performed a mobile device user experience analysis in a distributed processing environment on the collected log data. We then demonstrated the effectiveness of the proposed analysis mechanism by applying various combinations of Map and Reduce steps to produce a simple data schema from a large amount of complex log records.

User-based Collaborative Filtering Recommender Technique using MapReduce (맵리듀스를 이용한 사용자 기반 협업 필터링 추천 기법)

  • Yun, So-young;Youn, Sung-dae
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference / 2015.10a / pp.331-333 / 2015
  • Data is increasing explosively with the spread of networks and mobile devices, and existing recommendation techniques have difficulty processing this rapidly growing data effectively. Research is therefore being conducted on how to solve the scalability problem of the collaborative filtering technique. This paper applies MapReduce, a distributed parallel processing framework, to the collaborative filtering technique to reduce the scalability problem and improve accuracy. The proposed technique applies MapReduce and an index technique to user-based collaborative filtering. Because it improves the number of neighbors used in similarity calculations and the suitability of those neighbors, improvements in scalability and accuracy can be expected.
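
As a rough illustration of how user-to-user similarity fits the map/reduce pattern, the sketch below computes cosine similarity between users in plain Python. It is not the paper's technique (the index technique and neighbor selection are not shown); the names `map_item` and `user_similarities` and the input layout are assumptions.

```python
from collections import defaultdict
from itertools import combinations
from math import sqrt

def map_item(user_ratings):
    """Map step: for one item, emit a partial dot product per user pair."""
    for (u, ru), (v, rv) in combinations(sorted(user_ratings.items()), 2):
        yield (u, v), ru * rv

def user_similarities(ratings_by_item):
    """ratings_by_item: {item: {user: rating}} -> {(user_a, user_b): cosine}."""
    dots = defaultdict(float)    # reduce state: summed dot product per pair
    norms = defaultdict(float)   # reduce state: squared norm per user
    for user_ratings in ratings_by_item.values():
        for user, rating in user_ratings.items():
            norms[user] += rating * rating
        for pair, partial in map_item(user_ratings):
            dots[pair] += partial
    return {
        (u, v): dot / (sqrt(norms[u]) * sqrt(norms[v]))
        for (u, v), dot in dots.items()
    }
```

Keying the map output by user pair means each reducer only sums numbers for one pair, which is what lets the similarity computation spread across machines.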

Update Frequency Reducing Method of Spatio-Temporal Big Data based on MapReduce (MapReduce와 시공간 데이터를 이용한 빅 데이터 크기의 이동객체 갱신 횟수 감소 기법)

  • Choi, Youn-Gwon;Baek, Sung-Ha;Kim, Gyung-Bae;Bae, Hae-Young
    • Spatial Information Research / v.20 no.2 / pp.137-153 / 2012
  • Many indexing methods that reduce update cost have been proposed for managing massive numbers of moving objects, because indexes for moving objects must be updated periodically as the objects' locations change frequently. However, such indexing methods incur a heavy load that exceeds system capacity when the number of moving objects increases dramatically. In this paper, we propose an update frequency reduction method that combines MapReduce with existing indices. We use MapReduce to group the update requests for each moving object, decide whether to update by comparing the latest and the oldest data in each group, and reduce the update frequency by applying only the latest data. So that data is not lost while an update is delayed and is still updated periodically, we store the data for a certain period of time in a hash table that keeps previous update data. Our performance evaluation shows that the proposed method reduces the update frequency compared with methods that do not apply it.
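
The grouping step described here can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the paper's method: the request layout `(object_id, timestamp, position)` and the function names are hypothetical, and the comparison between oldest and latest data and the hash table for delayed updates are omitted.

```python
from collections import defaultdict

def map_update(request):
    """Map step: key each update request by its moving-object id."""
    obj_id, timestamp, position = request
    return obj_id, (timestamp, position)

def reduce_latest(obj_id, values):
    """Reduce step: keep only the most recent update for one object."""
    return obj_id, max(values, key=lambda v: v[0])

def group_updates(requests):
    """Collapse a batch of update requests to one index update per object."""
    groups = defaultdict(list)
    for request in requests:
        key, value = map_update(request)
        groups[key].append(value)
    return dict(reduce_latest(k, vs) for k, vs in groups.items())
```

With this grouping, a batch containing three position reports for one vehicle results in a single index update instead of three.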

Sequential Pattern Mining with Optimization Calling MapReduce Function on MapReduce Framework (맵리듀스 프레임웍 상에서 맵리듀스 함수 호출을 최적화하는 순차 패턴 마이닝 기법)

  • Kim, Jin-Hyun;Shim, Kyu-Seok
    • The KIPS Transactions:PartD / v.18D no.2 / pp.81-88 / 2011
  • Sequential pattern mining, which determines frequent patterns appearing in a given set of sequences, is an important data mining problem with broad applications. For example, it can find web access patterns, customers' purchase patterns, and DNA sequences related to specific diseases. In this paper, we develop sequential pattern mining algorithms using the MapReduce framework. Our algorithms distribute input data to several machines and find frequent sequential patterns in parallel. With synthetic data sets, we conducted a comprehensive performance study while varying several parameters. Our experimental results show that our algorithms achieve linear speedup as the number of machines increases.
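
To make the idea of counting pattern support in parallel concrete, here is a minimal Python sketch restricted to length-2 patterns. It is not the paper's algorithm and does not show its optimization of MapReduce function calls; `map_sequence`, `frequent_pairs`, and the partition layout are assumptions for illustration.

```python
from collections import Counter
from itertools import combinations

def map_sequence(seq):
    """Map step: distinct ordered pairs (a, b) where a occurs before b.

    Returning a set makes each sequence count once per pattern, which
    matches the usual definition of support.
    """
    return {(seq[i], seq[j]) for i, j in combinations(range(len(seq)), 2)}

def frequent_pairs(partitions, min_support):
    """Reduce step: sum pattern counts across partitions, keep frequent ones."""
    counts = Counter()
    for partition in partitions:   # each partition would run on its own machine
        for seq in partition:
            counts.update(map_sequence(seq))
    return {p: c for p, c in counts.items() if c >= min_support}
```

Since each sequence is mapped independently, the partitions can be processed on separate machines and only the per-pattern counts need to be merged.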

Decombined Distributed Parallel VQ Codebook Generation Based on MapReduce (맵리듀스를 사용한 디컴바인드 분산 VQ 코드북 생성 방법)

  • Lee, Hyunjin
    • Journal of Digital Contents Society / v.15 no.3 / pp.365-371 / 2014
  • In the era of big data, algorithms designed for the existing IT environment cannot be applied to a distributed architecture such as Hadoop. Thus, new distributed algorithms that use a distributed framework such as MapReduce are needed. Lloyd's algorithm, commonly used for vector quantization, has recently been implemented using MapReduce. In this paper, we propose a decombined distributed VQ codebook generation algorithm, based on a distributed VQ codebook generation algorithm using MapReduce, to obtain results faster. Applying the proposed algorithm to big data showed higher performance than the conventional method.
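
For orientation, one iteration of Lloyd's algorithm maps naturally onto map and reduce steps, as the sketch below shows in plain Python. It covers only the baseline iteration, not the decombined variant the paper proposes; the function names and the rule of leaving an unused codeword unchanged are assumptions.

```python
from collections import defaultdict

def nearest(point, codebook):
    """Index of the codeword closest to point (squared Euclidean distance)."""
    return min(
        range(len(codebook)),
        key=lambda i: sum((p - c) ** 2 for p, c in zip(point, codebook[i])),
    )

def lloyd_iteration(points, codebook):
    """One Lloyd update: assign points to codewords, then average each group."""
    groups = defaultdict(list)
    for point in points:                    # map: key = nearest codeword index
        groups[nearest(point, codebook)].append(point)
    new_codebook = []
    for i, old in enumerate(codebook):      # reduce: centroid of each group
        members = groups.get(i)
        if members:
            centroid = tuple(sum(xs) / len(members) for xs in zip(*members))
            new_codebook.append(centroid)
        else:
            new_codebook.append(old)        # assumption: keep unused codewords
    return new_codebook
```

Repeating `lloyd_iteration` until the codebook stops changing gives the usual codebook training loop; in a MapReduce setting each repetition would be one job.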

Task failure resilience technique for improving the performance of MapReduce in Hadoop

  • Kavitha, C;Anita, X
    • ETRI Journal / v.42 no.5 / pp.748-760 / 2020
  • MapReduce is a framework that can process huge datasets in parallel and distributed computing environments. However, a single machine failure during the runtime of MapReduce tasks can increase completion time by 50%. MapReduce handles task failures by restarting the failed task and re-computing all input data from scratch, regardless of how much data had already been processed. To solve this issue, we need the computed key-value pairs to persist in a storage system to avoid re-computing them during the restarting process. In this paper, the task failure resilience (TFR) technique is proposed, which allows the execution of a failed task to continue from the point it was interrupted without having to redo all the work. Amazon ElastiCache for Redis is used as a non-volatile cache for the key-value pairs. We measured the performance of TFR by running different Hadoop benchmarking suites. TFR was implemented using the Hadoop software framework, and the experimental results showed significant performance improvements when compared with the performance of the default Hadoop implementation.
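
The resume-from-interruption idea can be sketched with an in-memory dictionary standing in for the persistent cache. This is a toy illustration, not the TFR implementation: the paper uses Amazon ElastiCache for Redis inside Hadoop, while `run_map_task`, the `store` layout, and the `fail_at` parameter here are hypothetical.

```python
def run_map_task(task_id, records, map_fn, store, fail_at=None):
    """Run a map task, checkpointing its output so a restart can resume.

    store stands in for a persistent key-value cache and must outlive the
    task attempt. fail_at simulates a crash just before that record index.
    """
    state = store.setdefault(task_id, {"next": 0, "output": []})
    for i in range(state["next"], len(records)):
        if fail_at is not None and i == fail_at:
            raise RuntimeError("simulated task failure")
        state["output"].extend(map_fn(records[i]))
        state["next"] = i + 1   # checkpoint: records before this index are done
    return state["output"]
```

A second call with the same `task_id` and `store` skips the records already checkpointed, so only the unprocessed tail of the input is recomputed after a failure.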

Naive Bayes Learning Algorithm based on Map-Reduce Programming Model (Map-Reduce 프로그래밍 모델 기반의 나이브 베이스 학습 알고리즘)

  • Kang, Dae-Ki
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference / 2011.10a / pp.208-209 / 2011
  • In this paper, we introduce a Naive Bayes learning algorithm for learning and reasoning in a Map-Reduce model based environment. For this purpose, we use Apache Mahout to execute Distributed Naive Bayes on University of California, Irvine (UCI) benchmark data sets. From the experimental results, we see that Apache Mahout's Distributed Naive Bayes algorithm is comparable to WEKA's Naive Bayes algorithm in terms of performance. These results indicate that, in the future Big Data environment, Map-Reduce model based systems such as Apache Mahout can be promising for machine learning.
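
Naive Bayes training reduces to counting, which is why it suits the Map-Reduce model. The sketch below shows that counting structure in plain Python; it does not use or reproduce Apache Mahout's implementation, and the function names, the token-list input, and the add-one (Laplace) smoothing are assumptions.

```python
from collections import Counter
from math import log

def train(docs):
    """docs: iterable of (label, tokens). Returns summed count tables."""
    label_counts, token_counts = Counter(), Counter()
    for label, tokens in docs:
        label_counts[label] += 1                          # map emits (label, 1)
        token_counts.update((label, t) for t in tokens)   # ((label, token), 1)
    return label_counts, token_counts                     # reduce = per-key sums

def classify(tokens, label_counts, token_counts):
    """Pick the label with the highest log posterior, with add-one smoothing."""
    vocab = {t for _, t in token_counts}
    total_docs = sum(label_counts.values())

    def score(label):
        n_tokens = sum(c for (l, _), c in token_counts.items() if l == label)
        s = log(label_counts[label] / total_docs)
        for t in tokens:
            s += log((token_counts[(label, t)] + 1) / (n_tokens + len(vocab)))
        return s

    return max(label_counts, key=score)
```

Because both count tables are plain sums over independent records, partial counters built on separate machines can be merged by addition.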

Efficient Processing of Multiple Group-by Queries in MapReduce for Big Data Analysis (맵리듀스에서 빅데이터 분석을 위한 다중 Group-by 질의의 효율적인 처리 기법)

  • Park, Eunju;Park, Sojeong;Oh, Sohyun;Choi, Hyejin;Lee, Ki Yong;Shim, Junho
    • KIISE Transactions on Computing Practices / v.21 no.5 / pp.387-392 / 2015
  • MapReduce is a framework used to process large data sets in parallel on a large cluster. A group-by query is a query that partitions the input data into groups based on the values of the specified attributes, and then evaluates the value of the specified aggregate function for each group. In this paper, we propose an efficient method for processing multiple group-by queries using MapReduce. Instead of computing each group-by query independently, the proposed method computes multiple group-by queries in stages with one or more MapReduce jobs in order to reduce the total execution cost. We compared the performance of this method with the performance of a less sophisticated method that computes each group-by query independently. This comparison showed that the proposed method offers better performance in terms of execution time.
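
One way staged computation can cut cost is to aggregate the raw data once at a fine granularity and answer several group-by queries from that smaller result. The Python sketch below illustrates this under assumptions: it is not the paper's method, it presumes a distributive aggregate (SUM), and the column names `city`, `product`, and `amount` are hypothetical.

```python
from collections import defaultdict

def group_sum(rows, key_fn, value_fn):
    """Generic group-by with SUM, standing in for one MapReduce job."""
    totals = defaultdict(int)
    for row in rows:
        totals[key_fn(row)] += value_fn(row)
    return dict(totals)

def staged_group_bys(rows):
    """Answer GROUP BY city and GROUP BY product with one scan of the raw rows."""
    # Stage 1: a single pass over the raw data, grouped by (city, product).
    fine = group_sum(rows, lambda r: (r["city"], r["product"]), lambda r: r["amount"])
    # Stage 2: both queries read the (usually much smaller) stage-1 output.
    by_city = group_sum(fine.items(), lambda kv: kv[0][0], lambda kv: kv[1])
    by_product = group_sum(fine.items(), lambda kv: kv[0][1], lambda kv: kv[1])
    return by_city, by_product
```

Computing each query independently would scan the raw rows twice; here the second stage touches only one record per distinct (city, product) pair.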