• Title/Summary/Keyword: MapReduce Framework

An Efficient Implementation of Mobile Raspberry Pi Hadoop Clusters for Robust and Augmented Computing Performance

  • Srinivasan, Kathiravan; Chang, Chuan-Yu; Huang, Chao-Hsi; Chang, Min-Hao; Sharma, Anant; Ankur, Avinash
    • Journal of Information Processing Systems / v.14 no.4 / pp.989-1009 / 2018
  • Rapid advances in science and technology, along with the exponential development of smart mobile devices, workstations, supercomputers, smart gadgets, and network servers, have been witnessed over the past few years. The sudden increase in the Internet population and the manifold growth in Internet speeds have occasioned the generation of an enormous amount of data, now termed 'big data'. Given this scenario, storing data on local servers or a personal computer is an issue, which can be resolved by utilizing cloud computing. At present, several cloud computing service providers are available to address big data issues. This paper establishes a framework that builds Hadoop clusters on the new single-board computer (SBC) Mobile Raspberry Pi. These clusters offer facilities for storage as well as computing. Regular data centers require large amounts of energy for operation, need cooling equipment, and occupy prime real estate. These energy and physical space constraints can be addressed by employing Mobile Raspberry Pi Hadoop clusters, which provide a cost-effective, low-power, high-speed solution along with micro-data-center support for big data. Hadoop provides the required modules for the distributed processing of big data by deploying the MapReduce programming approach. In this work, the performance of SBC clusters and a single computer were compared. The experimental data show that the SBC clusters outperform a single computer by around 20%. Furthermore, the cluster processing speed for large volumes of data can be increased by adding SBC nodes. Data storage is accomplished using the Hadoop Distributed File System (HDFS), which offers more flexibility and greater scalability than a single computer system.
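The benchmark workload run on the clusters is not specified in the abstract; as a minimal sketch of the MapReduce programming approach it names, the classic Hadoop word count below uses the standard org.apache.hadoop.mapreduce API (class names and paths are ours, not the paper's):

```java
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {
  public static class TokenMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
    private static final IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();
    @Override
    protected void map(LongWritable key, Text value, Context ctx)
        throws IOException, InterruptedException {
      for (String tok : value.toString().split("\\s+")) {
        if (!tok.isEmpty()) { word.set(tok); ctx.write(word, ONE); }
      }
    }
  }

  public static class SumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    @Override
    protected void reduce(Text key, Iterable<IntWritable> vals, Context ctx)
        throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable v : vals) sum += v.get();
      ctx.write(key, new IntWritable(sum));
    }
  }

  public static void main(String[] args) throws Exception {
    Job job = Job.getInstance(new Configuration(), "word count");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenMapper.class);
    job.setCombinerClass(SumReducer.class);   // combine locally to cut shuffle traffic
    job.setReducerClass(SumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));   // HDFS input path
    FileOutputFormat.setOutputPath(job, new Path(args[1])); // HDFS output path
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
```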

External Merge Sorting in Tajo with Variable Server Configuration (매개변수 환경설정에 따른 타조의 외부합병정렬 성능 연구)

  • Lee, Jongbaeg; Kang, Woon-hak; Lee, Sang-won
    • Journal of KIISE / v.43 no.7 / pp.820-826 / 2016
  • There is a growing requirement for big data processing that extracts valuable information from large amounts of data. The Hadoop system employs the MapReduce framework to process big data. However, MapReduce has limitations such as inflexible and slow data processing. To overcome these drawbacks, SQL query processing techniques known as SQL-on-Hadoop were developed. Apache Tajo, one of the SQL-on-Hadoop systems, was developed by a Korean development group. External merge sort is one of the most heavily used algorithms in Tajo for query processing. The performance of external merge sort in Tajo is influenced by two parameters: sort buffer size and fanout. In this paper, we analyze the performance of external merge sort in Tajo with various sort buffer sizes and fanouts. In addition, we identify two major causes of the differences in external merge sort performance: CPU cache misses, which increase as the sort buffer size grows, and the number of merge passes, which is determined by the fanout.
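Tajo's actual sort code is not reproduced in the abstract; the generic external merge sort sketch below only illustrates how the two parameters interact: the sort buffer bounds the size of each in-memory run, and the fanout bounds how many runs one merge pass combines, so the number of passes grows roughly as ceil(log_fanout(#runs)). For example, sorting 64 GB with a 256 MB buffer yields 256 runs, and a fanout of 16 finishes in ceil(log16 256) = 2 passes.

```java
import java.util.*;

// Generic external merge sort sketch (illustrative, not Tajo's implementation).
public class ExternalMergeSort {

  // Phase 1: run generation, bounded by the sort buffer size.
  static List<List<Integer>> makeRuns(List<Integer> input, int sortBufferSize) {
    List<List<Integer>> runs = new ArrayList<>();
    for (int i = 0; i < input.size(); i += sortBufferSize) {
      List<Integer> run =
          new ArrayList<>(input.subList(i, Math.min(i + sortBufferSize, input.size())));
      Collections.sort(run);    // in-memory sort of one buffer-sized run
      runs.add(run);
    }
    return runs;
  }

  // One k-way merge of the given runs, via a min-heap of (runIdx, pos) cursors.
  static List<Integer> mergeOnce(List<List<Integer>> runs) {
    PriorityQueue<int[]> heap =
        new PriorityQueue<>(Comparator.comparingInt((int[] e) -> runs.get(e[0]).get(e[1])));
    for (int r = 0; r < runs.size(); r++)
      if (!runs.get(r).isEmpty()) heap.add(new int[] {r, 0});
    List<Integer> out = new ArrayList<>();
    while (!heap.isEmpty()) {
      int[] e = heap.poll();
      out.add(runs.get(e[0]).get(e[1]));
      if (e[1] + 1 < runs.get(e[0]).size()) heap.add(new int[] {e[0], e[1] + 1});
    }
    return out;
  }

  // Phase 2: repeated fanout-way merges until one sorted run remains (fanout >= 2).
  static List<Integer> sort(List<Integer> input, int sortBufferSize, int fanout) {
    List<List<Integer>> runs = makeRuns(input, sortBufferSize);
    while (runs.size() > 1) {                      // each loop body = one merge pass
      List<List<Integer>> next = new ArrayList<>();
      for (int i = 0; i < runs.size(); i += fanout)
        next.add(mergeOnce(runs.subList(i, Math.min(i + fanout, runs.size()))));
      runs = next;
    }
    return runs.isEmpty() ? new ArrayList<>() : runs.get(0);
  }
}
```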

Development of Application to Deal with Large Data Using Hadoop for 3D Printer (하둡을 이용한 3D 프린터용 대용량 데이터 처리 응용 개발)

  • Lee, Kang Eun; Kim, Sungsuk
    • KIPS Transactions on Software and Data Engineering / v.9 no.1 / pp.11-16 / 2020
  • 3D printing is an emerging technology that is attracting considerable attention. To print, a 3D model is first generated and then converted to G-code, the 3D printer's operation commands. A facet, a small triangle, represents a small surface of the 3D model. Depending on the height or precision of the 3D model, the number of facets becomes very large, and the conversion from 3D model to G-code takes correspondingly longer. Apache Hadoop is a software framework that supports distributed processing of large data sets, and its range of applications keeps widening. In this paper, Hadoop is used to make the conversion time-efficient. A two-phase distributed algorithm is developed first. In the algorithm, all facets are sorted according to their lowest Z-values, divided into N parts, and converted on several nodes independently. The algorithm is implemented in four steps: preprocessing, Map, Shuffling, and Reduce in Hadoop (see the sketch below). Finally, for performance evaluation, Hadoop systems were set up and test 3D models were converted while varying the height or precision.
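A minimal sketch of the Map step only, under assumptions of ours (input layout, band height, and field names are not from the paper): each facet is keyed by the index of the Z-band containing its lowest Z-value, so Hadoop's shuffle delivers one height band per reducer, which can then generate G-code for its band independently:

```java
import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class FacetBandMapper extends Mapper<LongWritable, Text, IntWritable, Text> {
  private static final double MODEL_MIN_Z = 0.0;  // assumed, known from preprocessing
  private static final double BAND_HEIGHT = 5.0;  // assumed: model height / N parts
  private final IntWritable band = new IntWritable();

  @Override
  protected void map(LongWritable offset, Text facetLine, Context ctx)
      throws IOException, InterruptedException {
    // Assume preprocessing flattened each STL facet to "z1 z2 z3 ..." per line.
    String[] f = facetLine.toString().trim().split("\\s+");
    double lowestZ = Math.min(Double.parseDouble(f[0]),
                     Math.min(Double.parseDouble(f[1]), Double.parseDouble(f[2])));
    band.set((int) ((lowestZ - MODEL_MIN_Z) / BAND_HEIGHT));
    ctx.write(band, facetLine);  // shuffle groups and sorts facets by band index
  }
}
```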

MissingFound: An Assistant System for Finding Missing Companions via Mobile Crowdsourcing

  • Liu, Weiqing; Li, Jing; Zhou, Zhiqiang; He, Jiling
    • KSII Transactions on Internet and Information Systems (TIIS) / v.10 no.10 / pp.4766-4786 / 2016
  • Looking for missing companions who are out of touch in public places can be a long and painful process. With the help of mobile crowdsourcing, the missing person's location may be reported in a short time. In this paper, we propose MissingFound, an assistant system that applies mobile crowdsourcing to finding missing companions. Discovering valuable users who may have seen the missing person is the most important task of MissingFound, but also a big challenge given the requirements of saving battery and protecting users' location privacy. A customized metric is designed to measure the probability of seeing, according to users' movement traces represented by WiFi RSSI fingerprints. Since WiFi RSSI fingerprints provide no knowledge of users' physical locations, the computation of this probability is too complex for practical use. By parallelizing the original sequential algorithms under the MapReduce framework, the selection process can be accomplished within a few minutes for 10 thousand users with records spanning several days. Experimental evaluation with 23 volunteers shows that MissingFound can select the potential witnesses in reality and achieves high accuracy (76.75% on average). We believe that MissingFound can help not only in finding missing companions but also in other public services (e.g., controlling communicable diseases).
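The paper's probability-of-seeing metric is not given in the abstract; the sketch below is a deliberately simplified co-occurrence stand-in (the input layout is an assumption of ours): RSSI records are keyed by (strongest AP, time slot), and users sharing a key become candidate witness pairs for the reducer to emit:

```java
import java.io.IOException;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

public class CoLocation {
  public static class RecordMapper extends Mapper<Object, Text, Text, Text> {
    @Override
    protected void map(Object k, Text line, Context ctx)
        throws IOException, InterruptedException {
      // Assumed input layout: userId,timeSlot,strongestApBssid
      String[] f = line.toString().split(",");
      ctx.write(new Text(f[2] + "#" + f[1]), new Text(f[0]));  // key = AP#slot
    }
  }

  public static class WitnessReducer extends Reducer<Text, Text, Text, Text> {
    @Override
    protected void reduce(Text apSlot, Iterable<Text> users, Context ctx)
        throws IOException, InterruptedException {
      java.util.List<String> seen = new java.util.ArrayList<>();
      for (Text u : users) seen.add(u.toString());
      for (String a : seen)               // every co-present pair becomes a
        for (String b : seen)             // candidate witness relation
          if (!a.equals(b)) ctx.write(new Text(a), new Text(b));
    }
  }
}
```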

Robust Speech Recognition using Noise Compensation Method Based on Eigen-Environment (Eigen-Environment 잡음 보상 방법을 이용한 강인한 음성인식)

  • Song, Hwa Jeon; Kim, Hyung Soon
    • MALSORI / no.52 / pp.145-160 / 2004
  • In this paper, a new noise compensation method based on the eigenvoice framework in feature space is proposed to reduce the mismatch between training and testing environments. The difference between clean and noisy environments is represented by a linear combination of K eigenvectors that capture the variation among environments. In the proposed method, the performance improvement of speech recognition systems depends largely on how the noisy models and the bias vector set are constructed. Two methods are proposed to construct the noisy models: one based on MAP adaptation and the other using a stereo DB. In experiments on the Aurora 2 DB, we obtained a 44.86% relative improvement with the eigen-environment method compared with the baseline system. In particular, in the clean-condition training mode, the proposed method yielded a 66.74% relative improvement, better than several methods previously proposed in the Aurora project.
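In the usual eigenvoice-style formulation (our notation; the abstract gives no equations), the environment bias is a mean bias plus a weighted sum of the K eigenvectors, and compensation subtracts the estimated bias from the noisy feature:

```latex
\hat{\mathbf{b}} = \bar{\mathbf{b}} + \sum_{k=1}^{K} w_k\, \mathbf{e}_k,
\qquad
\hat{\mathbf{x}}_{\mathrm{clean}} = \mathbf{x}_{\mathrm{noisy}} - \hat{\mathbf{b}}
```

where \(\bar{\mathbf{b}}\) is the mean bias vector over training environments, \(\mathbf{e}_1,\dots,\mathbf{e}_K\) are the leading eigenvectors of the environment bias set, and the weights \(w_k\) are estimated from the test data.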

Stereo Matching Algorithm Based on Line Constraint and Reliability Space (신뢰도 공간과 선형 제어를 통한 스테레오 정합 기법)

  • An, Xiao-Wei; Han, Young-Joon; Hahn, Hern-Soo
    • Proceedings of the Korean Society of Computer Information Conference / 2011.01a / pp.59-62 / 2011
  • A new method is proposed for stereo vision in which the disparity map is computed using a line constraint and a reliability space. The first provides a progressive framework for stereo matching that applies local pixel values from corresponding lines in the left and right image pairs. The second records disparities in a reliability space built on corresponding points, to which a median filter is then applied to reduce the noise arising in the process. A coarse-to-fine result is produced after the median filtering, which qualitatively improves the final result. The method is evaluated on rectified stereo image pairs from the Middlebury datasets, and the two adopted strategies yield good quantitative matching results at a fast running speed.
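As a small sketch of the noise-reduction step named in the abstract, the plain 3x3 median filter below smooths an integer disparity map (the paper's exact window size and data types are not specified; both are assumptions here):

```java
public class DisparityMedianFilter {
  // Returns a copy of the disparity map with each pixel replaced by the
  // median of its 3x3 neighborhood; borders are handled by clamping.
  static int[][] medianFilter3x3(int[][] disp) {
    int h = disp.length, w = disp[0].length;
    int[][] out = new int[h][w];
    for (int y = 0; y < h; y++)
      for (int x = 0; x < w; x++) {
        int[] win = new int[9];
        int n = 0;
        for (int dy = -1; dy <= 1; dy++)
          for (int dx = -1; dx <= 1; dx++) {
            int yy = Math.min(Math.max(y + dy, 0), h - 1);  // clamp at borders
            int xx = Math.min(Math.max(x + dx, 0), w - 1);
            win[n++] = disp[yy][xx];
          }
        java.util.Arrays.sort(win);
        out[y][x] = win[4];  // median of the 9 window values
      }
    return out;
  }
}
```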

k-NN Join Based on LSH in Big Data Environment

  • Ji, Jiaqi; Chung, Yeongjee
    • Journal of Information and Communication Convergence Engineering / v.16 no.2 / pp.99-105 / 2018
  • k-Nearest neighbor join (k-NN Join) is a computationally intensive algorithm designed to find the k nearest neighbors in a dataset S for every object in another dataset R. Most related studies on k-NN Join are based on single-computer operation. As data dimensionality and volume increase, running the k-NN Join algorithm on a single computer cannot produce results quickly. To solve this scalability problem, we introduce a locality-sensitive hashing (LSH) k-NN Join algorithm implemented in Spark, an approach suited to high-dimensional big data. LSH maps similar data onto the same bucket, which reduces the data search scope. To achieve a parallel implementation of the algorithm on multiple computers, the Spark framework is used to accelerate the computation of distances between objects in a cluster. Results show that the proposed approach is fast and accurate for high-dimensional big data.
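The paper's exact pipeline is not reproduced in the abstract; the Spark sketch below (paths, dimensionality, and bit count are assumptions of ours) shows the core bucketing idea: a random-hyperplane LSH signature keys each vector, and exact distances are computed only within each bucket:

```java
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import scala.Tuple2;
import java.util.Random;

public class LshKnnJoinSketch {
  // One sign bit per random hyperplane: nearby vectors tend to share signatures.
  static String signature(double[] v, double[][] hyperplanes) {
    StringBuilder sig = new StringBuilder();
    for (double[] h : hyperplanes) {
      double dot = 0;
      for (int i = 0; i < v.length; i++) dot += v[i] * h[i];
      sig.append(dot >= 0 ? '1' : '0');
    }
    return sig.toString();
  }

  static double[][] randomPlanes(int bits, int dim) {
    Random rnd = new Random(42);
    double[][] p = new double[bits][dim];
    for (double[] row : p)
      for (int i = 0; i < dim; i++) row[i] = rnd.nextGaussian();
    return p;
  }

  public static void main(String[] args) {
    SparkConf conf = new SparkConf().setAppName("lsh-knn-join");
    try (JavaSparkContext sc = new JavaSparkContext(conf)) {
      double[][] planes = randomPlanes(8, 128);      // assumed: 8 bits, 128-dim data
      JavaRDD<double[]> r = sc.objectFile(args[0]);  // dataset R (assumed path/format)
      JavaPairRDD<String, double[]> buckets =
          r.mapToPair(v -> new Tuple2<>(signature(v, planes), v));
      // Each bucket holds mutually similar vectors; exact k-NN runs per bucket.
      buckets.groupByKey().foreach(bucket -> { /* exact k-NN within this bucket */ });
    }
  }
}
```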

Large-scale Spatial Reasoning using MapReduce Framework (맵리듀스 프레임워크를 이용한 대용량 공간 추론 방식)

  • Nam, Sang-Ha; Kim, In-Cheol
    • Proceedings of the Korea Information Processing Society Conference / 2014.04a / pp.769-772 / 2014
  • In a DeepQA setting such as the Jeopardy! quiz show, for a computer to answer effectively in place of a human, it needs a broad knowledge base covering people, geography, events, history, and more, together with fast spatio-temporal reasoning over that base. This paper presents an efficient large-scale spatial reasoning algorithm that derives directional and topological relations using the Hadoop/MapReduce framework, a representative parallel distributed computing environment. To maximize the benefit of parallel distributed processing given the characteristics of Hadoop/MapReduce, the algorithm solves the knowledge partitioning problem in the Map phase and, building on that, derives new spatial knowledge effectively in the Reduce phase. The algorithm is also designed not only to derive new knowledge from the initial spatial knowledge base but also to detect inconsistencies in that base early, so that unnecessary derivation work is not continued. Performance experiments with a large-scale spatial reasoner implemented on Hadoop/MapReduce and a sample spatial knowledge base confirm the high performance of the proposed spatial reasoning algorithm and reasoner.
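The paper's partitioning scheme is not spelled out in the abstract; the reducer sketch below illustrates one reduce-side derivation step under conventions we assume (triples keyed by their shared middle region, with a toy transitivity rule for containment):

```java
import java.io.IOException;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

// Assumed map convention (not the paper's code): each containment triple is
// keyed by its middle region b, emitting value "in-lhs,x" for in(x, b) and
// "in-rhs,z" for in(b, z). The reducer applies the composition rule
// in(x, b) AND in(b, z) => in(x, z) to derive new spatial knowledge.
public class SpatialComposeReducer extends Reducer<Text, Text, Text, Text> {
  @Override
  protected void reduce(Text mid, Iterable<Text> triples, Context ctx)
      throws IOException, InterruptedException {
    java.util.List<String> lhs = new java.util.ArrayList<>(); // x with in(x, mid)
    java.util.List<String> rhs = new java.util.ArrayList<>(); // z with in(mid, z)
    for (Text t : triples) {
      String[] f = t.toString().split(",");
      if (f[0].equals("in-lhs")) lhs.add(f[1]); else rhs.add(f[1]);
    }
    for (String x : lhs)
      for (String z : rhs)
        ctx.write(new Text(x), new Text("in," + z)); // newly derived in(x, z)
  }
}
```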

A Study on the Effects of Intermediate Data on the Performance of the MapReduce Framework (맵리듀스 프레임워크의 중간 데이터가 성능에 미치는 영향에 관한 연구)

  • Kim, Shin-gyu; Eom, Hyeonsang; Yeom, Heon Y.
    • Proceedings of the Korea Information Processing Society Conference / 2012.04a / pp.130-133 / 2012
  • The MapReduce framework offers ease of development, high scalability, and fault tolerance, and it is used for a wide range of large-scale data processing. The recent explosive growth of data has further strengthened the case for adopting the highly scalable MapReduce framework. Such workloads can exceed the computing capacity of a single cluster, in which case additional resources are rented from cloud computing services. However, because the current MapReduce framework was designed for a single-cluster environment, running it across multiple clusters can lower the utilization of the overall computing resources, so that the overall performance falls short of the resources invested. In this study, we show that the cause lies in the transfer of intermediate results between the Map and Reduce phases, and we analyze its effect on the overall performance of the MapReduce framework.
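Not from the paper, but the standard Hadoop knobs that shrink exactly this map-to-reduce transfer are worth noting; the sketch below enables map-output compression (property names as in Hadoop 2.x) and points at the combiner hook, both of which trade some CPU for fewer bytes shuffled between nodes or clusters:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.compress.CompressionCodec;
import org.apache.hadoop.io.compress.SnappyCodec;
import org.apache.hadoop.mapreduce.Job;

public class ShuffleLightJob {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // Compress intermediate map output before it is shuffled to reducers.
    conf.setBoolean("mapreduce.map.output.compress", true);
    conf.setClass("mapreduce.map.output.compress.codec",
                  SnappyCodec.class, CompressionCodec.class);
    Job job = Job.getInstance(conf, "shuffle-light job");
    // job.setCombinerClass(...); // pre-aggregate map output before the shuffle,
    //                            // when the reduce function permits it
  }
}
```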

Recommendation of Best Empirical Route Based on Classification of Large Trajectory Data (대용량 경로데이터 분류에 기반한 경험적 최선 경로 추천)

  • Lee, Kye Hyung; Jo, Yung Hoon; Lee, Tea Ho; Park, Heemin
    • KIISE Transactions on Computing Practices / v.21 no.2 / pp.101-108 / 2015
  • This paper presents the implementation of a system that recommends empirically best routes based on the classification of large trajectory data. As location-based services spread, we expect location and trajectory data to grow into big data. We believe the best empirical routes can then be extracted from large trajectory repositories. Large trajectory data is clustered into similar route groups using the Hadoop MapReduce framework. Clustered route groups are stored and managed by a DBMS, which supports rapid responses to end-users' requests. We aim to find the best routes based on collected real data, not the ideal shortest path on a map. We have implemented 1) an Android application that collects trajectories from users, 2) an Apache Hadoop MapReduce program that clusters large trajectory data, and 3) a service application that accepts a start-destination query at a web server and displays the recommended routes on mobile phones. We validated our approach using real data collected over five days and compared the results with commercial navigation systems. Experimental results show that the empirically best route is better than the routes recommended by commercial navigation systems.
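The paper's clustering criterion is not given in the abstract; the illustrative grouping job below (grid size and input layout are assumptions of ours) keys each trajectory by the sequence of map-grid cells it passes through, so trajectories following the same route land in one group and the reducer can count how heavily each group is traveled:

```java
import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

public class RouteGrouping {
  public static class CellSeqMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
    private static final double CELL = 0.001;           // assumed grid size (~100 m)
    private static final IntWritable ONE = new IntWritable(1);
    @Override
    protected void map(LongWritable off, Text traj, Context ctx)
        throws IOException, InterruptedException {
      StringBuilder key = new StringBuilder();
      String last = "";
      for (String pt : traj.toString().split(";")) {    // assumed "lat,lon;lat,lon;..."
        String[] ll = pt.split(",");
        String cell = (int) (Double.parseDouble(ll[0]) / CELL) + ":"
                    + (int) (Double.parseDouble(ll[1]) / CELL);
        if (!cell.equals(last)) { key.append(cell).append('>'); last = cell; }
      }
      ctx.write(new Text(key.toString()), ONE);         // key = route (cell sequence)
    }
  }

  public static class CountReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    @Override
    protected void reduce(Text route, Iterable<IntWritable> ones, Context ctx)
        throws IOException, InterruptedException {
      int n = 0;
      for (IntWritable o : ones) n += o.get();
      ctx.write(route, new IntWritable(n));             // popularity of this route group
    }
  }
}
```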