• Title/Summary/Keyword: MapReduce framework

Search Result 100, Processing Time 0.032 seconds

High-Performance Korean Morphological Analyzer Using the MapReduce Framework on the GPU

  • Cho, Shi-Won;Lee, Dong-Wook
    • Journal of Electrical Engineering and Technology
    • /
    • v.6 no.4
    • /
    • pp.573-579
    • /
    • 2011
  • To meet the scalability and performance requirements of data analyses, which often involve voluminous data, efficient parallel or concurrent algorithms and frameworks are essential. We present a high-performance Korean morphological analyzer which employs the MapReduce framework on the graphics processing unit (GPU). MapReduce is a programming framework introduced by Google to aid the development of web search applications on a large number of central processing units (CPUs). GPUs are designed as a special-purpose co-processor. Their programming interfaces are typically formulated for graphics applications. Compared to CPUs, GPUs have greater computation power and memory bandwidth; however, GPUs are more difficult to program because of the design of their architectures. The performance of the Korean morphological analyzer using the MapReduce framework on the GPU is evaluated in comparison with the CPU-based model. The proposed Korean Morphological analyzer shows promising scalable performance on distributed computing with the GPU.

A Security Protection Framework for Cloud Computing

  • Zhu, Wenzheng;Lee, Changhoon
    • Journal of Information Processing Systems
    • /
    • v.12 no.3
    • /
    • pp.538-547
    • /
    • 2016
  • Cloud computing is a new style of computing in which dynamically scalable and reconfigurable resources are provided as a service over the internet. The MapReduce framework is currently the most dominant programming model in cloud computing. It is necessary to protect the integrity of MapReduce data processing services. Malicious workers, who can be divided into collusive workers and non-collusive workers, try to generate bad results in order to attack the cloud computing. So, figuring out how to efficiently detect the malicious workers has been very important, as existing solutions are not effective enough in defeating malicious behavior. In this paper, we propose a security protection framework to detect the malicious workers and ensure computation integrity in the map phase of MapReduce. Our simulation results show that our proposed security protection framework can efficiently detect both collusive and non-collusive workers and guarantee high computation accuracy.

Performance Evaluation of Energy Management Algorithms for MapReduce System (MapReduce 시스템을 위한 에너지 관리 알고리즘의 성능평가)

  • Kim, Min-Ki;Cho, Haengrae
    • IEMEK Journal of Embedded Systems and Applications
    • /
    • v.9 no.2
    • /
    • pp.109-115
    • /
    • 2014
  • Analyzing large scale data has become an important activity for many organizations. Since MapReduce is a promising tool for processing the massive data sets, there are increasing studies to evaluate the performance of various algorithms related to MapReduce. In this paper, we first develop a simulation framework that includes MapReduce workload model, data center model, and the model of data access pattern. Then we propose two algorithms that can reduce the energy consumption of MapReduce systems. Using the simulation framework, we evaluate the performance of the proposed algorithms under different application characteristics and configurations of data centers.

PDFindexer: Distributed PDF Indexing system using MapReduce

  • Murtazaev, JAziz;Kihm, Jang-Su;Oh, Sangyoon
    • International Journal of Internet, Broadcasting and Communication
    • /
    • v.4 no.1
    • /
    • pp.13-17
    • /
    • 2012
  • Indexing allows converting raw document collection into easily searchable representation. Web searching by Google or Yahoo provides subsecond response time which is made possible by efficient indexing of web-pages over the entire Web. Indexing process gets challenging when the scale gets bigger. Parallel techniques, such as MapReduce framework can assist in efficient large-scale indexing process. In this paper we propose PDFindexer, system for indexing scientific papers in PDF using MapReduce programming model. Unlike Web search engines, our target domain is scientific papers, which has pre-defined structure, such as title, abstract, sections, references. Our proposed system enables parsing scientific papers in PDF recreating their structure and performing efficient distributed indexing with MapReduce framework in a cluster of nodes. We provide the overview of the system, their components and interactions among them. We discuss some issues related with the design of the system and usage of MapReduce in parsing and indexing of large document collection.

An Application of MapReduce Technique over Peer-to-Peer Network (P2P 네트워크상에서 MapReduce 기법 활용)

  • Ren, Jian-Ji;Lee, Jae-Kee
    • Journal of KIISE:Computing Practices and Letters
    • /
    • v.15 no.8
    • /
    • pp.586-590
    • /
    • 2009
  • The objective of this paper describes the design of MapReduce over Peer-to-Peer network for dynamic environments applications. MapReduce is a software framework used for Cloud Computing which processing large data sets in a highly-parallel way. Based on the Peer-to-Peer network character which node failures will happen anytime, we focus on using a DHT routing protocol which named Pastry to handle the problem of node failures. Our results are very promising and indicate that the framework could have a wide application in P2P network systems while maintaining good computational efficiency and scalability. We believe that, P2P networks and parallel computing emerge as very hot research and development topics in industry and academia for many years to come.

Subspace Projection-Based Clustering and Temporal ACRs Mining on MapReduce for Direct Marketing Service

  • Lee, Heon Gyu;Choi, Yong Hoon;Jung, Hoon;Shin, Yong Ho
    • ETRI Journal
    • /
    • v.37 no.2
    • /
    • pp.317-327
    • /
    • 2015
  • A reliable analysis of consumer preference from a large amount of purchase data acquired in real time and an accurate customer characterization technique are essential for successful direct marketing campaigns. In this study, an optimal segmentation of post office customers in Korea is performed using a subspace projection-based clustering method to generate an accurate customer characterization from a high-dimensional census dataset. Moreover, a traditional temporal mining method is extended to an algorithm using the MapReduce framework for a consumer preference analysis. The experimental results show that it is possible to use parallel mining through a MapReduce-based algorithm and that the execution time of the algorithm is faster than that of a traditional method.

RDF/OWL data management on Map-Reduce architecture: A comparison between approaches (있는 Map-Reduce구조에 대한 RDF/OWL 데이터 관리 : 접근 사이에 비교사이에 비교)

  • Garcia, Guillermo Crocker;Lee, Young-Koo;Lee, Sung-Young
    • Proceedings of the Korean Information Science Society Conference
    • /
    • 2011.06c
    • /
    • pp.142-144
    • /
    • 2011
  • In a world in constant changes, more and more devices are producing data and it is indispensable to manage these data. That is why map reduce framework is a solution to manage a large amount of data in a fast and right way. In the other hand semantic web (RDF/OWL) is getting popular and can be a solution to manage data in an efficient way, so that such data can be retrieved and understood by both human and machine. In this paper we describes and analyze some projects that manages RDF with Map-Reduce framework.

Big Numeric Data Classification Using Grid-based Bayesian Inference in the MapReduce Framework

  • Kim, Young Joon;Lee, Keon Myung
    • International Journal of Fuzzy Logic and Intelligent Systems
    • /
    • v.14 no.4
    • /
    • pp.313-321
    • /
    • 2014
  • In the current era of data-intensive services, the handling of big data is a crucial issue that affects almost every discipline and industry. In this study, we propose a classification method for large volumes of numeric data, which is implemented in a distributed programming framework, i.e., MapReduce. The proposed method partitions the data space into a grid structure and it then models the probability distributions of classes for grid cells by collecting sufficient statistics using distributed MapReduce tasks. The class labeling of new data is achieved by k-nearest neighbor classification based on Bayesian inference.

The MapReduce framework for Large-scale Data Analysis: Overview and Research Trends (대규모 데이터 분석을 위한 MapReduce 기술의 연구 동향)

  • Lee, K.H.;Park, W.J.;Cho, K.S.;Ryu, W.
    • Electronics and Telecommunications Trends
    • /
    • v.28 no.6
    • /
    • pp.156-166
    • /
    • 2013
  • MapReduce는 다양한 형식의 대용량 데이터를 병렬 처리하는데 있어 효과적인 도구로 인식되고 있다. 특히 MapReduce의 오픈 소스 구현인 Hadoop은 여러 분야에서 널리 이용되고 있으며, 가장 대표적인 빅데이터 솔루션으로 현재까지 많은 주목을 받아오고 있다. 하지만, MapReduce는 그 구조적 특정으로 인한 이점과 함께 여러 제약과 단점들을 가진다. 이에 따라 MapReduce의 개선을 위한 많은 연구와 시스템 개량이 학계와 산업계에서 동시에 수행되어 왔다. 본고에서는 대용량 데이터 분석을 위한 MapReduce 프레임워크의 특성과 이를 개선하기 위한 최근의 연구 내용들을 소개한다. 또한 향후의 대용량 데이터 처리는 어떠한 모습을 취하게 될 것인지를 예측해 본다.

MRQUTER : A Parallel Qualitative Temporal Reasoner Using MapReduce Framework (MRQUTER: MapReduce 프레임워크를 이용한 병렬 정성 시간 추론기)

  • Kim, Jonghoon;Kim, Incheol
    • KIPS Transactions on Software and Data Engineering
    • /
    • v.5 no.5
    • /
    • pp.231-242
    • /
    • 2016
  • In order to meet rapid changes of Web information, it is necessary to extend the current Web technologies to represent both the valid time and location of each fact and knowledge, and reason their relationships. Until recently, many researches on qualitative temporal reasoning have been conducted in laboratory-scale, dealing with small knowledge bases. However, in this paper, we propose the design and implementation of a parallel qualitative temporal reasoner, MRQUTER, which can make reasoning over Web-scale large knowledge bases. This parallel temporal reasoner was built on a Hadoop cluster system using the MapReduce parallel programming framework. It decomposes the entire qualitative temporal reasoning process into several MapReduce jobs such as the encoding and decoding job, the inverse and equal reasoning job, the transitive reasoning job, the refining job, and applies some optimization techniques into each component reasoning job implemented with a pair of Map and Reduce functions. Through experiments using large benchmarking temporal knowledge bases, MRQUTER shows high reasoning performance and scalability.