• 제목/요약/키워드: MapReduce Framework

검색결과 100건 처리시간 0.019초

High-Performance Korean Morphological Analyzer Using the MapReduce Framework on the GPU

  • Cho, Shi-Won;Lee, Dong-Wook
    • Journal of Electrical Engineering and Technology
    • /
    • 제6권4호
    • /
    • pp.573-579
    • /
    • 2011
  • To meet the scalability and performance requirements of data analyses, which often involve voluminous data, efficient parallel or concurrent algorithms and frameworks are essential. We present a high-performance Korean morphological analyzer which employs the MapReduce framework on the graphics processing unit (GPU). MapReduce is a programming framework introduced by Google to aid the development of web search applications on a large number of central processing units (CPUs). GPUs are designed as a special-purpose co-processor. Their programming interfaces are typically formulated for graphics applications. Compared to CPUs, GPUs have greater computation power and memory bandwidth; however, GPUs are more difficult to program because of the design of their architectures. The performance of the Korean morphological analyzer using the MapReduce framework on the GPU is evaluated in comparison with the CPU-based model. The proposed Korean Morphological analyzer shows promising scalable performance on distributed computing with the GPU.

A Security Protection Framework for Cloud Computing

  • Zhu, Wenzheng;Lee, Changhoon
    • Journal of Information Processing Systems
    • /
    • 제12권3호
    • /
    • pp.538-547
    • /
    • 2016
  • Cloud computing is a new style of computing in which dynamically scalable and reconfigurable resources are provided as a service over the internet. The MapReduce framework is currently the most dominant programming model in cloud computing. It is necessary to protect the integrity of MapReduce data processing services. Malicious workers, who can be divided into collusive workers and non-collusive workers, try to generate bad results in order to attack the cloud computing. So, figuring out how to efficiently detect the malicious workers has been very important, as existing solutions are not effective enough in defeating malicious behavior. In this paper, we propose a security protection framework to detect the malicious workers and ensure computation integrity in the map phase of MapReduce. Our simulation results show that our proposed security protection framework can efficiently detect both collusive and non-collusive workers and guarantee high computation accuracy.

MapReduce 시스템을 위한 에너지 관리 알고리즘의 성능평가 (Performance Evaluation of Energy Management Algorithms for MapReduce System)

  • 김민기;조행래
    • 대한임베디드공학회논문지
    • /
    • 제9권2호
    • /
    • pp.109-115
    • /
    • 2014
  • Analyzing large scale data has become an important activity for many organizations. Since MapReduce is a promising tool for processing the massive data sets, there are increasing studies to evaluate the performance of various algorithms related to MapReduce. In this paper, we first develop a simulation framework that includes MapReduce workload model, data center model, and the model of data access pattern. Then we propose two algorithms that can reduce the energy consumption of MapReduce systems. Using the simulation framework, we evaluate the performance of the proposed algorithms under different application characteristics and configurations of data centers.

PDFindexer: Distributed PDF Indexing system using MapReduce

  • Murtazaev, JAziz;Kihm, Jang-Su;Oh, Sangyoon
    • International Journal of Internet, Broadcasting and Communication
    • /
    • 제4권1호
    • /
    • pp.13-17
    • /
    • 2012
  • Indexing allows converting raw document collection into easily searchable representation. Web searching by Google or Yahoo provides subsecond response time which is made possible by efficient indexing of web-pages over the entire Web. Indexing process gets challenging when the scale gets bigger. Parallel techniques, such as MapReduce framework can assist in efficient large-scale indexing process. In this paper we propose PDFindexer, system for indexing scientific papers in PDF using MapReduce programming model. Unlike Web search engines, our target domain is scientific papers, which has pre-defined structure, such as title, abstract, sections, references. Our proposed system enables parsing scientific papers in PDF recreating their structure and performing efficient distributed indexing with MapReduce framework in a cluster of nodes. We provide the overview of the system, their components and interactions among them. We discuss some issues related with the design of the system and usage of MapReduce in parsing and indexing of large document collection.

P2P 네트워크상에서 MapReduce 기법 활용 (An Application of MapReduce Technique over Peer-to-Peer Network)

  • 임건길;이재기
    • 한국정보과학회논문지:컴퓨팅의 실제 및 레터
    • /
    • 제15권8호
    • /
    • pp.586-590
    • /
    • 2009
  • 본 논문의 목적은 P2P 네트워크 상에서 동적 환경 애플리케이션을 지원하기 위한 MapReduce 의 설계이다. MapReduce는 클라우드컴퓨팅 중에서 대용량 데이터의 병렬처리를 위해서 개발된 소프트웨어 프레임워크이다. P2P 기반 네트워크의 특징은 노드 고장이 언제든지 발생할 수 있으며, 이런 노드 고장을 제어하기 위해 Pastry라는 DHT 라우팅 프로토콜의 사용에 초점을 맞추었다. 본 논문의 결과는 프레임워크가 양호한 계산 효율과 확장성을 유지하는 가운데 P2P 네트워크 시스템의 다양한 애플리케이션에 적용될 수 있음을 보이고 있다. 향후 몇 년 동안은 P2P 네트워크와 병렬 컴퓨팅이 산업과 학계에서 매우 중요한 연구 및 개발 주제로 자리 잡을 것으로 확신한다.

Subspace Projection-Based Clustering and Temporal ACRs Mining on MapReduce for Direct Marketing Service

  • Lee, Heon Gyu;Choi, Yong Hoon;Jung, Hoon;Shin, Yong Ho
    • ETRI Journal
    • /
    • 제37권2호
    • /
    • pp.317-327
    • /
    • 2015
  • A reliable analysis of consumer preference from a large amount of purchase data acquired in real time and an accurate customer characterization technique are essential for successful direct marketing campaigns. In this study, an optimal segmentation of post office customers in Korea is performed using a subspace projection-based clustering method to generate an accurate customer characterization from a high-dimensional census dataset. Moreover, a traditional temporal mining method is extended to an algorithm using the MapReduce framework for a consumer preference analysis. The experimental results show that it is possible to use parallel mining through a MapReduce-based algorithm and that the execution time of the algorithm is faster than that of a traditional method.

있는 Map-Reduce구조에 대한 RDF/OWL 데이터 관리 : 접근 사이에 비교사이에 비교 (RDF/OWL data management on Map-Reduce architecture: A comparison between approaches)

  • 크로케르기예르모;이영구;이승룡
    • 한국정보과학회:학술대회논문집
    • /
    • 한국정보과학회 2011년도 한국컴퓨터종합학술대회논문집 Vol.38 No.1(C)
    • /
    • pp.142-144
    • /
    • 2011
  • In a world in constant changes, more and more devices are producing data and it is indispensable to manage these data. That is why map reduce framework is a solution to manage a large amount of data in a fast and right way. In the other hand semantic web (RDF/OWL) is getting popular and can be a solution to manage data in an efficient way, so that such data can be retrieved and understood by both human and machine. In this paper we describes and analyze some projects that manages RDF with Map-Reduce framework.

Big Numeric Data Classification Using Grid-based Bayesian Inference in the MapReduce Framework

  • Kim, Young Joon;Lee, Keon Myung
    • International Journal of Fuzzy Logic and Intelligent Systems
    • /
    • 제14권4호
    • /
    • pp.313-321
    • /
    • 2014
  • In the current era of data-intensive services, the handling of big data is a crucial issue that affects almost every discipline and industry. In this study, we propose a classification method for large volumes of numeric data, which is implemented in a distributed programming framework, i.e., MapReduce. The proposed method partitions the data space into a grid structure and it then models the probability distributions of classes for grid cells by collecting sufficient statistics using distributed MapReduce tasks. The class labeling of new data is achieved by k-nearest neighbor classification based on Bayesian inference.

대규모 데이터 분석을 위한 MapReduce 기술의 연구 동향 (The MapReduce framework for Large-scale Data Analysis: Overview and Research Trends)

  • 이경하;박원주;조기성;류원
    • 전자통신동향분석
    • /
    • 제28권6호
    • /
    • pp.156-166
    • /
    • 2013
  • MapReduce는 다양한 형식의 대용량 데이터를 병렬 처리하는데 있어 효과적인 도구로 인식되고 있다. 특히 MapReduce의 오픈 소스 구현인 Hadoop은 여러 분야에서 널리 이용되고 있으며, 가장 대표적인 빅데이터 솔루션으로 현재까지 많은 주목을 받아오고 있다. 하지만, MapReduce는 그 구조적 특정으로 인한 이점과 함께 여러 제약과 단점들을 가진다. 이에 따라 MapReduce의 개선을 위한 많은 연구와 시스템 개량이 학계와 산업계에서 동시에 수행되어 왔다. 본고에서는 대용량 데이터 분석을 위한 MapReduce 프레임워크의 특성과 이를 개선하기 위한 최근의 연구 내용들을 소개한다. 또한 향후의 대용량 데이터 처리는 어떠한 모습을 취하게 될 것인지를 예측해 본다.

MRQUTER: MapReduce 프레임워크를 이용한 병렬 정성 시간 추론기 (MRQUTER : A Parallel Qualitative Temporal Reasoner Using MapReduce Framework)

  • 김종훈;김인철
    • 정보처리학회논문지:소프트웨어 및 데이터공학
    • /
    • 제5권5호
    • /
    • pp.231-242
    • /
    • 2016
  • 빠른 웹 정보의 변화에 잘 대응하기 위해서는, 사실과 지식이 실제로 유효한 시간과 장소들도 함께 표현하고 그들 간의 관계도 추론할 수 있도록 웹 기술의 확장이 필요하다. 본 논문에서는 그동안 소규모 지식 베이스를 이용한 실험실 수준의 정성 시간 추론 연구들에서 벗어나, 웹 스케일의 대규모 지식 베이스를 추론할 수 있는 병렬 정성 시간 추론기인 MRQUTER의 설계와 구현을 소개한다. Hadoop 클러스터 시스템과 MapReduce 병렬 프로그래밍 프레임워크를 이용해 개발된 MRQUTER에서는 정성 시간 추론 과정을 인코딩 및 디코딩 작업, 역 관계 및 동일 관계 추론 작업, 이행 관계 추론 작업, 관계 정제 작업 등 몇 개의 MapReduce 작업으로 나누고, 맵 함수와 리듀스 함수로 구현되는 각각의 단위 추론 작업을 효율화하기 위한 최적화 기술들을 적용하였다. 대규모 벤치마킹 시간 지식 베이스를 이용한 실험을 통해, MRQUTER의 높은 추론 성능과 확장성을 확인하였다.