Search | Korea Science

Performance Evaluation of Energy Management Algorithms for MapReduce System (MapReduce 시스템을 위한 에너지 관리 알고리즘의 성능평가)

Kim, Min-Ki;Cho, Haengrae
- IEMEK Journal of Embedded Systems and Applications / v.9 no.2 / pp.109-115 / 2014
Analyzing large-scale data has become an important activity for many organizations. Since MapReduce is a promising tool for processing massive data sets, a growing number of studies evaluate the performance of various MapReduce-related algorithms. In this paper, we first develop a simulation framework that includes a MapReduce workload model, a data center model, and a model of data access patterns. We then propose two algorithms that can reduce the energy consumption of MapReduce systems. Using the simulation framework, we evaluate the performance of the proposed algorithms under different application characteristics and data center configurations.
https://doi.org/10.14372/IEMEK.2014.9.2.109

Improving the Map/Reduce Model through Data Distribution and Task Progress Scheduling (데이터 분배 및 태스크 진행 스케쥴링을 통한 맵/리듀스 모델의 성능 향상)

Hwang, In-Sung;Chung, Kyung-Yong;Rim, Kee-Wook;Lee, Jung-Hyun
- The Journal of the Korea Contents Association / v.10 no.10 / pp.78-85 / 2010
Map/Reduce is a programming model for implementing cloud computing that has recently attracted attention. The model runs applications that process large amounts of data across many computers. To use the computing nodes well, it is important to design a mechanism that splits the data into appropriately sized pieces and distributes them efficiently to a cluster of computing nodes. The scheduling of the Map and Reduce phases also influences Map/Reduce performance. This paper proposes a distribution scheme that splits large data sets and runs Map tasks while taking into account the performance of each computing node and the network status. We also tune the way Map and Reduce tasks operate so that Reduce tasks can be processed quickly. We tested the proposal with two Map/Reduce sample applications and evaluated its impact on Map/Reduce performance.
https://doi.org/10.5392/JKCA.10.10.078

An Analytical Approach to Evaluation of SSD Effects under MapReduce Workloads

Ahn, Sungyong;Park, Sangkyu
- JSTS: Journal of Semiconductor Technology and Science / v.15 no.5 / pp.511-518 / 2015
As the cost-per-byte of SSDs decreases dramatically, introducing SSDs to Hadoop becomes an attractive choice for high-performance data processing. In this paper, the cost-per-performance of an SSD-based Hadoop cluster (SSD-Hadoop) and an HDD-based Hadoop cluster (HDD-Hadoop) is evaluated. For this, we propose a MapReduce performance model that uses a queuing network to simulate the execution time of a MapReduce job with varying cluster size. To achieve an accurate model, the execution time distribution of MapReduce jobs is carefully profiled. The developed model predicts the execution time of MapReduce jobs with less than 7% difference in most cases. Simulations with a varying number of cluster nodes also show that SSD-Hadoop is 20% more cost-efficient than HDD-Hadoop, because SSD-Hadoop needs fewer nodes than HDD-Hadoop to achieve comparable performance.
https://doi.org/10.5573/JSTS.2015.15.5.511

Performance Evaluation of MapReduce Application running on Hadoop (Hadoop 상에서 MapReduce 응용프로그램 평가)

Kim, Junsu;Kang, Yunhee;Park, Youngbom
- Journal of Software Engineering Society / v.25 no.4 / pp.63-67 / 2012
With the growth of data being generated in many fields, the distributed programming model MapReduce has been introduced to handle it. In this paper, we build two cluster systems on SUN Blade150 machines, one with a Solaris environment and one with a Linux environment, and then evaluate the performance of a MapReduce application running on the MapReduce middleware Hadoop in terms of its average elapsed time and standard deviation. The results of this experiment show that the overall performance of the Hadoop-based MapReduce application is affected by the configuration of the cluster system.

Naive Bayes Learning Algorithm based on Map-Reduce Programming Model (Map-Reduce 프로그래밍 모델 기반의 나이브 베이스 학습 알고리즘)

Kang, Dae-Ki
- Proceedings of the Korean Institute of Information and Communication Sciences Conference / 2011.10a / pp.208-209 / 2011
In this paper, we introduce a Naive Bayes learning algorithm for learning and reasoning in an environment based on the Map-Reduce model. For this purpose, we use Apache Mahout to execute Distributed Naive Bayes on University of California, Irvine (UCI) benchmark data sets. The experimental results show that Apache Mahout's Distributed Naive Bayes algorithm is comparable to WEKA's Naive Bayes algorithm in terms of performance. These results indicate that, in the future Big Data environment, Map-Reduce-based systems such as Apache Mahout can be promising for machine learning.

PDFindexer: Distributed PDF Indexing system using MapReduce

Murtazaev, JAziz;Kihm, Jang-Su;Oh, Sangyoon
- International Journal of Internet, Broadcasting and Communication / v.4 no.1 / pp.13-17 / 2012
Indexing converts a raw document collection into an easily searchable representation. Web search by Google or Yahoo provides subsecond response times, which are made possible by efficient indexing of web pages over the entire Web. The indexing process becomes challenging as the scale grows. Parallel techniques such as the MapReduce framework can assist in efficient large-scale indexing. In this paper, we propose PDFindexer, a system for indexing scientific papers in PDF using the MapReduce programming model. Unlike Web search engines, our target domain is scientific papers, which have a predefined structure such as title, abstract, sections, and references. Our proposed system parses scientific papers in PDF, recreates their structure, and performs efficient distributed indexing with the MapReduce framework on a cluster of nodes. We provide an overview of the system, its components, and the interactions among them. We discuss some issues related to the design of the system and the use of MapReduce in parsing and indexing a large document collection.
https://doi.org/10.7236/IJIBC.2012.4.1.13

High-Performance Korean Morphological Analyzer Using the MapReduce Framework on the GPU

Cho, Shi-Won;Lee, Dong-Wook
- Journal of Electrical Engineering and Technology / v.6 no.4 / pp.573-579 / 2011
To meet the scalability and performance requirements of data analyses, which often involve voluminous data, efficient parallel or concurrent algorithms and frameworks are essential. We present a high-performance Korean morphological analyzer that employs the MapReduce framework on the graphics processing unit (GPU). MapReduce is a programming framework introduced by Google to aid the development of web search applications on a large number of central processing units (CPUs). GPUs are designed as special-purpose co-processors, and their programming interfaces are typically formulated for graphics applications. Compared to CPUs, GPUs have greater computation power and memory bandwidth; however, GPUs are more difficult to program because of the design of their architectures. The performance of the Korean morphological analyzer using the MapReduce framework on the GPU is evaluated in comparison with the CPU-based model. The proposed Korean morphological analyzer shows promising scalable performance on distributed computing with the GPU.
https://doi.org/10.5370/JEET.2011.6.4.573

Pipeline-MapReduce Model for Processing Large Data Sets in Distributed Systems (대용량 분산 데이터 처리를 위한 Pipeline-MapReduce 모델)

Kim, Sun Jo;Kim, Taehyoung;Eom, Young Ik
- Proceedings of the Korea Information Processing Society Conference / 2009.11a / pp.121-122 / 2009
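As the amount of information on the Internet grows rapidly, ISPs are studying methods to process and analyze data effectively. Most notably, Google developed the MapReduce model, a technique for processing large-scale distributed data. In this paper, we propose Pipeline-MapReduce, a technique that improves performance by applying a pipeline approach to the existing MapReduce model. Through experiments, we show that the proposed technique produces results faster than the existing technique.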
https://doi.org/10.3745/PKIPS.y2009m11a.121

A Security Protection Framework for Cloud Computing

Zhu, Wenzheng;Lee, Changhoon
- Journal of Information Processing Systems / v.12 no.3 / pp.538-547 / 2016
Cloud computing is a new style of computing in which dynamically scalable and reconfigurable resources are provided as a service over the internet. The MapReduce framework is currently the most dominant programming model in cloud computing. It is necessary to protect the integrity of MapReduce data processing services. Malicious workers, who can be divided into collusive workers and non-collusive workers, try to generate bad results in order to attack the cloud computing system. Efficiently detecting malicious workers has therefore become very important, as existing solutions are not effective enough in defeating malicious behavior. In this paper, we propose a security protection framework to detect malicious workers and ensure computation integrity in the map phase of MapReduce. Our simulation results show that the proposed framework can efficiently detect both collusive and non-collusive workers and guarantee high computation accuracy.
https://doi.org/10.3745/JIPS.03.0053

An Efficient Data Replacement Algorithm for Performance Optimization of MapReduce in Non-dedicated Distributed Computing Environments (비-전용 분산 컴퓨팅 환경에서 맵-리듀스 처리 성능 최적화를 위한 효율적인 데이터 재배치 알고리즘)

Ryu, Eunkyung;Son, Ingook;Park, Junho;Bok, Kyoungsoo;Yoo, Jaesoo
- The Journal of the Korea Contents Association / v.13 no.9 / pp.20-27 / 2013
In recent years, with the growth of social media and the development of mobile devices, the volume of data has increased significantly. MapReduce is an emerging programming model that processes large amounts of data. However, since MapReduce places data evenly across nodes on the assumption of a dedicated distributed computing environment, it is not well suited to non-dedicated distributed computing environments. Data replacement algorithms have been proposed to optimize MapReduce performance in non-dedicated distributed computing environments. However, they spend much time on data replacement and add network load through unnecessary data transmission. In this paper, we propose an efficient data replacement algorithm for optimizing MapReduce performance in non-dedicated distributed computing environments. The proposed scheme computes the ratio of data blocks in the nodes based on a node availability model and reduces the network load by transmitting data blocks with the data placement taken into account. Our experimental results show that the proposed scheme outperforms the existing scheme.
https://doi.org/10.5392/JKCA.2013.13.09.020

