• Title/Summary/Keyword: MapReduce

Deep Web and MapReduce

  • Tao, Yufei
    • Journal of Computing Science and Engineering / v.7 no.3 / pp.147-158 / 2013
  • This invited paper introduces results on Web science and technology obtained during work with the Korea Advanced Institute of Science and Technology. In the first part, we discuss algorithms for exploring the deep Web, which refers to the collection of Web pages that cannot be reached by conventional Web crawlers. In the second part, we discuss sorting algorithms on the MapReduce system, which has become a dominant paradigm for massively parallel computing.
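
To make the idea of sorting on MapReduce concrete, here is a minimal single-process sketch of a sample-based range-partition sort. The abstract does not say which sorting algorithms the paper covers, so this is a generic, commonly used scheme and not necessarily the paper's; the function names, the number of reducers, and the sample size are all assumptions for illustration.

```python
import bisect
import random

def sample_boundaries(data, num_reducers, sample_size, seed=0):
    """Pick num_reducers - 1 split keys from a sorted random sample of the input."""
    rng = random.Random(seed)
    sample = sorted(rng.sample(data, min(sample_size, len(data))))
    step = len(sample) / num_reducers
    return [sample[int(step * i)] for i in range(1, num_reducers)]

def range_partition_sort(data, num_reducers=4, sample_size=32):
    """Simulate a MapReduce sort: route keys by range, then sort each range locally."""
    if not data:
        return []
    boundaries = sample_boundaries(data, num_reducers, sample_size)
    buckets = [[] for _ in range(num_reducers)]
    # "Map" phase: send each key to the reducer that owns its key range.
    for x in data:
        buckets[bisect.bisect_right(boundaries, x)].append(x)
    # "Reduce" phase: each reducer sorts its own bucket. Because the buckets
    # cover disjoint, increasing key ranges, concatenating them in order
    # yields a globally sorted sequence.
    return [x for bucket in buckets for x in sorted(bucket)]

if __name__ == "__main__":
    values = [5, 3, 9, 1, 7, 3, 8, 2, 6, 4, 0, 9]
    print(range_partition_sort(values))
```

The sampling step only affects how evenly the keys are spread across reducers; the output is sorted regardless of which boundaries are chosen.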

An Enterprise Location Recommendation Service in Metropolitan Region Using Skyline Query and MapReduce (Skyline Query와 MapReduce 방식을 이용한 대도시에서의 창업 위치 추천 서비스)

  • Lee, YongHyun;Kim, DongEun;Kim, Ummo
    • Proceedings of the Korea Contents Association Conference / 2014.11a / pp.259-260 / 2014
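  • This paper sets out to build a service that recommends a suitable location from among many candidates when opening a business such as a convenience store or a café. Using a skyline query, the study derives the expected profit as a function of distance from a point chosen by the user, and it uses MapReduce to process the large set of candidates efficiently. With this method, an optimal location can be readily recommended within the limited resources and distance constraints set by the prospective business owner.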
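
As a rough illustration of the skyline step, the sketch below computes the skyline of candidate sites in plain Python and then simulates a two-stage MapReduce evaluation (local skylines per partition, followed by a merge). The candidate tuples, the choice of distance and expected profit as the two criteria, and all function names are assumptions for illustration; the abstract does not give the paper's actual attributes or partitioning scheme.

```python
from typing import List, Tuple

# Hypothetical candidate site: (name, distance_km, expected_profit).
Candidate = Tuple[str, float, float]

def dominates(a: Candidate, b: Candidate) -> bool:
    """a dominates b if a is no farther and no less profitable, and strictly better in one."""
    no_worse = a[1] <= b[1] and a[2] >= b[2]
    strictly_better = a[1] < b[1] or a[2] > b[2]
    return no_worse and strictly_better

def skyline(points: List[Candidate]) -> List[Candidate]:
    """Keep every candidate that no other candidate dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]

def mapreduce_skyline(partitions: List[List[Candidate]]) -> List[Candidate]:
    # "Map" phase: each partition computes its local skyline independently.
    local = [p for part in partitions for p in skyline(part)]
    # "Reduce" phase: the global skyline is the skyline of the merged local results.
    return skyline(local)

if __name__ == "__main__":
    candidates = [("A", 1.0, 50.0), ("B", 2.0, 80.0), ("C", 2.5, 70.0), ("D", 0.5, 20.0)]
    result = mapreduce_skyline([candidates[:2], candidates[2:]])
    print(sorted(c[0] for c in result))  # ['A', 'B', 'D']
```

Site C is dropped because B is both closer and more profitable. The local-then-global scheme is safe because a point dominated inside its own partition can never be part of the global skyline.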

Evaluating MapReduce For Determining The Total Number of Tasks in Virtualized Machine (가상 머신에서의 태스크 개수 결정을 위한 MapReduce 성능평가)

  • Chung, Hae-Jin;Choi, Won-Seok;Kim, Yoon-Ho;Kim, Joon-Mo
    • Proceedings of the Korean Information Science Society Conference / 2012.06a / pp.24-26 / 2012
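  • Virtual machine technology is widely used as a software technique for getting the most out of hardware computing resources. Distributed parallel programming is drawing attention alongside it as a software technique for maximizing the parallelism of those resources. However, processing data in parallel on virtual machines has drawbacks such as degraded I/O speed. As preliminary work toward determining the number of tasks on a virtual machine, so that parallel programs can run there without performance loss, this paper builds a virtual machine environment and presents MapReduce performance evaluation results obtained while varying several configuration properties. The experimental results can serve as reference material for research on methods of determining MapReduce task counts on virtual machines.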

Hadoop and MapReduce (하둡과 맵리듀스)

  • Park, Jeong-Hyeok;Lee, Sang-Yeol;Kang, Da Hyun;Won, Joong-Ho
    • Journal of the Korean Data and Information Science Society / v.24 no.5 / pp.1013-1027 / 2013
  • As the need for large-scale data analysis grows rapidly, Hadoop, a platform for large-scale data processing, and MapReduce, Hadoop's internal computational model, are receiving great attention. This paper reviews the basic concepts of Hadoop and MapReduce that data analysts familiar with statistical programming need, through examples that combine the R programming language with Hadoop.

Subspace Projection-Based Clustering and Temporal ACRs Mining on MapReduce for Direct Marketing Service

  • Lee, Heon Gyu;Choi, Yong Hoon;Jung, Hoon;Shin, Yong Ho
    • ETRI Journal / v.37 no.2 / pp.317-327 / 2015
  • A reliable analysis of consumer preference from a large amount of purchase data acquired in real time and an accurate customer characterization technique are essential for successful direct marketing campaigns. In this study, an optimal segmentation of post office customers in Korea is performed using a subspace projection-based clustering method to generate an accurate customer characterization from a high-dimensional census dataset. Moreover, a traditional temporal mining method is extended to an algorithm using the MapReduce framework for consumer preference analysis. The experimental results show that parallel mining is possible with a MapReduce-based algorithm and that the algorithm runs faster than a traditional method.

An Improved Hybrid Canopy-Fuzzy C-Means Clustering Algorithm Based on MapReduce Model

  • Dai, Wei;Yu, Changjun;Jiang, Zilong
    • Journal of Computing Science and Engineering / v.10 no.1 / pp.1-8 / 2016
  • Fuzzy c-means (FCM) is a frequently used clustering algorithm. However, its clustering quality and convergence rate depend on the initial cluster centers, so an improved FCM algorithm based on the canopy clustering concept is proposed to analyze the dataset quickly. Because the canopy algorithm acquires cluster centers rapidly, the proposed algorithm takes the canopy clustering results as its input, which accelerates the convergence of FCM. In addition, a MapReduce scheme for the proposed FCM algorithm is designed for a cloud environment. Experimental results demonstrate that the hybrid canopy-FCM clustering algorithm processed by MapReduce achieves better clustering quality and higher operation speed.
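
The idea of seeding FCM with canopy centers can be sketched as follows. This is a minimal single-machine illustration under stated assumptions, not the paper's implementation: the threshold `t2`, the fuzzifier `m = 2`, the fixed iteration count, and the function names are all hypothetical, and the MapReduce distribution of the work is not shown.

```python
import math

def canopy_centers(points, t2):
    """Pick canopy centers: each chosen center removes every point within the tight threshold t2."""
    remaining = list(points)
    centers = []
    while remaining:
        center = remaining.pop(0)
        centers.append(center)
        remaining = [p for p in remaining if math.dist(p, center) > t2]
    return centers

def fcm(points, centers, m=2.0, iters=20):
    """Run standard fuzzy c-means updates starting from the given initial centers."""
    dims = len(points[0])
    for _ in range(iters):
        # Membership update: u[j][i] is the degree to which point j belongs to cluster i.
        u = []
        for p in points:
            d = [max(math.dist(p, c), 1e-12) for c in centers]  # avoid division by zero
            u.append([
                1.0 / sum((d[i] / d[k]) ** (2.0 / (m - 1.0)) for k in range(len(centers)))
                for i in range(len(centers))
            ])
        # Center update: weighted mean of all points with weights u^m.
        new_centers = []
        for i in range(len(centers)):
            w = [u[j][i] ** m for j in range(len(points))]
            total = sum(w)
            new_centers.append(tuple(
                sum(w[j] * points[j][dim] for j in range(len(points))) / total
                for dim in range(dims)
            ))
        centers = new_centers
    return centers

if __name__ == "__main__":
    data = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
    seeds = canopy_centers(data, t2=3.0)  # one seed per well-separated group
    print(fcm(data, seeds))
```

The canopy pass needs only one scan per chosen center, so it is cheap compared with FCM iterations, and it also fixes the number of clusters without the user supplying it.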

Map Reduce-based P2P DBaaS Hub system

  • Jung, Yean-Woo;Lee, Jong-Yong;Jung, Kye-Dong
    • International Journal of Advanced Smart Convergence / v.5 no.1 / pp.16-22 / 2016
  • Database integration is being emphasized as one way for companies to collaborate. Through database integration, companies can use one another's databases as if they were their own, which lets them provide more efficient service to customers. However, database integration faces difficulties, namely database security and database heterogeneity. In this paper, we propose a MapReduce-based P2P DBaaS hub system to solve the database heterogeneity problem. The proposed system provides an environment in which companies in the P2P cloud can integrate one another's databases. It uses the DBaaS Hub to collect data in the P2P cloud and MapReduce to integrate the collected data.

A Design of the Cloud Aggregator on the MapReduce in the Multi Cloud

  • Hwang, Chigon;Shin, Hyoyoung;Lee, Jong-Yong;Jung, Kye-Dong
    • International Journal of Internet, Broadcasting and Communication / v.8 no.1 / pp.83-90 / 2016
  • The emergence of the cloud has made it possible to provide a variety of IT services to users. As the number of organizations and companies providing such cloud services increases, many integration problems arise. However, with the advent of recent technologies such as big data, document-oriented databases, and MapReduce, these problems can be solved more readily. This paper designs a Cloud Aggregator that collects information from the cloud systems providing each service and offers it as a service. To do this, we use DBaaS (Database as a Service) and MapReduce techniques. This makes it possible to maintain the functionality of existing systems and to correct problems that may arise when they are combined.

RDF/OWL data management on Map-Reduce architecture: A comparison between approaches

  • Garcia, Guillermo Crocker;Lee, Young-Koo;Lee, Sung-Young
    • Proceedings of the Korean Information Science Society Conference / 2011.06c / pp.142-144 / 2011
  • In a constantly changing world, more and more devices are producing data, and managing these data is indispensable. The MapReduce framework is one solution for managing large amounts of data quickly and correctly. On the other hand, the semantic web (RDF/OWL) is becoming popular and can help manage data efficiently, so that the data can be retrieved and understood by both humans and machines. In this paper, we describe and analyze some projects that manage RDF with the Map-Reduce framework.

Real-time log analysis system for detecting network attacks in a MapReduce environment (MapReduce 환경에서 네트워크 공격 탐지를 위한 실시간 로그 분석 시스템 개발)

  • Chang, Jin-Su;Shin, Jae-Hwan;Chang, Jae-Woo
    • Proceedings of the Korea Information Processing Society Conference / 2017.11a / pp.37-40 / 2017
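  • As advances in network technology have increased Internet penetration, network usage has grown as well. With that growth, however, malicious network access has also increased. Such access can be detected by analyzing the security logs generated on the network, but because network traffic is generated at large scale, processing and analyzing those logs takes a long time. This paper develops a real-time log analysis system for detecting network attacks in a MapReduce environment. To this end, it uses Hadoop MapReduce to extract attributes from security logs and to process large volumes of logs in a distributed manner. By analyzing the processed logs, it also detects network attack patterns as they occur in real time and presents them visually, so that users can grasp the state of the network more easily.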
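
The attribute-extraction and aggregation steps can be pictured with a small in-memory map/shuffle/reduce simulation. Everything specific here is an assumption for illustration: the log line format, the `LOGIN_FAIL` event name, the per-source counting rule, and the threshold are hypothetical, and the abstract does not describe the paper's actual log attributes or detection patterns. A real deployment would run the map and reduce functions as Hadoop jobs.

```python
from collections import defaultdict

# Hypothetical security log format: "<timestamp> <source_ip> <event>".
SAMPLE_LOGS = [
    "2017-11-01T10:00:00 10.0.0.5 LOGIN_FAIL",
    "2017-11-01T10:00:01 10.0.0.5 LOGIN_FAIL",
    "2017-11-01T10:00:02 10.0.0.9 LOGIN_FAIL",
    "2017-11-01T10:00:03 10.0.0.5 LOGIN_FAIL",
    "2017-11-01T10:00:04 10.0.0.7 LOGIN_OK",
]

def map_log(line):
    """Map: extract the attributes of one log line and emit (source_ip, 1) for failures."""
    _timestamp, source_ip, event = line.split()
    if event == "LOGIN_FAIL":
        yield source_ip, 1

def shuffle(pairs):
    """Shuffle: group emitted values by key, as the framework would between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_counts(key, values):
    """Reduce: total the failure count for one source address."""
    return key, sum(values)

def suspicious_sources(lines, threshold):
    """Return sources whose failure count reaches the (assumed) alert threshold."""
    pairs = (pair for line in lines for pair in map_log(line))
    counts = dict(reduce_counts(k, vs) for k, vs in shuffle(pairs).items())
    return {ip: n for ip, n in counts.items() if n >= threshold}

if __name__ == "__main__":
    print(suspicious_sources(SAMPLE_LOGS, threshold=3))  # {'10.0.0.5': 3}
```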