• Title/Summary/Keyword: map-reduce

Paging extensions for HMIPv6: P-HMIPv6 (페이징 서비스를 위한 HMIPv6의 확장: P-HMIPv6)

  • 이준섭
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference / 2003.05a / pp.222-225 / 2003
  • IP-based mobility management is a core technology in the next-generation network environment. MIPv6 was introduced to support mobility in the global network, and HMIPv6 is being considered as a local mobility management protocol that reduces the signaling overhead of mobility management. Paging technology has also been introduced to IP-based networks to reduce signaling overhead and power consumption. In this paper, we propose P-HMIPv6, which enables an IP paging service in HMIPv6 to reduce the signaling overhead of local mobility management within a MAP domain of HMIPv6.

Boundary-preserving Stereo Matching based on Confidence Region Detection and Disparity Map Refinement (신뢰 영역 검출 및 시차 지도 재생성 기반 경계 보존 스테레오 매칭)

  • Yun, In Yong;Kim, Joong Kyu
    • Journal of the Institute of Electronics and Information Engineers / v.53 no.5 / pp.132-140 / 2016
  • In this paper, we propose a boundary-preserving stereo matching method based on adaptive disparity adjustment using confidence region detection. To find the initial disparity map, we compute the data cost in the CIE Lab color space combined with the gradient space and apply double cost aggregation. We then perform left/right consistency checking to sort out mismatched regions; this check typically fails for occluded and mismatched pixels. A pixel in the left disparity map is marked "inconsistent" if the disparity of its counterpart pixel differs by more than one pixel (see the sketch below). To distinguish errors caused by disparity discontinuities, we first detect a confidence map by applying Mean-shift segmentation to the initial disparity map, and we then use this confidence map to adjust the disparity map and reduce its errors. Experimental results demonstrate that the proposed method produces higher-quality disparity maps than existing methods by successfully preserving disparity discontinuities.
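
The consistency rule quoted in this abstract is concrete enough to illustrate. The Java sketch below is not the authors' code; it assumes integer disparity maps indexed [y][x] and a sentinel value for rejected pixels, and it marks a left-map pixel inconsistent when its right-view counterpart (at column x - d) disagrees by more than one pixel.

```java
public final class ConsistencyCheck {

    static final int INCONSISTENT = -1; // sentinel for rejected pixels (assumption)

    /** Marks left-map pixels whose right-view counterpart disagrees by > 1 pixel. */
    static int[][] check(int[][] dispL, int[][] dispR) {
        int h = dispL.length, w = dispL[0].length;
        int[][] out = new int[h][w];
        for (int y = 0; y < h; y++) {
            for (int x = 0; x < w; x++) {
                int d = dispL[y][x];
                int xr = x - d; // counterpart column in the right view
                boolean consistent =
                        xr >= 0 && xr < w && Math.abs(dispR[y][xr] - d) <= 1;
                out[y][x] = consistent ? d : INCONSISTENT; // occluded/mismatched pixels rejected
            }
        }
        return out;
    }
}
```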

The Analysis Framework for User Behavior Model using Massive Transaction Log Data (대규모 로그를 사용한 유저 행동모델 분석 방법론)

  • Lee, Jongseo;Kim, Songkuk
    • The Journal of Bigdata / v.1 no.2 / pp.1-8 / 2016
  • User activity logs contain a great deal of hidden information, but they are unstructured and too massive to process directly, so much of that information remains untapped. In particular, they contain time-series data that can reveal a great deal. Log data cannot be used directly to analyze user behavior, however: analyzing a user activity model requires a transformation step supported by an additional framework, so the analysis framework must be designed before the data is approached. In this paper, we suggest a novel framework for analyzing user activity models effectively. It includes a MapReduce process for quickly analyzing massive data in a distributed environment (see the sketch below) and a data architecture designed for user activity analysis. We also explain the data model in detail, based on the log design of a real online service, and describe which analysis model fits which data model. This work improves the understanding of how to process massive logs and design analysis models.
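
The MapReduce stage of such a framework can be illustrated with a small Hadoop job. This is a hedged sketch rather than the paper's implementation: it assumes tab-separated log lines of the form timestamp, userId, action (with zero-padded timestamps) and reassembles each user's events into a time-ordered behavior sequence.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

public class UserSessionBuilder {

    /** Map: raw log line -> (userId, "timestamp:action"). */
    public static class LogMapper extends Mapper<LongWritable, Text, Text, Text> {
        @Override
        protected void map(LongWritable key, Text line, Context ctx)
                throws IOException, InterruptedException {
            String[] f = line.toString().split("\t"); // timestamp, userId, action (assumed)
            if (f.length >= 3) {
                ctx.write(new Text(f[1]), new Text(f[0] + ":" + f[2]));
            }
        }
    }

    /** Reduce: order one user's events by timestamp into a behavior sequence. */
    public static class SequenceReducer extends Reducer<Text, Text, Text, Text> {
        @Override
        protected void reduce(Text user, Iterable<Text> events, Context ctx)
                throws IOException, InterruptedException {
            List<String> seq = new ArrayList<>();
            for (Text e : events) seq.add(e.toString());
            Collections.sort(seq); // zero-padded timestamps sort lexicographically
            ctx.write(user, new Text(String.join(" -> ", seq)));
        }
    }
}
```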

A Benchmark Test of Spatial Big Data Processing Tools and a MapReduce Application

  • Nguyen, Minh Hieu;Ju, Sungha;Ma, Jong Won;Heo, Joon
    • Journal of the Korean Society of Surveying, Geodesy, Photogrammetry and Cartography / v.35 no.5 / pp.405-414 / 2017
  • Spatial data processing often poses challenges due to the unique characteristics of spatial data, and this becomes more complex for spatial big data. Some tools have been developed and provided to users, but they are not common for regular users. This paper presents a benchmark test of two notable spatial big data processing tools, GIS Tools for Hadoop and SpatialHadoop. A MapReduce application is also introduced as a baseline to evaluate the effectiveness of the two tools and to measure the impact of the number of mappers/reducers on performance (see the sketch below). Using these tools and New York taxi trajectory data, we perform a spatial filtering task that selects the drop-off locations within the Manhattan area, observing performance as the data size increases and the number of worker nodes changes. The results are as follows: 1) GIS Tools for Hadoop automatically creates a Quadtree index for each spatial operation, which improves performance significantly, although users should be familiar with Java to handle the tool conveniently. 2) SpatialHadoop does not automatically create a spatial index, so its performance on the same operation is much lower than that of GIS Tools for Hadoop; however, SpatialHadoop achieved the best result on range queries. 3) The performance of our MapReduce application increased fourfold when the number of reducers was raised from 1 to 12.
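
A baseline filter of the kind benchmarked here is easy to sketch. The Hadoop mapper below is illustrative only, not the paper's application: it assumes CSV taxi records with the drop-off longitude and latitude in columns 9 and 10, and it uses a rough bounding box as a stand-in for the Manhattan boundary (which is really a polygon).

```java
import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class DropoffFilterMapper
        extends Mapper<LongWritable, Text, NullWritable, Text> {

    // Rough bounding box around Manhattan (lon/lat) -- an assumption for this sketch
    private static final double MIN_LON = -74.03, MAX_LON = -73.90;
    private static final double MIN_LAT =  40.70, MAX_LAT =  40.88;

    @Override
    protected void map(LongWritable key, Text line, Context ctx)
            throws IOException, InterruptedException {
        String[] cols = line.toString().split(",");
        if (cols.length < 11) return; // skip malformed rows
        try {
            double lon = Double.parseDouble(cols[9]);  // drop-off longitude (assumed column)
            double lat = Double.parseDouble(cols[10]); // drop-off latitude (assumed column)
            if (lon >= MIN_LON && lon <= MAX_LON && lat >= MIN_LAT && lat <= MAX_LAT) {
                ctx.write(NullWritable.get(), line);   // record passes the filter
            }
        } catch (NumberFormatException ignored) {
            // header line or corrupted field: drop it
        }
    }
}
```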

Statistical Approach to Sentiment Classification using MapReduce (맵리듀스를 이용한 통계적 접근의 감성 분류)

  • Kang, Mun-Su;Baek, Seung-Hee;Choi, Young-Sik
    • Science of Emotion and Sensibility / v.15 no.4 / pp.425-440 / 2012
  • As the scale of the Internet grows, the amount of subjective data increases, and with it the need to classify subjective data automatically. Sentiment classification is the classification of subjective data by sentiment type. Previous sentiment classification research has focused on NLP (natural language processing) and sentiment word dictionaries, and it suffers from two critical problems: the performance of morpheme analysis in NLP has fallen short of expectations, and it is not easy to choose sentiment words or to determine how strongly a word carries a sentiment. To solve these problems, this paper suggests combining web-scale data with a statistical approach to sentiment classification. The proposed method uses statistics of words drawn from web-scale data rather than trying to determine the meaning of each word; unlike previous research that depended on NLP algorithms, it focuses on the data itself (see the sketch below). Hadoop and MapReduce are used to handle the web-scale data.
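
The statistical core of such an approach amounts to counting word occurrences per sentiment class at scale. The Hadoop job below is a minimal sketch under assumed conventions, not the authors' system: input lines are label-tab-text with labels such as pos/neg, and the resulting (label, word) counts could feed a naive-Bayes-style scorer.

```java
import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

public class SentimentWordCount {

    /** Map: "label<TAB>text" -> (("label<TAB>word"), 1) for every token. */
    public static class TokenMapper
            extends Mapper<LongWritable, Text, Text, LongWritable> {
        private static final LongWritable ONE = new LongWritable(1);
        @Override
        protected void map(LongWritable key, Text value, Context ctx)
                throws IOException, InterruptedException {
            String[] parts = value.toString().split("\t", 2);
            if (parts.length < 2) return;
            String label = parts[0]; // e.g. "pos" or "neg" (assumed labeling)
            for (String token : parts[1].toLowerCase().split("\\s+")) {
                if (!token.isEmpty()) {
                    ctx.write(new Text(label + "\t" + token), ONE);
                }
            }
        }
    }

    /** Reduce: sum the occurrences of each (label, word) pair. */
    public static class SumReducer
            extends Reducer<Text, LongWritable, Text, LongWritable> {
        @Override
        protected void reduce(Text key, Iterable<LongWritable> vals, Context ctx)
                throws IOException, InterruptedException {
            long sum = 0;
            for (LongWritable v : vals) sum += v.get();
            ctx.write(key, new LongWritable(sum)); // ("pos<TAB>word", count)
        }
    }
}
```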

Dynamic Load Management Method for Spatial Data Stream Processing on MapReduce Online Frameworks (맵리듀스 온라인 프레임워크에서 공간 데이터 스트림 처리를 위한 동적 부하 관리 기법)

  • Jeong, Weonil
    • Journal of the Korea Academia-Industrial cooperation Society / v.19 no.8 / pp.535-544 / 2018
  • As mobile devices equipped with various sensors and high-quality wireless network communication functions spread, the amount of spatio-temporal data generated by mobile devices in various service fields is increasing rapidly. In conventional research on processing large real-time spatio-temporal streams, it is very difficult to apply a Hadoop-based spatial big data system, designed as a batch processing platform, to a real-time service for spatio-temporal data streams. This paper extends the MapReduce Online framework to support real-time query processing over continuously arriving spatio-temporal data streams and proposes a load management method that distributes overloads for efficient query processing. The proposed scheme balances load across nodes dynamically, based on the inflow rate and the load factor of the input data over a spatial partitioning (see the sketch below). Experiments show that efficient query processing can be maintained by redistributing the spatial data stream of a region to shared resources whenever load management is required in that region.
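
The policy described, handing the stream of an overloaded spatial partition to shared resources, can be modeled independently of the framework. The Java class below is an assumption-laden sketch, not the paper's implementation: partitions report inflow rates, workers report utilization, and any cell whose load factor crosses a threshold is reassigned to the least-loaded worker.

```java
import java.util.HashMap;
import java.util.Map;

public class SpatialLoadManager {

    private final Map<String, Double> partitionRate = new HashMap<>(); // tuples/sec per grid cell
    private final Map<String, Double> workerLoad = new HashMap<>();    // utilization per node
    private final double threshold; // load factor above which a cell counts as overloaded

    public SpatialLoadManager(double threshold) { this.threshold = threshold; }

    public void reportPartition(String cell, double tuplesPerSec) { partitionRate.put(cell, tuplesPerSec); }

    public void reportWorker(String node, double utilization) { workerLoad.put(node, utilization); }

    /** Returns (cell -> node) reassignments for partitions whose load factor exceeds the threshold. */
    public Map<String, String> rebalance(double capacityPerNode) {
        Map<String, String> moves = new HashMap<>();
        for (Map.Entry<String, Double> p : partitionRate.entrySet()) {
            double loadFactor = p.getValue() / capacityPerNode;
            if (loadFactor > threshold) {
                String target = leastLoadedWorker();
                if (target != null) {
                    moves.put(p.getKey(), target);                     // hand the hot cell's stream over
                    workerLoad.merge(target, loadFactor, Double::sum); // account for the added load
                }
            }
        }
        return moves;
    }

    private String leastLoadedWorker() {
        String best = null;
        double min = Double.MAX_VALUE;
        for (Map.Entry<String, Double> w : workerLoad.entrySet()) {
            if (w.getValue() < min) { min = w.getValue(); best = w.getKey(); }
        }
        return best;
    }
}
```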

Design and Implementation of a Search Engine based on Apache Spark (아파치 스파크 기반 검색엔진의 설계 및 구현)

  • Park, Ki-Sung;Choi, Jae-Hyun;Kim, Jong-Bae;Park, Jae-Won
    • Journal of the Korea Institute of Information and Communication Engineering / v.21 no.1 / pp.17-28 / 2017
  • Research on data has recently been very active because data has become more valuable. Web crawlers, programs for collecting data, have recently drawn attention because they can be applied in various fields. A web crawler can be defined as a tool that analyzes web pages and collects URLs by traversing web servers in an automated manner. For big data processing, distributed web crawlers based on Hadoop MapReduce are widely used, but they are difficult to use and constrained in performance. Apache Spark, an in-memory computing platform, is an alternative to MapReduce. A search engine, one of the main applications of a web crawler, displays the information matching a search keyword from the data the crawler has gathered. If a search engine is implemented with a Spark-based web crawler instead of a traditional MapReduce-based one, data collection becomes considerably faster (see the sketch below).
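
The indexing step of such a search engine is compact in Spark's Java API. The sketch below is not the paper's system: it assumes crawled pages stored on HDFS as url-tab-text lines (the paths are placeholders) and builds an inverted index mapping each term to the pages that contain it.

```java
import java.util.ArrayList;
import java.util.List;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import scala.Tuple2;

public class SparkInvertedIndex {
    public static void main(String[] args) {
        SparkConf conf = new SparkConf().setAppName("inverted-index");
        try (JavaSparkContext sc = new JavaSparkContext(conf)) {
            // Each input line: "url<TAB>page text" (assumed layout; paths are placeholders)
            JavaRDD<String> pages = sc.textFile("hdfs:///crawl/pages.tsv");

            JavaPairRDD<String, String> postings = pages
                .flatMapToPair(line -> {
                    List<Tuple2<String, String>> out = new ArrayList<>();
                    String[] parts = line.split("\t", 2);
                    if (parts.length == 2) {
                        for (String term : parts[1].toLowerCase().split("\\W+")) {
                            if (!term.isEmpty()) out.add(new Tuple2<>(term, parts[0]));
                        }
                    }
                    return out.iterator();
                })
                .distinct()                          // one posting per (term, url) pair
                .reduceByKey((a, b) -> a + "," + b); // term -> "url1,url2,..."

            postings.saveAsTextFile("hdfs:///crawl/index");
        }
    }
}
```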

A Customized Tourism System Using Log Data on Hadoop (로그 데이터를 이용한 하둡기반 맞춤형 관광시스템)

  • Ya, Ding;Kim, Kang-Chul
    • The Journal of the Korea institute of electronic communication sciences / v.13 no.2 / pp.397-404 / 2018
  • As Internet usage increases, a great deal of user behavior is recorded in log files, and research and industrial applications based on these logs have recently become active. This paper uses Hadoop, an open-source distributed computing platform, and proposes a customized tourism system that analyzes user behavior in log files. The proposed system uses Google Analytics to obtain users' log files from the websites they visit and stores the search terms extracted by MapReduce in HDFS. It also gathers, with the Octopus application, features of the sightseeing places and cities that travelers want to tour from travel guide websites, and it suggests customized cities by matching the search terms against the city features. An NBP (next bit permutation) algorithm is used to rearrange the search terms and city features to increase the probability of a match (see the sketch below). To show the performance of the proposed system, customized cities were suggested by analyzing the log files of 39 users.
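
The abstract does not define NBP, but if it refers to the standard next-bit-permutation bit trick (an assumption), the operation produces, from a bitmask, the next mask with the same number of set bits, which allows every k-element subset of terms or features to be enumerated when testing matches:

```java
public final class NextBitPermutation {

    /** Next larger int with the same popcount (Gosper's hack); v must be nonzero. */
    static int next(int v) {
        int t = v | (v - 1); // set every bit below the lowest set bit of v
        int u = ~t;
        return (t + 1) | (((u & -u) - 1) >>> (Integer.numberOfTrailingZeros(v) + 1));
    }

    public static void main(String[] args) {
        // Enumerate all 2-element subsets of 4 items as bitmasks:
        // 0011, 0101, 0110, 1001, 1010, 1100
        for (int mask = 0b0011; mask < (1 << 4); mask = next(mask)) {
            System.out.println(Integer.toBinaryString(mask));
        }
    }
}
```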

Processing Method of Mass Small File Using Hadoop Platform (하둡 플랫폼을 이용한 대량의 스몰파일 처리방법)

  • Kim, Chang-Bok;Chung, Jae-Pil
    • Journal of Advanced Navigation Technology / v.18 no.4 / pp.401-408 / 2014
  • Hadoop is composed of the MapReduce programming model for distributed processing and the HDFS distributed file system. Hadoop is a suitable framework for big data processing, but processing masses of small files raises several problems: one mapper is created per file, and the NameNode requires a large amount of memory to store the metadata of every file. This paper compares and evaluates various methods for processing masses of small files on the Hadoop platform. Processing files in a general compression format is inadequate because each file is handled by a single mapper regardless of its size. Processing with SequenceFiles or Hadoop archive files removes the NameNode memory problem by compressing and combining the small files (see the packing sketch below), and Hadoop archive files are faster than SequenceFiles in terms of small-file combining time. Processing with the CombineFileInputFormat class requires no prior combining of the small files and achieves a speed similar to that of ordinary big data processing.
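
The SequenceFile remedy compared in this paper can be sketched directly: many small files are packed into one container keyed by filename, so the NameNode tracks a single large file and one mapper can stream many records. The sketch below uses Hadoop's SequenceFile.Writer option API; the paths are illustrative assumptions.

```java
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.BytesWritable;
import org.apache.hadoop.io.IOUtils;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Text;

public class SmallFilePacker {
    public static void main(String[] args) throws IOException {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);
        Path src = new Path("/data/small-files"); // placeholder input directory
        Path dst = new Path("/data/packed.seq");  // placeholder output container

        try (SequenceFile.Writer writer = SequenceFile.createWriter(conf,
                SequenceFile.Writer.file(dst),
                SequenceFile.Writer.keyClass(Text.class),
                SequenceFile.Writer.valueClass(BytesWritable.class))) {
            for (FileStatus st : fs.listStatus(src)) {
                if (!st.isFile()) continue;
                byte[] buf = new byte[(int) st.getLen()];
                try (FSDataInputStream in = fs.open(st.getPath())) {
                    IOUtils.readFully(in, buf, 0, buf.length);
                }
                // key: original filename, value: raw file contents
                writer.append(new Text(st.getPath().getName()), new BytesWritable(buf));
            }
        }
    }
}
```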

Applying TIPC Protocol for Increasing Network Performance in Hadoop-based Distributed Computing Environment (Hadoop 기반 분산 컴퓨팅 환경에서 네트워크 I/O의 성능개선을 위한 TIPC의 적용과 분석)

  • Yoo, Dae-Hyun;Chung, Sang-Hwa;Kim, Tae-Hun
    • Journal of KIISE:Computer Systems and Theory / v.36 no.5 / pp.351-359 / 2009
  • Recently, with the growth of data on the Internet, platform technologies that can process huge data volumes effectively, such as the Google platform and Hadoop, have attracted attention. On such platforms, network I/O overheads arise from sending task outputs under the MapReduce operation, the programming model that supports parallel computation in large cluster systems. In this paper, we suggest applying the TIPC (Transparent Inter-Process Communication) protocol to reduce network I/O overheads and increase network performance in distributed computing environments. TIPC has a lightweight protocol stack and spends relatively less CPU time than TCP because of its simple connection establishment and logical addressing. We analyze the main features of the Hadoop-based distributed computing system and build an experimental model that can be used to compare the performance of various protocols. In the experimental results, TIPC showed higher bandwidth and lower CPU overhead than the other protocols.