• Title/Summary/Keyword: map-reduce

A Study of Data Collection Method for Efficient Sharing in IoT Environment (사물인터넷(IoT) 환경에서 효율적 공유를 위한 데이터 수집 기법에 대한 연구)

  • Hwang, Chi-Gon;Yoon, Chang-Pyo
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference / 2015.10a / pp.268-269 / 2015
  • The Internet, once accessed mainly through computers, is now extending to the IoT (Internet of Things), and the data it generates are becoming large. If these data are provided to applications without any adjustment, the applications struggle to deliver their intended performance. In this paper, we propose a method that uses MapReduce, a big data processing technique, to filter and refine the collected data. We address the heterogeneity of the data generated by sensors by adding a knowledge identification step to MapReduce, using XMDR for this purpose.
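
The abstract above does not spell out its pipeline, so the following is only a minimal sketch of the idea: a map phase that first normalizes heterogeneous sensor fields (the "knowledge identification" step) and filters bad readings, followed by a reduce phase that aggregates. `FIELD_MAP` is a hypothetical stand-in for an XMDR lookup, and the field names, unit conversion, and plausibility bounds are all assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical stand-in for an XMDR lookup: maps each vendor-specific
# field name to one standard name and a unit-conversion function.
FIELD_MAP = {
    "temp_c": ("temperature_c", lambda v: v),
    "tempF": ("temperature_c", lambda v: (v - 32) * 5 / 9),
    "hum": ("humidity_pct", lambda v: v),
}


def identify(record):
    """Knowledge identification step: rename known fields to standard names."""
    out = {"sensor": record["sensor"]}
    for key, value in record.items():
        if key in FIELD_MAP:
            name, convert = FIELD_MAP[key]
            out[name] = convert(value)
    return out


def map_phase(records):
    """Emit ((sensor, field), value) pairs, dropping implausible readings."""
    for record in records:
        normalized = identify(record)
        for name, value in normalized.items():
            if name == "sensor":
                continue
            if name == "temperature_c" and not -50 <= value <= 60:
                continue  # filter out-of-range readings (assumed bounds)
            yield (normalized["sensor"], name), value


def reduce_phase(pairs):
    """Group the emitted pairs by key and average the values of each key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return {key: sum(vals) / len(vals) for key, vals in groups.items()}


readings = [
    {"sensor": "a", "temp_c": 20.0},
    {"sensor": "a", "tempF": 71.6},    # about 22 degrees Celsius
    {"sensor": "a", "temp_c": 999.0},  # dropped by the filter
    {"sensor": "b", "hum": 40.0},
]
refined = reduce_phase(map_phase(readings))
```

Placing the normalization inside the map phase means the reducer only ever sees standard field names, so readings from differently formatted sensors land under the same key.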

Cost-Effective MapReduce Processing in the Cloud (클라우드 환경에서의 비용 효율적인 맵리듀스 처리)

  • Ryu, Wooseok
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference / 2018.10a / pp.114-115 / 2018
  • This paper studies a mechanism for cost-effective analysis of big data in the cloud environment. Now that electronic medical records can be stored and managed outside the hospital, demand for cloud-based big data analysis is growing among small and medium-sized hospitals. This paper first analyzes Amazon Elastic MapReduce (EMR), a popular cloud framework for big data analysis, and then proposes a cost model for analyzing big data with Amazon EMR at lower cost. Using the proposed model, the user can construct a cost-effective computing cluster that maximizes the effectiveness of the analysis per unit of operational cost.
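
The abstract does not give the cost model itself, so the sketch below is not the paper's model. It only illustrates the general shape of such a decision: estimate the cost of each candidate cluster and choose the cheapest one that still finishes in time. The cluster names, prices, and running times are hypothetical placeholders, not real Amazon EMR figures.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ClusterOption:
    name: str                   # hypothetical label, not a real instance type
    nodes: int
    price_per_node_hour: float  # hypothetical price
    est_hours: float            # assumed estimate of the job's running time

    @property
    def cost(self) -> float:
        """Total cost of running the job once on this cluster."""
        return self.nodes * self.price_per_node_hour * self.est_hours


def cheapest_within_deadline(options, deadline_hours):
    """Return the lowest-cost option that meets the deadline, or None."""
    feasible = [o for o in options if o.est_hours <= deadline_hours]
    return min(feasible, key=lambda o: o.cost) if feasible else None


options = [
    ClusterOption("small", nodes=2, price_per_node_hour=0.10, est_hours=10.0),
    ClusterOption("medium", nodes=4, price_per_node_hour=0.10, est_hours=4.0),
    ClusterOption("large", nodes=8, price_per_node_hour=0.10, est_hours=3.0),
]
choice = cheapest_within_deadline(options, deadline_hours=5.0)
```

With these made-up numbers the mid-sized cluster is chosen: it costs less than the small one because it finishes much sooner, and less than the large one because the extra nodes do not shorten the job proportionally.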

Efficient Computation of Grouping Sets Queries Using MapReduce (맵리듀스에서 Grouping Sets 질의의 효율적인 계산 기법)

  • Park, So-Jeong;Park, Eun-Ju;Lee, Ki Yong
    • Proceedings of the Korea Information Processing Society Conference / 2014.11a / pp.783-786 / 2014
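  • MapReduce is a framework for distributed, parallel processing of large volumes of data across multiple computers. A grouping sets query computes all of several user-specified group-bys; it compensates for the drawback that rollup and cube return too many results by letting the user obtain results only for the desired group-bys. This paper proposes a method for efficiently computing grouping sets queries in a MapReduce environment. The proposed method computes a grouping sets query in stages using two MapReduce jobs. The first MapReduce job computes a 'parent' group-by from which all the group-bys in the grouping sets query can be derived. The second MapReduce job takes the parent group-by as input and computes each of the group-bys in the grouping sets query. When the parent group-by is much smaller than the input data, the proposed method outperforms the naive method that computes each group-by independently from the input data. Experiments show that the proposed method outperforms this naive method.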
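
The two-job strategy in the abstract above can be sketched in a few lines. This is a single-process simulation under stated assumptions, not the authors' implementation: the parent group-by is taken to be the union of all requested columns, the aggregate is SUM (which can be re-aggregated from partial sums), and the function and column names are invented for the example.

```python
from collections import defaultdict


def mapreduce_sum(rows, key_columns, value_column):
    """One simulated MapReduce job: group by key_columns, sum value_column."""
    groups = defaultdict(int)
    for row in rows:
        key = tuple(row[c] for c in key_columns)  # map: emit (key, value)
        groups[key] += row[value_column]          # reduce: sum per key
    return [dict(zip(key_columns, key), **{value_column: total})
            for key, total in groups.items()]


def grouping_sets(rows, group_bys, value_column):
    # Job 1: compute the "parent" group-by over the union of all columns.
    parent_columns = sorted({c for g in group_bys for c in g})
    parent = mapreduce_sum(rows, parent_columns, value_column)
    # Job 2: derive every requested group-by from the (smaller) parent.
    return {tuple(g): mapreduce_sum(parent, list(g), value_column)
            for g in group_bys}


sales = [
    {"region": "east", "product": "a", "year": 2013, "amount": 10},
    {"region": "east", "product": "a", "year": 2014, "amount": 5},
    {"region": "east", "product": "b", "year": 2014, "amount": 7},
    {"region": "west", "product": "a", "year": 2014, "amount": 3},
]
result = grouping_sets(sales, [("region",), ("region", "product")], "amount")
by_region = {r["region"]: r["amount"] for r in result[("region",)]}
```

Here the parent group-by is (product, region), which has three rows instead of four. The saving grows when the input has many rows per parent group, which matches the condition the abstract gives for the method to pay off.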

Capturing Data from Untapped Sources using Apache Spark for Big Data Analytics (빅데이터 분석을 위해 아파치 스파크를 이용한 원시 데이터 소스에서 데이터 추출)

  • Nichie, Aaron;Koo, Heung-Seo
    • The Transactions of The Korean Institute of Electrical Engineers / v.65 no.7 / pp.1277-1282 / 2016
  • The term "Big Data" has been defined to encapsulate a broad spectrum of data sources and data formats. Big data is often described as unstructured because of the variety of its data formats. Even though the traditional method of structuring data in rows and columns has been reinvented as column families or key-value pairs, or replaced entirely with JSON documents in document-based databases, the fact remains that data have to be reshaped to conform to a certain structure before they can be stored persistently on disk. ETL processes are key to restructuring data. However, ETL processes incur additional processing overhead and also require that data sources be maintained in predefined formats. Consequently, data in certain formats are ignored completely, because designing ETL processes that cater for all possible data formats is almost impossible. These unconsidered data sources could provide useful insights if incorporated into big data analytics. In this project, using the big data solution Apache Spark, we tapped into other sources of data stored in their raw formats, such as various text files and compressed files, and combined these data with persistently stored enterprise data in MongoDB for overall data analytics using the MongoDB Aggregation Framework and MapReduce. This differs significantly from traditional ETL systems in that it is compatible with the data regardless of their format at the source.

Efficient Computation of Data Cubes in MapReduce (맵리듀스에서 데이터 큐브의 효율적인 계산 기법)

  • Lee, Ki Yong;Park, Sojeong;Park, Eunju;Park, Jinkyung;Choi, Yeunjung
    • Proceedings of the Korea Information Processing Society Conference / 2014.04a / pp.715-718 / 2014
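  • MapReduce is a programming model used for parallel processing of large volumes of data. The data cube is an operator widely used for multidimensional analysis of large data; it computes the group-by for every possible combination of the given dimension attributes. With n dimension attributes, the data cube computes a total of $2^n$ group-bys. This paper proposes a method for efficiently computing data cubes in a MapReduce environment. The proposed method partitions the $2^n$ group-bys and computes them in stages using $\lceil n/2 \rceil$ MapReduce jobs. By minimizing the size of the intermediate results emitted by the map function in each MapReduce job, the proposed method greatly reduces the total computation cost. Experiments show that the proposed method computes data cubes faster than existing methods.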
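
To make the counts in the abstract above concrete, the sketch below enumerates the $2^n$ group-bys of a cube and computes them with a naive baseline that aggregates each one directly from the input. It does not reproduce the paper's partitioning into $\lceil n/2 \rceil$ jobs, which the abstract does not describe; only the job count is taken from the abstract. The data and column names are made up.

```python
from collections import defaultdict
from itertools import combinations
from math import ceil


def cube_group_bys(dimensions):
    """All 2^n subsets of the dimension attributes, from () to the full set."""
    return [combo
            for size in range(len(dimensions) + 1)
            for combo in combinations(dimensions, size)]


def naive_cube(rows, dimensions, measure):
    """Baseline: aggregate every group-by directly from the input rows."""
    cube = {}
    for group_by in cube_group_bys(dimensions):
        totals = defaultdict(int)
        for row in rows:
            totals[tuple(row[d] for d in group_by)] += row[measure]
        cube[group_by] = dict(totals)
    return cube


dims = ("region", "product", "year")
rows = [
    {"region": "east", "product": "a", "year": 2014, "amount": 5},
    {"region": "east", "product": "b", "year": 2014, "amount": 7},
    {"region": "west", "product": "a", "year": 2013, "amount": 3},
]
group_bys = cube_group_bys(dims)   # 2^3 = 8 group-bys
jobs = ceil(len(dims) / 2)         # job count stated in the abstract
cube = naive_cube(rows, dims, "amount")
```

The baseline scans the full input once per group-by, so its cost grows with $2^n$. That repeated scanning and the intermediate data it produces are what a staged approach aims to cut down.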

Search for a user-centered system design and implementation (사용자 중심 검색 시스템 설계 및 구현)

  • Kim, A-Yong;Park, Man-Seub;Kim, Jong-Moon;Jeong, Dae-Jin;Jung, Hoe-kyung
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference / 2014.05a / pp.619-621 / 2014
  • Along with advances in information technology, the latest IT technologies are becoming topics of interest. Web users struggle to find the information they need among the large volume of search data they must sift through. In this paper, we propose a user-centered search system. The system is designed and implemented using the Lucene search engine, Hadoop's MapReduce, and the Apache projects Nutch, Solr, and HDFS. It lets Web search users collect and index the data they want according to their intentions, and it is expected to be utilized in the search field.

Wind Turbine Placement Optimization at the Catholic University of Pusan Using 3-D Drone Mapping

  • Ambrosia, Matthew Stanley
    • Journal of Environmental Science International / v.30 no.1 / pp.19-28 / 2021
  • Interest in renewable energy continues to grow as a way to reduce pollution, decrease the production of carbon dioxide, and maintain a secure supply of energy, especially since there is a finite supply of cheap oil. Wind energy is one of the most viable options for supplying part of the energy needed to reduce dependence on foreign oil. However, it is difficult to predict the wind speed in an environment with many obstacles such as buildings and trees, and getting accurate dimensions of those obstacles is difficult, particularly on sloped mountainous terrain. In this study, a drone was used to create a 3-D map of the campus of the Catholic University of Pusan. The dimensions and elevations from the 3-D map were used to build a model of the campus in the CFD program Envi-met. Simulations were run for five different wind directions and four different elevations to find the location that would give the highest electrical output for a wind turbine. Considering all of these variables, the optimal location was found to be above the Student Union, which had a 40% higher wind speed and could produce 274% more electrical power than the original wind speed.
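
The relation between the two percentages in the abstract above can be checked with the standard expression for the power available in wind, which scales with the cube of the wind speed. The abstract does not state which formula was used, so this is only a consistency check under the assumption that air density $\rho$ and swept area $A$ are the same in both cases:

$$P = \tfrac{1}{2}\,\rho A v^{3} \quad\Longrightarrow\quad \frac{P_{\text{new}}}{P_{\text{orig}}} = \left(\frac{1.4\,v}{v}\right)^{3} = 1.4^{3} \approx 2.74$$

Under that assumption, a 40% higher wind speed corresponds to about 2.74 times the original power, that is, roughly 274% of the baseline output.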

CAttNet: A Compound Attention Network for Depth Estimation of Light Field Images

  • Dingkang Hua;Qian Zhang;Wan Liao;Bin Wang;Tao Yan
    • Journal of Information Processing Systems / v.19 no.4 / pp.483-497 / 2023
  • Depth estimation is one of the most complicated and difficult problems in light field processing. In this paper, a compound attention convolutional neural network (CAttNet) is proposed to extract depth maps from light field images. To make more effective use of the sub-aperture images (SAIs) of the light field and to reduce the redundancy in the SAIs, we use a compound attention mechanism to weight the channel and spatial dimensions of the feature map after extracting the primary features, so that the network can more efficiently select the required view and the important area within that view. We modified various feature extraction layers so that they extract features more efficiently without adding parameters. By exploring the characteristics of the light field, we increased the network depth and optimized the network structure to reduce the adverse impact of this change. CAttNet can efficiently utilize the correlations and features of different SAIs to generate a high-quality light field depth map. The experimental results show that CAttNet has advantages in both accuracy and time.

The Processing of Spatial Joins using a Bit-map Approximation (비트맵 근사 표현을 이용한 효율적인 공간 조인)

  • 홍남희;김희수
    • Journal of the Korea Computer Industry Society / v.2 no.2 / pp.157-164 / 2001
  • This paper studies the processing of spatial joins. The spatial join operation is generally divided into a filter step and a refinement step. The processing of spatial joins can be greatly improved by using filters that reduce the number of polygons that must be examined to find the intersecting ones. As a result, three possible sets of answers are identified: the positive one, the negative one, and the inconclusive one. To identify all the interesting pairs of polygons with inconclusive answers, it is necessary to access the representation of the polygons so that an exact geometry test can take place. We introduce a bit-map approximation technique that drastically reduces the computation required by the refinement step. Bit-map representations are used to describe the internal, external, and boundary regions of the polygon objects. The proposed scheme increases the chance of trivial acceptance and rejection of data objects, and reduces unnecessary disk accesses in query processing. It has been shown that references to the object data file can be cut by as much as 60%.
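
The three-way outcome of the filter step described above can be sketched with plain integer bitmasks. This is an illustration of the general idea rather than the paper's encoding: each object is assumed to be pre-rasterized on a shared grid into two masks, one for cells lying entirely inside it and one for cells its boundary passes through, with all other cells treated as exterior. The grid size, the cell numbering, and the function names are assumptions.

```python
def bits(cells):
    """Build a bitmask with one bit set per grid cell index."""
    mask = 0
    for cell in cells:
        mask |= 1 << cell
    return mask


def classify_pair(a_interior, a_boundary, b_interior, b_boundary):
    """Filter step: decide a pair from its bitmaps where possible."""
    a_any = a_interior | a_boundary
    b_any = b_interior | b_boundary
    # A cell fully inside one object that the other object touches
    # proves an intersection without looking at the exact geometry.
    if (a_interior & b_any) or (b_interior & a_any):
        return "intersect"       # trivial acceptance
    # No shared non-exterior cell: the objects cannot intersect.
    if not (a_any & b_any):
        return "disjoint"        # trivial rejection
    # Only boundary cells overlap: the exact geometry test is needed.
    return "inconclusive"


# Cells of a 4x4 grid are numbered 0..15, row by row (assumed layout).
a_int, a_bnd = bits([5]), bits([0, 1, 2, 4, 6, 8, 9, 10])
hit = classify_pair(a_int, a_bnd, bits([]), bits([5, 6]))
miss = classify_pair(a_int, a_bnd, bits([]), bits([15]))
unsure = classify_pair(a_int, a_bnd, bits([]), bits([2, 3]))
```

Only the "inconclusive" pairs require fetching the full polygon data, which is where the reduction in disk accesses reported in the abstract would come from.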

High-density genetic mapping using GBS in Chrysanthemum

  • Chung, Yong Suk;Cho, Jin Woong;Kim, Changsoo
    • Proceedings of the Korean Society of Crop Science Conference / 2017.06a / pp.57-57 / 2017
  • Chrysanthemum is one of the most important floral crops in Korea, with production worth about 7 billion dollars (1 billion for potted plants and 6 billion for cut flowers) in 2013. However, it is difficult to breed and to study genetically because 1) it is highly self-incompatible, 2) it is an outcrossing crop with heterozygotes, and 3) commercial cultivars are hexaploid (2n = 6x = 54). Although a low-density genetic map and a QTL study have been reported, they are not sufficient for marker-assisted selection and other genetic studies. Therefore, we are trying to construct a high-density genetic map using GBS with about 100 $F_1$s of C. boreale, which is diploid (2n = 2x = 18, about 2.8Gb), instead of commercial cultivars. Since Chrysanthemum is outcrossing, a two-way pseudo-testcross model would be used to construct the genetic map. Also, genotyping-by-sequencing (GBS) would be utilized to generate a sufficient number of markers and to maximize genomic representation in a cost-effective manner. The completed sequences would be analyzed with the TASSEL-GBS pipeline. To reduce sequence error, only the first 64 sequences, which have almost zero percent error, would be incorporated in the pipeline for the analysis. In addition, to reduce the errors caused by low coverage that are common in heterozygous crops, two rare cutters (NsiI and MseI) were used to increase sequence depth. A Markov algorithm would also be used to deal with missing data. Further, sparsely placed markers on the physical map would be used as anchors to overcome problems caused by low coverage. For this purpose, markers were generated from the transcriptome of Chrysanthemum using the MISA program. Among those, 10 simple sequence repeat (SSR) markers, which are evenly distributed along each chromosome and polymorphic between the two parents, would be selected.
