• Title/Summary/Keyword: MapReduce Framework

Conversion of Large RDF Data using Hash-based ID Mapping Tables with MapReduce Jobs (맵리듀스 잡을 사용한 해시 ID 매핑 테이블 기반 대량 RDF 데이터 변환 방법)

  • Kim, InA;Lee, Kyu-Chul
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference
    • /
    • 2021.10a
    • /
    • pp.236-239
    • /
    • 2021
  • With the growth of AI technology, the scale of Knowledge Graphs continues to expand. Knowledge Graphs are mainly expressed in RDF, as sets of connected triples. Many RDF stores compress RDF triples by transforming their terms into condensed IDs. However, converting a large volume of RDF triples incurs high processing time and memory overhead, because each term must be looked up in a large ID mapping table. In this paper, we propose a method for converting RDF triples using hash-based ID mapping tables with MapReduce, a software framework for parallel, distributed processing. The proposed method not only transforms RDF triples into integer-based IDs but also improves conversion speed and reduces memory overhead. In an experiment on LUBM, the proposed method reduced the dataset size by about 3.8 times, and the conversion took about 106 seconds. (An illustrative sketch of the hash-based conversion follows below.)

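The abstract above describes replacing lookups in a large, shared ID mapping table with IDs derived directly from term hashes inside MapReduce jobs. Below is a minimal, illustrative Python sketch of that idea; the function names (term_id, map_triple, reduce_dict) and the 63-bit ID width are assumptions for illustration, not details from the paper.

    # Illustrative sketch only: hash-derived integer IDs for RDF terms, written as
    # map/reduce functions that could run under Hadoop Streaming or a similar framework.
    import hashlib

    def term_id(term: str, bits: int = 63) -> int:
        # A stable integer ID from the term's hash, so mappers on different nodes
        # assign the same ID without consulting a shared mapping table.
        digest = hashlib.sha1(term.encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big") % (1 << bits)

    def map_triple(line: str):
        # Input: one N-Triples-style line "subject predicate object ."
        s, p, o = line.rstrip(" .\n").split(None, 2)
        yield ("triple", (term_id(s), term_id(p), term_id(o)))   # ID-encoded triple
        for t in (s, p, o):
            yield ("dict", (term_id(t), t))                      # entry for the ID mapping table

    def reduce_dict(pairs):
        # Deduplicate (id, term) pairs to materialize the hash-based ID mapping table.
        return sorted(set(pairs))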

SPARQL Query Processing System over Scalable Triple Data using SparkSQL Framework (SparQLing : SparkSQL 기반 대용량 트리플 데이터를 위한 SPARQL 질의 시스템 구축)

  • Jeon, MyungJoong;Hong, JinYoung;Park, YoungTack
    • Journal of KIISE
    • /
    • v.43 no.4
    • /
    • pp.450-459
    • /
    • 2016
  • RDF data continues to grow in scale every year; hence, the way SPARQL queries are processed needs to change to keep queries fast. SPARQL query processing has been studied on scalable distributed processing frameworks. Previous studies indicate that query engines based on Hadoop (MapReduce) are not suitable for real-time processing because of their repetitive jobs, while it is difficult to build a query engine directly on an in-memory distributed query engine because the low-level distributed structure must be taken into account. In this paper, we propose a method for constructing a query engine that improves query processing speed over massive triple data. The engine processes SPARQL queries using SparkSQL, an in-memory distributed query processing framework; SparkSQL is a high-level distributed query engine that supports ordinary SQL statements. To process a SPARQL query, the Algebra Tree generated by Jena is translated into a Spark Algebra Tree applicable to the Spark system, and the system then generates the corresponding SparkSQL query. Furthermore, we propose a DataFrame-based triple property table design for more efficient query processing in Spark. Finally, we verify the validity of the approach through a comparative evaluation against a query engine built on the existing distributed processing framework. (A minimal SparkSQL sketch follows below.)
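
Since the abstract explains that SPARQL algebra is translated into SparkSQL over DataFrames, a minimal PySpark sketch of the core rewriting idea may help; the s/p/o schema, the sample data, and the query are assumptions for illustration and do not reproduce the paper's property-table design or its Jena-based translator.

    # Minimal sketch: a SPARQL basic graph pattern answered as a SparkSQL self-join
    # over a plain (subject, predicate, object) triple table.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("sparql-over-sparksql").getOrCreate()

    triples = spark.createDataFrame(
        [("alice", "worksFor", "univA"), ("alice", "name", "Alice"),
         ("bob", "worksFor", "univA"), ("bob", "name", "Bob")],
        ["s", "p", "o"])
    triples.createOrReplaceTempView("triples")

    # SPARQL: SELECT ?name WHERE { ?x worksFor univA . ?x name ?name }
    result = spark.sql("""
        SELECT t2.o AS name
        FROM triples t1 JOIN triples t2 ON t1.s = t2.s
        WHERE t1.p = 'worksFor' AND t1.o = 'univA' AND t2.p = 'name'
    """)
    result.show()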

Maximum Product Detection Algorithm for Group Testing Frameworks

  • Seong, Jin-Taek
    • The Journal of Korea Institute of Information, Electronics, and Communication Technology
    • /
    • v.13 no.2
    • /
    • pp.95-101
    • /
    • 2020
  • In this paper, we consider a group testing (GT) framework, whose goal is to find a set of defective samples among a large number of samples. For this framework, we propose a maximum product detection algorithm (MPDA) based on maximum a posteriori (MAP) probability. The key idea of the algorithm is iterative detection that propagates belief to neighboring samples by exchanging marginal probabilities between sample nodes and output nodes. The belief propagation algorithm, the conventional approach, has been used to detect defective samples, but computing the marginal probability at each output node, which combines the marginal probabilities coming from its sample nodes, is computationally expensive. We show that the proposed MPDA reduces runtime by up to 12% while its performance is only slightly degraded compared to belief propagation, and we confirm the performance difference through simulations. (An illustrative message-passing sketch follows below.)
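
For readers unfamiliar with message passing over group-testing pools, the toy Python sketch below implements the conventional belief-propagation (sum-product) baseline for noiseless group testing that the paper compares against; it is not the proposed MPDA, and the pool density, prior, and iteration count are arbitrary illustration values.

    import numpy as np

    rng = np.random.default_rng(0)
    n, m, p = 12, 8, 0.15                          # samples, tests, prior defect rate
    A = rng.random((m, n)) < 0.3                   # pooling matrix: test j contains sample i
    x_true = rng.random(n) < p                     # hidden defective indicators
    y = (A.astype(int) @ x_true.astype(int)) > 0   # noiseless OR test outcomes

    q = np.full((m, n), p)                         # sample-to-test messages P(x_i = 1)
    for _ in range(10):
        r1 = np.ones((m, n))                       # test-to-sample message for x_i = 1
        r0 = np.ones((m, n))                       # test-to-sample message for x_i = 0
        for j in range(m):
            idx = np.where(A[j])[0]
            if not y[j]:
                r1[j, idx] = 0.0                   # negative test: every member is clean
            else:
                for i in idx:
                    others = idx[idx != i]
                    r0[j, i] = 1.0 - np.prod(1.0 - q[j, others])
        for i in range(n):                         # sample-to-test update from the other tests
            tests = np.where(A[:, i])[0]
            for j in tests:
                others = tests[tests != j]
                a = p * np.prod(r1[others, i])
                b = (1 - p) * np.prod(r0[others, i])
                q[j, i] = a / (a + b) if a + b > 0 else 0.0

    post = np.empty(n)                             # posterior marginals per sample
    for i in range(n):
        tests = np.where(A[:, i])[0]
        a = p * np.prod(r1[tests, i])
        b = (1 - p) * np.prod(r0[tests, i])
        post[i] = a / (a + b) if a + b > 0 else 0.0
    print((post > 0.5).astype(int), x_true.astype(int))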

Framework of Online Shopping Service based on M2M and IoT for Handheld Devices in Cloud Computing (클라우드 컴퓨팅에서 Handheld Devices 기반의 M2M 및 IoT 온라인 쇼핑 서비스 프레임워크)

  • Alsaffar, Aymen Abdullah;Aazam, Mohammad;Park, Jun-Young;Huh, Eui-Nam
    • Proceedings of the Korea Information Processing Society Conference
    • /
    • 2013.05a
    • /
    • pp.179-182
    • /
    • 2013
  • We develop a framework architecture for online shopping services based on M2M and IoT for handheld devices in cloud computing. The MapReduce model is used to simplify large-scale data processing when users search for products to purchase online, providing efficient processing and fast response times. This gives users an enhanced Quality of Experience (QoE) as well as Quality of Service (QoS) when searching for and purchasing products online over big data. (A toy map/reduce product-search sketch follows below.)
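
As a rough illustration of how a MapReduce-style product search could work in such a framework, the Python sketch below filters catalog records in a map step and aggregates matches per product in a reduce step; the catalog fields and the aggregation rule are hypothetical, not taken from the paper.

    from collections import defaultdict

    catalog = [
        {"id": "p1", "name": "usb cable", "price": 3.5},
        {"id": "p2", "name": "usb charger", "price": 9.9},
        {"id": "p3", "name": "hdmi cable", "price": 7.0},
    ]

    def map_record(record, query):
        # Map step: emit (product_id, record) only for records matching the user's query.
        if query.lower() in record["name"]:
            yield record["id"], record

    def reduce_matches(grouped):
        # Reduce step: keep the cheapest offer per product (a stand-in for any aggregation).
        return {pid: min(recs, key=lambda r: r["price"]) for pid, recs in grouped.items()}

    grouped = defaultdict(list)
    for rec in catalog:
        for pid, r in map_record(rec, "usb"):
            grouped[pid].append(r)
    print(reduce_matches(grouped))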

Cost-Effective MapReduce Processing in the Cloud (클라우드 환경에서의 비용 효율적인 맵리듀스 처리)

  • Ryu, Wooseok
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference
    • /
    • 2018.10a
    • /
    • pp.114-115
    • /
    • 2018
  • This paper studies a mechanism for cost-effective analysis of big data in the cloud. Recently, as electronic medical records may be stored and managed outside the hospital, demand for cloud-based big data analysis is growing in small and medium-sized hospitals. This paper first analyzes Amazon Elastic MapReduce (EMR), a popular cloud framework for big data analysis, and then proposes a cost model for analyzing big data with Amazon EMR at lower cost. Using the proposed model, a user can construct a cost-effective computing cluster that maximizes analysis effectiveness per unit of operational cost. (A hypothetical cost-model sketch follows below.)

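To make the idea of "effectiveness per operational cost" concrete, here is a small hypothetical Python sketch; the rates, the EMR surcharge, the runtime model, and the effectiveness metric are placeholders and are not the cost model proposed in the paper.

    def cluster_cost(nodes, runtime_hours, ec2_rate=0.2, emr_rate=0.05):
        # Total cost = (instance rate + EMR surcharge) * nodes * billed hours.
        return nodes * (ec2_rate + emr_rate) * runtime_hours

    def pick_cluster(job_node_hours, candidates=range(2, 33)):
        best = None
        for n in candidates:
            # Crude assumption: runtime shrinks inversely with node count, plus fixed overhead.
            runtime = job_node_hours / n + 0.1
            cost = cluster_cost(n, runtime)
            effectiveness = 1.0 / (cost * runtime)       # cheaper and faster is better
            if best is None or effectiveness > best[2]:
                best = (n, cost, effectiveness)
        return best

    print(pick_cluster(job_node_hours=40))   # -> (nodes, total cost, effectiveness score)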

Capturing Data from Untapped Sources using Apache Spark for Big Data Analytics (빅데이터 분석을 위해 아파치 스파크를 이용한 원시 데이터 소스에서 데이터 추출)

  • Nichie, Aaron;Koo, Heung-Seo
    • The Transactions of The Korean Institute of Electrical Engineers
    • /
    • v.65 no.7
    • /
    • pp.1277-1282
    • /
    • 2016
  • The term "Big Data" has been defined to encapsulate a broad spectrum of data sources and data formats. It is often described to be unstructured data due to its properties of variety in data formats. Even though the traditional methods of structuring data in rows and columns have been reinvented into column families, key-value or completely replaced with JSON documents in document-based databases, the fact still remains that data have to be reshaped to conform to certain structure in order to persistently store the data on disc. ETL processes are key in restructuring data. However, ETL processes incur additional processing overhead and also require that data sources are maintained in predefined formats. Consequently, data in certain formats are completely ignored because designing ETL processes to cater for all possible data formats is almost impossible. Potentially, these unconsidered data sources can provide useful insights when incorporated into big data analytics. In this project, using big data solution, Apache Spark, we tapped into other sources of data stored in their raw formats such as various text files, compressed files etc and incorporated the data with persistently stored enterprise data in MongoDB for overall data analytics using MongoDB Aggregation Framework and MapReduce. This significantly differs from the traditional ETL systems in the sense that it is compactible regardless of the data formats at source.

E-voting Implementation in Egypt

  • Eraky, Ahmed
    • Journal of Contemporary Eastern Asia
    • /
    • v.16 no.1
    • /
    • pp.48-68
    • /
    • 2017
  • Manual election processes in Egypt have several negative effects, chief among them political corruption due to a lack of transparency. These issues discourage citizens' participation in political life, whereas electronic voting systems aim to increase efficiency and transparency and to reduce costs compared with manual voting. The main research objectives are to find the success factors that positively affect e-voting implementation in Egypt, to identify the reasons that have kept the Egyptian government from adopting e-voting, and to propose a road map the Egyptian government should consider in order to implement e-voting systems successfully. The findings of the study suggest that seven independent variables affect e-voting implementation: leadership, government willingness, legal framework, technical quality, awareness, citizens' trust in government, and IT literacy. The Technology-Organization-Environment (TOE) theory was used as the analytical framework for the study. A quantitative approach (a survey questionnaire) was used to collect data, with a random sampling method used to select participants: targeted voters in Egypt with internet access, since the questionnaire was distributed online. The data were analyzed using regression analysis. The practical implication of this study is greater citizen participation in political life, owing to the transparency an e-voting system would create, as well as reduced political corruption.

Big IoT Healthcare Data Analytics Framework Based on Fog and Cloud Computing

  • Alshammari, Hamoud;El-Ghany, Sameh Abd;Shehab, Abdulaziz
    • Journal of Information Processing Systems
    • /
    • v.16 no.6
    • /
    • pp.1238-1249
    • /
    • 2020
  • Throughout the world, aging populations and doctor shortages have helped drive the increasing demand for smart healthcare systems. Recently, these systems have benefited from the evolution of the Internet of Things (IoT), big data, and machine learning. However, these advances also generate large amounts of data, making healthcare data analysis a major issue. These data have a number of complex properties, such as high dimensionality, irregularity, and sparsity, which make efficient processing difficult; such challenges are addressed by big data analytics. In this paper, we propose an analytic framework for big healthcare data collected either from IoT wearable devices or from archived patient medical images. The proposed method addresses the data heterogeneity problem using middleware between the heterogeneous data sources and MapReduce Hadoop clusters. Furthermore, the framework uses both fog computing and cloud platforms to handle online and offline data processing, data storage, and data classification, and it ensures robust and secure handling of patient medical data. (An illustrative middleware-and-MapReduce sketch follows below.)
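
As a rough sketch of the middleware idea, the Python code below normalizes two assumed record layouts (wearable CSV rows and image-metadata JSON) into one schema and counts abnormal readings per patient with a Hadoop-Streaming-style mapper and reducer; the field names and thresholds are illustrative, not the paper's.

    import json

    def normalize(raw: str) -> dict:
        # Middleware step: map heterogeneous source records onto one common schema.
        if raw.lstrip().startswith("{"):
            d = json.loads(raw)
            return {"patient": d["patient_id"], "kind": "image", "value": d.get("finding", "")}
        pid, sensor, value = raw.strip().split(",")
        return {"patient": pid, "kind": sensor, "value": float(value)}

    def mapper(lines):
        # Emit "patient \t 1" for abnormal wearable readings (key/value, streaming style).
        for line in lines:
            rec = normalize(line)
            if rec["kind"] == "heart_rate" and rec["value"] > 120:
                print(f"{rec['patient']}\t1")

    def reducer(lines):
        # Sum the counts per patient key.
        counts = {}
        for line in lines:
            pid, n = line.split("\t")
            counts[pid] = counts.get(pid, 0) + int(n)
        for pid, n in counts.items():
            print(f"{pid}\t{n}")

    if __name__ == "__main__":
        sample = ["p1,heart_rate,135", '{"patient_id": "p2", "finding": "nodule"}', "p1,heart_rate,150"]
        mapper(sample)              # a real job would read stdin and write stdout per stage
        reducer(["p1\t1", "p1\t1"])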

DEMO: Deep MR Parametric Mapping with Unsupervised Multi-Tasking Framework

  • Cheng, Jing;Liu, Yuanyuan;Zhu, Yanjie;Liang, Dong
    • Investigative Magnetic Resonance Imaging
    • /
    • v.25 no.4
    • /
    • pp.300-312
    • /
    • 2021
  • Compressed sensing (CS) has been investigated in magnetic resonance (MR) parametric mapping to reduce scan time. However, the relatively long reconstruction time restricts its widespread application in the clinic. Recently, deep learning-based methods have shown great potential for accelerating reconstruction and improving imaging quality in fast MR imaging, although their adaptation to parametric mapping is still at an early stage. In this paper, we propose DEMO, a novel deep learning-based framework for fast and robust MR parametric mapping. Unlike current deep learning-based methods, DEMO trains the network in an unsupervised way, which is more practical given that large fully sampled training sets of parametric-weighted images are difficult to acquire. Specifically, a CS-based loss function is used in DEMO to avoid the need for fully sampled k-space data as the label, making it an unsupervised learning approach. DEMO reconstructs parametric-weighted images and generates a parametric map simultaneously by unrolling an iterative approach from conventional fast MR parametric mapping, which enables multi-tasking learning. Experimental results showed promising performance of the proposed DEMO framework in quantitative MR T1ρ mapping. (A toy CS-loss sketch follows below.)
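
The self-supervised flavor of a CS-based loss can be illustrated with a short NumPy sketch: the loss only compares the reconstruction against the acquired undersampled k-space plus a sparsity prior, so no fully sampled label is needed. The sampling mask, regularizer, and weights below are illustrative choices, not the DEMO network's actual loss.

    import numpy as np

    def cs_loss(image, kspace_measured, mask, lam=1e-3):
        # Data consistency: the reconstruction, re-sampled at the acquired k-space
        # locations, should match the measured data.
        k_pred = np.fft.fft2(image, norm="ortho")
        dc = np.sum(np.abs((k_pred - kspace_measured) * mask) ** 2)
        # Sparsity prior: total variation of the image, a common CS regularizer.
        tv = np.sum(np.abs(np.diff(image, axis=0))) + np.sum(np.abs(np.diff(image, axis=1)))
        return dc + lam * tv

    # Toy usage with random data, just to show the shapes involved.
    img = np.random.rand(64, 64)
    mask = (np.random.rand(64, 64) < 0.3).astype(float)            # undersampling pattern
    y = np.fft.fft2(np.random.rand(64, 64), norm="ortho") * mask    # "measured" k-space
    print(cs_loss(img, y, mask))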

A Hot-Data Replication Scheme Based on Data Access Patterns for Enhancing Processing Speed of MapReduce (맵-리듀스의 처리 속도 향상을 위한 데이터 접근 패턴에 따른 핫-데이터 복제 기법)

  • Son, Ingook;Ryu, Eunkyung;Park, Junho;Bok, Kyoungsoo;Yoo, Jaesoo
    • The Journal of the Korea Contents Association
    • /
    • v.13 no.11
    • /
    • pp.21-27
    • /
    • 2013
  • In recent years, with the growth of social media and the spread of mobile devices, the volume of data has increased significantly. Hadoop has been widely adopted as the typical distributed storage and processing framework. MapReduce tasks over the Hadoop Distributed File System are scheduled as close to their input data as possible by considering data locality. However, some data are requested far more frequently than others by MapReduce analysis tasks. In this paper, we propose a hot-data replication mechanism that improves MapReduce processing speed according to data access patterns. The proposed scheme reduces task processing time and improves data locality by applying a replica optimization algorithm to hot data with high access frequency. Performance evaluation shows that the proposed scheme outperforms the existing scheme under highly skewed access frequencies. (A simple sketch of access-frequency-driven replication follows below.)
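
One simple way to picture access-pattern-driven replication is sketched below in Python: per-file access counts are tracked, and once a file crosses a hotness threshold its HDFS replication factor is raised with the standard hdfs dfs -setrep command. The thresholds, replication factors, and path are illustrative, and this is not the paper's replica optimization algorithm.

    import subprocess
    from collections import Counter

    DEFAULT_REPL, HOT_REPL, HOT_THRESHOLD = 3, 6, 100
    access_counts = Counter()
    current_repl = {}

    def record_access(path: str):
        # Called whenever a MapReduce task reads this file/block.
        access_counts[path] += 1
        if access_counts[path] >= HOT_THRESHOLD and current_repl.get(path, DEFAULT_REPL) < HOT_REPL:
            # More replicas of hot data give the scheduler more chances at data-local execution.
            subprocess.run(["hdfs", "dfs", "-setrep", str(HOT_REPL), path], check=True)
            current_repl[path] = HOT_REPL

    # Example (requires a running HDFS client):
    # record_access("/warehouse/events/part-00042")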