• Title/Summary/Keyword: Big data processing


A Survey of Homomorphic Encryption for Outsourced Big Data Computation

  • Fun, Tan Soo;Samsudin, Azman
    • KSII Transactions on Internet and Information Systems (TIIS), v.10 no.8, pp.3826-3851, 2016
  • With traditional data storage solutions becoming too expensive and cumbersome to support Big Data processing, enterprises are now starting to outsource their data requirements to third parties, such as cloud service providers. However, this outsourcing initiative introduces a number of security and privacy concerns. In this paper, homomorphic encryption is suggested as a mechanism to protect the confidentiality and privacy of outsourced data while still allowing third parties to perform computation on the encrypted data. The paper also discusses the challenges of protecting Big Data processing and highlights how it differs from traditional data protection. Existing works on homomorphic encryption are technically reviewed and compared in terms of their encryption scheme, homomorphism classification, algorithm design, noise management, and security assumptions. Finally, the paper discusses current implementations, challenges, and future directions towards a practical homomorphic encryption scheme for securing outsourced Big Data computation.
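As a minimal illustration of the additively homomorphic property the survey discusses (computation on ciphertexts without decryption), the following self-contained Java sketch implements a textbook Paillier scheme with toy key sizes. It is not the survey's own construction and is not production cryptography.

```java
import java.math.BigInteger;
import java.security.SecureRandom;

// Toy Paillier cryptosystem: multiplying two ciphertexts modulo n^2 yields a
// ciphertext of the SUM of the plaintexts, so an untrusted party can add
// outsourced values without ever decrypting them.
public class PaillierDemo {
    private static final SecureRandom RNG = new SecureRandom();
    final BigInteger n, nSquared, g, lambda, mu;

    PaillierDemo(int primeBits) {
        BigInteger p = BigInteger.probablePrime(primeBits, RNG);
        BigInteger q = BigInteger.probablePrime(primeBits, RNG);
        n = p.multiply(q);
        nSquared = n.multiply(n);
        g = n.add(BigInteger.ONE);                                        // standard choice g = n + 1
        lambda = p.subtract(BigInteger.ONE).multiply(q.subtract(BigInteger.ONE));
        mu = l(g.modPow(lambda, nSquared)).modInverse(n);                 // mu = L(g^lambda mod n^2)^-1 mod n
    }

    private BigInteger l(BigInteger x) { return x.subtract(BigInteger.ONE).divide(n); }  // L(x) = (x-1)/n

    BigInteger encrypt(BigInteger m) {
        BigInteger r;
        do { r = new BigInteger(n.bitLength(), RNG); }                    // random r in Z*_n
        while (r.signum() == 0 || r.compareTo(n) >= 0 || !r.gcd(n).equals(BigInteger.ONE));
        return g.modPow(m, nSquared).multiply(r.modPow(n, nSquared)).mod(nSquared);
    }

    BigInteger decrypt(BigInteger c) {
        return l(c.modPow(lambda, nSquared)).multiply(mu).mod(n);
    }

    public static void main(String[] args) {
        PaillierDemo ph = new PaillierDemo(256);
        BigInteger c1 = ph.encrypt(BigInteger.valueOf(1200));
        BigInteger c2 = ph.encrypt(BigInteger.valueOf(34));
        BigInteger cSum = c1.multiply(c2).mod(ph.nSquared);               // homomorphic addition on ciphertexts
        System.out.println(ph.decrypt(cSum));                             // prints 1234
    }
}
```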

Performance Optimization of Big Data Center Processing System - Big Data Analysis Algorithm Based on Location Awareness

  • Zhao, Wen-Xuan;Min, Byung-Won
    • International Journal of Contents, v.17 no.3, pp.74-83, 2021
  • This study proposes a location-aware algorithm to improve the performance of distributed big data processing systems that suffer from low data reliability and poor application performance. Compared with previous algorithms, the location-aware data block placement algorithm combines data block placement and node data recovery strategies to improve application performance and data reliability. Simulation and real-cluster tests showed that the proposed location-aware placement algorithm can greatly improve data reliability and shorten the real-time application processing time of I/O interfaces.
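The abstract does not give the placement rule itself, so the following Java sketch only illustrates the general idea of location-aware block placement under assumed inputs; every class, field, and the preference for a same-rack node with the most free capacity are hypothetical choices, not the authors' algorithm.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

// Hypothetical location-aware placement rule: prefer a healthy node on the client's
// own rack, and among eligible nodes pick the one with the most free capacity.
public class LocationAwarePlacement {
    record Node(String id, String rack, long freeBytes, boolean healthy) {}

    static Optional<Node> chooseReplicaTarget(List<Node> cluster, String clientRack, long blockSize) {
        Optional<Node> sameRack = cluster.stream()
                .filter(n -> n.healthy() && n.rack().equals(clientRack) && n.freeBytes() >= blockSize)
                .max(Comparator.comparingLong(Node::freeBytes));
        if (sameRack.isPresent()) return sameRack;       // cheapest transfer path
        return cluster.stream()                           // otherwise fall back to any rack
                .filter(n -> n.healthy() && n.freeBytes() >= blockSize)
                .max(Comparator.comparingLong(Node::freeBytes));
    }

    public static void main(String[] args) {
        List<Node> cluster = List.of(
                new Node("dn1", "rackA", 5_000_000_000L, true),
                new Node("dn2", "rackA", 9_000_000_000L, true),
                new Node("dn3", "rackB", 20_000_000_000L, true));
        System.out.println(chooseReplicaTarget(cluster, "rackA", 128L * 1024 * 1024)); // picks dn2
    }
}
```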

Big IoT Healthcare Data Analytics Framework Based on Fog and Cloud Computing

  • Alshammari, Hamoud;El-Ghany, Sameh Abd;Shehab, Abdulaziz
    • Journal of Information Processing Systems, v.16 no.6, pp.1238-1249, 2020
  • Throughout the world, aging populations and doctor shortages have helped drive the increasing demand for smart healthcare systems. Recently, these systems have benefited from the evolution of the Internet of Things (IoT), big data, and machine learning. However, these advances generate large amounts of data, making healthcare data analysis a major issue. These data have a number of complex properties, such as high dimensionality, irregularity, and sparsity, which make efficient processing difficult. Such challenges are addressed by big data analytics. In this paper, we propose an innovative analytic framework for big healthcare data collected either from IoT wearable devices or from archived patient medical images. The proposed method efficiently addresses the data heterogeneity problem using middleware between the heterogeneous data sources and MapReduce Hadoop clusters. Furthermore, the proposed framework uses both fog computing and cloud platforms to handle the problems that arise in online and offline data processing, data storage, and data classification. Additionally, it guarantees robust and secure handling of patient medical data.
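The paper's middleware layer is not specified in the abstract, so this Java sketch only illustrates the kind of adapter such middleware might use to map heterogeneous wearable and image-archive records onto one common schema before they are handed to a Hadoop/MapReduce cluster; all class names, field names, and sample values are assumptions.

```java
import java.util.List;
import java.util.Map;

// Hypothetical adapter layer: each heterogeneous source is mapped to one common
// record shape so downstream MapReduce jobs can treat all inputs uniformly.
public class HealthcareMiddlewareSketch {
    record CommonRecord(String patientId, long timestampMillis, String sourceType, Map<String, String> fields) {}

    interface SourceAdapter { CommonRecord adapt(Map<String, String> raw); }

    // Wearable sensors report flat key/value readings.
    static final SourceAdapter WEARABLE = raw -> new CommonRecord(
            raw.get("patient"), Long.parseLong(raw.get("ts")), "wearable",
            Map.of("heartRate", raw.getOrDefault("hr", ""), "spo2", raw.getOrDefault("spo2", "")));

    // Archived images contribute metadata only; pixel data stays in the object store.
    static final SourceAdapter IMAGE_ARCHIVE = raw -> new CommonRecord(
            raw.get("patient_id"), Long.parseLong(raw.get("acquired_at")), "image",
            Map.of("modality", raw.getOrDefault("modality", ""), "uri", raw.getOrDefault("uri", "")));

    public static void main(String[] args) {
        List<CommonRecord> unified = List.of(
                WEARABLE.adapt(Map.of("patient", "p42", "ts", "1700000000000", "hr", "71", "spo2", "98")),
                IMAGE_ARCHIVE.adapt(Map.of("patient_id", "p42", "acquired_at", "1700000100000",
                        "modality", "CT", "uri", "s3://archive/p42/ct-001.dcm")));
        unified.forEach(System.out::println);   // records now share one schema for the MapReduce stage
    }
}
```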

Hazelcast Vs. Ignite: Opportunities for Java Programmers

  • Bartkov, Maxim;Katkova, Tetiana;Kruglyk, Vladyslav S.;Murtaziev, Ernest G.;Kotova, Olha V.
    • International Journal of Computer Science & Network Security, v.22 no.2, pp.406-412, 2022
  • Storing large amounts of data has been a major problem since the beginning of computing history. Different types of databases are commonly used to store data, but with today's large-scale distributed applications handling enormous volumes, these databases are no longer viable. Big Data was therefore introduced to store, process, and analyze data at high speed and to cope with the day-by-day growth in users and data; it has enabled huge advances in business processes, for example by predicting customers' needs from web and social media search. To process data continuously in real time, data streaming technologies have been developed. The main purpose of big data stream processing frameworks is to let programmers query a continuous stream directly without dealing with the lower-level mechanisms: programmers write their processing code against these runtime libraries (also called Stream Processing Engines). Several Big Data streaming platforms are freely available on the Internet, but selecting the most appropriate one is not easy for programmers. In this paper, we present a detailed description of two of the most popular state-of-the-art frameworks, Apache Ignite and Hazelcast, and compare their performance using selected attributes.
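The abstract compares the two frameworks without showing code, so the following is a minimal side-by-side sketch, assuming the Hazelcast and Apache Ignite jars are on the classpath and using package names from recent releases (Hazelcast 5.x exposes IMap from com.hazelcast.map; Ignite 2.x); the map/cache name "pageViews" is arbitrary.

```java
import com.hazelcast.core.Hazelcast;
import com.hazelcast.core.HazelcastInstance;
import com.hazelcast.map.IMap;
import org.apache.ignite.Ignite;
import org.apache.ignite.IgniteCache;
import org.apache.ignite.Ignition;

// Minimal use of the two data grids: start an embedded node, put a key/value
// pair into a distributed map/cache, and read it back.
public class GridComparisonSketch {
    public static void main(String[] args) {
        // Hazelcast: a distributed map obtained from an embedded member.
        HazelcastInstance hz = Hazelcast.newHazelcastInstance();
        IMap<String, Long> hzCounts = hz.getMap("pageViews");
        hzCounts.put("home", 1L);
        System.out.println("Hazelcast read: " + hzCounts.get("home"));
        hz.shutdown();

        // Apache Ignite: an equivalent key/value cache on an embedded node.
        try (Ignite ignite = Ignition.start()) {
            IgniteCache<String, Long> igCounts = ignite.getOrCreateCache("pageViews");
            igCounts.put("home", 1L);
            System.out.println("Ignite read: " + igCounts.get("home"));
        }
    }
}
```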

A Context-Awareness Modeling User Profile Construction Method for Personalized Information Retrieval System

  • Kim, Jee Hyun;Gao, Qian;Cho, Young Im
    • International Journal of Fuzzy Logic and Intelligent Systems, v.14 no.2, pp.122-129, 2014
  • Effective information gathering and retrieval of the most relevant web documents on a topic of interest is difficult because of the large amount of information that exists in various formats. Current information gathering and retrieval techniques are unable to exploit semantic knowledge within documents in the "big data" environment; therefore, they cannot provide precise answers to specific questions. Existing commercial big data analytic platforms are restricted to a single data type; moreover, different big data analytic platforms are effective at processing different data types. Therefore, a common big data platform that can efficiently process various data types is needed. Furthermore, users often own more than one intelligent device, so an efficient approach to constructing preference profiles is needed to record the user's context and personalize applications. In this way, services can be tailored to the user's dynamic interests by tracking all devices owned by the user.
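The abstract describes, but does not define, the preference profile, so the sketch below is only a hypothetical data structure: per-device context snapshots plus decaying topic weights that track the user's shifting interests. All names, the bounded history, and the decay rule are assumptions, not the authors' model.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

// Hypothetical context-aware user profile shared across the user's devices.
public class UserProfileSketch {
    record ContextSnapshot(String deviceId, String location, String activity, long timestampMillis) {}

    private final Deque<ContextSnapshot> recentContexts = new ArrayDeque<>();
    private final Map<String, Double> topicWeights = new HashMap<>();

    void observe(ContextSnapshot ctx, String queriedTopic) {
        recentContexts.addFirst(ctx);
        if (recentContexts.size() > 100) recentContexts.removeLast();   // bounded context history
        topicWeights.merge(queriedTopic, 1.0, Double::sum);             // reinforce the current interest
        topicWeights.replaceAll((topic, w) -> w * 0.95);                // decay older interests
    }

    double interestIn(String topic) { return topicWeights.getOrDefault(topic, 0.0); }

    public static void main(String[] args) {
        UserProfileSketch profile = new UserProfileSketch();
        profile.observe(new ContextSnapshot("phone", "subway", "commuting", System.currentTimeMillis()), "big data");
        profile.observe(new ContextSnapshot("tablet", "home", "reading", System.currentTimeMillis()), "fuzzy logic");
        System.out.printf("big data: %.2f, fuzzy logic: %.2f%n",
                profile.interestIn("big data"), profile.interestIn("fuzzy logic"));
    }
}
```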

An Analysis of Utilization on Virtualized Computing Resource for Hadoop and HBase based Big Data Processing Applications (Hadoop과 HBase 기반의 빅 데이터 처리 응용을 위한 가상 컴퓨팅 자원 이용률 분석)

  • Cho, Nayun;Ku, Mino;Kim, Baul;Xuhua, Rui;Min, Dugki
    • Journal of Information Technology and Architecture, v.11 no.4, pp.449-462, 2014
  • In the big data era, systems for capturing, storing, and analyzing stored or streaming data have many components to consider. Unlike traditional data handling systems, a big data processing system needs to take into account the characteristics (format, velocity, and volume) of the data being handled. In this situation, a virtualized computing platform is an emerging platform for handling big data effectively, since virtualization technology allows computing resources to be managed dynamically and elastically with minimal effort. In this paper, we analyze the utilization of virtualized computing resources to discover suitable deployment models for an Apache Hadoop and HBase-based big data processing environment. Our results show that the TaskTracker service exhibits high CPU utilization and heavy disk I/O overhead during MapReduce phases. Moreover, the HRegion service shows high network resource consumption due to transferring data from the DataNode to the TaskTracker, and the DataNode shows high memory utilization and disk I/O overhead when reading stored data.
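The abstract reports per-service CPU, memory, disk, and network figures but not how they were captured. As a rough, JVM-level illustration only (not the authors' measurement setup, which would also need OS- and cluster-level counters), this Java sketch samples process CPU load and heap usage via the HotSpot/OpenJDK com.sun.management.OperatingSystemMXBean.

```java
import com.sun.management.OperatingSystemMXBean;
import java.lang.management.ManagementFactory;

// Illustrative JVM-level sampler: periodically record process CPU load and heap use,
// the kind of per-service utilization signal compared across TaskTracker, DataNode,
// and HRegion processes in a study like this one.
public class UtilizationSampler {
    public static void main(String[] args) throws InterruptedException {
        OperatingSystemMXBean os =
                (OperatingSystemMXBean) ManagementFactory.getOperatingSystemMXBean();
        for (int i = 0; i < 5; i++) {
            double cpuLoad = os.getProcessCpuLoad();    // 0.0..1.0, or negative if unavailable
            long heapUsedMb = ManagementFactory.getMemoryMXBean()
                    .getHeapMemoryUsage().getUsed() / (1024 * 1024);
            System.out.printf("sample %d: processCpu=%.2f heapUsedMB=%d%n", i, cpuLoad, heapUsedMb);
            Thread.sleep(1_000);                         // one-second sampling interval
        }
    }
}
```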

Hadoop Based Wavelet Histogram for Big Data in Cloud

  • Kim, Jeong-Joon
    • Journal of Information Processing Systems, v.13 no.4, pp.668-676, 2017
  • Recently, the importance of big data has been emphasized with the spread of smartphones and web/SNS services. As a result, MapReduce, which can process big data efficiently, has received worldwide attention for its excellent scalability and stability. Because big data is produced in large volumes, at high velocity, and with varied properties, it is often more efficient to process summary information about the data than the data itself. The wavelet histogram, a representative data summarization technique, can generate an optimal data summary with minimal loss of information from the original data. Systems that generate wavelet histograms on top of MapReduce have therefore been studied actively. However, existing approaches are slow because they build the wavelet histogram through multiple MapReduce jobs, and the error of the data reconstructed from the histogram can become large. The MapReduce-based wavelet histogram generation system developed in this paper builds the histogram in a single MapReduce job, which greatly increases generation speed, and it builds the histogram subject to a user-specified error bound, so the error of the reconstructed data can be controlled. Finally, we verify the efficiency of the developed system through a performance evaluation.
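The abstract does not include the wavelet computation itself, so the sketch below shows only the single-MapReduce-job skeleton such a system builds on: a mapper that buckets numeric values and a reducer/combiner that sums bucket counts. The bucket width is an assumption, and the wavelet transform and error-bound coefficient pruning described in the paper are not reproduced here.

```java
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

// Single-job histogram skeleton: the mapper assigns each value to a bucket, the
// reducer sums bucket counts. A wavelet histogram would additionally transform the
// bucket counts and keep only the coefficients allowed by the user's error bound.
public class HistogramJob {
    public static class BucketMapper extends Mapper<LongWritable, Text, IntWritable, LongWritable> {
        private static final int BUCKET_WIDTH = 100;            // assumed bucket width
        private static final LongWritable ONE = new LongWritable(1);
        private final IntWritable bucket = new IntWritable();
        @Override
        protected void map(LongWritable key, Text value, Context ctx)
                throws IOException, InterruptedException {
            int v = Integer.parseInt(value.toString().trim());
            bucket.set(v / BUCKET_WIDTH);
            ctx.write(bucket, ONE);
        }
    }

    public static class SumReducer extends Reducer<IntWritable, LongWritable, IntWritable, LongWritable> {
        @Override
        protected void reduce(IntWritable bucket, Iterable<LongWritable> counts, Context ctx)
                throws IOException, InterruptedException {
            long sum = 0;
            for (LongWritable c : counts) sum += c.get();
            ctx.write(bucket, new LongWritable(sum));
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "histogram");
        job.setJarByClass(HistogramJob.class);
        job.setMapperClass(BucketMapper.class);
        job.setCombinerClass(SumReducer.class);
        job.setReducerClass(SumReducer.class);
        job.setOutputKeyClass(IntWritable.class);
        job.setOutputValueClass(LongWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```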

Development of Big Data System for Energy Big Data (에너지 빅데이터를 수용하는 빅데이터 시스템 개발)

  • Song, Mingoo
    • KIISE Transactions on Computing Practices, v.24 no.1, pp.24-32, 2018
  • This paper proposes a Big Data system for energy Big Data aggregated in real-time from industrial and public sources. The constructed system is based on Hadoop, with the Spark framework applied on top for Big Data processing to support in-memory distributed computing. The paper focuses on Big Data in the form of heat energy for district heating and deals with methodologies for storing, managing, processing, and analyzing the aggregated Big Data in real-time while considering the properties of energy input and output. The Big Data influx is stored and managed according to the relational database schema designed inside the system, and the stored Big Data is processed and analyzed according to the set objectives. The paper uses several district-heating heat demand plants as examples of industrial sources of heat energy Big Data gathered in real-time by the proposed system.
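As a small illustration of the Hadoop-plus-Spark, in-memory style of processing the paper describes, the following Java/Spark sketch aggregates per-plant heat readings; the HDFS path and the column names (plant_id, heat_mwh) are placeholders, not the paper's actual schema.

```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;
import static org.apache.spark.sql.functions.avg;
import static org.apache.spark.sql.functions.sum;

// In-memory aggregation of plant-level heat readings with Spark SQL.
public class HeatEnergyAggregation {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .appName("heat-energy-aggregation")
                .getOrCreate();

        Dataset<Row> readings = spark.read()
                .option("header", "true")
                .option("inferSchema", "true")
                .csv("hdfs:///energy/heat_readings.csv");    // assumed columns: plant_id, ts, heat_mwh

        // Total and average heat demand per plant, computed in memory across the cluster.
        readings.groupBy("plant_id")
                .agg(sum("heat_mwh").alias("total_mwh"), avg("heat_mwh").alias("avg_mwh"))
                .orderBy("plant_id")
                .show();

        spark.stop();
    }
}
```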

Design and Implementation of Dynamic Recommendation Service in Big Data Environment

  • Kim, Ryong;Park, Kyung-Hye
    • Journal of Information Technology Applications and Management, v.26 no.5, pp.57-65, 2019
  • Recommendation systems are information technologies that e-commerce merchants have adopted so that online shoppers can receive suggestions for items that might interest them or complement their purchases. These systems provide valuable assistance with users' purchasing decisions and improve the quality of push services. Traditionally, recommendation systems have been designed as centralized systems, but information services are growing rapidly and demand strong scalability. Next-generation information technologies such as cloud computing and Big Data environments can handle massive data and offer enormous processing power. Nevertheless, existing analytic technologies still lack some of the capabilities needed to process big data. Accordingly, we design a conceptual service model, together with a newly proposed algorithm and user adaptation, for a dynamic recommendation service in a big data environment.
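The paper's algorithm is not given in the abstract, so the sketch below shows only a generic item-to-item co-occurrence recommender of the kind e-commerce suggestion services commonly build on; all class and method names are hypothetical and this is not the authors' proposed method.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy item-to-item co-occurrence recommender: items often bought together are
// recommended to buyers of either one.
public class CooccurrenceRecommender {
    private final Map<String, Map<String, Integer>> coCounts = new HashMap<>();

    void addOrder(List<String> items) {
        for (String a : items)
            for (String b : items)
                if (!a.equals(b))
                    coCounts.computeIfAbsent(a, k -> new HashMap<>()).merge(b, 1, Integer::sum);
    }

    List<String> recommendFor(String item, int topN) {
        return coCounts.getOrDefault(item, Map.of()).entrySet().stream()
                .sorted((e1, e2) -> e2.getValue() - e1.getValue())   // most frequent co-purchases first
                .limit(topN)
                .map(Map.Entry::getKey)
                .toList();
    }

    public static void main(String[] args) {
        CooccurrenceRecommender rec = new CooccurrenceRecommender();
        rec.addOrder(List.of("laptop", "mouse", "usb-hub"));
        rec.addOrder(List.of("laptop", "mouse"));
        rec.addOrder(List.of("monitor", "usb-hub"));
        System.out.println(rec.recommendFor("laptop", 2));   // [mouse, usb-hub]
    }
}
```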

A Study on the Enhancement Process of the Telecommunication Network Management using Big Data Analysis (Big Data 분석을 활용한 통신망 관리 시스템의 개선방안에 관한 연구)

  • Koo, Sung-Hwan;Shin, Min-Soo
    • Journal of the Korea Academia-Industrial cooperation Society, v.13 no.12, pp.6060-6070, 2012
  • The key requirement of a Real-Time Enterprise (RTE) is to respond and adapt quickly to changes in the firm's internal and external situation, including changes in the market and customers' needs. Recently, big data processing technology that supports such rapid change has been in the spotlight. As wired and wireless communication networks evolve at an accelerating rate, it is especially critical to provide strong security monitoring and stable services through real-time processing of massive communication data traffic. By applying big data processing technology based on a cloud computing architecture, this paper addresses the managerial problems of telecommunication service providers and discusses how to operate the network management system effectively.
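The abstract does not detail the monitoring logic, so the following Java sketch merely illustrates one simple real-time rule of the kind a traffic-monitoring pipeline might apply: flag an interval whose volume exceeds a multiple of an exponentially weighted moving average. The smoothing factor, threshold, and sample values are assumptions, not the paper's system.

```java
import java.util.List;

// Hypothetical real-time monitoring rule: alert when an interval's traffic volume
// exceeds K times the exponentially weighted moving average of earlier intervals.
public class TrafficSpikeMonitor {
    private double ewma = -1;                  // moving baseline, unset until the first sample
    private static final double ALPHA = 0.2;   // smoothing factor
    private static final double K = 3.0;       // spike threshold multiplier

    boolean observe(double bytesInInterval) {
        boolean spike = ewma >= 0 && bytesInInterval > K * ewma;
        ewma = (ewma < 0) ? bytesInInterval : ALPHA * bytesInInterval + (1 - ALPHA) * ewma;
        return spike;
    }

    public static void main(String[] args) {
        TrafficSpikeMonitor monitor = new TrafficSpikeMonitor();
        List<Double> traffic = List.of(1.0e6, 1.1e6, 0.9e6, 1.0e6, 9.5e6, 1.0e6);
        for (int i = 0; i < traffic.size(); i++)
            if (monitor.observe(traffic.get(i)))
                System.out.println("ALERT: traffic spike in interval " + i);   // fires for interval 4
    }
}
```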