Title/Summary/Keyword: Hadoop Environment

Big Data Platform Based on Hadoop and Application to Weight Estimation of FPSO Topside

  • Kim, Seong-Hoon; Roh, Myung-Il; Kim, Ki-Su; Oh, Min-Jae
    • Journal of Advanced Research in Ocean Engineering, v.3 no.1, pp.32-40, 2017
  • Recently, the amount and complexity of data to be processed have been increasing due to the development of information and communication technology, and industry interest in such big data grows day by day. The shipbuilding and offshore industry is no exception: vast and varied data are generated during design, production, and operation, and there is growing interest in using them effectively. Effective use of big data in this industry requires the ability to store and process large amounts of data. In this study, Hadoop and R, which are widely used in big data research, were adopted. Hadoop is a framework for storing and processing big data; it provides the Hadoop Distributed File System (HDFS) for storage and the MapReduce programming model for processing. R provides a language and environment for statistical computing and graphics, along with a wide range of data analysis techniques. While Hadoop makes it easy to handle big data, fine-grained processing is difficult; conversely, R offers advanced analysis capability but struggles with very large data. This study proposes a Hadoop-based big data platform for the shipbuilding and offshore industry. The proposed platform incorporates existing shipyard data and makes it possible to manage and process those data. To check its applicability, the platform is applied to estimating the weights of offshore structure topsides. Data on existing FPSOs are stored in the Hadoop-based Hortonworks Data Platform (HDP), and regression analysis is performed using RHadoop. The effectiveness of large-scale data processing with RHadoop is evaluated by comparing the regression results and processing time with those of a conventional weight estimation program.
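
The study itself uses RHadoop (R on Hadoop); as a minimal sketch of the same idea in plain Hadoop MapReduce terms, the job below computes the sufficient statistics (n, Σx, Σy, Σxy, Σx²) for a simple linear regression, e.g. topside weight against one predictor. The CSV column layout and file paths are assumptions, not from the paper.

```java
// Minimal sketch: distributed sufficient statistics for simple linear regression.
// Assumes clean two-column CSV input ("predictor,weight"); not the authors' code.
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class RegressionStats {
    public static class StatMapper extends Mapper<LongWritable, Text, Text, Text> {
        @Override
        protected void map(LongWritable key, Text line, Context ctx)
                throws IOException, InterruptedException {
            // Funnel every record to a single key; fine for a sketch.
            String[] f = line.toString().split(",");
            ctx.write(new Text("stats"), new Text(f[0] + "," + f[1]));
        }
    }
    public static class StatReducer extends Reducer<Text, Text, Text, Text> {
        @Override
        protected void reduce(Text key, Iterable<Text> values, Context ctx)
                throws IOException, InterruptedException {
            double n = 0, sx = 0, sy = 0, sxy = 0, sxx = 0;
            for (Text v : values) {
                String[] f = v.toString().split(",");
                double x = Double.parseDouble(f[0]), y = Double.parseDouble(f[1]);
                n++; sx += x; sy += y; sxy += x * y; sxx += x * x;
            }
            // Closed-form least-squares estimates from the accumulated sums.
            double slope = (n * sxy - sx * sy) / (n * sxx - sx * sx);
            double intercept = (sy - slope * sx) / n;
            ctx.write(key, new Text("slope=" + slope + " intercept=" + intercept));
        }
    }
    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "regression-stats");
        job.setJarByClass(RegressionStats.class);
        job.setMapperClass(StatMapper.class);
        job.setReducerClass(StatReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Text.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```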

Design and Implementation of HDFS Data Encryption Scheme Using ARIA Algorithms on Hadoop

  • Song, Youngho; Shin, YoungSung; Chang, Jae-Woo
    • KIPS Transactions on Computer and Communication Systems, v.5 no.2, pp.33-40, 2016
  • Due to the growth of social network services (SNS), big data has become a reality, and Hadoop was developed as a distributed platform for analyzing it. Enterprises use Hadoop to analyze data containing users' sensitive information and exploit the results for marketing, so research on data encryption has been conducted to prevent the leakage of sensitive data stored in Hadoop. However, existing work supports only the AES encryption algorithm, the international standard for data encryption, whereas the Korean government has adopted the ARIA algorithm as its standard. In this paper, we propose an HDFS data encryption scheme using the ARIA algorithm on Hadoop. First, the proposed scheme provides an HDFS block-splitting component that performs ARIA encryption and decryption under Hadoop's distributed computing environment. Second, it provides a variable-length data processing component that pads the last block with dummy data whenever that block contains fewer than 128 bits, so that it can still be encrypted and decrypted. Finally, performance analysis shows that the proposed scheme can be used effectively both for text-string processing applications and for scientific data analysis applications.
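
As a hedged sketch of the core primitive (not the paper's HDFS-integrated component): the Bouncy Castle provider ships an ARIA implementation for the JCA, and PKCS5 padding fills a final block holding fewer than 128 bits (16 bytes), the same case the paper handles with dummy data. Key and IV handling is simplified for illustration.

```java
// Sketch only: ARIA encryption/decryption via the Bouncy Castle provider.
import java.security.Security;
import javax.crypto.Cipher;
import javax.crypto.spec.IvParameterSpec;
import javax.crypto.spec.SecretKeySpec;
import org.bouncycastle.jce.provider.BouncyCastleProvider;

public class AriaBlockCrypto {
    static {
        // Register Bouncy Castle, which provides the "ARIA" cipher.
        Security.addProvider(new BouncyCastleProvider());
    }

    public static byte[] encrypt(byte[] data, byte[] key, byte[] iv) throws Exception {
        // PKCS5 padding handles a short (< 128-bit) final block automatically.
        Cipher c = Cipher.getInstance("ARIA/CBC/PKCS5Padding", "BC");
        c.init(Cipher.ENCRYPT_MODE, new SecretKeySpec(key, "ARIA"), new IvParameterSpec(iv));
        return c.doFinal(data);
    }

    public static byte[] decrypt(byte[] data, byte[] key, byte[] iv) throws Exception {
        Cipher c = Cipher.getInstance("ARIA/CBC/PKCS5Padding", "BC");
        c.init(Cipher.DECRYPT_MODE, new SecretKeySpec(key, "ARIA"), new IvParameterSpec(iv));
        return c.doFinal(data);
    }
}
```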

MRSPAKE: A Web-Scale Spatial Knowledge Extractor Using Hadoop MapReduce

  • Lee, Seok-Jun; Kim, In-Cheol
    • KIPS Transactions on Software and Data Engineering, v.5 no.11, pp.569-584, 2016
  • In this paper, we present a spatial knowledge extractor implemented in the Hadoop MapReduce parallel, distributed computing environment. From a large spatial dataset, this extractor automatically derives a qualitative spatial knowledge base consisting of both topological and directional relations between pairs of spatial objects. By using an R-tree index and range queries over a distributed spatial data file on HDFS, the MapReduce-enabled spatial knowledge extractor, MRSPAKE, can produce a web-scale spatial knowledge base in a highly efficient way. In experiments with the well-known open spatial dataset OpenStreetMap (OSM), MRSPAKE showed high performance and scalability.
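
To illustrate the kind of pairwise qualitative fact such an extractor emits (this is not MRSPAKE's actual code), the sketch below derives a coarse directional relation between two objects from their centroids; the coordinates are made up.

```java
// Illustrative sketch: coarse directional relation of object B relative to object A.
public class DirectionalRelation {
    /** Returns one of N, NE, E, SE, S, SW, W, NW, or EQ for identical points. */
    public static String relate(double lonA, double latA, double lonB, double latB) {
        String ns = latB > latA ? "N" : (latB < latA ? "S" : "");
        String ew = lonB > lonA ? "E" : (lonB < lonA ? "W" : "");
        String dir = ns + ew;
        return dir.isEmpty() ? "EQ" : dir;
    }

    public static void main(String[] args) {
        // Object B lies north-east of object A.
        System.out.println(relate(126.97, 37.56, 127.02, 37.60)); // prints NE
    }
}
```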

Analysis of the Influence Factors of Data Loading Performance Using Apache Sqoop

  • Chen, Liu; Ko, Junghyun; Yeo, Jeongmo
    • KIPS Transactions on Software and Data Engineering, v.4 no.2, pp.77-82, 2015
  • Big data technology has attracted much attention for its fast data processing, and research is ongoing on using it to process large-scale structured data, traditionally held in a relational database (RDB), much faster. Although there are many studies on measuring and analyzing performance, studies on the performance of loading structured data, the step prior to analysis, are rare. In this study, we therefore measure the performance of loading structured data from an RDB into the distributed processing platform Hadoop using Apache Sqoop. To analyze the factors influencing loading performance, the test is repeated with different loading options, and the results are compared with the data loading performance between RDB-based servers. Although the data loading performance of Apache Sqoop was low in our test environment, much better performance can be expected in a large-scale Hadoop cluster with more hardware resources. This work is intended as a basis for improving data loading performance and for analyzing the performance of every step of structured data processing on the Hadoop platform.
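
A hedged sketch of a Sqoop 1.x import, assuming the org.apache.sqoop.Sqoop.runTool entry point; the connection string, table, and option values are hypothetical. Options such as --num-mappers and --split-by are typical "loading options" whose influence a study like this would vary.

```java
// Sketch only: programmatic Sqoop import with hypothetical source and options.
import org.apache.sqoop.Sqoop;

public class SqoopImportExample {
    public static void main(String[] args) {
        String[] importArgs = {
            "import",
            "--connect", "jdbc:mysql://dbhost:3306/testdb",   // hypothetical source RDB
            "--username", "loader",
            "--password-file", "/user/loader/.pw",            // avoid plain-text passwords
            "--table", "sales_records",                       // hypothetical table
            "--target-dir", "/data/sales_records",            // HDFS destination
            "--num-mappers", "4",                             // degree of load parallelism
            "--split-by", "record_id"                         // column used to partition the load
        };
        int exitCode = Sqoop.runTool(importArgs);
        System.exit(exitCode);
    }
}
```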

Rhipe Platform for Big Data Processing and Analysis

  • Jung, Byung Ho; Shin, Ji Eun; Lim, Dong Hoon
    • The Korean Journal of Applied Statistics, v.27 no.7, pp.1171-1185, 2014
  • Rhipe, which integrates R with the Hadoop environment, makes it possible to process and analyze massive amounts of data in a distributed processing environment. In this paper, we implemented multiple regression analysis using Rhipe on real and simulated data of various sizes. Experiments comparing the computing speeds of the pseudo-distributed and fully-distributed modes of a Hadoop cluster showed that fully-distributed mode was faster, and that it became faster still as the number of data nodes increased. We also compared the performance of Rhipe with that of the stats and biglm packages based on bigmemory. The results showed that Rhipe was faster than the other packages as the data size grew, owing to parallel processing with an increasing number of map tasks.
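
In plain Hadoop terms (Rhipe itself is driven from R), the map-task count that the abstract credits for the speedup is determined by the input split size: capping the split size yields more, smaller map tasks. A minimal sketch, with arbitrary path and size values:

```java
// Sketch: controlling the number of map tasks via the maximum input split size.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;

public class SplitTuning {
    public static Job configure() throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "split-tuning-demo");
        FileInputFormat.addInputPath(job, new Path("/data/simulated"));
        // One map task is created per input split; capping splits at 32 MB
        // instead of the default HDFS block size multiplies the task count.
        FileInputFormat.setMaxInputSplitSize(job, 32L * 1024 * 1024);
        return job;
    }
}
```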

A Study on Data Storage and Recovery in Hadoop Environment

  • Kim, Su-Hyun; Lee, Im-Yeong
    • KIPS Transactions on Computer and Communication Systems, v.2 no.12, pp.569-576, 2013
  • Cloud computing has been receiving increasing attention recently, but security remains its main unsolved problem. In general, a cloud computing environment protects data by storing it across distributed servers. When the amount of data is very large, however, the pieces of a secret key (if one is used) may be divided among hundreds of distributed servers, and managing those servers becomes very difficult simply in terms of authentication, encryption, and decryption, which incur vast overheads. In this paper, we propose an efficient data storage and recovery scheme for the Hadoop environment based on XOR and RAID techniques.
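
An illustrative sketch of the RAID-style idea behind such a scheme (not the paper's exact construction): a parity block is the XOR of the data blocks, so any single lost block can be rebuilt by XOR-ing the parity with the surviving blocks.

```java
// Sketch: XOR parity for storage and single-block recovery (equal-length blocks assumed).
public class XorParity {
    /** XOR all blocks together; applied to data blocks this yields the parity block. */
    public static byte[] xorAll(byte[]... blocks) {
        byte[] out = new byte[blocks[0].length];
        for (byte[] b : blocks)
            for (int i = 0; i < out.length; i++)
                out[i] ^= b[i];
        return out;
    }

    public static void main(String[] args) {
        byte[] b1 = "hadoop block one".getBytes();
        byte[] b2 = "hadoop block two".getBytes();
        byte[] parity = xorAll(b1, b2);          // stored alongside the data
        byte[] recovered = xorAll(parity, b1);   // rebuild b2 after losing it
        System.out.println(new String(recovered)); // prints "hadoop block two"
    }
}
```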

Design and Implementation of Big Data Cluster for Indoor Environment Monitoring

  • Jeon, Byoungchan; Go, Mingu
    • Journal of Korea Society of Digital Industry and Information Management, v.13 no.2, pp.77-85, 2017
  • Due to the expansion of living space driven by population growth and changing lifestyles, most people spend their time indoors except while traveling. Changes in the indoor environment are therefore very important, affecting both health and the economical use of resources, yet most people do not recognize this importance. A monitoring system is needed to maintain and manage the indoor environment systematically, and big data clusters should be used to store and manage the numerous sensor readings collected from many spaces. In this paper, we design a big data cluster for indoor environment monitoring that stores sensor data and monitors individual units of a large building. For analysis, we implement the cluster-based system on a distributed file system, building it on Hadoop with HBase for big data processing. Various sensor data are collected and stored, and effective indoor environment management and improved health through monitoring are expected.
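
A hedged sketch of writing one sensor reading into HBase; the table name, column family, and rowkey layout (sensor id plus timestamp) are assumptions, since the paper does not specify its schema.

```java
// Sketch: storing a single indoor-sensor reading in an HBase table.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class SensorWriter {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        try (Connection conn = ConnectionFactory.createConnection(conf);
             Table table = conn.getTable(TableName.valueOf("indoor_env"))) {
            // Rowkey = sensor id + timestamp keeps one sensor's readings contiguous.
            String rowKey = "room101-co2-" + System.currentTimeMillis();
            Put put = new Put(Bytes.toBytes(rowKey));
            put.addColumn(Bytes.toBytes("m"), Bytes.toBytes("ppm"), Bytes.toBytes("612"));
            table.put(put);
        }
    }
}
```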

Design and Implementation of Vehicle Route Tracking System using Hadoop-Based Bigdata Image Processing

  • Yang, Seongeun; Choi, Changyeol; Choi, Hwangkyu
    • Journal of Digital Contents Society, v.14 no.4, pp.447-454, 2013
  • As the number of surveillance CCTVs increases every year, big data image processing for CCTV image data has become a hot issue. In this paper, we propose a Hadoop-based big data image processing technique that recognizes vehicle numbers from a large volume of automatic number plate images taken by CCTVs. We also implement a vehicle route tracking system that displays the moving path of a searched vehicle on Google Maps together with the related information. To evaluate performance, we compare and analyze the vehicle number recognition time for a large amount of CCTV image data in Hadoop and in a single-PC environment.
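
An illustrative sketch (not the authors' system) of how such a pipeline can be mapped onto MapReduce: a mapper turns each plate image into a (plateNumber, camera+timestamp) record, from which a reducer could assemble a vehicle's route. It assumes a whole-file input format that delivers (filename, image bytes) pairs, and recognizePlate() is a stub for whatever ANPR engine is plugged in; the filename convention is also an assumption.

```java
// Sketch: per-image mapper emitting plate -> (camera, timestamp) records.
import java.io.IOException;
import org.apache.hadoop.io.BytesWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class PlateMapper extends Mapper<Text, BytesWritable, Text, Text> {
    // Stub: a real implementation would run plate recognition on the image bytes.
    private String recognizePlate(byte[] imageBytes) {
        return "12GA3456"; // placeholder result
    }

    @Override
    protected void map(Text fileName, BytesWritable image, Context ctx)
            throws IOException, InterruptedException {
        // Assumed name convention: <cameraId>_<timestamp>.jpg
        String[] parts = fileName.toString().replace(".jpg", "").split("_");
        // Note: getBytes() may be padded beyond getLength(); acceptable for a stub.
        String plate = recognizePlate(image.getBytes());
        ctx.write(new Text(plate), new Text(parts[0] + "," + parts[1]));
    }
}
```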

A Normal Network Behavior Profiling Method Based on Big Data Analysis Techniques (Hadoop/Hive)

  • Kim, SungJin; Kim, Kangseok
    • Journal of the Korea Institute of Information Security & Cryptology, v.27 no.5, pp.1117-1127, 2017
  • With the advent of the Internet of Things (IoT), the number of devices connected to the Internet has rapidly increased, but IoT security is still weak. Existing security technologies are difficult to apply because IoT devices generate a large amount of traffic, use different protocols depending on their purpose, and operate in low-power environments. In this paper, we therefore propose a method of profiling normal network behavior based on big data analysis techniques. The proposed method uses Hadoop/Hive for big data analytics and R for statistical computing, and we verify its effectiveness through a simulation.
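
A hedged sketch of querying Hive over JDBC to aggregate per-device traffic, the kind of statistic a normal-behavior profile is built from; the table and column names are hypothetical, as the abstract does not publish a schema.

```java
// Sketch: per-device traffic aggregation via the Hive JDBC driver.
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class TrafficProfile {
    public static void main(String[] args) throws Exception {
        Class.forName("org.apache.hive.jdbc.HiveDriver");
        try (Connection conn = DriverManager.getConnection(
                 "jdbc:hive2://localhost:10000/default", "hive", "");
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery(
                 "SELECT device_id, AVG(bytes) AS avg_bytes, COUNT(*) AS flows " +
                 "FROM netflow_logs GROUP BY device_id")) {
            while (rs.next()) {
                // Each row is one device's baseline traffic statistics.
                System.out.printf("%s avg=%.1f flows=%d%n",
                    rs.getString("device_id"), rs.getDouble("avg_bytes"),
                    rs.getLong("flows"));
            }
        }
    }
}
```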

Sequential Pattern Mining for Intrusion Detection System with Feature Selection on Big Data

  • Fidalcastro, A; Baburaj, E
    • KSII Transactions on Internet and Information Systems (TIIS), v.11 no.10, pp.5023-5038, 2017
  • Big data is an emerging technology that deals with data sets whose size is beyond the ability of commonly used software tools to process. In a huge network, a large amount of network information must be processed, consisting of both normal and abnormal activity logs in large volumes of multi-dimensional data. An Intrusion Detection System (IDS) is required to monitor the network and detect malicious nodes and activities, but the massive amount of data makes threats and attacks difficult to detect. Sequential pattern mining can identify patterns of malicious activity, and has become a popular approach because it considers the quantities, profits, and time order of items. Here we propose a sequential pattern mining algorithm with fuzzy-logic feature selection and fuzzy weighted support for huge volumes of network logs, implemented on Apache Hadoop YARN to address speed and time constraints. Fuzzy-logic feature selection picks the important features from the feature set, while fuzzy weighted support assigns weights to the inputs and avoids multiple scans. In our simulation, we use attack logs from an NS-2 MANET environment and compare the proposed algorithm with the state-of-the-art sequential pattern mining algorithm SPADE and with a Support Vector Machine in a Hadoop environment.
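
As an illustrative sketch of the weighted-support notion (not the authors' algorithm): each log sequence carries a fuzzy weight, and a pattern's support is the weight fraction of sequences containing it as a subsequence. The event names and weights below are made up.

```java
// Sketch: weighted support of a sequential pattern over weighted log sequences.
import java.util.List;

public class WeightedSupport {
    /** True if 'pattern' occurs in 'sequence' in order (not necessarily contiguously). */
    static boolean containsSubsequence(List<String> sequence, List<String> pattern) {
        int i = 0;
        for (String event : sequence)
            if (i < pattern.size() && event.equals(pattern.get(i))) i++;
        return i == pattern.size();
    }

    static double weightedSupport(List<List<String>> sequences, double[] weights,
                                  List<String> pattern) {
        double matched = 0, total = 0;
        for (int s = 0; s < sequences.size(); s++) {
            total += weights[s];
            if (containsSubsequence(sequences.get(s), pattern)) matched += weights[s];
        }
        return matched / total;
    }

    public static void main(String[] args) {
        List<List<String>> logs = List.of(
            List.of("probe", "connect", "flood"),
            List.of("connect", "probe", "flood"),
            List.of("connect", "transfer"));
        double[] w = {0.9, 0.4, 1.0};            // fuzzy weights per sequence
        System.out.println(weightedSupport(logs, w, List.of("probe", "flood")));
        // (0.9 + 0.4) / 2.3 ≈ 0.565
    }
}
```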