• Title/Summary/Keyword: Hadoop security

Search Result 35, Processing Time 0.029 seconds

Security Threats and Review for SQL on Hadoop (SQL on Hadoop 기술 동향 및 보안 위협)

  • Youn, Han Jung;Suk, Sang Kee
    • Proceedings of the Korea Information Processing Society Conference
    • /
    • 2015.04a
    • /
    • pp.691-693
    • /
    • 2015
  • SQL on Hadoop 기술은 하둡 분산 파일 시스템에 저장된 데이터를 대상으로 SQL을 이용하여 사용자의 질의를 처리하는 기술이다. 기존의 Hadoop 시스템이 맵리듀스의 한계와 기존 시스템의 호환성으로 인해 RDBMS와 병행사용이 불가피하다는 단점을 SQL을 이용해 극복하고자 하는 것이다. 본 논문에서는 SQL on Hadoop의 대표적 프레임워크인 Hive와 Impala의 특징과, 연구동향에 대해 살펴보고 예상되는 보안 위협에 대해 고찰한다.

A Design of Hadoop Security Protocol using One Time Key based on Hash-chain (해시 체인 기반 일회용 키를 이용한 하둡 보안 프로토콜 설계)

  • Jeong, Eun-Hee;Lee, Byung-Kwan
    • The Journal of Korea Institute of Information, Electronics, and Communication Technology
    • /
    • v.10 no.4
    • /
    • pp.340-349
    • /
    • 2017
  • This paper is proposed Hadoop security protocol to protect a reply attack and impersonation attack. The proposed hadoop security protocol is consists of user authentication module, public key based data node authentication module, name node authentication module, and data node authentication module. The user authentication module is issued the temporary access ID from TGS after verifing user's identification on Authentication Server. The public key based data node authentication module generates secret key between name node and data node, and generates OTKL(One-Time Key List) using Hash-chain. The name node authentication module verifies user's identification using user's temporary access ID, and issues DT(Delegation Token) and BAT(Block Access Token) to user. The data node authentication module sends the encrypted data block to user after verifing user's identification using OwerID of BAT. Therefore the proposed hadoop security protocol dose not only prepare the exposure of data node's secret key by using OTKL, timestamp, owerID but also detect the reply attack and impersonation attack. Also, it enhances the data access of data node, and enforces data security by sending the encrypted data.

Scalable P2P Botnet Detection with Threshold Setting in Hadoop Framework (하둡 프레임워크에서 한계점 가변으로 확장성이 가능한 P2P 봇넷 탐지 기법)

  • Huseynov, Khalid;Yoo, Paul D.;Kim, Kwangjo
    • Journal of the Korea Institute of Information Security & Cryptology
    • /
    • v.25 no.4
    • /
    • pp.807-816
    • /
    • 2015
  • During the last decade most of coordinated security breaches are performed by the means of botnets, which is a large overlay network of compromised computers being controlled by remote botmaster. Due to high volumes of traffic to be analyzed, the challenge is posed by managing tradeoff between system scalability and accuracy. We propose a novel Hadoop-based P2P botnet detection method solving the problem of scalability and having high accuracy. Moreover, our approach is characterized not to require labeled data and applicable to encrypted traffic as well.

A Digital Secret File Leakage Prevention System via Hadoop-based User Behavior Analysis (하둡 기반의 사용자 행위 분석을 통한 기밀파일 유출 방지 시스템)

  • Yoo, Hye-Rim;Shin, Gyu-Jin;Yang, Dong-Min;Lee, Bong-Hwan
    • Journal of the Korea Institute of Information and Communication Engineering
    • /
    • v.22 no.11
    • /
    • pp.1544-1553
    • /
    • 2018
  • Recently internal information leakage in industries is severely increasing in spite of industry security policy. Thus, it is essential to prepare an information leakage prevention measure by industries. Most of the leaks result from the insiders, not from external attacks. In this paper, a real-time internal information leakage prevention system via both storage and network is implemented in order to protect confidential file leakage. In addition, a Hadoop-based user behavior analysis and statistics system is designed and implemented for storing and analyzing information log data in industries. The proposed system stores a large volume of data in HDFS and improves data processing capability using RHive, consequently helps the administrator recognize and prepare the confidential file leak trials. The implemented audit system would be contributed to reducing the damage caused by leakage of confidential files inside of the industries via both portable data media and networks.

A Design of Permission Management System Based on Group Key in Hadoop Distributed File System (하둡 분산 파일 시스템에서 그룹키 기반 Permission Management 시스템 설계)

  • Kim, Hyungjoo;Kang, Jungho;You, Hanna;Jun, Moonseog
    • KIPS Transactions on Computer and Communication Systems
    • /
    • v.4 no.4
    • /
    • pp.141-146
    • /
    • 2015
  • Data have been increased enormously due to the development of IT technology such as recent smart equipments, social network services and streaming services. To meet these environments the technologies that can treat mass data have received attention, and the typical one is Hadoop. Hadoop is on the basis of open source, and it has been designed to be used at general purpose computers on the basis of Linux. To initial Hadoop nearly no security was introduced, but as the number of users increased data that need security increased and there appeared new version that introduced Kerberos and Token system in 2009. But in this method there was a problem that only one secret key can be used and access permission to blocks cannot be authenticated to each user, and there were weak points that replay attack and spoofing attack were possible. Hence, to supplement these weak points and to maintain efficiency a protocol on the basis of group key, in which users are authenticated in logical group and then this is reflected to token, is proposed in this paper. The result shows that it has solved the weak points and there is no big overhead in terms of efficiency.

Initial Authentication Protocol of Hadoop Distribution System based on Elliptic Curve (타원곡선기반 하둡 분산 시스템의 초기 인증 프로토콜)

  • Jeong, Yoon-Su;Kim, Yong-Tae;Park, Gil-Cheol
    • Journal of Digital Convergence
    • /
    • v.12 no.10
    • /
    • pp.253-258
    • /
    • 2014
  • Recently, the development of cloud computing technology is developed as soon as smartphones is increases, and increased that users want to receive big data service. Hadoop framework of the big data service is provided to hadoop file system and hadoop mapreduce supported by data-intensive distributed applications. But, smpartphone service using hadoop system is a very vulnerable state to data authentication. In this paper, we propose a initial authentication protocol of hadoop system assisted by smartphone service. Proposed protocol is combine symmetric key cryptography techniques with ECC algorithm in order to support the secure multiple data processing systems. In particular, the proposed protocol to access the system by the user Hadoop when processing data, the initial authentication key and the symmetric key instead of the elliptic curve by using the public key-based security is improved.

Design and Implementation of HDFS Data Encryption Scheme Using ARIA Algorithms on Hadoop (하둡 상에서 ARIA 알고리즘을 이용한 HDFS 데이터 암호화 기법의 설계 및 구현)

  • Song, Youngho;Shin, YoungSung;Chang, Jae-Woo
    • KIPS Transactions on Computer and Communication Systems
    • /
    • v.5 no.2
    • /
    • pp.33-40
    • /
    • 2016
  • Due to the growth of social network systems (SNS), big data are realized and Hadoop was developed as a distributed platform for analyzing big data. Enterprises analyze data containing users' sensitive information by using Hadoop and utilize them for marketing. Therefore, researches on data encryption have been done to protect the leakage of sensitive data stored in Hadoop. However, the existing researches support only the AES encryption algorithm, the international standard of data encryption. Meanwhile, Korean government choose ARIA algorithm as a standard data encryption one. In this paper, we propose a HDFS data encryption scheme using ARIA algorithms on Hadoop. First, the proposed scheme provide a HDFS block splitting component which performs ARIA encryption and decryption under the distributed computing environment of Hadoop. Second, the proposed scheme also provide a variable-length data processing component which performs encryption and decryption by adding dummy data, in case when the last block of data does not contains 128 bit data. Finally, we show from performance analysis that our proposed scheme can be effectively used for both text string processing applications and science data analysis applications.

Access efficiency of small sized files in Big Data using various Techniques on Hadoop Distributed File System platform

  • Alange, Neeta;Mathur, Anjali
    • International Journal of Computer Science & Network Security
    • /
    • v.21 no.7
    • /
    • pp.359-364
    • /
    • 2021
  • In recent years Hadoop usage has been increasing day by day. The need of development of the technology and its specified outcomes are eagerly waiting across globe to adopt speedy access of data. Need of computers and its dependency is increasing day by day. Big data is exponentially growing as the entire world is working in online mode. Large amount of data has been produced which is very difficult to handle and process within a short time. In present situation industries are widely using the Hadoop framework to store, process and produce at the specified time with huge amount of data that has been put on the server. Processing of this huge amount of data having small files & its storage optimization is a big problem. HDFS, Sequence files, HAR, NHAR various techniques have been already proposed. In this paper we have discussed about various existing techniques which are developed for accessing and storing small files efficiently. Out of the various techniques we have specifically tried to implement the HDFS- HAR, NHAR techniques.

A Security Log Analysis System using Logstash based on Apache Elasticsearch (아파치 엘라스틱서치 기반 로그스태시를 이용한 보안로그 분석시스템)

  • Lee, Bong-Hwan;Yang, Dong-Min
    • Journal of the Korea Institute of Information and Communication Engineering
    • /
    • v.22 no.2
    • /
    • pp.382-389
    • /
    • 2018
  • Recently cyber attacks can cause serious damage on various information systems. Log data analysis would be able to resolve this problem. Security log analysis system allows to cope with security risk properly by collecting, storing, and analyzing log data information. In this paper, a security log analysis system is designed and implemented in order to analyze security log data using the Logstash in the Elasticsearch, a distributed search engine which enables to collect and process various types of log data. The Kibana, an open source data visualization plugin for Elasticsearch, is used to generate log statistics and search report, and visualize the results. The performance of Elasticsearch-based security log analysis system is compared to the existing log analysis system which uses the Flume log collector, Flume HDFS sink and HBase. The experimental results show that the proposed system tremendously reduces both database query processing time and log data analysis time compared to the existing Hadoop-based log analysis system.

A Study on Data Storage and Recovery in Hadoop Environment (하둡 환경에 적합한 데이터 저장 및 복원 기법에 관한 연구)

  • Kim, Su-Hyun;Lee, Im-Yeong
    • KIPS Transactions on Computer and Communication Systems
    • /
    • v.2 no.12
    • /
    • pp.569-576
    • /
    • 2013
  • Cloud computing has been receiving increasing attention recently. Despite this attention, security is the main problem that still needs to be addressed for cloud computing. In general, a cloud computing environment protects data by using distributed servers for data storage. When the amount of data is too high, however, different pieces of a secret key (if used) may be divided among hundreds of distributed servers. Thus, the management of a distributed server may be very difficult simply in terms of its authentication, encryption, and decryption processes, which incur vast overheads. In this paper, we proposed a efficiently data storage and recovery scheme using XOR and RAID in Hadoop environment.