• Title/Summary/Keyword: HADOOP

Applying TIPC Protocol for Increasing Network Performance in Hadoop-based Distributed Computing Environment (Hadoop 기반 분산 컴퓨팅 환경에서 네트워크 I/O의 성능개선을 위한 TIPC의 적용과 분석)

  • Yoo, Dae-Hyun;Chung, Sang-Hwa;Kim, Tae-Hun
    • Journal of KIISE:Computer Systems and Theory
    • /
    • v.36 no.5
    • /
    • pp.351-359
    • /
    • 2009
  • As the volume of data on the Internet grows, platform technologies that can process massive data sets efficiently, such as the Google platform and Hadoop, are attracting attention. On these platforms, MapReduce, a programming model that supports parallel computation on large clusters, incurs network I/O overhead when task outputs are sent between nodes. In this paper, we propose applying the TIPC (Transparent Inter-Process Communication) protocol to reduce this network I/O overhead and improve network performance in distributed computing environments. TIPC has a lightweight protocol stack and consumes less CPU time than TCP because of its simple connection establishment and logical addressing. We analyze the main features of a Hadoop-based distributed computing system and build an experimental model for comparing the performance of various protocols. In our experiments, TIPC achieves higher bandwidth and lower CPU overhead than the other protocols.

Initial Authentication Protocol of Hadoop Distribution System based on Elliptic Curve (타원곡선기반 하둡 분산 시스템의 초기 인증 프로토콜)

  • Jeong, Yoon-Su;Kim, Yong-Tae;Park, Gil-Cheol
    • Journal of Digital Convergence
    • /
    • v.12 no.10
    • /
    • pp.253-258
    • /
    • 2014
  • With the spread of smartphones, cloud computing technology has developed rapidly and more users want to receive big data services. The Hadoop framework behind such services consists of the Hadoop file system and Hadoop MapReduce, which support data-intensive distributed applications. However, smartphone services that use a Hadoop system are highly vulnerable with respect to data authentication. In this paper, we propose an initial authentication protocol for Hadoop systems used by smartphone services. The proposed protocol combines symmetric-key cryptography with an ECC algorithm to support secure processing across multiple data processing systems. In particular, when a user accesses the Hadoop system to process data, the protocol uses an initial authentication key and a symmetric key instead of elliptic-curve public keys, which improves security.

Design and Implementation of HDFS Data Encryption Scheme Using ARIA Algorithms on Hadoop (하둡 상에서 ARIA 알고리즘을 이용한 HDFS 데이터 암호화 기법의 설계 및 구현)

  • Song, Youngho;Shin, YoungSung;Chang, Jae-Woo
    • KIPS Transactions on Computer and Communication Systems
    • /
    • v.5 no.2
    • /
    • pp.33-40
    • /
    • 2016
  • With the growth of social network systems (SNS), big data has become a reality, and Hadoop was developed as a distributed platform for analyzing it. Enterprises use Hadoop to analyze data containing users' sensitive information and apply the results to marketing. Consequently, data encryption has been studied to prevent leakage of sensitive data stored in Hadoop. However, existing work supports only the AES encryption algorithm, the international data encryption standard, whereas the Korean government has chosen ARIA as its standard data encryption algorithm. In this paper, we propose an HDFS data encryption scheme using ARIA on Hadoop. First, the proposed scheme provides an HDFS block-splitting component that performs ARIA encryption and decryption in Hadoop's distributed computing environment. Second, it provides a variable-length data processing component that encrypts and decrypts by adding dummy data when the last block of data does not contain a full 128 bits. Finally, our performance analysis shows that the scheme can be used effectively for both text string processing applications and scientific data analysis applications.
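
The abstract above does not say what form the dummy data takes. As a minimal sketch, assuming a PKCS#7-style scheme (a swapped-in, commonly used convention, not necessarily the authors' method), the following shows how a final block shorter than 128 bits can be filled and later restored. The function names are hypothetical, and no ARIA encryption is performed here because the Python standard library does not include ARIA.

```python
BLOCK_SIZE = 16  # 128 bits, the block size of ARIA (and AES)


def pad(data: bytes, block_size: int = BLOCK_SIZE) -> bytes:
    """Append dummy bytes so the length is a multiple of block_size.

    Assumption: PKCS#7-style padding, where each dummy byte stores the
    number of dummy bytes added. A full extra block is added when the
    data is already aligned, so unpadding is always unambiguous.
    """
    n = block_size - len(data) % block_size
    return data + bytes([n]) * n


def unpad(padded: bytes, block_size: int = BLOCK_SIZE) -> bytes:
    """Remove the dummy bytes added by pad()."""
    if not padded or len(padded) % block_size != 0:
        raise ValueError("invalid padded length")
    n = padded[-1]
    if n < 1 or n > block_size or padded[-n:] != bytes([n]) * n:
        raise ValueError("invalid padding")
    return padded[:-n]


def split_blocks(data: bytes, block_size: int = BLOCK_SIZE) -> list:
    """Split padded data into cipher-sized blocks."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


if __name__ == "__main__":
    padded = pad(b"sensitive record")  # exactly 16 bytes of input
    print(len(padded), len(split_blocks(padded)))
    print(unpad(padded))
```

Each block returned by `split_blocks` would then be passed to the block cipher; after decryption, `unpad` strips the dummy bytes from the final block.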

Monitoring Tool for Hadoop Cluster (Hadoop 클러스터를 위한 모니터링 툴)

  • Keum, Tae-Hoon;Lee, Won-Joo;Jeon, Chang-Ho
    • Proceedings of the Korean Society of Computer Information Conference
    • /
    • 2010.07a
    • /
    • pp.17-18
    • /
    • 2010
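  • Cloud computing, a recent topic of interest, uses clusters made up of many nodes. Monitoring tools are used to manage these clusters efficiently. However, existing monitoring tools focus on the availability and overhead of the nodes in a cluster and on how data is collected and transmitted, so they cannot monitor the detailed information of a cloud cluster. This paper therefore proposes a monitoring tool for Hadoop, a platform on which cloud computing can be built.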

Task Assignment Policy for Hadoop Considering Availability of Nodes (노드의 가용성을 고려한 하둡 태스크 할당 정책)

  • Ryu, Wooseok
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference
    • /
    • 2017.05a
    • /
    • pp.103-105
    • /
    • 2017
  • Hadoop MapReduce is a processing framework in which users' jobs are processed efficiently in a parallel and distributed manner on a Hadoop cluster. MapReduce task schedulers select target nodes and assign users' tasks to them. Previous schedulers cannot fully utilize the resources of a Hadoop cluster because they do not consider the dynamic characteristics of the cluster that arise from node availability. To increase cluster utilization, this paper proposes a novel task assignment policy for MapReduce that efficiently assigns a job's tasks to a dynamic cluster by considering the availability of each node.
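
The abstract does not describe the policy itself. The sketch below only illustrates the general idea of availability-aware assignment under stated assumptions: each node carries a hypothetical availability score in [0, 1] and a number of free task slots, and tasks are given greedily to the most available node that still has capacity. The `Node` type and `assign_tasks` function are illustrative names, not part of Hadoop or of the paper.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    availability: float  # hypothetical score in [0, 1]; higher is more available
    free_slots: int      # number of tasks the node can still accept


def assign_tasks(tasks, nodes):
    """Greedily map each task to the most available node with a free slot.

    Returns a dict of task -> node name. Tasks that do not fit are left
    out of the result and would wait for the next scheduling round.
    """
    slots = {n.name: n.free_slots for n in nodes}
    ranked = sorted(nodes, key=lambda n: n.availability, reverse=True)
    assignment = {}
    for task in tasks:
        target = next((n for n in ranked if slots[n.name] > 0), None)
        if target is None:
            break  # no capacity left anywhere in the cluster
        slots[target.name] -= 1
        assignment[task] = target.name
    return assignment


if __name__ == "__main__":
    cluster = [Node("a", 0.9, 1), Node("b", 0.5, 2), Node("c", 0.1, 1)]
    print(assign_tasks(["t1", "t2", "t3"], cluster))
```

A real scheduler would also weigh data locality and update availability as nodes join or leave; the greedy ranking here is only the simplest instance of the idea.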

Security Threats and Review for SQL on Hadoop (SQL on Hadoop 기술 동향 및 보안 위협)

  • Youn, Han Jung;Suk, Sang Kee
    • Annual Conference of KIPS
    • /
    • 2015.04a
    • /
    • pp.691-693
    • /
    • 2015
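  • SQL on Hadoop is a technology that uses SQL to process user queries against data stored in the Hadoop Distributed File System. It aims to use SQL to overcome a drawback of existing Hadoop systems: because of the limitations of MapReduce and the need for compatibility with existing systems, running them alongside an RDBMS has been unavoidable. This paper reviews the characteristics of and research trends in Hive and Impala, two representative SQL on Hadoop frameworks, and examines the expected security threats.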

Workflow Scheduling Using Heuristic Scheduling in Hadoop

  • Thingom, Chintureena;Kumar R, Ganesh;Yeon, Guydeuk
    • Journal of information and communication convergence engineering
    • /
    • v.16 no.4
    • /
    • pp.264-270
    • /
    • 2018
  • In this study, we aim to optimize multiple workloads in the cloud, allocate resources effectively, and reduce the response time of assigned jobs. Running Hadoop in a data center is an efficient analytical service for enterprises. To provide clients with an effective and reliable analytical computing interface, various cloud service providers host Hadoop clusters. Previous work aimed at executing workflows on the Hadoop platform while minimizing the cost of virtual machines and other computing resources. Earlier, the stochastic hill climbing technique was applied to a single parameter; we now work on optimizing multiple parameters in cloud data centers with the proposed heuristic hill climbing. Because many users try to prioritize their jobs simultaneously in the cluster, a resource-optimized workflow scheduling technique must be reliable enough to complete assigned tasks before their deadlines while also optimizing resource usage in the cloud.

Study on Data Processing of the IOT Sensor Network Based on a Hadoop Cloud Platform and a TWLGA Scheduling Algorithm

  • Li, Guoyu;Yang, Kang
    • Journal of Information Processing Systems
    • /
    • v.17 no.6
    • /
    • pp.1035-1043
    • /
    • 2021
  • An Internet of Things (IOT) sensor network is an effective solution for monitoring environmental conditions. However, IOT sensor networks generate so much data that storing, processing, and querying it become technical challenges. To solve this problem, a Hadoop cloud platform is proposed. Using the time and workload genetic algorithm (TWLGA), the data processing platform enables the work of one node to be shared with other nodes, which not only raises the efficiency of a single node but also provides compatibility support that reduces possible software and hardware risks. In this experiment, a Hadoop cluster platform with the TWLGA scheduling algorithm is developed, and the performance of the platform is tested. The results show that the Hadoop cloud platform meets the big data processing requirements of IOT sensor networks.

Design on the IoT Sensor Data Collection Environment using Lambda Architecture (Lambda 구조를 적용한 IoT 센서 데이터 수집 환경 설계)

  • Hwang, Yun-Young;Kim, Soo-Hyun;Shin, Yong-Tae
    • Proceedings of the Korean Society of Computer Information Conference
    • /
    • 2020.07a
    • /
    • pp.547-548
    • /
    • 2020
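  • The volume of data has grown substantially as technology has advanced. Hadoop is a representative big data processing platform and is also used in the IoT field. HDFS (Hadoop Distributed File System), a core Hadoop project, is a block-based, large-capacity data store. Existing Hadoop-based IoT sensor data collection environments use HDFS. However, their performance and usefulness are limited by NameNode overload caused by small files in HDFS and by the fact that Hadoop does not support updating or deleting data once it has been imported. To overcome these shortcomings, this paper designs an IoT sensor data collection environment that applies the Lambda architecture.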

Lambda Architecture Design using Apache Kudu and Impala (Apache Kudu와 Impala를 활용한 Lambda Architecture 설계)

  • Hwang, Yun-Young;Lee, Pil-Won;Shin, Yong-Tae
    • Annual Conference of KIPS
    • /
    • 2020.05a
    • /
    • pp.60-62
    • /
    • 2020
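  • The volume of data has grown substantially as technology has advanced, and a variety of big data processing platforms have emerged. The most widely used of these is Hadoop, developed by the Apache Software Foundation, which is also used in the IoT field. However, existing Hadoop-based environments for collecting and analyzing IoT sensor data suffer from NameNode overload caused by small files in HDFS, a core Hadoop project, and cannot update or delete imported data. This paper designs a Lambda Architecture using Apache Kudu and Impala. The proposed architecture classifies IoT sensor data into cold data and hot data and stores each in storage suited to its characteristics. It then uses a batch view generated by batch processing together with a real-time view generated through Apache Kudu and Impala to resolve the problems of existing Hadoop-based environments and to shorten the time users need to access analyzed data.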
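
The hot/cold split and the two views described above can be sketched in a few lines. This is an illustration under stated assumptions, not the authors' design: readings are plain dictionaries, "hot" means newer than a hypothetical one-hour window, and in-memory structures stand in for the batch storage and for Apache Kudu. No Kudu or Impala API is used.

```python
from collections import defaultdict

HOT_WINDOW_SECONDS = 3600  # assumed threshold separating hot from cold data


def route(readings, now):
    """Split readings into (hot, cold) lists by age."""
    hot, cold = [], []
    for r in readings:
        (hot if now - r["ts"] <= HOT_WINDOW_SECONDS else cold).append(r)
    return hot, cold


def build_view(readings):
    """Aggregate readings into sensor -> (count, sum of values)."""
    acc = defaultdict(lambda: [0, 0.0])
    for r in readings:
        acc[r["sensor"]][0] += 1
        acc[r["sensor"]][1] += r["value"]
    return {sensor: (c, s) for sensor, (c, s) in acc.items()}


def query_mean(sensor, batch_view, realtime_view):
    """Answer a query by merging the batch view and the real-time view."""
    c1, s1 = batch_view.get(sensor, (0, 0.0))
    c2, s2 = realtime_view.get(sensor, (0, 0.0))
    total = c1 + c2
    return (s1 + s2) / total if total else None


if __name__ == "__main__":
    data = [
        {"sensor": "s1", "ts": 0, "value": 10.0},
        {"sensor": "s1", "ts": 9000, "value": 20.0},
        {"sensor": "s2", "ts": 9500, "value": 5.0},
    ]
    hot, cold = route(data, now=10000)
    batch_view, realtime_view = build_view(cold), build_view(hot)
    print(query_mean("s1", batch_view, realtime_view))
```

The point of the merge in `query_mean` is that a user sees one answer covering both old and recent data, while the slow batch recomputation and the fast real-time path stay independent.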