• Title/Summary/Keyword: Hadoop File System

Search Result 89, Processing Time 0.029 seconds

Delayed Block Replication Scheme of Hadoop Distributed File System for Flexible Management of Distributed Nodes (하둡 분산 파일시스템에서의 유연한 노드 관리를 위한 지연된 블록 복제 기법)

  • Ryu, Woo-Seok
    • The Journal of the Korea institute of electronic communication sciences
    • /
    • v.12 no.2
    • /
    • pp.367-374
    • /
    • 2017
  • This paper discusses management problems of Hadoop distributed node, which is a platform for big data processing, and proposes a novel technique for enabling flexible node management of Hadoop Distributed File System. Hadoop cannot configure Hadoop cluster dynamically because it judges temporarily unavailable nodes as a failure. Delayed block replication scheme proposed in this paper delays the removal of unavailable node as much as possible so as to be easily rejoined. Experimental results show that the proposed scheme increases flexibility of node management with little impact on distributed processing performance when the cluster size changes.

A Licence Plate Recognition System using Hadoop (하둡을 이용한 번호판 인식 시스템)

  • Park, Jin-Woo;Park, Ho-Hyun
    • Journal of IKEEE
    • /
    • v.21 no.2
    • /
    • pp.142-145
    • /
    • 2017
  • Currently, a trend in image processing is high-quality and high-resolution. The size and amount of image data are increasing exponentially because of the development of information and communication technology. Thus, license plate recognition with a single processor cannot handle the increasing data. This paper proposes a number plate recognition system using a distributed processing framework, Hadoop. Using SequenceFile format in Hadoop, each mapper performs a license plate recognition with a number of image data in a data block Experimental results show that license plate recognition performance with 16 data nodes accomplishes speedup of maximum 14.7 times comparing with one data node. In large dataset, the recognition performance is robust even if the number of data nodes increases gradually.

A Study on the Improving Performance of Massively Small File Using the Reuse JVM in MapReduce (MapReduce에서 Reuse JVM을 이용한 대규모 스몰파일 처리성능 향상 방법에 관한 연구)

  • Choi, Chul Woong;Kim, Jeong In;Kim, Pan Koo
    • Journal of Korea Multimedia Society
    • /
    • v.18 no.9
    • /
    • pp.1098-1104
    • /
    • 2015
  • With the widespread use of smartphones and IoT (Internet of Things), data are being generated on a large scale, and there is increased for the analysis of such data. Hence, distributed processing systems have gained much attention. Hadoop, which is a distributed processing system, saves the metadata of stored files in name nodes; in this case, the main problems are as follows: the memory becomes insufficient; load occurs because of massive small files; scheduling and file processing time increases because of the increased number of small files. In this paper, we propose a solution to address the increase in processing time because of massive small files, and thus improve the processing performance, using the Reuse JVM method provided by Hadoop. Through environment setting, the Reuse JVM method modifies the JVM produced conventionally for every task, so that multiple tasks are reused sequentially in one JVM. As a final outcome, the Reuse JVM method showed the best processing performance when used together with CombineFileInputFormat.

Implement of MapReduce-based Big Data Processing Scheme for Reducing Big Data Processing Delay Time and Store Data (빅데이터 처리시간 감소와 저장 효율성이 향상을 위한 맵리듀스 기반 빅데이터 처리 기법 구현)

  • Lee, Hyeopgeon;Kim, Young-Woon;Kim, Ki-Young
    • Journal of the Korea Convergence Society
    • /
    • v.9 no.10
    • /
    • pp.13-19
    • /
    • 2018
  • MapReduce, the Hadoop's essential core technology, is most commonly used to process big data based on the Hadoop distributed file system. However, the existing MapReduce-based big data processing techniques have a feature of dividing and storing files in blocks predefined in the Hadoop distributed file system, thus wasting huge infrastructure resources. Therefore, in this paper, we propose an efficient MapReduce-based big data processing scheme. The proposed method enhances the storage efficiency of a big data infrastructure environment by converting and compressing the data to be processed into a data format in advance suitable for processing by MapReduce. In addition, the proposed method solves the problem of the data processing time delay arising from when implementing with focus on the storage efficiency.

A Secure Model for Reading and Writing in Hadoop Distributed File System and its Evaluation (하둡 분산파일시스템에서 안전한 쓰기, 읽기 모델과 평가)

  • Pang, Sechung;Ra, Ilkyeun;Kim, Yangwoo
    • Journal of Internet Computing and Services
    • /
    • v.13 no.5
    • /
    • pp.55-64
    • /
    • 2012
  • Nowadays, as Cloud computing becomes popular, a need for a DFS(distributed file system) is increased. But, in the current Cloud computing environments, there is no DFS framework that is sufficient to protect sensitive private information from attackers. Therefore, we designed and proposed a secure scheme for distributed file systems. The scheme provides confidentiality and availability for a distributed file system using a secret sharing method. In this paper, we measured the speed of encryption and decryption for our proposed method, and compared them with that of SEED algorithm which is the most popular algorithm in this field. This comparison showed the computational efficiency of our method. Moreover, the proposed secure read/write model is independent of Hadoop DFS structure so that our modified algorithm can be easily adapted for use in the HDFS. Finally, the proposed model is evaluated theoretically using performance measurement method for distributed secret sharing model.

A Parallel HDFS and MapReduce Functions for Emotion Analysis (감성분석을 위한 병렬적 HDFS와 맵리듀스 함수)

  • Back, BongHyun;Ryoo, Yun-Kyoo
    • Journal of the Korea society of information convergence
    • /
    • v.7 no.2
    • /
    • pp.49-57
    • /
    • 2014
  • Recently, opinion mining is introduced to extract useful information from SNS data and to evaluate the true intention of users. Opinion mining are required several efficient techniques to collect and analyze a large amount of SNS data and extract meaningful data from them. Therefore in this paper, we propose a parallel HDFS(Hadoop Distributed File System) and emotion functions based on Mapreduce to extract some emotional information of users from various unstructured big data on social networks. The experiment results have verified that the proposed system and functions perform faster than O(n) for data gathering time and loading time, and maintain stable load balancing for memory and CPU resources.

  • PDF

Development of Retargetable Hadoop Simulation Environment Based on DEVS Formalism (DEVS 형식론 기반의 재겨냥성 하둡 시뮬레이션 환경 개발)

  • Kim, Byeong Soo;Kang, Bong Gu;Kim, Tag Gon;Song, Hae Sang
    • Journal of the Korea Society for Simulation
    • /
    • v.26 no.4
    • /
    • pp.51-61
    • /
    • 2017
  • Hadoop platform is a representative storing and managing platform for big data. Hadoop consists of distributed computing system called MapReduce and distributed file system called HDFS. It is important to analyse the effectiveness according to the change of cluster constructions and several parameters. However, since it is hard to construct thousands of clusters and analyse the constructed system, simulation method is required to analyse the system. This paper proposes Hadoop simulator based on DEVS formalism which provides hierarchical and modular modeling. Hadoop simulator provides a retargetable experimental environment that is possible to change of various parameters, algorithms and models. It is also possible to design input models reflecting the characteristics of Hadoop applications. To maximize the user's convenience, the user interface, real-time model viewer, and input scenario editor are also provided. In this paper, we validate Hadoop Simulator through the comparison with the Hadoop execution results and perform various experiments.

Implementaion of Video Processing Framework using Hadoop-based cloud computing (Hadoop 기반 클라우드 컴퓨팅을 이용한 영상 처리 프레임워크 구현)

  • Ryu, Chungmo;Lee, Daecheol;Jang, Minwook;Kim, Cheolgi
    • Proceedings of the Korea Information Processing Society Conference
    • /
    • 2013.11a
    • /
    • pp.139-142
    • /
    • 2013
  • 최근 대용량 영상데이터로부터 정보 수집, 영상 처리를 위한 클라우드 관련 연구들이 활발하다. 그러나 공개 소프트웨어를 이용한 클라우드 연구의 대부분은 라이브러리 수준이 아닌 단순히 프로그램 수준의 조합으로 작동한다. 이런 이유로 단순 조합에 따른 비효율성에 의한 성능문제는 크게 다루어지지 않는다. 본 논문에서는 이 비효율성을 해결하는데 중점을 두고 FFmpeg과 Hadoop을 라이브러리 수준으로 결합하여 기존보다 더 나은 성능의 영상클라우드 환경을 구축하였다. C기반의 영상처리 라이브러리인 FFmpeg와 JAVA기반의 클라우드 환경 Hadoop의 결합을 위해 JNI(Java Native Interface)를 이용하였다. 상세구현으로는 HDFS(Hadoop Distributed File System)을 확장하여 Hadoop MapReduce가 직접 FFmpeg을 통한 영상파일 접근이 가능하게 하였다. 이로써 FFmpeg과 Hadoop간 상이한 파일 접근 방식에서 발생하는 불필요한 작업에 의한 시스템의 성능저하를 막았다. 또한 응용의 확장성을 위해 영상작업시 작업영상을 영상처리의 최소단위인 GOP(Group of Pictures)단위로 잘라 클라우드의 노드들에게 분산시켰다. 결과적으로 기존에 존재하는 Hadoop과 FFmpeg을 프로그램적으로 결합한 영상처리 클라우드보다 총 처리시간을 앞당겼고, GOP 단위의 영상 처리는 영상기반 작업에 안정성과 응용의 확장성을 보장해주었다.

Initial Authentication Protocol of Hadoop Distribution System based on Elliptic Curve (타원곡선기반 하둡 분산 시스템의 초기 인증 프로토콜)

  • Jeong, Yoon-Su;Kim, Yong-Tae;Park, Gil-Cheol
    • Journal of Digital Convergence
    • /
    • v.12 no.10
    • /
    • pp.253-258
    • /
    • 2014
  • Recently, the development of cloud computing technology is developed as soon as smartphones is increases, and increased that users want to receive big data service. Hadoop framework of the big data service is provided to hadoop file system and hadoop mapreduce supported by data-intensive distributed applications. But, smpartphone service using hadoop system is a very vulnerable state to data authentication. In this paper, we propose a initial authentication protocol of hadoop system assisted by smartphone service. Proposed protocol is combine symmetric key cryptography techniques with ECC algorithm in order to support the secure multiple data processing systems. In particular, the proposed protocol to access the system by the user Hadoop when processing data, the initial authentication key and the symmetric key instead of the elliptic curve by using the public key-based security is improved.

Dynamic Cluster Management of Hadoop Distributed Filesystem (하둡 분산 파일시스템의 동적 클러스터 관리 기법)

  • Ryu, Wooseok
    • Proceedings of the Korean Institute of Information and Commucation Sciences Conference
    • /
    • 2016.10a
    • /
    • pp.435-437
    • /
    • 2016
  • Hadoop Distributed File System(HDFS) is a file system for distributed processing of big data by replicating data to distributed data nodes. HDFS cluster shows a great scalability up to thousands of nodes, but it assumes a exclusive node cluster with numerous nodes for the big data processing. Various operational-purpose worker systems used by office are hardly considered as a part of cluster. This paper discusses this problem and proposes a dynamic cluster management technique to increase storage capability and analytic performance of hadoop cluster. The propsed technique can add legacy systems to the cluster and can remove them from the cluster dynamically depending on their availability.

  • PDF