• Title/Summary/Keyword: 하둡 환경 (Hadoop environment)

Search results: 95

A Study on Collaborative Filtering Recommendation Algorithm based on Hadoop and Spark (하둡 및 스파크 기반의 협력 필터링 추천 알고리즘 연구)

  • Jung, Young Gyo;Kim, Sang Young;Lee, Jung-June;Youn, Hee Yong
    • Proceedings of the Korean Society of Computer Information Conference / 2016.01a / pp.81-82 / 2016
  • Recommendation systems, which use other users' ratings to recommend services to a particular user, now rely widely on collaborative filtering. Such systems, however, have the problem that during clustering a user may be forced into an already-formed group and therefore misclassified, and that when the error in users' ratings is large the recommended results are inaccurate. To address these problems, this paper proposes a technique that implements a clustering-based collaborative filtering algorithm in a distributed environment to optimize recommendation effectiveness, and compares collaborative filtering recommendation algorithms on systems built on Hadoop and Spark. (A minimal Spark sketch follows this entry.)

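The abstract does not describe the paper's clustering-based variant, so the following is only a minimal sketch of running a collaborative-filtering recommender on Spark, using the built-in ALS model from `pyspark.ml` as a stand-in; the input path and column names are assumptions.

```python
# Illustrative only: the paper's clustering-based CF variant is not given in
# the abstract, so Spark's built-in ALS recommender is used as a stand-in.
from pyspark.sql import SparkSession
from pyspark.ml.recommendation import ALS
from pyspark.ml.evaluation import RegressionEvaluator

spark = SparkSession.builder.appName("cf-on-spark").getOrCreate()

# Assumed input file with columns: userId, itemId, rating
ratings = spark.read.csv("hdfs:///data/ratings.csv", header=True, inferSchema=True)
train, test = ratings.randomSplit([0.8, 0.2], seed=42)

als = ALS(userCol="userId", itemCol="itemId", ratingCol="rating",
          maxIter=10, regParam=0.1, coldStartStrategy="drop")
model = als.fit(train)

rmse = RegressionEvaluator(metricName="rmse", labelCol="rating",
                           predictionCol="prediction").evaluate(model.transform(test))
print(f"RMSE = {rmse:.4f}")
```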

Anomaly Detection Technique of Log Data Using Hadoop Ecosystem (하둡 에코시스템을 활용한 로그 데이터의 이상 탐지 기법)

  • Son, Siwoon;Gil, Myeong-Seon;Moon, Yang-Sae
    • KIISE Transactions on Computing Practices / v.23 no.2 / pp.128-133 / 2017
  • In recent years, the number of systems for analyzing large volumes of data has been increasing. Hadoop, a representative big data system, stores and processes large data in a distributed environment of multiple servers, where system-resource management is very important. The authors attempted to detect anomalies in the rapidly changing log data collected from multiple servers using simple but efficient anomaly-detection techniques. Accordingly, an Apache Hive storage architecture was designed to store the log data collected from the multiple servers in the Hadoop ecosystem. In addition, three anomaly-detection techniques were designed based on the moving-average and 3-sigma concepts. It was finally confirmed that all three techniques detected the abnormal intervals correctly, and that the weighted anomaly-detection technique was more precise than the basic ones. These results show that log-data anomalies can be detected effectively with simple techniques in the Hadoop ecosystem.
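As a rough illustration of the moving-average and 3-sigma idea described above (not the paper's three exact techniques, whose window sizes and weighting are not given here), a minimal Python sketch might look like this; the input file and column name are assumptions.

```python
# Sketch of moving-average / 3-sigma anomaly detection; window size and the
# weighting scheme used in the paper are not reproduced here.
import pandas as pd

def detect_anomalies(series: pd.Series, window: int = 12) -> pd.Series:
    """Flag points that deviate from the moving average by more than 3 sigma."""
    mean = series.rolling(window).mean()
    std = series.rolling(window).std()
    return (series - mean).abs() > 3 * std

log_counts = pd.read_csv("server_log_counts.csv")["count"]  # assumed input
anomalous = detect_anomalies(log_counts)
print(log_counts[anomalous])
```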

A Design of Permission Management System Based on Group Key in Hadoop Distributed File System (하둡 분산 파일 시스템에서 그룹키 기반 Permission Management 시스템 설계)

  • Kim, Hyungjoo;Kang, Jungho;You, Hanna;Jun, Moonseog
    • KIPS Transactions on Computer and Communication Systems / v.4 no.4 / pp.141-146 / 2015
  • Data volumes have grown enormously with the development of IT technologies such as smart devices, social network services, and streaming services. Technologies that can handle such massive data have therefore drawn attention, and Hadoop is the representative one. Hadoop is open source and was designed to run on commodity computers on the basis of Linux. Early versions of Hadoop included almost no security; as users and security-sensitive data increased, a new version introducing Kerberos and a token system appeared in 2009. In that scheme, however, only a single secret key can be used and access permissions to blocks cannot be authenticated per user, leaving it open to replay and spoofing attacks. To remedy these weaknesses while maintaining efficiency, this paper proposes a group-key based protocol in which users are authenticated as a logical group and the result is reflected in the token. The results show that the weaknesses are resolved without significant overhead in terms of efficiency.
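The abstract does not spell out the protocol, so the following is only an illustrative sketch of the general idea it describes: binding a block-access token to a per-group key via an HMAC plus an expiry and nonce so that replayed or spoofed tokens fail verification. All field names, key handling, and the token layout are assumptions.

```python
# Hypothetical sketch of a group-key-bound block access token; not the
# paper's actual construction.
import hmac, hashlib, json, os, time

GROUP_KEYS = {"analytics": os.urandom(32)}   # per-group secret keys (assumed)

def issue_token(user: str, group: str, block_id: str, ttl: int = 300) -> dict:
    payload = {"user": user, "group": group, "block": block_id,
               "exp": int(time.time()) + ttl, "nonce": os.urandom(8).hex()}
    mac = hmac.new(GROUP_KEYS[group],
                   json.dumps(payload, sort_keys=True).encode(),
                   hashlib.sha256).hexdigest()
    return {"payload": payload, "mac": mac}

def verify_token(token: dict) -> bool:
    payload = token["payload"]
    expected = hmac.new(GROUP_KEYS[payload["group"]],
                        json.dumps(payload, sort_keys=True).encode(),
                        hashlib.sha256).hexdigest()
    # A replayed token fails on expiry; a forged one fails the MAC check.
    return hmac.compare_digest(expected, token["mac"]) and payload["exp"] > time.time()

tok = issue_token("alice", "analytics", "blk_1073741825")
print(verify_token(tok))   # True while the token has not expired
```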

The Bigdata Processing Environment Building for the Learning System (학습 시스템을 위한 빅데이터 처리 환경 구축)

  • Kim, Young-Geun;Kim, Seung-Hyun;Jo, Min-Hui;Kim, Won-Jung
    • The Journal of the Korea institute of electronic communication sciences / v.9 no.7 / pp.791-797 / 2014
  • To create an Apache Hadoop environment for parallel distributed processing of big data, it is necessary either to build a cluster by connecting multiple computers or to configure virtual nodes on a single computer as a cloud-style environment. In practice, however, building such systems for education faces many constraints in cost and in the complexity of system configuration. The development of an inexpensive, practical learning system that educational institutions and beginners in big data processing can use for training is therefore urgent. In this study, a learning system for parallel distributed big-data processing, enabling training and analysis with technologies such as Hadoop and NoSQL, is designed and implemented on Raspberry Pi boards. The implemented parallel distributed processing system is expected to be useful for beginners who want to start with big data and for education.

A performance comparison for Apache Spark platform on environment of limited memory (제한된 메모리 환경에서의 아파치 스파크 성능 비교)

  • Song, Jun-Seok;Kim, Sang-Young;Lee, Jung-June;Youn, Hee-Yong
    • Proceedings of the Korean Society of Computer Information Conference / 2016.01a / pp.67-68 / 2016
  • As systems that use big data have recently come into active use in many fields, various distributed platforms have appeared that compensate for the technical shortcomings of Hadoop, the representative big-data storage and processing platform. Among them, Apache Spark is an open-source distributed data processing platform that supports in-memory processing to overcome the speed limitations of Hadoop and thus processes large volumes of data efficiently. However, Spark jobs are memory-dependent, so overall job performance drops sharply in an environment with limited memory. In this paper, Apache Spark performance is compared across memory capacities to determine the appropriate amount of memory required for Spark to operate. (A configuration sketch follows this entry.)

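As a hedged illustration of the kind of settings such a comparison would vary, the sketch below runs a stand-in word-count job under one executor-memory setting; the paper's actual workload, memory test points, and cluster setup are not given in the abstract.

```python
from pyspark.sql import SparkSession

# One test point; the comparison would rerun the same job with different
# values (e.g. 512m, 1g, 2g, 4g) passed via spark-submit or this builder.
spark = (SparkSession.builder
         .appName("spark-memory-test")
         .config("spark.executor.memory", "1g")
         .config("spark.driver.memory", "1g")
         .config("spark.memory.fraction", "0.6")   # Spark's default, shown explicitly
         .getOrCreate())

# Stand-in workload: word count over a text file on HDFS (path is assumed).
counts = (spark.read.text("hdfs:///data/corpus.txt").rdd
          .flatMap(lambda row: row.value.split())
          .map(lambda w: (w, 1))
          .reduceByKey(lambda a, b: a + b)
          .count())
print(counts)
spark.stop()
```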

User Authentication Scheme based on Secret Sharing for Distributed File System in Hadoop (하둡의 분산 파일 시스템 구조를 고려한 비밀분산 기반의 사용자 인증 기법)

  • Kim, Su-Hyun;Lee, Im-Yeong
    • Proceedings of the Korea Information Processing Society Conference / 2013.11a / pp.740-743 / 2013
  • In a cloud computing environment, user data are encrypted and stored across a large number of distributed servers. Global Internet service companies such as Google and Yahoo, recognizing the importance of the Internet service platform, have carried out their own research and development and now build and use large-scale cluster-based cloud computing platforms made of low-cost commodity nodes. As a variety of data services become possible in such distributed computing environments, distributed management of large-scale data has emerged as a major issue. At the same time, the diverse ways in which large-scale data are used can lead to security vulnerabilities and privacy violations by malicious attackers or internal users. In particular, the block access token that Hadoop uses to control permissions on data blocks has various security weaknesses. To address these weaknesses, this paper proposes a block access token management scheme based on secret sharing.
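The abstract only names the building block, so the sketch below shows a plain Shamir (k, n) threshold scheme as one possible realization of "secret-sharing-based" token management; the prime, the parameters, and how shares would be distributed among nodes are assumptions, not the paper's design.

```python
# Minimal Shamir secret sharing sketch, illustrative only.
import random

PRIME = 2**127 - 1  # field modulus (assumption)

def split_secret(secret: int, n: int, k: int):
    """Split `secret` into n shares, any k of which can reconstruct it."""
    coeffs = [secret] + [random.randrange(PRIME) for _ in range(k - 1)]
    def f(x):
        return sum(c * pow(x, i, PRIME) for i, c in enumerate(coeffs)) % PRIME
    return [(x, f(x)) for x in range(1, n + 1)]

def reconstruct(shares):
    """Lagrange interpolation at x = 0 over the prime field."""
    secret = 0
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = num * (-xj) % PRIME
                den = den * (xi - xj) % PRIME
        secret = (secret + yi * num * pow(den, PRIME - 2, PRIME)) % PRIME
    return secret

token_secret = 123456789  # e.g. a key protecting a block access token (assumed use)
shares = split_secret(token_secret, n=5, k=3)
assert reconstruct(shares[:3]) == token_secret
```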

Implementation of Job Processing Using GPU for Hadoop Environment (하둡 환경에서 GPU를 사용한 Job 처리 방법)

  • Hong, Seok-min;Yoo, Yeon-jun;Lee, Hyeop Geon;Kim, Young Woon
    • Proceedings of the Korea Information Processing Society Conference / 2022.11a / pp.77-79 / 2022
  • As IT technology advances, the volume of data worldwide increases every year, and companies that use big-data platforms want ever faster big-data processing. This paper therefore proposes a method for processing jobs with GPUs in a Hadoop environment. The proposed method configures separate CPU and GPU clusters and assigns jobs, classified into three size categories, to the appropriate cluster for processing. Actual implementation and performance evaluation are still needed to verify the proposed method in practice.
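The abstract does not define the three job-size categories or the routing rule, so the following is only a hypothetical sketch of the dispatching idea: classify a job by input size and send it to the CPU or GPU cluster accordingly. The thresholds and cluster names are assumptions.

```python
# Hypothetical job dispatcher; size thresholds and cluster names are assumed.
from dataclasses import dataclass

@dataclass
class Job:
    name: str
    input_bytes: int

def choose_cluster(job: Job) -> str:
    """Route jobs by size: small -> CPU cluster, medium/large -> GPU clusters."""
    if job.input_bytes < 1 << 30:          # < 1 GiB: small job
        return "cpu-cluster"
    if job.input_bytes < 100 << 30:        # < 100 GiB: medium job
        return "gpu-cluster"
    return "gpu-cluster-large"             # >= 100 GiB: large job, bigger GPU pool

for job in [Job("wordcount", 200 << 20), Job("join", 5 << 30), Job("train", 300 << 30)]:
    print(job.name, "->", choose_cluster(job))
```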

Research of Soft-Interface Creation and Provision Methodology According to Applications Based on Mobile Device Environment (모바일 디바이스 환경에서 어플리케이션에 따른 소프트 인터페이스 제작 및 제공 방안 연구)

  • Cho, Changhee;Park, Sanghyun;Lee, Sang-Joon;Kim, Jinsul
    • Journal of Digital Contents Society / v.14 no.4 / pp.513-519 / 2013
  • In this paper, we provide interfaces suited to the user's application environment and offer a web-based tool with which users can create interfaces applicable to a wide range of application environments. HTML5 is used for interface creation, so users can build various interfaces by dragging the mouse and, using the ASCII codes and key events provided by the Android OS, apply them not only to documents but also to multimedia and game applications. The interface database is stored in HDFS (Hadoop Distributed File System) on Hadoop for management, and users can retrieve their own designed interfaces or select others at any time through a simple login. To provide interfaces quickly, Hive on Hadoop is used for search, and the data are delivered as XML files that smart mobile devices can process quickly.
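To illustrate the lookup path the abstract describes (search the interface metadata with Hive, return it to the mobile client as XML), here is a hedged sketch; the PyHive client, host, table and column names, and XML layout are all assumptions rather than the paper's implementation.

```python
# Illustrative Hive lookup returning interface metadata as XML; all names
# (host, table, columns) are assumptions.
import xml.etree.ElementTree as ET
from pyhive import hive

conn = hive.Connection(host="hive-server", port=10000, username="ui_service")
cursor = conn.cursor()
cursor.execute("SELECT interface_id, layout_path FROM interfaces WHERE user_id = 'alice'")

root = ET.Element("interfaces")
for interface_id, layout_path in cursor.fetchall():
    item = ET.SubElement(root, "interface", id=str(interface_id))
    item.text = layout_path          # HDFS path of the stored layout (assumed schema)

print(ET.tostring(root, encoding="unicode"))
```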

On Implementing a Learning Environment for Big Data Processing using Raspberry Pi (라즈베리파이를 이용한 빅 데이터 처리 학습 환경 구축)

  • Hwang, Boram;Kim, Seonggyu
    • Journal of Digital Convergence / v.14 no.4 / pp.251-258 / 2016
  • Big data processing is a broad term for processing data sets so large or complex that traditional data processing applications are inadequate. The widespread use of smart devices has a huge impact on the way we process data, and many organizations are contemplating how to incorporate or integrate those devices into their enterprise data systems. We have proposed a way to process big data by integrating Raspberry Pi into a Hadoop cluster as a computational grid, and have shown the efficiency and ease of scaling of the proposed system through several experiments.

An elastic distributed parallel Hadoop system for bigdata platform and distributed inference engines (동적 분산병렬 하둡시스템 및 분산추론기에 응용한 서버가상화 빅데이터 플랫폼)

  • Song, Dong Ho;Shin, Ji Ae;In, Yean Jin;Lee, Wan Gon;Lee, Kang Se
    • Journal of the Korean Data and Information Science Society / v.26 no.5 / pp.1129-1139 / 2015
  • The inference process generates additional triples from knowledge represented as RDF triples in semantic web technology. Tens of millions of triples as the initial big data, together with the additionally inferred triples, become a knowledge base for applications such as QA (question-and-answer) systems. The inference engine requires more computing resources to process the triples generated during inference, and the additional resources supplied by the underlying resource pool in cloud computing can shorten the execution time. This paper presents an algorithm that allocates the number of computing nodes "elastically" at runtime on Hadoop, depending on the size of the knowledge data fed in. The proposed model has a layered architecture: the top layer for applications, the middle layer for the distributed parallel inference engine that processes the triples, and the lower layer for elastic Hadoop and server virtualization. System algorithms and test data are analyzed and discussed in the paper. The model has the benefit that rich legacy Hadoop applications can run faster on this system without any modification.
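As a hedged sketch of the "elastic" allocation idea, the snippet below picks a node count from the size of the knowledge data before running the inference job; the sizing rule, limits, and provisioning step are assumptions, since the paper's actual algorithm is not reproduced in the abstract.

```python
# Hypothetical sizing rule: scale the Hadoop cluster with the RDF triple count.
import math

MIN_NODES, MAX_NODES = 2, 32
TRIPLES_PER_NODE = 10_000_000          # assumed per-node capacity

def nodes_for(triple_count: int) -> int:
    """Choose how many worker nodes to allocate for a given triple count."""
    wanted = math.ceil(triple_count / TRIPLES_PER_NODE)
    return max(MIN_NODES, min(MAX_NODES, wanted))

def provision(triple_count: int) -> None:
    n = nodes_for(triple_count)
    # A real system would ask the virtualization layer (the paper's lower
    # layer) to start n worker VMs and add them to the Hadoop cluster here.
    print(f"{triple_count:,} triples -> {n} Hadoop nodes")

provision(25_000_000)    # 25M triples -> 3 nodes under these assumptions
provision(400_000_000)   # capped at MAX_NODES
```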