• Title/Summary/Keyword: Hadoop Server

Access efficiency of small sized files in Big Data using various Techniques on Hadoop Distributed File System platform

  • Alange, Neeta;Mathur, Anjali
    • International Journal of Computer Science & Network Security / v.21 no.7 / pp.359-364 / 2021
  • Hadoop usage has grown steadily in recent years, driven by demand for fast data access. Big data is growing exponentially as more of the world works online, and the volume produced is difficult to handle and process in a short time. Industry now widely uses the Hadoop framework to store and process the huge amounts of data placed on servers and to deliver results on time. Processing large numbers of small files and optimizing their storage remains a major problem. Several techniques have already been proposed, including HDFS, sequence files, HAR, and NHAR. This paper discusses existing techniques developed for storing and accessing small files efficiently and, among them, specifically attempts to implement the HDFS HAR and NHAR techniques.
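
As a rough illustration of the idea behind archive-style approaches to the small-files problem, many small files can be packed into one large blob together with an index that maps each name to its offset and length. The sketch below is not the HAR or NHAR on-disk format and is not the authors' implementation; `pack_files` and `read_packed` are hypothetical helper names, and everything is held in memory for simplicity.

```python
from typing import Dict, Tuple

# Index entry: (offset into the blob, length in bytes). Assumed layout for illustration.
Index = Dict[str, Tuple[int, int]]


def pack_files(files: Dict[str, bytes]) -> Tuple[bytes, Index]:
    """Concatenate small files into one blob and build a name -> (offset, length) index."""
    blob = bytearray()
    index: Index = {}
    for name in sorted(files):  # sorted for a deterministic layout
        data = files[name]
        index[name] = (len(blob), len(data))
        blob.extend(data)
    return bytes(blob), index


def read_packed(blob: bytes, index: Index, name: str) -> bytes:
    """Read one logical file back out of the packed blob via the index."""
    offset, length = index[name]
    return blob[offset:offset + length]
```

Storing one large object plus a compact index, rather than one object per small file, is what reduces per-file metadata overhead in this kind of scheme.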

Research of Soft-Interface Creation and Provision Methodology According to Applications Based on Mobile Device Environment (모바일 디바이스 환경에서 어플리케이션에 따른 소프트 인터페이스 제작 및 제공 방안 연구)

  • Cho, Changhee;Park, Sanghyun;Lee, Sang-Joon;Kim, Jinsul
    • Journal of Digital Contents Society / v.14 no.4 / pp.513-519 / 2013
  • In this paper, we provide interfaces tailored to users' application environments, along with web-based tools that let users create interfaces for a wide range of application environments. The creation process uses HTML5, so users can build various interfaces by dragging the mouse and apply them to multimedia and game applications as well as documents, using the ASCII codes and key events provided by the Android OS. The interface database is stored in HDFS (Hadoop Distributed File System) for management, and users can retrieve their own designs or select existing interfaces at any time through a simple login. To deliver interfaces quickly, Hive on Hadoop is used for search, and the data is provided as an XML file that smart mobile devices can process quickly.

Integrated Verification of Hadoop Cluster Prototypes and Analysis Software for SMB (중소기업을 위한 하둡 클러스터의 프로토타입과 분석 소프트웨어의 통합된 검증)

  • Cha, Byung-Rae;Kim, Nam-Ho;Lee, Seong-Ho;Ji, Yoo-Kang;Kim, Jong-Won
    • Journal of Advanced Navigation Technology / v.18 no.2 / pp.191-199 / 2014
  • Research on helping small and medium businesses (SMBs) make use of cloud computing and big data, both of which are being rapidly adopted in IT, has recently been increasing. As one such effort, this paper designs and implements a prototype for tentatively building a Hadoop cluster in a private cloud infrastructure environment. Prototypes are implemented on each hardware type (single-board computer, PC, and server), and their performance is measured. We also present integrated verification results for the data analysis performance of the analysis software system running on the prototypes, using the ASA (American Standard Association) dataset. For this, we implement the analysis software system with several open-source tools, including R, Python, D3, and Java, and test it.

A Design of Hadoop Security Protocol using One Time Key based on Hash-chain (해시 체인 기반 일회용 키를 이용한 하둡 보안 프로토콜 설계)

  • Jeong, Eun-Hee;Lee, Byung-Kwan
    • The Journal of Korea Institute of Information, Electronics, and Communication Technology / v.10 no.4 / pp.340-349 / 2017
  • This paper proposes a Hadoop security protocol that protects against replay and impersonation attacks. The proposed protocol consists of a user authentication module, a public-key-based data node authentication module, a name node authentication module, and a data node authentication module. The user authentication module is issued a temporary access ID by the TGS after the user's identity is verified by the Authentication Server. The public-key-based data node authentication module generates a secret key between the name node and the data node and generates an OTKL (One-Time Key List) using a hash chain. The name node authentication module verifies the user's identity using the temporary access ID and issues a DT (Delegation Token) and a BAT (Block Access Token) to the user. The data node authentication module sends the encrypted data block to the user after verifying the user's identity using the OwnerID of the BAT. The proposed protocol therefore not only prevents exposure of the data node's secret key by using the OTKL, a timestamp, and the OwnerID, but also detects replay and impersonation attacks. It also improves data access at the data node and strengthens data security by sending encrypted data.
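
The abstract does not give the exact construction of the OTKL, so the following is only a generic sketch of a hash-chain one-time key list, assuming SHA-256 as the hash and a Lamport-style scheme in which keys are revealed in reverse order. The function names `build_hash_chain` and `verify_next_key` are hypothetical.

```python
import hashlib
import hmac
from typing import List


def build_hash_chain(seed: bytes, length: int) -> List[bytes]:
    """Return [H(seed), H(H(seed)), ...] with `length` entries (SHA-256 assumed)."""
    chain: List[bytes] = []
    value = seed
    for _ in range(length):
        value = hashlib.sha256(value).digest()
        chain.append(value)
    return chain


def verify_next_key(candidate: bytes, last_accepted: bytes) -> bool:
    """Accept `candidate` only if hashing it yields the previously accepted key."""
    return hmac.compare_digest(hashlib.sha256(candidate).digest(), last_accepted)
```

In this style of scheme the verifier initially holds only the last element of the chain. Each later key is the preceding element, so a captured key cannot be reused once the verifier has moved on, which is one general way such a list can help against replay.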

Implement on Search Machine using Open Source Framework (오픈 소스 프레임워크를 활용한 검색엔진 구현)

  • Song, Hyun-Ok;Kim, A-Yong;Jung, Hoe-Kyung
    • Journal of the Korea Institute of Information and Communication Engineering / v.19 no.3 / pp.552-557 / 2015
  • With the development of IT and the increased use of smart appliances, a great deal of data is now produced and consumed on the Internet. The importance of information retrieval technology is growing accordingly, but the technology is difficult to approach because it requires a lot of background knowledge. With the emergence of Lucene, however, a search engine can be implemented even with limited background knowledge of search technology. This paper proposes implementing a search engine using a framework developed on top of Lucene. The proposed framework uses Hadoop, Nutch, Solr, and Zookeeper so that the search engine supports distributed processing, distributed storage, and high availability in a server environment.

An Efficient Log Data Processing Architecture for Internet Cloud Environments

  • Kim, Julie;Bahn, Hyokyung
    • International Journal of Internet, Broadcasting and Communication / v.8 no.1 / pp.33-41 / 2016
  • Big data management is becoming an increasingly important issue in both industry and academia in the information science community. One important category of big data generated by software systems is log data. Service providers generally use log data to improve their services, and it can also be used to improve system reliability. In this paper, we propose a novel big data management architecture specialized for log data. The proposed architecture provides a scalable log management system that consists of client-side and server-side modules for efficient handling of log data. To support large volumes of simultaneous log data from multiple clients, we adopt the Hadoop infrastructure in the server-side file system for storing and managing log data efficiently. We implement the proposed architecture to support various client environments and validate its efficiency through measurement studies. The results show that the proposed architecture performs better than the existing logging architecture by 42.8% on average. All components of the proposed architecture are implemented with open source software, and the developed prototypes are now publicly available.

High Rate Denial-of-Service Attack Detection System for Cloud Environment Using Flume and Spark

  • Gutierrez, Janitza Punto;Lee, Kilhung
    • Journal of Information Processing Systems / v.17 no.4 / pp.675-689 / 2021
  • Cloud computing is being adopted by more and more organizations. However, because cloud computing is virtualized, volatile, scalable, multi-tenant, and distributed in nature, detecting attacks in the cloud with conventional processes is a challenging task. This work proposes a solution that collects web server logs with Flume and filters them through Spark Streaming so that only suspicious data, or data related to denial-of-service attacks, is considered. This reduces the data stored in the Hadoop Distributed File System for later analysis with the frequent pattern (FP)-Growth algorithm. The proposed system addresses some of the difficulties of security in cloud environments by facilitating data collection and reducing detection time, which in turn enables almost real-time attack detection.
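
The filtering step described above can be pictured with a small plain-Python stand-in. This is not Flume or Spark Streaming code and not the authors' filter: the record format (timestamp, client IP), the window length, the threshold, and the name `suspicious_records` are all assumptions made for illustration.

```python
from collections import Counter
from typing import Iterable, List, Tuple

# Assumed record format: (timestamp in seconds, client IP).
LogRecord = Tuple[float, str]


def suspicious_records(
    records: Iterable[LogRecord],
    window_s: float = 1.0,
    threshold: int = 100,
) -> List[LogRecord]:
    """Keep only records from clients whose request count in a time window exceeds `threshold`."""
    records = list(records)
    # Count requests per (window number, client IP).
    counts = Counter((int(ts // window_s), ip) for ts, ip in records)
    return [
        (ts, ip)
        for ts, ip in records
        if counts[(int(ts // window_s), ip)] > threshold
    ]
```

Only the records that pass such a filter would need to be kept for later pattern mining, which is how filtering before storage reduces the volume written out.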

CERES: A Log-based, Interactive Web Analytics System for Backbone Networks (CERES: 백본망 로그 기반 대화형 웹 분석 시스템)

  • Suh, Ilhyun;Chung, Yon Dohn
    • KIISE Transactions on Computing Practices / v.21 no.10 / pp.651-657 / 2015
  • The amount of web traffic has increased as a result of the rapid growth in the use of web-based applications. To obtain valuable information from web logs, we need systems that support interactive, flexible, and efficient ways to analyze and handle large amounts of data. In this paper, we present CERES, a log-based, interactive web analytics system for backbone networks. Because CERES focuses on analyzing web log records generated from backbone networks, it can perform web analysis from the perspective of a network. CERES is designed for deployment in a server cluster using the Hadoop Distributed File System (HDFS) as the underlying storage. We transform web log records from backbone networks into relations, store them, and then allow users to analyze them flexibly and interactively with a SQL-like language. In particular, we use the data cube technique to enable efficient statistical analysis of web logs. The system provides users with a web-based, multi-modal user interface.
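
To make the data cube idea concrete, the sketch below precomputes a count for every combination of dimension values, with `"*"` standing for "all values" of a dimension. It is a minimal in-memory illustration, not the CERES implementation; the dimension names in the usage note and the function name `count_cube` are hypothetical.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, List, Sequence, Tuple


def count_cube(rows: List[Dict[str, str]], dims: Sequence[str]) -> Dict[Tuple[str, ...], int]:
    """Count rows for every subset of `dims`; dimensions not in the subset are rolled up to "*"."""
    cube: Counter = Counter()
    for r in range(len(dims) + 1):
        for kept in combinations(dims, r):
            for row in rows:
                key = tuple(row[d] if d in kept else "*" for d in dims)
                cube[key] += 1
    return dict(cube)
```

For example, with dimensions `["host", "status"]`, the entry keyed `("*", "200")` holds the number of records with status 200 across all hosts, so such a statistic can be answered by lookup instead of rescanning the log relation.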

Design and Implementation of MongoDB-based Unstructured Log Processing System over Cloud Computing Environment (클라우드 환경에서 MongoDB 기반의 비정형 로그 처리 시스템 설계 및 구현)

  • Kim, Myoungjin;Han, Seungho;Cui, Yun;Lee, Hanku
    • Journal of Internet Computing and Services / v.14 no.6 / pp.71-84 / 2013
  • Log data, which record the multitude of information created when operating computer systems, are utilized in many processes, from carrying out computer system inspection and process optimization to providing customized user optimization. In this paper, we propose a MongoDB-based unstructured log processing system in a cloud environment for processing the massive amount of log data of banks. Most of the log data generated during banking operations come from handling a client's business. Therefore, in order to gather, store, categorize, and analyze the log data generated while processing the client's business, a separate log data processing system needs to be established. However, the realization of flexible storage expansion functions for processing a massive amount of unstructured log data and executing a considerable number of functions to categorize and analyze the stored unstructured log data is difficult in existing computer environments. Thus, in this study, we use cloud computing technology to realize a cloud-based log data processing system for processing unstructured log data that are difficult to process using the existing computing infrastructure's analysis tools and management system. The proposed system uses the IaaS (Infrastructure as a Service) cloud environment to provide a flexible expansion of computing resources and includes the ability to flexibly expand resources such as storage space and memory under conditions such as extended storage or rapid increase in log data. Moreover, to overcome the processing limits of the existing analysis tool when a real-time analysis of the aggregated unstructured log data is required, the proposed system includes a Hadoop-based analysis module for quick and reliable parallel-distributed processing of the massive amount of log data. 
Furthermore, because HDFS (Hadoop Distributed File System) stores data by replicating the blocks of the aggregated log data, the proposed system offers automatic restore functions so that the system can continue operating after it recovers from a malfunction. Finally, by establishing a distributed database using the NoSQL-based MongoDB, the proposed system provides methods of effectively processing unstructured log data. Relational databases such as MySQL have complex schemas that are inappropriate for processing unstructured log data. Further, databases with strict schemas, like relational databases, cannot expand nodes when the stored data must be distributed across various nodes as the amount of data rapidly increases. NoSQL does not provide the complex computations that relational databases may provide, but it can easily expand the database through node dispersion when the amount of data increases rapidly; it is a non-relational database with an appropriate structure for processing unstructured data. NoSQL data models are usually classified as key-value, column-oriented, and document-oriented types. Of these, MongoDB, a representative document-oriented database with a free schema structure, is used in the proposed system. MongoDB is introduced to the proposed system because it makes it easy to process unstructured log data through a flexible schema structure, facilitates flexible node expansion when the amount of data is rapidly increasing, and provides an Auto-Sharding function that automatically expands storage. The proposed system is composed of a log collector module, a log graph generator module, a MongoDB module, a Hadoop-based analysis module, and a MySQL module.
When the log data generated over the entire client business process of each bank are sent to the cloud server, the log collector module collects and classifies the data according to the type of log data and distributes it to the MongoDB module and the MySQL module. The log graph generator module generates the results of the log analysis of the MongoDB module, the Hadoop-based analysis module, and the MySQL module per analysis time and type of the aggregated log data, and provides them to the user through a web interface. Log data that require real-time analysis are stored in the MySQL module and provided in real time by the log graph generator module. The log data aggregated per unit time are stored in the MongoDB module and plotted in a graph according to the user's various analysis conditions. The aggregated log data in the MongoDB module are processed in a parallel-distributed manner by the Hadoop-based analysis module. A comparative evaluation of log data insertion and query performance is carried out against a log data processing system that uses only MySQL; this evaluation demonstrates the proposed system's superiority. Moreover, an optimal chunk size is confirmed through the log data insert performance evaluation of MongoDB for various chunk sizes.

Design of Efficient Big Data Collection Method based on Mass IoT devices (방대한 IoT 장치 기반 환경에서 효율적인 빅데이터 수집 기법 설계)

  • Choi, Jongseok;Shin, Yongtae
    • The Journal of Korea Institute of Information, Electronics, and Communication Technology / v.14 no.4 / pp.300-306 / 2021
  • With advances in IT, the hardware used in IoT equipment has improved, and smart systems built on low-cost, high-performance RF and computing devices are being developed. However, in infrastructure environments where a large number of IoT devices are installed, big data collection places a load on the collection server because of bottlenecks among the transmitted data. As a result, data sent to the collection server suffers packet loss and reduced throughput. An efficient big data collection technique is therefore needed for such environments, and this paper proposes one. In the performance evaluation, which measured packet loss and data throughput, the proposed technique completed transmission without loss of the transmitted file. In the future, the system needs to be implemented based on this design.