• Title/Summary/Keyword: Hadoop System


A Study on Phone Call Big Data Analytics (전화통화 빅데이터 분석에 관한 연구)

  • Kim, Jeongrae;Jeong, Chanki
    • Journal of Information Technology and Architecture / v.10 no.3 / pp.387-397 / 2013
  • This paper proposes an approach to big data analytics for phone call data. The analytical model for phone call data consists of the PVPF (Parallel Variable-length Phrase Finding) algorithm, which identifies verbal phrases in natural language, and the word count algorithm, which measures the usage frequency of keywords. In the proposed model, we identify words using the PVPF algorithm and measure the usage frequency of the identified words using the word count algorithm in MapReduce. The results can be interpreted from various viewpoints. We design and implement the model based on HDFS (Hadoop Distributed File System) and verify the proposed approach through a case study of phone call data. Finally, we extract useful results by analyzing keyword correlation and usage frequency.
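The word count step can be illustrated in plain Python that mimics the map, shuffle, and reduce phases of a MapReduce job. This is a minimal single-process sketch, not the authors' implementation: whitespace tokenization stands in for the PVPF phrase-finding step, and the function names and sample transcripts are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    """Mapper: emit (word, 1) for every whitespace-separated token.

    Hypothetical stand-in: the paper identifies words with PVPF instead.
    """
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group intermediate values by key, as the framework's shuffle step would."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reducer: sum the partial counts for one word."""
    return key, sum(values)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    """Run map -> shuffle -> reduce over all input lines, keys in sorted order."""
    intermediate = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle(intermediate)
    return dict(reduce_phase(k, v) for k, v in sorted(grouped.items()))


if __name__ == "__main__":
    transcripts = ["please check my bill", "my bill is too high"]
    print(word_count(transcripts))
    # {'bill': 2, 'check': 1, 'high': 1, 'is': 1, 'my': 2, 'please': 1, 'too': 1}
```

On a real cluster the mapper and reducer would run as separate tasks over HDFS blocks, and the framework, not the `shuffle` function, would group the keys.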

Deep Learning-Based Smart Meter Wattage Prediction Analysis Platform

  • Jang, Seonghoon;Shin, Seung-Jung
    • International journal of advanced smart convergence / v.9 no.4 / pp.173-178 / 2020
  • With the fourth industrial revolution, in which people, objects, and information are connected as one, fields such as smart energy, smart cities, artificial intelligence, the Internet of Things, autonomous vehicles, and robotics are becoming mainstream, drawing attention to big data. Among them, the smart grid is a technology that maximizes energy efficiency by converging information and communication technologies with the power grid, so that electricity usage, supply volume, and power line conditions can be known. Smart meters are equipment that monitors and communicates power usage. We start with the goal of building a virtual smart grid and constructing a virtual environment that generates real-time data, in order to accommodate large volumes of data that are individually small but generated regularly. A major focus is creating a software/hardware deployment architecture suitable for test operation of the system. It is necessary to identify the advantages and disadvantages of each software component according to the characteristics of the collected data and to select the sub-projects suited to the purpose. The data was collected, loaded, processed, and analyzed on a Hadoop ecosystem-based big data platform and used to predict power demand through deep learning.

Development of Software Education Support System using Learning Analysis Technique (학습분석 기법을 적용한 소프트웨어교육 지원 시스템 개발)

  • Jeon, In-seong;Song, Ki-Sang
    • Journal of The Korean Association of Information Education / v.24 no.2 / pp.157-165 / 2020
  • As interest in software education has increased, discussions on methods for teaching, learning, and evaluating it have also become active. One problem with current software teaching methods is that the instructor cannot see, in real time, the code a learner is writing on their computer, and is therefore limited in providing timely feedback. To overcome this problem, we developed a software education support system that applies a learning analysis technique to capture each learner's coding progress in real time in a block-based programming environment, delivers it to the instructor, and visualizes the data collected during learning through the Hadoop system. The system consists of a presentation layer accessed by teachers and learners, a business layer that analyzes and structures code, and a DB layer that stores class, account, and learning information. The instructor can set the content to be learned in advance in the system and compare and analyze learners' achievement with a rubric based on computational thinking components, using data that compares the stored code with the students' code.

A System Design for Real-Time Monitoring of Patient Waiting Time based on Open-Source Platform (오픈소스 플랫폼 기반의 실시간 환자 대기시간 모니터링 시스템 설계)

  • Ryu, Wooseok
    • Journal of the Korea Institute of Information and Communication Engineering / v.22 no.4 / pp.575-580 / 2018
  • This paper discusses a system for real-time monitoring of patient waiting time in hospitals based on an open-source platform. Open-source projects make it possible to develop, at lower cost, a high-performance stream processing system that analyzes and processes stream data in real time. The Hadoop ecosystem is a well-known big data processing platform consisting of numerous open-source subprojects. This paper first defines several requirements for the monitoring system and selects projects from the Hadoop ecosystem suited to meet them. It then proposes a system architecture and a detailed module design using Apache Spark, Apache Kafka, and other components. The proposed system can reduce development costs by using open-source projects and by acquiring data from the legacy hospital information system. High performance and fault tolerance can also be achieved through distributed processing.

Interoperability between NoSQL and RDBMS via Auto-mapping Scheme in Distributed Parallel Processing Environment (분산병렬처리 환경에서 오토매핑 기법을 통한 NoSQL과 RDBMS와의 연동)

  • Kim, Hee Sung;Lee, Bong Hwan
    • Journal of the Korea Institute of Information and Communication Engineering / v.21 no.11 / pp.2067-2075 / 2017
  • Big data processing has lately become an emerging issue. As huge amounts of data are generated, data processing capability is becoming important. For processing big data, both the Hadoop Distributed File System and NoSQL data stores designed for unstructured data are attracting a lot of attention. However, problems and inconveniences in using NoSQL still exist. For low volumes of data, MapReduce in NoSQL typically consumes unnecessary processing time and requires considerably more data retrieval time than an RDBMS. To address this problem, this paper proposes an interworking scheme between NoSQL and the conventional RDBMS. The developed auto-mapping scheme chooses the appropriate database (NoSQL or RDBMS) depending on the amount of data, which results in faster search times. Experimental results on a specific data set show that the database interworking scheme reduces data search time by up to 35%.
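The core idea of the auto-mapping scheme, routing a query to the RDBMS or the NoSQL store according to data volume, can be sketched as a simple threshold rule. This is a hypothetical illustration, not the authors' scheme: the `QueryRouter` class, the use of an estimated row count as the volume measure, and the default threshold of 1,000,000 rows are all assumptions, since the abstract does not state the actual criterion.

```python
from dataclasses import dataclass
from enum import Enum


class Backend(Enum):
    RDBMS = "rdbms"
    NOSQL = "nosql"


@dataclass
class QueryRouter:
    # Hypothetical cut-over point; the paper's actual criterion is not given.
    row_threshold: int = 1_000_000

    def choose(self, estimated_rows: int) -> Backend:
        """Send small result sets to the RDBMS and large ones to the NoSQL/MapReduce side.

        Rationale from the abstract: for low data volumes, MapReduce job startup
        dominates, so a conventional RDBMS query is faster.
        """
        if estimated_rows < 0:
            raise ValueError("estimated_rows must be non-negative")
        return Backend.RDBMS if estimated_rows < self.row_threshold else Backend.NOSQL


if __name__ == "__main__":
    router = QueryRouter()
    print(router.choose(5_000))       # Backend.RDBMS
    print(router.choose(50_000_000))  # Backend.NOSQL
```

In practice the threshold would be calibrated by measuring search time on both back ends over increasing data sizes and picking the crossover point.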

Constructing a Support Vector Machine for Localization on a Low-End Cluster Sensor Network (로우엔드 클러스터 센서 네트워크에서 위치 측정을 위한 지지 벡터 머신)

  • Moon, Sangook
    • Journal of the Korea Institute of Information and Communication Engineering / v.18 no.12 / pp.2885-2890 / 2014
  • Localization of sensor network nodes using machine learning has been studied recently. Support vector machine algorithms are easy to implement in high-level languages that enable parallelism. The Raspberry Pi is a Linux system that can be used as a sensor node, and Pis can be used to build IP-based Hadoop clusters. In this paper, we implemented a support vector machine in Python and built a sensor network cluster of five Pis. We also set up the Hadoop software framework to use the MapReduce mechanism. In our experiment, we ran the test sensor network with a variety of parameters and evaluated it in terms of proficiency, resource usage, and processing time. The experiments showed that, given more processing power and memory, the Pi could be a suitable member node of the cluster, achieving precise classification for sensor localization using machine learning.

A Security Log Analysis System using Logstash based on Apache Elasticsearch (아파치 엘라스틱서치 기반 로그스태시를 이용한 보안로그 분석시스템)

  • Lee, Bong-Hwan;Yang, Dong-Min
    • Journal of the Korea Institute of Information and Communication Engineering / v.22 no.2 / pp.382-389 / 2018
  • Cyber attacks can cause serious damage to various information systems, and log data analysis can help address this problem. A security log analysis system makes it possible to cope with security risks properly by collecting, storing, and analyzing log data. In this paper, a security log analysis system is designed and implemented to analyze security log data using Logstash, which collects and processes various types of log data, together with Elasticsearch, a distributed search engine. Kibana, an open-source data visualization plugin for Elasticsearch, is used to generate log statistics and search reports and to visualize the results. The performance of the Elasticsearch-based security log analysis system is compared with an existing log analysis system that uses the Flume log collector, the Flume HDFS sink, and HBase. The experimental results show that the proposed system substantially reduces both database query processing time and log data analysis time compared with the existing Hadoop-based log analysis system.

A Study on Procurement Audit Integration Real Time Monitoring System Using Process Mining Under Big Data Environment (빅 데이터 환경하에서 프로세스 마이닝을 이용한 구매 감사 통합 실시간 모니터링 시스템에 대한 연구)

  • Yoo, Young-Seok;Park, Han-Gyu;Back, Seung-Hoon;Hong, Sung-Chan
    • Journal of Internet Computing and Services / v.18 no.3 / pp.71-83 / 2017
  • In recent years, research on applying process mining, by leveraging its greatest strengths, to the auditing work of business organizations has been actively pursued. However, there is insufficient research on using process mining to systematically and efficiently analyze the massive data generated in a big data environment, and on proactively monitoring risk management from the audit side, which is one of the important management activities of a corporate organization. In this study, we implement a Hadoop-based integrated real-time monitoring system for internal audit in order to detect abnormal symptoms and prevent accidents in advance. Through this integrated real-time monitoring system for purchasing audit, we aim to strengthen delivery management of ordered purchasing materials, reduce purchase costs, manage competing companies, prevent fraud, comply with regulations, and adhere to the internal control accounting system. As a result, by analyzing data efficiently using process mining on Hadoop-based systems, the enhanced integrated real-time purchase audit monitoring can provide immediately actionable information. From an integrated viewpoint, the business status can be managed by processing a large amount of work faster than continuous monitoring does, which improves the quality of purchase audits and drives innovation in the purchasing process.
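The abstract does not say which process mining algorithm is used, so the following is only an illustration of a common building block: deriving the directly-follows relation from an event log and flagging purchase cases whose activity order leaves a reference model. The activity names, the `allowed` transition set, and the function names are hypothetical, and the sketch runs in a single process rather than on Hadoop.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Set, Tuple

Event = Tuple[str, int, str]  # (case_id, timestamp, activity)


def traces(events: Iterable[Event]) -> Dict[str, List[str]]:
    """Group events by case and order each case's activities by timestamp."""
    by_case: Dict[str, List[Tuple[int, str]]] = defaultdict(list)
    for case_id, ts, activity in events:
        by_case[case_id].append((ts, activity))
    return {case: [a for _, a in sorted(evts)] for case, evts in by_case.items()}


def directly_follows(trace_map: Dict[str, List[str]]) -> Counter:
    """Count how often activity a is immediately followed by activity b."""
    dfg: Counter = Counter()
    for acts in trace_map.values():
        dfg.update(zip(acts, acts[1:]))
    return dfg


def flag_deviations(trace_map: Dict[str, List[str]],
                    allowed: Set[Tuple[str, str]]) -> List[str]:
    """Return ids of cases containing a transition outside the reference model."""
    return sorted(case for case, acts in trace_map.items()
                  if any(pair not in allowed for pair in zip(acts, acts[1:])))


if __name__ == "__main__":
    log = [("PO1", 1, "create"), ("PO1", 2, "approve"), ("PO1", 3, "receive"),
           ("PO1", 4, "pay"),
           ("PO2", 1, "create"), ("PO2", 2, "pay"), ("PO2", 3, "receive")]
    # Hypothetical reference model: goods must be received before payment.
    allowed = {("create", "approve"), ("approve", "receive"), ("receive", "pay")}
    t = traces(log)
    print(flag_deviations(t, allowed))  # ['PO2']
```

Here PO2 is flagged because payment precedes both approval and receipt of goods, the kind of abnormal symptom a purchase audit would want surfaced early.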

Spark-based Network Log Analysis System for Detecting Network Attack Pattern Using Snort (Snort를 이용한 비정형 네트워크 공격패턴 탐지를 수행하는 Spark 기반 네트워크 로그 분석 시스템)

  • Baek, Na-Eun;Shin, Jae-Hwan;Chang, Jin-Su;Chang, Jae-Woo
    • The Journal of the Korea Contents Association / v.18 no.4 / pp.48-59 / 2018
  • As network technology has developed, it has come into use in various fields. However, the number of attacks targeting public institutions and companies by exploiting this evolving technology has also increased. Meanwhile, existing network intrusion detection systems take a long time to process logs as the volume of network logs grows. Therefore, in this paper, we propose a Spark-based network log analysis system that detects unstructured network attack patterns by using Snort. The proposed system extracts and analyzes the elements required for network attack pattern detection from large amounts of network log data. For the analysis, we propose rules to detect network attack patterns for port scanning, host scanning, DDoS, and worm activity, and show that they detect real attack patterns well when applied to real log data. Finally, our performance evaluation shows that the proposed Spark-based log analysis system achieves more than twice the log data processing performance of the Hadoop-based system.
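Port-scanning and host-scanning rules of the kind described above are typically expressed as distinct-count thresholds per source address. The sketch below is a hypothetical single-process version, not the paper's rule set: the `(src_ip, dst_ip, dst_port)` input tuple, the thresholds of 100 ports and 50 hosts, and the assumption that the input already covers a single time window are all illustrative choices. The same grouping and distinct counting could be expressed as an aggregation over a Spark DataFrame.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

Flow = Tuple[str, str, int]  # (src_ip, dst_ip, dst_port), parsed from log lines


def detect_scans(flows: Iterable[Flow], port_threshold: int = 100,
                 host_threshold: int = 50) -> Dict[str, List]:
    """Flag sources that touch many ports on one host or many hosts.

    Assumes `flows` already covers one time window; the thresholds are
    hypothetical defaults, not values from the paper.
    """
    ports: Dict[Tuple[str, str], Set[int]] = defaultdict(set)
    hosts: Dict[str, Set[str]] = defaultdict(set)
    for src, dst, port in flows:
        ports[(src, dst)].add(port)  # distinct ports per (source, target) pair
        hosts[src].add(dst)          # distinct targets per source
    return {
        "port_scan": sorted(k for k, v in ports.items() if len(v) >= port_threshold),
        "host_scan": sorted(s for s, v in hosts.items() if len(v) >= host_threshold),
    }


if __name__ == "__main__":
    flows = [("10.0.0.9", "10.0.0.1", p) for p in range(1, 151)]          # many ports
    flows += [("10.0.0.7", f"10.0.1.{i}", 22) for i in range(60)]         # many hosts
    flows += [("10.0.0.5", "10.0.0.1", 80)]                               # normal
    print(detect_scans(flows))
```

Counting distinct values rather than raw connections keeps a busy but legitimate client, which repeatedly hits the same port on the same server, below the thresholds.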

Big data-based information recommendation system (빅데이터 기반 정보 추천 시스템)

  • Lee, Jong-Chan;Lee, Moon-Ho
    • Journal of the Korea Institute of Information and Communication Engineering / v.22 no.3 / pp.443-450 / 2018
  • Owing to improvements in quality of life, health care is a main concern of modern people, and demand for healthcare systems is naturally increasing. However, it is difficult to provide customized wellness information suitable for a specific user, because the Internet contains a wide variety of medical information whose reliability is hard to estimate. In this study, we propose a user-centered service that, rather than offering a simple search function, provides personalized medical information by classifying big data through text mining. For efficient big data analysis, we built a big data system and measured the data processing time while increasing the number of Hadoop slave nodes. The results confirm that the big data system is more efficient than the existing system.