• Title/Summary/Keyword: Hadoop Spark (하둡 스파크)

Operational Big Data Analytics platform for Smart Factory (스마트팩토리를 위한 운영빅데이터 분석 플랫폼)

  • Bae, Hyerim;Park, Sanghyuck;Choi, Yulim;Joo, Byeongjun;Sutrisnowati, Riska Asriana;Pulshashi, Iq Reviessay;Putra, Ahmad Dzulfikar Adi;Adi, Taufik Nur;Lee, Sanghwa;Won, Seokrae
    • The Journal of Bigdata / v.1 no.2 / pp.9-19 / 2016
  • Since ICT convergence became a major issue, the German government has pursued the 'Industry 4.0' policy, which triggered ICT convergence with manufacturing. This trend is now gaining momentum, and a great leap toward high quality at low cost can be expected. Recently, the Korean government has also adopted the 'Manufacturing 3.0' policy to upgrade the Korean manufacturing industry, accelerated by many related technologies. In this paper, we developed a custom operational big data analysis platform that implements operational intelligence to improve industrial capability. The platform is web-based and built on the Spring Framework. In addition, HDFS and Spark help the system analyze massive field data, with streamed data processed by process mining algorithms. Knowledge extracted from the data will support improvements in manufacturing performance.

Development of Procurement Announcement Analysis Support System (전자조달공고 분석지원 시스템 개발)

  • Lim, Il-kwon;Park, Dong-Jun;Cho, Han-Jin
    • Journal of the Korea Convergence Society / v.9 no.8 / pp.53-60 / 2018
  • Domestic public e-procurement has been recognized for its excellence at home and abroad. However, it is difficult for procurement companies to check related announcements and to grasp the status of procurement announcements at a glance. In this paper, we propose an e-Procurement Announcement Analysis Support System that uses HDFS, Apache Spark, and collaborative filtering to provide a procurement announcement recommendation service and a procurement announcement and contract trend analysis service for an effective e-procurement system. The recommendation service relieves procurement companies of searching for announcements that match their characteristics. The trend analysis service visualizes procurement announcement and contract information so that procurement companies and demand organizations can see e-procurement analysis information at a glance.
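
The abstract above names collaborative filtering but does not say which variant was used or how it was run on Spark. As a rough illustration only, the following sketch shows user-based collaborative filtering with cosine similarity in plain Python. The company and notice identifiers, the `interactions` layout, and the `recommend` helper are all hypothetical and are not taken from the paper.

```python
from math import sqrt


def cosine(a: dict, b: dict) -> float:
    """Cosine similarity between two sparse vectors stored as dicts."""
    shared = set(a) & set(b)
    if not shared:
        return 0.0
    dot = sum(a[k] * b[k] for k in shared)
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b)


def recommend(target: str, interactions: dict, top_n: int = 3) -> list:
    """Rank announcements the target company has not seen, weighted by
    how similar the other companies are to the target."""
    seen = interactions[target]
    scores = {}
    for other, items in interactions.items():
        if other == target:
            continue
        sim = cosine(seen, items)
        if sim <= 0:
            continue  # companies with nothing in common contribute nothing
        for item, weight in items.items():
            if item not in seen:
                scores[item] = scores.get(item, 0.0) + sim * weight
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [item for item, _ in ranked[:top_n]]


# Hypothetical data: 1.0 means the company viewed or bid on the announcement.
interactions = {
    "company_a": {"notice_1": 1.0, "notice_2": 1.0},
    "company_b": {"notice_1": 1.0, "notice_2": 1.0, "notice_3": 1.0},
    "company_c": {"notice_4": 1.0},
}
print(recommend("company_a", interactions))
```

A production system on Spark would more likely rely on a distributed, model-based approach; this in-memory version is meant only to make the idea concrete.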

Anomalous Pattern Analysis of Large-Scale Logs with Spark Cluster Environment

  • Sion Min;Youyang Kim;Byungchul Tak
    • Journal of the Korea Society of Computer and Information / v.29 no.3 / pp.127-136 / 2024
  • This study explores the correlation between system anomalies and large-scale logs within the Spark cluster environment. While research on anomaly detection using logs is growing, there remains a limitation in adequately leveraging logs from various components of the cluster and considering the relationship between anomalies and the system. Therefore, this paper analyzes the distribution of normal and abnormal logs and explores the potential for anomaly detection based on the occurrence of log templates. By employing Hadoop and Spark, normal and abnormal log data are generated, and through t-SNE and K-means clustering, templates of abnormal logs in anomalous situations are identified to comprehend anomalies. Ultimately, unique log templates occurring only during abnormal situations are identified, thereby presenting the potential for anomaly detection.
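
The last step described in the abstract above, finding log templates that occur only in abnormal runs, can be sketched as a set difference. This is a minimal illustration, not the authors' pipeline: it skips the t-SNE and K-means stages, and the regex-based `to_template` masking and the sample log lines are assumptions made for the example.

```python
import re


def to_template(line: str) -> str:
    """Reduce a raw log line to a coarse template by masking variable fields."""
    line = re.sub(r"0x[0-9a-fA-F]+", "<HEX>", line)         # hex identifiers
    line = re.sub(r"\b\d+(?:\.\d+)*\b", "<NUM>", line)      # integers, decimals, IPs
    return line.strip()


def abnormal_only_templates(normal_lines, abnormal_lines) -> list:
    """Return templates seen in abnormal logs but never in normal logs."""
    normal = {to_template(line) for line in normal_lines}
    abnormal = {to_template(line) for line in abnormal_lines}
    return sorted(abnormal - normal)


# Hypothetical log lines, invented for illustration.
normal_logs = [
    "Executor 12 registered",
    "Task 3 finished in 120 ms",
]
abnormal_logs = [
    "Executor 7 registered",
    "Lost executor 4: heartbeat timed out after 128000 ms",
]
print(abnormal_only_templates(normal_logs, abnormal_logs))
```

Templates that survive the difference are candidates for anomaly indicators; how reliable they are depends on how well the normal logs cover ordinary behavior.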

An Analysis of Factors Affecting Quality of Life through the Analysis of Public Health Big Data (클라우드 기반의 공개의료 빅데이터 분석을 통한 삶의 질에 영향을 미치는 요인분석)

  • Kim, Min-kyoung;Cho, Young-bok
    • Journal of the Korea Institute of Information and Communication Engineering / v.22 no.6 / pp.835-841 / 2018
  • In this study, we analyzed public health data using Hadoop-based Spark in a cloud environment, drawing on Community Health Survey data from 2012 to 2014, to identify the factors affecting quality of life. We constructed a cloud manager that supports parallel processing using Hadoop-based Spark for open medical big data analysis, and quickly analyzed the factors affecting individuals' "quality of life" in open medical big data without hardware restrictions. The effects of public health data on health-related quality of life were classified into personal characteristics and community characteristics and examined with multi-level regression analysis (ANOVA, t-test). In the experiment, the quality-of-life score was 73.8 points for men and 70.0 points for women, indicating that men had a higher health-related quality of life than women.

FAST Design for Large-Scale Satellite Image Processing (대용량 위성영상 처리를 위한 FAST 시스템 설계)

  • Lee, Youngrim;Park, Wanyong;Park, Hyunchun;Shin, Daesik
    • Journal of the Korea Institute of Military Science and Technology / v.25 no.4 / pp.372-380 / 2022
  • This study proposes a distributed parallel processing system, called the Fast Analysis System for remote sensing daTa (FAST), for large-scale satellite image processing and analysis. FAST is a system that designs jobs in vertices and sequences, and distributes and processes them simultaneously. FAST manages data based on the Hadoop Distributed File System, controls entire jobs based on Apache Spark, and performs tasks in parallel on multiple slave nodes based on a Docker container design. FAST enables the high-performance processing of progressively accumulated large-volume satellite images. Because each unit task runs in Docker, existing source code can be reused when designing and implementing unit tasks. Additionally, the system is robust against software/hardware faults. To prove the capability of the proposed system, we performed an experiment that converts original satellite images into ortho-images, which is a pre-processing step for all image analyses. In the experiment, when FAST was configured with eight slave nodes, processing a satellite image took less than 30 seconds. Through these results, we proved the suitability and practical applicability of the FAST design.
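
The idea of describing a job as vertices with an ordering and running independent vertices at the same time can be sketched with a dependency graph. The example below is an assumption-laden illustration using Python's standard `graphlib` and a thread pool; FAST itself is described as using Apache Spark and Docker containers, which are not modeled here, and the task names and the `run_job` helper are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from graphlib import TopologicalSorter


def run_job(dependencies: dict, tasks: dict, max_workers: int = 4) -> list:
    """Run tasks in dependency order; tasks that are ready together run in parallel.

    dependencies maps each task name to the set of tasks it must wait for.
    Returns the list of "waves", each wave being the tasks run concurrently.
    """
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()  # raises graphlib.CycleError if the graph has a cycle
    waves = []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        while sorter.is_active():
            ready = sorted(sorter.get_ready())
            # Execute every ready task concurrently and wait for all of them.
            list(pool.map(lambda name: tasks[name](), ready))
            sorter.done(*ready)
            waves.append(ready)
    return waves


# Hypothetical job: two independent unit tasks between a load and a merge step.
dependencies = {
    "task_a": {"load"},
    "task_b": {"load"},
    "merge": {"task_a", "task_b"},
}
tasks = {name: (lambda: None) for name in ("load", "task_a", "task_b", "merge")}
print(run_job(dependencies, tasks))
```

Running whole waves at a time keeps the sketch short; a real scheduler would typically start each task as soon as its own dependencies finish.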

Development of Big Data and AutoML Platforms for Smart Plants (스마트 플랜트를 위한 빅데이터 및 AutoML 플랫폼 개발)

  • Jin-Young Kang;Byeong-Seok Jeong
    • The Journal of Bigdata / v.8 no.2 / pp.83-95 / 2023
  • Big data analytics and AI play a critical role in the development of smart plants. This study presents a big data platform for plant data and an 'AutoML platform' for AI-based plant O&M (Operation and Maintenance). The big data platform collects, processes, and stores large volumes of data generated in plants using Hadoop, Spark, and Kafka. The AutoML platform is a machine learning automation system aimed at constructing predictive models for equipment prognostics and process optimization in plants. The developed platforms configure a data pipeline that considers compatibility with existing plant OISs (Operation Information Systems) and employ a web-based GUI to enhance both accessibility and convenience for users. They can also load user-customizable modules into the data processing and learning algorithms, which increases process flexibility. This paper demonstrates the operation of the platforms for a specific process of an oil company in Korea and presents an example of an effective data utilization platform for smart plants.

Design of Splunk Platform based Big Data Analysis System for Objectionable Information Detection (Splunk 플랫폼을 활용한 유해 정보 탐지를 위한 빅데이터 분석 시스템 설계)

  • Lee, Hyeop-Geon;Kim, Young-Woon;Kim, Ki-Young;Choi, Jong-Seok
    • The Journal of Korea Institute of Information, Electronics, and Communication Technology / v.11 no.1 / pp.76-81 / 2018
  • The Internet of Things (IoT), which is emerging as a future economic growth engine, has been actively introduced in areas close to our daily lives. However, IoT security threats remain to be resolved. In particular, with the spread of smart homes and smart cities, an explosive number of closed-circuit televisions (CCTVs) have been installed. The Internet protocol (IP) information and even port numbers assigned to CCTVs are open to the public via web portal search engines and social media platforms such as Facebook and Twitter, and this information can be easily exploited even with simple tools. For this reason, a big data analytics system is needed that supports quick responses to data that may contain security risk factors and to illegal websites that may cause social problems. Such a system assists in analyzing data collected from the search engines and social media platforms that Internet users frequently use, as well as data on illegal websites.

Design and Implementation of Real-time web Crawling distributed monitoring system (실시간 웹 크롤링 분산 모니터링 시스템 설계 및 구현)

  • Kim, Yeong-A;Kim, Gea-Hee;Kim, Hyun-Ju;Kim, Chang-Geun
    • Journal of Convergence for Information Technology / v.9 no.1 / pp.45-53 / 2019
  • In this rapidly changing information era, websites serve an excessive amount of information. Little of it is useful, and we spend a lot of time selecting the information we need. Many websites, including search engines, use web crawling to keep their data up to date. Web crawling is usually used to generate copies of all the pages of visited sites, which search engines then index for faster searching. For collecting wholesale and order information that changes in real time, keyword-oriented web data collection is not adequate, and no alternative for selective real-time collection of web information has been suggested. In this paper, we propose a method that collects information from restricted web sites using a web crawling distributed monitoring system (R-WCMS), estimates collection time through detailed analysis of the data, and stores the data in a parallel system. Experimental results show that applying the proposed model to web site information retrieval reduces the time by 15-17%.