• Title/Summary/Keyword: data pre-processing

Search Result 809, Processing Time 0.04 seconds

High Rate Denial-of-Service Attack Detection System for Cloud Environment Using Flume and Spark

  • Gutierrez, Janitza Punto;Lee, Kilhung
    • Journal of Information Processing Systems
    • /
    • v.17 no.4
    • /
    • pp.675-689
    • /
    • 2021
  • Nowadays, cloud computing is being adopted for more organizations. However, since cloud computing has a virtualized, volatile, scalable and multi-tenancy distributed nature, it is challenging task to perform attack detection in the cloud following conventional processes. This work proposes a solution which aims to collect web server logs by using Flume and filter them through Spark Streaming in order to only consider suspicious data or data related to denial-of-service attacks and reduce the data that will be stored in Hadoop Distributed File System for posterior analysis with the frequent pattern (FP)-Growth algorithm. With the proposed system, we can address some of the difficulties in security for cloud environment, facilitating the data collection, reducing detection time and consequently enabling an almost real-time attack detection.

Forest Fire Detection System using Drone Streaming Images (드론 스트리밍 영상 이미지 분석을 통한 실시간 산불 탐지 시스템)

  • Yoosin Kim
    • Journal of Advanced Navigation Technology
    • /
    • v.27 no.5
    • /
    • pp.685-689
    • /
    • 2023
  • The proposed system in the study aims to detect forest fires in real-time stream data received from the drone-camera. Recently, the number of wildfires has been increasing, and also the large scaled wildfires are frequent more and more. In order to prevent forest fire damage, many experiments using the drone camera and vision analysis are actively conducted, however there were many challenges, such as network speed, pre-processing, and model performance, to detect forest fires from real-time streaming data of the flying drone. Therefore, this study applied image data processing works to capture five good image frames for vision analysis from whole streaming data and then developed the object detection model based on YOLO_v2. As the result, the classification model performance of forest fire images reached upto 93% of accuracy, and the field test for the model verification detected the forest fire with about 70% accuracy.

Souce Code Identification Using Deep Neural Network (심층신경망을 이용한 소스 코드 원작자 식별)

  • Rhim, Jisu;Abuhmed, Tamer
    • KIPS Transactions on Software and Data Engineering
    • /
    • v.8 no.9
    • /
    • pp.373-378
    • /
    • 2019
  • Since many programming sources are open online, problems with reckless plagiarism and copyrights are occurring. Among them, source codes produced by repeated authors may have unique fingerprints due to their programming characteristics. This paper identifies each author by learning from a Google Code Jam program source using deep neural network. In this case, the original creator's source is to be vectored using a pre-processing instrument such as predictive-based vector or frequency-based approach, TF-IDF, etc. and to identify the original program source by learning by using a deep neural network. In addition a language-independent learning system was constructed using a pre-processing machine and compared with other existing learning methods. Among them, models using TF-IDF and in-depth neural networks were found to perform better than those using other pre-processing or other learning methods.

A System for Thermal Distortion Analysis of Hull Structures by Solar Radiation (선체의 태양복사 열변형 해석을 위한 전처리시스템)

  • Ha, Yunsok;Lee, Donghoon
    • Journal of the Society of Naval Architects of Korea
    • /
    • v.53 no.4
    • /
    • pp.275-281
    • /
    • 2016
  • One of the most important things for quality to meet ship-production schedule is an accuracy control. A ship is assembled by welding through whole production process, so it is important that loss by correction will not happen as much as possible by using some engineering skills like reverse design, reverse setting and margin for thermal shrinkage. These efforts are a quite effective in fabrication stages, but not in erection stages. If a ship block which consists of common steel is exposed to directional solar radiation, its dimensional accuracy will change high as time by its thermal expansion coefficient. Therefore, the measuring work would be often done at dawn or evening even with having a very accurate device. In this study, an FE analysis method is developed to solve this problem. It can change measured data affected by solar thermal distortion to ones not, even though ship-block is measured at an arbitrary time. It will use the time when measuring, the direction of block and the weather record by satellites. It is confirmed by a comparison between measured data of a ship-block and the result by suggested analysis method. Furthermore, a pre-processing system is also developed for fast application of the suggested analysis method.

A Design and Implementation of Missing Person Identification System using face Recognition

  • Shin, Jong-Hwan;Park, Chan-Mi;Lee, Heon-Ju;Lee, Seoung-Hyeon;Lee, Jae-Kwang
    • Journal of the Korea Society of Computer and Information
    • /
    • v.26 no.2
    • /
    • pp.19-25
    • /
    • 2021
  • In this paper proposes a method of finding missing persons based on face-recognition technology and deep learning. In this paper, a real-time face-recognition technology was developed, which performs face verification and improves the accuracy of face identification through data fortification for face recognition and convolutional neural network(CNN)-based image learning after the pre-processing of images transmitted from a mobile device. In identifying a missing person's image using the system implemented in this paper, the model that learned both original and blur-processed data performed the best. Further, a model using the pre-learned Noisy Student outperformed the one not using the same, but it has had a limitation of producing high levels of deflection and dispersion.

Effective Pre-rating Method Based on Users' Dichotomous Preferences and Average Ratings Fusion for Recommender Systems

  • Cheng, Shulin;Wang, Wanyan;Yang, Shan;Cheng, Xiufang
    • Journal of Information Processing Systems
    • /
    • v.17 no.3
    • /
    • pp.462-472
    • /
    • 2021
  • With an increase in the scale of recommender systems, users' rating data tend to be extremely sparse. Some methods have been utilized to alleviate this problem; nevertheless, it has not been satisfactorily solved yet. Therefore, we propose an effective pre-rating method based on users' dichotomous preferences and average ratings fusion. First, based on a user-item ratings matrix, a new user-item preference matrix was constructed to analyze and model user preferences. The items were then divided into two categories based on a parameterized dynamic threshold. The missing ratings for items that the user was not interested in were directly filled with the lowest user rating; otherwise, fusion ratings were utilized to fill the missing ratings. Further, an optimized parameter λ was introduced to adjust their weights. Finally, we verified our method on a standard dataset. The experimental results show that our method can effectively reduce the prediction error and improve the recommendation quality. As for its application, our method is effective, but not complicated.

Feature selection-based Risk Prediction for Hypertension in Korean men (한국 남성의 고혈압에 대한 특징 선택 기반 위험 예측)

  • Dashdondov, Khongorzul;Kim, Mi-Hye
    • Proceedings of the Korea Information Processing Society Conference
    • /
    • 2021.05a
    • /
    • pp.323-325
    • /
    • 2021
  • In this article, we have improved the prediction of hypertension detection using the feature selection method for the Korean national health data named by the KNHANES database. The study identified a variety of risk factors associated with chronic hypertension. The paper is divided into two modules. The first of these is a data pre-processing step that uses a factor analysis (FA) based feature selection method from the dataset. The next module applies a predictive analysis step to detect and predict hypertension risk prediction. In this study, we compare the mean standard error (MSE), F1-score, and area under the ROC curve (AUC) for each classification model. The test results show that the proposed FIFA-OE-NB algorithm has an MSE, F1-score, and AUC outcomes 0.259, 0.460, and 64.70%, respectively. These results demonstrate that the proposed FIFA-OE method outperforms other models for hypertension risk predictions.

Investigation of light stimulated mouse brain activation in high magnetic field fMRI using image segmentation methods

  • Kim, Wook;Woo, Sang-Keun;Kang, Joo Hyun;Lim, Sang Moo
    • Journal of the Korea Society of Computer and Information
    • /
    • v.21 no.12
    • /
    • pp.11-18
    • /
    • 2016
  • Magnetic resonance image (MRI) is widely used in brain research field and medical image. Especially, non-invasive brain activation acquired image technique, which is functional magnetic resonance image (fMRI) is used in brain study. In this study, we investigate brain activation occurred by LED light stimulation. For investigate of brain activation in experimental small animal, we used high magnetic field 9.4T MRI. Experimental small animal is Balb/c mouse, method of fMRI is using echo planar image (EPI). EPI method spend more less time than any other MRI method. For this reason, however, EPI data has low contrast. Due to the low contrast, image pre-processing is very hard and inaccuracy. In this study, we planned the study protocol, which is called block design in fMRI research field. The block designed has 8 LED light stimulation session and 8 rest session. All block is consist of 6 EPI images and acquired 1 slice of EPI image is 16 second. During the light session, we occurred LED light stimulation for 1 minutes 36 seconds. During the rest session, we do not occurred light stimulation and remain the light off state for 1 minutes 36 seconds. This session repeat the all over the EPI scan time, so the total spend time of EPI scan has almost 26 minutes. After acquired EPI data, we performed the analysis of this image data. In this study, we analysis of EPI data using statistical parametric map (SPM) software and performed image pre-processing such as realignment, co-registration, normalization, smoothing of EPI data. The pre-processing of fMRI data have to segmented using this software. However this method has 3 different method which is Gaussian nonparametric, warped modulate, and tissue probability map. In this study we performed the this 3 different method and compared how they can change the result of fMRI analysis results. The result of this study show that LED light stimulation was activate superior colliculus region in mouse brain. And the most higher activated value of segmentation method was using tissue probability map. this study may help to improve brain activation study using EPI and SPM analysis.

A Design of RSIDS using Rough Set Theory and Support Vector Machine Algorithm (Rough Set Theory와 Support Vector Machine 알고리즘을 이용한 RSIDS 설계)

  • Lee, Byung-Kwan;Jeong, Eun-Hee
    • Journal of the Korea Society of Computer and Information
    • /
    • v.17 no.12
    • /
    • pp.179-185
    • /
    • 2012
  • This paper proposes a design of RSIDS(RST and SVM based Intrusion Detection System) using RST(Rough Set Theory) and SVM(Support Vector Machine) algorithm. The RSIDS consists of PrePro(PreProcessing) module, RRG(RST based Rule Generation) module, and SAD(SVM based Attack Detection) module. The PrePro module changes the collected information to the data format of RSIDS. The RRG module analyzes attack data, generates the rules of attacks, extracts attack information from the massive data by using these rules, and transfers the extracted attack information to the SAD module. The SAD module detects the attacks by using it, which the SAD module notifies to a manager. Therefore, compared to the existing SVM, the RSIDS improved average ADR(Attack Detection Ratio) from 77.71% to 85.28%, and reduced average FPR(False Positive ratio) from 13.25% to 9.87%. Thus, the RSIDS is estimated to have been improved, compared to the existing SVM.

New Medical Image Fusion Approach with Coding Based on SCD in Wireless Sensor Network

  • Zhang, De-gan;Wang, Xiang;Song, Xiao-dong
    • Journal of Electrical Engineering and Technology
    • /
    • v.10 no.6
    • /
    • pp.2384-2392
    • /
    • 2015
  • The technical development and practical applications of big-data for health is one hot topic under the banner of big-data. Big-data medical image fusion is one of key problems. A new fusion approach with coding based on Spherical Coordinate Domain (SCD) in Wireless Sensor Network (WSN) for big-data medical image is proposed in this paper. In this approach, the three high-frequency coefficients in wavelet domain of medical image are pre-processed. This pre-processing strategy can reduce the redundant ratio of big-data medical image. Firstly, the high-frequency coefficients are transformed to the spherical coordinate domain to reduce the correlation in the same scale. Then, a multi-scale model product (MSMP) is used to control the shrinkage function so as to make the small wavelet coefficients and some noise removed. The high-frequency parts in spherical coordinate domain are coded by improved SPIHT algorithm. Finally, based on the multi-scale edge of medical image, it can be fused and reconstructed. Experimental results indicate the novel approach is effective and very useful for transmission of big-data medical image(especially, in the wireless environment).