• Title/Summary/Keyword: Feature Importance Analysis

DDoS traffic analysis using decision tree according by feature of traffic flow (트래픽 속성 개수를 고려한 의사 결정 트리 DDoS 기반 분석)

  • Jin, Min-Woo;Youm, Sung-Kwan
    • Journal of the Korea Institute of Information and Communication Engineering / v.25 no.1 / pp.69-74 / 2021
  • Internet access is rising as online activity increases under the influence of COVID-19. At the same time, malicious users are diversifying network attacks, and DDoS attacks in particular are increasing year by year. Intrusion detection systems can detect these attacks and stop them at an early stage. Various data sets are used to verify intrusion detection algorithms; this paper uses CICIDS2017, a recent traffic data set. DDoS attack traffic was analyzed with a decision tree. The analysis identified a decisive feature, and the detection accuracy obtained with that feature was confirmed by running the decision tree on it. The false positive and false negative traffic was also analyzed. Training on the single decisive feature and on two features yielded accuracies of 98% and 99.8%, respectively.
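
The single-feature idea in this abstract can be sketched in Python as a depth-one decision tree: search one feature for the threshold that minimizes weighted Gini impurity, then measure the accuracy of that split. This is only an illustration under stated assumptions. The toy values and the helper names `best_split` and `stump_accuracy` are hypothetical and are not taken from the paper or from CICIDS2017.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_split(values, labels):
    """Return (threshold, weighted_gini) of the best split on one feature."""
    pairs = sorted(zip(values, labels))
    best = (None, float("inf"))
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold fits between equal values
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [y for x, y in pairs if x <= threshold]
        right = [y for x, y in pairs if x > threshold]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(pairs)
        if score < best[1]:
            best = (threshold, score)
    return best


def stump_accuracy(values, labels, threshold):
    """Accuracy when each side of the threshold predicts its majority label."""
    left = [y for x, y in zip(values, labels) if x <= threshold]
    right = [y for x, y in zip(values, labels) if x > threshold]
    correct = sum(max(Counter(side).values()) for side in (left, right) if side)
    return correct / len(labels)


# Hypothetical toy flows: one made-up feature value per flow, 1 = attack, 0 = benign.
feature = [2.0, 3.0, 2.5, 9.0, 8.5, 10.0, 3.5, 9.5]
label = [0, 0, 0, 1, 1, 1, 0, 1]

threshold, impurity = best_split(feature, label)
accuracy = stump_accuracy(feature, label, threshold)
print(threshold, impurity, accuracy)
```

On real traffic a single threshold rarely separates the classes perfectly, which is why the abstract reports a gain from adding a second feature.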

Fault Detection of a Proposed Three-Level Inverter Based on a Weighted Kernel Principal Component Analysis

  • Lin, Mao;Li, Ying-Hui;Qu, Liang;Wu, Chen;Yuan, Guo-Qiang
    • Journal of Power Electronics / v.16 no.1 / pp.182-189 / 2016
  • This study focuses on fault detection to ensure the high reliability of a proposed three-level inverter. Kernel principal component analysis (KPCA) has been widely used for feature extraction because of its simplicity. However, highlighting useful information that may be hidden in the retained kernel principal components (KPCs) remains a problem. A weighted KPCA is proposed to overcome this shortcoming. Variable contribution plots are constructed to evaluate the importance of each KPC on the basis of sensitivity analysis theory. Different weighting values are then assigned to the KPCs to highlight the useful information. The weighted statistics are evaluated comprehensively by using the improved feature eigenvectors. The effectiveness of the proposed method is validated: the diagnosis results for the inverter indicate that it is superior to conventional KPCA.

Investigating Non-Laboratory Variables to Predict Diabetic and Prediabetic Patients from Electronic Medical Records Using Machine Learning

  • Mukhtar, Hamid;Al Azwari, Sana
    • International Journal of Computer Science & Network Security / v.21 no.9 / pp.19-30 / 2021
  • Diabetes mellitus (DM) is one of the common chronic diseases that lead to severe health complications and may cause death. The disease affects individuals, the community, and the government because of the continuous monitoring, lifelong commitment, and cost of treatment it entails. The World Health Organization (WHO) ranks Saudi Arabia among the top 10 countries worldwide in diabetes prevalence. Since most medical services are provided by the government, the cost of treatment in terms of hospital and clinic visits and lab tests is a real burden given the scale of the disease. The ability to predict a patient's diabetic status without laboratory tests, by screening on some personal features, can lessen the health and economic burden caused by diabetes. The goal of this paper is to investigate the prediction of diabetic and prediabetic patients by considering factors other than the laboratory tests generally required by physicians. Using data obtained from local hospitals, medical records were processed into a dataset that classifies patients into three classes: diabetic, prediabetic, and non-diabetic. After applying three machine learning algorithms, we obtained good accuracy, precision, and recall on the dataset. Further analysis was performed to identify important non-laboratory patient variables for diabetes classification. The importance of five variables from a person's basic health data (gender, physical activity level, hypertension, BMI, and age) was investigated to find their contribution to a patient being diabetic, prediabetic, or normal. Our analysis showed strong agreement with the risk factors for diabetes and prediabetes stated by the American Diabetes Association (ADA) and other health institutions worldwide. We conclude that class-specific analysis of the disease can identify important factors specific to the Saudi population, and that managing these factors can help control the disease. We also provide some recommendations drawn from this research.

An improved cross-correlation method based on wavelet transform and energy feature extraction for pipeline leak detection

  • Li, Suzhen;Wang, Xinxin;Zhao, Ming
    • Smart Structures and Systems / v.16 no.1 / pp.213-222 / 2015
  • Early detection and precise location of leakage are of great importance for the life-cycle maintenance and management of municipal pipeline systems. In the past few years, acoustic emission (AE) techniques have proven to be an excellent tool for on-line leakage detection. Because AE signals propagating along a pipeline exhibit multi-mode and frequency dispersion characteristics, the direct cross-correlation technique, which assumes a constant AE propagation velocity, does not perform well in practice for acoustic leak location. This paper presents an improved cross-correlation method based on the wavelet transform, with due consideration of the frequency dispersion characteristics of the AE wave and the contributions of different modes. Laboratory experiments were conducted to simulate pipeline gas leakage and investigate the frequency spectrum signatures of AE leak signals. Comparison with other leak location methods verifies the feasibility and superiority of the proposed method.
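
For orientation, the direct cross-correlation baseline that this abstract criticizes can be sketched as follows. This is not the wavelet-based method the paper proposes. It assumes two sensors a known distance apart with the leak between them, a single constant propagation velocity, and synthetic pulses; all names and numbers are hypothetical. If the signal reaches sensor 1 a time tau later than sensor 2, the leak lies at (spacing + velocity * tau) / 2 from sensor 1.

```python
def cross_correlation_lag(s1, s2, max_lag):
    """Lag k (in samples) that maximizes sum_n s1[n + k] * s2[n].

    A positive k means s1 is a delayed copy of s2.
    """
    best_lag, best_value = 0, float("-inf")
    for k in range(-max_lag, max_lag + 1):
        value = sum(
            s1[n + k] * s2[n]
            for n in range(len(s2))
            if 0 <= n + k < len(s1)
        )
        if value > best_value:
            best_lag, best_value = k, value
    return best_lag


def leak_distance_from_sensor1(lag, sample_rate, velocity, sensor_spacing):
    """Leak position under the constant-velocity assumption.

    tau = t1 - t2 = (d1 - d2) / velocity and d1 + d2 = sensor_spacing,
    so d1 = (sensor_spacing + velocity * tau) / 2.
    """
    tau = lag / sample_rate
    return (sensor_spacing + velocity * tau) / 2


def delayed_pulse(pulse, delay, length):
    """Place a short pulse into a zero signal, starting at index `delay`."""
    signal = [0.0] * length
    for i, value in enumerate(pulse):
        signal[delay + i] = value
    return signal


# Synthetic example: the same pulse arrives 8 samples later at sensor 1.
pulse = [0.0, 1.0, 3.0, 1.0, 0.0]
sensor1 = delayed_pulse(pulse, 12, 40)
sensor2 = delayed_pulse(pulse, 4, 40)

lag = cross_correlation_lag(sensor1, sensor2, max_lag=20)
distance = leak_distance_from_sensor1(
    lag, sample_rate=1000.0, velocity=500.0, sensor_spacing=10.0
)
print(lag, distance)
```

With dispersive, multi-mode signals the single `velocity` value is exactly the assumption that breaks down, which is the motivation the abstract gives for the improved method.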

A Weighted Fuzzy Min-Max Neural Network for Pattern Classification (패턴 분류 문제에서 가중치를 고려한 퍼지 최대-최소 신경망)

  • Kim Ho-Joon;Park Hyun-Jung
    • Journal of KIISE:Software and Applications / v.33 no.8 / pp.692-702 / 2006
  • In this study, a weighted fuzzy min-max (WFMM) neural network model for pattern classification is proposed. The model modifies the structure of the FMM neural network by adding a weight concept that represents the frequency factor of feature values in a learning data set. First, we present a new activation function for the network, defined as a hyperbox membership function. We then introduce a new learning algorithm for the model that consists of three processes: hyperbox creation/expansion, hyperbox overlap test, and hyperbox contraction. A weight adaptation rule that considers the frequency factors is defined for the learning process. Finally, we describe a feature analysis technique using the proposed model. Four kinds of relevance factors among feature values, feature types, hyperboxes, and pattern classes are proposed to analyze the relative importance of each feature in a given problem. Two practical data sets, Fisher's Iris data and the Cleveland medical data, were used for the experiments. The effectiveness of the proposed method is discussed on the basis of the experimental results.
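
As background for the hyperbox membership function mentioned in this abstract, here is a sketch of the membership function from the classic (unweighted) fuzzy min-max network. It is not the weighted variant the paper proposes, because the abstract does not give the form of its frequency weights. Inputs are assumed to be scaled to [0, 1], and the sensitivity parameter `gamma` and the sample box are hypothetical choices.

```python
def hyperbox_membership(point, min_corner, max_corner, gamma=4.0):
    """Classic fuzzy min-max membership of `point` in one hyperbox.

    Returns 1.0 when the point lies inside the box and decays toward 0
    as the point moves away from it; `gamma` controls how fast.
    """
    n = len(point)
    total = 0.0
    for a, v, w in zip(point, min_corner, max_corner):
        # Penalty for exceeding the max corner w in this dimension.
        above = max(0.0, 1.0 - max(0.0, gamma * min(1.0, a - w)))
        # Penalty for falling below the min corner v in this dimension.
        below = max(0.0, 1.0 - max(0.0, gamma * min(1.0, v - a)))
        total += above + below
    return total / (2 * n)


# Hypothetical 2-D hyperbox with corners (0.2, 0.3) and (0.5, 0.6).
box_min, box_max = [0.2, 0.3], [0.5, 0.6]

inside = hyperbox_membership([0.3, 0.4], box_min, box_max)
outside = hyperbox_membership([0.6, 0.4], box_min, box_max)
print(inside, outside)
```

A point is assigned to the class of the hyperbox with the highest membership; a weighted variant would scale each dimension's contribution, for example by how often feature values fell in that range during training.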

A study on fault diagnosis of marine engine using a neural network with dimension-reduced vibration signals (차원 축소 진동 신호를 이용한 신경망 기반 선박 엔진 고장진단에 관한 연구)

  • Sim, Kichan;Lee, Kangsu;Byun, Sung-Hoon
    • The Journal of the Acoustical Society of Korea / v.41 no.5 / pp.492-499 / 2022
  • This study experimentally investigates the effect of reducing the dimensionality of a vibration signal on fault diagnosis of a marine engine. Using principal component analysis, a vibration signal of dimension 513 is converted into a low-dimensional signal of dimension 1 to 15, and the variation in fault diagnosis accuracy with the number of dimensions is observed. The vibration signal is measured from a full-scale marine generator diesel engine, and the contribution of the dimension-reduced signal is quantitatively evaluated using two variable importance analysis algorithms: integrated gradients and feature permutation. The experimental data show that fault diagnosis accuracy improves as the number of dimensions increases, and near-perfect fault classification accuracy is achieved as the dimension approaches 10. This shows that the dimension of the vibration signal can be reduced considerably without degrading fault diagnosis accuracy. In the variable importance analysis, the dimension-reduced principal components show a higher contribution than conventional statistical features, which supports the effectiveness of dimension-reduced signals for fault diagnosis.
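
Of the two importance methods named in this abstract, feature permutation is simple enough to sketch in plain Python: shuffle one input column at a time and record how much the model's accuracy drops. Integrated gradients is not shown. The stand-in classifier `toy_model` and the data are hypothetical; the paper's neural network and vibration data are not reproduced here.

```python
import random


def accuracy(model, rows, labels):
    """Fraction of rows for which the model predicts the given label."""
    return sum(model(row) == y for row, y in zip(rows, labels)) / len(labels)


def permutation_importance(model, rows, labels, n_repeats=20, seed=0):
    """Mean accuracy drop per feature when that feature's column is shuffled."""
    rng = random.Random(seed)
    baseline = accuracy(model, rows, labels)
    importances = []
    for j in range(len(rows[0])):
        drops = []
        for _ in range(n_repeats):
            column = [row[j] for row in rows]
            rng.shuffle(column)
            shuffled = [row[:j] + [c] + row[j + 1:] for row, c in zip(rows, column)]
            drops.append(baseline - accuracy(model, shuffled, labels))
        importances.append(sum(drops) / n_repeats)
    return importances


def toy_model(row):
    """Hypothetical stand-in classifier that looks only at feature 0."""
    return 1 if row[0] > 0.5 else 0


rows = [
    [0.1, 0.9], [0.2, 0.1], [0.3, 0.8], [0.4, 0.2],
    [0.6, 0.7], [0.7, 0.3], [0.8, 0.6], [0.9, 0.4],
]
labels = [toy_model(row) for row in rows]

importances = permutation_importance(toy_model, rows, labels)
print(importances)
```

Because `toy_model` ignores feature 1, shuffling that column never changes a prediction and its importance is exactly zero, while shuffling feature 0 lowers accuracy.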

A SEM-ANN Two-step Approach for Predicting Determinants of Cloud Service Use Intention (SEM-Artificial Neural Network 2단계 접근법에 의한 클라우드 스토리지 서비스 이용의도 영향요인에 관한 연구)

  • Guangbo Jiang;Sundong Kwon
    • Journal of Information Technology Applications and Management / v.30 no.6 / pp.91-111 / 2023
  • This study aims to identify the factors influencing the intention to use cloud services using the SEM-ANN two-step approach. In previous SEM-ANN studies, SEM reported R² and ANN reported MSE (mean squared error), so their analysis performance could not be compared. In this study, both R² and MSE were calculated and reported for SEM and for ANN. Analysis performance was then compared, and feature importances were compared by sensitivity analysis. As a result, the ANN default model improved R² by 2.87 compared to the PLS model, showing a small Cohen's effect size. The ANN optimization model improved R² by 7.86 compared to the PLS model, showing a medium Cohen's effect size. In the normalized feature importances, the order of importance was the same for PLS and ANN. The contribution of this study, which links structural equation modeling to artificial intelligence, is that it verified the improvement in the explanatory power of the research model while maintaining the order of importance of the independent variables.
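
The comparison described in this abstract depends on computing the same two metrics for both models. A minimal sketch of the standard definitions, using made-up observations and predictions (the names `model_a` and `model_b` are placeholders, not the paper's PLS or ANN outputs):

```python
def mse(y_true, y_pred):
    """Mean squared error."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Made-up observations and the predictions of two placeholder models.
observed = [1.0, 2.0, 3.0, 4.0]
model_a = [1.0, 2.0, 3.0, 5.0]
model_b = [1.0, 2.0, 3.0, 4.5]

for name, predicted in (("model_a", model_a), ("model_b", model_b)):
    print(name, mse(observed, predicted), r_squared(observed, predicted))
```

Since SS_tot depends only on the observed values, a lower MSE on the same data always corresponds to a higher R², so reporting both for each model puts them on a common scale.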

DDoS attack analysis based on decision tree considering importance (중요도를 고려한 의사 결정 트리 기반 DDoS 공격 분석)

  • Youm, Sungkwan;Park, Sangyoon;Shin, Kwang-Seong
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference / 2021.05a / pp.652-654 / 2021
  • Attacks such as DDoS can be detected by an intrusion detection system and prevented early. DDoS attack traffic was analyzed using a decision tree. Decisive features with high importance were identified, and accuracy was verified by running the decision tree on only those features. The false positive and false negative traffic was also analyzed. As a result, accuracy was 98% with one feature and 99.8% with two features.
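
The false positive and false negative analysis mentioned in this abstract starts from a confusion count. A small sketch with made-up labels (1 = attack, 0 = benign) that also keeps the indices of the misclassified flows so they can be inspected; the function name and data are hypothetical:

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count TP/FP/TN/FN and collect indices of the misclassified samples."""
    counts = {"tp": 0, "fp": 0, "tn": 0, "fn": 0}
    false_positives, false_negatives = [], []
    for i, (truth, guess) in enumerate(zip(y_true, y_pred)):
        if guess == positive and truth == positive:
            counts["tp"] += 1
        elif guess == positive:
            counts["fp"] += 1
            false_positives.append(i)  # benign flow flagged as attack
        elif truth == positive:
            counts["fn"] += 1
            false_negatives.append(i)  # attack flow that was missed
        else:
            counts["tn"] += 1
    return counts, false_positives, false_negatives


# Made-up ground truth and predictions for eight flows.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 1, 0, 0, 0, 1, 0, 1]

counts, fp_idx, fn_idx = confusion_counts(y_true, y_pred)
accuracy = (counts["tp"] + counts["tn"]) / len(y_true)
print(counts, fp_idx, fn_idx, accuracy)
```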

A Semantic Analysis of One Prodiscourse Maker in Korean:kulay (담화대용표지{그래}의 의미 연구)

  • 신현숙
    • Korean Journal of Cognitive Science / v.2 no.1 / pp.143-165 / 1990
  • I will discuss some aspects of the meaning of the prodiscourse marker 'kulay' (그래) in Korean. Few scholars have studied this marker, since Korean linguists have had little interest in this category of linguistic form and have not recognized the importance of discourse and discourse markers. As a result, we have only shallow information about prodiscourse phenomena and prodiscourse markers. Morphologically, 'kulay' (그래) could be analyzed into 'ku' (그) and 'lay' (래), and 'lay' (래) could be divided again into 'l' (ㄹ) and 'ay' (ㅐ), but I will discuss 'kulay' as one linguistic unit without division. This paper claims that the features [prodiscourse] and [discourse continuity] together can satisfactorily account for the core meaning of 'kulay', and notes that the marker takes on many specific meanings depending on the particular discourse. I also examine the semantic feature ([prodiscourse + discourse continuity]) in many kinds of Korean discourse, and I show that factors related to the marker's specific meaning include the meaning of the preceding and following discourse and the participants' psychological attitude. The conclusion is that the meaning of 'kulay' can help us understand certain phenomena of prodiscourse and prodiscourse markers in the Korean language, and that its various meanings can provide more information for applied Korean linguistics.

Predicting Determinants of Seoul-Bike Data Using Optimized Gradient-Boost (최적화된 Gradient-Boost를 사용한 서울 자전거 데이터의 결정 요인 예측)

  • Kim, Chayoung;Kim, Yoon
    • The Journal of the Convergence on Culture Technology / v.8 no.6 / pp.861-866 / 2022
  • Seoul introduced the shared bicycle system "Seoul Public Bike" in 2015 to help reduce traffic volume and air pollution. Several studies are being conducted to solve problems arising from the supply and demand of the system. Most of this research concerns strategic "Bicycle Rearrangement" to address the imbalance between supply and demand, and most of it predicts demand by grouping features such as weather or season. Earlier studies predicted demand by time-series analysis; more recently, studies that predict demand using deep learning or machine learning have been emerging. In this paper, we show that demand prediction can be improved somewhat by discovering new features or by ordering the importance of various features based on well-known feature patterns. By selecting new features or ordering features by importance, a better coefficient of determination can be obtained even when well-known deep learning, machine learning, or time-series analysis methods are used as they are. The approach can therefore improve demand prediction.
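
To make the idea of ordering features by importance under gradient boosting concrete, here is a small sketch for squared-error regression: each round fits a one-split regression stump to the current residuals, and a feature's importance is its share of the total error reduction. This is a generic illustration, not the optimized model from the paper. The two columns (a temperature-like value and an uninformative one), the targets, and the hyperparameters are all made up.

```python
def fit_stump(rows, residuals):
    """Best least-squares stump: (gain, feature, threshold, left_mean, right_mean)."""
    mean = sum(residuals) / len(residuals)
    base_err = sum((r - mean) ** 2 for r in residuals)
    best = (0.0, None, None, mean, mean)
    for j in range(len(rows[0])):
        values = sorted(set(row[j] for row in rows))
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [r for row, r in zip(rows, residuals) if row[j] <= t]
            right = [r for row, r in zip(rows, residuals) if row[j] > t]
            lm, rm = sum(left) / len(left), sum(right) / len(right)
            err = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
            if base_err - err > best[0]:
                best = (base_err - err, j, t, lm, rm)
    return best


def boost(rows, targets, n_rounds=30, learning_rate=0.3):
    """Gradient boosting with stumps; returns base value, stumps, importances, fit."""
    base = sum(targets) / len(targets)
    preds = [base] * len(targets)
    stumps, importance = [], [0.0] * len(rows[0])
    for _ in range(n_rounds):
        # For squared error, the negative gradient is the residual.
        residuals = [y - p for y, p in zip(targets, preds)]
        gain, j, t, lm, rm = fit_stump(rows, residuals)
        if j is None:
            break  # no split reduces the error any further
        stumps.append((j, t, lm, rm))
        importance[j] += gain
        preds = [
            p + learning_rate * (lm if row[j] <= t else rm)
            for row, p in zip(rows, preds)
        ]
    total = sum(importance)
    if total > 0:
        importance = [g / total for g in importance]
    return base, stumps, importance, preds


def mse(targets, preds):
    return sum((y - p) ** 2 for y, p in zip(targets, preds)) / len(targets)


# Made-up data: column 0 drives the target, column 1 is uninformative.
rows = [[5, 3], [10, 1], [15, 4], [20, 2], [25, 5], [30, 0]]
targets = [50, 100, 150, 200, 250, 300]

base, stumps, importance, preds = boost(rows, targets)
initial_mse = mse(targets, [base] * len(targets))
final_mse = mse(targets, preds)
print(importance, initial_mse, final_mse)
```

Sorting the features by `importance` gives the kind of ordering the abstract refers to; low-ranked features are candidates for removal or replacement.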