• Title/Summary/Keyword: Correlation based Feature Selection


A Study on CPA Performance Enhancement using the PCA (주성분 분석 기반의 CPA 성능 향상 연구)

  • Baek, Sang-Su;Jang, Seung-Kyu;Park, Aesun;Han, Dong-Guk;Ryou, Jae-Cheol
    • Journal of the Korea Institute of Information Security & Cryptology / v.24 no.5 / pp.1013-1022 / 2014
  • Correlation Power Analysis (CPA) is a type of Side-Channel Analysis (SCA) that extracts the secret key using the correlation coefficient between the side-channel information leaked by a cryptographic device and the intermediate values of the algorithm. The attack performance of CPA is affected by noise and by the temporal synchronization of the leaked power consumption. In recent years, various signal processing techniques have been presented to improve the performance of power analysis, including signal compression based on Principal Component Analysis (PCA). In PCA-based signal compression, the selection of principal components is an important issue, because it affects the performance of the analysis. In this paper, we present a method that selects the principal components whose correlation with the power consumption is high, and a CPA technique based on principal components that exploits the fact that the principal components have different characteristics. We also demonstrate the performance of our method through experiments.
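The CPA attack summarized above correlates a predicted leakage value with every sample of the measured traces, once per key guess. A minimal NumPy sketch of that idea, with an optional PCA compression step in front, might look as follows. Everything here is assumed for illustration: the toy traces, the Hamming-weight leakage model, and `SBOX` (a fixed random permutation standing in for a real cipher S-box). `pca_compress` simply keeps the leading components by variance, a stand-in rather than the paper's correlation-based selection rule.

```python
import numpy as np

# Hypothetical 8-bit S-box: a fixed random permutation stands in for a real cipher S-box.
SBOX = np.random.default_rng(0).permutation(256)
# Hamming weight of every byte value, used as the leakage model.
HW = np.array([bin(v).count("1") for v in range(256)])


def simulate_traces(key, n_traces=2000, n_samples=50, leak_at=20, noise=1.0, seed=1):
    """Toy power traces: Gaussian noise plus HW(SBOX[p ^ key]) leaked at one sample."""
    rng = np.random.default_rng(seed)
    pts = rng.integers(0, 256, n_traces)
    traces = rng.normal(0.0, noise, (n_traces, n_samples))
    traces[:, leak_at] += HW[SBOX[pts ^ key]]
    return pts, traces


def cpa(pts, traces):
    """Return the key guess whose predicted leakage has the largest |Pearson r| at any sample."""
    t = traces - traces.mean(axis=0)
    t_norm = np.sqrt((t ** 2).sum(axis=0))
    scores = np.empty(256)
    for k in range(256):
        h = HW[SBOX[pts ^ k]].astype(float)  # hypothetical intermediate-value leakage
        h -= h.mean()
        r = (h @ t) / (np.sqrt(h @ h) * t_norm)  # correlation with every sample at once
        scores[k] = np.abs(r).max()
    return int(scores.argmax()), scores


def pca_compress(traces, n_components):
    """Project centred traces onto the leading principal components (largest variance first)."""
    centred = traces - traces.mean(axis=0)
    _, _, vt = np.linalg.svd(centred, full_matrices=False)
    return centred @ vt[:n_components].T
```

With this toy data, `cpa(pts, traces)` and `cpa(pts, pca_compress(traces, 5))` should both point to the simulated key, because the leaking sample dominates the first principal component.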

Performance evaluation of principal component analysis for clustering problems

  • Kim, Jae-Hwan;Yang, Tae-Min;Kim, Jung-Tae
    • Journal of Advanced Marine Engineering and Technology / v.40 no.8 / pp.726-732 / 2016
  • Clustering analysis is widely used in data mining to classify data into categories on the basis of their similarity. Over the decades, many clustering techniques have been developed, including hierarchical and non-hierarchical algorithms. In gene profiling problems, because of the large number of genes and the complexity of biological networks, dimensionality reduction techniques are critical exploratory tools for clustering analysis of gene expression data. Recently, clustering analysis that applies dimensionality reduction techniques has also been proposed. PCA (principal component analysis) is a popular dimensionality reduction technique for clustering problems. However, previous studies analyzed the performance of PCA only on full data sets. In this paper, to evaluate the performance of PCA for clustering analysis specifically and robustly, we apply an improved FCBF (fast correlation-based filter), a feature selection method, to supervised clustering data sets, and employ two well-known clustering algorithms: k-means and k-medoids. Computational results on the supervised data sets show that PCA performs very poorly when the number of features is large.
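The FCBF filter mentioned above ranks features by symmetric uncertainty (SU) with the class and then removes any feature that is at least as strongly tied to an already selected feature as to the class. The sketch below implements that basic procedure for discrete features in plain Python. It is not the improved FCBF variant used in the paper, the relevance threshold `delta` is an assumed default, and the PCA and clustering steps are omitted.

```python
from collections import Counter
from math import log2


def entropy(values):
    """Shannon entropy (bits) of a sequence of discrete values."""
    n = len(values)
    return -sum(c / n * log2(c / n) for c in Counter(values).values())


def symmetric_uncertainty(x, y):
    """SU(X, Y) = 2 * I(X; Y) / (H(X) + H(Y)), in [0, 1]."""
    hx, hy = entropy(x), entropy(y)
    if hx + hy == 0:
        return 0.0
    mutual_info = hx + hy - entropy(list(zip(x, y)))
    return 2.0 * mutual_info / (hx + hy)


def fcbf(features, target, delta=0.01):
    """Basic FCBF: keep relevant features, then drop those redundant with a stronger one.

    features: dict mapping feature name -> list of discrete values.
    """
    relevance = {name: symmetric_uncertainty(col, target) for name, col in features.items()}
    # Relevance filter, then sort by decreasing SU with the class.
    ranked = sorted((n for n in features if relevance[n] >= delta), key=lambda n: -relevance[n])
    selected = []
    while ranked:
        p = ranked.pop(0)  # the strongest remaining feature is always kept
        selected.append(p)
        # f_q is redundant if it is at least as correlated with f_p as with the class.
        ranked = [q for q in ranked if symmetric_uncertainty(features[p], features[q]) < relevance[q]]
    return selected
```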

Prediction model of osteoporosis using nutritional components based on association (연관성 규칙 기반 영양소를 이용한 골다공증 예측 모델)

  • Yoo, JungHun;Lee, Bum Ju
    • The Journal of the Convergence on Culture Technology / v.6 no.3 / pp.457-462 / 2020
  • Osteoporosis is a disease that occurs mainly in the elderly and increases the risk of fractures due to structural deterioration of bone mass and tissue. The purposes of this study are to assess the relationship between nutritional components and osteoporosis and to evaluate models for predicting osteoporosis based on nutritional components. Associations were assessed using binary logistic regression, and predictive models were generated using the naive Bayes algorithm and variable subset selection methods. For single variables, food intake and vitamin B2 showed the highest area under the receiver operating characteristic curve (AUC) for predicting osteoporosis in men, while in women, monounsaturated fatty acids showed the highest AUC. For predicting osteoporosis in women, the models generated by the correlation-based feature subset and wrapper-based variable subset methods showed an AUC of 0.662. In men, the model using all variables obtained an AUC of 0.626, and the other male models showed very low sensitivity and 1-specificity. These results are expected to serve as basic information for the treatment and prevention of osteoporosis.
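Correlation-based feature subset selection (CFS) scores a subset by how strongly its features correlate with the class and how weakly they correlate with each other, using the merit k·r_cf / sqrt(k + k(k-1)·r_ff), where r_cf and r_ff are mean feature-class and feature-feature correlations. A minimal sketch follows, assuming numeric features, absolute Pearson correlation as the correlation measure, and a greedy forward search; the abstract does not say which implementation or search strategy was used.

```python
import numpy as np


def abs_corr(a, b):
    """Absolute Pearson correlation between two 1-D arrays."""
    return abs(np.corrcoef(a, b)[0, 1])


def cfs_merit(X, y, subset):
    """Merit_S = k * mean|r_cf| / sqrt(k + k(k-1) * mean|r_ff|)."""
    k = len(subset)
    r_cf = np.mean([abs_corr(X[:, j], y) for j in subset])
    pairs = [(a, b) for i, a in enumerate(subset) for b in subset[i + 1:]]
    r_ff = np.mean([abs_corr(X[:, a], X[:, b]) for a, b in pairs]) if pairs else 0.0
    return k * r_cf / np.sqrt(k + k * (k - 1) * r_ff)


def cfs_forward(X, y):
    """Greedy forward search: add the feature that most increases merit; stop when none does."""
    selected, best = [], 0.0
    remaining = list(range(X.shape[1]))
    while remaining:
        merit, j = max((cfs_merit(X, y, selected + [j]), j) for j in remaining)
        if merit <= best:
            break
        selected.append(j)
        remaining.remove(j)
        best = merit
    return selected, best
```

Because the denominator grows with inter-feature correlation, a redundant or irrelevant feature lowers the merit, which is what stops the search.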

Self-optimizing feature selection algorithm for enhancing campaign effectiveness (캠페인 효과 제고를 위한 자기 최적화 변수 선택 알고리즘)

  • Seo, Jeoung-soo;Ahn, Hyunchul
    • Journal of Intelligence and Information Systems / v.26 no.4 / pp.173-198 / 2020
  • For a long time, many academic studies have addressed predicting the success of customer campaigns, and prediction models applying various techniques are still being studied. Recently, as campaign channels have expanded with the rapid growth of online services, companies are running far more, and more varied, campaigns than in the past. However, as campaign fatigue from repeated exposure increases, customers tend to perceive campaigns as spam. From a corporate standpoint, campaign effectiveness is also decreasing: investment costs rise while the actual success rate remains low. Accordingly, various studies are ongoing to improve campaign effectiveness in practice. A campaign system ultimately aims to increase the success rate of various campaigns by collecting and analyzing customer-related data and using them for campaigns. In particular, recent studies have used machine learning to make various predictions about campaign responses. Because campaign data have many diverse features, selecting appropriate features is very important. If all input data are used to classify a large amount of data, learning time grows as the number of classes increases, so a minimal input data set must be extracted from the entire data. In addition, when a model is trained with too many features, prediction accuracy may degrade because of overfitting or correlation between features. Therefore, to improve accuracy, a feature selection technique that removes noise-like features should be applied; feature selection is a necessary step in analyzing high-dimensional data sets.
Greedy algorithms such as SFS (Sequential Forward Selection), SBS (Sequential Backward Selection), and SFFS (Sequential Floating Forward Selection) are widely used as traditional feature selection techniques. However, when there are many features, they are limited by poor classification performance and long learning times. Therefore, in this study, we propose an improved feature selection algorithm to enhance the effectiveness of existing campaigns. The study improves the existing SFFS sequential search for the feature subsets that underpin machine learning model performance, using the statistical characteristics of the data processed in the campaign system. Features that strongly influence performance are derived first and features with a negative effect are removed; the sequential method is then applied, which improves search efficiency and enables generalized prediction. The proposed model showed better search and prediction performance than the traditional greedy algorithm, and its campaign success prediction was higher than that obtained with the original data set, the greedy algorithm, a genetic algorithm (GA), and recursive feature elimination (RFE). In addition, by providing the importance of the derived features, the improved feature selection algorithm helped analyze and interpret the campaign success predictions. The important features include age, customer rating, and sales, which were previously known to be statistically important.
Unexpectedly, features that campaign planners had rarely used to select campaign targets, such as the combined product name, the average data consumption rate over 3 months, and wireless data usage over the last 3 months, were also selected as important features for campaign response. This confirms that base attributes can also be very important features depending on the campaign type, and makes it possible to analyze and understand the important characteristics of each campaign type.
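SFFS, the baseline this study improves on, alternates a forward inclusion step with conditional backward exclusion, so a feature added early can be dropped later if a better subset of the smaller size appears. The sketch below is plain SFFS, not the paper's improved algorithm; the `score` callback (for example, cross-validated accuracy of a campaign-response classifier) is an assumption supplied by the caller.

```python
from math import inf


def sffs(candidates, score, k):
    """Sequential Floating Forward Selection up to k features.

    score(subset) -> float, higher is better.
    Returns {size: (best score, best subset of that size)}.
    """
    selected, best = [], {}
    while len(selected) < k:
        remaining = [c for c in candidates if c not in selected]
        if not remaining:
            break
        # Inclusion: a plain SFS step adds the single most useful feature.
        add = max(remaining, key=lambda c: score(selected + [c]))
        selected = selected + [add]
        s = score(selected)
        if s > best.get(len(selected), (-inf, None))[0]:
            best[len(selected)] = (s, selected)
        # Conditional exclusion: drop the least useful feature while that
        # beats the best subset found so far at the smaller size.
        while len(selected) > 2:
            drop = max(selected, key=lambda c: score([x for x in selected if x != c]))
            reduced = [x for x in selected if x != drop]
            r = score(reduced)
            if r > best.get(len(reduced), (-inf, None))[0]:
                selected = reduced
                best[len(reduced)] = (r, reduced)
            else:
                break
    return best
```

With hypothetical scores in which {b, c} beats every pair containing the best single feature a, plain SFS would stay with {a, b}, whereas the exclusion step recovers {b, c} as the best two-feature subset.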

Remaining useful life prediction for PMSM under radial load using particle filter

  • Lee, Younghun;Kim, Inhwan;Choi, Sikgyoung;Oh, Jaewook;Kim, Namsu
    • Smart Structures and Systems / v.29 no.6 / pp.799-805 / 2022
  • Permanent magnet synchronous motors (PMSMs) are widely used in systems requiring high control precision, efficiency, and reliability. Predicting the remaining useful life (RUL) through health monitoring of PMSMs prevents catastrophic failure and ensures reliable system operation. In this study, a model-based method for predicting the RUL of PMSMs using phase current and vibration signals is proposed. The method includes feature selection and RUL prediction based on a particle filter with a degradation model; the Paris-Erdogan model, which describes micro fatigue crack propagation, is used as the degradation model. An experimental set-up capable of monitoring various signals was designed to conduct accelerated life tests, and phase current and vibration data obtained from accelerated life tests of the PMSMs were used to verify the proposed approach. Features extracted from the data were clustered based on monotonicity and on correlation clustering, respectively. The results demonstrate the effectiveness of using current data to predict the RUL of PMSMs.
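The Paris-Erdogan law models fatigue crack growth as da/dN = C(ΔK)^m with ΔK = Δσ·sqrt(πa). A bootstrap particle filter can track the crack length a together with the unknown parameter C from noisy measurements, then extrapolate each particle to a failure threshold to obtain an RUL distribution. The sketch below assumes that a is observed directly with Gaussian noise, whereas in the paper the health indicator comes from selected current and vibration features; all constants (`M`, `D_SIGMA`, `DN`, `A_CRIT`, noise levels, jitter) are hypothetical.

```python
import numpy as np

# Hypothetical constants (not from the paper): Paris exponent, stress range (MPa),
# cycles per time step, and the crack length (m) treated as failure.
M = 3.0
D_SIGMA = 75.0
DN = 1000.0
A_CRIT = 0.05


def paris_step(a, log_c):
    """One Euler step of the Paris-Erdogan law da/dN = C * (dSigma * sqrt(pi * a))**m."""
    delta_k = D_SIGMA * np.sqrt(np.pi * a)
    return a + np.exp(log_c) * delta_k ** M * DN


def particle_filter(measurements, a0, log_c_prior, n=2000, meas_std=2e-4, jitter=0.01, seed=0):
    """Bootstrap particle filter over (crack length a, log C); returns the final particles."""
    rng = np.random.default_rng(seed)
    a = np.full(n, a0)
    log_c = rng.normal(log_c_prior[0], log_c_prior[1], n)
    for y in measurements:
        log_c = log_c + rng.normal(0.0, jitter, n)  # small random walk keeps parameter diversity
        a = paris_step(a, log_c)                    # propagate through the degradation model
        log_w = -0.5 * ((y - a) / meas_std) ** 2    # Gaussian measurement log-likelihood
        w = np.exp(log_w - log_w.max())
        idx = rng.choice(n, size=n, p=w / w.sum())  # multinomial resampling
        a, log_c = a[idx], log_c[idx]
    return a, log_c


def predict_rul(a, log_c, max_steps=500):
    """Steps until each particle's crack length reaches A_CRIT (capped at max_steps)."""
    steps = np.zeros(len(a), dtype=int)
    alive = a < A_CRIT
    for _ in range(max_steps):
        if not alive.any():
            break
        a = np.where(alive, paris_step(a, log_c), a)
        steps += alive
        alive &= a < A_CRIT
    return steps
```

The spread of `predict_rul` over the particles gives an uncertainty band around the RUL estimate, which is the main reason to use a particle filter rather than a single curve fit.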

Role of Features in Plasma Information Based Virtual Metrology (PI-VM) for SiO2 Etching Depth (플라즈마 정보인자를 활용한 SiO2 식각 깊이 가상 계측 모델의 특성 인자 역할 분석)

  • Jang, Yun Chang;Park, Seol Hye;Jeong, Sang Min;Ryu, Sang Won;Kim, Gon Ho
    • Journal of the Semiconductor & Display Technology / v.18 no.4 / pp.30-34 / 2019
  • We analyzed how the features of the plasma-information-based virtual metrology (PI-VM) model for SiO2 etching depth with 5% variation, previously developed by Jang, contribute to its prediction accuracy. As single features, their explanatory power for the process results is, in decreasing order, plasma information about the electron energy distribution function (PIEEDF), equipment features, and optical emission spectroscopy (OES) features. In the stepwise variable selection (SVS) procedure, OES features are selected after PIEEDF. The informative vector of the developed PI-VM also shows a relatively high correlation between OES features and etching depth. This is because the reaction rate of each chemical species that governs the etching depth can be monitored sensitively when OES features are used together with PIEEDF. Securing PIEEDF is important for developing virtual metrology (VM) that predicts process results. The role of PIEEDF as an independent feature, together with its ability to monitor variations in the plasma thermal state, can make the other features in the SVS procedure more sensitive to the process results. Fault detection and classification (FDC) is expected to be developed effectively using the PI-VM.
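Stepwise variable selection (SVS) adds variables one at a time according to how much they improve a regression fit. The sketch below is a forward-only simplification that uses the gain in ordinary least-squares R²; the stopping threshold `min_gain` is an assumed value, and the backward-removal tests that full stepwise procedures usually include are omitted.

```python
import numpy as np


def r_squared(X, y, cols):
    """R^2 of an ordinary least-squares fit of y on an intercept plus X[:, cols]."""
    A = np.column_stack([np.ones(len(y))] + [X[:, c] for c in cols])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    return 1.0 - resid @ resid / np.sum((y - y.mean()) ** 2)


def forward_stepwise(X, y, min_gain=0.01):
    """Add the feature with the largest R^2 gain until the gain drops below min_gain."""
    selected, best = [], 0.0  # intercept-only model has R^2 = 0
    remaining = list(range(X.shape[1]))
    while remaining:
        r2, col = max((r_squared(X, y, selected + [c]), c) for c in remaining)
        if r2 - best < min_gain:
            break
        selected.append(col)
        remaining.remove(col)
        best = r2
    return selected
```

The order in which `forward_stepwise` returns features mirrors the kind of result reported above, where OES features enter only after PIEEDF.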

Analyzing Factors Contributing to Research Performance using Backpropagation Neural Network and Support Vector Machine

  • Ermatita, Ermatita;Sanmorino, Ahmad;Samsuryadi, Samsuryadi;Rini, Dian Palupi
    • KSII Transactions on Internet and Information Systems (TIIS) / v.16 no.1 / pp.153-172 / 2022
  • In this study, the authors analyze factors contributing to research performance using a Backpropagation Neural Network and a Support Vector Machine. The analysis of factors contributing to lecturer research performance starts by defining the features. The next stage is to collect datasets based on the defined features, and then to transform the raw dataset into data ready to be processed. After the data are transformed, the next stage is feature selection. Before feature selection, the target feature, namely research performance, is determined. Feature selection uses Chi-Square selection (U) and the Pearson correlation coefficient (CM), and produces eight factors contributing to lecturer research performance: Scientific Papers (U: 154.38, CM: 0.79), Number of Citation (U: 95.86, CM: 0.70), Conference (U: 68.67, CM: 0.57), Grade (U: 10.13, CM: 0.29), Grant (U: 35.40, CM: 0.36), IPR (U: 19.81, CM: 0.27), Qualification (U: 2.57, CM: 0.26), and Grant Awardee (U: 2.66, CM: 0.26). To analyze the factors, two data mining classifiers were used: Backpropagation Neural Networks (BPNN) and Support Vector Machine (SVM). The classifiers achieved accuracy scores of 95 percent for BPNN and 92 percent for SVM. The point of this analysis is not to find the highest accuracy score, but to check whether the factors pass the test phase with the expected results. The findings reveal which factors have a significant impact on research performance and which do not.
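The two scores reported for each factor can be illustrated with a chi-square statistic of independence between a discrete feature and the target, and the Pearson correlation coefficient. The abstract does not say exactly how the U scores were computed (for example, from a contingency table or with a count-based variant), so this plain-Python sketch is one plausible reading rather than the authors' procedure.

```python
from collections import Counter
from math import sqrt


def chi_square(feature, target):
    """Chi-square statistic of independence from the feature x target contingency table."""
    n = len(feature)
    f_counts, t_counts = Counter(feature), Counter(target)
    joint = Counter(zip(feature, target))
    stat = 0.0
    for f, fc in f_counts.items():
        for t, tc in t_counts.items():
            expected = fc * tc / n  # count expected if feature and target were independent
            stat += (joint[(f, t)] - expected) ** 2 / expected
    return stat


def pearson(x, y):
    """Pearson correlation coefficient of two numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)
```

Ranking candidate factors by either score and keeping those above a cut-off gives a filter-style selection like the one described above.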

Content-based Image Retrieval using Variable Region Color (가변 영역 색상을 이용한 내용기반 영상검색)

  • Kim Dong-Woo;Song Young-Jun;Kwon Dong-Jin;Ahn Jae-Hyeong
    • Journal of the Korea Academia-Industrial cooperation Society / v.6 no.5 / pp.367-372 / 2005
  • In this paper, we propose a content-based image retrieval method using variable regions. Content-based image retrieval mostly uses color histograms, but existing color histogram methods lose accuracy because of quantization error and the absence of spatial information. To overcome this, we convert color information to HSV space, quantize the hue component, which carries the pure color information, and compute its histogram. To address the absence of spatial information, we select object regions by considering color features and region correlation. The selected object regions keep their size, whereas non-object regions are merged into a single region. After the variable regions are selected, retrieval is performed using color features. Experimental results show that the proposed method improves average precision by 10%.
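The hue-only color feature described above can be sketched with Python's standard `colorsys` module: convert each pixel to HSV, quantize the hue into bins, and compare images by histogram intersection. The bin count, the saturation cut-off for near-grey pixels, and the use of histogram intersection as the similarity measure are assumptions, and the variable-region selection step is not reproduced.

```python
import colorsys


def hue_histogram(pixels, bins=16, min_sat=0.1):
    """Normalised histogram of quantised hue for (r, g, b) pixels in 0..255.

    Pixels with saturation below min_sat (near-grey) have no meaningful hue and are skipped.
    """
    hist = [0] * bins
    for r, g, b in pixels:
        h, s, _ = colorsys.rgb_to_hsv(r / 255, g / 255, b / 255)
        if s >= min_sat:
            hist[min(int(h * bins), bins - 1)] += 1
    total = sum(hist)
    return [c / total for c in hist] if total else hist


def histogram_intersection(h1, h2):
    """Similarity in [0, 1] for normalised histograms; 1 means identical."""
    return sum(min(a, b) for a, b in zip(h1, h2))
```

Quantizing only the hue keeps the histogram small, and skipping low-saturation pixels avoids piling grey pixels into the red bin, where `colorsys` places hue 0.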


Exploring Feature Selection Methods for Effective Emotion Mining (효과적 이모션마이닝을 위한 속성선택 방법에 관한 연구)

  • Eo, Kyun Sun;Lee, Kun Chang
    • Journal of Digital Convergence / v.17 no.3 / pp.107-117 / 2019
  • In the era of SNS, many people rely on social networking services to express their emotions about various kinds of products and services. Therefore, for companies eager to investigate how their products and services are perceived in the market, emotion mining on SNS datasets has become more important than ever. Emotion mining is a branch of sentiment analysis and is basically based on BOW (bag-of-words) and TF-IDF. However, few emotion mining studies adopt feature selection (FS) methods to find an optimal set of features that ensures better results. This study therefore proposes FS methods for conducting emotion mining tasks more effectively and with better outcomes. The experiments use the Twitter and SemEval2007 datasets. We applied three FS methods: CFS (Correlation based FS), IG (Information Gain), and ReliefF. Emotion mining results were obtained by applying the selected features to nine classifiers. When DT (decision tree) was applied to the Twitter dataset, accuracy increased with the CFS, IG, and ReliefF methods. When LR (logistic regression) was applied to the SemEval2007 dataset, accuracy increased with the ReliefF method.
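Of the three FS methods, information gain is the simplest to show: it measures how much knowing whether a word occurs reduces the entropy of the emotion label. A minimal sketch over binary bag-of-words features, with documents represented as token sets, might be the following; TF-IDF weighting, CFS, and ReliefF are not covered, and the helper `top_k_words` is a hypothetical name.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (bits) of a list of class labels."""
    n = len(labels)
    return -sum(c / n * log2(c / n) for c in Counter(labels).values())


def information_gain(docs, labels, word):
    """IG(C; w) = H(C) - H(C | w present/absent) for a binary bag-of-words feature."""
    present = [lab for doc, lab in zip(docs, labels) if word in doc]
    absent = [lab for doc, lab in zip(docs, labels) if word not in doc]
    n = len(labels)
    conditional = sum(len(part) / n * entropy(part) for part in (present, absent) if part)
    return entropy(labels) - conditional


def top_k_words(docs, labels, k):
    """Keep the k vocabulary words with the highest information gain."""
    vocab = sorted(set().union(*docs))
    return sorted(vocab, key=lambda w: information_gain(docs, labels, w), reverse=True)[:k]
```

For example, with documents {"happy", "day"}, {"happy"}, {"sad", "day"}, {"sad"} labelled joy, joy, sadness, sadness, "happy" and "sad" each have an IG of 1 bit while "day" has 0, so "day" is dropped first.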

A Network Intrusion Security Detection Method Using BiLSTM-CNN in Big Data Environment

  • Hong Wang
    • Journal of Information Processing Systems / v.19 no.5 / pp.688-701 / 2023
  • Conventional network intrusion detection system (NIDS) methods cannot effectively measure the trend of intrusion detection targets, which leads to low detection accuracy. In this study, a NIDS method based on a deep neural network in a big data environment is proposed. First, the overall framework of the NIDS model is constructed in two stages, with feature reduction and anomaly probability output at the core of the two stages. Next, a convolutional neural network is built that comprises a downsampling layer and a feature extractor consisting of a convolution layer, and the correlation between inputs is captured by introducing a bidirectional long short-term memory (BiLSTM). Finally, a pooling layer is added after the convolution layer to sample the required features according to different sampling rules, which improves the overall performance of the NIDS model. The proposed NIDS method is compared with three other methods through simulation experiments on two databases. The results demonstrate that the proposed model outperforms the other three NIDS methods on both databases in terms of precision, accuracy, F1-score, and recall, which are 91.64%, 93.35%, 92.25%, and 91.87%, respectively. The proposed algorithm is significant for improving the accuracy of NIDS.
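The four reported metrics follow the standard definitions from binary confusion-matrix counts, sketched below with a hypothetical helper name. When metrics are averaged (for example, over classes or databases), the averaged F1 generally differs from the harmonic mean of the averaged precision and recall, so reported figures need not satisfy F1 = 2PR/(P + R) exactly.

```python
def binary_metrics(tp, fp, fn, tn):
    """Precision, recall, accuracy, and F1 from binary confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean of precision and recall
    return {"precision": precision, "recall": recall, "accuracy": accuracy, "f1": f1}
```

For instance, 80 true positives, 20 false positives, 10 false negatives, and 90 true negatives give precision 0.8, recall 8/9, accuracy 0.85, and F1 = 160/190.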