• Title/Summary/Keyword: recursive feature addition

RFA: Recursive Feature Addition Algorithm for Machine Learning-Based Malware Classification

  • Byeon, Ji-Yun;Kim, Dae-Ho;Kim, Hee-Chul;Choi, Sang-Yong
    • Journal of the Korea Society of Computer and Information / v.26 no.2 / pp.61-68 / 2021
  • Recently, various technologies that use machine learning to classify malicious code have been studied. To make machine learning effective, it is most important to extract features that distinguish malicious code from normal binaries. In this paper, we propose a recursive feature extraction method for use in machine learning. The proposed method applies a recursive procedure to individual features and selects the final feature set so as to maximize machine learning performance. In detail, at each stage we extract the best-performing features among the individual features and then combine the extracted features. We extract features with the proposed method and apply them to machine learning algorithms such as Decision Tree, SVM, Random Forest, and KNN to validate that machine learning performance improves as the stages continue.
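
The stage-by-stage selection described in this abstract can be read as a greedy forward search. The following is a minimal sketch of that reading, not the authors' implementation: the function name `recursive_feature_addition`, the stopping rule, the feature names, and the `toy_score` function (a stand-in for training a classifier and measuring validation accuracy) are all assumptions made for illustration.

```python
from typing import Callable, FrozenSet, List, Sequence, Tuple


def recursive_feature_addition(
    features: Sequence[str],
    score: Callable[[FrozenSet[str]], float],
) -> Tuple[List[str], float]:
    """Greedy forward selection: at each stage add the single feature that
    gives the best score, and stop when no remaining feature improves it."""
    selected: List[str] = []
    remaining = list(features)
    best_score = float("-inf")
    while remaining:
        # Score every candidate combined with the features chosen so far.
        stage_score, stage_feature = max(
            (score(frozenset(selected + [f])), f) for f in remaining
        )
        if stage_score <= best_score:
            break  # no remaining feature improves performance
        selected.append(stage_feature)
        remaining.remove(stage_feature)
        best_score = stage_score
    return selected, best_score


# Hypothetical stand-in for "train a classifier, return validation accuracy".
def toy_score(subset: FrozenSet[str]) -> float:
    gains = {"entropy": 0.25, "imports": 0.15, "size": 0.03, "noise": -0.05}
    return 0.5 + sum(gains[f] for f in subset)


chosen, accuracy = recursive_feature_addition(
    ["size", "noise", "entropy", "imports"], toy_score
)
print(chosen, round(accuracy, 2))
```

In practice `score` would wrap a real model such as a decision tree or random forest evaluated on held-out data; the search itself does not depend on which classifier is used.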

A Novel Network Anomaly Detection Method based on Data Balancing and Recursive Feature Addition

  • Liu, Xinqian;Ren, Jiadong;He, Haitao;Wang, Qian;Sun, Shengting
    • KSII Transactions on Internet and Information Systems (TIIS) / v.14 no.7 / pp.3093-3115 / 2020
  • A network anomaly detection system plays an essential role in detecting network anomalies and ensuring network security. Anomaly detection based on machine learning has become an increasingly popular solution. However, because network traffic is unbalanced and high-dimensional, existing methods are unable to achieve both high accuracy and a low false alarm rate. To address this problem, a new network anomaly detection method based on data balancing and recursive feature addition is proposed. First, a data balancing algorithm based on improved KNN outlier detection is designed to select representative data from each category. The parameters of the improved KNN outlier detection are jointly optimized by a genetic algorithm. Next, a recursive feature addition algorithm based on correlation analysis is proposed to select effective features, in which a cross contingency test is used to analyze correlation and obtain a feature subset with a strong correlation. Then, a random forest model is used as the classification model to detect anomalies. Finally, the proposed algorithm is evaluated on the benchmark datasets KDD Cup 1999 and UNSW_NB15. The results show that the proposed strategies improve accuracy and recall and decrease the false alarm rate. Compared with other algorithms, the proposed algorithm still achieves significant improvements, especially in recall for the small categories.

A Study on the Feature Extraction for High Speed Character Recognition -By Using Interative Extraction and Hierarchical Formation of Directional Information- (고속 문자 인식을 위한 특징량 추출에 관한 연구 - 방향정보의 반복적 추출과 특징량의 계층성을 이용하여 -)

  • 강선미;이기용;양윤모;양윤모;김덕진
    • Journal of the Korean Institute of Telematics and Electronics B / v.29B no.11 / pp.102-110 / 1992
  • In this paper, a new method of character recognition is proposed. To recognize a character, it uses density information in addition to the positional and directional information that is generally used. Four directional feature primitives are extracted from the thinning templates, based on the observation that the outputs of the templates generally have a directional property. This makes a simple and fast feature extraction scheme possible. Features are organized in a recursive nonary tree (N-tree) that corresponds to the normalized character area. Each node of the N-tree has four directional features that are the sums of the features of its nine sub-nodes. Every feature primitive from the templates is added to the corresponding leaf and then summed into the upper nodes successively. Recognition can be accomplished by using an appropriate feature level of the N-tree. The effectiveness of each node's feature vector was also tested by experiment. A method for implementing the proposed feature vector organization algorithm in hardware is proposed as well. The third-generation node, which is 4$\times$4, is used as the unit processing element for feature extraction, and it was implemented in hardware. As a result, we observed that it is possible to extract the feature vector in real time.
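
The bottom-up summation over the nonary tree can be sketched as follows. This is an illustrative sketch only, under the assumption that the nine sub-nodes of a node form a 3x3 block of a square grid; the names `parent_level` and `build_levels`, the ordering of the four directions, and the toy 9x9 leaf grid are hypothetical and not taken from the paper.

```python
from typing import List

Vec = List[int]          # four directional counts (ordering is an assumption)
Grid = List[List[Vec]]   # square grid of feature vectors at one tree level


def parent_level(grid: Grid) -> Grid:
    """Sum each 3x3 block of child feature vectors into one parent vector."""
    n = len(grid) // 3
    out: Grid = [[[0, 0, 0, 0] for _ in range(n)] for _ in range(n)]
    for r, row in enumerate(grid):
        for c, vec in enumerate(row):
            parent = out[r // 3][c // 3]
            for k in range(4):
                parent[k] += vec[k]
    return out


def build_levels(leaves: Grid) -> List[Grid]:
    """Return the feature grids from the leaf level up to the root (1x1)."""
    levels = [leaves]
    while len(levels[-1]) > 1:
        levels.append(parent_level(levels[-1]))
    return levels


# Toy 9x9 leaf grid: every leaf holds one primitive in direction 0,
# and the top-left leaf additionally holds two primitives in direction 1.
leaves = [[[1, 0, 0, 0] for _ in range(9)] for _ in range(9)]
leaves[0][0][1] = 2
levels = build_levels(leaves)
print(len(levels), levels[-1][0][0])
```

Each entry of `levels` corresponds to one feature level of the tree, so a recognizer could pick whichever level gives a suitable trade-off between detail and vector size.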

Prediction on the Ratio of Added Value in Industry Using Forecasting Combination based on Machine Learning Method (머신러닝 기법 기반의 예측조합 방법을 활용한 산업 부가가치율 예측 연구)

  • Kim, Jeong-Woo
    • The Journal of the Korea Contents Association / v.20 no.12 / pp.49-57 / 2020
  • This study predicts the ratio of added value, which represents the competitiveness of export industries in South Korea, using various machine learning techniques. To enhance the accuracy and stability of prediction, a forecast combination technique was applied to the values predicted by the machine learning techniques. In particular, this study improved the efficiency of the prediction process by selecting key variables out of many variables with the recursive feature elimination method and applying them to the machine learning techniques. As a result, the value predicted by the forecast combination method was closer to the actual value than the values predicted by the individual machine learning techniques. In addition, the forecast combination method showed stable prediction results, unlike the volatile values predicted by the machine learning techniques.

SOx Process Simulation, Monitoring, and Pattern Classification in a Power Plant (발전소에서의 SOx 공정 모사, 모니터링 및 패턴 분류)

  • 최상욱;유창규;이인범
    • Journal of Institute of Control, Robotics and Systems / v.8 no.10 / pp.827-832 / 2002
  • We propose a method for predicting the pollutant and simultaneously classifying the current state of SOx emission in a power plant. We use the auto-regressive with exogenous input (ARX) model as a predictor of SOx emission and a radial basis function network (RBFN) as a pattern classifier. The ARX modeling scheme is implemented using the recursive least squares (RLS) method to update the model parameters adaptively. SOx emission monitoring is carried out by applying the RBFN classifier. Experimental results show that the ARX model can predict the SOx emission concentration well and that the ARX model parameters can be a good feature for state monitoring. In addition, its validity has been verified through power spectrum analysis. Consequently, the RBFN classifier in combination with the ARX model is shown to be quite adequate for monitoring the state of SOx emission.
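
The adaptive parameter update mentioned in this abstract can be illustrated with the standard RLS recursion. This is a minimal sketch under stated assumptions, not the paper's model: it uses a first-order ARX model y(t) = a·y(t-1) + b·u(t-1), a forgetting factor of 0.99, and simulated noise-free data; the function name `rls_update` and all numeric values are hypothetical.

```python
import random
from typing import List, Tuple

Matrix = List[List[float]]


def rls_update(
    theta: List[float], P: Matrix, phi: List[float], y: float, lam: float = 0.99
) -> Tuple[List[float], Matrix]:
    """One recursive least squares step with forgetting factor lam."""
    n = len(theta)
    P_phi = [sum(P[i][j] * phi[j] for j in range(n)) for i in range(n)]
    denom = lam + sum(phi[i] * P_phi[i] for i in range(n))
    gain = [v / denom for v in P_phi]
    error = y - sum(phi[i] * theta[i] for i in range(n))  # prediction error
    theta = [theta[i] + gain[i] * error for i in range(n)]
    # P is symmetric, so phi^T P is the transpose of P_phi.
    P = [[(P[i][j] - gain[i] * P_phi[j]) / lam for j in range(n)] for i in range(n)]
    return theta, P


# Simulated first-order ARX system with assumed true parameters a and b.
rng = random.Random(0)
a_true, b_true = 0.8, 0.5
theta = [0.0, 0.0]                      # estimates of [a, b]
P = [[1000.0, 0.0], [0.0, 1000.0]]      # large initial covariance
y_prev, u_prev = 0.0, 0.0
for _ in range(200):
    y = a_true * y_prev + b_true * u_prev
    theta, P = rls_update(theta, P, [y_prev, u_prev], y)
    y_prev, u_prev = y, rng.uniform(-1.0, 1.0)
print([round(v, 3) for v in theta])
```

The estimated parameter vector `theta` is the kind of quantity the abstract describes as a feature for state monitoring; a forgetting factor below one lets the estimate follow slow changes in the process.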

Video Expression Recognition Method Based on Spatiotemporal Recurrent Neural Network and Feature Fusion

  • Zhou, Xuan
    • Journal of Information Processing Systems / v.17 no.2 / pp.337-351 / 2021
  • Automatically recognizing facial expressions in video sequences is a challenging task because there is little direct correlation between facial features and subjective emotions in video. To overcome this problem, a video facial expression recognition method using a spatiotemporal recurrent neural network and feature fusion is proposed. First, the video is preprocessed. Then, a double-layer cascade structure is used to detect the face in each video image. In addition, two deep convolutional neural networks are used to extract the time-domain and spatial-domain facial features in the video. The spatial convolutional neural network extracts spatial information features from each frame of the static expression images in the video. The temporal convolutional neural network extracts dynamic information features from the optical flow of multiple frames of expression images in the video. The spatiotemporal features learned by the two deep convolutional neural networks are then combined by multiplication fusion. Finally, the fused features are input to a support vector machine to perform the facial expression classification task. The experimental results on the eNTERFACE, RML, and AFEW6.0 datasets show that the recognition rates obtained by the proposed method are as high as 88.67%, 70.32%, and 63.84%, respectively. Comparative experiments show that the proposed method obtains higher recognition accuracy than other recently reported methods.

Prediction of the employment ratio by industry using constrainted forecast combination (제약하의 예측조합 방법을 활용한 산업별 고용비중 예측)

  • Kim, Jeong-Woo
    • Journal of the Korea Convergence Society / v.11 no.11 / pp.257-267 / 2020
  • In this study, we predicted the employment ratio by export industry using various machine learning methods and verified whether prediction performance improves when the constrained forecast combination method is applied to these predicted values. In particular, the constrained forecast combination method is known to improve prediction accuracy and stability by constraining the weights of the predicted values to sum to one. In addition, because this study considered many variables affecting the employment ratio of each industry, we adopted the recursive feature elimination method, which allows efficient use of machine learning methods. As a result, the constrained forecast combination showed more accurate prediction performance than the individual machine learning methods, and in particular, its prediction performance was more stable than that of the machine learning methods.
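
The sum-to-one constraint on the combination weights can be made concrete with a small example. The sketch below assumes the simplest case of two forecasters and least-squares weights, which the abstract does not specify; the function name `combine_two` and the toy series are hypothetical. With weights w and 1 - w, minimizing the squared error gives w = Σ(y - f2)(f1 - f2) / Σ(f1 - f2)².

```python
from typing import Sequence, Tuple


def combine_two(
    actual: Sequence[float], f1: Sequence[float], f2: Sequence[float]
) -> Tuple[float, float]:
    """Least-squares weights (w, 1 - w) for two forecasts, constrained to sum to one."""
    num = sum((y - b) * (a - b) for y, a, b in zip(actual, f1, f2))
    den = sum((a - b) ** 2 for a, b in zip(f1, f2))
    w = num / den  # assumes the two forecasts are not identical
    return w, 1.0 - w


# Toy data: the actual series is exactly 0.3 * f1 + 0.7 * f2.
f1 = [10.0, 12.0, 11.0, 13.0]
f2 = [8.0, 9.0, 12.0, 10.0]
actual = [0.3 * a + 0.7 * b for a, b in zip(f1, f2)]
w1, w2 = combine_two(actual, f1, f2)
combined = [w1 * a + w2 * b for a, b in zip(f1, f2)]
print(round(w1, 3), round(w2, 3))
```

With more than two forecasters the same constraint is usually handled by a constrained least-squares solver; the two-forecaster case is shown here only because it has a closed form.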

Self-optimizing feature selection algorithm for enhancing campaign effectiveness (캠페인 효과 제고를 위한 자기 최적화 변수 선택 알고리즘)

  • Seo, Jeoung-soo;Ahn, Hyunchul
    • Journal of Intelligence and Information Systems / v.26 no.4 / pp.173-198 / 2020
  • Academic studies on predicting the success of customer campaigns have been conducted for a long time, and prediction models applying various techniques are still being studied. Recently, as campaign channels have expanded with the rapid growth of online channels, companies are running various types of campaigns on a scale that cannot be compared to the past. However, as campaign fatigue from repeated exposure increases, customers tend to perceive campaigns as spam. From a corporate standpoint, the effectiveness of campaigns is also decreasing: the cost of investing in campaigns is rising while the actual campaign success rate remains low. Accordingly, various studies are ongoing to improve the effectiveness of campaigns in practice. The ultimate purpose of a campaign system is to increase the success rate of campaigns by collecting and analyzing customer-related data and using them for campaigns. In particular, there have been recent attempts to predict campaign responses using machine learning. Because campaign data have many features, it is very important to select appropriate ones. If all of the input data are used to classify a large amount of data, training takes a long time as the number of classes grows, so a minimal input data set must be extracted from the entire data and used. In addition, when a model is trained with too many features, prediction accuracy may be degraded by overfitting or by correlation between features. Therefore, to improve accuracy, a feature selection technique that removes features close to noise should be applied; feature selection is a necessary step in analyzing a high-dimensional data set.
Among greedy algorithms, SFS (Sequential Forward Selection), SBS (Sequential Backward Selection), SFFS (Sequential Floating Forward Selection), and similar methods are widely used as traditional feature selection techniques. These methods also have a limitation: when there are many risks and many features, classification performance is poor and training takes a long time. Therefore, in this study, we propose an improved feature selection algorithm to enhance the effectiveness of existing campaigns. The purpose of this study is to improve the existing SFFS sequential method, in the process of searching for the feature subsets that underlie machine learning model performance, by using the statistical characteristics of the data processed in the campaign system. Features that strongly influence performance are derived first, features that have a negative effect are removed, and then the sequential method is applied; this increases search efficiency and yields an improved algorithm that enables generalized prediction. The proposed model showed better search and prediction performance than the traditional greedy algorithm. Its campaign success prediction was higher than that obtained with the original data set, the greedy algorithm, a genetic algorithm (GA), and recursive feature elimination (RFE). In addition, when predicting campaign success, the improved feature selection algorithm was found to help in analyzing and interpreting the prediction results by providing the importance of the derived features. These include important features such as age, customer rating, and sales, which were already known statistically.
In addition, features that campaign planners had rarely used to select campaign targets, such as the combined product name, the average 3-month data consumption rate, and the wireless data usage over the last 3 months, were unexpectedly selected as important features for campaign response. This confirmed that base attributes can also be very important features depending on the type of campaign, which makes it possible to analyze and understand the important characteristics of each campaign type.

A Real-time Particle Filtering Framework for Robust Camera Tracking in An AR Environment (증강현실 환경에서의 강건한 카메라 추적을 위한 실시간 입자 필터링 기법)

  • Lee, Seok-Han
    • Journal of Digital Contents Society / v.11 no.4 / pp.597-606 / 2010
  • This paper describes a real-time camera tracking framework specifically designed to track a monocular camera in an AR workspace. The Kalman filter is often employed for camera tracking. In general, however, the tracking performance of conventional methods is seriously affected by unpredictable situations such as ambiguity in feature detection, occlusion of features, and rapid camera shake. In this paper, a recursive Bayesian sampling framework, also known as the particle filter, is adopted for camera pose estimation. In our system, the camera state is estimated on the basis of a Gaussian distribution, without employing an additional uncertainty model or sample weight computation. In addition, the camera state is computed directly from new sample particles that are distributed according to the true posterior of the system state. To verify the proposed system, we conduct several experiments under unstable conditions in desktop AR environments.
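
For readers unfamiliar with recursive Bayesian sampling, the following sketch shows a generic bootstrap particle filter (predict, weight, resample) on a one-dimensional state. It is deliberately not the paper's method: the abstract describes a variant that avoids sample weight computation, whereas this textbook version computes weights explicitly. The function name `particle_filter_step`, the random-walk motion model, and all noise levels are assumptions.

```python
import math
import random
from typing import List


def particle_filter_step(
    particles: List[float],
    measurement: float,
    rng: random.Random,
    process_std: float = 0.1,
    meas_std: float = 0.5,
) -> List[float]:
    """One predict / weight / resample cycle of a bootstrap particle filter."""
    # Predict: propagate each particle through a random-walk motion model.
    predicted = [p + rng.gauss(0.0, process_std) for p in particles]
    # Weight: Gaussian likelihood of the measurement given each particle.
    weights = [
        math.exp(-0.5 * ((measurement - p) / meas_std) ** 2) for p in predicted
    ]
    # Resample: draw particles in proportion to their weights.
    return rng.choices(predicted, weights=weights, k=len(predicted))


rng = random.Random(42)
true_state = 2.0                                        # assumed constant state
particles = [rng.uniform(-5.0, 5.0) for _ in range(500)]
for _ in range(50):
    z = true_state + rng.gauss(0.0, 0.5)                # noisy measurement
    particles = particle_filter_step(particles, z, rng)
estimate = sum(particles) / len(particles)              # posterior mean
print(round(estimate, 2))
```

A camera tracker would replace the scalar state with a full pose (position and orientation) and the Gaussian likelihood with a reprojection-error model, but the recursive structure stays the same.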