• Title/Summary/Keyword: Feature Variables


Availability Verification of Feature Variables for Pattern Classification on Weld Flaws (용접결함의 패턴분류를 위한 특징변수 유효성 검증)

  • Kim, Chang-Hyun;Kim, Jae-Yeol;Yu, Hong-Yeon;Hong, Sung-Hoon
    • Transactions of the Korean Society of Machine Tool Engineers
    • /
    • v.16 no.6
    • /
    • pp.62-70
    • /
    • 2007
  • In this study, natural flaws in welded parts are classified using a signal pattern classification method. A storage digital oscilloscope with an FFT function and an enveloped-waveform generator is used, and the signal pattern recognition procedure consists of digital signal processing, feature extraction, feature selection, and classifier design. Two classifiers are constructed and discussed: a distance classifier based on Euclidean distance and an empirical Bayesian classifier. Feature extraction is performed using the class-mean scatter criterion. The signal pattern classification method is then applied to the pattern recognition of the natural flaws.
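
The minimum-distance classifier mentioned in the abstract assigns a sample to the class whose mean feature vector is nearest in Euclidean distance. A minimal sketch follows; the flaw class names and the toy 2-D feature data are hypothetical, not taken from the paper.

```python
import numpy as np

def fit_class_means(X, y):
    """Return a dict mapping each class label to its mean feature vector."""
    return {c: X[y == c].mean(axis=0) for c in np.unique(y)}

def classify(x, means):
    """Assign x to the class whose mean is nearest in Euclidean distance."""
    return min(means, key=lambda c: np.linalg.norm(x - means[c]))

# Toy data: two hypothetical flaw classes in a 2-D feature space.
X = np.array([[0.0, 0.0], [0.2, 0.1], [1.0, 1.0], [1.1, 0.9]])
y = np.array(["porosity", "porosity", "crack", "crack"])

means = fit_class_means(X, y)
print(classify(np.array([0.1, 0.0]), means))  # nearest to the porosity mean
```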

Feature selection for text data via topic modeling (토픽 모형을 이용한 텍스트 데이터의 단어 선택)

  • Jang, Woosol;Kim, Ye Eun;Son, Won
    • The Korean Journal of Applied Statistics
    • /
    • v.35 no.6
    • /
    • pp.739-754
    • /
    • 2022
  • Usually, text data consists of many variables, and some of them are closely correlated. Such multi-collinearity often results in inefficient or inaccurate statistical analysis. For supervised learning, one can select features by examining the relationship between target variables and explanatory variables. On the other hand, for unsupervised learning, since target variables are absent, one cannot use such a feature selection procedure as in supervised learning. In this study, we propose a word selection procedure that employs topic models to find latent topics. We substitute topics for the target variables and select terms which show high relevance for each topic. Applying the procedure to real data, we found that the proposed word selection procedure can give clear topic interpretation by removing high-frequency words prevalent in various topics. In addition, we observed that, by applying the selected variables to the classifiers such as naïve Bayes classifiers and support vector machines, the proposed feature selection procedure gives results comparable to those obtained by using class label information.
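
The selection idea in this abstract can be sketched as follows: treat the latent topics as surrogate targets and keep only terms highly relevant to some topic, which drops high-frequency words spread evenly across topics. Here "relevance" is the lift p(word | topic) / p(word); the topic-word matrix and vocabulary are hypothetical stand-ins for the output of a fitted topic model such as LDA.

```python
import numpy as np

def select_words(topic_word, vocab, top_k=2):
    """Keep the top_k most topic-relevant words from each topic."""
    marginal = topic_word.mean(axis=0)       # overall word probability
    relevance = topic_word / marginal        # lift of each word per topic
    keep = set()
    for row in relevance:
        keep.update(np.argsort(row)[-top_k:])
    return sorted(vocab[i] for i in keep)

vocab = ["data", "model", "price", "market", "gene", "cell"]
topic_word = np.array([
    [0.25, 0.25, 0.25,  0.20,  0.025, 0.025],   # finance topic
    [0.25, 0.25, 0.025, 0.025, 0.25,  0.20],    # biology topic
])
print(select_words(topic_word, vocab))
```

The generic words "data" and "model" have lift 1 in every topic, so they are never among a topic's top terms and are removed, mirroring the paper's observation about prevalent high-frequency words.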

Association Rule Mining Considering Strategic Importance (전략적 중요도를 고려한 연관규칙 탐사)

  • Choi, Doug-Won;Shin, Jin-Gyu
    • Annual Conference of KIPS
    • /
    • 2007.05a
    • /
    • pp.443-446
    • /
    • 2007
  • A new association rule mining algorithm that reflects the strategic importance of associative relationships between items was developed and presented in this paper. The algorithm exploits the basic framework of the Apriori procedure and the TSAA (transitive support association Apriori) procedure developed by Hyun and Choi for evaluating non-frequent itemsets, and it considers the strategic importance (weight) of feature variables in the association rule mining process. Example feature variables of strategic importance include profitability, marketing value, customer satisfaction, and frequency. A database of 730 transactions from a large-scale discount store was used to compare and verify the performance of the presented algorithm against the existing Apriori and TSAA algorithms. The results clearly indicated that the new algorithm produces substantially different association itemsets depending on the weights assigned to the strategic feature variables.
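
One simplified way to weight itemsets by strategic importance, in the spirit of this paper, is to scale ordinary support by the mean strategic weight of the items. The item names, weights, and threshold below are hypothetical; the actual algorithm builds on the Apriori/TSAA candidate-generation framework rather than this brute-force pair scan.

```python
from itertools import combinations

def weighted_frequent_pairs(transactions, weights, min_score=0.25):
    """Score each item pair by support x mean strategic weight."""
    items = sorted(weights)
    n = len(transactions)
    result = {}
    for pair in combinations(items, 2):
        support = sum(set(pair) <= t for t in transactions) / n
        score = support * (weights[pair[0]] + weights[pair[1]]) / 2
        if score >= min_score:
            result[pair] = round(score, 3)
    return result

transactions = [{"milk", "bread"}, {"milk", "bread", "wine"},
                {"wine", "cheese"}, {"milk", "wine"}]
# Hypothetical strategic importance of each item (e.g., profitability).
weights = {"milk": 0.4, "bread": 0.5, "wine": 1.0, "cheese": 0.9}
print(weighted_frequent_pairs(transactions, weights))
```

Note that (milk, bread) and (milk, wine) have identical support (0.5), yet only the pair containing the strategically important item survives the threshold, which is the qualitative effect the paper reports.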

Landslide susceptibility assessment using feature selection-based machine learning models

  • Liu, Lei-Lei;Yang, Can;Wang, Xiao-Mi
    • Geomechanics and Engineering
    • /
    • v.25 no.1
    • /
    • pp.1-16
    • /
    • 2021
  • Machine learning models have been widely used for landslide susceptibility assessment (LSA) in recent years. The large number of inputs or conditioning factors for these models, however, can reduce computational efficiency and increase the difficulty of collecting data. Feature selection is a good tool to address this problem by selecting the most important features among all factors to reduce the size of the input variables. However, two important questions need to be answered: (1) how do feature selection methods affect the performance of machine learning models? and (2) which feature selection method is the most suitable for a given machine learning model? This paper aims to address these two questions by comparing the predictive performance of 13 feature selection-based machine learning (FS-ML) models and 5 ordinary machine learning models on LSA. First, five commonly used machine learning models (i.e., logistic regression, support vector machine, artificial neural network, Gaussian process and random forest) and six typical feature selection methods from the literature are adopted to constitute the proposed models. Then, fifteen conditioning factors are chosen as input variables and 1,017 recorded landslides are used as data. Next, the feature selection methods are used to rank the importance of the conditioning factors and create feature subsets, based on which 13 FS-ML models are constructed. For each machine learning model, the best optimized FS-ML model is selected according to the area under the curve (AUC) value. Finally, five optimal FS-ML models are obtained and applied to the LSA of the studied area. The predictive abilities of the FS-ML models on LSA are verified and compared through the receiver operating characteristic (ROC) curve and statistical indicators such as sensitivity, specificity and accuracy. The results showed that different feature selection methods have different effects on the performance of LSA machine learning models. FS-ML models generally outperform the ordinary machine learning models. The best FS-ML model is the recursive feature elimination (RFE) optimized RF, and RFE is an optimal method for feature selection.
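
Recursive feature elimination, which this paper finds best, repeatedly ranks features by an importance score and drops the weakest until the desired number remain. In the sketch below the importance score is the absolute Pearson correlation with the target rather than a fitted model's own importances, and the conditioning-factor names and synthetic data are hypothetical.

```python
import numpy as np

def rfe(X, y, names, n_keep):
    """Drop the least important feature until n_keep remain."""
    names = list(names)
    X = X.copy()
    while X.shape[1] > n_keep:
        scores = [abs(np.corrcoef(X[:, j], y)[0, 1]) for j in range(X.shape[1])]
        worst = int(np.argmin(scores))
        X = np.delete(X, worst, axis=1)
        names.pop(worst)
    return names

rng = np.random.default_rng(0)
n = 200
slope = rng.normal(size=n)
rainfall = rng.normal(size=n)
noise = rng.normal(size=n)                       # irrelevant factor
y = 2.0 * slope + 1.0 * rainfall + 0.1 * rng.normal(size=n)

X = np.column_stack([slope, rainfall, noise])
print(rfe(X, y, ["slope", "rainfall", "noise"], n_keep=2))
```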

A machine learning informed prediction of severe accident progressions in nuclear power plants

  • JinHo Song;SungJoong Kim
    • Nuclear Engineering and Technology
    • /
    • v.56 no.6
    • /
    • pp.2266-2273
    • /
    • 2024
  • A machine learning platform is proposed for the diagnosis of severe accident progression in a nuclear power plant. To predict the key parameters for accident management, including lost signals, a long short-term memory (LSTM) network is proposed, where multiple accident scenarios are used for training. Training and test data were produced by MELCOR simulations of the Fukushima Daiichi Nuclear Power Plant (FDNPP) accident at unit 3. Feature variables were selected from among the plant parameters, with the importance ranking determined by a recursive feature elimination technique using RandomForestRegressor. To answer the question of whether a reduced-order ML model could predict the complex transient response, we performed a systematic sensitivity study over the choice of target variables, the combination of training and test data, the number of feature variables, and the number of neurons to evaluate the performance of the proposed ML platform. The number of sensitivity cases was chosen to guarantee a 95 % tolerance limit with a 95 % confidence level based on Wilks' formula to quantify the uncertainty of predictions. The results of the investigations indicate that the proposed ML platform consistently predicts the target variable; the median and mean predictions were close to the true value.
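
An LSTM such as the one described consumes sequences of plant parameters, so a standard preprocessing step (sketched here with hypothetical window sizes and synthetic data, not the paper's MELCOR outputs) is to slice each multivariate time series into overlapping windows of shape (timesteps, features), with the next step's values as the target.

```python
import numpy as np

def make_windows(series, timesteps):
    """series: (T, n_features) array -> (X, y) pairs for sequence models."""
    X = np.stack([series[i:i + timesteps]
                  for i in range(len(series) - timesteps)])
    y = series[timesteps:]          # next-step values as targets
    return X, y

T, n_features, timesteps = 100, 4, 10
series = np.arange(T * n_features, dtype=float).reshape(T, n_features)
X, y = make_windows(series, timesteps)
print(X.shape, y.shape)   # (90, 10, 4) (90, 4)
```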

Arrow Diagrams for Kernel Principal Component Analysis

  • Huh, Myung-Hoe
    • Communications for Statistical Applications and Methods
    • /
    • v.20 no.3
    • /
    • pp.175-184
    • /
    • 2013
  • Kernel principal component analysis (PCA) maps observations in a nonlinear feature space to a reduced-dimensional plane of principal components. We do not need to specify the feature space explicitly because the procedure uses the kernel trick. In this paper, we propose a graphical scheme to represent variables in kernel principal component analysis. In addition, we propose an index for individual variables to measure their importance in the principal component plane.
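
To make the kernel trick in this abstract concrete, here is a bare-bones kernel PCA: every computation uses only the kernel matrix K, never explicit coordinates in the feature space. The RBF bandwidth gamma and the toy data are hypothetical choices, not from the paper.

```python
import numpy as np

def kernel_pca(X, gamma=1.0, n_components=2):
    """Project observations onto kernel principal components."""
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    K = np.exp(-gamma * sq)                       # RBF kernel matrix
    n = len(X)
    one = np.full((n, n), 1.0 / n)
    Kc = K - one @ K - K @ one + one @ K @ one    # double centering
    vals, vecs = np.linalg.eigh(Kc)               # ascending eigenvalues
    vals, vecs = vals[::-1], vecs[:, ::-1]
    # Scores = eigenvectors scaled by the square root of the eigenvalues.
    return vecs[:, :n_components] * np.sqrt(np.maximum(vals[:n_components], 0))

X = np.array([[0.0, 0.0], [0.1, 0.0], [2.0, 2.0], [2.1, 2.0]])
Z = kernel_pca(X, gamma=0.5, n_components=2)
print(Z.shape)  # (4, 2)
```

The first component separates the two point clusters even though the principal directions live in the implicit RBF feature space.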

Short Note on Optimizing Feature Selection to Improve Medical Diagnosis

  • Guo, Cui;Ryoo, Hong Seo
    • Journal of the Korean Operations Research and Management Science Society
    • /
    • v.39 no.4
    • /
    • pp.71-74
    • /
    • 2014
  • A new classification framework called the 'support feature machine' was introduced in [2] for analyzing medical data. Contrary to the authors' claim, however, the proposed method is not designed to guarantee minimizing the use of the spatial feature variables. This paper mathematically remedies this drawback and provides comments on the models from [2].

Neural-network-based Fault Detection and Diagnosis Method Using EIV (Errors-in-Variables) (EIV를 이용한 신경회로망 기반 고장진단 방법)

  • Han, Hyung-Seob;Cho, Sang-Jin;Chong, Ui-Pil
    • Transactions of the Korean Society for Noise and Vibration Engineering
    • /
    • v.21 no.11
    • /
    • pp.1020-1028
    • /
    • 2011
  • As rotating machines play an important role in industrial applications such as the aeronautical, naval, and automotive industries, many researchers have developed condition monitoring and fault diagnosis systems by applying artificial neural networks. Since feeding obtained signals into a neural network without preprocessing can degrade fault classification performance, it is very important to extract significant features from the captured signals and to supply suitable features to the diagnosis system according to the kind of signal obtained. Therefore, this paper proposes a neural-network-based fault diagnosis system that uses AR coefficients obtained by LPC (linear predictive coding) and EIV (errors-in-variables) analysis as feature vectors. We extracted feature vectors from faulty sound, vibration, and current signals and evaluated their suitability based on the classification results and training error rates while varying the AR order and adding noise. From the experimental results, we conclude that classification using EIV-based feature vectors remains stably above 90 % for AR orders below 10 and under noise, compared to LPC.
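
The feature vectors in this paper are AR coefficients estimated from the measured signals. A minimal least-squares AR fit (one simple way to realize LPC; the paper additionally compares an errors-in-variables variant, not shown) can be sketched as below, using a synthetic AR(2) signal with known coefficients as hypothetical test data.

```python
import numpy as np

def ar_coefficients(signal, order):
    """Fit x[t] ~ a1*x[t-1] + ... + ap*x[t-p] by least squares."""
    X = np.column_stack([signal[order - k - 1: len(signal) - k - 1]
                         for k in range(order)])
    y = signal[order:]
    coeffs, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coeffs

# Synthetic AR(2) signal with known coefficients a1=0.6, a2=-0.3.
rng = np.random.default_rng(1)
a1, a2 = 0.6, -0.3
x = np.zeros(5000)
for t in range(2, len(x)):
    x[t] = a1 * x[t - 1] + a2 * x[t - 2] + rng.normal(scale=0.1)

print(ar_coefficients(x, order=2))  # estimates close to [0.6, -0.3]
```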

Analysis of Weights and Feature Patterns in Popular 2D Deep Neural Networks Models for MRI Image Classification

  • Khagi, Bijen;Kwon, Goo-Rak
    • Journal of Multimedia Information System
    • /
    • v.9 no.3
    • /
    • pp.177-182
    • /
    • 2022
  • A deep neural network (DNN) includes variables whose values keep changing during the training process until it reaches the final point of convergence. These variables are the coefficients of a polynomial expression that relates to the feature extraction process. In general, DNNs work in multiple 'dimensions' depending upon the number of channels and batches accounted for in training. However, after feature extraction and before entering the SoftMax or other classifier, the features are converted from multiple N dimensions to a single vector form, where 'N' represents the number of activation channels. This usually happens in a fully connected layer (FCL), or dense layer. This reduced feature vector is the subject of our analysis. For this, we have used the FCL, so the trained weights of this FCL are used for the weight-class correlation analysis. The popular DNN models selected for our study are ResNet-101, VGG-19, and GoogleNet. These models are either fine-tuned (with all trained weights transferred initially) or trained from scratch (with no weights transferred). The comparison is then done by plotting the feature distributions and the final FCL weights.
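
The flattening step described here, where N-channel activation maps become a single vector that the FCL weight matrix multiplies, can be sketched in a few lines. The shapes (512 channels of 7x7 maps, 10 classes) are hypothetical and chosen only for illustration.

```python
import numpy as np

# Hypothetical activation maps from the last convolutional stage.
features = np.random.default_rng(3).normal(size=(512, 7, 7))  # N=512 channels

vector = features.reshape(-1)                # flatten to a single 1-D vector
fcl_weights = np.zeros((10, vector.size))    # 10 classes x flattened size
logits = fcl_weights @ vector                # FCL output fed to SoftMax

print(vector.shape, logits.shape)  # (25088,) (10,)
```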

A New Variable Selection Method Based on Mutual Information Maximization by Replacing Collinear Variables for Nonlinear Quantitative Structure-Property Relationship Models

  • Ghasemi, Jahan B.;Zolfonoun, Ehsan
    • Bulletin of the Korean Chemical Society
    • /
    • v.33 no.5
    • /
    • pp.1527-1535
    • /
    • 2012
  • Selection of the most informative molecular descriptors from the original data set is a key step for development of quantitative structure activity/property relationship models. Recently, mutual information (MI) has gained increasing attention in feature selection problems. This paper presents an effective mutual information-based feature selection approach, named mutual information maximization by replacing collinear variables (MIMRCV), for nonlinear quantitative structure-property relationship models. The proposed variable selection method was applied to three different QSPR datasets, soil degradation half-life of 47 organophosphorus pesticides, GC-MS retention times of 85 volatile organic compounds, and water-to-micellar cetyltrimethylammonium bromide partition coefficients of 62 organic compounds.The obtained results revealed that using MIMRCV as feature selection method improves the predictive quality of the developed models compared to conventional MI based variable selection algorithms.