• Title/Summary/Keyword: Accuracy of Selection

Performance Improvement of Feature Selection Methods based on Bio-Inspired Algorithms (생태계 모방 알고리즘 기반 특징 선택 방법의 성능 개선 방안)

  • Yun, Chul-Min;Yang, Ji-Hoon
    • The KIPS Transactions:PartB / v.15B no.4 / pp.331-340 / 2008
  • Feature selection is one of the methods used to improve classification accuracy in machine learning. Many feature selection algorithms have been proposed and discussed over the years, but finding the optimal feature subset of a full dataset remains a difficult problem. Bio-inspired algorithms are well-known evolutionary algorithms based on the behavior of organisms; they are useful for finding optimal solutions to optimization problems and have also been applied to feature selection. In this paper we propose improved bio-inspired algorithms for feature selection. We used two well-known bio-inspired algorithms, Genetic Algorithm (GA) and Particle Swarm Optimization (PSO), to find the feature subset that gives the best classification accuracy. In addition, we modified the algorithms to take into account the prior importance (prior relevance) of each feature. We chose the mRMR method, which can measure the goodness of a single feature, to set the prior importance of each feature, and used it to modify the evolution operators of GA and PSO. We verified the performance of the proposed methods experimentally on datasets. Feature selection using GA and PSO produced better classification accuracy, and the modified methods that use prior importance improved both the evolution speed and the classification accuracy.
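
The abstract above does not spell out how the evolution operators were modified, so the following is only a minimal sketch of one way a GA mutation step could use a per-feature prior. The function name, the `rate` parameter, and the assumption that priors are rescaled to [0, 1] are all hypothetical and are not taken from the paper.

```python
import random

def prior_weighted_mutation(mask, prior, rate=0.1, rng=None):
    """Flip bits of a feature mask, biased by each feature's prior importance.

    mask  : list of 0/1 flags, one per feature (1 = feature selected)
    prior : list of scores in [0, 1], e.g. rescaled mRMR relevance (assumption)
    rate  : base mutation rate (hypothetical parameter)
    """
    rng = rng or random.Random()
    child = []
    for bit, p in zip(mask, prior):
        # An unselected feature is switched on more often when its prior is high;
        # a selected feature is switched off more often when its prior is low.
        flip_prob = rate * (p if bit == 0 else 1.0 - p)
        child.append(1 - bit if rng.random() < flip_prob else bit)
    return child
```

With this operator a feature whose prior is 1.0 is never dropped once selected, and a feature whose prior is 0.0 is never added, so the search is steered toward features that already look relevant.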

Development of Short-Term Load Forecasting Method by Analysis of Load Characteristics during Chuseok Holiday (추석 연휴 전력수요 특성 분석을 통한 단기전력 수요예측 기법 개발)

  • Kwon, Oh-Sung;Song, Kyung-Bin
    • The Transactions of The Korean Institute of Electrical Engineers / v.60 no.12 / pp.2215-2220 / 2011
  • Accurate short-term load forecasting is essential for efficient power system operation and for deciding the system marginal price in the electricity market. So far, load forecasting errors for the Chuseok holiday have been very large compared with those for other special days. To improve the accuracy of load forecasting for the Chuseok holiday, we investigate the selection of input data, the daily normalized load patterns, and the load forecasting model. An efficient data selection scheme and a daily normalized load pattern based on a fuzzy linear regression model are proposed. The proposed method was tested on the five years from 2006 to 2010 and improved forecasting accuracy compared with previous research.

A Dual Filter-based Channel Selection for Classification of Motor Imagery EEG (동작 상상 EEG 분류를 위한 이중 filter-기반의 채널 선택)

  • Lee, David;Lee, Hee Jae;Park, Snag-Hoon;Lee, Sang-Goog
    • Journal of KIISE / v.44 no.9 / pp.887-892 / 2017
  • A brain-computer interface (BCI) is a technology that controls a computer and conveys the user's intention by measuring and analyzing multi-channel electroencephalogram (EEG) signals generated during mental tasks. Optimal EEG channel selection is necessary not only for the convenience and speed of a BCI but also for improving its accuracy. The optimal channels are obtained by removing duplicate (redundant) or noisy channels. This paper proposes a dual filter-based channel selection method. The method first removes duplicate channels using Spearman's rank correlation to eliminate redundancy between channels. Then the relevance between each channel and the class labels is computed using the F score, and only the top m channels are selected. Because it uses features from channels that are associated with the class labels and are not duplicated, the method can provide good classification accuracy. The proposed channel selection method greatly reduces the number of channels required while improving the average classification accuracy.
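
The two filtering stages described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' implementation: the "F score" is assumed here to be the two-class Fisher score, each channel is reduced to one feature value per trial, and the redundancy threshold `rho_max` is a hypothetical parameter.

```python
from statistics import mean, pvariance

def ranks(values):
    """Average ranks (1-based); tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    out = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            out[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return out

def pearson(x, y):
    mx, my = mean(x), mean(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = (sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y)) ** 0.5
    return num / den if den else 0.0

def spearman(x, y):
    """Spearman's rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

def fisher_score(x, labels):
    """Two-class Fisher score (assumed meaning of 'F score'); labels are 0/1."""
    pos = [v for v, c in zip(x, labels) if c == 1]
    neg = [v for v, c in zip(x, labels) if c == 0]
    m = mean(x)
    between = (mean(pos) - m) ** 2 + (mean(neg) - m) ** 2
    within = pvariance(pos) + pvariance(neg)
    return between / within if within else float("inf")

def dual_filter(channels, labels, m, rho_max=0.9):
    """channels maps a channel name to one feature value per trial."""
    kept = []
    for name in channels:
        # Stage 1: drop a channel that is redundant with one already kept.
        if all(abs(spearman(channels[name], channels[k])) < rho_max for k in kept):
            kept.append(name)
    # Stage 2: rank the survivors by relevance to the class labels, keep top m.
    kept.sort(key=lambda n: fisher_score(channels[n], labels), reverse=True)
    return kept[:m]
```

Stage 1 is greedy, so which of two redundant channels survives depends on iteration order; a real pipeline might instead keep the member of each correlated pair with the higher relevance score.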

A Clustering Approach for Feature Selection in Microarray Data Classification Using Random Forest

  • Aydadenta, Husna;Adiwijaya, Adiwijaya
    • Journal of Information Processing Systems / v.14 no.5 / pp.1167-1175 / 2018
  • Microarray data play an essential role in diagnosing and detecting cancer. Microarray analysis allows the examination of gene expression levels in specific cell samples, where thousands of genes can be analyzed simultaneously. However, microarray data have very few samples and high dimensionality, so a dimensionality reduction process is required before classification. Dimensionality reduction can eliminate redundancy in the data, so that only features highly correlated with their class are used in classification. There are two types of dimensionality reduction: feature selection and feature extraction. In this paper, we used the k-means algorithm as a clustering approach for feature selection. The approach groups features with the same characteristics into one cluster, so that redundancy in the microarray data is removed. The features in each cluster are ranked using the Relief algorithm, and the best-scoring feature of each cluster is obtained. The best features of all clusters are selected and used in the classification process, which uses the Random Forest algorithm. Based on the simulation, the proposed approach achieved accuracies of 85.87%, 98.9%, and 89% on the Colon, Lung Cancer, and Prostate Tumor datasets, respectively. The accuracy of the proposed approach is therefore higher than that of Random Forest without clustering.
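
The step that picks one representative feature per cluster can be illustrated with a short sketch. It assumes the cluster labels (for example from k-means) and the relevance scores (for example Relief weights) have already been computed; the function name and the toy inputs are hypothetical.

```python
def best_feature_per_cluster(cluster_ids, scores):
    """Return the index of the top-scoring feature in each cluster.

    cluster_ids : cluster label of each feature (e.g. from k-means)
    scores      : relevance score of each feature (e.g. Relief weights)
    """
    best = {}
    for idx, (cid, score) in enumerate(zip(cluster_ids, scores)):
        # Keep the first feature seen in a cluster, then replace it
        # whenever a later feature in the same cluster scores higher.
        if cid not in best or score > scores[best[cid]]:
            best[cid] = idx
    return sorted(best.values())
```

The returned indices would then be used to slice the expression matrix before training the classifier, so the number of clusters directly sets the number of features kept.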

Two-stage imputation method to handle missing data for categorical response variable

  • Jong-Min Kim;Kee-Jae Lee;Seung-Joo Lee
    • Communications for Statistical Applications and Methods / v.30 no.6 / pp.577-587 / 2023
  • Conventional categorical data imputation techniques, such as mode imputation, often suffer from overestimation. If the variable has too many categories, multinomial logistic regression imputation may be infeasible due to computational limitations. To address these limitations, we propose a two-stage imputation method. In the first stage, we apply the Boruta variable selection method to the complete dataset to identify significant variables for the target categorical variable. In the second stage, we use those important variables in logistic regression to impute missing data in binary variables, polytomous regression to impute missing data in categorical variables, and predictive mean matching to impute missing data in quantitative variables. Through analysis of both asymmetric and non-normal simulated and real data, we demonstrate that the two-stage imputation method outperforms imputation methods lacking variable selection, as evidenced by accuracy measures. In the analysis of real survey data, we also demonstrate that the proposed two-stage method surpasses the current imputation approach in terms of accuracy.

SELECTION OF DESIGN PARAMETERS IN OPTICAL SYSTEM OF STAR TRACKER FOR A SATELLITE (위성용 STAR TRACKER 광학계의 설계요소 선정)

  • Nah, Ja-Kyung;Kim, Yong-Ha;Yi, Yu
    • Journal of Astronomy and Space Sciences / v.16 no.2 / pp.273-284 / 1999
  • In order to develop star trackers for a satellite in our country, we studied the selection procedure for optical parameters. To select the optical parameters logically, we simulated the entire process in which starlight imaged on a CCD sensor is read into and processed by the associated electronics. The simulation yielded relations between a star's magnitude and the achievable pointing accuracy, from which we derived optimal optical parameters that satisfy the required pointing accuracy of a star tracker. The selected optical parameters were used in the optical system design of a star tracker with a pointing accuracy of 10 arcsec.

Selection of Input Nodes in Artificial Neural Network for Bankruptcy Prediction by Link Weight Analysis Approach (연결강도분석접근법에 의한 부도예측용 인공신경망 모형의 입력노드 선정에 관한 연구)

  • 이응규;손동우
    • Journal of Intelligence and Information Systems / v.7 no.2 / pp.19-33 / 2001
  • The link weight analysis approach is suggested as a heuristic for selecting input nodes in an artificial neural network for bankruptcy prediction. The approach analyzes each input node's link weight, that is, the absolute value of the link weight between an input node and a hidden node in a well-trained neural network model. The prediction accuracy of three methods in this approach (the weak-linked-neurons elimination method, the strong-linked-neurons selection method, and the integrated link weight model) is compared with that of a decision tree and multivariate discriminant analysis. As a result, the methods suggested in this study show higher accuracy than the decision tree and multivariate discriminant analysis. In particular, the integrated model has much higher accuracy than any individual model.
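
As a rough illustration of the idea of scoring input nodes by link weight, the sketch below sums the absolute input-to-hidden weights of each input node and drops weakly linked inputs. The aggregation by summation, the threshold, and the function names are assumptions made for illustration; the abstract does not specify how the three methods combine the weights.

```python
def link_strength(weights):
    """Summed absolute weight per input node.

    weights[i][j] is the trained weight from input node i to hidden node j.
    """
    return [sum(abs(w) for w in row) for row in weights]

def eliminate_weak_inputs(weights, names, threshold):
    """Keep only input nodes whose link strength reaches the threshold."""
    strengths = link_strength(weights)
    return [n for n, s in zip(names, strengths) if s >= threshold]
```

A strong-linked selection variant would be the mirror image: sort the inputs by strength and keep the top few, rather than cutting at a fixed threshold.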

Improvement of Classification Accuracy on Success and Failure Factors in Software Reuse using Feature Selection (특징 선택을 이용한 소프트웨어 재사용의 성공 및 실패 요인 분류 정확도 향상)

  • Kim, Young-Ok;Kwon, Ki-Tae
    • KIPS Transactions on Software and Data Engineering / v.2 no.4 / pp.219-226 / 2013
  • Feature selection is one of the important issues in machine learning and pattern recognition. It is a technique for finding the subset of the source data that gives the best classification performance; that is, it extracts the subset most closely related to the purpose of the classification. In this paper, we ran experiments to select the best feature subset for improving classification accuracy when classifying success and failure factors in software reuse, and compared the results with existing studies. We found that the feature subset selected in this study showed better classification accuracy.

A study of the genomic estimated breeding value and accuracy using genotypes in Hanwoo steer (Korean cattle)

  • Eun Ho, Kim;Du Won, Sun;Ho Chan, Kang;Ji Yeong, Kim;Cheol Hyun, Myung;Doo Ho, Lee;Seung Hwan, Lee;Hyun Tae, Lim
    • Korean Journal of Agricultural Science / v.48 no.4 / pp.681-691 / 2021
  • The estimated breeding value (EBV) and its accuracy for Hanwoo steers (Korean cattle) are indicators that can predict future slaughter time and carcass performance. Recently, studies using pedigrees and genotypes have been actively conducted to improve the accuracy of the EBV. In this study, the pedigrees and genotypes of 46 steers obtained from livestock farm A in Gyeongnam were used for pedigree best linear unbiased prediction (PBLUP) and genomic best linear unbiased prediction (GBLUP) to estimate and analyze the breeding value and accuracy for carcass weight (CWT), eye muscle area (EMA), back-fat thickness (BFT), and marbling score (MS). PBLUP estimated the EBV and accuracy by constructing a numerator relationship matrix (NRM) from the 46 steers and reference population I (545,483 head) with the pedigree and phenotype. GBLUP estimated the genomic EBV (GEBV) and accuracy by constructing a genomic relationship matrix (GRM) from the 46 steers and reference population II (16,972 head) with the genotype and phenotype. As a result, in the order of CWT, EMA, BFT, and MS, the accuracies of PBLUP were 0.531, 0.519, 0.524, and 0.530, while the accuracies of GBLUP were 0.799, 0.779, 0.768, and 0.810. The accuracy estimated by GBLUP was 50.1-53.1% higher than that estimated by PBLUP. The GEBV estimated with the genotype is expected to show higher accuracy than the EBV calculated using only the pedigree and is thus expected to be used as basic data for genomic selection in the future.

On the Bias of Bootstrap Model Selection Criteria

  • Kee-Won Lee;Songyong Sim
    • Journal of the Korean Statistical Society / v.25 no.2 / pp.195-203 / 1996
  • A bootstrap method is used to correct the apparent downward bias of a naive plug-in bootstrap model selection criterion, which is shown to enjoy a high degree of accuracy. The bootstrap method is compared with the asymptotic method through an illustrative example.
