• Title/Abstract/Keywords: selection of classifiers

Search Result 130, Processing Time 0.03 seconds

The Method of Gene Selection for Machine Learning Classifiers in Cancer Classification (암 분류를 목적으로 하는 기계 학습 분류기를 위한 효과적인 유전자 선택 방법)

  • 박형근;이수정;이일병
    • Proceedings of the Korean Information Science Society Conference / 2004.10a / pp.205-207 / 2004
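  • In gene expression analysis systems, advances in microarray technology have contributed greatly to improving the accuracy and reliability of genetic disease diagnosis. The large volume of gene expression data obtained through various microarray technologies can also be used effectively for cancer classification, diagnosis, and prediction with machine learning classifiers. In this process, accurate classification of cancers by type is best served by selecting and using only the genes directly related to the cancer class in question. This paper presents a gene selection method that can effectively select such informative genes and applies it in systematic experiments on three benchmark cancer datasets. The results confirmed improved classification performance.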


Radiomics-based Biomarker Validation Study for Region Classification in 2D Prostate Cross-sectional Images (2D 전립선 단면 영상에서 영역 분류를 위한 라디오믹스 기반 바이오마커 검증 연구)

  • Jun Young, Park;Young Jae, Kim;Jisup, Kim;Kwang Gi, Kim
    • Journal of Biomedical Engineering Research / v.44 no.1 / pp.25-32 / 2023
  • Recognizing the size and location of prostate cancer is critical for diagnosis, treatment, and prognosis prediction. This paper proposes a model that classifies tumor regions and normal tissue in cross-sectional visual images of prostatectomy tissue. We used specimen images from 44 prostate cancer patients who underwent prostatectomy at Gachon University Gil Hospital. The 289 prostate slice images comprise 200 slices that include a tumor region and 89 that do not. Images were divided by the presence or absence of tumor, and 93 features were extracted from each slice image using Radiomics: 18 first order, 24 GLCM, 16 GLRLM, 16 GLSZM, 5 NGTDM, and 14 GLDM. To find the best-performing model, we compared the feature selection techniques LASSO, ANOVA, SFS, and Ridge together with the RF, LR, and SVM classifiers. We evaluated model performance by the AUC of the ROC curve. The results showed that the feature selection techniques LASSO and Ridge, combined with the RF classifier, performed best, with an AUC of 0.99±0.005.
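
The abstract above evaluates models by the AUC of the ROC curve. As a minimal sketch of what that number measures (not the authors' code), the AUC can be computed as the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one, with ties counted as one half. The function name `roc_auc` and the toy labels and scores are hypothetical.

```python
def roc_auc(labels, scores):
    """AUC as P(score of a random positive > score of a random negative); ties count 1/2."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy data (hypothetical): 1 = slice with a tumor region, 0 = slice without.
labels = [1, 1, 1, 0, 0]
scores = [0.9, 0.8, 0.4, 0.5, 0.1]
auc = roc_auc(labels, scores)  # 5 of the 6 positive/negative pairs are ordered correctly
```

The pairwise form is quadratic in the number of examples, which is fine for a few hundred slices; a rank-based formulation gives the same value faster on larger sets.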

Hierarchically penalized support vector machine for the classification of imbalanced data with grouped variables (그룹변수를 포함하는 불균형 자료의 분류분석을 위한 서포트 벡터 머신)

  • Kim, Eunkyung;Jhun, Myoungshic;Bang, Sungwan
    • The Korean Journal of Applied Statistics / v.29 no.5 / pp.961-975 / 2016
  • The hierarchically penalized support vector machine (H-SVM) has been developed to perform simultaneous classification and input variable selection when input variables are naturally grouped or generated by factors. However, the H-SVM may suffer from estimation inefficiency because it applies the same amount of shrinkage to each variable without assessing its relative importance. In addition, when analyzing imbalanced data with uneven class sizes, the classification accuracy of the H-SVM may drop significantly in predicting the minority class because its classifiers are undesirably biased toward the majority class. To remedy these problems, we propose the weighted adaptive H-SVM (WAH-SVM) method, which uses adaptive tuning parameters to improve the performance of variable selection and weights to differentiate the misclassification of data points between classes. Numerical results are presented to demonstrate the competitive performance of the proposed WAH-SVM over existing SVM methods.

Searching for Optimal Ensemble of Feature-classifier Pairs in Gene Expression Profile using Genetic Algorithm (유전알고리즘을 이용한 유전자발현 데이타상의 특징-분류기쌍 최적 앙상블 탐색)

  • 박찬호;조성배
    • Journal of KIISE:Software and Applications / v.31 no.4 / pp.525-536 / 2004
  • A gene expression profile is numerical data on the gene expression levels of an organism, measured on a microarray. In general, each specific tissue shows different expression levels in the related genes, so diseases can be classified from a gene expression profile. Because not all genes are related to a disease, it is necessary to select the related genes, which is called feature selection, and to classify the selected genes properly. This paper proposes a GA-based method for searching for the optimal ensemble of feature-classifier pairs, built from seven feature selection methods based on correlation, similarity, and information theory, and six representative classifiers. In experiments with leave-one-out cross validation on two cancer-related gene expression profiles, we find ensembles that are far superior to every individual feature-classifier pair on the Lymphoma and Colon datasets.

Feature selection for text data via sparse principal component analysis (희소주성분분석을 이용한 텍스트데이터의 단어선택)

  • Won Son
    • The Korean Journal of Applied Statistics / v.36 no.6 / pp.501-514 / 2023
  • When analyzing high dimensional data such as text data, if we input all the variables as explanatory variables, statistical learning procedures may suffer from over-fitting problems. Furthermore, computational efficiency can deteriorate with a large number of variables. Dimensionality reduction techniques such as feature selection or feature extraction are useful for dealing with these problems. The sparse principal component analysis (SPCA) is one of the regularized least squares methods which employs an elastic net-type objective function. The SPCA can be used to remove insignificant principal components and identify important variables from noisy observations. In this study, we propose a dimension reduction procedure for text data based on the SPCA. Applying the proposed procedure to real data, we find that the reduced feature set maintains sufficient information in text data while the size of the feature set is reduced by removing redundant variables. As a result, the proposed procedure can improve classification accuracy and computational efficiency, especially for some classifiers such as the k-nearest neighbors algorithm.

Feature Selection to Predict Very Short-term Heavy Rainfall Based on Differential Evolution (미분진화 기반의 초단기 호우예측을 위한 특징 선택)

  • Seo, Jae-Hyun;Lee, Yong Hee;Kim, Yong-Hyuk
    • Journal of the Korean Institute of Intelligent Systems / v.22 no.6 / pp.706-714 / 2012
  • The Korea Meteorological Administration provided the most recent four years of weather records for our very short-term heavy rainfall prediction. We divided the dataset into three parts: training, validation, and test sets. Through feature selection, we keep only the important features among the 72 available, because the solution space grows exponentially with the dimensionality. We used a differential evolution algorithm, with two classifiers serving as the fitness function of the evolutionary computation, to select a more accurate feature subset. One classifier is the Support Vector Machine (SVM), which shows high performance, and the other is k-Nearest Neighbor (k-NN), which is generally fast. In our experiments, the test results of SVM were better than those of k-NN. We also preprocessed the weather data using undersampling and normalization. The test results of our differential evolution algorithm were about five times better than those obtained using all features and about 1.36 times better than those of a genetic algorithm, the best-known approach. Running times with the genetic algorithm were about twenty times longer than with the differential evolution algorithm.
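
The abstract above uses differential evolution with a classifier as the fitness function to pick a feature subset. The sketch below shows one common way to set this up, and it rests on assumptions not stated in the abstract: the DE/rand/1/bin variant, real-valued genes in [0, 1] thresholded at 0.5 to form a feature mask, and a hypothetical `toy_fitness` standing in for the classifier's validation score. It is an illustration of the general idea, not the authors' implementation.

```python
import random


def differential_evolution_select(fitness, n_features, pop_size=20, generations=50,
                                  f=0.5, cr=0.9, seed=0):
    """DE/rand/1/bin over real vectors; a feature is selected when its gene exceeds 0.5."""
    rng = random.Random(seed)

    def to_mask(vec):
        return [v > 0.5 for v in vec]

    pop = [[rng.random() for _ in range(n_features)] for _ in range(pop_size)]
    fit = [fitness(to_mask(ind)) for ind in pop]

    for _ in range(generations):
        for i in range(pop_size):
            # Pick three distinct individuals other than i.
            a, b, c = rng.sample([j for j in range(pop_size) if j != i], 3)
            j_rand = rng.randrange(n_features)  # guarantees at least one mutated gene
            trial = []
            for j in range(n_features):
                if rng.random() < cr or j == j_rand:
                    v = pop[a][j] + f * (pop[b][j] - pop[c][j])
                    v = min(1.0, max(0.0, v))  # clip to [0, 1]
                else:
                    v = pop[i][j]
                trial.append(v)
            trial_fit = fitness(to_mask(trial))
            # Greedy replacement: keep the trial only if it is at least as good.
            if trial_fit >= fit[i]:
                pop[i], fit[i] = trial, trial_fit

    best = max(range(pop_size), key=lambda i: fit[i])
    return to_mask(pop[best]), fit[best]


# Hypothetical stand-in for a classifier's validation score:
# features 0-2 are useful, and every other selected feature costs a little.
def toy_fitness(mask):
    useful = sum(mask[:3])
    noise = sum(mask[3:])
    return useful - 0.1 * noise


mask, score = differential_evolution_select(toy_fitness, n_features=10)
```

In a real setting, `fitness` would train the SVM or k-NN on the masked training features and return its score on the validation set, which is why each evaluation is expensive and the population is kept small.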

Investigating the Performance of Bayesian-based Feature Selection and Classification Approach to Social Media Sentiment Analysis (소셜미디어 감성분석을 위한 베이지안 속성 선택과 분류에 대한 연구)

  • Chang Min Kang;Kyun Sun Eo;Kun Chang Lee
    • Information Systems Review / v.24 no.1 / pp.1-19 / 2022
  • Social media-based communication has become a crucial part of our personal and official lives. It is therefore no surprise that social media sentiment analysis has emerged as an important way for all kinds of companies to detect potential customers' sentiment trends. However, social media sentiment analysis suffers from the huge number of sentiment features obtained in the course of the analysis. This study therefore proposes a novel method using a Bayesian Network. In this model, MBFS (Markov Blanket-based Feature Selection) is used to reduce the number of sentiment features. To show the validity of our proposed model, we used online review data from Yelp, a well-known social media platform for evaluating and recommending restaurants, bars, and beauty salons. We used several benchmark feature selection methods, such as correlation-based feature selection, information gain, and gain ratio. Several machine learning classifiers were also used for our validation tasks, such as TAN, NBN, Sons & Spouses BN (Bayesian Network), and Augmented Markov Blanket. Furthermore, we conducted a Bayesian Network-based what-if analysis to see how the knowledge map between the target node and related explanatory nodes could yield a meaningful glimpse into the sentiments underlying the target dataset.
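
Two of the benchmark feature selection criteria named above, information gain and gain ratio, have standard textbook definitions that fit in a few lines. The sketch below assumes discrete feature values (for example, whether a word appears in a review); the function names and the toy data are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy (in bits) of a list of discrete values."""
    total = len(values)
    return -sum((c / total) * math.log2(c / total) for c in Counter(values).values())


def information_gain(feature_values, labels):
    """H(labels) minus the expected entropy of the labels after splitting on the feature."""
    total = len(labels)
    groups = {}
    for v, y in zip(feature_values, labels):
        groups.setdefault(v, []).append(y)
    conditional = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


def gain_ratio(feature_values, labels):
    """Information gain normalized by the entropy of the feature itself."""
    split_info = entropy(feature_values)
    if split_info <= 0:
        return 0.0  # a constant feature carries no information
    return information_gain(feature_values, labels) / split_info


# Toy reviews (hypothetical): does the word occur in the review?
sentiment = ["pos", "pos", "neg", "neg"]
has_great = [1, 1, 0, 0]   # perfectly separates the classes
has_the = [1, 1, 1, 1]     # occurs everywhere, so it is uninformative

ig_great = information_gain(has_great, sentiment)
ig_the = information_gain(has_the, sentiment)
gr_great = gain_ratio(has_great, sentiment)
```

Ranking features by either score and keeping the top few is a filter-style baseline; unlike a Markov blanket approach, it scores each feature independently of the others.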

Development of Polynomial Based Response Surface Approximations Using Classifier Systems (분류시스템을 이용한 다항식기반 반응표면 근사화 모델링)

  • 이종수
    • Korean Journal of Computational Design and Engineering / v.5 no.2 / pp.127-135 / 2000
  • Emergent computing paradigms such as genetic algorithms have found increased use in engineering design problems. These computational tools have been shown to be applicable to generically difficult design optimization problems characterized by nonconvexities in the design space and by the presence of discrete and integer design variables. Another aspect of these computational paradigms, which have been lumped under the broad subject category of soft computing, is the domain of artificial intelligence, knowledge-based expert systems, and machine learning. The paper explores a machine learning paradigm referred to as learning classifier systems to construct high-quality global function approximations between the design variables and a response function for subsequent use in design optimization. A classifier system is a machine learning system that learns syntactically simple string rules, called classifiers, for guiding the system's performance in an arbitrary environment. The capability of a learning classifier system facilitates the adaptive selection of the optimal number of training data according to the noise and multimodality in the design space of interest. The present study used polynomial-based response surfaces as global function approximation tools and showed their effectiveness in improving approximation performance.


Multi-classifier Fusion Based Facial Expression Recognition Approach

  • Jia, Xibin;Zhang, Yanhua;Powers, David;Ali, Humayra Binte
    • KSII Transactions on Internet and Information Systems (TIIS) / v.8 no.1 / pp.196-212 / 2014
  • Facial expression recognition is an important part of emotional interaction between humans and machines. This paper proposes a facial expression recognition approach based on multi-classifier fusion with a stacking algorithm. The kappa-error diagram is employed in selecting base-level classifiers; it gives insight into which individual classifiers recognize better and how diverse they are, which helps improve the recognition accuracy rate by fusing complementary functions. To avoid the influence of chance caused by guessing in algorithm evaluation and to obtain a more reliable picture of algorithm performance, kappa and informedness are used alongside accuracy as evaluation criteria in the comparison experiments. To verify the effectiveness of our approach, two public databases are used in the experiments. The experimental results show that, compared with individual classifiers and two other typical ensemble methods, our proposed stacked ensemble system recognizes facial expressions more accurately and with a smaller standard deviation. It overcomes the bias of individual classifiers and achieves more reliable recognition results.
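
The abstract above uses kappa and informedness alongside accuracy because accuracy alone can reward guessing. A minimal sketch of why, restricted to the binary case for brevity (the paper's task is multi-class, and the multi-class definitions are more involved): the function name and toy data are hypothetical, and the code assumes both classes occur in `y_true`.

```python
def accuracy_kappa_informedness(y_true, y_pred):
    """Binary labels (0/1). Returns (accuracy, Cohen's kappa, informedness)."""
    n = len(y_true)
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

    accuracy = (tp + tn) / n
    # Agreement expected by chance, from the marginal frequencies of each label.
    expected = ((tp + fn) * (tp + fp) + (tn + fp) * (tn + fn)) / (n * n)
    kappa = (accuracy - expected) / (1 - expected)
    # Informedness (Youden's J): sensitivity + specificity - 1.
    informedness = tp / (tp + fn) + tn / (tn + fp) - 1
    return accuracy, kappa, informedness


# A classifier that always predicts the majority class: high accuracy, no skill.
y_true = [1] * 9 + [0]
y_pred = [1] * 10
acc, kappa, informedness = accuracy_kappa_informedness(y_true, y_pred)
```

Here accuracy is 0.9 while kappa and informedness are both 0, which is the chance-correction effect the abstract refers to.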

Visual Tracking Using Improved Multiple Instance Learning with Co-training Framework for Moving Robot

  • Zhou, Zhiyu;Wang, Junjie;Wang, Yaming;Zhu, Zefei;Du, Jiayou;Liu, Xiangqi;Quan, Jiaxin
    • KSII Transactions on Internet and Information Systems (TIIS) / v.12 no.11 / pp.5496-5521 / 2018
  • Object detection and tracking is a basic capability that mobile robots need to achieve natural human-robot interaction. In this paper, an object tracking system for a mobile robot is designed and validated using an improved multiple instance learning algorithm. First, the improved multiple instance learning algorithm significantly reduces model drift. Second, to improve the capability of the classifiers, an active sample selection strategy is proposed that optimizes a bag Fisher information function instead of the bag likelihood function and dynamically chooses the most discriminative samples for classifier training. Furthermore, we integrate the co-training criterion into the algorithm to update the appearance model accurately and avoid error accumulation. Finally, we evaluate our system on challenging sequences and in an indoor laboratory environment. The experimental results demonstrate that the proposed methods can track moving objects stably and robustly.