• Title/Abstract/Keywords: Selection Methods

On loss functions for model selection in wavelet based Bayesian method

  • Park, Chun-Gun
    • Journal of the Korean Data and Information Science Society / Vol. 20, No. 6 / pp. 1191-1197 / 2009
  • Most Bayesian approaches to model selection in wavelet analysis have the drawback that obtaining an accurate fit of the unknown function is computationally expensive. To overcome this drawback, this article introduces loss functions that serve as criteria for level-dependent threshold selection in wavelet-based Bayesian methods with arbitrary sample size and regular design points. We demonstrate the utility of these criteria using four test functions and real data.

Discretization Method Based on Quantiles for Variable Selection Using Mutual Information

  • Cha, Woon-Ock;Huh, Moon-Yul
    • Communications for Statistical Applications and Methods / Vol. 12, No. 3 / pp. 659-672 / 2005
  • This paper evaluates the discretization of continuous variables for selecting relevant variables in supervised learning using mutual information. Three discretization methods are considered: MDL, Histogram, and 4-Intervals. The combined process of discretization and variable subset selection is evaluated by classification accuracy on six real data sets from the UCI repository. Results show that the quantile-based 4-Interval discretization method is robust and efficient for variable selection. We also visually evaluate the appropriateness of the selected variable subsets.
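
The following is a minimal Python sketch of the general idea behind quantile-based 4-Interval discretization, not the authors' implementation: each continuous variable is cut into four intervals at its empirical quantiles, and variables are then ranked by their mutual information with the class label. The nearest-rank quantile rule, the function names (`quantile_cut_points`, `discretize`, `mutual_information`, `rank_variables`), and ranking single variables instead of searching over subsets are all illustrative assumptions; the paper itself evaluated the selected subsets by classification accuracy.

```python
import bisect
import math
from collections import Counter


def quantile_cut_points(values, n_bins=4):
    """Return n_bins - 1 cut points at empirical quantiles (nearest-rank style).

    Assumption: this simple quantile rule stands in for whatever rule the
    paper used; other quantile definitions give slightly different cuts.
    """
    s = sorted(values)
    n = len(s)
    return [s[min(n - 1, (k * n) // n_bins)] for k in range(1, n_bins)]


def discretize(values, cuts):
    """Map each value to an interval index 0..len(cuts).

    bisect_right counts how many cut points are <= v, so a value equal to a
    cut point falls into the upper interval.
    """
    return [bisect.bisect_right(cuts, v) for v in values]


def mutual_information(x, y):
    """Mutual information I(X; Y) in bits between two discrete sequences."""
    n = len(x)
    joint = Counter(zip(x, y))
    px, py = Counter(x), Counter(y)
    # p(a, b) / (p(a) p(b)) simplifies to c * n / (count(a) * count(b))
    return sum(
        (c / n) * math.log2(c * n / (px[a] * py[b]))
        for (a, b), c in joint.items()
    )


def rank_variables(columns, labels, n_bins=4):
    """Discretize each continuous column into quantile intervals, then rank
    columns by mutual information with the class labels (highest first)."""
    scores = {
        name: mutual_information(
            discretize(vals, quantile_cut_points(vals, n_bins)), labels
        )
        for name, vals in columns.items()
    }
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
```

For example, with labels `[0, 0, 0, 0, 1, 1, 1, 1]`, a column `[1, 2, 3, 4, 5, 6, 7, 8]` that separates the classes scores 1 bit, while a shuffled column `[5, 1, 7, 3, 8, 2, 6, 4]` whose intervals each contain one sample of each class scores 0, so the first is ranked ahead of the second.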

Automation of Model Selection through Neural Networks Learning

  • 류재흥
    • Korean Institute of Intelligent Systems: Conference Proceedings / Proceedings of the Korea Fuzzy Logic and Intelligent Systems Society 2004 Fall Conference, Vol. 14, No. 2 / pp. 313-316 / 2004
  • Model selection is the process of setting the regularization parameter of a support vector machine or regularization network using external methods such as generalized cross-validation or the L-curve criterion. This paper shows that the regularization parameter can be obtained simultaneously within the learning process of a neural network, without resorting to a separate selection method. An extended kernel method is introduced, and the relationship between the regularization parameter and the bias term in the extended kernel is established. Experimental results show the effectiveness of the new model selection method.

A Study on the Selection and Evaluation of Information Resources (2): Expert System

  • 최원태
    • Journal of Information Management / Vol. 25, No. 3 / pp. 1-27 / 1994
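  • This study reviews the theory of selecting and evaluating information resources, focusing on cases in which it has been applied in libraries. Previous studies on the selection and evaluation of information resources are classified into three types: statistical evaluation methods, cost-effectiveness methods, and expert system methods. The study also examines the theoretical background of expert systems developed for selecting and evaluating information resources, along with the current state of development and future prospects of such application systems.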

A Study on the Selection and Evaluation of Information Resources (1): Monographs and Serials

  • 최원태
    • Journal of Information Management / Vol. 25, No. 2 / pp. 1-30 / 1994
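  • This study reviews various theories of selecting and evaluating information resources, focusing on cases in which they have been applied in libraries. Previous studies on the selection and evaluation of information resources are classified into three types: statistical evaluation methods, cost-effectiveness methods, and expert system methods. The study also examines the theoretical background of expert systems developed for selecting and evaluating information resources, along with the current state of development and future prospects of such application systems.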

Sequencing to Minimize the Total Utility Work in Car Assembly Lines

  • 현철주
    • Journal of the Korea Safety Management & Science / Vol. 5, No. 1 / pp. 69-82 / 2003
  • A sequence that minimizes the total utility work in car assembly lines reduces the cycle time, the number of utility workers, and the risk of conveyor stoppage. This study proposes a mathematical formulation of the sequencing problem that minimizes total utility work and presents a genetic algorithm that can provide a near-optimal solution in real time. To apply the genetic algorithm to sequencing in car assembly lines, the representation, selection methods, and genetic parameters are studied. Experiments compare roulette wheel selection, tournament selection, and ranking selection. The results show that ranking selection outperforms the others in solution quality, whereas tournament selection gives the best computation time.
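
The three selection operators compared in this study can be sketched in Python as follows. This is a hypothetical illustration, not the paper's code: because the sequencing objective (total utility work) is minimized, roulette wheel selection here uses inverse cost as fitness and ranking selection uses linear rank weights. Both transformations, the default tournament size, and all function names are assumptions.

```python
import random


def roulette_wheel(population, costs, rng):
    """Roulette wheel selection for a minimization problem.

    Assumption: costs are positive, and fitness is taken as 1 / cost so that
    lower total utility work gets a larger slice of the wheel.
    """
    weights = [1.0 / c for c in costs]
    return rng.choices(population, weights=weights, k=1)[0]


def tournament(population, costs, rng, size=2):
    """Tournament selection: draw `size` distinct individuals uniformly at
    random and return the one with the lowest cost (size=2 is an assumption)."""
    contenders = rng.sample(range(len(population)), size)
    return population[min(contenders, key=lambda i: costs[i])]


def ranking(population, costs, rng):
    """Linear ranking selection: the worst individual gets weight 1 and the
    best gets weight N; sampling is proportional to rank, not to raw cost."""
    worst_first = sorted(range(len(population)), key=lambda i: costs[i], reverse=True)
    weights = [0] * len(population)
    for rank, i in enumerate(worst_first, start=1):
        weights[i] = rank
    return rng.choices(population, weights=weights, k=1)[0]


# Toy usage: four hypothetical sequences and their total utility work.
rng = random.Random(42)
population = ["seq_A", "seq_B", "seq_C", "seq_D"]
costs = [10.0, 4.0, 7.0, 1.0]
parents = [ranking(population, costs, rng) for _ in range(6)]
```

One design difference worth noting: roulette and ranking selection each draw one sample from a weighted distribution, while tournament selection only compares a few sampled costs, which is one plausible reason it can be cheaper per selection.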

Band Selection Using Forward Feature Selection Algorithm for Citrus Huanglongbing Disease Detection

  • Katti, Anurag R.;Lee, W.S.;Ehsani, R.;Yang, C.
    • Journal of Biosystems Engineering / Vol. 40, No. 4 / pp. 417-427 / 2015
  • Purpose: This study investigated different band selection methods for classifying spectrally similar data, obtained from aerial images of healthy citrus canopies and canopies infected with citrus greening disease (Huanglongbing, or HLB), using small spectral differences without unmixing endmember components and therefore without the need for an endmember library. However, the large number of hyperspectral bands introduces high redundancy, which had to be reduced through band selection. The objective, therefore, was first to select the best set of bands and then to use these bands to detect HLB-infected citrus canopies in aerial hyperspectral images. Methods: The forward feature selection algorithm (FFSA) was chosen for band selection. The selected bands were used to identify HLB-infected pixels with various classifiers, including K nearest neighbor (KNN), support vector machine (SVM), naïve Bayesian classifier (NBC), and generalized local discriminant bases (LDB). All bands were also used for comparison. Results: A few well-chosen bands yielded much better results than using all bands, bringing classification performance on par with standard hyperspectral classification techniques such as spectral angle mapper (SAM) and mixture tuned matched filtering (MTMF). Median detection accuracies ranged from 66% to 80%, showing strong potential for rapid detection of the disease. Conclusions: Among the methods investigated, a support vector machine classifier combined with the forward feature selection algorithm yielded the best results.
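
Forward feature selection is a greedy procedure: start with no bands, repeatedly add the band that most improves an evaluation score, and stop when no addition helps. The sketch below is a generic Python illustration under stated assumptions, not the authors' FFSA code. The study scored bands with classifiers such as SVM; to keep the example dependency-free, a leave-one-out nearest-centroid accuracy (the hypothetical helper `loo_nearest_centroid_accuracy`) stands in as the score, and the stop-when-no-improvement rule is also an assumption.

```python
def forward_selection(n_features, score, max_features=None):
    """Greedy sequential forward selection.

    score(subset) -> float, higher is better. Starting from the empty set,
    add the single feature that gives the highest score; stop when no
    candidate improves on the current best or max_features is reached
    (this stopping rule is an assumption).
    """
    selected, best = [], float("-inf")
    remaining = list(range(n_features))
    limit = n_features if max_features is None else max_features
    while remaining and len(selected) < limit:
        # Evaluate each remaining feature added to the current subset;
        # max() keeps the first (lowest-index) feature on ties.
        cand, cand_score = max(
            ((f, score(selected + [f])) for f in remaining), key=lambda t: t[1]
        )
        if cand_score <= best:
            break
        selected.append(cand)
        remaining.remove(cand)
        best = cand_score
    return selected, best


def loo_nearest_centroid_accuracy(X, y):
    """Hypothetical stand-in scorer: leave-one-out accuracy of a
    nearest-centroid classifier restricted to the given feature subset.
    Assumes every class has at least two samples."""
    def scorer(subset):
        correct = 0
        for i in range(len(X)):
            sums, counts = {}, {}
            for j, (row, label) in enumerate(zip(X, y)):
                if j == i:
                    continue  # hold out sample i
                acc = sums.setdefault(label, [0.0] * len(subset))
                for k, f in enumerate(subset):
                    acc[k] += row[f]
                counts[label] = counts.get(label, 0) + 1

            def sq_dist(label):
                # Squared distance from held-out sample i to the class centroid
                return sum(
                    (X[i][f] - sums[label][k] / counts[label]) ** 2
                    for k, f in enumerate(subset)
                )

            correct += min(sums, key=sq_dist) == y[i]
        return correct / len(X)
    return scorer
```

Because `forward_selection` only needs a function from a list of band indices to a number, a cross-validated SVM accuracy could be passed as `score` in place of the stand-in scorer without changing the search itself.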

A Novel Feature Selection Method in the Categorization of Imbalanced Textual Data

  • Pouramini, Jafar;Minaei-Bidgoli, Behrouze;Esmaeili, Mahdi
    • KSII Transactions on Internet and Information Systems (TIIS) / Vol. 12, No. 8 / pp. 3725-3748 / 2018
  • Text data distributions are often imbalanced. Imbalanced data is one of the challenges in text classification because it degrades classifier performance. Many studies have addressed this problem, and the proposed solutions fall into several general categories, including sampling-based and algorithm-based methods. Recent studies have also considered feature selection as a solution to the imbalance problem. In this paper, a novel one-sided feature selection method called probabilistic feature selection (PFS) is presented for imbalanced text classification. PFS is a probabilistic method computed from the feature distribution. Compared with similar methods, PFS has more parameters. To evaluate the proposed method, the feature selection methods Gini, MI, FAST, and DFS were implemented for comparison, and classifiers such as the C4.5 decision tree and Naive Bayes were used. Results on Reuters-21578 and WebKB, measured by F-measure, suggest that the proposed feature selection method significantly improves classifier performance.

Assessment of genomic prediction accuracy using different selection and evaluation approaches in a simulated Korean beef cattle population

  • Nwogwugwu, Chiemela Peter;Kim, Yeongkuk;Choi, Hyunji;Lee, Jun Heon;Lee, Seung-Hwan
    • Asian-Australasian Journal of Animal Sciences / Vol. 33, No. 12 / pp. 1912-1921 / 2020
  • Objective: This study assessed genomic prediction accuracies under different selection methods, evaluation procedures, training population (TP) sizes, heritability (h²) levels, marker densities, and pedigree error (PE) rates in a simulated Korean beef cattle population. Methods: A simulation was performed using two selection methods, phenotypic and estimated breeding value (EBV), with an h² of 0.1, 0.3, or 0.5 and marker densities of 10K, 50K, or 777K. A total of 275 males and 2,475 females were randomly selected from the last generation to simulate ten recent generations. The PE dataset was simulated using only EBV selection, with a marker density of 50K and a heritability of 0.3, and pedigree error proportions of 10%, 20%, 30%, and 40% were introduced. Genetic evaluations were performed using genomic best linear unbiased prediction (GBLUP) and single-step GBLUP (ssGBLUP) with different weighting values, and the prediction accuracies were determined. Results: Compared with phenotypic selection, the prediction accuracies obtained using GBLUP and ssGBLUP under EBV selection increased across heritability levels and TP sizes. However, increasing the marker density did not yield higher accuracy for either method, except when h² was 0.3 under EBV selection. Under EBV selection with a heritability of 0.1 and a marker density of 10K, the prediction accuracies of GBLUP and ssGBLUP_0.95 were higher than those obtained under phenotypic selection. ssGBLUP_0.95 outperformed GBLUP in prediction accuracy across all scenarios. Introducing errors into the pedigree dataset had only a minimal effect on prediction accuracy across all scenarios. Conclusion: Our study suggests that using ssGBLUP_0.95, EBV selection, and a low marker density could help improve genetic gains in beef cattle.

A review of gene selection methods based on machine learning approaches

  • 이하정;김재직
    • The Korean Journal of Applied Statistics / Vol. 35, No. 5 / pp. 667-684 / 2022
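  • Gene expression data represent the amount of mRNA for each gene, and analyses of gene expression levels have provided important insights for understanding the mechanisms of disease onset and for developing new drugs and treatments. Today, high-throughput technologies such as DNA microarrays and RNA sequencing make it possible to measure the expression levels of thousands of genes simultaneously, which gives gene expression data their characteristic high dimensionality. Because of this high dimensionality, learning models for analyzing gene expression data are prone to overfitting, so dimension reduction or variable selection techniques are commonly applied as a preprocessing step. In particular, gene selection methods applied at the preprocessing stage can remove irrelevant or redundant genes and identify important ones. A variety of gene selection methods have been developed in the context of machine learning. This paper focuses on recent gene selection methods that use machine learning approaches. It also discusses fundamental problems of the gene selection methods developed to date and directions for future research.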