Search | Korea Science

Evaluating Variable Selection Techniques for Multivariate Linear Regression (다중선형회귀모형에서의 변수선택기법 평가)

Ryu, Nahyeon;Kim, Hyungseok;Kang, Pilsung
- Journal of Korean Institute of Industrial Engineers / v.42 no.5 / pp.314-326 / 2016
The purpose of variable selection techniques is to select a subset of relevant variables for a particular learning algorithm in order to improve the accuracy and efficiency of the prediction model. We conduct an empirical analysis to evaluate and compare seven well-known variable selection techniques for the multiple linear regression model, one of the most commonly used regression models in practice. The techniques we apply are forward selection, backward elimination, stepwise selection, genetic algorithm (GA), ridge regression, lasso (Least Absolute Shrinkage and Selection Operator), and elastic net. In experiments with 49 regression data sets, GA yielded the lowest error rates, while lasso reduced the number of variables the most. In terms of computational efficiency, forward selection, backward elimination, and lasso require less time than the other techniques.
https://doi.org/10.7232/JKIIE.2016.42.5.314
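
Of the seven techniques named in the abstract above, forward selection is the simplest to sketch. The following is a minimal illustration, not the authors' implementation: the stopping rule (`min_gain`, a fraction of the intercept-only error) and the toy data are assumptions, and the helper names `solve`, `sse`, and `forward_selection` are hypothetical. Published procedures usually stop on a partial F test or an information criterion instead.

```python
def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[p] = m[p], m[c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            for k in range(c, n + 1):
                m[r][k] -= f * m[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][k] * x[k] for k in range(r + 1, n))) / m[r][r]
    return x


def sse(X, y, cols):
    """Residual sum of squares of an OLS fit on an intercept plus `cols`."""
    Z = [[1.0] + [row[c] for c in cols] for row in X]
    p = len(cols) + 1
    ztz = [[sum(z[i] * z[j] for z in Z) for j in range(p)] for i in range(p)]
    zty = [sum(z[i] * t for z, t in zip(Z, y)) for i in range(p)]
    beta = solve(ztz, zty)
    return sum((t - sum(b * v for b, v in zip(beta, z))) ** 2 for z, t in zip(Z, y))


def forward_selection(X, y, min_gain=0.01):
    """Greedily add the column that most reduces the SSE.

    Assumed stopping rule: stop when the best candidate reduces the SSE by
    less than `min_gain` times the intercept-only SSE.
    """
    selected, remaining = [], list(range(len(X[0])))
    total = best = sse(X, y, [])
    while remaining:
        scores = {c: sse(X, y, selected + [c]) for c in remaining}
        c = min(scores, key=scores.get)
        if best - scores[c] < min_gain * total:
            break
        selected.append(c)
        remaining.remove(c)
        best = scores[c]
    return selected


# Toy data (assumed): y depends on columns 0 and 2; column 1 is irrelevant.
X = [[float(i), float((2 * i) % 5), float((i * i) % 7)] for i in range(12)]
y = [3.0 * row[0] - 2.0 * row[2] + 1.0 for row in X]
print(forward_selection(X, y))
```

Backward elimination follows the same pattern in reverse: start from all columns and repeatedly drop the one whose removal increases the SSE the least.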

An Integrated Neural Network Model for Bankruptcy Prediction Using Link Weight Analysis (연결강도분석을 이용한 통합된 부도예측용 신경망모형)

Lee Woongkyu;Lim Young Ha
- Proceedings of the Korea Association of Information Systems Conference / 2002.11a / pp.289-312 / 2002
This study proposes a link weight analysis approach for choosing input variables and an integrated model for more accurate bankruptcy prediction. The link weight analysis approach chooses input variables by analyzing each input node's link weight, defined as the absolute value of the link weights between an input node and the hidden layer. The approach comprises two methods: weak-linked neurons elimination and strong-linked neurons selection. The integrated model adapts the bagging method and uses the average of four models: the optimal weak-linked neurons elimination method, the optimal strong-linked neurons selection method, a decision tree model, and MDA. The methods proposed in this study (the optimal strong-linked neurons selection method, the optimal weak-linked neurons elimination method, and the integrated model) show much higher accuracy than MDA and the decision tree model. In particular, the integrated model shows slightly higher accuracy than the two link weight methods.
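
The link weight idea described in the abstract above can be sketched in a few lines. This is an illustration under assumptions, not the authors' procedure: the abstract does not say how the absolute weights are aggregated per input node, so summing them is an assumption, as are the fixed cut-off `k`, the example weight matrix, and the function names.

```python
def link_weight_scores(weights):
    """Score each input node by the sum of its absolute input-to-hidden weights.

    `weights[i][j]` is the weight from input node i to hidden node j.
    Summation is an assumed aggregation rule.
    """
    return [sum(abs(w) for w in row) for row in weights]


def select_strong(weights, k):
    """Strong-linked selection: keep the k inputs with the largest scores."""
    scores = link_weight_scores(weights)
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:k])


def eliminate_weak(weights, k):
    """Weak-linked elimination: drop the k inputs with the smallest scores."""
    scores = link_weight_scores(weights)
    dropped = set(sorted(range(len(scores)), key=lambda i: scores[i])[:k])
    return [i for i in range(len(scores)) if i not in dropped]


# Hypothetical weights of a trained network with 4 inputs and 3 hidden nodes.
W = [
    [0.9, -1.2, 0.4],    # input 0
    [0.05, 0.1, -0.02],  # input 1
    [-0.7, 0.3, 0.8],    # input 2
    [0.2, -0.1, 0.1],    # input 3
]
print(select_strong(W, 2))   # indices of the two strongest inputs
print(eliminate_weak(W, 1))  # indices left after dropping the weakest input
```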

Geometrical description based on forward selection & backward elimination methods for regression models (다중회귀모형에서 전진선택과 후진제거의 기하학적 표현)

Hong, Chong-Sun;Kim, Moung-Jin
- Journal of the Korean Data and Information Science Society / v.21 no.5 / pp.901-908 / 2010
A geometrical description method is proposed to represent the forward selection and backward elimination processes, two of the many variable selection methods for multiple regression models. This graphical method shows the forward selection and backward elimination processes in the first and second quadrants, respectively, of a half circle with unit radius. At each step, the SSR is represented by the norm of a vector, and the extra SSR or partial coefficient of determination is represented by the angle between two vectors. Lines are dotted when the partial F test result is statistically significant, so that the statistical analysis can be explored. From this geometrical description, the final regression models based on forward selection and backward elimination can be obtained, and the goodness of fit of the model can be explored.

A Design of an Optimized Classifier based on Feature Elimination for Gene Selection (유전자 선택을 위해 속성 삭제에 기반을 둔 최적화된 분류기 설계)

Lee, Byung-Kwan;Park, Seok-Gyu;Tifani, Yusrina
- The Journal of Korea Institute of Information, Electronics, and Communication Technology / v.8 no.5 / pp.384-393 / 2015
This paper proposes an optimized classifier based on feature elimination (OCFE) for gene selection that combines two feature elimination methods, ReliefF and SVM-RFE. ReliefF is a filter feature selection method that ranks features by their importance. SVM-RFE is a wrapper feature selection method that ranks features by their weights. Combining the two methods yields a lower average error rate: 0.3016138 for OCFE versus 0.3096779 for SVM-RFE. The proposed method also achieves better accuracy: 70% for OCFE versus 69% for SVM-RFE.
https://doi.org/10.17661/jkiiect.2015.8.5.384

Gene Selection Based on Support Vector Machine using Bootstrap (붓스트랩 방법을 활용한 SVM 기반 유전자 선택 기법)

Song, Seuck-Heun;Kim, Kyoung-Hee;Park, Chang-Yi;Koo, Ja-Yong
- The Korean Journal of Applied Statistics / v.20 no.3 / pp.531-540 / 2007
The recursive feature elimination for support vector machine is known to be useful in selecting relevant genes. Since the criterion for choosing relevant genes is the absolute value of a coefficient, the recursive feature elimination may suffer from a scaling problem. We propose a modified version of the recursive feature elimination algorithm using bootstrap. In our method, the criterion for determining relevant genes is the absolute value of a coefficient divided by its standard error, which accounts for statistical variability of the coefficient. Through numerical examples, we illustrate that our method is effective in gene selection.
https://doi.org/10.5351/KJAS.2007.20.3.531
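
The scaling problem and the standardized criterion described in the abstract above can be illustrated with a small sketch. It is not the authors' algorithm: a class-mean difference (`mean_difference_fit`, a hypothetical stand-in) replaces the SVM so the example stays dependency-free, and the stratified resampling, the number of replicates, and the toy data are all assumptions.

```python
import random
import statistics


def mean_difference_fit(X, y):
    """Stand-in linear scorer (NOT an SVM): per-feature class-mean difference."""
    pos = [row for row, label in zip(X, y) if label == 1]
    neg = [row for row, label in zip(X, y) if label == -1]
    return [
        statistics.fmean([r[j] for r in pos]) - statistics.fmean([r[j] for r in neg])
        for j in range(len(X[0]))
    ]


def standardized_scores(X, y, fit, n_boot=200, seed=0):
    """Return |coefficient| / bootstrap standard error for each feature."""
    rng = random.Random(seed)
    coef = fit(X, y)
    by_class = {}
    for row, label in zip(X, y):
        by_class.setdefault(label, []).append(row)
    replicates = []
    for _ in range(n_boot):
        Xb, yb = [], []
        # Assumption: resample within each class so both classes stay present.
        for label, rows in by_class.items():
            for row in rng.choices(rows, k=len(rows)):
                Xb.append(row)
                yb.append(label)
        replicates.append(fit(Xb, yb))
    se = [statistics.stdev([r[j] for r in replicates]) for j in range(len(coef))]
    return [abs(c) / s for c, s in zip(coef, se)]


# Toy data (assumed): feature 0 is large-scale noise, feature 1 is a
# small-scale feature that separates the two classes cleanly.
X = [[100.0, 0.011], [-80.0, 0.012], [50.0, 0.010],
     [-60.0, 0.011], [90.0, 0.012], [-70.0, 0.010],
     [95.0, 0.001], [-85.0, 0.002], [45.0, 0.000],
     [-65.0, 0.001], [85.0, 0.002], [-75.0, 0.000]]
y = [1] * 6 + [-1] * 6

raw = [abs(c) for c in mean_difference_fit(X, y)]
scores = standardized_scores(X, y, mean_difference_fit)
print("raw |coefficient|:", raw)
print("standardized     :", scores)
```

On this data the raw criterion favors the noisy large-scale feature, while the standardized criterion favors the informative one. A recursive elimination loop would drop the lowest-scoring feature and repeat.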

Feature Selection for Case-Based Reasoning using the Order of Selection and Elimination Effects of Individual Features (개별 속성의 선택 및 제거효과 순위를 이용한 사례기반 추론의 속성 선정)

이재식;이혁희
- Journal of Intelligence and Information Systems / v.8 no.2 / pp.117-137 / 2002
A case-based reasoning (CBR) system solves new problems by adapting the solutions used to solve old problems. Past cases are retained in the case base, each in a specific form determined by its features. Features are selected to represent the case in the best way. Similar cases are retrieved by comparing feature values and calculating similarity scores. Therefore, the performance of CBR depends on the selected feature subset. In this research, we measured the Selection Effect and the Elimination Effect of each feature. The Selection Effect is measured by performing CBR with only that one feature, and the Elimination Effect is measured by performing CBR with only that one feature removed. Based on these measurements, the feature subsets are selected. The resulting CBR showed better accuracy and efficiency than the CBR with all features.
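
The two measurements described in the abstract above can be sketched as follows. This is a minimal illustration under assumptions: retrieval is reduced to 1-nearest-neighbour with squared Euclidean distance, performance is leave-one-out accuracy, and the case base is a toy example. How the paper turns the two rankings into a feature subset is not stated in the abstract and is not reproduced here.

```python
def loo_accuracy(cases, labels, features):
    """Leave-one-out accuracy of 1-nearest-neighbour retrieval on `features`."""
    hits = 0
    for i, query in enumerate(cases):
        nearest = min(
            (j for j in range(len(cases)) if j != i),
            key=lambda j: sum((query[f] - cases[j][f]) ** 2 for f in features),
        )
        hits += labels[nearest] == labels[i]
    return hits / len(cases)


def feature_effects(cases, labels):
    """Return (selection, elimination) accuracy lists, one entry per feature."""
    n = len(cases[0])
    selection = [loo_accuracy(cases, labels, [f]) for f in range(n)]
    elimination = [
        loo_accuracy(cases, labels, [g for g in range(n) if g != f])
        for f in range(n)
    ]
    return selection, elimination


# Toy case base (assumed): feature 0 separates the classes, feature 1 misleads.
cases = [[0.0, 5.0], [0.1, 1.0], [0.2, 9.0],
         [1.0, 5.2], [1.1, 1.1], [1.2, 9.1]]
labels = ["a", "a", "a", "b", "b", "b"]

sel, elim = feature_effects(cases, labels)
print("all features      :", loo_accuracy(cases, labels, [0, 1]))
print("selection effect  :", sel)
print("elimination effect:", elim)
```

A feature with a high Selection Effect is useful on its own, while a feature whose removal raises accuracy is a candidate for elimination.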

Generalization of the Stream Network by the Geographic Hierarchy of Landform Data (지형자료의 계층화를 이용한 하계망 일반화)

Kim Nam-Shin
- Journal of the Korean Geographical Society / v.40 no.4 s.109 / pp.441-453 / 2005
This study aims to generalize stream networks by developing an algorithm based on the geographic hierarchy. Stream networks with a hierarchical system should be spatially hierarchized as linear features. The generalization procedure for stream networks consists of stream hierarchy construction, selection and elimination, and the algorithm. Processing of the stream network consists of determining the direction of the network, ranking stroke segments, and ordering by the Strahler method, with geographic data queries controlling the selection and elimination of linear features by scale. The improved Simoo algorithm was effective in enhancing linear features and decreasing their curvature. As a result, the approach is expected to improve the generalization of features with various spatial hierarchies.
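
The Strahler ordering mentioned in the abstract above is a standard rule: a headwater segment has order 1, and a segment's order rises by one only where two or more tributaries of the same highest order meet. The sketch below shows that rule together with a simple order threshold standing in for scale-dependent selection. The network, the segment names, and the threshold-based selection are assumptions, not the paper's query mechanism.

```python
def strahler_orders(upstream, outlet):
    """Compute the Strahler order of every segment draining to `outlet`.

    `upstream` maps a segment to the list of its direct tributaries.
    """
    orders = {}

    def visit(segment):
        tributary_orders = [visit(t) for t in upstream.get(segment, [])]
        if not tributary_orders:
            order = 1  # headwater segment
        else:
            top = max(tributary_orders)
            # The order increases only where two or more top-order streams join.
            order = top + 1 if tributary_orders.count(top) >= 2 else top
        orders[segment] = order
        return order

    visit(outlet)
    return orders


def select_by_order(orders, min_order):
    """Keep segments at or above `min_order` (assumed scale-dependent cut-off)."""
    return sorted(s for s, o in orders.items() if o >= min_order)


# Hypothetical network: two second-order branches join, then a small tributary.
network = {
    "outlet": ["main", "t1"],
    "main": ["a", "f"],
    "a": ["c", "d"],
    "f": ["g", "h"],
}
orders = strahler_orders(network, "outlet")
print(orders)
print(select_by_order(orders, 2))
```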

Multivariate Procedure for Variable Selection and Classification of High Dimensional Heterogeneous Data

Mehmood, Tahir;Rasheed, Zahid
- Communications for Statistical Applications and Methods / v.22 no.6 / pp.575-587 / 2015
Developments in data collection techniques have resulted in high dimensional data sets, where discrimination is an important and commonly encountered problem that is crucial to resolve when the high dimensional data are heterogeneous (non-common variance covariance structure for classes). An example is the classification of microbial habitat preferences based on codon/bi-codon usage. Habitat preference is important to study for evolutionary genetic relationships and may help industry produce specific enzymes. Most classification procedures assume homogeneity (common variance covariance structure for all classes), which is not guaranteed in most high dimensional data sets. We introduce regularized elimination in partial least squares coupled with QDA (rePLS-QDA) for parsimonious variable selection and classification of high dimensional heterogeneous data sets. It is based on the recently introduced regularized elimination for variable selection in partial least squares (rePLS) and on quadratic discriminant analysis (QDA), a heterogeneous classification procedure. The proposed and existing methods are compared on a simulated data set; in addition, the proposed procedure is applied to classify microbial habitat preferences by their codon/bi-codon usage. Five bacterial habitats (Aquatic, Host Associated, Multiple, Specialized, and Terrestrial) are modeled. The classification accuracy for each habitat is satisfactory and ranges from 89.1% to 100% on test data. Interesting codon/bi-codon usages and their mutual interactions that influence the respective habitat preferences are identified. The proposed method also produced results that concur with known biological characteristics and that will help researchers better understand the divergence of species.
https://doi.org/10.5351/CSAM.2015.22.6.575

Automatic threshold selection for edge detection using a noise estimation scheme and its application (잡음추측을 이용한 자동적인 에지검출 문턱값 선택과 그 응용)

김형수;오승준
- The Journal of Korean Institute of Communications and Information Sciences / v.21 no.3 / pp.553-563 / 1996
Detecting edges is one of the essential issues in the area of image analysis. An edge in an image is a boundary or contour at which a significant change occurs in image intensity. Edge detection has been studied in many applications such as image segmentation, robot vision, and image compression. In this paper, we propose an automatic threshold selection scheme for edge detection and show its application to noise elimination. The scheme applies statistical properties of the noise estimated from a noisy image to threshold selection. Since the selected threshold value depends not on the characteristics of the original image but on the statistical features of the added noise, we can avoid the ad hoc manner in which the threshold value is usually selected and decide the value theoretically. Furthermore, the scheme can reduce the number of edge pixels either generated or lost by noise. An application of the scheme to noise elimination is also shown. Noise in the input image can be eliminated by considering the direction of each edge pixel on the edge map obtained with the proposed threshold selection scheme. Because the results are significantly improved in terms of SNR as well as subjective quality, we can claim that the suggested method works well.

An Elimination Type Two-Stage Selection Procedure for Gamma Populations

Lee, Seung-Ho;Choi, Kook Lyeol
- Journal of Korean Society for Quality Management / v.13 no.2 / pp.29-36 / 1985
The problem of selecting the gamma population with the largest mean out of k gamma populations, each of which has the same shape parameter, is considered. An elimination type two-stage procedure is proposed that, using the indifference-zone approach, guarantees the same probability requirement as the single-stage procedure of Gibbons, Olkin and Sobel (1977). The two-stage procedure has the highly desirable property that the expected total number of observations it requires is always less than that of the corresponding single-stage procedure, regardless of the configuration of the population parameters.

Search Result 107, Processing Time 0.031 seconds
