• Title/Summary/Keyword: forward selection method (전진선택법)


Cancer Classification with Gene Expression Profiles using Forward Selection Method (전진 선택법을 이용한 유전자 발현정보 기반의 암 분류)

  • Yoo, Si-Ho; Cho, Sung-Bae
    • Proceedings of the Korea Information Processing Society Conference / 2003.05a / pp.293-296 / 2003
  • Gene expression data are obtained by measuring, on a microarray, samples taken from a specific tissue of an organism; they record the expression levels of genes as numerical values. Because the expression levels of the related genes generally differ between normal and abnormal tissues, cancer can be classified from gene expression data. However, since not all genes are involved in the classification, feature selection, the task of screening out only the relevant genes, is needed. This paper proposes a method that selects genes and classifies samples using the forward selection method, one of the variable selection methods of regression analysis. The colon cancer dataset was used as the experimental data, and KNN was used as the classifier. Comparing this method with correlation-based feature selection using the Pearson and Spearman correlation coefficients, we confirmed that feature selection by forward selection chooses genes more effectively for cancer classification. The experiments showed a high recognition rate of 90.3%.
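
A minimal sketch of the kind of procedure this abstract describes: greedy forward selection of genes driven by a regression fit to the class label, followed by a KNN classifier. The data shapes, the number of selected genes, the R^2-based selection criterion, and the cross-validated evaluation are illustrative assumptions, not details taken from the paper.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import cross_val_score

def forward_select_genes(X, y, n_genes=10):
    """Greedy forward selection in the regression sense: at each step add
    the gene that most increases the R^2 of a linear fit to the class label."""
    selected, remaining = [], list(range(X.shape[1]))
    best_r2 = -np.inf
    for _ in range(n_genes):
        r2, best_j = best_r2, None
        for j in remaining:
            cols = selected + [j]
            fit = LinearRegression().fit(X[:, cols], y)
            score = fit.score(X[:, cols], y)      # R^2 with gene j added
            if score > r2:
                r2, best_j = score, j
        if best_j is None:                        # no gene improves the fit
            break
        best_r2 = r2
        selected.append(best_j)
        remaining.remove(best_j)
    return selected

# Hypothetical colon-cancer-style data: rows = samples, columns = genes.
X = np.random.rand(62, 2000)                      # placeholder expression matrix
y = np.random.randint(0, 2, 62)                   # 0 = normal, 1 = tumour
genes = forward_select_genes(X, y)
knn_acc = cross_val_score(KNeighborsClassifier(3), X[:, genes], y, cv=5).mean()
print(genes, knn_acc)
```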


Classifying Cancer Using Partially Correlated Genes Selected by Forward Selection Method (전진선택법에 의해 선택된 부분 상관관계의 유전자들을 이용한 암 분류)

  • Yoo, Si-Ho; Cho, Sung-Bae
    • Journal of the Institute of Electronics Engineers of Korea SP / v.41 no.3 / pp.83-92 / 2004
  • Gene expression profiles are numerical records of gene expression levels measured from an organism on a microarray. Each specific tissue generally shows different expression levels in the related genes, so cancer can be classified from gene expression profiles. Because not all genes are relevant to classification, the relevant genes must be selected, a task called feature selection. This paper proposes a new gene selection method that uses the forward selection method of regression analysis. The method reduces redundant information among the selected genes, allowing more efficient classification. We used a k-nearest neighbor classifier and tested the method on the colon cancer dataset. The results were compared with the Pearson and Spearman correlation coefficient methods, and the proposed method showed better performance, reaching 90.3% classification accuracy. The method was also successfully applied to the lymphoma cancer dataset.
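
The classical forward-selection step from regression analysis, on which this and the previous entry build, adds at each stage the candidate variable with the largest partial F statistic (equivalently, the largest extra sum of squares given the variables already in the model). A textbook-style sketch of that step, not the authors' exact implementation:

```python
import numpy as np

def extra_ssr(X_in, x_new, y):
    """Extra sum of squares gained by adding x_new to a model already
    containing the columns of X_in (intercept included)."""
    n = len(y)
    def ssr(cols):                          # regression (model) sum of squares
        M = np.column_stack([np.ones(n)] + list(cols))
        beta, *_ = np.linalg.lstsq(M, y, rcond=None)
        yhat = M @ beta
        return np.sum((yhat - y.mean()) ** 2)
    base = ssr(X_in)
    full = ssr(X_in + [x_new])
    return full - base, full

def forward_step(X, y, selected):
    """Return (partial F, column index) of the best candidate to add."""
    n = len(y)
    best = None
    X_in = [X[:, j] for j in selected]
    sst = np.sum((y - y.mean()) ** 2)
    for j in range(X.shape[1]):
        if j in selected:
            continue
        gain, full = extra_ssr(X_in, X[:, j], y)
        sse = sst - full                    # residual sum of squares, full model
        df_resid = n - (len(selected) + 2)  # intercept + new variable
        F = gain / (sse / df_resid)         # partial F statistic
        if best is None or F > best[0]:
            best = (F, j)
    return best

# Illustrative call on random placeholder data (not the colon-cancer set)
X, y = np.random.rand(62, 200), np.random.rand(62)
print(forward_step(X, y, selected=[5, 17]))
```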

Efficient variable selection method using conditional mutual information (조건부 상호정보를 이용한 분류분석에서의 변수선택)

  • Ahn, Chi Kyung; Kim, Donguk
    • Journal of the Korean Data and Information Science Society / v.25 no.5 / pp.1079-1094 / 2014
  • In this paper, we study efficient gene selection methods using conditional mutual information. We suggest gene selection methods based on conditional mutual information estimated by semiparametric methods that utilize the multivariate normal distribution and the Edgeworth approximation. We compare the suggested methods with other methods, such as the mutual information filter, SVM-RFE, and Cai et al. (2009)'s gene selection (MIGS-original), in SVM classification. Through these experiments, we show that gene selection methods using conditional mutual information based on semiparametric methods perform better than the mutual information filter. Furthermore, we show that they take far less computing time than Cai et al. (2009)'s gene selection while achieving similar performance.
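
Under a multivariate normal assumption, conditional mutual information has a closed form in terms of covariance-matrix determinants, which suggests a simple greedy selector like the one below. This sketch only illustrates the Gaussian case; the paper's Edgeworth-approximation variant is not reproduced, and treating the class label as a numeric column is an assumption made here.

```python
import numpy as np

def gaussian_cmi(data, x, y, z):
    """Conditional mutual information I(x; y | z) under a multivariate normal
    assumption: I = 0.5 * (log|C_xz| + log|C_yz| - log|C_z| - log|C_xyz|).
    `data` is an (n_samples, n_vars) array; x, y, z are column-index lists."""
    def logdet(cols):
        if not cols:
            return 0.0
        C = np.atleast_2d(np.cov(data[:, cols], rowvar=False))
        return np.linalg.slogdet(C)[1]
    return 0.5 * (logdet(x + z) + logdet(y + z) - logdet(z) - logdet(x + y + z))

def select_by_cmi(X, y, n_features=10):
    """Greedy selection: repeatedly add the feature with the largest estimated
    I(X_j ; y | already-selected features)."""
    data = np.column_stack([X, y])          # class label as the last column
    y_col = [X.shape[1]]
    selected = []
    for _ in range(n_features):
        cand = [(gaussian_cmi(data, [j], y_col, selected), j)
                for j in range(X.shape[1]) if j not in selected]
        score, j = max(cand)
        selected.append(j)
    return selected

# Illustrative call on random placeholder data
X = np.random.default_rng(5).normal(size=(100, 50))
y = (X[:, 3] + X[:, 7] > 0).astype(float)
print(select_by_cmi(X, y, n_features=5))
```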

Geometrical description based on forward selection & backward elimination methods for regression models (다중회귀모형에서 전진선택과 후진제거의 기하학적 표현)

  • Hong, Chong-Sun; Kim, Moung-Jin
    • Journal of the Korean Data and Information Science Society / v.21 no.5 / pp.901-908 / 2010
  • A geometrical description method is proposed to represent the processes of the forward selection and backward elimination methods, among the many variable selection methods for multiple regression models. This graphical method shows the forward selection and backward elimination processes on the first and second quadrants, respectively, of a half circle with unit radius. At each step, the SSR is represented by the norm of a vector, and the extra SSR or partial coefficient of determination is represented by the angle between two vectors. Lines are dotted when the corresponding partial F-test is statistically significant, so the statistical analysis can be read from the plot. From this geometrical description, the final regression models based on forward selection and backward elimination can be obtained, and the goodness of fit of the model can be explored.
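
One standard piece of projection geometry behind such a description: with centered fitted values, the squared norm of the fitted vector equals the SSR, and the cosine of the angle between the fitted vectors of two nested models satisfies cos^2(theta) = SSR_old / SSR_new, so the angle encodes the extra SSR. The sketch below only illustrates this identity at a single forward-selection step; the paper's half-circle plotting conventions are not reproduced.

```python
import numpy as np

def fitted_centered(X_cols, y):
    """Centered fitted values of an OLS fit with intercept; the squared
    norm of this vector is the regression sum of squares (SSR)."""
    n = len(y)
    M = np.column_stack([np.ones(n)] + list(X_cols))
    beta, *_ = np.linalg.lstsq(M, y, rcond=None)
    return M @ beta - y.mean()

def step_geometry(X, y, selected, new):
    """SSR before/after adding one variable, the extra SSR, and the angle
    (degrees) between the two fitted vectors."""
    f_old = fitted_centered([X[:, j] for j in selected], y)
    f_new = fitted_centered([X[:, j] for j in selected + [new]], y)
    ssr_old, ssr_new = f_old @ f_old, f_new @ f_new
    cos = f_old @ f_new / np.sqrt(ssr_old * ssr_new)
    angle = np.degrees(np.arccos(np.clip(cos, -1.0, 1.0)))
    return ssr_old, ssr_new, ssr_new - ssr_old, angle

# Toy example: the angle is small when the added variable explains little extra SSR.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 4))
y = X[:, 0] + 0.5 * X[:, 1] + rng.normal(size=50)
print(step_geometry(X, y, selected=[0], new=1))
```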

Variable selection in partial linear regression using the least angle regression (부분선형모형에서 LARS를 이용한 변수선택)

  • Seo, Han Son; Yoon, Min; Lee, Hakbae
    • The Korean Journal of Applied Statistics / v.34 no.6 / pp.937-944 / 2021
  • The problem of selecting variables is addressed in partial linear regression. Model selection for partial linear models is not easy, since it involves nonparametric estimation, such as smoothing parameter selection, as well as estimation of the linear explanatory variables. In this work, several approaches to variable selection are proposed using a fast forward selection algorithm, least angle regression (LARS). The proposed procedures apply a t-test, all-possible-regressions comparisons, or a stepwise selection process to the variables selected by LARS. An example based on real data and a simulation study on the performance of the suggested procedures are presented.
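
A rough sketch of one ingredient mentioned here: ordering covariates by their LARS entry sequence (scikit-learn's lars_path) and then pruning with t-tests. The nonparametric component of the partial linear model is ignored in this sketch, which is a simplification, and the significance level is an assumed default.

```python
import numpy as np
import statsmodels.api as sm
from sklearn.linear_model import lars_path

def lars_then_ttest(X, y, alpha=0.05):
    """Order the linear covariates by their LARS entry sequence, then keep
    those whose OLS t-test is significant at level alpha."""
    _, active, _ = lars_path(X, y, method="lar")   # columns in order of entry
    Xsub = sm.add_constant(X[:, active])
    fit = sm.OLS(y, Xsub).fit()
    keep = [j for j, p in zip(active, fit.pvalues[1:]) if p < alpha]
    return list(active), keep

# Synthetic example with two true signals among eight candidates
rng = np.random.default_rng(1)
X = rng.normal(size=(100, 8))
y = 2 * X[:, 0] - X[:, 3] + rng.normal(size=100)
print(lars_then_ttest(X, y))
```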

K-Terminal Network Reliability (네트웍 구조의 다중 단말 신뢰도)

  • 김국;송기원
    • Proceedings of the Korean Reliability Society Conference / 2005.06a / pp.271-278 / 2005
  • We propose an algorithm for computing the reliability of a K-terminal network with one source node and multiple terminal nodes that must be connected. Computing the reliability of a network structure is in general an NP-hard problem, and a new solution method is proposed here. Two ideas are central: the first is a decomposition (factoring) method, and the second is that a recursive computation is possible. Instead of the cumbersome procedure of finding a keystone component for the decomposition, components are selected and decomposed one at a time in the forward direction from the source node. This is simple because no criterion for choosing a keystone component needs to be considered, and it keeps the algorithm simple. In this method, the decomposition generates two subproblems, and a recursive relation with the original problem can be established. The recursive algorithm keeps the computer program simple. The subproblems are stored in memory and reused in subsequent computations.
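
A compact sketch of the factoring idea described above: condition on one undecided edge at a time, which splits the problem into an "edge up" and an "edge down" subproblem, and memoize the subproblems. Picking edges in fixed index order stands in for the paper's forward selection of components from the source node, and the union-find connectivity test is an implementation choice made here.

```python
from functools import lru_cache

def k_terminal_reliability(nodes, edges, probs, source, terminals):
    """Probability that `source` reaches every node in `terminals`, where
    edge i works independently with probability probs[i]."""
    m = len(edges)

    def components(up_edges):
        # connected components implied by the edges known to be up
        parent = {v: v for v in nodes}
        def find(v):
            while parent[v] != v:
                parent[v] = parent[parent[v]]
                v = parent[v]
            return v
        for i in up_edges:
            a, b = edges[i]
            parent[find(a)] = find(b)
        return find

    @lru_cache(maxsize=None)
    def rel(decided, up):
        # `decided`, `up`: bitmasks of edges already conditioned on / known up
        find = components(tuple(i for i in range(m) if (up >> i) & 1))
        if all(find(t) == find(source) for t in terminals):
            return 1.0                       # terminals already connected
        if decided == (1 << m) - 1:
            return 0.0                       # all edges decided, not connected
        i = next(j for j in range(m) if not (decided >> j) & 1)
        d = decided | (1 << i)
        return probs[i] * rel(d, up | (1 << i)) + (1 - probs[i]) * rel(d, up)

    return rel(0, 0)

# Toy bridge network: source 0, terminal {3}; each edge works with probability 0.9.
nodes = [0, 1, 2, 3]
edges = [(0, 1), (0, 2), (1, 2), (1, 3), (2, 3)]
print(k_terminal_reliability(nodes, edges, [0.9] * 5, source=0, terminals=[3]))
```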


Variable Selection for Logistic Regression Model Using Adjusted Coefficients of Determination (수정 결정계수를 사용한 로지스틱 회귀모형에서의 변수선택법)

  • Hong, C.S.; Ham, J.H.; Kim, H.I.
    • The Korean Journal of Applied Statistics / v.18 no.2 / pp.435-443 / 2005
  • Coefficients of determination for logistic regression analysis are defined by various statistics, and their values tend to be smaller than those for the linear regression model. These coefficients of determination are generally not used to evaluate and diagnose logistic regression models. Liao and McGee (2003) proposed two adjusted coefficients of determination that are robust to the addition of inappropriate predictors and to variation in sample size. In this work, these adjusted coefficients of determination are applied to variable selection for the logistic regression model and compared with the results of other methods such as forward selection, backward elimination, stepwise selection, and the AIC statistic.
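
A hedged sketch of the general idea of driving forward selection with an adjusted coefficient of determination for logistic regression. Liao and McGee's (2003) adjusted coefficients are not reproduced here; a McFadden-style adjusted pseudo-R^2 is used as a stand-in, and the stopping rule is an assumption.

```python
import numpy as np
import statsmodels.api as sm

def adjusted_pseudo_r2(X, y, cols):
    """McFadden-style adjusted pseudo-R^2 for a logistic model on the given
    columns (a stand-in for Liao and McGee's adjusted coefficients)."""
    Xsub = sm.add_constant(X[:, cols]) if cols else np.ones((len(y), 1))
    res = sm.Logit(y, Xsub).fit(disp=0)
    return 1 - (res.llf - len(cols)) / res.llnull

def forward_by_adjusted_r2(X, y):
    """Forward selection: keep adding the predictor that most increases the
    adjusted pseudo-R^2; stop when no addition improves it."""
    selected, best = [], -np.inf
    while True:
        cand = [(adjusted_pseudo_r2(X, y, selected + [j]), j)
                for j in range(X.shape[1]) if j not in selected]
        if not cand:
            break
        score, j = max(cand)
        if score <= best:
            break
        best, selected = score, selected + [j]
    return selected, best

# Toy usage with synthetic data (placeholders, not the paper's data)
rng = np.random.default_rng(2)
X = rng.normal(size=(200, 6))
y = (X[:, 0] - X[:, 2] + rng.normal(size=200) > 0).astype(int)
print(forward_by_adjusted_r2(X, y))
```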

The Neurophysiological Approaches in Animal Experiments (신경생리학적(神經生理學的) 동물실험(動物實驗))

  • Cheon, Jin-Sook
    • Korean Journal of Biological Psychiatry / v.5 no.1 / pp.3-16 / 1998
  • The neurophysiological approach has been widely used to explore the relationship between brain and behavior. The basic techniques for animal experiments of this kind, such as stereotaxic techniques, lesioning methods, methods of electrical stimulation and recording, and confirmation of histological location, are briefly reviewed. Nevertheless, the importance of complementary neurochemical, neuroanatomical, and behavioral studies cannot be neglected.


The correlation and regression analyses based on variable selection for the university evaluation index (대학 평가지표들에 대한 상관분석과 변수선택에 의한 선형모형추정)

  • Song, Pil-Jun; Kim, Jong-Tae
    • Journal of the Korean Data and Information Science Society / v.23 no.3 / pp.457-465 / 2012
  • The purpose of this study is to analyze the associations between indicators and to build statistical models based on the important indicators in the 'College Notifier' of the Korea Council for University Education. First, Pearson correlation coefficients are used to find statistically significant correlations. The important indicators are then chosen by variable selection and their coefficients are estimated; backward elimination and stepwise selection are employed as the variable selection methods.
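
A small sketch of the two steps described in the abstract, Pearson correlations followed by backward elimination by p-value, using statsmodels; the indicator names and the data are placeholders, not the actual 'College Notifier' variables.

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

def backward_eliminate(df, response, alpha=0.05):
    """Backward elimination by p-value: repeatedly drop the predictor with
    the largest t-test p-value until every remaining one is below alpha."""
    predictors = [c for c in df.columns if c != response]
    while predictors:
        fit = sm.OLS(df[response], sm.add_constant(df[predictors])).fit()
        pvals = fit.pvalues.drop("const")
        worst = pvals.idxmax()
        if pvals[worst] < alpha:
            break
        predictors.remove(worst)
    return predictors, fit

# Hypothetical indicator table; column names are illustrative placeholders.
rng = np.random.default_rng(3)
df = pd.DataFrame(rng.normal(size=(100, 4)),
                  columns=["employment", "scholarship", "faculty", "dropout"])
df["outcome"] = 2 * df["employment"] - df["dropout"] + rng.normal(size=100)

print(df.drop(columns="outcome").corr().round(2))   # Pearson correlation matrix
kept, model = backward_eliminate(df, "outcome")
print(kept)
```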

Apartment Price Prediction Using Deep Learning and Machine Learning (딥러닝과 머신러닝을 이용한 아파트 실거래가 예측)

  • Hakhyun Kim; Hwankyu Yoo; Hayoung Oh
    • KIPS Transactions on Software and Data Engineering / v.12 no.2 / pp.59-76 / 2023
  • Since the COVID-19 era, the rise in apartment prices has been unconventional, and in this uncertain real estate market, price prediction research is very important. In this paper, a model is built to predict the actual transaction prices of future apartments after constructing a large dataset of 870,000 records from 2015 to 2020, collected by crawling various real estate sites and gathering as many variables as possible. The study first addresses the multicollinearity problem by removing and combining variables. A total of five variable selection algorithms are then used to extract meaningful independent variables: forward selection, backward elimination, stepwise selection, L1 regularization, and principal component analysis (PCA). In addition, four machine learning and deep learning algorithms, a deep neural network (DNN), XGBoost, CatBoost, and linear regression, are trained after hyperparameter optimization, and their predictive power is compared. In an additional experiment, the number of nodes and layers of the DNN is varied to find the most appropriate architecture. Finally, the best-performing model is used to predict the actual transaction prices of apartments in 2021, and the predictions are compared with the actual 2021 data. Through this, we believe that machine learning and deep learning can help investors make the right decisions when purchasing homes in various economic situations.
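
A condensed sketch of one slice of the pipeline described above: L1 regularization (one of the five selection schemes named) to pick variables, followed by a comparison of a linear model against a gradient-boosting regressor, which stands in here for XGBoost/CatBoost; the data are synthetic placeholders, not the crawled transaction records.

```python
import numpy as np
from sklearn.linear_model import LassoCV, LinearRegression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_absolute_error

# Placeholder data standing in for the crawled transaction records
# (the paper's dataset has 870,000 rows and many housing/location variables).
rng = np.random.default_rng(4)
X = rng.normal(size=(5000, 20))
y = X[:, 0] * 3 + X[:, 5] ** 2 + rng.normal(size=5000)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

# L1 regularisation as one of the variable selection schemes
lasso = LassoCV(cv=5).fit(X_tr, y_tr)
keep = np.flatnonzero(lasso.coef_ != 0)              # selected variables

# Compare a linear model with a gradient-boosting model on the selected variables
for model in (LinearRegression(), GradientBoostingRegressor(random_state=0)):
    model.fit(X_tr[:, keep], y_tr)
    mae = mean_absolute_error(y_te, model.predict(X_te[:, keep]))
    print(type(model).__name__, round(mae, 3))
```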