• Title/Abstract/Keywords: selection criterion

437 results found (processing time: 0.023 s)

A Study on Feature Selection for kNN Classifier using Document Frequency and Collection Frequency

  • 이용구
    • 한국도서관정보학회지 / Vol. 44, No. 1 / pp. 27-47 / 2013
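  • This study examined the classification performance obtained when feature selection based on the document frequency and collection frequency of features, both readily available from automatic indexing, is applied to a kNN classifier in automatic classification. Part of the Hankook Ilbo-20000 (HKIB-20000) collection served as the test collection. The experiments showed, first, that selecting high-frequency features and removing low-frequency features by collection frequency performed better than doing so by document frequency. Second, for both document frequency and collection frequency, methods that select low-frequency features first did not yield good classification performance. Third, adjusting the feature selection range on a simple frequency such as collection frequency performed better than combining document frequency and collection frequency.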
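The two frequencies contrasted in this abstract can be illustrated with a minimal sketch. It assumes documents are already tokenized; the function names and the toy documents are hypothetical and are not taken from the paper, and the paper's actual selection ranges are not reproduced here.

```python
from collections import Counter

def frequency_tables(docs):
    """Return (document_frequency, collection_frequency) for tokenized docs."""
    df, cf = Counter(), Counter()
    for tokens in docs:
        cf.update(tokens)       # collection frequency: every occurrence counts
        df.update(set(tokens))  # document frequency: one count per document
    return df, cf

def select_top(freq, k):
    """Keep the k highest-frequency features; ties are broken alphabetically."""
    ranked = sorted(freq.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:k]]

# Hypothetical toy collection, for illustration only.
docs = [
    ["economy", "market", "market", "stock"],
    ["market", "election"],
    ["election", "election", "election", "vote"],
]
df, cf = frequency_tables(docs)
top_by_df = select_top(df, 2)
top_by_cf = select_top(cf, 2)
```

Selecting low-frequency features first, which the abstract reports as ineffective, would correspond to reversing the sort order in `select_top`.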

Inferential Problems in Bayesian Logistic Regression Models

  • 황진수;강성찬
    • 응용통계연구 / Vol. 24, No. 6 / pp. 1149-1160 / 2011
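  • Compared with conventional frequentist inference, hypothesis testing and model selection in Bayesian inference remain areas where researchers have not reached consensus, and many points are still disputed. The Bayes factor, widely used as a criterion for hypothesis testing and model selection in Bayesian inference, is easy to understand but in many cases difficult to compute. Other criteria include the DIC (Deviance Information Criterion) proposed by Spiegelhalter et al. (2002) and the Bayesian P-value, the counterpart of the P-value in frequentist inference. In this paper, we analyze the Swiss banknote data with a Bayesian logistic regression model, compute these criteria, and examine whether they lead to consistent conclusions.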
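For reference, the DIC named in this abstract is usually written as below. This is the standard definition from Spiegelhalter et al. (2002), stated up to an additive constant that does not depend on the model. The symbols $y$ (data), $\theta$ (parameters), and $\bar{\theta}$ (posterior mean of $\theta$) are notation introduced here, not taken from the paper.

```latex
\begin{aligned}
D(\theta) &= -2 \log p(y \mid \theta), \\
\bar{D} &= \mathbb{E}_{\theta \mid y}\left[ D(\theta) \right], \\
p_D &= \bar{D} - D(\bar{\theta}), \\
\mathrm{DIC} &= \bar{D} + p_D = D(\bar{\theta}) + 2\, p_D .
\end{aligned}
```

Here $p_D$ acts as an effective number of parameters, and among candidate models the one with the smaller DIC is preferred.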

Predicting the Number of Movie Audiences Through Variable Selection Based on Information Gain Measure

  • 박현목;최상현
    • Journal of Information Technology Applications and Management / Vol. 26, No. 3 / pp. 19-27 / 2019
  • In this study, we propose a methodology for predicting movie audience numbers from movie information that can be easily acquired before release, and for effectively distinguishing qualitative variables. We also construct a model that estimates the number of moviegoers at the time of data acquisition from the configured variables. Another purpose of this study is to provide a criterion for categorizing the success of movies with qualitative characteristics. As the evaluation criterion, we use the information gain ratio, which is the node selection criterion of the C4.5 algorithm. Through this procedure, we selected 416 movie data features. In the multiple linear regression results, the regression model using the variable selection method based on the information gain ratio performed excellently.
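The information gain ratio used as the evaluation criterion can be sketched as follows. This is a minimal illustration for categorical features under the usual C4.5 definition (information gain divided by split information); the function names and the toy data are hypothetical, and the paper's own preprocessing is not reproduced.

```python
import math
from collections import Counter

def entropy(values):
    """Shannon entropy (in bits) of a list of categorical values."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())

def gain_ratio(feature_values, labels):
    """Information gain of the feature divided by its split information."""
    n = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    # Expected entropy of the labels after splitting on the feature.
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    info_gain = entropy(labels) - conditional
    split_info = entropy(feature_values)
    return info_gain / split_info if split_info > 0 else 0.0

# Hypothetical toy data: does a qualitative variable separate hits from flops?
hit = ["yes", "yes", "no", "no"]
genre = ["action", "action", "drama", "drama"]   # perfectly informative
season = ["summer", "winter", "summer", "winter"]  # uninformative
ratio_genre = gain_ratio(genre, hit)
ratio_season = gain_ratio(season, hit)
```

Variables would then be ranked by their ratio and the highest-scoring ones passed to the regression model.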

Macroblock-based Adaptive Interpolation Filter Method Using New Filter Selection Criterion in H.264/AVC

  • 윤근수;문용호;김재호
    • 한국통신학회논문지 / Vol. 33, No. 4C / pp. 312-320 / 2008
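  • A macroblock-based adaptive interpolation method has been considered to improve coding efficiency in H.264/AVC. The filter selection criterion in this method performs well because it takes bit rate and distortion terms into account, but it still leaves room for improvement. To achieve higher coding efficiency than the existing method, this paper therefore proposes a new filter selection criterion that considers two bit rates, those for the motion vectors and for the prediction error, together with the reconstruction error. In addition, an algorithm is presented that reduces the overhead of transmitting the selected filter information. Experimental results show that the proposed method outperforms the existing method and reduces the total bit rate relative to H.264/AVC by an average of 5.19% (one reference frame) and 5.14% (five reference frames).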

Testing for A Change Point by Model Selection Tools in Linear Regression Models

  • Yoon, Yong-Hwa;Kim, Jong-Tae;Cho, Kil-Ho;Shin, Kyung-A
    • Communications for Statistical Applications and Methods / Vol. 7, No. 3 / pp. 655-665 / 2000
  • Several information criteria, namely the Schwarz information criterion (SIC), the Akaike information criterion (AIC), and the modified Akaike information criterion ($AIC_c$), are proposed to locate a change point in the multiple linear regression model. These methods are applied to a stock exchange data set and their results are compared.
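The three criteria named in this abstract can be sketched in their commonly used forms. The function names, the parameter counts, and the log-likelihood values below are hypothetical illustrations; the paper's exact definitions (for example, how the change-point location is counted as a parameter) may differ.

```python
import math

def aic(log_lik, k):
    """Akaike information criterion: 2k - 2 ln L."""
    return 2 * k - 2 * log_lik

def sic(log_lik, k, n):
    """Schwarz (Bayesian) information criterion: k ln n - 2 ln L."""
    return k * math.log(n) - 2 * log_lik

def aicc(log_lik, k, n):
    """Small-sample corrected AIC: AIC + 2k(k+1)/(n-k-1)."""
    if n - k - 1 <= 0:
        raise ValueError("AICc requires n > k + 1")
    return aic(log_lik, k) + 2 * k * (k + 1) / (n - k - 1)

# Hypothetical fits on n = 50 observations: (maximized log-likelihood, k).
n = 50
candidates = {
    "no change point": (-120.0, 3),
    "one change point": (-112.0, 6),
}
# For each criterion, the model with the smallest value is selected.
best_by_sic = min(candidates, key=lambda m: sic(*candidates[m], n))
best_by_aicc = min(candidates, key=lambda m: aicc(*candidates[m], n))
```

In a change-point search, the same comparison would be repeated over every admissible split position, with the minimizing position taken as the estimated change point.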


Probability Estimation of Snow Damage on Sugi (Cryptomeria japonica) Forest Stands by Logistic Regression Model in Toyama Prefecture, Japan

  • Kamo, Ken-Ichi;Yanagihara, Hirokazu;Kato, Akio;Yoshimoto, Atsushi
    • Journal of Forest and Environmental Science / Vol. 24, No. 3 / pp. 137-142 / 2008
  • In this paper, we apply a logistic regression model to data on snow damage to sugi (Cryptomeria japonica) that occurred in Toyama Prefecture, Japan, in 2004 in order to estimate the risk probability. To identify the factors affecting snow damage, we apply a model selection procedure that determines the optimal subset of explanatory variables. In this process we consider three information criteria: 1) Akaike's information criterion, 2) the Bayesian information criterion, and 3) the bias-corrected Akaike's information criterion. For the selected variables, we give an interpretation from the viewpoint of natural disasters.


On an Optimal Bayesian Variable Selection Method for Generalized Logit Model

  • Kim, Hea-Jung;Lee, Ae Kuoung
    • Communications for Statistical Applications and Methods / Vol. 7, No. 2 / pp. 617-631 / 2000
  • This paper suggests a Bayesian method for variable selection in the generalized logit model. It is based on the Laplace-Metropolis algorithm, which provides a simple method for estimating the marginal likelihood of the model. The algorithm then leads to a criterion for the selection of variables. The criterion is to find the subset of variables that maximizes the marginal likelihood of the model, and it is shown to be a Bayes rule in the sense that it minimizes the risk of variable selection under the 0-1 loss function. The suggested method is illustrated with two examples and compared with existing frequentist methods.
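For context, the Laplace-Metropolis estimator of the marginal likelihood is commonly written as below. This is the general form, not necessarily the exact expression used in the paper. The notation is introduced here: $d$ is the dimension of $\theta$, $\theta^{*}$ is the posterior mode estimated from the MCMC output, and $\Psi^{*}$ is the posterior covariance matrix estimated from the same output.

```latex
\hat{m}(y) = (2\pi)^{d/2} \, \left| \Psi^{*} \right|^{1/2} \, p(y \mid \theta^{*}) \, p(\theta^{*})
```

Under the selection criterion described in the abstract, $\hat{m}(y)$ would be computed for each candidate subset of variables and the subset with the largest value retained.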


ELCIC: An R package for model selection using the empirical-likelihood based information criterion

  • Chixiang Chen;Biyi Shen;Ming Wang
    • Communications for Statistical Applications and Methods / Vol. 30, No. 4 / pp. 355-368 / 2023
  • This article introduces the R package ELCIC (https://cran.r-project.org/web/packages/ELCIC/index.html), which provides an empirical likelihood-based information criterion (ELCIC) for model selection that includes, but is not limited to, variable selection. The empirical likelihood is a semi-parametric approach to draw statistical inference that does not require distribution assumptions for data generation. Therefore, ELCIC is more robust and versatile in the context of model selection compared to the currently existing information criteria. This paper illustrates several applications of ELCIC, including its use in generalized linear models, generalized estimating equations (GEE) for longitudinal data, and weighted GEE (WGEE) for missing longitudinal data under the mechanisms of missing at random and dropout.

A Simple $N^{th}$ Best-Relay Selection Criterion for Opportunistic Two-Way Relay Networks under Outdated Channel State Information

  • Ou, Jinglan;Wu, Haowei;Wang, Qi;Zou, Yutao
    • KSII Transactions on Internet and Information Systems (TIIS) / Vol. 8, No. 10 / pp. 3409-3422 / 2014
  • The frequency spectrum available for wireless communication is extremely crowded, and two-way relay networks have attracted great attention as a way to improve spectral efficiency. A simple $N^{th}$ best-relay selection criterion for opportunistic two-way relay networks is proposed. Because the criterion is based mainly on the channel gains, it can be implemented easily in practice by extending the distributed timer technique. The outage performance of the proposed relay selection scheme is analyzed under outdated channel state information (CSI), and a tight closed-form lower bound and an asymptotic value of the outage probability over Rayleigh fading channels are obtained. Simulation results demonstrate that the closed-form lower bound of the outage probability closely matches the simulated values over the whole SNR region, and that the asymptotic results approximate the simulated values well, especially in the high SNR region.
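The idea of picking the $N^{th}$ best relay from channel gains can be sketched as follows. The abstract does not state the ranking metric, so this sketch assumes a max-min rule (each relay is scored by the weaker of its two link gains); the function name, the relay identifiers, and the gain values are all hypothetical.

```python
def nth_best_relay(gains, n):
    """Return the relay ranked n-th (1-based) by its weaker link gain.

    gains maps a relay id to (gain_to_source_a, gain_to_source_b).
    Scoring by min() of the two gains is an assumption, not the paper's rule.
    """
    if not 1 <= n <= len(gains):
        raise ValueError("n must be between 1 and the number of relays")
    ranked = sorted(gains, key=lambda relay: min(gains[relay]), reverse=True)
    return ranked[n - 1]

# Hypothetical channel power gains for three relays.
gains = {"R1": (0.9, 0.4), "R2": (0.7, 0.6), "R3": (0.2, 1.5)}
best = nth_best_relay(gains, 1)
second_best = nth_best_relay(gains, 2)
```

With outdated CSI, the gains used for ranking may differ from those at transmission time, which is why the selected relay may not be the best one in practice.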