• Title/Abstract/Keywords: Variable selection


Hierarchically Penalized Support Vector Machine for the Classification of Imbalanced Data with Grouped Variables

  • 김은경; 전명식; 방성완
    • 응용통계연구 / Vol. 29, No. 5 / pp. 961-975 / 2016
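  • When the input variables are grouped, H-SVM can perform group-level and within-group variable selection simultaneously while estimating the classification function. However, because H-SVM shrinks all variables equally regardless of their importance, estimation efficiency can suffer. In addition, in imbalanced data, where the classes differ in size, the estimated classification function is biased, which can lower predictive accuracy for the minority class. To address these problems, this paper proposes WAH-SVM, which improves variable-selection performance by using adaptive tuning parameters and assigns a different misclassification cost to each class. We compared the proposed model with existing methods through simulations and a real-data analysis, and confirmed its usefulness and applicability.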
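The two ingredients named above, class-dependent misclassification costs and adaptively weighted shrinkage, can be illustrated with a small objective function. This is a minimal sketch, not the WAH-SVM estimator: it assumes a linear decision function, a hypothetical cost scheme n / (K · n_k), and a plain adaptive L1 penalty in place of the paper's hierarchical group penalty. The names `class_weights` and `weighted_adaptive_objective` are illustrative.

```python
import numpy as np

def class_weights(y):
    """Hypothetical cost scheme: weight class k by n / (K * n_k),
    so the minority class receives a larger misclassification cost."""
    classes, counts = np.unique(y, return_counts=True)
    n, k = len(y), len(classes)
    return {c: n / (k * cnt) for c, cnt in zip(classes, counts)}

def weighted_adaptive_objective(beta, b, X, y, lam, tau):
    """Cost-sensitive hinge loss plus an adaptive L1 penalty (assumed form).

    beta, b : coefficients of the linear decision function f(x) = x @ beta + b
    lam     : overall tuning parameter
    tau     : per-coefficient adaptive weights, e.g. 1 / |initial estimate|,
              so that important variables are shrunk less
    """
    y = np.asarray(y)
    w = class_weights(y)
    cost = np.array([w[c] for c in y])          # per-observation cost
    margins = y * (X @ beta + b)
    hinge = np.maximum(0.0, 1.0 - margins)      # standard SVM hinge loss
    return float(np.sum(cost * hinge) + lam * np.sum(tau * np.abs(beta)))
```

With one positive and three negative observations, the single positive case gets weight 2 and each negative case 2/3, so an error on the minority class costs three times as much as one on the majority class.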

The Effects of Major Selection Motivation on Career Efficacy and Major Satisfaction of College Students Majoring in Culinary Art and Foodservice Management

  • 채현석
    • 한국조리학회지 / Vol. 23, No. 5 / pp. 34-47 / 2017
  • This study examines the effects of major selection motivation on career efficacy and major satisfaction among college students majoring in culinary arts and foodservice management. To this end, a survey of 209 college students was conducted. The findings showed that major selection motivation had a significant effect on both career efficacy and major satisfaction. However, career efficacy was found to only partially mediate the relationship between major selection motivation and major satisfaction. Consequently, students' internal and external motivation for choosing their major acts as a facilitating mechanism that maximizes major satisfaction, and the use of career efficacy as a mediating variable should be limited.
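Partial mediation means that both the indirect path (motivation → efficacy → satisfaction) and the direct path (motivation → satisfaction) remain non-zero. The sketch below shows the classic product-of-coefficients decomposition with ordinary least squares; it assumes simple linear regressions with a single predictor, mediator and outcome, which may differ from the model the study actually fitted. The function names are illustrative.

```python
import numpy as np

def ols(y, *cols):
    """Least-squares coefficients of y on an intercept plus the given columns."""
    X = np.column_stack([np.ones(len(y)), *cols])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

def mediation(x, m, y):
    """Split the total effect of x on y into direct and indirect (via m) parts."""
    c = ols(y, x)[1]              # total effect: y ~ x
    a = ols(m, x)[1]              # path a: m ~ x
    coef = ols(y, x, m)           # y ~ x + m
    c_prime, b = coef[1], coef[2] # direct effect c' and path b
    return {"total": c, "direct": c_prime, "indirect": a * b}
```

For OLS with an intercept the identity total = direct + indirect holds exactly, so "partial" mediation corresponds to a direct effect that is still meaningfully different from zero after the mediator is added.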

The Impacts of Private Health Insurance on Medical Institution Selection: Evidence from Outpatient Service Utilization among Arthritis Patients

  • 유창훈; 강성욱; 최지헌; 권영대
    • 한국병원경영학회지 / Vol. 22, No. 2 / pp. 58-69 / 2017
  • As the number of private health insurance subscribers has grown, concern about overuse of medical services has increased. This study analyzed the impact of private health insurance (PHI) on the choice of medical institution for outpatient care among persons with arthritis. To control for patients' health status, we extracted outpatient episodes with the same diagnosis (KCD6, M13) from the Korea Health Panel. The unit of analysis was an outpatient visit for arthritis in 2014 (n=23,363). Based on insurance coverage, we defined three types of PHI status (indemnity, fixed benefit, and non-insured) as the test variable and two types of medical institution (hospital and clinic visits) as the dependent variable. We conducted a probit regression analysis that accounts for heteroscedasticity to identify the impact of PHI on medical institution selection. The results showed that those insured with indemnity PHI were more likely to choose hospital outpatient departments over clinics (marginal effect=0.0475, p<0.001). The effect of holding fixed-benefit PHI was weaker than that of the indemnity type (marginal effect=0.0162, p=0.047). In conclusion, this study confirmed that PHI, particularly the indemnity type, has a significant impact on the selection of medical institutions. Healthcare policy makers should consider that PHI not only increases overall healthcare utilization but also influences which medical institutions patients choose.
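The reported marginal effects translate probit coefficients into changes in probability. For a binary regressor such as an insurance-type dummy, a common definition is the average over observations of Φ(x'β with the dummy set to 1) minus Φ(x'β with the dummy set to 0). The sketch below assumes a homoscedastic probit with known coefficients; the study's heteroscedastic specification would additionally scale x'β by an observation-specific variance term, which is omitted here. `ame_binary` is a hypothetical helper.

```python
from math import erf, sqrt

def norm_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

def ame_binary(rows, beta, j):
    """Average marginal effect of binary regressor j in a probit model.

    rows : covariate vectors (first element 1.0 for the intercept)
    beta : probit coefficients, same length as each row
    j    : index of the 0/1 dummy (e.g. indemnity PHI)
    """
    diffs = []
    for x in rows:
        x1 = list(x); x1[j] = 1.0   # counterfactual: insured
        x0 = list(x); x0[j] = 0.0   # counterfactual: not insured
        xb1 = sum(b * v for b, v in zip(beta, x1))
        xb0 = sum(b * v for b, v in zip(beta, x0))
        diffs.append(norm_cdf(xb1) - norm_cdf(xb0))
    return sum(diffs) / len(diffs)
```

A value such as 0.0475 would then read as a 4.75 percentage-point higher probability of choosing a hospital.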

Relationship among Degree of Time-delay, Input Variables, and Model Predictability in the Development Process of Non-linear Ecological Model in a River Ecosystem

  • 정광석; 김동균; 윤주덕; 라긍환; 김현우; 주기재
    • 생태와환경 / Vol. 43, No. 1 / pp. 161-167 / 2010
  • In this study, we took an experimental approach to ecological model development in order to emphasize the importance of input variable selection with respect to the time-delayed arrangement between input and output variables. Time-series modeling requires selecting relevant input variables to predict a specific output variable (e.g. the density of a species). An inadequate choice of inputs often increases model construction time and lowers the efficiency of the developed model when it is applied to real-world conditions. Therefore, for future prediction, researchers must decide the number of time-delay steps (e.g. months, weeks or days; t-n) used to predict a phenomenon at the current time t. We generated a total of 3,900 equation models with the Time-Series Optimized Genetic Programming (TSOGP) algorithm to predict the monthly averaged density of the potamic phytoplankton species Stephanodiscus hantzschii, with prediction horizons from 0 months (no future prediction) to 12 months ahead in 1-month steps (300 equations per delay). Examining the model structures showed that input variable selectivity was clearly affected by the time-delay arrangement, and that model predictability was related to the type of input variables. We conclude that, although the Machine Learning (ML) algorithms popular in Ecological Informatics (EI) perform well in predicting ecological entities, model efficiency will be lower unless relevant input variables are selected.
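The time-delay arrangement described above pairs inputs observed at month t-n with the output at month t. A minimal sketch of building such training pairs is shown below; it assumes equally spaced observations in a NumPy array and is independent of the TSOGP algorithm itself. `make_lagged` is an illustrative name.

```python
import numpy as np

def make_lagged(X, y, delay):
    """Pair inputs at time t - delay with the output at time t.

    X     : inputs, shape (T,) or (T, p), one row per time step
    y     : output series of length T (e.g. monthly species density)
    delay : number of steps ahead to predict (0 = no future prediction)
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    if X.ndim == 1:
        X = X[:, None]
    if delay < 0:
        raise ValueError("delay must be non-negative")
    if delay == 0:
        return X, y
    # drop the last `delay` input rows and the first `delay` outputs
    return X[:-delay], y[delay:]
```

Looping `delay` over 0 to 12 yields the thirteen training sets that correspond to the 0- to 12-month horizons; each extra month of delay shortens the usable record by one observation.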

Input Variables Selection of Artificial Neural Network Using Mutual Information

  • 한광희; 류용준; 김태순; 허준행
    • 한국수자원학회논문집 / Vol. 43, No. 1 / pp. 81-94 / 2010
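  • This study addresses input variable selection, one of several ways to improve the performance of artificial neural networks (ANNs). In addition to the widely used correlation-based selection, we apply a method based on mutual information to improve ANN performance. The data are the 152 output variables of the RDAPS model provided by the Korea Meteorological Administration, including APCP, the forecast of surface precipitation. We computed the mutual information between these variables and observed rainfall and used the most influential variables as inputs. Compared with previous studies and with selection based on correlation coefficients alone, the mutual-information approach mainly selected wind-related variables. Comparing the root mean square error, the root mean square relative error, the absolute error by group, and the relative error by interval showed that mutual-information-based selection was generally more accurate, and in particular it substantially reduced errors when rainfall was relatively heavy.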
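Unlike the correlation coefficient, mutual information also captures non-linear dependence. The sketch below uses a simple plug-in estimate from a 2-D histogram and ranks candidate inputs by it; the bin count of 10 is an assumption, and the study's actual estimator is not specified in the abstract. `mutual_information` and `select_inputs` are illustrative names.

```python
import numpy as np

def mutual_information(a, b, bins=10):
    """Plug-in mutual information (in nats) from a 2-D histogram."""
    counts, _, _ = np.histogram2d(a, b, bins=bins)
    pxy = counts / counts.sum()
    px = pxy.sum(axis=1)          # marginal of a
    py = pxy.sum(axis=0)          # marginal of b
    nz = pxy > 0                  # skip empty cells to avoid log(0)
    ratio = pxy[nz] / np.outer(px, py)[nz]
    return float(np.sum(pxy[nz] * np.log(ratio)))

def select_inputs(X, y, k, bins=10):
    """Indices of the k columns of X with the highest MI with y."""
    scores = np.array([mutual_information(X[:, j], y, bins) for j in range(X.shape[1])])
    return np.argsort(-scores)[:k]
```

For example, with x evenly spaced on [-1, 1] and y = x², the correlation is essentially zero, yet the mutual information is clearly larger than that between x and independent noise.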

A Non-Oriented DEA Game Cross Efficiency Model for Supplier Selection

  • 임성묵
    • 산업경영시스템학회지 / Vol. 38, No. 2 / pp. 108-119 / 2015
  • This study proposes a non-oriented DEA-based game cross-efficiency approach for supplier selection. After discussing which DEA models and approaches are most appropriate for supplier selection, we propose a game cross-efficiency model based on the non-oriented, variable returns-to-scale RAM DEA, adapting the existing game cross-efficiency model that is based on the oriented, constant returns-to-scale CCR DEA. We develop the RAM game cross-efficiency model and a convergent iterative solution procedure that finds the best game cross-efficiency scores, which constitute a Nash equilibrium. We illustrate the proposed model with two supplier-selection data sets and show that its results differ significantly from those of existing approaches.
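For reference, the standard RAM (range-adjusted measure) model of Cooper, Park and Pastor (1999), on which the approach above builds, evaluates unit $o$ among $n$ units with $m$ inputs $x_{ij}$ and $q$ outputs $y_{rj}$ as the linear program below. This is the base model only, not the game cross-efficiency extension. The convexity constraint $\sum_j \lambda_j = 1$ gives variable returns to scale, and optimizing input and output slacks together makes the model non-oriented.

$$
\begin{aligned}
\max_{\lambda,\,s^-,\,s^+}\quad & \frac{1}{m+q}\left(\sum_{i=1}^{m}\frac{s_i^-}{R_i^-}+\sum_{r=1}^{q}\frac{s_r^+}{R_r^+}\right)\\
\text{s.t.}\quad & \sum_{j=1}^{n}\lambda_j x_{ij}+s_i^- = x_{io}, \qquad i=1,\dots,m\\
& \sum_{j=1}^{n}\lambda_j y_{rj}-s_r^+ = y_{ro}, \qquad r=1,\dots,q\\
& \sum_{j=1}^{n}\lambda_j = 1, \qquad \lambda_j,\ s_i^-,\ s_r^+ \ge 0
\end{aligned}
$$

Here $R_i^- = \max_j x_{ij} - \min_j x_{ij}$ and $R_r^+ = \max_j y_{rj} - \min_j y_{rj}$ are the observed ranges, and the RAM efficiency of unit $o$ is $1$ minus the optimal objective value, so efficient units score $1$.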

ELCIC: An R package for model selection using the empirical-likelihood based information criterion

  • Chixiang Chen; Biyi Shen; Ming Wang
    • Communications for Statistical Applications and Methods / Vol. 30, No. 4 / pp. 355-368 / 2023
  • This article introduces the R package ELCIC (https://cran.r-project.org/web/packages/ELCIC/index.html), which provides an empirical likelihood-based information criterion (ELCIC) for model selection, including but not limited to variable selection. Empirical likelihood is a semi-parametric approach to statistical inference that does not require distributional assumptions about the data-generating process. Therefore, ELCIC is more robust and versatile for model selection than existing information criteria. This paper illustrates several applications of ELCIC, including its use in generalized linear models, generalized estimating equations (GEE) for longitudinal data, and weighted GEE (WGEE) for missing longitudinal data under missing-at-random and dropout mechanisms.
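To make "inference without distributional assumptions" concrete, the sketch below computes the basic empirical likelihood ratio statistic for a population mean, the simplest building block of the approach; it is not the ELCIC criterion or the package's API. The weights are $p_i = 1/(n(1+\lambda d_i))$ with $d_i = x_i - \mu$, and the Lagrange multiplier $\lambda$ solves $\sum_i d_i/(1+\lambda d_i) = 0$, found here by bisection because that function is strictly decreasing in $\lambda$. `el_log_ratio_mean` is a hypothetical name.

```python
import math

def el_log_ratio_mean(x, mu, tol=1e-12, max_iter=200):
    """Return -2 log R(mu), the empirical likelihood ratio statistic for a mean."""
    d = [xi - mu for xi in x]
    if min(d) >= 0 or max(d) <= 0:
        return math.inf            # mu outside the convex hull of the data
    # lambda must keep every 1 + lambda * d_i strictly positive
    a, b = -1.0 / max(d), -1.0 / min(d)

    def g(lam):
        return sum(di / (1.0 + lam * di) for di in d)

    # g decreases from +inf to -inf on (a, b): bisect for its root
    for _ in range(max_iter):
        mid = 0.5 * (a + b)
        if g(mid) > 0:
            a = mid
        else:
            b = mid
        if b - a < tol:
            break
    lam = 0.5 * (a + b)
    return 2.0 * sum(math.log(1.0 + lam * di) for di in d)
```

The statistic is zero at the sample mean and grows as mu moves away from it; asymptotically it follows a chi-square distribution with one degree of freedom, so values above about 3.84 reject mu at the 5% level without assuming any parametric form for the data.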

A Propagation Path Analysis for Optimal Position Selection of Microcell Base Station in the Mobile Communication System

  • 노순국; 박창균
    • 한국음향학회지 / Vol. 18, No. 7 / pp. 92-100 / 1999
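  • To analyze the radio propagation environment from the base station to the mobile station in microcell mobile communication more quickly and accurately, we propose a triangular analysis algorithm that computes the number of reflections and the propagation path of the radio waves. We simulate the proposed algorithm for two cases: a mobile station located in a shadow region within the line-of-sight area, and a mobile station located in a shadow region of a non-line-of-sight area inclined at an arbitrary angle to the line-of-sight area. By analyzing the results, we present conditions for optimal base station placement in microcell mobile communication.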
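The abstract does not spell out the triangular analysis algorithm, so the sketch below uses a different, widely known technique, the image method, to show what computing a reflected propagation path involves in the simplest case: one reflection off a straight wall along the line y = wall_y. The transmitter is mirrored across the wall; the straight line from the image to the receiver gives the path length and the reflection point. The geometry (2-D, single flat wall, both antennas on the same side) is an assumption, and `single_reflection_path` is a hypothetical name.

```python
import math

def single_reflection_path(tx, rx, wall_y=0.0):
    """Length and reflection point of a one-bounce path off the wall y = wall_y.

    tx, rx : (x, y) positions of transmitter and receiver, both strictly
             on the same side of the wall
    """
    # mirror the transmitter across the wall (image source)
    img = (tx[0], 2.0 * wall_y - tx[1])
    length = math.dist(img, rx)    # reflected path length = image-to-receiver distance
    # where the image-to-receiver segment crosses the wall
    t = (wall_y - img[1]) / (rx[1] - img[1])
    point = (img[0] + t * (rx[0] - img[0]), wall_y)
    return length, point
```

Multiple reflections repeat the mirroring for each wall in turn, which is why the number of reflections and the geometry of the street canyon drive the cost of such path searches.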


A Method for Screening Product Design Variables for Building A Usability Model: Genetic Algorithm Approach

  • 양희철; 한성호
    • 대한인간공학회지 / Vol. 20, No. 1 / pp. 45-62 / 2001
  • This study proposes a genetic algorithm-based partial least squares (GA-based PLS) method for selecting the design variables used to build a usability model. GA-based PLS uses a genetic algorithm to minimize the root-mean-squared error of a partial least squares regression model. A multiple linear regression model containing the variables selected by GA-based PLS is then built as the usability model. This usability model generally performed better than previous usability models built with other variable selection methods, such as expert rating, principal component analysis, cluster analysis, and partial least squares. Furthermore, model performance improved drastically when the category-type variables selected by GA-based PLS were added to the usability model. We recommend applying GA-based PLS to variable selection when developing usability models.
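A genetic algorithm for variable selection encodes each candidate subset as a bit mask and evolves masks toward a lower prediction error. The sketch below shows that loop with tournament selection, one-point crossover, bit-flip mutation and elitism. As a simplifying assumption it scores subsets with the hold-out RMSE of ordinary least squares rather than PLS, and adds a small per-variable penalty (hypothetical, 1e-3) to favor smaller subsets; population size, generations and mutation rate are illustrative values.

```python
import numpy as np

def holdout_rmse(X, y, mask, n_train):
    """RMSE on the hold-out rows of an OLS fit using only the masked columns."""
    if not mask.any():
        return np.inf
    Xs = np.column_stack([np.ones(len(y)), X[:, mask]])
    coef, *_ = np.linalg.lstsq(Xs[:n_train], y[:n_train], rcond=None)
    resid = y[n_train:] - Xs[n_train:] @ coef
    return float(np.sqrt(np.mean(resid ** 2)))

def ga_select(X, y, pop_size=20, generations=30, p_mut=0.1, penalty=1e-3, seed=0):
    """Evolve boolean column masks that minimize hold-out RMSE plus a size penalty."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    n_train = int(0.7 * n)

    def fitness(mask):
        return holdout_rmse(X, y, mask, n_train) + penalty * mask.sum()

    pop = rng.random((pop_size, p)) < 0.5            # random initial subsets

    def tournament(scores):
        i, j = rng.integers(pop_size, size=2)
        return pop[i] if scores[i] < scores[j] else pop[j]

    for _ in range(generations):
        scores = np.array([fitness(m) for m in pop])
        elite = pop[np.argsort(scores)[:2]]          # keep the two best masks
        children = [elite[0], elite[1]]
        while len(children) < pop_size:
            pa, pb = tournament(scores), tournament(scores)
            cut = rng.integers(1, p)                 # one-point crossover
            child = np.concatenate([pa[:cut], pb[cut:]])
            child ^= rng.random(p) < p_mut           # bit-flip mutation
            children.append(child)
        pop = np.array(children)
    scores = np.array([fitness(m) for m in pop])
    return pop[np.argmin(scores)], float(scores.min())
```

Elitism guarantees the best mask found so far is never lost, and scoring on held-out rows keeps the search from simply selecting every variable, which in-sample RMSE alone would reward.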


Tree-structured Clustering for Mixed Data

  • 양경숙; 허명회
    • 응용통계연구 / Vol. 19, No. 2 / pp. 271-282 / 2006
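  • This paper proposes a tree-structured clustering algorithm for data containing a mix of categorical and continuous variables. In particular, we devise a method for preprocessing the categorical variables so that the mixed variables share a common meaning. As numerical examples, we apply the algorithm to the SPSS credit data and the German credit data and examine the results.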
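The abstract does not describe the preprocessing itself, so the sketch below uses a different, widely used device for putting mixed variables on a common scale: Gower's dissimilarity. Each continuous variable contributes its absolute difference divided by that variable's observed range, each categorical variable contributes 0 for a match and 1 for a mismatch, and the contributions are averaged, so every variable lies in [0, 1]. The record layout and the `kinds` labels are assumptions for illustration.

```python
def column_ranges(records, kinds):
    """Observed range of each numeric column (None for categorical columns)."""
    ranges = []
    for j, kind in enumerate(kinds):
        if kind == "num":
            vals = [r[j] for r in records]
            ranges.append(max(vals) - min(vals))
        else:
            ranges.append(None)
    return ranges

def gower_distance(a, b, kinds, ranges):
    """Gower dissimilarity between two mixed-type records, in [0, 1]."""
    total = 0.0
    for j, kind in enumerate(kinds):
        if kind == "num":
            r = ranges[j]
            total += abs(a[j] - b[j]) / r if r > 0 else 0.0
        else:
            total += 0.0 if a[j] == b[j] else 1.0
    return total / len(kinds)
```

For records (age, credit grade, income) such as (25, "A", 1000), (35, "B", 3000) and (45, "A", 2000), the first two are at distance 0.5/3 + 1/3 + 1/3 ≈ 0.833 and the first and third at (1 + 0 + 0.5)/3 = 0.5, so the categorical mismatch weighs as much as a full-range numeric difference.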