• Title/Abstract/Keywords: Input Variable Importance

Input Variable Importance in Supervised Learning Models

  • Huh, Myung-Hoe;Lee, Yong Goo
    • Communications for Statistical Applications and Methods / Vol. 10, No. 1 / pp. 239-246 / 2003
  • Statisticians, or data miners, are often asked to assess the importance of input variables in a given supervised learning model. For this purpose, one may rely on separate ad hoc measures depending on the model type, such as linear regression, neural networks, or trees. Consequently, input variable importance measures lack conceptual consistency, so they cannot be used directly to compare different types of models, which is often done in data mining processes. In this short communication, we propose a unified approach to measuring the importance of input variables. Our method uses sensitivity analysis, which perturbs the values of the input variables and monitors the change in output. The scope of this research is limited to models with continuous output, although the method is not difficult to extend to supervised learning models with categorical outcomes.
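
The perturb-and-monitor idea can be sketched in a model-agnostic way. The sketch below is a minimal illustration under stated assumptions and is not the authors' exact measure. It rests on three choices of our own: the function name `sensitivity_importance`, a perturbation size `delta` expressed as a fraction of each input's sample standard deviation, and scoring each input by the mean absolute output change.

```python
import statistics
from typing import Callable, Sequence


def sensitivity_importance(
    model: Callable[[Sequence[float]], float],
    X: Sequence[Sequence[float]],
    delta: float = 0.1,
) -> list[float]:
    """Hypothetical measure: mean |f(x + delta*sd_j*e_j) - f(x)| over the rows of X."""
    n_vars = len(X[0])
    # Scale each perturbation by the input's spread so variables in different units are comparable.
    stds = [statistics.stdev([row[j] for row in X]) for j in range(n_vars)]
    importance = []
    for j in range(n_vars):
        total = 0.0
        for row in X:
            perturbed = list(row)
            perturbed[j] += delta * stds[j]  # shift only input j, keep the others fixed
            total += abs(model(perturbed) - model(row))
        importance.append(total / len(X))
    return importance


# Usage with a toy model; a fitted regression, network, or tree could be wrapped the same way.
def toy_model(x: Sequence[float]) -> float:
    return 3.0 * x[0] + 0.0 * x[1] + 1.0 * x[2]


X = [[1, 2, 3], [2, 1, 0], [3, 3, 1], [4, 0, 2]]
scores = sensitivity_importance(toy_model, X)
print(scores)
```

Because the measure only calls `model` as a black box, the same scores can be compared across model types. This is the consistency the abstract asks for.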

Relationship among Degree of Time-delay, Input Variables, and Model Predictability in the Development Process of Non-linear Ecological Model in a River Ecosystem

  • 정광석;김동균;윤주덕;라긍환;김현우;주기재
    • Korean Journal of Ecology and Environment / Vol. 43, No. 1 / pp. 161-167 / 2010
  • In this study, we implemented an experimental approach to ecological model development to emphasize the importance of input variable selection with respect to the time-delayed arrangement between input and output variables. Time-series modeling requires selecting relevant input variables to predict a specific output variable (e.g., the density of a species). Using inadequate input variables often increases model construction time and lowers the efficiency of the developed model when it is applied to real-world representation. Therefore, for future prediction, researchers have to decide the number of time-delay steps (e.g., months, weeks, or days; t-n) used to predict a certain phenomenon at the current time t. We prepared a total of 3,900 equation models produced by the Time-Series Optimized Genetic Programming (TSOGP) algorithm to predict the monthly averaged density of the potamic phytoplankton species Stephanodiscus hantzschii. The models cover prediction horizons from 0 months (no future prediction) to 12 months ahead at 1-month intervals, with 300 equations per month of delay. The investigation of model structure showed that input variable selectivity was clearly affected by the time-delay arrangement, and that model predictability was related to the type of input variables. From these results, we conclude that machine learning (ML) algorithms, which are widely used in ecological informatics (EI), provide high performance in the future prediction of ecological entities. Even so, model efficiency will be lowered unless relevant input variables are selectively used.
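
The time-delay arrangement amounts to pairing the inputs observed at time t-n with the output at time t. A minimal sketch of that pairing follows. It assumes the monthly records are stored as equal-length Python lists. The function name `make_lagged_pairs` is hypothetical, the values are placeholders rather than data from the study, and the sketch does not reproduce the TSOGP algorithm itself.

```python
from typing import Sequence


def make_lagged_pairs(
    inputs: Sequence[Sequence[float]],
    target: Sequence[float],
    lag: int,
) -> list[tuple[list[float], float]]:
    """Hypothetical helper: pair the input vector at time t - lag with the target at time t."""
    if lag < 0:
        raise ValueError("lag must be non-negative")
    # The first `lag` targets have no earlier input row, so they are dropped.
    return [(list(inputs[t - lag]), target[t]) for t in range(lag, len(target))]


# Placeholder monthly records (two input variables) and a placeholder output series.
inputs = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0], [4.0, 40.0]]
target = [100.0, 200.0, 300.0, 400.0]

for lag in range(3):  # 0 = no future prediction, 1 = one step ahead, ...
    print(lag, make_lagged_pairs(inputs, target, lag))
```

Each additional step of delay removes one usable training pair. This is one reason the choice of delay interacts with which inputs remain informative.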

Effects of Input Variables in Radiological Accident Consequence Assessment

  • Han, Moon-Hee;Hwang, Won-Tae;Kim, Eun-Han;Suh, Kyung-Suk;Park, Young-Gil
    • Korean Nuclear Society: Conference Proceedings / Proceedings of the Korean Nuclear Society 1998 Spring Meeting (2) / pp. 659-664 / 1998
  • The importance of the input variables of a real-time accident consequence assessment model has been analyzed. Partial correlation coefficients of input variables related to plume exposure and ingestion exposure were estimated using the Latin hypercube sampling technique. The results show that wind speed and the growth dilution rate are the most important variables for plume exposure and ingestion exposure, respectively.
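
The simplest case involves two inputs. Here the partial correlation between one input and the output, controlling for the other input, follows from the pairwise Pearson correlations:

r_xy.z = (r_xy - r_xz r_yz) / sqrt((1 - r_xz^2)(1 - r_yz^2))

The sketch below implements only this first-order formula, with one control variable. A model with more inputs would instead correlate regression residuals. The function names are hypothetical, and the sampled values are illustrative rather than taken from the paper. In practice the inputs would come from, e.g., a Latin hypercube design.

```python
import math
from typing import Sequence


def pearson(a: Sequence[float], b: Sequence[float]) -> float:
    """Sample Pearson correlation coefficient."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)


def partial_corr(x: Sequence[float], y: Sequence[float], z: Sequence[float]) -> float:
    """Hypothetical helper: first-order partial correlation of x and y, controlling for z."""
    r_xy, r_xz, r_yz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz**2) * (1 - r_yz**2))


# Illustrative sampled inputs and a noise-free model output that depends on both.
x1 = [1.0, 2.0, 3.0, 4.0, 5.0]
x2 = [2.0, 1.0, 4.0, 3.0, 5.0]
out = [2 * a + b for a, b in zip(x1, x2)]
print(partial_corr(x1, out, x2))
```

Because `out` is an exact linear function of `x1` and `x2`, removing the effect of `x2` leaves a perfect relationship between `x1` and the output. The printed value is therefore 1 up to rounding.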

Reliability Analysis Under Input Variable and Metamodel Uncertainty Using Simulation Method Based on Bayesian Approach

  • 안다운;원준호;김은정;최주호
    • Transactions of the Korean Society of Mechanical Engineers A / Vol. 33, No. 10 / pp. 1163-1170 / 2009
  • Reliability analysis is of great importance in advanced product design, where the goal is to evaluate reliability in the presence of the associated uncertainties. There are three types of uncertainty. The first is aleatory uncertainty, which is related to inherent physical randomness and is completely described by a suitable probability model. The second is epistemic uncertainty, which results from a lack of knowledge due to insufficient data. These two uncertainties are encountered in input variables such as dimensional tolerances, material properties, and loading conditions. The third is metamodel uncertainty, which arises from approximating the response function. In this study, an integrated method for reliability analysis is proposed that addresses all of these uncertainties in a single Bayesian framework. The Markov chain Monte Carlo (MCMC) method is employed to facilitate simulation of the posterior distribution. Mathematical and engineering examples are used to demonstrate the proposed method.
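
MCMC draws samples from a posterior distribution that is hard to sample directly. The following is a generic illustration only, not the paper's integrated reliability framework: a random-walk Metropolis sampler for a one-dimensional posterior. The toy posterior, the step size, and the burn-in length are all assumptions. The toy posterior uses a normal likelihood with known unit variance and a flat prior on the mean, with made-up data.

```python
import math
import random
from statistics import mean
from typing import Callable


def metropolis(
    log_post: Callable[[float], float],
    start: float,
    n_samples: int,
    step: float,
    seed: int = 0,
) -> list[float]:
    """Random-walk Metropolis with a symmetric Gaussian proposal."""
    rng = random.Random(seed)
    x, lp = start, log_post(start)
    chain = []
    for _ in range(n_samples):
        proposal = x + rng.gauss(0.0, step)
        lp_prop = log_post(proposal)
        # Accept with probability min(1, post(proposal) / post(x)); 1 - random() avoids log(0).
        if math.log(1.0 - rng.random()) < lp_prop - lp:
            x, lp = proposal, lp_prop
        chain.append(x)  # a rejected move repeats the current state
    return chain


# Toy posterior for a mean mu: unit-variance normal likelihood, flat prior (illustrative data).
data = [4.8, 5.1, 5.3, 4.9, 5.2, 5.0, 4.7, 5.4, 5.1, 4.9]


def log_post(mu: float) -> float:
    return -0.5 * sum((y - mu) ** 2 for y in data)


draws = metropolis(log_post, start=0.0, n_samples=20000, step=0.5)[2000:]  # drop burn-in
print(mean(draws))  # should be close to mean(data) under a flat prior
```

The symmetric proposal is what lets the acceptance test use only the posterior ratio. An asymmetric proposal would need the Hastings correction term.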

A Theoretical Study on Time Variable Influences in Clothing Purchase Behavior

  • 임경복;임숙자
    • Journal of the Korean Society of Clothing and Textiles / Vol. 18, No. 3 / pp. 355-367 / 1994
  • In consumer behavior, money and time have been considered two important resources used as means of purchase. Money has been treated as an important research variable, but time has been neglected as an input variable because it lacks a well-defined concept and is complex in nature. Nonetheless, as industrialization and urbanization progress, the importance of time has increased. The main objective of this study was to suggest a framework for time and a time-research methodology in the clothing and textiles field. This study reviewed both theoretical and empirical research conducted in diverse research fields. It was suggested that time factors (e.g., point, interval, span) should be defined for each decision process as needed, and a theoretical framework should be developed accordingly. Time pressure should be included in future research for more reliable surveys. Finally, since clothing can be a personal object, subjective feelings and environmental factors should be considered in research.

Two-stage Latin hypercube sampling and its application

  • 임미정;권우주;이주호
    • The Korean Journal of Applied Statistics / Vol. 8, No. 2 / pp. 99-108 / 1995
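  • This paper presents two-stage Latin hypercube sampling as a method for sampling input variables. Its purpose is to estimate the distribution of output values more accurately when complex systems are modeled with computer models. The method improves on the Latin hypercube sampling proposed by McKay et al. (1979), and simulation experiments show that it is more efficient than existing sampling methods.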
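
For reference, the one-stage Latin hypercube sampling of McKay et al. (1979), which the paper takes as its starting point, can be sketched in pure Python. Each input's range [0, 1) is split into n equal strata, and one value is drawn per stratum. The strata are then randomly permuted, independently for each input. The two-stage refinement proposed in the paper is not reproduced here. The function name and the unit-hypercube scaling are assumptions; real inputs would be mapped through their own distributions afterward.

```python
import random
from typing import Optional


def latin_hypercube(n: int, d: int, seed: Optional[int] = None) -> list[list[float]]:
    """Return n points in [0, 1)^d with exactly one point per stratum in every dimension."""
    rng = random.Random(seed)
    columns = []
    for _ in range(d):
        strata = list(range(n))
        rng.shuffle(strata)  # independent random pairing of strata across dimensions
        # Draw one uniform value inside each stratum [k/n, (k+1)/n).
        columns.append([(k + rng.random()) / n for k in strata])
    # Transpose the per-dimension columns into n sample points.
    return [[columns[j][i] for j in range(d)] for i in range(n)]


points = latin_hypercube(5, 2, seed=42)
for p in points:
    print([round(v, 3) for v in p])
```

Compared with simple random sampling of the same size, every marginal stratum is guaranteed to be covered. This is where the efficiency gain in estimating output distributions comes from.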

Imprecise DEA Efficiency Assessments : Characterizations and Methods

  • Park, Kyung-Sam
    • Management Science and Financial Engineering / Vol. 14, No. 2 / pp. 67-87 / 2008
  • Data envelopment analysis (DEA) has proven to be a useful tool for assessing the efficiency or productivity of organizations, which is of vital practical importance in managerial decision making. While DEA assumes exact input and output data, the development of imprecise DEA (IDEA) broadens its scope to efficiency evaluations involving imprecise information, such as the various forms of ordinal and bounded data that often occur in practice. The primary purpose of this article is to characterize the variable efficiency in IDEA. Since DEA comprises a pair of primal and dual models, also called the envelopment and multiplier models, we can basically consider two IDEA models: one incorporates imprecise data into the envelopment model, and the other includes the same imprecise data in the multiplier model. Issues of rising importance are thus the relationships between the two models and how to solve them. The groundwork we lay includes a duality study that makes it possible to characterize the efficiency solutions of the two models. This also explains why we take into account the variable efficiency and its bounds in IDEA, as some published IDEA studies have done. We also present computational aspects of the efficiency bounds and how to interpret the efficiency solutions.

Analytical Studies on Medical Utilization Behaviors in Rural Areas

  • 김영임
    • Journal of Korean Academy of Nursing / Vol. 15, No. 2 / pp. 5-15 / 1985
  • This study was conducted to identify the factors explaining the variance in medical facility utilization behavior. This behavior is defined as an adaptation behavior process driven by focal, contextual, and residual stimuli in Roy's Adaptation Model. What characteristics can explain adaptation behavior in Roy's model, and what is the relative importance of the input variables? For this analysis, stepwise multiple regression and path analysis were used. The data come from the 1981 Baseline Household Interview Survey in a remote rural area. The findings can be summarized as follows. First, the independent variables explained 16.2 percent of the total variance in adaptation behavior, that is, the utilization of medical facilities including clinics, drug stores, health centers, and herbal medicine. The most important variable explaining the dependent variable was the occurrence of illness, with an R² value of 0.112. Illness symptoms, living standard, and regular source of care were also important variables, with relatively high R² values and significant beta coefficients. Second, in the path analysis of the selected important variables, the occurrence of illness had the highest direct effect, with a path coefficient of 0.297. The education level of the household had the highest indirect effect in the causal model, acting through living standard and the occurrence of illness. Third, the analysis suggests that the occurrence of illness, which belongs to the focal stimuli, has more influence than the other variables. In sum, the occurrence of illness and illness symptoms, both focal stimuli, have high explanatory power through direct effects. Among the contextual stimuli, the household education level has explanatory power through indirect effects.

Hierarchically penalized support vector machine for the classification of imbalanced data with grouped variables

  • 김은경;전명식;방성완
    • The Korean Journal of Applied Statistics / Vol. 29, No. 5 / pp. 961-975 / 2016
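  • When input variables are grouped, H-SVM can select groups and variables within groups simultaneously while estimating the classification function. However, H-SVM shrinks all variables equally regardless of the importance of the input variables, so estimation efficiency may decrease. In addition, in imbalanced data the number of observations differs across classes. The classification function is then estimated with bias, and the predictive accuracy for the minority class may decline. To address these problems, this paper proposes WAH-SVM, which improves variable selection performance by using adaptive tuning parameters and assigns different misclassification costs to each class. We also compared the performance of the proposed model with existing methods through simulation studies and real data analysis, and confirmed the usefulness and applicability of the proposed model.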
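
Assigning a different misclassification cost to each class can be illustrated with a plain linear SVM whose hinge loss is weighted by class. This sketch covers only that cost-weighting idea. It does not include the hierarchical, adaptively tuned group penalty of WAH-SVM. The function names, the subgradient-descent training loop, and all hyperparameter values (`lam`, `lr`, `epochs`, the class weights) are assumptions for illustration.

```python
from typing import Sequence


def train_weighted_svm(
    X: Sequence[Sequence[float]],
    y: Sequence[int],
    class_weight: dict[int, float],
    lam: float = 0.01,
    lr: float = 0.1,
    epochs: int = 200,
) -> tuple[list[float], float]:
    """Hypothetical trainer: minimize (lam/2)||w||^2 + (1/n) sum_i c_{y_i} max(0, 1 - y_i(w.x_i + b))."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w = [lam * wj for wj in w]  # gradient of the ridge penalty
        grad_b = 0.0
        for xi, yi in zip(X, y):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            if margin < 1:  # inside the margin or misclassified: hinge term is active
                cost = class_weight[yi]  # larger cost for the minority class
                for j in range(d):
                    grad_w[j] -= cost * yi * xi[j] / n
                grad_b -= cost * yi / n
        w = [wj - lr * g for wj, g in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def predict(w: Sequence[float], b: float, x: Sequence[float]) -> int:
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b >= 0 else -1


# Imbalanced toy data: three majority (+1) points, two minority (-1) points.
X = [[2.0, 2.0], [3.0, 3.0], [2.0, 3.0], [-2.0, -2.0], [-3.0, -2.0]]
y = [1, 1, 1, -1, -1]
w, b = train_weighted_svm(X, y, class_weight={1: 1.0, -1: 1.5})
print([predict(w, b, x) for x in X])
```

Raising the weight of the minority class makes its margin violations cost more. This pushes the decision boundary away from the minority points and counteracts the bias the abstract describes.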

Predicting Students' Engagement in Online Courses Using Machine Learning

  • Alsirhani, Jawaher;Alsalem, Khalaf
    • International Journal of Computer Science & Network Security / Vol. 22, No. 9 / pp. 159-168 / 2022
  • No one denies the importance of online courses, which provide a very important alternative, especially for students whose jobs prevent them from attending traditional face-to-face classes. Engagement is one of the most important fundamental variables indicating a course's success in achieving its objectives. Therefore, the current study aims to build a machine learning model to predict student engagement in online courses. An online questionnaire was prepared and administered to students at Jouf University in the Kingdom of Saudi Arabia. Data were collected on the questionnaire's input variables (specialization, gender, academic year, skills, emotional aspects, participation, and performance), with engagement in the online course as the dependent variable. The data were analyzed with multiple regression in SPSS. Kegel was used to build the model as a machine learning technique. The results indicated a positive correlation between four variables (skills, emotional aspects, participation, and performance) and engagement in online courses. The model accuracy was very high (99.99%), which shows the model's ability to predict engagement from the input variables.