• Title/Summary/Keyword: Input Variable Importance

Input Variable Importance in Supervised Learning Models

  • Huh, Myung-Hoe;Lee, Yong Goo
    • Communications for Statistical Applications and Methods / v.10 no.1 / pp.239-246 / 2003
  • Statisticians, or data miners, are often asked to assess the importance of input variables in a given supervised learning model. For this purpose, one may rely on separate ad hoc measures depending on the model type, such as linear regression, neural networks, or trees. Consequently, input variable importance measures lack conceptual consistency, so they cannot be used directly to compare different types of models, which is often done in data mining processes. In this short communication, we propose a unified approach to measuring the importance of input variables. Our method uses sensitivity analysis, which begins by perturbing the values of the input variables and monitors the change in the output. The research scope is limited to models for continuous output, although it is not difficult to extend the method to supervised learning models for categorical outcomes.
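
The perturb-and-monitor idea in this abstract can be illustrated with a minimal sketch. It is not the authors' measure: the function name `perturbation_importance`, the Gaussian perturbation scaled by each input's standard deviation, the `scale` parameter, and the toy model are all assumptions made for illustration. The only requirement on the model is that it maps an input row to a continuous output, so the same score can be computed for any model type.

```python
import random

def perturbation_importance(model, rows, scale=0.1, seed=0):
    """Mean absolute change in model output when each input is perturbed in turn."""
    rng = random.Random(seed)
    n_inputs = len(rows[0])
    # Per-variable standard deviation, so perturbations are on a comparable scale.
    sds = []
    for j in range(n_inputs):
        col = [r[j] for r in rows]
        mean = sum(col) / len(col)
        sds.append((sum((v - mean) ** 2 for v in col) / len(col)) ** 0.5)
    baseline = [model(r) for r in rows]
    scores = []
    for j in range(n_inputs):
        total = 0.0
        for r, y0 in zip(rows, baseline):
            bumped = list(r)
            bumped[j] += rng.gauss(0.0, scale * sds[j])  # perturb one input only
            total += abs(model(bumped) - y0)             # monitor the output change
        scores.append(total / len(rows))
    return scores

def toy_model(x):
    # Hypothetical fitted model: strong effect of x[0], weak effect of x[1], none of x[2].
    return 5.0 * x[0] + 0.5 * x[1]

data_rng = random.Random(1)
data = [[data_rng.uniform(0, 1) for _ in range(3)] for _ in range(200)]
scores = perturbation_importance(toy_model, data)
print(scores)
```

Because the score depends only on calling `model`, a regression, a tree, or a neural network could be passed in unchanged, which is the kind of cross-model comparability the abstract argues for.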

Relationship among Degree of Time-delay, Input Variables, and Model Predictability in the Development Process of Non-linear Ecological Model in a River Ecosystem (비선형 시계열 하천생태모형 개발과정 중 시간지연단계와 입력변수, 모형 예측성 간 관계평가)

  • Jeong, Kwang-Seuk;Kim, Dong-Kyun;Yoon, Ju-Duk;La, Geung-Hwan;Kim, Hyun-Woo;Joo, Gea-Jae
    • Korean Journal of Ecology and Environment / v.43 no.1 / pp.161-167 / 2010
  • In this study, we took an experimental approach to ecological model development in order to emphasize the importance of input variable selection with respect to the time-delayed arrangement between input and output variables. Time-series modeling requires selecting relevant input variables for the prediction of a specific output variable (e.g., the density of a species). Inadequate input variable selection often increases model construction time and lowers the efficiency of the developed model when it is applied to real-world representation. Therefore, for future prediction, researchers have to decide the number of time-delay steps (e.g., months, weeks, or days; t-n) used to predict a certain phenomenon at the current time t. We prepared a total of 3,900 equation models produced by the Time-Series Optimized Genetic Programming (TSOGP) algorithm for the prediction of the monthly averaged density of a potamic phytoplankton species, Stephanodiscus hantzschii, considering future prediction from 0 (no future prediction) to 12 months ahead (at 1-month intervals; 300 equations per month of delay). Investigation of the model structures showed that input variable selectivity was clearly affected by the time-delay arrangement and that model predictability was related to the type of input variables. From these results, we conclude that, although the Machine Learning (ML) algorithms widely used in Ecological Informatics (EI) provide high performance in future prediction of ecological entities, the efficiency of the models is lowered unless relevant input variables are selectively used.

Effects of Input Variables in Radiological Accident Consequence Assessment

  • Han, Moon-Hee;Hwang, Won-Tae;Kim, Eun-Han;Suh, Kyung-Suk;Park, Young-Gil
    • Proceedings of the Korean Nuclear Society Conference / 1998.05b / pp.659-664 / 1998
  • The importance of the input variables of a real-time accident consequence assessment model has been analyzed. Partial correlation coefficients of the input variables related to plume and ingestion exposure have been estimated using the Latin hypercube sampling technique. The analysis indicates that wind speed and growth dilution rate are the most important variables in plume and ingestion exposure, respectively.
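
As a rough illustration of ranking inputs by partial correlation coefficients on a Latin hypercube sample, the sketch below uses a toy response with two inputs. The response function, the variable ranges, and the names `wind`, `height`, and `conc` are hypothetical and are not taken from the paper's assessment model; the three-variable partial correlation formula is the standard first-order one.

```python
import math
import random

def latin_hypercube(n, dims, rng):
    """One point per equal-probability stratum in each dimension, on [0, 1)."""
    columns = []
    for _ in range(dims):
        col = [(i + rng.random()) / n for i in range(n)]
        rng.shuffle(col)  # decouple the strata across dimensions
        columns.append(col)
    return [list(point) for point in zip(*columns)]

def pearson(a, b):
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)

def partial_corr(x, y, z):
    """Correlation of x and y after removing the linear effect of z."""
    rxy, rxz, ryz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (rxy - rxz * ryz) / math.sqrt((1 - rxz ** 2) * (1 - ryz ** 2))

rng = random.Random(0)
sample = latin_hypercube(500, 2, rng)
wind = [1.0 + 9.0 * p[0] for p in sample]      # hypothetical wind speed, 1 to 10 m/s
height = [10.0 + 90.0 * p[1] for p in sample]  # hypothetical release height, 10 to 100 m
# Toy response, not the paper's model: strongly driven by wind, weakly by height.
conc = [100.0 / w - 0.05 * h for w, h in zip(wind, height)]

pcc_wind = partial_corr(wind, conc, height)
pcc_height = partial_corr(height, conc, wind)
print(pcc_wind, pcc_height)
```

The input with the larger absolute partial correlation is treated as the more important one; with more than two inputs, the same quantity is usually obtained from regression residuals or the inverse correlation matrix rather than this closed form.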

Reliability Analysis Under Input Variable and Metamodel Uncertainty Using Simulation Method Based on Bayesian Approach (베이지안 접근법을 이용한 입력변수 및 근사모델 불확실성 하에 서의 신뢰성 분석)

  • An, Da-Wn;Won, Jun-Ho;Kim, Eun-Jeong;Choi, Joo-Ho
    • Transactions of the Korean Society of Mechanical Engineers A / v.33 no.10 / pp.1163-1170 / 2009
  • Reliability analysis, which evaluates reliability in the presence of the associated uncertainties, is of great importance in advanced product design. There are three types of uncertainty. The first is aleatory uncertainty, which is related to inherent physical randomness and is completely described by a suitable probability model. The second is epistemic uncertainty, which results from a lack of knowledge due to insufficient data. These two uncertainties are encountered in input variables such as dimensional tolerances, material properties, and loading conditions. The third is metamodel uncertainty, which arises from the approximation of the response function. In this study, an integrated method for reliability analysis is proposed that can address all of these uncertainties in a single Bayesian framework. The Markov chain Monte Carlo (MCMC) method is employed to facilitate the simulation of the posterior distribution. Mathematical and engineering examples are used to demonstrate the proposed method.
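
To show what simulating a posterior distribution with MCMC involves, here is a minimal random-walk Metropolis sketch. It is a generic textbook sampler under simplifying assumptions (a single unknown mean, known standard deviation, flat prior, synthetic data) and not the integrated reliability method the abstract proposes; the names `log_posterior` and `metropolis` are illustrative.

```python
import math
import random

def log_posterior(mu, data, sigma=1.0):
    # Flat prior on mu, so the log posterior equals the Gaussian
    # log likelihood up to an additive constant.
    return -sum((x - mu) ** 2 for x in data) / (2.0 * sigma ** 2)

def metropolis(log_post, start, steps, step_size, rng):
    """Random-walk Metropolis sampler for a one-dimensional target."""
    current, current_lp = start, log_post(start)
    chain = []
    for _ in range(steps):
        proposal = current + rng.gauss(0.0, step_size)
        proposal_lp = log_post(proposal)
        # Accept with probability min(1, posterior ratio).
        accept_prob = math.exp(min(0.0, proposal_lp - current_lp))
        if rng.random() < accept_prob:
            current, current_lp = proposal, proposal_lp
        chain.append(current)
    return chain

rng = random.Random(42)
data = [rng.gauss(3.0, 1.0) for _ in range(50)]  # hypothetical measurements
chain = metropolis(lambda m: log_posterior(m, data),
                   start=0.0, steps=5000, step_size=0.3, rng=rng)
kept = chain[1000:]  # discard burn-in
posterior_mean = sum(kept) / len(kept)
print(posterior_mean)
```

With a flat prior the posterior mean coincides with the sample mean, which gives a simple check that the chain has converged; in a reliability setting the retained draws would instead be pushed through the response function to estimate a failure probability.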

A Theoretical Study on Time Variable Influences in Clothing Purchase Behavior (시간변수가 의복구매 행동에 미치는 영향에 대한 이론적 연구)

  • 임경복;임숙자
    • Journal of the Korean Society of Clothing and Textiles / v.18 no.3 / pp.355-367 / 1994
  • In consumer behavior, money and time have been considered two important resources for making purchases. Money has been treated as an important research variable, but time has been neglected as an input variable because it lacks a well-defined concept and is complex in nature. Nonetheless, as industrialization and urbanization progress, the importance of time has increased. The main objective of this study was to suggest a framework for time and a methodology for time research in the clothing and textiles field. This study reviewed both theoretical and empirical research performed in diverse research fields. It was suggested that time factors (e.g., point, interval, span) should be defined for each decision process as needed, and that a theoretical framework should be developed accordingly. Time pressure should be included in future research for more reliable surveys. Finally, since clothing can be a personal object, subjective feelings and environmental factors should be considered in research.

Two-stage Latin hypercube sampling and its application (이단계 Latin Hypercube 추출법과 그 응용)

  • 임미정;권우주;이주호
    • The Korean Journal of Applied Statistics / v.8 no.2 / pp.99-108 / 1995
  • When modeling a complicated system with a computer model, it is of vital importance to choose input values efficiently. Latin hypercube sampling (LHS), proposed by McKay et al. (1979), has been the most widely used method for choosing input values for a computer model. We propose two-stage Latin hypercube sampling (TLHS), an improved version of LHS for producing input values when estimating the expectation of a function of the output variable. The proposed method is applied to a simulation study of the performance of a printer actuator, and it is shown to outperform the other sampling methods, including LHS, in accuracy.
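
The sketch below shows plain Latin hypercube sampling used to estimate the expectation of an output function, compared against simple random sampling. It covers only the basic LHS that the abstract takes as its starting point; the two-stage variant is not described in enough detail here to reproduce, and the test function `f`, the sample size, and the number of repetitions are assumptions chosen for illustration.

```python
import random

def latin_hypercube(n, dims, rng):
    """n points on [0, 1)^dims with exactly one point per stratum in each dimension."""
    columns = []
    for _ in range(dims):
        col = [(i + rng.random()) / n for i in range(n)]
        rng.shuffle(col)  # random pairing of strata across dimensions
        columns.append(col)
    return [list(point) for point in zip(*columns)]

def simple_random(n, dims, rng):
    return [[rng.random() for _ in range(dims)] for _ in range(n)]

def estimate_mean(f, points):
    return sum(f(p) for p in points) / len(points)

def spread(values):
    m = sum(values) / len(values)
    return (sum((v - m) ** 2 for v in values) / len(values)) ** 0.5

def f(p):
    # Toy output function; its true mean over the unit square is 1/3 + 1/2.
    return p[0] ** 2 + p[1]

rng = random.Random(7)
n, repeats = 20, 200
lhs_estimates = [estimate_mean(f, latin_hypercube(n, 2, rng)) for _ in range(repeats)]
srs_estimates = [estimate_mean(f, simple_random(n, 2, rng)) for _ in range(repeats)]
print(spread(lhs_estimates), spread(srs_estimates))
```

Repeating each design many times and comparing the spread of the resulting estimates is one simple way to quantify accuracy; for a function that is close to additive in its inputs, as here, the LHS estimates vary far less than the simple random ones.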

Imprecise DEA Efficiency Assessments : Characterizations and Methods

  • Park, Kyung-Sam
    • Management Science and Financial Engineering / v.14 no.2 / pp.67-87 / 2008
  • Data envelopment analysis (DEA) has proven to be a useful tool for assessing the efficiency or productivity of organizations, which is of vital practical importance in managerial decision making. While DEA assumes exact input and output data, the development of imprecise DEA (IDEA) broadens the scope of applications to efficiency evaluations involving imprecise information, that is, the various forms of ordinal and bounded data that often occur in practice. The primary purpose of this article is to characterize the variable efficiency in IDEA. Since DEA describes a pair of primal and dual models, also called the envelopment and multiplier models, we can consider two basic IDEA models: one incorporates imprecise data into the envelopment model, and the other includes the same imprecise data in the multiplier model. The issues of rising importance are thus the relationships between the two models and how to solve them. The groundwork we lay includes a duality study that makes it possible to characterize the efficiency solutions from the two models. This also relates to why we take into account the variable efficiency and its bounds in IDEA, as some of the published IDEA studies have done. We also present computational aspects of the efficiency bounds and explain how to interpret the efficiency solutions.

Analytical Studies on Medical Utilization Behaviors in Rural Areas (농촌지역주민의 의료이용행위에 영향 주는 자극요인분석)

  • 김영임
    • Journal of Korean Academy of Nursing / v.15 no.2 / pp.5-15 / 1985
  • This study was conducted to identify the variables that explain medical facility utilization behavior, which is defined as an adaptation behavior process driven by focal, contextual, and residual stimuli in Roy's Adaptation Model. What kinds of characteristics can explain adaptation behavior in Roy's model, and what is the relative importance of the input variables? For this analysis, stepwise multiple regression and path analysis were used. The data come from the 1981 Baseline Household Interview Survey in a remote rural area. The findings can be summarized as follows. First, the independent variables explained 16.2 percent of the total variance in adaptation behavior, that is, utilization of medical facilities including clinics, drug stores, health centers, and herb medicine. The most important variable explaining the dependent variable was the occurrence of illness, with an R² value of 0.112. Illness symptoms, living level, and regular care source were also important variables, with relatively high R² values and significant beta coefficients. Second, in the path analysis of the variables selected as important, the occurrence of illness had the highest direct effect, with a path coefficient of 0.297. The education level of the household had the highest indirect effect, through living level and the occurrence of illness in the causal model. Third, the analysis suggests that the occurrence of illness, which belongs to the focal stimuli, is more influential than the other variables. In sum, the occurrence of illness and illness symptoms, which belong to the focal stimuli, have high explanatory power through direct effects, while the education level of the household, among the contextual stimuli, has explanatory power through indirect effects.

Hierarchically penalized support vector machine for the classification of imbalanced data with grouped variables (그룹변수를 포함하는 불균형 자료의 분류분석을 위한 서포트 벡터 머신)

  • Kim, Eunkyung;Jhun, Myoungshic;Bang, Sungwan
    • The Korean Journal of Applied Statistics / v.29 no.5 / pp.961-975 / 2016
  • The hierarchically penalized support vector machine (H-SVM) has been developed to perform simultaneous classification and input variable selection when input variables are naturally grouped or generated by factors. However, the H-SVM may suffer from estimation inefficiency because it applies the same amount of shrinkage to each variable without assessing its relative importance. In addition, when analyzing imbalanced data with uneven class sizes, the classification accuracy of the H-SVM may drop significantly in predicting the minority class because its classifiers are undesirably biased toward the majority class. To remedy these problems, we propose the weighted adaptive H-SVM (WAH-SVM) method, which uses adaptive tuning parameters to improve the performance of variable selection and weights to differentiate the misclassification of data points between classes. Numerical results are presented to demonstrate the competitive performance of the proposed WAH-SVM over existing SVM methods.

Predicting Students' Engagement in Online Courses Using Machine Learning

  • Alsirhani, Jawaher;Alsalem, Khalaf
    • International Journal of Computer Science & Network Security / v.22 no.9 / pp.159-168 / 2022
  • No one denies the importance of online courses, which provide a very important alternative, especially for students whose jobs prevent them from attending traditional face-to-face classes. Engagement is one of the most important fundamental variables that indicate a course's success in achieving its objectives. Therefore, the current study aims to build a model using machine learning to predict student engagement in online courses. An online questionnaire was prepared and administered to students of Jouf University in the Kingdom of Saudi Arabia, and data were obtained for the input variables in the questionnaire, namely specialization, gender, academic year, skills, emotional aspects, participation, and performance, with engagement in the online course as the dependent variable. Multiple regression was used to analyze the data using SPSS. Kegel was used to build the model as a machine learning technique. The results indicated a positive correlation between four variables (skills, emotional aspects, participation, and performance) and engagement in online courses. The model accuracy was very high (99.99%), which shows the model's ability to predict engagement from the input variables.