• Title/Summary/Keyword: Input Variable Selection


Impact of Diverse Configuration in Multivariate Bias Correction Methods on Large-Scale Climate Variable Simulations under Climate Change

  • de Padua, Victor Mikael N.;Ahn Kuk-Hyun
    • Proceedings of the Korea Water Resources Association Conference / 2023.05a / pp.161-161 / 2023
  • Bias correction is a necessary step in downscaling coarse and systematically biased global climate models for use in local climate change impact studies. In addition to univariate bias correction methods, many multivariate methods, which correct multiple variables jointly and each have their own mathematical design, have been developed recently. While some studies have inter-compared these multivariate bias correction methods, none has focused extensively on how diverse configurations of climate variables (i.e., different combinations of input variables to be corrected), particularly high-dimensional ones, affect the ability of the different methods to remove biases in univariate and multivariate statistics. This study evaluates the impact of three configurations (inter-variable, inter-spatial, and full-dimensional dependence) on four state-of-the-art multivariate bias correction methods over a national-scale domain covering South Korea, using a gridded approach. An inter-comparison framework was created to evaluate how well each combination of configuration and bias correction method adjusts various climate variable statistics. Precipitation and maximum and minimum temperatures were corrected and evaluated across 306 high-resolution (0.2°) grid cells. Results show that most methods correct various statistics better when high-dimensional configurations are used. However, some instabilities were observed, likely tied to the mathematical designs of the methods, indicating that some multivariate bias correction methods are incompatible with high-dimensional configurations. This highlights the potential for further improvements in the field, as well as the importance of selecting a correction method suited to the user's specific needs.
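The abstract does not say which four multivariate methods were compared. As a rough illustration of two ingredients that some of these methods combine, the Python/NumPy sketch below applies empirical quantile mapping to each column separately and then rank-reorders the corrected columns to match the rank structure of the observations, in the spirit of the Schaake shuffle. The function names `quantile_map` and `reorder_to_reference` are hypothetical, the gamma-distributed data are synthetic, and this is not one of the methods evaluated in the study. Treating the columns as variables corresponds to an inter-variable configuration; treating them as grid cells corresponds to an inter-spatial one.

```python
import numpy as np


def quantile_map(model_hist, obs_hist, model_target):
    """Empirical quantile mapping of one variable (or one grid cell).

    Each target value is located on the empirical CDF of the historical
    model series and replaced by the observed value at the same
    non-exceedance probability. Values outside the historical model range
    are clipped to probability 0 or 1."""
    mh = np.sort(np.asarray(model_hist, dtype=float))
    probs = np.linspace(0.0, 1.0, mh.size)
    p = np.interp(np.asarray(model_target, dtype=float), mh, probs)
    return np.quantile(np.asarray(obs_hist, dtype=float), p)


def reorder_to_reference(corrected, reference):
    """Rank-reorder each column of `corrected` so its ranks match `reference`.

    Marginal values are kept; only their order in time changes, which imposes
    the reference's rank dependence between columns."""
    corrected = np.asarray(corrected, dtype=float)
    reference = np.asarray(reference, dtype=float)
    out = np.empty_like(corrected)
    for j in range(corrected.shape[1]):
        ranks = np.argsort(np.argsort(reference[:, j]))  # rank of each time step
        out[:, j] = np.sort(corrected[:, j])[ranks]
    return out


rng = np.random.default_rng(0)
n = 200
obs = rng.gamma(2.0, 3.0, size=(n, 3))                  # synthetic "observations"
model = 1.5 * rng.gamma(2.0, 3.0, size=(n, 3)) + 2.0    # synthetic biased "model"

# Step 1: univariate correction of each column.
qm = np.column_stack([quantile_map(model[:, j], obs[:, j], model[:, j])
                      for j in range(3)])
# Step 2: restore the observed dependence structure across columns.
mbc = reorder_to_reference(qm, obs)
```

Step 1 fixes the marginal distributions but leaves the model's (biased) dependence between columns; step 2 changes only the ordering, so the marginals from step 1 are preserved.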


Estimation Technique of Computationally Variable Distance Step in 1-D Numerical Model (1차원 수치모형의 가변 계산거리간격 추정 기법)

  • Kim, Keuk-Soo;Kim, Ji-Sung;Kim, Won
    • Journal of Korea Water Resources Association / v.44 no.5 / pp.363-376 / 2011
  • 1-D hydrodynamic numerical models are the most widely used models in flood analysis. Their input data include upstream and downstream boundary conditions, roughness coefficients, and cross-sections, and the computational distance step and time step are the most important factors for guaranteeing computational accuracy, stability, and efficiency. In this study, a theoretical explanation is presented for the empirical criteria previously used to select cross-section locations; in addition, a technique for estimating a computationally variable distance step is proposed that reflects the flow properties at every computational time step. Combined with a 1-D unsteady numerical model, the technique was applied to two events: the Teton Dam failure flood and the Han River flood. The numerical experiments demonstrate that accuracy and stability increase when more interpolated cross-sections are used, and show that the proposed variable distance step technique achieves the same order of accuracy as the previous empirical selection criteria with fewer cross-sections. In practice, this technique should allow river floods to be analyzed efficiently as well as accurately and stably.
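The abstract does not state the criterion used to estimate the variable distance step. A common rule of thumb for 1-D unsteady models is to keep the Courant number, based on the dynamic-wave celerity |V| + √(gD), close to 1. The Python sketch below uses that assumption to compute a target distance step from the local velocity V, flow depth D, and time step Δt, and then the number of interpolated cross-sections to place between two surveyed ones. It is an illustration only, not the authors' technique; `distance_step` and `interpolated_sections` are hypothetical names, and the flow values are made up.

```python
import math

G = 9.81  # gravitational acceleration (m/s^2)


def distance_step(velocity, depth, dt, courant=1.0):
    """Distance step (m) at which the local Courant number
    (|V| + sqrt(g*D)) * dt / dx equals the target `courant`."""
    celerity = abs(velocity) + math.sqrt(G * depth)  # dynamic-wave celerity (m/s)
    return celerity * dt / courant


def interpolated_sections(reach_length, velocity, depth, dt, courant=1.0):
    """Split a reach between two surveyed cross-sections into equal
    sub-intervals whose length is as close as possible to the target dx.

    Returns (number of interpolated sections, resulting spacing in m)."""
    dx = distance_step(velocity, depth, dt, courant)
    if dx <= 0:
        raise ValueError("flow must have a positive celerity")
    n_intervals = max(1, round(reach_length / dx))
    return n_intervals - 1, reach_length / n_intervals


# Re-evaluated at every time step as V and D change (hypothetical values).
for v, d in [(0.5, 1.0), (2.0, 4.9), (4.0, 9.0)]:
    print(v, d, interpolated_sections(1000.0, v, d, dt=10.0))
```

Because V and D are re-read each time step, deeper and faster flow yields a longer target step and therefore fewer interpolated sections.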

A Study on the Construction of MOVAGs Based on an Operation Algorithm for Multiple-Valued Logic Functions and Circuit Design Using T-gates (다치 논리 함수 연산 알고리즘에 기초한 MOVAG 구성과 T-gate를 이용한 회로 설계에 관한 연구)

  • Yoon, Byoung-Hee;Park, Soo-Jin;Kim, Heung-Soo
    • Journal of IKEEE / v.8 no.1 s.14 / pp.22-32 / 2004
  • In this paper, we propose MOVAG (Multi-Output Value Array Graphs), based on Honghai Jiang's OVAG, to construct multiple-valued logic functions. D.M. Miller's MDD (Multiple-valued Decision Diagram) requires considerable processing time and effort in circuit design for a given multi-variable function, so we designed a MOVAG that reduces data processing time and has low complexity. We propose a construction algorithm and an input matrix selection algorithm, and we designed a multiple-valued logic circuit using T-gates, which was verified by simulation.
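A p-valued T-gate passes one of its p data inputs to the output according to the value of its control input, so a multiple-valued function can be realized as a tree of T-gates with the variables on the control inputs and constants at the leaves. The Python sketch below builds such a tree from a function's truth table, merging a node when all of its branches are identical, and checks it on the ternary MIN function. It is a generic decision-tree realization assumed for illustration, not the MOVAG construction or the input matrix selection algorithm proposed in the paper; all function names are hypothetical.

```python
from itertools import product


def t_gate(control, inputs):
    """p-valued T-gate: output the data input selected by the control value."""
    return inputs[control]


def build_t_tree(f, n_vars, p, prefix=()):
    """Realize f: {0..p-1}^n_vars -> {0..p-1} as a nested T-gate tree.

    A leaf is a constant; an internal node is (var_index, children), where
    children[k] realizes f with variable var_index fixed to k. A node whose
    children are all identical is replaced by that child (the variable is
    redundant at that point)."""
    if len(prefix) == n_vars:
        return f(*prefix)
    children = [build_t_tree(f, n_vars, p, prefix + (k,)) for k in range(p)]
    if all(c == children[0] for c in children):
        return children[0]
    return (len(prefix), children)


def evaluate(tree, xs):
    """Propagate the input tuple xs through the T-gate tree."""
    while isinstance(tree, tuple):
        var, children = tree
        tree = t_gate(xs[var], children)
    return tree


def count_gates(tree):
    """Number of T-gates (internal nodes) in the tree."""
    if not isinstance(tree, tuple):
        return 0
    return 1 + sum(count_gates(c) for c in tree[1])


# Ternary MIN(x, y): the x = 0 branch collapses to the constant 0.
tree = build_t_tree(lambda x, y: min(x, y), n_vars=2, p=3)
table = {xs: evaluate(tree, xs) for xs in product(range(3), repeat=2)}
print("T-gates used:", count_gates(tree))
```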


A Study on Forest Land Classification Using Multivariate Statistical Methods : A Case Study at Mt. Kwanak (다변수통계방법을 이용한 산지분류에 관한 연구)

  • 정순오
    • Journal of the Korean Institute of Landscape Architecture / v.13 no.1 / pp.43-66 / 1985
  • Korea needs proper and rational public policies on the conservation and use of forest land and other natural resources because national land development has expanded rapidly in recent years. Unfortunately, there is no systematic planning system to support these needs. Forest land use planning generally requires a suitability analysis based on an efficient land classification system. The goal of this study was to classify forest land using multivariate statistical methods. A case study was carried out in the winter of 1983 on a mountainous area more than 100 m above sea level at Mt. Kwanak in Anyang City, Kyunggi-do Province. The study area covered 19.80 km² and was divided into 1,383 Operational Taxonomic Units (OTUs) by a 120 m × 120 m grid. Fourteen descriptors were identified and quantified for each OTU from existing national land data: elevation, slope, aspect, terrain form, geologic material, surface soil permeability, topsoil type, depth of the solum, soil acidity, forest cover type, stand size class, stand age class, stand density class, and simple forest soil capability class. For this study, a FORTRAN IV program was written for input and output of map data, and the statistical packages SPSS and BMD were used to perform the multivariate statistical analysis. The fourteen variables were analyzed to investigate the characteristics of their frequency distributions and to estimate the correlation coefficients among them. Principal component analysis was performed to find the dimensions of forest land characteristics, and factor scores were used to select appropriate sample OTUs throughout the study area. To develop forest land classes based on 102 surrogates, cluster and discriminant analyses of the principal descriptor variable matrix were undertaken.
Results obtained through this series of multivariate statistical analyses were as follows: (1) principal component analysis proved to be a useful tool for data selection and for identifying the principal descriptor variables that represent the characteristics of forest land, and it facilitated the selection of samples.
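Principal component analysis of the standardized OTU descriptors is the first step of the procedure summarized above. The following Python/NumPy sketch computes it from the correlation matrix and returns the component scores of each unit (used here as a simple stand-in for the factor scores mentioned in the abstract), the loadings, and the fraction of variance explained by each component. The data are synthetic; 1,383 units and 14 descriptors are used only to mirror the study's dimensions, `pca_scores` is a hypothetical name, and the subsequent cluster and discriminant analyses are not shown.

```python
import numpy as np


def pca_scores(data, n_components):
    """PCA on the correlation matrix of the descriptors (columns).

    Returns (scores per unit, loadings, variance fraction per component)."""
    X = np.asarray(data, dtype=float)
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)   # standardize each descriptor
    R = np.corrcoef(Z, rowvar=False)                   # correlation matrix
    eigvals, eigvecs = np.linalg.eigh(R)               # eigh returns ascending order
    order = np.argsort(eigvals)[::-1]                  # sort by decreasing eigenvalue
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    loadings = eigvecs[:, :n_components]               # eigenvector coefficients
    scores = Z @ loadings                              # component scores per unit
    explained = eigvals / eigvals.sum()                # variance fraction per component
    return scores, loadings, explained


rng = np.random.default_rng(1)
latent = rng.normal(size=(1383, 2))                    # two hidden synthetic factors
data = latent @ rng.normal(size=(2, 14)) + 0.3 * rng.normal(size=(1383, 14))
scores, loadings, explained = pca_scores(data, n_components=3)
```

Standardizing first makes descriptors measured in different units (metres, degrees, class codes) contribute equally, which is why the correlation matrix rather than the covariance matrix is decomposed.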


A Study on Selecting Principal Component Variables Using Adaptive Correlation (적응적 상관도를 이용한 주성분 변수 선정에 관한 연구)

  • Ko, Myung-Sook
    • KIPS Transactions on Software and Data Engineering / v.10 no.3 / pp.79-84 / 2021
  • Processing high-dimensional data requires a feature extraction method that captures the features well while maintaining the properties of the data. Principal component analysis, which converts high-dimensional data into low-dimensional data and expresses it with fewer variables than the original, is a representative feature extraction method. In this study, we propose a principal component analysis method based on adaptive correlation for selecting principal component variables when extracting features from high-dimensional data. The proposed method analyzes the principal components of the data while adaptively reflecting the correlations among the input data, and excludes highly correlated variables from the candidate list. It ranks the principal components by their eigenvector coefficient values, prevents the selection of low-ranked components, and uses correlation analysis to minimize the data duplication that induces data bias. In this way, we propose a method for selecting principal component variables that represent the characteristics of the actual data well by reducing the influence of data bias.
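The abstract does not specify how the adaptive correlation is computed. The Python/NumPy sketch below shows one simple, assumed version of the idea: visit the principal components in order of decreasing eigenvalue, take the variable with the largest absolute eigenvector coefficient, and skip candidates that are already selected or whose correlation with a selected variable exceeds a fixed threshold. The function name `select_variables` and the threshold `corr_threshold=0.8` are hypothetical, and a fixed threshold is of course not adaptive.

```python
import numpy as np


def select_variables(data, n_select, corr_threshold=0.8):
    """Pick one representative input variable per principal component.

    For each component (largest eigenvalue first), variables are tried in
    order of decreasing |eigenvector coefficient|; a candidate is rejected
    if it is already selected or if |r| > corr_threshold with any selected
    variable (treated as a duplicate that would bias the selection)."""
    X = np.asarray(data, dtype=float)
    R = np.corrcoef(X, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(R)
    selected = []
    for k in np.argsort(eigvals)[::-1]:
        if len(selected) == n_select:
            break
        for j in np.argsort(-np.abs(eigvecs[:, k])):
            if j in selected:
                continue
            if all(abs(R[j, s]) <= corr_threshold for s in selected):
                selected.append(int(j))
                break
    return selected


rng = np.random.default_rng(2)
a, b, c = rng.normal(size=(3, 500))
# Column 1 nearly duplicates column 0; columns 2 and 3 are independent.
data = np.column_stack([a, a + 0.01 * rng.normal(size=500), b, c])
chosen = select_variables(data, n_select=3)
```

Without the correlation check, a near-duplicate pair such as columns 0 and 1 can both end up selected, which is the kind of duplication the abstract describes as inducing bias.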

Development and application of prediction model of hyperlipidemia using SVM and meta-learning algorithm (SVM과 meta-learning algorithm을 이용한 고지혈증 유병 예측모형 개발과 활용)

  • Lee, Seulki;Shin, Taeksoo
    • Journal of Intelligence and Information Systems / v.24 no.2 / pp.111-124 / 2018
  • This study aims to develop a classification model for predicting the occurrence of hyperlipidemia, a chronic disease. Prior studies that apply data mining techniques to disease prediction can be divided into model-design studies for predicting cardiovascular disease and studies comparing disease prediction results. In the international literature, studies predicting cardiovascular disease predominate among data mining approaches to disease prediction. Domestic studies are similar, but they have focused mainly on hypertension and diabetes. Because hyperlipidemia, like hypertension and diabetes, is a chronic disease of high importance, this study selected hyperlipidemia as the target disease. We developed a model for predicting hyperlipidemia using SVM and meta-learning algorithms, which are known to have excellent predictive power. For this purpose, we used the 2012 Korea Health Panel data set. The Korea Health Panel produces basic data on health expenditure, health level, and health behavior, and has conducted an annual survey since 2008. In this study, 1,088 patients with hyperlipidemia were randomly selected from the 2012 hospitalization, outpatient, emergency, and chronic disease data of the Korea Health Panel, and 1,088 non-patients were also randomly extracted, for a total of 2,176 study subjects. Three methods were used to select input variables for predicting hyperlipidemia. First, a stepwise method was performed using logistic regression. Of the 17 variables, the categorical variables (except for smoking period) were expressed as dummy variables, each treated as a separate variable relative to a reference group, and these variables were analyzed.
Six variables (age, BMI, education level, marital status, smoking status, and gender), excluding income level and smoking period, were selected at a significance level of 0.1. Second, the C4.5 decision tree algorithm was used; the significant input variables were age, smoking status, and education level. Finally, a genetic algorithm was used. For SVM, the genetic algorithm selected six input variables: age, marital status, education level, economic activity, smoking period, and physical activity status; for the artificial neural network, it selected three: age, marital status, and education level. Using the selected variables, we compared SVM, the meta-learning algorithms, and other prediction models for hyperlipidemia, evaluating classification performance by TP rate and precision. The main results are as follows. First, the accuracy of SVM was 88.4% and that of the artificial neural network was 86.7%. Second, classification models using the input variables selected by the stepwise method were slightly more accurate than models using all variables. Third, when only the three input variables selected by the decision tree were used, the precision of the artificial neural network was higher than that of SVM. With the input variables selected by the genetic algorithm, the classification accuracy was 88.5% for SVM and 87.9% for the artificial neural network. Finally, stacking, the meta-learning algorithm proposed in this study, performed best when the predicted outputs of SVM and MLP were used as the input variables of an SVM meta-classifier. The purpose of this study was to predict hyperlipidemia, one of the representative chronic diseases.
To do this, we used SVM and meta-learning algorithms, which are known to have high accuracy. The classification accuracy for hyperlipidemia of stacking as a meta-learner was higher than that of the other meta-learning algorithms. However, the predictive performance of the proposed meta-learning algorithm was the same as that of SVM (88.6%), the best-performing single model. The limitations of this study are as follows. First, various variable selection methods were tried, but most variables used in the study were categorical dummy variables. With many categorical variables, results may differ if continuous variables are used, because models such as decision trees can be better suited to categorical variables than general models such as neural networks. Despite these limitations, this study is significant in predicting hyperlipidemia with hybrid models such as meta-learning algorithms, which have not been studied previously. The improvement in model accuracy obtained by applying various variable selection techniques is also meaningful. In addition, the proposed model is expected to be effective for the prevention and management of hyperlipidemia.
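The stacking configuration described above, with SVM and MLP base learners whose predictions feed an SVM meta-classifier, can be sketched with scikit-learn's `StackingClassifier`. The data below are synthetic (generated with `make_classification` to match the balanced 2,176-subject design), and the feature scaling, hidden-layer size, 70/30 split, and 5-fold setting are assumptions, not the configuration used in the study.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import StackingClassifier
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for the balanced patient / non-patient data set.
X, y = make_classification(n_samples=2176, n_features=6, n_informative=4,
                           n_redundant=0, weights=[0.5, 0.5], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0)

# Level-0 learners: SVM and MLP, each with feature standardization.
base_learners = [
    ("svm", make_pipeline(StandardScaler(), SVC(random_state=0))),
    ("mlp", make_pipeline(StandardScaler(),
                          MLPClassifier(hidden_layer_sizes=(16,), max_iter=2000,
                                        random_state=0))),
]
# Level-1 learner: an SVM trained on the base learners' out-of-fold
# (5-fold cross-validated) outputs.
stack = StackingClassifier(estimators=base_learners,
                           final_estimator=SVC(random_state=0), cv=5)
stack.fit(X_train, y_train)
accuracy = stack.score(X_test, y_test)
print(f"stacking accuracy: {accuracy:.3f}")
```

Using out-of-fold predictions matters here: if the meta-classifier were trained on the base learners' predictions for their own training data, it would learn from overly optimistic inputs.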

Temporal distribution analysis of design rainfall by significance test of regression coefficients (회귀계수의 유의성 검정방법에 따른 설계강우량 시간분포 분석)

  • Park, Jin Hee;Lee, Jae Joon
    • Journal of Korea Water Resources Association / v.55 no.4 / pp.257-266 / 2022
  • Inundation damage is increasing every year due to localized heavy rain and an increase in rainfall exceeding the design frequency. Accordingly, hydraulic structures for flood control and defense are becoming more important. Hydraulic structures are designed according to their purpose and performance, and the flood discharge is an important design quantity. However, in Korea, because data are insufficient and observations are not reliable, design rainfall is used as input data for the hydrological analysis underlying the design of hydraulic structures. Accurate probability rainfall and its temporal distribution are important factors in estimating design rainfall. In practice, the regression equation for the temporal distribution of design rainfall is calculated from the cumulative rainfall percentages of Huff's quartile method, and a sixth-order polynomial regression equation, which shows high overall accuracy, is used uniformly. In this study, an optimized regression equation for the temporal distribution is derived using variable selection methods that follow the principle of parsimony in statistical modeling, and the derived equation is verified through significance tests. The results show that the stepwise selection method, which combines the advantages of forward selection and backward elimination, is the most appropriate for deriving the regression equation of the temporal distribution.
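Stepwise selection of polynomial terms can be sketched as follows in Python with NumPy. The intercept is always kept; each forward step adds the candidate power of x with the largest |t| if it exceeds `t_in`, and each backward step drops the weakest selected term if its |t| falls below `t_out`. The fixed t thresholds (2.0 and 1.5) are an assumed simplification of the significance tests used in the study, `stepwise_poly` is a hypothetical name, and the S-shaped test curve is synthetic, not a Huff quartile curve.

```python
import numpy as np


def t_stats(X, y):
    """Ordinary least-squares coefficients and their t-statistics."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sigma2 = resid @ resid / (X.shape[0] - X.shape[1])     # residual variance
    se = np.sqrt(sigma2 * np.diag(np.linalg.inv(X.T @ X)))  # coefficient std. errors
    return beta, beta / se


def stepwise_poly(x, y, max_order=6, t_in=2.0, t_out=1.5, max_steps=50):
    """Stepwise selection among the terms x**1 .. x**max_order."""
    powers = {k: x ** k for k in range(1, max_order + 1)}

    def design(terms):
        # Intercept column plus the chosen powers of x, in the given order.
        return np.column_stack([np.ones_like(x)] + [powers[k] for k in terms])

    selected = []
    for _ in range(max_steps):
        changed = False
        # Forward step: add the most significant candidate term, if any.
        best, best_t = None, t_in
        for k in powers:
            if k not in selected:
                _, t = t_stats(design(selected + [k]), y)
                if abs(t[-1]) > best_t:
                    best, best_t = k, abs(t[-1])
        if best is not None:
            selected.append(best)
            changed = True
        # Backward step: drop the least significant selected term, if any.
        if selected:
            _, t = t_stats(design(selected), y)
            worst = int(np.argmin(np.abs(t[1:])))
            if abs(t[1 + worst]) < t_out:
                selected.pop(worst)
                changed = True
        if not changed:
            break
    beta, _ = t_stats(design(selected), y)
    return selected, beta  # beta[0] is the intercept; beta[1:] follow `selected`


# Synthetic S-shaped cumulative curve (rainfall fraction vs. duration fraction).
rng = np.random.default_rng(4)
x = np.linspace(0.0, 1.0, 101)
y = 3 * x ** 2 - 2 * x ** 3 + rng.normal(0.0, 0.005, x.size)
terms, beta = stepwise_poly(x, y)
print("selected powers:", terms)
```

Choosing `t_in` larger than `t_out` keeps a term that has just been added from being removed in the same iteration, which helps the procedure terminate.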

Modeling of a PEM Fuel Cell Stack using Partial Least Squares and Artificial Neural Networks (부분최소자승법과 인공신경망을 이용한 고분자전해질 연료전지 스택의 모델링)

  • Han, In-Su;Shin, Hyun Khil
    • Korean Chemical Engineering Research / v.53 no.2 / pp.236-242 / 2015
  • We present two data-driven modeling methods, partial least squares (PLS) and artificial neural networks (ANN), to predict the major operating and performance variables of a polymer electrolyte membrane (PEM) fuel cell stack. PLS and ANN models were constructed from experimental data obtained by testing a 30 kW-class PEM fuel cell stack and were then compared in terms of prediction and computational performance. To reduce model complexity, we incorporated variable importance in projection (VIP) scores from PLS into the modeling procedure as a variable selection method, selecting the predictor variables from a set of input operating variables. The modeling results showed that the ANN models outperformed the PLS models in predicting the average cell voltage and cathode outlet temperature of the fuel cell stack. However, the PLS models also offered satisfactory prediction performance, although they can capture only linear correlations between the predictor and output variables. Depending on the required modeling accuracy and speed, both ANN and PLS models can be employed for performance prediction, offline and online optimization, control, and fault diagnosis in PEM fuel cell design and operation.
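VIP scores can be computed from a single-response PLS model fitted with the NIPALS algorithm. The Python/NumPy sketch below uses the standard formula VIP_j = √(p · Σ_a SS_a w_ja² / Σ_a SS_a), where w_a is the unit-length weight vector and SS_a the response variance explained by component a, so the squared scores sum to the number of predictors p. Keeping predictors with VIP > 1 is a common rule of thumb and is assumed here; the abstract does not give the threshold used. The function name `pls1_vip` is hypothetical and the data are synthetic.

```python
import numpy as np


def pls1_vip(X, y, n_components):
    """Fit a single-response PLS model with NIPALS and return the VIP
    score of every predictor (column of X)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    X = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)   # autoscale predictors
    y = y - y.mean()
    n_vars = X.shape[1]
    W, SS = [], []
    for _ in range(n_components):
        w = X.T @ y
        w /= np.linalg.norm(w)          # unit-length weight vector
        t = X @ w                       # component scores
        tt = t @ t
        p = X.T @ t / tt                # X loadings
        q = (y @ t) / tt                # y loading
        X = X - np.outer(t, p)          # deflate X
        y = y - q * t                   # deflate y
        W.append(w)
        SS.append(q ** 2 * tt)          # y variance explained by this component
    W, SS = np.array(W), np.array(SS)   # W has shape (n_components, n_vars)
    return np.sqrt(n_vars * (SS @ W ** 2) / SS.sum())


rng = np.random.default_rng(3)
X = rng.normal(size=(300, 5))
y = 3.0 * X[:, 0] + 0.5 * X[:, 1] + 0.1 * rng.normal(size=300)
vip = pls1_vip(X, y, n_components=2)
keep = np.flatnonzero(vip > 1.0)        # assumed "VIP > 1" selection rule
print("VIP:", np.round(vip, 2), "kept:", keep)
```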

Multi-Modal Controller Usability for Smart TV Control

  • Yu, Jeongil;Kim, Seongmin;Choe, Jaeho;Jung, Eui S.
    • Journal of the Ergonomics Society of Korea / v.32 no.6 / pp.517-528 / 2013
  • Objective: The objective of this study was to suggest a multi-modal controller type for Smart TV control. Background: Many issues regarding Smart TVs have recently arisen because of the increasing complexity of their features. One specific issue is what type of controller should be used to perform given tasks. This study examines current trends in controllers. Method: The participants had experience with Smart TVs and were 20 to 30 years of age. A pre-survey determined the first independent variable, five tasks (Live TV, Record, Share, Web, App Store). The second independent variable was the type of controller (Conventional, Mouse, Voice-Based Remote Controllers). The dependent variables were preference, task completion time, and error rate. The study consisted of three experiments: the first used a uni-modal controller for the tasks, the second a dual-modal controller, and the third a triple-modal controller. Results: The first experiment revealed that the uni-modal controllers (Conventional, Voice Controller) showed the best results for the Live TV task. The second experiment revealed that the dual-modal controllers (Conventional-Voice, Conventional-Mouse combinations) showed the best results for the Share, Web, and App Store tasks. The third experiment revealed that the triple-modal controller was not more effective than the dual-modal controller at any level. Conclusion: For simple tasks on a Smart TV, a uni-modal controller was more effective than a dual-modal controller, whereas complex tasks were better suited to the dual-modal controller. User preference for a controller differs according to the Smart TV function: for instance, user preference was high for the uni-modal controller for simple functions and for dual-modal controllers when the task was complex.
Additionally, depending on the task characteristics, user preference was high for the Voice Controller for channel and volume adjustment and for the Conventional Controller for menu selection. For text input, the Voice Controller was the most preferred, while the Mouse-type and Voice Controllers were the most preferred for performing a search or selecting items on the menu. Application: The results of this study may be used in designing a controller that can effectively carry out the various tasks of a Smart TV.

A Study on the Prediction Model of Stock Price Index Trend based on GA-MSVM that Simultaneously Optimizes Feature and Instance Selection (입력변수 및 학습사례 선정을 동시에 최적화하는 GA-MSVM 기반 주가지수 추세 예측 모형에 관한 연구)

  • Lee, Jong-sik;Ahn, Hyunchul
    • Journal of Intelligence and Information Systems / v.23 no.4 / pp.147-168 / 2017
  • Accurate stock market forecasting has been studied in academia for a long time, and many forecasting models based on various techniques now exist. Recently, many attempts have been made to predict the stock index using machine learning methods, including deep learning. Although both fundamental analysis and technical analysis are used in traditional stock trading, technical analysis is more useful for short-term trade prediction and for applying statistical and mathematical techniques. Most studies using these technical indicators have modeled stock price prediction as a binary classification (rising or falling) of future market movement, usually on the next trading day. However, binary classification has many drawbacks for predicting trends, identifying trading signals, or signaling portfolio rebalancing. In this study, we extend the existing binary approach to a multi-class system and predict the stock index trend as upward, boxed, or downward. Rather than solving this multi-class problem with techniques such as Multinomial Logistic Regression Analysis (MLOGIT), Multiple Discriminant Analysis (MDA), or Artificial Neural Networks (ANN), we use Multi-classification Support Vector Machines (MSVM), which have proved superior in prediction performance, and propose an optimization model that uses a Genetic Algorithm as a wrapper to improve MSVM performance. In particular, the proposed model, GA-MSVM, is designed to maximize model performance by simultaneously optimizing the kernel function parameters of the MSVM, the selection of input variables (feature selection), and the selection of training instances (instance selection).
To verify the performance of the proposed model, we applied it to real data. The results show that the proposed method is more effective than the conventional MSVM, previously known to show the best prediction performance, as well as existing artificial intelligence and data mining techniques such as MDA, MLOGIT, and CBR. In particular, instance selection was confirmed to play a very important role in predicting the stock index trend, contributing more to the model's improvement than the other factors. To verify the usefulness of GA-MSVM, we applied it to forecasting the trend of Korea's KOSPI200 stock index. Our research primarily aims to predict trend segments in order to capture trading signals or short-term trend transition points. The experimental data set for the KOSPI200 index covers 2004 to 2017 and includes technical indicators such as price and the volatility index, as well as macroeconomic data (interest rate, exchange rate, S&P 500, etc.). Using a variety of statistical methods, including one-way ANOVA and stepwise MDA, 15 indicators were selected as candidate independent variables. The dependent variable, the trend class, takes three states: 1 (upward trend), 0 (boxed), and -1 (downward trend). For each class, 70% of the data was used for training and the remaining 30% for validation. To verify the performance of the proposed model, comparative experiments were conducted with MDA, MLOGIT, CBR, ANN, and MSVM. The MSVM adopted the One-Against-One (OAO) approach, known as the most accurate of the various MSVM approaches. Although there are some limitations, the final experimental results demonstrate that the proposed model, GA-MSVM, performs at a significantly higher level than all comparative models.
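The wrapper idea behind GA-MSVM can be illustrated with a binary genetic algorithm whose chromosome concatenates a feature mask and an instance mask. The Python sketch below uses tournament selection, uniform crossover, bit-flip mutation, and elitism; the caller supplies `fitness(feature_mask, instance_mask)`, which in a real wrapper would train the MSVM on the selected features and training instances and return validation accuracy (returning 0 for empty masks). The kernel-parameter genes of GA-MSVM are omitted, the GA settings are assumptions, and `genetic_selection` and the toy fitness function are hypothetical.

```python
import random


def genetic_selection(n_features, n_instances, fitness, pop_size=30,
                      generations=40, p_mut=0.02, seed=0):
    """Binary GA over a chromosome = feature mask + instance mask.

    Returns the best (feature_mask, instance_mask) found."""
    rng = random.Random(seed)
    length = n_features + n_instances

    def split(ch):
        return ch[:n_features], ch[n_features:]

    def score(ch):
        return fitness(*split(ch))

    def tournament(pop, scores, k=3):
        picks = rng.sample(range(len(pop)), k)
        return pop[max(picks, key=lambda i: scores[i])]

    pop = [[rng.randint(0, 1) for _ in range(length)] for _ in range(pop_size)]
    for _ in range(generations):
        scores = [score(ch) for ch in pop]
        elite = pop[max(range(pop_size), key=lambda i: scores[i])]
        children = [elite[:]]                             # elitism: keep the best
        while len(children) < pop_size:
            a, b = tournament(pop, scores), tournament(pop, scores)
            child = [ga if rng.random() < 0.5 else gb for ga, gb in zip(a, b)]  # uniform crossover
            child = [1 - g if rng.random() < p_mut else g for g in child]       # bit-flip mutation
            children.append(child)
        pop = children
    return split(max(pop, key=score))


# Toy fitness (hypothetical): reward agreement with a known "ideal" mask.
TARGET_F = [1, 0, 1, 0, 1]
TARGET_I = [1] * 7 + [0] * 3


def toy_fitness(feature_mask, instance_mask):
    return (sum(a == b for a, b in zip(feature_mask, TARGET_F))
            + sum(a == b for a, b in zip(instance_mask, TARGET_I)))


features, instances = genetic_selection(5, 10, toy_fitness)
print(features, instances, toy_fitness(features, instances))
```

Encoding both masks in one chromosome lets crossover and mutation trade off feature and instance choices jointly, which is what allows a wrapper of this kind to optimize them simultaneously rather than one after the other.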