• Title/Summary/Keyword: Multivariate Techniques

Evaluating Variable Selection Techniques for Multivariate Linear Regression (다중선형회귀모형에서의 변수선택기법 평가)

  • Ryu, Nahyeon;Kim, Hyungseok;Kang, Pilsung
    • Journal of Korean Institute of Industrial Engineers
    • /
    • v.42 no.5
    • /
    • pp.314-326
    • /
    • 2016
  • The purpose of variable selection techniques is to select a subset of relevant variables for a particular learning algorithm in order to improve both the accuracy and the efficiency of the prediction model. We conduct an empirical analysis to evaluate and compare seven well-known variable selection techniques for the multiple linear regression model, which is one of the most commonly used regression models in practice. The variable selection techniques we apply are forward selection, backward elimination, stepwise selection, genetic algorithm (GA), ridge regression, lasso (Least Absolute Shrinkage and Selection Operator), and elastic net. Based on experiments with 49 regression data sets, we find that GA yields the lowest error rates, while lasso reduces the number of variables the most. In terms of computational efficiency, forward selection, backward elimination, and lasso require less time than the other techniques.
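To illustrate how lasso can drop variables entirely, the following is a minimal sketch of cyclic coordinate descent with soft-thresholding. It is not the authors' implementation: the function names, the toy data, and the penalty value are all assumptions made for illustration, and the feature columns are assumed to be centered.

```python
def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold; return 0.0 inside the dead zone."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def lasso_coordinate_descent(columns, y, penalty, n_sweeps=100):
    """Minimize (1/2n)*||y - Xb||^2 + penalty*||b||_1 by cyclic coordinate descent.

    columns: list of feature columns (each a list of n floats), assumed centered
    and not identically zero.
    """
    n = len(y)
    p = len(columns)
    coefs = [0.0] * p
    for _ in range(n_sweeps):
        for j, col in enumerate(columns):
            # Partial residual: remove every feature's contribution except feature j.
            residual = [
                y[i] - sum(coefs[k] * columns[k][i] for k in range(p) if k != j)
                for i in range(n)
            ]
            rho = sum(col[i] * residual[i] for i in range(n)) / n
            scale = sum(v * v for v in col) / n
            coefs[j] = soft_threshold(rho, penalty) / scale
    return coefs


# Hypothetical toy data: y depends strongly on x1 and only weakly on x2.
x1 = [1.0, 1.0, -1.0, -1.0]
x2 = [1.0, -1.0, 1.0, -1.0]
y = [2.1, 1.9, -1.9, -2.1]

coefs = lasso_coordinate_descent([x1, x2], y, penalty=0.5)
selected = [j for j, b in enumerate(coefs) if b != 0.0]
print("coefficients:", coefs)
print("selected feature indices:", selected)
```

Because the weak feature's correlation with the residual falls below the penalty, its coefficient is set to exactly zero, which is the mechanism that lets lasso act as a variable selector. Ridge regression, by contrast, shrinks coefficients without zeroing them.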

Evaluation of the Geum River by Multivariate Analysis: Principal Component Analysis and Factor Analysis (다변량분석법을 이용한 금강 유역의 수질오염특성 연구)

  • Kim, Mi-Ah;Lee, Jae-kwan;Zoh, Kyung-Duk
    • Journal of Korean Society on Water Environment
    • /
    • v.23 no.1
    • /
    • pp.161-168
    • /
    • 2007
  • This work evaluates the water quality of the Geum River using pollution data obtained by monitoring during the period 2001-2005. Three data matrices, 19 (all monitoring stations) × 13 (parameters), 60 (months) × 13 (parameters), and 20 (seasons) × 13 (parameters), were analyzed with multivariate techniques, namely factor analysis/principal component analysis (FA/PCA). For the 19 × 13 matrix, FA/PCA identified two factors: a pollutant loading factor (BOD, COD, pH, Cond, T-N, T-P, $NH_3$-N, $NO_3$-N, $PO_4$-P, Chl-a) and a seasonal factor (water temp, SS). For the 60 × 13 and 20 × 13 matrices, it identified three factors: a pollutant loading factor (BOD, COD, Cond, T-N, T-P, $NH_3$-N, $NO_3$-N, $PO_4$-P), a seasonal factor (water temp, SS), and a metabolic factor (Chl-a, pH). The pollutant loading factor is the dominant factor in the Geum River, accounting for 71.16% across all sampling stations, 52.75% by month, and 56.57% by season at the main water quality stations. In the all-stations analysis, the pollutant loading factor was influential at the Gongju 1, Gongju 2, Buyeo 1, Buyeo 2, Gangkyeong, and Yeongi stations. In the monthly analysis, it was influential in all months at the Gongju 1 and Cheongwon stations, and in April, May, July, and August at Buyeo 1. In the seasonal analysis, it was influential in winter at Gongju 1 and Buyeo 1 among the main sampling stations, whereas Cheongwon showed no seasonal influence. This study demonstrates the necessity and usefulness of multivariate statistical techniques for evaluating and interpreting large, complex data sets in order to obtain better information for the effective management of water resources.

Performance Analysis of Volatility Models for Estimating Portfolio Value at Risk (포트폴리오 VaR 측정을 위한 변동성 모형의 성과분석)

  • Yeo, Sung Chil;Li, Zhaojing
    • The Korean Journal of Applied Statistics
    • /
    • v.28 no.3
    • /
    • pp.541-559
    • /
    • 2015
  • VaR is now widely used as an important tool to evaluate and manage financial risks. In particular, it is important to select an appropriate volatility model for the rate of return of financial assets. In this study, both univariate and multivariate models are considered to evaluate the VaR of a portfolio composed of the KOSPI, Hang-Seng, and Nikkei indexes, and their performance is compared through backtesting techniques. Overall, multivariate models are shown to be more appropriate than univariate models for estimating the portfolio VaR; in particular, the DCC and ADCC models are shown to be superior to the others.
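The abstract does not say which backtesting techniques were used. As a rough sketch of the simplest one, the code below counts VaR violations (days on which the realized loss exceeds the forecast VaR) and compares the observed violation rate with the rate implied by the confidence level. The function name, the sign convention (VaR reported as a positive loss), and the sample numbers are assumptions for illustration.

```python
def backtest_var(returns, var_forecasts, level=0.99):
    """Count days on which the realized loss exceeded the forecast VaR.

    returns: realized portfolio returns (losses are negative numbers).
    var_forecasts: VaR forecasts as positive loss magnitudes, one per day.
    level: VaR confidence level, so the expected violation rate is 1 - level.
    """
    if len(returns) != len(var_forecasts):
        raise ValueError("returns and var_forecasts must have the same length")
    if not returns:
        raise ValueError("at least one observation is required")
    violations = sum(1 for r, v in zip(returns, var_forecasts) if r < -v)
    return {
        "violations": violations,
        "observed_rate": violations / len(returns),
        "expected_rate": 1.0 - level,
    }


# Hypothetical four-day sample with a constant 2% VaR forecast.
result = backtest_var([-0.03, 0.01, -0.01, -0.05], [0.02] * 4, level=0.95)
print(result)
```

A volatility model whose observed violation rate stays close to the expected rate is usually judged better calibrated; formal tests build on this same violation count.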

Bearing fault detection through multiscale wavelet scalogram-based SPC

  • Jung, Uk;Koh, Bong-Hwan
    • Smart Structures and Systems
    • /
    • v.14 no.3
    • /
    • pp.377-395
    • /
    • 2014
  • Vibration-based fault detection and condition monitoring of rotating machinery, using statistical process control (SPC) combined with statistical pattern recognition methodology, has been widely investigated. In particular, the discrete wavelet transform (DWT) is considered a powerful tool for feature extraction when detecting faults in rotating machinery. Although DWT significantly reduces the dimensionality of the data, the number of retained wavelet features can still be large. In that case, standard multivariate SPC techniques are not advised, because the sample covariance matrix is likely to be singular, so the common multivariate statistics cannot be calculated. Even though many feature-based SPC methods have been introduced to tackle this deficiency, most require a parametric distributional assumption that restricts their feasibility to specific process control problems and thus limits their application. This study proposes a nonparametric multivariate control chart method, based on multiscale wavelet scalogram (MWS) features, that overcomes the limitation posed by the parametric assumption in existing SPC methods. The presented approach takes advantage of multi-resolution analysis using DWT and obtains MWS features with significantly lower dimensionality. We calculate a Hotelling's $T^2$-type monitoring statistic from the MWS features, which has sufficient damage-discrimination ability. A bootstrap approach is used to determine the upper control limit of the monitoring statistic without any distributional assumption. Numerical simulations demonstrate the performance of the proposed control charting method under various damage-level scenarios for a bearing system.

Analysis of biodiesel quality based on infrared spectroscopy and multivariate statistics (적외선 분광분석과 다변량 통계에 기반한 바이오디젤 품질분석)

  • Kim, Hye-Sil;Cho, Hyun-Woo;Liu, J. Jay
    • Analytical Science and Technology
    • /
    • v.25 no.4
    • /
    • pp.214-222
    • /
    • 2012
  • ASTM (American Society for Testing and Materials) D6751-10 specifies analytical methods as well as specifications for biodiesel quality. However, following the ASTM testing methods to analyze biodiesel and its various impurities is expensive and time-consuming. This paper develops a quantitative analysis system for biodiesel and impurities based on infrared spectroscopy and a multivariate statistical method, PLS (partial least squares). In addition, four different pre-processing techniques were compared for spectrum correction and noise reduction. Savitzky-Golay pre-processing showed the best performance.
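The abstract does not give the Savitzky-Golay settings used. As a minimal sketch of the idea, the code below applies the classic 5-point quadratic smoothing filter, whose weights are (-3, 12, 17, 12, -3)/35. The function name, the choice to leave the two points at each edge unchanged, and the sample spectrum are assumptions for illustration.

```python
SG_WEIGHTS = (-3, 12, 17, 12, -3)  # 5-point quadratic/cubic smoothing weights
SG_NORM = 35


def savgol_smooth_5pt(signal):
    """Smooth a 1-D signal with the 5-point Savitzky-Golay filter.

    Each interior point is replaced by the value at the window center of a
    least-squares quadratic fit to its 5-point neighborhood. The two points
    at each end are left unchanged (an assumption of this sketch).
    """
    smoothed = list(signal)
    for i in range(2, len(signal) - 2):
        window = signal[i - 2 : i + 3]
        smoothed[i] = sum(w * v for w, v in zip(SG_WEIGHTS, window)) / SG_NORM
    return smoothed


# Hypothetical absorbance values with one noisy spike at index 3.
spectrum = [0.10, 0.12, 0.15, 0.40, 0.21, 0.24, 0.26]
print(savgol_smooth_5pt(spectrum))
```

Unlike a plain moving average, this filter reproduces any locally quadratic signal exactly, so it tends to reduce noise while distorting peak shapes less, which matters when spectra feed a PLS calibration model.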

Profiling Program Behavior with $X^2$ distance-based Multivariate Analysis for Intrusion Detection (침입탐지를 위한 $X^2$ 거리기반 다변량 분석기법을 이용한 프로그램 행위 프로파일링)

  • Kim, Chong-Il;Kim, Yong-Min;Seo, Jae-Hyeon;Noh, Bong-Nam
    • The KIPS Transactions:PartC
    • /
    • v.10C no.4
    • /
    • pp.397-404
    • /
    • 2003
  • Intrusion detection techniques based on program behavior can detect potential intrusions against systems by analyzing system calls made by daemon programs or root-privileged programs and building program profiles. However, there is a drawback: large profiles must be built for each program. In this paper, we apply $X^2$ distance-based multivariate analysis to profiling program behavior and detecting abnormal behavior in order to reduce the size of the profiles. Experimental results show that the profiles are relatively small and the detection rate is significant.
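The abstract does not spell out the distance formula. One common form of the $X^2$ distance compares an observed vector of system-call counts with the mean counts in a normal-behavior profile, $X^2 = \sum_i (x_i - \bar{x}_i)^2 / \bar{x}_i$. The sketch below assumes that form; the function names, the sample traces, and the choice to skip calls whose profile mean is zero are assumptions for illustration, not the paper's method.

```python
from collections import Counter


def build_profile(normal_traces, call_names):
    """Average per-trace count of each system call over the normal traces."""
    totals = Counter()
    for trace in normal_traces:
        totals.update(trace)
    return [totals[name] / len(normal_traces) for name in call_names]


def count_vector(trace, call_names):
    """Count how often each system call of interest appears in one trace."""
    counts = Counter(trace)
    return [counts[name] for name in call_names]


def chi_square_distance(observed, expected):
    """Sum of (observed - expected)^2 / expected over calls with expected > 0."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)


# Hypothetical system-call traces for one program.
call_names = ["open", "read", "close"]
normal_traces = [["open", "read", "read"], ["open", "read", "close"]]
profile = build_profile(normal_traces, call_names)

suspect = count_vector(["open", "open", "read"], call_names)
distance = chi_square_distance(suspect, profile)
print("profile:", profile, "distance:", distance)
```

The profile here is a single mean vector per program, which is why this kind of approach can keep profiles small. A trace would be flagged as abnormal when its distance exceeds a threshold chosen from the distances of known-normal traces.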

MBRDR: R-package for response dimension reduction in multivariate regression

  • Heesung Ahn;Jae Keun Yoo
    • Communications for Statistical Applications and Methods
    • /
    • v.31 no.2
    • /
    • pp.179-189
    • /
    • 2024
  • In multivariate regression with a high-dimensional response $Y \in \mathbb{R}^r$ and a relatively low-dimensional predictor $X \in \mathbb{R}^p$ (where $r \geq 2$), statistical analysis presents significant challenges due to the exponential increase in the number of parameters as the dimension of the response grows. Most existing dimension reduction techniques focus on reducing the dimension of the predictors ($X$), not the dimension of the response variable ($Y$). Yoo and Cook (2008) introduced a response dimension reduction method that preserves information about the conditional mean $E(Y \mid X)$. Building upon this foundational work, Yoo (2018) proposed two semi-parametric methods, principal response reduction (PRR) and principal fitted response reduction (PFRR), and then extended them to unstructured principal fitted response reduction (UPFRR) (Yoo, 2019). This paper reviews the four response dimension reduction methodologies mentioned above. In addition, it introduces the implementation of the mbrdr package in R. The mbrdr package is a unique tool in the R community, as it is specifically designed for response dimension reduction, setting it apart from existing dimension reduction packages that focus solely on predictors.

Application of Some Multivariate Analysis Techniques to Coppice Growth Measures (다변량분석방법(多変量分析方法)에 의한 맹아생장(萌芽生長) 자료(資料) 분석(分析))

  • Lee, Don Koo
    • Journal of Korean Society of Forest Science
    • /
    • v.50 no.1
    • /
    • pp.45-48
    • /
    • 1980
  • Multivariate analysis methods were used to examine the relationships between top and bottom growth variables of hybrid poplars after coppicing and to discriminate between clones in coppice growth potential. A strong linear relationship was exhibited between top and bottom growth variables. Clone 5328 differed from the other clones and was the best among them in coppice growth potential.

Empirical Bayes Posterior Odds Ratio for Heteroscedastic Classification

  • Kim, Hea-Jung
    • Journal of the Korean Statistical Society
    • /
    • v.16 no.2
    • /
    • pp.92-101
    • /
    • 1987
  • Our interest is to assess in some way the relative odds or probability that a multivariate observation Z belongs to one of k multivariate normal populations with unequal covariance matrices. We derive the empirical Bayes posterior odds ratio for the classification rule when the population parameters are unknown. It is a generalization of the posterior odds ratio suggested by Geisser (1964). The classification rule does not involve the complicated distribution theory that many techniques from the sampling viewpoint require. The proposed posterior odds ratio is compared to Geisser's posterior odds ratio through a Monte Carlo study. The results show that the empirical Bayes posterior odds ratio, in general, performs better than Geisser's. The improvement is especially pronounced when the dimension of Z is large and the training sample is small.

Effect of Dimension Reduction on Prediction Performance of Multivariate Nonlinear Time Series

  • Jeong, Jun-Yong;Kim, Jun-Seong;Jun, Chi-Hyuck
    • Industrial Engineering and Management Systems
    • /
    • v.14 no.3
    • /
    • pp.312-317
    • /
    • 2015
  • The dynamic system approach to time series has been used in many real problems. Based on Takens' embedding theorem, we can build a predictive function whose input is the time-delay coordinate vector, which consists of lagged values of the observed series, and whose output is the future values of the observed series. Although the time-delay coordinate vector from a multivariate time series carries more information than the one from a univariate time series, it can exhibit statistical redundancy, which degrades the performance of the prediction function. We apply dimension reduction techniques to solve this problem and analyze the effect of this approach on prediction. Our experiment uses delayed Lorenz series; least squares support vector regression approximates the predictive function. The result shows that linearly preserving projection improves the prediction performance.
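To make the time-delay construction concrete, the following is a minimal sketch that turns a univariate series into (input vector, target) pairs. The function name, the parameter names `dim`, `tau`, and `horizon`, and the sample series are assumptions for illustration; the paper's own embedding settings are not given in the abstract.

```python
def delay_embed(series, dim, tau=1, horizon=1):
    """Build (delay vector, future value) pairs from a univariate series.

    dim: number of lagged values in each delay vector (embedding dimension).
    tau: spacing between consecutive lags (time delay).
    horizon: how many steps ahead the target lies.
    Each delay vector is ordered oldest value first.
    """
    if dim < 1 or tau < 1 or horizon < 1:
        raise ValueError("dim, tau, and horizon must be positive integers")
    pairs = []
    first = (dim - 1) * tau  # earliest index with a full set of lags
    for t in range(first, len(series) - horizon):
        vector = [series[t - k * tau] for k in range(dim - 1, -1, -1)]
        pairs.append((vector, series[t + horizon]))
    return pairs


# Hypothetical short series used only to show the shape of the output.
pairs = delay_embed([0, 1, 2, 3, 4, 5], dim=3, tau=1, horizon=1)
for vector, target in pairs:
    print(vector, "->", target)
```

For a multivariate series, the delay vectors of the individual components would be concatenated, which is where the redundancy discussed in the abstract arises and why a dimension reduction step before regression can help.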