• Title/Summary/Keyword: Multivariate Data


Principal selected response reduction in multivariate regression (다변량회귀에서 주선택 반응변수 차원축소)

  • Yoo, Jae Keun
    • The Korean Journal of Applied Statistics, v.34 no.4, pp.659-669, 2021
  • Multivariate regression often appears in longitudinal or functional data analysis. Because multivariate regression involves multi-dimensional response variables, it is more strongly affected by the so-called curse of dimensionality than univariate regression. To overcome this issue, Yoo (2018) and Yoo (2019a) proposed three model-based response dimension reduction methodologies. According to various numerical studies in Yoo (2019a), the default method suggested in Yoo (2019a) is the least sensitive to the simulated models, but it is not the best-performing one. To address this issue, this paper proposes a selection algorithm that compares the other two methods with the default one. This approach is called principal selected response reduction. Various simulation studies show that the proposed method provides more accurate estimation results than the default method of Yoo (2019a), confirming the practical and empirical usefulness of the proposed method over the default one.

Fault Detection Method for Multivariate Process using ICA (독립성분분석을 이용한 다변량 공정에서의 고장탐지 방법)

  • Jung, Seunghwan;Kim, Minseok;Lee, Hansoo;Kim, Jonggeun;Kim, Sungshin
    • Journal of the Korea Institute of Information and Communication Engineering, v.24 no.2, pp.192-197, 2020
  • Multivariate processes, such as large-scale power plants or chemical processes, are operated in very hazardous environments, where a fault can lead to significant human and material losses. On-line monitoring technology is therefore essential for detecting system faults. In this paper, the ICA-based fault detection method is applied to three different multivariate process datasets. The ICA-based fault detection procedure is divided into off-line and on-line phases. The off-line phase determines a fault detection threshold from a dataset obtained while the system is operating normally. The on-line phase computes statistics of query vectors measured in real time, and a fault is detected by comparing the computed statistics with the previously defined threshold. For comparison, a PCA-based fault detection method is also implemented. Experimental results show that the ICA-based method detects system faults earlier and more effectively than the PCA-based method.
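The off-line/on-line split described above can be sketched for the PCA-based baseline that the paper uses for comparison. This is a minimal sketch under stated assumptions, not the paper's ICA implementation: it uses NumPy, monitors only the Hotelling T² statistic (no SPE/Q statistic), and sets the threshold as an empirical 99th percentile of T² on normal data. The names `fit_pca_monitor` and `detect` are hypothetical.

```python
import numpy as np


def _t2(Z, P, lam):
    """Hotelling T^2: squared PC scores scaled by their variances, summed per row."""
    scores = Z @ P
    return np.sum(scores**2 / lam, axis=1)


def fit_pca_monitor(X_normal, n_components, quantile=0.99):
    """Off-line phase: learn a PCA model and a T^2 threshold from fault-free data."""
    mean = X_normal.mean(axis=0)
    std = X_normal.std(axis=0, ddof=1)
    Z = (X_normal - mean) / std  # standardize each variable
    eigvals, eigvecs = np.linalg.eigh(np.cov(Z, rowvar=False))
    order = np.argsort(eigvals)[::-1][:n_components]  # largest-variance components
    P, lam = eigvecs[:, order], eigvals[order]
    # Assumption: empirical quantile threshold instead of an F-distribution limit.
    threshold = np.quantile(_t2(Z, P, lam), quantile)
    return {"mean": mean, "std": std, "P": P, "lam": lam, "threshold": threshold}


def detect(model, X_query):
    """On-line phase: flag query vectors whose T^2 exceeds the off-line threshold."""
    Z = (X_query - model["mean"]) / model["std"]
    return _t2(Z, model["P"], model["lam"]) > model["threshold"]


rng = np.random.default_rng(0)
model = fit_pca_monitor(rng.normal(size=(500, 5)), n_components=5)
print(detect(model, rng.normal(size=(3, 5)) + 10.0))  # shifted (simulated faulty) samples
```

An ICA-based monitor follows the same two-phase pattern; only the projection (independent components instead of principal components) and the monitored statistics change.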

Analyzing the Impact of Multivariate Inputs on Deep Learning-Based Reservoir Level Prediction and Approaches for Mid to Long-Term Forecasting (다변량 입력이 딥러닝 기반 저수율 예측에 미치는 영향 분석과 중장기 예측 방안)

  • Hyeseung Park;Jongwook Yoon;Hojun Lee;Hyunho Yang
    • The Transactions of the Korea Information Processing Society, v.13 no.4, pp.199-207, 2024
  • Local reservoirs are crucial sources of agricultural water supply, which necessitates stable water level management to prepare for extreme climate conditions such as droughts. Water level prediction is significantly influenced by local climate characteristics, such as localized rainfall, as well as by seasonal factors, including cropping times, so understanding the correlation between input and output data is as essential as selecting an appropriate prediction model. In this study, extensive multivariate data from over 400 reservoirs in Jeollabuk-do from 1991 to 2022 were used to train and validate a water level prediction model that comprehensively reflects the complex hydrological and climatological environmental factors of each reservoir, and to analyze the impact of each input feature on water level prediction performance. Rather than improving prediction performance through neural network architecture, the study adopts a basic feedforward neural network composed of fully connected layers, batch normalization, dropout, and activation functions, and focuses on the correlation between multivariate input data and prediction performance. In addition, most existing studies present only short-term, daily prediction performance, which is not suitable for practical settings that require medium- to long-term predictions, such as 10 days or a month ahead. Therefore, this study measured water level prediction performance up to one month ahead using a recursive method in which each daily prediction is used as the next input. The experiments identified how performance changes with the prediction period and, through an ablation study, analyzed the impact of each input feature on overall performance.
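The recursive strategy (feeding each daily prediction back as the next input) can be sketched independently of the network itself. In this minimal sketch, `one_step` is a hypothetical stand-in for the trained feedforward model, and only the target series is fed back; the study's other multivariate inputs (rainfall and so on) would also need values for future days, which this sketch deliberately omits.

```python
from typing import Callable, List, Sequence


def recursive_forecast(
    one_step: Callable[[Sequence[float]], float],
    history: Sequence[float],
    window: int,
    horizon: int,
) -> List[float]:
    """Forecast `horizon` days ahead by feeding each prediction back as input."""
    buf = list(history[-window:])  # most recent `window` observed values
    preds: List[float] = []
    for _ in range(horizon):
        y_hat = one_step(buf)      # one-day-ahead prediction
        preds.append(y_hat)
        buf = buf[1:] + [y_hat]    # slide the window, appending the prediction
    return preds


# Hypothetical one-step model: persistence plus a constant daily change.
print(recursive_forecast(lambda w: w[-1] + 0.5, [60.0, 60.5, 61.0], window=3, horizon=30)[-1])  # 76.0
```

Because every step after the first consumes earlier predictions, one-step errors can accumulate over a 30-day horizon, which is one reason performance is reported as a function of the prediction period.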

Penalized least distance estimator in the multivariate regression model (다변량 선형회귀모형의 벌점화 최소거리추정에 관한 연구)

  • Jungmin Shin;Jongkyeong Kang;Sungwan Bang
    • The Korean Journal of Applied Statistics, v.37 no.1, pp.1-12, 2024
  • In many real-world datasets, multiple response variables depend on the same set of explanatory variables. In particular, if several response variables are correlated with each other, simultaneous estimation that accounts for the correlation between response variables may be more effective than analyzing each response variable separately. In this multivariate regression setting, the least distance estimator (LDE) estimates the regression coefficients simultaneously by minimizing the sum of Euclidean distances between the training observations and their estimates in multidimensional space. It is also robust to outliers. In this paper, we examine least distance estimation in multivariate linear regression analysis and, furthermore, present the penalized least distance estimator (PLDE) for efficient variable selection. Specifically, we propose the LDE with an adaptive group LASSO penalty term (AGLDE), which reflects the correlation between response variables in the model and efficiently selects variables according to the importance of the explanatory variables. The validity of the proposed method was confirmed through simulations and real data analysis.
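The unpenalized LDE minimizes the sum over observations i of ||y_i − Bᵀx_i||₂, a sum of unsquared Euclidean norms rather than squared ones. The abstract does not state how the estimate is computed, so this sketch assumes iteratively reweighted least squares (a Weiszfeld-type scheme), one standard way to minimize such an objective. It omits the adaptive group LASSO penalty that defines AGLDE, and `least_distance_estimator` is a hypothetical name.

```python
import numpy as np


def least_distance_estimator(X, Y, n_iter=500, tol=1e-10, eps=1e-10):
    """Minimize sum_i ||Y[i] - X[i] @ B||_2 over the (p x q) coefficient matrix B.

    Assumption: solved by IRLS (Weiszfeld-type). Each row gets weight
    1 / ||residual_i||, then a weighted least-squares problem is solved.
    """
    B = np.linalg.lstsq(X, Y, rcond=None)[0]  # OLS starting value
    for _ in range(n_iter):
        resid_norm = np.linalg.norm(Y - X @ B, axis=1)
        w = 1.0 / np.maximum(resid_norm, eps)  # eps guards against zero residuals
        Xw = X * w[:, None]
        B_new = np.linalg.solve(X.T @ Xw, Xw.T @ Y)  # (X'WX)^{-1} X'WY
        if np.max(np.abs(B_new - B)) < tol:
            return B_new
        B = B_new
    return B


rng = np.random.default_rng(1)
n = 200
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])  # intercept + 2 predictors
B_true = np.array([[1.0, -1.0], [2.0, 0.5], [0.0, 3.0]])    # illustrative coefficients
Y = X @ B_true + 0.5 * rng.normal(size=(n, 2))
Y[:10] += 50.0  # contaminate 5% of the response vectors
B_ols = np.linalg.lstsq(X, Y, rcond=None)[0]
B_lde = least_distance_estimator(X, Y)
print(np.abs(B_ols - B_true).max(), np.abs(B_lde - B_true).max())
```

On contaminated data like this, the outlying rows typically pull the OLS fit much further than the LDE fit, because each residual enters the objective linearly rather than squared. This is the robustness the abstract refers to.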

The Comparison of Singular Value Decomposition and Spectral Decomposition

  • Shin, Yang-Gyu
    • Journal of the Korean Data and Information Science Society, v.18 no.4, pp.1135-1143, 2007
  • The singular value decomposition and the spectral decomposition are useful matrix computation methods for multivariate techniques such as principal component analysis and multidimensional scaling. These techniques aim to find a simpler geometric structure for the data points, and both decompositions are used in them for this purpose. In this paper, the singular value decomposition and the spectral decomposition are compared.
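For principal component analysis, the two decompositions are linked by a standard identity. If the centered data matrix is X = U diag(s) Vᵀ, then XᵀX = V diag(s²) Vᵀ. The eigenvalues of XᵀX are therefore the squared singular values, and its eigenvectors are the right singular vectors up to sign. The short NumPy sketch below checks this on random data. The data are illustrative only, and the sketch is not a reproduction of the paper's comparison.

```python
import numpy as np

rng = np.random.default_rng(42)
X = rng.normal(size=(50, 4))
Xc = X - X.mean(axis=0)  # column-centered data matrix, as in PCA

# Singular value decomposition of the data matrix: Xc = U diag(s) Vt
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)

# Spectral (eigen) decomposition of the cross-product matrix: Xc'Xc = V diag(lam) V'
lam, V = np.linalg.eigh(Xc.T @ Xc)
order = np.argsort(lam)[::-1]  # eigh returns ascending order; sort descending
lam, V = lam[order], V[:, order]

print(np.allclose(s**2, lam))                               # eigenvalues vs squared singular values
print(np.allclose(np.abs(np.sum(Vt.T * V, axis=0)), 1.0))   # same directions up to sign
print(np.allclose(U * s, Xc @ Vt.T))                        # PC scores from either route
```

The SVD route works on X directly and avoids forming XᵀX, which can square the condition number. The spectral route needs only the p × p cross-product (or covariance) matrix.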


Evaluation of Water Quality Using Multivariate Statistic Analysis in Busan Coastal Area

  • Kim, Sang-Soo;Cho, Jang-Sik
    • Journal of the Korean Data and Information Science Society, v.15 no.3, pp.531-542, 2004
  • Principal component analysis and cluster analysis were conducted to comprehensively evaluate the water quality of the Busan coastal area, using surface water data collected seasonally at 10 stations from 1997 to 2003. The first principal component was interpreted as a factor related to the input of nutrient-rich fresh water, and the second principal component as reflecting meteorological characteristics. We also found that the water quality at stations 4 and 9 differed from that at the other stations in the Busan coastal area.


Properties of alternative VaR for multivariate normal distributions (다변량 정규분포에서 대안적인 VaR의 특성)

  • Hong, Chong Sun;Lee, Gi Pum
    • Journal of the Korean Data and Information Science Society, v.27 no.6, pp.1453-1463, 2016
  • The most useful financial risk measure may be VaR (Value at Risk), which statistically estimates the maximum loss amount. In many industries, VaR tends to be estimated by transforming the multivariate risk into a univariate one using the variance-covariance matrix and a specific portfolio. Hong et al. (2016) defined the Vector at Risk based on the multivariate quantile vector. When a specific portfolio is given, one point on the Vector at Risk is found as the best VaR, which is called the alternative VaR (AVaR). In this work, AVaRs are investigated for multivariate normal distributions with various variance-covariance matrices and portfolio weight vectors, and are compared with VaRs. The AVaR was found to have smaller values than the VaR. Some properties of the AVaR are derived and discussed in light of these characteristics.
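The conventional approach that the AVaR is compared against reduces the multivariate problem to a univariate one. Under a multivariate normal model, a portfolio w has return wᵀR ~ N(wᵀμ, wᵀΣw), so its VaR is a normal quantile. This sketch covers only that baseline. The AVaR depends on the multivariate quantile vector of Hong et al. (2016), which the abstract does not define, so it is not reproduced. The sign convention (VaR reported as a positive loss quantile), the function name, and the example numbers are assumptions.

```python
import math
from statistics import NormalDist

import numpy as np


def variance_covariance_var(weights, mu, cov, alpha=0.95):
    """Parametric VaR of a portfolio under a multivariate normal model.

    The portfolio return w'R is univariate normal with mean w'mu and variance
    w' Sigma w; VaR is the alpha-quantile of the loss -w'R (a positive number).
    """
    w = np.asarray(weights, dtype=float)
    port_mean = float(w @ np.asarray(mu, dtype=float))
    port_sd = math.sqrt(float(w @ np.asarray(cov, dtype=float) @ w))
    z = NormalDist().inv_cdf(alpha)  # standard normal alpha-quantile
    return z * port_sd - port_mean


cov = [[0.04, 0.01], [0.01, 0.09]]  # illustrative covariance, not from the paper
print(variance_covariance_var([0.6, 0.4], mu=[0.0, 0.0], cov=cov))
```

Changing the covariance matrix or the weight vector, as the paper does for the AVaR, changes only the scalar portfolio standard deviation in this baseline.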

Decoding Brain Patterns for Colored and Grayscale Images using Multivariate Pattern Analysis

  • Zafar, Raheel;Malik, Muhammad Noman;Hayat, Huma;Malik, Aamir Saeed
    • KSII Transactions on Internet and Information Systems (TIIS), v.14 no.4, pp.1543-1561, 2020
  • Building a taxonomy of human brain activity is a complicated and challenging procedure. Its multifaceted aspects, including experiment design, stimulus selection, and image presentation, in addition to feature extraction and selection techniques, add to this challenge. Although researchers have applied various methods to create a taxonomy of human brain activity, the use of multivariate pattern analysis (MVPA) for image recognition to catalog human brain activity is scarce. Moreover, experiment design is a complex procedure, and selecting the image type, color, and order is also challenging. This research therefore bridges the gap by using MVPA to create a taxonomy of human brain activity for different categories of images, both colored and grayscale. To this end, an experiment was conducted using EEG, with feature extraction, selection, and classification approaches, to collect data from 25 graduates of University Technology PETRONAS (UTP) who met prequalification criteria. The participants were shown both colored and grayscale images, and accuracy and reaction time were recorded. The results showed that colored images produce better results in terms of accuracy and response time using the wavelet transform, t-test, and support vector machine. The research concluded that MVPA is a better approach for analyzing EEG data, as more useful information can be extracted from the brain using colored images. It discusses the detailed behavior of the human brain in response to colored and grayscale images for a specific and unique task, and contributes to further improving the accuracy of decoding the human brain. Such experimental settings can also be applied to other areas, including medicine, the military, business, and lie detection, among many others.
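The abstract names a wavelet transform, a t-test, and a support vector machine as the analysis chain. This sketch illustrates only the first two steps: wavelet features are computed per trial and then ranked by a two-sample t statistic. The choices of a one-level Haar transform and Welch's t statistic are assumptions, because the paper's wavelet family, decomposition level, and t-test variant are not stated. The SVM classifier is omitted, the data are synthetic, and all function names are hypothetical.

```python
import numpy as np


def haar_level1(signals):
    """Assumption: one-level Haar DWT of each row (even length); approx then detail."""
    even, odd = signals[:, 0::2], signals[:, 1::2]
    approx = (even + odd) / np.sqrt(2.0)
    detail = (even - odd) / np.sqrt(2.0)
    return np.hstack([approx, detail])


def welch_t(a, b):
    """Per-feature Welch t statistic between two groups of trials (rows)."""
    va, vb = a.var(axis=0, ddof=1), b.var(axis=0, ddof=1)
    return (a.mean(axis=0) - b.mean(axis=0)) / np.sqrt(va / len(a) + vb / len(b))


def select_features(class_a, class_b, k):
    """Rank wavelet features by |t| and return the indices of the top k."""
    fa, fb = haar_level1(class_a), haar_level1(class_b)
    return np.argsort(-np.abs(welch_t(fa, fb)))[:k]


rng = np.random.default_rng(0)
gray = rng.normal(size=(40, 16))         # 40 hypothetical trials, 16 samples each
color = rng.normal(size=(40, 16)) + 3.0  # shifted mean: a simulated class effect
print(select_features(color, gray, k=4)) # indices 0-7 are approximation coefficients
```

The selected columns would then form the input to the classifier. Selecting features on the full dataset before cross-validating that classifier would leak information, so in practice the ranking belongs inside each training fold.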

Exploring Chemotherapy-Induced Toxicities through Multivariate Projection of Risk Factors: Prediction of Nausea and Vomiting

  • Yap, Kevin Yi-Lwern;Low, Xiu Hui;Chan, Alexandre
    • Toxicological Research, v.28 no.2, pp.81-91, 2012
  • Many risk factors exist for chemotherapy-induced nausea and vomiting (CINV). This study used a multivariate projection technique to identify which risk factors were predictive of CINV in clinical practice. A single-centre, prospective, observational study was conducted in Singapore from January 2007 to July 2010. Patients were receiving highly emetogenic chemotherapies (HECs) or moderately emetogenic chemotherapies, with or without radiotherapy. Patient demographics and CINV risk factors were documented. CINV events were recorded daily using a standardized diary. Principal component (PC) analysis was performed to identify which risk factors could differentiate patients with and without CINV. A total of 710 patients were recruited. The majority were female (67%) and Chinese (84%). Five risk factors were potential CINV predictors: histories of alcohol drinking, chemotherapy-induced nausea, chemotherapy-induced vomiting, fatigue, and gender. Drinking period (ex-/current drinkers) and drinking frequency (social/chronic drinkers) differentiated the CINV endpoints in patients on HECs and anthracycline-based regimens, and on XELOX regimens, respectively. Fatigue interference and severity were predictive of CINV in anthracycline-based populations, while the former was predictive in HEC and XELOX populations. PC analysis is a potentially useful technique for analyzing clinical population data and can give clinicians insight into which predictors to look out for in the clinical assessment of CINV. We hope that our results will increase awareness among clinician-scientists of the usefulness of this technique in the analysis of clinical data, so that appropriate preventive measures can be taken to improve patients' quality of life.

Survival Analysis of Patients with Breast Cancer using Weibull Parametric Model

  • Baghestani, Ahmad Reza;Moghaddam, Sahar Saeedi;Majd, Hamid Alavi;Akbari, Mohammad Esmaeil;Nafissi, Nahid;Gohari, Kimiya
    • Asian Pacific Journal of Cancer Prevention, v.16 no.18, pp.8567-8571, 2016
  • Background: The Cox model is one of the most frequently used methods for analyzing survival data. However, in some situations parametric methods may provide better estimates. In this study, a Weibull parametric model was employed to assess possible prognostic factors that may affect the survival of patients with breast cancer. Materials and Methods: We studied 438 patients with breast cancer who visited and were treated at the Cancer Research Center in Shahid Beheshti University of Medical Sciences between 1992 and 2012; the patients were followed up until October 2014. Patients or family members were contacted by telephone to confirm whether the patients were still alive. Clinical, pathological, and biological variables were entered as potential prognostic factors into univariate and multivariate analyses. The log-rank test and the Weibull parametric model with a forward approach were used for the univariate and multivariate analyses, respectively. All analyses were performed using STATA version 11. A P-value lower than 0.05 was considered significant. Results: On univariate analysis, age at diagnosis, level of education, type of surgery, lymph node status, tumor size, stage, histologic grade, estrogen receptor, progesterone receptor, and lymphovascular invasion had a statistically significant effect on survival time. On multivariate analysis, lymph node status, stage, histologic grade, and lymphovascular invasion were statistically significant. The one-year overall survival rate was 98%. Conclusions: Based on these data and using the Weibull parametric model with a forward approach, we found that patients with lymphovascular invasion were at a 2.13 times greater risk of death due to breast cancer.
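In the proportional-hazards form of the Weibull model, the hazard is h(t|x) = λ p t^(p−1) exp(xᵀβ). A covariate therefore multiplies the hazard by the constant factor exp(β) at every time t. This sketch shows how a result such as "2.13 times greater risk" can be read, assuming the reported 2.13 is such a hazard ratio. The parameterization and the baseline values λ and p are illustrative assumptions. The sketch does not reproduce the paper's STATA fit.

```python
import math


def weibull_hazard(t, lam, p, lp=0.0):
    """Weibull PH hazard h(t|x) = lam * p * t**(p-1) * exp(lp), with lp = x'beta."""
    return lam * p * t ** (p - 1) * math.exp(lp)


def weibull_survival(t, lam, p, lp=0.0):
    """Weibull PH survival S(t|x) = exp(-lam * t**p * exp(lp))."""
    return math.exp(-lam * t ** p * math.exp(lp))


beta = math.log(2.13)   # coefficient implied by an assumed hazard ratio of 2.13
lam, p = 0.02, 1.3      # illustrative baseline scale and shape, not from the paper
for t in (1.0, 5.0, 10.0):
    hr = weibull_hazard(t, lam, p, lp=beta) / weibull_hazard(t, lam, p)
    print(t, round(hr, 6), round(weibull_survival(t, lam, p, lp=beta), 4))
```

The ratio stays at 2.13 for every t. The exposed group's survival curve equals the baseline curve raised to the power 2.13, which is the sense in which a Weibull PH coefficient is a time-constant relative risk.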