• Title/Summary/Keyword: Multi-Variable Regression Method

Analysis of multi-center bladder cancer survival data using variable-selection method of multi-level frailty models (다수준 프레일티모형 변수선택법을 이용한 다기관 방광암 생존자료분석)

  • Kim, Bohyeon;Ha, Il Do;Lee, Donghwan
    • Journal of the Korean Data and Information Science Society / v.27 no.2 / pp.499-510 / 2016
  • Selecting relevant variables is very important in regression models for survival analysis. In this paper, we introduce a penalized variable-selection procedure for multi-level frailty models based on the "frailtyHL" R package (Ha et al., 2012). The estimation procedure is based on the penalized hierarchical likelihood, and three penalty functions (LASSO, SCAD and HL) are considered. The proposed methods are illustrated with multi-country/multi-center bladder cancer survival data from the EORTC in Belgium. We compare the results of the three variable-selection methods and discuss their advantages and disadvantages. In particular, the data analysis showed that the SCAD and HL methods select important variables better than the LASSO method.
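
As a rough illustration of two of the penalty functions named in this abstract, the sketch below evaluates the LASSO penalty and the standard piecewise form of the SCAD penalty for a single coefficient. The function names and the default a = 3.7 (a commonly used value) are assumptions for illustration. This is not the frailtyHL implementation, and the HL penalty is not shown.

```python
def lasso_penalty(beta: float, lam: float) -> float:
    """LASSO penalty for one coefficient: lam * |beta|."""
    return lam * abs(beta)


def scad_penalty(beta: float, lam: float, a: float = 3.7) -> float:
    """SCAD penalty for one coefficient (assumes lam >= 0 and a > 2).

    Linear like LASSO near zero, quadratic in the middle range,
    and constant for large |beta|, so large effects are not over-shrunk.
    """
    b = abs(beta)
    if b <= lam:
        return lam * b
    if b <= a * lam:
        return (2 * a * lam * b - b * b - lam * lam) / (2 * (a - 1))
    return lam * lam * (a + 1) / 2


if __name__ == "__main__":
    # Hypothetical coefficient values, chosen only to show the three regimes.
    for beta in (0.5, 2.0, 10.0):
        print(beta, lasso_penalty(beta, 1.0), scad_penalty(beta, 1.0))
```

The LASSO penalty keeps growing with the size of the coefficient, whereas the SCAD penalty levels off, which is one common explanation for why the two can select different variables.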

Fault Detection & SPC of Batch Process using Multi-way Regression Method (다축-다변량회귀분석 기법을 이용한 회분식 공정의 이상감지 및 통계적 제어 방법)

  • Woo, Kyoung Sup;Lee, Chang Jun;Han, Kyoung Hoon;Ko, Jae Wook;Yoon, En Sup
    • Korean Chemical Engineering Research / v.45 no.1 / pp.32-38 / 2007
  • A batch process has a multi-way data structure consisting of batch, time and variable axes, so statistical modeling of a batch process is a difficult and challenging issue for process engineers. In this study, we applied a statistical process control technique to general batch process data and implemented a fault-detection and statistical process control system that was able to detect, identify and diagnose faults. Semiconductor etch process data and semi-batch styrene-butadiene rubber process data are used as case studies. Before modeling, we pre-processed the data using the multi-way unfolding technique to decompose the data structure. Multivariate regression techniques such as support vector regression and partial least squares were used to identify the relation between the process variables and the process condition. Finally, we constructed a root mean squared error chart and a variable contribution chart to diagnose the faults.
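
The unfolding step mentioned in this abstract can be sketched as follows, assuming batch-wise unfolding (one row per batch) of data stored as a nested list indexed by batch, time and variable. The function names and the sample numbers are hypothetical, and the abstract does not say which unfolding direction the authors used.

```python
from math import sqrt
from typing import List, Sequence


def unfold_batchwise(data: List[List[List[float]]]) -> List[List[float]]:
    """Flatten a batches x time x variables array into a
    batches x (time * variables) matrix, one row per batch."""
    return [[value for time_slice in batch for value in time_slice] for batch in data]


def rmse(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Root mean squared error between two equally long sequences."""
    if len(observed) != len(predicted) or len(observed) == 0:
        raise ValueError("sequences must be non-empty and of equal length")
    squared = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    return sqrt(squared / len(observed))


if __name__ == "__main__":
    # Two hypothetical batches, three time points, two variables each.
    batches = [
        [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]],
        [[7.0, 8.0], [9.0, 10.0], [11.0, 12.0]],
    ]
    print(unfold_batchwise(batches))
    print(rmse([1.0, 2.0, 3.0], [1.0, 2.0, 5.0]))
```

Each unfolded row can then be passed to an ordinary two-dimensional regression method, and the per-batch RMSE of its predictions is the kind of quantity a root mean squared error chart would track.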

MP-Lasso chart: a multi-level polar chart for visualizing group Lasso analysis of genomic data

  • Min Song;Minhyuk Lee;Taesung Park;Mira Park
    • Genomics & Informatics / v.20 no.4 / pp.48.1-48.7 / 2022
  • Penalized regression has been widely used in genome-wide association studies for joint analyses to find genetic associations. Among penalized regression models, the least absolute shrinkage and selection operator (Lasso) method effectively removes some coefficients from the model by shrinking them to zero. To handle group structures, such as genes and pathways, several modified Lasso penalties have been proposed, including group Lasso and sparse group Lasso. Group Lasso ensures sparsity at the level of pre-defined groups, eliminating unimportant groups. Sparse group Lasso performs group selection as in group Lasso, but also performs individual selection as in Lasso. While these sparse methods are useful in high-dimensional genetic studies, interpreting the results with many groups and coefficients is not straightforward. Lasso's results are often expressed as trace plots of regression coefficients. However, few studies have explored the systematic visualization of group information. In this study, we propose a multi-level polar Lasso (MP-Lasso) chart, which can effectively represent the results from group Lasso and sparse group Lasso analyses. An R package to draw MP-Lasso charts was developed. Through a real-world genetic data application, we demonstrated that our MP-Lasso chart package effectively visualizes the results of Lasso, group Lasso, and sparse group Lasso.
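
The difference between the three penalties described in this abstract can be illustrated with the shrinkage operators usually associated with them. This is a minimal sketch, not the MP-Lasso package or any fitting routine. The function names are assumptions, and each function acts on one coefficient or one pre-defined group.

```python
from math import sqrt
from typing import List


def soft_threshold(z: float, lam: float) -> float:
    """Lasso shrinkage: move z toward zero by lam, and set it to zero if |z| <= lam."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


def group_soft_threshold(z: List[float], lam: float) -> List[float]:
    """Group Lasso shrinkage: scale the whole group, or zero it if its norm is <= lam."""
    norm = sqrt(sum(v * v for v in z))
    if norm <= lam:
        return [0.0] * len(z)
    scale = 1.0 - lam / norm
    return [scale * v for v in z]


def sparse_group_soft_threshold(z: List[float], lam_within: float, lam_group: float) -> List[float]:
    """Sparse group Lasso shrinkage: individual selection first, then group selection."""
    shrunk = [soft_threshold(v, lam_within) for v in z]
    return group_soft_threshold(shrunk, lam_group)


if __name__ == "__main__":
    group = [4.0, 5.0, 0.5]  # hypothetical coefficients of one gene or pathway
    print(group_soft_threshold(group, 2.5))
    print(sparse_group_soft_threshold(group, 1.0, 2.5))
```

With the group operator a group is either kept whole or removed whole, while the sparse group version can also zero individual members of a retained group.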

Latent Variable Fit to Interlaboratory Studies

  • Jeon, Gyeongbae
    • Communications for Statistical Applications and Methods / v.7 no.3 / pp.885-897 / 2000
  • The use of an unweighted mean and of separate tests is part of the current practice for analyzing interlaboratory studies, and we hope to improve on this method. Using maximum likelihood (ML), we fit a rather intricate, multi-parameter measurement model with the material's true value as a latent variable, in a situation where quite serviceable regression and ANOVA calculations have already been developed. The model fit leads both to a weighted estimate of the overall mean and to tests for equality of means, slopes and variances. Maximum likelihood tests for differences among variances pose a challenge in that the likelihood can easily become unbounded. Thus the major objective becomes to provide a useful test of variance equality.

Long-Term Maximum Power Demand Forecasting in Consideration of Dry Bulb Temperature (건구온파를 오인한 장기최대전력수요예측에 관한 연구)

  • 고희석;정재길
    • The Transactions of the Korean Institute of Electrical Engineers / v.34 no.10 / pp.389-398 / 1985
  • Recently, the maximum power demand of our country has come under the strong influence of electric cooling and air conditioning demand, which is sensitive to weather conditions. This paper presents a technique and algorithm for forecasting the long-term maximum power demand that considers the characteristics of electric power and a weather variable. By introducing a weather load model with recent statistical data on power demand, the annual maximum power demand is separated by multiple regression analysis into two parts: a base load component, which is little affected by weather, and a weather sensitive load component. We then derive growth trend regression equations for these two components and their individual coefficients, and the maximum power demand of each forecasting year is forecast as the sum of the two components. We use the coincident dry bulb temperature at the occurrence of the one-day maximum power demand as the weather variable. For the growth trend regression equations we choose an exponential trend curve for the base load component and a real quadratic curve for the weather sensitive load component. The validity of the proposed forecasting technique and algorithm is demonstrated by a case study of the present Korean power system.
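
The two-component forecast described in this abstract can be sketched as below, assuming both growth trend curves are functions of a year index t. The function names and all coefficient values are hypothetical. The abstract does not give the fitted equations, so this only shows the shape of the calculation.

```python
from math import exp
from typing import Tuple


def base_load(t: float, a: float, b: float) -> float:
    """Exponential growth trend for the base load component: a * exp(b * t)."""
    return a * exp(b * t)


def weather_sensitive_load(t: float, c0: float, c1: float, c2: float) -> float:
    """Quadratic growth trend for the weather sensitive component."""
    return c0 + c1 * t + c2 * t * t


def forecast_peak_demand(
    t: float,
    base_params: Tuple[float, float],
    weather_params: Tuple[float, float, float],
) -> float:
    """Forecast maximum demand in year t as the sum of the two components."""
    return base_load(t, *base_params) + weather_sensitive_load(t, *weather_params)


if __name__ == "__main__":
    # Hypothetical coefficients, not taken from the paper.
    for year_index in range(1, 6):
        print(year_index, forecast_peak_demand(year_index, (5000.0, 0.08), (300.0, 40.0, 2.5)))
```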

Correlation between Welding Parameters and Detaching Drop Size using Regression (회귀 분석을 이용한 용접 변수와 이탈 액적 크기의 상호 관계)

  • 최상균;한창우;이상룡;이영문
    • Journal of Welding and Joining / v.20 no.1 / pp.83-90 / 2002
  • Metal transfer in gas metal arc (GMA) welding is a complex phenomenon affected by many parameters of the welding conditions and material properties. In this research, correlation equations between the welding conditions and the detaching droplet size and detaching velocity in GMA welding were obtained by regression analysis of numerical results computed with the volume-of-fluid (VOF) method. Welding parameters and material properties were grouped into three dimensionless numbers, and the detaching droplet size was expressed as a function of them. Second-order and exponential multi-variable correlation forms were assumed, and the coefficients of these equations were calculated for the globular and spray modes as well as for the entire range of transfer modes. When applied to available experimental data, the correlation equations show good agreement.

Subset selection in multiple linear regression: An improved Tabu search

  • Bae, Jaegug;Kim, Jung-Tae;Kim, Jae-Hwan
    • Journal of Advanced Marine Engineering and Technology / v.40 no.2 / pp.138-145 / 2016
  • This paper proposes an improved tabu search method for subset selection in multiple linear regression models. Variable selection is a vital combinatorial optimization problem in multivariate statistics. Selecting the optimal subset of variables is necessary in order to construct a reliable multiple linear regression model. Its applications range widely, from machine learning, time-series prediction, and multi-class classification to noise detection. Since this problem is NP-complete in nature, finding the optimal solution becomes more difficult as the number of variables increases. Two typical metaheuristic methods have been developed to tackle the problem: the tabu search algorithm and a hybrid genetic and simulated annealing algorithm. However, both methods have shortcomings. The tabu search method requires a large amount of computing time, and the hybrid algorithm produces a less accurate solution. To overcome these shortcomings, we propose an improved tabu search algorithm that reduces the number of neighborhood moves and adopts an effective move search strategy. To evaluate the performance of the proposed method, comparative studies are performed on small data sets from the literature and on large simulated data sets. Computational results show that the proposed method outperforms the two metaheuristic methods in terms of computing time and solution quality.
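
For orientation, a basic tabu search over variable subsets can be sketched as below. This is a plain textbook-style version with single-flip moves, a fixed tabu tenure and an aspiration rule. It is not the improved algorithm proposed in the paper. The names `tabu_search` and `toy_cost` are hypothetical, and in a real regression setting the cost would be a model-selection criterion computed from a fitted model rather than the toy function used here.

```python
from typing import Callable, Dict, FrozenSet, Tuple

Subset = FrozenSet[int]


def tabu_search(
    n_vars: int,
    cost: Callable[[Subset], float],
    n_iter: int = 50,
    tenure: int = 3,
) -> Tuple[Subset, float]:
    """Minimize cost over subsets of {0, ..., n_vars - 1} using single-flip moves."""
    current: Subset = frozenset()
    best, best_cost = current, cost(current)
    tabu_until: Dict[int, int] = {}  # variable -> last iteration at which flipping it is tabu

    for it in range(n_iter):
        candidates = []
        for j in range(n_vars):
            neighbor = current ^ frozenset({j})  # add j if absent, drop it if present
            c = cost(neighbor)
            is_tabu = tabu_until.get(j, -1) >= it
            # Aspiration: a tabu move is still allowed if it beats the best cost so far.
            if not is_tabu or c < best_cost:
                candidates.append((c, j, neighbor))
        if not candidates:
            continue  # every move is tabu in this iteration
        # Take the best admissible neighbor, even if it is worse than the current subset.
        c, j, current = min(candidates, key=lambda item: (item[0], item[1]))
        tabu_until[j] = it + tenure
        if c < best_cost:
            best, best_cost = current, c
    return best, best_cost


TARGET: Subset = frozenset({1, 3})  # toy "true" subset, for illustration only


def toy_cost(subset: Subset) -> float:
    """Number of variables wrongly included or excluded relative to TARGET."""
    return float(len(subset ^ TARGET))


if __name__ == "__main__":
    print(tabu_search(5, toy_cost))
```

Accepting the best admissible move even when it worsens the cost is what lets the search leave local optima, and the tabu list keeps it from immediately undoing that move.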

Development of suspended solid concentration measurement technique based on multi-spectral satellite imagery in Nakdong River using machine learning model (기계학습모형을 이용한 다분광 위성 영상 기반 낙동강 부유 물질 농도 계측 기법 개발)

  • Kwon, Siyoon;Seo, Il Won;Beak, Donghae
    • Journal of Korea Water Resources Association / v.54 no.2 / pp.121-133 / 2021
  • Suspended solids (SS) generated in rivers are mainly introduced from non-point pollutants or appear naturally in the water body, and are an important water quality factor that may cause long-term water pollution by being deposited. However, the conventional method of measuring the concentration of suspended solids is labor-intensive, and it is difficult to obtain a vast amount of data via point measurement. Therefore, in this study, a model for measuring the concentration of suspended solids based on remote sensing in the Nakdong River was developed using Sentinel-2 data, which provide high-resolution multi-spectral satellite images. The proposed model considers the spectral bands and band ratios of various wavelength bands using a machine learning model, Support Vector Regression (SVR), to overcome the limitations of the existing remote sensing-based regression equations. The optimal combination of variables was derived using Recursive Feature Elimination (RFE) and the SVR weight coefficients for each variable. The results show that the 705 nm band, which belongs to the red-edge wavelength range, was estimated to be the most important spectral band, and the proposed SVR model produced more accurate measurements than the previous regression equations. By using RFE, the SVR model developed in this study reduces the variable dependence compared with the existing regression equations based on a single spectral band or band ratio, and provides a more accurate prediction of the spatial distribution of suspended solids concentration.

An Improved Method for Monitoring of Soil Moisture Using NOAA-AVHRR Data

  • Fu, June;Pang, Zhiguo;Xiao, Qianguang
    • Proceedings of the KSRS Conference / 2003.11a / pp.195-197 / 2003
  • Soil moisture is a crucial variable in hydrology, meteorology and plant science research. Adequate soil moisture is essential for plant growth; excesses and deficits of soil moisture must be considered in agricultural practices. Several remote sensing methods are already used for monitoring soil moisture, such as thermal inertia, the vegetation water-supplying index, the crop water stress index and multi-factor regression. In this paper, we discuss an improved method based on thermal inertia. We first analyzed the problems of monitoring soil moisture using satellites, and then put forward a simplified method that directly uses land surface temperature differences to measure soil moisture. We also took the influence of vegetation into account by incorporating NDVI into the model. The method was used in a study of soil moisture in Heilongjiang Province, China, and we conclude from the experiments that the model can clearly increase the precision of soil moisture monitoring.
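
The two inputs named in this abstract can be computed as in the sketch below. NDVI uses its standard definition from near-infrared and red reflectance. Taking the temperature difference as daytime minus nighttime land surface temperature is an assumption, as are the function names and sample values. The abstract does not give the model that maps these inputs to soil moisture, so none is shown.

```python
from typing import Tuple


def ndvi(nir: float, red: float) -> float:
    """Normalized difference vegetation index: (NIR - Red) / (NIR + Red)."""
    total = nir + red
    if total == 0:
        raise ValueError("NIR + Red must be non-zero")
    return (nir - red) / total


def temperature_difference(lst_day: float, lst_night: float) -> float:
    """Land surface temperature difference (assumed here to be day minus night)."""
    return lst_day - lst_night


def soil_moisture_predictors(
    nir: float, red: float, lst_day: float, lst_night: float
) -> Tuple[float, float]:
    """Return (temperature difference, NDVI) for one pixel."""
    return temperature_difference(lst_day, lst_night), ndvi(nir, red)


if __name__ == "__main__":
    # Hypothetical reflectances and temperatures (kelvin) for a single pixel.
    print(soil_moisture_predictors(nir=0.5, red=0.1, lst_day=305.0, lst_night=290.0))
```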
