• Title/Summary/Keyword: Multivariate statistical models

An approach for simultaneous determination for geographical origins of Korean Panax ginseng by UPLC-QTOF/MS coupled with OPLS-DA models

  • Song, Hyuk-Hwan;Kim, Doo-Young;Woo, Soyeun;Lee, Hyeong-Kyu;Oh, Sei-Ryang
    • Journal of Ginseng Research / v.37 no.3 / pp.341-348 / 2013
  • Identification of the geographical origin of Panax ginseng has become an important scientific and economic issue in Korea. We describe a metabolomics approach for discriminating and predicting the origins of ginseng roots from different regions of Korea. Fresh ginseng roots from six ginseng cooperative associations (Gangwon, Gaeseong, Punggi, Chungbuk, Jeonbuk, and Anseong) were analyzed by a UPLC-QTOF/MS-based approach combined with orthogonal projections to latent structures discriminant analysis (OPLS-DA). The ginsengs from Gangwon and Gaeseong were easily differentiated. We further analyzed the metabolomics results in subgroups: Punggi, Chungbuk, Jeonbuk, and Anseong ginseng could be easily differentiated by the first two orthogonal components. To validate the discrimination model, we performed blind prediction tests of sample origins using an external test set, and the model predicted their geographical origins with 99.7% probability. The robust discriminatory power and statistical validity of the method suggest its general applicability for determining the origins of P. ginseng samples.
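The abstract does not spell out the OPLS-DA computation. As a rough illustration only, the sketch below implements a common single-response OPLS formulation for a two-class comparison with a 0/1-coded response, using NumPy. The helper name `opls_fit`, the synthetic data, and the single orthogonal component are assumptions; a six-origin comparison like the one above would need a multi-class or pairwise extension that is not shown, and the authors' software and settings are not described.

```python
import numpy as np


def opls_fit(X, y, n_orth=1):
    """Single-response OPLS: split X into a y-predictive part and
    y-orthogonal components (hypothetical helper for illustration)."""
    X = X - X.mean(axis=0)              # column-centre the features
    y = y - y.mean()                    # centre the 0/1 class coding
    w = X.T @ y / (y @ y)               # predictive weight vector
    w /= np.linalg.norm(w)
    W_o, P_o = [], []
    for _ in range(n_orth):
        t = X @ w                       # predictive scores
        p = X.T @ t / (t @ t)           # predictive loadings
        w_o = p - (w @ p) * w           # part of the loading orthogonal to w
        w_o /= np.linalg.norm(w_o)
        t_o = X @ w_o                   # orthogonal scores (uncorrelated with y)
        p_o = X.T @ t_o / (t_o @ t_o)
        X = X - np.outer(t_o, p_o)      # remove the y-orthogonal variation
        W_o.append(w_o)
        P_o.append(p_o)
    t_pred = X @ w                      # predictive scores after filtering
    return w, np.array(W_o), np.array(P_o), X, t_pred


# Synthetic example: two hypothetical origins, 20 samples each, 10 features.
rng = np.random.default_rng(0)
labels = np.repeat([0.0, 1.0], 20)
X = rng.normal(size=(40, 10))
X[:, 0] += 2.0 * labels                         # class-related shift
X[:, 1:3] += 3.0 * rng.normal(size=(40, 1))     # large class-unrelated variation
w, W_o, P_o, X_filt, t_pred = opls_fit(X, labels, n_orth=1)
print(t_pred[labels == 1].mean() > t_pred[labels == 0].mean())
```

Filtering out the y-orthogonal variation first is what lets the predictive component be read directly as the class-separating direction.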

Value at Risk calculation using sparse vine copula models (성근 바인 코풀라 모형을 이용한 고차원 금융 자료의 VaR 추정)

  • An, Kwangjoon;Baek, Changryong
    • The Korean Journal of Applied Statistics / v.34 no.6 / pp.875-887 / 2021
  • Value at Risk (VaR) is the most popular measure of market risk. In this paper, we consider VaR estimation for a portfolio consisting of a variety of assets based on a multivariate copula model known as the vine copula. In particular, we consider a sparse vine copula, which penalizes an excessive number of parameters. A simulation study shows that sparsity indeed improves out-of-sample VaR forecasts. An empirical analysis of 60 KOSPI stocks over the last 5 years also demonstrates that the sparse vine copula outperforms the regular copula model.
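The abstract gives no formulas. One way to see how a fitted copula model feeds into VaR is to simulate joint asset returns from it, read VaR off the simulated portfolio distribution, and then count out-of-sample violations. The sketch below assumes such scenarios are already available; fitting a (sparse) vine copula is not shown, and the multivariate normal draws in the usage lines are only a stand-in, not a vine copula. The function names are hypothetical.

```python
import numpy as np


def portfolio_var(scenarios, weights, alpha=0.01):
    """VaR at level alpha: the loss exceeded with probability alpha.

    scenarios: (n_sims, n_assets) simulated asset returns, assumed to come
    from some fitted joint model (e.g. a copula plus marginal models).
    """
    port = scenarios @ weights            # simulated portfolio returns
    return -np.quantile(port, alpha)      # report VaR as a positive loss


def violation_rate(realized, var_forecasts):
    """Share of periods whose realized loss exceeded the VaR forecast."""
    realized = np.asarray(realized, dtype=float)
    var_forecasts = np.asarray(var_forecasts, dtype=float)
    return np.mean(-realized > var_forecasts)


# Stand-in scenarios: correlated Gaussian returns for 3 hypothetical assets.
rng = np.random.default_rng(1)
corr = np.array([[1.0, 0.5, 0.3], [0.5, 1.0, 0.4], [0.3, 0.4, 1.0]])
sims = rng.multivariate_normal(np.zeros(3), corr * 0.01**2, size=100_000)
weights = np.full(3, 1 / 3)
print(portfolio_var(sims, weights, alpha=0.01))
```

A well-calibrated 1% VaR forecast should show a violation rate close to 0.01 on held-out data, which is one simple way to compare competing copula models out of sample.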

Optimize rainfall prediction utilize multivariate time series, seasonal adjustment and Stacked Long short term memory

  • Nguyen, Thi Huong;Kwon, Yoon Jeong;Yoo, Je-Ho;Kwon, Hyun-Han
    • Proceedings of the Korea Water Resources Association Conference / 2021.06a / pp.373-373 / 2021
  • Rainfall forecasting is important in many areas, such as agriculture, flood warning, and water resources management. In this context, this study proposed a statistical and machine learning-based model for forecasting monthly rainfall. A Bayesian Gaussian process was used to optimize the hyperparameters of a Stacked Long Short-Term Memory (SLSTM) model, which was applied to predict monthly precipitation at Seoul station, South Korea. Data for 1960 to 2019 were retrieved from the Korea Meteorological Administration (KMA). Four schemes were examined: (i) prediction with rainfall only; (ii) with deseasonalized rainfall; (iii) with rainfall and minimum temperature; and (iv) with deseasonalized rainfall and minimum temperature. The root mean squared error (RMSE) of the predicted rainfall, 16-17 mm, is relatively small compared with the average monthly rainfall of 117 mm at Seoul station, and scheme (iv) gave the best prediction. The approach is also more straightforward than hydrological and hydraulic models, which require much more input data. The results indicate that deep learning networks can be applied successfully in hydrology and that the proposed method offers a promising solution for rainfall prediction.
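The abstract does not say how rainfall was deseasonalized. A common choice, assumed here, is to standardize each calendar month by its own long-term mean and standard deviation; forecasts made in that space are mapped back to millimetres before computing RMSE. The sketch covers only this preprocessing and the error metric, not the SLSTM model or its Bayesian hyperparameter search, and all names are hypothetical.

```python
import numpy as np


def deseasonalize(series, months):
    """Standardize each calendar month by its own mean and standard deviation
    (assumed method; a month with zero variance would need special handling)."""
    series = np.asarray(series, dtype=float)
    months = np.asarray(months)
    out = np.empty_like(series)
    stats = {}
    for m in np.unique(months):
        mask = months == m
        mu, sd = series[mask].mean(), series[mask].std()
        stats[m] = (mu, sd)
        out[mask] = (series[mask] - mu) / sd
    return out, stats


def reseasonalize(z, months, stats):
    """Map standardized values (e.g. model forecasts) back to rainfall units."""
    return np.array([z_i * stats[m][1] + stats[m][0] for z_i, m in zip(z, months)])


def rmse(obs, pred):
    """Root mean squared error, in the units of the inputs (here mm)."""
    obs, pred = np.asarray(obs, dtype=float), np.asarray(pred, dtype=float)
    return np.sqrt(np.mean((obs - pred) ** 2))


# Synthetic monthly rainfall with a strong seasonal cycle (10 hypothetical years).
months = np.tile(np.arange(1, 13), 10)
rng = np.random.default_rng(0)
rain = 100 + 80 * np.sin(2 * np.pi * months / 12) + rng.normal(0, 10, months.size)
z, stats = deseasonalize(rain, months)
print(rmse(rain, reseasonalize(z, months, stats)))
```

Working in the standardized space removes the dominant annual cycle, so the network only has to learn departures from each month's climatology.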

Study on Vacuum Pump Monitoring Using MPCA Statistical Method (MPCA 기반의 통계기법을 이용한 진공펌프 상태진단에 관한 연구)

  • Sung D.;Kim J.;Jung W.;Lee S.;Cheung W.;Lim J.;Chung K.
    • Journal of the Korean Vacuum Society / v.15 no.4 / pp.338-346 / 2006
  • In semiconductor processes, it is very hard to predict the exact failure point of a vacuum pump because of its harsh operating conditions and nonlinear properties, which may cause problems such as the production of defective goods or the waste of materials. Developing diagnostic models that can monitor operating conditions appropriately, recognize the failure point exactly, and indicate when to replace the vacuum pump is therefore an urgent problem. In this study, many influencing factors are considered together, and a monitoring model based on multivariate statistical methods is proposed. The key algorithms are Multiway Principal Component Analysis (MPCA) and the Dynamic Time Warping (DTW) algorithm, among others.
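The abstract names MPCA and DTW without details. The sketch below shows textbook versions of both: DTW as a distance between two sequences of possibly different lengths (for example, sensor traces from pump runs of unequal duration), and MPCA as batch-wise unfolding of a (batch, variable, time) array followed by PCA via SVD. How the study combined the two, and which pump variables it used, is not stated; the helper names and array layout are assumptions.

```python
import numpy as np


def dtw_distance(a, b):
    """Classic DTW distance between two 1-D sequences
    (absolute-difference local cost, no warping-window constraint)."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            # extend the cheapest of: insertion, deletion, or match
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]


def mpca_scores(batches, n_components=2):
    """Multiway PCA: unfold (batches, variables, time) to
    (batches, variables * time), autoscale each column, and return the
    scores on the leading principal components (computed via SVD)."""
    X = batches.reshape(batches.shape[0], -1)
    X = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    U, s, _ = np.linalg.svd(X, full_matrices=False)
    return U[:, :n_components] * s[:n_components]


print(dtw_distance([0, 1, 2], [0, 1, 1, 2]))   # 0.0: the repeated 1 is absorbed by warping
rng = np.random.default_rng(0)
print(mpca_scores(rng.normal(size=(8, 3, 5))).shape)
```

Batch-wise unfolding requires every batch to have the same number of time points, which is why an alignment step such as DTW is often applied to trajectories before MPCA.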

Distribution Characteristics of PM10 and Heavy Metals in Ambient Air of Gyeonggi-do Area using Statistical Analysis (통계분석을 이용한 경기도 대기 중 미세먼지 및 중금속 분포 특성)

  • Kim, Jong Soo;Hong, Soon Mo;Kim, Myoung Sook;Kim, Yo Yong;Shin, Eun Sang
    • Journal of Korean Society for Atmospheric Environment / v.30 no.3 / pp.281-290 / 2014
  • This study was conducted to evaluate the distribution characteristics of $PM_{10}$ and heavy metal concentrations in the ambient air of the Gyeonggi-do area by region and season from February 2013 to March 2014. Regression models for predicting the formation characteristics and contamination levels of $PM_{10}$ and heavy metals were also established using correlation and regression analysis as multivariate statistical methods. The main wind directions during the investigation period were southeast (SE) and west-southwest (WSW), and the $SO_2$ concentration at Ansan, an industrial area, was 1.6 times higher than at Suwon and Euiwang, which are residential areas. The median concentrations of Pb, Cu, and Ni at Ansan were 3.2-4.5, 1.9-2.2, and 1.7-2.6 times higher, respectively, than those at Suwon. Seasonally, the concentrations of $PM_{10}$, Pb, Fe, and As in winter and spring (December to May) were 1.7, 1.9, 1.9, and 2.7 times higher, respectively, than those in summer and fall (June to November). As, Fe, and $PM_{10}$ differed strongly by season, whereas Cu and Ni were judged to be influenced by regional factors. In the correlation analysis among the target items, the correlation coefficients between PM and Mn and between Fe and Mn were both 0.82 (p<0.01), indicating high correlation. The correlation coefficients for $SO_2$ and Pb and for CO and $PM_{10}$ were 0.66 (p<0.01) and 0.62 (p<0.01), respectively. Multiple linear regression models for $PM_{10}$, Pb, Cu, Cr, As, Ni, Fe, and Mn were established with CO, $SO_2$, and meteorological factors (wind speed, relative humidity) as independent variables. In these models, $SO_2$ was related to all dependent variables; $PM_{10}$, Fe, and Mn were influenced by CO and wind speed, and $SO_2$ was the main factor for Pb, Cu, Ni, and As.
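As an illustration of the correlation and multiple linear regression steps, the sketch below computes a Pearson correlation with `np.corrcoef` and fits an ordinary least-squares model with an intercept via `np.linalg.lstsq`. The data are synthetic placeholders, not the Gyeonggi-do measurements, the helper name `fit_ols` is hypothetical, and p-values are not computed.

```python
import numpy as np


def fit_ols(X, y):
    """Multiple linear regression y = b0 + X b by least squares.
    Returns the coefficients (intercept first) and R^2."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    A = np.column_stack([np.ones(len(y)), X])        # prepend intercept column
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    r2 = 1 - (resid @ resid) / np.sum((y - y.mean()) ** 2)
    return coef, r2


# Synthetic, hypothetical predictors standing in for CO, SO2, wind speed, humidity.
rng = np.random.default_rng(0)
n = 120
co = rng.normal(0.5, 0.1, n)
so2 = rng.normal(5.0, 1.0, n)
wind = rng.normal(2.0, 0.5, n)
hum = rng.normal(60.0, 10.0, n)
pm10 = 10 + 60 * co - 5 * wind + rng.normal(0, 5, n)   # synthetic response
print(np.corrcoef(co, pm10)[0, 1])                      # Pearson r
coef, r2 = fit_ols(np.column_stack([co, so2, wind, hum]), pm10)
print(coef, r2)
```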

Variable Selection for Multi-Purpose Multivariate Data Analysis (다목적 다변량 자료분석을 위한 변수선택)

  • Huh, Myung-Hoe;Lim, Yong-Bin;Lee, Yong-Goo
    • The Korean Journal of Applied Statistics / v.21 no.1 / pp.141-149 / 2008
  • Multivariate data with a large number of variables are now analyzed frequently. In such data sets, variables that are conceptually distinguishable may nevertheless be virtual duplicates of one another. Duplicate variables can distort the principal axes in principal component analysis and factor analysis, and can distort the distances between observations, which are the input to cluster analysis. In supervised learning or regression analysis, duplicated explanatory variables often make fitted models unstable. Since real data analyses often serve multiple purposes, it is necessary to reduce the number of variables to a parsimonious level. This paper proposes a practical algorithm for selecting a subset of variables from a given set of p input variables, using the criterion of minimizing the trace of the partial variances of the unselected variables that remain unexplained by the selected variables. The usefulness of the proposed method is demonstrated in visualizing the relationship between selected and unselected variables, in building a predictive model with a very large number of independent variables, and in reducing the number of variables and purging or merging categories in categorical data.
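The selection criterion can be written as $\operatorname{tr}(\Sigma_{UU} - \Sigma_{US}\Sigma_{SS}^{-1}\Sigma_{SU})$, where $S$ indexes the selected variables and $U$ the unselected ones. The sketch below evaluates it on the correlation matrix (i.e. standardized variables) and searches greedily by forward selection. Both the standardization and the greedy search are assumptions, since the abstract does not describe the paper's search procedure; the helper names are hypothetical.

```python
import numpy as np


def partial_trace(S, sel):
    """tr(S_UU - S_US S_SS^{-1} S_SU): total partial variance of the
    unselected variables U left unexplained by the selected set `sel`."""
    unsel = [j for j in range(S.shape[0]) if j not in sel]
    S_uu = S[np.ix_(unsel, unsel)]
    if not sel:
        return np.trace(S_uu)
    S_us = S[np.ix_(unsel, sel)]
    S_ss = S[np.ix_(sel, sel)]
    return np.trace(S_uu - S_us @ np.linalg.solve(S_ss, S_us.T))


def forward_select(X, k):
    """Greedily add the variable that most reduces partial_trace, k times.
    Uses the correlation matrix, i.e. standardized variables (an assumption)."""
    S = np.corrcoef(X, rowvar=False)
    sel = []
    for _ in range(k):
        candidates = [j for j in range(S.shape[0]) if j not in sel]
        sel.append(min(candidates, key=lambda j: partial_trace(S, sel + [j])))
    return sel


# Columns 0 and 1 are near-duplicates; column 2 is independent.
rng = np.random.default_rng(0)
x0 = rng.normal(size=500)
X = np.column_stack([x0, x0 + 0.01 * rng.normal(size=500), rng.normal(size=500)])
print(forward_select(X, 2))   # keeps one of the near-duplicates plus column 2
```

Because a near-duplicate of a selected variable has almost no partial variance left, the criterion naturally skips it in favour of variables carrying new information.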

A Development of Hotel Bankruptcy Prediction Model on Artificial Neural Network (인공신경망 기반 호텔 부도예측모형 개발)

  • Choi, Sung-Ju;Lee, Sang-Won
    • Journal of the Korea Society of Computer and Information / v.19 no.10 / pp.125-133 / 2014
  • This paper develops an Artificial Neural Network-based bankruptcy prediction model for hotel management. The model predicts the bankruptcy of the hotel business as a whole after evaluating the bankruptcy possibility of each branch on the basis of its business performance data. There are many traditional statistical models for bankruptcy prediction, such as Multivariate Discriminant Analysis and Logit Analysis; however, we chose an Artificial Neural Network because its prediction accuracy is higher than that of these methods. We first selected 100 sound enterprises and 100 bankrupt enterprises as experimental data and built the bankruptcy prediction model with NeuroShell, an Artificial Neural Network tool. The model, which proved highly efficient in the experiments, can support decision making in hotel management and in judging the bankruptcy risk or financial soundness of each branch of a serviced-residence hotel.
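NeuroShell's network configuration is not described in the abstract. As a generic stand-in, the sketch below trains a one-hidden-layer network in NumPy to separate sound from bankrupt firms. The architecture, learning rate, three input ratios, and all data are hypothetical, and the accuracy it prints is in-sample only; a real evaluation would use held-out firms.

```python
import numpy as np


def train_mlp(X, y, hidden=8, lr=0.1, epochs=2000, seed=0):
    """Hypothetical one-hidden-layer network: tanh hidden units, sigmoid
    output, full-batch gradient descent on mean binary cross-entropy."""
    rng = np.random.default_rng(seed)
    W1 = rng.normal(scale=0.5, size=(X.shape[1], hidden))
    b1 = np.zeros(hidden)
    W2 = rng.normal(scale=0.5, size=hidden)
    b2 = 0.0
    n = len(y)
    for _ in range(epochs):
        H = np.tanh(X @ W1 + b1)                  # hidden activations
        p = 1 / (1 + np.exp(-(H @ W2 + b2)))      # predicted P(bankrupt)
        g_logit = (p - y) / n                     # gradient of mean BCE w.r.t. logits
        g_H = np.outer(g_logit, W2) * (1 - H**2)  # back-propagate through tanh
        W2 -= lr * (H.T @ g_logit)
        b2 -= lr * g_logit.sum()
        W1 -= lr * (X.T @ g_H)
        b1 -= lr * g_H.sum(axis=0)
    return W1, b1, W2, b2


def predict_proba(params, X):
    W1, b1, W2, b2 = params
    return 1 / (1 + np.exp(-(np.tanh(X @ W1 + b1) @ W2 + b2)))


# Synthetic data: 100 sound and 100 bankrupt firms, 3 standardized ratios.
rng = np.random.default_rng(42)
X = np.vstack([rng.normal(1.5, 1.0, (100, 3)), rng.normal(-1.5, 1.0, (100, 3))])
y = np.concatenate([np.zeros(100), np.ones(100)])   # 1 = bankrupt
params = train_mlp(X, y)
print(np.mean((predict_proba(params, X) > 0.5) == y))  # in-sample accuracy
```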

Analysis of Climate Effects on Italian Ryegrass Yield via Structural Equation Model (구조방정식 모형을 이용한 이탈리안 라이그라스 생산량에 대한 기후요인의 연구)

  • Kim, Moonju;Sung, Kyung-Il;Kim, Young-Ju
    • The Korean Journal of Applied Statistics / v.27 no.7 / pp.1187-1196 / 2014
  • Italian Ryegrass (IRG), known as a high-yielding, top-quality winter annual forage crop, is grown in the central and southern regions of Korea. This study analyzes the cause-and-effect relationship between IRG yield and climate variables such as temperature and precipitation, using IRG data and climate data from the Korea Weather Bureau. Path analysis with a structural equation model, assuming multivariate normality, showed that IRG yield was directly affected by spring temperature and indirectly affected by spring rainfall. These results suggest that IRG can be sown in early spring in areas where low temperatures make it hard to prepare for winter. By showing these cause-and-effect relationships, this study can contribute to increasing IRG yield, and the approach can be extended to various structural equation models for other crops.
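The abstract does not give the full path diagram. Assuming, for illustration, that spring rainfall acts on yield through spring temperature (plus a direct path), the direct, indirect, and total effects can be estimated by least squares on standardized variables; for a just-identified recursive model of observed variables like this one, these coincide with the usual SEM point estimates. The model structure, variable names, and synthetic data are hypothetical.

```python
import numpy as np


def standardize(v):
    v = np.asarray(v, dtype=float)
    return (v - v.mean()) / v.std()


def path_coefficients(rain, temp, yld):
    """Standardized path coefficients for the hypothetical model
    rain -> temp -> yield plus a direct rain -> yield path.
    No intercepts are needed because all variables are centred."""
    r, t, y = standardize(rain), standardize(temp), standardize(yld)
    a = (r @ t) / (r @ r)                                  # rain -> temp
    c, b = np.linalg.lstsq(np.column_stack([r, t]), y, rcond=None)[0]
    return {"a": a, "b": b, "direct": c,                   # b: temp -> yield
            "indirect": a * b, "total": c + a * b}


# Synthetic data in which rainfall affects yield only through temperature.
rng = np.random.default_rng(0)
rain = rng.normal(size=300)
temp = -0.6 * rain + rng.normal(scale=0.8, size=300)
yld = 0.7 * temp + rng.normal(scale=0.7, size=300)
print(path_coefficients(rain, temp, yld))
```

The total effect c + ab equals the simple correlation between rainfall and yield, so the decomposition shows how much of that association runs through temperature.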

Determinants of Poor Self-rated Health in Korean Adults With Diabetes

  • Lee, Hwi-Won;Song, Minkyo;Yang, Jae Jeong;Kang, Daehee
    • Journal of Preventive Medicine and Public Health / v.48 no.6 / pp.287-300 / 2015
  • Objectives: Self-rated health is a measure of perceived health widely used in epidemiological studies. Our study investigated the determinants of poor self-rated health in middle-aged Korean adults with diabetes. Methods: A cross-sectional study was conducted based on the Health Examinees Study. A total of 9759 adults aged 40 to 69 years who reported having physician-diagnosed diabetes were analyzed with regard to a range of health determinants, including sociodemographic, lifestyle, psychosocial, and physical variables, in association with self-rated health status using multivariate logistic regression models. A p-value <0.05 was considered to indicate statistical significance. Results: We found that negative psychosocial conditions, including frequent stress events and severe distress according to the psychosocial well-being index, were most strongly associated with poor self-rated health (odds ratio [OR] for frequent stress events, 5.40; 95% confidence interval [CI], 4.63 to 6.29; OR for severe distress, 11.08; 95% CI, 8.77 to 14.00). Moreover, younger age and being underweight or obese were associated with poor self-rated health. Physical factors relating to participants' medical history of diabetes, such as a younger age at diagnosis, a longer duration of diabetes, insulin therapy, hemoglobin A1c levels of 6.5% or more, and comorbidities, were other correlates of poor reported health. Conclusions: Our findings suggest that, in addition to medical variables, unfavorable socioeconomic factors, and adverse lifestyle behaviors, younger age, being underweight or obese, and psychosocial stress could be distinct factors in predicting negative perceived health status in Korean adults with diabetes.
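The odds ratios and 95% CIs reported above are typical outputs of a multivariable logistic model. The sketch fits one by Newton-Raphson and forms Wald intervals as $\exp(\hat\beta \pm 1.96\,\mathrm{SE})$. Such intervals are symmetric on the log-odds scale, and the reported figures are consistent with that ($\sqrt{4.63 \times 6.29} \approx 5.40$ and $\sqrt{8.77 \times 14.00} \approx 11.08$), although the abstract does not state which interval method was used. The function name and simulated data are hypothetical.

```python
import numpy as np


def logistic_or(X, y, n_iter=50, tol=1e-10):
    """Multivariable logistic regression by Newton-Raphson.
    Returns odds ratios and Wald 95% CIs for each column of X
    (an intercept is added internally and not reported)."""
    A = np.column_stack([np.ones(len(y)), X])
    beta = np.zeros(A.shape[1])
    for _ in range(n_iter):
        p = 1 / (1 + np.exp(-A @ beta))
        info = A.T @ (A * (p * (1 - p))[:, None])    # Fisher information
        step = np.linalg.solve(info, A.T @ (y - p))  # Newton step
        beta += step
        if np.max(np.abs(step)) < tol:
            break
    p = 1 / (1 + np.exp(-A @ beta))                  # information at the estimate
    info = A.T @ (A * (p * (1 - p))[:, None])
    se = np.sqrt(np.diag(np.linalg.inv(info)))
    z = 1.959964                                     # 97.5th percentile of N(0, 1)
    b, s = beta[1:], se[1:]
    return np.exp(b), np.exp(b - z * s), np.exp(b + z * s)


# Simulated data with known log-odds ratios 1.5 and -0.5.
rng = np.random.default_rng(0)
n = 20000
X = rng.normal(size=(n, 2))
p_true = 1 / (1 + np.exp(-(-1.0 + X @ np.array([1.5, -0.5]))))
y = (rng.random(n) < p_true).astype(float)
OR, lo, hi = logistic_or(X, y)
print(OR, lo, hi)
```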

Reproductive Variables and Risk of Breast Malignant and Benign Tumours in Yunnan Province, China

  • Yanhua, Che;Geater, Alan;You, Jing;Li, Li;Shaoqiang, Zhou;Chongsuvivatwong, Virasakdi;Sriplung, Hutcha
    • Asian Pacific Journal of Cancer Prevention / v.13 no.5 / pp.2179-2184 / 2012
  • Introduction and aim: To compare the influence of reproductive factors on patients with pathologically diagnosed malignant and benign breast tumors in the Breast Department of The First People's Hospital of Kunming in Yunnan Province, China. Methods: A hospital-based case-control study was conducted on 263 breast cancer (BC) cases and 457 non-breast cancer controls from 2009 to 2011. Information on the demographics, medical history, and reproductive characteristics of cases and controls was collected using a self-administered questionnaire and routine medical records. The histology of breast cancer tissue and benign breast lesions was documented in pathology reports. Because some variables had a zero count in at least one category, a binomial-response GLM with the bias-reduction method was applied to estimate ORs and their 95% confidence intervals (95% CI). To adjust for age and menopausal status, a compound variable combining age and menopausal status was retained in the statistical models. Results: Multivariate model analysis revealed significant independent positive associations of BC with short menstrual cycle, older age at first live birth, never breastfeeding, a history of oral contraceptive use, an increased number of abortions, postmenopausal status, and nulliparity. By age and menopausal status, perimenopausal women had about a 3-fold and postmenopausal women a more than 5-fold increased risk of BC compared with premenopausal women. Discussion and Conclusion: This study confirmed the significant associations of BC with estrogen-related risk factors of breast cancer, including longer menstrual cycle, older age at first live birth, never breastfeeding, nulliparity, and more than one abortion. The findings suggest that female hormonal factors, especially the trend across menopausal status, play a significant role in the development of BC in Yunnan women.
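The abstract says only that a bias-reduced binomial GLM was used to cope with zero cells. Assuming this refers to Firth's (1993) penalized-likelihood correction for logistic regression, a minimal NumPy sketch looks like this: the usual score is augmented by a leverage term so that estimates stay finite when a category has, for example, no controls. The function name and the 2x2 example are hypothetical.

```python
import numpy as np


def firth_logistic(X, y, n_iter=200, tol=1e-10):
    """Bias-reduced (Firth) logistic regression via the modified score
    U*(beta) = A^T (y - p + h * (0.5 - p)), where h is the diagonal of the
    weighted hat matrix. Returns coefficients (intercept first)."""
    A = np.column_stack([np.ones(len(y)), X])
    beta = np.zeros(A.shape[1])
    for _ in range(n_iter):
        p = 1 / (1 + np.exp(-A @ beta))
        w = p * (1 - p)
        info_inv = np.linalg.inv(A.T @ (A * w[:, None]))    # inverse Fisher information
        h = w * np.einsum("ij,jk,ik->i", A, info_inv, A)    # hat-matrix diagonal
        step = info_inv @ (A.T @ (y - p + h * (0.5 - p)))   # scoring step on U*
        beta += step
        if np.max(np.abs(step)) < tol:
            break
    return beta


# Hypothetical 2x2 table with a zero cell: all 5 exposed subjects are cases,
# so the ordinary maximum-likelihood odds ratio would be infinite.
exposed = np.array([1.0] * 5 + [0.0] * 30)
case = np.array([1.0] * 5 + [1.0] * 10 + [0.0] * 20)
beta = firth_logistic(exposed, case)
print(np.exp(beta[1]))   # finite odds ratio despite the empty cell
```

For a single binary exposure, the Firth odds ratio equals the one obtained by adding 0.5 to every cell of the 2x2 table, which gives a quick sanity check on the implementation.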