• Title/Summary/Keyword: Multivariate statistical models


Using machine learning to forecast and assess the uncertainty in the response of a typical PWR undergoing a steam generator tube rupture accident

  • Tran Canh Hai Nguyen ;Aya Diab
    • Nuclear Engineering and Technology
    • /
    • v.55 no.9
    • /
    • pp.3423-3440
    • /
    • 2023
  • In this work, a multivariate time-series machine learning meta-model is developed to predict the transient response of a typical nuclear power plant (NPP) undergoing a steam generator tube rupture (SGTR). The model employs Recurrent Neural Networks (RNNs), including the Long Short-Term Memory (LSTM), Gated Recurrent Unit (GRU), and a hybrid CNN-LSTM model. To address the uncertainty inherent in such predictions, a Bayesian Neural Network (BNN) was implemented. The models were trained using a database generated by the Best Estimate Plus Uncertainty (BEPU) methodology, coupling the thermal-hydraulics code RELAP5/SCDAP/MOD3.4 to the statistical tool DAKOTA to predict the variation in system response under various operational and phenomenological uncertainties. The RNN models successfully capture the underlying characteristics of the data with reasonable accuracy, and the BNN-LSTM approach offers an additional layer of insight into the level of uncertainty associated with the predictions. The results demonstrate that LSTM outperforms GRU, while the hybrid CNN-LSTM model is computationally the most efficient. This study aims to gain a better understanding of the capabilities and limitations of machine learning models in the context of nuclear safety. By expanding the application of ML models to more severe accident scenarios, where operators are under extreme stress and prone to errors, ML models can provide valuable support and act as expert systems to assist in decision-making while minimizing the chances of human error.
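The gating mechanism that lets an LSTM track long transients can be sketched in a few lines. The NumPy toy below (random weights and synthetic inputs; not the authors' RELAP5/DAKOTA-trained meta-model) steps a single LSTM cell over a short multivariate sequence:

```python
import numpy as np

def lstm_step(x, h_prev, c_prev, W, U, b):
    """One LSTM cell step. W, U, b hold the stacked parameters for the
    input, forget, and output gates and the candidate state (4*hid rows)."""
    hid = h_prev.shape[0]
    z = W @ x + U @ h_prev + b               # stacked pre-activations
    i = 1 / (1 + np.exp(-z[:hid]))           # input gate
    f = 1 / (1 + np.exp(-z[hid:2*hid]))      # forget gate
    o = 1 / (1 + np.exp(-z[2*hid:3*hid]))    # output gate
    g = np.tanh(z[3*hid:])                   # candidate cell state
    c = f * c_prev + i * g                   # new cell state
    h = o * np.tanh(c)                       # new hidden state
    return h, c

# Roll the cell over a short multivariate series (think normalized plant
# pressures/temperatures during a transient), keeping the final state.
rng = np.random.default_rng(0)
n_in, n_hid, T = 3, 4, 10
W = rng.normal(scale=0.1, size=(4 * n_hid, n_in))
U = rng.normal(scale=0.1, size=(4 * n_hid, n_hid))
b = np.zeros(4 * n_hid)
h, c = np.zeros(n_hid), np.zeros(n_hid)
for t in range(T):
    h, c = lstm_step(rng.normal(size=n_in), h, c, W, U, b)
```

In practice the final hidden state feeds a dense output layer that predicts the next time step of each plant parameter.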

Analysis on Correlation between AE Parameters and Stress Intensity Factor using Principal Component Regression and Artificial Neural Network (주성분 회귀분석 및 인공신경망을 이용한 AE변수와 응력확대계수와의 상관관계 해석)

  • Kim, Ki-Bok;Yoon, Dong-Jin;Jeong, Jung-Chae;Park, Phi-Iip;Lee, Seung-Seok
    • Journal of the Korean Society for Nondestructive Testing
    • /
    • v.21 no.1
    • /
    • pp.80-90
    • /
    • 2001
  • The aim of this study is to develop a methodology for identifying mechanical properties of machine elements, such as the stress intensity factor, from AE parameters. Considering the multivariate and nonlinear nature of AE parameters such as ringdown count, rise time, energy, event duration, and peak amplitude measured from fatigue cracks in machine elements, principal component regression (PCR) and artificial neural network (ANN) models for the estimation of the stress intensity factor were developed and validated. The AE parameters were found to be very significant for estimating the stress intensity factor. Since the statistical measures, including correlation coefficients, standard error of calibration, standard error of prediction, and bias, were stable, the PCR and ANN models for the stress intensity factor were very robust. The performance of the ANN model on unknown stress intensity factor data was better than that of the PCR model.
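Principal component regression handles exactly the collinearity described here: the correlated AE parameters are projected onto a few components before regression. A minimal NumPy sketch on synthetic stand-in data (the factor structure and dimensions are illustrative, not the paper's dataset):

```python
import numpy as np

def pcr_fit_predict(X, y, k):
    """Principal component regression: project centered predictors onto
    their first k principal components, regress y on the scores, and map
    the coefficients back to the original variable space."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc = X - x_mean
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    T = Xc @ Vt[:k].T                          # component scores
    gamma, *_ = np.linalg.lstsq(T, y - y_mean, rcond=None)
    beta = Vt[:k].T @ gamma                    # coefficients on original vars
    return lambda Xnew: (Xnew - x_mean) @ beta + y_mean

# Synthetic stand-in for AE parameters (count, rise time, energy, ...)
# driven by two latent factors, mimicking a highly collinear measurement set.
rng = np.random.default_rng(1)
S = rng.normal(size=(60, 2))                   # latent factors
L = rng.normal(size=(2, 5))
X = S @ L + 0.01 * rng.normal(size=(60, 5))
y = S @ np.array([1.0, -0.5]) + 0.01 * rng.normal(size=60)
predict = pcr_fit_predict(X, y, k=2)
```

With two latent factors, two components suffice; retaining all components would reduce PCR to ordinary least squares.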


Operation Modes Classification of Chemical Processes for History Data-Based Fault Diagnosis Methods (데이터 기반 이상진단법을 위한 화학공정의 조업모드 판별)

  • Lee, Chang Jun;Ko, Jae Wook;Lee, Gibaek
    • Korean Chemical Engineering Research
    • /
    • v.46 no.2
    • /
    • pp.383-388
    • /
    • 2008
  • The safe and efficient operation of chemical processes has become one of the primary concerns of chemical companies, and a variety of fault diagnosis methods have been developed to diagnose faults when abnormal situations arise. Recently, many research efforts have focused on quantitative history data-based fault diagnosis methods such as statistical models. However, when history data-based models trained on data from one operation mode are applied to another operating condition, they can repeatedly produce wrong diagnoses, which limits their application to real chemical processes with various operation modes. In order to classify operation modes of chemical processes, this study considers three multivariate models, Euclidean distance, FDA (Fisher's discriminant analysis), and PCA (principal component analysis), and integrates each with process dynamics to obtain dynamic Euclidean distance, dynamic FDA, and dynamic PCA. A case study of the TE (Tennessee Eastman) process, which has six operation modes, shows that the dynamic PCA model achieves the best classification performance.
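The simplest of the three models, the Euclidean-distance classifier, assigns a new sample to the operation mode with the nearest mean measurement vector; the dynamic variants augment each sample with lagged measurements before the same comparison. A small sketch with made-up two-mode data:

```python
import numpy as np

def fit_mode_centroids(X_by_mode):
    """Mean vector of the process measurements for each known operation mode."""
    return {m: X.mean(axis=0) for m, X in X_by_mode.items()}

def classify_mode(x, centroids):
    """Assign a sample to the operation mode with the nearest centroid
    (Euclidean-distance model; dynamic versions stack lagged samples into x)."""
    return min(centroids, key=lambda m: np.linalg.norm(x - centroids[m]))

# Two hypothetical operation modes with well-separated operating points
rng = np.random.default_rng(2)
modes = {1: rng.normal(loc=0.0, size=(50, 4)),
         2: rng.normal(loc=3.0, size=(50, 4))}
cents = fit_mode_centroids(modes)
label = classify_mode(np.full(4, 2.8), cents)
```

The FDA and PCA versions replace the raw distance with distances in a discriminant or principal-component subspace, which is what gives dynamic PCA its edge on the TE benchmark.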

Plant breeding in the 21st century: Molecular breeding and high throughput phenotyping

  • Sorrells, Mark E.
    • Proceedings of the Korean Society of Crop Science Conference
    • /
    • 2017.06a
    • /
    • pp.14-14
    • /
    • 2017
  • The discipline of plant breeding is experiencing a renaissance impacting crop improvement as a result of new technologies; however, fundamental questions remain about predicting the phenotype and how the environment and genetics shape it. Inexpensive DNA sequencing, genotyping, new statistical methods, high-throughput phenotyping, and gene editing are revolutionizing breeding methods and strategies for improving both quantitative and qualitative traits. Genomic selection (GS) models use genome-wide markers to predict performance for both phenotyped and non-phenotyped individuals. Aerial and ground imaging systems generate data on correlated traits, such as canopy temperature and the normalized difference vegetation index, that can be combined with genotypes in multivariate models to further increase prediction accuracy and reduce the cost of advanced trials with limited replication in time and space. Design of a GS training population is crucial to the accuracy of prediction models and can be affected by many factors, including population structure and composition. Prediction models can incorporate performance over multiple environments and assess GxE effects to identify a highly predictive subset of environments. We have developed a methodology for analyzing unbalanced datasets using genome-wide marker effects to group environments and identify outlier environments. Environmental covariates can be identified using a crop model and used in a GS model to predict GxE in unobserved environments and to predict performance under climate change scenarios. These new tools and knowledge challenge the plant breeder to ask the right questions and choose the tools that are appropriate for their crop and target traits. Contemporary plant breeding requires teams of people with expertise in genetics, phenotyping, and statistics to improve efficiency and increase prediction accuracy in terms of genotypes, experimental design, and environment sampling.
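A common genomic selection baseline is ridge regression of phenotypes on genome-wide markers (the rrBLUP idea): every marker gets a shrunken effect, and predicted breeding values are the marker scores. A hedged NumPy sketch on a synthetic training population (population size, marker count, and effect sizes are all hypothetical):

```python
import numpy as np

def gs_ridge(M, y, lam):
    """Ridge (rrBLUP-style) regression of centered phenotypes y on the
    genome-wide marker matrix M: beta = (M'M + lam*I)^{-1} M'y."""
    p = M.shape[1]
    return np.linalg.solve(M.T @ M + lam * np.eye(p), M.T @ y)

# Synthetic training population: 80 lines, 200 biallelic markers,
# 10 causal loci.
rng = np.random.default_rng(3)
n, p = 80, 200
M = rng.choice([-1.0, 0.0, 1.0], size=(n, p))   # marker genotypes
true_eff = np.zeros(p)
true_eff[:10] = rng.normal(size=10)             # causal marker effects
y = M @ true_eff + 0.5 * rng.normal(size=n)     # phenotype = genetics + noise
beta = gs_ridge(M, y - y.mean(), lam=10.0)
gebv = M @ beta          # genomic estimated breeding values
```

Non-phenotyped candidates are ranked by computing `M_new @ beta` from their genotypes alone, which is what makes GS cheaper than phenotyping every selection candidate.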


A Meta Analysis of Using Structural Equation Model on the Korean MIS Research (국내 MIS 연구에서 구조방정식모형 활용에 관한 메타분석)

  • Kim, Jong-Ki;Jeon, Jin-Hwan
    • Asia pacific journal of information systems
    • /
    • v.19 no.4
    • /
    • pp.47-75
    • /
    • 2009
  • Recently, research on Management Information Systems (MIS) has laid out theoretical foundations and academic paradigms by introducing diverse theories, themes, and methodologies. In particular, academic paradigms of MIS encourage a user-friendly approach by developing technologies from the users' perspective, which reflects the existence of strong causal relationships between information systems and user behavior. As in other areas of social science, the use of structural equation modeling (SEM) has increased rapidly in recent years, especially in MIS. The SEM technique is important because it provides powerful ways to address key IS research problems. It also has a unique ability to examine a series of causal relationships while analyzing multiple independent and dependent variables simultaneously. Despite its many benefits to MIS researchers, the technique has some potential pitfalls. The objective of this study is to provide guidelines for the appropriate use of SEM based on an assessment of current practice in MIS research. The study focuses on several statistical issues related to the use of SEM and assesses selected articles in three parts through a meta-analysis. The first part concerns the initial specification of the theoretical model of interest; the second, data screening prior to model estimation and testing; and the last, estimation and testing of theoretical models on empirical data. We reviewed the use of SEM in 164 empirical research articles published in four major MIS journals in Korea (APJIS, ISR, JIS and JITAM) from 1991 to 2007. APJIS, ISR, JIS and JITAM accounted for 73, 17, 58, and 16 of the applications, respectively, and the number of published applications has increased over time.
LISREL was the most frequently used SEM software among MIS researchers (97 studies, 59.15%), followed by AMOS (45 studies, 27.44%). In the first part, regarding issues related to the initial specification of the theoretical model of interest, all of the studies used cross-sectional data. Studies that use cross-sectional data may be better able to explain their structural model as a set of relationships. Meanwhile, most of the SEM studies employed confirmatory-type analysis (146 articles, 89%). Regarding model formulation, 159 (96.9%) of the studies specified a full structural equation model; in only 5 studies was SEM used for a measurement model with a set of observed variables. The average sample size across all models was 365.41, with samples as small as 50 and as large as 500. The second part concerns data screening prior to model estimation and testing. Data screening is important for researchers, particularly in defining how they deal with missing values. Overall, data screening was discussed in 118 (71.95%) of the studies, while no study presented evidence of multivariate normality for its models. In the third part, on issues related to the estimation and testing of theoretical models on empirical data, assessing model fit is one of the most important issues because it provides adequate statistical power for research models. Multiple fit indices were used in the SEM applications. The χ² test was reported in most of the studies (146, 89%), whereas the normed χ² was reported less frequently (65 studies, 39.64%); a normed χ² of 3 or lower is generally required for adequate model fit. The most popular model fit indices were GFI (109, 66.46%), AGFI (84, 51.22%), NFI (44, 47.56%), RMR (42, 25.61%), CFI (59, 35.98%), RMSEA (62, 37.80%), and NNFI (48, 29.27%).
Regarding tests of construct validity, convergent validity was examined in 109 studies (66.46%) and discriminant validity in 98 (59.76%); 81 studies (49.39%) reported the average variance extracted (AVE). However, there was little discussion of direct (47, 28.66%), indirect, and total effects in the SEM models. Based on these findings, we suggest general guidelines for the use of SEM and propose recommendations concerning issues of latent variable models, raw data, sample size, data screening, reporting of parameter estimates, model fit statistics, multivariate normality, confirmatory factor analysis, reliability, and the decomposition of effects.
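The convergent-validity quantities in the checklist above are simple arithmetic over standardized loadings. A minimal sketch with hypothetical loading values (the common Fornell-Larcker rule of thumb is AVE above 0.5):

```python
import numpy as np

def ave(loadings):
    """Average variance extracted: the mean squared standardized loading."""
    lam = np.asarray(loadings, dtype=float)
    return float(np.mean(lam ** 2))

def composite_reliability(loadings):
    """CR = (sum lam)^2 / ((sum lam)^2 + sum(1 - lam^2))."""
    lam = np.asarray(loadings, dtype=float)
    num = lam.sum() ** 2
    return float(num / (num + np.sum(1.0 - lam ** 2)))

# Hypothetical standardized loadings for one latent construct
lam = [0.82, 0.76, 0.71]
```

For these loadings the AVE comes out near 0.58, clearing the conventional 0.5 threshold.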

Continuity Simulation and Trend Analysis of Water Qualities in Incoming Flows to Lake Paldang by Log Linear Models (로그선형모델을 이용한 팔당호 유입지류 수질의 연속성 시뮬레이션과 경향 분석)

  • Na, Eun-Hye;Park, Seok-Soon
    • Korean Journal of Ecology and Environment
    • /
    • v.36 no.3 s.104
    • /
    • pp.336-343
    • /
    • 2003
  • Two types of statistical models, simple and multivariate log-linear models, were studied for continuity simulation and trend analysis of water quality in incoming flows to Lake Paldang. Water quality is a function of one independent variable (flow) in the simple log-linear model, and of three variables (flow, time, and a seasonal cycle) in the multivariate model. The independent variables act as surrogate variables of water quality in both models. The model coefficients were determined from monthly data. The water quality variables were 5-day biochemical oxygen demand ($BOD_5$), total nitrogen (TN), and total phosphorus (TP) measured from 1995 to 2000 in the South and North branches of the Han River and the Kyoungan Stream. The results indicated that the multivariate model agreed better with field measurements than the simple one in all attempted cases. Flow dependency, seasonality, and temporal trends of water quality were tested on the determined coefficients of the multivariate model. The test of flow dependency indicated that BOD concentrations decreased as the water flow increased. In TN and TP concentrations, however, there were no discernible flow effects. From the temporal trend analyses, the following results were obtained: 1) no trends in BOD at all three upstream sites, 2) an increase in TN at the South Branch and the Kyoungan Stream, 3) a decrease in TN at the North Branch, 4) no trends in TP at the North and South Branches, and 5) an increase in TP at the Kyoungan Stream by 3 to 8% per year. The seasonality test showed significant seasonal variations in all three water quality variables at the three incoming flows.
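A multivariate log-linear model of this form fits the log concentration against log flow, a linear time trend, and a seasonal sine/cosine pair by ordinary least squares. The sketch below uses synthetic monthly data with assumed coefficient values, not the Lake Paldang measurements:

```python
import numpy as np

def fit_loglinear(Q, t, C):
    """Multivariate log-linear model:
    ln C = b0 + b1*ln Q + b2*t + b3*sin(2*pi*t) + b4*cos(2*pi*t),
    capturing flow dependence, a temporal trend, and a seasonal cycle."""
    X = np.column_stack([np.ones_like(t), np.log(Q), t,
                         np.sin(2 * np.pi * t), np.cos(2 * np.pi * t)])
    beta, *_ = np.linalg.lstsq(X, np.log(C), rcond=None)
    return beta

# Six years of synthetic monthly data (t in years)
rng = np.random.default_rng(4)
t = np.arange(72) / 12.0
Q = np.exp(rng.normal(size=72))                       # flow
lnC = (1.0 - 0.3 * np.log(Q) + 0.05 * t
       + 0.2 * np.sin(2 * np.pi * t) + 0.02 * rng.normal(size=72))
beta = fit_loglinear(Q, t, np.exp(lnC))
```

The signs of the fitted coefficients carry the paper's conclusions directly: a negative `beta[1]` is a dilution effect of flow, a nonzero `beta[2]` is a temporal trend, and the sine/cosine pair measures seasonality.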

The fGARCH(1, 1) as a functional volatility measure of ultra high frequency time series (함수적 변동성 fGARCH(1, 1)모형을 통한 초고빈도 시계열 변동성)

  • Yoon, J.E.;Kim, Jong-Min;Hwang, S.Y.
    • The Korean Journal of Applied Statistics
    • /
    • v.31 no.5
    • /
    • pp.667-675
    • /
    • 2018
  • When a financial time series consists of daily (closing) returns, traditional volatility models such as autoregressive conditional heteroskedasticity (ARCH) and generalized ARCH (GARCH) are useful for measuring daily volatilities. With moderately high-frequency intraday returns, one may adopt various multivariate GARCH (MGARCH) techniques (Tsay, Multivariate Time Series Analysis With R and Financial Application, John Wiley, 2014) to obtain intraday volatilities. In the ultra-high-frequency (UHF) case (e.g., one-minute prices available every day), however, a new model is needed to suit UHF time series and capture continuous-time intraday volatilities. Aue et al. (Journal of Time Series Analysis, 38, 3-21; 2017) proposed functional GARCH (fGARCH) to analyze functional volatilities based on UHF data. This article introduces fGARCH to the readers and illustrates how to estimate fGARCH equations using UHF data on KOSPI and Hyundai Motor Company.
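The scalar GARCH(1, 1) recursion that fGARCH(1, 1) generalizes is easy to write out; in the functional model the scalar coefficients become operators acting on daily intraday return curves, but the recursion keeps the same shape. A sketch of the scalar case on simulated returns (parameter values assumed for illustration):

```python
import numpy as np

def garch11_filter(r, omega, alpha, beta):
    """Conditional variance recursion
    sigma2_t = omega + alpha * r_{t-1}^2 + beta * sigma2_{t-1},
    started at the unconditional variance omega / (1 - alpha - beta)."""
    sigma2 = np.empty_like(r)
    sigma2[0] = omega / (1 - alpha - beta)
    for t in range(1, len(r)):
        sigma2[t] = omega + alpha * r[t - 1] ** 2 + beta * sigma2[t - 1]
    return sigma2

# Simulate a GARCH(1, 1) path, then filter it with the true parameters
rng = np.random.default_rng(5)
omega, alpha, beta = 0.1, 0.1, 0.8
n = 500
r = np.zeros(n)
s2 = omega / (1 - alpha - beta)
for t in range(n):
    r[t] = np.sqrt(s2) * rng.normal()
    s2 = omega + alpha * r[t] ** 2 + beta * s2
sigma2 = garch11_filter(r, omega, alpha, beta)
```

In fGARCH, `r[t]` becomes a whole intraday return curve and `sigma2[t]` a volatility curve, which is what delivers continuous-time intraday volatility from UHF data.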

Generalized Linear Mixed Model for Multivariate Multilevel Binomial Data (다변량 다수준 이항자료에 대한 일반화선형혼합모형)

  • Lim, Hwa-Kyung;Song, Seuck-Heun;Song, Ju-Won;Cheon, Soo-Young
    • The Korean Journal of Applied Statistics
    • /
    • v.21 no.6
    • /
    • pp.923-932
    • /
    • 2008
  • We are likely to face complex multivariate data characterized by a non-trivial correlation structure. For instance, omitted covariates may simultaneously affect more than one count in clustered data; hence, modeling the correlation structure is important for the efficiency of the estimator and for computing correct standard errors, i.e., for valid inference. A standard way to introduce dependence among counts is to assume that they share some common unobservable variables. Under this assumption, we fitted correlated random-effects models within a multilevel framework. Estimation was carried out using a semiparametric approach via a finite-mixture EM algorithm, without parametric assumptions on the distribution of the random coefficients.
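The shared-unobservable mechanism behind the correlated-random-effects GLMM can be demonstrated by simulation: a common cluster effect on the logit scale makes two binomial responses within a cluster positively correlated even though they are conditionally independent. A small synthetic sketch (effect scale and intercepts are assumed values):

```python
import numpy as np

rng = np.random.default_rng(6)
n_clusters = 2000

# Shared (unobserved) cluster effect u enters both linear predictors,
# inducing marginal correlation between the two binary responses.
u = rng.normal(scale=1.5, size=n_clusters)
p1 = 1 / (1 + np.exp(-(-0.5 + u)))   # response 1, inverse-logit
p2 = 1 / (1 + np.exp(-(0.3 + u)))    # response 2, inverse-logit
y1 = rng.binomial(1, p1)
y2 = rng.binomial(1, p2)
rho = np.corrcoef(y1, y2)[0, 1]      # positive because u is shared
```

Fitting reverses this simulation: the EM algorithm recovers the latent-effect distribution (here, nonparametrically as a finite mixture) from the observed joint pattern of the two responses.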

Wild bootstrap Ljung-Box test for autocorrelation in vector autoregressive and error correction models (벡터자기회귀모형과 오차수정모형의 자기상관성을 위한 와일드 붓스트랩 Ljung-Box 검정)

  • Lee, Myeongwoo;Lee, Taewook
    • The Korean Journal of Applied Statistics
    • /
    • v.29 no.1
    • /
    • pp.61-73
    • /
    • 2016
  • We consider the wild bootstrap Ljung-Box (LB) test for autocorrelation in the residuals of fitted multivariate time series models. The asymptotic chi-square distribution under the IID assumption is traditionally used for the LB test; however, size distortion tends to occur when the LB test is applied to financial time series because of their conditional heteroskedasticity. To overcome this defect, we propose the wild bootstrap LB test for autocorrelation in the residuals of fitted vector autoregressive and error correction models. A simulation study and real data analysis are conducted to assess finite-sample performance.
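The idea can be sketched in the univariate case: the wild bootstrap multiplies residuals by independent random signs, which destroys any autocorrelation while preserving conditional heteroskedasticity, and the LB statistic is recomputed on each resample to build a null distribution. The Rademacher (±1) multiplier below is one common wild-bootstrap choice, an assumption rather than the paper's exact scheme:

```python
import numpy as np

def ljung_box(e, K):
    """Ljung-Box statistic Q = n(n+2) * sum_{k=1..K} rho_k^2 / (n-k)."""
    n = len(e)
    e = e - e.mean()
    denom = np.sum(e ** 2)
    q = 0.0
    for k in range(1, K + 1):
        rho_k = np.sum(e[k:] * e[:-k]) / denom   # lag-k autocorrelation
        q += rho_k ** 2 / (n - k)
    return n * (n + 2) * q

def wild_bootstrap_pvalue(e, K, B=500, seed=0):
    """Random-sign (Rademacher) wild bootstrap p-value for the LB test."""
    rng = np.random.default_rng(seed)
    q_obs = ljung_box(e, K)
    q_star = np.array([
        ljung_box(e * rng.choice([-1.0, 1.0], size=len(e)), K)
        for _ in range(B)
    ])
    return float(np.mean(q_star >= q_obs))

# Uncorrelated residuals: the test should not reject systematically
rng = np.random.default_rng(7)
e = rng.normal(size=300)
p = wild_bootstrap_pvalue(e, K=5)
```

For VAR/VECM residuals the statistic generalizes to a multivariate portmanteau form, but the resampling step is the same per-observation sign flip.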

Sampling Strategies for Computer Experiments: Design and Analysis

  • Lin, Dennis K.J.;Simpson, Timothy W.;Chen, Wei
    • International Journal of Reliability and Applications
    • /
    • v.2 no.3
    • /
    • pp.209-240
    • /
    • 2001
  • Computer-based simulation and analysis is used extensively in engineering for a variety of tasks. Despite the steady and continuing growth of computing power and speed, the computational cost of complex high-fidelity engineering analyses and simulations limits their use in important areas like design optimization and reliability analysis. Statistical approximation techniques such as design of experiments and response surface methodology are becoming widely used in engineering to minimize the computational expense of running such computer analyses and to circumvent many of these limitations. In this paper, we compare and contrast five experimental design types and four approximation model types in terms of their capability to generate accurate approximations for two engineering applications with typical engineering behaviors and a wide range of nonlinearity. The first example involves the analysis of a two-member frame that has three input variables and three responses of interest. The second example simulates the rollover potential of a semi-tractor-trailer for different combinations of input variables and braking and steering levels. Detailed error analysis reveals that uniform designs provide good sampling for generating accurate approximations with different sample sizes, while kriging models provide accurate approximations that are robust across a variety of experimental designs and sample sizes.
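The kriging models favored here interpolate the expensive simulation exactly at the sampled design points. A stripped-down, zero-mean sketch with a Gaussian correlation function (full ordinary kriging adds a constant trend term; the correlation parameter and test function are assumptions for illustration):

```python
import numpy as np

def kriging_fit_predict(X, y, Xnew, theta=25.0, nugget=1e-8):
    """Kriging-style interpolator with Gaussian correlation
    R_ij = exp(-theta * ||x_i - x_j||^2); the prediction at x is
    r(x)^T R^{-1} y (zero-mean simple-kriging form)."""
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    R = np.exp(-theta * d2) + nugget * np.eye(len(X))  # nugget for stability
    w = np.linalg.solve(R, y)
    d2n = ((Xnew[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    return np.exp(-theta * d2n) @ w

# Cheap stand-in for an expensive simulation response on a 1-D design
X = np.linspace(0.0, 1.0, 8)[:, None]
y = np.sin(2 * np.pi * X[:, 0])
yhat = kriging_fit_predict(X, y, X)   # reproduces the design points
```

Interpolating the training data exactly, while smoothing in between, is what makes kriging robust across the experimental designs and sample sizes compared in the paper.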
