• Title/Summary/Keyword: stochastic regression model

Water Quality Assessment and Turbidity Prediction Using Multivariate Statistical Techniques: A Case Study of the Cheurfa Dam in Northwestern Algeria

  • ADDOUCHE, Amina;RIGHI, Ali;HAMRI, Mehdi Mohamed;BENGHAREZ, Zohra;ZIZI, Zahia
    • Applied Chemistry for Engineering / v.33 no.6 / pp.563-573 / 2022
  • This work aimed to develop a new equation for simulating and predicting turbidity (Turb) using statistical methods based on principal component analysis (PCA) and multiple linear regression (MLR). For this purpose, water samples were collected monthly over a five-year period from the Cheurfa dam, an important reservoir in northwestern Algeria, and analyzed for 12 parameters: temperature (T°), pH, electrical conductivity (EC), turbidity (Turb), dissolved oxygen (DO), ammonium (NH4+), nitrate (NO3-), nitrite (NO2-), phosphate (PO43-), total suspended solids (TSS), biochemical oxygen demand (BOD5), and chemical oxygen demand (COD). The results revealed strong mineralization of the water and low DO content during the summer. High levels of TSS and Turb were recorded during rainy periods. In addition, the water was loaded with phosphate (PO43-) throughout the study period. The PCA yielded ten factors, three of which were significant (eigenvalues > 1) and together explained 75.5% of the total variance. The F1 and F2 factors explained 36.5% and 26.7% of the total variance, respectively, and indicated anthropogenic pollution of domestic, agricultural, and industrial origin. The MLR turbidity simulation model exhibited a high coefficient of determination (R2 = 92.20%), indicating that the model explains 92.20% of the variability in the data. TSS, DO, EC, NO3-, NO2-, and COD were the most significant contributing parameters (p values << 0.05) in turbidity prediction. The present study can support decision-making on the management and monitoring of the water quality of the dam, which is the primary source of drinking water in this region.
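
The factor-retention rule (keep components whose eigenvalue exceeds 1) and the coefficient of determination reported in this abstract can be illustrated with a short sketch. The eigenvalues and the observed/predicted values below are hypothetical placeholders chosen for illustration, not data from the study, and the function names are assumptions of this example.

```python
def retained_components(eigenvalues, threshold=1.0):
    """Return (index, eigenvalue, variance share) for eigenvalues above threshold."""
    total = sum(eigenvalues)
    return [(i, ev, ev / total) for i, ev in enumerate(eigenvalues) if ev > threshold]


def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


# Hypothetical eigenvalues of a correlation matrix (NOT the study's values).
eigenvalues = [4.0, 3.0, 1.5, 0.8, 0.4, 0.3]
kept = retained_components(eigenvalues)
explained = sum(share for _, _, share in kept)

# Hypothetical observed and model-predicted turbidity values.
r2 = r_squared([1.0, 2.0, 3.0, 4.0], [1.1, 1.9, 3.2, 3.8])
```

With these placeholder numbers, three components pass the threshold; the share of variance they explain is the sum of their eigenvalues divided by the sum of all eigenvalues.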

Forecast of the Daily Inflow with Artificial Neural Network using Wavelet Transform at Chungju Dam (웨이블렛 변환을 적용한 인공신경망에 의한 충주댐 일유입량 예측)

  • Ryu, Yongjun;Shin, Ju-Young;Nam, Woosung;Heo, Jun-Haeng
    • Journal of Korea Water Resources Association / v.45 no.12 / pp.1321-1330 / 2012
  • In this study, the daily inflow to the Chungju dam basin is predicted using a wavelet-artificial neural network, a nonlinear model. A time series generally consists of a linear combination of trend, periodicity, and stochastic components. When building a time series model from such data, however, the trend and periodicity components have to be removed. The wavelet transform, a denoising technique, is applied to remove both nonlinear dynamic noise, such as the trend and periodicity contained in hydrometeorological data, and simple noise that arises in the measurement process. The wavelet-artificial neural network (WANN), which takes wavelet-transformed data as input variables, is compared with an artificial neural network (ANN) that uses only raw data. The coefficient of determination and the slope of the linear regression are higher for WANN than for ANN by 0.031 and 0.0115, respectively, and the RMSE and RRMSE of WANN are smaller than those of ANN by 37.388 and 0.099, respectively. The WANN model applied in this study is therefore more accurate than the ANN, and denoising with wavelet transforms is expected to give more accurate predictions than using noisy raw data.
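
The idea of denoising an input series with a wavelet transform and then scoring predictions with RMSE and RRMSE can be sketched as follows. The abstract does not name the wavelet or the thresholding rule, so the single-level Haar transform with hard thresholding used here is an assumption, as is the definition of RRMSE as RMSE divided by the mean of the observations (one common convention).

```python
import math


def haar_forward(x):
    """Single-level Haar transform of an even-length sequence."""
    if len(x) % 2:
        raise ValueError("sequence length must be even")
    s = math.sqrt(2.0)
    approx = [(x[i] + x[i + 1]) / s for i in range(0, len(x), 2)]
    detail = [(x[i] - x[i + 1]) / s for i in range(0, len(x), 2)]
    return approx, detail


def haar_inverse(approx, detail):
    """Invert haar_forward exactly."""
    s = math.sqrt(2.0)
    out = []
    for a, d in zip(approx, detail):
        out.append((a + d) / s)
        out.append((a - d) / s)
    return out


def denoise(x, threshold):
    """Hard-threshold the detail coefficients, then reconstruct."""
    approx, detail = haar_forward(x)
    detail = [d if abs(d) > threshold else 0.0 for d in detail]
    return haar_inverse(approx, detail)


def rmse(observed, predicted):
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)


def rrmse(observed, predicted):
    """RMSE relative to the observed mean (assumed definition)."""
    return rmse(observed, predicted) / (sum(observed) / len(observed))


smoothed = denoise([2.0, 2.2, 5.0, 1.0], threshold=0.5)
```

In the example call, the small difference within the first pair is treated as noise and averaged out, while the large difference within the second pair is kept.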

An Analysis of Determinants of Medical Cost Inflation using both Deterministic and Stochastic Models (의료비 상승 요인 분석)

  • Kim, Han-Joong;Chun, Ki-Hong
    • Journal of Preventive Medicine and Public Health / v.22 no.4 s.28 / pp.542-554 / 1989
  • The skyrocketing inflation of medical costs has become a major health problem in most developed countries. Korea, which recently extended National Health Insurance to its entire population, is facing the same problem. The proportion of health expenditure to GNP increased from 3% to 4.8% during the last decade, which is remarkable considering the rapid economic growth during that time. After recognizing the importance of medical cost inflation, a few policy analysts began to raise cost containment as an agenda item. To prepare an appropriate policy alternative, it is necessary to identify the reasons for the cost inflation and then to focus on those that are controllable and whose control is socially desirable. This study is designed to articulate the theory of medical cost inflation through a literature review, to find the reasons for cost inflation by analyzing aggregated data with a deterministic model, and finally to identify the determinants of changes in both medical demand and service intensity, which are the major reasons for cost inflation. The reasons for cost inflation are classified into cost-push inflation and demand-pull inflation. The former consists of increases in the price and intensity of services, while the latter consists of consumer-derived demand and supplier-induced demand. We used time series (1983-1987) and cross-sectional (across regions) health insurance data. The deterministic model reveals that an increase in service intensity is the major cause of inflation for inpatient care, while greater utilization is the primary cause for physician visits. Multiple regression analysis shows that an increase in hospital beds is the leading explanatory variable for the increase in hospital care. It also reveals that the introduction of a deductible clause, an increase in hospital beds, and the degree of urbanization are statistically significant variables explaining physician visits. 
The results are consistent with existing theory. The magnitude of service intensity is influenced by the level of co-payment, the proportion of elderly people, and an increase in co-payment. In short, an increase in co-payment reduced utilization but induced greater intensity of services. We conclude that strict fee regulation or an increase in the level of co-payment cannot be an effective measure for cost containment under the fee-for-service system, because providers can react to the regulation by inducing more services.

A Time Series Graph based Convolutional Neural Network Model for Effective Input Variable Pattern Learning : Application to the Prediction of Stock Market (효과적인 입력변수 패턴 학습을 위한 시계열 그래프 기반 합성곱 신경망 모형: 주식시장 예측에의 응용)

  • Lee, Mo-Se;Ahn, Hyunchul
    • Journal of Intelligence and Information Systems / v.24 no.1 / pp.167-181 / 2018
  • Over the past decade, deep learning has been in the spotlight among machine learning algorithms. In particular, the convolutional neural network (CNN), known as an effective solution for recognizing and classifying images or voices, has been widely applied to classification and prediction problems. In this study, we investigate how to apply CNN to business problem solving. Specifically, this study proposes applying CNN to stock market prediction, one of the most challenging tasks in machine learning research. As mentioned, CNN is strong at interpreting images. Thus, the proposed model adopts CNN as a binary classifier that predicts stock market direction (upward or downward) using time series graphs as its inputs. That is, our proposal is to build a machine learning algorithm that mimics the experts called 'technical analysts', who examine graphs of past price movements and predict future financial price movements. Our proposed model, named 'CNN-FG (Convolutional Neural Network using Fluctuation Graph)', consists of five steps. In step 1, it divides the dataset into intervals of 5 days. In step 2, it creates time series graphs for the divided dataset. Each graph is drawn in an image of $40 \times 40$ pixels, and the graph of each independent variable is drawn in a different color. In step 3, the model converts the images into matrices. Each image is converted into a combination of three matrices that express the color values on the R (red), G (green), and B (blue) scales. In step 4, it splits the dataset of graph images into training and validation datasets. We used 80% of the total dataset as the training dataset and the remaining 20% as the validation dataset. In the final step, CNN classifiers are trained using the images of the training dataset. 
Regarding the parameters of CNN-FG, we adopted two convolution filters ($5 \times 5 \times 6$ and $5 \times 5 \times 9$) in the convolution layer. In the pooling layer, a $2 \times 2$ max pooling filter was used. The numbers of nodes in the two hidden layers were set to 900 and 32, respectively, and the number of nodes in the output layer was set to 2 (one for the prediction of an upward trend and the other for a downward trend). The activation functions for the convolution layer and the hidden layers were set to ReLU (Rectified Linear Unit), and the one for the output layer was set to the Softmax function. To validate CNN-FG, we applied it to the prediction of KOSPI200 for 2,026 days over eight years (from 2009 to 2016). To balance the proportions of the two groups in the dependent variable (i.e., tomorrow's stock market movement), we selected 1,950 samples by random sampling. Finally, we built the training dataset using 80% of the total dataset (1,560 samples) and the validation dataset using 20% (390 samples). The independent variables of the experimental dataset included twelve technical indicators that have been widely used in previous studies. They include Stochastic %K, Stochastic %D, Momentum, ROC (rate of change), LW %R (Larry Williams' %R), A/D oscillator (accumulation/distribution oscillator), OSCP (price oscillator), CCI (commodity channel index), and so on. To confirm the superiority of CNN-FG, we compared its prediction accuracy with that of other classification models. Experimental results showed that CNN-FG outperforms LOGIT (logistic regression), ANN (artificial neural network), and SVM (support vector machine) with statistical significance. These empirical results imply that converting time series business data into graphs and building CNN-based classification models using these graphs can be effective from the perspective of prediction accuracy. 
Thus, this paper sheds light on how to apply deep learning techniques to business problem solving.
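
The preprocessing steps described in this abstract (5-day windows, 40 x 40 pixel graphs stored as R, G, and B values, and an 80/20 split) might be sketched as below. This is a rough illustration under stated assumptions, not the authors' implementation: the abstract does not say whether windows overlap (so `step` is a parameter), how lines are drawn, or which colors are used, and all function names here are hypothetical.

```python
def make_windows(series, size=5, step=1):
    """Cut a series into windows of `size` days; `step` controls overlap (assumed)."""
    return [series[i:i + size] for i in range(0, len(series) - size + 1, step)]


def blank_image(size=40):
    """size x size image with three channels (R, G, B), all black."""
    return [[[0, 0, 0] for _ in range(size)] for _ in range(size)]


def draw_series(image, window, color):
    """Plot one variable's window as a colored line, one pixel per column."""
    size = len(image)
    lo, hi = min(window), max(window)
    span = (hi - lo) or 1.0  # avoid division by zero for flat windows
    for col in range(size):
        # Linear interpolation between the daily values.
        pos = col * (len(window) - 1) / (size - 1)
        left = int(pos)
        right = min(left + 1, len(window) - 1)
        value = window[left] + (window[right] - window[left]) * (pos - left)
        # Row 0 is the top of the image, so high values map to small rows.
        row = round((1.0 - (value - lo) / span) * (size - 1))
        image[row][col] = list(color)
    return image


def train_validation_split(samples, train_tenths=8):
    """First 80% for training, the rest for validation (integer arithmetic)."""
    cut = len(samples) * train_tenths // 10
    return samples[:cut], samples[cut:]


windows = make_windows(list(range(10)), size=5, step=1)
image = draw_series(blank_image(), [1.0, 2.0, 3.0, 4.0, 5.0], (255, 0, 0))
train, validation = train_validation_split(list(range(1950)))
```

The split sizes for 1,950 samples match the 1,560 and 390 samples reported in the abstract; further variables would be drawn onto the same image with other colors before the image is passed to a CNN.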

Impacts assessment of Climate changes in North Korea based on RCP climate change scenarios II. Impacts assessment of hydrologic cycle changes in Yalu River (RCP 기후변화시나리오를 이용한 미래 북한지역의 수문순환 변화 영향 평가 II. 압록강유역의 미래 수문순환 변화 영향 평가)

  • Jeung, Se Jin;Kang, Dong Ho;Kim, Byung Sik
    • Journal of Wetlands Research / v.21 no.spc / pp.39-50 / 2019
  • This study aims to assess the influence of climate change on the hydrological cycle at the basin level in North Korea. The model selected for this study is MRI-CGCM3, one of the models used in the Coupled Model Intercomparison Project Phase 5 (CMIP5). The study adopted Spatial Disaggregation-Quantile Delta Mapping (SDQDM), a stochastic downscaling technique, to bias-correct the climate change scenarios, and reviewed the validity of the technique by comparing the results before and after applying SDQDM. To determine the influence of climate change on the hydrological cycle, the study also examined runoff in North Korea. Predicting this influence requires optimized parameters for the runoff model used in the analysis. However, North Korea is classified as an ungauged region because of its political circumstances, and it was difficult to collect runoff observation data for the country. Hence, the study selected 16 basins with high-quality runoff data and calculated the optimized parameters of the M-RAT model for them. The study also analyzed the correlation among the basin characteristic variables to account for multicollinearity. Then, based on a stepwise regression analysis, the study developed an equation to calculate parameters for ungauged basins. To verify the equation, the study treated the Osipcheon River, Namdaecheon Stream, Yongdang Reservoir, and Yonggang Stream as ungauged basins and conducted cross-validation. For all four basins, high efficiency was confirmed, with efficiency coefficients of 0.8 or higher. The study then used the climate change scenarios and the estimated runoff model parameters to assess basin-level changes in hydrological cycle processes under climate change in the Amnokgang (Yalu) River basin of North Korea. 
The results showed that climate change would lead to an increase in precipitation, and the corresponding rise in temperature is predicted to increase evapotranspiration. However, the storage capacity of the basin was found to decrease. The flow duration analysis indicated a decrease in flow on the 95th day; an increase in the drought flow during the periods of Future 1 and Future 2; and an increase in both flows for the period of Future 3.
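
The bias-correction idea behind quantile delta mapping can be sketched as follows. This is a simplified illustration: it omits the spatial disaggregation step of SDQDM, uses empirical quantiles with linear interpolation, and ranks each future value within the future sample, all of which are assumptions of this example. The additive form is typically used for temperature and the multiplicative form for precipitation; the sample values are hypothetical.

```python
def quantile(sorted_sample, tau):
    """Empirical quantile with linear interpolation, tau in [0, 1]."""
    pos = tau * (len(sorted_sample) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_sample) - 1)
    return sorted_sample[lo] + (sorted_sample[hi] - sorted_sample[lo]) * (pos - lo)


def quantile_delta_mapping(obs_hist, mod_hist, mod_fut, multiplicative=False):
    """Bias-correct mod_fut while preserving the model's change per quantile."""
    obs_sorted = sorted(obs_hist)
    hist_sorted = sorted(mod_hist)
    n = len(mod_fut)
    # Rank every future value to get its non-exceedance probability.
    order = sorted(range(n), key=lambda i: mod_fut[i])
    corrected = [0.0] * n
    for rank, i in enumerate(order):
        tau = rank / (n - 1) if n > 1 else 0.5
        hist_q = quantile(hist_sorted, tau)  # historical model value at tau
        obs_q = quantile(obs_sorted, tau)    # observed value at tau
        if multiplicative:
            # Relative change; fall back to 1.0 when the historical quantile is 0.
            ratio = mod_fut[i] / hist_q if hist_q != 0 else 1.0
            corrected[i] = obs_q * ratio
        else:
            corrected[i] = obs_q + (mod_fut[i] - hist_q)
    return corrected


# Hypothetical temperatures: the model runs 1 degree warm and projects +1 degree.
temperature = quantile_delta_mapping([10, 12, 14], [11, 13, 15], [14, 12, 16])
# Hypothetical precipitation: the model is half the observed and projects +20%.
precipitation = quantile_delta_mapping([10, 20], [5, 10], [6, 12], multiplicative=True)
```

In both toy cases the corrected series keeps the model's projected change (+1 degree, +20%) while its baseline is replaced by the observed distribution.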

Use of Space-time Autocorrelation Information in Time-series Temperature Mapping (시계열 기온 분포도 작성을 위한 시공간 자기상관성 정보의 결합)

  • Park, No-Wook;Jang, Dong-Ho
    • Journal of the Korean association of regional geographers / v.17 no.4 / pp.432-442 / 2011
  • Climatic variables such as temperature and precipitation tend to vary in both space and time simultaneously. Thus, space-time autocorrelation needs to be incorporated into conventional spatial interpolation methods for reliable time-series mapping. This paper introduces and applies space-time variogram modeling and space-time kriging to generate time-series temperature maps from hourly Automatic Weather System (AWS) temperature observations over a one-month period. First, the temperature observations are decomposed into deterministic trend and stochastic residual components. For trend component modeling, elevation data, which are reasonably correlated with temperature, are used as secondary information to generate a trend component that reflects topographic effects. Then, space-time variograms of the residual components are estimated and modeled with a product-sum space-time variogram model, which accounts not only for autocorrelation in space and in time but also for their interaction. In the case study, space-time kriging outperforms both conventional space-only ordinary kriging and regression-kriging, which indicates the importance of using space-time autocorrelation information as well as elevation data. Space-time kriging is expected to be a useful tool for analyzing datasets that are sparse in space but rich in time.
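
A product-sum space-time variogram combines a purely spatial and a purely temporal variogram with an interaction term. The sketch below uses the commonly cited form gamma(h, u) = gamma_s(h) + gamma_t(u) - k * gamma_s(h) * gamma_t(u), with 0 < k <= 1 / max(spatial sill, temporal sill). The exponential marginal models, sills, ranges, and value of k are hypothetical choices for illustration; the abstract does not report the fitted models.

```python
import math


def exponential_variogram(lag, sill, practical_range):
    """Exponential variogram without nugget; reaches about 95% of sill at the range."""
    return sill * (1.0 - math.exp(-3.0 * lag / practical_range))


def product_sum_variogram(h, u, gamma_s, gamma_t, k):
    """Space-time variogram at spatial lag h and temporal lag u."""
    gs, gt = gamma_s(h), gamma_t(u)
    return gs + gt - k * gs * gt


# Hypothetical marginal models: spatial lag in km, temporal lag in hours.
def gamma_s(h):
    return exponential_variogram(h, sill=2.0, practical_range=50.0)


def gamma_t(u):
    return exponential_variogram(u, sill=1.0, practical_range=6.0)


k = 0.4  # must satisfy 0 < k <= 1 / max(2.0, 1.0) = 0.5
value = product_sum_variogram(10.0, 2.0, gamma_s, gamma_t, k)
global_sill = product_sum_variogram(1e6, 1e6, gamma_s, gamma_t, k)
```

Setting one lag to zero recovers the marginal variogram of the other dimension, and at very large lags the model approaches the global sill of 2.0 + 1.0 - 0.4 * 2.0 * 1.0 = 2.2 for these placeholder parameters.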

Mediation analysis of dietary habits, nutrient intakes, daily life in the relationship between working hours of Korean shift workers and metabolic syndrome : the sixth (2013 ~ 2015) Korea National Health and Nutrition Examination Survey (교대근무자의 근무시간과 대사증후군의 관계에서 식습관, 영양섭취상태, 일상생활의 매개효과 분석 : 6기 국민건강영양조사 (2013 ~ 2015) 데이터 이용)

  • Kim, Yoona;Kim, Hyeon Hee;Lim, Dong Hoon
    • Journal of Nutrition and Health / v.51 no.6 / pp.567-579 / 2018
  • Purpose: This study examined the mediation effects of dietary habits, nutrient intake, and daily life in the relationship between the working hours of Korean shift workers and metabolic syndrome. Methods: Data were collected from the sixth (2013-2015) Korea National Health and Nutrition Examination Survey (KNHANES). Stochastic regression imputation was used to fill in missing data. Statistical analysis was performed in Korean shift workers with metabolic syndrome using the SPSS 24 program for Windows and a structural equation model (SEM) using the analysis of moment structure (AMOS) 21.0 package. Results: The model fitted the data well in terms of the goodness of fit index (GFI) = 0.939, root mean square error of approximation (RMSEA) = 0.025, normed fit index (NFI) = 0.917, Tucker-Lewis index (TLI) = 0.984, comparative fit index (CFI) = 0.987, and adjusted goodness of fit index (AGFI) = 0.915. The specific mediation effect of dietary habits (p = 0.023) was statistically significant in the impact of the working hours of shift workers on nutrient intake, and the specific mediation effect of daily life (p = 0.019) was statistically significant in the impact of the working hours of shift workers on metabolic syndrome. On the other hand, dietary habits, nutrient intake, and daily life had no significant multiple mediator effects on the working hours of shift workers with metabolic syndrome. Conclusion: The fitted model suggests that working hours have a direct effect on daily life, which in turn mediates the risk of metabolic syndrome in shift workers.
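
Stochastic regression imputation fills each missing value with a regression prediction plus a random draw from the residual distribution, so that imputed values keep realistic scatter. The sketch below is a minimal single-predictor version; the study's actual predictors and software settings are not given in the abstract, so the data, the function names, and the normal residual assumption are all hypothetical.

```python
import math
import random


def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x, plus residual SD."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sxx
    intercept = mean_y - slope * mean_x
    residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]
    residual_sd = math.sqrt(sum(r * r for r in residuals) / (n - 2))
    return intercept, slope, residual_sd


def stochastic_regression_impute(xs, ys, seed=0):
    """Replace None entries in ys with prediction + normally distributed noise."""
    rng = random.Random(seed)  # seeded so the example is reproducible
    complete = [(x, y) for x, y in zip(xs, ys) if y is not None]
    intercept, slope, sd = fit_line([x for x, _ in complete], [y for _, y in complete])
    return [
        y if y is not None else intercept + slope * x + rng.gauss(0.0, sd)
        for x, y in zip(xs, ys)
    ]


# Hypothetical data with one missing response.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, None, 8.2, 9.8]
filled = stochastic_regression_impute(xs, ys, seed=42)
```

Observed values are returned unchanged; only the missing entry receives a simulated value. Dropping the noise term would give deterministic regression imputation, which understates the variance of the imputed variable.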