• Title/Summary/Keyword: Prediction of variables


Runoff Prediction from Machine Learning Models Coupled with Empirical Mode Decomposition: A case Study of the Grand River Basin in Canada

  • Parisouj, Peiman;Jun, Changhyun;Nezhad, Somayeh Moghimi;Narimani, Roya
    • 한국수자원학회:학술대회논문집 / 한국수자원학회 2022년도 학술발표회 / p.136 / 2022
  • This study investigates whether coupling empirical mode decomposition (EMD) with machine learning (ML) models improves runoff prediction. Support vector regression (SVR) and a convolutional neural network (CNN) were used as the ML algorithms. Precipitation (P), minimum temperature (Tmin), maximum temperature (Tmax), and their intrinsic mode function (IMF) values were used as input variables at a monthly scale from Jan. 1973 to Dec. 2020 in the Grand River basin, Canada. The support vector machine-recursive feature elimination (SVM-RFE) technique was applied to find the best combination of predictors among the input variables. The results show that the coupled models outperformed the standalone SVR and CNN during both the training and testing periods in the study area. In terms of the correlation coefficient (R), the EMD-SVR model outperformed the EMD-CNN model in both training and testing, even though the CNN performed better than the SVR before the IMF values were used. The EMD-SVR model showed a larger improvement in R (38.7%) than the EMD-CNN model (7.1%). Overall, the coupled EMD-SVR and EMD-CNN models predicted runoff much more accurately according to the evaluation indicators considered, namely root mean square error (RMSE) and R.
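
The two evaluation indicators named in this abstract, RMSE and the correlation coefficient R, can be sketched in plain Python as follows. The abstract does not say how the percentage improvement in R was computed, so `relative_improvement` assumes a simple relative change; the runoff values are hypothetical and only illustrate the calls.

```python
import math

def rmse(observed, predicted):
    """Root mean square error between two equal-length sequences."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)

def pearson_r(observed, predicted):
    """Pearson correlation coefficient R between observations and predictions."""
    n = len(observed)
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    spread_o = math.sqrt(sum((o - mean_o) ** 2 for o in observed))
    spread_p = math.sqrt(sum((p - mean_p) ** 2 for p in predicted))
    return cov / (spread_o * spread_p)

def relative_improvement(before, after):
    """Percentage change of a score relative to its baseline (assumed definition)."""
    return 100.0 * (after - before) / before

# Hypothetical monthly runoff values, not taken from the study.
observed = [10.0, 12.0, 9.0, 15.0, 11.0]
standalone = [11.0, 11.0, 11.0, 12.0, 11.0]
coupled = [10.5, 12.5, 9.5, 14.0, 11.0]

print(rmse(observed, standalone), rmse(observed, coupled))
print(relative_improvement(pearson_r(observed, standalone),
                           pearson_r(observed, coupled)))
```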


Long Short-Term Memory를 활용한 건화물운임지수 예측 (Prediction of Baltic Dry Index by Applications of Long Short-Term Memory)

  • 한민수;유성진
    • 품질경영학회지 / Vol. 47, No. 3 / pp.497-508 / 2019
  • Purpose: This study aims to overcome the limitations of conventional approaches to predicting the Baltic Dry Index (BDI). It proposes applying Long Short-Term Memory (LSTM), a type of Artificial Neural Network (ANN), to BDI prediction. Methods: The BDI time series was predicted from eight variables related to the dry bulk market. The prediction was conducted in two steps. The first step assessed the goodness of fit of specific ANN models to the BDI time series and determined the network structures to be used in the next step. The second, empirical prediction step relied on the ANNs' generalization capability: it used the structures determined in the first step and applied the sliding-window method to make daily (one-day-ahead) predictions. Results: In the empirical prediction step, it was possible to predict the variable y (the BDI time series) at time t from the eight variables x (related to the dry bulk market) at time (t-1). LSTM, known to be good at learning over long time spans, performed best, with higher predictive accuracy than the Multi-Layer Perceptron (MLP) and the Recurrent Neural Network (RNN). Conclusion: Applying this study to real business would require long-term predictions made with more detailed forecasting techniques. The authors hope that the research can provide a point of reference for the dry bulk market and, more broadly, for decision-making and investment in the shipping business as a whole.
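
The one-day-ahead setup described in this abstract, where y at time t is predicted from x at time (t-1) with a sliding window, can be sketched as below. This shows only the data arrangement, not the authors' LSTM. The window length, function names, and toy numbers are assumptions for illustration.

```python
def one_step_pairs(features, target):
    """Pair the feature vector at time t-1 with the target value at time t."""
    return [(features[t - 1], target[t]) for t in range(1, len(target))]

def sliding_windows(pairs, train_size):
    """Yield (training pairs, test pair) using a fixed-length window
    that moves forward one step at a time."""
    for end in range(train_size, len(pairs)):
        yield pairs[end - train_size:end], pairs[end]

# Toy data: one feature per day instead of the eight market variables.
features = [[1], [2], [3], [4], [5]]
target = [10, 20, 30, 40, 50]

pairs = one_step_pairs(features, target)
windows = list(sliding_windows(pairs, train_size=2))
for train, test in windows:
    print("train:", train, "-> predict:", test)
```

A model would be refit (or updated) on each `train` slice and scored on the single `test` pair that follows it.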

증권신고서의 TF-IDF 텍스트 분석과 기계학습을 이용한 공모주의 상장 이후 주가 등락 예측 (The prediction of the stock price movement after IPO using machine learning and text analysis based on TF-IDF)

  • 양수연;이채록;원종관;홍태호
    • 지능정보연구 / Vol. 28, No. 2 / pp.237-262 / 2022
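  • To support the investment decisions of individual investors, this study presents a model that predicts whether the price of a newly listed stock rises or falls five trading days after listing, using TF-IDF text analysis of the securities registration statement together with machine learning. The sample consists of 691 Korean IPOs newly listed between June 2009 and December 2020. Predictions were based on a range of financial and non-financial IPO-related variables concerning the firm, the offering, and the market, together with the tone of the registration statement. To analyze tone, text analysis based on TF-IDF (Term Frequency - Inverse Document Frequency) was used to classify the text of the statement's investment risk factors section as positive, neutral, or negative in tone. Logistic Regression, Random Forest, Support Vector Machine, and Artificial Neural Network techniques were used to predict price movement. Models that used both the IPO-related variables and the tone variable achieved higher prediction accuracy than models that used the IPO-related variables alone. The accuracy of the Random Forest model rose by 1.45%p, and the accuracy of the Artificial Neural Network and Support Vector Machine models rose by 4.34%p and 5.07%p, respectively. In addition, McNemar's test showed that the difference in performance between models with and without the tone variable was statistically significant at the 1% level. These results confirm that the tone expressed in the registration statement is a factor in predicting the price movement of IPO stocks.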
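
As a rough illustration of the TF-IDF weighting mentioned in this abstract, the sketch below uses one common textbook variant (term frequency as count divided by document length, inverse document frequency as ln(N / df)). The abstract does not state which variant or tokenizer the authors used, so both the formula choice and the toy tokens are assumptions.

```python
import math
from collections import Counter

def tf_idf(documents):
    """Return one {term: weight} dict per tokenized document.

    tf = count / document length, idf = ln(N / document frequency).
    """
    n_docs = len(documents)
    doc_freq = Counter()
    for doc in documents:
        doc_freq.update(set(doc))  # count each term once per document
    weights = []
    for doc in documents:
        counts = Counter(doc)
        weights.append({
            term: (count / len(doc)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights

# Toy "risk factor" snippets, already tokenized.
docs = [["risk", "loss", "risk"], ["growth", "risk"], ["growth", "profit"]]
w = tf_idf(docs)
print(w[0])
```

Terms that occur in every document receive a weight of zero under this variant, which is why rarer words such as "loss" can outweigh a more frequent but widespread word such as "risk".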

DIVERGENT SELECTION FOR POSTWEANING FEED CONVERSION IN ANGUS BEEF CATTLE V. PREDICTION OF FEED CONVERSION USING WEIGHTS AND LINEAR BODY MEASUREMENTS

  • Park, N.H.;Bishop, M.D.;Davis, M.E.
    • Asian-Australasian Journal of Animal Sciences / Vol. 7, No. 3 / pp.441-448 / 1994
  • Postweaning performance data were obtained on 187 group-fed purebred Angus calves from 12 selected sires (six high and six low feed conversion sires) in 1985 and 1986. The objective of this portion of the study was to develop prediction equations for feed conversion from a stepwise regression analysis. Variables measured were on-test weight (ONTSTWT), on-test age (ONTSTAG), five weights by 28-d periods, seven linear body measurements: heart girth (HG), hip height (HH), head width (HDW), head length (HDL), muzzle circumference (MC), length between hooks and pins (HOPIN) and length between shoulder and hooks (SHHO), and backfat thickness (BF). Stepwise regressions for maintenance-adjusted feed conversion (ADJFC) and unadjusted feed conversion (UNADFC) over the first 140 d of the test, and total feed conversion (FC) until progeny reached 8.89 mm of backfat, were obtained separately by feed conversion groups and sexes and for combined feed conversion groups and sexes. In general, weights were more important than linear body measurements in prediction of feed utilization. To some extent this was expected, as weight is related directly to gain, which is a component of feed conversion. Weight at 112 d was the most important variable in prediction of feed conversion when data from both feed conversion groups and sexes were combined. Weights at 84 and 140 d were important variables in prediction of UNADFC and FC, respectively, of bulls. ONTSTWT and weight at 140 d had the highest standardized partial regression coefficients for UNADFC and ADJFC, respectively, of heifers. Results indicated that linear measurements, such as MC, HDL and HOPIN, are useful in prediction of feed conversion when feed intakes are unavailable.

경제적 투자효과의 예측 정확도 향상을 위한 실질할인율 분석 (Analysis on Real Discount Rate for Prediction Accuracy Improvement of Economic Investment Effect)

  • 이치주;이을범
    • 한국건설관리학회논문집 / Vol. 16, No. 1 / pp.101-109 / 2015
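  • The economic effects expected from an investment are converted to present value by dividing each year's effect by a power of the real discount rate. The real discount rate therefore influences the result of an economic analysis more than other factors do. The conventional way to predict the real discount rate is to apply its average over a certain past period. This study proposes a method for improving the prediction accuracy of the real discount rate. First, economic variables that affect the corporate loan interest rate and the consumer price index, the two components of the real discount rate, were identified. The call rate and the exchange rate were selected as variables affecting the corporate loan interest rate, and the producer price index was selected as the variable affecting the consumer price index. Next, the relationship between the real discount rate and the selected variables was tested, and a relationship was found to exist. Finally, the real discount rate from 2008 to 2010 was predicted from the related economic variables. The accuracy of the prediction was compared with the results obtained from observed values and from the average. The real discount rate based on observed values was -1.58%, the predicted value was -0.22%, and the average-based value was 6.06%. Although the proposed method does not account for special circumstances such as a financial crisis, its prediction accuracy was found to be far better than that of the average-based method.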
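
The discounting step described in this abstract can be sketched as follows. The abstract says only that the real discount rate is built from the corporate loan interest rate and the consumer price index; the Fisher-type relation used in `real_discount_rate` and the end-of-year cash-flow timing are assumptions, and the benefit stream is hypothetical.

```python
def real_discount_rate(nominal_rate, inflation_rate):
    """Real rate from a nominal rate and an inflation rate (Fisher relation, assumed)."""
    return (1.0 + nominal_rate) / (1.0 + inflation_rate) - 1.0

def present_value(annual_benefits, real_rate):
    """Discount benefits received at the end of years 1..n to present value."""
    return sum(benefit / (1.0 + real_rate) ** year
               for year, benefit in enumerate(annual_benefits, start=1))

# Hypothetical benefit of 100 per year for three years, discounted with the
# three rates reported in the abstract (observed, predicted, average-based).
benefits = [100.0, 100.0, 100.0]
for label, rate in [("observed", -0.0158), ("predicted", -0.0022), ("average", 0.0606)]:
    print(label, round(present_value(benefits, rate), 2))
```

Because the rate enters as a power in the denominator, the gap between the average-based rate and the observed rate compounds over the analysis period.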

입력자료 군집화에 따른 앙상블 머신러닝 모형의 수질예측 특성 연구 (The Effect of Input Variables Clustering on the Characteristics of Ensemble Machine Learning Model for Water Quality Prediction)

  • 박정수
    • 한국물환경학회지 / Vol. 37, No. 5 / pp.335-343 / 2021
  • Water quality prediction is essential for the proper management of water supply systems. Increased suspended sediment concentration (SSC) affects water supply systems in various ways, such as raising treatment costs, and consequently there have been various efforts to develop models for predicting SSC. However, SSC is affected by both the natural and the anthropogenic environment, which makes it challenging to predict. Recently, advanced machine learning models have increasingly been used for water quality prediction. This study developed an ensemble machine learning model to predict SSC using the XGBoost (XGB) algorithm. The observed discharge (Q) and SSC at two field monitoring stations were used to develop the model. The input variables were clustered into two groups with low and high ranges of Q using the k-means clustering algorithm. Each group of data was then used separately to optimize XGB (Model 1). The model performance was compared with that of the XGB model trained on the entire data set (Model 2). The models were evaluated by the root mean squared error-observation standard deviation ratio (RSR) and the root mean squared error. The RSR values for Model 2 were 0.51 and 0.57 at the two monitoring stations, respectively, while Model 1 improved them to 0.46 and 0.55, respectively.
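
RSR is commonly defined as the RMSE divided by the standard deviation of the observations, so lower is better and a model that always predicts the observed mean scores 1. A minimal sketch under that definition follows; the abstract does not spell out the formula the author used, and the SSC values are hypothetical.

```python
import math

def rsr(observed, predicted):
    """RMSE divided by the standard deviation of the observations."""
    mean_obs = sum(observed) / len(observed)
    squared_error = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    squared_spread = sum((o - mean_obs) ** 2 for o in observed)
    # The 1/n factors of RMSE and the standard deviation cancel.
    return math.sqrt(squared_error) / math.sqrt(squared_spread)

# Hypothetical SSC observations and predictions.
observed = [12.0, 30.0, 18.0, 55.0, 25.0]
predicted = [15.0, 27.0, 20.0, 48.0, 26.0]
print(round(rsr(observed, predicted), 3))
```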

DataPave 프로그램을 이용한 포장파손예측모델개발 (Development of Pavement Distress Prediction Models Using DataPave Program)

  • 진명섭;윤석준
    • 한국도로학회논문집 / Vol. 4, No. 2 / pp.9-18 / 2002
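  • The main distresses that affect pavement performance are rutting, fatigue cracking, and longitudinal roughness. Analyzing the factors that affect these three distresses and developing prediction models for them is therefore important for managing pavement performance. In this paper, the DataPave program, which was developed in the United States and holds extensive data on a wide variety of pavement sections, was used to extract the three distress quantities and the factors affecting each of them, and distress prediction models were then developed. A sensitivity analysis was performed to determine how the input variables of each model affect the corresponding distress quantity. For the rutting prediction model, the sensitivity analysis identified asphalt content, air voids, and the optimum moisture content of the subgrade as the main influencing factors. For the fatigue cracking prediction model, the most influential factors were, in order, asphalt viscosity, asphalt content, and air voids. For the longitudinal roughness prediction model, they were, in order, asphalt viscosity, the percentage of subgrade aggregate passing the No. 200 sieve, and asphalt content.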
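
The abstract does not say which sensitivity analysis procedure was used. One simple possibility is a one-at-a-time analysis, in which each input is perturbed while the others stay at a baseline. The sketch below assumes that approach; `toy_rutting_model`, its coefficients, and the baseline values are invented purely for illustration and are not the paper's model.

```python
def one_at_a_time_sensitivity(model, baseline, change=0.10):
    """Relative change in model output when each input is raised by `change`
    (as a fraction) while all other inputs stay at their baseline values."""
    base_output = model(baseline)
    result = {}
    for name, value in baseline.items():
        perturbed = dict(baseline)
        perturbed[name] = value * (1.0 + change)
        result[name] = (model(perturbed) - base_output) / base_output
    return result

def toy_rutting_model(x):
    """Hypothetical linear model; coefficients are made up."""
    return (2.0 * x["asphalt_content"]
            + 0.5 * x["air_voids"]
            + 0.1 * x["subgrade_moisture"])

baseline = {"asphalt_content": 5.0, "air_voids": 4.0, "subgrade_moisture": 10.0}
sens = one_at_a_time_sensitivity(toy_rutting_model, baseline)
ranking = sorted(sens, key=lambda name: abs(sens[name]), reverse=True)
print(ranking)
```

Sorting by the absolute relative change gives an ordering of influencing factors of the kind the abstract reports for each distress model.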


Average Mean Square Error of Prediction for a Multiple Functional Relationship Model

  • Yum, Bong-Jin
    • Journal of the Korean Statistical Society / Vol. 13, No. 2 / pp.107-113 / 1984
  • In a linear regression model, the independent variables are frequently subject to measurement errors. For this case, the problem of estimating unknown parameters has been extensively discussed in the literature, while very few studies have been concerned with the effect of measurement errors on prediction. This paper investigates the behavior of the predicted values of the dependent variable in terms of the average mean square error of prediction (AMSEP). AMSEP may be used as a criterion for selecting an appropriate estimation method, for designing an estimation experiment, and for developing cost-effective future sampling schemes.


혈압 판별 분석 -위험요인을 중심으로- (The Discriminant Analysis of Blood Pressure - Including the Risk Factors -)

  • 오현수;서화숙
    • 대한간호학회지 / Vol. 28, No. 2 / pp.256-269 / 1998
  • The purpose of this study was to evaluate how useful variables known to be related to blood pressure are for discriminating between hypertensive and normotensive groups. The variables were obesity, serum lipids, life style-related variables such as smoking, alcohol, exercise, and stress, and demographic variables such as age, economic status, and education. The data were collected from 400 male clients who visited a university hospital in Incheon, Republic of Korea, from May 1996 to December 1996 for a regular physical examination. The variables that were significant for discriminating systolic blood pressure were age, serum lipids, education, HDL, exercise, total cholesterol, body fat percent, alcohol, stress, and smoking (in order of significance). Using the combination of these variables, the rate of correct prediction was 2% for the high-systolic pressure group and 70.3% for the normal-systolic pressure group, and the total Hit Ratio was 70%. The variables that were significant for discriminating diastolic blood pressure were exercise, triglyceride, alcohol, smoking, economic status, age, and BMI (in order of significance). Using the combination of these variables, the rate of correct prediction was 71.2% for the high-diastolic pressure group and 71.3% for the normal-diastolic pressure group, and the total Hit Ratio was 71.3%. Multiple regression analysis was performed to examine the association of systolic blood pressure with life style-related variables after adjustment for obesity, serum lipids, and demographic variables. First, the effect of the demographic variables alone on systolic blood pressure was statistically significant (p=.000), and the adjusted $R^2$ was 0.09. Adding obesity to the demographic variables raised the adjusted $R^2$ to 0.11 (p=.000); therefore, the contribution rate of obesity to systolic blood pressure was 2.0%. In the next step, adding serum lipids to obesity and the demographic variables raised the adjusted $R^2$ to 0.12 (p=.000); therefore, the contribution rate of serum lipids to systolic blood pressure was 1.0%. Finally, adding the life style-related variables to all other variables raised the adjusted $R^2$ to 0.18 (p=.000); therefore, the contribution rate of the life style-related variables to systolic blood pressure after adjustment for obesity, serum lipids, and demographic variables was 6.0%. Multiple regression analysis was also performed to examine the association of diastolic blood pressure with life style-related variables after adjustment for obesity, serum lipids, and demographic variables. First, the effect of the demographic variables alone on diastolic blood pressure was statistically significant (p=.01), and the adjusted $R^2$ was 0.03. Adding obesity to the demographic variables raised the adjusted $R^2$ to 0.06 (p=.000); therefore, the contribution rate of obesity to diastolic blood pressure was 3.0%. In the next step, adding serum lipids to obesity and the demographic variables raised the adjusted $R^2$ to 0.09 (p=.000); therefore, the contribution rate of serum lipids to diastolic blood pressure was 3.0%. Finally, adding the life style-related variables to all other variables raised the adjusted $R^2$ to 0.12 (p=.000); therefore, the contribution rate of the life style-related variables to diastolic blood pressure after adjustment for obesity, serum lipids, and demographic variables was 3.0%.
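
The contribution rates reported in this abstract are the step-by-step increases in adjusted $R^2$ as each block of variables enters the regression. The arithmetic can be checked with a short sketch; the function name and step labels are illustrative, while the adjusted $R^2$ values are those quoted in the abstract.

```python
def contribution_rates(adjusted_r2_by_step):
    """Increase in adjusted R^2 at each step, in percentage points."""
    rates = []
    previous = None
    for label, r2 in adjusted_r2_by_step:
        if previous is not None:
            rates.append((label, round((r2 - previous) * 100.0, 1)))
        previous = r2
    return rates

systolic = [("demographic", 0.09), ("+ obesity", 0.11),
            ("+ serum lipids", 0.12), ("+ life style", 0.18)]
diastolic = [("demographic", 0.03), ("+ obesity", 0.06),
             ("+ serum lipids", 0.09), ("+ life style", 0.12)]

print(contribution_rates(systolic))
print(contribution_rates(diastolic))
```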


격납건물 종합누설률 예측방법 평가 (Evaluation of Prediction Methods for Containment Integrated Leakage Rate)

  • 양승옥;이광대;오응세
    • 대한전기학회:학술대회논문집 / 대한전기학회 2004년도 학술대회 논문집 정보 및 제어부문 / pp.562-564 / 2004
  • The containment leakage rate test performed at nuclear power plants consists of the following phases: pressurizing the containment, stabilizing the atmosphere, conducting a Type A test, conducting a verification test, and depressurizing the containment. More than 48 hours pass between pressurization and depressurization, so predicting the results helps in preparing for the next test phase. In this paper, prediction methods for the leakage rate based on the least squares method are evaluated with respect to the input variables and the measurement period.
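
The abstract does not describe the specific least squares formulations that were compared. As a minimal sketch of the general idea, the code below fits a straight line to a measured quantity over time by ordinary least squares and extrapolates it to a later time. The linear trend, the function names, and the sample readings are all assumptions.

```python
def fit_line(times, values):
    """Ordinary least squares fit of values = slope * time + intercept."""
    n = len(times)
    mean_t = sum(times) / n
    mean_v = sum(values) / n
    sxx = sum((t - mean_t) ** 2 for t in times)
    sxy = sum((t - mean_t) * (v - mean_v) for t, v in zip(times, values))
    slope = sxy / sxx
    intercept = mean_v - slope * mean_t
    return slope, intercept

def predict(times, values, future_time):
    """Extrapolate the fitted line to a later time."""
    slope, intercept = fit_line(times, values)
    return slope * future_time + intercept

# Hypothetical hourly readings of a quantity that declines as air leaks out.
hours = [0.0, 1.0, 2.0, 3.0]
readings = [10.0, 8.0, 6.0, 4.0]
print(fit_line(hours, readings))
print(predict(hours, readings, 5.0))
```

Refitting as new measurements arrive shows how the estimate depends on the measurement period, which is one of the two factors the paper evaluates.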
