• Title/Summary/Keyword: Prediction of variables

Prediction of Baltic Dry Index by Applications of Long Short-Term Memory (Korean title: Prediction of the Dry Bulk Freight Index Using Long Short-Term Memory)

  • HAN, Minsoo;YU, Song-Jin
    • Journal of Korean Society for Quality Management / v.47 no.3 / pp.497-508 / 2019
  • Purpose: The purpose of this study is to overcome the limitations of conventional studies that predict the Baltic Dry Index (BDI). The study proposes applying an Artificial Neural Network (ANN) model, Long Short-Term Memory (LSTM), to predict the BDI. Methods: The BDI time series was predicted using eight variables related to the dry bulk market. The prediction was conducted in two steps. First, the goodness of fit of specific ANN models to the BDI time series was assessed, and the network structures to be used in the next step were determined. Second, relying on the generalization capability of ANNs, the structures determined in the first step were used in the empirical prediction step, and the sliding-window method was applied to make daily (one-day-ahead) predictions. Results: In the empirical prediction step, the variable y (the BDI time series) at time t could be predicted from the eight dry-bulk-market variables x at time t-1. LSTM, known for its ability to learn over long time spans, showed the best performance, with higher predictive accuracy than the Multi-Layer Perceptron (MLP) and the Recurrent Neural Network (RNN). Conclusion: Applying this study to real business would require long-term predictions using more detailed forecasting techniques. The authors hope that this research can serve as a point of reference for the dry bulk market and, more broadly, for future decision-making and investment in the shipping business as a whole.
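
The Methods describe a one-day-ahead sliding-window scheme in which the eight market variables at day t-1 predict the BDI at day t. The minimal Python sketch below illustrates only that windowing logic. To keep it self-contained, it fits ordinary least squares in each window as a stand-in for the LSTM. The function name `sliding_window_forecast`, the window length, and the simulated data are illustrative assumptions, not the authors' code.

```python
import numpy as np


def sliding_window_forecast(x, y, window):
    """One-day-ahead walk-forward forecasts of y[t] from x[t-1].

    x: array of shape (T, k) holding the market variables; y: array of shape (T,).
    Each step refits on the most recent `window` (x[t-1], y[t]) pairs and
    predicts the next day. Ordinary least squares stands in for the LSTM here.
    """
    x_lag, y_next = x[:-1], y[1:]  # pair x at t-1 with y at t
    preds = []
    for end in range(window, len(y_next)):
        a_train = np.column_stack([np.ones(window), x_lag[end - window:end]])
        coef, *_ = np.linalg.lstsq(a_train, y_next[end - window:end], rcond=None)
        preds.append(np.concatenate([[1.0], x_lag[end]]) @ coef)
    return np.array(preds), y_next[window:]


# Hypothetical data: 60 days, 8 variables, target linear in the lagged inputs.
rng = np.random.default_rng(0)
x = rng.normal(size=(60, 8))
y = np.zeros(60)
y[1:] = 2.0 + x[:-1] @ np.arange(1.0, 9.0)
preds, actual = sliding_window_forecast(x, y, window=30)
print(np.sqrt(np.mean((preds - actual) ** 2)))  # RMSE of the walk-forward forecasts
```

Swapping the least-squares fit for a recurrent model changes only the two lines inside the loop. The windowing and the t-1 to t alignment stay the same.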

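The prediction of the stock price movement after IPO using machine learning and text analysis based on TF-IDF (Korean title: Predicting the Post-Listing Stock Price Movements of IPO Shares Using TF-IDF Text Analysis of Securities Registration Statements and Machine Learning)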

  • Yang, Suyeon;Lee, Chaerok;Won, Jonggwan;Hong, Taeho
    • Journal of Intelligence and Information Systems / v.28 no.2 / pp.237-262 / 2022
  • There has been growing interest in IPOs (Initial Public Offerings) because of the profitable returns that IPO stocks can offer investors. However, IPOs can also be speculative investments involving substantial risk, because the shares tend to be volatile and the supply of IPO shares is often highly limited. Therefore, it is crucially important that IPO investors be well informed about the issuing firms and the market before deciding whether to invest. Unlike institutional investors, individual investors are at a disadvantage because they have few opportunities to obtain information on IPOs. In this regard, the purpose of this study is to provide individual investors with information they may consider when making an IPO investment decision. This study presents a model that uses machine learning and text analysis to predict whether an IPO stock price will move up or down after the first 5 trading days. Our sample includes 691 Korean IPOs from June 2009 to December 2020. The input variables for the prediction are three tone variables created from IPO prospectuses and quantitative variables that are firm-specific, issue-specific, or market-specific. The three prospectus tone variables indicate the percentages of positive, neutral, and negative sentences in a prospectus, respectively. Only the sentences in the Risk Factors section of a prospectus were considered for the tone analysis. All sentences were classified as 'positive', 'neutral', or 'negative' via text analysis using TF-IDF (Term Frequency-Inverse Document Frequency). The tone of each sentence was measured by machine learning instead of a lexicon-based approach, because sentiment dictionaries suitable for analyzing Korean financial text are lacking.
For this reason, the training set was created by randomly selecting 10% of the sentences from each prospectus, and each training sentence was labeled manually after being read in person. Then, a Support Vector Machine model trained on this set was used to predict the tone of the sentences in the test set. Finally, the percentages of positive, neutral, and negative sentences in each prospectus were calculated from the model's classifications. To predict the price movement of an IPO stock, four machine learning techniques were applied: Logistic Regression, Random Forest, Support Vector Machine, and Artificial Neural Network. According to the results, models that combine quantitative variables based on technical analysis with the prospectus tone variables show higher accuracy than models that use only quantitative variables. More specifically, prediction accuracy improved by 1.45 percentage points in the Random Forest model, 4.34 percentage points in the Artificial Neural Network model, and 5.07 percentage points in the Support Vector Machine model. Among the techniques tested, the Artificial Neural Network model using both quantitative variables and prospectus tone variables achieved the highest prediction accuracy, 61.59%. The results indicate that the tone of a prospectus is a significant factor in predicting the price movement of an IPO stock. In addition, the McNemar test was used to check whether the differences between models were statistically significant. Comparing the model using only quantitative variables with the model using both quantitative and prospectus tone variables confirmed that predictive performance improved significantly at the 1% significance level.
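
The tone pipeline can be sketched in plain Python: TF-IDF features, a classifier trained on hand-labeled sentences, and then tone percentages per prospectus. This is an illustration under stated assumptions. Sentences arrive already tokenized (Korean text would first need a morphological tokenizer), and IDF is computed over all sentences. To avoid external dependencies, a nearest-centroid cosine classifier replaces the paper's Support Vector Machine. All function names and example tokens are hypothetical.

```python
import math
from collections import Counter


def _l2_normalize(vec):
    norm = math.sqrt(sum(v * v for v in vec.values())) or 1.0
    return {t: v / norm for t, v in vec.items()}


def tfidf_vectors(docs):
    """One L2-normalized sparse TF-IDF dict per tokenized sentence."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    idf = {t: math.log(n / df[t]) + 1.0 for t in df}  # +1 keeps common terms nonzero
    vecs = []
    for doc in docs:
        tf = Counter(doc)
        raw = {t: (c / len(doc)) * idf[t] for t, c in tf.items()}  # empty doc -> {}
        vecs.append(_l2_normalize(raw))
    return vecs


def class_centroids(vecs, labels):
    """Unit-length mean vector of each tone class (stand-in for an SVM)."""
    sums, counts = {}, Counter(labels)
    for vec, lab in zip(vecs, labels):
        acc = sums.setdefault(lab, {})
        for t, v in vec.items():
            acc[t] = acc.get(t, 0.0) + v
    return {lab: _l2_normalize({t: v / counts[lab] for t, v in acc.items()})
            for lab, acc in sums.items()}


def cosine(a, b):
    # both inputs are unit length, so the dot product is the cosine similarity
    return sum(v * b.get(t, 0.0) for t, v in a.items())


def classify(vec, centroids):
    return max(centroids, key=lambda lab: cosine(vec, centroids[lab]))


def tone_percentages(labels):
    counts = Counter(labels)
    return {tone: 100.0 * counts[tone] / len(labels)
            for tone in ("positive", "neutral", "negative")}


# Hypothetical tokenized sentences from one prospectus's Risk Factors section.
train = [["risk", "loss", "decline"], ["growth", "profit", "increase"],
         ["company", "plans", "listing"]]
train_labels = ["negative", "positive", "neutral"]  # hand-labeled
test = [["loss", "decline", "risk", "market"], ["profit", "growth"]]

vecs = tfidf_vectors(train + test)
cents = class_centroids(vecs[:len(train)], train_labels)
predicted = [classify(v, cents) for v in vecs[len(train):]]
shares = tone_percentages(predicted)
print(predicted, shares)
```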

DIVERGENT SELECTION FOR POSTWEANING FEED CONVERSION IN ANGUS BEEF CATTLE V. PREDICTION OF FEED CONVERSION USING WEIGHTS AND LINEAR BODY MEASUREMENTS

  • Park, N.H.;Bishop, M.D.;Davis, M.E.
    • Asian-Australasian Journal of Animal Sciences / v.7 no.3 / pp.441-448 / 1994
  • Postweaning performance data were obtained on 187 group-fed purebred Angus calves from 12 selected sires (six high and six low feed conversion sires) in 1985 and 1986. The objective of this portion of the study was to develop prediction equations for feed conversion from a stepwise regression analysis. Variables measured were on-test weight (ONTSTWT), on-test age (ONTSTAG), five weights taken at 28-d intervals, seven linear body measurements (heart girth (HG), hip height (HH), head width (HDW), head length (HDL), muzzle circumference (MC), length between hooks and pins (HOPIN), and length between shoulder and hooks (SHHO)), and backfat thickness (BF). Stepwise regressions for maintenance-adjusted feed conversion (ADJFC) and unadjusted feed conversion (UNADFC) over the first 140 d of the test, and for total feed conversion (FC) until progeny reached 8.89 mm of backfat, were obtained separately for each feed conversion group and sex, and for the feed conversion groups and sexes combined. In general, weights were more important than linear body measurements in predicting feed utilization. This was expected to some extent, because weight is directly related to gain, which is a component of feed conversion. Weight at 112 d was the most important variable for predicting feed conversion when data from both feed conversion groups and sexes were combined. Weights at 84 and 140 d were important variables for predicting UNADFC and FC, respectively, of bulls. ONTSTWT and weight at 140 d had the highest standardized partial regression coefficients for UNADFC and ADJFC, respectively, of heifers. The results indicated that linear measurements such as MC, HDL, and HOPIN are useful for predicting feed conversion when feed intakes are unavailable.
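
The stepwise approach can be illustrated with a greedy forward-selection sketch in Python with NumPy. The entry rule here is an assumed simplification of the F-to-enter tests that stepwise programs typically use: add the predictor that raises R² most, and stop when the gain drops below `min_gain`. The variable names in the usage are hypothetical labels on simulated data, not the study's measurements. The last function computes standardized partial regression coefficients, the quantity the abstract uses to rank ONTSTWT and the 140-d weight.

```python
import numpy as np


def r_squared(X, y):
    """R^2 of an OLS fit of y on X (with intercept)."""
    a = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(a, y, rcond=None)
    resid = y - a @ coef
    return 1.0 - resid @ resid / np.sum((y - y.mean()) ** 2)


def forward_stepwise(X, y, names, min_gain=0.01):
    """Greedy forward selection: repeatedly add the predictor that raises R^2
    the most; stop when the best gain is below min_gain (assumed rule)."""
    chosen, best_r2 = [], 0.0
    remaining = list(range(X.shape[1]))
    while remaining:
        scores = {j: r_squared(X[:, chosen + [j]], y) for j in remaining}
        j_best = max(scores, key=scores.get)
        if scores[j_best] - best_r2 < min_gain:
            break
        chosen.append(j_best)
        remaining.remove(j_best)
        best_r2 = scores[j_best]
    return [names[j] for j in chosen], best_r2


def standardized_coefficients(X, y):
    """Partial regression coefficients after z-scoring X and y
    (no intercept needed because everything is centered)."""
    z = (X - X.mean(axis=0)) / X.std(axis=0)
    zy = (y - y.mean()) / y.std()
    coef, *_ = np.linalg.lstsq(z, zy, rcond=None)
    return coef


# Simulated data with hypothetical column labels.
rng = np.random.default_rng(1)
X = rng.normal(size=(187, 4))
y = 3.0 * X[:, 1] + 1.0 * X[:, 3] + rng.normal(scale=0.5, size=187)
names = ["WT84", "WT112", "HG", "HH"]
selected, r2 = forward_stepwise(X, y, names)
print(selected, r2)
```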

Analysis on Real Discount Rate for Prediction Accuracy Improvement of Economic Investment Effect (Korean title: Analysis of the Real Discount Rate for Improving the Prediction Accuracy of Economic Investment Effects)

  • Lee, Chijoo;Lee, Eul-Bum
    • Korean Journal of Construction Engineering and Management / v.16 no.1 / pp.101-109 / 2015
  • To convert the expected annual economic effect of an investment to present value, it is divided by (1 + real discount rate) raised to the power of the number of years. Thus, the real discount rate has a larger impact on economic analysis than other factors. The conventional method of predicting the real discount rate is to apply the average over a certain past period. This study proposed a method for predicting the real discount rate more accurately. First, the economic variables affecting the business loan interest rate and the consumer price index, the two components of the real discount rate, were identified. The call rate and the exchange rate were selected as the variables affecting the business loan interest rate, and the producer price index was selected as the variable affecting the consumer price index. Next, the relationships between the real discount rate and the selected variables were analyzed and found to be significant. Lastly, the real discount rate for 2008 to 2010 was predicted from the related economic variables, and the accuracy of the prediction was compared against the actual data and the average data. The real discount rates based on actual data, predicted data, and average data were -1.58%, -0.22%, and 6.06%, respectively. Although the proposed method did not account for special conditions such as the financial crisis, its prediction accuracy was much higher than that of the average-based result.
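
The discounting step can be made concrete with a short Python sketch. The sketch assumes that the real discount rate is derived via the Fisher relation from a nominal rate (standing in for the business loan interest rate) and inflation (standing in for consumer price growth). The abstract does not give the exact formula the authors used. Because each year's effect is divided by (1 + r) raised to that year, a small change in r shifts the present value of long-lived effects considerably. Function names are hypothetical.

```python
def real_discount_rate(nominal_rate, inflation_rate):
    """Fisher relation (assumed): (1 + nominal) / (1 + inflation) - 1."""
    return (1.0 + nominal_rate) / (1.0 + inflation_rate) - 1.0


def present_value(annual_effects, rate):
    """Discount the effects of years 1..n by (1 + rate) ** year and sum them."""
    return sum(effect / (1.0 + rate) ** year
               for year, effect in enumerate(annual_effects, start=1))


def average_rate(history):
    """Conventional approach: average of the rates over a past period."""
    return sum(history) / len(history)


# Hypothetical inputs: 10 years of equal benefits under two discount rates.
r = real_discount_rate(0.06, 0.03)
print(r)
print(present_value([100.0] * 10, r), present_value([100.0] * 10, 0.06))
```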

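The Effect of Input Variables Clustering on the Characteristics of Ensemble Machine Learning Model for Water Quality Prediction (Korean title: A Study on the Water Quality Prediction Characteristics of an Ensemble Machine Learning Model According to Input Data Clustering)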

  • Park, Jungsu
    • Journal of Korean Society on Water Environment / v.37 no.5 / pp.335-343 / 2021
  • Water quality prediction is essential for the proper management of water supply systems. Increased suspended sediment concentration (SSC) affects water supply systems in various ways, such as increasing treatment costs; consequently, there have been various efforts to develop models for predicting SSC. However, SSC is affected by both the natural and the anthropogenic environment, which makes it challenging to predict. Recently, advanced machine learning models have increasingly been used for water quality prediction. This study developed an ensemble machine learning model to predict SSC using the XGBoost (XGB) algorithm. The observed discharge (Q) and SSC at two field monitoring stations were used to develop the model. The input variables were clustered into two groups with low and high ranges of Q using the k-means clustering algorithm, and each group was then used separately to optimize an XGB model (Model 1). Its performance was compared with that of an XGB model trained on the entire data set (Model 2). The models were evaluated by the root mean squared error-observation standard deviation ratio (RSR) and the root mean squared error. For Model 2, the RSR values were 0.51 and 0.57 at the two monitoring stations, respectively, while Model 1 improved them to 0.46 and 0.55, respectively.
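
Model 1 differs from Model 2 in two ways: the data are split into low- and high-discharge groups with k-means, and predictions are scored with RSR. Both ingredients can be sketched without XGBoost. The sketch assumes a one-dimensional k-means on Q with a deterministic initialization. It also assumes the usual RSR definition, RMSE divided by the standard deviation of the observations, so that predicting the observed mean gives RSR = 1. The function names are hypothetical. In a full pipeline, one regressor would be fitted per cluster label and compared against a single regressor fitted on all the data.

```python
import math


def kmeans_1d(values, k=2, iters=100):
    """Lloyd's algorithm on one variable (e.g. discharge Q); requires k >= 2."""
    srt = sorted(values)
    # deterministic init: spread the starting centers across the sorted range
    centers = [srt[int(i * (len(srt) - 1) / (k - 1))] for i in range(k)]
    labels = []
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: abs(v - centers[c])) for v in values]
        new = []
        for c in range(k):
            members = [v for v, lab in zip(values, labels) if lab == c]
            new.append(sum(members) / len(members) if members else centers[c])
        if new == centers:  # assignments are stable
            break
        centers = new
    return labels, centers


def rsr(observed, simulated):
    """RMSE-observation standard deviation ratio; 0 is a perfect fit."""
    mean_obs = sum(observed) / len(observed)
    err = math.sqrt(sum((o - s) ** 2 for o, s in zip(observed, simulated)))
    spread = math.sqrt(sum((o - mean_obs) ** 2 for o in observed))
    return err / spread


# Hypothetical discharge values forming a low-flow and a high-flow group.
labels, centers = kmeans_1d([1.0, 2.0, 3.0, 100.0, 110.0, 120.0])
print(labels, centers)
```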

Development of Pavement Distress Prediction Models Using DataPave Program (Korean title: Development of Pavement Distress Prediction Models Using the DataPave Program)

  • Jin, Myung-Sub;Yoon, Seok-Joon
    • International Journal of Highway Engineering / v.4 no.2 s.12 / pp.9-18 / 2002
  • The main distresses that influence pavement performance are rutting, fatigue cracking, and longitudinal roughness. Thus, it is important to analyze the factors that affect these three distresses and to develop prediction models for them. In this paper, three distress prediction models were developed using the DataPave program, which stores data from a wide variety of pavement sections in the United States. Sensitivity studies were also conducted to evaluate how the input variables affect the distresses. The sensitivity study for the rutting model showed that asphalt content, air voids, and the optimum moisture content of the subgrade were the major factors affecting rutting. The sensitivity study for the fatigue cracking model revealed that asphalt consistency, asphalt content, and air voids were the most influential variables. For the longitudinal roughness model, asphalt consistency, the percentage of subgrade aggregate passing the #200 sieve, and asphalt content were the factors affecting longitudinal roughness.
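
A one-at-a-time sensitivity study perturbs each input around a baseline while holding the others fixed. The Python sketch below illustrates this approach. The perturbation size (±10%), the baseline values, and the linear `toy_rut_model` are hypothetical placeholders. They are not the DataPave-based models developed in the paper, whose forms the abstract does not give.

```python
def one_at_a_time_sensitivity(model, baseline, delta=0.10):
    """Perturb each input by +/-delta (fraction of its baseline value),
    holding the others fixed, and return the relative output swing per input,
    sorted from most to least influential. Assumes model(baseline) != 0."""
    base_out = model(baseline)
    swings = {}
    for name, value in baseline.items():
        low = dict(baseline, **{name: value * (1 - delta)})
        high = dict(baseline, **{name: value * (1 + delta)})
        swings[name] = (model(high) - model(low)) / base_out
    return dict(sorted(swings.items(), key=lambda kv: abs(kv[1]), reverse=True))


# Hypothetical linear rutting model, for illustration only.
def toy_rut_model(p):
    return 2.0 * p["asphalt_content"] + 1.5 * p["air_void"] + 0.2 * p["subgrade_omc"]


baseline = {"asphalt_content": 5.0, "air_void": 7.0, "subgrade_omc": 12.0}
ranking = one_at_a_time_sensitivity(toy_rut_model, baseline)
print(ranking)
```

For nonlinear models, the ranking can depend on the chosen baseline and delta. Repeating the sweep at several baselines is a cheap robustness check.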

Average Mean Square Error of Prediction for a Multiple Functional Relationship Model

  • Yum, Bong-Jin
    • Journal of the Korean Statistical Society / v.13 no.2 / pp.107-113 / 1984
  • In a linear regression model, the independent variables are frequently subject to measurement errors. For this case, the problem of estimating the unknown parameters has been discussed extensively in the literature, while very few studies have addressed the effect of measurement errors on prediction. This paper investigates the behavior of the predicted values of the dependent variable in terms of the average mean square error of prediction (AMSEP). AMSEP may be used as a criterion for selecting an appropriate estimation method, for designing an estimation experiment, and for developing cost-effective future sampling schemes.
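
The effect of measurement error on prediction can be explored numerically. The Python sketch below is a Monte Carlo approximation under assumed settings: two regressors whose true values are fixed (as in a functional relationship model), additive normal measurement error, and OLS fitted on the error-laden regressors. It is not the analytic AMSEP expression derived in the paper, and every constant in it is hypothetical.

```python
import numpy as np


def simulated_msep(error_sd, n_fit=100, n_new=100, reps=300, seed=0):
    """Monte Carlo average squared prediction error under measurement error.

    True regressor values are drawn once and held fixed; each replication adds
    fresh measurement noise to them and fresh noise to the response, fits OLS
    on the noisy regressors, and scores predictions for new observations.
    """
    rng = np.random.default_rng(seed)
    beta = np.array([1.0, 2.0, -1.0])             # hypothetical intercept and slopes
    x_true = rng.normal(size=(n_fit + n_new, 2))  # fixed true values
    total = 0.0
    for _ in range(reps):
        x_obs = x_true + rng.normal(scale=error_sd, size=x_true.shape)
        y = beta[0] + x_true @ beta[1:] + rng.normal(scale=0.3, size=len(x_true))
        a = np.column_stack([np.ones(len(x_obs)), x_obs])
        coef, *_ = np.linalg.lstsq(a[:n_fit], y[:n_fit], rcond=None)
        total += np.mean((y[n_fit:] - a[n_fit:] @ coef) ** 2)
    return total / reps


for sd in (0.0, 0.5, 1.0):
    print(sd, simulated_msep(sd))
```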

The Discriminant Analysis of Blood Pressure - Including the Risk Factors - (Korean title: Discriminant Analysis of Blood Pressure: Focusing on Risk Factors)

  • Oh, Hyun-Soo;Seo, Hwa-Sook
    • Journal of Korean Academy of Nursing / v.28 no.2 / pp.256-269 / 1998
  • The purpose of this study was to evaluate the usefulness of variables known to be related to blood pressure for discriminating between hypertensive and normotensive groups. The variables were obesity, serum lipids, lifestyle-related variables such as smoking, alcohol, exercise, and stress, and demographic variables such as age, economic status, and education. The data were collected from 400 male clients who visited a university hospital in Incheon, Republic of Korea, for a regular physical examination from May 1996 to December 1996. The variables that were significant for discriminating systolic blood pressure in this study were age, serum lipids, education, HDL, exercise, total cholesterol, body fat percentage, alcohol, stress, and smoking (in order of significance). Using the combination of these variables, the correct prediction rate was 2% for the high-systolic-pressure group and 70.3% for the normal-systolic-pressure group, and the total hit ratio was 70%. The variables that were significant for discriminating diastolic blood pressure were exercise, triglyceride, alcohol, smoking, economic status, age, and BMI (in order of significance). Using the combination of these variables, the correct prediction rate was 71.2% for the high-diastolic-pressure group and 71.3% for the normal-diastolic-pressure group, and the total hit ratio was 71.3%. Multiple regression analysis was performed to examine the association of systolic blood pressure with lifestyle-related variables after adjustment for obesity, serum lipids, and demographic variables. First, the effect of the demographic variables alone on systolic blood pressure was statistically significant (p=.000), with an adjusted $R^2$ of 0.09. Adding obesity to the demographic variables raised the adjusted $R^2$ to 0.11 (p=.000); therefore, the contribution of obesity to systolic blood pressure was 2.0%.
Next, adding serum lipids to the obesity and demographic variables raised the adjusted $R^2$ to 0.12 (p=.000); therefore, the contribution of serum lipids to systolic pressure was 1.0%. Finally, adding the lifestyle-related variables to all other variables raised the adjusted $R^2$ to 0.18 (p=.000); therefore, the contribution of lifestyle-related variables to systolic blood pressure after adjustment for obesity, serum lipids, and demographic variables was 6.0%. Multiple regression analysis was also performed to examine the association of diastolic blood pressure with lifestyle-related variables after adjustment for obesity, serum lipids, and demographic variables. First, the effect of the demographic variables alone on diastolic blood pressure was statistically significant (p=.01), with an adjusted $R^2$ of 0.03. Adding obesity to the demographic variables raised the adjusted $R^2$ to 0.06 (p=.000); therefore, the contribution of obesity to diastolic blood pressure was 3.0%. Next, adding serum lipids to the obesity and demographic variables raised the adjusted $R^2$ to 0.09 (p=.000); therefore, the contribution of serum lipids to diastolic pressure was 3.0%. Finally, adding the lifestyle-related variables to all other variables raised the adjusted $R^2$ to 0.12 (p=.000); therefore, the contribution of lifestyle-related variables to diastolic blood pressure after adjustment for obesity, serum lipids, and demographic variables was 3.0%.
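
The hierarchical regression described above enters predictor blocks in order and reads each block's contribution as the increase in adjusted $R^2$. A minimal NumPy sketch follows. The helper names and any simulated data are hypothetical. The two adjusted $R^2$ sequences passed to `contributions` are the values reported in the abstract, and the output reproduces the stated contribution rates.

```python
import numpy as np


def adjusted_r2(X, y):
    """OLS with intercept; adjusted R^2 = 1 - (1 - R^2)(n - 1)/(n - p - 1)."""
    n, p = X.shape
    a = np.column_stack([np.ones(n), X])
    coef, *_ = np.linalg.lstsq(a, y, rcond=None)
    resid = y - a @ coef
    r2 = 1.0 - resid @ resid / np.sum((y - y.mean()) ** 2)
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)


def hierarchical_adjusted_r2(blocks, y):
    """Enter predictor blocks cumulatively; adjusted R^2 after each step."""
    cols, steps = [], []
    for block in blocks:
        cols.append(block)
        steps.append(adjusted_r2(np.column_stack(cols), y))
    return steps


def contributions(adj_r2_steps):
    """Increase in adjusted R^2 attributable to each block after the first."""
    return [round(b - a, 2) for a, b in zip(adj_r2_steps, adj_r2_steps[1:])]


# Adjusted R^2 sequences reported in the abstract
print(contributions([0.09, 0.11, 0.12, 0.18]))  # systolic: [0.02, 0.01, 0.06]
print(contributions([0.03, 0.06, 0.09, 0.12]))  # diastolic: [0.03, 0.03, 0.03]
```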

Evaluation of Prediction Methods for Containment Integrated Leakage Rate (Korean title: Evaluation of Prediction Methods for the Containment Integrated Leakage Rate)

  • Yang, Seung-Ok;Lee, Kwang-Dae;Oh, Eung-Se
    • Proceedings of the KIEE Conference / 2004.11c / pp.562-564 / 2004
  • The containment leakage rate test performed at nuclear power plants consists of the following phases: pressurizing the containment, stabilizing the atmosphere, conducting a Type A test, conducting a verification test, and depressurizing the containment. The test takes more than 48 hours from pressurization to depressurization, so predicting the results helps in preparing for the next test phase. In this paper, least-squares-based methods for predicting the leakage rate are evaluated with respect to the input variables and the measurement period.
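
One common least-squares approach to containment leakage testing is the mass point method. It fits the containment dry-air mass W(t) with a straight line W = A + Bt and reports the leakage rate as -2400·B/A in %/day when t is in hours (100% × 24 h/day). The Python sketch below implements that fit. Whether it matches the specific methods evaluated in the paper is an assumption, and the readings are hypothetical. Refitting on progressively longer prefixes of the data is one way to study the effect of the measurement period.

```python
def mass_point_leakage_rate(hours, air_mass):
    """Least-squares fit W(t) = A + B*t; return leakage rate -2400*B/A in %/day.

    `hours` are elapsed test times in hours and `air_mass` the corresponding
    containment dry-air mass (any consistent unit). Needs >= 2 distinct times.
    """
    n = len(hours)
    t_mean = sum(hours) / n
    w_mean = sum(air_mass) / n
    sxx = sum((t - t_mean) ** 2 for t in hours)
    sxy = sum((t - t_mean) * (w - w_mean) for t, w in zip(hours, air_mass))
    slope = sxy / sxx                    # B: mass change per hour
    intercept = w_mean - slope * t_mean  # A: fitted mass at t = 0
    return -2400.0 * slope / intercept   # 100 % x 24 h/day


# Hypothetical readings every 15 minutes over 24 h, losing 1 unit per hour.
hours = [0.25 * i for i in range(97)]
mass = [1000.0 - t for t in hours]
for span in (8, 16, 24):  # effect of measurement period: fit on prefixes
    k = int(span / 0.25) + 1
    print(span, mass_point_leakage_rate(hours[:k], mass[:k]))
```

With noiseless data every span gives the same rate. With real, noisy readings, longer spans typically narrow the spread of the slope estimate.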

Analyzing Customer Management Data by Data Mining: Case Study on Churn Prediction Models for Insurance Company in Korea

  • Cho, Mee-Hye;Park, Eun-Sik
    • Journal of the Korean Data and Information Science Society / v.19 no.4 / pp.1007-1018 / 2008
  • The purpose of this case study is to demonstrate database-marketing management. First, we explore the original variables in the insurance customer data, modify them if necessary, and carry out a variable selection process before analysis. Then, we develop churn prediction models using logistic regression, neural network, and SVM analysis, and compare these three data mining models in terms of misclassification rate.
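
One of the three churn models compared in this case study, logistic regression, can be sketched with batch gradient descent and scored by misclassification rate. This NumPy sketch covers only the logistic regression arm; the neural network and SVM are omitted. The learning rate, epoch count, threshold, and simulated features are hypothetical.

```python
import numpy as np


def fit_logistic(X, y, lr=0.1, epochs=2000):
    """Batch gradient descent on the mean log-loss (no regularization)."""
    a = np.column_stack([np.ones(len(y)), X])
    w = np.zeros(a.shape[1])
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-(a @ w)))   # predicted churn probability
        w -= lr * a.T @ (p - y) / len(y)     # gradient of the mean log-loss
    return w


def predict_churn(w, X, threshold=0.5):
    a = np.column_stack([np.ones(len(X)), X])
    return (1.0 / (1.0 + np.exp(-(a @ w))) >= threshold).astype(int)


def misclassification_rate(y_true, y_pred):
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    return float(np.mean(y_true != y_pred))


rng = np.random.default_rng(0)
X = rng.normal(size=(500, 2))              # hypothetical customer features
y = (X[:, 0] - X[:, 1] > 0).astype(float)  # hypothetical churn labels
w = fit_logistic(X, y)
print(misclassification_rate(y, predict_churn(w, X)))  # in-sample error
```

In practice the rate would be computed on a held-out set, so that the three models are compared on customers they were not fitted to.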
