Search | Korea Science

The Effect of Data Size on the k-NN Predictability: Application to Samsung Electronics Stock Market Prediction (데이터 크기에 따른 k-NN의 예측력 연구: 삼성전자주가를 사례로)

Chun, Se-Hak
- Journal of Intelligence and Information Systems / v.25 no.3 / pp.239-251 / 2019
Statistical methods such as moving averages, Kalman filtering, exponential smoothing, regression analysis, and ARIMA (autoregressive integrated moving average) have been used for stock market prediction. However, these statistical methods have not produced superior performance. In recent years, machine learning techniques, including artificial neural networks, SVM, and genetic algorithms, have been widely used in stock market prediction. In particular, k-nearest neighbor (k-NN), a case-based reasoning method, is also widely used for stock price prediction. Case-based reasoning retrieves several similar cases from previous cases when a new problem occurs, and combines the class labels of the similar cases to create a classification for the new problem. However, case-based reasoning has some problems. First, it searches for a fixed number of neighbors in the observation space and always selects the same number of neighbors rather than the most similar neighbors for the target case. As a result, it may take more cases into account even when fewer cases are applicable to the subject. Second, it may select neighbors that are far away from the target case. Thus, case-based reasoning does not guarantee an optimal pseudo-neighborhood for various target cases, and predictability can be degraded by deviation from the desired similar neighbors. This paper examines how the size of the learning data affects stock price predictability with k-nearest neighbor, and compares the predictability of k-nearest neighbor with that of the random walk model according to the size of the learning data and the number of neighbors. In this study, Samsung Electronics stock prices were predicted using two learning datasets of different sizes. To predict the next day's closing price, we used four variables: opening value, daily high, daily low, and daily close.
In the first experiment, data from January 1, 2000 to December 31, 2017 were used for the learning process. In the second experiment, data from January 1, 2015 to December 31, 2017 were used for the learning process. The test data are from January 1, 2018 to August 31, 2018 for both experiments. We compared the performance of k-NN with the random walk model using the two learning datasets. With the small learning dataset, the mean absolute percentage error (MAPE) was 1.3497 for the random walk model and 1.3570 for k-NN. With the large learning dataset, the MAPE was 1.3497 for the random walk model and 1.2928 for k-NN. These results show that predictive power is higher when more learning data are used. This paper also shows that k-NN generally produces better predictive power than the random walk model for larger learning datasets, but not when the learning dataset is relatively small. Future studies need to consider macroeconomic variables related to stock price forecasting in addition to the opening, low, high, and closing prices. Also, to produce better results, it is recommended that k-nearest neighbor find its nearest neighbors using a second-step filtering method that considers fundamental economic variables, as well as a sufficient amount of learning data.
https://doi.org/10.13088/jiis.2019.25.3.239
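
The comparison described in this abstract can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: the abstract does not state the distance metric, any feature scaling, or how neighbor targets are combined, so the Euclidean distance, the unweighted mean, and the names `mape`, `knn_predict`, `compare`, and `synthetic_ohlc` are all assumptions, and the data are synthetic rather than Samsung Electronics prices.

```python
import numpy as np


def mape(actual, predicted):
    """Mean absolute percentage error, in percent."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return float(np.mean(np.abs((actual - predicted) / actual)) * 100.0)


def knn_predict(train_X, train_y, query, k):
    """Average the targets of the k training rows closest to `query`.

    Euclidean distance and an unweighted mean are assumptions of this sketch.
    """
    dists = np.linalg.norm(train_X - query, axis=1)
    nearest = np.argsort(dists)[:k]
    return float(train_y[nearest].mean())


def compare(ohlc, n_train, k=5):
    """Return (random-walk MAPE, k-NN MAPE) for next-day close prediction.

    `ohlc` has columns open, high, low, close; the first `n_train`
    feature rows form the learning data and the rest form the test data.
    """
    ohlc = np.asarray(ohlc, dtype=float)
    X = ohlc[:-1]        # day t: open, high, low, close
    y = ohlc[1:, 3]      # day t+1: close
    train_X, train_y = X[:n_train], y[:n_train]
    test_X, test_y = X[n_train:], y[n_train:]
    rw_pred = test_X[:, 3]  # random walk: tomorrow's close equals today's close
    knn_pred = np.array([knn_predict(train_X, train_y, q, k) for q in test_X])
    return mape(test_y, rw_pred), mape(test_y, knn_pred)


def synthetic_ohlc(n, seed=0):
    """Hypothetical price series used only to exercise the functions."""
    rng = np.random.default_rng(seed)
    close = 100.0 * np.exp(np.cumsum(rng.normal(0.0, 0.01, n)))
    open_ = np.roll(close, 1)
    open_[0] = close[0]
    high = np.maximum(open_, close) * 1.005
    low = np.minimum(open_, close) * 0.995
    return np.column_stack([open_, high, low, close])


if __name__ == "__main__":
    rw, knn = compare(synthetic_ohlc(600), n_train=400, k=5)
    print(f"random walk MAPE: {rw:.4f}, k-NN MAPE: {knn:.4f}")
```

Changing `n_train` while keeping the same test rows mimics varying the learning-data size. Because raw price levels drift over time, a real study would probably need some normalization before measuring distances.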

Time Series Analysis and Forecast for Labor Cost of Actual Cost Data (시계열분석을 통한 실적공사비의 노무비 분석 및 예측에 관한 연구)

Lee, Hyun-Seok;Lee, Eun-Young;Kim, Yea-Sang
- Korean Journal of Construction Engineering and Management / v.14 no.4 / pp.24-34 / 2013
Since 2004, the government has gradually introduced Actual Cost Data into cost estimation to address the problem of below-cost tendering, to reflect fair market prices through competition, and to execute contracts efficiently. However, there are many concerns that Actual Cost Data does not reflect real market prices, even though it has helped reduce the government's budget. The general construction firm's labor cost burden is passed on to specialty contractors and eventually becomes the construction workers' burden. Therefore, making Actual Cost Data realistic is a very important factor in establishing this system. To understand how realistic the data are and to make a short-term forecast, this paper selects construction work categories in which labor cost constitutes more than 95% of direct cost, compares their Actual Cost Data with the unit wages of the relevant skilled workers, and makes predictions using time series analysis. Bid prices that do not reflect market prices accelerate changes in the work environment and directly affect workers through, for example, late payment of wages and bankruptcy. Therefore, this paper is expected to serve as preliminary data for solving this problem and improving Actual Cost Data.
https://doi.org/10.6106/KJCEM.2013.14.4.024

Water Supply forecast Using Multiple ARMA Model Based on the Analysis of Water Consumption Mode with Wavelet Transform. (Wavelet Transform을 이용한 물수요량의 특성분석 및 다원 ARMA모형을 통한 물수요량예측)

Jo, Yong-Jun;Kim, Jong-Mun
- Journal of Korea Water Resources Association / v.31 no.3 / pp.317-326 / 1998
Water consumption characteristics in the northern part of Seoul were analyzed using a wavelet transform with Coiflets 5 as the base function. The long-term evolution mode detected at the $2^{12}$ scale in 1995 had the shape of a hyperbolic tangent over the entire period, owing to the development of the Sanggae residential site. Furthermore, there was a seasonal water demand related to the economic cycle, which peaked at the ends of June and December. The amount of this additional consumption was about $1,700\;\textrm{cm}^3/hr$ in June and $500\;\textrm{cm}^3/hr$ in December. The periods of the energy-containing sinusoidal components were 3.13 days, 33.33 hr, 23.98 hr, and 12 hr, and the amplitude of the 23.98 hr component was the largest. The relatively short-period components detected at scales $2^i$ ($i = 1, 2, \ldots, 12$) followed a Gaussian PDF. The most reliable predictive models were the multiple AR[32, 16, 23] and ARMA[20, 16, 10, 23] models with temperature as an input, judged by minimized prediction error, mutual independence of residuals, and the availability of reliable meteorological data. The predicted values of water supply were quite consistent with the measured data, which suggests that the predictive model developed in this study could be deployed for the optimal management of water supply facilities.
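
As a rough illustration of forecasting demand from its own history plus a temperature input, the sketch below fits an autoregressive model with one exogenous variable by ordinary least squares. It is a deliberately simpler stand-in, not the multiple AR or ARMA models of the abstract: it has no moving-average terms, it does not reproduce the orders quoted there, and the names `fit_arx` and `predict_next` as well as the synthetic data are assumptions made for illustration.

```python
import numpy as np


def fit_arx(y, x, p):
    """Least-squares fit of y[t] = c + sum_{i=1..p} a_i * y[t-i] + b * x[t].

    Returns the coefficient vector [c, a_1, ..., a_p, b].
    """
    y = np.asarray(y, dtype=float)
    x = np.asarray(x, dtype=float)
    rows = []
    for t in range(p, len(y)):
        lags = y[t - p:t][::-1]  # y[t-1], y[t-2], ..., y[t-p]
        rows.append(np.concatenate(([1.0], lags, [x[t]])))
    design = np.array(rows)
    coef, *_ = np.linalg.lstsq(design, y[p:], rcond=None)
    return coef


def predict_next(coef, y_recent, x_next):
    """One-step-ahead forecast from the last p observations and the next input."""
    p = len(coef) - 2
    lags = np.asarray(y_recent, dtype=float)[-p:][::-1]
    return float(coef[0] + coef[1:1 + p] @ lags + coef[-1] * x_next)


if __name__ == "__main__":
    # Synthetic demand driven by a hypothetical temperature series.
    rng = np.random.default_rng(0)
    temp = 15.0 + 10.0 * np.sin(np.arange(500) * 2 * np.pi / 24) + rng.normal(0, 1, 500)
    demand = np.zeros(500)
    for t in range(2, 500):
        demand[t] = 5.0 + 0.6 * demand[t - 1] - 0.1 * demand[t - 2] + 0.2 * temp[t]
    coef = fit_arx(demand[:400], temp[:400], p=2)
    print("coefficients:", np.round(coef, 3))
    print("forecast:", predict_next(coef, demand[:400], temp[400]), "actual:", demand[400])
```

Residual independence, which the abstract uses as a selection criterion, could be checked afterwards by inspecting the autocorrelation of `y[p:] - design @ coef`.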

A Review of Time Series Analysis for Environmental and Ecological Data (환경생태 자료 분석을 위한 시계열 분석 방법 연구)

Mo, Hyoung-ho;Cho, Kijong;Shin, Key-Il
- Korean Journal of Environmental Biology / v.34 no.4 / pp.365-373 / 2016
Much of the data used in environmental and ecological analysis is collected over time. If the number of time points is small, the data do not carry enough information, so repeated measurements or data from multiple survey points should be used to perform a comprehensive analysis. The methods used in that case are longitudinal data analysis and mixed model analysis. However, if the number of time points is large enough to provide sufficient information, repeated data are not needed and the data are analyzed using time series analysis techniques. In particular, now that large numbers of data points are available, time series analysis techniques should be used when we want to predict how variables affect each other or what trends can be expected in the future. In this study, we introduce univariate time series analysis, the intervention time series model, the transfer function model, and the multivariate time series model, and review research papers published in Korea. We also introduce an error correction model, which can be used to analyze environmental and ecological data.
https://doi.org/10.11626/KJEB.2016.34.4.365

Labor market forecasts for Information and communication construction business (정보통신공사업 인력수급차 분석 및 전망)

Kwak, Jeong Ho;Kwun, Tae Hee;Oh, Dong-Suk;Kim, Jung-Woo
- Journal of Internet Computing and Services / v.16 no.2 / pp.99-107 / 2015
In this era of a smart convergent environment, in which all industries converge on ICT infrastructure and industries and cultures come together, the information and communication construction business is becoming more important. For the information and communication construction business to continue growing, it is very important to ensure a stable supply of technical manpower. To date, however, there has been no theoretically methodical analysis of manpower supply and demand in the information and communication construction business. The need for such an analysis became even greater after the government announced, in December 2014, the road map for the development of the construction business, which seeks measures to strengthen human resources capacity based on mid- to long-term manpower supply and demand analysis. This study therefore developed a manpower supply and demand forecast model for the information and communication construction business and presented the results of the analysis. The analysis suggested that excess demand would arise, since the number of graduates of technical colleges has decreased since 2007 because fewer students have entered technical colleges and because departments have been restructured and reformed. In conclusion, the study cited the need for reeducating existing manpower, continuously upgrading professional development in the information and communication construction business, and providing various policy incentives.
https://doi.org/10.7472/jksii.2015.16.2.99

Characteristics and Prediction of Total Ozone and UV-B Irradiance in East Asia Including the Korean Peninsula (한반도를 포함한 동아시아 영역에서 오존전량과 유해자외선의 특성과 예측)

Moon, Yun-Seob;Seok, Min-Woo;Kim, Yoo-Keun
- Journal of Environmental Science International / v.15 no.8 / pp.701-718 / 2006
The average ratio of daily UV-B to total solar (TS) irradiance at Busan (35.23$^{\circ}$N, 129.07$^{\circ}$E) in Korea is found to be 0.11%. There is also a strong exponential relationship between hourly UV-B and total solar irradiance: UV-B=exp (a$\times$(TS-b))(R$^2$=0.93). The daily variation of total ozone is compared with the UV-B irradiance at Pohang (36.03$^{\circ}$N, 129.40$^{\circ}$E) in Korea using Total Ozone Mapping Spectrometer (TOMS) data for May to July 2005. Total ozone (TO) has shown a decreasing trend since 1979, leading to a negative correlation with ground-level UV-B irradiance on cloudless days during the given period: UV-B=239.23-0.056 TO (R$^2$=0.52). The statistical predictions of daily total ozone are analyzed using data from the Brewer spectrophotometer and TOMS in East Asia, including the Korean peninsula. The long-term monthly averages of total ozone from the multiplicative seasonal AutoRegressive Integrated Moving Average (ARIMA) model are used to predict the hourly mean UV-B irradiance by interpolating the daily mean total ozone for the prediction period. We can also predict the next day's total ozone using regression models based on the present day's total ozone from TOMS and the next day's maximum air temperature predicted by the Meteorological Mesoscale Model 5 (MM5). These predicted and observed total ozone amounts are used as input data for the parameterization model (PM) of hourly UV-B irradiance. The PM of UV-B irradiance is based on main parameters such as cloudiness, solar zenith angle, total ozone, opacity of aerosols, altitude, and surface albedo. The model requires daily total ozone, hourly cloud amount and type, visibility, and air pressure as input data. To simplify cloud effects in the model, constant cloud transmittances are used.
For example, the correlation coefficient of the PM using these cloud transmissivities is higher than 0.91 for cloudy days in Busan, and the relative mean bias error (RMBE) and the relative root mean square error (RRMSE) are less than 21% and 27%, respectively. In this study, the daily variations of calculated and predicted UV-B irradiance show high correlation coefficients of more than 0.86 at each monitoring site on the Korean peninsula as well as in East Asia. The RMBE is within 10% of the mean measured hourly irradiance, and the RRMSE is within 15% for hourly irradiance. Although errors are present in cloud amounts and total ozone, the results are still acceptable.
https://doi.org/10.5322/JES.2006.15.8.701
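
The error measures and the linear ozone relation quoted in this abstract can be illustrated with a short sketch. The abstract does not give the formulas for RMBE and RRMSE, so the commonly used definitions (bias and root mean square error divided by the mean measured value) and the sign convention predicted minus measured are assumptions here. The function names are hypothetical, the sample numbers are invented, and the units of the regression are not stated in the abstract.

```python
import numpy as np


def rmbe(measured, predicted):
    """Relative mean bias error, in percent of the mean measured value.

    Assumed definition: mean(predicted - measured) / mean(measured) * 100.
    """
    m = np.asarray(measured, dtype=float)
    p = np.asarray(predicted, dtype=float)
    return float(np.mean(p - m) / np.mean(m) * 100.0)


def rrmse(measured, predicted):
    """Relative root mean square error, in percent of the mean measured value."""
    m = np.asarray(measured, dtype=float)
    p = np.asarray(predicted, dtype=float)
    return float(np.sqrt(np.mean((p - m) ** 2)) / np.mean(m) * 100.0)


def uvb_from_total_ozone(total_ozone):
    """Linear fit quoted in the abstract: UV-B = 239.23 - 0.056 * TO."""
    return 239.23 - 0.056 * np.asarray(total_ozone, dtype=float)


if __name__ == "__main__":
    # Invented sample values, only to show how the functions are called.
    measured = [10.0, 20.0, 30.0]
    predicted = [11.0, 22.0, 33.0]
    print("RMBE (%):", rmbe(measured, predicted))
    print("RRMSE (%):", rrmse(measured, predicted))
    print("UV-B at TO = 300:", float(uvb_from_total_ozone(300.0)))
```

Dividing by the mean measured value matches the abstract's phrasing that the RMBE is "within 10% of the mean measured hourly irradiance".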
