• Title/Summary/Keyword: data value prediction

A Study on the Prediction of Cabbage Price Using Ensemble Voting Techniques (앙상블 Voting 기법을 활용한 배추 가격 예측에 관한 연구)

  • Lee, Chang-Min;Song, Sung-Kwang;Chung, Sung-Wook
    • Journal of Convergence for Information Technology
    • /
    • v.12 no.3
    • /
    • pp.1-10
    • /
    • 2022
  • Vegetables such as cabbage are strongly affected by natural disasters, so events such as heavy rain and disease increase price fluctuations, which in turn affect the farm economy. Various efforts have been made to predict agricultural product prices to address this problem, but extreme price fluctuations remain difficult to predict. In this study, cabbage prices were analyzed using the ensemble Voting technique, which determines the final prediction by combining the outputs of several individual classifiers. The results were also compared with LSTM, a time series analysis method, and with XGBoost and RandomForest, which are tree-based ensemble techniques. Daily price data were used, together with weather information and price indices that affect cabbage prices. The resulting RMSE, which measures the difference between the actual and predicted values, was about 236. This study is expected to help in selecting models for other time series analysis research, such as predicting agricultural product prices.
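
The voting and RMSE steps described in this abstract can be sketched roughly as follows. This is a minimal illustration, assuming the voting step is an unweighted average of numeric predictions (the abstract does not say how the models are combined); the function names and all numbers are hypothetical and are not taken from the paper.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    if len(actual) != len(predicted):
        raise ValueError("sequences must have the same length")
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def voting_average(model_predictions):
    """Combine per-model predictions by unweighted averaging.

    Assumption: averaging is only one possible voting scheme for
    numeric targets; the paper may use a different one.
    """
    n_models = len(model_predictions)
    return [sum(values) / n_models for values in zip(*model_predictions)]


# Hypothetical daily prices and three hypothetical base-model forecasts.
actual = [1000.0, 1200.0, 900.0]
base_predictions = [
    [1100.0, 1150.0, 950.0],
    [900.0, 1250.0, 850.0],
    [1000.0, 1200.0, 1050.0],
]

ensemble = voting_average(base_predictions)
print(ensemble, rmse(actual, ensemble))
```

In this toy data the individual errors partly cancel, so the averaged forecast has a lower RMSE than any single base model, which is the usual motivation for voting.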

Cryptocurrency Auto-trading Program Development Using Prophet Algorithm (Prophet 알고리즘을 활용한 가상화폐의 자동 매매 프로그램 개발)

  • Hyun-Sun Kim;Jae Joon Ahn
    • Journal of Korean Society of Industrial and Systems Engineering
    • /
    • v.46 no.1
    • /
    • pp.105-111
    • /
    • 2023
  • Recently, research on prediction algorithms using deep learning has been actively conducted. In addition, algorithmic trading (auto-trading) based on the predictive power of artificial intelligence is becoming one of the main investment methods in the stock trading field and is building its own history. Because the possibility of human error is eliminated at the source and trades are executed mechanically according to set conditions, it is likely to be more profitable than human trading in the long run. In the virtual currency market in particular, at least for now, it is not possible to evaluate the intrinsic value of each cryptocurrency, unlike stocks. It is therefore far more effective to approach cryptocurrencies with technical analysis, and the cryptocurrency market may be the field in which the performance of algorithmic trading can be maximized. Currently, the most commonly used artificial intelligence method for analyzing and forecasting financial time series data is long short-term memory (LSTM). However, even LSTM has deficiencies that constrain its widespread use, so many improvements are needed in the design of forecasting and investment algorithms to increase its use in actual investment situations. Meanwhile, Prophet, an artificial intelligence algorithm developed by Facebook (META) in 2017, is used to predict stock and cryptocurrency prices with high accuracy. In particular, Prophet is reported to predict the price of virtual currencies better than that of stocks. In this study, we aim to show that Prophet's virtual currency price prediction accuracy is higher than that of an existing deep learning-based time series prediction method. In addition, we carry out a mock investment using Prophet's predicted values. When the final value at the end of the investment was evaluated, most of the tested coins exceeded the initial investment and recorded a positive profit. In future research, we will continue to test other coins to determine whether predictive power differs significantly by coin, so that investment strategies can be established accordingly.
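
A mock investment driven by predicted values, as mentioned in this abstract, might look like the sketch below. The trading rule here (hold the coin only when the next-day forecast exceeds the current price, and ignore fees) is an assumption made for illustration; the abstract does not describe the rule the authors used, and the prices and forecasts are hypothetical rather than Prophet output.

```python
def mock_invest(prices, forecasts, initial_cash=1000.0):
    """Simulate an all-in / all-out strategy and return the final value.

    prices[t]    : observed price on day t
    forecasts[t] : forecast for day t + 1, made on day t
    Assumptions (not from the paper): no fees, no slippage, and the
    whole balance is moved on every buy or sell signal.
    """
    if len(forecasts) != len(prices) - 1:
        raise ValueError("need exactly one forecast per day except the last")
    cash, coins = initial_cash, 0.0
    for t, forecast in enumerate(forecasts):
        price = prices[t]
        if forecast > price and coins == 0.0:
            # Forecast says the price will rise: buy with all cash.
            coins, cash = cash / price, 0.0
        elif forecast <= price and coins > 0.0:
            # Forecast says the price will not rise: sell everything.
            cash, coins = coins * price, 0.0
    # Value any remaining coins at the last observed price.
    return cash + coins * prices[-1]


# Hypothetical price path and hypothetical next-day forecasts.
prices = [100.0, 110.0, 105.0, 120.0]
forecasts = [108.0, 104.0, 118.0]
final_value = mock_invest(prices, forecasts)
print(final_value)
```

Comparing `final_value` with `initial_cash` gives the profit or loss for one coin; repeating this per coin mirrors the per-coin evaluation the abstract describes.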

Uncertainty Analysis of Flash-flood Prediction using Remote Sensing and a Geographic Information System based on GcIUH in the Yeongdeok Basin, Korea

  • Choi, Hyun;Chung, Yong-Hyun;Yoon, Hong-Joo
    • Proceedings of the KSRS Conference
    • /
    • v.2
    • /
    • pp.884-887
    • /
    • 2006
  • This paper focuses on minimizing flood damage in the Yeongdeok basin of South Korea by establishing a flood prediction model based on a geographic information system (GIS), remote sensing, and geomorphoclimatic instantaneous unit hydrograph (GcIUH) techniques. The GIS database for flash flood prediction was created using data from digital elevation models (DEMs), soil maps, and Landsat satellite imagery. Flood prediction was based on the peak discharge calculated at the sub-basin scale using hydrogeomorphologic techniques and the threshold runoff value. Using the developed flash flood prediction model, rainfall conditions with the potential to cause flooding were determined based on the cumulative rainfall for 20 minutes, considering rainfall duration, peak discharge, and flooding in the Yeongdeok basin.

Machine Learning based Prediction of The Value of Buildings

  • Lee, Woosik;Kim, Namgi;Choi, Yoon-Ho;Kim, Yong Soo;Lee, Byoung-Dai
    • KSII Transactions on Internet and Information Systems (TIIS)
    • /
    • v.12 no.8
    • /
    • pp.3966-3991
    • /
    • 2018
  • Due to the lack of visualization services and of organic combination between public and private building data, the usability of the basic map has remained low. To address this issue, this paper reports on a solution that organically combines public and private data while providing visualization services to general users. For this purpose, factors that can affect building prices were first examined in order to define the related data attributes. To extract the relevant data attributes, this paper presents a method of acquiring public information data and real estate-related information provided by private real estate portal sites. The paper also proposes a preprocessing procedure required for machine learning. It then suggests a machine learning algorithm that predicts buildings' value pricing and future value using big data on buildings' spatial information, acquired from a database containing building value attributes. The algorithm's applicability was tested by building a prototype targeting pilot areas, including Suwon, Anyang, and Gunpo in South Korea. Finally, a prototype visualization solution was developed to allow general users to make effective use of the buildings' value rankings and value pricing predicted by machine learning.

Machine Learning based Firm Value Prediction Model: using Online Firm Reviews (머신러닝 기반의 기업가치 예측 모형: 온라인 기업리뷰를 활용하여)

  • Lee, Hanjun;Shin, Dongwon;Kim, Hee-Eun
    • Journal of Internet Computing and Services
    • /
    • v.22 no.5
    • /
    • pp.79-86
    • /
    • 2021
  • As the usefulness of big data analysis has drawn attention, many studies in business research have begun to use big data to predict firm performance. Previous studies mainly rely on data from outside the firm, such as news articles and social media platforms. Voices from within the firm, in the form of employee satisfaction or evaluations of the firm's strengths and weaknesses, can potentially affect firm value. However, there is insufficient evidence that online employee reviews are valid predictors of firm value, because the data are relatively difficult to obtain. To fill this gap, we used 97,216 reviews from 2014 to 2019 collected by JobPlanet, an online firm review website in Korea, and developed a machine learning-based predictive model. Among the proposed models, the LSTM-based model showed the highest accuracy, 73.2%, and the lowest MAE, 0.359. We expect this study to serve as a useful case in the field of firm value prediction for domestic companies.

A Study on Occupancy Estimation Method of a Private Room Using IoT Sensor Data Based Decision Tree Algorithm (IoT 센서 데이터를 이용한 단위실의 재실추정을 위한 Decision Tree 알고리즘 성능분석)

  • Kim, Seok-Ho;Seo, Dong-Hyun
    • Journal of the Korean Solar Energy Society
    • /
    • v.37 no.2
    • /
    • pp.23-33
    • /
    • 2017
  • Accurate prediction of the stochastic behavior of occupants is a well-known problem in improving the prediction of building energy use. Many researchers have tried various sensors that carry information on occupant status, such as $CO_2$ sensors, infrared motion detectors, and RFID, to predict occupancy, while others have developed algorithms that estimate occupancy probability from those sensors or from indirect monitoring data such as energy consumption in spaces. In this research, various sensor data and energy consumption data are used with decision tree algorithms (C4.5 & CART) to estimate sub-hourly occupancy status. Although the experiment is limited in space (a private room) and period (the cooling season), the prediction results show good agreement, with accuracy above 95%, when energy consumption data are used instead of the measured $CO_2$ value. This result indicates the potential of IoT data for awareness of indoor environmental status.
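
To illustrate how a decision tree could turn energy consumption readings into an occupancy estimate, the sketch below finds a single CART-style split by minimizing weighted Gini impurity. It is a simplified one-level illustration under assumed data: the readings, labels, and function names are hypothetical, and the paper's actual C4.5 and CART models are not reproduced here.

```python
def gini(labels):
    """Gini impurity of a list of binary labels (0 = vacant, 1 = occupied)."""
    n = len(labels)
    if n == 0:
        return 0.0
    p = sum(labels) / n
    return 1.0 - p * p - (1.0 - p) * (1.0 - p)


def best_threshold(values, labels):
    """Return (threshold, weighted_gini) for the best single split.

    Candidate thresholds are midpoints between consecutive distinct
    sorted values, as in a CART-style split search.
    """
    pairs = sorted(zip(values, labels))
    best_score, best_thr = float("inf"), None
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no split possible between equal values
        thr = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [lab for val, lab in pairs if val <= thr]
        right = [lab for val, lab in pairs if val > thr]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(pairs)
        if score < best_score:
            best_score, best_thr = score, thr
    return best_thr, best_score


# Hypothetical plug-load readings (W) and occupancy labels.
energy = [5.0, 8.0, 12.0, 60.0, 75.0, 90.0]
occupied = [0, 0, 0, 1, 1, 1]
threshold, impurity = best_threshold(energy, occupied)
print(threshold, impurity)
```

A full tree repeats this search recursively on each side of the split; C4.5 would use an entropy-based gain ratio instead of Gini impurity.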

Machine Learning Data Analysis for Tool Wear Prediction in Core Multi Process Machining (코어 다중가공에서 공구마모 예측을 위한 기계학습 데이터 분석)

  • Choi, Sujin;Lee, Dongju;Hwang, Seungkuk
    • Journal of the Korean Society of Manufacturing Process Engineers
    • /
    • v.20 no.9
    • /
    • pp.90-96
    • /
    • 2021
  • As real-time factory data can be collected using various sensors, the adoption of intelligent unmanned processing systems is spreading through the establishment of smart factories. In intelligent unmanned processing systems, data are collected in real time using sensors, and the equipment is controlled by predicting future situations from the collected data. In particular, a technology that predicts tool wear and determines the exact timing of tool replacement is needed to prevent defective or unprocessed products caused by tool breakage or tool wear. Directly measuring tool wear in real time is difficult during the cutting process in milling. Therefore, tool wear should be predicted indirectly by analyzing the cutting load of the main spindle, current, vibration, noise, etc. In this study, data from the current and acceleration sensors, displacement data along the X, Y, and Z axes, tool wear values, and shape change data observed using Newroview were collected from the high-speed, two-edge, flat-end mill machining process of SKD11 steel. The support vector machine technique (a machine learning technique) was applied to predict the amount of tool wear from these data. Additionally, the prediction accuracies of all kernels were compared.

A Hybrid SVM Classifier for Imbalanced Data Sets (불균형 데이터 집합의 분류를 위한 하이브리드 SVM 모델)

  • Lee, Jae Sik;Kwon, Jong Gu
    • Journal of Intelligence and Information Systems
    • /
    • v.19 no.2
    • /
    • pp.125-140
    • /
    • 2013
  • We call a data set in which the number of records belonging to one class far outnumbers the number of records belonging to the other class an 'imbalanced data set'. Most classification techniques perform poorly on imbalanced data sets. When we evaluate the performance of a classification technique, we need to measure not only 'accuracy' but also 'sensitivity' and 'specificity'. In a customer churn prediction problem, 'retention' records account for the majority class, and 'churn' records account for the minority class. Sensitivity measures the proportion of actual retentions that are correctly identified as such. Specificity measures the proportion of actual churns that are correctly identified as such. The poor performance of classification techniques on imbalanced data sets is due to the low value of specificity. Many previous studies on imbalanced data sets employed an 'oversampling' technique, in which members of the minority class are sampled more than those of the majority class in order to make a relatively balanced data set. When a classification model is constructed using this oversampled balanced data set, specificity can be improved but sensitivity will decrease. In this research, we developed a hybrid model of support vector machine (SVM), artificial neural network (ANN), and decision tree that improves specificity while maintaining sensitivity. We named this hybrid model the 'hybrid SVM model.' The process of construction and prediction of our hybrid SVM model is as follows. By oversampling from the original imbalanced data set, a balanced data set is prepared. The SVM_I model and the ANN_I model are constructed using the imbalanced data set, and the SVM_B model is constructed using the balanced data set. The SVM_I model is superior in sensitivity and the SVM_B model is superior in specificity. For a record on which both the SVM_I model and the SVM_B model make the same prediction, that prediction becomes the final solution. If they make different predictions, the final solution is determined by the discrimination rules obtained by the ANN and the decision tree. For a record on which the SVM_I model and the SVM_B model make different predictions, a decision tree model is constructed using the ANN_I output value as input and actual retention or churn as target. We obtained the following two discrimination rules: 'IF ANN_I output value $< 0.285$, THEN Final Solution = Retention' and 'IF ANN_I output value $\geq 0.285$, THEN Final Solution = Churn.' The threshold 0.285 is the value optimized for the data used in this research. The result we present in this research is the structure or framework of our hybrid SVM model, not a specific threshold value such as 0.285. Therefore, the threshold value in the above discrimination rules can be changed to any value depending on the data. To evaluate the performance of our hybrid SVM model, we used the 'churn data set' in the UCI Machine Learning Repository, which consists of 85% retention customers and 15% churn customers. The accuracy of the hybrid SVM model is 91.08%, which is better than that of the SVM_I model or the SVM_B model. The points worth noticing here are its sensitivity, 95.02%, and specificity, 69.24%. The sensitivity of the SVM_I model is 94.65%, and the specificity of the SVM_B model is 67.00%. Therefore, the hybrid SVM model developed in this research improves on the specificity of the SVM_B model while maintaining the sensitivity of the SVM_I model.
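
The final-decision step and the two evaluation measures described in this abstract can be sketched as below. Only the combination rule and the 0.285 threshold come from the abstract; the SVM_I, SVM_B, and ANN_I outputs are assumed to be precomputed, and the function names and sample records are hypothetical.

```python
RETENTION, CHURN = "Retention", "Churn"


def hybrid_predict(svm_i, svm_b, ann_i_output, threshold=0.285):
    """Combine the two SVM predictions using the rules in the abstract.

    If SVM_I and SVM_B agree, their shared prediction is final.
    Otherwise the ANN_I output is compared against the threshold
    (0.285 was tuned for the paper's data and may differ elsewhere).
    """
    if svm_i == svm_b:
        return svm_i
    return RETENTION if ann_i_output < threshold else CHURN


def sensitivity_specificity(actual, predicted):
    """Sensitivity = share of actual retentions identified as retention;
    specificity = share of actual churns identified as churn."""
    hit_ret = sum(a == RETENTION and p == RETENTION for a, p in zip(actual, predicted))
    hit_churn = sum(a == CHURN and p == CHURN for a, p in zip(actual, predicted))
    return hit_ret / actual.count(RETENTION), hit_churn / actual.count(CHURN)


# Hypothetical records: (actual, SVM_I prediction, SVM_B prediction, ANN_I output).
records = [
    (RETENTION, RETENTION, RETENTION, 0.10),
    (RETENTION, RETENTION, CHURN, 0.20),
    (RETENTION, CHURN, RETENTION, 0.40),
    (CHURN, RETENTION, CHURN, 0.60),
    (CHURN, CHURN, CHURN, 0.90),
]
actual = [r[0] for r in records]
predicted = [hybrid_predict(r[1], r[2], r[3]) for r in records]
print(predicted, sensitivity_specificity(actual, predicted))
```

Keeping the threshold as a parameter reflects the abstract's point that the framework, not the specific value 0.285, is the contribution.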

A Novel Thresholding for Prediction Analytics with Machine Learning Techniques

  • Shakir, Khan;Reemiah Muneer, Alotaibi
    • International Journal of Computer Science & Network Security
    • /
    • v.23 no.1
    • /
    • pp.33-40
    • /
    • 2023
  • Machine learning techniques are proving effective in data analytics. Classification and regression both support prediction on different kinds of data, and various kinds of classification techniques are used depending on the nature of the data. Threshold determination is essential to building a better model for unlabelled data. In this paper, threshold values are applied as ranges, based on the min-max normalization technique, to create labels, and multiclass classification is performed on rainfall data. Binary classification is applied to autism data, and classification techniques are applied to child abuse data. The performance of each technique is analysed using evaluation metrics.
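
One way to read the labelling step in this abstract is sketched below: values are min-max normalized to [0, 1] and then assigned a class by range. The cut points (1/3 and 2/3), the class names, and the rainfall values are assumptions for illustration only; the abstract does not give the ranges the authors used.

```python
def min_max_normalize(values):
    """Scale values linearly to [0, 1]; constant input maps to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def label_by_range(normalized, cut_points=(1 / 3, 2 / 3), names=("low", "medium", "high")):
    """Assign a class name according to which range each value falls in.

    cut_points and names are hypothetical; there must be exactly one
    more name than cut points.
    """
    if len(names) != len(cut_points) + 1:
        raise ValueError("need one more name than cut points")
    labels = []
    for x in normalized:
        # Count how many cut points the value has reached or passed.
        index = sum(x >= c for c in cut_points)
        labels.append(names[index])
    return labels


# Hypothetical rainfall measurements (mm).
rainfall = [0.0, 10.0, 25.0, 40.0, 50.0]
normalized = min_max_normalize(rainfall)
labels = label_by_range(normalized)
print(normalized, labels)
```

The resulting labels can then serve as targets for a multiclass classifier on otherwise unlabelled data.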

Comparison of machine learning algorithms for Chl-a prediction in the middle of Nakdong River (focusing on water quality and quantity factors) (머신러닝 기법을 활용한 낙동강 중류 지역의 Chl-a 예측 알고리즘 비교 연구(수질인자 및 수량 중심으로))

  • Lee, Sang-Min;Park, Kyeong-Deok;Kim, Il-Kyu
    • Journal of Korean Society of Water and Wastewater
    • /
    • v.34 no.4
    • /
    • pp.277-288
    • /
    • 2020
  • In this study, we applied machine learning algorithms to predict algal chlorophyll-a (Chl-a). Water quality and quantity data from the middle Nakdong River area were used. First, the correlation between Chl-a and the water quality and quantity data was analyzed. We extracted the ten most important water quality and quantity factors for the two weirs, and the algorithms predicted how these ten factors affected Chl-a occurrence. We implemented decision tree, random forest, elastic net, and gradient boosting algorithms in Python. The root mean square error (RMSE) was used to evaluate the algorithms. Gradient boosting showed an RMSE of 10.55 for the Gangjeonggoryeong (GG) site and 11.43 for the Dalsung (DS) site, giving excellent results at both sites. The predictions of the four algorithms were also evaluated using the receiver operating characteristic (ROC) curve and the area under the curve (AUC). The AUC was 0.877 at the GG site and 0.951 at the DS site, so the algorithm's predictive ability appeared to be excellent.
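
For readers unfamiliar with the AUC figures quoted in this abstract, the sketch below computes AUC as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative one. It assumes binary labels (for example, Chl-a above or below some level); how the authors binarized Chl-a is not stated in the abstract, and the labels and scores here are hypothetical.

```python
def auc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    Every positive/negative pair counts 1 if the positive is scored
    higher, 0.5 on a tie, and 0 otherwise; the mean over all pairs
    equals the area under the ROC curve.
    """
    positives = [s for lab, s in zip(labels, scores) if lab == 1]
    negatives = [s for lab, s in zip(labels, scores) if lab == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical binary outcomes and model scores.
labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
area = auc(labels, scores)
print(area)
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation, which is why values such as 0.877 and 0.951 are read as strong results.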