• Title/Summary/Keyword: absolute model accuracy

A Study on the Index Estimation of Missing Real Estate Transaction Cases Using Machine Learning (머신러닝을 활용한 결측 부동산 매매 지수의 추정에 대한 연구)

  • Kim, Kyung-Min;Kim, Kyuseok;Nam, Daisik
    • Journal of the Economic Geographical Society of Korea / v.25 no.1 / pp.171-181 / 2022
  • The real estate price index serves as key quantitative data in real estate market analysis. International organizations including the OECD publish real estate price indexes by country, and the Korea Real Estate Board announces metropolitan-level and municipal-level indexes. However, when the index is computed for spatial units smaller than the metropolitan or municipal level, missing values become a problem. As the spatial scope narrows, some unit periods have few or no transactions, which makes index calculation difficult or even impossible. This study proposes a supervised machine learning model to compensate for missing values that arise when no transactions occur in a specific area and period. The proposed models are evaluated on their accuracy in predicting both existing and missing values.

Machine Learning-based landslide susceptibility mapping - Inje area, South Korea

  • Chanul Choi;Le Xuan Hien;Seongcheon Kwon;Giha Lee
    • Proceedings of the Korea Water Resources Association Conference / 2023.05a / pp.248-248 / 2023
  • In recent years, the number of landslides in Korea has been increasing due to extreme weather events such as localized heavy rainfall and typhoons. Landslides often occur together with debris flows, land subsidence, and earthquakes, and they cause significant damage to life and property. Because 64% of Korea's land area is mountainous, the government has sought to predict landslides to reduce damage. In response, the Korea Forest Service established a 'Landslide Information System' to predict the likelihood of landslides. This system selects a total of 13 landslide factors based on past landslide events and uses logistic regression (LR) to predict the possibility of a landslide; its accuracy is reported to be 0.75. However, most of the training data in the current system covers landslides that occurred from 2005 to 2011 and does not reflect recent typhoons or heavy rain. Therefore, in this study, we apply six machine learning techniques (KNN, LR, SVM, XGB, RF, GNB) to predict the occurrence of landslides based on data for Inje, Gangwon-do, recently produced by the National Institute of Forest. To predict landslides, the landslide event and factor data must first be converted into a form suitable for machine learning using ArcGIS and Python. In addition, the number of samples differs greatly between areas where landslides occurred and areas where they did not. The prediction was therefore performed after correcting the imbalanced data using the Tomek Links and Near Miss techniques. Moreover, to further control the imbalance, a model that reflects soil properties will be used to remove absolutely safe areas.
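
The Tomek Links step named in this abstract can be illustrated with a minimal sketch. It assumes Euclidean distance and that only the majority-class member of each link is dropped; the function name `tomek_link_undersample` and the toy coordinates are hypothetical and are not taken from the study, which does not describe its implementation.

```python
import math

def tomek_link_undersample(points, labels, majority_label):
    """Drop majority-class samples that take part in a Tomek link."""
    def nearest(i):
        # Index of the closest other sample (Euclidean distance).
        return min(
            (j for j in range(len(points)) if j != i),
            key=lambda j: math.dist(points[i], points[j]),
        )

    nn = [nearest(i) for i in range(len(points))]
    drop = set()
    for i, j in enumerate(nn):
        # A Tomek link is a pair of mutual nearest neighbours with different labels.
        if nn[j] == i and labels[i] != labels[j] and labels[i] == majority_label:
            drop.add(i)
    keep = [i for i in range(len(points)) if i not in drop]
    return [points[i] for i in keep], [labels[i] for i in keep]

# Hypothetical toy data: 0 = no landslide (majority), 1 = landslide (minority).
points = [(0.0, 0.0), (0.1, 0.0), (1.0, 1.0), (1.05, 1.0), (3.0, 3.0)]
labels = [0, 0, 1, 0, 0]
kept_points, kept_labels = tomek_link_undersample(points, labels, majority_label=0)
```

Here the majority sample at (1.05, 1.0) forms a link with the minority sample at (1.0, 1.0) and is removed, which cleans the class boundary without touching the minority class.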

Intelligent System for the Prediction of Heart Diseases Using Machine Learning Algorithms with a New Mixed Feature Creation (MFC) technique

  • Rawia Elarabi;Abdelrahman Elsharif Karrar;Murtada El-mukashfi El-taher
    • International Journal of Computer Science & Network Security / v.23 no.5 / pp.148-162 / 2023
  • Classification systems can significantly assist the medical sector by enabling precise and quick diagnosis of diseases, saving time for both doctors and patients. Machine learning algorithms offer one way to identify risk variables. Non-surgical technologies such as machine learning are trustworthy and effective in distinguishing healthy patients from those with heart disease, and they save time and effort. The goal of this study is to create an intelligent medical decision support system based on machine learning for the diagnosis of heart disease. We used a mixed feature creation (MFC) technique to generate new features from the UCI Cleveland Cardiology dataset. We select the most suitable features using the Least Absolute Shrinkage and Selection Operator (LASSO), Recursive Feature Elimination with Random Forest feature selection (RFE-RF), and the best features of both LASSO and RFE-RF (BLR). Cross-validation and grid search are used to optimize the parameters of the estimators used with these algorithms. Classifier performance metrics, including classification accuracy, specificity, sensitivity, precision, and F1-score, are reported for each classification model along with execution time and RMSE, and the results are presented independently for comparison. Our proposed work finds the best potential outcome across all available prediction models and improves the system's performance, allowing physicians to diagnose heart patients more accurately.
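
As a rough illustration of the assessment metrics this abstract lists, the sketch below derives them from binary confusion-matrix counts. The function name `classification_metrics` and the counts are hypothetical and are not results from the study.

```python
def classification_metrics(tp, fp, tn, fn):
    """Binary metrics from confusion-matrix counts (positive = heart disease)."""
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    sensitivity = tp / (tp + fn)   # recall, true-positive rate
    specificity = tn / (tn + fp)   # true-negative rate
    precision = tp / (tp + fp)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "f1": f1,
    }

# Hypothetical counts for illustration only.
metrics = classification_metrics(tp=40, fp=10, tn=45, fn=5)
```

Reporting sensitivity and specificity alongside accuracy matters in diagnosis, because a model can score well on accuracy while still missing many true cases.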

Ensembles of neural network with stochastic optimization algorithms in predicting concrete tensile strength

  • Hu, Juan;Dong, Fenghui;Qiu, Yiqi;Xi, Lei;Majdi, Ali;Ali, H. Elhosiny
    • Steel and Composite Structures / v.45 no.2 / pp.205-218 / 2022
  • Accurate calculation of the splitting tensile strength (STS) of concrete is a crucial task because concrete is so widely used in the construction sector. Following many recent studies that have proposed predictive models for this purpose, this study proposes and tests three hybrid models for predicting the STS from the characteristics of the mixture components, including cement compressive strength, cement tensile strength, curing age, maximum size of the crushed stone, stone powder content, sand fineness modulus, water-to-binder ratio, and sand ratio. A multi-layer perceptron (MLP) neural network is combined with invasive weed optimization (IWO), the cuttlefish optimization algorithm (CFOA), and the electrostatic discharge algorithm (ESDA), which are among the newest optimization techniques. A dataset from the earlier literature is used for exploring and extrapolating the STS behavior. The results for several accuracy criteria demonstrated good learning capability for all three hybrid models, namely IWO-MLP, CFOA-MLP, and ESDA-MLP. In the prediction phase, the predictions were also in promising agreement (above 88%) with experimental results. However, a comparison revealed ESDA-MLP to be the most accurate predictor. In terms of the mean absolute percentage error (MAPE), the error of ESDA-MLP was 9.05%, while the corresponding values for IWO-MLP and CFOA-MLP were 9.17% and 13.97%, respectively. Since the combination of MLP and ESDA can be an effective tool for optimizing the concrete mixture toward a desirable STS, the last part of this study is dedicated to extracting a predictive formula from this model.
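
The MAPE index used to rank the three hybrid models is the mean of the absolute relative errors, expressed in percent. A minimal sketch follows; the strength values are hypothetical and are not data from the study.

```python
def mape(actual, predicted):
    """Mean absolute percentage error, in percent (actual values must be nonzero)."""
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)

# Hypothetical measured and predicted strengths (MPa), for illustration only.
measured = [2.0, 2.5, 4.0]
predicted = [2.2, 2.25, 4.0]
error_percent = mape(measured, predicted)
```

Because each error is divided by the measured value, MAPE is scale-free, which makes it convenient for comparing models across mixtures with different strength ranges.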

A Preliminary Study on Evaluation of Time-Dependent Radionuclide Removal Performance Using Artificial Intelligence for Biological Adsorbents

  • Janghee Lee;Seungsoo Jang;Min-Jae Lee;Woo-Sung Cho;Joo Yeon Kim;Sangsoo Han;Sung Gyun Shin;Sun Young Lee;Dae Hyuk Jang;Miyong Yun;Song Hyun Kim
    • Journal of Radiation Protection and Research / v.48 no.4 / pp.175-183 / 2023
  • Background: Recently, biological adsorbents have been developed for removing radionuclides from radioactive liquid waste because of their high selectivity, eco-friendliness, and renewability. However, since they can be damaged by radiation in radioactive waste, a method for estimating bio-adsorbent performance over time should account for radiation damage when assessing renewability. This paper aims to develop a simulation method that applies a deep learning technique to rapidly and accurately estimate the adsorption performance of bio-adsorbents when inserted into liquid radioactive waste. Materials and Methods: A model that describes various interactions between a bio-adsorbent and liquid was constructed using numerical methods to estimate the adsorption capacity of the bio-adsorbent. To generate datasets for machine learning, Monte Carlo N-Particle (MCNP) simulations were conducted while considering radioactive concentrations in the adsorbent column. Results and Discussion: Compared with the conventional method, the proposed method is in good agreement, within 0.99% and 0.06% for the R2 score and mean absolute percentage error, respectively. Furthermore, the estimation speed is improved by more than 30 times. Conclusion: An artificial neural network can estimate the survival rate of a bio-adsorbent under radiation ionization rapidly and accurately compared with the MCNP simulation, and can determine whether the bio-adsorbents are reusable.

How to improve the accuracy of recommendation systems: Combining ratings and review texts sentiment scores (평점과 리뷰 텍스트 감성분석을 결합한 추천시스템 향상 방안 연구)

  • Hyun, Jiyeon;Ryu, Sangyi;Lee, Sang-Yong Tom
    • Journal of Intelligence and Information Systems / v.25 no.1 / pp.219-239 / 2019
  • As providing customized services to individuals becomes more important, research on personalized recommendation systems is constantly being carried out. Collaborative filtering is one of the most popular approaches in academia and industry. However, it is limited in that recommendations have mostly been based on quantitative information such as users' ratings, which lowers accuracy. To solve this problem, many studies have attempted to improve the performance of recommendation systems by using information beyond the quantitative ratings. A good example is sentiment analysis of customer review text. Nevertheless, existing research has not directly combined the results of sentiment analysis with quantitative rating scores in the recommendation system. Therefore, this study aims to reflect the sentiments expressed in reviews in the rating scores. In other words, we propose a new algorithm that converts a user's own review into quantitative information and reflects it directly in the recommendation system. To do this, we needed to quantify users' reviews, which are originally qualitative information. In this study, the sentiment score was calculated using sentiment analysis, a text mining technique. Movie review data was used. Based on this data, a domain-specific sentiment dictionary was constructed for movie reviews. Regression analysis was used to construct the sentiment dictionary: positive and negative dictionaries were each built using Lasso regression, Ridge regression, and ElasticNet. The accuracy of each dictionary was verified with a confusion matrix. The accuracy of the Lasso-based dictionary was 70%, that of the Ridge-based dictionary was 79%, and that of the ElasticNet dictionary (α = 0.3) was 83%.
Therefore, in this study, the sentiment score of each review is calculated using the ElasticNet-based dictionary and combined with the rating to create a new rating. In this paper, we show that collaborative filtering that reflects the sentiment scores of user reviews is superior to the traditional method that considers only the existing rating. To show this, the proposed algorithm was applied to memory-based user-based collaborative filtering (UBCF), item-based collaborative filtering (IBCF), and the model-based matrix factorization methods SVD and SVD++. For each algorithm, the mean absolute error (MAE) and the root mean square error (RMSE) were calculated to compare the recommendation system that combines sentiment scores with ratings against a system that considers ratings only. When the evaluation index was MAE, the improvement was 0.059 for UBCF, 0.0862 for IBCF, 0.1012 for SVD, and 0.188 for SVD++. When the evaluation index was RMSE, the corresponding values were 0.0431 for UBCF, 0.0882 for IBCF, 0.1103 for SVD, and 0.1756 for SVD++. These results show that the prediction performance of ratings that reflect the proposed sentiment score is superior to that of the conventional method. In other words, collaborative filtering that reflects the sentiment score of user reviews shows higher accuracy than conventional collaborative filtering that considers only the quantitative score. We then ran a paired t-test to validate that the proposed model was the better approach and concluded that it is. In this study, to overcome the limitation of previous research that judged users' sentiment only by the quantitative rating score, reviews were quantified so that users' opinions could be reflected in the recommendation system in a more refined way, improving accuracy.
The findings of this study are expected to have managerial implications for recommendation system developers, who need to consider both quantitative and qualitative information. The method of constructing the combined system in this paper could be used directly by developers.
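
The MAE and RMSE comparison described in this abstract can be sketched as follows. The rating lists are hypothetical and the names `baseline` and `combined` are illustrative labels for a ratings-only system and a sentiment-adjusted system; none of the numbers come from the study.

```python
import math

def mae(actual, predicted):
    """Mean absolute error."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def rmse(actual, predicted):
    """Root mean square error; penalises large misses more than MAE does."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))

# Hypothetical held-out ratings and two sets of predictions.
actual = [4.0, 3.0, 5.0, 2.0]
baseline = [3.0, 3.0, 4.0, 4.0]   # ratings-only predictions (illustrative)
combined = [3.5, 3.0, 4.5, 3.0]   # sentiment-adjusted predictions (illustrative)

mae_improvement = mae(actual, baseline) - mae(actual, combined)
rmse_improvement = rmse(actual, baseline) - rmse(actual, combined)
```

A positive improvement means the second system has the lower error, which is the sense in which the abstract reports gains for UBCF, IBCF, SVD, and SVD++.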

Development of Artificial Intelligence Model for Predicting Citrus Sugar Content based on Meteorological Data (기상 데이터 기반 감귤 당도 예측 인공지능 모델 개발)

  • Seo, Dongmin
    • The Journal of the Korea Contents Association / v.21 no.6 / pp.35-43 / 2021
  • Citrus quality is generally determined by its sugar content and acidity. Sugar content is a particularly important factor because it determines the taste of citrus. Currently, the most common ways to measure citrus sugar content on farms are a portable juice-based sugar meter and a non-destructive sugar meter. These can easily be used by individuals, but their accuracy is inferior to that of the official NongHyup citrus machine. In particular, the error is 0.5 Brix or more, which is still insufficient for field use. Therefore, in this paper, we propose an AI model that predicts the citrus sugar content of unmeasured days within an error range of 0.5 Brix or less, based on previously collected citrus sugar content and meteorological data (average temperature, humidity, rainfall, solar radiation, and average wind speed). Performance evaluation confirmed that the proposed prediction model had a mean absolute error of 0.1154 for the Seongsan area and 0.1983 for the Hawon area on Jeju Island. Lastly, because the proposed model keeps the error below 0.5 Brix and supports predictive measurement, it is expected to be highly useful.

Application of Integrated Modelling Framework Consisted of Delft3D and HABITAT for Habitat Suitability Assessment (생물서식지 적합성 평가를 위한 Delft3D와 HABITAT 모델의 연계 적용)

  • Lim, Hyejung;Na, Eun Hye;Jeon, Hyeong Cheol;Song, Hojin;Yoo, Hojun;Hwang, Soon Hong;Ryu, Hui-Seong
    • Journal of Korean Society on Water Environment / v.37 no.3 / pp.217-228 / 2021
  • This paper discusses a methodology in which an integrated modelling framework is used to quantify the risk that anthropic activities pose to habitats and species. For this purpose, a tool comprising the Delft3D and HABITAT models was applied to the Yeongsan River. Delft3D effectively simulated the operational conditions and flow of weirs in the river. The accuracy of Delft3D-FLOW was evaluated using the Bias, Pbias, Mean Absolute Error (MAE), Nash-Sutcliffe Efficiency (NSE), and Index of Agreement (IOA), and the results were rated 'Satisfactory' or better. HABITAT calculated the Habitat Suitability Value (HSV) for eight species of mammals, fish, aquatic plants, and benthic macroinvertebrates. An area was defined as a suitable habitat if the HSV was larger than 0.5. The accuracy of HABITAT was assessed using the Correct Classification Rate (CCR) and the area under the ROC curve (AUC). For benthic macroinvertebrates, the CCR and AUC were 77% and 0.834, respectively, at thresholds of 0.017 for HSV and 4 inds/m2 for individuals per unit area. This means that the HABITAT model correctly predicted the presence of benthic macroinvertebrates in approximately 77% of cases and that the probability of false alarms was also very low. The habitat suitability evaluation predicted that, if the annual "lowest level" (Seungchon weir: 2.5 EL.m / Juksan weir: -1.35 EL.m) were maintained in the Yeongsan River, habitat would improve by an average of 6.5 percentage points compared to the 'reference' scenario. Consequently, it was demonstrated that the integrated modelling framework for habitat suitability assessment can support remedial aquatic ecological management.
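
Three of the flow-accuracy indices named in this abstract can be sketched with their commonly used definitions. The sign convention for Pbias varies between references, so the one used here (observed minus simulated) is an assumption, and the series are hypothetical rather than data from the study.

```python
def nse(obs, sim):
    """Nash-Sutcliffe Efficiency: 1 is a perfect fit, 0 matches the observed mean."""
    mean_obs = sum(obs) / len(obs)
    num = sum((o - s) ** 2 for o, s in zip(obs, sim))
    den = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - num / den

def pbias(obs, sim):
    """Percent bias; with this (assumed) sign, negative means overestimation."""
    return 100.0 * sum(o - s for o, s in zip(obs, sim)) / sum(obs)

def ioa(obs, sim):
    """Willmott's Index of Agreement, bounded between 0 and 1."""
    mean_obs = sum(obs) / len(obs)
    num = sum((o - s) ** 2 for o, s in zip(obs, sim))
    den = sum((abs(s - mean_obs) + abs(o - mean_obs)) ** 2 for o, s in zip(obs, sim))
    return 1.0 - num / den

# Hypothetical observed and simulated water levels, for illustration only.
observed = [1.0, 2.0, 3.0, 4.0]
simulated = [1.0, 2.0, 3.0, 5.0]
scores = (nse(observed, simulated), pbias(observed, simulated), ioa(observed, simulated))
```

Using several indices together is useful because NSE and IOA reward matching the variability of the observations, while Pbias isolates systematic over- or underestimation.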

Simultaneous Estimation of State of Charge and Capacity using Extended Kalman Filter in Battery Systems (확장칼만필터를 활용한 배터리 시스템에서의 State of Charge와 용량 동시 추정)

  • Mun, Yejin;Kim, Namhoon;Ryu, Jihoon;Lee, Kyungmin;Lee, Jonghyeok;Cho, Wonhee;Kim, Yeonsoo
    • Korean Chemical Engineering Research / v.60 no.3 / pp.363-370 / 2022
  • In this paper, an estimation algorithm for state of charge (SOC) was applied using an equivalent circuit model (ECM) and an Extended Kalman Filter (EKF) to improve the estimation accuracy of the battery system states. In particular, an observer was designed to estimate the SOC along with the aged capacity. For a fresh battery, when the SOC was estimated by the Kalman Filter (KF), the mean absolute percentage error (MAPE) was 0.27%, smaller than the MAPE of 1.43% obtained when the SOC was calculated by the model without the observer. In the driving mode of the vehicle, the general KF or EKF algorithm cannot be used to estimate both SOC and capacity. Considering that battery aging does not occur over a short period of time, a strategy of periodically estimating the battery capacity during charging was proposed. In the charging mode, since the current is fixed over some intervals, a strategy for estimating the capacity along with the SOC in this situation was suggested. When the current was fixed, the MAPE of SOC estimation was 0.54%, and the MAPE of capacity estimation was 2.24%. Since the current is fixed when charging, it is feasible to estimate the battery capacity and SOC simultaneously using the general EKF. This method can be used to periodically correct the battery capacity when charging the battery. When driving, the SOC can then be estimated using the EKF with the corrected capacity.
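
To make the EKF idea concrete, here is a minimal one-state sketch that estimates SOC only, with the capacity treated as known. It is not the paper's observer: the open-circuit voltage curve `ocv`, the internal resistance, the noise variances, and the function name `ekf_soc_step` are all hypothetical assumptions chosen for illustration.

```python
def ocv(soc):
    """Hypothetical open-circuit voltage curve in volts (not from the paper)."""
    return 3.0 + 1.0 * soc + 0.2 * soc ** 2

def ocv_slope(soc):
    """Derivative of the hypothetical OCV curve with respect to SOC."""
    return 1.0 + 0.4 * soc

def ekf_soc_step(soc, p, current, voltage, dt, capacity_ah, r0=0.05, q=1e-7, r=1e-4):
    """One EKF step; current > 0 means discharge, r0/q/r are assumed values."""
    # Predict: coulomb counting with the known capacity.
    soc_pred = soc - current * dt / (3600.0 * capacity_ah)
    p_pred = p + q
    # Update: linearise the terminal-voltage model around the predicted SOC.
    h = ocv_slope(soc_pred)
    innovation = voltage - (ocv(soc_pred) - r0 * current)
    gain = p_pred * h / (h * p_pred * h + r)
    return soc_pred + gain * innovation, (1.0 - gain * h) * p_pred

# Simulate a constant 1 A discharge with a deliberately wrong initial estimate.
capacity_ah, current, dt = 2.0, 1.0, 1.0
true_soc, est_soc, p = 0.9, 0.5, 0.1
for _ in range(600):
    true_soc -= current * dt / (3600.0 * capacity_ah)
    voltage = ocv(true_soc) - 0.05 * current   # noise-free measurement
    est_soc, p = ekf_soc_step(est_soc, p, current, voltage, dt, capacity_ah)
```

In this toy run the voltage measurement pulls the estimate from 0.5 toward the true SOC within a few steps. Estimating capacity as well would mean adding it as a second state, which is the case the abstract restricts to fixed-current charging.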

Social Network-based Hybrid Collaborative Filtering using Genetic Algorithms (유전자 알고리즘을 활용한 소셜네트워크 기반 하이브리드 협업필터링)

  • Noh, Heeryong;Choi, Seulbi;Ahn, Hyunchul
    • Journal of Intelligence and Information Systems / v.23 no.2 / pp.19-38 / 2017
  • The collaborative filtering (CF) algorithm has been widely used for implementing recommender systems, and many prior studies have sought to improve its accuracy. Among them, some recent studies adopt a 'hybrid recommendation approach', which enhances the performance of conventional CF by using additional information. In this research, we propose a new hybrid recommender system that fuses CF with the results of social network analysis on trust and distrust relationship networks among users to enhance prediction accuracy. The proposed algorithm is based on memory-based CF. However, when calculating the similarity between users, it considers not only the correlation of the users' numeric rating patterns but also the users' in-degree centrality values derived from the trust and distrust relationship networks. Specifically, it is designed to amplify the similarity between a target user and a neighbor when the neighbor has higher in-degree centrality in the trust relationship network, and to attenuate that similarity when the neighbor has higher in-degree centrality in the distrust relationship network. The proposed algorithm considers four types of user relationships in total: direct trust, indirect trust, direct distrust, and indirect distrust. It uses four adjusting coefficients, which set the level of amplification or attenuation for the in-degree centrality values derived from the direct and indirect trust and distrust relationship networks. To determine the optimal adjusting coefficients, a genetic algorithm (GA) was adopted. Accordingly, we named the proposed algorithm SNACF-GA (Social Network Analysis-based CF using GA). To validate the performance of SNACF-GA, we used a real-world data set called the 'Extended Epinions dataset' provided by 'trustlet.org'.
This data set contains user responses (rating scores and reviews) after purchasing specific items (e.g. car, movie, music, book) as well as trust / distrust relationship information indicating whom each user trusts or distrusts. The experimental system was developed mainly in Microsoft Visual Basic for Applications (VBA), but we also used UCINET 6 to calculate the in-degree centrality of the trust / distrust relationship networks. In addition, we used Palisade Software's Evolver, a commercial package that implements genetic algorithms. To examine the effectiveness of the proposed system more precisely, we adopted two comparison models. The first comparison model is conventional CF. It uses only users' explicit numeric ratings when calculating the similarities between users; that is, it does not consider trust / distrust relationships between users at all. The second comparison model is SNACF (Social Network Analysis-based CF). SNACF differs from the proposed SNACF-GA in that it considers only direct trust / distrust relationships and does not use GA optimization. The performance of the proposed algorithm and the comparison models was evaluated using the average MAE (mean absolute error). The experimental results showed that the optimal adjusting coefficients for direct trust, indirect trust, direct distrust, and indirect distrust were 0, 1.4287, 1.5, and 0.4615, respectively. This implies that distrust relationships between users are more important than trust relationships in recommender systems. From the perspective of recommendation accuracy, SNACF-GA (Avg. MAE = 0.111943), the proposed algorithm that reflects both direct and indirect trust / distrust relationship information, was found to greatly outperform conventional CF (Avg. MAE = 0.112638). It also showed better recommendation accuracy than SNACF (Avg. MAE = 0.112209). To confirm whether these differences are statistically significant, we applied a paired samples t-test.
The results of the paired samples t-test showed that the difference between SNACF-GA and conventional CF was statistically significant at the 1% significance level, and the difference between SNACF-GA and SNACF was statistically significant at the 5% level. Our study found that trust / distrust relationships can be important information for improving the performance of recommendation algorithms. In particular, distrust relationship information was found to have a greater impact on the performance improvement of CF. This implies that more attention should be paid to distrust (negative) relationships than to trust (positive) ones when tracking and managing social relationships between users.
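
The paired samples t-test used to compare the models can be sketched as below. The per-fold MAE values are hypothetical and are not the study's measurements; the function name `paired_t_statistic` is likewise illustrative.

```python
import math

def paired_t_statistic(errors_a, errors_b):
    """t statistic and degrees of freedom for paired differences a - b."""
    diffs = [a - b for a, b in zip(errors_a, errors_b)]
    n = len(diffs)
    mean_d = sum(diffs) / n
    var_d = sum((d - mean_d) ** 2 for d in diffs) / (n - 1)   # sample variance
    return mean_d / math.sqrt(var_d / n), n - 1

# Hypothetical per-fold MAE values for a baseline and a proposed model.
mae_baseline = [0.30, 0.32, 0.31, 0.35, 0.33]
mae_proposed = [0.28, 0.29, 0.30, 0.31, 0.30]
t_stat, dof = paired_t_statistic(mae_baseline, mae_proposed)
```

With 4 degrees of freedom the two-sided 5% critical value is about 2.776, so a statistic above that would be read as a significant difference. Pairing matters because both models are scored on the same test cases, so the test works on per-case differences rather than on two independent samples.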