• Title/Summary/Keyword: Improvement of prediction performance


Scalable Collaborative Filtering Technique based on Adaptive Clustering (적응형 군집화 기반 확장 용이한 협업 필터링 기법)

  • Lee, O-Joun;Hong, Min-Sung;Lee, Won-Jin;Lee, Jae-Dong
• Journal of Intelligence and Information Systems / v.20 no.2 / pp.73-92 / 2014
  • An Adaptive Clustering-based Collaborative Filtering Technique was proposed to solve the fundamental problems of collaborative filtering, such as the cold-start, scalability, and data sparsity problems. Previous collaborative filtering techniques made recommendations based on a user's predicted preference for a particular item, using a similar-item subset and a similar-user subset composed from users' preferences for items. For this reason, if the density of the user preference matrix is low, the reliability of the recommendation system decreases rapidly, and creating a similar-item subset and a similar-user subset becomes more difficult. In addition, as the scale of the service increases, the time needed to create these subsets grows geometrically, and the response time of the recommendation system increases. To solve these problems, this paper suggests a collaborative filtering technique that actively adapts its conditions to the model and adopts the concepts of context-based filtering. The technique consists of four major methodologies. First, items and users are clustered according to their feature vectors, and an inter-cluster preference between each item cluster and user cluster is then estimated. With this method, the run-time for creating a similar-item or similar-user subset can be reduced, the reliability of the recommendation system can be made higher than when only user preference information is used to create these subsets, and the cold-start problem can be partially solved. Second, recommendations are made using the previously composed item and user clusters and the inter-cluster preferences between them. In this phase, a list of items is made for a user by examining the item clusters in decreasing order of the inter-cluster preference of the user's cluster, then selecting and ranking items according to predicted or recorded user preference information. With this method, the model-creation phase bears the highest load of the recommendation system, which minimizes the load at run-time; thus the scalability problem is mitigated, and a large-scale recommendation system can run collaborative filtering with high reliability. Third, missing user preference information is predicted using the item and user clusters, which mitigates the problem caused by the low density of the user preference matrix. Existing studies used either item-based or user-based prediction. This paper improves on Hao Ji's idea, which uses both item-based and user-based prediction: the reliability of the recommendation service can be improved by combining the predicted values of both techniques according to the condition of the recommendation model. By predicting user preferences based on the item or user clusters, the time required for prediction can be reduced, and missing user preferences can be predicted at run-time. Fourth, the item and user feature vectors are updated by learning from subsequent user feedback. This phase applies normalized user feedback to the item and user feature vectors.
This method can mitigate the problems caused by adopting the concepts of context-based filtering, such as basing the item and user feature vectors on the user profile and item properties. The problems with using item and user feature vectors stem from the difficulty of quantifying the qualitative features of items and users. Therefore, the elements of the user and item feature vectors are matched one to one, and when user feedback on a particular item is obtained, it is applied to one feature vector through the other. This method was verified by comparing its performance with existing hybrid filtering techniques using two measures: MAE (Mean Absolute Error) and response time. By MAE, the technique was confirmed to improve the reliability of the recommendation system; by response time, it was found to be suitable for a large-scale recommendation system. This paper suggested an Adaptive Clustering-based Collaborative Filtering Technique with high reliability and low time complexity, but it has some limitations. The technique focused on reducing time complexity, so an improvement in reliability was not expected. Future work will improve the technique with rule-based filtering.
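
A minimal sketch of the cluster-based prediction idea described above, assuming k-means clustering of synthetic user/item feature vectors and a fixed-weight blend of the user-cluster and item-cluster estimates (the paper selects the combination according to the model's condition; all names and data here are illustrative):

```python
import numpy as np
from sklearn.cluster import KMeans

# Illustrative data: a sparse rating matrix (0 = missing) plus the kind of
# user/item feature vectors the technique clusters on.
rng = np.random.default_rng(0)
R = rng.integers(0, 6, size=(200, 50)).astype(float)
user_feat = rng.random((200, 8))
item_feat = rng.random((50, 8))

u_lab = KMeans(n_clusters=10, n_init=10, random_state=0).fit_predict(user_feat)
i_lab = KMeans(n_clusters=5, n_init=10, random_state=0).fit_predict(item_feat)

def predict(u, i, alpha=0.5):
    """Blend a user-cluster estimate and an item-cluster estimate for (u, i).
    A fixed alpha is used here purely for illustration."""
    peers = R[u_lab == u_lab[u], i]      # ratings of item i by u's cluster
    own = R[u, i_lab == i_lab[i]]        # u's ratings within i's cluster
    fallback = R[R > 0].mean()           # global mean if a side is empty
    ub = peers[peers > 0].mean() if (peers > 0).any() else fallback
    ib = own[own > 0].mean() if (own > 0).any() else fallback
    return alpha * ub + (1 - alpha) * ib

print(predict(3, 7))
```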

Machine learning-based corporate default risk prediction model verification and policy recommendation: Focusing on improvement through stacking ensemble model (머신러닝 기반 기업부도위험 예측모델 검증 및 정책적 제언: 스태킹 앙상블 모델을 통한 개선을 중심으로)

  • Eom, Haneul;Kim, Jaeseong;Choi, Sangok
• Journal of Intelligence and Information Systems / v.26 no.2 / pp.105-129 / 2020
  • This study uses corporate data from 2012 to 2018, when K-IFRS was applied in earnest, to predict default risks. The data used in the analysis totaled 10,545 rows and 160 columns, including 38 from the statement of financial position, 26 from the statement of comprehensive income, 11 from the statement of cash flows, and 76 financial-ratio indices. Unlike most prior studies, which used the default event itself as the basis for learning about default risk, this study calculated default risk from each company's market capitalization and stock price volatility based on the Merton model. This solves the data imbalance problem caused by the scarcity of default events, which had been pointed out as a limitation of the existing methodology, and reflects the differences in default risk that exist among ordinary companies. Because learning was conducted using only corporate information that is also available for unlisted companies, the default risks of unlisted companies without stock price information can be appropriately derived. This enables stable default risk assessment services for unlisted companies, such as small and medium-sized companies and startups, whose default risk is difficult to determine with traditional credit rating models. Although predicting corporate default risk with machine learning has recently been studied actively, model bias issues exist because most studies make predictions based on a single model. A stable and reliable valuation methodology is required for calculating default risk, given that an entity's default risk information is very widely used in the market and sensitivity to differences in default risk is high. Strict standards are also required for the calculation methods: the credit rating method stipulated by the Financial Services Commission in the Financial Investment Regulations calls for the preparation of evaluation methods, including verification of their adequacy, in consideration of past statistical data and experience on credit ratings and changes in future market conditions. This study reduced individual models' bias by using stacking ensemble techniques that synthesize various machine learning models. This captures complex nonlinear relationships between default risk and various corporate information, and maximizes the advantages of machine-learning-based default risk prediction models, which take less time to calculate. To produce the forecasts used as input data for the Stacking Ensemble model, the training data were divided into seven pieces, and the sub-models were trained on the divided sets to produce forecasts. To compare predictive power, Random Forest, MLP, and CNN models were trained on the full training data, and the predictive power of each model was verified on the test set. The analysis showed that the Stacking Ensemble model exceeded the predictive power of the Random Forest model, which had the best performance among the single models. Next, to check for statistically significant differences between the Stacking Ensemble model and each individual model, pairs between the Stacking Ensemble model and each individual model were constructed.
Because the Shapiro-Wilk normality test showed that none of the pairs followed a normal distribution, the nonparametric Wilcoxon rank sum test was used to check whether the two models' forecasts in each pair showed statistically significant differences. The analysis showed that the forecasts of the Stacking Ensemble model differed significantly from those of the MLP model and the CNN model. In addition, this study provides a methodology that allows existing credit rating agencies to apply machine-learning-based default risk prediction methodologies, given that traditional credit rating models can also be included as sub-models when calculating the final default probability. The Stacking Ensemble technique proposed in this study can also help designs meet the requirements of the Financial Investment Business Regulations through the combination of various sub-models. We hope that this research will be used as a resource to increase practical use by overcoming and improving the limitations of existing machine-learning-based models.
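
A hedged sketch of the stacking setup described above, using scikit-learn's StackingRegressor with seven folds to generate out-of-fold sub-model forecasts and a rank-sum comparison against a single Random Forest; the CNN sub-model is omitted for brevity and all data are synthetic stand-ins:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor, StackingRegressor
from sklearn.neural_network import MLPRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from scipy.stats import ranksums

rng = np.random.default_rng(42)
X = rng.random((1000, 20))      # stand-in for the financial feature columns
y = rng.random(1000)            # stand-in for Merton-model default risk
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

stack = StackingRegressor(
    estimators=[("rf", RandomForestRegressor(random_state=0)),
                ("mlp", MLPRegressor(max_iter=500, random_state=0))],
    final_estimator=LinearRegression(),
    cv=7,                       # mirrors the paper's seven-way split
)
stack.fit(X_tr, y_tr)

rf = RandomForestRegressor(random_state=0).fit(X_tr, y_tr)
# Rank-sum test between stacked and single-model forecasts, as in the paper.
stat, p = ranksums(stack.predict(X_te), rf.predict(X_te))
print(f"rank-sum p-value: {p:.4f}")
```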

The Adaptive Personalization Method According to Users Purchasing Index : Application to Beverage Purchasing Predictions (고객별 구매빈도에 동적으로 적응하는 개인화 시스템 : 음료수 구매 예측에의 적용)

  • Park, Yoon-Joo
• Journal of Intelligence and Information Systems / v.17 no.4 / pp.95-108 / 2011
  • This is a study of a personalization method that intelligently adapts the level of clustering to a customer's purchasing index. In the e-biz era, many companies gather customers' demographic and transactional information, such as age, gender, purchasing date, and product category. They use this information to predict customers' preferences or purchasing patterns so that they can provide more customized services. The conventional Customer-Segmentation method provides customized services for each customer group: it clusters the whole customer set into groups based on similarity and builds a predictive model for each resulting group. Thus, it keeps the number of predictive models manageable and provides more data for customers who do not have enough data of their own by borrowing the data of similar customers. However, this method often fails to provide highly personalized services to each customer, which is especially important for VIP customers. Furthermore, it clusters customers who already have a considerable amount of data together with customers who have only a little, which increases computational cost unnecessarily without significant performance improvement. The other conventional method, the 1-to-1 method, provides more customized services than the Customer-Segmentation method because each predictive model is built using only the data of one individual customer. This method not only provides highly personalized services but also builds relatively simple and less costly models that satisfy each customer. However, the 1-to-1 method does not produce a good predictive model when a customer has only a small amount of data; in other words, if a customer has an insufficient number of transactions, the performance of this method deteriorates. To overcome the limitations of these two conventional methods, we suggest a new method, called the Intelligent Customer Segmentation method, that provides adaptively personalized services according to the customer's purchasing index. The suggested method clusters customers according to their purchasing index, so that predictions for customers who purchase less are based on data from more intensively clustered groups, while VIP customers, who already have a considerable amount of data, are clustered to a much lesser extent or not at all. The main idea is to apply the clustering technique when the number of transactions of the target customer is less than a predefined criterion data size. To find this criterion, we suggest an algorithm called sliding window correlation analysis, which aims to find the transactional data size at which the performance of the 1-to-1 method decreases radically due to data sparsity. After finding this criterion data size, we apply the conventional 1-to-1 method to customers who have more data than the criterion, and apply the clustering technique to those who have less, until they can use at least the criterion amount of data in the model building process. We apply the two conventional methods and the newly suggested method to Nielsen's beverage purchasing data to predict customers' purchasing amounts and purchasing categories.
We use two data mining techniques (Support Vector Machine and Linear Regression) and two performance measures (MAE and RMSE) to predict the two dependent variables mentioned above. The results show that the suggested Intelligent Customer Segmentation method outperforms the conventional 1-to-1 method in many cases and produces the same level of performance as the Customer-Segmentation method at a much lower computational cost.
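
The adaptive rule can be sketched as follows, assuming a hypothetical `criterion` size (which the paper finds via sliding window correlation analysis) and synthetic transaction data; this illustrates the decision logic, not the study's implementation:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.svm import SVR

rng = np.random.default_rng(1)
criterion = 30                         # stand-in for the criterion data size
profiles = rng.random((100, 5))        # illustrative customer profiles
# Per-customer transaction rows: feature columns plus a target column.
tx = {c: rng.random((rng.integers(5, 80), 6)) for c in range(100)}
labels = KMeans(n_clusters=10, n_init=10, random_state=0).fit_predict(profiles)

def build_model(c):
    """1-to-1 model for data-rich customers; cluster-pooled model otherwise."""
    rows = tx[c]
    if len(rows) < criterion:                    # sparse customer:
        peers = [p for p in tx if labels[p] == labels[c]]
        rows = np.vstack([tx[p] for p in peers]) # pool the cluster's data
    X, y = rows[:, :-1], rows[:, -1]
    return SVR().fit(X, y)

model = build_model(3)
print(model.predict(tx[3][:1, :-1]))
```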

Accelerated Degradation Test and Failure Analysis of Rapid Curing Epoxy Resin for Restoration of Cultural Heritage (문화재 복원용 속(速)경화형 Epoxy계 수지의 가속열화시험 및 고장분석 연구)

  • Nam, Byeong Jik;Jang, Sung Yoon
• Journal of Conservation Science / v.33 no.6 / pp.467-483 / 2017
  • In this study, the degradation properties under temperature stress of Araldite® rapid-curing epoxy resin, used for inorganic cultural heritage, were identified. Tensile strength and tensile shear strength decreased over 12,624 hours at temperatures of 40~60°C. In terms of stability against external stress and temperature, the slow-curing epoxy was superior to the rapid-curing epoxy, so cultural heritage conservation plans should consider the strength and stress properties of restoration materials. Color differences increased over 12,624 hours at temperatures of 40~60°C, and glossiness decreased. Both color and gloss stability were weak, which necessitates improvement of the optical properties. The thermal properties of adhesives (weight loss, decomposition temperature, and glass transition temperature) are linked to their mechanical properties, and the interfacial properties of the adherend and the water vapor transmission rates of adhesives are linked to performance variation. For porous media (ceramics, brick, and stone), isothermal and isohumid environments are important, and for artifacts displayed outdoors or in museums, changes in physical properties caused by exposure to varying environmental conditions need to be minimized. These results can be used as baseline data in the study of the degradation velocity and lifetime prediction of rapid-curing epoxy resin for the restoration of cultural heritage.

Social Network-based Hybrid Collaborative Filtering using Genetic Algorithms (유전자 알고리즘을 활용한 소셜네트워크 기반 하이브리드 협업필터링)

  • Noh, Heeryong;Choi, Seulbi;Ahn, Hyunchul
• Journal of Intelligence and Information Systems / v.23 no.2 / pp.19-38 / 2017
  • The collaborative filtering (CF) algorithm has been popularly used for implementing recommender systems, and many prior studies have sought to improve its accuracy. Among them, some recent studies adopt a 'hybrid recommendation approach', which enhances the performance of conventional CF by using additional information. In this research, we propose a new hybrid recommender system which fuses CF with the results of social network analysis on trust and distrust relationship networks among users to enhance prediction accuracy. The proposed algorithm is based on memory-based CF, but when calculating the similarity between users, it considers not only the correlation of the users' numeric rating patterns but also the users' in-degree centrality values derived from the trust and distrust relationship networks. Specifically, it is designed to amplify the similarity between a target user and a neighbor when the neighbor has higher in-degree centrality in the trust relationship network, and to attenuate the similarity when the neighbor has higher in-degree centrality in the distrust relationship network. The algorithm considers four types of user relationships in total - direct trust, indirect trust, direct distrust, and indirect distrust - and uses four adjusting coefficients, which adjust the level of amplification or attenuation for the in-degree centrality values derived from the direct and indirect trust and distrust relationship networks. To determine the optimal adjusting coefficients, a genetic algorithm (GA) was adopted. Against this background, we named our proposed algorithm SNACF-GA (Social Network Analysis-based CF using GA). To validate its performance, we used a real-world data set called the 'Extended Epinions dataset' provided by trustlet.org. This data set contains user responses (rating scores and reviews) after purchasing specific items (e.g., cars, movies, music, books) as well as trust/distrust relationship information indicating whom users trust or distrust. The experimental system was developed mainly in Microsoft Visual Basic for Applications (VBA), with UCINET 6 used to calculate the in-degree centrality of the trust/distrust relationship networks and Palisade Software's Evolver, a commercial implementation of genetic algorithms, used for optimization. To examine the effectiveness of our proposed system more precisely, we adopted two comparison models. The first is conventional CF, which uses only users' explicit numeric ratings when calculating the similarities between users and does not consider trust/distrust relationships at all. The second is SNACF (Social Network Analysis-based CF), which differs from SNACF-GA in that it considers only direct trust/distrust relationships and does not use GA optimization. The performances of the proposed algorithm and the comparison models were evaluated using average MAE (mean absolute error). Experimental results showed that the optimal adjusting coefficients for direct trust, indirect trust, direct distrust, and indirect distrust were 0, 1.4287, 1.5, and 0.4615, respectively. This implies that distrust relationships between users are more important than trust relationships in recommender systems. From the perspective of recommendation accuracy, SNACF-GA (avg. MAE = 0.111943), which reflects both direct and indirect trust/distrust relationship information, was found to outperform conventional CF (avg. MAE = 0.112638) and also showed better recommendation accuracy than SNACF (avg. MAE = 0.112209). To confirm whether these differences are statistically significant, we applied a paired samples t-test. The results showed that the difference between SNACF-GA and conventional CF was statistically significant at the 1% significance level, and the difference between SNACF-GA and SNACF at the 5% level. Our study found that trust/distrust relationships can be important information for improving the performance of recommendation algorithms. In particular, distrust relationship information was found to have a greater impact on the performance improvement of CF, which implies that more attention should be paid to distrust (negative) relationships than to trust (positive) ones when tracking and managing social relationships between users.
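
One plausible reading of the amplification/attenuation scheme, sketched with illustrative names; the exact functional form is not given in the abstract, so the multiplicative adjustment below is an assumption, while the coefficient values are the optima reported above:

```python
import numpy as np

def adjusted_similarity(r_u, r_v, cent, coef):
    """Scale rating-pattern similarity by the neighbor's in-degree
    centralities in the trust/distrust networks. `cent` holds neighbor v's
    centralities; `coef` holds the GA-tuned adjusting coefficients."""
    base = np.corrcoef(r_u, r_v)[0, 1]           # numeric rating correlation
    amplify = (1 + coef["direct_trust"] * cent["direct_trust"]
                 + coef["indirect_trust"] * cent["indirect_trust"])
    attenuate = (1 + coef["direct_distrust"] * cent["direct_distrust"]
                   + coef["indirect_distrust"] * cent["indirect_distrust"])
    return base * amplify / attenuate

coef = {"direct_trust": 0.0, "indirect_trust": 1.4287,
        "direct_distrust": 1.5, "indirect_distrust": 0.4615}  # from the paper
cent = {"direct_trust": 0.2, "indirect_trust": 0.1,
        "direct_distrust": 0.05, "indirect_distrust": 0.0}    # example values
r_u = np.array([5, 3, 4, 2])
r_v = np.array([4, 3, 5, 1])
print(adjusted_similarity(r_u, r_v, cent, coef))
```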

A Study on Particulate Matter Forecasting Improvement by using Asian Dust Emissions in East Asia (황사배출량을 적용한 동아시아 미세먼지 예보 개선 연구)

  • Choi, Daeryun;Yun, Huiyoung;Chang, Limseok;Lee, Jaebum;Lee, Younghee;Myoung, Jisu;Kim, Taehee;Koo, Younseo
• Journal of the Korean Society of Urban Environment / v.18 no.4 / pp.531-546 / 2018
  • An air quality forecasting system with Asian dust emissions was developed for East Asia, and the PM10 forecasting performance of the chemical transport model (CTM) with Asian dust emissions was validated and evaluated. The CTM with Asian dust emissions was found to supplement the PM10 concentrations that had been underestimated in Chinese regions and improved the performance statistics of the CTM, although the model overestimated concentrations during some periods in China. In Korea, the prediction model adequately simulated the inflow of the Asian dust events of February 22~24 and March 16~17, but it overestimated concentrations during periods without Asian dust events in April. Nevertheless, the model supplemented the PM10 concentrations that had been underestimated in most regions of Korea, and the performance statistics of the model improved. Compared with the basic model without Asian dust emissions, the forecasting model with Asian dust emissions tended to improve POD (Probability of Detection), while A (Accuracy) remained similar or decreased and FAR (False Alarms) increased during 2017. Therefore, the developed air quality forecasting model with Asian dust emissions was not proposed as a representative PM10 forecast model for South Korea.
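
For reference, the three categorical scores named above can be computed from a 2x2 forecast/observation contingency table; the sketch below uses one common definition of FAR (false alarms over predicted events) and made-up counts:

```python
def forecast_scores(hits, misses, false_alarms, correct_negatives):
    """POD, FAR, and Accuracy from event forecast/observation counts."""
    n = hits + misses + false_alarms + correct_negatives
    pod = hits / (hits + misses)                # Probability of Detection
    far = false_alarms / (hits + false_alarms)  # False Alarm Ratio
    acc = (hits + correct_negatives) / n        # Accuracy
    return pod, far, acc

# Placeholder counts, not the study's verification data.
print(forecast_scores(hits=42, misses=8, false_alarms=15, correct_negatives=300))
```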

Modeling of heat efficiency of hot stove based on neural network using feature extraction (특성 추출과 신경회로망을 이용한 열 풍로 열효율에 대한 모델링)

  • Min Kwang Gi;Choi Tae Hwa;Han Chong Hun;Chang Kun Soo
• Journal of the Korean Institute of Gas / v.2 no.4 / pp.60-66 / 1998
  • The hot stove system is a process that continuously and constantly generates the hot combustion air required for the blast furnace. The hot stove is considered a main energy-consuming process because it consumes about 20% of the total energy in steel making works, so many researchers have been interested in improving its heat efficiency to reduce energy consumption. However, improving the heat efficiency of the hot stove is difficult because there is no precise information on the heat transformation occurring during the heating period. To model the relationship between operating conditions and heat efficiency, we propose a neural network using feature extraction as an experimental modeling method. To show the performance of the model, we compare it with the Partial Least Squares (PLS) method; both methods share the use of a dimension-reduction technique. We then present simulation results on the prediction of the heat efficiency of the hot stove.
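
A rough sketch of the two modeling routes being compared, assuming PCA as the feature-extraction step and a small MLP as the neural network (the paper's actual extraction method and network architecture may differ; data are synthetic):

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.cross_decomposition import PLSRegression
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(0)
X = rng.random((300, 15))             # stand-in operating-condition variables
y = X[:, :3].sum(axis=1) + 0.1 * rng.standard_normal(300)  # heat efficiency

# Route 1: feature extraction (PCA) feeding a neural network.
nn = make_pipeline(PCA(n_components=5),
                   MLPRegressor(hidden_layer_sizes=(16,), max_iter=2000,
                                random_state=0)).fit(X, y)
# Route 2: PLS, which also reduces dimension before regression.
pls = PLSRegression(n_components=5).fit(X, y)

print(nn.score(X, y), pls.score(X, y))   # in-sample R^2 of each route
```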


Leg Fracture Recovery Monitoring Simulation using Dual T-type Defective Microstrip Patch Antenna (쌍 T-형 결함 마이크로스트립 패치 안테나를 활용한 다리 골절 회복 모니터링 모의실험)

  • Byung-Mun Kim;Lee-Ho Yun;Sang-Min Lee;Yeon-Taek Park;Jae-Pyo Hong
• The Journal of the Korea institute of electronic communication sciences / v.18 no.4 / pp.587-594 / 2023
  • In this paper, we present the design and optimization process of an on-body microstrip patch antenna with a paired T-type defect for monitoring fracture recovery in human legs. The antenna is designed to be light, thin, and compact while improving return loss and bandwidth performance by adjusting the size of the T-type defect. The structure of the human leg is modeled as a 5-layer dielectric plane, and the complex dielectric constant of each layer is calculated using the 4-pole Cole-Cole model parameters. In a normal case without bone fracture, the return loss of the on-body antenna is -66.71 dB at 4.0196 GHz, and the return loss difference ΔS11 is 37.95 dB when the callus layer has a length of 10.0 mm, width of 1.0 mm, and height of 2.0 mm. A third-degree polynomial is presented to predict the height of the callus layer from the change in return loss, and the polynomial shows very high predictive suitability, with RSS = 1.4751, R² = 0.9988246, and p-value = 0.0001841.
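
The reported polynomial prediction step might look like the following sketch, where the (ΔS11, callus height) pairs are invented for illustration; only the fitting procedure, not the data, reflects the paper, which reports RSS = 1.4751 and R² = 0.9988 for its own fit:

```python
import numpy as np

# Hypothetical calibration points: return-loss change (dB) vs. callus height (mm).
ds11 = np.array([5.0, 12.0, 20.0, 28.0, 33.0, 37.95])
height = np.array([0.2, 0.5, 1.0, 1.4, 1.8, 2.0])

coeffs = np.polyfit(ds11, height, deg=3)   # least-squares cubic fit
predict = np.poly1d(coeffs)
print(predict(30.0))                       # height estimate at dS11 = 30 dB

residuals = height - predict(ds11)
print("RSS:", np.sum(residuals ** 2))      # residual sum of squares of the fit
```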

Combining Bias-correction on Regional Climate Simulations and ENSO Signal for Water Management: Case Study for Tampa Bay, Florida, U.S. (ENSO 패턴에 대한 MM5 강수 모의 결과의 유역단위 성능 평가: 플로리다 템파 지역을 중심으로)

  • Hwang, Syewoon;Hernandez, Jose
• Korean Journal of Agricultural and Forest Meteorology / v.14 no.4 / pp.143-154 / 2012
  • As demand for water resources and attention to changes in climate (e.g., due to ENSO) increase, long- and short-term prediction of precipitation is becoming necessary for water planning. This research evaluated the ability of MM5 to predict precipitation in the Tampa Bay region over a 23-year period from 1986 to 2008. Additionally, the MM5 results were statistically bias-corrected against observation data at 33 stations over the study area using a CDF-mapping approach, and were evaluated against the raw results for each ENSO phase (i.e., El Niño and La Niña). The bias-corrected model results accurately reproduced the monthly mean point precipitation values. Areal average daily/monthly precipitation estimates obtained with a block-kriging algorithm showed fairly high accuracy, with a mean error of 0.8 mm for daily precipitation and 7.1 mm for monthly precipitation. Evaluation by ENSO phase showed that the accuracy of the model output varies with season and ENSO phase. Reasons for low prediction skill and alternatives for improving the simulations are discussed. A comprehensive evaluation, including sensitivity to physics schemes, boundary-condition reanalysis products, and updated land use maps, is suggested to enhance model performance. We believe the outcome of this research will guide better implementation of regional climate modeling tools in water management at the regional/seasonal scale.
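
A minimal sketch of CDF (quantile) mapping as commonly implemented; the paper's exact variant may differ, and all precipitation values below are synthetic:

```python
import numpy as np

def cdf_map(model_vals, obs_vals, new_vals):
    """Replace each raw model value with the observed value at the same
    empirical quantile, correcting the model's distributional bias."""
    q = np.linspace(0, 1, 101)
    model_q = np.quantile(model_vals, q)     # model empirical CDF
    obs_q = np.quantile(obs_vals, q)         # observed empirical CDF
    ranks = np.interp(new_vals, model_q, q)  # quantile of each new value
    return np.interp(ranks, q, obs_q)        # map onto observed distribution

rng = np.random.default_rng(0)
model = rng.gamma(2.0, 3.0, 5000)            # biased simulated precipitation
obs = rng.gamma(2.0, 2.0, 5000)              # station observations
print(cdf_map(model, obs, np.array([5.0, 10.0, 20.0])))
```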

Building a Traffic Accident Frequency Prediction Model at Unsignalized Intersections in Urban Areas by Using Adaptive Neuro-Fuzzy Inference System (적응 뉴로-퍼지를 이용한 도시부 비신호교차로 교통사고예측모형 구축)

  • Kim, Kyung Whan;Kang, Jung Hyun;Kang, Jong Ho
• KSCE Journal of Civil and Environmental Engineering Research / v.32 no.2D / pp.137-145 / 2012
  • According to the National Police Agency, the total number of traffic accidents in 2010 was 226,878, and intersection accidents accounted for 44.8%, the largest portion. Research on signalized intersections is conducted constantly, while research on unsignalized intersections is still insufficient. This study selected traffic volume, road width, and sight distance, which affect unsignalized intersection accidents, as the input variables, and the number of accidents as the output variable, to build a model using ANFIS (Adaptive Neuro-Fuzzy Inference System). The forecasting performance of the model is evaluated by comparing measured values with forecasted values. Goodness of fit is evaluated by R², the coefficient of determination, along with Mean Absolute Error (MAE) and Mean Square Error (MSE), which represent the degree and spread of the error. The results show that R² is 0.9817, while MAE and MSE are 0.4773 and 0.3037, respectively, which means that the explanatory power of the model is quite good. This study is expected to provide basic data for establishing safety measures for unsignalized intersections and reducing traffic accidents.
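
The three goodness-of-fit measures reported above are straightforward to compute; a small sketch with placeholder values (not the study's data):

```python
import numpy as np

obs = np.array([2.0, 5.0, 1.0, 4.0, 3.0])    # observed accident counts
pred = np.array([2.3, 4.6, 1.2, 4.4, 2.7])   # ANFIS-style model forecasts

mae = np.mean(np.abs(obs - pred))            # Mean Absolute Error
mse = np.mean((obs - pred) ** 2)             # Mean Square Error
ss_res = np.sum((obs - pred) ** 2)
ss_tot = np.sum((obs - obs.mean()) ** 2)
r2 = 1 - ss_res / ss_tot                     # coefficient of determination
print(f"R^2={r2:.4f}, MAE={mae:.4f}, MSE={mse:.4f}")
```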