Predictive Clustering-based Collaborative Filtering Technique for Performance-Stability of Recommendation System (추천 시스템의 성능 안정성을 위한 예측적 군집화 기반 협업 필터링 기법)
- Journal of Intelligence and Information Systems, v.21 no.1, pp.119-142, 2015
With the explosive growth in the volume of information, Internet users are experiencing considerable difficulty in obtaining the information they need online. Against this backdrop, ever-greater importance is being placed on recommender systems that provide information catered to user preferences and tastes in an attempt to address information overload. To this end, a number of techniques have been proposed, including content-based filtering (CBF), demographic filtering (DF) and collaborative filtering (CF). Among them, CBF and DF require external information and thus cannot be applied to a variety of domains. CF, on the other hand, is widely used since it is relatively free from this domain constraint. The CF technique is broadly classified into memory-based CF, model-based CF and hybrid CF. Model-based CF addresses the drawbacks of CF by employing a Bayesian model, clustering model or dependency network model. This filtering technique not only mitigates the sparsity and scalability issues but also boosts predictive performance. However, it involves expensive model-building and results in a tradeoff between performance and scalability. Such a tradeoff is attributed to reduced coverage, which is a type of sparsity issue. In addition, expensive model-building may lead to performance instability, since changes in the domain environment cannot be immediately incorporated into the model due to the high costs involved. Cumulative changes in the domain environment that fail to be reflected eventually undermine system performance. This study incorporates a Markov model of transition probabilities and the concept of fuzzy clustering into clustering-based CF (CBCF) to propose predictive clustering-based CF (PCCF), which solves the issues of reduced coverage and unstable performance. The method improves performance stability by tracking changes in user preferences and bridging the gap between the static model and dynamic users.
Furthermore, the issue of reduced coverage is also mitigated by expanding the coverage based on transition probabilities and clustering probabilities. The proposed method consists of four processes. First, user preferences are normalized in preference clustering. Second, changes in user preferences are detected from review score entries during preference transition detection. Third, user propensities are normalized using patterns of change (propensities) in user preferences in propensity clustering. Lastly, the preference prediction model is built to predict user preferences for items during preference prediction. The proposed method was validated by testing its robustness against performance instability and the scalability-performance tradeoff. The initial test compared and analyzed the performance of individual recommender systems enabled by IBCF, CBCF, ICFEC and PCCF under an environment where data sparsity had been minimized. The following test adjusted the optimal number of clusters in CBCF, ICFEC and PCCF for a comparative analysis of the subsequent changes in system performance. The test results revealed that the suggested method produced an insignificant improvement in performance in comparison with the existing techniques, and it failed to achieve a significant improvement in the standard deviation, which indicates the degree of data fluctuation. Notwithstanding, it resulted in marked improvement over the existing techniques in terms of range, which indicates the level of performance fluctuation. The level of performance fluctuation before and after model generation improved by 51.31% in the initial test, and in the following test there was a 36.05% improvement in the level of performance fluctuation driven by the changes in the number of clusters. This signifies that the proposed method, despite the slight performance improvement, clearly offers better performance stability than the existing techniques.
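The preference-transition step can be sketched as estimating a first-order Markov transition matrix from each user's sequence of discretized preference states. This is a minimal illustration, not the paper's implementation; the state names and the `estimate_transitions` helper are assumptions:

```python
from collections import Counter, defaultdict

def estimate_transitions(rating_sequences, states=("low", "mid", "high")):
    """Estimate a first-order Markov transition matrix from per-user
    sequences of discretized preference states (hypothetical helper)."""
    counts = defaultdict(Counter)
    for seq in rating_sequences:
        # Count each observed state-to-state transition.
        for prev, nxt in zip(seq, seq[1:]):
            counts[prev][nxt] += 1
    # Normalize counts row-wise into probabilities.
    matrix = {}
    for s in states:
        total = sum(counts[s].values())
        matrix[s] = {t: (counts[s][t] / total if total else 0.0) for t in states}
    return matrix

# Two toy users whose discretized ratings drift upward over time.
seqs = [["low", "mid", "high", "high"], ["mid", "high", "high"]]
P = estimate_transitions(seqs)
```

Rows of `P` sum to 1 for every state that was observed, so the matrix can be used directly to predict the most likely next preference state for a user.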
Future research will be directed toward enhancing the recommendation performance, which failed to demonstrate significant improvement over the existing techniques, and will consider introducing a high-dimensional, parameter-free clustering algorithm or a deep learning-based model.
With the recent convergence of broadcasting and communication, communication capabilities have been joined to the TV, and TV viewing has changed considerably. IPTV (Internet Protocol Television) delivers information services, movie content, and broadcasts over the Internet, combining live programs with VOD (video on demand). Because it runs over communication networks, it has become a new business opportunity, and it raises new technical issues: imaging technology for the service, networking technology that avoids video interruptions, security technology to protect copyright, and so on. Through the IPTV network, users can watch the programs they want whenever they want. However, IPTV makes it difficult to find programs through either search or menu navigation. Menu navigation takes a long time to reach a desired program, and search fails when the title, genre, or actors' names are unknown; entering characters with a remote control is also cumbersome. A bigger problem is that users are often unaware of the services available to them. To resolve these difficulties in selecting VOD services on IPTV, a personalized recommendation service is proposed, which raises user satisfaction and uses viewing time efficiently. This paper provides programs suited to individual users, saving their time, in order to address these shortcomings of IPTV through a filtering and recommendation system. The proposed recommendation system collects TV program information, the user's preferred genres and sub-genres, channels, watched programs, and viewing-time information from individual IPTV viewing records. To compute these similarities, an ontology for TV programs is used, since the distance between programs can be measured by similarity comparison. The TV program ontology used here is extracted from TV-Anytime metadata, which captures semantic attributes.
The ontology also expresses program contents and features numerically. Word similarity is determined through WordNet: all words describing a program are expanded into their hypernyms and hyponyms for the similarity decision, and the average similarity of the descriptive keywords is measured. Using the computed distances as the criterion, similar programs are grouped by the K-medoids partitioning method, which divides the data into groups with similar characteristics. K-medoids selects K representative objects; the distance from each representative object defines a tentative assignment, forming clusters. When the initial n objects are to be divided into K clusters, the optimal representative objects are found by repeated trials after an initial temporary selection, and through this process similar programs are clustered. When recommending programs selected by the cluster analysis, weights are assigned as follows. Each cluster recommends the programs nearest its representative object; the distance is computed with the same similarity measure and becomes the base score that determines the ranking of recommended programs. A weight computed from the number of programs in the user's watch list is then applied: the more programs a cluster contains, the higher the weight, which is defined as the cluster weight. Through this, the representative TV programs of each cluster are selected and the final ranking of TV programs is determined. However, the cluster-representative programs include errors, so weights reflecting TV program viewing preference are added to determine the final ranks, and contents are recommended to users on this basis. An experiment with the proposed method was carried out in a controlled environment.
Through the experiment, the superiority of the proposed method over existing approaches is demonstrated.
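The K-medoids grouping described above can be sketched as follows, assuming a precomputed pairwise program-distance matrix; this is a PAM-style toy implementation, not the authors' code:

```python
import random

def k_medoids(dist, k, iters=50, seed=0):
    """Tiny K-medoids sketch over a precomputed distance matrix.
    dist: 2D list of pairwise program distances; returns medoid
    indices and the final cluster assignment."""
    n = len(dist)
    medoids = random.Random(seed).sample(range(n), k)
    clusters = {}
    for _ in range(iters):
        # Assign each program to its nearest representative object.
        clusters = {m: [] for m in medoids}
        for i in range(n):
            clusters[min(medoids, key=lambda m: dist[i][m])].append(i)
        # Re-pick each medoid as the member minimizing total in-cluster distance.
        new_medoids = [min(members, key=lambda c: sum(dist[c][j] for j in members))
                       for members in clusters.values()]
        if set(new_medoids) == set(medoids):
            break  # converged: representatives no longer change
        medoids = new_medoids
    return medoids, clusters

# Two obvious groups: programs 0-1 are close, programs 2-3 are close.
D = [[0, 1, 9, 9],
     [1, 0, 9, 9],
     [9, 9, 0, 1],
     [9, 9, 1, 0]]
medoids, clusters = k_medoids(D, k=2)
```

In the paper's setting the distance matrix would come from the ontology-based similarity comparison; here it is a hand-made toy matrix.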
A model and a measure that can evaluate the risk of rear-end collision are developed. Most traffic accidents involve multiple causes, such as the human factor, the vehicle factor, and the highway element, at any given time, so these factors should all be considered in analyzing the risk of an accident and in developing safety models. Although most risky situations and accidents on the road result from a driver's poor response to various stimuli, many researchers have modeled risk or accidents by analyzing only the stimuli, without considering the driver's response; hence, the reliability of those models turned out to be low. Thus, in developing the model, driver behaviors such as reaction time and deceleration rate are considered. In the past, most studies tried to analyze the relationship between risk and accidents directly but, due to the difficulty of finding directional relationships between these factors, developed models from indirect factors such as volume and speed. However, if the relationship between risk and accidents is examined in detail, it can be seen that they are linked by driver behavior: depending on the driver, the risk present in the road-vehicle system may be ignored or may draw the driver's attention. An accident therefore depends on how the driver handles risk, so the quantity most related to accident occurrence is not the risk itself but the risk as responded to by the driver. Thus, in this study, driver behaviors are considered in the model, and three accident-related concepts are introduced to reflect them. Safe stopping distance and accident occurrence probability were used for better understanding and more reliable modeling of the risk.
An index that can represent the risk is also developed, based on measures used in evaluating noise levels; for risk comparison between various situations, the equivalent risk level, considering both intensity and duration, is developed by means of a weighted average. Validation was performed with field surveys on an expressway in Seoul, where a test vehicle was used to collect traffic flow data such as deceleration rate, speed and spacing. Based on these data, the risk by section, lane and traffic flow condition was evaluated and compared with accident data and traffic conditions. The evaluated risk level corresponds closely to the patterns of actual traffic conditions and accident counts. The model and the method developed in this study can be applied to various fields, such as safety testing of traffic flow, establishment of operation and management strategies for reliable traffic flow, and safety testing of control algorithms in advanced safety vehicles, among others.
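As a rough illustration of the noise-level analogy, a duration-weighted equivalent level can be computed in the same way as the equivalent continuous sound level L_eq. The exact weighting used in the study may differ, so this sketch is only an assumption:

```python
import math

def equivalent_risk_level(levels, durations):
    """Duration-weighted equivalent level, analogous to the equivalent
    continuous noise level L_eq = 10*log10((1/T) * sum t_i * 10^(L_i/10)).
    A sketch; the paper's actual weighting may differ."""
    total = sum(durations)
    # Convert each level to an "energy", weight by duration, average, convert back.
    energy = sum(t * 10 ** (L / 10.0) for L, t in zip(levels, durations))
    return 10.0 * math.log10(energy / total)
```

A constant level comes back unchanged regardless of durations, while any interval at a higher level pulls the equivalent level up, which is the behavior the intensity-and-duration weighting is meant to capture.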
In the utilization of optical satellite imagery, which is greatly affected by clouds, periodic compositing is a useful technique to minimize the influence of clouds. Recently, a technique was proposed for selecting, during periodic compositing, the optimal pixel least affected by cloud and shadow over a certain period by directly inputting cloud and cloud shadow information. Accurate extraction of clouds and cloud shadows is essential in order to derive optimal composite results. Also, for surface targets where spectral information is important, such as crops, the loss of spectral information should be minimized during cloud-free compositing. In this study, two spectral indices (Haze Optimized Transformation (HOT) and MeanVis) were used to derive a cloud and cloud shadow detection technique with low loss of spectral information that maintains high detection accuracy for cabbage fields in the highlands of Gangwon-do. The detection results were compared and analyzed against the cloud and cloud shadow information provided by Sentinel-2A/B. As a result of analyzing data from 2019 to 2021, cloud information from the Sentinel-2A/B satellites showed detection accuracy with an F1 value of 0.91, but bright artifacts were falsely detected as clouds. On the other hand, the cloud detection result obtained by applying a threshold (0.05) to HOT showed relatively low detection accuracy (F1 = 0.72), but the loss of spectral information was minimized due to the small number of false positives. In the case of cloud shadows, only minimal shadows were detected in the Sentinel-2A/B auxiliary layer, but when a threshold (0.015) was applied to MeanVis, cloud shadows that could be distinguished from topographically generated shadows could be detected.
By inputting the spectral-index-based cloud and shadow information, stable monthly cloud-free composited vegetation index results were obtained; in the future, high-accuracy cloud information from Sentinel-2A/B will be input to periodic cloud-free compositing for comparison.
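The per-pixel thresholding step can be illustrated as below. The comparison direction for MeanVis (shadows being darker in the visible bands, hence "below threshold") is an assumption, and the function name is hypothetical:

```python
def detect_cloud_shadow(hot, meanvis, hot_thr=0.05, vis_thr=0.015):
    """Per-pixel flags from the two spectral indices used in the study:
    HOT above its threshold marks cloud; MeanVis below its threshold
    marks candidate cloud shadow. Both inputs are 2D lists of index values."""
    cloud = [[v > hot_thr for v in row] for row in hot]
    shadow = [[v < vis_thr for v in row] for row in meanvis]
    return cloud, shadow

# Toy 1x2 scene: second pixel is hazy (high HOT) and dark (low MeanVis).
cloud, shadow = detect_cloud_shadow([[0.01, 0.20]], [[0.500, 0.001]])
```

In practice the resulting masks would be fed into the periodic compositing step so that flagged pixels are excluded from the optimal-pixel selection.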
Lifestyles are changing rapidly, and food consumption patterns vary widely among households as dietary habits and food processing technologies evolve. This paper reclassified the food groups of the consumer panel data established by the Rural Development Administration, which contains household-level information on agricultural product purchases, and compared the consumption characteristics of agricultural products by age group. For age classification, the panel was divided into a 60s-and-older group, with a prevalence of metabolic disease of 20% or more, and a 30s-40s group, with a prevalence below 10%. Using the LightGBM algorithm, we classified the differences in food consumption patterns between the 30s-40s group and the 60s-and-older group and obtained a precision of 0.85, a recall of 0.71, and an F1 score of 0.77. The most important variables were confectionery, leaf vegetables, seasoned vegetables, fruit vegetables, and marine products, and the top five variables by SHAP value were confectionery, marine products, seasoned vegetables, fruit vegetables, and leaf vegetables. Binarizing consumption at the median, rather than the mean, which is sensitive to outliers, showed that confectionery consumption in the 30s-40s group was more than twice that of the 60s-and-older group; the other variables also showed significant differences between the two groups. According to the study, people in their 30s and 40s consumed more than twice as much confectionery as those in their 60s, while those in their 60s consumed more than twice as much marine products, seasoned vegetables, fruit vegetables, and leaf vegetables as those in their 30s and 40s. Beyond the top five items, consumption of wheat-processed snacks, breads and noodles was high in the 30s-40s group, which differed from the food consumption patterns of those in their 60s.
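The reported precision (0.85) and recall (0.71) are consistent with the reported F1 score of 0.77, since F1 is their harmonic mean:

```python
def f1_score(precision, recall):
    """F1 score: the harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

# With the paper's values: f1_score(0.85, 0.71) rounds to 0.77.
```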
Ultrasound imaging uses high-frequency sound waves, which undergo physical interactions such as reflection, absorption, refraction, and transmission at the boundaries between different tissues. Improvement is needed because the data generated by ultrasound equipment contain a great deal of noise, and it is difficult to grasp the shape of the tissue actually being observed because the boundaries are indistinct. Edge enhancement is used to address cases where boundary surfaces appear blurred due to reduced image quality. In this paper, as a method to strengthen boundaries, quality improvement was confirmed by enhancing the boundary, which is the high-frequency component, in each image using an unsharpening mask and a high-boost mask. The mask filtering applied to each image was evaluated by measuring PSNR and SNR. Abdominal, head, heart, liver, kidney, breast, and fetal images were obtained from Philips epiq5g and affiniti70g and Alpinion E-cube 15 ultrasound equipment. The algorithm was implemented in MATLAB R2022a from MathWorks. The unsharpening and high-boost mask size was set to 3×3, and the Laplacian filter, a spatial filter used to create outline-enhanced images, was applied equally to both masks. The ImageJ program was used for quantitative evaluation of image quality. As a result of applying the mask filters to various ultrasound images, the subjective image quality showed that the overall contour lines of the image were clearly visible when the unsharpening and high-boost masks were applied to the original image. In the quantitative comparison, the quality of the images to which the unsharpening mask and the high-boost mask were applied was evaluated higher than that of the original image. In the portal vein, head, gallbladder, and kidney images, the SNR, PSNR, RMSE and MAE of the image to which the high-boost mask was applied were measured to be high.
Conversely, for images of the heart, breast, and fetus, the SNR, PSNR, RMSE and MAE values were higher for the images with the unsharpening mask applied. Using the optimal mask for each image is expected to help improve image quality, and the enhanced contour information contributes to that improvement.
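The Laplacian-based unsharpening and high-boost scheme can be sketched in plain Python (the paper used MATLAB). The 4-neighbor Laplacian kernel and zero padding are assumptions; the paper's exact kernel may differ:

```python
def convolve3(img, k):
    """3x3 convolution with zero padding on a 2D list image."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            s = 0.0
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    yy, xx = y + dy, x + dx
                    if 0 <= yy < h and 0 <= xx < w:
                        s += img[yy][xx] * k[dy + 1][dx + 1]
            out[y][x] = s
    return out

# 4-neighbor Laplacian kernel (assumption; responds to high-frequency edges).
LAPLACIAN = [[0, 1, 0], [1, -4, 1], [0, 1, 0]]

def sharpen(img, boost=1.0):
    """Subtract the Laplacian from a scaled copy of the image.
    boost=1.0 gives plain unsharpening; boost>1 gives high-boost filtering."""
    lap = convolve3(img, LAPLACIAN)
    return [[boost * p - l for p, l in zip(prow, lrow)]
            for prow, lrow in zip(img, lap)]

# On a flat image the Laplacian is zero in the interior, so interior
# pixels are unchanged (boost=1.0); borders change due to zero padding.
flat = [[1.0] * 3 for _ in range(3)]
sharp = sharpen(flat)
```

Because the Laplacian picks out only the high-frequency component, adding it back (with or without the boost factor) strengthens boundaries while leaving smooth regions essentially untouched.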
This study investigated the predictive accuracy of a model of landslide displacement in Jecheon-si, where a great number of landslides were triggered by heavy rain on both natural (non-clear-cut) and clear-cut slopes during August 2020. This was accomplished by applying three flow direction methods (single flow direction, SFD; multiple flow direction, MFD; infinite flow direction, IFD) and the degree of root cohesion to an infinite slope stability equation. The application assumed that the soil was saturated and that root cohesion changed following the timber harvest (clear-cutting). In the study area, 830 landslide locations were identified via landslide inventory mapping from satellite images and 25 cm resolution aerial photographs. The model comparison showed that the accuracy of the models considering changes in root cohesion following clear-cutting improved by 1.3% to 2.6%, in the area under the receiver operating characteristic curve (AUROC), compared with the models that did not. Furthermore, the accuracy of the models using the MFD algorithm improved by up to 1.3% in AUROC compared with the models using the other algorithms. These results suggest that the discriminatory application of root cohesion, which considers changes in vegetation condition, and the selection of the flow direction method may influence the accuracy of landslide predictive modeling. In the future, the results of this study should be verified by examining root cohesion and its dynamic changes according to tree species using field hydrological monitoring techniques.
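The infinite slope stability equation with an added root-cohesion term can be sketched in a textbook form. All parameter values below are illustrative defaults, not the study's calibrated inputs, and the exact formulation used in the paper may differ:

```python
import math

def infinite_slope_fs(slope_deg, z, gamma=18.0, gamma_w=9.81,
                      c_soil=5.0, c_root=2.0, phi_deg=30.0, m=1.0):
    """Factor of safety for an infinite slope, textbook form:
    FS = (c_soil + c_root + (gamma*z - m*gamma_w*z) * cos^2(t) * tan(phi))
         / (gamma*z * sin(t) * cos(t))
    z: soil depth (m), m: saturated fraction of z, cohesions in kPa,
    unit weights in kN/m^3. Parameter values here are illustrative."""
    t = math.radians(slope_deg)
    phi = math.radians(phi_deg)
    resisting = (c_soil + c_root
                 + (gamma * z - m * gamma_w * z) * math.cos(t) ** 2 * math.tan(phi))
    driving = gamma * z * math.sin(t) * math.cos(t)
    return resisting / driving
```

Setting `c_root=0.0` mimics the loss of root cohesion after clear-cutting, which lowers the factor of safety, which is the mechanism the study's root-cohesion scenarios represent.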
The wall shear stress in the vicinity of end-to-end anastomoses under steady flow conditions was measured using a flush-mounted hot-film anemometer (FMHFA) probe. The experimental measurements were in good agreement with numerical results except in flows with low Reynolds numbers. The wall shear stress increased proximal to the anastomosis in flow from the Penrose tubing (simulating an artery) to the PTFE graft. In flow from the PTFE graft to the Penrose tubing, low wall shear stress was observed distal to the anastomosis. Abnormal distributions of wall shear stress in the vicinity of the anastomosis, resulting from the compliance mismatch between the graft and the host artery, might be an important factor in ANFH formation and graft failure. The present study suggests a correlation between regions of low wall shear stress and the development of anastomotic neointimal fibrous hyperplasia (ANFH) in end-to-end anastomoses.
Air pressure decay (APD) rate and ultrafiltration rate (UFR) tests were performed on new and saline-rinsed dialyzers as well as those reused in patients several times. C-DAK 4000 (Cordis Dow) and CF IS-11 (Baxter Travenol) reused dialyzers obtained from the dialysis clinic were used in the present study. The new dialyzers exhibited a relatively flat APD, whereas saline-rinsed and reused dialyzers showed a considerable amount of decay. C-DAK dialyzers had a larger APD (11.70
Nuclear medicine images (SPECT, PET) are widely used tools for the assessment of myocardial viability and perfusion. However, it is difficult to define the accurate myocardial infarct region. The purpose of this study was to investigate a methodological approach for automatic measurement of rat myocardial infarct size using a polar map with adaptive thresholding. The rat myocardial infarction model was induced by ligation of the left circumflex artery. PET images were obtained after intravenous injection of 37 MBq