• Title/Summary/Keyword: Sparsity Technique

Search Result 59, Processing Time 0.027 seconds

A Study on Evaluation Methods for Interpreting AI Results in Malware Analysis (악성코드 분석에서의 AI 결과해석에 대한 평가방안 연구)

  • Kim, Jin-gang;Hwang, Chan-woong;Lee, Tae-jin
    • Journal of the Korea Institute of Information Security & Cryptology
    • /
    • v.31 no.6
    • /
    • pp.1193-1204
    • /
    • 2021
  • In information security, AI technology is used to detect unknown malware. Although AI technology guarantees high accuracy, it inevitably entails false positives, so we are considering introducing XAI to interpret the results predicted by AI. However, XAI evaluation studies that evaluate or verify the interpretation only provide simple interpretation results are lacking. XAI evaluation is essential to ensure safety which technique is more accurate. In this paper, we interpret AI results as features that have significantly contributed to AI prediction in the field of malware, and present an evaluation method for the interpretation of AI results. Interpretation of results is performed using two XAI techniques on a tree-based AI model with an accuracy of about 94%, and interpretation of AI results is evaluated by analyzing descriptive accuracy and sparsity. As a result of the experiment, it was confirmed that the AI result interpretation was properly calculated. In the future, it is expected that the adoption and utilization of XAI will gradually increase due to XAI evaluation, and the reliability and transparency of AI will be greatly improved.

Hybrid Movie Recommendation System Using Clustering Technique (클러스터링 기법을 이용한 하이브리드 영화 추천 시스템)

  • Sophort Siet;Sony Peng;Yixuan Yang;Sadriddinov Ilkhomjon;DaeYoung Kim;Doo-Soon Park
    • Proceedings of the Korea Information Processing Society Conference
    • /
    • 2023.05a
    • /
    • pp.357-359
    • /
    • 2023
  • This paper proposes a hybrid recommendation system (RS) model that overcomes the limitations of traditional approaches such as data sparsity, cold start, and scalability by combining collaborative filtering and context-aware techniques. The objective of this model is to enhance the accuracy of recommendations and provide personalized suggestions by leveraging the strengths of collaborative filtering and incorporating user context features to capture their preferences and behavior more effectively. The approach utilizes a novel method that combines contextual attributes with the original user-item rating matrix of CF-based algorithms. Furthermore, we integrate k-mean++ clustering to group users with similar preferences and finally recommend items that have highly rated by other users in the same cluster. The process of partitioning is the use of the rating matrix into clusters based on contextual information offers several advantages. First, it bypasses of the computations over the entire data, reducing runtime and improving scalability. Second, the partitioned clusters hold similar ratings, which can produce greater impacts on each other, leading to more accurate recommendations and providing flexibility in the clustering process. keywords: Context-aware Recommendation, Collaborative Filtering, Kmean++ Clustering.

Jaccard Index Reflecting Time-Context for User-based Collaborative Filtering

  • Soojung Lee
    • Journal of the Korea Society of Computer and Information
    • /
    • v.28 no.10
    • /
    • pp.163-170
    • /
    • 2023
  • The user-based collaborative filtering technique, one of the implementation methods of the recommendation system, recommends the preferred items of neighboring users based on the calculations of neighboring users with similar rating histories. However, it fundamentally has a data scarcity problem in which the quality of recommendations is significantly reduced when there is little common rating history. To solve this problem, many existing studies have proposed various methods of combining Jaccard index with a similarity measure. In this study, we introduce a time-aware concept to Jaccard index and propose a method of weighting common items with different weights depending on the rating time. As a result of conducting experiments using various performance metrics and time intervals, it is confirmed that the proposed method showed the best performance compared to the original Jaccard index at most metrics, and that the optimal time interval differs depending on the type of performance metric.

An Item-based Collaborative Filtering Technique by Associative Relation Clustering in Personalized Recommender Systems (개인화 추천 시스템에서 연관 관계 군집에 의한 아이템 기반의 협력적 필터링 기술)

  • 정경용;김진현;정헌만;이정현
    • Journal of KIISE:Software and Applications
    • /
    • v.31 no.4
    • /
    • pp.467-477
    • /
    • 2004
  • While recommender systems were used by a few E-commerce sites former days, they are now becoming serious business tools that are re-shaping the world of I-commerce. And collaborative filtering has been a very successful recommendation technique in both research and practice. But there are two problems in personalized recommender systems, it is First-Rating problem and Sparsity problem. In this paper, we solve these problems using the associative relation clustering and “Lift” of association rules. We produce “Lift” between items using user's rating data. And we apply Threshold by -cut to the association between items. To make an efficiency of associative relation cluster higher, we use not only the existing Hypergraph Clique Clustering algorithm but also the suggested Split Cluster method. If the cluster is completed, we calculate a similarity iten in each inner cluster. And the index is saved in the database for the fast access. We apply the creating index to predict the preference for new items. To estimate the Performance, the suggested method is compared with existing collaborative filtering techniques. As a result, the proposed method is efficient for improving the accuracy of prediction through solving problems of existing collaborative filtering techniques.

Scalable Collaborative Filtering Technique based on Adaptive Clustering (적응형 군집화 기반 확장 용이한 협업 필터링 기법)

  • Lee, O-Joun;Hong, Min-Sung;Lee, Won-Jin;Lee, Jae-Dong
    • Journal of Intelligence and Information Systems
    • /
    • v.20 no.2
    • /
    • pp.73-92
    • /
    • 2014
  • An Adaptive Clustering-based Collaborative Filtering Technique was proposed to solve the fundamental problems of collaborative filtering, such as cold-start problems, scalability problems and data sparsity problems. Previous collaborative filtering techniques were carried out according to the recommendations based on the predicted preference of the user to a particular item using a similar item subset and a similar user subset composed based on the preference of users to items. For this reason, if the density of the user preference matrix is low, the reliability of the recommendation system will decrease rapidly. Therefore, the difficulty of creating a similar item subset and similar user subset will be increased. In addition, as the scale of service increases, the time needed to create a similar item subset and similar user subset increases geometrically, and the response time of the recommendation system is then increased. To solve these problems, this paper suggests a collaborative filtering technique that adapts a condition actively to the model and adopts the concepts of a context-based filtering technique. This technique consists of four major methodologies. First, items are made, the users are clustered according their feature vectors, and an inter-cluster preference between each item cluster and user cluster is then assumed. According to this method, the run-time for creating a similar item subset or user subset can be economized, the reliability of a recommendation system can be made higher than that using only the user preference information for creating a similar item subset or similar user subset, and the cold start problem can be partially solved. Second, recommendations are made using the prior composed item and user clusters and inter-cluster preference between each item cluster and user cluster. In this phase, a list of items is made for users by examining the item clusters in the order of the size of the inter-cluster preference of the user cluster, in which the user belongs, and selecting and ranking the items according to the predicted or recorded user preference information. Using this method, the creation of a recommendation model phase bears the highest load of the recommendation system, and it minimizes the load of the recommendation system in run-time. Therefore, the scalability problem and large scale recommendation system can be performed with collaborative filtering, which is highly reliable. Third, the missing user preference information is predicted using the item and user clusters. Using this method, the problem caused by the low density of the user preference matrix can be mitigated. Existing studies on this used an item-based prediction or user-based prediction. In this paper, Hao Ji's idea, which uses both an item-based prediction and user-based prediction, was improved. The reliability of the recommendation service can be improved by combining the predictive values of both techniques by applying the condition of the recommendation model. By predicting the user preference based on the item or user clusters, the time required to predict the user preference can be reduced, and missing user preference in run-time can be predicted. Fourth, the item and user feature vector can be made to learn the following input of the user feedback. This phase applied normalized user feedback to the item and user feature vector. This method can mitigate the problems caused by the use of the concepts of context-based filtering, such as the item and user feature vector based on the user profile and item properties. The problems with using the item and user feature vector are due to the limitation of quantifying the qualitative features of the items and users. Therefore, the elements of the user and item feature vectors are made to match one to one, and if user feedback to a particular item is obtained, it will be applied to the feature vector using the opposite one. Verification of this method was accomplished by comparing the performance with existing hybrid filtering techniques. Two methods were used for verification: MAE(Mean Absolute Error) and response time. Using MAE, this technique was confirmed to improve the reliability of the recommendation system. Using the response time, this technique was found to be suitable for a large scaled recommendation system. This paper suggested an Adaptive Clustering-based Collaborative Filtering Technique with high reliability and low time complexity, but it had some limitations. This technique focused on reducing the time complexity. Hence, an improvement in reliability was not expected. The next topic will be to improve this technique by rule-based filtering.

A Hybrid Collaborative Filtering Using a Low-dimensional Linear Model (저차원 선형 모델을 이용한 하이브리드 협력적 여과)

  • Ko, Su-Jeong
    • Journal of KIISE:Software and Applications
    • /
    • v.36 no.10
    • /
    • pp.777-785
    • /
    • 2009
  • Collaborative filtering is a technique used to predict whether a particular user will like a particular item. User-based or item-based collaborative techniques have been used extensively in many commercial recommender systems. In this paper, a hybrid collaborative filtering method that combines user-based and item-based methods using a low-dimensional linear model is proposed. The proposed method solves the problems of sparsity and a large database by using NMF among the low-dimensional linear models. In collaborative filtering systems the methods using the NMF are useful in expressing users as semantic relations. However, they are model-based methods and the process of computation is complex, so they can not recommend items dynamically. In order to complement the shortcomings, the proposed method clusters users into groups by using NMF and selects features of groups by using TF-IDF. Mutual information is then used to compute similarities between items. The proposed method clusters users into groups and extracts features of groups on offline and determines the most suitable group for an active user using the features of groups on online. Finally, the proposed method reduces the time required to classify an active user into a group and outperforms previous methods by combining user-based and item-based collaborative filtering methods.

Domain Knowledge Incorporated Counterfactual Example-Based Explanation for Bankruptcy Prediction Model (부도예측모형에서 도메인 지식을 통합한 반사실적 예시 기반 설명력 증진 방법)

  • Cho, Soo Hyun;Shin, Kyung-shik
    • Journal of Intelligence and Information Systems
    • /
    • v.28 no.2
    • /
    • pp.307-332
    • /
    • 2022
  • One of the most intensively conducted research areas in business application study is a bankruptcy prediction model, a representative classification problem related to loan lending, investment decision making, and profitability to financial institutions. Many research demonstrated outstanding performance for bankruptcy prediction models using artificial intelligence techniques. However, since most machine learning algorithms are "black-box," AI has been identified as a prominent research topic for providing users with an explanation. Although there are many different approaches for explanations, this study focuses on explaining a bankruptcy prediction model using a counterfactual example. Users can obtain desired output from the model by using a counterfactual-based explanation, which provides an alternative case. This study introduces a counterfactual generation technique based on a genetic algorithm (GA) that leverages both domain knowledge (i.e., causal feasibility) and feature importance from a black-box model along with other critical counterfactual variables, including proximity, distribution, and sparsity. The proposed method was evaluated quantitatively and qualitatively to measure the quality and the validity.

The Adaptive Personalization Method According to Users Purchasing Index : Application to Beverage Purchasing Predictions (고객별 구매빈도에 동적으로 적응하는 개인화 시스템 : 음료수 구매 예측에의 적용)

  • Park, Yoon-Joo
    • Journal of Intelligence and Information Systems
    • /
    • v.17 no.4
    • /
    • pp.95-108
    • /
    • 2011
  • TThis is a study of the personalization method that intelligently adapts the level of clustering considering purchasing index of a customer. In the e-biz era, many companies gather customers' demographic and transactional information such as age, gender, purchasing date and product category. They use this information to predict customer's preferences or purchasing patterns so that they can provide more customized services to their customers. The previous Customer-Segmentation method provides customized services for each customer group. This method clusters a whole customer set into different groups based on their similarity and builds predictive models for the resulting groups. Thus, it can manage the number of predictive models and also provide more data for the customers who do not have enough data to build a good predictive model by using the data of other similar customers. However, this method often fails to provide highly personalized services to each customer, which is especially important to VIP customers. Furthermore, it clusters the customers who already have a considerable amount of data as well as the customers who only have small amount of data, which causes to increase computational cost unnecessarily without significant performance improvement. The other conventional method called 1-to-1 method provides more customized services than the Customer-Segmentation method for each individual customer since the predictive model are built using only the data for the individual customer. This method not only provides highly personalized services but also builds a relatively simple and less costly model that satisfies with each customer. However, the 1-to-1 method has a limitation that it does not produce a good predictive model when a customer has only a few numbers of data. In other words, if a customer has insufficient number of transactional data then the performance rate of this method deteriorate. In order to overcome the limitations of these two conventional methods, we suggested the new method called Intelligent Customer Segmentation method that provides adaptive personalized services according to the customer's purchasing index. The suggested method clusters customers according to their purchasing index, so that the prediction for the less purchasing customers are based on the data in more intensively clustered groups, and for the VIP customers, who already have a considerable amount of data, clustered to a much lesser extent or not clustered at all. The main idea of this method is that applying clustering technique when the number of transactional data of the target customer is less than the predefined criterion data size. In order to find this criterion number, we suggest the algorithm called sliding window correlation analysis in this study. The algorithm purposes to find the transactional data size that the performance of the 1-to-1 method is radically decreased due to the data sparity. After finding this criterion data size, we apply the conventional 1-to-1 method for the customers who have more data than the criterion and apply clustering technique who have less than this amount until they can use at least the predefined criterion amount of data for model building processes. We apply the two conventional methods and the newly suggested method to Neilsen's beverage purchasing data to predict the purchasing amounts of the customers and the purchasing categories. We use two data mining techniques (Support Vector Machine and Linear Regression) and two types of performance measures (MAE and RMSE) in order to predict two dependent variables as aforementioned. The results show that the suggested Intelligent Customer Segmentation method can outperform the conventional 1-to-1 method in many cases and produces the same level of performances compare with the Customer-Segmentation method spending much less computational cost.

Incorporating Social Relationship discovered from User's Behavior into Collaborative Filtering (사용자 행동 기반의 사회적 관계를 결합한 사용자 협업적 여과 방법)

  • Thay, Setha;Ha, Inay;Jo, Geun-Sik
    • Journal of Intelligence and Information Systems
    • /
    • v.19 no.2
    • /
    • pp.1-20
    • /
    • 2013
  • Nowadays, social network is a huge communication platform for providing people to connect with one another and to bring users together to share common interests, experiences, and their daily activities. Users spend hours per day in maintaining personal information and interacting with other people via posting, commenting, messaging, games, social events, and applications. Due to the growth of user's distributed information in social network, there is a great potential to utilize the social data to enhance the quality of recommender system. There are some researches focusing on social network analysis that investigate how social network can be used in recommendation domain. Among these researches, we are interested in taking advantages of the interaction between a user and others in social network that can be determined and known as social relationship. Furthermore, mostly user's decisions before purchasing some products depend on suggestion of people who have either the same preferences or closer relationship. For this reason, we believe that user's relationship in social network can provide an effective way to increase the quality in prediction user's interests of recommender system. Therefore, social relationship between users encountered from social network is a common factor to improve the way of predicting user's preferences in the conventional approach. Recommender system is dramatically increasing in popularity and currently being used by many e-commerce sites such as Amazon.com, Last.fm, eBay.com, etc. Collaborative filtering (CF) method is one of the essential and powerful techniques in recommender system for suggesting the appropriate items to user by learning user's preferences. CF method focuses on user data and generates automatic prediction about user's interests by gathering information from users who share similar background and preferences. Specifically, the intension of CF method is to find users who have similar preferences and to suggest target user items that were mostly preferred by those nearest neighbor users. There are two basic units that need to be considered by CF method, the user and the item. Each user needs to provide his rating value on items i.e. movies, products, books, etc to indicate their interests on those items. In addition, CF uses the user-rating matrix to find a group of users who have similar rating with target user. Then, it predicts unknown rating value for items that target user has not rated. Currently, CF has been successfully implemented in both information filtering and e-commerce applications. However, it remains some important challenges such as cold start, data sparsity, and scalability reflected on quality and accuracy of prediction. In order to overcome these challenges, many researchers have proposed various kinds of CF method such as hybrid CF, trust-based CF, social network-based CF, etc. In the purpose of improving the recommendation performance and prediction accuracy of standard CF, in this paper we propose a method which integrates traditional CF technique with social relationship between users discovered from user's behavior in social network i.e. Facebook. We identify user's relationship from behavior of user such as posts and comments interacted with friends in Facebook. We believe that social relationship implicitly inferred from user's behavior can be likely applied to compensate the limitation of conventional approach. Therefore, we extract posts and comments of each user by using Facebook Graph API and calculate feature score among each term to obtain feature vector for computing similarity of user. Then, we combine the result with similarity value computed using traditional CF technique. Finally, our system provides a list of recommended items according to neighbor users who have the biggest total similarity value to the target user. In order to verify and evaluate our proposed method we have performed an experiment on data collected from our Movies Rating System. Prediction accuracy evaluation is conducted to demonstrate how much our algorithm gives the correctness of recommendation to user in terms of MAE. Then, the evaluation of performance is made to show the effectiveness of our method in terms of precision, recall, and F1-measure. Evaluation on coverage is also included in our experiment to see the ability of generating recommendation. The experimental results show that our proposed method outperform and more accurate in suggesting items to users with better performance. The effectiveness of user's behavior in social network particularly shows the significant improvement by up to 6% on recommendation accuracy. Moreover, experiment of recommendation performance shows that incorporating social relationship observed from user's behavior into CF is beneficial and useful to generate recommendation with 7% improvement of performance compared with benchmark methods. Finally, we confirm that interaction between users in social network is able to enhance the accuracy and give better recommendation in conventional approach.