User-Perspective Issue Clustering Using Multi-Layered Two-Mode Network Analysis (다계층 이원 네트워크를 활용한 사용자 관점의 이슈 클러스터링)
-
- Journal of Intelligence and Information Systems
- /
- v.20 no.2
- /
- pp.93-107
- /
- 2014
In this paper, we report what we have observed with regard to user-perspective issue clustering based on multi-layered two-mode network analysis. This work is significant in the context of data collection by companies about customer needs. Most companies have failed to uncover such needs for products or services properly in terms of demographic data such as age, income levels, and purchase history. Because of excessive reliance on limited internal data, most recommendation systems do not provide decision makers with appropriate business information for current business circumstances. However, part of the problem is the increasing regulation of personal data gathering and privacy. This makes demographic or transaction data collection more difficult, and is a significant hurdle for traditional recommendation approaches because these systems demand a great deal of personal data or transaction logs. Our motivation for presenting this paper to academia is our strong belief, and evidence, that most customers' requirements for products can be effectively and efficiently analyzed from unstructured textual data such as Internet news text. In order to derive users' requirements from textual data obtained online, the proposed approach in this paper attempts to construct double two-mode networks, such as a user-news network and news-issue network, and to integrate these into one quasi-network as the input for issue clustering. One of the contributions of this research is the development of a methodology utilizing enormous amounts of unstructured textual data for user-oriented issue clustering by leveraging existing text mining and social network analysis. In order to build multi-layered two-mode networks of news logs, we need some tools such as text mining and topic analysis. We used not only SAS Enterprise Miner 12.1, which provides a text miner module and cluster module for textual data analysis, but also NetMiner 4 for network visualization and analysis. Our approach for user-perspective issue clustering is composed of six main phases: crawling, topic analysis, access pattern analysis, network merging, network conversion, and clustering. In the first phase, we collect visit logs for news sites by crawler. After gathering unstructured news article data, the topic analysis phase extracts issues from each news article in order to build an article-news network. For simplicity, 100 topics are extracted from 13,652 articles. In the third phase, a user-article network is constructed with access patterns derived from web transaction logs. The double two-mode networks are then merged into a quasi-network of user-issue. Finally, in the user-oriented issue-clustering phase, we classify issues through structural equivalence, and compare these with the clustering results from statistical tools and network analysis. An experiment with a large dataset was performed to build a multi-layer two-mode network. After that, we compared the results of issue clustering from SAS with that of network analysis. The experimental dataset was from a web site ranking site, and the biggest portal site in Korea. The sample dataset contains 150 million transaction logs and 13,652 news articles of 5,000 panels over one year. User-article and article-issue networks are constructed and merged into a user-issue quasi-network using Netminer. Our issue-clustering results applied the Partitioning Around Medoids (PAM) algorithm and Multidimensional Scaling (MDS), and are consistent with the results from SAS clustering. In spite of extensive efforts to provide user information with recommendation systems, most projects are successful only when companies have sufficient data about users and transactions. Our proposed methodology, user-perspective issue clustering, can provide practical support to decision-making in companies because it enhances user-related data from unstructured textual data. To overcome the problem of insufficient data from traditional approaches, our methodology infers customers' real interests by utilizing web transaction logs. In addition, we suggest topic analysis and issue clustering as a practical means of issue identification.
Consumer consumption patterns are shifting rapidly as buyers migrate from offline markets to e-commerce routes, such as shopping channels on TV and internet shopping malls. In the offline markets consumers go shopping, see the shopping items, and choose from them. Recently consumers tend towards buying at shopping sites free from time and place. However, as e-commerce markets continue to expand, customers are complaining that it is becoming a bigger hassle to shop online. In the online shopping, shoppers have very limited information on the products. The delivered products can be different from what they have wanted. This case results to purchase cancellation. Because these things happen frequently, they are likely to refer to the consumer reviews and companies should be concerned about consumer's voice. E-commerce is a very important marketing tool for suppliers. It can recommend products to customers and connect them directly with suppliers with just a click of a button. The recommender system is being studied in various ways. Some of the more prominent ones include recommendation based on best-seller and demographics, contents filtering, and collaborative filtering. However, these systems all share two weaknesses : they cannot recommend products to consumers on a personal level, and they cannot recommend products to new consumers with no buying history. To fix these problems, we can use the information which has been collected from the questionnaires about their demographics and preference ratings. But, consumers feel these questionnaires are a burden and are unlikely to provide correct information. This study investigates combining collaborative filtering with the centrality of social network analysis. This centrality measure provides the information to infer the preference of new consumers from the shopping history of existing and previous ones. While the past researches had focused on the existing consumers with similar shopping patterns, this study tried to improve the accuracy of recommendation with all shopping information, which included not only similar shopping patterns but also dissimilar ones. Data used in this study, Movie Lens' data, was made by Group Lens research Project Team at University of Minnesota to recommend movies with a collaborative filtering technique. This data was built from the questionnaires of 943 respondents which gave the information on the preference ratings on 1,684 movies. Total data of 100,000 was organized by time, with initial data of 50,000 being existing customers and the latter 50,000 being new customers. The proposed recommender system consists of three systems : [+] group recommender system, [-] group recommender system, and integrated recommender system. [+] group recommender system looks at customers with similar buying patterns as 'neighbors', whereas [-] group recommender system looks at customers with opposite buying patterns as 'contraries'. Integrated recommender system uses both of the aforementioned recommender systems to recommend movies that both recommender systems pick. The study of three systems allows us to find the most suitable recommender system that will optimize accuracy and customer satisfaction. Our analysis showed that integrated recommender system is the best solution among the three systems studied, followed by [-] group recommended system and [+] group recommender system. This result conforms to the intuition that the accuracy of recommendation can be improved using all the relevant information. We provided contour maps and graphs to easily compare the accuracy of each recommender system. Although we saw improvement on accuracy with the integrated recommender system, we must remember that this research is based on static data with no live customers. In other words, consumers did not see the movies actually recommended from the system. Also, this recommendation system may not work well with products other than movies. Thus, it is important to note that recommendation systems need particular calibration for specific product/customer types.
The market size of e-Commerce in Japan was 15 trillion Yen in 2006, and B2C Internet shopping sales were over 6.57 trillion in 2009. Lakuten is a representative Internet shopping company whose market share is 45%. Lakuten has over 70,000 online stores and Japanese shoppers trust them based on the fair competition rule and pre-control system on e-commerce. Japanese consumers accept new technology rapidly and highly use Internet and mobile channel. This research analyse online shopping behaviors of Japan, a big e-commerce market. Internet shopping intention, satisfaction, and recommendation by Internet shopping motivations, perceived risks, shopping innovativeness were analyzed. A questionnaire survey of 464 Japanese consumer was performed and ANOVA, factor analysis, reliability test have done by SPSS 12.0. As the results, Internet shopping intentions were higher in groups of olders, higher innovativeness. House wives' satisfaction of Internet shopping is highest. High innovativeness group showed higher internet shopping motivation of economics, connivence, hedonic, and social. Student, women, and low income group perceives high risks to Internet shopping. Implications and further researches were suggested based on the results.
As mobile devices that can be connected to the Internet have spread and networking has become possible whenever/wherever, the Internet has become central in the dissemination and consumption of news. Accordingly, the ways news is gathered, disseminated, and consumed have changed greatly. In the traditional news media such as magazines and newspapers, expert editors determined what events were worthy of deploying their staffs or freelancers to cover and what stories from newswires or other sources would be printed. Furthermore, they determined how these stories would be displayed in their publications in terms of page placement, space allocation, type sizes, photographs, and other graphic elements. In turn, readers-news consumers-judged the importance of news not only by its subject and content, but also through subsidiary information such as its location and how it was displayed. Their judgments reflected their acceptance of an assumption that these expert editors had the knowledge and ability not only to serve as gatekeepers in determining what news was valuable and important but also how to rank its value and importance. As such, news assembled, dispensed, and consumed in this manner can be said to be expert-based recommended news. However, in the era of Internet news, the role of expert editors as gatekeepers has been greatly diminished. Many Internet news sites offer a huge volume of news on diverse topics from many media companies, thereby eliminating in many cases the gatekeeper role of expert editors. One result has been to turn news users from passive receptacles into activists who search for news that reflects their interests or tastes. To solve the problem of an overload of information and enhance the efficiency of news users' searches, Internet news sites have introduced numerous recommendation techniques. Recommendations based on popularity constitute one of the most frequently used of these techniques. This popularity-based approach shows a list of those news items that have been read and shared by many people, based on users' behavior such as clicks, evaluations, and sharing. "most-viewed list," "most-replied list," and "real-time issue" found on news sites belong to this system. Given that collective intelligence serves as the premise of these popularity-based recommendations, popularity-based news recommendations would be considered highly important because stories that have been read and shared by many people are presumably more likely to be better than those preferred by only a few people. However, these recommendations may reflect a popularity bias because stories judged likely to be more popular have been placed where they will be most noticeable. As a result, such stories are more likely to be continuously exposed and included in popularity-based recommended news lists. Popular news stories cannot be said to be necessarily those that are most important to readers. Given that many people use popularity-based recommended news and that the popularity-based recommendation approach greatly affects patterns of news use, a review of whether popularity-based news recommendations actually reflect important news can be said to be an indispensable procedure. Therefore, in this study, popularity-based news recommendations of an Internet news portal was compared with top placements of news in printed newspapers, and news users' judgments of which stories were personally and socially important were analyzed. The study was conducted in two stages. In the first stage, content analyses were used to compare the content of the popularity-based news recommendations of an Internet news site with those of the expert-based news recommendations of printed newspapers. Five days of news stories were collected. "most-viewed list" of the Naver portal site were used as the popularity-based recommendations; the expert-based recommendations were represented by the top pieces of news from five major daily newspapers-the Chosun Ilbo, the JoongAng Ilbo, the Dong-A Daily News, the Hankyoreh Shinmun, and the Kyunghyang Shinmun. In the second stage, along with the news stories collected in the first stage, some Internet news stories and some news stories from printed newspapers that the Internet and the newspapers did not have in common were randomly extracted and used in online questionnaire surveys that asked the importance of these selected news stories. According to our analysis, only 10.81% of the popularity-based news recommendations were similar in content with the expert-based news judgments. Therefore, the content of popularity-based news recommendations appears to be quite different from the content of expert-based recommendations. The differences in importance between these two groups of news stories were analyzed, and the results indicated that whereas the two groups did not differ significantly in their recommendations of stories of personal importance, the expert-based recommendations ranked higher in social importance. This study has importance for theory in its examination of popularity-based news recommendations from the two theoretical viewpoints of collective intelligence and popularity bias and by its use of both qualitative (content analysis) and quantitative methods (questionnaires). It also sheds light on the differences in the role of media channels that fulfill an agenda-setting function and Internet news sites that treat news from the viewpoint of markets.
Recommender system has become one of the most important technologies in e-commerce in these days. The ultimate reason to shop online, for many consumers, is to reduce the efforts for information search and purchase. Recommender system is a key technology to serve these needs. Many of the past studies about recommender systems have been devoted to developing and improving recommendation algorithms and collaborative filtering (CF) is known to be the most successful one. Despite its success, however, CF has several shortcomings such as cold-start, sparsity, gray sheep problems. In order to be able to generate recommendations, ordinary CF algorithms require evaluations or preference information directly from users. For new users who do not have any evaluations or preference information, therefore, CF cannot come up with recommendations (Cold-star problem). As the numbers of products and customers increase, the scale of the data increases exponentially and most of the data cells are empty. This sparse dataset makes computation for recommendation extremely hard (Sparsity problem). Since CF is based on the assumption that there are groups of users sharing common preferences or tastes, CF becomes inaccurate if there are many users with rare and unique tastes (Gray sheep problem). This study proposes a new algorithm that utilizes Social Network Analysis (SNA) techniques to resolve the gray sheep problem. We utilize 'degree centrality' in SNA to identify users with unique preferences (gray sheep). Degree centrality in SNA refers to the number of direct links to and from a node. In a network of users who are connected through common preferences or tastes, those with unique tastes have fewer links to other users (nodes) and they are isolated from other users. Therefore, gray sheep can be identified by calculating degree centrality of each node. We divide the dataset into two, gray sheep and others, based on the degree centrality of the users. Then, different similarity measures and recommendation methods are applied to these two datasets. More detail algorithm is as follows: Step 1: Convert the initial data which is a two-mode network (user to item) into an one-mode network (user to user). Step 2: Calculate degree centrality of each node and separate those nodes having degree centrality values lower than the pre-set threshold. The threshold value is determined by simulations such that the accuracy of CF for the remaining dataset is maximized. Step 3: Ordinary CF algorithm is applied to the remaining dataset. Step 4: Since the separated dataset consist of users with unique tastes, an ordinary CF algorithm cannot generate recommendations for them. A 'popular item' method is used to generate recommendations for these users. The F measures of the two datasets are weighted by the numbers of nodes and summed to be used as the final performance metric. In order to test performance improvement by this new algorithm, an empirical study was conducted using a publically available dataset - the MovieLens data by GroupLens research team. We used 100,000 evaluations by 943 users on 1,682 movies. The proposed algorithm was compared with an ordinary CF algorithm utilizing 'Best-N-neighbors' and 'Cosine' similarity method. The empirical results show that F measure was improved about 11% on average when the proposed algorithm was used