• Title/Summary/Keyword: Online search

Search Result 683, Processing Time 0.021 seconds

Resolving the 'Gray sheep' Problem Using Social Network Analysis (SNA) in Collaborative Filtering (CF) Recommender Systems (소셜 네트워크 분석 기법을 활용한 협업필터링의 특이취향 사용자(Gray Sheep) 문제 해결)

  • Kim, Minsung;Im, Il
    • Journal of Intelligence and Information Systems
    • /
    • v.20 no.2
    • /
    • pp.137-148
    • /
    • 2014
  • Recommender system has become one of the most important technologies in e-commerce in these days. The ultimate reason to shop online, for many consumers, is to reduce the efforts for information search and purchase. Recommender system is a key technology to serve these needs. Many of the past studies about recommender systems have been devoted to developing and improving recommendation algorithms and collaborative filtering (CF) is known to be the most successful one. Despite its success, however, CF has several shortcomings such as cold-start, sparsity, gray sheep problems. In order to be able to generate recommendations, ordinary CF algorithms require evaluations or preference information directly from users. For new users who do not have any evaluations or preference information, therefore, CF cannot come up with recommendations (Cold-star problem). As the numbers of products and customers increase, the scale of the data increases exponentially and most of the data cells are empty. This sparse dataset makes computation for recommendation extremely hard (Sparsity problem). Since CF is based on the assumption that there are groups of users sharing common preferences or tastes, CF becomes inaccurate if there are many users with rare and unique tastes (Gray sheep problem). This study proposes a new algorithm that utilizes Social Network Analysis (SNA) techniques to resolve the gray sheep problem. We utilize 'degree centrality' in SNA to identify users with unique preferences (gray sheep). Degree centrality in SNA refers to the number of direct links to and from a node. In a network of users who are connected through common preferences or tastes, those with unique tastes have fewer links to other users (nodes) and they are isolated from other users. Therefore, gray sheep can be identified by calculating degree centrality of each node. We divide the dataset into two, gray sheep and others, based on the degree centrality of the users. Then, different similarity measures and recommendation methods are applied to these two datasets. More detail algorithm is as follows: Step 1: Convert the initial data which is a two-mode network (user to item) into an one-mode network (user to user). Step 2: Calculate degree centrality of each node and separate those nodes having degree centrality values lower than the pre-set threshold. The threshold value is determined by simulations such that the accuracy of CF for the remaining dataset is maximized. Step 3: Ordinary CF algorithm is applied to the remaining dataset. Step 4: Since the separated dataset consist of users with unique tastes, an ordinary CF algorithm cannot generate recommendations for them. A 'popular item' method is used to generate recommendations for these users. The F measures of the two datasets are weighted by the numbers of nodes and summed to be used as the final performance metric. In order to test performance improvement by this new algorithm, an empirical study was conducted using a publically available dataset - the MovieLens data by GroupLens research team. We used 100,000 evaluations by 943 users on 1,682 movies. The proposed algorithm was compared with an ordinary CF algorithm utilizing 'Best-N-neighbors' and 'Cosine' similarity method. The empirical results show that F measure was improved about 11% on average when the proposed algorithm was used

    . Past studies to improve CF performance typically used additional information other than users' evaluations such as demographic data. Some studies applied SNA techniques as a new similarity metric. This study is novel in that it used SNA to separate dataset. This study shows that performance of CF can be improved, without any additional information, when SNA techniques are used as proposed. This study has several theoretical and practical implications. This study empirically shows that the characteristics of dataset can affect the performance of CF recommender systems. This helps researchers understand factors affecting performance of CF. This study also opens a door for future studies in the area of applying SNA to CF to analyze characteristics of dataset. In practice, this study provides guidelines to improve performance of CF recommender systems with a simple modification.

  • Predicting the Direction of the Stock Index by Using a Domain-Specific Sentiment Dictionary (주가지수 방향성 예측을 위한 주제지향 감성사전 구축 방안)

    • Yu, Eunji;Kim, Yoosin;Kim, Namgyu;Jeong, Seung Ryul
      • Journal of Intelligence and Information Systems
      • /
      • v.19 no.1
      • /
      • pp.95-110
      • /
      • 2013
    • Recently, the amount of unstructured data being generated through a variety of social media has been increasing rapidly, resulting in the increasing need to collect, store, search for, analyze, and visualize this data. This kind of data cannot be handled appropriately by using the traditional methodologies usually used for analyzing structured data because of its vast volume and unstructured nature. In this situation, many attempts are being made to analyze unstructured data such as text files and log files through various commercial or noncommercial analytical tools. Among the various contemporary issues dealt with in the literature of unstructured text data analysis, the concepts and techniques of opinion mining have been attracting much attention from pioneer researchers and business practitioners. Opinion mining or sentiment analysis refers to a series of processes that analyze participants' opinions, sentiments, evaluations, attitudes, and emotions about selected products, services, organizations, social issues, and so on. In other words, many attempts based on various opinion mining techniques are being made to resolve complicated issues that could not have otherwise been solved by existing traditional approaches. One of the most representative attempts using the opinion mining technique may be the recent research that proposed an intelligent model for predicting the direction of the stock index. This model works mainly on the basis of opinions extracted from an overwhelming number of economic news repots. News content published on various media is obviously a traditional example of unstructured text data. Every day, a large volume of new content is created, digitalized, and subsequently distributed to us via online or offline channels. Many studies have revealed that we make better decisions on political, economic, and social issues by analyzing news and other related information. In this sense, we expect to predict the fluctuation of stock markets partly by analyzing the relationship between economic news reports and the pattern of stock prices. So far, in the literature on opinion mining, most studies including ours have utilized a sentiment dictionary to elicit sentiment polarity or sentiment value from a large number of documents. A sentiment dictionary consists of pairs of selected words and their sentiment values. Sentiment classifiers refer to the dictionary to formulate the sentiment polarity of words, sentences in a document, and the whole document. However, most traditional approaches have common limitations in that they do not consider the flexibility of sentiment polarity, that is, the sentiment polarity or sentiment value of a word is fixed and cannot be changed in a traditional sentiment dictionary. In the real world, however, the sentiment polarity of a word can vary depending on the time, situation, and purpose of the analysis. It can also be contradictory in nature. The flexibility of sentiment polarity motivated us to conduct this study. In this paper, we have stated that sentiment polarity should be assigned, not merely on the basis of the inherent meaning of a word but on the basis of its ad hoc meaning within a particular context. To implement our idea, we presented an intelligent investment decision-support model based on opinion mining that performs the scrapping and parsing of massive volumes of economic news on the web, tags sentiment words, classifies sentiment polarity of the news, and finally predicts the direction of the next day's stock index. In addition, we applied a domain-specific sentiment dictionary instead of a general purpose one to classify each piece of news as either positive or negative. For the purpose of performance evaluation, we performed intensive experiments and investigated the prediction accuracy of our model. For the experiments to predict the direction of the stock index, we gathered and analyzed 1,072 articles about stock markets published by "M" and "E" media between July 2011 and September 2011.

    Color Analyses on Digital Photos Using Machine Learning and KSCA - Focusing on Korean Natural Daytime/nighttime Scenery - (머신러닝과 KSCA를 활용한 디지털 사진의 색 분석 -한국 자연 풍경 낮과 밤 사진을 중심으로-)

    • Gwon, Huieun;KOO, Ja Joon
      • Trans-
      • /
      • v.12
      • /
      • pp.51-79
      • /
      • 2022
    • This study investigates the methods for deriving colors which can serve as a reference to users such as designers and or contents creators who search for online images from the web portal sites using specific words for color planning and more. Two experiments were conducted in order to accomplish this. Digital scenery photos within the geographic scope of Korea were downloaded from web portal sites, and those photos were studied to find out what colors were used to describe daytime and nighttime. Machine learning was used as the study methodology to classify colors in daytime and nighttime, and KSCA was used to derive the color frequency of daytime and nighttime photos and to compare and analyze the two results. The results of classifying the colors of daytime and nighttime photos using machine learning show that, when classifying the colors by 51~100%, the area of daytime colors was approximately 2.45 times greater than that of nighttime colors. The colors of the daytime class were distributed by brightness with white as its center, while that of the nighttime class was distributed with black as its center. Colors that accounted for over 70% of the daytime class were 647, those over 70% of the nighttime class were 252, and the rest (31-69%) were 101. The number of colors in the middle area was low, while other colors were classified relatively clearly into day and night. The resulting color distributions in the daytime and nighttime classes were able to provide the borderline color values of the two classes that are classified by brightness. As a result of analyzing the frequency of digital photos using KSCA, colors around yellow were expressed in generally bright daytime photos, while colors around blue value were expressed in dark night photos. For frequency of daytime photos, colors on the upper 40% had low chroma, almost being achromatic. Also, colors that are close to white and black showed the highest frequency, indicating a large difference in brightness. Meanwhile, for colors with frequency from top 5 to 10, yellow green was expressed darkly, and navy blue was expressed brightly, partially composing a complex harmony. When examining the color band, various colors, brightness, and chroma including light blue, achromatic colors, and warm colors were shown, failing to compose a generally harmonious arrangement of colors. For the frequency of nighttime photos, colors in approximately the upper 50% are dark colors with a brightness value of 2 (Munsell signal). In comparison, the brightness of middle frequency (50-80%) is relatively higher (brightness values of 3-4), and the brightness difference of various colors was large in the lower 20%. Colors that are not cool colors could be found intermittently in the lower 8% of frequency. When examining the color band, there was a general harmonious arrangement of colors centered on navy blue. As the results of conducting the experiment using two methods in this study, machine learning could classify colors into two or more classes, and could evaluate how close an image was with certain colors to a certain class. This method cannot be used if an image cannot be classified into a certain class. The result of such color distribution would serve as a reference when determining how close a certain color is to one of the two classes when the color is used as a dominant color in the base or background color of a certain design. Also, when dividing the analyzed images into several classes, even colors that have not been used in the analyzed image can be determined to find out how close they are to a certain class according to the color distribution properties of each class. Nevertheless, the results cannot be used to find out whether a specific color was used in the class and by how much it was used. To investigate such an issue, frequency analysis was conducted using KSCA. The color frequency could be measured within the range of images used in the experiment. The resulting values of color distribution and frequency from this study would serve as references for color planning of digital design regarding natural scenery in the geographic scope of Korea. Also, the two experiments are meaningful attempts for searching the methods for deriving colors that can be a useful reference among numerous images for content creator users of the relevant field.


    (34141) Korea Institute of Science and Technology Information, 245, Daehak-ro, Yuseong-gu, Daejeon
    Copyright (C) KISTI. All Rights Reserved.