• Title/Summary/Keyword: Web 2.0/3.0

A Hybrid Recommender System based on Collaborative Filtering with Selective Use of Overall and Multicriteria Ratings (종합 평점과 다기준 평점을 선택적으로 활용하는 협업필터링 기반 하이브리드 추천 시스템)

  • Ku, Min Jung;Ahn, Hyunchul
    • Journal of Intelligence and Information Systems / v.24 no.2 / pp.85-109 / 2018
  • A recommender system recommends items that a customer is expected to purchase in the future, based on his or her previous purchase behavior. It has served as a tool for realizing one-to-one personalization in e-commerce services. Traditional recommender systems, especially those based on collaborative filtering (CF), the most popular recommendation algorithm in both academia and industry, are designed to generate the list of recommended items by using the 'overall rating', a single criterion. However, a single criterion has critical limitations in capturing customers' preferences in detail. Recently, to mitigate these limitations, some leading e-commerce companies have begun to collect feedback from their customers in the form of 'multicriteria ratings'. Multicriteria ratings enable companies to understand their customers' preferences from multidimensional viewpoints. Moreover, multidimensional ratings are easy to handle and analyze because they are quantitative. However, recommendation using multicriteria ratings also has a limitation: it may omit detailed information on a user's preference because, in most cases, it considers only three to five predetermined criteria. Against this background, this study proposes a novel hybrid recommender system that selectively uses the results of 'traditional CF' and 'CF using multicriteria ratings'. The proposed system is based on the premise that some people have a holistic preference scheme, whereas others have a composite preference scheme. Thus, it applies traditional CF using the overall rating to users with holistic preferences, and CF using multicriteria ratings to users with composite preferences. To validate the usefulness of the proposed system, we applied it to a real-world dataset for recommending points of interest (POIs).
Personalized POI recommendation is attracting more attention as location-based services such as Yelp and Foursquare grow in popularity. The dataset was collected from university students via a Web-based online survey system. Using the survey system, we collected the overall ratings as well as the ratings for each criterion for 48 POIs located near K University in Seoul, South Korea. The criteria were 'food or taste', 'price' and 'service or mood'. As a result, we obtained 2,878 valid ratings from 112 users. Of the 48 items, 38 (about 80%) were used as the training dataset and the remaining 10 (about 20%) as the validation dataset. To examine the effectiveness of the proposed system (i.e., the hybrid selective model), we compared its performance with that of two comparison models: traditional CF and CF with multicriteria ratings. Performance was evaluated using two metrics: average MAE (mean absolute error) and precision-in-top-N. Precision-in-top-N represents the percentage of truly high overall ratings among the N items that the model predicted would be most relevant for each user. The experimental system was developed using Microsoft Visual Basic for Applications (VBA). The experimental results showed that the proposed system (avg. MAE = 0.584) outperformed both traditional CF (avg. MAE = 0.591) and multicriteria CF (avg. MAE = 0.608). We also found that multicriteria CF performed worse than traditional CF on our dataset, which contradicts the results of most previous studies. This result supports the premise of our study that people have two different types of preference schemes: holistic and composite. Besides MAE, the proposed system outperformed all the comparison models in precision-in-top-3, precision-in-top-5, and precision-in-top-7.
The results of paired-samples t-tests showed that, in terms of average MAE, the proposed system outperformed traditional CF at the 10% significance level and multicriteria CF at the 1% significance level. The proposed system sheds light on how to understand and utilize users' preference schemes in the recommender systems domain.
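The selective logic and the two evaluation metrics described above can be sketched in Python. This is a minimal, hypothetical sketch, not the authors' VBA system. The abstract does not say how a user's preference scheme is detected, so `choose_model` assumes it is inferred from whichever model has the lower MAE on that user's training ratings. The cut-off `HIGH_THRESHOLD = 4.0` for a "truly high" rating is likewise an assumption, and the two CF predictors are replaced by toy lookup functions.

```python
# Hypothetical sketch, not the authors' VBA implementation: per-user selection
# between two rating predictors, plus the two evaluation metrics.
from typing import Callable, Dict, List, Tuple

Predictor = Callable[[str, str], float]  # predict(user, item) -> rating
HIGH_THRESHOLD = 4.0  # assumed cut-off for a "truly high" overall rating


def mae(pairs: List[Tuple[float, float]]) -> float:
    """Mean absolute error over (predicted, actual) pairs."""
    return sum(abs(p - a) for p, a in pairs) / len(pairs)


def choose_model(user: str, train: Dict[str, float],
                 models: Dict[str, Predictor]) -> str:
    """Return the name of the model with the lowest MAE on the user's training ratings.

    Assumption: lower error for overall-rating CF is read as a holistic
    preference scheme, lower error for multicriteria CF as a composite one.
    """
    def user_mae(name: str) -> float:
        return mae([(models[name](user, item), r) for item, r in train.items()])
    return min(models, key=user_mae)


def precision_in_top_n(predicted: Dict[str, float], actual: Dict[str, float],
                       n: int, threshold: float = HIGH_THRESHOLD) -> float:
    """Share of the n highest-predicted items whose actual overall rating is high."""
    top = sorted(predicted, key=predicted.get, reverse=True)[:n]
    return sum(actual[item] >= threshold for item in top) / len(top)


# Toy data (hypothetical): one user's training ratings and two stand-in models.
train = {"poi1": 4.0, "poi2": 2.0, "poi3": 5.0}
models: Dict[str, Predictor] = {
    "overall_cf": lambda u, i: {"poi1": 3.5, "poi2": 2.5, "poi3": 4.5}[i],
    "multicriteria_cf": lambda u, i: {"poi1": 3.0, "poi2": 3.0, "poi3": 4.0}[i],
}
print(choose_model("u1", train, models))  # overall_cf (MAE 0.5 vs. 1.0)
print(precision_in_top_n({"a": 4.8, "b": 4.1, "c": 3.0},
                         {"a": 5.0, "b": 3.0, "c": 4.0}, n=2))  # 0.5
```

Swapping the lambdas for real CF predictors (for example, user similarity computed from overall ratings versus similarity aggregated over the food/taste, price and service/mood criteria) would leave the selection and evaluation code unchanged.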

Isotopic Determination of Food Sources of Benthic Invertebrates in Two Different Macroalgal Habitats in the Korean Coasts (동위원소 분석에 의한 동해와 남해 연안의 상이한 해조류 군락에 서식하는 저서무척추동물 먹이원 평가)

  • Kang, Chang-Keun;Choy, Eun-Jung;Song, Haeng-Seop;Park, Hyun-Je;Soe, In-Soo;Jo, Q-Tae;Lee, Kun-Seop
    • The Sea:JOURNAL OF THE KOREAN SOCIETY OF OCEANOGRAPHY / v.12 no.4 / pp.380-389 / 2007
  • Stable carbon and nitrogen isotopes were analyzed in suspended particulate organic matter, macroalgae and macrobenthic invertebrates to determine the importance of primary organic matter sources in supporting the food webs of rocky subtidal and intertidal macroalgal beds along the Korean coasts. Investigations were conducted in May and June 2005 at intertidal sites within Gwangyang Bay, a semi-enclosed, eutrophic bay, and at subtidal sites on the east coast, a relatively oligotrophic, open environment. Water-column suspension feeders showed more negative $\delta^{13}C$ values than the other feeding guilds, indicating a trophic link with phytoplankton and thereby association with pelagic food chains. In contrast, animals of the other feeding guilds, including interface suspension feeders, herbivores, deposit feeders, omnivores and predators, displayed less negative $\delta^{13}C$ values than the water-column suspension feeders, similar to those of macroalgae, indicating exclusive use of macroalgae-derived organic matter and association with benthic food chains. Most of the macrobenthic species were considered to form strong trophic links with benthic food chains. In addition, the higher $\delta^{15}N$ values of macrobenthic consumers and macroalgae at the intertidal sites of Gwangyang Bay than at the subtidal sites of the east coast suggest that anthropogenic nutrients may enhance macroalgal production at the intertidal sites and, in turn, be incorporated into the littoral food web of Gwangyang Bay. These results confirm the dominant role of macroalgae in supporting rocky subtidal and intertidal food webs along the Korean coasts.
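The $\delta$ notation and the source-attribution reasoning above can be written out as follows. This is a textbook two-source linear mixing model, shown for illustration only. The abstract does not state that the authors used a mixing model. Values are expressed in per mil (‰) relative to a standard (Vienna Pee Dee Belemnite for carbon). The trophic fractionation term $\Delta$ is an analyst-supplied assumption; for carbon it is usually taken to be small, around 1‰ or less per trophic level.

$$\delta^{13}C = \left(\frac{R_{\mathrm{sample}}}{R_{\mathrm{standard}}} - 1\right) \times 1000, \qquad R = \frac{{}^{13}C}{{}^{12}C}$$

$$f_{\mathrm{macroalgae}} = \frac{\delta^{13}C_{\mathrm{consumer}} - \Delta - \delta^{13}C_{\mathrm{phytoplankton}}}{\delta^{13}C_{\mathrm{macroalgae}} - \delta^{13}C_{\mathrm{phytoplankton}}}, \qquad f_{\mathrm{phytoplankton}} = 1 - f_{\mathrm{macroalgae}}$$

A consumer whose fractionation-corrected value lies close to the macroalgal end-member gives $f_{\mathrm{macroalgae}}$ near 1, the pattern reported for the non-water-column feeding guilds. The more negative values of water-column suspension feeders move $f$ toward the phytoplankton end-member.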

Current feeding practices and maternal nutritional knowledge on complementary feeding in Korea (이유기 보충식 현황과 어머니 인식 조사)

  • Yom, Hye Won;Seo, Jeong Wan;Park, Hyesook;Choi, Kwang Hae;Chang, Ju Young;Ryoo, Eell;Yang, Hye Ran;Kim, Jae Young;Seo, Ji Hyun;Kim, Yong Joo;Moon, Kyung Rye;Kang, Ki Soo;Park, Kie Young;Lee, Seong Soo;Shim, Jeong Ok
    • Clinical and Experimental Pediatrics / v.52 no.10 / pp.1090-1102 / 2009
  • Purpose: To evaluate current feeding practices and maternal nutritional knowledge of complementary feeding. Methods: Mothers of babies aged 9-15 months who visited the pediatric clinics of 14 general hospitals between September and December 2008 were asked to complete questionnaires. Data from 1,078 questionnaires were analyzed. Results: Complementary food was introduced at 4-7 months in 89% of babies. Home-made rice gruel was the first complementary food in 93% of cases. Spoons were used for initial feeding in 97% of cases. At 6-7 months, <50% of babies were fed meat (beef, 43%). Babies younger than 12 months were fed salty foods such as salted laver (35%) or bean-paste soup (51%), as well as cow's milk (11%). Mothers' sources of information on complementary feeding were books/magazines (58%), friends (30%), internet web sites (29%), relatives (14%), and hospitals (4%). Compared with the 1993 survey, the rates of complementary food introduction before 4 months (0.4% vs. 21%) and of initial use of commercial food (7% vs. 39%) had decreased. Moreover, spoons were increasingly used for initial feeding (97% vs. 57%). The average maternal nutritional knowledge score was 7.5/10. Relatively fewer mothers agreed with the following recommendations: weaning from bottle formula before 15-18 months (68%), no commercial baby drinks as complementary food (67%), formula (or cow's milk) being better than soy milk (65%), and feeding minced meat from 6-7 months (57%). Conclusion: Complementary feeding practices have improved considerably since the 1993 survey. Pediatricians should advise timely introduction of appropriate complementary foods and monitor the diverse information sources on complementary feeding.

Gene Expression Analysis of Inducible cAMP Early Repressor (ICER) Gene in Longissimus dorsi of High- and Low Marbled Hanwoo Steers (한우 등심부위 근육 내 조지방함량에 따른 inducible cAMP early repressor (ICER) 유전자발현 분석)

  • Lee, Seung-Hwan;Kim, Nam-Kuk;Kim, Sung-Kon;Cho, Yong-Min;Yoon, Du-hak;Oh, Sung-Jong;Im, Seok-Ki;Park, Eung-Woo
    • Journal of Life Science / v.18 no.8 / pp.1090-1095 / 2008
  • Marbling (intramuscular fat) is an important factor in determining meat quality in the Korean beef market. A grain-based finishing system for improving marbling leads to inefficient meat production because of excessive fat production. Identifying intramuscular fat-specific genes could enable more targeted meat production through alternative genetic improvement programs such as marker-assisted selection (MAS). We carried out ddRT-PCR on 12- and 27-month-old Hanwoo steers and detected a 300 bp PCR product of the inducible cAMP early repressor (ICER) gene, which was highly expressed at 27 months. A 1.5 kb sequence was re-sequenced using primers designed based on the Hanwoo EST sequence. We then predicted the open reading frame (ORF) of the ICER gene using the ORF Finder web program. The tissue distribution of ICER gene expression was analysed in eight Hanwoo tissues using real-time PCR. ICER expression was highest in the small intestine, followed by the longissimus dorsi. Interestingly, ICER was expressed 2.5 times higher in the longissimus dorsi than in another muscle of the same type, the rump. For gene expression analysis in high- and low-marbled individuals, we selected 4 and 3 animals, respectively, based on muscle crude fat content (17-32% in the high group and 6-7% in the low group). ICER gene expression was analysed using an ANOVA model. Marbling (muscle crude fat content) was affected by ICER gene expression (P=0.012). In particular, ICER expression was 4 times higher in the high group (n=4) than in the low group (n=3). Therefore, ICER may be a functional candidate gene related to marbling in Hanwoo.
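Real-time PCR fold differences such as the "2.5 times" and "4 times" figures above are often computed with the comparative Ct (2^-ΔΔCt) method. The abstract does not state which quantification method was used, so the following is only an illustrative Python sketch under that assumption. The reference gene and every Ct value are hypothetical. They were chosen so the output echoes the reported fourfold difference and are not the study's data.

```python
# Hypothetical sketch of the comparative Ct (2^-ΔΔCt) method for relative
# expression. The abstract does not name its quantification method; the
# reference gene and all Ct values below are illustrative assumptions.
from statistics import mean
from typing import Sequence


def delta_ct(target_ct: Sequence[float], reference_ct: Sequence[float]) -> float:
    """Mean ΔCt = Ct(target) - Ct(reference) over paired samples."""
    return mean(t - r for t, r in zip(target_ct, reference_ct))


def fold_change(test_target: Sequence[float], test_ref: Sequence[float],
                ctrl_target: Sequence[float], ctrl_ref: Sequence[float]) -> float:
    """Expression in the test group relative to the control group: 2^-ΔΔCt."""
    dd_ct = delta_ct(test_target, test_ref) - delta_ct(ctrl_target, ctrl_ref)
    return 2 ** (-dd_ct)


# Hypothetical Ct values: 4 high-marbling animals vs. 3 low-marbling animals.
high_icer, high_ref = [24.1, 23.9, 24.0, 24.0], [18.0, 18.1, 17.9, 18.0]
low_icer, low_ref = [26.0, 26.1, 25.9], [18.0, 18.0, 18.0]
print(round(fold_change(high_icer, high_ref, low_icer, low_ref), 2))  # 4.0
```

A lower ΔCt means the target crosses the detection threshold earlier relative to the reference gene. Each cycle of difference corresponds to roughly a twofold change under the method's assumption of near-100% amplification efficiency.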

Stock-Index Invest Model Using News Big Data Opinion Mining (뉴스와 주가 : 빅데이터 감성분석을 통한 지능형 투자의사결정모형)

  • Kim, Yoo-Sin;Kim, Nam-Gyu;Jeong, Seung-Ryul
    • Journal of Intelligence and Information Systems / v.18 no.2 / pp.143-156 / 2012
  • People readily believe that news and the stock index are closely related. They think that obtaining news before anyone else can help them forecast stock prices and earn large profits, or at least capture investment opportunities. However, it is no easy feat to determine to what extent the two are related, to make investment decisions based on news, or to verify that such investment information is valid. If the significance of news and its impact on the stock market are analyzed, it will be possible to extract information that can assist investment decisions. In reality, however, the world is inundated with a massive wave of news in real time, and news is unstructured text. This study proposes a stock-index investment model based on "News Big Data" opinion mining that systematically collects, categorizes and analyzes news and creates investment information. To verify the validity of the model, the relationship between the results of news opinion mining and the stock index was analyzed empirically using statistics. The mining steps that convert news into information for investment decision making are as follows. First, news obtained from a news provider that collects news in real time is indexed. Not only the contents of the news but also other information such as the media outlet, time, and news type is collected and classified, and then reworked into variables from which investment decisions can be inferred. Next, the text of each article is separated into morphemes to derive words whose polarity can be judged, and each word is tagged as positive or negative by comparing it with a sentiment dictionary. Third, the positive/negative polarity of each article is judged using the indexed classification information and a scoring rule, and the final investment decision information is derived according to daily scoring criteria.
For this study, the KOSPI index and its fluctuation range were collected from the Korea Exchange for the 63 trading days in the three months from July to September 2011. News data were collected by parsing 766 articles by economic news media company M, taken from those posted under Stock Information > News > Main News on the portal site Naver.com. Over the three months, the stock price index rose on 33 days and fell on 30 days, and the news comprised 197 articles published before the market opened, 385 during the session, and 184 after the market closed. The results of mining the collected news and comparing it with stock prices showed that the positive/negative opinion of news content was significantly related to stock prices. Changes in the stock price index were also better explained when news opinion was expressed as a positive/negative ratio rather than as a simple positive-or-negative judgment. To check whether news affected stock price fluctuations, or at least preceded them, changes in stock prices were compared only with news published before the market opened, and this relationship was also statistically significant. In addition, news covers various types of information, such as social, economic, and overseas news, corporate earnings, industry conditions, market outlook, and current market conditions, so its influence on the stock market, or the significance of the relationship, was expected to differ by news type. Each type of news was therefore compared with stock price fluctuations, and the results showed that market condition, outlook, and overseas news were the most useful in explaining stock price fluctuations.
By contrast, news about individual companies was not statistically significant, but its opinion mining values tended to move opposite to stock prices, possibly because of promotional, planned news released to prevent stock prices from falling. Finally, multiple regression and logistic regression analyses were carried out to derive an investment decision function based on the relationship between positive/negative news opinion and stock prices. The results showed that the regression equation using market condition, outlook, and overseas news published before the market opened was statistically significant. The classification accuracy of the logistic regression was 70.0% for rises in stock prices, 78.8% for falls, and 74.6% on average. This study first analyzed the relationship between news and stock prices by quantifying the sentiment of unstructured news content using opinion mining, one of the big data analysis techniques. It further proposed and verified a smart investment decision-making model that systematically performs opinion mining and derives investment information to support decisions. This shows that news can be used as a variable to predict the stock price index for investment, and the model is expected to serve as a real investment support system if it is implemented and verified in the future.
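The polarity-tagging and daily-scoring steps can be sketched as follows. This is a hypothetical Python illustration, not the study's system. Whitespace splitting stands in for Korean morpheme analysis, and `SENTIMENT_DICT` and the sample headlines are invented. The sketch contrasts the positive/negative ratio, which the study found more informative, with the simpler binary judgment.

```python
# Hypothetical sketch of dictionary-based news polarity scoring. Assumptions:
# whitespace tokens stand in for the study's morpheme analysis, and the tiny
# English sentiment dictionary and sample headlines are invented.
from typing import Dict, List, Tuple

SENTIMENT_DICT: Dict[str, int] = {
    "rise": 1, "gain": 1, "growth": 1,
    "fall": -1, "loss": -1, "slump": -1,
}


def count_polarity(text: str) -> Tuple[int, int]:
    """Return (positive, negative) counts of dictionary words in one article."""
    pos = neg = 0
    for token in text.lower().split():
        score = SENTIMENT_DICT.get(token.strip(".,!?\"'"), 0)
        if score > 0:
            pos += 1
        elif score < 0:
            neg += 1
    return pos, neg


def daily_positive_ratio(articles: List[str]) -> float:
    """Positive share of all polar words in a day's articles (0.5 if none)."""
    pos = neg = 0
    for article in articles:
        p, n = count_polarity(article)
        pos += p
        neg += n
    return pos / (pos + neg) if pos + neg else 0.5


pre_market_news = [
    "Exports show strong growth and a gain in new orders.",
    "Chip prices slump on weak demand.",
]
ratio = daily_positive_ratio(pre_market_news)
label = "positive" if ratio > 0.5 else "negative"  # the simpler binary judgment
print(round(ratio, 3), label)  # 0.667 positive
```

The ratio keeps information that the binary label discards: a day scoring 0.51 and a day scoring 0.95 both become "positive", but only the ratio can distinguish them as a regression input.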

Effect of Whalakyuoleyng-dan plus Yinsamyangwui-tang on Anti-angiogenesis (활락효영단합인삼양위탕(活絡效靈丹合人蔘養胃湯)이 혈관신생(血管新生) 억제(抑制)에 미치는 영향(影響))

  • Ko, Ki-Wan;Park, Joon-Hyuk;Kang, Hee;Kim, Sung-Hoon;Yu, Young-Beob;Shim, Bum-Sang;Choi, Seung-Hoon;Ahn, Koo-Seok
    • THE JOURNAL OF KOREAN ORIENTAL ONCOLOGY / v.7 no.1 / pp.77-97 / 2001
  • Anti-angiogenesis is one of the therapies that have been highlighted in cancer treatment research. Angiogenesis is the formation of new blood vessels from existing capillaries, and it is an important process in the invasion and metastasis of cancer as tumors form and grow. Since angiogenesis came under research, curative treatments against cancer have been proposed that block metastasis, delay the growth of cancer cells, and cut off the supply of oxygen and nutrients through the network of blood vessels. Several anti-angiogenic agents are currently known, such as thalidomide, angiostatin, endostatin, 2-methoxyestradiol, TNP-470, and marimastat. Additionally, 17 clinical trials of anti-angiogenic agents are in progress at the NCI (National Cancer Institute). In particular, TNP-470 showed efficacy against cancer in clinical trials after completing animal testing. Previous studies have shown that Yinsamyangwui-tang is effective in strengthening body resistance. They have also shown that Whallakhyoleyng-dan acts on vascular endothelial cells by inhibiting cell adhesion during the suppression of blood vessel formation. Based on these findings, I investigated the effect of Whallakhyoleyng-dan plus Yinsamyangwui-tang on angiogenesis. The experiments used SK-Hep-1 (KCLB 30052), A549 (KCLB 10185), AGS (KCLB 21739), and BCE (bovine capillary endothelial) cells. The results of the experiments are as follows: 1. In the anti-cancer activity assay, Whallakhyoleyng-dan plus Yinsamyangwui-tang decreased the number of cancer cells. At $600\,\mu g/ml$, the treated groups decreased by 3.1% more than the SK-Hep-1 control group, 49.7% more than the A549 control group, and 31.0% more than the AGS control group.
2. In the DNA synthesis assay on BCE and cancer cells, Whallakhyoleyng-dan plus Yinsamyangwui-tang at $600\,\mu g/ml$ inhibited DNA synthesis relative to the control group by 59.1% in SK-Hep-1, 72.6% in A549, 6.1% in AGS, and 28.9% in BCE cells. 3. In the BCE cell angiogenesis assay, angiogenesis was inhibited at $400\,\mu g/ml$ over 18 hours of observation. 4. In the aortic ring assay, angiogenesis was reduced to half the level of the control group at $400\,\mu g/ml$ and to under 10% of the control level at $800\,\mu g/ml$, and was completely inhibited at $1600\,\mu g/ml$. These results indicate that Whallakhyoleyng-dan plus Yinsamyangwui-tang has an anti-angiogenic effect.

Social Network Analysis for the Effective Adoption of Recommender Systems (추천시스템의 효과적 도입을 위한 소셜네트워크 분석)

  • Park, Jong-Hak;Cho, Yoon-Ho
    • Journal of Intelligence and Information Systems / v.17 no.4 / pp.305-316 / 2011
  • A recommender system uses automated information filtering technology to recommend products or services to customers who are likely to be interested in them. Such systems are widely used by many different Web retailers such as Amazon.com, Netflix.com, and CDNow.com. Various recommender systems have been developed. Among them, Collaborative Filtering (CF) is known as the most successful and most commonly used approach. CF identifies customers whose tastes are similar to those of a given customer and recommends items those customers have liked in the past. Numerous CF algorithms have been developed to improve the performance of recommender systems. However, the relative performance of CF algorithms is known to be domain- and data-dependent. Implementing and launching a CF recommender system is very time-consuming and expensive. A system unsuited to the given domain also provides customers with poor-quality recommendations that easily annoy them. Therefore, predicting in advance whether the performance of a CF recommender system will be acceptable is of practical importance. In this study, we propose a decision-making guideline that helps decide whether CF is adoptable for a given application with certain transaction data characteristics. Several previous studies reported that sparsity, gray sheep, cold-start, coverage, and serendipity could affect the performance of CF, but theoretical and empirical justification of such factors is lacking. Recently, many studies have paid attention to Social Network Analysis (SNA) as a method for analyzing social relationships among people. SNA measures and visualizes the linkage structure and status of objects, focusing on their interactions within a communication group. CF analyzes the similarity among the previous ratings or purchases of each customer, finds the relationships among similar customers, and then uses these relationships for recommendations.
Thus, CF can be modeled as a social network in which customers are nodes and purchase relationships between customers are links. We assume that SNA could facilitate exploration of the topological properties of the network structure implicit in transaction data for CF recommendations. Under this assumption, we focus on density, clustering coefficient, and centralization, which are among the most commonly used measures of the topological properties of a social network structure. While network density, expressed as a proportion of the maximum possible number of links, captures the density of the whole network, the clustering coefficient captures the degree to which the overall network contains localized pockets of dense connectivity. Centralization reflects the extent to which connections are concentrated in a small number of nodes rather than distributed equally among all nodes. We explore how these SNA measures affect CF performance and how they interact with each other. Our experiments used sales transaction data from H department store, one of the well-known department stores in Korea. In total, 396 data sets were sampled to construct various types of social networks. The process for measuring the independent variables consists of three steps: analysis of customer similarities, construction of a social network, and analysis of social network patterns. We used UCINET 6.0 for SNA. The experiments used a 3-way ANOVA with the three SNA measures as independent variables and recommendation accuracy, measured by the F1-measure, as the dependent variable. The experiments show that 1) each of the three SNA measures affects recommendation accuracy, 2) the effect of density on performance overrides those of the clustering coefficient and centralization (i.e., adopting CF is not a good decision if density is low), and 3) even when density is low, CF performs comparatively well when the clustering coefficient is low.
We expect these experimental results to help firms decide whether a CF recommender system is adoptable for their business domain, given the characteristics of their transaction data.
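The three network measures can be computed directly from a customer graph. The Python sketch below is illustrative and rests on assumptions not stated in the abstract. Customers are linked when the Jaccard similarity of their purchase sets reaches a chosen threshold; the function `build_network` and its `threshold` are hypothetical. Clustering is the average local coefficient, and centralization is Freeman degree centralization. UCINET 6.0 may compute different variants of these measures.

```python
# Hypothetical sketch of the three SNA measures on an undirected customer
# network. Assumptions: customers are linked when the Jaccard similarity of
# their purchase sets reaches a threshold; clustering is the average local
# coefficient (0 for nodes with fewer than two neighbours); centralization
# is Freeman degree centralization. UCINET's own definitions may differ.
from itertools import combinations
from typing import Dict, Set

Graph = Dict[str, Set[str]]


def build_network(purchases: Dict[str, Set[str]], threshold: float) -> Graph:
    """Link two customers if |A & B| / |A | B| >= threshold (Jaccard similarity)."""
    g: Graph = {c: set() for c in purchases}
    for a, b in combinations(purchases, 2):
        union = purchases[a] | purchases[b]
        if union and len(purchases[a] & purchases[b]) / len(union) >= threshold:
            g[a].add(b)
            g[b].add(a)
    return g


def density(g: Graph) -> float:
    """Links present as a share of the n(n-1)/2 possible links."""
    n = len(g)
    links = sum(len(nbrs) for nbrs in g.values()) / 2
    return links / (n * (n - 1) / 2)


def avg_clustering(g: Graph) -> float:
    """Mean local clustering coefficient over all nodes."""
    total = 0.0
    for nbrs in g.values():
        k = len(nbrs)
        if k >= 2:
            closed = sum(1 for a, b in combinations(nbrs, 2) if b in g[a])
            total += closed / (k * (k - 1) / 2)
    return total / len(g)


def degree_centralization(g: Graph) -> float:
    """Freeman degree centralization (needs n >= 3): 1 for a star, 0 if all degrees are equal."""
    n = len(g)
    degrees = [len(nbrs) for nbrs in g.values()]
    top = max(degrees)
    return sum(top - d for d in degrees) / ((n - 1) * (n - 2))


purchases = {"c1": {"x", "y"}, "c2": {"x", "y", "z"},
             "c3": {"x", "y", "w"}, "c4": {"x", "y", "v"}}
for t in (0.5, 0.6):
    net = build_network(purchases, threshold=t)
    print(t, density(net), avg_clustering(net), degree_centralization(net))
# 0.5 1.0 1.0 0.0  (every pair linked)
# 0.6 0.5 0.0 1.0  (a star centred on c1)
```

The two thresholds produce opposite extremes from the same toy data. This shows how the similarity cut-off used to build the network shapes the density, clustering and centralization values that then enter the ANOVA.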

Moderating Effect of Lifestyle on Consumer Behavior of Loungewear with Korean Traditional Fashion Design Elements (소비자대함유한국전통시상설계원소적편복적소비행위지우생활방식적조절작용(消费者对含有韩国传统时尚设计元素的便服的消费行为之于生活方式的调节作用))

  • Ko, Eun-Ju;Lee, Jee-Hyun;Kim, Angella Ji-Young;Burns, Leslie Davis
    • Journal of Global Scholars of Marketing Science / v.20 no.1 / pp.15-26 / 2010
  • Owing to globalization across various industries and cultural exchange among many countries, Oriental concepts have been attracting worldwide attention. In the fashion industry, a country's traditional culture is often developed into a theme for designers' creations and has become a strong strategy for standing out among competitors. Because of the growing preference for Oriental images, opportunities abound to introduce traditional fashion goods and expand culture-based business into global fashion markets. However, global fashion brands that incorporate Korean traditional culture are yet to be developed. In order to develop a global fashion brand with a Korean flavor, it is very important for native consumers to accept their own culture in the domestic apparel market before expanding into foreign markets. Loungewear is considered appropriate for adopting Korean traditional details into clothing, since this wardrobe category serves various purposes, which should lead easily to natural adoption and widespread use. Also, this market is seeing increased demand for multipurpose wardrobes and fashionable underwear (Park et al. 2009). Despite rapid growth in the loungewear market, studies specific to loungewear are rare. Research on developing modernized traditional clothing, fashion items and brands also does not always include the loungewear category. Therefore, this study investigated the Korean loungewear market and consumer evaluations of loungewear with Korean traditional fashion design elements. Relationships among the antecedents of purchase intention for loungewear with Korean traditional fashion design elements were analyzed and compared between lifestyle groups for consumer targeting purposes.
Product quality, retail service quality, perceived value, and preference for loungewear with Korean traditional design elements were chosen as antecedents of purchase intention, and a structural equation model was designed to examine their relationships as well as their influence on purchase intention. Product quality and retail service quality, elements of the marketing mix, were employed as factors affecting preference for and perceived value of loungewear with Korean traditional fashion design elements. The effects of preference and perceived value on purchase intention were also examined in the same model. A total of 357 self-administered questionnaires were completed by female consumers via a web survey system. The questionnaire measured respondents' lifestyle, product and retail service quality as purchasing criteria, perceived value, preference, and purchase intention regarding loungewear with Korean traditional fashion design elements. Loungewear purchasing and usage behavior were also surveyed to examine the status of the Korean loungewear market. Data were analyzed through descriptive analysis, factor analysis, cluster analysis, and ANOVA, and the structural equation model was tested using AMOS 7.0. Regarding the status of the Korean loungewear market, most consumers in our sample had purchased loungewear. Loungewear is currently recognized as clothing worn at home, and consumers show comparatively low involvement with it. Most consumers in this study purchase loungewear only two to three times a year and spend less than US$10. A total of 12 items and four factors of loungewear consumer lifestyle were found: traditional value oriented lifestyle, brand-affected lifestyle, pursuit of leisure lifestyle, and health oriented lifestyle. Based on these lifestyle factors, loungewear consumers were classified into two groups: Well-being and Conservative.
Relationships among the constructs of purchasing behavior related to loungewear with Korean traditional fashion design elements were estimated. Preference for and perceived value of loungewear were affected by both product quality and retail service quality. This study showed that high product quality and retail service quality foster positive preference toward loungewear. Perceived value of and preference for loungewear positively influenced purchase intention. The results indicated that high preference and perceived value of loungewear with Korean traditional fashion design elements strengthen purchase intention, demonstrating the importance of developing preference and raising perceived value in order to make sales. In a model comparison between the two lifestyle groups, Well-being and Conservative, the results showed that product quality and retail service quality had positive influences on both preference and perceived value in the Well-being group. However, in the Conservative group, only retail service quality had a positive effect on preference and, through it, on purchase intention. Since the Well-being group showed more significant influences on purchase intention, loungewear brands with Korean traditional fashion design elements may want to focus on the characteristics of the Well-being group. However, the relationship between preference and purchase intention was stronger in the Conservative group, so such brands should focus on creating positive preference for loungewear among conservative consumers. The results offer information on the lifestyles of Korean loungewear consumers and are useful for fashion brands planning to enter the Korean loungewear market, particularly those targeting female consumers similar to the sample of the present study.
This study offers strategic and marketing insight for loungewear brands and for fashion brands planning to create highly value-added fashion brands with Korean traditional fashion design elements. Considering the different lifestyle groups associated with loungewear or traditional fashion goods, brand managers and marketers can use the results of this paper as a reference for positioning, targeting, and building marketing strategies.

Clickstream Big Data Mining for Demographics based Digital Marketing (인구통계특성 기반 디지털 마케팅을 위한 클릭스트림 빅데이터 마이닝)

  • Park, Jiae;Cho, Yoonho
    • Journal of Intelligence and Information Systems / v.22 no.3 / pp.143-163 / 2016
  • The demographics of Internet users are the most basic and important sources for target marketing or personalized advertisements on digital marketing channels such as email, mobile, and social media. However, it has gradually become difficult to collect the demographics of Internet users because their activities are anonymous in many cases. Although marketing departments can obtain demographics through online or offline surveys, these approaches are very expensive and time-consuming and are likely to include false statements. Clickstream data are the records an Internet user leaves behind while visiting websites. As the user clicks anywhere on a webpage, the activity is logged in semi-structured website log files. Such data allow us to see what pages users visited, how long they stayed there, how often they visited, when they usually visited, which sites they preferred, what keywords they used to find the site, whether they purchased anything, and so forth. For this reason, some researchers have tried to infer the demographics of Internet users from their clickstream data. They derived various independent variables likely to be correlated with demographics. The variables include search keywords, frequency and intensity by time of day, day and month, variety of websites visited, text information from web pages visited, etc. The demographic attributes predicted also vary across papers and include gender, age, job, location, income, education, marital status, and presence of children. A variety of data mining methods, such as LSA, SVM, decision trees, neural networks, logistic regression, and k-nearest neighbors, were used to build prediction models. However, this body of research has not yet identified which data mining method is appropriate for predicting each demographic variable. Moreover, the independent variables studied so far need to be reviewed, combined as needed, and evaluated to build the best prediction model.
The objective of this study is to select, based on the results of previous research, the clickstream attributes most likely to be correlated with demographics, and then to identify which data mining method is best suited to predicting each demographic attribute. Among the demographic attributes, this paper focuses on predicting gender, age, marital status, residence, and job. Based on previous research, 64 clickstream attributes are used to predict these demographic attributes. The overall process of predictive model building consists of four steps. In the first step, we create user profiles that include the 64 clickstream attributes and 5 demographic attributes. The second step reduces the dimensionality of the clickstream variables to address the curse of dimensionality and overfitting, using three approaches based on decision trees, PCA, and cluster analysis. In the third step, we build alternative predictive models for each demographic variable using SVM, neural networks, and logistic regression. The last step evaluates the alternative models in terms of accuracy and selects the best one. For the experiments, we used clickstream data covering 5 demographic attributes and 16,962,705 online activities of 5,000 Internet users. IBM SPSS Modeler 17.0 was used for the prediction process, and 5-fold cross-validation was conducted to improve the reliability of the experiments. The experimental results verify that a specific data mining method is well suited to each demographic variable. For example, age is best predicted using decision-tree-based dimension reduction with a neural network, whereas gender and marital status are predicted most accurately by SVM without dimension reduction.
We conclude that Internet users' online behavior, captured through clickstream data analysis, can be used effectively to predict their demographics and can thereby support digital marketing.
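The last step of the pipeline above (compare alternative models under 5-fold cross-validation and keep the most accurate) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the study's IBM SPSS Modeler workflow: the two classifiers (a majority-class baseline and k-nearest neighbors) are hypothetical stand-ins for SVM, neural networks, and logistic regression, the dimension-reduction step is omitted, fold assignment by index modulo is an assumption, and `TOY_PROFILES` is invented toy data.

```python
# Hypothetical sketch: k-fold model selection with stand-in classifiers.
from collections import Counter
from math import dist
from typing import Callable, Sequence

Profile = tuple[float, ...]
# A model takes training profiles, their labels, and one profile to classify.
Model = Callable[[Sequence[Profile], Sequence[str], Profile], str]

# Hypothetical toy profiles: (share of visits to site category A, share to
# category B) paired with a label. These are NOT data from the study.
TOY_PROFILES: list[tuple[Profile, str]] = [
    ((0.90, 0.10), "M"), ((0.20, 0.80), "F"), ((0.80, 0.20), "M"),
    ((0.10, 0.90), "F"), ((0.85, 0.15), "M"), ((0.15, 0.85), "F"),
    ((0.75, 0.30), "M"), ((0.30, 0.70), "F"), ((0.95, 0.05), "M"),
    ((0.25, 0.75), "F"),
]


def majority_baseline(train_x: Sequence[Profile], train_y: Sequence[str], x: Profile) -> str:
    # Ignores the features and predicts the most frequent training label.
    return Counter(train_y).most_common(1)[0][0]


def knn(k: int) -> Model:
    def predict(train_x: Sequence[Profile], train_y: Sequence[str], x: Profile) -> str:
        # Vote among the k training profiles closest in Euclidean distance.
        nearest = sorted(range(len(train_x)), key=lambda i: dist(train_x[i], x))[:k]
        return Counter(train_y[i] for i in nearest).most_common(1)[0][0]
    return predict


def cross_val_accuracy(model: Model, xs: Sequence[Profile], ys: Sequence[str], folds: int = 5) -> float:
    correct = 0
    for f in range(folds):
        # Profiles whose index i satisfies i % folds == f form the held-out fold.
        test_idx = [i for i in range(len(xs)) if i % folds == f]
        train_x = [xs[i] for i in range(len(xs)) if i % folds != f]
        train_y = [ys[i] for i in range(len(xs)) if i % folds != f]
        correct += sum(model(train_x, train_y, xs[i]) == ys[i] for i in test_idx)
    return correct / len(xs)


def select_best(models: dict[str, Model], xs: Sequence[Profile], ys: Sequence[str],
                folds: int = 5) -> tuple[str, dict[str, float]]:
    # Score every candidate model and keep the one with the highest CV accuracy.
    scores = {name: cross_val_accuracy(m, xs, ys, folds) for name, m in models.items()}
    return max(scores, key=scores.get), scores


xs = [p for p, _ in TOY_PROFILES]
ys = [label for _, label in TOY_PROFILES]
best, scores = select_best({"majority": majority_baseline, "3-nn": knn(3)}, xs, ys)
print(best, scores)  # prints: 3-nn {'majority': 0.5, '3-nn': 1.0}
```

Passing models as plain functions keeps the selection loop independent of any one algorithm, so other classifiers could be dropped into the `models` dictionary without changing the cross-validation code.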

Analysis of Twitter for 2012 South Korea Presidential Election by Text Mining Techniques (텍스트 마이닝을 이용한 2012년 한국대선 관련 트위터 분석)

  • Bae, Jung-Hwan;Son, Ji-Eun;Song, Min
    • Journal of Intelligence and Information Systems
    • /
    • v.19 no.3
    • /
    • pp.141-156
    • /
    • 2013
  • Social media is a representative form of Web 2.0 that changes users' information behavior by allowing them to produce their own content without any expert skills. In particular, as a new communication medium, it has a profound impact on social change by enabling users to communicate their opinions and thoughts to both the masses and their acquaintances. Social media data plays a significant role in the emerging Big Data arena. Research areas such as social network analysis and opinion mining have therefore sought to discover meaningful information in the vast amounts of data buried in social media. Social media has recently become a main focus of Information Retrieval and Text Mining because it not only produces massive unstructured textual data in real time but also serves as an influential channel for opinion leading. However, most previous studies have adopted broad-brush and limited approaches, which have made it difficult to find and analyze new information. To overcome these limitations, we developed a real-time Twitter trend mining system that captures trends by processing large Twitter stream datasets in real time. The system offers term co-occurrence retrieval, visualization of Twitter users by query, similarity calculation between two users, topic modeling to track changes in topical trends, and mention-based user network analysis. In addition, we conducted a case study on the 2012 Korean presidential election. We collected 1,737,969 tweets containing the candidates' names and the term 'election' from Twitter in Korea (http://www.twitter.com/) over one month in 2012 (October 1 to October 31). The case study shows that the system provides useful information and detects social trends effectively. The system also retrieves the list of terms that co-occur with given query terms.
We compare the results of term co-occurrence retrieval using the names of the influential candidates, 'Geun Hae Park', 'Jae In Moon', and 'Chul Su Ahn', as query terms. General terms related to the presidential election, such as 'Presidential Election', 'Proclamation in Support', and 'Public opinion poll', appear frequently. The results also show specific terms that differentiate each candidate, such as 'Park Jung Hee' and 'Yuk Young Su' for the query 'Geun Hae Park', 'a single candidacy agreement' and 'Time of voting extension' for the query 'Jae In Moon', and 'a single candidacy agreement' and 'down contract' for the query 'Chul Su Ahn'. Using the multinomial Latent Dirichlet Allocation technique, our system not only extracts 10 topics along with related terms but also shows how these topics change over time. Each topic shows one of two patterns, a rising tendency or a falling tendency, depending on the change in its probability distribution. To determine the relationship between topic trends on Twitter and social issues in the real world, we compare topic trends with related news articles. We find that Twitter can track issues faster than other media such as newspapers. The user network in Twitter differs from those of other social media because of Twitter's distinctive way of forming relationships: Twitter users can form relationships by exchanging mentions. We visualize and analyze the mention-based network of 136,754 users, using the three candidates' names, 'Geun Hae Park', 'Jae In Moon', and 'Chul Su Ahn', as query terms. The results show that Twitter users mention all candidates' names regardless of their political tendencies. This case study shows that Twitter could be an effective tool for detecting and predicting dynamic changes in social issues, and that mention-based user networks could reveal different aspects of user behavior as a network unique to Twitter.
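Query-based term co-occurrence retrieval, as described in the Twitter abstract above, can be illustrated with a small counting sketch. It assumes tweets are already tokenized into terms (for Korean text this would require morphological analysis, which is not shown) and counts document-level co-occurrence, meaning each term is counted at most once per tweet; the function name `co_occurring_terms` and the toy tweets are hypothetical, and the paper's actual scoring may differ.

```python
from collections import Counter
from typing import Iterable

# Hypothetical toy tweets, already tokenized; NOT data from the study.
TOY_TWEETS: list[list[str]] = [
    ["candidate_a", "election", "poll"],
    ["candidate_a", "election", "support"],
    ["candidate_b", "election", "single_candidacy"],
    ["candidate_a", "poll", "election"],
    ["candidate_c", "single_candidacy"],
]


def co_occurring_terms(tweets: Iterable[list[str]], query: str,
                       top_k: int = 3) -> list[tuple[str, int]]:
    counts: Counter[str] = Counter()
    for terms in tweets:
        unique = set(terms)
        # Only tweets containing the query term contribute co-occurrences.
        if query in unique:
            # Count each other term at most once per tweet.
            counts.update(unique - {query})
    return counts.most_common(top_k)


print(co_occurring_terms(TOY_TWEETS, "candidate_a"))
# prints: [('election', 3), ('poll', 2), ('support', 1)]
```

Raw counts favor generic terms such as 'election', which matches the abstract's observation that general election terms appear frequently for every candidate; surfacing the candidate-specific terms would likely need a weighting such as comparing against each term's overall frequency.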
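The mention-based user network described in the Twitter abstract can be sketched as a weighted, directed edge list built from (author, text) pairs. The `@handle` regular expression, the decision to skip self-mentions, and the toy data are assumptions for illustration; the study's own network construction and visualization are not reproduced here.

```python
import re
from collections import Counter

# Assumption: a mention is "@" followed by word characters.
MENTION = re.compile(r"@(\w+)")

# Hypothetical (author, text) pairs; NOT data from the study.
TOY_MENTION_TWEETS = [
    ("u1", "@u2 @u3 debate tonight"),
    ("u2", "agree with @u3"),
    ("u3", "thanks @u1"),
    ("u1", "@u3 again"),
]


def mention_edges(tweets: list[tuple[str, str]]) -> Counter:
    """Return weighted directed edges {(author, mentioned_user): count}."""
    edges: Counter = Counter()
    for author, text in tweets:
        for target in MENTION.findall(text):
            if target != author:  # skip self-mentions
                edges[(author, target)] += 1
    return edges


def in_degree(edges: Counter) -> Counter:
    """Total number of times each user was mentioned by others."""
    degree: Counter = Counter()
    for (_, target), weight in edges.items():
        degree[target] += weight
    return degree


edges = mention_edges(TOY_MENTION_TWEETS)
print(in_degree(edges).most_common(1))  # prints: [('u3', 3)]
```

Keeping edge weights rather than a plain set of pairs preserves how often one user mentions another, which is the kind of information a mention-based network analysis would rely on.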
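The Twitter abstract classifies each topic's pattern over time as a rising or falling tendency. One simple way to make that distinction concrete is the sign of a least-squares slope fitted to the topic's probability per time step. This is an assumed criterion for illustration only, since the abstract does not state how the two patterns are determined; the "flat" label for a zero slope is an addition not among the paper's two patterns, and the input series are hypothetical.

```python
from typing import Sequence


def slope(values: Sequence[float]) -> float:
    """Least-squares slope of values against time steps 0, 1, ..., n-1."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two time steps")
    mean_t = (n - 1) / 2
    mean_v = sum(values) / n
    num = sum((t - mean_t) * (v - mean_v) for t, v in enumerate(values))
    den = sum((t - mean_t) ** 2 for t in range(n))
    return num / den


def trend(values: Sequence[float]) -> str:
    s = slope(values)
    if s > 0:
        return "rising"
    if s < 0:
        return "falling"
    return "flat"  # not one of the paper's two patterns; covers a zero slope


# Hypothetical per-day probabilities of one topic.
print(trend([0.05, 0.08, 0.12, 0.20]))  # prints: rising
print(trend([0.30, 0.22, 0.15, 0.10]))  # prints: falling
```

A fitted slope is less sensitive to a single noisy day than simply comparing the first and last values, though any real criterion would depend on how the topic probabilities are aggregated per time window.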