• Title/Summary/Keyword: 2 step cluster analysis

Clickstream Big Data Mining for Demographics based Digital Marketing (인구통계특성 기반 디지털 마케팅을 위한 클릭스트림 빅데이터 마이닝)

  • Park, Jiae; Cho, Yoonho
    • Journal of Intelligence and Information Systems / v.22 no.3 / pp.143-163 / 2016
  • The demographics of Internet users are the most basic and important source of information for target marketing and personalized advertising on digital marketing channels such as email, mobile, and social media. However, collecting Internet users' demographics has become increasingly difficult because their activities are often anonymous. Marketing departments can obtain demographics through online or offline surveys, but these approaches are expensive and time-consuming, and the answers may be false. Clickstream data is the record an Internet user leaves behind while visiting websites: each click on a web page is logged in semi-structured website log files. Such data shows which pages users visited, how long they stayed, how often and when they usually visited, which sites they prefer, what keywords they used to find a site, whether they purchased anything, and so forth. For this reason, some researchers have tried to infer the demographics of Internet users from their clickstream data. They derived various independent variables likely to be correlated with demographics, including search keywords, frequency and intensity of activity by time of day, day, and month, the variety of websites visited, and the text of visited web pages. The demographic attributes predicted also vary across papers, covering gender, age, job, location, income, education, marital status, and presence of children. Prediction models have been built with a variety of data mining methods, such as LSA, SVM, decision trees, neural networks, logistic regression, and k-nearest neighbors. However, previous research has not yet identified which data mining method is appropriate for predicting each demographic variable. Moreover, the independent variables studied so far need to be reviewed, combined as needed, and evaluated to build the best prediction model.
The objective of this study is to select the clickstream attributes most likely to be correlated with demographics, based on the results of previous research. It then identifies which data mining method is best suited to predicting each demographic attribute. Among the demographic attributes, this paper focuses on predicting gender, age, marital status, residence, and job. Based on previous research, 64 clickstream attributes are used as predictors. The overall process of predictive model building consists of four steps. In the first step, we create user profiles that include the 64 clickstream attributes and the 5 demographic attributes. The second step reduces the dimensionality of the clickstream variables to address the curse of dimensionality and overfitting, using three approaches based on decision trees, PCA, and cluster analysis. In the third step, we build alternative predictive models for each demographic variable using SVM, neural networks, and logistic regression. The last step evaluates the alternative models in terms of accuracy and selects the best one. For the experiments, we used clickstream data covering 5 demographic attributes and 16,962,705 online activities of 5,000 Internet users. IBM SPSS Modeler 17.0 was used for the prediction process, and 5-fold cross-validation was conducted to enhance the reliability of the experiments. The experimental results verify that each demographic variable has a specific data mining method that is well suited to it. For example, age is best predicted using decision-tree-based dimension reduction and a neural network. In contrast, gender and marital status are predicted most accurately by applying SVM without dimension reduction.
We conclude that Internet users' online behaviors, as captured through clickstream data analysis, can be used effectively to predict their demographics and thus to support digital marketing.
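
The four-step process above ends by comparing alternative models with 5-fold cross-validation and keeping the most accurate one. The sketch below shows only that selection step, assuming a plain-Python setting. The study itself used IBM SPSS Modeler 17.0 with SVM, neural networks, and logistic regression. Here, the two classifiers (nearest centroid and 1-nearest neighbor) and the helper names (`k_fold_indices`, `cv_accuracy`, `select_best`) are hypothetical stand-ins, and the data are synthetic rather than the study's clickstream profiles.

```python
import random
from statistics import mean

def k_fold_indices(n, k, seed=0):
    """Shuffle indices 0..n-1 and deal them into k nearly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nearest_centroid(train_X, train_y):
    """Hypothetical stand-in model: one mean vector per class, predict the closest."""
    centroids = {}
    for label in set(train_y):
        rows = [x for x, y in zip(train_X, train_y) if y == label]
        centroids[label] = [mean(col) for col in zip(*rows)]
    return lambda x: min(centroids, key=lambda c: sq_dist(x, centroids[c]))

def one_nn(train_X, train_y):
    """Hypothetical stand-in model: predict the label of the single nearest neighbor."""
    pairs = list(zip(train_X, train_y))
    return lambda x: min(pairs, key=lambda p: sq_dist(x, p[0]))[1]

def cv_accuracy(fit, X, y, k=5):
    """Mean accuracy over k folds: train on k-1 folds, score on the held-out fold."""
    scores = []
    for fold in k_fold_indices(len(X), k):
        held_out = set(fold)
        train = [i for i in range(len(X)) if i not in held_out]
        model = fit([X[i] for i in train], [y[i] for i in train])
        correct = sum(model(X[i]) == y[i] for i in fold)
        scores.append(correct / len(fold))
    return mean(scores)

def select_best(models, X, y, k=5):
    """Score every candidate with k-fold CV and return the most accurate one."""
    scores = {name: cv_accuracy(fit, X, y, k) for name, fit in models.items()}
    return max(scores, key=scores.get), scores

# Hypothetical synthetic data (NOT the study's clickstream profiles):
# 200 "users", 4 numeric features, and a binary target whose classes differ in mean.
rng = random.Random(42)
X, y = [], []
for _ in range(200):
    label = rng.choice(["A", "B"])
    shift = 1.0 if label == "A" else -1.0
    X.append([rng.gauss(shift, 1.0) for _ in range(4)])
    y.append(label)

best, scores = select_best({"nearest_centroid": nearest_centroid, "1-NN": one_nn}, X, y)
print(best, {name: round(s, 3) for name, s in scores.items()})
```

Running the selection separately for each demographic target (gender, age, and so on) mirrors the study's per-variable model choice. Any dimension-reduction step, such as PCA, should be fitted inside each training fold so that the held-out fold does not leak into the model.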

Implication of the Ratio of Exchangeable Cations in Mountain Wetlands (산지습지 치환성 양이온 함량비의 특성과 함의)

  • Shin, Young Ho; Kim, Sung Hwan; Rhew, Hosahang
    • Journal of the Korean Geographical Society / v.49 no.2 / pp.221-244 / 2014
  • We examined the geochemical properties of sediments in the Simjeok, Jangdo, and Hwaeomneup mountain wetlands, which are natural preservation areas, and suggest several implications. The geochemical properties of the wetland sediments show that all three wetlands are fens, but their distribution patterns differ from one another. Using two-step cluster analysis on the ratio of exchangeable cations, we classified the sediments into three sub-groups: Ca-dominated, Mg-dominated, and K-dominated types. Simjeok wetland has Ca-dominated sediments. The sediments of Jangdo wetland show both Mg-dominated and Ca-dominated characteristics. Hwaeomneup wetland is composed mainly of K-dominated sediments. The differences in these ratios reflect various environmental factors, such as geological, pedological, and vegetational settings. Because climate change and human impacts will affect these geochemical properties, they can serve as environmental indicators for mountain wetlands and can be used in wetland management. This scheme can also be used to classify mountain wetlands. Further work is therefore needed on the geochemical properties of wetland sediments and on classification schemes based on them. Such work would broaden our understanding of the geomorphic systems and ecosystems of mountain wetlands and help conserve them properly.
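
Broadly, two-step cluster analysis first compresses the records into many small pre-clusters in a single fast pass and then merges those pre-clusters hierarchically. The sketch below is a simplified, hypothetical analogue applied to cation ratios. It assumes each ratio is a cation's share of the summed Ca, Mg, and K, and it uses Euclidean distance, a leader-style pre-clustering pass, Ward-style merging, and a fixed three clusters. SPSS's TwoStep procedure differs: it builds a CF tree, can use a log-likelihood distance, and can choose the number of clusters automatically. The sample values are invented for illustration and are not the study's measurements.

```python
import math

# Cations named in the abstract; the ratio definition below is an assumption.
CATIONS = ("Ca", "Mg", "K")

def cation_ratios(sample):
    """Each cation's share of the summed exchangeable cations (assumed definition)."""
    total = sum(sample[c] for c in CATIONS)
    return [sample[c] / total for c in CATIONS]

def precluster(points, threshold):
    """Step 1: one pass; join the nearest sub-cluster within threshold, else open one."""
    subs = []  # each sub-cluster is [count, per-dimension sum]
    for p in points:
        best, best_d = None, threshold
        for s in subs:
            d = math.dist(p, [v / s[0] for v in s[1]])
            if d <= best_d:
                best, best_d = s, d
        if best is None:
            subs.append([1, list(p)])
        else:
            best[0] += 1
            best[1] = [a + b for a, b in zip(best[1], p)]
    return subs

def merge_cost(a, b):
    """Ward-style increase in within-cluster sum of squares if a and b merge."""
    ca = [v / a[0] for v in a[1]]
    cb = [v / b[0] for v in b[1]]
    return a[0] * b[0] / (a[0] + b[0]) * math.dist(ca, cb) ** 2

def agglomerate(subs, k):
    """Step 2: merge the cheapest pair of sub-clusters until k remain; return centroids."""
    clusters = [[s[0], list(s[1])] for s in subs]
    while len(clusters) > k:
        pairs = [(i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters))]
        i, j = min(pairs, key=lambda ij: merge_cost(clusters[ij[0]], clusters[ij[1]]))
        a, b = clusters[i], clusters.pop(j)  # j > i, so index i stays valid
        a[0] += b[0]
        a[1] = [x + y for x, y in zip(a[1], b[1])]
    return [[v / c[0] for v in c[1]] for c in clusters]

def two_step(points, threshold, k):
    """Pre-cluster, merge down to k clusters, then assign each point to its nearest centroid."""
    centroids = agglomerate(precluster(points, threshold), k)
    labels = [min(range(len(centroids)), key=lambda c: math.dist(p, centroids[c]))
              for p in points]
    return centroids, labels

# Hypothetical exchangeable-cation measurements (invented, not from the study).
samples = [
    {"Ca": 8.0, "Mg": 1.5, "K": 0.5}, {"Ca": 7.2, "Mg": 2.0, "K": 0.6},
    {"Ca": 2.0, "Mg": 6.5, "K": 0.8}, {"Ca": 1.8, "Mg": 7.0, "K": 0.5},
    {"Ca": 0.9, "Mg": 1.0, "K": 4.0}, {"Ca": 1.2, "Mg": 0.8, "K": 3.5},
]
ratios = [cation_ratios(s) for s in samples]
centroids, labels = two_step(ratios, threshold=0.05, k=3)
# Name each cluster after the cation with the largest mean share.
names = [CATIONS[max(range(len(CATIONS)), key=c.__getitem__)] + "-dominated" for c in centroids]
print(labels, names)
```

A larger `threshold` lets step 1 absorb more points into each sub-cluster, which leaves step 2 fewer sub-clusters to merge. With only six samples, the threshold is kept small here so that both steps do visible work.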

The Relationship between Driving Behavior, Driving Anger, and Ambivalence Over Emotional Expressiveness in an Anonymous Situation (익명상황의 운전행동과 운전분노 및 정서표현갈등과의 관계)

  • Bo Young Yun; Soon Chul Lee
    • Korean Journal of Culture and Social Issue / v.17 no.3 / pp.321-341 / 2011
  • This study examines how anonymity between drivers affects aggressive driving and why, in an anonymous situation, some people drive aggressively and others do not. Two surveys were conducted. The first survey, of 200 participants, found that people are more likely to drive aggressively in an anonymous situation than in a face-to-face situation. The second survey, of 384 participants with a history of aggressive driving, found that these drivers could be classified into three groups using two-step cluster analysis. The second survey also showed that drivers who often drive aggressively in anonymous situations have a strong tendency toward driving anger and toward ambivalence over emotional expressiveness. They also scored high on self-defensive ambivalence, one of the factors of the ambivalence over emotional expressiveness questionnaire. In short, individuals who tended to drive aggressively in anonymous situations were susceptible to driving anger, commonly experienced ambivalence over emotional expressiveness, and were typically indecisive. The results suggest that intensifying traffic enforcement is not the best remedy for reckless drivers; having them engage in candid self-reflection would work better.
