• Title/Summary/Keyword: business network

A Study on Analyzing Sentiments on Movie Reviews by Multi-Level Sentiment Classifier (영화 리뷰 감성분석을 위한 텍스트 마이닝 기반 감성 분류기 구축)

  • Kim, Yuyoung;Song, Min
    • Journal of Intelligence and Information Systems / v.22 no.3 / pp.71-89 / 2016
  • Sentiment analysis identifies emotions or sentiments embedded in user-generated data such as customer reviews from blogs, social network services, and so on. Research fields such as computer science and business management can use it to analyze customer-generated opinions. Previous studies treated the star rating of a review as equivalent to the sentiment embedded in its text. However, the star rating does not always correspond to the sentiment polarity, and this assumption limits the accuracy of those studies. To address this issue, the present study uses a supervised sentiment classification model to measure sentiment polarity more accurately. The study aims to propose an advanced sentiment classifier and to discover the correlation between movie reviews and box-office success. The classifier is based on two supervised machine learning techniques, Support Vector Machines (SVM) and a Feedforward Neural Network (FNN). The sentiment scores of the movie reviews are measured by the classifier, and statistical correlations between movie reviews and box-office success are analyzed. Movie reviews are collected along with their star ratings. The dataset consists of 1,258,538 reviews of 175 films gathered from the Naver Movie website (movie.naver.com). The results show that the proposed sentiment classifier outperforms a Naive Bayes (NB) classifier, with accuracy about 6% higher than NB. Furthermore, the results indicate positive correlations between the star rating and the number of audiences, which can be regarded as the box-office success of a movie. The study also shows a mild positive correlation between the sentiment scores estimated by the classifier and the number of audiences. To verify the applicability of the sentiment scores, an independent-sample t-test was conducted. For this test, the movies were divided into two groups at the average sentiment score. The two groups differ significantly in their star ratings.
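The group comparison described in this abstract can be sketched in a few lines. This is only an illustration under stated assumptions, not the authors' procedure: the (sentiment score, star rating) pairs are made-up values, `split_by_mean` and `welch_t` are hypothetical helper names, and Welch's unequal-variance form of the t statistic is an assumption, since the abstract only says that an independent-sample t-test was used.

```python
import math
from statistics import mean, variance


def split_by_mean(movies):
    """Split (sentiment_score, star_rating) pairs at the mean sentiment score.

    Returns the star ratings of the high-sentiment and low-sentiment groups.
    """
    cutoff = mean(score for score, _ in movies)
    high = [stars for score, stars in movies if score >= cutoff]
    low = [stars for score, stars in movies if score < cutoff]
    return high, low


def welch_t(a, b):
    """Welch's t statistic and approximate degrees of freedom for two samples."""
    va = variance(a) / len(a)  # squared standard error of group a
    vb = variance(b) / len(b)  # squared standard error of group b
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


# Hypothetical data: (sentiment score, star rating) per movie. Not from the study.
movies = [
    (0.82, 8.9), (0.75, 8.4), (0.91, 9.1), (0.88, 8.7),
    (0.40, 6.2), (0.35, 5.8), (0.52, 6.9), (0.30, 5.5),
]

high, low = split_by_mean(movies)
t_stat, dof = welch_t(high, low)
print(f"t = {t_stat:.2f}, df = {dof:.1f}")
```

Turning the statistic into a p-value needs the CDF of the t distribution, which a statistics package would normally supply; that step is left out here.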

A study on the Wonju Medical Equipment Industry Cluster (원주의료기기산업 클러스터의 형성과정에 관한 연구)

  • Lee, Woo-Chun;Yoon, Hyung-Ro
    • Journal of the Korean Academic Society of Industrial Cluster / v.1 no.1 / pp.67-86 / 2007
  • The Wonju medical equipment industry, despite its short history, poor sales, and weak manpower, has shown remarkable outcomes in a relatively short period. At the end of 2007, a total of 79 enterprises (only 4.6% of all enterprises in Korea) accounted for 10% of nationwide production and 15% of nationwide exports, with an annual average growth rate of 66.7%, contributing tremendously to the domestic medical equipment industry. In addition, many leading medical equipment enterprises in various fields have already moved or plan to move to Wonju, accelerating the growth of the Wonju Medical Equipment Cluster. The cluster is now leaving the initial business setup stage and entering the growth stage. In particular, the government's designation of the Wonju cluster project is accelerating networking (e.g., the development of universal parts, the establishment of a mutual collaboration model among enterprises, and joint marketing), producing rapid growth in the Wonju medical equipment industry. The cluster has shown positive outcomes despite weaker investment and infrastructure than medical industry clusters in advanced countries, whereas many domestic enterprises pursued their own growth models and thus failed to build international competitiveness. Although the industry has developed rapidly, many challenges remain in supporting enterprises: small R&D investment and the resulting weak technological capability, difficulties in recruiting R&D engineers, and weaknesses in marketing capabilities, financial infrastructure and policies, and network architecture. To develop a world-competitive medical equipment industry cluster in Wonju, further infrastructure improvement, technology innovation, joint marketing, and network expansion to support enterprises are required. Wonju's experience in developing its medical equipment industry so far suggests that Korea should develop its own flexible cluster model that considers the industry structure and maturity of different regions, and that local and central governments should implement specific action plans based on systematic industry development strategies in order to build world-competitive industry clusters in Korea.

A study on the developmental plan of Alarm Monitoring Service (기계경비의 발전적 대응방안에 관한 연구)

  • Chung, Tae-Hwang;So, Seung-Young
    • Korean Security Journal / no.22 / pp.145-168 / 2010
  • Since Alarm Monitoring Service was introduced in Korea in 1981, the market has grown and is expected to keep growing. Factors such as rising demand for social security, changing safety consciousness, and the increase in people living alone could positively affect the Alarm Monitoring Service industry. As Alarm Monitoring Service comes into wide use, understanding of electronic security services is spreading and consumer demands are becoming harder to satisfy, so a new developmental plan is needed to respond actively to these changes. An electronic security system consists of various kinds of elements, and every element must perform its role equally well. Because Alarm Monitoring Service is not a necessity, it should satisfy consumers' various needs; electronic security devices should also be easy to operate and well designed in appearance. To solve the false alarm problem, improvement of detection sensors should be considered first, and new types of sensors that operate on different principles are needed to replace existing ones. To address problems caused by response time, security companies could honestly explain the limits of the Alarm Monitoring System to consumers and ask for their understanding. If consumers take part in security activities following the security agent's explanation, better security service can be provided on the basis of mutual confidence. To reduce response time, the introduction of GIS (Geographic Information System) should be considered rather than GPS (Global Positioning System). Although training programs for security agents are important, benefits for security agents should be considered as well. A new business model is required to prepare for market stagnation, along with new products to secure consumers of residential services rather than commercial facility services. For these purposes, new products related to home-network systems and video surveillance systems could be considered, as well as new value-added services based on a network between the security company and the consumer.

An Analysis of the Roles of Experience in Information System Continuance (정보시스템의 지속적 사용에서 경험의 역할에 대한 분석)

  • Lee, Woong-Kyu
    • Asia pacific journal of information systems / v.21 no.4 / pp.45-62 / 2011
  • The notion of information systems (IS) continuance has recently emerged as one of the most important research issues in the field of IS. A great deal of research has been conducted thus far on the basis of theories adapted from various disciplines including consumer behaviors and social psychology, in addition to theories regarding information technology (IT) acceptance. This previous body of knowledge provides a robust research framework that can already account for the determination of IS continuance; however, this research points to other, thus-far-unelucidated determinant factors such as habit, which were not included in traditional IT acceptance frameworks, and also re-emphasizes the importance of emotion-related constructs such as satisfaction in addition to conscious intention with rational beliefs such as usefulness. Experiences should also be considered one of the most important factors determining the characteristics of information system (IS) continuance and the features distinct from those determining IS acceptance, because more experienced users may have more opportunities for IS use, which would allow them more frequent use than would be available to less experienced or non-experienced users. Interestingly, experience has dual features that may contradictorily influence IS use. On one hand, attitudes predicated on direct experience have been shown to predict behavior better than attitudes from indirect experience or without experience; as more information is available, direct experience may render IS use a more salient behavior, and may also make IS use more accessible via memory. 
Therefore, experience may serve to intensify the relationship between IS use and conscious intention with evaluations. On the other hand, experience may culminate in the formation of habits: greater experience may also imply more frequent performance of the behavior, which may lead to the formation of habits. Hence, with greater experience, users' activation of an IS may be more dependent on habit (that is, unconscious automatic use without deliberation regarding the IS) and less dependent on conscious intentions. Furthermore, experiences can provide basic information necessary for satisfaction with the use of a specific IS, thus spurring the formation of both conscious intentions and unconscious habits. Whereas IT adoption is a one-time decision, IS continuance may be a series of users' decisions and evaluations based on satisfaction with IS use. Moreover, habits also cannot be formed without satisfaction, even when a behavior is carried out repeatedly. Thus, experiences also play a critical role in satisfaction, as satisfaction is the consequence of direct experiences of actual behaviors. In particular, emotional experiences such as enjoyment can become as influential on IS use as are utilitarian experiences such as usefulness; this is especially true in light of the modern increase in membership-based hedonic systems, including online games, web-based social network services (SNS), blogs, and portals, all of which attempt to provide users with self-fulfilling value. Therefore, in order to understand more clearly the role of experiences in IS continuance, analysis must be conducted under a research framework that includes intentions, habits, and satisfaction, as experience may not only have duration-based moderating effects on the relationship between both intention and habit and the activation of IS use, but may also have content-based positive effects on satisfaction. 
This is consistent with the basic assumptions regarding the determining factors in IS continuance as suggested by Ortiz de Guinea and Markus: consciousness, emotion, and habit. The principal objective of this study was to explore and assess the effects of experiences in IS continuance, with special consideration given to conscious intentions and unconscious habits, as well as satisfaction. In service of this goal, along with a review of the relevant literature regarding the effects of experiences and habit on continuous IS use, this study suggested a research model that represents the roles of experience: its moderating role in the relationships of IS continuance with both conscious intention and unconscious habit, and its antecedent role in the development of satisfaction. To validate this research model, Korean university student users of 'Cyworld', one of the most influential social network services in South Korea, were surveyed, and the data were analyzed via partial least squares (PLS) analysis to assess the implications of this study. As a result, all but one of the hypotheses in our research model were statistically supported. Although one hypothesis was not supported, the study's findings provide some important implications. First, the role of experience in IS continuance differs from its role in IS acceptance. Second, the use of IS was explained by the dynamic balance between habit and intention. Third, the importance of satisfaction was confirmed from the perspective of IS continuance with experience.

An Analysis of IT Trends Using Tweet Data (트윗 데이터를 활용한 IT 트렌드 분석)

  • Yi, Jin Baek;Lee, Choong Kwon;Cha, Kyung Jin
    • Journal of Intelligence and Information Systems / v.21 no.1 / pp.143-159 / 2015
  • Predicting IT trends has long been an important subject in information systems research. IT trend prediction makes it possible to recognize emerging eras of innovation and to allocate budgets in preparation for rapidly changing technological trends. Toward the end of each year, various domestic and global organizations predict and announce IT trends for the following year. For example, Gartner predicts the top 10 IT trends for the coming year, and these predictions affect IT and industry leaders and organizations' basic assumptions about technology and the future of IT, but the accuracy of these reports is difficult to verify. Social media data can be a useful tool for verifying that accuracy. As social media services have gained popularity, they are used in a variety of ways, from posting about personal daily life to keeping up to date with news and trends. In recent years, rates of social media activity in Korea have reached unprecedented levels. Hundreds of millions of users now participate in online social networks and share their opinions and thoughts with colleagues and friends. In particular, Twitter is currently the major microblog service; its central function, the 'tweet', lets users report their current thoughts and actions, comment on news, and engage in discussions. For an analysis of IT trends, we chose tweet data because Twitter not only produces massive unstructured textual data in real time but also serves as an influential channel for opinion leadership on technology. Previous studies found that tweet data provide useful information and detect social trends effectively; these studies also found that Twitter can track issues faster than other media such as newspapers. Therefore, this study investigates how frequently the IT trends predicted for the following year by public organizations are mentioned on social network services like Twitter. 
IT trend predictions for 2013, announced near the end of 2012 by two domestic organizations, the National IT Industry Promotion Agency (NIPA) and the National Information Society Agency (NIA), were used as the basis for this research. The present study compares Twitter data generated in Seoul (Korea) with the predictions of the two organizations to analyze the differences. Twitter data analysis requires various natural language processing techniques, including stop-word removal and noun extraction, to process unrefined, unstructured data. To overcome these challenges, we used SAS IRS (Information Retrieval Studio), developed by SAS, to capture trends while processing large streaming Twitter datasets in real time. The system offers a framework for crawling, normalizing, analyzing, indexing, and searching tweet data. As a result, we crawled the entire Twitter sphere in the Seoul area and obtained 21,589 tweets in 2013 to review how frequently the IT trend topics announced by the two organizations were mentioned by people in Seoul. The results show that most IT trends predicted by NIPA and NIA were frequently mentioned on Twitter, except for some topics such as 'new types of security threat', 'green IT', and 'next generation semiconductor'; because these topics are not generalized compound words, they may be mentioned on Twitter in other words. To determine whether IT trend tweets from Korea are related to the following year's real-world IT trends, we compared Twitter's trending topics with those in Nara Market, Korea's nationwide web-based e-procurement system, which handles the whole procurement process of all public organizations in Korea. The correlation analysis shows that tweet frequencies on the IT trend topics predicted by NIPA and NIA are significantly correlated with the frequencies of IT topics mentioned in project announcements on Nara Market in 2012 and 2013. 
The main contributions of our research are as follows: i) the IT topic predictions announced by NIPA and NIA can provide an effective guideline for IT professionals and researchers in Korea who are looking for verified IT topic trends for the following year, and ii) researchers can use Twitter to obtain useful ideas for detecting and predicting dynamic trends in technological and social issues.
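The frequency comparison in this abstract amounts to counting topic mentions in two text collections and correlating the counts. The sketch below is a simplified stand-in and makes several assumptions: the study used SAS IRS with noun extraction, whereas this uses plain case-insensitive substring matching, the tweets and procurement notices are invented, and `count_mentions` and `pearson` are hypothetical helper names.

```python
import math


def count_mentions(texts, topics):
    """Count how many texts mention each topic (case-insensitive substring match)."""
    counts = {topic: 0 for topic in topics}
    for text in texts:
        lowered = text.lower()
        for topic in topics:
            if topic.lower() in lowered:
                counts[topic] += 1
    return counts


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences.

    Raises ZeroDivisionError if either sequence is constant.
    """
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


# Hypothetical inputs, not data from the study.
topics = ["big data", "cloud", "green IT"]
tweets = [
    "Big data is everywhere",
    "moving to the cloud",
    "cloud and big data",
    "big data jobs",
    "nothing about tech",
]
notices = [
    "Big data platform project",
    "cloud migration",
    "big data analytics system",
    "green IT audit",
]

tweet_counts = count_mentions(tweets, topics)
notice_counts = count_mentions(notices, topics)
r = pearson(
    [tweet_counts[t] for t in topics],
    [notice_counts[t] for t in topics],
)
print(tweet_counts, notice_counts, round(r, 3))
```

Exact substring matching misses a topic that people express in other words, which is the limitation the abstract reports for compound terms such as 'green IT'.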

Personalized Recommendation System for IPTV using Ontology and K-medoids (IPTV환경에서 온톨로지와 k-medoids기법을 이용한 개인화 시스템)

  • Yun, Byeong-Dae;Kim, Jong-Woo;Cho, Yong-Seok;Kang, Sang-Gil
    • Journal of Intelligence and Information Systems / v.16 no.3 / pp.147-161 / 2010
  • With the recent convergence of broadcasting and communication, communication has been joined to TV, and TV viewing has changed in many ways. IPTV (Internet Protocol Television) provides information services, movie content, broadcasts, and so on through the Internet, combining live programs with VOD (Video on Demand). Because it uses a communication network, it has become a new business issue. In addition, new technical issues have arisen, such as imaging technology for the service, networking technology that avoids video interruptions, and security technologies to protect copyright. Through the IPTV network, users can watch the programs they want when they want. However, IPTV makes it difficult to find programs, whether through search or through menus. The menu approach takes a long time to reach the desired program. The search approach fails when the title, genre, actors' names, and so on are not known. In addition, entering letters with a remote control is cumbersome. The bigger problem, however, is that users are often not aware of the services they use. Thus, to resolve the difficulty of selecting VOD services in IPTV, a personalized recommendation service is proposed, which enhances users' satisfaction and uses their time efficiently. To address these shortcomings of IPTV, this paper provides programs suited to individual users through a filtering and recommendation system, so that users save time. The proposed recommendation system collects TV program information, the user's preferred program genres and detailed genres, channels, watched programs, and viewing times from individual IPTV viewing records. To find similar programs, similarities are compared using an ontology for TV programs, because the distance between programs can be measured by this similarity comparison. The TV program ontology we use is extracted from TV-Anytime metadata, which represents semantic properties. 
The ontology also expresses the contents and features numerically. Vocabulary similarity is determined through WordNet. All the words describing the programs are expanded into their upper and lower classes to decide word similarity, and the average over the descriptive keywords is measured. Using the calculated distance as the criterion, similar programs are grouped by the K-medoids partitioning method, which partitions objects into groups with similar characteristics. The K-medoids method selects K representative objects. Each remaining object is provisionally assigned to a cluster according to its distance from the representative objects. When the algorithm partitions the initial n objects into K groups, the optimal representative objects are found through repeated trials after representative objects are selected provisionally. Through this process, similar programs are clustered. When programs are selected through cluster analysis, weights should be applied to the recommendation, as follows. When each group recommends programs, similar programs near the representative objects are recommended to users. The distance is calculated with the same formula used to measure similarity distance, and it serves as the basic figure that determines the rankings of recommended programs. A weight is calculated from the number of programs in the viewing list: the more programs there are, the higher the weight. This is defined as the cluster weight. Using it, the TV programs that represent each group are selected, and the final TV program ranks are determined. However, the group-representative TV programs include errors. Therefore, weights for TV program viewing preference are added to determine the final ranks. Based on these ranks, the contents that users are likely to prefer are recommended. An experiment with the proposed method was carried out in a controlled environment. 
The experiment shows the superiority of the proposed method over existing methods.
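The clustering step in this abstract can be illustrated with a small K-medoids sketch that works on a precomputed distance matrix, which is the form an ontology-based program distance would take. Several things here are assumptions rather than details from the paper: the alternating assign-and-update variant (simpler than full PAM swapping), the deterministic start from the first k items, the toy distances, and the helper names `assign`, `k_medoids`, and `rank_in_cluster`. The cluster and preference weights mentioned in the abstract are not modeled.

```python
def assign(dist, medoids):
    """Label each item with the index of its nearest medoid."""
    return [
        min(range(len(medoids)), key=lambda c: dist[i][medoids[c]])
        for i in range(len(dist))
    ]


def k_medoids(dist, k, max_iter=100):
    """Alternating K-medoids on a symmetric distance matrix.

    Returns (medoids, labels). Starts from the first k items (an assumption).
    """
    medoids = list(range(k))
    for _ in range(max_iter):
        labels = assign(dist, medoids)
        new_medoids = []
        for c in range(k):
            members = [i for i, lab in enumerate(labels) if lab == c]
            if not members:
                # Keep the old medoid if a cluster ends up empty.
                new_medoids.append(medoids[c])
                continue
            # The new medoid minimises total distance to its cluster members.
            new_medoids.append(
                min(members, key=lambda m: sum(dist[m][j] for j in members))
            )
        if new_medoids == medoids:
            break
        medoids = new_medoids
    return medoids, assign(dist, medoids)


def rank_in_cluster(dist, medoids, labels, cluster):
    """Items of one cluster, ordered from nearest to farthest from its medoid."""
    members = [i for i, lab in enumerate(labels) if lab == cluster]
    return sorted(members, key=lambda i: dist[i][medoids[cluster]])


# Toy stand-in for program distances: six programs placed on a line.
positions = [0, 1, 2, 50, 51, 52]
dist = [[abs(a - b) for b in positions] for a in positions]

medoids, labels = k_medoids(dist, k=2)
ranking = rank_in_cluster(dist, medoids, labels, cluster=0)
print(medoids, labels, ranking)
```

Ranking by distance to the medoid mirrors the statement that programs near the representative object are recommended first.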

Product Recommender Systems using Multi-Model Ensemble Techniques (다중모형조합기법을 이용한 상품추천시스템)

  • Lee, Yeonjeong;Kim, Kyoung-Jae
    • Journal of Intelligence and Information Systems / v.19 no.2 / pp.39-54 / 2013
  • The recent explosive growth of electronic commerce provides customers with many advantageous purchase opportunities. In this situation, customers who do not have enough knowledge about their purchases may accept product recommendations. Product recommender systems automatically reflect a user's preferences and provide a recommendation list to the user. Thus, the product recommender system in an online shopping store has become known as one of the most popular tools for one-to-one marketing. However, recommender systems that do not properly reflect the user's preferences cause disappointment and waste the user's time. In this study, we propose a novel recommender system that uses data mining and multi-model ensemble techniques to enhance recommendation performance by reflecting the user's preferences precisely. The research data were collected from a real-world online shopping store that sells products from famous art galleries and museums in Korea. The data initially contained 5,759 transactions, of which 3,167 remained after records with null values were deleted. We transform the categorical variables into dummy variables and exclude outliers. The proposed model consists of two steps. The first step predicts customers who are highly likely to purchase products in the online shopping store. In this step, we first use logistic regression, decision trees, and artificial neural networks to predict customers who are highly likely to purchase products in each product group. We perform these data mining techniques using SAS E-Miner software. We partition the dataset into modeling and validation sets for the logistic regression and decision trees, and into training, test, and validation sets for the artificial neural network model. The validation dataset is the same for all experiments. 
Then we combine the results of each predictor using multi-model ensemble techniques such as bagging and bumping. Bagging is short for "Bootstrap Aggregation"; it combines outputs from several machine learning techniques to raise the performance and stability of prediction or classification, and it is a special form of the averaging method. Bumping is short for "Bootstrap Umbrella of Model Parameter"; it considers only the model with the lowest error value. The results show that bumping outperforms bagging and the other predictors except for the "Poster" product group, for which the artificial neural network model performs better than the other models. In the second step, we use market basket analysis to extract association rules for co-purchased products. We extract thirty-one association rules according to the values of the Lift, Support, and Confidence measures. We set the minimum transaction frequency to support an association at 5%, the maximum number of items in an association at 4, and the minimum confidence for rule generation at 10%. We also exclude extracted association rules with a lift value below 1. After excluding duplicate rules, we finally obtain fifteen association rules. Among these, eleven rules contain associations between products in the "Office Supplies" product group, one rule includes an association between the "Office Supplies" and "Fashion" product groups, and the other three rules contain associations between the "Office Supplies" and "Home Decoration" product groups. Finally, the proposed product recommender system provides a list of recommendations to the appropriate customers. We test the usability of the proposed system using a prototype and real-world transaction and profile data. To this end, we construct the prototype system using ASP, JavaScript, and Microsoft Access. 
In addition, we survey user satisfaction with the recommended product list from the proposed system and with randomly selected product lists. The survey participants are 173 persons who use MSN Messenger, Daum Café, and P2P services. We evaluate user satisfaction using a five-point Likert scale. This study also performs a paired-sample t-test on the survey results. The results show that the proposed model outperforms the random selection model at the 1% statistical significance level, meaning that users were significantly more satisfied with the recommended product list. The results also show that the proposed system may be useful in real-world online shopping stores.
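The three measures used to filter association rules in this abstract have standard definitions, which the following sketch computes for a single rule. The baskets and the helper name `rule_metrics` are hypothetical; only the thresholds (minimum support 5%, minimum confidence 10%, lift of at least 1) are taken from the abstract, and the study itself ran its analysis in SAS E-Miner rather than in code like this.

```python
def rule_metrics(transactions, antecedent, consequent):
    """Support, confidence, and lift of the rule antecedent -> consequent.

    transactions is a list of sets of item names.
    """
    n = len(transactions)
    a, c = set(antecedent), set(consequent)
    n_a = sum(1 for t in transactions if a <= t)         # baskets containing A
    n_c = sum(1 for t in transactions if c <= t)         # baskets containing C
    n_ac = sum(1 for t in transactions if (a | c) <= t)  # baskets containing both
    support = n_ac / n
    confidence = n_ac / n_a if n_a else 0.0
    lift = confidence / (n_c / n) if n_c else 0.0
    return support, confidence, lift


# Hypothetical baskets, not data from the study.
transactions = [
    {"pen", "notebook"},
    {"pen", "notebook", "scarf"},
    {"pen"},
    {"notebook", "frame"},
    {"scarf"},
]

support, confidence, lift = rule_metrics(transactions, {"pen"}, {"notebook"})

# Thresholds quoted in the abstract: support >= 5%, confidence >= 10%, lift >= 1.
keep_rule = support >= 0.05 and confidence >= 0.10 and lift >= 1.0
print(support, round(confidence, 3), round(lift, 3), keep_rule)
```

A lift above 1 means the two items are bought together more often than independent purchases would predict, which is why rules below 1 are discarded.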

The Case on Valuation of IT Enterprise (IT 기업의 가치평가 사례연구)

  • Lee, Jae-Il;Yang, Hae-Sul
    • Journal of the Korea Academia-Industrial cooperation Society / v.8 no.4 / pp.881-893 / 2007
  • IT (Information Technology)-based industries have driven the recent digital revolution and the emergence of various types of information services, expanding broadly into info-communication device companies, info-communication service companies, software companies, and so on. As a result, the need to evaluate the company value of IT businesses for M&A or liquidation is growing tremendously. Unlike other industries, however, the IT industry has a short life cycle, so it lacks not only a company valuation model of the kind used for general businesses but also an objective one specific to IT companies. This thesis therefore analyzes various valuation techniques and the newly rising ROV. DCF, which converts a company's cash flows, including those from tangible assets, into present value, was applied during the past era of the industrial economy and is still applied persuasively today. However, DCF valuation is prone to many errors because IT companies have more intangible assets than tangible assets. Accordingly, ROV, recognized as a new method for evaluating a company's various options normally and quantitatively, has recently been brought up. But the evaluation of a company's various options has so far been too subjective and theoretical, and owing to the lack of objective grounds and options, it could not be applied in practice. In this thesis, comparing DCF and ROV through four examples, we find that ROV is more accurate than DCF. Because the options applied to ROV are excessively limited, we tried to develop ROV into a new method by deriving five invisible value factors within IT companies. We should therefore establish basic valuation methods for IT companies and research and develop effective and varied valuation methods suited to each type of IT company, such as Internet-based companies, software development enterprises, and network-related companies.
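For readers unfamiliar with DCF, the textbook calculation discounts each expected cash flow back to the present at a chosen rate. The sketch below shows only that standard formula; the cash flows, the 10% rate, and the function name `discounted_cash_flow` are illustrative assumptions, and nothing here reproduces the four cases or the ROV extension studied in the paper.

```python
def discounted_cash_flow(cash_flows, rate, terminal_value=0.0):
    """Present value of year-end cash flows plus an optional terminal value.

    cash_flows[0] is received at the end of year 1, cash_flows[1] at the end
    of year 2, and so on. The terminal value is discounted from the last year.
    """
    pv = sum(cf / (1 + rate) ** t for t, cf in enumerate(cash_flows, start=1))
    pv += terminal_value / (1 + rate) ** len(cash_flows)
    return pv


# Hypothetical inputs: three yearly cash flows of 100 discounted at 10%.
value = discounted_cash_flow([100.0, 100.0, 100.0], rate=0.10)
print(round(value, 2))
```

Because the result depends entirely on forecast cash flows, the formula has little to say about option-like value that has not yet produced cash, which is the weakness the abstract attributes to DCF for IT companies.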

The current state and prospects of travel business development under the COVID-19 pandemic

  • Tkachenko, Tetiana;Pryhara, Olha;Zatsepina, Nataly;Bryk, Stepan;Holubets, Iryna;Havryliuk, Alla
    • International Journal of Computer Science & Network Security / v.21 no.12spc / pp.664-674 / 2021
  • The relevance of this research is determined by the negative impact of the COVID-19 pandemic on current trends and dynamics of world tourism development. This article aims to identify patterns of development of the modern tourist market and to analyze its problems and prospects in the context of the COVID-19 pandemic. Materials and methods. General scientific research methods are used in the work: analysis, synthesis, comparison, and analysis of statistical data. The analysis of the viewpoints of foreign and domestic authors on the international tourist market allowed us to substantiate the current directions of tourism development under the influence of negative factors connected with the spread of the new coronavirus infection COVID-19. Economic-statistical, abstract-logical, and economic-mathematical research methods were used during the study and data processing. Results. The current state of the tourist market was analyzed by world region. Tourism was found to be one of the sectors most affected by COVID-19: by the end of 2020, the total number of tourist arrivals in the world had decreased by 74% compared to the same period in 2019. As a consequence, total global tourism revenues lost $1.3 trillion by the end of 2020. 27% of all destinations are completely closed to international tourism. At the end of 2020, the economy of international tourism had shrunk by about 80%. In 2020, 98 million fewer people traveled worldwide (-83%) relative to the same period of the previous year. Tourism was hit hardest by the pandemic in the Asia-Pacific region, where travel restrictions were as strict as possible. International arrivals in this region fell by 84% (300 million). The Middle East and Africa recorded declines of 75 and 70 percent, respectively. 
Despite a small and short-lived recovery in the summer of 2020, Europe lost 71% of its tourist flow, and the European continent recorded the largest drop in absolute terms compared with 2019, 500 million. In North and South America, foreign arrivals also declined. A significant decrease in tourist flows is shown to lead to a massive loss of jobs and a sharp decline in foreign exchange earnings and taxes, which limits the ability of states to support the tourism industry. Three possible scenarios for the tourist industry's exit from the crisis, reflecting the most probable changes in monthly tourist flows, are considered. The characteristics of respondents from Ukraine, Germany, and the USA and their attitudes to travel depending on gender, age, education level, professional status, and monthly income are presented. About 57% of respondents from Ukraine, Poland, and the United States were planning a tourist trip in 2021. People with higher or secondary education were more willing to plan such a trip. The results of the empirical study confirm that interest in domestic tourism increased significantly in 2021. A regression model of the number of domestic tourist trips with a time trend (t) and seasonal variations was built on the example of Ukraine (Turˆt = 7288,498 - 20,58t - 410,88∑5) and used to forecast for 2020; the model can be used to forecast for any country as tourist trips stabilize after the pandemic. Discussion. We should emphasize the seriousness of the COVID-19 pandemic and the fact that many experts and scientists believe in the long-term recovery of the tourism industry. 
In our opinion, governments need to refocus on domestic tourism: develop infrastructure, search for new niches and formats, and form new package deals in the new domestic segment (developing new products such as tourist routes, exhibitions, sightseeing programs, and special post-COVID-19 rehabilitation programs in sanatoriums; creating individual offers for different target audiences). Conclusions. Thus, the identified trends are associated with a decrease in tourist flows and with the negative impact of the pandemic on employment and on income from tourism activities. International tourism needs two to four years before it returns to the level of 2019.

Clickstream Big Data Mining for Demographics based Digital Marketing (인구통계특성 기반 디지털 마케팅을 위한 클릭스트림 빅데이터 마이닝)

  • Park, Jiae;Cho, Yoonho
    • Journal of Intelligence and Information Systems
    • /
    • v.22 no.3
    • /
    • pp.143-163
    • /
    • 2016
  • The demographics of Internet users are the most basic and important source for target marketing and personalized advertising on digital marketing channels, which include email, mobile, and social media. However, it has gradually become difficult to collect the demographics of Internet users because their activities are anonymous in many cases. Although a marketing department can obtain demographics through online or offline surveys, these approaches are expensive, time-consuming, and likely to include false statements. Clickstream data is the record an Internet user leaves behind while visiting websites. As the user clicks anywhere on a webpage, the activity is logged in semi-structured website log files. Such data allows us to see what pages users visited, how long they stayed there, how often they visited, when they usually visited, which sites they prefer, what keywords they used to find the site, whether they purchased anything, and so forth. For this reason, some researchers have tried to infer the demographics of Internet users from their clickstream data. They derived various independent variables likely to be correlated with the demographics. The variables include search keywords, frequency and intensity by time, day, and month, the variety of websites visited, text information from the web pages visited, etc. The demographic attributes to predict also vary from paper to paper and cover gender, age, job, location, income, education, marital status, and presence of children. A variety of data mining methods, such as LSA, SVM, decision tree, neural network, logistic regression, and k-nearest neighbors, were used to build prediction models. However, previous research has not yet identified which data mining method is appropriate for predicting each demographic variable. Moreover, the independent variables studied so far need to be reviewed, combined as needed, and evaluated in order to build the best prediction model. 
The objective of this study is to choose the clickstream attributes most likely to be correlated with the demographics, based on the results of previous research, and then to identify which data mining method is best suited to predicting each demographic attribute. Among the demographic attributes, this paper focuses on predicting gender, age, marital status, residence, and job. Based on the results of previous research, 64 clickstream attributes are used to predict the demographic attributes. The overall process of predictive model building is composed of four steps. In the first step, we create user profiles that include 64 clickstream attributes and 5 demographic attributes. The second step performs dimension reduction on the clickstream variables to address the curse of dimensionality and the overfitting problem. We use three approaches, based on decision tree, PCA, and cluster analysis. In the third step, we build alternative predictive models for each demographic variable. SVM, neural network, and logistic regression are used for modeling. The last step evaluates the alternative models in terms of model accuracy and selects the best model. For the experiments, we used clickstream data representing 5 demographics and 16,962,705 online activities for 5,000 Internet users. IBM SPSS Modeler 17.0 was used for our prediction process, and 5-fold cross validation was conducted to enhance the reliability of our experiments. The experimental results verify that there is a specific data mining method well suited to each demographic variable. For example, age prediction performs best when using decision-tree-based dimension reduction and a neural network, whereas the prediction of gender and marital status is most accurate when applying SVM without dimension reduction. 
We conclude that the online behaviors of Internet users, captured through clickstream data analysis, can be used effectively to predict their demographics and can therefore be applied to digital marketing.
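
The evaluation step described above (compare alternative models by accuracy under 5-fold cross validation, then keep the best one) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' SPSS Modeler workflow: the two toy classifiers (`MajorityClass`, `NearestCentroid`) are hypothetical stand-ins for the SVM, neural network, and logistic regression models named in the abstract, the two-feature profiles produced by `make_toy_profiles` are synthetic, and the dimension reduction step is omitted.

```python
import random
from collections import Counter


def k_fold_indices(n, k, seed=0):
    """Shuffle the indices 0..n-1 and deal them into k nearly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


class MajorityClass:
    """Baseline stand-in model: always predicts the most frequent training label."""

    def fit(self, X, y):
        self.label = Counter(y).most_common(1)[0][0]
        return self

    def predict(self, X):
        return [self.label for _ in X]


class NearestCentroid:
    """Stand-in model: predicts the label whose mean feature vector is closest."""

    def fit(self, X, y):
        sums, counts = {}, Counter(y)
        for row, label in zip(X, y):
            acc = sums.setdefault(label, [0.0] * len(row))
            for j, value in enumerate(row):
                acc[j] += value
        self.centroids = {
            label: [s / counts[label] for s in acc] for label, acc in sums.items()
        }
        return self

    def predict(self, X):
        def dist2(a, b):
            return sum((p - q) ** 2 for p, q in zip(a, b))

        return [
            min(self.centroids, key=lambda lab: dist2(row, self.centroids[lab]))
            for row in X
        ]


def cross_val_accuracy(make_model, X, y, k=5, seed=0):
    """Mean held-out accuracy of a freshly built model over k folds."""
    scores = []
    for held_out in k_fold_indices(len(X), k, seed):
        test = set(held_out)
        train = [i for i in range(len(X)) if i not in test]
        model = make_model().fit([X[i] for i in train], [y[i] for i in train])
        preds = model.predict([X[i] for i in held_out])
        correct = sum(p == y[i] for p, i in zip(preds, held_out))
        scores.append(correct / len(held_out))
    return sum(scores) / len(scores)


def select_best(candidates, X, y, k=5):
    """Score every candidate model and return the most accurate one."""
    results = {name: cross_val_accuracy(f, X, y, k) for name, f in candidates.items()}
    return max(results, key=results.get), results


def make_toy_profiles(n=200, seed=1):
    """Synthetic two-feature 'user profiles' with a binary label (illustrative only)."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        label = rng.choice(["A", "B"])
        center = 0.0 if label == "A" else 3.0
        X.append([rng.gauss(center, 1.0), rng.gauss(center, 1.0)])
        y.append(label)
    return X, y


X, y = make_toy_profiles()
best, results = select_best(
    {"majority": MajorityClass, "nearest_centroid": NearestCentroid}, X, y
)
print(best, results)
```

In a setup like the one the abstract describes, the same selection would presumably be run once per demographic attribute (gender, age, marital status, residence, job), with each combination of dimension reduction approach and model type as a candidate.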