• Title/Summary/Keyword: Web search

Search Result 1,646, Processing Time 0.023 seconds

Clickstream Big Data Mining for Demographics based Digital Marketing (인구통계특성 기반 디지털 마케팅을 위한 클릭스트림 빅데이터 마이닝)

  • Park, Jiae;Cho, Yoonho
    • Journal of Intelligence and Information Systems
    • /
    • v.22 no.3
    • /
    • pp.143-163
    • /
    • 2016
  • The demographics of Internet users are the most basic and important sources for target marketing or personalized advertisements on the digital marketing channels which include email, mobile, and social media. However, it gradually has become difficult to collect the demographics of Internet users because their activities are anonymous in many cases. Although the marketing department is able to get the demographics using online or offline surveys, these approaches are very expensive, long processes, and likely to include false statements. Clickstream data is the recording an Internet user leaves behind while visiting websites. As the user clicks anywhere in the webpage, the activity is logged in semi-structured website log files. Such data allows us to see what pages users visited, how long they stayed there, how often they visited, when they usually visited, which site they prefer, what keywords they used to find the site, whether they purchased any, and so forth. For such a reason, some researchers tried to guess the demographics of Internet users by using their clickstream data. They derived various independent variables likely to be correlated to the demographics. The variables include search keyword, frequency and intensity for time, day and month, variety of websites visited, text information for web pages visited, etc. The demographic attributes to predict are also diverse according to the paper, and cover gender, age, job, location, income, education, marital status, presence of children. A variety of data mining methods, such as LSA, SVM, decision tree, neural network, logistic regression, and k-nearest neighbors, were used for prediction model building. However, this research has not yet identified which data mining method is appropriate to predict each demographic variable. Moreover, it is required to review independent variables studied so far and combine them as needed, and evaluate them for building the best prediction model. The objective of this study is to choose clickstream attributes mostly likely to be correlated to the demographics from the results of previous research, and then to identify which data mining method is fitting to predict each demographic attribute. Among the demographic attributes, this paper focus on predicting gender, age, marital status, residence, and job. And from the results of previous research, 64 clickstream attributes are applied to predict the demographic attributes. The overall process of predictive model building is compose of 4 steps. In the first step, we create user profiles which include 64 clickstream attributes and 5 demographic attributes. The second step performs the dimension reduction of clickstream variables to solve the curse of dimensionality and overfitting problem. We utilize three approaches which are based on decision tree, PCA, and cluster analysis. We build alternative predictive models for each demographic variable in the third step. SVM, neural network, and logistic regression are used for modeling. The last step evaluates the alternative models in view of model accuracy and selects the best model. For the experiments, we used clickstream data which represents 5 demographics and 16,962,705 online activities for 5,000 Internet users. IBM SPSS Modeler 17.0 was used for our prediction process, and the 5-fold cross validation was conducted to enhance the reliability of our experiments. As the experimental results, we can verify that there are a specific data mining method well-suited for each demographic variable. For example, age prediction is best performed when using the decision tree based dimension reduction and neural network whereas the prediction of gender and marital status is the most accurate by applying SVM without dimension reduction. We conclude that the online behaviors of the Internet users, captured from the clickstream data analysis, could be well used to predict their demographics, thereby being utilized to the digital marketing.

Smart Store in Smart City: The Development of Smart Trade Area Analysis System Based on Consumer Sentiments (Smart Store in Smart City: 소비자 감성기반 상권분석 시스템 개발)

  • Yoo, In-Jin;Seo, Bong-Goon;Park, Do-Hyung
    • Journal of Intelligence and Information Systems
    • /
    • v.24 no.1
    • /
    • pp.25-52
    • /
    • 2018
  • This study performs social network analysis based on consumer sentiment related to a location in Seoul using data reflecting consumers' web search activities and emotional evaluations associated with commerce. The study focuses on large commercial districts in Seoul. In addition, to consider their various aspects, social network indexes were combined with the trading area's public data to verify factors affecting the area's sales. According to R square's change, We can see that the model has a little high R square value even though it includes only the district's public data represented by static data. However, the present study confirmed that the R square of the model combined with the network index derived from the social network analysis was even improved much more. A regression analysis of the trading area's public data showed that the five factors of 'number of market district,' 'residential area per person,' 'satisfaction of residential environment,' 'rate of change of trade,' and 'survival rate over 3 years' among twenty two variables. The study confirmed a significant influence on the sales of the trading area. According to the results, 'residential area per person' has the highest standardized beta value. Therefore, 'residential area per person' has the strongest influence on commercial sales. In addition, 'residential area per person,' 'number of market district,' and 'survival rate over 3 years' were found to have positive effects on the sales of all trading area. Thus, as the number of market districts in the trading area increases, residential area per person increases, and as the survival rate over 3 years of each store in the trading area increases, sales increase. On the other hand, 'satisfaction of residential environment' and 'rate of change of trade' were found to have a negative effect on sales. In the case of 'satisfaction of residential environment,' sales increase when the satisfaction level is low. Therefore, as consumer dissatisfaction with the residential environment increases, sales increase. The 'rate of change of trade' shows that sales increase with the decreasing acceleration of transaction frequency. According to the social network analysis, of the 25 regional trading areas in Seoul, Yangcheon-gu has the highest degree of connection. In other words, it has common sentiments with many other trading areas. On the other hand, Nowon-gu and Jungrang-gu have the lowest degree of connection. In other words, they have relatively distinct sentiments from other trading areas. The social network indexes used in the combination model are 'density of ego network,' 'degree centrality,' 'closeness centrality,' 'betweenness centrality,' and 'eigenvector centrality.' The combined model analysis confirmed that the degree centrality and eigenvector centrality of the social network index have a significant influence on sales and the highest influence in the model. 'Degree centrality' has a negative effect on the sales of the districts. This implies that sales decrease when holding various sentiments of other trading area, which conflicts with general social myths. However, this result can be interpreted to mean that if a trading area has low 'degree centrality,' it delivers unique and special sentiments to consumers. The findings of this study can also be interpreted to mean that sales can be increased if the trading area increases consumer recognition by forming a unique sentiment and city atmosphere that distinguish it from other trading areas. On the other hand, 'eigenvector centrality' has the greatest effect on sales in the combined model. In addition, the results confirmed a positive effect on sales. This finding shows that sales increase when a trading area is connected to others with stronger centrality than when it has common sentiments with others. This study can be used as an empirical basis for establishing and implementing a city and trading area strategy plan considering consumers' desired sentiments. In addition, we expect to provide entrepreneurs and potential entrepreneurs entering the trading area with sentiments possessed by those in the trading area and directions into the trading area considering the district-sentiment structure.

Analysis of the Time-dependent Relation between TV Ratings and the Content of Microblogs (TV 시청률과 마이크로블로그 내용어와의 시간대별 관계 분석)

  • Choeh, Joon Yeon;Baek, Haedeuk;Choi, Jinho
    • Journal of Intelligence and Information Systems
    • /
    • v.20 no.1
    • /
    • pp.163-176
    • /
    • 2014
  • Social media is becoming the platform for users to communicate their activities, status, emotions, and experiences to other people. In recent years, microblogs, such as Twitter, have gained in popularity because of its ease of use, speed, and reach. Compared to a conventional web blog, a microblog lowers users' efforts and investment for content generation by recommending shorter posts. There has been a lot research into capturing the social phenomena and analyzing the chatter of microblogs. However, measuring television ratings has been given little attention so far. Currently, the most common method to measure TV ratings uses an electronic metering device installed in a small number of sampled households. Microblogs allow users to post short messages, share daily updates, and conveniently keep in touch. In a similar way, microblog users are interacting with each other while watching television or movies, or visiting a new place. In order to measure TV ratings, some features are significant during certain hours of the day, or days of the week, whereas these same features are meaningless during other time periods. Thus, the importance of features can change during the day, and a model capturing the time sensitive relevance is required to estimate TV ratings. Therefore, modeling time-related characteristics of features should be a key when measuring the TV ratings through microblogs. We show that capturing time-dependency of features in measuring TV ratings is vitally necessary for improving their accuracy. To explore the relationship between the content of microblogs and TV ratings, we collected Twitter data using the Get Search component of the Twitter REST API from January 2013 to October 2013. There are about 300 thousand posts in our data set for the experiment. After excluding data such as adverting or promoted tweets, we selected 149 thousand tweets for analysis. The number of tweets reaches its maximum level on the broadcasting day and increases rapidly around the broadcasting time. This result is stems from the characteristics of the public channel, which broadcasts the program at the predetermined time. From our analysis, we find that count-based features such as the number of tweets or retweets have a low correlation with TV ratings. This result implies that a simple tweet rate does not reflect the satisfaction or response to the TV programs. Content-based features extracted from the content of tweets have a relatively high correlation with TV ratings. Further, some emoticons or newly coined words that are not tagged in the morpheme extraction process have a strong relationship with TV ratings. We find that there is a time-dependency in the correlation of features between the before and after broadcasting time. Since the TV program is broadcast at the predetermined time regularly, users post tweets expressing their expectation for the program or disappointment over not being able to watch the program. The highly correlated features before the broadcast are different from the features after broadcasting. This result explains that the relevance of words with TV programs can change according to the time of the tweets. Among the 336 words that fulfill the minimum requirements for candidate features, 145 words have the highest correlation before the broadcasting time, whereas 68 words reach the highest correlation after broadcasting. Interestingly, some words that express the impossibility of watching the program show a high relevance, despite containing a negative meaning. Understanding the time-dependency of features can be helpful in improving the accuracy of TV ratings measurement. This research contributes a basis to estimate the response to or satisfaction with the broadcasted programs using the time dependency of words in Twitter chatter. More research is needed to refine the methodology for predicting or measuring TV ratings.

Predicting the Direction of the Stock Index by Using a Domain-Specific Sentiment Dictionary (주가지수 방향성 예측을 위한 주제지향 감성사전 구축 방안)

  • Yu, Eunji;Kim, Yoosin;Kim, Namgyu;Jeong, Seung Ryul
    • Journal of Intelligence and Information Systems
    • /
    • v.19 no.1
    • /
    • pp.95-110
    • /
    • 2013
  • Recently, the amount of unstructured data being generated through a variety of social media has been increasing rapidly, resulting in the increasing need to collect, store, search for, analyze, and visualize this data. This kind of data cannot be handled appropriately by using the traditional methodologies usually used for analyzing structured data because of its vast volume and unstructured nature. In this situation, many attempts are being made to analyze unstructured data such as text files and log files through various commercial or noncommercial analytical tools. Among the various contemporary issues dealt with in the literature of unstructured text data analysis, the concepts and techniques of opinion mining have been attracting much attention from pioneer researchers and business practitioners. Opinion mining or sentiment analysis refers to a series of processes that analyze participants' opinions, sentiments, evaluations, attitudes, and emotions about selected products, services, organizations, social issues, and so on. In other words, many attempts based on various opinion mining techniques are being made to resolve complicated issues that could not have otherwise been solved by existing traditional approaches. One of the most representative attempts using the opinion mining technique may be the recent research that proposed an intelligent model for predicting the direction of the stock index. This model works mainly on the basis of opinions extracted from an overwhelming number of economic news repots. News content published on various media is obviously a traditional example of unstructured text data. Every day, a large volume of new content is created, digitalized, and subsequently distributed to us via online or offline channels. Many studies have revealed that we make better decisions on political, economic, and social issues by analyzing news and other related information. In this sense, we expect to predict the fluctuation of stock markets partly by analyzing the relationship between economic news reports and the pattern of stock prices. So far, in the literature on opinion mining, most studies including ours have utilized a sentiment dictionary to elicit sentiment polarity or sentiment value from a large number of documents. A sentiment dictionary consists of pairs of selected words and their sentiment values. Sentiment classifiers refer to the dictionary to formulate the sentiment polarity of words, sentences in a document, and the whole document. However, most traditional approaches have common limitations in that they do not consider the flexibility of sentiment polarity, that is, the sentiment polarity or sentiment value of a word is fixed and cannot be changed in a traditional sentiment dictionary. In the real world, however, the sentiment polarity of a word can vary depending on the time, situation, and purpose of the analysis. It can also be contradictory in nature. The flexibility of sentiment polarity motivated us to conduct this study. In this paper, we have stated that sentiment polarity should be assigned, not merely on the basis of the inherent meaning of a word but on the basis of its ad hoc meaning within a particular context. To implement our idea, we presented an intelligent investment decision-support model based on opinion mining that performs the scrapping and parsing of massive volumes of economic news on the web, tags sentiment words, classifies sentiment polarity of the news, and finally predicts the direction of the next day's stock index. In addition, we applied a domain-specific sentiment dictionary instead of a general purpose one to classify each piece of news as either positive or negative. For the purpose of performance evaluation, we performed intensive experiments and investigated the prediction accuracy of our model. For the experiments to predict the direction of the stock index, we gathered and analyzed 1,072 articles about stock markets published by "M" and "E" media between July 2011 and September 2011.

Directions of Implementing Documentation Strategies for Local Regions (지역 기록화를 위한 도큐멘테이션 전략의 적용)

  • Seol, Moon-Won
    • The Korean Journal of Archival Studies
    • /
    • no.26
    • /
    • pp.103-149
    • /
    • 2010
  • Documentation strategy has been experimented in various subject areas and local regions since late 1980's when it was proposed as archival appraisal and selection methods by archival communities in the United States. Though it was criticized to be too ideal, it needs to shed new light on the potentialities of the strategy for documenting local regions in digital environment. The purpose of this study is to analyse the implementation issues of documentation strategy and to suggest the directions for documenting local regions of Korea through the application of the strategy. The documentation strategy which was developed more than twenty years ago in mostly western countries gives us some implications for documenting local regions even in current digital environments. They are as follows; Firstly, documentation strategy can enhance the value of archivists as well as archives in local regions because archivist should be active shaper of history rather than passive receiver of archives according to the strategy. It can also be a solution for overcoming poor conditions of local archives management in Korea. Secondly, the strategy can encourage cooperation between collecting institutions including museums, libraries, archives, cultural centers, history institutions, etc. in each local region. In the networked environment the cooperation can be achieved more effectively than in traditional environment where the heavy workload of cooperative institutions is needed. Thirdly, the strategy can facilitate solidarity of various groups in local region. According to the analysis of the strategy projects, it is essential to collect their knowledge, passion, and enthusiasm of related groups to effectively implement the strategy. It can also provide a methodology for minor groups of society to document their memories. This study suggests the directions of documenting local regions in consideration of current archival infrastructure of Korean as follows; Firstly, very selective and intensive documentation should be pursued rather than comprehensive one for documenting local regions. Though it is a very political problem to decide what subject has priority for documentation, interests of local community members as well as professional groups should be considered in the decision-making process seriously. Secondly, it is effective to plan integrated representation of local history in the distributed custody of local archives. It would be desirable to implement archival gateway for integrated search and representation of local archives regardless of the location of archives. Thirdly, it is necessary to try digital documentation using Web 2.0 technologies. Documentation strategy as the methodology of selecting and acquiring archives can not avoid subjectivity and prejudices of appraiser completely. To mitigate the problems, open documentation system should be prepared for reflecting different interests of different groups. Fourth, it is desirable to apply a conspectus model used in cooperative collection management of libraries to document local regions digitally. Conspectus can show existing documentation strength and future documentation intensity for each participating institution. Using this, documentation level of each subject area can be set up cooperatively and effectively in the local regions.

A Study on the System of Aircraft Investigation (항공기(航空機) 사고조사제도(事故調査制度)에 관한 연구(硏究))

  • Kim, Doo-Hwan
    • The Korean Journal of Air & Space Law and Policy
    • /
    • v.9
    • /
    • pp.85-143
    • /
    • 1997
  • The main purpose of the investigation of an accident caused by aircraft is to be prevented the sudden and casual accidents caused by wilful misconduct and fault from pilots, air traffic controllers, hijack, trouble of engine and machinery of aircraft, turbulence during the bad weather, collision between birds and aircraft, near miss flight by aircrafts etc. It is not the purpose of this activity to apportion blame or liability for offender of aircraft accidents. Accidents to aircraft, especially those involving the general public and their property, are a matter of great concern to the aviation community. The system of international regulation exists to improve safety and minimize, as far as possible, the risk of accidents but when they do occur there is a web of systems and procedures to investigate and respond to them. I would like to trace the general line of regulation from an international source in the Chicago Convention of 1944. Article 26 of the Convention lays down the basic principle for the investigation of the aircraft accident. Where there has been an accident to an aircraft of a contracting state which occurs in the territory of another contracting state and which involves death or serious injury or indicates serious technical defect in the aircraft or air navigation facilities, the state in which the accident occurs must institute an inquiry into the circumstances of the accident. That inquiry will be in accordance, in so far as its law permits, with the procedure which may be recommended from time to time by the International Civil Aviation Organization ICAO). There are very general provisions but they state two essential principles: first, in certain circumstances there must be an investigation, and second, who is to be responsible for undertaking that investigation. The latter is an important point to establish otherwise there could be at least two states claiming jurisdiction on the inquiry. The Chicago Convention also provides that the state where the aircraft is registered is to be given the opportunity to appoint observers to be present at the inquiry and the state holding the inquiry must communicate the report and findings in the matter to that other state. It is worth noting that the Chicago Convention (Article 25) also makes provision for assisting aircraft in distress. Each contracting state undertakes to provide such measures of assistance to aircraft in distress in its territory as it may find practicable and to permit (subject to control by its own authorities) the owner of the aircraft or authorities of the state in which the aircraft is registered, to provide such measures of assistance as may be necessitated by circumstances. Significantly, the undertaking can only be given by contracting state but the duty to provide assistance is not limited to aircraft registered in another contracting state, but presumably any aircraft in distress in the territory of the contracting state. Finally, the Convention envisages further regulations (normally to be produced under the auspices of ICAO). In this case the Convention provides that each contracting state, when undertaking a search for missing aircraft, will collaborate in co-ordinated measures which may be recommended from time to time pursuant to the Convention. Since 1944 further international regulations relating to safety and investigation of accidents have been made, both pursuant to Chicago Convention and, in particular, through the vehicle of the ICAO which has, for example, set up an accident and reporting system. By requiring the reporting of certain accidents and incidents it is building up an information service for the benefit of member states. However, Chicago Convention provides that each contracting state undertakes collaborate in securing the highest practicable degree of uniformity in regulations, standards, procedures and organization in relation to aircraft, personnel, airways and auxiliary services in all matters in which such uniformity will facilitate and improve air navigation. To this end, ICAO is to adopt and amend from time to time, as may be necessary, international standards and recommended practices and procedures dealing with, among other things, aircraft in distress and investigation of accidents. Standards and Recommended Practices for Aircraft Accident Injuries were first adopted by the ICAO Council on 11 April 1951 pursuant to Article 37 of the Chicago Convention on International Civil Aviation and were designated as Annex 13 to the Convention. The Standards Recommended Practices were based on Recommendations of the Accident Investigation Division at its first Session in February 1946 which were further developed at the Second Session of the Division in February 1947. The 2nd Edition (1966), 3rd Edition, (1973), 4th Edition (1976), 5th Edition (1979), 6th Edition (1981), 7th Edition (1988), 8th Edition (1992) of the Annex 13 (Aircraft Accident and Incident Investigation) of the Chicago Convention was amended eight times by the ICAO Council since 1966. Annex 13 sets out in detail the international standards and recommended practices to be adopted by contracting states in dealing with a serious accident to an aircraft of a contracting state occurring in the territory of another contracting state, known as the state of occurrence. It provides, principally, that the state in which the aircraft is registered is to be given the opportunity to appoint an accredited representative to be present at the inquiry conducted by the state in which the serious aircraft accident occurs. Article 26 of the Chicago Convention does not indicate what the accredited representative is to do but Annex 13 amplifies his rights and duties. In particular, the accredited representative participates in the inquiry by visiting the scene of the accident, examining the wreckage, questioning witnesses, having full access to all relevant evidence, receiving copies of all pertinent documents and making submissions in respect of the various elements of the inquiry. The main shortcomings of the present system for aircraft accident investigation are that some contracting sates are not applying Annex 13 within its express terms, although they are contracting states. Further, and much more important in practice, there are many countries which apply the letter of Annex 13 in such a way as to sterilise its spirit. This appears to be due to a number of causes often found in combination. Firstly, the requirements of the local law and of the local procedures are interpreted and applied so as preclude a more efficient investigation under Annex 13 in favour of a legalistic and sterile interpretation of its terms. Sometimes this results from a distrust of the motives of persons and bodies wishing to participate or from commercial or related to matters of liability and bodies. These may be political, commercial or related to matters of liability and insurance. Secondly, there is said to be a conscious desire to conduct the investigation in some contracting states in such a way as to absolve from any possibility of blame the authorities or nationals, whether manufacturers, operators or air traffic controllers, of the country in which the inquiry is held. The EEC has also had an input into accidents and investigations. In particular, a directive was issued in December 1980 encouraging the uniformity of standards within the EEC by means of joint co-operation of accident investigation. The sharing of and assisting with technical facilities and information was considered an important means of achieving these goals. It has since been proposed that a European accident investigation committee should be set up by the EEC (Council Directive 80/1266 of 1 December 1980). After I would like to introduce the summary of the legislation examples and system for aircraft accidents investigation of the United States, the United Kingdom, Canada, Germany, The Netherlands, Sweden, Swiss, New Zealand and Japan, and I am going to mention the present system, regulations and aviation act for the aircraft accident investigation in Korea. Furthermore I would like to point out the shortcomings of the present system and regulations and aviation act for the aircraft accident investigation and then I will suggest my personal opinion on the new and dramatic innovation on the system for aircraft accident investigation in Korea. I propose that it is necessary and desirable for us to make a new legislation or to revise the existing aviation act in order to establish the standing and independent Committee of Aircraft Accident Investigation under the Korean Government.

  • PDF