• Title/Summary/Keyword: Text mining analysis

Clickstream Big Data Mining for Demographics based Digital Marketing (인구통계특성 기반 디지털 마케팅을 위한 클릭스트림 빅데이터 마이닝)

  • Park, Jiae;Cho, Yoonho
    • Journal of Intelligence and Information Systems / v.22 no.3 / pp.143-163 / 2016
  • The demographics of Internet users are the most basic and important source of information for target marketing or personalized advertising on digital marketing channels, which include email, mobile, and social media. However, it has gradually become difficult to collect the demographics of Internet users because their activities are anonymous in many cases. Although a marketing department can obtain demographics through online or offline surveys, these approaches are expensive, time-consuming, and likely to include false statements. Clickstream data is the record an Internet user leaves behind while visiting websites. As the user clicks anywhere on a webpage, the activity is logged in semi-structured website log files. Such data shows which pages users visited, how long they stayed, how often and when they usually visited, which sites they prefer, what keywords they used to find a site, whether they purchased anything, and so forth. For this reason, some researchers have tried to infer the demographics of Internet users from their clickstream data. They derived various independent variables likely to be correlated with the demographics, including search keywords; frequency and intensity by time, day, and month; variety of websites visited; and text information from the web pages visited. The demographic attributes to be predicted also vary from paper to paper and cover gender, age, job, location, income, education, marital status, and presence of children. A variety of data mining methods, such as LSA, SVM, decision trees, neural networks, logistic regression, and k-nearest neighbors, have been used to build prediction models. However, previous research has not yet identified which data mining method is appropriate for predicting each demographic variable. Moreover, the independent variables studied so far need to be reviewed, combined as needed, and evaluated in order to build the best prediction model.
The objective of this study is to choose the clickstream attributes most likely to be correlated with the demographics from the results of previous research, and then to identify which data mining method is best suited to predicting each demographic attribute. Among the demographic attributes, this paper focuses on predicting gender, age, marital status, residence, and job. Based on the results of previous research, 64 clickstream attributes are applied to predict these demographic attributes. The overall process of predictive model building is composed of four steps. In the first step, we create user profiles that include the 64 clickstream attributes and 5 demographic attributes. The second step reduces the dimensionality of the clickstream variables to address the curse of dimensionality and the overfitting problem, using three approaches based on decision trees, PCA, and cluster analysis. In the third step, we build alternative predictive models for each demographic variable using SVM, neural networks, and logistic regression. The last step evaluates the alternative models in terms of accuracy and selects the best model. For the experiments, we used clickstream data representing 5 demographics and 16,962,705 online activities for 5,000 Internet users. IBM SPSS Modeler 17.0 was used for the prediction process, and 5-fold cross validation was conducted to enhance the reliability of the experiments. The experimental results verify that a specific data mining method is well suited to each demographic variable. For example, age prediction performs best with decision tree based dimension reduction and a neural network, whereas the prediction of gender and marital status is most accurate when SVM is applied without dimension reduction.
We conclude that the online behaviors of Internet users, captured through clickstream data analysis, can be used effectively to predict their demographics and can thereby be utilized for digital marketing.
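
The 5-fold cross validation mentioned in the abstract above can be illustrated with a minimal sketch in plain Python. This is not the IBM SPSS Modeler procedure the authors used; the function name `k_fold_indices` and the contiguous, unshuffled fold layout are assumptions made only for illustration.

```python
def k_fold_indices(n_samples, k=5):
    """Yield (train_indices, test_indices) for each of k contiguous folds.

    Hypothetical helper: fold layout is an assumption, not the authors' setup.
    """
    # Spread any remainder over the first folds so fold sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, test
        start += size


# 5,000 users, as in the abstract; each user lands in exactly one test fold.
folds = list(k_fold_indices(5000, k=5))
```

Each candidate model would be trained on the `train` indices and scored on the `test` indices of every fold, and the five scores averaged before the models are compared.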

Analysis of Domestic and Foreign Local Biodiversity Strategies and Action Plan (LBSAP) using Semantic Network Analysis (언어네트워크 분석을 이용한 국내·외 지역생물다양성 전략 분석)

  • Lee, Hyeon-jae;Sung, Kijune
    • Journal of Environmental Impact Assessment / v.27 no.1 / pp.92-104 / 2018
  • The loss of biodiversity has become a global issue. To cope with this problem, many countries have established national biodiversity strategies and action plans (NBSAPs) at the national level and local biodiversity strategies and action plans (LBSAPs) at the local level. In this study, we analyzed 8 domestic LBSAPs and 41 foreign LBSAPs through semantic network analysis to investigate their characteristics. The results showed that conservation and management were the most frequently used keywords in both domestic and foreign LBSAPs, but the ranking of the other keywords used in the vision, goal, strategy, and action plan sectors differed. This indicates a difference between domestic and foreign practical approaches to the conservation and management of biodiversity. The network analysis showed that the domestic network is more detailed and distributed, while the foreign network is denser, more comprehensive, and more integrally configured. These differences may be due to differences in threats to biodiversity, in problem recognition, or in local circumstances. The results are expected to help establish LBSAPs in other regions and to assess local roles in achieving the strategic goals of the Convention on Biological Diversity.

Measurement of Classes Complexity in the Object-Oriented Analysis Phase (객체지향 분석 단계에서의 클래스 복잡도 측정)

  • Kim, Yu-Kyung;Park, Jai-Nyun
    • Journal of KIISE:Software and Applications / v.28 no.10 / pp.720-731 / 2001
  • Complexity metrics developed for the structured paradigm of software development are not suitable for the object-oriented (OO) paradigm, because they do not support key object-oriented concepts such as inheritance, polymorphism, message passing, and encapsulation. There is much research on OO software metrics such as program complexity or design metrics. However, metrics that measure the complexity of classes at the OO analysis phase are needed because they provide earlier feedback to the development project, and earlier feedback means more effective development and less costly maintenance. In this paper, we propose new metrics to measure the complexity of the analysis classes that are drawn out in analysis based on RUP (Rational Unified Process). Collaboration complexity, denoted by CC, is the maximum number of collaborations that can be achieved with each collaborator, and it determines the potential complexity. Interface complexity, denoted by IC, indicates the difficulty of understanding the interfaces between collaborators. We theoretically verify the suggested metrics against Weyuker's nine properties. Moreover, we show the computation results for the analysis classes of a system that automatically responds to user questions using a text mining technique. A comparison of CC with the CBO and WMC metrics suggested by Chidamber and Kemerer shows that classes with high values of the proposed metric remain highly complex at the design phase as well, and that CC and IC represent the complexity better than CBO and WMC. We expect that our metrics can provide earlier feedback and hence make it possible to predict the effort, cost, and time required for the remaining processes. As a result, we expect that cost-effective OO software can be developed by reviewing the complexity of analysis classes in the first stage of the SDLC (Software Development Life Cycle).

Spatial analysis based on topic modeling using foreign tourist review data: Case of Daegu (외국인 관광객 리뷰데이터를 활용한 토픽모델링 기반의 공간분석: 대구광역시를 사례로)

  • Jung, Ji-Woo;Kim, Seo-Yun;Kim, Hyeon-Yu;Yoon, Ju-Hyeok;Jang, Won-Jun;Kim, Keun-Wook
    • Journal of Digital Convergence / v.19 no.8 / pp.33-42 / 2021
  • As smartphone-based tourism platforms have become active, review data are being used for policy establishment and service enhancement in various fields. Most previous studies using tourism review data focused on domestic tourists, and studies of foreign tourists were limited to data collected in a few languages and to text mining techniques. In this study, 3,515 reviews written by foreigners were collected from an online review site using the keyword "Daegu attractions", and LDA-based topic modeling was performed to derive tourism topics. The spatial approach, which applies global and local spatial autocorrelation analysis to each topic, distinguishes this study from previous work. The analysis confirmed that there is global spatial autocorrelation and that the tourist destinations mainly visited by foreigners are locally concentrated. In addition, hot spots were identified around Jung-gu for most of the topics. The results are expected to serve as basic research for local governments establishing foreign tourism policy and for spatial analysis based on topic modeling. The limitations of this study are also presented.
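
The abstract above does not name the statistic used for global spatial autocorrelation. A common choice is global Moran's I, and the following minimal sketch assumes it; the function name `morans_i`, the binary adjacency weights, and the toy values are all hypothetical.

```python
def morans_i(values, weights):
    """Global Moran's I for values[i] with spatial weights[i][j] (zero diagonal).

    Assumption: Moran's I is used here for illustration only; the abstract
    does not state which autocorrelation statistic the authors applied.
    """
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    total_weight = sum(sum(row) for row in weights)
    # Cross-products of deviations between neighbouring locations.
    cross = sum(weights[i][j] * dev[i] * dev[j] for i in range(n) for j in range(n))
    variance_sum = sum(d * d for d in dev)
    return (n / total_weight) * (cross / variance_sum)


# Four locations in a row; each is a neighbour of the adjacent ones (toy data).
chain_weights = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
topic_share = [1.0, 2.0, 3.0, 4.0]
i_value = morans_i(topic_share, chain_weights)
```

A positive value indicates that similar topic intensities cluster in neighbouring locations; a value near zero indicates no global spatial pattern.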

A study on the User Experience at Unmanned Checkout Counter Using Big Data Analysis (빅데이터 분석을 통한 무인계산대 사용자 경험에 관한 연구)

  • Kim, Ae-sook;Jung, Sun-mi;Ryu, Gi-hwan;Kim, Hee-young
    • The Journal of the Convergence on Culture Technology / v.8 no.2 / pp.343-348 / 2022
  • This study aims to analyze the user experience of unmanned checkout counters as perceived by consumers, using SNS big data. For this study, blogs, news, intellectuals, cafes, intellectuals (tips), and web documents on Naver and Daum were analyzed, and 'unmanned checkout counter' was used as the keyword for the data search. The analysis period covered the two years from January 1, 2020 to December 31, 2021. For data collection and analysis, frequency and matrix data were extracted through Textom, and network analysis and visualization were conducted using the NetDraw function of the UCINET 6 program. As a result, perceptions of the checkout counter were clustered into accessibility, usability, continuous use intention, and others, according to the definition of consumers' experience factors. If suppliers spread unmanned checkout counters indiscriminately in response to rising minimum wages and shorter working hours, a bigger employment problem will arise from a social point of view. In addition, institutionalization is needed to supply easy and convenient unmanned checkout counters for the elderly and younger generations, children, and foreigners who are not familiar with unmanned checkout.

A Study on Research Trends in Metaverse Platform Using Big Data Analysis (빅데이터 분석을 활용한 메타버스 플랫폼 연구 동향 분석)

  • Hong, Jin-Wook;Han, Jung-Wan
    • Journal of Digital Convergence / v.20 no.5 / pp.627-635 / 2022
  • As the non-face-to-face situation continues for a long time due to COVID-19, the underlying technologies of the 4th industrial revolution, such as IoT, AR, VR, and big data, are affecting the metaverse platform as a whole. Such changes in the external environment, such as society and culture, can affect the development of academic research, and it is very important to systematically organize existing achievements in preparation for these changes. Data including 'metaverse platform' in the keywords were collected from the Korea Educational Research Information Service (RISS) and analyzed using text mining, one of the big data analysis techniques. The collected data were analyzed for word cloud frequency, connection strength between keywords, and semantic networks to examine the trends of metaverse platform research. In the word cloud analysis, keywords appeared in the order of 'use', 'digital', 'technology', and 'education'. In the analysis of connection strength (N-gram) between keywords, 'Edue→Tech' showed the highest connection strength, and a total of three word chain clusters were derived. Detailed research areas were classified into five areas, including 'digital technology'. Considering the analysis results comprehensively, it seems necessary to discover and discuss more active research topics from the long-term perspective of developing a metaverse platform.
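
The connection strength (N-gram) analysis in the abstract above can be approximated by counting how often one keyword directly follows another. The sketch below is a minimal illustration for bigrams only; the function name `bigram_counts` and the toy keyword lists are hypothetical and are not the authors' data or tooling.

```python
from collections import Counter


def bigram_counts(documents):
    """Count ordered adjacent keyword pairs (bigrams) across tokenized documents."""
    counts = Counter()
    for tokens in documents:
        # zip pairs each token with its successor: (t0, t1), (t1, t2), ...
        counts.update(zip(tokens, tokens[1:]))
    return counts


# Hypothetical keyword sequences standing in for collected paper metadata.
docs = [
    ["metaverse", "platform", "education"],
    ["metaverse", "platform", "technology"],
    ["digital", "technology", "education"],
]
strength = bigram_counts(docs)
top_pair, top_count = strength.most_common(1)[0]
```

Here the pair with the highest count plays the role of the strongest keyword connection; pairs are ordered, so ('a', 'b') and ('b', 'a') are counted separately.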

Proposal of Promotion Strategy of Mobile Easy Payment Service Using Topic Modeling and PEST-SWOT Analysis (모바일 간편 결제 서비스 활성화 전략 : 토픽 모델링과 PEST - SWOT 분석 방법론을 기반으로)

  • Park, Seongwoo;Kim, Sehyoung;Kang, Juyoung
    • Journal of Intelligence and Information Systems / v.28 no.4 / pp.365-385 / 2022
  • The easy payment service is a payment and remittance service that uses a simple authentication method. As online transactions have increased due to COVID-19, the use of easy payment services is increasing. At the same time, electronic financial companies such as Naver Pay, Kakao Pay, and Toss are diversifying the competitive structure of the easy payment market. Overseas fintech companies such as PayPal and Alibaba hold unrivaled market shares in their own countries, whereas competition is intensifying in the domestic easy payment market because no company holds such a share. In this study, the participants in the easy payment market were classified as electronic financial companies, mobile phone manufacturers, and financial companies, and a SWOT analysis was conducted on the representative services in each industry. The analysis examined user reviews from the Google Play Store through topic modeling, treating positive topics as strengths and negative topics as weaknesses. In addition, topic modeling was conducted on news articles divided into political, economic, social, and technological (PEST) categories to derive the opportunities and threats facing easy payment services. Through this research, we intend to confirm the service capabilities of easy payment companies and propose a service activation strategy for gaining the upper hand in the market.

Wireless Earphone Consumers Using LDA Topic Modeling Comparative Analysis of Purchase Intention and Satisfaction: Focused on Samsung and Apple wireless earphone reviews in Coupang (LDA 토픽 모델링을 활용한 무선이어폰 소비자 구매 의도 및 만족도 비교 분석: 쿠팡에서의 삼성과 애플 무선이어폰 리뷰를 중심으로)

  • Tuul Yondon;Tae-Gu Kang
    • Journal of Industrial Convergence / v.21 no.8 / pp.23-33 / 2023
  • Consumer review analysis is important for product development, customer satisfaction, competitive advantage, and effective marketing. With growing use driven by lifestyle changes, the wireless earphone market is expected to reach $45.7 billion by 2026. In consideration of the growth and importance of this market, consumer reviews of wireless earphones from Apple and Samsung were analyzed. In this study, 11,320 reviews of Apple and Samsung wireless earphones sold on Coupang were collected, and consumers' purchase intentions and satisfaction were analyzed through text mining: frequency analysis, sentiment analysis, and LDA topic modeling. The topic modeling derived 16 topics, which were classified into sound quality, connection, shopping mall service, purchase intention, battery, delivery, and price. In the brand comparison, Samsung products were often purchased as gifts and showed high positive sentiment for price, while Apple products showed high positive sentiment for battery, sound quality, connection, service, and delivery. The results of this study can serve manufacturers, retailers, marketers, and consumers as data that provide improvements and insights regarding customer satisfaction, quality, and market trends.

Analyzing the Issue Life Cycle by Mapping Inter-Period Issues (기간별 이슈 매핑을 통한 이슈 생명주기 분석 방법론)

  • Lim, Myungsu;Kim, Namgyu
    • Journal of Intelligence and Information Systems / v.20 no.4 / pp.25-41 / 2014
  • Recently, the number of social media users has increased rapidly because of the prevalence of smart devices. As a result, the amount of real-time data has been increasing exponentially, which, in turn, is generating more interest in using such data to create added value. For instance, several attempts are being made to analyze the relevant search keywords that are frequently used on new portal sites and the words that are regularly mentioned on various social media in order to identify social issues. The technique of "topic analysis" is employed to identify topics and themes from a large number of text documents. As one of the most prevalent applications of topic analysis, issue tracking investigates changes in the social issues that are identified through topic analysis. Traditional issue tracking is conducted by identifying the main topics of the documents covering an entire period at once and then analyzing the occurrence of each topic by period. However, this traditional approach has two limitations. First, when a new period is included, topic analysis must be repeated for all the documents of the entire period, rather than being conducted only on the new documents of the added period. This creates practical limitations in the form of significant time and cost burdens, so the traditional approach is difficult to apply in most applications that need to analyze an additional period. Second, issues are not only constantly generated and terminated, but one issue can also split into several issues, or multiple issues can merge into a single issue. In other words, each issue is characterized by a life cycle that consists of the stages of creation, transition (merging and segmentation), and termination. The existing issue tracking methods do not address the connections and influence relationships between these issues.
The purpose of this study is to overcome the two limitations of the existing issue tracking method, one regarding the analysis method and the other involving the lack of consideration of the changeability of issues. Suppose that we perform a separate topic analysis for each of multiple periods. It is then essential to map the issues of different periods in order to trace issue trends. However, it is not easy to discover connections between issues of different periods because the issues derived for each period are heterogeneous. To overcome these limitations, this study performs the analysis independently for each period, without having to analyze the documents of the entire period simultaneously. In addition, we perform issue mapping to link the issues identified in each period. An integrated view of the detailed periods is presented, and the issue flow over the entire integrated period is depicted. Thus, the entire issue life cycle, including the stages of creation, transition (merging and segmentation), and extinction, is identified and examined systematically, and the changeability of the issues is analyzed. The proposed methodology is highly efficient in terms of time and cost while sufficiently considering the changeability of the issues. Further, the results of this study can be used to adapt the methodology to practical situations. The potential practical applications of the proposed methodology are analyzed by applying it to actual Internet news. The proposed methodology was able to extend the period of analysis and to follow the progress of each issue's life cycle. Further, this methodology can facilitate a clearer understanding of complex social phenomena using topic analysis.
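
The inter-period issue mapping described above can be sketched as follows. The abstract does not say how issues from different periods are compared, so this example assumes cosine similarity between keyword-weight vectors and a fixed linking threshold; the function names, the threshold of 0.5, and the toy issues are all hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two sparse keyword-weight dicts."""
    dot = sum(a[w] * b[w] for w in set(a) & set(b))
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def map_issues(prev_issues, next_issues, threshold=0.5):
    """Link each earlier issue to every later issue that is similar enough."""
    return [
        (p, n)
        for p, p_vec in prev_issues.items()
        for n, n_vec in next_issues.items()
        if cosine(p_vec, n_vec) >= threshold
    ]


def life_cycle(prev_issues, next_issues, links):
    """Classify issues as terminated, created, split, or merged from the links."""
    out_deg = {p: 0 for p in prev_issues}
    in_deg = {n: 0 for n in next_issues}
    for p, n in links:
        out_deg[p] += 1
        in_deg[n] += 1
    return {
        "terminated": [p for p, d in out_deg.items() if d == 0],
        "created": [n for n, d in in_deg.items() if d == 0],
        "split": [p for p, d in out_deg.items() if d > 1],
        "merged": [n for n, d in in_deg.items() if d > 1],
    }


# Hypothetical topics from two periods, each analyzed independently.
period_1 = {"A": {"election": 3, "candidate": 3}, "B": {"storm": 2, "flood": 3}}
period_2 = {
    "C": {"election": 2, "vote": 1},
    "D": {"candidate": 3, "debate": 1},
    "E": {"olympics": 4},
}
links = map_issues(period_1, period_2)
stages = life_cycle(period_1, period_2, links)
```

Because each period is analyzed on its own, adding a new period only requires one more topic analysis and one more mapping step against the previous period.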

A Method of Analyzing Sentiment Polarity of Multilingual Social Media: A Case of Korean-Chinese Languages (다국어 소셜미디어에 대한 감성분석 방법 개발: 한국어-중국어를 중심으로)

  • Cui, Meina;Jin, Yoonsun;Kwon, Ohbyung
    • Journal of Intelligence and Information Systems / v.22 no.3 / pp.91-111 / 2016
  • For social media based marketing, it is crucial to perform sentiment analysis on the unstructured data written by potential consumers of a company's products and services. In particular, companies interested in global business must collect and analyze data from multinational social media (e.g. YouTube, Instagram, etc.). In this case, since the texts are multilingual, the sentences are usually translated into a certain target language before sentiment analysis is conducted. However, because cultural differences are not reflected and high-quality data dictionaries are lacking, the translated sentences often fail to convey the true meaning. This decreases the quality of sentiment analysis. Hence, this study proposes a method for multilingual sentiment analysis that avoids language translation, focusing on Korean-Chinese cases. To show the feasibility of the proposed idea, we compare the performance of the proposed method with that of legacy methods that rely on language translators. The results suggest that our method outperforms them in terms of RMSE and can be applied by global business institutions.
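
The RMSE (root mean squared error) used as the comparison measure in the abstract above can be computed as in this minimal sketch. The function name `rmse` and the sample sentiment scores are hypothetical; they are not the authors' data.

```python
import math


def rmse(predicted, actual):
    """Root mean squared error between two equal-length sequences of scores."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared_errors) / len(actual))


# Hypothetical sentiment scores: model predictions versus human ratings.
predicted_scores = [0.0, 0.0]
human_ratings = [3.0, 3.0]
error = rmse(predicted_scores, human_ratings)
```

A lower RMSE means the predicted sentiment scores lie closer to the reference ratings, which is the sense in which one method outperforms another on this measure.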