• Title/Summary/Keyword: social media data


A Comparative Analysis of Teukyakmeip and Consignment (특약매입과 콘사인먼트 비교분석)

  • Kim, Dong-Ho;Kim, Sung-Soo;Jung, Myung-Hee;Youn, Myoung-Kil
    • Journal of Distribution Science / v.12 no.4 / pp.5-9 / 2014
  • Purpose - The purpose of this study was to compare and contrast the applicability and effectiveness of teukyakmeip contracts in Korea and consignment contracts in the United States, in order to demonstrate the effectiveness and practicability of teukyakmeip in Korea. These are popular contract agreements between large retailers and their suppliers and vendors. In recent years, teukyakmeip has been critically examined and scrutinized by politicians, the media, and the public in Korea. Consequently, this paper focuses heavily on identifying and analyzing the different types of contract agreements between large retailers and their suppliers that currently exist in Korea, and compares and contrasts those agreements with teukyakmeip. The article also compares and contrasts teukyakmeip with the consignment agreements of the United States to identify similarities and differences. Research design, data, and methodology - This is a descriptive study that used personal interviews to collect and analyze the data. It also fits the definition of a case study in that it is entirely focused on investigating a real-life event: analyzing and examining contract agreements in the distribution industry. Randomly selected management and vendor representatives from the three major department stores in Korea, Lotte, Hyundai, and Shinsegae, were interviewed between July and September 2013. The analysis of the consignment agreement was based on existing secondary data. Results - Although evidence of the abuse of teukyakmeip and consignment by large retailers in both countries clearly exists, the findings suggest that both contract agreements would remain the most relevant and effective legal contracts between large retailers and their suppliers. Based on the comparative analysis of teukyakmeip and consignment, under both contracts suppliers are fully responsible for inventory and inventory management. 
If a salesperson is necessary for promoting a special product, then suppliers are responsible for providing the salesperson and paying their wages under both contracts. However, American department stores located outside urban areas tend to use their own employees for special product and sales promotion. Under consignment, the retailers are fully responsible for any interior or floor design or redesign of the retail store to accommodate the vendors' products; under teukyakmeip, however, suppliers and retailers share the cost of designing and redesigning the interior. Suppliers are responsible for pricing and for supplying the quantity of the products under both agreements. Both contracts allow a special sales commission as long as vendors agree. Vendors use this special commission to introduce their new products or to apply a market penetration strategy. Conclusions - The findings of this study showed the changing pattern of contract agreements between large retailers and their suppliers in both countries. Furthermore, this study generated policy implications for teukyakmeip, which recently became a major social issue in Korea and led many policymakers to seek political points by criticizing the teukyakmeip system and the large retailers. The findings of the study would be valuable to policymakers in making appropriate decisions and to large retailers and vendors in making beneficial agreements. The major implication of this study is that teukyakmeip and consignment agreements have very similar or almost identical characteristics, and they are popular among department stores and suppliers. The issue of abolishing teukyakmeip in Korea needs to be examined cautiously because teukyakmeip is the best option available at the moment, and the study suggests that no one benefits from abolishing this system.

A Study on the Care Needs of Family-Caregivers to the Patients with Stroke (뇌졸중환자 가족의 간호요구)

  • Kim Mi-Hee
    • Journal of Korean Academy of Fundamentals of Nursing / v.4 no.2 / pp.175-192 / 1997
  • The purpose of this study was to identify the care needs of family-caregivers of patients with stroke. Subjects were 115 family-caregivers caring for stroke in-patients or out-patients at two general hospitals and one oriental medicine hospital located in Seoul and Kwang-Ju. The 35-item instrument used for this study was developed by the researcher on the basis of a literature review and interviews with family-caregivers. Internal consistency, calculated as Cronbach's alpha from the respondents' data, was 0.91, which was regarded as high. The data were analyzed with the SAS program using percentages, means, t-tests, and ANOVA. Factor structures of the care needs of family-caregivers were elicited by factor analysis (PCA, Varimax rotation). Data were collected from July 1 to August 14, 1997. The results of this study were as follows: 1. The mean score of the sum of the care needs of family-caregivers was 3.96; the highest-mean item was 'need for immediate care (M=4.77)', and the lowest-mean item was 'need for chaplain's visit (M=2.82)'. 2. The care needs of the family-caregivers were: need to be informed of the disease, treatment and care; need of education and assistance related to physical functional level; need of social support and consultation; need of management of nursing problems related to immobility; need of appreciation; need of a way to communicate with patients; and need of immediate care and help. The highest-mean factor was the 'need for immediate care and help (M=4.74)', and the lowest-mean factor was the 'need of appreciation (M=3.58)'. 3. 
The variables influencing the degree of care needs perceived by family-caregivers to the patients with stroke were as follows: There were significant differences between need to be informed of the disease, treatment and care and general characteristic factors, which were family caregiver's sex (p=.0178), caring period (p=.0223) and patient's suffering period (p=.0244). There were significant differences between need of education and assistance related to physical functional level and general characteristic factors, which were patient's paralysis (p=.0177) and patient's ADL dependency (p=.0032). There were significant differences between need of social support and consultation and general characteristic factors, which were family caregiver's sex (p=.0055), occupation (p=.0159), religion (p=.0093) and patient's sex (p=.0134). There was significant difference in the degree of need of management of nursing problem related to immobility, according to the patient's ADL dependency (p=.0493). There were significant differences between need of appreciation and general characteristic factors, which were family caregiver's age (p=.0107), sex (p=.0133), and patient's age (p=.0338). There were significant differences between need of the way to communicate with patient and general characteristic factors, which were patient's paralysis (p=.0002) and aphasia (p=.0001). There were significant differences between need of immediate care and help and general characteristic factors, which were family caregiver's caring period (p=.0162) and patient's suffering period (p=.0116). 4. The mean score of patient's ADL dependency was 3.38 and the highest-mean item was 'ascending and descending stairs (M=4.12)', and the lowest-mean item was 'drinking (M=2.60)'. There was no significant difference in the degrees of care needs related to the patient's ADL dependency. 5. The highest information source of family-caregivers was from the doctors about the disease, treatment and care (26.1%). 
The second highest one was from mass media(20.8%), and the third one was from the nurses. The above findings may be used as the basic data to seek more efficient way of elevating nursing practice and quality for family-caregivers to the patients with stroke.


Product Community Analysis Using Opinion Mining and Network Analysis: Movie Performance Prediction Case (오피니언 마이닝과 네트워크 분석을 활용한 상품 커뮤니티 분석: 영화 흥행성과 예측 사례)

  • Jin, Yu;Kim, Jungsoo;Kim, Jongwoo
    • Journal of Intelligence and Information Systems / v.20 no.1 / pp.49-65 / 2014
  • Word of Mouth (WOM) is a behavior used by consumers to transfer or communicate their product or service experience to other consumers. Due to the popularity of social media such as Facebook, Twitter, blogs, and online communities, electronic WOM (e-WOM) has become important to the success of products or services. As a result, most enterprises pay close attention to e-WOM for their products or services. This is especially important for movies, as these are experiential products. This paper aims to identify the network factors of an online movie community that impact box office revenue using social network analysis. In addition to traditional WOM factors (volume and valence of WOM), network centrality measures of the online community are included as influential factors in box office revenue. Based on previous research results, we develop five hypotheses on the relationships between potential influential factors (WOM volume, WOM valence, degree centrality, betweenness centrality, closeness centrality) and box office revenue. The first hypothesis is that the accumulated volume of WOM in online product communities is positively related to the total revenue of movies. The second hypothesis is that the accumulated valence of WOM in online product communities is positively related to the total revenue of movies. The third hypothesis is that the average of degree centralities of reviewers in online product communities is positively related to the total revenue of movies. The fourth hypothesis is that the average of betweenness centralities of reviewers in online product communities is positively related to the total revenue of movies. The fifth hypothesis is that the average of closeness centralities of reviewers in online product communities is positively related to the total revenue of movies. 
To verify our research model, we collect movie review data from the Internet Movie Database (IMDb), which is a representative online movie community, and movie revenue data from the Box-Office-Mojo website. The movies in this analysis include weekly top-10 movies from September 1, 2012, to September 1, 2013, with in total. We collect movie metadata such as screening periods and user ratings; and community data in IMDb including reviewer identification, review content, review times, responder identification, reply content, reply times, and reply relationships. For the same period, the revenue data from Box-Office-Mojo is collected on a weekly basis. Movie community networks are constructed based on reply relationships between reviewers. Using a social network analysis tool, NodeXL, we calculate the averages of three centralities including degree, betweenness, and closeness centrality for each movie. Correlation analysis of focal variables and the dependent variable (final revenue) shows that three centrality measures are highly correlated, prompting us to perform multiple regressions separately with each centrality measure. Consistent with previous research results, our regression analysis results show that the volume and valence of WOM are positively related to the final box office revenue of movies. Moreover, the averages of betweenness centralities from initial community networks impact the final movie revenues. However, both of the averages of degree centralities and closeness centralities do not influence final movie performance. Based on the regression results, three hypotheses, 1, 2, and 4, are accepted, and two hypotheses, 3 and 5, are rejected. This study tries to link the network structure of e-WOM on online product communities with the product's performance. Based on the analysis of a real online movie community, the results show that online community network structures can work as a predictor of movie performance. 
The results show that the betweenness centralities of the reviewer community are critical for the prediction of movie performance. However, degree centralities and closeness centralities do not influence movie performance. As future research topics, similar analyses are required for other product categories such as electronic goods and online content to generalize the study results.
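
The centrality averages described in this abstract can be illustrated with a short sketch. It is not the NodeXL computation the authors used: it is a plain-Python stand-in that assumes an undirected, unweighted reply network, uses Brandes' algorithm for unnormalized betweenness, and takes hypothetical reviewer IDs as input. NodeXL's own normalization may differ.

```python
from collections import defaultdict, deque

def build_graph(replies):
    """Undirected reply network from (reviewer, responder) pairs."""
    graph = defaultdict(set)
    for a, b in replies:
        if a != b:  # ignore self-replies
            graph[a].add(b)
            graph[b].add(a)
    return dict(graph)

def degree_centrality(graph):
    """Degree divided by (n - 1), the maximum possible degree."""
    n = len(graph)
    if n < 2:
        return {v: 0.0 for v in graph}
    return {v: len(nb) / (n - 1) for v, nb in graph.items()}

def betweenness_centrality(graph):
    """Brandes' algorithm, unweighted and undirected, unnormalized."""
    score = dict.fromkeys(graph, 0.0)
    for s in graph:
        stack, queue = [], deque([s])
        preds = {v: [] for v in graph}
        sigma = dict.fromkeys(graph, 0)  # number of shortest paths from s
        sigma[s] = 1
        dist = {s: 0}
        while queue:  # breadth-first search from s
            v = queue.popleft()
            stack.append(v)
            for w in graph[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = dict.fromkeys(graph, 0.0)
        while stack:  # accumulate dependencies in reverse BFS order
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                score[w] += delta[w]
    # Each unordered pair was counted from both endpoints.
    return {v: c / 2 for v, c in score.items()}

def average(values):
    values = list(values)
    return sum(values) / len(values) if values else 0.0

# Hypothetical reply pairs for one movie's review community.
replies = [("a", "b"), ("b", "c")]
graph = build_graph(replies)
avg_degree = average(degree_centrality(graph).values())
avg_betweenness = average(betweenness_centrality(graph).values())
```

Per-movie averages computed this way would then serve as explanatory variables in a regression on revenue; the regression itself is outside this sketch.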

Literature Review on Applying Digital Therapeutic Art Therapy for Adolescent Substance Addiction Treatment (청소년 마약류 중독 치료를 위한 디지털치료제 예술치료 적용을 위한 문헌연구)

  • Jiwon Kim;Daniel H. Byun
    • Trans- / v.16 / pp.1-31 / 2024
  • The advent of digital media has facilitated easy access for adolescents to environments conducive to the purchase of narcotics. In particular, there's an increasing trend in the purchase and consumption of narcotics mediated through Social Network Services (SNS) and messenger services. Adolescents, sensitive to such environments, are at risk of experiencing neurological and mental health issues due to narcotic addiction, increasing their exposure to criminal activities, hence necessitating national-level management and support. Consequently, the quest for sustainable treatment methods for adolescents exposed to narcotics emerges as a critical challenge. In the context of high relapse rates in narcotic addiction, the necessity for cost-effective and user-friendly treatment programs is emphasized. This study conducts a literature review aimed at utilizing digital platforms to create an environment where adolescents can voluntarily participate, focusing on the development of therapeutic content through art. Specifically, it reviews societal perceptions and treatment statuses of adolescent drug addiction, analyzes the impact of narcotic addiction on adolescent brain activity and cognitive function degradation, and explores approaches for developing digital therapeutics to promote the rehabilitation of the addicted brain through analysis of precedential case studies. Moreover, the study investigates the benefits that the integration of digital therapeutic approaches and art therapy can provide in the treatment process and proposes the possibility of enhancing therapeutic effects through various treatment programs such as drama therapy, music therapy, and art therapy. The application of art therapy methods is anticipated to offer positive effects in terms of tool expansion, diversification of expression, data acquisition, and motivation. Through such approaches, an enhancement in the effectiveness of treatments for adolescent narcotic addiction is anticipated. 
Overall, this study undertakes foundational research for the development of digital therapeutics and related applications, offering economically viable and sustainable treatment options in consideration of the societal context of adolescent narcotic addiction.

An Analysis of the Internal Marketing Impact on the Market Capitalization Fluctuation Rate based on the Online Company Reviews from Jobplanet (직원을 위한 내부마케팅이 기업의 시가 총액 변동률에 미치는 영향 분석: 잡플래닛 기업 리뷰를 중심으로)

  • Kichul Choi;Sang-Yong Tom Lee
    • Information Systems Review / v.20 no.2 / pp.39-62 / 2018
  • Thanks to the growth of computing power and the recent development of data analytics, researchers have started to work on the data produced by users through the Internet or social media. This study is in line with these recent research trends and attempts to adopt data analytical techniques. We focus on the impact of "internal marketing" factors on firm performance, which is typically studied through survey methodologies. We looked into the job review platform Jobplanet (www.jobplanet.co.kr), which is a website where employees and former employees anonymously review companies and their management. With web crawling processes, we collected over 40K data points and performed morphological analysis to classify employees' reviews for internal marketing data. We then implemented econometric analysis to see the relationship between internal marketing and market capitalization. Contrary to the findings of extant survey studies, internal marketing is positively related to a firm's market capitalization only within a limited area. In most of the areas, the relationships are negative. Particularly, female-friendly environment and human resource development (HRD) are the areas exhibiting positive relations with market capitalization in the manufacturing industry. In the service industry, most of the areas, such as employee welfare and work-life balance, are negatively related to market capitalization. When firm size is small (or the history is short), a female-friendly environment positively affects firm performance. On the contrary, when firm size is big (or the history is long), most of the internal marketing factors are either negative or insignificant. We explain the theoretical contributions and managerial implications of these results.

Stock-Index Invest Model Using News Big Data Opinion Mining (뉴스와 주가 : 빅데이터 감성분석을 통한 지능형 투자의사결정모형)

  • Kim, Yoo-Sin;Kim, Nam-Gyu;Jeong, Seung-Ryul
    • Journal of Intelligence and Information Systems / v.18 no.2 / pp.143-156 / 2012
  • People readily believe that news and stock indices are closely related. They think that securing news before anyone else can help them forecast stock prices and enjoy great profit, or at least capture an investment opportunity. However, it is no easy feat to determine to what extent the two are related, to reach an investment decision based on news, or to find out whether such investment information is valid. If the significance of news and its impact on the stock market are analyzed, it will be possible to extract information that can assist investment decisions. The reality, however, is that the world is inundated with a massive wave of news in real time, and news is not patterned text. This study suggests a stock-index investment model based on "News Big Data" opinion mining that systematically collects, categorizes, and analyzes news and creates investment information. To verify the validity of the model, the relationship between the results of news opinion mining and the stock index was analyzed empirically using statistics. The mining steps that convert news into information for investment decision making are as follows. First, news is obtained from a news provider that collects it on a real-time basis, and its information is indexed. Not only the content of the news but also various information such as media, time, and news type is collected and classified, and then reworked into variables from which investment decisions can be inferred. Second, the text of the news content is separated into morphemes to derive words whose polarity can be judged, and the positive/negative polarity of each word is tagged by comparing it with a sentiment dictionary. Third, the positive/negative polarity of the news is judged using the indexed classification information and a scoring rule, and the final investment decision-making information is derived according to daily scoring criteria. 
For this study, the KOSPI index and its fluctuation range were collected for the 63 days on which the stock market of the Korea Exchange was open during the 3 months from July to September 2011, and news data were collected by parsing 766 articles from economic news media company M among the articles carried under stock information>news>main news on the portal site Naver.com. Over the 3 months, the stock price index rose on 33 days and fell on 30 days, and the news content included 197 articles published before the market opened, 385 during the session, and 184 after the market closed. Mining the collected news content and comparing the results with stock prices showed that the positive/negative opinion of news content had a significant relation with stock prices, and that changes in the stock price index were better explained when news opinion was expressed as a positive/negative ratio than when each was simply judged positive or negative. To check whether news affected stock price fluctuation, or at least preceded it, changes in stock prices were also compared only with news published before the market opened, and this relation was verified to be statistically significant as well. In addition, because news includes various types and information, such as social, economic, and overseas news, corporate earnings, industry conditions, market outlook, and market conditions, the influence on the stock market or the significance of the relation was expected to differ by news type. Each type of news was therefore compared with stock price fluctuation, and the results showed that market condition, outlook, and overseas news were the most useful in explaining the fluctuation of stock prices. 
By contrast, news about individual companies was not statistically significant, and its opinion mining values tended to move opposite to stock prices; a possible reason is the appearance of promotional and planned news intended to keep stock prices from falling. Finally, multiple regression analysis and logistic regression analysis were carried out to derive a function for investment decision making based on the relation between the positive/negative opinion of news and stock prices. The results showed that the regression equation using the variables of market conditions, outlook, and overseas news before the market opened was statistically significant, and the classification accuracy of the logistic regression was 70.0% for rises in stock price, 78.8% for falls, and 74.6% on average. This study first analyzed the relation between news and stock prices by analyzing and quantifying the sentiment of unstructured news content using opinion mining, one of the big data analysis techniques, and then proposed and verified a smart investment decision-making model that can systematically carry out opinion mining and derive and support investment information. This shows that news can be used as a variable to predict the stock price index for investment, and the model is expected to be usable as a real investment support system if it is implemented as a system and verified in the future.
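
The dictionary-based polarity tagging and the daily positive/negative ratio described in this abstract can be sketched roughly as follows. Everything specific here is an assumption made for illustration: the tiny English lexicon, whitespace tokenization in place of Korean morpheme analysis, and the rule that an article is positive when it has more positive than negative words. The abstract does not spell out the authors' actual scoring rule.

```python
# Hypothetical sentiment dictionary; the study used its own lexicon.
POSITIVE = {"gain", "rally", "growth"}
NEGATIVE = {"loss", "drop", "crisis"}

def article_polarity(text):
    """Label one article 'pos', 'neg', or 'neutral' by word counts."""
    tokens = text.lower().split()  # stand-in for morpheme analysis
    pos = sum(token in POSITIVE for token in tokens)
    neg = sum(token in NEGATIVE for token in tokens)
    if pos > neg:
        return "pos"
    if neg > pos:
        return "neg"
    return "neutral"

def daily_positive_ratio(articles):
    """Share of positive articles among the polar articles of one day."""
    labels = [article_polarity(text) for text in articles]
    pos = labels.count("pos")
    neg = labels.count("neg")
    total = pos + neg
    # With no polar articles, fall back to a neutral 0.5 (an assumption).
    return pos / total if total else 0.5

# Hypothetical headlines published before the market opens.
articles = [
    "Stocks rally on growth",
    "Export drop deepens crisis",
    "Profit gain offsets loss and drop",
]
ratio = daily_positive_ratio(articles)
```

A ratio keeps the strength of the day's sentiment, which a single positive/negative label discards; the abstract reports that the ratio form explained index changes better.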

A Study on Education Need and Satisfaction of the KNOU Nursing Students (방송대 간호학생의 교육요구 및 만족에 관한 연구)

  • Lee, Sun-Ock;Kim, Young-Im;Lee, Sang-Me
    • The Journal of Korean Academic Society of Nursing Education / v.2 / pp.75-94 / 1996
  • This survey study aimed to identify the educational needs of KNOU (Korea National Open University) nursing students, defined in terms of admission purposes, satisfaction with distance learning education, learning methods, and courses after graduation. Among 1,000 randomly assigned students, the 320 KNOU nursing students who agreed to participate in the study completed the questionnaires. The data were analyzed using descriptive statistics, the chi-square test, and the t-test. The results of this study were as follows: 1. The admission purposes of the KNOU nursing students were 'in order to get a bachelor's degree (83.8%)', 'to acquire knowledge for task (61.3%)', 'to be admitted for the graduate school (53.1%)', etc. Comparing the admission purposes by age, two items, 'to explore new possibilities for myself' and 'excellent curriculum', showed statistically significant differences. These two items were also found to show significant differences by marital status. 2. As for media ownership, the results showed that students use their own cassette radios (96.3%), VTR (49.4%), a TV used only for study (44.1%), a personal computer (3.31%), or a Hitel subscription (6.3%). 3. Listening rates for the radio lectures were 'over 80% (9.1%)', '50-80% (9.1%)', '20-50% (18.1%)', 'below 20% (30%)', and 'never (33.1%)'. Recorded lectures showed the following listening rates: 'over 80% (17.2%)', '50-80% (15.9%)', '20-50% (24.4%)', 'below 20% (27.2%)', and 'never (14.4%)'. 4. The difficulties with KNOU life were 'listening to radio lectures (38.8%)', 'studying by following teaching schedules (37.8%)', 'isolated self-study (10.3%)', and 'appearance in the attending classes (8.1%)'. 5. As for satisfaction with teaching methods, the data showed that 81.2% of the respondents were satisfied (or very satisfied) with 'attending classes' and 75% with 'paper lectures'. On the other hand, some respondents were very dissatisfied with 'recorded lecture (12.8%)' and 'radio lecture (10.9%)'. 6. 
The results also showed that the students want to have 'video conferencing lecture (77.2%)', 'cable TV (64.1%)', and 'CD ROM program' to improve learning effects. 7. Concerning learning attitudes, 48.8% of the students reported 'study mainly for examination', and only 4.1% answered 'study every day with plan'. Learning attitude showed significant differences by marital status and age. The students also evaluated themselves as 'study very hard (5.9%)', 'study hard in general (41.6%)', 'study a little (40.3%)', and 'study little (11.9%)'. 8. The students responded that the most effective learning material was the 'textbook (92.2%)'. 9. As for the purposes of using the local center, the results showed 'for the attending classes (76.3%)', 'for the use of references (14.7%)', and 'for the study group (66.7%)'. 10. The results revealed that 20.3% of the respondents had experienced unregistration or temporary withdrawal, and 53.4% of them did not register more than one time. The most common reason for the unregistration was 'due to family affairs or their job (70.8%)'. 11. 88.1% of the respondents answered that 'they will graduate without fail'. 12. Regarding the benefits of KNOU graduation, respondents indicated 'graduate school admission (38.1%)', 'self-confidence in social life (17.5%)', and 'understanding social problems (10.9%)'. 13. 64.4% of the students indicated that they intend to enter graduate school. The item 'changing work place' showed statistically significant differences by marital status and age.


Construction of Event Networks from Large News Data Using Text Mining Techniques (텍스트 마이닝 기법을 적용한 뉴스 데이터에서의 사건 네트워크 구축)

  • Lee, Minchul;Kim, Hea-Jin
    • Journal of Intelligence and Information Systems / v.24 no.1 / pp.183-203 / 2018
  • News articles are the most suitable medium for examining events occurring at home and abroad. In particular, as the development of information and communication technology has brought various kinds of online news media, the volume of news about events occurring in society has increased greatly. Automatically summarizing key events from massive amounts of news data will therefore help users survey many events at a glance. In addition, building and providing an event network based on the relevance of events can greatly help readers understand current events. In this study, we propose a method for extracting event networks from large news text data. To this end, we first collected Korean political and social articles from March 2016 to March 2017, kept only meaningful words, and merged synonyms through preprocessing using NPMI and Word2Vec. Latent Dirichlet allocation (LDA) topic modeling was used to calculate the topic distribution by date, and events were detected by finding the peaks of the topic distribution. A total of 32 topics were extracted from the topic modeling, and the point of occurrence of each event was deduced from the point at which each topic distribution surged. As a result, a total of 85 events were detected, of which 16 final events were filtered and presented using the Gaussian smoothing technique. We also calculated the relevance score between the detected events to construct the event network. Using the cosine coefficient between co-occurring events, we calculated the relevance between the events and connected them to construct the event network. Finally, we set up the event network by assigning each event to a vertex and the relevance score between events to the edge connecting the vertices. 
The event network constructed with our method helped us sort the major political and social events that occurred in Korea over the past year in chronological order and, at the same time, identify which events are related to one another. Our approach differs from existing event detection methods in that LDA topic modeling makes it possible to analyze large amounts of data easily and to identify relevance between events that was difficult to detect with existing event detection. In text preprocessing, we applied various text mining techniques and the Word2Vec technique to improve the accuracy of extracting proper nouns and compound nouns, which has been difficult in analyzing Korean texts. The event detection and network construction techniques of this study have the following advantages in practical application. First, LDA topic modeling, which is unsupervised learning, can easily extract topics, topic words, and their distributions from a huge amount of data. Also, by using the date information of the collected news articles, it is possible to express the distribution by topic as a time series. Second, by calculating relevance scores and constructing the event network from the co-occurrence of topics, which is difficult to grasp with existing event detection, we can present the connections between events in summarized form. This can be seen from the fact that the inter-event relevance-based event network proposed in this study was actually constructed in order of occurrence time. It is also possible to identify, through the event network, what happened as the starting point of a series of events. The limitations of this study are that LDA topic modeling yields different results depending on the initial parameters and the number of topics, and that the topic and event names in the analysis results must be assigned by the subjective judgment of the researcher. 
Also, since each topic is assumed to be exclusive and independent, the method does not take into account the relevance between topics. Subsequent studies need to calculate the relevance between events that are not covered in this study or that belong to the same topic.
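
The final step of this abstract, scoring relevance between detected events with the cosine coefficient and linking them into a network, can be sketched as below. The representation of an event as a sparse term-weight dictionary, the event names, and the 0.3 threshold are all assumptions made for illustration; the abstract does not state how the authors vectorized events or chose a cutoff.

```python
import math
from itertools import combinations

def cosine(u, v):
    """Cosine coefficient between two sparse vectors stored as dicts."""
    dot = sum(weight * v.get(key, 0.0) for key, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

def build_event_network(events, threshold=0.3):
    """Return {(event_a, event_b): relevance} for pairs above threshold.

    Events are the vertices; each kept pair is a weighted edge.
    """
    edges = {}
    for a, b in combinations(sorted(events), 2):
        score = cosine(events[a], events[b])
        if score >= threshold:
            edges[(a, b)] = score
    return edges

# Hypothetical events with hypothetical term weights.
events = {
    "e1": {"election": 1.0, "party": 1.0},
    "e2": {"election": 1.0},
    "e3": {"earthquake": 1.0},
}
network = build_event_network(events)
```

In this toy input only e1 and e2 share a term, so they form the single edge; e3 remains an isolated vertex.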

Clickstream Big Data Mining for Demographics based Digital Marketing (인구통계특성 기반 디지털 마케팅을 위한 클릭스트림 빅데이터 마이닝)

  • Park, Jiae;Cho, Yoonho
    • Journal of Intelligence and Information Systems / v.22 no.3 / pp.143-163 / 2016
  • The demographics of Internet users are the most basic and important sources for target marketing or personalized advertisements on the digital marketing channels which include email, mobile, and social media. However, it gradually has become difficult to collect the demographics of Internet users because their activities are anonymous in many cases. Although the marketing department is able to get the demographics using online or offline surveys, these approaches are very expensive, long processes, and likely to include false statements. Clickstream data is the recording an Internet user leaves behind while visiting websites. As the user clicks anywhere in the webpage, the activity is logged in semi-structured website log files. Such data allows us to see what pages users visited, how long they stayed there, how often they visited, when they usually visited, which site they prefer, what keywords they used to find the site, whether they purchased any, and so forth. For such a reason, some researchers tried to guess the demographics of Internet users by using their clickstream data. They derived various independent variables likely to be correlated to the demographics. The variables include search keyword, frequency and intensity for time, day and month, variety of websites visited, text information for web pages visited, etc. The demographic attributes to predict are also diverse according to the paper, and cover gender, age, job, location, income, education, marital status, presence of children. A variety of data mining methods, such as LSA, SVM, decision tree, neural network, logistic regression, and k-nearest neighbors, were used for prediction model building. However, this research has not yet identified which data mining method is appropriate to predict each demographic variable. Moreover, it is required to review independent variables studied so far and combine them as needed, and evaluate them for building the best prediction model. 
The objective of this study is to choose, from the results of previous research, the clickstream attributes most likely to be correlated with the demographics, and then to identify which data mining method is best suited to predicting each demographic attribute. Among the demographic attributes, this paper focuses on predicting gender, age, marital status, residence, and job. From the results of previous research, 64 clickstream attributes are applied to predict the demographic attributes. The overall process of predictive model building is composed of four steps. In the first step, we create user profiles that include 64 clickstream attributes and 5 demographic attributes. The second step performs dimension reduction on the clickstream variables to address the curse of dimensionality and the overfitting problem. We utilize three approaches, based on decision tree, PCA, and cluster analysis. In the third step, we build alternative predictive models for each demographic variable. SVM, neural network, and logistic regression are used for modeling. The last step evaluates the alternative models in terms of accuracy and selects the best model. For the experiments, we used clickstream data covering 5 demographic attributes and 16,962,705 online activities for 5,000 Internet users. IBM SPSS Modeler 17.0 was used for the prediction process, and 5-fold cross-validation was conducted to enhance the reliability of the experiments. The experimental results confirm that there is a specific data mining method well suited to each demographic variable. For example, age prediction performs best when decision-tree-based dimension reduction is combined with a neural network, whereas the prediction of gender and marital status is most accurate when SVM is applied without dimension reduction.
We conclude that the online behavior of Internet users, captured through clickstream data analysis, can be used effectively to predict their demographics and can thereby be applied to digital marketing.
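
As an illustration of the first step described in this abstract (building user profiles from clickstream records), the following is a minimal Python sketch. It is not the authors' implementation, which used IBM SPSS Modeler and 64 attributes. The log layout, the sample records, the function name `build_profiles`, and the five features are all assumptions chosen to mirror the kinds of variables the abstract lists (visit frequency, time spent, variety of sites, time of day, day of week).

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical log records: (user_id, site, ISO timestamp, seconds on page).
# These values are invented for illustration only.
LOG = [
    ("u1", "news.example", "2016-03-01T09:15:00", 120),
    ("u1", "shop.example", "2016-03-01T21:40:00", 300),
    ("u1", "news.example", "2016-03-05T10:05:00", 60),
    ("u2", "shop.example", "2016-03-02T23:10:00", 45),
]


def build_profiles(log):
    """Aggregate raw clickstream records into one feature dict per user."""
    raw = defaultdict(
        lambda: {"visits": 0, "dwell": 0, "sites": set(), "night": 0, "weekend": 0}
    )
    for user, site, timestamp, dwell_sec in log:
        t = datetime.fromisoformat(timestamp)
        p = raw[user]
        p["visits"] += 1
        p["dwell"] += dwell_sec
        p["sites"].add(site)
        # Assumed definition of "night": 20:00 through 05:59.
        p["night"] += int(t.hour >= 20 or t.hour < 6)
        # weekday() returns 5 for Saturday and 6 for Sunday.
        p["weekend"] += int(t.weekday() >= 5)

    profiles = {}
    for user, p in raw.items():
        n = p["visits"]
        profiles[user] = {
            "visit_count": n,
            "mean_dwell_sec": p["dwell"] / n,
            "site_variety": len(p["sites"]),
            "night_share": p["night"] / n,
            "weekend_share": p["weekend"] / n,
        }
    return profiles


if __name__ == "__main__":
    for user, features in build_profiles(LOG).items():
        print(user, features)
```

Each resulting feature dict would then be joined with the user's known demographic labels before the dimension reduction and modeling steps.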

A study on detective story authors' style differentiation and style structure based on Text Mining (텍스트 마이닝 기법을 활용한 고전 추리 소설 작가 간 문체적 차이와 문체 구조에 대한 연구)

  • Moon, Seok Hyung;Kang, Juyoung
    • Journal of Intelligence and Information Systems
    • /
    • v.25 no.3
    • /
    • pp.89-115
    • /
    • 2019
  • This study was conducted to present, through data analysis, the stylistic differences between Arthur Conan Doyle and Agatha Christie, both famous writers of classical mystery novels, and further to present an analytical methodology for the study of style based on text mining. We chose mystery novels because the unique devices found in classical mystery novels have strong stylistic characteristics, and we chose Arthur Conan Doyle and Agatha Christie, who are well known to general readers, so that people unfamiliar with this kind of research can relate to the subjects of analysis. The primary objective of this study is to identify how the differences appear within the text and to interpret the effects of these differences on the reader. Accordingly, in addition to events and characters, which are key elements of mystery novels, the writer's grammatical style of writing was also defined as part of style and analyzed. Two series and four books by each writer were selected, and the text was divided into sentences to obtain the data. After an emotional score was measured and assigned to each sentence, the progression of emotions across pages was visualized as a graph, and the trend of event progression in each novel was identified under eight themes by applying topic modeling by page. By constructing co-occurrence matrices and performing network analysis, we were able to see visually how relationships between characters changed as events progressed. In addition, all sentences were classified under a grammatical system of six writing-style types to identify differences between writers and between works. This enabled us to identify not only the general grammatical writing style of each author but also the stylistic characteristics inherent in their unconscious writing, and to interpret the effects of these characteristics on the reader.
This series of research processes can help in understanding the context of an entire text on the basis of a defined understanding of style, and furthermore in integrating stylistic studies that were previously conducted separately. This prior understanding can also contribute to discovering and clarifying the text present in unstructured data, including online text. It could help enable more accurate recognition of emotions and delivery of commands on interactive artificial intelligence platforms that currently convert voice into natural language. As attempts increase to analyze online texts, including new media, in many ways and to discover social phenomena and managerial value, this work is expected to contribute to more meaningful online text analysis and semantic interpretation when linked with such studies. However, the fact that the analysis data consisted of only two or four books per author can be considered a limitation, in that the analysis was not attempted on a sufficient quantity of data. Applying writing characteristics defined for Korean text to English text could also be a limitation. Restricting the more diverse stylistic characteristics to six types, which narrowed the possible interpretations, is also considered a limitation. In addition, it is regrettable that the research analyzed classical mystery novels rather than text commonly used today, and that a wider range of classical mystery writers was not compared. Subsequent research will attempt to increase the diversity of interpretations by taking into account a wider variety of grammatical systems and stylistic structures, and will also apply the method to the online text that is frequently analyzed today in order to assess its potential for interpretation. This is expected to enable the interpretation and definition of the specific structure of style and to open up a variety of uses.
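
The co-occurrence step mentioned in this abstract can be illustrated with a minimal Python sketch. It is not the authors' code: the sample sentences are invented rather than quoted from the novels, the character list and the function name `cooccurrence` are hypothetical, and matching names by plain substring search is a simplifying assumption. The idea shown is only that two characters named in the same sentence receive one count, and the resulting pair counts can serve as edge weights for a network analysis.

```python
from collections import Counter
from itertools import combinations

# Hypothetical character list and invented sample sentences (illustration only).
CHARACTERS = ["Holmes", "Watson", "Lestrade"]

SENTENCES = [
    "Holmes glanced at Watson.",
    "Lestrade arrived and greeted Holmes.",
    "Watson followed Holmes into the street.",
    "The rain had stopped.",
]


def cooccurrence(sentences, names):
    """Count how often each pair of names appears in the same sentence."""
    counts = Counter()
    for sentence in sentences:
        # Sorting gives every pair a canonical order, so (A, B) and (B, A) merge.
        present = sorted(name for name in names if name in sentence)
        for pair in combinations(present, 2):
            counts[pair] += 1
    return counts


if __name__ == "__main__":
    for (a, b), weight in cooccurrence(SENTENCES, CHARACTERS).items():
        print(f"{a} - {b}: {weight}")
```

Running the same count over successive portions of a book would give one matrix per portion, which is one way to observe relationships changing as events progress.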