• Title/Summary/Keyword: decision tree and system analysis


Clickstream Big Data Mining for Demographics based Digital Marketing (인구통계특성 기반 디지털 마케팅을 위한 클릭스트림 빅데이터 마이닝)

  • Park, Jiae;Cho, Yoonho
    • Journal of Intelligence and Information Systems / v.22 no.3 / pp.143-163 / 2016
  • The demographics of Internet users are the most basic and important sources for target marketing or personalized advertisements on digital marketing channels, which include email, mobile, and social media. However, it has gradually become difficult to collect the demographics of Internet users because their activities are anonymous in many cases. Although a marketing department can obtain demographics through online or offline surveys, these approaches are expensive, slow, and likely to include false statements. Clickstream data is the record an Internet user leaves behind while visiting websites. As the user clicks anywhere on a webpage, the activity is logged in semi-structured website log files. Such data allows us to see what pages users visited, how long they stayed there, how often they visited, when they usually visited, which sites they prefer, what keywords they used to find the site, whether they purchased anything, and so forth. For this reason, some researchers have tried to infer the demographics of Internet users from their clickstream data. They derived various independent variables likely to be correlated with the demographics. The variables include search keywords, frequency and intensity by time, day, and month, variety of websites visited, text information from web pages visited, etc. The demographic attributes to predict also vary from paper to paper, and cover gender, age, job, location, income, education, marital status, and presence of children. A variety of data mining methods, such as LSA, SVM, decision tree, neural network, logistic regression, and k-nearest neighbors, were used for building prediction models. However, this research has not yet identified which data mining method is appropriate for predicting each demographic variable. Moreover, the independent variables studied so far need to be reviewed, combined as needed, and evaluated in order to build the best prediction model.
The objective of this study is to choose the clickstream attributes most likely to be correlated with the demographics from the results of previous research, and then to identify which data mining method is best suited to predicting each demographic attribute. Among the demographic attributes, this paper focuses on predicting gender, age, marital status, residence, and job. From the results of previous research, 64 clickstream attributes are applied to predict the demographic attributes. The overall process of predictive model building is composed of 4 steps. In the first step, we create user profiles which include 64 clickstream attributes and 5 demographic attributes. The second step performs dimension reduction on the clickstream variables to address the curse of dimensionality and the overfitting problem. We utilize three approaches, based on decision tree, PCA, and cluster analysis. In the third step, we build alternative predictive models for each demographic variable. SVM, neural network, and logistic regression are used for modeling. The last step evaluates the alternative models in terms of model accuracy and selects the best model. For the experiments, we used clickstream data which represents 5 demographics and 16,962,705 online activities for 5,000 Internet users. IBM SPSS Modeler 17.0 was used for our prediction process, and 5-fold cross validation was conducted to enhance the reliability of our experiments. The experimental results verify that there is a specific data mining method well suited to each demographic variable. For example, age prediction performs best with decision tree based dimension reduction and a neural network, whereas the prediction of gender and marital status is most accurate when applying SVM without dimension reduction.
We conclude that the online behaviors of Internet users, captured through clickstream data analysis, can be used to predict their demographics and can thereby be utilized for digital marketing.
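The final evaluation step, comparing alternative models by cross-validated accuracy, can be sketched roughly as follows. This is only an illustration under stated assumptions: the toy data, the two stand-in models (`majority_rule`, `threshold_rule`), and all function names are hypothetical and do not reproduce the IBM SPSS Modeler workflow used in the study.

```python
from statistics import mean

def k_fold_indices(n, k=5):
    """Split range(n) into k contiguous folds of near-equal size."""
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in fold_sizes:
        folds.append(list(range(start, start + size)))
        start += size
    return folds

def cross_val_accuracy(fit, X, y, k=5):
    """Mean held-out accuracy of the model factory `fit` over k folds."""
    scores = []
    for test_idx in k_fold_indices(len(X), k):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(X)) if i not in held_out]
        predict = fit([X[i] for i in train_idx], [y[i] for i in train_idx])
        hits = sum(predict(X[i]) == y[i] for i in test_idx)
        scores.append(hits / len(test_idx))
    return mean(scores)

# Hypothetical stand-in models; each returns a predict(x) function.
def majority_rule(X_train, y_train):
    label = max(set(y_train), key=y_train.count)
    return lambda x: label

def threshold_rule(X_train, y_train):
    # Predict 1 when the single feature exceeds the training mean.
    cut = mean(x[0] for x in X_train)
    return lambda x: 1 if x[0] > cut else 0

# Toy data (assumption): one clickstream-like feature, binary label.
X = [[v] for v in [1, 9, 2, 8, 3, 7, 1, 9, 2, 8]]
y = [0, 1, 0, 1, 0, 1, 0, 1, 0, 1]

results = {
    "majority": cross_val_accuracy(majority_rule, X, y),
    "threshold": cross_val_accuracy(threshold_rule, X, y),
}
best_model = max(results, key=results.get)
```

In a fuller version, each entry in `results` would correspond to one combination of dimension reduction method and classifier, evaluated separately for each demographic attribute.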

Predicting Corporate Bankruptcy using Simulated Annealing-based Random Forests (시뮬레이티드 어니일링 기반의 랜덤 포레스트를 이용한 기업부도예측)

  • Park, Hoyeon;Kim, Kyoung-jae
    • Journal of Intelligence and Information Systems / v.24 no.4 / pp.155-170 / 2018
  • Predicting a company's financial bankruptcy is traditionally one of the most crucial forecasting problems in business analytics. In previous studies, prediction models have been proposed by applying or combining statistical and machine learning-based techniques. In this paper, we propose a novel intelligent prediction model based on simulated annealing, one of the well-known optimization techniques. Simulated annealing is known to have optimization performance comparable to that of genetic algorithms. Nevertheless, since there has been little research on the prediction and classification of business decision-making problems using simulated annealing, it is meaningful to confirm the usefulness of the proposed model in business analytics. In this study, we use a combined model of simulated annealing and machine learning to select the input features of the bankruptcy prediction model. Typical ways of combining optimization and machine learning techniques are feature selection, feature weighting, and instance selection. This study proposes a combined model for feature selection, which has been studied the most. To confirm the superiority of the proposed model, we apply it to real-world financial data of Korean companies and analyze the results. The results show that the predictive accuracy of the proposed model is better than that of the naïve model. Notably, the performance is significantly improved compared with the traditional decision tree, random forests, artificial neural network, SVM, and logistic regression analysis.
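As a rough illustration of simulated annealing used for feature selection, the sketch below searches over feature masks. It is not the authors' implementation: the scoring function `toy_score` is a hypothetical stand-in for the validation accuracy of a classifier (such as a random forest) trained on the selected features, and the parameter values are assumptions.

```python
import math
import random

def simulated_annealing_select(n_features, score, n_iter=500,
                               t_start=1.0, cooling=0.99, seed=0):
    """Search for a feature mask (list of bools) that maximizes `score`."""
    rng = random.Random(seed)
    current = [rng.random() < 0.5 for _ in range(n_features)]
    current_score = score(current)
    best, best_score = current[:], current_score
    temperature = t_start
    for _ in range(n_iter):
        # Neighbor: flip the inclusion of one randomly chosen feature.
        candidate = current[:]
        j = rng.randrange(n_features)
        candidate[j] = not candidate[j]
        candidate_score = score(candidate)
        delta = candidate_score - current_score
        # Always accept improvements; accept worse moves with
        # probability exp(delta / T), which shrinks as T cools.
        if delta >= 0 or rng.random() < math.exp(delta / temperature):
            current, current_score = candidate, candidate_score
            if current_score > best_score:
                best, best_score = current[:], current_score
        temperature *= cooling
    return best, best_score

# Hypothetical score (assumption): features 0, 2, and 5 are "informative",
# and every other selected feature costs a small penalty.
INFORMATIVE = {0, 2, 5}

def toy_score(mask):
    chosen = {i for i, keep in enumerate(mask) if keep}
    return len(chosen & INFORMATIVE) - 0.1 * len(chosen - INFORMATIVE)

mask, value = simulated_annealing_select(8, toy_score)
selected = [i for i, keep in enumerate(mask) if keep]
```

Accepting some worse moves early on lets the search escape poor local choices, while the cooling schedule makes it increasingly greedy toward the end.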

A comparative analysis of the related body compositions by riding-horse breed in Korea (국내 승용마의 체형상관에 따른 품종별 비교 분석)

  • Oh, Woon-Yong;Do, Kyoung-Tag;Cho, Byung-Wook;Park, Kyung-Do;Kim, Sung-Hoon;Lee, Hak-Kyo;Shin, Young-Soo;Cho, Young-Seuk
    • Journal of the Korean Data and Information Science Society / v.22 no.3 / pp.515-521 / 2011
  • Demand for producing and breeding new domestic riding horses is increasing as part of efforts to vitalize the horse riding industry in Korea following the 'Horse Industry Support Act'. In this study, we aimed to develop functional relations through conformation comparison and body composition analysis. Seventy-six horses of 5 breeds used as riding horses in Korea were measured on 12 body items, and cluster analysis was conducted to determine the correlations among the measurements. After the measurements were standardized, (height, croup height, pelvis length) and (hip width, width of pelvis) were found to be highly correlated. From the decision tree results, we confirmed that breed type can be classified by body measurements (hip height, hip width, head length, croup height). These results can be used as basic data for developing horse type determination (racing, riding, riding for the disabled, working, or fattening) through body composition analysis, and for producing and breeding new domestic riding horses through analysis with a 3D stereoscopic image system.

Development of Intelligent Internet Shopping Mall Supporting Tool Based on Software Agents and Knowledge Discovery Technology (소프트웨어 에이전트 및 지식탐사기술 기반 지능형 인터넷 쇼핑몰 지원도구의 개발)

  • 김재경;김우주;조윤호;김제란
    • Journal of Intelligence and Information Systems / v.7 no.2 / pp.153-177 / 2001
  • Nowadays, product recommendation is one of the important issues for both CRM and Internet shopping malls. Generally, a recommendation system tracks the past actions of a group of users to make recommendations to individual members of the group. Computer-mediated marketing and commerce have grown rapidly, and automatic recommendation methodologies have therefore received great attention. However, the research and commercial tools for product recommendation so far still have many aspects that merit further consideration. To supplement those aspects, we devise a recommendation methodology that achieves greater recommendation effectiveness when applied to an Internet shopping mall. The suggested methodology is based on web log information, product taxonomy, association rule mining, and decision tree learning. To implement it, we also design an intelligent Internet shopping mall support system based on agent technology and develop it as a prototype system. We applied this methodology and the prototype system to a leading Korean Internet shopping mall and provide some experimental results. Through the experiment, we found that the suggested methodology can perform recommendation tasks both effectively and efficiently in real-world problems. Its systematic validity issues are also discussed.
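One ingredient named above, association rule mining, can be illustrated with a minimal sketch that computes support and confidence for item pairs. The baskets, thresholds, and the function name `pair_rules` are hypothetical; the paper's actual methodology also uses product taxonomy and decision tree learning, which are not shown here.

```python
from itertools import combinations

def pair_rules(baskets, min_support=0.4, min_confidence=0.6):
    """Return rules (antecedent, consequent, support, confidence) for item pairs."""
    n = len(baskets)
    item_count, pair_count = {}, {}
    for basket in baskets:
        items = sorted(set(basket))
        for item in items:
            item_count[item] = item_count.get(item, 0) + 1
        for pair in combinations(items, 2):
            pair_count[pair] = pair_count.get(pair, 0) + 1
    rules = []
    for (a, b), count in pair_count.items():
        # Support: share of baskets containing both items.
        support = count / n
        if support < min_support:
            continue
        for lhs, rhs in ((a, b), (b, a)):
            # Confidence: share of baskets with lhs that also contain rhs.
            confidence = count / item_count[lhs]
            if confidence >= min_confidence:
                rules.append((lhs, rhs, support, confidence))
    return rules

# Hypothetical purchase sessions reconstructed from a web log (assumption).
baskets = [
    ["camera", "memory_card"],
    ["camera", "memory_card", "tripod"],
    ["camera", "tripod"],
    ["memory_card"],
    ["camera", "memory_card"],
]
rules = pair_rules(baskets)
```

A rule such as `("tripod", "camera", ...)` would then suggest recommending a camera to a visitor who has put a tripod in the cart.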


A Study on the Revitalization of Tourism Industry through Big Data Analysis (한국관광 실태조사 빅 데이터 분석을 통한 관광산업 활성화 방안 연구)

  • Lee, Jungmi;Liu, Meina;Lim, Gyoo Gun
    • Journal of Intelligence and Information Systems / v.24 no.2 / pp.149-169 / 2018
  • Korea is currently accumulating a large amount of data in public institutions under the public data open policy and "Government 3.0". In particular, a lot of data is accumulated in the tourism field. However, academic discussion utilizing tourism data is still limited. Moreover, the openness of data on restaurants, hotels, and online tourism information, and the use of SNS big data in tourism, are still limited. Therefore, utilization of tourism big data analysis remains low. In this paper, we analyzed the factors influencing foreign tourists' satisfaction in Korea from numerical data using data mining techniques and R programming. We sought ways to revitalize the tourism industry by analyzing about 36,000 records from the "Survey on the actual situation of foreign tourists from 2013 to 2015" conducted by the Korea Culture & Tourism Research Institute. To do this, we analyzed the factors that have a high influence on the 'Satisfaction', 'Revisit intention', and 'Recommendation' variables of foreign tourists. Furthermore, we analyzed the practical influence of these variables. As the procedure of this study, we first integrated the survey data of foreign tourists conducted by the Korea Culture & Tourism Research Institute, which is stored in the tourist information system for 2013 to 2015, and eliminated variables in the integrated data that were inconsistent with the research purpose. Some variables were modified to improve the accuracy of the analysis. We then analyzed the factors affecting the dependent variables using the data mining methods of IBM SPSS Modeler 16.0: decision tree (C5.0, CART, CHAID, QUEST), artificial neural network, and logistic regression analysis. The seven variables that have the greatest effect on each dependent variable were derived.
The data analysis found that the seven major variables influencing 'overall satisfaction' were sightseeing spot attraction, food satisfaction, accommodation satisfaction, traffic satisfaction, guide service satisfaction, number of visiting places, and country. The most influential of these were food satisfaction and sightseeing spot attraction. The seven variables that had the greatest influence on 'revisit intention' were country, travel motivation, activity, food satisfaction, best activity, guide service satisfaction, and sightseeing spot attraction. The most influential variables were food satisfaction and travel motivation for Korean style. Lastly, the seven variables that had the greatest influence on 'recommendation intention' were country, sightseeing spot attraction, number of visiting places, food satisfaction, activity, tour guide service satisfaction, and cost. Of these, the variables with the greatest influence were country, sightseeing spot attraction, and food satisfaction. In addition, to examine the influence of each independent variable more deeply, we used R programming. As a result, food satisfaction and sightseeing spot attraction were found to have a greater effect on overall satisfaction than the other influential variables. For revisit intention, the travel motive with the Korean Wave as its purpose had a higher $\beta$ value than the other variables. A policy will be needed that leads to substantial revisits by enhancing tourist attractions for Korean Wave tourists. Lastly, recommendation showed the same result as satisfaction, in that sightseeing spot attraction and food satisfaction had higher $\beta$ values than the other variables.
From this analysis, we found that the 'food satisfaction' and 'sightseeing spot attraction' variables were common factors influencing the three dependent variables mentioned above ('Overall satisfaction', 'Revisit intention', and 'Recommendation'), and that those factors significantly affected satisfaction with travel in Korea. The purpose of this study is to examine how to boost foreign tourism in Korea through big data analysis. The results are expected to serve as basic data for analyzing tourism data and establishing effective tourism policy, and as material for an activation plan that can contribute to tourism development in Korea in the future.

Economic Evaluation and Budget Impact Analysis of the Surveillance Program for Hepatocellular Carcinoma in Thai Chronic Hepatitis B Patients

  • Sangmala, Pannapa;Chaikledkaew, Usa;Tanwandee, Tawesak;Pongchareonsuk, Petcharat
    • Asian Pacific Journal of Cancer Prevention / v.15 no.20 / pp.8993-9004 / 2014
  • Background: The incidence rate and the treatment costs of hepatocellular carcinoma (HCC) are high, especially in Thailand. Previous studies indicated that early detection by a surveillance program could help by down-staging. This study aimed to compare the costs and health outcomes associated with the introduction of an HCC surveillance program with no program and to estimate the budget impact if the HCC surveillance program were implemented. Materials and Methods: A cost utility analysis using a decision tree and Markov models was used to compare costs and outcomes over a lifetime horizon, from a societal perspective, between alternative HCC surveillance strategies and no program. Costs included direct medical, direct non-medical, and indirect costs. Health outcomes were measured as life years (LYs) and quality adjusted life years (QALYs). The results were presented in terms of the incremental cost-effectiveness ratio (ICER) in Thai baht (THB) per QALY gained. One-way and probabilistic sensitivity analyses were applied to investigate parameter uncertainties. Budget impact analysis (BIA) was performed from the governmental perspective. Results: Semi-annual ultrasonography (US) and semi-annual ultrasonography plus alpha-fetoprotein (US plus AFP) as the first screening for HCC surveillance would be cost-effective options at the willingness to pay (WTP) threshold of 160,000 THB per QALY gained compared with no surveillance program (ICER=118,796 and ICER=123,451 THB/QALY, respectively). The semi-annual US plus AFP yielded more net monetary benefit, but required a substantially higher budget (237 to 502 million THB) than semi-annual US (81 to 201 million THB) during the next ten fiscal years. Conclusions: Our results suggested that a semi-annual US program should be used as the first screening for HCC surveillance and included in the benefit package of Thai health insurance schemes for both male and female chronic hepatitis B patients aged between 40-50 years.
In addition, policy makers considered the program could be feasible, but additional evidence is needed to support the whole prevention system before the implementation of a strategic plan.
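The cost-effectiveness reasoning above rests on two standard quantities: the ICER, $\Delta C / \Delta E$, and the incremental net monetary benefit, $\lambda \Delta E - \Delta C$, where $\lambda$ is the willingness-to-pay threshold. The sketch below checks the reported ICERs against the stated threshold; the incremental cost and QALY values in the second half are hypothetical and are not taken from the study.

```python
def icer(delta_cost, delta_qaly):
    """Incremental cost-effectiveness ratio: extra cost per QALY gained."""
    return delta_cost / delta_qaly

def net_monetary_benefit(delta_cost, delta_qaly, wtp):
    """Incremental NMB; positive means cost-effective at threshold `wtp`."""
    return wtp * delta_qaly - delta_cost

WTP = 160_000  # THB per QALY gained, the threshold stated in the abstract

# Reported ICERs (THB/QALY) versus no surveillance program.
reported = {"semi-annual US": 118_796, "semi-annual US plus AFP": 123_451}
cost_effective = {name: value < WTP for name, value in reported.items()}

# Hypothetical increments versus no program (assumption, not study data).
delta_cost, delta_qaly = 50_000.0, 0.5
example_icer = icer(delta_cost, delta_qaly)
example_nmb = net_monetary_benefit(delta_cost, delta_qaly, WTP)
```

Both reported ICERs fall below the threshold, which is why both strategies are described as cost-effective; the budget impact analysis is a separate question about affordability.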

A Literature Review and Classification of Recommender Systems on Academic Journals (추천시스템관련 학술논문 분석 및 분류)

  • Park, Deuk-Hee;Kim, Hyea-Kyeong;Choi, Il-Young;Kim, Jae-Kyeong
    • Journal of Intelligence and Information Systems / v.17 no.1 / pp.139-152 / 2011
  • Recommender systems have become an important research field since the emergence of the first paper on collaborative filtering in the mid-1990s. In general, recommender systems are defined as supporting systems which help users find information, products, or services (such as books, movies, music, digital products, web sites, and TV programs) by aggregating and analyzing suggestions from other users, which mean reviews from various authorities, and user attributes. However, although academic research on recommender systems has increased significantly over the last ten years, more research is required for it to be applicable in real-world situations, because the research field of recommender systems is still wide and less mature than other research fields. Accordingly, the existing articles on recommender systems need to be reviewed with a view to the next generation of recommender systems. However, it is not easy to confine recommender system research to specific disciplines, considering its nature. We therefore reviewed all articles on recommender systems from 37 journals published from 2001 to 2010. The 37 journals were selected from the top 125 journals of the MIS Journal Rankings. The literature search was based on the descriptors "Recommender system", "Recommendation system", "Personalization system", "Collaborative filtering", and "Contents filtering". The full text of each article was reviewed to eliminate articles that were not actually related to recommender systems. Many articles were excluded because conference papers, master's and doctoral dissertations, textbooks, unpublished working papers, non-English publications, and news items were unfit for our research. We classified articles by year of publication, journal, recommendation field, and data mining technique.
The recommendation fields and data mining techniques of 187 articles are reviewed and classified into eight recommendation fields (book, document, image, movie, music, shopping, TV program, and others) and eight data mining techniques (association rule, clustering, decision tree, k-nearest neighbor, link analysis, neural network, regression, and other heuristic methods). The results presented in this paper have several significant implications. First, based on previous publication rates, interest in recommender system research will grow significantly in the future. Second, 49 articles are related to movie recommendation, whereas image and TV program recommendation are identified in only 6 articles. This result is attributable to the ease of use of the MovieLens data set, so data sets for other fields need to be prepared. Third, social network analysis has recently been used in various applications, but studies on recommender systems using social network analysis are scarce. We therefore expect that new recommendation approaches using social network analysis will be developed, and evaluating recommender system research that uses social network analysis will be an interesting area for further research. This result shows the trend of recommender system research by examining the published literature, and provides practitioners and researchers with insight and future direction on recommender systems. We hope that this research helps anyone who is interested in recommender systems research to gain insight for future research.

Application of HACCP System on Establishing Hygienic Standards in Pizza Specialty Restaurant - Focused on Salad Items - (HACCP제도를 활용한 피자 전문 패스트푸드 업체의 자체 위생관리기준 설정 - 샐러드를 중심으로 -)

  • Lee Bog-Hieu;Kim In-Ho;Huh Kyoung-Sook;Cho Kyong-Dong
    • Journal of the Korean Home Economics Association / v.41 no.10 s.188 / pp.101-116 / 2003
  • The study was conducted to establish hygienic standards for salad items at a pizza restaurant located in Seoul by applying the HACCP system during the summer of 2000. The study measured temperature, time, pH, and Aw, and carried out microbial assessments. The hygienic conditions of the kitchen and workers were average (1.21 and 1.0 out of 3 pts.), but some improvements should be made: separate use of trash cans and leftover disposal, separate use of knives and cutting boards, hand washing habits, and wearing hygienic gloves. For salad production, all procedures were performed within the food safety danger zone ($5{\sim}60\,^{\circ}\mathrm{C}$). The ingredients were mostly above pH 5.0 and high in Aw ($0.94{\sim}0.99$). Microbial assessments for salad production revealed that TPC ($1.8{\times}10^3{\sim}1.0{\times}10^{10}$ CFU/g) and coliforms ($1.5{\times}10{\sim}5.2{\times}10^5$ CFU/g) exceeded the standards of Solberg et al. (TPC: $10^6$ CFU/g, coliforms: $10^3$ CFU/g). S. aureus was not detected, but Salmonella was found in three food items (egg, macaroni, and macaroni salad). Moreover, the workers' hands contained $3.1{\times}10^4$ CFU/g of TPC and $4.2{\times}10^2$ CFU/g of S. aureus, requiring further remedy since these exceeded the safety standards suggested by Harrigan and McCance (500 CFU/g of TPC per $100\,\mathrm{cm}^2$ and 10 CFU/g of coliforms per $100\,\mathrm{cm}^2$). According to the critical control point (CCP) decision tree analysis, vegetable receiving, vegetable holding, mixing, display of coleslaw, macaroni draining, display of macaroni salad, egg peeling & cutting, apple cutting, and display on the salad bar were determined to be CCPs. The findings suggest that purchasing quality materials, shortening holding and display times, storing food at the right temperature, using sanitary cooking utensils, and improving workers' food handling practices are needed to ensure safe salad production in this specific pizza restaurant.

A Study on the Effect of the Document Summarization Technique on the Fake News Detection Model (문서 요약 기법이 가짜 뉴스 탐지 모형에 미치는 영향에 관한 연구)

  • Shim, Jae-Seung;Won, Ha-Ram;Ahn, Hyunchul
    • Journal of Intelligence and Information Systems / v.25 no.3 / pp.201-220 / 2019
  • Fake news has emerged as a significant issue over the last few years, igniting discussions and research on how to solve this problem. In particular, studies on automated fact-checking and fake news detection using artificial intelligence and text analysis techniques have drawn attention. Fake news detection research entails a form of document classification; thus, document classification techniques have been widely used in this type of research. However, document summarization techniques have been inconspicuous in this field. At the same time, automatic news summarization services have become popular, and a recent study found that the use of news summarized through abstractive summarization has strengthened the predictive performance of fake news detection models. Therefore, the need to study the integration of document summarization technology in the domestic news data environment has become evident. In order to examine the effect of extractive summarization on the fake news detection model, we first summarized news articles through extractive summarization. Second, we created a summarized news-based detection model. Finally, we compared our model with the full-text-based detection model. The study found that BPN(Back Propagation Neural Network) and SVM(Support Vector Machine) did not exhibit a large difference in performance; however, for DT(Decision Tree), the full-text-based model demonstrated a somewhat better performance. In the case of LR(Logistic Regression), our model exhibited the superior performance. Nonetheless, the results did not show a statistically significant difference between our model and the full-text-based model. Therefore, when the summary is applied, at least the core information of the fake news is preserved, and the LR-based model can confirm the possibility of performance improvement. 
This study features an experimental application of extractive summarization in fake news detection research by employing various machine-learning algorithms. The study's limitations are, essentially, the relatively small amount of data and the lack of comparison between various summarization technologies. Therefore, an in-depth analysis that applies various analytical techniques to a larger data volume would be helpful in the future.
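To make the idea of extractive summarization concrete, the sketch below keeps the sentences whose words are most frequent in the article. The abstract does not say which extractive algorithm the authors used, so this frequency-based scorer is a swapped-in, simplified stand-in; the function name and the sample text are hypothetical.

```python
import re
from collections import Counter

def tokenize(sentence):
    """Lowercase word tokens; a deliberately crude tokenizer."""
    return re.findall(r"[a-z']+", sentence.lower())

def extractive_summary(text, n_sentences=2):
    """Keep the n highest-scoring sentences, in their original order."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    freq = Counter(w for s in sentences for w in tokenize(s))

    def score(sentence):
        # Score = average document frequency of the sentence's words.
        words = tokenize(sentence)
        return sum(freq[w] for w in words) / len(words) if words else 0.0

    ranked = sorted(range(len(sentences)),
                    key=lambda i: score(sentences[i]), reverse=True)
    keep = sorted(ranked[:n_sentences])
    return " ".join(sentences[i] for i in keep)

# Hypothetical article text (assumption).
article = ("The vaccine rumor spread fast. "
           "The rumor claimed the vaccine was unsafe. "
           "Officials denied the rumor. "
           "Weather was mild.")
summary = extractive_summary(article)
```

The resulting `summary`, rather than the full text, would then be vectorized and passed to a classifier such as logistic regression. A practical version would at least remove stop words, which this sketch does not do.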

An Empirical Study of Profiling Model for the SMEs with High Demand for Standards Using Data Mining (데이터마이닝을 이용한 표준정책 수요 중소기업의 프로파일링 연구: R&D 동기와 사업화 지원 정책을 중심으로)

  • Jun, Seung-pyo;Jung, JaeOong;Choi, San
    • Journal of Korea Technology Innovation Society / v.19 no.3 / pp.511-544 / 2016
  • Standards boost technological innovation by promoting information sharing, compatibility, stability, and quality. Identifying groups of companies that particularly benefit from these functions of standards in their technological innovation and commercialization helps to customize the planning and implementation of standards-related policies for demand groups. For this purpose, this study profiles SMEs whose R&D objective is to respond to standards as well as those that need to implement a standards system for technological commercialization, and then suggests a prediction model that can distinguish such companies from others. To this end, decision tree analysis is conducted to profile the characteristics of the subject SMEs through data mining. Subject SMEs include (1) those that engage in R&D to respond to standards (Group 1) and (2) those in need of product standard or technological certification policies for commercialization purposes (Group 2). The study then proposes a prediction model that can distinguish Groups 1 and 2 from others based on several variables by adopting discriminant analysis. The practicality of the discriminant formula is statistically verified. The study suggests that Group 1 companies are distinguished by variables such as time spent on R&D planning, Korean Standard Industry Classification (KSIC) category, number of employees, and novelty of technologies. The profiling result for Group 2 companies suggests that they are differentiated by variables such as KSIC category, major clients of the companies, time spent on R&D, and ability to test and verify their technologies. The prediction model proposed herein is designed based on the outcomes of profiling and discriminant analysis. Its purpose is to support the planning or implementation of standards-related policies by providing objective information on companies in need of relevant support, and thereby to enhance the overall success rate of standards-related projects.