• Title/Summary/Keyword: 데이터마이닝 의사결정나무 분석 (data mining decision tree analysis)

Search results: 109 (processing time: 0.024 seconds)

Effect of Mothers' Oral Health Knowledge and Behaviour on Dental Caries in Their Preschool Children (데이터마이닝을 이용한 유치치아우식증 관련요인 분석)

  • Kim, Jin-Soo;Kim, Hyo-Jin;Jorn, Hong-Suk
    • Journal of Korean Society of Dental Hygiene / v.5 no.2 / pp.171-184 / 2005
  • To investigate the relationship between mothers' dental care for their children and the children's dental caries, this study used the dental examination records of 365 children aged three to six years in Yeoncheon, Gyeonggi Province, examined in June 2004, for whom both a dental examination and a questionnaire completed by the mother were available. Frequencies and percentages were computed by item for the subjects' general characteristics and for mothers' oral health care behaviors toward their children; cross-tabulation with chi-square tests and correlation analysis were used to examine the relationship between caries in deciduous teeth and oral health care behaviors; and decision tree analysis, a data mining technique, was used to identify factors associated with the presence of caries in deciduous teeth. The conclusions were as follows. 1. Regarding mothers' oral health care behaviors and attitudes toward their children, 225 mothers (61.6%) checked their children's tooth-brushing; 278 (76.2%) did not use fluoride; 286 (78.6%) observed their children's teeth; 322 (88.2%) instructed their children in tooth-brushing; 268 (73.4%) provided dental care; 232 (63.7%) had their children's cavities treated; 290 (79.4%) believed that their children's teeth were in good condition; and 294 (80.5%) answered that they began providing dental care at the deciduous-teeth stage. 2. The presence of caries in deciduous teeth differed significantly (p<0.05) by mother's employment, checking after tooth-brushing, teeth observation, instruction on when to brush, fluoride use, cavity treatment, timing of the start of dental care, and perceived dental condition.
3. Regarding the relationship between caries in deciduous teeth and oral health care behaviors, children were more likely to have caries in their deciduous teeth if their mothers were employed, believed that their children's teeth were not in good condition, or thought that dental care should begin with the permanent teeth. Caries was likewise more common among children whose mothers did not check tooth-brushing, did not use fluoride, did not observe their children's teeth, or gave no instruction on when to brush. 4. The variables that classified the presence of caries in deciduous teeth were cavity treatment, mother's employment, timing of the start of dental care, and observation of the children's teeth. The first (root) node was cavity treatment; the next splitting criteria after cavity treatment were mother's employment and timing of dental care, and for children with no cavities they were mother's employment and teeth observation.
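The abstract reports cavity treatment as the root node but does not name the tree algorithm. As a minimal sketch, assume a simplified CHAID-style rule: the root split is the binary predictor with the largest Pearson chi-square statistic against caries status (real CHAID also merges categories and uses Bonferroni-adjusted p-values, both omitted here). The field names `cavity_treated`, `mother_employed`, `caries` and the eight records are hypothetical, not the study's data.

```python
from collections import Counter

def chi_square_2x2(xs, ys):
    """Pearson chi-square statistic for two binary (0/1) variables."""
    n = len(xs)
    counts = Counter(zip(xs, ys))  # missing cells count as 0
    stat = 0.0
    for a in (0, 1):
        for b in (0, 1):
            row_total = counts[(a, 0)] + counts[(a, 1)]
            col_total = counts[(0, b)] + counts[(1, b)]
            expected = row_total * col_total / n
            if expected > 0:  # skip empty rows/columns
                stat += (counts[(a, b)] - expected) ** 2 / expected
    return stat

def best_root_split(records, predictors, target):
    """Simplified CHAID-style choice: predictor with the largest chi-square."""
    ys = [r[target] for r in records]
    scores = {p: chi_square_2x2([r[p] for r in records], ys) for p in predictors}
    return max(scores, key=scores.get), scores

# Hypothetical toy records (1 = yes, 0 = no); NOT data from the study.
records = [
    {"cavity_treated": 1, "mother_employed": 1, "caries": 1},
    {"cavity_treated": 1, "mother_employed": 0, "caries": 1},
    {"cavity_treated": 1, "mother_employed": 1, "caries": 1},
    {"cavity_treated": 1, "mother_employed": 0, "caries": 1},
    {"cavity_treated": 0, "mother_employed": 1, "caries": 0},
    {"cavity_treated": 0, "mother_employed": 0, "caries": 0},
    {"cavity_treated": 0, "mother_employed": 1, "caries": 0},
    {"cavity_treated": 0, "mother_employed": 0, "caries": 0},
]
best, scores = best_root_split(records, ["cavity_treated", "mother_employed"], "caries")
print(best, scores)  # cavity_treated {'cavity_treated': 8.0, 'mother_employed': 0.0}
```

Repeating the same selection inside each branch would yield second-level splits such as those the study reports (mother's employment, timing of dental care, teeth observation).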


A Study on analysis of severity-adjustment length of stay in hospital for community-acquired pneumonia (지역사회획득 폐렴 환자의 중증도 보정 재원일수 분석)

  • Kim, Yoo-Mi;Choi, Yun-Kyoung;Kang, Sung-Hong;Kim, Won-Joong
    • Journal of the Korea Academia-Industrial cooperation Society / v.12 no.3 / pp.1234-1243 / 2011
  • This study developed a severity-adjustment model for hospital length of stay (LOS) in community-acquired pneumonia and used it to analyze the factors behind variation in LOS. The subjects were 5,353 inpatients with community-acquired pneumonia in the Korean National Hospital Discharge In-depth Injury Survey data from 2004 through 2006. The data were analyzed using t-tests and ANOVA, and the severity-adjustment model was developed using data mining techniques. LOS differed by gender, age, insurance type, and admission type, but not by whether the patient died in hospital. After computing the standardized difference between crude and expected LOS, we analyzed the variation in LOS for community-acquired pneumonia. LOS varied by region and insurance type, but not by whether patients received care in their region of residence. The variation in LOS that remains after controlling for case mix or severity of illness can be explained by provider factors. These supply-side factors in LOS variation, such as individual practice style, patient management practices, and healthcare resources or environment, warrant further study. We expect that severity-adjustment models based on administrative databases can be applied more widely to other diseases in practice.
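The abstract does not give the exact standardization formula. As a minimal sketch, assume each patient has a crude (observed) LOS and a model-expected LOS from the severity-adjustment model, and that a group's standardized value is its mean residual (observed minus expected) divided by the population standard deviation of all residuals. The function name, the grouping key (here a hypothetical region label), and the numbers are illustrative assumptions.

```python
from collections import defaultdict
from statistics import mean, pstdev

def standardized_los_difference(rows):
    """rows: iterable of (group, observed_los, expected_los).

    Assumed definition (not stated in the abstract): a group's score is the
    mean residual (observed - expected) divided by the population SD of all
    residuals. Positive scores mean longer stays than the model expects.
    """
    residuals = [(group, observed - expected) for group, observed, expected in rows]
    sd = pstdev([r for _, r in residuals])
    by_group = defaultdict(list)
    for group, r in residuals:
        by_group[group].append(r)
    if sd == 0:  # all residuals identical: no variation to standardize
        return {group: 0.0 for group in by_group}
    return {group: mean(rs) / sd for group, rs in by_group.items()}

# Hypothetical (group, crude LOS in days, model-expected LOS in days) records.
rows = [("region_A", 10, 8), ("region_A", 12, 10), ("region_B", 7, 8), ("region_B", 6, 7)]
scores = standardized_los_difference(rows)
print(scores)  # region_A about 1.333, region_B about -0.667
```

A group with a positive score keeps patients longer than their case mix predicts, which is the kind of residual variation the study attributes to provider factors.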

Comparison of Hospital Standardized Mortality Ratio Using National Hospital Discharge Injury Data (퇴원손상심층조사 자료를 이용한 의료기관 중증도 보정 사망비 비교)

  • Park, Jong-Ho;Kim, Yoo-Mi;Kim, Sung-Soo;Kim, Won-Joong;Kang, Sung-Hong
    • Journal of the Korea Academia-Industrial cooperation Society / v.13 no.4 / pp.1739-1750 / 2012
  • This study aimed to develop an assessment of medical service outcomes from administrative data by comparing hospital standardized mortality ratios (HSMR) across hospitals. It analyzed 63,664 cases from the 2007 and 2008 Hospital Discharge Injury Data provided by the Korea Centers for Disease Control and Prevention. Using data mining techniques, we compared decision tree and logistic regression models for risk adjustment of in-hospital mortality. The analysis showed that gender, length of stay, Elixhauser comorbidity index, admission route, and primary diagnosis are the main variables influencing mortality. Comparing HSMRs computed with the standardized variables, we found marked differences among hospitals, with HSMRs ranging from 55.6 to 201.6. This indicates gaps in the quality of medical services among hospitals. These findings should be used more widely to improve the quality of medical services.
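HSMR is conventionally computed as 100 × observed deaths / expected deaths, where a hospital's expected deaths are the sum of its patients' predicted probabilities of in-hospital death from the risk-adjustment model. The sketch below assumes those probabilities are already available (for example, from the decision tree or logistic regression model); the function name, hospital labels, and values are hypothetical.

```python
from collections import defaultdict

def hsmr_by_hospital(records):
    """records: iterable of (hospital, died (0/1), predicted_death_probability).

    HSMR = 100 * observed deaths / expected deaths, where expected deaths is
    the sum of the risk model's predicted probabilities for that hospital.
    """
    observed = defaultdict(int)
    expected = defaultdict(float)
    for hospital, died, prob in records:
        observed[hospital] += died
        expected[hospital] += prob
    return {h: 100.0 * observed[h] / expected[h] for h in observed if expected[h] > 0}

# Hypothetical patients; probabilities would come from the risk-adjustment model.
records = [
    ("H1", 1, 0.5), ("H1", 0, 0.25), ("H1", 0, 0.25), ("H1", 1, 0.5),
    ("H2", 0, 0.5), ("H2", 0, 0.5), ("H2", 1, 0.25), ("H2", 0, 0.25),
]
hsmr = hsmr_by_hospital(records)
print(hsmr)  # H1 about 133.3 (more deaths than expected), H2 about 66.7
```

Both hospitals have the same expected deaths (1.5), so the difference in HSMR reflects only the difference in observed outcomes after risk adjustment.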

Product Recommender Systems using Multi-Model Ensemble Techniques (다중모형조합기법을 이용한 상품추천시스템)

  • Lee, Yeonjeong;Kim, Kyoung-Jae
    • Journal of Intelligence and Information Systems / v.19 no.2 / pp.39-54 / 2013
  • The recent explosive growth of electronic commerce gives customers many advantageous purchase opportunities. In this situation, customers who lack sufficient knowledge about their purchases may accept product recommendations. Product recommender systems automatically reflect users' preferences and provide them with recommendation lists. Thus, product recommender systems in online shopping stores are known as one of the most popular tools for one-to-one marketing. However, recommender systems that do not properly reflect users' preferences disappoint users and waste their time. In this study, we propose a novel recommender system that uses data mining and multi-model ensemble techniques to improve recommendation performance by reflecting users' preferences precisely. The research data were collected from a real-world online shopping store that sells products from famous art galleries and museums in Korea. The data initially contained 5,759 transactions, of which 3,167 remained after records with null values were deleted. We transform categorical variables into dummy variables and exclude outliers. The proposed model consists of two steps. The first step predicts which customers are highly likely to purchase products in the online shopping store. In this step, we first use logistic regression, decision trees, and artificial neural networks, implemented in SAS E-Miner, to predict which customers are highly likely to purchase products in each product group. For logistic regression and decision trees, we partition the data into modeling and validation sets; for the artificial neural network model, we partition it into training, test, and validation sets. The same validation set is used in all experiments.
We then combine the results of the individual predictors using multi-model ensemble techniques, namely bagging and bumping. Bagging, short for "bootstrap aggregation," combines the outputs of multiple models to improve the performance and stability of prediction or classification; it is a special form of averaging. Bumping, short for "Bootstrap Umbrella of Model Parameters," keeps only the model with the lowest error. The results show that bumping outperforms bagging and the other predictors except in the "Poster" product group, where the artificial neural network model performs best. In the second step, we use market basket analysis to extract association rules for co-purchased products, obtaining thirty-one rules based on lift, support, and confidence. We set the minimum support (transaction frequency) to 5%, the maximum number of items in an association to 4, and the minimum confidence for rule generation to 10%. Rules with a lift below 1 are also excluded. After duplicate rules are removed, fifteen association rules remain. Of these, eleven describe associations among products in the "Office Supplies" group, one describes an association between the "Office Supplies" and "Fashion" groups, and the remaining three describe associations between the "Office Supplies" and "Home Decoration" groups. Finally, the proposed recommender system provides recommendation lists to the appropriate customers. We test its usability with a prototype system, built with ASP, JavaScript, and Microsoft Access, on real-world transaction and profile data.
In addition, we survey user satisfaction with the product lists recommended by the proposed system and with randomly selected product lists. The survey participants are 173 users of MSN Messenger, Daum Café, and P2P services. User satisfaction is measured on a five-point Likert scale, and a paired-samples t-test is performed on the survey results. The results show that the proposed model outperforms the random selection model at the 1% significance level, meaning that users were significantly more satisfied with the recommended product lists. The results also suggest that the proposed system may be useful in real-world online shopping stores.
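Support, confidence, and lift can be computed directly from transaction baskets. The sketch below is a simplification: it enumerates only single-item rules X -> Y (the study allowed up to 4 items per association) and uses the study's thresholds as defaults (minimum support 5%, minimum confidence 10%, lift of at least 1). The function name `pair_rules`, the product names, and the baskets are hypothetical.

```python
from itertools import combinations

def pair_rules(transactions, min_support=0.05, min_confidence=0.10, min_lift=1.0):
    """Single-item rules X -> Y that pass support, confidence, and lift thresholds.

    Returns (antecedent, consequent, support, confidence, lift) tuples.
    """
    baskets = [set(t) for t in transactions]
    n = len(baskets)
    items = sorted(set().union(*baskets))

    def support(itemset):
        # Fraction of baskets containing every item in `itemset`.
        return sum(1 for basket in baskets if itemset <= basket) / n

    rules = []
    for a, b in combinations(items, 2):
        s_ab = support({a, b})
        if s_ab < min_support:
            continue
        for x, y in ((a, b), (b, a)):
            confidence = s_ab / support({x})
            lift = confidence / support({y})
            if confidence >= min_confidence and lift >= min_lift:
                rules.append((x, y, s_ab, confidence, lift))
    return rules

# Hypothetical baskets (product names are illustrative only).
transactions = [["notebook", "pen"], ["notebook", "pen"], ["notebook", "poster"], ["pen"]]
rules = pair_rules(transactions)
print([(x, y) for x, y, *_ in rules])  # [('notebook', 'poster'), ('poster', 'notebook')]
```

Notebook and pen co-occur in half the baskets yet fail the lift filter (lift about 0.89), because both are so common that they appear together less often than independence would predict; this is why the lift-below-1 exclusion matters.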

A Study on the Turbidity Estimation Model Using Data Mining Techniques in the Water Supply System (데이터마이닝 기법을 이용한 상수도 시스템 내의 탁도 예측모형 개발에 관한 연구)

  • Park, No-Suk;Kim, Soonho;Lee, Young Joo;Yoon, Sukmin
    • Journal of Korean Society of Environmental Engineers / v.38 no.2 / pp.87-95 / 2016
  • Turbidity is the key indicator by which users recognize the 'discolored water' phenomenon, which is known to be caused by pipe corrosion in the water supply system. 'Discolored water' is defined as water whose turbidity is high enough for users to recognize visually. This study therefore used data mining, specifically decision tree analysis, to develop models that estimate turbidity changes in the water supply system. pH and residual chlorine data were used as the input variables of the turbidity estimation model. The model using both variables (pH and residual chlorine) gave more reasonable estimates than models using either variable alone. However, the model underestimated the observed peak values. To overcome this drawback, a high-pass filter was introduced as a preprocessing step for the estimation model. The modified model predicted the observed peaks more accurately and showed better overall prediction performance than the conventional model.
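The abstract does not say which high-pass filter was used or how its output was fed to the decision tree. Assuming one common choice, a first-order discrete high-pass filter, y[i] = alpha * (y[i-1] + x[i] - x[i-1]), the sketch below suppresses the slowly varying baseline of a turbidity series and keeps rapid changes such as peaks. The function name, the smoothing factor `alpha`, and the readings are hypothetical.

```python
def high_pass(signal, alpha):
    """First-order discrete high-pass filter (RC-style), with 0 < alpha < 1.

    Assumed filter form; the study's exact filter is not specified.
    """
    if not signal:
        return []
    filtered = [0.0]  # treat the initial level as baseline
    for i in range(1, len(signal)):
        filtered.append(alpha * (filtered[i - 1] + signal[i] - signal[i - 1]))
    return filtered

turbidity = [1.0, 1.0, 2.0, 2.0]  # hypothetical readings (NTU)
print(high_pass(turbidity, 0.5))  # [0.0, 0.0, 0.5, 0.25]
```

A constant series maps to all zeros, while the step from 1.0 to 2.0 produces a response that then decays; emphasizing such changes is the property that could help a model track peaks.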

A Literature Review and Classification of Recommender Systems on Academic Journals (추천시스템관련 학술논문 분석 및 분류)

  • Park, Deuk-Hee;Kim, Hyea-Kyeong;Choi, Il-Young;Kim, Jae-Kyeong
    • Journal of Intelligence and Information Systems / v.17 no.1 / pp.139-152 / 2011
  • Recommender systems have become an important research field since the first paper on collaborative filtering appeared in the mid-1990s. In general, recommender systems are defined as supporting systems that help users find information, products, or services (such as books, movies, music, digital products, web sites, and TV programs) by aggregating and analyzing suggestions from other users, that is, reviews from various authorities, together with user attributes. Although academic research on recommender systems has increased significantly over the last ten years, more research applicable to real-world situations is needed, because the field is still broad and less mature than other research fields. Accordingly, the existing articles on recommender systems need to be reviewed with the next generation of recommender systems in mind. However, given its nature, recommender system research is not easily confined to specific disciplines. We therefore reviewed all articles on recommender systems published from 2001 to 2010 in 37 journals, selected from the top 125 journals of the MIS Journal Rankings. The literature search used the descriptors "Recommender system", "Recommendation system", "Personalization system", "Collaborative filtering" and "Contents filtering". The full text of each article was reviewed to eliminate articles not actually related to recommender systems. Many items were excluded as unfit for our research, including conference papers, master's and doctoral dissertations, textbooks, unpublished working papers, non-English publications, and news. We classified the articles by year of publication, journal, recommendation field, and data mining technique.
The recommendation fields and data mining techniques of 187 articles were reviewed and classified into eight recommendation fields (book, document, image, movie, music, shopping, TV program, and others) and eight data mining techniques (association rule, clustering, decision tree, k-nearest neighbor, link analysis, neural network, regression, and other heuristic methods). The results presented in this paper have several significant implications. First, based on past publication rates, interest in recommender system research is expected to grow significantly. Second, 49 articles relate to movie recommendation, whereas image and TV program recommendation appear in only 6 articles. This is attributed to the ease of using the MovieLens data set, so data sets for other fields need to be prepared. Third, social network analysis has recently been used in various applications, but studies on recommender systems using social network analysis are scarce. We therefore expect new recommendation approaches using social network analysis to be developed, and evaluating recommender system research using social network analysis will be an interesting area for further research. By examining the published literature, this study identifies trends in recommender system research and offers practitioners and researchers insight into the future direction of recommender systems. We hope that this research helps anyone interested in recommender systems research gain insight for future research.

The Comparison of Risk-adjusted Mortality Rate between Korea and United States (한국과 미국 의료기관의 중증도 보정 사망률 비교)

  • Chung, Tae-Kyoung;Kang, Sung-Hong
    • Journal of Digital Convergence / v.11 no.5 / pp.371-384 / 2013
  • The purpose of this study was to develop risk-adjusted mortality models using Korean Hospital Discharge Injury data and US National Hospital Discharge Survey data, and to suggest ways to manage hospital mortality rates by comparing Korean and US hospital standardized mortality ratios (HSMR). Data mining techniques (decision trees and logistic regression) were used to develop Korean and US risk-adjustment models of in-hospital mortality. Comparing HSMRs computed with the standardized variables revealed concrete differences between the two countries. While the Korean HSMR increased every year (101.0 in 2006, 101.3 in 2007, 103.3 in 2008), the US HSMR decreased (102.3 in 2006, 100.7 in 2007, 95.9 in 2008). Korean HSMRs by number of hospital beds were higher than those of the United States. A two-level approach to managing hospital mortality rates is suggested, at the national and hospital levels: the government should publish the HSMRs of large hospitals and offer small and medium-sized hospitals consulting on effective hospital mortality management.
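This study fits logistic regression, alongside decision trees, as a risk-adjustment model; its predicted death probabilities are what get summed into the expected deaths in an HSMR denominator. As a dependency-free sketch (a real analysis would use a statistics package), the code below fits a logistic model by batch gradient descent on the mean log-loss. The function names, the single binary risk factor, and the eight records are hypothetical.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(X, y, lr=0.1, epochs=2000):
    """Batch gradient descent on the mean log-loss; returns (weights, bias)."""
    n, k = len(X), len(X[0])
    w, b = [0.0] * k, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * k, 0.0
        for xi, yi in zip(X, y):
            # Gradient of the log-loss w.r.t. the linear score is (p - y).
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(k):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def predict(w, b, xi):
    """Predicted probability of in-hospital death for one patient."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b)

# Hypothetical data: one binary risk factor (e.g., high comorbidity); 1 = died.
X = [[0], [0], [0], [0], [1], [1], [1], [1]]
y = [0, 0, 0, 1, 0, 1, 1, 1]
w, b = fit_logistic(X, y)
p_low, p_high = predict(w, b, [0]), predict(w, b, [1])
print(round(p_low, 3), round(p_high, 3))  # close to the observed rates 0.25 and 0.75
```

With a single binary predictor, the fitted probabilities converge to the observed death rate in each group, which is a quick sanity check before adding real covariates.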

Development of Sentiment Analysis Model for the hot topic detection of online stock forums (온라인 주식 포럼의 핫토픽 탐지를 위한 감성분석 모형의 개발)

  • Hong, Taeho;Lee, Taewon;Li, Jingjing
    • Journal of Intelligence and Information Systems / v.22 no.1 / pp.187-204 / 2016
  • Document classification based on emotional polarity has become an emerging task of growing interest owing to the explosion of data on the Web. In the big data age, there are too many information sources to refer to when making decisions. For example, when considering travel to a city, a person may search for reviews using a search engine such as Google or social networking services (SNSs) such as blogs, Twitter, and Facebook. The emotional polarity of positive and negative reviews helps the user decide whether or not to make the trip. Sentiment analysis of customer reviews has become an important research topic as data mining technology has become widely accepted for text mining of the Web. Sentiment analysis has been used to classify documents with machine learning techniques such as decision trees, neural networks, and support vector machines (SVMs), and it is used to determine the attitude, position, and sensibility of people who write articles on various topics published on the Web. Whatever their polarity, emotional reviews are very helpful for analyzing customers' opinions. Through automated text mining techniques, sentiment analysis helps in quickly understanding what customers really want. It applies text mining techniques to Web text to extract subjective information and to determine the attitudes or positions of authors who present their opinions on a particular topic. In this study, we developed a model that selects hot topics from user posts on China's online stock forum by using the k-means algorithm and a self-organizing map (SOM). In addition, we developed a detection model to predict hot topics by using machine learning techniques such as logit, decision trees, and SVM.
We employed sentiment analysis to develop our model for the selection and detection of hot topics from China's online stock forum. The sentiment analysis calculates a sentiment value for each document by comparing its words with, and classifying them according to, a polarity sentiment dictionary (positive or negative). The online stock forum was an attractive site because of its information about stock investment: users post numerous texts about stock movements, analyzing the market on the basis of government policy announcements, market reports, research institutes' reports on the economy, and even rumors. We divided the online forum's topics into 21 categories for sentiment analysis, and 144 topics were selected from these 21 categories. The posts were crawled to build a positive and negative text database. After preprocessing, we obtained 21,141 posts on 88 topics from March 2013 to February 2015. An interest index was defined to select the hot topics, and the k-means algorithm and SOM produced equivalent results on this data. We developed decision tree models to detect hot topics with three algorithms: CHAID, CART, and C4.5. The results of CHAID were inferior to those of the others. We also employed SVM to detect hot topics from negative data; the SVM models used the radial basis function (RBF) kernel, with parameters tuned by grid search. Detecting hot topics through sentiment analysis provides investors with the latest trends and hot topics in the stock forum, so they no longer need to search the vast amount of information on the Web. Our proposed model can also help to rapidly determine customers' signals or attitudes toward government policy and firms' products and services.
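The abstract says each post's sentiment value is computed against a positive/negative polarity dictionary but does not give the formula. A minimal sketch, assuming posts are already tokenized (Chinese posts would first need word segmentation) and assuming a normalized score (positive count - negative count) / (positive count + negative count); the function names and the tiny dictionaries are hypothetical.

```python
# Hypothetical miniature polarity dictionary; the study's dictionary is not shown.
POSITIVE = {"rise", "gain", "bullish"}
NEGATIVE = {"fall", "loss", "bearish"}

def polarity_score(tokens, positive=POSITIVE, negative=NEGATIVE):
    """Assumed score in [-1, 1]: (pos - neg) / (pos + neg); 0.0 if no dictionary words."""
    pos = sum(1 for t in tokens if t in positive)
    neg = sum(1 for t in tokens if t in negative)
    if pos + neg == 0:
        return 0.0
    return (pos - neg) / (pos + neg)

def polarity_label(tokens):
    """Map the score to a positive / negative / neutral label."""
    score = polarity_score(tokens)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

post = ["policy", "news", "rise", "gain", "fall"]
print(polarity_score(post), polarity_label(post))  # 0.3333333333333333 positive
```

Labels of this kind could populate a positive and negative text database like the one the study built before deriving hot-topic features.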

Development of Predictive Models for Rights Issues Using Financial Analysis Indices and Decision Tree Technique (경영분석지표와 의사결정나무기법을 이용한 유상증자 예측모형 개발)

  • Kim, Myeong-Kyun;Cho, Yoonho
    • Journal of Intelligence and Information Systems / v.18 no.4 / pp.59-77 / 2012
  • This study focuses on predicting which firms will raise capital by issuing new stock in the near future. Many stakeholders, including banks, credit rating agencies, and investors, perform a variety of analyses of firms' growth, profitability, stability, activity, productivity, and so on, and regularly report the firms' financial analysis indices. In this paper, we develop predictive models for rights issues using these financial analysis indices and data mining techniques. The models are built from two perspectives. The first is the analysis period: we divide it into before and after the IMF financial crisis and examine whether the two periods differ. The second is the prediction horizon: to predict when firms will raise capital by issuing new stock, the horizon is set to one, two, or three years ahead. A total of six prediction models (two periods × three horizons) are therefore developed and analyzed. We employ the decision tree technique to build the prediction models for rights issues. The decision tree is one of the most widely used prediction methods; it builds trees that label or categorize cases into a set of known classes. In contrast to neural networks, logistic regression, and SVM, decision tree techniques are well suited to high-dimensional applications and have strong explanatory capabilities. Well-known decision tree induction algorithms include CHAID, CART, QUEST, and C5.0. Among them, we use C5.0, the most recently developed of these, which is reported to perform better than the other algorithms. Data on rights issues and financial analysis were obtained from TS2000 of the Korea Listed Companies Association. Each financial analysis record consists of 89 variables, including 9 growth indices, 30 profitability indices, 23 stability indices, 6 activity indices, and 8 productivity indices.
For model building and testing, we used 10,925 financial analysis records of 658 listed firms. PASW Modeler 13 was used to build C5.0 decision trees for the six prediction models. A total of 84 financial analysis variables were selected as input variables for each model, and rights issue status (issued or not issued) was defined as the output variable. To develop the prediction models with the C5.0 node (node options: Output type = Rule set, Use boosting = false, Cross-validate = false, Mode = Simple, Favor = Generality), we used 60% of the data for model building and 40% for model testing. The experimental results show that prediction accuracies for data after the IMF financial crisis (68.78% to 71.41%) are about 10 percentage points higher than those before the crisis (59.04% to 60.43%). These results indicate that since the IMF financial crisis, the reliability of financial analysis indices has increased and firms' intentions to conduct rights issues have become more evident. The results also show that stability-related indices have a major impact on short-term prediction of rights issues, whereas long-term prediction is affected by indices of profitability, stability, activity, and productivity. All the prediction models include the industry code as a significant variable, which means that companies in different industries show different patterns of rights issues. We conclude that stakeholders should take into account stability-related indices for short-term prediction and a wider variety of financial analysis indices for long-term prediction. The current study has several limitations. First, accuracy should be compared across other data mining techniques such as neural networks, logistic regression, and SVM.
Second, new prediction models should be developed and evaluated that include variables identified in capital structure theory as relevant to rights issues.
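The abstract does not detail how C5.0 chooses splits. C5.0 is the successor to Quinlan's C4.5, whose default splitting criterion is the information gain ratio; assuming a similar criterion, the sketch below computes it for candidate fields. The field names `industry_code` and `firm_id` and the four records are hypothetical, and PASW Modeler's C5.0 node may differ in details such as handling of continuous fields and pruning.

```python
import math
from collections import Counter

def entropy(values):
    """Shannon entropy in bits of a list of discrete values."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())

def gain_ratio(feature, labels):
    """C4.5-style gain ratio: information gain divided by split information."""
    n = len(labels)
    groups = {}
    for value, label in zip(feature, labels):
        groups.setdefault(value, []).append(label)
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    gain = entropy(labels) - conditional
    split_info = entropy(feature)  # entropy of the split itself
    return gain / split_info if split_info > 0 else 0.0

# Hypothetical records: 1 = rights issue, 0 = no rights issue.
labels = [1, 1, 0, 0]
industry_code = ["A", "A", "B", "B"]
firm_id = ["f1", "f2", "f3", "f4"]
print(gain_ratio(industry_code, labels), gain_ratio(firm_id, labels))  # 1.0 0.5
```

Both fields carry the same information gain (1 bit), but the gain ratio halves the score of the unique firm identifier; penalizing many-valued fields this way is the reason C4.5-family algorithms use gain ratio rather than raw gain.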