• Title/Summary/Keyword: Decision Tree analysis

Ensemble of Nested Dichotomies for Activity Recognition Using Accelerometer Data on Smartphone (Ensemble of Nested Dichotomies 기법을 이용한 스마트폰 가속도 센서 데이터 기반의 동작 인지)

  • Ha, Eu Tteum;Kim, Jeongmin;Ryu, Kwang Ryel
    • Journal of Intelligence and Information Systems / v.19 no.4 / pp.123-132 / 2013
  • As smartphones are equipped with various sensors such as the accelerometer, GPS, gravity sensor, gyroscope, ambient light sensor, and proximity sensor, there has been much research on using these sensors to create valuable applications. Human activity recognition is one such application, motivated by welfare applications such as support for the elderly, measurement of calorie consumption, and analysis of lifestyles and exercise patterns. One of the challenges of using smartphone sensors for activity recognition is that the number of sensors used should be minimized to save battery power. When the number of sensors is restricted, it is difficult to build a highly accurate activity recognizer, or classifier, because subtly different activities are hard to distinguish from limited information. The difficulty becomes especially severe when the number of activity classes to be distinguished is large. In this paper, we show that a fairly accurate classifier can be built that distinguishes ten different activities using data from a single sensor, the smartphone accelerometer. Our approach to this ten-class problem is the ensemble of nested dichotomies (END) method, which transforms a multi-class problem into multiple two-class problems. END builds a committee of binary classifiers in a nested fashion using a binary tree. At the root of the binary tree, the set of all classes is split into two subsets of classes by a binary classifier. At a child node of the tree, a subset of classes is again split into two smaller subsets by another binary classifier. Continuing in this way, we obtain a binary tree in which each leaf node contains a single class. This binary tree can be viewed as a nested dichotomy that can make multi-class predictions. 
Depending on how the set of classes is split into two subsets at each node, the final tree can differ. Since some classes may be correlated, a particular tree may perform better than the others. However, the best tree can hardly be identified without deep domain knowledge. The END method copes with this problem by building multiple dichotomy trees randomly during learning and then combining the predictions made by each tree during classification. The END method is generally known to perform well even when the base learner is unable to model complex decision boundaries. As the base classifier at each node of the dichotomy, we use another ensemble classifier, the random forest. A random forest is built by repeatedly generating decision trees, each from a bootstrap sample and a different random subset of features. By combining bagging with random feature subset selection, a random forest has more diverse ensemble members than simple bagging. Overall, our ensemble of nested dichotomies can be seen as a committee of committees of decision trees that handles a multi-class problem with high accuracy. The ten classes of activities that we distinguish in this paper are 'Sitting', 'Standing', 'Walking', 'Running', 'Walking Uphill', 'Walking Downhill', 'Running Uphill', 'Running Downhill', 'Falling', and 'Hobbling'. The features used for classifying these activities include not only the magnitude of the acceleration vector at each time point but also the maximum, the minimum, and the standard deviation of the vector magnitude within a time window of the last 2 seconds, among others. To compare the performance of END with that of other methods, accelerometer data were collected every 0.1 second for 2 minutes for each activity from 5 volunteers. 
Of the 5,900 ($= 5 \times (60 \times 2 - 2) / 0.1$) data points collected for each activity (the data for the first 2 seconds are discarded because they lack a full time window), 4,700 were used for training and the rest for testing. Although 'Walking Uphill' is often confused with other similar activities, END classified all ten activities with a fairly high accuracy of 98.4%. In comparison, a decision tree, a k-nearest neighbor classifier, and a one-versus-rest support vector machine achieved accuracies of 97.6%, 96.5%, and 97.6%, respectively.
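
The nested-dichotomy construction described in this abstract can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, the binary classifier at each node is replaced by a toy stand-in (`toy_p_left`) instead of a trained random forest, and averaging the per-tree probabilities is assumed as the way of combining trees.

```python
import random

ACTIVITIES = ["Sitting", "Standing", "Walking", "Running", "Walking Uphill",
              "Walking Downhill", "Running Uphill", "Running Downhill",
              "Falling", "Hobbling"]

def build_dichotomy(classes, rng):
    """Recursively split a list of classes into a random binary tree."""
    if len(classes) == 1:
        return classes[0]                      # leaf: a single class
    shuffled = classes[:]
    rng.shuffle(shuffled)
    cut = rng.randint(1, len(shuffled) - 1)    # keep both subsets non-empty
    return (build_dichotomy(shuffled[:cut], rng),
            build_dichotomy(shuffled[cut:], rng))

def leaves(tree):
    """Return the classes found under a (sub)tree."""
    if isinstance(tree, str):
        return [tree]
    return leaves(tree[0]) + leaves(tree[1])

def count_internal(tree):
    """Count internal nodes, i.e. binary classifiers needed by one tree."""
    if isinstance(tree, str):
        return 0
    return 1 + count_internal(tree[0]) + count_internal(tree[1])

def class_probabilities(tree, p_left, prob=1.0):
    """Multiply branch probabilities along each root-to-leaf path.

    p_left(left_classes, right_classes) is a hypothetical stand-in for the
    binary classifier trained at a node; it returns P(sample is on the left).
    """
    if isinstance(tree, str):
        return {tree: prob}
    left, right = tree
    p = p_left(leaves(left), leaves(right))
    result = class_probabilities(left, p_left, prob * p)
    result.update(class_probabilities(right, p_left, prob * (1.0 - p)))
    return result

def ensemble_probabilities(trees, p_left):
    """Average the class probabilities over all trees (assumed combiner)."""
    totals = {c: 0.0 for c in ACTIVITIES}
    for tree in trees:
        for c, p in class_probabilities(tree, p_left).items():
            totals[c] += p
    return {c: total / len(trees) for c, total in totals.items()}

def toy_p_left(left, right):
    """Toy node classifier: favours the side that contains 'Walking'."""
    return 0.9 if "Walking" in left else 0.1

rng = random.Random(0)
trees = [build_dichotomy(ACTIVITIES, rng) for _ in range(5)]
probs = ensemble_probabilities(trees, toy_p_left)
predicted = max(probs, key=probs.get)
print(predicted, count_internal(trees[0]))
```

Each tree over ten classes needs nine binary classifiers, one per internal node, which is why the choice of base learner dominates the training cost of the ensemble.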

Analysis of employee's characteristic using data visualization (데이터 시각화를 이용한 취업자 특성분석)

  • Cho, Jang Sik
    • Journal of the Korean Data and Information Science Society / v.25 no.4 / pp.727-736 / 2014
  • This paper analyzes, from a data visualization perspective, how certain characteristics affect the employment of new college graduates. We use individual and department characteristic data for students who graduated from K-university in 2010. For data visualization we apply multiple correspondence analysis, decision tree analysis, association rules, and social network analysis. The results are summarized as follows. First, an analysis of the determinants of employment shows that GPA, department category, age, number of majors, and recruiting time affect the employment rate. Second, a higher GPA and a department in the natural sciences category positively affect the employment rate. Finally, younger age, a single major, and early recruiting time also positively affect the employment rate.

Job-Matching Function Analysis Using Social Network Analysis (사회연결망분석을 이용한 잡매칭함수 분석)

  • Cho, Jang-Sik;Park, Sung-Ik
    • Communications for Statistical Applications and Methods / v.18 no.6 / pp.675-685 / 2011
  • This paper proposes a job matching function that calculates the probability of matching a job-seeker to an employer, taking the working conditions of both into account. In addition, this study uses social network analysis to analyze the degree of centrality, which represents the interactions between job-seekers and employers. The results are as follows. First, the degree of centrality is severely concentrated in certain job-seekers or certain employers; in addition, there are many job-seekers and employers who have no matching results. Second, according to decision tree analysis, the characteristics of a job-seeker that influence the degree of centrality are gender, age, and level of education, in order of importance. The characteristics of an employer that influence the degree of centrality are proposed salary, industry classification, and firm size, in order of importance.

Development of Decision Support System for the Management of Hypertension using Datamining Technology (데이터마이닝 기법을 활용한 고혈압 관리를 위한 의사결정지원시스템의 개발)

  • 호승희;채영문;조승연;최동훈;송용욱;박충식;조경원;송지원
    • Proceedings of the Korea Intelligent Information System Society Conference / 2000.04a / pp.271-282 / 2000
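  • The purpose of this study is to apply data mining techniques to discover knowledge that can predict the characteristics of hypertensive patients, a clinically important group, and their prognosis under treatment; to verify the validity of applying this knowledge clinically; to develop a decision support system; and to evaluate its usefulness. For patients of Severance Hospital, affiliated with Yonsei University College of Medicine, logistic regression analysis was used to identify risk factors for blood pressure control. Decision tree analysis was used to derive the characteristics of the blood-pressure-controlled and uncontrolled groups for each therapeutic drug and to generate the rules that determine each group. A decision support system using these rules was then developed and evaluated. As a result, the proportion of patients with controlled blood pressure was found to be higher overall under prescriptions from the system using data mining techniques than under prescriptions from the system using existing clinical theory alone. The results of this study are expected to contribute to developing, applying, and evaluating hypertension treatment guidelines suited to the situation in Korea, and operating such a decision support system in actual clinical practice should demonstrate its effect and practical value.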

Prediction of commitment and persistence in heterosexual involvements according to the styles of loving using a datamining technique (데이터마이닝을 활용한 사랑의 형태에 따른 연인관계 몰입수준 및 관계 지속여부 예측)

  • Park, Yoon-Joo
    • Journal of Intelligence and Information Systems / v.22 no.4 / pp.69-85 / 2016
  • A successful relationship with a loving partner is one of the most important factors in life. In psychology, previous studies have examined the factors influencing romantic relationships. However, most of these studies relied on statistical analysis and are therefore limited in analyzing complex non-linear relationships or rule-based reasoning. This research analyzes commitment and persistence in heterosexual involvement according to styles of loving, using a datamining technique as well as statistical methods. In addition to the factors suggested by previous research, we consider six styles of loving that influence romantic relationships between lovers: 'eros', 'ludus', 'storge', 'pragma', 'mania' and 'agape'. These six types of love are defined by Lee (1977) as follows: 'eros' is romantic, passionate love; 'ludus' is a game-playing or uncommitted love; 'storge' is a slow developing, friendship-based love; 'pragma' is a pragmatic, practical, mutually beneficial relationship; 'mania' is an obsessive or possessive love and, lastly, 'agape' is a gentle, caring, giving type of love, brotherly love, not concerned with the self. For this research, data from 105 heterosexual couples were collected. Using the data, linear regression was first performed to find the important factors associated with commitment to partners. The result shows that 'satisfaction', 'eros' and 'agape' are significant factors associated with the commitment level for both males and females. Interestingly, for males, 'agape' has a greater effect on commitment than 'eros', whereas for females, 'eros' is a more significant factor than 'agape'. In addition, the male's 'investment' is also a crucial factor for male commitment. Next, decision tree analysis was performed to find the characteristics of high-commitment and low-commitment couples. 
To build the decision tree models in this experiment, the 'decision tree' operator in the datamining tool RapidMiner was used. The experimental result shows that males with a high satisfaction level in the relationship show a high commitment level. However, even a male without a high satisfaction level shows a high commitment level to the female if he has made a large financial or mental investment in the relationship and his partner shows him a certain amount of 'agape'. A female with high 'eros' and 'satisfaction' levels shows a high commitment level. Otherwise, even a female without a high satisfaction level shows a high commitment level if her partner shows a certain amount of 'mania'. Finally, this research built a decision tree prediction model of whether the relationship will persist or break up. The result shows that the most important factor influencing a break-up is the 'narcissistic tendency' of the male. In addition, 'satisfaction', 'investment' and 'mania' of both male and female also affect a break-up. Interestingly, while the 'mania' level of a male works positively to maintain the relationship, that of a female has a negative influence. The contribution of this research is the adoption of a datamining method as a new analysis technique for psychology. In addition, the results can provide useful advice to couples for building a harmonious relationship with each other. This research has several limitations. First, the experimental data were oversampled to balance the size of each class, which limits objective evaluation of the predictive models' performance. Second, the outcome data, whether the relationship persisted or not, were collected over a relatively short period, 6 months after the initial data collection. Lastly, most of the survey respondents are in their 20s. 
In order to get more general results, we would like to extend this research to general populations.
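
The oversampling limitation mentioned in this abstract can be made concrete with a small sketch. The abstract does not say which oversampling procedure was used, so the code below assumes plain random oversampling with replacement; the function name and the toy data are hypothetical.

```python
import random
from collections import Counter

def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen minority-class rows until classes are equal."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())              # size of the largest class
    out_rows, out_labels = list(rows), list(labels)
    for cls, n in counts.items():
        pool = [r for r, y in zip(rows, labels) if y == cls]
        for _ in range(target - n):            # add copies of existing rows
            out_rows.append(rng.choice(pool))
            out_labels.append(cls)
    return out_rows, out_labels

# Toy data invented for illustration: 6 couples that persist, 2 that break up.
rows = [[i] for i in range(8)]
labels = ["persist"] * 6 + ["break up"] * 2
bal_rows, bal_labels = random_oversample(rows, labels)
balanced_counts = Counter(bal_labels)
print(balanced_counts)
```

Because the added rows are exact copies, a copy can land in both the training and the test partition if the data are split after oversampling, which makes measured accuracy optimistic.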

Analysis of Characteristics of the Cancelled Districts of Housing Redevelopment Project - Focusing on Decision Tree Analysis - (재정비사업 해제구역 의사결정 특성 연구 - 의사결정나무기법 중심으로 -)

  • Lee, Do-Ghil
    • Journal of the Korean Regional Science Association / v.37 no.4 / pp.49-59 / 2021
  • This study aims to identify the characteristics of cancelled districts in housing redevelopment and housing reconstruction projects. The subjects are 189 project districts (121 promoted districts and 68 cancelled districts), all of which were analyzed by decision tree analysis. The first split among the factors influencing cancellation was made by the development actor variable; that is, the most important independent variable was the presence or absence of a development actor promoting the project. Of the 89 districts without a development actor, 41 were cancelled and 48 were promoted, whereas of the 100 districts with a development actor, 9 were cancelled and 91 were promoted. The second split was made by the number of landowners: the probability of cancellation increased when there were fewer than 468 landowners, with 37 of 62 such districts cancelled. By contrast, of the 27 districts with 468 or more landowners, 4 were cancelled and 23 were promoted. The third split was made by the average assessed land value: 35 districts were cancelled below the threshold of KRW 2.6964 million/m² (approximately KRW 8.91 million per pyeong), and 2 districts were cancelled at higher assessed values. Within the branch with 468 or more landowners (node 4), all 4 cancelled districts had a public land area ratio of 29.43% or more, and no district below that ratio was cancelled. The analysis was performed with SPSS Statistics 26.
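
The first split reported in this abstract (districts with and without a development actor) can be checked with a short calculation. The branch counts are taken as quoted in the abstract. The Gini impurity used to score the split is an assumption made for illustration, since the abstract does not state which splitting criterion was used, and the function names are hypothetical.

```python
def cancellation_rate(cancelled, promoted):
    """Share of districts in a node that were cancelled."""
    return cancelled / (cancelled + promoted)

def gini(cancelled, promoted):
    """Gini impurity of a two-class node: 2p(1 - p)."""
    p = cancellation_rate(cancelled, promoted)
    return 2 * p * (1 - p)

# Branch counts as quoted in the abstract: (cancelled, promoted).
without_actor = (41, 48)    # 89 districts without a development actor
with_actor = (9, 91)        # 100 districts with a development actor

rate_without = cancellation_rate(*without_actor)   # 41 / 89
rate_with = cancellation_rate(*with_actor)         # 9 / 100

# Parent node taken as the sum of the two branches.
parent = (without_actor[0] + with_actor[0], without_actor[1] + with_actor[1])
n_without, n_with = sum(without_actor), sum(with_actor)
weighted_child_gini = (n_without * gini(*without_actor)
                       + n_with * gini(*with_actor)) / (n_without + n_with)
gini_gain = gini(*parent) - weighted_child_gini

print(f"cancellation rate without actor: {rate_without:.1%}")
print(f"cancellation rate with actor:    {rate_with:.1%}")
print(f"impurity reduction of the split: {gini_gain:.4f}")
```

The gap between the two branch rates is what makes this variable the root split: districts without a development actor were cancelled about five times as often as those with one.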

Factors influencing the return of spontaneous circulation of patients with out-of-hospital cardiac arrest (병원외 심정지 환자의 자발적 순환 회복에 영향을 미치는 요인)

  • Park, Il-Su;Kim, Eun-Ju;Sohn, Hae-Sook;Kang, Sung-Hong
    • Journal of Digital Convergence / v.11 no.9 / pp.229-238 / 2013
  • Out-of-hospital cardiac arrest is a major public health problem in Korea. The survival rate to discharge remains at approximately 3.5%, and only 1% of patients have good neurological function. To increase the survival rate, prehospital care should restore spontaneous circulation. The purpose of this study was to analyze the factors associated with return of spontaneous circulation (ROSC) after out-of-hospital cardiac arrest. Data used for this study were collected from the KCDC Out-of-Hospital Cardiac Arrest Surveillance 2009. The decision tree analysis shows that prehospital CPR, cardiac arrest witness, activity, past history (cancer/heart disease/stroke), place, bystander CPR, response time, age, etc. are significant contributing factors in ROSC. Among the 16 cardiac arrest types from the decision tree classification, the ROSC rate of type 1 is the highest (29.6%). Also notable is that bystander CPR was strongly correlated with ROSC in patients whose cardiac arrest occurred in non-public places. Community resources should be concentrated on increasing bystander CPR and early prehospital emergency care.

Automatic ADL Classification Using 3 Axial Accelerometers and RFID Sensor (3차원 가속 센서 및 RFID 센서를 이용한 ADL 자동 분류)

  • Im, Sae-Mi;Kim, Ig-Jae;Ahn, Sang-Chul;Kim, Hyoung-Gon
    • Journal of the Institute of Electronics Engineers of Korea CI / v.45 no.3 / pp.135-141 / 2008
  • We propose a new method for recognizing the activities of daily living (ADL) based on state-dependent motion analysis using 3-axial accelerometers and a glove-type RFID reader. Two accelerometers are used to classify 5 body states with a decision tree. Instrumental activities are classified from the hand's interaction with an object ID, using an accelerometer and an RFID reader. Object-dependent hand movements are classified into 5 categories in advance, and the final decision combines the body state and the instrumental activity. Experiments show that the suggested hierarchical motion analysis provides an accuracy of over 90% for all 20 ADLs.

Determining Factors of Intention to Actual Use of Charged Long-term Care Services for the Aged (유료노인장기요양보호서비스 이용의사 결정요인)

  • Yoo, Jin-Yeong;Chun, Jin-Ho
    • Journal of Preventive Medicine and Public Health / v.38 no.1 / pp.16-24 / 2005
  • Objectives: To help develop strategies for coping with the changes arising from rapid population aging by predicting the factors that determine the intention to actually use charged long-term care services for the elderly, as perceived by the middle-aged, who play the major role in supporting them. Methods: Subjects were the parents (177 men, 507 women), in their 40s, of students selected from a university in Busan. A questionnaire survey was conducted for 4 weeks in October 2003 on knowledge of long-term care services, the intention of actual use, and preferences regarding the type of service supplier. Data analysis was performed with frequency analysis, the chi-square test, and the t-test using SPSS (ver 10.0K), along with data mining using the decision tree of SAS Enterprise Miner V8.2. Results: About half of the subjects (53.7%) had actual experience of supporting the elderly. Intentions to use the charged services were relatively high for home visiting nursing care (40.1%) and long-term care facilities (40.4%), and were influenced by previous knowledge of the services. The intentions were stronger in women, those with higher education, and those with higher income. Actual support of the elderly was mostly (80%) provided by women, and the perceived burden of support was greater in women and those of lower socioeconomic level. Desired charges were about 10,000 won per day for the bath service, 20,000 won per day for the remaining services, and about 500,000 won per month for long-term care facilities. According to the decision tree analysis, job professionalism was the most important determining factor of the intention to actually use the services, with validation of 63-71%. Mixed health and welfare facilities were preferred, and the most important consideration was the level of professionalism. Conclusions: The intention to actually use the charged services was largely determined by time and cost. 
Policies to increase the number of service suppliers and to decrease the burden perceived by actual supporters are strongly recommended.

Identification of High-risk Groups of Suicide from the Depressed Elderly using Decision Tree Analysis (의사결정나무 분석법을 이용한 우울 노인 중 자살 고위험군 규명)

  • Hong, Sehoon;Lee, Dongwon
    • Research in Community and Public Health Nursing / v.30 no.2 / pp.130-140 / 2019
  • Purpose: The aim of this study is to explore levels of suicidal ideation and identify subgroups at high suicidal risk among the depressed elderly in Korea. Methods: A descriptive cross-sectional design was applied to secondary data from the 6th (1st year) Korean national health and nutrition examination survey (KNHANES). The sample comprised 239 depressed elders aged 60 or over who participated in the KNHANES. The prevalence of suicidal ideation and its related factors, including sociodemographic, physical, and psychological characteristics and quality of life (EQ-5D index), were examined. Descriptive statistics and a decision tree analysis were performed using the SPSS/WIN 23.0 and SPSS Modeler 14.2 programs. Results: Of the depressed elderly, 28.9% had suicidal ideation. Three groups with high suicidal ideation were identified. Predictive factors included perceived stress level, household income level, quality of life, and restriction of activity. The highest risk group consisted of depressed elderly with moderate or low levels of stress, an EQ-5D index below 0.71, and restriction of activity; 80.0% of these participants had suicidal ideation. The accuracy of the model was 80.8%, its sensitivity 85.9%, and its specificity 68.1%. Conclusion: Multi-dimensional interventions should be designed to decrease suicide among the depressed elderly, particularly focusing on subgroups with high risk factors. This research is expected to contribute to future policy design and solution building, as it suggests policy implications for preventing suicide among the depressed elderly.
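
The three performance figures quoted in this abstract (accuracy, sensitivity, specificity) all derive from the same confusion matrix. The sketch below shows the standard definitions; the counts are hypothetical values chosen for easy arithmetic and are not the study's data, and the function name is an assumption.

```python
def classification_metrics(tp, fn, tn, fp):
    """Compute accuracy, sensitivity and specificity from confusion counts.

    tp/fn: positive cases predicted correctly / missed.
    tn/fp: negative cases predicted correctly / flagged by mistake.
    """
    total = tp + fn + tn + fp
    return {
        "accuracy": (tp + tn) / total,        # share of all cases classified correctly
        "sensitivity": tp / (tp + fn),        # share of positives that were detected
        "specificity": tn / (tn + fp),        # share of negatives that were cleared
    }

# Hypothetical counts for illustration only.
metrics = classification_metrics(tp=40, fn=10, tn=30, fp=20)
for name, value in metrics.items():
    print(f"{name}: {value:.1%}")
```

Accuracy is a weighted average of sensitivity and specificity, with weights given by the share of positive and negative cases, so it always lies between the two.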