• Title/Summary/Keyword: Random Testing

The Effect of Meta-Features of Multiclass Datasets on the Performance of Classification Algorithms (다중 클래스 데이터셋의 메타특징이 판별 알고리즘의 성능에 미치는 영향 연구)

  • Kim, Jeonghun;Kim, Min Yong;Kwon, Ohbyung
    • Journal of Intelligence and Information Systems / v.26 no.1 / pp.23-45 / 2020
  • Big data is being generated in a wide variety of fields, such as medical care, manufacturing, logistics, sales, and social networking services (SNS), and the characteristics of the resulting datasets are equally diverse. To remain competitive, companies need to improve their decision-making capacity using classification algorithms. However, most companies lack sufficient knowledge of which classification algorithm is appropriate for a specific problem area. In other words, determining which classification algorithm suits the characteristics of a given dataset has been a task that requires expertise and effort. This is because the relationship between dataset characteristics (called meta-features) and the performance of classification algorithms has not been fully understood. Moreover, there has been little research on meta-features that reflect the characteristics of multi-class data. Therefore, the purpose of this study is to empirically analyze whether meta-features of multi-class datasets have a significant effect on the performance of classification algorithms. In this study, meta-features of multi-class datasets were grouped into two factors (data structure and data complexity), and seven representative meta-features were selected. Among these, we included the Herfindahl-Hirschman Index (HHI), originally a measure of market concentration, to replace the imbalance ratio (IR). We also developed a new index, the Reverse ReLU Silhouette Score, and added it to the meta-feature set. From the UCI Machine Learning Repository, six representative datasets (Balance Scale, PageBlocks, Car Evaluation, User Knowledge-Modeling, Wine Quality (red), and Contraceptive Method Choice) were selected. Each dataset was classified using the classification algorithms selected in the study (KNN, Logistic Regression, Naïve Bayes, Random Forest, and SVM). For each dataset, we applied 10-fold cross-validation. Oversampling of 10% to 100% was applied to each fold, and the meta-features of the dataset were measured. The meta-features selected are HHI, Number of Classes, Number of Features, Entropy, Reverse ReLU Silhouette Score, Nonlinearity of Linear Classifier, and Hub Score. F1-score was selected as the dependent variable. The results showed that six meta-features, including the Reverse ReLU Silhouette Score and HHI proposed in this study, have a significant effect on classification performance. (1) The HHI meta-feature proposed in this study had a significant effect on classification performance. (2) The number of variables has a significant effect on classification performance and, unlike the number of classes, the effect is positive. (3) The number of classes has a negative effect on classification performance. (4) Entropy has a significant effect on classification performance. (5) The Reverse ReLU Silhouette Score also significantly affects classification performance at a significance level of 0.01. (6) The nonlinearity of linear classifiers has a significant negative effect on classification performance. The results of the analysis by classification algorithm were also consistent. In the regression analysis by classification algorithm, the number of variables does not have a significant effect for the Naïve Bayes algorithm, unlike for the other classification algorithms. This study makes two theoretical contributions: (1) two new meta-features (HHI and Reverse ReLU Silhouette Score) were shown to be significant, and (2) the effects of data characteristics on classification performance were investigated using meta-features. There are also two practical contributions. (1) The results can be used to develop a classification algorithm recommendation system based on the characteristics of datasets. (2) Because data characteristics differ, many data scientists search for the optimal algorithm for a given situation by repeatedly adjusting algorithm parameters, and this process wastes hardware, cost, time, and manpower. This study is expected to be useful for machine learning and data mining researchers, practitioners, and developers of machine learning-based systems. The paper consists of an introduction, related research, research model, experiment, and conclusion and discussion.
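
Two of the meta-features named in this abstract, HHI and Entropy, have standard textbook definitions that can be sketched briefly. The sketch below assumes both are computed over the class distribution of a dataset (the abstract does not spell out the exact formulation the authors used), and the function names and sample labels are hypothetical.

```python
import math
from collections import Counter


def class_shares(labels):
    """Return the fraction of samples belonging to each class."""
    counts = Counter(labels)
    total = sum(counts.values())
    return [n / total for n in counts.values()]


def hhi(labels):
    """Herfindahl-Hirschman Index: sum of squared class shares.

    Ranges from 1/k (k perfectly balanced classes) to 1.0 (a single class).
    Assumption: shares are class proportions, standing in for market shares.
    """
    return sum(p * p for p in class_shares(labels))


def class_entropy(labels):
    """Shannon entropy (in bits) of the class distribution."""
    return -sum(p * math.log2(p) for p in class_shares(labels) if p > 0)


# Hypothetical label vectors: one balanced, one heavily skewed.
balanced = ["a", "b", "c", "d"] * 25
skewed = ["a"] * 85 + ["b"] * 5 + ["c"] * 5 + ["d"] * 5

print(hhi(balanced), class_entropy(balanced))
print(hhi(skewed), class_entropy(skewed))
```

A higher HHI and a lower entropy both indicate a more imbalanced class distribution, which is why a single concentration index can plausibly stand in for pairwise imbalance ratios when there are more than two classes.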

Development of Evaluation Method for Jointed Concrete Pavement with FWD and Finite Element Analysis (FWD와 유한요소해석을 이용한 줄눈콘크리트포장 평가법 개발)

  • Yun, Kyong-Ku;Lee, Joo-Hyung;Choi, Seong-Yong
    • International Journal of Highway Engineering / v.1 no.1 / pp.107-119 / 1999
  • Joints in jointed concrete pavement (JCP) control the transverse or longitudinal cracking of slabs that may be caused by temperature or moisture variation during or after hydration. Without such crack control, random cracks cause more serious distresses and result in structural or functional failure of the pavement system. However, joints may themselves cause distresses because of their inherent weakness in structural integrity. Thus, evaluation at the joints is very important, and joint-related distresses should be evaluated reasonably for economical rehabilitation. The purpose of this paper was to develop an evaluation system for the joints of JCP using the finite element analysis program ILLI-SLAB and a nondestructive testing device, the falling weight deflectometer (FWD). To develop the evaluation system, a sensitivity analysis was performed with ILLI-SLAB on selected variables that might affect the performance of transverse joints. The most significant variables were then identified through detailed analysis. Evaluation charts for JCP were made by incorporating field FWD data. It was concluded that the variables that most significantly affect pavement deflections are the modulus of subgrade reaction (K) and the modulus of dowel/concrete interaction (G), and that the limiting criteria for joint performance in JCP are 300 pci and 500,000 lb/in., respectively. Using these variables and FWD tests, charts of load transfer ratio versus surface deflection at joints were made to evaluate the performance of JCP. As a practical application, the Chungbu highway was evaluated using these charts and field FWD data. Only one joint showed a value smaller than the limiting criterion for the modulus of dowel/concrete interaction (G); the remaining joints showed values larger than the limiting criteria for both K and G.
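
The screening logic described in this abstract can be illustrated with a minimal sketch. It assumes the common deflection-based definition of load transfer ratio (unloaded-side deflection divided by loaded-side deflection), which the abstract does not state, and it treats the two limiting criteria as minimum acceptable values. The function names and the joint data are hypothetical; only the 300 pci and 500,000 lb/in. thresholds come from the abstract.

```python
# Limiting criteria reported in the abstract.
K_MIN_PCI = 300.0            # modulus of subgrade reaction (K), pci
G_MIN_LB_PER_IN = 500_000.0  # modulus of dowel/concrete interaction (G), lb/in.


def load_transfer_ratio(d_loaded, d_unloaded):
    """Deflection-based load transfer ratio across a joint.

    Assumed definition: unloaded-side deflection / loaded-side deflection.
    """
    if d_loaded <= 0:
        raise ValueError("loaded-side deflection must be positive")
    return d_unloaded / d_loaded


def joint_meets_criteria(k_pci, g_lb_per_in):
    """True when both back-calculated moduli reach the limiting criteria."""
    return k_pci >= K_MIN_PCI and g_lb_per_in >= G_MIN_LB_PER_IN


# Hypothetical joint data, for illustration only.
ratio = load_transfer_ratio(d_loaded=0.010, d_unloaded=0.008)
print(round(ratio, 2))
print(joint_meets_criteria(k_pci=350.0, g_lb_per_in=450_000.0))
```

In this illustration the second call flags a joint whose G value falls below the criterion even though K is acceptable, mirroring the single deficient joint reported for the Chungbu highway.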


Validation of the Proximity of Clothing to Self Scale for Older Persons (의복의 자아 근접성 척도 검증 - 노년층을 대상으로 -)

  • Lee, Young-A;Sontag, M. Suzanne
    • Journal of the Korean Society of Clothing and Textiles / v.31 no.6 s.165 / pp.848-858 / 2007
  • Sontag and Lee (2004) recently developed an objectively measurable instrument, the Proximity of Clothing to Self (PCS) Scale, which measures the psychological closeness of clothing to self. They validated a 4-factor, 24-item PCS Scale for use with adolescents and identified the need to confirm the factor structure with other age groups. This paper extends the work of Sontag and Lee by employing the PCS Scale with older persons, aged 65 and over, and reports the validation of a 3-factor, 19-item PCS Scale for older persons. In late November 2004, a mail survey was sent to a national random sample of 1,700 older persons drawn from a list purchased from a U.S. survey sampling company. The total number of usable respondents was 250, for an adjusted response rate of 15.6 percent. Three analytical rounds of confirmatory factor analysis (CFA) were conducted to test the construct validity of the PCS Scale using AMOS 5.0 (Analysis of Moment Structures), one of several structural equation modeling (SEM) programs. Completion of the three rounds of CFA resulted in a 3-factor, 19-item PCS Scale with demonstrated construct validity and reliability for older persons. The three PCS dimensions are clothing in relation to 1) self as structure-process (PCS Dimensions 1-2-3 combined), 2) self-esteem-evaluative and affective processes (PCS Dimensions 4-5 combined), and 3) body image and body cathexis (PCS Dimension 6). The initially hypothesized 6-factor scale (Sontag & Lee, 2004) was confirmed neither for adolescents in their study nor for older persons in this study. In addition, the 4-factor solution for the adolescent group did not hold for older persons. It appears that the self-system of older persons is more integrated than may be true for younger individuals. Recommendations for future testing of the construct validity of the PCS Scale are made.

Development and Testing of the Model of Health Promotion Behavior in Predicting Exercise Behavior

  • O'Donnell, Michael P.
    • Korean Journal of Health Education and Promotion / v.2 no.1 / pp.31-61 / 2000
  • Introduction. Although half of premature deaths are caused by unhealthy lifestyle factors such as tobacco smoking, sedentary living, alcohol and drug abuse, and poor nutrition, there are no theoretical models that accurately explain these health promotion related behaviors. This study tests a new model of health behavior called the Model of Health Promotion Behavior. The model draws on elements and frameworks suggested by the Health Belief Model, Social Cognitive Theory, the Theory of Planned Action, and the Health Promotion Model. It is intended as a general model of behavior, but this first test uses amount of exercise as the outcome behavior. Design. This study used a cross-sectional mail-out, mail-back survey design to determine the elements within the model that best explained intentions to exercise and those that best explained amount of exercise. A follow-up questionnaire was mailed to all respondents to the first questionnaire about 10 months after the initial survey. A pretest was conducted to refine the questionnaire, and a pilot study was conducted to test the protocols and the assumptions used to calculate the required sample size. Sample. The sample was drawn from 2000 eligible participants at two blue collar (a utility company and part of a hospital) and two white collar (a bank and a pharmaceutical company) employers located in Southeastern Michigan. Both white collar sites had employee fitness centers, and all four sites offered health promotion programs. In the first survey, 982 responses (49.1%) were received after two mailings to non-respondents and one additional mailing to secure answers to missing data, with 845 usable cases for analyzing current intentions and 918 usable cases for explaining amount of current exercise. In the follow-up survey, questionnaires were mailed to the 982 employees who responded to the initial survey. After one follow-up mailing to non-respondents and one mailing to secure answers to missing data, 697 (71.0%) responses were received, with 627 (63.8%) usable cases to predict intentions and 673 (68.5%) usable cases to predict amount of exercise. Measures. The questionnaire in the initial survey had 15 scales and 134 items; these scales measured each of the variables in the model. Thirteen of the scales were drawn from the literature; all had Cronbach's alpha scores above .74, and all but three had scores above .80. The questionnaire in the second mailing had only 10 items and measured only outcome variables. Analysis. The analysis included calculation of scale scores, Cronbach's alpha, zero-order correlations, factor analysis, ordinary least squares analysis, hierarchical tests of interaction terms, path analysis, and comparisons of results based on a random split of the data and on splits by gender and employer site. The power of the regression analysis was .99 at the .01 significance level for the model as a whole. Results. Self Efficacy and Non-Health Benefits emerged as the most powerful predictors of Intentions to exercise, together explaining approximately 19% of the variance in future Intentions. Intentions, and the interactions of Intentions with Barriers, with Support of Friends, and with Self Efficacy, were the most consistent predictors of amount of future exercise, together explaining 38% of the variance. With the inclusion of Prior Exercise History, the model explained 52% of the variance in amount of exercise 10 months later. There were very few differences in the variables that emerged as important predictors of intentions or exercise across the different employer sites or between males and females. Discussion. This new model is viable in predicting intentions to exercise and amount of exercise, both in absolute terms and when compared to existing models.
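
The scale reliabilities quoted in this abstract are Cronbach's alpha values. As a minimal sketch of how such a coefficient is computed, the function below applies the standard formula, alpha = k/(k-1) × (1 − sum of item variances / variance of total scores); the function name and the response data are hypothetical and are not taken from the study.

```python
from statistics import pvariance


def cronbach_alpha(item_scores):
    """Cronbach's alpha for one scale.

    item_scores: list of respondents, each a list of k item scores.
    alpha = k / (k - 1) * (1 - sum(item variances) / variance(total score))
    """
    k = len(item_scores[0])
    if k < 2:
        raise ValueError("need at least two items")
    items = list(zip(*item_scores))             # one tuple of scores per item
    totals = [sum(row) for row in item_scores]  # total score per respondent
    item_var_sum = sum(pvariance(col) for col in items)
    total_var = pvariance(totals)
    return k / (k - 1) * (1 - item_var_sum / total_var)


# Hypothetical responses: five respondents, one three-item scale.
responses = [
    [4, 5, 4],
    [2, 3, 2],
    [5, 5, 4],
    [1, 2, 1],
    [3, 3, 3],
]
alpha = cronbach_alpha(responses)
print(round(alpha, 2))
```

Population variance is used for both the item and total-score terms; using sample variance for both instead would give the same ratio, so the choice does not change alpha as long as it is applied consistently.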
