• Title/Summary/Keyword: Factor Classification


Dimension reduction for high-dimensional data via mixtures of common factor analyzers: an application to tumor classification

  • Baek, Jang-Sun
    • Journal of the Korean Data and Information Science Society / v.19 no.3 / pp.751-759 / 2008
  • Mixtures of factor analyzers (MFA) are useful for modeling the distribution of high-dimensional data on a much lower-dimensional space where the number of observations is very large relative to their dimension. Mixtures of common factor analyzers (MCFA) can further reduce the number of parameters in the specification of the component covariance matrices when the number of classes is not small. Moreover, the factor scores of MCFA can be displayed in a low-dimensional space to distinguish the groups. We propose the factor scores of MCFA as new low-dimensional features for the classification of high-dimensional data. Compared with conventional dimension reduction methods such as principal component analysis (PCA) and canonical covariates (CV), the proposed factor scores showed higher correct classification rates on three real data sets when used in parametric and nonparametric classifiers.
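The parameter saving mentioned in this abstract can be illustrated with a rough count. The sketch below assumes the usual formulations from the MFA/MCFA literature, which the abstract itself does not state: in MFA each component covariance is Σ_i = B_i B_i' + D_i with its own p×q loading matrix, while in MCFA it is Σ_i = A Ω_i A' + D with one shared loading matrix A and one shared D. The counts are raw, before any identifiability constraints, and the values of p, q, and g are hypothetical.

```python
def mfa_cov_params(p, q, g):
    """Raw covariance parameters for MFA: every component has its own
    p x q loadings and p uniquenesses (assumed structure, see lead-in)."""
    return g * (p * q + p)

def mcfa_cov_params(p, q, g):
    """Raw covariance parameters for MCFA: shared p x q loadings, shared
    p uniquenesses, and one q x q symmetric factor covariance per component."""
    return p * q + p + g * q * (q + 1) // 2

# Hypothetical sizes: 1000 variables, 3 factors, 5 classes.
p, q, g = 1000, 3, 5
print("MFA :", mfa_cov_params(p, q, g))
print("MCFA:", mcfa_cov_params(p, q, g))
```

Under these assumptions the MFA count grows with g times p, whereas the MCFA count grows with g only through the small q×q matrices, which is why the saving matters most when the number of classes is not small.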


A Study on the Classification Criteria Between Urban and Rural Area (도시와 농촌 지역 구분 기준 연구)

  • Kang, Dae-Koo
    • Journal of Agricultural Extension & Community Development / v.16 no.3 / pp.557-586 / 2009
  • The objective is to find classification criteria that distinguish urban from rural areas and to classify all regions of Korea as urban or rural. To this end, related literature and statistical yearbooks were reviewed to identify criteria and analyze data. Indicators of rurality and urbanity were selected from the literature review, and the corresponding data were gathered from statistical yearbooks. Factor analysis was then used to find the first and second factors for classifying regions. Six factors were found: city surroundings (36%), ratio of non-farm household population (28.1%), cultivated acreage (12.48%), agricultural production surroundings (12.40%), change in the number of farm families (5.58%), and rise and fall in the number of households (5.54%). The rurality factors were cultivated acreage, agricultural production surroundings, change in the number of farm families, and rise and fall in the number of households; the urbanity factors were city surroundings and the ratio of non-farm household population. Based on the loadings of the first and second factors, regions were classified into four types.


A Study on the Classification of Neck-Base Circumference by Three-Dimensional Automatic Measurements of the Human Body - With the Focus on Women in their 20's - (3차원 인체 형상 데이터를 이용한 목밑둘레 유형화 연구 - 20대 여성을 중심으로 -)

  • Cho, Shin-Hyun;Seok, Hye-Jung
    • Journal of the Korean Society of Costume / v.58 no.6 / pp.35-41 / 2008
  • The purpose of this study was to analyze and classify the neck-base circumference shapes of women in their twenties using three-dimensional automatic measurement data of the human body, and thereby to understand the neck-base circumference shapes by type. The findings are as follows: 1. A comparison of the three-dimensional body measurement items relating to the neck-base circumference of women in their twenties indicated that individual differences were larger in the cervicale-center-anterior neck radius than in any other item. 2. The factor analysis, conducted to extract the factors constituting the neck-base circumference, yielded the shape of the cervicale (factor 1), the shape of the section neck (factor 2), the thickness of the neck (factor 3), the shape of the anterior neck (factor 4), and the shape of the side neck (factor 5). 3. The classification of the neck-base circumference shapes resulted in three types. Type 1 was the shape of a reverse triangle hanging forward, Type 2 was that of a circle, and Type 3 was that of an oval open to the sides.

A credit classification method based on generalized additive models using factor scores of mixtures of common factor analyzers (공통요인분석자혼합모형의 요인점수를 이용한 일반화가법모형 기반 신용평가)

  • Lim, Su-Yeol;Baek, Jang-Sun
    • Journal of the Korean Data and Information Science Society / v.23 no.2 / pp.235-245 / 2012
  • Logistic discrimination is a useful statistical technique for quantitative analysis in the financial service industry. In particular, it is easy to implement and achieves a good classification rate. The generalized additive model is useful for credit scoring because it shares the advantages of logistic discrimination and can also account for nonlinear effects of the explanatory variables. However, it may need too many additive terms when the number of explanatory variables is very large, and dependencies may exist among the variables. Mixtures of factor analyzers can be used for dimension reduction of high-dimensional features. This study proposes using the low-dimensional factor scores of mixtures of factor analyzers as new features in the generalized additive model. Its application is demonstrated in the classification of some real credit scoring data. A comparison of the correct classification rates of competing techniques shows the superiority of the generalized additive model using factor scores.

Factor-analysis based questionnaire categorization method for reliability improvement of evaluation of working conditions in construction enterprises

  • Lin, Jeng-Wen;Shen, Pu Fun
    • Structural Engineering and Mechanics / v.51 no.6 / pp.973-988 / 2014
  • This paper presents a factor-analysis based questionnaire categorization method that improves the reliability of the evaluation of working conditions in both Taiwanese and Chinese construction enterprises, without affecting the completeness of the questionnaire, for structural engineering applications. The proposed approach springs from AI applications and expert systems in structural engineering. Questions with a similar response pattern are grouped into, or categorized as, one factor. Questions that form a single factor usually have higher reliability than the entire questionnaire, especially when the questionnaire is complex and inconsistent. Reliability could be increased by classifying questions based on the meanings of the words used in them and on the response scores. The principle for classification was that 90% of the questions in the same classified group must satisfy the proposed classification rule; in the results, the lowest rate was 92%. The results show that the question classification method could improve the reliability of the questionnaires for at least 0.7. Compared with the question deletion method using SPSS, 75% of the remaining questions matched the results obtained by applying the classification method.
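The claim that a group of similarly answered questions is more reliable than the whole questionnaire can be sketched numerically. The abstract does not name the reliability statistic; the sketch below assumes Cronbach's alpha, a common choice in SPSS reliability analysis, and uses made-up responses purely for illustration.

```python
from statistics import pvariance

def cronbach_alpha(items):
    """Cronbach's alpha for a list of questions.

    items: one list of scores per question, all of the same length
    (one score per respondent).
    """
    k = len(items)
    if k < 2:
        raise ValueError("need at least two questions")
    # Total score of each respondent across the k questions.
    totals = [sum(scores) for scores in zip(*items)]
    item_var_sum = sum(pvariance(q) for q in items)
    return k / (k - 1) * (1 - item_var_sum / pvariance(totals))

# Hypothetical 5-point responses from six respondents (not from the paper).
q1 = [1, 2, 3, 4, 5, 3]
q2 = [2, 2, 3, 5, 5, 3]
q3 = [1, 3, 3, 4, 4, 2]
q4 = [5, 1, 4, 2, 3, 3]  # does not follow the response pattern of q1..q3

alpha_all = cronbach_alpha([q1, q2, q3, q4])    # entire questionnaire
alpha_group = cronbach_alpha([q1, q2, q3])      # questions grouped as one factor
print(round(alpha_all, 3), round(alpha_group, 3))
```

In this toy data the three questions that share a response pattern give a higher alpha than the full set of four, which mirrors the grouping idea described in the abstract.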

Review of Classification Models for Reliability Distributions from the Perspective of Practical Implementation (실무적 적용 관점에서 신뢰성 분포의 유형화 모형의 고찰)

  • Choi, Sung-Woon
    • Journal of the Korea Safety Management & Science / v.13 no.1 / pp.195-202 / 2011
  • The study interprets three classification models based on the Bath-Tub Failure Rate (BTFR), the Extreme Value Distribution (EVD), and the Conjugate Bayesian Distribution (CBD). The BTFR-based classification model is analyzed by three failure patterns (decreasing, constant, or increasing), which support systematic management strategies for reliability over time. The BTFR-based distribution model is identified using an individual factor for each of three corresponding cases. When the shape parameter is used, the distribution is analyzed with the factor of component or part number. When the scale parameter is used, it is analyzed with the factor of time precision. When the location parameter is used, it is analyzed with the factor of guarantee time. The EVD-based classification model is sorted into long-tailed, medium-tailed, and short-tailed distributions by the length of the right tail, and depends on the asymptotic reliability property, which reflects the skewness and kurtosis of the distribution curve. Furthermore, the CBD-based classification model relies on the conjugate relations among the prior, likelihood, and posterior functions for dimension reduction and tractability in Bayesian posterior updating.
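The three bathtub failure patterns can be illustrated with a failure-rate function. The abstract names only shape, scale, and location parameters; the sketch below assumes a three-parameter Weibull distribution as the underlying model, which is an assumption made for illustration and not a statement of the paper's method.

```python
def weibull_hazard(t, shape, scale=1.0, location=0.0):
    """Failure rate h(t) of a three-parameter Weibull distribution (assumed model)."""
    if t <= location:
        return 0.0  # no failures before the location (guarantee) time
    return (shape / scale) * ((t - location) / scale) ** (shape - 1)

def failure_pattern(shape):
    """Map the shape parameter to the bathtub-curve phase it represents."""
    if shape < 1:
        return "decreasing"  # early failures
    if shape == 1:
        return "constant"    # random failures (exponential case)
    return "increasing"      # wear-out failures

for beta in (0.5, 1.0, 2.0):
    rates = [weibull_hazard(t, beta) for t in (1.0, 2.0, 3.0)]
    print(beta, failure_pattern(beta), [round(r, 3) for r in rates])
```

Here the shape parameter selects the failure pattern, the scale parameter stretches the time axis, and the location parameter shifts the start of failures, which corresponds to the guarantee time mentioned in the abstract.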

A Study on Road Characteristic Classification using Exploratory Factor Analysis (탐색적 요인분석을 이용한 도로특성분류에 관한 연구)

  • Cho, Jun-Han;Kim, Seong-Ho;Rho, Jeong-Hyun
    • Journal of Korean Society of Transportation / v.26 no.3 / pp.53-66 / 2008
  • This research establishes a conceptual framework that supports road characteristic classification (RCC) from a new point of view, in order to complement the existing road functional classification and to examine traffic patterns. The RCC is expected to serve as an important performance criterion for producing policy guidelines for transportation planning and operational management. The traffic data came from the permanent traffic counters (PTCs) located on national highways between 2002 and 2006. The research provides a systematic review and assessment of how exploratory factor analysis should be applied to 12 explanatory variables. The optimal numbers of components and clusters are determined by interpreting the factor analysis results. The scenario including all 12 explanatory variables performs better than the other scenarios, and four components is the optimal number of factors. This research contributes to the understanding of exploratory factor analysis for road characteristic classification and provides objective input data for other analysis methods, such as cluster analysis, regression analysis, and discriminant analysis.

The Effect of Motor Ability in Children with Cerebral Palsy on Mastery Motivation (뇌성마비 아동의 신체기능이 완수동기에 미치는 영향)

  • Lee, Na-Jung;Oh, Tae-Young
    • The Journal of Korean Physical Therapy / v.26 no.5 / pp.315-323 / 2014
  • Purpose: This study investigated the effect of motor ability on mastery motivation in children with cerebral palsy. Methods: Sixty children with cerebral palsy (5~12 years) and their parents participated in the study. Data on general characteristics and disability condition, the Gross Motor Functional Classification System, the Manual Ability Classification System, and The Dimensions of Mastery questionnaire were collected. Independent t-tests and ANOVA were used to analyze differences in The Dimensions of Mastery questionnaire scores according to general and disability condition, Gross Motor Functional Classification System, and Manual Ability Classification System. Linear regression analysis was performed to determine the effects of the Gross Motor Functional Classification System and the Manual Ability Classification System on The Dimensions of Mastery questionnaire. SPSS for Windows 22.0 was used, Tukey's test was used for post hoc analysis, and the level of statistical significance was set at 0.05. Results: The Dimensions of Mastery questionnaire score differed significantly according to gender, region, type, disability rating, Gross Motor Functional Classification System, and Manual Ability Classification System (p<0.05). The Gross Motor Functional Classification System and the Manual Ability Classification System had a significant effect on The Dimensions of Mastery questionnaire (p<0.05). Conclusion: These results suggest that the motor ability of children with cerebral palsy is an important factor affecting The Dimensions of Mastery questionnaire.

Classification of Terrestrial LiDAR Data Using Factor and Cluster Analysis (요인 및 군집분석을 이용한 지상 라이다 자료의 분류)

  • Choi, Seung-Pil;Cho, Ji-Hyun;Kim, Yeol;Kim, Jun-Seong
    • Journal of Korean Society for Geospatial Information Science / v.19 no.4 / pp.139-144 / 2011
  • This study proposed a classification method for LiDAR data that simultaneously uses the color information (R, G, B) and the reflection intensity information (I) obtained from terrestrial LiDAR and analyzes the association between these data with statistical classification methods. To this end, the factors that maximize variance were first calculated from the variables R, G, B, and I, yielding the factor matrix between the principal factors and each variable. Although this factor matrix summarizes the data in reduced form, it does not show clearly which variables are strongly associated with which factors; therefore, the Varimax method of orthogonal rotation was used to obtain the rotated factor matrix, and the factor scores were then calculated. A cluster analysis using the K-means method, a non-hierarchical clustering method, was then performed on the factor scores obtained from the factor analysis, and afterwards the classification accuracy of the terrestrial LiDAR data was evaluated.
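The final step of this pipeline, clustering factor scores with K-means, can be sketched as follows. This is a plain Lloyd's-algorithm illustration rather than the authors' implementation; the factor extraction and Varimax rotation are not shown, and the two-factor scores and initial centers are hypothetical values, not LiDAR data.

```python
def kmeans(points, centroids, max_iter=100):
    """Plain Lloyd's k-means; `centroids` are the initial cluster centers."""
    centroids = [tuple(c) for c in centroids]
    labels = []
    for _ in range(max_iter):
        # Assignment step: nearest centroid by squared Euclidean distance.
        labels = [
            min(range(len(centroids)),
                key=lambda j: sum((a - b) ** 2 for a, b in zip(p, centroids[j])))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        new_centroids = []
        for j, old in enumerate(centroids):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centroids.append(
                    tuple(sum(x) / len(members) for x in zip(*members)))
            else:
                new_centroids.append(old)  # keep an empty cluster's center
        if new_centroids == centroids:
            break  # converged: no centroid moved
        centroids = new_centroids
    return labels, centroids

# Hypothetical factor scores (factor 1, factor 2) for six scan points.
scores = [(-1.2, 0.1), (-0.9, -0.2), (-1.1, 0.3),
          (1.0, 0.9), (1.3, 1.1), (0.8, 1.2)]
labels, centers = kmeans(scores, centroids=[scores[0], scores[3]])
print(labels)
print(centers)
```

Classification accuracy could then be evaluated by comparing the cluster labels with reference classes for the same points.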

Method for Assessing Landslide Susceptibility Using SMOTE and Classification Algorithms (SMOTE와 분류 기법을 활용한 산사태 위험 지역 결정 방법)

  • Yoon, Hyung-Koo
    • Journal of the Korean Geotechnical Society / v.39 no.6 / pp.5-12 / 2023
  • Proactive assessment of landslide susceptibility is necessary for minimizing casualties. This study proposes a methodology for classifying the landslide safety factor using classification algorithms based on machine learning techniques. The high-risk area model is adopted to perform the classification, and eight geotechnical parameters are used as inputs. Four classification algorithms, namely decision tree, k-nearest neighbor, logistic regression, and random forest, are employed to compare classification accuracy for safety factors ranging between 1.2 and 2.0. Notably, a high accuracy is demonstrated in the safety factor range of 1.2~1.7, but a relatively low accuracy is obtained in the range of 1.8~2.0. To overcome this issue, the synthetic minority over-sampling technique (SMOTE) is adopted to generate additional data. The application of SMOTE improves the average accuracy by ~250% in the safety factor range of 1.8~2.0. The results demonstrate that the SMOTE algorithm improves the accuracy of classification algorithms when applied to geotechnical data.
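The core idea of SMOTE, generating synthetic samples by interpolating between a minority-class sample and one of its nearest minority-class neighbors, can be sketched in a few lines. This is a simplified illustration and not the study's code; the two feature names and all sample values are hypothetical, and the abstract does not list the eight geotechnical parameters actually used.

```python
import random

def smote(minority, n_new, k=2, seed=0):
    """Create n_new synthetic samples by interpolating between a randomly
    chosen minority sample and one of its k nearest minority neighbors."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbors of `base`, excluding base itself.
        neighbors = sorted(
            (p for p in minority if p is not base),
            key=lambda p: sum((a - b) ** 2 for a, b in zip(p, base)),
        )[:k]
        neighbor = rng.choice(neighbors)
        gap = rng.random()  # position along the segment base -> neighbor
        synthetic.append(
            tuple(a + gap * (b - a) for a, b in zip(base, neighbor)))
    return synthetic

# Hypothetical minority samples, e.g. (cohesion, friction angle) for the
# presumably under-represented high safety factor cases.
minority = [(10.0, 30.0), (12.0, 32.0), (11.0, 35.0), (14.0, 33.0)]
new_points = smote(minority, n_new=5)
print(new_points)
```

Because every synthetic point lies on a segment between two existing minority samples, the added data stays inside the region already covered by that class. The enlarged training set would then be passed to the classifiers being compared.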