• Title/Summary/Keyword: Hierarchical Classification

Classification of Daily Precipitation Patterns in South Korea using Multivariate Statistical Methods

  • Mika, Janos;Kim, Baek-Jo;Park, Jong-Kil
    • Journal of Environmental Science International / v.15 no.12 / pp.1125-1139 / 2006
  • Cluster analysis of diurnal precipitation patterns is performed using daily precipitation at 59 stations in South Korea from 1973 to 1996, separately for the four seasons of each year. The four seasons are shifted forward by 15 days relative to the conventional ones. The number of clusters is 15 in winter, 16 in spring and autumn, and 26 in summer. In each season, one class is the totally dry day, on which no precipitation is observed at any station; it is treated separately in this study. The distribution of days among the clusters is rather uneven, with low area-mean precipitation occurring most frequently. These 4 (seasons) $\times$ 2 (wet and dry days) classes represent more than half (59%) of all days of the year. On the other hand, even the smallest seasonal clusters have at least $5\sim9$ members in the 24-year (1973-1996) classification period. To account for spatial correlation, the cluster analysis is performed directly on the leading $5\sim8$ uncorrelated coefficients of the diurnal precipitation patterns obtained by factor analysis. More specifically, hierarchical clustering based on Euclidean distance and Ward's method of agglomeration is applied. The relative variance explained by the clustering averages 63%, with better performance in spring (66%) and winter (69%) and below-average performance in autumn (60%) and summer (59%). Applying weighted relative variances, i.e., dividing the squared deviations by the cluster averages, gives even better values (78% on average) compared to the same index without clustering. This means that the highest variance remains in the clusters with more precipitation. Besides all statistics necessary for validating the final classification, 4 cluster centers are mapped for each season to illustrate the range of typical extremes, paired according to their area-mean precipitation or negative pattern correlation. Possible alternatives to the classification performed and the reasons for rejecting them are also discussed, together with a wide spectrum of recommended applications.
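
The clustering step described in this abstract (Ward's agglomeration on Euclidean distance, followed by the share of variance explained) can be illustrated with a minimal pure-Python sketch. The six two-dimensional "factor score" vectors are hypothetical values chosen for illustration, and the function names are assumptions; this is not the authors' code or data.

```python
from itertools import combinations

def centroid(points):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(points)
    return [sum(p[i] for p in points) / n for i in range(len(points[0]))]

def sq_dist(a, b):
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def ward_cost(c1, c2):
    """Increase in within-cluster sum of squares if c1 and c2 are merged."""
    n1, n2 = len(c1), len(c2)
    return n1 * n2 / (n1 + n2) * sq_dist(centroid(c1), centroid(c2))

def ward_clustering(points, n_clusters):
    """Merge the cheapest pair (Ward criterion) until n_clusters remain."""
    clusters = [[p] for p in points]
    while len(clusters) > n_clusters:
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: ward_cost(clusters[ij[0]], clusters[ij[1]]),
        )
        clusters[i] = clusters[i] + clusters[j]  # i < j, so deleting j is safe
        del clusters[j]
    return clusters

def within_ss(cluster):
    """Sum of squared deviations from the cluster centroid."""
    c = centroid(cluster)
    return sum(sq_dist(p, c) for p in cluster)

def explained_variance(points, clusters):
    """Share of the total sum of squares removed by the clustering."""
    return 1.0 - sum(within_ss(c) for c in clusters) / within_ss(points)

# Hypothetical factor scores for six days (two factors each).
days = [[0.1, 0.0], [0.2, 0.1], [0.0, 0.2],
        [3.0, 3.1], [3.2, 2.9], [2.9, 3.0]]
clusters = ward_clustering(days, n_clusters=2)
ratio = explained_variance(days, clusters)
print(sorted(len(c) for c in clusters), round(ratio, 3))
```

The explained-variance ratio here is the unweighted version; the weighted index mentioned in the abstract would additionally divide the squared deviations by the cluster averages.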

Land-Cover Classification of Barton Peninsula around King Sejong station located in the Antarctic using KOMPSAT-2 Satellite Imagery (KOMPSAT-2 위성 영상을 이용한 남극 세종기지 주변 바톤반도의 토지피복분류)

  • Kim, Sang-Il;Kim, Hyun-Cheol;Shin, Jung-Il;Hong, Soon-Gu
    • Korean Journal of Remote Sensing / v.29 no.5 / pp.537-544 / 2013
  • Barton Peninsula, where King Sejong station is located, is mainly covered with snow and vegetation. Because this area is sensitive to climate change, monitoring surface variation is important for understanding climate change in the polar region. Because the area is difficult to access, remote sensing is useful for monitoring it continuously. The objectives of this research are 1) to map land-cover types in the Barton Peninsula around King Sejong station and 2) to identify the distribution of vegetation species in the classified area. A KOMPSAT-2 multispectral satellite image was used to classify land-cover types and vegetation species. We performed the classification with a hierarchical procedure using the KOMPSAT-2 satellite image and ground reference data, and evaluated the accuracy of the result. As a result, vegetation and non-vegetation were clearly separated, although species within the vegetation class showed lower accuracies.

Evaluation of the Homogeneity of Korean Diagnosis Related Groups (한국형진단명기준환자군 분류체계의 동질성 평가)

  • Kim, Hyung Seon;Lee, Sun Hee;Nam, Chung Mo
    • Health Policy and Management / v.23 no.1 / pp.44-51 / 2013
  • Background: This study was designed to evaluate the homogeneity of the Korean diagnosis related group (KDRG) version 3.4 classification system. Methods: A total of 5,921,873 claims submitted to the Health Insurance Review and Assessment Service during 2010 were used. Both the coefficient of variation (CV) and the reduction in variance of cost were measured. The analysis was performed before and after trimming outliers at three levels: adjacent DRG (ADRG), aged ADRG (AADRG), which splits ADRGs by age, and DRG, which splits further by complication and comorbidity. Results: At the ADRG, AADRG, and DRG levels, 38.9%, 38.7%, and 30.0% of groups, respectively, had a CV > 100% in the untrimmed data, against 1.4%, 1.4%, and 1.9% in the trimmed data. Before trimming outliers, ADRGs explained 52.5% of the variability in resource use, AADRGs 53.1%, and DRGs 57.1%. The additional explanatory power of the age split and the complication and comorbidity (CC) split was 0.6%p and 4.6%p, respectively, both statistically significant. After trimming outliers, ADRGs explained 75.2% of the variability in resource use, AADRGs 75.6%, and DRGs 77.1%. The additional explanatory power was 0.4%p and 2.0%p, respectively, again statistically significant. Conclusion: The results demonstrated that KDRG showed high within-group homogeneity and performance after trimming outliers. However, some DRGs still had a CV > 100% after the age or CC split, and the factor contributing most to the high performance of KDRG was the ADRG rather than the age or CC split. Therefore, efforts to improve the clinical homogeneity of KDRG, such as reviewing the hierarchical structure of the classification system and the classification variables, are recommended.
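
The two homogeneity measures named in this abstract can be sketched as follows. The cost figures and group labels are hypothetical, and the use of the population standard deviation for the CV is an assumption, since the abstract does not say which estimator was used.

```python
from statistics import mean, pstdev

def coefficient_of_variation(costs):
    """CV in percent: standard deviation relative to the mean.
    Assumption: population standard deviation (pstdev)."""
    return 100.0 * pstdev(costs) / mean(costs)

def reduction_in_variance(groups):
    """Percent of total cost variability explained by the grouping:
    100 * (1 - within-group SS / total SS)."""
    all_costs = [c for costs in groups.values() for c in costs]
    grand_mean = mean(all_costs)
    total_ss = sum((c - grand_mean) ** 2 for c in all_costs)
    within_ss = sum(
        sum((c - mean(costs)) ** 2 for c in costs)
        for costs in groups.values()
    )
    return 100.0 * (1.0 - within_ss / total_ss)

# Hypothetical claim costs for two made-up groups.
claims = {
    "group_A": [100.0, 110.0, 90.0],
    "group_B": [300.0, 310.0, 290.0],
}
cv_a = coefficient_of_variation(claims["group_A"])
riv = reduction_in_variance(claims)
print(round(cv_a, 2), round(riv, 2))
```

A group with CV above 100% would count as heterogeneous under the threshold used in the abstract; comparing `reduction_in_variance` across nested groupings gives the additional explanatory power in percentage points.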

Improving SVM Classification by Constructing Ensemble (앙상블 구성을 이용한 SVM 분류성능의 향상)

  • 제홍모;방승양
    • Journal of KIISE:Software and Applications / v.30 no.3_4 / pp.251-258 / 2003
  • A support vector machine (SVM) is expected to provide good generalization performance, but the performance of an actually implemented SVM is often far below the theoretically expected level. This is largely because implementations rely on approximate algorithms, owing to the high time and space complexity. To overcome this limitation, we propose ensembles of SVMs built by Bagging (bootstrap aggregating) and Boosting. In Bagging, each individual SVM is trained independently on training samples chosen randomly via a bootstrap technique. In Boosting, each individual SVM is trained on samples chosen according to a probability distribution over the training set; the distribution is updated according to the errors of the individual classifiers, and the process is iterated. After training, the SVMs are aggregated to make a collective decision in one of several ways, such as majority voting, LSE (least squares estimation)-based weighting, and double-layer hierarchical combining. Simulation results for IRIS data classification, handwritten digit recognition, and face detection show that the proposed SVM ensembles greatly outperform a single SVM in terms of classification accuracy.
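
The Bagging stage with majority voting can be sketched in a few lines. To stay within the standard library, the base learner below is a nearest-centroid classifier standing in for the SVM; that substitution, the toy data, and all function names are assumptions made for illustration, not the authors' implementation.

```python
import random
from collections import Counter

def train_nearest_centroid(samples):
    """Stand-in base learner (NOT an SVM): one centroid per class label."""
    sums, counts = {}, Counter()
    for x, label in samples:
        acc = sums.setdefault(label, [0.0] * len(x))
        for i, v in enumerate(x):
            acc[i] += v
        counts[label] += 1
    centroids = {lab: [v / counts[lab] for v in acc] for lab, acc in sums.items()}

    def predict(x):
        # Return the label whose centroid is closest to x.
        return min(
            centroids,
            key=lambda lab: sum((a - b) ** 2 for a, b in zip(x, centroids[lab])),
        )
    return predict

def bagging(samples, n_models, rng):
    """Train each model on a bootstrap sample drawn with replacement."""
    models = []
    for _ in range(n_models):
        bootstrap = [rng.choice(samples) for _ in samples]
        models.append(train_nearest_centroid(bootstrap))
    return models

def majority_vote(models, x):
    """Aggregate the individual predictions by simple majority."""
    votes = Counter(model(x) for model in models)
    return votes.most_common(1)[0][0]

# Hypothetical, well-separated two-class training data.
train = [((0.0, 0.1), "A"), ((0.2, 0.0), "A"), ((0.1, 0.3), "A"),
         ((0.3, 0.2), "A"), ((0.1, 0.1), "A"),
         ((5.0, 5.1), "B"), ((5.2, 4.9), "B"), ((4.9, 5.0), "B"),
         ((5.1, 5.2), "B"), ((5.0, 4.8), "B")]
ensemble = bagging(train, n_models=11, rng=random.Random(0))
print(majority_vote(ensemble, (0.4, 0.3)), majority_vote(ensemble, (4.7, 4.9)))
```

An odd number of models avoids ties in a two-class vote. The LSE-based weighting and double-layer hierarchical combining mentioned in the abstract would replace `majority_vote` with a learned combiner.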

Classification of Terrestrial LiDAR Data Using Factor and Cluster Analysis (요인 및 군집분석을 이용한 지상 라이다 자료의 분류)

  • Choi, Seung-Pil;Cho, Ji-Hyun;Kim, Yeol;Kim, Jun-Seong
    • Journal of Korean Society for Geospatial Information Science / v.19 no.4 / pp.139-144 / 2011
  • This study proposes a method for classifying LiDAR data that simultaneously uses the color information (R, G, B) and reflection intensity information (I) obtained from terrestrial LiDAR and analyzes the association between these data with statistical classification methods. To this end, the factors that maximize variance were first calculated from the variables R, G, B, and I, yielding the factor matrix between the principal factors and each variable. Although the factor matrix summarizes the data by reducing them, it does not show clearly which variables are strongly associated with which factors; therefore, the Varimax method of orthogonal rotation was used to obtain the rotated factor matrix, from which the factor scores were calculated. A cluster analysis was then performed on these factor scores using the K-means method, a non-hierarchical clustering method, and the classification accuracy of the terrestrial LiDAR data was evaluated.
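
The final step of this pipeline, K-means clustering of factor scores, can be sketched as below. The factor analysis and Varimax rotation are not reproduced; the six two-factor score vectors and the choice of initial centroids are hypothetical, and the function names are assumptions.

```python
def sq_dist(a, b):
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def mean_point(points):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(len(points[0])))

def kmeans(points, centroids, n_iter=20):
    """Plain K-means: assign each point to its nearest centroid, then
    recompute centroids, until they stop moving or n_iter is reached."""
    centroids = [tuple(c) for c in centroids]
    labels = []
    for _ in range(n_iter):
        labels = [
            min(range(len(centroids)), key=lambda k: sq_dist(p, centroids[k]))
            for p in points
        ]
        updated = []
        for k in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            # Keep the old centroid if a cluster ends up empty.
            updated.append(mean_point(members) if members else centroids[k])
        if updated == centroids:
            break
        centroids = updated
    return labels, centroids

# Hypothetical factor scores (two rotated factors per LiDAR point).
scores = [(-1.0, 0.2), (-0.9, 0.1), (-1.1, 0.0),
          (1.0, -0.1), (0.9, 0.2), (1.2, 0.0)]
labels, centers = kmeans(scores, centroids=[scores[0], scores[3]])
print(labels)
```

Passing the initial centroids explicitly keeps the run deterministic; in practice they are usually drawn at random and the algorithm is restarted several times.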

Construction of Hierarchical Classification of User Tags using WordNet-based Formal Concept Analysis (WordNet기반의 형식개념분석기법을 이용한 사용자태그 분류체계의 구축)

  • Hwang, Suk-Hyung
    • Journal of the Korea Society of Computer and Information / v.18 no.10 / pp.149-161 / 2013
  • In this paper, we propose a novel approach to constructing classification hierarchies for user tags of folksonomies, using a WordNet-based Formal Concept Analysis tool called TagLighter, which was developed in this research. To demonstrate the usefulness of this approach in practice, we describe experiments on user tag data from the Bibsonomy.org site. The classification hierarchies of user tags constructed by our approach provide better understanding of and insight into tagged data during information retrieval and data analysis on folksonomy-based systems. We expect that the proposed approach can be used in web data mining for folksonomy-based web services, social networking systems, and semantic web applications.
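
Formal Concept Analysis, the technique underlying this approach, derives a hierarchy from a table of objects and their attributes. The sketch below enumerates the formal concepts of a tiny context by brute force. The tags and their attribute sets are made up for illustration (they are not WordNet output), and nothing here reflects how TagLighter itself is implemented.

```python
from itertools import combinations

def extent(context, attrs):
    """All objects that have every attribute in attrs."""
    return frozenset(obj for obj, has in context.items() if attrs <= has)

def intent(context, objs, all_attrs):
    """All attributes shared by every object in objs
    (by convention, every attribute when objs is empty)."""
    shared = set(all_attrs)
    for obj in objs:
        shared &= context[obj]
    return frozenset(shared)

def formal_concepts(context):
    """Brute-force enumeration: close every attribute subset into an
    (extent, intent) pair. Exponential, so only suitable for toy contexts."""
    all_attrs = set().union(*context.values())
    concepts = set()
    for size in range(len(all_attrs) + 1):
        for attrs in combinations(sorted(all_attrs), size):
            ext = extent(context, set(attrs))
            concepts.add((ext, intent(context, ext, all_attrs)))
    return concepts

# Hypothetical tags with made-up hypernym-style attributes.
tag_context = {
    "dog": {"entity", "animal"},
    "cat": {"entity", "animal"},
    "java": {"entity", "software"},
}
concepts = formal_concepts(tag_context)
# Larger extents are more general, so sorting by extent size lists
# the hierarchy from the top concept down.
for ext, itt in sorted(concepts, key=lambda c: -len(c[0])):
    print(sorted(ext), sorted(itt))
```

Ordering the concepts by extent inclusion yields the concept lattice, which is the kind of structure from which a classification hierarchy of tags can be read off.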

A Study on Urban Flower Landscape Type Classification - Focused on Literature and Expert FGI - (도시 화훼경관 유형화에 관한 연구 - 문헌 및 전문가 FGI를 중심으로 -)

  • Yoon, Duck-Kyu;Kim, Gun-Woo
    • Journal of the Korean Institute of Landscape Architecture / v.48 no.5 / pp.42-58 / 2020
  • The purpose of this study is to classify types of urban flower landscape. First, a literature and case review found that four elements (place, form, natural, and artificial elements) should be included in the sentence and key expressions that define the concept of flower landscape. Considering these four elements, a newly reconstructed concept of flower landscape was presented. This is expected to be the basis for an integrated theory of flower landscape. Second, flower landscape was defined as a genre and a unit of urban landscape. In addition, to build a system of flower landscape as a specialized area, the concept, characteristics, and functions of the broad category of urban landscape were considered, and its hierarchical categories, including flower landscape, were newly arranged. Thus, flower landscape was positioned as a type of urban landscape. Third, to provide consumers with rational selection materials through type classification, related theories were investigated not only in the flower field but also in urban planning and urban ecology. Forty-one elements for the type classification were extracted, and 4 core elements were derived through a clustering process. With these 4 elements as the classification criteria, and through verification of opinions in the FGI with experts, 9 middle-classification types and 30 small-classification types were derived. As a suggestion for follow-up research, if valid types are additionally established through monitoring of the type application process, and more specific application types are developed by expanding the second-level classification hierarchy to a third level, the system of types can be further improved.

Reinforcement Method for Automated Text Classification using Post-processing and Training with Definition Criteria (학습방법개선과 후처리 분석을 이용한 자동문서분류의 성능향상 방법)

  • Choi, Yun-Jeong;Park, Seung-Soo
    • The KIPS Transactions:PartB / v.12B no.7 s.103 / pp.811-822 / 2005
  • Automated text categorization classifies free-text documents into predefined categories automatically; its main goal is to reduce the considerable manual work the task requires. In recent years, research on improving text categorization performance (efficiency) has focused on enhancing existing classification models and algorithms themselves, but its scope has been limited by feature-based statistical methodology. In this paper, we propose the RTPost system, which differs in style from any traditional method and takes a fault-tolerant system approach and a data mining strategy. The two important parts of the RTPost system are reinforcement training and post-processing. First, the training method addresses the problem of defining the categories to be classified before selecting training sample documents. Second, the post-processing method addresses the problem of assigning categories, not the performance of classification algorithms. In experiments, we applied our system to documents with low classification accuracy that lie near a decision boundary. The experiments show that our system has high accuracy and stability under actual conditions. It did not depend at all on variables that strongly influence classification power, such as the number of training documents, the selection problem, and the performance of classification algorithms. In addition, we can expect a self-learning effect that decreases the training cost and increases the training power by exploiting the advantages of active learning.

Characteristics and Trends in the Classifications of Scientific Literacy Definitions (과학적 소양의 정의 분류의 특성 및 경향)

  • Lee, Myeongje
    • Journal of The Korean Association For Science Education / v.34 no.2 / pp.55-62 / 2014
  • This study reclassifies the classifications or definitions of scientific literacy in scientific literacy research since the 1960s and identifies trends in the classification of scientific literacy definitions. Sixteen articles were selected from among the articles introduced in the two articles. The classification criteria are as follows: 1) "be learned," "competence," or "be able to function in society" as meanings of "literate," 2) "terms" or "description" as the ways of representing scientific literacy, 3) "singular structure," "hierarchical structure," or "parallel structure" as the inner structure of scientific literacy definitions. The results are as follows. First, hierarchical structures in scientific literacy have almost always accompanied "terms" representing scientific literacy and have accepted the hierarchy between "be learned" and "competence," but not the definition of scientific literacy as functioning in society. All parallel structures have accompanied the definition as functioning in society. Singular structures almost always appear in studies based on relatively recent views of scientific literacy. Second, the number of studies that use "terms" to represent scientific literacy has increased. These results indicate that the meaning of scientific literacy now emphasizes the ability to play a role in a social context as well as learning and competence. To respond actively to this shift, the science education community should move beyond the traditional teaching and learning of scientific concepts and emphasize application in various contexts and the social role of science learners.

Classification of Bodytype on Adult Male for the Apparel Sizing System (I) - Bodytype of Trunk from the Anthropometric Data - (남성복(男性服)의 치수규격을 위한 체형분류(I) - 직접계측자료에 의한 동체부의 분류 -)

  • Kim, Ku Ja;Lee, Soon Weon
    • Journal of the Korean Society of Clothing and Textiles / v.17 no.2 / pp.281-289 / 1993
  • Comfort and fit have become a major concern in the basic function of ready-made clothes. Accordingly, a more sophisticated classification of human morphological characteristics is strongly required for effective clothing construction. This research was performed to classify and characterize Korean adult males anthropometrically. The sample comprised 1290 subjects aged 19 to 54 years, selected by stratified sampling. Data were collected by direct anthropometric measurement. A total of 75 variables were used to classify the bodytypes. Data were analyzed by multivariate methods, in particular factor and cluster analysis. Items with high factor loadings extracted by factor analysis were used to determine the variables for the cluster analysis of similar bodytypes. For the trunk, 19 variables were used to classify bodytypes by Ward's minimum variance method. The groups forming clusters were subdivided into 5 sets by cross-tabulation of the results of the hierarchical cluster analysis. Types 3 and 4 of the trunk together accounted for the majority (55.6%) of the subjects. Korean adult males had relatively well-balanced trunk bodytypes.
