• Title/Summary/Keyword: one class classification

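An Analysis of the Synthetic Classification System of the DDC Literature Class, with a Focus on Edition 20 (DDC문학류의 조합식 분류시스템 분석 - 20판을 중심으로)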

  • 윤희윤
    • Journal of Korean Library and Information Science Society / v.20 / pp.351-381 / 1993
  • The purpose of this study is to analyze the processes and patterns used to build, or synthesize, class numbers in the 800 class of the Dewey Decimal Classification, Edition 20 (1989). The results of the analysis are as follows: 1. The 800 (Literature and rhetoric) class is the main class of the DDC in which the analytico-synthetic principle is actively applied to an enumerative scheme. 2. The facets applied in literature are language; literary form; literary period; kind, scope, or medium; notation 08 (collections) or 09 (criticism); and literary feature, subject, author, etc. 3. The 800 class has five tables of precedence: for literary forms; for aspects; for specific kinds of persons; for literary period in relation to the aspects, in works treating more than one literary form; and for subforms, aspects, and literary periods, in works treating a specific literary form. 4. The basic number synthesis of literary works proceeds through the facets in the following sequence, as far as necessary for the item: base no. + literary form + literary time or period + kind, scope, or medium + notation 08 or 09 + subform + additional notation from T3C and other tables. 5. In view of the multiplicity of facets, the synthesis formulas take the following order: (1) Works about the literature: base no.(schedule) + language(T6) or form(T3B). (2) Works by or about an individual author: base no.(schedule) + form(T3A) + period(schedule) + subform(T3A). (3) Works by or about more than one author, not restricted by the language facet: base no.(schedule) + period(T1); base no.(schedule) + kind, scope, medium(T3B), or feature(T3C), or person(T5). (4) Works by or about more than one author, restricted by the language facet: base no.(schedule) + form(T3B) + period(schedule) + subform(T3B) + notation 08 or 09(T3B); base no.(schedule) + notation 08 or 09(T3B) + 9(T3C) + area notation(T2); base no.(schedule) + form(T3B) + notation 008 or 009(T3B); base no.(schedule) + form(T3B) + kind, scope, medium(T3B) + notation 08 or 09(T3B) + period(schedule). (5) Affiliated literatures for which period numbers are not used: base no.(schedule) + form(T3A or T3B), or notation 08 or 09(T3B); base no.(schedule) + kind, scope, medium(T3B), feature(T3C), or person(T5). 6. The problems in number building for the 800 class are the complexity and difficulty of number synthesis, the intrinsic weakness of form distinction, and the inconvenience of retrieval inherent in a form class. To solve these problems, the citation orders and methods of the DDC should be improved and its synthesis patterns simplified, from the point of view of applicability and usefulness in the "literature class".
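The synthesis sequence in point 4 amounts to concatenating facet notations onto a base number. The following is a minimal sketch of that idea only: the function name `build_number` is hypothetical, and the notation values in the usage lines are illustrative placeholders, not values taken from the study or checked against the DDC schedules.

```python
def build_number(base, *facets):
    """Concatenate a base number and facet notations into a class number.

    Assumption: every notation is a string of digits, the result is padded
    to at least three digits, and a decimal point follows the third digit.
    """
    digits = base + "".join(facets)
    digits = digits.ljust(3, "0")  # e.g. "8" becomes "800"
    if len(digits) > 3:
        return digits[:3] + "." + digits[3:]
    return digits


# Illustrative placeholders: base no. + literary form + period
print(build_number("81", "1", "54"))
# Base number alone, padded to three digits
print(build_number("8"))
```

The citation order is encoded purely by the order of the arguments, which is why a fixed sequence of facets matters for producing consistent numbers.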

Ensemble Learning for Solving Data Imbalance in Bankruptcy Prediction (기업부실 예측 데이터의 불균형 문제 해결을 위한 앙상블 학습)

  • Kim, Myoung-Jong
    • Journal of Intelligence and Information Systems / v.15 no.3 / pp.1-15 / 2009
  • In a classification problem, data imbalance occurs when the instances of one class greatly outnumber those of the other class. Such data sets often skew the decision boundary, so that a default classifier is built and classification accuracy falls. This paper proposes Geometric Mean-based Boosting (GM-Boost) to resolve the problem of data imbalance. Because GM-Boost introduces the notion of the geometric mean, its learning process considers both the majority and minority classes and reinforces learning on misclassified data. An empirical study of bankruptcy prediction for Korean companies shows that GM-Boost achieves higher classification accuracy than previous methods used for imbalanced data, including under-sampling, over-sampling, and AdaBoost, and that its learning performance is robust regardless of the degree of data imbalance.
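To illustrate why a geometric mean helps under imbalance, here is a minimal sketch that compares plain accuracy with the geometric mean of the per-class accuracies (sensitivity and specificity). This is the usual G-mean from the imbalanced-learning literature; the abstract does not spell out how GM-Boost uses it inside boosting, so the functions below are an assumption-level illustration, not the paper's algorithm.

```python
import math


def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, fn, tn, fp) for a binary problem."""
    tp = fn = tn = fp = 0
    for t, p in zip(y_true, y_pred):
        if t == positive:
            if p == positive:
                tp += 1
            else:
                fn += 1
        else:
            if p == positive:
                fp += 1
            else:
                tn += 1
    return tp, fn, tn, fp


def accuracy(y_true, y_pred):
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def geometric_mean(y_true, y_pred, positive=1):
    """Square root of sensitivity times specificity."""
    tp, fn, tn, fp = confusion_counts(y_true, y_pred, positive)
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return math.sqrt(sensitivity * specificity)


# Hypothetical data: 10 bankrupt firms (1) among 1000, and a "default"
# classifier that labels every firm as healthy (0).
y_true = [1] * 10 + [0] * 990
y_pred = [0] * 1000
print(accuracy(y_true, y_pred))        # high despite missing every positive
print(geometric_mean(y_true, y_pred))  # zero, exposing the failure
```

Accuracy rewards the default classifier, while the geometric mean collapses to zero as soon as one class is entirely misclassified.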

Feature Selection of Fuzzy Pattern Classifier by using Fuzzy Mapping (퍼지 매핑을 이용한 퍼지 패턴 분류기의 Feature Selection)

  • Roh, Seok-Beom;Kim, Yong Soo;Ahn, Tae-Chon
    • Journal of the Korean Institute of Intelligent Systems / v.24 no.6 / pp.646-650 / 2014
  • In this paper, we propose a new feature selection method to avoid the deterioration of pattern classification performance that results from the curse of dimensionality. The proposed method is based on the Fuzzy C-Means clustering algorithm, which analyzes the data points and divides them into several clusters, and on the concept of a function with fuzzy numbers. For a function whose independent variables are fuzzy numbers and whose dependent variable is a class label, each fuzzy number should be related to only one class label. Therefore, a good feature is an independent variable of such a function with fuzzy numbers. Under this assumption, we calculate the goodness of each feature for the pattern classification problem. Finally, machine learning data sets are used to evaluate the classification ability of the proposed pattern classifier.

Suggestion for Trophic State Index of Korean Lakes (Upper Layer) (한국 호소 상층부의 영양상태지수 제안)

  • Kong, Dongsoo;Kim, Bomchul
    • Journal of Korean Society on Water Environment / v.35 no.4 / pp.340-351 / 2019
  • In this study, the relationship between trophic state indices was analyzed based on the monthly or weekly water quality data of 81 lakes (mostly man-made) in Korea from 2013 to 2017. Carlson's $TSI_C$ and Aizaki's $TSI_m$ were calculated using the summer (Jun.-Sep.) average data for the upper water layer. The previous Korean trophic state index ($TSI_{KO}$) and the newly suggested index ($TSI_{KON}$) were calculated using the annual average data for the whole layer and for the upper layer, respectively. While previous trophic state indices (TSI) such as Carlson's TSI use a logarithmic function, we devised a new Monod-type $TSI_{KON}$(Chl) that equals 50 when the concentration of chlorophyll ${\alpha}$ ($Chl.{\alpha}$) measured by the UNESCO method is at the half-saturation value of $10{\mu}gL^{-1}$. An MMF-type $TSI_{KON}$(TP) was derived from the relationship between TP and $Chl.{\alpha}$. The comprehensive $TSI_{KON}$ was defined as the larger of the two $TSI_{KON}$ values. The range of previous TSIs for the mesotrophic state was usually 40-50, which seemed too narrow to discriminate the trophic characteristics of the class. The upper limits of $TSI_{KON}$ for the oligotrophic, mesotrophic, and eutrophic states were set to 23, 50, and 75, respectively. Classification by $TSI_C$ and $TSI_m$ showed a higher frequency of the eutrophic class than classification by $TSI_{KO}$ and $TSI_{KON}$. This means that estimation by TSIs developed for foreign natural lakes can distort the classification of the trophic state of Korean lakes. The cause is the decrease in transparency due to non-algal material and the reduced availability of phosphorus for algal growth, particularly in the monsoon period.
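The Monod-type index and the class limits described above can be sketched as follows. The abstract gives only the half-saturation point (index 50 at 10 µg/L) and the upper limits 23, 50, and 75; the maximum index of 100, the inclusive treatment of the limits, and the label "hypertrophic" for values above 75 are assumptions made for this illustration, and all function names are hypothetical.

```python
def tsi_monod(chl, half_saturation=10.0, tsi_max=100.0):
    """Monod-type (saturating) index of chlorophyll concentration in ug/L.

    Assumption: the index saturates at tsi_max = 100, so that it equals 50
    at the half-saturation concentration.
    """
    return tsi_max * chl / (half_saturation + chl)


def comprehensive_tsi(tsi_chl, tsi_tp):
    """The comprehensive index is the larger of the two component indices."""
    return max(tsi_chl, tsi_tp)


def trophic_state(tsi):
    """Classify by the upper limits 23, 50, and 75 (assumed inclusive)."""
    if tsi <= 23:
        return "oligotrophic"
    if tsi <= 50:
        return "mesotrophic"
    if tsi <= 75:
        return "eutrophic"
    return "hypertrophic"  # assumed label for values above 75


tsi_chl = tsi_monod(10.0)  # chlorophyll at the half-saturation value
print(tsi_chl, trophic_state(tsi_chl))
# The TP-based index here is a hypothetical number, not a computed one.
print(trophic_state(comprehensive_tsi(tsi_chl, 62.0)))
```

Unlike a logarithmic index, a saturating form is bounded, which makes fixed class limits on a 0 to 100 scale straightforward to state.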

A Study on the Validation Test for Open Set Face Recognition Method with a Dummy Class (더미 클래스를 가지는 열린 집합 얼굴 인식 방법의 유효성 검증에 대한 연구)

  • Ahn, Jung-Ho;Choi, KwonTaeg
    • Journal of Digital Contents Society / v.18 no.3 / pp.525-534 / 2017
  • An open set recognition method should be used when the classes of the test data are not completely known in the training phase. Such a method therefore requires two processes: classification and a validation test. This kind of research is necessary for the commercialization of face recognition modules, but few domestic research results on it have been published. In this paper, we propose an open set face recognition method that includes two sequential validation phases. In the first phase, we perform classification based on sparse representation with dummy classes. When the test data is classified into a dummy class, we conclude that the data is invalid. If the data is classified into one of the regular training classes, we extract four features for a second validation test and apply them to the proposed decision function. In the experiments, we proposed a simulation method for open set recognition and showed that the proposed validation test outperforms SCI, a well-known validation method.
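The control flow of the two sequential validation phases can be sketched as below. Only the flow follows the abstract. The classifier is a toy 1-D nearest-centroid rule standing in for sparse-representation classification, the second-phase test is a simple distance threshold rather than the paper's four-feature decision function, and all names and numbers are hypothetical.

```python
def open_set_predict(sample, classify, validate, dummy_labels):
    """Return a class label, or None when the sample is rejected as invalid."""
    label = classify(sample)
    # Phase 1: a sample assigned to a dummy class is rejected outright.
    if label in dummy_labels:
        return None
    # Phase 2: a sample assigned to a regular class must pass a second test.
    if not validate(sample, label):
        return None
    return label


# Toy stand-ins (assumptions, not the paper's components).
centroids = {"alice": 0.0, "bob": 10.0, "dummy": 5.0}


def classify(sample):
    return min(centroids, key=lambda name: abs(sample - centroids[name]))


def validate(sample, label, threshold=1.5):
    return abs(sample - centroids[label]) <= threshold


for x in (0.4, 5.2, 2.0):
    print(x, open_set_predict(x, classify, validate, {"dummy"}))
```

In this toy run the first sample is accepted, the second is rejected in phase 1 by the dummy class, and the third is rejected in phase 2 because it lies too far from its assigned class.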

An Application of Artificial Intelligence System for Accuracy Improvement in Classification of Remotely Sensed Images (원격탐사 영상의 분류정확도 향상을 위한 인공지능형 시스템의 적용)

  • 양인태;한성만;박재국
    • Journal of the Korean Society of Surveying, Geodesy, Photogrammetry and Cartography / v.20 no.1 / pp.21-31 / 2002
  • This study applied neural network theory and fuzzy set theory to improve classification accuracy for remotely sensed images. Remotely sensed data have been used to map land cover. The accuracy depends on a range of factors related to the data set and the methods used. Thus, the accuracy of maps derived from conventional supervised image classification techniques is a function of factors related to the training, allocation, and testing stages of the classification. Conventional image classification techniques assume that all the pixels within the image are pure, that is, that each represents an area of homogeneous cover of a single land-cover class. This assumption is often untenable, because pixels of mixed land-cover composition are abundant in an image. Mixed pixels are a major problem in land-cover mapping applications. For each pixel, the strengths of class membership derived in the classification may be related to its land-cover composition. The concept of a pixel having a degree of membership in all classes is fundamental to fuzzy-sets-based techniques. A major problem with the fuzzy-sets and probabilistic methods is that they are slow and computationally demanding. For analyzing large data sets and for rapid processing, alternative techniques are required. One particularly attractive approach is the use of artificial neural networks. These are non-parametric techniques that have been shown to be generally capable of classifying data as accurately as, or more accurately than, conventional classifiers. An artificial neural network, once trained, may classify data extremely rapidly, because the classification process may be reduced to the solution of a large number of extremely simple calculations that can be performed in parallel.

A New Clustering Method for Minimum Classification Error (분류 오류 최소화를 위한 클러스터링 기법)

  • Heo, Gyeong-Yong;Kim, Seong-Hoon
    • Journal of the Korea Society of Computer and Information / v.19 no.7 / pp.1-8 / 2014
  • Clustering is one of the most popular unsupervised learning methods and is widely used to form clusters of homogeneous data. Clustering has been used to extract contexts corresponding to clusters, with a classification method applied to each context or cluster individually. However, it is difficult to say that unsupervised clustering is the best context-forming method from the viewpoint of classification. In this paper, a new clustering method that takes classification into account is proposed. The proposed method tries to minimize the classification error in each cluster when a classification method is applied to each context locally. For this purpose, the proposed method adds constraints that force two data points belonging to the same class to have a small distance, and two data points belonging to different classes to have a large distance, within each cluster, as in linear discriminant analysis. The usefulness of the proposed method is confirmed by experimental results.

Weighted Least Squares Based on Feature Transformation using Distance Computation for Binary Classification (이진 분류를 위하여 거리계산을 이용한 특징 변환 기반의 가중된 최소 자승법)

  • Jang, Se-In;Park, Choong-Shik
    • Journal of the Korea Institute of Information and Communication Engineering / v.24 no.2 / pp.219-224 / 2020
  • Binary classification has been broadly investigated in machine learning. In addition, binary classification can be easily extended to multi-class problems. To successfully use machine learning methods for classification tasks, preprocessing and feature extraction steps are essential; they are important for improving classification performance. In this paper, we propose a new learning method based on weighted least squares. In weighted least squares, the design of the weights plays a significant role. For this reason, we also propose a new technique for obtaining weights that can achieve feature transformation. Based on this weighting technique, we further propose a method that combines the learning and feature extraction processes so that both are performed simultaneously in one step. The proposed method shows promising performance on five UCI machine learning data sets.
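For orientation, weighted least squares for binary classification can be sketched in one dimension as below: fit a weighted line to labels coded as -1 and +1, then classify by the sign of the fitted value. The abstract does not describe how the paper designs its distance-based weights, so the weights here are supplied by the caller and are purely hypothetical, as are the function names and data.

```python
def weighted_least_squares(xs, ys, weights):
    """Fit y = slope * x + intercept by minimizing sum of w * (y - fit)^2."""
    total = sum(weights)
    x_bar = sum(w * x for w, x in zip(weights, xs)) / total
    y_bar = sum(w * y for w, y in zip(weights, ys)) / total
    sxx = sum(w * (x - x_bar) ** 2 for w, x in zip(weights, xs))
    sxy = sum(w * (x - x_bar) * (y - y_bar)
              for w, x, y in zip(weights, xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept


def predict(x, slope, intercept):
    """Classify by the sign of the fitted value."""
    return 1 if slope * x + intercept >= 0 else -1


# Hypothetical 1-D data with labels coded as -1 and +1, uniform weights.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [-1, -1, 1, 1]
slope, intercept = weighted_least_squares(xs, ys, [1.0, 1.0, 1.0, 1.0])
print(slope, intercept)
print([predict(x, slope, intercept) for x in xs])
```

Raising the weight of a sample pulls the fitted line toward it, which is why the choice of weights has such a large effect on the resulting decision boundary.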

Online anomaly detection algorithm based on deep support vector data description using incremental centroid update (점진적 중심 갱신을 이용한 deep support vector data description 기반의 온라인 비정상 탐지 알고리즘)

  • Lee, Kibae;Ko, Guhn Hyeok;Lee, Chong Hyun
    • The Journal of the Acoustical Society of Korea / v.41 no.2 / pp.199-209 / 2022
  • Typical anomaly detection algorithms are trained on prior data. Batch-learning-based algorithms therefore suffer inevitable performance degradation when the characteristics of newly incoming normal data change over time. We propose an online anomaly detection algorithm that can account for gradual changes in the characteristics of incoming normal data. The proposed algorithm, based on a one-class classification model, includes both offline and online learning procedures. In the offline learning procedure, the algorithm learns to map the prior data close to the centroid of the latent space and then updates that centroid incrementally using new incoming data. In the online learning procedure, the algorithm continues learning with the updated centroid. In experiments using public underwater acoustic data, the proposed online anomaly detection algorithm takes only approximately 2 % additional learning time for the incremental centroid update and learning. Nevertheless, the proposed algorithm shows a 19.10 % improvement in Area Under the receiver operating characteristic Curve (AUC) compared to the offline learning model when new normal data arrive.
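An incremental centroid update can be sketched with a running mean, which folds each new latent vector into the centroid without revisiting earlier data. The abstract does not give the paper's exact update rule, so the running mean is an assumption. Scoring by squared distance to the centroid follows the standard deep SVDD formulation, and the latent vectors below are hypothetical values rather than network outputs.

```python
def update_centroid(centroid, count, z):
    """Fold one new latent vector z into the running-mean centroid."""
    new_count = count + 1
    new_centroid = [c + (zi - c) / new_count for c, zi in zip(centroid, z)]
    return new_centroid, new_count


def anomaly_score(centroid, z):
    """Squared Euclidean distance from the latent vector to the centroid."""
    return sum((zi - c) ** 2 for c, zi in zip(centroid, z))


# Hypothetical state after offline learning: centroid of 2 latent vectors.
centroid, count = [0.0, 0.0], 2
# One new normal sample arrives online and shifts the centroid.
centroid, count = update_centroid(centroid, count, [3.0, 3.0])
print(centroid, count)
print(anomaly_score(centroid, [1.0, 2.0]))
```

Because each update costs a single pass over the latent dimensions, the extra work per sample is small, consistent with the low overhead the abstract reports for the incremental update.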

Accuracy of one-step automated orthodontic diagnosis model using a convolutional neural network and lateral cephalogram images with different qualities obtained from nationwide multi-hospitals

  • Yim, Sunjin;Kim, Sungchul;Kim, Inhwan;Park, Jae-Woo;Cho, Jin-Hyoung;Hong, Mihee;Kang, Kyung-Hwa;Kim, Minji;Kim, Su-Jung;Kim, Yoon-Ji;Kim, Young Ho;Lim, Sung-Hoon;Sung, Sang Jin;Kim, Namkug;Baek, Seung-Hak
    • The korean journal of orthodontics / v.52 no.1 / pp.3-19 / 2022
  • Objective: The purpose of this study was to investigate the accuracy of one-step automated orthodontic diagnosis of skeletodental discrepancies using a convolutional neural network (CNN) and lateral cephalogram images of different qualities from multiple hospitals nationwide. Methods: Among 2,174 lateral cephalograms, 1,993 cephalograms from two hospitals were used for the training and internal test sets, and 181 cephalograms from eight other hospitals were used for an external test set. As a gold standard, they were divided into three classification groups for each of anteroposterior skeletal discrepancies (Class I, II, and III), vertical skeletal discrepancies (normodivergent, hypodivergent, and hyperdivergent patterns), and vertical dental discrepancies (normal overbite, deep bite, and open bite). A pre-trained DenseNet-169 was used as the CNN classifier model. Diagnostic performance was evaluated by receiver operating characteristic (ROC) analysis, t-distributed stochastic neighbor embedding (t-SNE), and gradient-weighted class activation mapping (Grad-CAM). Results: In the ROC analysis, the mean area under the curve and the mean accuracy of all classifications were high with both the internal and external test sets (all, > 0.89 and > 0.80). In the t-SNE analysis, our model succeeded in creating good separation between the three classification groups. Grad-CAM figures showed differences in the location and size of the focus areas between the three classification groups in each diagnosis. Conclusions: Since the accuracy of our model was validated with both internal and external test sets, it shows the possible usefulness of a one-step automated orthodontic diagnosis tool using a CNN model. However, it still needs technical improvement in classifying vertical dental discrepancies.