• Title/Summary/Keyword: Supervised Classification

Semantic-based Genetic Algorithm for Feature Selection (의미 기반 유전 알고리즘을 사용한 특징 선택)

  • Kim, Jung-Ho;In, Joo-Ho;Chae, Soo-Hoan
    • Journal of Internet Computing and Services / v.13 no.4 / pp.1-10 / 2012
  • This paper proposes an optimal feature selection method that considers the semantics of features, as a preprocessing step for document classification. Feature selection is a very important part of classification; it consists of removing redundant features and selecting essential ones. LSA (Latent Semantic Analysis) is adopted to take the meaning of the features into account. Because basic LSA is not specialized for feature selection, a supervised LSA, which is better suited to classification problems, is used instead. A GA (Genetic Algorithm) is then applied to the features obtained from supervised LSA to select a better feature subset. Finally, documents are projected onto the selected feature subset and classified with an SVM (Support Vector Machine). By selecting an optimal feature subset with the proposed hybrid of supervised LSA and GA, high classification performance and efficiency are expected. The method's efficiency is demonstrated through experiments on internet news classification with a small number of features.

Utilizing Unlabeled Documents in Automatic Classification with Inter-document Similarities (문헌간 유사도를 이용한 자동분류에서 미분류 문헌의 활용에 관한 연구)

  • Kim, Pan-Jun;Lee, Jae-Yun
    • Journal of the Korean Society for Information Management / v.24 no.1 s.63 / pp.251-271 / 2007
  • This paper studies the problem of classifying documents with labeled and unlabeled training data, especially with regard to using document similarity features. Using unlabeled data is practically important because in many information systems obtaining training labels is expensive, while large quantities of unlabeled documents are readily available. A general semi-supervised learning algorithm has two steps. First, it trains a classifier using the available labeled documents and classifies the unlabeled documents. Then, it trains a new classifier using all the training documents, which were labeled either manually or automatically. We propose two types of semi-supervised learning algorithm that use document similarity features. The first is one-step semi-supervised learning, which uses unlabeled documents only to generate document similarity features. The second is two-step semi-supervised learning, which uses unlabeled documents as learning examples as well as for similarity features. Experimental results, obtained using support vector machines and a naive Bayes classifier, show that a small set of labeled documents combined with a large set of unlabeled documents yields better performance than supervised learning that uses labeled data only. When the efficiency of a classifier system is considered, the one-step semi-supervised learning algorithm suggested in this study could be a good solution for improving classification performance with unlabeled documents.
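
The general two-step procedure described in this abstract (train on labeled documents, label the unlabeled ones, then retrain on everything) can be sketched as follows. This is only an illustration: the nearest-centroid classifier is a stand-in chosen to keep the example dependency-free (the paper reports using support vector machines and naive Bayes), and the function names and toy vectors are hypothetical.

```python
from collections import defaultdict


def train_centroids(vectors, labels):
    """Return {label: mean vector}, i.e. a nearest-centroid classifier."""
    sums, counts = {}, defaultdict(int)
    for vec, label in zip(vectors, labels):
        if label not in sums:
            sums[label] = [0.0] * len(vec)
        sums[label] = [s + v for s, v in zip(sums[label], vec)]
        counts[label] += 1
    return {lab: [s / counts[lab] for s in total] for lab, total in sums.items()}


def predict(centroids, vec):
    """Assign the label of the closest centroid (squared Euclidean distance)."""
    def dist(centroid):
        return sum((a - b) ** 2 for a, b in zip(centroid, vec))
    return min(centroids, key=lambda lab: dist(centroids[lab]))


def self_train(labeled_x, labeled_y, unlabeled_x):
    # Step 1: train on the labeled documents, then label the unlabeled ones.
    first = train_centroids(labeled_x, labeled_y)
    pseudo_y = [predict(first, x) for x in unlabeled_x]
    # Step 2: retrain on manually and automatically labeled documents together.
    return train_centroids(labeled_x + unlabeled_x, labeled_y + pseudo_y)


# Toy document vectors (hypothetical data).
labeled_x = [[0.0, 0.0], [4.0, 4.0]]
labeled_y = ["sports", "politics"]
unlabeled_x = [[0.5, 0.2], [3.5, 4.2], [1.0, 1.0], [3.0, 3.0]]

model = self_train(labeled_x, labeled_y, unlabeled_x)
label = predict(model, [1.8, 1.8])
print(label)
```

In this toy run the retrained centroids shift toward the unlabeled documents, which is the effect the two-step scheme relies on.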

Performance Analysis of MixMatch-Based Semi-Supervised Learning for Defect Detection in Manufacturing Processes (제조 공정 결함 탐지를 위한 MixMatch 기반 준지도학습 성능 분석)

  • Ye-Jun Kim;Ye-Eun Jeong;Yong Soo Kim
    • Journal of Korean Society of Industrial and Systems Engineering / v.46 no.4 / pp.312-320 / 2023
  • Recently, there have been increasing attempts to replace defect detection inspections in the manufacturing industry with deep learning techniques. However, obtaining enough high-quality labeled data to improve the performance of deep learning models is constrained by cost and time. As a solution to this problem, semi-supervised learning, which uses a limited amount of labeled data, has been gaining traction. This study assesses the effectiveness of semi-supervised learning in the defect detection process of manufacturing using the MixMatch algorithm. The MixMatch algorithm incorporates three dominant paradigms in the semi-supervised field: consistency regularization, entropy minimization, and generic regularization. The performance of semi-supervised learning based on the MixMatch algorithm was compared with that of supervised learning using defect image data from the metal casting process. For the experiments, the ratio of labeled data was set to 5%, 10%, 25%, and 50% of the total data. At a labeled data ratio of 5%, semi-supervised learning achieved a classification accuracy of 90.19%, outperforming supervised learning by approximately 22%p. At a 10% ratio, it surpassed supervised learning by around 8%p, achieving 92.89% accuracy. These results demonstrate that semi-supervised learning can achieve significant outcomes even with a very limited amount of labeled data, suggesting its value in real-world research and industrial settings where labeled data is limited.
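
Two building blocks of MixMatch can be sketched in a few lines: temperature sharpening of a guessed label distribution (the entropy-minimization part) and MixUp interpolation between two examples. This is a minimal sketch, not the study's implementation; the default values `temperature=0.5` and `alpha=0.75` and the toy inputs are assumptions made for illustration.

```python
import random


def sharpen(probs, temperature=0.5):
    """Lower the entropy of a class distribution: p_i^(1/T) / sum_j p_j^(1/T)."""
    powered = [p ** (1.0 / temperature) for p in probs]
    total = sum(powered)
    return [p / total for p in powered]


def mixup(x1, x2, alpha=0.75, rng=random):
    """Interpolate two examples with a Beta(alpha, alpha) weight.

    The weight is clamped to at least 0.5 so the result stays closer to x1.
    """
    lam = rng.betavariate(alpha, alpha)
    lam = max(lam, 1.0 - lam)
    mixed = [lam * a + (1.0 - lam) * b for a, b in zip(x1, x2)]
    return mixed, lam


guess = [0.6, 0.3, 0.1]        # hypothetical averaged prediction on an unlabeled image
sharp = sharpen(guess)          # pushes mass toward the most likely class

mixed, lam = mixup([1.0, 0.0], [0.0, 1.0], rng=random.Random(0))
print(sharp, lam)
```

The same `mixup` weight would be applied to inputs and to their (sharpened) label vectors so that features and targets stay aligned.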

Unsupervised feature learning for classification

  • Abdullaev, Mamur;Alikhanov, Jumabek;Ko, Seunghyun;Jo, Geun Sik
    • Proceedings of the Korean Society of Computer Information Conference / 2016.07a / pp.51-54 / 2016
  • In computer vision, especially in image processing, it has become popular to apply deep convolutional networks to supervised learning. Convolutional networks have shown state-of-the-art results in classification, object recognition, detection, and semantic segmentation. However, supervised learning has two major disadvantages: it requires a huge amount of labeled data to reach high accuracy, and training on that much data takes a long time. Unsupervised learning, on the other hand, can address these problems more cheaply. In this paper we show an efficient way to learn features for classification in an unsupervised manner. The network is trained layer-wise with backpropagation and learns features from unlabeled data. Our approach shows better results on the Caltech-256 and STL-10 datasets.

ACCOUNTING FOR IMPORTANCE OF VARIABLES IN MULTI-SENSOR DATA FUSION USING RANDOM FORESTS

  • Park No-Wook;Chi Kwang-Hoon
    • Proceedings of the KSRS Conference / 2005.10a / pp.283-285 / 2005
  • To account for the importance of variables in multi-sensor data fusion, random forests are applied to supervised land-cover classification. The random forests approach is a non-parametric ensemble classifier based on CART-like trees. Its distinguishing feature is that the importance of a variable can be estimated by randomly permuting the variable of interest in all the out-of-bag samples for each classifier. Supervised classification with a multi-sensor remote sensing data set including optical and polarimetric SAR data was carried out to illustrate the applicability of random forests. The experimental results show that the random forests approach could extract important variables or bands for land-cover discrimination and performed well compared with other non-parametric data fusion algorithms.
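
The permutation idea mentioned in this abstract can be illustrated with a small sketch: shuffle one variable in a set of held-out samples and measure how much accuracy drops. The fixed rule `model` below is a hypothetical stand-in for a single trained tree, and the two-band samples are made up; in a random forest the same measurement would be taken on each tree's own out-of-bag samples and averaged over the trees.

```python
import random


def accuracy(model, rows, labels):
    """Fraction of rows for which the model predicts the given label."""
    return sum(model(r) == y for r, y in zip(rows, labels)) / len(rows)


def permutation_importance(model, rows, labels, column, rng):
    """Accuracy drop after randomly permuting one column of the samples."""
    baseline = accuracy(model, rows, labels)
    shuffled = [r[column] for r in rows]
    rng.shuffle(shuffled)
    permuted = [r[:column] + [v] + r[column + 1:] for r, v in zip(rows, shuffled)]
    return baseline - accuracy(model, permuted, labels)


def model(row):
    # Hypothetical stand-in for one trained tree: it only looks at band 0.
    return "water" if row[0] < 0.5 else "forest"


# Toy held-out samples with two bands each (hypothetical data).
rows = [[0.1, 0.9], [0.2, 0.1], [0.3, 0.5], [0.8, 0.4], [0.9, 0.7], [0.7, 0.2]]
labels = ["water", "water", "water", "forest", "forest", "forest"]

rng = random.Random(0)
unused_band = permutation_importance(model, rows, labels, 1, rng)
used_band = permutation_importance(model, rows, labels, 0, rng)
print(unused_band, used_band)
```

Permuting a band the model never consults leaves accuracy unchanged, so its importance is zero; permuting a band the model depends on can only lower accuracy here.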

Evaluation of Attribute Selection Methods and Prior Discretization in Supervised Learning

  • Cha, Woon Ock;Huh, Moon Yul
    • Communications for Statistical Applications and Methods / v.10 no.3 / pp.879-894 / 2003
  • We evaluated the efficiency of applying attribute selection methods and prior discretization to supervised learning, modelled by C4.5 and Naive Bayes. Three databases were obtained from the UCI data archive; each consisted of continuous attributes except for one decision attribute. Four methods were used for attribute selection: MDI, ReliefF, Gain Ratio, and a consistency-based method. MDI and ReliefF can be used for both continuous and discrete attributes, but the other two methods can be used only for discrete attributes. Discretization was performed using the Fayyad and Irani method. To investigate the effect of noise in the database, noise was introduced into the data sets at levels of up to 10% or 20%, and then the data, both with and without noise, were processed through the steps of attribute selection, discretization, and classification. The results indicate that classification based on selected attributes yields higher accuracy than classification on the full data set, and that prior discretization does not lower the accuracy.
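
Of the four attribute selection criteria named in this abstract, Gain Ratio is the easiest to show compactly. The sketch below uses the standard definition (information gain divided by the split information of the attribute) on a discrete attribute; the function names and the four-row toy data are illustrative assumptions, not taken from the paper.

```python
from collections import Counter
from math import log2


def entropy(values):
    """Shannon entropy (in bits) of a list of discrete values."""
    total = len(values)
    return -sum((n / total) * log2(n / total) for n in Counter(values).values())


def gain_ratio(attribute, labels):
    """Information gain of a discrete attribute divided by its split information."""
    total = len(labels)
    groups = {}
    for value, label in zip(attribute, labels):
        groups.setdefault(value, []).append(label)
    # Expected entropy of the class labels after splitting on the attribute.
    conditional = sum(len(g) / total * entropy(g) for g in groups.values())
    info_gain = entropy(labels) - conditional
    split_info = entropy(attribute)
    return info_gain / split_info if split_info > 0 else 0.0


labels = ["yes", "yes", "no", "no"]
perfect = gain_ratio(["low", "low", "high", "high"], labels)   # separates the classes
useless = gain_ratio(["low", "high", "low", "high"], labels)   # carries no class signal
print(perfect, useless)
```

Because the criterion is defined over discrete values, a continuous attribute would need to be discretized first, which is consistent with the abstract's note that Gain Ratio applies only to discrete attributes.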

Classification of Warm Temperate Vegetations and GIS-based Forest Management System

  • Cho, Sung-Min
    • International journal of advanced smart convergence / v.10 no.1 / pp.216-224 / 2021
  • The aim of this research was to classify forest types at Wando in Jeonnam Province and to develop a warm temperate forest management system using Remote Sensing and GIS. Emphasis was also given to analyzing satellite images to compare forest type changes over the 10-year period from 2009 to 2019. The study was carried out using ArcGIS Pro and ENVI. For this research, Landsat satellite images were obtained by means of terrestrial, airborne and satellite imagery. Based on the field survey data, all land uses and forest types were divided into 5 classes: evergreen broad-leaved forest, evergreen coniferous forest, deciduous broad-leaved forest, mixed forest, and others. Supervised classification was carried out with a random forest classifier based on manually collected training polygons in the ROI. The accuracy of the forest type and land-cover classifications was assessed against the reference polygons. Comparison of forest changes over the 10-year period showed differences in vegetation biomass volumes, including a loss of deciduous forests in 2019, probably due to the expansion of residential areas and rapid deforestation.

An Experimental Study on Opinion Classification Using Supervised Latent Semantic Indexing (LSI) (지도적 잠재의미색인(LSI)기법을 이용한 의견 문서 자동 분류에 관한 실험적 연구)

  • Lee, Ji-Hye;Chung, Young-Mee
    • Journal of the Korean Society for Information Management / v.26 no.3 / pp.451-462 / 2009
  • The aim of this study is to apply latent semantic indexing (LSI) techniques to the efficient automatic classification of opinionated documents. For the experiments, we collected 1,000 opinionated documents such as reviews and news, with 500 of them labelled as positive documents and the remaining 500 as negative. Sets of content words and sentiment words were extracted using a POS tagger in order to identify the optimal feature set for opinion classification. The findings showed that LSI techniques were more effective than a term indexing method in sentiment classification. The best performance was achieved by a supervised LSI technique.

Supervised Rank Normalization for Support Vector Machines (SVM을 위한 교사 랭크 정규화)

  • Lee, Soojong;Heo, Gyeongyong
    • Journal of the Korea Society of Computer and Information / v.18 no.11 / pp.31-38 / 2013
  • Feature normalization has been widely used as a pre-processing step in classification problems to reduce the effect of differing scales across feature dimensions and, as a result, the error. Most existing methods, however, assume some distribution function for the features. Even worse, existing methods do not use the labels of data points and, as a result, do not guarantee that the normalization is optimal for classification. This paper proposes supervised rank normalization, which combines rank normalization with a supervised learning technique. Like rank normalization, the proposed method does not assume any feature distribution, and it uses the class labels of nearest neighbors in classification to reduce error. Because SVM, in particular, tries to draw a decision boundary in the middle of the zone where classes overlap, reducing the data density in that area helps SVM find a decision boundary that reduces generalization error. These claims are verified through experimental results.
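
For orientation, plain (unsupervised) rank normalization, the baseline that this paper builds on, can be sketched as below. The supervised variant proposed in the paper additionally uses the class labels of nearest neighbors, and its details are not given in the abstract, so it is not reproduced here. The tie handling (average rank) and the scaling to [0, 1] are assumptions of this sketch.

```python
def rank_normalize(values):
    """Replace each value by its rank, scaled to [0, 1]; ties share the average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Find the run of equal values starting at sorted position i.
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2.0
        for k in range(i, j + 1):
            ranks[order[k]] = average
        i = j + 1
    denom = max(len(values) - 1, 1)
    return [r / denom for r in ranks]


# One feature dimension with an extreme value (hypothetical data).
feature = [10.0, 1000.0, 12.0, 11.0, 12.0]
normalized = rank_normalize(feature)
print(normalized)
```

Only the ordering of the values matters, so the extreme value 1000.0 maps to 1.0 without compressing the other points, and no distribution is assumed for the feature.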

A Comparative Study of Image Classification Method to Classify Onion and Garlic Using Unmanned Aerial Vehicle (UAV) Imagery

  • Lee, Kyung-Do;Lee, Ye-Eun;Park, Chan-Won;Na, Sang-Il
    • Korean Journal of Soil Science and Fertilizer / v.49 no.6 / pp.743-750 / 2016
  • Recently, the use of UAVs (Unmanned Aerial Vehicles) has increased in the agricultural sector. This study classified onion and garlic using supervised classification of images from a fixed-wing UAV (Model: Ebee) to evaluate whether UAV images can be used to estimate onion and garlic cultivation areas. Aerial images were obtained 11~12 times from study sites in Changryeng-gun and Hapcheon-gun during the farming season from 2015 to 2016. For both R-G-B and R-G-NIR images, the maximum likelihood method gave the highest Kappa coefficients in onion and garlic image classification. Classification accuracy was high, with Kappa coefficients of 0.75~0.97 from DOY 105 to DOY 141, implying that UAV images could be used to estimate onion and garlic cultivation areas.
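
The Kappa coefficient reported in this abstract measures agreement between classified and reference labels beyond what chance would produce. A minimal sketch of Cohen's kappa computed from a confusion matrix follows; the 2x2 matrix is made-up example data, not a result from the study.

```python
def cohen_kappa(confusion):
    """Cohen's kappa from a square confusion matrix (rows: reference, columns: predicted)."""
    size = len(confusion)
    total = sum(sum(row) for row in confusion)
    # Observed agreement: share of samples on the diagonal.
    observed = sum(confusion[i][i] for i in range(size)) / total
    # Chance agreement: product of row and column marginals, summed over classes.
    expected = sum(
        sum(confusion[i]) * sum(row[i] for row in confusion) for i in range(size)
    ) / (total * total)
    return (observed - expected) / (1.0 - expected)


# Hypothetical onion/garlic confusion matrix with 100 reference samples.
confusion = [[45, 5],
             [10, 40]]
kappa = cohen_kappa(confusion)
print(round(kappa, 3))
```

Here the observed agreement is 0.85 and the chance agreement is 0.50, giving a kappa of 0.70.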