• Title/Summary/Keyword: Data classification


Satellite Image Classification Based on Color and Texture Feature Vectors (칼라 및 질감 속성 벡터를 이용한 위성영상의 분류)

  • 곽장호;김준철;이준환
    • Korean Journal of Remote Sensing
    • /
    • v.15 no.3
    • /
    • pp.183-194
    • /
    • 1999
  • The brightness, color, and texture contained in multispectral satellite data are important factors in analyzing the image data and applying it to a proper use. One of the most significant steps in satellite data analysis using texture or color information is to extract features that effectively express the information in the original image. This paper introduces six features extracted from the analysis of satellite data and constructs a classification network based on a back-propagation neural network to evaluate the classification ability of each feature vector on SPOT imagery. The feature vectors were obtained from training sets selected for the regions of interest and applied to the classification process. The classification results showed that each feature vector has merits and demerits depending on its characteristics, and that the vectors have comparable classification ability. The color and texture features are therefore expected to be useful not only in classifying satellite imagery but also in various other image classification and application fields.
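The paper's six SPOT features are only summarized above; as a rough illustration of the idea, a per-window feature vector mixing a brightness (color) feature with two simple texture measures might look like the sketch below. The window values and the three features are hypothetical stand-ins, not the paper's.

```python
def window_features(window):
    """Feature vector for one image window (2-D list of gray levels):
    mean brightness (a color-type feature) plus variance and mean
    absolute horizontal gradient (two simple texture-type features)."""
    vals = [v for row in window for v in row]
    n = len(vals)
    mean = sum(vals) / n
    var = sum((v - mean) ** 2 for v in vals) / n
    grad = sum(abs(row[i + 1] - row[i])
               for row in window for i in range(len(row) - 1))
    grad /= sum(len(row) - 1 for row in window)
    return [mean, var, grad]

# a smooth window and a rough window with similar mean brightness
smooth = [[10, 10, 11], [10, 11, 10], [11, 10, 10]]
rough  = [[0, 20, 0], [20, 0, 20], [0, 20, 0]]
print(window_features(smooth))
print(window_features(rough))
```

The texture measures separate the two windows even though their mean brightness is nearly identical, which is the property that makes such features useful alongside color.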

Classification of Remote Sensing Data using Random Selection of Training Data and Multiple Classifiers (훈련 자료의 임의 선택과 다중 분류자를 이용한 원격탐사 자료의 분류)

  • Park, No-Wook;Yoo, Hee Young;Kim, Yihyun;Hong, Suk-Young
    • Korean Journal of Remote Sensing
    • /
    • v.28 no.5
    • /
    • pp.489-499
    • /
    • 2012
  • In this paper, a classifier ensemble framework for remote sensing data classification is presented that combines classification results generated from both different training sets and different classifiers. The core idea of the framework is to increase the diversity among classification results, and thereby the classification accuracy, by varying both the training sets and the classifiers. First, training sets with different sampling densities are generated and used as inputs to supervised classifiers with different discrimination capabilities. The preliminary classification results are then combined via a majority voting scheme to generate the final classification result. A case study of land-cover classification using multi-temporal ENVISAT ASAR data sets illustrates the potential of the framework: nine classification results, generated from three different training sets and three different classifiers (a maximum likelihood classifier, a multi-layer perceptron, and a support vector machine), were combined. The case study showed that complementary information on the discrimination of the land-cover classes of interest can be extracted within the proposed framework, which obtained the best classification accuracy. When different combinations were compared, combining classification results whose classifiers lack diversity did not improve accuracy; it is therefore recommended to ensure greater diversity between classifiers when designing multiple classifier systems.
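The majority-voting combination step described above can be sketched as follows; the land-cover labels and the three classifier outputs are invented for illustration, not taken from the paper's ASAR case study.

```python
from collections import Counter

def majority_vote(label_lists):
    """Combine per-classifier label sequences by majority vote.

    label_lists: equal-length label sequences, one per preliminary
    classification result (the paper combines nine: 3 training sets
    x 3 classifiers). Ties go to the label seen first among the inputs.
    """
    combined = []
    for labels in zip(*label_lists):          # one tuple per pixel
        counts = Counter(labels)
        combined.append(counts.most_common(1)[0][0])
    return combined

# hypothetical outputs of three classifiers over five pixels
mlc = ["water", "urban", "crop", "crop", "forest"]
mlp = ["water", "crop", "crop", "urban", "forest"]
svm = ["water", "urban", "urban", "crop", "forest"]

print(majority_vote([mlc, mlp, svm]))
# -> ['water', 'urban', 'crop', 'crop', 'forest']
```

Each pixel's final class is simply the label most of the preliminary results agree on, which is why diversity between the inputs matters: identical classifiers would only ever echo one another.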

Improving the Accuracy of Document Classification by Learning Heterogeneity (이질성 학습을 통한 문서 분류의 정확성 향상 기법)

  • Wong, William Xiu Shun;Hyun, Yoonjin;Kim, Namgyu
    • Journal of Intelligence and Information Systems
    • /
    • v.24 no.3
    • /
    • pp.21-44
    • /
    • 2018
  • In recent years, the rapid development of internet technology and the popularization of smart devices have produced massive amounts of text data, distributed through media platforms such as the World Wide Web, Internet news feeds, microblogs, and social media. This enormous amount of easily obtained information, however, lacks organization. The problem has drawn the interest of many researchers and created demand for professionals capable of classifying relevant information; hence, text classification was introduced. Text classification, a challenging task in modern data analysis, assigns a text document to one or more predefined categories or classes. Available techniques include K-Nearest Neighbor, the Naïve Bayes algorithm, Support Vector Machines, Decision Trees, and Artificial Neural Networks. When dealing with huge amounts of text data, however, model performance and accuracy become a challenge: depending on the vocabulary of the corpus and the features created for classification, the performance of a text classification model can vary. Most previous attempts have proposed a new algorithm or modified an existing one, a line of research that has arguably reached its limits. In this study, rather than proposing or modifying an algorithm, we focus on modifying the use of the data. It is widely known that classifier performance depends on the quality of the training data on which the classifier is built, and real-world datasets usually contain noise that can affect the decisions made by classifiers built from them.
In this study, we consider that data from different domains, that is, heterogeneous data, may have noise characteristics that can be exploited in the classification process. Machine learning algorithms build classifiers under the assumption that the characteristics of the training data and the target data are the same or very similar. For unstructured data such as text, however, the features are determined by the vocabulary of the documents; if the viewpoints of the training data and the target data differ, their features may differ as well. We therefore attempt to improve classification accuracy by strengthening the robustness of the document classifier through artificially injecting noise into the process of constructing it. Because data coming from various sources are likely to be formatted differently, traditional machine learning algorithms face difficulties: they were not developed to recognize different data representations at the same time and to bring them together in the same generalization. To utilize heterogeneous data in training the document classifier, we apply semi-supervised learning. Since unlabeled data may degrade the performance of the document classifier, we further propose the Rule Selection-Based Ensemble Semi-Supervised Learning Algorithm (RSESLA), which selects only the documents that contribute to the accuracy improvement of the classifier. RSESLA creates multiple views by manipulating the features using different types of classification models and different types of heterogeneous data; the most confident classification rules are selected and applied in the final decision making.
In this paper, three types of real-world data sources were used: news, Twitter, and blogs.

Comparison of Performance Measures for Credit-Card Delinquents Classification Models : Measured by Hit Ratio vs. by Utility (신용카드 연체자 분류모형의 성능평가 척도 비교 : 예측률과 유틸리티 중심으로)

  • Chung, Suk-Hoon;Suh, Yong-Moo
    • Journal of Information Technology Applications and Management
    • /
    • v.15 no.4
    • /
    • pp.21-36
    • /
    • 2008
  • As the great disturbance from credit card abuse in Korea stabilizes, credit card companies need to interpret credit-card delinquent classification models from the viewpoint of profit. However, the hit ratio, which has been used as the measure of goodness of classification models, only tells us how correctly they classify, not how much profit can be obtained by using them. In this research, we developed a new utility-based measure from the viewpoint of profit and used it to analyze two classification models (a neural network and a decision tree). We found that the hit ratio of the neural network model is higher than that of the decision tree model, but that the utility value of the decision tree model is higher than that of the neural network model. This experiment shows the importance of utility-based measures for credit-card delinquent classification models, and we expect the new measure to contribute to increasing the profits of credit card companies.
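The contrast between the two measures can be shown with a small sketch. The confusion matrices and payoff values below are invented, not the paper's, but they reproduce its qualitative finding: one model can win on hit ratio while the other wins on utility.

```python
def hit_ratio(conf):
    """conf[actual][predicted] holds counts; hit ratio = correct / total."""
    correct = sum(conf[c][c] for c in conf)
    total = sum(sum(row.values()) for row in conf.values())
    return correct / total

def utility(conf, payoff):
    """Total profit: each cell count weighted by its payoff."""
    return sum(conf[a][p] * payoff[a][p] for a in conf for p in conf[a])

# hypothetical confusion matrices for a decision tree and a neural network
tree = {"good": {"good": 70, "bad": 30}, "bad": {"good": 5, "bad": 95}}
nn   = {"good": {"good": 90, "bad": 10}, "bad": {"good": 20, "bad": 80}}

# assumed payoffs: approving a delinquent ("bad" predicted "good") is costly,
# rejecting a good customer forgoes a little profit
payoff = {"good": {"good": 1.0, "bad": -0.2},
          "bad": {"good": -5.0, "bad": 0.0}}

print(hit_ratio(nn), hit_ratio(tree))              # nn wins on hit ratio
print(utility(nn, payoff), utility(tree, payoff))  # tree wins on utility
```

Because the payoff matrix penalizes approving delinquents far more than it rewards approving good customers, the model with fewer costly errors earns the higher utility despite its lower hit ratio.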


Classification Accuracy Improvement for Decision Tree (의사결정트리의 분류 정확도 향상)

  • Rezene, Mehari Marta;Park, Sanghyun
    • Proceedings of the Korea Information Processing Society Conference
    • /
    • 2017.04a
    • /
    • pp.787-790
    • /
    • 2017
  • Data quality is a central issue in classification problems: the presence of noisy instances in the training dataset generally prevents robust classification performance, and such instances may cause the generated decision tree to overfit and its accuracy to decrease. Decision trees are useful, efficient, and commonly used for solving various real-world classification problems in data mining. In this paper, we introduce a preprocessing technique to improve the classification accuracy of the C4.5 decision tree algorithm: a naive Bayes classifier is applied to remove noisy instances from the training dataset. We applied the proposed method to a real e-commerce sales dataset to test its performance against the existing C4.5 decision tree classifier. In the experiments, the proposed method improved classification accuracy by 8.5% on the training dataset and by 14.32% under 10-fold cross-validation.
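The preprocessing idea, dropping training instances that a naive Bayes classifier misclassifies before the tree is built, can be sketched as below. The tiny dataset and the add-one smoothing details are illustrative assumptions, not the paper's setup.

```python
from collections import Counter, defaultdict

def nb_train(rows, labels):
    """Fit a minimal categorical naive Bayes with add-one smoothing."""
    prior = Counter(labels)
    cond = defaultdict(Counter)  # value counts per (feature index, label)
    for row, y in zip(rows, labels):
        for i, v in enumerate(row):
            cond[(i, y)][v] += 1
    return prior, cond

def nb_predict(model, row):
    prior, cond = model
    total = sum(prior.values())
    best, best_p = None, -1.0
    for y, n in prior.items():
        p = n / total
        for i, v in enumerate(row):
            c = cond[(i, y)]
            p *= (c[v] + 1) / (sum(c.values()) + len(c) + 1)
        if p > best_p:
            best, best_p = y, p
    return best

def filter_noise(rows, labels):
    """Keep only the instances the naive Bayes model classifies correctly;
    the cleaned set would then be handed to the C4.5 tree learner."""
    model = nb_train(rows, labels)
    kept = [(r, y) for r, y in zip(rows, labels) if nb_predict(model, r) == y]
    return [r for r, _ in kept], [y for _, y in kept]

rows = [("sunny",), ("sunny",), ("sunny",), ("rain",), ("rain",), ("sunny",)]
labels = ["out", "out", "out", "in", "in", "in"]  # last instance is mislabeled
clean_rows, clean_labels = filter_noise(rows, labels)
print(len(clean_rows))  # the inconsistent instance has been dropped
```

The instance whose label disagrees with the dominant pattern is removed, so the downstream tree no longer grows extra branches to fit it.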

Image Classification for Military Application using Public Landcover Map (공개된 토지피복도를 활용한 위성영상 분류)

  • Hong, Woo-Yong;Park, Wan-Yong;Song, Hyeon-Seung;Jung, Cheol-Hoon;Eo, Yang-Dam;Kim, Seong-Joon
    • Journal of the Korea Institute of Military Science and Technology
    • /
    • v.13 no.1
    • /
    • pp.147-155
    • /
    • 2010
  • Landcover information for an access-denied area was extracted from low-, medium-, and high-resolution satellite images. Training for supervised classification was performed with visual reference to the landcover map produced and distributed by the Ministry of Environment. The classification result was compared against the corresponding data of the FACC land classification system. The digital military map was rasterized to the same pixel size as the satellite classification, and the accuracy test was performed image-to-image. For vegetation, ancillary data such as NDVI and seasonal imagery are expected to improve accuracy. The FACC codes of the FDB need to be associated with properties that can be recognized automatically.
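The image-to-image accuracy test described above amounts to a per-pixel comparison of the classified image against the rasterized reference map on the same grid. A minimal sketch, with hypothetical class codes:

```python
from collections import Counter

def image_to_image_accuracy(classified, reference, nodata=None):
    """Compare two co-registered class rasters (2-D lists) pixel by pixel;
    returns overall accuracy and the confusion counts keyed by
    (reference class, classified class)."""
    confusion = Counter()
    for row_c, row_r in zip(classified, reference):
        for c, r in zip(row_c, row_r):
            if nodata in (c, r):
                continue  # skip pixels masked out in either raster
            confusion[(r, c)] += 1
    total = sum(confusion.values())
    correct = sum(n for (r, c), n in confusion.items() if r == c)
    return correct / total, confusion

classified = [["urban", "forest"], ["water", "forest"]]
reference  = [["urban", "forest"], ["forest", "forest"]]
acc, conf = image_to_image_accuracy(classified, reference)
print(acc)  # -> 0.75
```

The confusion counts also show which classes are being confused (here one forest pixel classified as water), which is the kind of per-class evidence behind the paper's remark that NDVI and seasonal imagery could help the vegetation classes.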

A Study on Face Recognition and Reliability Improvement Using Classification Analysis Technique

  • Kim, Seung-Jae
    • International journal of advanced smart convergence
    • /
    • v.9 no.4
    • /
    • pp.192-197
    • /
    • 2020
  • In this study, we look for ways to make face recognition more stable and to improve its effectiveness and reliability. Improving the face recognition rate requires a lot of data, but more data does not by itself improve the recognition rate; another criterion is how accurately and precisely the data to be used are classified. Among the various methods for classification analysis, this study uses a support vector machine (SVM). Feature information is extracted from a normalized image with rotation information and projected onto the eigenspace, and the relationships between the feature values are then investigated through SVM classification analysis. Verification through classification analysis can improve the effectiveness and reliability of various recognition fields, such as object recognition as well as face recognition, and should be of great help in improving recognition rates.

Automatic Classification Method for Time-Series Image Data using Reference Map (Reference Map을 이용한 시계열 image data의 자동분류법)

  • Hong, Sun-Pyo
    • The Journal of the Acoustical Society of Korea
    • /
    • v.16 no.2
    • /
    • pp.58-65
    • /
    • 1997
  • A new automatic classification method with high and stable accuracy for time-series image data is presented in this paper. The method relies on the prior condition that a classified map of the target area already exists, or that at least one image in the time series has been classified; this classified map is used as a reference map to specify training areas for the classification categories. The method consists of five steps: extraction of training data using the reference map, detection of changed pixels based on the homogeneity of the training data, clustering of the changed pixels, reconstruction of the training data, and classification with a maximum likelihood classifier. To evaluate the performance of the method qualitatively, four time-series Landsat TM images were classified using this method and a conventional method that requires a skilled operator. As a result, we obtained classified maps with high reliability and fast throughput, without a skilled operator.
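The final step, maximum likelihood classification from reference-map training data, can be sketched for a single band using per-class Gaussians. The gray values and class names below are invented; the paper's multi-band case generalizes this with multivariate densities.

```python
import math

def fit_gaussians(samples):
    """Per-class mean and variance estimated from training pixels
    (the method extracts these training sets via the reference map)."""
    params = {}
    for cls, vals in samples.items():
        m = sum(vals) / len(vals)
        v = sum((x - m) ** 2 for x in vals) / len(vals) or 1e-6  # avoid var=0
        params[cls] = (m, v)
    return params

def ml_classify(params, x):
    """Assign pixel value x to the class with the highest log-likelihood."""
    def loglik(m, v):
        return -0.5 * math.log(2 * math.pi * v) - (x - m) ** 2 / (2 * v)
    return max(params, key=lambda c: loglik(*params[c]))

# hypothetical training pixels for two categories
training = {"water": [12, 14, 13, 15], "forest": [60, 62, 58, 61]}
params = fit_gaussians(training)
print([ml_classify(params, x) for x in [13, 59, 40]])
# -> ['water', 'forest', 'forest']
```

The intermediate value 40 goes to the class whose Gaussian explains it better, which here is forest because its training pixels have slightly larger spread and a closer mean.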


Note on classification and regression tree analysis (분류와 회귀나무분석에 관한 소고)

  • 임용빈;오만숙
    • Journal of Korean Society for Quality Management
    • /
    • v.30 no.1
    • /
    • pp.152-161
    • /
    • 2002
  • The analysis of large data sets with hundreds of thousands of observations and thousands of independent variables is a formidable computational task. A less parametric method, capable of identifying important independent variables and their interactions, is the tree-structured approach to regression and classification, which gives a graphical and often illuminating way of looking at data in classification and regression problems. In this paper, we review and summarize the methodology used to construct a tree and multiple trees, and the sequential strategy for identifying active compounds in large chemical databases.
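The core operation in constructing such a tree is choosing the split that most reduces node impurity. A minimal sketch for one numeric variable using the Gini index (CART's usual classification criterion); the data are invented:

```python
def gini(labels):
    """Gini impurity of a node: 1 - sum of squared class proportions."""
    n = len(labels)
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))

def best_split(xs, ys):
    """Exhaustively pick the threshold on one numeric variable that
    minimizes the weighted Gini impurity of the two child nodes."""
    order = sorted(zip(xs, ys))
    best_t, best_g = None, float("inf")
    for i in range(1, len(order)):
        t = (order[i - 1][0] + order[i][0]) / 2  # midpoint candidate
        left = [y for x, y in order[:i]]
        right = [y for x, y in order[i:]]
        g = (len(left) * gini(left) + len(right) * gini(right)) / len(order)
        if g < best_g:
            best_t, best_g = t, g
    return best_t, best_g

xs = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
ys = ["a", "a", "a", "b", "b", "b"]
print(best_split(xs, ys))  # -> (6.5, 0.0)
```

A tree is grown by applying this search recursively, over every variable, to each child node; scanning only sorted midpoints is what keeps the method tractable on very large data sets.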

Analysis of Cone Penetration Data Using Fuzzy C-means Clustering (Fuzzy C-means 클러스터링 기법을 이용한 콘 관입 데이터의 해석)

  • 우철웅;장병욱;원정윤
    • Magazine of the Korean Society of Agricultural Engineers
    • /
    • v.45 no.3
    • /
    • pp.73-83
    • /
    • 2003
  • Fuzzy C-means (FCM) methods were used to characterize geotechnical information from static cone penetration data. In contrast to traditional classification methods such as the Robertson classification chart, FCM expresses classes not as crisp assignments but as fuzzy memberships. The results show that FCM is useful for characterizing ground information that cannot easily be found using a normal classification chart. However, the optimal number of classes may not be easily defined, so it should be determined by considering engineering aspects as well as technical measures.
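A minimal one-dimensional fuzzy C-means sketch shows how each measurement receives a fuzzy membership in every class rather than a single crisp label. The cone-resistance values, the initialization, and the parameters m and c are illustrative assumptions, not the paper's setup.

```python
def fcm(data, c=2, m=2.0, iters=50):
    """Minimal 1-D fuzzy C-means (c >= 2): returns cluster centers and
    the membership of each data point in each cluster (rows sum to 1)."""
    srt = sorted(data)
    # deterministic initialization: spread centers over the data range
    centers = [srt[i * (len(srt) - 1) // (c - 1)] for i in range(c)]
    u = []
    for _ in range(iters):
        # membership update: u_i(x) = 1 / sum_j (d_i / d_j)^(2/(m-1))
        u = []
        for x in data:
            d = [abs(x - v) or 1e-12 for v in centers]
            u.append([1.0 / sum((d[i] / d[j]) ** (2.0 / (m - 1.0))
                                for j in range(c)) for i in range(c)])
        # center update: mean of the data weighted by memberships^m
        centers = [sum(u[k][i] ** m * x for k, x in enumerate(data)) /
                   sum(u[k][i] ** m for k in range(len(data)))
                   for i in range(c)]
    return centers, u

# hypothetical cone resistance values forming two soil groups
qc = [1.0, 1.2, 0.9, 5.0, 5.3, 4.8]
centers, u = fcm(qc)
```

Points near a center get memberships close to 1 for that class, while points between the groups receive intermediate memberships in both, which is exactly the graded information a crisp chart-based classification discards.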