• Title/Summary/Keyword: data classification

Estimation of Classification Error Based on the Bhattacharyya Distance for Data with Multimodal Distribution (Multimodal 분포 데이터를 위한 Bhattacharyya distance 기반 분류 에러예측 기법)

  • 최의선;이철희
    • Proceedings of the IEEK Conference / 2000.06d / pp.85-87 / 2000
  • In pattern classification, the Bhattacharyya distance has been used as a class separability measure and provides useful information for feature selection and extraction. In this paper, we propose a method to predict the classification error for multimodal data based on the Bhattacharyya distance. In our approach, we first approximate the probability density function (pdf) of a multimodal distribution with a Gaussian mixture model and then compute the Bhattacharyya distance and the classification error. Experimental results showed a strong relationship between the Bhattacharyya distance and the classification error for multimodal data.
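The abstract gives no formulas, so the following is only a minimal Python sketch of the quantities involved, not the authors' method. It assumes two 1-D classes, each modelled as a Gaussian mixture, and computes the Bhattacharyya distance D_B = -ln(integral of sqrt(p1(x) p2(x)) dx) by numerical integration (mixtures generally have no closed form), together with the Bayes error and the classical bound P_e <= sqrt(P1 P2) exp(-D_B). The integration limits, step count, and function names are illustrative assumptions.

```python
import math


def gmm_pdf(x, weights, means, stds):
    """Density of a 1-D Gaussian mixture model evaluated at x."""
    return sum(
        w * math.exp(-0.5 * ((x - m) / s) ** 2) / (s * math.sqrt(2.0 * math.pi))
        for w, m, s in zip(weights, means, stds)
    )


def bhattacharyya_and_bayes_error(p1, p2, prior1=0.5, lo=-20.0, hi=20.0, n=20000):
    """Midpoint-rule integration of two 1-D class densities p1, p2 on [lo, hi].

    Returns (Bhattacharyya distance, Bayes error, Bhattacharyya error bound).
    """
    prior2 = 1.0 - prior1
    dx = (hi - lo) / n
    coeff = 0.0  # Bhattacharyya coefficient: integral of sqrt(p1 * p2)
    error = 0.0  # Bayes error: integral of min(P1 * p1, P2 * p2)
    for i in range(n):
        x = lo + (i + 0.5) * dx
        a, b = p1(x), p2(x)
        coeff += math.sqrt(a * b) * dx
        error += min(prior1 * a, prior2 * b) * dx
    distance = -math.log(coeff)
    bound = math.sqrt(prior1 * prior2) * coeff  # equals sqrt(P1 P2) * exp(-distance)
    return distance, error, bound


if __name__ == "__main__":
    bimodal = lambda x: gmm_pdf(x, [0.5, 0.5], [-3.0, 3.0], [1.0, 1.0])
    unimodal = lambda x: gmm_pdf(x, [1.0], [0.0], [1.0])
    d, err, bound = bhattacharyya_and_bayes_error(bimodal, unimodal)
    print(f"D_B = {d:.3f}, Bayes error = {err:.3f}, bound = {bound:.3f}")
```

Because min(a, b) <= sqrt(ab) pointwise, the computed Bayes error never exceeds the bound; the paper's contribution is an empirical mapping from D_B to the error that is tighter than this bound.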

A Machine learning Approach for Knowledge Base Construction Incorporating GIS Data for land Cover Classification of Landsat ETM+ Image (지식 기반 시스템에서 GIS 자료를 활용하기 위한 기계 학습 기법에 관한 연구 - Landsat ETM+ 영상의 토지 피복 분류를 사례로)

  • Kim, Hwa-Hwan;Ku, Cha-Yang
    • Journal of the Korean Geographical Society / v.43 no.5 / pp.761-774 / 2008
  • Integrating GIS data and human expert knowledge into digital image processing has long been acknowledged as necessary for improving remote sensing image analysis. We propose an inductive machine learning algorithm for GIS data integration and a rule-based method for land cover classification. The proposed method is tested on land cover classification of a Landsat ETM+ multispectral image together with GIS data layers including elevation, aspect, slope, distance to water bodies, distance to the road network, and population density. Decision trees and production rules for land cover classification are generated by the C5.0 inductive machine learning algorithm from 350 stratified random point samples. The production rules are then used for land cover classification, integrated with unsupervised ISODATA classification. The results show that GIS data layers such as elevation, distance to water bodies, and population density can be effectively integrated into rule-based image classification, and that the intuitive production rules generated by inductive machine learning are easy to understand. The proposed method demonstrates how various GIS data layers can be integrated with remotely sensed imagery within a knowledge base construction framework to improve land cover classification.
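C5.0 is a specific program whose internals the abstract does not describe. As a hedged illustration of the kind of step such inductive learners perform, the Python sketch below picks the single threshold on one numeric GIS layer that maximises information gain and reads it as a production rule. C4.5 and C5.0 normalise this gain (the gain ratio) and grow and prune full trees, which is not shown. The sample values and class names are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def best_threshold(values, labels):
    """Return (threshold, information gain) of the best binary split 'value <= t'."""
    pairs = sorted(zip(values, labels))
    base = entropy(labels)
    best_t, best_gain = None, 0.0
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold fits between equal values
        t = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [lab for _, lab in pairs[:i]]
        right = [lab for _, lab in pairs[i:]]
        remainder = (len(left) * entropy(left) + len(right) * entropy(right)) / len(pairs)
        if base - remainder > best_gain:
            best_t, best_gain = t, base - remainder
    return best_t, best_gain


if __name__ == "__main__":
    # Hypothetical elevation samples (m) with land cover labels
    elevation = [100, 120, 130, 400, 450, 500]
    cover = ["urban", "urban", "urban", "forest", "forest", "forest"]
    t, gain = best_threshold(elevation, cover)
    print(f"IF elevation <= {t} THEN urban ELSE forest (gain {gain:.2f} bits)")
```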

Temporal Associative Classification based on Calendar Patterns (캘린더 패턴 기반의 시간 연관적 분류 기법)

  • Lee Heon Gyu;Noh Gi Young;Seo Sungbo;Ryu Keun Ho
    • Journal of KIISE:Databases / v.32 no.6 / pp.567-584 / 2005
  • Temporal data mining, which incorporates temporal semantics into existing data mining techniques, refers to a set of techniques for discovering implicit and useful temporal knowledge from temporal data. Association rules and classification are typical data mining problems applied in various applications. However, these approaches do not consider temporal attributes and have been used to discover knowledge from static data, although a large proportion of data has a temporal dimension. Moreover, data mining research on temporal data has addressed discovering knowledge from time-stamped data and adding time constraints, so it does not consider the temporal semantics and temporal relationships contained in the data. This paper proposes a temporal associative classification technique based on temporal class association rules. It extends existing associative classification with a temporal dimension, applying the rules discovered by temporal class association rule mining to generate temporal classification rules. As a result, this technique can discover more useful knowledge than typical classification techniques.
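The abstract does not define its calendar patterns or its rule-mining procedure. The Python sketch below only illustrates the general idea under stated assumptions: a calendar pattern is taken to be a hypothetical (year, month, weekday) triple with None as a wildcard, and a class association rule "items -> class" is scored for support and confidence only over records whose timestamps match that pattern. The field names and toy records are hypothetical.

```python
from datetime import date


def matches(pattern, day):
    """True if a date fits a (year, month, weekday) pattern; None means '*'.

    weekday follows date.weekday(): Monday == 0 ... Sunday == 6.
    """
    year, month, weekday = pattern
    return (
        (year is None or day.year == year)
        and (month is None or day.month == month)
        and (weekday is None or day.weekday() == weekday)
    )


def rule_stats(records, antecedent, label, pattern):
    """Support and confidence of 'antecedent -> label' within a calendar pattern."""
    scoped = [r for r in records if matches(pattern, r["date"])]
    covered = [r for r in scoped if antecedent <= r["items"]]  # subset test
    hits = [r for r in covered if r["label"] == label]
    support = len(hits) / len(scoped) if scoped else 0.0
    confidence = len(hits) / len(covered) if covered else 0.0
    return support, confidence


if __name__ == "__main__":
    records = [
        {"date": date(2024, 1, 1), "items": {"a", "b"}, "label": "high"},  # Monday
        {"date": date(2024, 1, 8), "items": {"a"}, "label": "high"},       # Monday
        {"date": date(2024, 1, 15), "items": {"a"}, "label": "low"},       # Monday
        {"date": date(2024, 1, 2), "items": {"a"}, "label": "low"},        # Tuesday
        {"date": date(2024, 1, 3), "items": {"b"}, "label": "low"},        # Wednesday
    ]
    print(rule_stats(records, {"a"}, "high", (None, None, None)))  # all days
    print(rule_stats(records, {"a"}, "high", (None, None, 0)))     # every Monday
```

On this toy data the rule {a} -> high has confidence 0.5 over all days but 2/3 on Mondays, which is the kind of time-dependent regularity a non-temporal rule miner would average away.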

A Sliding Window-based Multivariate Stream Data Classification (슬라이딩 윈도우 기반 다변량 스트림 데이타 분류 기법)

  • Seo, Sung-Bo;Kang, Jae-Woo;Nam, Kwang-Woo;Ryu, Keun-Ho
    • Journal of KIISE:Databases / v.33 no.2 / pp.163-174 / 2006
  • In a distributed wireless sensor network, it is difficult to transmit and analyze the entire stream of data because of limited network bandwidth, power, and processing capacity. It is therefore preferable to classify the continuous stream data first and then apply alternative stream processing. We propose a classification framework for continuous multivariate stream data that works in two steps. In the preprocessing step, it takes a sliding window of multivariate stream data as input and discretizes the data in the window into a string of symbols that characterize the signal changes. In the classification step, it uses a standard text classification algorithm to classify the discretized data in the window. We evaluated both supervised and unsupervised classification algorithms: for supervised classification, a Bayesian classifier and SVM; for unsupervised classification, Jaccard, TFIDF, Jaro, and Jaro-Winkler. In our experiments, SVM and TFIDF outperformed the other classification methods. In particular, classification accuracy improved when the correlation between attributes was considered along with the n-gram tokens of symbols.
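The two steps above (symbolize each sliding window, then classify the symbol strings as text) can be sketched in Python as below. This is an illustration under assumptions, not the authors' framework: the up/down/flat alphabet, the threshold eps, the joint per-step symbols (one simple way to let tokens reflect attributes moving together), bigram tokens, and nearest-prototype labelling by Jaccard similarity are all hypothetical choices.

```python
def sliding_windows(stream, size, step=1):
    """Yield consecutive windows (lists of rows) over a multivariate stream."""
    for start in range(0, len(stream) - size + 1, step):
        yield stream[start:start + size]


def symbolize(series, eps=0.1):
    """Map each change between consecutive values to u(p), d(own) or f(lat)."""
    out = []
    for prev, cur in zip(series, series[1:]):
        delta = cur - prev
        out.append("u" if delta > eps else "d" if delta < -eps else "f")
    return "".join(out)


def window_tokens(window, n=2, eps=0.1):
    """Joint per-step symbols across all attributes, then n-gram tokens."""
    columns = list(zip(*window))  # one series per attribute
    per_attr = [symbolize(col, eps) for col in columns]
    steps = ["".join(chars) for chars in zip(*per_attr)]  # "uf" = attr0 up, attr1 flat
    return {"|".join(steps[i:i + n]) for i in range(len(steps) - n + 1)}


def jaccard(a, b):
    """Jaccard similarity of two token sets."""
    return len(a & b) / len(a | b) if a | b else 1.0


def classify(window, prototypes):
    """Label of the prototype token set most similar to this window."""
    tokens = window_tokens(window)
    return max(prototypes, key=lambda label: jaccard(tokens, prototypes[label]))


if __name__ == "__main__":
    # Hypothetical 2-attribute stream (e.g. temperature, humidity)
    stream = [[20.0, 50.0], [20.5, 50.0], [21.0, 50.0], [21.0, 49.0], [20.0, 48.0]]
    prototypes = {
        "warming": window_tokens([[0, 0], [1, 0], [2, 0], [3, 0]]),
        "drying": window_tokens([[0, 3], [0, 2], [0, 1], [0, 0]]),
    }
    for w in sliding_windows(stream, size=3):
        print(sorted(window_tokens(w)), classify(w, prototypes))
```

Swapping `jaccard` for a TF-IDF cosine or a supervised text classifier over the same tokens would follow the paper's comparison more closely.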

Classification Performance Analysis of Silicon Wafer Micro-Cracks Based on SVM (SVM 기반 실리콘 웨이퍼 마이크로크랙의 분류성능 분석)

  • Kim, Sang Yeon;Kim, Gyung Bum
    • Journal of the Korean Society for Precision Engineering / v.33 no.9 / pp.715-721 / 2016
  • In this paper, the classification rate of micro-cracks in silicon wafers was improved using an SVM. In case I, we investigated how micro-crack feature data and SVM parameters affect the classification rate. The weighting vector and bias did not affect the classification rate, whereas a high cost parameter and a sigmoid kernel function improved it. Case II used higher-quality images than case I, and the learning data and input data were found to have a large effect on the classification rate. Finally, in case III, images from cases I and II and from another illumination system were used; despite the differing image conditions, good classification rates were achieved. The critical factors for improving micro-crack classification are the SVM parameters, the kernel function, clustered feature data, and the experimental conditions. In the future, better results could be obtained through SVM parameter tuning and clustered feature data.
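To make the parameters named above concrete (the sigmoid kernel, the weights, and the bias), here is a minimal Python sketch of how an already-trained kernel SVM scores a new feature vector. The support vectors, coefficients, and kernel parameters are hypothetical, and training is not shown; during training, the cost parameter C caps each dual coefficient (0 <= alpha_i <= C).

```python
import math


def sigmoid_kernel(x, z, gamma=0.5, coef0=0.0):
    """K(x, z) = tanh(gamma * <x, z> + coef0)."""
    return math.tanh(gamma * sum(a * b for a, b in zip(x, z)) + coef0)


def decision(x, support_vectors, labels, alphas, bias, gamma=0.5, coef0=0.0):
    """f(x) = sum_i alpha_i * y_i * K(x_i, x) + b; the sign gives the class."""
    return sum(
        a * y * sigmoid_kernel(sv, x, gamma, coef0)
        for sv, y, a in zip(support_vectors, labels, alphas)
    ) + bias


def predict(x, support_vectors, labels, alphas, bias, gamma=0.5, coef0=0.0):
    """+1 (e.g. 'crack') if f(x) >= 0, otherwise -1 ('no crack')."""
    f = decision(x, support_vectors, labels, alphas, bias, gamma, coef0)
    return 1 if f >= 0 else -1


if __name__ == "__main__":
    # Hypothetical 2-D crack features and a toy "trained" model
    svs = [[1.0, 1.0], [-1.0, -1.0]]
    ys = [1, -1]
    alphas = [0.5, 0.5]
    print(predict([0.8, 1.2], svs, ys, alphas, bias=0.0))
```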

A Wavelet based Feature Selection Method to Improve Classification of Large Signal-type Data (웨이블릿에 기반한 시그널 형태를 지닌 대형 자료의 feature 추출 방법)

  • Jang, Woosung;Chang, Woojin
    • Journal of Korean Institute of Industrial Engineers / v.32 no.2 / pp.133-140 / 2006
  • Large signal-type data sets are difficult to classify, especially if they are non-stationary. In this paper, large, non-stationary signal-type data sets are wavelet transformed so that distinct features of the data are extracted in the wavelet domain rather than the time domain. For classification, a few wavelet coefficients that represent class properties are used as inputs to statistical classification methods such as linear discriminant analysis, quadratic discriminant analysis, and neural networks. Applying this wavelet-based feature selection method to a mass spectrometry data set for ovarian cancer diagnosis resulted in 100% classification accuracy.
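The abstract does not specify the wavelet family or the rule for choosing "a few" coefficients. The Python sketch below assumes an orthonormal Haar transform and ranks coefficients by a simple two-class separation score (difference of class means over pooled spread); both choices and the function names are illustrative assumptions. The selected coefficient indices would then index the features passed to LDA, QDA, or a neural network.

```python
import math


def haar_dwt(signal):
    """Full orthonormal Haar decomposition; len(signal) must be a power of two.

    Returns [final approximation, coarsest details, ..., finest details].
    """
    n = len(signal)
    if n == 0 or n & (n - 1):
        raise ValueError("signal length must be a power of two")
    approx, details = list(signal), []
    while len(approx) > 1:
        pairs = list(zip(approx[0::2], approx[1::2]))
        details = [(a - b) / math.sqrt(2) for a, b in pairs] + details
        approx = [(a + b) / math.sqrt(2) for a, b in pairs]
    return approx + details


def select_coefficients(class_a, class_b, k):
    """Indices of the k wavelet coefficients whose class means differ most,
    relative to their spread (a simple t-like score)."""
    coeffs_a = [haar_dwt(s) for s in class_a]
    coeffs_b = [haar_dwt(s) for s in class_b]

    def mean_var(rows, j):
        vals = [r[j] for r in rows]
        m = sum(vals) / len(vals)
        return m, sum((v - m) ** 2 for v in vals) / len(vals)

    scores = []
    for j in range(len(coeffs_a[0])):
        ma, va = mean_var(coeffs_a, j)
        mb, vb = mean_var(coeffs_b, j)
        scores.append(abs(ma - mb) / math.sqrt(va + vb + 1e-12))
    return sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)[:k]


if __name__ == "__main__":
    # Hypothetical length-4 signals: class A peaks early, class B peaks late
    class_a = [[4, 4, 2, 2], [5, 5, 2, 2]]
    class_b = [[2, 2, 4, 4], [2, 2, 5, 5]]
    print(select_coefficients(class_a, class_b, k=1))
```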

Multivariate Procedure for Variable Selection and Classification of High Dimensional Heterogeneous Data

  • Mehmood, Tahir;Rasheed, Zahid
    • Communications for Statistical Applications and Methods / v.22 no.6 / pp.575-587 / 2015
  • Advances in data collection techniques have produced high-dimensional data sets, in which discrimination is an important and commonly encountered problem; it is particularly crucial to resolve when the high-dimensional data are heterogeneous (i.e., the classes do not share a common variance-covariance structure). One example is classifying microbial habitat preferences based on codon/bi-codon usage. Habitat preference is important for studying evolutionary genetic relationships and may help industry produce specific enzymes. Most classification procedures assume homogeneity (a common variance-covariance structure for all classes), which is not guaranteed in most high-dimensional data sets. We introduce regularized elimination in partial least squares coupled with QDA (rePLS-QDA) for parsimonious variable selection and classification of high-dimensional heterogeneous data sets. It builds on the recently introduced regularized elimination for variable selection in partial least squares (rePLS) and on quadratic discriminant analysis (QDA), a classification procedure for heterogeneous classes. The proposed and existing methods are compared on a simulated data set; in addition, the proposed procedure is applied to classify microbial habitat preferences by their codon/bi-codon usage. Five bacterial habitats (Aquatic, Host Associated, Multiple, Specialized, and Terrestrial) are modeled. The classification accuracy for each habitat is satisfactory, ranging from 89.1% to 100% on test data. Interesting codons/bi-codons, and the mutual interactions among them that influence each habitat preference, are identified. The proposed method also produced results consistent with known biological characteristics, which will help researchers better understand species divergence.
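The rePLS variable-elimination step is beyond a short example, but the QDA half of rePLS-QDA can be sketched. The Python code below fits one mean vector and one covariance matrix per class (the heterogeneous case described above) and assigns a sample to the class maximising the standard QDA discriminant delta_k(x) = -1/2 ln|Sigma_k| - 1/2 (x - mu_k)^T Sigma_k^{-1} (x - mu_k) + ln pi_k. The small ridge term, the Cholesky-based evaluation, and the toy data are choices of this sketch, not part of the paper.

```python
import math


def mean_cov(rows, ridge=1e-6):
    """Sample mean and (n-1)-normalised covariance, with a small diagonal ridge."""
    n, p = len(rows), len(rows[0])
    mu = [sum(r[j] for r in rows) / n for j in range(p)]
    cov = [
        [
            sum((r[i] - mu[i]) * (r[j] - mu[j]) for r in rows) / (n - 1)
            + (ridge if i == j else 0.0)
            for j in range(p)
        ]
        for i in range(p)
    ]
    return mu, cov


def cholesky(a):
    """Lower-triangular L with L L^T = a (a must be positive definite)."""
    p = len(a)
    L = [[0.0] * p for _ in range(p)]
    for i in range(p):
        for j in range(i + 1):
            s = a[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L


def qda_fit(X, y):
    """Per-class (mean, Cholesky factor of covariance, log prior)."""
    model = {}
    for cls in sorted(set(y)):
        rows = [x for x, lab in zip(X, y) if lab == cls]
        mu, cov = mean_cov(rows)
        model[cls] = (mu, cholesky(cov), math.log(len(rows) / len(X)))
    return model


def qda_score(x, mu, L, log_prior):
    """QDA discriminant delta_k(x) evaluated via the Cholesky factor."""
    p = len(mu)
    z = [0.0] * p  # forward substitution: L z = x - mu, so |z|^2 is the Mahalanobis term
    for i in range(p):
        z[i] = (x[i] - mu[i] - sum(L[i][k] * z[k] for k in range(i))) / L[i][i]
    log_det = 2.0 * sum(math.log(L[i][i]) for i in range(p))
    return -0.5 * log_det - 0.5 * sum(v * v for v in z) + log_prior


def qda_predict(model, x):
    return max(model, key=lambda c: qda_score(x, *model[c]))


if __name__ == "__main__":
    # Hypothetical classes with the same centre but very different spread
    X = [[0.1, 0.0], [-0.1, 0.1], [0.0, -0.1], [0.05, 0.05],
         [2.0, 0.0], [-2.0, 0.5], [0.0, 2.0], [0.5, -2.0], [1.5, 1.5]]
    y = ["A"] * 4 + ["B"] * 5
    model = qda_fit(X, y)
    print(qda_predict(model, [0.0, 0.0]), qda_predict(model, [3.0, 3.0]))
```

Because both toy classes are centred near the origin, a common-covariance method such as LDA would struggle here, while QDA separates them by their spread.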

Knitted Data Glove System for Finger Motion Classification (손가락 동작 분류를 위한 니트 데이터 글러브 시스템)

  • Lee, Seulah;Choi, Yuna;Cha, Gwangyeol;Sung, Minchang;Bae, Jihyun;Choi, Youngjin
    • The Journal of Korea Robotics Society / v.15 no.3 / pp.240-247 / 2020
  • This paper presents a novel knitted data glove system for pattern classification of hand postures. Several experiments were conducted to confirm the performance of the knitted data glove. To find better sensor materials, the glove was fabricated from each of two representative conductive yarns, stainless-steel yarn and silver-plated yarn. The results showed that, as the measurement distance became longer, the signal of the glove made of silver-plated yarn was more stable than that of the stainless-steel yarn. Pattern classification was then conducted to verify the performance of the glove knitted with silver-plated yarn. The average classification accuracy reached 100% for all postures except the pointing-finger posture, and the overall classification accuracy of the knitted data glove was 98.3%. Based on these results, we expect the knitted data glove to be applicable to various robotics fields, including human-machine interfaces.

Text Classification with Heterogeneous Data Using Multiple Self-Training Classifiers

  • William Xiu Shun Wong;Donghoon Lee;Namgyu Kim
    • Asia pacific journal of information systems / v.29 no.4 / pp.789-816 / 2019
  • Text classification is a challenging task, especially when dealing with a large amount of text data. The performance of a classification model can vary depending on the types of words in the document corpus and the types of features generated for classification. Rather than proposing a modified version of an existing algorithm or a new algorithm, we attempt to change how the data are used. Classifier performance is usually affected by the quality of the learning data, since the classifier is built from these training data. We assume that data from different domains may have different noise characteristics, which can be exploited when learning the classifier. We therefore attempt to enhance the robustness of the classifier, and thus its accuracy, by artificially injecting heterogeneous data into the learning process. A semi-supervised approach was applied to use the heterogeneous data in learning the document classifier. However, unlabeled data can degrade the performance of the document classifier, so we further propose an algorithm that extracts only the documents that contribute to improving the classifier's accuracy.
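The abstract names a semi-supervised self-training approach and a filter that keeps only helpful documents, but not the base classifier or the filtering rule. The Python sketch below assumes a multinomial naive Bayes classifier and a held-out validation set: each unlabeled document is pseudo-labeled by the current model and kept only if retraining with it does not lower validation accuracy. It is a simplified stand-in for the paper's multiple self-training classifiers and extraction algorithm, not a reproduction of them, and the toy corpus is hypothetical.

```python
import math
from collections import Counter, defaultdict


def train_nb(docs, labels, alpha=1.0):
    """Multinomial naive Bayes with Laplace smoothing over whitespace tokens."""
    vocab = {w for d in docs for w in d.split()}
    word_counts = defaultdict(Counter)
    for doc, y in zip(docs, labels):
        word_counts[y].update(doc.split())
    model = {}
    for y, n_y in Counter(labels).items():
        total = sum(word_counts[y].values()) + alpha * len(vocab)
        log_lik = {w: math.log((word_counts[y][w] + alpha) / total) for w in vocab}
        model[y] = (math.log(n_y / len(docs)), log_lik)
    return model


def predict_nb(model, doc):
    def score(y):
        log_prior, log_lik = model[y]
        # words never seen in training are ignored
        return log_prior + sum(log_lik[w] for w in doc.split() if w in log_lik)
    return max(model, key=score)


def accuracy(model, docs, labels):
    return sum(predict_nb(model, d) == y for d, y in zip(docs, labels)) / len(docs)


def self_train(docs, labels, unlabeled, val_docs, val_labels):
    """Pseudo-label each unlabeled doc; keep it only if validation accuracy does not drop."""
    docs, labels = list(docs), list(labels)
    model = train_nb(docs, labels)
    best = accuracy(model, val_docs, val_labels)
    added = 0
    for doc in unlabeled:
        guess = predict_nb(model, doc)
        candidate = train_nb(docs + [doc], labels + [guess])
        score = accuracy(candidate, val_docs, val_labels)
        if score >= best:
            docs.append(doc)
            labels.append(guess)
            model, best, added = candidate, score, added + 1
    return model, added


if __name__ == "__main__":
    model, added = self_train(
        ["goal match team", "election vote party"], ["sport", "politics"],
        ["team win goal", "party vote campaign"],
        ["match win", "campaign election"], ["sport", "politics"],
    )
    print(added, predict_nb(model, "win goal"))
```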

Updating Land Cover Classification Using Integration of Multi-Spectral and Temporal Remotely Sensed Data (다중분광 및 다중시기 영상자료 통합을 통한 토지피복분류 갱신)

  • Jang, Dong-Ho;Chung, Chang-Jo F.
    • Journal of the Korean Geographical Society / v.39 no.5 s.104 / pp.786-803 / 2004
  • Interest in land cover classification using not only multi-sensor data but also thematic GIS information is increasing. Although useful GIS information is often available for classification, traditional methods such as the maximum likelihood estimation technique (MLE) do not allow us to use it, because the MLE and existing computer programs cannot handle GIS data properly. We propose a new method for updating image classification using multi-spectral and multi-temporal images. In this study, we extended the MLE to accommodate both multi-spectral image data and land cover data simultaneously for land cover classification. In addition, we extended the empirical likelihood ratio estimation technique (LRE), a non-parametric technique, to handle both multi-spectral image data and land cover data simultaneously. The proposed procedures were evaluated using a land cover map based on Landsat ETM+ images of the Anmyeon-do area in South Korea. The proposed methods showed considerable improvements in classification accuracy compared with classification using single-spectral data: overall accuracy improved by 6.2% with the MLE and by 9.2% with the LRE. The case study also showed that the proposed methods enable extraction of areas where land cover has changed. In conclusion, land cover classification that combines various GIS spatial data with multi-spectral images is useful for incorporating complementary data to make more accurate decisions.
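The abstract does not give the extended MLE formulation. As one hedged illustration of how an existing land cover map can enter a maximum likelihood classifier, the Python sketch below adds log P(current class | previous cover) to each class's Gaussian log-likelihood. The diagonal covariance, the transition table, and all reflectance numbers are hypothetical, and the LRE variant is not shown.

```python
import math


def log_gauss(x, mean, var):
    """Log density of a univariate normal distribution."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)


def classify_pixel(bands, class_stats, transition, previous_cover):
    """Maximum likelihood label with a prior taken from an older land cover map.

    class_stats[c]: per-band (mean, variance); bands are treated as independent
    (diagonal covariance), a simplification of the usual full-covariance MLE.
    transition[prev][c]: assumed P(current class c | previous cover prev).
    """
    def score(c):
        spectral = sum(log_gauss(x, m, v) for x, (m, v) in zip(bands, class_stats[c]))
        return spectral + math.log(transition[previous_cover][c])
    return max(class_stats, key=score)


if __name__ == "__main__":
    # Hypothetical two-band class statistics and cover-transition probabilities
    stats = {"forest": [(0.3, 0.01), (0.6, 0.01)], "urban": [(0.5, 0.01), (0.4, 0.01)]}
    trans = {"forest": {"forest": 0.9, "urban": 0.1},
             "urban": {"forest": 0.05, "urban": 0.95}}
    ambiguous = (0.4, 0.5)  # spectrally halfway between the two classes
    print(classify_pixel(ambiguous, stats, trans, "forest"))
    print(classify_pixel(ambiguous, stats, trans, "urban"))
    print(classify_pixel((0.3, 0.6), stats, trans, "urban"))
```

The old map only breaks ties for spectrally ambiguous pixels; a pixel with a strong spectral signal can still switch class, which is how such a model can flag land cover change.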