• Title/Abstract/Keywords: Data Classification Systems


Impact of Instance Selection on kNN-Based Text Categorization

  • Barigou, Fatiha
    • Journal of Information Processing Systems / Vol. 14, No. 2 / pp. 418-434 / 2018
  • With the increasing use of the Internet and electronic documents, automatic text categorization has become imperative. Several machine learning algorithms have been proposed for text categorization. The k-nearest neighbor algorithm (kNN) is known to be one of the best state-of-the-art classifiers for text categorization. However, kNN suffers from limitations such as the high computational cost of classifying new instances. Instance selection techniques have emerged as highly competitive methods for improving kNN through data reduction. However, previous works have evaluated those approaches only on structured datasets, and their performance has not been examined in the text categorization domain, where both the dimensionality and the size of the dataset are very high. Motivated by these observations, this paper investigates and analyzes the impact of instance selection on kNN-based text categorization in terms of classification accuracy, classification efficiency, and data reduction.
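
The abstract above does not say which instance selection techniques were evaluated. As an illustration only, the following minimal sketch uses a Hart-style condensed nearest neighbor rule, one well-known instance selection scheme, on toy 2-D points rather than text vectors. The function names and the data are hypothetical.

```python
import math

def nearest_label(store, x):
    """Label of the stored instance closest to x (1-NN, Euclidean distance)."""
    return min(store, key=lambda item: math.dist(item[0], x))[1]

def condense(instances):
    """Hart-style condensing: keep only the instances the current store misclassifies."""
    store = [instances[0]]
    changed = True
    while changed:
        changed = False
        for x, y in instances:
            if (x, y) in store:
                continue  # already kept; avoids re-adding duplicates
            if nearest_label(store, x) != y:
                store.append((x, y))
                changed = True
    return store

# Hypothetical toy training set: two well-separated classes.
train = [((0.0, 0.0), "a"), ((0.1, 0.2), "a"), ((0.2, 0.1), "a"),
         ((1.0, 1.0), "b"), ((0.9, 1.1), "b"), ((1.1, 0.9), "b")]
reduced = condense(train)
print(len(train), "->", len(reduced))
```

The reduced set still classifies every training point correctly with 1-NN, which is the trade-off between data reduction and accuracy that the abstract sets out to measure.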

진동 아날로그 신호 기반의 이상상황 탐지를 위한 기계학습 모형의 성능지표 향상 (Improving the Performance of Machine Learning Models for Anomaly Detection based on Vibration Analog Signals)

  • 김재훈;엄상천;박철순
    • 산업경영시스템학회지 (Journal of Society of Korea Industrial and Systems Engineering) / Vol. 47, No. 2 / pp. 1-9 / 2024
  • New motor development requires high-speed load testing using dynamo equipment to calculate the efficiency of the motor. Abnormal noise and vibration may occur in test equipment rotating at high speed because of misalignment of the connecting shaft or loose fixation, which may lead to safety accidents. In this study, three single-axis vibration sensors for the X, Y, and Z axes were attached to the surface of the test motor to measure vibration. Analog data collected from these sensors were used in classification models for anomaly detection. Since the classification accuracy was only around 93%, commonly used hyperparameter optimization techniques such as grid search, random search, and Bayesian optimization were applied to increase accuracy. In addition, the Response Surface Method based on Design of Experiments was used for hyperparameter optimization. However, these methods improved accuracy only to a limited extent, because raw samples of an analog signal do not reflect the patterns hidden in the signal. Therefore, to capture pattern information in the sampled data, we computed descriptive statistics of the analog data, such as mean, variance, skewness, kurtosis, and percentiles, and applied them to the classification models. Classification models using descriptive statistics showed excellent performance improvement. The developed model can be used as a monitoring system that detects abnormal conditions during motor testing.
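
The descriptive-statistics features named in the abstract above can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the window length, the choice of quartiles as the percentiles, and the function name `window_features` are assumptions.

```python
import statistics

def window_features(samples):
    """Summarize one window of vibration samples with descriptive statistics."""
    n = len(samples)
    mean = statistics.fmean(samples)
    var = statistics.pvariance(samples, mu=mean)  # population variance
    std = var ** 0.5
    if std == 0:
        skew = kurt = 0.0  # constant window: shape statistics are undefined, use 0
    else:
        # Standardized third and fourth central moments (kurtosis is not excess).
        skew = sum((s - mean) ** 3 for s in samples) / (n * std ** 3)
        kurt = sum((s - mean) ** 4 for s in samples) / (n * std ** 4)
    q1, median, q3 = statistics.quantiles(samples, n=4)  # 25th/50th/75th percentiles
    return {"mean": mean, "variance": var, "skewness": skew,
            "kurtosis": kurt, "p25": q1, "p50": median, "p75": q3}

# Hypothetical window of single-axis sensor readings.
features = window_features([1.0, 2.0, 3.0, 4.0, 5.0])
print(features)
```

One such feature vector per window and per axis would then replace the raw samples as classifier input.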

얼굴 인식 성능 향상을 위한 재분류 방법 (Re-classifying Method for Face Recognition)

  • 배경률
    • 지능정보연구 (Journal of Intelligence and Information Systems) / Vol. 10, No. 3 / pp. 105-114 / 2004
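  • As interest in biometrics has grown in recent years, it has been actively applied in security areas such as access control and user authentication. Face recognition in particular is increasingly used because, among biometric technologies, it is convenient for users and causes little aversion to contact. Compared with other recognition technologies, however, it is weak in recognition accuracy and re-attempt rate. To compensate for these weaknesses, this paper proposes an approach that re-classifies recognition results using a data classification algorithm. For the experiment, PCA, a representative appearance-based algorithm, was used. Applying the proposed re-classification approach to 200 subjects (200 face images in total) confirmed that performance improved in the re-recognition case.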


자료변환 기반 특징과 다중 분류자를 이용한 다중시기 SAR자료의 분류 (Classification of Multi-temporal SAR Data by Using Data Transform Based Features and Multiple Classifiers)

  • 유희영;박노욱;홍석영;이경도;김예슬
    • 대한원격탐사학회지 (Korean Journal of Remote Sensing) / Vol. 31, No. 3 / pp. 205-214 / 2015
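  • This study proposes a new land-cover classification method for multi-temporal SAR data that combines several features extracted by data transforms with a variety of classification methods. First, to extract new information that differs from the original data, the multi-temporal SAR data are transformed using principal component analysis and a 3D wavelet transform. Three different classifiers (a maximum likelihood classifier, a neural network, and a support vector machine) are then applied to three datasets, namely the transformed feature sets and the original backscattering coefficient data, to obtain diverse initial classification results. All initial results are then combined through a majority voting rule to produce the final classification result. In a case study using multi-temporal ENVISAT ASAR data, the initial results showed widely varying classification accuracy depending on the feature set and classifier used. The final result, which combines these nine initial classification results, showed the highest classification accuracy, because each initial result provides complementary information for determining land cover. The improvement in classification accuracy stems mainly from the diversity obtained by combining different feature sets derived through data transformation with different classifiers. The proposed land-cover classification methodology can therefore be applied effectively to the classification of multi-temporal SAR data and can also be extended to multi-sensor remote sensing data fusion.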
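
The majority voting step described above can be sketched as follows. This is a minimal illustration that assumes each initial result is a flat list of per-pixel labels. It fuses three hypothetical label maps for brevity, whereas the study combines nine. How the authors break ties is not stated; here a tie goes to the label that appears first among the votes.

```python
from collections import Counter

def majority_vote(label_maps):
    """Fuse per-pixel labels from several classifiers by plurality vote."""
    fused = []
    for votes in zip(*label_maps):  # votes for one pixel, one per classifier
        fused.append(Counter(votes).most_common(1)[0][0])
    return fused

# Hypothetical initial results for four pixels from three classifiers.
initial_results = [
    ["rice", "water", "forest", "rice"],
    ["rice", "water", "rice", "urban"],
    ["field", "water", "forest", "urban"],
]
final_map = majority_vote(initial_results)
print(final_map)
```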

Multivariate Gaussian Function을 이용한 지능형 집진기 운전상황 모니터링 시스템 개발 (Development of An Operation Monitoring System for Intelligent Dust Collector By Using Multivariate Gaussian Function)

  • 한윤종;김성호
    • 대한전기학회:학술대회논문집 (Korean Institute of Electrical Engineers: Conference Proceedings) / Proceedings of the 2006 KIEE Conference, Information and Control Section / pp. 470-472 / 2006
  • Sensor networks result from the convergence of important technologies such as wireless communication and micro-electromechanical systems. In recent years, sensor networks have found wide applicability in fields such as environment and health monitoring and industrial site monitoring. A very important step in many of these applications is pattern classification and recognition of data collected by sensors installed or deployed in different ways. However, pattern classification and recognition are sometimes difficult to perform. A systematic approach to pattern classification based on modern learning techniques, such as multivariate Gaussian mixture models, can greatly simplify the process of developing and implementing real-time classification models. This paper proposes a new recognition system that is hierarchically composed of many sensor nodes capable of simple processing and wireless communication. The proposed system performs context classification of sensed data using the multivariate Gaussian function. To verify its usefulness, the proposed system was applied to an intelligent dust collecting system.
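
Context classification with a multivariate Gaussian function, as mentioned above, might look like the following minimal sketch. It assumes two sensor channels, one full-covariance Gaussian per operating state, and classification by highest log-density. The state names, readings, and function names are hypothetical, not taken from the paper.

```python
import math

def fit_gaussian(points):
    """Estimate the mean and 2x2 covariance (sxx, sxy, syy) of 2-D readings."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mx) ** 2 for p in points) / n
    syy = sum((p[1] - my) ** 2 for p in points) / n
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / n
    return (mx, my), (sxx, sxy, syy)

def log_density(x, model):
    """Log of the bivariate Gaussian density at x (covariance must be non-singular)."""
    (mx, my), (sxx, sxy, syy) = model
    det = sxx * syy - sxy ** 2
    dx, dy = x[0] - mx, x[1] - my
    # Mahalanobis distance using the closed-form inverse of a 2x2 matrix.
    maha = (syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det
    return -0.5 * (2 * math.log(2 * math.pi) + math.log(det) + maha)

def classify(x, models):
    """Return the context whose Gaussian gives x the highest log-density."""
    return max(models, key=lambda name: log_density(x, models[name]))

# Hypothetical training readings for two operating contexts.
models = {
    "normal": fit_gaussian([(1.0, 2.0), (1.2, 2.1), (0.8, 1.8), (1.1, 1.9)]),
    "clogged": fit_gaussian([(3.0, 0.5), (3.2, 0.7), (2.9, 0.4), (3.1, 0.8)]),
}
print(classify((1.0, 2.0), models), classify((3.0, 0.6), models))
```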


Feature Impact Evaluation Based Pattern Classification System

  • Rhee, Hyun-Sook
    • 한국컴퓨터정보학회논문지 (Journal of the Korea Society of Computer and Information) / Vol. 23, No. 11 / pp. 25-30 / 2018
  • A pattern classification system is often an important component of intelligent systems. In this paper, we present a pattern classification system consisting of a feature selection module, a knowledge base construction module, and a decision module. We introduce a feature impact evaluation selection method based on fuzzy cluster analysis that considers the computational approach and the generalization capability for the given data characteristics. A fuzzy neural network, OFUN-NET, based on an unsupervised-learning data mining technique, produces the knowledge base for representative clusters. A set of 240 blemish pattern images was prepared and applied to the proposed system. Experimental results show the feasibility of the proposed classification system as an automated defect inspection tool.

위상면궤적을 이용한 전력계통의 고장판별에 관한 연구 (A Study on the Classification of Arcing Faults in Power Systems using Phase Plane Trajectory Method)

  • 박남옥;신영철;안상필;여상민;김철환
    • 대한전기학회논문지:전력기술부문A (Transactions of the Korean Institute of Electrical Engineers: Power Engineering Section A) / Vol. 51, No. 5 / pp. 209-216 / 2002
  • Recently, demand for a stable supply of electric power has grown as living standards have risen. It has therefore become important to identify the cause of a fault in a power system at an early stage once it occurs. In this respect, accurate classification of arcing faults in power systems is vitally important. This paper presents a new classification method for arcing faults in power systems. To obtain data for various faults, including high impedance faults (HIFs) and low impedance faults (LIFs), an HIF model with a ZnO arrester is adopted and implemented within the overall transmission system model based on the Electromagnetic Transients Program (EMTP). Phase plane trajectories of the Clarke modal transformation of postfault current and voltage are used to classify the types of arcing faults. The performance of the proposed method is tested on a typical 154 kV Korean transmission system under various fault conditions. The results show that the phase plane trajectory of the postfault current should be combined with that of the 0 component from the Clarke modal transformation for reliable, clear fault classification. The proposed method can thus accurately classify arcing faults, including LIFs and HIFs, in power systems.
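
The abstract relies on the Clarke modal transformation without stating it. The sketch below shows the common amplitude-invariant form, which maps three phase quantities to alpha, beta, and 0 components. The scaling convention is an assumption, since the paper may use a different one, and the sample values are hypothetical.

```python
import math

def clarke(a, b, c):
    """Amplitude-invariant Clarke transform of phase quantities a, b, c."""
    alpha = (2.0 / 3.0) * (a - 0.5 * b - 0.5 * c)
    beta = (2.0 / 3.0) * (math.sqrt(3) / 2.0) * (b - c)
    zero = (a + b + c) / 3.0  # the 0 component: mean of the three phases
    return alpha, beta, zero

# Balanced three-phase set: the 0 component vanishes.
theta = 0.3
balanced = clarke(math.cos(theta),
                  math.cos(theta - 2 * math.pi / 3),
                  math.cos(theta + 2 * math.pi / 3))

# Hypothetical unbalanced sample: the phases no longer sum to zero.
unbalanced = clarke(2.0, -0.5, -0.5)
print(balanced, unbalanced)
```

For a balanced set the result reduces to (cos theta, sin theta, 0), so a nonzero 0 component appears only when the three phase quantities do not sum to zero.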

Design of Fuzzy Model for Data Mining

  • Kim, Do-Wan;Joo, Young-Hoon;Park, Jin-Bae
    • 한국지능시스템학회논문지 (Journal of the Korean Institute of Intelligent Systems) / Vol. 13, No. 1 / pp. 107-113 / 2003
  • A new GA-based methodology using information granules is suggested for the construction of fuzzy classifiers. The proposed scheme consists of three steps: selection of information granules, construction of the associated fuzzy sets, and tuning of the fuzzy rules. First, the genetic algorithm (GA) is applied to develop adequate information granules. The fuzzy sets are then constructed from an analysis of the developed information granules, and an interpretable fuzzy classifier is designed using the constructed fuzzy sets. Finally, the GA is used to tune the fuzzy rules, which can enhance classification performance on misclassified data (e.g., data with unusual patterns or on the class boundaries). To show the effectiveness of the proposed method, an example, the classification of the Iris data, is provided.

Efficient Extraction of Hierarchically Structured Rules Using Rough Sets

  • Lee, Chul-Heui;Seo, Seon-Hak
    • International Journal of Fuzzy Logic and Intelligent Systems / Vol. 4, No. 2 / pp. 205-210 / 2004
  • This paper deals with rule extraction from data using rough set theory. We construct the rule base in a hierarchical granulation structure by applying the core as a classification criterion at each level. When more than one core exists, coverage is used to select an appropriate one among them to increase the classification rate and accuracy. In addition, a probabilistic approach is suggested so that the partially useful information contained in inconsistent data can contribute to knowledge reduction, decreasing the effect of uncertainty or vagueness in the data. As a result, the proposed method yields a more suitable and efficient rule base in terms of compatibility and size. The simulation results show good performance despite very simple rules and short conditionals.

A Clustering Approach for Feature Selection in Microarray Data Classification Using Random Forest

  • Aydadenta, Husna;Adiwijaya, Adiwijaya
    • Journal of Information Processing Systems / Vol. 14, No. 5 / pp. 1167-1175 / 2018
  • Microarray data play an essential role in diagnosing and detecting cancer. Microarray analysis allows the examination of gene expression levels in specific cell samples, where thousands of genes can be analyzed simultaneously. However, microarray data have very few samples and high dimensionality. Therefore, a dimensionality reduction process is required to classify microarray data. Dimensionality reduction can eliminate redundancy in the data, so that the features used in classification are only those that have a high correlation with their class. There are two types of dimensionality reduction: feature selection and feature extraction. In this paper, we used the k-means algorithm as the clustering approach for feature selection. The proposed approach groups features that have the same characteristics into one cluster, so that redundancy in microarray data is removed. The features in each cluster are ranked using the Relief algorithm so that the best-scoring element of each cluster is obtained. The best elements of all clusters are selected and used as features in the classification process, which uses the Random Forest algorithm. Based on the simulation, the proposed approach achieved accuracies of 85.87%, 98.9%, and 89% on the Colon, Lung Cancer, and Prostate Tumor datasets, respectively. The accuracy of the proposed approach is therefore higher than that of Random Forest without clustering.
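
The selection step described above, keeping the best-scoring feature of each cluster, can be sketched as follows. This minimal illustration assumes the k-means cluster assignments and the Relief scores have already been computed. The gene names, cluster ids, and scores are hypothetical placeholders, not values from the paper.

```python
def best_feature_per_cluster(cluster_of, score_of):
    """Keep, for every cluster, the single feature with the highest relevance score."""
    best = {}
    for feature, cluster in cluster_of.items():
        if cluster not in best or score_of[feature] > score_of[best[cluster]]:
            best[cluster] = feature
    return sorted(best.values())

# Hypothetical k-means output: cluster id assigned to each gene (feature).
cluster_of = {"g1": 0, "g2": 0, "g3": 1, "g4": 1, "g5": 2}
# Hypothetical Relief relevance score for each gene.
score_of = {"g1": 0.12, "g2": 0.40, "g3": 0.33, "g4": 0.05, "g5": 0.21}

selected = best_feature_per_cluster(cluster_of, score_of)
print(selected)
```

Only the selected features would then be passed to the Random Forest classifier, one per cluster, so redundant genes in the same cluster are dropped.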