• Title/Abstract/Keywords: Data classification

Multiclass LS-SVM ensemble for large data

  • Hwang, Hyungtae
    • Journal of the Korean Data and Information Science Society / Vol. 26, No. 6 / pp. 1557-1563 / 2015
  • Multiclass classification is typically performed using a voting scheme that combines binary classifications. In this paper, we propose a multiclass classification method for large data that can be regarded as a revised one-vs-all method. Multiclass classification is performed using the hat matrix of a least squares support vector machine (LS-SVM) ensemble, which is obtained by aggregating individual LS-SVMs, each trained on a subset of the whole large data set. A cross validation function is defined to select the optimal values of the hyperparameters that affect the performance of the proposed multiclass LS-SVM, and a generalized cross validation function is derived to reduce the computational burden of cross validation. Experimental results are then presented to show the performance of the proposed method.
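The one-vs-all LS-SVM ensemble idea can be illustrated with a minimal pure-Python sketch. It assumes an RBF kernel, fixed hyperparameters `gamma` and `sigma`, and a stratified round-robin split of the data into subsets. For each subset and each class it solves the standard LS-SVM linear system with +1/-1 targets, and the ensemble averages the decision values over subsets before taking the largest. It does not reproduce the paper's hat-matrix formulation or its (generalized) cross validation, and all function names are hypothetical.

```python
import math


def rbf(x, z, sigma):
    # Gaussian (RBF) kernel between two feature vectors
    d2 = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-d2 / (2.0 * sigma ** 2))


def solve(A, rhs):
    # Gauss-Jordan elimination with partial pivoting (small dense systems only)
    n = len(A)
    M = [row[:] + [r] for row, r in zip(A, rhs)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(n):
            if r != col:
                f = M[r][col] / M[col][col]
                for c in range(col, n + 1):
                    M[r][c] -= f * M[col][c]
    return [M[i][n] / M[i][i] for i in range(n)]


def fit_lssvm(X, y, gamma, sigma):
    # LS-SVM system: [[0, 1^T], [1, K + I/gamma]] [b; alpha] = [0; y]
    n = len(X)
    A = [[0.0] + [1.0] * n]
    for i in range(n):
        A.append([1.0] + [rbf(X[i], X[j], sigma) + (1.0 / gamma if i == j else 0.0)
                          for j in range(n)])
    sol = solve(A, [0.0] + list(y))
    b, alpha = sol[0], sol[1:]
    return lambda x: b + sum(a * rbf(x, xi, sigma) for a, xi in zip(alpha, X))


def fit_ensemble(X, labels, n_subsets, gamma=10.0, sigma=1.0):
    classes = sorted(set(labels))
    # stratified round-robin split so every subset contains every class (an assumption)
    subsets = [[] for _ in range(n_subsets)]
    for c in classes:
        members = [i for i in range(len(X)) if labels[i] == c]
        for k, i in enumerate(members):
            subsets[k % n_subsets].append(i)
    models = []  # one {class: one-vs-all LS-SVM} dict per subset
    for sub in subsets:
        Xs = [X[i] for i in sub]
        models.append({c: fit_lssvm(Xs, [1.0 if labels[i] == c else -1.0 for i in sub],
                                    gamma, sigma) for c in classes})

    def predict(x):
        # average each class's decision value over the subsets, then pick the largest
        scores = {c: sum(m[c](x) for m in models) / len(models) for c in classes}
        return max(scores, key=scores.get)

    return predict
```

Solving one small system per subset is what keeps the cost manageable for large data: each solve is cubic only in the subset size, not in the full sample size.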

Combined Features with Global and Local Features for Gas Classification

  • Choi, Sang-Il
    • Journal of the Korea Society of Computer and Information (한국컴퓨터정보학회논문지) / Vol. 21, No. 9 / pp. 11-18 / 2016
  • In this paper, we propose a gas classification method for an electronic nose system that uses combined features and performs well even when part of a data sample is lost during measurement. We first divide the entire measurement of a data sample into three local sections (stabilization, exposure, and purge) and extract local features from each section. The amount of discriminative information in each feature is then measured based on discriminant analysis. The local features with a large amount of discriminative information are selected and combined with global features extracted from the entire measurement section of the data sample. The experimental results show that the combined features give better classification performance than other feature types for a variety of volatile organic compound data, especially when data loss occurs.
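The feature-combination step can be sketched as follows. The section boundaries, the per-section statistics (mean and maximum), the global statistics, and the use of a Fisher score (between-class over within-class scatter) as the measure of discriminative information are all assumptions made for illustration, not the paper's exact choices; the function names are hypothetical.

```python
from statistics import mean, pvariance


def section_features(signal, bounds):
    # hypothetical local features: mean and max of each measurement section
    feats = []
    for start, end in bounds:
        part = signal[start:end]
        feats += [mean(part), max(part)]
    return feats


def global_features(signal):
    # hypothetical global features over the whole measurement
    return [mean(signal), max(signal), min(signal)]


def fisher_score(values, labels):
    # between-class over within-class scatter, standing in for the
    # "amount of discriminative information" of one feature
    overall = mean(values)
    between = within = 0.0
    for c in set(labels):
        v = [x for x, lab in zip(values, labels) if lab == c]
        between += len(v) * (mean(v) - overall) ** 2
        within += len(v) * pvariance(v)
    if within == 0.0:
        return float("inf") if between > 0.0 else 0.0
    return between / within


def combined_features(signals, labels, bounds, k):
    local = [section_features(s, bounds) for s in signals]
    n_local = len(local[0])
    scores = [fisher_score([row[j] for row in local], labels) for j in range(n_local)]
    # keep the k most discriminative local features, in their original order
    keep = sorted(sorted(range(n_local), key=lambda j: scores[j], reverse=True)[:k])
    combined = [[row[j] for j in keep] + global_features(s) for row, s in zip(local, signals)]
    return keep, combined
```

With `bounds = [(0, 3), (3, 6), (6, 9)]` standing for the stabilization, exposure, and purge sections of a nine-point signal, classes that differ only during exposure yield `keep == [2, 3]`, i.e. the exposure-section mean and maximum.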

A study on forecasting of consumers' choice using artificial neural network

  • 송수섭;이의훈
    • Journal of the Korean Operations Research and Management Science Society (한국경영과학회지) / Vol. 26, No. 4 / pp. 55-70 / 2001
  • Artificial neural network (ANN) models have been widely used for classification problems in business, such as bankruptcy prediction and credit evaluation. Although applying ANNs to the classification of consumers' choice behavior is a promising research area, there have been only a few studies. Most studies have reported that ANN models classify better than conventional statistical models. Because survey data on consumer behavior may include much noise and missing data, ANN models are expected to be more robust than conventional statistical models, which require various assumptions. The purpose of this paper is to study the potential of ANN models for forecasting consumers' choice behavior based on survey data. The data were collected through questionnaires given to shoppers at department stores and discount stores. The correct classification rates of the ANN models on the training and test samples were then compared with those of multiple discriminant analysis (MDA) and a logistic regression (Logit) model. The ANN models outperformed the MDA and Logit models in terms of correct classification rate, and their performance improved further when the input variables identified as significant by stepwise MDA were used.
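As a point of reference for the comparison, here is a minimal sketch of a logistic regression (Logit) baseline fitted by batch gradient ascent, together with the correct classification rate used to compare the models. The learning rate, epoch count, function names, and toy data are hypothetical; an ANN or MDA model would be scored with the same rate on the same training and test samples.

```python
import math


def fit_logit(X, y, lr=0.1, epochs=500):
    # batch gradient ascent on the mean log-likelihood; the last weight is the intercept
    w = [0.0] * (len(X[0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for x, t in zip(X, y):
            z = sum(wi * xi for wi, xi in zip(w, x)) + w[-1]
            p = 1.0 / (1.0 + math.exp(-z))
            for j, xj in enumerate(x):
                grad[j] += (t - p) * xj
            grad[-1] += t - p
        w = [wi + lr * g / len(X) for wi, g in zip(w, grad)]
    return w


def predict(w, x):
    # predicted choice: 1 if the estimated probability exceeds 0.5
    z = sum(wi * xi for wi, xi in zip(w, x)) + w[-1]
    return 1 if z > 0 else 0


def correct_classification_rate(w, X, y):
    # share of cases whose predicted choice matches the observed choice
    return sum(predict(w, x) == t for x, t in zip(X, y)) / len(y)
```

Reporting the rate separately for the training and test samples, as the study does, guards against a model that merely memorizes the training responses.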

Hybrid CNN-SVM Based Seed Purity Identification and Classification System

  • Suganthi, M;Sathiaseelan, J.G.R.
    • International Journal of Computer Science & Network Security / Vol. 22, No. 10 / pp. 271-281 / 2022
  • Manual seed classification challenges can be overcome using a reliable and autonomous seed purity identification and classification technique, which is a highly practical and commercially important requirement of the agricultural industry. Using current machine learning and artificial intelligence approaches, researchers can create new data mining methods with improved accuracy. Seed classification can help with seed quality control and impurity identification. Seeds have traditionally been classified by characteristics such as colour, shape, and texture, usually by experts who visually examine each sample, which is a very time-consuming and tedious task. This task is simple to automate, which makes seed sorting far more efficient than manual inspection. Computer vision technologies based on machine learning (ML), symmetry, and, more specifically, convolutional neural networks (CNNs) have been widely used in related fields, in many cases resulting in greater labour efficiency. KNN, SVM, CNN, and hybrid CNN-SVM classification algorithms were used to sort a sample of 3000 seeds. The proposed hybrid system includes a model that uses advanced deep learning techniques to categorise several well-known seed types. In most cases, the CNN-SVM model outperformed the comparable SVM and CNN models, demonstrating the effectiveness of CNN-SVM for evaluating such data. The findings indicate that CNN-SVM can analyse this kind of data with promising results. Future studies should examine more seed varieties to extend the use of CNN-SVMs in data processing.
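A hybrid CNN-SVM typically feeds features from a trained CNN into an SVM classifier. The sketch below covers only the SVM stage: a one-vs-rest linear SVM trained by full-batch subgradient descent on the regularized hinge loss. The feature vectors are assumed to be precomputed (for example, from the CNN's penultimate layer); the CNN itself, the hyperparameters `lam`, `lr`, and `epochs`, and all function names are hypothetical and not taken from the paper.

```python
def train_linear_svm(F, y, lam=0.01, lr=0.1, epochs=200):
    # minimize lam/2 * ||w||^2 + mean hinge loss by full-batch subgradient descent
    # F: feature vectors (assumed precomputed by a CNN); y: labels in {-1, +1}
    n, d = len(F), len(F[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        gw = [lam * wi for wi in w]
        gb = 0.0
        for f, t in zip(F, y):
            # only samples inside the margin contribute to the hinge subgradient
            if t * (sum(wi * fi for wi, fi in zip(w, f)) + b) < 1:
                for j in range(d):
                    gw[j] -= t * f[j] / n
                gb -= t / n
        w = [wi - lr * g for wi, g in zip(w, gw)]
        b -= lr * gb
    return w, b


def train_one_vs_rest(F, labels, **kwargs):
    # one binary SVM per seed class
    return {c: train_linear_svm(F, [1 if lab == c else -1 for lab in labels], **kwargs)
            for c in sorted(set(labels))}


def classify(models, f):
    # assign the class whose SVM gives the largest decision value
    def score(c):
        w, b = models[c]
        return sum(wi * fi for wi, fi in zip(w, f)) + b
    return max(models, key=score)
```

Replacing the CNN's softmax layer with a margin-based classifier is the usual motivation for this kind of hybrid; the feature extractor is trained or borrowed separately.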

Comparison of Classification Rules Regarding SaMD Between the Regulation EU 2017/745 and the Directive 93/42/EEC

  • Ryu, Gyuha;Lee, Jiyoon
    • Journal of Biomedical Engineering Research (대한의용생체공학회:의공학회지) / Vol. 42, No. 6 / pp. 277-286 / 2021
  • The global market for AI-based SaMD for medical imaging is anticipated to reach around 620 billion won (518 million dollars) in 2023. To help Korean manufacturers efficiently obtain CE marking for marketing in EU countries, this paper offers recommendations and suggestions on how to reclassify SaMD according to the classification rules of the MDR, because the classification rules introduced by Regulation EU 2017/745 have been substantially modified and extended compared with Directive 93/42/EEC. The paper also presents several MDR rules that may be applicable when deciding the classification of SaMD. Lastly, because its suggestions and recommendations are intended to earn public trust, the paper examines and demonstrates various secondary data, derived from analyses of field data and supported by qualitative data. In conclusion, the paper finds that SaMD previously classified under the rules of the MDD should be reclassified based on Regulation EU 2017/745. The suggestions and recommendations should therefore help Korean manufacturers understand the classification of SaMD for marketing in EU countries.

Random projection ensemble adaptive nearest neighbor classification

  • 강종경;전명식
    • The Korean Journal of Applied Statistics (응용통계연구) / Vol. 34, No. 3 / pp. 401-410 / 2021
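  • The k-nearest neighbor classification method, widely used in discriminant analysis, considers only a fixed number of neighbors and therefore cannot reflect the local characteristics of the data. To address this, adaptive nearest neighbor methods that select the number of neighbors according to the local structure of the data have been developed. For high-dimensional data, it is common to reduce the dimension with techniques such as random projection before applying k-nearest neighbor classification, and a method that carefully combines the classification results from many random projections and makes the final assignment by voting has recently been developed. In this study, we propose a new discriminant classification method for high-dimensional data that combines the adaptive nearest neighbor method with the random projection ensemble technique. Simulations and real-data analyses confirm that the proposed method achieves better classification accuracy than previously developed methods.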
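A minimal sketch of a random-projection ensemble of k-nearest-neighbor classifiers with majority voting follows. It uses Gaussian random projections and a fixed `k`: the adaptive choice of the number of neighbors and any careful selection or weighting of projections described above are not implemented. The projection dimension `p`, the number of projections `n_proj`, and the function names are assumptions.

```python
import random
from collections import Counter


def random_projection(d, p, rng):
    # p x d matrix with i.i.d. standard normal entries
    return [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(p)]


def project(A, x):
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]


def knn_predict(Z, labels, z, k):
    # plain k-NN with a fixed k (the adaptive choice of k is not implemented)
    order = sorted(range(len(Z)), key=lambda i: sum((a - b) ** 2 for a, b in zip(Z[i], z)))
    return Counter(labels[i] for i in order[:k]).most_common(1)[0][0]


def rp_ensemble_knn(X, labels, x, n_proj=25, p=2, k=3, seed=0):
    rng = random.Random(seed)
    votes = Counter()
    for _ in range(n_proj):
        A = random_projection(len(X[0]), p, rng)
        Z = [project(A, xi) for xi in X]
        votes[knn_predict(Z, labels, project(A, x), k)] += 1
    # final assignment by majority vote over the projections
    return votes.most_common(1)[0][0]
```

Voting over many projections makes the result less sensitive to any single unlucky projection that happens to collapse the class structure.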

A Rule-based Urban Image Classification System for Time Series Landsat Data

  • Lee, Jin-A;Lee, Sung-Soon;Chi, Kwang-Hoon
    • Korean Journal of Remote Sensing (대한원격탐사학회지) / Vol. 27, No. 6 / pp. 637-651 / 2011
  • This study presents a rule-based urban image classification method for time series analysis of changes in the vicinity of Asan-si and Cheonan-si in Chungcheongnam-do, using Landsat satellite images (1991-2006). The area has been highly developed through the relocation of industrial facilities, land development, construction of a high-speed railroad, and an extension of the subway. To determine the yearly pattern of change in the urban area, eleven classes were defined according to the trend of development. The rules were generalized into an algorithm that can be applied as an unsupervised classification, without the need for training areas. The analysis shows that the urban zone of the study area increased by about 1.53 times, and correlation graphs confirmed the distribution of Built Up Index (BUI) values for each class. The rule-based classification was evaluated in terms of coverage and accuracy. With an optimal allowable factor of 0.36, the coverage of the rules was 98.4%, and a test using ground data from 1991 to 2006 gave an overall accuracy of 99.49%. It was confirmed that the proposed method for determining the maximum allowable factor is consistent with the accuracy test results based on ground data. Classification accuracy could be improved because the available data among the multiple images were used as fully as possible and an optimal classification suited to the objectives was possible. The rule-based urban image classification method is expected to be applicable to time series image analyses such as thematic mapping of urban development and monitoring of environmental changes.
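The abstract does not define the Built Up Index, so the sketch below assumes the common definition BUI = NDBI - NDVI, with NDBI = (SWIR - NIR)/(SWIR + NIR) and NDVI = (NIR - Red)/(NIR + Red). It pairs this with a hypothetical rule set that labels each pixel by the first year from which its BUI stays above a threshold, and computes coverage as the share of pixels that some rule labels. The threshold, acquisition years, class names, and band values are illustrative; the paper's eleven classes and its allowable factor are not modeled.

```python
def ndvi(red, nir):
    return (nir - red) / (nir + red)


def ndbi(nir, swir):
    return (swir - nir) / (swir + nir)


def bui(red, nir, swir):
    # Built Up Index, assumed here to be NDBI - NDVI
    return ndbi(nir, swir) - ndvi(red, nir)


def classify_series(series, years, threshold=0.0):
    # hypothetical rules on a pixel's BUI time series; None means no rule applies
    built = [v > threshold for v in series]
    if not any(built):
        return "non-urban"
    if all(built):
        return f"urban since {years[0]}"
    for i in range(1, len(built)):
        if not built[i - 1] and all(built[i:]):
            return f"urbanized by {years[i]}"
    return None  # e.g. a fluctuating series that no rule covers


def coverage(labels):
    # share of pixels labeled by at least one rule
    return sum(lab is not None for lab in labels) / len(labels)
```

Because the rules need no training areas, the same function can be applied to every pixel of the image stack, which is what makes the approach an unsupervised classification.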

Binary classification by the combination of Adaboost and feature extraction methods

  • 함승록;곽노준
    • Journal of the Institute of Electronics Engineers of Korea CI (전자공학회논문지CI) / Vol. 49, No. 4 / pp. 42-53 / 2012
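  • Classification is the most fundamental type of problem to be solved in pattern recognition and machine learning. The Adaboost algorithm refines the idea of boosting so that it can be used in practical data analysis; it is a two-class classifier that builds a strong classifier from a weighted combination of several weak classifiers obtained over repeated rounds. Principal component analysis (PCA) and linear discriminant analysis (LDA) are useful both for reducing high-dimensional feature vectors to low-dimensional ones and for extracting features from data. In this paper, we propose the efficient Boosted-PCA and Boosted-LDA algorithms, which use features extracted by PCA and LDA as the weak classifiers of Adaboost, thereby performing feature extraction and classification simultaneously and improving the recognition rate. In the final section, we run classification experiments with the proposed algorithms on two-class data sets from the UCI repository and on male and female images from the FRGC data. The results show that the proposed Boosted-PCA and Boosted-LDA algorithms improve the recognition rate compared with conventional feature extraction algorithms combined with a nearest neighbor classifier or an SVM.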
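Boosted-PCA and Boosted-LDA build on AdaBoost. The following sketch is standard discrete AdaBoost with decision stumps over feature columns that are assumed to be PCA or LDA projections computed beforehand; how the paper actually builds its weak classifiers from PCA and LDA may differ. Labels are in {-1, +1}, and the function names, round count, and toy data are hypothetical.

```python
import math


def best_stump(F, y, w):
    # search feature, threshold, and polarity minimizing the weighted error
    best = None
    for j in range(len(F[0])):
        for thr in sorted(set(f[j] for f in F)):
            for pol in (1, -1):
                pred = [pol if f[j] >= thr else -pol for f in F]
                err = sum(wi for wi, p, t in zip(w, pred, y) if p != t)
                if best is None or err < best[0]:
                    best = (err, j, thr, pol)
    return best


def adaboost(F, y, rounds=10):
    n = len(F)
    w = [1.0 / n] * n
    ensemble = []
    for _ in range(rounds):
        err, j, thr, pol = best_stump(F, y, w)
        err = min(max(err, 1e-10), 1 - 1e-10)  # avoid log(0) for perfect stumps
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, j, thr, pol))
        # re-weight: misclassified samples gain weight, correct ones lose weight
        w = [wi * math.exp(-alpha * t * (pol if f[j] >= thr else -pol))
             for wi, f, t in zip(w, F, y)]
        s = sum(w)
        w = [wi / s for wi in w]
    return ensemble


def predict(ensemble, f):
    # sign of the alpha-weighted vote of all stumps
    score = sum(a * (pol if f[j] >= thr else -pol) for a, j, thr, pol in ensemble)
    return 1 if score >= 0 else -1
```

Because each stump picks a single projected feature, the boosting rounds also act as a feature selector, which matches the abstract's point that extraction and classification happen together.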

An Analysis of Land Cover Classification Methods Using IKONOS Satellite Image

  • 강남이;박정기;조기성;유연
    • Journal of Korean Society for Geospatial Information Science (대한공간정보학회지) / Vol. 20, No. 3 / pp. 65-71 / 2012
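  • Recently, high-resolution satellite imagery has been widely used to produce the land cover and land use data needed for managing natural resources and the environment. Accordingly, the image analysis process has become important for increasing the efficiency of satellite imagery, which requires substantial investment. In this study, we computed and analyzed statistics of the study area during preprocessing; described artificial neural network and SVM classification in addition to the traditional maximum likelihood classification; applied each method to a high-resolution IKONOS image to classify land cover; and assessed the accuracy of each result using an error matrix. As a result, Support Vector Machines (SVM) classification produced the best result among the methods, with an overall accuracy of about 86%.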
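Accuracy assessment with an error (confusion) matrix can be sketched as below: overall accuracy is the sum of the diagonal divided by the total number of samples, and Cohen's kappa, which is commonly reported alongside it, corrects that agreement for chance. The function names are hypothetical, and the matrix in the usage note is made up, not taken from the paper.

```python
def overall_accuracy(cm):
    # cm[i][j]: number of reference-class-i samples classified as class j
    total = sum(sum(row) for row in cm)
    return sum(cm[i][i] for i in range(len(cm))) / total


def kappa(cm):
    # Cohen's kappa: observed agreement corrected for chance agreement
    total = sum(sum(row) for row in cm)
    po = overall_accuracy(cm)
    row_tot = [sum(row) for row in cm]
    col_tot = [sum(col) for col in zip(*cm)]
    pe = sum(r * c for r, c in zip(row_tot, col_tot)) / total ** 2
    return (po - pe) / (1 - pe)
```

For example, the matrix `[[50, 5], [10, 35]]` gives an overall accuracy of 0.85 and a kappa of about 0.69.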

An Efficient One Class Classifier Using Gaussian-based Hyper-Rectangle Generation

  • 김도균;최진영;고정한
    • Journal of Society of Korea Industrial and Systems Engineering (산업경영시스템학회지) / Vol. 41, No. 2 / pp. 56-64 / 2018
  • In recent years, imbalanced data has become one of the most important and frequent issues in industrial quality control. For example, defect rates have been drastically reduced thanks to highly developed technology and quality management, so only a few defective samples can be obtained from a production process. Quality classification must therefore be performed when one class (the defective data set) is much smaller than the other (the good data set). Traditional multi-class classification methods are not appropriate for such imbalanced data sets, because they classify data based on differences between one class and the others, which can hardly be found in imbalanced data sets. One-class classification, which thoroughly learns the patterns of the target class, is more suitable for imbalanced data sets because it focuses only on data in the target class. Several one-class classification methods, such as the one-class support vector machine, neural networks, and decision trees, have been suggested so far. One-class support vector machines and neural networks can guarantee a good classification rate, and decision trees provide a set of rules that can be clearly interpreted. However, the classifiers obtained from the former two methods consist of complex mathematical functions that users cannot easily understand, and for decision trees the criterion for rule generation is ambiguous. As an alternative, a new one-class classifier using hyper-rectangles was proposed, which classifies precisely compared with other methods and generates rules that users can clearly understand. In this paper, we suggest an approach that addresses the limitations of these previous one-class classification algorithms: it produces an improved one-class classifier using hyper-rectangles generated with a Gaussian function. The performance of the suggested algorithm is verified by a numerical experiment using several data sets from the UCI machine learning repository.
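The appeal of hyper-rectangles is that each one reads as an interpretable rule. A simplified sketch, assuming each feature interval is the target-class mean plus or minus `z` standard deviations (a Gaussian assumption chosen for illustration, not the paper's generation procedure), shows how such a box classifies points and prints as a rule. The value `z=3.0`, the feature names, and the function names are hypothetical.

```python
from statistics import mean, stdev


def fit_hyper_rectangle(X, z=3.0):
    # one interval per feature: target-class mean +/- z standard deviations
    box = []
    for col in zip(*X):
        m, s = mean(col), stdev(col)
        box.append((m - z * s, m + z * s))
    return box


def is_target(box, x):
    # a point belongs to the target class only if it lies inside every interval
    return all(lo <= xi <= hi for (lo, hi), xi in zip(box, x))


def rule_text(box, names):
    # the box read as a human-readable rule
    return " AND ".join(f"{lo:.2f} <= {n} <= {hi:.2f}" for (lo, hi), n in zip(box, names))
```

Only target-class (for example, good-product) samples are needed to fit the box, which is what makes the approach usable when defective samples are scarce.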