• Title/Abstract/Keywords: Large-set Classification

Performance comparison of SVM and neural networks for large-set classification problems

  • 이진선;김영원;오일석
    • 정보처리학회논문지B (The KIPS Transactions: Part B) / Vol. 12B, No. 1 / pp. 25-30 / 2005
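  • This paper compares and analyzes the performance of a modular neural network (modular feedforward MLP) and an SVM (Support Vector Machine) on large-set classification problems. Overall, the SVM was found to be superior by a considerable margin. It was also found that, as the number of classes increases, the SVM's performance degrades more gradually than the neural network's. In addition, the trend of the correct recognition rate as a function of rejection was analyzed, and SVM parameters suitable for large-set classification (the kernel function and its related variables) were derived.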
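The rejection analysis mentioned above can be sketched as follows. This is a minimal illustration, not the authors' procedure: it assumes each test sample has a single confidence score for its top class (for an SVM this might be a decision value, a hypothetical choice here) and that a sample is rejected when that score falls below a threshold. The hypothetical helper `rates_at_threshold` splits the test set into correct, error, and reject fractions of the total, so sweeping the threshold traces how the correct recognition rate changes as rejection increases.

```python
from typing import Sequence


def rates_at_threshold(
    confidences: Sequence[float],
    correct: Sequence[bool],
    threshold: float,
) -> tuple[float, float, float]:
    """Return (correct, error, reject) as fractions of all samples.

    A sample is rejected when its top-class confidence is below `threshold`;
    the three fractions always sum to 1.
    """
    if len(confidences) != len(correct) or not confidences:
        raise ValueError("need equally long, non-empty inputs")
    n = len(confidences)
    rejected = sum(1 for conf in confidences if conf < threshold)
    right = sum(1 for conf, ok in zip(confidences, correct) if conf >= threshold and ok)
    wrong = n - rejected - right  # accepted but misclassified
    return right / n, wrong / n, rejected / n


def rejection_curve(confidences, correct, thresholds):
    """Sweep thresholds and collect (threshold, correct, error, reject) rows."""
    return [(t, *rates_at_threshold(confidences, correct, t)) for t in thresholds]


if __name__ == "__main__":
    # Toy scores and correctness flags; values are illustrative only.
    conf = [0.9, 0.8, 0.4, 0.95, 0.3]
    ok = [True, False, True, True, False]
    for row in rejection_curve(conf, ok, [0.0, 0.5, 0.85]):
        print(row)
```

Raising the threshold trades errors for rejections; plotting the correct fraction against the reject fraction gives the kind of trend curve the abstract refers to.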

Support Vector Machine Classification Using Training Sets of Small Mixed Pixels: An Appropriateness Assessment of IKONOS Imagery

  • Yu, Byeong-Hyeok;Chi, Kwang-Hoon
    • 대한원격탐사학회지 (Korean Journal of Remote Sensing) / Vol. 24, No. 5 / pp. 507-515 / 2008
  • Many studies have used a large number of pure pixels as an approach to training set design. However, the appropriate training set varies between classifiers. Recent research reported that, in Support Vector Machine (SVM) classification, small numbers of mixed pixels lying between classes are actually more useful than larger sets of pure pixels of each class. We evaluated the usability of small mixed pixels as a training set for classifying high-resolution satellite imagery. We presented an improved approach for readily obtaining mixed pixels and evaluated its appropriateness through land-cover classification of IKONOS satellite imagery. The results showed that the accuracy of classification based on small mixed pixels is nearly identical to that of classification based on large pure pixels. However, they also revealed a limitation: the small mixed pixels may provide insufficient information to separate the classes. Small mixed pixels from class-border regions provide cost-effective training sets, but combining them with other pixels should be considered when working with high-resolution satellite imagery or relatively complex land-cover situations.

Deep Learning for Pet Image Classification

  • 신광성;신성윤
    • 한국정보통신학회:학술대회논문집 (Korea Institute of Information and Communication Engineering: Conference Proceedings) / 2019 Spring Conference / pp. 151-152 / 2019
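  • This paper proposes an improved deep learning method for animal image classification based on a small dataset. First, a CNN is used to build a training model on the small dataset, and the dataset is used to expand the training set through data augmentation. Second, a network pre-trained on a large dataset, such as VGG16, is used to extract bottleneck features from the small dataset, which are saved as new training and test datasets in two NumPy files. Finally, a fully connected network is trained on the new dataset.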

Screening Vital Few Variables and Development of Logistic Regression Model on a Large Data Set

  • 임용빈;조재연;엄경아;이선아
    • 품질경영학회지 (Journal of Korean Society for Quality Management) / Vol. 34, No. 2 / pp. 129-135 / 2006
  • With advances in computer technology, it is possible to store in a database all the related information for monitoring whether equipment is in control, together with huge amounts of real-time manufacturing data. Statistical analysis is therefore needed for large data sets with hundreds of thousands of observations and hundreds of independent variables, some of whose values are missing for many observations, even though this is a formidable computational task. A tree-structured approach to classification can screen important independent variables and their interactions. In a Six Sigma project handling large amounts of manufacturing data, one of the goals is to screen the vital few variables from the trivial many. In this paper, we review and summarize the CART, C4.5, and CHAID algorithms and propose a simple method of screening the vital few variables by selecting the variables screened in common by all three algorithms. We also discuss how to develop a logistic regression model on a large data set and illustrate it with a large finance data set collected by a credit bureau for the purpose of predicting company bankruptcy.
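The common-variable screening rule described above reduces to a vote over the variable sets chosen by each tree algorithm. The sketch below is an illustration under stated assumptions: it takes each algorithm's selection as a plain list of names and does not fit CART, C4.5, or CHAID itself, and the variable names in the example are hypothetical, not taken from the paper's finance data. With the default `min_votes` it returns exactly the variables chosen by every method.

```python
from collections import Counter
from typing import Iterable, Optional


def screen_vital_few(
    selections: list[Iterable[str]], min_votes: Optional[int] = None
) -> list[str]:
    """Return variables selected by at least `min_votes` screening methods.

    With the default (min_votes = number of methods) this is the plain
    intersection: the variables picked by every method.
    """
    if not selections:
        return []
    if min_votes is None:
        min_votes = len(selections)
    # set(chosen) ensures a method votes at most once per variable
    votes = Counter(var for chosen in selections for var in set(chosen))
    return sorted(var for var, n in votes.items() if n >= min_votes)


if __name__ == "__main__":
    # Hypothetical outputs of three tree-based screenings.
    cart = ["debt_ratio", "roa", "sales_growth", "firm_age"]
    c45 = ["roa", "debt_ratio", "current_ratio"]
    chaid = ["debt_ratio", "roa", "sales_growth"]
    print(screen_vital_few([cart, c45, chaid]))  # ['debt_ratio', 'roa']
```

Lowering `min_votes` to 2 would relax the rule to a majority vote; that variant is an illustration only, not part of the described method.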

A Wavelet based Feature Selection Method to Improve Classification of Large Signal-type Data

  • 장우성;장우진
    • 대한산업공학회지 (Journal of the Korean Institute of Industrial Engineers) / Vol. 32, No. 2 / pp. 133-140 / 2006
  • Large signal-type data sets are difficult to classify, especially when they are non-stationary. In this paper, large signal-type, non-stationary data sets are wavelet-transformed so that distinctive features of the data are extracted in the wavelet domain rather than the time domain. For classification, a few wavelet coefficients representing class properties are used as inputs to statistical classification methods such as Linear Discriminant Analysis, Quadratic Discriminant Analysis, and neural networks. Applying our wavelet-based feature selection method to a mass spectrometry data set for ovarian cancer diagnosis resulted in 100% classification accuracy.
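The wavelet-domain feature selection described above can be sketched in pure Python. The abstract does not name the wavelet family or the rule for choosing coefficients, so this sketch assumes an orthonormal Haar transform (signal length a power of two) and ranks coefficient positions by a simple two-class separation score, the absolute difference of class means divided by the sum of the class standard deviations. Both choices, and the helper names `haar_dwt` and `select_coefficients`, are hypothetical. The selected positions would then serve as inputs to a classifier such as LDA or QDA.

```python
import math
from statistics import mean, pstdev


def haar_dwt(signal):
    """Full multi-level orthonormal Haar transform.

    Returns [approximation, coarsest details, ..., finest details];
    len(signal) must be a power of two.
    """
    n = len(signal)
    if n == 0 or n & (n - 1):
        raise ValueError("signal length must be a power of two")
    s = 1 / math.sqrt(2)
    approx, coeffs = list(signal), []
    while len(approx) > 1:
        pairs = list(zip(approx[0::2], approx[1::2]))
        detail = [(a - b) * s for a, b in pairs]
        approx = [(a + b) * s for a, b in pairs]
        coeffs = detail + coeffs  # coarser levels are placed in front
    return approx + coeffs


def select_coefficients(class0, class1, k):
    """Indices of the k wavelet coefficients that best separate two classes."""
    w0 = [haar_dwt(x) for x in class0]
    w1 = [haar_dwt(x) for x in class1]
    scores = []
    for j in range(len(w0[0])):
        a = [w[j] for w in w0]
        b = [w[j] for w in w1]
        # Fisher-like score; the small constant avoids division by zero
        score = abs(mean(a) - mean(b)) / (pstdev(a) + pstdev(b) + 1e-12)
        scores.append((score, j))
    scores.sort(key=lambda t: (-t[0], t[1]))
    return [j for _, j in scores[:k]]


if __name__ == "__main__":
    flat = [[1, 1, 1, 1], [1.1, 0.9, 1, 1]]     # toy class 0
    step = [[1, 1, 3, 3], [1, 1, 3.2, 2.8]]     # toy class 1
    print(select_coefficients(flat, step, 2))
```

Because the Haar basis is orthonormal, the transform preserves signal energy, so ranking coefficients does not depend on an arbitrary rescaling between levels.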

A Study on Development of Classification Indicators in Transportation Sector Energy Conservation DB

  • 임기추
    • 에너지공학 (Energy Engineering) / Vol. 25, No. 3 / pp. 149-156 / 2016
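  • The purpose of this paper is to derive the scope of a basic database for analyzing and evaluating the effects of policies on energy conservation and energy-efficiency improvement in Korea's transportation sector. The major categories derived from analysis of domestic and international cases are energy consumption, energy intensity, carbon dioxide or greenhouse gas emissions, economic indicators, transport volume/performance, and basic automobile-related data. Through an expert opinion survey, the major categories were set as energy consumption, transport volume/performance, energy efficiency/energy intensity, automobiles, energy economy, and energy environment; these were subdivided into sub-items and established as classification indicators that can reflect detailed classification information for each component item.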

Rough Set-Based Approach for Automatic Emotion Classification of Music

  • Baniya, Babu Kaji;Lee, Joonwhoan
    • Journal of Information Processing Systems / Vol. 13, No. 2 / pp. 400-416 / 2017
  • Music emotion is an important component in the fields of music information retrieval and computational musicology. This paper proposes an approach to automatic emotion classification based on rough set (RS) theory. In the proposed approach, four different sets of music features are extracted, representing dynamics, rhythm, spectral content, and harmony. From these features, five different statistical parameters are considered as attributes, including the central moments of each feature up to the fourth order and the covariance components between features. The large number of attributes is controlled by the RS-based approach, which removes superfluous features to obtain the indispensable ones. In addition, the RS-based approach makes it possible to visualize which attributes play a significant role in the generated rules and to determine the strength of each rule for classification. Experiments were performed to find out which audio features, and which of the statistical parameters derived from them, are important for emotion classification. The resulting indispensable attributes and the usefulness of the covariance components are also discussed. Using all statistical parameters, the overall classification accuracy was comparatively better than that of existing methods on two datasets.
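The attribute-reduction idea above rests on the rough-set dependency degree. As a minimal sketch, assuming the attributes have already been discretized into symbolic values (the abstract does not say how), the code below computes the dependency of the decision on a set of condition attributes (the share of objects whose indiscernibility class has a single decision value) and reports the core: the attributes whose removal lowers that dependency. The toy attribute names and emotion labels are hypothetical, and computing a full reduct, which the paper's method may also involve, is not shown.

```python
from collections import defaultdict


def dependency(rows, attrs, decision):
    """Rough-set dependency degree gamma(attrs -> decision).

    Objects with identical values on `attrs` are indiscernible; an
    indiscernibility class belongs to the positive region only if all of
    its objects share one decision value.
    """
    classes = defaultdict(list)
    for row in rows:
        classes[tuple(row[a] for a in attrs)].append(row[decision])
    positive = sum(len(ds) for ds in classes.values() if len(set(ds)) == 1)
    return positive / len(rows)


def core_attributes(rows, attrs, decision):
    """Attributes whose removal lowers the dependency degree (indispensable)."""
    full = dependency(rows, attrs, decision)
    return [
        a for a in attrs
        if dependency(rows, [b for b in attrs if b != a], decision) < full
    ]


if __name__ == "__main__":
    # Hypothetical discretized music attributes and emotion labels.
    table = [
        {"tempo": "fast", "energy": "high", "mode": "major", "emotion": "happy"},
        {"tempo": "fast", "energy": "high", "mode": "minor", "emotion": "angry"},
        {"tempo": "slow", "energy": "low", "mode": "minor", "emotion": "sad"},
        {"tempo": "slow", "energy": "low", "mode": "major", "emotion": "calm"},
    ]
    attrs = ["tempo", "energy", "mode"]
    print(dependency(table, attrs, "emotion"))       # 1.0
    print(core_attributes(table, attrs, "emotion"))  # ['mode']
```

Here `tempo` and `energy` are each dispensable on their own because either one, together with `mode`, still separates every emotion; this is also why the core alone need not be a reduct.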

A Data Mining Procedure for Unbalanced Binary Classification

  • 정한나;이정화;전치혁
    • 대한산업공학회지 (Journal of the Korean Institute of Industrial Engineers) / Vol. 36, No. 1 / pp. 13-21 / 2010
  • Predicting customers' contract cancellations is essential for insurance companies, but it is a difficult problem because the customer database is large and the target (cancelled) customers make up only a small proportion of it. This paper proposes a new data mining approach to binary classification that handles large-scale unbalanced data. The approach incorporates over-sampling, clustering, regularized logistic regression, and boosting. It was applied to a real data set from the insurance domain, and the results were compared with those of several other classification techniques.
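Of the techniques listed above, over-sampling is the easiest to show in isolation. The sketch below assumes plain random over-sampling with replacement for a binary label (the abstract does not specify the scheme, and SMOTE-style synthesis would be a different technique); the function name `random_oversample` and its arguments are hypothetical. Clustering, regularized logistic regression, and boosting, which the proposed procedure also uses, are not shown.

```python
import random
from typing import Any, Sequence


def random_oversample(
    X: Sequence[Any], y: Sequence[Any], minority_label: Any, seed: int = 0
) -> tuple[list, list]:
    """Duplicate random minority rows until both classes have equal counts.

    Assumes a binary problem: every label other than `minority_label` is
    counted as the majority class. Inputs are copied, never modified.
    """
    rng = random.Random(seed)  # local generator keeps results reproducible
    minority = [i for i, label in enumerate(y) if label == minority_label]
    majority_count = len(y) - len(minority)
    if not minority or len(minority) >= majority_count:
        return list(X), list(y)
    extra = [rng.choice(minority) for _ in range(majority_count - len(minority))]
    return list(X) + [X[i] for i in extra], list(y) + [y[i] for i in extra]


if __name__ == "__main__":
    X = [[i] for i in range(10)]
    y = [0] * 8 + [1] * 2  # 8 retained customers, 2 cancellations (toy data)
    X_bal, y_bal = random_oversample(X, y, minority_label=1, seed=42)
    print(len(y_bal), y_bal.count(0), y_bal.count(1))
```

Over-sampling should be applied to the training split only; duplicating minority rows before splitting would leak copies into the evaluation data.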

Training Data Sets Construction from Large Data Set for PCB Character Recognition

  • NDAYISHIMIYE, Fabrice;Gang, Sumyung;Lee, Joon Jae
    • Journal of Multimedia Information System / Vol. 6, No. 4 / pp. 225-234 / 2019
  • Deep learning has become increasingly popular in both academia and industry. Various domains, including pattern recognition and computer vision, have witnessed the great power of deep neural networks. However, current deep learning studies mainly focus on high-quality data sets with balanced class labels, while training on poor-quality and imbalanced data sets poses great challenges for classification tasks. In this paper, we propose data-analysis-based data reduction techniques for selecting good and diverse data samples from a large data set for a deep learning model. Data sampling techniques can be applied to reduce the large size of the raw data by retrieving representative samples that carry its useful knowledge. Therefore, instead of dealing with the full raw data, we can use data reduction techniques to sample data without losing important information. We group PCB characters into classes and train ResNet56 v2 and SENet models to improve the classification performance of an optical character recognition (OCR) character classifier.
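The abstract does not spell out which data reduction technique selects "good and diverse" samples, so the sketch below swaps in one common stand-in, greedy farthest-point sampling applied per class, purely for illustration. It assumes each sample is already a numeric feature vector (for PCB characters this might be an image embedding, a hypothetical choice here) and keeps at most `k_per_class` mutually distant samples per class; the function names are hypothetical.

```python
import math


def farthest_point_sample(points, k):
    """Greedily pick up to k indices whose points are spread far apart."""
    if k <= 0 or not points:
        return []
    chosen = [0]  # deterministic start: the first point
    # distance from every point to its closest already-chosen point
    nearest = [math.dist(p, points[0]) for p in points]
    while len(chosen) < min(k, len(points)):
        idx = max(range(len(points)), key=nearest.__getitem__)
        if nearest[idx] == 0.0:  # remaining points duplicate chosen ones
            break
        chosen.append(idx)
        nearest = [min(d, math.dist(p, points[idx])) for d, p in zip(nearest, points)]
    return chosen


def reduce_per_class(features, labels, k_per_class):
    """Keep at most k_per_class diverse samples from each class."""
    keep = []
    for label in sorted(set(labels)):
        idx = [i for i, lab in enumerate(labels) if lab == label]
        picked = farthest_point_sample([features[i] for i in idx], k_per_class)
        keep.extend(idx[j] for j in picked)
    return sorted(keep)


if __name__ == "__main__":
    feats = [(0, 0), (0.1, 0), (10, 0), (0, 0), (0, 1), (0, 50)]
    labs = ["A", "A", "A", "B", "B", "B"]
    print(reduce_per_class(feats, labs, 2))  # [0, 2, 3, 5]
```

Capping every class at the same `k_per_class` also evens out class sizes, which speaks to the imbalance problem the abstract raises; near-duplicates are the first samples to be dropped.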

A New Distributed Parallel Algorithm for Pattern Classification using Neural Network Model

  • 김대수;백순철
    • ETRI Journal / Vol. 13, No. 2 / pp. 34-41 / 1991
  • In this paper, a new distributed parallel algorithm for pattern classification based on the Self-Organizing Neural Network (SONN) [10-12] is developed. The system works without any prior information about the number of clusters or the cluster centers. The SONN model showed good performance in finding classification information, cluster centers, the number of salient clusters, and membership information. However, the sequential version took a considerable amount of time when the input data set was very large, so a parallel algorithm is desirable. The resulting distributed parallel algorithm and its experimental results are presented.
