• Title/Abstract/Keywords: Training Samples

Search results: 565 items (processing time: 0.025 s)

Face Recognition based on Hybrid Classifiers with Virtual Samples

  • 류연식;오세영
    • 전자공학회논문지CI / Vol. 40, No. 1 / pp. 19-29 / 2003
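  • This paper proposes a face recognition algorithm that uses artificially generated virtual training samples and hybrid classifiers. Two different types of classifier, one based on nearest-feature selection in feature space and one based on a connectionist model, are combined to obtain a synergistic effect. Both classifiers are trained with, and make use of, virtual training samples generated according to the spatial distribution of the training data. First, the nearest feature angle (NFA), which uses angular information in feature space, is used to find the stored training sample closest to the query. Second, the result of a classifier based on the information obtained by projecting the query face features onto frontal-face image features is used. The projection onto frontal-image features is implemented with a multilayer neural network as a Frontal Recall Network, and several of these networks are grouped into an Ensemble Recall Network to improve generalization. Finally, the hybrid classifier selects the final result according to the outputs of the individual classifiers. Applied to six different training/test data groups, the proposed algorithm achieved an average recognition rate of 96.33%. This corresponds to 61.2% of the average error rate of the nearest feature line (NFL) method, and the results are more stable than when a single classifier is used.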
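
As a rough illustration of angle-based nearest-feature matching, the sketch below assigns a query to the class of the stored feature vector that makes the smallest angle with it. The function names (`angle_between`, `nearest_by_angle`) and the toy vectors are assumptions for illustration only; the paper's actual NFA definition and its use of virtual samples are not given in the abstract.

```python
import math

def angle_between(u, v):
    """Angle in radians between two non-zero feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    # Clamp to [-1, 1] to guard against floating-point drift before acos.
    cos_theta = max(-1.0, min(1.0, dot / (norm_u * norm_v)))
    return math.acos(cos_theta)

def nearest_by_angle(query, gallery):
    """Return the label of the stored vector with the smallest angle to query.

    gallery is a list of (label, feature_vector) pairs.
    """
    best_label, _ = min(gallery, key=lambda item: angle_between(query, item[1]))
    return best_label

# Hypothetical toy gallery: two identities, two stored vectors each.
gallery = [
    ("person_a", [1.0, 0.1, 0.0]),
    ("person_a", [0.9, 0.2, 0.1]),
    ("person_b", [0.0, 1.0, 0.2]),
    ("person_b", [0.1, 0.8, 0.3]),
]
predicted = nearest_by_angle([2.0, 0.3, 0.1], gallery)
```

Because only the angle is compared, scaling the query vector does not change which stored vector is selected.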

Improving the Error Back-Propagation Algorithm for Imbalanced Data Sets

  • Oh, Sang-Hoon
    • International Journal of Contents / Vol. 8, No. 2 / pp. 7-12 / 2012
  • Imbalanced data sets are difficult to classify because most classifiers are developed on the assumption that class distributions are well balanced. To improve the error back-propagation algorithm for the classification of imbalanced data sets, a new error function is proposed. The error function controls weight updating with regard to the class to which each training sample belongs. As a result, samples in the minority class have a greater chance of being classified, while samples in the majority class have a smaller chance of being classified. The proposed method is compared with the two-phase, threshold-moving, and target node methods through simulations on a mammography data set, and the proposed method attains the best results.
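
One generic way to make weight updates depend on the class of a training sample is to scale each sample's error term by a class weight. The sketch below does this for a single sigmoid unit trained with squared error. It is not the error function proposed in the paper, which the abstract does not specify; `class_weight`, the learning rate, and the toy data are all assumptions.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict(w, b, x):
    """Output of a single sigmoid unit."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)

def train_weighted(samples, class_weight, lr=0.5, epochs=200):
    """Train one sigmoid unit with a class-weighted squared-error update.

    samples: list of (features, target) with target 0 (majority) or 1 (minority).
    class_weight: dict mapping target -> multiplier applied to that sample's update.
    """
    w = [0.0] * len(samples[0][0])
    b = 0.0
    for _ in range(epochs):
        for x, t in samples:
            y = predict(w, b, x)
            # Descent direction for 0.5 * (t - y)^2 w.r.t. the pre-activation,
            # scaled by the weight of the sample's class.
            delta = class_weight[t] * (t - y) * y * (1.0 - y)
            w = [wi + lr * delta * xi for wi, xi in zip(w, x)]
            b += lr * delta
    return w, b

# Hypothetical 1-D data: four majority samples, two minority samples.
samples = [([0.0], 0), ([0.2], 0), ([0.4], 0), ([0.6], 0), ([0.5], 1), ([1.0], 1)]
w_plain, b_plain = train_weighted(samples, {0: 1.0, 1: 1.0})
w_boost, b_boost = train_weighted(samples, {0: 1.0, 1: 4.0})
```

Comparing `predict(w_plain, b_plain, [0.5])` with `predict(w_boost, b_boost, [0.5])` shows how a larger minority weight shifts the output near the class boundary.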

Discrimination of Alismatis Rhizoma According to Geographical Origins using Near Infrared Spectroscopy

  • 이동영;김승현;김효진;성상현
    • 생약학회지 / Vol. 44, No. 4 / pp. 344-349 / 2013
  • Near infrared spectroscopy (NIRS) combined with multivariate analysis was used to discriminate the geographical origin of Alisma orientale from Korea (n=94) and China (n=72). Two-thirds of the samples were selected randomly for the training set, and the remaining one-third for the test set. The second derivative was used for the pretreatment of NIR spectra. Partial least squares discriminant analysis (PLS-DA) models correctly discriminated 100% of the Korean and Chinese A. orientale samples. These results demonstrate the potential use of NIR spectroscopy combined with multivariate analysis as a rapid and accurate method to discriminate A. orientale according to its geographical origin.
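
The random two-thirds/one-third split can be sketched as follows. Only the sample counts (94 Korean, 72 Chinese) come from the abstract; the sample IDs, the seed, and the choice to split each origin separately are assumptions, since the abstract does not say whether the split was stratified.

```python
import random

def split_two_thirds(items, seed=0):
    """Shuffle items and return (training, test) with about two-thirds in training."""
    rng = random.Random(seed)
    shuffled = list(items)
    rng.shuffle(shuffled)
    n_train = round(len(shuffled) * 2 / 3)
    return shuffled[:n_train], shuffled[n_train:]

# Placeholder sample IDs; only the counts are taken from the abstract.
korean = [f"KR-{i:03d}" for i in range(94)]
chinese = [f"CN-{i:03d}" for i in range(72)]

train_kr, test_kr = split_two_thirds(korean)
train_cn, test_cn = split_two_thirds(chinese)
```

Under these assumptions the Korean samples split 63/31 and the Chinese samples 48/24.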

Estimating Prediction Errors in Binary Classification Problem: Cross-Validation versus Bootstrap

  • Kim Ji-Hyun;Cha Eun-Song
    • Communications for Statistical Applications and Methods / Vol. 13, No. 1 / pp. 151-165 / 2006
  • It is important to estimate the true misclassification rate of a given classifier when an independent set of test data is not available. Cross-validation and bootstrap are two possible approaches in this case. In the related literature, bootstrap estimators of the true misclassification rate have been asserted to perform better for small samples than cross-validation estimators. We compare the two estimators empirically when the classification rule is so adaptive to the training data that its apparent misclassification rate is close to zero. We confirm that bootstrap estimators perform better for small samples because of their small variance, and we also find that their bias tends to be significant even for moderate to large samples, in which case cross-validation estimators perform better with less computation.
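
To make the comparison concrete, the sketch below computes a leave-one-out cross-validation estimate and a simple out-of-bag bootstrap estimate for a 1-nearest-neighbour rule, whose apparent error is zero whenever all training points are distinct. The choice of 1-NN, the out-of-bag form of the bootstrap, and the synthetic data are assumptions; the abstract does not name the specific estimators or classifiers the authors compared.

```python
import random

def one_nn_predict(train, x):
    """Predict with 1-nearest-neighbour on 1-D features; train is a list of (x, y)."""
    return min(train, key=lambda p: abs(p[0] - x))[1]

def error_rate(train, test):
    wrong = sum(1 for x, y in test if one_nn_predict(train, x) != y)
    return wrong / len(test)

def loo_cv_error(data):
    """Leave-one-out cross-validation estimate of the misclassification rate."""
    wrong = 0
    for i, (x, y) in enumerate(data):
        rest = data[:i] + data[i + 1:]
        wrong += one_nn_predict(rest, x) != y
    return wrong / len(data)

def bootstrap_oob_error(data, n_boot=200, seed=0):
    """Average error on out-of-bag points over bootstrap resamples."""
    rng = random.Random(seed)
    n = len(data)
    rates = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        chosen = set(idx)
        oob = [data[i] for i in range(n) if i not in chosen]
        if oob:  # skip the rare resample that happens to contain every point
            rates.append(error_rate([data[i] for i in idx], oob))
    return sum(rates) / len(rates)

# Synthetic two-class data with overlapping 1-D Gaussian features.
rng = random.Random(1)
data = [(rng.gauss(0.0, 1.0), 0) for _ in range(20)]
data += [(rng.gauss(1.5, 1.0), 1) for _ in range(20)]

apparent = error_rate(data, data)       # each point is its own nearest neighbour
cv_estimate = loo_cv_error(data)
boot_estimate = bootstrap_oob_error(data)
```

Repeating this over many synthetic data sets would allow the bias and variance of the two estimates to be compared, which is the kind of empirical study the abstract describes.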

Integrating Spatial Proximity with Manifold Learning for Hyperspectral Data

  • Kim, Won-Kook;Crawford, Melba M.;Lee, Sang-Hoon
    • 대한원격탐사학회지 / Vol. 26, No. 6 / pp. 693-703 / 2010
  • High spectral resolution of hyperspectral data enables analysis of complex natural phenomena that are reflected nonlinearly in the data. Although many manifold learning methods have been developed for such problems, most do not consider the spatial correlation between samples that is inherent and useful in remote sensing data. We propose a manifold learning method that directly combines spatial proximity and spectral similarity through a kernel PCA framework. A gain factor caused by spatial proximity is first modelled with a heat kernel and is then added to the original similarity computed from the spectral values of a pair of samples. Parameters are tuned with the intelligent grid search (IGS) method so that the derived manifold coordinates achieve optimal classification accuracies. Of particular interest is its performance with a small training size, because labelled samples are usually scarce due to their high acquisition cost. The proposed spatial kernel PCA (KPCA) is compared with PCA in terms of classification accuracy with the nearest-neighbourhood classification method.
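
The idea of adding a heat-kernel spatial gain to a spectral similarity can be sketched as below. The Gaussian form of the spectral similarity, the parameter names (`sigma`, `t`, `gain`), and the toy pixels are assumptions; the abstract states only that a heat-kernel gain is added to the spectral similarity. The resulting matrix would then be centered and eigen-decomposed for kernel PCA, which is not shown here.

```python
import math

def rbf_similarity(a, b, sigma):
    """Gaussian (RBF) similarity between two spectra."""
    sq = sum((x - y) ** 2 for x, y in zip(a, b))
    return math.exp(-sq / (2.0 * sigma ** 2))

def heat_kernel_gain(p, q, t):
    """Heat-kernel weight that decays with squared pixel distance."""
    sq = sum((x - y) ** 2 for x, y in zip(p, q))
    return math.exp(-sq / t)

def combined_similarity(pixels, sigma=1.0, t=2.0, gain=0.5):
    """Kernel matrix: spectral similarity plus a spatial-proximity gain.

    pixels: list of (position, spectrum) pairs.
    """
    n = len(pixels)
    K = [[0.0] * n for _ in range(n)]
    for i, (pos_i, spec_i) in enumerate(pixels):
        for j, (pos_j, spec_j) in enumerate(pixels):
            spectral = rbf_similarity(spec_i, spec_j, sigma)
            spatial = heat_kernel_gain(pos_i, pos_j, t)
            K[i][j] = spectral + gain * spatial
    return K

# Hypothetical pixels: two neighbours and one distant pixel with a matching spectrum.
pixels = [
    ((0, 0), [0.2, 0.4]),
    ((0, 1), [0.2, 0.5]),
    ((5, 5), [0.2, 0.4]),
]
K = combined_similarity(pixels)
```

In this toy case the adjacent pixel ends up more similar to the first pixel than the distant pixel does, even though the distant pixel has an identical spectrum.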

Handwritten Hangul Graphemes Classification Using Three Artificial Neural Networks

  • Aaron Daniel Snowberger;Choong Ho Lee
    • Journal of information and communication convergence engineering / Vol. 21, No. 2 / pp. 167-173 / 2023
  • Hangul is unique compared to other Asian languages because of its simple letter forms that combine to create syllabic shapes. There are 24 basic letters that can be combined to form 27 additional complex letters. This produces 51 graphemes. Hangul optical character recognition has been a research topic for some time; however, handwritten Hangul recognition continues to be challenging owing to the various writing styles, slants, and cursive-like nature of the handwriting. In this study, a dataset containing thousands of samples of 51 Hangul graphemes was gathered from 110 university freshmen to create a robust dataset with high variance for training an artificial neural network. The collected dataset included 2200 samples for each consonant grapheme and 1100 samples for each vowel grapheme. The dataset was normalized to match the MNIST digits dataset and used to train three neural networks, and the obtained results were compared.

Discrimination Model of Cultivation Area of Alismatis Rhizoma using a GC-MS-Based Metabolomics Approach

  • 임재윤
    • 약학회지 / Vol. 60, No. 1 / pp. 29-35 / 2016
  • Traditional Korean medicines may be managed more scientifically through the development of logical criteria to verify their cultivation region, which helps advance the traditional herbal medicine industry. Volatile compounds were obtained from 14 samples of domestic Taeksa and 30 samples of Chinese Taeksa by steam distillation. The metabolites were identified with the NIST mass spectral library in the gas chromatography/mass spectrometry (GC/MS) data obtained from 35 training samples. Multivariate statistical analyses, such as Principal Component Analysis (PCA), Partial Least Squares Discriminant Analysis (PLS-DA), and Orthogonal Partial Least Squares Discriminant Analysis (OPLS-DA), were performed based on the qualitative and quantitative data. Finally, trans-(2,3-diphenylcyclopropyl)methyl phenyl sulfoxide (47.265 min), 1,2,3,4-tetrahydro-1-phenyl-naphthalene (47.781 min), spiro[4-oxatricyclo[5.3.0.0.(2,6)]decan-3-one-5,2'-cyclohexane] (54.62 min), 6-[7-nitrobenzofurazan-4-yl]amino-morphinan-4,5-epoxy (54.86 min), and p-hydroxynorephedrine (55.14 min) were determined as marker metabolites to verify candidates for the origin of Taeksa. The statistical model was well established to determine the origin of Taeksa. The cultivation areas of the test samples, 3 domestic and 6 Chinese Taeksa, were predicted by the established OPLS-DA model, and all 9 samples were confirmed to be correctly classified.

The Effects of Endurance Training on the Hemogram of the Horse

  • Fan, Y.K.;Hsu, J.C.;Peh, H.C.;Tsang, C.L.;Cheng, S.P.;Chiu, S.C.;Ju, J.C.
    • Asian-Australasian Journal of Animal Sciences / Vol. 15, No. 9 / pp. 1348-1353 / 2002
  • The purpose of this study was to evaluate the changes and readjustment capacity in the hematological characteristics of the horse during and after a prolonged training program. One pony and two hot-blooded horses were used in this study. Resting or basal blood parameters were assessed by collecting blood samples of the animals for 1 to 2 months prior to the start of the training program. Each animal was subjected to arbitrary exercise for 30 min by an automatic hot trotter and was bled at 0, 15, 30, 45 (15 min of recovery), 60 (30 min of recovery), and 75 min (45 min of recovery) after onset of exercise. All animals were exercised 3 times a week over a five-month period. Hematological parameters including average white blood cell counts (WBC, $\times 10^3/\mu$l), erythrocyte concentrations (RBC, $\times 10^6/\mu$l), hematocrit (HCT, %), mean corpuscular volume (MCV, fl), number of platelets (PLT, $\times 10^4/\mu$l), hemoglobin concentration (Hb, g/dl), mean corpuscular hemoglobin (MCH, pg), and mean corpuscular hemoglobin concentration (MCHC, g/dl) were analyzed using an automatic cell counter. In all animals, RBC, WBC, and HCT increased significantly (p<0.05) from 7.09, 8.55, and 43.5 to 8.11, 9.67, and 49.5, respectively, during the 30 min of exercise and returned to or fell below the initial basis (resting and 0 min) 30 min after exercise. However, no significant differences were detected in MCV (50.3-51.3 fl), MCH (17.2-17.4 pg), and MCHC (33.7-34.4 g/dl) values (p>0.05) regardless of the training periods. Similar trends were observed after 1, 3, 4, and 5 months of training when compared to the resting state. When these parameters were analyzed by the effect of training period (month), mean WBC concentrations were significantly reduced in the fourth and fifth months after onset of training compared to the resting condition or the first month of the training program (p<0.05). The RBC values were elevated at the second month (9.40) and reached a significantly low level (p<0.001) at the fifth month (8.62) after training compared to the first month of training (7.89). In conclusion, a mild training program enhances blood parameters gradually in both the horse and the pony. Therefore, an optimized training program is beneficial in promoting the endurance performance of the horse.

Improving SVM Classification by Constructing Ensemble

  • 제홍모;방승양
    • 한국정보과학회논문지:소프트웨어및응용 / Vol. 30, No. 3-4 / pp. 251-258 / 2003
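  • Support Vector Machines (SVMs) show good generalization performance in theory, but SVMs as implemented in practice fall short of that theoretical performance. The main reason is that, because of their high time and space complexity, they are implemented with approximate algorithms. To improve SVM classification performance, this paper proposes constructing SVM ensembles using Bagging (bootstrap aggregating) and Boosting. In training the SVM ensemble, Bagging draws the training data for each SVM at random from the whole data set, whereas Boosting draws training data according to a probability distribution tied to the errors of the SVM classifiers. After training, the outputs of the individual SVMs are combined by techniques such as majority voting, least squares estimation (LSE), or a two-level hierarchical SVM. Results of several experiments, including IRIS classification, handwritten digit recognition, and face/non-face classification, show that the proposed SVM ensembles outperform a single SVM.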
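
The Bagging-plus-majority-voting combination can be sketched as follows. To keep the example dependency-free, a nearest-centroid classifier stands in for the SVM base learner, so this illustrates only the ensemble mechanics, not the paper's SVM ensemble. All function names, the number of models, and the toy data are assumptions.

```python
import random
from collections import Counter

def fit_centroids(samples):
    """Stand-in base learner: per-class mean vectors (not an SVM)."""
    sums, counts = {}, Counter()
    for x, y in samples:
        acc = sums.setdefault(y, [0.0] * len(x))
        for k, v in enumerate(x):
            acc[k] += v
        counts[y] += 1
    return {y: [v / counts[y] for v in acc] for y, acc in sums.items()}

def predict_centroid(model, x):
    """Return the class whose centroid is closest to x."""
    def sq_dist(c):
        return sum((a - b) ** 2 for a, b in zip(x, c))
    return min(model, key=lambda y: sq_dist(model[y]))

def bagging_fit(samples, n_models=7, seed=0):
    """Train each base learner on a bootstrap resample of the training set."""
    rng = random.Random(seed)
    n = len(samples)
    return [fit_centroids([samples[rng.randrange(n)] for _ in range(n)])
            for _ in range(n_models)]

def majority_vote(models, x):
    """Combine the base learners' predictions by majority voting."""
    votes = Counter(predict_centroid(m, x) for m in models)
    return votes.most_common(1)[0][0]

# Hypothetical two-class toy data.
samples = [
    ([0.0, 0.1], "a"), ([0.2, 0.0], "a"), ([0.1, 0.3], "a"),
    ([2.0, 2.1], "b"), ([2.2, 1.9], "b"), ([1.9, 2.3], "b"),
]
models = bagging_fit(samples)
label = majority_vote(models, [2.0, 2.0])
```

Replacing `fit_centroids` and `predict_centroid` with an SVM trainer and predictor would give the Bagging variant described in the abstract; the LSE and hierarchical-SVM combiners are not sketched here.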

Mobile Gesture Recognition using Dynamic Time Warping with Localized Template

  • 최봉환;민준기;조성배
    • 한국정보과학회논문지:컴퓨팅의 실제 및 레터 / Vol. 16, No. 4 / pp. 482-486 / 2010
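  • As accelerometers built into mobile devices have come to be used for gesture-based mobile user interfaces, recognizers based on dynamic time warping (DTW) have been actively studied. Because DTW uses training samples as matching templates, it needs no separate training phase. At recognition time, however, the input must be compared against every template, and the resulting computational complexity makes DTW hard to apply in mobile environments. To address this problem, this paper proposes a DTW-based gesture recognizer that uses a small number of localized templates. To obtain the localized templates, the k-means clustering algorithm groups similar patterns in the training gesture set into k groups, and a pattern close to the centroid of each group is selected as a template for the DTW recognizer. This reduces the number of templates, which improves recognition speed, while preserving template diversity, which minimizes the loss in recognition accuracy. In experiments, the proposed method was about five times faster than DTW using all training templates and showed more stable performance than randomly selected templates.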
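
A minimal sketch of DTW matching against a reduced template set is given below. It assumes the training patterns have already been grouped (the k-means step is not shown) and picks, for each group, the pattern with the smallest total DTW distance to the others as a proxy for "the pattern close to the centroid". That proxy, the 1-D sequences (real accelerometer data would be 3-axis), and all names are assumptions rather than the paper's procedure.

```python
def dtw_distance(a, b):
    """Dynamic time warping distance between two 1-D sequences."""
    inf = float("inf")
    n, m = len(a), len(b)
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of the three allowed warping steps.
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]

def pick_template(group):
    """Pick the pattern with the smallest total DTW distance to its group."""
    return min(group, key=lambda p: sum(dtw_distance(p, q) for q in group))

def classify(query, templates):
    """Return the label of the nearest template; templates is a list of (label, sequence)."""
    return min(templates, key=lambda t: dtw_distance(query, t[1]))[0]

# Hypothetical pre-clustered training gestures (one group per label for brevity).
groups = {
    "shake": [[0, 1, 0, -1, 0], [0, 1, 1, 0, -1, 0], [0, 2, 0, -2, 0]],
    "lift": [[0, 1, 2, 3], [0, 1, 2, 2, 3], [0, 0, 1, 2, 3]],
}
templates = [(label, pick_template(g)) for label, g in groups.items()]
result = classify([0, 1, 2, 3, 3], templates)
```

Here the query is compared with two templates instead of six training patterns, which is the source of the speed-up the abstract reports when the template set is reduced.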