• Title/Abstract/Keywords: Training Data Set


Hybrid Linear Analysis Based on the Net Analyte Signal in Spectral Response with Orthogonal Signal Correction

  • Park, Kwang-Su;Jun, Chi-Hyuck
    • Near Infrared Analysis
    • /
    • Vol. 1, No. 2
    • /
    • pp.1-8
    • /
    • 2000
  • Using the net analyte signal, hybrid linear analysis was proposed to predict chemical concentration. In this paper, we select a sample from the training set and apply orthogonal signal correction to obtain an improved pseudo unit spectrum for hybrid linear analysis. Using the mean spectrum of a calibration training set, we first show that calibration by hybrid linear analysis is effective for predicting not only chemical concentrations but also physical property variables. Then, a pseudo unit spectrum from a training set is also tested with and without orthogonal signal correction. We use two data sets, one including five chemical concentrations and the other including ten physical property variables, to compare the performance of partial least squares and modified hybrid linear analysis calibration methods. The results show that hybrid linear analysis with a selected training spectrum instead of a well-measured pure spectrum still performs well, slightly better than partial least squares.

Software Quality Classification Model using Virtual Training Data

  • 홍의석
    • 한국콘텐츠학회논문지
    • /
    • Vol. 8, No. 7
    • /
    • pp.66-74
    • /
    • 2008
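  • A risk prediction model that identifies fault-prone modules in the early stages of the software development process helps with project resource allocation and thereby improves the quality of the overall system. Several prediction models based on design complexity metrics have been proposed, but most of them require a training data set, so the many development organizations that have no training data set cannot use them. In this paper, we developed a prediction model that applies quantified SDL system specifications to an error back-propagation neural network, a well-known supervised learning model. To address the problem of existing learning models, we trained this model on a virtual training data set built under several constraints. Several simulations were performed to examine the usability of the proposed model, and the results showed that, for development organizations without a training data set, the proposed model can serve as an alternative to prediction models trained on real data.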

Nearest Neighbor Based Prototype Classification Preserving Class Regions

  • Hwang, Doosung;Kim, Daewon
    • Journal of Information Processing Systems
    • /
    • Vol. 13, No. 5
    • /
    • pp.1345-1357
    • /
    • 2017
  • A prototype selection method chooses a small set of training points from a whole set of class data. As the data size increases, the selected prototypes play a significant role in covering class regions and learning a discriminant rule. This paper discusses methods for selecting prototypes in a classification framework. We formulate prototype selection as a set covering optimization problem in which the sets are constructed from a distance metric and predefined classes. This formulation lets us focus only on the prototypes of each class, without considering the points of other classes. A training point becomes a prototype by checking the number of its neighbors and whether it is preselected. In this setting, we propose a greedy algorithm that chooses the most relevant points for preserving the class-dominant regions. The proposed method is simple to implement, has no parameters to tune, and achieves better or comparable results on both artificial and real-world problems.
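
The set-covering idea in this abstract can be illustrated with a small sketch. It is not the authors' algorithm: the covering rule used here (a point covers the same-class points that lie closer to it than its nearest other-class point) and the function names `greedy_prototypes` and `classify` are assumptions made for illustration.

```python
import math

def greedy_prototypes(points, labels):
    """Greedy set cover: choose points whose class-pure balls cover every training point."""
    n = len(points)

    def dist(i, j):
        return math.dist(points[i], points[j])

    cover = []
    for i in range(n):
        # Assumed covering rule: radius = distance to the nearest point of another class.
        enemy = min((dist(i, j) for j in range(n) if labels[j] != labels[i]),
                    default=math.inf)
        same = {j for j in range(n) if labels[j] == labels[i] and dist(i, j) < enemy}
        cover.append({i} | same)  # a point always covers itself

    uncovered = set(range(n))
    prototypes = []
    while uncovered:
        # Pick the point that covers the most still-uncovered points.
        best = max(range(n), key=lambda i: len(cover[i] & uncovered))
        prototypes.append(best)
        uncovered -= cover[best]
    return prototypes

def classify(x, points, labels, prototypes):
    """Label a query point with the class of its nearest prototype."""
    nearest = min(prototypes, key=lambda i: math.dist(points[i], x))
    return labels[nearest]

# Hypothetical toy data: two well-separated classes.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11)]
labels = [0, 0, 0, 1, 1]
prototypes = greedy_prototypes(points, labels)
```

On this toy data one prototype per class is enough, because each class fits inside a single class-pure ball.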

A Survey of Applications of Artificial Intelligence Algorithms in Eco-environmental Modelling

  • Kim, Kang-Suk;Park, Joon-Hong
    • Environmental Engineering Research
    • /
    • Vol. 14, No. 2
    • /
    • pp.102-110
    • /
    • 2009
  • The application of artificial intelligence (AI) approaches to eco-environmental modeling has gradually increased over the last decade. A comprehensive understanding and evaluation of the applicability of this approach to eco-environmental modeling are needed. In this study, we reviewed previous studies that used AI techniques in eco-environmental modeling. Decision Tree (DT) and Artificial Neural Network (ANN) were found to be the major AI algorithms preferred by researchers in ecological and environmental modeling. When the effect of training data size on model prediction accuracy was explored using data from the previous studies, prediction accuracy and training data size showed a nonlinear correlation, which was best described by a hyperbolic saturation function among the tested nonlinear functions, which included power and logarithmic functions. The hyperbolic saturation equations were proposed as a guideline for optimizing the size of the training data set, which is critically important in designing the field experiments required for training AI-based eco-environmental models.
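
As a rough illustration of how a hyperbolic saturation curve could guide the choice of training set size, the sketch below assumes the form accuracy = a_max * n / (half + n); the abstract does not give the exact equation, so this form, the double-reciprocal fitting shortcut, and all names and data values are assumptions.

```python
def fit_hyperbolic_saturation(sizes, accuracies):
    """Fit accuracy = a_max * n / (half + n) by least squares on 1/accuracy vs 1/n."""
    xs = [1.0 / n for n in sizes]
    ys = [1.0 / a for a in accuracies]
    m = len(xs)
    mean_x = sum(xs) / m
    mean_y = sum(ys) / m
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx                  # equals half / a_max
    intercept = mean_y - slope * mean_x  # equals 1 / a_max
    a_max = 1.0 / intercept
    half = slope * a_max
    return a_max, half

def size_for_fraction(half, fraction):
    """Training size at which the curve reaches `fraction` of its plateau a_max."""
    return half * fraction / (1.0 - fraction)

# Hypothetical data generated from a_max = 0.95, half = 50 (not from the paper).
sizes = [10, 50, 100, 500, 1000]
accuracies = [0.95 * n / (50 + n) for n in sizes]
a_max, half = fit_hyperbolic_saturation(sizes, accuracies)
n_90 = size_for_fraction(half, 0.9)
```

The double-reciprocal fit is convenient because it needs no iterative optimizer, but it weights small training sizes heavily; a direct nonlinear least-squares fit would be a reasonable alternative on noisy data.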

Face Detection Based on Incremental Learning from Very Large Size Training Data

  • 박지영;이준호
    • 한국정보과학회논문지:소프트웨어및응용
    • /
    • Vol. 31, No. 7
    • /
    • pp.949-958
    • /
    • 2004
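  • This study proposes a new method that allows additional learning from new data while training a face detection classifier on very large training data. The goal of incremental learning is to learn new information from the added data and update the knowledge already acquired. In designing a classifier based on this learning technique, it is critical that the final classifier reflect the characteristics of the entire training data set. To obtain an optimized final classifier, the proposed algorithm generates a validation set that represents the global characteristics of the training set and determines the weights of the intermediate classifiers from their classification performance on this set. Each intermediate classifier is the result of learning an individual data set, and the intermediate classifiers are merged into the final classifier by weight-based combination. Repeated experiments confirmed that a face detection classifier trained with the proposed algorithm achieves better detection performance than classifiers based on AdaBoost and Learn++.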
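
The weight-based combination described in this abstract can be sketched as follows. The abstract does not state the exact weighting rule, so using plain validation accuracy as the weight is an assumption, as are the function names and the toy threshold classifiers.

```python
def validation_weights(classifiers, val_x, val_y):
    """Weight each intermediate classifier by its accuracy on a shared validation set."""
    weights = []
    for clf in classifiers:
        correct = sum(1 for x, y in zip(val_x, val_y) if clf(x) == y)
        weights.append(correct / len(val_y))
    return weights

def combined_predict(classifiers, weights, x):
    """Final decision: sign of the weighted vote of the intermediate classifiers."""
    score = sum(w * clf(x) for clf, w in zip(classifiers, weights))
    return 1 if score >= 0 else -1

# Hypothetical intermediate classifiers, each imagined as trained on a separate data chunk.
classifiers = [
    lambda x: 1 if x > 0 else -1,
    lambda x: 1 if x > 5 else -1,
    lambda x: 1 if x > -1.5 else -1,
]
val_x = [-2, -1, 1, 2, 6]
val_y = [-1, -1, 1, 1, 1]
weights = validation_weights(classifiers, val_x, val_y)
```

Because every classifier is scored on the same validation set, a classifier trained on an unrepresentative chunk receives a smaller say in the final vote.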

The Convergence Characteristics of The Time-Averaged Distortion in Vector Quantization: Part II. Applications to Testing Trained Codebooks

  • Dong Sik Kim
    • 전자공학회논문지B
    • /
    • Vol. 32B, No. 5
    • /
    • pp.747-755
    • /
    • 1995
  • When codebooks are designed by a clustering algorithm using training sets, a time-averaged distortion, called the inside-training-set distortion (ITSD), is usually calculated in each iteration of the algorithm, since the input probability function is generally unknown. The algorithm stops when the ITSD no longer decreases significantly. Then, to test the trained codebook, the outside-training-set distortion (OTSD) is calculated by a time-averaged approximation using the test set. Hence, codebooks that yield small values of the OTSD are regarded as good codebooks. In other words, the OTSD serves as a criterion for testing a trained codebook. However, this argument does not always hold if certain conditions are not satisfied. Moreover, it is known that a large test set is generally required to obtain an approximation of the OTSD, but a large test set causes heavy computational complexity. In this paper, the analyses in [16] reveal that, when the codebook size is large, a test set only as large as the codebook is sufficient. A simple method for testing trained codebooks is then presented. Experimental results on synthetic data and real images that support the analysis are also provided and discussed.
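
To make the ITSD and OTSD concrete, here is a minimal sketch that trains a codebook with a Lloyd-style clustering loop and then evaluates it on a held-out test set. Squared Euclidean error as the distortion measure, initialization from the first training vectors, and all names and data are assumptions; the paper's own test method is not reproduced here.

```python
import math

def nearest(codebook, x):
    """Index of the codeword closest to x."""
    return min(range(len(codebook)), key=lambda k: math.dist(codebook[k], x))

def average_distortion(codebook, data):
    """Time-averaged squared-error distortion of `data` quantized with `codebook`."""
    return sum(math.dist(codebook[nearest(codebook, x)], x) ** 2 for x in data) / len(data)

def train_codebook(training_set, size, tol=1e-6, max_iter=100):
    """Lloyd-style training that stops when the ITSD no longer decreases significantly."""
    codebook = [tuple(x) for x in training_set[:size]]  # assumed initialization
    prev = average_distortion(codebook, training_set)
    for _ in range(max_iter):
        cells = [[] for _ in codebook]
        for x in training_set:
            cells[nearest(codebook, x)].append(x)
        # Move each codeword to the centroid of its cell; keep it if the cell is empty.
        codebook = [tuple(sum(c) / len(cell) for c in zip(*cell)) if cell else old
                    for cell, old in zip(cells, codebook)]
        itsd = average_distortion(codebook, training_set)
        if prev - itsd <= tol * max(prev, 1e-12):
            break
        prev = itsd
    return codebook, itsd

# Hypothetical one-dimensional data.
training_set = [(0.0,), (0.2,), (10.0,), (10.2,)]
test_set = [(0.1,), (10.1,), (5.0,)]
codebook, itsd = train_codebook(training_set, size=2)
otsd = average_distortion(codebook, test_set)  # distortion outside the training set
```

In this toy case the OTSD is far larger than the ITSD because one test vector falls between the two codewords, which shows why the training distortion alone is not a reliable quality measure.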


Comparison Study of the Performance of CNN Models with Multi-view Image Set on the Classification of Ship Hull Blocks

  • 전해명;노재규
    • 대한조선학회논문집
    • /
    • Vol. 57, No. 3
    • /
    • pp.140-151
    • /
    • 2020
  • When scheduling the shipbuilding process, it is important to identify the location of ship hull blocks together with their exact block identification numbers. Wrong information on the location or identification number of a hull block can lower productivity because time is spent finding where the block actually is. To solve this problem, a system is needed that tracks the location of the blocks and identifies their identification numbers automatically. There has been much research on location tracking systems for hull blocks in the stockyard, but no research on identifying the hull blocks in the stockyard. This study compares the performance of five Convolutional Neural Network (CNN) models with a multi-view image set on the classification of hull blocks, in order to identify the blocks in the stockyard. The CNN models are open algorithms from the ImageNet Large-Scale Visual Recognition Competition (ILSVRC). Four scaled hull block models are used to acquire images of ship hull blocks. Learning and transfer learning of the CNN models were carried out with the original training data and with augmented data derived from it. Twenty tests and predictions are performed, covering five CNN models and four training conditions. To compare the classification performance of the CNN models, accuracy and average F1-score from the confusion matrix are adopted as the performance measures. The comparison shows that the Resnet-152v2 model achieves the highest accuracy and average F1-score with both the full block prediction image set and the cropped block prediction image set.
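
The two performance measures named in this abstract can be computed from a confusion matrix as sketched below. The row/column convention (rows are true classes, columns are predicted classes), the unweighted averaging of per-class F1, and the example counts are assumptions, not values from the paper.

```python
def accuracy_and_macro_f1(cm):
    """Accuracy and unweighted mean of per-class F1 from a square confusion matrix.

    Assumed layout: cm[t][p] counts samples of true class t predicted as class p.
    """
    n = len(cm)
    total = sum(sum(row) for row in cm)
    accuracy = sum(cm[i][i] for i in range(n)) / total
    f1_scores = []
    for i in range(n):
        tp = cm[i][i]
        fp = sum(cm[r][i] for r in range(n)) - tp  # predicted i, actually another class
        fn = sum(cm[i]) - tp                       # actually i, predicted another class
        denom = 2 * tp + fp + fn
        f1_scores.append(2 * tp / denom if denom else 0.0)
    return accuracy, sum(f1_scores) / n

# Hypothetical confusion matrix for four hull block classes.
cm = [
    [9, 1, 0, 0],
    [1, 8, 1, 0],
    [0, 1, 9, 0],
    [0, 0, 0, 10],
]
accuracy, macro_f1 = accuracy_and_macro_f1(cm)
```

Averaging F1 over classes without weighting gives each block type equal influence, which matters when some classes have fewer test images than others.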

Efficient Incremental Learning using the Preordered Training Data

  • 이선영;방승양
    • 한국정보과학회논문지:소프트웨어및응용
    • /
    • Vol. 27, No. 2
    • /
    • pp.97-107
    • /
    • 2000
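  • Incremental learning trains a neural network while gradually increasing the training data, which generally shortens training time and also improves the network's generalization performance. However, existing incremental learning repeatedly evaluates the importance of the data while selecting training data. In this paper, for classification problems, the importance of the data is evaluated only once, before training begins. The proposed method regards data closer to a class boundary as more important in classification problems and presents a way to select such data. Experiments on two synthetic data sets and real-world data show that the proposed method shortens training time and improves generalization performance compared with the existing method.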
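
One way to picture the one-time preordering is sketched below. The abstract does not say how closeness to a class boundary is measured, so the distance to the nearest point of a different class is used here as a stand-in; the function names and data are likewise hypothetical.

```python
import math

def preorder_by_boundary(points, labels):
    """Rank training points once, most important first (closest to another class)."""
    def enemy_distance(i):
        # Assumed importance measure: distance to the nearest point of a different class.
        return min(math.dist(points[i], points[j])
                   for j in range(len(points)) if labels[j] != labels[i])
    return sorted(range(len(points)), key=enemy_distance)

def incremental_batches(order, batch_size):
    """Yield growing training subsets that follow the precomputed order."""
    for end in range(batch_size, len(order) + batch_size, batch_size):
        yield order[:min(end, len(order))]

# Hypothetical one-dimensional data with the class boundary near 5.
points = [(0,), (4,), (6,), (10,)]
labels = [0, 0, 1, 1]
order = preorder_by_boundary(points, labels)
stages = list(incremental_batches(order, batch_size=2))
```

Each stage would be passed to the network's training routine in turn; the ordering itself is never recomputed, which is the source of the time saving the abstract describes.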


Utilizing the GOA-RF hybrid model, predicting the CPT-based pile set-up parameters

  • Zhao, Zhilong;Chen, Simin;Zhang, Dengke;Peng, Bin;Li, Xuyang;Zheng, Qian
    • Geomechanics and Engineering
    • /
    • Vol. 31, No. 1
    • /
    • pp.113-127
    • /
    • 2022
  • The undrained shear strength of soil is considered one of the most significant engineering parameters in geotechnical design methods. In recent years, in-situ experiments such as cone penetration tests (CPT) have been used to estimate the undrained shear strength from the characteristics of the soil. Nevertheless, most of these techniques rely on correlation assumptions, which may lead to uneven accuracy. The general aim of this research is to develop a new hybrid soft computing model, combining random forest (RF) with the grasshopper optimization algorithm (GOA), to better approximate the pile set-up parameters from CPT, based on two different types of input data. Data type 1 contains pile parameters, and data type 2 consists of soil properties. The contribution of this article is that a hybrid GOA-RF model is proposed for the first time to forecast the pile set-up parameter from CPT. To this end, CPT data and related bore log data were gathered from 70 locations across Louisiana. With an R2 greater than 0.9098, which indicates acceptable agreement between measured and predicted values, the results demonstrated that both models perform well in forecasting the set-up parameter. In both the training and testing steps, the model with data type 2 performs better than the model using data type 1, with R2 and RMSE of 0.9272 and 0.0305 for the training step and 0.9182 and 0.0415 for the testing step. Overall, the results show that the A parameter can be forecast with adequate precision from the CPT data using hybrid GOA-RF models. However, the RF model with soil features as input parameters gives a better description of the pile set-up parameters.
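
For readers unfamiliar with the two reported metrics, the sketch below shows how they are commonly computed. It assumes R2 means the coefficient of determination (the abstract does not define it), and the sample values are invented for illustration.

```python
import math

def r2_and_rmse(measured, predicted):
    """Coefficient of determination and root-mean-square error of predictions."""
    mean = sum(measured) / len(measured)
    ss_res = sum((m - p) ** 2 for m, p in zip(measured, predicted))  # residual sum of squares
    ss_tot = sum((m - mean) ** 2 for m in measured)                  # total sum of squares
    return 1.0 - ss_res / ss_tot, math.sqrt(ss_res / len(measured))

# Hypothetical measured and predicted set-up parameter values.
measured = [0.20, 0.35, 0.50, 0.65]
predicted = [0.22, 0.33, 0.52, 0.63]
r2, rmse = r2_and_rmse(measured, predicted)
```

R2 near 1 and RMSE near 0 both indicate close agreement, but RMSE is expressed in the units of the predicted quantity, so the two are usually reported together.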

A Co-training Method based on Classification Using Unlabeled Data

  • 윤혜성;이상호;박승수;용환승;김주한
    • 한국정보과학회논문지:소프트웨어및응용
    • /
    • Vol. 31, No. 8
    • /
    • pp.991-998
    • /
    • 2004
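  • In many application areas such as bioinformatics, data analysis may involve a small amount of labeled data and a large amount of unlabeled data. Labeled data are difficult and expensive to obtain because they require human effort, whereas unlabeled data can be obtained easily. The co-training algorithm is widely used to classify and analyze data by exploiting unlabeled data in this situation. The method trains one classifier on each of two views of the small labeled set. Each classifier then builds up its most confident predictions over all the unlabeled data to be analyzed. Repeating this process many times on the training data set trains new classifiers in each view and increases the amount of labeled data. This paper presents a new co-training method that uses unlabeled data. The method was evaluated with two classifiers and two experimental data sets, WebKB and BIND XML. The experimental results show that the proposed co-training method can effectively improve classification accuracy when the amount of labeled data is very small.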