• Title/Summary/Keyword: Training Data Set

Search Result 814, Processing Time 0.03 seconds

The bootstrap VQ model for automatic speaker recognition system (VQ 방식의 화자인식 시스템 성능 향상을 위한 부쓰트랩 방식 적용)

  • Jeong Kyung-Youn;Lee Jin-Ick;Lee Hwang-Soo
    • Proceedings of the Acoustical Society of Korea Conference / spring / pp.39-42 / 2000
  • A bootstrap and aggregating (bagging) vector quantization (VQ) classifier is proposed for speaker recognition. This method obtains multiple training data sets by resampling the original training data set with replacement, and then integrates the corresponding multiple classifiers into a single classifier. Experiments on a closed-set, text-independent speaker identification system were carried out using the TIMIT database. The proposed bagging VQ classifier shows considerably improved performance over the conventional VQ classifier.

  • PDF
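The bagging scheme this abstract describes (resample the training set, train one VQ classifier per resample, then vote) can be sketched in a few lines. This is a toy illustration under made-up data, not the paper's system: a real VQ classifier would learn a multi-vector codebook per speaker, e.g. via k-means over cepstral frames, while here each "codebook" is a single centroid.

```python
import random

def train_vq(frames):
    # Toy "codebook": one centroid per speaker (a real VQ codebook has many vectors).
    dim = len(frames[0])
    return [sum(f[d] for f in frames) / len(frames) for d in range(dim)]

def classify(codebooks, x):
    # Nearest-codebook decision under squared-Euclidean distortion.
    return min(codebooks, key=lambda s: sum((a - b) ** 2 for a, b in zip(codebooks[s], x)))

def bagging_vq(train, x, n_boot=15, seed=0):
    rng = random.Random(seed)
    votes = {}
    for _ in range(n_boot):
        # Bootstrap step: resample each speaker's frames with replacement.
        books = {s: train_vq([rng.choice(fr) for _ in fr]) for s, fr in train.items()}
        winner = classify(books, x)
        votes[winner] = votes.get(winner, 0) + 1
    return max(votes, key=votes.get)  # aggregate the classifiers by plurality vote

# Hypothetical 2-D feature frames for two speakers.
train = {"A": [(0.0, 0.1), (0.2, 0.0), (0.1, 0.1)],
         "B": [(1.0, 1.1), (0.9, 1.0), (1.1, 0.9)]}
print(bagging_vq(train, (0.05, 0.1)))  # prints "A"
```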

Land Cover Classification using Landsat TM and KOMPSAT-1 EOC Remotely Sensed Imagery - Yongdam Dam Watershed - (Landsat TM KOMPSAT-1 EOC 영상을 이용한 용담댐 유역의 토지피복분류(수공))

  • 권형중;장철희;김성준
    • Proceedings of the Korean Society of Agricultural Engineers Conference / 2000.10a / pp.419-424 / 2000
  • Land cover classification using remotely sensed imagery has become necessary and useful for hydrologic and water-quality applications. The purpose of this study is to obtain a land cover classification map using remotely sensed data: Landsat TM and KOMPSAT-1 EOC. The classification was conducted by the maximum likelihood method with training sets and the Tasseled Cap transform. The best result was obtained from Landsat TM imagery merged with KOMPSAT-1 EOC, which agreed closely with statistical data. This is attributed to the more precise training sets made possible by the enhanced spatial resolution of KOMPSAT-1 EOC (6.6 m × 6.6 m).

  • PDF
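The maximum likelihood step mentioned above assigns each pixel to the class whose class-conditional Gaussian, fit from the training pixels, gives it the highest likelihood. A minimal sketch with a diagonal covariance and hypothetical two-band reflectance values (the study's actual bands and statistics are not reproduced here):

```python
import math

def fit_class(pixels):
    # Per-band mean and variance estimated from one class's training pixels.
    n, dim = len(pixels), len(pixels[0])
    mean = [sum(p[d] for p in pixels) / n for d in range(dim)]
    var = [max(sum((p[d] - mean[d]) ** 2 for p in pixels) / n, 1e-6) for d in range(dim)]
    return mean, var

def log_likelihood(x, mean, var):
    # Log-likelihood of pixel x under a diagonal-covariance Gaussian.
    return sum(-0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
               for xi, m, v in zip(x, mean, var))

def ml_classify(training, x):
    models = {c: fit_class(px) for c, px in training.items()}
    return max(models, key=lambda c: log_likelihood(x, *models[c]))

# Hypothetical two-band reflectance samples per cover class.
training = {"water":  [(10, 12), (11, 13), (9, 11)],
            "forest": [(40, 55), (42, 57), (38, 53)]}
print(ml_classify(training, (12, 14)))  # prints "water"
```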

Developing an Ensemble Classifier for Bankruptcy Prediction (부도 예측을 위한 앙상블 분류기 개발)

  • Min, Sung-Hwan
    • Journal of Korea Society of Industrial Information Systems / v.17 no.7 / pp.139-148 / 2012
  • An ensemble of classifiers employs a set of individually trained classifiers and combines their predictions. In most cases the ensemble produces more accurate predictions than the base classifiers; combining outputs from multiple classifiers, known as ensemble learning, is one of the standard and most important techniques for improving classification accuracy in machine learning. An ensemble is effective only if the individual classifiers make decisions that are as diverse as possible. Bagging is the most popular ensemble learning method for generating a diverse set of classifiers; diversity in bagging is obtained by using different training sets, with the training data subsets drawn randomly with replacement from the entire training data set. The random subspace method is an ensemble construction technique using different attribute subsets: the training data set is also modified as in bagging, but the modification is performed in the feature space. Bagging and random subspace are well-known and popular ensemble algorithms, yet few studies have dealt with integrating them using SVM classifiers, though there is great potential for useful applications in this area. The focus of this paper is to propose methods for improving SVM performance using a hybrid ensemble strategy for bankruptcy prediction. The proposed ensemble model is applied to the bankruptcy prediction problem using a real data set from Korean companies.
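The hybrid of bagging and random subspace amounts to giving each ensemble member both a bootstrap sample of the rows and a random subset of the features. A sketch of that data-generation step follows; the SVM training itself is omitted, and each `(Xb, yb)` pair would be fed to a base SVM whose outputs are then combined by voting. The data are made-up stand-ins for financial ratios.

```python
import random

def hybrid_ensemble_sets(X, y, n_models=10, subspace=2, seed=1):
    # Each ensemble member gets a bootstrap sample of rows (bagging)
    # AND a random subset of columns (random subspace).
    rng = random.Random(seed)
    n, dim = len(X), len(X[0])
    members = []
    for _ in range(n_models):
        rows = [rng.randrange(n) for _ in range(n)]       # sample rows with replacement
        feats = sorted(rng.sample(range(dim), subspace))  # pick a feature subset
        Xb = [[X[i][f] for f in feats] for i in rows]
        yb = [y[i] for i in rows]
        members.append((Xb, yb, feats))
    return members

# Hypothetical 4-feature financial-ratio rows with bankruptcy labels.
X = [[0.1, 0.0, 1.2, 5.0], [0.9, 0.1, 0.8, 4.1],
     [9.5, 8.2, 7.7, 0.3], [8.8, 9.1, 8.4, 1.0]]
y = [0, 0, 1, 1]
members = hybrid_ensemble_sets(X, y)
print(len(members), len(members[0][2]))  # prints "10 2"
```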

Feature Selection of Training set for Supervised Classification of Satellite Imagery (위성영상의 감독분류를 위한 훈련집합의 특징 선택에 관한 연구)

  • 곽장호;이황재;이준환
    • Korean Journal of Remote Sensing / v.15 no.1 / pp.39-50 / 1999
  • It is a complicated and time-consuming process to classify multi-band satellite imagery according to the application. In addition, the classification rate depends sensitively on the selection of the training data set and of the features in a supervised classification process. This paper introduces a classification network adopting a fuzzy-based $\gamma$-model in order to select a training data set and to extract the features that contribute most to the actual classification. The features used in the classification were the gray-level histogram, textures, and NDVI (Normalized Difference Vegetation Index) of the target imagery. Moreover, in order to minimize errors in the classification network, the gradient descent method was used in the training process for the $\gamma$-parameters at each node used. The trained parameters made it possible to determine the connectivity of each node and to remove ineffective features from the set of all possible input features.

Prediction of plasma etching using genetic-algorithm controlled backpropagation neural network

  • Kim, Sung-Mo;Kim, Byung-Whan
    • Proceedings of the Institute of Control, Robotics and Systems (ICROS) Conference / 2003.10a / pp.1305-1308 / 2003
  • A new technique is presented to construct a predictive model of a plasma etch process. This was accomplished by combining a backpropagation neural network (BPNN) and a genetic algorithm (GA); the predictive model constructed in this way is referred to as a GA-BPNN. The GA played the role of controlling the training factors simultaneously. The training factors to be optimized were the number of hidden neurons, the training tolerance, the initial weight magnitude, and the two gradients of the bipolar sigmoid and linear functions. Each etch response was optimized separately. The proposed scheme was evaluated with a set of experimental plasma etch data; the etch process was characterized by a $2^3$ full factorial experiment. The etch responses modeled are the aluminum (Al) etch rate, silica profile angle, Al selectivity, and DC bias. Additional test data were prepared to evaluate model appropriateness. The GA-BPNN was compared to a conventional BPNN and demonstrated an improvement of more than 20% for all etch responses; the improvement was especially significant for the Al etch rate.

  • PDF
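The GA's role, per the abstract, is to search the space of training factors. A heavily simplified sketch: the fitness below is a stand-in toy function where the real fitness would train a BPNN with the candidate factors and return its test error, and the factor ranges are assumptions.

```python
import random

# The five training factors the abstract lists, with assumed search ranges.
# ("hidden" is kept continuous for brevity; it is really an integer neuron count.)
RANGES = {"hidden": (2, 20), "tol": (0.01, 0.5), "w0": (0.1, 1.0),
          "sig_grad": (0.1, 2.0), "lin_grad": (0.1, 2.0)}

def toy_error(p):
    # Stand-in fitness; the real one would train a BPNN with p and return test error.
    return (p["hidden"] - 8) ** 2 * 0.01 + abs(p["tol"] - 0.1) + abs(p["sig_grad"] - 1.0)

def ga_search(generations=30, pop=20, seed=3):
    rng = random.Random(seed)
    rand = lambda: {k: rng.uniform(lo, hi) for k, (lo, hi) in RANGES.items()}
    population = [rand() for _ in range(pop)]
    for _ in range(generations):
        population.sort(key=toy_error)
        parents = population[: pop // 2]          # elitist selection
        children = []
        for _ in range(pop - len(parents)):
            a, b = rng.sample(parents, 2)
            child = {k: rng.choice((a[k], b[k])) for k in RANGES}  # uniform crossover
            m = rng.choice(list(RANGES))
            child[m] = rng.uniform(*RANGES[m])                     # mutate one factor
            children.append(child)
        population = parents + children
    return min(population, key=toy_error)

best = ga_search()
print(round(toy_error(best), 3))
```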

Improved Focused Sampling for Class Imbalance Problem (클래스 불균형 문제를 해결하기 위한 개선된 집중 샘플링)

  • Kim, Man-Sun;Yang, Hyung-Jeong;Kim, Soo-Hyung;Cheah, Wooi Ping
    • The KIPS Transactions: Part B / v.14B no.4 / pp.287-294 / 2007
  • Many classification algorithms for real-world data suffer from the class imbalance problem. To solve this problem, various methods have been proposed, such as altering the training balance and designing better sampling strategies. Previous methods, however, do not adequately account for the distribution of the input data and its constraints. In this paper, we propose a focused sampling method that improves on previous methods. To solve the problem, we must select a useful subset from the entire training set. To obtain it, the proposed method divides the data into regions according to scores computed from the distribution of a SOM (self-organizing map) over the input data. The scores are sorted in ascending order; they represent the distribution of the input data, which in turn may represent the characteristics of the whole data. A new training data set is obtained by eliminating unuseful data located in the region between an upper bound and a lower bound. The proposed method gives better, or at least similar, classification accuracy compared to previous approaches. It also offers several benefits: reduction of the class imbalance ratio, reduction of training set size, and prevention of over-fitting. The proposed method was tested with a kNN classifier. An experimental result on the ecoli data set shows that this method achieves up to 2.27 times the precision of the other methods.
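The elimination step can be sketched as: rank samples by score, then drop everything between a lower and an upper bound. The score function here is a placeholder argument; the paper derives its scores from the SOM distribution over the input data, which is not reproduced here.

```python
def focused_sample(data, labels, score, lower=0.2, upper=0.8):
    # Rank samples by score, then eliminate the band between the two bounds.
    ranked = sorted(range(len(data)), key=lambda i: score(data[i]))
    lo, hi = int(len(ranked) * lower), int(len(ranked) * upper)
    keep = sorted(ranked[:lo] + ranked[hi:])
    return [data[i] for i in keep], [labels[i] for i in keep]

data = list(range(10))        # toy 1-D samples
labels = [0] * 8 + [1] * 2    # imbalanced classes
X2, y2 = focused_sample(data, labels, score=lambda x: x)
print(X2)  # prints [0, 1, 8, 9]
```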

Refinement of Ground Truth Data for X-ray Coronary Artery Angiography (CAG) using Active Contour Model

  • Dongjin Han;Youngjoon Park
    • International Journal of Advanced Smart Convergence / v.12 no.4 / pp.134-141 / 2023
  • We present a novel method aimed at refining ground truth data through regularization and modification, particularly applicable when working with the original ground truth set. Enhancing the performance of deep neural networks is achieved by applying regularization techniques to the existing ground truth data. In many machine learning tasks requiring pixel-level segmentation sets, accurately delineating objects is vital. However, it proves challenging for thin and elongated objects such as blood vessels in X-ray coronary angiography, often resulting in inconsistent generation of ground truth data. This method involves an analysis of the quality of training set pairs - comprising images and ground truth data - to automatically regulate and modify the boundaries of ground truth segmentation. Employing the active contour model and a recursive ground truth generation approach results in stable and precisely defined boundary contours. Following the regularization and adjustment of the ground truth set, there is a substantial improvement in the performance of deep neural networks.

CT-Based Radiomics Signature for Preoperative Prediction of Coagulative Necrosis in Clear Cell Renal Cell Carcinoma

  • Kai Xu;Lin Liu;Wenhui Li;Xiaoqing Sun;Tongxu Shen;Feng Pan;Yuqing Jiang;Yan Guo;Lei Ding;Mengchao Zhang
    • Korean Journal of Radiology / v.21 no.6 / pp.670-683 / 2020
  • Objective: The presence of coagulative necrosis (CN) in clear cell renal cell carcinoma (ccRCC) indicates a poor prognosis, while the absence of CN indicates a good prognosis. The purpose of this study was to build and validate a radiomics signature based on preoperative CT imaging data to estimate CN status in ccRCC. Materials and Methods: Altogether, 105 patients with pathologically confirmed ccRCC were retrospectively enrolled in this study and then divided into training (n = 72) and validation (n = 33) sets. Thereafter, 385 radiomics features were extracted from the three-dimensional volumes of interest of each tumor, and 10 traditional features were assessed by two experienced radiologists using triple-phase CT-enhanced images. A multivariate logistic regression algorithm was used to build the radiomics score and traditional predictors in the training set, and their performance was assessed and then tested in the validation set. The radiomics signature to distinguish CN status was then developed by incorporating the radiomics score and the selected traditional predictors. The receiver operating characteristic (ROC) curve was plotted to evaluate the predictive performance. Results: The area under the ROC curve (AUC) of the radiomics score, which consisted of 7 radiomics features, was 0.855 in the training set and 0.885 in the validation set. The AUC of the traditional predictor, which consisted of 2 traditional features, was 0.843 in the training set and 0.858 in the validation set. The radiomics signature showed the best performance with an AUC of 0.942 in the training set, which was then confirmed with an AUC of 0.969 in the validation set. Conclusion: The CT-based radiomics signature that incorporated radiomics and traditional features has the potential to be used as a non-invasive tool for preoperative prediction of CN in ccRCC.
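The AUC values reported above can be computed rank-wise without plotting the ROC curve: the AUC equals the probability that a randomly chosen positive case receives a higher signature score than a randomly chosen negative one. A small sketch with hypothetical scores, not the study's data:

```python
def auc(scores, labels):
    # Rank-based AUC: the fraction of (positive, negative) pairs in which the
    # positive case scores higher, with ties counted as half.
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical signature scores for 4 CN-positive and 4 CN-negative tumors.
scores = [0.9, 0.8, 0.7, 0.3, 0.6, 0.4, 0.2, 0.1]
labels = [1, 1, 1, 1, 0, 0, 0, 0]
print(auc(scores, labels))  # prints 0.875
```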

Object Detection using Fuzzy Adaboost (퍼지 Adaboost를 이용한 객체 검출)

  • Kim, Kisang;Choi, Hyung-Il
    • The Journal of the Korea Contents Association / v.16 no.5 / pp.104-112 / 2016
  • Adaboost chooses a good set of features over a series of rounds. On each round, it chooses the optimal feature and its threshold value by minimizing the weighted classification error. The classification process involved performs a hard decision. In this paper, we expand this classification process to a soft fuzzy decision. We believe this expansion could give the Adaboost algorithm some flexibility as well as good performance, especially when the training data set is not large enough. The typical Adaboost algorithm assigns the same weight to each training datum in the first round of training. We propose a new algorithm that assigns different initial weights based on statistical properties of the involved features. Experimental results show that the proposed method achieves higher performance than the traditional one.
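The paper's key change is the initial weight assignment. As an illustration only (the abstract does not state which statistical properties are used), one plausible scheme weights each training datum by how far it lies from its class mean, instead of Adaboost's uniform 1/N:

```python
def initial_weights(X, y):
    # Weight each datum by 1 + distance from its class mean, then normalize,
    # so atypical samples start with more influence than the uniform 1/N.
    classes = sorted(set(y))
    means = {}
    for c in classes:
        pts = [x for x, l in zip(X, y) if l == c]
        means[c] = [sum(p[d] for p in pts) / len(pts) for d in range(len(pts[0]))]
    raw = [1.0 + sum((xi - m) ** 2 for xi, m in zip(x, means[l])) ** 0.5
           for x, l in zip(X, y)]
    total = sum(raw)
    return [r / total for r in raw]

# Toy data: sample 2 is an outlier of class 0.
X = [[0.0, 0.0], [0.1, 0.0], [3.0, 0.0], [5.0, 5.0], [5.1, 5.0]]
y = [0, 0, 0, 1, 1]
w = initial_weights(X, y)
print(w.index(max(w)))  # prints 2: the outlier gets the largest initial weight
```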

Boosting Algorithms for Large-Scale Data and Data Batch Stream (대용량 자료와 순차적 자료를 위한 부스팅 알고리즘)

  • Yoon, Young-Joo
    • The Korean Journal of Applied Statistics / v.23 no.1 / pp.197-206 / 2010
  • In this paper, we propose boosting algorithms for data that are very large or arrive in batches sequentially over time. In this situation, an ordinary boosting algorithm may be inappropriate because it requires the availability of the entire training set at once. To apply to large-scale data or data batch streams, we modify AdaBoost and Arc-x4. These algorithms show good results for both large-scale data and data batch streams, with or without concept drift, on simulated and real data sets.
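The batch-stream modification can be sketched as: train on batches as they arrive, reweighting each new batch by the current ensemble's mistakes. The weight-doubling rule and the stump base learner below are illustrative choices, not the paper's exact AdaBoost/Arc-x4 variants.

```python
def train_stump(X, y, w):
    # Weighted 1-D decision stump: threshold and direction minimizing weighted error.
    best = None
    for t in sorted(set(X)):
        for sign in (1, -1):
            pred = [1 if sign * (x - t) >= 0 else 0 for x in X]
            err = sum(wi for wi, p, yi in zip(w, pred, y) if p != yi)
            if best is None or err < best[0]:
                best = (err, t, sign)
    _, t, sign = best
    return lambda x: 1 if sign * (x - t) >= 0 else 0

def boost_batches(batches, train_base):
    # Each arriving batch is reweighted by the current ensemble's mistakes,
    # so later base learners focus on what the stream has gotten wrong so far.
    ensemble = []
    for X, y in batches:
        if ensemble:
            vote = lambda v: 1 if sum(h(v) for h in ensemble) * 2 >= len(ensemble) else 0
            w = [2.0 if vote(x) != t else 1.0 for x, t in zip(X, y)]
        else:
            w = [1.0] * len(X)  # first batch: uniform weights, as in plain AdaBoost
        ensemble.append(train_base(X, y, w))
    return ensemble

# Two toy batches of 1-D data where the true rule is "x >= 3".
batches = [([0, 1, 5, 6], [0, 0, 1, 1]), ([2, 4], [0, 1])]
ensemble = boost_batches(batches, train_stump)
predict = lambda v: 1 if sum(h(v) for h in ensemble) * 2 >= len(ensemble) else 0
print([predict(v) for v in (0.5, 2.5, 4.5, 5.5)])  # prints [0, 0, 1, 1]
```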