• Title/Summary/Keyword: Training Data

Semi-supervised Model for Fault Prediction using Tree Methods (트리 기법을 사용하는 세미감독형 결함 예측 모델)

  • Hong, Euyseok
    • The Journal of the Institute of Internet, Broadcasting and Communication
    • /
    • v.20 no.4
    • /
    • pp.107-113
    • /
    • 2020
  • A number of studies have been conducted on predicting software faults, but most of them have used supervised models trained on labeled data. Very few studies have examined unsupervised models that use only unlabeled data, or semi-supervised models that use abundant unlabeled data together with a small amount of labeled data. In this paper, we built new semi-supervised models that use tree algorithms within the self-training technique. In the model performance evaluation experiment, the newly created tree models performed better than the existing models, and CollectiveWoods, in particular, outperformed the other models. It also showed very stable performance even when very few labeled data were available.
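The self-training idea mentioned in the abstract above can be sketched as follows. This is a minimal illustration, not the paper's models: it uses a depth-1 decision tree (a stump) as the tree learner, and the names `fit_stump`, `predict_stump`, `self_train`, and the `threshold` parameter are assumptions made for this example. CollectiveWoods is not implemented here.

```python
from collections import Counter


def fit_stump(X, y):
    """Fit a depth-1 decision tree by minimising training misclassifications."""
    best = None
    for f in range(len(X[0])):
        values = sorted({row[f] for row in X})
        # Candidate thresholds are midpoints between consecutive distinct values.
        for a, b in zip(values, values[1:]):
            t = (a + b) / 2
            left = [lab for row, lab in zip(X, y) if row[f] <= t]
            right = [lab for row, lab in zip(X, y) if row[f] > t]
            l_lab, l_cnt = Counter(left).most_common(1)[0]
            r_lab, r_cnt = Counter(right).most_common(1)[0]
            errors = len(y) - l_cnt - r_cnt
            if best is None or errors < best[0]:
                best = (errors, f, t, l_lab, l_cnt / len(left),
                        r_lab, r_cnt / len(right))
    if best is None:
        # No split is possible: always predict the majority label.
        lab, cnt = Counter(y).most_common(1)[0]
        return (0, float("inf"), lab, cnt / len(y), lab, cnt / len(y))
    return best[1:]


def predict_stump(stump, row):
    """Return (label, confidence); confidence is the leaf's class purity."""
    f, t, l_lab, l_conf, r_lab, r_conf = stump
    return (l_lab, l_conf) if row[f] <= t else (r_lab, r_conf)


def self_train(X_lab, y_lab, X_unlab, threshold=0.9, max_rounds=10):
    """Repeatedly pseudo-label confident unlabeled rows and refit the stump."""
    X_lab, y_lab, pool = list(X_lab), list(y_lab), list(X_unlab)
    stump = fit_stump(X_lab, y_lab)
    for _ in range(max_rounds):
        if not pool:
            break
        confident, rest = [], []
        for row in pool:
            label, conf = predict_stump(stump, row)
            (confident if conf >= threshold else rest).append((row, label))
        if not confident:
            break
        for row, label in confident:
            X_lab.append(row)
            y_lab.append(label)
        pool = [row for row, _ in rest]
        stump = fit_stump(X_lab, y_lab)
    return stump
```

With a single stump the leaf purity is a coarse confidence measure, so most unlabeled rows tend to be absorbed in the first round; a deeper tree or an ensemble would give a more graded confidence.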

A New Fast Training Algorithm for Vector Quantizer Design (벡터양자화기의 코드북을 구하는 새로운 고속 학습 알고리듬)

  • Lee, Dae-Ryong;Baek, Seong-Joon;Sung, Koeng-Mo
    • The Journal of the Acoustical Society of Korea
    • /
    • v.15 no.5
    • /
    • pp.107-112
    • /
    • 1996
  • In this paper, we propose a new fast codebook training algorithm that reduces the search time of the LBG algorithm. In the first iteration, the proposed algorithm stores, for each training vector, the indexes of the codewords that are close to it. It then reduces computation time by searching only the codewords whose indexes are stored for that training vector. Compared with FSLBG, one of the previous fast training algorithms, it obtains a better codebook in less execution time. In our experiment, the performance of the codebook generated by the proposed algorithm, in terms of peak signal-to-noise ratio (PSNR), is very close to that of the LBG algorithm. However, the number of codewords searched per training vector by the proposed algorithm is only about 6% of that of the LBG algorithm for a codebook size of 256, and about 1.6% for a codebook size of 1,024.
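The candidate-list idea described in this abstract can be sketched roughly as below. This is only an illustration under stated assumptions: the abstract does not say how "close" codewords are chosen, so the sketch simply keeps the `k_near` nearest codewords from the first full search. The names `fast_lbg`, `k_near`, `sq_dist`, and `centroid` are hypothetical.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def centroid(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def fast_lbg(train, codebook, k_near=2, iterations=10):
    """LBG-style codebook training with a restricted codeword search."""
    codebook = [list(c) for c in codebook]

    # First pass: full search, remembering the k_near closest codeword
    # indexes for every training vector (assumed selection rule).
    candidates = []
    for v in train:
        order = sorted(range(len(codebook)),
                       key=lambda i: sq_dist(v, codebook[i]))
        candidates.append(order[:k_near])

    for _ in range(iterations):
        # Restricted search: compare each vector only with its stored candidates.
        cells = [[] for _ in codebook]
        for v, cand in zip(train, candidates):
            nearest = min(cand, key=lambda i: sq_dist(v, codebook[i]))
            cells[nearest].append(v)
        # Centroid update; empty cells keep their previous codeword.
        for i, cell in enumerate(cells):
            if cell:
                codebook[i] = centroid(cell)
    return codebook
```

Each later iteration costs `k_near` distance computations per training vector instead of one per codeword, which is where the reported reduction in searched codewords would come from.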

Deep survey using deep learning: generative adversarial network

  • Park, Youngjun;Choi, Yun-Young;Moon, Yong-Jae;Park, Eunsu;Lim, Beomdu;Kim, Taeyoung
    • The Bulletin of The Korean Astronomical Society
    • /
    • v.44 no.2
    • /
    • pp.78.1-78.1
    • /
    • 2019
  • A huge number of faint objects have not been observed because large, deep surveys are lacking. In this study, we demonstrate that a deep learning approach can produce a better-quality deep image from single-pass imaging, so that it could be an alternative to the conventional image stacking technique or to expensive large and deep surveys. Using data from the Sloan Digital Sky Survey (SDSS) Stripe 82, which provides repeatedly scanned imaging data, a training data set is constructed: g-, r-, and i-band images of single-pass data as input and an r-band co-added image as the target. Out of 151 SDSS fields that have been repeatedly scanned 34 times, 120 fields were used for training and 31 fields for validation. The size of a frame selected for training is 1k by 1k pixels. To avoid possible problems caused by the small number of training sets, frames are randomly selected within that field at each training iteration. Every 5000 iterations, the performance was evaluated with RMSE, peak signal-to-noise ratio (PSNR), which is given on a logarithmic scale, the structural similarity index (SSIM), and the difference in SSIM. We continued training until the GAN model with the best performance was found. We apply the best GAN model to NGC0941, located in SDSS Stripe 82. By comparing the radial surface brightness and photometry error of the images, we found that this technique could possibly generate, from a single-pass image, a deep image with statistics close to those of the stacked image.
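Two of the evaluation metrics named in this abstract, RMSE and PSNR, can be computed as in the sketch below. It assumes images are given as flat lists of pixel values on a common scale and that `max_value` is the largest possible pixel value; these conventions and the function names are assumptions for illustration, not details from the study.

```python
import math


def rmse(pred, target):
    """Root-mean-square error between two equally sized pixel lists."""
    squared = [(p - t) ** 2 for p, t in zip(pred, target)]
    return math.sqrt(sum(squared) / len(squared))


def psnr(pred, target, max_value=1.0):
    """Peak signal-to-noise ratio in decibels (logarithmic scale)."""
    err = rmse(pred, target)
    if err == 0:
        return math.inf  # identical images
    return 20 * math.log10(max_value / err)
```

Because PSNR is a logarithm of the inverse error, lower RMSE gives higher PSNR, so the two metrics rank models in the same order for a fixed `max_value`.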

Response Modeling with Semi-Supervised Support Vector Regression (준지도 지지 벡터 회귀 모델을 이용한 반응 모델링)

  • Kim, Dong-Il
    • Journal of the Korea Society of Computer and Information
    • /
    • v.19 no.9
    • /
    • pp.125-139
    • /
    • 2014
  • In this paper, I propose a response modeling method based on a Semi-Supervised Support Vector Regression (SS-SVR) algorithm. To increase the accuracy and profit of response modeling, unlabeled data in the customer dataset are used together with the labeled data during training. The proposed SS-SVR algorithm is designed as a batch learning algorithm to reduce the training complexity. The label distributions of the unlabeled data are estimated in order to account for labeling uncertainty. Then, multiple training data are generated by oversampling from the unlabeled data and their estimated label distributions, and are combined with the labeled data to construct the training dataset. Finally, a data selection algorithm, Expected Margin based Pattern Selection (EMPS), is employed to reduce the training complexity. Experimental results on a real-world marketing dataset showed that the proposed response modeling method trained efficiently and improved both the accuracy and the expected profit.

Efficient Training Data Construction Scheme for Prediction of Transferring Students

  • Lee, Ji-Young;Song, Gyu-Moon;Kim, Tae-Yoon
    • Journal of the Korean Data and Information Science Society
    • /
    • v.14 no.3
    • /
    • pp.481-488
    • /
    • 2003
  • Kim et al. (2003) studied a prediction model for students likely to transfer. In their study, they claim that the training data construction scheme that trains a neural network on data from the year immediately before the prediction year is better than other schemes. One problem with their claim is that it is based on a rather high prediction error rate. In this paper, we establish a sounder comparison of various training data construction schemes and check the validity of their claim. It turns out that the favored scheme has sufficient advantages over the other schemes.

Classification of Class-Imbalanced Data: Effect of Over-sampling and Under-sampling of Training Data (계급불균형자료의 분류: 훈련표본 구성방법에 따른 효과)

  • 김지현;정종빈
    • The Korean Journal of Applied Statistics
    • /
    • v.17 no.3
    • /
    • pp.445-457
    • /
    • 2004
  • Given class-imbalanced data in a two-class classification problem, we often over-sample and/or under-sample the training data to make it balanced. We investigate the validity of this practice. We also study the effect of such sampling on boosting of classification trees. Experiments on twelve real datasets show that keeping the natural distribution of the training data is the best approach if you plan to apply boosting methods to class-imbalanced data.
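For readers unfamiliar with the two sampling practices examined in this abstract, a minimal sketch of random over-sampling and under-sampling follows. It is a generic illustration, not the procedure used in the study; the function name `resample` and its `mode` and `seed` parameters are assumptions for this example.

```python
import random
from collections import defaultdict


def resample(rows, labels, mode="over", seed=0):
    """Balance classes by random over-sampling or under-sampling."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for row, lab in zip(rows, labels):
        by_class[lab].append(row)

    sizes = [len(members) for members in by_class.values()]
    # Over-sampling grows every class to the largest class size;
    # under-sampling shrinks every class to the smallest class size.
    target = max(sizes) if mode == "over" else min(sizes)

    out_rows, out_labels = [], []
    for lab, members in by_class.items():
        if len(members) >= target:
            chosen = rng.sample(members, target)  # without replacement
        else:
            extra = rng.choices(members, k=target - len(members))
            chosen = members + extra  # duplicates drawn with replacement
        out_rows.extend(chosen)
        out_labels.extend([lab] * target)
    return out_rows, out_labels
```

Either mode changes the class prior seen by the learner, which is the effect the abstract reports as unhelpful when boosting is applied.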

Location-Based Military Simulation and Virtual Training Management System (위치인식 기반의 군사 시뮬레이션 및 가상훈련 관리 시스템)

  • Jeon, Hyun Min;Kim, Jae Wan
    • Journal of Korea Multimedia Society
    • /
    • v.20 no.1
    • /
    • pp.51-57
    • /
    • 2017
  • The purpose of this study is to design a system that can be used for military simulation and virtual training using the location information of individual soldiers' weapons. The system acquires location information using Arduino's GPS shield, transmits the data to a smartphone using a Bluetooth shield, and then transmits the data to the server in real time over the smartphone's 3G/4G connection. The server measures, analyzes, and manages the current position and tracking information of each soldier. The proposed system makes it easier to analyze the training situation of individual soldiers and is expected to yield better training results.

Video augmentation technique for human action recognition using genetic algorithm

  • Nida, Nudrat;Yousaf, Muhammad Haroon;Irtaza, Aun;Velastin, Sergio A.
    • ETRI Journal
    • /
    • v.44 no.2
    • /
    • pp.327-338
    • /
    • 2022
  • Classification models for human action recognition require robust features and large training sets for good generalization. Data augmentation methods are therefore employed on imbalanced training sets to achieve higher accuracy. However, samples generated by data augmentation only reflect existing samples within the training set; their feature representations are less diverse and hence contribute to less precise classification. This paper presents new data augmentation and action representation approaches to grow training sets. The proposed approach is based on two fundamental concepts: virtual video generation for augmentation and representation of the action videos through robust features. Virtual videos are generated from the motion history templates of action videos, which are convolved using a convolutional neural network to generate deep features. Furthermore, by observing the objective function of the genetic algorithm, the spatiotemporal features of different samples are combined to generate the representations of the virtual videos, which are then classified by an extreme learning machine classifier on the MuHAVi-Uncut, iXMAS, and IAVID-1 datasets.

Effect of Online Education on Training Effectiveness: Conceptual Framework and Empirical Validation (온라인 교육이 훈련교과성에 미치는 영향에 관한 실증적 연구)

  • Kim, Jeong-Wook;Nam, Ki-Chan
    • The Journal of Society for e-Business Studies
    • /
    • v.12 no.4
    • /
    • pp.185-209
    • /
    • 2007
  • The development of information technologies has made on-line training one of the important education methods. On-line training in firms, which is similar to e-learning or virtual education, provides trainees with more education opportunities in diverse ways. It has developed a range of innovative services with a one-stop solution for education within the electronic sector. In addition, in the on-line training environment, trainees can undertake customized training packages at any time and in any place. Moreover, information technology allows trainers and trainees to be decoupled in any of the elements of time, place, and space. Two research questions are investigated: what are the determinants affecting on-line training effectiveness, and how do those variables affect the two aspects of training effectiveness, namely learning performance and transfer performance. The determinants of training effectiveness are derived from the previous literature on the traditional training environment. Eight hypotheses are developed from literature reviews and tested with questionnaire survey data. The collected data were analyzed with LISREL. The relationships between individual, organizational, and on-line site design variables and training effectiveness (learning and transfer) are found to be significant. The contributions and limitations of this research are also discussed, along with directions for future studies.

Individual factors influencing the location decisions of practicing physicians (최근 배출된 전문의의 개원지역 선택에 영향을 미치는 개인요인 분석)

  • 김창엽;윤석준;이진석;김용익
    • Health Policy and Management
    • /
    • v.9 no.3
    • /
    • pp.21-32
    • /
    • 1999
  • The purpose of this study is to assess the individual factors that determine the distribution of medical specialists in Korea. A data set was constructed using several published data sources, including the Korean Medical Association's physician master file as the principal source of physician information. Linear logistic regression analysis was performed to assess the relationship between the location of the private specialist clinic and six variables related to individual characteristics: age, sex, location of postgraduate training hospital, location of the medical school attended, size of the training hospital, and specialty. The analysis showed that location of practice, classified into urban and rural areas, was significantly associated with sex, location of postgraduate training hospital, and location of medical school. In addition, a significant association was found between the location of practice, categorized into "near-Seoul area" and others, and sex, location of postgraduate training hospital, and location of medical school. We conclude that, to improve the geographic maldistribution of physicians, the locations of training hospitals and medical schools should have the highest priority in policymaking.
