• Title/Abstract/Keywords: training sampling

Search results: 372 items (processing time: 0.024 s)

Supervised Classification Using Training Parameters and Prior Probability Generated from VITD - The Case of QuickBird Multispectral Imagery

  • Eo, Yang-Dam;Lee, Gyeong-Wook;Park, Doo-Youl;Park, Wang-Yong;Lee, Chang-No
    • 대한원격탐사학회지 / Vol. 24, No. 5 / pp. 517-524 / 2008
  • To classify satellite imagery into geospatial features of interest, a supervised classifier must be trained to distinguish these features through training sampling. However, even for the same image, classification results can differ according to the operator's experience and expertise in the training process. Users who apply classification results in practice need research that delivers consistent results as well as improved accuracy. The experiment compares classification results for training processes that use VITD polygons as a prior probability and as training parameters, instead of manual sampling. Among the methods tested, classification using VITD polygons as prior probabilities achieved the highest accuracy. Training by unsupervised classification with VITD produced classification results similar to those of manual training and/or training with prior probability.

Study on the Effect of Training Data Sampling Strategy on the Accuracy of the Landslide Susceptibility Analysis Using Random Forest Method

  • 강경희;박혁진
    • 자원환경지질 / Vol. 52, No. 2 / pp. 199-212 / 2019
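  • In analyses using machine learning techniques, the sampling strategy for training data strongly affects not only prediction accuracy but also generalization ability. In landslide susceptibility analysis in particular, data imbalance arises because information on non-landslide areas far outweighs information on landslide areas, so a data sampling step is essential when designing the training data for the analysis model. However, most previous studies simply selected non-landslide samples at random so that they had a 1:1 ratio with the landslide samples, and did not carry out the analysis according to specific selection criteria. Therefore, to examine the effect of the training data sampling strategy on the model's predictive performance, this study built six different scenarios according to the sampling criteria for landslide and non-landslide areas and used them to train a Random Forest model. In addition, the variable importance, one of the outputs of Random Forest, was multiplied as a weight by each landslide-causing factor to compute a landslide susceptibility index; the index values were used to produce landslide susceptibility maps, and the accuracy of each resulting map was compared. The results showed that, regardless of the sampling method for the training data, the landslide susceptibility analyses for both study areas achieved accuracies of 70 to 80%. This confirmed the applicability of the Random Forest technique to landslide susceptibility analysis and showed that the input variable importance provided by the Random Forest model can be used as weights for landslide-causing factors. Comparing the accuracy across training scenarios also showed that designing the training data according to specific criteria can be expected to give higher prediction accuracy than the conventional random selection method.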
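
The weighting step described in this abstract, where variable importances are multiplied by each landslide-causing factor to obtain a susceptibility index, might look roughly like the sketch below. It is an illustration only, not the authors' code: the function name `susceptibility_index`, the factor names, and all numeric values are assumptions introduced here, and the factor scores are assumed to be already normalized.

```python
def susceptibility_index(factor_scores, importances):
    """Weighted sum of factor scores, using variable importances as weights."""
    missing = set(factor_scores) - set(importances)
    if missing:
        raise KeyError(f"no importance given for: {sorted(missing)}")
    return sum(importances[name] * score for name, score in factor_scores.items())

# Hypothetical importances (e.g. taken from a fitted Random Forest model)
importances = {"slope": 0.5, "soil_depth": 0.3, "forest_type": 0.2}
# Hypothetical normalized factor scores for one map cell
cell = {"slope": 0.8, "soil_depth": 0.4, "forest_type": 0.1}
index = susceptibility_index(cell, importances)
```

Applying the same function to every cell would give the values from which a susceptibility map could be drawn; how the study rescaled or classified those values is not stated in the abstract.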

Classification of Class-Imbalanced Data: Effect of Over-sampling and Under-sampling of Training Data

  • 김지현;정종빈
    • 응용통계연구 / Vol. 17, No. 3 / pp. 445-457 / 2004
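  • When analyzing two-class classification data in which the numbers of observations in the two classes are severely imbalanced, it is common to artificially make the two class sizes similar before the analysis. This study examined the validity of this way of constructing the training sample. It also examined the effect of the training sample construction method on boosting. Experiments on 12 real datasets led to the conclusion that, when boosting is applied with tree models, it is better to analyze the training sample as it is.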
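
The two ways of equalizing class sizes named in the title, under-sampling and over-sampling, can be sketched as follows. This is a generic illustration under stated assumptions, not the procedure used in the study: the function names, the fixed seed, and the toy class sizes are all hypothetical, and the majority class is assumed to be at least as large as the minority class.

```python
import random

def undersample(majority, minority, rng):
    """Shrink the majority class to the minority size (without replacement)."""
    return rng.sample(majority, len(minority)), list(minority)

def oversample(majority, minority, rng):
    """Grow the minority class to the majority size (with replacement)."""
    extra = rng.choices(minority, k=len(majority) - len(minority))
    return list(majority), list(minority) + extra

rng = random.Random(0)           # fixed seed so the draws are repeatable
majority = list(range(90))       # hypothetical majority-class cases
minority = list(range(90, 100))  # hypothetical minority-class cases
under_major, under_minor = undersample(majority, minority, rng)
over_major, over_minor = oversample(majority, minority, rng)
```

Under-sampling discards majority cases, while over-sampling repeats minority cases; the abstract's finding is that for boosted tree models neither adjustment was preferable to leaving the training sample unchanged.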

A Study on the Survey of Vocational Training Teachers and Instructors through Institutional Panel Sampling Design

  • 정혜경;정일찬;이진구
    • 실천공학교육논문지 / Vol. 13, No. 2 / pp. 393-403 / 2021
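  • The purpose of this study is to lay the groundwork for data-driven decision making with vocational training teachers and instructors as the population, by proposing a panel survey sampling design at the level of vocational training institutions and thereby providing a basis for continuous, systematic surveys of training teachers and instructors. To this end, the study proposed the target population and the sampling frame, which are elements of a systematic survey design, and, based on expert consultation and analysis of empirical data, presented the sampling unit and a sampling method that takes into account outer and inner stratification variables, considering data representativeness, efficiency of data collection, and sustainability. As a result, with the vocational training institution as the panel unit, a two-stage stratified proportional sampling scheme was devised so that the institutions selected for the panel and the training teachers and instructors belonging to them can take part in the survey, and implications for panel survey sample design were drawn on this basis.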
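
Proportional allocation, the core of a stratified proportional design, can be sketched as below for the first stage only (choosing how many institutions to draw per stratum). The strata, their sizes, the sample size, and the function name `proportional_allocation` are hypothetical; the abstract does not give the actual stratification variables or counts, and the largest-remainder rounding used here is an assumption.

```python
def proportional_allocation(stratum_sizes, total_sample):
    """Allocate a sample across strata in proportion to stratum size."""
    population = sum(stratum_sizes.values())
    exact = {k: total_sample * n / population for k, n in stratum_sizes.items()}
    alloc = {k: int(v) for k, v in exact.items()}  # round down first
    leftover = total_sample - sum(alloc.values())
    # Hand the remaining units to the strata with the largest fractional parts
    by_remainder = sorted(exact, key=lambda k: exact[k] - alloc[k], reverse=True)
    for k in by_remainder[:leftover]:
        alloc[k] += 1
    return alloc

# Hypothetical strata: number of training institutions per region
strata = {"capital": 500, "metropolitan": 300, "other": 200}
allocation = proportional_allocation(strata, 50)
```

In a two-stage design, the same kind of allocation could be repeated inside each selected institution to choose teachers and instructors.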

Optimal SVM learning method based on adaptive sparse sampling and granularity shift factor

  • Wen, Hui;Jia, Dongshun;Liu, Zhiqiang;Xu, Hang;Hao, Guangtao
    • KSII Transactions on Internet and Information Systems (TIIS) / Vol. 16, No. 4 / pp. 1110-1127 / 2022
  • To improve the training efficiency and generalization performance of a support vector machine (SVM) on large-scale datasets, an optimal SVM learning method based on adaptive sparse sampling and the granularity shift factor is presented. The proposed method combines sampling optimization with learner optimization. First, an adaptive sparse sampling method based on potential function density clustering is designed to adaptively obtain a sparse sample set, which reduces the training sample set while effectively approximating the spatial structure distribution of the original sample set. A granularity shift factor method is then constructed to optimize the SVM decision hyperplane, fully considering the neighborhood information of each granularity region in the sparse sample set. Experiments on an artificial dataset and three benchmark datasets show that the proposed method achieves relatively higher training efficiency while ensuring good generalization performance of the learner.

The Effectiveness of the Training Program at HCL

  • Kumari, Neeraj
    • Asian Journal of Business Environment / Vol. 5, No. 3 / pp. 23-28 / 2015
  • Purpose - The aim of this study is to evaluate the effectiveness of a corporate training program. The case study of HCL Technologies was used to investigate how training programs improve the performance of employees on the job, as well as to identify unnecessary aspects of the training so that they can be eliminated from future training programs. Research design, data, and methodology - An exploratory research design was used to conduct the study. The research sample included 50 HCL employees. The sampling technique for the data collection was convenience sampling. Results - Training is a crucial process in an organization and thus needs to be well designed. Specifically, the training programs should provide adequate knowledge to all employees, ensure correct methods are used for the selection of trainees, and avoid any perception of bias. Conclusions - Employees were not fully satisfied by the separation of the training program into two parts, on-the-job and off-the-job training, but if sufficient data is provided to employees in advance, this could help them during the training process.

A Study on Incremental Learning Model for Naive Bayes Text Classifier

  • 김제욱;김한준;이상구
    • 정보기술과데이타베이스저널 / Vol. 8, No. 1 / pp. 95-104 / 2001
  • In the text classification domain, labeling the training documents is an expensive process because it requires human expertise and is a tedious, time-consuming task. Therefore, it is important to reduce the manual labeling of training documents while improving the text classifier. Selective sampling, a form of active learning, reduces the number of training documents that need to be labeled by examining the unlabeled documents and selecting the most informative ones for manual labeling. We apply this methodology to Naive Bayes, a classifier known to be successful in text classification. One of the most important issues in selective sampling is determining the criterion for selecting training documents from the large pool of unlabeled documents. In this paper, we propose two measures for this criterion: the Mean Absolute Deviation (MAD) and the entropy measure. The experimental results, using the Reuters-21578 corpus, show that the proposed learning method improves the Naive Bayes text classifier more than the existing ones.
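
Entropy-based selection of documents for labeling can be sketched as below. This is generic uncertainty sampling, shown only to illustrate the idea; the abstract does not give the exact form of the paper's entropy or MAD measures, and the function names, document IDs, and posterior values here are hypothetical.

```python
import math

def entropy(probs):
    """Shannon entropy (in bits) of a class-probability distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def select_most_uncertain(posteriors, k):
    """Return the IDs of the k documents with the highest predictive entropy."""
    ranked = sorted(posteriors, key=lambda doc: entropy(posteriors[doc]), reverse=True)
    return ranked[:k]

# Hypothetical class posteriors from a classifier over unlabeled documents
posteriors = {
    "doc_a": [0.98, 0.02],
    "doc_b": [0.55, 0.45],
    "doc_c": [0.80, 0.20],
}
to_label = select_most_uncertain(posteriors, 2)
```

Documents whose posterior is close to uniform have the highest entropy, so they are the ones passed to a human annotator first.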

Effectiveness of E-Training, E-Leadership, and Work Life Balance on Employee Performance during COVID-19

  • WOLOR, Christian Wiradendi;SOLIKHAH, Solikhah;FIDHYALLAH, Nadya Fadillah;LESTARI, Deniar Puji
    • The Journal of Asian Finance, Economics and Business / Vol. 7, No. 10 / pp. 443-450 / 2020
  • This study aims to add insight into the effectiveness of e-training, e-leadership, work-life balance, and work motivation on the performance of millennial generation employees amid the COVID-19 pandemic, which requires more online work. Unlike previous generations, millennials are technology-literate, intent on succeeding quickly, give up easily, and seek instantaneous gratification. The population in this study is millennial generation employees at a Honda motorcycle dealer in Jakarta, Indonesia. The number of samples collected was 200. The sampling technique used is probability sampling, specifically proportional random sampling. The research method is an associative quantitative approach using survey methods and Structural Equation Modeling. Data were collected through questionnaires distributed to millennial generation employees, and the results were then processed with the Lisrel 8.5 program. The results show, first, that e-training, e-leadership, and work-life balance have a positive effect on work motivation. Second, e-training, e-leadership, work-life balance, and work motivation have a positive effect on employees' performance. The findings indicate that companies must pay attention to e-training, e-leadership, and work-life balance to keep employees motivated and to maintain optimal employee performance, especially when working online during the COVID-19 pandemic.

Development of Online Machine Learning Model for AHU Supply Air Temperature Prediction using Progressive Sampling and Normalized Mutual Information

  • 추한경;신한솔;안기언;라선중;박철수
    • 대한건축학회논문집:구조계 / Vol. 34, No. 6 / pp. 63-69 / 2018
  • A machine learning model can capture the dynamics of building systems with fewer inputs than a first-principles-based simulation model. The training data for developing a machine learning model are usually selected in a heuristic manner. In this study, the authors developed a machine learning model that describes the supply air temperature from an AHU in a real office building. For rational reduction of the training data, the progressive sampling method was used. Even though progressive sampling requires far less training data (n=60) than offline regular sampling (n=1,799), the MBEs of both models are similar (2.6% vs. 5.4%). In addition, the normalized mutual information (NMI) was applied to decide when to update the machine learning model. If the NMI between the simulation output and the measured data is less than 0.2, the model has to be updated. With the NMI-based update, the model predicts better (5.4% → 1.3%).
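
The update rule in this abstract (retrain when the NMI between model output and measurements falls below 0.2) can be sketched as follows. Only the 0.2 threshold comes from the abstract. The function names, the geometric-mean normalization, and the assumption that both series have already been discretized into bin labels are choices made here for illustration; the paper may define NMI differently.

```python
import math
from collections import Counter

def _entropy(counts, n):
    """Shannon entropy (in nats) from a Counter of label frequencies."""
    return -sum(c / n * math.log(c / n) for c in counts.values())

def normalized_mutual_information(x_bins, y_bins):
    """NMI of two equally long sequences of discrete bin labels."""
    n = len(x_bins)
    hx = _entropy(Counter(x_bins), n)
    hy = _entropy(Counter(y_bins), n)
    hxy = _entropy(Counter(zip(x_bins, y_bins)), n)
    if hx == 0 or hy == 0:
        return 0.0  # a constant series carries no information
    mutual_info = hx + hy - hxy
    return mutual_info / math.sqrt(hx * hy)  # assumed normalization

def needs_update(predicted_bins, measured_bins, threshold=0.2):
    """Flag the model for retraining when NMI drops below the threshold."""
    return normalized_mutual_information(predicted_bins, measured_bins) < threshold
```

For example, `needs_update([0, 0, 1, 1], [0, 1, 0, 1])` flags an update because the two label sequences share almost no information, whereas two identical sequences give an NMI of about 1 and no update.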

Feedwater Flowrate Estimation Based on the Two-step De-noising Using the Wavelet Analysis and an Autoassociative Neural Network

  • Gyunyoung Heo;Park, Seong-Soo;Chang, Soon-Heung
    • Nuclear Engineering and Technology / Vol. 31, No. 2 / pp. 192-201 / 1999
  • This paper proposes an improved signal processing strategy for accurate feedwater flowrate estimation in nuclear power plants. It is generally known that thermal power errors of about 2% occur due to fouling phenomena in feedwater flowmeters. In the proposed strategy, the noise in the feedwater flowrate signal is classified into rapidly varying noise and gradually varying noise according to its characteristics in the frequency domain. The estimation precision is enhanced by introducing a low-pass filter based on wavelet analysis against rapidly varying noise, and an autoassociative neural network that corrects only gradually varying noise. A modified multivariate stratification sampling method using the concept of time stratification and the MAXIMIN criteria is developed to overcome the shortcomings of general random sampling. In addition, a multi-stage robust training method is developed to increase the quality and reliability of the training signals. Validations were carried out using simulated data from a micro-simulator. In the validation tests, the proposed methodology removed rapidly varying and gradually varying noise in the respective de-noising steps, and the root mean square error of the initial noisy signals decreased from 5.54% to 0.674% after de-noising. These results indicate that the reactor thermal power can be estimated more accurately by adopting this strategy.
