• Title/Abstract/Keywords: Training and Validation Data

305 search results (processing time: 0.03 s)

Automatic Validation of the Geometric Quality of Crowdsourcing Drone Imagery

  • 이동호;최경아
    • Korean Journal of Remote Sensing / Vol. 39, No. 5_1 / pp. 577-587 / 2023
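  • Research on the use of crowdsourced spatial data is active, but uncertainty about data quality remains a recognized problem. In particular, when a drone image dataset contains low-quality data, the quality of the resulting spatial information can degrade. To address this, the study proposes a methodology that automatically validates the geometric quality of crowdsourced images. The input variables are the main quality factors: the spatial resolution of the image, the variation in resolution, the reprojection error of matched points, and the bundle adjustment results. To classify images suitable for generating spatial information, training and validation data were constructed and a support vector machine (SVM) model with a radial basis function (RBF) kernel was trained. The trained SVM model achieved a classification accuracy of 99.1%. To confirm the effect of the quality validation model, orthoimages were generated from drone images not used for training or validation, once from the dataset before applying the model and once from the dataset after applying it, and the two were compared. The comparison confirmed that applying the model reduces the various distortions that can appear in orthoimages and improves object discernibility. By taking crowdsourced data of varying quality as input and automatically selecting only the high-quality data, the proposed methodology is expected to broaden the use of such data in spatial information generation.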
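The RBF kernel behind the SVM mentioned above can be illustrated with a minimal sketch. The feature vectors and the `gamma` value below are hypothetical, chosen only for illustration; the abstract does not report the actual feature scaling or hyperparameters.

```python
import math

def rbf_kernel(x, y, gamma=0.5):
    """RBF kernel: exp(-gamma * ||x - y||^2). gamma is an assumed value."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)

# Hypothetical, already-scaled quality features per image:
# [spatial resolution, resolution variation, reprojection error, bundle adjustment flag]
img_a = [2.1, 0.3, 0.8, 1.0]
img_b = [2.0, 0.4, 0.9, 1.0]
img_c = [9.5, 4.0, 6.2, 0.0]

print(rbf_kernel(img_a, img_b))  # similar images give a value near 1
print(rbf_kernel(img_a, img_c))  # dissimilar images give a value near 0
```

An SVM classifies a new image from a weighted sum of such kernel values against its support vectors, so images with similar quality features tend to receive the same label.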

Application of Time-series Cross Validation in Hyperparameter Tuning of a Predictive Model for 2,3-BDO Distillation Process

  • 안나현;최영렬;조형태;김정환
    • Korean Chemical Engineering Research / Vol. 59, No. 4 / pp. 532-541 / 2021
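  • With growing interest in artificial intelligence, research applying it to chemical processes is increasing. However, AI-based models are often insufficiently generalized, so overfitting, in which prediction performance drops on new data not used for training, occurs frequently; cross-validation is one way to address it. In this study, time-series cross-validation was applied to tune two hyperparameters of a temperature prediction model for the 2,3-BDO separation process, the number of batches and the number of iterations, and was compared with the commonly used K-fold cross-validation. Compared with K-fold cross-validation, time-series cross-validation increased MAPE by 0.61% but decreased RMSE by 9.06% and required 198.29 s less training time.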
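The difference between the two validation schemes can be sketched as index splits. This sketch assumes an expanding-window form of time-series cross-validation; the abstract does not specify the exact variant used, and the function names are illustrative.

```python
def k_fold_splits(n, k):
    """Yield (train, val) index lists for contiguous K-fold splits (no shuffling)."""
    fold = n // k
    for i in range(k):
        start = i * fold
        stop = n if i == k - 1 else start + fold
        val = list(range(start, stop))
        train = [j for j in range(n) if j < start or j >= stop]
        yield train, val

def time_series_splits(n, k):
    """Yield (train, val) index lists where training data always precede validation data."""
    fold = n // (k + 1)
    for i in range(1, k + 1):
        train = list(range(0, i * fold))
        stop = n if i == k else (i + 1) * fold
        val = list(range(i * fold, stop))
        yield train, val

for train, val in time_series_splits(12, 3):
    print("train", train, "val", val)
```

In K-fold splitting, some folds train on samples recorded after the validation block, which leaks future information for time-ordered process data; the time-series variant avoids that by construction.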

Development of kNN QSAR Models for 3-Arylisoquinoline Antitumor Agents

  • Tropsha, Alexander;Golbraikh, Alexander;Cho, Won-Jea
    • Bulletin of the Korean Chemical Society / Vol. 32, No. 7 / pp. 2397-2404 / 2011
  • A variable-selection k-nearest-neighbor (kNN) QSAR modeling approach was applied to a data set of 80 3-arylisoquinolines exhibiting cytotoxicity against a human lung tumor cell line (A-549). All compounds were characterized with molecular topology descriptors calculated with the MolconnZ program. Seven compounds were randomly selected from the original dataset and used as an external validation set. The remaining subset of 73 compounds was divided into multiple training (56 to 61 compounds) and test (17 to 12 compounds) sets using a chemical diversity sampling method developed in this group. Highly predictive models were obtained, with leave-one-out cross-validated $R^2$ ($q^2$) values greater than 0.8 for the training sets and $R^2$ values greater than 0.7 for the test sets. The robustness of the models was confirmed by the Y-randomization test: all models built using training sets with randomly shuffled activities had low values, $q^2 \leq 0.26$ and $R^2 \leq 0.22$ for training and test sets, respectively. The twelve best models (those with the highest values of both $q^2$ and $R^2$) predicted the activities of the external validation set of seven compounds with $R^2$ ranging from 0.71 to 0.93.
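The leave-one-out $q^2$ statistic cited above can be sketched for a plain kNN regressor. This is a simplified illustration: it omits the variable-selection step, uses the overall mean activity in the denominator (one common convention), and runs on toy one-descriptor data rather than the MolconnZ descriptors.

```python
import math

def knn_predict(train_X, train_y, x, k=3):
    """Predict activity as the mean activity of the k nearest training compounds."""
    order = sorted(range(len(train_X)), key=lambda i: math.dist(train_X[i], x))
    nearest = order[:k]
    return sum(train_y[i] for i in nearest) / len(nearest)

def loo_q2(X, y, k=3):
    """Leave-one-out cross-validated R^2: 1 - PRESS / total sum of squares."""
    mean_y = sum(y) / len(y)
    press = 0.0
    for i in range(len(X)):
        rest_X = X[:i] + X[i + 1:]   # hold out compound i
        rest_y = y[:i] + y[i + 1:]
        press += (y[i] - knn_predict(rest_X, rest_y, X[i], k)) ** 2
    total = sum((v - mean_y) ** 2 for v in y)
    return 1.0 - press / total

# Toy data: one descriptor, activity linear in the descriptor.
X = [[float(i)] for i in range(10)]
y = [2.0 * i for i in range(10)]
print(loo_q2(X, y, k=2))
```

Repeating the calculation with the activities permuted (Y-randomization) should give a much lower $q^2$, which is the robustness check the abstract describes.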

Classification of Gripping Movement in Daily Life Using EMG-based Spider Chart and Deep Learning

  • 이성문;피승훈;한승호;조용운;오도창
    • Korean Society of Medical and Biological Engineering: Journal of Biomedical Engineering Research / Vol. 43, No. 5 / pp. 299-307 / 2022
  • In this paper, we propose a pre-processing method that converts EMG (electromyography) sensor data into Spider Chart images for classifying gripping movements with a convolutional neural network (CNN). First, raw data for six hand gestures are collected from five test subjects using an 8-channel armband and converted into octagonal Spider Chart data, which are divided into several sliding windows and used for training. For the six-gesture classification task, the performance of the proposed pre-processing method is compared with that of existing methods. Deep learning was performed with 70% of the dataset used for training, 15% for testing, and 15% for validation. For system performance evaluation, five-fold cross-validation was applied, with 80% of the entire dataset used for training and 20% for testing. With Spider Chart pre-processing, the proposed method achieves 97% and 94.54% in the cross-validation and general tests, respectively, which is better than the conventional methods.
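The conversion of one 8-channel EMG sample into an octagonal Spider Chart can be sketched as follows. The axis ordering, the equal angular spacing, and the sample amplitudes are assumptions for illustration; the abstract does not describe the normalization or rendering details.

```python
import math

def spider_chart_vertices(channels):
    """Map N channel amplitudes to N polygon vertices, one axis per channel."""
    n = len(channels)
    vertices = []
    for k, amplitude in enumerate(channels):
        angle = 2.0 * math.pi * k / n          # axes evenly spaced around the circle
        vertices.append((amplitude * math.cos(angle),
                         amplitude * math.sin(angle)))
    return vertices

# Hypothetical normalized amplitudes from an 8-channel armband.
sample = [0.2, 0.8, 0.5, 0.1, 0.3, 0.9, 0.4, 0.6]
for x, y in spider_chart_vertices(sample):
    print(round(x, 3), round(y, 3))
```

Connecting the eight vertices in order gives the octagonal outline that would be rasterized into an image for the CNN.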

CT-Based Radiomics Signature for Preoperative Prediction of Coagulative Necrosis in Clear Cell Renal Cell Carcinoma

  • Kai Xu;Lin Liu;Wenhui Li;Xiaoqing Sun;Tongxu Shen;Feng Pan;Yuqing Jiang;Yan Guo;Lei Ding;Mengchao Zhang
    • Korean Journal of Radiology / Vol. 21, No. 6 / pp. 670-683 / 2020
  • Objective: The presence of coagulative necrosis (CN) in clear cell renal cell carcinoma (ccRCC) indicates a poor prognosis, while the absence of CN indicates a good prognosis. The purpose of this study was to build and validate a radiomics signature based on preoperative CT imaging data to estimate CN status in ccRCC. Materials and Methods: Altogether, 105 patients with pathologically confirmed ccRCC were retrospectively enrolled in this study and then divided into training (n = 72) and validation (n = 33) sets. Thereafter, 385 radiomics features were extracted from the three-dimensional volumes of interest of each tumor, and 10 traditional features were assessed by two experienced radiologists using triple-phase CT-enhanced images. A multivariate logistic regression algorithm was used to build the radiomics score and traditional predictors in the training set, and their performance was assessed and then tested in the validation set. The radiomics signature to distinguish CN status was then developed by incorporating the radiomics score and the selected traditional predictors. The receiver operating characteristic (ROC) curve was plotted to evaluate the predictive performance. Results: The area under the ROC curve (AUC) of the radiomics score, which consisted of 7 radiomics features, was 0.855 in the training set and 0.885 in the validation set. The AUC of the traditional predictor, which consisted of 2 traditional features, was 0.843 in the training set and 0.858 in the validation set. The radiomics signature showed the best performance with an AUC of 0.942 in the training set, which was then confirmed with an AUC of 0.969 in the validation set. Conclusion: The CT-based radiomics signature that incorporated radiomics and traditional features has the potential to be used as a non-invasive tool for preoperative prediction of CN in ccRCC.
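The AUC values reported above can be interpreted through a minimal sketch: AUC equals the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, with ties counted as one half. The labels and scores below are made up for illustration and are not taken from the study.

```python
def roc_auc(labels, scores):
    """AUC via pairwise comparison of positive and negative scores (ties count 0.5)."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))

# Hypothetical radiomics scores: label 1 = necrosis present, 0 = absent.
labels = [1, 1, 0, 0]
scores = [0.9, 0.4, 0.5, 0.1]
print(roc_auc(labels, scores))  # 0.75
```

The pairwise form is quadratic in the number of cases, which is fine for cohorts of about a hundred patients such as the one described here.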

Detecting Errors in POS-Tagged Corpus on XGBoost and Cross Validation

  • 최민석;김창현;박호민;천민아;윤호;남궁영;김재균;김재훈
    • KIPS Transactions on Software and Data Engineering / Vol. 9, No. 7 / pp. 221-228 / 2020
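  • A POS-tagged corpus is a corpus annotated with part-of-speech information, and it is used as training data for many tasks in natural language processing. Training corpora are generally assumed to be error-free, but in practice they contain various errors, and these errors degrade the performance of systems trained on them. To mitigate this problem, this paper proposes a method that uses XGBoost and cross-validation to detect errors in an existing POS-tagged corpus. The method first trains a POS tagger with XGBoost on the error-containing corpus and then detects POS errors through cross-validation. However, because no training corpus annotated with errors exists, errors cannot be detected with an ordinary classifier. The paper therefore detects errors by comparing the outputs of POS taggers trained under varying parameters. To tune the parameters, it uses a small error-annotated corpus, which was randomly sampled from the whole corpus targeted for error detection and annotated with errors by experts. Precision and recall, which are widely used in information retrieval, serve as the evaluation measures. In addition, because not every error candidate in the population can be checked by hand, the validity of the approach is shown by comparing the error distributions of the sample and the population. The authors plan to apply the method to dependency-annotated corpora and semantic-role-annotated corpora.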
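The evaluation step described above can be sketched in a few lines. The disagreement rule below (flag a token when a held-out tagger's prediction differs from the corpus tag) is a simplification of the paper's parameter-based comparison, and the tags and error positions are hypothetical.

```python
def flag_disagreements(corpus_tags, predicted_tags):
    """Return token positions where the held-out prediction differs from the corpus tag."""
    return [i for i, (c, p) in enumerate(zip(corpus_tags, predicted_tags)) if c != p]

def precision_recall(flagged, true_errors):
    """Precision and recall of flagged positions against expert-annotated errors."""
    flagged, true_errors = set(flagged), set(true_errors)
    hits = len(flagged & true_errors)
    precision = hits / len(flagged) if flagged else 0.0
    recall = hits / len(true_errors) if true_errors else 0.0
    return precision, recall

# Hypothetical example: tags in the corpus vs. tags predicted for a held-out fold.
corpus_tags = ["NNG", "JKS", "VV", "EF", "NNG"]
predicted_tags = ["NNG", "JKS", "VA", "EF", "NNP"]
expert_errors = [2, 3]  # positions experts marked as wrong in the corpus

flagged = flag_disagreements(corpus_tags, predicted_tags)
print(flagged, precision_recall(flagged, expert_errors))
```

Precision drops when the tagger flags correct annotations, and recall drops when it reproduces an annotation error, which is why both measures are reported.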

Development of Land fog Detection Algorithm based on the Optical and Textural Properties of Fog using COMS Data

  • Suh, Myoung-Seok;Lee, Seung-Ju;Kim, So-Hyeong;Han, Ji-Hye;Seo, Eun-Kyoung
    • Korean Journal of Remote Sensing / Vol. 33, No. 4 / pp. 359-375 / 2017
  • We developed a fog detection algorithm (KNU_FDA) based on the optical and textural properties of fog using satellite (COMS) and ground observation data. The optical properties are the dual channel difference (DCD: BT3.7 - BT11) and albedo, and the textural properties are the normalized local standard deviations of the IR1 and visible channels. The difference between air temperature and BT11 is applied to discriminate fog from other clouds. Because the availability of satellite data differs between day, night, and dawn/dusk, fog detection is performed according to the solar zenith angle of each pixel. Post-processing is also performed to increase the probability of detection (POD), in particular at the edge of the main fog area. The fog probability is calculated as the weighted sum of threshold tests. The initial threshold and weighting values are optimized through sensitivity tests over varying threshold values using receiver operating characteristic analysis. Validation against ground visibility data for the validation cases showed that KNU_FDA has relatively consistent detection skill, but that its performance clearly depends on fog type and time of day. The average POD and FAR (False Alarm Ratio) for the training and validation cases range from 0.76 to 0.90 and from 0.41 to 0.63, respectively. In general, the performance is relatively good for strong fog and for fog without high cloud, but it decreases significantly for weak fog. To improve detection skill and stability, the threshold and weighting values need to be optimized using a wider variety of training cases.
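The weighted-sum fog probability and the POD/FAR scores named above can be sketched as follows. The weights, the normalization by total weight, and the sample detections are assumptions for illustration; the standard definitions POD = hits / (hits + misses) and FAR = false alarms / (hits + false alarms) are used.

```python
def fog_probability(test_results, weights):
    """Weighted sum of binary threshold-test outcomes, normalized by total weight (assumed)."""
    passed = sum(w for ok, w in zip(test_results, weights) if ok)
    return passed / sum(weights)

def pod_far(detected, observed):
    """Probability of detection and false alarm ratio from paired binary series."""
    hits = sum(1 for d, o in zip(detected, observed) if d and o)
    misses = sum(1 for d, o in zip(detected, observed) if not d and o)
    false_alarms = sum(1 for d, o in zip(detected, observed) if d and not o)
    pod = hits / (hits + misses) if hits + misses else 0.0
    far = false_alarms / (hits + false_alarms) if hits + false_alarms else 0.0
    return pod, far

# Hypothetical pixel: DCD and texture tests pass, albedo test fails.
print(fog_probability([True, False, True], [0.5, 0.3, 0.2]))

# Hypothetical satellite detections vs. ground visibility observations (1 = fog).
detected = [1, 1, 1, 0, 0, 1]
observed = [1, 1, 0, 1, 0, 0]
print(pod_far(detected, observed))
```

A pixel would be labeled as fog when its probability exceeds a chosen cutoff, and lowering that cutoff generally raises both POD and FAR.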

Estimation of Optimal Training Period for the Deep-Learning LSTM Model to Forecast CMIP5-based Streamflow

  • 천범석;이태화;김상우;임경재;정영훈;도종원;신용철
    • Journal of the Korean Society of Agricultural Engineers / Vol. 64, No. 1 / pp. 39-50 / 2022
  • In this study, we suggest the optimal training period for predicting streamflow using a deep-learning LSTM (Long Short-Term Memory) model and CMIP5 (the fifth phase of the Coupled Model Intercomparison Project) future climate scenarios. To validate the performance of the LSTM model, the Jinan-gun (Seongsan-ri) site was selected. We confirmed that the LSTM-based streamflow was highly comparable to the measurements during the calibration (2000 to 2002/2014 to 2015) and validation (2003 to 2005/2016 to 2017) periods. Additionally, we compared the LSTM-based streamflow to the SWAT-based output during the calibration (2000~2015) and validation (2016~2019) periods. The results indicated that the LSTM model also performed well in simulating streamflow over the long-term period, although small uncertainties exist. The SWAT-based daily streamflow was then forecast using the CMIP5 climate scenario forcing data for 2011~2100. We tested and determined the optimal training period for the LSTM model by comparing the LSTM- and SWAT-based streamflow under various scenarios. Note that the SWAT-based streamflow values were treated as observations because no measurements exist for the future period (2011~2100). Our results showed that the LSTM-based streamflow was similar to the SWAT-based streamflow when more than 30 years of training data were used. These findings indicate that training periods longer than 30 years are required to obtain reliable LSTM-based streamflow forecasts under climate change scenarios.

The Prediction Ability of Genomic Selection in the Wheat Core Collection

  • Yuna Kang;Changsoo Kim
    • Korean Society of Crop Science: Conference Proceedings / 2022 Autumn Conference of the Korean Society of Crop Science / p. 235 / 2022
  • Genomic selection is a promising tool for plant and animal breeding that uses genome-wide molecular marker data to capture large- and small-effect quantitative trait loci and predict the genetic value of selection candidates. Genomic selection has previously been shown to give higher prediction accuracies than conventional marker-assisted selection (MAS) for quantitative traits. In this study, the prediction accuracy of 10 agricultural traits was compared in the wheat core group with 567 points. We used a cross-validation approach to train and validate prediction accuracy in order to evaluate the effects of training population size and training model. Regarding the model, a prediction accuracy of 0.4 or more was obtained for almost all traits with every one of the 6 models used (GBLUP, LASSO, BayesA, RKHS, SVN, RF) except the SVN model. For traits such as days to heading and days to maturity, the prediction accuracy was very high, over 0.8. Regarding the training group, the prediction accuracy increased with the size of the training group for all traits. The prediction accuracy also differed according to the genetic composition of the training population, regardless of its size. All training models were verified through 5-fold cross-validation. To verify the prediction ability of the wheat core collection as a training population, we compared the actual phenotypes and the genomic estimated breeding values using a breeding population of 35. Of the 10 individuals with the fastest days to heading, 5 were selected through genomic selection, and of the 10 individuals with the slowest days to heading, 6 were selected through genomic selection. We therefore confirmed that genomic selection makes it possible to select individuals for a given trait from the genotype alone, in a shorter period of time.
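The final comparison above (how many of the 10 earliest or latest heading individuals are also picked by genomic selection) amounts to a top-k overlap count. The sketch below uses made-up values; `observed` stands for measured days to heading and `predicted` for genomic estimated breeding values, both hypothetical.

```python
def top_k_overlap(observed, predicted, k=10, smallest=True):
    """Count individuals that appear in both the observed and predicted top-k sets."""
    idx = range(len(observed))
    obs_top = sorted(idx, key=lambda i: observed[i], reverse=not smallest)[:k]
    pred_top = sorted(idx, key=lambda i: predicted[i], reverse=not smallest)[:k]
    return len(set(obs_top) & set(pred_top))

# Hypothetical days-to-heading values for six individuals.
observed = [1, 2, 3, 4, 5, 6]
predicted = [1, 2, 6, 5, 4, 3]

print(top_k_overlap(observed, predicted, k=3, smallest=True))   # earliest heading
print(top_k_overlap(observed, predicted, k=2, smallest=False))  # latest heading
```

With 35 individuals and k = 10, an overlap of 5 or 6 as reported in the abstract would correspond to this function returning 5 or 6.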

Machine Learning Approach to Estimation of Stellar Atmospheric Parameters

  • Han, Jong Heon;Lee, Young Sun;Kim, Young kwang
    • The Bulletin of the Korean Astronomical Society / Vol. 41, No. 2 / p. 54.2 / 2016
  • We present a machine learning approach to estimating the stellar atmospheric parameters effective temperature (Teff), surface gravity (log g), and metallicity ([Fe/H]) for stars observed during the course of the Sloan Digital Sky Survey (SDSS). To train a neural network, we randomly sampled the SDSS data with stellar parameters available from the SEGUE Stellar Parameter Pipeline (SSPP) so as to cover the parameter space as widely as possible. We selected stars not included in the training sample as a validation sample to determine the accuracy and precision of each parameter. We also divided the training and validation samples into four groups covering signal-to-noise ratios (S/N) of 10-20, 20-30, 30-50, and over 50 to assess the effect of S/N on the parameter estimation. Comparing the network-derived parameters with the SSPP ones, we find uncertainties of 73~123 K in Teff, 0.18~0.42 dex in log g, and 0.12~0.25 dex in [Fe/H], depending on the S/N range adopted. We conclude that these precisions are high enough to study the chemical and kinematic properties of Galactic disk and halo stars, and we will attempt to apply this technique to the Large Sky Area Multi-Object Fiber Spectroscopic Telescope (LAMOST), which plans to obtain about 8 million stellar spectra, in order to estimate stellar parameters.
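The per-S/N comparison described above can be sketched by grouping residuals into the four S/N ranges and summarizing each group. Using the root mean square of (network value minus SSPP value) as the uncertainty measure is an assumption, since the abstract does not name the statistic, and the sample values are invented.

```python
import math

# S/N ranges from the abstract: 10-20, 20-30, 30-50, and over 50.
SN_BINS = [(10, 20), (20, 30), (30, 50), (50, float("inf"))]

def rms_by_sn(sn, predicted, reference):
    """RMS of (predicted - reference) in each S/N bin; empty bins are omitted."""
    residuals = {b: [] for b in SN_BINS}
    for s, p, r in zip(sn, predicted, reference):
        for lo, hi in SN_BINS:
            if lo <= s < hi:
                residuals[(lo, hi)].append(p - r)
                break
    return {b: math.sqrt(sum(v * v for v in vals) / len(vals))
            for b, vals in residuals.items() if vals}

# Hypothetical Teff values (K): network estimates vs. SSPP reference.
sn = [12, 15, 25, 60]
teff_network = [5000, 5100, 6000, 5500]
teff_sspp = [5100, 5000, 6000, 5450]
print(rms_by_sn(sn, teff_network, teff_sspp))
```

Stars with S/N below 10 fall outside every bin and are ignored, which matches the ranges listed in the abstract.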
