• Title/Summary/Keyword: Validation data set


Prediction of Tumor Progression During Neoadjuvant Chemotherapy and Survival Outcome in Patients With Triple-Negative Breast Cancer

  • Heera Yoen;Soo-Yeon Kim;Dae-Won Lee;Han-Byoel Lee;Nariya Cho
    • Korean Journal of Radiology
    • /
    • v.24 no.7
    • /
    • pp.626-639
    • /
    • 2023
  • Objective: To investigate the association of clinical, pathologic, and magnetic resonance imaging (MRI) variables with progressive disease (PD) during neoadjuvant chemotherapy (NAC) and distant metastasis-free survival (DMFS) in patients with triple-negative breast cancer (TNBC). Materials and Methods: This single-center retrospective study included 252 women with TNBC who underwent NAC between 2010 and 2019. Clinical, pathologic, and treatment data were collected. Two radiologists analyzed the pre-NAC MRI. After random allocation to the development and validation sets in a 2:1 ratio, we developed models to predict PD and DMFS using logistic regression and Cox proportional hazard regression, respectively, and validated them. Results: Among the 252 patients (age, 48.3 ± 10.7 years; 168 in the development set; 84 in the validation set), PD occurred in 17 patients and 9 patients in the development and validation sets, respectively. In the clinical-pathologic-MRI model, metaplastic histology (odds ratio [OR], 8.0; P = 0.032), Ki-67 index (OR, 1.02; P = 0.044), and subcutaneous edema (OR, 30.6; P = 0.004) were independently associated with PD in the development set. The clinical-pathologic-MRI model showed a higher area under the receiver-operating characteristic curve (AUC) than the clinical-pathologic model (AUC: 0.69 vs. 0.54; P = 0.017) for predicting PD in the validation set. Distant metastases occurred in 49 patients and 18 patients in the development and validation sets, respectively. Residual disease in both the breast and lymph nodes (hazard ratio [HR], 6.0; P = 0.005) and the presence of lymphovascular invasion (HR, 3.3; P < 0.001) were independently associated with DMFS. The model consisting of these pathologic variables showed a Harrell's C-index of 0.86 in the validation set.
Conclusion: The clinical-pathologic-MRI model, which considered subcutaneous edema observed using MRI, performed better than the clinical-pathologic model for predicting PD. However, MRI did not independently contribute to the prediction of DMFS.
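
The random 2:1 allocation described in this abstract can be illustrated with a minimal sketch. The function name `split_two_to_one`, the seed, and the use of integer patient identifiers are assumptions made for illustration; the abstract does not say how the authors performed the randomization.

```python
import random

def split_two_to_one(ids, seed=0):
    """Randomly allocate ids to development and validation sets in a 2:1 ratio."""
    shuffled = list(ids)
    random.Random(seed).shuffle(shuffled)   # fixed seed only for reproducibility
    n_dev = (2 * len(shuffled)) // 3        # two thirds go to development
    return shuffled[:n_dev], shuffled[n_dev:]

# 252 hypothetical patient identifiers, matching the cohort size in the abstract
development, validation = split_two_to_one(range(252))
print(len(development), len(validation))    # 168 84
```

With 252 cases this reproduces the 168/84 split reported above. Models would then be fitted on the development set only and scored on the validation set.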

Validation of RELAP5 MOD3.3 code for Hybrid-SIT against SET and IET experimental data

  • Yoon, Ho Joon;Al Naqbi, Waleed;Al-Yahia, Omar S.;Jo, Daeseong
    • Nuclear Engineering and Technology
    • /
    • v.52 no.9
    • /
    • pp.1926-1938
    • /
    • 2020
  • We validated the performance of the RELAP5 MOD3.3 code for the hybrid SIT against available experimental data. The concept of the hybrid SIT is to connect the pressurizer to the SIT so that the water inside the SIT can be used in the case of SBO or SB-LOCA combined with TLOFW. We investigated how well the RELAP5 code predicts the physical phenomena in terms of equilibrium time, stratification, and condensation against Separate Effect Test (SET) data. We also validated the RELAP5 code against Integrated Effect Test (IET) experimental data produced by the ATLAS facility. We followed the conventional approach for code validation against IET data, which consists of pre-test and post-test calculations. The RELAP5 results differ substantially as the number of nodes changes. Increasing the number of nodes tends to reduce the condensation rate at the interface between liquid and vapor inside the hybrid SIT. The environmental heat loss also contributes to the large discrepancy between the RELAP5 simulation results and the experimental data.

Developing a Molecular Prognostic Predictor of a Cancer based on a Small Sample

  • Kim Inyoung;Lee Sunho;Rha Sun Young;Kim Byungsoo
    • Proceedings of the Korean Statistical Society Conference
    • /
    • 2004.11a
    • /
    • pp.195-198
    • /
    • 2004
  • One important problem in a cancer microarray study is to identify a set of genes from which a molecular prognostic indicator can be developed. A parallel problem is to validate the chosen set of genes. In this note we develop a K-fold cross validation procedure by combining a 'pre-validation' technique and a bootstrap resampling procedure in the Cox regression. The pre-validation technique predicts the microarray predictor of a case without having seen the true class label of the case. It was suggested by Tibshirani and Efron (2002) to avoid the possible over-fitting in a regression in which a microarray-based predictor is employed. The bootstrap resampling procedure for the Cox regression was proposed by Sauerbrei and Schumacher (1992) as a means of overcoming the instability of a stepwise selection procedure. We apply this K-fold cross validation to the microarray data of 92 gastric cancers, for which the experiment was conducted at the Cancer Metastasis Research Center, Yonsei University. We also share some of our experience with 'false positive' results caused by information leak.
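
The pre-validation idea in this abstract, where each case is predicted by a model that never saw that case's outcome, can be sketched as follows. This is only an illustration under stated assumptions: a toy nearest-class-mean classifier (`fit_nearest_mean`, hypothetical) stands in for the Cox regression, the folds are contiguous rather than randomized, and the bootstrap step of the paper is not shown.

```python
def k_fold_indices(n, k):
    """Split indices 0..n-1 into k nearly equal, contiguous folds."""
    folds, start = [], 0
    for i in range(k):
        size = n // k + (1 if i < n % k else 0)
        folds.append(list(range(start, start + size)))
        start += size
    return folds

def prevalidate(xs, ys, k, fit):
    """Return one out-of-fold prediction per case."""
    preds = [None] * len(xs)
    for fold in k_fold_indices(len(xs), k):
        held_out = set(fold)
        train_x = [x for i, x in enumerate(xs) if i not in held_out]
        train_y = [y for i, y in enumerate(ys) if i not in held_out]
        predict = fit(train_x, train_y)     # never sees the held-out labels
        for i in fold:
            preds[i] = predict(xs[i])
    return preds

def fit_nearest_mean(train_x, train_y):
    """Hypothetical stand-in model: assign the class whose mean is nearer."""
    means = {}
    for label in set(train_y):
        vals = [x for x, y in zip(train_x, train_y) if y == label]
        means[label] = sum(vals) / len(vals)
    return lambda x: min(means, key=lambda label: abs(x - means[label]))

# Toy one-dimensional data, invented for illustration
xs = [0.1, 0.3, 0.2, 0.9, 1.1, 1.0, 0.15, 0.95]
ys = [0, 0, 0, 1, 1, 1, 0, 1]
pre_validated = prevalidate(xs, ys, k=4, fit=fit_nearest_mean)
```

Any feature selection would have to happen inside `fit` as well; selecting genes on the full data before splitting is one way the information leak mentioned in the abstract can arise.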


Robust Cross Validation Score

  • Park, Dong-Ryeon
    • Communications for Statistical Applications and Methods
    • /
    • v.12 no.2
    • /
    • pp.413-423
    • /
    • 2005
  • Consider the problem of estimating the underlying regression function from a set of noisy data contaminated by a long-tailed error distribution. Several robust smoothing techniques exist, and these have turned out to be very useful for reducing the influence of outlying observations. However, no matter what kind of robust smoother we use, we must choose the smoothing parameter, and relatively little attention has been paid to robust bandwidth selection methods. In this paper, we adopt the idea of the robust location parameter estimation technique and propose robust cross validation score functions.

Comparison of EKF and UKF on Training the Artificial Neural Network

  • Kim, Dae-Hak
    • Journal of the Korean Data and Information Science Society
    • /
    • v.15 no.2
    • /
    • pp.499-506
    • /
    • 2004
  • The Unscented Kalman Filter (UKF) is known to outperform the Extended Kalman Filter (EKF) for nonlinear state estimation, with the significant advantage that it does not require the computation of a Jacobian, but the EKF has a competitive advantage over the UKF in execution time. We compare both algorithms on training an artificial neural network. The validation data set is used to estimate parameters that are expected to give a better fit on the test data set. Experimental results indicating the performance of both algorithms are presented.


Prediction of the compressive strength of fly ash geopolymer concrete using gene expression programming

  • Alkroosh, Iyad S.;Sarker, Prabir K.
    • Computers and Concrete
    • /
    • v.24 no.4
    • /
    • pp.295-302
    • /
    • 2019
  • Evolutionary algorithms based on conventional statistical methods such as regression and classification have been widely used in data mining applications. This work applies gene expression programming (GEP) to predict the compressive strength of fly ash geopolymer concrete, which is gaining increasing interest as an environmentally friendly alternative to Portland cement concrete. Based on 56 test results from the existing literature, a model was obtained relating the compressive strength of fly ash geopolymer concrete to the mix design parameters that significantly influence it. The predictions of the model in training and validation were evaluated. The coefficient of determination ($R^2$), mean ($\mu$) and standard deviation ($\sigma$) were 0.89, 1.0 and 0.12, respectively, for the training set, and 0.89, 0.99 and 0.13, respectively, for the validation set. The error of prediction by the model was also evaluated and found to be very low. This indicates that the predictions of the GEP model are in close agreement with the experimental results, suggesting that GEP is a promising method for predicting the compressive strength of fly ash geopolymer concrete.

Estimating Prediction Errors in Binary Classification Problem: Cross-Validation versus Bootstrap

  • Kim Ji-Hyun;Cha Eun-Song
    • Communications for Statistical Applications and Methods
    • /
    • v.13 no.1
    • /
    • pp.151-165
    • /
    • 2006
  • It is important to estimate the true misclassification rate of a given classifier when an independent set of test data is not available. Cross-validation and bootstrap are two possible approaches in this case. In the related literature, bootstrap estimators of the true misclassification rate were asserted to perform better for small samples than cross-validation estimators. We compare the two estimators empirically when the classification rule is so adaptive to the training data that its apparent misclassification rate is close to zero. We confirm that bootstrap estimators perform better for small samples because of their small variance, and we also find that their bias tends to be significant even for moderate to large samples, in which case cross-validation estimators perform better with less computation.
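
The two kinds of error estimate compared in this abstract can be sketched side by side. The choices below are assumptions for illustration, not details from the paper: a 1-nearest-neighbour rule (whose apparent error is zero on distinct points) as the highly adaptive classifier, leave-one-out cross-validation, and a plain out-of-bag bootstrap estimate. The toy data are invented.

```python
import random

def one_nn_predict(train, x):
    """Label of the training point nearest to x (1-D feature)."""
    return min(train, key=lambda pair: abs(pair[0] - x))[1]

def loo_cv_error(data):
    """Leave-one-out cross-validation misclassification rate."""
    errors = 0
    for i, (x, y) in enumerate(data):
        train = data[:i] + data[i + 1:]
        errors += one_nn_predict(train, x) != y
    return errors / len(data)

def oob_bootstrap_error(data, n_boot=200, seed=0):
    """Error on cases left out of each bootstrap resample, pooled over resamples."""
    rng = random.Random(seed)
    n = len(data)
    errors = tested = 0
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]   # sample with replacement
        chosen = set(idx)
        train = [data[i] for i in idx]
        for i in range(n):
            if i not in chosen:                      # out-of-bag case
                errors += one_nn_predict(train, data[i][0]) != data[i][1]
                tested += 1
    return errors / tested if tested else float("nan")

# Toy data: (feature, label) pairs with two well-separated classes
data = [(0.0, 0), (0.1, 0), (0.2, 0), (1.0, 1), (1.1, 1), (1.2, 1)]
cv_error = loo_cv_error(data)
boot_error = oob_bootstrap_error(data)
```

Each bootstrap training set contains fewer distinct cases than the full sample, which is one common explanation for the upward bias of out-of-bag estimates; corrected variants such as the .632 estimator exist but are not shown here.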

Recovery the Missing Streamflow Data on River Basin Based on the Deep Neural Network Model

  • Le, Xuan-Hien;Lee, Giha
    • Proceedings of the Korea Water Resources Association Conference
    • /
    • 2019.05a
    • /
    • pp.156-156
    • /
    • 2019
  • In this study, a gated recurrent unit (GRU) network is constructed based on a deep neural network (DNN) with the aim of restoring missing daily flow data in river basins. Lai Chau hydrological station, located in the upstream part of the Da River basin (Vietnam), is selected as the target station. The model inputs are observed daily flows for 24 years, from 1961 to 1984 (before the Hoa Binh dam was built), at 5 hydrological stations: 4 gauge stations downstream in the basin and the target station to be restored (Lai Chau). The available data are divided into sections for different purposes. The 23-year data set (1961-1983) was used for training and validation, with 80% for training and 20% for validation. The remaining one-year data set (1984) was used for testing, to objectively verify the performance and accuracy of the model. Although only a modest amount of input data is required and the Lai Chau hydrological station is located upstream on the Da River, the results calculated by the suggested model are in satisfactory agreement with the observed data, with a Nash-Sutcliffe efficiency (NSE) higher than 95%. These findings illustrate the outstanding performance of the GRU network model in recovering the missing flow data at Lai Chau station. As a result, DNN models, as well as GRU network models, have great potential for application in the field of hydrology and hydraulics.
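
For readers unfamiliar with the metric, the Nash-Sutcliffe efficiency cited in this abstract is one minus the ratio of the squared simulation error to the variance of the observations around their mean. A minimal sketch follows; the function name and the sample flows are illustrative and not taken from the study.

```python
def nash_sutcliffe(observed, simulated):
    """NSE = 1 - sum((obs - sim)^2) / sum((obs - mean(obs))^2)."""
    if len(observed) != len(simulated) or not observed:
        raise ValueError("series must be non-empty and of equal length")
    mean_obs = sum(observed) / len(observed)
    residual = sum((o - s) ** 2 for o, s in zip(observed, simulated))
    spread = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - residual / spread          # undefined if observations are constant

observed_flow = [1.0, 2.0, 3.0]             # invented daily flows
perfect = nash_sutcliffe(observed_flow, [1.0, 2.0, 3.0])     # 1.0
mean_only = nash_sutcliffe(observed_flow, [2.0, 2.0, 2.0])   # 0.0
```

A value of 1 means a perfect match, and 0 means the model is no better than predicting the observed mean, so an NSE above 0.95 on the held-out year indicates close agreement.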


Face Detection Based on Incremental Learning from Very Large Size Training Data (대용량 훈련 데이타의 점진적 학습에 기반한 얼굴 검출 방법)

  • 박지영;이준호
    • Journal of KIISE:Software and Applications
    • /
    • v.31 no.7
    • /
    • pp.949-958
    • /
    • 2004
  • Face detection using a boosting-based algorithm requires a very large amount of face and nonface data. In addition, the recurring need to add training data for better detection rates demands an efficient incremental learning algorithm. In the design of classifiers based on incremental learning, the final classifier should represent the characteristics of the entire training dataset. Conventional methods have a critical problem in combining intermediate classifiers: the weight updates depend solely on the performance on individual datasets. In this paper, for the purpose of application to face detection, we present a new method to combine an intermediate classifier with previously acquired ones in an optimal manner. Our algorithm creates a validation set by incrementally adding sampled instances from each dataset to represent the entire training data. The weight of each classifier is determined based on its performance on the validation set. This approach guarantees that the resulting final classifier is learned from the entire training dataset. Experimental results show that the classifier trained by the proposed algorithm performs better than one trained by AdaBoost, which operates in batch mode, as well as one trained by Learn++.

Assessment of the Near Real-Time Validation for the AQUA Satellite Level-2 Observation Products

  • Yang Min-Sil;Lee Jeongsoon;Lee Chol;Park Jong-Seo;Kim Hee-Ah
    • Proceedings of the KSRS Conference
    • /
    • 2004.10a
    • /
    • pp.35-38
    • /
    • 2004
  • We developed a Near Real-Time Validation System (NRVS) for the Level-2 products of the AQUA satellite. The AQUA satellite is the second largest project of the Earth Observing System (EOS) mission of NASA. This satellite provides information on the water cycle of the entire earth in many different forms. Among its products, we used five kinds of level-2 geophysical parameters: rain rate, sea surface wind speed, skin surface temperature, atmospheric temperature profile, and atmospheric humidity profile. To use these products for scientific purposes, reasonable quantification is indispensable. In this paper we explain the near real-time validation system process and its detailed algorithm. Its simulation results are also analyzed quantitatively. As the reference data set, we process in-situ measured meteorological data that are periodically gathered and provided by the Korea Meteorological Administration (KMA). Both site-specific and time-series analyses of the validation results are explained, and the detailed algorithms are described.
