• Title/Summary/Keyword: one-leave-out cross-validation

Prediction of Chronic Hepatitis Susceptibility using Single Nucleotide Polymorphism Data and Support Vector Machine (Single Nucleotide Polymorphism(SNP) 데이타와 Support Vector Machine(SVM)을 이용한 만성 간염 감수성 예측)

  • Kim, Dong-Hoi;Uhmn, Saang-Yong;Hahm, Ki-Baik;Kim, Jin
    • Journal of KIISE: Computer Systems and Theory / v.34 no.7 / pp.276-281 / 2007
  • In this paper, we use a Support Vector Machine (SVM) to predict susceptibility to chronic hepatitis from single nucleotide polymorphism (SNP) data. Our data set consists of SNP data for 328 patients, covering 28 SNPs and two patient classes (chronic hepatitis, healthy). We use leave-one-out cross-validation to estimate accuracy. The experimental results show that an SVM using SNPs alone classifies chronic hepatitis susceptibility with an accuracy of 67.1%. Adding health-related features (sex, age) to all SNPs improves accuracy by more than 7 percentage points, to 74.9%. This result shows that health-related features can improve the accuracy of susceptibility prediction. With more SNPs and additional health-related features, SVM-based prediction from SNP data is a potential tool for assessing chronic hepatitis susceptibility.
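The leave-one-out procedure used for the accuracy estimate can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the SVM is swapped for a simple nearest-centroid classifier (a hypothetical stand-in) so the example needs only the standard library, and the 0/1/2 genotype coding and toy data are assumptions.

```python
from collections import Counter
from typing import Callable, Hashable, List, Sequence

Row = Sequence[float]
Classifier = Callable[[List[Row], List[Hashable], Row], Hashable]


def nearest_centroid(train_X: List[Row], train_y: List[Hashable], x: Row) -> Hashable:
    """Stand-in classifier (not the paper's SVM): pick the class whose mean
    feature vector is closest to x in squared Euclidean distance."""
    counts = Counter(train_y)
    sums = {}
    for row, label in zip(train_X, train_y):
        acc = sums.setdefault(label, [0.0] * len(row))
        for j, value in enumerate(row):
            acc[j] += value
    best_label, best_dist = None, float("inf")
    for label, acc in sums.items():
        centroid = [v / counts[label] for v in acc]
        dist = sum((a - b) ** 2 for a, b in zip(x, centroid))
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label


def loocv_accuracy(X: List[Row], y: List[Hashable], classify: Classifier) -> float:
    """Leave-one-out CV: each sample is predicted by a model that never saw it."""
    correct = 0
    for i in range(len(X)):
        train_X = X[:i] + X[i + 1:]  # every sample except the i-th
        train_y = y[:i] + y[i + 1:]
        if classify(train_X, train_y, X[i]) == y[i]:
            correct += 1
    return correct / len(X)


# Toy genotypes coded 0/1/2 (hypothetical data, not from the study).
X = [[0, 0], [0, 1], [1, 0], [2, 2], [2, 1], [1, 2]]
y = ["healthy"] * 3 + ["hepatitis"] * 3
print(loocv_accuracy(X, y, nearest_centroid))
```

Any function with the same `(train_X, train_y, x)` signature could be passed as `classify`; the loop refits the model n times, each time on n-1 samples.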

A comparative study of conceptual model and machine learning model for rainfall-runoff simulation (강우-유출 모의를 위한 개념적 모형과 기계학습 모형의 성능 비교)

  • Lee, Seung Cheol;Kim, Daeha
    • Journal of Korea Water Resources Association / v.56 no.9 / pp.563-574 / 2023
  • Recently, climate change has affected the functional responses of river basins to meteorological variables, underscoring the importance of rainfall-runoff simulation research. At the same time, growing interest in machine learning has led to its increased application in hydrological studies. However, it is not yet clear whether machine learning models are more advantageous than conventional conceptual models. In this study, we compared the performance of the conventional GR6J model with that of the machine learning-based Random Forest model across 38 basins in Korea, using both gauged and ungauged basin prediction methods. For gauged basin predictions, each model was calibrated or trained using observed daily runoff data, and its performance was evaluated over a separate validation period. Ungauged basin simulations were then evaluated using proximity-based parameter regionalization with leave-one-out cross-validation (LOOCV). In gauged basins, the Random Forest consistently outperformed GR6J, regardless of whether a basin had a strong or weak rainfall-runoff correlation. This suggests that the data-driven training structure of machine learning models, in contrast to conceptual models, offers distinct advantages in data-rich scenarios. However, this advantage was not replicated in ungauged basin predictions, where the Random Forest performed worse than GR6J. In conclusion, while the Random Forest model showed enhanced performance at trained locations, the existing GR6J model may be a better choice for prediction in ungauged basins.
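Leave-one-out evaluation of proximity-based regionalization can be sketched as follows: each basin is treated in turn as ungauged, the calibrated parameters of its geographically nearest other basin are transferred, and the simulation is scored against the withheld observations. This is an illustrative sketch under stated assumptions: `toy_simulate` is a hypothetical stand-in for a rainfall-runoff model such as GR6J (with a single scalar parameter in place of a full parameter set), donors are chosen by plain Euclidean distance between coordinates, and the Nash-Sutcliffe efficiency is an assumed score, since the abstract does not name the metric.

```python
import math
from typing import Callable, List, Sequence, Tuple

Coord = Tuple[float, float]
Simulator = Callable[[float, int], List[float]]


def nearest_donor(i: int, coords: Sequence[Coord]) -> int:
    """Index of the closest other basin by straight-line distance."""
    best_j, best_d = -1, math.inf
    for j, (x, y) in enumerate(coords):
        if j == i:
            continue  # the target basin is treated as ungauged
        d = math.hypot(x - coords[i][0], y - coords[i][1])
        if d < best_d:
            best_j, best_d = j, d
    return best_j


def nse(obs: Sequence[float], sim: Sequence[float]) -> float:
    """Nash-Sutcliffe efficiency: 1 is a perfect fit; below 0 is worse than the observed mean."""
    mean_obs = sum(obs) / len(obs)
    num = sum((o - s) ** 2 for o, s in zip(obs, sim))
    den = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - num / den


def loocv_regionalization(coords, params, observed, simulate: Simulator) -> List[float]:
    """Treat each basin as ungauged in turn, borrow the nearest donor's
    calibrated parameter, and score the simulation against withheld runoff."""
    scores = []
    for i in range(len(coords)):
        donor = nearest_donor(i, coords)
        scores.append(nse(observed[i], simulate(params[donor], i)))
    return scores


# Hypothetical toy model: runoff = parameter * rainfall (stand-in for GR6J).
RAIN = [[1.0, 2.0, 3.0]] * 3


def toy_simulate(param: float, basin: int) -> List[float]:
    return [param * r for r in RAIN[basin]]


coords = [(0.0, 0.0), (1.0, 0.0), (10.0, 10.0)]
params = [0.5, 0.5, 0.9]
observed = [[0.5, 1.0, 1.5], [0.5, 1.0, 1.5], [0.9, 1.8, 2.7]]
print(loocv_regionalization(coords, params, observed, toy_simulate))
```

In this toy run the two nearby basins reproduce each other perfectly (NSE of 1), while the isolated third basin inherits an unsuitable parameter and scores below zero, illustrating how regionalization performance depends on donor similarity.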

Added Value of Contrast Leakage Information over the CBV Value of DSC Perfusion MRI to Differentiate between Pseudoprogression and True Progression after Concurrent Chemoradiotherapy in Glioblastoma Patients

  • Pak, Elena;Choi, Seung Hong;Park, Chul-Kee;Kim, Tae Min;Park, Sung-Hye;Won, Jae-Kyung;Lee, Joo Ho;Lee, Soon-Tae;Hwang, Inpyeong;Yoo, Roh-Eul;Kang, Koung Mi;Yun, Tae Jin
    • Investigative Magnetic Resonance Imaging / v.26 no.1 / pp.10-19 / 2022
  • Purpose: To evaluate whether contrast leakage information from dynamic susceptibility contrast magnetic resonance imaging (DSC MRI) adds value over the cerebral blood volume (CBV) value as a prognostic imaging biomarker for distinguishing true progression from pseudoprogression in glioblastoma patients. Materials and Methods: Forty-nine glioblastoma patients who had undergone MRI after concurrent chemoradiotherapy with temozolomide were enrolled in this retrospective study. Twenty features were extracted from the normalized relative CBV (nCBV) and extraction fraction (EF) maps of the contrast-enhancing region in each patient. After univariable analysis, multivariable stepwise logistic regression analysis was used to identify significant predictors for differentiating between pseudoprogression and true progression. Receiver operating characteristic (ROC) analysis was employed to determine the best cutoff values for the nCBV and EF features. Finally, leave-one-out cross-validation was used to validate the best predictor for differentiating between true progression and pseudoprogression. Results: Multivariable stepwise logistic regression analysis showed that MGMT (O6-methylguanine-DNA methyltransferase) and EF max were independent differentiating variables (P = 0.004 and P = 0.02, respectively). ROC analysis yielded a best cutoff value of 95.75 for EF max for differentiating the two groups (sensitivity, 61%; specificity, 84.6%; AUC, 0.681 ± 0.08; 95% CI, 0.524-0.837; P = 0.03). In leave-one-out cross-validation of EF max, the cross-validated accuracies for predicting true progression and pseudoprogression were 69.4% and 71.4%, respectively. Conclusion: A contrast leakage parameter from DSC MRI was significant in differentiating true progression from pseudoprogression in glioblastoma patients.
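One way to cross-validate a single-feature cutoff such as the EF max threshold is to re-select the cutoff on the remaining n-1 patients in every fold and then classify the held-out patient. The sketch below assumes Youden's J (sensitivity + specificity - 1) as the cutoff criterion and that values above the cutoff indicate true progression; the abstract states neither, and the data are hypothetical.

```python
import math
from typing import List, Tuple


def best_cutoff(values: List[float], labels: List[int]) -> float:
    """Cutoff maximising Youden's J = sensitivity + specificity - 1.
    A case is called positive (label 1, true progression) when value > cutoff."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    best_c, best_j = values[0], -math.inf
    for c in sorted(set(values)):  # candidate cutoffs: the observed values
        tp = sum(1 for v, lab in zip(values, labels) if v > c and lab == 1)
        tn = sum(1 for v, lab in zip(values, labels) if v <= c and lab == 0)
        j = tp / n_pos + tn / n_neg - 1
        if j > best_j:
            best_c, best_j = c, j
    return best_c


def loocv_cutoff(values: List[float], labels: List[int]) -> Tuple[float, float]:
    """Re-derive the cutoff without the held-out patient, then classify that patient.
    Returns (accuracy on positives, accuracy on negatives)."""
    hits = {0: 0, 1: 0}
    totals = {0: 0, 1: 0}
    for i in range(len(values)):
        cutoff = best_cutoff(values[:i] + values[i + 1:], labels[:i] + labels[i + 1:])
        predicted = 1 if values[i] > cutoff else 0
        totals[labels[i]] += 1
        hits[labels[i]] += int(predicted == labels[i])
    return hits[1] / totals[1], hits[0] / totals[0]


# Hypothetical feature values; 1 = true progression, 0 = pseudoprogression.
values = [10.0, 20.0, 30.0, 40.0, 50.0, 60.0]
labels = [0, 0, 0, 1, 1, 1]
print(loocv_cutoff(values, labels))
```

Here a cutoff fitted to all six cases separates the groups perfectly, yet the borderline case at 30 is misclassified once its own value is excluded from cutoff selection, which is why cross-validated accuracies tend to be lower than resubstitution estimates.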

Classification of Emotional States of Interest and Neutral Using Features from Pulse Wave Signal

  • Phongsuphap, Sukanya;Sopharak, Akara
    • Institute of Control, Robotics and Systems (ICROS): Conference Proceedings / 2004.08a / pp.682-685 / 2004
  • This paper investigated a method for classifying emotional states using the pulse wave signal, focusing on finding effective features for emotional state classification. The emotional states considered were interest and neutral. Classification experiments used 65 samples of the interest state and 60 samples of the neutral state. We investigated 19 features derived from pulse wave signals using both time-domain and frequency-domain analysis methods, with two classifiers: minimum distance (normalized Euclidean distance) and $k$-Nearest Neighbour. Leave-one-out cross-validation was used as the evaluation method. Based on the experimental results, the most efficient features were a combination of four: (i) the mean of the first differences of the smoothed pulse rate time series, (ii) the mean of the absolute values of the second differences of the normalized interbeat intervals, (iii) the root mean square successive difference, and (iv) the power in the high-frequency range in normalized units. This combination provided 80.8% average accuracy with the $k$-Nearest Neighbour classifier.
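A $k$-nearest-neighbour classifier with normalized Euclidean distance, evaluated by leave-one-out cross-validation, might look like the sketch below. It assumes that "normalized" means dividing each feature by its standard deviation, estimated on the training fold only; the paper's exact normalization, the value of $k$, and the toy feature values are all assumptions.

```python
import math
from collections import Counter
from typing import Hashable, List, Sequence


def feature_scales(rows: List[Sequence[float]]) -> List[float]:
    """Per-feature population standard deviation (1.0 for constant features)."""
    n = len(rows)
    scales = []
    for j in range(len(rows[0])):
        column = [row[j] for row in rows]
        mean = sum(column) / n
        sd = math.sqrt(sum((v - mean) ** 2 for v in column) / n)
        scales.append(sd if sd > 0 else 1.0)
    return scales


def knn_predict(train_X, train_y, x, k: int) -> Hashable:
    """k-NN vote using Euclidean distance on features divided by their training-set SD."""
    scales = feature_scales(train_X)
    ranked = sorted(
        zip(train_X, train_y),
        key=lambda pair: math.sqrt(
            sum(((a - b) / s) ** 2 for a, b, s in zip(pair[0], x, scales))
        ),
    )
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


def loocv_accuracy(X, y, k: int = 3) -> float:
    correct = 0
    for i in range(len(X)):
        # Scales are recomputed from the n-1 training rows, so the held-out
        # sample does not influence the normalisation.
        pred = knn_predict(X[:i] + X[i + 1:], y[:i] + y[i + 1:], X[i], k)
        correct += int(pred == y[i])
    return correct / len(X)


# Hypothetical two-feature samples on very different scales.
X = [[1.0, 100.0], [2.0, 110.0], [1.5, 105.0], [10.0, 500.0], [11.0, 520.0], [10.5, 510.0]]
y = ["neutral"] * 3 + ["interest"] * 3
print(loocv_accuracy(X, y, k=3))
```

Without the per-feature scaling, the second feature (in the hundreds) would dominate the distance; dividing by the standard deviation lets both features contribute comparably.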

On-line Signature Identification Based on Writing Habit Information (필기습관 정보에 기반한 온라인 서명인식)

  • 성한호;이일병
    • Proceedings of the Korean Information Science Society Conference / 2003.04c / pp.322-324 / 2003
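  • Biometric technology has advanced considerably to date, and in Korea, standardization work and database construction, as well as research, are actively under way. Biometrics can rely either on parts of the body or on characteristics that arise from habits; in this study, we perform recognition using an individual's handwriting habit information. Focusing on writing habits, we capture signature information with a tablet and pen that can sense dynamic biometric information and extract features such as pen tilt, pen pressure, and pen azimuth angle, which clearly reveal the signer's habits. To extract features from the resulting signature information, we applied principal component analysis (PCA) and independent component analysis (ICA), techniques widely used in pattern recognition. The distance between two generated feature vectors was computed with the Euclidean distance, the recognition rate was determined by nearest-neighbor comparison, and the reliability of the data was assessed by measuring classification performance with the leave-one-out method, one of the cross-validation techniques.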

GEOSTATISTICAL INTEGRATION OF HIGH-RESOLUTION REMOTE SENSING DATA IN SPATIAL ESTIMATION OF GRAIN SIZE

  • Park, No-Wook;Chi, Kwang-Hoon;Jang, Dong-Ho
    • Proceedings of the KSRS Conference / v.1 / pp.406-408 / 2006
  • Various geological thematic maps, such as grain size or groundwater level maps, have been generated by interpolating sparsely sampled ground survey data. When data are sampled at only a limited number of locations, secondary information correlated with the primary variable can help estimate the attribute values of the primary variable at unsampled locations. This paper applies two multivariate geostatistical algorithms, simple kriging with local means and kriging with an external drift, to integrate remote sensing imagery with sparsely sampled ground survey data for spatial estimation of grain size. High-resolution IKONOS imagery, which is well correlated with grain size, is used as secondary information. The algorithms are evaluated in a case study with grain size observations measured at 53 locations on Baramarae beach, Anmyeondo, Korea. Cross-validation based on a leave-one-out approach is used to compare the estimation performance of the two multivariate geostatistical algorithms with that of traditional ordinary kriging.

Prediction of Liver Cirrhosis Susceptibility Using Decision Tree with SNP (Decision Tree와 SNP정보를 이용한 간경화 환자의 감수성 예측)

  • Kim, Dong-Hoi;Uhmn, Saang-Yong;Cho, Sung-Won;Ham, Ki-Baek;Kim, Jin
    • Proceedings of the Korean Information Science Society Conference / 2006.10a / pp.63-66 / 2006
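  • In this paper, we used a decision tree to predict susceptibility to liver cirrhosis from SNP data. We used data from a total of 116 subjects, comprising liver cirrhosis patients and healthy controls, and used as features 28 SNPs closely associated with liver disease. We first measured the classification rate of each SNP individually with a decision tree, then combined SNPs starting from the one with the highest classification rate, measuring the accuracy of distinguishing liver cirrhosis from healthy cases with a C4.5 decision tree under leave-one-out cross-validation. As a result, an accuracy of 65.52% was obtained with a combination of four liver disease-related SNPs: IL1RN-S130S, IRNGR2-Q64R, IL-10(-592), and IL1B_S35S.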
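The incremental SNP-combination procedure can be sketched as ranked forward selection under leave-one-out cross-validation. To keep the example short, C4.5 is replaced by a hypothetical genotype lookup-table classifier (majority class among training samples with the same genotypes on the selected SNPs), and keeping the best-scoring prefix of the ranking is an assumption about how the combinations were compared.

```python
from collections import Counter
from typing import Hashable, List, Sequence, Tuple


def lookup_classify(train_X, train_y, x, features: Sequence[int]) -> Hashable:
    """Hypothetical stand-in for C4.5: majority class among training samples whose
    genotypes match x on the selected SNPs; overall majority if none match."""
    key = tuple(x[f] for f in features)
    matches = [lab for row, lab in zip(train_X, train_y)
               if tuple(row[f] for f in features) == key]
    pool = matches if matches else train_y
    return Counter(pool).most_common(1)[0][0]


def loocv_accuracy(X, y, features: Sequence[int]) -> float:
    correct = 0
    for i in range(len(X)):
        pred = lookup_classify(X[:i] + X[i + 1:], y[:i] + y[i + 1:], X[i], features)
        correct += int(pred == y[i])
    return correct / len(X)


def ranked_forward_selection(X, y) -> Tuple[List[int], float]:
    """Rank SNPs by single-SNP LOOCV accuracy, then evaluate growing prefixes
    of that ranking and keep the best-scoring combination."""
    n_features = len(X[0])
    ranking = sorted(range(n_features),
                     key=lambda f: loocv_accuracy(X, y, [f]), reverse=True)
    best_subset, best_acc = [], 0.0
    for m in range(1, n_features + 1):
        subset = ranking[:m]
        acc = loocv_accuracy(X, y, subset)
        if acc > best_acc:
            best_subset, best_acc = subset, acc
    return best_subset, best_acc


# Hypothetical genotypes (0/1/2): SNP 0 tracks the class, SNP 1 does not.
X = [[0, 0], [0, 1], [0, 2], [2, 0], [2, 1], [2, 2]]
y = [0, 0, 0, 1, 1, 1]
print(ranked_forward_selection(X, y))
```

Because the same leave-one-out scores are used both to choose the SNP subset and to report its accuracy, the figure returned by this sketch is optimistically biased; a nested cross-validation would be needed for an unbiased estimate.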

Multiple Optimal Classifiers based on Speciated Evolution for Classifying DNA Microarray Data (DNA 마이크로어레이 데이터의 분류를 위한 종분화 진화 기반의 최적 다중 분류기)

  • 박찬호;조성배
    • Proceedings of the Korean Information Science Society Conference / 2004.04b / pp.724-726 / 2004
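  • Advances in DNA microarray technology have made early detection of cancer and prediction of prognosis possible, and much related research is under way. In classifying microarray data, selecting relevant genes is essential, and the gene selection method is paired with a classifier to form a feature-classifier combination. Although microarray data have been classified with many different feature-classifier combinations, finding the optimal one has been difficult because of algorithmic limitations and deficiencies in the data. Methods that achieve high classification performance with ensemble classifiers have therefore been tried, and genetic algorithms have been used to find the optimal ensemble. In this paper, we extend this approach by using speciation to generate diverse optimal ensembles. Applying leave-one-out cross-validation to lymphoma cancer data confirmed that the proposed method finds diverse optimal solutions.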

Development of kNN QSAR Models for 3-Arylisoquinoline Antitumor Agents

  • Tropsha, Alexander;Golbraikh, Alexander;Cho, Won-Jea
    • Bulletin of the Korean Chemical Society / v.32 no.7 / pp.2397-2404 / 2011
  • A variable selection k-nearest neighbor QSAR modeling approach was applied to a data set of 80 3-arylisoquinolines exhibiting cytotoxicity against a human lung tumor cell line (A-549). All compounds were characterized with molecular topology descriptors calculated with the MolconnZ program. Seven compounds were randomly selected from the original data set and used as an external validation set. The remaining 73 compounds were divided into multiple training (56 to 61 compounds) and test (17 to 12 compounds) sets using a chemical diversity sampling method developed in this group. Highly predictive models were obtained, characterized by leave-one-out cross-validated $R^2$ ($q^2$) values greater than 0.8 for the training sets and $R^2$ values greater than 0.7 for the test sets. The robustness of the models was confirmed by the Y-randomization test: all models built using training sets with randomly shuffled activities were characterized by low $q^2 \leq 0.26$ and $R^2 \leq 0.22$ for training and test sets, respectively. The twelve best models (with the highest values of both $q^2$ and $R^2$) predicted the activities of the seven-compound external validation set with $R^2$ ranging from 0.71 to 0.93.
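The leave-one-out $q^2$ statistic and the Y-randomization check can be illustrated with a plain $k$-nearest-neighbour regressor, where $q^2 = 1 - \sum_i (y_i - \hat{y}_i)^2 / \sum_i (y_i - \bar{y})^2$ and $\hat{y}_i$ is the prediction for compound $i$ made without it. This sketch predicts with an unweighted mean of the $k$ nearest activities and omits descriptor selection, so it is a simplification rather than the paper's variable selection kNN method; the descriptors and activities are toy values.

```python
import math
import random
from typing import List, Sequence


def knn_regress(train_X, train_y, x, k: int) -> float:
    """Mean activity of the k training compounds closest to x in descriptor space."""
    nearest = sorted(zip(train_X, train_y), key=lambda p: math.dist(p[0], x))[:k]
    return sum(v for _, v in nearest) / len(nearest)


def loo_q2(X: List[Sequence[float]], y: List[float], k: int = 3) -> float:
    """q^2 = 1 - PRESS / SS, with each prediction made without the compound itself."""
    mean_y = sum(y) / len(y)
    press = 0.0
    for i in range(len(X)):
        pred = knn_regress(X[:i] + X[i + 1:], y[:i] + y[i + 1:], X[i], k)
        press += (y[i] - pred) ** 2
    ss = sum((v - mean_y) ** 2 for v in y)
    return 1.0 - press / ss


def y_randomization(X, y, k: int = 3, trials: int = 10, seed: int = 0) -> List[float]:
    """q^2 of models refit on randomly shuffled activities; these should be low."""
    rng = random.Random(seed)
    scores = []
    for _ in range(trials):
        shuffled = y[:]
        rng.shuffle(shuffled)
        scores.append(loo_q2(X, shuffled, k))
    return scores


# Toy one-descriptor data set with two activity clusters (hypothetical).
X = [[0.0], [1.0], [2.0], [3.0], [10.0], [11.0], [12.0], [13.0]]
y = [1.0, 1.0, 1.0, 1.0, 5.0, 5.0, 5.0, 5.0]
print(loo_q2(X, y, k=2), y_randomization(X, y, k=2))
```

A model whose $q^2$ stays high on the real activities but drops on shuffled ones is unlikely to owe its fit to chance correlation, which is the logic of the Y-randomization test described above.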

Comparison of Univariate Kriging Algorithms for GIS-based Thematic Mapping with Ground Survey Data (현장 조사 자료를 이용한 GIS 기반 주제도 작성을 위한 단변량 크리깅 기법의 비교)

  • Park, No-Wook
    • Korean Journal of Remote Sensing / v.25 no.4 / pp.321-338 / 2009
  • The objective of this paper is to compare the spatial prediction capabilities of univariate kriging algorithms for generating GIS-based thematic maps from ground survey data with asymmetric distributions. Four univariate kriging algorithms, namely traditional ordinary kriging and three non-linear transform-based algorithms (log-normal kriging, multi-Gaussian kriging, and indicator kriging), are applied to the spatial interpolation of the geochemical elements As and Pb. Cross-validation based on a leave-one-out approach is applied, and prediction errors are then computed. The impact of the sampling density of the ground survey data on the prediction errors is also investigated. In the case study, indicator kriging showed the smallest prediction errors and superior prediction of very low and very high values. The other non-linear transform-based kriging algorithms also yielded better prediction capabilities than traditional ordinary kriging. However, log-normal kriging, which has been widely applied, produced biased estimates (overall, overestimation). These quantitative comparison results are expected to help in selecting an optimal kriging algorithm for the spatial interpolation of ground survey data with asymmetric distributions.
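Leave-one-out cross-validation of ordinary kriging can be sketched with the standard library alone: each sample is removed, re-estimated from the others by solving the ordinary kriging system, and the errors are summarized as mean error (bias) and RMSE. The exponential variogram and its sill and range values are assumptions for illustration; in practice they would be fitted to the experimental variogram of the data.

```python
import math
from typing import Callable, List, Sequence, Tuple

Point = Tuple[float, float]


def exp_variogram(h: float, sill: float = 1.0, a: float = 10.0) -> float:
    """Exponential semivariogram without nugget (parameters are assumed, not fitted)."""
    return sill * (1.0 - math.exp(-h / a))


def solve(A: List[List[float]], b: List[float]) -> List[float]:
    """Gaussian elimination with partial pivoting (adequate for small kriging systems)."""
    n = len(A)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def ordinary_kriging(points: Sequence[Point], values: Sequence[float], target: Point,
                     gamma: Callable[[float], float]) -> Tuple[float, List[float]]:
    """Estimate at `target`; the extra row forces the weights to sum to 1."""
    n = len(points)
    A = [[gamma(math.dist(points[i], points[j])) for j in range(n)] + [1.0] for i in range(n)]
    A.append([1.0] * n + [0.0])  # unbiasedness constraint (Lagrange multiplier row)
    b = [gamma(math.dist(p, target)) for p in points] + [1.0]
    weights = solve(A, b)[:n]
    return sum(w * v for w, v in zip(weights, values)), weights


def loocv_errors(points, values, gamma) -> Tuple[float, float]:
    """Mean error (estimate - observed) and RMSE over all leave-one-out estimates."""
    errors = []
    for i in range(len(points)):
        est, _ = ordinary_kriging(points[:i] + points[i + 1:],
                                  values[:i] + values[i + 1:], points[i], gamma)
        errors.append(est - values[i])
    me = sum(errors) / len(errors)
    rmse = math.sqrt(sum(e * e for e in errors) / len(errors))
    return me, rmse


pts = [(0.0, 0.0), (3.0, 0.0), (0.0, 4.0), (5.0, 5.0), (2.0, 7.0)]
vals = [12.0, 15.0, 9.0, 20.0, 14.0]  # hypothetical concentrations
me, rmse = loocv_errors(pts, vals, exp_variogram)
print(f"ME = {me:.3f}, RMSE = {rmse:.3f}")
```

A positive mean error indicates overestimation on average, the kind of bias the abstract reports for log-normal kriging, while the RMSE summarizes the overall prediction error.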