• Title/Summary/Keyword: 10-fold cross-validation

Search Result 210, Processing Time 0.021 seconds

Prediction and analysis of acute fish toxicity of pesticides to the rainbow trout using 2D-QSAR (2D-QSAR방법을 이용한 농약류의 무지개 송어 급성 어독성 분석 및 예측)

  • Song, In-Sik;Cha, Ji-Young;Lee, Sung-Kwang
    • Analytical Science and Technology
    • /
    • v.24 no.6
    • /
    • pp.544-555
    • /
    • 2011
  • The acute toxicity in the rainbow trout (Oncorhynchus mykiss) was analyzed and predicted using quantitative structure-activity relationships (QSAR). The aquatic toxicity, 96h $LC_{50}$ (median lethal concentration) of 275 organic pesticides, was obtained from EU-funded project DEMETRA. Prediction models were derived from 558 2D molecular descriptors, calculated in PreADMET. The linear (multiple linear regression) and nonlinear (support vector machine and artificial neural network) learning methods were optimized by taking into account the statistical parameters between the experimental and predicted p$LC_{50}$. After preprocessing, population based forward selection were used to select the best subsets of descriptors in the learning methods including 5-fold cross-validation procedure. The support vector machine model was used as the best model ($R^2_{CV}$=0.677, RMSECV=0.887, MSECV=0.674) and also correctly classified 87% for the training set according to EU regulation criteria. The MLR model could describe the structural characteristics of toxic chemicals and interaction with lipid membrane of fish. All the developed models were validated by 5 fold cross-validation and Y-scrambling test.

Implementation on the evolutionary machine learning approaches for streamflow forecasting: case study in the Seybous River, Algeria (유출예측을 위한 진화적 기계학습 접근법의 구현: 알제리 세이보스 하천의 사례연구)

  • Zakhrouf, Mousaab;Bouchelkia, Hamid;Stamboul, Madani;Kim, Sungwon;Singh, Vijay P.
    • Journal of Korea Water Resources Association
    • /
    • v.53 no.6
    • /
    • pp.395-408
    • /
    • 2020
  • This paper aims to develop and apply three different machine learning approaches (i.e., artificial neural networks (ANN), adaptive neuro-fuzzy inference systems (ANFIS), and wavelet-based neural networks (WNN)) combined with an evolutionary optimization algorithm and the k-fold cross validation for multi-step (days) streamflow forecasting at the catchment located in Algeria, North Africa. The ANN and ANFIS models yielded similar performances, based on four different statistical indices (i.e., root mean squared error (RMSE), Nash-Sutcliffe efficiency (NSE), correlation coefficient (R), and peak flow criteria (PFC)) for training and testing phases. The values of RMSE and PFC for the WNN model (e.g., RMSE = 8.590 ㎥/sec, PFC = 0.252 for (t+1) day, testing phase) were lower than those of ANN (e.g., RMSE = 19.120 ㎥/sec, PFC = 0.446 for (t+1) day, testing phase) and ANFIS (e.g., RMSE = 18.520 ㎥/sec, PFC = 0.444 for (t+1) day, testing phase) models, while the values of NSE and R for WNN model were higher than those of ANNs and ANFIS models. Therefore, the new approach can be a robust tool for multi-step (days) streamflow forecasting in the Seybous River, Algeria.

A novel method for predicting protein subcellular localization based on pseudo amino acid composition

  • Ma, Junwei;Gu, Hong
    • BMB Reports
    • /
    • v.43 no.10
    • /
    • pp.670-676
    • /
    • 2010
  • In this paper, a novel approach, ELM-PCA, is introduced for the first time to predict protein subcellular localization. Firstly, Protein Samples are represented by the pseudo amino acid composition (PseAAC). Secondly, the principal component analysis (PCA) is employed to extract essential features. Finally, the Elman Recurrent Neural Network (RNN) is used as a classifier to identify the protein sequences. The results demonstrate that the proposed approach is effective and practical.

Alzheimer progression classification using fMRI data (fMRI 데이터를 이용한 알츠하이머 진행상태 분류)

  • Ju Hyeon-Noh;Hee-Deok Yang
    • Smart Media Journal
    • /
    • v.13 no.4
    • /
    • pp.86-93
    • /
    • 2024
  • The development of functional magnetic resonance imaging (fMRI) has significantly contributed to mapping brain functions and understanding brain networks during rest. This paper proposes a CNN-LSTM-based classification model to classify the progression stages of Alzheimer's disease. Firstly, four preprocessing steps are performed to remove noise from the fMRI data before feature extraction. Secondly, the U-Net architecture is utilized to extract spatial features once preprocessing is completed. Thirdly, the extracted spatial features undergo LSTM processing to extract temporal features, ultimately leading to classification. Experiments were conducted by adjusting the temporal dimension of the data. Using 5-fold cross-validation, an average accuracy of 96.4% was achieved, indicating that the proposed method has high potential for identifying the progression of Alzheimer's disease by analyzing fMRI data.

Object Detection from Mongolian Nomadic Environmental Images

  • Perenleilkhundev, Gantuya;Batdemberel, Mungunshagai;Battulga, Batnyam;Batsuuri, Suvdaa
    • Journal of Multimedia Information System
    • /
    • v.6 no.4
    • /
    • pp.173-178
    • /
    • 2019
  • Mongolian historical and cultural monuments on settlement areas of stone inscriptions, stone images, rock-drawings, remains of cities, architecture are still telling us their stories. These monuments depict the understanding of the word, philosophical and artistic outlook, beliefs, religion, national art, language, culture and traditions of Mongols [1]. Nowadays computer science, especially computer vision is applying in the other science fields. The main problem is how to apply and which algorithm can detect and classify the objects correctly. In this paper, we propose a method to detect object from Mongolian nomadic environment images. This work proposes a method for object detection that is the combination of the binary operations in the edge detection results. We found out the best method and parameters of state-of-the-art machine learning algorithms. In experimental result, we evaluate our results with 10-fold cross validation and split 66% strategies.

A Comparative Study of Local Features in Face-based Video Retrieval

  • Zhou, Juan;Huang, Lan
    • Journal of Computing Science and Engineering
    • /
    • v.11 no.1
    • /
    • pp.24-31
    • /
    • 2017
  • Face-based video retrieval has become an active and important branch of intelligent video analysis. Face profiling and matching is a fundamental step and is crucial to the effectiveness of video retrieval. Although many algorithms have been developed for processing static face images, their effectiveness in face-based video retrieval is still unknown, simply because videos have different resolutions, faces vary in scale, and different lighting conditions and angles are used. In this paper, we combined content-based and semantic-based image analysis techniques, and systematically evaluated four mainstream local features to represent face images in the video retrieval task: Harris operators, SIFT and SURF descriptors, and eigenfaces. Results of ten independent runs of 10-fold cross-validation on datasets consisting of TED (Technology Entertainment Design) talk videos showed the effectiveness of our approach, where the SIFT descriptors achieved an average F-score of 0.725 in video retrieval and thus were the most effective, while the SURF descriptors were computed in 0.3 seconds per image on average and were the most efficient in most cases.

Evaluation of Hemiplegic Gait Using Accelerometer (가속도센서를 이용한 편마비성보행 평가)

  • Lee, Jun Seok;Park, Sooji;Shin, Hangsik
    • The Transactions of The Korean Institute of Electrical Engineers
    • /
    • v.66 no.11
    • /
    • pp.1634-1640
    • /
    • 2017
  • The study aims to distinguish hemiplegic gait and normal gait using simple wearable device and classification algorithm. Thus, we developed a wearable system equipped three axis accelerometer and three axis gyroscope. The developed wearable system was verified by clinical experiment. In experiment, twenty one normal subjects and twenty one patients undergoing stroke treatment were participated. Based on the measured inertial signal, a random forest algorithm was used to classify hemiplegic gait. Four-fold cross validation was applied to ensure the reliability of the results. To select optimal attributes, we applied the forward search algorithm with 10 times of repetition, then selected five most frequently attributes were chosen as a final attribute. The results of this study showed that 95.2% of accuracy in hemiplegic gait and normal gait classification and 77.4% of accuracy in hemiplegic-side and normal gait classification.

An Improvement of AdaBoost using Boundary Classifier

  • Lee, Wonju;Cheon, Minkyu;Hyun, Chang-Ho;Park, Mignon
    • Journal of the Korean Institute of Intelligent Systems
    • /
    • v.23 no.2
    • /
    • pp.166-171
    • /
    • 2013
  • The method proposed in this paper can improve the performance of the Boosting algorithm in machine learning. The proposed Boundary AdaBoost algorithm can make up for the weak points of Normal binary classifier using threshold boundary concepts. The new proposed boundary can be located near the threshold of the binary classifier. The proposed algorithm improves classification in areas where Normal binary classifier is weak. Thus, the optimal boundary final classifier can decrease error rates classified with more reasonable features. Finally, this paper derives the new algorithm's optimal solution, and it demonstrates how classifier accuracy can be improved using the proposed Boundary AdaBoost in a simulation experiment of pedestrian detection using 10-fold cross validation.

Hybridized Decision Tree methods for Detecting Generic Attack on Ciphertext

  • Alsariera, Yazan Ahmad
    • International Journal of Computer Science & Network Security
    • /
    • v.21 no.7
    • /
    • pp.56-62
    • /
    • 2021
  • The surge in generic attacks execution against cipher text on the computer network has led to the continuous advancement of the mechanisms to protect information integrity and confidentiality. The implementation of explicit decision tree machine learning algorithm is reported to accurately classifier generic attacks better than some multi-classification algorithms as the multi-classification method suffers from detection oversight. However, there is a need to improve the accuracy and reduce the false alarm rate. Therefore, this study aims to improve generic attack classification by implementing two hybridized decision tree algorithms namely Naïve Bayes Decision tree (NBTree) and Logistic Model tree (LMT). The proposed hybridized methods were developed using the 10-fold cross-validation technique to avoid overfitting. The generic attack detector produced a 99.8% accuracy, an FPR score of 0.002 and an MCC score of 0.995. The performances of the proposed methods were better than the existing decision tree method. Similarly, the proposed method outperformed multi-classification methods for detecting generic attacks. Hence, it is recommended to implement hybridized decision tree method for detecting generic attacks on a computer network.

Application of Text-Classification Based Machine Learning in Predicting Psychiatric Diagnosis (텍스트 분류 기반 기계학습의 정신과 진단 예측 적용)

  • Pak, Doohyun;Hwang, Mingyu;Lee, Minji;Woo, Sung-Il;Hahn, Sang-Woo;Lee, Yeon Jung;Hwang, Jaeuk
    • Korean Journal of Biological Psychiatry
    • /
    • v.27 no.1
    • /
    • pp.18-26
    • /
    • 2020
  • Objectives The aim was to find effective vectorization and classification models to predict a psychiatric diagnosis from text-based medical records. Methods Electronic medical records (n = 494) of present illness were collected retrospectively in inpatient admission notes with three diagnoses of major depressive disorder, type 1 bipolar disorder, and schizophrenia. Data were split into 400 training data and 94 independent validation data. Data were vectorized by two different models such as term frequency-inverse document frequency (TF-IDF) and Doc2vec. Machine learning models for classification including stochastic gradient descent, logistic regression, support vector classification, and deep learning (DL) were applied to predict three psychiatric diagnoses. Five-fold cross-validation was used to find an effective model. Metrics such as accuracy, precision, recall, and F1-score were measured for comparison between the models. Results Five-fold cross-validation in training data showed DL model with Doc2vec was the most effective model to predict the diagnosis (accuracy = 0.87, F1-score = 0.87). However, these metrics have been reduced in independent test data set with final working DL models (accuracy = 0.79, F1-score = 0.79), while the model of logistic regression and support vector machine with Doc2vec showed slightly better performance (accuracy = 0.80, F1-score = 0.80) than the DL models with Doc2vec and others with TF-IDF. Conclusions The current results suggest that the vectorization may have more impact on the performance of classification than the machine learning model. However, data set had a number of limitations including small sample size, imbalance among the category, and its generalizability. With this regard, the need for research with multi-sites and large samples is suggested to improve the machine learning models.