• Title/Summary/Keyword: Validation data set


Studies on 5 Protein Fractions Prediction of Forage Legume Mixture by NIRS

  • Lee, Hyo-Won;Jang, Sungkwon;Lee, Hyo-Jin;Park, Hyung-Soo
    • Journal of The Korean Society of Grassland and Forage Science / v.34 no.3 / pp.214-218 / 2014
  • This study was conducted to assess the feasibility of near-infrared reflectance spectroscopy (NIRS) as a rapid and reliable method for estimating crude protein (CP) fractions in forage legume mixtures (sudangrass and pea mixture, and kidney bean and potato mixture). A total of 178 samples were collected and their spectral reflectance was recorded over 400~2,500 nm. Of these, 50 samples were selected for calibration and validation. Thirty-five of them formed the calibration set, on which modified partial least squares regression (MPLSR) was performed. The coefficients of determination ($r^2$) and standard errors of cross-validation (SECV, in parentheses) of the calibration models for CP fractions A, B1, B2, B3, and C were 0.94 (1.05), 0.92 (0.74), 0.96 (0.95), 0.91 (0.42), and 0.83 (0.38), respectively. The remaining 15 samples were used to validate the equations. For these, $r^2$ and the standard error of prediction (SEP, in parentheses) were 0.87 (1.45), 0.91 (0.49), 0.94 (1.13), 0.36 (0.96), and 0.74 (0.67), respectively. This study showed that NIRS could be an effective tool for the rapid and precise estimation of CP fractions in forage legume mixtures.
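The validation statistics quoted above ($r^2$, SEP, and SECV) can be computed from paired laboratory and NIRS-predicted values. Below is a minimal sketch that assumes the common NIRS conventions. SEP is taken as the bias-corrected standard deviation of the residuals with an n - 1 divisor, and $r^2$ as the squared Pearson correlation. The abstract does not state the exact definitions used, and some software divides by n instead. The function name `calibration_stats` is hypothetical. SECV is the same SEP-style statistic applied to cross-validation predictions from the calibration set.

```python
import math

def calibration_stats(reference, predicted):
    """Return (r2, bias, sep) for NIRS predictions vs. laboratory reference values.

    Hypothetical helper; assumes SEP is bias-corrected with an n - 1 divisor.
    """
    n = len(reference)
    if n < 3 or n != len(predicted):
        raise ValueError("need at least 3 paired values")
    residuals = [p - r for r, p in zip(reference, predicted)]
    bias = sum(residuals) / n
    # SEP: standard deviation of the residuals after removing the mean bias
    sep = math.sqrt(sum((e - bias) ** 2 for e in residuals) / (n - 1))
    # r^2: squared Pearson correlation between reference and predicted values
    mean_r = sum(reference) / n
    mean_p = sum(predicted) / n
    sxy = sum((r - mean_r) * (p - mean_p) for r, p in zip(reference, predicted))
    sxx = sum((r - mean_r) ** 2 for r in reference)
    syy = sum((p - mean_p) ** 2 for p in predicted)
    r2 = sxy ** 2 / (sxx * syy)
    return r2, bias, sep
```

A constant offset between predictions and reference values shows up entirely in `bias` and leaves SEP at zero. This is why NIRS papers usually report the two separately.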

A Study on Bandwidth Selection Based on ASE for Nonparametric Regression Estimator

  • Kim, Tae-Yoon
    • Journal of the Korean Statistical Society / v.30 no.1 / pp.21-30 / 2001
  • Suppose we observe data $(X_1, Y_1), \ldots, (X_n, Y_n)$ and use the Nadaraya-Watson regression estimator to estimate $m(x) = E(Y \mid X = x)$. In this article, the bandwidth selection problem for the Nadaraya-Watson regression estimator is investigated. In particular, a cross-validation method based on the average squared error (ASE) is considered. The theoretical results include a central limit theorem that quantifies the convergence rate of the bandwidth selector.

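To make the setting concrete, here is a minimal sketch of a Nadaraya-Watson estimator and a grid search for the bandwidth that minimises the leave-one-out cross-validation score $CV(h) = \frac{1}{n}\sum_i (Y_i - \hat m_{h,-i}(X_i))^2$. This minimiser is the usual data-driven stand-in for the ASE-optimal bandwidth, where $ASE(h) = \frac{1}{n}\sum_i (\hat m_h(X_i) - m(X_i))^2$. The Gaussian kernel, the bandwidth grid, and the function names are assumptions for illustration, not the paper's exact setup.

```python
import math

def nw_estimate(x0, xs, ys, h, skip=None):
    """Nadaraya-Watson estimate of m(x0) with a Gaussian kernel (assumed) and bandwidth h.

    If skip is an index, that observation is left out (leave-one-out).
    """
    num = den = 0.0
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        if i == skip:
            continue
        w = math.exp(-0.5 * ((x0 - xi) / h) ** 2)
        num += w * yi
        den += w
    return num / den if den > 0 else float("nan")

def cv_score(xs, ys, h):
    """Leave-one-out CV(h) = (1/n) * sum_i (Y_i - m_{-i}(X_i))^2."""
    total = 0.0
    for i in range(len(xs)):
        m_i = nw_estimate(xs[i], xs, ys, h, skip=i)
        if math.isnan(m_i):
            return math.inf  # bandwidth so small that some point has no neighbours
        total += (ys[i] - m_i) ** 2
    return total / len(xs)

def select_bandwidth(xs, ys, grid):
    """Return the bandwidth on the (hypothetical) grid that minimises CV(h)."""
    return min(grid, key=lambda h: cv_score(xs, ys, h))
```

A bandwidth that is too small lets kernel weights underflow to zero, so the sketch scores it as infinitely bad rather than failing.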

Detecting Influential Observations on the Smoothing Parameter in Nonparametric Regression

  • Kim, Choong-Rak;Jeon, Jong-Woo
    • Journal of the Korean Statistical Society / v.24 no.2 / pp.495-506 / 1995
  • We present formulas for detecting influential observations on the smoothing parameter in smoothing splines. Further, we express them as functions of basic building blocks such as residuals and leverages, and compare them with the local influence approach of Thomas (1991). An example based on a real data set is given.

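The closed-form diagnostics described above avoid refitting. For intuition, the sketch below takes the brute-force case-deletion route: it records how the chosen smoothing parameter moves when each observation is dropped. Two substitutions are made. A discrete penalised smoother with a second-difference penalty (Whittaker smoother) stands in for the cubic smoothing spline. Generalised cross-validation (GCV), whose score is built from residuals and leverages, is the selection criterion. The equally spaced design, the λ grid, and all function names are hypothetical.

```python
def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [bi] for row, bi in zip(a, b)]  # augmented copy
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[p] = m[p], m[c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            for k in range(c, n + 1):
                m[r][k] -= f * m[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][k] * x[k] for k in range(r + 1, n))) / m[r][r]
    return x

def penalty(n):
    """D'D for the second-difference operator D on n equally spaced points."""
    p = [[0.0] * n for _ in range(n)]
    for i in range(n - 2):
        d = {i: 1.0, i + 1: -2.0, i + 2: 1.0}
        for a, va in d.items():
            for b, vb in d.items():
                p[a][b] += va * vb
    return p

def gcv(y, w, lam):
    """Weighted GCV score of the smoother z = (W + lam * D'D)^{-1} W y."""
    n = len(y)
    p = penalty(n)
    a = [[(w[i] if i == j else 0.0) + lam * p[i][j] for j in range(n)] for i in range(n)]
    z = solve(a, [wi * yi for wi, yi in zip(w, y)])
    # Leverages: diagonal of the hat matrix S = A^{-1} W, i.e. w_j * (A^{-1})_jj
    tr = sum(solve(a, [w[j] if k == j else 0.0 for k in range(n)])[j] for j in range(n))
    m = sum(w)  # number of observations actually in the fit
    rss = sum(wi * (yi - zi) ** 2 for wi, yi, zi in zip(w, y, z))
    return m * rss / (m - tr) ** 2

def select_lambda(y, w, grid):
    return min(grid, key=lambda lam: gcv(y, w, lam))

def lambda_influence(y, grid):
    """GCV-chosen lambda for the full data, and with each observation deleted (weight 0)."""
    n = len(y)
    full = select_lambda(y, [1.0] * n, grid)
    per_case = [select_lambda(y, [0.0 if k == i else 1.0 for k in range(n)], grid)
                for i in range(n)]
    return full, per_case
```

Observations whose deletion shifts the selected λ noticeably are the influential ones in this brute-force sense. The analytic formulas in the paper target the same question without the repeated fits.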

Use of a Machine Learning Algorithm to Predict Individuals with Suicide Ideation in the General Population

  • Ryu, Seunghyong;Lee, Hyeongrae;Lee, Dong-Kyun;Park, Kyeongwoo
    • Psychiatry investigation / v.15 no.11 / pp.1030-1036 / 2018
  • Objective: In this study, we aimed to develop a model that uses a machine learning algorithm to predict individuals with suicide ideation in the general population. Methods: From 35,116 individuals aged over 19 years in the Korea National Health & Nutrition Examination Survey, we selected 11,628 individuals via random down-sampling. This sample comprised 5,814 suicide ideators and the same number of non-suicide ideators. We randomly assigned the subjects to a training set (n=10,466) and a test set (n=1,162). In the training set, a random forest model was trained with 15 features selected by recursive feature elimination with 10-fold cross-validation. The fitted model was then used to predict suicide ideators in the test set and among all 35,116 subjects. All analyses were conducted in R. Results: The prediction model performed well in the test set [area under the receiver operating characteristic curve (AUC) = 0.85]. Among the total sample, it predicted suicide ideators with an accuracy of 0.821, sensitivity of 0.836, and specificity of 0.807. Conclusion: This study suggests that a machine learning approach may enable screening for suicide risk in the general population. Further work is warranted to increase prediction accuracy.
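The study's analyses were done in R with a random forest, and that model is not reproduced here. The sketch below covers only the surrounding steps the abstract describes: balancing the two classes by random down-sampling, a random train/test split, and the reported metrics (AUC, accuracy, sensitivity, specificity). All function names are hypothetical, labels are assumed to be coded 1 (ideator) and 0 (non-ideator), and AUC is computed via its rank (Mann-Whitney) interpretation.

```python
import random

def down_sample(records, label_of, seed=0):
    """Randomly drop majority-class records so both classes have the same size."""
    rng = random.Random(seed)
    pos = [r for r in records if label_of(r) == 1]
    neg = [r for r in records if label_of(r) == 0]
    k = min(len(pos), len(neg))
    balanced = rng.sample(pos, k) + rng.sample(neg, k)
    rng.shuffle(balanced)
    return balanced

def train_test_split(records, n_test, seed=0):
    """Shuffle a copy of the records and hold out n_test of them as the test set."""
    rng = random.Random(seed)
    shuffled = list(records)
    rng.shuffle(shuffled)
    return shuffled[n_test:], shuffled[:n_test]

def binary_metrics(y_true, y_pred):
    """Accuracy, sensitivity (recall on class 1) and specificity (recall on class 0)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }

def auc(y_true, scores):
    """ROC AUC as P(random positive scores above random negative), ties counted as 1/2."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))
```

With the study's numbers, `train_test_split(balanced, 1162)` on the 11,628 balanced records would leave 10,466 for training.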

AC4E: An Access Control Model for Emergencies of Mission-Critical Cyber-Physical Systems

  • Chen, Dong;Chang, Guiran;Jia, Jie
    • KSII Transactions on Internet and Information Systems (TIIS) / v.6 no.9 / pp.2052-2072 / 2012
  • Access control is an essential security component for protecting sensitive data and services in mission-critical Cyber-Physical Systems (CPSs) from unauthorized access. CPSs differ from conventional information processing systems in that they involve interactions between the cyber world and the physical world. Therefore, existing access control models cannot be applied directly and may even fail in an emergency. This paper proposes an adaptive Access Control model for Emergencies (AC4E) for mission-critical CPSs. The principal aim of AC4E is to control the criticalities in these systems by executing corresponding responsive actions. AC4E not only controls access to data and services in normal situations, but also grants the correct set of access privileges, at the correct time, to the correct set of subjects in emergency situations. It can proactively take adaptive responsive actions that alter the privileges of specific subjects, without requiring any explicit access requests. A semiformal validation of the AC4E model is presented with respect to responsiveness, correctness, safety, non-repudiation, and concurrency. A case study then demonstrates how the AC4E model adaptively and proactively detects, responds to, and controls emergency events in a typical CPS. Finally, an extensive set of simulations and performance comparisons of the proposed AC4E model is presented.
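The core idea of proactive emergency access control can be illustrated with a toy controller. An emergency event triggers privilege grants without any access request, clearing the event revokes them, and every change is logged for audit. This is a hypothetical sketch, not AC4E's formal model. The class names, the rule format, and the event and privilege strings are all invented for illustration.

```python
from dataclasses import dataclass, field

@dataclass
class EmergencyRule:
    event: str       # emergency event type that triggers the rule (hypothetical)
    subject: str     # subject that receives the extra privilege
    privilege: str   # privilege granted during the emergency

@dataclass
class AccessController:
    normal: dict   # subject -> set of privileges held in normal operation
    rules: list    # EmergencyRule instances
    active: dict = field(default_factory=dict)  # event -> list of (subject, privilege) grants
    log: list = field(default_factory=list)     # audit trail of grants and revocations

    def on_event(self, event):
        """Proactively grant the privileges tied to an emergency event."""
        granted = [(r.subject, r.privilege) for r in self.rules if r.event == event]
        self.active[event] = granted
        self.log.append(("grant", event, granted))

    def on_clear(self, event):
        """Revoke the emergency privileges once the event is resolved."""
        revoked = self.active.pop(event, [])
        self.log.append(("revoke", event, revoked))

    def check(self, subject, privilege):
        """Allow if the privilege is held normally or granted by an active emergency."""
        if privilege in self.normal.get(subject, set()):
            return True
        return any((subject, privilege) in grants for grants in self.active.values())
```

For example, with a rule `EmergencyRule("fire", "operator", "open:sprinkler")`, the operator can open the sprinkler only between `on_event("fire")` and `on_clear("fire")`, and both transitions appear in `log`.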

Prediction of concrete compressive strength using non-destructive test results

  • Erdal, Hamit;Erdal, Mursel;Simsek, Osman;Erdal, Halil Ibrahim
    • Computers and Concrete / v.21 no.4 / pp.407-417 / 2018
  • Concrete, a composite material, is one of the most important construction materials. Compressive strength is a commonly used parameter for assessing concrete quality, so accurate prediction of compressive strength is an important issue. In this study, an experimental procedure was used to assess concrete quality. First, the concrete mix was prepared as C20 concrete, with a fresh-concrete slump of about 20 cm. After the fresh concrete was placed in the formwork, it was compacted with a vibrating screed. After 28 days, a total of 100 core samples of 75 mm diameter were extracted. Pulse velocity tests and compressive strength tests were performed on the cores. Windsor probe penetration tests and Schmidt hammer tests were also performed. Using the resulting data set, twelve artificial intelligence (AI) models were compared for predicting concrete compressive strength. These models fall into three categories: (i) functions (Linear Regression, Simple Linear Regression, Multilayer Perceptron, Support Vector Regression); (ii) lazy-learning algorithms (IBk Linear NN Search, KStar, Locally Weighted Learning); and (iii) tree-based learning algorithms (Decision Stump, Model Trees Regression, Random Forest, Random Tree, Reduced Error Pruning Tree). Four evaluation procedures were used to examine the performance of the predictive models, based on four validation schemes: 10-fold cross-validation, 5-fold cross-validation, 10% split-sample validation, and 20% split-sample validation. This study shows that machine learning regression techniques are promising tools for predicting the compressive strength of concrete.
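The difference between k-fold cross-validation and split-sample validation can be sketched with the simplest of the twelve models, simple linear regression on one predictor. In this hypothetical example the predictor might be a rebound number or pulse velocity, and the target is compressive strength. The sketch assumes the samples are already in random order (no shuffling), uses RMSE as the error measure, and its function names are illustrative rather than taken from the paper's tooling.

```python
def fit_simple_linear(xs, ys):
    """Least-squares fit y = slope * x + intercept."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return slope, my - slope * mx

def k_fold_indices(n, k):
    """Split indices 0..n-1 into k contiguous folds whose sizes differ by at most one."""
    folds, start = [], 0
    for i in range(k):
        size = n // k + (1 if i < n % k else 0)
        folds.append(list(range(start, start + size)))
        start += size
    return folds

def rmse_on_holdout(xs, ys, test_idx):
    """Fit on every index not in test_idx, then return the RMSE on test_idx."""
    held = set(test_idx)
    train = [i for i in range(len(xs)) if i not in held]
    slope, intercept = fit_simple_linear([xs[i] for i in train], [ys[i] for i in train])
    sq = sum((ys[i] - (slope * xs[i] + intercept)) ** 2 for i in test_idx)
    return (sq / len(test_idx)) ** 0.5

def k_fold_rmse(xs, ys, k):
    """Pooled RMSE over k folds; every sample is predicted exactly once."""
    sq = sum(rmse_on_holdout(xs, ys, f) ** 2 * len(f) for f in k_fold_indices(len(xs), k))
    return (sq / len(xs)) ** 0.5

def split_sample_rmse(xs, ys, test_fraction):
    """Hold out the last test_fraction (e.g. 0.1 or 0.2) of the samples once."""
    n_test = max(1, round(len(xs) * test_fraction))
    return rmse_on_holdout(xs, ys, list(range(len(xs) - n_test, len(xs))))
```

k-fold validation uses every core sample as a test case once, while a 10% or 20% split scores the model on a single held-out subset. With only 100 cores, the split-sample estimate is therefore more sensitive to which samples land in the test set.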

PREPROCESSING EFFECTS ON ON-LINE SSC MEASUREMENT OF FUJI APPLE BY NIR SPECTROSCOPY

  • Ryu, D.S.;Noh, S.H.;Hwang, I.G.
    • Proceedings of the Korean Society for Agricultural Machinery Conference / 2000.11c / pp.560-568 / 2000
  • The aims of this research were to investigate the effect of spectral preprocessing on prediction performance and to develop a robust model for predicting the soluble solids content (SSC) of intact apples. Spectra of 320 Fuji apples were measured with an on-line transmittance measurement system over the wavelength range of 550∼1100 nm. The preprocessing methods tested were Savitzky-Golay smoothing, multiplicative scatter correction (MSC), standard normal variate (SNV), first derivative, and orthogonal signal correction (OSC). Several combinations of these methods were applied to the raw spectra to investigate the relative effect of each method on the performance of the calibration model. PLS regression was used to relate the preprocessed spectra to the SSC of the samples, and cross-validation was used to select the optimal number of PLS factors. Smoothing and scatter correction were essential for improving the prediction performance of the PLS regression model, and OSC reduced the number of PLS factors required. The first derivative degraded prediction performance. MSC and SNV had similar effects. A robust calibration model was developed with the preprocessing combination of Savitzky-Golay smoothing, MSC, and OSC, which gave SEP = 0.507, bias = 0.032, and $R^2$ = 0.8823.

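Three of the preprocessing steps compared above have short textbook definitions, sketched below. SNV centres and scales each spectrum by its own mean and standard deviation. MSC regresses each spectrum on a reference (here the mean spectrum) and removes the fitted offset and slope. The Savitzky-Golay smoother here is the 5-point quadratic window with weights (-3, 12, 17, 12, -3)/35. The window size, the sample standard deviation in SNV, and leaving the two end points of each spectrum unsmoothed are assumptions. The paper's settings are not given in the abstract, and OSC and derivatives are omitted.

```python
def snv(spectrum):
    """Standard normal variate: centre and scale one spectrum (sample SD; some tools use n)."""
    n = len(spectrum)
    mean = sum(spectrum) / n
    sd = (sum((v - mean) ** 2 for v in spectrum) / (n - 1)) ** 0.5
    return [(v - mean) / sd for v in spectrum]

def msc(spectra, reference=None):
    """Multiplicative scatter correction: fit x = a + b * ref, return (x - a) / b."""
    if reference is None:
        reference = [sum(col) / len(spectra) for col in zip(*spectra)]  # mean spectrum
    mr = sum(reference) / len(reference)
    srr = sum((r - mr) ** 2 for r in reference)
    corrected = []
    for x in spectra:
        mx = sum(x) / len(x)
        b = sum((r - mr) * (v - mx) for r, v in zip(reference, x)) / srr
        a = mx - b * mr
        corrected.append([(v - a) / b for v in x])
    return corrected

SG5 = (-3, 12, 17, 12, -3)  # 5-point quadratic Savitzky-Golay smoothing weights (sum 35)

def savgol5(spectrum):
    """5-point quadratic Savitzky-Golay smoothing; the two points at each end are unchanged."""
    out = list(spectrum)
    for i in range(2, len(spectrum) - 2):
        out[i] = sum(w * spectrum[i + j - 2] for j, w in enumerate(SG5)) / 35
    return out
```

Because the Savitzky-Golay weights come from a local quadratic fit, the smoother leaves any quadratic trend unchanged. It removes only higher-frequency noise.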

Prediction of subcellular localization of proteins using pairwise sequence alignment and support vector machine

  • Kim, Jong-Kyoung;Raghava, G. P. S.;Kim, Kwang-S.;Bang, Sung-Yang;Choi, Seung-Jin
    • Proceedings of the Korean Society for Bioinformatics Conference / 2004.11a / pp.158-166 / 2004
  • Predicting the destination of a protein in a cell gives valuable information for annotating the protein's function. Recent technological breakthroughs have led to more accurate methods for predicting the subcellular localization of proteins. The most important factor determining the accuracy of these methods is how useful features are extracted from protein sequences. We propose a new method that extracts appropriate features from the sequence data alone by computing pairwise sequence alignment scores, with a support vector machine (SVM) as the classifier. The overall prediction accuracy, evaluated by jackknife validation, reaches 94.70% for the eukaryotic non-plant data set and 92.10% for the eukaryotic plant data set. This is the highest prediction accuracy among the methods reported so far on these data sets. Our numerical experiments confirm that the feature extraction method based on pairwise sequence alignment is useful for this classification problem.

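The feature extraction step can be sketched as follows. Each protein is represented by the vector of its global alignment scores against a set of reference (training) sequences, and that vector is what a classifier such as an SVM would consume. The scoring scheme (match +1, mismatch -1, linear gap -2) is an assumption chosen for brevity. Protein alignment normally uses a substitution matrix such as BLOSUM62 with affine gap penalties, and the abstract does not say which scheme the authors used. The classifier and jackknife loop are omitted, and the function names are hypothetical.

```python
def nw_score(a, b, match=1, mismatch=-1, gap=-2):
    """Global (Needleman-Wunsch) alignment score with a simple linear gap penalty."""
    prev = [j * gap for j in range(len(b) + 1)]  # row for the empty prefix of a
    for i in range(1, len(a) + 1):
        cur = [i * gap] + [0] * len(b)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            cur[j] = max(diag, prev[j] + gap, cur[j - 1] + gap)
        prev = cur
    return prev[-1]

def alignment_features(seq, references):
    """Feature vector: alignment score of seq against each reference sequence."""
    return [nw_score(seq, r) for r in references]
```

Under jackknife (leave-one-out) validation, the held-out protein must be excluded from the reference set used to build its own feature vector. Otherwise its self-alignment score leaks the answer.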

Detecting Jaywalking Using the YOLOv5 Model

  • Kim, Hyun-Tae;Lee, Sang-Hyun
    • International Journal of Advanced Culture Technology / v.10 no.2 / pp.300-306 / 2022
  • Korea is currently building traffic infrastructure using Intelligent Transport Systems (ITS), but its pedestrian traffic accident rate remains very high. The purpose of this paper is to reduce the risk of traffic accidents caused by jaywalking pedestrians. This study aims to detect jaywalking pedestrians using the public data set provided by the Artificial Intelligence Hub (AIHub). The data set contains 673,150 training items and 131,385 validation items and covers conditions such as snow, rain, and fog. It has seven classes: passenger cars, small buses, large buses, trucks, large trailers, motorcycles, and pedestrians. YOLOv5 is used as the detection model. For object and edge detection in the input image, a Canny edge detector is applied so that human objects within the detected road boundary can be classified and visualized. The system was designed and implemented to detect pedestrians using the deep learning-based YOLOv5 model. In the final result, the YOLOv5 model achieved an mAP@0.5 of 61% with real-time detection at 114.9 fps after 338 epochs.
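Two small geometric pieces underlie the approach above. The mAP@0.5 metric counts a detection as correct when its intersection over union (IoU) with a ground-truth box is at least 0.5. Flagging jaywalkers also requires deciding whether a detected pedestrian stands inside the road region. The sketch below shows both. It assumes boxes are given as (x1, y1, x2, y2) in image coordinates and that the road boundary is already available as a polygon. In the paper that boundary comes from Canny edge detection, which is not reproduced here. Using the bottom-centre of the box as the pedestrian's position, and all function names, are hypothetical choices.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0

def point_in_polygon(x, y, polygon):
    """Ray-casting test: True if (x, y) lies inside the polygon [(x0, y0), ...]."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # edge straddles the horizontal ray through y
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

def jaywalkers(detections, road_polygon):
    """Return pedestrian boxes whose bottom-centre (feet) lies inside the road polygon."""
    result = []
    for label, box in detections:
        if label != "pedestrian":
            continue
        foot_x = (box[0] + box[2]) / 2
        foot_y = box[3]  # image y grows downward, so y2 is the bottom edge
        if point_in_polygon(foot_x, foot_y, road_polygon):
            result.append(box)
    return result
```

Using the feet rather than the box centre avoids flagging pedestrians on the pavement whose upper body merely overlaps the road region in the image.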

Extraction Method of Significant Clinical Tests Based on Data Discretization and Rough Set Approximation Techniques: Application to Differential Diagnosis of Cholecystitis and Cholelithiasis Diseases

  • Son, Chang-Sik;Kim, Min-Soo;Seo, Suk-Tae;Cho, Yun-Kyeong;Kim, Yoon-Nyun
    • Journal of Biomedical Engineering Research / v.32 no.2 / pp.134-143 / 2011
  • Selecting meaningful clinical tests and their reference values from high-dimensional clinical data is an important but difficult problem in differential diagnosis between similar diseases. The difficulty is greater when the class distribution is imbalanced, that is, when one class is represented by a large number of examples and the other by only a few. To address this, the study introduces methods based on the discernibility matrix and discernibility function of rough set theory (RST), combined with two discretization approaches: equal-width and equal-frequency discretization. The discretization approaches define the reference values for the clinical tests. The discernibility matrix and function then extract a subset of significant clinical tests from the resulting nominal attribute values. To show its applicability to differential diagnosis, we applied the method to extract significant clinical tests and their reference values between a normal group (N = 351) and an abnormal group (N = 101) with either cholecystitis or cholelithiasis. We also investigated the selected clinical tests and the variation of their reference values. In addition, we measured average predictive performance during 10-fold cross-validation on four evaluation criteria: accuracy, sensitivity, specificity, and geometric mean. The experimental results confirmed that, for both discretization approaches, the rough set approximation methods give better results with relative frequency than with absolute frequency on the evaluation criterion (average geometric mean). Thus the prediction model using relative frequency can be used effectively for classification and prediction problems on clinical data with imbalanced class distributions.
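The two discretization schemes and the imbalance-aware criterion mentioned above are easy to state in code. Equal-width discretization splits a test's observed range into k intervals of equal length. Equal-frequency discretization places cut points so that each interval holds roughly the same number of patients. The geometric mean of sensitivity and specificity penalises a classifier that ignores the minority class. This is a minimal sketch with hypothetical function names. It does not implement the rough-set discernibility step, and the equal-frequency cut rule is one simple convention that can give uneven bins when values repeat.

```python
import math

def equal_width_cuts(values, k):
    """k - 1 cut points splitting [min, max] into k intervals of equal length."""
    lo, hi = min(values), max(values)
    step = (hi - lo) / k
    return [lo + step * i for i in range(1, k)]

def equal_frequency_cuts(values, k):
    """k - 1 cut points taken at evenly spaced ranks of the sorted values."""
    s = sorted(values)
    n = len(s)
    return [s[(i * n) // k] for i in range(1, k)]

def discretize(value, cuts):
    """Interval index (0 .. len(cuts)) of a value; each cut starts a new interval."""
    return sum(1 for c in cuts if value >= c)

def geometric_mean(sensitivity, specificity):
    """G-mean: stays low unless both classes are recognised well."""
    return math.sqrt(sensitivity * specificity)
```

For instance, a classifier that labels every patient "normal" would reach about 78% accuracy on the 351/101 split described above, but its sensitivity, and therefore its geometric mean, would be zero.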