• Title/Summary/Keyword: Validation Set

Use of Near-Infrared Spectroscopy for Estimating Lignan Glucosides Contents in Intact Sesame Seeds

  • Kim, Kwan-Su; Park, Si-Hyung; Shim, Kang-Bo; Ryu, Su-Noh
    • Journal of Crop Science and Biotechnology / v.10 no.3 / pp.185-192 / 2007
  • Near-infrared spectroscopy (NIRS) was used to develop a rapid and efficient method for determining lignan glucosides in intact seeds of sesame (Sesamum indicum L.) germplasm accessions in Korea. A total of 93 samples (about 2 g of intact seeds) were scanned in the reflectance mode of a scanning monochromator, and the reference values for lignan glucoside contents were measured by high-performance liquid chromatography. Calibration equations for sesaminol triglucoside, sesaminol ($1 \rightarrow 2$) diglucoside, sesamolinol diglucoside, sesaminol ($1 \rightarrow 6$) diglucoside, and the total amount of lignan glucosides were developed using modified partial least squares regression with internal cross-validation (n=63), and exhibited low SECV (standard error of cross-validation), high $R^2$ (coefficient of determination in calibration), and high 1-VR (one minus the ratio of unexplained variance to total variance) values. Prediction of an external validation set (n=30) showed a significant correlation between reference values and NIRS-estimated values, based on the SEP (standard error of prediction), $r^2$ (coefficient of determination in prediction), and the ratio of the standard deviation (SD) of the reference data to SEP, the factors used to evaluate the accuracy of the equations. The models for the individual glucosides generally had relatively high values of SD/SEP(C) and $r^2$ (more than 2.0 and 0.80, respectively), characterizing those equations as having good quantitative information, whereas the model for sesaminol ($1 \rightarrow 2$) diglucoside, a minor component, had the lowest SD/SEP(C) and $r^2$ values (1.7 and 0.74, respectively), indicating a poor correlation between reference values and NIRS-estimated values. The results indicate that NIRS could be used to rapidly determine lignan glucoside content in sesame seeds in breeding programs for high-quality sesame varieties.
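
The external-validation statistics named in this abstract (SEP, $r^2$, and SD/SEP) can be illustrated with a minimal Python sketch. The abstract does not give the exact formulas, so the bias-corrected form of SEP(C) and the use of the squared Pearson correlation for $r^2$ below are assumptions, and the function name `prediction_stats` is hypothetical.

```python
import math


def prediction_stats(reference, predicted):
    """Return bias, SEP(C), r^2 and SD/SEP(C) for an external validation set.

    Assumed definitions (not taken from the abstract):
      bias   = mean(predicted - reference)
      SEP(C) = sqrt(sum((residual - bias)^2) / (n - 1))
      r^2    = squared Pearson correlation of reference vs. predicted
    """
    n = len(reference)
    if n != len(predicted) or n < 3:
        raise ValueError("need two equal-length sequences with at least 3 values")

    residuals = [p - r for r, p in zip(reference, predicted)]
    bias = sum(residuals) / n
    # Bias-corrected standard error of prediction.
    sep_c = math.sqrt(sum((e - bias) ** 2 for e in residuals) / (n - 1))

    mean_ref = sum(reference) / n
    mean_pred = sum(predicted) / n
    ss_ref = sum((r - mean_ref) ** 2 for r in reference)
    ss_pred = sum((p - mean_pred) ** 2 for p in predicted)
    cross = sum((r - mean_ref) * (p - mean_pred)
                for r, p in zip(reference, predicted))

    sd_ref = math.sqrt(ss_ref / (n - 1))  # sample SD of the reference values
    r2 = cross ** 2 / (ss_ref * ss_pred)
    return {"bias": bias, "sep_c": sep_c, "r2": r2, "sd_over_sep": sd_ref / sep_c}


# Made-up reference (e.g. HPLC) and NIRS-predicted values, for illustration only.
stats = prediction_stats([1.0, 2.0, 3.0, 4.0, 5.0], [1.1, 1.9, 3.2, 3.9, 5.1])
print(stats)
```

Under these definitions, a larger SD/SEP(C) means the prediction error is small relative to the spread of the reference data, which is how the abstract uses the thresholds of 2.0 and 0.80.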

Use of Near-Infrared Spectroscopy for Estimating Fatty Acid Composition in Intact Seeds of Rapeseed

  • Kim, Kwan-Su; Park, Si-Hyung; Choung, Myoung-Gun; Jang, Young-Seok
    • Journal of Crop Science and Biotechnology / v.10 no.1 / pp.13-18 / 2007
  • Near-infrared spectroscopy (NIRS) was used as a rapid and nondestructive method to determine the fatty acid composition of intact seed samples of rapeseed (Brassica napus L.). A total of 349 samples (about 2 g of intact seeds) were scanned in the reflectance mode of a scanning monochromator, and the reference values for fatty acid composition were measured by gas-liquid chromatography. Calibration equations for individual fatty acids were developed using modified partial least squares regression with internal cross-validation (n=249). The equations had low SECV (standard error of cross-validation) and high $R^2$ (coefficient of determination in calibration) values (>0.8), except for palmitic and eicosenoic acid. Prediction of an external validation set (n=100) showed a significant correlation between reference values and NIRS-estimated values, based on the SEP (standard error of prediction), $r^2$ (coefficient of determination in prediction), and the ratio of the standard deviation (SD) of the reference data to SEP. The models developed in this study had relatively high values of SD/SEP(C) and $r^2$ (> 3.0 and 0.9, respectively) for oleic, linoleic, and erucic acid, characterizing those equations as having good quantitative information. The results indicate that NIRS could be used to rapidly determine the fatty acid composition of rapeseed in breeding programs for high-quality rapeseed oil.

Simultaneous Analysis of three Marker Components in Hwangryunhaedok-tang by HPLC-DAD (황련해독탕 중 3종 생리활성 물질의 HPLC-DAD 동시 정량분석법 확립)

  • Yang, Hye-Jin; Weon, Jin-Bae; Ma, Jin-Yeul; Ma, Choong-Je
    • YAKHAK HOEJI / v.55 no.1 / pp.64-68 / 2011
  • In this study, a high-performance liquid chromatography-diode array detector (HPLC-DAD) method was established for the simultaneous determination of three compounds, berberine, palmatine, and geniposide, in Hwangryunhaedok-tang. To develop and validate the method, a $C_{18}$ column (5 $\mu$m, 4.6 mm $\times$ 250 mm) was used with a gradient mobile phase of water containing 0.1% trifluoroacetic acid (TFA) and MeOH at a column temperature of $30^{\circ}C$. The UV detection wavelengths were set at 230 and 280 nm. The chromatographic method was validated by linearity, precision, and accuracy tests. The calibration curves of the standard components showed good linearity ($R^2$ > 0.9999). The limits of detection (LOD) and limits of quantification (LOQ) ranged from 0.05 to 0.17 $\mu$g/ml and from 0.15 to 0.53 $\mu$g/ml, respectively. The relative standard deviations (RSDs) of the intra-day and inter-day tests were less than 2.99% and 1.90%, respectively. The results of the accuracy test were in the range of 98.36 to 102.52%, with RSD values of 0.32 to 1.98%. The validation results indicate that this method is an accurate and sensitive assay.

Co-Validation Environment for Memory Card Compatibility Test (메모리 카드 호환성 테스트를 위한 통합 검증 환경)

  • Sung, Min-Young
    • Journal of the Korea Society of Computer and Information / v.13 no.3 / pp.57-63 / 2008
  • As diverse memory cards based on NAND flash memory gain popularity in consumer electronics such as digital cameras, camcorders, and MP3 players, compatibility problems between a newly developed memory card and existing host systems have become a main obstacle to the time-to-market delivery of products. The common practice for memory card compatibility testing is to use a real host system as a test bed. As an improved solution, an FPGA-based prototyping board can be used to emulate host systems. However, both approaches require a long set-up time and are limited in their ability to represent various host and device systems. In this paper, we propose a co-validation environment for compatibility testing between a memory card and a host system, using formal modeling based on the Esterel language and a co-simulation methodology. Finally, we demonstrate the usefulness of the proposed environment with a case study of real memory card development.

An HPLC-UV-based quantitative analytical method for Chrysanthemum morifolium: development, validation, and application

  • Jung, Dasom; Jin, Yan; Kang, Seulgi; Lee, Heesoo; Park, Keunbae; Li, Ke; Kim, Jin Hak; Geum, Jeong Ho; Lee, Jeongmi
    • Analytical Science and Technology / v.32 no.4 / pp.139-146 / 2019
  • A simple and reliable analytical method based on high-performance liquid chromatography with ultraviolet detection (HPLC-UV) was established for the analysis of the flowers of Chrysanthemum morifolium (CM). Luteolin-7-O-glucoside (LU7G) was chosen as the target analyte considering its content, availability, and ease of analysis. Chromatographic separation of LU7G was achieved using a Phenomenex Gemini $C_{18}$ column ($250 \times 4.6$ mm, 5 $\mu$m) run with a mobile phase consisting of 0.5% acetic acid in water and 0.5% acetic acid in acetonitrile at a flow rate of 1.0 mL min$^{-1}$. The detection wavelength and column temperature were set at 350 nm and $40^{\circ}C$, respectively. Method validation was performed according to the AOAC guidelines, and the method was specific, linear ($R^2 = 0.9991$ for 50-300 $\mu$g mL$^{-1}$), precise ($\leq$ 3.91% RSD), and accurate (100.1-105.7%). The limits of detection and quantification were 3.62 and 10.96 $\mu$g mL$^{-1}$, respectively. The established method was successfully applied to determine the LU7G content in various batches of bulk CM extracts and a lab-scale CM extract. The developed method is readily applicable to the quality assessment of CM and its related products.
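
The validation figures reported here (linearity as $R^2$, precision as % RSD, accuracy as % recovery) can be sketched in Python as follows. This is an illustration under common textbook definitions, not the authors' procedure; the function names and all numeric data are made up for the example.

```python
import math


def linear_fit_r2(x, y):
    """Ordinary least-squares line y = slope*x + intercept and its R^2."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((yi - (slope * xi + intercept)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return slope, intercept, 1.0 - ss_res / ss_tot


def rsd_percent(values):
    """Relative standard deviation (sample SD / mean), in percent."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
    return 100.0 * sd / mean


def recovery_percent(measured, nominal):
    """Accuracy expressed as measured amount over nominal amount, in percent."""
    return 100.0 * measured / nominal


# Hypothetical calibration points (concentration, peak area), not from the paper.
conc = [50.0, 100.0, 200.0, 300.0]
area = [101.0, 201.0, 401.0, 601.0]
print(linear_fit_r2(conc, area))
print(rsd_percent([100.0, 102.0, 98.0]))
print(recovery_percent(105.7, 100.0))
```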

Application of Text-Classification Based Machine Learning in Predicting Psychiatric Diagnosis (텍스트 분류 기반 기계학습의 정신과 진단 예측 적용)

  • Pak, Doohyun; Hwang, Mingyu; Lee, Minji; Woo, Sung-Il; Hahn, Sang-Woo; Lee, Yeon Jung; Hwang, Jaeuk
    • Korean Journal of Biological Psychiatry / v.27 no.1 / pp.18-26 / 2020
  • Objectives: The aim was to find effective vectorization and classification models to predict a psychiatric diagnosis from text-based medical records. Methods: Electronic medical records (n = 494) of present illness were collected retrospectively from inpatient admission notes with one of three diagnoses: major depressive disorder, type 1 bipolar disorder, or schizophrenia. The data were split into 400 training records and 94 independent validation records. The data were vectorized by two different models, term frequency-inverse document frequency (TF-IDF) and Doc2vec. Machine learning models for classification, including stochastic gradient descent, logistic regression, support vector classification, and deep learning (DL), were applied to predict the three psychiatric diagnoses. Five-fold cross-validation was used to find an effective model. Metrics such as accuracy, precision, recall, and F1-score were measured to compare the models. Results: Five-fold cross-validation on the training data showed that the DL model with Doc2vec was the most effective model for predicting the diagnosis (accuracy = 0.87, F1-score = 0.87). However, these metrics decreased on the independent test data set with the final working DL models (accuracy = 0.79, F1-score = 0.79), while the logistic regression and support vector machine models with Doc2vec showed slightly better performance (accuracy = 0.80, F1-score = 0.80) than the DL models with Doc2vec and the other models with TF-IDF. Conclusions: The current results suggest that the vectorization may have more impact on classification performance than the machine learning model. However, the data set had a number of limitations, including small sample size, imbalance among the categories, and limited generalizability. In this regard, research with multiple sites and larger samples is needed to improve the machine learning models.
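
The data partitioning described in this abstract (a 400/94 hold-out split, then five-fold cross-validation within the training data) can be sketched with the Python standard library. The abstract does not say how records were assigned, so the random shuffle, the seed, and the function names `holdout_split` and `k_fold` are assumptions for illustration.

```python
import random


def holdout_split(n_total, n_train, seed=0):
    """Shuffle record indices and split them into training and held-out sets."""
    indices = list(range(n_total))
    random.Random(seed).shuffle(indices)
    return indices[:n_train], indices[n_train:]


def k_fold(indices, k=5):
    """Yield (train, validation) index lists for k-fold cross-validation.

    Each index appears in exactly one validation fold.
    """
    folds = [indices[i::k] for i in range(k)]
    for i, held_out in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, held_out


# 494 records split into 400 training and 94 independent validation records.
train_idx, validation_idx = holdout_split(494, 400)
for fold_train, fold_val in k_fold(train_idx, k=5):
    # A model would be fitted on fold_train and scored on fold_val here.
    print(len(fold_train), len(fold_val))
```

Keeping the 94 held-out records out of the cross-validation loop is what allows them to reveal the drop in accuracy that the abstract reports for the final models.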

Risk-Scoring System for Prediction of Non-Curative Endoscopic Submucosal Dissection Requiring Additional Gastrectomy in Patients with Early Gastric Cancer

  • Kim, Tae-Se; Min, Byung-Hoon; Kim, Kyoung-Mee; Yoo, Heejin; Kim, Kyunga; Min, Yang Won; Lee, Hyuk; Rhee, Poong-Lyul; Kim, Jae J.; Lee, Jun Haeng
    • Journal of Gastric Cancer / v.21 no.4 / pp.368-378 / 2021
  • Purpose: When patients with early gastric cancer (EGC) undergo non-curative endoscopic submucosal dissection requiring gastrectomy (NC-ESD-RG), additional medical resources and expenses are required for surgery. To reduce this burden, a predictive model for NC-ESD-RG is required. Materials and Methods: Data from 2,997 patients undergoing ESD for 3,127 forceps biopsy-proven differentiated-type EGCs (2,345 and 782 in the training and validation sets, respectively) were reviewed. Using the training set, stepwise logistic regression analysis determined the independent predictors of NC-ESD-RG (NC-ESD other than cases with lateral resection margin involvement or piecemeal resection as the only non-curative factor). Using these predictors, a risk-scoring system for predicting NC-ESD-RG was developed. The performance of the predictive model was examined internally with the validation set. Results: The rate of NC-ESD-RG was 17.3%. Independent pre-ESD predictors of NC-ESD-RG included moderately differentiated or papillary EGC, large tumor size, proximal tumor location, lesion at the greater curvature, elevated or depressed morphology, and presence of ulcers. A risk score was assigned to each predictor of NC-ESD-RG. The area under the receiver operating characteristic curve for predicting NC-ESD-RG was 0.672 in both the training and validation sets. A risk score of 5 points was the optimal cut-off value for predicting NC-ESD-RG, and the overall accuracy was 72.7%. As the total risk score increased, the predicted risk of NC-ESD-RG increased from 3.8% to 72.6%. Conclusions: We developed and validated a risk-scoring system for predicting NC-ESD-RG based on pre-ESD variables. Our risk-scoring system can facilitate informed consent and decision-making for preoperative treatment selection between ESD and surgery in patients with EGC.
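
The two performance measures quoted for the risk score (area under the ROC curve, and accuracy at a cut-off) can be sketched as below. This is a generic illustration, not the authors' analysis: the pairwise (Mann-Whitney) form of the AUC is one standard way to compute it, treating scores at or above the cut-off as predicted positives is an assumption, and the example scores and labels are invented.

```python
def roc_auc(scores, labels):
    """AUC as the probability that a positive case outscores a negative one.

    Ties between a positive and a negative score count as half a win.
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in positives:
        for q in negatives:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


def accuracy_at_cutoff(scores, labels, cutoff):
    """Fraction classified correctly when score >= cutoff predicts the event."""
    predictions = [1 if s >= cutoff else 0 for s in scores]
    correct = sum(p == y for p, y in zip(predictions, labels))
    return correct / len(labels)


# Invented total risk scores and outcomes (1 = event), for illustration only.
risk_scores = [2, 6, 4, 8]
outcomes = [0, 0, 1, 1]
print(roc_auc(risk_scores, outcomes))
print(accuracy_at_cutoff(risk_scores, outcomes, cutoff=5))
```

Computing both measures separately on the training and validation sets, as the abstract describes, shows whether the score generalizes; an identical AUC in both sets suggests little overfitting.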

Imaging Predictors of Survival in Patients with Single Small Hepatocellular Carcinoma Treated with Transarterial Chemoembolization

  • Chan Park; Jin Hyoung Kim; Pyeong Hwa Kim; So Yeon Kim; Dong Il Gwon; Hee Ho Chu; Minho Park; Joonho Hur; Jin Young Kim; Dong Joon Kim
    • Korean Journal of Radiology / v.22 no.2 / pp.213-224 / 2021
  • Objective: Clinical outcomes of patients who undergo transarterial chemoembolization (TACE) for single small hepatocellular carcinoma (HCC) are not consistent and may differ based on certain imaging findings. This retrospective study aimed to determine the efficacy of pre-TACE CT or MR imaging findings in predicting survival outcomes in patients with small HCC treated with TACE. The study also aimed to build a risk prediction model for these patients. Materials and Methods: Altogether, 750 patients with functionally good hepatic reserve who received TACE as the first-line treatment for single small HCC between 2004 and 2014 were included in the study. These patients were randomly assigned to training (n = 525) and validation (n = 225) sets. Results: According to the results of a multivariable Cox analysis, three pre-TACE imaging findings (tumor margin, tumor location, enhancement pattern) and two clinical factors (age, serum albumin level) were selected and scored to create predictive models for overall, local tumor progression (LTP)-free, and progression-free survival in the training set. The median overall survival times in the validation set were 137.5 months, 76.1 months, and 44.0 months for the low-, intermediate-, and high-risk groups, respectively (p < 0.001). Time-dependent receiver operating characteristic curves of the predictive models for overall, LTP-free, and progression-free survival applied to the validation cohort showed acceptable area under the curve values (0.734, 0.802, and 0.775 for overall survival; 0.738, 0.789, and 0.791 for LTP-free survival; and 0.671, 0.733, and 0.694 for progression-free survival at 3, 5, and 10 years, respectively). Conclusion: Pre-TACE CT or MR imaging findings could predict survival outcomes in patients with small HCC treated with TACE. Our predictive models, which include three imaging predictors, could be helpful in prognostication, identification, and selection of suitable candidates for TACE among patients with single small HCC.

An Automatic Coding System of Korean Standard Industry/Occupation Code Using Example-based Learning (예제기반의 학습을 이용한 한국어 표준 산업/직업 자동 코딩 시스템)

  • Lim Heui-Seok
    • The Journal of the Korea Contents Association / v.5 no.4 / pp.169-179 / 2005
  • Standard industry and occupation codes are usually assigned manually in the Korean census. Manual coding is a very labor-intensive and expensive task. Furthermore, inconsistent coding results from differences in the ability of human experts and in their working environments. This paper proposes an automatic code classification system that converts natural language responses on survey questionnaires into the corresponding numeric codes by using a manually constructed rule base and example-based machine learning. The system was trained with 400,000 records to which standard codes had been assigned. It was evaluated with 10-fold cross-validation and was tested with three code sets: a population occupation set, an industry set, and an industry survey set. The proposed system showed 76.63%, 82.24%, and 99.68% accuracy on the respective code sets.

Using rough set to support arbitrage box spread strategies in KOSPI 200 option markets (러프 집합을 이용한 코스피 200 주가지수옵션 시장에서의 박스스프레드 전략 실증분석 및 거래 전략)

  • Kim, Min-Sik; Oh, Kyong-Joo
    • Journal of the Korean Data and Information Science Society / v.22 no.1 / pp.37-47 / 2011
  • Various investment strategies have been developed for the stock price index option market. In particular, arbitrage strategies are very important for the efficiency of the option market. The purpose of this study is to improve profit by applying rough sets and the box spread to past option trading data. The option trading data were actual stock exchange market tick data ranging from 2001 to 2006. The validation process was carried out by converting the tick data into one-minute intervals. Box spread arbitrage strategies are low risk but also low profit. Profit can be improved by back-testing the existing strategy on past data and by using rough sets, which limit the time window for trading. The results suggest that more stable profits with lower risk can be obtained by controlling the strategy so that it yields a higher profit at the same level of risk.