• Title/Abstract/Keywords: Validation Set

679 search results (processing time: 0.026 s)

Logistic Regression Method in Interval-Censored Data

  • Yun, Eun-Young;Kim, Jin-Mi;Ki, Choong-Rak
    • The Korean Journal of Applied Statistics (응용통계연구)
    • /
    • Vol. 24, No. 5
    • /
    • pp.871-881
    • /
    • 2011
  • In this paper we propose a logistic regression method to estimate the survival function and the median survival time in interval-censored data. The proposed method is motivated by the data augmentation technique with no sacrifice in augmenting data. In addition, we develop a cross validation criterion to determine the size of data augmentation. We compare the proposed estimator with other existing methods such as the parametric method, the single point imputation method, and the nonparametric maximum likelihood estimator through extensive numerical studies to show that the proposed estimator performs better than others in the sense of the mean squared error. An illustrative example based on a real data set is given.

Fixed size LS-SVM for multiclassification problems of large data sets

  • Hwang, Hyung-Tae
    • Journal of the Korean Data and Information Science Society
    • /
    • Vol. 21, No. 3
    • /
    • pp.561-567
    • /
    • 2010
  • Multiclassification is typically performed using voting scheme methods based on combining a set of binary classifications. In this paper we use a multiclassification method with a hat matrix of the least squares support vector machine (LS-SVM), which can be regarded as a revised one-against-all method. To tackle multiclass problems for large data, we use the Nyström approximation and the quadratic Renyi entropy with estimation in the primal space, as used in fixed size LS-SVM. For the selection of hyperparameters, generalized cross validation techniques are employed. Experimental results are then presented to indicate the performance of the proposed procedure.

Development of Attitude Constraints for Real-time Attitude Determination System using GPS carrier phase

  • Jang, Jae-Gyu;Kee, Chang-Don
    • International Journal of Aeronautical and Space Sciences
    • /
    • Vol. 6, No. 2
    • /
    • pp.17-22
    • /
    • 2005
  • As a validation tool for attitude determination systems, we have used various constraints based on a priori information known from the base vector setup. However, these conventional constraints cannot guarantee the validity of final solutions such as Euler angles. We therefore suggest an attitude boundary concept to verify the final attitude solution on a flying airplane; it is based on a combination of a velocity-based attitude estimation technique and ambiguity resolution. It can check for invalid solutions effectively at just one epoch, without a repeatability test of the resolved cycle ambiguity. In this paper we show that the suggested constraint can effectively reject incorrectly resolved cycle ambiguities that the conventional constraints have missed.

AEGIS: AN ADVANCED LATTICE PHYSICS CODE FOR LIGHT WATER REACTOR ANALYSES

  • Yamamoto, Akio;Endo, Tomohiro;Tabuchi, Masato;Sugimura, Naoki;Ushio, Tadashi;Mori, Masaaki;Tatsumi, Masahiro;Ohoka, Yasunori
    • Nuclear Engineering and Technology
    • /
    • Vol. 42, No. 5
    • /
    • pp.500-519
    • /
    • 2010
  • AEGIS is a lattice physics code incorporating the latest advances in lattice physics computation, innovative calculation models and efficient numerical algorithms, and is mainly used for light water reactor analyses. Though the primary objective of the AEGIS code is the preparation of a cross section set for SCOPE2, which is a three-dimensional pin-by-pin core analysis code, the AEGIS code can handle not only a fuel assembly but also multi-assemblies and a whole core geometry in two-dimensional geometry. The present paper summarizes the major calculation models and part of the verification/validation efforts related to the AEGIS code.

Non-invasive Blood Glucose Measurement by a Portable Near Infrared (NIR) System

  • 강나루;우영아;차봉수;이현철;김효진
    • Yakhak Hoeji (약학회지)
    • /
    • Vol. 46, No. 5
    • /
    • pp.331-336
    • /
    • 2002
  • The purpose of this study is to develop a non-invasive blood glucose measurement method using a portable near infrared (NIR) system newly integrated by our lab. The portable NIR system includes a tungsten halogen lamp, a specialized reflectance fiber optic probe and a photodiode array type InGaAs detector, which was developed using microchip technology based on lithography. Reflectance NIR spectra of different parts of the human body (fingertip, earlobe, and inner lip) were recorded using a fiber optic probe. The spectra were collected over the spectral range 1100 to 1740 nm. Partial least squares regression (PLSR) was applied for calibration and validation in the determination of blood glucose. The calibration model from earlobe spectra gave better results, showing good correlation with the glucose oxidase method, a widely used standard method. This model predicted the glucose concentration for the validation set with a SEP of 33 mg/dL. This study indicates the feasibility of non-invasive monitoring of blood glucose with a portable near infrared system.
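
The abstract reports a SEP (standard error of prediction) on the validation set but does not define it. The following is a minimal sketch of two definitions commonly used in chemometrics, a bias-corrected SEP and the plain RMSEP; which one the authors used is an assumption, and the sample values are hypothetical, not taken from the paper.

```python
import math


def rmsep(reference, predicted):
    """Root mean squared error of prediction (no bias correction)."""
    errors = [p - r for p, r in zip(predicted, reference)]
    return math.sqrt(sum(e * e for e in errors) / len(errors))


def sep_bias_corrected(reference, predicted):
    """Standard deviation of the prediction errors around their mean (bias)."""
    errors = [p - r for p, r in zip(predicted, reference)]
    n = len(errors)
    bias = sum(errors) / n
    return math.sqrt(sum((e - bias) ** 2 for e in errors) / (n - 1))


# Hypothetical validation-set glucose values in mg/dL (illustration only).
reference = [90.0, 110.0, 150.0, 200.0]   # reference (e.g. glucose oxidase) values
predicted = [100.0, 100.0, 170.0, 190.0]  # model predictions from NIR spectra

print(f"RMSEP = {rmsep(reference, predicted):.2f} mg/dL")
print(f"SEP   = {sep_bias_corrected(reference, predicted):.2f} mg/dL")
```

The two values coincide only when the mean prediction error (bias) is close to zero, so a reported SEP should be read together with the definition used.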

Consensus Clustering for Time Course Gene Expression Microarray Data

  • Kim, Seo-Young;Bae, Jong-Sung
    • Communications for Statistical Applications and Methods
    • /
    • Vol. 12, No. 2
    • /
    • pp.335-348
    • /
    • 2005
  • The rapid development of microarray technologies has enabled the monitoring of expression levels of thousands of genes simultaneously. Recently, time course gene expression data are often measured to study dynamic biological systems and gene regulatory networks. For such data, biologists attempt to group genes based on the temporal pattern of their expression levels. We apply the consensus clustering algorithm to time course gene expression data in order to infer statistically meaningful information from the measurements. We evaluate consensus clustering and existing clustering methods with various validation measures. In this paper, we consider hierarchical clustering and Diana as existing methods, and consensus clustering with hierarchical clustering, Diana, and mixed hierarchical and Diana methods, and evaluate their performance on a real microarray data set and two simulated data sets.
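
The abstract does not spell out how the individual clusterings are combined. One common building block of consensus clustering is a consensus (co-association) matrix, sketched below under the assumption that every base clustering assigns a label to every gene; resampling-based variants divide by the number of runs in which both genes were sampled instead. The label vectors are hypothetical.

```python
def consensus_matrix(labelings):
    """Fraction of base clusterings in which items i and j share a cluster."""
    n_items = len(labelings[0])
    counts = [[0.0] * n_items for _ in range(n_items)]
    for labels in labelings:
        for i in range(n_items):
            for j in range(n_items):
                if labels[i] == labels[j]:
                    counts[i][j] += 1.0
    n_runs = len(labelings)
    return [[c / n_runs for c in row] for row in counts]


# Hypothetical cluster labels for four genes from three base clusterings
# (e.g. hierarchical clustering and Diana with different settings).
labelings = [
    [0, 0, 1, 1],
    [1, 1, 0, 0],  # same partition as the first run, labels permuted
    [0, 0, 0, 1],
]
matrix = consensus_matrix(labelings)
for row in matrix:
    print([round(v, 2) for v in row])
```

Because only co-membership is counted, the result does not depend on how each run names its clusters. The matrix can then be passed to a final clustering step as a similarity measure.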

A Study on the Prediction of Community Smart Pension Intention Based on Decision Tree Algorithm

  • Liu, Lijuan;Min, Byung-Won
    • International Journal of Contents
    • /
    • Vol. 17, No. 4
    • /
    • pp.79-90
    • /
    • 2021
  • With the deepening of population aging, pension has become an urgent problem in most countries. Community smart pension can effectively resolve the problems of traditional pension and meet the personalized, multi-level needs of the elderly. To predict the pension intention of the elderly in the community more accurately, this paper uses the decision tree classification method to classify the pension data. After missing value processing, normalization, discretization and data specification, the discretized sample data set is obtained. Then, by comparing the information gain and information gain ratio of the sample data features, the feature ranking is determined, and the C4.5 decision tree model is established. The model performs well in accuracy, precision, recall, AUC and other indicators under 10-fold cross-validation, with a precision of 89.5%, which can provide a basis for government decision-making.
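
The feature ranking by information gain and gain ratio can be illustrated with a short sketch. It uses the standard entropy-based definitions for discretized features, as in C4.5; the feature names and sample values are hypothetical and are not the paper's data.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy (in bits) of a list of discrete values."""
    total = len(values)
    return -sum((c / total) * math.log2(c / total) for c in Counter(values).values())


def information_gain(feature_values, labels):
    """Reduction in label entropy after splitting on a discrete feature."""
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    total = len(labels)
    conditional = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


def gain_ratio(feature_values, labels):
    """Information gain normalized by the split information of the feature."""
    split_info = entropy(feature_values)
    if split_info == 0:
        return 0.0  # feature takes a single value, so it cannot split the data
    return information_gain(feature_values, labels) / split_info


# Hypothetical discretized samples (illustration only).
intention = ["yes", "yes", "no", "no"]
features = {
    "lives_alone": ["a", "a", "b", "b"],     # separates the classes perfectly
    "income_band": ["x", "y", "x", "y"],     # carries no information here
}
ranking = sorted(features, key=lambda name: gain_ratio(features[name], intention), reverse=True)
print(ranking)
```

Gain ratio penalizes features with many distinct values, which plain information gain tends to favor.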

Estimation of the Lodging Area in Rice Using Deep Learning

  • 반호영;백재경;상완규;김준환;서명철
    • Korean Journal of Crop Science (한국작물학회지)
    • /
    • Vol. 66, No. 2
    • /
    • pp.105-111
    • /
    • 2021
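  • Every year, typhoons accompanied by strong winds and heavy rainfall cause rice lodging, and lodging during the grain-filling stage leads to damage associated with pre-harvest sprouting. Rapid estimation of the lodged area is therefore essential for a prompt damage response. Images of rice lodging were collected by drone in the Gimje, Buan, and Gunsan areas, where lodging had occurred, and the collected images were split into tiles of 128 × 128 pixels. A CNN, an image-based deep learning model, was used to predict rice lodging. The tiles were labeled as either lodging or non-lodging, and the data were divided into a training-set for training and a vali-set for validation at a ratio of 8:2. The CNN was built with a simple layer structure and trained with three optimizers (Adam, RMSprop, and SGD). The lodged area was evaluated using data not included in the training-set or vali-set; the images were combined into whole fields with the methshape program, and three fields in total were evaluated. To estimate the lodged area, the image of an entire field was split into tiles of the model's input size (128 × 128), each tile was classified by the trained CNN, and the ratio of the number of lodging tiles to the total number of tiles was multiplied by the area of the whole field. For the training-set and vali-set, accuracy increased as training progressed with all three optimizers, reaching a high accuracy of 0.919 or above. For the three evaluation fields, all optimizers showed high accuracy, and Adam showed the highest (RMSE: 52.80 m², NRMSE: 2.73%). Deep learning is therefore expected to enable rapid estimation of the rice lodging area.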
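
The area estimate described in the abstract (the share of tiles classified as lodging, multiplied by the area of the whole field) reduces to a few lines. The sketch below assumes the per-tile CNN predictions are already available as labels; the function name, tile labels, and field area are hypothetical.

```python
def estimate_lodged_area(tile_predictions, field_area_m2):
    """Lodged area = (lodging tiles / all tiles) * total field area."""
    if not tile_predictions:
        raise ValueError("at least one tile prediction is required")
    lodged_tiles = sum(1 for label in tile_predictions if label == "lodging")
    return field_area_m2 * lodged_tiles / len(tile_predictions)


# Hypothetical per-tile predictions for one field (illustration only).
tile_predictions = [
    "non-lodging", "lodging", "non-lodging", "non-lodging",
    "lodging", "non-lodging", "non-lodging", "non-lodging",
]
area = estimate_lodged_area(tile_predictions, field_area_m2=2000.0)
print(f"Estimated lodged area: {area:.1f} m2")
```

The estimate treats every tile as covering an equal share of the field, so tiles that fall partly outside the field boundary would bias the result unless they are masked out first.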

Modelling the Effects of Temperature and Photoperiod on Phenology and Leaf Appearance in Chrysanthemum

  • 서범석;박하승;이규종;최덕환;이변우
    • Korean Journal of Agricultural and Forest Meteorology (한국농림기상학회지)
    • /
    • Vol. 18, No. 4
    • /
    • pp.253-263
    • /
    • 2016
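  • The growth of chrysanthemum, a short-day plant, is affected by weather conditions such as temperature, photoperiod, and solar radiation, and by cultivation management. A chrysanthemum growth prediction model that accounts for weather and cultivation management could serve as a decision-support tool in chrysanthemum cultivation. In this study, as groundwork for building a chrysanthemum growth model, we developed a model that predicts leaf appearance and phenology of the chrysanthemum cultivar 'Baekseon', using as inputs not only temperature and photoperiod but also cultivation management information such as ethephon treatment and night-break lighting. The model divides chrysanthemum development into three periods: the juvenile phase, the period from the end of the juvenile phase to budding, and the period from budding to flowering. The juvenile phase is estimated from the temperature response curve of the leaf appearance rate and a threshold leaf number that determines the end of the juvenile phase. Ethephon treatment of the mother plant and of the plant after transplanting is assumed to increase the leaf number at the end of the juvenile phase, and this is reflected in the model. After the juvenile phase, the development rate is calculated with a function of temperature and photoperiod to predict the budding and flowering dates; night-break treatment is treated as a long day exceeding the critical photoperiod. The final leaf number is predicted by assuming that leaf appearance continues until just before budding. The model coefficients were estimated from temperature response experiments and planting date experiments, and the model was implemented in the Java programming language. The model predicted the budding and flowering dates fairly accurately both for the data used to estimate the coefficients (calibration data) and for independent data (validation data), and accuracy for validation was not lower than for calibration. The model also predicted the number of leaves at each growth stage and the final leaf number fairly accurately, but it tended to under- or over-predict somewhat depending on the growth stage, so it should be improved through experiments that can capture factors other than temperature.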

Cutoff Values for Diagnosing Hepatic Steatosis Using Contemporary MRI-Proton Density Fat Fraction Measuring Methods

  • Sohee Park;Jae Hyun Kwon;So Yeon Kim;Ji Hun Kang;Jung Il Chung;Jong Keon Jang;Hye Young Jang;Ju Hyun Shim;Seung Soo Lee;Kyoung Won Kim;Gi-Won Song
    • Korean Journal of Radiology
    • /
    • Vol. 23, No. 12
    • /
    • pp.1260-1268
    • /
    • 2022
  • Objective: To propose standardized MRI-proton density fat fraction (PDFF) cutoff values for diagnosing hepatic steatosis, evaluated using contemporary PDFF measuring methods in a large population of healthy adults, using histologic fat fraction (HFF) as the reference standard. Materials and Methods: A retrospective search of electronic medical records between 2015 and 2018 identified 1063 adult donor candidates for liver transplantation who had undergone liver MRI and liver biopsy within a 7-day interval. Patients with a history of liver disease or significant alcohol consumption were excluded. Chemical shift imaging-based MRI (CS-MRI) PDFF and high-speed T2-corrected multi-echo MR spectroscopy (HISTO-MRS) PDFF data were obtained. By temporal splitting, the total population was divided into development and validation sets. Receiver operating characteristic (ROC) analysis was performed to evaluate the diagnostic performance of the MRI-PDFF method. Two cutoff values with sensitivity > 90% and specificity > 90% were selected to rule-out and rule-in, respectively, hepatic steatosis with reference to HFF ≥ 5% in the development set. The diagnostic performance was assessed using the validation set. Results: Of 921 final participants (624 male; mean age ± standard deviation, 31.5 ± 9.0 years), the development and validation sets comprised 497 and 424 patients, respectively. In the development set, the areas under the ROC curve for diagnosing hepatic steatosis were 0.920 for CS-MRI-PDFF and 0.915 for HISTO-MRS-PDFF. For ruling-out hepatic steatosis, the CS-MRI-PDFF cutoff was 2.3% (sensitivity, 92.4%; specificity, 63.0%) and the HISTO-MRS-PDFF cutoff was 2.6% (sensitivity, 88.8%; specificity, 70.1%). For ruling-in hepatic steatosis, the CS-MRI-PDFF cutoff was 3.5% (sensitivity, 73.5%; specificity, 88.6%) and the HISTO-MRS-PDFF cutoff was 4.0% (sensitivity, 74.7%; specificity, 90.6%). Conclusion: In a large population of healthy adults, our study suggests diagnostic thresholds for ruling-out and ruling-in hepatic steatosis defined as HFF ≥ 5% by contemporary PDFF measurement methods.
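
The abstract states the sensitivity and specificity targets for the two cutoffs but not the tie-breaking rule among cutoffs that meet them. The sketch below assumes a test is positive when PDFF is at or above the cutoff, takes the highest cutoff with sensitivity above the target as the rule-out value, and takes the lowest cutoff with specificity above the target as the rule-in value. These choices and the sample data are assumptions for illustration, not the authors' procedure; both classes must be present in the development set.

```python
def sensitivity_specificity(values, has_steatosis, cutoff):
    """Sensitivity and specificity when 'value >= cutoff' is called positive."""
    tp = fn = tn = fp = 0
    for value, positive in zip(values, has_steatosis):
        predicted_positive = value >= cutoff
        if positive and predicted_positive:
            tp += 1
        elif positive:
            fn += 1
        elif predicted_positive:
            fp += 1
        else:
            tn += 1
    return tp / (tp + fn), tn / (tn + fp)


def select_cutoffs(values, has_steatosis, target=0.90):
    """Return (rule_out, rule_in) cutoffs chosen on the development set."""
    rule_out = None  # highest cutoff whose sensitivity still exceeds the target
    rule_in = None   # lowest cutoff whose specificity exceeds the target
    for cutoff in sorted(set(values)):
        sens, spec = sensitivity_specificity(values, has_steatosis, cutoff)
        if sens > target:
            rule_out = cutoff
        if spec > target and rule_in is None:
            rule_in = cutoff
    return rule_out, rule_in


# Hypothetical development-set PDFF values (%) and HFF >= 5% status.
pdff = [1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 5.0, 6.0, 8.0]
steatosis = [False, False, False, False, True, False, True, True, True, True]
rule_out, rule_in = select_cutoffs(pdff, steatosis)
print(f"rule-out cutoff: {rule_out}%, rule-in cutoff: {rule_in}%")
```

Cutoffs selected this way are tuned to the development set, which is why the study then assesses their performance on a separate validation set.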