• Title/Summary/Keyword: Validation data set

Development of a Risk Index for Prediction of Abnormal Pap Test Results in Serbia

  • Vukovic, Dejana;Antic, Ljiljana;Vasiljevic, Mladenko;Antic, Dragan;Matejic, Bojana
    • Asian Pacific Journal of Cancer Prevention / v.16 no.8 / pp.3527-3531 / 2015
  • Background: Serbia is one of the countries with the highest incidence and mortality rates for cervical cancer in Central and South Eastern Europe. Introducing a risk index could provide a powerful means of targeting groups with a high likelihood of having an abnormal cervical smear and could increase the efficiency of screening. The aim of the present study was to create an index for predicting an abnormal Pap test result and to assess its validity. Materials and Methods: The study population was drawn from patients attending Departments for Women's Health in two primary health care centers in Serbia. Of 525 respondents, 350 were randomly selected, and the data obtained from them were used as the index-creation data set. Data obtained from the remaining 175 were used as the index validation data set. Results: Age at first intercourse under 18, more than 4 sexual partners, history of STD, and multiparity were assigned statistical weights of 16, 15, 14, and 13, respectively. In the index-creation data set, most respondents (54.9%) had a score of 0, and the mean index score was 10.3 (SD = 13.8); in the validation data set, the mean was 9.1 (SD = 13.2). Conclusions: The advantage of this scoring system is its simplicity, as it consists of only four elements; it could therefore be applied to identify women at high risk of cervical cancer who should be referred for further examination.
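Assuming, as the reported weights suggest, that a respondent's index score is the sum of the weights of the risk factors she reports, the score and the random 350/175 creation/validation split can be sketched as below. The factor key names, the helper functions, and the fixed seed are hypothetical; the abstract gives no referral cut-off score, so none is applied.

```python
import random

# Weights reported in the abstract; the dictionary keys are hypothetical names.
RISK_WEIGHTS = {
    "first_intercourse_before_18": 16,
    "more_than_4_partners": 15,
    "history_of_std": 14,
    "multiparity": 13,
}


def risk_index(answers):
    """Sum the weights of the risk factors present (answers maps factor -> bool)."""
    return sum(weight for factor, weight in RISK_WEIGHTS.items() if answers.get(factor, False))


def split_creation_validation(respondent_ids, n_creation=350, seed=0):
    """Randomly split respondents into index-creation and validation sets."""
    rng = random.Random(seed)
    shuffled = list(respondent_ids)
    rng.shuffle(shuffled)
    return shuffled[:n_creation], shuffled[n_creation:]


creation, validation = split_creation_validation(range(525))
print(len(creation), len(validation))  # 350 175
print(risk_index({"history_of_std": True, "multiparity": True}))  # 27
```

Under this additive reading, the maximum possible score is 16 + 15 + 14 + 13 = 58, and a respondent with none of the four factors scores 0.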

Numerical Prediction of Ship Motions in Wave using RANS Method (RANS 방법을 이용한 파랑 중 선박운동 해석)

  • Park, Il-Ryong;Kim, Jin;Kim, Yoo-Chul;Kim, Kwang-Soo;Van, Suak-Ho;Suh, Sung-Bu
    • Journal of the Society of Naval Architects of Korea / v.50 no.4 / pp.232-239 / 2013
  • This paper presents the structure of a Reynolds-averaged Navier-Stokes (RANS) based simulation method and its validation results for the ship motion problem. The hull motion computed from the equations of motion is incorporated into the momentum equations as the relative fluid motion with respect to a non-inertial coordinate system. A finite volume method is used to solve the governing equations, the free surface is captured with a two-phase level-set method, and the realizable $k$-$\varepsilon$ model is used for turbulence closure. To validate the numerical approach, the computed results of resistance and motion tests for the DTMB 5415 hull at two ship speeds are compared against available experimental data.
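In a two-phase level-set method, the free surface is the zero contour of a signed-distance function $\phi$, and fluid properties are blended across the interface. The sketch below shows the widely used smoothed-Heaviside blending of density; the sign convention ($\phi > 0$ in water), the interface half-thickness `eps`, and the property values are illustrative assumptions, not taken from the paper.

```python
import math


def smoothed_heaviside(phi, eps):
    """Smoothed Heaviside H(phi): 0 in air (phi < -eps), 1 in water (phi > eps)."""
    if phi < -eps:
        return 0.0
    if phi > eps:
        return 1.0
    return 0.5 * (1.0 + phi / eps + math.sin(math.pi * phi / eps) / math.pi)


def mixture_property(phi, eps, value_air, value_water):
    """Blend a fluid property across the interface using H(phi)."""
    h = smoothed_heaviside(phi, eps)
    return value_air + (value_water - value_air) * h


RHO_AIR, RHO_WATER = 1.2, 1000.0  # kg/m^3, illustrative values

for phi in (-0.2, -0.05, 0.0, 0.05, 0.2):
    print(phi, mixture_property(phi, 0.1, RHO_AIR, RHO_WATER))
```

The smoothing spreads the density jump over a band of width `2 * eps`, which keeps the discretized momentum equations well behaved near the free surface.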

Extraction of Potential Area for Block Stream and Talus Using Spatial Integration Model (공간통합 모델을 적용한 암괴류 및 애추 지형 분포가능지 추출)

  • Lee, Seong-Ho;JANG, Dong-Ho
    • Journal of The Geomorphological Association of Korea / v.26 no.2 / pp.1-14 / 2019
  • This study analyzed the relationship between block stream and talus distributions by employing a likelihood ratio approach. Possible distribution sites for each debris slope landform were extracted by applying spatial integration models: a fuzzy set model, a Bayesian predictive model, and a logistic regression model. To verify model performance, success rate curves were prepared using cross-validation. The results showed that elevation, slope, curvature, topographic wetness index, geology, soil drainage, and soil depth were closely related to debris slope landform sites. All spatial integration models achieved an accuracy of over 90%. The accuracy of the potential distribution map for block streams was highest for the logistic regression model (93.79%). Likewise, the accuracy of the potential distribution map for talus was highest for the logistic regression model (97.02%). We expect these results to provide essential data and methodologies for more efficient and systematic micro-landform studies. Our research may also help to enhance field research and topographic resource management.
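A likelihood ratio between a landform and a factor class is often computed as a frequency ratio: the share of landform cells falling in a class divided by the share of the study area in that class. The sketch below assumes that formulation and a simple summed index across factor layers; the class labels, cell counts, and function names are hypothetical.

```python
def frequency_ratio(area_cells, landform_cells):
    """Frequency (likelihood) ratio for each class of one factor layer.

    area_cells:     {class: number of map cells in that class}
    landform_cells: {class: number of landform cells (e.g. talus) in that class}
    A ratio > 1 means the landform is over-represented in that class.
    """
    total_area = sum(area_cells.values())
    total_landform = sum(landform_cells.values())
    ratios = {}
    for cls, n_cells in area_cells.items():
        landform_share = landform_cells.get(cls, 0) / total_landform
        area_share = n_cells / total_area
        ratios[cls] = landform_share / area_share
    return ratios


def potential_index(cell_classes, fr_tables):
    """Sum the frequency ratios of a cell's class in every factor layer."""
    return sum(fr_tables[factor][cls] for factor, cls in cell_classes.items())


# Hypothetical slope classes and cell counts, for illustration only
slope_fr = frequency_ratio(
    {"0-15": 500, "15-30": 300, ">30": 200},
    {"0-15": 10, "15-30": 30, ">30": 60},
)
print(slope_fr)
print(potential_index({"slope": ">30"}, {"slope": slope_fr}))
```

Mapping the summed index over all cells gives a potential-area map whose ranking can then be checked against held-out landform locations with a success rate curve.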

Computation of Unsteady Flows over an Oscillating airfoil (진동하는 익형을 지나는 비정상 유동에 관한 계산)

  • Yang C. M.;Baek J. H.
    • Proceedings of the Korean Society for Computational Fluids Engineering Conference / 1999.05a / pp.125-130 / 1999
  • The flow field around a NACA0012 airfoil pitching about its quarter-chord point and plunging vertically is analyzed by solving the two-dimensional compressible Navier-Stokes equations. A steady case was solved first to validate the code, and the results were compared with experimental data. Next, as an unsteady case, the flow around the oscillating airfoil was computed and compared with experimental data. The oscillation rates of the pitching and plunging motions were set to be analogous, and the plunging amplitude was set from the pitching amplitude of the angle of attack. Finally, a combined pitching and plunging motion was solved to show the effect of the two different types of oscillating motion.
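One common way to make a plunging amplitude comparable to a pitching amplitude is to match the peak angle of attack induced by the plunging velocity, $\arctan(\dot h_{\max}/U_\infty)$, to the pitching amplitude. The sketch below assumes that interpretation, a quasi-steady effective angle of attack, sinusoidal motions at the same angular frequency, and plunge displacement measured positive downward; the function names and parameters are hypothetical, not the authors' code.

```python
import math


def plunge_amplitude_for(alpha_amp_rad, omega, u_inf):
    """Plunge amplitude whose peak induced angle equals a pitching amplitude."""
    return u_inf * math.tan(alpha_amp_rad) / omega


def effective_aoa(t, alpha_mean, alpha_amp, h_amp, omega, u_inf, phase=0.0):
    """Quasi-steady effective angle of attack (rad) for combined pitch and plunge.

    Pitch:  alpha(t) = alpha_mean + alpha_amp * sin(omega * t)
    Plunge: h(t) = h_amp * sin(omega * t + phase), positive downward,
            so a downward velocity dh/dt raises the effective angle.
    """
    alpha_pitch = alpha_mean + alpha_amp * math.sin(omega * t)
    h_dot = h_amp * omega * math.cos(omega * t + phase)
    return alpha_pitch + math.atan(h_dot / u_inf)


omega, u_inf = 2.0, 50.0  # rad/s and m/s, illustrative values
h_amp = plunge_amplitude_for(math.radians(5.0), omega, u_inf)
# Pure plunge: the peak induced angle (at t = 0) matches the 5 degree pitch amplitude
print(math.degrees(effective_aoa(0.0, 0.0, 0.0, h_amp, omega, u_inf)))
```

The `phase` argument lets the combined case shift the plunge relative to the pitch, which changes whether the two motions reinforce or cancel each other's angle-of-attack variation.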

Development of Windows forensic tool for verifying a set of data (윈도우 포렌식 도구의 검증용 데이터 세트의 개발)

  • Kim, Min-Seo;Lee, Sang-jin
    • Journal of the Korea Institute of Information Security & Cryptology / v.25 no.6 / pp.1421-1433 / 2015
  • Accurate forensic analysis of computers and digital devices requires validating the reliability of digital forensic tools. To verify a tool's reliability, it is necessary to research and develop the data sets that are fed into it. The widely used Windows operating system contains forensic artifacts associated with time and with system behavior. In this paper, we developed a data set in the Windows operating system that allows both kinds of Windows artifacts to be analyzed, and we tested it with published digital forensic tools. The developed data set can be used in the following ways. First, it can support education that builds the ability to analyze artifacts according to standards. Second, it can be used for tool testing to verify the reliability of digital forensic tools. Finally, it can be reused to analyze new artifacts.
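Tool testing with a reference data set amounts to comparing what a tool reports against artifacts whose presence is known in advance. The sketch below assumes artifacts can be reduced to `(artifact_type, value)` tuples and compared by exact equality; the function name and every artifact in the example are hypothetical.

```python
def validate_tool(ground_truth, reported):
    """Compare a forensic tool's output with the known contents of a reference data set.

    Both arguments are sets of (artifact_type, value) tuples.
    """
    found = ground_truth & reported
    missed = ground_truth - reported
    unexpected = reported - ground_truth
    return {
        "detection_rate": len(found) / len(ground_truth) if ground_truth else 1.0,
        "missed": sorted(missed),
        "unexpected": sorted(unexpected),
    }


# Hypothetical artifacts planted while building the reference data set
truth = {
    ("prefetch", "NOTEPAD.EXE"),
    ("lnk", "report.docx"),
    ("event_log", "4624"),
}
# Hypothetical output parsed from a tool under test
tool_output = {
    ("prefetch", "NOTEPAD.EXE"),
    ("lnk", "report.docx"),
    ("lnk", "notes.txt"),
}
result = validate_tool(truth, tool_output)
print(result["detection_rate"], result["missed"], result["unexpected"])
```

Reporting missed and unexpected artifacts separately distinguishes a tool that fails to parse an artifact from one that misattributes or invents entries.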

A study of estimation and removal of baseline drift for the automated diagnosis of electrocardiogram (심전도 자동 진단을 위한 기저선 동요 평가 및 제거에 관한 연구)

  • 권혁제;이명호
    • Journal of the Korean Institute of Telematics and Electronics B / v.33B no.7 / pp.99-106 / 1996
  • Procedures for estimating and removing baseline drift have been developed using linear interpolation, cubic spline interpolation, and a high-pass filter designed with the bilinear transform. For the linear and cubic spline methods, the baseline was estimated by interpolating through fiducial points in the PQ and TP segments, which are considered isoelectric. For a quantitative validation of the estimation procedure, 4 ECGs with artificial baseline drift were constructed and analyzed using mean square error calculations and amplitude histograms. Real ECGs in a test set from CSE data sets 3 and 4 were also analyzed. Baseline drift detection rules were designed, and a new method for choosing fiducial points was constructed to avoid distortion in cases such as premature ventricular or atrial contractions. From these comparisons, the proposed cubic spline method with PQ and TP segments (CS_PQ & TP) emerged as the most efficient method.
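Cubic spline baseline removal fits a spline through the signal at isoelectric fiducial points and subtracts it from every sample. The pure-Python sketch below uses a natural cubic spline (zero second derivative at the end knots) and takes one sample per fiducial point; the boundary condition, the single-sample knot values, and all names are assumptions, and the paper's fiducial-point selection rules are not reproduced.

```python
from bisect import bisect_right


def natural_cubic_spline(xs, ys):
    """Return s(t), the natural cubic spline through strictly increasing knots xs."""
    n = len(xs) - 1
    h = [xs[i + 1] - xs[i] for i in range(n)]
    m = [0.0] * (n + 1)  # second derivatives; natural spline: m[0] = m[n] = 0
    if n >= 2:
        # Tridiagonal system for the interior second derivatives
        a = [h[i - 1] for i in range(1, n)]
        b = [2.0 * (h[i - 1] + h[i]) for i in range(1, n)]
        c = [h[i] for i in range(1, n)]
        d = [6.0 * ((ys[i + 1] - ys[i]) / h[i] - (ys[i] - ys[i - 1]) / h[i - 1])
             for i in range(1, n)]
        for k in range(1, n - 1):  # Thomas algorithm: forward sweep
            w = a[k] / b[k - 1]
            b[k] -= w * c[k - 1]
            d[k] -= w * d[k - 1]
        sol = [0.0] * (n - 1)  # back substitution
        sol[-1] = d[-1] / b[-1]
        for k in range(n - 3, -1, -1):
            sol[k] = (d[k] - c[k] * sol[k + 1]) / b[k]
        m[1:n] = sol

    def s(t):
        i = min(max(bisect_right(xs, t) - 1, 0), n - 1)  # segment index
        hi = h[i]
        left, right = xs[i + 1] - t, t - xs[i]
        return ((m[i] * left ** 3 + m[i + 1] * right ** 3) / (6.0 * hi)
                + (ys[i] / hi - m[i] * hi / 6.0) * left
                + (ys[i + 1] / hi - m[i + 1] * hi / 6.0) * right)

    return s


def remove_baseline(signal, fiducial_indices):
    """Subtract a spline baseline fitted through the signal at fiducial samples."""
    xs = list(fiducial_indices)
    baseline = natural_cubic_spline(xs, [signal[i] for i in xs])
    return [value - baseline(t) for t, value in enumerate(signal)]


# Synthetic example: linear drift plus two "beats" between fiducial points
drift = [0.5 + 0.01 * t for t in range(41)]
beats = [0.0] * 41
beats[5], beats[15] = 1.0, 2.0
corrected = remove_baseline([d + b for d, b in zip(drift, beats)], [0, 10, 20, 30, 40])
```

In this synthetic case the knots lie on a linear drift, so the spline reproduces the drift exactly and `corrected` recovers the beats up to rounding error.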

Prediction of Non-Genotoxic Carcinogenicity Based on Genetic Profiles of Short Term Exposure Assays

  • Perez, Luis Orlando;Gonzalez-Jose, Rolando;Garcia, Pilar Peral
    • Toxicological Research / v.32 no.4 / pp.289-300 / 2016
  • Non-genotoxic carcinogens are substances that induce tumorigenesis through non-mutagenic mechanisms, and long-term rodent bioassays are required to identify them. Recent studies have shown that transcription profiling can be applied to develop early identifiers of long-term phenotypes. In this study, we used rat liver expression profiles from the NTP (National Toxicology Program, Research Triangle Park, USA) DrugMatrix Database to construct a gene classifier that can distinguish non-genotoxic carcinogens from other chemicals. The model was based on short-term exposure assays (3 days), and training was limited to oxidative stressors, peroxisome proliferators, and hormone modulators. The predictor was validated on independent toxicogenomic data (TG-GATEs, Toxicogenomics Project-Genomics Assisted Toxicity Evaluation System, Osaka, Japan). To build the model, we used Random Forests together with a recursive variable elimination algorithm (VarSelRF). Gene set enrichment analysis was employed for functional interpretation. A total of 770 microarrays comprising 96 different compounds were analyzed, and a predictor of 54 genes was built. Prediction accuracy was 0.85 in the training set and 0.87 in the test set, and in the validation set it increased with dose: 0.6 at low dose, 0.7 at medium dose, and 0.81 at high dose. Pathway analysis highlighted genes involved in cellular respiration, energy production, and lipoprotein metabolism. The main goal of toxicogenomics is to accurately predict the toxicity of unknown drugs. In this analysis, we present a classifier that can predict non-genotoxic carcinogenicity from short-term exposure assays. In this approach, dose level is critical when evaluating chemicals at early time points.
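Recursive variable elimination with random forests repeatedly fits a model, drops a fraction of the least important variables, and keeps the variable set with the best error. The pure-Python sketch below shows only that elimination loop: the forest is abstracted as a caller-supplied `fit` function returning an error and per-feature importances, importances are recomputed at every step, and the smallest set reaching the minimum error is chosen (a simplification, not the exact selection rule of VarSelRF). All names, the 20% drop fraction, and the toy scorer are assumptions.

```python
def recursive_elimination(features, fit, drop_fraction=0.2, min_features=2):
    """Backward elimination driven by a model's variable importances.

    fit(features) -> (error, {feature: importance}); a random forest's
    out-of-bag error and importances would be plugged in here.
    """
    current = list(features)
    history = []
    while True:
        error, importance = fit(current)
        history.append((error, list(current)))
        if len(current) <= min_features:
            break
        n_drop = max(1, int(len(current) * drop_fraction))
        ranked = sorted(current, key=lambda f: importance[f], reverse=True)
        current = ranked[:max(min_features, len(current) - n_drop)]
    best_error = min(error for error, _ in history)
    # Smallest feature set that reaches the minimum error
    return min((fs for error, fs in history if error == best_error), key=len)


def toy_fit(features):
    """Hypothetical scorer: g0-g2 are informative, the rest add a little error."""
    informative = {"g0", "g1", "g2"}
    missing = len(informative - set(features))
    noise = sum(1 for f in features if f not in informative)
    importance = {f: 1.0 if f in informative else int(f[1:]) / 10 for f in features}
    return 0.1 * missing + 0.01 * noise, importance


selected = recursive_elimination([f"g{i}" for i in range(10)], toy_fit)
print(sorted(selected))
```

In practice the error inside `fit` should come from out-of-bag or held-out predictions; scoring on the training samples would make every subset look better than it is.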

Application of Text-Classification Based Machine Learning in Predicting Psychiatric Diagnosis (텍스트 분류 기반 기계학습의 정신과 진단 예측 적용)

  • Pak, Doohyun;Hwang, Mingyu;Lee, Minji;Woo, Sung-Il;Hahn, Sang-Woo;Lee, Yeon Jung;Hwang, Jaeuk
    • Korean Journal of Biological Psychiatry / v.27 no.1 / pp.18-26 / 2020
  • Objectives The aim was to find effective vectorization and classification models for predicting a psychiatric diagnosis from text-based medical records. Methods Electronic medical records (n = 494) describing present illness were collected retrospectively from inpatient admission notes with three diagnoses: major depressive disorder, type 1 bipolar disorder, and schizophrenia. The data were split into 400 training records and 94 independent validation records. The data were vectorized with two different models, term frequency-inverse document frequency (TF-IDF) and Doc2vec. Machine learning classification models, including stochastic gradient descent, logistic regression, support vector classification, and deep learning (DL), were applied to predict the three psychiatric diagnoses. Five-fold cross-validation was used to find an effective model. Accuracy, precision, recall, and F1-score were measured to compare the models. Results Five-fold cross-validation on the training data showed that the DL model with Doc2vec was the most effective model for predicting the diagnosis (accuracy = 0.87, F1-score = 0.87). However, these metrics decreased on the independent test data set with the final DL models (accuracy = 0.79, F1-score = 0.79), while the logistic regression and support vector machine models with Doc2vec performed slightly better (accuracy = 0.80, F1-score = 0.80) than the DL models with Doc2vec and the other models with TF-IDF. Conclusions The current results suggest that vectorization may have more impact on classification performance than the choice of machine learning model. However, the data set had several limitations, including small sample size, class imbalance, and limited generalizability. In this regard, research with multiple sites and larger samples is needed to improve the machine learning models.
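TF-IDF weights a term by how often it occurs in a record, discounted by how many records contain it, so words common to every note carry no weight. The sketch below uses the plain textbook form, tf = count / record length and idf = log(N / df); library implementations often add smoothing, so values will differ. The tokens are hypothetical and every record must be non-empty.

```python
import math
from collections import Counter


def tf_idf(docs):
    """docs: list of non-empty token lists. Returns one {term: weight} dict per document.

    tf = term count / document length; idf = log(N / document frequency), unsmoothed.
    """
    n_docs = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(doc))  # count each term once per document
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        length = len(doc)
        vectors.append({term: (count / length) * math.log(n_docs / df[term])
                        for term, count in counts.items()})
    return vectors


# Hypothetical tokenized admission-note fragments
docs = [
    ["patient", "low", "mood"],
    ["patient", "hears", "voices"],
    ["patient", "elevated", "mood"],
]
for vector in tf_idf(docs):
    print(vector)
```

Here "patient" appears in every record and gets weight 0, while rarer words such as "voices" get the highest weights; these sparse vectors are what a downstream classifier such as logistic regression would receive.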

Modelling the Effects of Temperature and Photoperiod on Phenology and Leaf Appearance in Chrysanthemum (온도와 일장에 따른 국화의 식물계절과 출엽 예측 모델 개발)

  • Seo, Beom-Seok;Pak, Ha-Seung;Lee, Kyu-Jong;Choi, Doug-Hwan;Lee, Byun-Woo
    • Korean Journal of Agricultural and Forest Meteorology / v.18 no.4 / pp.253-263 / 2016
  • Chrysanthemum production would benefit from crop growth simulations that support decision-making in crop management. Chrysanthemum is a typical short-day plant whose floral initiation and development are sensitive to photoperiod. We developed a model to predict the phenological development and leaf appearance of chrysanthemum (cv. Baekseon) using daylength (including the civil twilight period), air temperature, and management options such as light interruption and ethylene treatment as predictor variables. Chrysanthemum development stage (DVS) was divided into juvenile (ending at DVS = 1.0), juvenile-to-budding (ending at DVS = 1.33), and budding-to-flowering (ending at DVS = 2.0) phases, and different strategies and variables were used to predict development toward the end of each phenophase. The juvenile phase was assumed to be complete at a certain leaf number, estimated at 15.5, which increased with ethylene application to the mother plant before cutting and to the transplanted plant after cutting. After the juvenile phase, the development rates (DVR) toward budding and flowering were calculated from temperature and daylength response functions, and budding and flowering were completed when the integrated DVR reached 1.33 and 2.0, respectively. In addition, the model assumed that leaf appearance terminates just before budding. The model predicted budding date, flowering date, and leaf appearance with acceptable accuracy and precision, not only for the calibration data set but also for a validation data set independent of it.
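After the juvenile phase, the model integrates a daily development rate until the development stage crosses the budding and flowering thresholds. The sketch below shows only that integration step with the thresholds from the abstract; the temperature and daylength response functions are not given there, so `dvr` is a caller-supplied placeholder, and the function name and the constant rate in the example are hypothetical.

```python
def simulate_phenology(daily_weather, dvr, start_dvs=1.0,
                       budding_dvs=1.33, flowering_dvs=2.0):
    """Integrate a daily development rate from the end of the juvenile phase.

    daily_weather: sequence of (mean_temp_C, daylength_h) per day
    dvr: function (dvs, temp, daylength) -> development per day (placeholder)
    Returns (budding_day, flowering_day) counted from the end of juvenility;
    either is None if the threshold is not reached within the weather series.
    """
    dvs = start_dvs
    budding_day = flowering_day = None
    for day, (temp, daylength) in enumerate(daily_weather, start=1):
        dvs += dvr(dvs, temp, daylength)
        if budding_day is None and dvs >= budding_dvs:
            budding_day = day
        if dvs >= flowering_dvs:
            flowering_day = day
            break
    return budding_day, flowering_day


# Hypothetical constant rate of 1/64 DVS per day under fixed conditions
weather = [(20.0, 11.0)] * 100
print(simulate_phenology(weather, lambda dvs, temp, daylength: 1 / 64))  # (22, 64)
```

Because `dvr` receives the current stage, a realistic version can switch between separate pre-budding and post-budding response functions, mirroring the phase-specific rates described above.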

Using rough set to support arbitrage box spread strategies in KOSPI 200 option markets (러프 집합을 이용한 코스피 200 주가지수옵션 시장에서의 박스스프레드 전략 실증분석 및 거래 전략)

  • Kim, Min-Sik;Oh, Kyong-Joo
    • Journal of the Korean Data and Information Science Society / v.22 no.1 / pp.37-47 / 2011
  • Various investment strategies have been developed for the stock price index option market, and arbitrage strategies in particular are important for keeping the option market efficient. The purpose of this study is to improve the profit of box spread arbitrage by using rough sets and past option trading data. The option trading data were actual exchange tick data from 2001 to 2006. The validation process was carried out after converting the tick data into one-minute intervals. Box spread arbitrage strategies are low-risk but also low-profit. The study back-tests the existing strategy on past data and uses rough sets to limit the time window for trading. The results suggest that controlling the strategy in this way can produce more stable profits with lower risk, earning a higher profit than strategies with the same level of risk.
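A long box spread (buy the lower-strike call, sell the higher-strike call, buy the higher-strike put, sell the lower-strike put) pays the strike difference at expiry whatever the index level, assuming European-style exercise as for KOSPI 200 options. An arbitrage signal therefore compares the net premium with the discounted payoff. The sketch below assumes continuous discounting and a flat per-leg cost; the function name, prices, and cost figures are hypothetical, and the paper's rough-set timing rules are not reproduced.

```python
import math


def box_spread_edge(k_low, k_high, call_low, call_high, put_low, put_high,
                    rate, years, cost_per_leg=0.0):
    """Edge of a long box: discounted payoff minus net premium and costs.

    Long box = buy call(k_low), sell call(k_high), buy put(k_high), sell put(k_low).
    At expiry it pays k_high - k_low. A positive edge suggests buying the box;
    a sufficiently negative one suggests selling it.
    """
    net_premium = call_low - call_high + put_high - put_low
    payoff_pv = (k_high - k_low) * math.exp(-rate * years)
    return payoff_pv - net_premium - 4 * cost_per_leg


# Hypothetical quotes, in index points
print(box_spread_edge(100, 105, call_low=4.0, call_high=1.5,
                      put_low=0.8, put_high=3.0, rate=0.0, years=0.25))
```

With a per-leg cost of 0.1 index points the same quotes yield a negative edge, which illustrates why box spread arbitrage is low-profit once transaction costs are included.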