• Title/Abstract/Keywords: Model validation

Search results: 3,278 items (processing time: 0.029 seconds)

Feature Selection with Ensemble Learning for Prostate Cancer Prediction from Gene Expression

  • Abass, Yusuf Aleshinloye;Adeshina, Steve A.
    • International Journal of Computer Science & Network Security
    • /
    • Vol. 21, No. 12spc
    • /
    • pp.526-538
    • /
    • 2021
  • Machine and deep learning-based models are emerging techniques for addressing prediction problems in biomedical data analysis. DNA sequence prediction is a critical problem that has attracted a great deal of attention in the biomedical domain. Machine and deep learning-based models have been shown to provide more accurate results than conventional regression-based models. Predicting the gene sequences that lead to cancerous diseases, such as prostate cancer, is crucial, and identifying the most important features in a gene sequence is a challenging task. Extracting the components of the gene sequence that give insight into the types of mutation in the gene is of great importance, as it will lead to effective drug design and promote the new concept of personalised medicine. In this work, we extracted the exons in the prostate gene sequences used in the experiment. We built a Deep Neural Network (DNN) model and a Bi-directional Long Short-Term Memory (Bi-LSTM) model using k-mer encoding for the DNA sequence and one-hot encoding for the class label. The models were evaluated using different classification metrics. Our experimental results show that the DNN model achieves a training accuracy of 99 percent and a validation accuracy of 96 percent. The Bi-LSTM model achieves a training accuracy of 95 percent and a validation accuracy of 91 percent.
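
The abstract above mentions k-mer encoding for DNA sequences and one-hot encoding for class labels. A minimal sketch of those two encodings follows. The value k=3, the stride of 1, the integer vocabulary, and the label names "normal" and "cancer" are all illustrative assumptions; the abstract does not state the authors' actual settings.

```python
from itertools import product


def kmers(sequence, k=3):
    """Split a DNA sequence into overlapping k-mers (stride 1, an assumption)."""
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]


def build_vocab(k=3, alphabet="ACGT"):
    """Map every possible k-mer to an integer index; 0 is reserved for unknowns."""
    return {"".join(p): i + 1 for i, p in enumerate(product(alphabet, repeat=k))}


def encode(sequence, vocab, k=3):
    """Turn a sequence into a list of k-mer indices suitable for an embedding layer."""
    return [vocab.get(km, 0) for km in kmers(sequence.upper(), k)]


def one_hot(label, classes):
    """One-hot encode a class label against a fixed, ordered list of classes."""
    return [1 if c == label else 0 for c in classes]


vocab = build_vocab(k=3)                           # 4**3 = 64 possible 3-mers
tokens = kmers("ATGCGT", k=3)                      # ['ATG', 'TGC', 'GCG', 'CGT']
ids = encode("ATGCGT", vocab)                      # integer ids for the tokens
target = one_hot("cancer", ["normal", "cancer"])   # hypothetical label names
```

The integer ids would typically feed an embedding layer ahead of a DNN or Bi-LSTM; that wiring is omitted here because the abstract gives no architectural details.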

Development and validation of diffusion based CFD model for modelling of hydrogen and carbon monoxide recombination in passive autocatalytic recombiner

  • Bhuvaneshwar Gera;Vishnu Verma;Jayanta Chattopadhyay
    • Nuclear Engineering and Technology
    • /
    • Vol. 55, No. 9
    • /
    • pp.3194-3201
    • /
    • 2023
  • In water-cooled power reactors, hydrogen is generated by the steam-zirconium reaction during severe accident conditions; later, after reactor pressure vessel failure, CO is also generated during molten corium-concrete interaction. Passive Autocatalytic Recombiners (PARs) are provided in the containment for hydrogen management. The performance of PARs in the presence of hydrogen and carbon monoxide along with air has been evaluated. Depending on the conditions, CO may either react with oxygen to form carbon dioxide (CO2) or act as a catalyst poison, reducing the catalyst activity and hence the hydrogen conversion efficiency. CFD analysis has been carried out to determine the effect of CO on catalyst plate temperature for 2 and 4% v/v H2 and 1-4% v/v CO with air at the recombiner inlet for a reported experiment. The results of the CFD simulations have been compared with the reported experimental data for model validation. The reaction at the recombiner plate is modelled based on diffusion theory. The developed CFD model has been used to predict the maximum catalyst temperature and outlet species concentrations for different inlet velocities and temperatures of the gas mixture. The results were used to fit a correlation for the removal rate of carbon monoxide inside the PAR as a function of inlet velocity and concentrations.

Simulation and validation of flash flood in the head-water catchments of the Geum river basin

  • Duong, Ngoc Tien;Kim, Jeong Bae;Bae, Deg-Hyo
    • 한국수자원학회:학술대회논문집 (Proceedings of the Korea Water Resources Association Conference)
    • /
    • 2021 Conference of the Korea Water Resources Association
    • /
    • pp.138-138
    • /
    • 2021
  • Flash floods are a type of natural hazard with severe consequences. They cause high mortality, about 5,000 deaths a year worldwide. Flash floods usually occur in mountainous areas when the soil is highly saturated and heavy rainfall falls in a short period of time. The magnitude of a flash flood depends on several natural and human factors, including rainfall duration and intensity, antecedent soil moisture conditions, land cover, soil type, watershed characteristics, and land use. Among these, rainfall intensity and antecedent soil moisture play the most important roles, in that order. Flash Flood Guidance (FFG) is the amount of rainfall of a given duration over a small stream basin needed to create minor flooding (bank-full) conditions at the outlet of the stream basin. In this study, the Sejong University Rainfall-Runoff model (SURR model) was used to calculate soil moisture along with FFG in order to identify flash flood events for the Geum basin. The Geum river basin was divided into 177 head-water catchments with an average area of 38 km². The soil moisture of each head-water catchment is assumed to be the same as that of its sub-basin. The threshold of flash flood generation was determined using the GIUH method. Finally, the flash flood events were used to verify the FFG. The validation results for seven past independent flash flood events are very satisfactory.

오염원 산정단위 수준의 소유역 세분화를 고려한 새만금유역 수문·수질모델링 적용성 검토 (Developing Surface Water Quality Modeling Framework Considering Spatial Resolution of Pollutant Load Estimation for Saemangeum Using HSPF)

  • 성충현;황세운;오찬성;조재필
    • 한국농공학회논문집 (Journal of the Korean Society of Agricultural Engineers)
    • /
    • Vol. 59, No. 3
    • /
    • pp.83-96
    • /
    • 2017
  • This study presents a surface water quality modeling framework that considers the spatial resolution of pollutant load estimation to better represent stream water quality characteristics in the Saemangeum watershed, where keeping water resources sustainable has been a focus since construction of the Saemangeum embankment. The watershed was delineated into 804 sub-watersheds in total, based on the administrative districts (the units for pollutant load estimation, numbering 739 in the watershed), a Digital Elevation Model (DEM), and agricultural structures such as drainage canals. The established model consists of 7 Mangyung (MG) sub-models, 7 Dongjin (DJ) sub-models, and 3 Reclaimed sub-models, which were simulated in sequence from upstream to downstream according to their connectivity. Hydrologic calibration and validation were conducted at 14 flow stations for the period of 2009 and 2013 using an automatic calibration scheme. At these stations, the Nash-Sutcliffe efficiency coefficient (NSE) ranged from 0.66 to 0.97, PBIAS from -31.0 to 16.5%, and $R^2$ from 0.75 to 0.98 at a monthly time step, indicating that the model is hydrologically applicable to the watershed. Water quality calibration and validation were conducted at 29 stations for DO, BOD, TN, and TP over the same period as the flow calibration. The water quality model was calibrated manually and was generally applicable, producing reasonable variability and seasonality, although some anomalous simulation results were identified at some upstream stations under low-flow conditions. The spatial subdivision in the model framework was compared with previous studies to assess the effect of considering administrative boundaries in watershed delineation; this study performed better for flow but showed a similar level of model performance for water quality. The framework presented here is applicable to regional-scale watersheds as well as to cases that require fine-resolution simulation.
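
The abstract above reports NSE, PBIAS, and $R^2$ as goodness-of-fit measures. The sketch below shows one common way to compute them. The PBIAS sign convention used here (positive values mean the simulation underestimates the observations) is an assumption, since conventions differ between studies and the abstract does not say which one was used. The flow values are made up for illustration and are not from the paper.

```python
def nse(obs, sim):
    """Nash-Sutcliffe efficiency: 1 is a perfect fit, 0 matches the observed mean."""
    mean_obs = sum(obs) / len(obs)
    residual = sum((o - s) ** 2 for o, s in zip(obs, sim))
    variance = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - residual / variance


def pbias(obs, sim):
    """Percent bias; here positive means underestimation (assumed convention)."""
    return 100.0 * sum(o - s for o, s in zip(obs, sim)) / sum(obs)


def r_squared(obs, sim):
    """Coefficient of determination as the squared Pearson correlation."""
    n = len(obs)
    mean_obs, mean_sim = sum(obs) / n, sum(sim) / n
    cov = sum((o - mean_obs) * (s - mean_sim) for o, s in zip(obs, sim))
    var_obs = sum((o - mean_obs) ** 2 for o in obs)
    var_sim = sum((s - mean_sim) ** 2 for s in sim)
    return cov * cov / (var_obs * var_sim)


# Hypothetical monthly flows, for illustration only.
observed = [10.0, 20.0, 30.0, 40.0]
simulated = [12.0, 18.0, 33.0, 37.0]

scores = {
    "NSE": nse(observed, simulated),
    "PBIAS": pbias(observed, simulated),
    "R2": r_squared(observed, simulated),
}
```

Note that NSE and $R^2$ can disagree: a simulation that is perfectly correlated with the observations but consistently biased keeps a high $R^2$ while NSE drops, which is one reason studies often report PBIAS alongside them.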

강우-유출 모의를 위한 개념적 모형과 기계학습 모형의 성능 비교 (A comparative study of conceptual model and machine learning model for rainfall-runoff simulation)

  • 이승철;김대하
    • 한국수자원학회논문집 (Journal of Korea Water Resources Association)
    • /
    • Vol. 56, No. 9
    • /
    • pp.563-574
    • /
    • 2023
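  • Because climate change is altering how catchments respond to meteorological inputs, research on rainfall-runoff simulation is becoming increasingly important. Rainfall-runoff simulation with machine learning is also growing rapidly amid strong interest in such techniques, but it remains unclear whether machine learning models are more useful than the conceptual models that have traditionally been used. In this study, the performance of GR6J, a conceptual model, and Random Forest, a machine learning model, was evaluated for 38 gauged catchments across South Korea using prediction approaches for both gauged and ungauged catchments. First, to evaluate the gauged-catchment approach, each model was trained on observed daily streamflow data, and simulation performance was compared over a separate evaluation period. Then, to evaluate simulation performance for ungauged catchments, a proximity-based regionalization method was assessed using Leave-One-Out Cross-Validation (LOOCV). In the gauged-catchment evaluation, Random Forest consistently outperformed GR6J. This is attributed to the fact that machine learning techniques, which are structured to reproduce the training data as output, achieve higher reproducibility than models built on conceptual theory. However, this performance was not reproduced in the ungauged-catchment prediction, where Random Forest was found to perform worse than GR6J. This study suggests that machine learning models may be highly applicable to runoff prediction in gauged catchments but may be less applicable to ungauged catchments than traditional conceptual models.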
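
The ungauged-catchment evaluation in the abstract above combines proximity-based regionalization with LOOCV: each catchment is treated in turn as ungauged and receives information from the remaining catchments. The sketch below illustrates that loop under strong simplifying assumptions: a single nearest donor chosen by Euclidean distance, one scalar parameter per catchment, and a placeholder `evaluate` function. None of these details come from the paper, and the catchment names and values are invented.

```python
from math import hypot


def nearest_donor(target, donors):
    """Pick the donor catchment closest to the target (Euclidean distance)."""
    return min(
        donors,
        key=lambda d: hypot(d["x"] - target["x"], d["y"] - target["y"]),
    )


def loocv_regionalization(catchments, evaluate):
    """Treat each catchment as ungauged once and score the transferred parameters."""
    scores = {}
    for i, target in enumerate(catchments):
        donors = catchments[:i] + catchments[i + 1:]  # leave the target out
        donor = nearest_donor(target, donors)
        scores[target["name"]] = evaluate(target, donor["params"])
    return scores


# Hypothetical catchments; "params" stands in for a calibrated parameter set.
catchments = [
    {"name": "A", "x": 0.0, "y": 0.0, "params": 1.0},
    {"name": "B", "x": 1.0, "y": 0.0, "params": 2.0},
    {"name": "C", "x": 5.0, "y": 0.0, "params": 6.0},
]


def evaluate(target, donor_params):
    """Placeholder skill score: negative parameter mismatch (higher is better)."""
    return -abs(target["params"] - donor_params)


scores = loocv_regionalization(catchments, evaluate)
```

In a real study, `evaluate` would run the rainfall-runoff model on the held-out catchment with the transferred parameters (or the model trained on donor data) and return a skill score computed against that catchment's observed flow.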

다목적함수를 이용한 PDM 모형의 유량 분석 (Prediction of Stream Flow on Probability Distributed Model using Multi-objective Function)

  • 안상억;이효상;전민우
    • 한국방재학회 논문집 (Journal of the Korean Society of Hazard Mitigation)
    • /
    • Vol. 9, No. 5
    • /
    • pp.93-102
    • /
    • 2009
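  • This study examined the streamflow simulation performance of the Probability Distributed Model (PDM) for the Mihocheon catchment by applying a multi-objective function that accounts for detailed characteristics of the hydrograph. PDM is a lumped rainfall-runoff model that conceptualizes the catchment as a single unit and is widely used in regionalization studies and flood estimation methods in the United Kingdom. The five PDM parameters were analyzed with the Monte Carlo Analysis Toolkit (MCAT), a Monte Carlo-based analysis tool, to examine posterior calibration distributions, identifiability, and sensitivity. Of the model parameters, only cmax and k(q) were clearly identifiable, while the remaining parameters were affected by equifinality. In addition, Pareto-optimal parameter sets were derived by considering the trade-off between objective functions tailored to the high-flow and low-flow characteristics of the hydrograph, demonstrating the possibility of streamflow estimates that satisfy all objectives as far as possible. For the calibration period, the model showed stable and satisfactory simulation performance with NS*E = 0.035, FSB = 0.161, and FDBH = 0.809, and it also showed stable simulation performance for the validation period.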

KC-100 항공기 무인화를 위한 운동모델 구축 및 검증 (Development and Validation of Dynamic Model for KC-100 UAS)

  • 김성현;김지본;이정훈;김응태;김병수
    • 항공우주시스템공학회지 (Journal of Aerospace System Engineering)
    • /
    • Vol. 17, No. 1
    • /
    • pp.79-87
    • /
    • 2023
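  • Designing flight control laws for an aircraft requires an accurate aircraft dynamic model, and acquiring the aerodynamic database (DB) needed to build one generally requires a large number of wind tunnel tests. However, when flight test data for the target aircraft exist, as in the conversion of a manned aircraft to unmanned operation, the dynamic model can be obtained through parameter estimation and a DB tuning procedure. This paper describes the process of building a nonlinear model for the unmanned conversion of the KC-100 aircraft and the method used to validate it. It presents the flight data validity analysis used to judge whether the estimation technique is applicable, linear model estimation using the maximum likelihood method, the aerodynamic DB tuning process that uses those results, and the validation results obtained by applying FFS criteria to the final nonlinear model.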

Deep learning for the classification of cervical maturation degree and pubertal growth spurts: A pilot study

  • Mohammad-Rahimi, Hossein;Motamadian, Saeed Reza;Nadimi, Mohadeseh;Hassanzadeh-Samani, Sahel;Minabi, Mohammad A. S.;Mahmoudinia, Erfan;Lee, Victor Y.;Rohban, Mohammad Hossein
    • 대한치과교정학회지 (The Korean Journal of Orthodontics)
    • /
    • Vol. 52, No. 2
    • /
    • pp.112-122
    • /
    • 2022
  • Objective: This study aimed to present and evaluate a new deep learning model for determining cervical vertebral maturation (CVM) degree and growth spurts by analyzing lateral cephalometric radiographs. Methods: The study sample included 890 cephalograms. The images were classified into six cervical stages independently by two orthodontists. The images were also categorized into three degrees on the basis of the growth spurt: pre-pubertal, growth spurt, and post-pubertal. Subsequently, the samples were fed to a transfer learning model implemented using the Python programming language and the PyTorch library. In the last step, the test set of cephalograms was randomly coded and provided to two new orthodontists in order to compare their diagnoses with the artificial intelligence (AI) model's performance using weighted kappa and Cohen's kappa statistical analyses. Results: The model's validation and test accuracy for the six-class CVM diagnosis were 62.63% and 61.62%, respectively. Moreover, the model's validation and test accuracy for the three-class classification were 75.76% and 82.83%, respectively. Furthermore, substantial agreement was observed between the two orthodontists as well as between one of them and the AI model. Conclusions: The newly developed AI model had reasonable accuracy in detecting the CVM stage and high reliability in detecting the pubertal stage. However, its accuracy was still lower than that of human observers. With further improvements in data quality, this model should be able to provide practical assistance to practicing dentists in the future.
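
The abstract above compares raters using Cohen's kappa and weighted kappa. The sketch below computes both from two lists of ordinal ratings. Linear weights are an assumption (the abstract does not say whether linear or quadratic weights were used), and the stage labels and ratings are invented for illustration.

```python
from collections import Counter


def cohen_kappa(rater1, rater2, categories, weighted=False):
    """Cohen's kappa; with weighted=True, linear disagreement weights are applied."""
    n = len(rater1)
    k = len(categories)
    index = {c: i for i, c in enumerate(categories)}

    def disagreement(i, j):
        if weighted:
            return abs(i - j) / (k - 1)  # linear weights (assumed choice)
        return 0.0 if i == j else 1.0

    # Observed disagreement, averaged over all rated items.
    observed = sum(
        disagreement(index[a], index[b]) for a, b in zip(rater1, rater2)
    ) / n

    # Disagreement expected by chance from each rater's marginal frequencies.
    m1, m2 = Counter(rater1), Counter(rater2)
    expected = sum(
        disagreement(index[a], index[b]) * m1[a] * m2[b]
        for a in categories
        for b in categories
    ) / (n * n)

    return 1.0 - observed / expected


# Hypothetical stage ratings from two raters on four images.
stages = [1, 2, 3]
rater_a = [1, 2, 3, 1]
rater_b = [1, 2, 3, 2]

kappa_plain = cohen_kappa(rater_a, rater_b, stages)
kappa_weighted = cohen_kappa(rater_a, rater_b, stages, weighted=True)
```

With ordinal categories such as maturation stages, the weighted form penalizes a one-stage miss less than a two-stage miss, so it is usually higher than the unweighted value when disagreements are mostly between adjacent stages.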

Clinical Validation of a Deep Learning-Based Hybrid (Greulich-Pyle and Modified Tanner-Whitehouse) Method for Bone Age Assessment

  • Kyu-Chong Lee;Kee-Hyoung Lee;Chang Ho Kang;Kyung-Sik Ahn;Lindsey Yoojin Chung;Jae-Joon Lee;Suk Joo Hong;Baek Hyun Kim;Euddeum Shim
    • Korean Journal of Radiology
    • /
    • Vol. 22, No. 12
    • /
    • pp.2017-2025
    • /
    • 2021
  • Objective: To evaluate the accuracy and clinical efficacy of a hybrid Greulich-Pyle (GP) and modified Tanner-Whitehouse (TW) artificial intelligence (AI) model for bone age assessment. Materials and Methods: A deep learning-based model was trained on an open dataset of multiple ethnicities. A total of 102 hand radiographs (51 male and 51 female; mean age ± standard deviation = 10.95 ± 2.37 years) from a single institution were selected for external validation. Three human experts performed bone age assessments based on the GP atlas to develop a reference standard. Two study radiologists performed bone age assessments with and without AI model assistance in two separate sessions, for which the reading time was recorded. The performance of the AI software was assessed by comparing the mean absolute difference between the AI-calculated bone age and the reference standard. The reading time was compared between reading with and without AI using a paired t test. Furthermore, the reliability between the two study radiologists' bone age assessments was assessed using intraclass correlation coefficients (ICCs), and the results were compared between reading with and without AI. Results: The bone ages assessed by the experts and the AI model were not significantly different (11.39 ± 2.74 years and 11.35 ± 2.76 years, respectively, p = 0.31). The mean absolute difference was 0.39 years (95% confidence interval, 0.33-0.45 years) between the automated AI assessment and the reference standard. The mean reading time of the two study radiologists was reduced from 54.29 to 35.37 seconds with AI model assistance (p < 0.001). The ICC of the two study radiologists slightly increased with AI model assistance (from 0.945 to 0.990). Conclusion: The proposed AI model was accurate for assessing bone age. Furthermore, this model appeared to enhance the clinical efficacy by reducing the reading time and improving the inter-observer reliability.

Machine Learning Prediction for the Recurrence After Electrical Cardioversion of Patients With Persistent Atrial Fibrillation

  • Soonil Kwon;Eunjung Lee;Hojin Ju;Hyo-Jeong Ahn;So-Ryoung Lee;Eue-Keun Choi;Jangwon Suh;Seil Oh;Wonjong Rhee
    • Korean Circulation Journal
    • /
    • Vol. 53, No. 10
    • /
    • pp.677-689
    • /
    • 2023
  • Background and Objectives: There is limited evidence regarding machine-learning prediction for the recurrence of atrial fibrillation (AF) after electrical cardioversion (ECV). This study aimed to predict the recurrence of AF after ECV using machine learning of clinical features and electrocardiograms (ECGs) in persistent AF patients. Methods: We analyzed patients who underwent successful ECV for persistent AF. Machine learning was designed to predict patients with 1-month recurrence. Individual 12-lead ECGs were collected before and after ECV. Various clinical features were collected and used to train the extreme gradient boost (XGBoost)-based model. Ten-fold cross-validation was used to evaluate the performance of the model. The performance was compared to the C-statistics of the selected clinical features. Results: Among 718 patients (mean age 63.5±9.3 years, men 78.8%), AF recurred in 435 (60.6%) patients after 1 month. With the XGBoost-based model, the areas under the receiver operating characteristic curves (AUROCs) were 0.57, 0.60, and 0.63 when the model was trained on clinical features, ECGs, and both (the final model), respectively. For the final model, the sensitivity, specificity, and F1-score were 84.7%, 28.2%, and 0.73, respectively. Although the AF duration showed the best predictive performance (AUROC, 0.58) among the clinical features, it was significantly lower than that of the final machine-learning model (p<0.001). Additional training on extended monitoring data from 15-minute single-lead ECG and photoplethysmography in available patients (n=261) did not significantly improve the model's performance. Conclusions: Machine learning showed modest performance in predicting AF recurrence after ECV in persistent AF patients, warranting further validation studies.
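
The abstract above reports sensitivity, specificity, and F1-score together. The sketch below shows how the three follow from confusion-matrix counts. The counts are hypothetical and are not taken from the study; they were chosen only to show how high sensitivity can coexist with low specificity.

```python
def classification_metrics(tp, fp, tn, fn):
    """Derive sensitivity, specificity, precision, and F1 from confusion counts."""
    sensitivity = tp / (tp + fn)      # share of true recurrences that were flagged
    specificity = tn / (tn + fp)      # share of non-recurrences correctly cleared
    precision = tp / (tp + fp)        # share of flagged cases that truly recurred
    f1 = 2 * tp / (2 * tp + fp + fn)  # harmonic mean of precision and sensitivity
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "f1": f1,
    }


# Hypothetical counts for illustration only.
metrics = classification_metrics(tp=8, fp=7, tn=3, fn=2)
```

Because F1 ignores true negatives, a model that labels most patients as positive can keep a respectable F1 while its specificity stays low, which is why reporting all three alongside the AUROC gives a fuller picture.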