• Title/Summary/Keyword: Bayesian model


Mixture distribution based nonstationary frequency model using climate variables (기후 변수를 이용한 혼합분포 기반 비정상성 빈도 모델)

  • Choi, Hong-Geun;Kim, Jang-Gyeong;Kwon, Hyun-Han
    • Proceedings of the Korea Water Resources Association Conference / 2019.05a / pp.338-338 / 2019
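  • Design rainfall is generally estimated by frequency analysis of extreme-value data under a stationarity assumption. However, existing extreme rainfall records that were assumed to be stationary often exhibit nonstationary characteristics that stationary frequency analysis models cannot capture effectively. In addition, most extreme rainfall distributions take the form of a mixture distribution with two peaks, because rainfall-generating factors such as floods and typhoons occur at different magnitudes from year to year. This study therefore proposes a mixture distribution based nonstationary frequency model (MDNF). Climate variables (e.g., SSTs and SLPs) were used as inputs to the proposed model to analyze their influence on the mixing ratio of extreme rainfall composed of two distributions, and the extreme rainfall pattern was confirmed to be influenced by specific climate variables. Finally, a Bayesian approach was coupled with the MDNF model to quantify uncertainty intervals for the parameters of the distribution corresponding to each peak. The results indicate that variability in rainfall patterns affects design rainfall estimates, and by confirming that rainfall patterns are correlated with specific climate variables, the study is expected to provide an important basis for rational design rainfall estimation.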
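
The idea of a two-component mixture whose mixing ratio depends on a climate covariate can be sketched as follows. This is only an illustration under stated assumptions: the abstract does not name the component distributions or the link function, so the Gumbel components, the logistic link, and all parameter values here are hypothetical, and the Bayesian estimation step is not shown.

```python
import math

def gumbel_pdf(x, loc, scale):
    """Gumbel (maximum) density; assumed here as the component family."""
    z = (x - loc) / scale
    return math.exp(-(z + math.exp(-z))) / scale

def mixing_ratio(covariate, b0, b1):
    """Assumed logistic link from a climate covariate to a weight in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-(b0 + b1 * covariate)))

def mixture_pdf(x, covariate, comp1, comp2, b0, b1):
    """Density w * f1(x) + (1 - w) * f2(x), with w set by the covariate."""
    w = mixing_ratio(covariate, b0, b1)
    return w * gumbel_pdf(x, *comp1) + (1.0 - w) * gumbel_pdf(x, *comp2)

# Hypothetical (loc, scale) pairs in mm for the two rainfall regimes.
component_1 = (100.0, 20.0)
component_2 = (250.0, 40.0)

# Density of a 150 mm event when the (hypothetical) covariate equals 0.8.
density = mixture_pdf(150.0, 0.8, component_1, component_2, b0=0.0, b1=1.5)
```

Because the weight changes with the covariate, the same rainfall depth can have a different exceedance probability in different years, which is what makes the frequency model nonstationary.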


Computational analysis of SARS-CoV-2, SARS-CoV, and MERS-CoV genome using MEGA

  • Sohpal, Vipan Kumar
    • Genomics & Informatics / v.18 no.3 / pp.30.1-30.7 / 2020
  • The novel coronavirus pandemic originated in China and spread throughout the world within three months. The genomes of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) and its predecessors, severe acute respiratory syndrome coronavirus (SARS-CoV) and Middle East respiratory syndrome coronavirus (MERS-CoV), play an important role in understanding genetic variation. In this paper, genomic data obtained from the National Center for Biotechnology Information (NCBI) were analyzed statistically with Molecular Evolutionary Genetics Analysis (MEGA). Firstly, the Bayesian information criterion (BIC) and the corrected Akaike information criterion (AICc) are used to select the best substitution model. Secondly, the maximum likelihood method is used to estimate the transition/transversion bias (R) under the Kimura-2, Tamura-3, Hasegawa-Kishino-Yano, and Tamura-Nei nucleotide substitution models. Thirdly, nucleotide frequencies are computed from the NCBI genomic data. The results indicate that the general time reversible model has the lowest BIC and AICc scores, 347,394 and 347,287, respectively. The transition/transversion bias for the nucleotide substitution models varies from 0.56 to 0.59 in the MEGA output. The average frequencies of the nitrogenous bases U, C, A, and G are 31.74%, 19.48%, 28.04%, and 20.74%, respectively. Overall, the genomic data analysis of SARS-CoV-2, SARS-CoV, and MERS-CoV highlights their close genetic relationship.
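
For readers unfamiliar with the model-selection step, the sketch below shows how BIC and AICc are computed from a maximized log-likelihood. It uses the standard textbook definitions and is not MEGA's implementation; the candidate names, log-likelihoods, parameter counts, and sample size are hypothetical.

```python
import math

def aic(log_likelihood: float, k: int) -> float:
    """Akaike information criterion: 2k - 2 ln L."""
    return 2 * k - 2 * log_likelihood

def aicc(log_likelihood: float, k: int, n: int) -> float:
    """AIC plus the small-sample correction 2k(k+1)/(n-k-1)."""
    if n - k - 1 <= 0:
        raise ValueError("AICc requires n > k + 1")
    return aic(log_likelihood, k) + (2 * k * (k + 1)) / (n - k - 1)

def bic(log_likelihood: float, k: int, n: int) -> float:
    """Bayesian information criterion: k ln n - 2 ln L."""
    return k * math.log(n) - 2 * log_likelihood

# Hypothetical candidates: (name, maximized log-likelihood, free parameters).
candidates = [("model_A", -1050.0, 3), ("model_B", -1040.0, 8)]
n_sites = 500  # hypothetical sample size

# The preferred model is the one with the lowest score under each criterion.
best_by_bic = min(candidates, key=lambda m: bic(m[1], m[2], n_sites))
best_by_aicc = min(candidates, key=lambda m: aicc(m[1], m[2], n_sites))
```

With these made-up inputs the two criteria disagree, because BIC penalizes extra parameters more heavily than AICc once the sample is moderately large.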

Online railway wheel defect detection under varying running-speed conditions by multi-kernel relevance vector machine

  • Wei, Yuan-Hao;Wang, You-Wu;Ni, Yi-Qing
    • Smart Structures and Systems / v.30 no.3 / pp.303-315 / 2022
  • The degradation of wheel tread may result in serious hazards in the railway operation system. Therefore, timely wheel defect diagnosis of in-service trains to avoid tragic events is of particular importance. The focus of this study is to develop a novel wheel defect detection approach based on the relevance vector machine (RVM), which enables online detection of potentially defective wheels with trackside monitoring data acquired under different running-speed conditions. With the dynamic strain responses collected by a trackside monitoring system, the cumulative Fourier amplitudes (CFA) characterizing the effect of individual wheels are extracted to formulate multiple probabilistic regression models (MPRMs) in terms of multi-kernel RVM, which accommodate both vibration frequency and running speed as variables. Compared with the general single-kernel RVM-based model, the proposed multi-kernel MPRM approach offers better local and global representation ability and generalization performance, which are prerequisites for reliable wheel defect detection using data acquired under different running-speed conditions. After formulating the MPRMs, we adopt a Bayesian null hypothesis indicator for wheel defect identification and quantification, and the proposed method is demonstrated with real-world monitoring data acquired by an FBG-based trackside monitoring system deployed on a high-speed trial railway. The results confirm the validity of the proposed method for wheel defect detection under different running-speed conditions.

Prediction Model for Gastric Cancer via Class Balancing Techniques

  • Danish, Jamil;Sellappan, Palaniappan;Sanjoy Kumar, Debnath;Muhammad, Naseem;Susama, Bagchi;Asiah, Lokman
    • International Journal of Computer Science & Network Security / v.23 no.1 / pp.53-63 / 2023
  • Many researchers are trying hard to minimize the incidence of cancers, mainly Gastric Cancer (GC). For GC, the five-year survival rate is generally 5-25%, but for Early Gastric Cancer (EGC), it is almost 90%. Predicting the onset of stomach cancer based on risk factors will allow for an early diagnosis and more effective treatment. Although there are several models for predicting stomach cancer, most are built on imbalanced datasets, which biases them toward the majority class. However, it is imperative to correctly identify cancer patients, who are in the minority class. This research aims to apply three class-balancing approaches to the NHS dataset before developing supervised learning strategies: Oversampling (Synthetic Minority Oversampling Technique or SMOTE), Undersampling (SpreadSubsample), and Hybrid System (SMOTE + SpreadSubsample). This study uses Naive Bayes, Bayesian Network, Random Forest, and Decision Tree (C4.5) methods. We measured these classifiers' efficacy using their Receiver Operating Characteristics (ROC) curves, sensitivity, and specificity. Validation data were used to compare the balancing approaches, and the final prediction model was built on the best-performing one.
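
The oversampling step can be illustrated with a minimal SMOTE-style sketch: each synthetic sample is placed on the line segment between a minority sample and one of its nearest minority-class neighbours. This is a simplified illustration rather than the implementation used in the study, and the function name `smote`, its parameters, and the sample points are hypothetical.

```python
import math
import random

def smote(minority, n_new, k=3, seed=0):
    """Create n_new synthetic points by interpolating between a minority
    sample and one of its k nearest minority-class neighbours."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority samples by Euclidean distance to the base.
        others = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment, in [0, 1)
        synthetic.append(
            tuple(b + gap * (q - b) for b, q in zip(base, neighbour))
        )
    return synthetic

# Hypothetical two-feature minority class.
minority = [(1.0, 2.0), (1.5, 1.8), (2.0, 2.2), (1.2, 2.5)]
new_points = smote(minority, n_new=6)
```

Because every synthetic point is an interpolation, it stays inside the region already spanned by the minority class instead of duplicating existing records.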

Nonstationary Surrogate Model for Reference Evapotranspiration Estimation Based on In-situ Temperature Data (온도인자를 활용한 비정상성 기준증발산량 대체모형 개발)

  • Kim, Ho-Jun;Nguyen, Thi Huong;Kang, Dongwon;Kwon, Hyun-Han
    • Proceedings of the Korea Water Resources Association Conference / 2021.06a / pp.96-96 / 2021
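  • Evapotranspiration, one of the hydrometeorological variables, is considered in water resources planning and management and is used in particular as input to water balance models. Meteorological agencies in many countries, including Korea, and international organizations compute evapotranspiration with the FAO56 Penman-Monteith (PM) method rather than by direct observation. The FAO56 PM method estimates reference evapotranspiration from meteorological variables such as radiation, air temperature, humidity, and wind speed, and it is relatively accurate. However, because the FAO56 PM method requires many meteorological variables, it is difficult to build evapotranspiration datasets for some regions, including ungauged basins. Moreover, the characteristics of reference evapotranspiration change over time, so an analysis that accounts for nonstationarity is required. This study develops a temperature-based surrogate model to account for the nonstationarity of reference evapotranspiration. The model was developed for stations in the Han River basin, and its parameters were estimated over time by Bayesian inference to reflect the time-varying characteristics of reference evapotranspiration. The evapotranspiration estimated by the surrogate model was also used to compute the evaporative demand drought index (EDDI). EDDI, which was developed for drought monitoring and early warning, was used to assess flash droughts, which develop faster than conventional droughts. Because the developed model can also be applied in ungauged areas, it is expected to be highly useful in the water resources field.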


Surrogate Model for Potential Evapotranspiration Using a difference in Maximum and Minimum Temperature within a Hargreaves Modeling Framework (온도인자를 활용한 Hargreaves 모형 기반의 잠재증발산량 대체 모형 개발)

  • Kim, Ho Jun;Kim, Tae-Jeong;Lee, Kang Wook;Kwon, Hyun-Han
    • Proceedings of the Korea Water Resources Association Conference / 2020.06a / pp.184-184 / 2020
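  • Quantitative analysis of evapotranspiration is an essential consideration in water resources planning and management. Potential evapotranspiration at daily or shorter time scales is mainly estimated with the FAO56 PM method, which the Food and Agriculture Organization (FAO) developed from the Penman-Monteith method and which offers higher accuracy and applicability than other methods. However, the FAO56 PM method requires a variety of meteorological data as inputs, and building long-term, reliable records is difficult. This study therefore uses the Hargreaves evapotranspiration equation to develop a regression technique that reconstructs the time-series relationship between the potential evapotranspiration computed by the FAO56 PM method and the temperature difference. Basin area was applied to the developed model to estimate potential evapotranspiration by basin area, and the model's suitability was evaluated by comparison with existing potential evapotranspiration estimates. As a result, replacing the complex potential evapotranspiration equation with a simple surrogate model enables efficient quantitative evaluation of evapotranspiration and general use where meteorological data are limited. Future work will incorporate Bayesian inference into the regression method to quantify the uncertainty of potential evapotranspiration.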
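
As background for the temperature-based approach, the standard Hargreaves equation can be written as a short function. This sketch shows only the commonly published form with its default coefficients (0.0023 and 17.8), not the regression model developed in the study; the function name and the input values in the example are hypothetical.

```python
import math

def hargreaves_et0(tmax_c: float, tmin_c: float, ra_mj: float) -> float:
    """Reference evapotranspiration (mm/day) from the Hargreaves equation.

    tmax_c, tmin_c: daily maximum and minimum air temperature in deg C.
    ra_mj: extraterrestrial radiation in MJ m^-2 day^-1.
    """
    if tmax_c < tmin_c:
        raise ValueError("tmax_c must not be below tmin_c")
    tmean = (tmax_c + tmin_c) / 2.0
    # 0.408 converts MJ m^-2 day^-1 to equivalent evaporation in mm/day.
    ra_mm = 0.408 * ra_mj
    return 0.0023 * (tmean + 17.8) * math.sqrt(tmax_c - tmin_c) * ra_mm

# Hypothetical summer day: 30 deg C maximum, 18 deg C minimum.
et0 = hargreaves_et0(tmax_c=30.0, tmin_c=18.0, ra_mj=40.0)
```

The only weather inputs are the two temperatures, since extraterrestrial radiation follows from latitude and day of year, which is why the equation suits stations with limited records.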


Development of Medical Cost Prediction Model Based on the Machine Learning Algorithm (머신러닝 알고리즘 기반의 의료비 예측 모델 개발)

  • Han Bi KIM;Dong Hoon HAN
    • Journal of Korea Artificial Intelligence Association / v.1 no.1 / pp.11-16 / 2023
  • Accurate hospital case modeling and prediction are crucial for efficient healthcare. In this study, we demonstrate the implementation of regression analysis methods in machine learning systems utilizing mathematical statistics and machine learning techniques. The developed machine learning model includes Bayesian linear, artificial neural network, decision tree, decision forest, and linear regression analysis models. Through the application of these algorithms, corresponding regression models were constructed and analyzed. The results suggest the potential of leveraging machine learning systems for medical research. The experiment aimed to create an Azure Machine Learning Studio tool for the speedy evaluation of multiple regression models. The tool facilitates the comparison of five types of regression models in a unified experiment and presents assessment results with performance metrics. Evaluation of the regression machine learning models highlighted the advantages of boosted decision tree regression and decision forest regression in hospital case prediction. These findings could lay the groundwork for the deliberate development of new directions in medical data processing and decision making. Furthermore, potential avenues for future research may include exploring methods such as clustering, classification, and anomaly detection in healthcare systems.

Evaluation of Performance of Artificial Neural Network based Hardening Model for Titanium Alloy Considering Strain Rate and Temperature (티타늄 합금의 변형률속도 및 온도를 고려한 인공신경망 기반 경화모델 성능평가)

  • M. Kim;S. Lim;Y. Kim
    • Transactions of Materials Processing / v.33 no.2 / pp.96-102 / 2024
  • This study evaluates the performance of an artificial neural network (ANN) based hardening model for a titanium alloy (Ti6Al4V) with respect to strain rate and temperature. Uniaxial compression tests were carried out at strain rates from 0.001 /s to 10 /s and temperatures from 575 ℃ to 975 ℃. Using the experimental data, ANN models were trained and tested with different hyperparameters, such as the hidden layer size and the optimizer. The input features were the equivalent plastic strain, strain rate, and temperature, and the output was the equivalent stress. When the data are sufficient and follow a smooth trend, both Bayesian regularization (BR) and Levenberg-Marquardt (LM) predict the flow behavior well. However, only the BR algorithm remains predictive when the data are insufficient. Furthermore, a proper hidden layer size must be confirmed to describe the behavior with a limited amount of data.

Comparison of the fit of automatic milking system and test-day records with the use of lactation curves

  • Sitkowska, B.;Kolenda, M.;Piwczynski, D.
    • Asian-Australasian Journal of Animal Sciences / v.33 no.3 / pp.408-415 / 2020
  • Objective: The aim of the paper was to compare the fit of data derived from daily automatic milking systems (AMS) and monthly test-day records with the use of lactation curves; data were analysed separately for primiparas and multiparas. Methods: The study was carried out on three Polish Holstein-Friesian (PHF) dairy herds. The farms were equipped with an automatic milking system which provided information on milking performance throughout lactation. Once a month cows were also subjected to test-day milkings (method A4). Most studies described in the literature are based on test-day data; therefore, we aimed to compare models based on both test-day and AMS data to determine which mathematical model (Wood or Wilmink) would be the better fit. Results: Lactation curves constructed from AMS data fitted the actual milk yield (MY) data better, regardless of the lactation number and model. Also, we found that the Wilmink model may be a better fit for modelling the lactation curve of PHF cows milked by an AMS, as it had the lowest values of Akaike information criterion, Bayesian information criterion, and mean square error, the highest coefficient of determination values, and was more accurate in estimating MY than the Wood model. Although both models underestimated peak, mean, and total MY, the Wilmink model was closer to the real values. Conclusion: Models of lactation curves may have an economic impact and may be helpful in terms of herd management and decision-making as they assist in forecasting MY at any moment of lactation. Also, data obtained from modelling can help with monitoring milk performance of each cow, diet planning, as well as monitoring the health of the cow.
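
The two lactation curves and the fit criteria named above can be sketched as follows. The curve forms are the commonly published ones; the fixed Wilmink decay constant of 0.05, the least-squares versions of AIC and BIC, and all parameter values and records are assumptions for illustration and are not taken from the paper.

```python
import math

def wood(t, a, b, c):
    """Wood curve: y(t) = a * t**b * exp(-c * t), with t in days in milk."""
    return a * t**b * math.exp(-c * t)

def wilmink(t, a, b, c, k=0.05):
    """Wilmink curve: y(t) = a + b * exp(-k * t) + c * t."""
    return a + b * math.exp(-k * t) + c * t

def fit_stats(observed, predicted, n_params):
    """MSE plus least-squares forms of AIC and BIC (Gaussian errors,
    additive constants dropped). Requires a non-zero residual sum."""
    n = len(observed)
    rss = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    return {
        "mse": rss / n,
        "aic": n * math.log(rss / n) + 2 * n_params,
        "bic": n * math.log(rss / n) + n_params * math.log(n),
    }

# Hypothetical records: days in milk and milk yield in kg/day.
days = [10, 30, 60, 120, 200, 300]
yields = [28.0, 34.0, 35.5, 32.0, 27.0, 21.5]

# Hypothetical parameter values; in practice they are estimated from data.
wood_stats = fit_stats(yields, [wood(t, 18.0, 0.22, 0.004) for t in days], 3)
wilmink_stats = fit_stats(
    yields, [wilmink(t, 38.0, -14.0, -0.055) for t in days], 3
)
```

Lower MSE, AIC, and BIC indicate the better-fitting curve. Both models have three free parameters here, so the penalty terms are equal and the ranking depends on the residuals alone.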

Estimation of genetic parameters and trends for production traits of dairy cattle in Thailand using a multiple-trait multiple-lactation test day model

  • Buaban, Sayan;Puangdee, Somsook;Duangjinda, Monchai;Boonkum, Wuttigrai
    • Asian-Australasian Journal of Animal Sciences / v.33 no.9 / pp.1387-1399 / 2020
  • Objective: The objective of this study was to estimate the genetic parameters and trends for milk, fat, and protein yields in the first three lactations of Thai dairy cattle using a 3-trait, 3-lactation random regression test-day model. Methods: Data included 168,996, 63,388, and 27,145 test-day records from the first, second, and third lactations, respectively. Records were from 19,068 cows calving from 1993 to 2013 in 124 herds. (Co)variance components were estimated by Bayesian methods. Gibbs sampling was used to obtain posterior distributions. The model included herd-year-month of testing, breed group-season of calving-month in tested milk group, linear and quadratic age at calving as fixed effects, and random regression coefficients for additive genetic and permanent environmental effects, which were defined as modified constant, linear, quadratic, cubic and quartic Legendre coefficients. Results: Average daily heritabilities ranged from 0.36 to 0.48 for milk, 0.33 to 0.44 for fat and 0.37 to 0.48 for protein yields; they were higher in the third lactation for all traits. Heritabilities of test-day milk and protein yields for selected days in milk were higher in the middle than at the beginning or end of lactation, whereas those for test-day fat yields were high at the beginning and end of lactation. Genetic correlations (305-d yield) among production yields within lactations (0.44 to 0.69) were higher than those across lactations (0.36 to 0.68). The largest genetic correlation was observed between the first and second lactation. The genetic trends of 305-d milk, fat and protein yields were 230 to 250, 25 to 29, and 30 to 35 kg per year, respectively. Conclusion: A random regression model seems to be a flexible and reliable procedure for the genetic evaluation of production yields. It can be used to perform breeding value estimation for national genetic evaluation in the Thai dairy cattle population.