• Title/Summary/Keyword: Random selection


Genetic Parameters for Milk Yield and Lactation Persistency Using Random Regression Models in Girolando Cattle

  • Canaza-Cayo, Ali William;Lopes, Paulo Savio;da Silva, Marcos Vinicius Gualberto Barbosa;de Almeida Torres, Robledo;Martins, Marta Fonseca;Arbex, Wagner Antonio;Cobuci, Jaime Araujo
    • Asian-Australasian Journal of Animal Sciences, v.28 no.10, pp.1407-1418, 2015
  • A total of 32,817 first-lactation test-day milk yield (TDMY) records from 4,056 Girolando cows, daughters of 276 sires, collected in 118 herds between 2000 and 2011, were used to estimate genetic parameters for TDMY with random regression models (RRM) based on Legendre polynomials of orders 3 to 5. In addition, nine measures of persistency in milk yield ($PS_i$) and the genetic trend of 305-day milk yield (305MY) were evaluated. The goodness-of-fit criteria identified the best model as the RRM with Legendre polynomials of orders 3 and 5 for the additive genetic and permanent environmental effects, respectively. With the best model, heritabilities and genetic correlations for TDMY throughout lactation ranged from 0.18 to 0.23 and from -0.03 to 1.00, respectively. Heritabilities and genetic correlations for persistency and 305MY ranged from 0.10 to 0.33 and from -0.98 to 1.00, respectively. $PS_7$ would be the most suitable persistency measure for the evaluation of Girolando cattle. Estimated breeding values for 305MY of sires and cows showed significant, positive genetic trends. Thus, selection indices would be advisable in the genetic evaluation of Girolando cattle for both traits.
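In a random regression model, the variance at each day in milk (DIM) comes from a covariance matrix of regression coefficients evaluated on Legendre covariates. The Python sketch below shows how a heritability curve such as the 0.18 to 0.23 range above is obtained. It assumes normalized Legendre polynomials, a hypothetical DIM range of 5 to 305, "order" meaning the number of coefficients, and made-up covariance matrices. It is not the authors' code or estimates.

```python
import math

# Hypothetical DIM range used to map days in milk onto [-1, 1].
DIM_MIN, DIM_MAX = 5, 305


def standardize_dim(dim: float) -> float:
    """Map days in milk linearly onto the Legendre domain [-1, 1]."""
    return -1.0 + 2.0 * (dim - DIM_MIN) / (DIM_MAX - DIM_MIN)


def legendre(x: float, n_coef: int) -> list:
    """Normalized Legendre polynomials phi_0..phi_{n_coef-1} evaluated at x."""
    p = [1.0, x]
    # Bonnet recurrence: (k + 1) P_{k+1} = (2k + 1) x P_k - k P_{k-1}
    for k in range(1, n_coef - 1):
        p.append(((2 * k + 1) * x * p[k] - k * p[k - 1]) / (k + 1))
    # Normalization factor sqrt((2k + 1) / 2), common in RRM work
    return [math.sqrt((2 * k + 1) / 2) * p[k] for k in range(n_coef)]


def quad_form(phi: list, k: list) -> float:
    """phi' K phi for a coefficient covariance matrix K."""
    n = len(phi)
    return sum(phi[i] * k[i][j] * phi[j] for i in range(n) for j in range(n))


def heritability(dim: float, k_add: list, k_pe: list, var_e: float) -> float:
    """h2 at one DIM from additive (k_add) and permanent-environment (k_pe) coefficient covariances."""
    x = standardize_dim(dim)
    var_a = quad_form(legendre(x, len(k_add)), k_add)
    var_pe = quad_form(legendre(x, len(k_pe)), k_pe)
    return var_a / (var_a + var_pe + var_e)


# Illustrative (made-up) matrices: 3 additive and 5 permanent-environment coefficients
k_add = [[4.0, 0.5, -0.3], [0.5, 1.0, 0.1], [-0.3, 0.1, 0.6]]
k_pe = [[8.0 if i == j else 0.0 for j in range(5)] for i in range(5)]
curve = {dim: heritability(dim, k_add, k_pe, var_e=6.0) for dim in (5, 65, 155, 245, 305)}
```

Genetic correlations between two test days follow the same pattern. They are the off-diagonal quadratic form phi(t1)' K_add phi(t2), scaled by the two additive standard deviations.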

Genetic Analysis of Milk Yield in First-Lactation Holstein Friesian in Ethiopia: A Lactation Average vs Random Regression Test-Day Model Analysis

  • Meseret, S.;Tamir, B.;Gebreyohannes, G.;Lidauer, M.;Negussie, E.
    • Asian-Australasian Journal of Animal Sciences, v.28 no.9, pp.1226-1234, 2015
  • Effective genetic evaluation and sire selection require accurate estimates of genetic parameters for all economically important traits in the breeding goal. The main objective of this study was to compare the traditional lactation average model (LAM) with the random regression test-day model (RRM) for estimating genetic parameters and predicting breeding values in Holstein Friesian herds in Ethiopia. The data consisted of 6,500 test-day (TD) records from 800 first-lactation Holstein Friesian cows that calved between 1997 and 2013. Covariance components were estimated by the average information restricted maximum likelihood method under a single-trait animal model. The heritability estimate for first-lactation milk yield was 0.30 from the LAM, whereas estimates from the RRM ranged from 0.17 to 0.29 across stages of lactation. Genetic correlations between different TDs in first-lactation Holstein Friesian cows ranged from 0.37 to 0.99. Because genetic correlations between milk yields at different TDs were below unity, the LAM's implicit assumption of a unit genetic correlation among TDs may not be optimal for accurately evaluating the genetic merit of animals. Estimated breeding values from the RRM had a higher standard deviation than those from the LAM, indicating that the TD model makes more efficient use of TD information. Correlations of breeding values between models ranged from 0.90 to 0.96 for different groups of sires and cows, and marked re-ranking of top sires and cows was observed when moving from traditional LAM to RRM evaluations.
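The model comparison above uses three summaries: the spread of EBVs under each model, the correlation of EBVs between models, and re-ranking among top animals. A minimal Python sketch computes all three. The EBV values and the top-N cutoff are hypothetical, and the abstract does not say whether the reported correlations are Pearson or rank correlations. Pearson is assumed here.

```python
import math
import statistics


def pearson(x: list, y: list) -> float:
    """Pearson product-moment correlation of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def top_n(ebv: dict, n: int) -> set:
    """IDs of the n animals with the highest EBVs."""
    return set(sorted(ebv, key=ebv.get, reverse=True)[:n])


def compare_models(ebv_lam: dict, ebv_rrm: dict, n_top: int) -> dict:
    """EBV spread per model, between-model correlation, and overlap of the top-n lists."""
    ids = sorted(ebv_lam.keys() & ebv_rrm.keys())
    lam = [ebv_lam[i] for i in ids]
    rrm = [ebv_rrm[i] for i in ids]
    return {
        "sd_lam": statistics.stdev(lam),
        "sd_rrm": statistics.stdev(rrm),
        "r": pearson(lam, rrm),
        # Fewer shared IDs means more re-ranking at the top
        "shared_top": len(top_n(ebv_lam, n_top) & top_n(ebv_rrm, n_top)),
    }


# Hypothetical sire EBVs (kg) under each model
lam = {"S1": 120.0, "S2": 95.0, "S3": 80.0, "S4": 60.0, "S5": 40.0}
rrm = {"S1": 110.0, "S2": 70.0, "S3": 100.0, "S4": 65.0, "S5": 30.0}
summary = compare_models(lam, rrm, n_top=2)
```

In this toy case the correlation is high, yet only one of the two top sires is shared. That is how a correlation of 0.90 or more can coexist with marked re-ranking.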

Mortality Prediction of Older Adults Using Random Forest and Deep Learning (랜덤 포레스트와 딥러닝을 이용한 노인환자의 사망률 예측)

  • Park, Junhyeok;Lee, Songwook
    • KIPS Transactions on Software and Data Engineering, v.9 no.10, pp.309-316, 2020
  • We predict the mortality of elderly patients over 65 years old visiting the emergency department, using a Feed Forward Neural Network (FFNN) and a Convolutional Neural Network (CNN). The medical data consist of 99 features, including basic information such as sex, age, temperature, and heart rate, as well as past history and various blood and culture tests. We used a random forest to select features by measuring each feature's importance in predicting mortality; using the 80 most important features gave the best mortality prediction. The FFNN and CNN were compared by training each network on the selected features. To train the CNN on images, we convert the medical data into fixed-size images. The CNN gave better results than the FFNN: for mortality prediction on the test data, its F1 score and AUC were 56.9 and 92.1, respectively.
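The abstract describes two preprocessing steps: keep the top-k features by random-forest importance, then turn each patient's record into a fixed-size image for the CNN. Below is a minimal Python sketch of both steps. The importance scores are made up, and the image layout is a hypothetical choice (row-major filling of the smallest square grid, with zero padding). The paper's actual conversion scheme is not described in the abstract.

```python
import math


def select_top_k(importances: dict, k: int) -> list:
    """Names of the k features with the highest (e.g. random-forest) importance scores."""
    return sorted(importances, key=importances.get, reverse=True)[:k]


def to_square_image(values: list) -> list:
    """Zero-pad a feature vector and reshape it, row by row, into the smallest square grid."""
    side = math.ceil(math.sqrt(len(values)))
    padded = list(values) + [0.0] * (side * side - len(values))
    return [padded[r * side:(r + 1) * side] for r in range(side)]


# Hypothetical importance scores for 99 features named f0..f98
importances = {f"f{i}": 1.0 / (i + 1) for i in range(99)}
selected = select_top_k(importances, 80)

# One patient's (already normalized, hypothetical) feature values, ordered as in `selected`
patient = {name: 0.5 for name in importances}
image = to_square_image([patient[name] for name in selected])  # 80 features -> 9 x 9 grid
```

With 80 selected features the grid is 9 x 9, so one cell is padding. Other layouts, such as grouping related lab tests into neighbouring pixels, are equally plausible readings of "fixed size images".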

Development of a Genome-Wide Random Mutagenesis System Using Proofreading-Deficient DNA Polymerase ${\delta}$ in the Methylotrophic Yeast Hansenula polymorpha

  • Kim, Oh Cheol;Kim, Sang-Yoon;Hwang, Dong Hyeon;Oh, Doo-Byoung;Kang, Hyun Ah;Kwon, Ohsuk
    • Journal of Microbiology and Biotechnology, v.23 no.3, pp.304-312, 2013
  • The thermotolerant methylotrophic yeast Hansenula polymorpha is attracting interest as a potential strain for the production of recombinant proteins and biofuels. However, only a limited number of genome engineering tools are currently available for H. polymorpha. In the present study, we identified the HpPOL3 gene encoding the catalytic subunit of DNA polymerase ${\delta}$ of H. polymorpha and mutated the sequence encoding conserved amino acid residues that are important for its proofreading 3'${\rightarrow}$5' exonuclease activity. The resulting $HpPOL3^*$ gene, encoding an error-prone, proofreading-deficient DNA polymerase ${\delta}$, was cloned under a methanol oxidase promoter to construct the mutator plasmid pHIF8, which also contains additional elements for site-specific chromosomal integration, selection, and excision. In an H. polymorpha mutator strain with pHIF8 integrated into the chromosome, $URA3^-$ mutants resistant to 5-fluoroorotic acid were generated at a 50-fold higher frequency than in the wild-type strain, owing to the dominant-negative expression of $HpPOL3^*$. Moreover, after the desired mutant was obtained, the mutator allele was readily removed from the chromosome by homologous recombination to avoid uncontrolled accumulation of additional mutations. Our mutator system, which relies on the accumulation of random mutations incorporated during DNA replication, will be useful for generating strains with mutant phenotypes, especially those related to unknown or multiple genes on the chromosome.

Using Data Mining Techniques to Predict Win-Loss in Korean Professional Baseball Games (데이터마이닝을 활용한 한국프로야구 승패예측모형 수립에 관한 연구)

  • Oh, Younhak;Kim, Han;Yun, Jaesub;Lee, Jong-Seok
    • Journal of Korean Institute of Industrial Engineers, v.40 no.1, pp.8-17, 2014
  • In this research, we employed various data mining techniques to build predictive models of win-loss outcomes in Korean professional baseball games. Historical data on players and teams were obtained from official materials provided on the KBO website. From the collected raw data, we prepared two additional datasets, in ratio and binary format respectively. The ratio dataset was generated by dividing each away-team record by the corresponding home-team record, while the binary dataset was obtained by comparing the two record values. We applied seven classification techniques (decision tree, random forest, logistic regression, neural network, support vector machine, linear discriminant analysis, and quadratic discriminant analysis) to the three (raw, ratio, and binary) datasets. Among the 21 (= 3 datasets ${\times}$ 7 techniques) prediction scenarios, the most accurate model was random forest on the binary dataset, with a prediction accuracy of 84.14%. Using the ratio and binary datasets also produced better prediction models than using the raw data. Using the variable selection capability of decision tree, random forest, and stepwise logistic regression, we found that annual salary, earned runs, strikeouts, the pitcher's winning percentage, and walks (bases on balls) are important factors in winning a game. This research differs from existing studies in using three different types of data and various data mining techniques for win-loss prediction in Korean professional baseball games.
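The ratio and binary datasets are derived feature by feature from a pair of away-team and home-team records. The Python sketch below assumes that "comparing the record values" means "1 if the away-team value is larger, else 0", and that zero home-team denominators become `None`. Neither rule is stated in the abstract. The statistic names are hypothetical.

```python
from typing import Optional


def ratio_record(away: dict, home: dict) -> dict:
    """Away-team value divided by the matching home-team value, per statistic."""
    out: dict = {}
    for key in away:
        # Assumption: a zero home-team value yields None, to be handled downstream.
        value: Optional[float] = away[key] / home[key] if home[key] != 0 else None
        out[key] = value
    return out


def binary_record(away: dict, home: dict) -> dict:
    """1 if the away-team value is larger than the home-team value, else 0 (assumed rule)."""
    return {key: int(away[key] > home[key]) for key in away}


# Hypothetical per-game aggregates; the statistic names are illustrative only
away = {"annual_salary": 150.0, "earned_runs": 3.0, "strikeouts": 8.0, "walks": 2.0}
home = {"annual_salary": 100.0, "earned_runs": 0.0, "strikeouts": 4.0, "walks": 5.0}
ratio_row = ratio_record(away, home)
binary_row = binary_record(away, home)
```

Both transforms make each feature relative to the opponent, which may explain why they outperformed the raw data.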

A Case Study on the Target Sampling Inspection for Improving Outgoing Quality (타겟 샘플링 검사를 통한 출하품질 향상에 관한 사례 연구)

  • Kim, Junse;Lee, Changki;Kim, Kyungnam;Kim, Changwoo;Song, Hyemi;Ahn, Seoungsu;Oh, Jaewon;Jo, Hyunsang;Han, Sangseop
    • Journal of Korean Society for Quality Management, v.49 no.3, pp.421-431, 2021
  • Purpose: To improve outgoing quality, this study presents a novel sampling framework based on predictive analytics. Methods: The proposed framework consists of three steps. The first step is variable selection, in which knowledge-based and data-driven approaches are used to select important variables. The second step is model learning, in which we consider supervised classification methods, anomaly detection methods, and rule-based methods. The third step is model application, which includes all the processes needed to enable real-time prediction. Each prediction model classifies a product as either a target sample or a random sample, and intensive quality inspections are then performed on the specified target samples. Results: Inspection data from three Samsung products (mobile phones, TVs, and refrigerators) were used to detect functional defects with the proposed method. The results demonstrate that target sampling is more effective and efficient than random sampling. Conclusion: The proposed method can efficiently detect products in a lot that are likely to have defects that users would encounter. Additionally, our study can guide practitioners on how to detect defective products easily using stratified sampling.
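The model-application step ends with routing each unit to either intensive (target) inspection or the ordinary random-sampling stream. Here is a minimal Python sketch of that routing. It assumes the trained model outputs a defect score and that a fixed threshold separates targets. It also assumes non-target units are drawn for inspection at a fixed random rate. The threshold, rate, scores, and unit IDs are all hypothetical.

```python
import random


def assign_inspection(scores: dict, threshold: float,
                      random_rate: float, seed: int = 0) -> dict:
    """Route each unit to target inspection, random sampling, or release."""
    rng = random.Random(seed)  # seeded for reproducible random sampling
    plan = {}
    for unit_id, score in scores.items():
        if score >= threshold:
            plan[unit_id] = "target"   # model flags a likely defect: intensive inspection
        elif rng.random() < random_rate:
            plan[unit_id] = "random"   # conventional random sample from the remainder
        else:
            plan[unit_id] = "release"  # not inspected
    return plan


# Hypothetical defect scores from some trained classifier
scores = {"TV-001": 0.92, "TV-002": 0.10, "TV-003": 0.55, "TV-004": 0.03}
plan = assign_inspection(scores, threshold=0.5, random_rate=0.05)
```

Keeping a small random stream alongside the targeted one lets the defect rate among non-flagged units still be monitored.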

Predictive Model of Optimal Continuous Positive Airway Pressure for Obstructive Sleep Apnea Patients with Obesity by Using Machine Learning (비만 폐쇄수면무호흡 환자에서 기계학습을 통한 적정양압 예측모형)

  • Kim, Seung Soo;Yang, Kwang Ik
    • Journal of Sleep Medicine, v.15 no.2, pp.48-54, 2018
  • Objectives: This study aimed to develop a model for predicting the optimal continuous positive airway pressure (CPAP) for obstructive sleep apnea (OSA) patients with obesity using machine learning. Methods: We retrospectively reviewed the medical records of 162 OSA patients with obesity [body mass index (BMI) ≥ 25] who had undergone a successful CPAP titration study. The data were randomly divided into a training set (90%) and a test set (10%). Using the training set, we built a random forest model and a least absolute shrinkage and selection operator (lasso) regression model to predict the optimal pressure, and then applied our models and previously reported equations to the test set. Model fit was compared using the correlation coefficient (CC) and the mean absolute error (MAE). Results: The random forest model performed best {CC 0.78 [95% confidence interval (CI) 0.43-0.93], MAE 1.20}. The lasso regression model [CC 0.78 (95% CI 0.42-0.93), MAE 1.26] also improved on the Hoffstein equation [CC 0.68 (95% CI 0.23-0.89), MAE 1.34] and Choi's equation [CC 0.72 (95% CI 0.30-0.90), MAE 1.40]. Conclusions: Our random forest model and lasso model ($26.213 + 0.084 {\times} \text{BMI} + 0.004 {\times} \text{apnea-hypopnea index} + 0.004 {\times} \text{oxygen desaturation index} - 0.215 {\times} \text{mean oxygen saturation}$) performed better than the previously reported equations. Further studies of other OSA subgroups or phenotypes are needed.
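The lasso equation in the conclusions can be evaluated directly. The Python sketch below does so, together with the two comparison metrics (MAE and correlation coefficient). The coefficients are copied from the abstract. The units are assumptions, since the abstract does not state them: cm H2O for pressure, events per hour for the indices, and percent for mean oxygen saturation. The patient values are hypothetical.

```python
import math


def lasso_cpap(bmi: float, ahi: float, odi: float, mean_spo2: float) -> float:
    """Optimal CPAP from the lasso equation quoted in the abstract (units assumed cm H2O)."""
    return 26.213 + 0.084 * bmi + 0.004 * ahi + 0.004 * odi - 0.215 * mean_spo2


def mae(y_true: list, y_pred: list) -> float:
    """Mean absolute error between titrated and predicted pressures."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def pearson(x: list, y: list) -> float:
    """Pearson correlation coefficient (the CC used to compare models)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy / math.sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))


# Hypothetical patient: BMI 30, AHI 40/h, ODI 35/h, mean SpO2 94%
pred = lasso_cpap(bmi=30, ahi=40, odi=35, mean_spo2=94)  # about 8.82
```

The negative coefficient on mean oxygen saturation dominates. A 1-point drop in mean SpO2 raises the predicted pressure by 0.215, about as much as a 2.6-unit rise in BMI.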

Classifying the severity of pedestrian accidents using ensemble machine learning algorithms: A case study of Daejeon City (앙상블 학습기법을 활용한 보행자 교통사고 심각도 분류: 대전시 사례를 중심으로)

  • Kang, Heungsik;Noh, Myounggyu
    • Journal of Digital Convergence, v.20 no.5, pp.39-46, 2022
  • As the link between traffic accidents and social and economic losses has been confirmed, interest is growing in developing safety policies based on crash data, and countermeasures are needed to reduce severe crash outcomes such as serious injuries and fatalities. In this study, we select Daejeon City, where the proportion of fatal crashes is relatively high, as the case-study region and focus on the severity of pedestrian crashes. After a series of data preprocessing steps, we run machine learning algorithms for optimal model selection and variable identification. Of the nine algorithms applied, the ensemble-based AdaBoost and Random Forest outperform the others on the performance metrics used. Based on the results, we identify the major factors influencing pedestrian crashes in Daejeon (i.e., pedestrians in their 70s or 20s, and pedestrian crossings) and suggest countermeasures targeting them to reduce severe outcomes.

Single-step genomic evaluation for growth traits in a Mexican Braunvieh cattle population

  • Jonathan Emanuel Valerio-Hernandez;Agustin Ruiz-Flores;Mohammad Ali Nilforooshan;Paulino Perez-Rodriguez
    • Animal Bioscience, v.36 no.7, pp.1003-1009, 2023
  • Objective: The objective was to compare pedigree-based best linear unbiased prediction (BLUP), genomic BLUP (GBLUP), and single-step GBLUP (ssGBLUP) for the genomic evaluation of growth traits in a Mexican Braunvieh cattle population. Methods: Birth (BW), weaning (WW), and yearling weight (YW) data from a Mexican Braunvieh cattle population were analyzed with the BLUP, GBLUP, and ssGBLUP methods. These methods differ in the additive genetic relationship matrix included in the model and in the animals under evaluation. Predictive ability was evaluated using random partitions of the data into training and testing sets, consistently predicting about 20% of the genotyped animals in every partition. For each partition, the Pearson correlation coefficient between phenotypes adjusted for fixed and non-genetic random effects and the estimated breeding values (EBV) was computed. Results: The random contemporary group (CG) effect explained about 50%, 45%, and 35% of the phenotypic variance in BW, WW, and YW, respectively. For all three methods, the CG effect explained the largest proportion of the phenotypic variance (except for YW-GBLUP). For BW, the heritability estimate obtained with GBLUP was the lowest and that obtained with BLUP the highest. For WW, the highest heritability estimate was obtained with BLUP, and the estimates obtained with GBLUP and ssGBLUP were similar. For YW, the heritability estimates obtained with GBLUP and BLUP were similar, and the lowest heritability was obtained with ssGBLUP. Pearson correlation coefficients between phenotypes adjusted for non-genetic effects and EBVs were highest for BLUP, followed by ssGBLUP and GBLUP. Conclusion: The successful implementation of genetic evaluations that include both genotyped and non-genotyped animals indicates a promising approach for genetic improvement programs of Braunvieh cattle.
Our findings showed that simultaneous evaluation of genotyped and non-genotyped animals improved prediction accuracy for growth traits even with a limited number of genotyped animals.
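The validation scheme above repeatedly masks about 20% of the genotyped animals, refits the model, and correlates the masked animals' adjusted phenotypes with their predicted EBVs. A minimal Python sketch of that loop follows. `fit_ebv` is a hypothetical callback standing in for a real BLUP, GBLUP, or ssGBLUP solver, and the toy data and toy predictor are made up purely to exercise the loop.

```python
import math
import random
from typing import Callable


def pearson(x: list, y: list) -> float:
    """Pearson correlation; assumes both inputs vary."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy / math.sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))


def predictive_ability(adj_pheno: dict, genotyped: list,
                       fit_ebv: Callable[[set], dict],
                       n_partitions: int = 10, test_frac: float = 0.2,
                       seed: int = 1) -> list:
    """Per random partition: corr(adjusted phenotype, EBV) over the masked genotyped animals."""
    rng = random.Random(seed)
    n_test = round(test_frac * len(genotyped))
    results = []
    for _ in range(n_partitions):
        test_ids = sorted(rng.sample(sorted(genotyped), n_test))
        # Hypothetical callback: refit the model with these animals' phenotypes set to
        # missing, then return EBVs for all animals.
        ebv = fit_ebv(set(test_ids))
        results.append(pearson([adj_pheno[a] for a in test_ids], [ebv[a] for a in test_ids]))
    return results


# Toy data and a toy "model" that ignores the mask (for illustration only)
adj = {f"A{i}": float(i) for i in range(10)}


def toy_fit(masked: set) -> dict:
    return {a: 0.5 * y for a, y in adj.items()}


r_values = predictive_ability(adj, list(adj), toy_fit, n_partitions=3)
```

Averaging `r_values` over partitions gives one predictive-ability figure per method. Comparing these averages is how BLUP, ssGBLUP, and GBLUP were ranked above.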

Effects of Number of Incomplete Data in Latest Generation on the Breeding Value Estimated by Random Regression Model (임의회귀 모형 사용시 마지막 세대의 불완전한 기록이 추정육종가에 미치는 효과)

  • Salces, A.J. et al.
    • Journal of Animal Science and Technology, v.48 no.2, pp.143-150, 2006
  • The data were collected through the dairy herd improvement program from January 2000 to July 2005. The test-day data comprised 825,157 first-parity records, and only animals with both parents known were included. This study aimed to describe the effect of incomplete lactation records in the latest generation on changes in sires' breeding values estimated with a random regression model (RRM). Genetic parameters and sire breeding values were estimated with the REMLF90 and BLUPF90 programs. Phenotypic values did not differ greatly among the groups TD11, TD8, TD5, and TD2, which differ in the number of test-day records available for the latest generation. Across all groups, the heritability of test-day milk yield ranged from 0.30 to 0.36; however, the TD2 group, with the fewest test-day records in the latest generation, showed lower heritability. Correlations between test day and each group were all above 0.5: TD11 (0.610), TD8 (0.616), TD5 (0.661), and TD2 (0.682). Sire rankings by breeding value varied widely depending on the number of test-day records, counted from the start of lactation, available for the latest generation. Because sire rankings changed as daughters accumulated more test-day records, at least 5 test-day records should be available. When selecting proven bulls, RRM-based genetic evaluation of dairy cattle would be desirable if complete lactation records are available for the latest-generation daughters of young bulls. The RRM requires at least 5 test-day records per lactation.
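The TD11, TD8, TD5, and TD2 scenarios can be built from one complete data set by truncating only the latest generation's lactations. The Python sketch below assumes "TDk" means each latest-generation cow keeps her first k test days (ordered by DIM) while older cows keep all records. That is one reading of the abstract, not a confirmed description. The record fields and IDs are hypothetical.

```python
def truncate_latest_generation(records: list, latest_cows: set, k: int) -> list:
    """Keep every record of older cows, but only the first k test days (by DIM) of latest-generation cows."""
    by_cow = {}
    for rec in records:
        by_cow.setdefault(rec["cow"], []).append(rec)
    kept = []
    for cow, recs in by_cow.items():
        recs = sorted(recs, key=lambda r: r["dim"])
        kept.extend(recs[:k] if cow in latest_cows else recs)
    return kept


# Hypothetical data: one older cow and one latest-generation daughter, 11 monthly test days each
records = [{"cow": cow, "dim": 5 + 30 * t, "milk": 25.0}
           for cow in ("OLD1", "NEW1") for t in range(11)]
scenarios = {f"TD{k}": truncate_latest_generation(records, {"NEW1"}, k) for k in (11, 8, 5, 2)}
```

Each scenario would then be evaluated with the same RRM. Comparing sire EBV rankings across scenarios shows how few daughter test days are enough for stable rankings.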