• Title/Summary/Keyword: Random selection

Study on the Prediction Model for Employment of University Graduates Using Machine Learning Classification (머신러닝 기법을 활용한 대졸 구직자 취업 예측모델에 관한 연구)

  • Lee, Dong Hun;Kim, Tae Hyung
    • The Journal of Information Systems / v.29 no.2 / pp.287-306 / 2020
  • Purpose: Youth unemployment is a persistent social problem in Korea. In this study, we build models that predict the employment of college graduates using three machine learning techniques (decision tree, random forest, and artificial neural network) and compare the models' performance through their prediction results. Design/methodology/approach: We first acquired the college graduates' vocational path survey data and then processed it, selecting the independent variables and setting up the dependent variable. Using R, we built decision tree, random forest, and artificial neural network models and used each one to predict whether college graduates were employed. Finally, we compared and evaluated the performance of the models. Findings: The random forest model performed best, and the performance of the artificial neural network model differed only slightly from that of the decision tree model. In the decision tree model, the key nodes were whether the graduate receives financial support from family, field of study (major), the route through which job information was obtained at university, the importance of earned income when choosing a job, and the location of the university the graduate attended. In the random forest model, the variables identified as most important were whether the graduate receives financial support from family, major, the route for obtaining job information, the degree of irritability felt over the past month, and the location of the university the graduate attended.
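
The abstract says the three models were compared on their predictions but does not name the metrics. As a minimal sketch, assuming a binary target coded 1 = employed and 0 = not employed, the usual confusion-matrix metrics can be computed and used to rank models like this (the label coding, metric choice, and the helper names `binary_metrics` and `rank_models` are illustrative assumptions, not the paper's R code):

```python
# Sketch: confusion-matrix metrics for a 0/1 "employed" target (assumed coding).

def binary_metrics(y_true, y_pred):
    """Return accuracy, precision, recall and F1 for 0/1 labels."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equally long")
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


def rank_models(y_true, predictions, metric="accuracy"):
    """Sort model names by one metric, best first."""
    scores = {name: binary_metrics(y_true, pred)[metric] for name, pred in predictions.items()}
    return sorted(scores, key=scores.get, reverse=True)
```

In use, `predictions` would map names such as `"decision_tree"`, `"random_forest"` and `"neural_network"` to each model's 0/1 predictions on the same held-out graduates; ranking on F1 rather than accuracy is worth considering if employed and unemployed graduates are unbalanced.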

Selection of Performance of Bias Correction using TOPSIS method (TOPSIS 방법을 이용한 편의 보정 방법 선정)

  • Song, Young Hoon;Chung, Eun Sung
    • Proceedings of the Korea Water Resources Association Conference / 2019.05a / p.306 / 2019
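  • Because of global temperature rise, research on the future climate has become important, and a wide range of climate change studies are under way. Future-climate research relies on simulation output from GCMs (General Circulation Models). Because GCM output is gridded, it must be downscaled to the study sites and bias-corrected against observations in the study area. The choice of bias correction method is therefore very important, since results can differ depending on the method used. Domestic and international studies have analyzed and evaluated various downscaling and bias correction techniques; two representative bias correction techniques are quantile mapping and random forest. Quantile mapping performed well in correcting bias against the GCMs' historical simulation data, but for the GCMs' future projection period (2010-2018), the random forest technique, which can analyze extreme precipitation quantitatively, is expected to perform better in bias correction. Using 21 weather stations in South Korea, this study compared the performance of bias correction techniques in correcting the historical-period output (1970-2005) of four GCMs (GISS, CSIRO, CCSM4, MIROC5) against precipitation observed at the stations, and compared this with their performance over the GCMs' future projection period (2010-2018). On this basis, the results of the bias correction techniques were analyzed quantitatively with six evaluation indices, and the techniques were ranked by performance using TOPSIS (Technique for Order of Preference by Similarity to Ideal Solution), a multi-criteria decision-making method. Quantile mapping was applied in two variants: non-parametric transformation and distribution-derived transformation. The random forest method, a machine learning technique, was also applied, and the results were compared. Finally, because GCM data are provided on a grid, station precipitation also had to be converted spatially; this study used inverse distance weighting (IDW) for that purpose.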
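
TOPSIS ranks alternatives by their relative closeness to an ideal solution. A minimal sketch of the standard steps (vector normalization, weighting, ideal best/worst per criterion, Euclidean distances, closeness score) is below. Here each row would be one bias correction method and each column one evaluation index; the abstract does not name its six indices or their weights, so the criteria, weights and benefit/cost flags in any call are assumptions:

```python
import math

def topsis(matrix, weights, benefit):
    """Closeness scores in [0, 1]; higher means closer to the ideal solution.

    matrix[i][j]: score of alternative i on criterion j.
    benefit[j]:   True if larger is better (e.g. correlation),
                  False if smaller is better (e.g. RMSE).
    """
    n_crit = len(weights)
    if any(len(row) != n_crit for row in matrix) or len(benefit) != n_crit:
        raise ValueError("matrix, weights and benefit must agree on the number of criteria")
    # 1. Vector-normalize each column, then apply the criterion weights.
    norms = [math.sqrt(sum(row[j] ** 2 for row in matrix)) for j in range(n_crit)]
    v = [[weights[j] * row[j] / norms[j] if norms[j] else 0.0 for j in range(n_crit)]
         for row in matrix]
    # 2. Ideal best and ideal worst value for each criterion.
    cols = list(zip(*v))
    best = [max(c) if benefit[j] else min(c) for j, c in enumerate(cols)]
    worst = [min(c) if benefit[j] else max(c) for j, c in enumerate(cols)]
    # 3. Relative closeness: d_worst / (d_best + d_worst).
    scores = []
    for row in v:
        d_best, d_worst = math.dist(row, best), math.dist(row, worst)
        total = d_best + d_worst
        scores.append(d_worst / total if total else 0.5)  # all alternatives identical
    return scores
```

Sorting methods by descending score gives the priority order; with hypothetical columns such as RMSE (cost) and correlation (benefit), a method that is best on every index scores 1.0.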

Fine Mapping of the Rice Bph1 Gene, which Confers Resistance to the Brown Planthopper (Nilaparvata lugens Stal), and Development of STS Markers for Marker-assisted Selection

  • Cha, Young-Soon;Ji, Hyeonso;Yun, Doh-Won;Ahn, Byoung-Ohg;Lee, Myung Chul;Suh, Seok-Cheol;Lee, Chun Seok;Ahn, Eok Keun;Jeon, Yong-Hee;Jin, Il-Doo;Sohn, Jae-Keun;Koh, Hee-Jong;Eun, Moo-Young
    • Molecules and Cells / v.26 no.2 / pp.146-151 / 2008
  • The brown planthopper (BPH) is a major insect pest of rice that damages plants by sucking phloem sap and transmitting viral diseases. Many BPH resistance genes have been identified in indica varieties and wild rice accessions, but none has yet been cloned. In the present study we report fine mapping of the region containing the Bph1 locus, which enables marker-assisted selection (MAS). We used 273 F8 recombinant inbred lines (RILs) derived from a cross between Cheongcheongbyeo, an indica-type variety harboring Bph1 from Mudgo, and Hwayeongbyeo, a BPH-susceptible japonica variety. By random amplified polymorphic DNA (RAPD) analysis using 656 random 10-mer primers, three RAPD markers (OPH09, OPA10 and OPA15) linked to Bph1 were identified and converted to SCAR (sequence characterized amplified region) markers. These markers were located in two BAC clones derived from chromosome 12: OPH09 on OSJNBa0011B18, and both OPA10 and OPA15 on OSJNBa0040E10. By sequence analysis of ten additional BAC clones evenly distributed between OSJNBa0011B18 and OSJNBa0040E10, we developed 15 STS markers. Of these, pBPH4 and pBPH14 flanked Bph1 at distances of 0.2 cM and 0.8 cM, respectively. The STS markers pBPH9, pBPH19, pBPH20, and pBPH21 co-segregated with Bph1. These markers proved very useful for MAS in two breeding populations: 32 F6 RILs from a cross between Andabyeo and IR71190, and 32 F5 RILs from a cross between Andabyeo and Suwon452.
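
Why flanking markers make MAS reliable can be shown with a short calculation. A line carrying the donor alleles at both pBPH4 and pBPH14 loses Bph1 only if crossovers occur in both intervals. The sketch below converts the reported 0.2 cM and 0.8 cM to recombination fractions with Haldane's mapping function (an assumption: the abstract does not say which mapping function was used, though Kosambi's gives nearly the same values at these distances) and multiplies them, which also assumes no crossover interference. The figure is per meiosis; RILs accumulate recombination over several selfing generations, so the per-line risk is somewhat higher.

```python
import math

def haldane_cm_to_r(distance_cm):
    """Recombination fraction for a map distance in cM (Haldane's function)."""
    return 0.5 * (1.0 - math.exp(-2.0 * distance_cm / 100.0))

def double_crossover_prob(left_cm, right_cm):
    """Per-meiosis probability of crossovers in both flanking intervals,
    assuming no interference (consistent with Haldane's function)."""
    return haldane_cm_to_r(left_cm) * haldane_cm_to_r(right_cm)

# Flanking distances from the abstract: pBPH4 at 0.2 cM, pBPH14 at 0.8 cM.
p = double_crossover_prob(0.2, 0.8)
print(f"P(double crossover per meiosis) ~ {p:.2e}")
```

The result is on the order of 1.6 × 10⁻⁵, which is why selecting on both flanking markers (or on the co-segregating pBPH9/19/20/21) is so much safer than selecting on a single linked RAPD-derived marker.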

Genetic Parameters Estimated for Sexual Maturity and Weekly Live Weights of Japanese Quail (Coturnix coturnix japonica)

  • Sezer, Metin
    • Asian-Australasian Journal of Animal Sciences / v.20 no.1 / pp.19-24 / 2007
  • Covariance components and genetic parameters were estimated for weekly live body weight from hatching to six weeks of age and for age at sexual maturation in a laying-type Japanese quail line. The univariate and bivariate animal model analyses included hatching group and sex as fixed effects. Each trait was analysed with animal as a random effect to fit the additive direct effect; the additional random effects in the models varied according to the trait examined. The best model for each trait was chosen by a likelihood ratio test comparing models with and without maternal additive genetic and maternal permanent environmental effects. Heritability estimates (± standard error) of live weight at hatch and at one to six weeks of age were 0.22 ± 0.088, 0.39 ± 0.099, 0.31 ± 0.086, 0.38 ± 0.056, 0.46 ± 0.055, 0.50 ± 0.059, and 0.56 ± 0.062, respectively. The direct heritability of age at sexual maturation was moderate (0.24 ± 0.055). The variances due to the permanent environmental effect of the dam after one week of age and the maternal genetic effect after two weeks of age were not important sources of variation. The correlations between direct and maternal genetic effects were negative, ranging from -0.21 to -0.83. Among the weekly live weights, genetic correlations were generally high, not only between successive weighings but also between early and late ones. This suggests that selection for final weight may be based on early weight records. Genetic correlations between age at sexual maturation and live weights were low and favourable but had high standard errors. These results indicate that selection for high weight will potentially lower age at sexual maturation, but only with accurate determination of breeding values.
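
Model choice here rests on a likelihood ratio test between nested animal models. A minimal sketch of that test, assuming you already have the maximized (e.g. REML) log-likelihoods of the reduced and full models, is below; the log-likelihood values are placeholders, and the sketch only handles 1 or 2 degrees of freedom because it uses the closed-form chi-square tail for those cases. When the extra parameter is a variance constrained to be non-negative, the null lies on the boundary and the plain chi-square p-value is conservative (for a single variance component the reference distribution is a 50:50 mixture of χ²₀ and χ²₁).

```python
import math

def likelihood_ratio_test(loglik_reduced, loglik_full, df=1):
    """LR statistic 2*(lnL_full - lnL_reduced) and its chi-square p-value."""
    stat = max(0.0, 2.0 * (loglik_full - loglik_reduced))
    if df == 1:
        p = math.erfc(math.sqrt(stat / 2.0))  # P(chi2 with 1 df > stat)
    elif df == 2:
        p = math.exp(-stat / 2.0)             # P(chi2 with 2 df > stat)
    else:
        raise ValueError("this sketch only handles df = 1 or 2")
    return stat, p

# Hypothetical log-likelihoods: model without vs. with a maternal genetic effect.
stat, p = likelihood_ratio_test(-100.0, -98.0, df=1)
```

A statistic of 4.0 on 1 df gives p ≈ 0.046, so in this made-up case the maternal effect would be retained at the 5% level.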

Selection of Sahiwal Cattle Bulls on Pedigree and Progeny

  • Bhatti, A.A.;Khan, M.S.;Rehman, Z.;Hyder, A.U.;Hassan, F.
    • Asian-Australasian Journal of Animal Sciences / v.20 no.1 / pp.12-18 / 2007
  • The objective of the study was to compare the ranking of Sahiwal bulls selected on the basis of the highest lactation milk yield of their dams with their ranking on estimated breeding values (EBVs) from an animal model. Data on 23,761 lactation milk yield records of 5,936 cows from five main Livestock Experiment Stations in Punjab province, Pakistan (1964-2004), were used. At present, young A.I. bulls are required to come from A-category bull-dams. Dams were categorized as A, B, C and D if their highest lactation milk yield was ≥2,700, 2,250-2,699, 1,800-2,249 and <1,800 litres, respectively. EBVs for lactation milk yield were estimated for all animals using an individual animal model with fixed effects of herd-year and season of calving and a random effect of animal. A fixed effect of parity and a random effect of permanent environment were added when multiple lactations were used. There were 396 young bulls used for semen collection and A.I. during 1973-2004. However, progeny with recorded lactation yields were available for only 91 bulls, and dams could be traced for only 63 bulls. Overall lactation milk yield averaged 1,440.8 kg. Heritability of milk yield was 10%, with a repeatability of 39%. Ranking bulls on the highest lactation milk yield of their dams, the criterion currently used to select bulls, had a rank correlation of 0.167 (p<0.190) with the ranking based on EBVs from the animal model analysis. Bulls' EBVs for all lactations had a rank correlation of 0.716 (p<0.001) with EBVs based on first-lactation milk yield and 0.766 (p<0.001) with the average EBVs of dam and sire (pedigree index). Ranking bulls on the highest lactation yield of their dams therefore has no association with their ranking from animal model evaluation. Young Sahiwal bulls should be selected on the basis of pedigree index instead of the highest lactation yield of their dams. This can help improve the genetic potential of the breed and support conservation and development efforts.
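
The comparisons above are rank correlations between two ways of ordering the same bulls, and the recommended criterion is the pedigree index, the average of sire and dam EBVs. A minimal sketch of both is below, assuming Spearman's coefficient (the abstract says "rank correlation" without naming the variant) computed as the Pearson correlation of average ranks so that ties are handled; the inputs in any call are hypothetical per-bull values, not the study's data.

```python
def average_ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman rank correlation = Pearson correlation of the average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    if vx == 0 or vy == 0:
        raise ValueError("rank correlation is undefined for constant input")
    return cov / (vx * vy) ** 0.5

def pedigree_index(ebv_sire, ebv_dam):
    """Average of sire and dam EBVs."""
    return (ebv_sire + ebv_dam) / 2
```

For example, `spearman(dam_best_yields, bull_ebvs)` would correspond to the 0.167 comparison, and ranking candidates by `pedigree_index` corresponds to the criterion the study recommends.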

Utilizing Experiences of Supervisor in Sequential Learning for Multilayer Perceptron (지도 경험을 활용한 다계층 퍼셉트론의 순차적 학습 방법)

  • Lee, Jae-Young;Kim, Hwang-Soo
    • Journal of KIISE:Software and Applications / v.37 no.10 / pp.723-735 / 2010
  • In human learning, evaluating a learner's level of achievement and providing knowledge appropriate to that level strongly influence learning outcomes. This shows that the order of training matters, and it should also be considered in machine learning. To assess the influence of training order, we propose a method that controls the order of training samples in MLP training by using the supervisor's experience. The supervisor determines the current state of the MLP from its teaching experience and an evaluation of the student, and then selects the sample that is most instructive for the MLP in that state. We use a CRF to represent and exploit the supervisor's experience. Although the proposed method resembles active learning in that it selects samples, it differs fundamentally: the selection is meant not to reduce the number of samples used but to assist learning progress. Results on a classification problem show that the method usually reduces training time compared with random selection.
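
The core idea, a supervisor choosing the next training sample instead of drawing it at random, can be sketched as below. This is a swapped-in illustration, not the paper's method: the CRF-based supervisor is replaced by a hypothetical "pick the sample with the largest current loss" rule, and the MLP is replaced by a single logistic unit trained with one SGD step per selected sample. The names `pick_most_instructive` and `make_random_picker` are illustrative.

```python
import math
import random

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def predict(w, x):
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)))

def log_loss(w, x, y):
    p = min(max(predict(w, x), 1e-12), 1.0 - 1e-12)
    return -(y * math.log(p) + (1 - y) * math.log(1.0 - p))

def pick_most_instructive(w, data):
    # Hypothetical supervisor: choose the sample the current model gets most wrong.
    losses = [log_loss(w, x, y) for x, y in data]
    return max(range(len(data)), key=losses.__getitem__)

def make_random_picker(seed=0):
    # Baseline from the abstract: choose the next sample uniformly at random.
    rng = random.Random(seed)
    return lambda w, data: rng.randrange(len(data))

def train(data, picker, steps, lr=1.0):
    # One sample per step, chosen by `picker`; plain SGD on the log loss.
    w = [0.0] * len(data[0][0])
    for _ in range(steps):
        x, y = data[picker(w, data)]
        err = y - predict(w, x)
        w = [wi + lr * err * xi for wi, xi in zip(w, x)]
    return w

# Toy data; the last feature is a constant bias input.
data = [((1.0, 0.0, 1.0), 1), ((0.0, 1.0, 1.0), 0)]
w_supervised = train(data, pick_most_instructive, steps=200)
w_random = train(data, make_random_picker(seed=1), steps=200)
```

To mirror the paper's comparison, one would count the steps each picker needs to reach a target training accuracy on a real classification set, rather than fixing the step count as here.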

Pattern-Mixture Model of the Cox Proportional Hazards Model with Missing Binary Covariates (결측이 있는 이산형 공변량에 대한 Cox비례위험모형의 패턴-혼합 모델)

  • Youk, Tae-Mi;Song, Ju-Won
    • The Korean Journal of Applied Statistics / v.25 no.2 / pp.279-291 / 2012
  • When fitting a Cox proportional hazards model with missing covariates, excluding observations with missing values from the analysis is inefficient. Furthermore, if the missing-data mechanism is not Missing Completely At Random (MCAR), it may lead to biased parameter estimates. Many approaches have been suggested for the Cox proportional hazards model when covariates are sometimes missing, but they are based on the selection model. This paper suggests an approach to the Cox proportional hazards model with missing covariates that uses the pattern-mixture model (Little, 1993). The pattern-mixture model is expressed through the joint distribution of survival time and the missing-data mechanism. Within the pattern-mixture model, many models can be considered by setting up various restrictions, and differences in results across restrictions indicate how sensitive the model is to the missing covariates. A simulation study was conducted to show the sensitivity of parameter estimates under different restrictions in a pattern-mixture model. The proposed approach was also applied to mouse leukemia data.
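
For reference, the quantity a Cox fit maximizes is the partial likelihood; the sketch below computes its logarithm for one covariate, handling tied event times with the Breslow convention (every subject still at risk at time tᵢ enters the denominator). It also includes a complete-case filter, the approach the abstract describes as inefficient. The pattern-mixture approach itself (modeling survival separately within each missingness pattern and identifying the unobservable parts through restrictions) is not implemented here, and the function names are illustrative.

```python
import math

def cox_log_partial_likelihood(beta, times, events, x):
    """Breslow-type log partial likelihood for a single covariate x.

    times: follow-up times; events: 1 if the event was observed, 0 if censored.
    """
    if not (len(times) == len(events) == len(x)):
        raise ValueError("times, events and x must have equal length")
    ll = 0.0
    for t_i, d_i, x_i in zip(times, events, x):
        if not d_i:
            continue  # censored subjects contribute only through risk sets
        # Risk set: everyone still under observation at t_i.
        denom = sum(math.exp(beta * x_j) for t_j, x_j in zip(times, x) if t_j >= t_i)
        ll += beta * x_i - math.log(denom)
    return ll

def complete_cases(times, events, x):
    """Drop subjects whose covariate is missing (None)."""
    keep = [i for i, v in enumerate(x) if v is not None]
    return [times[i] for i in keep], [events[i] for i in keep], [x[i] for i in keep]
```

Maximizing `cox_log_partial_likelihood` over `beta` after `complete_cases` gives the complete-case estimate; the abstract's point is that this discards information and can be biased unless the covariates are MCAR.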

Self-Consciousness Information and Selection Effect (자기의식 정보와 관찰 선택 효과)

  • Kim, Myeongseok
    • Korean Journal of Logic / v.20 no.1 / pp.1-19 / 2017
  • In modern cosmology, it is controversial whether the existence of human consciousness can be used as evidence for the hypothesis that many parallel universes are actualized. In this paper, we explore the nature of the self-consciousness information "I am awake now." Consider the following experiment involving Al and Bob. We toss a fair coin on Sunday. If the coin lands heads, we wake up just one of Al and Bob on Monday; if it lands tails, we wake up both. On Monday, at least one of Al and Bob will wake up. To what degree ought an awakened participant believe that the coin landed heads? We will argue that the correct answer is 1/3. To this end, we will argue that the awakened person's information "I am awake" is given to him through a random procedure.
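
The 1/3 answer follows from Bayes' rule once "I am awake" is treated as the outcome of a random procedure. A minimal sketch, assuming that under heads the one person to wake is chosen uniformly at random (the abstract says only "one of Al and Bob"), so P(I am awake | heads) = 1/2 while P(I am awake | tails) = 1. The parameter `n_people` is a hypothetical generalization to n participants, of whom all are woken on tails:

```python
from fractions import Fraction

def credence_heads_given_awake(n_people=2):
    """P(heads | I am awake) for a fair coin, where heads wakes one of
    n_people chosen uniformly at random and tails wakes everyone."""
    prior_heads = Fraction(1, 2)
    prior_tails = Fraction(1, 2)
    p_awake_given_heads = Fraction(1, n_people)  # I am the one chosen
    p_awake_given_tails = Fraction(1)            # everyone is woken
    joint_heads = prior_heads * p_awake_given_heads
    return joint_heads / (joint_heads + prior_tails * p_awake_given_tails)

print(credence_heads_given_awake())  # Al/Bob case
```

With two participants this gives (1/2 · 1/2) / (1/2 · 1/2 + 1/2 · 1) = 1/3; in general it gives 1/(n+1).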

Bayesian Model Selection in the Unbalanced Random Effect Model

  • Kim, Dal-Ho;Kang, Sang-Gil;Lee, Woo-Dong
    • Journal of the Korean Data and Information Science Society / v.15 no.4 / pp.743-752 / 2004
  • In this paper, we develop a Bayesian model selection procedure using the reference prior to compare two nested models, such as the independent and intraclass models, using the distance or divergence between the two as the basis of comparison. A suitable criterion for this is the power divergence measure introduced by Cressie and Read (1984), which includes the Kullback-Leibler and Hellinger divergence measures as special cases. For this problem, the power divergence measure turns out to be a function solely of $\rho$, the intraclass correlation coefficient; this function is convex, with its minimum at $\rho=0$. We use the reference prior for $\rho$. Because of the duality between hypothesis tests and set estimation, the hypothesis testing problem can also be solved by solving the corresponding set estimation problem. The present paper develops a Bayesian method based on the Kullback-Leibler and Hellinger divergence measures that rejects $H_0:\rho=0$ when the specified divergence measure exceeds some number $d$. The number $d$ is chosen so that the resulting credible interval for the divergence measure has a specified coverage probability $1-\alpha$. The length of this interval is compared with those of the equal two-tailed credible interval and the HPD credible interval for $\rho$ with the same coverage probability, both of which can also be inverted into acceptance regions of $H_0:\rho=0$. An example is considered in which the HPD interval based on the one-at-a-time reference prior turns out to be the shortest credible interval with the same coverage probability.
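
The comparison between equal-tailed and HPD credible intervals can be illustrated from posterior draws. A minimal sketch, assuming you already have Monte Carlo samples from the posterior of $\rho$ (the paper works with the reference-prior posterior; producing those samples is not shown): the sample-based HPD interval is the shortest window of sorted draws containing the required fraction, while the equal-tailed interval cuts the same probability from each tail.

```python
import math

def hpd_interval(samples, mass):
    """Shortest interval containing at least `mass` of the sorted samples (0 < mass <= 1)."""
    s = sorted(samples)
    n = len(s)
    k = max(1, math.ceil(mass * n - 1e-9))  # points to cover; 1e-9 absorbs float noise
    best = None
    for i in range(n - k + 1):
        lo, hi = s[i], s[i + k - 1]
        if best is None or hi - lo < best[1] - best[0]:
            best = (lo, hi)
    return best

def equal_tailed_interval(samples, mass):
    """Interval cutting (1 - mass)/2 of the samples from each tail."""
    s = sorted(samples)
    n = len(s)
    tail = (1 - mass) / 2
    lo = s[int(math.floor(tail * n))]
    hi = s[min(n - 1, int(math.ceil((1 - tail) * n)) - 1)]
    return lo, hi
```

For a right-skewed posterior, such as draws from `random.betavariate(2, 8)` used as a stand-in, the HPD interval is shorter than the equal-tailed one at the same coverage, which is the pattern the abstract reports for its example.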

Performance Analysis of Random Resource Selection in LTE D2D Discovery (LTE D2D 디스커버리에서 무작위 자원 선택 방법에 대한 성능 분석)

  • Park, Kyungwon;Kim, Joonyoung;Jeong, Byeong Kook;Lee, Kwang Bok;Choi, Sunghyun
    • The Journal of Korean Institute of Communications and Information Sciences / v.42 no.3 / pp.577-584 / 2017
  • Long Term Evolution device-to-device (LTE D2D) communication is a key technology for mitigating data traffic load in cellular systems. It enables direct data exchange between neighboring users, which is preceded by D2D discovery, in which each device advertises its presence to neighboring devices by broadcasting a discovery message. In this paper, we develop a mathematical analysis of the probability that discovery messages are successfully transmitted at the D2D discovery stage. We use stochastic geometry to model the spatial statistics of nodes in a two-dimensional space. The analysis reflects signal-to-interference-plus-noise ratio (SINR) degradation due to resource collisions and in-band emission, so the discovery message reception probability is modeled as a function of the distance between the transmitter and the receiver. Numerical results verify that the new analysis accurately estimates the discovery message reception probabilities of nodes at the D2D discovery stage.
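
The resource-collision part of random resource selection can be illustrated in isolation. A minimal sketch, assuming N devices each pick one of K discovery resources independently and uniformly: a tagged device avoids a collision only if the other N-1 devices all pick different resources from it, which happens with probability (1 - 1/K)^(N-1). This deliberately ignores what the paper adds on top (random node positions via stochastic geometry, in-band emission, SINR, and distance dependence), so it is an upper bound on the success probability, not the paper's model; a seeded Monte Carlo check is included.

```python
import random

def p_no_collision(n_devices, n_resources):
    """Probability that a tagged device's resource is picked by none of the other devices."""
    if n_devices < 1 or n_resources < 1:
        raise ValueError("need at least one device and one resource")
    return (1.0 - 1.0 / n_resources) ** (n_devices - 1)

def simulate_no_collision(n_devices, n_resources, trials=20000, seed=0):
    """Monte Carlo estimate of the same probability; device 0 is the tagged one."""
    rng = random.Random(seed)
    ok = 0
    for _ in range(trials):
        picks = [rng.randrange(n_resources) for _ in range(n_devices)]
        if picks.count(picks[0]) == 1:  # nobody else chose device 0's resource
            ok += 1
    return ok / trials
```

For three devices sharing two resources, the closed form gives 0.25, and the simulation should agree to within sampling error.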