• Title/Summary/Keyword: Sample selection model

Effect of Somatic Cell Score on Protein Yield in Holsteins

  • Khan, M.S.;Shook, G.E.
    • Asian-Australasian Journal of Animal Sciences / v.11 no.5 / pp.580-585 / 1998
  • The study was conducted to determine whether variation in protein yield can be explained by expressions of early-lactation somatic cell score (SCS) and whether prediction can be improved by including SCS among the predictors. A data set (n = 663,438) was prepared from Wisconsin Dairy Improvement Association (USA) records for protein yield with sample days near 20. Stepwise regression was used, requiring a significant F statistic (p < .01) for any variable to stay in the model. Separate analyses were run for 12 combinations of four seasons and the first three parities. Selection of SCS variables was not consistent across seasons or lactations. Coefficients of determination ($R^2$) ranged from 51 to 61%, with higher values for earlier lactations. Including any expression of SCS in the prediction equations improved $R^2$ by < 1%. SCS was associated with milk yield on the sample day, but the association was not strong enough to improve the prediction of future yield when other expressions of milk yield were in the model.

Undecided inference using bivariate probit models (이변량 프로빗모형을 이용한 미결정자 추론)

  • Hong, Chong-Sun;Jung, Mi-Yang
    • Journal of the Korean Data and Information Science Society / v.22 no.6 / pp.1017-1028 / 2011
  • When a credit score cannot easily be assigned to some loan applicants, the credit evaluation is postponed and the undecided applicants are referred to a specialist for further evaluation. This undecided inference is a problem that arises in most statistical models, including those in biostatistics and sports statistics as well as in credit evaluation. In this work, undecided inference is treated as a missing data mechanism under the MNAR (missing not at random) assumption, and the bivariate probit model, which is one of the sample selection models, is used. Two undecided inference methods are proposed: one uses characteristic variables that represent the state of decided applicants, and the other collects additional, more accurate information and applies these new variables. With an illustrative example, misclassification error rates for undecided and overall applicants are obtained and compared across various characteristic variables, undecided intervals, and thresholds. It is found that misclassification error rates can be reduced when the undecided interval is widened and more accurate information is put into the model, since the situation of decided applicants is then reflected more accurately in the bivariate probit model.

An Exploratory Study of Psychological Characteristics of Metaverse Users (메타버스 이용자의 심리 특성 탐색 연구)

  • Hyeonjeong Kim;HyunJung Kim;Beomsoo Kim;Hwan-Ho Noh
    • Knowledge Management Research / v.24 no.4 / pp.63-85 / 2023
  • This study aims to identify the primary user group in the growing metaverse space based on the increased interest during the COVID-19 era. It also aims to explore the predictive factors for metaverse adoption. To predict online activities, the study examined user purposes, motivations, and relevant demographic factors as predictive variables through model analysis. The data from the Korean Media Panel Survey were used, and a two-stage analysis with the Heckman two-stage sample selection model was conducted to predict metaverse users. The analysis revealed that the key factors influencing metaverse adoption were offline activities, openness, OTT usage, and purchasing of paid content. Moreover, in the second stage model, openness, gender, and paid content purchases were identified as significant variables for increasing metaverse usage time. These results indicate that understanding metaverse users is essential in the context of the rising interest in online activities during the COVID-19 era and can provide valuable insights for metaverse platform-related companies and developers.
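
As a rough illustration of the two-stage idea mentioned in this abstract: in the usual Heckman two-step procedure, a first-stage probit models who is selected (here, who uses the metaverse), and the second-stage regression on the selected sample adds the inverse Mills ratio as an extra regressor. The sketch below computes only that ratio, using the Python standard library. The index values are hypothetical and are not taken from the study.

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal distribution (mean 0, sd 1)


def inverse_mills_ratio(z: float) -> float:
    """Return phi(z) / Phi(z) for a first-stage probit index z."""
    return _STD_NORMAL.pdf(z) / _STD_NORMAL.cdf(z)


# Hypothetical first-stage index values for three selected respondents.
for z in (-1.0, 0.0, 1.5):
    print(f"z = {z:+.1f}  inverse Mills ratio = {inverse_mills_ratio(z):.4f}")
```

The ratio shrinks as the selection index grows, so respondents who were very likely to be selected receive a smaller correction term in the second stage.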

The Relationship between Win-Win Growth Effort and Financial Performance with Time-lag : Development of Win-Win Growth Index using Ordered Probit Model (기업의 동반성장 노력과 재무성과의 선후행 관계 : 순위 프로빗 모형을 이용한 계량적 동반성장지수의 산출)

  • Min, Jae H.;Kim, Bumseok
    • Journal of the Korean Operations Research and Management Science Society / v.39 no.2 / pp.67-82 / 2014
  • The purpose of this study is twofold: the first is to examine the causal relationship between domestic large firms' win-win growth effort and their financial performance by fiscal year; the second is to develop a quantitative win-win growth index to overcome the limitation of the current one, which mainly uses a survey method developed by NCCP (National Commission for Corporate Partnership). To serve the first purpose, we take a sample of 128 large companies whose win-win growth indices for 2011 and 2012 were evaluated by NCCP. We use their respective fiscal years' financial data to select 62 candidate financial ratios, which are then used in subsequent empirical tests. For the tests, we employ an ordered probit model with stepwise selection and two-way ANOVA with a randomized block design to identify which of the 62 financial ratios significantly affect the firms' win-win growth index, and to determine whether the firms' win-win growth effort positively affects their financial performance. To serve the second purpose, we devise a model using the 123 firms' 45 financial ratios, which employs an ordered probit model with stepwise selection, and then validate the model. We claim that the model suggested in this study can serve as an alternative that complements the current one, as it can produce the index more objectively and swiftly using the firms' published financial statements.

Study on Classification Function into Sasang Constitution Using Data Mining Techniques (데이터마이닝 기법을 이용한 사상체질 판별함수에 관한 연구)

  • Kim Kyu Kon;Kim Jong Won;Lee Eui Ju;Kim Jong Yeol;Choi Sun-Mi
    • Journal of Physiology & Pathology in Korean Medicine / v.18 no.6 / pp.1938-1944 / 2004
  • In this study, data mining techniques are applied to find a classification function that improves the accuracy of constitution diagnosis using QSCC Ⅱ (Questionnaire of Sasang Constitution Classification). The data analyzed are the questionnaires of 1,051 patients who had been treated at Dong Eui Oriental Medical Hospital and Kyung Hee Oriental Medical Hospital. The criteria for data cleansing are the response pattern to opposing questionnaire items and the proportion of positive responses to specific items within each constitution. The criteria for variable selection are the test of homogeneity in frequency analysis and the coefficients of the linear discriminant function. A discriminant analysis model and a decision tree model are applied to find the classification function for Sasang constitution. Accuracy on the learning sample is similar for the two models, while the discriminant analysis model achieves higher accuracy on the test sample.

Application of the Weibull-Poisson long-term survival model

  • Vigas, Valdemiro Piedade;Mazucheli, Josmar;Louzada, Francisco
    • Communications for Statistical Applications and Methods / v.24 no.4 / pp.325-337 / 2017
  • In this paper, we propose a new four-parameter long-term lifetime distribution set in a competing-risks scenario, with decreasing, increasing and unimodal hazard rate functions, namely the Weibull-Poisson long-term distribution. This new distribution arises from a scenario of latent competing risks, in which the lifetime associated with a particular risk is not observable and only the minimum lifetime among all risks is observed, in a long-term context. However, it can also be used in any other situation as long as it fits the data well. The new exponential-Poisson long-term distribution and the Weibull long-term distribution are presented as particular cases of the Weibull-Poisson long-term distribution. The properties of the proposed distribution are discussed, including its probability density, survival and hazard functions and explicit algebraic formulas for its order statistics. Assuming censored data, we consider the maximum likelihood approach for parameter estimation. For different parameter settings, sample sizes, and censoring percentages, various simulation studies were performed to study the mean square error of the maximum likelihood estimates and to compare the performance of the proposed model with its particular cases. The Akaike information criterion, the Bayesian information criterion, and the likelihood ratio test were used for model selection. The relevance of the approach is illustrated on two real datasets, where the new model is compared with its particular cases to show its potential and competitiveness.
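
The model selection step named in this abstract can be sketched with the standard definitions AIC = 2k - 2 ln L and BIC = k ln n - 2 ln L, where k is the number of parameters, n the sample size, and ln L the maximized log-likelihood. The log-likelihood values and model labels below are hypothetical placeholders, not results from the paper.

```python
import math


def aic(log_likelihood: float, n_params: int) -> float:
    """Akaike information criterion: 2k - 2 ln L (lower is better)."""
    return 2 * n_params - 2 * log_likelihood


def bic(log_likelihood: float, n_params: int, n_obs: int) -> float:
    """Bayesian information criterion: k ln n - 2 ln L (lower is better)."""
    return n_params * math.log(n_obs) - 2 * log_likelihood


# Hypothetical fits: (maximized log-likelihood, number of parameters).
candidates = {"four-parameter model": (-120.3, 4), "three-parameter special case": (-121.0, 3)}
n_obs = 100  # hypothetical sample size

for name, (log_lik, k) in candidates.items():
    print(f"{name}: AIC = {aic(log_lik, k):.2f}, BIC = {bic(log_lik, k, n_obs):.2f}")
```

Both criteria penalize extra parameters, so a nested special case can be preferred when the larger model improves the log-likelihood only slightly.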

The Comparative Study for Property of Learning Effect based on Truncated time and Delayed S-Shaped NHPP Software Reliability Model (절단고정시간과 지연된 S-형태 NHPP 소프트웨어 신뢰모형에 근거한 학습효과특성 비교연구)

  • Kim, Hee Cheul
    • Journal of Korea Society of Digital Industry and Information Management / v.8 no.4 / pp.25-34 / 2012
  • In the process of testing before a designed software product is released, the software testing manager should know the testing information in advance. This study therefore examines learning effects using NHPP software reliability models. Finite-failure nonhomogeneous Poisson process (NHPP) models are presented, and the properties of the learning effect are compared on the basis of truncated time and the delayed S-shaped software reliability model. The comparison considers two factors set by the testing manager: the factor for errors found automatically, with error detection techniques known in advance, and the learning factor, by which errors are found precisely through prior experience. As a result, it was confirmed that a model in which the learning factor is greater than the autonomous error-detection factor is generally efficient. Failure data were analyzed using times between failures, for small and large sample sizes. Parameters were estimated by the maximum likelihood method. Model selection was performed using the mean square error and the coefficient of determination, after trend analysis was carried out to check the efficiency of the data.
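
For orientation, the textbook delayed S-shaped NHPP model has mean value function m(t) = a(1 - (1 + bt)e^{-bt}), where a is the expected total number of failures and b a detection rate. The sketch below implements that standard form only. The learning-effect variant studied in the paper may be parameterized differently, and the parameter values are hypothetical.

```python
import math


def mean_value(t: float, a: float, b: float) -> float:
    """Expected cumulative failures by time t: a * (1 - (1 + b*t) * exp(-b*t))."""
    return a * (1.0 - (1.0 + b * t) * math.exp(-b * t))


def intensity(t: float, a: float, b: float) -> float:
    """Failure intensity, the derivative of mean_value: a * b^2 * t * exp(-b*t)."""
    return a * b * b * t * math.exp(-b * t)


# Hypothetical parameters: 100 expected failures in total, detection rate 0.1.
a, b = 100.0, 0.1
for t in (0.0, 10.0, 50.0):
    print(f"t = {t:5.1f}  m(t) = {mean_value(t, a, b):7.3f}  intensity = {intensity(t, a, b):.3f}")
```

The intensity rises from zero, peaks at t = 1/b, and then decays, which gives the cumulative curve its S shape.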

Behavioral Intention to Accept and Use Banking Service

  • NGAN, Nguyen Thi;KHOI, Bui Huy
    • The Journal of Asian Finance, Economics and Business / v.7 no.11 / pp.393-400 / 2020
  • Banking services are provided by a bank to allow its customers to conduct banking transactions. The purpose of this study was to identify the factors that influence the behavioral intention to accept and use banking services in Vietnam. The research methodology was implemented in two steps: qualitative research and quantitative research. Qualitative research was conducted with a sample of 30 people. Quantitative research was carried out, once the questionnaire had been edited based on the test results, with a sample of 217 customers living in Ho Chi Minh City, Vietnam. The research model was proposed from studies of the behavioral intention to accept and use banking services. The reliability and validity of the scale were evaluated by Cronbach's Alpha, Average Variance Extracted (Pvc), and Composite Reliability (Pc). Model selection by AIC showed that the behavioral intention to accept and use banking services was affected by four components. The outcomes showed that the research model of the intention to accept and use banking services in Ho Chi Minh City, Vietnam, is built on the effects of four scales: perceived ease of use, trust, social norm, and innovation in banking services.
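
The three scale-quality measures named in this abstract have common textbook definitions, sketched below under stated assumptions: Cronbach's alpha from raw item scores, and composite reliability and average variance extracted from standardized factor loadings. The scores and loadings are hypothetical, and the study's own computation may differ in detail.

```python
from statistics import variance


def cronbach_alpha(items: list[list[float]]) -> float:
    """items[i][j] is respondent j's score on item i (sample variances are used)."""
    k = len(items)
    totals = [sum(scores) for scores in zip(*items)]  # per-respondent total score
    item_var_sum = sum(variance(item) for item in items)
    return k / (k - 1) * (1 - item_var_sum / variance(totals))


def composite_reliability(loadings: list[float]) -> float:
    """(sum of loadings)^2 / ((sum of loadings)^2 + sum of error variances)."""
    squared_sum = sum(loadings) ** 2
    error_sum = sum(1 - l * l for l in loadings)
    return squared_sum / (squared_sum + error_sum)


def average_variance_extracted(loadings: list[float]) -> float:
    """Mean of the squared standardized loadings."""
    return sum(l * l for l in loadings) / len(loadings)


# Hypothetical data: three items answered by five respondents, and three loadings.
scores = [[4, 3, 5, 2, 4], [5, 3, 4, 2, 4], [4, 2, 5, 3, 5]]
loadings = [0.7, 0.8, 0.9]
print(f"alpha = {cronbach_alpha(scores):.3f}")
print(f"CR = {composite_reliability(loadings):.3f}, AVE = {average_variance_extracted(loadings):.3f}")
```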

Forecasting Korean CPI Inflation (우리나라 소비자물가상승률 예측)

  • Kang, Kyu Ho;Kim, Jungsung;Shin, Serim
    • Economic Analysis / v.27 no.4 / pp.1-42 / 2021
  • The outlook for Korea's consumer price inflation rate has a profound impact not only on the Bank of Korea's operation of its inflation targeting system but also on the overall economy, including the bond market and private consumption and investment. This study presents predictions of consumer price inflation in Korea for the next three years. To this end, model selection is first performed based on the out-of-sample predictive power of autoregressive distributed lag (ADL) models, AR models, small-scale vector autoregressive (VAR) models, and large-scale VAR models. Since there are many potential predictors of inflation, a Bayesian variable selection technique was introduced for 12 macro variables, and a precise tuning process was performed to improve predictive power. In the case of the VAR models, the Minnesota prior distribution was applied to address the curse of dimensionality. In long-term and short-term out-of-sample predictions for the last five years, the ADL model was generally superior to the competing models in both point and distribution prediction. Forecasts obtained by combining the predictions of the above models indicate that the inflation rate will stay at its current level of around 2% until the second half of 2022 and drop to around 1% from the first half of 2023.

An Enhanced Feature Selection Method Based on the Impurity of Words Considering Unbalanced Distribution of Documents (문서의 불균등 분포를 고려한 단어 불순도 기반 특징 선택 방법)

  • Kang, Jin-Beom;Yang, Jae-Young;Choi, Joong-Min
    • Journal of KIISE:Software and Applications / v.34 no.9 / pp.804-816 / 2007
  • Sample training data for machine learning often contain irrelevant information or redundant concepts. The original data may also include noise. If the information collected for constructing a learning model is not reliable, it is difficult to obtain accurate information. So the system attempts to find relations or regularities between features and categories in the learning phase. Feature selection removes irrelevant or redundant information before the learning model is constructed, in order to improve its performance. Existing feature selection methods assume that the distribution of documents is balanced in terms of the number of documents for each class and the length of each document. In practice, however, it is difficult not only to prepare a set of documents of almost equal length, but also to define a number of classes with a fixed number of documents. In this paper, we propose a new feature selection method that considers the impurities among words and the unbalanced distribution of documents across categories. We obtain feature candidates using the word impurity and then select the features by taking the unbalanced distribution of documents into account. Experiments demonstrate that our method performs better than existing methods.
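
The abstract does not define its impurity measure, so the following is only one plausible reading: the Gini impurity of a word's class distribution, with each class's document count divided by the class size so that large classes do not dominate. The function name, the normalization, and the counts are all assumptions made for illustration, not the authors' method.

```python
def word_impurity(doc_counts: dict[str, int], class_sizes: dict[str, int]) -> float:
    """Gini impurity of a word over classes, using per-class document rates.

    doc_counts[c]  : number of documents in class c that contain the word
    class_sizes[c] : total number of documents in class c (must be positive)
    """
    rates = {c: doc_counts.get(c, 0) / class_sizes[c] for c in class_sizes}
    total = sum(rates.values())
    if total == 0:
        raise ValueError("word does not occur in any class")
    return 1.0 - sum((rate / total) ** 2 for rate in rates.values())


# Hypothetical corpus with an unbalanced class distribution.
class_sizes = {"sports": 100, "politics": 1000}
print(word_impurity({"sports": 30, "politics": 0}, class_sizes))    # word seen in one class only
print(word_impurity({"sports": 10, "politics": 100}, class_sizes))  # same rate in both classes
```

Under this reading, a word with low impurity is concentrated in few classes and is a stronger feature candidate, while a word that occurs at the same rate everywhere carries little class information.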