• Title/Summary/Keyword: Sample selection model

A Smooth Goodness-of-fit Test Using Selected Sample Quantiles

  • Umbach, Dale;Masoom Ali, M.
    • Journal of the Korean Statistical Society / v.25 no.3 / pp.347-358 / 1996
  • A new test for goodness-of-fit is presented. It is a modification of a test of LaRiccia (1991). These tests are applicable to continuous location/scale models. The new test statistic is based on a few selected order statistics taken from the sample, while the LaRiccia test is based directly on the full sample. Each test embeds the hypothesized model in a larger linear model and proceeds to test the goodness-of-fit hypothesis by testing the coefficients of this linear model appropriately. The general theory is presented. The tests are compared via computer simulation to a related test of Ali and Umbach (1989) for distributions that could be used as lifetime models. An important aspect of all these tests is that only standard $\chi^2$ tables are used. Selection of the spacings of the order statistics is discussed.

Closeness of Lindley distribution to Weibull and gamma distributions

  • Raqab, Mohammad Z.;Al-Jarallah, Reem A.;Al-Mutairi, Dhaifallah K.
    • Communications for Statistical Applications and Methods / v.24 no.2 / pp.129-142 / 2017
  • In this paper we consider the problem of model selection/discrimination among three positively skewed lifetime distributions. The Lindley, Weibull, and gamma distributions have been used to effectively analyze positively skewed lifetime data. This paper assesses how close the Lindley distribution is to the Weibull and gamma distributions. We consider three techniques, based on the likelihood ratio test, the asymptotic likelihood ratio test, and the minimum Kolmogorov distance, as optimality criteria for identifying the appropriate model among the three distributions for a given data set. A Monte Carlo simulation study is performed to compute the probability of correct selection under the considered optimality criteria for various choices of sample sizes and shape parameters. It is observed that, overall, the Lindley distribution is closer to the Weibull distribution in the sense of the likelihood ratio and Kolmogorov criteria. A real data set is presented and analyzed for illustrative purposes.

A Study on the Selection of Slack Bus at Application of Marginal Loss-Factor in a Competitive Electricity Market (경쟁적 전력시장에서 한계손실계수 적용시 기준모선 선정에 대한 연구)

  • Kim, Sang-Hoon;Lee, Kwang-Ho
    • The Transactions of The Korean Institute of Electrical Engineers / v.58 no.2 / pp.264-269 / 2009
  • The Marginal Loss Factor (MLF) is the sensitivity of transmission loss, computed from the change in generation at the slack bus caused by a change in load at an arbitrary bus. The MLF, which depends on the selection of the slack bus, is one of the key factors affecting nodal pricing, Genco's profits, social welfare (SW), and the Nash equilibrium in a competitive electricity market. This paper addresses the methodology of slack bus selection using a Cournot model of a cost-based pool market. Numerical results from sample cases show that the slack bus with the highest average MLF is beneficial from the viewpoint of SW.

BAYESIAN MODEL SELECTION IN REGRESSION MODEL WITH AUTOREGRESSIVE ERRORS

  • Chung, Youn-Shik;Sohn, Keon-Tae;Kim, Sung-Duk;Kim, Chan-Soo
    • Journal of applied mathematics & informatics / v.9 no.1 / pp.289-301 / 2002
  • This paper considers the Bayesian analysis of the regression model with autoregressive errors. A Bayesian approach for finding the order p of the autoregressive error is proposed, and the proposed method can be simplified by the generalized Savage-Dickey density ratio (Verdinelli and Wasserman, [18]). The Markov chain Monte Carlo method (Gibbs sampler, [7]) is used to overcome the difficulty of the Bayesian computations. Finally, several examples are used to illustrate the proposed methodology.

Moderately clipped LASSO for the high-dimensional generalized linear model

  • Lee, Sangin;Ku, Boncho;Kwon, Sunghoon
    • Communications for Statistical Applications and Methods / v.27 no.4 / pp.445-458 / 2020
  • The least absolute shrinkage and selection operator (LASSO) is a popular method for a high-dimensional regression model. LASSO has high prediction accuracy; however, it also selects many irrelevant variables. In this paper, we consider the moderately clipped LASSO (MCL) for the high-dimensional generalized linear model which is a hybrid method of the LASSO and minimax concave penalty (MCP). The MCL preserves advantages of the LASSO and MCP since it shows high prediction accuracy and successfully selects relevant variables. We prove that the MCL achieves the oracle property under some regularity conditions, even when the number of parameters is larger than the sample size. An efficient algorithm is also provided. Various numerical studies confirm that the MCL can be a better alternative to other competitors.

Classification for Imbalanced Breast Cancer Dataset Using Resampling Methods

  • Hana Babiker, Nassar
    • International Journal of Computer Science & Network Security / v.23 no.1 / pp.89-95 / 2023
  • Analyzing breast cancer patient files is becoming an exciting area of medical information analysis, especially with the increasing number of patient files. In this paper, breast cancer data is collected from Khartoum state hospital, and the dataset is classified into recurrence and no recurrence. The data is imbalanced, meaning that one of the two classes has more samples than the other. Several pre-processing techniques (resampling, attribute selection, and handling of missing values) are applied to this imbalanced data, and then different classifier models are built. In the first experiment, five classifiers (ANN, REP TREE, SVM, and J48) are used, and in the second experiment, meta-learning algorithms (Bagging, Boosting, and Random subspace) are used. Finally, the ensemble model is used. The best result was obtained from the ensemble model (Boosting with J48), with the highest accuracy among all the algorithms (95.2797%), followed by Bagging with J48 (90.559%) and Random subspace with J48 (84.2657%).

An Analysis of Job Selection, Major-Job Match and Wage Level of College Graduates (대학 졸업생의 직업선택과 임금 수준)

  • Park, Jae-Min
    • Journal of Korea Technology Innovation Society / v.14 no.1 / pp.22-39 / 2011
  • This study examines the wage level from the viewpoint of major-job match, as part of an analysis of the skill mismatch problem among 4-year college graduates. The empirical analysis explicitly incorporates sample selection bias, an econometric problem that earlier studies noted but did not fully address. The study also sets up the major-job match variable, which is usually handled as a binary variable for analytical convenience, as a polychotomous choice variable in the selection equation, as provided by the survey. In particular, it uses a multi-cohort survey of graduates of the years 1982, 1992, and 2002 for the empirical analysis. The analysis identifies a wage premium for a major-job match. This result remains consistent after accounting for sample selection bias and after modeling the major-job match variable as polychotomous. In an analysis broken down by major, the study identifies a relatively high wage premium among Social Science, Engineering, and Science majors. However, the effect of selection differs among these majors. By assessing cohort effects, the study also finds that the skill mismatch had progressed rapidly in 1992, while the difference between the 1992 and 2002 cohorts is insignificant. The analysis suggests that the wage level is better understood in the context of both sample selection and major-job match, and that the major-job match strongly affects wages regardless of model specification.

A GENERALIZED MODEL-BASED OPTIMAL SAMPLE SELECTION METHOD

  • Hong, Ki-Hak;Lee, Gi-Sung;Son, Chang-Kyoon
    • Journal of applied mathematics & informatics / v.9 no.2 / pp.807-815 / 2002
  • We consider a more general linear regression super-population model than that of Chaudhuri and Stenger (1992). We find the same type of best linear unbiased (BLU) predictor as that of Chaudhuri and Stenger and see that the optimal design is again a purposive one, which prescribes choosing one of the samples of size n whose $\bar{x}$ is closest to $\bar{X}$.

Two-Stage Penalized Composite Quantile Regression with Grouped Variables

  • Bang, Sungwan;Jhun, Myoungshic
    • Communications for Statistical Applications and Methods / v.20 no.4 / pp.259-270 / 2013
  • This paper considers a penalized composite quantile regression (CQR) that performs variable selection in the linear model with grouped variables. An adaptive sup-norm penalized CQR (ASCQR) is proposed to select variables in a grouped manner, and the consistency and oracle property of the resulting estimator are derived under some regularity conditions. To improve the efficiency of estimation and variable selection, this paper suggests the two-stage penalized CQR (TSCQR), which uses the ASCQR to select relevant groups in the first stage and the adaptive lasso penalized CQR to select important variables in the second stage. Simulation studies are conducted to illustrate the finite sample performance of the proposed methods.

Selection of features and hidden Markov model parameters for English word recognition from Leap Motion air-writing trajectories

  • Deval Verma;Himanshu Agarwal;Amrish Kumar Aggarwal
    • ETRI Journal / v.46 no.2 / pp.250-262 / 2024
  • Air-writing recognition is relevant in areas such as natural human-computer interaction, augmented reality, and virtual reality. A trajectory is the most natural way to represent air writing. We analyze the recognition accuracy of words written in air considering five features, namely, writing direction, curvature, trajectory, orthocenter, and ellipsoid, as well as different parameters of a hidden Markov model classifier. Experiments were performed on two representative datasets, whose sample trajectories were collected using a Leap Motion Controller from a fingertip performing air writing. Dataset D1 contains 840 English words from 21 classes, and dataset D2 contains 1600 English words from 40 classes. A genetic algorithm was combined with a hidden Markov model classifier to obtain the best subset of features. The combination {trajectory, orthocenter, writing direction, curvature} provided the best feature set, achieving recognition accuracies on datasets D1 and D2 of 98.81% and 83.58%, respectively.