• Title/Summary/Keyword: linear probability models

Generalized Partially Linear Additive Models for Credit Scoring

  • Shim, Ju-Hyun;Lee, Young-K.
    • The Korean Journal of Applied Statistics / v.24 no.4 / pp.587-595 / 2011
  • Credit scoring is an objective and automatic system for assessing the credit risk of each customer. The logistic regression model is one of the popular credit scoring methods for predicting the default probability; however, despite its advantages of interpretability and low computational cost, it may fail to detect nonlinear features of the predictors. In this paper, we propose a generalized partially linear model as an alternative to logistic regression. We also introduce modern ensemble techniques such as bagging, boosting, and random forests. We compare these methods in a simulation study and illustrate them on a German credit dataset.

Effects on Regression Estimates under Misspecified Generalized Linear Mixed Models for Counts Data

  • Jeong, Kwang Mo
    • The Korean Journal of Applied Statistics / v.25 no.6 / pp.1037-1047 / 2012
  • The generalized linear mixed model (GLMM) is widely used to fit categorical responses from clustered data. In the numerical approximation of the likelihood function, the random effects are assumed to be normally distributed, and commercial statistical packages routinely fit GLMMs under this normality assumption. Departures from the distributional assumption on the response variable may also occur. It is therefore of interest to investigate how misspecified distributions affect the parameter estimates; however, there has been limited research on these topics. We study the sensitivity or robustness of the maximum likelihood estimators (MLEs) of GLMMs for count data when the true underlying distribution is normal, gamma, exponential, or a mixture of two normal distributions. We also consider the effects on the MLEs of fitting a Poisson-normal GLMM when the outcomes are generated from an overdispersed negative binomial distribution. Through a small-scale Monte Carlo study, we check the empirical coverage probabilities of the parameters and the biases of the MLEs of the GLMM.
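As an illustration of the overdispersion issue mentioned above, the sketch below (Python standard library only; not the authors' simulation code) draws negative binomial counts as a gamma-Poisson mixture and compares their variance-to-mean ratio with that of plain Poisson counts. The function names, the mean of 4, and the size parameter of 2 are hypothetical choices for illustration.

```python
import math
import random
import statistics

def poisson_sample(rng, lam):
    """Draw one Poisson(lam) variate by Knuth's multiplication method (fine for small lam)."""
    threshold = math.exp(-lam)
    k, prod = 0, rng.random()
    while prod > threshold:
        k += 1
        prod *= rng.random()
    return k

def negative_binomial_sample(rng, mean, size):
    """Gamma-Poisson mixture: lam ~ Gamma(shape=size, scale=mean/size), y ~ Poisson(lam).
    E[y] = mean and Var[y] = mean + mean**2 / size, i.e. overdispersed relative to Poisson."""
    lam = rng.gammavariate(size, mean / size)
    return poisson_sample(rng, lam)

def dispersion_ratio(counts):
    """Sample variance divided by sample mean; close to 1 for Poisson data."""
    return statistics.variance(counts) / statistics.fmean(counts)

rng = random.Random(2012)
poisson_counts = [poisson_sample(rng, 4.0) for _ in range(20000)]
nb_counts = [negative_binomial_sample(rng, 4.0, 2.0) for _ in range(20000)]
print(f"Poisson ratio: {dispersion_ratio(poisson_counts):.2f}")  # near 1
print(f"NB ratio:      {dispersion_ratio(nb_counts):.2f}")       # near 1 + 4/2 = 3
```

A Poisson-normal GLMM implicitly assumes a ratio near 1 conditional on the random effects, so counts like the second sample violate its response-distribution assumption.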

Studies on the Stochastic Generation of Synthetic Streamflow Sequences(I) -On the Simulation Models of Streamflow- (하천유량의 추계학적 모의발생에 관한 연구(I) -하천유량의 Simulation 모델에 대하여-)

  • 이순탁
    • Water for future / v.7 no.1 / pp.71-77 / 1974
  • This paper reviews several single-site generation models with a view to further developing a model for generating synthetic streamflow sequences in continuous streams, such as the main streams in Korea. First, the historical time series is examined with a time-series technique, namely correlograms, to determine whether a lag-one Markov model satisfactorily represents the historical data. The single-site models examined include an empirical model using the historical probability distribution of the random component; the linear autoregressive model (Markov model, or Thomas-Fiering model), using both logarithms of the data and Matalas's log-normal transformation equations; and finally a gamma distribution model.
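The lag-one Markov (linear autoregressive) model reviewed above can be sketched as follows. This is a simplified single-season (annual) version working directly on the logarithms of the flows; the Thomas-Fiering model proper uses month-by-month means, standard deviations, and lag-one correlations, and Matalas's log-normal transformation is not implemented here. The function names and the sample flows are hypothetical.

```python
import math
import random
import statistics

def lag1_autocorrelation(series):
    """Sample lag-one autocorrelation, i.e. the first ordinate of the correlogram."""
    mean = statistics.fmean(series)
    num = sum((a - mean) * (b - mean) for a, b in zip(series, series[1:]))
    den = sum((x - mean) ** 2 for x in series)
    return num / den

def generate_lag1_markov(historical, n, seed=0):
    """Generate n synthetic flows from a lag-one Markov model fitted to the log flows:
    y[t+1] = mu + r * (y[t] - mu) + eps * sigma * sqrt(1 - r^2),  eps ~ N(0, 1)."""
    logs = [math.log(q) for q in historical]
    mu = statistics.fmean(logs)
    sigma = statistics.stdev(logs)
    r = lag1_autocorrelation(logs)
    rng = random.Random(seed)
    y = mu  # start the recursion at the mean of the log flows
    synthetic = []
    for _ in range(n):
        eps = rng.gauss(0.0, 1.0)
        y = mu + r * (y - mu) + eps * sigma * math.sqrt(1.0 - r * r)
        synthetic.append(math.exp(y))  # back-transform to flow units
    return synthetic

historical = [820.0, 640.0, 910.0, 700.0, 1150.0, 560.0, 780.0, 990.0]  # hypothetical annual flows
print(round(lag1_autocorrelation([math.log(q) for q in historical]), 3))
print([round(q) for q in generate_lag1_markov(historical, 5, seed=1)])
```

Scaling the noise by sqrt(1 - r^2) keeps the variance of the generated log flows equal to that of the historical log flows, which is the usual motivation for this form of the recursion.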

Competitive Influence Maximization on Online Social Networks under Cost Constraint

  • Chen, Bo-Lun;Sheng, Yi-Yun;Ji, Min;Liu, Ji-Wei;Yu, Yong-Tao;Zhang, Yue
    • KSII Transactions on Internet and Information Systems (TIIS) / v.15 no.4 / pp.1263-1274 / 2021
  • In competitive online social networks, each user can be influenced by different competing influencers and consequently chooses different products. However, users' interests may change over time and swing between different products, and existing influence-spreading models seldom take such time-related shifts into account. This paper proposes a minimum-cost influence maximization algorithm based on competitive transition probability. In the model, each node is assigned a one-dimensional vector recording the probability that the node chooses each competing influencer. For the propagation process, the influence maximization on Competitive Linear Threshold (IMCLT) spreading model is proposed. This model does not determine which competing influencer activates a node, but instead assigns different weights to all competing influencers. During spreading, seed nodes are selected according to the cost function of each node, and the final influence is evaluated based on the competitive transition probability. Experiments on different datasets show that the proposed minimum-cost competitive influence maximization algorithm based on the IMCLT spreading model achieves excellent performance compared with other methods, at a reasonable computational cost.

Reliability of an elastic bar under tension in a corrosive environment

  • Elishakoff, Isaac;Soret, Clement
    • Ocean Systems Engineering / v.2 no.3 / pp.173-187 / 2012
  • In this study, we investigate the reliability of a bar subjected to a random tensile load in the presence of corrosion. We consider linear, quadratic, and exponential models that connect the stress in the bar with the corrosion rate. Two probability densities are considered for the load, with the attendant derivation of the time-dependent reliability. The design time of operation is determined from the requirement that the reliability must not fall below the required value.
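A minimal sketch of determining a design time from a reliability requirement, under hypothetical assumptions that are not taken from the paper: the bar's cross-sectional area decreases linearly with time at a constant corrosion rate (one simple instance of a linear corrosion model), the tensile load is normally distributed, and failure occurs when the load exceeds the yield stress times the remaining area. Because reliability then decreases monotonically with time, the design time has a closed form. All names and numbers are illustrative.

```python
from statistics import NormalDist

def reliability(t, area0, corrosion_rate, yield_stress, load_mean, load_sd):
    """R(t) = P(load < yield_stress * A(t)), with hypothetical linear area loss
    A(t) = area0 - corrosion_rate * t and a normally distributed load."""
    area = max(area0 - corrosion_rate * t, 0.0)
    return NormalDist(load_mean, load_sd).cdf(yield_stress * area)

def design_time(r_required, area0, corrosion_rate, yield_stress, load_mean, load_sd):
    """Largest t with R(t) >= r_required (negative if the bar fails the requirement at t = 0)."""
    load_quantile = NormalDist(load_mean, load_sd).inv_cdf(r_required)
    area_needed = load_quantile / yield_stress  # smallest area that still meets the target
    return (area0 - area_needed) / corrosion_rate

# Hypothetical units: mm^2, mm^2 per year, MPa (N/mm^2), N, N
params = dict(area0=100.0, corrosion_rate=0.5, yield_stress=250.0,
              load_mean=15000.0, load_sd=2000.0)
t_d = design_time(0.999, **params)
print(f"design time ~ {t_d:.1f} years, R = {reliability(t_d, **params):.4f}")
```

The quadratic and exponential corrosion models mentioned in the abstract would change only the area (or stress) function; without a closed-form inverse, the design time could then be found by a root search on R(t) - R_required.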

A Study on the Quantitative Evaluation of Outdoor-Recreational Function and User Satisfaction with Urban Park and Open Space (도시공원녹지에 대한 실외위락기능과 만족도의 계량적 평가에 관한 연구)

  • 박승범
    • Journal of the Korean Institute of Landscape Architecture / v.18 no.4 / pp.127-140 / 1991
  • The primary purpose of this study is to investigate the factors and variables that have significant effects on user satisfaction with recreational facilities in the Taejong-Dae recreational complex, thereby establishing indices for the planning and development of urban parks and open space. To test the causal models of this research, data were gathered by self-administered questionnaires from 967 households in Pusan City, selected by multi-stage probability sampling. The analysis consists primarily of two phases: the first phase used exploratory factor analysis to identify the major factors involved in satisfaction with recreational activities and facilities in the Taejong-Dae recreational complex, and the second phase tested the fit of the causal models by employing LISREL methodology. Using LISREL has three advantages over other multivariate analysis methods: first, measurement error is allowed for and estimated in LISREL, without which coefficient estimates risk being seriously misleading; second, LISREL handles latent (unmeasured) variables; third, it enables testing of causal relations among variables. The factor analysis identified five factors involved in satisfaction with recreational facilities: space for repose and relaxation, active recreation facilities such as the pool and zoo, physical exercise facilities, convenience and maintenance facilities, and linear facilities for walking. The second-phase analysis tested the fit of the causal models for satisfaction with recreational facilities to the data and identified statistically significant causal linkages among overall satisfaction with the Taejong-Dae recreational complex, other endogenous factors, and exogenous variables. The overall fit of both causal models was very good. Among the endogenous factors, facilities for repose and relaxation, linear facilities for walking, active recreation facilities, and facilities for convenience and maintenance were identified as having significant effects on overall satisfaction. Exogenous variables with significant effects on the endogenous variables were also identified. These significant relationships indicate important factors and variables that should be considered in planning and developing the recreational complex. On the basis of these causal relationships, implications for the planning and development of the Taejong-Dae recreational complex are suggested.

Review of the Existing Relative Biological Effectiveness Models for Carbon Ion Beam Therapy

  • Kim, Yejin;Kim, Jinsung;Cho, Seungryong
    • Progress in Medical Physics / v.31 no.1 / pp.1-7 / 2020
  • Hadron therapy, using ions such as carbon and helium, is increasingly coming to the fore in cancer treatment. Hadron therapy has several physical and clinical advantages over conventional radiotherapy with photons and electrons. These advantages stem from the distinct physical and biological characteristics of heavy ions, including high linear energy transfer and the Bragg peak, which lead to a reduced exit dose, a lower normal tissue complication probability, and an increased relative biological effectiveness (RBE). Despite the promising prospects of carbon ion radiation therapy, it remains disputed which bio-mathematical model should be used to calculate the carbon ion RBE. The two most widely used models are the local effect model and the microdosimetric kinetic model, which are actively used in Europe and Japan, respectively. The choice of RBE model is a crucial issue because the dose prescription for treatment planning differs between models. In this study, we aim to (i) introduce the concept of RBE, (ii) clarify the determinants of RBE, and (iii) compare the existing RBE models for carbon ion therapy.
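To make the RBE concept concrete, the following sketch computes RBE as the ratio of the photon dose to the ion dose that produce the same biological effect, with both effects described by the linear-quadratic (LQ) model. This is only the generic LQ definition of RBE, not the local effect model or the microdosimetric kinetic model, and the alpha/beta values in the usage line are hypothetical.

```python
import math

def lq_effect(dose, alpha, beta):
    """Linear-quadratic biological effect E = alpha*D + beta*D**2 (equivalently -ln S)."""
    return alpha * dose + beta * dose ** 2

def isoeffective_photon_dose(effect, alpha_x, beta_x):
    """Positive root of alpha_x*D + beta_x*D**2 = effect."""
    if beta_x == 0:
        return effect / alpha_x
    return (-alpha_x + math.sqrt(alpha_x ** 2 + 4 * beta_x * effect)) / (2 * beta_x)

def rbe(ion_dose, alpha_ion, beta_ion, alpha_x, beta_x):
    """RBE = photon dose / ion dose that give the same LQ effect."""
    effect = lq_effect(ion_dose, alpha_ion, beta_ion)
    return isoeffective_photon_dose(effect, alpha_x, beta_x) / ion_dose

# Hypothetical parameters (Gy^-1, Gy^-2): ions with a larger alpha than photons
print(round(rbe(2.0, 0.3, 0.05, 0.1, 0.05), 3))  # 1.562
```

Because the photon dose enters through a square root while the ion dose enters linearly, the RBE computed this way depends on the dose level, which is one reason RBE is not a single constant for a given ion beam.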

Parameter estimation for the imbalanced credit scoring data using AUC maximization (AUC 최적화를 이용한 낮은 부도율 자료의 모수추정)

  • Hong, C.S.;Won, C.H.
    • The Korean Journal of Applied Statistics / v.29 no.2 / pp.309-319 / 2016
  • For binary classification models, we consider a risk score that is a function of linear scores and estimate the coefficients of the linear scores. There are two estimation methods: obtaining MLEs from logistic models, and estimating by maximizing the AUC. In general situations where the logistic assumptions do not hold, AUC-based estimates are better than logistic-model MLEs. This paper considers imbalanced data for credit assessment models, in which the default class contains fewer observations than the non-default class, and applies the AUC approach to such data. Various logit link functions are used to generate the imbalanced data. The coefficients estimated by the AUC approach are found to be equivalent to (or better than) those from logistic models for imbalanced data with a low default probability.
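The AUC criterion above can be illustrated with a small sketch: the empirical AUC of a linear score is the Mann-Whitney statistic, and because the AUC is unchanged when the score is multiplied by a positive constant, only the direction of the coefficient vector matters. For two predictors, a crude grid search over unit-length coefficient vectors is enough to show the idea. This is a hypothetical illustration, not the optimization procedure used by the authors, and the toy data are made up.

```python
import math

def empirical_auc(scores_pos, scores_neg):
    """Mann-Whitney estimate of P(score of a default > score of a non-default); ties count 1/2."""
    total = 0.0
    for sp in scores_pos:
        for sn in scores_neg:
            if sp > sn:
                total += 1.0
            elif sp == sn:
                total += 0.5
    return total / (len(scores_pos) * len(scores_neg))

def linear_score(beta, x):
    return sum(b * xi for b, xi in zip(beta, x))

def maximize_auc_2d(x_pos, x_neg, n_grid=360):
    """Grid search over unit-norm coefficient vectors (cos t, sin t) for two predictors."""
    best_beta, best_auc = None, -1.0
    for k in range(n_grid):
        t = 2 * math.pi * k / n_grid
        beta = (math.cos(t), math.sin(t))
        auc = empirical_auc([linear_score(beta, x) for x in x_pos],
                            [linear_score(beta, x) for x in x_neg])
        if auc > best_auc:
            best_beta, best_auc = beta, auc
    return best_beta, best_auc

x_default = [(2.0, 0.5), (1.5, 1.0)]  # hypothetical minority (default) class
x_nondefault = [(0.0, 0.0), (-1.0, 1.0), (0.5, -1.0), (0.2, 0.3), (-0.5, -0.5)]
beta, auc = maximize_auc_2d(x_default, x_nondefault)
print(beta, auc)  # (1.0, 0.0) 1.0 for these separable toy points
```

The empirical AUC is a step function of the coefficients, so gradient-based optimizers cannot be applied to it directly; a grid search sidesteps that in two dimensions but does not scale to many predictors.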

Performance Based Seismic Design State of Practice, 2012 Manila, Philippines

  • Sy, Jose A.;Anwar, Naveed;HtutAung, Thaung;Rayamajhi, Deepak
    • International Journal of High-Rise Buildings / v.1 no.3 / pp.203-209 / 2012
  • The purpose of this paper is to present the state of practice in the Philippines for the performance-based seismic design of reinforced concrete tall buildings. Initially, the overall methodology followed "An Alternative Procedure for Seismic Analysis and Design of Tall Buildings Located in the Los Angeles Region, 2008", developed by the Los Angeles Tall Buildings Structural Design Council. From 2010 onward, the design procedure has followed "Tall Buildings Initiative, Guidelines for Performance-Based Seismic Design of Tall Buildings, 2010", developed by the Pacific Earthquake Engineering Research Center (PEER). After the preliminary design is completed in accordance with code-based design procedures, the building's performance is checked for serviceable behaviour under frequent earthquakes (50% probability of exceedance in 30 years, i.e., a 43-year return period) and for a very low probability of collapse under extremely rare earthquakes (2% probability of exceedance in 50 years, i.e., a 2475-year return period). The analyses use finite element models of varying complexity and refinement, with linear-static, multi-mode pushover, and nonlinear-dynamic analyses as appropriate. Site-specific seismic input ground motions are used to check the level of performance under the hazard likely to be experienced. A sample project conducted using performance-based seismic design procedures is also briefly presented.
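The return periods quoted above follow from the exceedance probabilities. A short check, assuming the common Poisson (memoryless) occurrence model p = 1 - exp(-t / T), which gives T = -t / ln(1 - p); the function name is hypothetical.

```python
import math

def return_period(p_exceed, window_years):
    """Return period T (years) for probability p_exceed of at least one exceedance
    in window_years, assuming Poisson occurrence: p = 1 - exp(-t / T)."""
    return -window_years / math.log(1.0 - p_exceed)

print(round(return_period(0.50, 30)))  # 43
print(round(return_period(0.02, 50)))  # 2475
```

A binomial (discrete-year) model, T = 1 / (1 - (1 - p)^(1/t)), gives nearly the same values for these inputs.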

Protein Secondary Structure Prediction using Multiple Neural Network Likelihood Models

  • Kim, Seong-Gon;Kim, Yong-Gi
    • International Journal of Fuzzy Logic and Intelligent Systems / v.10 no.4 / pp.314-318 / 2010
  • Predicting the alpha-helices, beta-sheets, and turns of a protein's secondary structure is a complex nonlinear task that has been approached with several techniques, such as neural networks, genetic algorithms, decision trees, and other statistical or heuristic methods. This project introduces a new machine learning method that combines Bayesian inference with offline-trained multilayer perceptron (MLP) models as the likelihood for protein secondary structure prediction. Using varying window sizes of neighboring amino acid information, information is extracted and passed back and forth between the neural network and the Bayesian inference process until the posterior probability of the secondary structure converges.