• Title/Summary/Keyword: Random regression model

Predicting Gross Box Office Revenue for Domestic Films

  • Song, Jongwoo;Han, Suji
    • Communications for Statistical Applications and Methods / v.20 no.4 / pp.301-309 / 2013
  • This paper predicts gross box office revenue for domestic films using Korean film data from 2008 to 2011. We use three regression methods, linear regression, random forest and gradient boosting, to predict the gross box office revenue. We only consider domestic films with a revenue of at least KRW 500 million; relevant explanatory variables are chosen by data visualization and variable selection techniques. The key idea in analyzing this data is to construct meaningful explanatory variables from publicly available data sources. Some variables must be categorized for more effective analysis, and clustering methods are applied to achieve this. We choose the best model based on performance on the test set and discuss the important explanatory variables.

A Bayesian inference for fixed effect panel probit model

  • Lee, Seung-Chun
    • Communications for Statistical Applications and Methods / v.23 no.2 / pp.179-187 / 2016
  • The fixed effects panel probit model faces the "incidental parameters problem" because the number of parameters to be estimated increases with the sample size. Maximum likelihood estimation fails to give a consistent estimator of the slope parameter. Unlike in the panel regression model, it is not feasible to find an orthogonal reparameterization of the fixed effects that yields a consistent estimator. In this note, a hierarchical Bayesian model is proposed. The model is essentially equivalent to the frequentist's random effects model, but the individual specific effects are estimable with the help of Gibbs sampling. The Bayesian estimator is shown to reduce the small sample bias. The maximum likelihood estimator in the random effects model is also efficient, which contradicts Green (2004)'s conclusion.

Analysis of Linear Regression Model with Two Way Correlated Errors

  • Ssong, Seuck-Heun
    • Journal of the Korean Statistical Society / v.29 no.2 / pp.231-245 / 2000
  • This paper considers a linear regression model for space and time data in which the disturbances follow spatially correlated error components. We provide the best linear unbiased predictor for the one way error component model with spatial autocorrelation. Further, as an application of the Lagrange Multiplier principle, we derive two diagnostic test statistics for assessing model specification with respect to spatial dependence and random effects.

Genetic Analysis of Milk Yield in First-Lactation Holstein Friesian in Ethiopia: A Lactation Average vs Random Regression Test-Day Model Analysis

  • Meseret, S.;Tamir, B.;Gebreyohannes, G.;Lidauer, M.;Negussie, E.
    • Asian-Australasian Journal of Animal Sciences / v.28 no.9 / pp.1226-1234 / 2015
  • The development of effective genetic evaluations and selection of sires requires accurate estimates of genetic parameters for all economically important traits in the breeding goal. The main objective of this study was to assess the relative performance of the traditional lactation average model (LAM) against the random regression test-day model (RRM) in the estimation of genetic parameters and prediction of breeding values for Holstein Friesian herds in Ethiopia. The data consisted of 6,500 test-day (TD) records from 800 first-lactation Holstein Friesian cows that calved between 1997 and 2013. Covariance components were estimated using the average information restricted maximum likelihood method under a single-trait animal model. The estimate of heritability for first-lactation milk yield was 0.30 from LAM, whilst estimates from the RRM ranged from 0.17 to 0.29 for the different stages of lactation. Genetic correlations between different TDs in first-lactation Holstein Friesian ranged from 0.37 to 0.99. The observed genetic correlation between milk yields at different TDs was less than unity, which indicated that the assumption of LAM may not be optimal for accurate evaluation of the genetic merit of animals. A close look at estimated breeding values from both models showed that RRM had a higher standard deviation than LAM, indicating that the TD model makes efficient use of TD information. Correlations of breeding values between models ranged from 0.90 to 0.96 for different groups of sires and cows, and marked re-rankings were observed in top sires and cows in moving from the traditional LAM to RRM evaluations.

Machine learning-based analysis and prediction model on the strengthening mechanism of biopolymer-based soil treatment

  • Haejin Lee;Jaemin Lee;Seunghwa Ryu;Ilhan Chang
    • Geomechanics and Engineering / v.36 no.4 / pp.381-390 / 2024
  • The introduction of bio-based materials has been recommended in the geotechnical engineering field to reduce environmental pollutants such as heavy metals and greenhouse gases. However, bio-treated soil methods face limitations in field application due to short research periods and insufficient verification of engineering performance, especially when compared to conventional materials like cement. Therefore, this study aimed to develop a machine learning model for predicting the unconfined compressive strength, a representative soil property, of biopolymer-based soil treatment (BPST). Four machine learning algorithms were compared to determine a suitable model: linear regression (LR), support vector regression (SVR), random forest (RF), and neural network (NN). Except for LR, the algorithms (SVR, RF, and NN) exhibited high predictive performance with an $R^2$ value of 0.98 or higher. The permutation feature importance technique was used to identify the main factors affecting the strength enhancement of BPST. The results indicated that the unconfined compressive strength of BPST is affected most by mean particle size, followed by biopolymer content and water content. As a reliable predictor, the proposed model can provide guidelines prior to laboratory testing and field application, thereby saving a significant amount of time and money.
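
The permutation feature importance technique named in the abstract above can be illustrated with a minimal sketch. It shuffles one feature column at a time and records how much the $R^2$ score drops. Everything here is an assumption made for illustration: the data are synthetic, the fitted model is a plain least-squares fit standing in for the paper's SVR, RF, and NN models, and the function names are hypothetical.

```python
import numpy as np


def r2_score(y_true, y_pred):
    """Coefficient of determination R^2."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot


def permutation_importance(predict, X, y, n_repeats=10, seed=0):
    """Mean drop in R^2 when each feature column is shuffled in turn."""
    rng = np.random.default_rng(seed)
    baseline = r2_score(y, predict(X))
    importances = np.zeros(X.shape[1])
    for j in range(X.shape[1]):
        drops = []
        for _ in range(n_repeats):
            X_perm = X.copy()
            # Shuffling column j breaks its link with the target.
            X_perm[:, j] = rng.permutation(X_perm[:, j])
            drops.append(baseline - r2_score(y, predict(X_perm)))
        importances[j] = np.mean(drops)
    return importances


# Synthetic data (NOT from the paper): feature 0 matters most,
# feature 1 matters less, feature 2 is pure noise.
data_rng = np.random.default_rng(42)
X = data_rng.normal(size=(200, 3))
y = 3.0 * X[:, 0] + 1.0 * X[:, 1] + data_rng.normal(scale=0.1, size=200)

# Stand-in model: ordinary least squares with an intercept.
A = np.column_stack([X, np.ones(len(X))])
coef, *_ = np.linalg.lstsq(A, y, rcond=None)


def predict(M):
    return np.column_stack([M, np.ones(len(M))]) @ coef


imp = permutation_importance(predict, X, y)
print(imp)
```

Because the procedure only needs a `predict` function, the same routine applies to any fitted regressor. Ranking the resulting scores gives the kind of ordering the abstract reports for particle size, biopolymer content and water content.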

A Study on Developing Crash Prediction Model for Urban Intersections Considering Random Effects (임의효과를 고려한 도심지 교차로 교통사고모형 개발에 관한 연구)

  • Lee, Sang Hyuk;Park, Min Ho;Woo, Yong Han
    • The Journal of The Korea Institute of Intelligent Transport Systems / v.14 no.1 / pp.85-93 / 2015
  • Previous studies have estimated crash prediction models with the fixed effect model, which assumes fixed coefficient values without considering the characteristics of each intersection. However, the fixed effect model tends to underestimate standard errors, which leads to overestimated t-values. To overcome these shortcomings, the random effect model can be used to account for heterogeneity in AADT, geometric information and unobserved factors. In this study, data were collected from 89 intersections in Daejeon, and crash prediction models were estimated with both random and fixed effect negative binomial regression models so that the two could be compared. In the estimated models, AADT, speed limits, number of lanes, exclusive right turn pockets and front traffic signal were found to be significant. In terms of fit, the random effect model performed better, with a log-likelihood at convergence of -1537.802 compared with -1691.327 for the fixed effect model. The likelihood ratio value was 0.279 for the random effect model and 0.207 for the fixed effect model. This means that the random effect model fits the data better than the fixed effect model.
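
The two fit statistics reported in the abstract above can be cross-checked against each other. The sketch below assumes that the "likelihood ratio value" is the likelihood ratio index, 1 - LL(model) / LL(null); the abstract does not define it, so this reading is an assumption. The function names are illustrative. If the reading is right, both models should imply the same null log-likelihood.

```python
def likelihood_ratio_index(ll_model: float, ll_null: float) -> float:
    """Likelihood ratio index: 1 - LL(model) / LL(null)."""
    return 1.0 - ll_model / ll_null


def implied_null_ll(ll_model: float, rho2: float) -> float:
    """Null log-likelihood implied by a model log-likelihood and its index."""
    return ll_model / (1.0 - rho2)


# Values reported in the abstract.
ll_random, rho2_random = -1537.802, 0.279
ll_fixed, rho2_fixed = -1691.327, 0.207

null_from_random = implied_null_ll(ll_random, rho2_random)
null_from_fixed = implied_null_ll(ll_fixed, rho2_fixed)
print(null_from_random, null_from_fixed)

# Recompute both indices from a common null log-likelihood.
ll_null = 0.5 * (null_from_random + null_from_fixed)
print(round(likelihood_ratio_index(ll_random, ll_null), 3))
print(round(likelihood_ratio_index(ll_fixed, ll_null), 3))
```

Under this assumption both models imply a null log-likelihood of roughly -2133. The reported figures are therefore mutually consistent, and the higher index for the random effect model reflects its higher log-likelihood.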

A Study on Prediction Techniques through Machine Learning of Real-time Solar Radiation in Jeju (제주 실시간 일사량의 기계학습 예측 기법 연구)

  • Lee, Young-Mi;Bae, Joo-Hyun;Park, Jeong-keun
    • Journal of Environmental Science International / v.26 no.4 / pp.521-527 / 2017
  • Solar radiation forecasts are important for predicting the amount of ice on roads and the potential solar energy. To improve solar radiation predictability in Jeju, we applied machine learning with various data mining techniques such as tree models, conditional inference trees, random forest, support vector machines and logistic regression. To validate the machine learning models, the simulation results were compared with the solar radiation data observed at the Jeju observation site. According to the model assessment, solar radiation prediction using random forest is the most effective method. The error rate of the random forest model is 17%.

Linear Inversion of Heat Flow Data (지각열류량(地殼熱流量)의 선형(線型) 반전(反轉))

  • Han, Wook
    • Economic and Environmental Geology / v.17 no.3 / pp.163-169 / 1984
  • This work studies a linear inversion of heat flow values using reliable heat production data. To evaluate the 2-D problem, a thin vertical sheet model is considered. Making use of a relation based on potential theory, a new relation between $q_{rad}$ and $A_0$ is derived. Forward calculations with and without noise are shown. The inversion by random search is comparable to that by the ridge regression method. The agreement between the computed values and the best fit after inversion suggests the importance of the random search method in the inversion technique.

Development of Ship Valuation Model by Neural Network (신경망기법을 활용한 선박 가치평가 모델 개발)

  • Kim, Donggyun;Choi, Jung-Suk
    • Journal of the Korean Society of Marine Environment & Safety / v.27 no.1 / pp.13-21 / 2021
  • The purpose of this study is to develop a ship valuation model using a neural network. The target of the valuation was secondhand VLCCs. Based on prior research, the variables were chosen as the major factors inducing changes in ship value, and the corresponding data were collected on a monthly basis from January 2000 to August 2020. To determine the stability of the variables, a multi-collinearity test was carried out, and the research structure was finally designed by selecting six independent variables and one dependent variable. Based on this structure, a total of nine simulation models were designed using linear regression, neural network regression, and the random forest algorithm. In addition, the accuracy of the evaluation results is improved through comparative verification between the models. The evaluation, based on comparison with actual VLCC values, found that the neural network regression model with two hidden layers was the most accurate. The study has two possible implications. First, it is a creative study in that it applies a neural network model to ship valuation, deviating from the existing formalized evaluation techniques. Second, the objectivity of the research results was enhanced from a dynamic perspective by analyzing and predicting the factors of change in the shipping market.

Nonparametric Estimation of Discontinuous Variance Function in Regression Model

  • Kang, Kee-Hoon;Huh, Jib
    • Proceedings of the Korean Statistical Society Conference / 2002.11a / pp.103-108 / 2002
  • We consider estimation of a discontinuous variance function in a nonparametric heteroscedastic random design regression model. We first propose estimators of a change point and jump size in the variance function and then construct an estimator of the entire variance function. We examine the rates of convergence of these estimators and give results on their asymptotics. Numerical work reveals that change point analysis is quite effective in variance function estimation.
