• Title/Summary/Keyword: Random Regression

Search results: 953

Optimal Rates of Convergence for Tensor Spline Regression Estimators

  • Koo, Ja-Yong
    • Journal of the Korean Statistical Society
    • /
    • v.19 no.2
    • /
    • pp.105-112
    • /
    • 1990
  • Let $(X, Y)$ be a pair of random variables and let $f$ denote the regression function of the response $Y$ on the measurement variable $X$. Let $K(f)$ denote a derivative of $f$. The least squares method is used to obtain a tensor spline estimator $\hat{f}$ of $f$ based on a random sample of size $n$ from the distribution of $(X, Y)$. Under some mild conditions, it is shown that $K(\hat{f})$ achieves the optimal rate of convergence for the estimation of $K(f)$ in the $L_2$ and $L_{\infty}$ norms.


Method to Construct Feature Functions of C-CRF Using Regression Tree Analysis (회귀나무 분석을 이용한 C-CRF의 특징함수 구성 방법)

  • Ahn, Gil Seung;Hur, Sun
    • Journal of Korean Institute of Industrial Engineers
    • /
    • v.41 no.4
    • /
    • pp.338-343
    • /
    • 2015
  • We suggest a method to construct the feature functions of a continuous conditional random field (C-CRF). Regression tree analysis and similarity analysis are used to construct the first and second feature functions of the C-CRF, respectively. Rules from the regression tree are transformed into logic functions: if the logic of a rule is true for a data point, the function returns the value of the corresponding leaf node, and zero otherwise. We build a Euclidean similarity matrix to define the neighborhood that constitutes the second feature function. Using the two feature functions, we build a C-CRF model, and an illustrative example is provided.
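The rule-to-logic-function idea can be sketched in a few lines (the rules, thresholds, and leaf values below are hypothetical, not from the paper): each regression tree rule becomes a function that returns its leaf value when all of its conditions hold and zero otherwise, so the feature functions of a tree sum to the tree's prediction.

```python
# Each rule: (conditions, leaf value); a condition is (feature index, low, high)
# meaning low < x[feature] <= high, with None for an unbounded side.
rules = [
    ([(0, None, 3.0)],                 10.0),  # x0 <= 3
    ([(0, 3.0, None), (1, None, 1.5)], 25.0),  # x0 > 3 and x1 <= 1.5
    ([(0, 3.0, None), (1, 1.5, None)], 40.0),  # x0 > 3 and x1 > 1.5
]

def make_feature_function(conditions, leaf_value):
    """Turn one tree rule into a C-CRF-style logic function:
    the leaf value when every condition holds, zero otherwise."""
    def phi(x):
        for j, low, high in conditions:
            if low is not None and not (x[j] > low):
                return 0.0
            if high is not None and not (x[j] <= high):
                return 0.0
        return leaf_value
    return phi

features = [make_feature_function(c, v) for c, v in rules]

def tree_predict(x):
    # the rules partition the space, so exactly one feature function is nonzero
    return sum(phi(x) for phi in features)

print(tree_predict([2.0, 9.9]))  # 10.0
print(tree_predict([5.0, 1.0]))  # 25.0
```

In the paper the first feature functions come from a fitted regression tree rather than hand-written rules; the conversion step is the same.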

A BERRY-ESSEEN TYPE BOUND OF REGRESSION ESTIMATOR BASED ON LINEAR PROCESS ERRORS

  • Liang, Han-Ying;Li, Yu-Yu
    • Journal of the Korean Mathematical Society
    • /
    • v.45 no.6
    • /
    • pp.1753-1767
    • /
    • 2008
  • Consider the nonparametric regression model $Y_{ni} = g(x_{ni}) + \epsilon_{ni}$ ($1 \leq i \leq n$), where $g(\cdot)$ is an unknown regression function, the $x_{ni}$ are known fixed design points, and the correlated errors $\{\epsilon_{ni}, 1 \leq i \leq n\}$ have the same distribution as $\{V_i, 1 \leq i \leq n\}$, where $V_t = \sum_{j=-\infty}^{\infty} \psi_j e_{t-j}$ with $\sum_{j=-\infty}^{\infty} |\psi_j| < \infty$ and $\{e_t\}$ a sequence of negatively associated random variables. Under appropriate conditions, we derive a Berry-Esseen type bound for the estimator of $g(\cdot)$. As a corollary, by suitable choice of the weights, the Berry-Esseen bound can attain $O(n^{-1/4}(\log n)^{3/4})$.
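The model in this abstract is easy to simulate. The sketch below (with independent rather than negatively associated innovations, a truncated one-sided coefficient sequence, and a simple local-average weight choice, all illustrative assumptions) generates linear-process errors on a fixed design and applies a weighted estimator of $g$:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 400
x = (np.arange(1, n + 1) - 0.5) / n        # fixed design points x_ni
g = lambda t: np.sin(2 * np.pi * t)        # "unknown" regression function

# linear-process errors V_t = sum_j psi_j e_{t-j}, truncated to 8 terms,
# with absolutely summable coefficients; innovations are iid here for simplicity
psi = 0.5 ** np.arange(8)
e = rng.normal(0, 0.2, n + len(psi))
V = np.array([np.dot(psi, e[t:t + len(psi)][::-1]) for t in range(n)])
y = g(x) + V

# weighted (local-average) estimator: equal weights on a +/- h window
h = 0.05
ghat = np.array([np.mean(y[np.abs(x - t) <= h]) for t in x])

interior = (x > 0.1) & (x < 0.9)           # ignore boundary bias
rmse = np.sqrt(np.mean((ghat[interior] - g(x[interior]))**2))
```

The paper's weights are general and its results are about the normal approximation of the estimator, not just consistency; the simulation only shows the setting.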

Variable Selection with Regression Trees

  • Chang, Young-Jae
    • The Korean Journal of Applied Statistics
    • /
    • v.23 no.2
    • /
    • pp.357-366
    • /
    • 2010
  • Many tree algorithms have been developed for regression problems. Although they are regarded as good algorithms, most suffer a loss of prediction accuracy when there are many noise variables. To handle this problem, we propose multi-step GUIDE, a regression tree algorithm with a variable selection process. Multi-step GUIDE performs better than well-known algorithms such as Random Forest and MARS. Results from a simulation study show that multi-step GUIDE outperforms the other algorithms in terms of both variable selection and prediction accuracy: it generally selects the important variables correctly, with relatively few noise variables, and consequently gives good prediction accuracy.
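Multi-step GUIDE itself is not reproduced here, but the general screen-then-fit pattern it follows can be sketched with marginal correlation as a stand-in importance score (GUIDE uses chi-square-based tests; the data and cutoff below are illustrative):

```python
import numpy as np

rng = np.random.default_rng(2)
n, p = 300, 20
X = rng.normal(size=(n, p))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(0, 0.5, n)  # only x0, x1 matter

# Step 1 (screening): rank variables by absolute marginal correlation with y,
# a simple stand-in for a proper variable-importance score.
corr = np.abs([np.corrcoef(X[:, j], y)[0, 1] for j in range(p)])
selected = np.argsort(corr)[::-1][:2]

# Step 2: refit using only the selected variables
Xs = np.column_stack([np.ones(n), X[:, selected]])
coef, *_ = np.linalg.lstsq(Xs, y, rcond=None)
print(sorted(int(j) for j in selected))  # → [0, 1]: the 18 noise variables are screened out
```

The point of the multi-step design is exactly this: removing noise variables before the final tree is grown protects the tree's split selection and prediction accuracy.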

A review of tree-based Bayesian methods

  • Linero, Antonio R.
    • Communications for Statistical Applications and Methods
    • /
    • v.24 no.6
    • /
    • pp.543-559
    • /
    • 2017
  • Tree-based regression and classification ensembles form a standard part of the data-science toolkit. Many commonly used methods take an algorithmic view, proposing greedy methods for constructing decision trees; examples include the classification and regression trees algorithm, boosted decision trees, and random forests. Recent history has seen a surge of interest in Bayesian techniques for constructing decision tree ensembles, with these methods frequently outperforming their algorithmic counterparts. The goal of this article is to survey the landscape surrounding Bayesian decision tree methods, and to discuss recent modeling and computational developments. We provide connections between Bayesian tree-based methods and existing machine learning techniques, and outline several recent theoretical developments establishing frequentist consistency and rates of convergence for the posterior distribution. The methodology we present is applicable for a wide variety of statistical tasks including regression, classification, modeling of count data, and many others. We illustrate the methodology on both simulated and real datasets.

Crop Yield and Crop Production Predictions using Machine Learning

  • Divya Goel;Payal Gulati
    • International Journal of Computer Science & Network Security
    • /
    • v.23 no.9
    • /
    • pp.17-28
    • /
    • 2023
  • Today the agriculture sector is a significant contributor to the Indian economy, accounting for 18% of India's Gross Domestic Product (GDP) and employing half of the nation's workforce. The farming sector must satisfy the growing demand for food caused by an increasing population, so crop yield prediction is carried out in advance. Farmers also benefit, since yield prediction helps them estimate the yield of a crop before cultivating it. Many parameters affect crop yield, such as rainfall, temperature, fertilizers, pH level, and other atmospheric conditions; given these factors, yield prediction becomes a challenging task. Motivated by this, in this work a dataset of different states producing different crops in different seasons is prepared and pre-processed, after which the machine learning techniques Gradient Boosting Regressor, Random Forest Regressor, Decision Tree Regressor, Ridge Regression, Polynomial Regression, and Linear Regression are applied and their results compared using Python.
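As a toy version of the regression comparison described above (the synthetic data and parameter values are invented for illustration; the cited work uses real multi-state crop data and the full set of learners), one can compare a linear and a polynomial least-squares fit of yield on weather covariates:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 200
rain = rng.uniform(300, 1200, n)   # rainfall, mm
temp = rng.uniform(15, 35, n)      # temperature, deg C
# hypothetical yield: linear in rainfall, quadratic (optimum-shaped) in temperature
crop = 2.0 + 0.004 * rain - 0.002 * (temp - 25.0)**2 + rng.normal(0, 0.2, n)

lin = np.column_stack([np.ones(n), rain, temp])
poly = np.column_stack([lin, rain**2, temp**2, rain * temp])

rmse = {}
for name, D in [("linear", lin), ("polynomial", poly)]:
    coef, *_ = np.linalg.lstsq(D, crop, rcond=None)
    rmse[name] = float(np.sqrt(np.mean((D @ coef - crop)**2)))
```

Because the quadratic temperature effect is outside the linear model's span, the polynomial fit attains the lower error here; tree ensembles such as Gradient Boosting and Random Forest capture this kind of nonlinearity without an explicit basis expansion.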

Maximum likelihood estimation of Logistic random effects model (로지스틱 임의선형 혼합모형의 최대우도 추정법)

  • Kim, Minah;Kyung, Minjung
    • The Korean Journal of Applied Statistics
    • /
    • v.30 no.6
    • /
    • pp.957-981
    • /
    • 2017
  • A generalized linear mixed model is an extension of a generalized linear model that allows for random effects and provides flexibility in developing a suitable model when observations are correlated or when other underlying phenomena contribute to the resulting variability. We describe maximum likelihood estimation methods for logistic regression models that include random effects: the Laplace approximation, Gauss-Hermite quadrature, adaptive Gauss-Hermite quadrature, and pseudo-likelihood. An application to social science data is provided by analyzing the effect of mental health and life satisfaction on volunteer activities in Korean welfare panel data; in addition, we observe that including random effects in the model leads to improved analyses with more reasonable inferences.
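Of the estimation methods listed, Gauss-Hermite quadrature is the easiest to sketch for a logistic random-intercept model: the random effect is integrated out of each cluster's likelihood at the quadrature nodes. The function and data below are illustrative, not from the paper.

```python
import numpy as np

def cluster_marg_loglik(y, eta, sigma, nodes, weights):
    """Marginal log-likelihood of one cluster in a logistic random-intercept
    model: integrate prod_i p(y_i | eta_i + b) over b ~ N(0, sigma^2)
    by Gauss-Hermite quadrature."""
    # change of variable b = sqrt(2)*sigma*t matches the e^{-t^2} GH weight
    b = np.sqrt(2.0) * sigma * nodes
    lin = eta[None, :] + b[:, None]                      # (nodes, obs)
    loglik = y[None, :] * lin - np.log1p(np.exp(lin))    # Bernoulli log-density
    integrand = np.exp(loglik.sum(axis=1))
    return np.log(np.sum(weights * integrand) / np.sqrt(np.pi))

nodes, weights = np.polynomial.hermite.hermgauss(20)     # 20-point rule
y = np.array([1.0, 0.0, 1.0, 1.0])                       # one cluster's responses
eta = np.array([0.2, -0.1, 0.4, 0.0])                    # fixed-effect predictors
ll = cluster_marg_loglik(y, eta, 1.0, nodes, weights)
```

Adaptive Gauss-Hermite quadrature additionally recenters and rescales the nodes at each cluster's posterior mode, which is what makes it accurate with few nodes.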

Estimation of Genetic Parameters for First Lactation Monthly Test-day Milk Yields using Random Regression Test Day Model in Karan Fries Cattle

  • Singh, Ajay;Singh, Avtar;Singh, Manvendra;Prakash, Ved;Ambhore, G.S.;Sahoo, S.K.;Dash, Soumya
    • Asian-Australasian Journal of Animal Sciences
    • /
    • v.29 no.6
    • /
    • pp.775-781
    • /
    • 2016
  • A single-trait linear mixed random regression test-day model was applied for the first time to analyze first-lactation monthly test-day milk yield records in Karan Fries cattle. The test-day milk yields were modeled with a random regression model (RRM) using different orders of Legendre polynomials for the additive genetic effect (4th order) and the permanent environmental effect (5th order). Data comprising 1,583 lactation records spread over a period of 30 years were analyzed in the study. The variance components, heritability, and genetic correlations among test-day milk yields were estimated using the RRM. Heritability estimates of test-day milk yield varied from 0.11 to 0.22 across test-day records. Estimates of genetic correlations between test-day milk yields ranged from 0.01 (between test-day 1 [TD-1] and TD-11) to 0.99 (between TD-4 and TD-5). The magnitude of the genetic correlations decreased as the interval between test-days increased, and adjacent test-days had higher correlations. Additive genetic and permanent environmental variances were higher for test-day milk yields at both ends of lactation. The residual variance was lower than the permanent environmental variance for all test-day milk yields.
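The Legendre-polynomial machinery behind such random regression models can be sketched as follows: the test-day covariates are Legendre polynomials of days in milk mapped onto [-1, 1], and the genetic (co)variances between any two test days follow from the covariance matrix of the random regression coefficients via $\mathrm{Cov}(u(t_1), u(t_2)) = \phi(t_1)' K\, \phi(t_2)$. The 4th-order coefficient covariance matrix below is hypothetical, not the paper's estimate.

```python
import numpy as np
from numpy.polynomial import legendre

def legendre_covariates(dim, dmin=5, dmax=305, order=4):
    """Covariates phi_0..phi_{order-1} at days in milk (DIM),
    after mapping DIM onto the [-1, 1] support of the Legendre polynomials."""
    t = -1.0 + 2.0 * (dim - dmin) / (dmax - dmin)
    return legendre.legvander(t, order - 1)

# hypothetical 4x4 covariance matrix K of additive-genetic coefficients
K = np.array([[4.0, 0.5, 0.2, 0.0],
              [0.5, 1.5, 0.1, 0.0],
              [0.2, 0.1, 0.6, 0.1],
              [0.0, 0.0, 0.1, 0.3]])

days = np.array([15, 45, 75, 105, 135, 165, 195, 225, 255, 285])  # monthly TDs
Phi = legendre_covariates(days)
G = Phi @ K @ Phi.T                       # genetic (co)variances among test days
sd = np.sqrt(np.diag(G))
corr = G / np.outer(sd, sd)               # genetic correlations between test days
```

With a smooth coefficient covariance like this, correlations between adjacent test days come out high and decay with the interval between test days, which is the pattern the paper reports.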

Estimation of genetic parameters and trends for production traits of dairy cattle in Thailand using a multiple-trait multiple-lactation test day model

  • Buaban, Sayan;Puangdee, Somsook;Duangjinda, Monchai;Boonkum, Wuttigrai
    • Asian-Australasian Journal of Animal Sciences
    • /
    • v.33 no.9
    • /
    • pp.1387-1399
    • /
    • 2020
  • Objective: The objective of this study was to estimate the genetic parameters and trends for milk, fat, and protein yields in the first three lactations of Thai dairy cattle using a 3-trait, 3-lactation random regression test-day model. Methods: Data included 168,996, 63,388, and 27,145 test-day records from the first, second, and third lactations, respectively. Records were from 19,068 cows calving from 1993 to 2013 in 124 herds. (Co)variance components were estimated by Bayesian methods, with Gibbs sampling used to obtain posterior distributions. The model included herd-year-month of testing, breed group-season of calving-month in tested milk group, and linear and quadratic age at calving as fixed effects, and random regression coefficients for additive genetic and permanent environmental effects, defined as modified constant, linear, quadratic, cubic, and quartic Legendre coefficients. Results: Average daily heritabilities ranged from 0.36 to 0.48 for milk, 0.33 to 0.44 for fat, and 0.37 to 0.48 for protein yields; they were higher in the third lactation for all traits. Heritabilities of test-day milk and protein yields for selected days in milk were higher in the middle of lactation than at the beginning or end, whereas those for test-day fat yields were high at the beginning and end of lactation. Genetic correlations (305-d yields) among production traits within lactations (0.44 to 0.69) were higher than those across lactations (0.36 to 0.68). The largest genetic correlation was observed between the first and second lactations. The genetic trends of 305-d milk, fat, and protein yields were 230 to 250, 25 to 29, and 30 to 35 kg per year, respectively. Conclusion: A random regression model seems to be a flexible and reliable procedure for the genetic evaluation of production yields. It can be used to perform breeding value estimation for national genetic evaluation in the Thai dairy cattle population.

Business Intelligence Design for Strategic Decision Making for Small and Medium-size E-Commerce Sellers: Focusing on Promotion Strategy (중소 전자상거래 판매상의 전략적 의사결정을 위한 비즈니스 인텔리전스 설계: 프로모션 전략을 중심으로)

  • Seung-Joo Lee;Young-Hyun Lee;Jin-Hyun Lee;Kang-Hyun Lee;Kwang-Sup Shin
    • The Journal of Bigdata
    • /
    • v.8 no.2
    • /
    • pp.201-222
    • /
    • 2023
  • As platform-based e-commerce grows, many small and medium-sized sellers have tried to develop more effective strategies to maximize profit. To increase profitability, it is important to make strategic decisions about the range of a promotion, the discount rate, and the product categories involved. This research aims to develop a business intelligence application that helps sellers on e-commerce platforms make better decisions. To decide whether or not to run a promotion, the seller needs to predict the increase in sales after the promotion. In this research, we applied various machine learning algorithms such as MLP (Multi-Layer Perceptron), Gradient Boosting Regression, Random Forest, and Linear Regression. Because of the complexity of the data structure and the distinctive characteristics of the product categories, Random Forest and MLP showed the best performance. The proposed approach appears applicable to supporting small and medium-sized sellers in reacting to market changes and making reasonable, data-driven decisions rather than relying on experience alone.
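The promote-or-not decision described above can be caricatured with a linear model standing in for the paper's Random Forest/MLP predictors: fit a model of sales lift on promotion features, then promote only when the predicted profit from the lift exceeds the promotion's cost. All numbers, features, and the profit rule below are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(4)
# hypothetical seller history: discount rate and baseline sales -> sales lift
n = 250
disc = rng.uniform(0.05, 0.4, n)                   # promotion discount rate
base = rng.uniform(50, 500, n)                     # baseline units sold
lift = 1.2 * disc * base + rng.normal(0, 10, n)    # units sold above baseline

# linear model with an interaction term, fit by least squares
D = np.column_stack([np.ones(n), disc, base, disc * base])
coef, *_ = np.linalg.lstsq(D, lift, rcond=None)

def promote(discount, base_sales, margin=4.0, fixed_cost=150.0):
    """Promote only when predicted profit from the lift exceeds the cost."""
    pred = np.array([1.0, discount, base_sales, discount * base_sales]) @ coef
    return pred * margin > fixed_cost
```

With this rule, a deep discount on a high-volume product is predicted to pay off while a small discount on a slow seller is not; the paper's contribution is wiring such predictions into a BI dashboard for category- and promotion-level decisions.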