• Title/Abstract/Keywords: least squares regression analysis (최소제곱회귀분석)

Search Result 75, Processing Time 0.023 seconds

Prediction of movie audience numbers using hybrid model combining GLS and Bass models (GLS와 Bass 모형을 결합한 하이브리드 모형을 이용한 영화 관객 수 예측)

  • Kim, Bokyung;Lim, Changwon
    • The Korean Journal of Applied Statistics / v.31 no.4 / pp.447-461 / 2018
  • Domestic film industry sales are increasing every year. Theaters are the primary sales channel for movies, and theater attendance affects the sale of ancillary rights. Theater attendance is therefore an important factor directly linked to movie industry sales. In this paper we consider a hybrid model that combines a multiple linear regression model and the Bass model to predict the audience numbers for a specific day. By combining the two models, the predicted value from the regression analysis is corrected using that of the Bass model. Three films with different release dates were used in the analysis. An all-subsets regression method is used to generate all possible combinations of predictors, and 5-fold cross-validation is used to estimate each model 5 times. The predicted value is obtained from the model with the smallest root mean square error and then combined with the predicted value of the Bass model to obtain the final predicted value. It was confirmed that, when past data exist, the weight of the Bass model increases and a correction is added to the predicted value.
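    The abstract does not say how the two predictions are combined, so the following is only a minimal sketch under stated assumptions. It uses the standard Bass diffusion curve for the daily audience and a simple convex weighting between the regression prediction and the Bass prediction. The function names, the weighting rule, and the parameter values (p, q, market size, weight) are hypothetical and are not taken from the paper.

    ```python
    import math


    def bass_cumulative_fraction(t, p, q):
        """Cumulative adoption fraction F(t) of the standard Bass diffusion model.

        p is the coefficient of innovation, q the coefficient of imitation.
        """
        decay = math.exp(-(p + q) * t)
        return (1.0 - decay) / (1.0 + (q / p) * decay)


    def bass_daily_audience(day, market_size, p, q):
        """Expected new audience during `day` (day >= 1) as the increment of m * F(t)."""
        return market_size * (
            bass_cumulative_fraction(day, p, q)
            - bass_cumulative_fraction(day - 1, p, q)
        )


    def hybrid_prediction(regression_pred, bass_pred, bass_weight):
        """Assumed combination rule: a convex mix of the two predictions.

        bass_weight is a hypothetical weight in [0, 1]; the paper's actual
        weighting scheme is not described in the abstract.
        """
        if not 0.0 <= bass_weight <= 1.0:
            raise ValueError("bass_weight must lie in [0, 1]")
        return (1.0 - bass_weight) * regression_pred + bass_weight * bass_pred


    # Illustrative values only (not estimated from any real film).
    bass_pred = bass_daily_audience(day=5, market_size=1_000_000, p=0.03, q=0.38)
    final_pred = hybrid_prediction(regression_pred=60_000.0,
                                   bass_pred=bass_pred,
                                   bass_weight=0.4)
    print(f"Bass prediction: {bass_pred:.0f}, hybrid prediction: {final_pred:.0f}")
    ```

    A larger bass_weight pulls the final value toward the diffusion curve, which is one way to read the abstract's remark that the weight of the Bass model increases when past data exist.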

Precipitation Analysis Based on Spatial Linear Regression Model (공간적 상관구조를 포함하는 선형회귀모형을 이용한 강수량 자료 분석)

  • Jung, Ji-Young;Jin, Seo-Hoon;Park, Man-Sik
    • The Korean Journal of Applied Statistics / v.21 no.6 / pp.1093-1107 / 2008
  • In this study, we considered linear regression models with various spatial dependency structures in order to make more reliable predictions of precipitation in South Korea. The prediction approaches are based on semi-variogram models fitted by the least-squares and restricted maximum likelihood estimation methods. We validated candidate models from the two estimation methods by cross-validation and by comparing predicted values with values observed at different locations.

Identifying Regional Characteristics Factors Affecting the Number of Tuberculosis Death - The Comparative Analysis between Urban and Rural areas - (결핵 사망자수에 영향을 미치는 지역특성 요인 규명 - 도시 및 비도시지역 비교분석 -)

  • Yoon, Sanghoon;Park, Keunoh
    • Journal of the Society of Disaster Information / v.16 no.3 / pp.513-525 / 2020
  • Purpose: The purpose of this study is to analyze the local characteristics affecting the number of tuberculosis deaths in urban and rural areas. Method: Partial least squares (PLS) regression analysis was used to address multicollinearity and the limited number of samples. Result: The number of tuberculosis deaths in urban and rural areas differs by a factor of about three. Regarding regional characteristic factors, children, elderly people, and economically vulnerable populations are in general more likely to be exposed to tuberculosis. The comparison shows that environmental factors such as ultrafine dust and sulfur dioxide have a significant impact on the number of tuberculosis deaths in urban areas, whereas social factors such as the depression experience rate do so in rural areas. Conclusion: Tuberculosis prevention and management policies that reflect the characteristics of urban and rural areas are needed in the future.

A Study on the Heterogeneity of Leisure Travel Time between Elderly and Non Elderly People - Focusing on urban and rural areas in south Chungcheong province - (고령자와 비고령자의 여가통행시간 이질성 연구 - 충남 도시권과 농어촌권을 중심으로 -)

  • Kim, Wonchul
    • The Journal of The Korea Institute of Intelligent Transport Systems / v.12 no.5 / pp.87-97 / 2013
  • This study explores the quantitative travel heterogeneity between elderly and non-elderly people, focusing on urban and rural areas in South Chungcheong Province. For the analysis, a partial least squares (PLS) model is applied with the economic and traffic-environment characteristics of the urban and rural areas. The characteristics of elderly and non-elderly people in these areas are derived from the 2011 person trip survey. The study found that the key factors affecting elderly people in the urban and rural areas are bus operation interval, number of bus routes, number of household members, and monthly average household income. For non-elderly people, regional economic factors such as GRDP, the economic activity rate, and employment status matter in addition to the factors that affect elderly people. Meanwhile, elderly women in rural areas are more sensitive than elderly men, whereas no gender heterogeneity was found among non-elderly people.

Uncertainty Estimation of AR Model Parameters Using a Bayesian technique (Bayesian 기법을 활용한 AR Model 매개변수의 불확실성 추정)

  • Park, Chan-Young;Park, Jong-Hyeon;Park, Min-Woo;Kwon, Hyun-Han
    • Proceedings of the Korea Water Resources Association Conference / 2016.05a / pp.280-280 / 2016
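  • The AR model, that is, the autoregressive model, is widely used to estimate predicted values of a data series over time. An AR model expresses the current value of a variable as a function of its past values, so using this kind of time series model requires estimating its parameters. Common parameter estimation methods include stochastic approximation, the method of least squares, the autocorrelation method, and the method of maximum likelihood. Maximum likelihood, the method most often used for AR models, is regarded as the most efficient when the sample size is sufficiently large, but the numerical solution is often complicated and sometimes cannot be found at all. In addition, when the sample size is small, the results generally do not agree well. Because rainfall and streamflow records in Korea are often short, parameter estimates obtained by maximum likelihood carry inherent uncertainty, yet there are limits to presenting that uncertainty quantitatively. In this study, a Bayesian technique is used to provide the posterior distribution of the AR model parameters, so that the parameter uncertainty intervals can be expressed quantitatively; this is expected to yield more reliable predictions from time series analysis.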
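    As an illustration of the least squares method that the abstract lists among the estimation options (not the Bayesian procedure the study itself proposes), the sketch below fits a first-order AR model x[t] = c + phi * x[t-1] + noise by ordinary least squares. The model order, the function names, and the simulated data are assumptions made for the example.

    ```python
    import random


    def fit_ar1_least_squares(series):
        """Estimate (c, phi) in x[t] = c + phi * x[t-1] + noise by ordinary least squares.

        The past values x[t-1] act as the regressor and the present values x[t]
        as the response, so this is a simple linear regression.
        """
        past = series[:-1]
        present = series[1:]
        n = len(past)
        mean_past = sum(past) / n
        mean_present = sum(present) / n
        sxy = sum((p - mean_past) * (q - mean_present) for p, q in zip(past, present))
        sxx = sum((p - mean_past) ** 2 for p in past)
        phi = sxy / sxx
        c = mean_present - phi * mean_past
        return c, phi


    def simulate_ar1(c, phi, n, seed=0):
        """Generate n values from an AR(1) process with unit-variance Gaussian noise."""
        rng = random.Random(seed)
        x = [c / (1.0 - phi)]  # start at the stationary mean
        for _ in range(n - 1):
            x.append(c + phi * x[-1] + rng.gauss(0.0, 1.0))
        return x


    # Synthetic series with known parameters, used only to check the estimator.
    series = simulate_ar1(c=2.0, phi=0.6, n=2000, seed=1)
    c_hat, phi_hat = fit_ar1_least_squares(series)
    print(f"c_hat={c_hat:.3f}, phi_hat={phi_hat:.3f}")
    ```

    A point estimate like this says nothing about how uncertain phi is for short records, which is the gap the posterior distribution in the study is meant to fill.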


Identification of Evacuation Route Planning Elements for the Disabled by Considering Universal Design - A Study on the Welfare Center for the Disabled - (유니버설 디자인을 고려한 지체장애인 대피경로 계획요소 규명 - 장애인 종합복지관 시설을 대상으로 -)

  • Jung, Tae-Ho;Yang, Won-Jik
    • Journal of the Society of Disaster Information / v.18 no.3 / pp.672-686 / 2022
  • Purpose: This study derives the planning factors affecting evacuation routes in facilities for the disabled and identifies the planning factors that affect each facility. Method: Partial least squares (PLS) regression analysis was used to address multicollinearity and the limited number of samples. Result: The most important planning elements for each facility were door: closing time (1.131), corridor: ramp for wheelchairs (1.227), stairs: emergency lighting for stairs (1.117), and evacuation space: evacuation space convenience facilities (1.106). Conclusion: In order to plan an effective evacuation route for the disabled, a universal design should be applied to consider the perception, needs, and satisfaction of the disabled, rather than a comprehensive reflection.

Comparison of Principal Component Regression and Nonparametric Multivariate Trend Test for Multivariate Linkage (다변량 형질의 유전연관성에 대한 주성분을 이용한 회귀방법와 다변량 비모수 추세검정법의 비교)

  • Kim, Su-Young;Song, Hae-Hiang
    • The Korean Journal of Applied Statistics / v.21 no.1 / pp.19-33 / 2008
  • The linear regression method proposed by Haseman and Elston (1972) for detecting linkage to a quantitative trait of sib pairs is a linkage testing method for a single locus and a single trait. However, multivariate methods for detecting linkage are needed when information from each of several traits affected by the same major gene is available on each individual. Amos et al. (1990) extended the regression method of Haseman and Elston (1972) to incorporate observations of two or more traits by estimating the principal component linear function that results in the strongest correlation between the squared pair differences in the trait measurements and identity by descent at a marker locus. However, the probability of type I errors cannot currently be controlled with this method, since the exact distribution of the statistic they use is still unknown. In this paper, we propose a multivariate nonparametric trend test for detecting linkage to multiple traits. Using a simulation study, we compared the efficiency of the multivariate nonparametric trend test with that of the method developed by Amos et al. (1990) for quantitative trait data. For the multivariate nonparametric trend test, the simulation results reveal that the type I error rates are close to the predetermined significance levels and that the test in general has high power.

Electrostatic Prediction Embedded System based on PXA255 (PXA255 기반 정전기 예측 임베디드 시스템 개발)

  • Byeon, Chi-Nam;Kim, Kang-Chul
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference / 2007.06a / pp.406-409 / 2007
  • This paper proposes an algorithm that predicts the current electrostatic charge in a factory. The algorithm, based on the least squares method (LSM), dynamically adjusts the number of samples while calculating the value of the electrostatic charge. The simulation results show that the proposed algorithm attains a standard deviation of 73.18161 at a 95% confidence level and performs better than the conventional algorithm. We design an electrostatic prediction embedded system based on the PXA255 with the proposed algorithm.


Forensic Classification of Latent Fingerprints Applying Laser-induced Plasma Spectroscopy Combined with Chemometric Methods (케모메트릭 방법과 결합된 레이저 유도 플라즈마 분광법을 적용한 유류 지문의 법의학적 분류 연구)

  • Yang, Jun-Ho;Yoh, Jai-Ick
    • Korean Journal of Optics and Photonics / v.31 no.3 / pp.125-133 / 2020
  • An innovative method for separating overlapping latent fingerprints, using laser-induced plasma spectroscopy (LIPS) combined with multivariate analysis, is reported in the current study. LIPS provides the capabilities of real-time analysis and high-speed scanning, as well as data regarding the chemical components of overlapping fingerprints. With appropriate multivariate analysis, these spectra provide valuable chemical information for the forensic classification and reconstruction of overlapping latent fingerprints. This study utilizes principal-component analysis (PCA) and partial-least-squares (PLS) techniques for the basis classification of four types of fingerprints from the LIPS spectra. The proposed method is successfully demonstrated through a classification example of four distinct latent fingerprints, using discrimination methods such as soft independent modeling of class analogy (SIMCA) and partial-least-squares discriminant analysis (PLS-DA). This demonstration achieves an accuracy of more than 85% and is shown to be sufficiently robust. In addition, by laser-scanning analysis at a spatial interval of 125 μm, the overlapping fingerprints were separated as two-dimensional forms.

Feature selection for text data via sparse principal component analysis (희소주성분분석을 이용한 텍스트데이터의 단어선택)

  • Won Son
    • The Korean Journal of Applied Statistics / v.36 no.6 / pp.501-514 / 2023
  • When analyzing high dimensional data such as text data, if we input all the variables as explanatory variables, statistical learning procedures may suffer from over-fitting problems. Furthermore, computational efficiency can deteriorate with a large number of variables. Dimensionality reduction techniques such as feature selection or feature extraction are useful for dealing with these problems. The sparse principal component analysis (SPCA) is one of the regularized least squares methods which employs an elastic net-type objective function. The SPCA can be used to remove insignificant principal components and identify important variables from noisy observations. In this study, we propose a dimension reduction procedure for text data based on the SPCA. Applying the proposed procedure to real data, we find that the reduced feature set maintains sufficient information in text data while the size of the feature set is reduced by removing redundant variables. As a result, the proposed procedure can improve classification accuracy and computational efficiency, especially for some classifiers such as the k-nearest neighbors algorithm.