• Title/Summary/Keyword: log regression analysis (로그 회귀분석)

Predicting claim size in the auto insurance with relative error: a panel data approach (상대오차예측을 이용한 자동차 보험의 손해액 예측: 패널자료를 이용한 연구)

  • Park, Heungsun
    • The Korean Journal of Applied Statistics / v.34 no.5 / pp.697-710 / 2021
  • Relative error prediction is preferred over ordinary prediction when relative or percentage errors matter, especially in econometrics, software engineering, and official government statistics. Relative error prediction techniques have been developed for linear and nonlinear regression, nonparametric regression using kernel smoothers, and stationary time series models. However, random effect models have not yet been used in relative error prediction. The purpose of this article is to extend relative error prediction to some generalized linear mixed models (GLMMs) for panel data, namely random effect models based on the gamma, lognormal, or inverse Gaussian distribution. To illustrate, real auto insurance data are used to predict claim size, and the best predictor and the best relative error predictor are compared.
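
As a minimal sketch of the distinction drawn in this abstract, consider a single lognormal response with no random effects (a simplification of the GLMM setting, assumed here only for illustration). Minimizing the mean squared relative error E[((Y - p)/Y)^2] gives p = E[1/Y] / E[1/Y^2], which for ln Y ~ N(mu, sigma^2) reduces to exp(mu - 1.5 sigma^2), whereas the usual best predictor is the mean exp(mu + 0.5 sigma^2). The function names and parameter values below are hypothetical.

```python
import math
import random


def best_predictor_lognormal(mu, sigma):
    """Mean of Y when ln Y ~ N(mu, sigma^2); minimizes E[(Y - p)^2]."""
    return math.exp(mu + 0.5 * sigma ** 2)


def best_relative_error_predictor_lognormal(mu, sigma):
    """E[1/Y] / E[1/Y^2] for lognormal Y; minimizes E[((Y - p) / Y)^2]."""
    return math.exp(mu - 1.5 * sigma ** 2)


def mean_squared_relative_error(samples, p):
    """Empirical mean of ((y - p) / y)^2 over positive samples."""
    return sum(((y - p) / y) ** 2 for y in samples) / len(samples)


# Hypothetical claim-size distribution (parameters invented for illustration).
mu, sigma = 7.0, 0.8
rng = random.Random(0)
claims = [rng.lognormvariate(mu, sigma) for _ in range(20000)]

p_mean = best_predictor_lognormal(mu, sigma)
p_rel = best_relative_error_predictor_lognormal(mu, sigma)

# Compare both predictors under the relative-error criterion.
msre_mean = mean_squared_relative_error(claims, p_mean)
msre_rel = mean_squared_relative_error(claims, p_rel)
```

The relative error predictor is always smaller than the mean for a lognormal response, because overshooting a small claim is penalized heavily in relative terms.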

Diagnostic Evaluation and Alternative Plans of Public Libraries in Taegu Metropolitan City (공공도서관의 진단적 평가와 대안모색 - 대구광역시를 중심으로 -)

  • Yoon, Hee-Yoon
    • Journal of the Korean Society for Library and Information Science / v.34 no.2 / pp.47-67 / 2000
  • This study evaluates the performance of the public libraries in Taegu Metropolitan City and suggests alternative plans. Regression analysis is used to fit a log-log equation relating total budget to output for all the public libraries. The results of the performance evaluation confirm that there are diseconomies of scale in library operations ($\Sigma b_n$ 3,732). Therefore, all the public libraries should improve productivity by restructuring library identity, rationally selecting physical sites and planning new branch libraries, adopting user-based budget allocation and collection development, staffing and reengineering, optimizing web sites, and reinforcing service functions.
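
A log-log fit of this kind can be sketched as follows. This is an illustration only: it assumes a single output measure and the form ln(budget) = a + b ln(output), under which b > 1 means that cost rises more than proportionally with output (diseconomies of scale). The paper's actual specification sums several coefficients ($\Sigma b_n$), and the data below are invented.

```python
import math


def fit_simple_ols(xs, ys):
    """Return (intercept, slope) of the least-squares line y = a + b * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def fit_log_log(outputs, budgets):
    """Fit ln(budget) = a + b * ln(output); b is the cost elasticity."""
    return fit_simple_ols([math.log(q) for q in outputs],
                          [math.log(c) for c in budgets])


# Invented data: budget = 2.0 * output^1.2, so the true elasticity is 1.2.
outputs = [100.0, 250.0, 400.0, 800.0, 1500.0]
budgets = [2.0 * q ** 1.2 for q in outputs]

intercept, elasticity = fit_log_log(outputs, budgets)
diseconomies_of_scale = elasticity > 1.0
```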

Performance Evaluation of Loss Functions and Composition Methods of Log-scale Train Data for Supervised Learning of Neural Network (신경 망의 지도 학습을 위한 로그 간격의 학습 자료 구성 방식과 손실 함수의 성능 평가)

  • Donggyu Song;Seheon Ko;Hyomin Lee
    • Korean Chemical Engineering Research / v.61 no.3 / pp.388-393 / 2023
  • Neural network analysis of engineering data based on supervised learning has been used in various engineering fields, such as optimization of chemical engineering processes, concentration prediction of particulate matter pollution, prediction of thermodynamic phase equilibria, and prediction of physical properties for transport phenomena systems. Supervised learning requires training data, and its performance is affected by the composition and configuration of the given training data. Frequently observed engineering data, such as the length of DNA or the concentration of analytes, are often given on a log scale. In this study, for widely distributed log-scaled training data of virtual 100×100 images, available loss functions were quantitatively evaluated in terms of (i) the confusion matrix, (ii) the maximum relative error, and (iii) the mean relative error. As a result, mean-absolute-percentage-error and mean-squared-logarithmic-error were the optimal loss functions for the log-scaled training data. Furthermore, uniformly selected training data led to the best prediction performance. The optimal loss functions and the method of composing training data studied in this work could be applied to engineering problems such as evaluating DNA length, analyzing biomolecules, and predicting the concentration of colloidal suspensions.
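
To make the comparison of loss functions concrete, the sketch below implements the standard definitions of mean squared error (MSE), mean absolute percentage error (MAPE), and mean squared logarithmic error (MSLE, in the common log(1 + y) form). The targets and predictions are invented and span four decades; this is not the paper's experiment.

```python
import math


def mse(y_true, y_pred):
    """Mean squared error."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def mape(y_true, y_pred):
    """Mean absolute percentage error, in percent (targets must be nonzero)."""
    return 100.0 * sum(abs(t - p) / abs(t)
                       for t, p in zip(y_true, y_pred)) / len(y_true)


def msle(y_true, y_pred):
    """Mean squared logarithmic error, using log(1 + y)."""
    return sum((math.log1p(t) - math.log1p(p)) ** 2
               for t, p in zip(y_true, y_pred)) / len(y_true)


# Invented log-scaled targets; every prediction is 10% too high.
targets = [1.0, 100.0, 10000.0]
predictions = [1.1 * t for t in targets]

mse_value = mse(targets, predictions)
mape_value = mape(targets, predictions)
msle_value = msle(targets, predictions)

# Share of the MSE that comes from the largest target alone.
largest_share = (targets[-1] - predictions[-1]) ** 2 / (3 * mse_value)
```

With equal relative errors on every sample, almost all of the MSE comes from the largest target, while MAPE weights the three samples equally. This is one intuition for why relative or logarithmic losses suit log-scaled data.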

Data Analysis and Mining for Fish Growth Data in Fish-Farms (양식장 어류 생육 데이터 분석 및 마이닝)

  • Seoung-Bin Ye;Jeong-Seon Park;Soon-Hee Han;Hyi-Thaek Ceong
    • The Journal of the Korea institute of electronic communication sciences / v.18 no.1 / pp.127-142 / 2023
  • Managing size and weight, the basic growth information of aquaculture fish, is the most fundamental goal in fish farms. In this study, an epoch is defined as the period from stocking or dividing to shipment, and growth data for a total of three epochs are analyzed from a time series perspective. Growth information such as the size and weight of farmed fish over time is compared and analyzed together with water quality and feeding information, and a model is presented based on the analysis results. Linear, exponential, and logarithmic regression models are presented using the Box-Jenkins method for size and weight by epoch, using data obtained in the field.
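
The three functional forms named in the abstract can be compared with a small sketch like the one below. It does not reproduce the Box-Jenkins procedure; it only fits each form by least squares after a linearizing transform and compares R-squared on the original scale. The growth data are invented.

```python
import math


def fit_line(xs, ys):
    """Return (intercept, slope) of the least-squares line y = a + b * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    return mean_y - slope * mean_x, slope


def r_squared(observed, fitted):
    """Coefficient of determination on the original scale."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - f) ** 2 for o, f in zip(observed, fitted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


# Invented data: days since stocking and fish weight in grams.
days = [10.0, 20.0, 30.0, 40.0, 50.0, 60.0]
weights = [50.0 * math.exp(0.02 * d) for d in days]

# Linear: w = a + b * t
a, b = fit_line(days, weights)
linear_fit = [a + b * d for d in days]

# Exponential: ln(w) = a + b * t
a, b = fit_line(days, [math.log(w) for w in weights])
exponential_fit = [math.exp(a + b * d) for d in days]

# Logarithmic: w = a + b * ln(t)
a, b = fit_line([math.log(d) for d in days], weights)
logarithmic_fit = [a + b * math.log(d) for d in days]

scores = {
    "linear": r_squared(weights, linear_fit),
    "exponential": r_squared(weights, exponential_fit),
    "logarithmic": r_squared(weights, logarithmic_fit),
}
best_model = max(scores, key=scores.get)
```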

Prediction of the Number of Crimes according to Urban Environmental Factors in the Metropolitan Area (수도권 도시 환경 요인에 따른 범죄 발생 건수 예측)

  • Ye-Won Jang;Ye-Lim Kim;Si-Hyeon Park;Jae-Young Lee;Yoo-Jin Moon
    • Proceedings of the Korean Society of Computer Information Conference / 2023.01a / pp.321-322 / 2023
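  • This paper proposes a model that predicts the number of crimes according to urban environmental factors in the metropolitan area, using the LinearRegression model of the Scikit-learn package and a Keras deep learning model. As the research method, datasets for each autonomous district in the metropolitan area that were considered to have a meaningful relationship with crime were analyzed, confirming that the numbers of CCTVs, police substations, and streetlights have a significant effect on crime occurrence. The independent variables were normalized to reduce differences in scale, and the dependent variable was log-transformed to secure normality. The model was built with the 'relu' function, which is used in regression problems, as the loss function and with MSE (Mean Squared Error) as the metric for checking model performance. The program designed in this paper is expected to provide a basis for practical measures such as deploying additional police personnel and expanding safety facilities in districts with high crime rates.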
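
The preprocessing described here can be sketched in a few lines. The abstract does not say which normalization or which log transform was used, so min-max scaling and log(1 + y) are assumptions made for this illustration, and the district counts are invented.

```python
import math


def min_max_scale(values):
    """Rescale values linearly to the range [0, 1]."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


def log_transform(counts):
    """Apply log(1 + y), which stays defined when a count is zero."""
    return [math.log1p(c) for c in counts]


def inverse_log_transform(values):
    """Undo log(1 + y) so predictions can be read as counts again."""
    return [math.expm1(v) for v in values]


# Invented per-district data.
cctv_counts = [120.0, 450.0, 300.0, 980.0]
crime_counts = [310.0, 1250.0, 870.0, 4020.0]

scaled_cctv = min_max_scale(cctv_counts)      # independent variable
log_crimes = log_transform(crime_counts)      # dependent variable
restored = inverse_log_transform(log_crimes)  # back on the count scale
```

A model trained on the log-transformed target predicts on the log scale, so its outputs need the inverse transform before they are compared with observed counts.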

A Study on the Factors Affecting the Arson (방화 발생에 영향을 미치는 요인에 관한 연구)

  • Kim, Young-Chul;Bak, Woo-Sung;Lee, Su-Kyung
    • Fire Science and Engineering / v.28 no.2 / pp.69-75 / 2014
  • This study derives the factors that affect the occurrence of arson from statistical data (population, economic, and social factors) by multiple regression analysis. The multiple regression analysis is applied to four functional forms: linear, semi-log, inverse log, and double-log functions. Each function is analyzed using a stepwise procedure that considers the selection and deletion of independent variables at each step. To address the problems of autocorrelation and multicollinearity in multiple regression analysis, the Durbin-Watson coefficient and the Variance Inflation Factor (VIF) were considered. The optimal model was determined by adjusted R-squared, the coefficient of determination adjusted for the number of predictors. The adjusted R-squared of the linear function was 0.935 (93.5%), the highest of the four functional forms, so the linear function is the optimal model in this study. The optimal model is then interpreted. As a result of the analysis, the factors affecting arson were, in order, the incidence of crime (0.829), the general divorce rate (0.151), the financial autonomy rate (0.149), and the consumer price index (0.099).
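
The three diagnostics named in this abstract have simple closed forms, sketched below with their textbook definitions. The inputs are invented and are not the paper's data.

```python
def durbin_watson(residuals):
    """Durbin-Watson statistic: near 2 suggests no first-order autocorrelation."""
    numerator = sum((residuals[i] - residuals[i - 1]) ** 2
                    for i in range(1, len(residuals)))
    denominator = sum(e ** 2 for e in residuals)
    return numerator / denominator


def variance_inflation_factor(r_squared_j):
    """VIF of predictor j, where r_squared_j comes from regressing
    predictor j on all the other predictors."""
    return 1.0 / (1.0 - r_squared_j)


def adjusted_r_squared(r_squared, n_obs, n_predictors):
    """R-squared penalized for the number of predictors."""
    return 1.0 - (1.0 - r_squared) * (n_obs - 1) / (n_obs - n_predictors - 1)


# Invented inputs for illustration.
dw = durbin_watson([1.0, -1.0, 1.0, -1.0])  # strongly alternating residuals
vif = variance_inflation_factor(0.8)
adj_r2 = adjusted_r_squared(0.95, n_obs=21, n_predictors=4)
```

Alternating residuals push the Durbin-Watson statistic above 2 (negative autocorrelation), and a predictor that is 80% explained by the others has a VIF of about 5.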

Modeling Traffic Accident Characteristics and Severity Related to Drinking-Driving (음주교통사고 영향요인과 심각도 분석을 위한 모형설정)

  • Jang, Taeyoun;Park, Hyunchun
    • KSCE Journal of Civil and Environmental Engineering Research / v.30 no.6D / pp.577-585 / 2010
  • Traffic accidents are caused by several factors, such as drivers, vehicles, and the road environment. These factors need to be investigated and analyzed in advance to prevent similar and repeated accidents. The human factor is the most significant element, and drinking-driving accidents, which arise from the human factor, have become a social problem that demands attention. The study analyzes traffic accidents resulting from drinking and driving and the effects of driver attributes and environmental factors on them. The study consists of two parts. First, a log-linear model is applied to analyze how drinking and non-drinking driving accidents are associated with road geometry, weather conditions, and personal characteristics. The probability of drinking-driving accidents relative to non-drinking-driving accidents is tested, and differences in probability between genders, ages, and vehicle types are analyzed through odds multipliers. Second, drinking-related traffic accidents are classified by severity into property damage, minor injury, heavy injury, and death. Heavy injury is more serious than minor injury, and death is more serious than heavy injury. Ordinal regression models are established to find the factors affecting traffic accident severity.
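
An odds multiplier of the kind mentioned above can be illustrated with a 2x2 table. The sketch below computes the ratio of the odds of a drinking-driving accident between two groups directly from counts; it does not fit the paper's log-linear model, and all counts and names are invented.

```python
def odds(events, non_events):
    """Odds of an event: events divided by non-events."""
    return events / non_events


def odds_multiplier(group_a, group_b):
    """Ratio of the odds in group A to the odds in group B.

    Each group is a (drinking_accidents, non_drinking_accidents) pair.
    """
    return odds(*group_a) / odds(*group_b)


# Invented accident counts: (drinking, non-drinking).
male_drivers = (300, 700)
female_drivers = (50, 450)

multiplier = odds_multiplier(male_drivers, female_drivers)
```

A multiplier above 1 means the first group has higher odds of a drinking-driving accident than the second, and a value of 1 means no difference.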

A Study on the Factors Determining Officetel Price in Busan (부산지역 오피스텔 가격 결정요인 분석)

  • Choi, Yeol;Kim, Hyeong Jun;Yeo, Jung Hoon
    • KSCE Journal of Civil and Environmental Engineering Research / v.35 no.3 / pp.725-735 / 2015
  • The aim of this study is to understand the officetel market in Busan through an empirical analysis of the factors that determine officetel prices. Analyzing these factors against market prices can help officetel providers select an appropriate size and location, and can help customers choose an officetel suited to their purpose. The study analyzes the determinants of officetel prices using OLS linear regression, a semi-log model, and robust regression, with the real transaction price of officetels in the Busan area as the dependent variable and factors representing physical, locational, and regional characteristics as independent variables.
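
In a semi-log price model of the form ln(price) = a + b1 x1 + b2 x2 + ..., each coefficient has a percentage interpretation: a one-unit increase in x multiplies the price by exp(b). The sketch below shows this reading with hypothetical coefficients and feature names; none of the numbers come from the paper.

```python
import math


def percent_change_per_unit(coefficient):
    """Percentage change in price for a one-unit increase in the feature."""
    return 100.0 * (math.exp(coefficient) - 1.0)


def predict_price(intercept, coefficients, features):
    """Predict price from a semi-log model by exponentiating ln(price)."""
    log_price = intercept + sum(b * x for b, x in zip(coefficients, features))
    return math.exp(log_price)


# Hypothetical fitted model: features are (floor area in m^2, building age in years).
intercept = 9.0
coefficients = [0.015, -0.020]

price_60 = predict_price(intercept, coefficients, [60.0, 5.0])
price_61 = predict_price(intercept, coefficients, [61.0, 5.0])

area_effect = percent_change_per_unit(coefficients[0])  # about +1.5% per m^2
age_effect = percent_change_per_unit(coefficients[1])   # about -2.0% per year
```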

Survival analysis on the business types of small business using Cox's proportional hazard regression model (콕스 비례위험 모형을 이용한 중소기업의 업종별 생존율 및 생존요인 분석)

  • Park, Jin-Kyung;Oh, Kwang-Ho;Kim, Min-Soo
    • Journal of the Korean Data and Information Science Society / v.23 no.2 / pp.257-269 / 2012
  • The global crisis is accelerating change in the industrial environment and putting small enterprises in danger of mass bankruptcy. For this reason, domestic small enterprises urgently need restructuring. Based on small business data registered with the Credit Guarantee Fund, we estimated survival probabilities within the framework of survival analysis. We also analyzed how survival time differs by business type. Cox regression analysis on financial variables was also conducted by business type. The wholesale and retail trade industry and the services industry had relatively higher survival probabilities than the light, heavy, and construction industries. The construction industry in particular showed the lowest survival probability. In addition, we found that in the construction industry, the larger the BIS (Bank for International Settlements) capital ratio and the current ratio are, the smaller the default rate is, but the larger the borrowing bond is, the larger the default rate is. In the light industry, the larger the BIS ratio and ROA (return on assets) are, the smaller the default rate is. In the wholesale and retail trade industry, the larger the BIS ratio and the current ratio are, the smaller the default rate is. In the heavy industry, the larger the BIS ratio, ROA, and current ratio are, the smaller the default rate is. Finally, in the services industry, the larger the current ratio is, the smaller the default rate is.
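
The Cox model behind these findings, h(t | x) = h0(t) exp(beta x), is estimated by maximizing a partial likelihood. The sketch below evaluates that partial log-likelihood for a single covariate, assuming no tied event times. The firm data and the covariate are invented for illustration.

```python
import math


def cox_partial_log_likelihood(beta, times, events, covariate):
    """Partial log-likelihood of a one-covariate Cox model (no tied times).

    times: observed durations; events: 1 if default observed, 0 if censored.
    """
    total = 0.0
    for i, (t_i, observed) in enumerate(zip(times, events)):
        if not observed:
            continue  # censored firms only contribute through risk sets
        # Risk set: firms still under observation just before time t_i.
        risk_set = [j for j, t_j in enumerate(times) if t_j >= t_i]
        log_denominator = math.log(
            sum(math.exp(beta * covariate[j]) for j in risk_set))
        total += beta * covariate[i] - log_denominator
    return total


def hazard_ratio(beta):
    """Multiplicative change in hazard per one-unit increase in the covariate."""
    return math.exp(beta)


# Invented data: years until default (or censoring) and each firm's current ratio.
times = [2.0, 3.0, 5.0, 7.0, 11.0]
events = [1, 1, 0, 1, 1]
current_ratio = [0.6, 0.9, 1.4, 1.1, 1.8]

ll_at_zero = cox_partial_log_likelihood(0.0, times, events, current_ratio)
ll_at_minus_one = cox_partial_log_likelihood(-1.0, times, events, current_ratio)
```

In this toy data, firms with lower current ratios default earlier, so a negative beta (hazard ratio below 1) fits better than beta = 0. That is the same direction as the "larger current ratio, smaller default rate" findings in the abstract.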

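A Review of the Realism of the Basel II Asset Correlation Formula: The Case of SME Loan Portfolios (바젤2 자산상관계수 계산공식의 현실성 검토: 중소기업 대출 포트폴리오를 대상으로)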

  • Gwon, Tae-Go;Jeong, Jae-Man;Jo, Tae-Geun
    • 한국산학경영학회: Conference Proceedings / 2004.11a / pp.73-100 / 2004
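  • This study examines the realism of the Basel II asset correlation formula using the Industrial Bank of Korea's small and medium-sized enterprise (SME) loan data for 1999 to 2003. According to the empirical results, the asset correlation has a positive (+) relationship with sales size and a negative (-) relationship with credit rating, so the asset correlation pattern assumed by the Basel II formula is also realistic in Korea. This contrasts with Kim-Park (2004), who reported that the asset correlation has a negative (-) relationship with sales size. In addition, although Basel II treats sales of KRW 6 billion or less as KRW 6 billion, the asset correlation was found to have a positive (+) relationship with sales size even below KRW 6 billion. The asset correlation calculated by the Basel II formula is 1.3 to 19.2 times higher than the asset correlation estimated from the data, and this difference is not only statistically but also economically significant. According to the regression results, the upward bias of the Basel II asset correlation arises mainly because the intercept in the formula is set too high, and although Basel II specifies a linear relationship between sales size and asset correlation, a log-linear form fits the actual data better. In light of these results, the Basel II asset correlation formula is devised fairly realistically, but more extensive empirical analysis is needed to adjust it to domestic conditions.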
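
For reference, the Basel II corporate asset correlation with the SME firm-size adjustment can be sketched as below. The formula is written in the euro terms of the Basel II framework (annual sales S in EUR million, floored at 5, with the adjustment applied below 50). The KRW 6 billion floor mentioned in the abstract belongs to a national implementation and is not modeled here. The function and parameter names are hypothetical.

```python
import math


def basel2_asset_correlation(pd, sales_eur_million=None):
    """Basel II corporate asset correlation R for a given default probability.

    pd: one-year probability of default (e.g. 0.01 for 1%).
    sales_eur_million: annual sales in EUR million; if given and below 50,
        the SME size adjustment is applied (sales below 5 are treated as 5).
    """
    weight = (1.0 - math.exp(-50.0 * pd)) / (1.0 - math.exp(-50.0))
    correlation = 0.12 * weight + 0.24 * (1.0 - weight)
    if sales_eur_million is not None and sales_eur_million < 50.0:
        sales = max(sales_eur_million, 5.0)
        # The adjustment is linear in sales, from -0.04 at 5 to 0 at 50.
        correlation -= 0.04 * (1.0 - (sales - 5.0) / 45.0)
    return correlation


r_large = basel2_asset_correlation(0.01)        # no size adjustment
r_small = basel2_asset_correlation(0.01, 5.0)   # full SME adjustment
r_tiny = basel2_asset_correlation(0.01, 1.0)    # floored at 5, same as above
```

The size adjustment is linear in sales, which is the specification the abstract contrasts with the log-linear form that fit the loan data better.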
