• Title/Summary/Keyword: Regression estimator

Search results: 311

Machine learning-based prediction of wind forces on CAARC standard tall buildings

  • Yi Li;Jie-Ting Yin;Fu-Bin Chen;Qiu-Sheng Li
    • Wind and Structures / v.36 no.6 / pp.355-366 / 2023
  • Although machine learning (ML) techniques have been widely used in many fields of engineering practice, their application in wind engineering is still at an early stage. To evaluate the feasibility of machine learning algorithms for predicting wind loads on high-rise buildings, this study took the exposure category, wind direction, and height of the local wind force as input features and adopted four machine learning algorithms, k-nearest neighbors (KNN), support vector machine (SVM), gradient boosting regression tree (GBRT), and extreme gradient boosting (XGBoost), to predict wind force coefficients of the CAARC standard tall building model. The hyper-parameters of all four ML algorithms were optimized by the tree-structured Parzen estimator (TPE). The results show that the mean drag force coefficients and RMS lift force coefficients are predicted well by the GBRT model, while the RMS drag force coefficients are predicted best by the XGBoost model. The proposed machine-learning-based approach to wind load prediction can serve as an alternative to traditional wind tunnel tests and computational fluid dynamics simulations.
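
A minimal sketch of the TPE-driven hyper-parameter search described above, using hyperopt and scikit-learn; this is not the authors' code, and the feature matrix X, target y, and search space below are placeholder assumptions.

```python
# Hypothetical illustration: TPE-optimized gradient boosting regression.
import numpy as np
from hyperopt import fmin, tpe, hp, Trials
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))   # stand-ins for exposure category, wind direction, height
y = rng.normal(size=200)        # stand-in for a local wind force coefficient

space = {
    "n_estimators": hp.quniform("n_estimators", 100, 500, 50),
    "max_depth": hp.quniform("max_depth", 2, 6, 1),
    "learning_rate": hp.loguniform("learning_rate", np.log(0.01), np.log(0.3)),
}

def objective(params):
    model = GradientBoostingRegressor(
        n_estimators=int(params["n_estimators"]),
        max_depth=int(params["max_depth"]),
        learning_rate=params["learning_rate"],
        random_state=0,
    )
    # TPE minimizes this: cross-validated mean squared error.
    return -cross_val_score(model, X, y, cv=5,
                            scoring="neg_mean_squared_error").mean()

best = fmin(fn=objective, space=space, algo=tpe.suggest,
            max_evals=30, trials=Trials())
print("best hyper-parameters:", best)
```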

A GEE approach for the semiparametric accelerated lifetime model with multivariate interval-censored data

  • Maru Kim;Sangbum Choi
    • Communications for Statistical Applications and Methods / v.30 no.4 / pp.389-402 / 2023
  • Multivariate or clustered failure time data often occur in medical, epidemiological, and socio-economic studies when survival data are collected from several research centers. If the data are observed periodically, as in a longitudinal study, survival times are often subject to various types of interval censoring, creating multivariate interval-censored data. The event times of interest may then be correlated among individuals from the same cluster. In this article, we propose a unified linear regression method for analyzing multivariate interval-censored data. We consider a semiparametric multivariate accelerated failure time model as the analysis tool and develop a generalized Buckley-James method that makes inference by imputing interval-censored observations with their conditional mean values. Since the study population consists of several heterogeneous clusters, in which subjects may be related, we propose a generalized estimating equations (GEE) approach to accommodate potential within-cluster dependence. Our simulation results confirm that the proposed estimator is robust to misspecification of the working covariance matrix and that statistical efficiency increases when the working covariance structure is close to the truth. The proposed method is applied to a dataset from a diabetic retinopathy study.
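
The imputation step can be sketched roughly as below. This simplified illustration assumes a normal error distribution (the paper estimates the error distribution nonparametrically) and omits the GEE working-correlation machinery; all data are synthetic.

```python
# Hypothetical illustration: one Buckley-James-type iteration for an AFT model
# with interval-censored responses, under an assumed normal error distribution.
import numpy as np
from scipy.stats import norm

def bj_iteration(X, logL, logR, beta, sigma):
    """Impute log event times by E[logT | logL <= logT <= logR], then refit beta."""
    mu = X @ beta
    a, b = (logL - mu) / sigma, (logR - mu) / sigma        # standardized interval bounds
    cond_mean = (norm.pdf(a) - norm.pdf(b)) / (norm.cdf(b) - norm.cdf(a))
    imputed = mu + sigma * cond_mean                        # truncated-normal mean
    beta_new, *_ = np.linalg.lstsq(X, imputed, rcond=None)
    return beta_new

# Synthetic interval-censored data around true coefficients [1.0, -0.5].
rng = np.random.default_rng(1)
X = rng.normal(size=(100, 2))
true_log_t = X @ np.array([1.0, -0.5]) + rng.normal(scale=0.5, size=100)
logL = true_log_t - rng.uniform(0.1, 1.0, 100)
logR = true_log_t + rng.uniform(0.1, 1.0, 100)

beta, *_ = np.linalg.lstsq(X, (logL + logR) / 2, rcond=None)   # start from interval midpoints
for _ in range(20):                                            # iterate imputation and refitting
    beta = bj_iteration(X, logL, logR, beta, sigma=0.5)
print("estimated coefficients:", beta)
```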

Schematic Cost Estimation Method using Case-Based Reasoning: Focusing on Determining Attribute Weight (사례기반추론을 이용한 초기단계 공사비 예측 방법: 속성 가중치 산정을 중심으로)

  • Park, Moon-Seo;Seong, Ki-Hoon;Lee, Hyun-Soo;Ji, Sae-Hyun;Kim, Soo-Young
    • Korean Journal of Construction Engineering and Management / v.11 no.4 / pp.22-31 / 2010
  • Because the cost estimated at an early stage greatly influences the project owner's decisions, the importance of early cost estimation is increasing. However, it depends mainly on the experience and knowledge of the estimator because of the shortage of information. This tendency led to the case-based reasoning (CBR) method, which solves new problems by adapting solutions of similar past problems. The performance of a CBR model is affected by the attribute weights, so their accurate determination is necessary. Previous research used mathematical methods or the subjective judgment of the estimator. To address this limitation, this study proposes a CBR-based schematic cost estimation method that uses a genetic algorithm to determine the attribute weights. The cost model employs nearest-neighbor retrieval to select past cases and estimates the cost of a new case from the cost information of the retrieved cases. Validation on 17 test cases gave an error rate of 3.57%, which is better than the accuracy range suggested by AACE and than attribute-weighting methods based on multiple regression analysis and feature counting. The CBR cost estimation method improves accuracy by introducing a genetic algorithm for the attribute weights. Moreover, it lets users understand the problem-solving process more easily than other artificial intelligence methods and find a solution quickly through the case retrieval algorithm.
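
The retrieval and weight-optimization ideas can be sketched as follows. The case base, cost values, and GA settings are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical illustration: weighted nearest-neighbor case retrieval with
# attribute weights tuned by a toy genetic algorithm.
import numpy as np

rng = np.random.default_rng(0)
cases = rng.uniform(size=(50, 5))                        # 50 past cases, 5 normalized attributes
costs = cases @ np.array([3.0, 1.0, 0.5, 2.0, 0.2]) + rng.normal(scale=0.1, size=50)

def estimate(query, weights, k=3):
    """Mean cost of the k most similar cases under weighted Euclidean distance."""
    d = np.sqrt(((cases - query) ** 2 * weights).sum(axis=1))
    return costs[np.argsort(d)[:k]].mean()

def fitness(weights, val_X, val_y):
    """Negative mean absolute estimation error on validation cases."""
    preds = np.array([estimate(q, weights) for q in val_X])
    return -np.abs(preds - val_y).mean()

val_X = rng.uniform(size=(20, 5))                        # held-out cases for weight tuning
val_y = val_X @ np.array([3.0, 1.0, 0.5, 2.0, 0.2])

# Toy GA: keep the fittest weight vectors, averaging crossover, Gaussian mutation.
pop = rng.uniform(size=(30, 5))
for _ in range(40):
    scores = np.array([fitness(w, val_X, val_y) for w in pop])
    parents = pop[np.argsort(scores)[-10:]]
    children = (parents[rng.integers(0, 10, 20)] + parents[rng.integers(0, 10, 20)]) / 2
    children += rng.normal(scale=0.05, size=children.shape)
    pop = np.vstack([parents, children.clip(0, 1)])

best_w = pop[np.argmax([fitness(w, val_X, val_y) for w in pop])]
print("tuned attribute weights:", best_w)
```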

Spatial Upscaling of Aboveground Biomass Estimation using National Forest Inventory Data and Forest Type Map (국가산림자원조사 자료와 임상도를 이용한 지상부 바이오매스의 공간규모 확장)

  • Kim, Eun-Sook;Kim, Kyoung-Min;Lee, Jung-Bin;Lee, Seung-Ho;Kim, Chong-Chan
    • Journal of Korean Society of Forest Science / v.100 no.3 / pp.455-465 / 2011
  • In order to assess and mitigate climate change, the role of forest biomass as a carbon sink has to be understood spatially and quantitatively. Since existing forest statistics cannot provide spatial information about forest resources, an alternative scheme is needed to predict the spatial distribution of forest biomass. This study focuses on developing an upscaling method that expands forest variables from the plot to the landscape scale to estimate spatially explicit aboveground biomass (AGB). Forest stand variables were extracted from National Forest Inventory (NFI) data and used to develop AGB regression models by tree species, with dominant/codominant height and crown density as explanatory variables. The spatial distribution of AGB was then estimated using the AGB models, the forest type map, and a stand height map developed from the forest type map and height regression models. The total forest AGB in Danyang was estimated at 6,606,324 tons, which is within the standard error of the sample-based AGB estimate of 6,518,178 tons. This AGB upscaling method provides a means of easily estimating biomass over large areas, but because the forest type map used as the base map is categorical, it limits the attainable precision of the AGB map.
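
The species-wise regression and map-based upscaling can be sketched as below; the species, stand variables, and polygon values are toy assumptions, not NFI data.

```python
# Hypothetical illustration: AGB regression by species on dominant height and
# crown density, applied to forest-type-map polygons to upscale to a total.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
species = np.array(["pine", "oak"] * 50)
height = rng.uniform(5, 25, 100)                 # dominant/codominant height (m)
crown = rng.uniform(0.3, 1.0, 100)               # crown density (fraction)
agb = 2.0 * height + 30.0 * crown + rng.normal(scale=2, size=100)   # t/ha, toy values

models = {}
for sp in np.unique(species):
    m = species == sp
    models[sp] = LinearRegression().fit(np.column_stack([height[m], crown[m]]), agb[m])

# Upscale: predict AGB density per polygon of the forest type map, multiply by area.
polygons = [("pine", 18.0, 0.8, 12.5), ("oak", 12.0, 0.6, 7.3)]   # (species, height, crown, ha)
total = sum(models[sp].predict([[h, c]])[0] * area for sp, h, c, area in polygons)
print(f"total AGB: {total:.1f} t")
```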

Comparison of Spatial Small Area Estimators Based on Neighborhood Information Systems (이웃정보시스템을 이용한 공간 소지역 추정량 비교)

  • Kim, Jeong-Suk;Hwang, Hee-Jin;Shin, Key-Il
    • The Korean Journal of Applied Statistics / v.21 no.5 / pp.855-866 / 2008
  • Recently, many small area estimation methods based on lattice data analysis have been studied and are known to perform well. When using lattice data, which is the main data type in small area estimation, the choice of neighborhood information system is very important for the efficiency of the analysis. Lee and Shin (2008) recently compared and analyzed several neighborhood information systems based on GIS methods. In this paper, we evaluate the effect of the various neighborhood information systems suggested by Lee and Shin (2008). The estimators are compared using MSE, coverage, calibration, and regression criteria. The number of unemployed persons in the Economically Active Population Survey (2001) is used for the comparison.

Recent Decrease in Colorectal Cancer Mortality Rate is Affected by Birth Cohort in Korea

  • Jee, Yonho;Oh, Chang-Mo;Shin, Aesun
    • Asian Pacific Journal of Cancer Prevention / v.16 no.9 / pp.3951-3955 / 2015
  • Background: Colorectal cancer mortality has started to decrease in several developed countries in Asia. The current study aimed to present the long-term trends in colorectal cancer mortality in Korea using joinpoint analysis and age-period-cohort modeling. Materials and Methods: The number of colorectal cancer deaths and the population for each 5-year age group were obtained from Statistics Korea for the period 1984-2013 for adults 30 years and older. Joinpoint regression analysis was conducted to determine changes in trends in age-standardized mortality rates, and age-period-cohort analysis was performed to describe trends in colorectal cancer mortality using the intrinsic estimator method. Results: In men, the age-standardized mortality rate for colorectal cancer increased from 1984 to 2003, and the mortality rates stabilized thereafter, whereas the mortality rate of colorectal cancer in women has decreased since 2004. The age-specific mortality rate of colorectal cancer increased in both men and women over time, whereas decreases in the age-specific mortality rate in younger cohorts were observed. In the age-period-cohort analysis, old age and recent period were associated with higher mortality for both men and women. The birth cohort born after 1919 showed reduced colorectal cancer mortality in both men and women. Conclusions: Our study showed a recent decreasing trend in colorectal cancer mortality in women and a stable trend in men after 2003-2004. These changes in colorectal cancer mortality may be attributed to birth cohort effects.

A Study for the Drivers of Movie Box-office Performance (영화흥행 영향요인 선택에 관한 연구)

  • Kim, Yon Hyong;Hong, Jeong Han
    • The Korean Journal of Applied Statistics / v.26 no.3 / pp.441-452 / 2013
  • This study analyzed the relationship between key success factors and box-office performance for movies released in Korea in the first quarter of 2013. Overfitting can occur when too many explanatory variables are included in a regression model, and the estimator can be unstable when there is multicollinearity among the explanatory variables. For this reason, selecting an optimal set of variables with high explanatory power for box-office performance is important. Among the many variable selection methods, LASSO estimation applied to a generalized linear model gave the smallest prediction error and can efficiently identify, in order, the variables with the highest explanatory power for box-office performance.
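
A minimal sketch of LASSO variable selection with a cross-validated penalty follows; the feature names and data are illustrative assumptions, not the study's dataset.

```python
# Hypothetical illustration: LASSO selects and ranks box-office predictors.
import numpy as np
from sklearn.linear_model import LassoCV
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
features = ["screens", "star_power", "distributor_size", "genre_score", "review_score"]
X = rng.normal(size=(120, len(features)))
y = 3.0 * X[:, 0] + 1.5 * X[:, 4] + rng.normal(scale=0.5, size=120)   # toy box-office target

X_std = StandardScaler().fit_transform(X)            # standardize before penalizing
lasso = LassoCV(cv=5, random_state=0).fit(X_std, y)

# Rank variables by absolute coefficient; variables shrunk to zero are dropped.
for name, coef in sorted(zip(features, lasso.coef_), key=lambda t: -abs(t[1])):
    print(f"{name:>17}: {coef:+.3f}")
```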

Collapse Probability of a Low-rise Piloti-type Building Considering Domestic Seismic Hazard (국내 지진재해도를 고려한 저층 필로티 건물의 붕괴 확률)

  • Kim, Dae-Hwan;Kim, Taewan;Chu, Yurim
    • Journal of the Earthquake Engineering Society of Korea / v.20 no.7_spc / pp.485-494 / 2016
  • Risk-based assessment, also called time-based assessment, is usually performed to evaluate the seismic risk of a target structure over its entire life cycle, e.g., 50 years. The predicted collapse probability is the key estimator in this assessment. Although risk-based assessment is central to performance-based earthquake engineering, its application is limited because it is very expensive in terms of simulation and computational effort, so evaluation databases of archetype structures usually serve as representatives of specific systems. However, no such assessment has been performed for the building stock in Korea. Consequently, the performance objective of the current building code (KBC) is not quantitatively clear, which leaves unresolved issues for the insurance industry, socio-economic impact assessment, and seismic safety policy at the national and local levels. In this study, we evaluate the seismic performance of low-rise residential buildings with discontinuous structural walls, the so-called piloti-type structure commonly found in the domestic low-rise building stock. The collapse probability is obtained from the risk integral of a conditional collapse capacity function and a regression of the current hazard curve. This approach is expected to provide a robust tool for seismic safety policy as well as for seismic risk analyses such as the Probable Maximum Loss (PML) commonly used in the insurance industry.
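
The risk-integral calculation can be sketched as below; the lognormal fragility, power-law hazard fit, and parameter values are illustrative assumptions rather than the paper's results.

```python
# Hypothetical illustration: mean annual collapse rate from the risk integral of
# a collapse fragility curve against a fitted hazard curve, then a 50-year probability.
import numpy as np
from scipy.integrate import trapezoid
from scipy.stats import lognorm

median_sa, beta = 1.2, 0.5    # assumed lognormal fragility: median capacity (g), dispersion
k0, k = 1e-4, 2.5             # assumed hazard regression: lambda(Sa) = k0 * Sa**(-k)

sa = np.linspace(0.01, 5.0, 2000)                      # intensity grid (Sa, g)
p_collapse = lognorm.cdf(sa, s=beta, scale=median_sa)  # P(collapse | Sa)
hazard = k0 * sa ** (-k)                               # annual exceedance rate of Sa
dlam = -np.gradient(hazard, sa)                        # |d lambda / d Sa|

lambda_collapse = trapezoid(p_collapse * dlam, sa)     # risk integral
p50 = 1 - np.exp(-lambda_collapse * 50)                # 50-year probability (Poisson assumption)
print(f"annual collapse rate: {lambda_collapse:.2e}, 50-year collapse probability: {p50:.2%}")
```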

A comparison on coefficient estimation methods in single index models (단일지표모형에서 계수 추정방법의 비교)

  • Choi, Young-Woong;Kang, Kee-Hoon
    • Journal of the Korean Data and Information Science Society / v.21 no.6 / pp.1171-1180 / 2010
  • It is well known that the asymptotic convergence rate of nonparametric regression estimators deteriorates as the dimension of the covariates grows. One way to overcome this problem is to reduce the dimension of the covariates by using single index models. Two coefficient estimation methods for single index models are introduced. One is the semiparametric least squares estimation method, which finds an approximate solution by iterative computation. The other is the weighted average derivative estimation method, which is non-iterative. Both methods achieve the parametric rate of convergence to a normal distribution; however, a practical comparison of the two has not yet been made. In this article, we compare the methods by examining the variances of the estimators in various models.
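
As a rough illustration of the average-derivative idea, the sketch below estimates the index direction from local-linear regression slopes; the model, bandwidth, and (unweighted) averaging are assumptions, not the paper's simulation settings.

```python
# Hypothetical illustration: average derivative estimate of a single index direction.
import numpy as np

rng = np.random.default_rng(0)
n, d = 500, 3
beta_true = np.array([1.0, -0.5, 2.0])
X = rng.normal(size=(n, d))
y = np.tanh(X @ beta_true) + rng.normal(scale=0.1, size=n)   # y = g(X @ beta) + noise

def local_linear_slope(x0, X, y, h=0.8):
    """Slope of a kernel-weighted linear fit at x0 (estimates the gradient of E[y|x])."""
    w = np.exp(-((X - x0) ** 2).sum(axis=1) / (2 * h ** 2))
    Z = np.column_stack([np.ones(len(X)), X - x0])
    A = Z.T @ (Z * w[:, None]) + 1e-8 * np.eye(d + 1)        # small ridge for stability
    coef = np.linalg.solve(A, Z.T @ (w * y))
    return coef[1:]

grads = np.array([local_linear_slope(x0, X, y) for x0 in X])
delta = grads.mean(axis=0)             # average derivative is proportional to beta
print("estimated direction:", delta / np.linalg.norm(delta))
print("true direction:     ", beta_true / np.linalg.norm(beta_true))
```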

Improving Estimation Ability of Software Development Effort Using Principle Component Analysis (주성분분석을 이용한 소프트웨어 개발노력 추정능력 향상)

  • Lee, Sang-Un
    • The KIPS Transactions:PartD / v.9D no.1 / pp.75-80 / 2002
  • Putnam developed the SLIM (Software LIfecycle Management) model on the assumption that manpower utilization during software project development follows a Rayleigh distribution. To obtain the manpower distribution, the total development effort and the difficulty ratio parameter have to be estimated, and a way is needed to estimate these parameters accurately early in the requirements and specification phase, before investment decisions have to be made. Statistical tests show that the system attributes are highly correlated (redundant), so Putnam discards one and obtains a parameter estimator from the other attributes. However, different statistical methods select different system attributes and show different performance. To select the principal system attributes, this paper uses principal component analysis (PCA) instead of Putnam's method. The PCA-based results improve performance by 9.85 percent over Putnam's result. The model is also simple and easy to implement.
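
A minimal sketch of the PCA-then-regression idea follows; the attribute data and effort values are synthetic stand-ins, not the paper's dataset.

```python
# Hypothetical illustration: PCA on correlated system attributes, then regression
# of development effort on the leading principal components.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
base = rng.normal(size=(60, 1))
attrs = np.hstack([base + 0.1 * rng.normal(size=(60, 1)) for _ in range(4)])  # highly correlated attributes
effort = 5.0 * base[:, 0] + rng.normal(scale=0.5, size=60)                    # toy effort (person-months)

X_std = StandardScaler().fit_transform(attrs)
pca = PCA(n_components=2).fit(X_std)
scores = pca.transform(X_std)                                  # principal component scores
print("explained variance ratio:", pca.explained_variance_ratio_)

model = LinearRegression().fit(scores, effort)
print("R^2 of effort on PCA scores:", round(model.score(scores, effort), 3))
```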