• Title/Summary/Keyword: Spatial regression modelling


Regression-based algorithms for exploring the relationships in a cement raw material quarry

  • Tutmez, Bulent;Dag, Ahmet
    • Computers and Concrete / v.10 no.5 / pp.457-467 / 2012
  • Using appropriate raw materials is crucial for producing the required cement products. Monitoring relationships and analyzing distributions in a cement raw material quarry are therefore important stages of the process. CaO, one of the principal chemical components, is present in raw materials such as limestone and marl, so a spatial assessment of this component is also critical. In this study, the spatial evaluation and monitoring of CaO concentrations at a cement site are considered. For this purpose, two regression-based models were applied to a cement quarry located in Turkey; several spatial models were developed and their performance was compared. The results show that regression-based spatial modelling is an efficient methodology that can be used to evaluate spatially varying relationships in a cement quarry.
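
The abstract does not name the two regression-based models. As an illustration only, a minimal sketch of geographically weighted regression (GWR), one common way to let a regression relationship vary across space, could look like the following. It assumes a Gaussian distance kernel with a hand-picked bandwidth; the sample locations, the single predictor and the function name `gwr_coefficients` are hypothetical.

```python
import numpy as np


def gwr_coefficients(coords, X, y, bandwidth):
    """Local weighted least-squares fit at every sample location.

    coords: (n, 2) easting/northing, X: (n, p) predictors, y: (n,) response.
    Returns an (n, p + 1) array of local coefficients, intercept first.
    """
    n = len(y)
    A = np.column_stack([np.ones(n), X])  # design matrix with intercept
    betas = np.empty((n, A.shape[1]))
    for i in range(n):
        d = np.linalg.norm(coords - coords[i], axis=1)
        w = np.exp(-0.5 * (d / bandwidth) ** 2)  # Gaussian kernel weights
        AtW = A.T * w  # same as A.T @ diag(w)
        betas[i] = np.linalg.solve(AtW @ A, AtW @ y)  # weighted normal equations
    return betas


# Hypothetical synthetic data: the slope of CaO on one predictor drifts
# from about 1.0 in the west to about 2.0 in the east of the quarry.
rng = np.random.default_rng(0)
coords = rng.uniform(0, 1000, size=(200, 2))  # sample locations in metres
x = rng.normal(size=200)  # hypothetical standardized predictor
true_slope = 1.0 + coords[:, 0] / 1000
cao = 50 + true_slope * x + rng.normal(scale=0.1, size=200)  # CaO (%)

betas = gwr_coefficients(coords, x[:, None], cao, bandwidth=150.0)
west = betas[coords[:, 0] < 200, 1].mean()  # mean local slope, west side
east = betas[coords[:, 0] > 800, 1].mean()  # mean local slope, east side
```

In practice the bandwidth would usually be chosen by cross-validation rather than fixed by hand.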

Threshold Modelling of Spatial Extremes - Summer Rainfall of Korea (공간 극단값의 분계점 모형 사례 연구 - 한국 여름철 강수량)

  • Hwang, Seungyong;Choi, Hyemi
    • The Korean Journal of Applied Statistics / v.27 no.4 / pp.655-665 / 2014
  • Natural hazards such as heat waves, heavy rainfall and severe droughts need to be adequately understood and responded to. We apply extreme value theory to analyze these abnormal weather phenomena. Extremes in climatic data are commonly nonstationary in space and time. In this paper, we analyze summer rainfall data in South Korea using exceedances over thresholds estimated by quantile regression with location information and time as covariates. We group weather stations in South Korea into 5 clusters and fit extreme value models to the threshold exceedances of each cluster under the assumption of independence in space and time, and we estimate the uncertainty due to spatial dependence as proposed in Northrop and Jonathan (2011).
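
A minimal sketch of the threshold step, assuming a linear 0.9-quantile regression on hypothetical standardized latitude and year covariates, solved as a linear program with SciPy's `linprog`, followed by a generalized Pareto fit to the exceedances with `scipy.stats.genpareto` under the independence assumption. The station clustering and the Northrop and Jonathan (2011) uncertainty adjustment are not shown, and the rainfall data are synthetic.

```python
import numpy as np
from scipy.optimize import linprog
from scipy.stats import genpareto


def quantile_regression(X, y, tau):
    """Linear quantile regression posed as a linear program.

    Minimizes sum(tau * u_pos + (1 - tau) * u_neg) subject to
    X @ beta + u_pos - u_neg = y, u_pos >= 0, u_neg >= 0, beta free.
    """
    n, p = X.shape
    c = np.concatenate([np.zeros(p), np.full(n, tau), np.full(n, 1 - tau)])
    A_eq = np.hstack([X, np.eye(n), -np.eye(n)])
    bounds = [(None, None)] * p + [(0, None)] * (2 * n)
    res = linprog(c, A_eq=A_eq, b_eq=y, bounds=bounds, method="highs")
    return res.x[:p]


# Synthetic daily summer rainfall (mm) at hypothetical station-days,
# with standardized latitude and year as covariates.
rng = np.random.default_rng(1)
n = 500
lat = rng.normal(size=n)
year = rng.normal(size=n)
rain = rng.gamma(shape=0.8, scale=10 * np.exp(0.3 * lat + 0.1 * year))

X = np.column_stack([np.ones(n), lat, year])
beta = quantile_regression(X, rain, tau=0.9)  # covariate-dependent threshold
resid = rain - X @ beta
exceed = resid[resid > 1e-8]  # exceedances over the fitted threshold
frac = exceed.size / n  # should be close to 1 - tau

# Generalized Pareto fit to the exceedances with location fixed at 0,
# treating exceedances as independent in space and time.
shape, _, scale = genpareto.fit(exceed, floc=0)
```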

Application of a Statistical Interpolation Method to Correct Extreme Values in High-Resolution Gridded Climate Variables (고해상도 격자 기후자료 내 이상 기후변수 수정을 위한 통계적 보간법 적용)

  • Jeong, Yeo min;Eum, Hyung-Il
    • Journal of Climate Change Research / v.6 no.4 / pp.331-344 / 2015
  • A long-term gridded historical dataset at 3 km spatial resolution has been generated for practical regional applications such as hydrologic modelling. However, overly high or low values were found at some grid points where the topography is complex or the observational network is sparse. In this study, the Inverse Distance Weighting (IDW) method was applied to smooth the overpredicted values of the Improved GIS-based Regression Model (IGISRM), producing grid data at the same resolution, called IDW-IGISRM, for daily precipitation, maximum temperature and minimum temperature from 2001 to 2010 over South Korea. Various effective distances were tested in the IDW method to identify the optimal distance, i.e., the one giving the highest performance. IDW-IGISRM was compared with IGISRM in terms of spatial patterns and quantitative performance metrics at 243 AWS observation points and at four selected stations showing the largest biases. Regarding spatial patterns, IDW-IGISRM reduced unrealistic overpredicted values, producing smoother spatial maps than IGISRM for all variables. In addition, IDW-IGISRM improved all quantitative performance metrics: the correlation coefficient (CC) and Index of Agreement (IOA) increased by up to 11.2% and 2.0%, respectively, and the Mean Absolute Error (MAE) and Root Mean Square Error (RMSE) decreased by up to 5.4% and 15.2%, respectively. The improvement was even larger at the four selected stations. These results indicate that IDW-IGISRM can improve the predictive performance of IGISRM, providing more reliable high-resolution gridded data for assessment, adaptation and vulnerability studies of climate change impacts.
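
The IDW smoothing and the verification metrics could be sketched as follows. This is not the authors' code: the rule used to flag overpredicted cells, the 9 km effective distance and the power of 2 are hypothetical choices (the abstract only says several effective distances were tested), and IOA is assumed to be Willmott's index of agreement.

```python
import numpy as np


def idw(xy_known, values, xy_target, power=2.0, max_dist=None):
    """Inverse distance weighting; known points beyond max_dist are ignored.

    Targets are assumed not to coincide with known points (distance > 0).
    Returns NaN for targets with no known point within max_dist.
    """
    d = np.linalg.norm(xy_target[:, None, :] - xy_known[None, :, :], axis=2)
    w = 1.0 / d ** power
    if max_dist is not None:
        w = np.where(d <= max_dist, w, 0.0)  # effective-distance cutoff
    with np.errstate(invalid="ignore"):
        return (w @ values) / w.sum(axis=1)  # 0/0 -> NaN if nothing in range


def smooth_flagged(grid, flagged, cell_km=3.0, max_dist_km=9.0, power=2.0):
    """Replace flagged grid cells by IDW of unflagged cells within max_dist_km."""
    rows, cols = np.indices(grid.shape)
    xy = np.column_stack([rows.ravel(), cols.ravel()]) * cell_km
    flat = grid.ravel().astype(float)
    flag = flagged.ravel()
    out = flat.copy()
    out[flag] = idw(xy[~flag], flat[~flag], xy[flag], power, max_dist_km)
    return out.reshape(grid.shape)


def metrics(pred, obs):
    """CC, Willmott's index of agreement (IOA), MAE and RMSE."""
    err = pred - obs
    o_bar = obs.mean()
    ioa = 1 - np.sum(err ** 2) / np.sum((np.abs(pred - o_bar) + np.abs(obs - o_bar)) ** 2)
    return {
        "CC": np.corrcoef(pred, obs)[0, 1],
        "IOA": ioa,
        "MAE": np.mean(np.abs(err)),
        "RMSE": np.sqrt(np.mean(err ** 2)),
    }


# Hypothetical 3 km grid of daily maximum temperature with one spurious spike.
grid = np.full((7, 7), 25.0)
grid[3, 3] = 45.0
flagged = np.abs(grid - np.median(grid)) > 10  # hypothetical outlier rule
smoothed = smooth_flagged(grid, flagged)
```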

Optimizing Clustering and Predictive Modelling for 3-D Road Network Analysis Using Explainable AI

  • Rotsnarani Sethy;Soumya Ranjan Mahanta;Mrutyunjaya Panda
    • International Journal of Computer Science & Network Security / v.24 no.9 / pp.30-40 / 2024
  • Building an accurate 3-D spatial road network model has become an active area of research and is regarded as a new paradigm for developing smart roads and intelligent transportation systems (ITS), which can help public and private road operators improve road mobility and eco-routing, leading to better traffic flow, lower carbon emissions and safer roads. Handling such large-scale 3-D road network data makes it challenging to obtain accurate elevation information for a road network, which is needed to estimate CO2 emissions and to route vehicles accurately in Internet of Vehicles (IoV) scenarios. Clustering and regression techniques have been found suitable for recovering missing elevation information at some points of a 3-D spatial road network dataset, which is expected to give the public a better eco-routing experience. Recently, Explainable Artificial Intelligence (xAI) has also drawn researchers' attention because it makes models more interpretable, transparent and comprehensible, enabling efficient model choices that depend on user requirements. The 3-D road network dataset, comprising the spatial attributes (longitude, latitude, altitude) of North Jutland, Denmark, and collected from the publicly available UCI repository, is preprocessed through feature engineering and scaling to ensure optimal accuracy for the clustering and regression tasks. K-Means clustering and Support Vector Machine (SVM) regression with a radial basis function (RBF) kernel are employed for the 3-D road network analysis. Silhouette scores and the number of clusters are used to measure cluster quality, whereas error metrics such as MAE (Mean Absolute Error) and RMSE (Root Mean Square Error) are used to evaluate the regression method. To improve the interpretability of the clustering and regression models, SHAP (Shapley Additive Explanations), a powerful xAI technique, is employed in this research.
From extensive experiments, it is observed that SHAP analysis validated the importance of latitude and altitude in predicting longitude, particularly in the four-cluster setup, providing critical insights into model behavior and feature contributions, with an accuracy of 97.22% and strong performance metrics across all classes (MAE of 0.0346 and MSE of 0.0018). The ten-cluster setup, on the other hand, was faster in SHAP analysis but harder to interpret because of its greater clustering complexity. Hence, the hybrid of K-Means clustering with K=4 and SVM demonstrated superior performance and interpretability, highlighting the importance of careful cluster selection to balance model complexity and predictive accuracy.
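
A dependency-free sketch of the cluster-selection step: K-means (Lloyd's algorithm with k-means++ seeding) is run for several values of K and each result is scored by the mean silhouette coefficient. Synthetic, already-scaled blobs stand in for the (longitude, latitude, altitude) data; the SVM regression and SHAP steps are not shown, and all function names are hypothetical.

```python
import numpy as np


def kmeans_pp_init(X, k, rng):
    """k-means++ seeding: sample each new centre with probability ~ D^2."""
    centers = [X[rng.integers(len(X))]]
    for _ in range(1, k):
        d2 = np.min(np.linalg.norm(X[:, None] - np.array(centers)[None], axis=2) ** 2, axis=1)
        centers.append(X[rng.choice(len(X), p=d2 / d2.sum())])
    return np.array(centers)


def kmeans(X, k, n_init=5, n_iter=100, seed=0):
    """Lloyd's algorithm; returns the labels of the lowest-inertia run."""
    rng = np.random.default_rng(seed)
    best_inertia, best_labels = np.inf, None
    for _ in range(n_init):
        centers = kmeans_pp_init(X, k, rng)
        for _ in range(n_iter):
            labels = np.linalg.norm(X[:, None] - centers[None], axis=2).argmin(axis=1)
            new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
                            for j in range(k)])
            if np.allclose(new, centers):
                break
            centers = new
        labels = np.linalg.norm(X[:, None] - centers[None], axis=2).argmin(axis=1)
        inertia = np.sum((X - centers[labels]) ** 2)
        if inertia < best_inertia:
            best_inertia, best_labels = inertia, labels
    return best_labels


def silhouette(X, labels):
    """Mean silhouette coefficient; points in singleton clusters score 0."""
    D = np.linalg.norm(X[:, None] - X[None], axis=2)
    s = np.zeros(len(X))
    for i in range(len(X)):
        same = labels == labels[i]
        if same.sum() == 1:
            continue
        a = D[i, same].sum() / (same.sum() - 1)  # mean distance, excluding self
        b = min(D[i, labels == c].mean() for c in np.unique(labels) if c != labels[i])
        s[i] = (b - a) / max(a, b)
    return s.mean()


# Synthetic stand-in for scaled (longitude, latitude, altitude): four blobs.
rng = np.random.default_rng(0)
blob_centers = np.array([[0, 0, 0], [3, 0, 0], [0, 3, 0], [0, 0, 3]], dtype=float)
X = np.vstack([c + rng.normal(scale=0.1, size=(50, 3)) for c in blob_centers])

scores = {k: silhouette(X, kmeans(X, k)) for k in range(2, 7)}
best_k = max(scores, key=scores.get)
```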

Heat-Wave Data Analysis based on the Zero-Inflated Regression Models (영-과잉 회귀모형을 활용한 폭염자료분석)

  • Kim, Seong Tae;Park, Man Sik
    • Journal of the Korean Data Analysis Society / v.20 no.6 / pp.2829-2840 / 2018
  • A random variable taking values at or above some boundary value is called semi-continuous or zero-inflated when the boundary value is observed more frequently than expected, that is, more often in practice than a given probability distribution would predict in theory. When the assumed distribution is continuous, the variable is called semi-continuous; when a discrete distribution is assumed, it is called zero-inflated. In this study, we introduce the two-part model, which consists of one part modelling the binary response (whether the boundary value is observed) and another part modelling the values greater than the boundary value. In particular, zero-inflated regression models based on the Poisson and negative binomial distributions are explained. In the real data analysis, we employ the zero-inflated regression models to estimate the number of days under extreme heat-wave conditions during the last 10 years in South Korea. Based on the estimation results, we create prediction maps of the estimated number of days under heat-wave advisories and heat-wave warnings using universal kriging, one of the spatial prediction methods.
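
The zero-inflated Poisson (ZIP) part of this approach could be sketched as a maximum-likelihood fit with SciPy. The single covariate and the simulated day counts are hypothetical; the negative binomial variant and the universal kriging step are not shown.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import expit, gammaln


def zip_negloglik(params, X, Z, y):
    """Mean negative log-likelihood of a ZIP regression.

    Count part: log(lambda_i) = X_i @ beta
    Zero part:  logit(pi_i) = Z_i @ gamma  (pi_i = P(structural zero))
    P(Y=0) = pi + (1 - pi) exp(-lambda);  P(Y=k>0) = (1 - pi) Poisson(k; lambda)
    """
    beta, gamma = params[: X.shape[1]], params[X.shape[1]:]
    lam = np.exp(X @ beta)
    pi = expit(Z @ gamma)
    zero = y == 0
    ll_zero = np.log(pi[zero] + (1 - pi[zero]) * np.exp(-lam[zero]))
    yp, lp = y[~zero], lam[~zero]
    ll_pos = np.log1p(-pi[~zero]) - lp + yp * np.log(lp) - gammaln(yp + 1)
    # Averaging (rather than summing) keeps BFGS step sizes well scaled.
    return -(ll_zero.sum() + ll_pos.sum()) / len(y)


# Synthetic heat-wave day counts driven by one hypothetical standardized
# covariate (e.g. a station's mean summer temperature anomaly).
rng = np.random.default_rng(2)
n = 3000
x = rng.normal(size=n)
X = Z = np.column_stack([np.ones(n), x])
true = np.array([0.5, 0.8, -0.5, -1.0])  # beta0, beta1, gamma0, gamma1
lam = np.exp(X @ true[:2])
structural_zero = rng.uniform(size=n) < expit(Z @ true[2:])
days = np.where(structural_zero, 0, rng.poisson(lam))

fit = minimize(zip_negloglik, x0=np.zeros(4), args=(X, Z, days), method="BFGS")
beta_hat, gamma_hat = fit.x[:2], fit.x[2:]
```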

Spatial Downscaling of Grid Precipitation Using Support Vector Machine Regression (SVM 회귀 모형을 활용한 격자 강우량 상세화 기법)

  • Moon, Heewon;Baik, Jongjin;Hwang, Sukhwan;Choi, Minha
    • Journal of Korea Water Resources Association / v.47 no.11 / pp.1095-1105 / 2014
  • A spatial downscaling method using Support Vector Machine (SVM) regression is proposed for 25 km Tropical Rainfall Measuring Mission (TRMM) monthly precipitation. The SVM effectively captured the nonlinear relationship between hydrometeorological variables and precipitation for predicting downscaled grid precipitation. The accuracy of the spatially downscaled precipitation was assessed against rain gauge data from sixty-four stations and was found to be better overall than that of the original TRMM data; in particular, the positive bias of the original TRMM data was effectively removed by the downscaling procedure. The spatial distributions of the 25 km and 1 km grid precipitation were generally similar, while local spatial trends were better detected in the 1 km grid precipitation. The downscaled grid data derived from the proposed method can be applied in hydrological modelling for higher accuracy, and further studies could develop optimized downscaling methods incorporating other regression methods.
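
A minimal sketch of the downscaling idea under stated assumptions: learn the precipitation-predictor relationship at the coarse scale on block-averaged predictors, then apply it to fine-scale predictors. To stay dependency-free it substitutes RBF kernel ridge regression for SVM regression (same kernel, but a squared loss instead of the epsilon-insensitive loss). The predictors (elevation, NDVI), the synthetic fields and the 10:1 aggregation factor are hypothetical.

```python
import numpy as np


def rbf_kernel(A, B, gamma):
    """Gaussian (RBF) kernel matrix exp(-gamma * ||a - b||^2)."""
    sq = np.sum(A ** 2, axis=1)[:, None] + np.sum(B ** 2, axis=1)[None, :] - 2 * A @ B.T
    return np.exp(-gamma * np.maximum(sq, 0.0))


def fit_kernel_ridge(X, y, gamma=0.5, alpha=1e-2):
    """RBF kernel ridge regression; returns a prediction function."""
    coef = np.linalg.solve(rbf_kernel(X, X, gamma) + alpha * np.eye(len(X)), y)
    return lambda X_new: rbf_kernel(X_new, X, gamma) @ coef


def block_mean(a, f):
    """Aggregate a fine grid to a coarse grid by averaging f x f blocks."""
    h, w = a.shape
    return a.reshape(h // f, f, w // f, f).mean(axis=(1, 3))


# Hypothetical 100 x 100 fine grid (1 km cells) with two smooth predictors.
n, f = 100, 10  # aggregation factor 10 (not 25) keeps the example small
yy, xx = np.mgrid[0:n, 0:n].astype(float)
elev = 500 + 300 * np.sin(xx / 30) * np.cos(yy / 40)  # elevation (m)
ndvi = 0.5 + 0.3 * np.cos(xx / 25 + yy / 50)
true_fine = 50 + 30 * ndvi + 0.02 * elev  # synthetic monthly precipitation (mm)
coarse = block_mean(true_fine, f)  # what a coarse satellite product would report

# 1) Train at the coarse scale on block-averaged, standardized predictors.
Xc = np.column_stack([block_mean(elev, f).ravel(), block_mean(ndvi, f).ravel()])
mu, sd, y_mean = Xc.mean(axis=0), Xc.std(axis=0), coarse.mean()
model = fit_kernel_ridge((Xc - mu) / sd, coarse.ravel() - y_mean)

# 2) Apply the coarse-scale relationship to the fine-scale predictors.
Xf = np.column_stack([elev.ravel(), ndvi.ravel()])
downscaled = (model((Xf - mu) / sd) + y_mean).reshape(n, n)
corr = np.corrcoef(downscaled.ravel(), true_fine.ravel())[0, 1]
```

Many downscaling workflows also interpolate the coarse-scale residuals and add them back to the fine-scale prediction; that step is omitted here.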

The Applicability of the Genetic Algorithm on Spatial Distribution of Demographic Characteristics (인구구조 공간분포 특성에 관한 유전자 알고리즘 적용방안)

  • Choei, Nae-Young;Lee, Kyung-Yoon
    • Journal of Korean Society for Geospatial Information Science / v.18 no.3 / pp.49-56 / 2010
  • The Genetic Algorithm is one of the population surface modelling tools used in urban and environmental research based on gridded population data. Taking the East-Hwasung area as a case study, this study first builds a gridded population dataset from GIS databases and municipal population survey data. The study then constructs the attribute values of the explanatory variables using GIS tools. A regression model with the same variables is also run for comparison. The GenAlg output yielded coefficient estimates for the explanatory variables that were as consistent and meaningful as those of the regression model, indicating that it is a very useful interdisciplinary research tool for finding optimal solutions to urban problems.
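
The abstract does not describe the GA configuration. As an illustration, a simple real-coded genetic algorithm can estimate regression coefficients by minimizing the sum of squared errors, and its result can be compared with ordinary least squares, mirroring the study's comparison. The operators (tournament selection, blend crossover, Gaussian mutation, elitism), their parameters, the function name `ga_regression` and the synthetic data are assumptions.

```python
import numpy as np


def ga_regression(X, y, pop_size=60, n_gen=400, mut_scale=0.1, seed=0):
    """Estimate linear regression coefficients with a real-coded GA.

    Fitness is the negative sum of squared errors (SSE).
    Returns [intercept, b1, b2, ...].
    """
    rng = np.random.default_rng(seed)
    A = np.column_stack([np.ones(len(y)), X])
    p = A.shape[1]

    def sse(P):
        # SSE of every candidate coefficient vector (rows of P)
        return np.sum((y[:, None] - A @ P.T) ** 2, axis=0)

    pop = rng.normal(scale=5.0, size=(pop_size, p))  # random initial population
    for _ in range(n_gen):
        fit = -sse(pop)
        elite = pop[np.argmax(fit)].copy()
        # Tournament selection: keep the better of two random individuals
        i, j = rng.integers(pop_size, size=(2, pop_size))
        parents = np.where((fit[i] > fit[j])[:, None], pop[i], pop[j])
        # Blend crossover between neighbouring parents
        w = rng.uniform(size=(pop_size, 1))
        children = w * parents + (1 - w) * np.roll(parents, 1, axis=0)
        children += rng.normal(scale=mut_scale, size=children.shape)  # mutation
        children[0] = elite  # elitism: never lose the best solution
        pop = children
    return pop[np.argmin(sse(pop))]


# Hypothetical explanatory variables and response for gridded cells.
rng_data = np.random.default_rng(3)
X = rng_data.normal(size=(100, 2))
y = 1.0 + 2.0 * X[:, 0] - 1.5 * X[:, 1] + rng_data.normal(scale=0.5, size=100)

beta_ga = ga_regression(X, y)
beta_ols, *_ = np.linalg.lstsq(np.column_stack([np.ones(100), X]), y, rcond=None)
```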

Analysis of Spatio-Temporal Patterns of Nighttime Light Brightness of Seoul Metropolitan Area using VIIRS-DNB Data (VIIRS-DNB 데이터를 이용한 수도권 야간 빛 강도의 시·공간 패턴 분석)

  • Zhu, Lei;Cho, Daeheon;Lee, Soyoung
    • Journal of Cadastre & Land InformatiX / v.47 no.2 / pp.19-37 / 2017
  • Visible Infrared Imaging Radiometer Suite Day-Night Band (VIIRS-DNB) data provide a much greater capability for observing and quantifying nighttime light (NTL) brightness than Defense Meteorological Satellite Program Operational Linescan System (DMSP-OLS) data. In South Korea, there is little research on detecting NTL brightness change using VIIRS-DNB data. This study analyzed the spatial distribution and change of NTL brightness between 2013 and 2016 using VIIRS-DNB data and examined its spatial relationship with possible influencing factors using regression models. The intra-year seasonality of NTL brightness in 2016 was also studied by analyzing the deviation and change clusters, as well as the influencing factors. The results are as follows: 1) High NTL brightness values in 2013 and 2016 are concentrated in Seoul and its surrounding cities and are positively correlated with population density, residential areas, economic land use and other factors; 2) NTL brightness decreased from 2013 to 2016, most obviously in Seoul, with changes in population density and in the area of industrial buildings as the main influencing factors; 3) Seoul and some surrounding areas show a high deviation of intra-year NTL brightness, and 71% of the total area has its highest NTL brightness in January, February, October, November or December; and 4) the change in NTL brightness between summer and winter showed a significantly positive relationship with snow cover area change and a slight but significant negative relationship with albedo change.
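
As an assumption-laden sketch (the abstract does not give the regression form), per-pixel brightness change could be summarized by a least-squares linear trend over annual composites and then regressed on candidate influencing factors. The covariates, grid size and function names are hypothetical, and the synthetic data are noise-free, so trends and coefficients are recovered up to rounding.

```python
import numpy as np


def pixel_trends(stack, years):
    """Least-squares slope of brightness against year for every pixel.

    stack: (t, h, w) annual NTL composites; years: (t,) float years.
    """
    t = years - years.mean()
    return np.tensordot(t, stack - stack.mean(axis=0), axes=(0, 0)) / np.sum(t ** 2)


def ols(X, y):
    """Ordinary least squares with an intercept; returns [b0, b1, ...]."""
    A = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef


# Hypothetical 20 x 20 pixel grid with two standardized change covariates.
rng = np.random.default_rng(6)
years = np.arange(2013, 2017, dtype=float)
pop_change = rng.normal(size=(20, 20))
industrial_change = rng.normal(size=(20, 20))
true_trend = -0.5 + 0.8 * pop_change + 0.3 * industrial_change  # radiance per year
base = rng.uniform(5, 50, size=(20, 20))
stack = base + true_trend * (years - 2013)[:, None, None]  # (4, 20, 20)

trend = pixel_trends(stack, years)
coef = ols(np.column_stack([pop_change.ravel(), industrial_change.ravel()]), trend.ravel())
```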

A Study on the Development of Model for Estimating the Thickness of Clay Layer of Soft Ground in the Nakdong River Estuary (낙동강 조간대 연약지반의 지역별 점성토층 두께 추정 모델 개발에 관한 연구)

  • Ahn, Seongin;Ryu, Dong-Woo
    • Tunnel and Underground Space / v.32 no.6 / pp.586-597 / 2022
  • In this study, a model was developed for estimating the location-specific thickness of the upper clay layer, to be used in evaluating consolidation vulnerability in the Nakdong River estuary. To estimate layer thickness, we developed four spatial estimation models: three machine learning models, RF (Random Forest), SVR (Support Vector Regression) and GPR (Gaussian Process Regression), and one geostatistical technique, Ordinary Kriging. Of the 4,712 borehole records collected in the study area for model development, the 2,948 records with an upper clay layer were used, and the Pearson correlation coefficient and mean squared error were used to evaluate the performance of the models quantitatively. In addition, for a qualitative evaluation, each model was applied across the whole study area to estimate the upper clay layer, and the resulting thickness distributions were compared with each other.
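
A minimal ordinary kriging sketch with leave-one-out evaluation by Pearson r and MSE. The spherical variogram parameters are hypothetical (in practice they would be fitted to the experimental variogram), the leave-one-out scheme is an assumption about how such metrics could be computed, the borehole data are synthetic, and the RF, SVR and GPR models are not shown.

```python
import numpy as np


def spherical(h, nugget, sill, vrange):
    """Spherical variogram; gamma(0) = 0 and gamma(h >= vrange) = sill."""
    h = np.asarray(h, dtype=float)
    r = np.minimum(h / vrange, 1.0)
    g = nugget + (sill - nugget) * (1.5 * r - 0.5 * r ** 3)
    return np.where(h == 0, 0.0, g)


def ordinary_kriging(xy, z, xy0, nugget=0.0, sill=1.0, vrange=500.0):
    """Ordinary kriging predictions at xy0 from data (xy, z)."""
    n = len(z)
    K = np.ones((n + 1, n + 1))
    K[:n, :n] = spherical(np.linalg.norm(xy[:, None] - xy[None], axis=2), nugget, sill, vrange)
    K[n, n] = 0.0  # Lagrange-multiplier row/column enforce sum(weights) == 1
    rhs = np.ones((n + 1, len(xy0)))
    rhs[:n] = spherical(np.linalg.norm(xy[:, None] - xy0[None], axis=2), nugget, sill, vrange)
    w = np.linalg.solve(K, rhs)  # one column of weights per target
    return w[:n].T @ z


# Hypothetical borehole locations (m) and clay-layer thickness (m).
rng = np.random.default_rng(4)
xy = rng.uniform(0, 1000, size=(60, 2))
thick = (10 + 5 * np.sin(xy[:, 0] / 300) + 5 * np.cos(xy[:, 1] / 400)
         + rng.normal(scale=0.5, size=60))

# Leave-one-out predictions, then Pearson r and MSE.
loo = np.array([
    ordinary_kriging(np.delete(xy, i, axis=0), np.delete(thick, i), xy[i:i + 1])[0]
    for i in range(len(thick))
])
pearson_r = np.corrcoef(loo, thick)[0, 1]
mse = np.mean((loo - thick) ** 2)
```

With a zero nugget, ordinary kriging is an exact interpolator: predicting at a borehole location returns that borehole's value.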

Assessment through Statistical Methods of Water Quality Parameters(WQPs) in the Han River in Korea

  • Kim, Jae Hyoun
    • Journal of Environmental Health Sciences / v.41 no.2 / pp.90-101 / 2015
  • Objective: This study was conducted to develop a chemical oxygen demand (COD) regression model using water quality monitoring data (January 2014) obtained from the Han River automatic monitoring stations. Methods: Surface water quality data from 198 sampling stations in the six major areas were assembled and analyzed to determine the spatial distribution and clustering of the monitoring stations based on 18 WQPs, and to build a regression model from selected parameters. Statistical techniques, including a combined genetic algorithm-multiple linear regression (GA-MLR), cluster analysis (CA) and principal component analysis (PCA), were used to build a COD model from the water quality data. Results: The best GA-MLR model, a five-descriptor COD model, gave satisfactory statistics ($r^2 = 92.64\%$, $Q^2_{LOO} = 91.45\%$, $Q^2_{Ext} = 88.17\%$). This approach includes variable selection among the WQPs in order to find the most important factors affecting water quality. Additionally, ordination techniques such as PCA and CA were used to classify the monitoring stations. The biplot of the first two principal components (PCs) of the PCA model identified three distinct groups of stations that also differ in their correlation with the WQPs, which enables a better interpretation of the water quality characteristics at particular stations as of January 2014. Conclusion: This data analysis procedure appears to provide an efficient means of modelling water quality by interpreting and defining its most essential variables, such as TOC and BOD. The water quality parameters selected in the COD model as the most important contributors to environmental health and water pollution can be used in water quality management strategies. At present, the river is threatened by anthropogenic disturbances during festival periods, especially in upstream areas.
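
For a chosen descriptor subset, $r^2$ and the leave-one-out $Q^2$ can be computed without refitting by using the PRESS shortcut $e_i/(1-h_{ii})$, where $h_{ii}$ are the hat-matrix leverages. The sketch below assumes $Q^2_{LOO} = 1 - \mathrm{PRESS}/\mathrm{TSS}$, uses synthetic data with hypothetical descriptors, cross-checks the shortcut against an explicit loop, and does not show the GA variable selection itself.

```python
import numpy as np


def mlr_stats(X, y):
    """Fit MLR by least squares; return coefficients, r^2 and Q^2_LOO."""
    A = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    tss = np.sum((y - y.mean()) ** 2)
    r2 = 1 - np.sum(resid ** 2) / tss
    # Leverages from the hat matrix H = A (A^T A)^-1 A^T = A pinv(A)
    h = np.diag(A @ np.linalg.pinv(A))
    press = np.sum((resid / (1 - h)) ** 2)  # leave-one-out residuals, no refits
    return coef, r2, 1 - press / tss


# Hypothetical five selected descriptors and COD values for 40 samples.
rng = np.random.default_rng(5)
X = rng.normal(size=(40, 5))
cod = X @ np.array([1.0, 2.0, 0.0, 0.5, -1.0]) + rng.normal(size=40)
coef, r2, q2_loo = mlr_stats(X, cod)

# Cross-check the PRESS shortcut against an explicit leave-one-out loop.
press = 0.0
for i in range(len(cod)):
    keep = np.arange(len(cod)) != i
    A = np.column_stack([np.ones(keep.sum()), X[keep]])
    b, *_ = np.linalg.lstsq(A, cod[keep], rcond=None)
    press += (cod[i] - np.concatenate([[1.0], X[i]]) @ b) ** 2
q2_explicit = 1 - press / np.sum((cod - cod.mean()) ** 2)
```

Because each PRESS residual is at least as large as the ordinary residual, $Q^2_{LOO}$ can never exceed $r^2$.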