• Title/Summary/Keyword: modified regression model

Comparative Application of Various Machine Learning Techniques for Lithology Predictions (다양한 기계학습 기법의 암상예측 적용성 비교 분석)

  • Jeong, Jina;Park, Eungyu
    • Journal of Soil and Groundwater Environment / v.21 no.3 / pp.21-34 / 2016
  • In the present study, we comparatively applied various machine learning techniques to predict subsurface structures from multiple sources of secondary information (i.e., well-logging data). The machine learning techniques employed are Naive Bayes classification (NB), artificial neural network (ANN), support vector machine (SVM), and logistic regression classification (LR). As alternative models, a conventional hidden Markov model (HMM) and a modified hidden Markov model (mHMM) are used, in which information on the transition probabilities between primary properties is additionally incorporated into the predictions. For the comparisons, 16 boreholes consisting of four different materials are synthesized, which show directional non-stationarity in the upward and downward directions. Furthermore, two types of secondary information statistically related to each material are generated. The comparative analysis across various case studies shows that the accuracies of the techniques degrade with the inclusion of additive errors and with small amounts of training data. Among the HMM predictions, the conventional HMM shows accuracies similar to those of the models that do not rely on transition probabilities. However, the mHMM consistently shows the highest prediction accuracy among the test cases, which can be attributed to its consideration of geological nature in model training.
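
To illustrate how transition probabilities between lithologies enter an HMM prediction, here is a minimal sketch of standard Viterbi decoding. It is not the authors' mHMM: it assumes well-log responses have already been binned into discrete classes, and the two lithologies, the "low"/"high" log classes, and every probability below are hypothetical.

```python
import math


def viterbi(obs, states, start_p, trans_p, emit_p):
    """Most probable hidden-state sequence for a list of observations.

    Works in log space to avoid underflow on long boreholes.
    All probabilities must be strictly positive.
    """
    # best[i][s]: best log-probability of any path ending in state s at depth step i
    best = [{s: math.log(start_p[s]) + math.log(emit_p[s][obs[0]]) for s in states}]
    back = [{}]
    for i in range(1, len(obs)):
        best.append({})
        back.append({})
        for s in states:
            # pick the predecessor state that maximizes the path score into s
            prev, score = max(
                ((r, best[i - 1][r] + math.log(trans_p[r][s])) for r in states),
                key=lambda pair: pair[1],
            )
            best[i][s] = score + math.log(emit_p[s][obs[i]])
            back[i][s] = prev
    # trace the back-pointers from the best final state
    path = [max(states, key=lambda s: best[-1][s])]
    for i in range(len(obs) - 1, 0, -1):
        path.append(back[i][path[-1]])
    return path[::-1]


# Hypothetical two-lithology example (the paper uses four materials)
states = ["sand", "clay"]
start_p = {"sand": 0.5, "clay": 0.5}
trans_p = {"sand": {"sand": 0.8, "clay": 0.2}, "clay": {"sand": 0.4, "clay": 0.6}}
emit_p = {"sand": {"low": 0.7, "high": 0.3}, "clay": {"low": 0.2, "high": 0.8}}
obs = ["low", "low", "high", "high"]  # hypothetical binned log responses, top to bottom
print(viterbi(obs, states, start_p, trans_p, emit_p))  # ['sand', 'sand', 'clay', 'clay']
```

Because the transition matrix is applied in the stated depth order, a model with different upward and downward transition statistics would need a different `trans_p` for each direction; that variant is not shown here.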

An Improved Machine Learning-Based Short Message Service Spam Detection System

  • Odukoya Oluwatoyin;Akinyemi Bodunde;Gooding Titus;Aderounmu Ganiyu
    • International Journal of Computer Science & Network Security / v.24 no.10 / pp.182-190 / 2024
  • The use of Short Message Service (SMS) as a communication mechanism has resulted in the loss of sensitive information such as credit card details, medical information, and bank account details (user name and password). Several machine learning-based approaches have been proposed to address this problem, but they still cannot accurately detect modified SMS spam messages. Thus, in this research, a stacked ensemble of four machine learning algorithms, namely Random Forest (RF), Logistic Regression (LR), Multilayer Perceptron (MLP), and Support Vector Machine (SVM), was employed to detect SMS spam more accurately. The simulation was carried out using the Python scikit-learn library. The proposed model was evaluated by benchmarking it against an existing model. Relative to the existing model, the proposed model increased accuracy by 3.03%, recall by 8.94%, and F-measure by 2.17%, while precision decreased by 4.55%. In conclusion, the ensemble method performed better than any of the individual algorithms and can be adopted by network service providers for better Quality of Service.
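
The reported result (recall up, precision down, F-measure and accuracy up) follows from how these metrics are computed from a binary confusion matrix. The sketch below uses the standard definitions; the confusion counts in the example are hypothetical and are not taken from the paper.

```python
def classification_metrics(tp, fp, fn, tn):
    """Accuracy, precision, recall and F-measure from binary confusion counts."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp)  # share of messages flagged as spam that really are spam
    recall = tp / (tp + fn)     # share of actual spam messages that were flagged
    # F-measure (F1) is the harmonic mean of precision and recall
    f_measure = 2 * precision * recall / (precision + recall)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f_measure": f_measure}


# Hypothetical counts with spam as the positive class
print(classification_metrics(tp=90, fp=10, fn=30, tn=70))
```

A model that flags more messages as spam tends to raise recall while lowering precision, so both numbers are worth reporting alongside the F-measure.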

Development of a Model for Calculating the Construction Duration of Urban Residential Housing Based on Multiple Regression Analysis (다중 회귀분석 기반 도시형 생활주택의 공사기간 산정 모델 개발)

  • Kim, Jun-Sang;Kim, Young Suk
    • Land and Housing Review / v.12 no.4 / pp.93-101 / 2021
  • As the number of small households (one or two persons per household) in Korea gradually increases, so does the importance of housing supply policies for them. In response, the government has been continuously supplying urban housing for small households. Because housing for small households is a sales and rental business similar to apartments and general business facilities, it is important for the building owner to estimate the project's construction duration at the planning stage. A literature review found a model for estimating the construction duration of large-scale buildings but not of small-scale buildings such as urban housing for small households. Therefore, this study aimed to develop and verify a model, based on multiple regression analysis, for estimating the construction duration of urban housing at the planning stage. The independent variables of the model were building site area, gross floor area, number of below-ground floors, number of above-ground floors, number of buildings, and location. The adjusted coefficient of determination (Ra²) of the model was 0.547. The developed model had a root mean square error (RMSE) of 171.26 days and a mean absolute percentage error (MAPE) of 26.53%. The model is expected to provide reliable construction duration estimates for small-scale urban residential buildings at the planning stage of a project.
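
The three fit statistics quoted above can be computed as follows. This is a sketch using the standard formulas; the sample size, predictor count, and the durations in the example are hypothetical (the number of predictors k may exceed six if location is coded with several dummy variables).

```python
import math


def adjusted_r2(r2, n, k):
    """Adjusted R^2 for n observations and k predictors (intercept not counted)."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)


def rmse(actual, predicted):
    """Root mean square error, in the units of the data (e.g. days)."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def mape(actual, predicted):
    """Mean absolute percentage error in percent; actual values must be nonzero."""
    return 100 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical construction durations in days
actual = [300, 400, 500]
predicted = [330, 380, 450]
print(rmse(actual, predicted), mape(actual, predicted))
print(adjusted_r2(0.6, n=50, k=6))  # hypothetical R^2, n and k
```

Adjusted R² penalizes plain R² for each added predictor, which is why it is the usual figure reported for a multiple regression model.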

Feasibility of a Clinical-Radiomics Model to Predict the Outcomes of Acute Ischemic Stroke

  • Yiran Zhou;Di Wu;Su Yan;Yan Xie;Shun Zhang;Wenzhi Lv;Yuanyuan Qin;Yufei Liu;Chengxia Liu;Jun Lu;Jia Li;Hongquan Zhu;Weiyin Vivian Liu;Huan Liu;Guiling Zhang;Wenzhen Zhu
    • Korean Journal of Radiology / v.23 no.8 / pp.811-820 / 2022
  • Objective: To develop a model incorporating radiomic features and clinical factors to accurately predict acute ischemic stroke (AIS) outcomes. Materials and Methods: Data from 522 AIS patients (382 male [73.2%]; mean age ± standard deviation, 58.9 ± 11.5 years) were randomly divided into training (n = 311) and validation (n = 211) cohorts. According to the modified Rankin Scale (mRS) score at 6 months after hospital discharge, prognosis was dichotomized into good (mRS ≤ 2) and poor (mRS > 2). A total of 1310 radiomics features were extracted from diffusion-weighted imaging and apparent diffusion coefficient maps. The minimum redundancy maximum relevance algorithm and least absolute shrinkage and selection operator (LASSO) logistic regression were used to select the features and establish a radiomics model. Univariable and multivariable logistic regression analyses were performed to identify the clinical factors and construct a clinical model. Finally, a multivariable logistic regression analysis incorporating the independent clinical factors and the radiomics score was used to establish the final combined prediction model through a backward step-down selection procedure, and a clinical-radiomics nomogram was developed. The models were evaluated using calibration, receiver operating characteristic (ROC), and decision curve analyses. Results: Age, sex, stroke history, diabetes, baseline mRS, baseline National Institutes of Health Stroke Scale score, and radiomics score were independent predictors of AIS outcomes. The area under the ROC curve of the clinical-radiomics model was 0.868 (95% confidence interval, 0.825-0.910) in the training cohort and 0.890 (0.844-0.936) in the validation cohort, significantly larger than that of the clinical or radiomics models. The clinical-radiomics nomogram was well calibrated (p > 0.05), and the decision curve analysis indicated its clinical usefulness.
Conclusion: The clinical-radiomics model outperformed individual clinical or radiomics models and achieved satisfactory performance in predicting AIS outcomes.
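
The area under the ROC curve used to compare the models equals the probability that a randomly chosen poor-outcome patient receives a higher predicted risk than a randomly chosen good-outcome patient (the Mann-Whitney formulation). The sketch below shows the mRS dichotomization and that pairwise AUC computation; all scores and mRS values are hypothetical.

```python
def roc_auc(scores, labels):
    """AUC as P(score of a random positive > score of a random negative).

    Ties count as one half. labels: 1 = poor outcome, 0 = good outcome.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


mrs_6mo = [0, 1, 3, 4, 2, 5]                    # hypothetical 6-month mRS scores
labels = [1 if m > 2 else 0 for m in mrs_6mo]   # poor outcome = mRS > 2
scores = [0.1, 0.4, 0.35, 0.8, 0.2, 0.9]        # hypothetical predicted probabilities
print(roc_auc(scores, labels))  # 0.888... (8 of 9 pairs ranked correctly)
```

The pairwise loop is O(n²) and meant only to make the definition concrete; a rank-based computation scales better for large cohorts.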

Estimation of Soil Depth Using Improved Topographic Attributes in Mountainous Area (개선된 지형학적 속성을 이용한 산악지역의 토심 예측)

  • Shin, Hosung;Bang, Eun-Seok
    • Journal of the Korean Geotechnical Society / v.40 no.6 / pp.125-137 / 2024
  • Soil depth, which results from bedrock weathering, erosion, transport, and deposition, is critical in landslide stability analysis and in the assessment of sediment-related disasters. This study proposes a soil depth prediction model for mountainous regions using multiple linear regression analysis based on topographic attributes. The specific catchment area (SCA), a key indicator in multiple regression models, was originally developed as a hydrological parameter for runoff estimation. For soil depth prediction, however, the initial triggering volume must be adjusted to account for slope failures based on the topographic slope. The SCA is calculated using the infinite flow direction model for flow tracing and the priority-flood algorithm for depression flattening. In addition, a modified contributing area equation is derived by incorporating slope-dependent adjustments of the initial triggering volume, enabling the calculation of an improved SCA applicable to large mountainous regions. Analyses conducted in the Umyeonsan and Dongjak-gu areas of Seoul reveal that slope, the topographic wetness index, and the sediment transport index are suitable independent variables for soil depth prediction. The soil depth prediction equation derived from the multiple linear regression model exhibits no multicollinearity issues and is statistically significant. Residual analysis confirms that the assumptions of normality and homoscedasticity are satisfied. The proposed soil depth prediction method is expected to be applied systematically to various regions in South Korea, contributing to the development of a nationwide soil depth distribution map and supporting practical solutions to issues such as slope stability assessment in mountainous areas.
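
The two derived predictors can be computed from SCA and local slope using their commonly cited definitions, TWI = ln(SCA / tan β) and STI = (SCA / 22.13)^0.6 (sin β / 0.0896)^1.3. This sketch takes SCA as an input and does not reproduce the paper's improved SCA; the slope unit (degrees) and the regression coefficients b0 to b3 are assumptions made for illustration.

```python
import math


def twi(sca, slope_deg):
    """Topographic wetness index ln(SCA / tan(beta)); SCA in m^2/m, slope > 0 deg."""
    beta = math.radians(slope_deg)
    return math.log(sca / math.tan(beta))


def sti(sca, slope_deg):
    """Sediment transport index (SCA / 22.13)^0.6 * (sin(beta) / 0.0896)^1.3."""
    beta = math.radians(slope_deg)
    return (sca / 22.13) ** 0.6 * (math.sin(beta) / 0.0896) ** 1.3


def soil_depth(coef, slope_deg, sca):
    """Linear model depth = b0 + b1*slope + b2*TWI + b3*STI (coefficients hypothetical)."""
    b0, b1, b2, b3 = coef
    return b0 + b1 * slope_deg + b2 * twi(sca, slope_deg) + b3 * sti(sca, slope_deg)


# Hypothetical coefficients and cell values
print(soil_depth((1.5, -0.02, 0.05, -0.01), slope_deg=30.0, sca=50.0))
```

In practice the coefficients would come from fitting the regression to measured depths; the functions above only show how each cell's predictors are derived.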

Design of Self-Organizing Fuzzy Polynomial Neural Networks Architecture (자기구성 퍼지 다항식 뉴럴 네트워크 구조의 설계)

  • Park, Ho-Sung;Park, Keon-Jun;Oh, Sung-Kwun
    • Proceedings of the KIEE Conference / 2003.07d / pp.2519-2521 / 2003
  • In this paper, we propose a Self-Organizing Fuzzy Polynomial Neural Network (SOFPNN) architecture for optimal model identification and discuss a comprehensive design methodology supporting its development. The network exhibits a dynamic structure, since neither the number of layers nor the number of nodes in each layer of the SOFPNN is predetermined (unlike in the popular multilayer perceptron topology). For the conclusion part of the rules, the regression polynomial takes several forms, such as linear, quadratic, and modified quadratic. For the premise part of the rules, both triangular and Gaussian-like membership functions are studied, and the number of premise input variables used in the rules depends on the number of inputs of each node in each layer. We introduce two kinds of SOFPNN architectures, the basic and the modified one, each with both generic and advanced types. The superiority and effectiveness of the proposed SOFPNN architecture are demonstrated through a numerical example of a nonlinear function.
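
The rule structure described above (membership-function premises, regression-polynomial conclusions) can be sketched for a single two-input node. The exact "modified quadratic" form used here (linear terms plus the cross term only), the product t-norm, and weighted-average aggregation are assumptions based on common fuzzy polynomial neural network conventions, not details confirmed by this abstract; the layer-growing, self-organizing part is not shown, and all coefficients are hypothetical.

```python
import math


def triangular(x, a, b, c):
    """Triangular membership with feet a, c and peak b (requires a < b < c)."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)


def gaussian(x, center, sigma):
    """Gaussian-like membership centred at `center` with spread `sigma`."""
    return math.exp(-((x - center) ** 2) / (2 * sigma ** 2))


# Candidate consequent polynomials of two inputs (coefficient tuples c are hypothetical)
def linear(c, x1, x2):
    return c[0] + c[1] * x1 + c[2] * x2


def quadratic(c, x1, x2):
    return linear(c, x1, x2) + c[3] * x1 ** 2 + c[4] * x2 ** 2 + c[5] * x1 * x2


def modified_quadratic(c, x1, x2):
    return linear(c, x1, x2) + c[3] * x1 * x2


def fuzzy_node(x1, x2, rules):
    """Weighted average of rule outputs; each rule is (mf1, mf2, poly, coeffs).

    Assumes at least one rule fires (total firing strength > 0).
    """
    num = den = 0.0
    for mf1, mf2, poly, coeffs in rules:
        w = mf1(x1) * mf2(x2)  # rule firing strength via the product t-norm
        num += w * poly(coeffs, x1, x2)
        den += w
    return num / den


rules = [
    (lambda x: gaussian(x, 0.0, 1.0), lambda x: gaussian(x, 0.0, 1.0), linear, (0.5, 1.0, -0.2)),
    (lambda x: gaussian(x, 2.0, 1.0), lambda x: gaussian(x, 2.0, 1.0),
     modified_quadratic, (0.1, 0.3, 0.4, 0.05)),
]
print(fuzzy_node(1.0, 1.5, rules))
```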

A Study on the Optimal Performance Control of Heat Pump System for Heating Mode Operation (열펌프 시스템의 난방 운전 시 최적 성능 제어에 관한 연구)

  • Yoo, Keun-Joong;Lee, Il-Hwan;Lee, Gil-Bong;Kim, Min-Soo
    • Proceedings of the SAREK Conference / 2006.06a / pp.669-674 / 2006
  • The optimal control of heat pump performance in heating mode operation was investigated. Fuzzy logic was applied to control the heating performance of the heat pump system, with the superheat at the compressor discharge taken as the control variable. A regression model was adopted to determine the optimal points at which the COP is maximized. Optimization of the fuzzy rule table was investigated to improve the operating performance of the heat pump system. Experiments were carried out with the original fuzzy rule table and with the modified fuzzy rule table in heating mode operation. The results show that the control performance of the heat pump system with the modified fuzzy rule table was better than that with the original rule table.
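
One simple way a regression model can locate the COP-maximizing operating point is to fit a quadratic of COP against discharge superheat and take its vertex. The quadratic form is an assumption for illustration (the abstract does not give the regression form), and the data below are synthetic.

```python
def fit_quadratic(xs, ys):
    """Least-squares fit y = a + b*x + c*x^2 by solving the 3x3 normal equations."""
    s = [sum(x ** k for x in xs) for k in range(5)]                 # sums of x^0..x^4
    t = [sum((x ** k) * y for x, y in zip(xs, ys)) for k in range(3)]  # sums of x^k * y
    m = [[s[0], s[1], s[2], t[0]],
         [s[1], s[2], s[3], t[1]],
         [s[2], s[3], s[4], t[2]]]
    # Gaussian elimination with partial pivoting
    for col in range(3):
        piv = max(range(col, 3), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, 3):
            f = m[r][col] / m[col][col]
            for k in range(col, 4):
                m[r][k] -= f * m[col][k]
    # back substitution
    coef = [0.0] * 3
    for r in range(2, -1, -1):
        coef[r] = (m[r][3] - sum(m[r][k] * coef[k] for k in range(r + 1, 3))) / m[r][r]
    return coef  # [a, b, c]


def optimal_superheat(xs, ys):
    """Superheat at the vertex of the fitted parabola (requires downward curvature)."""
    a, b, c = fit_quadratic(xs, ys)
    if c >= 0:
        raise ValueError("fitted curve has no interior maximum")
    return -b / (2 * c)


# Synthetic data: COP peaks at a superheat of 12 K
superheat = [0, 5, 10, 15, 20, 25, 30]
cop = [4 - 0.002 * (x - 12) ** 2 for x in superheat]
print(optimal_superheat(superheat, cop))  # close to 12 for this exactly quadratic data
```

The resulting set point would then serve as the target the fuzzy controller tracks.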

Determining the existence of unit roots based on detrended data (추세 제거된 시계열을 이용한 단위근 식별)

  • Na, Okyoung
    • The Korean Journal of Applied Statistics / v.34 no.2 / pp.205-223 / 2021
  • In this paper, we study a method to determine the existence of unit roots using the adaptive lasso. The previously proposed method, which applies the adaptive lasso to the original time series, has low power when there is an unknown trend. Therefore, we propose a modified version that uses the adaptive lasso to fit the ADF regression model without a deterministic component to the detrended series instead of the original series. Our Monte Carlo simulation experiments show that the modified method improves power over the original method and works well in large samples.
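
The detrend-then-fit structure can be sketched as follows: remove a least-squares linear trend, then regress Δu_t on u_{t-1} with no intercept or trend term. This is a simplified sketch, not the paper's estimator: it uses plain OLS instead of the adaptive lasso, omits the lagged differences of the full ADF regression, and produces no critical values. The simulated series is hypothetical.

```python
import random


def detrend(y):
    """Remove a least-squares linear trend a + b*t from the series."""
    n = len(y)
    t_mean = (n - 1) / 2
    y_mean = sum(y) / n
    sxx = sum((t - t_mean) ** 2 for t in range(n))
    sxy = sum((t - t_mean) * (v - y_mean) for t, v in enumerate(y))
    b = sxy / sxx
    a = y_mean - b * t_mean
    return [v - (a + b * t) for t, v in enumerate(y)]


def df_rho(u):
    """OLS rho in du_t = rho * u_{t-1} + e_t (no deterministic term, no lags).

    rho near 0 is consistent with a unit root; clearly negative rho is not.
    """
    num = sum(u[t - 1] * (u[t] - u[t - 1]) for t in range(1, len(u)))
    den = sum(u[t - 1] ** 2 for t in range(1, len(u)))
    return num / den


# Hypothetical trend-stationary series: linear trend plus AR(1) noise with phi = 0.5
random.seed(0)
e = [0.0]
for _ in range(499):
    e.append(0.5 * e[-1] + random.gauss(0, 1))
y = [2 + 0.5 * t + v for t, v in enumerate(e)]
print(df_rho(detrend(y)))  # expect a clearly negative value (phi - 1 = -0.5 in population)
```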

Development of A Three-Variable Canopy Photosynthetic Rate Model of Romaine Lettuce (Lactuca sativa L.) Grown in Plant Factory Modules Using Light Intensity, Temperature, and Growth Stage (광도, 온도, 생육 시기에 따른 식물공장 모듈 재배 로메인 상추의 3 변수 군락 광합성 모델 개발)

  • Jung, Dae Ho;Yoon, Hyo In;Son, Jung Eek
    • Journal of Bio-Environment Control / v.26 no.4 / pp.268-275 / 2017
  • The photosynthetic rates of crops depend on growth environment factors, such as light intensity and temperature, and their photosynthetic efficiencies vary with growth stage. The objective of this study was to compare two different models expressing the canopy photosynthetic rates of romaine lettuce (Lactuca sativa L., cv. Asia Heuk romaine) using three variables: light intensity, temperature, and growth stage. The canopy photosynthetic rates of the plants were measured 4, 7, 14, 21, and 28 days after transplanting in closed acrylic chambers (1.0 × 0.8 × 0.5 m) lit by light-emitting diodes, in which the indoor temperature and light intensity were varied from 19 to 28 °C and from 50 to 500 μmol·m⁻²·s⁻¹, respectively. Starting from an initial CO₂ concentration of 2,000 μmol·mol⁻¹, the canopy photosynthetic rate was calculated from the decrease in CO₂ over time. A simple multiplication model, obtained by multiplying three single-variable models, was compared with a modified rectangular hyperbola model. The modified rectangular hyperbola model additionally included photochemical efficiency, carboxylation conductance, and dark respiration, which vary with temperature and growth stage. In validation, the R² value was 0.849 for the simple multiplication model and increased to 0.861 for the modified rectangular hyperbola model. The modified rectangular hyperbola model was therefore more suitable than the simple multiplication model for expressing canopy photosynthetic rates as affected by environmental factors (light intensity and temperature) and a growth factor (growth stage) in plant factory modules.
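
The three parameters named above (photochemical efficiency α, carboxylation conductance τ, dark respiration R_d) appear in the common rectangular hyperbola form P = αI·τC / (αI + τC) − R_d, where I is light intensity and C is CO₂ concentration. This sketch assumes that form; in the paper α, τ and R_d additionally vary with temperature and growth stage, which is not modeled here, and the parameter values are hypothetical.

```python
def canopy_photosynthesis(light, co2, alpha, tau, rd):
    """Rectangular hyperbola: P = (alpha*I * tau*C) / (alpha*I + tau*C) - Rd.

    light: PPFD (umol m^-2 s^-1); co2: CO2 concentration (umol mol^-1);
    alpha: photochemical efficiency; tau: carboxylation conductance;
    rd: dark respiration (same units as P).
    """
    light_term = alpha * light  # light-limited gross rate
    co2_term = tau * co2        # CO2-limited gross rate
    return light_term * co2_term / (light_term + co2_term) - rd


# Hypothetical parameters at the chamber's initial CO2 level
for ppfd in (50, 200, 500):
    print(ppfd, canopy_photosynthesis(ppfd, 2000, alpha=0.05, tau=0.01, rd=1.5))
```

The hyperbola saturates toward the smaller of the two limiting rates, so at high CO₂ the modeled response is dominated by light and vice versa.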

A Study of Air Freight Forecasting Using the ARIMA Model (ARIMA 모델을 이용한 항공운임예측에 관한 연구)

  • Suh, Sang-Sok;Park, Jong-Woo;Song, Gwangsuk;Cho, Seung-Gyun
    • Journal of Distribution Science / v.12 no.2 / pp.59-71 / 2014
  • Purpose - In recent years, many firms have attempted various approaches to cope with the continual increase in air transportation. Previous research on freight charge forecasting models has focused on regression analyses that use a few influencing factors to calculate the future price. However, these approaches have limitations that make them difficult to apply in practice: they cannot respond promptly to small price changes, and their predictive power is relatively low. Therefore, the current study proposes a freight charge forecasting model that uses time series data instead of a regression approach. The main purposes of this study can be summarized as follows. First, a suitable freight charge model based on the autoregressive integrated moving average (ARIMA) model, which is widely used for time series forecasting, is presented. Second, a modified ARIMA model for freight charge prediction and a standard process for determining freight charges based on it are presented. Third, a straightforward freight charge prediction model that practitioners can apply and utilize is presented. Research design, data, and methodology - To develop a new freight charge model, this study proposes the ARIMAC(p,q) model, which consistently applies time differencing to address the correlation coefficient problem (in the autocorrelation and partial autocorrelation functions) that appears in the ARIMA(p,q) model, and implements an error-adjusted ARIMAC(p,q). Cargo Account Settlement Systems (CASS) data from the International Air Transport Association (IATA) are used to predict the air freight charge. In the modeling, freight charge data for 72 months (January 2006 to December 2011) are used as the training set, and a prediction interval of 23 months (January 2012 to November 2013) is used as the validation set.
The freight charge from November 2012 to November 2013 is predicted for three routes (Los Angeles, Miami, and Vienna), and the accuracy over the prediction interval is analyzed using the mean absolute percentage error (MAPE). Results - For the L.A. route, the proposed model shows better prediction accuracy: the MAPE of the error-adjusted ARIMAC model is 10%, compared with 11.2% for ARIMAC. For the Miami route, the proposed model also shows slightly better accuracy, with a MAPE of 3.5% for the error-adjusted ARIMAC model versus 3.7% for ARIMAC. For the Vienna route, however, ARIMAC is more accurate, with a MAPE of 14.5% versus 15.7% for the error-adjusted ARIMAC model. Conclusions - The error-adjusted ARIMAC model appears more accurate when a route's freight charge variance is large, whereas ARIMA is more accurate when the freight charge variance is small or shows an upward or downward trend. It can therefore be concluded that the ARIMAC model, which uses moving averages, has less predictive power for small price changes, whereas the error-adjusted ARIMAC model, which uses error correction, can respond quickly to price changes.
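
The differencing-plus-autoregression idea underlying ARIMA can be shown with a plain ARIMA(1,1,0) model: difference the series once, fit an AR(1) coefficient to the differences by least squares, then forecast the differences and integrate them back. This sketch is not the paper's ARIMAC or error-adjusted ARIMAC model, whose exact formulation is not given in the abstract; the freight series is hypothetical.

```python
def difference(y):
    """First differences d_t = y_t - y_{t-1} (the 'I' step with d = 1)."""
    return [b - a for a, b in zip(y, y[1:])]


def fit_ar1(d):
    """Least-squares AR(1) coefficient for d_t = phi * d_{t-1} + e_t (no constant)."""
    num = sum(d[t] * d[t - 1] for t in range(1, len(d)))
    den = sum(d[t - 1] ** 2 for t in range(1, len(d)))
    return num / den


def forecast_arima110(y, steps):
    """Recursive multi-step forecasts from an ARIMA(1,1,0) model without drift."""
    d = difference(y)
    phi = fit_ar1(d)
    level, last_diff = y[-1], d[-1]
    out = []
    for _ in range(steps):
        last_diff = phi * last_diff  # forecast the next difference
        level += last_diff           # integrate back to the original price scale
        out.append(level)
    return out


freight = [3.1, 3.3, 3.2, 3.6, 3.5, 3.9]  # hypothetical monthly charges
print(forecast_arima110(freight, 3))
```

Forecast accuracy over a hold-out period would then be scored with MAPE, as in the study.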