• Title/Abstract/Keywords: Multivariate regression models

Search results: 174 items (processing time 0.024 s)

Distribution Characteristics of PM10 and Heavy Metals in Ambient Air of Gyeonggi-do Area using Statistical Analysis

  • 김종수;홍순모;김명숙;김요용;신은상
    • Journal of Korean Society for Atmospheric Environment (한국대기환경학회지) / Vol. 30, No. 3 / pp.281-290 / 2014
  • This study evaluated the distribution characteristics of $PM_{10}$ and heavy metal concentrations in the ambient air of the Gyeonggi-do area by region and season from February 2013 to March 2014. Using correlation analysis and regression analysis, regression models were also established to predict the formation characteristics and contamination levels of $PM_{10}$ and heavy metals. The main wind directions during the investigation period were south-east (SE) and west-south-west (WSW), and the $SO_2$ concentration at Ansan, an industrial region, was 1.6 times higher than at Suwon and Euiwang, which are residential regions. The median concentrations of Pb, Cu and Ni at Ansan were 3.2~4.5, 1.9~2.2 and 1.7~2.6 times higher, respectively, than those at Suwon. Seasonally, the concentrations of $PM_{10}$, Pb, Fe and As in winter and spring (December to May) were 1.7, 1.9, 1.9 and 2.7 times higher, respectively, than those in summer and fall (June to November). As, Fe and $PM_{10}$ differed greatly by season, whereas Cu and Ni were judged to be influenced by regional factors. In the correlation analysis among the target items, the correlation coefficient between PM and Mn was 0.82 (p<0.01) and that between Fe and Mn was 0.82 (p<0.01), indicating high correlation. The correlation coefficients for $SO_2$ and Pb, and for CO and $PM_{10}$, were 0.66 (p<0.01) and 0.62 (p<0.01), respectively. Multiple linear regression models for $PM_{10}$, Pb, Cu, Cr, As, Ni, Fe and Mn were established with CO, $SO_2$ and meteorological factors (wind speed, relative humidity) as independent variables. In the regression models, the independent variable $SO_2$ had a cause-and-effect relationship with all dependent variables; $PM_{10}$, Fe and Mn were influenced by CO and wind speed, and $SO_2$ was the main factor for Pb, Cu, Ni and As.
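
A multiple linear regression of the kind this abstract describes can be sketched as follows. This is a minimal illustration, not the authors' model: the predictor values, the units, and the coefficients in `true_coeffs` are invented, and the helper names `solve_linear_system` and `fit_ols` are hypothetical. The response is generated without noise so that the fit should recover the chosen coefficients.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix [a | b], so row operations apply to both sides.
    m = [list(a[i]) + [b[i]] for i in range(n)]
    for col in range(n):
        # Move the row with the largest pivot into place for stability.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_ols(rows, y):
    """Return [intercept, b1, ..., bk] from the normal equations X'X b = X'y."""
    X = [[1.0] + list(r) for r in rows]  # prepend the intercept column
    k = len(X[0])
    xtx = [[sum(x[i] * x[j] for x in X) for j in range(k)] for i in range(k)]
    xty = [sum(x[i] * yi for x, yi in zip(X, y)) for i in range(k)]
    return solve_linear_system(xtx, xty)


# Synthetic predictors (NOT data from the study): CO, SO2, wind speed.
rows = [
    (0.4, 4.0, 2.1),
    (0.6, 5.0, 1.5),
    (0.5, 8.0, 3.0),
    (0.9, 6.0, 1.0),
    (0.7, 3.0, 2.6),
    (0.3, 7.0, 1.8),
]
# Hypothetical coefficients: intercept, CO, SO2, wind speed.
true_coeffs = [10.0, 40.0, 2.0, -3.0]
y = [true_coeffs[0] + sum(b * v for b, v in zip(true_coeffs[1:], r)) for r in rows]

coeffs = fit_ols(rows, y)
print([round(c, 6) for c in coeffs])
```

With real measurements the response would contain noise, and one would normally also report standard errors and p-values, which this sketch omits.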

Prediction of Time to Recurrence and Influencing Factors for Gastric Cancer in Iran

  • Roshanaei, Ghodratollah;Ghannad, Masoud Sabouri;Safari, Maliheh;Sadighi, Sanambar
    • Asian Pacific Journal of Cancer Prevention / Vol. 13, No. 6 / pp.2639-2642 / 2012
  • Background: The patterns of gastric cancer (GC) recurrence vary across societies. We designed the current study to evaluate the recurrence patterns of gastric cancer in Iran, and to predict time to recurrence and the factors affecting it. Materials and Methods: This research was performed from March 2003 to February 2007. Demographic characteristics and clinical and pathological diagnosis and classification, including pathologic stage, tumor grade, tumor site and tumor size, of patients with recurrent GC were collected from the patients' data files. To evaluate the factors affecting relapse in GC patients, gender, age at diagnosis, treatment type and Hgb were included in the research. Data were analyzed using Kaplan-Meier and logistic regression models. Results: After treatment, 82 patients suffered recurrence, 42, 33 and 17 by the ends of the first, second and third years. The mean (SD) and median (IQR) time to recurrence in patients with GC were 25.5 (20.6-30.1) and 21.5 (15.6-27.1) months, respectively. Multivariate logistic regression analysis showed that only pathologic stage, tumor grade and tumor site significantly affected recurrence. Conclusions: We found that pathologic stage, tumor grade and tumor site significantly affect the recurrence of GC, which has a high positive prognostic value and might be useful for better follow-up and for selecting patients at risk. We also showed time to recurrence to be an important factor for the follow-up of patients.
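
The Kaplan-Meier method named in this abstract estimates the probability of remaining recurrence-free over time while accounting for censored patients. A minimal sketch of the product-limit estimator follows. The follow-up times and event flags are invented for illustration and are not the study's data, and `kaplan_meier` is a hypothetical helper name.

```python
def kaplan_meier(times, events):
    """Product-limit estimate: list of (time, survival) at each event time.

    times  -- follow-up time for each patient
    events -- 1 if the event (e.g. recurrence) was observed, 0 if censored
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        observed = 0  # events at time t
        removed = 0   # everyone leaving the risk set at time t
        while i < len(data) and data[i][0] == t:
            observed += data[i][1]
            removed += 1
            i += 1
        if observed > 0:
            # Multiply by the conditional probability of surviving past t.
            survival *= 1.0 - observed / n_at_risk
            curve.append((t, survival))
        n_at_risk -= removed
    return curve


# Invented follow-up times in months; 0 marks a censored patient.
times = [6, 12, 12, 18, 24, 30]
events = [1, 1, 0, 1, 0, 1]
curve = kaplan_meier(times, events)
for t, s in curve:
    print(t, round(s, 4))
```

Censored patients contribute to the risk set up to their censoring time but never trigger a drop in the curve, which is what distinguishes this estimate from a simple proportion.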

Regression and ANN models for durability and mechanical characteristics of waste ceramic powder high performance sustainable concrete

  • Behforouz, Babak;Memarzadeh, Parham;Eftekhar, Mohammadreza;Fathi, Farshid
    • Computers and Concrete / Vol. 25, No. 2 / pp.119-132 / 2020
  • There is growing interest in the use of by-product materials such as ceramics as alternative materials in construction. The aim of this study is to investigate the mechanical properties and durability of sustainable concrete containing waste ceramic powder (WCP), and to predict the results using an artificial neural network (ANN). To this end, water to binder (W/B) ratios of 0.3, 0.4, and 0.5 were considered, and for each W/B ratio a percentage of the cement (between 5% and 50%) was replaced with WCP. Compressive and tensile strengths, water absorption, electrical resistivity and rapid chloride permeability (RCP) of the concrete specimens containing WCP were evaluated by the relevant experimental tests. The results showed that when 20% of the cement is replaced by WCP, the concrete achieves compressive and tensile strengths of more than 95% of those of the control concrete in the long term. This percentage increases with decreasing W/B ratio. In general, increasing the percentage of WCP replacement significantly improves all durability parameters. To validate and suggest a suitable tool for predicting the characteristics of the concrete, an ANN model and various multivariate regression methods were applied. Comparison of the proposed ANN with the regression methods indicates good accuracy of the developed ANN in predicting the mechanical properties and durability of this type of concrete. According to the results, the accuracy of the ANN model in estimating the durability parameters did not depend significantly on the number of hidden nodes.

A Study on the Determinants of Convalescent Rehabilitation Medical Service Needs at Regional Level

  • 김정훈;김희년;최용석;정형선
    • Health Policy and Management (보건행정학회지) / Vol. 33, No. 1 / pp.40-54 / 2023
  • Background: In light of the increasing need for convalescent rehabilitation medical services in Korea, this study aims to calculate the need for rehabilitation services and examine its determinants across 229 regions. Methods: Claims data from the Health Insurance Review and Assessment Service were used to estimate the number of patients who need rehabilitation services, and data from various other sources were also used for the analysis. The number of cases and incidence rates of hospitalization related to convalescent rehabilitation were calculated to estimate the need for services by region, and the results were visualized on a map. Multivariate regression and fixed effects regression using panel data were performed to identify the determinants of regional variation in the incidence rate. Results: First, the incidence rate in rural areas such as Jeolla-do, Gyeongsang-do, and Chungcheong-do was higher than in urban areas (metropolitan cities). Second, population, the proportion of older people, medical aid recipients, financial independence, traffic deaths, smoking, diabetes rate, and medical infrastructure correlated significantly with the incidence rate. Third, the 'rho' values, which indicate the fraction of variance due to individual terms in the panel data regression models, were 0.965 and 0.976, respectively. Conclusion: The incidence rate of hospitalizations was correlated with most of the independent variables in this study, and there is a gap between urban and rural areas. These regional disparities have become entrenched in our society. An improved regional convalescent rehabilitation system is suggested to cover the entire country, including rural areas with a high rate of aging.

Corporate Default Prediction Model Using Deep Learning Time Series Algorithm, RNN and LSTM

  • 차성재;강정석
    • Journal of Intelligence and Information Systems (지능정보연구) / Vol. 24, No. 4 / pp.1-32 / 2018
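  • This study uses a total of 10 years of annual corporate data, centered on the global financial crisis, which had a large economic impact on Korea. First, with the goal of building a default model that stays consistent as conditions change over time, the models are trained on data from before the financial crisis (2000~2006). Next, the parameters are tuned so that the results on the validation data, which includes the financial crisis period (2007~2008), show a pattern similar to the training results and have good predictive power. The training and validation data are then combined (2000~2008) and the model is rebuilt with the same parameters used in validation. Finally, the usefulness of the corporate default prediction model based on deep learning time series algorithms is verified using the results of the final trained model on the test data (2009). Following Lee (2015), default is defined as delisting due to poor performance, among the possible reasons for a company's delisting. The independent variables include the financial ratios used in previous studies as well as other financial information. Multivariate discriminant analysis, the logit model, and the Lasso regression model are then used to select the optimal set of variables. As default prediction methodologies, the study uses the multiple discriminant analysis model proposed by Altman (1968), the logit model proposed by Ohlson (1980), non-time-series machine learning default prediction models, and deep learning time series algorithms. Corporate data have the limitations of 'nonlinear variables', 'multicollinearity' among variables, and 'insufficient data'. Accordingly, the logit model addresses 'nonlinearity', the Lasso regression model addresses 'multicollinearity', and the deep learning time series algorithms, which use a variable data generation scheme, compensate for the lack of data. The current government and governments overseas are working to bring national and social systems and everyday life as a whole into the Fourth Industrial Revolution. Deep learning research using big data is now active across many industries, but research for the financial industry is still lacking. As an early paper applying deep learning time series algorithms to corporate default, this study is intended to serve as comparative reference material for non-specialists beginning research that combines financial data with deep learning time series algorithms.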
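
The chronological split described in this abstract (train on 2000 to 2006, validate on 2007 to 2008, refit on 2000 to 2008, test on 2009) can be sketched as below. The record layout, the field names `firm`, `year` and `default`, and the helper `split_by_year` are assumptions made for illustration. The abstract does not describe the paper's data format.

```python
TRAIN_YEARS = range(2000, 2007)       # 2000-2006, before the crisis
VALIDATION_YEARS = range(2007, 2009)  # 2007-2008, includes the crisis
TEST_YEARS = range(2009, 2010)        # 2009, held out until the end


def split_by_year(records):
    """Partition records by fiscal year so no future data leaks backwards."""
    train = [r for r in records if r["year"] in TRAIN_YEARS]
    validation = [r for r in records if r["year"] in VALIDATION_YEARS]
    test = [r for r in records if r["year"] in TEST_YEARS]
    return train, validation, test


# Hypothetical records: one placeholder firm-year per year, all non-default.
records = [{"firm": f"firm_{y}", "year": y, "default": 0} for y in range(2000, 2010)]

train, validation, test = split_by_year(records)
# After tuning on `validation`, refit with the same parameters on both sets.
refit = train + validation
print(len(train), len(validation), len(test), len(refit))
```

Splitting by year rather than at random keeps every evaluation set strictly later than the data used to fit the model, which matches the forward-looking use of a default prediction model.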

Role of CD10 Immunohistochemical Expression in Predicting Aggressive Behavior of Phylloides Tumors

  • Tariq, Muhammad Usman;Haroon, Saroona;Kayani, Naila
    • Asian Pacific Journal of Cancer Prevention / Vol. 16, No. 8 / pp.3147-3152 / 2015
  • Background: Phylloides tumors are rare breast neoplasms with a variable clinical course depending on the tumor category. Along with histologic features, the role of immunohistochemical staining has been studied in predicting their behavior. Objectives: Our aim was to evaluate the role of CD10 immunohistochemical staining in predicting survival, recurrence and metastasis in phylloides tumors. We also evaluated correlations of other clinicopathological features with overall and disease-free survival. Materials and Methods: CD10 expression was studied in 82 phylloides tumors divided into recurrent/metastatic and non-recurrent/non-metastatic cohorts. The Chi-square test was applied to determine the significance of differences in CD10 expression between outcome cohorts. Univariate and multivariate survival analyses were also performed using the log-rank test and Cox regression hazard models. Results: All 3 metastatic cases, 5 out of 6 (83.3%) recurrent cases and 37 out of 73 (50.7%) non-recurrent and non-metastatic cases expressed significant (2+ or 3+) staining for CD10. This expression varied significantly between outcome cohorts (p<0.03). Tumor category and histological features including mitotic count and necrosis correlated significantly with recurrence and metastasis. A significant decrease in overall and disease-free survival was seen with CD10 positivity, malignant category, increased mitoses and necrosis. Neither CD10 expression nor any other clinicopathologic feature proved to be an independent prognostic indicator in multivariate analysis. Conclusions: CD10 immunohistochemical staining can be used as a predictive tool for phylloides tumors, but this expression should be interpreted in conjunction with tumor category.

Evaluation of benzene residue in edible oils using Fourier transform infrared (FTIR) spectroscopy

  • Joshi, Ritu;Cho, Byoung-Kwan;Lohumi, Santosh;Joshi, Rahul;Lee, Jayoung;Lee, Hoonsoo;Mo, Changyeun
    • Korean Journal of Agricultural Science (농업과학연구) / Vol. 46, No. 2 / pp.257-271 / 2019
  • The use of food grade hexane (FGH) for edible oil extraction is responsible for the presence of benzene in the crude oil. Benzene is a Group 1 carcinogen and could pose a serious threat to the health of consumers. However, its detection still depends on classical chromatographic methods, so a rapid, non-destructive detection method is needed. Hence, the aim of this study was to investigate the feasibility of using Fourier transform infrared (FTIR) spectroscopy combined with multivariate analysis to detect and quantify the benzene residue in edible oil (sesame and cottonseed oil). Oil samples were adulterated with varying quantities of benzene, and their FTIR spectra were acquired with an attenuated total reflectance (ATR) method. Optimal variables for a partial least-squares regression (PLSR) model were selected using the variable importance in projection (VIP) and the selectivity ratio (SR) methods. The developed PLS models with all variables and with the VIP- and SR-selected variables were validated against an independent data set, resulting in $R^2$ values of 0.95, 0.96, and 0.95 and standard error of prediction (SEP) values of 38.5, 33.7, and 41.7 mg/L, respectively. The proposed technique of FTIR combined with multivariate analysis and variable selection methods can detect benzene residues in edible oils with the advantages of being fast and simple, and thus can replace the conventional methods used for the same purpose.

Prediction models of compressive strength and UPV of recycled material cement mortar

  • Wang, Chien-Chih;Wang, Her-Yung;Chang, Shu-Chuan
    • Computers and Concrete / Vol. 19, No. 4 / pp.419-427 / 2017
  • With rising global environmental awareness of energy saving and carbon reduction, as well as the environmental changes and natural disasters resulting from the greenhouse effect, waste resources should be used efficiently to save environmental space and uphold the environmental protection principle of "sustainable development and recycling". This study used recycled-material cement mortar and adopted the volumetric method for experimental design, replacing cement (0%, 10%, 20%, 30%) with recycled materials (fly ash, slag, glass powder) to test compressive strength and ultrasonic pulse velocity (UPV). A hyperbolic function for nonlinear multivariate regression analysis was used to build prediction models, in order to study the effect of different recycled material addition levels (the function $R_m$(F, S, G) represents the content of recycled materials such as fly ash, slag and glass) on the compressive strength and UPV of cement mortar. The calculated results agree with the laboratory-measured mortar compressive strength and UPV of the various mix proportions. Comparing the predicted values with the test results, the coefficient of determination $R^2$ and the MAPE (mean absolute percentage error) for compressive strength are 0.970-0.988 and 5.57-8.84%, respectively. The $R^2$ and MAPE values for UPV are 0.960-0.987 and 1.52-1.74%, respectively. All of the $R^2$ values are close to 1.0 and all of the MAPE values are less than 10%. Thus, the prediction models established in this study have excellent predictive ability for the compressive strength and UPV of cement mortar containing recycled materials.
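
The two goodness-of-fit measures quoted in this abstract, $R^2$ and MAPE, can be computed as in the following sketch. The measured and predicted strengths are invented numbers, not values from the paper, and the function names are hypothetical.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def mape(y_true, y_pred):
    """Mean absolute percentage error, in percent (y_true must be nonzero)."""
    ratios = [abs(t - p) / abs(t) for t, p in zip(y_true, y_pred)]
    return 100.0 * sum(ratios) / len(ratios)


# Invented compressive strengths (measured vs. model-predicted).
measured = [20.0, 30.0, 40.0, 50.0]
predicted = [22.0, 27.0, 44.0, 50.0]

r2 = r_squared(measured, predicted)
error_pct = mape(measured, predicted)
print(round(r2, 3), round(error_pct, 2))
```

The two measures answer different questions: $R^2$ compares the residual scatter with the spread of the measurements, while MAPE expresses the typical error relative to each measured value.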

Estimating the Survival of Patients With Lung Cancer: What Is the Best Statistical Model?

  • Abedi, Siavosh;Janbabaei, Ghasem;Afshari, Mahdi;Moosazadeh, Mahmood;Alashti, Masoumeh Rashidi;Hedayatizadeh-Omran, Akbar;Alizadeh-Navaei, Reza;Abedini, Ehsan
    • Journal of Preventive Medicine and Public Health / Vol. 52, No. 2 / pp.140-144 / 2019
  • Objectives: Investigating the survival of patients with cancer is vitally necessary for controlling the disease and for assessing treatment methods. This study aimed to compare various statistical models of survival and to determine the survival rate and its related factors among patients suffering from lung cancer. Methods: In this retrospective cohort, the cumulative survival rate, median survival time, and factors associated with the survival of lung cancer patients were estimated using Cox, Weibull, exponential, and Gompertz regression models. Kaplan-Meier tables and the log-rank test were also used to analyze the survival of patients in different subgroups. Results: Of 102 patients with lung cancer, 74.5% were male. During the follow-up period, 80.4% died. The incidence rate of death among patients was estimated as 3.9 (95% confidence interval [CI], 3.1 to 4.8) per 100 person-months. The 5-year survival rate for all patients, males, females, patients with non-small cell lung carcinoma (NSCLC), and patients with small cell lung carcinoma (SCLC) was 17%, 13%, 29%, 21%, and 0%, respectively. The median survival time for all patients, males, females, those with NSCLC, and those with SCLC was 12.7 months, 12.0 months, 16.0 months, 16.0 months, and 6.0 months, respectively. Multivariate analyses indicated that the hazard ratios (95% CIs) for male sex, age, and SCLC were 0.56 (0.33 to 0.93), 1.03 (1.01 to 1.05), and 2.91 (1.71 to 4.95), respectively. Conclusions: Our results showed that the exponential model was the most precise. This model identified age, sex, and type of cancer as factors that predicted survival in patients with lung cancer.
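
An exponential survival model without covariates assumes a constant hazard, and its maximum-likelihood estimate is the number of deaths divided by the total person-time. This is the same quantity as an incidence rate of death per person-month. The sketch below illustrates that relationship on invented follow-up data. It is not a reconstruction of the study's regression models, and the function names are hypothetical.

```python
import math


def fit_exponential(times, events):
    """MLE of the constant hazard: observed deaths / total person-time."""
    deaths = sum(events)
    person_time = sum(times)
    return deaths / person_time


def survival(rate, t):
    """S(t) = exp(-rate * t) under the exponential model."""
    return math.exp(-rate * t)


def median_survival(rate):
    """Time at which S(t) = 0.5, i.e. ln(2) / rate."""
    return math.log(2) / rate


# Invented follow-up in months; 0 marks a patient censored while alive.
times = [5.0, 10.0, 15.0, 20.0]
events = [1, 1, 0, 1]

rate = fit_exponential(times, events)  # 3 deaths over 50 person-months
rate_per_100 = 100.0 * rate            # deaths per 100 person-months
median = median_survival(rate)
print(round(rate_per_100, 2), round(median, 2))
```

Censored follow-up time still counts in the denominator, so patients who were alive at last contact lower the estimated hazard rather than being discarded.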

Sampling Strategies for Computer Experiments: Design and Analysis

  • Lin, Dennis K.J.;Simpson, Timothy W.;Chen, Wei
    • International Journal of Reliability and Applications / Vol. 2, No. 3 / pp.209-240 / 2001
  • Computer-based simulation and analysis is used extensively in engineering for a variety of tasks. Despite the steady and continuing growth of computing power and speed, the computational cost of complex high-fidelity engineering analyses and simulations limits their use in important areas like design optimization and reliability analysis. Statistical approximation techniques such as design of experiments and response surface methodology are becoming widely used in engineering to minimize the computational expense of running such computer analyses and circumvent many of these limitations. In this paper, we compare and contrast five experimental design types and four approximation model types in terms of their capability to generate accurate approximations for two engineering applications with typical engineering behaviors and a wide range of nonlinearity. The first example involves the analysis of a two-member frame that has three input variables and three responses of interest. The second example simulates the roll-over potential of a semi-tractor-trailer for different combinations of input variables and braking and steering levels. Detailed error analysis reveals that uniform designs provide good sampling for generating accurate approximations using different sample sizes, while kriging models provide accurate approximations that are robust for use with a variety of experimental designs and sample sizes.
