• Title/Summary/Keyword: Random regression model

504 search results (processing time: 0.029 s)

A Study on the Development of Model for Estimating the Thickness of Clay Layer of Soft Ground in the Nakdong River Estuary

  • 안성인;류동우
    • 터널과지하공간 (Tunnel and Underground Space) / Vol. 32, No. 6 / pp. 586-597 / 2022
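  • In this study, we developed a model that estimates the location-specific thickness of the upper clay layer in the intertidal zone of the Nakdong River, known as one of Korea's major soft-ground regions, for use in assessing vulnerability to consolidation settlement. To estimate thickness, four spatial estimation models were developed and compared: three based on the machine learning algorithms RF (Random Forest), SVR (Support Vector Regression), and GPR (Gaussian Process Regression), and one based on the geostatistical technique of ordinary kriging. Of the 4,712 borehole records collected in the study area, the 2,948 in which an upper clay layer is present were used for model development, and the Pearson correlation coefficient and the mean squared error were used to evaluate model performance quantitatively. In addition, for a qualitative evaluation, the thickness of the upper clay layer was estimated over the entire study area and the resulting regional distributions of the clay layer were compared across the models.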
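
The ordinary-kriging estimator and the two evaluation metrics named above can be sketched in plain Python. This is a minimal illustration, not the authors' implementation: the exponential variogram, its `sill` and `range_` values, the leave-one-out scheme, and the borehole coordinates and thickness values are all hypothetical assumptions.

```python
import math


def exp_variogram(h, sill=1.0, range_=1.0):
    """Exponential variogram without a nugget (illustrative parameters)."""
    return sill * (1.0 - math.exp(-h / range_))


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def ordinary_kriging(points, values, target, sill=1.0, range_=1.0):
    """Ordinary-kriging estimate at `target` from 2-D `points` and their `values`."""
    def gamma(p, q):
        return exp_variogram(math.dist(p, q), sill, range_)

    n = len(points)
    # Rows 0..n-1: sum_j w_j * gamma(p_i, p_j) + mu = gamma(p_i, target)
    A = [[gamma(points[i], points[j]) for j in range(n)] + [1.0] for i in range(n)]
    # Last row: unbiasedness constraint sum_j w_j = 1
    A.append([1.0] * n + [0.0])
    b = [gamma(p, target) for p in points] + [1.0]
    weights = solve(A, b)[:n]  # drop the Lagrange multiplier mu
    return sum(w * v for w, v in zip(weights, values))


def pearson(a, b):
    """Pearson correlation coefficient between two equal-length sequences."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    var_a = sum((x - ma) ** 2 for x in a)
    var_b = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def mse(a, b):
    """Mean squared error between observed and predicted values."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def leave_one_out(points, values, **kwargs):
    """Predict each borehole from the others; score with Pearson r and MSE."""
    preds = []
    for i in range(len(points)):
        rest_p = points[:i] + points[i + 1:]
        rest_v = values[:i] + values[i + 1:]
        preds.append(ordinary_kriging(rest_p, rest_v, points[i], **kwargs))
    return pearson(values, preds), mse(values, preds)


# Hypothetical borehole coordinates and clay-layer thicknesses (m)
boreholes = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (0.5, 0.5)]
thickness = [10.0, 12.0, 11.0, 14.0, 12.5]
r, err = leave_one_out(boreholes, thickness)
```

Because this variogram has no nugget, the estimator reproduces the observed value exactly at a borehole location; in practice a variogram fitted to the data (often with a nugget) would replace `exp_variogram`.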

An Ensemble Approach to Detect Fake News Spreaders on Twitter

  • Sarwar, Muhammad Nabeel;UlAmin, Riaz;Jabeen, Sidra
    • International Journal of Computer Science & Network Security / Vol. 22, No. 5 / pp. 294-302 / 2022
  • Detection of fake news is a complex and challenging task. The generation of fake news is very hard to stop; only steps to control its circulation may help minimize its impact. Humans tend to believe misleading false information. Researchers started with social media sites, categorizing news as real or fake. False information misleads individuals and organizations and may cause major failures and financial losses. Automatic detection of false information circulating on social media is an emerging area of research, and it has gained the attention of both industry and academia since the 2016 US presidential election. Fake news has severe negative effects on individuals and organizations, extending its hostile effects to society as a whole, so timely prediction of fake news is important. This research focuses on the detection of fake news spreaders. Overall, six models were developed, trained, and tested on the PAN 2020 dataset. Four N-gram-based models and a user-statistics-based model were trained with different hyperparameter values, and an extensive grid search with cross-validation was applied to each machine learning model. For the N-gram-based models, the research focused on the algorithms that yielded the best results in a close reading of state-of-the-art related work: Random Forest, Logistic Regression, SVM, and XGBoost. All four algorithms were trained with hyperparameters selected by cross-validated grid search. The advantages of this research over previous work are the user-statistics-based model and the subsequent ensemble learning model, which were designed to classify Twitter users as fake news spreaders or not with the highest reliability. The user-statistics model uses 17 features to decide whether a Twitter user is malicious. A new dataset was then constructed from the predictions of the machine learning models, and three combining techniques (simple mean, logistic regression, and random forest) were applied in the ensemble model. The ensemble combined with logistic regression gave the best training and testing results, achieving an accuracy of 72%.
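
The ensemble step described above (a new dataset built from base-model predictions, then combined by a simple mean or by a logistic-regression meta-learner) is a form of stacking. The sketch below shows those two combiners under stated assumptions: the base-model probabilities are hypothetical, the meta-learner is a hand-written gradient-descent logistic regression rather than the authors' code, and the random-forest combiner is not shown.

```python
import math


def simple_mean(base_probs):
    """Combine base models by averaging their predicted probabilities."""
    return [sum(row) / len(row) for row in base_probs]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(X, y, lr=0.5, epochs=2000):
    """Fit a logistic-regression meta-learner by batch gradient descent."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict_logistic(model, X):
    """Meta-learner probability that each user is a fake news spreader."""
    w, b = model
    return [sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) for xi in X]


# Hypothetical out-of-fold probabilities from three base models (one column each):
# model 1 is informative, models 2 and 3 are uninformative.
meta_X = [[0.9, 0.5, 0.5], [0.8, 0.5, 0.5], [0.9, 0.5, 0.5],
          [0.1, 0.5, 0.5], [0.2, 0.5, 0.5], [0.1, 0.5, 0.5]]
labels = [1, 1, 1, 0, 0, 0]  # 1 = fake news spreader (hypothetical)

meta = fit_logistic(meta_X, labels)
stacked = predict_logistic(meta, meta_X)
averaged = simple_mean(meta_X)
```

The meta-learner can learn to weight informative base models more heavily, which a simple mean cannot. For brevity the sketch scores the training rows; a real pipeline would fit the meta-learner on out-of-fold predictions and evaluate it on held-out users.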

Simultaneous Optimization of KNN Ensemble Model for Bankruptcy Prediction

  • 민성환
    • 지능정보연구 (Journal of Intelligence and Information Systems) / Vol. 22, No. 1 / pp. 139-157 / 2016
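  • An ensemble classifier combines multiple classifiers to achieve better performance than any individual classifier, and such ensembles are known to be very useful for improving the generalization performance of a single classifier. The random subspace ensemble technique diversifies the base classifiers by randomly selecting, for each base classifier, a subset of input variables from the original input variable set. Random subspace ensembles that use k-nearest neighbor (KNN) as the base classifier are known to improve effectively on the performance of a single model, and their performance depends heavily on the randomly selected input variable subset of each base classifier and on the value of the KNN parameter k. However, although there has been research on the optimal choice of k or of the input variable subset for a single model, there has been no research on optimizing them in ensemble models that use KNN as the base classifier. This study therefore proposes a new ensemble model that simultaneously optimizes the k parameter and the input variable subset of each base classifier to improve the performance of KNN-based ensembles. The proposed method uses a genetic algorithm to search, for each KNN base classifier in the ensemble, for the value of k and the input variables that yield the best ensemble performance. To validate the proposed model, various experiments were conducted on bankruptcy prediction data from Korean firms; the results show that the proposed model is more effective than existing ensemble models at diversifying the base classifiers and improving prediction performance.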
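
A random subspace ensemble with KNN base classifiers, as described above, can be sketched as follows. This is an illustrative sketch only: the paper searches each member's k and feature subset with a genetic algorithm, whereas `random_subspace_members` simply draws them at random as a placeholder, and the firm data and labels are hypothetical. `ensemble_accuracy` shows the kind of validation score such a search could use as its fitness.

```python
import random
from collections import Counter


def knn_predict(train_X, train_y, x, k, features):
    """Majority vote among the k nearest neighbours, using only `features`."""
    dists = sorted(
        ((sum((row[f] - x[f]) ** 2 for f in features), label)
         for row, label in zip(train_X, train_y)),
        key=lambda pair: pair[0],
    )
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]


def random_subspace_members(n_features, n_members, subset_size, k_values, seed=0):
    """Draw a (k, feature subset) pair for each base KNN.

    Placeholder for the paper's genetic-algorithm search: here both are random.
    """
    rng = random.Random(seed)
    return [(rng.choice(k_values), sorted(rng.sample(range(n_features), subset_size)))
            for _ in range(n_members)]


def ensemble_predict(members, train_X, train_y, x):
    """Majority vote over the base KNN classifiers."""
    votes = Counter(knn_predict(train_X, train_y, x, k, feats) for k, feats in members)
    return votes.most_common(1)[0][0]


def ensemble_accuracy(members, train_X, train_y, val_X, val_y):
    """Validation accuracy: the kind of fitness a GA could maximise."""
    hits = sum(ensemble_predict(members, train_X, train_y, x) == y
               for x, y in zip(val_X, val_y))
    return hits / len(val_y)


# Hypothetical firms described by three financial ratios
train_X = [[0.10, 0.20, 0.10], [0.20, 0.10, 0.20], [0.15, 0.15, 0.10],
           [0.90, 0.80, 0.90], [0.80, 0.90, 0.80], [0.85, 0.90, 0.95]]
train_y = ["bankrupt"] * 3 + ["healthy"] * 3
members = random_subspace_members(n_features=3, n_members=5, subset_size=2, k_values=[1, 3])
val_X = [[0.12, 0.18, 0.15], [0.88, 0.85, 0.90]]
val_y = ["bankrupt", "healthy"]
acc = ensemble_accuracy(members, train_X, train_y, val_X, val_y)
```

In a GA-based version, each chromosome would encode every member's k and feature mask, and `ensemble_accuracy` on a validation set would be evaluated per chromosome.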

The Impact of Financial Inclusion on Financial Stability in Asian Countries

  • PHAM, Manh Hung;DOAN, Thi Phuong Linh
    • The Journal of Asian Finance, Economics and Business / Vol. 7, No. 6 / pp. 47-59 / 2020
  • This paper explores the relationship between financial inclusion and financial stability in Asian economies. The linkage is thoroughly investigated with country-level and bank-level data for 42 countries in three separate years: 2011, 2014, and 2017. An inclusive financial system is assessed along two dimensions: usage of financial services and access to the financial system. Usage of financial services ranges from accounts to credit, savings, and payment services. Access to the financial system measures financial outreach, that is, where individuals can use financial services. Financial stability, proxied by the bank Z-score, is the dependent variable. We apply fixed effects and random effects regressions to capture the impact of financial inclusion on financial stability. To enhance the robustness of the model, Feasible Generalized Least Squares (FGLS) regression is adopted as the remedy for the random effects regression. The empirical findings show an overall weak positive influence of financial inclusion on financial stability. The results also provide financial institutions and governments with information that can help them adopt an appropriate financial development strategy, improve the regulatory framework, and consequently enhance the stability of the whole financial system.
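
The bank Z-score used above as the stability proxy is commonly defined as (ROA + equity/assets) divided by the standard deviation of ROA, i.e. how many standard deviations returns would have to fall to exhaust equity. The sketch below assumes that common definition, averages ROA and equity/assets over the window, and uses the sample standard deviation; the paper's exact window and averaging are not stated here, and the figures are hypothetical.

```python
from statistics import mean, stdev


def bank_z_score(roa, equity_to_assets):
    """Bank Z-score = (mean ROA + mean equity/assets) / std(ROA).

    Uses the sample standard deviation of ROA over the observation window;
    studies differ in window length and averaging, so this is one variant.
    """
    return (mean(roa) + mean(equity_to_assets)) / stdev(roa)


# Hypothetical yearly ratios for one banking system
roa = [0.01, 0.02, 0.03]
equity_to_assets = [0.08, 0.08, 0.08]
z = bank_z_score(roa, equity_to_assets)
```

A higher Z-score indicates a greater distance from insolvency, which is why it serves as a stability measure.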

Machine learning in survival analysis

  • 백재욱
    • 산업진흥연구 / Vol. 7, No. 1 / pp. 1-8 / 2022
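  • This paper examines machine learning methods applicable to survival data that include censored observations. First, exploratory data analysis was used to identify the distribution of each feature, the relationships among features, and their importance ranking. Next, the relationship between the features serving as independent variables and the feature serving as the dependent variable (death or not) was treated as a classification problem, and machine learning methods such as logistic regression and K nearest neighbor were applied. Despite the small dataset, random forest performed better than logistic regression, as is typical of machine learning results. However, methods recently reported to perform well, such as artificial neural networks and gradient boosting, did not perform markedly better, which is attributed to the data not being big data. Finally, conventional survival analysis methods such as the Kaplan-Meier estimator and Cox's proportional hazards model were applied to examine which independent variables decisively affect the dependent variable $(t_i, \delta_i)$, and random forest, a machine learning method, was also applied to the censored survival data and its performance evaluated.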
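
The Kaplan-Meier estimator mentioned above works directly on the pairs $(t_i, \delta_i)$, where $\delta_i = 1$ marks an observed death and $\delta_i = 0$ a censored case. A minimal sketch with hypothetical data follows; it is the textbook product-limit estimator, not the paper's code.

```python
def kaplan_meier(times, events):
    """Kaplan-Meier survival curve from (t_i, delta_i) pairs.

    events[i] is 1 if the event (death) was observed at times[i], 0 if censored.
    Returns (t, S(t)) at each distinct observed event time.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = removed = 0
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            survival *= 1.0 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= removed  # deaths and censored cases both leave the risk set
    return curve


# Hypothetical follow-up times and event indicators
curve = kaplan_meier([1, 2, 2, 3, 4], [1, 1, 0, 1, 0])
```

Censored subjects never lower the curve themselves; they only shrink the risk set used at later event times, which is how censoring is accounted for.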

Empirical Analysis on the Factors Affecting the Net Income of Regional and Industrial Fisheries Cooperatives Using Panel Data

  • 김철현;남종오
    • 수산경영론집 (The Journal of Fisheries Business Administration) / Vol. 51, No. 1 / pp. 81-96 / 2020
  • The purpose of this paper is to analyze the factors affecting the net income of regional and industrial fisheries cooperatives in South Korea using panel data. The paper uses linear or GLS regression models, namely the pooled OLS, fixed effects, and random effects models, to estimate these factors. After reviewing various tests, we select the random effects model. The results, based on panel data for 64 fisheries cooperatives from 2013 to 2018, indicate that, as predicted, capital and the area dummy variables have positive effects and employment has a negative effect on the net income of regional and industrial fisheries cooperatives. However, the effect of debt is contrary to our predictions: debt has a positive effect on net income even though it has increased. Additionally, the number of members does not show any significant effect on the net income of regional and industrial fisheries cooperatives in South Korea. This study is significant in that it analyzes, by region and industry, the major factors influencing changes in the net income of fisheries cooperatives, an analysis that has not been conducted recently.
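
The three panel estimators compared above differ in how much of each entity's mean is removed before running OLS. The sketch below assumes a balanced panel and a single regressor: the quasi-demeaning weight `theta` gives pooled OLS at 0, the fixed-effects (within) estimator at 1, and random-effects GLS in between when `theta` is computed from the variance components. The cooperative IDs and figures are hypothetical, and the variance components are taken as given rather than estimated.

```python
import math
from collections import defaultdict


def slope_with_intercept(x, y):
    """OLS slope of y on x with an intercept."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def group_means(ids, values):
    """Mean of `values` within each entity."""
    sums, counts = defaultdict(float), defaultdict(int)
    for g, v in zip(ids, values):
        sums[g] += v
        counts[g] += 1
    return {g: sums[g] / counts[g] for g in sums}


def panel_slope(ids, x, y, theta):
    """OLS slope on quasi-demeaned data v_it - theta * mean_i(v).

    theta = 0: pooled OLS; theta = 1: fixed-effects (within) estimator;
    0 < theta < 1 from re_theta(): random-effects GLS.
    """
    mx, my = group_means(ids, x), group_means(ids, y)
    xt = [v - theta * mx[g] for g, v in zip(ids, x)]
    yt = [v - theta * my[g] for g, v in zip(ids, y)]
    return slope_with_intercept(xt, yt)


def re_theta(sigma_e2, sigma_u2, T):
    """Random-effects weight for a balanced panel with T periods per entity."""
    return 1.0 - math.sqrt(sigma_e2 / (sigma_e2 + T * sigma_u2))


# Hypothetical cooperatives A and B over three years: entity effects correlated with x
ids = ["A"] * 3 + ["B"] * 3
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [12.0, 14.0, 16.0, 8.0, 10.0, 12.0]
pooled = panel_slope(ids, x, y, theta=0.0)
within = panel_slope(ids, x, y, theta=1.0)
```

Here the within slope recovers the common slope of 2, while pooled OLS is pulled negative by the entity effects; tests such as the Hausman test are typically used to choose between the fixed and random effects specifications.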

Application and evaluation of machine-learning model for fire accelerant classification from GC-MS data of fire residue

  • Park, Chihyun;Park, Wooyong;Jeon, Sookyung;Lee, Sumin;Lee, Joon-Bae
    • 분석과학 (Analytical Science and Technology) / Vol. 34, No. 5 / pp. 231-239 / 2021
  • Detecting fire accelerants in fire residues is critical to determining whether a fire was arson or accidental. However, developing a standardized model for determining the presence or absence of fire accelerants has been difficult, because high temperatures cause components of the accelerants to disappear or combust. In this study, logistic regression, random forest, and support vector machine models were trained and evaluated on a total of 728 GC-MS analyses of actual fire residues. The mean classification accuracies of the three models were 63%, 81%, and 84%, respectively, and their mean AU-PR values were 0.68, 0.86, and 0.86, respectively, showing good performance for the random forest and support vector machine models.
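
AU-PR, the area under the precision-recall curve reported above, is often estimated by average precision: the mean of the precision values at the rank of each true positive. The sketch below assumes that estimator (the paper's exact computation is not stated) and uses hypothetical labels and scores; ties in scores are not treated specially.

```python
def average_precision(y_true, scores):
    """Average precision: mean of precision@k over the ranks k of positives."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    n_pos = sum(y_true)
    hits = 0
    total = 0.0
    for rank, idx in enumerate(order, start=1):
        if y_true[idx] == 1:
            hits += 1
            total += hits / rank  # precision at this positive's rank
    return total / n_pos


# Hypothetical residue samples: 1 = accelerant present
labels = [1, 0, 1, 0]
scores = [0.9, 0.8, 0.7, 0.1]
ap = average_precision(labels, scores)
```

Unlike accuracy, this score depends only on how positives are ranked, which makes it informative when accelerant-positive samples are a minority.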

Efficient Prediction in the Semi-parametric Non-linear Mixed effect Model

  • So, Beong-Soo
    • Journal of the Korean Statistical Society / Vol. 28, No. 2 / pp. 225-234 / 1999
  • We consider the following semi-parametric non-linear mixed effect regression model: $y_i = f(\chi_i;\beta) + \sigma\mu(\chi_i) + \sigma\varepsilon_i$, $i = 1, \ldots, n$, and $y^* = f(\chi;\beta) + \sigma\mu(\chi)$, where $y' = (y_1, \ldots, y_n)$ is a vector of $n$ observations, $y^*$ is an unobserved new random variable of interest, $f(\chi;\beta)$ represents the fixed effect of known functional form containing the unknown parameter vector $\beta' = (\beta_1, \ldots, \beta_p)$, $\mu(\chi)$ is a random function with mean zero and known covariance function $r(\cdot,\cdot)$, $\varepsilon' = (\varepsilon_1, \ldots, \varepsilon_n)$ is the set of uncorrelated measurement errors with zero mean and unit variance, and $\sigma$ is an unknown dispersion (scale) parameter. Within a finite-sample, small-dispersion asymptotic framework, we derive an absolute lower bound for the asymptotic mean squared error of prediction (AMSEP) of regular-consistent non-linear predictors of the new random variable of interest $y^*$. We then construct an optimal predictor of $y^*$ that attains the lower bound irrespective of the distributions of the random effect $\mu(\cdot)$ and the measurement errors $\varepsilon$.
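
For orientation, the classical best linear predictor for this model can be written down under two assumptions not stated in the abstract: $\mu(\cdot)$ and $\varepsilon$ are uncorrelated, and $\beta$ is replaced by a plug-in estimate $\hat\beta$. This kriging-type predictor is a sketch for comparison, not the optimal predictor constructed in the paper. With $R_{ij} = r(\chi_i, \chi_j)$ and $r_{*,i} = r(\chi, \chi_i)$, the model gives $\operatorname{Cov}(y) = \sigma^2 (R + I_n)$ and $\operatorname{Cov}(y^*, y) = \sigma^2 r_*'$, so

$$
\hat{y}^* = f(\chi;\hat\beta) + r_*' \, (R + I_n)^{-1} \bigl( y - F(\hat\beta) \bigr),
\qquad F(\hat\beta) = \bigl( f(\chi_1;\hat\beta), \ldots, f(\chi_n;\hat\beta) \bigr)'.
$$

The common factor $\sigma^2$ cancels, so this predictor does not require knowledge of the unknown scale parameter $\sigma$.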


Bayesian Conway-Maxwell-Poisson (CMP) regression for longitudinal count data

  • Morshed Alam;Yeongjin Gwon;Jane Meza
    • Communications for Statistical Applications and Methods / Vol. 30, No. 3 / pp. 291-309 / 2023
  • Longitudinal count data are widely collected in biomedical research, public health, and clinical trials. Because these measurements are repeated over time on the same subjects, an appropriate dependency structure must be accounted for. The Poisson regression model is the first choice for modeling the expected count of interest; however, it may not be appropriate when the data exhibit over-dispersion or under-dispersion. Recently, the Conway-Maxwell-Poisson (CMP) distribution has become popular because it offers the flexibility to capture a wide range of dispersion in the data. In this article, we propose a Bayesian CMP regression model to accommodate over- and under-dispersion in modeling longitudinal count data. Specifically, we develop a regression model with a random intercept and slope to capture subject heterogeneity and to allow covariate effects to differ across subjects. We implement Bayesian computation via a Hamiltonian MCMC (HMCMC) algorithm for posterior sampling and then compute Bayesian model assessment measures for model comparison. Simulation studies are conducted to assess the accuracy and effectiveness of our methodology, and its usefulness is demonstrated with the well-known epilepsy data example.
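
The dispersion flexibility described above comes from the CMP probability mass function $P(Y = y) = \lambda^y / \bigl((y!)^\nu Z(\lambda, \nu)\bigr)$ with $Z(\lambda, \nu) = \sum_{j \ge 0} \lambda^j / (j!)^\nu$. The sketch below evaluates it with the series truncated at an assumed `max_terms`; $\nu = 1$ recovers the Poisson distribution, $\nu > 1$ gives under-dispersion, and $\nu < 1$ over-dispersion. It covers only the distribution, not the paper's Bayesian regression with random effects.

```python
import math


def cmp_log_weight(y, lam, nu):
    """log of lam**y / (y!)**nu, computed via lgamma for stability."""
    return y * math.log(lam) - nu * math.lgamma(y + 1)


def cmp_log_normalizer(lam, nu, max_terms=200):
    """log Z(lam, nu), with the series truncated at max_terms (log-sum-exp)."""
    logs = [cmp_log_weight(j, lam, nu) for j in range(max_terms)]
    m = max(logs)
    return m + math.log(sum(math.exp(v - m) for v in logs))


def cmp_pmf(y, lam, nu, max_terms=200):
    """CMP probability P(Y = y); requires lam > 0 (and lam < 1 if nu == 0)."""
    return math.exp(cmp_log_weight(y, lam, nu) - cmp_log_normalizer(lam, nu, max_terms))


def cmp_mean_var(lam, nu, max_terms=200):
    """Mean and variance computed over the truncated support."""
    log_z = cmp_log_normalizer(lam, nu, max_terms)
    probs = [math.exp(cmp_log_weight(j, lam, nu) - log_z) for j in range(max_terms)]
    mean = sum(j * p for j, p in enumerate(probs))
    var = sum((j - mean) ** 2 * p for j, p in enumerate(probs))
    return mean, var


poisson_like = cmp_mean_var(3.0, 1.0)  # nu = 1: Poisson(3)
under = cmp_mean_var(3.0, 2.0)         # nu > 1: variance below mean
over = cmp_mean_var(3.0, 0.5)          # nu < 1: variance above mean
```

In a regression setting, covariates and subject-level random effects would enter through $\log \lambda$; this sketch only illustrates how $\nu$ controls dispersion.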

The Two-Stage Least Squares Regression of the Interplay between Education and Local Roads on Foreign Direct Investment in the Philippines

  • DIZON, Ricardo Laurio;CRUZ, Zita Ann Escabarte
    • The Journal of Asian Finance, Economics and Business / Vol. 7, No. 4 / pp. 121-131 / 2020
  • This study investigates the interplay between education and local roads in relation to Foreign Direct Investment (FDI) in the Philippines, using economic growth as an instrument. The study uses a quantitative research design applying both descriptive and inferential statistics. A Two-Stage Least Squares regression model was combined with three panel regression approaches (pooled least squares, the fixed effects model, and the random effects model) to study the effects of education and local roads on foreign direct investment in the Philippines. Based on the fixed effects regression results, higher education graduates and local road investments, as conditioned by economic growth, were significant factors for increasing foreign direct investment in the Philippines. Specifically, a unit increase in higher education graduates, as conditioned by economic growth, leads to an 8.758-unit increase in foreign direct investment, while a unit increase in local road investments, as conditioned by economic growth, leads to a 0.002 decrease in foreign direct investment. The regression results also suggest that foreign direct investment in regions such as CAR, I, II, IV-B, V, VIII, IX, X, XI, XII, XIII, and ARMM is higher than in Region IV-A.
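
The two stages of a Two-Stage Least Squares regression can be sketched for the simplest case of one endogenous regressor and one instrument. This is a generic illustration with hypothetical data, not the study's panel specification with interaction terms: stage one regresses the regressor `x` on the instrument `z`, and stage two regresses the outcome `y` on the fitted values.

```python
def ols(x, y):
    """Intercept and slope of a simple OLS regression of y on x."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    slope = (sum((a - mx) * (b - my) for a, b in zip(x, y))
             / sum((a - mx) ** 2 for a in x))
    return my - slope * mx, slope


def two_stage_least_squares(z, x, y):
    """2SLS point estimates with one endogenous regressor x and one instrument z.

    Note: standard errors from a naive second-stage OLS would be wrong;
    only the point estimates are computed here.
    """
    a, b = ols(z, x)               # stage 1: x on the instrument z
    x_hat = [a + b * zi for zi in z]
    return ols(x_hat, y)           # stage 2: y on fitted x


# Hypothetical data: u is an unobserved confounder, uncorrelated with z
z = [1.0, 2.0, 3.0, 4.0, 5.0]
u = [1.0, -1.0, 0.0, -1.0, 1.0]
x = [zi + ui for zi, ui in zip(z, u)]
y = [3.0 * xi + 2.0 * ui for xi, ui in zip(x, u)]

naive = ols(x, y)                       # biased by the confounder
iv = two_stage_least_squares(z, x, y)   # recovers the true slope of 3
```

Because the instrument affects `y` only through `x`, the second-stage slope isolates the causal effect that naive OLS confounds with `u`.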