• Title/Summary/Keyword: Random forests

Application of a comparative analysis of random forest programming to predict the strength of environmentally-friendly geopolymer concrete

  • Ying Bi;Yeng Yi
    • Steel and Composite Structures / v.50 no.4 / pp.443-458 / 2024
  • The construction industry, one of the largest producers of greenhouse gas emissions, is under considerable pressure because of growing concern about how climate change may affect local communities. Geopolymer concrete (GPC) has emerged as a feasible construction material in response to the environmental issues associated with cement manufacture. The findings of this study contribute to the development of machine learning methods for estimating the properties of eco-friendly concrete, which might be used in lieu of traditional concrete to reduce CO2 emissions in the building industry. In the present work, the compressive strength (fc) of GPC is estimated using random forests regression (RFR), where natural zeolite (NZ) and silica fume (SF) replace ground granulated blast-furnace slag (GGBFS). A thorough set of experimental results on GPC samples, totaling 254 data rows, was compiled from the literature. The RFR model was integrated with artificial hummingbird optimization (AHA), the black widow optimization algorithm (BWOA), and the chimp optimization algorithm (ChOA), abbreviated as ARFR, BRFR, and CRFR, respectively. The RFR models demonstrated satisfactory performance across all evaluation metrics. For the $R^2$ metric, the CRFR model attained 0.9988 and 0.9981 on the training and test sets, higher than BRFR (0.9982 and 0.9969), followed by ARFR (0.9971 and 0.9956). Some other error and distribution metrics showed a roughly 50% improvement for CRFR with respect to ARFR.
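
The abstract above compares models by their $R^2$ on training and test sets. As a minimal sketch of how that metric is commonly computed (coefficient of determination, 1 - SS_res / SS_tot), the function below uses only the standard library. The function name `r_squared` and the sample values are hypothetical illustrations, not the paper's code or data.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_y = sum(y_true) / len(y_true)
    # Residual sum of squares: error left after prediction.
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    # Total sum of squares: spread of the observations around their mean.
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when y_true is constant")
    return 1.0 - ss_res / ss_tot


# Hypothetical compressive strengths in MPa, for illustration only.
measured = [30.0, 40.0, 50.0, 60.0]
predicted = [31.0, 39.0, 51.0, 59.0]
score = r_squared(measured, predicted)
```

Computing the score separately on held-out data, as the abstract does, is what makes the train and test values comparable.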

Simple Graphs for Complex Prediction Functions

  • Huh, Myung-Hoe;Lee, Yong-Goo
    • Communications for Statistical Applications and Methods / v.15 no.3 / pp.343-351 / 2008
  • By supervised learning with $p$ predictors, we frequently obtain a prediction function of the form $y = f(x_1, \ldots, x_p)$. When $p \geq 3$, it is not easy to understand the inner structure of $f$, except when the function is formulated as additive. In this study, we propose to use $p$ simple graphs for visual understanding of complex prediction functions produced by several supervised learning engines such as LOESS, neural networks, support vector machines and random forests.

Evaluating the quality of baseball pitch using PITCHf/x (PITCHf/x를 이용한 투구의 질 평가)

  • Park, Sungmin;Jang, Woncheol
    • The Korean Journal of Applied Statistics / v.33 no.2 / pp.171-184 / 2020
  • Major League Baseball (MLB) records and releases trajectory data for every pitch, called PITCHf/x, using three high-speed cameras installed in every stadium. In a previous study, the quality of a pitch was assessed as the expected number of bases yielded using PITCHf/x data. However, the number of bases yielded does not always lead to baseball scores, or runs. In this paper, we assess the quality of a pitch by combining the baseball analytics metrics Run Expectancy and Run Value using a Random Forests model. We compare the quality of pitches evaluated with Run Value to the quality of pitches evaluated with the expected number of bases yielded.

Language Matters: A Systemic Functional Linguistics-Enhanced Machine Learning Framework for Cyberbullying Detection

  • Raghad Altowairgi;Ala Eshamwi;Lobna Hsairi
    • International Journal of Computer Science & Network Security / v.23 no.9 / pp.192-198 / 2023
  • Cyberbullying is a growing problem among adolescents and can have serious psychological and emotional consequences for the victims. In recent years, machine learning techniques have emerged as a promising approach for detecting instances of cyberbullying in online communication. This research paper focuses on developing machine learning models that can detect cyberbullying, including support vector machines (SVM), naïve Bayes, and random forests. The study uses a dataset of real-world examples of cyberbullying collected from Twitter and extracts features that represent the ideational metafunction, then evaluates the performance of each algorithm before and after considering the theory of systemic functional linguistics in terms of precision, recall, and F1-score. The results indicate that all three algorithms are effective at detecting cyberbullying, with an accuracy of 92% for naïve Bayes and 93% for both SVM and random forests. However, the study also highlights the challenges of accurately detecting cyberbullying, particularly given the nuanced and context-dependent nature of online communication. The paper concludes by discussing the implications of these findings for future research and the development of practical tools for cyberbullying prevention and intervention.
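
The abstract above evaluates classifiers by precision, recall, and F1-score. The following is a minimal sketch of those standard definitions for a binary task, written with the standard library only. The function name `precision_recall_f1` and the toy labels are hypothetical and do not come from the paper.

```python
def precision_recall_f1(y_true, y_pred, positive=1):
    """Return (precision, recall, F1) for the class labelled `positive`."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    # Guard against empty denominators by reporting 0.0.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 is the harmonic mean of precision and recall.
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return precision, recall, f1


# Toy labels (1 = bullying, 0 = not), for illustration only.
labels = [1, 1, 1, 0, 0, 0, 1, 0]
preds = [1, 1, 0, 0, 1, 0, 1, 0]
precision, recall, f1 = precision_recall_f1(labels, preds)
```

Accuracy alone can look strong on imbalanced data, which is one reason these three metrics are usually reported alongside it.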

A decision-centric assessment of flood risk and supply reliability at a multi-purpose reservoir under climate change (의사결정중심 다목적댐 이치수 안전도 기후변화 영향평가)

  • Kim, Daeha;Kim, Eunhee;Lee, Seung Cheol;Kim, Eunji
    • Proceedings of the Korea Water Resources Association Conference / 2022.05a / pp.112-112 / 2022
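  • This study evaluated how vulnerable the 2005-2020 operating scheme of Yongdam Dam is to climate change, focusing on indicators of flood risk and water supply reliability. The GR6J rainfall-runoff model was used to simulate inflow, and a Random Forests model was fitted to observed data to extract the dam operating rules. A set of 294 stochastic climate stress time series was fed into the GR6J model to simulate daily inflow, after which the Random Forests model estimated release and storage, and the annual maximum daily release and supply reliability were analyzed. Supply reliability was found to be affected mainly by changes in mean precipitation, whereas the annual maximum release responded sensitively to changes in both mean precipitation and precipitation variability. For 2021-2040, supply reliability at Yongdam Dam was projected to rise excessively because of increased mean precipitation. However, because of increased precipitation variability, the 20-year return-period annual maximum release was projected to rise steeply, further aggravating flood risk in the area downstream of the dam.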

An Efficient Pedestrian Detection Approach Using a Novel Split Function of Hough Forests

  • Do, Trung Dung;Vu, Thi Ly;Nguyen, Van Huan;Kim, Hakil;Lee, Chongho
    • Journal of Computing Science and Engineering / v.8 no.4 / pp.207-214 / 2014
  • In pedestrian detection applications, one of the most popular frameworks to have received extensive attention in recent years is the 'Hough forest' (HF). To improve detection accuracy, this paper proposes a novel split function that exploits the statistical information of the training set stored in each node during the construction of the forest. The proposed split function makes the trees in the forest more robust to noise and illumination changes. Moreover, the errors of each stage in the training forest are minimized using a global loss function to help the trees track harder training samples. Once the forest is trained, the standard HF detector is applied to search for and localize instances in the image. Experimental results showed that the detection performance of the proposed framework improved significantly with respect to the standard HF and the alternating decision forest (ADF) on some public datasets.

Comparison of Frequencies in Order to Estimate of Tree Species Diversity in Caspian Forests of Iran

  • Mirzaei, Mehrdad;Bahnemiry, Atefeh Karimiyan;Abkenar, Kambiz Taheri
    • Journal of Forest and Environmental Science / v.35 no.1 / pp.1-5 / 2019
  • Species diversity is one of the most important indices used to evaluate the sustainability of forest communities. In the present study, three variables, namely number of individuals (frequency of species), basal area, and volume of tree species, were compared for estimating tree species diversity in the broadleaf forests of Iran. Based on a systematic random design, 30 circular plots ($1000\,\mathrm{m}^2$ each) were selected. Species type, number of species, DBH, and tree height were measured. The Simpson ($1-D$), Hill ($N_2$), Shannon-Wiener ($H'$), MacArthur ($N_1$), Smith-Wilson ($E_{var}$), and Margalef ($R_1$) indices were used to estimate tree species diversity. Species diversity was calculated in each plot. An ANOVA test showed a significant difference among the three variables used to estimate species diversity. The number-of-trees variable was more precise than the basal area and volume variables for estimating species diversity. However, the Duncan test revealed significant differences between the basal area and volume variables and the number of trees. Therefore, the basal area and volume variables were selected as more suitable for estimating biodiversity indices in the northern forests of Iran.
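
Several of the indices named in the abstract above have short closed forms. The sketch below computes them for one plot from per-species abundances, assuming the common textbook definitions: Simpson's $D = \sum p_i^2$, Shannon-Wiener $H' = -\sum p_i \ln p_i$, Hill's $N_2 = 1/D$, MacArthur's $N_1 = e^{H'}$, and Margalef's $R_1 = (S-1)/\ln N$. The paper may use other variants (for example, a finite-sample Simpson estimator), and the Smith-Wilson evenness index is omitted here. The function name `diversity_indices` and the sample counts are hypothetical.

```python
import math


def diversity_indices(counts):
    """Diversity indices for one plot, given per-species abundances."""
    counts = [c for c in counts if c > 0]  # ignore absent species
    if not counts:
        raise ValueError("at least one species must be present")
    n_total = sum(counts)       # N: total abundance in the plot
    n_species = len(counts)     # S: species richness
    props = [c / n_total for c in counts]
    simpson_d = sum(p * p for p in props)
    shannon_h = -sum(p * math.log(p) for p in props)
    # Margalef's index needs ln N > 0, so fall back to 0.0 when N <= 1.
    margalef = (n_species - 1) / math.log(n_total) if n_total > 1 else 0.0
    return {
        "simpson_1_minus_D": 1.0 - simpson_d,
        "hill_N2": 1.0 / simpson_d,
        "shannon_H": shannon_h,
        "macarthur_N1": math.exp(shannon_h),
        "margalef_R1": margalef,
    }


# Hypothetical plot: four species with ten trees each.
plot = diversity_indices([10, 10, 10, 10])
```

The same function can be fed basal area or volume per species instead of tree counts for the abundance-based indices, which is the kind of comparison the study describes. Margalef's $R_1$ depends on the total number of individuals and is only meaningful for counts.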

Coreference Resolution for Korean Using Random Forests (랜덤 포레스트를 이용한 한국어 상호참조 해결)

  • Jeong, Seok-Won;Choi, MaengSik;Kim, HarkSoo
    • KIPS Transactions on Software and Data Engineering / v.5 no.11 / pp.535-540 / 2016
  • Coreference resolution identifies mentions in documents and groups the mentions that refer to the same entity. It is an essential step for natural language processing applications such as information extraction, event tracking, and question answering. Recently, various coreference resolution models based on machine learning (ML) have been proposed. As is well known, these ML-based models need large training data that are manually annotated with coreferring mention tags. Unfortunately, we cannot find usable open data for training ML-based models in Korean. Therefore, we propose an efficient coreference resolution model that needs less training data than other ML-based models. The proposed model identifies coreferring mentions using random forests based on sieve-guided features. In experiments with baseball news articles, the proposed model achieved a CoNLL F1-score of 0.6678, better than other ML-based models.

Classification Abnormal temperatures based on Meteorological Environment using Random forests (랜덤포레스트를 이용한 기상 환경에 따른 이상기온 분류)

  • Youn Su Kim;Kwang Yoon Song;In Hong Chang
    • Journal of Integrative Natural Science / v.17 no.1 / pp.1-12 / 2024
  • Many abnormal climate events are occurring around the world. The cause of abnormal climate is related to temperature. Factors that affect temperature include excessive emissions of carbon and greenhouse gases from a global perspective, and air circulation from a local perspective. Because of air circulation, many abnormal climate phenomena, such as abnormally high and abnormally low temperatures, are occurring in certain areas and can cause very serious harm to people. Therefore, the problem of abnormal temperature should not be approached only as a case of climate change, but should be studied as a new category of climate crisis. In this study, we propose a model for classifying abnormal temperature using random forests, based on various meteorological data, such as longitudinal observations, yellow dust, and ultraviolet radiation, from 2018 to 2022 for each region in Korea. The meteorological data were imbalanced, and this imbalance was addressed by oversampling. As a result, we found that the variables affecting abnormal temperature differ from region to region. In particular, the central and southern regions are influenced by high pressure (Mainland China, Siberian high pressure, and North Pacific high pressure) because of their regional characteristics, so pressure-related variables had a significant impact on the classification of abnormal temperature. This suggests that a regional approach can be taken to predict abnormal temperatures from the surrounding meteorological environment. In addition, when an abnormal temperature is expected, it appears possible to take preventive measures in advance according to regional characteristics.
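
The abstract above states that class imbalance was handled by oversampling but does not name the method. As one common possibility, the sketch below shows plain random oversampling: minority-class rows are duplicated at random, with replacement, until every class matches the largest one. This is an assumption for illustration, not the authors' procedure. The function name `random_oversample`, the feature rows, and the class labels are hypothetical.

```python
import random
from collections import Counter


def random_oversample(rows, labels, seed=0):
    """Duplicate minority-class rows at random until all classes are equal."""
    if not rows or len(rows) != len(labels):
        raise ValueError("rows and labels must be non-empty and equal length")
    rng = random.Random(seed)  # local generator keeps results reproducible
    by_class = {}
    for row, label in zip(rows, labels):
        by_class.setdefault(label, []).append(row)
    target = max(len(members) for members in by_class.values())
    out_rows, out_labels = [], []
    for label, members in by_class.items():
        # Draw the missing rows with replacement from the same class.
        extra = [rng.choice(members) for _ in range(target - len(members))]
        for row in members + extra:
            out_rows.append(row)
            out_labels.append(label)
    return out_rows, out_labels


# Hypothetical daily temperature features with imbalanced classes.
rows = [[20.1], [21.3], [19.8], [22.0], [35.2], [-12.4]]
labels = ["normal", "normal", "normal", "normal", "high", "low"]
balanced_rows, balanced_labels = random_oversample(rows, labels)
class_counts = Counter(balanced_labels)
```

Oversampling is normally applied to the training split only, so that duplicated rows do not leak into the data used for evaluation.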

A simple diagnostic statistic for determining the size of random forest (랜덤포레스트의 크기 결정을 위한 간편 진단통계량)

  • Park, Cheolyong
    • Journal of the Korean Data and Information Science Society / v.27 no.4 / pp.855-863 / 2016
  • In this study, a simple diagnostic statistic for determining the size of a random forest is proposed. The method is based on MV (margin of victory), a scaled difference in the votes at the infinite forest between the first and second most popular categories of the current random forest. If MV is negative, there is a discrepancy between the current and infinite forests. More precisely, our method is based on the proportion of cases in which -MV is greater than a fixed small positive number (say, 0.03). We derive an appropriate diagnostic statistic for our method and then calculate the distribution of the statistic. A simulation study is performed to compare our method with a recently proposed diagnostic statistic.