• Title/Summary/Keyword: Determination of random forest size


A measure of discrepancy based on margin of victory useful for the determination of random forest size (랜덤포레스트의 크기 결정에 유용한 승리표차에 기반한 불일치 측도)

  • Park, Cheolyong
    • Journal of the Korean Data and Information Science Society / v.28 no.3 / pp.515-524 / 2017
  • This study proposes a measure of discrepancy based on the margin of victory (MV) that may be useful in determining the size of a random forest for classification. Here, MV is a scaled difference, in the infinite random forest, between the votes for the two most popular classes of the current random forest. More specifically, max(-MV, 0) is proposed as a reasonable measure of discrepancy, since a negative MV value indicates a discrepancy in the two most popular classes between the current and infinite random forests. We propose a diagnostic statistic based on this measure for determining random forest size and derive its asymptotic distribution. Finally, a simulation study compares the finite-sample performance of the proposed statistic with that of other recently proposed diagnostic statistics.
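The discrepancy measure max(-MV, 0) described in this abstract can be illustrated with a minimal sketch. The abstract calls MV a scaled difference but does not state the scaling, so the code below assumes the raw difference in vote proportions. The infinite forest is not observable in practice, so `limit_props` is a hypothetical input standing in for its vote shares; all function names are illustrative and are not taken from the paper.

```python
def top_two(counts):
    """Return the two classes with the most votes in the current forest."""
    ranked = sorted(counts, key=counts.get, reverse=True)
    return ranked[0], ranked[1]


def margin_of_victory(current_counts, limit_props):
    """Difference in limiting vote shares between the current top two classes.

    Assumption: no scaling is applied, because the abstract does not
    specify how the difference is scaled.
    """
    first, second = top_two(current_counts)
    return limit_props[first] - limit_props[second]


def discrepancy(mv):
    """max(-MV, 0): positive only when MV is negative."""
    return max(-mv, 0.0)


# Hypothetical example: class "a" leads in the current forest,
# but class "b" leads in the assumed limiting vote shares.
current_counts = {"a": 6, "b": 4, "c": 0}
limit_props = {"a": 0.45, "b": 0.50, "c": 0.05}
mv = margin_of_victory(current_counts, limit_props)
print(mv, discrepancy(mv))
```

With these made-up numbers MV is negative, so the discrepancy is positive; a case with a non-negative MV would give a discrepancy of zero.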

A simple diagnostic statistic for determining the size of random forest (랜덤포레스트의 크기 결정을 위한 간편 진단통계량)

  • Park, Cheolyong
    • Journal of the Korean Data and Information Science Society / v.27 no.4 / pp.855-863 / 2016
  • This study proposes a simple diagnostic statistic for determining the size of a random forest. The method is based on the margin of victory (MV), a scaled difference in the votes at the infinite forest between the first and second most popular categories of the current random forest. If MV is negative, there is a discrepancy between the current and infinite forests. More precisely, our method is based on the proportion of cases in which -MV is greater than a fixed small positive number (say, 0.03). We derive an appropriate diagnostic statistic for our method and then calculate the distribution of the statistic. A simulation study is performed to compare our method with a recently proposed diagnostic statistic.
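As a rough illustration of the quantity this abstract describes, the sketch below computes the proportion of cases whose -MV exceeds a small threshold. It assumes the per-case MV values are already available; how they are estimated, and the diagnostic statistic the paper builds on this proportion, are not given in the abstract. The function name and the sample values are hypothetical.

```python
def proportion_exceeding(mv_values, threshold=0.03):
    """Share of cases for which -MV is greater than the threshold."""
    if not mv_values:
        raise ValueError("mv_values must not be empty")
    hits = sum(1 for mv in mv_values if -mv > threshold)
    return hits / len(mv_values)


# Hypothetical MV values for five cases.
mv_values = [0.2, -0.05, -0.01, 0.0, -0.10]
print(proportion_exceeding(mv_values))
```

The default of 0.03 follows the example value quoted in the abstract; any other small positive number can be passed instead.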

Tree size determination for classification ensemble

  • Choi, Sung Hoon;Kim, Hyunjoong
    • Journal of the Korean Data and Information Science Society / v.27 no.1 / pp.255-264 / 2016
  • Classification is predictive modeling for a categorical target variable. Classification ensemble methods, which improve predictive accuracy by combining multiple classifiers, have become a powerful machine learning and data mining paradigm. Well-known classification ensemble methods are boosting, bagging, and random forest. In this article, we assume that decision trees are used as the classifiers in the ensemble, and we hypothesize that tree size affects classification accuracy. To study how tree size influences accuracy, we performed experiments on twenty-eight data sets and compared the performance of four ensemble algorithms (bagging, double-bagging, boosting, and random forest) with different tree sizes.

Current Status and Potentiality of Forest Resources in a Proposed Biodiversity Conservation Area of Bangladesh

  • Rana, Md. Parvez;Uddin, Mohammed Salim;Chowdhury, Mohammad Shaheed Hossain;Sohel, Md. Shawkat Islam;Akhter, Sayma;Koike, Masao
    • Journal of Forest and Environmental Science / v.25 no.3 / pp.167-175 / 2009
  • An exploratory study was conducted in Juri Forest Range-2, a proposed biodiversity conservation area of Bangladesh, to explore the present growing stock of trees, the regeneration condition, and the status of non-timber forest products (NTFPs). This conservation area, which contains both natural and artificial plantations, was sampled using a multistage random sampling method. Quadrat sizes of $10\,\mathrm{m} \times 10\,\mathrm{m}$ for tree stock measurement, $2\,\mathrm{m} \times 2\,\mathrm{m}$ for the regeneration survey, and $20\,\mathrm{m} \times 20\,\mathrm{m}$ for the NTFP survey were used. In the tree stock survey, 14 species from eight families were found; Tectona grandis had an average of 624 stems/ha and a basal area of 10.36 $\mathrm{m}^2/\mathrm{ha}$, followed by Acacia auriculiformis (0.2 $\mathrm{m}^2/\mathrm{ha}$ and 637 stems/ha) and Gmelina arborea (0.2 $\mathrm{m}^2/\mathrm{ha}$ and 600 stems/ha). In the regeneration survey, 14 species belonging to 9 families were found, of which Alstonia scholaris had the highest density (3,750 seedlings per hectare). Among NTFPs, bamboo and cane are the most common resources. Over the last ten years, total timber output was 128,596.14 cubic feet and total revenue was US$464,434. The vacant area is 1,335.5 acres, which constitutes 14% of the total area. If this vacant area is planted with suitable species and proper steps are taken for their management, it will become a biologically diverse area.
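Per-hectare figures such as stems/ha and basal area/ha are commonly obtained by scaling plot-level totals by the plot area. The sketch below shows that conversion for a 10 m × 10 m quadrat, assuming basal area is computed from diameter at breast height (dbh, in cm) as the area of a circle. The abstract does not state how the authors computed these values, so the formulas and function names here are assumptions for illustration only.

```python
import math

SQUARE_METRES_PER_HECTARE = 10_000.0


def basal_area_m2(dbh_cm):
    """Cross-sectional stem area in m^2 from dbh in cm (circle assumed)."""
    radius_m = dbh_cm / 200.0  # cm diameter -> m radius
    return math.pi * radius_m ** 2


def per_hectare(plot_total, plot_area_m2):
    """Scale a plot-level total to a per-hectare value."""
    return plot_total * SQUARE_METRES_PER_HECTARE / plot_area_m2


def stand_summary(dbh_list_cm, plot_area_m2=100.0):
    """Return (stems per ha, basal area in m^2 per ha) for one plot."""
    stems = per_hectare(len(dbh_list_cm), plot_area_m2)
    basal = per_hectare(sum(basal_area_m2(d) for d in dbh_list_cm), plot_area_m2)
    return stems, basal


# Hypothetical plot: two trees of 20 cm dbh in a 10 m x 10 m quadrat.
print(stand_summary([20.0, 20.0]))
```

The default plot area of 100 m² matches the tree-stock quadrat size given in the abstract; the dbh values are made up.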


Modelling Stem Diameter Variability in Pinus caribaea (Morelet) Plantations in South West Nigeria

  • Adesoye, Peter Oluremi
    • Journal of Forest and Environmental Science / v.32 no.3 / pp.280-290 / 2016
  • Stem diameter variability is an essential inventory result that provides useful information for forest management decisions. Little has been done to explore the modelling potential of the standard deviation (SDD) and coefficient of variation (CVD) of diameter at breast height (dbh). This study therefore aimed to develop and test models for predicting SDD and CVD in stands of Pinus caribaea Morelet (pine) in south west Nigeria. Sixty temporary sample plots of size $20\,\mathrm{m} \times 20\,\mathrm{m}$, ranging in age between 15 and 37 years, were sampled, covering the entire range of pine in south west Nigeria. The dbh (cm), total and merchantable heights (m), number of stems, and age of trees were measured within each plot. Basal area ($m^2$), site index (m), relative spacing, and the $24^{th}$, $63^{rd}$, $76^{th}$ and $93^{rd}$ percentiles of dbh (i.e. $P_{24}$, $P_{63}$, $P_{76}$ and $P_{93}$) were computed from the measured variables for each plot. A linear mixed model (LMM) was used to test the effects of locations (fixed) and plots (random). Six candidate models (3 for SDD and 3 for CVD) were developed using three categories of explanatory variables: (i) stand size measures only, (ii) distribution measures, and (iii) a combination of (i) and (ii). The best model was chosen on the basis of smaller relative standard error (RSE), prediction residual sum of squares (PRESS), and corrected Akaike Information Criterion ($AIC_c$), and a larger coefficient of determination ($R^2$). The results of the LMM indicated that location and plot effects were not significant. The CVD and SDD models with only percentile measures (i.e. $P_{24}$ and $P_{93}$) as predictors produced better predictions than the others. However, the CVD model produced the best predictions overall, because of its lower RSE and its stability in measuring variability across different stages of stand development. The results demonstrate the potential of CVD for modelling stem diameter variability in relation to percentile variables.
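The two response variables and one of the selection criteria named in this abstract have standard textbook definitions, sketched below. The sketch assumes the sample standard deviation (n - 1 denominator), CVD expressed as a percentage, and the usual small-sample correction $AIC_c = AIC + 2k(k+1)/(n-k-1)$; the abstract does not say which variants the author used, and the function names and data are hypothetical.

```python
import statistics


def sdd(dbh_values):
    """Standard deviation of dbh within a plot (sample SD assumed)."""
    return statistics.stdev(dbh_values)


def cvd(dbh_values):
    """Coefficient of variation of dbh, as a percentage of the mean."""
    return 100.0 * statistics.stdev(dbh_values) / statistics.mean(dbh_values)


def aicc(aic, k, n):
    """Small-sample corrected AIC for k estimated parameters and n observations."""
    if n - k - 1 <= 0:
        raise ValueError("n must exceed k + 1 for AICc to be defined")
    return aic + (2.0 * k * (k + 1)) / (n - k - 1)


# Hypothetical dbh measurements (cm) for one plot.
plot_dbh = [10.0, 20.0, 30.0]
print(sdd(plot_dbh), cvd(plot_dbh))

# Hypothetical AIC of 100 for a 3-parameter model fitted to 60 plots.
print(aicc(100.0, k=3, n=60))
```

Because CVD divides the spread by the mean, it is unit-free, which is one reason a coefficient of variation can be compared across stands of different average diameter.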

Homestead Plant Species Diversity and Its Contribution to the Household Economy: a Case Study from Northern Part of Bangladesh

  • Kibria, Mohammad Golam;Anik, Sawon Istiak
    • Journal of Forest and Environmental Science / v.26 no.1 / pp.9-15 / 2010
  • This paper analyzes data on plant species diversity and its contribution to the livelihoods of rural people in five villages of Domar upazila, Nilphamari district, Bangladesh. The assessment was done by means of multistage random sampling. Information was collected from a total of 40 households in small, medium, and large categories. A total of 52 plant species belonging to 34 families were identified as being important to local livelihoods. Fruit (37%), timber (23%), and medicinal (17%) species were the most important plant use categories. Determination of the relative density of the different species revealed that Areca catechu constitutes 19.17% of the homestead vegetation of the area, followed by Artocarpus heterophyllus at 10.34%. The Margalef index showed no major difference across the size classes (5.11 for large, 5.49 for medium, 4.73 for small), and the Shannon-Wiener index of the study area varies from 2.75 to 2.98. Results show that the average annual homestead income varied from US$108.69 to US$291.67 and contributed 6.63% of household income.
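The diversity measures named in this abstract can be sketched from a list of per-species individual counts. The code below uses the common textbook forms: relative density as a percentage of all individuals, the Margalef index $(S-1)/\ln N$, and the Shannon-Wiener index $-\sum p_i \ln p_i$. The natural logarithm is an assumption, since the abstract does not state the log base, and the counts are made up.

```python
import math


def relative_density(counts):
    """Percentage of all individuals contributed by each species."""
    total = sum(counts)
    return [100.0 * c / total for c in counts]


def margalef_index(counts):
    """(S - 1) / ln(N), with S species present and N individuals in total."""
    species = sum(1 for c in counts if c > 0)
    total = sum(counts)
    return (species - 1) / math.log(total)


def shannon_wiener_index(counts):
    """-sum(p_i * ln(p_i)) over species with at least one individual."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


# Hypothetical homestead with four equally abundant species.
counts = [10, 10, 10, 10]
print(relative_density(counts))
print(margalef_index(counts), shannon_wiener_index(counts))
```

For equally abundant species the Shannon-Wiener index reduces to $\ln S$, which gives a quick sanity check on any implementation.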

Analysis of Feature Importance of Ship's Berthing Velocity Using Classification Algorithms of Machine Learning (머신러닝 분류 알고리즘을 활용한 선박 접안속도 영향요소의 중요도 분석)

  • Lee, Hyeong-Tak;Lee, Sang-Won;Cho, Jang-Won;Cho, Ik-Soon
    • Journal of the Korean Society of Marine Environment & Safety / v.26 no.2 / pp.139-148 / 2020
  • The most important factor affecting the berthing energy generated when a ship berths is the berthing velocity. Thus, an accident may occur if the berthing velocity is extremely high. Several ship features influence the determination of the berthing velocity. However, previous studies have mostly focused on the size of the vessel. Therefore, the aim of this study is to analyze various features that influence berthing velocity and determine their respective importance. The data used in the analysis were based on the berthing velocity of ships at a jetty in Korea. Using the collected data, machine learning classification algorithms such as decision tree, random forest, logistic regression, and perceptron were compared and analyzed. Indices derived from the confusion matrix were used to evaluate the algorithms. Perceptron demonstrated the best performance, and the feature importance was in the following order: DWT, jetty number, and state. Hence, when berthing a ship, the berthing velocity should be determined in consideration of various features, such as the size of the ship, the position of the jetty, and the loading condition of the cargo.
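The abstract does not list which confusion-matrix indices were used, so the sketch below shows four common ones (accuracy, precision, recall, and F1) for a binary labelling. The binary setup, the label values, and the function names are assumptions for illustration; the study's own class definitions are not given in the abstract.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count true/false positives and negatives for one positive label."""
    tp = fp = fn = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive and truth == positive:
            tp += 1
        elif pred == positive:
            fp += 1
        elif truth == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def classification_metrics(tp, fp, fn, tn):
    """Accuracy, precision, recall and F1; ratios with a zero denominator give 0.0."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical labels: 1 = high berthing velocity, 0 = otherwise.
y_true = [1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 1, 0]
print(classification_metrics(*confusion_counts(y_true, y_pred)))
```

Reporting several indices side by side matters when classes are imbalanced, since accuracy alone can look good for a classifier that rarely predicts the minority class.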