• Title/Summary/Keyword: Random forests

Object Classification Method Using Dynamic Random Forests and Genetic Optimization

  • Kim, Jae Hyup;Kim, Hun Ki;Jang, Kyung Hyun;Lee, Jong Min;Moon, Young Shik
    • 한국컴퓨터정보학회논문지 / Vol. 21, No. 5 / pp.79-89 / 2016
  • In this paper, we propose an object classification method using a genetically optimized dynamic random forest that consists of an optimal combination of unit trees. A random forest is an ensemble classification model that combines unit decision trees through bagging; by randomizing the training samples and the feature selection assigned to each decision tree, it achieves good generalization performance when a large number of trees are combined. However, because the unit trees are constructed randomly, a random forest shows excellent classification performance only when a sufficient number of trees are combined. There is no quantitative method for determining the number of trees, so there is no choice but to keep generating random trees repeatedly. The proposed algorithm builds a random forest from an optimal combination of trees while maintaining the generalization performance of the random forest. To achieve this, the problem of improving classification performance is cast as an optimization problem of finding the optimal tree combination, which is solved with a genetic algorithm. Experiments show that the proposed algorithm improves classification performance by about 3~5% over the existing random forest in specific cases, such as a common database and our own infrared database. In addition, the optimal tree combination was found at about 55~60% of the maximum number of trees.
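
The idea of searching for a good tree combination with a genetic algorithm can be sketched as follows. This is only an illustration under stated assumptions, not the authors' implementation: each "tree" is represented by its precomputed 0/1 predictions on a held-out set, fitness is plain majority-vote accuracy, and the names `majority_vote_accuracy` and `select_trees`, the population size, and the toy data are all hypothetical.

```python
import random

def majority_vote_accuracy(mask, tree_preds, labels):
    """Accuracy of the majority vote over the trees selected by a 0/1 mask."""
    chosen = [p for bit, p in zip(mask, tree_preds) if bit]
    if not chosen:
        return 0.0
    correct = 0
    for i, y in enumerate(labels):
        votes = sum(p[i] for p in chosen)            # votes for class 1
        pred = 1 if 2 * votes > len(chosen) else 0   # ties go to class 0
        correct += (pred == y)
    return correct / len(labels)

def select_trees(tree_preds, labels, pop_size=20, generations=30, seed=0):
    """Toy genetic algorithm over tree subsets (bit masks)."""
    rng = random.Random(seed)
    n = len(tree_preds)
    fitness = lambda m: majority_vote_accuracy(m, tree_preds, labels)
    # Start from the full forest plus random subsets.
    population = [[1] * n] + [
        [rng.randint(0, 1) for _ in range(n)] for _ in range(pop_size - 1)
    ]
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        elite = population[: pop_size // 2]          # elitism: keep the best half
        children = []
        while len(elite) + len(children) < pop_size:
            a, b = rng.sample(elite, 2)
            cut = rng.randrange(1, n)                # one-point crossover
            child = a[:cut] + b[cut:]
            flip = rng.randrange(n)                  # flip one bit as mutation
            child[flip] = 1 - child[flip]
            children.append(child)
        population = elite + children
    return max(population, key=fitness)

# Hypothetical toy data: 15 "trees" of varying quality on 200 binary labels.
data_rng = random.Random(1)
labels = [data_rng.randint(0, 1) for _ in range(200)]
accuracies = [0.55 + 0.01 * k for k in range(15)]
tree_preds = [
    [y if data_rng.random() < acc else 1 - y for y in labels]
    for acc in accuracies
]

best_mask = select_trees(tree_preds, labels)
full_acc = majority_vote_accuracy([1] * len(tree_preds), tree_preds, labels)
best_acc = majority_vote_accuracy(best_mask, tree_preds, labels)
print(sum(best_mask), "trees kept;", full_acc, "->", best_acc)
```

Because the full forest is in the initial population and the best half always survives, the selected subset can never score worse than the full forest on the data used for fitness. Whether that gain carries over to unseen data depends on evaluating fitness on a separate validation set.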

Novel two-stage hybrid paradigm combining data pre-processing approaches to predict biochemical oxygen demand concentration

  • 김성원;서영민;자크로프 마샵;말릭 아누락
    • 한국수자원학회논문집 / Vol. 54, No. spc1 / pp.1037-1051 / 2021
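  • Biochemical oxygen demand (BOD) concentration, one of the key water quality indicators, is treated as a monitored variable in lakes and rivers from an ecological standpoint. This study applied a novel two-stage hybrid paradigm (wavelet-based gated recurrent unit, wavelet-based generalized regression neural network, and wavelet-based random forest) to predict BOD concentration at the Dosan and Hwangji stations in South Korea. These models were evaluated alongside their corresponding standalone models (gated recurrent unit, generalized regression neural network, and random forest). Various water quality and quantity indicators were used, on the basis of several input combinations (categories 1-5), to develop the standalone and two-stage hybrid models. The models were assessed with three statistical indices: root mean squared error (RMSE), Nash-Sutcliffe efficiency (NSE), and correlation coefficient (CC). The results showed that the two-stage hybrid models did not always improve the predictive accuracy of the corresponding standalone models. At the Dosan station, the DWT-RF5 model (RMSE = 0.108 mg/L) predicted BOD concentration more accurately than the other optimal models, while at the Hwangji station the DWT-GRNN4 model (RMSE = 0.132 mg/L) was the best model for predicting BOD concentration.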
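
For reference, the three evaluation indices named in this abstract can be computed as in the minimal sketch below. It uses the standard textbook definitions of RMSE, NSE, and the Pearson correlation coefficient; the function names and the observed/predicted values are made up for illustration and are not taken from the paper.

```python
import math

def rmse(obs, pred):
    """Root mean squared error."""
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs))

def nse(obs, pred):
    """Nash-Sutcliffe efficiency: 1 - SSE / variance of observations about their mean."""
    mean_obs = sum(obs) / len(obs)
    sse = sum((o - p) ** 2 for o, p in zip(obs, pred))
    sst = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - sse / sst

def cc(obs, pred):
    """Pearson correlation coefficient."""
    mo = sum(obs) / len(obs)
    mp = sum(pred) / len(pred)
    cov = sum((o - mo) * (p - mp) for o, p in zip(obs, pred))
    so = math.sqrt(sum((o - mo) ** 2 for o in obs))
    sp = math.sqrt(sum((p - mp) ** 2 for p in pred))
    return cov / (so * sp)

# Hypothetical BOD values in mg/L, for illustration only.
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [1.1, 1.9, 3.2, 3.8]

print(rmse(observed, predicted), nse(observed, predicted), cc(observed, predicted))
```

An NSE of 1 indicates a perfect fit, and an NSE of 0 means the model is no better than predicting the observed mean.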

A measure of discrepancy based on margin of victory useful for the determination of random forest size

  • 박철용
    • Journal of the Korean Data and Information Science Society / Vol. 28, No. 3 / pp.515-524 / 2017
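  • This study proposes a discrepancy measure based on the margin of victory (MV) that is useful for determining the size of a random forest (RF) for classification. Here, MV is the margin of victory in the infinite RF between the classes that rank first and second in the current RF. Specifically, noting that a positive -MV means the current and infinite RFs disagree on the first- and second-ranked classes, we propose max(-MV, 0) as a discrepancy measure. Based on this measure, we propose a diagnostic statistic suited to determining the RF size and derive its theoretical asymptotic distribution. Finally, we run a simulation study comparing the small-sample performance of this statistic with that of recently proposed diagnostic statistics.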
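
The definition of the discrepancy measure can be illustrated with a short sketch. It assumes the vote shares of the infinite forest are known, which they are not in practice (the paper's diagnostic statistic addresses exactly that). The function name `mv_discrepancy` and all numbers are hypothetical.

```python
def mv_discrepancy(current_votes, limit_probs):
    """max(-MV, 0), where MV is the margin in the limiting (infinite) forest
    between the classes ranked first and second by the current forest."""
    ranked = sorted(range(len(current_votes)),
                    key=lambda k: current_votes[k], reverse=True)
    first, second = ranked[0], ranked[1]
    mv = limit_probs[first] - limit_probs[second]
    return max(-mv, 0.0)

# Case 1: the current 50-tree forest prefers class 0, the limit prefers class 1.
disagree = mv_discrepancy([26, 24, 0], [0.45, 0.50, 0.05])

# Case 2: the current forest and the limit agree on the ranking.
agree = mv_discrepancy([30, 20, 0], [0.60, 0.35, 0.05])

print(disagree, agree)
```

The measure is zero whenever the current forest ranks its top two classes in the same order as the limiting forest, and otherwise grows with the size of the reversed margin.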

Covariance-based Recognition Using Machine Learning Model

  • Osman, Hassab Elgawi
    • 한국방송∙미디어공학회:학술대회논문집 / 한국방송공학회 2009 IWAIT / pp.223-228 / 2009
  • We propose an on-line machine learning approach for object recognition, in which new images are continuously added and the recognition decision is made without delay. The random forest (RF) classifier has been extensively used as a generative model for classification and regression applications. We extend this technique to the task of building an incremental component-based detector. First, we employ an object descriptor model based on a bag of covariance matrices to represent an object region, and then run our on-line RF learner to select object descriptors and to learn an object classifier. Object recognition experiments are provided to verify the effectiveness of the proposed approach. The results demonstrate that the proposed model yields object recognition performance comparable to the benchmark standard RF, AdaBoost, and SVM classifiers.

Forest Vertical Structure Mapping from Bi-Seasonal Sentinel-2 Images and UAV-Derived DSM Using Random Forest, Support Vector Machine, and XGBoost

  • Young-Woong Yoon;Hyung-Sup Jung
    • 대한원격탐사학회지 / Vol. 40, No. 2 / pp.123-139 / 2024
  • Forest vertical structure is vital for comprehending ecosystems and biodiversity, in addition to fundamental forest information. Currently, the forest vertical structure is predominantly assessed via an in-situ method, which is not only difficult to apply to inaccessible locations or large areas but also costly and requires substantial human resources. Therefore, mapping systems based on remote sensing data have been actively explored. Recently, research on analyzing and classifying images using machine learning techniques has been actively conducted and applied to map the vertical structure of forests accurately. In this study, Sentinel-2 and digital surface model images were obtained on two different dates separated by approximately one month, and the spectral index and tree height maps were generated separately. Furthermore, according to the acquisition time, the input data were separated into cases 1 and 2, which were then combined to generate case 3. Using these data, forest vertical structure mapping models based on random forest, support vector machine, and extreme gradient boosting (XGBoost) were generated. Consequently, nine models were generated, with the XGBoost model in case 3 performing the best, with an average precision of 0.99 and an F1 score of 0.91. We confirmed that generating a forest vertical structure mapping model utilizing bi-seasonal data and an appropriate model can result in an accuracy of 90% or higher.

A Novel Network Anomaly Detection Method based on Data Balancing and Recursive Feature Addition

  • Liu, Xinqian;Ren, Jiadong;He, Haitao;Wang, Qian;Sun, Shengting
    • KSII Transactions on Internet and Information Systems (TIIS) / Vol. 14, No. 7 / pp.3093-3115 / 2020
  • Network anomaly detection systems play an essential role in detecting network anomalies and ensuring network security. Anomaly detection systems based on machine learning have become an increasingly popular solution. However, because network traffic is imbalanced and high-dimensional, existing methods are unable to achieve both high accuracy and a low false alarm rate. To address this problem, a new network anomaly detection method based on data balancing and recursive feature addition is proposed. First, a data balancing algorithm based on improved KNN outlier detection is designed to select part of the data in each category. The parameters of the improved KNN outlier detection are jointly optimized by a genetic algorithm. Next, a recursive feature addition algorithm based on correlation analysis is proposed to select effective features, in which a cross contingency test is used to analyze correlation and obtain a feature subset with strong correlation. Then, a random forests model is used as the classification model to detect anomalies. Finally, the proposed algorithm is evaluated on the benchmark datasets KDD Cup 1999 and UNSW_NB15. The results show that the proposed strategies improve accuracy and recall and reduce the false alarm rate. Compared with other algorithms, this algorithm still achieves significant improvements, especially in recall for the small categories.

Pattern and Association within Shrub Layer under Summer Green Forest in Central Korean Peninsula

  • 오계칠
    • Journal of Plant Biology / Vol. 15, No. 1 / pp.33-41 / 1972
  • Nine shrub layer communities under two relatively well conserved natural summer green forests in the central region of the Korean Peninsula were studied for the pattern of stem distribution, using Greig-Smith's multiple split-plot experiment, and for the association between the populations of the two main species, using Kershaw's covariance analysis. Four contiguous belt transects, $4 \times 64$ m in size with a $1 \times 1$ m basic unit, were set in each shrub layer community. Significant primary clumps of $1 \times 1$ m or $1 \times 2$ m dimension were observed consistently throughout the nine study sites. The primary clumps themselves were significantly distributed either regularly or at random. The association between the two principal species of each shrub layer was highly significant, either positive or negative, at the $1 \times 1$ m or $1 \times 2$ m dimension. As the plot size increased from $1 \times 1$ m to $8 \times 8$ m, the association trend changed from negative to positive in one forest, but a change from positive to negative and a consistently negative association were also observed in the other forest. All of the association trends were observed only from the $1 \times 1$ m to the $4 \times 4$ m dimension. These results suggest that the distributional pattern of the shrub layer species under the summer green forest is a simple mosaic with units of $1 \times 1$ m or $1 \times 2$ m dimension. The rest of the principal species are located in that matrix. The simple mosaic pattern of the two principal species seems to be controlled by changes in the micro-environmental pattern. Differences between the primary random group and the clumped group among sites also suggest that competition for light and/or soil exists between primary clumped groups.

Comparison of survival prediction models for pancreatic cancer: Cox model versus machine learning models

  • Kim, Hyunsuk;Park, Taesung;Jang, Jinyoung;Lee, Seungyeoun
    • Genomics & Informatics / Vol. 20, No. 2 / pp.23.1-23.9 / 2022
  • A survival prediction model has recently been developed to evaluate the prognosis of resected nonmetastatic pancreatic ductal adenocarcinoma based on a Cox model using two nationwide databases: Surveillance, Epidemiology and End Results (SEER) and Korea Tumor Registry System-Biliary Pancreas (KOTUS-BP). In this study, we applied two machine learning methods, random survival forests (RSF) and support vector machines (SVM), for survival analysis and compared their prediction performance using the SEER and KOTUS-BP datasets. Three schemes were used for model development and evaluation. First, we utilized data from SEER for model development and used data from KOTUS-BP for external evaluation. Second, these two datasets were swapped by taking data from KOTUS-BP for model development and data from SEER for external evaluation. Finally, we mixed these two datasets half and half and utilized the mixed datasets for model development and validation. We used 9,624 patients from SEER and 3,281 patients from KOTUS-BP to construct a prediction model with seven covariates: age, sex, histologic differentiation, adjuvant treatment, resection margin status, and the American Joint Committee on Cancer 8th edition T-stage and N-stage. Comparing the three schemes, the performance of the Cox model, RSF, and SVM was better when using the mixed datasets than when using the unmixed datasets. When using the mixed datasets, the C-index, 1-year, 2-year, and 3-year time-dependent areas under the curve for the Cox model were 0.644, 0.698, 0.680, and 0.687, respectively. The Cox model performed slightly better than RSF and SVM.
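
The C-index reported in this abstract measures how well predicted risks order the observed survival times. The sketch below assumes Harrell's pairwise definition with right censoring; the abstract does not say which estimator the authors used, and the function name and patient data are hypothetical.

```python
def concordance_index(times, events, risks):
    """Harrell's C: fraction of comparable pairs in which the subject who
    failed earlier was assigned the higher predicted risk (risk ties count 0.5)."""
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            # A pair is comparable when subject i has an observed event
            # strictly before subject j's event or censoring time.
            if events[i] == 1 and times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5
    return concordant / comparable

# Hypothetical data: follow-up time in months, event flag (1 = death, 0 = censored),
# and a model's predicted risk score.
times = [5, 8, 12, 20]
events = [1, 1, 0, 1]
risks = [0.9, 0.4, 0.6, 0.1]

c_index = concordance_index(times, events, risks)
print(c_index)
```

A value of 0.5 corresponds to random ordering and 1.0 to perfect ordering, so the reported 0.644 for the Cox model indicates moderate discrimination.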

A descriptive study of on-farm biosecurity and management practices during the incursion of porcine epidemic diarrhea into Canadian swine herds, 2014

  • Perri, Amanda M.;Poljak, Zvonimir;Dewey, Cate;Harding, John CS.;O'Sullivan, Terri L.
    • Journal of Veterinary Science / Vol. 21, No. 2 / pp.25.1-25.16 / 2020
  • Porcine epidemic diarrhea virus (PEDV) emerged in Canada in January 2014, primarily affecting sow herds. Subsequent epidemiological analyses suggested contaminated feed was the most likely transmission pathway. The primary objective of this study was to describe general biosecurity and management practices implemented in PEDV-positive sow herds and matched control herds at the time the virus emerged. The secondary objective was to determine if any of these general biosecurity and farm management practices were important in explaining PEDV infection status from January 22, 2014 to March 1, 2014. A case herd was defined as a swine herd with clinical signs and a positive test result for PEDV. A questionnaire was used to gather a 30-day history of herd management practices, animal movements on/off site, feed management practices, semen deliveries and biosecurity practices for case (n = 8) and control (n = 12) herds, primarily located in Ontario. Data were analyzed using descriptive statistics and random forests (RFs). Case herds were larger in size than control herds. Case herds had more animal movements and non-staff movements onto the site. Also, case herds had higher quantities of pigs delivered, feed deliveries and semen deliveries on-site. The biosecurity practices of case herds were considered more rigorous than those of control herds based on herd management, feed deliveries, transportation and truck driver practices. The RF model found that the most important variables for predicting herd status were related to herd size and feed management variables. Nonetheless, predictive accuracy of the final RF model was 72%.

Stand Structure and Regeneration Pattern of Kalopanax septemlobus at the Natural Deciduous Broad-leaved Forest in Mt. Jeombong, Korea

  • Kang, Ho-Sang;Lee, Don-Koo
    • Journal of Ecology and Environment / Vol. 29, No. 1 / pp.17-22 / 2006
  • As demand has increased not only for value-added timber but also for the environmental functions of forests, native tree species have been, and are rapidly being, replaced by foreign tree species in many parts of the world. However, the population structure and regeneration characteristics of native tree species have not been studied sufficiently. Regeneration of Kalopanax septemlobus growing among other hardwoods in natural forests is very difficult because of its low seed viability and germination rate. This study examined the distribution of mature K. septemlobus trees and their regeneration pattern in a 1.12 ha study plot in the natural deciduous broad-leaved forest of Mt. Jeombong. The density and mean DBH of K. septemlobus were 97 trees per ha and 32 cm, respectively. The spatial distribution of K. septemlobus showed a random pattern (aggregation index of 0.935) in the 1.12 ha study plot. The ages of 90 of the 99 sample trees of K. septemlobus ranged from 90 to 110 years and represented a single cohort, suggesting that K. septemlobus present as advance regeneration regenerated as a result of disturbances such as canopy opening.