• Title/Summary/Keyword: Random Forest

Estimating Population Density of Leopard Cat (Prionailurus bengalensis) from Camera Traps in Maekdo Riparian Park, South Korea

  • Park, Heebok;Lim, Anya;Choi, Tae-Young;Lim, Sang-Jin;Park, Yung-Chul
    • Journal of Forest and Environmental Science / v.33 no.3 / pp.239-242 / 2017
  • Although camera traps have been widely used in recent decades to estimate wildlife abundance, the effort has been restricted to the small subset of species whose individuals can be recognized for mark-recapture analysis. The Random Encounter Model offers an alternative approach that estimates absolute abundance from the camera trap detection rate for any animal, without the need for individual recognition. Our study examines the feasibility and validity of the Random Encounter Model for estimating the density of endangered leopard cats (Prionailurus bengalensis) in Maekdo Riparian Park, Busan, South Korea. According to the model, the estimated leopard cat density was $1.76\,km^{-2}$ (95% CI, 0.74-3.49), which corresponds to 2.46 leopard cats in the $1.4\,km^2$ study area. This estimate was not statistically different from the previous leopard cat population count ($2.33{\pm}0.58$) in the same area. Our research thus demonstrates the applicability and usefulness of the Random Encounter Model for estimating the density of unmarked wildlife, which supports the management and protection of target species through a better understanding of their status.
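
The arithmetic behind the reported abundance can be sketched as follows. The density function uses the standard Random Encounter Model expression of Rowcliffe et al. (2008), D = (y/t) * pi / (v * r * (2 + theta)); the abstract does not report the study's input values, so the function and argument names are illustrative assumptions. Only the density (1.76 per km^2) and the area (1.4 km^2) come from the text.

```python
import math


def rem_density(detections, camera_days, speed_km_per_day, radius_km, angle_rad):
    """Random Encounter Model density (animals per km^2).

    D = (y / t) * pi / (v * r * (2 + theta)), after Rowcliffe et al. (2008).
    Argument names are illustrative; the abstract does not report these inputs.
    """
    trap_rate = detections / camera_days  # y / t, detections per camera-day
    return trap_rate * math.pi / (speed_km_per_day * radius_km * (2.0 + angle_rad))


def abundance(density_per_km2, area_km2):
    """Expected number of animals in an area of the given size."""
    return density_per_km2 * area_km2


# Values reported in the abstract: 1.76 animals per km^2 over 1.4 km^2.
print(round(abundance(1.76, 1.4), 2))  # 2.46
```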

A Clustering Approach for Feature Selection in Microarray Data Classification Using Random Forest

  • Aydadenta, Husna;Adiwijaya, Adiwijaya
    • Journal of Information Processing Systems / v.14 no.5 / pp.1167-1175 / 2018
  • Microarray data play an essential role in diagnosing and detecting cancer. Microarray analysis allows the examination of gene expression levels in specific cell samples, where thousands of genes can be analyzed simultaneously. However, microarray datasets have few samples and high dimensionality. Therefore, a dimensionality reduction process is required before microarray data can be classified. Dimensionality reduction can eliminate redundancy in the data, so that only features highly correlated with their class are used in classification. There are two types of dimensionality reduction, namely feature selection and feature extraction. In this paper, we used the k-means algorithm as the clustering approach for feature selection. The proposed approach groups features that have the same characteristics into one cluster, so that redundancy in the microarray data is removed. The features in each cluster are ranked using the Relief algorithm, so that the best-scoring element of each cluster is obtained. The best elements of all clusters are selected and used as features in the classification process, which is performed with the Random Forest algorithm. Based on the simulation, the proposed approach achieved accuracies of 85.87%, 98.9%, and 89% on the Colon, Lung Cancer, and Prostate Tumor datasets, respectively. The accuracy of the proposed approach is therefore higher than that of Random Forest without clustering.
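
The selection step described above (keep the best-scoring feature of each cluster) can be sketched as below. This is a minimal illustration, not the authors' implementation: it assumes the cluster labels (for example from k-means) and the relevance scores (for example from Relief) have already been computed, and the function name and the sample values are hypothetical.

```python
from collections import defaultdict


def best_feature_per_cluster(cluster_labels, scores):
    """Return {cluster: index of its highest-scoring feature}.

    cluster_labels[i] is the cluster of feature i (e.g. from k-means);
    scores[i] is its relevance score (e.g. from Relief).
    """
    if len(cluster_labels) != len(scores):
        raise ValueError("cluster_labels and scores must have the same length")
    members = defaultdict(list)
    for feature_index, label in enumerate(cluster_labels):
        members[label].append(feature_index)
    # Within each cluster keep the single feature with the largest score.
    return {
        label: max(indices, key=lambda i: scores[i])
        for label, indices in members.items()
    }


# Hypothetical example: six features grouped into three clusters.
labels = [0, 0, 1, 1, 2, 2]
relief_scores = [0.10, 0.40, 0.30, 0.20, 0.05, 0.90]
selected = sorted(best_feature_per_cluster(labels, relief_scores).values())
print(selected)  # [1, 2, 5]
```

The selected indices would then define the reduced feature matrix passed to the classifier.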

Identifying the Expression Patterns of Depression Based on the Random Forest (랜덤 포레스트 기반 우울증 발현 패턴 도출)

  • Jeon, Hyeon Jin;Jihn, Chang-Ho
    • Journal of Korean Society of Industrial and Systems Engineering / v.44 no.4 / pp.53-64 / 2021
  • Depression is one of the most important psychiatric disorders worldwide. Most depression-related data mining and machine learning studies have aimed to predict the presence of depression or to derive individual risk factors. However, since depression is caused by a combination of various factors, the complex relationships between the factors must be identified in order to establish effective prevention and management measures. In this study, we propose a methodology for identifying and interpreting patterns of depression expression by deriving random forest rules, where each rule consists of a condition for the manifestation of a depressive pattern and the prediction result for depression when the condition is met. The analysis was carried out separately for four groups, in consideration of the different depressive patterns according to gender and age. The depression rules derived by the proposed methodology were validated by comparing them with the results of previous studies. In addition, the diagnostic performance of the derived rules was evaluated with an AUC comparison test and was not different from that of the existing PHQ-9 summing method. The significance of this study is that it enables interpretation of the complex relationships between depressive factors, going beyond existing studies that focused on prediction and on deriving major factors.
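
The rule structure described above (a condition plus the prediction that applies when the condition is met) can be represented roughly as follows. This is a sketch under stated assumptions: the `Rule` class, the operators, and the feature names `factor_a` and `factor_b` are hypothetical placeholders, not rules or variables from the study.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class Rule:
    """A rule: every condition must hold for the prediction to apply."""
    conditions: List[Tuple[str, str, float]]  # (feature, "<=" or ">", threshold)
    prediction: int  # 1 = depression predicted, 0 = not predicted

    def matches(self, record: Dict[str, float]) -> bool:
        for feature, op, threshold in self.conditions:
            value = record[feature]
            if op == "<=":
                ok = value <= threshold
            elif op == ">":
                ok = value > threshold
            else:
                raise ValueError(f"unsupported operator: {op}")
            if not ok:
                return False
        return True


def apply_rules(rules: List[Rule], record: Dict[str, float]) -> Optional[int]:
    """Return the prediction of the first matching rule, or None if none match."""
    for rule in rules:
        if rule.matches(record):
            return rule.prediction
    return None


# Hypothetical rule: factor_a > 3.0 and factor_b <= 1.0 predicts class 1.
rules = [Rule([("factor_a", ">", 3.0), ("factor_b", "<=", 1.0)], prediction=1)]
print(apply_rules(rules, {"factor_a": 4.0, "factor_b": 0.5}))  # 1
print(apply_rules(rules, {"factor_a": 2.0, "factor_b": 0.5}))  # None
```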

Ensemble Deep Learning Model using Random Forest for Patient Shock Detection

  • Minsu Jeong;Namhwa Lee;Byuk Sung Ko;Inwhee Joe
    • KSII Transactions on Internet and Information Systems (TIIS) / v.17 no.4 / pp.1080-1099 / 2023
  • Digital healthcare, which combines telemedicine services with digital technology and AI, is developing rapidly. Digital healthcare research is being conducted on many conditions, including shock. However, the causes of shock are diverse, and its treatment is very complicated and requires a high level of medical knowledge. In this paper, we propose a shock detection method based on the correlation between shock and data extracted from hemodynamic monitoring equipment. Of the various parameters reported by this equipment, four parameters closely related to patient shock were used as the input data, that is, the feature values, for a random forest-based ensemble machine learning model. The mean arterial pressure was used as the ground truth, the so-called label value, for detecting the patient's shock state. The performance was then compared with that of decision tree and logistic regression models using a confusion matrix. The average accuracy of the random forest model was 92.80%, which was higher than that of the other models. We expect our work to help medical staff by providing recommendations for the diagnosis and treatment of complex and difficult cases of shock.
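
The evaluation described above (labels derived from mean arterial pressure, then a confusion matrix and accuracy) can be sketched as below. The abstract does not state how the label is derived from the pressure value, so the "below a threshold means shock" rule, the threshold of 65 used in the example, and all sample values are assumptions made purely for illustration.

```python
def label_from_map(map_values, threshold):
    """Label a sample 1 (shock) when mean arterial pressure is below threshold.

    The thresholding rule itself is an assumption, not taken from the abstract.
    """
    return [1 if value < threshold else 0 for value in map_values]


def confusion_matrix(y_true, y_pred):
    """Return (tp, fp, fn, tn) for binary labels with 1 as the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return tp, fp, fn, tn


def accuracy(tp, fp, fn, tn):
    """Fraction of samples classified correctly."""
    return (tp + tn) / (tp + fp + fn + tn)


# Hypothetical pressures (mmHg) and hypothetical model predictions.
y_true = label_from_map([58.0, 72.0, 61.0, 80.0, 90.0], threshold=65.0)
y_pred = [1, 0, 0, 0, 1]
counts = confusion_matrix(y_true, y_pred)
print(counts)             # (1, 1, 1, 2)
print(accuracy(*counts))  # 0.6
```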

Application of machine learning for merging multiple satellite precipitation products

  • Van, Giang Nguyen;Jung, Sungho;Lee, Giha
    • Proceedings of the Korea Water Resources Association Conference / 2021.06a / pp.134-134 / 2021
  • Precipitation is a crucial component of the water cycle and plays a key role in hydrological processes. Traditionally, gauge-based observation has been the main method for achieving accurate rainfall estimates, but gauges are sparsely distributed in mountainous areas. More recently, satellite-based precipitation products (SPPs) have provided grid-based precipitation with spatio-temporal variability, but SPPs contain considerable uncertainty in the estimated precipitation, and their spatial resolution is quite coarse. To overcome these limitations, this study aims to generate new grid-based daily precipitation for the period 2003-2017 using the Automatic Weather System (AWS) in Korea and multiple SPPs (i.e., CHIRPSv2, CMORPH, GSMaP, TRMMv7). A machine learning-based Random Forest (RF) model was used to generate the new merged precipitation, and several statistical linear merging methods were used for comparison with the RF results. To investigate the efficiency of RF, observed data from 64 Automated Synoptic Observation System (ASOS) stations were collected to evaluate the accuracy of the products through the Kling-Gupta efficiency (KGE), probability of detection (POD), false alarm rate (FAR), and critical success index (CSI). As a result, the new precipitation generated by the random forest model showed higher accuracy than each satellite rainfall product, and it reflected spatio-temporal variability better than the other statistical merging methods. Therefore, a random forest-based ensemble satellite precipitation product can be used efficiently for hydrological simulations in ungauged basins such as the Mekong River.
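
The four evaluation metrics named above can be computed as in the following sketch. It uses the usual textbook definitions (KGE after Gupta et al., 2009; POD, FAR, and CSI from hit, miss, and false-alarm counts, with FAR taken as false alarms divided by all predicted events). The abstract does not give the rain/no-rain threshold or any data, so the threshold and sample series are hypothetical.

```python
import math


def kge(observed, simulated):
    """Kling-Gupta efficiency: 1 - sqrt((r-1)^2 + (alpha-1)^2 + (beta-1)^2)."""
    n = len(observed)
    mean_o = sum(observed) / n
    mean_s = sum(simulated) / n
    var_o = sum((o - mean_o) ** 2 for o in observed) / n
    var_s = sum((s - mean_s) ** 2 for s in simulated) / n
    cov = sum((o - mean_o) * (s - mean_s) for o, s in zip(observed, simulated)) / n
    r = cov / math.sqrt(var_o * var_s)  # linear correlation
    alpha = math.sqrt(var_s / var_o)    # variability ratio
    beta = mean_s / mean_o              # bias ratio
    return 1.0 - math.sqrt((r - 1) ** 2 + (alpha - 1) ** 2 + (beta - 1) ** 2)


def detection_scores(observed, simulated, threshold):
    """Return (POD, FAR, CSI) for events defined as value >= threshold.

    Assumes at least one observed event and one simulated event.
    """
    hits = misses = false_alarms = 0
    for o, s in zip(observed, simulated):
        if o >= threshold and s >= threshold:
            hits += 1
        elif o >= threshold:
            misses += 1
        elif s >= threshold:
            false_alarms += 1
    pod = hits / (hits + misses)
    far = false_alarms / (hits + false_alarms)
    csi = hits / (hits + misses + false_alarms)
    return pod, far, csi


# Hypothetical daily rainfall (mm) at one station and one merged product.
obs = [0.0, 5.0, 0.0, 3.0, 2.0]
sim = [0.0, 4.0, 1.0, 0.0, 2.0]
pod, far, csi = detection_scores(obs, sim, threshold=1.0)
print(csi)  # 0.5
```

A KGE of 1 indicates a perfect match between the two series; POD and CSI are best at 1 and FAR is best at 0.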

Construction of an Internet of Things Industry Chain Classification Model Based on IRFA and Text Analysis

  • Zhimin Wang
    • Journal of Information Processing Systems / v.20 no.2 / pp.215-225 / 2024
  • With the rapid development of Internet of Things (IoT) and big data technology, a large amount of data is generated during the operation of related industries. How to classify the generated data accurately has become the core of research on data mining and processing in the IoT industry chain. This study constructs a classification model of the IoT industry chain based on an improved random forest algorithm and text analysis, aiming to achieve efficient and accurate classification of IoT industry chain big data by improving the traditional algorithm. The accuracy, precision, recall, and AUC of the traditional Random Forest algorithm and of the improved algorithm are compared on different datasets. The experimental results show that the improved model performs better across datasets: its accuracy and recall are better than those of the traditional algorithm on four datasets, and its accuracy on two datasets, P-I Diabetes and Loan Default, is better than that of the random forest model, giving better final classification results. With this model, the massive data generated in the IoT industry chain can be classified accurately, which adds research value to data mining and processing technology for the IoT industry chain.
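
Precision, recall, and AUC, three of the metrics compared above, can be computed as in this sketch. The AUC is calculated directly from its rank interpretation (the probability that a randomly chosen positive sample receives a higher score than a randomly chosen negative one, with ties counted as one half). The labels and scores are hypothetical and are not taken from the paper's datasets.

```python
def precision_recall(y_true, y_pred):
    """Return (precision, recall) for binary labels with 1 as the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp / (tp + fp), tp / (tp + fn)


def auc(y_true, scores):
    """Area under the ROC curve via pairwise comparison of positives and negatives."""
    positives = [s for t, s in zip(y_true, scores) if t == 1]
    negatives = [s for t, s in zip(y_true, scores) if t == 0]
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(positives) * len(negatives))


# Hypothetical labels and classifier scores.
y_true = [1, 1, 0, 0]
scores = [0.9, 0.4, 0.6, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]
print(precision_recall(y_true, y_pred))  # (0.5, 0.5)
print(auc(y_true, scores))               # 0.75
```

The pairwise AUC is quadratic in the number of samples, which is acceptable for a small illustration but not for large datasets.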

Analysis of Debris Flow Hazard Zone by the Optimal Parameters Extraction of Random Walk Model - Case on Debris Flow Area of Bonghwa County in Gyeongbuk Province - (Random Walk Model의 최적 파라미터 추출에 의한 토석류 피해범위 분석 - 경북 봉화군 토석류 발생지를 대상으로 -)

  • Lee, Chang-Woo;Woo, Choongshik;Youn, Ho-Joong
    • Journal of Korean Society of Forest Science / v.100 no.4 / pp.664-671 / 2011
  • The Random Walk Model can predict the deposition areas of debris flows, but three parameters fitted to the topographical environment must first be extracted. This study developed a method for extracting the optimal values of the three parameters of the Random Walk Model: once flowing volume, stopping slope, and gravity weight. The extracted parameters were validated against aerial photographs of the debris flow area. The optimal parameters were searched for randomly within limited ranges of the three parameters, using an accuracy measure called the rate of concordance. The optimal parameter set was chosen on the basis of the highest rate of concordance and consistency. As a result, the optimal parameters for Bonghwa County were a once flowing volume of $1.0m^3$, a stopping slope of $4.2^{\circ}$, and a gravity weight of 2, at a rate of concordance of -0.2. Validation with the optimal parameters gave a similar result, with an average rate of concordance of -0.2.

Design and Implementation of Indoor Location Recognition System based on Fingerprint and Random Forest (핑거프린트와 랜덤포레스트 기반 실내 위치 인식 시스템 설계와 구현)

  • Lee, Sunmin;Moon, Nammee
    • Journal of Broadcast Engineering / v.23 no.1 / pp.154-161 / 2018
  • As the number of smartphone users increases, research on indoor location recognition services is necessary. Indoor positioning relies mainly on WiFi, Bluetooth, and similar technologies; because WiFi is available in most places, this study uses WiFi for positioning. The study uses the random forest algorithm, a multi-value classification method, with the fingerprint index and the received signal strength of the acquired WiFi signals. As the fingerprint data, a total of 4 radio maps combining the MAC address with the received signal strength were used. The experiment was conducted in a limited indoor space, and for experimental analysis the proposed method was compared with an existing indoor location recognition system that also uses a random forest. The experiments showed that the positioning accuracy of the proposed system is approximately 5.8% higher than that of the conventional random forest-based indoor location recognition system, and that its location recognition speed is consistent and faster than that of the compared system.
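
A fingerprint built from MAC addresses and received signal strengths can be turned into classifier input roughly as follows. This is only a sketch of one common encoding, not the paper's radio-map format: the function name, the MAC addresses, and the fill value of -100 dBm for access points missing from a scan are all assumptions.

```python
def fingerprint_vector(scan, access_points, missing_rssi=-100.0):
    """Turn one WiFi scan {mac: rssi_dbm} into a fixed-order feature vector.

    access_points fixes the column order; access points not heard in the scan
    are filled with missing_rssi (an assumed placeholder for "no signal").
    """
    return [float(scan.get(mac, missing_rssi)) for mac in access_points]


# Hypothetical access points and one hypothetical scan.
access_points = ["aa:aa:aa:aa:aa:01", "aa:aa:aa:aa:aa:02", "aa:aa:aa:aa:aa:03"]
scan = {"aa:aa:aa:aa:aa:01": -48, "aa:aa:aa:aa:aa:03": -71}
print(fingerprint_vector(scan, access_points))  # [-48.0, -100.0, -71.0]
```

Vectors of this form, each paired with the location at which the scan was taken, would form the training rows for a multi-class classifier such as a random forest.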

Speed-limit Sign Recognition Using Convolutional Neural Network Based on Random Forest (랜덤 포레스트 분류기 기반의 컨벌루션 뉴럴 네트워크를 이용한 속도제한 표지판 인식)

  • Lee, EunJu;Nam, Jae-Yeal;Ko, ByoungChul
    • Journal of Broadcast Engineering / v.20 no.6 / pp.938-949 / 2015
  • In this paper, we propose a speed-limit sign recognition system that is robust to changes in sign appearance caused by exterior damage or by color contrast due to light direction. For speed-limit sign recognition, we apply a CNN, which shows outstanding performance in the pattern recognition field. However, the original CNN uses multiple hidden layers to extract features and then applies a fully-connected MLP (multi-layer perceptron) to the result. A major drawback of the conventional CNN is therefore the long time required for training and testing. In this paper, we apply a randomly-connected classifier instead of a fully-connected classifier by combining a random forest with the output of two CNN layers. We show that the CNN with random forest outperforms the CNN with an SVM (Support Vector Machine) or MLP classifier on eight speed-limit signs from the GTSRB (German Traffic Sign Recognition Benchmark).

Convergence study to detect metabolic syndrome risk factors by gender difference (성별에 따른 대사증후군의 위험요인 탐색을 위한 융복합 연구)

  • Lee, So-Eun;Rhee, Hyun-Sill
    • Journal of Digital Convergence / v.19 no.12 / pp.477-486 / 2021
  • This study was conducted to detect metabolic syndrome risk factors and gender differences in adults. Data on 18,616 adults were collected from the Korea Health and Nutrition Examination Study from 2016 to 2019. Four machine learning methods (Logistic Regression, Decision Tree, Naïve Bayes, Random Forest) were used to predict metabolic syndrome. The results showed that Random Forest was superior to the other methods in both men and women. In both groups, BMI, diet (fat, vitamin C, vitamin A, protein, energy intake), number of underlying chronic diseases, and age ranked highest in importance. In women, education level, age at menarche, and menopause were additionally of high importance, and age and the number of underlying chronic diseases were more important than in men. Future studies should verify various strategies to prevent metabolic syndrome.