• Title/Summary/Keyword: RandomForest

Performance analysis and comparison of various machine learning algorithms for early stroke prediction

  • Vinay Padimi;Venkata Sravan Telu;Devarani Devi Ningombam
    • ETRI Journal / v.45 no.6 / pp.1007-1021 / 2023
  • Stroke is the leading cause of permanent disability in adults, and it can cause permanent brain damage. According to the World Health Organization, 795 000 Americans experience a new or recurrent stroke each year. Early detection of medical disorders such as stroke can minimize their disabling effects. Thus, in this paper, we consider various risk factors that contribute to the occurrence of stroke and apply machine learning algorithms, for example, the decision tree, random forest, and naive Bayes algorithms, to patient characteristics survey data to achieve high prediction accuracy. We also consider the semisupervised self-training technique to predict the risk of stroke. We then consider the near-miss undersampling technique, which retains only those instances of the larger class that lie close to instances of the smaller class. Experimental results demonstrate that the proposed method obtains an accuracy of approximately 98.83% at low cost, which is significantly higher and more reliable than that of the compared techniques.

Research on predicting changes in crop cultivation areas due to climate change: Focusing on Hallabong (기후변화에 따른 과수작물 재배지 변화 예측 연구: 한라봉을 중심으로)

  • Park, Hye Eun;Lee, Jong Tae
    • The Journal of Information Systems / v.33 no.1 / pp.31-44 / 2024
  • Purpose: This study uses climate data to identify the algorithm that best predicts Hallabong production and to predict future production in areas where Hallabong cultivation is expected to be possible. Design/methodology/approach: The research is conducted in two stages. In the first stage, we identify the algorithm with the highest predictive power among XGBoost, Random Forest, SVM, and LSTM. In the second stage, that algorithm is applied to predict future Hallabong production in three regions where production is expected to be possible. Findings: As in many prediction studies, XGBoost showed the highest predictive power. Among the areas where Hallabong production is expected to be possible, production was predicted to be highest in Hongcheon, Gangwon-do, which has the highest latitude.

Axial load prediction in double-skinned profiled steel composite walls using machine learning

  • G., Muthumari G;P. Vincent
    • Computers and Concrete / v.33 no.6 / pp.739-754 / 2024
  • This study presents an innovative AI-driven approach to assess the ultimate axial load in Double-Skinned Profiled Steel sheet Composite Walls (DPSCWs). Utilizing a dataset of 80 entries, seven input parameters were employed, and various AI techniques, including Linear Regression, Polynomial Regression, Support Vector Regression, Decision Tree Regression, Decision Tree with AdaBoost Regression, Random Forest Regression, Gradient Boost Regression Tree, Elastic Net Regression, Ridge Regression, and LASSO Regression, were evaluated. Decision Tree Regression and Random Forest Regression emerged as the most accurate models. The top three performing models were integrated into a hybrid approach, excelling in accurately estimating DPSCWs' ultimate axial load. This adaptable hybrid model outperforms traditional methods, reducing errors in complex scenarios. The validated Artificial Neural Network (ANN) model showcases less than 1% error, enhancing reliability. Correlation analysis highlights robust predictions, emphasizing the importance of steel sheet thickness. The study contributes insights for predicting DPSCW strength in civil engineering, suggesting optimization and database expansion. The research advances precise load capacity estimation, empowering engineers to enhance construction safety and explore further machine learning applications in structural engineering.

Market Timing and Seasoned Equity Offering (마켓 타이밍과 유상증자)

  • Sung Won Seo
    • Asia-Pacific Journal of Business / v.15 no.1 / pp.145-157 / 2024
  • Purpose - In this study, we propose an empirical model for predicting seasoned equity offerings (SEOs hereafter) using machine learning methods. Design/methodology/approach - The models use the random forest method, which is based on decision trees and accounts for non-linear relationships, as well as the gradient boosting tree model. SEOs incur significant direct and indirect costs. Therefore, CEOs decide to issue seasoned equity only when the benefits outweigh the costs, which leads to a non-linear relationship between SEOs and their determinants. In particular, a variable related to market timing clearly exhibits such non-linear relations. Findings - We hypothesize that, because of these non-linear relationships, decision tree-based random forest and gradient boosting tree models are more suitable than linear methodologies. The results of this study support this hypothesis. Research implications or Originality - We expect that our findings can provide meaningful information to investors and policy makers by classifying companies likely to undergo SEOs.

Machine Learning for Flood Prediction in Indonesia: Providing Online Access for Disaster Management Control

  • Reta L. Puspasari;Daeung Yoon;Hyun Kim;Kyoung-Woong Kim
    • Economic and Environmental Geology / v.56 no.1 / pp.65-73 / 2023
  • Indonesia is one of the countries most vulnerable to floods, so accurate and reliable flood forecasting is increasingly necessary. Therefore, a new prediction model using a machine learning algorithm is proposed to provide daily flood prediction in Indonesia. Data crawling was conducted to obtain daily rainfall, streamflow, land cover, and flood data from 2008 to 2021. The model was built using a Random Forest (RF) algorithm for classification to predict future floods by inputting three days of rainfall rate, forest ratio, and streamflow. The accuracy, specificity, precision, recall, and F1-score on the test dataset using the RF algorithm are approximately 94.93%, 68.24%, 94.34%, 99.97%, and 97.08%, respectively. Moreover, the AUC (Area Under the Curve) of the ROC (Receiver Operating Characteristics) curve is 71%. The objective of this research is to provide a model that accurately predicts flood events in Indonesian regions 3 months prior to the day of the flood. As a trial, we used the month of June 2022, and the model predicted the flood events accurately. The prediction results are then published on a website as a warning system for flood mitigation.
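
As a quick consistency check on the metrics reported in this abstract, the F1-score can be recomputed from precision and recall as their harmonic mean. The sketch below is illustrative only: the function name `f1_from_precision_recall` is an assumption and does not come from the paper, and the inputs are the rounded values quoted in the abstract.

```python
def f1_from_precision_recall(precision: float, recall: float) -> float:
    """Return the harmonic mean of precision and recall (the F1-score)."""
    if precision + recall == 0:
        # Avoid division by zero when both metrics are zero.
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Rounded values quoted in the abstract (assumed to be fractions of 1).
precision = 0.9434
recall = 0.9997

f1 = f1_from_precision_recall(precision, recall)
print(f"F1 = {f1 * 100:.2f}%")
```

With these rounded inputs the result is roughly 97.07%, which agrees with the reported 97.08% to within rounding.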

Biomass, Primary Nutrient and Carbon Stock in a Sub-Himalayan Forest of West Bengal, India

  • Shukla, Gopal;Chakravarty, Sumit
    • Journal of Forest and Environmental Science / v.34 no.1 / pp.12-23 / 2018
  • Quantitative information on biomass and available nutrients is essential for developing sustainable forest management strategies to regulate atmospheric carbon. An attempt was made at Chilapatta Reserve Forest in the Duars region of West Bengal to quantify its above- and below-ground carbon along with available "N", "P" and "K" in the soil. Stratified random nested quadrats were marked for soil, biomass and litter sampling. Indirect or non-destructive procedures were employed for biomass estimation. The amounts of these available nutrients and organic carbon quantified in the soil indicate that the forest soil is high in organic carbon and available "K" and medium in phosphorus and nitrogen. The biomass, soil carbon and total carbon (soil C + C in plant biomass) in the forest were 1,995.98, 75.83 and $973.65\;Mg\;ha^{-1}$, respectively. More than 90% of the carbon accumulated in the forest was contributed by the trees. The annual litter production of the forest was $5.37\;Mg\;ha^{-1}$. Carbon accumulation is intricately linked with site quality factors. The estimated biomass of $1,995.98\;Mg\;ha^{-1}$ clearly indicates this. The site quality factors, i.e., a tropical moist deciduous setting with optimum availability of soil nutrients, heavy precipitation, high mean monthly relative humidity and an optimum temperature range, supported luxuriant growth, which was realized as higher biomass accumulation and hence higher carbon accumulation.

Development of SCAR Markers for the Identification of Phytophthora katsurae Causing Chestnut Ink Disease in Korea

  • Lee, Dong Hyeon;Lee, Sun Keun;Lee, Sang Yong;Lee, Jong Kyu
    • Mycobiology / v.41 no.2 / pp.86-93 / 2013
  • Sequence characterized amplified region (SCAR) markers are among the most effective and accurate tools for microbial identification. In this study, we applied SCAR markers for the rapid and accurate detection of Phytophthora katsurae, the causal agent of chestnut ink disease in Korea. We developed seven SCAR markers specific to P. katsurae using random amplified polymorphic DNA (RAPD), and assessed the potential of the SCAR markers to serve as tools for identifying P. katsurae. Seven primer pairs (SOPC 1F/SOPC 1R, SOPC 1-1F/SOPC 1-1R, SOPC 3F/SOPC 3R, SOPC 4F/SOPC 4R, SOPC 4F/SOPC 4-1R, SOPD 9F/SOPD 9R, and SOPD 10F/SOPD 10R) from a sequence derived from RAPD fragments were designed for the analysis of the SCAR markers. To evaluate the specificity and sensitivity of the SCAR markers, the genomic DNA of P. katsurae was serially diluted 10-fold to final concentrations from 1 mg/mL to 1 pg/mL. The limit of detection using the SCAR markers ranged from $100{\mu}g/mL$ to 100 ng/mL. To identify the limit for detecting P. katsurae zoospores, each suspension of zoospores was serially diluted 10-fold to final concentrations from $10{\times}10^5$ to $10{\times}10^1$ zoospores/mL, and then extracted. The limit of detection by SCAR markers was approximately $10{\times}10^1$ zoospores/mL. PCR detection with SCAR markers was specific for P. katsurae, and did not produce any P. katsurae-specific PCR amplicons from 16 other Phytophthora species used as controls. This study shows that SCAR markers are a useful tool for the rapid and effective detection of P. katsurae.

Morphological and Genetic Characterization of Caffeine-Rich and -Poor Tea Tree (Camellia sinensis L.) Lines

  • Kim, Yong-Duck;Jeong, Mi-Jin;Song, Hyun-Jin;Yun, Seok-Rak;Heo, Chang-Mi;Kim, Chang-Soo;Moon, Hyun-Shik;Choi, Myung-Suk
    • Journal of agriculture & life science / v.45 no.5 / pp.1-8 / 2011
  • In this study, 160 tea tree (Camellia sinensis L.) lines were classified by caffeine content using colorimetric methods. Among them, caffeine-rich lines (HR-78, HR-137, HR-82 and HR-123) and caffeine-poor lines (HP-85, HP-88, HP-19, and HP-131) were selected. To examine the differences in morphological and genetic characters between caffeine-rich and -poor lines, we used leaf/shoot growth measurements and RAPD methods. The cluster pattern of morphological characters (leaf width, leaf length, leaf area and shoot length) showed that shoots were longer in caffeine-rich lines than in -poor lines. In the genetic analysis, amplified DNA bands of various sizes were detected by RAPD analysis using 30 random primers. However, no primer set that distinguishes caffeine-rich lines from -poor lines was found. These results can be used as basic data for determining the morphological and genetic differences between caffeine-rich and -poor lines.

Effectiveness of Repeated Examination to Diagnose Enterobiasis in Nursery School Groups

  • Remm, Mare;Remm, Kalle
    • Parasites, Hosts and Diseases / v.47 no.3 / pp.235-241 / 2009
  • The aim of this study was to estimate the benefit of repeated examinations in the diagnosis of enterobiasis in nursery school groups, and to test the effectiveness of individual-based risk predictions using different methods. A total of 604 children were examined using double, and 96 using triple, anal swab examinations. Questionnaires for parents, structured observations, and interviews with supervisors were used to identify factors of possible infection risk. To model the risk of enterobiasis at the individual level, the similarity-based machine learning and prediction software Constud was compared with data mining methods in the Statistica 8 Data Miner software package. Prevalence according to a single examination was 22.5%; the increase as a result of double examinations was 8.2%. Single swabs resulted in an estimated prevalence of 20.1% among children examined 3 times; double swabs increased this by 10.1%, and triple swabs by 7.3%. Random forest classification, boosting classification trees, and Constud correctly predicted about 2/3 of the results of the second examination. Constud estimated a mean prevalence of 31.5% in groups. Constud yielded the highest overall fit of individual-based predictions, while boosting classification tree and random forest models were more effective in recognizing Enterobius-positive persons. As a rule, the actual prevalence of enterobiasis is higher than indicated by a single examination. We suggest using either the values of the mean increase in prevalence after double examinations compared to single examinations or group estimations deduced from individual-level modelled risk predictions.

Data-driven Analysis for Future Land-use Change Prediction : Case Study on Seoul (서울 데이터 기반 필지별 용도전환 발생 예측)

  • Yun, Sung Bum;Mun, Sungchul;Park, Soon Yong;Kim, Taehyun
    • Journal of Broadcast Engineering / v.25 no.2 / pp.176-184 / 2020
  • Due to constant development and decline in areas of Seoul, the Seoul government is pursuing various policies to regenerate declining areas. These policies lead to land-use changes across numerous Seoul districts. This study aims to build a prediction model that can foresee future land-use changes and, in doing so, to derive the influential factors that lead to such changes. To this end, various open data from national departments and the Seoul government were collected and fed into a random forest algorithm. The results showed promising accuracy and identified multiple influential factors that cause land-use changes across Seoul districts. The results of this study could be used in policy making in the public sector, or as a basis for studying gentrification problems in the Seoul area.