• Title/Summary/Keyword: Random forest algorithm

Search Result 222, Processing Time 0.024 seconds

Applicability of Image Classification Using Deep Learning in Small Area : Case of Agricultural Lands Using UAV Image (딥러닝을 이용한 소규모 지역의 영상분류 적용성 분석 : UAV 영상을 이용한 농경지를 대상으로)

  • Choi, Seok-Keun;Lee, Soung-Ki;Kang, Yeon-Bin;Seong, Seon-Kyeong;Choi, Do-Yeon;Kim, Gwang-Ho
    • Journal of the Korean Society of Surveying, Geodesy, Photogrammetry and Cartography
    • /
    • v.38 no.1
    • /
    • pp.23-33
    • /
    • 2020
  • Recently, high-resolution images can be easily acquired using UAV (Unmanned Aerial Vehicle), so that it is possible to produce small area observation and spatial information at low cost. In particular, research on the generation of cover maps in crop production areas is being actively conducted for monitoring the agricultural environment. As a result of comparing classification performance by applying RF(Random Forest), SVM(Support Vector Machine) and CNN(Convolutional Neural Network), deep learning classification method has many advantages in image classification. In particular, land cover classification using satellite images has the advantage of accuracy and time of classification using satellite image data set and pre-trained parameters. However, UAV images have different characteristics such as satellite images and spatial resolution, which makes it difficult to apply them. In order to solve this problem, we conducted a study on the application of deep learning algorithms that can be used for analyzing agricultural lands where UAV data sets and small-scale composite cover exist in Korea. In this study, we applied DeepLab V3 +, FC-DenseNet (Fully Convolutional DenseNets) and FRRN-B (Full-Resolution Residual Networks), the semantic image classification of the state-of-art algorithm, to UAV data set. As a result, DeepLab V3 + and FC-DenseNet have an overall accuracy of 97% and a Kappa coefficient of 0.92, which is higher than the conventional classification. The applicability of the cover classification using UAV images of small areas is shown.

Application and development of a machine learning based model for identification of apartment building types - Analysis of apartment site characteristics based on main building shape - (머신러닝 기반 아파트 주동형상 자동 판별 모형 개발 및 적용 - 주동형상에 따른 아파트 개발 특성분석을 중심으로 -)

  • Sanguk HAN;Jungseok SEO;Sri Utami Purwaningati;Sri Utami Purwaningati;Jeongseob KIM
    • Journal of the Korean Association of Geographic Information Studies
    • /
    • v.26 no.2
    • /
    • pp.55-67
    • /
    • 2023
  • This study aims to develop a model that can automatically identify the rooftop shape of apartment buildings using GIS and machine learning algorithms, and apply it to analyze the relationship between rooftop shape and characteristics of apartment complexes. A database of rooftop data for each building in an apartment complex was constructed using geospatial data, and individual buildings within each complex were classified into flat type, tower type, and mixed types using the random forest algorithm. In addition, the relationship between the proportion of rooftop shapes, development density, height, and other characteristics of apartment complexes was analyzed to propose the potential application of geospatial information in the real estate field. This study is expected to serve as a basic research on AI-based building type classification and to be utilized in various spatial and real estate analyses.

Ensemble Learning-Based Prediction of Good Sellers in Overseas Sales of Domestic Books and Keyword Analysis of Reviews of the Good Sellers (앙상블 학습 기반 국내 도서의 해외 판매 굿셀러 예측 및 굿셀러 리뷰 키워드 분석)

  • Do Young Kim;Na Yeon Kim;Hyon Hee Kim
    • KIPS Transactions on Software and Data Engineering
    • /
    • v.12 no.4
    • /
    • pp.173-178
    • /
    • 2023
  • As Korean literature spreads around the world, its position in the overseas publishing market has become important. As demand in the overseas publishing market continues to grow, it is essential to predict future book sales and analyze the characteristics of books that have been highly favored by overseas readers in the past. In this study, we proposed ensemble learning based prediction model and analyzed characteristics of the cumulative sales of more than 5,000 copies classified as good sellers published overseas over the past 5 years. We applied the five ensemble learning models, i.e., XGBoost, Gradient Boosting, Adaboost, LightGBM, and Random Forest, and compared them with other machine learning algorithms, i.e., Support Vector Machine, Logistic Regression, and Deep Learning. Our experimental results showed that the ensemble algorithm outperforms other approaches in troubleshooting imbalanced data. In particular, the LightGBM model obtained an AUC value of 99.86% which is the best prediction performance. Among the features used for prediction, the most important feature is the author's number of overseas publications, and the second important feature is publication in countries with the largest publication market size. The number of evaluation participants is also an important feature. In addition, text mining was performed on the four book reviews that sold the most among good-selling books. Many reviews were interested in stories, characters, and writers and it seems that support for translation is needed as many of the keywords of "translation" appear in low-rated reviews.

The Prediction of Survival of Breast Cancer Patients Based on Machine Learning Using Health Insurance Claim Data (건강보험 청구 데이터를 활용한 머신러닝 기반유방암 환자의 생존 여부 예측)

  • Doeggyu Lee;Kyungkeun Byun;Hyungdong Lee;Sunhee Shin
    • Journal of Korea Society of Industrial Information Systems
    • /
    • v.28 no.2
    • /
    • pp.1-9
    • /
    • 2023
  • Research using AI and big data is also being actively conducted in the health and medical fields such as disease diagnosis and treatment. Most of the existing research data used cohort data from research institutes or some patient data. In this paper, the difference in the prediction rate of survival and the factors affecting survival between breast cancer patients in their 40~50s and other age groups was revealed using health insurance review claim data held by the HIRA. As a result, the accuracy of predicting patients' survival was 0.93 on average in their 40~50s, higher than 0.86 in their 60~80s. In terms of that factor, the number of treatments was high for those in their 40~50s, and age was high for those in their 60~80s. Performance comparison with previous studies, the average precision was 0.90, which was higher than 0.81 of the existing paper. As a result of performance comparison by applied algorithm, the overall average precision of Decision Tree, Random Forest, and Gradient Boosting was 0.90, and the recall was 1.0, and the precision of multi-layer perceptrons was 0.89, and the recall was 1.0. I hope that more research will be conducted using machine learning automation(Auto ML) tools for non-professionals to enhance the use of the value for health insurance review claim data held by the HIRA.

Mapping Mammalian Species Richness Using a Machine Learning Algorithm (머신러닝 알고리즘을 이용한 포유류 종 풍부도 매핑 구축 연구)

  • Zhiying Jin;Dongkun Lee;Eunsub Kim;Jiyoung Choi;Yoonho Jeon
    • Journal of Environmental Impact Assessment
    • /
    • v.33 no.2
    • /
    • pp.53-63
    • /
    • 2024
  • Biodiversity holds significant importance within the framework of environmental impact assessment, being utilized in site selection for development, understanding the surrounding environment, and assessing the impact on species due to disturbances. The field of environmental impact assessment has seen substantial research exploring new technologies and models to evaluate and predict biodiversity more accurately. While current assessments rely on data from fieldwork and literature surveys to gauge species richness indices, limitations in spatial and temporal coverage underscore the need for high-resolution biodiversity assessments through species richness mapping. In this study, leveraging data from the 4th National Ecosystem Survey and environmental variables, we developed a species distribution model using Random Forest. This model yielded mapping results of 24 mammalian species' distribution, utilizing the species richness index to generate a 100-meter resolution map of species richness. The research findings exhibited a notably high predictive accuracy, with the species distribution model demonstrating an average AUC value of 0.82. In addition, the comparison with National Ecosystem Survey data reveals that the species richness distribution in the high-resolution species richness mapping results conforms to a normal distribution. Hence, it stands as highly reliable foundational data for environmental impact assessment. Such research and analytical outcomes could serve as pivotal new reference materials for future urban development projects, offering insights for biodiversity assessment and habitat preservation endeavors.

An Application of Support Vector Machines to Customer Loyalty Classification of Korean Retailing Company Using R Language

  • Nguyen, Phu-Thien;Lee, Young-Chan
    • The Journal of Information Systems
    • /
    • v.26 no.4
    • /
    • pp.17-37
    • /
    • 2017
  • Purpose Customer Loyalty is the most important factor of customer relationship management (CRM). Especially in retailing industry, where customers have many options of where to spend their money. Classifying loyal customers through customers' data can help retailing companies build more efficient marketing strategies and gain competitive advantages. This study aims to construct classification models of distinguishing the loyal customers within a Korean retailing company using data mining techniques with R language. Design/methodology/approach In order to classify retailing customers, we used combination of support vector machines (SVMs) and other classification algorithms of machine learning (ML) with the support of recursive feature elimination (RFE). In particular, we first clean the dataset to remove outlier and impute the missing value. Then we used a RFE framework for electing most significant predictors. Finally, we construct models with classification algorithms, tune the best parameters and compare the performances among them. Findings The results reveal that ML classification techniques can work well with CRM data in Korean retailing industry. Moreover, customer loyalty is impacted by not only unique factor such as net promoter score but also other purchase habits such as expensive goods preferring or multi-branch visiting and so on. We also prove that with retailing customer's dataset the model constructed by SVMs algorithm has given better performance than others. We expect that the models in this study can be used by other retailing companies to classify their customers, then they can focus on giving services to these potential vip group. We also hope that the results of this ML algorithm using R language could be useful to other researchers for selecting appropriate ML algorithms.

Comparison of machine learning algorithms for Chl-a prediction in the middle of Nakdong River (focusing on water quality and quantity factors) (머신러닝 기법을 활용한 낙동강 중류 지역의 Chl-a 예측 알고리즘 비교 연구(수질인자 및 수량 중심으로))

  • Lee, Sang-Min;Park, Kyeong-Deok;Kim, Il-Kyu
    • Journal of Korean Society of Water and Wastewater
    • /
    • v.34 no.4
    • /
    • pp.277-288
    • /
    • 2020
  • In this study, we performed algorithms to predict algae of Chlorophyll-a (Chl-a). Water quality and quantity data of the middle Nakdong River area were used. At first, the correlation analysis between Chl-a and water quality and quantity data was studied. We extracted ten factors of high importance for water quality and quantity data about the two weirs. Algorithms predicted how ten factors affected Chl-a occurrence. We performed algorithms about decision tree, random forest, elastic net, gradient boosting with Python. The root mean square error (RMSE) value was used to evaluate excellent algorithms. The gradient boosting showed 10.55 of RMSE value for the Gangjeonggoryeong (GG) site and 11.43 of RMSE value for the Dalsung (DS) site. The gradient boosting algorithm showed excellent results for GG and DS sites. Prediction value for the four algorithms was also evaluated through the Receiver operating characteristic (ROC) curve and Area under curve (AUC). As a result of the evaluation, the AUC value was 0.877 at GG site and the AUC value was 0.951 at DS site. So the algorithm's ability to interpret seemed to be excellent.

Identifying Process Capability Index for Electricity Distribution System through Thermal Image Analysis (열화상 이미지 분석을 통한 배전 설비 공정능력지수 감지 시스템 개발)

  • Lee, Hyung-Geun;Hong, Yong-Min;Kang, Sung-Woo
    • Journal of Korean Society for Quality Management
    • /
    • v.49 no.3
    • /
    • pp.327-340
    • /
    • 2021
  • Purpose: The purpose of this study is to propose a system predicting whether an electricity distribution system is abnormal by analyzing the temperature of the deteriorated system. Traditional electricity distribution system abnormality diagnosis was mainly limited to post-inspection. This research presents a remote monitoring system for detecting thermal images of the deteriorated electricity distribution system efficiently hereby providing safe and efficient abnormal diagnosis to electricians. Methods: In this study, an object detection algorithm (YOLOv5) is performed using 16,866 thermal images of electricity distribution systems provided by KEPCO(Korea Electric Power Corporation). Abnormality/Normality of the extracted system images from the algorithm are classified via the limit temperature. Each classification model, Random Forest, Support Vector Machine, XGBOOST is performed to explore 463,053 temperature datasets. The process capability index is employed to indicate the quality of the electricity distribution system. Results: This research performs case study with transformers representing the electricity distribution systems. The case study shows the following states: accuracy 100%, precision 100%, recall 100%, F1-score 100%. Also the case study shows the process capability index of the transformers with the following states: steady state 99.47%, caution state 0.16%, and risk state 0.37%. Conclusion: The sum of caution and risk state is 0.53%, which is higher than the actual failure rate. Also most transformer abnormalities can be detected through this monitoring system.

Comparative characteristic of ensemble machine learning and deep learning models for turbidity prediction in a river (딥러닝과 앙상블 머신러닝 모형의 하천 탁도 예측 특성 비교 연구)

  • Park, Jungsu
    • Journal of Korean Society of Water and Wastewater
    • /
    • v.35 no.1
    • /
    • pp.83-91
    • /
    • 2021
  • The increased turbidity in rivers during flood events has various effects on water environmental management, including drinking water supply systems. Thus, prediction of turbid water is essential for water environmental management. Recently, various advanced machine learning algorithms have been increasingly used in water environmental management. Ensemble machine learning algorithms such as random forest (RF) and gradient boosting decision tree (GBDT) are some of the most popular machine learning algorithms used for water environmental management, along with deep learning algorithms such as recurrent neural networks. In this study GBDT, an ensemble machine learning algorithm, and gated recurrent unit (GRU), a recurrent neural networks algorithm, are used for model development to predict turbidity in a river. The observation frequencies of input data used for the model were 2, 4, 8, 24, 48, 120 and 168 h. The root-mean-square error-observations standard deviation ratio (RSR) of GRU and GBDT ranges between 0.182~0.766 and 0.400~0.683, respectively. Both models show similar prediction accuracy with RSR of 0.682 for GRU and 0.683 for GBDT. The GRU shows better prediction accuracy when the observation frequency is relatively short (i.e., 2, 4, and 8 h) where GBDT shows better prediction accuracy when the observation frequency is relatively long (i.e. 48, 120, 160 h). The results suggest that the characteristics of input data should be considered to develop an appropriate model to predict turbidity.

Robust Estimation of Hand Poses Based on Learning (학습을 이용한 손 자세의 강인한 추정)

  • Kim, Sul-Ho;Jang, Seok-Woo;Kim, Gye-Young
    • Journal of the Korea Institute of Information and Communication Engineering
    • /
    • v.23 no.12
    • /
    • pp.1528-1534
    • /
    • 2019
  • Recently, due to the popularization of 3D depth cameras, new researches and opportunities have been made in research conducted on RGB images, but estimation of human hand pose is still classified as one of the difficult topics. In this paper, we propose a robust estimation method of human hand pose from various input 3D depth images using a learning algorithm. The proposed approach first generates a skeleton-based hand model and then aligns the generated hand model with three-dimensional point cloud data. Then, using a random forest-based learning algorithm, the hand pose is strongly estimated from the aligned hand model. Experimental results in this paper show that the proposed hierarchical approach makes robust and fast estimation of human hand posture from input depth images captured in various indoor and outdoor environments.