• Title/Summary/Keyword: AutoML

Data Processing of AutoML-based Classification Models for Improving Performance in Unbalanced Classes (불균형 클래스에서 AutoML 기반 분류 모델의 성능 향상을 위한 데이터 처리)

  • Lee, Dong-Joon;Kang, Ji-Soo;Chung, Kyungyong
    • Journal of Convergence for Information Technology / v.11 no.6 / pp.49-54 / 2021
  • With the recent development of smart healthcare technology, interest in everyday diseases is increasing. However, healthcare data are imbalanced between positive and negative cases: people without a given disease far outnumber patients who have it, which makes positive data hard to collect. This imbalance needs to be corrected because it degrades model training during disease prediction and analysis. Therefore, in this paper, we replace missing values through multiple imputation in models that detect whether a disease is present, and resolve the class imbalance through over-sampling. Using AutoML on the preprocessed data, we generate several models and combine the top three into an ensemble model.
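
The over-sampling step mentioned in this abstract can be illustrated with a minimal sketch. The abstract does not say which over-sampling method the authors used, so the code below assumes plain random over-sampling with replacement; the function name `random_oversample` and the toy records are hypothetical.

```python
import random
from collections import Counter

def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen minority-class rows until every class
    has as many rows as the largest class."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for cls, n in counts.items():
        # All original rows that belong to this class
        pool = [r for r, y in zip(rows, labels) if y == cls]
        for _ in range(target - n):
            out_rows.append(rng.choice(pool))
            out_labels.append(cls)
    return out_rows, out_labels

# Toy data (hypothetical): 6 negative records and 2 positive records
rows = [[0.1], [0.2], [0.3], [0.4], [0.5], [0.6], [0.9], [1.0]]
labels = [0, 0, 0, 0, 0, 0, 1, 1]
balanced_rows, balanced_labels = random_oversample(rows, labels)
balanced_counts = Counter(balanced_labels)
print(balanced_counts)
```

In practice, over-sampling of this kind is usually applied to the training split only, so that duplicated records do not leak into the evaluation data.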

Combining AutoML and XAI: Automating machine learning models and improving interpretability (AutoML 과 XAI 의 결합 : 기계학습 모델의 자동화와 해석력 향상을 위하여)

  • Min Hyeok Son;Nam Hun Kim;Hyeon Ji Lee;Do Yeon Kim
    • Proceedings of the Korea Information Processing Society Conference / 2023.11a / pp.924-925 / 2023
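  • This study focuses on the growing complexity of machine learning models and on the difficulty of interpreting models that are regarded as 'black boxes'. To address this, we used AutoML to search efficiently for the best model and introduced XAI techniques to make the model's prediction process transparent. We confirmed that the XAI-based approach offers better interpretability than traditional methods and lets users clearly understand the basis and validity of a machine learning model's predictions.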

Cognitive Impairment Prediction Model Using AutoML and Lifelog

  • Hyunchul Choi;Chiho Yoon;Sae Bom Lee
    • Journal of the Korea Society of Computer and Information / v.28 no.11 / pp.53-63 / 2023
  • This study developed a cognitive impairment prediction model, intended as a screening test for preventing dementia in the elderly, using Automated Machine Learning (AutoML). We used the 'Wearable lifelog data for high-risk dementia patients' dataset from the National Information Society Agency and ran the analysis with PyCaret 3.0.0 in the Google Colaboratory environment. The analysis proceeded as follows: first, we selected the five models with the best classification performance for model development and lifelog data analysis; next, we used ensemble learning to integrate these models and assessed their performance. The Voting Classifier, Gradient Boosting Classifier, Extreme Gradient Boosting, Light Gradient Boosting Machine, Extra Trees Classifier, and Random Forest Classifier showed high predictive performance, in that order. The findings also highlighted 'Average respiration per minute during sleep' and 'Average heart rate per minute during sleep' as the most important feature variables for accurate prediction. Finally, the results suggest that machine learning and lifelog data could be used to manage and prevent cognitive impairment in the elderly more effectively.
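
The idea of integrating several classifiers by voting can be sketched in a few lines. This is not PyCaret code and not the authors' pipeline: the abstract does not say whether hard or soft voting was used, so the sketch assumes hard (majority) voting, and the function `hard_vote` and the three prediction lists are hypothetical.

```python
from collections import Counter

def hard_vote(predictions_per_model):
    """Return the majority label for each sample.

    predictions_per_model: list of equal-length label lists, one per model.
    Ties go to the label that appears first among the models' votes.
    """
    voted = []
    for sample_votes in zip(*predictions_per_model):
        voted.append(Counter(sample_votes).most_common(1)[0][0])
    return voted

# Hypothetical predictions from three base classifiers on four samples
model_a = [1, 0, 1, 1]
model_b = [1, 1, 0, 1]
model_c = [0, 0, 1, 1]
ensemble = hard_vote([model_a, model_b, model_c])
print(ensemble)  # [1, 0, 1, 1]
```

Each base model is wrong on a different sample here, so the majority vote recovers a label that two of the three models agree on, which is the usual motivation for voting ensembles.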

Optimizing Input Parameters of Paralichthys olivaceus Disease Classification based on SHAP Analysis (SHAP 분석 기반의 넙치 질병 분류 입력 파라미터 최적화)

  • Kyung-Won Cho;Ran Baik
    • The Journal of the Korea institute of electronic communication sciences / v.18 no.6 / pp.1331-1336 / 2023
  • In text-based fish disease classification using machine learning, the model has too many input parameters, yet they cannot be reduced arbitrarily without harming performance. To solve this problem, this paper proposes a method that uses SHAP analysis to optimize the input parameters specifically for Paralichthys olivaceus disease classification. The proposed method includes preprocessing the disease information extracted from the halibut disease questionnaire by applying SHAP analysis, and evaluating a machine learning model using AutoML. In this way, the performance of the AutoML input parameters is evaluated and the optimal input parameter combination is derived. The proposed method is expected to maintain the existing performance while reducing the number of input parameters required, which will contribute to the efficiency and practicality of text-based Paralichthys olivaceus disease classification.

An Artificial Intelligence Approach to Waterbody Detection of the Agricultural Reservoirs in South Korea Using Sentinel-1 SAR Images (Sentinel-1 SAR 영상과 AI 기법을 이용한 국내 중소규모 농업저수지의 수표면적 산출)

  • Choi, Soyeon;Youn, Youjeong;Kang, Jonggu;Park, Ganghyun;Kim, Geunah;Lee, Seulchan;Choi, Minha;Jeong, Hagyu;Lee, Yangwon
    • Korean Journal of Remote Sensing / v.38 no.5_3 / pp.925-938 / 2022
  • Agricultural reservoirs are an important water resource nationwide and are vulnerable to abnormal climate effects such as drought caused by climate change. Enhanced management is therefore required for appropriate operation. Although water levels need to be tracked through continuous monitoring, on-site measurement and observation are difficult for practical reasons. This study presents an objective comparison of multiple AI models for water-body extraction using radar images, which offer wide coverage and frequent revisits. The proposed methods use Sentinel-1 Synthetic Aperture Radar (SAR) images and, unlike common water-extraction methods based on optical images, are suitable for long-term monitoring because they are less affected by weather conditions. We built four AI models, Support Vector Machine (SVM), Random Forest (RF), Artificial Neural Network (ANN), and Automated Machine Learning (AutoML), using drone images, Sentinel-1 SAR, and DSM data. The study covers a total of 22 reservoirs of less than 1 million tons, including small and medium-sized reservoirs with an effective storage capacity of less than 300,000 tons. Forty-five images from the 22 reservoirs were used for model training and verification, and the results show that the AutoML model was 0.01 to 0.03 better in water Intersection over Union (IoU) than the other three models, with Accuracy=0.92 and mIoU=0.81 in a test. In conclusion, AutoML performed as well as the classical machine learning methods, and water-body extraction with AutoML is expected to be applicable to automatic reservoir monitoring.
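
The IoU, mIoU, and accuracy figures quoted in this abstract follow standard definitions, sketched below for a two-class (water / non-water) mask. The flattened toy masks and the function names are hypothetical, and the mIoU here is assumed to be the unweighted mean of the per-class IoU values.

```python
def iou(pred, truth, cls):
    """Intersection over Union for one class on flattened label masks."""
    inter = sum(1 for p, t in zip(pred, truth) if p == cls and t == cls)
    union = sum(1 for p, t in zip(pred, truth) if p == cls or t == cls)
    return inter / union if union else float("nan")

def accuracy(pred, truth):
    """Fraction of pixels whose predicted label matches the reference."""
    return sum(1 for p, t in zip(pred, truth) if p == t) / len(truth)

# Hypothetical flattened masks: 1 = water, 0 = non-water
pred = [1, 1, 0, 0, 1, 0, 0, 0]
truth = [1, 1, 1, 0, 0, 0, 0, 0]

water_iou = iou(pred, truth, 1)       # 2 shared water pixels / 4 in the union
background_iou = iou(pred, truth, 0)  # 4 shared pixels / 6 in the union
miou = (water_iou + background_iou) / 2
acc = accuracy(pred, truth)
print(water_iou, miou, acc)
```

Because non-water pixels usually dominate a reservoir scene, accuracy can look high even when the water class is segmented poorly, which is why the water IoU is reported separately.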

Comparing automated and non-automated machine learning for autism spectrum disorders classification using facial images

  • Elshoky, Basma Ramdan Gamal;Younis, Eman M.G.;Ali, Abdelmgeid Amin;Ibrahim, Osman Ali Sadek
    • ETRI Journal / v.44 no.4 / pp.613-623 / 2022
  • Autism spectrum disorder (ASD) is a developmental disorder associated with cognitive and neurobehavioral disorders. It affects a person's behavior and performance, including verbal and non-verbal communication in social interactions. Early screening and diagnosis of ASD are essential for early educational planning and treatment, the provision of family support, and timely, appropriate medical support for the child. Thus, developing automated methods for diagnosing ASD is becoming essential. Herein, we investigate various machine learning methods for building predictive models that diagnose ASD in children from facial images. To this end, we used an autistic children dataset containing 2936 facial images of children with autism and typically developing children. We applied classical machine learning methods, such as support vector machine and random forest, and deep-learning methods, as well as a state-of-the-art method, automated machine learning (AutoML). We then compared the results obtained from the existing techniques. AutoML achieved the highest performance, approximately 96% accuracy, via Hyperopt and tree-based pipeline optimization tool optimization. Furthermore, AutoML methods enabled us to easily find the best parameter settings without any human effort for feature engineering.

Prediction of Landslides and Determination of Its Variable Importance Using AutoML (AutoML을 이용한 산사태 예측 및 변수 중요도 산정)

  • Nam, KoungHoon;Kim, Man-Il;Kwon, Oil;Wang, Fawu;Jeong, Gyo-Cheol
    • The Journal of Engineering Geology / v.30 no.3 / pp.315-325 / 2020
  • This study developed a model to predict landslides and to determine the variable importance of landslide susceptibility factors, based on the probabilistic prediction of landslides occurring on slopes along roads. Field survey data for 30,615 slopes in Korea, collected from 2007 to 2020, were analyzed to develop the landslide prediction model. A total of 131 variables, 17 topographic factors and 114 geological factors (including 89 bedrock types), were used to predict landslides. Automated machine learning (AutoML) was used to classify landslides and non-landslides. The verification results showed that the best model, an extremely randomized tree (XRT), achieved a prediction rate of 83.977% on test data. The variable importance analysis of the landslide susceptibility factors yielded 10 topographic factors and 9 geological factors, with the importance of each factor presented as a percentage. The model evaluated the likelihood of landslide occurrence probabilistically and quantitatively by ranking variable importance using only on-site survey data. It is expected to give decision-makers a reliable basis for slope safety assessment through field surveys.

A Study on the Prediction of Nitrogen Oxide Emissions in Rotary Kiln Process using Machine Learning (머신러닝 기법을 이용한 로터리 킬른 공정의 질소산화물 배출예측에 관한 연구)

  • Je-Hyeung Yoo;Cheong-Yeul Park;Jae Kwon Bae
    • Journal of Industrial Convergence / v.21 no.7 / pp.19-27 / 2023
  • As the secondary battery market expands, the process of producing laterite ore using the rotary kiln and electric furnace method is expanding worldwide. As ESG management spreads, the management of air pollutants such as nitrogen oxides in exhaust gases is being strengthened. The rotary kiln, one of the main facilities of the pyrometallurgy process, dries and pre-reduces ore and generates nitrogen oxides in doing so, which makes nitrogen oxide prediction important. In this study, LSTM was used for regression prediction and LightGBM for classification prediction, and model optimization was then performed using AutoML. With LSTM, the predicted value after 5 minutes was 0.86 with an MAE of 5.13 ppm, and after 40 minutes it was 0.38 with an MAE of 10.84 ppm. With LightGBM for classification prediction, the test accuracy was 0.75 after 5 minutes and 0.61 after 40 minutes; after model optimization through AutoML, the accuracy improved from 0.75 to 0.80 after 5 minutes and from 0.61 to 0.70 after 40 minutes, a level that can be used in actual operation. Through this study, nitrogen oxide predictions can be applied to actual operations to support compliance with air pollutant emission regulations and ESG management.

A study on automated soil moisture monitoring methods for the Korean peninsula based on Google Earth Engine (Google Earth Engine 기반의 한반도 토양수분 모니터링 자동화 기법 연구)

  • Jang, Wonjin;Chung, Jeehun;Lee, Yonggwan;Kim, Jinuk;Kim, Seongjoon
    • Journal of Korea Water Resources Association / v.57 no.9 / pp.615-626 / 2024
  • To accurately and efficiently monitor soil moisture (SM) across South Korea, this study developed an SM estimation model that integrates the cloud computing platform Google Earth Engine (GEE) and Automated Machine Learning (AutoML). Various spatial information based on Terra MODIS (Moderate Resolution Imaging Spectroradiometer) and the global precipitation observation satellite GPM (Global Precipitation Measurement) was used to test optimal input data combinations. The results indicated that GPM-based accumulated dry days, 5-day antecedent average precipitation, NDVI (Normalized Difference Vegetation Index), the sum of LST (Land Surface Temperature) acquired during nighttime and daytime, soil properties (sand and clay content, bulk density), terrain data (elevation and slope), and seasonal classification had high feature importance. After setting the objective functions (coefficient of determination, R2; Root Mean Square Error, RMSE; Mean Absolute Percent Error, MAPE) in AutoML for this combination of data, a comparative evaluation of machine learning techniques was conducted. Tree-based models performed well, with Random Forest performing best (R2: 0.72, RMSE: 2.70 vol%, MAPE: 0.14).
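
The three evaluation metrics named in this abstract can be written out as a short sketch using their standard definitions. The soil moisture values below are hypothetical, and MAPE is returned as a fraction rather than a percentage, which is an assumption made to match the scale of the reported 0.14.

```python
import math

def rmse(y_true, y_pred):
    """Root Mean Square Error, in the units of the target variable."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)

def mape(y_true, y_pred):
    """Mean Absolute Percent Error as a fraction; y_true must be non-zero."""
    n = len(y_true)
    return sum(abs(t - p) / abs(t) for t, p in zip(y_true, y_pred)) / n

def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot

# Hypothetical observed and estimated soil moisture values (vol%)
observed = [20.0, 25.0, 30.0, 35.0]
estimated = [22.0, 24.0, 29.0, 37.0]

r2_value = r2(observed, estimated)
rmse_value = rmse(observed, estimated)
mape_value = mape(observed, estimated)
print(r2_value, rmse_value, mape_value)
```

RMSE keeps the units of the target (vol% here), while R2 and MAPE are unitless, so the three values are not directly comparable with one another.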

Machine Learning-Based Prediction Technology for Medical Treatment Period of Automobile Insurance Accident Patients (머신러닝 기반의 자동차보험 사고 환자의 진료 기간 예측 기술)

  • Kyung-Keun Byun;Doeg-Gyu Lee;Hyung-Dong Lee
    • Convergence Security Journal / v.23 no.1 / pp.89-95 / 2023
  • To help reduce the medical expenses of auto insurance accident patients, this study predicted the treatment period, the most important factor in the medical expenses of patients in their 40s and 50s, and analyzed the factors affecting it. To this end, machine learning models were built using five algorithms, including Decision Tree, and their performance was compared. Three algorithms performed well: Decision Tree, Gradient Boost, and XGBoost. In addition, the analysis of factors affecting the prediction of the treatment period identified the type of hospital, the treatment area, age, and gender. This work also presents accessible research methods such as the use of AutoML, and we hope the results will inform policies to reduce medical expenses for automobile insurance accidents.