• Title/Abstract/Keywords: AdaBoost learning

Search results: 80 items (processing time: 0.024 s)

Prediction of tunneling parameters for ultra-large diameter slurry shield TBM in cross-river tunnels based on integrated algorithms

  • Shujun Xu
    • Geomechanics and Engineering / Vol. 38, No. 1 / pp. 69-77 / 2024
  • The development of shield-driven cross-river tunnels in China is shifting notably towards larger diameters, longer distances, and higher water pressures as the excavation environment becomes more complex. Complex geological formations, such as faults and karst cavities, pose significant construction risks. Real-time adjustment of shield tunneling parameters based on parameter prediction is key to ensuring the safety and efficiency of shield tunneling. In this study, prediction models for the torque and thrust of the cutter plate of ultra-large diameter slurry shield TBMs are established based on integrated (ensemble) learning algorithms by analyzing real data from the Heyan Road cross-river tunnel. The models consider the influence of geological complexity at the excavation face, substantial burial depth, and high water level on the slurry shield tunneling parameters. The results reveal that the predictive models built with the Random Forest and AdaBoost algorithms agree strongly with actual data, which indicates the good adaptability and predictive accuracy of these two models. The models proposed in this study can be applied to real-time prediction and adaptive adjustment of tunneling parameters for shield tunneling under complex geological conditions.

A Study on the Prediction Model for Bioactive Components of Cnidium officinale Makino according to Climate Change using Machine Learning

  • 이현조;구현정;이경철;주원균;채철주
    • 스마트미디어저널 / Vol. 12, No. 10 / pp. 93-101 / 2023
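  • Climate change, including rising temperatures, droughts, and floods, has recently emerged as a global problem, and it is expected to strongly affect crop characteristics and productivity in agriculture. Cnidium officinale Makino is used not only as a traditional herbal medicine but also as an industrial raw material for health functional foods, natural-product pharmaceuticals, and household materials; however, its productivity is declining because of threats such as continuous cropping injury and climate change. This paper therefore proposes a model that predicts bioactive component indicators of Cnidium officinale Makino, a representative medicinal crop vulnerable to climate change, under climate change scenarios. First, to address the imbalance in the collected data on weather information, physiological responses, and bioactive components, the data were augmented with the CTGAN algorithm. The quality of the augmented data was measured using Column Shape and Column Pair Trends, achieving an average Overall Quality of 88%. Using the augmented data, five models (RF, SVR, XGBoost, AdaBoost, and LightGBM) were evaluated for predicting phenol and flavonoid content separately for the aboveground and belowground parts. The evaluation showed that the XGBoost model performed best in predicting the bioactive components of Cnidium officinale Makino, with roughly twice the accuracy of the SVR model.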

Comparison of three boosting methods in parent-offspring trios for genotype imputation using simulation study

  • Mikhchi, Abbas;Honarvar, Mahmood;Kashan, Nasser Emam Jomeh;Zerehdaran, Saeed;Aminafshar, Mehdi
    • Journal of Animal Science and Technology / Vol. 58, No. 1 / pp. 1.1-1.6 / 2016
  • Background: Genotype imputation is an important process for predicting unknown genotypes; it uses a reference population with dense genotypes to predict missing genotypes for both human and animal genetic variations at low cost. Machine learning methods, especially boosting methods, have been used in genetic studies to explore the underlying genetic profile of disease and to build models capable of predicting missing values of a marker. Methods: In this study, strategies and factors affecting the imputation accuracy of parent-offspring trios were compared from lower-density SNP panels (5 K) to a high-density (10 K) SNP panel using three different boosting methods, namely TotalBoost (TB), LogitBoost (LB), and AdaBoost (AB). The methods were applied to simulated data to impute the un-typed SNPs in parent-offspring trios. Four datasets were simulated: G1 (100 trios with 5 K SNPs), G2 (100 trios with 10 K SNPs), G3 (500 trios with 5 K SNPs), and G4 (500 trios with 10 K SNPs). In all four datasets, the parents were genotyped completely and the offspring were genotyped with a lower-density panel. Results: Comparison of the three methods showed that LB outperformed AB and TB in imputation accuracy. Computation time differed between methods, and AB was the fastest algorithm. Higher SNP densities increased the accuracy of imputation. A larger number of trios (i.e., 500) improved the performance of LB and TB. Conclusions: The three methods perform well in terms of imputation accuracy, and the dense chip is recommended for imputation of parent-offspring trios.

Analysis of Ammunition Inspection Record Data and Development of Ammunition Condition Code Classification Model

  • 정영진;홍지수;김솔잎;강성우
    • 대한안전경영과학회지 / Vol. 26, No. 2 / pp. 23-31 / 2024
  • In the military, stored and managed ammunition and explosives can cause serious damage if mishandled, so safety must be secured by making use of ammunition reliability data. In this study, exploratory data analysis of ammunition inspection record data is conducted to extract reliability information on stored ammunition and to predict the ammunition condition code, which represents the lifespan information of the ammunition. The study consists of three stages: collection and preprocessing of ammunition inspection record data, exploratory data analysis, and classification of ammunition condition codes. For the classification of ammunition condition codes, five models based on boosting algorithms are employed (AdaBoost, GBM, XGBoost, LightGBM, CatBoost). The best model is selected based on performance metrics including Accuracy, Precision, Recall, and F1-score. The ammunition in this study was primarily produced from the 1980s to the 1990s, and inspection volume tended to increase in the early stages of production and around 30 years after production. Pre-issue inspections (PII) were predominant, and the grade of the ammunition condition code tended to decrease as the storage period increased. In classifying ammunition condition codes, the CatBoost model performed best, with an Accuracy of 93% and an F1-score of 93%. This study emphasizes the safety and reliability of ammunition and proposes a model for classifying ammunition condition codes by analyzing ammunition inspection record data. The model can serve as a tool to assist ammunition inspectors and is expected to enhance not only the safety of ammunition but also the efficiency of ammunition storage management.

Analysis of Occupational Injury and Feature Importance of Fall Accidents on the Construction Sites using Adaboost

  • 최재현;류한국
    • 대한건축학회논문집:구조계 / Vol. 35, No. 11 / pp. 155-162 / 2019
  • The construction industry has the most safety accidents of any industry in Korea, accounting for 28.55% of accidents across all industries. In particular, falls are the most common accident type, making up 60.16% of accidents in the construction field. We therefore analyzed the major factors affecting fall accidents and derived feature importances by considering various variables. We used data collected from the Korea Occupational Safety & Health Agency (KOSHA) for training and prediction in the proposed model. We predicted the degree of occupational fall accidents using the machine learning model AdaBoost, short for Adaptive Boosting. AdaBoost is a machine learning meta-algorithm that can be used in conjunction with many other types of learning algorithms to improve performance. In this model, decision trees were combined with AdaBoost to predict and classify the degree of occupational fall accidents. Hyperopt was used to optimize hyperparameters, together with stratified k-fold cross-validation. We extracted and analyzed the feature importances affecting fall accidents using the permutation technique. In this study, we verified the predicted degree of fall accidents in terms of predictive accuracy. The machine learning model was also confirmed to be applicable to safety accident analysis on construction sites. In the future, if safety accident data are accumulated automatically and in real time on construction sites through a network system using IoT (Internet of Things) technology, it will be possible to analyze the factors and types of accidents according to site conditions from the real-time data.
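
As a rough illustration of the idea this abstract describes (AdaBoost combining weak decision-tree learners by reweighting training samples), the following is a minimal pure-Python sketch using one-level decision trees (stumps). It is not the authors' model: the function names, the toy data, and the restriction to stumps and to labels in {-1, +1} are assumptions made purely for illustration.

```python
import math

def fit_stump(X, y, w):
    """Return (error, feature, threshold, sign) of the stump with the lowest weighted error."""
    best = None
    for j in range(len(X[0])):
        for thr in sorted({row[j] for row in X}):
            for sign in (1, -1):
                # The stump predicts `sign` when feature j >= thr, else -sign.
                err = sum(wi for wi, row, yi in zip(w, X, y)
                          if (sign if row[j] >= thr else -sign) != yi)
                if best is None or err < best[0]:
                    best = (err, j, thr, sign)
    return best

def stump_predict(stump, row):
    _, j, thr, sign = stump
    return sign if row[j] >= thr else -sign

def adaboost_fit(X, y, n_rounds):
    """Train AdaBoost on labels in {-1, +1}; returns a list of (alpha, stump)."""
    n = len(X)
    w = [1.0 / n] * n  # start with uniform sample weights
    ensemble = []
    for _ in range(n_rounds):
        stump = fit_stump(X, y, w)
        err = min(max(stump[0], 1e-10), 1 - 1e-10)  # clamp to avoid log(0)
        alpha = 0.5 * math.log((1 - err) / err)     # vote weight of this weak learner
        ensemble.append((alpha, stump))
        # Raise the weight of misclassified samples, lower the rest, renormalize.
        w = [wi * math.exp(-alpha * yi * stump_predict(stump, row))
             for wi, row, yi in zip(w, X, y)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble

def adaboost_predict(ensemble, row):
    # Weighted vote of all stumps; the sign of the score is the predicted label.
    score = sum(alpha * stump_predict(stump, row) for alpha, stump in ensemble)
    return 1 if score >= 0 else -1

# Hypothetical toy data: one feature, labels that no single stump can separate.
X = [[float(i)] for i in range(10)]
y = [1, 1, 1, -1, -1, -1, 1, 1, 1, -1]

one_round = adaboost_fit(X, y, n_rounds=1)
three_rounds = adaboost_fit(X, y, n_rounds=3)
acc1 = sum(adaboost_predict(one_round, r) == t for r, t in zip(X, y)) / len(y)
acc3 = sum(adaboost_predict(three_rounds, r) == t for r, t in zip(X, y)) / len(y)
print(acc1, acc3)
```

On this toy data a single stump classifies 7 of the 10 points correctly, while three boosted stumps classify all 10, which shows how the weighted vote of weak learners can outperform any one of them.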

Forward Vehicle Tracking Based on Weighted Multiple Instance Learning Equipped with Particle Filter

  • 박근호;이준환
    • 한국지능시스템학회논문지 / Vol. 25, No. 4 / pp. 377-385 / 2015
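  • This paper proposes a forward vehicle tracking algorithm that uses WMIL (Weighted Multiple Instance Learning) equipped with a particle filter. In the proposed algorithm, images are represented with Haar-like features, and the vehicle recognition result is used to locate the forward vehicle to be tracked. To combine WMIL with the particle filter, instead of computing a tracked-object confidence map over image patches in the search region, as in conventional appearance-model tracking, the method performs the particle filter steps of propagation, observation, estimation, and selection, followed by classifier training, sequentially at every frame to update the object's new location. Experiments confirmed that, compared with tracking based on AdaBoost, MIL (Multiple Instance Learning), or WMIL, the proposed method unavoidably increases computation because of the particle filter but greatly improves tracking accuracy: on test videos of national roads, highways, tunnels, and city streets, the average position error of the tracked target was about 4.5 pixels.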

Intelligent System for the Prediction of Heart Diseases Using Machine Learning Algorithms with a New Mixed Feature Creation (MFC) technique

  • Rawia Elarabi;Abdelrahman Elsharif Karrar;Murtada El-mukashfi El-taher
    • International Journal of Computer Science & Network Security / Vol. 23, No. 5 / pp. 148-162 / 2023
  • Classification systems can significantly assist the medical sector by allowing for the precise and quick diagnosis of diseases, saving time for both doctors and patients. One possible way of identifying risk variables is to use machine learning algorithms. Non-surgical technologies, such as machine learning, are trustworthy and effective in distinguishing healthy patients from heart-disease patients, and they save time and effort. The goal of this study is to create a medical intelligent decision support system based on machine learning for the diagnosis of heart disease. We used a mixed feature creation (MFC) technique to generate new features from the UCI Cleveland Cardiology dataset. We select the most suitable features using the Least Absolute Shrinkage and Selection Operator (LASSO), Recursive Feature Elimination with Random Forest feature selection (RFE-RF), and the best features of both LASSO and RFE-RF (BLR). Cross-validation and grid search are used to optimize the parameters of the estimators used in applying these algorithms. Classifier performance metrics, including classification accuracy, specificity, sensitivity, precision, and F1-score, are reported for each classification model along with execution time and RMSE, and the results are presented independently for comparison. Our proposed work finds the best potential outcome across all available prediction models and improves the system's performance, allowing physicians to diagnose heart patients more accurately.
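
The assessment metrics named in this abstract (accuracy, specificity, sensitivity, precision, and F1-score) can all be derived from the four counts of a binary confusion matrix. The sketch below shows one way to compute them in plain Python; the function name `binary_metrics`, the label coding (1 = disease, 0 = healthy), and the example labels are hypothetical and are not taken from the paper.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Compute common binary classification metrics from two label lists."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)  # true positives
    tn = sum(1 for t, p in pairs if t != positive and p != positive)  # true negatives
    fp = sum(1 for t, p in pairs if t != positive and p == positive)  # false positives
    fn = sum(1 for t, p in pairs if t == positive and p != positive)  # false negatives

    def ratio(num, den):
        # Return 0.0 instead of dividing by zero when a denominator is empty.
        return num / den if den else 0.0

    precision = ratio(tp, tp + fp)
    sensitivity = ratio(tp, tp + fn)  # also called recall
    return {
        "accuracy": ratio(tp + tn, len(pairs)),
        "sensitivity": sensitivity,
        "specificity": ratio(tn, tn + fp),
        "precision": precision,
        "f1": ratio(2 * precision * sensitivity, precision + sensitivity),
    }

# Hypothetical labels: 1 = disease, 0 = healthy.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]
metrics = binary_metrics(y_true, y_pred)
print(metrics)
```

Reporting sensitivity and specificity separately matters in a diagnostic setting because the two kinds of error (a missed patient versus a false alarm) have different costs, which a single accuracy figure hides.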

Prediction of Water Usage in Pig Farm based on Machine Learning

  • 이웅섭;류종열;반태원;김성환;최희철
    • 한국정보통신학회논문지 / Vol. 21, No. 8 / pp. 1560-1566 / 2017
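  • The recent spread of smart pig farms equipped with Internet of Things sensors has made it possible to accumulate pig-farm big data, and various machine learning methods are being applied to the collected data to improve the productivity of livestock farms. In this study, various machine learning methods were used to predict water usage, one of the most important factors in pig farm management. Specifically, regression methods (linear regression, regression tree, and AdaBoost regression) and classification methods (logistic classification, decision tree, and support vector machine (SVM)) were applied to data collected from an actual pig farm to predict water usage based on the farm's temperature and humidity. Performance analysis confirmed that the proposed methods predict water usage with high accuracy. The proposed methods can be used to detect abnormalities in a pig farm's water supply facilities early, preventing livestock deaths and raising farm productivity.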

Effect of Application of Ensemble Method on Machine Learning with Insufficient Training Set in Developing Automated English Essay Scoring System

  • 이경호;이공주
    • 정보과학회 논문지 / Vol. 42, No. 9 / pp. 1124-1132 / 2015
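  • In general, a supervised learning algorithm needs a sufficient amount of training data without label bias in order to be trained properly. However, collecting sufficient and unbiased training data for developing an automated English essay scoring system is difficult. Moreover, English writing is assessed from multiple perspectives on the overall level of the answer. Because learning models for several assessment areas must be built from small and easily biased training data, it is hard to decide on a suitable machine learning algorithm. This paper shows experimentally that ensemble learning can mitigate these problems. Scoring results for short English compositions written by actual middle and high school students were used in experiments that varied the number of training examples and their bias. Across changes in the number of training examples and in bias, an ensemble technique that combines the results of applying the AdaBoost algorithm by voting showed better overall performance than the other algorithms.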

Classifying Social Media Users' Stance: Exploring Diverse Feature Sets Using Machine Learning Algorithms

  • Kashif Ayyub;Muhammad Wasif Nisar;Ehsan Ullah Munir;Muhammad Ramzan
    • International Journal of Computer Science & Network Security / Vol. 24, No. 2 / pp. 79-88 / 2024
  • The use of social media has become part of daily life. Social web channels let their users generate content and share their views, opinions, and experiences on particular topics, and researchers are using this social media content in various research areas. Sentiment analysis, one of the most active research areas of the last decade, is the process of extracting people's reviews, opinions, and sentiments. It is applied in diverse sub-areas such as subjectivity analysis, polarity detection, and emotion detection. Stance classification has emerged as a new and interesting research area; it aims to determine whether the content writer is in favor of, against, or neutral towards the target topic or issue. Stance classification is significant because it has many research applications, such as rumor stance classification, stance classification in public forums, claim stance classification, neural attention stance classification, online debate stance classification, and dialogic properties stance classification. This study explores different feature sets (lexical, sentiment-specific, and dialog-based) extracted from standard datasets in the relevant area. Supervised learning approaches were applied, namely generative algorithms such as Naïve Bayes and discriminative machine learning algorithms such as Support Vector Machine, Decision Tree, and k-Nearest Neighbor, followed by ensemble-based algorithms such as Random Forest and AdaBoost. The empirical results were evaluated using the standard performance measures of Accuracy, Precision, Recall, and F-measure.