• Title/Abstract/Keywords: random forests regression

35 search results; processing time 0.025 seconds

Covariance-based Recognition Using Machine Learning Model

  • Osman, Hassab Elgawi
    • The Korean Institute of Broadcast and Media Engineers: Conference Proceedings / Korean Society of Broadcast Engineers, IWAIT 2009 / pp.223-228 / 2009
  • We propose an on-line machine learning approach for object recognition, where new images are continuously added and the recognition decision is made without delay. The random forest (RF) classifier has been extensively used as a generative model for classification and regression applications. We extend this technique to the task of building an incremental component-based detector. First, we employ an object descriptor model based on a bag of covariance matrices to represent an object region, then run our on-line RF learner to select object descriptors and to learn an object classifier. Object recognition experiments are provided to verify the effectiveness of the proposed approach. Results demonstrate that the proposed model yields object recognition performance comparable to the benchmark standard RF, AdaBoost, and SVM classifiers.

Predicting the Young's modulus of frozen sand using machine learning approaches: State-of-the-art review

  • Reza Sarkhani Benemaran;Mahzad Esmaeili-Falak
    • Geomechanics and Engineering / Vol. 34, No. 5 / pp.507-527 / 2023
  • Accurate estimation of the geo-mechanical parameters in Artificial Ground Freezing (AGF) is an important scientific topic in soil improvement and geotechnical engineering. One way to do this is to use classical and conventional constitutive models based on different theories such as critical state theory, Hooke's law, and so on, which are time-consuming, costly, and troublesome. Another is the application of artificial intelligence (AI) techniques to predict the parameters and behaviors of interest accurately. This study presents a comprehensive data-mining-based model for predicting the Young's modulus of frozen sand under the triaxial test. To this end, several single and hybrid models were considered, including additive regression, bagging, M5-Rules, M5P, random forests (RF), support vector regression (SVR), locally weighted linear (LWL), Gaussian process regression (GPR), and multi-layered perceptron neural network (MLP). In the present study, cell pressure, strain rate, temperature, time, and strain were considered as the input variables, and the Young's modulus was the target. The results showed that all selected single and hybrid predictive models agree acceptably with the measured experimental results. In particular, the hybrid Additive Regression-Gaussian Process Regression and Bagging-Gaussian Process Regression models have the best accuracy based on the model performance assessment criteria.

Prediction of golf scores on the PGA tour using statistical models

  • 임정은;임영인;송종우
    • The Korean Journal of Applied Statistics / Vol. 30, No. 1 / pp.41-55 / 2017
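  • Golf has recently become established as a hobby for many people, and a variety of golf-related research is being conducted. In this study, we use data mining techniques to predict the average scores of players on the PGA Tour and to identify the variables that significantly affect scores. We additionally aim to predict the top 10 and top 25 players in four PGA Tour playoff events. We predict average scores using a range of linear and nonlinear regression methods: stepwise selection, all possible regressions, LASSO, ridge regression, and principal component regression as linear methods, and trees (CART), bagging, gradient boosting, neural networks, random forests, and K-nearest neighbors (KNN) as nonlinear methods. Among the variables selected in common by most models, higher fairway firmness, green grass height, and average maximum wind speed were associated with higher average scores. Conversely, the number of one-putts, the scrambling variables (scoring a birdie or eagle after missing the green in regulation), and longest drive, which reflects the ability to hit the ball far, tended to lower the average score as their values increased. All 11 models showed low error rates in predicting the 2015 results used as test data, but bagging and random forests predicted best, and both achieved considerably high hit rates when predicting the top-10 and top-25 rankings.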

Predicting the Performance of Forecasting Strategies for Naval Spare Parts Demand: A Machine Learning Approach

  • Moon, Seongmin
    • Management Science and Financial Engineering / Vol. 19, No. 1 / pp.1-10 / 2013
  • Hierarchical forecasting strategy does not always outperform direct forecasting strategy. The performance generally depends on demand features. This research guides the use of the alternative forecasting strategies according to demand features. This paper developed and evaluated various classification models such as logistic regression (LR), artificial neural networks (ANN), decision trees (DT), boosted trees (BT), and random forests (RF) for predicting the relative performance of the alternative forecasting strategies for the South Korean navy's spare parts demand which has non-normal characteristics. ANN minimized classification errors and inventory costs, whereas LR minimized the Brier scores and the sum of forecasting errors.

Applications of Machine Learning for Online Learning Systems towards Children with Speech Disorders

  • Jadi, Amr;Alzahrani, Ali
    • International Journal of Computer Science & Network Security / Vol. 22, No. 8 / pp.55-60 / 2022
  • Specific Language Impairment is one of the serious disorders that interfere with spontaneous communication skills in children. Children suffering from this disorder may have reading, speaking, or listening impairments, and such disorders are also termed Autism Speech Disorder (ASD) in medical terminology. The aim of the article is to define specific language impairment in children and the problems it can cause. It also covers the different methods adopted by speech pathologists to diagnose language impairment. Finally, machine learning models are implemented to automate the process and help speech pathologists and pediatricians in diagnosing specific language impairment.

Emerging Machine Learning in Wearable Healthcare Sensors

  • Gandha Satria Adi;Inkyu Park
    • Journal of Sensor Science and Technology / Vol. 32, No. 6 / pp.378-385 / 2023
  • Human biosignals provide essential information for diagnosing diseases such as dementia and Parkinson's disease. Owing to the shortcomings of current clinical assessments, noninvasive solutions are required. Machine learning (ML) on wearable sensor data is a promising method for the real-time monitoring and early detection of abnormalities. ML facilitates disease identification, severity measurement, and remote rehabilitation by providing continuous feedback. In the context of wearable sensor technology, ML involves training on observed data for tasks such as classification and regression with applications in clinical metrics. Although supervised ML presents challenges in clinical settings, unsupervised learning, which focuses on tasks such as cluster identification and anomaly detection, has emerged as a useful alternative. This review examines and discusses a variety of ML algorithms such as Support Vector Machines (SVM), Random Forests (RF), Decision Trees (DT), Neural Networks (NN), and Deep Learning for the analysis of complex clinical data.

A Best Effort Classification Model For Sars-Cov-2 Carriers Using Random Forest

  • Mallick, Shrabani;Verma, Ashish Kumar;Kushwaha, Dharmender Singh
    • International Journal of Computer Science & Network Security / Vol. 21, No. 1 / pp.27-33 / 2021
  • The whole world is now dealing with Coronavirus, and it has turned out to be one of the most widespread and long-lived pandemics of our times. Reports reveal that the infectious disease has taken a toll on almost 80% of the world's population. Amidst a lot of ongoing research on predicting growth and transmission through symptomatic carriers of the virus, it cannot be ignored that pre-symptomatic and asymptomatic carriers also play a crucial role in spreading the virus. Classification algorithms, ranging from simple feature-based classification to Convolutional Neural Networks (CNNs), have been widely used to classify different types of COVID-19 carriers. This research paper aims to present a novel technique using a Random Forest machine learning algorithm with hyper-parameter tuning to classify different types of COVID-19 carriers, such that these carriers can be accurately characterized and hence dealt with in a timely manner to contain the spread of the virus. The main idea for selecting Random Forest is that it works on the powerful concept of "the wisdom of the crowd", which produces an ensemble prediction. The results are quite convincing, and the model records an accuracy score of 99.72%. The results have been compared with the same dataset subjected to K-Nearest Neighbour, logistic regression, support vector machine (SVM), and Decision Tree algorithms, where the accuracy scores were recorded as 78.58%, 70.11%, 70.385%, and 99% respectively, thus establishing the concreteness and suitability of our approach.

Use of GIS to Develop a Multivariate Habitat Model for the Leopard Cat (Prionailurus bengalensis) in Mountainous Region of Korea

  • Rho, Paik-Ho
    • Journal of Ecology and Environment / Vol. 32, No. 4 / pp.229-236 / 2009
  • A habitat model was developed to delineate potential habitat of the leopard cat (Prionailurus bengalensis) in a mountainous region of Kangwon Province, Korea. Between 1997 and 2005, 224 leopard cat presence sites were recorded in the province in the Nationwide Survey on Natural Environments. Fifty percent of the sites were used to develop a habitat model, and the remaining sites were used to test the model. Fourteen environmental variables related to topographic features, water resources, vegetation and human disturbance were quantified for 112 of the leopard cat presence sites and an equal number of randomly selected sites. Statistical analyses (e.g., t-tests, and Pearson correlation analysis) showed that elevation, ridges, plains, % water cover, distance to water source, vegetated area, deciduous forest, coniferous forest, and distance to paved road differed significantly (P < 0.01) between presence and random sites. Stepwise logistic regression was used to develop a habitat model. Landform type (e.g., ridges vs. plains) is the major topographic factor affecting leopard cat presence. The species also appears to prefer deciduous forests and areas far from paved roads. The habitat map derived from the model correctly classified 93.75% of data from an independent sample of leopard cat presence sites, and the map at a regional scale showed that the cat's habitats are highly fragmented. Protection and restoration of connectivity of critical habitats should be implemented to preserve the leopard cat in mountainous regions of Korea.

Convergence study to predict length of stay in premature infants using machine learning

  • 김촉환;강성홍
    • Journal of Digital Convergence / Vol. 19, No. 7 / pp.271-282 / 2021
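  • This study was conducted to develop a model for predicting the length of stay of premature infants using machine learning techniques. For model development, we used 6,149 cases of premature infants discharged from 2011 to 2016, drawn from the Korea National Hospital Discharge In-depth Injury Survey data collected by the Korea Centers for Disease Control and Prevention. Using early-admission variables, the neural network model had an explanatory power (R²) of 0.75, superior to the other models. When clinical diagnoses converted with CCS (Clinical Classification Software) were added to the early-admission variables, the Cubist model achieved an R² of 0.81, outperforming the random forest, gradient boosting, neural network, and penalized regression models. This study presents a machine-learning model for predicting premature infants' length of stay using nationwide data and confirms its applicability. However, because of data limitations such as the lack of clinical and parental information, further research is needed to improve performance.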

Exploring the Predictive Variables of Government Statistical Indicators on Retail Sales Using Machine Learning: Focusing on Pharmacy

  • 이광수
    • Journal of Internet Computing and Services / Vol. 23, No. 3 / pp.125-135 / 2022
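  • This study uses machine learning to explore whether government statistical indicators, built to foster an industrial ecosystem based on data, networks, and artificial intelligence, affect pharmacy sales, and aims to provide an analysis technique suited to predicting pharmacy sales. To this end, using data from January 2016 to December 2021 on 28 government statistical indicators and on pharmacies (a retail category), we explored predictive variables and performance with the machine learning methods random forest, XGBoost, LightGBM, and CatBoost. The analysis showed that the business-cycle indicators Economic Sentiment Index, cyclical component of the coincident composite index, and Consumer Sentiment Index were important variables affecting pharmacy sales. In terms of regression performance, measured by MAE, MSE, and RMSE, random forest outperformed XGBoost, LightGBM, and CatBoost. Based on these results, the study identifies the variables affecting pharmacy sales and the best-performing machine learning method, and offers several implications and directions for follow-up research.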
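
Several of the abstracts listed here compare regression models by MAE, MSE, and RMSE. As a reading aid, the following is a minimal sketch of how these three error measures are conventionally computed. The function name `regression_errors` and the sample values are hypothetical, chosen for illustration only; they are not taken from any of the listed studies.

```python
import math

def regression_errors(y_true, y_pred):
    """Return (MAE, MSE, RMSE) for two equal-length sequences of numbers."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    # Residual = observed value minus predicted value.
    residuals = [t - p for t, p in zip(y_true, y_pred)]
    mae = sum(abs(r) for r in residuals) / len(residuals)   # mean absolute error
    mse = sum(r * r for r in residuals) / len(residuals)    # mean squared error
    return mae, mse, math.sqrt(mse)                         # RMSE = sqrt(MSE)

# Hypothetical observed and predicted values (illustration only).
mae, mse, rmse = regression_errors([3.0, 5.0, 7.0], [2.0, 5.0, 9.0])
```

Lower values indicate a better fit for all three measures. Because RMSE squares the residuals before averaging, it penalizes large individual errors more heavily than MAE does.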