• Title/Summary/Keyword: Random forests

Search Result 113

Developing a regional fog prediction model using tree-based machine-learning techniques and automated visibility observations (시정계 자료와 기계학습 기법을 이용한 지역 안개예측 모형 개발)

  • Kim, Daeha
    • Journal of Korea Water Resources Association / v.54 no.12 / pp.1255-1263 / 2021
  • Although fog could become an alternative water resource, it can undermine traffic safety and the operational performance of infrastructure. To reduce such adverse impacts, spatially continuous fog risk information is needed. In this work, tree-based machine-learning models were developed to quantify fog risk using routine meteorological observations alone. Extreme Gradient Boosting (XGB), Light Gradient Boosting (LGB), and Random Forests (RF) were chosen for the regional fog models, using operational weather and visibility observations within Jeollabuk-do province. Results showed that RF appeared to be the most robust in distinguishing fog from non-fog situations during the 2017-2019 training and evaluation period. Although LGB predicted fog occurrences better than the other two models, its false alarm ratio was the highest (0.695) of the three. The predictability of all three models declined considerably when they were applied to the independent year 2020, possibly because of the distinctly improved air quality that year under the global lockdown. Nonetheless, even in 2020, all three models produced fog risk information consistent with the spatial variation of observed fog occurrences. This work suggests that tree-based machine-learning models could be used as tools to identify locations with relatively high fog risk.
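The false alarm ratio quoted above is a standard categorical verification score for yes/no forecasts. As a hedged illustration (the helper names are hypothetical and not taken from the paper), the sketch below computes it, together with the probability of detection (POD) and the critical success index (CSI), from a 2x2 table of fog forecasts against observations. Read this way, LGB's FAR of 0.695 means that about 69.5% of its fog forecasts did not verify.

```python
# Hypothetical helpers (not from the paper): categorical verification scores
# for a binary fog / no-fog forecast, built from a 2x2 contingency table.
from dataclasses import dataclass


@dataclass
class Contingency:
    hits: int               # fog forecast and fog observed
    false_alarms: int       # fog forecast, no fog observed
    misses: int             # no fog forecast, fog observed
    correct_negatives: int  # no fog forecast, no fog observed


def count_table(forecast: list[bool], observed: list[bool]) -> Contingency:
    """Tally the four contingency-table cells from paired binary series."""
    hits = false_alarms = misses = correct_negatives = 0
    for f, o in zip(forecast, observed):
        if f and o:
            hits += 1
        elif f:  # forecast fog, none observed
            false_alarms += 1
        elif o:  # fog observed but not forecast
            misses += 1
        else:
            correct_negatives += 1
    return Contingency(hits, false_alarms, misses, correct_negatives)


def probability_of_detection(t: Contingency) -> float:
    """Share of observed fog events that were forecast."""
    return t.hits / (t.hits + t.misses)


def false_alarm_ratio(t: Contingency) -> float:
    """Share of fog forecasts that did not verify."""
    return t.false_alarms / (t.hits + t.false_alarms)


def critical_success_index(t: Contingency) -> float:
    """Hits divided by all events that were forecast or observed (threat score)."""
    return t.hits / (t.hits + t.false_alarms + t.misses)
```

For forecasts `[True, True, False, True, False]` against observations `[True, False, True, False, False]`, the table has 1 hit, 2 false alarms, 1 miss and 1 correct negative, so POD = 0.5, FAR = 2/3 and CSI = 0.25. Forecasting fog more often tends to raise POD at the cost of a higher FAR, which is one way the two can rise together.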

Analysis of facial expression recognition (표정 분류 연구)

  • Son, Nayeong;Cho, Hyunsun;Lee, Sohyun;Song, Jongwoo
    • The Korean Journal of Applied Statistics / v.31 no.5 / pp.539-554 / 2018
  • Effective interaction between users and devices is considered an important capability of IoT devices. Some applications must recognize human facial expressions in real time and judge them accurately in order to respond correctly to the situation. Consequently, many studies on facial image analysis have been conducted to build more accurate and faster recognition systems. In this study, we constructed an automatic facial expression recognition system in two steps: a face recognition step and a classification step. We compared various models on different feature sets: pixel information, landmark coordinates, Euclidean distances between landmark points, and arctangent angles. We obtained a fast and efficient prediction model using only 30 principal components of the facial landmark information. We applied several prediction models, including linear discriminant analysis (LDA), random forests, support vector machine (SVM), and bagging; the SVM model gave the best result. The LDA model gave the second-best prediction accuracy, but it fit and predicted data faster than SVM and the other methods. Finally, we compared our method with the Microsoft Azure Emotion API and a Convolutional Neural Network (CNN), and our method gave highly competitive results.
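The distance and angle features compared above can be derived directly from landmark coordinates. The sketch below assumes 2-D landmarks given as (x, y) pairs and computes, for every pair of points, the Euclidean distance and the arctangent angle via `math.atan2`; the function name is hypothetical and this is not the authors' feature pipeline. For n landmarks there are n(n - 1)/2 pairs, so the feature count grows quadratically, one plausible reason a compact representation such as 30 principal components pays off.

```python
# Hypothetical feature extraction from 2-D facial landmarks: pairwise
# Euclidean distances and arctangent angles between landmark points.
import math
from itertools import combinations

Point = tuple[float, float]


def pairwise_features(landmarks: list[Point]) -> tuple[list[float], list[float]]:
    """Return (distances, angles) for every unordered pair of landmarks.

    Angles come from atan2, so they are in radians in (-pi, pi] and keep the
    quadrant information that a plain arctan(dy / dx) would lose.
    """
    distances: list[float] = []
    angles: list[float] = []
    for (x1, y1), (x2, y2) in combinations(landmarks, 2):
        dx, dy = x2 - x1, y2 - y1
        distances.append(math.hypot(dx, dy))
        angles.append(math.atan2(dy, dx))
    return distances, angles
```

For the three points (0, 0), (3, 4) and (3, 0), the pairs are visited in the order (0-1, 0-2, 1-2), giving distances 5, 3 and 4.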

Prediction of Non-Genotoxic Carcinogenicity Based on Genetic Profiles of Short Term Exposure Assays

  • Perez, Luis Orlando;Gonzalez-Jose, Rolando;Garcia, Pilar Peral
    • Toxicological Research / v.32 no.4 / pp.289-300 / 2016
  • Non-genotoxic carcinogens are substances that induce tumorigenesis through non-mutagenic mechanisms, and long-term rodent bioassays are required to identify them. Recent studies have shown that transcription profiling can be applied to develop early identifiers of long-term phenotypes. In this study, we used rat liver expression profiles from the NTP (National Toxicology Program, Research Triangle Park, USA) DrugMatrix database to construct a gene classifier that can distinguish non-genotoxic carcinogens from other chemicals. The model was based on short-term exposure assays (3 days), and training was limited to oxidative stressors, peroxisome proliferators and hormone modulators. The predictor was validated on independent toxicogenomic data (TG-GATEs, Toxicogenomics Project-Genomics Assisted Toxicity Evaluation System, Osaka, Japan). To build our model, we applied Random Forests together with a recursive variable elimination algorithm (VarSelRF). Gene set enrichment analysis was used for functional interpretation. A total of 770 microarrays covering 96 different compounds were analyzed, and a predictor of 54 genes was built. Prediction accuracy was 0.85 in the training set and 0.87 in the test set, and in the validation set it increased with concentration: 0.6 at low doses, 0.7 at medium doses and 0.81 at high doses. Pathway analysis highlighted genes involved in cellular respiration, energy production and lipoprotein metabolism. The main goal of toxicogenomics is to accurately predict the toxicity of unknown drugs. In this analysis, we presented a classifier that can predict non-genotoxic carcinogenicity from short-term exposure assays. With this approach, dose level is critical when evaluating chemicals at early time points.
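VarSelRF-style selection is a backward elimination driven by random forest importances. The Python sketch below is a simplified, hypothetical rendition of that idea, not the R package's code or the authors' pipeline: `fit_and_score` stands in for refitting a forest and returning its out-of-bag error and per-variable importances, `drop_frac` is the fraction of variables removed each round, and the fixed `tolerance` is a simpler stand-in for the standard-error-based rule varSelRF uses to choose the final subset.

```python
# Hypothetical sketch of recursive variable elimination: repeatedly drop a
# fraction of the least important variables, record each subset's error, and
# keep the smallest subset whose error is within `tolerance` of the best.
import math
from typing import Callable, Sequence

# fit_and_score(features) -> (error, {feature: importance}); in practice this
# would refit a random forest and report its out-of-bag error.
Scorer = Callable[[Sequence[str]], tuple[float, dict[str, float]]]


def recursive_elimination(
    features: Sequence[str],
    fit_and_score: Scorer,
    drop_frac: float = 0.2,
    min_size: int = 2,
    tolerance: float = 0.0,
) -> list[str]:
    current = list(features)
    history: list[tuple[float, list[str]]] = []
    while True:
        error, importance = fit_and_score(current)
        history.append((error, list(current)))
        if len(current) <= min_size:
            break
        # Drop at least one variable per round, but never go below min_size.
        n_drop = max(1, math.floor(len(current) * drop_frac))
        n_keep = max(min_size, len(current) - n_drop)
        current = sorted(current, key=lambda f: importance[f], reverse=True)[:n_keep]
    best_error = min(err for err, _ in history)
    # Among subsets within `tolerance` of the best error, prefer the smallest.
    eligible = [subset for err, subset in history if err <= best_error + tolerance]
    return min(eligible, key=len)
```

Favoring the smallest near-optimal subset, rather than the single lowest-error one, is what keeps the resulting gene signature compact.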

An Analysis of Non-linear Effects of Impact Factors on Housing Price (주택매매가격 영향요인의 비선형적 효과 분석)

  • Chang, Youngjae
    • Journal of the Korean Data Analysis Society / v.20 no.6 / pp.2953-2966 / 2018
  • Housing prices are closely related to various variables that reflect macroeconomic conditions. In this paper, a data-driven empirical analysis is performed, drawing on variables used in previous studies. Focusing on the policy interest rate among the factors affecting housing prices, we analyze the non-linear impulse responses of other variables to an interest rate shock. Using the random forest algorithm, we calculate variable importance scores for the macroeconomic variables presented in previous studies. After selecting variables through this process, we compute the impulse responses using a model that can capture non-linearity. According to the model, the response of housing prices to the policy rate is significant only when the rate is raised. In particular, the impulse response is amplified as the shock grows larger, owing to non-linear characteristics that cannot be captured by the traditional VAR methodology. These results suggest that the interest rate as a policy instrument should be approached with greater caution.
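Random-forest variable importance can be computed in more than one way; implementations commonly report impurity-based or permutation-based importance, and the abstract does not say which was used. The sketch below shows the permutation variant as a swapped-in, model-agnostic illustration: shuffle one predictor and measure how much the model's mean squared error rises. The function names are hypothetical, and `model` can be any fitted predictor, such as a random forest regressor.

```python
# Hypothetical sketch of permutation variable importance: the average rise in
# mean squared error when one predictor column is randomly shuffled.
import random
from typing import Callable, Sequence

Row = Sequence[float]
Model = Callable[[Row], float]


def mse(model: Model, X: Sequence[Row], y: Sequence[float]) -> float:
    """Mean squared error of `model` on rows X with targets y."""
    return sum((model(x) - t) ** 2 for x, t in zip(X, y)) / len(y)


def permutation_importance(
    model: Model,
    X: Sequence[Row],
    y: Sequence[float],
    n_repeats: int = 5,
    seed: int = 0,
) -> list[float]:
    """Return one importance score per column of X (larger = more important)."""
    rng = random.Random(seed)
    baseline = mse(model, X, y)
    scores: list[float] = []
    for j in range(len(X[0])):
        total_increase = 0.0
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)
            # Rebuild each row with column j replaced by the shuffled values.
            X_perm = [list(row[:j]) + [v] + list(row[j + 1:]) for row, v in zip(X, column)]
            total_increase += mse(model, X_perm, y) - baseline
        scores.append(total_increase / n_repeats)
    return scores
```

Because shuffling only breaks the link between one column and the target, a variable the model ignores scores zero, while influential predictors score high; ranking these scores is one way to shortlist inputs before fitting a non-linear impulse-response model.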

Convergence study to predict length of stay in premature infants using machine learning (머신러닝을 이용한 미숙아의 재원일수 예측 융복합 연구)

  • Kim, Cheok-Hwan;Kang, Sung-Hong
    • Journal of Digital Convergence / v.19 no.7 / pp.271-282 / 2021
  • This study was conducted to develop a model for predicting the length of stay of premature infants using machine learning. To develop the model, we used 6,149 cases of premature infants discharged from hospital between 2011 and 2016, taken from the Hospital Discharge In-depth Injury Survey data collected by the Korea Centers for Disease Control and Prevention. In the initial-hospitalization model, the neural network outperformed the other models, with an explanatory power (R2) of 0.75. In the model that added clinical diagnoses converted with CCS (Clinical Classifications Software), the cubist model achieved an explanatory power (R2) of 0.81, outperforming the random forest, gradient boosting, neural network, and penalized regression models. Using national data, this study presented a machine-learning model for predicting the length of stay of premature infants and confirmed its applicability. However, because clinical and parental information was lacking, further research is needed to improve performance.
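The explanatory power (R2) figures above are coefficients of determination, R² = 1 - SS_res/SS_tot. A minimal Python sketch follows (a hypothetical helper, not the authors' code). On the data it is evaluated on, R² = 0.81 means the model's squared errors total 19% of those obtained by always predicting the mean length of stay.

```python
# Hypothetical helper: coefficient of determination for a regression model.
from typing import Sequence


def r_squared(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """R^2 = 1 - SS_res / SS_tot, comparing predictions with the mean baseline."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("need two non-empty sequences of equal length")
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))  # model error
    ss_tot = sum((t - mean) ** 2 for t in y_true)               # baseline error
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all targets are equal")
    return 1.0 - ss_res / ss_tot
```

Perfect predictions give 1.0, always predicting the mean gives 0.0, and a model worse than the mean baseline gives a negative value.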

Financial Fraud Detection using Data Mining: A Survey

  • Sudhansu Ranjan Lenka;Bikram Kesari Ratha
    • International Journal of Computer Science & Network Security / v.24 no.9 / pp.169-185 / 2024
  • Owing to the rapid growth of e-commerce, most organizations are moving toward cashless transactions. Unfortunately, cashless transactions are used not only by legitimate users but also by illegitimate ones, resulting in losses of billions of dollars worldwide each year. Fraud prevention and fraud detection are the two methods financial institutions use to protect against such fraud. Fraud prevention systems (FPSs) alone are not sufficient to fully secure e-commerce systems, but Fraud Detection Systems (FDSs) combined with FPSs may protect against fraud. However, many issues and challenges still degrade the performance of FDSs, such as overlapping data, noisy data, and misclassification of data. This paper presents a comprehensive survey of financial fraud detection systems that use data mining techniques. Over seventy research papers, mainly from the period 2002-2015, were reviewed and analyzed. The data mining approaches covered include Neural Network, Logistic Regression, Bayesian Belief Network, Support Vector Machine (SVM), Self-Organizing Map (SOM), K-Nearest Neighbor (K-NN), Random Forest and Genetic Algorithm. The algorithms that achieved high success rates in detecting credit card fraud are Logistic Regression (99.2%), SVM (99.6%) and Random Forests (99.6%). However, the most suitable approach is SOM, because it achieved a perfect accuracy of 100%. By contrast, the algorithms applied to financial statement fraud varied widely in accuracy, from 71.4% for CDA to 98.1% for a probabilistic neural network. In this paper, we identify the research gap and report the performance achieved by different algorithms in terms of accuracy, sensitivity and specificity. Some of the key issues and challenges associated with FDSs are also identified.
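The survey compares algorithms by accuracy, sensitivity and specificity. The hypothetical helper below computes all three from a binary confusion matrix, treating fraud as the positive class. With heavily imbalanced data, accuracy alone can look excellent even for a useless detector, which is why sensitivity and specificity are worth reporting alongside it.

```python
# Hypothetical illustration: accuracy, sensitivity and specificity from a
# binary confusion matrix, where "positive" means a fraudulent transaction.
from typing import NamedTuple


class Rates(NamedTuple):
    accuracy: float
    sensitivity: float  # true positive rate: share of frauds that are flagged
    specificity: float  # true negative rate: share of legitimate ones passed


def rates(tp: int, fn: int, tn: int, fp: int) -> Rates:
    """Compute the three rates from true/false positive and negative counts."""
    total = tp + fn + tn + fp
    return Rates(
        accuracy=(tp + tn) / total,
        sensitivity=tp / (tp + fn),
        specificity=tn / (tn + fp),
    )
```

For a made-up set of 10,000 transactions containing 50 frauds, a detector that never flags anything gives `rates(tp=0, fn=50, tn=9950, fp=0)`: accuracy 0.995, sensitivity 0.0 and specificity 1.0.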

Predicting Soccer Players' Wage Grades Using Big Data and Artificial Intelligence (빅데이터 및 인공지능을 활용한 축구선수 연봉등급 예측)

  • Hyeon-Seong Jeong;Jin-hwa Kim;Dae-Won Hyun
    • Journal of Industrial Convergence / v.22 no.8 / pp.19-28 / 2024
  • This study proposes a new method for predicting the wage grades of soccer players using big data and artificial intelligence. Predicting soccer players' salaries is a crucial task: it involves accurately assessing players' performance and potential and reflecting them in their salaries, which enhances the economic efficiency of the soccer industry. This research analyzes player ability data provided by FIFA 22 and employs various big data and artificial intelligence techniques to predict players' salary grades. The main methods were decision trees, artificial neural networks, random forests, and boosting, and the accuracy of the resulting salary prediction models was compared. The random forest and boosting methods showed the highest prediction accuracy. This study demonstrates the process and utility of using big data and artificial intelligence technologies to predict soccer players' salary grades, offering a new perspective on the soccer industry.
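Random forests predict a class by aggregating many trees. The hedged sketch below shows only that aggregation step: each tree would be fit on a bootstrap resample of the players, and the forest's predicted wage grade is the majority vote of the trees. Tree fitting and the per-split feature subsampling that distinguishes random forests from plain bagging are omitted, and the `trees` passed in are hypothetical callables.

```python
# Hypothetical sketch of random-forest aggregation: bootstrap resampling of
# training rows and a majority vote over per-tree class predictions.
import random
from collections import Counter
from typing import Callable, Hashable, Sequence


def bootstrap_indices(n_rows: int, rng: random.Random) -> list[int]:
    """Draw n_rows row indices with replacement (one bootstrap resample)."""
    return [rng.randrange(n_rows) for _ in range(n_rows)]


def forest_predict(
    trees: Sequence[Callable[[Sequence[float]], Hashable]],
    row: Sequence[float],
) -> Hashable:
    """Majority vote over per-tree predictions.

    Counter.most_common orders equal counts by first occurrence, so ties go
    to the label predicted by the earliest tree in `trees`.
    """
    votes = Counter(tree(row) for tree in trees)
    return votes.most_common(1)[0][0]
```

On average a bootstrap resample contains about 63% (1 - 1/e) of the distinct rows; the rows left out of a tree's resample are its out-of-bag rows, which can be used to estimate the forest's error without a separate test set.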

Scattering Characteristic from Building Walls with Periodic and Random Surface (규칙적 또는 불규칙적 구조를 가지는 빌딩벽면에서의 전자파 산란 특성)

  • 윤광렬
    • The Journal of Korean Institute of Electromagnetic Engineering and Science / v.15 no.4 / pp.428-435 / 2004
  • With the rapid and widespread use of cellular telephones, much attention has been focused on propagation in urban areas crowded with buildings and houses, which are often surrounded by hills, forests, and mountains. Surface scattering interference on rough surfaces between transmitters and receivers has therefore attracted interest and investigation, and a prediction method is needed to estimate the influence of rough surfaces on microwave radio propagation. Moreover, most mobile communications are based on digital rather than analog communication systems. In this case, more careful attention must be paid to the signal delay caused by phase delay due to multipath propagation. In this paper, we numerically analyze the scattering of electromagnetic waves from building walls using the FVTD (Finite Volume Time Domain) method. We consider three types of rough surfaces: periodic, random, and composite structures. We calculate the bistatic normalized radar cross section (NRCS) for horizontal and vertical polarization, and we take into account the conventional optical reflection, which corresponds to the n-th Bragg reflection for periodic structures. In addition, we investigate the conditions under which higher-order Bragg reflections from periodic structures can be ignored.
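The n-th order Bragg reflections mentioned above follow from the standard grating relation for a one-dimensional periodic surface of period Λ illuminated at wavelength λ and incidence angle θ_i: sin θ_n = sin θ_i + nλ/Λ, where an order propagates only if |sin θ_n| ≤ 1. This is textbook diffraction theory rather than the paper's FVTD computation, and the sign convention for angles is an assumption. The Python sketch lists the propagating orders; when Λ < λ/(1 + |sin θ_i|), only the specular order n = 0 survives, which is one simple criterion for when higher-order Bragg reflections can be ignored.

```python
# Textbook grating relation for a 1-D periodic surface (not the paper's FVTD
# solver): sin(theta_n) = sin(theta_i) + n * wavelength / period.
import math


def propagating_bragg_orders(
    wavelength: float, period: float, incidence_deg: float, max_order: int = 20
) -> dict[int, float]:
    """Map each propagating order n to its scattering angle in degrees.

    Orders with |sin(theta_n)| > 1 are evanescent and are left out; orders
    beyond +/- max_order are not checked.
    """
    s_i = math.sin(math.radians(incidence_deg))
    orders: dict[int, float] = {}
    for n in range(-max_order, max_order + 1):
        s_n = s_i + n * wavelength / period
        if abs(s_n) <= 1.0:
            orders[n] = math.degrees(math.asin(s_n))
    return orders


def only_specular(wavelength: float, period: float, incidence_deg: float) -> bool:
    """True when every order except n = 0 is evanescent."""
    s_i = abs(math.sin(math.radians(incidence_deg)))
    return period < wavelength / (1.0 + s_i)
```

With λ = 1 and Λ = 3 at normal incidence, orders -3 to 3 propagate (±3 at grazing angles); with Λ = 0.4 at 30° incidence, only the specular order remains.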

Use of GIS to Develop a Multivariate Habitat Model for the Leopard Cat (Prionailurus bengalensis) in Mountainous Region of Korea

  • Rho, Paik-Ho
    • Journal of Ecology and Environment / v.32 no.4 / pp.229-236 / 2009
  • A habitat model was developed to delineate potential habitat of the leopard cat (Prionailurus bengalensis) in a mountainous region of Kangwon Province, Korea. Between 1997 and 2005, 224 leopard cat presence sites were recorded in the province during the Nationwide Survey on Natural Environments. Half of the sites were used to develop the habitat model, and the remaining sites were used to test it. Fourteen environmental variables related to topographic features, water resources, vegetation and human disturbance were quantified for 112 of the leopard cat presence sites and an equal number of randomly selected sites. Statistical analyses (e.g., t-tests and Pearson correlation analysis) showed that elevation, ridges, plains, % water cover, distance to water source, vegetated area, deciduous forest, coniferous forest, and distance to paved road differed significantly (P < 0.01) between presence and random sites. Stepwise logistic regression was used to develop the habitat model. Landform type (e.g., ridges vs. plains) is the major topographic factor affecting leopard cat presence. The species also appears to prefer deciduous forests and areas far from paved roads. The habitat map derived from the model correctly classified 93.75% of the data from an independent sample of leopard cat presence sites, and the regional-scale map showed that the cat's habitats are highly fragmented. Critical habitats should be protected and their connectivity restored in order to preserve the leopard cat in the mountainous regions of Korea.
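The habitat model above is a stepwise logistic regression, so each site's predicted probability of presence takes the logistic form p = 1/(1 + exp(-(β0 + Σ βi xi))). The sketch below is hypothetical: the coefficients would come from the fitted model (none are reported here), and the 0.5 cut-off is an assumption. It also shows the arithmetic behind the validation figure: with 224 presence sites split in half, and assuming all 112 held-out sites were scored, 93.75% corresponds to 105 correctly classified sites.

```python
# Hypothetical scoring for a logistic-regression habitat model; the
# coefficients are placeholders, not values reported in the study.
import math
from typing import Sequence


def habitat_probability(x: Sequence[float], coefs: Sequence[float], intercept: float) -> float:
    """Logistic model: p = 1 / (1 + exp(-(b0 + sum_i b_i * x_i)))."""
    z = intercept + sum(b * v for b, v in zip(coefs, x))
    # Evaluate the logistic function in a form that cannot overflow exp().
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def presence_hit_rate(probabilities: Sequence[float], threshold: float = 0.5) -> float:
    """Share of known presence sites that the model classifies as habitat."""
    hits = sum(1 for p in probabilities if p >= threshold)
    return hits / len(probabilities)
```

For example, 105 held-out presence sites scored above the cut-off and 7 below give a hit rate of 105/112 = 0.9375.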

Design of comprehensive mechanical properties by machine learning and high-throughput optimization algorithm in RAFM steels

  • Wang, Chenchong;Shen, Chunguang;Huo, Xiaojie;Zhang, Chi;Xu, Wei
    • Nuclear Engineering and Technology / v.52 no.5 / pp.1008-1012 / 2020
  • To rationally design RAFM steels with improved comprehensive mechanical properties, a design system combining machine learning with a high-throughput optimization algorithm was established. As the basis of the design system, a dataset of RAFM steels was compiled from previous literature. Random forest regressors, guided by feature engineering, were then trained on the dataset, and the NSGA-II algorithm was used to select optimal solutions from a large-scale solution set spanning nine composition features and two treatment processing features. The optimal solutions selected by this design system showed promising mechanical properties, consistent with physical metallurgy theory. This efficient design approach could inform the design of other structural metallic materials that must meet multiple property requirements.
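NSGA-II ranks candidate solutions by non-dominated sorting, with a crowding-distance tie-breaker, before selecting survivors. The sketch below shows only the Pareto-dominance part, in a naive form rather than NSGA-II's fast non-dominated sort and without crowding distance or genetic operators. Each candidate is a tuple of hypothetical predicted properties, all to be maximized, such as the outputs of random forest regressors for one alloy composition and treatment.

```python
# Hypothetical sketch of Pareto dominance and (naive) non-dominated sorting
# for candidates whose objectives are all to be maximized.
from typing import Sequence

Candidate = tuple[float, ...]


def dominates(a: Candidate, b: Candidate) -> bool:
    """True if a is no worse than b in every objective and better in one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_front(points: Sequence[Candidate]) -> list[Candidate]:
    """Candidates not dominated by any other candidate."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


def non_dominated_fronts(points: Sequence[Candidate]) -> list[list[Candidate]]:
    """Peel off successive Pareto fronts; front 0 is the best trade-off set."""
    remaining = list(points)
    fronts: list[list[Candidate]] = []
    while remaining:
        front = pareto_front(remaining)
        fronts.append(front)
        remaining = [p for p in remaining if p not in front]
    return fronts
```

The first front contains the trade-off designs in which no property can be improved without worsening another, which is the kind of "optimal solution" set a multi-property design search returns.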