• Title/Summary/Keyword: RandomForest

Taxation Analysis Using Machine Learning (머신러닝을 이용한 세금 계정과목 분류)

  • Choi, Dong-Bin;Jo, In-su;Park, Yong B.
    • Journal of the Semiconductor & Display Technology / v.18 no.2 / pp.73-77 / 2019
  • Data mining techniques can also be used to increase production efficiency in the tax sector, which requires professional skills. As tax-related work was computerized, large amounts of data accumulated, creating a good environment for data mining. In this paper, we develop a system that assists tax accountants, who already have professional expertise, by applying data mining techniques to accumulated tax-related data. The data mining technique used is random forest, improved by using the F1-score. The implemented system was trained on data accumulated over two years and showed high prediction accuracy.

Development of Galaxy Image Classification Based on Hand-crafted Features and Machine Learning (Hand-crafted 특징 및 머신 러닝 기반의 은하 이미지 분류 기법 개발)

  • Oh, Yoonju;Jung, Heechul
    • IEMEK Journal of Embedded Systems and Applications / v.16 no.1 / pp.17-27 / 2021
  • In this paper, we develop a galaxy image classification method based on hand-crafted features and machine learning techniques. Additionally, we provide an empirical analysis to reveal which combination of techniques is effective for galaxy image classification. To achieve this, we developed a framework consisting of four modules: preprocessing, feature extraction, feature post-processing, and classification. We found that the best technique for galaxy image classification combines a median filter, ORB vector features, and a voting classifier based on RBF SVM, random forest, and logistic regression. The final method is efficient, so we believe it is applicable to embedded environments.

Prediction of Energy Harvesting Efficiency of an Inverted Flag Using Machine Learning Algorithms (머신 러닝 알고리즘을 이용한 역방향 깃발의 에너지 하베스팅 효율 예측)

  • Lim, Sehwan;Park, Sung Goon
    • Journal of the Korean Society of Visualization / v.19 no.3 / pp.31-38 / 2021
  • An energy harvesting system using an inverted flag is analyzed using an immersed boundary method to account for the fluid-solid interaction. The inverted flag flutters at a lower critical velocity than a conventional flag. The fluttering motion is classified into straight, symmetric, asymmetric, biased, and over flapping modes. The optimal energy harvesting efficiency is observed in the biased flapping mode. Using three machine learning algorithms (artificial neural network, random forest, and support vector regression), the energy harvesting efficiency is predicted with bending rigidity, inclination angle, and flapping frequency as input variables. The R² values of the artificial neural network and random forest algorithms are observed to be greater than 0.9.
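For reference, the R² figure quoted in this abstract is the coefficient of determination, which compares the residual error of a model against the variance of the observed values. The following is a minimal sketch of that calculation; the function name `r2_score` and the sample values are assumptions chosen for illustration and are not taken from the paper.

```python
def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    # Residual sum of squares: error left over after prediction.
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    # Total sum of squares: spread of the observations around their mean.
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical efficiency values, for illustration only (not from the paper).
observed = [0.10, 0.20, 0.30, 0.40]
predicted = [0.12, 0.18, 0.33, 0.39]
score = r2_score(observed, predicted)
```

A value close to 1 means the predictions explain most of the variance in the observed efficiency.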

Development of the Machine Learning-based Employment Prediction Model for Internship Applicants (인턴십 지원자를 위한 기계학습기반 취업예측 모델 개발)

  • Kim, Hyun Soo;Kim, Sunho;Kim, Do Hyun
    • Journal of the Semiconductor & Display Technology / v.21 no.2 / pp.138-143 / 2022
  • The employment prediction model proposed in this paper uses 16 independent variables, including the self-introductions of M University students who applied for IPP and work-study internships, and three dependent-variable categories: large companies, mid-sized companies, and unemployment. The employment prediction model for large companies was developed using Random Forest and Word2Vec, achieving an F1_Weighted score of 82.4%. The employment prediction model for medium-sized companies and above was developed using Logistic Regression and Word2Vec, achieving an F1_Weighted score of 73.24%. These two models can be actively used to predict employment in large and medium-sized companies for M University students in the future.
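The F1_Weighted score mentioned in this abstract is usually understood as the per-class F1 score averaged with weights proportional to each class's share of the true labels. The sketch below assumes that definition; the function name `f1_weighted`, the label names, and the sample predictions are hypothetical and do not come from the paper.

```python
from collections import Counter


def f1_weighted(y_true, y_pred):
    """Average of per-class F1 scores, weighted by class support."""
    support = Counter(y_true)  # number of true samples per class
    total = len(y_true)
    score = 0.0
    for label, n in support.items():
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = n - tp
        denom = 2 * tp + fp + fn
        f1 = 2 * tp / denom if denom else 0.0
        score += (n / total) * f1
    return score


# Hypothetical outcomes, for illustration only (not from the paper).
y_true = ["large", "large", "mid", "none", "none", "none"]
y_pred = ["large", "mid", "mid", "none", "none", "large"]
result = f1_weighted(y_true, y_pred)
```

Weighting by support keeps a rare class from dominating the average, which matters when the outcome categories are imbalanced.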

A Study On User Skin Color-Based Foundation Color Recommendation Method Using Deep Learning (딥러닝을 이용한 사용자 피부색 기반 파운데이션 색상 추천 기법 연구)

  • Jeong, Minuk;Kim, Hyeonji;Gwak, Chaewon;Oh, Yoosoo
    • Journal of Korea Multimedia Society / v.25 no.9 / pp.1367-1374 / 2022
  • In this paper, we propose an automatic cosmetic foundation recommendation system that suggests a suitable foundation product based on the user's skin color. The proposed system receives and preprocesses user images and detects skin color with OpenCV and machine learning algorithms. The system then compares the performance of models trained with XGBoost, Gradient Boost, Random Forest, and Adaptive Boost (AdaBoost), based on 550 datasets collected as essential bestsellers in the United States. Based on the comparison results, this paper implements a recommendation system using the highest-performing machine learning model. Experiments show that our system effectively recommends a foundation suited to the user's skin color, with a model accuracy of 98%. Furthermore, our system can reduce the number of trials needed to match a foundation to the user's skin color and save time in selecting foundations.

Machine learning-based regression analysis for estimating Cerchar abrasivity index

  • Kwak, No-Sang;Ko, Tae Young
    • Geomechanics and Engineering / v.29 no.3 / pp.219-228 / 2022
  • The most widely used parameter to represent rock abrasiveness is the Cerchar abrasivity index (CAI). The CAI value can be applied to predict wear in TBM cutters. It has been extensively demonstrated that the CAI is affected significantly by cementation degree, strength, and amount of abrasive minerals, i.e., the quartz content or equivalent quartz content in rocks. The relationship between the properties of rocks and the CAI is investigated in this study. A database comprising 223 observations that includes rock types, uniaxial compressive strengths, Brazilian tensile strengths, equivalent quartz contents, quartz contents, brittleness indices, and CAIs is constructed. A linear model is developed by selecting independent variables while considering multicollinearity after performing multiple regression analyses. Machine learning-based regression methods including support vector regression, regression tree regression, k-nearest neighbors regression, random forest regression, and artificial neural network regression are used in addition to multiple linear regression. The results of the random forest regression model show that it yields the best prediction performance.

An Exploratory Study on the Usage Patterns of Software-based Design Tools in Designers' Ideation and Collaboration Activities

  • Kim, Dongwook;Kim, Sungbum
    • International Journal of Contents / v.17 no.4 / pp.16-34 / 2021
  • The purpose of this study was to explore how designers use software-based design tools for ideation and collaboration (for two cases: with designers and with developers). We conducted logistic regression analysis and random forest analysis. Software-based design tools are more popular among product designers and among designers affiliated with design organizations of 51 to 100 members. We identify the features that influence designers to use design tools for ideation and collaboration, and how these usage patterns are interrelated. The interrelated usage pattern is a key consideration for menu placement and convenience of use. The results imply that design tool features should be reinforced per designer profile and that design management should be consistent with the field of design and the nature of the organization.

Phishing Email Detection Using Machine Learning Techniques

  • Alammar, Meaad;Badawi, Maria Altaib
    • International Journal of Computer Science & Network Security / v.22 no.5 / pp.277-283 / 2022
  • Email phishing has become very prevalent, especially now that most of our dealings have become technical. The victim receives a message that looks as if it was sent from a known party, and the attack is carried out through a fake cookie that includes a phishing program or through links connected to fake websites. In both cases, the goal is to install malicious software on the user's device or direct the user to a fake website. Today it is difficult to deploy robust cybersecurity solutions without relying heavily on machine learning algorithms. This research seeks to detect phishing emails using high-accuracy machine learning techniques. Using the WEKA tool with data preprocessing, we propose a methodology to detect phishing emails. The random forest algorithm outperformed the Naïve Bayes algorithm, with an accuracy of 99.03%.

A Hybrid Learning Model to Detect Morphed Images

  • Kumari, Noble;Mohapatra, AK
    • International Journal of Computer Science & Network Security / v.22 no.6 / pp.364-373 / 2022
  • Image morphing methods make seamless transition changes in an image and mask the meaningful information attached to it. Such changes can be detected by traditional machine learning algorithms and by newly emerging deep learning algorithms. In this research work, we analyze hybrid learning approaches that combine deep learning and machine learning on the public datasets CASIA V1.0, CASIA V2.0, and DVMM to find the most efficient algorithm. In simulations, a CNN (Convolutional Neural Network), a hybrid of CNN with SVM (Support Vector Machine), and a hybrid of CNN with the Random Forest algorithm achieved 96.92%, 95.98%, and 99.18% accuracy, respectively, on the CASIA V2.0 dataset of 9555 images. The accuracy pattern changes on the CASIA V1.0 and DVMM data, which contain 1721 and 1845 images, respectively, where the hybrid of CNN and Random Forest yields the lowest accuracy. This confirms that the best algorithm for detecting image forgery depends on the type of input data. This paper presents the best-suited combination of algorithms for detecting image morphing on different input datasets.

Comparison of tree-based ensemble models for regression

  • Park, Sangho;Kim, Chanmin
    • Communications for Statistical Applications and Methods / v.29 no.5 / pp.561-589 / 2022
  • When multiple classification and regression trees are combined, tree-based ensemble models such as random forest (RF) and Bayesian additive regression trees (BART) are produced. In this study, we compare the model structures and performances of various ensemble models in regression settings. RF learns from bootstrapped samples and selects a splitting variable from predictors gathered at each node. The BART model is specified as a sum of trees and is fitted using the Bayesian backfitting algorithm. Through extensive simulation studies, the strengths and drawbacks of the two methods in the presence of missing data, high-dimensional data, or highly correlated data are investigated. In the presence of missing data, BART performs well in general, whereas RF provides adequate coverage. BART outperforms RF on high-dimensional, highly correlated data. However, in all of the scenarios considered, RF has a shorter computation time. The performance of the two methods is also compared using two real data sets that represent the aforementioned situations, and the same conclusion is reached.
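The two sources of randomness that the abstract attributes to RF (bootstrapped samples and a per-node set of candidate predictors) can be sketched as follows. This is an illustrative outline under common assumptions about random forests, not the authors' implementation; the helper names `bootstrap_sample` and `candidate_features`, the data sizes, and the seed are all hypothetical.

```python
import random


def bootstrap_sample(n_rows, rng):
    """Draw n_rows row indices with replacement (one bootstrap sample)."""
    return [rng.randrange(n_rows) for _ in range(n_rows)]


def candidate_features(n_features, m_try, rng):
    """Pick m_try distinct predictors to consider at a single node."""
    return sorted(rng.sample(range(n_features), m_try))


rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Each tree is grown on its own bootstrap sample of the training rows.
rows = bootstrap_sample(10, rng)

# At each node, only a random subset of predictors competes for the split.
features = candidate_features(6, 2, rng)

# Rows never drawn for this tree are "out of bag" and can serve for validation.
out_of_bag = sorted(set(range(10)) - set(rows))
```

Growing many trees this way and averaging their predictions is what reduces the variance of a single regression tree.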