• Title/Summary/Keyword: Ensemble Approach

Detecting Fake Job Recruitment with a Machine Learning Approach (머신 러닝 접근 방식을 통한 가짜 채용 탐지)

  • Taghiyev Ilkin;Jae Heung Lee
    • Smart Media Journal / v.12 no.2 / pp.36-41 / 2023
  • With the advent of applicant tracking systems, online recruitment has become more popular, and recruitment fraud has become a serious problem. This research aims to develop a reliable model for detecting recruitment fraud in online recruitment environments, in order to reduce financial losses and enhance privacy. The main contribution of this paper is an automated methodology that uses insights from exploratory data analysis to distinguish fraudulent job postings from legitimate ones. Using EMSCAD, a recruitment fraud dataset provided by Kaggle, we trained and evaluated various single-classifier and ensemble-classifier machine learning models, and found that an ensemble classifier, the random forest, performed best, with an accuracy of 98.67% and an F1 score of 0.81.
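
The gap between the reported accuracy and F1 score is typical of imbalanced fraud data, where most postings are legitimate. The following is a minimal sketch of how the two metrics are computed from a confusion matrix. The counts are hypothetical and are not taken from the paper or from EMSCAD.

```python
def classification_metrics(tp, fp, fn, tn):
    """Compute accuracy, precision, recall and F1 from confusion-matrix counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical example: 1000 postings, of which 50 are fraudulent.
metrics = classification_metrics(tp=35, fp=5, fn=15, tn=945)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Because true negatives dominate the total, accuracy stays high even when a noticeable share of fraudulent postings is missed, which is why F1 is reported alongside it.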

A Study on the Work-time Estimation for Block Erections Using Stacking Ensemble Learning (Stacking Ensemble Learning을 활용한 블록 탑재 시수 예측)

  • Kwon, Hyukcheon;Ruy, Wonsun
    • Journal of the Society of Naval Architects of Korea / v.56 no.6 / pp.488-496 / 2019
  • Estimating block erection work time at a dock is an important factor in establishing and managing the overall shipbuilding schedule. A natural approach to predicting the work time is to use existing block erection data. In general, the work time per unit is the product of a coefficient value, a quantity, and a product value. Previously, the work time per unit was determined statistically from unit load data. In this study, however, we estimate the work time per unit from the work-time coefficient values of series ships using machine learning. In machine learning, the outcome depends mainly on how the training data is organized. We therefore use feature engineering to determine which variables should be used as features and to check their influence on the result. To obtain the coefficient value of each block, we apply ensemble learning methods, which are widely used today. Among the many ensemble learning techniques, the final model is built with a stacking ensemble of existing models (Decision Tree, Random Forest, Gradient Boost, Square Loss Gradient Boost, XGBoost), and accuracy is maximized by selecting three candidates among all models. Finally, the results are verified against the predicted total work time for one ship in the same series.
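
The stacking idea described in this abstract can be sketched as follows. This is a simplified illustration only: the two base learners (a mean predictor and a least-squares line) stand in for the tree-based models named in the abstract, the meta-learner is a single blend weight chosen by grid search, and the data are hypothetical.

```python
from statistics import mean


def fit_mean(xs, ys):
    """Base learner 1: always predicts the mean of the training targets."""
    m = mean(ys)
    return lambda x: m


def fit_line(xs, ys):
    """Base learner 2: least-squares line y = a + b * x."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx if sxx else 0.0
    a = my - b * mx
    return lambda x: a + b * x


def out_of_fold(fit, xs, ys, k):
    """Predict every point with a model trained on the other k-1 folds."""
    preds = [0.0] * len(xs)
    for fold in range(k):
        train = [i for i in range(len(xs)) if i % k != fold]
        model = fit([xs[i] for i in train], [ys[i] for i in train])
        for i in range(fold, len(xs), k):
            preds[i] = model(xs[i])
    return preds


def fit_stack(xs, ys, k=3):
    """Stack the two base learners; the meta-learner is one blend weight w."""
    p1 = out_of_fold(fit_mean, xs, ys, k)
    p2 = out_of_fold(fit_line, xs, ys, k)

    def squared_error(w):
        return sum((w * a + (1 - w) * b - y) ** 2 for a, b, y in zip(p1, p2, ys))

    # Grid search over w in {0.00, 0.01, ..., 1.00} on out-of-fold predictions.
    w = min((i / 100 for i in range(101)), key=squared_error)
    m1, m2 = fit_mean(xs, ys), fit_line(xs, ys)  # refit on all data
    return (lambda x: w * m1(x) + (1 - w) * m2(x)), w


# Hypothetical data: a feature value per block and its work-time coefficient.
xs = [1, 2, 3, 4, 5, 6, 7, 8, 9]
ys = [2 * x + 1 for x in xs]
model, weight = fit_stack(xs, ys)
print(weight, model(10))
```

Training the meta-learner on out-of-fold predictions, rather than on predictions for data the base learners have already seen, is what keeps the stacked model from simply rewarding the most overfitted base learner.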

Forecasting Day-ahead Electricity Price Using a Hybrid Improved Approach

  • Hu, Jian-Ming;Wang, Jian-Zhou
    • Journal of Electrical Engineering and Technology / v.12 no.6 / pp.2166-2176 / 2017
  • Electricity price prediction plays a crucial part in scheduling and risk management for participants in the competitive electricity market. However, it is a challenging task owing to the nonlinearity, non-stationarity, and uncertainty of the price series. This study proposes an improved hybrid strategy that combines a data preprocessing component with a forecasting engine to enhance the accuracy of electricity price forecasts. In the developed forecasting procedure, the Seasonal Adjustment (SA) method and the Ensemble Empirical Mode Decomposition (EEMD) technique together form the data preprocessing component; the Coupled Simulated Annealing (CSA) optimization method and the Least Squares Support Vector Regression (LSSVR) algorithm form the prediction engine. The proposed hybrid approach is verified with electricity price data sampled from the power market of New South Wales, Australia. The simulation results show that the proposed hybrid approach achieves an observable improvement in forecasting accuracy over other approaches, which suggests that the proposed combined approach offers better predictive ability and sufficient precision.

Forecasting Sow's Productivity using the Machine Learning Models (머신러닝을 활용한 모돈의 생산성 예측모델)

  • Lee, Min-Soo;Choe, Young-Chan
    • Journal of Agricultural Extension & Community Development / v.16 no.4 / pp.939-965 / 2009
  • Machine learning has been identified as a promising approach to knowledge-based system development. This study aims to examine how well machine learning techniques support farmers' decision making and to develop a reference model for using pig farm data. We compared five machine learning techniques: logistic regression, decision tree, artificial neural network, k-nearest neighbor, and ensemble. All models performed well in predicting sow productivity at every parity, showing over 87.6% predictability. Predictability of total litter size was highest, at 91.3%, in the third parity and decreased as parity increased. The ensemble performed well in predicting sow productivity. The neural network and logistic regression were excellent classifiers for all parities. The decision tree and k-nearest neighbor were not good classifiers for any parity. Performance varied across the models used, with differences in lift values of up to 104%. The artificial neural network and ensemble models produced the highest lift values, implying the best performance among the models.
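
The lift values mentioned in this abstract can be illustrated with the commonly used definition: the positive rate among the top-scored fraction of cases divided by the overall positive rate. The abstract does not state how the paper computed lift, so the function below is an assumption, and the scores and labels are hypothetical.

```python
def lift_at(scores, labels, fraction):
    """Lift of the top `fraction` of cases ranked by descending score.

    labels are 1 for a positive case and 0 otherwise.
    """
    if not any(labels):
        raise ValueError("lift is undefined when there are no positive cases")
    n_top = max(1, int(len(scores) * fraction))
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    top_rate = sum(label for _, label in ranked[:n_top]) / n_top
    base_rate = sum(labels) / len(labels)
    return top_rate / base_rate


# Hypothetical model scores and true outcomes for ten cases.
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
labels = [1, 1, 0, 1, 0, 0, 0, 1, 0, 0]
lift_top20 = lift_at(scores, labels, 0.2)
print(lift_top20)
```

A lift above 1 means the model concentrates positive cases in its top-ranked group better than random selection would.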

An Ensemble Classifier using Two Dimensional LDA

  • Park, Cheong-Hee
    • Journal of Korea Multimedia Society / v.13 no.6 / pp.817-824 / 2010
  • Linear Discriminant Analysis (LDA) has been successfully applied for dimension reduction in face recognition. However, LDA requires a face image to be transformed into a one-dimensional vector, and this process can discard the correlation information among neighboring pixels. In contrast, 2D-LDA uses 2D images directly without a transformation step and has been shown to be superior to traditional LDA. Nevertheless, 2D-LDA has some problems. First, it is difficult to determine the optimal number of feature vectors in the reduced-dimensional space. Second, the size of the rectangular windows used in 2D-LDA strongly affects classification accuracy, but there is no reliable way to determine an optimal window size. In this paper, we propose a new algorithm to overcome these problems in 2D-LDA. We adopt an ensemble approach that combines several classifiers obtained with various window sizes. A practical method for determining the number of feature vectors is also presented. Experimental results demonstrate that the proposed method overcomes the difficulty of choosing an optimal window size and number of feature vectors.
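
One simple way to combine classifiers built with different window sizes is a majority vote over their predicted labels. The abstract does not say which combination rule the paper uses, so the sketch below is an assumption; the window sizes and labels are hypothetical.

```python
from collections import Counter


def majority_vote(predictions):
    """Return the label predicted by the most ensemble members.

    Ties are broken in favor of the label that appears first.
    """
    return Counter(predictions).most_common(1)[0][0]


# Hypothetical predictions for one face image, one per window size.
votes_by_window = {"4x4": "person_A", "8x8": "person_B", "16x16": "person_A"}
winner = majority_vote(list(votes_by_window.values()))
print(winner)
```

With a vote, no single window size has to be chosen in advance, which matches the motivation given in the abstract.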

Accounting for Uncertainty Propagation: Streamflow Forecasting using Multiple Climate and Hydrological Models

  • Kwon, Hyun-Han;Moon, Young-Il;Park, Se-Hoon;Oh, Tae-Suck
    • Proceedings of the Korea Water Resources Association Conference / 2008.05a / pp.1388-1392 / 2008
  • Water resources management depends on dealing with inherent uncertainties stemming from climatic and hydrological inputs and models. Dealing with these uncertainties remains a challenge. Streamflow forecasts inherently contain uncertainties arising from model structure and initial conditions. Recent enhancements in climate forecasting skill and hydrological modeling provide a breakthrough for delivering improved streamflow forecasts. However, little consideration has been given to methodologies that couple multiple climate models with multiple hydrological models, increasing the pool of streamflow forecast ensemble members and accounting for cumulative sources of uncertainty. The approach proposed here integrates and couples global climate models (GCMs), multiple regional climate models, and numerous hydrological models to improve streamflow forecasting and to characterize system uncertainty through the generation of ensemble forecasts.
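
Coupling every climate input with every hydrological model enlarges the ensemble multiplicatively, as the sketch below shows. The model names, precipitation values, and toy runoff functions are all hypothetical placeholders, not the models used in the paper.

```python
from itertools import product
from statistics import mean, pstdev

# Hypothetical precipitation forecasts (mm) from two climate models.
climate_inputs = {"gcm_a": 100.0, "gcm_b": 120.0}

# Hypothetical toy hydrological models mapping precipitation to streamflow.
hydro_models = {
    "linear": lambda p: 0.5 * p,
    "threshold": lambda p: max(0.0, p - 40.0),
}

# One ensemble member per (climate model, hydrological model) pair.
members = {
    (c_name, h_name): h_model(precip)
    for (c_name, precip), (h_name, h_model) in product(
        climate_inputs.items(), hydro_models.items()
    )
}

ensemble_mean = mean(members.values())
ensemble_spread = pstdev(members.values())
print(len(members), ensemble_mean, round(ensemble_spread, 3))
```

The spread across members gives a rough indication of the combined uncertainty from both the climate and hydrological stages.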

Anisotropic absorption of CdSe/ZnS quantum rods embedded in polymer film

  • Mukhina, Maria V.;Maslov, Vladimir G.;Baranov, Alexander V.;Artemyev, Mikhail V.;Fedorov, Anatoly V.
    • Advances in nano research / v.1 no.3 / pp.153-158 / 2013
  • An approach to achieving a spatially homogeneous, ordered ensemble of semiconductor quantum rods in a polyvinyl butyral polymer film is reported. The CdSe/ZnS quantum rods are embedded in the polymer film. The resulting film is stretched up to four times its initial length. The concentration of quantum rods in the samples is around $2 \times 10^{-5}$ M. The absorption spectra, obtained in light with orthogonal polarizations, confirm the occurrence of spatial ordering in the quantum rod ensemble. The anisotropy of the optical properties in the ordered quantum rod ensemble is examined. The presented method can be used as a low-cost solution for preparing nanostructured materials with anisotropic properties and a high concentration of nanocrystals.

A Forecasting System for KOSPI 200 Option Trading using Artificial Neural Network Ensemble (인공신경망 앙상블을 이용한 옵션 투자예측 시스템)

  • 이재식;송영균;허성회
    • Proceedings of the Korea Intelligent Information System Society Conference / 2000.11a / pp.489-497 / 2000
  • Since the IMF situation, the money market environment has been changing rapidly. As a result, many companies, including financial institutions, and many individual investors are concerned with forecasting the money market, and they are trying to secure various profit and hedging methods using derivatives such as options, futures, and swaps. In this research, we developed a prototype forecasting system for trading KOSPI 200 options, specifically call options, using artificial neural networks (ANN). To avoid the overfitting problem and the problem of choosing the ANN structure and parameters, we employed the ANN ensemble approach. We conducted two types of simulation: one with hold signals taken into account, and the other without hold signals. Even though our models show low accuracy on the sample set extracted from data collected in the early stage of the IMF situation, they perform better in terms of profit and stability than the model that uses only the theoretical price.

An Ensemble Model for Credit Default Discrimination: Incorporating BERT-based NLP and Transformer

  • Sophot Ky;Ju-Hong Lee
    • Proceedings of the Korea Information Processing Society Conference / 2023.05a / pp.624-626 / 2023
  • Credit scoring is a technique used by financial institutions to assess the creditworthiness of potential borrowers. It involves evaluating a borrower's credit history to predict the likelihood of defaulting on a loan. This paper presents an ensemble of two Transformer-based models within a framework for discriminating the default risk of loan applications in the field of credit scoring. The first model is FinBERT, a pretrained NLP model for analyzing the sentiment of financial text. The second model is FT-Transformer, a simple adaptation of the Transformer architecture for the tabular domain. Both models are trained on the same underlying data set, with the only difference being the representation of the data. This multi-modal approach allows us to leverage the unique capabilities of each model and potentially uncover insights that may not be apparent when using a single model alone. We compare our model with two well-known ensemble-based models, Random Forest and Extreme Gradient Boosting.

Incorporating BERT-based NLP and Transformer for An Ensemble Model and its Application to Personal Credit Prediction

  • Sophot Ky;Ju-Hong Lee;Kwangtek Na
    • Smart Media Journal / v.13 no.4 / pp.9-15 / 2024
  • Tree-based algorithms have been the dominant methods used to build prediction models for tabular data, including personal credit data. However, they are limited to categorical and numerical data, and they do not capture information about the relationships between features. In this work, we propose an ensemble model based on the Transformer architecture that includes text features and harnesses the self-attention mechanism to address the feature-relationship limitation. We describe a text formatter module that converts the original tabular data into sentence data, which is fed into FinBERT along with other text features. Furthermore, we employ FT-Transformer, which is trained on the original tabular data. We compare this multi-modal approach against two popular tree-based algorithms, Random Forest and Extreme Gradient Boosting (XGBoost), as well as TabTransformer. Our proposed method shows superior Default Recall, F1 score, and AUC results across two public data sets. Our results are significant for financial institutions seeking to reduce the risk of financial loss from defaulters.
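
The text formatter step, which turns a tabular row into a sentence that a language model such as FinBERT can consume, might look like the sketch below. The sentence template, the function name `row_to_sentence`, and the column names are hypothetical; the abstract does not specify the format used in the paper.

```python
def row_to_sentence(row):
    """Convert one tabular record (a dict of column -> value) into a sentence.

    Underscores in column names are replaced with spaces for readability.
    """
    parts = [f"{name.replace('_', ' ')} is {value}" for name, value in row.items()]
    return ", ".join(parts) + "."


# Hypothetical credit application record.
record = {"age": 35, "loan_amount": 12000, "home_ownership": "RENT"}
sentence = row_to_sentence(record)
print(sentence)
```

Representing the same record both as a sentence and as raw tabular features is what lets the two models in the ensemble see different views of identical data.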