• Title/Summary/Keyword: Ensemble Techniques

Predicting Administrative Issue Designation in KOSDAQ Market Using Machine Learning Techniques (머신러닝을 활용한 코스닥 관리종목지정 예측)

  • Chae, Seung-Il;Lee, Dong-Joo
    • Asia-Pacific Journal of Business / v.13 no.2 / pp.107-122 / 2022
  • Purpose - This study aims to develop machine learning models to predict administrative issue designation in the KOSDAQ market using financial data. Design/methodology/approach - Applying four classification techniques (logistic regression, support vector machine, random forest, and gradient boosting) to a matched sample of 536 firms over an eight-year period, the authors develop prediction models and explore their practicality. Findings - The resulting four binary selection models show satisfactory overall classification performance on various measures, including AUC (area under the receiver operating characteristic curve), accuracy, F1-score, and top quartile lift, while the ensemble models (random forest and gradient boosting) outperform the others on most measures. Research implications or Originality - Although assessing a firm's potential for administrative issue designation is critical information for investors and financial institutions, detailed empirical investigation has lagged behind. The current research fills this gap in the literature by proposing parsimonious prediction models based on a few financial variables and validating the applicability of the models.
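
The evaluation measures named in this abstract can be sketched in a few lines. The following is a minimal illustration, not the authors' code: the toy labels and scores are hypothetical, and top quartile lift is assumed here to mean the positive rate among the 25% highest-scored firms divided by the overall positive rate, which may differ from the paper's exact definition.

```python
def accuracy(y_true, y_pred):
    """Share of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def f1_score(y_true, y_pred):
    """Harmonic mean of precision and recall for the positive class (label 1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def top_quartile_lift(y_true, scores):
    """Positive rate in the top 25% of scores divided by the overall positive rate.

    Assumed definition; the paper may define the measure differently.
    """
    ranked = sorted(zip(scores, y_true), key=lambda pair: pair[0], reverse=True)
    n_top = max(1, len(ranked) // 4)
    top_rate = sum(label for _, label in ranked[:n_top]) / n_top
    base_rate = sum(y_true) / len(y_true)
    return top_rate / base_rate


# Hypothetical toy data: 1 = designated as an administrative issue, 0 = not.
y_true = [1, 0, 1, 0, 0, 0, 1, 0]
scores = [0.9, 0.2, 0.8, 0.4, 0.1, 0.3, 0.6, 0.7]
y_pred = [1 if s >= 0.5 else 0 for s in scores]

print(accuracy(y_true, y_pred))
print(f1_score(y_true, y_pred))
print(top_quartile_lift(y_true, scores))
```

A lift above 1 means the model concentrates true positives among its highest-scored cases more than random ranking would.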

An Exploratory Study on the Prediction of Business Survey Index Using Data Mining (기업경기실사지수 예측에 대한 탐색적 연구: 데이터 마이닝을 이용하여)

  • Kyungbo Park;Mi Ryang Kim
    • Journal of Information Technology Services / v.22 no.4 / pp.123-140 / 2023
  • In recent times, the global economy has been subject to increasing volatility, which has made it considerably more difficult to accurately predict economic indicators compared to previous periods. In response to this challenge, the present study conducts an exploratory investigation that aims to predict the Business Survey Index (BSI) by leveraging data mining techniques on both structured and unstructured data sources. For the structured data, we have collected information regarding foreign, domestic, and industrial conditions, while the unstructured data consists of content extracted from newspaper articles. By employing an extensive set of 44 distinct data mining techniques, our research strives to enhance the BSI prediction accuracy and provide valuable insights. The results of our analysis demonstrate that the highest predictive power was attained when using data exclusively from the t-1 period. Interestingly, this suggests that previous timeframes play a vital role in forecasting the BSI effectively. The findings of this study hold significant implications for economic decision-makers, as they will not only facilitate better-informed decisions but also serve as a robust foundation for predicting a wide range of other economic indicators. By improving the prediction of crucial economic metrics, this study ultimately aims to contribute to the overall efficacy of economic policy-making and decision processes.

Development of Prediction Model of Chloride Diffusion Coefficient using Machine Learning (기계학습을 이용한 염화물 확산계수 예측모델 개발)

  • Kim, Hyun-Su
    • Journal of Korean Association for Spatial Structures / v.23 no.3 / pp.87-94 / 2023
  • Chloride is one of the most common threats to the durability of reinforced concrete (RC). The alkaline environment of concrete forms a passive layer on the surface of reinforcement bars that protects them from corrosion. However, when the chloride concentration at the reinforcement bar reaches a certain level, the passive protection layer deteriorates, causing corrosion and ultimately reducing the structure's safety and durability. Therefore, understanding and predicting chloride diffusion is important for evaluating the safety and durability of RC structures. In this study, the chloride diffusion coefficient is predicted by machine learning techniques. Various machine learning techniques, namely multiple linear regression, decision tree, random forest, support vector machine, artificial neural networks, extreme gradient boosting, and k-nearest neighbor, were used, and the accuracy of their models was compared. To evaluate accuracy, root mean square error (RMSE), mean square error (MSE), mean absolute error (MAE), and the coefficient of determination (R²) were used as prediction performance indices. The k-fold cross-validation procedure was used to estimate the performance of the machine learning models when making predictions on data not used during training. Grid search was applied for hyperparameter optimization. Numerical simulation showed that ensemble learning methods such as random forest and extreme gradient boosting successfully predicted the chloride diffusion coefficient, and that artificial neural networks also provided accurate results.
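
The four performance indices and the k-fold split mentioned in this abstract can be sketched as follows. This is an illustrative sketch only: the numbers are hypothetical rather than taken from the paper, `k_fold_indices` is a helper name invented here, and the folds are assumed to be contiguous and unshuffled.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return MSE, RMSE, MAE and R^2 for paired observations and predictions."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mse = sum(e * e for e in errors) / n
    mae = sum(abs(e) for e in errors) / n
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares
    ss_res = sum(e * e for e in errors)                 # residual sum of squares
    return {"MSE": mse, "RMSE": math.sqrt(mse), "MAE": mae, "R2": 1 - ss_res / ss_tot}


def k_fold_indices(n_samples, k):
    """Yield (train_indices, test_indices) for k contiguous, near-equal folds."""
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = [i for i in range(n_samples) if i < start or i >= start + size]
        yield train, test
        start += size


# Hypothetical values, not taken from the paper.
y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.5, 2.0, 2.5, 4.0]
metrics = regression_metrics(y_true, y_pred)
folds = list(k_fold_indices(5, 2))
print(metrics)
print(folds)
```

In a cross-validated comparison, each model would be fitted on the training indices of every fold and scored on the held-out indices, with the metrics averaged across folds.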

Machine learning application to seismic site classification prediction model using Horizontal-to-Vertical Spectral Ratio (HVSR) of strong-ground motions

  • Francis G. Phi;Bumsu Cho;Jungeun Kim;Hyungik Cho;Yun Wook Choo;Dookie Kim;Inhi Kim
    • Geomechanics and Engineering / v.37 no.6 / pp.539-554 / 2024
  • This study explores the development of a prediction model for seismic site classification by integrating machine learning techniques with horizontal-to-vertical spectral ratio (HVSR) methodologies. To improve model accuracy, the research employs outlier detection methods and the synthetic minority over-sampling technique (SMOTE) for data balancing, and evaluates seven machine learning models using seismic data from KiK-net. Notably, the light gradient boosting method (LGBM), gradient boosting, and decision tree models exhibit improved performance when coupled with SMOTE, while the multiple linear regression (MLR) and support vector machine (SVM) models show reduced efficacy. Outlier detection techniques significantly enhance accuracy, particularly for LGBM, gradient boosting, and voting boosting. The combination of LGBM with the isolation forest and SMOTE achieves the highest accuracy of 0.91, while LGBM with the local outlier factor yields the highest F1-score of 0.79. Consistently outperforming the other models, LGBM proves the most efficient for seismic site classification when supported by appropriate preprocessing procedures. These findings show the significance of outlier detection and data balancing for precise seismic soil classification prediction and highlight the potential of machine learning for optimizing site classification accuracy.
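
The core idea of SMOTE, generating synthetic minority-class samples by interpolating between a sample and one of its nearest minority neighbours, can be sketched as below. This is a simplified illustration under stated assumptions, not the authors' pipeline: the function name `smote`, the toy points, and the parameter defaults are hypothetical, and a maintained library implementation would normally be preferred in practice.

```python
import math
import random


def smote(minority, n_synthetic, k=2, seed=0):
    """Create synthetic minority samples by interpolating toward near neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_synthetic):
        base = rng.choice(minority)
        # k nearest minority neighbours of the base point (excluding itself).
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append(tuple(b + gap * (n - b) for b, n in zip(base, neighbour)))
    return synthetic


# Hypothetical 2-D minority-class feature vectors.
minority = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
new_points = smote(minority, n_synthetic=6)
print(new_points)
```

Because every synthetic point lies on a segment between two existing minority samples, the method densifies the minority region rather than duplicating samples outright.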

Comparison of Data Assimilation Methods in a Regional Ocean Circulation Model for the Yellow and East China Seas (자료동화 기법에 따른 황·동중국해 지역 해양순환모델 결과 비교)

  • Lee, Joon-Ho;Moon, Jae-Hong;Choi, Youngjin
    • Ocean and Polar Research / v.42 no.3 / pp.179-194 / 2020
  • The present study aims to evaluate the effects of satellite-based SST (OSTIA) assimilation on a regional ocean circulation model for the Yellow and East China Seas (YECS), using three different assimilation methods: the Ensemble Optimal Interpolation (EnOI), Ensemble Kalman Filter (EnKF), and 4-Dimensional Variational (4DVAR) techniques, which are widely used in the ocean modeling community. The model experiments show that an initial condition improved by assimilating the SST affects the seasonal water temperature and water mass distributions of the YECS. In particular, the SST data assimilation influences the temperature structures horizontally and vertically in winter, thereby improving the behavior of the YS warm current water. This is because the water column is well mixed during wintertime and is therefore directly updated by the SST assimilation. The model comparisons indicate that the SST assimilation can improve the model performance in resolving the subsurface structures in wintertime, but has a relatively small impact in summertime due to the strong stratification. The differences among the assimilation experiments are obvious when the SST changed sharply due to a typhoon passage. Overall, the EnKF and 4DVAR show better agreement with the observations than the EnOI. The relatively low performance of EnOI under storm conditions may be related to a limitation of the EnOI method, whereby an analysis is obtained from a number of climatological fields, so that typhoon-induced SST changes on short time scales may not be adequately reflected in the data assimilation.

Design of a Miniature Wideband H-shaped Microstrip Antenna for WLAN (WLAN용 소형 광대역 H-모양 마이크로스트립 안테나)

  • 이진우;이종철;윤서용;이문수
    • Journal of the Institute of Electronics Engineers of Korea TC / v.41 no.3 / pp.15-20 / 2004
  • In this paper, a wideband two-layer H-shaped microstrip antenna for WLAN is designed. To increase the bandwidth of the microstrip patch antenna, a stacked configuration using a parasitic element is used. Furthermore, to reduce the size of the microstrip patch antenna, two techniques are employed: the first is an H-shaped patch, and the second is shorting the main radiator and the parasitic patch to the ground plane using ten shorting posts. The antenna bandwidth and radiation characteristics are calculated with the ENSEMBLE ver. 5.0 simulation software and compared with the experimental results. Experimental results show that the antenna bandwidth is 740 MHz centered at 5.46 GHz (13.5%), in close agreement with the calculated value of 770 MHz (13%). Also, the antenna size can be reduced by 71.5% compared with a half-wavelength rectangular microstrip antenna using the same substrate at the same frequency.
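
The percentage quoted alongside the measured bandwidth is the fractional bandwidth, that is, the bandwidth divided by the center frequency. A short check using the measured values from the abstract (the function name is hypothetical) gives roughly the quoted 13.5%:

```python
def fractional_bandwidth_percent(bandwidth_hz, center_hz):
    """Bandwidth expressed as a percentage of the center frequency."""
    return 100.0 * bandwidth_hz / center_hz


# Measured values quoted in the abstract: 740 MHz centered at 5.46 GHz.
measured = fractional_bandwidth_percent(740e6, 5.46e9)
print(f"{measured:.2f} %")
```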

Prediction of Track Quality Index (TQI) Using Vehicle Acceleration Data based on Machine Learning (차량가속도데이터를 이용한 머신러닝 기반의 궤도품질지수(TQI) 예측)

  • Choi, Chanyong;Kim, Hunki;Kim, Young Cheul;Kim, Sang-su
    • Journal of the Korean Geosynthetics Society / v.19 no.1 / pp.45-53 / 2020
  • In the railway industry, there is a growing tendency to perform predictive analysis on measurement data using machine learning techniques. In this paper, the track quality index (TQI) was predicted from vehicle acceleration data using machine learning methods. XGB (XGBoost) was the most accurate, at 85% across all data sets. Unlike the SVM model, which uses a single algorithm, the RF and XGB models, which are ensemble systems, were found to perform well in prediction. For the surface TQI, the z-axis acceleration is shown to be highly related to the vertical direction, in good agreement with previous studies. Because accuracy may vary depending on the machine learning model applied, it is appropriate to use a model with an ensemble algorithm to predict the track quality index from vehicle vibration acceleration data.

An Ensemble Approach for Cyber Bullying Text messages and Images

  • Zarapala Sunitha Bai;Sreelatha Malempati
    • International Journal of Computer Science & Network Security / v.23 no.11 / pp.59-66 / 2023
  • Text mining (TM) is widely used to find patterns in text documents. Cyber-bullying is the term for abusing a person on an online or offline platform. Cyber-bullying has become increasingly dangerous to people who use social networking sites (SNS). It takes many forms, such as text messages, morphed images, and morphed videos, and preventing this type of abuse on online SNS is very difficult. Finding accurate text mining patterns gives better results in detecting cyber-bullying on any platform. On online SNS, cyber-bullying takes the form of sending defamatory statements, verbally bullying other persons, or abusing someone in front of other SNS users. Deep learning (DL) is one of the significant domains used to extract and learn quality features dynamically from low-level text inclusions. In this scenario, convolutional neural networks (CNN) are used for training on text data, images, and videos. CNN is a very powerful approach to training on these types of data and achieves better text classification. In this paper, an ensemble model is introduced that integrates term frequency-inverse document frequency (TF-IDF) and a deep neural network (DNN) with advanced feature-extraction techniques to classify bullying text, images, and videos. The proposed approach also focuses on reducing training time and memory usage, which helps improve classification.
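
The TF-IDF weighting named in this abstract can be sketched in plain Python. This is a minimal illustration of one common variant (relative term frequency times log(N / document frequency)), not the authors' feature pipeline; the messages are invented and whitespace tokenization is an assumption.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Return one {term: weight} dict per document using tf * log(N / df)."""
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)
    # Document frequency: number of documents in which each term appears.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        weights.append({
            term: (count / len(tokens)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights


# Hypothetical messages, invented for illustration.
messages = ["you are great", "you are awful awful", "great game today"]
vectors = tf_idf(messages)
print(vectors[1])
```

Terms that are frequent in one message but rare across the collection receive the largest weights, and those weighted vectors are what a downstream classifier such as a DNN would consume.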

Machine Learning Framework for Predicting Voids in the Mineral Aggregation in Asphalt Mixtures (아스팔트 혼합물의 골재 간극률 예측을 위한 기계학습 프레임워크)

  • Hyemin Park;Ilho Na;Hyunhwan Kim;Bongjun Ji
    • Journal of the Korean Geosynthetics Society / v.23 no.1 / pp.17-25 / 2024
  • The Voids in the Mineral Aggregate (VMA) within asphalt mixtures play a crucial role in defining the mixture's structural integrity, durability, and resistance to environmental factors. Accurate prediction and optimization of VMA are essential for enhancing the performance and longevity of asphalt pavements, particularly in varying climatic and environmental conditions. This study introduces a novel machine learning framework that uses an ensemble machine learning model to predict VMA in asphalt mixtures. By analyzing a comprehensive set of variables, including aggregate size distribution, binder content, and compaction levels, our framework offers a more precise prediction of VMA than traditional single-model approaches. The use of advanced machine learning techniques not only surpasses the accuracy of conventional empirical methods but also significantly reduces the reliance on extensive laboratory testing. Our findings highlight the effectiveness of a data-driven approach in the field of asphalt mixture design, showcasing a path toward more efficient and sustainable pavement engineering practices. This research contributes to the advancement of predictive modeling in construction materials, offering valuable insights for the design and optimization of asphalt mixtures with optimal void characteristics.

Missing Value Imputation Technique for Water Quality Dataset

  • Jin-Young Jun;Youn-A Min
    • Journal of the Korea Society of Computer and Information / v.29 no.4 / pp.39-46 / 2024
  • Many researchers work to evaluate water quality using various models. Such models require a dataset without missing values, but in the real world most datasets include missing values for various reasons. Simply deleting samples that have missing values can distort the distribution of the underlying data and poses a significant risk of biasing the model's inference when the missingness mechanism is not MCAR. In this study, to find the most appropriate technique for handling missing values in water quality data, several imputation techniques based on existing KNN and MICE imputation were tested, with and without the generative neural network models Autoencoder (AE) and Denoising Autoencoder (DAE). The results show that combined KNN and MICE imputation without generative networks provides estimates closest to the true values. When binary classification models based on support vector machine and ensemble algorithms are evaluated after applying the combined imputation technique to the observed water quality dataset with missing values, they perform better in terms of accuracy, F1 score, ROC-AUC score, and MCC than models evaluated after deleting samples with missing values.
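
The KNN half of the imputation approach described here can be sketched as follows. This is a simplified illustration, not the authors' combined KNN and MICE method: the column meanings and values are hypothetical, distances are computed only over columns observed in both rows, and each column is assumed to have at least one observed value in another row.

```python
import math


def knn_impute(rows, k=2):
    """Fill None entries with the mean of that column over the k nearest rows."""
    def distance(a, b):
        # Root mean squared difference over columns observed in both rows.
        shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
        if not shared:
            return math.inf
        return math.sqrt(sum((x - y) ** 2 for x, y in shared) / len(shared))

    imputed = [list(row) for row in rows]
    for i, row in enumerate(rows):
        for j, value in enumerate(row):
            if value is not None:
                continue
            # Candidate donors: other rows that have column j observed
            # (assumes at least one such row exists).
            donors = [r for m, r in enumerate(rows) if m != i and r[j] is not None]
            donors.sort(key=lambda r: distance(row, r))
            nearest = donors[:k]
            imputed[i][j] = sum(r[j] for r in nearest) / len(nearest)
    return imputed


# Hypothetical water-quality rows: (pH, dissolved oxygen, turbidity).
rows = [
    (7.0, 8.0, 1.0),
    (7.2, 7.8, 1.2),
    (6.0, 4.0, 9.0),
    (7.1, None, 1.1),
]
filled = knn_impute(rows, k=2)
print(filled[3])
```

The missing dissolved-oxygen value is filled from the two rows whose observed measurements are most similar, rather than from the dissimilar third row, which is the intuition behind preferring imputation over deleting incomplete samples.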