• Title/Summary/Keyword: Training Datasets

Nakdong River Estuary Salinity Prediction Using Machine Learning Methods (머신러닝 기법을 활용한 낙동강 하구 염분농도 예측)

  • Lee, Hojun;Jo, Mingyu;Chun, Sejin;Han, Jungkyu
    • Smart Media Journal / v.11 no.2 / pp.31-38 / 2022
  • Promptly predicting changes in river salinity is important for anticipating the damage that salinity intrusion causes to agriculture and ecosystems and for establishing disaster prevention measures. Because machine learning (ML) methods have a much lower computation cost than physics-based hydraulic models, they can predict river salinity in a relatively short time. Owing to their shorter training time, ML methods have been studied as a complementary technique to physics-based hydraulic models. ML-based salinity prediction has been studied actively around the world, but there are few such studies in South Korea. Using a large number of publicly available datasets, we evaluated the performance of various machine learning techniques for predicting the salinity of the Nakdong River Estuary Basin. The LightGBM algorithm achieved an average RMSE of 0.37 and trained 2-20 times faster than the other algorithms. This indicates that machine learning techniques can be applied to predict the salinity of rivers in Korea.
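
The abstract above reports prediction performance as RMSE (root mean squared error). As a reminder of what that metric measures, here is a minimal sketch in plain Python. The function name `rmse` and the sample values are illustrative assumptions, not taken from the paper.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error between observed and predicted values."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    # Mean of squared residuals, then square root.
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Made-up salinity readings, for illustration only.
observed = [1.0, 2.0, 3.0]
predicted = [1.0, 2.0, 3.0]
print(rmse(observed, predicted))
```

RMSE is expressed in the same unit as the predicted quantity, so a lower value means predictions sit closer to the observations on average.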

An Experimental Study on the Automatic Classification of Korean Journal Articles through Feature Selection (자질선정을 통한 국내 학술지 논문의 자동분류에 관한 연구)

  • Kim, Pan Jun
    • Journal of the Korean Society for Information Management / v.39 no.1 / pp.69-90 / 2022
  • To provide basic data that can systematically support and evaluate R&D activities and help set current and future research directions by identifying specific trends in domestic academic research, I sought efficient ways to assign standardized subject categories (control keywords) to individual journal papers. To this end, I conducted various experiments on the major factors affecting automatic classification performance, focusing on feature selection techniques, with the aim of automatically assigning categories from the National Research Foundation of Korea's Academic Research Classification Scheme to domestic journal papers. The results showed that, for domestic journal papers, which form an imbalanced real-world dataset, a fairly good level of performance can be expected using simpler classifiers, feature selection techniques, and relatively small training sets.

Building Sentiment-Annotated Datasets for Training a FbSA model based on the SSP methodology (반자동 언어데이터 증강 방식에 기반한 FbSA 모델 학습을 위한 감성주석 데이터셋 FeSAD 구축)

  • Yoon, Jeong-Woo;Hwang, Chang-Hoe;Choi, Su-Won;Nam, Jee-Sun
    • Annual Conference on Human and Language Technology / 2021.10a / pp.66-71 / 2021
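  • This study introduces the development process and performance evaluation of FeSAD (Feature-Sentiment-Annotated Dataset), a feature-sentiment annotated dataset built with the semi-automatic linguistic data augmentation method SSP (Semi-automatic Symbolic Propagation), for constructing large-scale training data for Korean feature-based sentiment analysis (FbSA). FeSAD is built through a 2-STEP annotation process: a first-stage SSP annotation that uses language resources, followed by a second-stage annotation by human annotators. The language resources used for SSP annotation are Local Grammar Graph (LGG) schemas and the Korean machine-readable electronic dictionary DECO (Dictionnaire Electronique du COréen). The study presents the processing used to build the FeSAD dataset, annotated with opinion triples, for seven domains (cosmetics, IT products, fashion/clothing, food/delivery food, furniture/interior, fintech apps, and KPOP). When the first-stage SSP annotation using language resources was evaluated on two domains, cosmetics (COS) and food/delivery food (FOO), it achieved F1-scores of 0.93 and 0.90, respectively. This confirms that the annotators' share of the work in annotating FbSA training data falls to 10% or less of the previous workload, so that the time required for, and the quality of, training data construction can be dramatically improved.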

Computational intelligence models for predicting the frictional resistance of driven pile foundations in cold regions

  • Shiguan Chen;Huimei Zhang;Kseniya I. Zykova;Hamed Gholizadeh Touchaei;Chao Yuan;Hossein Moayedi;Binh Nguyen Le
    • Computers and Concrete / v.32 no.2 / pp.217-232 / 2023
  • Numerous studies have been performed on the behavior of pile foundations in cold regions. This study first employed artificial neural networks (ANN) to predict pile-bearing capacity, focusing on pile data recorded primarily in cold regions. Because the ANN technique has disadvantages such as difficulty in finding global minima and slow convergence rates, the second phase of the study develops an ANN-based predictive model improved with the Elephant Herding Optimizer (EHO), Dragonfly Algorithm (DA), Genetic Algorithm (GA), and Evolution Strategy (ES) methods for predicting the piles' bearing capacity. The network inputs included the pile geometrical features, pile area (m²), pile length (m), internal friction angle along the pile body and at the pile tip (Ø°), and effective vertical stress. The output of the MLP model was the pile's ultimate bearing capacity. A sensitivity analysis was performed to determine the optimum parameters for selecting the best predictive model. A trial-and-error technique was also used to find the optimum network architecture and the number of hidden nodes. According to the results, the bearing capacities predicted by the DA-MLP model are in good agreement with the measured bearing capacities. Based on coefficients of determination (R²) of 0.90364 and 0.8643 for the testing and training datasets, respectively, it is suggested that the DA-MLP model can be effectively implemented with higher reliability, efficiency, and practicability to predict the bearing capacity of piles.

Prediction Model Design by Concentration Type for Improving PM10 Prediction Performance (PM10 예측 성능 향상을 위한 농도별 예측 모델 설계)

  • Kyoung-Woo Cho;Yong-jin Jung;Chang-Heon Oh
    • Journal of Advanced Navigation Technology / v.25 no.6 / pp.576-581 / 2021
  • Compared to a low concentration, a high concentration clearly entails limitations in terms of predictive performance owing to differences in its frequency and environment of occurrence. To resolve this problem, in this study, an artificial intelligence neural network algorithm was used to classify low and high concentrations; furthermore, two prediction models trained using the characteristics of the classified concentration types were used for prediction. To this end, we constructed training datasets using weather and air pollutant data collected over a decade in the Cheonan region. We designed a DNN-based classification model to classify low and high concentrations; further, we designed low- and high-concentration prediction models to reflect characteristics by concentration type based on the low and high concentrations classified through the classification model. According to the results of the performance assessment of the prediction model by concentration type, the low- and high-concentration prediction accuracies were 90.38% and 96.37%, respectively.
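
The abstract above describes a two-stage design: a classifier first labels a sample as low or high concentration, and the prediction model trained for that type then produces the forecast. The routing step can be sketched as follows. All function names, the `previous_pm10` feature, the threshold of 80.0, and the scaling factors are hypothetical stand-ins chosen for illustration. They are not the paper's DNN models or settings.

```python
def predict_by_concentration_type(features, classify, predict_low, predict_high):
    """Route one sample to the model trained for its predicted concentration type."""
    concentration_type = classify(features)  # expected to return "low" or "high"
    if concentration_type == "high":
        return concentration_type, predict_high(features)
    return concentration_type, predict_low(features)


# Toy stand-ins (hypothetical; the threshold and factors are arbitrary).
def toy_classifier(features):
    return "high" if features["previous_pm10"] > 80.0 else "low"


def toy_low_model(features):
    return 0.9 * features["previous_pm10"]


def toy_high_model(features):
    return 1.1 * features["previous_pm10"]


sample = {"previous_pm10": 100.0}
label, forecast = predict_by_concentration_type(
    sample, toy_classifier, toy_low_model, toy_high_model
)
print(label, forecast)
```

Passing the three models in as callables keeps the routing logic independent of how each model is implemented, so any trained classifier or regressor could be substituted.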

Classification of Tabular Data using High-Dimensional Mapping and Deep Learning Network (고차원 매핑기법과 딥러닝 네트워크를 통한 정형데이터의 분류)

  • Kyeong-Taek Kim;Won-Du Chang
    • Journal of Internet of Things and Convergence / v.9 no.6 / pp.119-124 / 2023
  • Deep learning, the most popular approach to pattern recognition, has recently demonstrated greater efficacy than traditional machine learning techniques across diverse domains. Classification of tabular data, however, remains the domain of traditional machine learning. This paper introduces a novel network module designed to map tabular data into high-dimensional tensors. The module is integrated into conventional deep learning networks and then applied to the classification of structured data. The proposed method was trained and validated on four datasets, achieving an average accuracy of 90.22%. Notably, this exceeds the performance of the contemporary deep learning model TabNet by 2.55%p. The proposed approach is significant because it allows diverse network architectures, known for their superior performance in computer vision, to be used for the analysis of tabular data.

Machine learning techniques for reinforced concrete's tensile strength assessment under different wetting and drying cycles

  • Ibrahim Albaijan;Danial Fakhri;Adil Hussein Mohammed;Arsalan Mahmoodzadeh;Hawkar Hashim Ibrahim;Khaled Mohamed Elhadi;Shima Rashidi
    • Steel and Composite Structures / v.49 no.3 / pp.337-348 / 2023
  • Successive wetting and drying cycles of concrete due to weather changes can endanger the safety of engineering structures over time. Considering wetting and drying cycles in concrete tests can lead to a more accurate and reliable design of engineering structures. This study aims to provide a model that can be used to estimate the resistance properties of concrete under different wetting and drying cycles. Complex sample preparation methods, the necessity for highly accurate and sensitive instruments, early sample failure, and brittle samples all contribute to the difficulty of measuring the strength of concrete in the laboratory. To address these problems, this study investigated the ability of six machine learning techniques (ANN, SVM, RF, KNN, XGBoost, and NB) to predict the tensile strength of concrete, using 240 datasets obtained with the Brazilian test (80% for training and 20% for testing). The tests examined the effect of additives such as glass and polypropylene, as well as the effect of wetting and drying cycles, on the tensile strength of concrete. The statistical analysis showed that the XGBoost model was the most robust, with R² = 0.9155, mean absolute error (MAE) = 0.1080 MPa, and variance accounted for (VAF) = 91.54% in predicting the concrete tensile strength. The significance of this work is that it allows civil engineers to accurately estimate the tensile strength of different types of concrete, which can eliminate the considerable time and cost required for laboratory tests.
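
The abstract above states that 240 datasets were divided 80% for training and 20% for testing. A minimal sketch of such a split in plain Python follows. The function name, the shuffling step, and the seed are assumptions made for illustration. The abstract does not say how the samples were assigned to each set.

```python
import random


def train_test_split_indices(n_samples, test_fraction=0.2, seed=0):
    """Shuffle sample indices and split them into train and test index lists."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # seeded so the split is reproducible
    n_test = round(n_samples * test_fraction)
    return indices[n_test:], indices[:n_test]


train_idx, test_idx = train_test_split_indices(240)
print(len(train_idx), len(test_idx))
```

With 240 samples and a 20% test fraction, this yields 192 training indices and 48 test indices, and no index appears in both lists.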

Horse race rank prediction using learning-to-rank approaches (Learning-to-rank 기법을 활용한 서울 경마경기 순위 예측)

  • Junhyoung Chung;Donguk Shin;Seyong Hwang;Gunwoong Park
    • The Korean Journal of Applied Statistics / v.37 no.2 / pp.239-253 / 2024
  • This research applies both point-wise and pair-wise learning strategies within the learning-to-rank (LTR) framework to predict horse race rankings in Seoul. For point-wise learning, we employ a linear model and a random forest. For pair-wise learning, we use RankNet and LambdaMART (XGBoost Ranker, LightGBM Ranker, and CatBoost Ranker). Furthermore, to enhance predictions, race records are standardized based on race distance, and we integrate various datasets, including race information, jockey information, horse training records, and trainer information. Our results empirically demonstrate that pair-wise learning approaches, which can reflect the order information between items, generally outperform point-wise learning approaches. Notably, CatBoost Ranker is the top performer. Through Shapley value analysis, we identified that the important variables for CatBoost Ranker include the performance of a horse, its previous race records, the count of its starting trainings, the total number of starting trainings, and the instances of disease diagnoses for the horse.
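
The abstract above contrasts point-wise learning, which scores each item against its own target, with pair-wise learning, which learns from the relative order of two items. The difference shows up in the loss functions. The sketch below uses a squared error for the point-wise case and the standard RankNet pairwise logistic loss for the pair-wise case. The function names are illustrative, and the paper's exact training objectives are not given in the abstract.

```python
import math


def pointwise_squared_loss(score, target):
    """Point-wise: penalize the gap between one item's score and its own target."""
    return (score - target) ** 2


def ranknet_pair_loss(score_i, score_j):
    """Pair-wise (RankNet): loss for a pair in which item i should rank above item j.

    Equals -log(sigmoid(score_i - score_j)), computed in a numerically stable way.
    """
    diff = score_i - score_j
    # softplus(-diff) = max(-diff, 0) + log(1 + exp(-|diff|))
    return max(-diff, 0.0) + math.log1p(math.exp(-abs(diff)))


# A correctly ordered pair costs less than a wrongly ordered one.
print(ranknet_pair_loss(2.0, 0.0) < ranknet_pair_loss(0.0, 2.0))
```

The pair-wise loss depends only on the score difference, so it rewards getting the order right regardless of the absolute score values.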

Wild Bird Sound Classification Scheme using Focal Loss and Ensemble Learning (Focal Loss와 앙상블 학습을 이용한 야생조류 소리 분류 기법)

  • Jaeseung Lee;Jehyeok Rew
    • Journal of Korea Society of Industrial Information Systems / v.29 no.2 / pp.15-25 / 2024
  • For effective analysis of animal ecosystems, technology that can automatically identify the current status of animal habitats is crucial. Specifically, animal sound classification, which identifies species based on their sounds, is gaining great attention where video-based discrimination is impractical. Traditional studies have relied on a single deep learning model to classify animal sounds. However, sounds collected in outdoor settings often include substantial background noise, complicating the task for a single model. In addition, data imbalance among species may lead to biased model training. To address these challenges, in this paper, we propose an animal sound classification scheme that combines predictions from multiple models using Focal Loss, which adjusts penalties based on class data volume. Experiments on public datasets have demonstrated that our scheme can improve recall by up to 22.6% compared to an average of single models.
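
The abstract above combines predictions from multiple models and trains with Focal Loss to counter class imbalance. A minimal sketch of both ideas in plain Python follows, using the commonly cited focal loss form, -alpha * (1 - p)^gamma * log(p), and simple probability averaging as the ensemble rule. The default `alpha` and `gamma` values and the averaging rule are assumptions. The abstract does not state the paper's settings or how it combines model outputs.

```python
import math


def focal_loss(p_true, alpha=0.25, gamma=2.0):
    """Focal loss for one sample, given the predicted probability of its true class.

    alpha is a class weight (it can be set larger for rare classes);
    gamma down-weights samples the model already classifies confidently.
    """
    p = min(max(p_true, 1e-12), 1.0)  # clamp to avoid log(0)
    return -alpha * (1.0 - p) ** gamma * math.log(p)


def ensemble_average(prob_lists):
    """Average per-class probabilities from several models (soft voting)."""
    n_models = len(prob_lists)
    n_classes = len(prob_lists[0])
    return [sum(model[c] for model in prob_lists) / n_models for c in range(n_classes)]


# A confident correct prediction is penalized far less than an uncertain one.
print(focal_loss(0.9) < focal_loss(0.1))
print(ensemble_average([[0.2, 0.8], [0.6, 0.4]]))
```

Setting `gamma=0` and `alpha=1` reduces the focal loss to ordinary cross-entropy, which makes the effect of the two extra parameters easy to check.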

Predicting restraining effects in CFS channels: A machine learning approach

  • Seyed Mohammad Mojtabaei;Rasoul Khandan;Iman Hajirasouliha
    • Steel and Composite Structures / v.51 no.4 / pp.441-456 / 2024
  • This paper aims to develop Machine Learning (ML) algorithms to predict the buckling resistance of cold-formed steel (CFS) channels with restrained flanges, widely used in typical CFS sheathed wall panels, and provide practical design tools for engineers. The effects of cross-sectional restraints were first evaluated on the elastic buckling behaviour of CFS channels subjected to pure axial compressive load or bending moment. Feedforward multi-layer Artificial Neural Networks (ANNs) were then trained on different datasets comprising CFS channels with various dimensions and properties, plate thicknesses, and restraining conditions on one or two flanges, while the elastic distortional buckling resistance of the elements was determined according to the Finite Strip Method (FSM). To develop less biased networks and ensure that every observation from the original dataset has the chance of appearing in the training and test set, a K-fold cross-validation technique was implemented. In addition, the hyperparameters of the ANNs were tuned using a grid search technique to provide ANNs with optimum performance. The results demonstrated that the trained ANNs were able to predict the elastic distortional buckling resistance of CFS flange-restrained elements with an average accuracy of 99% in terms of coefficient of determination. The developed models were then used to propose a simple ANN-based design formula for the prediction of the elastic distortional buckling stress of CFS flange-restrained elements. Finally, the proposed formula was further evaluated on a separate set of unseen data to ensure its accuracy for practical applications.
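
The abstract above uses K-fold cross-validation so that every observation appears in both training and test sets across the folds. A minimal sketch of how the fold indices can be generated in plain Python follows. The function name and the choice of contiguous, unshuffled folds are illustrative assumptions. The paper's fold count and any shuffling are not stated in the abstract.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_idx, test_idx) pairs; each sample is in exactly one test fold."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    # Spread the remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_samples))
        yield train_idx, test_idx
        start += size


for train_idx, test_idx in k_fold_indices(10, 3):
    print(len(train_idx), len(test_idx))
```

In practice the data would usually be shuffled before splitting, and a model would be trained and scored once per fold, with the scores averaged at the end.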