• Title/Summary/Keyword: Training Datasets

Enhancing Gene Expression Classification of Support Vector Machines with Generative Adversarial Networks

  • Huynh, Phuoc-Hai;Nguyen, Van Hoa;Do, Thanh-Nghi
    • Journal of information and communication convergence engineering / v.17 no.1 / pp.14-20 / 2019
  • Currently, microarray gene expression data are used to classify cancers effectively, which helps address problems relating to cancer causes and treatment regimens. However, the sample size of gene expression data is often limited because microarray technology is expensive to use in human studies. We propose enhancing the gene expression classification of support vector machines with generative adversarial networks (GAN-SVMs). A GAN that generates new data from the original training datasets was implemented and used in conjunction with nonlinear SVMs that efficiently classify gene expression data. Numerical test results on 20 low-sample-size, very-high-dimensional microarray gene expression datasets from the Kent Ridge Biomedical and Array Expression repositories indicate that the model is more accurate than state-of-the-art classification models.

A Study on the Classification Model of Minhwa Genre Based on Deep Learning (딥러닝 기반 민화 장르 분류 모델 연구)

  • Yoon, Soorim;Lee, Young-Suk
    • Journal of Korea Multimedia Society / v.25 no.10 / pp.1524-1534 / 2022
  • This study proposes a classification model for Minhwa genres based on deep learning object detection. To detect objects unique to Korean tradition in Minhwa, we construct custom datasets by labeling images using object keywords in the Minhwa DB. We train YOLOv5 models on the custom datasets and classify images using the object labels predicted by the trained models. The algorithm consists of two classification steps: 1) by painting technique and 2) by Minhwa genre. By classifying paintings found on the Internet with this algorithm, we expect that correct information about Minhwa can be built and provided to users going forward.

Performance Improvement of Fuzzy C-Means Clustering Algorithm by Optimized Early Stopping for Inhomogeneous Datasets

  • Chae-Rim Han;Sun-Jin Lee;Il-Gu Lee
    • Journal of information and communication convergence engineering / v.21 no.3 / pp.198-207 / 2023
  • Responding to changes in artificial intelligence models and the data environment is crucial for increasing the data-learning accuracy and inference stability of industrial applications. A learning model that is overfitted to specific training data suffers from poor learning performance and reduced flexibility. Therefore, an early stopping technique is used to stop learning at an appropriate time. However, this technique does not consider the homogeneity and independence of the data collected by heterogeneous nodes in a differential network environment, which results in low learning accuracy and degraded system performance. In this study, the generalization performance of neural networks is maximized while the effect of the homogeneity of datasets is minimized, achieving an accuracy of 99.7%. This corresponds to a decrease in delay time by a factor of 2.33 and an improvement in performance by a factor of 2.5 compared with the conventional method.
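
The conventional early stopping that the abstract refers to can be sketched as follows. This is a generic patience-based rule, not the optimized method proposed in the paper; the function name and the `patience` and `min_delta` parameters are assumptions made for illustration.

```python
def early_stopping_epoch(val_losses, patience=3, min_delta=0.0):
    """Return (stop_epoch, best_epoch) for a sequence of validation losses.

    Training stops once the loss has failed to improve by more than
    `min_delta` for `patience` consecutive epochs (hypothetical parameters).
    """
    best_loss = float("inf")
    best_epoch = -1
    waited = 0  # epochs since the last improvement
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss - min_delta:
            best_loss, best_epoch, waited = loss, epoch, 0
        else:
            waited += 1
            if waited >= patience:
                return epoch, best_epoch
    # Ran out of epochs without triggering the stopping rule.
    return len(val_losses) - 1, best_epoch


if __name__ == "__main__":
    losses = [1.0, 0.8, 0.7, 0.72, 0.71, 0.73, 0.6]
    print(early_stopping_epoch(losses, patience=3))
```

A fixed patience treats every data source alike, which is the limitation the abstract points out for data collected by heterogeneous nodes.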

Human Posture Recognition: Methodology and Implementation

  • Htike, Kyaw Kyaw;Khalifa, Othman O.
    • Journal of Electrical Engineering and Technology / v.10 no.4 / pp.1910-1914 / 2015
  • Human posture recognition is an attractive and challenging topic in computer vision due to its promising applications in personal health care, environmental awareness, human-computer interaction, and surveillance systems. Human posture recognition in video sequences consists of two stages: the first is training and evaluation, and the second is deployment. In the first stage, the system is trained and evaluated using datasets of human postures to ‘teach’ the system to classify human postures for any future inputs. When the training and evaluation process is deemed satisfactory as measured by recognition rates, the trained system is deployed to recognize human postures in any input video sequence. Different classifiers were used in training, such as Multilayer Perceptron Feedforward Neural Networks, Self-Organizing Maps, Fuzzy C-Means, and K-Means. Results show that supervised learning classifiers tend to perform better than unsupervised classifiers for human posture recognition.

Classification of Class-Imbalanced Data: Effect of Over-sampling and Under-sampling of Training Data (계급불균형자료의 분류: 훈련표본 구성방법에 따른 효과)

  • 김지현;정종빈
    • The Korean Journal of Applied Statistics / v.17 no.3 / pp.445-457 / 2004
  • Given class-imbalanced data in a two-class classification problem, we often over-sample and/or under-sample the training data to balance it. We investigate the validity of this practice. We also study the effect of such sampling on boosting of classification trees. Experiments on twelve real datasets show that keeping the natural distribution of the training data is the best approach when boosting methods are to be applied to class-imbalanced data.
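
For readers unfamiliar with the two sampling practices the abstract examines, here is a minimal sketch of random over-sampling and under-sampling. The function names and the uniform random resampling are assumptions for illustration; the paper's exact sampling procedure is not given in the abstract.

```python
import random
from collections import Counter


def _group_by_class(samples, labels):
    """Collect samples into a dict keyed by class label."""
    by_class = {}
    for x, y in zip(samples, labels):
        by_class.setdefault(y, []).append(x)
    return by_class


def random_oversample(samples, labels, seed=0):
    """Duplicate randomly chosen items of smaller classes up to the largest class size."""
    rng = random.Random(seed)
    by_class = _group_by_class(samples, labels)
    target = max(len(items) for items in by_class.values())
    out = []
    for y, items in by_class.items():
        extra = [rng.choice(items) for _ in range(target - len(items))]
        out.extend((x, y) for x in items + extra)
    rng.shuffle(out)
    return [x for x, _ in out], [y for _, y in out]


def random_undersample(samples, labels, seed=0):
    """Randomly discard items of larger classes down to the smallest class size."""
    rng = random.Random(seed)
    by_class = _group_by_class(samples, labels)
    target = min(len(items) for items in by_class.values())
    out = []
    for y, items in by_class.items():
        out.extend((x, y) for x in rng.sample(items, target))
    rng.shuffle(out)
    return [x for x, _ in out], [y for _, y in out]


if __name__ == "__main__":
    X = list(range(10))
    y = [1] * 2 + [0] * 8  # two minority items, eight majority items
    print(Counter(random_oversample(X, y)[1]))
    print(Counter(random_undersample(X, y)[1]))
```

Both functions change the class distribution seen by the learner, which is exactly the effect the study reports as unhelpful when boosting is applied.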

Pipeline wall thinning rate prediction model based on machine learning

  • Moon, Seongin;Kim, Kyungmo;Lee, Gyeong-Geun;Yu, Yongkyun;Kim, Dong-Jin
    • Nuclear Engineering and Technology / v.53 no.12 / pp.4060-4066 / 2021
  • Flow-accelerated corrosion (FAC) of carbon steel piping is a significant problem in nuclear power plants. The basic process of FAC is currently understood relatively well; however, the accuracy of prediction models of the wall-thinning rate under an FAC environment is not reliable. Herein, we propose a methodology to construct pipe wall-thinning rate prediction models using artificial neural networks and a convolutional neural network, which is confined to a straight pipe without geometric changes. Furthermore, a methodology to generate training data is proposed to efficiently train the neural network for the development of a machine learning-based FAC prediction model. Consequently, it is concluded that machine learning can be used to construct pipe wall thinning rate prediction models and optimize the number of training datasets for training the machine learning algorithm. The proposed methodology can be applied to efficiently generate a large dataset from an FAC test to develop a wall thinning rate prediction model for a real situation.

Benchmark for Deep Learning based Visual Odometry and Monocular Depth Estimation (딥러닝 기반 영상 주행기록계와 단안 깊이 추정 및 기술을 위한 벤치마크)

  • Choi, Hyukdoo
    • The Journal of Korea Robotics Society / v.14 no.2 / pp.114-121 / 2019
  • This paper presents a new benchmark system for visual odometry (VO) and monocular depth estimation (MDE). As deep learning has become a key technology in computer vision, many researchers are trying to apply deep learning to VO and MDE. Just a couple of years ago, the two tasks were studied independently in a supervised way, but now they are coupled and trained together in an unsupervised way. However, before designing fancy models and losses, we have to customize datasets to use them for training and testing. After training, the model has to be compared with existing models, which is also a huge burden. The benchmark provides a ready-to-use input dataset for VO and MDE research in 'tfrecords' format and an output dataset that includes model checkpoints and inference results of existing models. It also provides various tools for data formatting, training, and evaluation. In the experiments, the existing models were evaluated to verify the performance reported in the corresponding papers, and we found that the evaluation results are inferior to the reported performance.

Channel modeling based on multilayer artificial neural network in metro tunnel environments

  • Jingyuan Qian;Asad Saleem;Guoxin Zheng
    • ETRI Journal / v.45 no.4 / pp.557-569 / 2023
  • Traditional deterministic channel modeling is accurate in prediction, but due to its complexity, improving computational efficiency remains a challenge. As an alternative approach, we investigated a multilayer artificial neural network (ANN) to predict large-scale and small-scale channel characteristics in metro tunnels. Simulated high-precision training datasets were obtained by combining a measurement campaign with a ray tracing (RT) method in a metro tunnel. Performance on the training data was used to determine the number of hidden layers and neurons of the multilayer ANN. The proposed multilayer ANN performed efficiently (10 s for training; 0.19 ms for prediction) and accurately, with better approximation of the RT data than the single-layer ANN. The root mean square errors (RMSE) of path loss (2.82 dB), root mean square delay spread (0.61 ns), azimuth angle spread (3.06°), and elevation angle spread (1.22°) were impressive. These results demonstrate the superior computing efficiency and model complexity of ANNs.
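
The RMSE figures quoted in the abstract follow the standard definition, the square root of the mean squared difference between predicted and reference values. A minimal sketch, with a hypothetical function name and made-up sample values rather than data from the paper:

```python
import math


def rmse(predicted, reference):
    """Root mean square error between two equal-length sequences of numbers."""
    if len(predicted) != len(reference) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(p - r) ** 2 for p, r in zip(predicted, reference)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


if __name__ == "__main__":
    # Illustrative path-loss values in dB (not taken from the paper).
    ann_prediction = [80.0, 85.5, 90.0]
    ray_tracing = [81.0, 84.5, 92.0]
    print(f"RMSE = {rmse(ann_prediction, ray_tracing):.2f} dB")
```

The result carries the unit of the quantity compared, which is why the abstract reports RMSE in dB, ns, and degrees for the different channel characteristics.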

Land Cover Classification Using Sematic Image Segmentation with Deep Learning (딥러닝 기반의 영상분할을 이용한 토지피복분류)

  • Lee, Seonghyeok;Kim, Jinsoo
    • Korean Journal of Remote Sensing / v.35 no.2 / pp.279-288 / 2019
  • We evaluated the land cover classification performance of SegNet, which features semantic segmentation of aerial imagery. We selected four semantic classes (urban, farmland, forest, and water areas) and created 2,000 datasets using aerial images and land cover maps. The datasets were divided at an 8:2 ratio into training (1,600) and validation (400) datasets; we evaluated validation accuracy after tuning the hyperparameters. SegNet performance was optimal at a batch size of five with 100,000 iterations. When 200 test datasets were subjected to semantic segmentation using the trained SegNet model, the accuracies were 87.89% for farmland, 87.18% for forest, 83.66% for water, and 82.67% for urban regions; the overall accuracy was 85.48%. Thus, deep learning-based semantic segmentation can be used to classify land cover.
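
The 8:2 division described in the abstract can be sketched as a shuffled split. The function name, the seed, and the use of integer IDs in place of image and label pairs are assumptions; the abstract does not say how the authors shuffled or partitioned the data.

```python
import random


def split_train_val(items, train_fraction=0.8, seed=0):
    """Shuffle items and split them into (training, validation) lists."""
    rng = random.Random(seed)
    shuffled = list(items)  # copy so the caller's sequence is left untouched
    rng.shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


if __name__ == "__main__":
    dataset_ids = range(2000)  # stand-ins for the 2,000 image/label pairs
    train, val = split_train_val(dataset_ids)
    print(len(train), len(val))
```

With 2,000 items and a training fraction of 0.8, this yields the 1,600 and 400 counts stated in the abstract.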

A Hybrid Multi-Level Feature Selection Framework for prediction of Chronic Disease

  • G.S. Raghavendra;Shanthi Mahesh;M.V.P. Chandrasekhara Rao
    • International Journal of Computer Science & Network Security / v.23 no.12 / pp.101-106 / 2023
  • Chronic illnesses are among the most common serious problems affecting human health. Early diagnosis of chronic diseases can help avoid or mitigate their consequences, potentially decreasing mortality rates. Using machine learning algorithms to identify risk factors is an exciting strategy. The issue with existing feature selection approaches is that each method produces a distinct set of features, which affects model accuracy, and present methods do not perform well on huge multidimensional datasets. We introduce a novel model with a feature selection approach that selects optimal features from big multidimensional datasets to provide reliable predictions of chronic illnesses without sacrificing data uniqueness. [1] To ensure the success of the proposed model, we balanced the classes by applying hybrid balanced class sampling methods to the original dataset, together with data pre-processing and data transformation methods, to provide credible data for training the model. We ran and assessed our model on datasets with binary and multivalued classifications, using multiple datasets (Parkinson, arrhythmia, breast cancer, kidney, diabetes). Suitable features are selected using a hybrid feature model consisting of LassoCV, decision tree, random forest, gradient boosting, AdaBoost, and stochastic gradient descent, followed by voting on the attributes that are common outputs of these methods. The accuracy on the original dataset before applying the framework is recorded and compared with the accuracy on the reduced set of attributes. The results are shown separately to provide comparisons. Based on the result analysis, we conclude that the proposed model produced higher accuracy on multivalued class datasets than on binary class datasets. [1]
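
The voting step mentioned in the abstract can be illustrated with a small sketch that keeps the features chosen by at least a given number of selectors. The function name, the `min_votes` threshold, and the example feature names are hypothetical; the abstract does not specify the voting rule the authors used.

```python
from collections import Counter


def vote_features(selections, min_votes):
    """Return, sorted, the features picked by at least `min_votes` selectors.

    `selections` maps a selector name to the features that selector chose.
    """
    votes = Counter(
        feature
        for chosen in selections.values()
        for feature in set(chosen)  # each selector votes at most once per feature
    )
    return sorted(f for f, count in votes.items() if count >= min_votes)


if __name__ == "__main__":
    # Made-up selector outputs, for illustration only.
    selections = {
        "lasso": ["age", "glucose", "bmi"],
        "decision_tree": ["glucose", "bmi", "insulin"],
        "random_forest": ["glucose", "age", "insulin"],
    }
    print(vote_features(selections, min_votes=2))
    print(vote_features(selections, min_votes=3))
```

Setting `min_votes` to the number of selectors keeps only features common to every method, while a lower threshold gives a majority-style vote.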