• Title/Summary/Keyword: Machine-learning Feature

Application of Dimensional Expansion and Reduction to Earthquake Catalog for Machine Learning Analysis (기계학습 분석을 위한 차원 확장과 차원 축소가 적용된 지진 카탈로그)

  • Jang, Jinsu;So, Byung-Dal
    • The Journal of Engineering Geology / v.32 no.3 / pp.377-388 / 2022
  • Recently, several studies have used machine learning to analyze exponentially growing volumes of seismic data efficiently and accurately. In this study, we expand earthquake information such as occurrence time, hypocentral location, and magnitude into a dataset suitable for machine learning, and then reduce the dimension of the expanded data to its dominant features through principal component analysis. The dimensionally expanded data comprise statistics of the earthquake information from the Global Centroid Moment Tensor catalog, which contains 36,699 seismic events. We preprocess the data using standard and max-min scaling and extract dominant features from the scaled dataset with principal component analysis. The scaling methods significantly reduced the deviation of feature values caused by different units. Among them, the standard scaling method transforms the median of each feature with a smaller deviation than other scaling methods. The six principal components extracted from the non-scaled dataset explain 99% of the original data. The sixteen principal components extracted from the datasets scaled by standardization or max-min scaling reconstruct 98% of the original datasets. These results indicate that more principal components are needed to preserve the original data information when feature values are evenly distributed. We propose a data processing method for efficient and accurate machine learning models that analyze the relationship between seismic data and seismic behavior.
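
The two scaling steps named in this abstract can be illustrated with a minimal sketch. It covers only the preprocessing, not the principal component analysis, and the feature names and values below are hypothetical placeholders rather than data from the Global Centroid Moment Tensor catalog.

```python
from statistics import mean, pstdev


def standard_scale(values):
    """Shift to zero mean and divide by the population standard deviation."""
    mu, sigma = mean(values), pstdev(values)
    # Assumes the column is not constant (sigma > 0).
    return [(v - mu) / sigma for v in values]


def min_max_scale(values):
    """Map the smallest value to 0 and the largest value to 1."""
    lo, hi = min(values), max(values)
    # Assumes the column is not constant (hi > lo).
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical feature columns in very different units (not catalog data).
features = {
    "depth_km": [10.0, 35.0, 120.0, 600.0],
    "magnitude": [5.1, 5.6, 6.3, 7.8],
}

standardized = {name: standard_scale(col) for name, col in features.items()}
min_maxed = {name: min_max_scale(col) for name, col in features.items()}

for name, col in features.items():
    print(
        name,
        "raw std:", round(pstdev(col), 3),
        "standardized std:", round(pstdev(standardized[name]), 3),
        "min-max range:", (min(min_maxed[name]), max(min_maxed[name])),
    )
```

After either transformation the two columns have comparable spread, which is the effect the abstract attributes to scaling features measured in different units.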

Diagnosis of Alzheimer's Disease using Wrapper Feature Selection Method

  • Vyshnavi Ramineni;Goo-Rak Kwon
    • Smart Media Journal / v.12 no.3 / pp.30-37 / 2023
  • Alzheimer's disease (AD) is managed through early diagnosis, since current treatment can only slow the symptoms and research is still ongoing. Accordingly, several machine learning classification models that use T1-weighted images have been proposed to identify AD. In this paper, we consider an improved feature selection approach that reduces complexity by using wrapper techniques and a Restricted Boltzmann Machine (RBM). This work used the subcortical and cortical features of 278 subjects from the ADNI dataset to identify AD from sMRI. The experiment uses multi-class classification over four classes: AD, EMCI, LMCI, and HC. The proposed feature selection consists of forward feature selection, backward feature selection, and combined PCA & RBM. Forward and backward feature selection are iterative methods: forward selection starts with no features, whereas backward selection starts with all features included. PCA is used to reduce the dimensions, and RBM is used to select the best features without interpreting them. We compared the three models with PCA for analysis. The experiment shows that combined PCA & RBM and backward feature selection give the best accuracy with the RF classification model, 88.65% and 88.56%, respectively.
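
The forward and backward wrapper procedures described in this abstract can be sketched as two greedy loops. This is an illustrative sketch, not the authors' implementation: the `score` callback stands in for whatever evaluation a wrapper would use (for example cross-validated classifier accuracy), and the feature names and the toy scoring function are hypothetical.

```python
from typing import Callable, FrozenSet, List, Sequence

Score = Callable[[FrozenSet[str]], float]


def forward_selection(features: Sequence[str], score: Score) -> List[str]:
    """Start with no features; greedily add the feature that raises the score most."""
    selected: List[str] = []
    best = score(frozenset())
    while True:
        candidates = [f for f in features if f not in selected]
        scored = [(score(frozenset(selected + [f])), f) for f in candidates]
        if not scored:
            break  # every feature is already selected
        top_score, top_feature = max(scored)
        if top_score <= best:
            break  # no remaining feature improves the score
        selected.append(top_feature)
        best = top_score
    return selected


def backward_selection(features: Sequence[str], score: Score) -> List[str]:
    """Start with all features; greedily drop the feature whose removal scores best."""
    selected = list(features)
    best = score(frozenset(selected))
    while len(selected) > 1:
        scored = [
            (score(frozenset(g for g in selected if g != f)), f) for f in selected
        ]
        top_score, drop = max(scored)
        if top_score < best:
            break  # every possible removal lowers the score
        selected.remove(drop)
        best = top_score
    return selected


# Hypothetical stand-in for a cross-validated accuracy: two informative
# features add to the score, and every other feature costs a small penalty.
USEFUL = {"hippocampus_volume": 0.5, "cortical_thickness": 0.3}


def toy_score(subset: FrozenSet[str]) -> float:
    gain = sum(g for name, g in USEFUL.items() if name in subset)
    noise = sum(1 for name in subset if name not in USEFUL)
    return gain - 0.05 * noise


all_features = ["hippocampus_volume", "cortical_thickness", "noise_a", "noise_b"]
print(forward_selection(all_features, toy_score))
print(backward_selection(all_features, toy_score))
```

Both loops call the scoring function once per candidate subset at each step, so in practice the cost is dominated by how expensive it is to train and evaluate the wrapped model.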

Machine Learning based Traffic Light Detection and Recognition Algorithm using Shape Information (기계학습 기반의 신호등 검출과 형태적 정보를 이용한 인식 알고리즘)

  • Kim, Jung-Hwan;Kim, Sun-Kyu;Lee, Tae-Min;Lim, Yong-Jin;Lim, Joonhong
    • Journal of IKEEE / v.22 no.1 / pp.46-52 / 2018
  • The problem of traffic light detection and recognition has recently become one of the most important topics in research on autonomous driving. Most algorithms rely on color to detect and recognize traffic light signals. These methods have the disadvantage that the recognition rate drops with changes in the color of the traffic light and with the angle, distance, and surrounding illumination of the image. In this paper, we propose a machine learning based detection and recognition algorithm that uses shape information to solve these problems. Unlike existing algorithms, the proposed algorithm detects and recognizes traffic signals based on the morphological characteristics of the traffic lights, which makes it robust against the influence of the surrounding environment. Experimental results show that the recognition rate of the signal is higher than those of existing color-based algorithms.

Evolutionary Computing Driven Extreme Learning Machine for Objected Oriented Software Aging Prediction

  • Ahamad, Shahanawaj
    • International Journal of Computer Science & Network Security / v.22 no.2 / pp.232-240 / 2022
  • To fulfill user expectations, the rapid evolution of software techniques and approaches has necessitated reliable and flawless software operations. Aging prediction for software under operation is becoming a basic and unavoidable requirement for ensuring system availability, reliability, and operations. In this paper, an improved evolutionary computing-driven extreme learning scheme (ECD-ELM) is suggested for object-oriented software aging prediction. To perform aging prediction, we employed a variety of metrics, including program size, McCabe complexity metrics, Halstead metrics, runtime failure event metrics, and some unique aging-related metrics (ARM). In our suggested paradigm, OOP software metrics are extracted after pre-processing, which includes outlier detection and normalization. This technique improved the proposed system's ability to deal with instances with unbalanced biases and metrics. Further, different dimensionality reduction and feature selection algorithms such as principal component analysis (PCA), linear discriminant analysis (LDA), and T-Test analysis have been applied. We suggest a single hidden layer multi-feed forward neural network (SL-MFNN) based ELM, in which an adaptive genetic algorithm (AGA) estimates the weight and bias parameters for ELM learning. Unlike traditional neural network models, the GA-based ELM with LDA feature selection outperformed other aging prediction approaches in terms of prediction accuracy, precision, recall, and F-measure. The results affirm that the combination of outlier detection, normalization of imbalanced metrics, LDA-based feature selection, and GA-based ELM can be a reliable solution for object-oriented software aging prediction.

Prediction of Transition Temperature and Magnetocaloric Effects in Bulk Metallic Glasses with Ensemble Models (앙상블 기계학습 모델을 이용한 비정질 소재의 자기냉각 효과 및 전이온도 예측)

  • Chunghee Nam
    • Korean Journal of Materials Research / v.34 no.7 / pp.363-369 / 2024
  • In this study, the magnetocaloric effect and transition temperature of bulk metallic glass, an amorphous material, were predicted through machine learning based on composition features. From the Python module 'Matminer', 174 compositional features were obtained, and prediction performance was compared while reducing the number of composition features to prevent overfitting. After optimization using RandomForest, an ensemble model, changes in prediction performance were analyzed according to the number of compositional features. The R² score was used as the performance metric for the regression prediction, and the best prediction performance was found using only 90 features when predicting transition temperature and 20 features when predicting magnetocaloric effects. The most important feature when predicting magnetocaloric effects was the 'Fe' compositional ratio. The feature importance method provided by 'scikit-learn' was applied to sort the compositional features. The feature importance method was found to be appropriate by comparing the prediction performance of the Fe-containing dataset with that of the full dataset.
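
The regression metric and the importance-based feature reduction mentioned in this abstract can be sketched in a few lines. This is a minimal sketch in plain Python rather than the 'scikit-learn' calls the study used, and the importance values and sample numbers are hypothetical.

```python
def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - (residual sum of squares / total sum of squares)."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    # Assumes y_true is not constant (ss_tot > 0).
    return 1.0 - ss_res / ss_tot


def top_k_features(importances, k):
    """Return the names of the k features with the largest importance values."""
    ranked = sorted(importances.items(), key=lambda item: item[1], reverse=True)
    return [name for name, _ in ranked[:k]]


# Hypothetical importance values, for illustration only.
importances = {"Fe": 0.42, "Gd": 0.18, "Co": 0.11, "B": 0.04}
print(top_k_features(importances, 2))

# Hypothetical targets and predictions.
y_true = [1.0, 2.0, 3.0, 4.0]
print(r2_score(y_true, [1.1, 1.9, 3.2, 3.8]))
```

Repeating the fit with the top `k` features for several values of `k` and comparing the resulting R² scores gives the kind of performance-versus-feature-count curve the abstract describes.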

Wild Image Object Detection using a Pretrained Convolutional Neural Network

  • Park, Sejin;Moon, Young Shik
    • IEIE Transactions on Smart Processing and Computing / v.3 no.6 / pp.366-371 / 2014
  • This paper reports a machine learning approach for image object detection. Object detection and localization in a wild image, such as one from the STL-10 image dataset, is very difficult to implement using traditional computer vision methods. A convolutional neural network is a good approach for such wild image object detection. This paper presents an object detection application using a convolutional neural network with a pretrained feature vector. This is a very simple and well-organized hierarchical object abstraction model.

A Study on Malware Identification System Using Static Analysis Based Machine Learning Technique (정적 분석 기반 기계학습 기법을 활용한 악성코드 식별 시스템 연구)

  • Kim, Su-jeong;Ha, Ji-hee;Oh, Soo-hyun;Lee, Tae-jin
    • Journal of the Korea Institute of Information Security & Cryptology / v.29 no.4 / pp.775-784 / 2019
  • Malware infringement attacks are continuously increasing in various environments such as mobile, IoT, Windows, and Mac due to the emergence of new and variant malware, and signature-based countermeasures have limitations in detecting malware. In addition, analytical performance is deteriorating due to obfuscation, packing, and anti-VM techniques. In this paper, we propose a system that detects malware with machine learning by applying a similarity hashing-based pattern detection technique and static analysis after classifying files according to packing. This enables more efficient detection because it combines pattern-based detection, which detects well-known malware, with machine learning-based detection, which is advantageous for detecting new and variant malware. The system achieved a detection accuracy of 95.79% or more on the benign and malware sample files provided by the AI-based malware detection track of the Information Security R&D Data Challenge 2018 competition. In the future, it is expected that detection performance can be improved further by building a system that applies a feature vector and a detection method suited to the characteristics of packed files.

A study of methodology for identification models of cardiovascular diseases based on data mining (데이터마이닝을 이용한 심혈관질환 판별 모델 방법론 연구)

  • Lee, Bum Ju
    • The Journal of the Convergence on Culture Technology / v.8 no.4 / pp.339-345 / 2022
  • Cardiovascular disease is one of the leading causes of death in the world. The objectives of this study were to build various models using sociodemographic variables based on three variable selection methods and seven machine learning algorithms for the identification of hypertension and dyslipidemia, and to evaluate the predictive power of the models. In experiments based on the full variable set and on the correlation-based feature subset selection method, the performance of models using naive Bayes was better than that of models using other machine learning algorithms for both diseases. With the wrapper-based feature subset selection method, the performance of models using logistic regression was higher than that of models using other algorithms. Our findings may provide basic data for the public health and machine learning fields.

Classifying Social Media Users' Stance: Exploring Diverse Feature Sets Using Machine Learning Algorithms

  • Kashif Ayyub;Muhammad Wasif Nisar;Ehsan Ullah Munir;Muhammad Ramzan
    • International Journal of Computer Science & Network Security / v.24 no.2 / pp.79-88 / 2024
  • The use of social media has become part of our daily activities. Social web channels provide content generation facilities to their users, who can share their views, opinions, and experiences on certain topics. Researchers are using social media content in various research areas. Sentiment analysis, one of the most active research areas of the last decade, is the process of extracting the reviews, opinions, and sentiments of people. Sentiment analysis is applied in diverse sub-areas such as subjectivity analysis, polarity detection, and emotion detection. Stance classification has emerged as a new and interesting research area, as it aims to determine whether the content writer is in favor of, against, or neutral towards the target topic or issue. Stance classification is significant because it has many research applications, such as rumor stance classification, stance classification towards public forums, claim stance classification, neural attention stance classification, online debate stance classification, and dialogic properties stance classification. This research study explores different feature sets, such as lexical, sentiment-specific, and dialog-based features, which have been extracted from the standard datasets in the relevant area. Supervised learning approaches have been applied, including generative algorithms such as Naïve Bayes and discriminative machine learning algorithms such as Support Vector Machine, Decision Tree, and k-Nearest Neighbor, followed by ensemble-based algorithms such as Random Forest and AdaBoost. The empirical results have been evaluated using the standard performance measures of Accuracy, Precision, Recall, and F-measure.
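
The four performance measures named at the end of this abstract can be written out directly. This is a minimal sketch that computes precision, recall, and F-measure for one class at a time; the stance labels and predictions are hypothetical, and the paper may average the per-class values differently.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def precision_recall_f1(y_true, y_pred, label):
    """Precision, recall, and F1 for one class, treating it as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == label and p == label for t, p in pairs)
    fp = sum(t != label and p == label for t, p in pairs)
    fn = sum(t == label and p != label for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0.0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


# Hypothetical stance labels and classifier output.
y_true = ["favor", "favor", "against", "neutral", "against", "favor"]
y_pred = ["favor", "against", "against", "neutral", "favor", "favor"]

print("accuracy:", accuracy(y_true, y_pred))
for stance in ("favor", "against", "neutral"):
    print(stance, precision_recall_f1(y_true, y_pred, stance))
```

Reporting the per-class values alongside accuracy shows whether a classifier does well on every stance or only on the most frequent one.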

Design of Fuzzy k-Nearest Neighbors Classifiers based on Feature Extraction by using Stacked Autoencoder (Stacked Autoencoder를 이용한 특징 추출 기반 Fuzzy k-Nearest Neighbors 패턴 분류기 설계)

  • Rho, Suck-Bum;Oh, Sung-Kwun
    • The Transactions of The Korean Institute of Electrical Engineers / v.64 no.1 / pp.113-120 / 2015
  • In this paper, we propose a feature extraction method using stacked autoencoders that consist of restricted Boltzmann machines. Stacked autoencoders are a kind of deep network. Restricted Boltzmann machines (RBMs) are probabilistic graphical models that can be interpreted as stochastic neural networks. Feature extraction is a key issue in pattern classification problems. We use the stacked autoencoder networks to extract new features that improve classification performance. After feature extraction, the fuzzy k-nearest neighbors algorithm is used as the classifier for the newly extracted data set. To evaluate the classification ability of the proposed pattern classifier, we conduct experiments with several machine learning data sets.
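
The classification step can be illustrated with a minimal sketch of fuzzy k-nearest neighbors. It assumes the common inverse-distance weighting with fuzzifier `m` and crisp training labels, which may differ from the paper's exact design, and it skips the stacked autoencoder entirely: the two-dimensional points below are hypothetical stand-ins for extracted features.

```python
import math
from collections import defaultdict


def fuzzy_knn(train, query, k=3, m=2.0):
    """Return a dict of class memberships for `query` that sums to 1.

    `train` is a list of (feature_vector, label) pairs with crisp labels.
    Each of the k nearest neighbors votes with weight 1 / d**(2 / (m - 1)),
    so closer neighbors contribute more to their class membership.
    """
    neighbors = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    weights = defaultdict(float)
    total = 0.0
    for x, label in neighbors:
        d = math.dist(x, query)
        w = 1.0 / (d ** (2.0 / (m - 1.0)) + 1e-12)  # epsilon guards against d == 0
        weights[label] += w
        total += w
    return {label: w / total for label, w in weights.items()}


# Hypothetical two-dimensional features standing in for autoencoder outputs.
train = [
    ((0.0, 0.0), "a"), ((0.0, 1.0), "a"), ((1.0, 0.0), "a"),
    ((5.0, 5.0), "b"), ((5.0, 6.0), "b"), ((6.0, 5.0), "b"),
]

memberships = fuzzy_knn(train, (2.0, 2.0), k=4)
print(memberships)
print("predicted class:", max(memberships, key=memberships.get))
```

Unlike plain k-nearest neighbors, the result is a graded membership per class, so a query that lies between clusters is visible as such instead of being forced into a single hard label.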