• Title/Summary/Keyword: Support Vector Model

Vibration Data Denoising and Performance Comparison Using Denoising Auto Encoder Method

  • Jang, Jun-gyo;Noh, Chun-myoung;Kim, Sung-soo;Lee, Soon-sup;Lee, Jae-chul
    • Journal of the Korean Society of Marine Environment & Safety, v.27 no.7, pp.1088-1097, 2021
  • Vibration data from mechanical equipment inevitably contain noise, and this noise adversely affects the maintenance of the equipment. Accordingly, the performance of a learning model depends on how effectively noise is removed from the data. In this study, noise was removed using the Denoising Auto Encoder (DAE) technique, which does not require a feature extraction step when preprocessing time-series data. Its performance was compared with that of the Wavelet Transform, which is widely used in machine signal processing. The comparison was based on the failure detection rate; for a more accurate comparison, the F1 score, a classification performance metric, was also calculated. Failure data were detected using the One-Class SVM technique. The comparison revealed that the DAE technique outperformed the Wavelet Transform technique in terms of failure diagnosis and error rate.
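The abstract scores failure detection with the F1 score. As a minimal sketch (not the authors' code), the function below computes F1 from true and predicted labels. It assumes scikit-learn's `OneClassSVM` convention, where `predict` returns -1 for outliers (treated here as failures) and 1 for inliers, so the failure class is passed as `positive=-1`.

```python
from typing import Sequence


def f1_score(y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1) -> float:
    """Harmonic mean of precision and recall for the `positive` (failure) class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        # No correct detections: precision or recall is zero (or undefined).
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Assumed One-Class SVM style labels: -1 = failure (outlier), 1 = normal (inlier).
labels_true = [1, 1, -1, -1, 1]
labels_pred = [1, -1, -1, 1, 1]
print(f1_score(labels_true, labels_pred, positive=-1))
```

F1 is used instead of plain accuracy because failures are usually rare, and a detector that never flags anything can still score high accuracy.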

A semi-supervised interpretable machine learning framework for sensor fault detection

  • Martakis, Panagiotis;Movsessian, Artur;Reuland, Yves;Pai, Sai G.S.;Quqa, Said;Cava, David Garcia;Tcherniak, Dmitri;Chatzi, Eleni
    • Smart Structures and Systems, v.29 no.1, pp.251-266, 2022
  • Structural Health Monitoring (SHM) of critical infrastructure is a major pillar of maintenance management, safeguarding public safety and economic sustainability. Although SHM is usually associated with data-driven metrics and thresholds, expert judgement is essential, especially in cases where erroneous predictions could cause casualties or substantial economic loss. Because visual inspections are time-consuming and potentially subjective, artificial-intelligence tools may be leveraged to minimize the inspection effort and provide objective outcomes. In this context, timely detection of sensor malfunctions is crucial for preventing inaccurate assessments and false alarms. The present work introduces a sensor-fault detection and interpretation framework based on the well-established support-vector machine scheme for anomaly detection, combined with a coalitional game-theory approach. The proposed framework is applied to two datasets provided as part of the 1st International Project Competition for Structural Health Monitoring (IPC-SHM 2020), comprising acceleration and cable-load measurements from two real cable-stayed bridges. The results demonstrate good predictive performance and highlight the potential for seamless adaptation of the algorithm to intrinsically different data domains. For the first time, the term "decision trajectories", originating from the field of cognitive sciences, is introduced and applied in the context of SHM. This provides an intuitive and comprehensive illustration of the impact of individual features, along with an elaboration on the feature dependencies that drive individual model predictions. Overall, the proposed framework provides an easy-to-train, application-agnostic and interpretable anomaly detector, which can be integrated into the preprocessing stage of various SHM and condition-monitoring applications, offering a first screening of sensor health prior to further analysis.
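The coalitional game-theory component refers to Shapley-value attribution. The sketch below computes exact Shapley values by enumerating every coalition; the players stand in for input features and `value` is a hypothetical coalition function. How the paper maps an SVM anomaly score onto a coalition value is not stated in the abstract, so this is only an illustration of the attribution rule, and exact enumeration is practical only for a handful of features.

```python
from itertools import combinations
from math import factorial
from typing import Callable, Dict, FrozenSet, List


def shapley_values(players: List[str],
                   value: Callable[[FrozenSet[str]], float]) -> Dict[str, float]:
    """Exact Shapley value of each player for the coalition function `value`."""
    n = len(players)
    phi: Dict[str, float] = {}
    for i in players:
        others = [p for p in players if p != i]
        total = 0.0
        for size in range(len(others) + 1):
            # Weight |S|! (n - |S| - 1)! / n! for each coalition S that excludes i.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for coalition in combinations(others, size):
                s = frozenset(coalition)
                # Marginal contribution of i when it joins coalition S.
                total += weight * (value(s | {i}) - value(s))
        phi[i] = total
    return phi


# Hypothetical coalition function: an anomaly score that rises only when
# features "a" and "b" are both present (a pure interaction effect).
def toy_value(s: FrozenSet[str]) -> float:
    return 1.0 if {"a", "b"} <= s else 0.0


print(shapley_values(["a", "b", "c"], toy_value))
```

The values always sum to `value(all players) - value(empty set)`, which is what makes them usable as a per-prediction breakdown of an anomaly score.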

Damaged cable detection with statistical analysis, clustering, and deep learning models

  • Son, Hyesook;Yoon, Chanyoung;Kim, Yejin;Jang, Yun;Tran, Linh Viet;Kim, Seung-Eock;Kim, Dong Joo;Park, Jongwoong
    • Smart Structures and Systems, v.29 no.1, pp.17-28, 2022
  • The cables of cable-stayed bridges are gradually affected by weather conditions, vehicle loads, and material corrosion. The stay cable is a critical load-carrying member that strongly affects the operational stability of a cable-stayed bridge, and damaged cables can lead to bridge collapse through loss of tension capacity. It is therefore necessary to develop structural health monitoring (SHM) techniques that accurately identify damaged cables. In this work, a combined identification method built from three efficient techniques (statistical analysis, clustering, and neural network models) is proposed to detect damaged cables in a cable-stayed bridge. The dataset measured on the bridge was first preprocessed to remove outlier channels. The theory and application of each technique for damage detection are then introduced. In general, the statistical approach extracts parameters representing damage within the time series, the clustering approach identifies outliers in the data signals as damaged members, and the deep learning approach exploits nonlinear dependencies in the SHM data to train its model. The performance of each approach in classifying damaged cables was assessed, and the combined identification method was obtained using a voting ensemble. Finally, the combined method was compared with an existing outlier detection algorithm, the support vector machine (SVM). The results demonstrate that the proposed method is robust and achieves higher accuracy in detecting damaged cables in the cable-stayed bridge.
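The combined method is described as a voting ensemble over the three detectors. A minimal hard-voting sketch follows; the detector names and the 1 = damaged / 0 = healthy label encoding are assumptions for illustration, and the abstract does not say whether the authors used hard or weighted voting.

```python
from collections import Counter
from typing import Dict, List


def majority_vote(predictions: Dict[str, List[int]]) -> List[int]:
    """Combine per-cable labels (1 = damaged, 0 = healthy) from several detectors."""
    detectors = list(predictions.values())
    n_items = len(detectors[0])
    if any(len(d) != n_items for d in detectors):
        raise ValueError("all detectors must label the same cables")
    combined = []
    for idx in range(n_items):
        votes = Counter(d[idx] for d in detectors)
        # Hard voting: the most frequent label wins. With an odd number of
        # binary voters (three here) a tie cannot occur.
        combined.append(votes.most_common(1)[0][0])
    return combined


# Hypothetical outputs of the statistical, clustering and deep learning detectors.
outputs = {
    "statistical": [1, 0, 0, 1],
    "clustering": [1, 1, 0, 0],
    "deep_learning": [0, 1, 0, 1],
}
print(majority_vote(outputs))
```

Voting lets a cable be flagged only when at least two of the three independent views agree, which damps the false alarms of any single method.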

Feature selection for text data via topic modeling

  • Woosol, Jang;Ye Eun, Kim;Won, Son
    • The Korean Journal of Applied Statistics, v.35 no.6, pp.739-754, 2022
  • Text data usually consist of many variables, some of which are closely correlated. Such multicollinearity often results in inefficient or inaccurate statistical analysis. In supervised learning, features can be selected by examining the relationship between the target variable and the explanatory variables. In unsupervised learning, however, no target variable exists, so such a feature selection procedure cannot be used. In this study, we propose a word selection procedure that employs topic models to find latent topics. We substitute topics for the target variable and select terms that show high relevance to each topic. Applying the procedure to real data, we found that the proposed word selection procedure yields clear topic interpretations by removing high-frequency words that are prevalent across many topics. In addition, we observed that when the selected variables were used with classifiers such as naïve Bayes classifiers and support vector machines, the proposed feature selection procedure gave results comparable to those obtained using class label information.
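The abstract does not name its relevance measure. As an assumed stand-in, the sketch below uses the topic-term relevance of Sievert and Shirley (2014, LDAvis), which for a weight λ < 1 penalizes words that are frequent in every topic, matching the behaviour described above. The topic-word probabilities are hypothetical, must be strictly positive, and would normally come from a fitted LDA model.

```python
import math
from typing import Dict, List, Set


def relevance(phi_kw: float, p_w: float, lam: float) -> float:
    """lam * log(phi_kw) + (1 - lam) * log(phi_kw / p_w)."""
    return lam * math.log(phi_kw) + (1 - lam) * math.log(phi_kw / p_w)


def select_words(topic_word: List[Dict[str, float]], word_prob: Dict[str, float],
                 top_n: int = 2, lam: float = 0.3) -> Set[str]:
    """Union of the top_n most relevant words of each topic."""
    selected: Set[str] = set()
    for phi in topic_word:  # phi maps word -> P(word | topic)
        ranked = sorted(phi, key=lambda w: relevance(phi[w], word_prob[w], lam),
                        reverse=True)
        selected.update(ranked[:top_n])
    return selected


# Hypothetical two-topic model; "the" is frequent in both topics.
topics = [
    {"the": 0.4, "goal": 0.3, "match": 0.25, "vote": 0.025, "party": 0.025},
    {"the": 0.4, "goal": 0.025, "match": 0.025, "vote": 0.3, "party": 0.25},
]
# Marginal word probabilities, assuming equally weighted topics.
marginal = {w: sum(t[w] for t in topics) / len(topics) for w in topics[0]}
print(sorted(select_words(topics, marginal)))
```

With λ = 1 the ranking reduces to P(word | topic) and "the" would be kept; lowering λ shifts weight toward topic-specific "lift".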

Determination of the stage and grade of periodontitis according to the current classification of periodontal and peri-implant diseases and conditions (2018) using machine learning algorithms

  • Kubra Ertas;Ihsan Pence;Melike Siseci Cesmeli;Zuhal Yetkin Ay
    • Journal of Periodontal and Implant Science, v.53 no.1, pp.38-53, 2023
  • Purpose: The current Classification of Periodontal and Peri-Implant Diseases and Conditions, published and disseminated in 2018, is difficult to apply, and its criteria lead to diagnostic conflicts, especially among inexperienced clinicians. The aim of this study was to design a decision system based on machine learning algorithms that uses clinical measurements and radiographic images to determine and facilitate the staging and grading of periodontitis. Methods: In the first part of this study, machine learning models were created in the Python programming language using clinical data from 144 individuals who presented to the Department of Periodontology, Faculty of Dentistry, Süleyman Demirel University. In the second part, panoramic radiographic images were processed and classified with deep learning algorithms. Results: Using clinical data, the staging accuracy of the tree algorithm reached 97.2%, while the random forest and k-nearest neighbor algorithms reached 98.6%. For panoramic radiographic images, the best staging accuracy was achieved by a hybrid network model combining the proposed ResNet50 architecture with the support vector machine algorithm; after the images were preprocessed, this model reached a staging classification accuracy of 88.2%. However, radiographic images generally yielded low accuracy when modeling the grading of periodontitis. Conclusions: The machine learning-based decision system presented herein can facilitate periodontal diagnosis despite its current limitations. Further studies are planned to optimize the algorithm and improve the results.
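One of the best-performing clinical-data models was k-nearest neighbor. The sketch below shows the basic k-NN rule with Euclidean distance plus the accuracy metric the abstract reports. The feature vectors and stage labels are made-up toy values, not data from the study, and the paper's feature set, scaling and choice of k are not given in the abstract.

```python
import math
from collections import Counter
from typing import List, Sequence, Tuple


def knn_predict(train: List[Tuple[Sequence[float], str]], x: Sequence[float],
                k: int = 3) -> str:
    """Majority label among the k training points nearest to x (Euclidean)."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], x))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def accuracy(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical two-feature training set with two stage labels.
toy_train = [((1.0, 1.0), "I"), ((1.0, 2.0), "I"), ((2.0, 1.0), "I"),
             ((8.0, 8.0), "III"), ((8.0, 9.0), "III"), ((9.0, 8.0), "III")]
queries = [(1.5, 1.5), (8.5, 8.5)]
preds = [knn_predict(toy_train, q) for q in queries]
print(preds, accuracy(["I", "III"], preds))
```

In practice clinical features on different scales (millimetres, percentages, counts) should be standardized first, because k-NN distances are scale-sensitive.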

Sentiment Analysis for COVID-19 Vaccine Popularity

  • Muhammad Saeed;Naeem Ahmed;Abid Mehmood;Muhammad Aftab;Rashid Amin;Shahid Kamal
    • KSII Transactions on Internet and Information Systems (TIIS), v.17 no.5, pp.1377-1393, 2023
  • Social media is used for various purposes, including entertainment, communication, information search, and voicing users' thoughts and concerns about a service, product, or issue. Social media data can be mined for information and insights. The World Health Organization declared COVID-19 a global pandemic in 2020. People from every walk of life, as well as the entire health system, have been severely affected by this pandemic. Even now, almost three years after the pandemic was declared, the fear caused by the COVID-19 virus, which has led to higher levels of depression, stress, and anxiety, has not been fully overcome. The pandemic has also triggered numerous discussions on social media platforms covering its various aspects. One of these aspects concerns the vaccines developed by different countries, their features, and the advantages and disadvantages associated with each vaccine. Social media users often share their thoughts about vaccinations and vaccines. These data can be used to determine the popularity of vaccines, which can give producers insight for future decisions about their products. In this article, we used Twitter data to detect vaccine popularity. We gathered data by scraping tweets about various vaccines from different countries. Various machine learning and deep learning models, i.e., naive Bayes, decision tree, support vector machines, k-nearest neighbor, and a deep neural network, were then used for sentiment analysis to determine the popularity of each vaccine. The experimental results show that the proposed deep neural network model outperforms the other models, achieving 97.87% accuracy.
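Naive Bayes is one of the baseline sentiment classifiers listed. A minimal multinomial naive Bayes with add-one (Laplace) smoothing is sketched below; the whitespace tokenization, the "pos"/"neg" labels and the tiny training set are hypothetical, and the paper's preprocessing and features are not described in the abstract.

```python
import math
from collections import Counter, defaultdict
from typing import DefaultDict, List, Set, Tuple

Model = Tuple[Counter, DefaultDict[str, Counter], Set[str]]


def train_nb(docs: List[Tuple[List[str], str]]) -> Model:
    """Count documents per class and word occurrences per class."""
    class_docs: Counter = Counter(label for _, label in docs)
    word_counts: DefaultDict[str, Counter] = defaultdict(Counter)
    vocab: Set[str] = set()
    for tokens, label in docs:
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return class_docs, word_counts, vocab


def predict_nb(model: Model, tokens: List[str]) -> str:
    """Return the class with the highest log posterior score."""
    class_docs, word_counts, vocab = model
    n_docs = sum(class_docs.values())
    best_label, best_score = "", -math.inf
    for label, n in class_docs.items():
        total = sum(word_counts[label].values())
        score = math.log(n / n_docs)  # log prior P(class)
        for w in tokens:
            if w not in vocab:
                continue  # ignore words never seen in training
            # Laplace-smoothed log likelihood log P(word | class)
            score += math.log((word_counts[label][w] + 1) / (total + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical labelled tweets, already tokenized.
tweets = [("great safe vaccine".split(), "pos"), ("effective great".split(), "pos"),
          ("side effects bad".split(), "neg"), ("bad fever".split(), "neg")]
nb = train_nb(tweets)
print(predict_nb(nb, "great vaccine".split()), predict_nb(nb, "bad fever".split()))
```

Scores are summed in log space to avoid numerical underflow when a tweet contains many words.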

A Method for Generating Malware Countermeasure Samples Based on Pixel Attention Mechanism

  • Xiangyu Ma;Yuntao Zhao;Yongxin Feng;Yutao Hu
    • KSII Transactions on Internet and Information Systems (TIIS), v.18 no.2, pp.456-477, 2024
  • With the rapid development of information technology, the Internet faces serious security problems. Studies have shown that malware has become a primary means of attacking the Internet, so adversarial samples have become a vital entry point for studying malware. Studying adversarial samples offers insight into the behavior and characteristics of malware, allows the performance of existing detectors to be evaluated against deceptive samples, and helps to discover vulnerabilities and improve detection methods. However, existing adversarial sample generation methods still fall short in escape effectiveness and mobility. For instance, researchers have attempted to incorporate perturbation methods such as the Fast Gradient Sign Method (FGSM) and Projected Gradient Descent (PGD) into adversarial samples to obfuscate detectors, but these methods are effective only in specific environments and yield limited evasion. To solve these problems, this paper proposes PixGAN, a malware adversarial sample generation method based on the pixel attention mechanism, which aims to improve the escape effect and mobility of adversarial samples. The method transforms malware into grey-scale images and introduces the pixel attention mechanism into the Deep Convolutional Generative Adversarial Network (DCGAN) model to weight the critical pixels of the grey-scale image, which improves the modeling ability of the generator and discriminator and thus enhances the escape effect and mobility of the adversarial samples. The escape rate (ASR) is used to evaluate the quality of the adversarial samples. The experimental results show that the adversarial samples generated by PixGAN achieve escape rates of 97%, 94%, 35%, 39%, and 43% against Random Forest (RF), Support Vector Machine (SVM), Convolutional Neural Network (CNN), combined Convolutional and Recurrent Neural Network (CNN_RNN), and combined Convolutional Neural Network and Long Short-Term Memory (CNN_LSTM) detectors, respectively.
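Two steps in the abstract are concrete enough to sketch: turning a malware binary into a grey-scale image, and computing the escape rate. The code below uses the common fixed-width byte-to-pixel conversion (one byte becomes one 0-255 pixel); the width of 256 is an assumption, as PixGAN's image size is not given in the abstract, and the escape rate is taken to be the fraction of adversarial samples the detector fails to flag. The attention-based DCGAN itself is not reproduced.

```python
from typing import List


def bytes_to_grayscale(data: bytes, width: int = 256) -> List[List[int]]:
    """Reshape raw bytes into rows of `width` pixels; the last row is zero-padded."""
    rows = []
    for start in range(0, len(data), width):
        row = list(data[start:start + width])  # each byte is already 0-255
        row.extend([0] * (width - len(row)))   # pad a final partial row
        rows.append(row)
    return rows


def escape_rate(detector_flags: List[bool]) -> float:
    """Fraction of adversarial samples NOT flagged as malware by the detector."""
    return sum(1 for flagged in detector_flags if not flagged) / len(detector_flags)


# Hypothetical 10-byte "binary" rendered at width 4, and four detector verdicts.
print(bytes_to_grayscale(bytes(range(10)), width=4))
print(escape_rate([True, False, False, False]))
```

Fixing the width keeps spatially adjacent pixels meaningful (consecutive bytes), which is what lets convolutional models and a pixel attention mechanism pick up local byte patterns.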

Predictive modeling algorithms for liver metastasis in colorectal cancer: A systematic review of the current literature

  • Isaac Seow-En;Ye Xin Koh;Yun Zhao;Boon Hwee Ang;Ivan En-Howe Tan;Aik Yong Chok;Emile John Kwong Wei Tan;Marianne Kit Har Au
    • Annals of Hepato-Biliary-Pancreatic Surgery, v.28 no.1, pp.14-24, 2024
  • This study aims to assess the quality and performance of predictive models for colorectal cancer liver metastasis (CRCLM). A systematic review was performed to identify relevant studies from various databases. Studies that described or validated predictive models for CRCLM were included. The methodological quality of the predictive models was assessed. Model performance was evaluated by the reported area under the receiver operating characteristic curve (AUC). Of the 117 articles screened, seven studies comprising 14 predictive models were included. The distribution of included predictive models was as follows: radiomics (n = 3), logistic regression (n = 3), Cox regression (n = 2), nomogram (n = 3), support vector machine (SVM, n = 2), random forest (n = 2), and convolutional neural network (CNN, n = 2). Age, sex, carcinoembryonic antigen, and tumor staging (T and N stage) were the most frequently used clinicopathological predictors for CRCLM. The mean AUCs ranged from 0.697 to 0.870, with 86% of the models demonstrating clear discriminative ability (AUC > 0.70). A hybrid approach combining clinical and radiomic features with SVM provided the best performance, achieving an AUC of 0.870. The overall risk of bias was high in 71% of the included studies. This review highlights the potential of predictive modeling to accurately predict CRCLM. Integrating clinicopathological and radiomic features with machine learning algorithms yields superior predictive capability.
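The review compares models by AUC and uses AUC > 0.70 as the threshold for clear discrimination. As an illustrative sketch, AUC can be computed directly as the probability that a randomly chosen positive case (metastasis) receives a higher risk score than a randomly chosen negative case, with ties counted as one half (the Mann-Whitney formulation). The scores below are hypothetical.

```python
from typing import Sequence


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """ROC AUC via pairwise comparison of positive and negative scores."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a correct ordering
    return wins / (len(pos) * len(neg))


# Hypothetical risk scores: 1 = liver metastasis, 0 = none.
value = auc([0.9, 0.4, 0.5, 0.1], [1, 1, 0, 0])
print(value, value > 0.70)
```

The pairwise loop is O(n_pos x n_neg); library implementations compute the same quantity from ranks, which scales better for large cohorts.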

Improving the Accuracy of Document Classification by Learning Heterogeneity

  • Wong, William Xiu Shun;Hyun, Yoonjin;Kim, Namgyu
    • Journal of Intelligence and Information Systems, v.24 no.3, pp.21-44, 2018
  • In recent years, the rapid development of internet technology and the popularization of smart devices have resulted in massive amounts of text data, produced and distributed through media platforms such as the World Wide Web, Internet news feeds, microblogs, and social media. However, this enormous amount of easily obtained information lacks organization, which has drawn the interest of many researchers seeking to manage it. The problem also requires professionals capable of classifying relevant information, which motivated the introduction of text classification. Text classification, the task of assigning a text document to one or more predefined categories or classes, is a challenging task in modern data analysis. Various techniques are available in this field, such as K-Nearest Neighbor, the Naïve Bayes algorithm, Support Vector Machine, Decision Tree, and Artificial Neural Network. However, when dealing with huge amounts of text data, model performance and accuracy become a challenge. The performance of a text classification model can vary with the type of words used in the corpus and the type of features created for classification. Most attempts have proposed a new algorithm or modified an existing one, and this kind of research has arguably reached its limits for further improvement. In this study, rather than proposing or modifying an algorithm, we focus on finding a way to modify how the data are used. It is widely known that classifier performance is influenced by the quality of the training data on which the classifier is built. Real-world datasets usually contain noise, and such noisy data can affect the decisions made by classifiers built from them.
In this study, we consider that data from different domains, that is, heterogeneous data, may have noise-like characteristics that can be utilized in the classification process. Machine learning algorithms build classifiers under the assumption that the characteristics of the training data and the target data are the same or very similar. However, for unstructured data such as text, the features are determined by the vocabulary contained in the documents, so if the training data and target data reflect different viewpoints, their features may differ. We therefore attempt to improve classification accuracy by strengthening the robustness of the document classifier through artificially injecting noise into the process of constructing it. Data coming from various sources are likely to be formatted differently. This causes difficulties for traditional machine learning algorithms, which are not designed to recognize different data representations at once and combine them into a single generalization. Therefore, to utilize heterogeneous data in the learning process of the document classifier, we apply semi-supervised learning. However, unlabeled data may degrade the performance of the document classifier, so we further propose a method called the Rule Selection-Based Ensemble Semi-Supervised Learning Algorithm (RSESLA), which selects only the documents that contribute to improving the classifier's accuracy. RSESLA creates multiple views by manipulating the features using different types of classification models and different types of heterogeneous data. The most confident classification rules are selected and applied for the final decision.
In this paper, three types of real-world data sources were used: news, Twitter, and blogs.
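RSESLA's rule-selection details are not given in the abstract, so the sketch below is only a generic illustration of the semi-supervised idea it builds on: pseudo-label an unlabeled document only when several views agree and every view is confident. The function name, the (label, confidence) format and the 0.9 threshold are hypothetical choices, not the authors' method.

```python
from typing import Dict, List, Tuple

Prediction = Tuple[str, float]  # (predicted label, confidence in [0, 1])


def select_confident(unlabeled_preds: Dict[str, List[Prediction]],
                     threshold: float = 0.9) -> Dict[str, str]:
    """Keep documents on which all views agree, each with confidence >= threshold."""
    selected: Dict[str, str] = {}
    for doc_id, preds in unlabeled_preds.items():
        labels = {label for label, _ in preds}
        if len(labels) == 1 and all(conf >= threshold for _, conf in preds):
            selected[doc_id] = labels.pop()  # agreed pseudo-label
    return selected


# Hypothetical predictions from two views (e.g. a news-trained and a blog-trained model).
views = {
    "d1": [("sports", 0.95), ("sports", 0.92)],    # agree, confident -> kept
    "d2": [("sports", 0.95), ("politics", 0.97)],  # disagree -> dropped
    "d3": [("politics", 0.91), ("politics", 0.85)],  # one view unsure -> dropped
}
print(select_confident(views))
```

The selected documents would then be added to the labelled training set and the classifiers retrained; filtering this strictly is one way to limit the degradation that unlabeled data can cause.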

Wildfire Severity Mapping Using Sentinel Satellite Data Based on Machine Learning Approaches

  • Sim, Seongmun;Kim, Woohyeok;Lee, Jaese;Kang, Yoojin;Im, Jungho;Kwon, Chunguen;Kim, Sungyong
    • Korean Journal of Remote Sensing, v.36 no.5_3, pp.1109-1123, 2020
  • In South Korea, where forest is a major land cover class (over 60% of the country), many wildfires occur every year. Wildfires weaken the shear strength of the soil, forming a soil layer that is vulnerable to landslides. To manage forests sustainably, it is important to identify the severity of a wildfire as well as the burned area. Although satellite remote sensing has been widely used to map wildfire severity, it is often difficult to determine severity using only the temporal change of satellite-derived indices such as the Normalized Difference Vegetation Index (NDVI) and the Normalized Burn Ratio (NBR). In this study, we proposed a machine learning approach to determining wildfire severity through the synergistic use of Sentinel-1A Synthetic Aperture Radar-C data and Sentinel-2A Multi Spectral Instrument data. Three wildfire cases (Samcheok in May 2017, Gangreung·Donghae in April 2019, and Gosung·Sokcho in April 2019) were used to develop wildfire severity mapping models with three machine learning algorithms (Random Forest, Logistic Regression, and Support Vector Machine). The random forest model yielded the best performance, with an overall accuracy of 82.3%. Cross-site validation, used to examine the spatiotemporal transferability of the models, showed that the models were highly sensitive to temporal differences between the training and validation sites, especially in the early growing season. This implies that a more robust model with high spatiotemporal transferability could be developed as more wildfire cases from different seasons and areas are added in the future.
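The abstract contrasts its approach with the temporal change of NDVI and NBR. Both are normalized differences of reflectances, and burn severity is conventionally summarized as dNBR (pre-fire NBR minus post-fire NBR). The sketch below assumes the usual Sentinel-2 band choices (NIR = B8, red = B4, SWIR = B12); the abstract does not state which bands or index variants the authors used, and the reflectance values are hypothetical.

```python
def normalized_difference(a: float, b: float) -> float:
    """(a - b) / (a + b), the form shared by NDVI and NBR."""
    if a + b == 0:
        raise ValueError("a + b must be non-zero")
    return (a - b) / (a + b)


def ndvi(nir: float, red: float) -> float:
    """NDVI from near-infrared (assumed B8) and red (assumed B4) reflectance."""
    return normalized_difference(nir, red)


def nbr(nir: float, swir: float) -> float:
    """NBR from near-infrared (assumed B8) and shortwave-infrared (assumed B12) reflectance."""
    return normalized_difference(nir, swir)


def dnbr(pre_nir: float, pre_swir: float, post_nir: float, post_swir: float) -> float:
    """Pre-fire NBR minus post-fire NBR; larger values indicate more severe burning."""
    return nbr(pre_nir, pre_swir) - nbr(post_nir, post_swir)


# Hypothetical pixel: healthy vegetation before the fire, charred surface after.
print(ndvi(0.5, 0.1), dnbr(0.4, 0.1, 0.2, 0.3))
```

Burning lowers NIR reflectance and raises SWIR reflectance, so NBR drops after a fire and dNBR grows with severity; the study's point is that such single-index change alone is often insufficient, motivating the SAR plus multispectral machine learning models.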