• Title/Summary/Keyword: Multi Feature Selection

Tracing the breeding farm of domesticated pig using feature selection (Sus scrofa)

  • Kwon, Taehyung;Yoon, Joon;Heo, Jaeyoung;Lee, Wonseok;Kim, Heebal
    • Asian-Australasian Journal of Animal Sciences / v.30 no.11 / pp.1540-1549 / 2017
  • Objective: Increasing food safety demands in the animal product market have created a need for a system to trace the food distribution process, from the manufacturer to the retailer, and genetic traceability is an effective method to trace the origin of animal products. In this study, we successfully achieved the farm tracing of 6,018 multi-breed pigs, using single nucleotide polymorphism (SNP) markers strictly selected through least absolute shrinkage and selection operator (LASSO) feature selection. Methods: We performed farm tracing of domesticated pig (Sus scrofa) from SNP markers and selected the most relevant features for accurate prediction. Considering the multi-breed composition of our data, we performed feature selection using LASSO penalization on 4,002 SNPs that are shared between breeds, which also include 179 SNPs with small between-breed differences. The 100 highest-scored features were extracted from iterative simulations and then evaluated using machine-learning-based classifiers. Results: We selected 1,341 SNPs from over 45,000 SNPs through iterative LASSO feature selection, to minimize between-breed differences. We subsequently selected the 100 highest-scored SNPs from iterative scoring, and observed high statistical measures in the classification of breeding farms by cross-validation using only these SNPs. Conclusion: The study represents a successful application of LASSO feature selection to multi-breed pig SNP data for tracing farm information, and it provides a valuable method and a basis for further research on genetic traceability.
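
The iterative LASSO scoring described above can be sketched as follows: fit an L1-penalized linear model by coordinate descent on repeated random subsamples, count how often each feature receives a nonzero coefficient, and keep the top-k features. This is a minimal sketch under stated assumptions, not the authors' pipeline: the subsampling scheme, the penalty `lam`, the helper names (`lasso_cd`, `selection_counts`, `top_k`) and the toy data are all hypothetical, and the toy uses a numeric regression target rather than farm labels.

```python
import random

def soft_threshold(z, gamma):
    # Proximal operator of the L1 penalty: shrink z toward 0 by gamma
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0

def lasso_cd(X, y, lam, n_iter=100):
    """Coordinate descent for (1/2n)*||y - Xw||^2 + lam*||w||_1.
    No intercept is fitted, so X and y are assumed to be roughly centered."""
    n, p = len(X), len(X[0])
    w = [0.0] * p
    residual = list(y)  # y - Xw, with w starting at zero
    col_sq = [sum(row[j] ** 2 for row in X) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            # Correlation of feature j with the residual that excludes feature j
            rho = sum(X[i][j] * (residual[i] + X[i][j] * w[j]) for i in range(n)) / n
            new_wj = soft_threshold(rho, lam) / col_sq[j]
            delta = new_wj - w[j]
            if delta != 0.0:
                for i in range(n):
                    residual[i] -= X[i][j] * delta
                w[j] = new_wj
    return w

def selection_counts(X, y, lam, n_rounds=20, frac=0.8, seed=0):
    """Refit LASSO on random subsamples; count how often each feature is nonzero."""
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    counts = [0] * p
    for _ in range(n_rounds):
        idx = rng.sample(range(n), int(frac * n))
        w = lasso_cd([X[i] for i in idx], [y[i] for i in idx], lam)
        for j, wj in enumerate(w):
            if abs(wj) > 1e-8:
                counts[j] += 1
    return counts

def top_k(counts, k):
    # Highest counts first; ties broken by feature index
    return sorted(range(len(counts)), key=lambda j: (-counts[j], j))[:k]

# Hypothetical toy data: 5 numeric features, target driven by features 0 and 2
gen = random.Random(1)
X = [[gen.uniform(-1, 1) for _ in range(5)] for _ in range(100)]
y = [3 * row[0] - 2 * row[2] + gen.gauss(0, 0.1) for row in X]
counts = selection_counts(X, y, lam=0.1)
print(counts, top_k(counts, 2))
```

Counting selections across subsamples, rather than trusting a single fit, is one simple way to make the final ranking less sensitive to which samples happen to be in the training set.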

Neural and MTS Algorithms for Feature Selection

  • Su, Chao-Ton;Li, Te-Sheng
    • International Journal of Quality Innovation / v.3 no.2 / pp.113-131 / 2002
  • The relationships among multi-dimensional data (such as medical examination data) with ambiguity and variation are difficult to explore. The traditional approach to building a data classification system requires formulating rules by which the input data can be analyzed, and formulating such rules is very difficult for large sets of input data. This paper first describes two classification approaches, a back-propagation (BP) neural network and a Mahalanobis distance (MD) classifier, and then proposes two approaches to multi-dimensional feature selection. The first is a feature selection procedure based on the trained BP neural network. The basic idea of this procedure is to compare the products of the weights between the input and hidden layers and between the hidden and output layers; to simplify the structure, only weights with large absolute values are used. The second approach is the Mahalanobis-Taguchi system (MTS), originally suggested by Dr. Taguchi, which performs Taguchi's fractional factorial design using the Mahalanobis distance as a performance metric. We combine automatic thresholding with the MD so that it can handle a reduced model, which is the focus of this paper. Two case studies are used as examples to compare and discuss the complete and reduced models employing the BP neural network and the MD classifier. The implementation results show that the proposed approaches are effective and powerful for classification.
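
A minimal sketch of weight-based input scoring in the spirit of the first approach: each input is scored by summing the absolute products of its input-to-hidden weights and the corresponding hidden-to-output weights, after zeroing weights whose magnitude falls below a threshold. The abstract does not give the exact formula, so this particular sum, the threshold `min_abs`, and the example weights are assumptions for illustration.

```python
def input_importance(w_ih, w_ho, min_abs=0.0):
    """Score each input by summing |w_ih[i][h] * w_ho[h][o]| over hidden h and output o.

    w_ih[i][h]: weight from input i to hidden unit h
    w_ho[h][o]: weight from hidden unit h to output o
    Weights with |w| < min_abs are treated as zero (keep only large weights).
    """
    def keep(w):
        return w if abs(w) >= min_abs else 0.0

    n_hidden, n_out = len(w_ho), len(w_ho[0])
    scores = []
    for row in w_ih:
        s = 0.0
        for h in range(n_hidden):
            for o in range(n_out):
                s += abs(keep(row[h]) * keep(w_ho[h][o]))
        scores.append(s)
    return scores

# Hypothetical trained weights: 3 inputs, 2 hidden units, 1 output
w_ih = [[0.9, -0.8], [0.05, 0.1], [-0.6, 0.7]]
w_ho = [[1.0], [-0.5]]
scores = input_importance(w_ih, w_ho, min_abs=0.2)
ranking = sorted(range(len(scores)), key=lambda i: -scores[i])
print(scores, ranking)  # input 1 scores 0 because all its weights are below min_abs
```

Inputs with the lowest scores are candidates for removal when building the reduced model.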

A Hybrid Multi-Level Feature Selection Framework for prediction of Chronic Disease

  • G.S. Raghavendra;Shanthi Mahesh;M.V.P. Chandrasekhara Rao
    • International Journal of Computer Science & Network Security / v.23 no.12 / pp.101-106 / 2023
  • Chronic illnesses are among the most common serious problems affecting human health. Early diagnosis of chronic diseases can help avoid or mitigate their consequences, potentially decreasing mortality rates. Using machine learning algorithms to identify risk factors is a promising strategy. The issue with existing feature selection approaches is that each method yields a different set of features that affects model accuracy, and current methods do not perform well on large multidimensional datasets. We introduce a novel model with a feature selection approach that selects optimal features from large multidimensional datasets to provide reliable predictions of chronic illnesses without sacrificing data uniqueness.[1] To ensure the success of the proposed model, we balanced the classes by applying hybrid balanced-class sampling methods to the original dataset, along with data pre-processing and data transformation, to provide credible data for training. We ran and assessed our model on datasets with binary and multi-valued classes, using multiple datasets (Parkinson, arrhythmia, breast cancer, kidney, diabetes). Suitable features are selected by a hybrid feature model consisting of LassoCV, decision tree, random forest, gradient boosting, AdaBoost, and stochastic gradient descent, followed by voting on the attributes that these methods commonly output. The accuracy on the original dataset before applying the framework is recorded and compared against the accuracy on the reduced attribute set, and the results are shown separately for comparison. Based on the analysis, our proposed model achieved higher accuracy on multi-valued class datasets than on binary class datasets.[1]
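
The attribute-voting step can be sketched as follows, treating each selector's output as a set of feature names and keeping features chosen by at least `min_votes` selectors. The vote threshold, the feature names, and the selector outputs are hypothetical; in practice each set would come from fitting the corresponding model (LassoCV, decision tree, and so on) on the balanced, pre-processed data.

```python
from collections import Counter

def vote_features(selections, min_votes):
    """Keep features chosen by at least `min_votes` of the selectors."""
    votes = Counter()
    for chosen in selections.values():
        votes.update(set(chosen))  # one vote per selector per feature
    return sorted(f for f, c in votes.items() if c >= min_votes)

# Hypothetical outputs of the six selectors named in the abstract
selections = {
    "LassoCV": {"age", "bp", "glucose"},
    "decision_tree": {"age", "glucose", "bmi"},
    "random_forest": {"age", "glucose", "bmi"},
    "gradient_boosting": {"glucose", "bp"},
    "AdaBoost": {"age", "glucose"},
    "SGD": {"glucose", "insulin"},
}
print(vote_features(selections, min_votes=4))  # majority of 6 selectors
```

Raising `min_votes` trades a smaller attribute set for stronger agreement among the selectors.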

A Study on Deep Learning Structure of Multi-Block Method for Improving Face Recognition (얼굴 인식률 향상을 위한 멀티 블록 방식의 딥러닝 구조에 관한 연구)

  • Ra, Seung-Tak;Kim, Hong-Jik;Lee, Seung-Ho
    • Journal of IKEEE / v.22 no.4 / pp.933-940 / 2018
  • In this paper, we propose a multi-block deep learning structure for improving the face recognition rate. The proposed recognition structure consists of three steps: dividing the input image into multiple blocks, selecting blocks by numerical analysis of facial features, and performing deep learning on the selected blocks. First, the input image is divided into four blocks. Second, the feature values of the four blocks are checked, and only the blocks with many features are selected. Third, deep learning is performed on the selected blocks, and recognition with the model trained on those blocks yields efficient blocks with high feature values. To evaluate the performance of the proposed deep learning structure, we used the CAS-PEAL face database. Experimental results show that the proposed multi-block deep learning structure achieves a 2.3% higher face recognition rate than the existing deep learning structure.
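
The first two steps (splitting the image into four blocks and keeping the blocks with the most features) can be sketched as below. The abstract does not say how the per-block feature value is computed, so a hypothetical `gradient_energy` score (sum of absolute intensity differences between neighboring pixels) stands in for it; the image is a plain 2-D list of grayscale values, and the deep learning stage is omitted.

```python
def split_quadrants(img):
    # Divide a 2-D grayscale image (list of rows) into four blocks
    h, w = len(img), len(img[0])
    mh, mw = h // 2, w // 2
    return {
        "top_left": [row[:mw] for row in img[:mh]],
        "top_right": [row[mw:] for row in img[:mh]],
        "bottom_left": [row[:mw] for row in img[mh:]],
        "bottom_right": [row[mw:] for row in img[mh:]],
    }

def gradient_energy(block):
    # Hypothetical "feature value": total absolute horizontal + vertical intensity change
    total = 0
    rows, cols = len(block), len(block[0])
    for r in range(rows):
        for c in range(cols):
            if c + 1 < cols:
                total += abs(block[r][c + 1] - block[r][c])
            if r + 1 < rows:
                total += abs(block[r + 1][c] - block[r][c])
    return total

def select_blocks(img, keep=2):
    # Rank blocks by feature value (ties by name) and keep the top `keep`
    scores = {name: gradient_energy(b) for name, b in split_quadrants(img).items()}
    return sorted(scores, key=lambda n: (-scores[n], n))[:keep]

img = [
    [0, 9, 0, 0],
    [9, 0, 0, 0],
    [0, 0, 5, 5],
    [0, 0, 5, 0],
]
print(select_blocks(img, keep=2))
```

Only the selected blocks would then be passed to the recognition network.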

Feature Modeling with Multi-Software Product Line of IoT Protocols

  • Abbas, Asad;Siddiqui, Isma Fara;Lee, Scott Uk-Jin
    • Proceedings of the Korean Society of Computer Information Conference / 2017.01a / pp.79-82 / 2017
  • IoT devices with different functionalities are interconnected in a global network and manage data transfer in cloud computing. IoT devices can be used anytime, anywhere, with any device, and with different applications and protocols. The same devices, such as sensors and Wi-Fi devices, may run different applications according to end-user requirements, and reusing these applications can enhance the development process. However, the large number of variations in cloud computing makes feature selection in applications difficult because of device compatibility issues. In this paper, we propose a multi-Software Product Line (multi-SPL) approach to manage the variabilities and commonalities of IoT applications and protocols. Feature modeling is used to manage the commonalities and variabilities of an SPL. We argue that a multi-SPL feature model is more appropriate for modeling IoT applications and protocols.
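
To make the idea of a feature model concrete, here is a minimal sketch of a single feature model with the standard relation types (mandatory, optional, and alternative children, plus "requires" cross-tree constraints) and a check of whether a chosen configuration is valid. The IoT feature names and constraints are hypothetical, and composing several SPLs into a multi-SPL model is not shown.

```python
from dataclasses import dataclass, field

@dataclass
class FeatureModel:
    root: str
    mandatory: dict = field(default_factory=dict)    # parent -> children always required
    optional: dict = field(default_factory=dict)     # parent -> children that may be chosen
    alternative: dict = field(default_factory=dict)  # parent -> exactly one child required
    requires: list = field(default_factory=list)     # (a, b): selecting a requires b

def is_valid(model, config):
    """Check whether a set of selected features is a valid product configuration."""
    if model.root not in config:
        return False
    parent = {}
    for relation in (model.mandatory, model.optional, model.alternative):
        for p, children in relation.items():
            for c in children:
                parent[c] = p
    # Every non-root feature must be known and have its parent selected
    if any(f != model.root and parent.get(f) not in config for f in config):
        return False
    for p in config:
        if any(c not in config for c in model.mandatory.get(p, [])):
            return False
        group = model.alternative.get(p)
        if group and sum(c in config for c in group) != 1:
            return False
    return all(b in config for a, b in model.requires if a in config)

# Hypothetical IoT application feature model
iot = FeatureModel(
    root="IoTApp",
    mandatory={"IoTApp": ["Protocol", "Device"]},
    alternative={"Protocol": ["MQTT", "CoAP", "HTTP"]},
    optional={"Device": ["Sensor", "WiFi"]},
    requires=[("CoAP", "Sensor")],
)
print(is_valid(iot, {"IoTApp", "Protocol", "Device", "MQTT", "WiFi"}))  # True
```

Compatibility issues between devices and protocols map naturally onto "requires" constraints, which is where a checker like this would reject invalid products.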

Multimodal Biometric Using a Hierarchical Fusion of a Person's Face, Voice, and Online Signature

  • Elmir, Youssef;Elberrichi, Zakaria;Adjoudj, Reda
    • Journal of Information Processing Systems / v.10 no.4 / pp.555-567 / 2014
  • Improving biometric performance is a challenging task. In this paper, a hierarchical fusion strategy for a multimodal biometric system is presented. This strategy combines several biometric traits using a multi-level biometric fusion hierarchy. The multi-level fusion includes a pre-classification fusion with optimal feature selection and a post-classification fusion based on the similarity of the maximum of the matching scores. The proposed solution enhances biometric recognition performance through suitable feature selection and reduction, such as principal component analysis (PCA) and linear discriminant analysis (LDA), since not all components of the feature vectors contribute to improving performance.

Emotion Feature Pattern Classification Algorithm of Speech Signal using Self Organizing Map (자기 조직화 신경망을 이용한 음성 신호의 감정 특징 패턴 분류 알고리즘)

  • Ju, Jong-Tae;Park, Chang-Hyeon;Sim, Gwi-Bo
    • Proceedings of the Korean Institute of Intelligent Systems Conference / 2006.11a / pp.179-182 / 2006
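  • Many methods currently exist for recognizing emotion, using speech, brain waves, heart rate, facial expressions, and so on. Among these, this paper takes the speech-signal approach and considers three main features: pitch, energy, and formants. Such diverse features are used because no definitive feature has yet been established. To address this selection problem, this paper uses the Multi Feature Selection (MFS) method and proposes a method that classifies the emotion feature patterns of speech signals using the Self Organizing Map algorithm as the learning algorithm.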

Sequential Pattern Mining for Intrusion Detection System with Feature Selection on Big Data

  • Fidalcastro, A;Baburaj, E
    • KSII Transactions on Internet and Information Systems (TIIS) / v.11 no.10 / pp.5023-5038 / 2017
  • Big data is an emerging technology that deals with a wide range of data sets whose sizes exceed the capabilities of commonly used data-processing software tools. A huge network generates a large amount of network information that must be processed, consisting of both normal and abnormal activity logs in large volumes of multi-dimensional data. An Intrusion Detection System (IDS) is required to monitor the network and detect malicious nodes and activities in it, and the massive amount of data makes threats and attacks difficult to detect. Sequential pattern mining, which has become increasingly popular because it considers item quantities, profits, and time order, may be used to identify patterns of malicious activity. Here we propose a sequential pattern mining algorithm with fuzzy logic feature selection and fuzzy weighted support for huge volumes of network logs, designed to be implemented in Apache Hadoop YARN, which addresses speed and time constraints. Fuzzy logic feature selection selects important features from the feature set, and fuzzy weighted support assigns weights to the inputs and avoids multiple scans. In our simulation, we use the attack log from an NS-2 MANET environment and compare the proposed algorithm with the state-of-the-art sequential pattern mining algorithm SPADE and with a Support Vector Machine in a Hadoop environment.
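
As a rough illustration of weighted support in sequential pattern mining, the sketch below gives each log sequence a weight and computes, in a single pass over the database, the weighted fraction of sequences that contain each candidate pattern as an ordered (not necessarily contiguous) subsequence. This is plain weighted support, not the paper's fuzzy variant; the log events, weights, and candidate patterns are hypothetical, and the Hadoop distribution is not modeled.

```python
def is_subsequence(pattern, sequence):
    # Consume the sequence iterator in order; each pattern item must appear after the previous one
    it = iter(sequence)
    return all(any(item == p for item in it) for p in pattern)

def weighted_supports(db, candidates):
    """db: list of (sequence, weight). Returns {pattern: weighted support in [0, 1]}."""
    total = sum(w for _, w in db)
    acc = {c: 0.0 for c in candidates}
    for seq, w in db:  # single scan over the database
        for c in candidates:
            if is_subsequence(c, seq):
                acc[c] += w
    return {c: acc[c] / total for c in candidates}

# Hypothetical weighted network-log sequences
db = [
    (("scan", "login_fail", "login_fail", "root"), 1.0),
    (("login", "read", "logout"), 0.2),
    (("scan", "login_fail", "root"), 0.8),
    (("scan", "read"), 0.5),
]
candidates = [("scan", "root"), ("login_fail", "root"), ("read",)]
print(weighted_supports(db, candidates))
```

Patterns whose weighted support exceeds a chosen threshold would be reported as frequent; weighting lets rare but high-weight sequences count for more than frequent benign ones.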

Band Selection Using Forward Feature Selection Algorithm for Citrus Huanglongbing Disease Detection

  • Katti, Anurag R.;Lee, W.S.;Ehsani, R.;Yang, C.
    • Journal of Biosystems Engineering / v.40 no.4 / pp.417-427 / 2015
  • Purpose: This study investigated different band selection methods to classify spectrally similar data, obtained from aerial images of healthy citrus canopies and citrus greening disease (Huanglongbing or HLB) infected canopies, using small differences without unmixing endmember components and therefore without the need for an endmember library. However, the large number of hyperspectral bands is highly redundant, and this redundancy had to be reduced through band selection. The objective, therefore, was first to select the best set of bands and then to detect HLB-infected canopies in aerial hyperspectral images using these bands. Methods: The forward feature selection algorithm (FFSA) was chosen for band selection. The selected bands were used to identify HLB-infected pixels with various classifiers, such as K nearest neighbor (KNN), support vector machine (SVM), naïve Bayesian classifier (NBC), and generalized local discriminant bases (LDB). All bands were also used for comparison. Results: A few well-chosen bands yielded much better results than using all bands, and brought the classification results on par with standard hyperspectral classification techniques such as spectral angle mapper (SAM) and mixture tuned matched filtering (MTMF). Median detection accuracies ranged from 66% to 80%, which showed great potential for rapid detection of the disease. Conclusions: Among the methods investigated, a support vector machine classifier combined with the forward feature selection algorithm yielded the best results.
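
Forward feature selection can be sketched as a greedy loop: starting from an empty set, add the band that most improves a classification criterion, and stop when no remaining band helps. Here the criterion is leave-one-out accuracy of a 1-nearest-neighbor classifier, which is an assumption for illustration (the study paired the selected bands with KNN, SVM, NBC, and LDB); the reflectance values and labels are hypothetical.

```python
def loo_1nn_accuracy(X, y, bands):
    """Leave-one-out accuracy of a 1-nearest-neighbor classifier using only `bands`."""
    if not bands:
        return 0.0
    n, correct = len(X), 0
    for i in range(n):
        best_d, best_label = None, None
        for j in range(n):
            if j == i:
                continue
            d = sum((X[i][b] - X[j][b]) ** 2 for b in bands)
            if best_d is None or d < best_d:
                best_d, best_label = d, y[j]
        correct += int(best_label == y[i])
    return correct / n

def forward_select(X, y, max_bands):
    """Greedily add the band that most improves the criterion; stop when none helps."""
    selected, best_score = [], 0.0
    remaining = list(range(len(X[0])))
    while remaining and len(selected) < max_bands:
        # Try each remaining band added to the current subset; prefer lower index on ties
        score, band = max(
            ((loo_1nn_accuracy(X, y, selected + [b]), b) for b in remaining),
            key=lambda t: (t[0], -t[1]),
        )
        if score <= best_score:
            break
        selected.append(band)
        remaining.remove(band)
        best_score = score
    return selected, best_score

# Hypothetical pixels with 3 bands; label 0 = healthy, 1 = HLB-infected
X = [
    [0.5, 0.10, 0.9],
    [0.1, 0.15, 0.2],
    [0.9, 0.12, 0.5],
    [0.4, 0.90, 0.1],
    [0.8, 0.85, 0.8],
    [0.2, 0.95, 0.4],
]
y = [0, 0, 0, 1, 1, 1]
print(forward_select(X, y, max_bands=2))
```

Because only band 1 separates the two classes in this toy data, the search stops after one band, mirroring the finding that a few well-chosen bands can beat using all of them.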

Study for Feature Selection Based on Multi-Agent Reinforcement Learning (다중 에이전트 강화학습 기반 특징 선택에 대한 연구)

  • Kim, Miin-Woo;Bae, Jin-Hee;Wang, Bo-Hyun;Lim, Joon-Shik
    • Journal of Digital Convergence / v.19 no.12 / pp.347-352 / 2021
  • In this paper, we propose a method for finding feature subsets that are effective for classification in an input dataset by using multi-agent reinforcement learning. In machine learning, it is crucial to find features suitable for classification. A dataset may have numerous features; while some may be effective for classification or prediction, others may have little or even negative effects on the results. Feature selection to increase classification or prediction accuracy is therefore a critical problem. To solve this problem, we propose a feature selection method based on reinforcement learning. Each feature has one agent, which determines whether the feature is selected. Rewards are obtained for the subset of features selected by the agents and for the subset not selected, and the Q-value of each agent is updated by comparing these rewards. Comparing the rewards of the two subsets helps agents determine whether their actions were right. This process is repeated for the given number of episodes, after which the final features are selected. Applying this method to the Wisconsin Breast Cancer, Spambase, Musk, and Colon Cancer datasets improved accuracy by 0.0385, 0.0904, 0.1252, and 0.2055, respectively, yielding final classification accuracies of 0.9789, 0.9311, 0.9691, and 0.9474, respectively. These results show that the proposed method can select features that are effective for classification and increase classification accuracy.
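
A minimal sketch of one-agent-per-feature selection: each agent keeps Q-values for "drop" (0) and "select" (1), chooses epsilon-greedily, and updates the Q-value of its chosen action with a reward equal to the score of the chosen subset minus the score of the same subset with only its own decision flipped. This counterfactual comparison and the stateless (bandit-style) Q update are assumptions and may differ from the paper's exact reward scheme; `toy_evaluate` is a hypothetical stand-in for classifier accuracy.

```python
import random

def marl_feature_selection(n_features, evaluate, episodes=200, alpha=0.1, epsilon=0.2, seed=0):
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(n_features)]  # q[f][a]: a=0 drop, a=1 select
    for _ in range(episodes):
        # Each agent picks its action epsilon-greedily
        actions = []
        for f in range(n_features):
            if rng.random() < epsilon:
                actions.append(rng.randint(0, 1))
            else:
                actions.append(1 if q[f][1] >= q[f][0] else 0)
        chosen = frozenset(f for f, a in enumerate(actions) if a == 1)
        r_chosen = evaluate(chosen)
        for f, a in enumerate(actions):
            # Counterfactual subset: same as chosen but with agent f's decision flipped
            flipped = chosen - {f} if a == 1 else chosen | {f}
            reward = r_chosen - evaluate(flipped)
            q[f][a] += alpha * (reward - q[f][a])
    return sorted(f for f in range(n_features) if q[f][1] > q[f][0])

def toy_evaluate(subset):
    # Hypothetical score: features 0 and 2 help, features 1 and 3 hurt
    helpful, harmful = {0, 2}, {1, 3}
    return len(subset & helpful) - 0.5 * len(subset & harmful)

print(marl_feature_selection(4, toy_evaluate))
```

With a real dataset, `evaluate` would train and score a classifier on the given subset, which makes each episode far more expensive than in this toy.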