• Title/Abstract/Keywords: Feature selection optimization

Search results: 92 items; processing time: 0.025 s

Analytical Approach for Scalable Feature Selection

  • 양재경;이태한
    • 산업경영시스템학회지 / Vol. 29, No. 2 / pp. 75-82 / 2006
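  • This study proposes an optimization-based feature selection method that uses the nested partition (NP) method, which is grounded in combinatorial optimization theory. The new method employs a heuristic search procedure to find good feature subsets and uses random sampling of data instances (records) to improve the processing time of feature selection. To reduce processing time, the approach adopts a two-stage sampling method that determines a sample size guaranteeing convergence to a near-optimal solution. This provides a theoretical basis for making precise statements about the quality of the final feature subset when the algorithm terminates in finite time. To illustrate the main results, five data sets of various types were used and results from five replicated experiments are presented; they show that the new approach is more efficient in processing time than the existing feature selection method based on the plain nested partition method.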

Improved marine predators algorithm for feature selection and SVM optimization

  • Jia, Heming;Sun, Kangjian;Li, Yao;Cao, Ning
    • KSII Transactions on Internet and Information Systems (TIIS) / Vol. 16, No. 4 / pp. 1128-1145 / 2022
  • Owing to the rapid development of information science, data analysis based on machine learning has become an interdisciplinary and strategic area. The marine predators algorithm (MPA) is a novel metaheuristic algorithm inspired by the foraging strategies of marine organisms. Considering the randomness of these strategies, an improved algorithm called the co-evolutionary cultural mechanism-based marine predators algorithm (CECMPA) is proposed. Through this mechanism, search agents in different spaces can share knowledge and experience to improve the performance of the native algorithm. More specifically, CECMPA has a higher probability of avoiding local optima and can find the global optimum quickly. This paper is the first to use CECMPA to perform feature subset selection and optimize the hyperparameters of a support vector machine (SVM) simultaneously. To evaluate its performance, the proposed method is tested on twelve datasets from the University of California, Irvine (UCI) repository. Moreover, as a real-world application, CECMPA is also applied to a dataset on coronavirus disease 2019 (COVID-19), which is spreading in many countries. The experimental results and statistical analysis demonstrate that CECMPA is superior to the other compared methods in the literature on several evaluation metrics. The proposed method has strong competitive abilities and promising prospects.

Mitigation of Phishing URL Attack in IoT using H-ANN with H-FFGWO Algorithm

  • Gopal S. B;Poongodi C
    • KSII Transactions on Internet and Information Systems (TIIS) / Vol. 17, No. 7 / pp. 1916-1934 / 2023
  • Phishing is an emerging malicious threat on the internet in which hackers try to obtain user credentials, such as login information or internet banking details, through pirated websites. Using that information, they get into the original website and try to modify or steal information. The problem with traditional defense systems such as firewalls is that they can stop only certain types of attacks because they rely on a fixed set of rules. As a result, a client-side defense mechanism is needed that can learn potential attack vectors in order to detect and prevent both known and unknown types of attack. Feature selection plays a key role in machine learning by keeping only the required features and eliminating the irrelevant ones from the real-time dataset. The proposed model uses Hyperparameter Optimized Artificial Neural Networks (H-ANN) combined with a Hybrid Firefly and Grey Wolf Optimization algorithm (H-FFGWO) to detect and block phishing websites in Internet of Things (IoT) applications. In this paper, H-FFGWO is used for feature selection on the phishing datasets ISCX-URL, OpenPhish, the UCI machine-learning repository, the Mendeley website dataset, and PhishTank. The results showed that the proposed model had an accuracy of 98.07%, a recall of 98.04%, a precision of 98.43%, and an F1-score of 98.24%.

Relevancy contemplation in medical data analytics and ranking of feature selection algorithms

  • P. Antony Seba;J. V. Bibal Benifa
    • ETRI Journal / Vol. 45, No. 3 / pp. 448-461 / 2023
  • This article performs a detailed data scrutiny on a chronic kidney disease (CKD) dataset to select efficient instances and relevant features. Data relevancy is investigated using feature extraction, hybrid outlier detection, and handling of missing values. Data instances that do not influence the target are removed using data envelopment analysis to enable reduction of rows. Column reduction is achieved by ranking the attributes through feature selection methodologies, namely, extra-trees classifier, recursive feature elimination, chi-squared test, analysis of variance, and mutual information. These methodologies are ranked via the Technique for Order of Preference by Similarity to Ideal Solution (TOPSIS) using weight optimization to identify the optimal features for model building from the CKD dataset and to facilitate better prediction when diagnosing the severity of the disease. An efficient hybrid ensemble and novel similarity-based classifiers are built using the pruned dataset, and the results are then compared with random forest, AdaBoost, naive Bayes, k-nearest neighbors, and support vector machines. The hybrid ensemble classifier yields a better prediction accuracy of 98.31% for the features selected by the extra-trees classifier (ETC), which is ranked as the best by TOPSIS.
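
As a rough illustration of the TOPSIS ranking step mentioned in this abstract, the sketch below scores a few alternatives against an ideal and an anti-ideal point. It is not the authors' implementation: the method names, the two criteria, the equal weights, and every number are made-up assumptions, and the weight optimization described in the abstract is not reproduced.

```python
import math

def topsis(scores, weights, benefit):
    """Return the TOPSIS closeness coefficient of each alternative.

    scores[i][j]: value of alternative i on criterion j.
    weights[j]:   criterion weight (assumed to sum to 1).
    benefit[j]:   True if larger is better, False if smaller is better.
    """
    n_crit = len(weights)
    # Vector-normalize each criterion column, then apply the weights.
    norms = [math.sqrt(sum(row[j] ** 2 for row in scores)) or 1.0
             for j in range(n_crit)]
    v = [[weights[j] * row[j] / norms[j] for j in range(n_crit)]
         for row in scores]
    # Ideal and anti-ideal value per criterion.
    cols = list(zip(*v))
    ideal = [max(c) if benefit[j] else min(c) for j, c in enumerate(cols)]
    anti = [min(c) if benefit[j] else max(c) for j, c in enumerate(cols)]
    closeness = []
    for row in v:
        d_pos = math.dist(row, ideal)  # distance to the ideal point
        d_neg = math.dist(row, anti)   # distance to the anti-ideal point
        total = d_pos + d_neg
        closeness.append(d_neg / total if total else 0.0)
    return closeness

# Hypothetical data: three feature selection methods scored on
# accuracy (benefit criterion) and run time in seconds (cost criterion).
methods = ["method_a", "method_b", "method_c"]
scores = [[0.98, 2.0], [0.95, 5.0], [0.90, 9.0]]
closeness = topsis(scores, weights=[0.5, 0.5], benefit=[True, False])
ranking = sorted(zip(methods, closeness), key=lambda p: p[1], reverse=True)
print(ranking)
```

A higher closeness coefficient means the alternative lies nearer the ideal point, so the first entry of `ranking` is the preferred method under these assumed weights.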

Evaluations of AI-based malicious PowerShell detection with feature optimizations

  • Song, Jihyeon;Kim, Jungtae;Choi, Sunoh;Kim, Jonghyun;Kim, Ikkyun
    • ETRI Journal / Vol. 43, No. 3 / pp. 549-560 / 2021
  • Cyberattacks are often difficult to identify with traditional signature-based detection, because attackers continually find ways to bypass the detection methods. Therefore, researchers have introduced artificial intelligence (AI) technology for cybersecurity analysis to detect malicious PowerShell scripts. In this paper, we propose a feature optimization technique for AI-based approaches to enhance the accuracy of malicious PowerShell script detection. We statically analyze the PowerShell script and preprocess it with a method based on the tokens and abstract syntax tree (AST) for feature selection. Here, tokens and AST represent the vocabulary and structure of the PowerShell script, respectively. Performance evaluations with optimized features yield detection rates of 98% in both machine learning (ML) and deep learning (DL) experiments. Among them, the ML model using 3-grams of five selected tokens and the DL model using AST 3-grams deliver the best performance.
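
The 3-gram features mentioned in this abstract can be pictured with a minimal sketch such as the one below. It only shows how contiguous n-grams are formed over a token sequence; the token names are invented placeholders, and the authors' actual PowerShell tokenizer, AST extraction, and token selection are not shown or assumed.

```python
from collections import Counter

def token_ngrams(tokens, n=3):
    """Return the list of contiguous n-grams over a token sequence."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

# Toy token stream (placeholder token types, not real tokenizer output).
tokens = ["Command", "Parameter", "String", "Command", "Parameter", "String"]

# Count how often each 3-gram occurs; such counts could serve as features.
counts = Counter(token_ngrams(tokens, n=3))
print(counts.most_common(2))
```

Sequences shorter than `n` simply yield no n-grams, so short scripts produce an empty feature list rather than an error.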

Invariant-Feature Based Object Tracking Using Discrete Dynamic Swarm Optimization

  • Kang, Kyuchang;Bae, Changseok;Moon, Jinyoung;Park, Jongyoul;Chung, Yuk Ying;Sha, Feng;Zhao, Ximeng
    • ETRI Journal / Vol. 39, No. 2 / pp. 151-162 / 2017
  • With the remarkable growth in rich media in recent years, people are increasingly exposed to visual information from the environment. Visual information continues to play a vital role in rich media because people's real interests lie in dynamic information. This paper proposes a novel discrete dynamic swarm optimization (DDSO) algorithm for video object tracking using invariant features. The proposed approach is designed to track objects more robustly than other traditional algorithms in terms of illumination changes, background noise, and occlusions. DDSO is integrated with a matching procedure to eliminate inappropriate feature points geographically. The proposed novel fitness function can aid in excluding the influence of some noisy mismatched feature points. The test results showed that our approach can overcome changes in illumination, background noise, and occlusions more effectively than other traditional methods, including color-tracking and invariant feature-tracking methods.

Review on Genetic Algorithms for Pattern Recognition

  • 오일석
    • 한국콘텐츠학회논문지 / Vol. 7, No. 1 / pp. 58-64 / 2007
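  • Many optimization problems in pattern recognition have exponential search spaces. Sequential search algorithms that find suboptimal solutions have been used to solve them, but these algorithms tend to get trapped in local optima. Recently, genetic algorithms have increasingly been used to overcome this problem. This paper explains the exponential search space characteristics of feature selection, classifier ensemble selection, neural network pruning, and clustering problems, and surveys genetic algorithms for solving them. It also introduces topics of high value for future research.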
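
To make the idea of a genetic algorithm over feature subsets concrete, here is a minimal sketch that evolves 0/1 feature masks. It is a generic textbook-style GA (tournament selection, one-point crossover, bit-flip mutation, elitism), not a method taken from the reviewed paper; the function name, all parameter values, and the toy fitness function are assumptions for illustration.

```python
import random

def ga_feature_selection(n_features, fitness, pop_size=20, generations=30,
                         mutation_rate=0.05, seed=0):
    """Search for a 0/1 feature mask that maximizes `fitness`."""
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_features)]
           for _ in range(pop_size)]
    best = max(pop, key=fitness)
    for _ in range(generations):
        new_pop = [best[:]]  # elitism: carry over the best mask so far
        while len(new_pop) < pop_size:
            # Tournament selection of two parents.
            p1 = max(rng.sample(pop, 3), key=fitness)
            p2 = max(rng.sample(pop, 3), key=fitness)
            # One-point crossover.
            cut = rng.randint(1, n_features - 1)
            child = p1[:cut] + p2[cut:]
            # Bit-flip mutation.
            child = [1 - b if rng.random() < mutation_rate else b
                     for b in child]
            new_pop.append(child)
        pop = new_pop
        best = max(pop, key=fitness)
    return best

# Toy fitness (an assumption): features 0-3 are useful, and every other
# selected feature costs a small penalty.
RELEVANT = {0, 1, 2, 3}

def toy_fitness(mask):
    hits = sum(1 for i, b in enumerate(mask) if b and i in RELEVANT)
    extras = sum(1 for i, b in enumerate(mask) if b and i not in RELEVANT)
    return hits - 0.5 * extras

best_mask = ga_feature_selection(n_features=12, fitness=toy_fitness)
print(best_mask, toy_fitness(best_mask))
```

With 12 features there are already 2^12 = 4096 candidate masks, which is the exponential growth the abstract refers to; in practice the toy fitness would be replaced by a classifier's validation score.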

Spatial Information Based Simulator for User Experience's Optimization

  • Bang, Green;Ko, Ilju
    • 한국컴퓨터정보학회논문지 / Vol. 21, No. 3 / pp. 97-104 / 2016
  • In this paper, we propose a spatial-information-based simulator that optimizes user experience and minimizes real-space complexity. We focus on how to design a virtual space model and how to implement virtual characters using real-space data. In particular, we use an expanded event-driven inference model for a machine-learning-based SVM. Our simulator performs feature selection using k-fold cross-validation to optimize learning from data. This strategy improves the throughput of inferring user behavior features with the virtual space model. We thus aim to develop a user experience optimization system that facilitates mapping for people, as a first step toward inference on daily-life data. Methodologically, we focus on user behavior and space modeling to implement the virtual space.
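
One common way to select features with k-fold cross-validation is to score each candidate feature subset by its mean held-out accuracy and keep the best one. The sketch below illustrates that idea only; it is not the simulator described in this abstract. The contiguous, unshuffled folds, the 1-nearest-neighbor rule (standing in for the SVM), and the toy data are all assumptions.

```python
def k_fold_indices(n_samples, k):
    """Yield (train, test) index lists for k contiguous folds."""
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0)
                  for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = [i for i in range(n_samples)
                 if i < start or i >= start + size]
        yield train, test
        start += size

def cv_accuracy(X, y, features, k=3):
    """Mean k-fold accuracy of a 1-nearest-neighbor rule on `features`."""
    def dist(a, b):
        # Squared Euclidean distance restricted to the chosen features.
        return sum((X[a][f] - X[b][f]) ** 2 for f in features)
    accs = []
    for train, test in k_fold_indices(len(X), k):
        correct = sum(y[min(train, key=lambda t: dist(i, t))] == y[i]
                      for i in test)
        accs.append(correct / len(test))
    return sum(accs) / len(accs)

# Toy data (made up): feature 0 separates the classes, feature 1 is noise.
X = [[0.1, 5.0], [0.9, 5.1], [0.2, 1.0], [0.8, 1.2], [0.0, 3.0], [1.0, 3.1]]
y = [0, 1, 0, 1, 0, 1]

# Score each candidate subset by cross-validated accuracy and keep the best.
candidates = [(0,), (1,), (0, 1)]
best = max(candidates, key=lambda f: cv_accuracy(X, y, f))
print(best, cv_accuracy(X, y, best))
```

Scoring on held-out folds rather than on the training data keeps the noisy feature from looking useful merely because the model memorized it.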

Development of Tool Selection System for Machining Model Part of Injection Mold

  • 양학진;김성근;허영무;양진석
    • 한국정밀공학회: Conference Proceedings / 한국정밀공학회 2002 Spring Conference Proceedings / pp. 569-574 / 2002
  • As consumer demands become more varied, agility in mold manufacturing is the most important factor in a manufacturer's competitiveness. When a commercial CAM system is used to generate tool paths, some decision making is required to obtain an optimal result from the CAM system. This paper proposes a methodology for computer-assisted tool selection for various cutting types, such as rough, semi-rough, and finish cuts. The system suggests tool items for machining the design model part of an injection mold die by analyzing a sliced CAD model of the die cavity and core. The NC code generated for the selected tool size is also used to calculate machining time. The system is developed on a commercial CAM system using its API. This module will be used to optimize tool selection and the planning process.

Convergence Characteristics of Ant Colony Optimization with Selective Evaluation in Feature Selection

  • 이진선;오일석
    • 한국콘텐츠학회논문지 / Vol. 11, No. 10 / pp. 41-48 / 2011
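  • A selective evaluation technique for ant colony optimization in feature selection was recently proposed. The technique reduces computation by excluding unnecessary or unpromising candidate solutions from the actual evaluation step. Experiments showed its superiority, but because only one data set was used, the results are not statistically reliable enough. The purpose of this paper is to analyze the convergence characteristics of the selective evaluation technique and to strengthen the reliability of the conclusions. For the experiments, three data sets related to handwriting, medicine, and speech were selected from the UCI database. Their feature set sizes range from 256 to 617. To obtain statistically stable data, the program was run independently 12 times on each data set. To observe convergence over a long period, each run lasted 72 hours. Based on an analysis of the experimental data, the paper describes why the selective evaluation technique performs better and the range over which it is applicable.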