• Title/Summary/Keyword: Feature selection algorithm


Extraction and classification of characteristic information of malicious code for an intelligent detection model (지능적 탐지 모델을 위한 악의적인 코드의 특징 정보 추출 및 분류)

  • Hwang, Yoon-Cheol
    • Journal of Industrial Convergence / v.20 no.5 / pp.61-68 / 2022
  • In recent years, malicious code has been produced using ever-advancing information and communication technology, and existing detection systems are insufficient to detect it. Detecting and responding to such intelligent malicious code accurately and efficiently requires an intelligent detection model, and maximizing detection performance requires training on the main characteristic information of the malicious code. In this paper, we propose a technique for designing an intelligent detection model and for generating the data required for model training as a set of key feature information through transformation, dimensionality reduction, and feature selection steps. Based on this, the main characteristic information is classified by malicious code. In addition, from the classified characteristic information, we derive common characteristic information that can be used to analyze and detect modified or newly emerging malicious code. Because the proposed detection model learns from a limited set of characteristic information, detection and response are fast, so damage can be greatly reduced. Although the performance evaluation results differ slightly depending on the learning algorithm, the evaluation showed that most malicious code can be detected.

WQI Class Prediction of Sihwa Lake Using Machine Learning-Based Models (기계학습 기반 모델을 활용한 시화호의 수질평가지수 등급 예측)

  • KIM, SOO BIN;LEE, JAE SEONG;KIM, KYUNG TAE
    • The Sea:JOURNAL OF THE KOREAN SOCIETY OF OCEANOGRAPHY / v.27 no.2 / pp.71-86 / 2022
  • The water quality index (WQI) has been widely used to evaluate marine water quality. The WQI in Korea is categorized into five classes by marine environmental standards. However, calculating the WQI on huge datasets is a complex and time-consuming process. The current study therefore proposes machine learning (ML) based models that predict the WQI class from water quality datasets. Sihwa Lake, one of the specially managed coastal zones, was selected as the modeling site. Adaptive boosting (AdaBoost) and tree-based pipeline optimization (TPOT) algorithms were used to train the models, and the performance of each model was evaluated with classification metrics (accuracy, precision, F1, and log loss). Before training, feature importance and sensitivity analyses were conducted to find the best input combination for each algorithm. The results showed that bottom dissolved oxygen (DOBot) was the most important variable affecting model performance. Conversely, surface dissolved inorganic nitrogen (DINSur) and surface dissolved inorganic phosphorus (DIPSur) had weaker effects on the prediction of WQI class. In addition, a comparison of the spatio-temporal and class sensitivities of each best model showed that performance varied across stations, seasons, and WQI classes. In conclusion, the modeling results showed that the TPOT algorithm performs better than the AdaBoost algorithm without considering feature selection. Moreover, the WQI class for unknown water quality datasets could be reliably predicted using the TPOT model trained with satisfactory training datasets.

Prediction of KOSPI using Data Editing Techniques and Case-based Reasoning (자료편집기법과 사례기반추론을 이용한 한국종합주가지수 예측)

  • Kim, Kyoung-Jae
    • Journal of the Korea Society of Computer and Information / v.12 no.6 / pp.287-295 / 2007
  • This paper proposes a novel data editing technique that uses a genetic algorithm (GA) in case-based reasoning (CBR) for the prediction of the Korea Stock Price Index (KOSPI). CBR has been widely used in various areas because of its convenience and strength in complex problem solving. Nonetheless, compared to other machine learning techniques, CBR has been criticized for its low prediction accuracy. Generally, to obtain successful results from CBR, effective retrieval of useful prior cases for the given problem is essential. However, designing a good matching and retrieval mechanism for a CBR system is still a controversial research issue. In this paper, the GA simultaneously optimizes the feature weights and the selection of relevant instances to achieve good matching and retrieval in a CBR system. This study applies the proposed model to stock market analysis. Experimental results show that the GA approach is a promising method for data editing in CBR.

Minimization of Post-processing area for Stereolithography Parts by Selection of Part Orientation (부품방향의 선정을 통한 광조형물의 후가공면적 최소화)

  • Kim, Ho-Chan;Lee, Seok-Hee
    • Transactions of the Korean Society of Mechanical Engineers A / v.26 no.11 / pp.2409-2414 / 2002
  • The surfaces of prototypes become rough due to stair-stepping, which is an inevitable phenomenon in rapid prototyping. Prototypes are not used only for the verification of features. Grinding, coating, or a combination of the two is the main operation in post-processing, which incurs high cost and long build time. A solution is proposed to increase the efficiency of rapid prototyping by minimizing or removing the regions that require post-processing. The factors that cause surface roughness and their effects are analyzed through experiments. Software modules are developed to predict, from these results, the surface roughness of each face of the prototype. An experimental compensation method is developed to apply the modules to various RP equipment, materials, and build styles. The build direction is searched using a genetic algorithm to maximize the total area of the surfaces whose roughness is better than a user-defined value.

Implementation of a Robust Speech Recognizer in Noisy Car Environment Using a DSP (DSP를 이용한 자동차 소음에 강인한 음성인식기 구현)

  • Chung, Ik-Joo
    • Speech Sciences / v.15 no.2 / pp.67-77 / 2008
  • In this paper, we implemented a robust speech recognizer using the TMS320VC33 DSP. For this implementation, we built a speech and noise database suited to a recognizer that uses the spectral subtraction method for noise removal. The recognizer has an explicit structure in that the speech signal is enhanced through spectral subtraction before endpoint detection and feature extraction. This makes the operation of the recognizer clear and helps build HMM models with minimum model mismatch. Since the recognizer was developed for controlling car facilities and for voice dialing, it has two recognition engines: a speaker-independent one for controlling car facilities and a speaker-dependent one for voice dialing. We adopted a conventional DTW algorithm for the latter and a continuous HMM for the former. Through various off-line recognition tests, we selected optimal values of several recognition parameters for a resource-limited embedded recognizer, which led to HMM models with three mixtures per state. The speech database with added car noise is enhanced using spectral subtraction before HMM parameter estimation, to reduce the model mismatch caused by nonlinear distortion from spectral subtraction. The hardware module developed includes a microcontroller for the host interface, which processes the protocol between the DSP and a host.
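The spectral subtraction step mentioned in the abstract above can be sketched roughly as follows. This is a minimal illustration of the general technique on a single frame of magnitude-spectrum values, not the paper's DSP implementation; the function name `spectral_subtract`, the `floor` parameter, and the sample values are assumptions made for the example.

```python
def spectral_subtract(noisy_mag, noise_mag, floor=0.01):
    """Subtract a noise magnitude estimate from one frame's magnitude spectrum.

    noisy_mag, noise_mag: per-bin magnitudes (hypothetical inputs).
    floor: fraction of the noisy magnitude kept as a lower bound, so that
           no bin becomes negative after subtraction (an assumed choice).
    """
    if len(noisy_mag) != len(noise_mag):
        raise ValueError("spectra must have the same number of bins")
    cleaned = []
    for s, n in zip(noisy_mag, noise_mag):
        # Plain subtraction, clamped to a small spectral floor.
        cleaned.append(max(s - n, floor * s))
    return cleaned


# Made-up magnitudes for one frame and a noise estimate.
frame = [1.0, 0.5, 0.2]
noise = [0.3, 0.3, 0.3]
enhanced = spectral_subtract(frame, noise)
```

In a full recognizer the magnitudes would come from a short-time Fourier transform of each frame, and the noise estimate would typically be averaged over non-speech frames.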

Development of SVR model for Visibility Forecasting by using Feature Selection based on Genetic Algorithm (유전 알고리즘 기반의 특징선택을 이용한 SVR 모델의 시정 예측 모델 개발)

  • Lim, Sung-Joon;Ahn, Kwang-Deuk;Ha, Jong-Chul;Lim, Eun-Ha;Lee, Yong Hee;Oh, Sung-Kwun
    • Proceedings of the KIEE Conference / 2015.07a / pp.1353-1354 / 2015
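  • In this study, we developed visibility forecast guidance based on support vector regression (SVR) with feature selection, in order to forecast fog from observational data. A genetic algorithm-based optimization method that selects the required predictors in advance is applied to choose, from the many observed meteorological variables, the inputs actually needed to predict visibility. Input variables and a prediction model are configured for fog occurrence at each site, so that the forecast provides information optimized for each site rather than relying on a single integrated model. Owing to the way the data are collected, visibility is predicted at 3-hour intervals for a 3-hour forecast. To verify the prediction model, we compared it with the visibility forecasts of the operational numerical model at the times when fog actually occurred, and the results were favorable. The study aims to provide real-time guidance through a display system that applies the prediction model and presents the predicted visibility on a map.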
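Genetic algorithm-based feature selection, as described in the abstract above, can be sketched as a search over bit masks. This is a generic illustration under stated assumptions, not the authors' method: the function `ga_select_features`, its parameters, and the toy fitness function are hypothetical, and in practice the fitness would be something like the negated cross-validated error of an SVR trained on the selected inputs.

```python
import random


def ga_select_features(n_features, fitness, pop_size=20, generations=30,
                       mutation_rate=0.05, seed=0):
    """Search for a 0/1 feature mask that maximizes fitness(mask).

    Requires n_features >= 2 (needed for one-point crossover).
    """
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_features)]
           for _ in range(pop_size)]
    best = max(pop, key=fitness)

    def pick():
        # Binary tournament: keep the fitter of two random individuals.
        a, b = rng.sample(pop, 2)
        return a if fitness(a) >= fitness(b) else b

    for _ in range(generations):
        children = [best[:]]  # elitism: carry the best mask over unchanged
        while len(children) < pop_size:
            p1, p2 = pick(), pick()
            cut = rng.randrange(1, n_features)  # one-point crossover
            child = p1[:cut] + p2[cut:]
            # Flip each bit with a small probability.
            child = [bit ^ 1 if rng.random() < mutation_rate else bit
                     for bit in child]
            children.append(child)
        pop = children
        best = max(pop, key=fitness)
    return best, fitness(best)


def toy_fitness(mask):
    # Hypothetical stand-in for model quality: features 0 and 2 are
    # "useful", and every selected feature costs a small penalty.
    return mask[0] + mask[2] - 0.1 * sum(mask)


mask, score = ga_select_features(6, toy_fitness)
```

Elitism keeps the best mask found so far in every generation, so the returned score never gets worse as the number of generations grows.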

Optimization Methodology Integrated Data Mining and Statistical Method (데이터 마이닝과 통계적 기법을 통합한 최적화 기법)

  • Song, Suh-Ill;Shin, Sang-Mun;Jung, Hey-Jin
    • Journal of Korean Society for Quality Management / v.34 no.4 / pp.33-39 / 2006
  • Manufacturing technology and the manufacturing environment are changing rapidly. With the development of computers and the spread of technology, most manufacturing fields have been computerized. To win in international competition, it is important for companies to extract useful information quickly from vast amounts of data. Statistical process control (SPC) techniques have been used as a problem-solving tool in manufacturing processes until now. However, these statistical methods have not been applied more extensively because they have many restrictions in realistic problems, and they run into difficulties when large amounts of data and many factors are analyzed. In this paper, we propose a more practical and efficient statistical design technique that integrates data mining (DM) and statistical methods as an alternative. The first step selects significant factors from manufacturing process data containing many factors, using a DM feature selection algorithm. The second step finds the process optimum after estimating the response function through response surface methodology (RSM), a statistical technique.

Evaluation on Performance for Classification of Students Leaving Their Majors Using Data Mining Technique (데이터마이닝 기법을 이용한 전공이탈자 분류를 위한 성능평가)

  • Leem, Young-Moon;Ryu, Chang-Hyun
    • Proceedings of the Safety Management and Science Conference / 2006.11a / pp.293-297 / 2006
  • Recently, most universities have been suffering from students leaving their majors, and many are trying to find a proper countermeasure to reduce the major separation rate. As a similar endeavor, this paper uses the decision tree algorithm, a data mining technique that performs grouping or prediction by dividing a group of interest into several sub-groups. This technique can analyze the characteristics of the types of students who leave their majors. The dataset consists of 5,115 features obtained through data selection from a total data set of 13,346 collected from a university in Kangwon-Do over seven years (2000.3.1 to 2006.6.30). The main objective of this study is to evaluate the performance of algorithms including CHAID, CART, and C4.5 for classifying students who leave their majors, using the ROC chart, lift chart, and gains chart. The study also reports accuracy, sensitivity, and specificity values computed from the classification table. According to the analysis, CART showed the best performance for classifying students who leave their majors.
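The accuracy, sensitivity, and specificity values mentioned in the abstract above follow directly from the four cells of a binary classification table. The sketch below shows the standard definitions; the function name and the counts are made up for illustration and are not taken from the paper.

```python
def classification_metrics(tp, fn, fp, tn):
    """Compute metrics from a 2x2 classification table.

    tp: positives classified as positive   fn: positives classified as negative
    fp: negatives classified as positive   tn: negatives classified as negative
    """
    total = tp + fn + fp + tn
    return {
        "accuracy": (tp + tn) / total,      # share of all cases classified correctly
        "sensitivity": tp / (tp + fn),      # share of actual positives detected
        "specificity": tn / (tn + fp),      # share of actual negatives detected
    }


# Hypothetical counts: 50 students who left their major, 150 who stayed.
metrics = classification_metrics(tp=40, fn=10, fp=20, tn=130)
```

With these made-up counts, accuracy is 170/200, sensitivity is 40/50, and specificity is 130/150.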

Improvement of Face Recognition Speed Using Pose Estimation (얼굴의 자세추정을 이용한 얼굴인식 속도 향상)

  • Choi, Sun-Hyung;Cho, Seong-Won;Chung, Sun-Tae
    • Journal of the Korean Institute of Intelligent Systems / v.20 no.5 / pp.677-682 / 2010
  • This paper addresses a method of roughly estimating human pose by comparing the Haar-wavelet values learned in face detection using the AdaBoost algorithm. We also present its application to face recognition. The learned weak classifiers are used to select Haar-wavelets that are robust to each pose's features by comparing their coefficients during face detection. The Mahalanobis distance is used to measure the degree of matching in Haar-wavelet selection. When a facial image is detected using the selected Haar-wavelets, the pose is estimated. The proposed pose estimation can be used to improve face recognition speed. Experiments are conducted to evaluate the performance of the proposed method for pose estimation.
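The Mahalanobis distance named in the abstract above measures how far a point lies from a distribution's mean, scaled by the covariance. The following is a minimal two-dimensional sketch of the standard formula, sqrt((x - m)^T C^-1 (x - m)); the function name and the sample values are assumptions for illustration, and the paper's actual feature dimensions are not stated in the abstract.

```python
from math import sqrt


def mahalanobis_2d(x, mean, cov):
    """Mahalanobis distance of a 2-D point x from a distribution
    with the given mean and 2x2 covariance matrix."""
    (a, b), (c, d) = cov
    det = a * d - b * c
    if det == 0:
        raise ValueError("covariance matrix is singular")
    # Closed-form inverse of a 2x2 matrix.
    inv = ((d / det, -b / det), (-c / det, a / det))
    dx = (x[0] - mean[0], x[1] - mean[1])
    # Quadratic form dx^T * inv * dx.
    q = (dx[0] * (inv[0][0] * dx[0] + inv[0][1] * dx[1])
         + dx[1] * (inv[1][0] * dx[0] + inv[1][1] * dx[1]))
    return sqrt(q)


# With an identity covariance the result equals the Euclidean distance.
d_identity = mahalanobis_2d((3.0, 4.0), (0.0, 0.0), ((1.0, 0.0), (0.0, 1.0)))
# A larger variance along the first axis shrinks distances along it.
d_scaled = mahalanobis_2d((2.0, 0.0), (0.0, 0.0), ((4.0, 0.0), (0.0, 1.0)))
```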

Calculating Attribute Weights in K-Nearest Neighbor Algorithms using Information Theory (정보이론을 이용한 K-최근접 이웃 알고리즘에서의 속성 가중치 계산)

  • Lee, Chang-Hwan
    • Journal of KIISE:Software and Applications / v.32 no.9 / pp.920-926 / 2005
  • Nearest neighbor algorithms classify an unseen input instance by selecting similar cases and using the discovered membership to make predictions about the unknown features of the input instance. The usefulness of nearest neighbor algorithms has been demonstrated sufficiently in many real-world domains. In nearest neighbor algorithms, assigning proper weights to the attributes is an important issue. Therefore, in this paper, we propose a new method that automatically assigns to each attribute a weight reflecting its importance with respect to the target attribute. The method has been implemented as a computer program, and its effectiveness has been tested on a number of publicly available machine learning databases.
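One way to picture information-theoretic attribute weighting in a nearest neighbor classifier is sketched below. The abstract does not state which measure the paper uses, so mutual information between each attribute and the class is an assumption made here for illustration; the function names and the tiny dataset are hypothetical as well.

```python
from collections import Counter
from math import log2


def mutual_information(xs, ys):
    """Mutual information (in bits) between two discrete sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum((c / n) * log2((c / n) / ((px[x] / n) * (py[y] / n)))
               for (x, y), c in pxy.items())


def attribute_weights(rows, labels):
    """One weight per attribute: its mutual information with the class."""
    return [mutual_information([r[j] for r in rows], labels)
            for j in range(len(rows[0]))]


def knn_predict(rows, labels, query, weights, k=3):
    """Classify query by majority vote of its k nearest rows, using a
    weighted mismatch count as the distance for categorical attributes."""
    def dist(r):
        return sum(w for w, a, b in zip(weights, r, query) if a != b)
    nearest = sorted(range(len(rows)), key=lambda i: dist(rows[i]))[:k]
    return Counter(labels[i] for i in nearest).most_common(1)[0][0]


# Toy data: attribute 0 determines the class, attribute 1 is irrelevant.
rows = [("a", "x"), ("a", "y"), ("b", "x"), ("b", "y")]
labels = ["pos", "pos", "neg", "neg"]
weights = attribute_weights(rows, labels)
prediction = knn_predict(rows, labels, ("a", "y"), weights, k=3)
```

In this toy data the irrelevant attribute receives a weight of zero, so it no longer influences which neighbors are considered closest.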