• Title/Abstract/Keywords: Machine-learning Feature

705 search results (processing time: 0.03 s)

Machine Learning Based Malware Detection Using API Call Time Interval

  • 조영민;권헌영
    • 정보보호학회논문지 / Vol. 30, No. 1 / pp. 51-58 / 2020
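  • Malware has been used in cyber threats in every era, and it will remain a major attack method even as IT advances. Research on detecting malware has therefore been pursued continuously and in many different ways. Recently, as AI-related technology has advanced, much of this research has also been applied to malware detection. This study proposes a method that generates features centered on the interval between successive API calls in dynamic-analysis data, i.e., the time interval, and applies them to machine learning techniques to detect malware.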
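
The abstract above does not say which statistics are derived from the API call time intervals, so the following is only a minimal sketch of the general idea. The function name `interval_features`, the chosen summary statistics, and the sample timestamps are all assumptions made for illustration.

```python
from statistics import mean, pstdev


def interval_features(timestamps):
    """Summarize the gaps between consecutive API calls (timestamps in seconds)."""
    ordered = sorted(timestamps)
    # A time interval is the difference between a call and the one before it.
    gaps = [later - earlier for earlier, later in zip(ordered, ordered[1:])]
    if not gaps:
        # Fewer than two calls: there is no interval to measure.
        return {"count": 0, "mean": 0.0, "std": 0.0, "min": 0.0, "max": 0.0}
    return {
        "count": len(gaps),
        "mean": mean(gaps),
        "std": pstdev(gaps),  # population standard deviation of the gaps
        "min": min(gaps),
        "max": max(gaps),
    }


# Hypothetical trace: five API calls logged during dynamic analysis.
features = interval_features([0.0, 0.5, 1.5, 1.75, 4.0])
```

A feature dictionary of this kind, computed per sample, could then be passed to any classifier; which model the study used is not stated in the abstract.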

Operating Voltage Prediction in Mobile Semiconductor Manufacturing Process Using Machine Learning

  • 백인환;장승우;김광수
    • 반도체디스플레이기술학회지 / Vol. 22, No. 1 / pp. 124-128 / 2023
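  • This study set out to predict the operating voltage from the various process data obtained during semiconductor mass production, with the aim of making more energy-efficient products. The voltage was difficult to predict from any single feature alone; machine learning, and in particular an ensemble model, gave more accurate predictions than a single model. A more important implication comes from the feature importance analysis, which identified the features with large and small influence on the model's predictions. By adding measurements similar to the high-influence features and addressing the problems of the low-influence features, the study laid the groundwork for a more accurate model for next-generation products.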

Research Trends Analysis of Machine Learning and Deep Learning: Focused on the Topic Modeling

  • 김창식;김남규;곽기영
    • 디지털산업정보학회논문지 / Vol. 15, No. 2 / pp. 19-28 / 2019
  • The purpose of this study is to examine trends in machine learning and deep learning research published in journals in the Web of Science database. To this end, we used the abstracts of 20,664 articles published between 1990 and 2017 whose titles include the terms 'machine learning', 'deep learning', and 'artificial neural network'. Topic modeling identified twenty major research topics: classification accuracy, machine learning, optimization problem, time series model, temperature flow, engine variable, neuron layer, spectrum sample, image feature, strength property, extreme machine learning, control system, energy power, cancer patient, descriptor compound, fault diagnosis, soil map, concentration removal, protein gene, and job problem. Time-series linear regression analysis showed that all identified topics in machine learning research were 'hot' ones.

A Practical Feature Extraction for Improving Accuracy and Speed of IDS Alerts Classification Models Based on Machine Learning

  • 신익수;송중석;최장원;권태웅
    • 정보보호학회논문지 / Vol. 28, No. 2 / pp. 385-395 / 2018
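  • With the growth of the Internet, cyber attacks that exploit various vulnerabilities continue to increase. Intrusion detection systems (IDS) are widely used to detect such activity, but the large volume of false positives they generate (security events in which normal traffic is wrongly flagged as an attack) remains an unsolved problem. Automatic classification with machine learning algorithms is being studied as a way to address IDS false positives, but further work on accuracy and data-processing speed is needed before it can be applied in the field. The performance of a machine-learning-based classification model is determined by many factors. Selecting optimal features greatly affects the model's classification performance and accuracy, so it is a very important part of machine learning. To improve the performance of security-event classification models, this paper proposes 10 new features in addition to the basic features proposed in earlier work. The 10 new features were devised from the know-how of expert staff at an actual security operations center; they not only improve classification performance but can also be extracted directly from a single security event, which makes a real-time model feasible. Using data collected in a real network environment, the paper verifies the effect of the proposed features on classification performance and confirms that they improve classification accuracy and lower the false-positive rate.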

A Novel Feature Selection Approach to Classify Breast Cancer Drug using Optimized Grey Wolf Algorithm

  • Shobana, G.;Priya, N.
    • International Journal of Computer Science & Network Security / Vol. 22, No. 9 / pp. 258-270 / 2022
  • Over the past two decades cancer has become a common disease throughout the globe, and there has been a significant increase in cancer among women. Breast and ovarian cancers are more prevalent among women. The majority of patients approach a physician only during the final stage of the disease, and early diagnosis of cancer remains a great challenge for researchers. Although new drugs are synthesized very often, their multiple benefits are less investigated. With millions of drugs synthesized and their data accessible through open repositories, drug repurposing can be done using machine learning techniques. This paper proposes a novel feature selection technique that generates multiple populations for the grey wolf algorithm and classifies breast cancer drugs efficiently. A leukemia drug dataset was also investigated, on which the Multilayer Perceptron achieved 96% prediction accuracy. Three supervised machine learning algorithms, namely the Random Forest classifier, Multilayer Perceptron, and Support Vector Machine, were applied, and the Multilayer Perceptron had the highest accuracy rate, 97.7%, for breast cancer drug classification.

Enhancing prediction accuracy of concrete compressive strength using stacking ensemble machine learning

  • Yunpeng Zhao;Dimitrios Goulias;Setare Saremi
    • Computers and Concrete / Vol. 32, No. 3 / pp. 233-246 / 2023
  • Accurate prediction of concrete compressive strength can minimize the need for extensive, time-consuming, and costly mixture optimization testing and analysis. This study attempts to enhance the prediction accuracy of compressive strength using stacking ensemble machine learning (ML) with feature engineering techniques. Seven alternative ML models of increasing complexity were implemented and compared: linear regression, SVM, decision tree, multilayer perceptron, random forest, XGBoost, and AdaBoost. To further improve the prediction accuracy, an ML pipeline was proposed in which the feature engineering technique was implemented and a two-layer stacked model was developed. The k-fold cross-validation approach was employed to optimize model parameters and train the stacked model. The stacked model showed superior performance in predicting concrete compressive strength, with a coefficient of determination (R²) of 0.985. Feature (i.e., variable) importance was determined to demonstrate how useful the synthetic features are in prediction and to provide better interpretability of the data and the model. The methodology in this study promotes a more thorough assessment of alternative ML algorithms rather than a focus on any single ML model type for concrete compressive strength prediction.
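
Two ingredients named in the abstract above, the coefficient of determination and the k-fold split, can be sketched in plain Python. This is not the study's pipeline: the function names, the contiguous unshuffled folds, and the sample strength values are assumptions chosen only to make the definitions concrete.

```python
def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all true values are equal")
    return 1.0 - ss_res / ss_tot


def k_fold_indices(n_samples, k):
    """Yield (train, validation) index lists for k contiguous, unshuffled folds."""
    start = 0
    for fold in range(k):
        # Spread the remainder over the first n_samples % k folds.
        size = n_samples // k + (1 if fold < n_samples % k else 0)
        validation = list(range(start, start + size))
        train = [i for i in range(n_samples) if not start <= i < start + size]
        yield train, validation
        start += size


# Hypothetical measured and predicted strengths, for illustration only.
measured = [30.0, 40.0, 50.0]
predicted = [32.0, 40.0, 48.0]
score = r2_score(measured, predicted)
folds = list(k_fold_indices(5, 2))
```

In a stacked model, each base learner would typically be fitted on the `train` indices of a fold and asked to predict the `validation` indices, and those out-of-fold predictions would become the inputs of the second-layer model. Whether the study follows exactly this scheme is not stated in the abstract.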

Development of a Machine-Learning based Human Activity Recognition System including Eastern-Asian Specific Activities

  • Jeong, Seungmin;Choi, Cheolwoo;Oh, Dongik
    • 인터넷정보학회논문지 / Vol. 21, No. 4 / pp. 127-135 / 2020
  • The purpose of this study is to develop a human activity recognition (HAR) system that distinguishes 13 activities: five activities commonly dealt with in conventional HAR research and eight activities from Eastern-Asian culture. The eight special activities are floor-sitting/standing, chair-sitting/standing, floor-lying/up, and bed-lying/up. We used a 3-axis accelerometer sensor on the wrist for data collection and designed a machine learning model for activity classification. Data clustering through preprocessing and feature extraction/reduction is performed. We then tested six machine learning algorithms to compare recognition accuracy. As a result, we achieved an average accuracy of 99.7% for the 13 activities, far better than the average accuracy of current smartwatch-based HAR research (89.4%). The system's performance is further supported by the 98.7% accuracy it achieved on the publicly available 'pamap2' dataset of 12 activities, for which the best previously reported accuracy is 96.6%.

Identification of Tea Diseases Based on Spectral Reflectance and Machine Learning

  • Zou, Xiuguo;Ren, Qiaomu;Cao, Hongyi;Qian, Yan;Zhang, Shuaitang
    • Journal of Information Processing Systems / Vol. 16, No. 2 / pp. 435-446 / 2020
  • With the ability to learn rules from training data, the machine learning model can classify unknown objects. At the same time, the dimension of hyperspectral data is usually large, which may cause an over-fitting problem. In this research, an identification methodology of tea diseases was proposed based on spectral reflectance and machine learning, including the feature selector based on the decision tree and the tea disease recognizer based on random forest. The proposed identification methodology was evaluated through experiments. The experimental results showed that the recall rate and the F1 score were significantly improved by the proposed methodology in the identification accuracy of tea disease, with average values of 15%, 7%, and 11%, respectively. Therefore, the proposed identification methodology could make relatively better feature selection and learn from high dimensional data so as to achieve the non-destructive and efficient identification of different tea diseases. This research provides a new idea for the feature selection of high dimensional data and the non-destructive identification of crop diseases.
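
For reference, the recall rate and F1 score named in the abstract above can be computed as follows. This is a minimal sketch of the standard definitions for one class treated as positive; the function name, the class labels, and the sample predictions are hypothetical and are not taken from the study.

```python
def precision_recall_f1(y_true, y_pred, positive):
    """Precision, recall and F1 for one class treated as the positive label."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    # Fall back to 0.0 when a denominator would be zero.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Hypothetical labels for four leaf samples.
truth = ["diseased", "diseased", "healthy", "healthy"]
guess = ["diseased", "healthy", "healthy", "diseased"]
precision, recall, f1 = precision_recall_f1(truth, guess, positive="diseased")
```

With several disease classes, these per-class values would usually be averaged; the abstract does not say which averaging scheme was used.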

Compositional Feature Selection and Its Effects on Bandgap Prediction by Machine Learning

  • 남충희
    • 한국재료학회지 / Vol. 33, No. 4 / pp. 164-174 / 2023
  • The bandgap characteristics of semiconductor materials are an important factor when utilizing them for various applications. In this study, based on data provided by AFLOW (Automatic-FLOW for Materials Discovery), the bandgap of a semiconductor material was predicted using only the material's compositional features. The compositional features were generated using the Python modules 'Pymatgen' and 'Matminer'. Pearson's correlation coefficients (PCC) between the compositional features were calculated, and features with a correlation coefficient larger than 0.95 were removed in order to avoid overfitting. The bandgap prediction performance was compared using the R² score and the root-mean-squared error. With random forest and XGBoost as representative ensemble algorithms, XGBoost was found to give better results after cross-validation and hyper-parameter tuning. To investigate the effect of compositional feature selection on the bandgap prediction of the machine learning model, the prediction performance was studied as a function of the number of features, based on feature importance methods. It was found that there were no significant changes in prediction performance beyond an appropriate number of features. Furthermore, artificial neural networks were employed to compare the prediction performance by adjusting the number of features guided by the PCC values, resulting in the best R² score of 0.811. By comparing and analyzing the bandgap distribution and prediction performance according to the material group containing specific elements (F, N, Yb, Eu, Zn, B, Si, Ge, Fe, Al), various information for material design was obtained.
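
The correlation filter described in the abstract above (drop features whose Pearson correlation exceeds 0.95) can be sketched as below. The abstract does not say which member of a correlated pair is dropped or whether the absolute value is used, so the choices here (keep the earlier feature, compare the absolute value) are assumptions, as are the function names and the toy data. Columns are assumed to be non-constant.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length, non-constant lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)


def drop_correlated(features, threshold=0.95):
    """Return the names of features kept after the correlation filter.

    `features` maps a name to its list of values. A feature is kept only if
    its |PCC| with every previously kept feature is at most `threshold`.
    """
    kept = []
    for name, values in features.items():
        if all(abs(pearson(values, features[k])) <= threshold for k in kept):
            kept.append(name)
    return kept


# Hypothetical compositional features for four materials.
table = {
    "feature_a": [1.0, 2.0, 3.0, 4.0],
    "feature_b": [2.0, 4.0, 6.0, 8.1],  # nearly a multiple of feature_a
    "feature_c": [4.0, 1.0, 3.0, 2.0],
}
kept = drop_correlated(table)
```

Here `feature_b` is removed because it is almost perfectly correlated with `feature_a`, while `feature_c` (correlation -0.4 with `feature_a`) is retained.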

New Temporal Features for Cardiac Disorder Classification by Heart Sound

  • 곽철;권오욱
    • 한국음향학회지 / Vol. 29, No. 2 / pp. 133-140 / 2010
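  • Cardiac disorder classification is improved by adding new time-domain features extracted from continuous heart sound signals. To the mel-frequency cepstral coefficients (MFCC), the cepstral-domain features in conventional use, three new kinds of time-domain features are added: the heart sound envelope, the murmur probability vector, and the murmur amplitude variation. In cardiac disorder classification and detection experiments, the contribution of the time-domain features to classification accuracy is evaluated, and the time-domain features are selected with a sequential feature selection method. The selected features are shown to improve classification accuracy meaningfully and consistently for neural-network pattern classifiers such as the multilayer perceptron (MLP), support vector machine (SVM), and extreme learning machine (ELM).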
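
The abstract above mentions sequential feature selection without giving details. The sketch below shows the greedy forward variant, which is an assumption, since the abstract does not say whether selection runs forward or backward. The function name, the feature names, and the toy scoring table are hypothetical; in practice `score` would be something like cross-validated classification accuracy.

```python
def sequential_forward_selection(candidates, score, max_features=None):
    """Greedy forward selection.

    Repeatedly add the candidate that raises score(subset) the most and stop
    as soon as no candidate improves on the current best score.
    """
    selected = []
    best = score(selected)
    remaining = list(candidates)
    while remaining and (max_features is None or len(selected) < max_features):
        trials = [(score(selected + [f]), f) for f in remaining]
        top_score, top_feature = max(trials, key=lambda trial: trial[0])
        if top_score <= best:
            break  # no remaining feature helps
        selected.append(top_feature)
        remaining.remove(top_feature)
        best = top_score
    return selected, best


# Toy stand-in for classifier accuracy: each feature adds a fixed gain.
GAIN = {"feature_a": 5, "feature_b": 3, "feature_c": 0, "feature_d": -1}


def toy_score(subset):
    return sum(GAIN[name] for name in subset)


chosen, final_score = sequential_forward_selection(GAIN, toy_score)
```

With these made-up gains the procedure picks `feature_a`, then `feature_b`, and stops because neither remaining feature raises the score.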