• Title/Abstract/Keywords: feature importance

422 search results (processing time: 0.033 s)

A gradient boosting regression based approach for energy consumption prediction in buildings

  • Bataineh, Ali S. Al
    • Advances in Energy Research / Vol. 6, No. 2 / pp. 91-101 / 2019
  • This paper proposes an efficient data-driven approach to building models for predicting energy consumption in buildings. The data used in this research were collected by installing humidity and temperature sensors at different locations in a building. In addition, weather data from a nearby weather station are included in the dataset to study the impact of weather conditions on energy consumption. One main emphasis of this research is to make feature selection independent of domain knowledge. Therefore, to extract useful features from the data, two approaches are tested: feature selection through principal component analysis, and relative importance-based feature selection in the original domain. The regression model used is gradient boosting regression, and its optimal parameters are chosen through a two-stage coarse-fine search. To evaluate model performance, metrics such as the r2-score and root mean squared error are used. The results show that the best performance is achieved when relative importance-based feature selection is used with the gradient boosting regressor. The proposed technique also outperforms support vector machine and neural network-based approaches tested on the same dataset.
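
The two-stage coarse-fine search mentioned in this abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' procedure: it searches a single parameter, the grid sizes are arbitrary, and `toy_validation_error` is a hypothetical stand-in for "train a gradient boosting regressor and measure validation RMSE".

```python
from typing import Callable, List, Tuple


def linspace(low: float, high: float, num: int) -> List[float]:
    """Return `num` evenly spaced values from low to high (inclusive)."""
    if num == 1:
        return [low]
    step = (high - low) / (num - 1)
    return [low + i * step for i in range(num)]


def coarse_fine_search(
    objective: Callable[[float], float],
    low: float,
    high: float,
    coarse_points: int = 5,   # assumed grid size, must be >= 2
    fine_points: int = 11,    # assumed grid size
) -> Tuple[float, float]:
    """Minimise `objective` over [low, high] in two stages."""
    # Stage 1: evaluate a sparse grid covering the whole range.
    coarse_grid = linspace(low, high, coarse_points)
    best_coarse = min(coarse_grid, key=objective)

    # Stage 2: evaluate a dense grid spanning one coarse step
    # on either side of the best coarse point.
    spacing = (high - low) / (coarse_points - 1)
    fine_low = max(low, best_coarse - spacing)
    fine_high = min(high, best_coarse + spacing)
    fine_grid = linspace(fine_low, fine_high, fine_points)
    best_fine = min(fine_grid, key=objective)
    return best_fine, objective(best_fine)


def toy_validation_error(learning_rate: float) -> float:
    """Hypothetical validation error with its minimum at 0.13."""
    return (learning_rate - 0.13) ** 2 + 0.5


best_lr, best_err = coarse_fine_search(toy_validation_error, 0.01, 0.41)
print(round(best_lr, 4), round(best_err, 4))
```

The coarse pass costs only a handful of evaluations, so the dense pass can be spent where the error is already known to be low. With several parameters the same idea applies to a grid over their combinations.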

A Deep Learning Application for Automated Feature Extraction in Transaction-based Machine Learning

  • 우덕채;문현실;권순범;조윤호
    • 한국IT서비스학회지 / Vol. 18, No. 2 / pp. 143-159 / 2019
  • Machine learning (ML) is a method of fitting given data to a mathematical model to derive insights or make predictions. In the age of big data, where the amount of available data increases exponentially due to the development of information technology and smart devices, ML shows high prediction performance due to pattern detection without bias. Feature engineering, which generates the features that can explain the problem to be solved, has a great influence on ML performance, and its importance is continuously emphasized. Despite this importance, it is still considered a difficult task because it requires a thorough understanding of the domain characteristics and the source data, as well as an iterative procedure. Therefore, we propose methods that apply deep learning to reduce the complexity and difficulty of feature extraction and to improve the performance of ML models. The most common reason for the superior performance of deep learning techniques in processing complex unstructured data is that they can extract features from the source data itself. To apply this advantage to business problems, we propose deep learning-based methods that can automatically extract features from transaction data or directly predict and classify target variables. In particular, based on the structural similarity between transaction data and text data, we applied techniques that show high performance in text processing. We also verified the suitability of each method according to the characteristics of transaction data. Our study makes it possible not only to explore the possibility of automated feature extraction but also to obtain a benchmark model that shows a certain level of performance before a human performs the feature extraction task. In addition, it is expected to provide guidelines for choosing a suitable deep learning model based on the business problem and the data characteristics.
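
The structural similarity between transaction data and text that this abstract relies on can be illustrated by encoding each customer's purchase sequence the way a sentence is encoded for a text model. This is a hedged sketch only: the abstract does not describe the encoding, so the item names, the `PAD`/`UNK` ids, and the helper functions are all hypothetical.

```python
from typing import Dict, List

PAD, UNK = 0, 1  # assumed reserved ids for padding and unseen items


def build_vocab(sequences: List[List[str]]) -> Dict[str, int]:
    """Assign an integer id to every distinct item, in order of first appearance."""
    vocab: Dict[str, int] = {}
    for seq in sequences:
        for item in seq:
            if item not in vocab:
                vocab[item] = len(vocab) + 2  # ids 0 and 1 are reserved
    return vocab


def encode(seq: List[str], vocab: Dict[str, int], max_len: int) -> List[int]:
    """Map items to ids, truncate to max_len, and right-pad with PAD."""
    ids = [vocab.get(item, UNK) for item in seq][:max_len]
    return ids + [PAD] * (max_len - len(ids))


# Hypothetical transaction histories, one list per customer.
transactions = [["coffee", "bagel", "coffee"], ["fuel", "coffee"]]
vocab = build_vocab(transactions)
encoded = [encode(seq, vocab, max_len=4) for seq in transactions]
print(encoded)  # [[2, 3, 2, 0], [4, 2, 0, 0]]
```

Fixed-length integer sequences like these are the usual input to an embedding layer, which is what lets text-oriented architectures learn features from the raw sequence rather than from hand-built aggregates.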

A DDoS Attack Detection Technique through CNN Model in Software Define Network

  • 고광만
    • 한국정보전자통신기술학회논문지 / Vol. 13, No. 6 / pp. 605-610 / 2020
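  • Software-defined networking (SDN) is becoming a standard in network management thanks to its scalability, flexibility, and network programmability. Despite these advantages, it has the problem that a cyberattack on a single controller affects the entire network. DDoS attacks on the controller are a representative case, and a variety of attack detection techniques are being studied. In this paper, we first obtained a dataset of 84 DDoS attack features from Kaggle, then used the Permutation Feature Importance algorithm to select the top 20 features by importance, and performed training and validation with a deep learning-based CNN model. As a result, selecting the top 13 DDoS features, which gave the optimal attack detection rate, maintained a DDoS attack detection rate of 96% while yielding very good results in terms of detection time, accuracy, and other measures.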
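
Permutation feature importance, as used in this abstract for ranking features, can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the toy data, the threshold "model", the repeat count, and the seed are all assumptions, and accuracy is used as the score.

```python
import random
from typing import Callable, List


def accuracy(predict: Callable[[List[float]], int],
             X: List[List[float]], y: List[int]) -> float:
    """Fraction of rows for which the prediction equals the label."""
    correct = sum(1 for row, label in zip(X, y) if predict(row) == label)
    return correct / len(y)


def permutation_importance(predict: Callable[[List[float]], int],
                           X: List[List[float]], y: List[int],
                           n_repeats: int = 5, seed: int = 0) -> List[float]:
    """Mean drop in accuracy when each feature column is shuffled."""
    rng = random.Random(seed)
    baseline = accuracy(predict, X, y)
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            # Shuffle column j only, breaking its link to the labels.
            column = [row[j] for row in X]
            rng.shuffle(column)
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            drops.append(baseline - accuracy(predict, X_perm, y))
        importances.append(sum(drops) / n_repeats)
    return importances


def top_k_features(importances: List[float], k: int) -> List[int]:
    """Indices of the k features with the largest importance."""
    ranked = sorted(range(len(importances)),
                    key=lambda j: importances[j], reverse=True)
    return ranked[:k]


# Toy data: only feature 0 determines the label.
X = [[i / 20, (i * 7 % 20) / 20, (i * 3 % 20) / 20] for i in range(20)]
y = [1 if row[0] > 0.5 else 0 for row in X]


def toy_model(row: List[float]) -> int:
    """Hypothetical trained classifier that looks only at feature 0."""
    return 1 if row[0] > 0.5 else 0


importances = permutation_importance(toy_model, X, y)
print(top_k_features(importances, 1))  # [0]
```

Features the model ignores score exactly zero here, because shuffling them cannot change any prediction. In the setting of the abstract, the same ranking would be cut at the top 20 (or top 13) features before training the CNN.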

Analysis of the Feature Importance of Occupational Accidents Occurring at Construction Sites on the Severity of Lost Workdays

  • 강경수;최재현;류한국
    • 한국건축시공학회지 / Vol. 21, No. 2 / pp. 165-174 / 2021
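  • The construction industry produces the most accidents and fatalities of all industrial sectors. Although great efforts have been made to reduce construction safety accidents, there has been very little research on lost workdays, that is, the time it takes a worker to recover and return to work after a non-fatal accident. This study therefore defines lost workdays as severity, proposes a model to classify it, derives feature importance from the trained model, and analyzes the important features. We interpreted the learning process of random forest, a black-box model, and used the extracted feature importance to identify the key variables that influence the severity of lost workdays. The underlying factors were then analyzed through the extracted features. The purpose of this study is to analyze accident case data from construction sites using a random forest model. Deriving and systematically managing the important features that affect the severity of lost workdays can help prevent construction accidents.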

Real-Time Locomotion Mode Recognition Employing Correlation Feature Analysis Using EMG Pattern

  • Kim, Deok-Hwan;Cho, Chi-Young;Ryu, Jaehwan
    • ETRI Journal / Vol. 36, No. 1 / pp. 99-105 / 2014
  • This paper presents a new locomotion mode recognition method based on a transformed correlation feature analysis using an electromyography (EMG) pattern. Each movement is recognized using six weighted subcorrelation filters, which are applied to the correlation feature analysis through the use of six time-domain features. The proposed method has a high recognition rate because it reflects the importance of the different features according to the movements, and it can recognize EMG patterns in real time owing to the rapid execution of the correlation feature analysis. The experimental results show that the discriminating power of the proposed method is 85.89% (±2.5) when walking on a level surface, 96.47% (±0.9) when going up stairs, and 96.37% (±1.3) when going down stairs for given normal movement data. Its accuracy and stability are therefore better than those of the principal component analysis and linear discriminant analysis methods.

Residual Learning Based CNN for Gesture Recognition in Robot Interaction

  • Han, Hua
    • Journal of Information Processing Systems / Vol. 17, No. 2 / pp. 385-398 / 2021
  • The complexity of deep learning models affects the real-time performance of gesture recognition, thereby limiting the application of gesture recognition algorithms in actual scenarios. Hence, a residual learning neural network based on a deep convolutional neural network is proposed. First, small convolution kernels are used to extract the local details of gesture images. Subsequently, a shallow residual structure is built to share weights, thereby avoiding vanishing or exploding gradients as the network deepens; consequently, model optimisation becomes easier. Additional convolutional neural networks are used to accelerate the refinement of deep abstract features based on the spatial importance of the gesture feature distribution. Finally, a fully connected cascade softmax classifier is used to complete the gesture recognition. Compared with the dense connection multiplexing feature information network, the proposed algorithm is optimised in feature multiplexing to avoid performance fluctuations caused by feature redundancy. Experimental results on the ISOGD gesture dataset and the Gesture dataset show that the proposed algorithm affords a fast convergence speed and high accuracy.

Feature selection-based Risk Prediction for Hypertension in Korean men

  • 홍고르출;김미혜
    • Proceedings of the Korea Information Processing Society Conference / Korea Information Processing Society 2021 Spring Conference / pp. 323-325 / 2021
  • In this article, we improve hypertension prediction by applying a feature selection method to Korean national health data from the KNHANES database. The study identified a variety of risk factors associated with chronic hypertension. The paper is divided into two modules. The first is a data pre-processing step that applies a factor analysis (FA) based feature selection method to the dataset. The second applies a predictive analysis step to predict hypertension risk. In this study, we compare the mean standard error (MSE), F1-score, and area under the ROC curve (AUC) for each classification model. The test results show that the proposed FIFA-OE-NB algorithm achieves an MSE, F1-score, and AUC of 0.259, 0.460, and 64.70%, respectively. These results demonstrate that the proposed FIFA-OE method outperforms other models for hypertension risk prediction.
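
Two of the metrics compared in this abstract, the F1-score and the AUC, can be computed as in the sketch below. This is a minimal illustration on made-up labels and scores, not the paper's data or code; the 0.5 decision threshold is an assumption, and the AUC uses the pairwise (rank-based) definition.

```python
from typing import List


def f1_score(y_true: List[int], y_pred: List[int]) -> float:
    """Harmonic mean of precision and recall for the positive class (label 1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def roc_auc(y_true: List[int], scores: List[float]) -> float:
    """Probability that a random positive is scored above a random negative.

    Ties between a positive and a negative count as half a win.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Made-up example: 1 = hypertension, 0 = no hypertension.
y_true = [1, 0, 1, 1, 0, 0]
scores = [0.9, 0.4, 0.6, 0.3, 0.5, 0.2]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed threshold of 0.5

f1 = f1_score(y_true, y_pred)
auc = roc_auc(y_true, scores)
print(round(f1, 4), round(auc, 4))
```

The F1-score depends on the chosen threshold, while the AUC summarises ranking quality across all thresholds, which is why the two can tell different stories about the same classifier.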

Comparison of Feature Selection Methods in Support Vector Machines

  • 김광수;박창이
    • 응용통계연구 / Vol. 26, No. 1 / pp. 131-139 / 2013
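  • Support vector machines can suffer degraded performance in the presence of noise variables. They also have the drawback that the importance of each variable in the final classifier is difficult to determine. Variable selection can therefore improve both the interpretability and the accuracy of support vector machines. Most existing studies in the literature concern variable selection in linear support vector machines through penalty functions that yield sparse solutions. In practice, however, nonlinear kernels are commonly used to improve classification accuracy, so variable selection is equally necessary for nonlinear support vector machines. In this paper, we compare the performance of COSSO (component selection and smoothing operator) and KNIFE (kernel iterative feature extraction), two representative variable selection methods for nonlinear support vector machines, through simulations and real data.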
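
As an illustration of a sparsity-inducing penalty in a linear support vector machine, one well-known form replaces the usual squared-norm penalty with an L1 norm. The notation below is assumed for this sketch (labels $y_i \in \{-1, +1\}$, inputs $x_i \in \mathbb{R}^p$, tuning parameter $\lambda > 0$); the abstract does not say which penalties the cited studies use.

```latex
\min_{\beta_0,\,\beta}\;
  \sum_{i=1}^{n} \bigl[\, 1 - y_i\,(\beta_0 + x_i^{\top}\beta) \,\bigr]_{+}
  \;+\; \lambda\,\lVert \beta \rVert_1,
\qquad
\lVert \beta \rVert_1 = \sum_{j=1}^{p} \lvert \beta_j \rvert,
\qquad
[u]_{+} = \max(u, 0)
```

For large enough $\lambda$, some coefficients $\beta_j$ become exactly zero, so the corresponding variables drop out of the classifier. With a nonlinear kernel the classifier is no longer a weighted sum of the input variables, so this device does not carry over directly, which is the gap that methods such as COSSO and KNIFE address.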

DLDW: Deep Learning and Dynamic Weighing-based Method for Predicting COVID-19 Cases in Saudi Arabia

  • Albeshri, Aiiad
    • International Journal of Computer Science & Network Security / Vol. 21, No. 9 / pp. 212-222 / 2021
  • Multiple waves of COVID-19 have highlighted one crucial aspect of this pandemic worldwide: the factors affecting the spread of COVID-19 infection evolve with various regional and local practices and events. The introduction of vaccines since early 2021 is expected to significantly control and reduce cases. However, virus mutations and new variants have challenged these expectations. Several countries that contained the COVID-19 pandemic successfully in the first wave failed to do the same in the second and third waves. This work focuses on COVID-19 pandemic control and management in Saudi Arabia. It aims to predict new cases with deep learning, using various important factors. The proposed method is called the Deep Learning and Dynamic Weighing-based (DLDW) COVID-19 case prediction method. Special consideration has been given to the evolving factors that are responsible for recent surges in the pandemic. For this purpose, two weights are assigned to each data instance: one based on feature importance and one a dynamic, time-based weight. Older data is given less weight, and newer data more. Feature selection identifies how the factors affecting the rate of new cases have evolved over the period. The DLDW method produced 80.39% prediction accuracy, which is 6.54%, 9.15%, and 7.19% higher than three other classifiers: Deep Learning (DL), Random Forest (RF), and Gradient Boosting Machine (GBM), respectively. Further, for Saudi Arabia, our study implicitly concluded that lockdowns, vaccination, and self-aware restricted mobility of residents are effective tools in controlling and managing the COVID-19 pandemic.
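
The idea of giving older data less weight and combining that with an importance-based weight can be sketched as below. The abstract does not give the DLDW formulas, so everything here is an assumption made for illustration: the half-life decay, the multiplicative combination, the normalisation, and the example numbers.

```python
from typing import List


def time_weights(ages_in_days: List[float],
                 half_life_days: float = 30.0) -> List[float]:
    """Recency weight that halves every `half_life_days` (assumed decay form)."""
    return [0.5 ** (age / half_life_days) for age in ages_in_days]


def combine_weights(importance_weights: List[float],
                    recency_weights: List[float]) -> List[float]:
    """Multiply the two per-instance weights and normalise them to sum to 1."""
    raw = [a * b for a, b in zip(importance_weights, recency_weights)]
    total = sum(raw)
    return [w / total for w in raw]


# Hypothetical instances observed today, 30 days ago, and 60 days ago.
recency = time_weights([0, 30, 60])
print(recency)  # [1.0, 0.5, 0.25]

# Hypothetical importance-based weights for the same three instances.
sample_weights = combine_weights([1.0, 1.0, 2.0], recency)
print([round(w, 3) for w in sample_weights])  # [0.5, 0.25, 0.25]
```

Weights of this kind are typically passed to a training routine as per-sample weights, so recent and informative instances contribute more to the loss.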

Effects of Temporal Distance on Brand Extension Evaluation: Applying the Construal-Level Perspective to Brand Extensions

  • Park, Kiwan
    • Asia Marketing Journal / Vol. 17, No. 1 / pp. 97-121 / 2015
  • In this research, we examine whether and why temporal distance influences evaluations of two different types of brand extensions: concept-based extensions, defined as extensions primarily based on the importance or relevance of brand concepts to extension products; and similarity-based extensions, defined as extensions primarily based on the amount of feature similarity at the product-category level. In Study 1, we test the hypothesis that concept-based extensions are evaluated more favorably when they are framed to launch in the distant rather than in the near future, whereas similarity-based extensions are evaluated more favorably when they are framed to launch in the near rather than in the distant future. In Study 2, we confirm that this time-dependent differential evaluation is driven by the difference in construal level between the bases of the two types of extensions, i.e., brand-concept consistency and product-category feature similarity. As such, we find that concept-based extensions are evaluated more favorably under an abstract than a concrete mindset, whereas similarity-based extensions are evaluated more favorably under a concrete than an abstract mindset. In Study 3, we extend the findings to the case of broad brands (i.e., brands that market products across multiple categories), finding that making accessible a specific product category of a broad parent brand influences evaluations of near-future, but not distant-future, brand extensions. Taken together, our findings suggest that temporal distance influences brand extension evaluation through its effect on the importance placed on brand concepts and feature similarity. That is, consumers rely on different bases to evaluate brand extensions, depending on their perception of when the extensions take place and on the mindset they are in. This research makes theoretical contributions to brand extension research by identifying one important determinant of brand extension evaluation and uncovering its underlying dynamics. It also contributes to expanding the scope of construal level theory by putting forth a novel interpretation of two bases of perceived fit in terms of construal level. Marketers who are about to launch and advertise brand extensions may benefit from considering temporal-distance information when determining what content to deliver about extensions in their communication efforts. The conceptual relation of a parent brand to its extensions needs to be emphasized for the distant future, whereas feature similarity should be highlighted for the near future.