• Title/Summary/Keyword: Machine-learning Feature

An interpretable machine learning approach for forecasting personal heat strain considering the cumulative effect of heat exposure

  • Seo, Seungwon;Choi, Yujin;Koo, Choongwan
    • Korean Journal of Construction Engineering and Management / v.24 no.6 / pp.81-90 / 2023
  • Climate change has resulted in increased frequency and intensity of heat waves, which poses a significant threat to the health and safety of construction workers, particularly those engaged in labor-intensive and heat-stress vulnerable working environments. To address this challenge, this study aimed to propose an interpretable machine learning approach for forecasting personal heat strain by considering the cumulative effect of heat exposure as a situational variable, which has not been taken into account in the existing approach. As a result, the proposed model, which incorporated the cumulative working time along with environmental and personal variables, was found to have superior forecast performance and explanatory power. Specifically, the proposed Multi-Layer Perceptron (MLP) model achieved a Mean Absolute Error (MAE) of 0.034 (℃) and an R-squared of 99.3% (0.933). Feature importance analysis revealed that the cumulative working time, as a situational variable, had the most significant impact on personal heat strain. These findings highlight the importance of systematic management of personal heat strain at construction sites by comprehensively considering the cumulative working time as a situational variable as well as environmental and personal variables. This study provided a valuable contribution to the construction industry by offering a reliable and accurate heat strain forecasting model, enhancing the health and safety of construction workers.
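The two evaluation metrics named in this abstract, Mean Absolute Error (MAE) and R-squared, can be computed as in the minimal sketch below. The function names and the sample temperature values are hypothetical and are not taken from the paper; the formulas are the standard textbook definitions.

```python
from typing import Sequence


def mean_absolute_error(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Average absolute difference between observed and forecast values."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def r_squared(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R-squared is undefined when y_true is constant")
    return 1.0 - ss_res / ss_tot


# Hypothetical body-temperature readings (℃), for illustration only.
observed = [37.0, 37.5, 38.0]
forecast = [37.1, 37.4, 38.0]
print(mean_absolute_error(observed, forecast))
print(r_squared(observed, forecast))
```

An MAE is expressed in the units of the target (here ℃), whereas R-squared is unitless and is often reported either as a fraction or as a percentage.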

CNN-based Android Malware Detection Using Reduced Feature Set

  • Kim, Dong-Min;Lee, Soo-jin
    • Journal of the Korea Society of Computer and Information / v.26 no.10 / pp.19-26 / 2021
  • The performance of deep learning-based malware detection and classification models depends largely on how the feature set used for training is constructed. In this paper, we propose an approach for selecting the optimal feature set to maximize detection performance in CNN-based Android malware detection. The features to be included in the feature set were selected with the Chi-Square test, which is widely used for feature selection in machine learning and deep learning. To validate the proposed approach, a CNN model was trained using 36 features selected for the CICANDMAL2017 dataset, and its malware detection performance was then measured. As a result, an accuracy of 99.99% was achieved in binary classification and 98.55% in multiclass classification.
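Chi-square feature selection of the kind mentioned in this abstract can be sketched as follows. This is an illustration only: it assumes binary (0/1) features and binary class labels and uses the classical Pearson chi-square test of independence on a 2x2 contingency table, which may differ from the exact procedure used in the paper. The function names and the toy data are hypothetical.

```python
from typing import List, Sequence


def chi_square_score(feature: Sequence[int], labels: Sequence[int]) -> float:
    """Pearson chi-square statistic for one binary feature against binary labels."""
    n = len(labels)
    # Observed counts of the 2x2 contingency table, keyed by (feature, label).
    observed = {(f, c): 0 for f in (0, 1) for c in (0, 1)}
    for f, c in zip(feature, labels):
        observed[(f, c)] += 1
    score = 0.0
    for f in (0, 1):
        row_total = observed[(f, 0)] + observed[(f, 1)]
        for c in (0, 1):
            col_total = observed[(0, c)] + observed[(1, c)]
            expected = row_total * col_total / n
            if expected > 0:  # skip empty rows/columns to avoid division by zero
                score += (observed[(f, c)] - expected) ** 2 / expected
    return score


def select_top_k(X: List[List[int]], y: Sequence[int], k: int) -> List[int]:
    """Return the column indices of the k features with the highest scores."""
    n_features = len(X[0])
    scores = [chi_square_score([row[j] for row in X], y) for j in range(n_features)]
    ranked = sorted(range(n_features), key=lambda j: scores[j], reverse=True)
    return sorted(ranked[:k])


# Toy data: feature 0 matches the label exactly, features 1 and 2 are independent of it.
X = [[1, 0, 1], [1, 1, 0], [0, 0, 1], [0, 1, 0]]
y = [1, 1, 0, 0]
print(select_top_k(X, y, 1))
```

A higher score indicates a stronger dependence between the feature and the class, so keeping the top-scoring columns yields a reduced feature set.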

Comparative Analysis for Real-Estate Price Index Prediction Models using Machine Learning Algorithms: LIME's Interpretability Evaluation (기계학습 알고리즘을 활용한 지역 별 아파트 실거래가격지수 예측모델 비교: LIME 해석력 검증)

  • Jo, Bo-Geun;Park, Kyung-Bae;Ha, Sung-Ho
    • The Journal of Information Systems / v.29 no.3 / pp.119-144 / 2020
  • Purpose: Real estate usually accounts for the largest share of the physical assets held by individuals, organizations, and governments, and instability in the real estate market seriously affects the economic condition of each economic subject. Consequently, predicting the real estate market has drawn attention for various reasons, such as financial investment, administrative convenience, and wealth management. In addition, advances in machine learning algorithms and computing hardware have raised expectations for more precise and useful prediction models for the real estate market. Design/methodology/approach: In response to this demand, this paper aims to provide a framework for forecasting the real estate market with machine learning algorithms. The framework consists of demonstrating the prediction efficiency of each machine learning algorithm, interpreting the internal feature effects of the prediction model with a state-of-the-art algorithm, LIME (Local Interpretable Model-agnostic Explanations), and comparing the results across different cities. Findings: This research could not only strengthen the academic base of the information systems and real estate fields but also resolve information asymmetry in the real estate market among economic subjects. It revealed that macroeconomic indicators, real estate-related indicators, and Google Trends search indexes can predict real-estate prices quite well.

Single nucleotide polymorphism marker combinations for classifying Yeonsan Ogye chicken using a machine learning approach

  • Eunjin, Cho;Sunghyun, Cho;Minjun, Kim;Thisarani Kalhari, Ediriweera;Dongwon, Seo;Seung-Sook, Lee;Jihye, Cha;Daehyeok, Jin;Young-Kuk, Kim;Jun Heon, Lee
    • Journal of Animal Science and Technology / v.64 no.5 / pp.830-841 / 2022
  • Genetic analysis has great potential as a tool to differentiate between different species and breeds of livestock. In this study, the optimal combinations of single nucleotide polymorphism (SNP) markers for discriminating the Yeonsan Ogye chicken (Gallus gallus domesticus) breed were identified using high-density 600K SNP array data. In 3,904 individuals from 198 chicken breeds, SNP markers specific to the target population were discovered through a case-control genome-wide association study (GWAS) and filtered out based on the linkage disequilibrium blocks. Significant SNP markers were selected by feature selection applying two machine learning algorithms: Random Forest (RF) and AdaBoost (AB). Using a machine learning approach, the 38 (RF) and 43 (AB) optimal SNP marker combinations for the Yeonsan Ogye chicken population demonstrated 100% accuracy. Hence, the GWAS and machine learning models used in this study can be efficiently utilized to identify the optimal combination of markers for discriminating target populations using multiple SNP markers.

Linear SVM-Based Android Malware Detection and Feature Selection for Performance Improvement (선형 SVM을 사용한 안드로이드 기반의 악성코드 탐지 및 성능 향상을 위한 Feature 선정)

  • Kim, Ki-Hyun;Choi, Mi-Jung
    • The Journal of Korean Institute of Communications and Information Sciences / v.39C no.8 / pp.738-745 / 2014
  • Recently, the number of mobile users has been increasing continuously, and so has the number of mobile applications. As mobile applications increase, users have come to store sensitive and private information, such as bank information, location information, IDs, and passwords, on their mobile devices. Consequently, malicious applications that target mobile devices rather than the PC environment are increasing. In particular, because Android is an open platform and includes security vulnerabilities, attackers prefer this environment. This paper analyzes the performance of a malware detection system that applies a linear SVM machine learning classifier to detect Android malware applications. It also performs feature selection to improve detection performance.

Study on Rub Vibration of Rotary Machine for Turbine Blade Diagnosis (터빈 블레이드 진단을 위한 회전기계 마찰 진동에 관한 연구)

  • Yu, Hyeon Tak;Ahn, Byung Hyun;Lee, Jong Myeong;Ha, Jeong Min;Choi, Byeong Keun
    • Transactions of the Korean Society for Noise and Vibration Engineering / v.26 no.6_spc / pp.714-720 / 2016
  • Rubbing and misalignment are the most common faults that occur in rotating machinery, and they have a severe effect on power plant availability. Blade rubbing in particular is hard to detect in the FFT spectrum of the vibration signal. In this paper, the feasibility of feature analysis of the vibration signal is examined under blade rubbing and misalignment conditions. A lab-scale rotor test device provides the blade rubbing and shaft misalignment modes. Feature selection based on a GA (genetic algorithm) is applied to features extracted from the time domain. The features are then classified using an SVM (support vector machine), a machine learning algorithm. The results of GA-based feature selection are compared with those based on PCA (principal component analysis). According to the results, the feasibility of feature analysis is confirmed; blade rubbing and shaft misalignment can therefore be diagnosed from features of the vibration signal.
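The abstract does not list which time-domain features were extracted, so the sketch below assumes four statistics that are commonly used in vibration diagnosis: RMS, peak amplitude, crest factor, and kurtosis. The function name and the sample signal are hypothetical, and the GA selection and SVM classification steps are not shown.

```python
import math
from typing import Dict, Sequence


def time_domain_features(signal: Sequence[float]) -> Dict[str, float]:
    """Compute a few common time-domain statistics of a vibration signal."""
    n = len(signal)
    if n == 0:
        raise ValueError("signal must not be empty")
    mean = sum(signal) / n
    rms = math.sqrt(sum(x * x for x in signal) / n)
    peak = max(abs(x) for x in signal)
    variance = sum((x - mean) ** 2 for x in signal) / n
    fourth_moment = sum((x - mean) ** 4 for x in signal) / n
    return {
        "rms": rms,
        "peak": peak,
        # Crest factor: how far the peak rises above the RMS level.
        "crest_factor": peak / rms if rms > 0 else 0.0,
        # Kurtosis (non-excess): sensitive to impulsive content in the signal.
        "kurtosis": fourth_moment / variance ** 2 if variance > 0 else 0.0,
    }


# Hypothetical four-sample signal, for illustration only.
print(time_domain_features([1.0, -1.0, 1.0, -1.0]))
```

Each signal segment would yield one such feature vector, which a selection method and a classifier could then consume.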

Intra-class Local Descriptor-based Prototypical Network for Few-Shot Learning

  • Huang, Xi-Lang;Choi, Seon Han
    • Journal of Korea Multimedia Society / v.25 no.1 / pp.52-60 / 2022
  • Few-shot learning is a sub-area of machine learning that aims to classify target images when only a few labeled samples are available for training. As a representative few-shot learning method, the Prototypical Network has received much attention due to its simplicity and promising results. However, the Prototypical Network uses the mean of the samples from the same class as the prototype of that class, which easily results in learning uncharacteristic features in low-data scenarios. In this study, we propose to use local descriptors (i.e., patches along the channel within feature maps) from the same class to explicitly obtain more representative prototypes for the Prototypical Network, so that significant intra-class feature information can be maintained and classification performance on few-shot learning tasks can be improved. Experimental results on various benchmark datasets, including mini-ImageNet, CUB-200-2011, and tiered-ImageNet, show that the proposed method can learn more discriminative intra-class features through the local descriptors and obtain more generic prototype representations under the few-shot setting.
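The baseline behavior described in this abstract, where each class prototype is the mean of that class's support samples, can be sketched as follows. This illustrates the standard Prototypical Network prototype step only, not the local-descriptor method the paper proposes. It assumes the embeddings have already been computed by some encoder and that queries are assigned by Euclidean distance; the function names and toy vectors are hypothetical.

```python
import math
from collections import defaultdict
from typing import Dict, Hashable, List, Sequence


def class_prototypes(
    embeddings: Sequence[Sequence[float]], labels: Sequence[Hashable]
) -> Dict[Hashable, List[float]]:
    """Average the support embeddings of each class into one prototype vector."""
    grouped = defaultdict(list)
    for vector, label in zip(embeddings, labels):
        grouped[label].append(vector)
    return {
        label: [sum(dim) / len(vectors) for dim in zip(*vectors)]
        for label, vectors in grouped.items()
    }


def classify(query: Sequence[float], prototypes: Dict[Hashable, List[float]]) -> Hashable:
    """Assign the query to the class whose prototype is nearest (Euclidean)."""
    return min(prototypes, key=lambda label: math.dist(query, prototypes[label]))


# Toy 2-way, 2-shot support set with 2-D embeddings.
support = [[0.0, 0.0], [0.0, 2.0], [4.0, 4.0], [6.0, 4.0]]
support_labels = ["a", "a", "b", "b"]
prototypes = class_prototypes(support, support_labels)
print(prototypes)
print(classify([1.0, 1.0], prototypes))
```

With only a few support samples per class, a single outlier can shift the mean noticeably, which is the weakness the abstract attributes to mean-based prototypes.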

Determining Feature-Size for Text to Numeric Conversion based on BOW and TF-IDF

  • Alyamani, Hasan J.
    • International Journal of Computer Science & Network Security / v.22 no.1 / pp.283-287 / 2022
  • Machine learning is the most popular method used in data science. Data growth involves not only numeric data but also text data. Most supervised and unsupervised machine learning algorithms use numeric data, so text data must be converted into numeric form. There are many techniques for this conversion, and researchers are often unsure which technique is best in which situation. In the proposed work, BOW (Bag-of-Words) and TF-IDF (Term Frequency-Inverse Document Frequency) are studied with different feature sizes to determine the best method. In experiments on text data, both TF-IDF and BOW provide better performance in the range of 100 to 150 features.
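The two text-to-numeric conversions compared in this abstract can be sketched as below, with the feature size controlled by a vocabulary cap. This is a minimal illustration under stated assumptions: whitespace tokenization, a vocabulary made of the most frequent tokens, and the unsmoothed textbook weighting idf = log(N / df). Library implementations often use smoothed variants, and the paper's exact setup is not given here. The function names, the `max_features` parameter, and the sample documents are hypothetical.

```python
import math
from collections import Counter
from typing import List, Sequence


def build_vocabulary(docs: Sequence[str], max_features: int) -> List[str]:
    """Keep the max_features most frequent tokens (ties broken alphabetically)."""
    counts = Counter(token for doc in docs for token in doc.lower().split())
    ranked = sorted(counts, key=lambda token: (-counts[token], token))
    return ranked[:max_features]


def bow_vector(doc: str, vocab: Sequence[str]) -> List[int]:
    """Bag-of-Words: raw count of each vocabulary token in the document."""
    counts = Counter(doc.lower().split())
    return [counts[token] for token in vocab]


def tfidf_vectors(docs: Sequence[str], vocab: Sequence[str]) -> List[List[float]]:
    """TF-IDF: term count scaled by log(N / document frequency)."""
    bows = [bow_vector(doc, vocab) for doc in docs]
    doc_freq = [sum(1 for bow in bows if bow[j] > 0) for j in range(len(vocab))]
    idf = [math.log(len(docs) / df) if df else 0.0 for df in doc_freq]
    return [[tf * weight for tf, weight in zip(bow, idf)] for bow in bows]


docs = ["spam spam ham", "ham eggs"]
vocab = build_vocabulary(docs, max_features=2)
print(vocab)
print([bow_vector(doc, vocab) for doc in docs])
print(tfidf_vectors(docs, vocab))
```

Raising or lowering `max_features` changes the length of every output vector, which is the feature-size knob the study varies.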

An Enhanced Feature Selection Method Based on the Impurity of Words Considering Unbalanced Distribution of Documents (문서의 불균등 분포를 고려한 단어 불순도 기반 특징 선택 방법)

  • Kang, Jin-Beom;Yang, Jae-Young;Choi, Joong-Min
    • Journal of KIISE:Software and Applications / v.34 no.9 / pp.804-816 / 2007
  • Sample training data for machine learning often contain irrelevant information or redundant concepts. The original data may also include noise. If the information collected for constructing a learning model is not reliable, it is difficult to obtain accurate information. The system therefore attempts to find relations or rules between features and categories in the learning phase. Feature selection removes irrelevant or redundant information before the learning model is constructed in order to improve its performance. Existing feature selection methods assume that the distribution of documents is balanced in terms of the number of documents in each class and the length of each document. In practice, however, it is difficult not only to prepare a set of documents of almost equal length, but also to define a number of classes with a fixed number of documents. In this paper, we propose a new feature selection method that considers the impurity of words and the unbalanced distribution of documents across categories. Feature candidates are obtained using the word impurity, and the final features are selected by taking the unbalanced distribution of documents into account. Experiments demonstrate that our method performs better than existing methods.

Effects of Preprocessing and Feature Extraction on CNN-based Fire Detection Performance (전처리와 특징 추출이 CNN기반 화재 탐지 성능에 미치는 효과)

  • Lee, JeongHwan;Kim, Byeong Man;Shin, Yoon Sik
    • Journal of Korea Society of Industrial Information Systems / v.23 no.4 / pp.41-53 / 2018
  • Recently, the development of machine learning technology has led to the application of deep learning to existing image-based application systems. In this context, some studies have applied CNNs (Convolutional Neural Networks) to the field of fire detection. To verify the effects of existing preprocessing and feature extraction methods on fire detection when combined with a CNN, this paper evaluates recognition performance and learning time by changing the VGG19 CNN structure while gradually increasing the number of convolution layers. In general, accuracy is better when the image is not preprocessed. It is also shown that the preprocessing and feature extraction methods offer considerable benefits in terms of learning speed.