• Title/Summary/Keyword: Machine-learning Feature

Search Result 732, Processing Time 0.034 seconds

New Temporal Features for Cardiac Disorder Classification by Heart Sound (심음 기반의 심장질환 분류를 위한 새로운 시간영역 특징)

  • Kwak, Chul;Kwon, Oh-Wook
    • The Journal of the Acoustical Society of Korea
    • /
    • v.29 no.2
    • /
    • pp.133-140
    • /
    • 2010
  • We improve the performance of cardiac disorder classification by adding new temporal features extracted from continuous heart sound signals. We add three kinds of novel temporal features to a conventional feature based on mel-frequency cepstral coefficients (MFCC): Heart sound envelope, murmur probabilities, and murmur amplitude variation. In cardiac disorder classification and detection experiments, we evaluate the contribution of the proposed features to classification accuracy and select proper temporal features using the sequential feature selection method. The selected features are shown to improve classification accuracy significantly and consistently for neural network-based pattern classifiers such as multi-layer perceptron (MLP), support vector machine (SVM), and extreme learning machine (ELM).

Trend of Utilization of Machine Learning Technology for Digital Healthcare Data Analysis (디지털 헬스케어 데이터 분석을 위한 머신 러닝 기술 활용 동향)

  • Woo, Y.C.;Lee, S.Y.;Choi, W.;Ahn, C.W.;Baek, O.K.
    • Electronics and Telecommunications Trends
    • /
    • v.34 no.1
    • /
    • pp.98-110
    • /
    • 2019
  • Machine learning has been applied to medical imaging and has shown an excellent recognition rate. Recently, there has been much interest in preventive medicine. If data are accessible, machine learning packages can be used easily in digital healthcare fields. However, it is necessary to prepare the data in advance, and model evaluation and tuning are required to construct a reliable model. On average, these processes take more than 80% of the total effort required. In this study, we describe the basic concepts of machine learning, pre-processing and visualization of datasets, feature engineering for reliable models, model evaluation and tuning, and the latest trends in popular machine learning frameworks. Finally, we survey a explainable machine learning analysis tool and will discuss the future direction of machine learning.

Comparison of Chlorophyll-a Prediction and Analysis of Influential Factors in Yeongsan River Using Machine Learning and Deep Learning (머신러닝과 딥러닝을 이용한 영산강의 Chlorophyll-a 예측 성능 비교 및 변화 요인 분석)

  • Sun-Hee, Shim;Yu-Heun, Kim;Hye Won, Lee;Min, Kim;Jung Hyun, Choi
    • Journal of Korean Society on Water Environment
    • /
    • v.38 no.6
    • /
    • pp.292-305
    • /
    • 2022
  • The Yeongsan River, one of the four largest rivers in South Korea, has been facing difficulties with water quality management with respect to algal bloom. The algal bloom menace has become bigger, especially after the construction of two weirs in the mainstream of the Yeongsan River. Therefore, the prediction and factor analysis of Chlorophyll-a (Chl-a) concentration is needed for effective water quality management. In this study, Chl-a prediction model was developed, and the performance evaluated using machine and deep learning methods, such as Deep Neural Network (DNN), Random Forest (RF), and eXtreme Gradient Boosting (XGBoost). Moreover, the correlation analysis and the feature importance results were compared to identify the major factors affecting the concentration of Chl-a. All models showed high prediction performance with an R2 value of 0.9 or higher. In particular, XGBoost showed the highest prediction accuracy of 0.95 in the test data.The results of feature importance suggested that Ammonia (NH3-N) and Phosphate (PO4-P) were common major factors for the three models to manage Chl-a concentration. From the results, it was confirmed that three machine learning methods, DNN, RF, and XGBoost are powerful methods for predicting water quality parameters. Also, the comparison between feature importance and correlation analysis would present a more accurate assessment of the important major factors.

Stacked Autoencoder Based Malware Feature Refinement Technology Research (Stacked Autoencoder 기반 악성코드 Feature 정제 기술 연구)

  • Kim, Hong-bi;Lee, Tae-jin
    • Journal of the Korea Institute of Information Security & Cryptology
    • /
    • v.30 no.4
    • /
    • pp.593-603
    • /
    • 2020
  • The advent of malicious code has increased exponentially due to the spread of malicious code generation tools in accordance with the development of the network, but there is a limit to the response through existing malicious code detection methods. According to this situation, a machine learning-based malicious code detection method is evolving, and in this paper, the feature of data is extracted from the PE header for machine-learning-based malicious code detection, and then it is used to automate the malware through autoencoder. Research on how to extract the indicated features and feature importance. In this paper, 549 features composed of information such as DLL/API that can be identified from PE files that are commonly used in malware analysis are extracted, and autoencoder is used through the extracted features to improve the performance of malware detection in machine learning. It was proved to be successful in providing excellent accuracy and reducing the processing time by 2 times by effectively extracting the features of the data by compressively storing the data. The test results have been shown to be useful for classifying malware groups, and in the future, a classifier such as SVM will be introduced to continue research for more accurate malware detection.

Feature Selection and Hyper-Parameter Tuning for Optimizing Decision Tree Algorithm on Heart Disease Classification

  • Tsehay Admassu Assegie;Sushma S.J;Bhavya B.G;Padmashree S
    • International Journal of Computer Science & Network Security
    • /
    • v.24 no.2
    • /
    • pp.150-154
    • /
    • 2024
  • In recent years, there are extensive researches on the applications of machine learning to the automation and decision support for medical experts during disease detection. However, the performance of machine learning still needs improvement so that machine learning model produces result that is more accurate and reliable for disease detection. Selecting the hyper-parameter that could produce the possible maximum classification accuracy on medical dataset is the most challenging task in developing decision support systems with machine learning algorithms for medical dataset classification. Moreover, selecting the features that best characterizes a disease is another challenge in developing machine-learning model with better classification accuracy. In this study, we have proposed an optimized decision tree model for heart disease classification by using heart disease dataset collected from kaggle data repository. The proposed model is evaluated and experimental test reveals that the performance of decision tree improves when an optimal number of features are used for training. Overall, the accuracy of the proposed decision tree model is 98.2% for heart disease classification.

Study on predictive model and mechanism analysis for martensite transformation temperatures through explainable artificial intelligence (설명가능한 인공지능을 통한 마르텐사이트 변태 온도 예측 모델 및 거동 분석 연구)

  • Junhyub Jeon;Seung Bae Son;Jae-Gil Jung;Seok-Jae Lee
    • Journal of the Korean Society for Heat Treatment
    • /
    • v.37 no.3
    • /
    • pp.103-113
    • /
    • 2024
  • Martensite volume fraction significantly affects the mechanical properties of alloy steels. Martensite start temperature (Ms), transformation temperature for martensite 50 vol.% (M50), and transformation temperature for martensite 90 vol.% (M90) are important transformation temperatures to control the martensite phase fraction. Several researchers proposed empirical equations and machine learning models to predict the Ms temperature. These numerical approaches can easily predict the Ms temperature without additional experiment and cost. However, to control martensite phase fraction more precisely, we need to reduce prediction error of the Ms model and propose prediction models for other martensite transformation temperatures (M50, M90). In the present study, machine learning model was applied to suggest the predictive model for the Ms, M50, M90 temperatures. To explain prediction mechanisms and suggest feature importance on martensite transformation temperature of machine learning models, the explainable artificial intelligence (XAI) is employed. Random forest regression (RFR) showed the best performance for predicting the Ms, M50, M90 temperatures using different machine learning models. The feature importance was proposed and the prediction mechanisms were discussed by XAI.

Development of Machine Learning Ensemble Model using Artificial Intelligence (인공지능을 활용한 기계학습 앙상블 모델 개발)

  • Lee, K.W.;Won, Y.J.;Song, Y.B.;Cho, K.S.
    • Journal of the Korean Society for Heat Treatment
    • /
    • v.34 no.5
    • /
    • pp.211-217
    • /
    • 2021
  • To predict mechanical properties of secondary hardening martensitic steels, a machine learning ensemble model was established. Based on ANN(Artificial Neural Network) architecture, some kinds of methods was considered to optimize the model. In particular, interaction features, which can reflect interactions between chemical compositions and processing conditions of real alloy system, was considered by means of feature engineering, and then K-Fold cross validation coupled with bagging ensemble were investigated to reduce R2_score and a factor indicating average learning errors owing to biased experimental database.

Feature selection and prediction modeling of drug responsiveness in Pharmacogenomics (약물유전체학에서 약물반응 예측모형과 변수선택 방법)

  • Kim, Kyuhwan;Kim, Wonkuk
    • The Korean Journal of Applied Statistics
    • /
    • v.34 no.2
    • /
    • pp.153-166
    • /
    • 2021
  • A main goal of pharmacogenomics studies is to predict individual's drug responsiveness based on high dimensional genetic variables. Due to a large number of variables, feature selection is required in order to reduce the number of variables. The selected features are used to construct a predictive model using machine learning algorithms. In the present study, we applied several hybrid feature selection methods such as combinations of logistic regression, ReliefF, TurF, random forest, and LASSO to a next generation sequencing data set of 400 epilepsy patients. We then applied the selected features to machine learning methods including random forest, gradient boosting, and support vector machine as well as a stacking ensemble method. Our results showed that the stacking model with a hybrid feature selection of random forest and ReliefF performs better than with other combinations of approaches. Based on a 5-fold cross validation partition, the mean test accuracy value of the best model was 0.727 and the mean test AUC value of the best model was 0.761. It also appeared that the stacking models outperform than single machine learning predictive models when using the same selected features.

Investigation of pile group response to adjacent twin tunnel excavation utilizing machine learning

  • Su-Bin Kim;Dong-Wook Oh;Hyeon-Jun Cho;Yong-Joo Lee
    • Geomechanics and Engineering
    • /
    • v.38 no.5
    • /
    • pp.517-528
    • /
    • 2024
  • For numerous tunnelling projects implemented in urban areas due to limited space, it is crucial to take into account the interaction between the foundation, ground, and tunnel. In predicting the deformation of piled foundations and the ground during twin tunnel excavation, it is essential to consider various factors. Therefore, this study derived a prediction model for pile group settlement using machine learning to analyze the importance of various factors that determine the settlement of piled foundations during twin tunnelling. Laboratory model tests and numerical analysis were utilized as input data for machine learning. The influence of each independent variable on the prediction model was analyzed. Machine learning techniques such as data preprocessing, feature engineering, and hyperparameter tuning were used to improve the performance of the prediction model. Machine learning models, employing Random Forest (RF), eXtreme Gradient Boosting (XGB), and Light Gradient Boosting Machine (LightGBM, LGB) algorithms, demonstrate enhanced performance after hyperparameter tuning, particularly with LGB achieving an R2 of 0.9782 and RMSE value of 0.0314. The feature importance in the prediction models was analyzed and PN was the highest at 65.04% for RF, 64.81% for XGB, and PCTC (distance between the center of piles) was the highest at 31.32% for LGB. SHAP was utilized for analyzing the impact of each variable. PN (the number of piles) consistently exerted the most influence on the prediction of pile group settlement across all models. The results from both laboratory model tests and numerical analysis revealed a reduction in ground displacement with varying pillar spacing in twin tunnels. However, upon further investigation through machine learning with additional variables, it was found that the number of piles has the most significant impact on ground displacement. Nevertheless, as this study is based on laboratory model testing, further research considering real field conditions is necessary. This study contributes to a better understanding of the complex interactions inherent in twin tunnelling projects and provides a reliable tool for predicting pile group settlement in such scenarios.