• Title/Abstract/Keywords: correlation feature analysis

Search results: 245; processing time: 0.029 s

Enhancing prediction accuracy of concrete compressive strength using stacking ensemble machine learning

  • Yunpeng Zhao;Dimitrios Goulias;Setare Saremi
    • Computers and Concrete / Vol. 32, No. 3 / pp. 233-246 / 2023
  • Accurate prediction of concrete compressive strength can minimize the need for extensive, time-consuming, and costly mixture optimization testing and analysis. This study attempts to enhance the prediction accuracy of compressive strength using stacking ensemble machine learning (ML) with feature engineering techniques. Seven alternative ML models of increasing complexity were implemented and compared: linear regression, SVM, decision tree, multilayer perceptron, random forest, XGBoost, and AdaBoost. To further improve the prediction accuracy, an ML pipeline was proposed in which feature engineering was implemented and a two-layer stacked model was developed. The k-fold cross-validation approach was employed to optimize model parameters and train the stacked model. The stacked model showed superior performance in predicting concrete compressive strength, with a coefficient of determination (R²) of 0.985. Feature (i.e., variable) importance was determined to demonstrate how useful the synthetic features are in prediction and to provide better interpretability of the data and the model. The methodology in this study promotes a more thorough assessment of alternative ML algorithms rather than a focus on any single ML model type for concrete compressive strength prediction.
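
The R² figure quoted in this abstract is the coefficient of determination, R² = 1 − SS_res / SS_tot. The following is a minimal sketch of that calculation in plain Python. The function name `r_squared` and the sample values are illustrative assumptions and are not taken from the paper.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_true = sum(y_true) / len(y_true)
    # Residual sum of squares: error left over after prediction.
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    # Total sum of squares: spread of the observations around their mean.
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all observed values are equal")
    return 1.0 - ss_res / ss_tot


# Made-up compressive strengths in MPa, for illustration only.
observed = [30.0, 40.0, 50.0, 60.0]
predicted = [32.0, 38.0, 51.0, 59.0]
print(round(r_squared(observed, predicted), 3))
```

A value close to 1 means the predictions explain almost all of the variance in the observed strengths. A model that always predicts the mean scores 0.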

Performance evaluation of principal component analysis for clustering problems

  • Kim, Jae-Hwan;Yang, Tae-Min;Kim, Jung-Tae
    • Journal of Advanced Marine Engineering and Technology / Vol. 40, No. 8 / pp. 726-732 / 2016
  • Clustering analysis is widely used in data mining to classify data into categories on the basis of their similarity. Over the decades, many clustering techniques have been developed, including hierarchical and non-hierarchical algorithms. In gene profiling problems, because of the large number of genes and the complexity of biological networks, dimensionality reduction techniques are critical exploratory tools for clustering analysis of gene expression data. Recently, clustering analysis that applies dimensionality reduction techniques has also been proposed. PCA (principal component analysis) is a popular dimensionality reduction method for clustering problems. However, previous studies analyzed the performance of PCA only on full data sets. In this paper, to evaluate the performance of PCA for clustering analysis specifically and robustly, we apply an improved FCBF (fast correlation-based filter), a feature selection method, to supervised clustering data sets, and employ two well-known clustering algorithms: k-means and k-medoids. Computational results from supervised data sets show that the performance of PCA is very poor for large-scale features.

Investigating the Performance of Bayesian-based Feature Selection and Classification Approach to Social Media Sentiment Analysis

  • 강창민;어균선;이건창
    • 경영정보학연구 / Vol. 24, No. 1 / pp. 1-19 / 2022
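  • Sentiment analysis, which analyzes the emotions hidden in online reviews that users post on social media, has attracted considerable attention with the spread of social media. To approach sentiment analysis differently from previous studies, this study proposes a sentiment analysis model based on Bayesian networks. The model uses MBFS (Markov Blanket-based Feature Selection) as its feature selection technique. To demonstrate the performance of MBFS empirically, review data from the social media platform Yelp were used. Correlation-based feature selection, information gain feature selection, and gain ratio feature selection served as benchmark feature selection techniques. Classification performance was then compared using four machine learning algorithms on top of these feature selection methods. Furthermore, a what-if analysis was conducted through the Bayesian network to examine the causal relationships among the features selected by MBFS. The machine learning classifiers chosen in this study are the Bayesian network-based TAN (Tree Augmented Naive Bayes), NB (Naive Bayes), S-Spouses (Sons & Spouses), and A-markov (Augmented Markov Blanket). The performance analysis showed that the proposed MBFS method outperformed the benchmark methods in terms of accuracy, precision, and F1 score.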
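
One of the benchmark techniques named in this abstract, information gain feature selection, ranks each feature by how much it reduces the entropy of the class labels. The sketch below illustrates that idea only. It is not the MBFS method, and the toy review data and the names `entropy` and `information_gain` are assumptions made for this example.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def information_gain(feature_values, labels):
    """Entropy of the labels minus the entropy remaining after splitting on the feature."""
    total = len(labels)
    remainder = 0.0
    for value in set(feature_values):
        subset = [y for x, y in zip(feature_values, labels) if x == value]
        remainder += (len(subset) / total) * entropy(subset)
    return entropy(labels) - remainder


# Invented toy data: 1 = the word occurs in the review, 0 = it does not.
labels = ["pos", "pos", "neg", "neg"]
features = {
    "great": [1, 1, 0, 0],  # separates the two classes perfectly
    "food": [1, 0, 1, 0],   # carries no class information
}

# Rank features from most to least informative.
ranking = sorted(
    features,
    key=lambda name: information_gain(features[name], labels),
    reverse=True,
)
print(ranking)
```

A selector built on this score would keep the top-ranked features and pass only those to the classifier.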

Correlation analysis of antipsychotic dose and speech characteristics according to extrapyramidal symptoms

  • 이수빈;김서영;김혜윤;김의태;유경상;이호영;이교구
    • 한국음향학회지 / Vol. 41, No. 3 / pp. 367-374 / 2022
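  • This paper analyzes the correlation between antipsychotic dose and speech features. To examine the patterns of speech features associated with the occurrence of extrapyramidal symptoms (EPS), one of the representative side effects of antipsychotic drugs, a Korean extrapyramidal symptom speech corpus was built through sentence development. The collected data were divided into an EPS group and a non-EPS group to investigate speech feature patterns, and the EPS group in particular showed high speech feature correlations. In addition, the type of spoken sentence was confirmed to affect speech feature patterns, which suggests that early detection of extrapyramidal symptoms based on speech features may be possible.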

The vertical environmental characteristics in the tidal flat sediments

  • 김종구;유선재
    • 한국환경과학회지 / Vol. 9, No. 2 / pp. 125-129 / 2000
  • As part of a fundamental survey to evaluate the pollutant purification capacity of tidal flat sediments, we studied the vertical environmental characteristics of three tidal flat sediments, Chunjangdae, Eueunri and Gyewhado, which differ from one another in external features. The results may be summarized as follows. Particle analysis showed that the Eueunri tidal flat sediment, located in the Keum river estuary, consists of 98.98% silt & clay, while the Chunjangdae tidal flat sediment, located in SeocheonGun, consists of 97.99% sand. The Gyewhado tidal flat sediment, located in the Saemankeum area, consists of 32.81% silt & clay and 67.19% sand. The concentrations of organic pollutants (I.L., COD, POC, PON) in the Eueunri tidal flat sediment, which has a high silt & clay content, were 3~4 times higher than in the others. The concentrations of organic pollutants increased slightly with depth. Linear correlations were obtained between I.L. and COD, POC, and PON, with correlation coefficients in the range of 0.821~0.940. The correlations between pH and COD, POC, and PON were also high (r > 0.9). The mean filtration rate in the Chunjangdae tidal flat sediment was 0.01584 cm/s, whereas almost nothing filtered through the others.
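
The linear correlation coefficients reported in this abstract are of the Pearson type, r = cov(x, y) / (σ_x σ_y). The sketch below shows one way such a coefficient could be computed. The measurement values are invented for illustration and are not data from the study, and the name `pearson_r` is an assumption.

```python
import math


def pearson_r(xs, ys):
    """Pearson linear correlation coefficient between two equal-length samples."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two samples of equal length with at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return cov / (spread_x * spread_y)


# Invented values standing in for ignition loss and COD at five depths.
ignition_loss = [1.0, 2.0, 3.0, 4.0, 5.0]
cod = [2.1, 3.9, 6.2, 7.8, 10.1]
print(pearson_r(ignition_loss, cod))
```

Values near +1 or -1 indicate a strong linear relationship. The coefficient says nothing about which variable drives the other.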


Texture analysis of Thyroid Nodules in Ultrasound Image for Computer Aided Diagnostic system

  • 박병은;장원석;유선국
    • 한국멀티미디어학회논문지 / Vol. 20, No. 1 / pp. 43-50 / 2017
  • The number of deaths due to thyroid diseases has increased with changes in the living environment. In this paper, we propose an algorithm for recognizing thyroid nodules using texture analysis based on shape, the gray level co-occurrence matrix (GLCM), and the gray level run length matrix (GLRLM). First, we segmented the region of interest (ROI) using an active contour model algorithm. Then, we applied a total of 18 features (5 first order descriptors, 10 GLCM features, 2 GLRLM features, and a shape feature) to each thyroid ROI. The extracted features were used for statistical analysis. Our results show that first order statistics (Skewness, Entropy, Energy, Smoothness), GLCM features (Correlation, Contrast, Energy, Entropy, Difference variance, Difference Entropy, Homogeneity, Maximum Probability, Sum average, Sum entropy), GLRLM features, and the shape feature helped to distinguish benign from malignant thyroid nodules. This algorithm will be helpful for diagnosing thyroid nodules on ultrasound images.

Assisted Magnetic Resonance Imaging Diagnosis for Alzheimer's Disease Based on Kernel Principal Component Analysis and Supervised Classification Schemes

  • Wang, Yu;Zhou, Wen;Yu, Chongchong;Su, Weijun
    • Journal of Information Processing Systems / Vol. 17, No. 1 / pp. 178-190 / 2021
  • Alzheimer's disease (AD) is an insidious and degenerative neurological disease. Applying magnetic resonance imaging (MRI) and computer technology to AD patients is a new topic that is gradually being explored. In this paper, preprocessing and correlation analysis are first performed on MRI data. Kernel principal component analysis (KPCA) is then used to extract features from brain gray matter images. Finally, supervised classification schemes such as the AdaBoost algorithm and the support vector machine algorithm are used to classify these features. Experimental results on the Alzheimer's Disease Neuroimaging Initiative (ADNI) database, which contains brain structural MRI (sMRI) of 116 AD patients, 116 patients with mild cognitive impairment, and 117 normal controls, show that the proposed method can effectively assist the diagnosis and analysis of AD. Compared with the principal component analysis (PCA) method, all classification results with KPCA are improved by 2%-6%, with the best result reaching 84%. This indicates that the features extracted by KPCA are richer and more complete than those extracted by PCA.

Automated Detection of Retinal Nerve Fiber Layer by Texture-Based Analysis for Glaucoma Evaluation

  • Septiarini, Anindita;Harjoko, Agus;Pulungan, Reza;Ekantini, Retno
    • Healthcare Informatics Research / Vol. 24, No. 4 / pp. 335-345 / 2018
  • Objectives: The retinal nerve fiber layer (RNFL) is a site of glaucomatous optic neuropathy whose early changes need to be detected because glaucoma is one of the most common causes of blindness. This paper proposes an automated RNFL detection method based on the texture feature by forming a co-occurrence matrix and a backpropagation neural network as the classifier. Methods: We propose two texture features, namely, correlation and autocorrelation based on a co-occurrence matrix. Those features are selected by using a correlation feature selection method. Then the backpropagation neural network is applied as the classifier to implement RNFL detection in a retinal fundus image. Results: We used 40 retinal fundus images as testing data and 160 sub-images (80 showing a normal RNFL and 80 showing RNFL loss) as training data to evaluate the performance of our proposed method. Overall, this work achieved an accuracy of 94.52%. Conclusions: Our results demonstrated that the proposed method achieved a high accuracy, which indicates good performance.
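
The correlation texture feature mentioned in this abstract is derived from a gray-level co-occurrence matrix (GLCM). The sketch below uses the common Haralick-style definition, Σ (i − μ_i)(j − μ_j) p(i, j) / (σ_i σ_j). The abstract does not state the pixel offset or the number of gray levels, so the horizontal offset, the two-level test images, and the function names are assumptions made for illustration.

```python
import math


def glcm(image, levels, dr=0, dc=1):
    """Normalized co-occurrence matrix for the pixel offset (dr, dc)."""
    counts = [[0] * levels for _ in range(levels)]
    rows, cols = len(image), len(image[0])
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dr, c + dc
            if 0 <= r2 < rows and 0 <= c2 < cols:
                counts[image[r][c]][image[r2][c2]] += 1
    total = sum(map(sum, counts))
    return [[n / total for n in row] for row in counts]


def glcm_correlation(p):
    """Correlation between the gray levels of paired pixels, in [-1, 1]."""
    levels = range(len(p))
    mu_i = sum(i * p[i][j] for i in levels for j in levels)
    mu_j = sum(j * p[i][j] for i in levels for j in levels)
    var_i = sum((i - mu_i) ** 2 * p[i][j] for i in levels for j in levels)
    var_j = sum((j - mu_j) ** 2 * p[i][j] for i in levels for j in levels)
    if var_i == 0 or var_j == 0:
        raise ValueError("correlation is undefined for a constant image")
    cov = sum((i - mu_i) * (j - mu_j) * p[i][j] for i in levels for j in levels)
    return cov / math.sqrt(var_i * var_j)


# Two tiny binary test images: broad stripes versus a checkerboard.
stripes = [[0, 0, 1, 1]] * 4
checkerboard = [[0, 1, 0, 1], [1, 0, 1, 0]] * 2

print(glcm_correlation(glcm(stripes, levels=2)))
print(glcm_correlation(glcm(checkerboard, levels=2)))
```

Neighboring pixels in the striped image tend to share a gray level, which gives a positive correlation. In the checkerboard they always differ, which gives the minimum value of -1.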

Manufacturing Data Preprocessing Method and Product Classification Method using FFT

  • 김한솔;진교홍
    • 한국정보통신학회: Conference Proceedings / 한국정보통신학회 2021 Fall Conference / pp. 82-84 / 2021
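  • Through smart factory construction projects, sensor data such as power, vibration, pressure, and temperature are being collected from production equipment, and services such as predictive maintenance, defect prediction, and anomaly detection are being developed through data analysis. In manufacturing data, the imbalance between normal and abnormal data is generally severe, so anomaly detection services are preferred. In this paper, as a preliminary step toward developing an anomaly detection service, the FFT method was used to extract feature data from manufacturing data, the products being produced were classified with these features, and the results were checked. Specifically, we checked whether products can be classified by computing correlation coefficients after applying the FFT to the representative pattern of each product.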
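
The idea in this abstract, transforming each representative pattern to the frequency domain and comparing by correlation coefficient, can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation. It uses a naive DFT, which yields the same spectrum as an FFT, so that only the standard library is needed. The synthetic sine-wave "products" and all function names are hypothetical.

```python
import cmath
import math


def magnitude_spectrum(signal):
    """Magnitudes of DFT bins 0..N/2 (naive O(N^2) transform)."""
    n = len(signal)
    return [
        abs(sum(x * cmath.exp(-2j * cmath.pi * k * t / n) for t, x in enumerate(signal)))
        for k in range(n // 2 + 1)
    ]


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    mean_x, mean_y = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


def classify(signal, representatives):
    """Return the product whose representative spectrum correlates best."""
    spectrum = magnitude_spectrum(signal)
    return max(
        representatives,
        key=lambda name: pearson_r(spectrum, magnitude_spectrum(representatives[name])),
    )


def sine(cycles, n=32, phase=0.0, amplitude=1.0):
    """Synthetic sensor trace: a sine wave with the given number of cycles."""
    return [amplitude * math.sin(2 * math.pi * cycles * t / n + phase) for t in range(n)]


# Hypothetical representative patterns, one per product.
representatives = {"product_A": sine(2), "product_B": sine(5)}

# A new trace with a different phase and amplitude but the same frequency content as B.
sample = sine(5, phase=0.7, amplitude=1.3)
print(classify(sample, representatives))
```

Comparing magnitude spectra rather than raw traces makes the match insensitive to phase shifts, and the correlation coefficient makes it insensitive to overall amplitude.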


A Study on the Emotion State Classification using Multi-channel EEG

  • 강동기;김흥환;김동준;이병채;고한우
    • 대한전기학회: Conference Proceedings / 대한전기학회 2001 Summer Conference Proceedings D / pp. 2815-2817 / 2001
  • This study describes emotion classification using two different feature extraction methods for four-channel EEG signals. The first method is linear prediction analysis based on an AR model. The second uses cross-correlation coefficients over the frequencies of the $\theta$, $\alpha$, and $\beta$ bands. Using the linear predictor coefficients and the cross-correlation coefficients of frequencies, a classification test for four emotions (anger, sadness, joy, and relaxation) is performed with a neural network. Comparing the results of the two methods, the linear predictor coefficients appear to produce better results than the cross-correlation coefficients of frequencies for emotion classification.
