• Title/Summary/Keyword: Validation Metrics

Machine Learning Methods for Trust-based Selection of Web Services

  • Hasnain, Muhammad;Ghani, Imran;Pasha, Muhammad F.;Jeong, Seung R.
    • KSII Transactions on Internet and Information Systems (TIIS) / v.16 no.1 / pp.38-59 / 2022
  • Web service instances can be classified by users into two categories: trusted and untrusted. A web service whose instances show high throughput (TP) and low response time (RT) is a trusted web service. Web services become untrustworthy when the guaranteed instance values do not match the actual values achieved by users. To select web services from users' attained TP and RT values, we need to verify that trusted and untrusted instances of invoked web services are predicted correctly. This accurate prediction of web service instances is then used to perform the selection. We propose constructing fuzzy rules to label web service instances correctly. This paper presents web services selection using a well-known machine learning algorithm, REPTree, for the correct prediction of trusted and untrusted instances. The performance of REPTree is compared with five machine learning models on web services datasets. Experiments were performed on these datasets using 10-fold cross-validation. To evaluate the performance of the REPTree classifier, we used accuracy metrics (sensitivity and specificity). Experimental results showed that web service WS1 gained the top selection score with 47.0588% trusted instances, and web service WS2 was selected the least with 25.00% trusted instances. Evaluation of the proposed web services selection approach yielded asymptotic sig. = 0.019, demonstrating the relationship between final selection and recommended trust score of web services.
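The sensitivity and specificity metrics named in this abstract can be illustrated with a minimal sketch. The function name and the confusion-matrix counts below are hypothetical and are not taken from the paper; "trusted" is assumed to be the positive class.

```python
def sensitivity_specificity(tp, fn, tn, fp):
    """Return (sensitivity, specificity) from confusion-matrix counts.

    tp/fn: trusted instances predicted as trusted/untrusted.
    tn/fp: untrusted instances predicted as untrusted/trusted.
    """
    # Sensitivity: share of actual positives that were predicted positive.
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    # Specificity: share of actual negatives that were predicted negative.
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return sensitivity, specificity


# Hypothetical counts, for illustration only.
sens, spec = sensitivity_specificity(tp=8, fn=2, tn=15, fp=5)
print(f"sensitivity={sens:.2f}, specificity={spec:.2f}")
```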

Compositional Feature Selection and Its Effects on Bandgap Prediction by Machine Learning (기계학습을 이용한 밴드갭 예측과 소재의 조성기반 특성인자의 효과)

  • Chunghee Nam
    • Korean Journal of Materials Research / v.33 no.4 / pp.164-174 / 2023
  • The bandgap of a semiconductor material is an important factor in its use across various applications. In this study, based on data provided by AFLOW (Automatic-FLOW for Materials Discovery), the bandgap of a semiconductor material was predicted using only the material's compositional features. The compositional features were generated using the Python modules 'Pymatgen' and 'Matminer'. Pearson's correlation coefficients (PCC) between the compositional features were calculated, and features with a correlation coefficient larger than 0.95 were removed in order to avoid overfitting. The bandgap prediction performance was compared using the R² score and root-mean-squared error. Using random forest and XGBoost as representative ensemble algorithms, it was found that XGBoost gave better results after cross-validation and hyper-parameter tuning. To investigate the effect of compositional feature selection on bandgap prediction, the prediction performance was studied as a function of the number of features, chosen by feature importance methods. It was found that prediction performance did not change significantly beyond an appropriate number of features. Furthermore, artificial neural networks were employed to compare the prediction performance while adjusting the number of features guided by the PCC values, resulting in a best R² score of 0.811. By comparing and analyzing the bandgap distribution and prediction performance for material groups containing specific elements (F, N, Yb, Eu, Zn, B, Si, Ge, Fe, Al), various information for material design was obtained.
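The correlation-based feature removal described in this abstract can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: the abstract does not say whether the absolute value of the coefficient is used or which feature of a correlated pair is dropped, so the sketch assumes the absolute value and keeps the feature that appears first. The feature names and values are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if norm_x == 0 or norm_y == 0:
        return 0.0  # constant column: treat as uncorrelated
    return cov / (norm_x * norm_y)


def drop_correlated(features, threshold=0.95):
    """Keep a feature only if |PCC| with every already-kept feature
    does not exceed the threshold (assumed greedy, first-come order)."""
    kept = []
    for name, values in features.items():
        if all(abs(pearson(values, features[k])) <= threshold for k in kept):
            kept.append(name)
    return kept


# Hypothetical feature columns, for illustration only.
features = {
    "f1": [1.0, 2.0, 3.0, 4.0, 5.0],
    "f2": [2.0, 4.1, 6.0, 8.1, 10.0],  # almost exactly 2 * f1
    "f3": [5.0, 1.0, 4.0, 2.0, 3.0],
}
kept = drop_correlated(features)
print(kept)
```

Here `f2` is removed because it is almost perfectly correlated with `f1`, while `f3` is retained.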

Metric based Performance Measurement of Software Development Methodologies from Traditional to DevOps Automation Culture

  • Poonam Narang;Pooja Mittal
    • International Journal of Computer Science & Network Security / v.23 no.6 / pp.107-114 / 2023
  • Successful implementations of DevOps practices significantly improve software efficiency, collaboration and security. Most organizations are adopting DevOps for faster, higher-quality software delivery. DevOps brings development and operations teams together to overcome the communication gaps responsible for software failures. It relies on different sets of alternative tools to automate the tasks of continuous integration, testing, delivery, deployment and monitoring. Although DevOps is adopted as a very reliable and responsible environment for quality software delivery, it lacks many quantifiable aspects that would establish it above traditional and agile development methods. This research quantitatively evaluates the performance of DevOps and traditional/agile development methods based on software metrics. Three sample projects (code repositories) are used to quantify the results; for the DevOps integrated tool chain, this research uses our previously proposed and implemented DevOps hybrid model of integrated automation tools. For discussion and validation of the results, tabular and graphical comparisons are included to identify the best-performing model. This comparative and evaluative research will help young researchers and students become well versed in the automation environment of DevOps, the latest buzzword in the development industry.

Identification of Pb-Zn ore under the condition of low count rate detection of slim hole based on PGNAA technology

  • Haolong Huang;Pingkun Cai;Wenbao Jia;Yan Zhang
    • Nuclear Engineering and Technology / v.55 no.5 / pp.1708-1717 / 2023
  • The grade analysis of lead-zinc ore is the basis for the optimal development and utilization of deposits. In this study, a method combining Prompt Gamma Neutron Activation Analysis (PGNAA) technology and machine learning is proposed for lead-zinc mine borehole logging, which can identify lead-zinc ores of different grades and gangue in the formation, providing real-time grade information qualitatively and semi-quantitatively. Firstly, Monte Carlo simulation is used to obtain a gamma-ray spectrum data set for training and testing machine learning classification algorithms. These spectra are broadened, normalized and separated into inelastic scattering and capture spectra, and then used to fit different classifier models. When the comprehensive grade boundary of high- and low-grade ores is set to 5%, the evaluation metrics calculated by the 5-fold cross-validation show that the SVM (Support Vector Machine), KNN (K-Nearest Neighbor), GNB (Gaussian Naive Bayes) and RF (Random Forest) models can effectively distinguish lead-zinc ore from gangue. At the same time, the GNB model has achieved the optimal accuracy of 91.45% when identifying high- and low-grade ores, and the F1 score for both types of ores is greater than 0.9.
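The 5-fold cross-validation mentioned in this abstract partitions the samples into five folds and uses each fold once as the test set. A minimal sketch of the index bookkeeping follows; the function name is hypothetical, and the sketch assumes contiguous folds without shuffling or stratification, which the abstract does not specify.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, test_indices) for each of k contiguous folds."""
    # Distribute the remainder so fold sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0)
                  for i in range(k)]
    indices = list(range(n_samples))
    start = 0
    for size in fold_sizes:
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size


# Ten hypothetical samples split into five folds.
folds = list(k_fold_indices(10, 5))
for train, test in folds:
    print("train:", train, "test:", test)
```

An evaluation metric would then be computed on each test fold and averaged over the five folds.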

Accuracy Evaluation of Machine Learning Model for Concrete Aging Prediction due to Thermal Effect and Carbonation (콘크리트 탄산화 및 열효과에 의한 경년열화 예측을 위한 기계학습 모델의 정확성 검토)

  • Kim, Hyun-Su
    • Journal of Korean Association for Spatial Structures / v.23 no.4 / pp.81-88 / 2023
  • Numerous factors contribute to the deterioration of reinforced concrete structures. Elevated temperatures significantly alter the composition of the concrete ingredients, consequently diminishing the concrete's strength properties. With the escalation of global CO2 levels, the carbonation of concrete structures has emerged as a critical challenge, substantially affecting concrete durability research. Assessing and predicting concrete degradation due to thermal effects and carbonation are crucial yet intricate tasks. To address this, multiple prediction models for concrete carbonation and compressive strength under thermal impact have been developed. This study employs seven machine learning algorithms (multiple linear regression, decision trees, random forest, support vector machines, k-nearest neighbors, artificial neural networks, and extreme gradient boosting) to formulate predictive models for concrete carbonation and thermal impact. Two distinct datasets, derived from reported experimental studies, were used to train these predictive models. Performance was evaluated using root mean square error, mean square error, mean absolute error, and coefficient of determination. Hyperparameters were optimized through k-fold cross-validation and grid search. The results demonstrate that neural networks and extreme gradient boosting outperform the remaining five machine learning approaches, showing outstanding predictive performance for concrete carbonation and thermal effect modeling.
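The four regression metrics listed in this abstract (root mean square error, mean square error, mean absolute error, and coefficient of determination) follow their standard definitions, sketched below. The function name and the sample values are hypothetical and do not come from the paper.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return MSE, RMSE, MAE and R² for paired true/predicted values."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mse = sum(e * e for e in errors) / n
    mae = sum(abs(e) for e in errors) / n
    mean_true = sum(y_true) / n
    ss_res = sum(e * e for e in errors)                 # residual sum of squares
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares
    r2 = 1 - ss_res / ss_tot if ss_tot else float("nan")
    return {"mse": mse, "rmse": math.sqrt(mse), "mae": mae, "r2": r2}


# Hypothetical measurements and predictions, for illustration only.
m = regression_metrics([3.0, 5.0, 7.0, 9.0], [2.0, 5.0, 8.0, 9.0])
print(m)
```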

Development of Machine Learning Based Seismic Response Prediction Model for Shear Wall Structure considering Aging Deteriorations (경년열화를 고려한 전단벽 구조물의 기계학습 기반 지진응답 예측모델 개발)

  • Kim, Hyun-Su;Kim, Yukyung;Lee, So Yeon;Jang, Jun Su
    • Journal of Korean Association for Spatial Structures / v.24 no.2 / pp.83-90 / 2024
  • Machine learning is widely applied in various engineering fields. In structural engineering, machine learning is generally used to predict the structural responses of building structures. The aging deterioration of a reinforced concrete (R.C.) structure affects its structural behavior. Therefore, the aging deterioration of an R.C. structure should be considered in order to predict its seismic responses accurately. In this study, a machine learning based seismic response prediction model was developed. To this end, four machine learning algorithms were employed and the prediction performance of each algorithm was compared. A 3-story coupled shear wall structure was selected as an example structure for numerical simulation. Artificial ground motions were generated based on domestic site characteristics. Elastic modulus, damping ratio and density were varied to account for concrete degradation due to chloride penetration, carbonation, etc. Various intensity measures were used as input parameters of the training database. Performance was evaluated using root mean square error, mean square error, mean absolute error, and coefficient of determination. Hyperparameters were optimized through k-fold cross-validation and grid search. The analysis results show that neural networks and extreme gradient boosting algorithms provide good prediction performance.

Backpack- and UAV-based Laser Scanning Application for Estimating Overstory and Understory Biomass of Forest Stands (임분 상하층의 바이오매스 조사를 위한 백팩형 라이다와 드론 라이다의 적용성 평가)

  • Heejae Lee;Seunguk Kim;Hyeyeong Choe
    • Journal of Korean Society of Forest Science / v.112 no.3 / pp.363-373 / 2023
  • Forest biomass surveys are regularly conducted to assess and manage forests as carbon sinks. LiDAR (Light Detection and Ranging), a remote sensing technology, has attracted considerable attention, as it allows for objective acquisition of forest structure information with minimal labor. In this study, we propose a method for estimating overstory and understory biomass in forest stands using backpack laser scanning (BPLS) and unmanned aerial vehicle laser scanning (UAV-LS), and assess its accuracy. For overstory biomass, we analyzed the accuracy of BPLS and UAV-LS in estimating diameter at breast height (DBH) and tree height. For understory biomass, we developed a multiple regression model using the best combination of vertical structure metrics extracted from the BPLS data. The results indicated that BPLS provided accurate estimations of DBH (R² = 0.92) but underestimated tree height (R² = 0.63, bias = -5.56 m), whereas UAV-LS showed strong performance in estimating tree height (R² = 0.91). For understory biomass, metrics representing the mean height of the points and the point density of the fourth layer were selected to develop the model. Cross-validation of the understory biomass estimation model gave a coefficient of determination of 0.68. The study findings suggest that the proposed overstory and understory biomass survey methods using BPLS and UAV-LS can effectively replace traditional biomass survey methods.

An Experimental Study on Feature Ranking Schemes for Text Classification (텍스트 분류를 위한 자질 순위화 기법에 관한 연구)

  • Pan Jun Kim
    • Journal of the Korean Society for information Management / v.40 no.1 / pp.1-21 / 2023
  • This study reviewed the performance of ranking schemes as an efficient feature selection method for text classification. Until now, feature ranking schemes have mostly been based on document frequency, and relatively few have used term frequency. Therefore, the performance of single ranking metrics using term frequency and document frequency individually was examined as a feature selection method for text classification, and then the performance of combination ranking schemes using both was reviewed. Specifically, a classification experiment was conducted using two data sets (Reuters-21578, 20NG) and five classifiers (SVM, NB, ROC, TRA, RNN), and 5-fold cross-validation and a t-test were applied to secure the reliability of the results. Among the single ranking schemes, the document frequency-based single ranking metric (chi) showed good performance overall. In addition, no significant difference was found between the highest-performing single ranking scheme and the combination ranking schemes. Therefore, when sufficient training documents can be secured for text classification, it is more efficient to use a single ranking metric (chi) based on document frequency as the feature selection method.

Improvement of a Context-aware Recommender System through User's Emotional State Prediction (사용자 감정 예측을 통한 상황인지 추천시스템의 개선)

  • Ahn, Hyunchul
    • Journal of Information Technology Applications and Management / v.21 no.4 / pp.203-223 / 2014
  • This study proposes a novel context-aware recommender system designed to recommend items according to the customer's responses to the previously recommended item. Specifically, the proposed system predicts the user's emotional state from his or her responses (such as facial expressions and movements) to the previously recommended item, and recommends items similar to the previous one when the emotional state is estimated as positive. If the customer's emotional state toward the previously recommended item is regarded as negative, the system recommends items with characteristics opposite to the previous item. The proposed system consists of two sub-modules: (1) an emotion prediction module and (2) a responsive recommendation module. The emotion prediction module contains the emotion prediction model, which predicts a customer's arousal level (a physiological and psychological state of being awake or reactive to stimuli) using the customer's reaction data, including facial expressions and body movements, which can be measured using Microsoft's Kinect Sensor. The responsive recommendation module generates a recommendation list using the results from the emotion prediction module. If a customer shows a high level of arousal toward the previously recommended item, the module recommends the items most similar to the previous item. Otherwise, it recommends the items most dissimilar to the previous one. To validate the performance and usefulness of the proposed recommender system, we conducted an empirical validation. In total, 30 undergraduate students participated in the experiment. We used 100 trailers of Korean movies released from 2009 to 2012 as the items for recommendation. For the experiment, we manually constructed a Korean movie trailer DB containing fields such as release date, genre, director, writer, and actors. To check whether recommendation using customers' responses outperforms recommendation using their demographic information, we compared the two. Recommendation performance was measured using two metrics: satisfaction and arousal levels. Experimental results showed that recommendation using customers' responses (i.e., our proposed system) outperformed recommendation using demographic information with statistical significance.

Optimization of Multi-Atlas Segmentation with Joint Label Fusion Algorithm for Automatic Segmentation in Prostate MR Imaging

  • Choi, Yoon Ho;Kim, Jae-Hun;Kim, Chan Kyo
    • Investigative Magnetic Resonance Imaging / v.24 no.3 / pp.123-131 / 2020
  • Purpose: Joint label fusion (JLF) is a popular multi-atlas-based segmentation algorithm, which compensates for dependent errors that may exist between atlases. However, good segmentation results depend on setting the algorithm's several free parameters to optimal values. In this study, we first investigate the feasibility of a JLF algorithm for prostate segmentation in MR images, and then suggest the optimal set of parameters for automatic prostate segmentation by validating the results of each parameter combination. Materials and Methods: We acquired T2-weighted prostate MR images from 20 normal healthy volunteers and performed a series of cross-validations for every set of JLF parameters. In each case, the atlases were rigidly registered to the target image. Then, we calculated their voting weights for label fusion from each combination of JLF's parameters (rpxy, rpz, rsxy, rsz, β). We evaluated segmentation performance using the five validation metrics of the Prostate MR Image Segmentation challenge. Results: As the number of voxels participating in the voting weight calculation and the number of referenced atlases increase, the overall segmentation performance gradually improves. The JLF algorithm showed the best results (Dice similarity coefficient, 0.8495 ± 0.0392; relative volume difference, 15.2353 ± 17.2350; absolute relative volume difference, 18.8710 ± 13.1546; 95% Hausdorff distance, 7.2366 ± 1.8502; and average boundary distance, 2.2107 ± 0.4972) with the parameters rpxy = 10, rpz = 1, rsxy = 3, rsz = 1, and β = 3. Conclusion: The evaluated results showed the feasibility of the JLF algorithm for automatic segmentation of prostate MRI. This empirical analysis of segmentation results by label fusion allows for the appropriate setting of parameters.
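Of the five validation metrics in this abstract, the Dice similarity coefficient is the simplest to illustrate: it is twice the overlap between the predicted and reference masks divided by the sum of their sizes. The sketch below uses the standard definition on flattened binary masks; the function name and the toy masks are hypothetical, and returning 1.0 for two empty masks is an assumed convention.

```python
def dice_coefficient(pred, truth):
    """Dice similarity coefficient of two flattened binary masks."""
    if len(pred) != len(truth):
        raise ValueError("masks must have the same number of voxels")
    # Voxels labelled foreground in both masks.
    intersection = sum(1 for p, t in zip(pred, truth) if p and t)
    total = sum(1 for p in pred if p) + sum(1 for t in truth if t)
    # Assumed convention: two empty masks agree perfectly.
    return 2 * intersection / total if total else 1.0


# Hypothetical six-voxel masks, for illustration only.
pred = [0, 1, 1, 1, 0, 0]
truth = [0, 1, 1, 0, 1, 0]
dsc = dice_coefficient(pred, truth)
print(f"DSC = {dsc:.4f}")
```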