• Title/Summary/Keyword: feature importance

Performance Improvement of Image Retrieval System by Presenting Query based on Human Perception (인간의 인지도에 근거한 질의를 통한 영상 검색의 성능 향상)

  • 유헌우;장동식;오근태
    • Journal of KIISE:Computing Practices and Letters
    • /
    • v.9 no.2
    • /
    • pp.158-165
    • /
    • 2003
  • Image similarity is often determined by computing the distance between two feature vectors. Unfortunately, a feature vector cannot always reflect the notion of similarity in human perception, so most current image retrieval systems use weights that measure the importance of each feature. This paper proposes new initial weight selection and update rules for image retrieval. To this end, database images are first divided into groups based on human perception, inner and outer queries are performed, and optimal feature weights for each database image are then computed by searching the groups to which the retrieved result images belong. Experimental results on 2,000 images demonstrate the performance of the proposed algorithm.
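
The role of per-feature weights in a distance-based retrieval system can be illustrated with a minimal sketch. It assumes a weighted Euclidean distance and a hypothetical two-feature database; the abstract does not specify the distance function or the paper's weight selection and update rules, so none of those are reproduced here.

```python
import math

def weighted_distance(query, candidate, weights):
    """Weighted Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum(w * (q - c) ** 2
                         for q, c, w in zip(query, candidate, weights)))

def rank_images(query, database, weights):
    """Return database keys ordered from most to least similar to the query."""
    return sorted(database,
                  key=lambda name: weighted_distance(query, database[name], weights))

# Hypothetical 2-D feature vectors (for example colour, texture); not from the paper.
database = {"a": [0.9, 0.1], "b": [0.2, 0.8]}
query = [0.1, 0.1]

print(rank_images(query, database, weights=[1.0, 1.0]))  # -> ['b', 'a']
print(rank_images(query, database, weights=[0.1, 1.0]))  # -> ['a', 'b']
```

Down-weighting the first feature reverses the ranking, which is why the choice of weights matters as much as the features themselves.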

Compositional Feature Selection and Its Effects on Bandgap Prediction by Machine Learning (기계학습을 이용한 밴드갭 예측과 소재의 조성기반 특성인자의 효과)

  • Chunghee Nam
    • Korean Journal of Materials Research
    • /
    • v.33 no.4
    • /
    • pp.164-174
    • /
    • 2023
  • The bandgap of a semiconductor material is an important factor in determining its suitability for various applications. In this study, based on data provided by AFLOW (Automatic-FLOW for Materials Discovery), the bandgap of semiconductor materials was predicted using only the materials' compositional features. The compositional features were generated using the Python modules Pymatgen and Matminer. Pearson's correlation coefficients (PCC) between the compositional features were calculated, and features with a correlation coefficient larger than 0.95 were removed in order to avoid overfitting. Prediction performance was compared using the R² score and the root-mean-squared error. When the bandgap was predicted with random forest and XGBoost as representative ensemble algorithms, XGBoost gave better results after cross-validation and hyper-parameter tuning. To investigate the effect of compositional feature selection on the bandgap prediction of the machine learning model, prediction performance was studied as a function of the number of features, selected using feature importance methods. No significant change in prediction performance was found beyond an appropriate number of features. Furthermore, artificial neural networks were employed to compare prediction performance while adjusting the number of features guided by the PCC values, resulting in a best R² score of 0.811. By comparing and analyzing the bandgap distribution and prediction performance for material groups containing specific elements (F, N, Yb, Eu, Zn, B, Si, Ge, Fe, Al), various information for material design was obtained.
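
The correlation-based filtering step described above can be sketched as follows. This is a minimal illustration, assuming a greedy pass that keeps the first feature of each highly correlated pair and compares the absolute value of the coefficient against the threshold; the abstract states only the 0.95 cutoff, and the feature names and values below are hypothetical.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0  # a constant column has no defined correlation
    return cov / (sx * sy)

def drop_correlated(features, threshold=0.95):
    """Keep a feature only if |r| with every already kept feature <= threshold."""
    kept = []
    for name in features:
        if all(abs(pearson(features[name], features[k])) <= threshold for k in kept):
            kept.append(name)
    return kept

# Hypothetical compositional features for five materials.
features = {
    "mean_atomic_number": [1.0, 2.0, 3.0, 4.0, 5.0],
    "mean_atomic_mass": [2.1, 4.0, 6.2, 8.1, 9.9],  # almost proportional to the first
    "electronegativity_range": [0.5, 0.1, 0.9, 0.3, 0.7],
}
kept = drop_correlated(features)
print(kept)  # -> ['mean_atomic_number', 'electronegativity_range']
```

Which member of a correlated pair is dropped depends on iteration order here; a real pipeline might instead keep the feature that correlates better with the target.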

Fuzzy One Class Support Vector Machine (퍼지 원 클래스 서포트 벡터 머신)

  • Kim, Ki-Joo;Choi, Young-Sik
    • Journal of Internet Computing and Services
    • /
    • v.6 no.3
    • /
    • pp.159-170
    • /
    • 2005
  • OC-SVM (One-Class Support Vector Machine) avoids solving a full density estimation problem and instead focuses on a simpler task: estimating quantiles of a data distribution, i.e. its support. OC-SVM seeks to estimate the regions where most of the data resides and represents those regions as a function of the support vectors. Although OC-SVM is a powerful method for data description, it is difficult to incorporate subjective human importance into its estimation process. In order to integrate the importance of each point into the OC-SVM process, we propose a fuzzy version of OC-SVM. In FOC-SVM (Fuzzy One-Class Support Vector Machine), data points are not treated equally but are weighted according to the importance measure of the corresponding objects. That is, we scale the kernel feature vector according to the importance measure of the object, so that the kernel feature vector of a less important object contributes less to the detection process of OC-SVM. We demonstrate the performance of our algorithm on several synthesized data sets, and the experimental results are promising.
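
One way to read the scaling step is that, if each kernel feature vector is multiplied by an importance value s_i, the Gram matrix entry for points i and j becomes s_i · s_j · K(x_i, x_j). The sketch below shows only that consequence, assuming an RBF kernel and hypothetical importance values; it is not the paper's formulation or a full OC-SVM solver.

```python
import math

def rbf_kernel(x, y, gamma=1.0):
    """Standard RBF kernel exp(-gamma * ||x - y||^2)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))

def scaled_gram(points, importance, gamma=1.0):
    """Gram matrix of importance-scaled feature vectors: s_i * s_j * K(x_i, x_j)."""
    n = len(points)
    return [[importance[i] * importance[j] * rbf_kernel(points[i], points[j], gamma)
             for j in range(n)]
            for i in range(n)]

# Hypothetical data: the second point is considered half as important.
points = [[0.0, 0.0], [1.0, 0.0], [0.0, 2.0]]
importance = [1.0, 0.5, 1.0]
gram = scaled_gram(points, importance)

for row in gram:
    print([round(v, 4) for v in row])
```

The less important point has a smaller self-similarity (0.25 instead of 1.0), so it pulls less weight when a one-class model is fitted on this precomputed kernel.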

Identification of Topological Entities and Naming Mapping for Parametric CAD Model Exchanges

  • Mun, Duh-Wan;Han, Soon-Hung
    • International Journal of CAD/CAM
    • /
    • v.5 no.1
    • /
    • pp.69-81
    • /
    • 2005
  • As collaborative design and configuration design gain increasing importance in product development, it becomes essential to exchange parametric CAD models among participants. Parametric CAD models can be represented and exchanged in the form of a macro file or a part file that contains the modeling history of a product. The modeling history of a parametric CAD model contains feature specifications and each feature has selection information that records the name of the referenced topological entities. Translating this selection information requires solving the problems of how to identify the referenced topological entities of a feature (persistent naming problem) and how to convert the selection information into the format of the receiving CAD system (naming mapping problem). The present paper introduces the problem of exchanging parametric CAD models and proposes a solution to naming mapping.

Improved image alignment algorithm based on projective invariant for aerial video stabilization

  • Yi, Meng;Guo, Bao-Long;Yan, Chun-Man
    • KSII Transactions on Internet and Information Systems (TIIS)
    • /
    • v.8 no.9
    • /
    • pp.3177-3195
    • /
    • 2014
  • In many moving object detection problems in aerial video, accurate and robust stabilization is of critical importance. This paper describes a novel, accurate image alignment algorithm for aerial electronic image stabilization (EIS). Feature points are first selected using a Harris detector based on optimal derivative filters, which improves differentiation accuracy and yields precise feature point coordinates. Delaunay triangulation edges are then used to find matching pairs of feature points in overlapping images. The most "useful" matching points, those that belong to the background, are used to find the global transformation parameters using the projective invariant. Finally, the intentional motion of the camera is accumulated for correction by Sage-Husa adaptive filtering. Experimental results on aerial video sequences with various dynamic scenes demonstrate the performance of the proposed algorithm.

Explainable Machine Learning Based a Packed Red Blood Cell Transfusion Prediction and Evaluation for Major Internal Medical Condition

  • Lee, Seongbin;Lee, Seunghee;Chang, Duhyeuk;Song, Mi-Hwa;Kim, Jong-Yeup;Lee, Suehyun
    • Journal of Information Processing Systems
    • /
    • v.18 no.3
    • /
    • pp.302-310
    • /
    • 2022
  • Efficient use of limited blood products is becoming very important in terms of socioeconomic status and patient recovery. To predict the appropriateness of patient-specific transfusions for intensive care unit (ICU) patients who require real-time monitoring, we evaluated a model that dynamically predicts the possibility of transfusion using the Medical Information Mart for Intensive Care III (MIMIC-III), an ICU admission record at Harvard Medical School. In this study, we developed an explainable machine learning model to predict the possibility of red blood cell transfusion for major medical diseases in the ICU. Target disease groups that received packed red blood cell transfusions at high frequency were selected, and 16,222 patients were finally extracted. The prediction model (LightGBM) achieved an area under the ROC curve of 0.9070 and an F1-score of 0.8166. To explain the behavior of the machine learning model, feature importance analysis and partial dependence plots were used. The results of our study can be used as basic data for recommendations related to the adequacy of blood transfusions and are expected to ultimately contribute to the recovery of patients and the prevention of excessive consumption of blood products.
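
The values behind a partial dependence plot can be computed with a short routine: fix one feature at each grid value for every row, then average the model's predictions. The sketch below assumes a hypothetical toy prediction function standing in for a trained model; it does not use LightGBM or MIMIC-III data.

```python
def partial_dependence(predict, rows, feature_index, grid):
    """Average prediction over all rows with one feature forced to each grid value."""
    curve = []
    for value in grid:
        modified = [row[:feature_index] + [value] + row[feature_index + 1:]
                    for row in rows]
        curve.append(sum(predict(r) for r in modified) / len(modified))
    return curve

def toy_model(row):
    """Hypothetical stand-in for a trained model: linear in x0, quadratic in x1."""
    return 2.0 * row[0] + row[1] ** 2

rows = [[0.0, 1.0], [1.0, 2.0], [2.0, 3.0]]
grid = [0.0, 1.0, 2.0]
pd_curve = partial_dependence(toy_model, rows, feature_index=0, grid=grid)

for value, avg in zip(grid, pd_curve):
    print(f"x0={value:.1f} -> mean prediction {avg:.3f}")
```

Because the toy model is linear in the first feature, the curve rises by 2.0 per unit step, which is the kind of marginal effect such a plot is meant to expose.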

Truck Weight Estimation using Operational Statistics at 3rd Party Logistics Environment (운영 데이터를 활용한 제3자 물류 환경에서의 배송 트럭 무게 예측)

  • Yu-jin Lee;Kyung Min Choi;Song-eun Kim;Kyungsu Park;Seung Hwan Jung
    • Journal of Korean Society of Industrial and Systems Engineering
    • /
    • v.45 no.4
    • /
    • pp.127-133
    • /
    • 2022
  • Many manufacturers that use third-party logistics (3PL) providers face challenges in increasing their logistics efficiency. This study introduces an effort to estimate the weight of the delivery trucks supplied by 3PL providers, which allows the manufacturer to package and load products in trailers in advance and thus reduce delivery time. The accuracy of the weight estimation is all the more important because of the total weight regulation. This study uses not only data from the company but also many general prediction variables such as weather, oil prices, and the population of destinations. In addition, operational statistics variables are developed to indicate the availability of trucks in a specific weight category for each 3PL provider. The prediction model, which uses an XGBoost regressor and the permutation feature importance method, provides highly acceptable performance with a MAPE of 2.785% and shows the effectiveness of the developed operational statistics variables.
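
Permutation feature importance scored with MAPE can be sketched in a few lines: shuffle one column at a time and record how much the error grows. The example assumes a hypothetical fixed prediction function in place of a trained XGBoost regressor, and the truck data and column meanings are made up for illustration.

```python
import random

def mape(y_true, y_pred):
    """Mean absolute percentage error in percent; y_true must be non-zero."""
    return 100.0 * sum(abs(t - p) / abs(t) for t, p in zip(y_true, y_pred)) / len(y_true)

def permutation_importance(predict, rows, y_true, n_repeats=10, seed=0):
    """Mean increase in MAPE when each feature column is shuffled in turn."""
    rng = random.Random(seed)
    baseline = mape(y_true, [predict(r) for r in rows])
    importances = []
    for j in range(len(rows[0])):
        increases = []
        for _ in range(n_repeats):
            column = [r[j] for r in rows]
            rng.shuffle(column)
            permuted = [r[:j] + [v] + r[j + 1:] for r, v in zip(rows, column)]
            score = mape(y_true, [predict(r) for r in permuted])
            increases.append(score - baseline)
        importances.append(sum(increases) / n_repeats)
    return importances

def toy_predict(row):
    """Hypothetical model: weight depends on column 0 only; column 1 is ignored."""
    return 100.0 + 5.0 * row[0]

rows = [[float(i), float((i * 7) % 5)] for i in range(8)]
y_true = [toy_predict(r) for r in rows]
importances = permutation_importance(toy_predict, rows, y_true)
print(importances)
```

The column the model ignores gets an importance of exactly zero, while shuffling the column it relies on raises the MAPE.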

Study on predictive model and mechanism analysis for martensite transformation temperatures through explainable artificial intelligence (설명가능한 인공지능을 통한 마르텐사이트 변태 온도 예측 모델 및 거동 분석 연구)

  • Junhyub Jeon;Seung Bae Son;Jae-Gil Jung;Seok-Jae Lee
    • Journal of the Korean Society for Heat Treatment
    • /
    • v.37 no.3
    • /
    • pp.103-113
    • /
    • 2024
  • The martensite volume fraction significantly affects the mechanical properties of alloy steels. The martensite start temperature (Ms), the transformation temperature for 50 vol.% martensite (M50), and the transformation temperature for 90 vol.% martensite (M90) are important transformation temperatures for controlling the martensite phase fraction. Several researchers have proposed empirical equations and machine learning models to predict the Ms temperature. These numerical approaches can easily predict the Ms temperature without additional experiments or cost. However, to control the martensite phase fraction more precisely, the prediction error of the Ms model must be reduced and prediction models are needed for the other martensite transformation temperatures (M50, M90). In the present study, machine learning models were applied to build predictive models for the Ms, M50, and M90 temperatures. Explainable artificial intelligence (XAI) was employed to explain the prediction mechanisms and to assess the importance of each feature for the martensite transformation temperatures. Among the machine learning models compared, random forest regression (RFR) showed the best performance in predicting the Ms, M50, and M90 temperatures. The feature importances were presented and the prediction mechanisms were discussed using XAI.

Application of Random Forests to Assessment of Importance of Variables in Multi-sensor Data Fusion for Land-cover Classification

  • Park No-Wook;Chi Kwang-Hoon
    • Korean Journal of Remote Sensing
    • /
    • v.22 no.3
    • /
    • pp.211-219
    • /
    • 2006
  • A random forests classifier is applied to multi-sensor data fusion for supervised land-cover classification in order to account for the importance of variables. The random forests approach is a non-parametric ensemble classifier based on CART-like trees. Its distinguishing feature is that the importance of a variable can be estimated by randomly permuting the variable of interest in all the out-of-bag samples for each classifier. Two different multi-sensor data sets for supervised classification were used to illustrate the applicability of random forests: one with optical and polarimetric SAR data and the other with multi-temporal Radarsat-1 and ENVISAT ASAR data. The experimental results show that the random forests approach could extract important variables or bands for land-cover discrimination and achieved reasonably good classification accuracy.
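
The out-of-bag permutation idea can be sketched with a small bagged ensemble. To stay short, this example swaps the CART-like trees for a nearest-centroid base classifier, which is an assumption made purely for illustration; the data are synthetic, with the first variable informative and the second pure noise.

```python
import random

def centroid_fit(rows, labels):
    """Return {label: centroid} computed from the training rows."""
    sums, counts = {}, {}
    for row, lab in zip(rows, labels):
        acc = sums.setdefault(lab, [0.0] * len(row))
        for k, v in enumerate(row):
            acc[k] += v
        counts[lab] = counts.get(lab, 0) + 1
    return {lab: [s / counts[lab] for s in acc] for lab, acc in sums.items()}

def centroid_predict(model, row):
    """Predict the label whose centroid is closest in squared Euclidean distance."""
    return min(model, key=lambda lab: sum((a - b) ** 2 for a, b in zip(row, model[lab])))

def accuracy(model, rows, labels):
    return sum(centroid_predict(model, r) == l for r, l in zip(rows, labels)) / len(rows)

def oob_permutation_importance(rows, labels, n_estimators=25, seed=0):
    """Mean drop in out-of-bag accuracy when each variable is permuted."""
    rng = random.Random(seed)
    n, d = len(rows), len(rows[0])
    totals, used = [0.0] * d, 0
    for _ in range(n_estimators):
        bag = [rng.randrange(n) for _ in range(n)]      # bootstrap sample
        in_bag = set(bag)
        oob = [i for i in range(n) if i not in in_bag]  # samples left out of the bag
        if not oob:
            continue
        model = centroid_fit([rows[i] for i in bag], [labels[i] for i in bag])
        oob_rows = [rows[i] for i in oob]
        oob_labels = [labels[i] for i in oob]
        base = accuracy(model, oob_rows, oob_labels)
        for j in range(d):
            column = [r[j] for r in oob_rows]
            rng.shuffle(column)
            permuted = [r[:j] + [v] + r[j + 1:] for r, v in zip(oob_rows, column)]
            totals[j] += base - accuracy(model, permuted, oob_labels)
        used += 1
    return [t / max(used, 1) for t in totals]

# Synthetic data: variable 0 separates the classes, variable 1 is noise.
data_rng = random.Random(1)
rows = ([[data_rng.random(), data_rng.random()] for _ in range(20)]
        + [[10.0 + data_rng.random(), data_rng.random()] for _ in range(20)])
labels = [0] * 20 + [1] * 20
importance = oob_permutation_importance(rows, labels)
print(importance)
```

Permuting the informative variable among the out-of-bag samples sharply lowers accuracy, while permuting the noise variable leaves it unchanged, which is the signal used to rank variables or bands.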