• Title/Abstract/Keywords: feature importance

Search results: 409 items, processing time 0.037 s

Performance Improvement of Image Retrieval System by Presenting Query based on Human Perception

  • 유헌우;장동식;오근태
    • 한국정보과학회논문지:컴퓨팅의 실제 및 레터 / Vol. 9, No. 2 / pp. 158-165 / 2003
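  • Similarity between images is generally judged by computing the vector-space distance between feature vectors extracted from the images. However, although such feature vectors offer one way to compute similarity, they do not always faithfully reflect the human notion of similarity. For this reason, most existing image retrieval systems assign an importance to each feature and reflect it in the similarity measure. This paper proposes a new initial weight setting and weight update algorithm for image retrieval. To this end, the database images are first grouped according to human perceptual judgment; internal and external queries are then performed, and by identifying which group the similar images among the retrieved results belong to, the optimal feature weights needed for the similarity computation are calculated for each image. Experiments on 2,000 images demonstrate the superiority of the proposed algorithm.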
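
The idea of reflecting per-feature importance in the similarity measure can be illustrated with a weighted Euclidean distance. This is only a minimal sketch: the feature values, the weight vectors, and the function names `weighted_distance` and `rank_images` are hypothetical, and the paper's own weight initialization and update algorithm is not reproduced here.

```python
import math

def weighted_distance(query, candidate, weights):
    """Weighted Euclidean distance between two feature vectors."""
    return math.sqrt(
        sum(w * (q - c) ** 2 for q, c, w in zip(query, candidate, weights))
    )

def rank_images(query, database, weights):
    """Return image ids sorted from most to least similar to the query."""
    return sorted(
        database,
        key=lambda image_id: weighted_distance(query, database[image_id], weights),
    )

# Hypothetical 3-dimensional features (for example colour, texture, shape).
database = {
    "img_a": [0.9, 0.1, 0.5],
    "img_b": [0.2, 0.8, 0.5],
    "img_c": [0.6, 0.6, 0.4],
}
query = [0.8, 0.6, 0.5]

equal_weights = [1.0, 1.0, 1.0]   # every feature counts the same
colour_heavy = [5.0, 0.2, 1.0]    # assumed weights favouring the first feature

ranking_equal = rank_images(query, database, equal_weights)
ranking_colour = rank_images(query, database, colour_heavy)
print(ranking_equal)
print(ranking_colour)
```

With equal weights the closest image is `img_c`, while the colour-heavy weights move `img_a` to the top, which shows how the choice of feature weights changes the retrieval order.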

Compositional Feature Selection and Its Effects on Bandgap Prediction by Machine Learning

  • 남충희
    • 한국재료학회지 / Vol. 33, No. 4 / pp. 164-174 / 2023
  • The bandgap characteristics of semiconductor materials are an important factor when utilizing semiconductor materials for various applications. In this study, based on data provided by AFLOW (Automatic-FLOW for Materials Discovery), the bandgap of a semiconductor material was predicted using only the material's compositional features. The compositional features were generated using the Python modules 'Pymatgen' and 'Matminer'. Pearson's correlation coefficients (PCC) between the compositional features were calculated, and features with a correlation coefficient larger than 0.95 were removed in order to avoid overfitting. The bandgap prediction performance was compared using the R2 score and the root-mean-squared error. When the bandgap was predicted with random forest and XGBoost as representative ensemble algorithms, XGBoost gave better results after cross-validation and hyper-parameter tuning. To investigate the effect of compositional feature selection on the bandgap prediction of the machine learning model, the prediction performance was studied as a function of the number of features selected by feature importance methods. Prediction performance did not change significantly once an appropriate number of features was reached. Furthermore, artificial neural networks were employed to compare the prediction performance by adjusting the number of features guided by the PCC values, resulting in a best R2 score of 0.811. By comparing and analyzing the bandgap distribution and prediction performance according to the material group containing specific elements (F, N, Yb, Eu, Zn, B, Si, Ge, Fe, Al), various information for material design was obtained.
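
The correlation-based filtering step described in this abstract can be sketched as follows. This is an illustrative sketch under stated assumptions, not the authors' code: the feature names and values are invented, the greedy keep-first strategy is an assumption, and the threshold is applied to the absolute value of the coefficient, which the abstract does not specify. Constant (zero-variance) columns are assumed to be absent.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

def drop_correlated(features, threshold=0.95):
    """Greedily keep a feature only if |PCC| with every kept feature <= threshold."""
    kept = []
    for name, values in features.items():
        if all(abs(pearson(values, features[k])) <= threshold for k in kept):
            kept.append(name)
    return kept

# Hypothetical compositional features for five materials.
features = {
    "mean_atomic_mass": [1.0, 2.0, 3.0, 4.0, 5.0],
    "mean_atomic_number": [1.1, 2.0, 3.1, 3.9, 5.0],   # nearly duplicates the first
    "electronegativity_range": [2.0, 0.5, 1.5, 0.1, 1.0],
}

selected = drop_correlated(features)
print(selected)
```

Here `mean_atomic_number` is dropped because it is almost perfectly correlated with `mean_atomic_mass`, while `electronegativity_range` is kept.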

Fuzzy One Class Support Vector Machine

  • 김기주;최영식
    • 인터넷정보학회논문지 / Vol. 6, No. 3 / pp. 159-170 / 2005
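  • OC-SVM (One Class Support Vector Machine) is a technique that, instead of estimating the full distribution of the given data, estimates the support of the data distribution by finding the optimal support vectors that best describe the data. Although OC-SVM is a very good approach for representing a data distribution, it is difficult for it to reflect a person's subjective sense of importance. In this paper, we derive FOC-SVM (Fuzzy One Class Support Vector Machine), which applies a fuzzy membership to each data point so that the user's subjective importance can be expressed in the conventional OC-SVM. FOC-SVM does not treat all data equally but handles each data object according to its importance. That is, the magnitude of each feature vector is scaled according to the importance of its object, so that the feature vectors of less important data contribute less to the OC-SVM procedure. To verify this, experiments were conducted on synthetic data, and the results agreed with the predictions.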

Identification of Topological Entities and Naming Mapping for Parametric CAD Model Exchanges

  • Mun, Duh-Wan;Han, Soon-Hung
    • International Journal of CAD/CAM / Vol. 5, No. 1 / pp. 69-81 / 2005
  • As collaborative design and configuration design gain increasing importance in product development, it becomes essential to exchange parametric CAD models among participants. Parametric CAD models can be represented and exchanged in the form of a macro file or a part file that contains the modeling history of a product. The modeling history of a parametric CAD model contains feature specifications and each feature has selection information that records the name of the referenced topological entities. Translating this selection information requires solving the problems of how to identify the referenced topological entities of a feature (persistent naming problem) and how to convert the selection information into the format of the receiving CAD system (naming mapping problem). The present paper introduces the problem of exchanging parametric CAD models and proposes a solution to naming mapping.

Improved image alignment algorithm based on projective invariant for aerial video stabilization

  • Yi, Meng;Guo, Bao-Long;Yan, Chun-Man
    • KSII Transactions on Internet and Information Systems (TIIS) / Vol. 8, No. 9 / pp. 3177-3195 / 2014
  • In many moving object detection problems in aerial video, accurate and robust stabilization is of critical importance. In this paper, a novel accurate image alignment algorithm for aerial electronic image stabilization (EIS) is described. The feature points are first selected using a Harris detector based on optimal derivative filters, which improves differentiation accuracy and yields precise coordinates of the feature points. Then the Delaunay triangulation edges are used to find the matching pairs between feature points in overlapping images. The most "useful" matching points that belong to the background are used to find the global transformation parameters using the projective invariant. Finally, the intentional motion of the camera is accumulated for correction by Sage-Husa adaptive filtering. Experimental results on aerial video sequences with various dynamic scenes demonstrate the performance of the proposed algorithm.

Explainable Machine Learning Based a Packed Red Blood Cell Transfusion Prediction and Evaluation for Major Internal Medical Condition

  • Lee, Seongbin;Lee, Seunghee;Chang, Duhyeuk;Song, Mi-Hwa;Kim, Jong-Yeup;Lee, Suehyun
    • Journal of Information Processing Systems / Vol. 18, No. 3 / pp. 302-310 / 2022
  • Efficient use of limited blood products is becoming very important in terms of socioeconomic status and patient recovery. To predict the appropriateness of patient-specific transfusions for intensive care unit (ICU) patients who require real-time monitoring, we evaluated a model that dynamically predicts the possibility of transfusion by using the Medical Information Mart for Intensive Care III (MIMIC-III), an ICU admission record at Harvard Medical School. In this study, we developed an explainable machine learning model to predict the possibility of red blood cell transfusion for major medical diseases in the ICU. Target disease groups that received packed red blood cell transfusions at high frequency were selected, and 16,222 patients were finally extracted. The prediction model achieved an area under the ROC curve of 0.9070 and an F1-score of 0.8166 (LightGBM). To explain the performance of the machine learning model, feature importance analysis and a partial dependence plot were used. The results of our study can be used as basic data for recommendations related to the adequacy of blood transfusions and are expected to ultimately contribute to the recovery of patients and the prevention of excessive consumption of blood products.

Truck Weight Estimation using Operational Statistics at 3rd Party Logistics Environment

  • 이유진;최경민;김송은;박경수;정승환
    • 산업경영시스템학회지 / Vol. 45, No. 4 / pp. 127-133 / 2022
  • Many manufacturers that use third-party logistics (3PL) providers face challenges in increasing their logistics efficiency. This study introduces an effort to estimate the weight of the delivery trucks supplied by 3PL providers, which allows the manufacturer to package and load products in trailers in advance to reduce delivery time. The accuracy of the weight estimation is all the more important because of the total weight regulation. This study uses not only data from the company but also many general prediction variables such as weather, oil prices, and the population of destinations. In addition, operational statistics variables are developed to indicate the availability of trucks in a specific weight category for each 3PL provider. The prediction model, which uses an XGBoost regressor and the permutation feature importance method, provides highly acceptable performance with a MAPE of 2.785% and shows the effectiveness of the developed operational statistics variables.
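
Permutation feature importance, scored here with MAPE, can be sketched in a few lines. This is a hedged illustration only: the `predict` function is a hypothetical stand-in for a trained regressor (the study itself uses XGBoost), and the rows, feature meanings, repeat count, and seed are all assumptions.

```python
import random

def mape(actual, predicted):
    """Mean absolute percentage error, in percent."""
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)

def permutation_importance(predict, rows, targets, n_repeats=5, seed=0):
    """Average increase in MAPE when each feature column is shuffled."""
    rng = random.Random(seed)
    baseline = mape(targets, [predict(r) for r in rows])
    importances = []
    for col in range(len(rows[0])):
        increases = []
        for _ in range(n_repeats):
            shuffled = [r[col] for r in rows]
            rng.shuffle(shuffled)
            # Rebuild every row with only this column replaced by a shuffled value.
            permuted = [r[:col] + [v] + r[col + 1:] for r, v in zip(rows, shuffled)]
            increases.append(mape(targets, [predict(r) for r in permuted]) - baseline)
        importances.append(sum(increases) / n_repeats)
    return importances

# Hypothetical stand-in for a trained model: weight depends on pallet count only.
def predict(row):
    pallets, _temperature = row
    return 2.0 * pallets + 10.0

# Hypothetical rows of [pallet count, temperature]; targets come from the same rule.
rows = [[4.0, 21.0], [9.0, 18.0], [6.0, 25.0], [12.0, 19.0], [3.0, 30.0], [8.0, 22.0]]
targets = [predict(r) for r in rows]

importances = permutation_importance(predict, rows, targets)
print(importances)
```

Shuffling the pallet column degrades the error, so its importance is positive, while shuffling the temperature column, which the stand-in model ignores, leaves the error unchanged.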

Study on predictive model and mechanism analysis for martensite transformation temperatures through explainable artificial intelligence

  • 전준협;손승배;정재길;이석재
    • 열처리공학회지 / Vol. 37, No. 3 / pp. 103-113 / 2024
  • Martensite volume fraction significantly affects the mechanical properties of alloy steels. The martensite start temperature (Ms), the transformation temperature for 50 vol.% martensite (M50), and the transformation temperature for 90 vol.% martensite (M90) are important transformation temperatures for controlling the martensite phase fraction. Several researchers have proposed empirical equations and machine learning models to predict the Ms temperature. These numerical approaches can easily predict the Ms temperature without additional experiments and cost. However, to control the martensite phase fraction more precisely, the prediction error of the Ms model must be reduced and prediction models are needed for the other martensite transformation temperatures (M50, M90). In the present study, machine learning models were applied to build predictive models for the Ms, M50, and M90 temperatures. Explainable artificial intelligence (XAI) was employed to explain the prediction mechanisms of the machine learning models and to estimate the importance of each feature for the martensite transformation temperatures. Among the machine learning models tested, random forest regression (RFR) showed the best performance for predicting the Ms, M50, and M90 temperatures. The feature importance was obtained and the prediction mechanisms were discussed using XAI.

Application of Random Forests to Assessment of Importance of Variables in Multi-sensor Data Fusion for Land-cover Classification

  • Park No-Wook;Chi Kwang-Hoon
    • 대한원격탐사학회지 / Vol. 22, No. 3 / pp. 211-219 / 2006
  • A random forests classifier is applied to multi-sensor data fusion for supervised land-cover classification in order to account for the importance of variables. The random forests approach is a non-parametric ensemble classifier based on CART-like trees. Its distinguishing feature is that the importance of a variable can be estimated by randomly permuting the variable of interest in all the out-of-bag samples for each classifier. Two different multi-sensor data sets for supervised classification were used to illustrate the applicability of random forests: one with optical and polarimetric SAR data and the other with multi-temporal Radarsat-1 and ENVISAT ASAR data sets. In the experiments, the random forests approach extracted important variables or bands for land-cover discrimination and showed reasonably good performance in terms of classification accuracy.