• Title/Summary/Keyword: linear SVM


Optimizing Clustering and Predictive Modelling for 3-D Road Network Analysis Using Explainable AI

  • Rotsnarani Sethy;Soumya Ranjan Mahanta;Mrutyunjaya Panda
    • International Journal of Computer Science & Network Security / v.24 no.9 / pp.30-40 / 2024
  • Building an accurate 3-D spatial road network model has become an active area of research, offering a new paradigm for developing smart roads and intelligent transportation systems (ITS) that help public and private road operators achieve better mobility and eco-routing, so that smoother traffic, lower carbon emissions and improved road safety can be ensured. Dealing with such large-scale 3-D road network data poses challenges in obtaining accurate elevation information, which is needed to better estimate CO2 emissions and to route vehicles accurately in an Internet of Vehicles (IoV) scenario. Clustering and regression techniques are suitable for recovering the missing elevation information for some points in the 3-D spatial road network dataset, which is expected to give the public a better eco-routing experience. Furthermore, Explainable Artificial Intelligence (xAI) has recently drawn researchers' attention because it makes models more interpretable, transparent and comprehensible, enabling the design of efficient choice-based models that reflect user requirements. The 3-D road network dataset, comprising spatial attributes (longitude, latitude, altitude) of North Jutland, Denmark, collected from the publicly available UCI repository, is preprocessed through feature engineering and scaling to ensure optimal accuracy for the clustering and regression tasks. K-Means clustering and regression using a Support Vector Machine (SVM) with a radial basis function (RBF) kernel are employed for the 3-D road network analysis. Silhouette scores and the number of clusters are used to measure cluster quality, whereas error metrics such as MAE (Mean Absolute Error) and RMSE (Root Mean Square Error) are used to evaluate the regression method. To improve the interpretability of the clustering and regression models, SHAP (Shapley Additive Explanations), a powerful xAI technique, is employed in this research. Extensive experiments show that SHAP analysis validated the importance of latitude and altitude in predicting longitude, particularly in the four-cluster setup, providing critical insights into model behavior and feature contributions, with an accuracy of 97.22% and strong performance metrics across all classes, including an MAE of 0.0346 and an MSE of 0.0018. The ten-cluster setup, while faster in SHAP analysis, presented interpretability challenges due to the increased clustering complexity. Hence, the hybrid of K-Means clustering with K=4 and SVM demonstrated superior performance and interpretability, highlighting the importance of careful cluster selection to balance model complexity and predictive accuracy.
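
As a rough, non-authoritative sketch of the pipeline described in this abstract (K-Means with K=4 followed by an RBF-kernel SVM regressor, evaluated with silhouette score and MAE), the example below uses scikit-learn. The file name, column names, subsample size and hyperparameters are assumptions, not the authors' exact setup, and the SHAP step is indicated only as optional commented code.

```python
# Minimal sketch of a cluster-then-regress pipeline (K-Means, K=4, plus RBF-kernel
# SVR) as outlined in the abstract. File name, column names and hyperparameters
# are assumptions based on the UCI "3D Road Network (North Jutland, Denmark)" data.
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.metrics import mean_absolute_error, silhouette_score
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

df = pd.read_csv("3D_spatial_network.txt",
                 names=["osm_id", "longitude", "latitude", "altitude"])
df = df.sample(20000, random_state=0)        # subsample to keep the sketch fast

X = StandardScaler().fit_transform(df[["latitude", "altitude"]])
y = df["longitude"].values

# Cluster the scaled coordinates and report cluster quality.
kmeans = KMeans(n_clusters=4, n_init=10, random_state=0).fit(X)
print("silhouette:", silhouette_score(X, kmeans.labels_, sample_size=5000, random_state=0))

# Fit one RBF-kernel SVR per cluster and evaluate with MAE, as in the abstract.
for k in range(4):
    mask = kmeans.labels_ == k
    svr = SVR(kernel="rbf", C=1.0, epsilon=0.01).fit(X[mask], y[mask])
    print(f"cluster {k} MAE:", mean_absolute_error(y[mask], svr.predict(X[mask])))

# Optional SHAP attributions for one cluster's model (requires the shap package):
# import shap
# explainer = shap.KernelExplainer(svr.predict, shap.sample(X[mask], 100))
# shap_values = explainer.shap_values(X[mask][:200])
```

Evaluating on the training points, as done here for brevity, overstates accuracy; a train/test split per cluster would be the natural refinement.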

Development of Knee Pain Diagnosis Questionnaire and Clinical Study of Diagnostic Correspondent Rate (슬통 진단용 설문지개발 및 진단 일치도 평가연구)

  • Hwang, Ji-Hoo;Kim, Yu-Jong;Kim, Eun-Jung;Lee, Cham-Kyul;Lee, Eun-Yong;Lee, Seung-Deok;Kim, Kap-Sung
    • Journal of Acupuncture Research / v.29 no.5 / pp.61-74 / 2012
  • Objectives : This study was performed to prepare oriental medicine clinical guidelines by drawing up standards for oriental medicine pattern identification and diagnosis classification of knee pain. Methods : Experts' opinions on knee pain patients, classified by the Delphi method into Crane's-knee wind(鶴膝風), arthralgia syndrome(痺症), knee injury(膝傷), gout arthritis(痛風), and Youk jeol poung(歷節風), were analyzed statistically using an oriental medicine diagnosis questionnaire. The responses were classified using linear discriminant analysis(LDA), diagonal linear discriminant analysis(DLDA), diagonal quadratic discriminant analysis(DQDA), K-nearest neighbor classification(KNN), classification and regression trees(CART), and support vector machines(SVM). Results : The results are summarized as follows. 1. The result analyzed using LDA had a hit rate of 81.65% in comparison with the original diagnosis. 2. The result analyzed using DLDA had a hit rate of 63.3%. 3. The result analyzed using DQDA had a hit rate of 65.14%. 4. The result analyzed using KNN had a hit rate of 74.31%. 5. The result analyzed using CART had a hit rate of 75.23% when tested on the 13 significant questions selected by analysis of variance. 6. The result analyzed using SVM had a hit rate of 87.16%. Conclusions : Statistical analysis of the oriental medicine diagnosis questionnaire on knee pain generally produced significant results.
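
For readers who want to reproduce the flavor of this comparison, the sketch below runs the same family of classifiers (LDA, k-NN, CART, SVM) on a synthetic placeholder questionnaire; the data shapes, class count and the use of cross-validated accuracy in place of the reported hit rate are assumptions, not the study's protocol.

```python
# Illustrative comparison of the classifier families named in the abstract; the
# questionnaire responses X and diagnoses y are random placeholders, not study data.
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(109, 40)).astype(float)  # binary questionnaire answers (placeholder)
y = rng.integers(0, 5, size=109)                      # five diagnostic classes (placeholder)

models = {
    "LDA": LinearDiscriminantAnalysis(),
    "KNN": KNeighborsClassifier(n_neighbors=5),
    "CART": DecisionTreeClassifier(random_state=0),
    "SVM": SVC(kernel="rbf", C=1.0),
}
for name, model in models.items():
    acc = cross_val_score(model, X, y, cv=5).mean()   # analogous to the reported hit rate
    print(f"{name}: {acc:.3f}")
```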

Using Machine Learning to Analyze Determinants of Housing Sales and Establish a Forecasting Model (기계학습을 활용한 주택매도 결정요인 분석 및 예측모델 구축)

  • Kim, Eun-mi;Kim, Sang-Bong;Cho, Eun-seo
    • Journal of Cadastre & Land InformatiX / v.50 no.1 / pp.181-200 / 2020
  • This study used an OLS model to estimate the determinants affecting the holding period of a home and then compared its predictive power with SVM, Decision Tree, Random Forest, Gradient Boosting, XGBoost and LightGBM models. It differs from preceding studies in that a Stacking model, one of the ensemble methods, is built on top of these base models to establish a more predictive model for identifying the volume of housing transactions in the housing market. The OLS analysis showed that sales profit, housing price, the number of household members, and the type of residential housing (detached housing, apartments) affected the period of housing ownership, and a comparison of the models by RMSE showed that the machine learning models had higher predictive power. The data were then rebuilt with the influential variables and each machine learning model was applied again, with Random Forest showing the best predictive power. Finally, the most predictive models (Random Forest, Decision Tree, Gradient Boosting, and XGBoost) were used as individual base models, and the Stacking model was constructed with Linear, Ridge, and Lasso models as meta-models. The analysis showed that the Ridge meta-model had the lowest RMSE at 0.5181, yielding the most predictive model.
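
A minimal sketch of the stacking stage described above, with tree-based base regressors and a Ridge meta-model scored by RMSE; the synthetic features stand in for the study's variables (sales profit, housing price, household members, housing type), and XGBoost/LightGBM are omitted to keep the example dependency-free.

```python
# Sketch of a stacking ensemble in the spirit of the abstract: tree-based base
# models with a Ridge meta-model, scored by RMSE. All data here is synthetic.
import numpy as np
from sklearn.ensemble import (GradientBoostingRegressor, RandomForestRegressor,
                              StackingRegressor)
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 4))              # e.g. profit, price, household size, housing type
y = X @ np.array([0.5, -0.3, 0.2, 0.1]) + rng.normal(scale=0.5, size=1000)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

stack = StackingRegressor(
    estimators=[("rf", RandomForestRegressor(random_state=0)),
                ("dt", DecisionTreeRegressor(random_state=0)),
                ("gb", GradientBoostingRegressor(random_state=0))],
    final_estimator=Ridge(alpha=1.0),
)
stack.fit(X_tr, y_tr)
rmse = float(np.sqrt(mean_squared_error(y_te, stack.predict(X_te))))
print("stacking RMSE:", rmse)
```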

Using Keystroke Dynamics for Implicit Authentication on Smartphone

  • Do, Son;Hoang, Thang;Luong, Chuyen;Choi, Seungchan;Lee, Dokyeong;Bang, Kihyun;Choi, Deokjai
    • Journal of Korea Multimedia Society / v.17 no.8 / pp.968-976 / 2014
  • Authentication methods on smartphones should be implicit to users, requiring minimal user interaction. Existing authentication methods (e.g. PINs, passwords, visual patterns, etc.) do not effectively address remembrance and privacy issues. Behavioral biometrics such as keystroke dynamics and gait can be acquired easily and implicitly using the sensors integrated in a smartphone. We propose a biometric model involving keystroke dynamics for implicit authentication on smartphones. We first design a feature extraction method for keystroke dynamics. We then build a fusion model of keystroke dynamics and gait to improve the authentication performance over a single behavioral biometric on a smartphone. The fusion is performed at both the feature extraction level and the matching score level. Experiments using a linear Support Vector Machine (SVM) classifier show that the best results are achieved with score fusion: a recognition rate of approximately 97.86% in identification mode and an error rate of approximately 1.11% in authentication mode.
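
A minimal sketch of score-level fusion with linear SVMs, assuming two placeholder feature sets for keystroke dynamics and gait and an equal-weight fusion of the signed decision scores; the feature dimensions, fusion weights and threshold are illustrative assumptions, not the paper's settings.

```python
# Score-level fusion of two linear SVMs (keystroke + gait), as a sketch of the
# approach in the abstract; the feature matrices are synthetic placeholders.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.svm import LinearSVC

rng = np.random.default_rng(0)
n = 400
y = rng.integers(0, 2, size=n)                        # 1 = genuine user, 0 = impostor
X_key = rng.normal(size=(n, 20)) + y[:, None] * 0.8   # keystroke-dynamics features
X_gait = rng.normal(size=(n, 30)) + y[:, None] * 0.6  # gait features

idx_tr, idx_te = train_test_split(np.arange(n), test_size=0.3, random_state=0)

svm_key = LinearSVC(C=1.0, max_iter=10000).fit(X_key[idx_tr], y[idx_tr])
svm_gait = LinearSVC(C=1.0, max_iter=10000).fit(X_gait[idx_tr], y[idx_tr])

# Fuse the signed decision scores with equal weights, then threshold at zero.
score = 0.5 * svm_key.decision_function(X_key[idx_te]) \
      + 0.5 * svm_gait.decision_function(X_gait[idx_te])
pred = (score > 0).astype(int)
print("fused accuracy:", (pred == y[idx_te]).mean())
```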

Design of Black Plastics Classifier Using Data Information (데이터 정보를 이용한 흑색 플라스틱 분류기 설계)

  • Park, Sang-Beom;Oh, Sung-Kwun
    • The Transactions of The Korean Institute of Electrical Engineers / v.67 no.4 / pp.569-577 / 2018
  • In this paper, a preprocessing-algorithm-based black plastic classifier is designed with the aid of information contained in the data. The slope and area of the spectrum obtained by laser-induced breakdown spectroscopy (LIBS) are analyzed for each material, and the resulting information is used as the input data of the proposed classifier. The slope represents the rate of change of intensity with wavelength, while the area is calculated at the wavelengths of the spectrum peaks where the material properties of chemical elements such as carbon and hydrogen appear. Using this slope and area information, the input data of the proposed classifier are constructed. In the preprocessing part of the classifier, Principal Component Analysis (PCA) and the fuzzy transform are used to reduce the high-dimensional input variables to low-dimensional ones, improving both the characteristic analysis of the materials and the processing speed of the classifier. In the condition part, FCM clustering is applied, and a linear function is used as the connection weight in the conclusion part. By means of Particle Swarm Optimization (PSO), parameters such as the number of clusters, the fuzzification coefficient and the number of input variables are optimized. To demonstrate the superiority of the classification performance, the classification rate is compared against classifiers available in the WEKA 3.8 data mining software, such as Naive Bayes, SVM and Multilayer Perceptron.
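
The PCA-based dimensionality-reduction step can be illustrated as below on placeholder LIBS-like spectra; a linear SVM stands in for the WEKA comparison classifiers, and the fuzzy-transform/FCM/PSO parts of the proposed classifier are not reproduced here.

```python
# Sketch of PCA-based dimensionality reduction on LIBS-like spectra followed by
# an SVM baseline; spectra and labels are synthetic placeholders, not LIBS data.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(0)
spectra = rng.normal(size=(300, 2048))   # intensities over 2048 wavelengths (placeholder)
labels = rng.integers(0, 3, size=300)    # e.g. three black-plastic types (placeholder)

# Reduce the high-dimensional spectral input to 10 principal components, then classify.
clf = make_pipeline(StandardScaler(), PCA(n_components=10), SVC(kernel="linear"))
print("cv accuracy:", cross_val_score(clf, spectra, labels, cv=5).mean())
```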

Imputation of Missing Data Based on Hot Deck Method Using K-nn (K-nn을 이용한 Hot Deck 기반의 결측치 대체)

  • Kwon, Soonchang
    • Journal of Information Technology Services / v.13 no.4 / pp.359-375 / 2014
  • Researchers cannot avoid missing data when collecting data, because some respondents arbitrarily or non-arbitrarily do not answer questions in studies and experiments. Missing data not only increase and distort standard deviations, but also impair the convenience of estimating parameters and the reliability of research results. Despite the widespread use of hot deck imputation, researchers have paid little attention to it, since it handles missing data in ambiguous ways. Hot deck can be complemented with k-NN, a machine learning method that can organize donor groups closest to the properties of the missing data. Interested in the role of k-NN, this study imputes missing data based on the hot deck method using k-NN. With this as the study objective, listwise deletion and mean, mode, linear regression, and SVM imputation were compared and verified on nominal and ratio data types, and the imputed data came reasonably close to the original values. Simulations using different numbers of neighbors and different distance measures were carried out, and k-NN achieved better performance. This study thus rediscovers hot deck imputation, which has failed to attract researchers' attention, and should help in selecting non-parametric methods that are less affected by the structure of missing data and its causes.
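
A minimal sketch of k-NN-based hot-deck imputation as described above: for each record with a missing value, the k nearest complete records form the donor pool and one donor's observed value is copied in. The column roles, k and missingness rate are assumptions for illustration.

```python
# Hot-deck imputation via k-NN donors, as a sketch of the approach in the abstract:
# donors are the k nearest complete cases, and one donor's observed value is copied.
import numpy as np
from sklearn.neighbors import NearestNeighbors

rng = np.random.default_rng(0)
data = rng.normal(size=(200, 4))
data[rng.random(200) < 0.1, 3] = np.nan        # introduce ~10% missingness in column 3

complete = data[~np.isnan(data[:, 3])]
missing_idx = np.where(np.isnan(data[:, 3]))[0]

# Find the k=5 nearest complete donors using the fully observed columns 0-2.
nn = NearestNeighbors(n_neighbors=5).fit(complete[:, :3])
_, donor_ids = nn.kneighbors(data[missing_idx, :3])

for row, donors in zip(missing_idx, donor_ids):
    # Hot deck: copy the value of one randomly chosen donor (not an average).
    data[row, 3] = complete[rng.choice(donors), 3]
```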

New Normalization Methods using Support Vector Machine Regression Approach in cDNA Microarray Analysis

  • Sohn, In-Suk;Kim, Su-Jong;Hwang, Chang-Ha;Lee, Jae-Won
    • Proceedings of the Korean Society for Bioinformatics Conference / 2005.09a / pp.51-56 / 2005
  • There are many sources of systematic variation in cDNA microarray experiments that affect the measured gene expression levels, such as differences in labeling efficiency between the two fluorescent dyes. Print-tip lowess normalization is used in situations where dye biases depend on overall spot intensity and/or spatial location within the array. However, print-tip lowess normalization performs poorly when the error variability for each gene is heterogeneous over the intensity range. We propose new print-tip normalization methods based on support vector machine regression (SVMR) and support vector machine quantile regression (SVMQR). SVMQR is derived by employing the basic principle of the support vector machine (SVM) for the estimation of linear and nonlinear quantile regressions. We applied the proposed methods to previous cDNA microarray data from apolipoprotein-AI-knockout (apoAI-KO) mice, diet-induced obese mice, and genistein-fed obese mice. Our statistical analysis shows that the proposed methods perform better than the existing print-tip lowess normalization method.
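
The idea of intensity-dependent normalization by SVM regression can be sketched as follows: fit the log-ratio M as a function of overall intensity A within each print-tip group and subtract the fitted trend. An epsilon-SVR is used here as a stand-in for the paper's SVMR/SVMQR formulations, and the MA values are synthetic placeholders.

```python
# Sketch of intensity-dependent normalization with SVR, in the spirit of the
# abstract: fit M = f(A) per print-tip group and subtract the fit. Data is synthetic.
import numpy as np
from sklearn.svm import SVR

rng = np.random.default_rng(0)
A = rng.uniform(6, 16, size=2000)                         # overall spot intensity
M = 0.3 * np.sin(A) + rng.normal(scale=0.2, size=2000)    # dye-bias-distorted log-ratio
tip = rng.integers(0, 4, size=2000)                       # print-tip group index

M_norm = np.empty_like(M)
for g in np.unique(tip):
    sel = tip == g
    svr = SVR(kernel="rbf", C=1.0, epsilon=0.05).fit(A[sel].reshape(-1, 1), M[sel])
    # Remove the fitted intensity-dependent trend within this print-tip group.
    M_norm[sel] = M[sel] - svr.predict(A[sel].reshape(-1, 1))
print("mean bias before/after:", abs(M.mean()), abs(M_norm.mean()))
```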

Machine Learning Based Automatic Categorization Model for Text Lines in Invoice Documents

  • Shin, Hyun-Kyung
    • Journal of Korea Multimedia Society / v.13 no.12 / pp.1786-1797 / 2010
  • Automatic understanding of the contents of a document image is a very hard problem because it involves mathematically challenging issues, originating mainly from the over-determined system induced by the document segmentation process. In both academia and industry there have been continual efforts to improve the core parts of content retrieval technology by separating out segmentation-related issues using semi-structured documents, e.g., invoices. In this paper we propose classification models for text lines in invoice documents, in which text lines are clustered into five categories according to their contents: purchase order header, invoice header, summary header, surcharge header, and purchase items. Our investigation concentrated on the performance of machine-learning-based models, grouped into linear-discriminant-analysis (LDA) and non-LDA (logic-based) approaches. In the LDA group, naive Bayesian, k-nearest neighbor, and SVM classifiers were used; in the non-LDA group, decision tree, random forest, and boosting were used. We describe the details of the feature vector construction and the selection of the models and parameters, including training and validation. We also present experimental results comparing training/classification error levels for the models employed.
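
A minimal sketch of the line-categorization task with one model from each family named above (naive Bayes, linear SVM, random forest) over character n-gram features; the example lines, labels and vectorizer settings are assumptions, not the paper's feature construction.

```python
# Sketch of invoice text-line categorization into the five classes named in the
# abstract, comparing three model families; lines and labels are placeholders.
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

lines = ["PURCHASE ORDER NO 12345", "INVOICE DATE 2010-01-05",
         "SUBTOTAL 1,250.00", "SHIPPING AND HANDLING 25.00",
         "WIDGET A QTY 10 UNIT 12.50"]
labels = ["po_header", "invoice_header", "summary_header",
          "surcharge_header", "purchase_item"]

for clf in (MultinomialNB(), LinearSVC(), RandomForestClassifier(random_state=0)):
    # Character n-gram TF-IDF features feed each classifier in turn.
    model = make_pipeline(TfidfVectorizer(analyzer="char_wb", ngram_range=(2, 4)), clf)
    model.fit(lines, labels)
    print(type(clf).__name__, model.predict(["HANDLING FEE 10.00"]))
```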

Classification of Sleep/Wakefulness using Nasal Pressure for Patients with Sleep-disordered Breathing (비강압력신호를 이용한 수면호흡장애 환자의 수면/각성 분류)

  • Park, Jong-Uk;Jeoung, Pil-Soo;Kang, Kyu-Min;Lee, Kyoung-Joung
    • Journal of Biomedical Engineering Research / v.37 no.4 / pp.127-133 / 2016
  • This study investigates the feasibility of automatically classifying sleep/wakefulness using nasal pressure in patients with sleep-disordered breathing (SDB). First, SDB events were detected using the methods developed in our previous studies. For normal-breathing epochs, we extracted features for classifying sleep/wakefulness based on time-domain, frequency-domain and non-linear analysis. We then conducted independent two-sample t-tests and calculated the Mahalanobis distance (MD) between the two categories. As a result, $SD_{LEN}$ (MD = 0.84, p < 0.01), $P_{HF}$ (MD = 0.81, p < 0.01), $SD_{AMP}$ (MD = 0.76, p = 0.031) and $MEAN_{AMP}$ (MD = 0.75, p = 0.027) were selected as the optimal features. We classified sleep/wakefulness with a support vector machine (SVM). The classification results showed a mean sensitivity (Sen.), specificity (Spc.) and accuracy (Acc.) of 60.5%, 89.0% and 84.8%, respectively. This method shows that sleep/wakefulness can be classified automatically using nasal pressure alone.
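
The final classification step, an SVM over the four selected features reported with sensitivity, specificity and accuracy, can be sketched as below; the per-epoch features are synthetic stand-ins for $SD_{LEN}$, $P_{HF}$, $SD_{AMP}$ and $MEAN_{AMP}$, and the SVM settings are assumptions.

```python
# Sketch of sleep/wakefulness classification with an SVM and Sen/Spc/Acc metrics,
# in the spirit of the abstract; per-epoch features are synthetic placeholders.
import numpy as np
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

rng = np.random.default_rng(0)
y = rng.integers(0, 2, size=600)                       # 1 = wake, 0 = sleep
X = rng.normal(size=(600, 4)) + y[:, None] * 0.7       # SD_LEN, P_HF, SD_AMP, MEAN_AMP stand-ins

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0, stratify=y)
pred = SVC(kernel="rbf", C=1.0).fit(X_tr, y_tr).predict(X_te)

# Sensitivity, specificity and accuracy from the binary confusion matrix.
tn, fp, fn, tp = confusion_matrix(y_te, pred).ravel()
print("Sen:", tp / (tp + fn), "Spc:", tn / (tn + fp), "Acc:", (tp + tn) / len(y_te))
```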

Unified Detection and Tracking of Humans Using Gaussian Particle Swarm Optimization (가우시안 입자 군집 최적화를 이용한 사람의 통합된 검출 및 추적)

  • An, Sung-Tae;Kim, Jeong-Jung;Lee, Ju-Jang
    • Journal of Institute of Control, Robotics and Systems / v.18 no.4 / pp.353-358 / 2012
  • Human detection is a challenging task in many fields because humans are difficult to detect due to their variable appearance and posture. Furthermore, it is also hard to track a detected human because of their dynamic and unpredictable behavior. The speed of a method matters as much as its accuracy. In this paper, we propose a unified detection and tracking method for humans using Gaussian-PSO (Gaussian Particle Swarm Optimization) with HOG (Histograms of Oriented Gradients) features to achieve fast and accurate performance. While keeping the robustness of HOG features for human detection, we raise the processing speed of detection and tracking so that the method can be used in real-time applications. These advantages come from a simple process that needs just one linear-SVM classifier with HOG features and a Gaussian-PSO procedure for both detection and tracking.
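
The detection side of the method, HOG features scored by a single linear-SVM classifier, can be sketched with scikit-image and scikit-learn as below; the training patches are random placeholders and the Gaussian-PSO search that proposes candidate windows is not reproduced.

```python
# Sketch of the HOG + linear-SVM scoring used for detection in the abstract;
# the 128x64 training patches are random placeholders, not real pedestrian crops.
import numpy as np
from skimage.feature import hog
from sklearn.svm import LinearSVC

rng = np.random.default_rng(0)
patches = rng.random((40, 128, 64))          # 40 grayscale windows of 128x64 pixels
labels = rng.integers(0, 2, size=40)         # 1 = person, 0 = background (placeholder)

# Extract HOG descriptors and train one linear SVM on them.
feats = np.array([hog(p, orientations=9, pixels_per_cell=(8, 8),
                      cells_per_block=(2, 2)) for p in patches])
svm = LinearSVC(C=0.01, max_iter=10000).fit(feats, labels)

# A candidate window (which Gaussian-PSO would propose during tracking) is scored
# by the same classifier; a higher decision value means "person".
window = rng.random((128, 64))
score = svm.decision_function(hog(window, orientations=9, pixels_per_cell=(8, 8),
                                  cells_per_block=(2, 2)).reshape(1, -1))
print("detection score:", score)
```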