• Title/Summary/Keyword: K-Nearest Neighbor (KNN)

Emotion Recognition in Arabic Speech from Saudi Dialect Corpus Using Machine Learning and Deep Learning Algorithms

  • Hanaa Alamri;Hanan S. Alshanbari
    • International Journal of Computer Science & Network Security / v.23 no.8 / pp.9-16 / 2023
  • Speech can actively elicit feelings and attitudes through words. It is important for researchers to identify the emotional content of speech signals as well as the type of emotion that resulted from the speech. In this study, we examined an emotion recognition system using an Arabic database in the Saudi dialect, collected from a YouTube channel called Telfaz11. The four emotions examined were anger, happiness, sadness, and neutral. In our experiments, we extracted features from audio signals, such as Mel Frequency Cepstral Coefficients (MFCC) and Zero-Crossing Rate (ZCR), and then classified emotions using machine learning algorithms (Support Vector Machine (SVM) and K-Nearest Neighbor (KNN)) and deep learning algorithms (Convolutional Neural Network (CNN) and Long Short-Term Memory (LSTM)). Our experiments showed that the MFCC feature extraction method combined with the CNN model obtained the best accuracy, 95%, demonstrating the effectiveness of this classification system in recognizing emotions in spoken Arabic.
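Of the two features named in the abstract above, the zero-crossing rate is simple enough to sketch directly. The following is a minimal illustration, not the authors' implementation: the function names, the frame length, and the hop length are assumptions chosen for the example, and the convention of treating zero as non-negative is one of several in use.

```python
def zero_crossing_rate(frame):
    """Return the fraction of adjacent sample pairs whose signs differ.

    Zero is treated as non-negative (an assumed convention).
    """
    if len(frame) < 2:
        return 0.0
    crossings = sum(
        1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0)
    )
    return crossings / (len(frame) - 1)


def frame_zcr(signal, frame_length=4, hop_length=2):
    """Compute the ZCR of each full-length frame of the signal.

    frame_length and hop_length are illustrative values, not taken
    from the paper.
    """
    return [
        zero_crossing_rate(signal[start:start + frame_length])
        for start in range(0, len(signal) - frame_length + 1, hop_length)
    ]
```

A per-frame ZCR sequence like this is typically concatenated with other features, such as MFCCs, before being passed to a classifier.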

Design and Implementation of Advanced Traffic Monitoring System based on Integration of Data Stream Management System and Spatial DBMS

  • Xia, Ying;Gan, Hongmei;Kim, Gyoung-Bae
    • Journal of Korea Spatial Information System Society / v.11 no.2 / pp.162-169 / 2009
  • Real-time traffic data is generated as a continuous, unbounded data stream, while an intelligent transport system (ITS) needs to provide varied, high-quality services by combining that data with spatial information. Traditional database techniques in ITS fall short when processing dynamic real-time stream data and static spatial data simultaneously. In this paper, we design and implement an advanced traffic monitoring system (ATMS) that integrates an existing data stream management system (DSMS) with a spatial DBMS using IntraMap. The developed ATMS can handle the stream data of the DSMS, the trajectory data of a relational DBMS, and the spatial data of the SDBMS concurrently. The implemented ATMS supports historical and one-time queries, continuous queries, and combined queries. Application programmers can develop various intelligent services, such as moving trajectory tracking, k-nearest neighbor (KNN) queries, and dynamic intelligent navigation, using components of the ATMS.

Text-independent Speaker Identification Using Soft Bag-of-Words Feature Representation

  • Jiang, Shuangshuang;Frigui, Hichem;Calhoun, Aaron W.
    • International Journal of Fuzzy Logic and Intelligent Systems / v.14 no.4 / pp.240-248 / 2014
  • We present a robust speaker identification algorithm that uses novel features based on a soft bag-of-words representation and a simple Naive Bayes classifier. The bag-of-words (BoW) based histogram feature descriptor is typically constructed by summarizing and identifying representative prototypes from low-level spectral features extracted from training data. In this paper, we define a generalization of the standard BoW. In particular, we define three types of BoW that are based on crisp voting, fuzzy memberships, and possibilistic memberships. We analyze our mapping with three common classifiers: the Naive Bayes (NB) classifier, the K-nearest neighbor (KNN) classifier, and support vector machines (SVM). The proposed algorithms are evaluated using large datasets that simulate medical crises. We show that the proposed soft bag-of-words feature representation achieves a significant improvement over state-of-the-art methods.

Default Prediction of Automobile Credit Based on Support Vector Machine

  • Chen, Ying;Zhang, Ruirui
    • Journal of Information Processing Systems / v.17 no.1 / pp.75-88 / 2021
  • The automobile credit business has developed rapidly in recent years, and defaults occur frequently. Credit default brings great losses to automobile financial institutions, so successfully predicting automobile credit default is of great significance. First, records with missing values are deleted; then a random forest is used for feature selection, and the sample data are randomly grouped. Finally, six prediction models are constructed: support vector machine (SVM), random forest, k-nearest neighbor (KNN), logistic regression, decision tree, and artificial neural network (ANN). The results show that all six machine learning models can be used to predict automobile credit default. Among them, the decision tree has the highest accuracy, 0.79, but SVM has the best overall performance. Random grouping also improves the efficiency of model operation to a certain extent, especially for SVM.

A comparison of imputation methods using machine learning models

  • Heajung Suh;Jongwoo Song
    • Communications for Statistical Applications and Methods / v.30 no.3 / pp.331-341 / 2023
  • Handling missing values in data analysis is essential in constructing a good prediction model. The easiest way to handle missing values is to use complete case data, but this can lead to information loss within the data and invalid conclusions in data analysis. Imputation is a technique that replaces missing data with alternative values obtained from information in a dataset. Conventional imputation methods include K-nearest-neighbor imputation and multiple imputation. Recent methods include missForest, missRanger, and mixgb, all of which use machine learning algorithms. This paper compares the imputation techniques for datasets with mixed datatypes in various situations, such as data size, missing ratios, and missing mechanisms. To evaluate the performance of each method in mixed datasets, we propose a new imputation performance measure (IPM) that is a unified measurement applicable to numerical and categorical variables. We believe this metric can help find the best imputation method. Finally, we summarize the comparison results with imputation performances and computational times.
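As a rough illustration of the K-nearest-neighbor imputation mentioned above, the sketch below fills each missing numeric entry with the mean of that column over the k most similar rows. It is a simplified assumption-laden example, not the procedure evaluated in the paper: it handles numeric columns only (the paper considers mixed datatypes), marks missing values with `None`, and measures similarity with a root-mean-square distance over the columns both rows have observed. The function name `knn_impute` is hypothetical.

```python
import math


def knn_impute(rows, k=2):
    """Return a copy of rows with None entries filled by KNN imputation.

    For each missing cell, the k nearest rows that have the column
    observed are found, and the cell is set to the mean of their values.
    """
    imputed = [list(row) for row in rows]
    for i, row in enumerate(rows):
        for j, value in enumerate(row):
            if value is not None:
                continue
            candidates = []
            for m, other in enumerate(rows):
                if m == i or other[j] is None:
                    continue
                # Compare only on columns observed in both rows.
                shared = [
                    c for c in range(len(row))
                    if row[c] is not None and other[c] is not None
                ]
                if not shared:
                    continue
                dist = math.sqrt(
                    sum((row[c] - other[c]) ** 2 for c in shared) / len(shared)
                )
                candidates.append((dist, other[j]))
            candidates.sort(key=lambda pair: pair[0])
            nearest = candidates[:k]
            if nearest:  # leave the cell as None if no donor row exists
                imputed[i][j] = sum(v for _, v in nearest) / len(nearest)
    return imputed
```

In practice the columns would usually be standardized first so that no single feature dominates the distance.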

Prediction of Delivery Quality Assurance Via Machine Learning in Helical Tomotherapy (방사선치료 시 다양한 기계학습을 이용한 선량품질관리 결과의 예측)

  • Kyung Hwan Chang
    • Journal of radiological science and technology / v.47 no.4 / pp.263-270 / 2024
  • The objective of this study was to evaluate the accuracy and impact of leaf open time (LOT) and pitch, using various machine learning models, on EBT film-based delivery quality assurance (DQA) performed on 211 helical tomotherapy (HT) patients. We randomly selected passed (n=191) and failed (n=20) DQA measurements to evaluate the accuracy of the k-nearest neighbor (KNN), support vector machine (SVM), naive Bayes (NB), and logistic regression (LR) models using scale-dependent metrics such as the coefficient of determination (R2), mean squared error (MSE), and root MSE (RMSE). We evaluated the performance of the four prediction models in terms of accuracy, precision, sensitivity, and F1-score using a confusion matrix, and found that the NB and LR models achieved the best results. The results of this study are expected to reduce the workload of medical physicists and dosimetrists by predicting DQA results from LOT and pitch in advance.

A Study on the Drug Classification Using Machine Learning Techniques (머신러닝 기법을 이용한 약물 분류 방법 연구)

  • Anmol Kumar Singh;Ayush Kumar;Adya Singh;Akashika Anshum;Pradeep Kumar Mallick
    • Advanced Industrial SCIence / v.3 no.2 / pp.8-16 / 2024
  • This paper presents a drug classification system whose goal is to predict the appropriate drug for patients based on their demographic and physiological traits. The dataset consists of attributes such as Age, Sex, BP (Blood Pressure), Cholesterol Level, and Na_to_K (Sodium to Potassium ratio), with the objective of determining the kind of drug being given. The models used in this paper are K-Nearest Neighbors (KNN), Logistic Regression, and Random Forest. GridSearchCV with 5-fold cross-validation was used to fine-tune hyperparameters, and each model was trained and tested on the dataset. To assess the performance of each model both with and without hyperparameter tuning, evaluation metrics such as accuracy, confusion matrices, and classification reports were used. The accuracy of the models without GridSearchCV was 0.7, 0.875, 0.975, and with GridSearchCV was 0.75, 1.0, 0.975. According to the GridSearchCV results, Logistic Regression is the most suitable model for drug classification among the three models used, followed by K-Nearest Neighbors. Na_to_K is also an essential feature in predicting the outcome.

Method for Assessing Landslide Susceptibility Using SMOTE and Classification Algorithms (SMOTE와 분류 기법을 활용한 산사태 위험 지역 결정 방법)

  • Yoon, Hyung-Koo
    • Journal of the Korean Geotechnical Society / v.39 no.6 / pp.5-12 / 2023
  • Proactive assessment of landslide susceptibility is necessary for minimizing casualties. This study proposes a methodology for classifying the landslide safety factor using a classification algorithm based on machine learning techniques. The high-risk area model is adopted to perform the classification, and eight geotechnical parameters are adopted as inputs. Four classification algorithms (decision tree, k-nearest neighbor, logistic regression, and random forest) are employed to compare classification accuracy for safety factors ranging between 1.2 and 2.0. Notably, high accuracy is demonstrated in the safety factor range of 1.2 to 1.7, but relatively low accuracy is obtained in the range of 1.8 to 2.0. To overcome this issue, the synthetic minority over-sampling technique (SMOTE) is adopted to generate additional data. The application of SMOTE improves the average accuracy by approximately 250% in the safety factor range of 1.8 to 2.0. The results demonstrate that the SMOTE algorithm improves the accuracy of classification algorithms when applied to geotechnical data.
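The core idea of SMOTE referred to above is to create synthetic minority samples by interpolating between a minority point and one of its nearest minority neighbors. The sketch below shows that idea only; it is an assumption-based illustration rather than the study's implementation. The function name `smote`, the default `k`, and the fixed `seed` are choices made for the example.

```python
import math
import random


def smote(minority, n_new, k=2, seed=0):
    """Generate n_new synthetic points from a list of minority samples.

    Each synthetic point lies on the line segment between a randomly
    chosen minority sample and one of its k nearest minority neighbors.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    rng = random.Random(seed)  # fixed seed keeps the example reproducible
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        neighbors = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        neighbor = rng.choice(neighbors)
        gap = rng.random()  # interpolation weight in [0, 1)
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbor))
        )
    return synthetic
```

Because new points are interpolated only between existing minority samples, the technique densifies the minority region without duplicating records exactly.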

A Study on the Failure Diagnosis of Transfer Robot for Semiconductor Automation Based on Machine Learning Algorithm (머신러닝 알고리즘 기반 반도체 자동화를 위한 이송로봇 고장진단에 대한 연구)

  • Kim, Mi Jin;Ko, Kwang In;Ku, Kyo Mun;Shim, Jae Hong;Kim, Kihyun
    • Journal of the Semiconductor & Display Technology / v.21 no.4 / pp.65-70 / 2022
  • In the manufacturing and semiconductor industries, transfer robots increase productivity through accurate and continuous work. Because of the nature of the semiconductor process, there are environments where humans cannot intervene, so that internal temperature and humidity can be maintained in a clean room. Transfer robots therefore take over from humans. In such an environment, where process manpower is being cut down, a lack of machine maintenance and management technology may adversely affect production, which is why a machine failure diagnosis system needs to be developed. This paper therefore tries to identify various causes of failure in transfer robots that are widely used in semiconductor automation, and the Prognostics and Health Management (PHM) method is considered for determining and predicting the process of failures. The robot mainly fails in the driving unit due to long-term repetitive motion, and the core components of the driving unit are the motors and gear reducers. A simulated drive unit was built and tested around these components and then applied to 6-axis vertical multi-joint robots used at actual industrial sites. Vibration data was collected for each cause of robot failure, and the collected data was then processed through signal processing and frequency analysis. The processed data can be used to determine robot faults with machine learning algorithms such as SVM (Support Vector Machine) and KNN (K-Nearest Neighbor). As a result, a PHM environment was built on machine learning algorithms using SVM and KNN, confirming that failure prediction was partially possible.

Development of methodology for daily rainfall simulation considering distribution of rainfall events in each duration (강우사상의 지속기간별 분포 특성을 고려한 일강우 모의 기법 개발)

  • Jung, Jaewon;Kim, Soojun;Kim, Hung Soo
    • Journal of Korea Water Resources Association / v.52 no.2 / pp.141-148 / 2019
  • When simulating daily rainfall amounts with the existing Markov Chain model, it is common to simulate rainfall occurrence and then estimate the rainfall amount randomly, using Monte Carlo simulation, from a distribution similar to the daily rainfall distribution. This approach has the limitation that the characteristics of rainfall intensity and temporal distribution according to rainfall duration are not reflected in the results. In this study, rainfall events are classified as 1-day, 2-day, 3-day, and 4-day events, and the rainfall amount is estimated by rainfall duration. In other words, the distributions of the total rainfall amount of events for each duration are set using Kernel Density Estimation (KDE), and the daily rainfall on each day is estimated from the distribution for each duration. The total rainfall amount determined for each event is divided into daily rainfall amounts according to the daily distribution pattern of the observed rainfall event with the most similar rainfall amount, found using the k-Nearest Neighbor (KNN) algorithm. This study addresses the limitation of the existing rainfall estimation method, and the results are expected to be useful for future rainfall estimation and as primary data in water resource design.
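The disaggregation step described above, splitting a simulated event total into daily amounts using the pattern of the most similar observed event, can be sketched as follows. This is a deliberately reduced illustration with k = 1 and a hypothetical function name; the paper's actual KNN configuration and data handling are not given in the abstract and may differ.

```python
def disaggregate_total(simulated_total, observed_events):
    """Split a simulated event total into daily rainfall amounts.

    observed_events is a list of observed events of the same duration,
    each a list of daily amounts with a positive total. The observed
    event whose total is closest to simulated_total (the single nearest
    neighbor) supplies the daily proportions.
    """
    nearest = min(
        observed_events, key=lambda event: abs(sum(event) - simulated_total)
    )
    observed_total = sum(nearest)
    return [simulated_total * day / observed_total for day in nearest]
```

For example, with observed 2-day events of (10, 30) and (50, 50) mm, a simulated total of 44 mm is closest to the first event and inherits its 1:3 split between the two days.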