• Title/Summary/Keyword: k nearest neighbor method

Search Result 316, Processing Time 0.029 seconds

Classifying Cancer Using Partially Correlated Genes Selected by Forward Selection Method (전진선택법에 의해 선택된 부분 상관관계의 유전자들을 이용한 암 분류)

  • 유시호;조성배
    • Journal of the Institute of Electronics Engineers of Korea SP
    • /
    • v.41 no.3
    • /
    • pp.83-92
    • /
    • 2004
  • Gene expression profile is numerical data of gene expression level from organism measured on the microarray. Generally, each specific tissue indicates different expression levels in related genes, so that we can classify cancer with gene expression profile. Because not all the genes are related to classification, it is needed to select related genes that is called feature selection. This paper proposes a new gene selection method using forward selection method in regression analysis. This method reduces redundant information in the selected genes to have more efficient classification. We used k-nearest neighbor as a classifier and tested with colon cancer dataset. The results are compared with Pearson's coefficient and Spearman's coefficient methods and the proposed method showed better performance. It showed 90.3% accuracy in classification. The method also successfully applied to lymphoma cancer dataset.

Comparative Analysis of Machine Learning Models for Crop's yield Prediction

  • Babar, Zaheer Ud Din;UlAmin, Riaz;Sarwar, Muhammad Nabeel;Jabeen, Sidra;Abdullah, Muhammad
    • International Journal of Computer Science & Network Security
    • /
    • v.22 no.5
    • /
    • pp.330-334
    • /
    • 2022
  • In light of the decreasing crop production and shortage of food across the world, one of the crucial criteria of agriculture nowadays is selecting the right crop for the right piece of land at the right time. First problem is that How Farmers can predict the right crop for cultivation because famers have no knowledge about prediction of crop. Second problem is that which algorithm is best that provide the maximum accuracy for crop prediction. Therefore, in this research Author proposed a method that would help to select the most suitable crop(s) for a specific land based on the analysis of the affecting parameters (Temperature, Humidity, Soil Moisture) using machine learning. In this work, the author implemented Random Forest Classifier, Support Vector Machine, k-Nearest Neighbor, and Decision Tree for crop selection. The author trained these algorithms with the training dataset and later these algorithms were tested with the test dataset. The author compared the performances of all the tested methods to arrive at the best outcome. In this way best algorithm from the mention above is selected for crop prediction.

Emotion Recognition in Arabic Speech from Saudi Dialect Corpus Using Machine Learning and Deep Learning Algorithms

  • Hanaa Alamri;Hanan S. Alshanbari
    • International Journal of Computer Science & Network Security
    • /
    • v.23 no.8
    • /
    • pp.9-16
    • /
    • 2023
  • Speech can actively elicit feelings and attitudes by using words. It is important for researchers to identify the emotional content contained in speech signals as well as the sort of emotion that resulted from the speech that was made. In this study, we studied the emotion recognition system using a database in Arabic, especially in the Saudi dialect, the database is from a YouTube channel called Telfaz11, The four emotions that were examined were anger, happiness, sadness, and neutral. In our experiments, we extracted features from audio signals, such as Mel Frequency Cepstral Coefficient (MFCC) and Zero-Crossing Rate (ZCR), then we classified emotions using many classification algorithms such as machine learning algorithms (Support Vector Machine (SVM) and K-Nearest Neighbor (KNN)) and deep learning algorithms such as (Convolution Neural Network (CNN) and Long Short-Term Memory (LSTM)). Our Experiments showed that the MFCC feature extraction method and CNN model obtained the best accuracy result with 95%, proving the effectiveness of this classification system in recognizing Arabic spoken emotions.

Future flood frequency analysis from the heterogeneous impacts of Tropical Cyclone and non-Tropical Cyclone rainfalls in the Nam River Basin, South Korea

  • Alcantara, Angelika;Ahn, Kuk-Hyun
    • Proceedings of the Korea Water Resources Association Conference
    • /
    • 2021.06a
    • /
    • pp.139-139
    • /
    • 2021
  • Flooding events often result from extreme precipitations driven by various climate mechanisms, which are often disregarded in flood risk assessments. To bridge this gap, we propose a climate-mechanism-based flood frequency analysis that accommodates the direct linkage between the dominant climate processes and risk management decisions. Several statistical methods have been utilized in this approach including the Markov Chain analysis, K-nearest neighbor (KNN) resampling approach, and Z-score-based jittering method. After that, the impacts of climate change are associated with the modification of the transition matrix (TM) and the application of the quantile mapping approach. For this study, we have selected the Nam River Basin, South Korea, to consider the heterogeneous impacts of the two climate mechanisms, including the Tropical Cyclone (TC) and non-TCs. Based on our results, while both climate mechanisms have significant impacts on future flood extremes, TCs have been observed to bring more significant and immediate impacts on the flood extremes. The results in this study have proven that the proposed approach can lead to a new insights into future flooding management.

  • PDF

Identification of Pb-Zn ore under the condition of low count rate detection of slim hole based on PGNAA technology

  • Haolong Huang;Pingkun Cai;Wenbao Jia;Yan Zhang
    • Nuclear Engineering and Technology
    • /
    • v.55 no.5
    • /
    • pp.1708-1717
    • /
    • 2023
  • The grade analysis of lead-zinc ore is the basis for the optimal development and utilization of deposits. In this study, a method combining Prompt Gamma Neutron Activation Analysis (PGNAA) technology and machine learning is proposed for lead-zinc mine borehole logging, which can identify lead-zinc ores of different grades and gangue in the formation, providing real-time grade information qualitatively and semi-quantitatively. Firstly, Monte Carlo simulation is used to obtain a gamma-ray spectrum data set for training and testing machine learning classification algorithms. These spectra are broadened, normalized and separated into inelastic scattering and capture spectra, and then used to fit different classifier models. When the comprehensive grade boundary of high- and low-grade ores is set to 5%, the evaluation metrics calculated by the 5-fold cross-validation show that the SVM (Support Vector Machine), KNN (K-Nearest Neighbor), GNB (Gaussian Naive Bayes) and RF (Random Forest) models can effectively distinguish lead-zinc ore from gangue. At the same time, the GNB model has achieved the optimal accuracy of 91.45% when identifying high- and low-grade ores, and the F1 score for both types of ores is greater than 0.9.

Analyzing Key Variables in Network Attack Classification on NSL-KDD Dataset using SHAP (SHAP 기반 NSL-KDD 네트워크 공격 분류의 주요 변수 분석)

  • Sang-duk Lee;Dae-gyu Kim;Chang Soo Kim
    • Journal of the Society of Disaster Information
    • /
    • v.19 no.4
    • /
    • pp.924-935
    • /
    • 2023
  • Purpose: The central aim of this study is to leverage machine learning techniques for the classification of Intrusion Detection System (IDS) data, with a specific focus on identifying the variables responsible for enhancing overall performance. Method: First, we classified 'R2L(Remote to Local)' and 'U2R (User to Root)' attacks in the NSL-KDD dataset, which are difficult to detect due to class imbalance, using seven machine learning models, including Logistic Regression (LR) and K-Nearest Neighbor (KNN). Next, we use the SHapley Additive exPlanation (SHAP) for two classification models that showed high performance, Random Forest (RF) and Light Gradient-Boosting Machine (LGBM), to check the importance of variables that affect classification for each model. Result: In the case of RF, the 'service' variable and in the case of LGBM, the 'dst_host_srv_count' variable were confirmed to be the most important variables. These pivotal variables serve as key factors capable of enhancing performance in the context of classification for each respective model. Conclusion: In conclusion, this paper successfully identifies the optimal models, RF and LGBM, for classifying 'R2L' and 'U2R' attacks, while elucidating the crucial variables associated with each selected model.

Machine Learning-Based Rapid Prediction Method of Failure Mode for Reinforced Concrete Column (기계학습 기반 철근콘크리트 기둥에 대한 신속 파괴유형 예측 모델 개발 연구)

  • Kim, Subin;Oh, Keunyeong;Shin, Jiuk
    • Journal of the Earthquake Engineering Society of Korea
    • /
    • v.28 no.2
    • /
    • pp.113-119
    • /
    • 2024
  • Existing reinforced concrete buildings with seismically deficient column details affect the overall behavior depending on the failure type of column. This study aims to develop and validate a machine learning-based prediction model for the column failure modes (shear, flexure-shear, and flexure failure modes). For this purpose, artificial neural network (ANN), K-nearest neighbor (KNN), decision tree (DT), and random forest (RF) models were used, considering previously collected experimental data. Using four machine learning methodologies, we developed a classification learning model that can predict the column failure modes in terms of the input variables using concrete compressive strength, steel yield strength, axial load ratio, height-to-dept aspect ratio, longitudinal reinforcement ratio, and transverse reinforcement ratio. The performance of each machine learning model was compared and verified by calculating accuracy, precision, recall, F1-Score, and ROC. Based on the performance measurements of the classification model, the RF model represents the highest average value of the classification model performance measurements among the considered learning methods, and it can conservatively predict the shear failure mode. Thus, the RF model can rapidly predict the column failure modes with simple column details.

Ultra-Light-Weight Automotive Intrusion Detection System Using Random Sample Consensus (랜덤 샘플 합의를 사용한 초경량 차량용 침입 탐지 시스템)

  • Jonggwon Kim;Hyungchul Im;Joosock Lee;Seongsoo Lee
    • Journal of IKEEE
    • /
    • v.28 no.3
    • /
    • pp.412-418
    • /
    • 2024
  • This paper proposes an effective method for detecting hacking attacks in automotive CAN bus using the RANSAC (Random Sample Consensus) algorithm. Conventional deep learning-based detection techniques are difficult to be applied to resource-constrained environments such as vehicles. In this paper, the attack detection performance in vehicular CAN communication has been improved by utilizing the lightweight nature and efficiency of the RANSAC algorithm. The RANSAC algorithm can perform effective detection with minimal computational resources, providing a practical hacking detection solution for vehicles.

An Analysis of Accuracy for Submarine Topographic Information by Interpolation Method (보간기법에 따른 해저지형의 정확도 분석)

  • Kim Ga-Ya;Moon Doo-Youl;Seo Dong-Ju
    • Journal of Ocean Engineering and Technology
    • /
    • v.20 no.3 s.70
    • /
    • pp.67-76
    • /
    • 2006
  • Three-dimensional information of submarine topography was acquired by assembling DGPS and Echo Sounder, which is mainly used in the marine survey. However, the features of submarine topography, derived according to mechanical data, were confirmed using human eyes. Because the dredging capacity using a submarine surveying data influences harbor public affairs, analysis and the process method of surveying data is a very special element in construction costs. In this study, information on submarine topography is acquired by assembling DGPS and Echo Sounder. Moreover, the dredging capacity in harbor public affairs has been analyzed by the interpolation method: inverse distance to a power, kriging, minimum curvature, nearest neighbor, and radial basis function. Also, utilization of DGPS and Echo Sounder method in calculation of the dredging capacity have been confirmed by comparing and analyzing the dredging capacity and the actual one, as per each interpolation. According to this comparison result, in the case of applying Radial basis function interpolation and Kriging, 3.94 % and 4.61 % of error rates have been shown, respectively. In the case of the study for application of the proper interpolation, as per characteristics of submarine topography, is preceded in calculation of the dredging capacity relevant to harbor public affairs, it is expected that more speedy and correct calculation for the dredging capacity can be made.

Detection of E.coli biofilms with hyperspectral imaging and machine learning techniques

  • Lee, Ahyeong;Seo, Youngwook;Lim, Jongguk;Park, Saetbyeol;Yoo, Jinyoung;Kim, Balgeum;Kim, Giyoung
    • Korean Journal of Agricultural Science
    • /
    • v.47 no.3
    • /
    • pp.645-655
    • /
    • 2020
  • Bacteria are a very common cause of food poisoning. Moreover, bacteria form biofilms to protect themselves from harsh environments. Conventional detection methods for foodborne bacterial pathogens including the plate count method, enzyme-linked immunosorbent assays (ELISA), and polymerase chain reaction (PCR) assays require a lot of time and effort. Hyperspectral imaging has been used for food safety because of its non-destructive and real-time detection capability. This study assessed the feasibility of using hyperspectral imaging and machine learning techniques to detect biofilms formed by Escherichia coli. E. coli was cultured on a high-density polyethylene (HDPE) coupon, which is a main material of food processing facilities. Hyperspectral fluorescence images were acquired from 420 to 730 nm and analyzed by a single wavelength method and machine learning techniques to determine whether an E. coli culture was present. The prediction accuracy of a biofilm by the single wavelength method was 84.69%. The prediction accuracy by the machine learning techniques were 87.49, 91.16, 86.61, and 86.80% for decision tree (DT), k-nearest neighbor (k-NN), linear discriminant analysis (LDA), and partial least squares-discriminant analysis (PLS-DA), respectively. This result shows the possibility of using machine learning techniques, especially the k-NN model, to effectively detect bacterial pathogens and confirm food poisoning through hyperspectral images.