• Title/Abstract/Keywords: k-Nearest Neighbor Method

Comparative Analysis of Machine Learning Models for Crop's yield Prediction

  • Babar, Zaheer Ud Din;UlAmin, Riaz;Sarwar, Muhammad Nabeel;Jabeen, Sidra;Abdullah, Muhammad
    • International Journal of Computer Science & Network Security
    • /
    • Vol. 22, No. 5
    • /
    • pp.330-334
    • /
    • 2022
  • In light of decreasing crop production and food shortages across the world, one of the crucial tasks in agriculture today is selecting the right crop for the right piece of land at the right time. The first problem is how farmers can predict the right crop for cultivation, because farmers have no knowledge of crop prediction. The second problem is which algorithm provides the maximum accuracy for crop prediction. Therefore, in this research the authors propose a method that helps select the most suitable crop(s) for a specific piece of land based on an analysis of the affecting parameters (temperature, humidity, soil moisture) using machine learning. In this work, the authors implemented a Random Forest Classifier, Support Vector Machine, k-Nearest Neighbor, and Decision Tree for crop selection. The authors trained these algorithms on the training dataset and later tested them on the test dataset. The authors compared the performance of all the tested methods to arrive at the best outcome. In this way, the best of the algorithms mentioned above is selected for crop prediction.
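The k-Nearest Neighbor step named in the abstract above can be illustrated with a minimal sketch. The sample values, crop labels, and the function name `knn_predict` are hypothetical and are not taken from the paper; the sketch only shows majority voting among the k closest training points in (temperature, humidity, soil moisture) space.

```python
import math
from collections import Counter


def knn_predict(train, query, k=3):
    """Return the majority label among the k training points nearest to query."""
    # Sort training samples by Euclidean distance to the query point.
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical samples: (temperature in C, humidity in %, soil moisture in %) -> crop.
train = [
    ((30.0, 80.0, 60.0), "rice"),
    ((29.0, 85.0, 65.0), "rice"),
    ((31.0, 78.0, 62.0), "rice"),
    ((18.0, 40.0, 25.0), "wheat"),
    ((20.0, 45.0, 30.0), "wheat"),
    ((17.0, 38.0, 22.0), "wheat"),
]

print(knn_predict(train, (28.0, 82.0, 61.0)))
```

In practice the features would usually be scaled to comparable ranges before computing distances, since an unscaled feature with a large range can dominate the Euclidean distance.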

Emotion Recognition in Arabic Speech from Saudi Dialect Corpus Using Machine Learning and Deep Learning Algorithms

  • Hanaa Alamri;Hanan S. Alshanbari
    • International Journal of Computer Science & Network Security
    • /
    • Vol. 23, No. 8
    • /
    • pp.9-16
    • /
    • 2023
  • Speech can actively elicit feelings and attitudes through words. It is important for researchers to identify the emotional content contained in speech signals as well as the kind of emotion that the speech produced. In this study, we studied an emotion recognition system using a database in Arabic, specifically in the Saudi dialect; the database is from a YouTube channel called Telfaz11. The four emotions examined were anger, happiness, sadness, and neutral. In our experiments, we extracted features from audio signals, such as the Mel Frequency Cepstral Coefficient (MFCC) and Zero-Crossing Rate (ZCR), and then classified emotions using several classification algorithms: machine learning algorithms (Support Vector Machine (SVM) and K-Nearest Neighbor (KNN)) and deep learning algorithms (Convolution Neural Network (CNN) and Long Short-Term Memory (LSTM)). Our experiments showed that the MFCC feature extraction method and the CNN model obtained the best accuracy, 95%, demonstrating the effectiveness of this classification system in recognizing emotions in spoken Arabic.
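Of the two features named in the abstract above, the Zero-Crossing Rate is simple enough to sketch directly. The function below is an illustrative definition (the fraction of adjacent sample pairs whose signs differ), not the authors' implementation; the frame values are made up, and treating zero as non-negative is an assumption of this sketch.

```python
def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs in the frame whose signs differ."""
    if len(frame) < 2:
        return 0.0
    # Treat zero as non-negative; a crossing occurs when the sign flag changes.
    crossings = sum(
        1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0)
    )
    return crossings / (len(frame) - 1)


# Hypothetical audio frames: one rapidly alternating, one with a single sign change.
noisy_frame = [1.0, -1.0, 1.0, -1.0]
smooth_frame = [1.0, 1.0, -1.0, -1.0]
print(zero_crossing_rate(noisy_frame), zero_crossing_rate(smooth_frame))
```

A high rate tends to indicate noisy or unvoiced segments, while a low rate tends to indicate voiced speech, which is one reason the feature is often paired with spectral features such as MFCC.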

Future flood frequency analysis from the heterogeneous impacts of Tropical Cyclone and non-Tropical Cyclone rainfalls in the Nam River Basin, South Korea

  • Alcantara, Angelika;Ahn, Kuk-Hyun
    • 한국수자원학회:학술대회논문집
    • /
    • 한국수자원학회 2021년도 학술발표회
    • /
    • pp.139-139
    • /
    • 2021
  • Flooding events often result from extreme precipitation driven by various climate mechanisms, which are often disregarded in flood risk assessments. To bridge this gap, we propose a climate-mechanism-based flood frequency analysis that accommodates the direct linkage between the dominant climate processes and risk management decisions. Several statistical methods are used in this approach, including Markov Chain analysis, a K-nearest neighbor (KNN) resampling approach, and a Z-score-based jittering method. The impacts of climate change are then incorporated by modifying the transition matrix (TM) and applying the quantile mapping approach. For this study, we selected the Nam River Basin, South Korea, to consider the heterogeneous impacts of two climate mechanisms: Tropical Cyclones (TCs) and non-TCs. Based on our results, while both climate mechanisms have significant impacts on future flood extremes, TCs bring more significant and immediate impacts on the flood extremes. The results of this study show that the proposed approach can lead to new insights into future flood management.

Identification of Pb-Zn ore under the condition of low count rate detection of slim hole based on PGNAA technology

  • Haolong Huang;Pingkun Cai;Wenbao Jia;Yan Zhang
    • Nuclear Engineering and Technology
    • /
    • Vol. 55, No. 5
    • /
    • pp.1708-1717
    • /
    • 2023
  • The grade analysis of lead-zinc ore is the basis for the optimal development and utilization of deposits. In this study, a method combining Prompt Gamma Neutron Activation Analysis (PGNAA) technology and machine learning is proposed for lead-zinc mine borehole logging, which can identify lead-zinc ores of different grades and gangue in the formation, providing real-time grade information qualitatively and semi-quantitatively. Firstly, Monte Carlo simulation is used to obtain a gamma-ray spectrum data set for training and testing machine learning classification algorithms. These spectra are broadened, normalized and separated into inelastic scattering and capture spectra, and then used to fit different classifier models. When the comprehensive grade boundary of high- and low-grade ores is set to 5%, the evaluation metrics calculated by the 5-fold cross-validation show that the SVM (Support Vector Machine), KNN (K-Nearest Neighbor), GNB (Gaussian Naive Bayes) and RF (Random Forest) models can effectively distinguish lead-zinc ore from gangue. At the same time, the GNB model has achieved the optimal accuracy of 91.45% when identifying high- and low-grade ores, and the F1 score for both types of ores is greater than 0.9.
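The 5-fold cross-validation mentioned in the abstract above can be sketched as an index-splitting routine. This is a generic illustration, not the paper's code: the function name `k_fold_indices` is hypothetical, the folds are contiguous, and a real evaluation would normally shuffle (and often stratify) the samples first.

```python
def k_fold_indices(n_samples, k=5):
    """Yield (train_idx, test_idx) pairs for k contiguous folds."""
    # Spread any remainder over the first folds so sizes differ by at most one.
    fold_sizes = [
        n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)
    ]
    start = 0
    for size in fold_sizes:
        stop = start + size
        test_idx = list(range(start, stop))
        train_idx = [i for i in range(n_samples) if i < start or i >= stop]
        yield train_idx, test_idx
        start = stop


# Each sample is used for testing exactly once across the five folds.
for train_idx, test_idx in k_fold_indices(12, k=5):
    print(len(train_idx), test_idx)
```

Each classifier would be fitted on `train_idx` and scored on `test_idx`, and the reported metric is typically the mean over the folds.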

SHAP 기반 NSL-KDD 네트워크 공격 분류의 주요 변수 분석 (Analyzing Key Variables in Network Attack Classification on NSL-KDD Dataset using SHAP)

  • 이상덕;김대규;김창수
    • 한국재난정보학회 논문집
    • /
    • Vol. 19, No. 4
    • /
    • pp.924-935
    • /
    • 2023
  • Purpose: The central aim of this study is to leverage machine learning techniques for the classification of Intrusion Detection System (IDS) data, with a specific focus on identifying the variables responsible for enhancing overall performance. Method: First, we classified 'R2L(Remote to Local)' and 'U2R (User to Root)' attacks in the NSL-KDD dataset, which are difficult to detect due to class imbalance, using seven machine learning models, including Logistic Regression (LR) and K-Nearest Neighbor (KNN). Next, we use the SHapley Additive exPlanation (SHAP) for two classification models that showed high performance, Random Forest (RF) and Light Gradient-Boosting Machine (LGBM), to check the importance of variables that affect classification for each model. Result: In the case of RF, the 'service' variable and in the case of LGBM, the 'dst_host_srv_count' variable were confirmed to be the most important variables. These pivotal variables serve as key factors capable of enhancing performance in the context of classification for each respective model. Conclusion: In conclusion, this paper successfully identifies the optimal models, RF and LGBM, for classifying 'R2L' and 'U2R' attacks, while elucidating the crucial variables associated with each selected model.
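The idea behind the variable importances in the abstract above is the Shapley value: a feature's average marginal contribution over all orderings in which features can be added. The sketch below computes exact Shapley values for a toy additive value function; it does not use the SHAP library, and the weights and the third feature name (`duration`) are hypothetical illustrations rather than results from the paper.

```python
from itertools import permutations


def shapley_values(value_fn, features):
    """Exact Shapley values: mean marginal contribution over all orderings."""
    totals = {f: 0.0 for f in features}
    orders = list(permutations(features))
    for order in orders:
        present = set()
        for f in order:
            before = value_fn(frozenset(present))
            present.add(f)
            totals[f] += value_fn(frozenset(present)) - before
    return {f: total / len(orders) for f, total in totals.items()}


# Hypothetical additive "model output" for a coalition of known features.
weights = {"service": 0.5, "dst_host_srv_count": 0.25, "duration": 0.125}


def toy_value(coalition):
    return sum(weights[f] for f in coalition)


print(shapley_values(toy_value, list(weights)))
```

For an additive function each feature's Shapley value equals its own weight, which makes the toy case easy to check by hand. Exact enumeration grows factorially with the number of features, which is why practical tools rely on approximations or model-specific algorithms.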

기계학습 기반 철근콘크리트 기둥에 대한 신속 파괴유형 예측 모델 개발 연구 (Machine Learning-Based Rapid Prediction Method of Failure Mode for Reinforced Concrete Column)

  • 김수빈;오근영;신지욱
    • 한국지진공학회논문집
    • /
    • Vol. 28, No. 2
    • /
    • pp.113-119
    • /
    • 2024
  • Existing reinforced concrete buildings with seismically deficient column details affect the overall behavior depending on the failure type of the column. This study aims to develop and validate a machine learning-based prediction model for the column failure modes (shear, flexure-shear, and flexure failure modes). For this purpose, artificial neural network (ANN), K-nearest neighbor (KNN), decision tree (DT), and random forest (RF) models were used, considering previously collected experimental data. Using these four machine learning methodologies, we developed a classification model that predicts the column failure mode from the following input variables: concrete compressive strength, steel yield strength, axial load ratio, height-to-depth aspect ratio, longitudinal reinforcement ratio, and transverse reinforcement ratio. The performance of each machine learning model was compared and verified by calculating accuracy, precision, recall, F1-Score, and ROC. Among the considered learning methods, the RF model shows the highest average value across the classification performance measures, and it can conservatively predict the shear failure mode. Thus, the RF model can rapidly predict the column failure modes from simple column details.
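Three of the performance measures listed in the abstract above (precision, recall, and F1-Score) can be computed from prediction counts as sketched below. The labels and predictions are invented for illustration and the function name is hypothetical; the sketch treats one class as positive, as in a one-vs-rest evaluation.

```python
def precision_recall_f1(y_true, y_pred, positive):
    """Precision, recall, and F1 for one class treated as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 is the harmonic mean of precision and recall.
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical failure-mode labels and predictions.
y_true = ["shear", "shear", "flexure", "flexure", "shear"]
y_pred = ["shear", "flexure", "flexure", "shear", "shear"]
print(precision_recall_f1(y_true, y_pred, positive="shear"))
```

With three failure modes, the per-class scores would be averaged (for example, as a macro average) to obtain a single figure per model.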

보간기법에 따른 해저지형의 정확도 분석 (An Analysis of Accuracy for Submarine Topographic Information by Interpolation Method)

  • 김가야;문두열;서동주
    • 한국해양공학회지
    • /
    • Vol. 20, No. 3
    • /
    • pp.67-76
    • /
    • 2006
  • Three-dimensional information on submarine topography has been acquired by combining DGPS with an Echo Sounder, an approach mainly used in marine surveys. However, the features of submarine topography derived from the instrument data have been confirmed by eye. Because the dredging capacity computed from submarine survey data influences harbor public affairs, the analysis and processing method for the survey data is an important factor in construction costs. In this study, information on submarine topography is acquired by combining DGPS and an Echo Sounder. The dredging capacity in harbor public affairs is then analyzed with five interpolation methods: inverse distance to a power, kriging, minimum curvature, nearest neighbor, and radial basis function. The usefulness of the DGPS and Echo Sounder method for calculating the dredging capacity is also confirmed by comparing the dredging capacity obtained with each interpolation method against the actual one. According to this comparison, Radial basis function interpolation and Kriging showed error rates of 3.94% and 4.61%, respectively. If a study on applying the appropriate interpolation method for the characteristics of the submarine topography precedes the calculation of the dredging capacity for harbor public affairs, faster and more accurate calculation of the dredging capacity can be expected.
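Two of the interpolation methods named in the abstract above, nearest neighbor and inverse distance to a power, can be sketched in a few lines. The soundings below are hypothetical (x, y, depth) triples, and the function names and the default power of 2 are assumptions of this sketch rather than settings from the paper.

```python
import math


def nearest_neighbor(points, x, y):
    """Return the depth of the sounding closest to (x, y)."""
    return min(points, key=lambda p: math.hypot(p[0] - x, p[1] - y))[2]


def inverse_distance(points, x, y, power=2.0):
    """Weighted mean of depths with weights 1 / distance**power."""
    num = den = 0.0
    for px, py, depth in points:
        d = math.hypot(px - x, py - y)
        if d == 0.0:
            return depth  # query coincides with a sounding
        w = 1.0 / d ** power
        num += w * depth
        den += w
    return num / den


# Hypothetical soundings: (x in m, y in m, depth in m).
soundings = [(0.0, 0.0, 10.0), (2.0, 0.0, 20.0)]
print(nearest_neighbor(soundings, 0.5, 0.0), inverse_distance(soundings, 1.0, 0.0))
```

Nearest neighbor produces a piecewise-constant surface, while inverse distance weighting produces a smooth surface that passes through the soundings; differences like these are what lead to different dredging-volume estimates per method.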

Detection of E.coli biofilms with hyperspectral imaging and machine learning techniques

  • Lee, Ahyeong;Seo, Youngwook;Lim, Jongguk;Park, Saetbyeol;Yoo, Jinyoung;Kim, Balgeum;Kim, Giyoung
    • 농업과학연구
    • /
    • Vol. 47, No. 3
    • /
    • pp.645-655
    • /
    • 2020
  • Bacteria are a very common cause of food poisoning. Moreover, bacteria form biofilms to protect themselves from harsh environments. Conventional detection methods for foodborne bacterial pathogens, including the plate count method, enzyme-linked immunosorbent assays (ELISA), and polymerase chain reaction (PCR) assays, require a lot of time and effort. Hyperspectral imaging has been used for food safety because of its non-destructive and real-time detection capability. This study assessed the feasibility of using hyperspectral imaging and machine learning techniques to detect biofilms formed by Escherichia coli. E. coli was cultured on a high-density polyethylene (HDPE) coupon, which is a main material of food processing facilities. Hyperspectral fluorescence images were acquired from 420 to 730 nm and analyzed by a single wavelength method and machine learning techniques to determine whether an E. coli culture was present. The biofilm prediction accuracy of the single wavelength method was 84.69%. The prediction accuracies of the machine learning techniques were 87.49, 91.16, 86.61, and 86.80% for decision tree (DT), k-nearest neighbor (k-NN), linear discriminant analysis (LDA), and partial least squares-discriminant analysis (PLS-DA), respectively. This result shows the possibility of using machine learning techniques, especially the k-NN model, to effectively detect bacterial pathogens and confirm food poisoning through hyperspectral images.

Fast Search with Data-Oriented Multi-Index Hashing for Multimedia Data

  • Ma, Yanping;Zou, Hailin;Xie, Hongtao;Su, Qingtang
    • KSII Transactions on Internet and Information Systems (TIIS)
    • /
    • Vol. 9, No. 7
    • /
    • pp.2599-2613
    • /
    • 2015
  • Multi-index hashing (MIH) is the state-of-the-art method for indexing binary codes, as it divides long codes into substrings and builds multiple hash tables. However, MIH assumes that the dataset codes are uniformly distributed, and it loses efficiency when dealing with non-uniformly distributed codes. Besides, many results share the same Hamming distance to a query, which makes the distance measure ambiguous. In this paper, we propose a data-oriented multi-index hashing method (DOMIH). We first compute the covariance matrix of bits and learn an adaptive projection vector for each binary substring. Instead of using substrings as direct indices into hash tables, we project them with the corresponding projection vectors to generate new indices. With adaptive projection, the indices in each hash table are nearly uniformly distributed. Then, with the covariance matrix, we propose a ranking method for the binary codes. By assigning different bit-level weights to different bits, the returned binary codes are ranked at a finer-grained binary code level. Experiments conducted on reference large-scale datasets show that, compared to MIH, the time performance of DOMIH can be improved by 36.9%-87.4%, and the search accuracy can be improved by 22.2%. To pinpoint the potential of DOMIH, we further use near-duplicate image retrieval as an example to show the applications and the good performance of our method.
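The baseline MIH idea described in the abstract above (split each code into substrings, index each substring in its own hash table) can be sketched as follows. This illustrates plain MIH only, not DOMIH's adaptive projection or ranking. It relies on the pigeonhole argument that two codes within Hamming distance r, split into m > r substrings, must agree exactly on at least one substring; the function names are hypothetical, and `bits` is assumed to be divisible by `m`.

```python
from collections import defaultdict


def split(code, bits, m):
    """Split a `bits`-bit integer code into m equal-width substrings."""
    width = bits // m
    mask = (1 << width) - 1
    return [(code >> (i * width)) & mask for i in range(m)]


def build_tables(codes, bits, m):
    """Build one hash table per substring position, mapping substring -> code indices."""
    tables = [defaultdict(list) for _ in range(m)]
    for idx, code in enumerate(codes):
        for table, sub in zip(tables, split(code, bits, m)):
            table[sub].append(idx)
    return tables


def search(codes, tables, query, bits, m, r):
    """Return indices of codes within Hamming distance r of query (needs r < m)."""
    if r >= m:
        raise ValueError("this sketch only handles r < m (exact substring probes)")
    candidates = set()
    for table, sub in zip(tables, split(query, bits, m)):
        candidates.update(table.get(sub, []))
    # Verify candidates with the full Hamming distance.
    return sorted(i for i in candidates if bin(codes[i] ^ query).count("1") <= r)


codes = [0b00000000, 0b00000001, 0b11111111, 0b00010001]
tables = build_tables(codes, bits=8, m=4)
print(search(codes, tables, 0b00000000, bits=8, m=4, r=2))
```

When codes cluster in a few buckets, the candidate sets grow and the probes become expensive, which is the non-uniformity problem the abstract says DOMIH addresses.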

Okapi BM25 단어 가중치법 적용을 통한 문서 범주화의 성능 향상 (A Research on Enhancement of Text Categorization Performance by using Okapi BM25 Word Weight Method)

  • 이용훈;이상범
    • 한국산학기술학회논문지
    • /
    • Vol. 11, No. 12
    • /
    • pp.5089-5096
    • /
    • 2010
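  • Text categorization is one of the important functions of an information retrieval system and refers to grouping documents according to some criterion. The general approach to categorization is to extract important words from the target documents, assign weights to them, and then classify the documents with a classification algorithm. Performance and accuracy are therefore determined by the classification algorithm, so the efficiency of the algorithm is important. This paper presents an improvement in document classification performance obtained by improving the word weight calculation method. The Okapi BM25 word weighting method is used in general information retrieval, where it gives good search results. We applied it to text categorization and tested whether it performs well there too. The word weighting methods compared against it were TF-IDF, the most common method; TF-ICF, a weighting method optimized for document classification; and TF-ISF, which is widely used in document summarization; results were measured for all four weighting methods. The Reuter-21578 collection was used as the experimental document set, and the Support Vector Machine (SVM) and K-Nearest Neighbor (KNN) algorithms were used as classifiers. Among the weighting methods used, Okapi BM25 showed the best performance.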
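The Okapi BM25 term weight referred to in the abstract above can be sketched with one commonly used form of the formula. The exact IDF variant and the parameters k1 = 1.2 and b = 0.75 are assumptions here (they are widely used defaults, and the abstract does not state the paper's settings); the function name is hypothetical.

```python
import math


def bm25_weight(tf, doc_len, avg_doc_len, n_docs, doc_freq, k1=1.2, b=0.75):
    """BM25 weight of one term in one document (one common variant)."""
    # Smoothed inverse document frequency; the +1 keeps the value non-negative.
    idf = math.log((n_docs - doc_freq + 0.5) / (doc_freq + 0.5) + 1.0)
    # Term-frequency saturation with document-length normalization.
    norm = tf + k1 * (1.0 - b + b * doc_len / avg_doc_len)
    return idf * tf * (k1 + 1.0) / norm


# Hypothetical counts: a term appearing 3 times in a 120-word document.
print(bm25_weight(tf=3, doc_len=120, avg_doc_len=100, n_docs=1000, doc_freq=10))
```

Unlike plain TF-IDF, the weight saturates as the term frequency grows and is reduced for documents longer than average, which is the behavior that distinguishes BM25 from the other weighting schemes compared in the paper.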