• Title/Abstract/Keywords: K-Nearest Neighbor

650 search results

Discriminating Eggs from Two Local Breeds Based on Fatty Acid Profile and Flavor Characteristics Combined with Classification Algorithms

  • Dong, Xiao-Guang;Gao, Li-Bing;Zhang, Hai-Jun;Wang, Jing;Qiu, Kai;Qi, Guang-Hai;Wu, Shu-Geng
    • 한국축산식품학회지 / Vol. 41, No. 6 / pp. 936-949 / 2021
  • This study compared the fatty acid profiles and flavor characteristics of eggs from Beijing You Chicken (BYC), a valuable local breed, and Dwarf Beijing You Chicken (DBYC) to identify differences between BYC and DBYC eggs. Four classification algorithms were used to build classification models. Gas chromatography-mass spectrometry (GC-MS) showed significant differences (p<0.05) in arachidic acid, oleic acid (OA), eicosatrienoic acid, docosapentaenoic acid (DPA), hexadecenoic acid, monounsaturated fatty acids (MUFA), polyunsaturated fatty acids (PUFA), unsaturated fatty acids (UFA), and 35 volatile compounds. For the fatty acid data, k-nearest neighbor (KNN) and support vector machine (SVM) achieved 91.7% classification accuracy. The SPME-GC-MS data did not yield successful classification models. For the electronic nose data, KNN, linear discriminant analysis (LDA), SVM, and decision tree all achieved 100% classification accuracy. Overall, the results indicate that BYC and DBYC eggs can be discriminated using an electronic nose combined with suitable classification algorithms. This research compared the fatty acid profiles and volatile compounds of different egg yolks, and the results could be applied to evaluating egg nutrition and distinguishing avian eggs.
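
Every entry in this listing relies on k-nearest neighbor (KNN) classification or matching. As a minimal sketch of the basic rule, assuming plain Euclidean distance on numeric features that are already on comparable scales and a simple majority vote, a query sample can be labeled as follows. The feature values and the "BYC"/"DBYC" labels are hypothetical toy data, not measurements from the study, and the abstract does not state the authors' preprocessing or choice of k.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, query, k=3):
    """Label `query` by majority vote among its k nearest training samples."""
    # Distance from the query to every training sample, paired with its label.
    ranked = sorted(
        (math.dist(x, query), label) for x, label in zip(train_X, train_y)
    )
    # Count labels among the k closest samples and return the most frequent one.
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Hypothetical two-feature toy data (not from the study).
train_X = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (1.0, 1.0), (0.9, 1.1), (1.1, 0.9)]
train_y = ["BYC", "BYC", "BYC", "DBYC", "DBYC", "DBYC"]
print(knn_predict(train_X, train_y, (0.05, 0.1)))
```

An odd k avoids two-way voting ties; when a tie does occur, `Counter.most_common` keeps first-encountered order, so the label of the nearest tied neighbor wins.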

환자 IQR 이상치와 상관계수 기반의 머신러닝 모델을 이용한 당뇨병 예측 메커니즘 (Diabetes prediction mechanism using machine learning model based on patient IQR outlier and correlation coefficient)

  • 정주호;이나은;김수민;서가은;오하영
    • 한국정보통신학회논문지 / Vol. 25, No. 10 / pp. 1296-1301 / 2021
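  • As the incidence of diabetes has recently increased worldwide, research on predicting diabetes with various machine learning and deep learning techniques has continued. This study presents a model that predicts diabetes by applying machine learning techniques to data from Frankfurt Hospital in Germany. Outliers are handled with the IQR (interquartile range) method, Pearson correlation analysis is applied, and the diabetes prediction performance of Decision Tree, Random Forest, KNN, SVM, and the ensemble techniques XGBoost, Voting, and Stacking is compared. The Stacking ensemble performed best, with an accuracy of 98.75%. The study is therefore significant in that this model can be used to accurately predict and prevent diabetes, which is widespread in modern society.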
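
The abstract names two preprocessing steps, IQR-based outlier handling and Pearson correlation analysis. A minimal sketch of both is shown below, assuming Tukey's conventional 1.5 × IQR fences (the abstract does not give the factor) and rows stored as dictionaries; the column name `glucose` and its values are hypothetical. `statistics.quantiles` uses its default "exclusive" method, so quartiles may differ slightly from other tools.

```python
import statistics


def iqr_fences(values, factor=1.5):
    """Return Tukey fences (Q1 - factor*IQR, Q3 + factor*IQR)."""
    q1, _, q3 = statistics.quantiles(values, n=4)  # quartile cut points
    iqr = q3 - q1
    return q1 - factor * iqr, q3 + factor * iqr


def drop_iqr_outliers(rows, column, factor=1.5):
    """Keep only rows whose `column` value lies inside the IQR fences."""
    lo, hi = iqr_fences([row[column] for row in rows], factor)
    return [row for row in rows if lo <= row[column] <= hi]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)


# Hypothetical rows; 100 is an obvious outlier in the `glucose` column.
rows = [{"glucose": g} for g in [10, 12, 11, 13, 12, 11, 100]]
cleaned = drop_iqr_outliers(rows, "glucose")
print(len(cleaned))
```

For feature selection, `pearson` would typically be computed between each feature column and the outcome column, keeping features above some chosen |r| threshold; the abstract does not say which threshold the authors used.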

머신러닝 기반의 수도권 지역 고령운전자 차대사람 사고심각도 분류 연구 (Classifying Severity of Senior Driver Accidents In Capital Regions Based on Machine Learning Algorithms)

  • 김승훈;임영빈;김기정
    • 디지털융복합연구 / Vol. 19, No. 4 / pp. 25-31 / 2021
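  • As society ages, the number of older drivers is also increasing, and interest in the severity of the traffic accidents they cause is growing. Because a model for predicting the severity of accidents caused by older drivers is increasingly needed, this study uses machine learning techniques to build and analyze models that predict the severity of vehicle-to-pedestrian accidents caused by older drivers. To this end, prediction models were developed with four machine learning algorithms (logistic model, KNN, RF, SVM), and their results were compared. The logistic and SVM models showed relatively high predictive power, while RF was highest in terms of accuracy. In addition, cross-tabulation analyses were performed on each important variable and the results are presented. The results of this study are expected to inform safety policies and infrastructure development aimed at preventing severe accidents by older drivers in an aging society.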

Improved LTE Fingerprint Positioning Through Clustering-based Repeater Detection and Outlier Removal

  • Kwon, Jae Uk;Chae, Myeong Seok;Cho, Seong Yun
    • Journal of Positioning, Navigation, and Timing / Vol. 11, No. 4 / pp. 369-379 / 2022
  • In the positioning step of weighted k-nearest neighbor (WkNN)-based fingerprinting, the requested positioning signal is compared with the signal information stored in the fingerprint DB for each reference point. The more base station identifiers match, the more likely the terminal is at the corresponding location; accordingly, an additional weight is assigned to each location in proportion to the number of matching base stations. Conversely, when only a few base stations match, the selection of candidate reference points depends heavily on the signal similarity value. This creates a problem: the positioning signal may be matched against a repeater signal stored in the DB, and the corresponding reference point may then be selected as a candidate location. Such a reference point is likely to be an outlier, and applying weight to it increases the error of the estimated location. To solve this problem, this paper proposes a WkNN technique with an outlier removal function. First, the technique determines whether a repeater signal is included in the DB information of the matched base station. If a reference point associated with a repeater signal is selected as a candidate position, the reference position corresponding to the outlier is removed using a clustering technique. The performance of the proposed technique is verified with data acquired in Seocho 1-dong and 2-dong, Seoul.
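
WkNN fingerprint matching can be sketched as follows under several stated assumptions. Each DB entry is a hypothetical `((x, y), {cell_id: rssi_dbm})` pair; the signal distance is Euclidean over the base stations shared by the measurement and the reference point; and each neighbor's weight is its matched-base-station count divided by its signal distance, an assumed form of the count-proportional weight the abstract mentions but does not specify. The optional `max_spread` filter, which drops candidates far from the median candidate position, is a simplified stand-in for the paper's clustering-based outlier removal, not the authors' method.

```python
import math
import statistics


def wknn_position(fingerprint_db, measurement, k=3, eps=1e-6, max_spread=None):
    """Estimate (x, y) by weighted k-nearest-neighbor fingerprint matching.

    fingerprint_db: list of ((x, y), {cell_id: rssi_dbm}) reference points
    measurement:    {cell_id: rssi_dbm} reported by the terminal
    """
    candidates = []
    for pos, signals in fingerprint_db:
        common = signals.keys() & measurement.keys()
        if not common:
            continue  # no shared base station, so signals cannot be compared
        # Euclidean RSSI distance over the matched base stations only.
        d = math.sqrt(sum((signals[c] - measurement[c]) ** 2 for c in common))
        candidates.append((d, len(common), pos))
    if not candidates:
        raise ValueError("no reference point shares a base station with the measurement")

    # Smallest signal distance first; ties go to more matched base stations.
    candidates.sort(key=lambda c: (c[0], -c[1]))
    nearest = candidates[:k]

    if max_spread is not None:
        # Simplified stand-in for clustering-based outlier removal:
        # drop candidates far from the median candidate position.
        cx = statistics.median(pos[0] for _, _, pos in nearest)
        cy = statistics.median(pos[1] for _, _, pos in nearest)
        kept = [c for c in nearest if math.dist(c[2], (cx, cy)) <= max_spread]
        nearest = kept or nearest  # keep the original set if everything was dropped

    # Assumed weight: matched-station count divided by signal distance.
    weights = [m / (d + eps) for d, m, _ in nearest]
    total = sum(weights)
    x = sum(w * pos[0] for w, (_, _, pos) in zip(weights, nearest)) / total
    y = sum(w * pos[1] for w, (_, _, pos) in zip(weights, nearest)) / total
    return x, y


db = [
    ((0.0, 0.0), {"A": -60, "B": -70}),
    ((10.0, 0.0), {"A": -80, "B": -60}),
    ((0.0, 10.0), {"A": -65, "B": -75}),
    ((500.0, 500.0), {"A": -61}),  # hypothetical repeater-like entry far away
]
meas = {"A": -61, "B": -71}
print(wknn_position(db, meas))                   # dominated by the repeater-like entry
print(wknn_position(db, meas, max_spread=50.0))  # that entry is filtered out
```

The toy data reproduce the failure mode described in the abstract: a reference point that matches only one base station, but matches it almost exactly, receives a very large inverse-distance weight and drags the estimate away unless it is removed.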

잡음과 스펙트럼 이동에 강인한 CNN 기반 라만 분광 알고리즘 (CNN based Raman Spectroscopy Algorithm That is Robust to Noise and Spectral Shift)

  • 박재현;유형근;이창식;장동의;박동조;남현우;박병황
    • 한국군사과학기술학회지 / Vol. 24, No. 3 / pp. 264-271 / 2021
  • Raman spectroscopy is widely used to classify chemicals in chemical defense operations. However, classification performance on Raman spectra may deteriorate because of dark current noise, background noise, and spectral shifts caused by equipment vibration or pressure changes, among other factors. In this paper, we compare the classification accuracy of various machine learning algorithms, including k-nearest neighbor, decision tree, linear discriminant analysis, linear support vector machine, nonlinear support vector machine, and convolutional neural network, under noisy and spectrally shifted conditions. Experimental results show that the convolutional neural network maintains a classification accuracy of over 95% despite noise and spectral shift. This implies that a convolutional neural network can be an ideal classification algorithm in real combat situations, where noise and spectral shift are common.

Indoor 3D Dynamic Reconstruction Fingerprint Matching Algorithm in 5G Ultra-Dense Network

  • Zhang, Yuexia;Jin, Jiacheng;Liu, Chong;Jia, Pengfei
    • KSII Transactions on Internet and Information Systems (TIIS) / Vol. 15, No. 1 / pp. 343-364 / 2021
  • In the 5G era, communication networks are becoming ultra-dense, which will improve the accuracy of indoor positioning and, in turn, the quality of positioning services. In this study, we propose an indoor three-dimensional (3D) dynamic reconstruction fingerprint matching algorithm (DSR-FP) for 5G ultra-dense networks. The first step of the algorithm constructs a low-rank local fingerprint matrix from partial fingerprint data and then reconstructs this local matrix into a complete fingerprint library using the FPCA reconstruction algorithm. In the second step, a dynamic base station matching strategy selects the base station with the best service quality together with multiple sub-optimal serving base stations. The fingerprints belonging to all other base stations are then removed from the fingerprint database to simplify it. Finally, the 3D coordinates of the point to be located are estimated with the K-nearest neighbor matching algorithm. Simulation results show that the average relative error between the fingerprint database reconstructed by DSR-FP and the original fingerprint database is 1.21%, indicating that the reconstruction is accurate and that its influence on location error can be ignored. The positioning error of DSR-FP is less than 0.31 m. Furthermore, at the same signal-to-noise ratio, DSR-FP achieves a smaller positioning error than the traditional fingerprint matching algorithm.

Cable anomaly detection driven by spatiotemporal correlation dissimilarity measurements of bridge grouped cable forces

  • Dong-Hui, Yang;Hai-Lun, Gu;Ting-Hua, Yi;Zhan-Jun, Wu
    • Smart Structures and Systems / Vol. 30, No. 6 / pp. 661-671 / 2022
  • Stay cables are the key load-transmitting components of cable-stayed bridges, so evaluating the condition of cable forces is very important for ensuring bridge safety. An online condition assessment and anomaly localization method for cables is proposed based on the spatiotemporal correlation of grouped cable forces. First, an anomaly-sensitive feature index is obtained from the distribution characteristics of the grouped cable forces. Second, an adaptive anomaly detection method based on the k-nearest neighbor rule performs dissimilarity measurements on the extracted feature index; this method effectively removes the interference of environmental factors and vehicle loads from the online condition assessment of the grouped cable forces. Furthermore, an online anomaly isolation and localization method for stay cables is established, in which the complete decomposition contributions method decomposes the feature matrix of the grouped cable forces to build an anomaly isolation index. Finally, case studies on an in-service cable-stayed bridge equipped with a structural health monitoring system validate the proposed method. The results show that the proposed approach is sensitive to abnormal distributions of grouped cable forces and robust to interference factors. In addition, it can localize cables with abnormal cable forces online and can be successfully applied to the field monitoring of stay cables in cable-stayed bridges.
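
The k-nearest neighbor rule for dissimilarity measurement can be sketched as follows, assuming the feature index has already been extracted as numeric vectors. A new vector's anomaly score is its mean Euclidean distance to the k nearest vectors in a healthy reference set, and the threshold is a chosen quantile of leave-one-out scores on that reference set. The quantile rule is an assumption standing in for the paper's adaptive threshold, and the 5 × 5 grid of reference vectors is hypothetical.

```python
import math


def knn_score(point, reference, k):
    """Mean Euclidean distance from `point` to its k nearest reference vectors."""
    dists = sorted(math.dist(point, r) for r in reference)
    return sum(dists[:k]) / k


def fit_threshold(reference, k, quantile=0.99):
    """Assumed rule: a high quantile of leave-one-out scores on healthy data."""
    scores = sorted(
        knn_score(p, reference[:i] + reference[i + 1:], k)
        for i, p in enumerate(reference)
    )
    idx = min(len(scores) - 1, int(quantile * len(scores)))
    return scores[idx]


def is_anomalous(point, reference, k, threshold):
    """Flag `point` when its kNN dissimilarity exceeds the threshold."""
    return knn_score(point, reference, k) > threshold


# Hypothetical healthy feature vectors on a 5 x 5 grid.
reference = [(float(i % 5), float(i // 5)) for i in range(25)]
threshold = fit_threshold(reference, k=3)
print(is_anomalous((2.5, 2.5), reference, 3, threshold))
print(is_anomalous((10.0, 10.0), reference, 3, threshold))
```

Leaving each reference vector out when scoring it keeps the threshold from being biased toward zero, since otherwise every healthy vector would find itself at distance 0.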

생존분석에서의 기계학습 (Machine learning in survival analysis)

  • 백재욱
    • 산업진흥연구 / Vol. 7, No. 1 / pp. 1-8 / 2022
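  • This paper examines machine learning methods that can be applied to survival data containing censored observations. First, exploratory data analysis revealed the distribution of each feature, the relationships among features, and their importance ranking. Next, the relationship between the features serving as independent variables and the feature serving as the dependent variable (death) was treated as a classification problem, and machine learning methods such as logistic regression and K-nearest neighbor were applied. Although the data set was small, random forest outperformed logistic regression, as is typical in machine learning results. However, methods recently reported to perform well, such as artificial neural networks and gradient boosting, did not perform markedly better, which was attributed to the data not being big data. Finally, conventional survival analysis methods such as the Kaplan-Meier estimator and Cox's proportional hazards model were applied to examine which independent variables decisively affect the dependent variable (t_i, δ_i), and random forest, a machine learning method, was also applied to the censored survival data to evaluate its performance.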
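
The abstract applies Kaplan-Meier estimation to survival data (t_i, δ_i), where δ_i records whether the event was observed (1) or censored (0). A minimal sketch of the product-limit estimator S(t) = ∏_{t_j ≤ t} (1 − d_j / n_j), with d_j deaths and n_j subjects at risk at event time t_j, is given below; the five-subject data set is hypothetical. Following the usual convention, subjects censored at time t still count as at risk at t and leave the risk set afterwards.

```python
def kaplan_meier(times, events):
    """Kaplan-Meier product-limit estimate of the survival function.

    times:  observed time t_i for each subject
    events: delta_i = 1 if the event (e.g. death) was observed, 0 if censored
    Returns [(t_j, S(t_j))] at each distinct event time t_j.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = removed = 0
        # Gather every subject whose observed time equals t.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / at_risk  # factor (1 - d_j / n_j)
            curve.append((t, survival))
        at_risk -= removed  # deaths and censorings leave the risk set after t
    return curve


# Hypothetical data: five subjects, two of them censored.
print(kaplan_meier([1, 2, 2, 3, 4], [1, 1, 0, 1, 0]))
```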

An effective automated ontology construction based on the agriculture domain

  • Deepa, Rajendran;Vigneshwari, Srinivasan
    • ETRI Journal / Vol. 44, No. 4 / pp. 573-587 / 2022
  • The agricultural sector differs fundamentally from other sectors because it relies entirely on natural and climatic factors. Climate change affects land and agriculture in many ways, including insufficient annual rainfall, pests, heat waves, sea-level changes, and fluctuations in global ozone and atmospheric CO2; it also affects the environment. Based on these factors, farmers choose their crops to increase the productivity of their fields. Many existing agricultural ontologies are either domain-specific or built with a minimal vocabulary, and no proper evaluation framework has been implemented for them. A new agricultural ontology organized around subdomains is designed to assist farmers, using the Jaccard relative extractor (JRE) and the Naïve Bayes algorithm. The JRE finds the similarity between sentences and words in agricultural documents, and the relationship between two terms is identified with the Naïve Bayes algorithm. In the proposed method, data are preprocessed with natural language processing techniques, and the dimension-reduced tags are subjected to rule-based formal concept analysis and mapping. Subdomain ontologies for weather, pests, and soil are built separately, and the overall agricultural ontology is built around them. The proposed technique is evaluated against a gold standard for the lexical layer, and its performance is compared with several state-of-the-art systems using precision, recall, F-measure, Matthews correlation coefficient, area under the receiver operating characteristic curve, and area under the precision-recall curve. For agricultural ontology construction, the proposed method achieves a precision of 94.40%, compared with 83.94% for the decision tree and 86.89% for the K-nearest neighbor algorithm.
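
The abstract does not define the Jaccard relative extractor (JRE) itself. As background only, the plain Jaccard coefficient its name refers to, |A ∩ B| / |A ∪ B| over the word sets of two sentences, can be sketched as follows. The lower-case alphanumeric tokenizer is a simplifying assumption, and this sketch is not the JRE.

```python
import re


def tokens(text):
    """Lower-cased alphanumeric word tokens (a deliberately simple tokenizer)."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a, b):
    """Jaccard similarity |A ∩ B| / |A ∪ B| of the token sets of two texts."""
    ta, tb = tokens(a), tokens(b)
    if not ta and not tb:
        return 1.0  # two empty texts are treated as identical
    return len(ta & tb) / len(ta | tb)


print(jaccard("heavy rainfall damages crops", "rainfall damages wheat crops"))
```

Here the two sentences share three of five distinct words, giving a similarity of 0.6.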

Determination of the stage and grade of periodontitis according to the current classification of periodontal and peri-implant diseases and conditions (2018) using machine learning algorithms

  • Kubra Ertas;Ihsan Pence;Melike Siseci Cesmeli;Zuhal Yetkin Ay
    • Journal of Periodontal and Implant Science / Vol. 53, No. 1 / pp. 38-53 / 2023
  • Purpose: The current Classification of Periodontal and Peri-Implant Diseases and Conditions, published and disseminated in 2018, involves some difficulties and causes diagnostic conflicts due to its criteria, especially for inexperienced clinicians. The aim of this study was to design a decision system based on machine learning algorithms by using clinical measurements and radiographic images in order to determine and facilitate the staging and grading of periodontitis. Methods: In the first part of this study, machine learning models were created using the Python programming language based on clinical data from 144 individuals who presented to the Department of Periodontology, Faculty of Dentistry, Süleyman Demirel University. In the second part, panoramic radiographic images were processed and classification was carried out with deep learning algorithms. Results: Using clinical data, the accuracy of staging with the tree algorithm reached 97.2%, while the random forest and k-nearest neighbor algorithms reached 98.6% accuracy. The best staging accuracy for processing panoramic radiographic images was provided by a hybrid network model algorithm combining the proposed ResNet50 architecture and the support vector machine algorithm. For this, the images were preprocessed, and high success was obtained, with a classification accuracy of 88.2% for staging. However, in general, it was observed that the radiographic images provided a low level of success, in terms of accuracy, for modeling the grading of periodontitis. Conclusions: The machine learning-based decision system presented herein can facilitate periodontal diagnoses despite its current limitations. Further studies are planned to optimize the algorithm and improve the results.