• Title/Summary/Keyword: k-NN algorithm

Algorithms for Classifying the Results at the Baccalaureate Exam-Comparative Analysis of Performances

  • Marcu, Daniela;Danubianu, Mirela;Barila, Adina;Simionescu, Corina
    • International Journal of Computer Science & Network Security / v.21 no.8 / pp.35-42 / 2021
  • In the current context of the digitalization of education, modern methods and techniques of data analysis and processing play an important role in improving students' school results. In our paper, we perform a comparative study of the classification performance of the AdaBoost, SVM, Naive Bayes, Neural Network and kNN algorithms in classifying the results obtained at the Baccalaureate by students from a college in Suceava during 2012-2019. To evaluate the results we used the metrics AUC, CA, F1, Precision and Recall. The AdaBoost algorithm achieves the best performance in classifying the results into two categories: promoted / rejected. Next in terms of performance is Naive Bayes, with a score of 0.999 for the AUC metric. The Neural Network and kNN algorithms obtain AUC scores of 0.998 and 0.996, respectively. SVM shows poorer performance, with an AUC score of 0.987. With the help of the HeatMap and DataTable visualization tools, we identified possible correlations between the classification results and some characteristics of the data.
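
The metrics named in the abstract above (CA, Precision, Recall, F1, AUC) can be computed as in the following minimal Python sketch. It follows the textbook definitions for a binary promoted/rejected task; the paper does not say how its tooling computes them. The function names are hypothetical and the labels and scores in the usage lines are made-up toy values, not the study's data.

```python
def classification_metrics(y_true, y_pred):
    """Return CA (accuracy), Precision, Recall and F1 for binary labels (1 = positive)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    ca = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"CA": ca, "Precision": precision, "Recall": recall, "F1": f1}


def auc(y_true, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy example (made-up values): 1 = promoted, 0 = rejected.
y_true = [1, 1, 0, 0]
y_pred = [1, 0, 0, 0]
scores = [0.9, 0.4, 0.5, 0.1]  # hypothetical classifier scores for class 1
metrics = classification_metrics(y_true, y_pred)
area = auc(y_true, scores)
```

The pairwise form of AUC is quadratic in the number of samples, which is acceptable for a small illustration but not for large data sets.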

Comparison of Classification Rate Between BP and ANFIS with FCM Clustering Method on Off-line PD Model of Stator Coil

  • Park Seong-Hee;Lim Kee-Joe;Kang Seong-Hwa;Seo Jeong-Min;Kim Young-Geun
    • KIEE International Transactions on Electrophysics and Applications / v.5C no.3 / pp.138-142 / 2005
  • In this paper, we compare recognition rates between neural networks (NN) and a clustering method as schemes for off-line diagnosis of partial discharge (PD) occurring at the stator coil of a traction motor. To acquire PD data, three defective models were made. PD data for classification were acquired from a PD detector, and statistical distributions were then calculated to classify the model discharge sources. These statistical distributions were applied as input data to two classification tools: BP (back-propagation algorithm) and ANFIS (adaptive network-based fuzzy inference system) pre-processed by the FCM (fuzzy c-means) clustering method. The classification rate of BP was somewhat higher than that of ANFIS, but ANFIS was better than BP on the other items: learning time, number of parameters, and simplicity of the algorithm.

Analysis of Market Trajectory Data using k-NN

  • Park, So-Hyun;Ihm, Sun-Young;Park, Young-Ho
    • Journal of Multimedia Information System / v.5 no.3 / pp.195-200 / 2018
  • Recently, as sensor and big data analysis technologies have developed, much research has analyzed purchase-related data such as trajectory information and stay time. Such purchase-related data is useful for predicting purchase patterns and purchase times. Because it is difficult to find periodic patterns in large-scale human data, it is necessary to look at actual data sets, find various feature patterns, and then apply a machine learning algorithm appropriate to the pattern and purpose. Although existing papers have analyzed data using various machine learning methods, statistical analysis, such as finding feature patterns before applying the machine learning algorithm, is lacking. Therefore, we analyze the purchasing data of Songjeong Maeil Market, which is a data gathering place, and find some characteristic patterns through statistical data analysis. Based on the results of that analysis, we derive meaningful conclusions by applying the machine learning algorithm and present future research directions. The data analysis confirmed that the number of visits differed according to the regional characteristics around Songjeong Maeil Market, and it revealed the distribution of time spent by consumers.

A Research on the Adaptive Control by the Modification of Control Structure and Neural Network Compensation (제어구조 변경과 신경망 보정에 의한 적응제어에 관한 연구)

  • Kim, Yun-Sang;Lee, Jong-Soo;Choi, Kyung-Sam
    • Proceedings of the KIEE Conference / 1999.11c / pp.812-814 / 1999
  • In this paper, we propose a new control algorithm based on neural network (NN) feedback compensation with a desired-trajectory modification. The proposed algorithm decreases trajectory errors by combining a feed-forward desired torque with a neural network feedback torque component. To control the tracking error robustly, we modified the desired trajectory using a variable structure concept smoothed by fuzzy logic. For the numerical simulation, a 2-link robot manipulator model was assumed. To simulate the disturbance due to the modelling uncertainty. In this simulation, the proposed method shows better trajectory tracking performance than the CTM and decreases the chattering in the control inputs.

Customer Relationship Management in Telecom Market using an Optimized Case-based Reasoning (최적화 사례기반추론을 이용한 통신시장 고객관계관리)

  • An, Hyeon-Cheol;Kim, Gyeong-Jae
    • Proceedings of the Korean Institute of Intelligent Systems Conference / 2006.11a / pp.285-288 / 2006
  • Most previous studies on improving the effectiveness of CBR have focused on the similarity function or on optimizing case features and their weights. However, according to some prior research, finding the optimal k parameter for the k-nearest neighbor (k-NN) is also crucial for improving the performance of a CBR system. Nonetheless, there have been few attempts to optimize the number of neighbors, especially using artificial intelligence (AI) techniques. In this study, we introduce a genetic algorithm (GA) to optimize the number of neighbors to combine, as well as the weight of each feature. The new model is applied to the real-world case of a major telecommunication company in Korea in order to build a prediction model for the customer profitability level. Experimental results show that our GA-optimized CBR approach outperforms other AI techniques for this multiclass classification problem.
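
One way to read the idea above is that each candidate solution carries a value of k plus one weight per feature, and its fitness is how well a feature-weighted k-NN classifies the stored cases. The Python sketch below shows only such a fitness evaluation. The leave-one-out scoring, the function names, and the toy cases are assumptions for illustration, not the authors' method, and the genetic search itself (selection, crossover, mutation) is omitted.

```python
import math
from collections import Counter


def weighted_distance(a, b, weights):
    """Euclidean distance with one non-negative weight per feature."""
    return math.sqrt(sum(w * (x - y) ** 2 for w, x, y in zip(weights, a, b)))


def knn_predict(query, cases, labels, k, weights):
    """Majority vote among the k stored cases closest to the query."""
    ranked = sorted(range(len(cases)),
                    key=lambda i: weighted_distance(query, cases[i], weights))
    votes = Counter(labels[i] for i in ranked[:k])
    return votes.most_common(1)[0][0]


def fitness(cases, labels, k, weights):
    """Leave-one-out accuracy of a candidate (k, weights); an assumed fitness."""
    hits = 0
    for i in range(len(cases)):
        rest_cases = cases[:i] + cases[i + 1:]
        rest_labels = labels[:i] + labels[i + 1:]
        if knn_predict(cases[i], rest_cases, rest_labels, k, weights) == labels[i]:
            hits += 1
    return hits / len(cases)


# Toy case base (made-up values): two features, two profitability levels.
cases = [[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5]]
labels = ["low", "low", "low", "high", "high", "high"]

score_k1 = fitness(cases, labels, k=1, weights=[1.0, 1.0])
score_k5 = fitness(cases, labels, k=5, weights=[1.0, 1.0])
```

On this toy data k=1 classifies every held-out case correctly, while k=5 always outvotes the held-out case with the other class, which illustrates why the number of neighbors is worth searching over alongside the weights.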

The Design of Feature Selecting Algorithm for Sleep Stage Analysis (수면단계 분석을 위한 특징 선택 알고리즘 설계)

  • Lee, JeeEun;Yoo, Sun K.
    • Journal of the Institute of Electronics and Information Engineers / v.50 no.10 / pp.207-216 / 2013
  • The aim of this study is to design a classifier for sleep stage analysis and to select an important feature set that represents sleep stages well, based on physiological signals recorded during sleep. Sleep has a significant effect on the quality of human life. When people suffer from lack of sleep or sleep-related disease, they are likely to experience reduced concentration, cognitive impairment, and similar effects. Therefore, there is a lot of research on analyzing sleep stages. In this study, after acquiring physiological signals during sleep, we apply pre-processing such as filtering in order to extract features. The features are used as input for a new combination algorithm using a genetic algorithm (GA) and neural networks (NN). The algorithm selects features with high weights for classifying sleep stages. As a result, the accuracy of the algorithm is up to 90.26% with electroencephalography (EEG) and electrocardiography (ECG) signals, and the selected features are the alpha and delta frequency band power of the EEG signal and the standard deviation of all normal RR intervals (SDNN) of the ECG signal. By repeating the algorithm, we confirmed that the selected features carry important information for classifying sleep stages. This research could be used not only to diagnose sleep-related disease but also to make a guideline for sleep stage analysis.

Performance Comparison of Automatic Classification Using Word Embeddings of Book Titles (단행본 서명의 단어 임베딩에 따른 자동분류의 성능 비교)

  • Yong-Gu Lee
    • Journal of the Korean Society for information Management / v.40 no.4 / pp.307-327 / 2023
  • To analyze the impact of word embedding on book titles, this study utilized word embedding models (Word2vec, GloVe, fastText) to generate embedding vectors from book titles. These vectors were then used as classification features for automatic classification. The classifier was the k-nearest neighbors (kNN) algorithm, and the categories for automatic classification were based on the DDC (Dewey Decimal Classification) main class 300 assigned by libraries to books. In the automatic classification experiment applying word embeddings to book titles, the Skip-gram architectures of Word2vec and fastText gave better kNN classification performance than the TF-IDF features. In the optimization of various hyperparameters across the three models, the Skip-gram architecture of the fastText model demonstrated good performance overall. Specifically, better performance was observed when using hierarchical softmax and larger embedding dimensions as hyperparameters in this model. From a performance perspective, fastText can generate embeddings for substrings or subwords using the n-gram method, which has been shown to increase recall. The Skip-gram architecture of the Word2vec model generally showed good performance at low dimensions (size 300) and with small negative sampling sizes (3 or 5).
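
The pipeline described above (title to embedding vector, then kNN over those vectors) might look like the following Python sketch. The abstract does not say how word vectors are pooled into a title vector or which similarity measure the kNN classifier uses, so averaging and cosine similarity are assumptions here. The two-dimensional word vectors are made-up stand-ins for a trained Word2vec, GloVe or fastText model, and all function names are hypothetical.

```python
import math
from collections import Counter


def title_vector(title, word_vectors):
    """Average the vectors of the known words in a title (assumed pooling)."""
    vecs = [word_vectors[w] for w in title.lower().split() if w in word_vectors]
    if not vecs:
        return None  # no known word, so no embedding for this title
    dim = len(vecs[0])
    return [sum(v[d] for v in vecs) / len(vecs) for d in range(dim)]


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def knn_classify(query_vec, train_vecs, train_labels, k=3):
    """Majority class among the k training titles most similar to the query."""
    ranked = sorted(range(len(train_vecs)),
                    key=lambda i: cosine(query_vec, train_vecs[i]),
                    reverse=True)
    return Counter(train_labels[i] for i in ranked[:k]).most_common(1)[0][0]


# Made-up 2-d word vectors and DDC 300-class labels (340 Law, 370 Education).
word_vectors = {"tax": [1.0, 0.0], "law": [0.9, 0.1],
                "school": [0.0, 1.0], "teaching": [0.1, 0.9]}
train_titles = ["tax", "law", "school", "teaching"]
train_labels = ["340", "340", "370", "370"]
train_vecs = [title_vector(t, word_vectors) for t in train_titles]

predicted = knn_classify(title_vector("Tax Law", word_vectors),
                         train_vecs, train_labels, k=3)
```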

Performance Evaluation on the Learning Algorithm for Automatic Classification of Q&A Documents (고객 질의 문서 자동 분류를 위한 학습 알고리즘 성능 평가)

  • Choi Jung-Min;Lee Byoung-Soo
    • The KIPS Transactions:PartD / v.13D no.1 s.104 / pp.133-138 / 2006
  • Electronic commerce, surpassing traditional commerce, has emerged and is currently leading change in the management of enterprises. To establish and maintain good relations with customers, electronic commerce offers various channels for understanding what customers want and suggesting it to them. Among them, the bulletin board and e-mail are inbound channels through which enterprises can listen directly to customers' opinions, and they differ in character from other channels. Enterprises can manage the bulletin board and e-mail effectively by understanding as many of their customers' ideas as possible and providing them with optimum answers. This is one of the important factors in improving the reliability of the bulletin board and e-mail as well as of electronic commerce as a whole. Therefore, this thesis investigates methods for automatically classifying the various kinds of documents in electronic commerce, which can solve existing problems of the bulletin board and e-mail and allow them to be operated effectively and managed systematically. Moreover, it investigates which algorithm is most suitable for the automatic classification of Q&A documents by experimentally comparing the classification performance of Naive Bayesian, TFIDF, Neural Network, k-NN

A Study on the Applicability of Machine Learning Algorithms for Detecting Hydraulic Outliers in a Borehole (시추공 수리 이상점 탐지를 위한 기계학습 알고리즘의 적용성 연구)

  • Seungbeom Choi;Kyung-Woo Park;Changsoo Lee
    • Tunnel and Underground Space / v.33 no.6 / pp.561-573 / 2023
  • The Korea Atomic Energy Research Institute (KAERI) constructed the KURT (KAERI Underground Research Tunnel) to analyze the hydrogeological/geochemical characteristics of deep rock mass. Numerous boreholes have been drilled to conduct various field tests. The selection of suitable investigation intervals within a borehole is of great importance. When the objectives center on hydraulic flow and groundwater sampling, intervals with sufficient groundwater flow are the most suitable. This study defines such points as hydraulic outliers and aims to detect them using borehole geophysical logging data (temperature and EC) from a 1 km deep borehole. For systematic and efficient outlier detection, machine learning algorithms (DBSCAN, OCSVM, kNN, and isolation forest) were applied and their applicability was assessed. Following data preprocessing and algorithm optimization, the four algorithms detected 55, 12, 52, and 68 outliers, respectively. Although this study confirms the applicability of the machine learning algorithms, further verification and supplementation are desirable since the input data were relatively limited.
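
A common way to use kNN for outlier detection, sketched below in Python, is to score each sample by its distance to its k-th nearest neighbor and flag samples whose score exceeds a threshold. Whether the study used this scoring is not stated in the abstract, so the scheme, the function names, the values of k and the threshold, and the (temperature, EC) readings are all assumptions for illustration.

```python
import math


def knn_outlier_scores(points, k=2):
    """Score each point by the distance to its k-th nearest other point."""
    scores = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        scores.append(dists[k - 1])
    return scores


def flag_outliers(points, k=2, threshold=1.0):
    """Return the indices of points whose k-NN distance exceeds the threshold."""
    return [i for i, s in enumerate(knn_outlier_scores(points, k))
            if s > threshold]


# Made-up (temperature, EC) readings along a borehole; the last one is isolated.
readings = [(10.0, 1.0), (10.1, 1.0), (10.2, 1.1), (10.3, 1.1), (14.0, 3.0)]
outlier_indices = flag_outliers(readings, k=2, threshold=1.0)
```

In practice the two variables would usually be scaled to comparable ranges before computing distances, since otherwise the one with the larger numeric range dominates the score.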

Optimization Algorithms for Site Facility Layout Problems Using Self-Organizing Maps

  • Park, U-Yeol;An, Sung-Hoon
    • Journal of the Korea Institute of Building Construction / v.12 no.6 / pp.664-673 / 2012
  • Determining the layout of temporary facilities that support construction activities at a site is an important planning activity, as the layout can significantly affect cost, quality of work, safety, and other aspects of the project. The construction site layout problem involves difficult combinatorial optimization. Recently, various artificial intelligence (AI)-based algorithms have been applied to many complex optimization problems, including neural networks (NN), genetic algorithms (GA), and swarm intelligence (SI), which relates to the collective behavior of social systems such as honey bees and birds. This study proposes a site facility layout optimization algorithm based on self-organizing maps (SOM). Computational experiments are carried out to demonstrate the efficiency of the proposed method and compare it with particle swarm optimization (PSO). The results show that the proposed algorithm can be efficiently employed to solve the site layout problem.