• Title/Summary/Keyword: k-최근접 이웃 알고리즘

Search Result 82, Processing Time 0.027 seconds

Exploratory Research on Automating the Analysis of Scientific Argumentation Using Machine Learning (머신 러닝을 활용한 과학 논변 구성 요소 코딩 자동화 가능성 탐색 연구)

  • Lee, Gyeong-Geon;Ha, Heesoo;Hong, Hun-Gi;Kim, Heui-Baik
    • Journal of The Korean Association For Science Education
    • /
    • v.38 no.2
    • /
    • pp.219-234
    • /
    • 2018
  • In this study, we explored the possibility of automating the process of analyzing elements of scientific argument in the context of a Korean classroom. To gather training data, we collected 990 sentences from science education journals that illustrate the results of coding elements of argumentation according to Toulmin's argumentation structure framework. We extracted 483 sentences as a test data set from the transcription of students' discourse in scientific argumentation activities. The words and morphemes of each argument were analyzed using the Python 'KoNLPy' package and the 'Kkma' module for Korean Natural Language Processing. After constructing the 'argument-morpheme:class' matrix for 1,473 sentences, five machine learning techniques were applied to generate predictive models relating each sentences to the element of argument with which it corresponded. The accuracy of the predictive models was investigated by comparing them with the results of pre-coding by researchers and confirming the degree of agreement. The predictive model generated by the k-nearest neighbor algorithm (KNN) demonstrated the highest degree of agreement [54.04% (${\kappa}=0.22$)] when machine learning was performed with the consideration of morpheme of each sentence. The predictive model generated by the KNN exhibited higher agreement [55.07% (${\kappa}=0.24$)] when the coding results of the previous sentence were added to the prediction process. In addition, the results indicated importance of considering context of discourse by reflecting the codes of previous sentences to the analysis. The results have significance in that, it showed the possibility of automating the analysis of students' argumentation activities in Korean language by applying machine learning.

Efficient Processing of k-Farthest Neighbor Queries for Road Networks

  • Kim, Taelee;Cho, Hyung-Ju;Hong, Hee Ju;Nam, Hyogeun;Cho, Hyejun;Do, Gyung Yoon;Jeon, Pilkyu
    • Journal of the Korea Society of Computer and Information
    • /
    • v.24 no.10
    • /
    • pp.79-89
    • /
    • 2019
  • While most research focuses on the k-nearest neighbors (kNN) queries in the database community, an important type of proximity queries called k-farthest neighbors (kFN) queries has not received much attention. This paper addresses the problem of finding the k-farthest neighbors in road networks. Given a positive integer k, a query object q, and a set of data points P, a kFN query returns k data objects farthest from the query object q. Little attention has been paid to processing kFN queries in road networks. The challenge of processing kFN queries in road networks is reducing the number of network distance computations, which is the most prominent difference between a road network and a Euclidean space. In this study, we propose an efficient algorithm called FANS for k-FArthest Neighbor Search in road networks. We present a shared computation strategy to avoid redundant computation of the distances between a query object and data objects. We also present effective pruning techniques based on the maximum distance from a query object to data segments. Finally, we demonstrate the efficiency and scalability of our proposed solution with extensive experiments using real-world roadmaps.

Development of methodology for daily rainfall simulation considering distribution of rainfall events in each duration (강우사상의 지속기간별 분포 특성을 고려한 일강우 모의 기법 개발)

  • Jung, Jaewon;Kim, Soojun;Kim, Hung Soo
    • Journal of Korea Water Resources Association
    • /
    • v.52 no.2
    • /
    • pp.141-148
    • /
    • 2019
  • When simulating the daily rainfall amount by existing Markov Chain model, it is general to simulate the rainfall occurrence and to estimate the rainfall amount randomly from the distribution which is similar to the daily rainfall distribution characteristic using Monte Carlo simulation. At this time, there is a limitation that the characteristics of rainfall intensity and distribution by time according to the rainfall duration are not reflected in the results. In this study, 1-day, 2-day, 3-day, 4-day rainfall event are classified, and the rainfall amount is estimated by rainfall duration. In other words, the distributions of the total amount of rainfall event by the duration are set using the Kernel Density Estimation (KDE), the daily rainfall in each day are estimated from the distribution of each duration. Total rainfall amount determined for each event are divided into each daily rainfall considering the type of daily distribution of the rainfall event which has most similar rainfall amount of the observed rainfall using the k-Nearest Neighbor algorithm (KNN). This study is to develop the limitation of the existing rainfall estimation method, and it is expected that this results can use for the future rainfall estimation and as the primary data in water resource design.

Smartphone Accelerometer-Based Gesture Recognition and its Robotic Application (스마트폰 가속도 센서 기반의 제스처 인식과 로봇 응용)

  • Nam, Sang-Ha;Kim, Joo-Hee;Heo, Se-Kyeong;Kim, In-Cheol
    • KIPS Transactions on Software and Data Engineering
    • /
    • v.2 no.6
    • /
    • pp.395-402
    • /
    • 2013
  • We propose an accelerometer-based gesture recognition method for smartphone users. In our method, similarities between a new time series accelerometer data and each gesture exemplar are computed with DTW algorithm, and then the best matching gesture is determined based on k-NN algorithm. In order to investigate the performance of our method, we implemented a gesture recognition program working on an Android smartphone and a gesture-based teleoperating robot system. Through a set of user-mixed and user-independent experiments, we showed that the proposed method and implementation have high performance and scalability.

Dense-Depth Map Estimation with LiDAR Depth Map and Optical Images based on Self-Organizing Map (라이다 깊이 맵과 이미지를 사용한 자기 조직화 지도 기반의 고밀도 깊이 맵 생성 방법)

  • Choi, Hansol;Lee, Jongseok;Sim, Donggyu
    • Journal of Broadcast Engineering
    • /
    • v.26 no.3
    • /
    • pp.283-295
    • /
    • 2021
  • This paper proposes a method for generating dense depth map using information of color images and depth map generated based on lidar based on self-organizing map. The proposed depth map upsampling method consists of an initial depth prediction step for an area that has not been acquired from LiDAR and an initial depth filtering step. In the initial depth prediction step, stereo matching is performed on two color images to predict an initial depth value. In the depth map filtering step, in order to reduce the error of the predicted initial depth value, a self-organizing map technique is performed on the predicted depth pixel by using the measured depth pixel around the predicted depth pixel. In the process of self-organization map, a weight is determined according to a difference between a distance between a predicted depth pixel and an measured depth pixel and a color value corresponding to each pixel. In this paper, we compared the proposed method with the bilateral filter and k-nearest neighbor widely used as a depth map upsampling method for performance comparison. Compared to the bilateral filter and the k-nearest neighbor, the proposed method reduced by about 6.4% and 8.6% in terms of MAE, and about 10.8% and 14.3% in terms of RMSE.

Design and Implementation of a Web Server Using a Learning-based Dynamic Thread Pool Scheme (학습 기반의 동적 쓰레드 풀 기법을 적용한 웹 서버의 설계 및 구현)

  • Yoo, Seo-Hee;Kang, Dong-Hyun;Lee, Kwon-Yong;Park, Sung-Yong
    • Journal of KIISE:Computing Practices and Letters
    • /
    • v.16 no.1
    • /
    • pp.23-34
    • /
    • 2010
  • As the number of user increases according to the improvement of the network, the multi-thread schemes are used to process the service requests of several users who are connected simultaneously. The static thread pool scheme has the problem of occupying a static amount of system resources. On the other hand, the dynamic thread pool scheme can control the number of threads according to the users' requests. However, it has disadvantage that this scheme cannot react to the requests which are larger than the maximum value assigned. In this paper, a web server using a learning-based dynamic thread pool scheme is suggested, which will be running on a server programming of a multi-thread environment. The suggested scheme adds the creation of the threads through the prediction of the next number of periodic requests using Auto Regressive scheme with the web server apache worker MPM (Multi-processing Module). Unlike previous schemes, in order to set the exact number of the necessary threads during the unchanged number of work requests in a certain period, K-Nearest Neighbor algorithm is used to learn the number of threads in advance according to the number of requests. The required number of threads is set by comparing with the previously learned objects. Then, the similar objects are selected to decide the number of the threads according to the request, and they create the threads. In this paper, the response time has decreased by modifying the number of threads dynamically, and the system resources can be used more efficiently by managing the number of threads according to the requests.

A Study on the Data Fusion Method using Decision Rule for Data Enrichment (의사결정 규칙을 이용한 데이터 통합에 관한 연구)

  • Kim S.Y.;Chung S.S.
    • The Korean Journal of Applied Statistics
    • /
    • v.19 no.2
    • /
    • pp.291-303
    • /
    • 2006
  • Data mining is the work to extract information from existing data file. So, the one of best important thing in data mining process is the quality of data to be used. In this thesis, we propose the data fusion technique using decision rule for data enrichment that one phase to improve data quality in KDD process. Simulations were performed to compare the proposed data fusion technique with the existing techniques. As a result, our data fusion technique using decision rule is characterized with low MSE or misclassification rate in fusion variables.

Binary classification by the combination of Adaboost and feature extraction methods (특징 추출 알고리즘과 Adaboost를 이용한 이진분류기)

  • Ham, Seaung-Lok;Kwak, No-Jun
    • Journal of the Institute of Electronics Engineers of Korea CI
    • /
    • v.49 no.4
    • /
    • pp.42-53
    • /
    • 2012
  • In pattern recognition and machine learning society, classification has been a classical problem and the most widely researched area. Adaptive boosting also known as Adaboost has been successfully applied to binary classification problems. It is a kind of boosting algorithm capable of constructing a strong classifier through a weighted combination of weak classifiers. On the other hand, the PCA and LDA algorithms are the most popular linear feature extraction methods used mainly for dimensionality reduction. In this paper, the combination of Adaboost and feature extraction methods is proposed for efficient classification of two class data. Conventionally, in classification problems, the roles of feature extraction and classification have been distinct, i.e., a feature extraction method and a classifier are applied sequentially to classify input variable into several categories. In this paper, these two steps are combined into one resulting in a good classification performance. More specifically, each projection vector is treated as a weak classifier in Adaboost algorithm to constitute a strong classifier for binary classification problems. The proposed algorithm is applied to UCI dataset and FRGC dataset and showed better recognition rates than sequential application of feature extraction and classification methods.

Automatic Document Classification Based on k-NN Classifier and Object-Based Thesaurus (k-NN 분류 알고리즘과 객체 기반 시소러스를 이용한 자동 문서 분류)

  • Bang Sun-Iee;Yang Jae-Dong;Yang Hyung-Jeong
    • Journal of KIISE:Software and Applications
    • /
    • v.31 no.9
    • /
    • pp.1204-1217
    • /
    • 2004
  • Numerous statistical and machine learning techniques have been studied for automatic text classification. However, because they train the classifiers using only feature vectors of documents, ambiguity between two possible categories significantly degrades precision of classification. To remedy the drawback, we propose a new method which incorporates relationship information of categories into extant classifiers. In this paper, we first perform the document classification using the k-NN classifier which is generally known for relatively good performance in spite of its simplicity. We employ the relationship information from an object-based thesaurus to reduce the ambiguity. By referencing various relationships in the thesaurus corresponding to the structured categories, the precision of k-NN classification is drastically improved, removing the ambiguity. Experiment result shows that this method achieves the precision up to 13.86% over the k-NN classification, preserving its recall.

A Study on Data Clustering of Light Buoy Using DBSCAN(I) (DBSCAN을 이용한 등부표 위치 데이터 Clustering 연구(I))

  • Gwang-Young Choi;So-Ra Kim;Sang-Won Park;Chae-Uk Song
    • Journal of Navigation and Port Research
    • /
    • v.47 no.4
    • /
    • pp.231-238
    • /
    • 2023
  • The position of a light buoy is always flexible due to the influence of external forces such as tides and wind. The position can be checked through AIS (Automatic Identification System) or RTU (Remote Terminal Unit) for AtoN. As a result of analyzing the position data for the last five years (2017-2021) of a light buoy, the average position error was 15.4%. It is necessary to detect position error data and obtain refined position data to prevent navigation safety accidents and management. This study aimed to detect position error data and obtain refined position data by DBSCAN Clustering position data obtained through AIS or RTU for AtoN. For this purpose, 21 position data of Gunsan Port No. 1 light buoy where RTU was installed among western waters with the most position errors were DBSCAN clustered using Python library. The minPts required for DBSCAN Clustering applied the value commonly used for two-dimensional data. Epsilon was calculated and its value was applied using the k-NN (nearest neighbor) algorithm. As a result of DBSCAN Clustering, position error data that did not satisfy minPts and epsilon were detected and refined position data were acquired. This study can be used as asic data for obtaining reliable position data of a light buoy installed with AIS or RTU for AtoN. It is expected to be of great help in preventing navigation safety accidents.