• Title/Summary/Keyword: neighbor selection


Automatic Selection of Similar Sentences for Teaching Writing in Elementary School (초등 글쓰기 교육을 위한 유사 문장 자동 선별)

  • Park, Youngki
    • Journal of The Korean Association of Information Education / v.20 no.4 / pp.333-340 / 2016
  • When elementary students write their own sentences, it is often educationally beneficial to compare them with similar sentences written by others. However, this is impractical in most classrooms, because it is burdensome for teachers to look up all of the sentences written by students. To cope with this problem, we propose a novel approach for automatic selection of similar sentences based on a three-step process: 1) extracting subword units from the word-level sentences, 2) training a model with an encoder-decoder architecture, and 3) using an approximate k-nearest neighbor search algorithm to find similar sentences. Experimental results show that the proposed approach achieves an accuracy of 75% on our test data.
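
The retrieval step described above can be illustrated with a minimal Python sketch. It is not the authors' method: character trigram counts stand in for the learned encoder-decoder representation, and an exact brute-force cosine search stands in for the approximate k-nearest neighbor search. The function names `char_ngrams`, `cosine`, and `most_similar` are hypothetical.

```python
from collections import Counter
import heapq
import math


def char_ngrams(sentence, n=3):
    """Bag of character n-grams; a crude stand-in for subword units (assumption)."""
    text = f" {sentence.lower().strip()} "
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[gram] for gram, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def most_similar(query, corpus, k=2):
    """Return the k (similarity, sentence) pairs closest to the query.

    Exact brute-force search; an approximate index would replace this
    loop when the corpus is large.
    """
    q = char_ngrams(query)
    scored = [(cosine(q, char_ngrams(s)), s) for s in corpus]
    return heapq.nlargest(k, scored)


if __name__ == "__main__":
    corpus = [
        "The cat sat on the mat.",
        "I like to eat apples.",
        "A cat is sitting on a mat.",
    ]
    for score, sentence in most_similar("The cat sat on a mat.", corpus):
        print(f"{score:.2f}  {sentence}")
```

Brute-force search costs one similarity computation per stored sentence, which is why an approximate search becomes attractive as the collection of student sentences grows.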

Designing Hypothesis of 2-Substituted-N-[4-(1-methyl-4,5-diphenyl-1H-imidazole-2-yl)phenyl] Acetamide Analogs as Anticancer Agents: QSAR Approach

  • Bedadurge, Ajay B.;Shaikh, Anwar R.
    • Journal of the Korean Chemical Society / v.57 no.6 / pp.744-754 / 2013
  • Quantitative structure-activity relationship (QSAR) analysis of recently synthesized imidazole-(benz)azole and imidazole-piperazine derivatives was carried out for their anticancer activities against breast (MCF-7) cell lines. The statistically significant 2D-QSAR models ($r^2=0.8901$; $q^2=0.8130$; F test = 36.4635; $r^2$ se = 0.1696; $q^2$ se = 0.12212; pred_$r^2=0.4229$; pred_$r^2$ se = 0.4606 and $r^2=0.8763$; $q^2=0.7617$; F test = 31.8737; $r^2$ se = 0.1951; $q^2$ se = 0.2708; pred_$r^2=0.4386$; pred_$r^2$ se = 0.3950) were developed using the Molecular Design Suite (VLifeMDS 4.2). The study was performed with 18 compounds (the data set), using random selection and manual selection methods to divide the data set into training and test sets. Multiple linear regression (MLR) with the stepwise (SW) forward-backward variable selection method was used to build the QSAR models. The results of the 2D-QSAR models were further compared with 3D-QSAR models generated by kNN-MFA (k-Nearest Neighbor Molecular Field Analysis) to investigate the substitutional requirements for favorable anticancer activity. The results may be useful in designing novel imidazole-(benz)azole and imidazole-piperazine derivatives against breast (MCF-7) cell lines prior to synthesis.

Resume Classification System using Natural Language Processing & Machine Learning Techniques

  • Irfan Ali;Nimra;Ghulam Mujtaba;Zahid Hussain Khand;Zafar Ali;Sajid Khan
    • International Journal of Computer Science & Network Security / v.24 no.7 / pp.108-117 / 2024
  • The selection and recommendation of a suitable job applicant from a pool of thousands of applications is often a daunting job for an employer. The recommendation and selection process significantly increases the workload of the employer's department concerned. Thus, a Resume Classification System using Natural Language Processing (NLP) and Machine Learning (ML) techniques could automate this tedious process and ease the employer's job. Moreover, automating this process can significantly expedite the applicant selection process and make it more transparent, with minimal human involvement. Various Machine Learning approaches have been proposed to develop Resume Classification Systems. This study presents an automated NLP and ML-based system that classifies Resumes according to job categories with performance guarantees. It employs various ML algorithms and NLP techniques to measure the accuracy of Resume Classification Systems and proposes a solution with better accuracy and reliability in different settings. To demonstrate the significance of NLP & ML techniques for processing and classifying Resumes, the extracted features were tested on nine machine learning models: Support Vector Machine - SVM (Linear, SGD, SVC & NuSVC), Naïve Bayes (Bernoulli, Multinomial & Gaussian), K-Nearest Neighbor (KNN) and Logistic Regression (LR). The Term Frequency-Inverse Document Frequency (TF-IDF) feature representation scheme proved suitable for the Resume Classification task. The developed models were evaluated using F-ScoreM, RecallM, PrecisionM, and overall Accuracy. The experimental results indicate that, using the One-Vs-Rest classification strategy for this multi-class Resume Classification task, the SVM class of Machine Learning algorithms performed better on the study dataset, with over 96% overall accuracy. The promising results suggest that the NLP & ML techniques employed in this study could be used for the Resume Classification task.

An Online Forklift Dispatching Algorithm Based on Minimal Cost Assignment Approach (최소 비용할당 기반 온라인 지게차 운영 알고리즘)

  • Kwon, BoBae;Son, Jung-Ryoul;Ha, Byung-Hyun
    • Journal of the Korea Society for Simulation / v.27 no.2 / pp.71-81 / 2018
  • Forklifts in a shipyard lift and transport heavy objects. Tasks occur dynamically, and the task occurrence rate changes over time. In particular, the rate is high immediately after morning and afternoon business hours. The weight of objects varies by task, and each forklift has an allowable weight limit. In this study, we propose an online forklift dispatching algorithm based on the nearest-neighbor dispatching rule, using a minimal cost assignment approach to attain efficient operations. The proposed algorithm considers various types of forklifts and multiple jobs at the same time to determine the dispatch plan. We generate dummy forklifts and dummy tasks to handle imbalance between the numbers of forklifts and tasks, taking their capacity limits and weights into account. In addition, a method of systematic forklift selection is devised that considers the condition of the forklift. The performance indicators are the total travel distance and the average task waiting time. We validate our approach against the priority rule-based method of a previous study by discrete-event simulation.
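
The idea of balancing forklifts and tasks with dummy entries before solving a minimal cost assignment can be sketched in Python as follows. This is an illustration under stated assumptions, not the algorithm from the paper: the travel cost is assumed to be Manhattan distance, dummy rows and columns are assumed to cost zero, pairs that violate the weight limit are given infinite cost, and the assignment is solved by brute force over permutations (workable only for a handful of forklifts). The function name `dispatch` and the tuple layouts are hypothetical.

```python
from itertools import permutations
import math


def dispatch(forklifts, tasks):
    """Assign forklifts to tasks at minimal total travel cost.

    forklifts: list of (x, y, capacity); tasks: list of (x, y, weight).
    Returns (plan, total_cost), where plan lists (forklift_index, task_index).
    """
    n = max(len(forklifts), len(tasks))
    # Square cost matrix; rows or columns beyond the real data are dummies
    # with zero cost (assumption), so surplus forklifts idle or surplus tasks wait.
    cost = [[0.0] * n for _ in range(n)]
    for i, (fx, fy, capacity) in enumerate(forklifts):
        for j, (tx, ty, weight) in enumerate(tasks):
            feasible = capacity >= weight
            cost[i][j] = abs(fx - tx) + abs(fy - ty) if feasible else math.inf

    best_total, best_perm = math.inf, None
    for perm in permutations(range(n)):  # brute force, small n only
        total = sum(cost[i][perm[i]] for i in range(n))
        if total < best_total:
            best_total, best_perm = total, perm

    if best_perm is None:  # every complete assignment violates a weight limit
        return [], math.inf
    plan = [(i, j) for i, j in enumerate(best_perm)
            if i < len(forklifts) and j < len(tasks)]
    return plan, best_total


if __name__ == "__main__":
    forklifts = [(0, 0, 5), (10, 0, 2)]          # (x, y, capacity)
    tasks = [(1, 0, 4), (9, 0, 1), (5, 5, 1)]    # (x, y, weight)
    print(dispatch(forklifts, tasks))
```

For realistic fleet sizes, the permutation loop would be replaced by a polynomial-time assignment solver such as the Hungarian method.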

Estimation of Aboveground Biomass Carbon Stock Using Landsat TM and Ratio Images - $k$NN algorithm and Regression Model Priority (Landsat TM 위성영상과 비율영상을 적용한 지상부 탄소 저장량 추정 - $k$NN 알고리즘 및 회귀 모델을 중점적으로)

  • Yoo, Su-Hong;Heo, Joon;Jung, Jae-Hoon;Han, Soo-Hee;Kim, Kyoung-Min
    • Journal of Korean Society for Geospatial Information Science / v.19 no.2 / pp.39-48 / 2011
  • Global warming causes climate change and severely damages ecosystems and civilization. Carbon dioxide greatly contributes to global warming, so many studies have been conducted to estimate forest biomass carbon stock, an important form of carbon storage. However, more studies are required on the selection and use of techniques and remotely sensed data suitable for carbon stock estimation in Korea. In this study, the aboveground forest biomass carbon stocks of Danyang-Gun in South Korea were estimated using the $k$NN ($k$-Nearest Neighbor) algorithm and a regression model, and the results were compared. Landsat TM and 5th NFI (National Forest Inventory) data were prepared, and ratio images, which are effective for topographic effect correction and for distinguishing forest biomass, were also used. Consequently, it was found that the $k$NN algorithm was better than the regression model at estimating forest carbon stocks in Danyang-Gun, and that the use of ratio images brought no significant improvement in accuracy.

An Adaptive Method For Face Recognition Based Filters and Selection of Features (필터 및 특징 선택 기반의 적응형 얼굴 인식 방법)

  • Cho, Byoung-Mo;Kim, Gi-Han;Rhee, Phill-Kyu
    • The Journal of the Korea Contents Association / v.9 no.6 / pp.1-8 / 2009
  • Many factors, such as camera location, luminosity, brightness, and direction of light, affect the performance of 2-dimensional image recognition. This paper suggests an adaptive method for face-image recognition in noisy environments using evolvable filtering and feature extraction, which uses one sample image from the camera. The suggested method consists of two main parts. One is the environmental-adjustment module, which determines optimum sets of filters, filter parameters, and feature dimensions by using a "steady state genetic algorithm". The other is the face recognition module, which performs face-image recognition using the previous results. In the processing, we used the Gabor wavelet for extracting features from the images and the k-Nearest Neighbor method for classification. We tested the adaptive face recognition method under brightness noise, impulse noise, and composite noise, and verified that it prevents the rapid decrease in face recognition rate that generally occurs in noisy environments.

A Hybrid Under-sampling Approach for Better Bankruptcy Prediction (부도예측 개선을 위한 하이브리드 언더샘플링 접근법)

  • Kim, Taehoon;Ahn, Hyunchul
    • Journal of Intelligence and Information Systems / v.21 no.2 / pp.173-190 / 2015
  • The purpose of this study is to improve bankruptcy prediction models by using a novel hybrid under-sampling approach. Most prior studies have tried to enhance the accuracy of bankruptcy prediction models by improving the classification methods involved. In contrast, we focus on appropriate data preprocessing as a means of enhancing accuracy. In particular, we aim to develop an effective sampling approach for bankruptcy prediction, since most prediction models suffer from class imbalance problems. The approach proposed in this study is a hybrid under-sampling method that combines the k-Reverse Nearest Neighbor (k-RNN) and one-class support vector machine (OCSVM) approaches. k-RNN can effectively eliminate outliers, while OCSVM contributes to the selection of informative training samples from majority class data. To validate our proposed approach, we have applied it to data from H Bank's non-external auditing companies in Korea, and compared the performances of the classifiers with the proposed under-sampling and random sampling data. The empirical results show that the proposed under-sampling approach generally improves the accuracy of classifiers, such as logistic regression, discriminant analysis, decision tree, and support vector machines. They also show that the proposed under-sampling approach reduces the risk of false negative errors, which lead to higher misclassification costs.

Visualized Determination for Installation Location of Monitoring Devices using CPTED (CPTED기법을 통한 모니터링 시스템 설치위치 시각화 결정법)

  • Kim, Joohwan;Nam, Doohee
    • The Journal of the Institute of Internet, Broadcasting and Communication / v.15 no.2 / pp.145-150 / 2015
  • The need for resident safety is important in a society characterized by urbanization, an aging population, and small households. People are looking for CPTED-based safety information systems and devices. As a result, the demand for and installation of CCTV have increased steadily. However, there is no scientific analysis of the validity, systematic planning, and placement of security CCTV. The devices are simply installed in areas where demand is higher. Increasing CCTV density alone has limits as a way of ensuring resident safety. Crime is characterized by clustering and strong interconnectivity. Therefore, two years of crime data were geocoded as exploratory spatial data, and cluster analysis and spatial statistical analysis were carried out through GIS spatial analysis, with 18 variables divided into socioeconomic, urban space, crime prevention facility, and crime occurrence indices. Nearest Neighbor distance analysis and Ripley's K function show clustering of the 5 major crimes, theft, violence, and sexual violence. Criminal correlation analysis also shows strong crime interconnectivity. Once crime clusters are found, crime hotspots can be identified. This study therefore establishes the concept of a hotspot, considers techniques for hotspot selection, and then selects hotspots for the 5 major crimes, theft, violence, and sexual violence through Nearest Neighbor Hierarchical Spatial Clustering.

Query Slipping Prevention for Trajectory-based Contents Publishing and Subscribing in Wireless Sensor Networks (무선 센서 네트워크에서의 궤도 기반 콘텐츠 발간 및 구독을 위한 질의 이탈 방지)

  • Tscha, Yeong-Hwan
    • Journal of KIISE:Information Networking / v.32 no.4 / pp.525-534 / 2005
  • This paper is concerned with query slipping and its prevention for trajectory-based matchmaking service in wireless sensor networks. The problem happens when a query propagating along a subscribe trajectory moves through a publish trajectory without obtaining the desired information, even though the two trajectories intersect geometrically. The query must then be resubmitted or another subscribe trajectory initiated. Thus, query slipping results in considerable time delay and, in the worst case, looping in the trajectory or query flooding of the network. We address the problem formally and suggest a solution. First, the area where nodes are distributed is logically partitioned into smaller grids, and a grid-based multicast next-hop selection algorithm is proposed. Our algorithm not only attempts to make the trajectory straight but also considers the nodal density of recipient nodes and seamless grid-by-grid multicast. We prove that publishing and subscribing using the algorithm eventually eliminate the possibility of slipping. It turns out that our algorithm dissipates significantly less power of neighbor nodes than the non-grid-based method, such as greedy forwarding, and the fixed-sized grid approach, such as GAF (Geographical Adaptive Fidelity).

Improving of kNN-based Korean text classifier by using heuristic information (경험적 정보를 이용한 kNN 기반 한국어 문서 분류기의 개선)

  • Lim, Heui-Seok;Nam, Kichun
    • The Journal of Korean Association of Computer Education / v.5 no.3 / pp.37-44 / 2002
  • Automatic text classification is the task of assigning predefined categories to free-text documents. Its importance has increased with the need to organize and manage huge amounts of text data. There has been some research on automatic text classification based on machine learning techniques. While most of it has focused on proposing new machine learning methods and on cross-evaluation against other systems, a thorough evaluation or optimization of a single method has rarely been done. In this paper, we propose a method for improving a kNN-based Korean text classification system using heuristic information about the decision function, the number of nearest neighbors, and the feature selection method. Experimental results showed that the system with a similarity-weighted decision function, the global method for considering neighbors, and DF/ICF feature selection was more accurate than a simple kNN-based classifier. We also found that the performance of the local method with a well-chosen k value was as high as that of the global method, which incurs a high computational cost.
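
A similarity-weighted decision function for kNN text classification can be sketched in Python as below. This is a generic illustration, not the system from the paper: documents are assumed to be whitespace-tokenized term-count vectors compared by cosine similarity, with no DF/ICF feature selection, and the function names `cosine` and `knn_classify` are hypothetical. Each of the k nearest neighbors votes for its category with a weight equal to its similarity to the query, instead of one vote each.

```python
from collections import Counter, defaultdict
import math


def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(count * b.get(term, 0) for term, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def knn_classify(query, training, k=3):
    """Return the category with the largest summed neighbor similarity.

    training: non-empty list of (text, category) pairs.
    """
    q = Counter(query.split())
    scored = [(cosine(q, Counter(text.split())), category)
              for text, category in training]
    neighbors = sorted(scored, reverse=True)[:k]

    scores = defaultdict(float)
    for similarity, category in neighbors:
        scores[category] += similarity  # weighted vote, not a simple count
    return max(scores, key=scores.get)


if __name__ == "__main__":
    training = [
        ("apple banana cherry", "fruit"),
        ("apple car engine wheel", "auto"),
        ("banana truck engine road", "auto"),
    ]
    print(knn_classify("apple banana cherry", training, k=3))
```

In this toy data a simple majority vote over the three neighbors would pick "auto" (two votes to one), whereas the weighted vote favors the single, much closer "fruit" document.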
