Integrated Search | Korea Science

Resume Classification System using Natural Language Processing & Machine Learning Techniques

Irfan Ali;Nimra;Ghulam Mujtaba;Zahid Hussain Khand;Zafar Ali;Sajid Khan
International Journal of Computer Science & Network Security / Vol. 24, No. 7 / pp. 108-117 / 2024
Selecting and recommending a suitable job applicant from a pool of thousands of applications is often a daunting task for an employer. The recommendation and selection process significantly increases the workload of the employer's department concerned. A Resume Classification System using Natural Language Processing (NLP) and Machine Learning (ML) techniques could automate this tedious process and ease the employer's job. Moreover, automation can make the applicant selection process significantly faster and more transparent, with minimal human involvement. Various Machine Learning approaches have been proposed for developing Resume Classification Systems. This study presents an automated NLP- and ML-based system that classifies resumes according to job categories with performance guarantees. It employs various ML algorithms and NLP techniques to measure the accuracy of Resume Classification Systems and proposes a solution with better accuracy and reliability in different settings. To demonstrate the significance of NLP and ML techniques for processing and classifying resumes, the extracted features were tested on nine machine learning models: Support Vector Machine (SVM; Linear, SGD, SVC, and NuSVC), Naïve Bayes (Bernoulli, Multinomial, and Gaussian), K-Nearest Neighbor (KNN), and Logistic Regression (LR). The Term Frequency-Inverse Document Frequency (TF-IDF) feature representation scheme proved suitable for the resume classification task. The developed models were evaluated using F-Score_M, Recall_M, Precision_M, and overall accuracy. The experimental results indicate that, using the One-vs-Rest classification strategy for this multi-class resume classification task, the SVM class of machine learning algorithms performed better on the study dataset, with over 96% overall accuracy. These promising results suggest that the NLP and ML techniques employed in this study could be used for the resume classification task.
https://doi.org/10.22937/IJCSNS.2024.24.7.13
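
The abstract above reports macro-averaged scores (F-Score_M, Recall_M, Precision_M) alongside overall accuracy. As a rough illustration of what such scores measure, here is a minimal pure-Python sketch. The function name `macro_scores` and the toy labels are hypothetical, and the macro F-score is taken here as the mean of the per-class F1 values, which is one common convention; the abstract does not state which definition the authors used.

```python
from collections import Counter


def macro_scores(y_true, y_pred):
    """Return (precision_M, recall_M, f_score_M, accuracy) for multi-class labels.

    Each class is scored one-vs-rest, then the per-class scores are averaged
    with equal weight per class (macro averaging).
    """
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # the predicted class gains a false positive
            fn[truth] += 1  # the true class gains a false negative

    precisions, recalls, f_scores = [], [], []
    for c in labels:
        prec = tp[c] / (tp[c] + fp[c]) if tp[c] + fp[c] else 0.0
        rec = tp[c] / (tp[c] + fn[c]) if tp[c] + fn[c] else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        precisions.append(prec)
        recalls.append(rec)
        f_scores.append(f1)

    n = len(labels)
    accuracy = sum(tp.values()) / len(y_true)
    return sum(precisions) / n, sum(recalls) / n, sum(f_scores) / n, accuracy


# Toy job-category labels, invented for illustration only.
y_true = ["hr", "hr", "it", "it", "sales"]
y_pred = ["hr", "it", "it", "it", "sales"]
print(macro_scores(y_true, y_pred))
```

Because every class counts equally in the average, a small job category that is classified poorly lowers the macro scores more than it lowers overall accuracy.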

A STUDY ON SPATIAL FEATURE EXTRACTION IN THE CLASSIFICATION OF HIGH RESOLUTION SATELLITE IMAGERY

Han, You-Kyung;Kim, Hye-Jin;Choi, Jae-Wan;Kim, Yong-Il
Korean Society of Remote Sensing: Conference Proceedings / Korean Society of Remote Sensing, 2008 International Symposium on Remote Sensing / pp. 361-364 / 2008
It is well known that combining spatial and spectral information can improve land use classification from satellite imagery. Classification of high-spatial-resolution imagery is limited when only spectral information is used, owing to the complex spatial arrangement of features and the spectral heterogeneity within each class. Extracting spatial information is therefore one of the most important steps in high-resolution satellite image classification. In this paper, we propose a new spatial feature extraction method. The extracted features are integrated with the spectral bands to improve overall classification accuracy. Classification is performed with a Support Vector Machine classifier. To evaluate the proposed feature extraction method, we applied our approach to KOMPSAT-2 data and compared the results with those of other methods.

A Study on Bliss's Bibliographic Classification

남태우;유광연
Journal of the Korean Society for Information Management / Vol. 22, No. 2 / pp. 57-85 / 2005
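The Bibliographic Classification (BC), a non-decimal classification scheme, was devised by Henry Evelyn Bliss; it originated in the United States, was revised in the United Kingdom, and remains in use today. Because BC arranges its main classes on the basis of the classification of knowledge, it is regarded as scholarly. It is also recognized as the most complete of the existing classification schemes. Despite being an excellent scheme, however, in Korea it has only been mentioned briefly in works on classification theory and has never been analyzed systematically. This study therefore analyzes the influence Bliss had on the field of classification by examining his life and thought. It also examines the merits and value of BC through a study of its history and characteristics. Studying BC, which is regarded as the most scholarly scheme, should help establish a foundation for the logic and philosophy of classification theory.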
https://doi.org/10.3743/KOSIM.2005.22.2.057

A Text Categorization Method Improved by Removing Noisy Training Documents

한형동;고영중;서정연
Journal of KIISE: Software and Applications / Vol. 32, No. 9 / pp. 912-919 / 2005
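When binary classification is applied to multi-class classification in text categorization, the One-Against-All method is generally used. This method has one problem, however: documents in the positive set have had their categories assigned directly by humans, whereas documents in the negative set have not, so the negative set may contain many noisy documents. To solve this problem, this paper proposes applying a sliding window technique and the EM algorithm to binary-classification-based text categorization. The proposed method first uses the sliding window technique to extract noisy documents and then reassigns their categories using the EM algorithm, thereby improving the performance of binary-classification-based text categorization.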

Robust Face Recognition under Limited Training Sample Scenario using Linear Representation

Iqbal, Omer;Jadoon, Waqas;ur Rehman, Zia;Khan, Fiaz Gul;Nazir, Babar;Khan, Iftikhar Ahmed
KSII Transactions on Internet and Information Systems (TIIS) / Vol. 12, No. 7 / pp. 3172-3193 / 2018
Recently, several studies have shown that linear-representation-based approaches are very effective and efficient for image classification. One of these approaches is the collaborative representation (CR) method. Existing CR-based algorithms have two major problems that degrade their classification performance. The first problem arises from the limited number of available training samples: large variations between query and training samples, caused by illumination and expression changes, lead to poor classification performance. The second problem occurs when an image is partially corrupted (contiguous occlusion); as part of the image becomes corrupt, classification performance also degrades. We aim to extend the collaborative representation framework to the face recognition problem with limited training samples. Our proposed solution generates virtual samples and intra-class variations from the training data to model the variations between query and training samples effectively. For robust classification, image patches are used to compute the representation, which addresses partial occlusion and leads to more accurate classification results. The proposed method computes the representation based on local regions of the images, as opposed to CR, which computes it based on a global solution involving entire images. Furthermore, the proposed solution integrates locality structure into CR using the Euclidean distance between the query and training samples. Intuitively, if the query sample can be represented by its nearest neighbours, which lie on the same linear subspace, then the resulting representation will be more discriminative and will classify the query sample more accurately.
Hence, our proposed framework converts the limited-sample face recognition problem into a sufficient-training-samples problem using virtual samples and intra-class variations generated from the training samples, which results in improved classification accuracy, as is evident from the experimental results. Moreover, it computes the representation based on local image patches for robust classification and is expected to greatly increase classification performance for the face recognition task.
https://doi.org/10.3837/tiis.2018.07.011

The Theory and the Practice of the Reclassification

김명옥
Journal of the Korean Society for Library and Information Science / Vol. 20 / pp. 127-161 / 1991
This study concerns the reasons for revising a classification scheme and the kinds and methods of reclassification. Reclassification is to be carried out when classification numbers have been assigned incorrectly, when the scheme has been revised, or when the scheme currently in use is to be replaced by a different one. In the case of a revised edition, it is desirable to reclassify according to the new edition in order to modernize the organization of the data. However, where this is not possible because of the library's circumstances, such as the size of the collection, staff, facilities, and budget, the old edition can serve as the basis, with the new one used for reference. In that case, however, a single subject may end up with two classification numbers, so the library must prepare reference cards and shelf marks for the different class numbers. Also, because changing to another scheme on account of unsatisfactory use requires a large budget, whether to change should be considered carefully. Reclassifying to the relocated classification numbers of the revised edition takes 18 minutes 54 seconds per volume, at a cost of W 1,224.

Naive Bayes classifiers boosted by sufficient dimension reduction: applications to top-k classification

Yang, Su Hyeong;Shin, Seung Jun;Sung, Wooseok;Lee, Choon Won
Communications for Statistical Applications and Methods / Vol. 29, No. 5 / pp. 603-614 / 2022
The naive Bayes classifier is one of the most straightforward classification tools and directly estimates the class probability. However, because it relies on the assumption that the predictors are independent, which is rarely satisfied in real-world problems, its application is limited in practice. In this article, we propose employing sufficient dimension reduction (SDR) to substantially improve the performance of the naive Bayes classifier, which often deteriorates when the number of predictors is not restrictively small. This is not surprising, as SDR reduces the predictor dimension without sacrificing classification information, and predictors in the reduced space are constructed to be uncorrelated. Therefore, SDR leads the naive Bayes classifier to no longer be naive. We applied the proposed naive Bayes classifier after SDR to build a recommendation system for eyewear frames based on customers' face shapes, demonstrating its utility in the top-k classification problem.
https://doi.org/10.29220/CSAM.2022.29.5.603

A HIERARCHICAL APPROACH TO HIGH-RESOLUTION HYPERSPECTRAL IMAGE CLASSIFICATION OF LITTLE MIAMI RIVER WATERSHED FOR ENVIRONMENTAL MODELING

Heo, Joon;Troyer, Michael;Lee, Jung-Bin;Kim, Woo-Sun
Korean Society of Remote Sensing: Conference Proceedings / Korean Society of Remote Sensing, 2006, Proceedings of ISRS 2006 PORSEC Volume II / pp. 647-650 / 2006
Compact Airborne Spectrographic Imager (CASI) hyperspectral imagery was acquired over the Little Miami River Watershed (1756 square miles) in Ohio, U.S.A., making it one of the largest hyperspectral image acquisitions. To develop a 4 m resolution land cover dataset, a hierarchical approach was employed using two different classification algorithms: 'Image Object Segmentation' for level 1 and 'Spectral Angle Mapper' for level 2. This classification scheme was developed to overcome the spectral inseparability of urban and rural features and to deal with radiometric distortions due to cross-track illumination. The land cover class members were lentic, lotic, forest, corn, soybean, wheat, dry herbaceous, grass, urban barren, rural barren, urban/built, and unclassified. The final phase of processing was completed after an extensive Quality Assurance and Quality Control (QA/QC) phase. For the eleven land cover class members, the overall accuracy over a total of 902 reference points was 83.9% at 4 m resolution. The dataset is available for public research, and applications of this product will represent an improvement over more commonly used data of coarser spatial resolution, such as National Land Cover Data (NLCD).
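
The abstract above names 'Spectral Angle Mapper' as its level-2 classifier. As a rough sketch of the general technique (not the authors' processing chain), the code below measures the angle between a pixel spectrum and each reference spectrum and assigns the class with the smallest angle. The reference spectra, the four-band layout, and the 0.1 rad threshold are invented for illustration.

```python
import math


def spectral_angle(pixel, reference):
    """Angle in radians between two spectra treated as vectors."""
    dot = sum(p * r for p, r in zip(pixel, reference))
    norm = math.sqrt(sum(p * p for p in pixel)) * math.sqrt(sum(r * r for r in reference))
    # Clamp to [-1, 1] so rounding error cannot push acos out of its domain.
    cosine = max(-1.0, min(1.0, dot / norm))
    return math.acos(cosine)


def classify_sam(pixel, references, max_angle=0.1):
    """Return the label of the closest reference, or 'unclassified' if none is close."""
    label, angle = min(
        ((name, spectral_angle(pixel, ref)) for name, ref in references.items()),
        key=lambda pair: pair[1],
    )
    return label if angle <= max_angle else "unclassified"


# Hypothetical 4-band reference spectra (reflectance values are made up).
references = {
    "forest": [0.05, 0.08, 0.04, 0.50],
    "urban/built": [0.20, 0.22, 0.25, 0.30],
}

# A pixel with the forest spectral shape at twice the brightness still maps to forest.
print(classify_sam([0.10, 0.16, 0.08, 1.00], references))
```

The angle ignores overall brightness, so a uniformly brighter or darker copy of a spectrum gets the same label. That is a general property of the measure; whether it motivated the authors' choice is not stated in the abstract.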

Evidential Fusion of Multisensor Multichannel Imagery

Lee Sang-Hoon
Korean Journal of Remote Sensing / Vol. 22, No. 1 / pp. 75-85 / 2006
This paper deals with data fusion for the problem of land-cover classification using multisensor imagery. Dempster-Shafer evidence theory is employed to combine the information extracted from multiple data sets of the same site. The Dempster-Shafer approach has two important advantages for remote sensing applications: it makes it possible to consider a compound class consisting of several land-cover types, and the incompleteness of each sensor's data due to cloud cover can be modeled in the fusion process. Image classification based on Dempster-Shafer theory usually assumes that each sensor is represented by a single channel. The evidential approach to image classification, which uses a mass function obtained under the assumption of a class-independent beta distribution, is discussed for multiple sets of multichannel data acquired from different sensors. The proposed method was applied to KOMPSAT-1 EOC panchromatic imagery and LANDSAT ETM+ data acquired over the Yongin/Nuengpyung area of the Korean peninsula. The experiment showed that it is highly effective in applications where it is hard to find homogeneous regions represented by a single land-cover type in the training process.
https://doi.org/10.7780/kjrs.2006.22.1.75
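
The abstract above combines evidence from several sensors with Dempster-Shafer theory, including compound classes and sensors degraded by cloud cover. Below is a minimal sketch of Dempster's rule of combination for two mass functions. The class names and mass values are invented, and the paper's beta-distribution mass functions are not reproduced here.

```python
def combine(m1, m2):
    """Combine two mass functions (dicts mapping frozenset -> mass) with Dempster's rule."""
    combined = {}
    conflict = 0.0
    for a, mass_a in m1.items():
        for b, mass_b in m2.items():
            intersection = a & b
            if intersection:
                combined[intersection] = combined.get(intersection, 0.0) + mass_a * mass_b
            else:
                conflict += mass_a * mass_b  # mass falling on the empty set
    if conflict >= 1.0:
        raise ValueError("sources are in total conflict; the rule is undefined")
    # Renormalize so the remaining masses sum to 1.
    return {subset: mass / (1.0 - conflict) for subset, mass in combined.items()}


# Hypothetical frame of discernment with three land-cover types.
FRAME = frozenset({"forest", "water", "urban"})

# Sensor 1 supports forest and a compound class {forest, urban}.
sensor_1 = {
    frozenset({"forest"}): 0.6,
    frozenset({"forest", "urban"}): 0.3,
    FRAME: 0.1,
}
# Sensor 2 is mostly uninformative (for example under cloud), so most mass stays on the whole frame.
sensor_2 = {
    frozenset({"water"}): 0.2,
    FRAME: 0.8,
}

fused = combine(sensor_1, sensor_2)
for subset, mass in sorted(fused.items(), key=lambda item: -item[1]):
    print(sorted(subset), round(mass, 4))
```

Mass assigned to the whole frame expresses ignorance, not evidence against any class, so a cloud-affected sensor modeled this way leaves the other sensor's beliefs largely intact.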

Multi-Class SVM+MTL for the Prediction of Corporate Credit Rating with Structured Data

Ren, Gang;Hong, Taeho;Park, YoungKi
Asia Pacific Journal of Information Systems / Vol. 25, No. 3 / pp. 579-596 / 2015
Many studies have focused on the prediction of corporate credit rating using various data mining techniques. One of the most frequently used algorithms is support vector machines (SVM), and recently, novel techniques such as SVM+ and SVM+MTL have emerged. This paper intends to show the applicability of such new techniques to multi-classification and corporate credit rating and compare them with conventional SVM regarding prediction performance. We solve multi-class SVM+ and SVM+MTL problems by constructing several binary classifiers. Furthermore, to demonstrate the robustness and outstanding performance of SVM+MTL algorithm over other techniques, we utilized four typical multi-class processing methods in our experiments. The results show that SVM+MTL outperforms both conventional SVM and novel SVM+ in predicting corporate credit rating. This study contributes to the literature by showing the applicability of new techniques such as SVM+ and SVM+MTL and the outperformance of SVM+MTL over conventional techniques. Thus, this study enriches solving techniques for addressing multi-class problems such as corporate credit rating prediction.
https://doi.org/10.14329/apjis.2015.25.3.579

351 search results, processing time 0.024 s