• Title/Summary/Keyword: document frequency

Search Result 304, Processing Time 0.026 seconds

Resume Classification System using Natural Language Processing & Machine Learning Techniques

  • Irfan Ali;Nimra;Ghulam Mujtaba;Zahid Hussain Khand;Zafar Ali;Sajid Khan
    • International Journal of Computer Science & Network Security
    • /
    • v.24 no.7
    • /
    • pp.108-117
    • /
    • 2024
  • The selection and recommendation of a suitable job applicant from the pool of thousands of applications are often daunting jobs for an employer. The recommendation and selection process significantly increases the workload of the concerned department of an employer. Thus, Resume Classification System using the Natural Language Processing (NLP) and Machine Learning (ML) techniques could automate this tedious process and ease the job of an employer. Moreover, the automation of this process can significantly expedite and transparent the applicants' selection process with mere human involvement. Nevertheless, various Machine Learning approaches have been proposed to develop Resume Classification Systems. However, this study presents an automated NLP and ML-based system that classifies the Resumes according to job categories with performance guarantees. This study employs various ML algorithms and NLP techniques to measure the accuracy of Resume Classification Systems and proposes a solution with better accuracy and reliability in different settings. To demonstrate the significance of NLP & ML techniques for processing & classification of Resumes, the extracted features were tested on nine machine learning models Support Vector Machine - SVM (Linear, SGD, SVC & NuSVC), Naïve Bayes (Bernoulli, Multinomial & Gaussian), K-Nearest Neighbor (KNN) and Logistic Regression (LR). The Term-Frequency Inverse Document (TF-IDF) feature representation scheme proven suitable for Resume Classification Task. The developed models were evaluated using F-ScoreM, RecallM, PrecissionM, and overall Accuracy. The experimental results indicate that using the One-Vs-Rest-Classification strategy for this multi-class Resume Classification task, the SVM class of Machine Learning algorithms performed better on the study dataset with over 96% overall accuracy. The promising results suggest that NLP & ML techniques employed in this study could be used for the Resume Classification task.

A Study on the Characteristics of Social Worker's Duties and Type of Qualification 1st, 2nd, and 3rd Grades Social Worker (사회복지사의 직무특성과 1급과 2,3급의 직무 유형화에 관한 연구)

  • Kang, Heung-Gu
    • Korean Journal of Social Welfare
    • /
    • v.58 no.1
    • /
    • pp.209-235
    • /
    • 2006
  • This study was performed to analysis the social worker's duty in korea and suggest for typology duties of 1st grade social worker, 2nd or 3th grade social worker's qualification. 911 social workers responded to questioner. This objectives was accomplished by the measuring to frequency of duties, and the qualifying cognition of each grade social worker's duties. As a result, social worker spent more time carried out maintenance of facility, direct services, intake, management of file and official document duties than personal management, planning and financial program, evaluation and termination duties. Type of social worker's duties fined out composed of 4 type. Type I(The 1st grade social worker's duties) was belonged to 53 task elements, type IV(The 2nd or 3rd grade social worker's duties) was subjected to 11 task elements. 21 task elements performed to either 1st grade social worker or 2nd, 3rd grade social worker by type or uniqueness of social work practices. To allocation of duties by each grade social worker, fitting out of qualification system, the task elements for each grade of social worker must be prescribed by the rules. This allocation of duties by each grade social worker would be utilized to support qualified social work services, and to strengthening of their professional.

  • PDF

Analyzing Comments of YouTube Video to Measure Use and Gratification Theory Using Videos of Trot Singer, Cho Myung-sub (YouTube 동영상 의견분석을 통한 사용과 충족 이론 측정 : 트로트 가수 조명섭 동영상을 중심으로)

  • Hong, Han-Kook;Leem, Byung-hak;Kim, Sam-Moon
    • The Journal of the Korea Contents Association
    • /
    • v.20 no.9
    • /
    • pp.29-42
    • /
    • 2020
  • The purpose of this study is to present a qualitative research method for extracting and analyzing the comments written by YouTube video users. To do this, we used YouTube users' feedback to measure the hedonic, social, and utilitarian gratification of use and gratification theory(UGT) through by using analysis and topic modeling. The result of the measurement found that the first reason why users watch the trot singer, Cho Myung-sub's video in the KBS Korean broadcasting channel is to achieve hedonic gratification with high frequency. In word-document network analysis, the degree of centrality was high in words, such as 'cheering', 'thank you', 'fighting', and 'best'. Betweenness centrality is similar to the degree of centrality. Eigenvector centrality also shows that words such as 'love', 'heart', and 'thank you' are the most influential words of users' opinions. The results of the centrality analysis present that the majority of video users show their 'love', 'heart' and 'thank you' for the video. it indicates that the high words in centrality analysis is consistent with the high frequency words of hedonic and social gratification dimension of the UGT. The study has research methodological implication that shed light on the motivations for watching YouTube videos with UGT using text mining techniques that automate qualitative analysis, rather than following a survey-based structural equation model.

Outcome of Inhaler Withdrawal in Patients Receiving Triple Therapy for COPD

  • Kim, Sae Ahm;Lee, Ji-Hyun;Kim, Eun-Kyung;Kim, Tae-Hyung;Kim, Woo Jin;Lee, Jin Hwa;Yoon, Ho Il;Baek, Seunghee;Lee, Jae Seung;Oh, Yeon-Mok;Lee, Sang-Do
    • Tuberculosis and Respiratory Diseases
    • /
    • v.79 no.1
    • /
    • pp.22-30
    • /
    • 2016
  • Background: The purpose of this study was to document outcomes following withdrawal of a single inhaler (step-down) in chronic obstructive pulmonary disease (COPD) patients on triple therapy (long-acting muscarinic antagonist and a combination of long-acting ${\beta}2$-agonists and inhaled corticosteroid), which a common treatment strategy in clinical practice. Methods: Through a retrospective observational study, COPD patients receiving triple therapy over 2 years (triple group; n=109) were compared with those who had undergone triple therapy for at least 1 year and subsequently, over 9 months, initiated inhaler withdrawal (step-down group, n=39). The index time was defined as the time of withdrawal in the step-down group and as 1 year after the start of triple therapy in the triple group. Results: Lung function at the index time was superior and the previous exacerbation frequency was lower in the step-down group than in the triple group. Step-down resulted in aggravating disease symptoms, a reduced overall quality of life, decreasing exercise performance, and accelerated forced expiratory volume in 1 second ($FEV_1$) decline ($54.7{\pm}15.7mL/yr$ vs. $10.7{\pm}7.1mL/yr$, p=0.007), but there was no observed increase in the frequency of exacerbations. Conclusion: Withdrawal of a single inhaler during triple therapy in COPD patients should be conducted with caution as it may impair the exercise capacity and quality of life while accelerating $FEV_1$ decline.

Analysis of the Yearbook from the Korea Meteorological Administration using a text-mining agorithm (텍스트 마이닝 알고리즘을 이용한 기상청 기상연감 자료 분석)

  • Sun, Hyunseok;Lim, Changwon;Lee, YungSeop
    • The Korean Journal of Applied Statistics
    • /
    • v.30 no.4
    • /
    • pp.603-613
    • /
    • 2017
  • Many people have recently posted about personal interests on social media. The development of the Internet and computer technology has enabled the storage of digital forms of documents that has resulted in an explosion of the amount of textual data generated; subsequently there is an increased demand for technology to create valuable information from a large number of documents. A text mining technique is often used since text-based data is mostly composed of unstructured forms that are not suitable for the application of statistical analysis or data mining techniques. This study analyzed the Meteorological Yearbook data of the Korea Meteorological Administration (KMA) with a text mining technique. First, a term dictionary was constructed through preprocessing and a term-document matrix was generated. This term dictionary was then used to calculate the annual frequency of term, and observe the change in relative frequency for frequently appearing words. We also used regression analysis to identify terms with increasing and decreasing trends. We analyzed the trends in the Meteorological Yearbook of the KMA and analyzed trends of weather related news, weather status, and status of work trends that the KMA focused on. This study is to provide useful information that can help analyze and improve the meteorological services and reflect meteorological policy.

Application of Text-Classification Based Machine Learning in Predicting Psychiatric Diagnosis (텍스트 분류 기반 기계학습의 정신과 진단 예측 적용)

  • Pak, Doohyun;Hwang, Mingyu;Lee, Minji;Woo, Sung-Il;Hahn, Sang-Woo;Lee, Yeon Jung;Hwang, Jaeuk
    • Korean Journal of Biological Psychiatry
    • /
    • v.27 no.1
    • /
    • pp.18-26
    • /
    • 2020
  • Objectives The aim was to find effective vectorization and classification models to predict a psychiatric diagnosis from text-based medical records. Methods Electronic medical records (n = 494) of present illness were collected retrospectively in inpatient admission notes with three diagnoses of major depressive disorder, type 1 bipolar disorder, and schizophrenia. Data were split into 400 training data and 94 independent validation data. Data were vectorized by two different models such as term frequency-inverse document frequency (TF-IDF) and Doc2vec. Machine learning models for classification including stochastic gradient descent, logistic regression, support vector classification, and deep learning (DL) were applied to predict three psychiatric diagnoses. Five-fold cross-validation was used to find an effective model. Metrics such as accuracy, precision, recall, and F1-score were measured for comparison between the models. Results Five-fold cross-validation in training data showed DL model with Doc2vec was the most effective model to predict the diagnosis (accuracy = 0.87, F1-score = 0.87). However, these metrics have been reduced in independent test data set with final working DL models (accuracy = 0.79, F1-score = 0.79), while the model of logistic regression and support vector machine with Doc2vec showed slightly better performance (accuracy = 0.80, F1-score = 0.80) than the DL models with Doc2vec and others with TF-IDF. Conclusions The current results suggest that the vectorization may have more impact on the performance of classification than the machine learning model. However, data set had a number of limitations including small sample size, imbalance among the category, and its generalizability. With this regard, the need for research with multi-sites and large samples is suggested to improve the machine learning models.

A Method for Prediction of Quality Defects in Manufacturing Using Natural Language Processing and Machine Learning (자연어 처리 및 기계학습을 활용한 제조업 현장의 품질 불량 예측 방법론)

  • Roh, Jeong-Min;Kim, Yongsung
    • Journal of Platform Technology
    • /
    • v.9 no.3
    • /
    • pp.52-62
    • /
    • 2021
  • Quality control is critical at manufacturing sites and is key to predicting the risk of quality defect before manufacturing. However, the reliability of manual quality control methods is affected by human and physical limitations because manufacturing processes vary across industries. These limitations become particularly obvious in domain areas with numerous manufacturing processes, such as the manufacture of major nuclear equipment. This study proposed a novel method for predicting the risk of quality defects by using natural language processing and machine learning. In this study, production data collected over 6 years at a factory that manufactures main equipment that is installed in nuclear power plants were used. In the preprocessing stage of text data, a mapping method was applied to the word dictionary so that domain knowledge could be appropriately reflected, and a hybrid algorithm, which combined n-gram, Term Frequency-Inverse Document Frequency, and Singular Value Decomposition, was constructed for sentence vectorization. Next, in the experiment to classify the risky processes resulting in poor quality, k-fold cross-validation was applied to categorize cases from Unigram to cumulative Trigram. Furthermore, for achieving objective experimental results, Naive Bayes and Support Vector Machine were used as classification algorithms and the maximum accuracy and F1-score of 0.7685 and 0.8641, respectively, were achieved. Thus, the proposed method is effective. The performance of the proposed method were compared and with votes of field engineers, and the results revealed that the proposed method outperformed field engineers. Thus, the method can be implemented for quality control at manufacturing sites.

A Study on the Analysis of Related Information through the Establishment of the National Core Technology Network: Focused on Display Technology (국가핵심기술 관계망 구축을 통한 연관정보 분석연구: 디스플레이 기술을 중심으로)

  • Pak, Se Hee;Yoon, Won Seok;Chang, Hang Bae
    • The Journal of Society for e-Business Studies
    • /
    • v.26 no.2
    • /
    • pp.123-141
    • /
    • 2021
  • As the dependence of technology on the economic structure increases, the importance of National Core Technology is increasing. However, due to the nature of the technology itself, it is difficult to determine the scope of the technology to be protected because the scope of the relation is abstract and information disclosure is limited due to the nature of the National Core Technology. To solve this problem, we propose the most appropriate literature type and method of analysis to distinguish important technologies related to National Core Technology. We conducted a pilot test to apply TF-IDF, and LDA topic modeling, two techniques of text mining analysis for big data analysis, to four types of literature (news, papers, reports, patents) collected with National Core Technology keywords in the field of Display industry. As a result, applying LDA theme modeling to patent data are highly relevant to National Core Technology. Important technologies related to the front and rear industries of displays, including OLEDs and microLEDs, were identified, and the results were visualized as networks to clarify the scope of important technologies associated with National Core Technology. Throughout this study, we have clarified the ambiguity of the scope of association of technologies and overcome the limited information disclosure characteristics of national core technologies.

Methodological Status and Improvement of Additional Evaluation of Health Impact Items in Environmental Impact Assessment (환경영향평가서 내 건강영향 항목 추가·평가의 방법론적 현황과 개선)

  • Ha, Jongsik
    • Journal of Environmental Impact Assessment
    • /
    • v.29 no.6
    • /
    • pp.453-466
    • /
    • 2020
  • The addition and evaluation of health impact items in Environmental Impact Assessment document are written in hygiene and public health items only for specific development projects and are being reviewed. However, after the publication of the evaluation manual on the addition and evaluation of health impact items in 2011, there is a demand for continuous methodology and improvement plans despite partial improvement. Therefore, in order to propose a methodological improvement of the evaluation manual, this technical paper identified detailed improvement requirements based on the consultation opinions on hygiene and public health items, and investigated and suggested ways to solve this problem by reviewing the contents of the research so far. As for the improvement requirements, the contents related to mitigation plan, post management, effect prediction, assessment, and present-condition investigation were presented in Environmental Impact Assessment documents for the entire development project at a frequency of 93%, 85%, 80%, 74%, and 67%, respectively. Particularly, the detailed improvement requirements related to mitigation plan consisted of an establishment direction and a management of development project. Considering the current evaluation manual and the frequency of improvement requirements, this paper proposed concrete methods or improvement plans for major methodologies for each classification of hygiene and public health items. Furthermore, a comprehensive evaluation methodology related to whether a project is implemented was proposed, which is not provided in the current assessment manual.

Analysis of major issues in the field of Maritime Autonomous Surface Ships using text mining: focusing on S.Korea news data (텍스트 마이닝을 활용한 자율운항선박 분야 주요 이슈 분석 : 국내 뉴스 데이터를 중심으로)

  • Hyeyeong Lee;Jin Sick Kim;Byung Soo Gu;Moon Ju Nam;Kook Jin Jang;Sung Won Han;Joo Yeoun Lee;Myoung Sug Chung
    • Journal of the Korean Society of Systems Engineering
    • /
    • v.20 no.spc1
    • /
    • pp.12-29
    • /
    • 2024
  • The purpose of this study is to identify the social issues discussed in Korea regarding Maritime Autonomous Surface Ships (MASS), the most advanced ICT field in the shipbuilding industry, and to suggest policy implications. In recent years, it has become important to reflect social issues of public interest in the policymaking process. For this reason, an increasing number of studies use media data and social media to identify public opinion. In this study, we collected 2,843 domestic media articles related to MASS from 2017 to 2022, when MASS was officially discussed at the International Maritime Organization, and analyzed them using text mining techniques. Through term frequency-inverse document frequency (TF-IDF) analysis, major keywords such as 'shipbuilding,' 'shipping,' 'US,' and 'HD Hyundai' were derived. For LDA topic modeling, we selected eight topics with the highest coherence score (-2.2) and analyzed the main news for each topic. According to the combined analysis of five years, the topics '1. Technology integration of the shipbuilding industry' and '3. Shipping industry in the post-COVID-19 era' received the most media attention, each accounting for 16%. Conversely, the topic '5. MASS pilotage areas' received the least media attention, accounting for 8 percent. Based on the results of the study, the implications for policy, society, and international security are as follows. First, from a policy perspective, the government should consider the current situation of each industry sector and introduce MASS in stages and carefully, as they will affect the shipbuilding, port, and shipping industries, and a radical introduction may cause various adverse effects. Second, from a social perspective, while the positive aspects of MASS are often reported, there are also negative issues such as cybersecurity issues and the loss of seafarer jobs, which require institutional development and strategic commercialization timing. Third, from a security perspective, MASS are expected to change the paradigm of future maritime warfare, and South Korea is promoting the construction of a maritime unmanned system-based power, but it emphasizes the need for a clear plan and military leadership to secure and develop the technology. This study has academic and policy implications by shedding light on the multidimensional political and social issues of MASS through news data analysis, and suggesting implications from national, regional, strategic, and security perspectives beyond legal and institutional discussions.