• Title/Abstract/Keyword: machine learning techniques

Search results: 1,073 items

텍스트 마이닝과 기계 학습을 이용한 국내 가짜뉴스 예측 (Fake News Detection for Korean News Using Text Mining and Machine Learning Techniques)

  • 윤태욱;안현철
    • Journal of Information Technology Applications and Management / Vol. 25, No. 1 / pp.19-32 / 2018
  • Fake news is defined as news articles that are intentionally and verifiably false and could mislead readers. The spread of fake news may provoke anxiety, chaos, fear, or irrational decisions among the public. Thus, detecting fake news and preventing its spread has become a very important issue in our society. However, due to the huge amount of fake news produced every day, it is almost impossible for humans to identify all of it. In this context, researchers have tried to develop automated fake news detection methods using artificial intelligence techniques over the past years. Unfortunately, however, no prior study has proposed an automated fake news detection method for Korean news. In this study, we aim to detect Korean fake news using text mining and machine learning techniques. Our proposed method consists of two steps. In the first step, the news contents to be analyzed are converted to quantified values using various text mining techniques (topic modeling, TF-IDF, and so on). In the second step, classifiers are trained using the values produced in step 1. As the classifiers, machine learning techniques such as multiple discriminant analysis, case-based reasoning, artificial neural networks, and support vector machines can be applied. To validate the effectiveness of the proposed method, we collected 200 Korean news articles from Seoul National University's FactCheck (http://factcheck.snu.ac.kr), which provides detailed analysis reports from about 20 media outlets and links to source documents for each case. Using this dataset, we identify which text features are important as well as which classifiers are effective in detecting Korean fake news.
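
A minimal sketch of the two-step pipeline the abstract describes, under stated assumptions: step 1 turns articles into TF-IDF vectors, step 2 trains an SVM on them. The placeholder corpus, the `max_features` setting, and the linear-kernel SVM are illustrative choices only, not the study's configuration; real Korean text would normally also pass through a morphological analyzer before vectorization.

```python
# Sketch of the two-step pipeline: (1) text -> TF-IDF vectors, (2) train a classifier.
# The tiny corpus and labels below are placeholders, not the FactCheck dataset.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import classification_report

texts = [
    "celebrity cure claim with no source",      # placeholder fake articles
    "miracle device verified by nobody",
    "anonymous rumor about election fraud",
    "government releases official statistics",  # placeholder real articles
    "court ruling reported with documents",
    "ministry confirms policy with evidence",
]
labels = [1, 1, 1, 0, 0, 0]                     # 1 = fake, 0 = real

# Step 1: convert the news contents into quantified values (TF-IDF features).
vectorizer = TfidfVectorizer(max_features=5000)
X = vectorizer.fit_transform(texts)

# Step 2: train a classifier on those values (an SVM here; the paper also
# considers discriminant analysis, case-based reasoning, and neural networks).
X_tr, X_te, y_tr, y_te = train_test_split(X, labels, test_size=1/3, stratify=labels, random_state=0)
clf = SVC(kernel="linear")
clf.fit(X_tr, y_tr)
print(classification_report(y_te, clf.predict(X_te)))
```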

한글 저자명 중의성 해소를 위한 기계학습기법의 적용 (Application of Machine Learning Techniques for Resolving Korean Author Names)

  • 강인수
    • 정보관리학회지 / Vol. 25, No. 3 / pp.27-39 / 2008
  • The fact that different real-world people can share the same personal name raises the problem of identifying, on the Internet, which entity a personal name refers to. When this problem is restricted to author-name entities in scholarly information, it is called author resolution (author name disambiguation). Author resolution consists of a step that computes the similarity between the author-name entities to be resolved, i.e., the author similarity, and a subsequent step that clusters those author-name entities. The author similarity is computed from the feature similarities of author-resolution features such as coauthors, paper titles, and publication venue information, and both supervised and unsupervised methods have been used for this purpose. Supervised methods, which use author-resolved training samples, have the advantage over unsupervised methods of automatically learning an optimal author similarity function that combines diverse author-resolution features. However, previous supervised studies have tried only a few machine learning techniques such as SVM and MEM. This paper compares the performance, errors, and efficiency of various machine learning techniques for author resolution, and provides a comparative analysis of several techniques for feature-value extraction and feature-similarity computation for the coauthor and paper-title features.
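
A condensed sketch of the two-stage process the abstract outlines, combining coauthor and title feature similarities into an author similarity and then clustering. The feature weights (0.6/0.4), the Jaccard measure for titles, the toy records, and the clustering cut threshold are assumptions for illustration, not the paper's settings.

```python
# (1) Compute a pairwise author similarity from feature similarities, (2) cluster.
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform

# Each record is one author-name occurrence: its coauthor set and title tokens.
records = [
    {"coauthors": {"Kim", "Lee"},  "title": {"author", "name", "disambiguation"}},
    {"coauthors": {"Kim", "Park"}, "title": {"name", "resolution", "clustering"}},
    {"coauthors": {"Choi"},        "title": {"image", "segmentation"}},
]

def jaccard(a, b):
    return len(a & b) / len(a | b) if a | b else 0.0

n = len(records)
sim = np.zeros((n, n))
for i in range(n):
    for j in range(n):
        # Author similarity = weighted sum of feature similarities (weights assumed).
        sim[i, j] = 0.6 * jaccard(records[i]["coauthors"], records[j]["coauthors"]) \
                  + 0.4 * jaccard(records[i]["title"], records[j]["title"])

# Cluster on distance = 1 - similarity; the cut threshold (0.7) is an assumption.
dist = squareform(1.0 - sim, checks=False)
labels = fcluster(linkage(dist, method="average"), t=0.7, criterion="distance")
print(labels)   # records sharing a label are treated as the same real author
```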

스마트폰 과의존 판별을 위한 기계 학습 기법의 응용 (Application of Machine Learning Techniques for Problematic Smartphone Use)

  • 김우성;한준희
    • 아태비즈니스연구 / Vol. 13, No. 3 / pp.293-309 / 2022
  • Purpose - The purpose of this study is to explore the possibility of predicting the degree of smartphone overdependence based on mobile phone usage patterns. Design/methodology/approach - In this study, a survey conducted by the Korea Internet and Security Agency (KISA) called the "problematic smartphone use survey" was analyzed. The survey consists of 180 questions, and data were collected from 29,712 participants. Based on the data on smartphone usage patterns obtained through the questionnaire, the smartphone addiction level was predicted using machine learning techniques. k-NN, gradient boosting, XGBoost, CatBoost, AdaBoost, and random forest algorithms were employed. Findings - First, while various factors together influence the smartphone overdependence level, the results show that all machine learning techniques perform well in predicting it. In particular, we focus on the features that can be obtained from smartphone log data (without psychological factors), which means that our results can serve as a basis for diagnostic programs that detect problematic smartphone use. Second, the results show that information on users' age, marital status, and smartphone usage patterns can be used as predictors to determine whether users are addicted to smartphones. Other demographic characteristics such as sex or region did not appear to significantly affect smartphone overdependence levels. Research implications or Originality - Although some studies predict the smartphone overdependence level using machine learning techniques, they only present algorithm performance based on survey data. In this study, based on the information gain measure, the questions that have more influence on the smartphone overdependence level are presented, and the performance of the algorithms according to those questions is compared. The results show that the smartphone overdependence level can be predicted with less information if the questions about smartphone use are chosen appropriately.
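
A rough sketch of the kind of comparison described above: rank features by information gain (mutual information) and compare several classifiers by cross-validated accuracy on the top-ranked features. The synthetic data stands in for the KISA survey, and only scikit-learn estimators are shown; XGBoost and CatBoost (also used in the paper) would slot in the same way via their own packages.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import mutual_info_classif
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier, AdaBoostClassifier

# Synthetic stand-in for the survey responses and overdependence labels.
X, y = make_classification(n_samples=500, n_features=20, n_informative=6, random_state=0)

# Rank features by information gain with respect to the overdependence label.
gain = mutual_info_classif(X, y, random_state=0)
top = np.argsort(gain)[::-1][:5]
print("most informative feature indices:", top)

# Compare classifiers trained on the top-ranked features only.
models = {
    "k-NN": KNeighborsClassifier(),
    "gradient boosting": GradientBoostingClassifier(random_state=0),
    "random forest": RandomForestClassifier(random_state=0),
    "AdaBoost": AdaBoostClassifier(random_state=0),
}
for name, model in models.items():
    acc = cross_val_score(model, X[:, top], y, cv=5).mean()
    print(f"{name}: mean accuracy = {acc:.3f}")
```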

Deep Learning-based Delinquent Taxpayer Prediction: A Scientific Administrative Approach

  • YongHyun Lee;Eunchan Kim
    • KSII Transactions on Internet and Information Systems (TIIS) / Vol. 18, No. 1 / pp.30-45 / 2024
  • This study introduces an effective method for predicting individual local tax delinquencies using prevalent machine learning and deep learning algorithms. The evaluation of credit risk holds great significance in the financial realm, impacting both companies and individuals. While credit risk prediction has been explored using statistical and machine learning techniques, their application to tax arrears prediction remains underexplored. We forecast individual local tax defaults in the Republic of Korea using machine and deep learning algorithms, including convolutional neural networks (CNN), long short-term memory (LSTM), and sequence-to-sequence (seq2seq) models. Our model incorporates diverse credit and public information such as loan history, delinquency records, credit card usage, and public taxation data, offering richer insights than prior studies. The results highlight the superior predictive accuracy of the CNN model. Anticipating local tax arrears more effectively could lead to more efficient allocation of administrative resources. By leveraging advanced machine learning, this research offers a promising avenue for refining tax collection strategies and resource management.
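
Since the abstract reports the CNN as the best-performing model, a compact sketch of a 1D-CNN delinquency classifier is shown below. The input shape (12 monthly snapshots of 8 credit/public features per taxpayer), the layer sizes, and the synthetic data are assumptions for illustration only; they do not reflect the authors' architecture or dataset.

```python
import numpy as np
import tensorflow as tf

# Synthetic stand-in: 12 monthly snapshots of 8 credit/public features per taxpayer.
X = np.random.rand(256, 12, 8).astype("float32")
y = np.random.randint(0, 2, size=(256,)).astype("float32")   # 1 = delinquent

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(12, 8)),
    tf.keras.layers.Conv1D(32, kernel_size=3, activation="relu"),  # local temporal patterns
    tf.keras.layers.GlobalMaxPooling1D(),
    tf.keras.layers.Dense(16, activation="relu"),
    tf.keras.layers.Dense(1, activation="sigmoid"),                # P(delinquent)
])
model.compile(optimizer="adam", loss="binary_crossentropy",
              metrics=[tf.keras.metrics.AUC()])
model.fit(X, y, epochs=3, batch_size=32, validation_split=0.2, verbose=0)
print(model.evaluate(X, y, verbose=0))   # [loss, AUC] on the synthetic data
```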

통계분석 기법과 머신러닝 기법의 비교분석을 통한 건물의 지진취약도 공간분석 (A Spatial Analysis of Seismic Vulnerability of Buildings Using Statistical and Machine Learning Techniques Comparative Analysis)

  • 김성훈;김상빈;김대현
    • 산업융합연구 / Vol. 21, No. 1 / pp.159-165 / 2023
  • While earthquakes have recently been occurring more frequently, Korea's earthquake response system remains weak; against this backdrop, the purpose of this study is to comparatively analyze the seismic vulnerability of buildings through spatial analysis using statistical and machine learning techniques. With the statistical technique, the prediction accuracy of the model developed using the optimal scaling method was about 87%. With the machine learning techniques, among the four methods analyzed, Random Forest showed the highest accuracy, 94% on the training set and 76.7% on the test set, and was therefore selected. Accordingly, the prediction accuracy was about 87% for the statistical technique and 76.7% for the machine learning technique, so the statistical technique was found to be more accurate. As a final result, out of a total of 22,296 building records analyzed, 1,627 (0.1%) buildings were assessed as more vulnerable by the statistical technique, 10,146 (49%) were assessed identically by both techniques, and the remaining 10,523 (50%) were assessed as more vulnerable by the machine learning technique. By additionally reviewing the results of an advanced machine learning technique alongside the existing statistical technique, this study is expected to help establish more reliable earthquake countermeasures in spatial-analysis decision making.
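
A minimal illustration of the machine-learning side of this comparison, assuming synthetic stand-in data in place of the building records: a random forest is trained and its training-set and test-set accuracies are reported separately, mirroring the 94% vs. 76.7% split the study describes.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Synthetic stand-in for building attributes and a binary vulnerability label.
X, y = make_classification(n_samples=3000, n_features=12, n_informative=5, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, stratify=y, random_state=0)

rf = RandomForestClassifier(n_estimators=300, random_state=0)
rf.fit(X_tr, y_tr)

print("train accuracy:", accuracy_score(y_tr, rf.predict(X_tr)))  # typically near 1.0
print("test accuracy :", accuracy_score(y_te, rf.predict(X_te)))  # the figure to compare
# The gap between the two reflects the overfitting the study reports when weighing
# the random forest (76.7% test accuracy) against the statistical model (about 87%).
```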

머신러닝 컴파일러와 모듈로 스케쥴러에 관한 연구 (A Study on Machine Learning Compiler and Modulo Scheduler)

  • 조두산
    • 한국산업융합학회 논문집 / Vol. 27, No. 1 / pp.87-95 / 2024
  • This study is on modulo scheduling algorithms for multicore processors in machine learning applications. Machine learning algorithms are designed to perform a large number of operations on vectors and matrices in order to quickly process large data streams. To support such large amounts of computation, processor architectures for applications such as artificial intelligence, neural networks, and machine learning are designed as parallel-processing architectures such as multicore. To effectively utilize these multicore hardware resources, various compiler techniques are being used and studied. Among these compiler techniques, this study analyzes the modulo scheduler, which is especially important for each core's computation pipeline. The paper examines and compares the iterative modulo scheduler and the swing modulo scheduler, which are the most widely used and studied. As a result, both schedulers provided similar performance, and when register pressure was measured as an indicator, the swing modulo scheduler was confirmed to perform slightly better. In addition, a technique that splits recurrence edges is proposed to improve the minimum initiation interval of the modulo schedulers.
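
Because the abstract turns on the minimum initiation interval (MII), a worked toy computation of the standard bound MII = max(ResMII, RecMII) may help. The operation counts, unit counts, and recurrence cycles below are made up for illustration and do not come from the paper.

```python
from math import ceil

# Resource-constrained bound: for each functional-unit type, ops needed / units available.
ops_per_resource = {"ALU": 6, "MUL": 2, "MEM": 3}    # ops of each type in the loop body
units_per_resource = {"ALU": 2, "MUL": 1, "MEM": 1}  # functional units of each type
res_mii = max(ceil(ops_per_resource[r] / units_per_resource[r]) for r in ops_per_resource)

# Recurrence-constrained bound: for each dependence cycle, ceil(total latency / total distance).
# The recurrence-edge splitting proposed in the paper targets this bound.
cycles = [
    {"latency": 5, "distance": 1},   # e.g., an accumulation carried over one iteration
    {"latency": 7, "distance": 2},
]
rec_mii = max(ceil(c["latency"] / c["distance"]) for c in cycles)

print("ResMII =", res_mii, "RecMII =", rec_mii, "MII =", max(res_mii, rec_mii))
```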

A Prediction Triage System for Emergency Department During Hajj Period using Machine Learning Models

  • Huda N. Alhazmi
    • International Journal of Computer Science & Network Security / Vol. 24, No. 7 / pp.11-23 / 2024
  • Triage is the practice of accurately prioritizing patients in the emergency department (ED) based on their medical condition in order to provide them with proper treatment. Variation in triage assessment among medical staff can cause mis-triage, which affects patients negatively. Developing an ED triage system based on machine learning (ML) techniques can lead to accurate and efficient triage outcomes. This study aims to develop a triage system using machine learning techniques to predict ED triage levels from patients' information. We conducted a retrospective study using Security Forces Hospital ED data from 2021 through 2023 during the Hajj period in Saudi Arabia. Using demographics, vital signs, and chief complaints as predictors, two machine learning models were investigated, namely a gradient boosted decision tree (XGB) and a deep neural network (DNN). The models were trained to predict ED triage levels, and their predictive performance was evaluated using the area under the receiver operating characteristic curve (AUC) and the confusion matrix. A total of 11,584 ED visits were collected and used in this study. The XGB and DNN models exhibited high predictive performance, with AUC-ROC scores of 0.85 and 0.82, respectively. Compared to the traditional approach, our proposed system demonstrated better performance and can be implemented in real-world clinical settings. Utilizing ML applications can support triage decision-making, clinical care, and resource utilization.
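
A small sketch of the gradient-boosted branch of this setup, assuming synthetic tabular data in place of the hospital records: a multi-class XGBoost model predicts a five-level triage label and is scored with a one-vs-rest AUC. The hyperparameters are illustrative, not the study's.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score
from xgboost import XGBClassifier

# Synthetic stand-in for demographics, vital signs, and encoded chief complaints.
X, y = make_classification(n_samples=2000, n_features=15, n_informative=8,
                           n_classes=5, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, stratify=y, random_state=0)

model = XGBClassifier(n_estimators=200, max_depth=4, learning_rate=0.1,
                      objective="multi:softprob", eval_metric="mlogloss")
model.fit(X_tr, y_tr)

proba = model.predict_proba(X_te)
print("macro one-vs-rest AUC:", roc_auc_score(y_te, proba, multi_class="ovr"))
```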

Hand-crafted 특징 및 머신 러닝 기반의 은하 이미지 분류 기법 개발 (Development of Galaxy Image Classification Based on Hand-crafted Features and Machine Learning)

  • 오윤주;정희철
    • 대한임베디드공학회논문지 / Vol. 16, No. 1 / pp.17-27 / 2021
  • In this paper, we develop a galaxy image classification method based on hand-crafted features and machine learning techniques. Additionally, we provide an empirical analysis to reveal which combination of techniques is effective for galaxy image classification. To achieve this, we developed a framework consisting of four modules: preprocessing, feature extraction, feature post-processing, and classification. We found that the best-performing configuration for galaxy image classification uses a median filter, ORB vector features, and a voting classifier based on an RBF SVM, a random forest, and logistic regression. The final method is efficient, so we believe it is applicable to embedded environments.
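
A simplified sketch of that best-performing configuration, assuming OpenCV and scikit-learn: images are median-filtered, ORB descriptors are pooled into one vector per image by averaging (a stand-in for the paper's actual post-processing, which may differ), and a soft-voting ensemble of an RBF SVM, a random forest, and logistic regression is trained. The random images and labels are placeholders, not galaxy data.

```python
import numpy as np
import cv2
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC

orb = cv2.ORB_create(nfeatures=200)

def image_to_feature(img):
    img = cv2.medianBlur(img, 3)                 # preprocessing: median filter
    _, desc = orb.detectAndCompute(img, None)    # feature extraction: ORB descriptors
    if desc is None:                             # no keypoints found in this image
        return np.zeros(32, dtype=np.float32)
    return desc.mean(axis=0).astype(np.float32)  # post-processing: mean pooling (simplified)

# Placeholder "galaxy" images (random noise) and binary labels.
rng = np.random.default_rng(0)
images = [rng.integers(0, 256, (64, 64), dtype=np.uint8) for _ in range(40)]
labels = rng.integers(0, 2, 40)

X = np.stack([image_to_feature(im) for im in images])
clf = VotingClassifier(
    estimators=[("svm", SVC(kernel="rbf", probability=True)),
                ("rf", RandomForestClassifier(random_state=0)),
                ("lr", LogisticRegression(max_iter=1000))],
    voting="soft",
)
clf.fit(X, labels)
print(clf.predict(X[:5]))
```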

머신러닝 알고리즘 기반의 의료비 예측 모델 개발 (Development of Medical Cost Prediction Model Based on the Machine Learning Algorithm)

  • Han Bi KIM;Dong Hoon HAN
    • Journal of Korea Artificial Intelligence Association / Vol. 1, No. 1 / pp.11-16 / 2023
  • Accurate hospital case modeling and prediction are crucial for efficient healthcare. In this study, we demonstrate the implementation of regression analysis methods in machine learning systems utilizing mathematical statistics and machine learning techniques. The developed machine learning models include Bayesian linear, artificial neural network, decision tree, decision forest, and linear regression models. Through the application of these algorithms, the corresponding regression models were constructed and analyzed. The results suggest the potential of leveraging machine learning systems for medical research. The experiment aimed to create an Azure Machine Learning Studio tool for the speedy evaluation of multiple regression models. The tool facilitates the comparison of the five types of regression models in a unified experiment and presents assessment results with performance metrics. The evaluation of the regression models highlighted the advantages of boosted decision tree regression and decision forest regression in hospital case prediction. These findings could lay the groundwork for the deliberate development of new directions in medical data processing and decision making. Furthermore, potential avenues for future research may include exploring methods such as clustering, classification, and anomaly detection in healthcare systems.
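
The comparison itself is run in Azure Machine Learning Studio; the sketch below reproduces the idea with scikit-learn analogues (BayesianRidge for Bayesian linear, MLPRegressor for the neural network, RandomForestRegressor for the decision forest, GradientBoostingRegressor for the boosted decision tree). The mapping and the synthetic data are assumptions for illustration, not the paper's setup.

```python
from sklearn.datasets import make_regression
from sklearn.model_selection import cross_val_score
from sklearn.linear_model import LinearRegression, BayesianRidge
from sklearn.neural_network import MLPRegressor
from sklearn.tree import DecisionTreeRegressor
from sklearn.ensemble import RandomForestRegressor, GradientBoostingRegressor

# Synthetic stand-in for patient features and medical cost targets.
X, y = make_regression(n_samples=1000, n_features=10, noise=10.0, random_state=0)

models = {
    "linear regression": LinearRegression(),
    "Bayesian linear": BayesianRidge(),
    "neural network": MLPRegressor(hidden_layer_sizes=(64,), max_iter=2000, random_state=0),
    "decision tree": DecisionTreeRegressor(random_state=0),
    "decision forest (random forest)": RandomForestRegressor(random_state=0),
    "boosted decision tree": GradientBoostingRegressor(random_state=0),
}
for name, model in models.items():
    r2 = cross_val_score(model, X, y, cv=5, scoring="r2").mean()
    print(f"{name}: mean R^2 = {r2:.3f}")
```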

Implementing a Branch-and-bound Algorithm for Transductive Support Vector Machines

  • Park, Chan-Kyoo
    • Management Science and Financial Engineering / Vol. 16, No. 1 / pp.81-117 / 2010
  • Semi-supervised learning incorporates unlabeled examples, whose labels are unknown, as well as labeled examples into the learning process. Although the transductive support vector machine (TSVM), one of the semi-supervised learning models, was proposed about a decade ago, its application to large-scale data has remained limited due to its high computational complexity. Our previous research addressed this limitation by introducing a branch-and-bound algorithm for finding an optimal solution to the TSVM. In this paper, we propose three new techniques to enhance the performance of the branch-and-bound algorithm. The first tightens the min-cut bound, one of the two bounding strategies. The second exploits a graph-based approximation to a support vector machine problem to avoid the most time-consuming step. The third tries to fix the labels of unlabeled examples whose labels can be obviously predicted from the labeled examples. Experimental results are presented which demonstrate that the proposed techniques can drastically reduce the number of subproblems and, eventually, the computational time.
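
For readers unfamiliar with the baseline being improved, here is a heavily simplified sketch of branch-and-bound over the labels of the unlabeled points: each node fixes a prefix of labels, its bound is the supervised SVM objective on the points fixed so far (a valid lower bound, since the remaining hinge-loss terms are nonnegative), and nodes whose bound cannot beat the incumbent are pruned. It omits the paper's min-cut bound, graph-based approximation, label fixing, and the TSVM balance constraint; everything here is an illustrative assumption, not the authors' implementation.

```python
import numpy as np
from sklearn.svm import LinearSVC

def svm_objective(X, y, C=1.0):
    """Fit a linear SVM and return its (approximate) primal objective value."""
    clf = LinearSVC(C=C, loss="hinge", dual=True, max_iter=50000)
    clf.fit(X, y)
    w, b = clf.coef_.ravel(), clf.intercept_[0]
    hinge = np.maximum(0.0, 1.0 - y * (X @ w + b)).sum()
    return 0.5 * w @ w + C * hinge

def tsvm_branch_and_bound(X_l, y_l, X_u, C=1.0):
    best = {"obj": np.inf, "labels": None}

    def recurse(assigned):
        # Bound: objective over labeled points plus the unlabeled points fixed so far.
        X = np.vstack([X_l, X_u[:len(assigned)]])
        y = np.concatenate([y_l, assigned]) if assigned else y_l
        bound = svm_objective(X, np.asarray(y), C)
        if bound >= best["obj"]:
            return                                # prune: cannot beat the incumbent
        if len(assigned) == len(X_u):             # complete assignment reached
            best["obj"], best["labels"] = bound, list(assigned)
            return
        for label in (+1, -1):                    # branch on the next unlabeled point
            recurse(assigned + [label])

    recurse([])
    return best

# Tiny toy problem: two labeled points and three unlabeled points.
X_l = np.array([[0.0, 1.0], [3.0, 1.0]])
y_l = np.array([-1, +1])
X_u = np.array([[0.2, 0.9], [2.8, 1.1], [1.5, 1.0]])
print(tsvm_branch_and_bound(X_l, y_l, X_u))
```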