• Title/Summary/Keyword: machine learning classification models

The Detection of Online Manipulated Reviews Using Machine Learning and GPT-3 (기계학습과 GPT3를 시용한 조작된 리뷰의 탐지)

  • Chernyaeva, Olga;Hong, Taeho
    • Journal of Intelligence and Information Systems / v.28 no.4 / pp.347-364 / 2022
  • Fraudulent companies or sellers strategically manipulate reviews to influence customers' purchase decisions, so the reliability of reviews has become crucial for customer decision-making. Because customers increasingly rely on online reviews for detailed information about products or services before purchasing, many researchers focus on detecting manipulated reviews. The main obstacle is the difficulty of obtaining enough manipulated reviews to train machine learning models. In addition, manipulated reviews are far fewer than non-manipulated reviews, which creates a class imbalance problem: the under-represented class can hamper a model's accuracy, so resolving the imbalance is important for building an accurate detector of manipulated reviews. We therefore propose an OpenAI-based review generation model to address the imbalance and thereby improve the accuracy of manipulated review detection. In this research, we applied the autoregressive language model GPT-3 to generate reviews based on manipulated reviews. We found that oversampling manipulated reviews with GPT-3 recovers a satisfactory portion of the performance loss and yields better classification performance (logit, decision tree, neural networks) than traditional oversampling methods such as random oversampling and SMOTE.
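For readers unfamiliar with the two baseline oversampling methods named in this abstract, the following is a minimal sketch, not the authors' implementation. It assumes the reviews have already been converted to numeric feature vectors; the toy vectors, the function names `random_oversample` and `smote_like`, and the neighbour count `k` are all hypothetical. The second function follows the core idea of SMOTE (interpolating between a minority sample and a nearby minority neighbour) rather than reproducing any particular library.

```python
import random

def random_oversample(minority, target_size, rng):
    """Duplicate randomly chosen minority samples until target_size is reached."""
    extra = [rng.choice(minority) for _ in range(target_size - len(minority))]
    return list(minority) + extra

def smote_like(minority, target_size, rng, k=2):
    """Add synthetic points on the segment between a sample and a near neighbour."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    synthetic = []
    while len(minority) + len(synthetic) < target_size:
        base = rng.choice(minority)
        # k nearest minority neighbours of the chosen sample (excluding itself)
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: sq_dist(base, p),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append(tuple(b + gap * (o - b) for b, o in zip(base, other)))
    return list(minority) + synthetic

rng = random.Random(0)
# Toy feature vectors standing in for the minority (manipulated) class.
minority = [(1.0, 2.0), (1.5, 1.8), (2.0, 2.2)]
ros = random_oversample(minority, 6, rng)
smote = smote_like(minority, 6, rng)
```

Random oversampling only repeats existing samples, while the interpolation variant creates new points inside the region spanned by the minority class. The abstract's GPT-3 approach instead generates new review text, which neither sketch attempts.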

Comparison of Machine Learning Models to Predict the Occurrence of Ground Subsidence According to the Characteristics of Sewer (하수관로 특성에 따른 지반함몰 발생 예측을 위한 기계학습 모델 비교)

  • Lee, Sungyeol;Kim, Jinyoung;Kang, Jaemo;Baek, Wonjin
    • Journal of the Korean GEO-environmental Society / v.23 no.4 / pp.5-10 / 2022
  • Ground subsidence has recently been occurring repeatedly in downtown areas, threatening the safety of citizens. Various underground facilities, such as water and sewage pipelines and communication pipelines, are buried under roads. Ground subsidence is reported to be caused by the deterioration of these facilities and by reckless underground development, and the aging of sewage pipelines is known to be the biggest cause. Previous studies selected several representative sewage pipeline factors and predicted the risk of ground subsidence through statistical analysis. In this study, a data set was constructed from the sewage pipe characteristics of OO city and the locations of ground subsidence. The goal was to present a machine learning classification model for the occurrence of ground subsidence according to the characteristics of sewage pipes. In addition, the importance of each sewage pipe characteristic affecting ground subsidence was calculated.

White striping degree assessment using computer vision system and consumer acceptance test

  • Kato, Talita;Mastelini, Saulo Martiello;Campos, Gabriel Fillipe Centini;Barbon, Ana Paula Ayub da Costa;Prudencio, Sandra Helena;Shimokomaki, Massami;Soares, Adriana Lourenco;Barbon, Sylvio Jr.
    • Asian-Australasian Journal of Animal Sciences / v.32 no.7 / pp.1015-1026 / 2019
  • Objective: The objective of this study was to evaluate three different degrees of white striping (WS), addressing their automatic assessment and consumer acceptance. The WS classification was performed with a computer vision system (CVS), exploring different machine learning (ML) algorithms and the most important image features. Moreover, it was verified by consumer acceptance and purchase intent. Methods: The samples for image analysis were classified by trained specialists according to severity degrees regarding visual and firmness aspects. Images of the samples were obtained with a digital camera, and 25 features were extracted from these images. ML algorithms were applied to induce a model capable of classifying the samples into three severity degrees. In addition, two sensory analyses were performed: 75 properly grilled samples were used for the first sensory test, and 9 photos for the second. All tests were performed using a 10-cm hybrid hedonic scale (acceptance test) and a 5-point scale (purchase intention). Results: The information gain metric ranked 13 attributes. However, a single type of image feature was not enough to describe the phenomenon. The classification models support vector machine, fuzzy-W, and random forest showed the best results with similar general accuracy (86.4%). The worst performance was obtained by the multilayer perceptron (70.9%), with a high error rate in normal (NORM) sample predictions. The sensory analysis of acceptance verified that WS myopathy negatively affects the texture of the broiler breast fillets when grilled and the appearance attribute of the raw samples, which influenced the purchase intention scores of raw samples. Conclusion: The proposed system has proved to be adequate (fast and accurate) for the classification of WS samples. The sensory analysis of acceptance showed that WS myopathy negatively affects the tenderness of the broiler breast fillets when grilled, while the appearance attribute of the raw samples eventually influenced purchase intentions.
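The information gain ranking mentioned in the Results can be illustrated with a small sketch. It assumes discrete feature values (continuous image features would first need binning), and the feature names, the values, and the class labels other than NORM are hypothetical placeholders rather than data from the study.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * log2(c / total) for c in Counter(labels).values())

def information_gain(feature_values, labels):
    """Entropy of the labels minus the weighted entropy after splitting on the feature."""
    total = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - remainder

# Hypothetical severity labels and discretised image features.
labels = ["NORM", "NORM", "MOD", "MOD", "SEV", "SEV"]
features = {
    "stripe_area": ["low", "low", "mid", "mid", "high", "high"],
    "brightness": ["a", "b", "a", "b", "a", "b"],
}
gains = {name: information_gain(values, labels) for name, values in features.items()}
ranking = sorted(gains, key=gains.get, reverse=True)
```

In this toy data `stripe_area` separates the three classes perfectly, so it receives the full label entropy as its gain, while `brightness` carries no information about the class and ranks last.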

A study on the Improvement of the Food Waste Discharge System through the Classification on Foreign Substances (이물질 구별을 통한 음식물쓰레기 배출시스템 개선에 관한 연구)

  • Kim, Yongil;Kim, Seungcheon
    • The Journal of the Institute of Internet, Broadcasting and Communication / v.22 no.6 / pp.51-56 / 2022
  • With industrialization, the amount of food waste is rapidly increasing. The government is aware of the seriousness of the problem and is working in various ways to reduce it. As part of that effort, a volume-based food waste system was introduced; despite some trial and error early on, it has shown a reduction effect of 20 to 30%, which suggests that the system is taking hold. However, foreign substances cause losses when the waste is recycled as a resource, from the first through the second collection stage. To address this problem fundamentally, this study applies artificial intelligence to classify foreign substances. Because the nature of food waste limits how many images can be obtained, we compare several CNN-based models that treat foreign substances as abnormal data: the models are trained on various types of foreign substances, and those with high accuracy are selected. By applying the selected model, we intend to prepare improvement measures for maintenance, such as assigning manpower to protect equipment and sort out foreign substances.

An Empirical Analysis of Boosting of Neural Networks for Bankruptcy Prediction (부스팅 인공신경망학습의 기업부실예측 성과비교)

  • Kim, Myoung-Jong;Kang, Dae-Ki
    • Journal of the Korea Institute of Information and Communication Engineering / v.14 no.1 / pp.63-69 / 2010
  • Ensemble learning is one of the most widely used methods for improving the performance of classification and prediction models. Two popular ensemble methods, Bagging and Boosting, have been applied with great success to various machine learning problems, mostly using decision trees as base classifiers. This paper empirically compares boosted neural networks with traditional neural networks on bankruptcy prediction tasks. Experimental results on Korean firms indicate that the boosted neural networks outperformed the traditional neural networks.
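The boosting idea in this abstract can be sketched with AdaBoost, one common boosting algorithm. The abstract does not say which boosting variant was used, and this sketch deliberately swaps the paper's neural-network base learners for one-feature decision stumps to stay short. The toy data and all function names are hypothetical.

```python
import math

def train_stump(X, y, w):
    """Return (weighted error, feature, threshold, polarity) of the best stump."""
    best = None
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X}):
            for polarity in (1, -1):
                preds = [polarity if row[j] <= t else -polarity for row in X]
                err = sum(wi for wi, p, yi in zip(w, preds, y) if p != yi)
                if best is None or err < best[0]:
                    best = (err, j, t, polarity)
    return best

def stump_predict(stump, row):
    _, j, t, polarity = stump
    return polarity if row[j] <= t else -polarity

def adaboost(X, y, rounds):
    """Fit up to `rounds` stumps, reweighting misclassified samples each round."""
    n = len(X)
    w = [1.0 / n] * n
    ensemble = []
    for _ in range(rounds):
        stump = train_stump(X, y, w)
        err = max(stump[0], 1e-10)  # guard against log of infinity
        if err >= 0.5:
            break  # the base learner is no better than chance
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, stump))
        # Increase the weight of misclassified samples, decrease the rest.
        w = [wi * math.exp(-alpha * yi * stump_predict(stump, row))
             for wi, yi, row in zip(w, y, X)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble

def predict(ensemble, row):
    score = sum(alpha * stump_predict(s, row) for alpha, s in ensemble)
    return 1 if score >= 0 else -1

# Toy one-dimensional data that no single stump can separate.
X = [[1.0], [2.0], [3.0], [4.0], [5.0], [6.0]]
y = [1, 1, -1, -1, 1, 1]
model = adaboost(X, y, rounds=3)
train_preds = [predict(model, row) for row in X]
```

No single threshold classifies this toy set correctly, but the weighted vote of three stumps does, which is the effect boosting relies on regardless of the base learner.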

Subset selection in multiple linear regression: An improved Tabu search

  • Bae, Jaegug;Kim, Jung-Tae;Kim, Jae-Hwan
    • Journal of Advanced Marine Engineering and Technology / v.40 no.2 / pp.138-145 / 2016
  • This paper proposes an improved tabu search method for subset selection in multiple linear regression models. Variable selection is a vital combinatorial optimization problem in multivariate statistics. Selecting the optimal subset of variables is necessary to construct a reliable multiple linear regression model. Its applications range widely, from machine learning, time-series prediction, and multi-class classification to noise detection. Because this problem is NP-complete, finding the optimal solution becomes more difficult as the number of variables increases. Two typical metaheuristic methods have been developed to tackle the problem: the tabu search algorithm and a hybrid genetic and simulated annealing algorithm. However, both have shortcomings: the tabu search method requires a large amount of computing time, and the hybrid algorithm produces a less accurate solution. To overcome these shortcomings, we propose an improved tabu search algorithm that reduces the number of neighborhood moves and adopts an effective move search strategy. To evaluate the performance of the proposed method, comparative studies are performed on small data sets from the literature and on large simulated data sets. Computational results show that the proposed method outperforms the two metaheuristic methods in terms of computing time and solution quality.
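To make the setting concrete, here is a minimal sketch of a plain tabu search for choosing a fixed-size subset of regressors. It is not the improved algorithm proposed in the paper. Everything in it is an assumption for illustration: the fixed subset size `k`, the swap neighbourhood, the tabu tenure, the residual sum of squares as the cost, and the toy data, in which `y` depends only on columns 1 and 3.

```python
def rss(X, y, subset):
    """Residual sum of squares of a least-squares fit on the chosen columns plus an intercept."""
    cols = [[1.0] * len(y)] + [[row[j] for row in X] for j in subset]
    p = len(cols)
    # Augmented normal equations [C^T C | C^T y].
    A = []
    for a in range(p):
        row = [sum(u * v for u, v in zip(cols[a], cols[b])) for b in range(p)]
        row.append(sum(u * v for u, v in zip(cols[a], y)))
        A.append(row)
    # Gauss-Jordan elimination with partial pivoting (assumes no collinear columns).
    for i in range(p):
        pivot = max(range(i, p), key=lambda r: abs(A[r][i]))
        A[i], A[pivot] = A[pivot], A[i]
        for r in range(p):
            if r != i:
                f = A[r][i] / A[i][i]
                A[r] = [u - f * v for u, v in zip(A[r], A[i])]
    beta = [A[i][p] / A[i][i] for i in range(p)]
    fitted = [sum(b * col[k] for b, col in zip(beta, cols)) for k in range(len(y))]
    return sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))

def tabu_search(X, y, k, iterations=10, tenure=2):
    """Swap one variable in and one out per step; recently moved variables are tabu."""
    n_vars = len(X[0])
    current = list(range(k))
    best, best_cost = list(current), rss(X, y, current)
    tabu = {}  # variable index -> last iteration in which moving it is forbidden
    for it in range(iterations):
        candidates = []
        for out in current:
            for inn in range(n_vars):
                if inn in current:
                    continue
                trial = sorted([v for v in current if v != out] + [inn])
                cost = rss(X, y, trial)
                is_tabu = tabu.get(out, -1) >= it or tabu.get(inn, -1) >= it
                if not is_tabu or cost < best_cost:  # aspiration criterion
                    candidates.append((cost, trial, out, inn))
        if not candidates:
            break
        # Move to the best admissible neighbour, even if it is worse than the current subset.
        cost, current, out, inn = min(candidates, key=lambda c: c[0])
        tabu[out] = tabu[inn] = it + tenure
        if cost < best_cost:
            best, best_cost = list(current), cost
    return best, best_cost

# Toy data: y = 2 * x1 + 3 * x3 exactly, so columns 1 and 3 form the ideal subset.
X = [[1, 2, 0, 1], [2, 0, 1, 3], [3, 1, 4, 0], [4, 5, 2, 2], [5, 3, 3, 5], [6, 4, 1, 1]]
y = [2 * row[1] + 3 * row[3] for row in X]
best, best_cost = tabu_search(X, y, k=2)
```

Every step evaluates the full swap neighbourhood, which is the computing cost the abstract attributes to standard tabu search and which the proposed method aims to reduce.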

Non-Curriculum Recommendation Techniques Using Collaborative Filtering for C University (협업 필터링을 활용한 비교과 프로그램 추천 기법: C대학 적용사례)

  • yujung Janu;Kyungeun Yang;Wan-Sup Cho
    • The Journal of Bigdata / v.7 no.1 / pp.187-192 / 2022
  • Many schools are trying to improve students' competencies through a wide range of curricular subjects and non-curricular activities, but each student has different goals and pursues different activities to prepare for employment. It is therefore difficult to determine whether programs offered uniformly under the existing curricular and non-curricular systems actually suit individual students, so a personalized system is needed. This study proposes a method to classify, by grade level and department, the non-curricular programs that are uniformly provided to all students of Chungbuk National University. In addition, three types of collaborative filtering models are implemented using the evaluation scores of students who participated in non-curricular programs, their performance is compared, and personalized recommendations are made with the most accurate model.
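As an illustration of collaborative filtering on evaluation scores, the sketch below implements one common variant, user-based filtering with cosine similarity. The abstract does not specify which three model types were compared, so this is not necessarily one of them. The student IDs, program names, and scores are hypothetical.

```python
from math import sqrt

# Hypothetical evaluation scores: student -> {program: score on a 1-5 scale}.
ratings = {
    "s1": {"career_camp": 5, "coding_club": 4, "volunteer": 1},
    "s2": {"career_camp": 4, "coding_club": 5, "writing_lab": 4},
    "s3": {"career_camp": 1, "volunteer": 5, "writing_lab": 2},
}

def cosine(u, v):
    """Cosine similarity between two students' score dictionaries."""
    shared = set(u) & set(v)
    if not shared:
        return 0.0
    dot = sum(u[i] * v[i] for i in shared)
    norm_u = sqrt(sum(x * x for x in u.values()))
    norm_v = sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v)

def predict(user, item):
    """Similarity-weighted average of other students' scores for the item."""
    num = den = 0.0
    for other, their in ratings.items():
        if other != user and item in their:
            sim = cosine(ratings[user], their)
            num += sim * their[item]
            den += abs(sim)
    return num / den if den else None

score = predict("s1", "writing_lab")
```

Student `s1` is more similar to `s2` than to `s3`, so the predicted score for `writing_lab` lies closer to the score given by `s2`.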

CRF Based Intrusion Detection System using Genetic Search Feature Selection for NSSA

  • Azhagiri M;Rajesh A;Rajesh P;Gowtham Sethupathi M
    • International Journal of Computer Science & Network Security / v.23 no.7 / pp.131-140 / 2023
  • Network security situational awareness (NSSA) systems help in better managing the security concerns of a network by monitoring network connections for anomalies and recommending remedial actions upon detecting an attack. An intrusion detection system (IDS) helps identify the security concerns of a network by monitoring network connections for anomalies. We propose a CRF-based IDS that uses a genetic search feature selection algorithm for network security situational awareness to detect anomalies in the network. Conditional random fields, being discriminative models, directly model conditional probabilities rather than joint probabilities, thereby achieving better classification accuracy. The genetic search feature selection algorithm identifies the optimal subset of features based on the best population of features associated with the target class. When trained and tested on the benchmark NSL-KDD dataset, the proposed system exhibited higher accuracy in identifying an attack and in classifying the attack category.

CORRECT? CORECT!: Classification of ESG Ratings with Earnings Call Transcript

  • Haein Lee;Hae Sun Jung;Heungju Park;Jang Hyun Kim
    • KSII Transactions on Internet and Information Systems (TIIS) / v.18 no.4 / pp.1090-1100 / 2024
  • While incorporating ESG indicators is recognized as crucial for sustainability and increased firm value, inconsistent disclosure of ESG data and vague assessment standards have been key challenges. To address these issues, this study proposes an ambiguous text-based automated ESG rating strategy. Earnings Call Transcript data were classified as E, S, or G using the more than 450 metrics of the Refinitiv-Sustainable Leadership Monitor. The study employed advanced natural language processing models such as BERT, RoBERTa, ALBERT, FinBERT, and ELECTRA to precisely classify ESG documents. In addition, the authors computed the average predicted probabilities for each label, providing a means to identify the relative significance of different ESG factors. The experimental results demonstrated that the proposed methodology can enhance the ESG assessment criteria established by various rating agencies and highlighted that companies primarily focus on governance factors. In other words, companies were making efforts to strengthen their governance frameworks. In conclusion, this framework enables sustainable and responsible business by providing insight into the ESG information contained in Earnings Call Transcript data.
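The step of averaging predicted probabilities per label can be sketched as follows. The sketch assumes a classifier that outputs one logit per label for each transcript segment. The logit values are made up, and the function names are hypothetical rather than taken from the study.

```python
import math

LABELS = ["E", "S", "G"]

def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    m = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def average_probabilities(logit_rows):
    """Mean predicted probability of each label across all segments."""
    probs = [softmax(row) for row in logit_rows]
    n = len(probs)
    return {label: sum(p[i] for p in probs) / n for i, label in enumerate(LABELS)}

# Hypothetical classifier outputs (logits) for three transcript segments.
logits = [[0.2, 0.1, 2.0], [1.5, 0.3, 0.4], [0.0, 0.2, 1.8]]
avg = average_probabilities(logits)
dominant = max(avg, key=avg.get)
```

The label with the highest average probability indicates which factor the text emphasises most. In this made-up example it is G, mirroring the governance focus reported in the abstract.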

The Prediction of Cryptocurrency Prices Using eXplainable Artificial Intelligence based on Deep Learning (설명 가능한 인공지능과 CNN을 활용한 암호화폐 가격 등락 예측모형)

  • Taeho Hong;Jonggwan Won;Eunmi Kim;Minsu Kim
    • Journal of Intelligence and Information Systems / v.29 no.2 / pp.129-148 / 2023
  • Bitcoin is a blockchain technology-based digital currency that has been recognized as a representative cryptocurrency and a financial investment asset. Due to its highly volatile nature, Bitcoin has gained a lot of attention from investors and the public. Based on this popularity, numerous studies have been conducted on price and trend prediction using machine learning and deep learning. This study employed LSTM (Long Short Term Memory) and CNN (Convolutional Neural Networks), which have shown potential for predictive performance in the finance domain, to enhance the classification accuracy of Bitcoin price trend prediction. XAI (eXplainable Artificial Intelligence) techniques were applied to the predictive model to enhance its explainability and interpretability by providing a comprehensive explanation of the model. In the empirical experiment, CNN was applied to technical indicators and Google trend data to build a Bitcoin price trend prediction model, and the CNN model using both technical indicators and Google trend data clearly outperformed the other models using neural networks, SVM, and LSTM. SHAP (Shapley Additive exPlanations) was then applied to the predictive model to obtain explanations of the output values. Important prediction drivers among the input variables were extracted through global interpretation, and the predictive model's decision process for each instance was interpreted through local interpretation. The results show that the proposed research framework achieves both improved classification accuracy and explainability by using CNN, Google trend data, and SHAP.
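SHAP explanations are built on Shapley values, which can be computed exactly when there are only a few features. The sketch below does this by enumerating feature subsets. It does not use the SHAP library or the paper's CNN; the small `model` function, the input `x`, and the all-zero `baseline` are hypothetical stand-ins.

```python
from itertools import combinations
from math import factorial

def shapley_values(value, n):
    """Exact Shapley values for a set function `value` over features 0..n-1."""
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Weight of a coalition of this size that does not contain feature i.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = frozenset(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi

# Hypothetical model: a linear score plus one interaction between features 0 and 1.
x = [1.0, 2.0, 3.0]
baseline = [0.0, 0.0, 0.0]

def model(z):
    return 2.0 * z[0] + 1.0 * z[1] - 0.5 * z[2] + z[0] * z[1]

def value(subset):
    # Features outside the subset are replaced by their baseline value.
    z = [x[j] if j in subset else baseline[j] for j in range(len(x))]
    return model(z)

phi = shapley_values(value, len(x))
```

The contributions add up to the difference between the model output at `x` and at the baseline, and the interaction term is split equally between features 0 and 1. This corresponds to a local interpretation for a single instance; averaging absolute contributions over many instances gives a global view.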