• Title/Summary/Keyword: machine learning techniques

Systematic Research on Privacy-Preserving Distributed Machine Learning (프라이버시를 보호하는 분산 기계 학습 연구 동향)

  • Min Seob Lee;Young Ah Shin;Ji Young Chun
    • The Transactions of the Korea Information Processing Society / v.13 no.2 / pp.76-90 / 2024
  • Although artificial intelligence (AI) can be utilized in various domains such as smart cities and healthcare, its use is limited by concerns about the exposure of personal and sensitive information. In response, the concept of distributed machine learning has emerged, wherein learning occurs locally before a global model is trained, mitigating the concentration of data on a central server. However, because the overall learning phase is carried out collaboratively among multiple participants, it still poses threats to data privacy. In this paper, we systematically analyze recent trends in privacy protection within the realm of distributed machine learning, considering factors such as the presence of a central server, the distribution environment of the training datasets, and performance variations among participants. In particular, we focus on key distributed machine learning techniques, including horizontal federated learning, vertical federated learning, and swarm learning. We examine privacy protection mechanisms within these techniques and explore potential directions for future research.
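The "learn locally, then build a global model" idea in the abstract above can be illustrated with a minimal sketch of federated averaging (FedAvg), a commonly used aggregation rule for horizontal federated learning with a central server. The abstract does not name a specific algorithm, so the choice of FedAvg, the toy 1-D linear model, the data, and all function names here are assumptions made for illustration. The sketch contains no privacy mechanism; it only shows that raw data stays with each participant while model weights are shared.

```python
from typing import List, Tuple

Weights = Tuple[float, float]        # (slope w, intercept b) of y = w*x + b
Dataset = List[Tuple[float, float]]  # (x, y) pairs held by one participant


def local_update(weights: Weights, data: Dataset,
                 lr: float = 0.5, epochs: int = 5) -> Weights:
    """Run a few gradient-descent steps on one participant's private data."""
    w, b = weights
    n = len(data)
    for _ in range(epochs):
        grad_w = sum(2 * (w * x + b - y) * x for x, y in data) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in data) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return (w, b)


def federated_average(updates: List[Weights], sizes: List[int]) -> Weights:
    """Server step: average local weights, weighted by local dataset size."""
    total = sum(sizes)
    w = sum(u[0] * s for u, s in zip(updates, sizes)) / total
    b = sum(u[1] * s for u, s in zip(updates, sizes)) / total
    return (w, b)


def train(participants: List[Dataset], rounds: int = 100) -> Weights:
    """Alternate local training and server-side averaging."""
    global_weights: Weights = (0.0, 0.0)
    for _ in range(rounds):
        # Only weights leave each participant; the (x, y) pairs never do.
        updates = [local_update(global_weights, data) for data in participants]
        sizes = [len(data) for data in participants]
        global_weights = federated_average(updates, sizes)
    return global_weights


# Hypothetical toy data: two participants hold points from y = 2x + 1.
participants = [
    [(0.0, 1.0), (0.5, 2.0)],
    [(0.25, 1.5), (0.75, 2.5), (1.0, 3.0)],
]
w, b = train(participants)
print(f"global model: y = {w:.3f}x + {b:.3f}")
```

Shared weights can still leak information about the local data, which is the kind of threat the surveyed privacy mechanisms are meant to address.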

Systematic Review of Bug Report Processing Techniques to Improve Software Management Performance

  • Lee, Dong-Gun;Seo, Yeong-Seok
    • Journal of Information Processing Systems / v.15 no.4 / pp.967-985 / 2019
  • Bug report processing is a key element of bug fixing in modern software maintenance. Bug reports are not processed immediately after submission; they pass through several steps, such as bug report deduplication and bug report triage, before bug fixing is initiated. This approach is very inefficient because all of these steps are performed manually. Software engineers have persistently highlighted the need to automate these steps, and as a result, many automation techniques have been proposed for bug report processing; however, the accuracy of the existing methods is not satisfactory. Therefore, this study surveys existing techniques for bug report processing with the aim of improving their accuracy. The review of each method in this study consists of a description, the techniques used, experiments, and comparison results. The results indicate that research on bug deduplication is still lacking and that further studies integrating clustering and natural language processing are needed. This study further indicates that, although all studies in the field of triage are based on machine learning, studies using deep learning are still insufficient.

Prediction Model of Software Fault using Deep Learning Methods (딥러닝 기법을 사용하는 소프트웨어 결함 예측 모델)

  • Hong, Euyseok
    • The Journal of the Institute of Internet, Broadcasting and Communication / v.22 no.4 / pp.111-117 / 2022
  • Software fault prediction models have been studied for decades, and models using machine learning techniques have shown the best performance. Deep learning techniques have become the most popular in the field of machine learning, but few studies have used them as classifiers for fault prediction models. Some studies have used deep learning to obtain semantic information from the model input source code or syntactic data. In this paper, we built several models using an MLP with three or more hidden layers, varying the model structure and hyperparameters. In the model evaluation experiment, the MLP-based deep learning models showed performance similar to that of the existing models in terms of accuracy, but significantly better performance in terms of AUC. They also outperformed another deep learning model, the CNN model.
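To see how two classifiers can match on accuracy yet differ sharply on AUC, as the abstract above reports, consider the following minimal sketch. The labels and scores are invented for illustration and are not taken from the paper; the function names are likewise hypothetical. Accuracy depends on a single decision threshold, whereas AUC measures how well the scores rank faulty modules above clean ones across all thresholds.

```python
from typing import List


def accuracy(labels: List[int], scores: List[float],
             threshold: float = 0.5) -> float:
    """Fraction of modules classified correctly at a fixed threshold."""
    predictions = [1 if s >= threshold else 0 for s in scores]
    correct = sum(p == y for p, y in zip(predictions, labels))
    return correct / len(labels)


def auc(labels: List[int], scores: List[float]) -> float:
    """Probability that a random faulty module outscores a random clean one.

    Ties count as half a win (the pairwise definition of ROC AUC).
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical data: 1 = faulty module, 0 = clean module.
labels = [1, 1, 0, 0, 0, 0]
model_a = [0.9, 0.4, 0.3, 0.2, 0.2, 0.1]  # ranks both faulty modules on top
model_b = [0.9, 0.1, 0.4, 0.3, 0.2, 0.2]  # ranks one faulty module last

for name, scores in (("A", model_a), ("B", model_b)):
    print(name, round(accuracy(labels, scores), 3), auc(labels, scores))
```

Both models misclassify the same single module at the 0.5 threshold, so their accuracy is identical, but model A orders the modules perfectly while model B does no better than chance. This is one reason fault prediction studies often report AUC alongside accuracy on imbalanced data.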

Network Traffic Measurement Analysis using Machine Learning

  • Hae-Duck Joshua Jeong
    • Korean Journal of Artificial Intelligence / v.11 no.2 / pp.19-27 / 2023
  • In recent times, an exponential increase in Internet traffic has been observed as a result of the advancing development of the Internet of Things, mobile networks with sensors, and communication functions within various devices. Further, the COVID-19 pandemic has inevitably led to an explosion of social network traffic. Within this context, considerable attention has been drawn to research on network traffic analysis based on machine learning. In this paper, we design and develop a new machine learning framework for network traffic analysis that distinguishes normal traffic from abnormal traffic. To achieve this, we combine well-known machine learning algorithms and network traffic analysis techniques. Using KDD CUP'99, one of the most widely used datasets, in the Weka and Apache Spark environments, we compare and investigate results obtained from time-series analysis of various aspects, including malicious code, feature extraction, data formalization, and network traffic measurement tool implementation. Experimental analysis showed that, while both logistic regression and the support vector machine algorithm performed excellently in the performance evaluation, logistic regression performed better. The quantitative analysis results of our proposed machine learning framework show that this approach is reliable and practical, and the performance of the proposed system is compared with that reported in another paper. In addition, we determined that the framework exhibits a much faster processing speed in the Apache Spark environment than in Weka, as there are more datasets used to create and classify machine learning models.

Underwater Acoustic Research Trends with Machine Learning: General Background

  • Yang, Haesang;Lee, Keunhwa;Choo, Youngmin;Kim, Kookhyun
    • Journal of Ocean Engineering and Technology / v.34 no.2 / pp.147-154 / 2020
  • Underwater acoustics, which is the study of underwater wave propagation and its interaction with boundaries, has mainly been applied to the fields of underwater communication, target detection, marine resources, marine environment, and underwater sound sources. Based on the scientific and engineering understanding of acoustic signals/data, recent studies combining traditional and data-driven machine learning methods have shown continuous progress. Machine learning, represented by deep learning, has shown unprecedented success in a variety of fields, owing to big data, graphics processing unit computing, and advances in algorithms. Although machine learning has not yet been implemented in every single field of underwater acoustics, it will be used more actively in the future in line with the ongoing development and overwhelming achievements of this method. To understand the research trends of machine learning applications in underwater acoustics, the general theoretical background of several related machine learning techniques is introduced in this paper.

On the Performance of Cuckoo Search and Bat Algorithms Based Instance Selection Techniques for SVM Speed Optimization with Application to e-Fraud Detection

  • AKINYELU, Andronicus Ayobami;ADEWUMI, Aderemi Oluyinka
    • KSII Transactions on Internet and Information Systems (TIIS) / v.12 no.3 / pp.1348-1375 / 2018
  • Support Vector Machine (SVM) is a well-known machine learning classification algorithm that has been widely applied to many data mining problems with good accuracy. However, SVM classification speed decreases as dataset size increases. Some applications, such as video surveillance and intrusion detection, require a classifier to be trained very quickly and on large datasets. Hence, this paper introduces two filter-based instance selection techniques for optimizing SVM training speed. Fast classification is often achieved at the expense of classification accuracy, and some applications, such as phishing and spam email classifiers, are very sensitive to a slight drop in classification accuracy. Hence, this paper also introduces two wrapper-based instance selection techniques for improving SVM predictive accuracy and training speed. The wrapper-based and filter-based techniques are inspired by the Cuckoo Search Algorithm and the Bat Algorithm. The proposed techniques are validated on three popular e-fraud types: credit card fraud, spam email, and phishing email. In addition, the proposed techniques are validated on 20 other datasets provided by the UCI data repository. Moreover, statistical analysis is performed, and experimental results reveal that the filter-based and wrapper-based techniques significantly improved SVM classification speed. The results also reveal that the wrapper-based techniques improved SVM predictive accuracy in most cases.

A Study on Machine Learning Based Anti-Analysis Technique Detection Using N-gram Opcode (N-gram Opcode를 활용한 머신러닝 기반의 분석 방지 보호 기법 탐지 방안 연구)

  • Kim, Hee Yeon;Lee, Dong Hoon
    • Journal of the Korea Institute of Information Security & Cryptology / v.32 no.2 / pp.181-192 / 2022
  • The emergence of new malware is incapacitating existing signature-based malware detection techniques, and the application of various anti-analysis techniques makes malware difficult to analyze. Recent studies on signature-based malware detection are limited in that malware creators can easily bypass them. Therefore, in this study, we build a machine learning model that detects and classifies the anti-analysis techniques of packers applied to malware, rather than using the characteristics of the malware itself. N-gram opcodes are extracted from malicious binaries to which various anti-analysis techniques of commercial packers have been applied, features are extracted using TF-IDF, and each anti-analysis technique is then detected and classified. Real-world malware samples packed using Themida and VMProtect with multiple anti-analysis techniques were used to train and test six machine learning models, and the optimal models achieved 81.25% accuracy for Themida and 95.65% accuracy for VMProtect.
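The feature pipeline described in the abstract above (opcode n-grams weighted by TF-IDF) can be sketched in a few lines. This is an illustrative sketch only: the opcode traces are invented, the function names are hypothetical, and the TF-IDF variant (relative term frequency times the natural log of inverse document frequency, without smoothing) is an assumption, since the abstract does not specify one. The classification step is omitted.

```python
import math
from collections import Counter
from typing import Dict, List


def opcode_ngrams(opcodes: List[str], n: int = 2) -> List[str]:
    """Slide a window of n opcodes over the sequence."""
    return [" ".join(opcodes[i:i + n]) for i in range(len(opcodes) - n + 1)]


def tf_idf(documents: List[List[str]]) -> List[Dict[str, float]]:
    """Return one {ngram: weight} mapping per document.

    Assumes every document contains at least one n-gram.
    """
    doc_count = len(documents)
    document_frequency: Counter = Counter()
    for ngrams in documents:
        document_frequency.update(set(ngrams))  # count each n-gram once per doc
    vectors = []
    for ngrams in documents:
        counts = Counter(ngrams)
        total = len(ngrams)
        vectors.append({
            gram: (count / total) * math.log(doc_count / document_frequency[gram])
            for gram, count in counts.items()
        })
    return vectors


# Hypothetical opcode traces from three packed samples.
samples = [
    ["push", "mov", "call", "pop", "ret"],
    ["push", "mov", "rdtsc", "cmp", "jne"],
    ["push", "mov", "int3", "pop", "ret"],
]
features = tf_idf([opcode_ngrams(s) for s in samples])
for vector in features:
    print({gram: round(weight, 3) for gram, weight in vector.items()})
```

An n-gram that appears in every sample, such as `push mov` here, receives a weight of zero, while n-grams unique to one sample receive the highest weights. That is the property that makes TF-IDF useful for separating samples by the distinctive instruction patterns they contain.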

Applying advanced machine learning techniques in the early prediction of graduate ability of university students

  • Pham, Nga;Tiep, Pham Van;Trang, Tran Thu;Nguyen, Hoai-Nam;Choi, Gyoo-Seok;Nguyen, Ha-Nam
    • International Journal of Internet, Broadcasting and Communication / v.14 no.3 / pp.285-291 / 2022
  • The number of people enrolling in universities is rising due to the simplicity of applying and the benefit of earning a bachelor's degree. However, the on-time graduation rate has declined, since many students fail to complete their courses and take longer to get their diplomas. Although various reasons lead to this problem, it is crucial to emphasize the causes originating from the management and care of learners. In fact, understanding students' difficult situations and offering timely advice would help prevent college dropouts or graduation delays. In this study, we present a machine learning-based method for the early detection of at-risk students, using data obtained from graduates of the Faculty of Information Technology, Dainam University, Vietnam. We experiment with several fundamental machine learning methods before implementing parameter optimization techniques. In comparison to the other strategies, Random Forest and Grid Search (RF&GS) and Random Forest and Random Search (RF&RS) provided more accurate predictions for identifying at-risk students.

Corporate Corruption Prediction Evidence From Emerging Markets

  • Kim, Yang Sok;Na, Kyunga;Kang, Young-Hee
    • Asia-Pacific Journal of Business / v.12 no.4 / pp.13-40 / 2021
  • Purpose - The purpose of this study is to predict corporate corruption in emerging markets such as Brazil, Russia, India, and China (BRIC) using different machine learning techniques. Since corruption is a significant problem that can affect corporate performance, particularly in emerging markets, it is important to correctly identify whether a company engages in corrupt practices. Design/methodology/approach - To address the research question, we employ predictive analytic techniques (machine learning methods). Using the World Bank Enterprise Survey Data, this study evaluates various predictive models generated by seven supervised learning algorithms: k-Nearest Neighbour (k-NN), Naïve Bayes (NB), Decision Tree (DT), Decision Rules (DR), Logistic Regression (LR), Support Vector Machines (SVM), and Artificial Neural Network (ANN). Findings - We find that DT, DR, SVM, and ANN create highly accurate models (over 90% accuracy). Among various factors, firm age is the most significant, while several other determinants, such as source of working capital, top manager experience, and the number of permanent full-time employees, also contribute to company corruption. Research implications or Originality - This research successfully demonstrates how machine learning can be applied to predict corporate corruption and also identifies the major causes of corporate corruption.

Machine learning modeling of irradiation embrittlement in low alloy steel of nuclear power plants

  • Lee, Gyeong-Geun;Kim, Min-Chul;Lee, Bong-Sang
    • Nuclear Engineering and Technology / v.53 no.12 / pp.4022-4032 / 2021
  • In this study, machine learning (ML) techniques were used to model surveillance test data of nuclear power plants from an international database of the ASTM E10.02 committee. Regression modeling was conducted using various techniques, including Cubist, XGBoost, and a support vector machine. The root mean square deviation of each ML model for the baseline dataset was less than that of the ASTM E900-15 nonlinear regression model. With respect to the interpolation, the ML methods provided excellent predictions with relatively few computations when applied to the given data range. The effect of the explanatory variables on the transition temperature shift (TTS) for the ML methods was analyzed, and the trends were slightly different from those for the ASTM E900-15 model. ML methods showed some weakness in the extrapolation of the fluence in comparison to the ASTM E900-15, while the Cubist method achieved an extrapolation to a certain extent. To achieve a more reliable prediction of the TTS, it was confirmed that advanced techniques should be considered for extrapolation when applying ML modeling.