• Title/Summary/Keyword: data-preserving AI


Blockchain Based Data-Preserving AI Learning Environment Model for Cyber Security System (AI 사이버보안 체계를 위한 블록체인 기반의 Data-Preserving AI 학습환경 모델)

  • Kim, Inkyung;Park, Namje
    • The Journal of Korean Institute of Information Technology / v.17 no.12 / pp.125-134 / 2019
  • AI technology is limited to a passive recognition domain in which the transparency of the operating process is not guaranteed, and it is vulnerable because it depends on its data. Human error is inherent because raw data for artificial intelligence learning must be processed and inspected manually to secure the data quality needed to advance AI learning. In this study, we examine the necessity of managing learning data before machine learning by analyzing cases of inaccurate AI learning data and cyber security attack methods from a cyber security perspective. To verify the integrity of learning data, this paper presents a direction for a data-preserving artificial intelligence system: a blockchain-based learning data environment model. The proposed method is expected to prevent threats such as cyber attacks and data corruption when data is provided and used over an open network for data processing and raw data collection.

A Network Packet Analysis Method to Discover Malicious Activities

  • Kwon, Taewoong;Myung, Joonwoo;Lee, Jun;Kim, Kyu-il;Song, Jungsuk
    • Journal of Information Science Theory and Practice / v.10 no.spc / pp.143-153 / 2022
  • With the development of networks and the increase in the number of network devices, the number of cyber attacks targeting them is also increasing. Since these cyber attacks aim to steal important information and destroy systems, it is necessary to minimize social and economic damage through early detection and rapid response. Many studies using machine learning (ML) and artificial intelligence (AI) have been conducted, among which payload learning is one of the most intuitive and effective methods to detect malicious behavior. In this study, we propose a preprocessing method to maximize the performance of the model when learning the payload in term units. The proposed method constructs a high-quality learning data set by eliminating unnecessary noise (stopwords) and preserving important features in consideration of the machine language and natural language characteristics of the packet payload. Our method consists of three steps: preserving significant special characters, generating a stopword list, and refining class labels. By processing packets of various and complex structures with these three steps, it is possible to produce high-quality training data that helps build high-performance ML/AI models for security monitoring. We demonstrate the effectiveness of the proposed method by comparing the performance of the AI model with and without it. Furthermore, by evaluating the performance of the AI model with the proposed method applied in a real-world Security Operating Center (SOC) environment with live network traffic, we demonstrate the applicability of our method to a real environment.
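
The first two steps named in the abstract above (preserving significant special characters and generating a stopword list) can be illustrated with a minimal sketch. The abstract does not give the authors' rules, so everything here is an assumption made for illustration: the set `KEPT_SPECIALS`, the document-frequency threshold `max_doc_ratio`, and the function names are hypothetical, and class label refinement is not sketched at all.

```python
import re
from collections import Counter

# Assumption: an illustrative set of "significant" special characters.
# The paper's actual set is not stated in the abstract.
KEPT_SPECIALS = set("/=?&%.'<>")
_TERM_PATTERN = re.compile(
    r"[A-Za-z0-9_]+|[" + re.escape("".join(sorted(KEPT_SPECIALS))) + r"]"
)


def tokenize(payload: str) -> list[str]:
    """Split a payload into terms, keeping each selected special character as its own term."""
    return _TERM_PATTERN.findall(payload)


def build_stopwords(payloads: list[str], max_doc_ratio: float = 0.9) -> set[str]:
    """Treat terms that occur in more than max_doc_ratio of all payloads as stopwords.

    Assumption: a plain document-frequency rule stands in for the paper's
    stopword generation step. Kept special characters are never stopwords.
    """
    doc_freq = Counter()
    for payload in payloads:
        doc_freq.update(set(tokenize(payload)))
    limit = max_doc_ratio * len(payloads)
    return {
        term
        for term, count in doc_freq.items()
        if count > limit and term not in KEPT_SPECIALS
    }


def preprocess(payload: str, stopwords: set[str]) -> list[str]:
    """Tokenize a payload and drop stopword terms."""
    return [term for term in tokenize(payload) if term not in stopwords]


if __name__ == "__main__":
    sample = ["GET /a HTTP", "GET /b HTTP", "GET /c?x=1 HTTP"]
    stop = build_stopwords(sample)
    print(preprocess("GET /c?x=1 HTTP", stop))
```

In this sketch, terms shared by nearly every payload are dropped as noise, while characters such as `?` and `=` survive because they may carry attack-relevant structure.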

Differential Privacy Technology Resistant to the Model Inversion Attack in AI Environments (AI 환경에서 모델 전도 공격에 안전한 차분 프라이버시 기술)

  • Park, Cheollhee;Hong, Dowon
    • Journal of the Korea Institute of Information Security & Cryptology / v.29 no.3 / pp.589-598 / 2019
  • The amount of digital data is growing explosively, and these data have great potential value. Countries and companies are creating various kinds of added value from vast amounts of data and are investing heavily in data analysis techniques. The privacy problem that arises in data analysis is a major factor hindering data utilization. Recently, as privacy violation attacks on neural network models have been proposed, research on privacy-preserving artificial neural network technology is required. Therefore, various privacy-preserving artificial neural network technologies have been studied in the field of differential privacy, which ensures strict privacy. However, the balance between the accuracy of the neural network model and the privacy budget is often not appropriate. In this paper, we study differential privacy techniques that preserve the performance of a model within a given privacy budget and are resistant to model inversion attacks. We also analyze resistance to model inversion attacks according to the strength of privacy preservation.
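
The trade-off between accuracy and privacy budget mentioned in the abstract above can be seen in the textbook Laplace mechanism. The sketch below is not the technique studied in the paper, which the abstract does not specify; it only shows how a smaller privacy budget ε forces a larger noise scale. The function names and the example values are hypothetical.

```python
import random


def noise_scale(sensitivity: float, epsilon: float) -> float:
    """Laplace noise scale b = sensitivity / epsilon (smaller epsilon means more noise)."""
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    return sensitivity / epsilon


def laplace_mechanism(
    true_value: float, sensitivity: float, epsilon: float, rng: random.Random
) -> float:
    """Return true_value plus Laplace(0, b) noise with b = sensitivity / epsilon.

    The difference of two independent exponential samples with rate 1/b
    follows a Laplace distribution with scale b.
    """
    b = noise_scale(sensitivity, epsilon)
    noise = rng.expovariate(1.0 / b) - rng.expovariate(1.0 / b)
    return true_value + noise


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical counting query with sensitivity 1 and a true answer of 100.
    for eps in (0.1, 1.0, 10.0):
        print(eps, laplace_mechanism(100.0, 1.0, eps, rng))
```

Lowering ε from 1.0 to 0.1 multiplies the noise scale by ten, which is the kind of accuracy cost that a fixed privacy budget imposes.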

A Study on Privacy Preserving Machine Learning (프라이버시 보존 머신러닝의 연구 동향)

  • Han, Woorim;Lee, Younghan;Jun, Sohee;Cho, Yungi;Paek, Yunheung
    • Proceedings of the Korea Information Processing Society Conference / 2021.11a / pp.924-926 / 2021
  • AI (Artificial Intelligence) is being utilized in various fields and services to bring convenience to human life. Unfortunately, there are many security vulnerabilities in today's ML (Machine Learning) systems, which raise various privacy concerns because some AI models need individuals' private data for training. Such concerns have led to interest in ML systems that can preserve the privacy of individuals' data. This paper introduces the latest research on various attacks that infringe data privacy and the corresponding defense techniques.

Distributed Edge Computing for DNA-Based Intelligent Services and Applications: A Review (딥러닝을 사용하는 IoT빅데이터 인프라에 필요한 DNA 기술을 위한 분산 엣지 컴퓨팅기술 리뷰)

  • Alemayehu, Temesgen Seyoum;Cho, We-Duke
    • KIPS Transactions on Computer and Communication Systems / v.9 no.12 / pp.291-306 / 2020
  • Nowadays, Data-Network-AI (DNA)-based intelligent services and applications have become a reality, providing a new dimension of services that improve quality of life and business productivity. Artificial intelligence (AI) can enhance the value of IoT data (data collected by IoT devices), and the internet of things (IoT) promotes the learning and intelligence capability of AI. To extract insights from massive volumes of IoT data in real time using deep learning, processing needs to happen on the IoT end devices where the data is generated. However, deep learning requires a significant amount of computational resources that may not be available on the IoT end devices. Such problems have been addressed by transporting bulk data from the IoT end devices to cloud datacenters for processing. But transferring IoT big data to the cloud incurs prohibitively high transmission delay and raises privacy issues, which are a major concern. Edge computing, where distributed computing nodes are placed close to the IoT end devices, is a viable solution to meet the high computation and low-latency requirements and to preserve the privacy of users. This paper provides a comprehensive review of the current state of leveraging deep learning within edge computing to unleash the potential of IoT big data generated from IoT end devices. We believe that this review will contribute to the development of DNA-based intelligent services and applications. It describes the different distributed training and inference architectures of deep learning models across multiple nodes of the edge computing platform. It also presents the different privacy-preserving approaches to deep learning in the edge computing environment and the various application domains where deep learning on the network edge can be useful. Finally, it discusses open issues and challenges in leveraging deep learning within edge computing.

Privacy Preserving Techniques for Deep Learning in Multi-Party System (멀티 파티 시스템에서 딥러닝을 위한 프라이버시 보존 기술)

  • Hye-Kyeong Ko
    • The Journal of the Convergence on Culture Technology / v.9 no.3 / pp.647-654 / 2023
  • Deep learning is a useful method for classifying and recognizing complex data such as images and text, and the accuracy of deep learning is the basis for making artificial intelligence-based services on the Internet useful. However, the vast amount of user data used for training in deep learning has led to privacy violation problems, and there is concern that companies that have collected users' personal and sensitive data, such as photographs and voices, will hold the data indefinitely. Users cannot delete their data and cannot limit the purpose of its use. For example, data owners such as medical institutions that want to apply deep learning technology to patients' medical records cannot share patient data because of privacy and confidentiality issues, making it difficult to benefit from deep learning technology. In this paper, we design a privacy-preserving deep learning technique that allows multiple workers in a multi-party system to use a neural network model jointly without sharing their input datasets. We propose a method that can selectively share small subsets using an optimization algorithm based on modified stochastic gradient descent, and confirm that it can facilitate training with increased learning accuracy while protecting private information.

Systematic Research on Privacy-Preserving Distributed Machine Learning (프라이버시를 보호하는 분산 기계 학습 연구 동향)

  • Min Seob Lee;Young Ah Shin;Ji Young Chun
    • The Transactions of the Korea Information Processing Society / v.13 no.2 / pp.76-90 / 2024
  • Although artificial intelligence (AI) can be utilized in various domains such as smart cities and healthcare, its use is limited by concerns about the exposure of personal and sensitive information. In response, the concept of distributed machine learning has emerged, wherein learning occurs locally before a global model is trained, mitigating the concentration of data on a central server. However, a learning phase carried out collaboratively among multiple participants still poses threats to data privacy. In this paper, we systematically analyze recent trends in privacy protection within the realm of distributed machine learning, considering factors such as the presence of a central server, the distribution environment of the training datasets, and performance variations among participants. In particular, we focus on key distributed machine learning techniques, including horizontal federated learning, vertical federated learning, and swarm learning. We examine privacy protection mechanisms within these techniques and explore potential directions for future research.

Privacy-Preserving Language Model Fine-Tuning Using Offsite Tuning (프라이버시 보호를 위한 오프사이트 튜닝 기반 언어모델 미세 조정 방법론)

  • Jinmyung Jeong;Namgyu Kim
    • Journal of Intelligence and Information Systems / v.29 no.4 / pp.165-184 / 2023
  • Recently, deep learning analysis of unstructured text data using language models such as Google's BERT and OpenAI's GPT has shown remarkable results in various applications. Most language models learn generalized linguistic information from pre-training data and then update their weights for downstream tasks through a fine-tuning process. However, concerns have been raised that privacy may be violated in the process of using these language models: data privacy may be violated when the data owner provides large amounts of data to the model owner to fine-tune the language model. Conversely, when the model owner discloses the entire model to the data owner, the structure and weights of the model are disclosed, which may violate the privacy of the model. The concept of offsite tuning has recently been proposed to fine-tune language models while protecting privacy in such situations. However, that study is limited in that it does not provide a concrete way to apply the proposed methodology to text classification models. In this study, we propose a concrete method for applying offsite tuning with an additional classifier to protect the privacy of both the model and the data when performing multi-class classification fine-tuning on Korean documents. To evaluate the performance of the proposed methodology, we conducted experiments on about 200,000 Korean documents from five major fields (ICT, electrical, electronic, mechanical, and medical) provided by AIHub, and found that the proposed plug-in model outperforms the zero-shot model and the offsite model in terms of classification accuracy.

Edge Computing Model based on Federated Learning for COVID-19 Clinical Outcome Prediction in the 5G Era

  • Ruochen Huang;Zhiyuan Wei;Wei Feng;Yong Li;Changwei Zhang;Chen Qiu;Mingkai Chen
    • KSII Transactions on Internet and Information Systems (TIIS) / v.18 no.4 / pp.826-842 / 2024
  • As 5G and AI continue to develop, there has been a significant surge in the healthcare industry. The COVID-19 pandemic has posed immense challenges to the global health system. This study proposes an edge computing model based on federated learning (FL) for predicting clinical outcomes of COVID-19 patients during hospitalization. The model aims to address the challenges posed by the pandemic, such as the need for sophisticated predictive models, privacy concerns, and the non-IID nature of COVID-19 data. The model utilizes the FATE framework, known for its privacy-preserving technologies, to enhance predictive precision while ensuring data privacy and effectively managing data heterogeneity. The model's ability to generalize across diverse datasets and its adaptability in real-world clinical settings are highlighted by the use of SHAP values, which streamline the training process by identifying influential features, thus reducing computational overhead without compromising predictive precision. The study demonstrates that the proposed model achieves precision comparable to specific machine learning models when dataset sizes are identical and surpasses traditional models when larger training data volumes are employed. The model's performance is further improved when it is trained on datasets from diverse nodes, leading to superior generalization and overall performance, especially in scenarios with insufficient node features. The integration of FL with edge computing contributes significantly to the reliable prediction of COVID-19 patient outcomes with greater privacy. The research contributes to healthcare technology by providing a practical solution for early intervention and personalized treatment plans, leading to improved patient outcomes and efficient resource allocation during public health crises.

A New Image Processing Scheme For Face Swapping Using CycleGAN (순환 적대적 생성 신경망을 이용한 안면 교체를 위한 새로운 이미지 처리 기법)

  • Ban, Tae-Won
    • Journal of the Korea Institute of Information and Communication Engineering / v.26 no.9 / pp.1305-1311 / 2022
  • With the recent rapid development of mobile terminals and personal computers and the advent of neural network technology, real-time face swapping using images has become possible. In particular, the cycle generative adversarial network has made it possible to replace faces using uncorrelated image data. In this paper, we propose an input data processing scheme that can improve the quality of face swapping with less training data and time. The proposed scheme can improve image quality while preserving facial structure and expression information by combining facial landmarks extracted through a pre-trained neural network with major information that affects the structure and expression of the face. Using the blind/referenceless image spatial quality evaluator (BRISQUE) score, one of the AI-based non-reference quality metrics, we quantitatively analyze the performance of the proposed scheme and compare it with conventional schemes. According to the numerical results, the proposed scheme improved BRISQUE scores by about 4.6% to 14.6% compared to the conventional schemes.