• Title/Summary/Keyword: Generate Data

Search Result 3,065, Processing Time 0.028 seconds

Enhanced Privacy Preservation of Cloud Data by using ElGamal Elliptic Curve (EGEC) Homomorphic Encryption Scheme

  • vedaraj, M.;Ezhumalai, P.
    • KSII Transactions on Internet and Information Systems (TIIS)
    • /
    • v.14 no.11
    • /
    • pp.4522-4536
    • /
    • 2020
  • Nowadays, cloud is the fastest emerging technology in the IT industry. We can store and retrieve data from the cloud. The most frequently occurring problems in the cloud are security and privacy preservation of data. For improving its security, secret information must be protected from various illegal accesses. Numerous traditional cryptography algorithms have been used to increase the privacy in preserving cloud data. Still, there are some problems in privacy protection because of its reduced security. Thus, this article proposes an ElGamal Elliptic Curve (EGEC) Homomorphic encryption scheme for safeguarding the confidentiality of data stored in a cloud. The Users who hold a data can encipher the input data using the proposed EGEC encryption scheme. The homomorphic operations are computed on encrypted data. Whenever user sends data access permission requests to the cloud data storage. The Cloud Service Provider (CSP) validates the user access policy and provides the encrypted data to the user. ElGamal Elliptic Curve (EGEC) decryption was used to generate an original input data. The proposed EGEC homomorphic encryption scheme can be tested using different performance metrics such as execution time, encryption time, decryption time, memory usage, encryption throughput, and decryption throughput. However, efficacy of the ElGamal Elliptic Curve (EGEC) Homomorphic Encryption approach is explained by the comparison study of conventional approaches.

Mitigating Data Imbalance in Credit Prediction using the Diffusion Model (Diffusion Model을 활용한 신용 예측 데이터 불균형 해결 기법)

  • Sangmin Oh;Juhong Lee
    • Smart Media Journal
    • /
    • v.13 no.2
    • /
    • pp.9-15
    • /
    • 2024
  • In this paper, a Diffusion Multi-step Classifier (DMC) is proposed to address the imbalance issue in credit prediction. DMC utilizes a Diffusion Model to generate continuous numerical data from credit prediction data and creates categorical data through a Multi-step Classifier. Compared to other algorithms generating synthetic data, DMC produces data with a distribution more similar to real data. Using DMC, data that closely resemble actual data can be generated, outperforming other algorithms for data generation. When experiments were conducted using the generated data, the probability of predicting delinquencies increased by over 20%, and overall predictive accuracy improved by approximately 4%. These research findings are anticipated to significantly contribute to reducing delinquency rates and increasing profits when applied in actual financial institutions.

Metadata design and system development for autonomous data survey using unmanned patrol robots (무인순찰로봇 활용 데이터 기록 자동화를 위한 메타데이터 정의 및 시스템 구축)

  • Jung, Namcheol;Lee, Giryun;Nho, Hyunju
    • Proceedings of the Korean Institute of Building Construction Conference
    • /
    • 2023.11a
    • /
    • pp.267-268
    • /
    • 2023
  • Unmanned patrol robots are currently being developed for autonomous data survey in construction sites. As the amount of data acquired by robots increases, it is important to utilize proper metadata and system to manage data flow. In this study, we developed three materials, metadata design, robot system and web system, in the purpose of automating construction site data survey using unmanned patrol robots. The metadata was mainly designed to represent when and where raw data was acquired. To identify the location of data acquired, localization data from SLAM algorithm was converted to suit the construction drawings. The robot system and web system were developed to generate, store and parse the raw data and metadata automatically. The materials developed in this study was adopted to Boston Dynamics SPOT, a quadruped robot. Autonomous data survey of 360-picture and environment sensor was tested in two construction sites and the robot worked as intended. As a further study, development on the autonomous data survey to improve the convenience and productivity will be continued.

  • PDF

Utilizing Data Mining Techniques to Predict Students Performance using Data Log from MOODLE

  • Noora Shawareb;Ahmed Ewais;Fisnik Dalipi
    • KSII Transactions on Internet and Information Systems (TIIS)
    • /
    • v.18 no.9
    • /
    • pp.2564-2588
    • /
    • 2024
  • Due to COVID19 pandemic, most of educational institutions and schools changed the traditional way of teaching to online teaching and learning using well-known Learning Management Systems (LMS) such as Moodle, Canvas, Blackboard, etc. Accordingly, LMS started to generate a large data related to students' characteristics and achievements and other course-related information. This makes it difficult to teachers to monitor students' behaviour and performance. Therefore, a need to support teachers with a tool alerting student who might be in risk based on their recorded activities and achievements in adopted LMS in the school. This paper focuses on the benefits of using recorded data in LMS platforms, specifically Moodle, to predict students' performance by analysing their behavioural data and engagement activities using data mining techniques. As part of the overall process, this study encountered the task of extracting and selecting relevant data features for predicting performance, along with designing the framework and choosing appropriate machine learning techniques. The collected data underwent pre-processing operations to remove random partitions, empty values, duplicates, and code the data. Different machine learning techniques, including k-NN, TREE, Ensembled Tree, SVM, and MLPNNs were applied to the processed data. The results showed that the MLPNNs technique outperformed other classification techniques, achieving a classification accuracy of 93%, while SVM and k-NN achieved 90% and 87% respectively. This indicates the possibility for future research to investigate incorporating other neural network methods for categorizing students using data from LMS.

Generation of Efficient Fuzzy Classification Rules Using Evolutionary Algorithm with Data Partition Evaluation (데이터 분할 평가 진화알고리즘을 이용한 효율적인 퍼지 분류규칙의 생성)

  • Ryu, Joung-Woo;Kim, Sung-Eun;Kim, Myung-Won
    • Journal of the Korean Institute of Intelligent Systems
    • /
    • v.18 no.1
    • /
    • pp.32-40
    • /
    • 2008
  • Fuzzy rules are very useful and efficient to describe classification rules especially when the attribute values are continuous and fuzzy in nature. However, it is generally difficult to determine membership functions for generating efficient fuzzy classification rules. In this paper, we propose a method of automatic generation of efficient fuzzy classification rules using evolutionary algorithm. In our method we generate a set of initial membership functions for evolutionary algorithm by supervised clustering the training data set and we evolve the set of initial membership functions in order to generate fuzzy classification rules taking into consideration both classification accuracy and rule comprehensibility. To reduce time to evaluate an individual we also propose an evolutionary algorithm with data partition evaluation in which the training data set is partitioned into a number of subsets and individuals are evaluated using a randomly selected subset of data at a time instead of the whole training data set. We experimented our algorithm with the UCI learning data sets, the experiment results showed that our method was more efficient at average compared with the existing algorithms. For the evolutionary algorithm with data partition evaluation, we experimented with our method over the intrusion detection data of KDD'99 Cup, and confirmed that evaluation time was reduced by about 70%. Compared with the KDD'99 Cup winner, the accuracy was increased by 1.54% while the cost was reduced by 20.8%.

Parallelization of Recursive Functions for Recursive Data Structures (재귀적 자료구조에 대한 재귀 함수의 병렬화)

  • An, Jun-Seon;Han, Tae-Suk
    • Journal of KIISE:Software and Applications
    • /
    • v.26 no.12
    • /
    • pp.1542-1552
    • /
    • 1999
  • 자료 병렬성이란 자료 집합의 원소들에 대하여 동일한 작업을 동시에 수행하므로써 얻어지는 병렬성을 말한다. 함수형 언어에서 자료 집합에 대한 반복 수행은 재귀적 자료형에 대한 재귀 함수에 의하여 표현된다. 본 논문에서는 이러한 재귀 함수를 자료 병렬 프로그램으로 변환하기 위한 병렬화 방법을 제시한다. 생성되는 병렬 프로그램의 병렬 수행 구조로는 일반적인 형태의 재귀적 자료형에 대하여 정의되는 다형적인 자료 병렬 연산을 사용하여 트리, 리스트 등과 같은 일반적인 재귀적 자료 집합에 대한 자료 병렬 수행이 가능하도록 하였다. 재귀 함수의 병렬화를 위해서는, 함수를 이루는 각각의 계산들의 병렬성을 재귀 호출에 의해 존재하는 의존성에 기반하여 분류하고, 이에 기반하여 각각의 계산들에 대한 적절한 자료 병렬 연산을 사용하는 병렬 프로그램을 생성하였다.Abstract Data parallelism is obtained by applying the same operations to each element of a data collection. In functional languages, iterative computations on data collections are expressed by recursions on recursive data structures. We propose a parallelization method for data-parallel implementation of such recursive functions. We employ polytypic data-parallel primitives to represent the parallel execution structure of the object programs, which enables data parallel execution with general recursive data structures, such as trees and lists. To transform sequential programs to their parallelized versions, we propose a method to classify the types of parallelism in subexpressions, based on the dependencies of the recursive calls, and generate the data-parallel programs using data-parallel primitives appropriately.

Brightness Temperature Retrieval using Direct Broadcast Data from the Passive Microwave Imager on Aqua Satellite

  • Kim, Seung-Bum;Im, Yong-Jo;Kim, Kum-Lan;Park, Hye-Sook;Park, Sung-Ok
    • Korean Journal of Remote Sensing
    • /
    • v.20 no.1
    • /
    • pp.47-55
    • /
    • 2004
  • We have constructed a level-1 processor to generate brightness temperatures using the direct-broadcast data from the passive microwave radiometer onboard Aqua satellite. Although 50-minute half-orbit data, called a granule, are being routinely produced by global data centers, to our knowledge, this is the first attempt to process 10-minute long direct-broadcast (DB) data. We found that the processor designed for a granule needs modification to apply to the DB data. The modification includes the correction to path number, the selection of land mask and the manipulation of dummy scans. Pixel-to-pixel comparison with a reference indicates the difference in brightness temperature of about 0.2 K rms and less than 0.05 K mean. The difference comes from the different length of data between 50-minute granule and about 10-minute DB data. In detail, due to the short data length, DB data do not always have correct cold sky mirror count. The DB processing system is automated to enable the near-real time generation of brightness temperatures within 5 minutes after downlink. Through this work, we would be able to enhance the use of AMSR-E data, thus serving the objective of direct-broadcast.

Wi-Fi Fingerprint-based Data Collection Method and Processing Research (와이파이 핑거프린트 기반 데이터 수집 방법 및 가공 연구)

  • Kim, Sung-Hyun;Yoon, Chang-Pyo
    • Proceedings of the Korean Institute of Information and Commucation Sciences Conference
    • /
    • 2019.05a
    • /
    • pp.319-322
    • /
    • 2019
  • There are many techniques for locating users in an indoor spot. Among them, WiFi fingerprinting technique which is widely used is phased into a data collection step and a positioning step. In the data collection step, all surrounding Wi-Fi signals are collected and managed as a list. The more data collected, the better the accuracy of the indoor position based on Wi-Fi fingerprint. Existing high-quality data collection and management methods are time consuming and costly, and many operations are required to extract and generate data necessary for machine learning. Therefore, we research how to collect and manage large amount of data in limited resources. This paper presents efficient data collection methods and data generation for learning.

  • PDF

Human Detection using Real-virtual Augmented Dataset

  • Jongmin, Lee;Yongwan, Kim;Jinsung, Choi;Ki-Hong, Kim;Daehwan, Kim
    • Journal of information and communication convergence engineering
    • /
    • v.21 no.1
    • /
    • pp.98-102
    • /
    • 2023
  • This paper presents a study on how augmenting semi-synthetic image data improves the performance of human detection algorithms. In the field of object detection, securing a high-quality data set plays the most important role in training deep learning algorithms. Recently, the acquisition of real image data has become time consuming and expensive; therefore, research using synthesized data has been conducted. Synthetic data haves the advantage of being able to generate a vast amount of data and accurately label it. However, the utility of synthetic data in human detection has not yet been demonstrated. Therefore, we use You Only Look Once (YOLO), the object detection algorithm most commonly used, to experimentally analyze the effect of synthetic data augmentation on human detection performance. As a result of training YOLO using the Penn-Fudan dataset, it was shown that the YOLO network model trained on a dataset augmented with synthetic data provided high-performance results in terms of the Precision-Recall Curve and F1-Confidence Curve.

Synthetic Data Generation and Performance Analysis for Anomaly Detection (이상 탐지를 위한 합성 데이터 생성 및 성능 분석)

  • Hwang, Ju-hyo;Jin, Kyo-hong
    • Proceedings of the Korean Institute of Information and Commucation Sciences Conference
    • /
    • 2022.10a
    • /
    • pp.19-21
    • /
    • 2022
  • Anomaly detection using self-supervised learning typically generates synthetic data to learn to classify normal and abnormal, and uses real abnormal data as test data to measure anomaly detection performance. In a study using this method to generate synthetic data similar to normal data, anomaly detection was carried out by generating synthetic data by cutting and pasting a specific patch from the original image. In this way, the degree of similarity to normal data depends on the number and size of patches, which affects anomaly detection performance. In this paper, synthetic data were generated by varying patch sizes and numbers, and then similarity and analysis with normal data were conducted using a pre-trained model, and anomaly detection performance was measured by learning the model.

  • PDF