• Title/Summary/Keyword: Non-IID

An Experimental Analysis on Entropy Estimators for the Entropy Sources Using Predictors of NIST SP 800-90B (NIST SP 800-90B 프레딕터를 이용한 잡음원의 엔트로피 추정량에 대한 실험적 분석)

  • Park, Hojoong;Bae, Minyoung;Yeom, Yongjin;Kang, Ju-Sung
    • The Journal of Korean Institute of Communications and Information Sciences / v.41 no.12 / pp.1892-1902 / 2016
  • NIST SP 800-90B was developed to evaluate the security of entropy sources. When SP 800-90B was updated to the Second Draft, estimators with predictors were added to the Non-IID track. Although the predictors are known to detect periodic properties of noise sources, exactly which periodic properties they detect is not clearly known. In this paper, we conduct experiments to identify the properties of the predictors. First, our experiments show that the min-entropy of Non-IID noise sources is generally determined by tests other than the estimators with predictors. We then present various experimental results that clarify the periodic properties detected by the predictors, and we experimentally analyze the meaning and role of predictor-based estimation.

A Study on Federated Learning of Non-IID MNIST Data (NoN-IID MNIST 데이터의 연합학습 연구)

  • Joowon Lee;Joonil Bang;Jongwoo Baek;Hwajong Kim
    • Proceedings of the Korean Society of Computer Information Conference / 2023.07a / pp.533-534 / 2023
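  • In this paper, we assume data owners (clients) that hold unevenly distributed (Non-IID) data, and we apply federated learning so that deep learning training is possible without directly moving raw data between data owners. To set up the experimental environment, we partitioned the MNIST handwritten digit dataset so that each client holds a large amount of only one digit, and distributed the partitions to the clients. The handwriting classification model trained with federated learning reached an accuracy of 85.5%, while the centrally trained model reached 90.2%. The federated model thus achieved about 95% of the centralized model's performance, showing that the performance drop under federated learning is small and that federated learning can replace centralized learning in special circumstances.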
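
The label-skewed split described in this abstract can be sketched as follows. This is a minimal illustration, not the authors' procedure: the function name `partition_label_skew`, the `dominant_fraction` parameter, and the rule that digit d is routed to client d are all assumptions made for the example.

```python
import random
from collections import Counter


def partition_label_skew(labels, num_clients=10, dominant_fraction=0.8, seed=0):
    """Split sample indices so that each client mostly holds one class.

    labels: sequence of integer class labels (e.g. MNIST digits 0-9).
    dominant_fraction: assumed share of each class sent to its "home" client;
        the remaining samples are scattered uniformly over all clients.
    Returns a list with one list of sample indices per client.
    """
    rng = random.Random(seed)
    clients = [[] for _ in range(num_clients)]
    for idx, label in enumerate(labels):
        home = label % num_clients  # hypothetical rule: digit d lives on client d
        if rng.random() < dominant_fraction:
            clients[home].append(idx)
        else:
            clients[rng.randrange(num_clients)].append(idx)
    return clients


def label_histogram(labels, indices):
    """Count how many samples of each class a client holds."""
    return Counter(labels[i] for i in indices)
```

Calling `label_histogram` on each client's index list shows how skewed the split is. Setting `dominant_fraction=0.0` scatters every sample uniformly and so gives a roughly IID baseline for comparison.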

FedGCD: Federated Learning Algorithm with GNN based Community Detection for Heterogeneous Data

  • Wooseok Shin;Jitae Shin
    • Journal of Internet Computing and Services / v.24 no.6 / pp.1-11 / 2023
  • Federated learning (FL) is a groundbreaking machine learning paradigm that allows multiple participants to collaboratively train models in a cloud environment, all while maintaining the privacy of their raw data. This approach is invaluable in applications involving sensitive or geographically distributed data. However, one of the challenges in FL is dealing with heterogeneous and non-independent and identically distributed (non-IID) data across participants, which can result in suboptimal model performance compared to traditional machine learning methods. To tackle this, we introduce FedGCD, a novel FL algorithm that employs Graph Neural Network (GNN)-based community detection to enhance model convergence in federated settings. In our experiments, FedGCD consistently outperformed existing FL algorithms in various scenarios: for instance, in a non-IID environment, it achieved an accuracy of 0.9113, a precision of 0.8798, and an F1-Score of 0.8972. In a semi-IID setting, it demonstrated the highest accuracy at 0.9315 and an impressive F1-Score of 0.9312. We also introduce a new metric, nonIIDness, to quantitatively measure the degree of data heterogeneity. Our results indicate that FedGCD not only addresses the challenges of data heterogeneity and non-IIDness but also sets new benchmarks for FL algorithms. The community detection approach adopted in FedGCD has broader implications, suggesting that it could be adapted for other distributed machine learning scenarios, thereby improving model performance and convergence across a range of applications.

High-Speed Implementation and Efficient Memory Usage of Min-Entropy Estimation Algorithms in NIST SP 800-90B (NIST SP 800-90B의 최소 엔트로피 추정 알고리즘에 대한 고속 구현 및 효율적인 메모리 사용 기법)

  • Kim, Wontae;Yeom, Yongjin;Kang, Ju-Sung
    • Journal of the Korea Institute of Information Security & Cryptology / v.28 no.1 / pp.25-39 / 2018
  • NIST (National Institute of Standards and Technology) has recently published the second draft of SP 800-90B, a document for evaluating the security of entropy sources, which are a key element of a cryptographic random number generator (RNG), and has provided a tool implemented in Python. In SP 800-90B, the security evaluation of an entropy source is a process of estimating min-entropy with several estimators. The estimation process is divided into an IID track and a non-IID track. In the IID track, the entropy source is evaluated with the MCV estimator only. In the non-IID track, it is evaluated with 10 estimators, including the MCV estimator. The running time of NIST's tool in the non-IID track is approximately 20 minutes, and its memory usage is over 5.5 GB. For evaluation agencies that must repeatedly evaluate various samples, and for developers or researchers who must run experiments in various environments, estimating entropy with the tool may be inconvenient and, depending on the environment, impossible. In this paper, we propose high-speed implementations and an efficient memory usage technique for the min-entropy estimation algorithms of SP 800-90B. Our major contributions are three methods for improving speed and reducing memory usage: a method that exploits the advantages of C++ code to speed up the MultiMCW estimator, a method that reduces the memory usage and improves the speed of MultiMMC by rebuilding its data storage structure, and a method that improves the speed of LZ78Y by rebuilding its data structure. The tool with our proposed methods is 14 times faster and uses 13 times less memory than NIST's tool.
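
To make the MCV (most common value) estimator mentioned in this abstract concrete, here is a minimal sketch. It assumes the formulation in the published SP 800-90B: take the proportion of the most frequent symbol, raise it to an upper 99% confidence bound, and convert that bound to min-entropy. It is illustrative only and is neither NIST's tool nor the optimized implementation the paper proposes; the function name `mcv_min_entropy` is hypothetical.

```python
import math
from collections import Counter


def mcv_min_entropy(samples):
    """Estimate min-entropy per sample from the most common value.

    samples: sequence of hashable symbols output by the noise source.
    """
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    # Proportion of the most frequently observed symbol.
    p_hat = Counter(samples).most_common(1)[0][1] / n
    # Upper bound of the 99% confidence interval (2.576 is the normal quantile).
    p_upper = min(1.0, p_hat + 2.576 * math.sqrt(p_hat * (1.0 - p_hat) / (n - 1)))
    # Min-entropy is the negative base-2 logarithm of that bound.
    return -math.log2(p_upper)
```

For a balanced binary sequence the estimate falls slightly below 1 bit per sample because of the confidence bound. A constant sequence gives 0 bits.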

Optimization Techniques for Client Selection in a Federated Learning Environment (연합학습 환경에서 클라이언트 선택의 최적화 기법)

  • 박민정;손영진;채상미
    • Proceedings of the Korea Information Processing Society Conference / 2023.11a / pp.722-723 / 2023
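  • Federated learning is a distributed learning technique in which training is carried out on local devices or clients and only model updates are sent to a central server, rather than collecting the data on the central server. It strengthens data security and privacy protection while still allowing efficient distributed learning. However, because most federated learning scenarios train on non-IID data, whose distribution differs from client to client, they perform worse than a centralized model. To improve the performance of federated learning models, this study analyzes optimization techniques for selecting suitable clients from among the candidate participants in a non-IID environment.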

Dynamic Window Adjustment and Model Stability Improvement Algorithm for K-Asynchronous Federated Learning (K-비동기식 연합학습의 동적 윈도우 조절과 모델 안정성 향상 알고리즘)

  • HyoSang Kim;Taejoon Kim
    • Journal of Korea Society of Industrial Information Systems / v.28 no.4 / pp.21-34 / 2023
  • Federated learning is divided into synchronous and asynchronous federated learning. Asynchronous federated learning has a time advantage over synchronous federated learning, but it still faces challenges in achieving better performance. In particular, preventing performance degradation on non-IID training datasets, selecting appropriate clients, and managing stale gradient information are important for improving model performance. In this paper, we address K-asynchronous federated learning on non-IID datasets. Unlike the traditional method, which uses a static K, we propose an algorithm that adaptively adjusts K and thereby reduces the learning time. We also show that model performance is improved by using a stale gradient handling method. Finally, we use a method of judging model performance to obtain strong model stability. Experimental results show that the overall algorithm reduces training time and improves both model accuracy and model stability.

Independence and Homogeneity Tests of the Annual Maxima Data used to Estimate the Design Wave Height (설계파고 추정에 사용한 연 최대 자료의 독립 및 분포 동질 검정)

  • Cho, Hong Yeon;Jeong, Weon Mu;Back, Jong Dai
    • Journal of Korean Society of Coastal and Ocean Engineers / v.32 no.1 / pp.26-38 / 2020
  • A statistical test was carried out on the IID (Independently and Identically Distributed) assumption of the AM (Annual Maxima) data used to estimate the design wave height. The test was divided into an independence (randomness) test and a homogeneity test, and each test was conducted on AM data of 210 and 310 stations in coastal and inner coastal grids under typhoon and non-typhoon (monsoon) conditions. In the independence test, the rejection ratios are in the range of 1.8~5.3% and 1.4~6.0% for the non-typhoon and typhoon data sets, respectively. In the distribution difference test between typhoon and non-typhoon data, on the other hand, the hypothesis of a common distribution was rejected at ratios of 47~79%, depending on the test method, for both the coastal grid and the inner coastal grid. Therefore, when estimating the design wave height by extreme value analysis, it is appropriate to separate the typhoon and non-typhoon data in the estimation process.

Empirical Bayes Problems with Dependent and Nonidentical Components

  • Inha Jung;Jee-Chang Hong;Kang Sup Lee
    • Communications for Statistical Applications and Methods / v.2 no.1 / pp.145-154 / 1995
  • An empirical Bayes approach is applied to the estimation of the binomial parameter when there is a cost for observations. Both the sample size and the decision rule for estimating the parameter are determined stochastically by the data, making the result more useful in applications. Our empirical Bayes problems with non-iid components are compared to the usual empirical Bayes problems with iid components. The asymptotically optimal procedure is given, together with a computer simulation.

Design of weighted federated learning framework based on local model validation

  • Kim, Jung-Jun;Kang, Jeon Seong;Chung, Hyun-Joon;Park, Byung-Hoon
    • Journal of the Korea Society of Computer and Information / v.27 no.11 / pp.13-18 / 2022
  • In this paper, we propose VW-FedAVG (Validation based Weighted FedAVG), which updates the global model by weighting the models of the devices participating in training according to their verified performance. The first method is a server-side validation structure, designed to validate each local client model on a validation dataset before the global model is updated. The second is a client-side validation structure, designed so that the validation dataset is evenly distributed to each client and the global model is updated after validation. Experiments on MNIST and CIFAR-10 image classification under both IID and Non-IID distributions obtained higher accuracy than previous studies.
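
The idea of weighting client models by validation performance can be sketched as below. This is a minimal sketch under stated assumptions, not the VW-FedAVG update rule itself: the abstract does not give the exact weighting formula, so the example simply normalizes validation scores into averaging weights. The function name `validation_weighted_average` and the flat parameter lists are hypothetical.

```python
def validation_weighted_average(client_params, val_scores):
    """Average client parameter vectors, weighted by validation score.

    client_params: list of equal-length lists of floats, one per client.
    val_scores: non-negative validation scores (e.g. accuracy), one per client.
    Assumption: weights are the scores normalized to sum to 1.
    """
    if len(client_params) != len(val_scores):
        raise ValueError("one validation score is required per client")
    total = sum(val_scores)
    if total <= 0:
        raise ValueError("at least one validation score must be positive")
    dim = len(client_params[0])
    return [
        sum(score * params[i] for params, score in zip(client_params, val_scores)) / total
        for i in range(dim)
    ]
```

With equal scores this reduces to a plain unweighted average of the client models. A client whose score is zero contributes nothing to the global model.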

Improving the Performance of Threshold Bootstrap for Simulation Output Analysis (시뮬레이션 출력분석을 위한 임계값 부트스트랩의 성능개선)

  • Kim, Yun-Bae
    • Journal of Korean Institute of Industrial Engineers / v.23 no.4 / pp.755-767 / 1997
  • Analyzing autocorrelated data sets is still an open problem. Developing an easy and efficient method for severely positively correlated data sets, which are common in simulation output, is vital for the simulation community. The bootstrap is an easy and powerful tool for constructing non-parametric inferential procedures in modern statistical data analysis. The conventional bootstrap algorithm requires the iid assumption on the original data set. The proper choice of resampling units for generating replicates depends heavily on the structure of the original data set, that is, on whether the data are iid or autocorrelated. In this paper, a new bootstrap resampling scheme, the Threshold Bootstrap (TB), is proposed for analyzing autocorrelated data sets. A thorough literature survey of bootstrap methods for autocorrelated data sets is also provided. The theoretical foundations of the Threshold Bootstrap are studied and compared with those of other leading bootstrap sampling techniques for autocorrelated data sets. The performance of TB is reported using an M/M/1 queueing model, and a comparison with other resampling techniques on ARMA data sets is also reported.