• Title/Summary/Keyword: random sets


Machine Learning Based Intrusion Detection Systems for Class Imbalanced Datasets (클래스 불균형 데이터에 적합한 기계 학습 기반 침입 탐지 시스템)

  • Cheong, Yun-Gyung;Park, Kinam;Kim, Hyunjoo;Kim, Jonghyun;Hyun, Sangwon
    • Journal of the Korea Institute of Information Security & Cryptology, v.27 no.6, pp.1385-1395, 2017
  • This paper aims to develop an intrusion detection system (IDS) that accounts for class-imbalanced datasets. To this end, we first built a collection of training sets from the Kyoto 2006+ dataset, in which the amounts of normal and abnormal (intrusion) data are not balanced. We then ran a number of tests to evaluate the effectiveness of machine learning techniques for detecting intrusions. Our evaluation showed that the Random Forest algorithm achieved the best performance.

A unified measure of association for complex data obtained from independence tests (혼합자료에서 독립성검정에 의한 연관성 측정)

  • Lee, Seung-Chun;Huh, Moon Yul
    • The Korean Journal of Applied Statistics, v.34 no.4, pp.523-536, 2021
  • Although numerous measures of association exist, most lack generality in that they are not designed to measure the association between heterogeneous types of random variables. Yet many statistical analyses of complex data sets require a very sophisticated measure of association. In this note, the p-value of independence tests is used to obtain a measure of association. The proposed measure shows some consistency in measuring association between various types of random variables.

Precise Positioning of Farm Vehicle Using Plural GPS Receivers - Error Estimation Simulation and Positioning Fixed Point - (다중 GPS 수신기에 의한 농업용 차량의 정밀 위치 계측(I) - 오차추정 시뮬레이션 및 고정위치계측 -)

  • Kim, Sang-Cheol;Cho, Sung-In;Lee, Seung-Gi;Lee, W.Y.;Hong, Young-Gi;Kim, Gook-Hwan;Cho, Hee-Je;Gang, Ghi-Won
    • Journal of Biosystems Engineering, v.36 no.2, pp.116-121, 2011
  • This study was conducted to develop a robust navigator for positioning in precision farming by developing a plural GPS receiver with 4 sets of GPS antennas. To improve positioning accuracy by integrating GPS signals received simultaneously, an algorithm for processing plural GPS signals effectively was designed. The performance of the algorithm was tested using a simulation program and a fixed point in WGS 84 coordinates. The results of this study are summarized as follows. 1. Signals from 4 sets of lower-grade GPS receivers were integrated by a Kalman filter algorithm and a geometric algorithm to increase the positioning accuracy of the data. 2. The prototype was composed of 4 sets of GPS receivers and INS components. All Star receivers manufactured by CMC, a gyro compass made by KVH, a ground speed sensor, and integration S/W based on an RTOS (Real Time Operating System) were used. 3. The integration algorithm was simulated by a purpose-built program that could generate random position errors of less than 10 m, and was tested with the prototype at a fixed position. 4. When navigation data were integrated by geometrical correction and the Kalman filter algorithm, estimated positioning errors were less than 0.6 m and 1.0 m, respectively, in the simulation and fixed-position tests.
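
The error-estimation simulation described in this abstract can be illustrated with a minimal Python sketch. It is not the authors' program: the local planar coordinates in metres, the centroid used as the "geometric" combination, the measurement variance, and all function names are assumptions made for illustration. Only the four receivers and the random error below 10 m come from the abstract.

```python
import math
import random

def simulate_fix(true_xy, max_error_m, rng):
    """Return one noisy fix: the true position plus an error of at most max_error_m."""
    radius = rng.uniform(0.0, max_error_m)
    angle = rng.uniform(0.0, 2.0 * math.pi)
    return (true_xy[0] + radius * math.cos(angle),
            true_xy[1] + radius * math.sin(angle))

def centroid(fixes):
    """Geometric combination (assumed here): centroid of simultaneous fixes."""
    n = len(fixes)
    return (sum(x for x, _ in fixes) / n, sum(y for _, y in fixes) / n)

def kalman_fixed_point(points, meas_var=25.0):
    """Scalar Kalman filter per axis for a stationary target (no process noise)."""
    est_x, est_y = points[0]
    var = meas_var  # variance assigned to the initial estimate (assumed)
    for x, y in points[1:]:
        gain = var / (var + meas_var)
        est_x += gain * (x - est_x)
        est_y += gain * (y - est_y)
        var *= 1.0 - gain
    return est_x, est_y

def run_simulation(true_xy=(0.0, 0.0), receivers=4, epochs=200,
                   max_error_m=10.0, seed=1):
    """Simulate the receivers over several epochs and return the final error in metres."""
    rng = random.Random(seed)
    combined = [
        centroid([simulate_fix(true_xy, max_error_m, rng) for _ in range(receivers)])
        for _ in range(epochs)
    ]
    est = kalman_fixed_point(combined)
    return math.hypot(est[0] - true_xy[0], est[1] - true_xy[1])

error_m = run_simulation()
print(f"position error after filtering: {error_m:.2f} m")
```

With no process noise the filter reduces to a running mean of the combined fixes, which is the simplest stand-in for a fixed-position test.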

Field Distribution Characteristics of a Reverberation Chamber with 2D Diffuser Sets (2D 확산기를 이용한 전자파 잔향실 내의 필드 분포 특성)

  • Yang Wook;Rhee Joong-Geun
    • The Journal of Korean Institute of Electromagnetic Engineering and Science, v.16 no.4 s.95, pp.373-379, 2005
  • Papers on improving electromagnetic field uniformity in a reverberation chamber with a 1D quadratic residue diffuser of the Schroeder method have been published several times. In this paper, to obtain improved electromagnetic field characteristics and field uniformity in a reverberation chamber, cubical residue diffuser sets of the Schroeder type are designed for a chamber operating in the 2.3 GHz to 3 GHz band. The FDTD (Finite-Difference Time-Domain) technique is used to analyze the field characteristics in the chamber. The cubical residue algorithm and 2D arrangement show more randomness than the results of the previous study. The characteristics of tolerance, polarity, and deviation, as well as power efficiency, are improved with cubical residue diffuser sets in the chamber.
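
For readers unfamiliar with residue diffusers, the following Python sketch shows how the underlying number sequences are generated. The quadratic residue sequence n^2 mod N for a prime N is the standard Schroeder construction. The cubic variant (n^3 mod N) and the summed 2D arrangement are assumptions about what "cubical residue" and "2D" mean in this abstract, and the function names are hypothetical.

```python
def residue_sequence(prime, power=2):
    """Return n**power mod prime for n = 0 .. prime - 1 (power=2 is the Schroeder QRD)."""
    return [pow(n, power, prime) for n in range(prime)]

def residue_grid(prime, power=2):
    """2D arrangement (assumed form): (n**power + m**power) mod prime."""
    return [
        [(pow(n, power, prime) + pow(m, power, prime)) % prime for m in range(prime)]
        for n in range(prime)
    ]

quadratic = residue_sequence(7)        # 1D quadratic residue sequence
cubic = residue_sequence(7, power=3)   # assumed cubic residue analogue
grid = residue_grid(7, power=3)        # assumed 2D cubic arrangement
print(quadratic)
print(cubic)
```

In a physical diffuser each sequence value would be scaled to a well depth; that scaling depends on the design wavelength and is left out here.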

Generating Pairwise Comparison Set for Crowed Sourcing based Deep Learning (크라우드 소싱 기반 딥러닝 선호 학습을 위한 쌍체 비교 셋 생성)

  • Yoo, Kihyun;Lee, Donggi;Lee, Chang Woo;Nam, Kwang Woo
    • Journal of Korea Society of Industrial Information Systems, v.27 no.5, pp.1-11, 2022
  • With the development of deep learning technology, research on estimating preference rankings through learning is under way, with applications in fields such as web search, gene classification, recommendation systems, and image search. Approximation algorithms are used to estimate deep learning-based preference rankings; they build more than k comparison sets over all comparison targets to ensure adequate accuracy, and how the comparison sets are built affects learning. In this paper, we propose a k-disjoint comparison set generation algorithm and a k-chain comparison set generation algorithm, novel algorithms for generating paired comparison sets for crowdsourcing-based deep learning affinity measurements. In particular, the experiment confirmed that the k-chain algorithm, like the conventional circular generation algorithm, has a random nature that can support stable preference evaluation while ensuring connectivity between data.
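
The abstract does not spell out the k-disjoint or k-chain algorithms, so the Python sketch below only illustrates one plausible reading of the conventional circular generation it mentions: items are shuffled onto a ring and each item is paired with its next few neighbours. The function name, the `offsets` parameter, and the ring interpretation are assumptions.

```python
import random
from collections import Counter

def circular_pairs(items, offsets, seed=None):
    """Pair each item with its next `offsets` neighbours on a shuffled ring.

    Every item then appears in exactly 2 * offsets pairs, and the pairs
    form a connected comparison graph.
    """
    ring = list(items)
    random.Random(seed).shuffle(ring)  # random placement on the ring
    n = len(ring)
    if 2 * offsets >= n:
        raise ValueError("need len(items) > 2 * offsets to avoid duplicate pairs")
    return [
        (ring[i], ring[(i + step) % n])
        for step in range(1, offsets + 1)
        for i in range(n)
    ]

pairs = circular_pairs(["a", "b", "c", "d", "e", "f"], offsets=2, seed=0)
appearances = Counter(item for pair in pairs for item in pair)
print(len(pairs), dict(appearances))
```

Because offset 1 links every item to its ring neighbour, the comparison graph stays connected however the shuffle falls.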

Prediction of Tumor Progression During Neoadjuvant Chemotherapy and Survival Outcome in Patients With Triple-Negative Breast Cancer

  • Heera Yoen;Soo-Yeon Kim;Dae-Won Lee;Han-Byoel Lee;Nariya Cho
    • Korean Journal of Radiology, v.24 no.7, pp.626-639, 2023
  • Objective: To investigate the association of clinical, pathologic, and magnetic resonance imaging (MRI) variables with progressive disease (PD) during neoadjuvant chemotherapy (NAC) and distant metastasis-free survival (DMFS) in patients with triple-negative breast cancer (TNBC). Materials and Methods: This single-center retrospective study included 252 women with TNBC who underwent NAC between 2010 and 2019. Clinical, pathologic, and treatment data were collected. Two radiologists analyzed the pre-NAC MRI. After random allocation to the development and validation sets in a 2:1 ratio, we developed models to predict PD and DMFS using logistic regression and Cox proportional hazard regression, respectively, and validated them. Results: Among the 252 patients (age, 48.3 ± 10.7 years; 168 in the development set; 84 in the validation set), PD occurred in 17 patients and 9 patients in the development and validation sets, respectively. In the clinical-pathologic-MRI model, the metaplastic histology (odds ratio [OR], 8.0; P = 0.032), Ki-67 index (OR, 1.02; P = 0.044), and subcutaneous edema (OR, 30.6; P = 0.004) were independently associated with PD in the development set. The clinical-pathologic-MRI model showed a higher area under the receiver-operating characteristic curve (AUC) than the clinical-pathologic model (AUC: 0.69 vs. 0.54; P = 0.017) for predicting PD in the validation set. Distant metastases occurred in 49 patients and 18 patients in the development and validation sets, respectively. Residual disease in both the breast and lymph nodes (hazard ratio [HR], 6.0; P = 0.005) and the presence of lymphovascular invasion (HR, 3.3; P < 0.001) were independently associated with DMFS. The model consisting of these pathologic variables showed a Harrell's C-index of 0.86 in the validation set. Conclusion: The clinical-pathologic-MRI model, which considered subcutaneous edema observed using MRI, performed better than the clinical-pathologic model for predicting PD. However, MRI did not independently contribute to the prediction of DMFS.
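
The 2:1 random allocation reported in this abstract can be reproduced in outline with a short Python sketch. The patient identifiers, seed, and function name are placeholders, and the study's actual randomization procedure is not described in the abstract.

```python
import random

def split_two_to_one(ids, seed=None):
    """Shuffle the ids and split them 2:1 into development and validation sets."""
    shuffled = list(ids)
    random.Random(seed).shuffle(shuffled)
    cut = len(shuffled) * 2 // 3  # two thirds go to the development set
    return shuffled[:cut], shuffled[cut:]

# 252 placeholder patient ids, matching the cohort size in the abstract
development, validation = split_two_to_one(range(252), seed=42)
print(len(development), len(validation))
```

With 252 patients the cut falls at 168, which gives the 168 and 84 set sizes stated in the Results.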

Classification of Remote Sensing Data using Random Selection of Training Data and Multiple Classifiers (훈련 자료의 임의 선택과 다중 분류자를 이용한 원격탐사 자료의 분류)

  • Park, No-Wook;Yoo, Hee Young;Kim, Yihyun;Hong, Suk-Young
    • Korean Journal of Remote Sensing, v.28 no.5, pp.489-499, 2012
  • In this paper, a classifier ensemble framework for remote sensing data classification is presented that combines classification results generated from both different training sets and different classifiers. The core of the framework is to increase the diversity between classification results by using both different training sets and different classifiers, and thereby to improve classification accuracy. First, training sets with different sampling densities are generated and used as inputs for supervised classification with classifiers that show different discrimination capabilities. Several preliminary classification results are then combined via a majority voting scheme to generate a final classification result. A case study of land-cover classification using multi-temporal ENVISAT ASAR data sets is carried out to illustrate the potential of the framework. In the case study, nine classification results were combined, generated from three different training sets and three different classifiers: a maximum likelihood classifier, a multi-layer perceptron classifier, and a support vector machine. The case study results showed that complementary information on the discrimination of the land-cover classes of interest could be extracted within the proposed framework, and that the best classification accuracy was obtained. When different combinations were compared, combining classification results from classifiers with little diversity did not improve classification accuracy. Thus, it is recommended to ensure greater diversity between classifiers in the design of multiple classifier systems.
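
The majority voting step can be sketched in a few lines of Python. The label values and the three example runs are made up for illustration, and the tie-breaking rule (the label seen first among the tied ones wins) is an assumption; the abstract does not say how ties are resolved.

```python
from collections import Counter

def majority_vote(predictions):
    """Combine per-pixel labels from several classification runs.

    `predictions` is a list of equal-length label sequences, one per run.
    Ties go to the label that appears first among the tied ones.
    """
    return [
        Counter(labels).most_common(1)[0][0]
        for labels in zip(*predictions)
    ]

# Three hypothetical runs over three pixels
runs = [
    ["water", "rice", "forest"],
    ["water", "forest", "forest"],
    ["rice", "rice", "forest"],
]
combined = majority_vote(runs)
print(combined)  # ['water', 'rice', 'forest']
```

In the case study, nine such runs (three training sets times three classifiers) would be passed in place of the three shown here.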

A Generalized Subtractive Algorithm for Subset Sum Problem (부분집합 합 문제의 일반화된 감산 알고리즘)

  • Lee, Sang-Un
    • The Journal of the Institute of Internet, Broadcasting and Communication, v.22 no.2, pp.9-14, 2022
  • This paper presents a subset sum problem (SSP) algorithm with time complexity O(n log n). The SSP can be classified as either a super-increasing sequence problem or a random sequence problem, depending on the elements of set S. An additive algorithm that runs in O(n log n) has already been proposed and used for the super-increasing sequence SSP, but the exhaustive brute-force method with time complexity O(n·2^n) remains the only viable algorithm for the random sequence SSP, which is thus considered NP-complete. The proposed subtractive algorithm selects a subset S comprised of values lower than the target value t, sets the subset sum less the target value as the residual r, and then removes from S the maximum value among those lower than t. When tested on various super-increasing and random sequence SSPs, the algorithm obtained optimal solutions in fewer iterations than the cardinality of S. It can therefore be used as a general algorithm for the SSP.
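
To make the two baseline approaches named in this abstract concrete, here is a minimal Python sketch of the standard greedy method for super-increasing sequences and an exhaustive brute-force search. Neither is the paper's subtractive algorithm, which the abstract does not describe in enough detail to reproduce. The function names and sample values are illustrative.

```python
from itertools import combinations

def greedy_super_increasing(values, target):
    """Greedy subset sum for a super-increasing sequence.

    Sorting dominates the cost, so this runs in O(n log n).
    Returns the chosen values, or None if no subset sums to target.
    """
    remaining = target
    chosen = []
    for value in sorted(values, reverse=True):
        if value <= remaining:
            chosen.append(value)
            remaining -= value
    return chosen if remaining == 0 else None

def brute_force(values, target):
    """Exhaustive search over all subsets; exponential in len(values)."""
    for size in range(len(values) + 1):
        for subset in combinations(values, size):
            if sum(subset) == target:
                return list(subset)
    return None

S = [1, 2, 4, 9, 20, 38]  # each element exceeds the sum of all earlier ones
print(greedy_super_increasing(S, 29))  # [20, 9]
print(brute_force(S, 29))              # [9, 20]
```

The greedy scan is only guaranteed to work when the sequence is super-increasing; for arbitrary sequences it can miss valid subsets, which is why brute force is shown alongside it.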

Single-step genomic evaluation for growth traits in a Mexican Braunvieh cattle population

  • Jonathan Emanuel Valerio-Hernandez;Agustin Ruiz-Flores;Mohammad Ali Nilforooshan;Paulino Perez-Rodriguez
    • Animal Bioscience, v.36 no.7, pp.1003-1009, 2023
  • Objective: The objective was to compare (pedigree-based) best linear unbiased prediction (BLUP), genomic BLUP (GBLUP), and single-step GBLUP (ssGBLUP) methods for genomic evaluation of growth traits in a Mexican Braunvieh cattle population. Methods: Birth (BW), weaning (WW), and yearling weight (YW) data of a Mexican Braunvieh cattle population were analyzed with BLUP, GBLUP, and ssGBLUP methods. These methods are differentiated by the additive genetic relationship matrix included in the model and the animals under evaluation. The predictive ability of the model was evaluated using random partitions of the data into training and testing sets, consistently predicting about 20% of genotyped animals on all occasions. For each partition, the Pearson correlation coefficient between phenotypes adjusted for fixed effects and non-genetic random effects and the estimated breeding values (EBV) was computed. Results: The random contemporary group (CG) effect explained about 50%, 45%, and 35% of the phenotypic variance in BW, WW, and YW, respectively. For the three methods, the CG effect explained the highest proportion of the phenotypic variances (except for YW-GBLUP). The heritability estimate obtained with GBLUP was the lowest for BW, while the highest heritability was obtained with BLUP. For WW, the highest heritability estimate was obtained with BLUP; the estimates obtained with GBLUP and ssGBLUP were similar. For YW, the heritability estimates obtained with GBLUP and BLUP were similar, and the lowest heritability was obtained with ssGBLUP. Pearson correlation coefficients between phenotypes adjusted for non-genetic effects and EBVs were the highest for BLUP, followed by ssGBLUP and GBLUP. Conclusion: The successful implementation of genetic evaluations that include genotyped and non-genotyped animals in our study indicates a promising method for use in genetic improvement programs of Braunvieh cattle. Our findings showed that simultaneous evaluation of genotyped and non-genotyped animals improved prediction accuracy for growth traits even with a limited number of genotyped animals.

SRC-Stat Package for Fitting Double Hierarchical Generalized Linear Models (이중 다단계 일반화 선형모형 적합을 위한 SRC-stat의 사용)

  • Noh, Maengseok;Ha, Il Do;Lee, Youngjo;Lim, Johan;Lee, Jaeyong;Oh, Heeseok;Shin, Dongwan;Lee, Sanggoo;Seo, Jinuk;Park, Yonhtae;Cho, Sungzoon;Park, Jonghun;Kim, Youkyung;You, Kyungsang
    • The Korean Journal of Applied Statistics, v.28 no.2, pp.343-351, 2015
  • We describe how to fit random-effects models with the SRC-Stat statistical package. This package has been developed to fit double hierarchical generalized linear models, in which the mean and the dispersion parameters for the variance of random effects and the residual variance (overdispersion) can be modeled as random-effect models. The estimates of fixed effects, random effects, and variances are calculated by a hierarchical likelihood method. We illustrate the use of the package with practical data sets.