• Title/Summary/Keyword: False Positive data

An Adaptive Watermark Detection Algorithm for Vector Geographic Data

  • Wang, Yingying;Yang, Chengsong;Ren, Na;Zhu, Changqing;Rui, Ting;Wang, Dong
    • KSII Transactions on Internet and Information Systems (TIIS) / v.14 no.1 / pp.323-343 / 2020
  • With the rapid development of computer and communication techniques, copyright protection of vector geographic data has attracted considerable research attention because of the high cost of such data. A novel adaptive watermark detection algorithm is proposed for vector geographic data that can be used to qualitatively analyze the robustness of watermarks against data addition attacks. First, a watermark is embedded into the vertex coordinates based on coordinate mapping and quantization. Second, an adaptive watermark detection model, capable of calculating the detection threshold, false positive error (FPE), and false negative error (FNE), is established, and the characteristics of the adaptive watermark detection algorithm are analyzed. Finally, experiments on several real-world vector maps show the usability and robustness of the proposed algorithm.
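
A minimal sketch of quantization-based watermark embedding in vertex coordinates, in the spirit of the abstract. The paper's coordinate mapping and adaptive detection-threshold model are not reproduced; the step size and bit layout below are illustrative assumptions.

```python
# Sketch: embed one watermark bit per coordinate by snapping the coordinate
# to one of two interleaved quantizer lattices (QIM). DELTA is an assumed
# step size trading distortion against robustness.
import numpy as np

DELTA = 1e-4

def embed_bit(coord: float, bit: int) -> float:
    """Snap a coordinate to the quantizer lattice that encodes `bit`."""
    offset = (bit * DELTA) / 2.0          # lattice 0 for bit 0, shifted for bit 1
    return np.round((coord - offset) / DELTA) * DELTA + offset

def detect_bit(coord: float) -> int:
    """Decide which lattice the coordinate is closer to."""
    d0 = abs(coord - embed_bit(coord, 0))
    d1 = abs(coord - embed_bit(coord, 1))
    return 0 if d0 <= d1 else 1

# Embed a repeated 16-bit mark into mock x-coordinates of map vertices.
rng = np.random.default_rng(42)
vertices = rng.uniform(120.0, 121.0, size=100)
watermark = rng.integers(0, 2, size=16)
marked = np.array([embed_bit(v, watermark[i % 16]) for i, v in enumerate(vertices)])
recovered = [detect_bit(v) for v in marked[:16]]
print("recovered bits match:", list(watermark) == recovered)
```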

A Pre-processing Study to Solve the Problem of Rare Class Classification of Network Traffic Data (네트워크 트래픽 데이터의 희소 클래스 분류 문제 해결을 위한 전처리 연구)

  • Ryu, Kyung Joon;Shin, DongIl;Shin, DongKyoo;Park, JeongChan;Kim, JinGoog
    • KIPS Transactions on Software and Data Engineering / v.9 no.12 / pp.411-418 / 2020
  • In the field of information security, IDS (Intrusion Detection System) is normally classified into two categories: signature-based IDS and anomaly-based IDS. Many studies of anomaly-based IDS have analyzed network traffic data generated in cyberspace with machine learning algorithms. In this paper, we studied pre-processing methods to overcome the performance degradation caused by rare classes. We evaluated the classification performance of a machine learning algorithm by reconstructing the data set around its rare and semi-rare classes. After reconstructing the data into three different sets, wrapper and filter feature selection methods were applied in turn, and each data set was normalized with a quantile scaler. A deep neural network model was used for learning and validation, and the evaluation results were compared in terms of true positive and false negative counts. We obtained improved classification performance on all three data sets.
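
A minimal sketch of the pipeline just described: filter feature selection, quantile scaling, then a small neural network. The rare/semi-rare class reconstruction and the wrapper selection step are paper-specific; the mock data and parameter choices below are assumptions.

```python
# Sketch: filter feature selection -> quantile scaling -> DNN, evaluated by
# true positives and false negatives on an imbalanced mock data set.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import QuantileTransformer
from sklearn.feature_selection import SelectKBest, f_classif   # simple filter method
from sklearn.neural_network import MLPClassifier
from sklearn.metrics import confusion_matrix

# Mock traffic data with a rare class (~2% of samples).
X, y = make_classification(n_samples=5000, n_features=30,
                           weights=[0.98, 0.02], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

selector = SelectKBest(f_classif, k=15).fit(X_tr, y_tr)
X_tr, X_te = selector.transform(X_tr), selector.transform(X_te)
scaler = QuantileTransformer(output_distribution="uniform", random_state=0).fit(X_tr)
X_tr, X_te = scaler.transform(X_tr), scaler.transform(X_te)

clf = MLPClassifier(hidden_layer_sizes=(64, 32), max_iter=300,
                    random_state=0).fit(X_tr, y_tr)
tn, fp, fn, tp = confusion_matrix(y_te, clf.predict(X_te)).ravel()
print(f"true positives: {tp}, false negatives: {fn}")
```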

Analysis and Cut-off Adjustment of Dried Blood Spot 17alpha-hydroxyprogesterone Concentration by Birth Weight (신생아의 출생 체중에 따른 혈액 여과지 17alpha-hydroxyprogesterone의 농도 분석 및 판정 기준 조정)

  • Park, Seungman;Kwon, Aerin;Yang, Songhyeon;Park, Euna;Choi, Jaehwang;Hwang, Mijung;Nam, Hyeongyeong;Lee, Eunhee
    • Journal of The Korean Society of Inherited Metabolic disease / v.14 no.2 / pp.150-155 / 2014
  • The measurement of 17α-hydroxyprogesterone (17α-OHP) in a dried blood spot on filter paper is important for screening for congenital adrenal hyperplasia (CAH). Since high levels of 17α-OHP are frequently observed in premature infants without congenital adrenal hyperplasia, we evaluated and validated cut-offs based on birth weight. Birth weight and 17α-OHP concentration data from 292,204 newborn screening subjects at Green Cross Laboratories were analyzed. Cut-off values based on birth weight were newly evaluated and validated against the original data. The mean 17α-OHP concentrations were 7.25 ng/mL in the very low birth weight (VLBW) group, 4.02 ng/mL in the low birth weight (LBW) group, 2.53 ng/mL in the normal birth weight (NBW) group, and 2.24 ng/mL in the heavy birth weight (HBW) group. The cut-offs for CAH were set as follows: 21.12 ng/mL for the VLBW and LBW groups and 11.14 ng/mL for the NBW and HBW groups. When the new cut-offs were applied to the original data, positive rates in the VLBW and LBW groups decreased and positive rates in the NBW and HBW groups increased. Cut-offs based on birth weight should be used in screening for CAH. We believe that our new cut-offs reduce both the false positive rate and the false negative rate, and that our experience in setting up and validating cut-offs will be helpful for other laboratories performing newborn screening tests.
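
A small sketch of applying the birth-weight-stratified cut-offs reported above (21.12 ng/mL for VLBW/LBW, 11.14 ng/mL for NBW/HBW). The 2,500 g boundary separating LBW from NBW is the conventional definition and is an assumption, not stated in the abstract.

```python
# Sketch: screen-positive decision using the abstract's cut-offs.
# The 2500 g LBW/NBW boundary is assumed (standard definition).
def screen_17ohp(birth_weight_g: float, ohp_ng_ml: float) -> bool:
    """Return True if the 17α-OHP result is screen-positive for CAH."""
    cutoff = 21.12 if birth_weight_g < 2500 else 11.14  # ng/mL
    return ohp_ng_ml > cutoff

print(screen_17ohp(1200, 15.0))  # VLBW infant, below its cut-off -> False
print(screen_17ohp(3300, 15.0))  # NBW infant, above its cut-off -> True
```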

Theoretical Considerations for the Agresti-Coull Type Confidence Interval in Misclassified Binary Data (오분류된 이진자료에서 Agresti-Coull유형의 신뢰구간에 대한 이론적 고찰)

  • Lee, Seung-Chun
    • Communications for Statistical Applications and Methods / v.18 no.4 / pp.445-455 / 2011
  • Although misclassified binary data occur frequently in practice, the statistical methodology available for such data is rather limited. In particular, interval estimation of the population proportion has relied on the classical Wald method. Recently, Lee and Choi (2009) developed a new confidence interval by applying the Agresti-Coull approach and showed the efficiency of their proposed confidence interval numerically, but a theoretical justification had not yet been explored. Therefore, a Bayesian model for misclassified binary data is developed to consider the Agresti-Coull confidence interval from a theoretical point of view. It is shown that the Agresti-Coull confidence interval is essentially a Bayesian confidence interval.
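
For reference, the standard Agresti-Coull interval for a single binomial proportion adds z²/2 artificial successes and z²/2 artificial failures, then forms a Wald-type interval around the adjusted estimate. The misclassification-adjusted version studied by Lee and Choi (2009) builds on this idea but is not reproduced here.

```python
# Standard Agresti-Coull interval for a binomial proportion.
from math import sqrt
from scipy.stats import norm

def agresti_coull(x: int, n: int, alpha: float = 0.05):
    z = norm.ppf(1 - alpha / 2)
    n_tilde = n + z**2                    # adjusted sample size
    p_tilde = (x + z**2 / 2) / n_tilde    # adjusted proportion estimate
    half = z * sqrt(p_tilde * (1 - p_tilde) / n_tilde)
    return p_tilde - half, p_tilde + half

print(agresti_coull(18, 100))  # e.g. 18 successes out of 100 trials
```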

The Role of Artificial Observations in Testing for the Difference of Proportions in Misclassified Binary Data

  • Lee, Seung-Chun
    • The Korean Journal of Applied Statistics / v.25 no.3 / pp.513-520 / 2012
  • An Agresti-Coull type test is considered for the difference of binomial proportions in two doubly sampled data sets subject to false-positive error. The performance of the test is compared with that of likelihood-based tests. It is shown that the Agresti-Coull test has many desirable properties, in that it approximates the nominal significance level with comparable power.
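
The general idea behind an Agresti-Coull type test is to augment each sample with artificial observations before forming the usual z statistic. The sketch below does this for two ordinary binomial samples; with c = 2 it matches the "add one success and one failure to each sample" adjustment of Agresti and Caffo (2000). The paper's version for doubly sampled data with false-positive error is more involved and is not reproduced here.

```python
# Sketch: z test for p1 - p2 after adding c/2 artificial successes and
# c/2 artificial failures to each sample.
from math import sqrt
from scipy.stats import norm

def ac_difference_test(x1: int, n1: int, x2: int, n2: int, c: float = 2.0):
    p1 = (x1 + c / 2) / (n1 + c)
    p2 = (x2 + c / 2) / (n2 + c)
    se = sqrt(p1 * (1 - p1) / (n1 + c) + p2 * (1 - p2) / (n2 + c))
    z = (p1 - p2) / se
    return z, 2 * norm.sf(abs(z))  # statistic and two-sided p-value

print(ac_difference_test(40, 100, 25, 100))
```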

Confidence Intervals for the Difference of Binomial Proportions in Two Doubly Sampled Data

  • Lee, Seung-Chun
    • Communications for Statistical Applications and Methods / v.17 no.3 / pp.309-318 / 2010
  • The construction of asymptotic confidence intervals is considered for the difference of binomial proportions in two doubly sampled data sets subject to false-positive error. The coverage behaviors of several likelihood-based confidence intervals and a Bayesian confidence interval are examined. It is shown that a hierarchical Bayesian approach gives a confidence interval with good frequentist properties. The confidence interval based on the Rao score also performs well in terms of coverage probability, whereas the Wald confidence interval covers the true value less often than the nominal level.
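
A short simulation of the kind used to examine coverage behavior: draw many binomial samples, compute a Wald interval for p1 − p2, and count how often it covers the true difference. Double sampling and false-positive error, which the paper models, are omitted in this sketch.

```python
# Sketch: Monte Carlo coverage probability of the Wald interval for p1 - p2.
import numpy as np
from scipy.stats import norm

def wald_coverage(p1, p2, n1, n2, alpha=0.05, reps=20000, seed=1):
    rng = np.random.default_rng(seed)
    z = norm.ppf(1 - alpha / 2)
    x1 = rng.binomial(n1, p1, reps)
    x2 = rng.binomial(n2, p2, reps)
    ph1, ph2 = x1 / n1, x2 / n2
    se = np.sqrt(ph1 * (1 - ph1) / n1 + ph2 * (1 - ph2) / n2)
    diff = ph1 - ph2
    covered = (diff - z * se <= p1 - p2) & (p1 - p2 <= diff + z * se)
    return covered.mean()

# Wald intervals tend to cover less often than the 95% nominal level for
# small n or extreme p, consistent with the abstract's conclusion.
print(wald_coverage(0.1, 0.05, 30, 30))
```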

Study of Snort Intrusion Detection Rules for Recognition of Intelligent Threats and Response of Active Detection (지능형 위협인지 및 능동적 탐지대응을 위한 Snort 침입탐지규칙 연구)

  • Han, Dong-hee;Lee, Sang-jin
    • Journal of the Korea Institute of Information Security & Cryptology / v.25 no.5 / pp.1043-1057 / 2015
  • In order to recognize intelligent threats quickly and to detect and respond to them actively, major public bodies and private institutions operate Intrusion Detection Systems (IDS), which play a very important role in finding and detecting attacks. However, most IDS deployments suffer from a high rate of false positive alerts. In addition, to detect unknown malicious code and recognize and respond to its threats in advance, APT response solutions and behavior-based systems have been introduced; these execute malicious code directly using virtualization technology and detect abnormal activity in virtual environments, or detect unknown attacks by other methods. However, these too have weaknesses, such as evasion of the virtual environment, performance problems with full traffic inspection, and policy errors. Accordingly, enhancing security monitoring is essential for effective intrusion detection. This study discusses reducing false positives as a way to enhance security monitoring. From an experiment based on the empirical data of G, rules of three types and 11 kinds were derived. In tests following these rules, the overall detection volume decreased by 30% to 50%, and performance improved by over 30%.
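
The paper's actual Snort rules are not given in the abstract. As an illustrative sketch only, one common false positive reduction workflow is to rank signatures by how often their alerts are later judged benign and flag the noisiest ones for refinement; the log format and threshold below are hypothetical.

```python
# Sketch: rank signatures by false positive ratio from a reviewed alert log.
from collections import Counter

# (signature id, analyst verdict) pairs -- mock data, hypothetical format.
reviewed_alerts = [
    (1000001, "false_positive"), (1000001, "false_positive"),
    (1000001, "true_positive"),  (1000002, "true_positive"),
    (1000003, "false_positive"),
]

fp_counts = Counter(sid for sid, verdict in reviewed_alerts
                    if verdict == "false_positive")
totals = Counter(sid for sid, _ in reviewed_alerts)

# Flag rules whose false positive ratio exceeds a chosen threshold.
for sid in totals:
    ratio = fp_counts[sid] / totals[sid]
    if ratio > 0.5:
        print(f"sid {sid}: FP ratio {ratio:.0%} -> candidate for rule refinement")
```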

A Study on the Improvement of Source Code Static Analysis Using Machine Learning (기계학습을 이용한 소스코드 정적 분석 개선에 관한 연구)

  • Park, Yang-Hwan;Choi, Jin-Young
    • Journal of the Korea Institute of Information Security & Cryptology / v.30 no.6 / pp.1131-1139 / 2020
  • Static analysis of source code is used to find residual security weaknesses across a wide range of source code. A static analysis tool produces candidate findings, and a static analysis expert then classifies each finding as a true or false positive. In this process, the volume of findings is large and the false positive rate is high, so considerable time and effort are required, and a more efficient analysis method is needed. Moreover, when judging true and false positives, experts rarely examine only the single line where a defect was reported; depending on the defect type, the surrounding source code is analyzed together before the final verdict is delivered. To ease the experts' burden of discriminating true and false positives from static analysis tools, this paper proposes a method of determining, through machine learning rather than expert review, whether a security weakness reported by a static analysis tool is a true positive. In addition, an experiment was conducted to see how the size of the training data (the source code around each defect) affects performance, and the optimal size was identified. These results are expected to help static analysis experts classify true and false positives after static analysis.
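
A sketch of the experiment's key variable: how many lines of surrounding source code to include with the flagged line when training a true/false positive classifier. The file handling, vectorizer, and model choice here are illustrative, not the paper's.

```python
# Sketch: extract a context window around a flagged line, then train a simple
# text classifier on (snippet, verdict) pairs.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.ensemble import RandomForestClassifier
from sklearn.pipeline import make_pipeline

def context_window(source_lines, defect_line_no, radius):
    """Return the flagged line plus `radius` lines of context on each side."""
    lo = max(0, defect_line_no - 1 - radius)
    hi = min(len(source_lines), defect_line_no + radius)
    return "\n".join(source_lines[lo:hi])

source = ["int main() {", "  char buf[8];", "  strcpy(buf, argv[1]);",
          "  return 0;", "}"]
print(context_window(source, defect_line_no=3, radius=1))  # flagged line ± 1

# Mock training data: 1 = true positive, 0 = false positive (analyst verdicts).
snippets = ["strcpy(buf, input);", "if (len < MAX) strncpy(buf, input, len);"]
labels = [1, 0]
model = make_pipeline(TfidfVectorizer(token_pattern=r"\w+"),
                      RandomForestClassifier(random_state=0))
model.fit(snippets, labels)
print(model.predict(["strcpy(dst, user_data);"]))
```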

A Design of FHIDS(Fuzzy logic based Hybrid Intrusion Detection System) using Naive Bayesian and Data Mining (나이브 베이지안과 데이터 마이닝을 이용한 FHIDS(Fuzzy Logic based Hybrid Intrusion Detection System) 설계)

  • Lee, Byung-Kwan;Jeong, Eun-Hee
    • The Journal of Korea Institute of Information, Electronics, and Communication Technology / v.5 no.3 / pp.158-163 / 2012
  • This paper proposes the design of an FHIDS (Fuzzy logic based Hybrid Intrusion Detection System) that detects both anomaly and misuse attacks using a Naive Bayesian algorithm, data mining, and fuzzy logic. The NB-AAD (Naive Bayesian based Anomaly Attack Detection) technique within the FHIDS detects anomaly attacks with a Naive Bayesian algorithm. The DM-MAD (Data Mining based Misuse Attack Detection) technique analyzes correlation rules among packets and detects new or transformed attacks by generating new rule-based patterns or extracting transformed rule-based patterns. The FLD (Fuzzy Logic based Decision) technique then judges attacks using the results of NB-AAD and DM-MAD. The FHIDS is thus a hybrid attack detection system that improves the detection ratio for transformed attacks and reduces the false positive ratio by detecting both anomaly and misuse attacks.
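
A minimal sketch of the hybrid idea only: a Naive Bayes anomaly score (NB-AAD stand-in) and a rule-based misuse score (DM-MAD stand-in) combined by a simple fuzzy-style weighting (FLD stand-in). The thresholds, membership functions, and association-rule mining are paper-specific and are only stubbed here with assumed placeholders.

```python
# Sketch: weighted combination of an anomaly score and a misuse score.
import numpy as np
from sklearn.naive_bayes import GaussianNB

rng = np.random.default_rng(0)
X_train = rng.normal(size=(200, 4))
y_train = rng.integers(0, 2, 200)
nb = GaussianNB().fit(X_train, y_train)          # NB-AAD stand-in

def misuse_score(packet_features) -> float:
    """DM-MAD stand-in: 1.0 if a (placeholder) mined rule matches, else 0.0."""
    return 1.0 if packet_features[0] > 2.0 else 0.0

def fuzzy_decision(packet_features, w_anomaly=0.5, w_misuse=0.5,
                   threshold=0.5) -> bool:
    anomaly = nb.predict_proba([packet_features])[0, 1]   # P(attack) from NB
    combined = w_anomaly * anomaly + w_misuse * misuse_score(packet_features)
    return combined >= threshold                          # FLD stand-in

print(fuzzy_decision(rng.normal(size=4)))
```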

Anomaly detection and attack type classification mechanism using Extra Tree and ANN (Extra Tree와 ANN을 활용한 이상 탐지 및 공격 유형 분류 메커니즘)

  • Kim, Min-Gyu;Han, Myung-Mook
    • Journal of Internet Computing and Services / v.23 no.5 / pp.79-85 / 2022
  • Anomaly detection is a method for detecting and blocking abnormal data flows in ordinary users' data sets. The conventional approach is signature-based detection, which defends against attacks using signatures of already-known attacks. It has the advantage of a low false positive rate, but it is very vulnerable to zero-day or modified attacks. Anomaly detection, by contrast, has the disadvantage of a high false positive rate, but it can identify, detect, and block zero-day and modified attacks, so related studies are being actively conducted. This study addresses such anomaly detection mechanisms and proposes a new mechanism that performs both anomaly detection and attack-type classification while mitigating the high false positive rate mentioned above. The experiment was conducted with five configurations chosen for the characteristics of the various algorithms, and the configuration showing the best accuracy is proposed as the result of this study. Attacks are first detected by applying an Extra Tree and a three-layer ANN at the same time; the attack type of the detected attack data is then classified using the Extra Tree. Verification was performed on the NSL-KDD data set, and the accuracies were 99.8%, 99.1%, 98.9%, 98.7%, and 97.9% for Normal, DoS, Probe, U2R, and R2L, respectively. This configuration showed superior performance compared to the other models.
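
A sketch of the two-stage mechanism described above: an Extra Trees model and a three-layer ANN jointly flag anomalies, and Extra Trees then assigns an attack type to the flagged records. NSL-KDD loading and tuning are omitted; the mock data, layer sizes, and agreement-based voting are assumptions.

```python
# Sketch: stage 1 detects attacks (Extra Trees + three-hidden-layer ANN in
# agreement), stage 2 classifies the flagged records by attack type.
import numpy as np
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 20))
is_attack = rng.integers(0, 2, 1000)     # stage 1 labels: normal vs attack
attack_type = rng.integers(0, 4, 1000)   # stage 2 labels: DoS/Probe/U2R/R2L

et_detect = ExtraTreesClassifier(random_state=0).fit(X, is_attack)
ann_detect = MLPClassifier(hidden_layer_sizes=(64, 32, 16), max_iter=300,
                           random_state=0).fit(X, is_attack)

# Stage 1: flag a record as an attack when both detectors agree.
flagged = (et_detect.predict(X) == 1) & (ann_detect.predict(X) == 1)

# Stage 2: classify only the flagged records by attack type.
et_classify = ExtraTreesClassifier(random_state=0).fit(
    X[is_attack == 1], attack_type[is_attack == 1])
print(et_classify.predict(X[flagged][:5]))
```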