Title/Summary/Keyword: False Positive Probability


Approaches for Improving Bloom Filter-Based Set Membership Query

  • Lee, HyunYong;Lee, Byung-Tak
    • Journal of Information Processing Systems / v.15 no.3 / pp.550-569 / 2019
  • We propose approaches for improving the Bloom filter in terms of false positive probability and membership query speed. To reduce the false positive probability, we propose a special type of additional Bloom filter that handles false positives caused by the original Bloom filter. Implementing the proposed approach for a routing table lookup, we show that it reduces the routing table lookup time by up to 28% compared to the original Bloom filter by handling most false positives within the fast memory. We also introduce an approach for improving the membership query speed. Taking a hash table-like approach while storing only values, the proposed approach achieves much faster membership queries than the original Bloom filter (e.g., 34 times faster with 10 subsets). Even compared to a hash table, our approach reduces the routing table lookup time by up to 58%.
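
For context, the sketch below shows a minimal standard Bloom filter in Python; it is not the authors' implementation, and the paper's contribution (handling the false positives such a filter produces with additional filters kept in fast memory) is only described in the abstract above, not reproduced here. The parameter choices (m, k) and the routing-prefix strings are illustrative assumptions.

```python
import hashlib

class BloomFilter:
    """Minimal standard Bloom filter: m bits, k hash functions (illustrative only)."""

    def __init__(self, m: int, k: int):
        self.m, self.k = m, k
        self.bits = 0  # an integer used as a bit array

    def _positions(self, item: str):
        # Derive k bit positions from salted SHA-256 digests of the item.
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{item}".encode()).hexdigest()
            yield int(digest, 16) % self.m

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits |= 1 << pos

    def __contains__(self, item: str) -> bool:
        # Never misses an added item, but may report a false positive.
        return all(self.bits & (1 << pos) for pos in self._positions(item))

bf = BloomFilter(m=1 << 16, k=4)
for prefix in ["10.0.0.0/8", "192.168.0.0/16"]:  # hypothetical routing prefixes
    bf.add(prefix)
print("10.0.0.0/8" in bf)     # True: genuine member
print("172.16.0.0/12" in bf)  # usually False; True here would be a false positive
```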

An Analysis on the Error Probability of A Bloom Filter (블룸필터의 오류 확률에 대한 분석)

  • Kim, SungYong;Kim, JiHong
    • Journal of the Korea Institute of Information Security & Cryptology / v.24 no.5 / pp.809-815 / 2014
  • As data sizes grow rapidly with advances in telecommunication techniques, developing and processing databases efficiently has become a major issue. The Bloom filter, used to look up a particular element in a given set, is a very useful structure because of its space efficiency. In this paper, we examine the error probabilities of the Bloom filter. In particular, we derive revised false positive rates of the Bloom filter using an experimental method. Finally, we analyze and compare the conventional false positive probability of the Bloom filter with the false decision probability proposed in this paper.
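
For reference, the conventional (idealized) false positive probability that the paper revisits is usually written as follows, assuming an m-bit filter, k independent uniform hash functions, and n inserted elements:

```latex
p_{\mathrm{fp}} \approx \left(1 - e^{-kn/m}\right)^{k},
\qquad k_{\mathrm{opt}} = \frac{m}{n}\ln 2
```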

Likelihood Based Confidence Intervals for the Difference of Proportions in Two Doubly Sampled Data with a Common False-Positive Error Rate

  • Lee, Seung-Chun
    • Communications for Statistical Applications and Methods / v.17 no.5 / pp.679-688 / 2010
  • Lee (2010) developed a confidence interval for the difference of binomial proportions in two doubly sampled data sets subject to false-positive errors. The confidence interval seems adequate for a general double sampling model subject to false-positive misclassification. However, in many applications the false-positive error rates could be the same. On this note, the construction of an asymptotic confidence interval is considered when the false-positive error rates are common. The coverage behaviors of nine likelihood-based confidence intervals are examined. It is shown that the confidence interval based on the Rao score with the expected information performs well in terms of coverage probability and expected width.
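
As a point of reference for score-type intervals, the sketch below computes the classical Wilson (score) interval for a single binomial proportion; it is only the single-sample building block and does not implement the paper's doubly sampled model with a common false-positive error rate. The counts in the usage line are made up.

```python
import math
import statistics

def wilson_interval(x: int, n: int, conf: float = 0.95):
    """Wilson (score) confidence interval for a binomial proportion x/n.

    Illustrative single-sample version only; double sampling and
    false-positive misclassification are not modeled here.
    """
    z = statistics.NormalDist().inv_cdf(1 - (1 - conf) / 2)  # ~1.96 for 95%
    p_hat = x / n
    denom = 1 + z ** 2 / n
    centre = (p_hat + z ** 2 / (2 * n)) / denom
    half = z * math.sqrt(p_hat * (1 - p_hat) / n + z ** 2 / (4 * n ** 2)) / denom
    return centre - half, centre + half

print(wilson_interval(12, 50))  # roughly (0.14, 0.37)
```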

Understanding the genetics of systemic lupus erythematosus using Bayesian statistics and gene network analysis

  • Nam, Seoung Wan;Lee, Kwang Seob;Yang, Jae Won;Ko, Younhee;Eisenhut, Michael;Lee, Keum Hwa;Shin, Jae Il;Kronbichler, Andreas
    • Clinical and Experimental Pediatrics / v.64 no.5 / pp.208-222 / 2021
  • The publication of genetic epidemiology meta-analyses has increased rapidly, but it has been suggested that many of the statistically significant results are false positives. In addition, most such meta-analyses have been redundant, duplicate, or erroneous, leading to research waste. Since most claimed candidate gene associations were false positives, correctly interpreting the published results is important. In this review, we emphasize the importance of interpreting the results of genetic epidemiology meta-analyses using Bayesian statistics and gene network analysis, which could also be applied to other diseases.

False-Positive Mycobacterium tuberculosis Detection: Ways to Prevent Cross-Contamination

  • Asgharzadeh, Mohammad;Ozma, Mahdi Asghari;Rashedi, Jalil;Poor, Behroz Mahdavi;Agharzadeh, Vahid;Vegari, Ali;Shokouhi, Behrooz;Ganbarov, Khudaverdi;Ghalehlou, Nima Najafi;Leylabadlo, Hamed Ebrahmzadeh;Kafil, Hossein Samadi
    • Tuberculosis and Respiratory Diseases / v.83 no.3 / pp.211-217 / 2020
  • The gold standard for diagnosing tuberculosis is the isolation of Mycobacterium tuberculosis through culture, but there is a probability of cross-contamination when samples are cultured simultaneously, causing false positives. This can result in delayed treatment of the underlying disease and drug side effects. In this paper, we review studies on false-positive cultures of M. tuberculosis and analyze the rate of occurrence, the contributing factors, and the extent of false positives. Ways to identify, reduce, and manage false positives are critical for all laboratories. In most cases, a false positive occurs in specimens with only one positive culture but a negative direct smear. The three most crucial factors in this regard are inappropriate technician practice, contamination of reagents, and aerosol production. Thus, to reduce false positives, good laboratory practice, together with whole-genome sequencing or genotyping of all positive culture samples using a robust, highly pure method and a rapid response, is essential for minimizing the false-positive rate. Indeed, molecular approaches and epidemiological surveillance can provide a valuable tool besides culture to identify possible false positives.

Interval Estimation of Population Proportion in a Double Sampling Scheme (이중표본에서 모비율의 구간추정)

  • Lee, Seung-Chun;Choi, Byong-Su
    • The Korean Journal of Applied Statistics / v.22 no.6 / pp.1289-1300 / 2009
  • The double sampling scheme is effective in reducing the sampling cost. However, doubly sampled data are contaminated by two types of error, namely false-positive and false-negative errors. These errors make the statistical analysis more difficult and require more sophisticated analysis tools. For instance, the Wald method for the interval estimation of a proportion does not work well. In fact, it is well known that the Wald confidence interval behaves very poorly in many sampling schemes. In this note, the properties of the Wald interval are investigated in terms of the coverage probability and the expected width. An alternative confidence interval based on the Agresti-Coull approach is recommended.
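
The poor coverage of the Wald interval and the improvement from the Agresti-Coull adjustment can be illustrated with a small simulation; the sketch below uses a single binomial sample (no double sampling or misclassification) with assumed values n = 30 and p = 0.1.

```python
import math
import random
import statistics

Z = statistics.NormalDist().inv_cdf(0.975)  # two-sided 95%

def wald(x: int, n: int):
    p = x / n
    half = Z * math.sqrt(p * (1 - p) / n)
    return p - half, p + half

def agresti_coull(x: int, n: int):
    # Add z^2/2 pseudo-successes and z^2/2 pseudo-failures, then use the Wald form.
    n_adj = n + Z ** 2
    p_adj = (x + Z ** 2 / 2) / n_adj
    half = Z * math.sqrt(p_adj * (1 - p_adj) / n_adj)
    return p_adj - half, p_adj + half

def coverage(interval, n=30, p=0.1, reps=20_000, seed=1):
    """Fraction of simulated samples whose interval contains the true p."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        x = sum(rng.random() < p for _ in range(n))
        lo, hi = interval(x, n)
        hits += lo <= p <= hi
    return hits / reps

print("Wald coverage:         ", coverage(wald))           # typically well below 0.95
print("Agresti-Coull coverage:", coverage(agresti_coull))  # much closer to 0.95
```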

Improved Fusion Method of Detection Features in SAR ATR System (SAR 자동표적인식 시스템에서의 탐지특징 결합 방법 개선 방안)

  • Cha, Min-Jun;Kim, Hyung-Myung
    • Journal of the Korea Institute of Military Science and Technology / v.13 no.3 / pp.461-469 / 2010
  • In this paper, we propose an improved fusion method for detection features that can enhance the detection probability at a given false alarm rate in the prescreening stage of a SAR ATR (Synthetic Aperture Radar Automatic Target Recognition) system. Since the detection features are positively correlated, the detection performance can be improved if the joint probability distribution of the detection features is considered in the fusion process. The detection region is designed as a simple piecewise linear function that can be represented by a few parameters. The parameters of the detection region are derived by training on sample SAR images to maximize the detection probability at the given false alarm rate. Simulation results show that the detection performance of the proposed method is improved for all combinations of detection features.
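
To make the idea concrete, here is a toy sketch of a detector that fuses two features through a piecewise linear decision boundary. The feature names, knot positions, and thresholds are illustrative assumptions, and the training step that would tune the knots on sample SAR images to maximize detection at a fixed false alarm rate is not shown.

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class PiecewiseLinearDetector:
    """Declare a detection when feature 2 exceeds a piecewise linear threshold on feature 1.

    The knots stand in for the few parameters that would be tuned on training
    SAR imagery; the values used below are illustrative only.
    """
    knots: List[Tuple[float, float]]  # (feature1 value, feature2 threshold), sorted by feature1

    def threshold(self, f1: float) -> float:
        pts = self.knots
        if f1 <= pts[0][0]:
            return pts[0][1]
        for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
            if f1 <= x1:
                t = (f1 - x0) / (x1 - x0)  # linear interpolation between knots
                return y0 + t * (y1 - y0)
        return pts[-1][1]                  # clamp beyond the last knot

    def detect(self, feature1: float, feature2: float) -> bool:
        return feature2 >= self.threshold(feature1)

# Example: require less of feature 2 when feature 1 (e.g., local contrast) is already strong.
detector = PiecewiseLinearDetector(knots=[(0.0, 0.9), (0.5, 0.6), (1.0, 0.4)])
print(detector.detect(0.7, 0.55))  # True: the joint evidence crosses the boundary
```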

Confidence Intervals for the Difference of Binomial Proportions in Two Doubly Sampled Data

  • Lee, Seung-Chun
    • Communications for Statistical Applications and Methods / v.17 no.3 / pp.309-318 / 2010
  • The construction of asymptotic confidence intervals is considered for the difference of binomial proportions in two doubly sampled data sets subject to false-positive error. The coverage behaviors of several likelihood-based confidence intervals and a Bayesian confidence interval are examined. It is shown that a hierarchical Bayesian approach gives a confidence interval with good frequentist properties. The confidence interval based on the Rao score also performs well in terms of coverage probability. However, the Wald confidence interval covers the true value less often than the nominal level.
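
For comparison, the sketch below shows the naive Wald interval for the difference of two independent binomial proportions; this is only the simple baseline of the kind reported above to cover the true value less often than the nominal level, and it ignores the double sampling and false-positive misclassification entirely. The counts in the usage line are made up.

```python
import math
import statistics

def wald_diff_interval(x1: int, n1: int, x2: int, n2: int, conf: float = 0.95):
    """Naive Wald interval for p1 - p2 from two independent binomial samples.

    Double sampling and false-positive misclassification are not modeled;
    this is only the simple baseline that tends to undercover.
    """
    z = statistics.NormalDist().inv_cdf(1 - (1 - conf) / 2)
    p1, p2 = x1 / n1, x2 / n2
    diff = p1 - p2
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    return diff - z * se, diff + z * se

print(wald_diff_interval(18, 60, 9, 55))  # roughly (-0.02, 0.29)
```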

AUC and VUS using truncated distributions (절단함수를 이용한 AUC와 VUS)

  • Hong, Chong Sun;Hong, Seong Hyuk
    • The Korean Journal of Applied Statistics / v.32 no.4 / pp.593-605 / 2019
  • Significant literature exists on the area under the ROC curve (AUC) and the volume under the ROC surface (VUS), which are statistical measures of the discriminant power of classification models. Whereas the partial AUC is restricted to the false positive rate, the two-way partial AUC is restricted to both the false positive rate and the true positive rate and has been suggested as more efficient and accurate than the partial AUC. The partial VUS as well as the three-way partial VUS were also developed for the ROC surface. In this paper, a proposed AUC is expressed with probability and integration using two truncated distribution functions restricted to both the false positive rate and the true positive rate. It is also found that this AUC is related to the two-way partial AUC. The three-way partial VUS for the ROC surface is likewise related to the VUS using truncated distribution functions. These AUC and VUS measures are represented and estimated in terms of Mann-Whitney statistics, and their parametric and non-parametric estimation methods are explored based on normal distributions and random samples.
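
The connection to Mann-Whitney statistics noted above can be stated compactly: the empirical AUC equals the proportion of (negative, positive) score pairs in which the positive case scores higher, with ties counted as one half. A minimal sketch with made-up scores follows; the truncated and partial variants studied in the paper are not implemented.

```python
from typing import Sequence

def auc_mann_whitney(neg_scores: Sequence[float], pos_scores: Sequence[float]) -> float:
    """Empirical AUC as the normalized Mann-Whitney U statistic.

    AUC = P(positive score > negative score), with ties counted as 1/2.
    Truncated / partial AUC variants are not implemented here.
    """
    wins = 0.0
    for s_neg in neg_scores:
        for s_pos in pos_scores:
            if s_pos > s_neg:
                wins += 1.0
            elif s_pos == s_neg:
                wins += 0.5
    return wins / (len(neg_scores) * len(pos_scores))

# Made-up classifier scores for negative and positive cases.
print(auc_mann_whitney([0.2, 0.55, 0.4], [0.8, 0.5, 0.9, 0.3]))  # 0.75
```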

A Text Mining-based Intrusion Log Recommendation in Digital Forensics (디지털 포렌식에서 텍스트 마이닝 기반 침입 흔적 로그 추천)

  • Ko, Sujeong
    • KIPS Transactions on Computer and Communication Systems / v.2 no.6 / pp.279-290 / 2013
  • In digital forensics, log files are stored as large data sets for the purpose of tracing users' past behaviors. It is difficult for investigators to manually analyze such large log data without clues. In this paper, we propose a text mining technique for extracting intrusion logs from a large log set to recommend reliable evidence to investigators. In the training stage, the proposed method extracts intrusion association words from a training log set using the Apriori algorithm after preprocessing, and the probability of intrusion for each association word is computed by combining support and confidence. Robinson's method of computing confidences for filtering spam mails is applied to extracting intrusion logs. As a result, an association word knowledge base is constructed that includes the weighted probability of intrusion for each association word, improving accuracy. In the test stage, the probabilities that a log in the test set is an intrusion log or a normal log are computed with Fisher's inverse chi-square classification algorithm based on the association word knowledge base, and intrusion logs are extracted by combining the results. The intrusion logs are then recommended to investigators. The proposed method trains in a way that clearly analyzes the meaning of data in unstructured large log data, compensating for the loss of accuracy caused by data ambiguity. In addition, by using Fisher's inverse chi-square classification algorithm, it reduces the false positive (FP) rate and the laborious effort of manually extracting evidence.
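
To illustrate the final combining step only, the sketch below follows a Robinson/SpamBayes-style Fisher inverse chi-square combination of per-word intrusion probabilities into a single score. The Apriori-based association-word extraction, the knowledge base, and the exact weighting used in the paper are not reproduced, and the probabilities in the example are made up.

```python
import math
from typing import Iterable, List

def chi2_sf_even_df(x: float, df: int) -> float:
    """Survival function P(X >= x) for a chi-square variable with even df."""
    m = df // 2
    term = math.exp(-x / 2.0)
    total = term
    for i in range(1, m):
        term *= (x / 2.0) / i
        total += term
    return min(total, 1.0)

def fisher_combine(probs: Iterable[float]) -> float:
    """Fisher's inverse chi-square combination of probabilities in (0, 1)."""
    probs = list(probs)
    stat = -2.0 * sum(math.log(p) for p in probs)
    return chi2_sf_even_df(stat, 2 * len(probs))

def intrusion_score(word_probs: List[float]) -> float:
    """Robinson-style indicator in [0, 1]; values near 1 suggest an intrusion log.

    word_probs are per-association-word probabilities that a log line is an
    intrusion, which would come from the knowledge base (made-up values below).
    """
    s = 1.0 - fisher_combine(1.0 - p for p in word_probs)  # "intrusion-ness"
    h = 1.0 - fisher_combine(word_probs)                   # "normal-ness"
    return (s - h + 1.0) / 2.0

print(intrusion_score([0.97, 0.91, 0.88, 0.60]))  # close to 1 -> likely intrusion
```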