• Title/Summary/Keyword: Conditional test

Named Entity Recognition for Patent Documents Based on Conditional Random Fields (조건부 랜덤 필드를 이용한 특허 문서의 개체명 인식)

  • Lee, Tae Seok;Shin, Su Mi;Kang, Seung Shik
    • KIPS Transactions on Software and Data Engineering / v.5 no.9 / pp.419-424 / 2016
  • Named entity recognition is needed to improve retrieval accuracy when searching patent documents, or similar patents, by their claims and descriptions. In this paper, we propose automatic named entity recognition for patents using a conditional random field (CRF), a widely used sequence-labeling method in machine learning. The system was trained on a tagged corpus of 660,000 words, and a further 70,000 words were used as a test set for evaluation. The experiment shows an accuracy of 93.6% and a Kappa coefficient of 0.67 between manual tagging and the automatic tagging system. This exceeds the Kappa coefficient of 0.6 reported for manually tagged results, which suggests that automatic named entity tagging can serve as a practical replacement for manual tagging of patent documents.
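
The agreement figure quoted in this abstract can be illustrated with a minimal sketch, assuming the reported Kappa coefficient is Cohen's kappa computed over per-token tags. The function name `cohen_kappa` and the toy tag sequences are hypothetical and are not taken from the paper.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa between two equally long label sequences."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label sequences must be non-empty and equally long")
    n = len(labels_a)
    # Observed agreement: fraction of positions where both taggers agree.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: product of each tagger's marginal tag frequencies.
    count_a = Counter(labels_a)
    count_b = Counter(labels_b)
    expected = sum(count_a[tag] * count_b[tag] for tag in count_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both taggers always emit the same single tag
    return (observed - expected) / (1.0 - expected)


# Toy data (invented for illustration): manual tags vs. automatic tags.
manual = ["O", "O", "TERM", "O", "TERM", "O"]
auto = ["O", "O", "TERM", "O", "O", "O"]
kappa = cohen_kappa(manual, auto)
```

Kappa discounts the agreement that two taggers would reach by chance, which is why a high raw accuracy (such as 93.6%) can coexist with a much lower kappa when one tag dominates the corpus.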

Analysis of Violent Crime Count Data Based on Bivariate Conditional Auto-Regressive Model (이변량 조건부자기회귀모형을이용한강력범죄자료분석)

  • Choi, Jung-Soon;Park, Man-Sik;Won, Yu-Bok;Kim, Hag-Yeol;Heo, Tae-Young
    • Communications for Statistical Applications and Methods / v.17 no.3 / pp.413-421 / 2010
  • In this study, we consider a bivariate conditional auto-regressive model that accounts for spatial association as well as the correlation between two dependent variables, the counts of murder and burglary. We conducted a likelihood ratio test to check for over-dispersion before applying spatial Poisson models. For the real application, we used the annual counts of violent crimes in the 25 districts of Seoul in 2007. The statistical results are illustrated visually using a geographical information system.

Ambiguity Resolution in Chinese Word Segmentation

  • Maosong, Sun;T'sou, Benjamin-K.
    • Proceedings of the Korean Society for Language and Information Conference / 1995.02a / pp.121-126 / 1995
  • This paper proposes a new method for Chinese word segmentation, Conditional F&BMM (Forward and Backward Maximal Matching), which incorporates both bigram statistics (i.e., mutual information and the difference of t-test between Chinese characters) and linguistic rules for ambiguity resolution. The key characteristics of this model are the use of: (i) statistics that can be automatically derived from any raw corpus, and (ii) a rule base for disambiguation, with consistency and controlled size, that can be built up in a systematic way.
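
As a rough illustration of the matching step named in this abstract, the sketch below implements plain forward and backward maximal matching and flags a sentence as ambiguous when the two segmentations disagree. It does not implement the paper's bigram statistics or rule base; the function names and the toy lexicon are assumptions made for the example.

```python
def forward_mm(text, lexicon, max_len):
    """Greedy left-to-right segmentation, longest lexicon word first."""
    words, i = [], 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            # Fall back to a single character when nothing longer matches.
            if size == 1 or piece in lexicon:
                words.append(piece)
                i += size
                break
    return words


def backward_mm(text, lexicon, max_len):
    """Greedy right-to-left segmentation, longest lexicon word first."""
    words, j = [], len(text)
    while j > 0:
        for size in range(min(max_len, j), 0, -1):
            piece = text[j - size:j]
            if size == 1 or piece in lexicon:
                words.append(piece)
                j -= size
                break
    words.reverse()
    return words


# Hypothetical toy lexicon; a real system would load a full dictionary.
lexicon = {"研究", "研究生", "生命", "起源"}
sentence = "研究生命起源"
fmm = forward_mm(sentence, lexicon, max_len=3)
bmm = backward_mm(sentence, lexicon, max_len=3)
ambiguous = fmm != bmm  # disagreement marks a span that needs disambiguation
```

When the two passes disagree, a disambiguation stage has to choose between the candidates; the abstract describes doing this with bigram statistics and linguistic rules.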

A Sequence of Models for Categorical Data with Compound Scales (복합척도의 범주형 자료에 대한 연속 모형)

  • 최재성
    • The Korean Journal of Applied Statistics / v.14 no.1 / pp.103-110 / 2001
  • This paper considers a multistage experiment in which the response scales can be the same or different from stage to stage. When the variables have a nested structure, the response variable at each stage can be defined conditionally. To analyze such data with compound scales, this paper suggests a sequence of dependence models and shows how to set up a sequence of models for driver's license test data.

Hybrid Group-Sequential Conditional-Bayes Approaches to the Double Sampling Plans

  • Seong-gon Ko
    • Communications for Statistical Applications and Methods / v.5 no.1 / pp.107-120 / 1998
  • This research develops an extended double sampling plan (EDS), which extends the ordinary double sampling plan in the sense that the second-stage sampling effort and second-stage critical value are allowed to depend on the point at which the first-stage continuation region is traversed. For comparison, the single sampling plan, the optimal ordinary double sampling plan (ODS), and the sequential probability ratio test are considered, each with the same overall error rates. It is observed that the EDS idea allows less sampling effort than the optimal ODS.

Tests For and Against a Positive Dependence Restriction in Two-Way Ordered Contingency Tables

  • Oh, Myongsik
    • Journal of the Korean Statistical Society / v.27 no.2 / pp.205-220 / 1998
  • Dependence concepts for ordered two-way contingency tables have been of considerable interest. We consider a dependence concept that is less restrictive than likelihood ratio dependence and more restrictive than regression dependence. Maximum likelihood estimation of the cell probabilities under this dependence restriction is studied. Likelihood ratio statistics for and against this dependence are proposed and their large-sample distributions are derived. A real data set is analyzed to illustrate the estimation and testing procedures.

On The Derivation of a Certain Noncentral t Distribution

  • Gupta, A.K.;Kabe, D.G.
    • Journal of the Korean Statistical Society / v.19 no.2 / pp.182-185 / 1990
  • Let a $p$-component vector $y$ have a $p$-variate normal distribution $N(b\theta, \Sigma)$, with $\Sigma$ unknown and $b$ specified. For testing $\theta = 0$ against general $\theta$, Khatri and Rao (1987) derive a certain t test and obtain its power function. This paper presents a direct derivation of this power function in terms of the original variates, unlike Khatri and Rao (1987), who resort to canonical transformations of the original variates and to conditional distributions.

Korean Homograph Tagging Model based on Sub-Word Conditional Probability (부분어절 조건부확률 기반 동형이의어 태깅 모델)

  • Shin, Joon Choul;Ock, Cheol Young
    • KIPS Transactions on Software and Data Engineering / v.3 no.10 / pp.407-420 / 2014
  • In general, Korean morpheme analysis is divided into two steps. In the first, ambiguity-generation step, an Eojeol is analyzed into many candidate morpheme sequences. In the second step, one appropriate candidate is chosen using contextual information. A Hidden Markov Model (HMM) is typically applied in the second step. This paper proposes the Sub-word Conditional Probability (SCP) model as an alternative algorithm. SCP first uses sub-word information from adjacent Eojeols. If that fails, SCP uses morpheme information in a restricted way. In a comparative test of accuracy and speed, HMM's accuracy was 96.49% and SCP's accuracy was only 0.07% lower, while SCP reduced processing time by 53%.

ER-Fuzz: Conditional Code Removed Fuzzing

  • Song, Xiaobin;Wu, Zehui;Cao, Yan;Wei, Qiang
    • KSII Transactions on Internet and Information Systems (TIIS) / v.13 no.7 / pp.3511-3532 / 2019
  • Coverage-guided fuzzing is an efficient solution that has been widely used in software testing. By guiding fuzzers with coverage information, seeds that generate new paths are retained to continually increase the coverage. However, we observed that most samples follow the same few high-frequency paths. The seeds that exercise a high-frequency path are saved for the subsequent mutation process until the user terminates the test process, which directly reduces the efficiency with which the low-frequency paths are tested. In this paper, we propose a fuzzing solution, ER-Fuzz, that truncates the recording of a high-frequency path to influence coverage. It uses a deep learning-based classifier to locate the transfer points between high- and low-frequency paths; then, it instruments the code at the transfer position to raise the probability of low-frequency transfer paths while eliminating subsequent variations of the high-frequency path seeds. We implemented a prototype of ER-Fuzz based on the popular fuzzer AFL and evaluated it on several applications. The experimental results show that ER-Fuzz improves the coverage of the original AFL method to different degrees. In terms of the number of crash discoveries, in the best case, ER-Fuzz found 115% more unique crashes than AFL did. In total, seven new bugs were found and new CVEs were assigned.

Identification of the associations between genes and quantitative traits using entropy-based kernel density estimation

  • Yee, Jaeyong;Park, Taesung;Park, Mira
    • Genomics & Informatics / v.20 no.2 / pp.17.1-17.11 / 2022
  • Genetic associations have been quantified using a number of statistical measures. Entropy-based mutual information may be one of the more direct ways of estimating the association, in the sense that it does not depend on the parametrization. For this purpose, both the entropy and conditional entropy of the phenotype distribution should be obtained. Quantitative traits, however, do not usually allow an exact evaluation of entropy. The estimation of entropy needs a probability density function, which can be approximated by kernel density estimation. We have investigated the proper sequence of procedures for combining the kernel density estimation and entropy estimation with a probability density function in order to calculate mutual information. Genotypes and their interactions were constructed to set the conditions for conditional entropy. Extensive simulation data created using three types of generating functions were analyzed using two different kernels as well as two types of multifactor dimensionality reduction and another probability density approximation method called m-spacing. The statistical power in terms of correct detection rates was compared. Using kernels was found to be most useful when the trait distributions were more complex than simple normal or gamma distributions. A full-scale genomic dataset was explored to identify associations using the 2-h oral glucose tolerance test results and γ-glutamyl transpeptidase levels as phenotypes. Clearly distinguishable single-nucleotide polymorphisms (SNPs) and interacting SNP pairs associated with these phenotypes were found and listed with empirical p-values.
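
The relationship between entropy, conditional entropy, and mutual information described in this abstract can be sketched as follows. This is one possible arrangement under stated assumptions, not the authors' procedure: it uses a Gaussian kernel with a fixed, hand-picked bandwidth and a resubstitution entropy estimate, and the names `kde_entropy` and `mutual_information` as well as the simulated data are hypothetical.

```python
import math
import random


def kde_entropy(sample, bandwidth):
    """Resubstitution estimate of differential entropy with a Gaussian kernel."""
    n = len(sample)
    norm = n * bandwidth * math.sqrt(2.0 * math.pi)
    total = 0.0
    for y in sample:
        # Kernel density estimate evaluated at the sample point y.
        density = sum(
            math.exp(-0.5 * ((y - yi) / bandwidth) ** 2) for yi in sample
        ) / norm
        total += math.log(density)
    return -total / n


def mutual_information(trait_by_genotype, bandwidth):
    """I(G; Y) = H(Y) - sum over g of p(g) * H(Y | G = g)."""
    pooled = [y for ys in trait_by_genotype.values() for y in ys]
    n = len(pooled)
    h_y = kde_entropy(pooled, bandwidth)
    h_y_given_g = sum(
        len(ys) / n * kde_entropy(ys, bandwidth)
        for ys in trait_by_genotype.values()
    )
    return h_y - h_y_given_g


# Simulated traits (assumption): genotype shifts the mean in one data set only.
random.seed(0)
associated = {
    "AA": [random.gauss(0.0, 1.0) for _ in range(200)],
    "Aa": [random.gauss(5.0, 1.0) for _ in range(200)],
}
unassociated = {
    "AA": [random.gauss(0.0, 1.0) for _ in range(200)],
    "Aa": [random.gauss(0.0, 1.0) for _ in range(200)],
}
mi_associated = mutual_information(associated, bandwidth=0.4)
mi_unassociated = mutual_information(unassociated, bandwidth=0.4)
```

Conditioning on genotype lowers the trait entropy only when the genotype carries information about the trait, so the estimate should be clearly larger for the associated data set than for the unassociated one. Bandwidth selection and significance testing (for example, the empirical p-values mentioned in the abstract) are outside this sketch.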