• Title/Summary/Keyword: Gumbel Distribution Model


Automated Detecting and Tracing for Plagiarized Programs using Gumbel Distribution Model (굼벨 분포 모델을 이용한 표절 프로그램 자동 탐색 및 추적)

  • Ji, Jeong-Hoon;Woo, Gyun;Cho, Hwan-Gue
    • The KIPS Transactions:PartA / v.16A no.6 / pp.453-462 / 2009
  • Studies on software plagiarism detection, prevention, and judgment have become widespread owing to growing interest in, and the importance of, protecting and authenticating software intellectual property. Many previous studies focused on comparing all pairs of submitted codes using attribute counting, token patterns, program parse trees, and similarity-measuring algorithms. It is important to provide a clear-cut model for distinguishing plagiarism from collaboration. This paper proposes a source code clustering algorithm using a probability model based on the extreme value distribution. First, we propose an asymmetric distance measure pdist($P_a$, $P_b$) to measure the similarity of $P_a$ and $P_b$. Then, we construct the Plagiarism Direction Graph (PDG) for a given program set using pdist($P_a$, $P_b$) as edge weights, and we transform the PDG into a Gumbel Distance Graph (GDG) model, since we found that the distribution of pdist($P_a$, $P_b$) scores is similar to the well-known Gumbel distribution. Second, we newly define pseudo-plagiarism, a kind of virtual plagiarism forced by a very strong functional requirement in the specification. We conducted experiments with 18 groups of programs (more than 700 source files) collected from the ICPC (International Collegiate Programming Contest) and KOI (Korean Olympiad for Informatics) programming contests. The experiments showed that most plagiarized codes could be detected with high sensitivity and that our algorithm successfully separated real plagiarism from pseudo-plagiarism.
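The abstract's key observation is that pairwise plagiarism scores follow a Gumbel-like distribution, so an unusually extreme score can be judged by its tail probability. The Python sketch below illustrates only that idea: it fits a Gumbel (maximum) distribution to a set of pairwise similarity scores by the method of moments and flags pairs whose exceedance probability falls below a cutoff. The score itself, the `alpha` cutoff, and all function names are hypothetical; this is not the paper's pdist measure, PDG/GDG construction, or clustering algorithm.

```python
import math
import statistics

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant


def fit_gumbel_moments(scores):
    """Method-of-moments fit of a Gumbel (maximum) distribution; returns (mu, beta)."""
    beta = statistics.stdev(scores) * math.sqrt(6) / math.pi
    mu = statistics.fmean(scores) - EULER_GAMMA * beta
    return mu, beta


def gumbel_cdf(x, mu, beta):
    """P(X <= x) for a Gumbel (maximum) distribution."""
    return math.exp(-math.exp(-(x - mu) / beta))


def flag_suspicious(pair_scores, alpha=0.01):
    """Return (pair, exceedance probability) for pairs whose score is improbably high.

    pair_scores maps (program_a, program_b) -> hypothetical similarity score,
    higher meaning more similar. The Gumbel model is fitted to all scores,
    so most pairs act as the background population.
    """
    mu, beta = fit_gumbel_moments(list(pair_scores.values()))
    flagged = []
    for pair, score in pair_scores.items():
        p_exceed = 1.0 - gumbel_cdf(score, mu, beta)
        if p_exceed < alpha:
            flagged.append((pair, p_exceed))
    # Most suspicious (smallest tail probability) first.
    return sorted(flagged, key=lambda item: item[1])
```

Because the fit uses every pair, a few genuine plagiarism cases inflate the fitted spread slightly; with many background pairs this effect is small, but a robust or iterative fit would be a reasonable refinement.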

Comparison Study on the Various Forms of Scale Parameter for the Nonstationary Gumbel Model (다양한 규모매개변수를 이용한 비정상성 Gumbel 모형의 비교 연구)

  • Jang, Hanjin;Kim, Sooyoung;Heo, Jun-Haeng
    • Journal of Korea Water Resources Association / v.48 no.5 / pp.331-343 / 2015
  • Most nonstationary frequency models are defined as probability models containing time-dependent parameters. For frequency analysis of annual maximum rainfall data, the Gumbel distribution is generally recommended in Korea. In nonstationary Gumbel models, the time-dependent location and scale parameters are defined as linear and exponential functions of time, respectively. The exponentially time-varying scale parameter is generally used because the scale parameter must be positive. However, the exponential form of the scale parameter occasionally yields overestimated quantiles. In this study, various forms of time-varying scale parameter, namely exponential, linear, and logarithmic forms, were proposed and compared. The parameters were estimated by the method of maximum likelihood. To compare the accuracy of each scale-parameter form, Monte Carlo simulation was performed under various conditions. Additionally, nonstationary frequency analysis was conducted for sites that have more than 30 years of data showing a trend in rainfall. As a result, the nonstationary Gumbel model with an exponentially time-varying scale parameter generally had the smallest root mean square error compared with the other forms.
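To make the three scale-parameter forms concrete, here is a minimal Python sketch of the T-year quantile of a nonstationary Gumbel model, $x_T(t) = \mu(t) - \sigma(t)\ln[-\ln(1 - 1/T)]$, with a linear location $\mu(t) = \mu_0 + \mu_1 t$. The coefficient values, the exact parameterization of each form, and the function names are illustrative assumptions; parameter estimation by maximum likelihood, as in the study, is not shown.

```python
import math


def gumbel_quantile(mu, sigma, return_period):
    """T-year quantile of a Gumbel (maximum) distribution."""
    p = 1.0 - 1.0 / return_period          # annual non-exceedance probability
    return mu - sigma * math.log(-math.log(p))


def scale_at(t, s0, s1, form):
    """Time-varying scale sigma(t); s0 and s1 mean different things in each form."""
    if form == "exponential":
        return math.exp(s0 + s1 * t)       # positive for any t
    if form == "linear":
        return s0 + s1 * t                 # may turn non-positive
    if form == "logarithmic":
        return s0 + s1 * math.log(t)       # needs t > 0
    raise ValueError(f"unknown form: {form}")


def nonstationary_quantile(t, mu0, mu1, s0, s1, form, return_period):
    """Quantile at time t with a linear location mu(t) = mu0 + mu1 * t."""
    sigma_t = scale_at(t, s0, s1, form)
    if sigma_t <= 0:
        raise ValueError("scale parameter must be positive")
    return gumbel_quantile(mu0 + mu1 * t, sigma_t, return_period)


# Hypothetical coefficients: 100-year quantile at t = 1, 30, and 60 years.
for form, (s0, s1) in {"exponential": (math.log(20.0), 0.005),
                       "linear": (20.0, 0.1),
                       "logarithmic": (20.0, 1.5)}.items():
    print(form, [round(nonstationary_quantile(t, 100.0, 0.5, s0, s1, form, 100), 1)
                 for t in (1, 30, 60)])
```

The explicit positivity check shows why the exponential form is popular: the linear and logarithmic forms need constraints on their coefficients, whereas the exponential form cannot produce a non-positive scale.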

Evaluation of Flood Severity Using Bivariate Gumbel Mixed Model (이변량 Gumbel 혼합모형을 이용한 홍수심도 평가)

  • Lee, Jeong-Ho;Chung, Gun-Hui;Kim, Tae-Woong
    • Journal of Korea Water Resources Association / v.42 no.9 / pp.725-736 / 2009
  • A flood event can be defined by three characteristics: peak discharge, total flood volume, and flood duration, which are correlated with each other. However, conventional flood frequency analysis for hydrological planning, design, and operation has focused on evaluating only the peak discharge. Such a univariate flood frequency analysis is limited in describing the complex probabilistic behavior of flood events. This study proposed a bivariate flood frequency analysis using a Gumbel mixed model for flood evaluation. A time series of annual flood events was extracted from observed inflows to the Soyang River Dam and the Daechung Dam. The joint probability distribution and return period were derived from the relationship between peak discharge and total flood runoff volume. The applicability of the Gumbel mixed model was tested by comparing the return periods obtained from the proposed bivariate analysis with those from the conventional univariate analysis.
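The bivariate Gumbel mixed model is usually written as $F(x,y) = F_X(x)F_Y(y)\exp\{-\theta[1/\ln F_X(x) + 1/\ln F_Y(y)]^{-1}\}$ with $0 \le \theta \le 1$. The Python sketch below assumes that form with Gumbel marginals and computes the joint "OR" and "AND" return periods for annual maxima. The marginal parameters, the value of `theta`, and the function names are hypothetical placeholders, not values from the Soyang River or Daechung Dam data.

```python
import math


def gumbel_cdf(x, mu, beta):
    """Marginal Gumbel (maximum) CDF."""
    return math.exp(-math.exp(-(x - mu) / beta))


def gumbel_mixed_cdf(u, v, theta):
    """Bivariate Gumbel mixed model F(x, y) from marginal CDF values u, v in (0, 1).

    F = u * v * exp(-theta * [1/ln u + 1/ln v]^(-1)), with 0 <= theta <= 1
    (theta = 0 gives independence).
    """
    if not 0.0 <= theta <= 1.0:
        raise ValueError("theta must lie in [0, 1]")
    return u * v * math.exp(-theta / (1.0 / math.log(u) + 1.0 / math.log(v)))


def joint_return_periods(u, v, theta):
    """(T_or, T_and) in years for annual maxima.

    T_or:  peak > x OR volume > y;  T_and: peak > x AND volume > y.
    """
    f_xy = gumbel_mixed_cdf(u, v, theta)
    t_or = 1.0 / (1.0 - f_xy)
    t_and = 1.0 / (1.0 - u - v + f_xy)
    return t_or, t_and


# Hypothetical marginal parameters for peak discharge (m^3/s) and flood volume (10^6 m^3).
u = gumbel_cdf(3500.0, mu=2000.0, beta=600.0)
v = gumbel_cdf(900.0, mu=500.0, beta=150.0)
print(joint_return_periods(u, v, theta=0.6))
```

The "OR" return period is never longer, and the "AND" return period never shorter, than either univariate return period, which is one reason bivariate and univariate analyses can lead to different design values.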

Bivariate Frequency Analysis of Rainfall using Copula Model (Copula 모형을 이용한 이변량 강우빈도해석)

  • Joo, Kyung-Won;Shin, Ju-Young;Heo, Jun-Haeng
    • Journal of Korea Water Resources Association / v.45 no.8 / pp.827-837 / 2012
  • Estimating rainfall quantiles is of great importance in designing hydrologic structures. Conventionally, the rainfall quantile is estimated by univariate frequency analysis with an appropriate probability distribution; however, this approach is limited in that the rainfall duration is restricted. To overcome this limitation, bivariate frequency analysis using three copula models is performed in this study. Annual maximum rainfall events at 5 stations are used for frequency analysis, with rainfall depth and duration as the random variables. Gumbel (GUM) and generalized logistic (GLO) distributions are applied to rainfall depth, and generalized extreme value (GEV), GUM, and GLO distributions are applied to rainfall duration. The copula models used are the Frank, Joe, and Gumbel-Hougaard models. The maximum pseudo-likelihood method is used to estimate the copula parameter, and the method of probability weighted moments is used to estimate the parameters of the marginal distributions. The resulting rainfall quantiles are compared across the various marginal distributions and copula models. When the marginal distribution is changed, the distribution of duration does not significantly affect the rainfall quantile, whereas there are slight differences depending on the distribution of rainfall depth: when the marginal distribution of rainfall depth is GUM, the quantile increases more markedly with return period than with GLO. Comparing the rainfall quantiles from each copula model, the Joe and Gumbel-Hougaard models show similar trends, while the Frank model shows a rapidly increasing trend as the return period increases.
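For reference, the Gumbel-Hougaard copula is $C(u,v) = \exp\{-[(-\ln u)^\theta + (-\ln v)^\theta]^{1/\theta}\}$ with $\theta \ge 1$, and its Kendall's tau is $1 - 1/\theta$. The Python sketch below estimates $\theta$ by inverting a sample Kendall's tau, a simpler swapped-in estimator rather than the maximum pseudo-likelihood method used in the study, and computes the return period of rainfall depth and duration being exceeded together. Function names are hypothetical, and the marginal CDF values `u` and `v` are assumed to come from already-fitted marginals such as GUM or GLO.

```python
import math


def gumbel_hougaard(u, v, theta):
    """Gumbel-Hougaard copula C(u, v) for theta >= 1 (theta = 1 is independence)."""
    if theta < 1.0:
        raise ValueError("theta must be >= 1")
    s = (-math.log(u)) ** theta + (-math.log(v)) ** theta
    return math.exp(-s ** (1.0 / theta))


def kendall_tau(xs, ys):
    """Plain O(n^2) sample Kendall's tau (tau-a, no tie correction)."""
    n = len(xs)
    concordant = discordant = 0
    for i in range(n):
        for j in range(i + 1, n):
            sign = (xs[i] - xs[j]) * (ys[i] - ys[j])
            if sign > 0:
                concordant += 1
            elif sign < 0:
                discordant += 1
    return (concordant - discordant) / (n * (n - 1) / 2)


def theta_from_kendall_tau(tau):
    """Invert tau = 1 - 1/theta (a moment-type estimate, not pseudo-likelihood)."""
    if not 0.0 <= tau < 1.0:
        raise ValueError("Gumbel-Hougaard needs 0 <= tau < 1")
    return 1.0 / (1.0 - tau)


def and_return_period(u, v, theta):
    """Return period (years) of depth AND duration both exceeded, for annual maxima."""
    return 1.0 / (1.0 - u - v + gumbel_hougaard(u, v, theta))
```

Because the Gumbel-Hougaard family only models positive dependence, the tau inversion rejects negative sample correlations; the Frank copula would be needed in that case.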

A Study on Uncertainty of Risk of Failure Based on Gumbel Distribution (Gumbel 분포형을 이용한 위험도에 관한 불확실성 해석)

  • Heo, Jun-Haeng;Lee, Dong-Jin;Shin, Hong-Joon;Nam, Woo-Sung
    • Journal of Korea Water Resources Association / v.39 no.8 s.169 / pp.659-668 / 2006
  • The uncertainty of the risk of failure of hydraulic structures can be determined by estimating the variance of the risk of failure based on the method of moments (MOM), probability weighted moments (PWM), and maximum likelihood (ML), assuming that the underlying model is the Gumbel distribution. In this paper, the variance of the risk of failure was derived. Monte Carlo simulation was performed to verify the characteristics of the derived formulas for various sample sizes, design lives, nonexceedance probabilities, and coefficients of variation. As a result, PWM showed smaller relative bias and root mean square error than the other methods, except that ML showed the smallest values for relatively large sample sizes, regardless of design life and nonexceedance probability. It was also found that the coefficient of variation does not affect the relative bias and relative root mean square error.
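Risk of failure here is the standard $R = 1 - p^n$, where $p$ is the annual nonexceedance probability of the design value and $n$ is the design life; for example, a 100-year design value over a 100-year life gives $R = 1 - 0.99^{100} \approx 0.634$. The paper derives the variance of $R$ analytically; the Python sketch below instead approximates it by Monte Carlo, assuming a known parent Gumbel distribution, method-of-moments refitting only (no PWM or ML), and hypothetical function names.

```python
import math
import random
import statistics

EULER_GAMMA = 0.5772156649015329


def risk_of_failure(nonexceedance_p, design_life):
    """R = 1 - p**n: probability of at least one exceedance in n years."""
    return 1.0 - nonexceedance_p ** design_life


def gumbel_sample(mu, beta, size, rng):
    """Inverse-transform draws from a Gumbel (maximum) distribution."""
    draws = []
    while len(draws) < size:
        u = rng.random()
        if u > 0.0:                          # log(0) is undefined
            draws.append(mu - beta * math.log(-math.log(u)))
    return draws


def mom_fit(sample):
    """Method-of-moments estimates (mu, beta)."""
    beta = statistics.stdev(sample) * math.sqrt(6) / math.pi
    return statistics.fmean(sample) - EULER_GAMMA * beta, beta


def simulate_risk(mu, beta, design_value, design_life, sample_size, n_sims=1000, seed=0):
    """Monte Carlo mean and variance of the risk estimated from MoM-fitted samples."""
    rng = random.Random(seed)
    risks = []
    for _ in range(n_sims):
        mu_hat, beta_hat = mom_fit(gumbel_sample(mu, beta, sample_size, rng))
        p_hat = math.exp(-math.exp(-(design_value - mu_hat) / beta_hat))
        risks.append(risk_of_failure(p_hat, design_life))
    return statistics.fmean(risks), statistics.variance(risks)


design = -math.log(-math.log(0.99))          # true 100-year value of Gumbel(0, 1)
print(risk_of_failure(0.99, 50), simulate_risk(0.0, 1.0, design, 50, 30))
```

Varying `sample_size` and `design_life` in such a simulation is one way to reproduce, qualitatively, the kind of sensitivity study the abstract describes.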

A Study of Infinite Failure NHPP Software Reliability Growth Model based on Record Value Statistics with Gamma Family of Lifetime Distribution (수명분포가 감마족인 기록값 통계량에 기초한 무한고장 NHPP 소프트웨어 신뢰성장 모형에 관한 비교 연구)

  • Kim, Hee-Cheul;Sin, Hyun-Cheul
    • Convergence Security Journal / v.6 no.3 / pp.145-153 / 2006
  • Infinite failure NHPP models based on record value statistics proposed in the literature exhibit either monotonically increasing or monotonically decreasing failure occurrence rates per fault. This paper presents a comparative study of software reliability models using the Erlang, Rayleigh, and Gumbel distributions. Equations for estimating the parameters of the infinite failure NHPP model by maximum likelihood, based on failure data collected in the form of interfailure times, are developed. A special pattern was used for the proposed distributions. An analysis of a failure data set using arithmetic and Laplace trend tests, goodness-of-fit tests, and bias tests is presented.
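A common way to write a record-value infinite-failure NHPP is to take the mean value function as the cumulative hazard of the lifetime distribution, $m(t) = -\ln[1 - F(t)]$, so the failure intensity equals the hazard rate. The Python sketch below assumes that formulation and evaluates $m(t)$ for Rayleigh, Erlang (shape 2), and Gumbel lifetimes. For the Rayleigh case, maximizing $\ln L = \sum_i \ln \lambda(t_i) - m(t_n)$ with $\lambda(t) = t/\sigma^2$ gives the closed form $\hat\sigma^2 = t_n^2/(2n)$ when observation stops at the $n$-th failure. The specific parameterizations and function names are assumptions, not necessarily those of the paper.

```python
import math
from itertools import accumulate

# Assumed record-value formulation: m(t) = -ln(1 - F(t)) (cumulative hazard),
# so the failure intensity lambda(t) equals the lifetime hazard f(t) / (1 - F(t)).


def rayleigh_mean_value(t, sigma2):
    """Rayleigh lifetime F(t) = 1 - exp(-t^2 / (2 sigma^2))."""
    return t * t / (2.0 * sigma2)


def rayleigh_intensity(t, sigma2):
    """Increasing intensity: t / sigma^2."""
    return t / sigma2


def erlang2_mean_value(t, lam):
    """Erlang (shape 2) lifetime, survival exp(-lam t) * (1 + lam t)."""
    return lam * t - math.log1p(lam * t)


def gumbel_mean_value(t, mu, beta):
    """Gumbel (maximum) lifetime, survival 1 - exp(-exp(-(t - mu) / beta)).

    Valid for moderate (t - mu) / beta; extreme values underflow or overflow.
    """
    survival = -math.expm1(-math.exp(-(t - mu) / beta))
    return -math.log(survival)


def rayleigh_mle_sigma2(interfailure_times):
    """Closed-form MLE of sigma^2 when observation stops at the n-th failure."""
    times = list(accumulate(interfailure_times))   # cumulative failure times
    n, t_n = len(times), times[-1]
    return t_n * t_n / (2.0 * n)
```

Using `log1p` and `expm1` keeps the mean value functions accurate near $t = 0$ and in the far tail, where naive formulas lose precision.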


Estimation on Chemical Water Quality Suitability Index for 4 Species of the Mayfly Genus Ephemera (Ephemeroptera: Ephemeridae) Using Probability Distribution Models (확률분포모형을 이용한 하루살이속(Ephemera) 4종에 대한 화학적 수질 적합도지수 평가)

  • Jung, Bongjun;Kong, Dongsoo
    • Journal of Korean Society on Water Environment / v.39 no.6 / pp.475-490 / 2023
  • Chemical water quality suitability for species (Ephemera strigata, Ephemera separigata, and the Ephemera orientalis-sachalinensis group) of the mayfly genus Ephemera (Order Ephemeroptera) was analyzed with probability distribution models (Exponential, Normal, Lognormal, Logistic, Weibull, Gamma, Beta, Gumbel). Data were collected from 23,957 sampling units at 6,664 sites in Korea from 2010 to 2021. E. orientalis-sachalinensis occurred in the ranges BOD5 0.3~11.1 mg/L (best fit: Lognormal model), T-P 0.007~0.769 mg/L (Gumbel model), and TSS 0.4~142.2 mg/L (Lognormal model). E. strigata occurred in the ranges BOD5 0.4~7.4 mg/L (Gumbel model), T-P 0.007~0.254 mg/L (Lognormal model), and TSS 0.4~17.1 mg/L (Lognormal model). E. separigata occurred in the ranges BOD5 0.4~2.6 mg/L (R-Weibull model), T-P 0.007~0.134 mg/L (Lognormal model), and TSS 0.7~10.0 mg/L (Lognormal model). The habitat suitability range of E. orientalis-sachalinensis was estimated to be 0.4~1.9 mg/L (BOD5), 0.024~0.086 mg/L (T-P), and 2.5~22.4 mg/L (TSS); that of E. strigata, 0.4~0.7 mg/L (BOD5), 0.007~0.018 mg/L (T-P), and 0.0~1.7 mg/L (TSS); and that of E. separigata, 0.0~0.4 mg/L (BOD5), 0.000~0.015 mg/L (T-P), and 0.5~3.1 mg/L (TSS). In a relative comparison, E. orientalis-sachalinensis was estimated to be eurysaprobic and narrowly adapted to high levels of T-P and TSS; E. strigata was estimated to be oligosaprobic and adapted to low levels of T-P and TSS; and E. separigata was estimated to be stenooligosaprobic and widely adapted to low levels of T-P and TSS.
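Choosing a "best-fit" model among candidate distributions can be sketched as comparing maximized log-likelihoods. The Python example below compares only two of the eight candidates (Lognormal and Gumbel) on hypothetical BOD5 readings and reports the observed occurrence range. The paper's actual goodness-of-fit criterion and its method for deriving suitability ranges are not stated in the abstract, so both the selection rule and the data here are assumptions.

```python
import math
import statistics

EULER_GAMMA = 0.5772156649015329


def lognormal_loglik(data):
    """Log-likelihood at the lognormal MLE (mean and population std of ln x)."""
    logs = [math.log(x) for x in data]
    m, s = statistics.fmean(logs), statistics.pstdev(logs)
    return sum(-math.log(x * s * math.sqrt(2.0 * math.pi)) - (lx - m) ** 2 / (2.0 * s * s)
               for x, lx in zip(data, logs))


def gumbel_loglik(data):
    """Gumbel (maximum) log-likelihood at method-of-moments estimates."""
    beta = statistics.stdev(data) * math.sqrt(6) / math.pi
    mu = statistics.fmean(data) - EULER_GAMMA * beta
    total = 0.0
    for x in data:
        z = (x - mu) / beta
        total += -math.log(beta) - z - math.exp(-z)
    return total


def best_fit(data):
    """Return the candidate with the larger log-likelihood, plus all scores."""
    scores = {"Lognormal": lognormal_loglik(data), "Gumbel": gumbel_loglik(data)}
    return max(scores, key=scores.get), scores


# Hypothetical BOD5 readings (mg/L) at sites where a species was found.
bod5 = [0.4, 0.6, 0.7, 0.8, 0.9, 1.1, 1.2, 1.4, 1.7, 2.1, 2.6, 3.3]
print(best_fit(bod5), (min(bod5), max(bod5)))
```

Both candidates have two parameters, so ranking by log-likelihood matches ranking by AIC; note, however, that the Gumbel likelihood here is evaluated at moment estimates, which can only understate its maximum.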

Concept of Trend Analysis of Hydrologic Extreme Variables and Nonstationary Frequency Analysis (극치수문자료의 경향성 분석 개념 및 비정상성 빈도해석)

  • Lee, Jeong-Ju;Kwon, Hyun-Han;Kim, Tae-Woong
    • KSCE Journal of Civil and Environmental Engineering Research / v.30 no.4B / pp.389-397 / 2010
  • This study introduced a Bayesian frequency analysis that incorporates statistical trend analysis of hydrologic extreme series. The proposed model employed the Gumbel extreme value distribution to characterize extreme events, and a fully coupled Bayesian frequency model was used to estimate design rainfalls in Seoul. Posterior distributions of the parameters of both the Gumbel distribution and the trend analysis were updated through Markov chain Monte Carlo (MCMC) simulation, mainly using a Gibbs sampler. This study proposed a way to use a nonstationary frequency model for dynamic risk analysis and showed an increase in hydrologic risk with time-varying probability density functions. The proposed approach has the advantage of assessing the statistical significance of the trend parameters through statistical inference using the derived posterior distributions.
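The nonstationary Gumbel model with a trend can be sketched with location $\mu(t) = a + bt$ and scale $\beta$. The paper uses MCMC mainly with a Gibbs sampler; the Python sketch below substitutes a component-wise random-walk Metropolis sampler (Metropolis-within-Gibbs) with flat priors on $(a, b, \ln\beta)$. The step sizes, iteration counts, and function names are hypothetical, and the sketch omits the design-rainfall calculation for Seoul.

```python
import math
import random
import statistics


def gumbel_trend_loglik(params, times, values):
    """Gumbel (maximum) log-likelihood with location a + b*t and scale exp(log_beta)."""
    a, b, log_beta = params
    beta = math.exp(log_beta)
    total = 0.0
    for t, x in zip(times, values):
        z = (x - (a + b * t)) / beta
        if z < -700.0:                       # exp(-z) would overflow: treat as impossible
            return -math.inf
        total += -log_beta - z - math.exp(-z)
    return total


def sample_posterior(times, values, n_iter=8000, burn_in=2000,
                     steps=(1.0, 0.05, 0.05), seed=0):
    """Component-wise random-walk Metropolis with flat priors on (a, b, log_beta).

    Centre the times first so that a and b are close to uncorrelated.
    Returns the list of post-burn-in draws (a, b, log_beta).
    """
    rng = random.Random(seed)
    params = [statistics.fmean(values), 0.0, math.log(statistics.stdev(values))]
    current = gumbel_trend_loglik(params, times, values)
    draws = []
    for it in range(n_iter):
        for k in range(3):                   # update one parameter at a time
            proposal = list(params)
            proposal[k] += rng.gauss(0.0, steps[k])
            candidate = gumbel_trend_loglik(proposal, times, values)
            # Metropolis acceptance; exp(min(0, .)) avoids overflow and log(0).
            if rng.random() < math.exp(min(0.0, candidate - current)):
                params, current = proposal, candidate
        if it >= burn_in:
            draws.append(tuple(params))
    return draws
```

With draws in hand, the posterior mean of `b` estimates the trend, and the fraction of draws with `b > 0` approximates the posterior probability of an upward trend, which is the kind of significance statement the abstract refers to.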

The Comparative Study for Software Reliability Models Based on NHPP (NHPP에 기초한 소프트웨어 신뢰도 모형에 대한 비교연구)

  • Gan, Gwang-Hyeon;Kim, Hui-Cheol;Lee, Byeong-Su
    • The KIPS Transactions:PartD / v.8D no.4 / pp.393-400 / 2001
  • This paper presents a stochastic model for the software failure phenomenon based on a nonhomogeneous Poisson process (NHPP). The failure process is analyzed to develop a suitable mean value function for the NHPP, and expressions are given for several performance measures. Actual software failure data are compared with Goel's generalized model, which depends on a constant reflecting the quality of testing. The performance measures and parametric inferences of the new models, based on the Rayleigh and Gumbel distributions, are discussed. The new models are applied to real software failure data and compared with the Goel-Okumoto model and the Yamada, Ohba, and Osaki model. For parameter inference, maximum likelihood estimation was used, with the bisection algorithm for computing nonlinear roots. Model selection was performed using the sum of squared errors. A numerical example using the NTDS data is illustrated.
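For the Goel-Okumoto benchmark, $m(t) = a(1 - e^{-bt})$, and with failure times $t_1 < \dots < t_n$ observed until the $n$-th failure the likelihood equations reduce to $\hat a = n/(1 - e^{-\hat b t_n})$ and $n/b - \sum_i t_i - n t_n/(e^{b t_n} - 1) = 0$; a root exists only when the mean failure time is below $t_n/2$. The Python sketch below solves the second equation by bisection, as the abstract mentions for nonlinear roots, and scores the fit by the sum of squared errors between $m(t_i)$ and the cumulative failure count $i$. The failure times in the example are hypothetical, not the NTDS data, the function names are illustrative, and the sketch does not implement the paper's Rayleigh or Gumbel models.

```python
import math


def go_score(b, times):
    """d(log L)/db for Goel-Okumoto m(t) = a(1 - exp(-b t)), with a profiled out."""
    n, t_n = len(times), times[-1]
    return n / b - sum(times) - n * t_n / math.expm1(b * t_n)


def fit_goel_okumoto(times, iterations=200):
    """MLE of (a, b) from cumulative failure times, failure-truncated at the last failure."""
    n, t_n = len(times), times[-1]
    if sum(times) / n >= t_n / 2:
        raise ValueError("no reliability growth: MLE of b does not exist")
    lo, hi = 1e-6 / t_n, 1.0 / t_n
    while go_score(hi, times) > 0:         # expand until the root is bracketed
        hi *= 2.0
    for _ in range(iterations):            # bisection on the score equation
        mid = 0.5 * (lo + hi)
        if go_score(mid, times) > 0:
            lo = mid
        else:
            hi = mid
    b = 0.5 * (lo + hi)
    a = n / -math.expm1(-b * t_n)
    return a, b


def sse(times, a, b):
    """Sum of squared errors between m(t_i) and the observed cumulative count i."""
    return sum((a * -math.expm1(-b * t) - (i + 1)) ** 2 for i, t in enumerate(times))


times = [10, 25, 45, 70, 100, 140, 190, 250, 330, 430]   # hypothetical failure times
a_hat, b_hat = fit_goel_okumoto(times)
print(a_hat, b_hat, sse(times, a_hat, b_hat))
```

Computing the same SSE for each competing mean value function is what makes it usable as a simple model-selection criterion.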


Development of Vehicular Load Model using Heavy Truck Weight Distribution (II) - Multiple Truck Effects and Model Development (중차량중량분포를 이용한 차량하중모형 개발(II) - 연행차량 효과 분석 및 모형 개발)

  • Hwang, Eui-Seung
    • KSCE Journal of Civil and Environmental Engineering Research / v.29 no.3A / pp.199-207 / 2009
  • In this paper, a new vehicular load model is developed for a reliability-based bridge design code. A rational load model and the statistical properties of loads are important for developing a reliability-based design code. In the previous paper, truck weight data collected at eight locations using WIM or BWIM systems were analyzed to calculate the maximum truck weights for a specified bridge lifetime. The upper 20% of total truck weights were assumed to follow the Extreme Type I (Gumbel) distribution, and 100-year maximum weights were estimated by linear regression. In this study, the effects of the multiple presence of trucks are analyzed. The probabilities of multiple truck presence are estimated, and the corresponding multiple-truck weights are calculated using the same probability distribution function as in the previous paper. New vehicular live load models are proposed for span lengths from 10 m to 200 m, and the new model is compared with the current Korean model and with load models of various other countries.
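Fitting the upper tail on Gumbel probability paper and extrapolating is one reading of "linear regression" here. The Python sketch below sorts hypothetical truck weights, assigns plotting positions $p_i = i/(N+1)$, regresses weight on the reduced variate $y = -\ln(-\ln p)$ using only the top 20% of points, and reads off the weight whose exceedance probability is $1/N_{\text{events}}$, approximately $\mu + \beta\ln N_{\text{events}}$. The plotting-position formula, the function names, and the idea of setting `n_events` to the number of trucks expected over 100 years are assumptions, not the paper's documented procedure.

```python
import math


def gumbel_tail_regression(weights, upper_fraction=0.2):
    """Least-squares line x = mu + beta * y fitted to the upper tail on Gumbel paper.

    Plotting positions p_i = i / (N + 1) use the full sorted sample; only the top
    `upper_fraction` of points enter the regression. Returns (mu, beta).
    """
    xs = sorted(weights)
    n = len(xs)
    start = int(n * (1.0 - upper_fraction))
    ys = [-math.log(-math.log((i + 1) / (n + 1))) for i in range(start, n)]
    tail = xs[start:]
    y_bar, x_bar = sum(ys) / len(ys), sum(tail) / len(tail)
    sxy = sum((y - y_bar) * (x - x_bar) for y, x in zip(ys, tail))
    syy = sum((y - y_bar) ** 2 for y in ys)
    beta = sxy / syy
    return x_bar - beta * y_bar, beta


def extrapolated_maximum(mu, beta, n_events):
    """Weight whose exceedance probability is 1 / n_events on the fitted line (about mu + beta ln N)."""
    y = -math.log(-math.log1p(-1.0 / n_events))
    return mu + beta * y
```

For example, a hypothetical site with 2,000 heavy trucks per day would use `n_events = 2000 * 365 * 100` for a 100-year projection; `log1p` keeps the reduced variate accurate even when `n_events` is in the tens of millions.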