• Title/Summary/Keyword: Pareto distribution

A Study on the Reduction of Common Words to Classify Causes of Marine Accidents

  • Yim, Jeong-Bin
    • Journal of Navigation and Port Research / v.41 no.3 / pp.109-118 / 2017
  • The key words (KW) are a set of words that clearly express the important causes of marine accidents; they are determined by a judge in a Korean maritime safety tribunal. The selection of KW currently has two main issues: the first is maintaining consistency, given the differing subjective opinions of individual judges, and the second is the large number of KW currently in use. To overcome these issues, the systematic framework used to construct KWs needs to be optimized so that a minimal number of KWs is derived from a set of Common Words (CW). The purpose of this study is to identify a set of CW for developing a systematic KW construction frame. To this end, a word reduction method that finds the minimum number of CW is proposed using the Pareto distribution function and the Pareto index. A total of 2,642 KW were compiled, and 56 baseline CW were identified in the data sets. These CW, along with their frequency of use across all KW, are reported. The word reduction experiments yielded an average reduction rate of 58.5%. The CW estimated at each reduction rate were verified using a Pareto chart. This analysis is expected to make the development of a systematic KW construction frame possible.
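
The Pareto-chart verification step can be illustrated with a small sketch: sort words by frequency and keep the shortest prefix whose cumulative share reaches a target. This is a simple cumulative-frequency cutoff, not the author's method (which fits a Pareto distribution function and uses the Pareto index); the function names, the 80% target share, and the keyword phrases are hypothetical.

```python
from collections import Counter

def pareto_cutoff(word_counts, target_share=0.8):
    """Shortest prefix of frequency-sorted words whose cumulative share reaches target_share."""
    total = sum(word_counts.values())
    selected, cumulative = [], 0
    # most_common() lists words by descending frequency, the ordering of a Pareto chart
    for word, count in word_counts.most_common():
        selected.append(word)
        cumulative += count
        if cumulative / total >= target_share:
            break
    return selected

def reduction_rate(n_kept, n_total):
    """Fraction of distinct words removed by the cutoff."""
    return 1.0 - n_kept / n_total

# Hypothetical keyword phrases for illustration only (not the study's 2,642 KW)
keywords = [
    "lookout negligence", "lookout failure", "navigation negligence",
    "engine failure", "lookout negligence", "crew fatigue",
]
counts = Counter(word for kw in keywords for word in kw.split())
common_words = pareto_cutoff(counts, target_share=0.8)
rate = reduction_rate(len(common_words), len(counts))
print(common_words, f"{rate:.1%}")
```

Raising `target_share` keeps more words and lowers the reduction rate, which is the trade-off a Pareto chart makes visible.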

Finding optimal portfolio based on genetic algorithm with generalized Pareto distribution

  • Kim, Hyundon;Kim, Hyun Tae
    • Journal of the Korean Data and Information Science Society / v.26 no.6 / pp.1479-1494 / 2015
  • Since Markowitz's mean-variance framework for portfolio analysis, portfolio optimization has been an important topic in finance. Traditional approaches maximize the expected return of the portfolio while minimizing its variance, assuming that risky asset returns are normally distributed. However, the normality assumption has been widely criticized, because actual stock price distributions exhibit much heavier tails as well as asymmetry. To address this, in this paper we employ a genetic algorithm to find the optimal portfolio under a Value-at-Risk (VaR) constraint, where the tails of risky asset returns are modeled with the generalized Pareto distribution (GPD), the standard distribution for exceedances in extreme value theory. An empirical study using Korean stock prices shows that the proposed method is efficient and performs better than alternative methods.
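
A GPD tail model typically enters a VaR constraint through the standard peaks-over-threshold quantile formula from extreme value theory. The sketch below assumes a GPD with scale `sigma` and shape `xi` has already been fitted to the `n_u` losses exceeding threshold `u` out of `n` observations; the function name and the numbers are hypothetical, and the genetic-algorithm search over portfolio weights is not shown.

```python
import math

def gpd_var(u, sigma, xi, n, n_u, p):
    """VaR at confidence level p from a GPD fitted to the n_u losses above threshold u.

    VaR_p = u + (sigma / xi) * (((n / n_u) * (1 - p)) ** (-xi) - 1)
    """
    if not (1.0 - n_u / n) < p < 1.0:
        raise ValueError("p must lie above the empirical level of the threshold")
    tail_ratio = (n / n_u) * (1.0 - p)
    if abs(xi) < 1e-12:
        # Limit xi -> 0: exponential tail
        return u - sigma * math.log(tail_ratio)
    return u + (sigma / xi) * (tail_ratio ** (-xi) - 1.0)

# Hypothetical fit: 100 of 1,000 daily losses exceed u = 1.0 (%), sigma = 0.5, xi = 0.2
var_99 = gpd_var(u=1.0, sigma=0.5, xi=0.2, n=1000, n_u=100, p=0.99)
print(f"99% VaR ~ {var_99:.3f}%")
```

In a genetic algorithm of this kind, each candidate weight vector would yield a portfolio loss series that is refitted and checked against a VaR limit; how the paper encodes that constraint is not stated in the abstract.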

Threshold Estimation of Generalized Pareto Distribution Based on Akaike Information Criterion for Accurate Reliability Analysis

  • Kang, Seunghoon;Lim, Woochul;Cho, Su-Gil;Park, Sanghyun;Lee, Minuk;Choi, Jong-Su;Hong, Sup;Lee, Tae Hee
    • Transactions of the Korean Society of Mechanical Engineers A / v.39 no.2 / pp.163-168 / 2015
  • To perform estimations with high reliability, the tail of the cumulative distribution function (CDF) must be modeled in greater detail than the CDF as a whole. Using a generalized Pareto distribution (GPD) to model the tail of a CDF is receiving increasing research attention for this purpose. Current studies on GPDs focus on determining the appropriate number of sample points and estimating the parameters. However, even when the parameters are properly estimated, the tail model can be inaccurate if the threshold value is incorrect. Therefore, this paper proposes a GPD based on the Akaike information criterion (AIC) to improve the accuracy of the tail model. The proposed method determines an accurate threshold value using the AIC computed over the entire sample before estimating the GPD above the threshold. To validate its accuracy, the reliability obtained with the method is compared with that obtained using a general GPD model with an empirical CDF.
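
One way to read "choosing the threshold by AIC over the entire sample" is to score a spliced model (a body distribution below the threshold and a GPD above it) on every observation, so that all candidate thresholds are compared on the same data. The sketch below illustrates that reading only: the normal body, the method-of-moments GPD fit (used instead of maximum likelihood to stay dependency-free), and all names are assumptions, not the authors' procedure.

```python
import math
import random
import statistics

def gpd_fit_mom(y):
    """Method-of-moments GPD estimates (xi, sigma) for exceedances y; valid for xi < 1/2."""
    m, v = statistics.fmean(y), statistics.variance(y)
    xi = 0.5 * (1.0 - m * m / v)
    sigma = 0.5 * m * (m * m / v + 1.0)
    return xi, sigma

def gpd_logpdf(y, xi, sigma):
    if abs(xi) < 1e-9:
        return -math.log(sigma) - y / sigma
    z = 1.0 + xi * y / sigma
    if z <= 0.0:
        return -math.inf  # y lies beyond the fitted upper endpoint
    return -math.log(sigma) - (1.0 / xi + 1.0) * math.log(z)

def spliced_aic(data, u, min_tail=10):
    """AIC of a normal-body / GPD-tail model for threshold u, scored on ALL samples."""
    body = [x for x in data if x <= u]
    tail = [x - u for x in data if x > u]
    if len(body) < 2 or len(tail) < min_tail:
        return math.inf
    phi = len(tail) / len(data)  # estimated P(X > u)
    mu, sd = statistics.fmean(body), statistics.pstdev(body)
    log_mass = math.log(statistics.NormalDist(mu, sd).cdf(u))  # truncate the body at u
    xi, sigma = gpd_fit_mom(tail)
    loglik = sum(
        math.log(1.0 - phi) - 0.5 * math.log(2 * math.pi * sd * sd)
        - (x - mu) ** 2 / (2 * sd * sd) - log_mass
        for x in body
    )
    loglik += sum(math.log(phi) + gpd_logpdf(y, xi, sigma) for y in tail)
    k = 6  # mu, sd, phi, xi, sigma and the threshold u
    return 2 * k - 2 * loglik

def select_threshold(data, candidates):
    return min(candidates, key=lambda u: spliced_aic(data, u))

random.seed(0)
sample = [random.lognormvariate(0.0, 1.0) for _ in range(2000)]
candidates = statistics.quantiles(sample, n=20)[10:18]  # 55th to 90th percentiles
best_u = select_threshold(sample, candidates)
print(f"selected threshold u = {best_u:.3f}")
```

Because every candidate here has the same parameter count, the AIC comparison reduces to the full-sample log-likelihood; a different body model or a likelihood-based GPD fit would change the numbers but not the selection logic.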

The Comparative Study for NHPP of Truncated Pareto Software Reliability Growth Model

  • Kim, Hee-Cheul;Shin, Hyun-Cheul
    • Convergence Security Journal / v.12 no.1 / pp.9-16 / 2012
  • Because software systems are applied at large scale, software reliability plays an important role in software development. In this paper, a software reliability growth model (SRGM) based on testing time is proposed, in which the testing time is right-truncated. The intensity function, mean value function, software reliability, parameter estimation, and special applications of the Pareto NHPP model are discussed. In a numerical example using time-between-failures data, the parameters are estimated by maximum likelihood after trend analysis confirms the suitability of the data; models are then selected by the difference between predicted and actual values, measured by the mean square error (MSE) and $R_{SQ}$, and found to be efficient.
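
The quantities listed above (mean value function, intensity function, reliability, MSE and $R_{SQ}$) can be sketched for a finite-failure NHPP whose mean value function is proportional to a right-truncated lifetime CDF. The Pareto type II (Lomax) CDF, the form m(t) = θF(t)/F(T), the particular MSE and $R_{SQ}$ definitions, and the data are all assumptions; the paper's exact model and its maximum-likelihood fit are not reproduced.

```python
import math

def lomax_cdf(t, alpha, beta):
    """Pareto type II (Lomax) lifetime CDF; an assumed form of the 'Pareto' model."""
    return 1.0 - (beta / (beta + t)) ** alpha

def lomax_pdf(t, alpha, beta):
    return alpha * beta ** alpha / (beta + t) ** (alpha + 1.0)

def mean_value(t, theta, alpha, beta, T):
    """m(t) = theta * F(t) / F(T): expected failures by time t, right-truncated at T."""
    return theta * lomax_cdf(min(t, T), alpha, beta) / lomax_cdf(T, alpha, beta)

def intensity(t, theta, alpha, beta, T):
    """lambda(t) = dm/dt, which is zero after the truncation time T."""
    if t > T:
        return 0.0
    return theta * lomax_pdf(t, alpha, beta) / lomax_cdf(T, alpha, beta)

def reliability(x, t, theta, alpha, beta, T):
    """P(no failure in (t, t + x]) = exp(-[m(t + x) - m(t)])."""
    return math.exp(-(mean_value(t + x, theta, alpha, beta, T)
                      - mean_value(t, theta, alpha, beta, T)))

def mse_and_rsq(failure_times, theta, alpha, beta, T, n_params=3):
    """Compare m(t_i) with the observed cumulative failure count i at each failure time."""
    observed = range(1, len(failure_times) + 1)
    predicted = [mean_value(t, theta, alpha, beta, T) for t in failure_times]
    sse = sum((p - o) ** 2 for p, o in zip(predicted, observed))
    mean_obs = sum(observed) / len(observed)
    sst = sum((o - mean_obs) ** 2 for o in observed)
    return sse / (len(failure_times) - n_params), 1.0 - sse / sst

# Hypothetical failure times (hours): cumulative sums of times between failures
times = [5, 12, 20, 31, 45, 62]
mse, rsq = mse_and_rsq(times, theta=8.0, alpha=1.5, beta=30.0, T=100.0)
print(f"MSE={mse:.3f}, R_SQ={rsq:.3f}")
```

With time-between-failures data, the failure times passed to `mse_and_rsq` are the running totals of those intervals.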

Likelihood based inference for the shape parameter of Pareto Distribution

  • Lee, Jae-Un;Lee, Woo-Dong
    • Journal of the Korean Data and Information Science Society / v.19 no.4 / pp.1173-1181 / 2008
  • In this paper, we develop likelihood-based inference for the shape parameter of the Pareto distribution when it is the parameter of interest. Specifically, we develop the signed log-likelihood ratio statistic and the modified signed log-likelihood ratio statistic for the shape parameter. It is well known that, as the sample size grows, the modified signed log-likelihood ratio statistic converges to the standard normal distribution faster than the signed log-likelihood ratio statistic. However, computing the modified signed log-likelihood ratio statistic is difficult or even impossible when the sufficient and ancillary statistics are not clear. In this case, one can consider an approximation to the modified statistic. In particular, when the parameter of interest is informationally orthogonal to the nuisance parameters, we propose an approximate modified signed log-likelihood ratio statistic. Through simulation, we compare the performance of the proposed statistics with that of the signed log-likelihood ratio statistic.
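
For the Pareto shape parameter, the first-order signed log-likelihood ratio statistic has a closed form once the scale is profiled out (its MLE is the sample minimum for any shape). The sketch below computes that statistic and inverts it into an approximate confidence interval; it does not implement the modified or approximate modified statistics the paper proposes, and the sample and function names are hypothetical.

```python
import math
from statistics import NormalDist

def pareto_shape_mle(x):
    """Shape MLE with the scale profiled out (the scale MLE is min(x) for every shape)."""
    x_min = min(x)
    t = sum(math.log(v / x_min) for v in x)
    return len(x) / t

def signed_lr(alpha, x):
    """r(alpha) = sign(alpha_hat - alpha) * sqrt(2 * [l_p(alpha_hat) - l_p(alpha)]).

    With l_p(alpha) = n*log(alpha) - alpha*T + const, the term under the root
    reduces to 2 * n * (rho - 1 - log(rho)), where rho = alpha / alpha_hat.
    """
    rho = alpha / pareto_shape_mle(x)
    stat = 2.0 * len(x) * (rho - 1.0 - math.log(rho))
    return math.copysign(math.sqrt(max(stat, 0.0)), 1.0 - rho)

def lr_interval(x, level=0.95, iters=200):
    """Invert |r(alpha)| <= z by bisection; r is decreasing in alpha."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    a_hat = pareto_shape_mle(x)

    def root(lo, hi, target):
        for _ in range(iters):  # invariant: r(lo) > target >= r(hi)
            mid = 0.5 * (lo + hi)
            if signed_lr(mid, x) > target:
                lo = mid
            else:
                hi = mid
        return 0.5 * (lo + hi)

    return root(a_hat * 1e-6, a_hat, z), root(a_hat, a_hat * 1e3, -z)

# Hypothetical sample (not from the paper)
sample = [1.0, 1.3, 1.7, 2.2, 3.1, 4.5, 6.0, 9.8]
print(pareto_shape_mle(sample), lr_interval(sample))
```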

Estimation for Functions of Two Parameters in the Pareto Distribution

  • Woo, Jung-Soo;Kang, Suk-Bok
    • Journal of the Korean Data and Information Science Society / v.1 / pp.67-76 / 1990
  • For a two-parameter Pareto distribution, the uniformly minimum variance unbiased estimators (UMVUEs) of functions of the two parameters are expressed in terms of the confluent hypergeometric function. The variance of the UMVUE is also expressed in terms of a hypergeometric function of several variables. UMVUEs of the $\gamma$th moment about zero and of several other useful parametric functions, together with their variances, are obtained as special cases. The estimators of Baxter (1980) and Saksena and Johnson (1984) are special cases of our estimator.
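
Two of the simplest special cases can be written in closed form. Assuming the parametrization f(x) = αθ^α / x^(α+1) for x ≥ θ, the statistics X_(1) = min(x) and T = Σ log(x_i / X_(1)) are complete sufficient and independent, with 2αT following a chi-square distribution with 2(n - 1) degrees of freedom; requiring unbiasedness then gives the estimators below. They follow from those distributional facts rather than from the paper's hypergeometric-function expressions, and the function name is hypothetical.

```python
import math

def pareto_umvue(x):
    """UMVUEs of shape alpha and scale theta for a two-parameter Pareto sample (n >= 3).

    E[1/T] = alpha / (n - 2)        ->  alpha_hat = (n - 2) / T
    E[X_(1)] = theta*n*alpha/(n*alpha - 1), E[T] = (n - 1)/alpha, independent
                                    ->  theta_hat = X_(1) * (1 - T / (n * (n - 1)))
    """
    n = len(x)
    if n < 3:
        raise ValueError("need at least 3 observations")
    x1 = min(x)
    t = sum(math.log(v / x1) for v in x)
    alpha_hat = (n - 2) / t
    theta_hat = x1 * (1.0 - t / (n * (n - 1)))
    return alpha_hat, theta_hat

# Hypothetical sample, not from the paper
alpha_hat, theta_hat = pareto_umvue([12.0, 15.5, 13.2, 40.1, 18.7, 22.0, 12.9, 27.3])
print(alpha_hat, theta_hat)
```

Note that `theta_hat` always falls below the sample minimum, which is what removes the upward bias of X_(1).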

Relationship Between Tweet Frequency and User Velocity on Twitter

  • Jeon, So-Young;Lee, Al-Chan;Seo, Go-Eun;Shin, Won-Yong
    • Journal of the Korea Institute of Information and Communication Engineering / v.19 no.6 / pp.1380-1386 / 2015
  • Recently, the importance of users' geographic location information has been highlighted by the rapid growth of online social network services. In this paper, using geo-tagged tweets, which provide high-precision location information, we first identify each Twitter user's exact location and the timestamp at which the tweet was sent. We then analyze the relationship between tweet frequency and average user velocity. Specifically, we introduce a tweet-frequency computing algorithm and present analysis results by country and by city. The main result is that tweet frequency as a function of user velocity follows a power-law distribution (i.e., a Zipf or Pareto distribution). In addition, a comparison between the United States and Japan shows that the exponent of the distribution is smaller in Japan than in the United States.
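
The per-user measurements behind this kind of analysis can be sketched as follows: great-circle distances between consecutive geo-tagged tweets give an average velocity, and a continuous power-law exponent can be estimated by maximum likelihood. This is not the paper's tweet-frequency algorithm, which the abstract does not specify; the definitions of frequency and velocity, the spherical-Earth radius, and all function names are assumptions.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; spherical-Earth approximation

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (lat, lon) points given in degrees."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp, dl = p2 - p1, math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

def user_stats(tweets):
    """tweets: [(unix_seconds, lat, lon), ...] for ONE user.

    Returns (tweets per hour, average velocity in km/h) over the observed span.
    """
    tweets = sorted(tweets)
    hours = (tweets[-1][0] - tweets[0][0]) / 3600.0
    if hours <= 0:
        raise ValueError("need tweets at two or more distinct times")
    path_km = sum(haversine_km(a[1], a[2], b[1], b[2]) for a, b in zip(tweets, tweets[1:]))
    return len(tweets) / hours, path_km / hours

def power_law_exponent(values, x_min):
    """Continuous power-law MLE: alpha = 1 + n / sum(ln(x / x_min)) over values >= x_min."""
    tail = [v for v in values if v >= x_min]
    denom = sum(math.log(v / x_min) for v in tail)
    if denom <= 0:
        raise ValueError("need values strictly above x_min")
    return 1.0 + len(tail) / denom

# Hypothetical user: three geo-tagged tweets over two hours
freq, speed = user_stats([(0, 37.5665, 126.9780), (3600, 37.4563, 126.7052),
                          (7200, 37.2636, 127.0286)])
print(f"{freq:.2f} tweets/h at {speed:.1f} km/h")
```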

Estimation of Car Insurance Loss Ratio Using the Peaks over Threshold Method

  • Kim, S.Y.;Song, J.
    • The Korean Journal of Applied Statistics / v.25 no.1 / pp.101-114 / 2012
  • In car insurance, the loss ratio is the ratio of total losses paid out in claims to total earned premiums. Because the loss ratio carries essential profit and loss information, estimating extreme quantiles of its distribution is necessary to minimize losses to the insurance company. Like other insurance-related data, the loss ratio has a heavy-tailed distribution. The Peaks over Threshold (POT) method and the Hill estimator are commonly used to estimate extreme quantiles of heavy-tailed distributions. This article compares and analyzes the performance of various parameter estimation methods using a simulation and real car insurance loss ratio data. In addition, we estimate extreme quantiles using the Hill estimator. The simulation and the loss ratio data applications show that the POT method estimates quantiles more accurately than the Hill estimator in most cases. Moreover, among the GPD parameter estimation methods, the MLE, Zhang, and NLS-2 methods perform best.
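
The Hill estimator mentioned above estimates the tail index from the k largest observations, and a Weissman-type extrapolation turns it into an extreme quantile. The sketch below shows only that Hill-based path (the POT/GPD fitting methods compared in the paper, such as Zhang and NLS-2, are not reproduced); the choice of k, the function names, and the loss ratios are assumptions.

```python
import math

def hill_estimator(data, k):
    """Hill estimate of the tail index gamma from the k largest observations."""
    x = sorted(data, reverse=True)
    if not 0 < k < len(x) or x[k] <= 0:
        raise ValueError("need 0 < k < n and positive order statistics")
    return sum(math.log(x[i]) for i in range(k)) / k - math.log(x[k])

def hill_quantile(data, k, p):
    """Weissman-type quantile with exceedance probability p (intended for p < k/n)."""
    x = sorted(data, reverse=True)
    gamma = hill_estimator(data, k)
    return x[k] * (k / (len(x) * p)) ** gamma

# Hypothetical loss ratios (claims / earned premiums), not the paper's data
ratios = [0.62, 0.71, 0.55, 0.93, 1.40, 0.68, 0.77, 2.10, 0.81, 0.59, 1.05, 3.30]
print(hill_estimator(ratios, k=4), hill_quantile(ratios, k=4, p=0.01))
```

The Hill estimate is sensitive to k, which is one reason threshold-based GPD methods are often preferred for extreme quantiles.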

Frequency Analyses for Extreme Rainfall Data using the Burr XII Distribution

  • Seo, Jungho;Shin, Ju-Young;Jung, Younghun;Heo, Jun-Haeng
    • Proceedings of the Korea Water Resources Association Conference / 2018.05a / p.335 / 2018
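  • Recently, owing to abnormal climate phenomena, extreme hydrological events have been increasing in frequency and intensity in many regions of the world. Applying an appropriate probability distribution model is therefore very important in the frequency analysis of extreme rainfall events for the design of hydraulic structures. In hydrological statistics, hydrological characteristics have mainly been approached using extreme value distributions such as the generalized extreme value (GEV), generalized logistic (GLO), and Gumbel (GUM) models. For Korean rainfall events, the GEV and GUM distributions are known to fit relatively well, but because they have at most one shape parameter, the statistical characteristics these models can represent are limited. To analyze data that the existing GEV or GUM distributions do not reproduce adequately, distributions with two shape parameters are being studied. In this study, we therefore evaluated the applicability of the Burr XII distribution, which has two shape parameters, to extreme rainfall data in Korea. Like the gamma and exponential distributions, the Burr XII distribution takes only positive values, and like the Cauchy and Pareto distributions it is heavy-tailed, so it is known to be suitable for extreme events in which relatively large values occur frequently. To this end, at-site and regional frequency analyses of Korean rainfall data were performed using the Burr XII distribution, and the results were compared with those from the GEV, GLO, and GUM distributions, which are known to fit Korean rainfall data relatively well.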
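
In its standard form, the Burr XII distribution with shape parameters c and k and scale λ has CDF F(x) = 1 - (1 + (x/λ)^c)^(-k) for x > 0, so its upper tail decays like a power law with index c·k, the Pareto-like behaviour described above. The sketch below computes design quantiles for given return periods; the parameter values are hypothetical, and parameter fitting and regional analysis are not shown.

```python
def burr12_cdf(x, c, k, scale):
    """Burr XII CDF F(x) = 1 - (1 + (x/scale)^c)^(-k) for x > 0; c and k are shape parameters."""
    return 1.0 - (1.0 + (x / scale) ** c) ** (-k)

def burr12_quantile(p, c, k, scale):
    """Inverse CDF: x_p = scale * ((1 - p)^(-1/k) - 1)^(1/c)."""
    return scale * ((1.0 - p) ** (-1.0 / k) - 1.0) ** (1.0 / c)

def design_rainfall(return_period_years, c, k, scale):
    """T-year quantile of an annual-maximum series, using non-exceedance probability 1 - 1/T."""
    return burr12_quantile(1.0 - 1.0 / return_period_years, c, k, scale)

# Hypothetical parameters for one site (mm of daily rainfall); not fitted to any record
c, k, scale = 4.0, 0.6, 150.0
for T in (10, 50, 100):
    print(T, round(design_rainfall(T, c, k, scale), 1))
```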

The Determination of Probability Distributions of Annual, Seasonal and Monthly Precipitation in Korea

  • Kim, Dong-Yeob;Lee, Sang-Ho;Hong, Young-Joo;Lee, Eun-Jai;Im, Sang-Jun
    • Korean Journal of Agricultural and Forest Meteorology / v.12 no.2 / pp.83-94 / 2010
  • The objective of this study was to determine the best probability distributions of annual, seasonal, and monthly precipitation in Korea. Data observed at 32 stations in Korea were analyzed using the L-moment ratio diagram and the average weighted distance (AWD) to identify the best probability distribution for each precipitation series. Annual precipitation was best represented by the 3-parameter Weibull distribution (W3), spring and autumn precipitation by the 3-parameter lognormal distribution (LN3), and summer and winter precipitation by the generalized extreme value distribution (GEV). The best distributions for monthly precipitation were LN3 for January, W3 for February and July, the 2-parameter Weibull distribution (W2) for March, the generalized Pareto distribution (GPA) for April, September, October, and November, GEV for May and June, and log-Pearson type III (LP3) for August and December. However, in goodness-of-fit tests of these best-fit distributions, GPA for April, September, October, and November and LN3 for January showed considerably high rejection rates, owing to computational errors in estimating the distribution parameters and relatively high AWD values. Meanwhile, analyses using data from 55 stations, including 23 additional stations, showed no significant differences from those using the original data. Further studies using longer-term data are needed to identify better-fitting probability distributions for each precipitation series.
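
An L-moment ratio diagram places each station's sample L-skewness and L-kurtosis against the theoretical curves of candidate distributions. The sketch below computes unbiased sample L-moments from probability-weighted moments, plus the L-kurtosis implied by a generalized Pareto distribution at a given L-skewness (Hosking's relation τ4 = τ3(1 + 5τ3)/(5 + τ3)); the AWD criterion used in the study is not reproduced, and the precipitation values are hypothetical.

```python
def sample_l_moments(data):
    """Return (l1, l2, t3, t4): sample L-mean, L-scale, L-skewness and L-kurtosis."""
    x = sorted(data)
    n = len(x)
    if n < 4:
        raise ValueError("need at least 4 observations")
    # Unbiased probability-weighted moments b_r (0-based index i = rank - 1)
    b0 = sum(x) / n
    b1 = sum(i * x[i] for i in range(n)) / (n * (n - 1))
    b2 = sum(i * (i - 1) * x[i] for i in range(n)) / (n * (n - 1) * (n - 2))
    b3 = sum(i * (i - 1) * (i - 2) * x[i] for i in range(n)) / (n * (n - 1) * (n - 2) * (n - 3))
    l1 = b0
    l2 = 2 * b1 - b0
    l3 = 6 * b2 - 6 * b1 + b0
    l4 = 20 * b3 - 30 * b2 + 12 * b1 - b0
    return l1, l2, l3 / l2, l4 / l2

def gpa_tau4(t3):
    """L-kurtosis implied by a generalized Pareto distribution with L-skewness t3."""
    return t3 * (1 + 5 * t3) / (5 + t3)

# Hypothetical April totals (mm) at one station
april = [62.1, 48.3, 110.4, 75.0, 39.8, 95.6, 58.2, 130.9, 70.4, 66.7, 52.0, 88.3]
_, _, t3, t4 = sample_l_moments(april)
print(f"t3={t3:.3f}, t4={t4:.3f}, GPA curve at t3: {gpa_tau4(t3):.3f}")
```

A station whose (t3, t4) point lies close to the GPA curve would favour GPA on the diagram; the study's AWD weighting across stations refines that visual comparison.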