• Title/Summary/Keyword: Statistics topic


Reducing Bias of the Minimum Hellinger Distance Estimator of a Location Parameter

  • Pak, Ro-Jin
    • Journal of the Korean Data and Information Science Society
    • /
    • v.17 no.1
    • /
    • pp.213-220
    • /
    • 2006
  • Since Beran (1977) developed minimum Hellinger distance estimation, the method has been a popular topic in robust estimation. In defining the distance, a kernel density estimator has been widely used as the density estimator. In this article, however, we show that, for a location parameter, combining a kernel density estimator with an empirical density can yield a smaller bias in the minimum Hellinger distance estimator than using a kernel density estimator alone.

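A minimal sketch of the kernel-only baseline described in the abstract above may help fix ideas. It is not the paper's combined kernel/empirical estimator. The model family N(theta, 1), the Gaussian kernel, the bandwidth, the grid steps, and the contaminated toy sample are all assumptions made for illustration. Minimizing the Hellinger distance is done here by maximizing the affinity, the integral of sqrt(f_hat * f_theta), over a grid of candidate locations.

```python
import math
import random


def normal_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2) at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def mhd_location(sample, bandwidth=0.4, grid_step=0.05, theta_step=0.02):
    """Grid-search minimum Hellinger distance estimate of theta in N(theta, 1)."""
    lo, hi = min(sample) - 4.0, max(sample) + 4.0
    grid = [lo + i * grid_step for i in range(int((hi - lo) / grid_step) + 1)]
    n = len(sample)
    # Square root of the Gaussian kernel density estimate on the grid.
    root_kde = [
        math.sqrt(sum(normal_pdf(x, xi, bandwidth) for xi in sample) / n)
        for x in grid
    ]
    best_theta, best_affinity = None, -1.0
    n_theta = int((max(sample) - min(sample)) / theta_step) + 1
    for j in range(n_theta):
        theta = min(sample) + j * theta_step
        # Hellinger affinity, approximated by a Riemann sum over the grid.
        affinity = grid_step * sum(
            r * math.sqrt(normal_pdf(x, theta, 1.0)) for r, x in zip(root_kde, grid)
        )
        if affinity > best_affinity:
            best_theta, best_affinity = theta, affinity
    return best_theta


rng = random.Random(0)
# Toy data: 90 draws from N(0, 1) plus 10 gross outliers at 10.
data = [rng.gauss(0.0, 1.0) for _ in range(90)] + [10.0] * 10
sample_mean = sum(data) / len(data)
estimate = mhd_location(data)
print("sample mean :", sample_mean)
print("MHD estimate:", estimate)
```

On this toy sample the outliers pull the sample mean toward 1, while the affinity is dominated by the bulk of the data, which is the robustness property that motivates the method.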

On Alternative Collinearity Diagnostics in Linear MEM

  • Moon, Myung-Sang
    • Communications for Statistical Applications and Methods
    • /
    • v.3 no.2
    • /
    • pp.21-28
    • /
    • 1996
  • Collinearities in the MEM cause the same problems as they do in the traditional regression model, so detecting them is a crucial topic in the MEM. One diagnostic was introduced by Carrillo-Gamboa and Gunst, but their method did not work in some cases. Two alternative collinearity diagnostics that provide reasonable measures of collinearity are proposed. A simulation study is performed to compare the small-sample properties of the proposed diagnostics.


Exploring Regional Decline Risk Areas and Factors Using Topic Modeling and Cluster Analysis (토픽모델링과 군집분석을 통한 지방 소멸 위험지역과 요인의 탐색)

  • Ji-Min Kim;Heeryon Cho
    • Proceedings of the Korea Information Processing Society Conference
    • /
    • 2023.05a
    • /
    • pp.349-350
    • /
    • 2023
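  • Korea's persistently low birth rate and aging population are steadily increasing the number of regions at risk of local extinction. This study identifies factors related to local extinction by applying topic modeling to newspaper articles containing the keyword 'population extinction', collects public data related to the extracted topics, and performs a cluster analysis that groups regions with similar characteristics. It then compares the at-risk regions classified by the local extinction risk index with the results of the cluster analysis.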

Distribution of a Sum of Weighted Noncentral Chi-Square Variables

  • Heo, Sun-Yeong;Chang, Duk-Joon
    • Communications for Statistical Applications and Methods
    • /
    • v.13 no.2
    • /
    • pp.429-440
    • /
    • 2006
  • In statistical computing, researchers often need the distribution of a weighted sum of noncentral chi-square variables, but its exact distribution is known only in very limited cases. Many works have contributed to this topic, e.g. Imhof (1961) and Solomon-Stephens (1977). Imhof's method gives a good approximation to the true distribution, but it is not easy to apply, even considering the development of computer technology. Solomon-Stephens's three-moment chi-square approximation is relatively easy to apply and accurate. However, they skipped many details, and their simulation is limited to a weighted sum of central chi-square random variables. This paper gives details of Solomon-Stephens's method. We also extend their simulation to the weighted sum of noncentral chi-square variables. We evaluated approximate powers for a homogeneity test and compared them with the true powers. Solomon-Stephens's method gives a very good approximation in this case.
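To illustrate what a three-moment fit involves, the sketch below uses Pearson's three-moment central chi-square approximation as described by Imhof (1961). It is a stand-in, not Solomon-Stephens's method, whose details the abstract does not give. The weights, degrees of freedom, noncentrality parameters, and the cutoff 20 are hypothetical. The s-th cumulant of Q = sum_i w_i * chi2(nu_i, delta_i) is 2^(s-1) (s-1)! sum_i w_i^s (nu_i + s * delta_i), and the approximation matches the first three cumulants with a shifted and scaled chi-square on h = 8 k2^3 / k3^2 degrees of freedom.

```python
import math
import random


def cumulant(weights, dfs, ncs, s):
    """s-th cumulant of Q = sum_i w_i * chi2(df_i, noncentrality nc_i)."""
    total = sum(w ** s * (nu + s * nc) for w, nu, nc in zip(weights, dfs, ncs))
    return 2 ** (s - 1) * math.factorial(s - 1) * total


def draw_q(rng, weights, dfs, ncs):
    """One draw of Q; each noncentral chi-square is a sum of squared normals."""
    q = 0.0
    for w, nu, nc in zip(weights, dfs, ncs):
        # Put the whole noncentrality on the first normal component.
        chi = rng.gauss(math.sqrt(nc), 1.0) ** 2
        chi += sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(nu - 1))
        q += w * chi
    return q


# Hypothetical example values (integer degrees of freedom are required by draw_q).
weights, dfs, ncs = [1.0, 2.0], [2, 3], [1.0, 0.5]
k1, k2, k3 = (cumulant(weights, dfs, ncs, s) for s in (1, 2, 3))
h = 8.0 * k2 ** 3 / k3 ** 2  # degrees of freedom that match the skewness of Q

rng = random.Random(1)
n_draws, cutoff = 20000, 20.0
sim_tail = sum(draw_q(rng, weights, dfs, ncs) > cutoff for _ in range(n_draws)) / n_draws
# Approximation: Q ~ k1 + sqrt(k2) * (chi2_h - h) / sqrt(2h),
# where chi2_h is drawn as Gamma(shape h/2, scale 2).
approx_tail = sum(
    k1 + math.sqrt(k2) * (rng.gammavariate(h / 2.0, 2.0) - h) / math.sqrt(2.0 * h) > cutoff
    for _ in range(n_draws)
) / n_draws
print("cumulants:", k1, k2, k3, " h:", round(h, 3))
print("simulated P(Q > 20):", sim_tail, " three-moment approximation:", approx_tail)
```

Both tail probabilities are Monte Carlo estimates, so they agree only up to simulation error; a closed-form comparison would need the chi-square distribution function, which the standard library does not provide.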

Bayesian Inference of the Stochastic Gompertz Growth Model for Tumor Growth

  • Paek, Jayeong;Choi, Ilsu
    • Communications for Statistical Applications and Methods
    • /
    • v.21 no.6
    • /
    • pp.521-528
    • /
    • 2014
  • The stochastic Gompertz diffusion model for tumor growth is a topic of active interest, as cancer is a leading cause of death in Korea. Direct maximum likelihood estimation for stochastic differential equations is possible using the continuous-path likelihood, provided that a continuous sample path of the process is recorded over the interval. This likelihood provides a basis for the so-called continuous-record or infill likelihood function and infill asymptotics. In practice, fully continuous data are not available except in a few special cases, so the exact ML method is not applicable. In this paper we propose a method for estimating the parameters of the stochastic Gompertz differential equation via Markov chain Monte Carlo methods that is applicable to several data structures. We compare a Markov transition data structure with a data structure that has an initial point.
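The abstract does not state the model equation, so the following sketch assumes one common form, dX = (a - b ln X) X dt + sigma X dW. Under that assumption log X is an Ornstein-Uhlenbeck process with Gaussian transitions, which gives an exact likelihood for discretely observed data. The random-walk Metropolis step below samples only the parameter a, with b and sigma held fixed and a flat prior; the parameter values, step size, and chain length are hypothetical, and the paper's own sampler may differ.

```python
import math
import random


def ou_moments(y, a, b, sigma, dt):
    """Mean and variance of log X(t + dt) given log X(t) = y (assumed model)."""
    m = (a - 0.5 * sigma ** 2) / b  # long-run mean of log X
    decay = math.exp(-b * dt)
    mean = m + (y - m) * decay
    var = sigma ** 2 * (1.0 - decay ** 2) / (2.0 * b)
    return mean, var


def log_likelihood(log_x, a, b, sigma, dt):
    """Exact log-likelihood of equally spaced observations of log X."""
    total = 0.0
    for y0, y1 in zip(log_x, log_x[1:]):
        mean, var = ou_moments(y0, a, b, sigma, dt)
        total += -0.5 * math.log(2.0 * math.pi * var) - (y1 - mean) ** 2 / (2.0 * var)
    return total


rng = random.Random(2)
a_true, b, sigma, dt = 1.0, 0.5, 0.2, 0.5  # hypothetical parameter values
log_x = [0.0]  # X(0) = 1
for _ in range(200):
    mean, var = ou_moments(log_x[-1], a_true, b, sigma, dt)
    log_x.append(rng.gauss(mean, math.sqrt(var)))

# Random-walk Metropolis for a, started away from the true value.
a = 2.0
ll = log_likelihood(log_x, a, b, sigma, dt)
draws = []
for _ in range(3000):
    proposal = a + rng.gauss(0.0, 0.05)
    ll_prop = log_likelihood(log_x, proposal, b, sigma, dt)
    # Accept with probability min(1, likelihood ratio); 1 - random() lies in (0, 1].
    if math.log(1.0 - rng.random()) < ll_prop - ll:
        a, ll = proposal, ll_prop
    draws.append(a)
kept = draws[1000:]  # discard burn-in
posterior_mean = sum(kept) / len(kept)
print("posterior mean of a:", posterior_mean)
```

Because the transition density is exact, no time-discretization error enters the likelihood; only the Monte Carlo error of the chain remains.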

Evaluation of Reliability Using RMD and ${\chi}^2$ Contingency Tests Using Correspondence Analysis in Survey Study (실증 연구에서 RMD에 의한 신뢰도와 대응 분석에 의한 ${\chi}^2$ 분할표 검정의 평가)

  • Choe, Seong-Un
    • Proceedings of the Safety Management and Science Conference
    • /
    • 2012.04a
    • /
    • pp.293-300
    • /
    • 2012
  • Reliability measures of questionnaires and ${\chi}^2$ contingency tests of categorized responses are among the most practical tools for analyzing the characteristics of survey subjects. This research evaluates Cronbach's alpha reliability measure using a Repeated Measure Design (RMD), illustrated with MINITAB examples. In addition, the ${\chi}^2$ statistic of each cell of a categorized table can be interpreted effectively with the symmetric plot of correspondence analysis. A practical example is also discussed to provide a comprehensive understanding of the topic.

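As a small illustration of the reliability measure named in the abstract above, the sketch below computes Cronbach's alpha from the usual variance formula, alpha = k/(k-1) * (1 - sum of item variances / variance of total scores). It does not reproduce the paper's RMD-based evaluation or its MINITAB output, and the response matrix is hypothetical.

```python
from statistics import variance


def cronbach_alpha(scores):
    """Cronbach's alpha; scores[r][i] is respondent r's answer to item i."""
    k = len(scores[0])
    item_vars = [variance([row[i] for row in scores]) for i in range(k)]
    total_var = variance([sum(row) for row in scores])
    return k / (k - 1) * (1.0 - sum(item_vars) / total_var)


# Hypothetical 5-respondent, 3-item questionnaire on a 1-5 scale.
responses = [
    [4, 5, 4],
    [3, 3, 2],
    [5, 5, 5],
    [2, 3, 2],
    [4, 4, 5],
]
alpha = cronbach_alpha(responses)
print("Cronbach's alpha:", round(alpha, 4))
```

When every item gives identical answers the formula returns 1, its upper bound.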

Reconstruction of Categories on the National Petition Site Using K-Means clustering and Topic Modeling (K-means 클러스터링과 토픽 모델링을 기반으로 한 국민청원 사이트의 카테고리 재구성)

  • Woo, Yun Hui;Kim, Hyon Hee
    • Proceedings of the Korea Information Processing Society Conference
    • /
    • 2019.05a
    • /
    • pp.302-305
    • /
    • 2019
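  • The national petition site has attracted great public interest because of its accessibility and speed. The site currently classifies petitions into 16 categories, including 'Future' and 'Growth Engine', plus an 'Other' category, but the criteria are ambiguous, so many petitions end up in 'Other'. This can be attributed to the use of a predefined category structure that does not clearly reflect the content of the petitions. To define more specific categories, this paper collected 1,500 petitions in order of recommendations and extracted a category structure from their content. First, the k-means algorithm was applied to cluster the petitions and define the major categories; topic modeling was then performed to define more specific subcategories. Because the hierarchical category structure proposed in this paper consists of major categories and subcategories derived from petition content, it appears suitable for registering or classifying new petitions.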
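The first stage of the pipeline described above, k-means clustering of petitions into major categories, can be sketched as follows. The vocabulary, the term-count vectors, and the choice of the first k documents as initial centers are hypothetical and chosen only to keep the example reproducible; the paper's text representation and the topic-modeling stage are not shown.

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, iters=20):
    """Plain k-means (Lloyd's algorithm); returns one cluster label per point."""
    centers = [list(p) for p in points[:k]]  # deterministic init: first k points
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest center for each point.
        labels = [
            min(range(k), key=lambda c: squared_distance(p, centers[c]))
            for p in points
        ]
        # Update step: each center moves to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Hypothetical term counts over the vocabulary [school, teacher, tax, housing].
petitions = [
    [3, 2, 0, 0],
    [0, 0, 2, 3],
    [2, 3, 0, 1],
    [0, 1, 3, 2],
]
labels = kmeans(petitions, k=2)
print(labels)
```

In a two-stage design like the one in the abstract, a topic model would then be fitted separately within each cluster to obtain the subcategories.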

Finding optimal portfolio based on genetic algorithm with generalized Pareto distribution (GPD 기반의 유전자 알고리즘을 이용한 포트폴리오 최적화)

  • Kim, Hyundon;Kim, Hyun Tae
    • Journal of the Korean Data and Information Science Society
    • /
    • v.26 no.6
    • /
    • pp.1479-1494
    • /
    • 2015
  • Since Markowitz's mean-variance framework for portfolio analysis, portfolio optimization has been an important topic in finance. Traditional approaches focus on maximizing the expected return of the portfolio while minimizing its variance, assuming that risky asset returns are normally distributed. The normality assumption, however, has been widely criticized, as actual stock price distributions exhibit much heavier tails as well as asymmetry. To this end, in this paper we employ a genetic algorithm to find the optimal portfolio under a Value-at-Risk (VaR) constraint, where the tails of risky assets are modeled with the generalized Pareto distribution (GPD), the standard distribution for exceedances in extreme value theory. An empirical study using Korean stock prices shows that the proposed method is efficient and performs better than alternative methods.
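The GPD tail model in the abstract leads to the standard peaks-over-threshold VaR formula, VaR_p = u + (sigma/xi) * [((n/N_u)(1 - p))^(-xi) - 1], where u is the threshold and N_u the number of exceedances among n losses. The sketch below evaluates it on simulated toy losses. The method-of-moments fit, the 95% threshold, and the exponential toy data are assumptions made for illustration; the paper's fitting procedure and the genetic-algorithm search are not shown.

```python
import math
import random
from statistics import mean, variance


def fit_gpd_moments(excesses):
    """Method-of-moments GPD fit (valid for shape xi < 1/2); returns (xi, sigma)."""
    m, s2 = mean(excesses), variance(excesses)
    ratio = m * m / s2
    return 0.5 * (1.0 - ratio), 0.5 * m * (1.0 + ratio)


def gpd_var(u, sigma, xi, n, n_exceed, p):
    """Peaks-over-threshold VaR at confidence level p for threshold u."""
    tail = (n / n_exceed) * (1.0 - p)
    if abs(xi) < 1e-9:  # exponential-tail limit as xi -> 0
        return u - sigma * math.log(tail)
    return u + sigma / xi * (tail ** (-xi) - 1.0)


rng = random.Random(3)
# Toy losses from a unit exponential, not market data.
losses = sorted(rng.expovariate(1.0) for _ in range(5000))
u = losses[int(0.95 * len(losses))]  # empirical 95% quantile as threshold
excesses = [x - u for x in losses if x > u]
xi, sigma = fit_gpd_moments(excesses)
var_99 = gpd_var(u, sigma, xi, len(losses), len(excesses), 0.99)
print("xi:", xi, " sigma:", sigma, " VaR(99%):", var_99)
```

For unit-exponential losses the true 99% quantile is -ln(0.01), about 4.605, which gives a reference point for the estimate.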

Trend Analysis of the Agricultural Industry Based on Text Analytics

  • Choi, Solsaem;Kim, Junhwan;Nam, Seungju
    • Agribusiness and Information Management
    • /
    • v.11 no.1
    • /
    • pp.1-9
    • /
    • 2019
  • This research proposes a methodology for analyzing current trends in agriculture, which is directly connected to the survival of the nation, and uses it to identify agricultural trends in Korea. Based on the relationship between three types of data (policy reports, academic articles, and news articles), the research derives the major issues contained in each data source using LDA, a representative topic modeling method. By comparing and analyzing the LDA results derived from each data source, this study aims to identify the implications for current agricultural trends in Korea. The methodology can also be applied to industrial trends other than agricultural ones. Furthermore, through insight into the current situation, it can serve as a basic resource for considering potential future areas. database of the profitability of a total of 180 crop types by analyzing Rural Development Administration's survey of agricultural products income of 115 crop types, small land profitability index survey of 53 crop types, and Statistics Korea's survey of production costs of 12 crop types. Furthermore, this research presents the result and developmental process of a web-based crop introduction decision support system that provides overseas cases of new crop introduction support programs, as well as databases of outstanding business success cases of each crop type researched by agricultural institutions.

Spatial analysis based on topic modeling using foreign tourist review data: Case of Daegu (외국인 관광객 리뷰데이터를 활용한 토픽모델링 기반의 공간분석: 대구광역시를 사례로)

  • Jung, Ji-Woo;Kim, Seo-Yun;Kim, Hyeon-Yu;Yoon, Ju-Hyeok;Jang, Won-Jun;Kim, Keun-Wook
    • Journal of Digital Convergence
    • /
    • v.19 no.8
    • /
    • pp.33-42
    • /
    • 2021
  • As smartphone-based tourism platforms have become active, review data are being used for policy making and service improvement in various fields. Most previous studies using tourism review data have focused on domestic tourists, and studies of foreign tourists have been limited to data collected in a few languages and to text mining techniques. In this study, 3,515 reviews written by foreigners were collected from an online review site using the keyword "Daegu attractions", and LDA-based topic modeling was performed to derive tourism topics. The study differs from previous work in taking a spatial approach, applying global and local spatial autocorrelation analysis to each topic. The analysis confirmed that global spatial autocorrelation exists and that tourist destinations mainly visited by foreigners are locally concentrated. In addition, hot spots appeared around Jung-gu for most of the topics. The results are expected to serve as basic research for establishing local government policy on foreign tourism and for spatial analysis based on topic modeling. The limitations of this study are also presented.
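Global spatial autocorrelation of the kind mentioned above is commonly measured with Moran's I, I = (n / S0) * sum_ij w_ij (x_i - xbar)(x_j - xbar) / sum_i (x_i - xbar)^2, where S0 is the sum of all spatial weights. The abstract does not name the statistic used, so Moran's I is an assumption here, as are the four regions on a line, the binary adjacency weights, and the topic scores.

```python
def morans_i(values, weights):
    """Global Moran's I for values observed at n regions with weight matrix weights."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    s0 = sum(sum(row) for row in weights)  # total weight
    cross = sum(
        weights[i][j] * dev[i] * dev[j] for i in range(n) for j in range(n)
    )
    return (n / s0) * cross / sum(d * d for d in dev)


# Four hypothetical regions on a line; neighbors share a border.
adjacency = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
smooth_trend = morans_i([1.0, 2.0, 3.0, 4.0], adjacency)  # similar neighbors
checkerboard = morans_i([1.0, 4.0, 1.0, 4.0], adjacency)  # dissimilar neighbors
print("trend:", smooth_trend, " alternating:", checkerboard)
```

Positive values indicate that neighboring regions have similar topic scores and negative values that they alternate; local statistics and hot-spot maps, as used in the study, would be computed per region rather than as one global number.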