• Title/Summary/Keyword: distribution valued data


Symbolic Cluster Analysis for Distribution Valued Dissimilarity

  • Matsui, Yusuke;Minami, Hiroyuki;Mizuta, Masahiro
    • Communications for Statistical Applications and Methods / v.21 no.3 / pp.225-234 / 2014
  • We propose a novel hierarchical clustering method for distribution valued dissimilarities. The analysis of large and complex data has attracted significant interest. Symbolic Data Analysis (SDA), proposed by Diday in the 1980s, provides a new framework for statistical analysis. In SDA, we analyze objects with internal variation, such as intervals, histograms, and distributions, called symbolic objects. In this study, we focus on cluster analysis for distribution valued dissimilarities, one type of symbolic object. A hierarchical clustering generally has two steps: a find-out step and an update step. In the find-out step, we find the nearest pair of clusters; we extend this step to distribution valued dissimilarities by introducing a measure of their order relations. In the update step, dissimilarities between clusters are redefined as a mixture of distributions with a mixing ratio. We present a real-data example of the proposed method and a simulation study.
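The find-out and update steps described above can be sketched as follows, under assumptions that are ours and not taken from the paper: each dissimilarity is a Gaussian mixture stored as (weight, mean, sd) components; the find-out step scores each pair by the sum of P(D_pair < D_other) over all other pairs, a stand-in for the authors' order-relation measure; and the update step replaces D(A, C) and D(B, C) by their mixture with a size-proportional mixing ratio |A| / (|A| + |B|). All function names are hypothetical.

```python
import math

# Assumption: a distribution-valued dissimilarity is a Gaussian mixture,
# stored as a list of (weight, mean, sd) components whose weights sum to 1.
Mixture = list[tuple[float, float, float]]


def normal_cdf(x: float) -> float:
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def prob_less(x: Mixture, y: Mixture) -> float:
    """P(X < Y) for independent Gaussian mixtures X and Y."""
    total = 0.0
    for wx, mx, sx in x:
        for wy, my, sy in y:
            # Y_j - X_i is normal with mean my - mx and sd hypot(sx, sy).
            total += wx * wy * normal_cdf((my - mx) / math.hypot(sx, sy))
    return total


def mix(a: Mixture, b: Mixture, ratio: float) -> Mixture:
    """Mixture ratio * A + (1 - ratio) * B."""
    return [(ratio * w, m, s) for w, m, s in a] + [
        ((1.0 - ratio) * w, m, s) for w, m, s in b
    ]


def agglomerate(d: dict[tuple[int, int], Mixture], sizes: dict[int, int]):
    """Hypothetical agglomerative clustering on distribution-valued dissimilarities.

    d maps (i, j) with i < j to a Mixture; sizes maps cluster id -> size.
    Returns the merges as (a, b, new_id) tuples.
    """
    d, sizes = dict(d), dict(sizes)
    next_id = max(sizes) + 1
    merges = []
    while len(sizes) > 1:
        pairs = list(d)

        # Find-out step (stand-in order measure): favour the pair whose
        # dissimilarity is most likely to be smaller than the others.
        def score(k):
            return sum(prob_less(d[k], d[o]) for o in pairs if o != k)

        a, b = max(pairs, key=score)
        na, nb = sizes.pop(a), sizes.pop(b)
        new = next_id
        next_id += 1
        # Update step: mix D(a, c) and D(b, c) with a size-proportional ratio.
        for c in sizes:
            d_ac = d[tuple(sorted((a, c)))]
            d_bc = d[tuple(sorted((b, c)))]
            d[(c, new)] = mix(d_ac, d_bc, na / (na + nb))
        d = {k: v for k, v in d.items() if a not in k and b not in k}
        sizes[new] = na + nb
        merges.append((a, b, new))
    return merges
```

Representing dissimilarities as Gaussian mixtures keeps the update step closed: a mixture of mixtures is again a mixture, and P(X < Y) stays available in closed form.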

Exploratory Methods for Joint Distribution Valued Data and Their Application

  • Igarashi, Kazuto;Minami, Hiroyuki;Mizuta, Masahiro
    • Communications for Statistical Applications and Methods / v.22 no.3 / pp.265-276 / 2015
  • In this paper, we propose hierarchical cluster analysis and multidimensional scaling for joint distribution valued data. Information technology is increasing the need for statistical methods for large and complex data, and Symbolic Data Analysis (SDA) is an attractive framework for such data. In SDA, target objects are typically represented by aggregated data. Most SDA methods deal with objects represented as intervals and histograms; however, those methods cannot account for relationships among variables, such as correlation. In contrast, objects represented as joint distributions can retain information about relationships among variables. Therefore, we focus on methods for joint distribution valued data. We extend these two well-known exploratory methods using dissimilarities between joint distribution valued data based on the Hall-type relative projection index. We present a simulation study and a real-data example of the proposed methods.
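The abstract does not define the Hall-type relative projection index, so the sketch below swaps in a simpler, related quantity as an assumption: the squared L2 distance, the integral of (f − g)^2, between two bivariate normal densities. It has a closed form because the integral of f·g equals the N(0, Σ_f + Σ_g) density evaluated at μ_f − μ_g. The sketch only illustrates the motivating point that a joint-distribution dissimilarity sees correlation that interval or histogram summaries of the marginals miss; a matrix of such dissimilarities could then feed standard hierarchical clustering or MDS. Function names are hypothetical.

```python
import math

Vec = tuple[float, float]
Cov = list[list[float]]  # symmetric 2x2 covariance matrix


def gaussian_product_integral(m1: Vec, s1: Cov, m2: Vec, s2: Cov) -> float:
    """Closed form of the integral of f1(x) * f2(x) for bivariate normals.

    The integral equals the N(0, s1 + s2) density evaluated at m1 - m2.
    """
    a = s1[0][0] + s2[0][0]
    b = s1[0][1] + s2[0][1]
    c = s1[1][1] + s2[1][1]
    det = a * c - b * b
    dx, dy = m1[0] - m2[0], m1[1] - m2[1]
    # Quadratic form d^T (s1 + s2)^{-1} d via the explicit 2x2 inverse.
    quad = (c * dx * dx - 2.0 * b * dx * dy + a * dy * dy) / det
    return math.exp(-0.5 * quad) / (2.0 * math.pi * math.sqrt(det))


def l2_dissimilarity(m1: Vec, s1: Cov, m2: Vec, s2: Cov) -> float:
    """Squared L2 distance: <f1, f1> + <f2, f2> - 2 <f1, f2>."""
    return (
        gaussian_product_integral(m1, s1, m1, s1)
        + gaussian_product_integral(m2, s2, m2, s2)
        - 2.0 * gaussian_product_integral(m1, s1, m2, s2)
    )
```

With equal means and unit variances, correlations of 0 and 0.8 already give a positive dissimilarity even though the two marginal distributions coincide.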

On principal component analysis for interval-valued data (구간형 자료의 주성분 분석에 관한 연구)

  • Choi, Soojin;Kang, Kee-Hoon
    • The Korean Journal of Applied Statistics / v.33 no.1 / pp.61-74 / 2020
  • Interval-valued data, one type of symbolic data, are observed as intervals rather than single values, so each interval-valued observation has internal variation. Principal component analysis reduces the dimension of data by maximizing variance. Therefore, principal component analysis of interval-valued data should account for the variance between observations as well as the variation within the observed intervals. In this paper, three principal component analysis methods for interval-valued data are summarized. In addition, we propose a new method that uses a truncated normal distribution instead of the uniform distribution used in the conventional quantile method, because we believe there is more information near the center of the interval. The methods are compared using simulations and a relevant data set from the OECD. For the quantile method, we draw a scatter plot of the principal component scores and then identify the position and distribution of the quantiles using the arrow-line representation method.
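A minimal sketch of the quantile representation described above, assuming the conventional method places quantiles of a uniform distribution on each interval and the proposed variant uses a normal distribution centred at the interval midpoint and truncated to its endpoints. The standard deviation (half-width divided by a hypothetical `spread` parameter) is our assumption; the paper may choose it differently. Each interval-valued observation expands into several point rows, which would then go into ordinary PCA.

```python
from statistics import NormalDist

STD = NormalDist()  # standard normal


def uniform_quantiles(lo, hi, probs):
    """Quantiles of Uniform(lo, hi): the conventional quantile method."""
    return [lo + p * (hi - lo) for p in probs]


def truncnorm_quantiles(lo, hi, probs, spread=2.0):
    """Quantiles of a normal centred at the midpoint, truncated to [lo, hi].

    Assumption: sd = half-width / spread (spread is a hypothetical knob).
    """
    if hi == lo:
        return [lo] * len(probs)
    mu = (lo + hi) / 2
    sd = (hi - lo) / (2 * spread)
    a = STD.cdf((lo - mu) / sd)
    b = STD.cdf((hi - mu) / sd)
    # Inverse CDF of the truncated normal: rescale p into [a, b], then invert.
    return [mu + sd * STD.inv_cdf(a + p * (b - a)) for p in probs]


def quantile_rows(intervals, probs, method):
    """Expand one observation (one interval per variable) into point rows."""
    cols = [method(lo, hi, probs) for lo, hi in intervals]
    return [list(row) for row in zip(*cols)]
```

Compared with the uniform version, the truncated-normal quantiles sit closer to the midpoint, which is how the proposed method puts more weight near the interval centre.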

Discretization of Continuous-Valued Attributes considering Data Distribution (데이터 분포를 고려한 연속 값 속성의 이산화)

  • Lee, Sang-Hoon;Park, Jung-Eun;Oh, Kyung-Whan
    • Journal of the Korean Institute of Intelligent Systems / v.13 no.4 / pp.391-396 / 2003
  • This paper proposes a new approach that converts continuous-valued attributes to categorical-valued ones by considering the distribution of the target attribute (class). The approach obtains optimal interval boundaries by considering the distribution of the data itself, without requiring any parameters. For each attribute, the distribution of the target attribute is projected onto a one-dimensional space, and this space is clustered according to criteria such as the density of each target attribute value and the amount of overlap among those densities. The resulting clusters are based on probabilities that predict the target attribute of instances, so their interval boundaries minimize the loss of information from the original data. The improved performance of the proposed discretization method is validated using the C4.5 algorithm and data sets from the UCI Machine Learning Repository.
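For illustration only, the idea of cutting an attribute where the class distribution changes can be approximated by a much simpler parameter-free rule: sort the distinct values of one attribute, take the majority class at each value, and place a cut midway wherever that majority class changes. This is a swapped-in stand-in, not the authors' density-and-overlap clustering; `class_boundaries` and `discretize` are hypothetical names.

```python
import bisect
from collections import Counter


def class_boundaries(values, labels):
    """Cut points where the majority class changes between adjacent distinct values."""
    counts: dict[float, Counter] = {}
    for v, y in zip(values, labels):
        counts.setdefault(v, Counter())[y] += 1
    xs = sorted(counts)
    # Majority class at each distinct value; ties fall back to first-seen order.
    majority = [counts[x].most_common(1)[0][0] for x in xs]
    return [
        (xs[i - 1] + xs[i]) / 2
        for i in range(1, len(xs))
        if majority[i] != majority[i - 1]
    ]


def discretize(value, cuts):
    """Index of the interval containing value (0 .. len(cuts))."""
    return bisect.bisect_right(cuts, value)
```

The resulting interval indices can replace the raw attribute before training a decision-tree learner such as C4.5.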

Testing for stochastic order in interval-valued data (구간 자료의 확률적 순서 검정)

  • Choi, Hyejeong;Lim, Johan;Kwak, Minjung;Park, Seongoh
    • The Korean Journal of Applied Statistics / v.32 no.6 / pp.879-887 / 2019
  • We construct a procedure to test the stochastic order of two samples of interval-valued data. We propose a test statistic that is a U-statistic and derive its asymptotic distribution under the null hypothesis. We compare the performance of the proposed method with the existing one-sided bivariate Kolmogorov-Smirnov test using real and simulated data.
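The abstract does not give the kernel of the U-statistic. As an illustrative assumption only, the sketch below uses a two-sample, Mann-Whitney-style kernel on intervals stored as (lower, upper): +1 when an X interval lies componentwise below a Y interval, -1 for the reverse, and 0 otherwise. The asymptotic null distribution derived in the paper is not reproduced here; `dominates` and `u_statistic` are hypothetical names.

```python
def dominates(a, b):
    """True if interval a = (lower, upper) is componentwise <= interval b."""
    return a[0] <= b[0] and a[1] <= b[1]


def u_statistic(xs, ys):
    """Two-sample U-statistic averaging a +1/-1/0 order kernel over all pairs."""
    total = 0
    for x in xs:
        for y in ys:
            # Equal intervals dominate each other, so they contribute 0.
            total += int(dominates(x, y)) - int(dominates(y, x))
    return total / (len(xs) * len(ys))
```

Values near +1 suggest the X intervals tend to lie below the Y intervals, near -1 the reverse, and near 0 no clear order.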

Integer-Valued HAR(p) model with Poisson distribution for forecasting IPO volumes

  • SeongMin Yu;Eunju Hwang
    • Communications for Statistical Applications and Methods / v.30 no.3 / pp.273-289 / 2023
  • In this paper, we develop a new time series model for predicting IPO (initial public offering) data with non-negative integer values. The proposed model is based on the integer-valued autoregressive (INAR) model with a Poisson thinning operator. Analogous to the heterogeneous autoregressive (HAR) model, which uses a cascade of daily, weekly, and monthly averages, the integer-valued heterogeneous autoregressive (INHAR) model is considered in order to capture long memory efficiently. The parameters of the INHAR model are estimated using conditional least squares and Yule-Walker estimators. Through simulations, bias and standard error are calculated to compare the performance of the estimators. Model fit to Korea's IPO data is evaluated using performance measures such as mean absolute error (MAE), root mean square error (RMSE), and mean absolute percentage error (MAPE). The results show that the INHAR model performs better than the traditional INAR model, and the empirical analysis of Korea's IPO data indicates that the proposed model is efficient in forecasting monthly IPO volumes.
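The INHAR specification is not given here in enough detail to reproduce. As a hedged sketch of the building block it extends, the code below simulates an INAR(1) process using binomial thinning with Poisson innovations (the textbook INAR(1) construction; the paper's thinning operator may differ) and estimates its parameters by conditional least squares. For INAR(1), E[X_t | X_{t-1}] = αX_{t-1} + λ, so conditional least squares reduces to regressing X_t on X_{t-1}; an INHAR version would add cascade averages of past values as extra regressors. Function names are hypothetical.

```python
import math
import random


def poisson(rng: random.Random, lam: float) -> int:
    """Knuth's multiplication method; adequate for small lam."""
    limit = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1


def simulate_inar1(n: int, alpha: float, lam: float, seed: int = 0) -> list[int]:
    """X_t = alpha o X_{t-1} + eps_t with binomial thinning and Poisson(lam) eps_t."""
    rng = random.Random(seed)
    # Start from the stationary marginal, Poisson(lam / (1 - alpha)).
    x = [poisson(rng, lam / (1.0 - alpha))]
    for _ in range(n - 1):
        # Binomial thinning: each of the X_{t-1} units survives with prob. alpha.
        survivors = sum(rng.random() < alpha for _ in range(x[-1]))
        x.append(survivors + poisson(rng, lam))
    return x


def cls_inar1(x: list[int]) -> tuple[float, float]:
    """Conditional least squares: OLS of X_t on X_{t-1}; returns (alpha, lam)."""
    prev, curr = x[:-1], x[1:]
    n = len(prev)
    mp, mc = sum(prev) / n, sum(curr) / n
    sxy = sum((a - mp) * (b - mc) for a, b in zip(prev, curr))
    sxx = sum((a - mp) ** 2 for a in prev)
    alpha = sxy / sxx
    return alpha, mc - alpha * mp
```

A fixed seed makes the simulation reproducible, so the bias of the estimates can be checked against the true parameters.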

Assessing the Coronavirus Impact on the Asean Countries' Top 10 Most Valuable Brands

  • ZAHARI, Abdul Rahman;ESA, Elinda;AZIZAN, Noor Azlinna
    • The Journal of Asian Finance, Economics and Business / v.9 no.5 / pp.251-260 / 2022
  • The goal of this study is to determine whether the Coronavirus affected the Top 10 most valuable brands differently across ASEAN countries (Malaysia, Singapore, Indonesia, and Vietnam) and industry types. The data were collected using a secondary data method (content analysis). Based on the companies' annual reports from 2019 to 2021, the researchers examined the brand equity of the Top 10 most valuable brands in each of the four ASEAN countries. The data were analyzed with IBM SPSS Statistics for Windows, using frequency analysis, independent t-tests, and one-way analysis of variance (ANOVA). The findings revealed considerable differences in the Top 10 most valuable brands of the ASEAN countries between 2019 and 2020 and between 2019 and 2021 due to the impact of the Coronavirus. However, the data revealed no significant differences between industry categories in the Coronavirus's influence. Future studies could examine the differences between the most valuable brands and the influence of the Coronavirus over a longer period and include more firms and countries. Brand managers of the Top 10 most valuable companies in ASEAN countries must manage their brands carefully to preserve brand life and reduce the impact of future global pandemics.

Discretization of continuous-valued attributes considering data distribution (데이터 분포를 고려한 연속 값 속성의 이산화)

  • 이상훈;박정은;오경환
    • Proceedings of the Korean Institute of Intelligent Systems Conference / 2003.05a / pp.217-220 / 2003
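  • In this paper, we propose a new method that converts continuous values into categorical form by considering the distribution of target attribute (class) values for each attribute, without requiring any specific parameters as input. For each attribute, the distribution of the target attribute is mapped onto a one-dimensional space, and intervals are clustered according to criteria such as the density of each target attribute value and its degree of overlap with the other target attribute values. Each cluster generated in this way is based on probabilistic values that can predict the target attribute, and the clusters have discretization boundaries that minimize the loss of the information each attribute provides. The improved performance of the proposed discretization method can be confirmed using the C4.5 algorithm and data from the UCI Machine Learning Repository.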


Probabilistic estimates of corrosion rate of fuel tank structures of aging bulk carriers

  • Ivosevic, Spiro;Mestrovic, Romeo;Kovac, Natasa
    • International Journal of Naval Architecture and Ocean Engineering / v.11 no.1 / pp.165-177 / 2019
  • This paper considers the corrosion wastage of two ship hull structural members in the fuel oil tanks investigated on 25 aging bulk carriers. Because many factors that influence the corrosion wastage of ship hull structures are uncertain in nature, the related corrosion rate ($c_1$) is modeled here by a real-valued continuous distribution, assuming that corrosion wastage starts after 5, 6, or 7 years. In all cases considered, using the available data and three basic statistical tests, it is established that, among two-parameter continuous distributions, the normal, Weibull, and logistic distributions fit the corrosion rate ($c_1$) best. The presented statistical, numerical, and graphical results for the two structural members allow the corresponding probabilistic estimates of the corrosion rate ($c_1$) to be compared and discussed.
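The abstract does not state how $c_1$ is computed. A minimal sketch, assuming a linear wastage model c1 = d / (T - T0), where d is the measured wastage, T the ship age, and T0 the assumed start of corrosion (5, 6, or 7 years); it fits a normal distribution, one of the three best-fitting families reported, and computes the Kolmogorov-Smirnov statistic against the fit. Because the parameters are estimated from the same sample, standard KS critical values are conservative (a Lilliefors-type setting). Function names are hypothetical, and Weibull or logistic fits would need a library beyond the Python standard library.

```python
from statistics import NormalDist


def corrosion_rates(wastage_mm, age_years, start_years):
    """Assumed linear model: c1 = d / (T - T0), kept only for ships with T > T0."""
    return [
        d / (t - start_years)
        for d, t in zip(wastage_mm, age_years)
        if t > start_years
    ]


def fit_normal(rates):
    """Normal fit from sample mean and sample standard deviation."""
    return NormalDist.from_samples(rates)


def ks_statistic(sample, dist):
    """Kolmogorov-Smirnov distance between the empirical CDF and dist.cdf."""
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs, start=1):
        f = dist.cdf(x)
        # The empirical CDF jumps from (i - 1)/n to i/n at x.
        d = max(d, i / n - f, f - (i - 1) / n)
    return d
```

Running this for each assumed start year T0 shows how sensitive the fitted distribution of $c_1$ is to that choice.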

The Impact of Traditional Market Properties and Relationship Quality on Customer Value : Approach from the viewpoint of the Means-end Chain Theory

  • Cho, Hee-Young;Han, Sang-Ho;Yang, Hoe-Chang
    • Journal of Distribution Science / v.12 no.1 / pp.13-19 / 2014
  • Purpose - This study investigated relationship quality and loyalty from the viewpoint that merchants and consumers can jointly develop traditional markets. It reorganized variables to identify the value conditions that could motivate consumers to help revive traditional markets. Research Design, data, and methodology - Using 202 valid questionnaires based on the data of Yang & Ju (2012), this study conducted correlation analysis, regression analysis, and structural equation modeling (SEM). Results - The results emphasized product and store atmosphere as the store selection attributes to consider in the means-end chain (MEC) model; the service factor was not significant. Further, in tests of the mediating effects of the store selection attribute sub-factors, consumers valued relationship quality, including social and emotional value. Relationship quality significantly influenced consumer value in traditional markets, which need to be improved and developed along several variables. Conclusions - Using a causal model, this study revealed connections among attributes, consequences, and values, generated an optimal model grounded in both practice and theory, and proposed ways to obtain consumer-related information easily.