Integrated Search | Korea Science

Direct Divergence Approximation between Probability Distributions and Its Applications in Machine Learning

Sugiyama, Masashi;Liu, Song;du Plessis, Marthinus Christoffel;Yamanaka, Masao;Yamada, Makoto;Suzuki, Taiji;Kanamori, Takafumi
- Journal of Computing Science and Engineering / Vol. 7, No. 2 / pp. 99-111 / 2013
Approximating a divergence between two probability distributions from their samples is a fundamental challenge in statistics, information theory, and machine learning. A divergence approximator can be used for various purposes, such as two-sample homogeneity testing, change-point detection, and class-balance estimation. Furthermore, an approximator of a divergence between the joint distribution and the product of marginals can be used for independence testing, which has a wide range of applications, including feature selection and extraction, clustering, object matching, independent component analysis, and causal direction estimation. In this paper, we review recent advances in divergence approximation. Our emphasis is that directly approximating the divergence without estimating probability distributions is more sensible than a naive two-step approach of first estimating probability distributions and then approximating the divergence. Furthermore, despite the overwhelming popularity of the Kullback-Leibler divergence as a divergence measure, we argue that alternatives such as the Pearson divergence, the relative Pearson divergence, and the $L^2$-distance are more useful in practice because of their computationally efficient approximability, high numerical stability, and superior robustness against outliers.
https://doi.org/10.5626/JCSE.2013.7.2.99
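
The abstract above argues for approximating a divergence directly rather than via separate density estimates. As a rough illustration only, the sketch below estimates the Pearson divergence by least-squares fitting of the density ratio p/q with Gaussian kernels. The abstract does not specify an estimator, so the function names, the fixed kernel width `sigma`, the regularization strength `lam`, and the number of kernel centers are all assumptions made for this example; in practice such tuning parameters would normally be chosen by cross-validation.

```python
import numpy as np

def gaussian_design(x, centers, sigma):
    """Gaussian kernel values between each row of x and each center."""
    sq_dist = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-sq_dist / (2.0 * sigma ** 2))

def pearson_divergence(x_p, x_q, sigma=1.0, lam=0.1, n_centers=100, seed=0):
    """Estimate PE(p||q) = 1/2 * E_q[(p/q - 1)^2] from samples of shape (n, d).

    The ratio p/q is modeled as a weighted sum of Gaussian kernels and fitted
    by regularized least squares; no density is estimated separately.
    sigma, lam, and n_centers are illustrative defaults (assumptions).
    """
    x_p = np.asarray(x_p, dtype=float)
    x_q = np.asarray(x_q, dtype=float)
    rng = np.random.default_rng(seed)
    n_centers = min(n_centers, len(x_p))
    # Kernel centers are a random subset of the samples from p.
    centers = x_p[rng.choice(len(x_p), size=n_centers, replace=False)]
    phi_p = gaussian_design(x_p, centers, sigma)
    phi_q = gaussian_design(x_q, centers, sigma)
    H = phi_q.T @ phi_q / len(x_q)   # empirical E_q[phi phi^T]
    h = phi_p.mean(axis=0)           # empirical E_p[phi]
    # Closed-form minimizer of the regularized least-squares objective.
    theta = np.linalg.solve(H + lam * np.eye(n_centers), h)
    # Plug-in estimate: h^T theta - 1/2 theta^T H theta - 1/2.
    return float(h @ theta - 0.5 * theta @ H @ theta - 0.5)
```

Calling `pearson_divergence(x_p, x_q)` on two sample arrays of shape `(n, d)` returns a single number; the estimate tends to be near zero when both samples come from the same distribution and grows as the distributions move apart.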

Gaussian PDFs Clustering Using a Divergence Measure-based Neural Network

박동철;권오현
- The Journal of Korean Institute of Communications and Information Sciences (한국통신학회논문지) / Vol. 29, No. 5C / pp. 627-631 / 2004
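An algorithm is proposed that efficiently clusters the Gaussian probability density functions (GPDFs) of a speech recognition model. The proposed algorithm is a new form of Centroid Neural Network (CNN) that uses the divergence distance as the distance measure between data. In experiments on reducing memory usage for speech recognition in resource-constrained hardware environments, using the Continuous Density Hidden Markov Model (CDHMM) as the speech recognition model, the method reduced the number of GPDFs by about 31.3% more than the existing Divergence-based k-means (Dk-means) algorithm while maintaining recognition performance. Compared with using all GPDFs without any clustering algorithm, it compressed the GPDFs by about 61.8%, again while maintaining recognition performance. Recognition performance was also maintained in an evaluation on noisy data at an SNR of 10 dB.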

MEASURE OF DEPARTURE FROM QUASI-SYMMETRY AND BRADLEY-TERRY MODELS FOR SQUARE CONTINGENCY TABLES WITH NOMINAL CATEGORIES

Kouji Tahata;Nobuko Miyamoto;Sadao Tomizawa
- Journal of the Korean Statistical Society / Vol. 33, No. 1 / pp. 129-147 / 2004
For square contingency tables with nominal categories, this paper proposes a measure to represent the degree of departure from the quasi-symmetry (QS) model and the Bradley-Terry (BT) model. The proposed measure is expressed using Cressie and Read's (1984) power divergence or Patil and Taillie's (1982) diversity index. The measure lies between 0 and 1, and it is useful for comparing the degree of departure from QS or BT in several tables.

Generalized Measure of Departure From Global Symmetry for Square Contingency Tables with Ordered Categories

Tomizawa, Sadao;Saitoh, Kayo
- Journal of the Korean Statistical Society / Vol. 27, No. 3 / pp. 289-303 / 1998
For square contingency tables with ordered categories, Tomizawa (1995) considered two kinds of measures to represent the degree of departure from global symmetry, meaning that the probability that an observation falls in one of the cells in the upper-right triangle of the square table equals the probability that it falls in one of the cells in the lower-left triangle. This paper proposes a generalization of those measures. The proposed measure is expressed using Cressie and Read's (1984) power divergence or Patil and Taillie's (1982) diversity index. Special cases of the proposed measure include Tomizawa's measures. The proposed measure would be useful for comparing the degree of departure from global symmetry in several tables.

A New Distance Measure for a Variable-Sized Acoustic Model Based on MDL Technique

Cho, Hoon-Young;Kim, Sang-Hun
- ETRI Journal / Vol. 32, No. 5 / pp. 795-800 / 2010
Embedding a large vocabulary speech recognition system in mobile devices requires a reduced acoustic model obtained by eliminating redundant model parameters. In conventional optimization methods based on the minimum description length (MDL) criterion, a binary Gaussian tree is built at each state of a hidden Markov model by iteratively finding and merging similar mixture components. An optimal subset of the tree nodes is then selected to generate a downsized acoustic model. To obtain a better binary Gaussian tree by improving the process of finding the most similar Gaussian components, this paper proposes a new distance measure that exploits the difference in likelihood values for cases before and after two components are combined. The mixture weight of Gaussian components is also introduced in the component merging step. Experimental results show that the proposed method outperforms MDL-based optimization using either a Kullback-Leibler (KL) divergence or weighted KL divergence measure. The proposed method could also reduce the acoustic model size by 50% with less than a 1.5% increase in error rate compared to a baseline system.
https://doi.org/10.4218/etrij.10.1510.0062
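
For context on the baseline mentioned in the abstract above, the KL divergence between two Gaussian components has a closed form. The sketch below assumes diagonal covariances, which the abstract does not state, and the function names are hypothetical. It illustrates only the KL-based baseline distance, not the likelihood-difference measure the paper proposes.

```python
import numpy as np

def kl_diag_gaussians(mean0, var0, mean1, var1):
    """KL(N0 || N1) for Gaussians with diagonal covariance (an assumption).

    mean*, var* are 1-D sequences of per-dimension means and variances.
    """
    m0, v0 = np.asarray(mean0, float), np.asarray(var0, float)
    m1, v1 = np.asarray(mean1, float), np.asarray(var1, float)
    # Per-dimension closed form, summed over independent dimensions.
    return 0.5 * float(np.sum(np.log(v1 / v0) + (v0 + (m0 - m1) ** 2) / v1 - 1.0))

def symmetric_kl(mean0, var0, mean1, var1):
    """Symmetrized KL, a common choice when ranking component pairs to merge."""
    return (kl_diag_gaussians(mean0, var0, mean1, var1)
            + kl_diag_gaussians(mean1, var1, mean0, var0))
```

A merging procedure of the kind described would evaluate such a distance for every pair of components in a state and merge the closest pair; how mixture weights enter the distance is left out of this sketch.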

Effects of the Variation of Divergence Angle of Vaned Diffuser on the Flow Characteristics of a Small-size Turbo-compressor

김홍식;정조순;김윤제
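- Korean Society of Mechanical Engineers (대한기계학회): Conference Proceedings / Proceedings of the Korean Society of Mechanical Engineers 2001 Spring Conference, E / pp. 813-818 / 2001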
The flow characteristics of a vaned diffuser are complicated by its geometric shape. We studied the effects of various vaned diffuser configurations, such as divergence angles and rectangular and conical cross-section shapes. Numerical analyses were carried out for the diffuser and casing. The pressure recovery coefficient was calculated to estimate the performance of the diffuser and then compared with the measured data. The results show that the shape and the divergence angle of the diffuser strongly influence the performance of the small-size turbo-compressor.

DIMENSIONS OF THE SUBSETS IN THE SPECTRAL CLASSES OF A SELF-SIMILAR CANTOR SET

Baek, In-Soo
- Journal of applied mathematics & informatics / Vol. 26, No. 3-4 / pp. 733-738 / 2008
Using information on the dimensions of divergence points, we give full information on the dimensions of the completely decomposed class of the lower (upper) distribution sets of a self-similar Cantor set. Furthermore, using a relationship between the distribution sets and the subsets generated by the lower (upper) local dimensions of a self-similar measure, we give full information on the dimensions of the subsets defined by the local dimensions.

Empirical Comparisons of Disparity Measures for Three Dimensional Log-Linear Models

Park, Y.S.;Hong, C.S.;Jeong, D.B.
- Journal of the Korean Data and Information Science Society / Vol. 17, No. 2 / pp. 543-557 / 2006
This paper is concerned with the applicability of the chi-square approximation to six disparity statistics: the Pearson chi-square, the generalized likelihood ratio, the power divergence, the blended weight chi-square, the blended weight Hellinger distance, and the negative exponential disparity statistic. Three-dimensional contingency tables of small and moderate sample sizes are generated to be fitted to all possible hierarchical log-linear models: the completely independent model, the conditionally independent model, the partial association models, and the model with one variable independent of the other two. For models with direct solutions of expected cell counts, point estimates and confidence intervals of the 90 and 95 percentage points of the six statistics are explored. For models without direct solutions, the empirical significance levels and the empirical powers of the six statistics for testing the significance of the three-factor interaction are computed and compared.

Empirical Comparisons of Disparity Measures for Partial Association Models in Three Dimensional Contingency Tables

Jeong, D.B.;Hong, C.S.;Yoon, S.H.
- Communications for Statistical Applications and Methods / Vol. 10, No. 1 / pp. 135-144 / 2003
This work is concerned with the comparison of recently developed disparity measures for the partial association model in three-dimensional categorical data. Data are generated by simulating each term in the log-linear model equation based on the partial association model, which is the method proposed in this paper. This alternative Monte Carlo method is explored to study the behavior of disparity measures such as the power divergence statistic I(λ), the Pearson chi-square statistic $X^2$, the likelihood ratio statistic $G^2$, the blended weight chi-square statistic BWCS(λ), the blended weight Hellinger distance statistic BWHD(λ), and the negative exponential disparity statistic NED(λ) for moderate sample sizes. We find that the power divergence statistic I(2/3) and the blended weight Hellinger distance family BWHD(1/9) are the best tests with respect to size and power.
https://doi.org/10.5351/CKSS.2003.10.1.135
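
Several of the statistics named in the abstract above belong to the Cressie-Read power divergence family, which may be easier to follow in code. The sketch below is an illustration, not the paper's simulation procedure: the function name is hypothetical, and it assumes observed and expected counts with equal totals. In this family, λ = 1 gives the Pearson chi-square statistic and the limit λ → 0 gives the likelihood ratio statistic.

```python
import numpy as np

def power_divergence(observed, expected, lam):
    """Cressie-Read statistic: 2 / (lam (lam + 1)) * sum O * ((O / E)^lam - 1).

    Assumes sum(observed) == sum(expected) and strictly positive expected counts.
    """
    o = np.asarray(observed, dtype=float)
    e = np.asarray(expected, dtype=float)
    if lam == 0:
        # Limit lam -> 0: likelihood ratio statistic G^2 (0 * log 0 taken as 0).
        nz = o > 0
        return 2.0 * float(np.sum(o[nz] * np.log(o[nz] / e[nz])))
    if lam == -1:
        # Limit lam -> -1: requires strictly positive observed counts.
        return 2.0 * float(np.sum(e * np.log(e / o)))
    return 2.0 / (lam * (lam + 1.0)) * float(np.sum(o * ((o / e) ** lam - 1.0)))
```

For example, `power_divergence(obs, exp, 2/3)` evaluates the member of the family written as I(2/3) in the abstract for a given table of counts.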

Tree-structured Classification based on Variable Splitting

Ahn, Sung-Jin
- Communications for Statistical Applications and Methods / Vol. 2, No. 1 / pp. 74-88 / 1995
This article introduces a unified method of choosing the most explanatory and significant multiway partitions for classification tree design and analysis. The method is based on the impurity reduction (IR) measure of divergence, which is proposed as an extension of the proportional-reduction-in-error (PRE) measure in the decision-theory context. To derive the method, the IR measure is analyzed to characterize its statistical properties, which are then used to handle feature formation, feature selection, and feature deletion consistently in the associated classification tree construction. A numerical example illustrates the proposed approach.

Search results: 68 items; processing time 0.025 s
