• Title/Summary/Keyword: 정규 상호정보량 (normalized mutual information)


A Study on Relative Mutual Information Coefficients (상호정보량의 정규화에 대한 연구)

  • Lee, Jae-Yun
    • Journal of the Korean Society for Library and Information Science / v.37 no.4 / pp.178-198 / 2003
  • Mutual information, as an association measure, has been used for various purposes as well as for calculating term similarity. There are, however, some limits to mutual information. It tends to overemphasize low-frequency terms because the marginal value of mutual information varies inversely with term frequency. To compensate for this limit, this study suggests relative mutual information (RMI) coefficients, which normalize mutual information, and examines their characteristics in some detail. The RMI coefficients also improve the effectiveness of global query expansion when they are applied to three different collections.
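
The low-frequency bias described in the abstract above can be illustrated with a small sketch. It uses plain pointwise mutual information computed from document counts. The RMI coefficients themselves are not given in the abstract and are not reproduced here. The collection size and counts are hypothetical.

```python
import math

def pmi(n_xy, n_x, n_y, n_total):
    """Pointwise mutual information (base 2) from document counts."""
    p_xy = n_xy / n_total
    p_x = n_x / n_total
    p_y = n_y / n_total
    return math.log2(p_xy / (p_x * p_y))

N = 10_000  # hypothetical collection size (assumption, not from the paper)

# Two term pairs that always co-occur: one rare, one common.
# For perfectly co-occurring terms with frequency f, PMI reduces to log2(N / f),
# so the score grows as the frequency shrinks.
rare = pmi(n_xy=2, n_x=2, n_y=2, n_total=N)
common = pmi(n_xy=500, n_x=500, n_y=500, n_total=N)

print(f"rare pair:   {rare:.2f}")
print(f"common pair: {common:.2f}")
```

Both pairs are equally "perfectly associated", yet the rare pair receives the much larger score, which is the kind of distortion a normalized coefficient is meant to reduce.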

Efficient variable selection method using conditional mutual information (조건부 상호정보를 이용한 분류분석에서의 변수선택)

  • Ahn, Chi Kyung;Kim, Donguk
    • Journal of the Korean Data and Information Science Society / v.25 no.5 / pp.1079-1094 / 2014
  • In this paper, we study efficient gene selection methods using conditional mutual information. We suggest gene selection methods that use conditional mutual information based on semiparametric methods utilizing the multivariate normal distribution and the Edgeworth approximation. We compare the suggested methods with other methods, such as the mutual information filter, SVM-RFE, and the gene selection method of Cai et al. (2009) (MIGS-original), in SVM classification. Through these experiments, we show that gene selection methods using conditional mutual information based on semiparametric methods perform better than the mutual information filter. Furthermore, we show that they take far less computing time than the gene selection method of Cai et al. (2009) while achieving similar performance.

Time delay estimation algorithm using Elastic Net (Elastic Net를 이용한 시간 지연 추정 알고리즘)

  • Jun-Seok Lim;Keunwa Lee
    • The Journal of the Acoustical Society of Korea / v.42 no.4 / pp.364-369 / 2023
  • Time-delay estimation between two receivers is a technique that has been applied in a variety of fields, from underwater acoustics to room acoustics and robotics. There are two types of time-delay estimation techniques: one estimates the time delay from the correlation between receivers, and the other parametrically models the time delay between receivers and estimates the parameters by system identification. The latter has the characteristic that only a small fraction of the system's parameters are directly related to the delay. This characteristic can be exploited to improve the accuracy of the estimation by methods such as Lasso regularization. However, with Lasso regularization, necessary information is lost. In this paper, we propose a method using Elastic Net, which adds Ridge regularization to Lasso regularization to compensate for this. Comparing the proposed method with the conventional Generalized Cross Correlation (GCC) method and the method using Lasso regularization, we show that the estimation variance is very small for both white Gaussian signal sources and colored signal sources.

Development of Online Machine Learning Model for AHU Supply Air Temperature Prediction using Progressive Sampling and Normalized Mutual Information (점진적 샘플링과 정규 상호정보량을 이용한 온라인 기계학습 공조기 급기온도 예측 모델 개발)

  • Chu, Han-Gyeong;Shin, Han-Sol;Ahn, Ki-Uhn;Ra, Seon-Jung;Park, Cheol Soo
    • Journal of the Architectural Institute of Korea Structure & Construction / v.34 no.6 / pp.63-69 / 2018
  • A machine learning model can capture the dynamics of building systems with fewer inputs than a first-principles-based simulation model. The training data for developing a machine learning model are usually selected in a heuristic manner. In this study, the authors developed a machine learning model that can describe the supply air temperature from an AHU in a real office building. For rational reduction of the training data, the progressive sampling method was used. It is found that even though progressive sampling requires far less training data (n=60) than offline regular sampling (n=1,799), the MBEs of both models are similar (2.6% vs. 5.4%). In addition, the normalized mutual information (NMI) was applied to decide when to update the machine learning model. If the NMI between the simulation output and the measured data is less than 0.2, the model has to be updated. With the use of the NMI, the model predicts more accurately (5.4% → 1.3%).
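
The update rule in the abstract above (retrain when the NMI between model output and measured data falls below 0.2) can be sketched as follows. Only the 0.2 threshold comes from the abstract. The abstract does not state how the NMI is computed, so the equal-width binning, the bin range, and the normalization by the geometric mean of the two entropies are assumptions made for illustration, and all function names are hypothetical.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (natural log) of a sequence of discrete labels."""
    n = len(labels)
    return -sum((c / n) * math.log(c / n) for c in Counter(labels).values())

def discretize(values, n_bins, lo, hi):
    """Map continuous values to equal-width bin indices in [0, n_bins - 1]."""
    width = (hi - lo) / n_bins
    return [max(0, min(int((v - lo) / width), n_bins - 1)) for v in values]

def nmi(x_bins, y_bins):
    """I(X;Y) / sqrt(H(X) * H(Y)); this normalization is an assumption."""
    h_x, h_y = entropy(x_bins), entropy(y_bins)
    if h_x <= 0 or h_y <= 0:
        return 0.0  # a constant signal shares no information
    h_xy = entropy(list(zip(x_bins, y_bins)))
    return (h_x + h_y - h_xy) / math.sqrt(h_x * h_y)

def needs_update(predicted, measured, threshold=0.2,
                 n_bins=5, lo=10.0, hi=30.0):
    """True when the NMI drops below the threshold (0.2 in the abstract)."""
    return nmi(discretize(predicted, n_bins, lo, hi),
               discretize(measured, n_bins, lo, hi)) < threshold

# Hypothetical supply air temperatures in degrees Celsius.
measured = [12.0, 14.5, 18.0, 22.0, 26.0, 29.0]
tracking_model = list(measured)      # model output follows the data
stale_model = [20.0] * 6             # model output no longer responds

print(needs_update(tracking_model, measured))
print(needs_update(stale_model, measured))
```

The bin count and range strongly affect a histogram-based NMI estimate, so in practice they would need to be tuned to the data at hand.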

DS/SS Code Acquisition Scheme Based on Signed-Rank Statistic in Non-Gaussian Impulsive Noise Environments (비정규 충격성 잡음 환경에서 부호 순위 통계량에 바탕을 둔 직접수열 대역확산 부호 획득기법)

  • Kim, Sang-Hun;Ahn, Sang-Ho;Lee, Young-Yoon;Yoo, Seung-Soo;Yoon, Seok-Ho
    • The Journal of Korean Institute of Communications and Information Sciences / v.33 no.2C / pp.200-207 / 2008
  • In this paper, a new detector is proposed for code acquisition, which employs the signs and ranks of the received signal samples, instead of their actual values, and so does not require knowledge of the non-Gaussian noise dispersion. The mean acquisition performance of the proposed detector is compared with that of the detector of [1]. The simulation results show that the proposed scheme is not only robust to deviations from the true value of the non-Gaussian noise dispersion, but also has comparable performance to that of the scheme of [1] using exact knowledge of the non-Gaussian noise dispersion.

Input Variable Selection by Using Fixed-Point ICA and Adaptive Partition Mutual Information Estimation (고정점 알고리즘의 독립성분분석과 적응분할의 상호정보 추정에 의한 입력변수선택)

  • Cho, Yong-Hyun
    • Journal of the Korean Institute of Intelligent Systems / v.16 no.5 / pp.525-530 / 2006
  • This paper presents an efficient input variable selection method using both fixed-point independent component analysis (FP-ICA) and adaptive partition mutual information (AP-MI) estimation. FP-ICA, which is based on the secant method, is applied to quickly find the independence between input variables. AP-MI estimation is also applied to estimate accurate dependence information by equally partitioning the samples of the input variables for calculating the probability density function (PDF). The proposed method has been applied to two input variable selection problems: 7 artificial signals of 500 samples and 24 environmental pollution signals of 55 samples. The experimental results show that the proposed method has fast and accurate selection performance. The proposed method also performs better than both AP-MI estimation without FP-ICA and regular partition MI estimation.

A Simple Stopping Criterion for the MIN-SUM Iterative Decoding Algorithm on SCCC and Turbo code (반복 복호의 계산량 감소를 위한 간단한 복호 중단 판정 알고리즘)

  • Heo, Jun;Chung, Kyu-Hyuk
    • Journal of the Institute of Electronics Engineers of Korea TC / v.41 no.4 / pp.11-16 / 2004
  • A simple stopping criterion for iterative decoding based on min-sum processing is presented. While most stopping criteria suggested in the literature are based on Cross Entropy (CE) and its simplification, the proposed stopping criterion checks whether a decoded sequence is a valid codeword along the encoder trellis structure. This new stopping criterion requires less computational complexity and saves memory compared to the conventional stopping rules. Numerical results are presented for the 3GPP turbo code and a Serially Concatenated Convolutional Code (SCCC).

Applying Randomization Tests to Collocation Analyses in Large Corpora (언어의 공기관계 분석을 위한 임의화검증의 응용)

  • Yang Kyung-Sook;Kim HeeYoung
    • The Korean Journal of Applied Statistics / v.18 no.3 / pp.583-595 / 2005
  • Contingency tables are used to compare counts of n-grams to determine whether an n-gram is a true collocation, meaning that the words that make up the n-gram are highly associated in the text. Several statistical measures are used to identify collocations, including the Kulczynski coefficient, the Ochiai coefficient, the Fager and McGowan coefficient, the Yule coefficient, mutual information, and chi-square. The main problem is that these measures are based on the assumption of a normal or approximately normal distribution of the variables being sampled. While this assumption is valid in most instances, it is not valid when comparing the rates of occurrence of rare events, and texts are composed mostly of rare events. In this paper, we briefly review some statistics for testing the association of two words. Some randomization tests to evaluate the significance level in analyzing collocations in large corpora are proposed. A related graph can be used to compare different test statistics that can be used to analyze the same contingency table.
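
The idea of a randomization test for a candidate collocation can be sketched as below. This is a generic Monte Carlo permutation test on the joint count of a word pair, not the specific procedure of the paper, which the abstract does not describe. The function name, the toy bigram data, and the number of permutations are hypothetical.

```python
import random

def randomization_p_value(first_words, second_words, w1, w2,
                          n_perm=2000, seed=0):
    """Permutation p-value for the co-occurrence count of the bigram (w1, w2).

    Shuffling the second-word column breaks any association while keeping
    both marginal word frequencies fixed, so no normality assumption is needed.
    """
    rng = random.Random(seed)

    def joint(seconds):
        return sum(1 for a, b in zip(first_words, seconds)
                   if a == w1 and b == w2)

    observed = joint(second_words)
    shuffled = list(second_words)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if joint(shuffled) >= observed:
            hits += 1
    # Add-one correction keeps the estimated p-value strictly positive.
    return observed, (hits + 1) / (n_perm + 1)

# Toy corpus of 200 bigrams: "strong tea" occurs 20 times, and neither
# word occurs anywhere else.
firsts = ["strong"] * 20 + ["the"] * 180
seconds = ["tea"] * 20 + ["cat"] * 180

observed, p_value = randomization_p_value(firsts, seconds, "strong", "tea")
print(observed, p_value)
```

Because the reference distribution is built from the data itself, the same routine can be reused with other test statistics computed from the 2 × 2 contingency table in place of the raw joint count.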

Power analysis for 2 × 2 factorial in randomized complete block design (블럭이 존재하는 2 × 2 요인모형의 검정력 분석)

  • Choi, Young-Hun
    • Journal of the Korean Data and Information Science Society / v.22 no.2 / pp.245-253 / 2011
  • The power of the rank-transformed statistic for testing main effects and interaction effects in a 2 × 2 factorial design within a randomized complete block design (RCBD) is far superior to that of the parametric statistic, regardless of the block size, the composition of effects, and the type of population distribution (exponential, double exponential, normal, or uniform). A 2 × 2 factorial design in RCBD increases error effects and decreases the power of the parametric statistic, which results in conservativeness. However, the power of the rank-transformed statistic maintains its relative advantage. In general, the rank-transformed statistic shows a relative advantage in power over the parametric statistic with small block sizes and large effect sizes.

Power study for 2 × 2 factorial design in 4 × 4 latin square design (4 × 4 라틴방격모형 내 2 × 2 요인모형의 검정력 연구)

  • Choi, Young Hun
    • Journal of the Korean Data and Information Science Society / v.25 no.6 / pp.1195-1205 / 2014
  • Compared with a single design, the power of the rank-transformed statistic for testing main and interaction effects in a 2 × 2 factorial within a 4 × 4 Latin square design increases rapidly as the effect size and the number of replications increase. In general, the power of the rank-transformed statistic is superior regardless of the effect composition and the type of error distribution when the nontesting factors are few and the effect sizes are small. The rank-transformed statistic shows much higher power than the parametric statistic under exponential and double exponential distributions. Furthermore, its power is very similar to that of the parametric statistic under normal and uniform distributions.