• Title/Summary/Keyword: information entropy


Syntax Analysis of Enumeration Type and Parallel Type Using a Maximum Entropy Model (Maximum Entropy 모델을 이용한 나열 및 병렬형 인식)

  • Lim, Soo-Jong;Lee, Chang-Ki;Hur, Jeong;Jang, Myoung-Gil
    • Korean HCI Society: Conference Proceedings
    • /
    • 2006.02a
    • /
    • pp.1240-1245
    • /
    • 2006
  • Enumeration and parallel (coordinate) structures are one of the sources of ambiguity in the structural analysis of Korean sentences. Grouping these structures, which increase structural complexity, into a single unit before parsing is important for improving parsing accuracy. In this work, a sentence is first segmented into chunks by basic rules based on morpheme tags; enumeration structures among the chunks are then recognized, and the corresponding chunks are merged into a single enumeration chunk, reducing the number of chunks. For parallel structures, the scope of the repeated parallel chunks and the elided predicate are recovered. Recognition proceeds in two stages: first, simple rules built around symbols handle the straightforward cases; enumeration and parallel structures not covered by these rules are then recognized with a Maximum Entropy (ME) model. The ME model is built over lexical features, morpheme POS features, distance features, semantic features, phrase-tag features (NP: noun phrase, VP: verb phrase, AP: adjective phrase), and BIO (Begin, Inside, Outside) tag features.
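
The ME model described is a conditional log-linear classifier, p(y|x) ∝ exp(Σ_i w_i f_i(x, y)), over binary features. A minimal sketch follows; the feature names and weights are invented for illustration, since the paper's actual feature templates and trained weights are not given here:

```python
import math

def maxent_prob(features, weights, labels):
    """Conditional log-linear model: p(y|x) proportional to exp(sum of
    weights for active (feature, label) pairs)."""
    scores = {y: sum(w for (f, fy), w in weights.items()
                     if fy == y and f in features)
              for y in labels}
    z = sum(math.exp(s) for s in scores.values())
    return {y: math.exp(scores[y]) / z for y in scores}

# Hypothetical features for deciding whether two chunks form an enumeration:
# they share a phrase tag (NP) and are separated by a comma symbol.
active = {"same_phrase_tag=NP", "separator=comma", "distance=1"}
weights = {
    ("same_phrase_tag=NP", "ENUM"): 1.2,
    ("separator=comma",    "ENUM"): 0.8,
    ("distance=1",         "ENUM"): 0.5,
    ("same_phrase_tag=NP", "O"):   -0.3,
}
p = maxent_prob(active, weights, labels=["ENUM", "O"])
```

In practice the weights would be trained, e.g. by iterative scaling or gradient methods, rather than set by hand.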


Shannon's Information Theory and Document Indexing (Shannon의 정보이론과 문헌정보)

  • Chung Young Mee
    • Journal of the Korean Society for Library and Information Science
    • /
    • v.6
    • /
    • pp.87-103
    • /
    • 1979
  • Information storage and retrieval is part of the general communication process. In Shannon's information theory, the information contained in a message is a measure of uncertainty about the information source, and the amount of information is measured by entropy. Indexing is a process of reducing the entropy of the information source, since the document collection is divided into many smaller groups according to the subjects the documents deal with. Significant concepts contained in every document are mapped into the set of all sets of index terms; the index itself is thus formed by paired sets of index terms and documents. Without indexing, the entropy of a document collection consisting of N documents is $\log_2 N$, whereas the average entropy of the smaller groups $(W_1, W_2, \ldots, W_m)$ is as small as $(\sum_{i=1}^{m} H(W_i))/m$. Retrieval efficiency is a measure of an information system's performance, which is largely affected by the goodness of the index. If all and only the documents evaluated as relevant to a user's query can be retrieved, the information system is said to be 100% efficient. A document file W may be potentially classified into two sets: documents relevant and non-relevant to a specific query. After retrieval, the document file W' is reclassified into four sets: relevant-retrieved, relevant-not retrieved, non-relevant-retrieved, and non-relevant-not retrieved. The paper shows that the difference between the two entropies of document file W and document file W' is a proper measure of retrieval efficiency.
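
The entropy reduction the abstract describes can be checked numerically; the 64-document collection and its four equal subject groups below are hypothetical:

```python
import math

def collection_entropy(n_docs):
    # Each of the N documents equally likely: H = log2 N bits.
    return math.log2(n_docs)

def average_group_entropy(group_sizes):
    # Average of the per-group entropies H(W_i) = log2 |W_i|.
    return sum(math.log2(s) for s in group_sizes) / len(group_sizes)

# Hypothetical collection: 64 documents indexed into 4 subject groups of 16.
h_before = collection_entropy(64)                   # log2 64 = 6 bits
h_after = average_group_entropy([16, 16, 16, 16])   # log2 16 = 4 bits
```

Any partition into smaller groups lowers the average per-group entropy, which is the sense in which indexing reduces the entropy of the source.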


Uncertainty Improvement of Incomplete Decision System using Bayesian Conditional Information Entropy (베이지언 정보엔트로피에 의한 불완전 의사결정 시스템의 불확실성 향상)

  • Choi, Gyoo-Seok;Park, In-Kyu
    • The Journal of the Institute of Internet, Broadcasting and Communication
    • /
    • v.14 no.6
    • /
    • pp.47-54
    • /
    • 2014
  • Based on the indiscernibility relation of rough set theory, the inevitable superposition and inconsistency of data make attribute reduction very important in information systems. Rough set theory has difficulty handling the difference in attribute reduction between consistent and inconsistent information systems. In this paper, we propose a new uncertainty measure and an attribute-reduction algorithm that use Bayesian posterior probability for correlation analysis between condition and decision attributes. We compare the proposed method with conditional information entropy in addressing the uncertainty of inconsistent information systems. As a result, our method is more accurate than conditional information entropy in dealing with uncertainty via the mutual information of the condition and decision attributes of an information system.
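
Conditional information entropy, the baseline the paper compares against, can be sketched directly from a decision table; the toy (condition, decision) rows below are hypothetical:

```python
import math
from collections import Counter

def conditional_entropy(rows):
    """H(decision | condition) for a list of (condition, decision) pairs."""
    n = len(rows)
    cond = Counter(c for c, _ in rows)   # marginal counts of conditions
    joint = Counter(rows)                # joint counts of (condition, decision)
    h = 0.0
    for (c, d), n_cd in joint.items():
        p_cd = n_cd / n                  # joint probability
        p_d_given_c = n_cd / cond[c]     # conditional probability
        h -= p_cd * math.log2(p_d_given_c)
    return h

# A consistent table (each condition maps to one decision) has zero
# conditional entropy; the inconsistent condition 'b' adds uncertainty.
consistent = [("a", "yes"), ("a", "yes"), ("b", "no"), ("b", "no")]
inconsistent = [("a", "yes"), ("a", "yes"), ("b", "no"), ("b", "yes")]
```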

Estimation for the Variation of the Concentration of Greenhouse Gases with Modified Shannon Entropy (변형된 샤논 엔트로피식을 이용한 온실가스 농도변화량 예측)

  • Kim, Sang-Mok;Lee, Do-Haeng;Choi, Eol;Koh, Mi-Sol;Yang, Jae-Kyu
    • Journal of Environmental Science International
    • /
    • v.22 no.11
    • /
    • pp.1473-1479
    • /
    • 2013
  • Entropy is a measure of disorder or uncertainty. This term has been used qualitatively in environmental science to describe its correlation with pollution. In this research, three different entropies were defined and characterized in order to quantify the qualitative entropy previously used in environmental science. We deal with newly defined, distinct entropies $E_1$, $E_2$, and $E_3$, originating from Shannon entropy in information theory and reflecting the concentrations of three major greenhouse gases, $CO_2$, $N_2O$, and $CH_4$, represented as probability variables. First, $E_1$ evaluates the total entropy from the concentration difference of each greenhouse gas over three periods: the industrial revolution, the post-industrial revolution, and the information revolution. Next, $E_2$ evaluates the entropy with the logarithm base increasing along with the accumulated time unit. Lastly, $E_3$ evaluates the entropy with the logarithm base fixed at 2, depending on the time. The analytical results are as follows. $E_1$ indicates the degree of prediction reliability with respect to the variation of greenhouse gases; as $E_1$ increases, the concentration variation becomes stabilized, following a linear correlation. $E_2$ is a valid indicator for the mutual comparison of the greenhouse gases. Although $E_3$ varies locally within specific periods, it eventually follows a logarithmic curve, a pattern similar to that observed in thermodynamic entropy.
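
The exact definitions of $E_1$–$E_3$ are not reproduced in the abstract; as a minimal sketch, plain Shannon entropy over normalized concentrations, with a selectable logarithm base as $E_2$/$E_3$ suggest, looks like this (the ppm-scale figures are invented, not the paper's data):

```python
import math

def shannon_entropy(concentrations, base=2):
    """Shannon entropy of concentrations normalized to probabilities."""
    total = sum(concentrations)
    probs = [c / total for c in concentrations]
    return -sum(p * math.log(p, base) for p in probs if p > 0)

# Hypothetical concentrations for CO2, N2O, CH4 in ppm (illustrative only).
h2 = shannon_entropy([400.0, 0.33, 1.8], base=2)
```

Because the distribution is far from uniform (CO2 dominates), the entropy is well below the maximum of $\log_2 3$ for three gases.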

Cluster Feature Selection using Entropy Weighting and SVD (엔트로피 가중치 및 SVD를 이용한 군집 특징 선택)

  • Lee, Young-Seok;Lee, Soo-Won
    • Journal of KIISE: Software and Applications
    • /
    • v.29 no.4
    • /
    • pp.248-257
    • /
    • 2002
  • Clustering is a method for grouping objects with similar properties into the same cluster. SVD (Singular Value Decomposition) is known as an efficient preprocessing method for clustering because of its dimension reduction and noise elimination for high-dimensional, sparse data sets such as e-commerce data. However, it is hard to evaluate the worth of the original attributes because of the information loss in the data set converted by SVD. This research proposes a cluster feature selection method, called ENTROPY-SVD, to find important attributes for each cluster based on entropy weighting and SVD. Using SVD, one can exploit the latent structure in the association of attributes with similar objects; using entropy weighting, one can find highly dense attributes for each cluster. This paper also proposes a model-based collaborative filtering recommendation system with ENTROPY-SVD, called CFS-CF, and evaluates its efficiency and utility.
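
The entropy-weighting half of ENTROPY-SVD can be sketched as scoring each attribute by the histogram entropy of its values within a cluster, low entropy marking a dense, informative attribute. The attributes and binning below are invented for illustration; the SVD stage is omitted:

```python
import math

def attribute_entropy(values, bins=10, lo=0.0, hi=1.0):
    """Histogram entropy of one attribute's values over a fixed range;
    low entropy means the cluster's values concentrate in few bins."""
    counts = [0] * bins
    for v in values:
        i = min(int((v - lo) / (hi - lo) * bins), bins - 1)
        counts[i] += 1
    n = len(values)
    return -sum(c / n * math.log2(c / n) for c in counts if c)

# Hypothetical cluster: attribute a is tightly concentrated (informative
# for this cluster), attribute b is spread out; entropy weighting prefers a.
a = [0.50, 0.51, 0.49, 0.50, 0.52, 0.48]
b = [0.1, 0.9, 0.3, 0.7, 0.5, 0.2]
```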

An Approach to Constructing an Efficient Entropy Source on Multicore Processor (멀티코어 환경에서 효율적인 엔트로피 원의 설계 기법)

  • Kim, SeongGyeom;Lee, SeungJoon;Kang, HyungChul;Hong, Deukjo;Sung, Jaechul;Hong, Seokhie
    • Journal of the Korea Institute of Information Security & Cryptology
    • /
    • v.28 no.1
    • /
    • pp.61-71
    • /
    • 2018
  • In the Internet of Things, in which plenty of devices are connected to each other, cryptographically secure Random Number Generators (RNGs) are essential. In particular, the entropy source, the only non-deterministic part of random number generation, has to be equipped with one or more unpredictable noise sources for the required security strength. This may require additional hardware for extracting the noise source. Although additional hardware resources give better performance, it is desirable to make the best use of existing resources in order to avoid extra costs such as area and power consumption. In this paper, we suggest an entropy source that uses a multi-threaded program without any additional hardware. As a result, it reduces the difficulty of implementation on lightweight, low-power devices. Additionally, according to NIST's entropy estimation test suite, the suggested entropy source is tested to be secure enough as a source of entropy input.
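
The collection step of such a software-only entropy source can be sketched with racing threads whose interleaving order is the raw noise. This is only a sketch of the collection stage under assumed conditions (Python's thread scheduler, not the paper's implementation); raw output like this would still need conditioning and NIST SP 800-90B-style entropy estimation before use:

```python
import threading

def thread_interleaving_samples(n_samples=256, n_threads=4):
    """Collect raw noise from scheduler nondeterminism: several threads
    race to append their id; the interleaving order is the noise."""
    sequence = []
    lock = threading.Lock()

    def worker(tid):
        for _ in range(n_samples):
            with lock:
                sequence.append(tid)

    threads = [threading.Thread(target=worker, args=(t,))
               for t in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sequence

raw = thread_interleaving_samples()
```

How much min-entropy such interleavings actually carry depends heavily on the scheduler and platform, which is exactly what the estimation test suite is meant to measure.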

ESTIMATION OF SCALE PARAMETER FROM RAYLEIGH DISTRIBUTION UNDER ENTROPY LOSS

  • Chung, Youn-Shik
    • Journal of Applied Mathematics & Informatics
    • /
    • v.2 no.1
    • /
    • pp.33-40
    • /
    • 1995
  • An entropy loss is derived for the scale parameter of the Rayleigh distribution. Under this entropy loss we obtain the best invariant estimator and the Bayes estimators of the scale parameter. We also compare the MLE with the proposed estimators.
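
The abstract does not state the loss explicitly; a common form of entropy (Stein-type) loss for a scale parameter, assumed here for illustration, is L(θ, δ) = δ/θ − ln(δ/θ) − 1:

```python
import math

def entropy_loss(theta, delta):
    """Entropy (Stein-type) loss for estimating a scale parameter theta
    by delta: zero iff delta == theta, and it penalizes under- and
    over-estimation asymmetrically (unlike squared error)."""
    r = delta / theta
    return r - math.log(r) - 1.0
```

The asymmetry matters for scale parameters: halving and doubling the true scale give different losses, which is why the best invariant and Bayes estimators under this loss differ from those under squared error.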

Effect of Nonlinear Transformations on Entropy of Hidden Nodes

  • Oh, Sang-Hoon
    • International Journal of Contents
    • /
    • v.10 no.1
    • /
    • pp.18-22
    • /
    • 2014
  • Hidden nodes play a key role in the information processing of feed-forward neural networks, in which inputs are processed through a series of weighted sums and nonlinear activation functions. In order to understand the role of hidden nodes, we must analyze the effect of the nonlinear activation functions on the weighted sums to hidden nodes. In this paper, we focus on the effect of nonlinear functions from the viewpoint of information theory. Under the assumption that the nonlinear activation function can be approximated piecewise linearly, we prove that the entropy of the weighted sums to hidden nodes decreases after passing through the piecewise-linear functions. Therefore, we argue that the nonlinear activation function decreases the uncertainty among hidden nodes. Furthermore, the more the hidden nodes are saturated, the more the entropy of the hidden nodes decreases. Based on this result, we can say that, after successful training of feed-forward neural networks, hidden nodes tend not to lie in the linear regions but in the saturated regions of the activation function, with the effect of uncertainty reduction.
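
The claimed entropy decrease can be illustrated with a histogram-entropy estimate before and after a saturating tanh; the sample size, input range, and binning below are choices made here for illustration, not the paper's:

```python
import math
import random

def histogram_entropy(values, bins=20):
    """Entropy estimate from an equal-width histogram over the data range."""
    lo, hi = min(values), max(values)
    counts = [0] * bins
    for v in values:
        i = min(int((v - lo) / (hi - lo) * bins), bins - 1)
        counts[i] += 1
    n = len(values)
    return -sum(c / n * math.log2(c / n) for c in counts if c)

random.seed(0)
# Weighted sums to a hidden node, spread widely enough that tanh saturates.
sums = [random.uniform(-5.0, 5.0) for _ in range(5000)]
acts = [math.tanh(s) for s in sums]

h_before = histogram_entropy(sums)   # near-uniform: close to log2(bins)
h_after = histogram_entropy(acts)    # mass piles up near +/-1: lower entropy
```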

TOPOLOGICAL ENTROPY OF A SEQUENCE OF MONOTONE MAPS ON CIRCLES

  • Zhu Yuhun;Zhang Jinlian;He Lianfa
    • Journal of the Korean Mathematical Society
    • /
    • v.43 no.2
    • /
    • pp.373-382
    • /
    • 2006
  • In this paper, we prove that the topological entropy of a sequence of equi-continuous monotone maps $f_{1,\infty}=\{f_i\}_{i=1}^{\infty}$ on circles is $h(f_{1,\infty})=\limsup_{n\rightarrow\infty}\frac{1}{n}\log\prod_{i=1}^{n}|\deg f_i|$. As applications, we give estimates of the entropies for some skew products on annuli and tori. We also show that a diffeomorphism f on a smooth 2-dimensional closed manifold and its extension to the unit tangent bundle have the same entropy.
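
The entropy formula can be sanity-checked numerically; for a hypothetical sequence in which every map is a degree-2 covering of the circle, the product of degrees gives h = log 2:

```python
import math

def entropy_estimate(degrees, n):
    """(1/n) * log( prod_{i=1}^{n} |deg f_i| ) for the first n maps,
    computed as a sum of logs to avoid overflow of the product."""
    return sum(math.log(abs(d)) for d in degrees[:n]) / n

# Hypothetical sequence of degree-2 circle maps; the limsup is log 2.
degrees = [2] * 1000
h = entropy_estimate(degrees, 1000)
```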

Statistical Measurement of Monosyllable Entropy for the Korean Language (한국어 음절의 Entropy에 관한 연구)

  • 이주근;최흥문
    • Journal of the Korean Institute of Telematics and Electronics
    • /
    • v.11 no.3
    • /
    • pp.15-21
    • /
    • 1974
  • The amount of information of monosyllables (characters) in the Korean language is measured in the following three steps. 1) The basic consonants and vowels are partitioned into two sets. 2) These set symbols, C and V, are sequentially combined to obtain an equation that represents the flow state of monosyllables. 3) From the equation, state graphs are constructed to examine the properties of the stochastic process of monosyllables in Korean. Furthermore, the entropy of the Korean language is measured statistically and compared with that of Western languages. The proposed methods are more definite, systematic, and simpler than the usual methods for examining the nature of information sources.
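
First-order syllable entropy of the kind measured here can be computed from frequency counts; the toy syllable string below is invented for illustration, not the paper's 1974 corpus statistics:

```python
import math
from collections import Counter

def syllable_entropy(text):
    """First-order entropy in bits per symbol of a syllable stream."""
    counts = Counter(text)
    n = sum(counts.values())
    return -sum(c / n * math.log2(c / n) for c in counts.values())

# Hypothetical stream of 4 distinct syllables with skewed frequencies;
# entropy is positive but below the uniform maximum log2(4) = 2 bits.
h = syllable_entropy("가나다가나가가나다라")
```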
