• Title/Abstract/Keywords: Information Density

Search results: 4,168 items (processing time: 0.03 s)

Jackknife Kernel Density Estimation Using Uniform Kernel Function in the Presence of k's Unidentified Outliers

  • Woo, Jung-Soo;Lee, Jang-Choon
    • Journal of the Korean Data and Information Science Society
    • /
    • Vol. 6, No. 1
    • /
    • pp.85-96
    • /
    • 1995
  • The purpose of this paper is to propose a kernel density estimator and a jackknife kernel density estimator in the presence of k unidentified outliers, and to compare the small-sample performance of the proposed estimators in terms of mean integrated squared error (MISE).
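
As a rough illustration of the ingredients named in this abstract, the sketch below evaluates a kernel density estimate with the uniform kernel K(u) = 1/2 on |u| <= 1, plus a leave-one-out variant, which is the resampling idea that jackknife methods build on. It is not the estimator proposed in the paper; the function names, the sample, and the bandwidth `h` are assumptions chosen for illustration.

```python
from typing import Sequence


def uniform_kde(x: float, sample: Sequence[float], h: float) -> float:
    """Density estimate at x using the uniform kernel K(u) = 1/2 for |u| <= 1."""
    if h <= 0:
        raise ValueError("bandwidth h must be positive")
    # Count the sample points that fall inside the window [x - h, x + h].
    inside = sum(1 for xi in sample if abs(x - xi) <= h)
    return inside / (2.0 * len(sample) * h)


def leave_one_out_kde(i: int, sample: Sequence[float], h: float) -> float:
    """Density estimate at sample[i] computed without sample[i] itself."""
    rest = [xj for j, xj in enumerate(sample) if j != i]
    return uniform_kde(sample[i], rest, h)


# Hypothetical data: 10.0 plays the role of an outlier.
sample = [0.0, 0.5, 1.0, 1.5, 10.0]
density_at_one = uniform_kde(1.0, sample, h=1.0)
density_at_outlier = leave_one_out_kde(4, sample, h=1.0)
```

With this sample, four of the five points lie within one bandwidth of 1.0, while the leave-one-out estimate at the isolated point is zero because no other observation falls in its window.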


Main Content Extraction from Web Pages Based on Node Characteristics

  • Liu, Qingtang;Shao, Mingbo;Wu, Linjing;Zhao, Gang;Fan, Guilin;Li, Jun
    • Journal of Computing Science and Engineering
    • /
    • Vol. 11, No. 2
    • /
    • pp.39-48
    • /
    • 2017
  • Main content extraction from web pages is widely used in search engines, web content aggregation, and mobile Internet browsing. However, web pages include a large amount of irrelevant information, such as advertisements, irrelevant navigation, and junk information. Such irrelevant information reduces the efficiency of web content processing in content-based applications. The purpose of this paper is to propose an automatic main content extraction method for web pages. In this method, we use two indicators to describe the characteristics of web pages: text density and hyperlink density. Because similar content tends to be distributed contiguously on a page, we use an estimation algorithm to judge whether a node is a content node or a noisy node based on the characteristics of the node and its neighboring nodes. This algorithm enables us to filter out advertisement nodes and irrelevant navigation. Experimental results on 10 news websites showed that our algorithm achieved a 96.34% average acceptable rate.
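
The two indicators mentioned in this abstract can be sketched as follows. The abstract does not give the formulas, so the definitions here (text characters per tag, and linked characters as a fraction of all text characters), the `Node` fields, and both thresholds are assumptions. The sketch also ignores the neighboring-node information that the paper's estimation algorithm uses.

```python
from dataclasses import dataclass


@dataclass
class Node:
    """Hypothetical summary of a DOM subtree."""
    text_chars: int  # characters of visible text under the node
    link_chars: int  # characters of that text inside <a> elements
    tag_count: int   # number of tags under the node


def text_density(node: Node) -> float:
    """Assumed definition: text characters per tag."""
    return node.text_chars / max(node.tag_count, 1)


def hyperlink_density(node: Node) -> float:
    """Assumed definition: share of the text that sits inside links."""
    return node.link_chars / node.text_chars if node.text_chars else 0.0


def looks_like_content(node: Node,
                       min_text_density: float = 10.0,
                       max_link_density: float = 0.3) -> bool:
    """Illustrative rule with made-up thresholds: dense text, few links."""
    return (text_density(node) >= min_text_density
            and hyperlink_density(node) <= max_link_density)


article = Node(text_chars=1200, link_chars=60, tag_count=12)
navigation = Node(text_chars=150, link_chars=140, tag_count=30)
```

Under these assumed definitions, the article-like node is kept and the navigation-like node is rejected, because almost all of the latter's text is link text spread over many tags.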

Minimum Hellinger Distance Estimation and Minimum Density Power Divergence Estimation in Estimating Mixture Proportions

  • Pak, Ro-Jin
    • Journal of the Korean Data and Information Science Society
    • /
    • Vol. 16, No. 4
    • /
    • pp.1159-1165
    • /
    • 2005
  • Basu et al. (1998) proposed a new density-based estimator, called the minimum density power divergence estimator (MDPDE), which avoids the use of nonparametric density estimation and associated complications such as bandwidth selection. Woodward et al. (1995) examined the minimum Hellinger distance estimator (MHDE), proposed by Beran (1977), for estimating the mixture proportion in a mixture of two normals. In this article, we introduce the MDPDE for a mixture proportion and show that the MDPDE and the MHDE have the same asymptotic distribution at a model. A simulation study identifies some cases where the MHDE is consistently better than the MDPDE in terms of bias.
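
For readers unfamiliar with the distance that the MHDE minimizes, here is a minimal sketch of the Hellinger distance between two discrete probability vectors. The paper works with continuous densities (a mixture of two normals), so this discrete version and the function name `hellinger` are simplifying assumptions for illustration only.

```python
import math
from typing import Sequence


def hellinger(p: Sequence[float], q: Sequence[float]) -> float:
    """Hellinger distance: sqrt(0.5 * sum((sqrt(p_i) - sqrt(q_i))^2))."""
    if len(p) != len(q):
        raise ValueError("p and q must have the same length")
    squared = sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(p, q))
    return math.sqrt(0.5 * squared)


same = hellinger([0.5, 0.5], [0.5, 0.5])      # identical distributions
disjoint = hellinger([1.0, 0.0], [0.0, 1.0])  # no shared support
```

The distance is 0 for identical distributions and reaches its maximum of 1 when the two distributions share no support.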


Identification of the associations between genes and quantitative traits using entropy-based kernel density estimation

  • Yee, Jaeyong;Park, Taesung;Park, Mira
    • Genomics & Informatics
    • /
    • Vol. 20, No. 2
    • /
    • pp.17.1-17.11
    • /
    • 2022
  • Genetic associations have been quantified using a number of statistical measures. Entropy-based mutual information may be one of the more direct ways of estimating the association, in the sense that it does not depend on the parametrization. For this purpose, both the entropy and conditional entropy of the phenotype distribution should be obtained. Quantitative traits, however, do not usually allow an exact evaluation of entropy. The estimation of entropy needs a probability density function, which can be approximated by kernel density estimation. We have investigated the proper sequence of procedures for combining the kernel density estimation and entropy estimation with a probability density function in order to calculate mutual information. Genotypes and their interactions were constructed to set the conditions for conditional entropy. Extensive simulation data created using three types of generating functions were analyzed using two different kernels as well as two types of multifactor dimensionality reduction and another probability density approximation method called m-spacing. The statistical power in terms of correct detection rates was compared. Using kernels was found to be most useful when the trait distributions were more complex than simple normal or gamma distributions. A full-scale genomic dataset was explored to identify associations using the 2-h oral glucose tolerance test results and γ-glutamyl transpeptidase levels as phenotypes. Clearly distinguishable single-nucleotide polymorphisms (SNPs) and interacting SNP pairs associated with these phenotypes were found and listed with empirical p-values.
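
The chain described in this abstract (kernel density estimate, then entropy, then mutual information between a genotype and a quantitative trait) might be sketched as below. The Gaussian kernel, the fixed bandwidth `h`, the resubstitution entropy estimate, and all function names are assumptions. The abstract does not say which kernels or which order of operations the authors settled on.

```python
import math
from typing import Dict, Hashable, List, Sequence


def gaussian_kde(x: float, sample: Sequence[float], h: float) -> float:
    """Gaussian kernel density estimate at x with bandwidth h."""
    norm = len(sample) * h * math.sqrt(2.0 * math.pi)
    return sum(math.exp(-0.5 * ((x - xi) / h) ** 2) for xi in sample) / norm


def entropy_kde(sample: Sequence[float], h: float) -> float:
    """Resubstitution entropy estimate: -(1/n) * sum(log f_hat(x_i))."""
    return -sum(math.log(gaussian_kde(x, sample, h)) for x in sample) / len(sample)


def mutual_information(traits: Sequence[float],
                       genotypes: Sequence[Hashable],
                       h: float) -> float:
    """I(Y; G) = H(Y) - sum_g p(g) * H(Y | G = g)."""
    groups: Dict[Hashable, List[float]] = {}
    for y, g in zip(traits, genotypes):
        groups.setdefault(g, []).append(y)
    n = len(traits)
    conditional = sum(len(ys) / n * entropy_kde(ys, h) for ys in groups.values())
    return entropy_kde(traits, h) - conditional


# Hypothetical data: the trait separates cleanly by genotype.
traits = [0.0, 0.1, 0.2, 5.0, 5.1, 5.2]
genotypes = [0, 0, 0, 1, 1, 1]
mi = mutual_information(traits, genotypes, h=0.5)
```

In this toy data the two genotype groups are far apart relative to the bandwidth, so the estimate is close to log 2, the entropy of a balanced binary genotype. When every sample shares one genotype, the estimate is zero.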

A study on the Information interchange degree, Network density, Information reliability, Network sense of solidarity of According to the motive difference on Using social networks

  • 박원준
    • 한국전자통신학회논문지 (The Journal of the Korea Institute of Electronic Communication Sciences)
    • /
    • Vol. 9, No. 6
    • /
    • pp.657-664
    • /
    • 2014
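  • This study analyzed the usage motives of SNS users, divided each motive into high, medium, and low levels based on its median, and examined the resulting differences in information exchange, network density, information reliability, and sense of solidarity. The motives for using social networks were found to be information seeking, social influence, entertainment, and network formation. The dependent variables (degree of information exchange, network density, information reliability, and sense of solidarity) differed according to the level of these motives. In particular, the degree of information exchange and information reliability differed according to the level of all four motives, whereas network density and sense of solidarity differed according to the level of the social influence motive.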

How to Measure Nonlinear Dependence in Hydrologic Time Series

  • 문영일
    • 한국수자원학회논문집 (Journal of Korea Water Resources Association)
    • /
    • Vol. 30, No. 6
    • /
    • pp.641-648
    • /
    • 1997
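  • Just as the correlation coefficient expresses linear dependence between variables, mutual information expresses nonlinear dependence between variables. In this paper, mutual information is estimated at several time lags using a multivariate kernel density estimator. The nonlinear relationships seen in many hydrologic data sets are confirmed with mutual information. In addition, the optimal delay time, found at the point where the mutual information is nearly zero, can be used when building a multivariate regression model from a single data series.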
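
A minimal sketch of mutual information between a series and its lagged copy, estimated with a bivariate product Gaussian kernel, is given below. The kernel choice, the single shared bandwidth `h`, the resubstitution average, and the function names are all assumptions. The paper's multivariate estimator and bandwidth selection are not described in the abstract.

```python
import math
from typing import Sequence


def _gauss(u: float, h: float) -> float:
    """One-dimensional Gaussian kernel with bandwidth h."""
    return math.exp(-0.5 * (u / h) ** 2) / (h * math.sqrt(2.0 * math.pi))


def kde_mutual_information(xs: Sequence[float], ys: Sequence[float], h: float) -> float:
    """Average of log(f_xy / (f_x * f_y)) over the observed pairs."""
    n = len(xs)
    total = 0.0
    for x, y in zip(xs, ys):
        kx = [_gauss(x - xi, h) for xi in xs]
        ky = [_gauss(y - yi, h) for yi in ys]
        joint = sum(a * b for a, b in zip(kx, ky)) / n  # product-kernel joint density
        total += math.log(joint / ((sum(kx) / n) * (sum(ky) / n)))
    return total / n


def lagged_mi(series: Sequence[float], lag: int, h: float) -> float:
    """Mutual information between series[t] and series[t + lag], for lag >= 1."""
    if lag < 1 or lag >= len(series):
        raise ValueError("lag must satisfy 1 <= lag < len(series)")
    return kde_mutual_information(series[:-lag], series[lag:], h)


# Hypothetical, widely spaced series: each value determines the next one.
series = [0.0, 10.0, 20.0, 30.0, 40.0]
mi_lag1 = lagged_mi(series, lag=1, h=1.0)
```

Scanning `lagged_mi` over a range of lags and looking for where the value approaches zero corresponds to the delay-time idea described in the abstract. The toy series above is fully dependent at lag 1, so its estimate is close to log 4.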


The Digital Image Processing Method Using Triple-Density Discrete Wavelet Transformation

  • 신종홍
    • 디지털산업정보학회논문지 (Journal of Korea Society of Digital Industry and Information Management)
    • /
    • Vol. 8, No. 3
    • /
    • pp.133-145
    • /
    • 2012
  • This paper describes the high-density discrete wavelet transformation, which expands an N-point signal to M transform coefficients with M > N. The double-density discrete wavelet transform is one such high-density transformation. It employs one scaling function and two distinct wavelets, which are designed to be offset from one another by one half, and it is nearly shift-invariant. Similarly, the triple-density discrete wavelet transformation is a new dyadic wavelet transformation with two generators. The construction provides higher sampling in both time and frequency. Specifically, the spectrum of the first wavelet is concentrated halfway between the spectrum of the second wavelet and the spectrum of its dilated version. In addition, the second wavelet is translated by half-integers rather than whole integers in the frame construction. This arrangement leads to a high-density wavelet transformation that is approximately shift-invariant and has intermediate scales. In two dimensions, this transform outperforms the standard and double-density discrete wavelet transformations in terms of multiple directions. As a result, the proposed wavelet transformation delivers good performance in image and video processing.

An Improved Clustering Method with Cluster Density Independence

  • Yoo, Byeong-Hyeon;Kim, Wan-Woo;Heo, Gyeongyong
    • 한국컴퓨터정보학회논문지 (Journal of the Korea Society of Computer and Information)
    • /
    • Vol. 20, No. 12
    • /
    • pp.15-20
    • /
    • 2015
  • In this paper, we propose a modified fuzzy clustering algorithm that can overcome the center deviation caused by the Euclidean distance commonly used in fuzzy clustering. Among fuzzy clustering methods, Fuzzy C-Means (FCM) is the best-known algorithm and has been applied successfully to a wide variety of problems. In FCM, however, cluster centers tend to lean toward high-density clusters because the Euclidean distance measure makes high-density clusters contribute more to the clustering result. We propose an enhanced algorithm that modifies the objective function of FCM by adding a center-scattering term, which keeps centers from being drawn together by cluster density. Compared with FCM, the proposed method converges closer to the real centers in a small number of iterations. These strengths are verified by experimental results.
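
For reference, the baseline that this abstract modifies can be sketched as the standard Fuzzy C-Means alternating update on one-dimensional data. This is plain FCM only: the center-scattering term proposed in the paper is not given in the abstract and is not included. The function name, the fuzzifier `m = 2`, the iteration count, and the data are assumptions.

```python
from typing import List, Sequence, Tuple


def fcm_1d(points: Sequence[float],
           centers: Sequence[float],
           m: float = 2.0,
           iterations: int = 20) -> Tuple[List[float], List[List[float]]]:
    """Standard FCM on 1-D data; returns (centers, memberships[point][cluster])."""
    centers = list(centers)
    memberships: List[List[float]] = []
    power = 2.0 / (m - 1.0)
    for _ in range(iterations):
        # Membership update: u_i = 1 / sum_k (d_i / d_k)^(2 / (m - 1)).
        memberships = []
        for x in points:
            dists = [max(abs(x - c), 1e-12) for c in centers]  # guard against d = 0
            memberships.append(
                [1.0 / sum((d_i / d_k) ** power for d_k in dists) for d_i in dists]
            )
        # Center update: mean of the points weighted by u^m.
        centers = [
            sum(u[i] ** m * x for u, x in zip(memberships, points))
            / sum(u[i] ** m for u in memberships)
            for i in range(len(centers))
        ]
    return centers, memberships


# Hypothetical data: two tight groups around 0.1 and 10.1.
points = [0.0, 0.1, 0.2, 10.0, 10.1, 10.2]
final_centers, final_memberships = fcm_1d(points, centers=[1.0, 9.0])
```

Each row of the membership matrix sums to 1. In this balanced example the centers settle near the two group means, whereas the abstract's concern is the unbalanced case in which a denser cluster pulls the centers toward itself.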

Sensor Density for Full-View Problem in Heterogeneous Deployed Camera Sensor Networks

  • Liu, Zhimin;Jiang, Guiyan
    • KSII Transactions on Internet and Information Systems (TIIS)
    • /
    • Vol. 15, No. 12
    • /
    • pp.4492-4507
    • /
    • 2021
  • In camera sensor networks (CSNs), the full-view problem requires capturing any facing direction of a target (point or intruder) in order to identify it better, which makes its coverage prediction and sensor density issues more complicated. At present, much research assumes that a large number of homogeneous camera sensors are randomly distributed in a bounded square monitoring region to obtain a full-view rate close to 1. In this paper, we derive a sensor density prediction model for heterogeneously deployed CSNs with an arbitrary full-view rate. To reduce the influence of the boundary effect, we introduce the concepts of expanded monitoring region and maximum detection area. In addition, we ran simulation experiments under different scenarios to verify the theoretical results and the performance of the proposed sensor density model. The simulation results indicate that the proposed model can effectively predict the sensor density for an arbitrary full-view rate.

Density-based Outlier Detection in Multi-dimensional Datasets

  • Wang, Xite;Cao, Zhixin;Zhan, Rongjuan;Bai, Mei;Ma, Qian;Li, Guanyu
    • KSII Transactions on Internet and Information Systems (TIIS)
    • /
    • Vol. 16, No. 12
    • /
    • pp.3815-3835
    • /
    • 2022
  • Density-based outlier detection is one of the hot issues in data mining. A point is judged to be an outlier on the basis of the density of the points near it. Existing density-based detection algorithms have high time complexity. To reduce it, a new outlier detection algorithm, DODMD (Density-based Outlier Detection in Multidimensional Datasets), is proposed. First, on the basis of the ZH-tree, the concept of a micro-cluster is introduced. Each leaf node is regarded as a micro-cluster, and calculations are carried out per micro-cluster to achieve batch filtering. To obtain n sets of approximate outliers quickly, a greedy method is used to calculate the boundary of LOF, and the minimum value is marked as LOFmin. Second, the outliers can be filtered by LOFmin, the real outliers are calculated, and the result set is then updated to tighten the boundary. Finally, the accuracy and efficiency of the DODMD algorithm are verified on real and synthetic datasets.
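
The LOF score that this abstract bounds can be illustrated with a brute-force sketch of the standard local outlier factor on one-dimensional data. This is not DODMD: it has no ZH-tree, micro-clusters, or LOFmin pruning. It takes exactly k neighbors with ties broken by index and assumes there are no duplicate points. The function name and data are assumptions.

```python
from typing import List, Sequence


def lof_scores(points: Sequence[float], k: int) -> List[float]:
    """Brute-force local outlier factor for 1-D points (assumes distinct values)."""
    n = len(points)

    def dist(a: int, b: int) -> float:
        return abs(points[a] - points[b])

    neighbors: List[List[int]] = []
    k_distance: List[float] = []
    for i in range(n):
        # Rank all other points by distance, breaking ties by index.
        ranked = sorted((j for j in range(n) if j != i), key=lambda j: (dist(i, j), j))
        neighbors.append(ranked[:k])
        k_distance.append(dist(i, ranked[k - 1]))

    def local_reachability_density(i: int) -> float:
        # reach-dist(i, j) = max(k-distance(j), d(i, j))
        reach = [max(k_distance[j], dist(i, j)) for j in neighbors[i]]
        return len(reach) / sum(reach)

    lrd = [local_reachability_density(i) for i in range(n)]
    # LOF(i) = mean lrd of the neighbors divided by lrd of i.
    return [sum(lrd[j] for j in neighbors[i]) / (k * lrd[i]) for i in range(n)]


# Hypothetical data: 20.0 sits far from the evenly spaced group.
points = [0.0, 1.0, 2.0, 3.0, 4.0, 20.0]
scores = lof_scores(points, k=2)
```

Scores well above 1 indicate that a point is much sparser than its neighbors. Here the isolated point receives by far the largest score, while points inside the group score around 1 or below.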