• Title/Summary/Keyword: Partitioned Data Sets


Temporal Association Rules with Exponential Smoothing Method (지수 평활법을 적용한 시간 연관 규칙)

  • Byon, Lu-Na;Park, Byoung-Sun;Han, Jeong-Hye;Jeong, Han-Il;Leem, Choon-Seong
    • The KIPS Transactions:PartD
    • /
    • v.11D no.3
    • /
    • pp.741-746
    • /
    • 2004
  • As electronic commerce progresses, temporal association rules are derived from data sets partitioned by time in order to offer personalized services that match customers' interests. In this paper, we propose a temporal association rule with an exponential smoothing method that gives higher weights to recent data than to past data. Through simulation and a case study, we confirmed that it is more precise than existing temporal association rules, at the cost of additional running time.
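A minimal sketch of the idea in this abstract, assuming the support of an itemset is computed separately in each time partition and then combined by simple exponential smoothing. The function names and the smoothing factor `alpha` are illustrative assumptions, not the authors' implementation.

```python
from typing import List, Set


def partition_support(transactions: List[Set[str]], itemset: Set[str]) -> float:
    """Fraction of transactions in one time partition that contain the itemset."""
    if not transactions:
        return 0.0
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


def smoothed_support(partitions: List[List[Set[str]]],
                     itemset: Set[str],
                     alpha: float = 0.5) -> float:
    """Exponentially smoothed support over partitions ordered oldest to newest.

    alpha is a hypothetical smoothing factor in (0, 1]; a larger alpha puts
    more weight on the most recent partitions.
    """
    if not partitions:
        raise ValueError("at least one time partition is required")
    supports = [partition_support(p, itemset) for p in partitions]
    smoothed = supports[0]
    for s in supports[1:]:
        # Standard simple exponential smoothing update.
        smoothed = alpha * s + (1 - alpha) * smoothed
    return smoothed
```

A rule would then be kept when the smoothed support (and a similarly smoothed confidence) exceeds a chosen threshold, instead of using the plain average over all partitions.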

High-Dimensional Image Indexing based on Adaptive Partitioning and Vector Approximation (적응 분할과 벡터 근사에 기반한 고차원 이미지 색인 기법)

  • Cha, Gwang-Ho;Jeong, Jin-Wan
    • Journal of KIISE:Databases
    • /
    • v.29 no.2
    • /
    • pp.128-137
    • /
    • 2002
  • In this paper, we propose the LPC+-file for efficient indexing of high-dimensional image data. With the proliferation of multimedia data, there is an increasing need to support the indexing and retrieval of high-dimensional image data. Recently, the LPC-file (5), which is based on vector approximation, has been developed for indexing high-dimensional data. The LPC-file performs well, especially when the dataset is uniformly distributed. However, its performance degrades when the dataset is clustered. We improve the performance of the LPC-file for strongly clustered image datasets. The basic idea is to adaptively partition the data space to find subspaces with high-density clusters and to assign more bits to them than to others, increasing the discriminatory power of the vector approximations. The total number of bits used to represent vector approximations is rather less than that of the LPC-file, since the partitioned cells in the LPC+-file share the bits. An empirical evaluation shows that the LPC+-file yields significant performance improvements for real image data sets that are strongly clustered.

VDCluster : A Video Segmentation and Clustering Algorithm for Large Video Sequences (VDCluster : 대용량 비디오 시퀀스를 위한 비디오 세그멘테이션 및 클러스터링 알고리즘)

  • Lee, Seok-Ryong;Lee, Ju-Hong;Kim, Deok-Hwan;Jeong, Jin-Wan
    • Journal of KIISE:Databases
    • /
    • v.29 no.3
    • /
    • pp.168-179
    • /
    • 2002
  • In this paper, we investigate video representation techniques that form the foundation for subsequent video processing such as video storage and retrieval. A video data set is a collection of video clips, each of which is a sequence of video frames and is represented by a multidimensional data sequence (MDS). An MDS is partitioned into video segments by considering the temporal relationship among frames, and then similar segments of the clip are grouped into video clusters. Thus, the video clip is represented by a small number of video clusters. The video segmentation and clustering algorithm proposed in this paper, VDCluster, guarantees clustering quality to such an extent that it satisfies predefined conditions. The experiments show that our algorithm performs very effectively on various video data sets.

NIS quality analysis of pre- and post-harvest sugarcane.

  • Johnson, Sarah E.;Berding, Nils
    • Proceedings of the Korean Society of Near Infrared Spectroscopy Conference
    • /
    • 2001.06a
    • /
    • pp.1621-1621
    • /
    • 2001
  • The quality of sugarcane grown on the NE Australian tropical coast ($16^{\circ}15'$ to $18^{\circ}15'$ S Lat.) has declined markedly in the past seven years. This has been linked to dilution of mill-supply cane with increasing levels of non mature-stalk material consisting of leaves and sucker culms. The prime research objective was to examine the transition from the pre-harvest, in-field crop to harvested material sent for processing, in terms of quality and crop fraction proportions. A secondary objective was to quantify the effects of preharvest-season crop habit and culm condition on crop quality. Ten quadrat samples from each of 54 random crop sites (17 in 1999 and 37 in 2000), covering a wide range of variables (cultivar, crop class, and edaphic, topographic, climatic, and temporal factors) were collected immediately before harvest. Samples were partitioned into four fractions: sound and unsound mature stalks (culms), sucker culms, and extraneous matter (leaves). Material harvested from each site was sampled and partitioned into four fractions: sound and unsound billets (culm pieces), culm-spindle pieces, and leaf. In 2000, before harvest, 14 additional sites were sampled monthly, on three occasions, from March to June. Erect and non-erect culms were divided into sound and unsound classes. All samples were disintegrated and presented to a remote reflectance module of a scanning spectrophotometer using the BSES large cassette module. Near infra-red spectroscopic (NIS) analyses were developed for the rapid determination of quality components (Brix, commercial cane sugar (CCS), fibre, moisture, and polariscope reading). Calibrations for three material groups (culm (n = 639), non-culm (n = 496), and combined) were developed for all components using the 1999 data set. Two sub-sets (n = 178 and 190) of about 10% of the preharvest-season and harvest populations scanned in 2000 also were subjected to full routine laboratory analyses.
The 1999 combined calibrations were excellent, but the culm calibrations produced consistently lower standard errors. Non-culm calibrations were marginally better than the combined for only CCS and pol. reading. Analysis of the 2000 culm data with calibrations using all 1999 and 2000 culm data resulted in better predictions relative to the 1999 culm calibrations. This also was true for the combined calibrations. Assessment of quality components in pre- and post-harvest sugarcane using NIS calibrations was more cost effective than using routine laboratory techniques. Outcomes from this NIS-facilitated research will have important economic consequences for the Australian sugarcane industry. Potential CCS present in mature culms is being discounted by dilution with leaves and sucker culms, threatening farm viability. The results question the efficacy of current harvesting technology. The CCS of harvested cane is improved only marginally over that of the in-field crop. Current harvesting technology requires either supplementary, innovative pre-mill processing or a design revolution to improve mill-supply cane quality, and therefore whole-of-industry economics. NIS-facilitated analyses, before the harvest season, highlighted the benefits of growing erect, sound crops. Loss of CCS, then, can be minimized only by a combination of crop improvement and agronomic solutions, applied as part of sound on-farm management.


Similarity Search Algorithm Based on Hyper-Rectangular Representation of Video Data Sets (비디오 데이터 세트의 하이퍼 사각형 표현에 기초한 비디오 유사성 검색 알고리즘)

  • Lee, Seok-Lyong
    • The KIPS Transactions:PartD
    • /
    • v.11D no.4
    • /
    • pp.823-834
    • /
    • 2004
  • In this research, similarity search algorithms are provided for large video data streams. A video stream that consists of a number of frames can be expressed as a sequence in the multidimensional data space by representing each frame with a multidimensional vector. By analyzing various characteristics of the sequence, it is partitioned into multiple video segments and clusters, which are represented by hyper-rectangles. Using the hyper-rectangles of video segments and clusters, similarity functions between two video streams are defined, and two similarity search algorithms are proposed based on these similarity functions: one using hyper-rectangles and one using representative frames. The former guarantees correctness, while the latter focuses on efficiency with a slight sacrifice of correctness. Experiments on different types of video streams and synthetically generated stream data show the strength of our proposed algorithms.
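A minimal sketch of the hyper-rectangle representation described above. It assumes each segment is summarized by the axis-aligned bounding box of its frame vectors and that two segments are compared by the minimum Euclidean distance between their boxes; the abstract does not give the paper's actual similarity functions, so `min_distance` is an illustrative stand-in.

```python
from math import sqrt
from typing import List, Sequence, Tuple

# A hyper-rectangle as (low corner, high corner); the type name is hypothetical.
Rect = Tuple[List[float], List[float]]


def bounding_rect(frames: Sequence[Sequence[float]]) -> Rect:
    """Smallest axis-aligned hyper-rectangle enclosing all frame vectors."""
    if not frames:
        raise ValueError("a segment needs at least one frame")
    dims = range(len(frames[0]))
    low = [min(f[d] for f in frames) for d in dims]
    high = [max(f[d] for f in frames) for d in dims]
    return low, high


def min_distance(a: Rect, b: Rect) -> float:
    """Minimum Euclidean distance between two hyper-rectangles (0 if they overlap)."""
    total = 0.0
    for lo_a, hi_a, lo_b, hi_b in zip(a[0], a[1], b[0], b[1]):
        # Gap along this dimension; zero when the intervals intersect.
        gap = max(lo_b - hi_a, lo_a - hi_b, 0.0)
        total += gap * gap
    return sqrt(total)
```

Because the box distance never exceeds the distance between any two frames inside the boxes, a search can discard a candidate segment as soon as `min_distance` exceeds the query threshold without missing a true match.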

Hierarchical Clustering Approach of Multisensor Data Fusion: Application of SAR and SPOT-7 Data on Korean Peninsula

  • Lee, Sang-Hoon;Hong, Hyun-Gi
    • Proceedings of the KSRS Conference
    • /
    • 2002.10a
    • /
    • pp.65-65
    • /
    • 2002
  • In remote sensing, images are acquired over the same area by sensors of different spectral ranges (from the visible to the microwave) and/or with different numbers, positions, and widths of spectral bands. These images are generally partially redundant, as they represent the same scene, and partially complementary. For many applications of image classification, the information provided by a single sensor is often incomplete or imprecise, resulting in misclassification. Fusion with redundant data can draw more consistent inferences for the interpretation of the scene, and can then improve classification accuracy. The common approach to the classification of multisensor data as a data fusion scheme at pixel level is to concatenate the data into one vector as if they were measurements from a single sensor. The multiband data acquired by a single multispectral sensor or by two or more different sensors are not completely independent, and a certain degree of informative overlap may exist between the observation spaces of the different bands. This dependence may make the data less informative and should be properly modeled in the analysis so that its effect can be eliminated. For modeling and eliminating the effect of such dependence, this study employs a strategy using self and conditional information variation measures. The self information variation reflects the self certainty of the individual bands, while the conditional information variation reflects the degree of dependence of the different bands. One data set might be much less reliable than others in the analysis and might even worsen the classification results. The unreliable data set should be excluded from the analysis. To account for this, the self information variation is utilized to measure the degrees of reliability. The team of positively dependent bands can gather more information jointly than the team of independent ones.
But when bands are negatively dependent, the combined analysis of these bands may give worse information. Using the conditional information variation measure, the multiband data are split into two or more subsets according to the dependence between the bands. Each subset is classified separately, and a data fusion scheme at decision level is applied to integrate the individual classification results. In this study, a two-level algorithm using a hierarchical clustering procedure is used for unsupervised image classification. The hierarchical clustering algorithm is based on similarity measures between all pairs of candidates being considered for merging. In the first level, the image is partitioned into any number of regions, which are sets of spatially contiguous pixels, so that no union of adjacent regions is statistically uniform. The regions resulting from the first level are clustered into a parsimonious number of groups according to their statistical characteristics. The algorithm has been applied to satellite multispectral data and airborne SAR data.


Generation of Efficient Fuzzy Classification Rules Using Evolutionary Algorithm with Data Partition Evaluation (데이터 분할 평가 진화알고리즘을 이용한 효율적인 퍼지 분류규칙의 생성)

  • Ryu, Joung-Woo;Kim, Sung-Eun;Kim, Myung-Won
    • Journal of the Korean Institute of Intelligent Systems
    • /
    • v.18 no.1
    • /
    • pp.32-40
    • /
    • 2008
  • Fuzzy rules are very useful and efficient for describing classification rules, especially when the attribute values are continuous and fuzzy in nature. However, it is generally difficult to determine membership functions for generating efficient fuzzy classification rules. In this paper, we propose a method for automatically generating efficient fuzzy classification rules using an evolutionary algorithm. In our method, we generate a set of initial membership functions for the evolutionary algorithm by supervised clustering of the training data set, and we evolve the set of initial membership functions to generate fuzzy classification rules, taking into consideration both classification accuracy and rule comprehensibility. To reduce the time needed to evaluate an individual, we also propose an evolutionary algorithm with data partition evaluation, in which the training data set is partitioned into a number of subsets and individuals are evaluated using one randomly selected subset at a time instead of the whole training data set. We evaluated our algorithm on the UCI learning data sets, and the experimental results showed that our method was more efficient on average than the existing algorithms. For the evolutionary algorithm with data partition evaluation, we experimented with the intrusion detection data of KDD'99 Cup and confirmed that evaluation time was reduced by about 70%. Compared with the KDD'99 Cup winner, the accuracy was increased by 1.54% while the cost was reduced by 20.8%.
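The data partition evaluation step can be sketched as follows, assuming an individual is any callable classifier and fitness is plain accuracy. The helper names and the use of accuracy alone are assumptions for illustration; the paper's fitness also accounts for rule comprehensibility.

```python
import random
from typing import Callable, List, Sequence, Tuple

# One labelled training example: (feature vector, class label).
Example = Tuple[Sequence[float], int]


def partition(data: Sequence[Example], k: int, rng: random.Random) -> List[List[Example]]:
    """Shuffle the training data and split it into k roughly equal subsets."""
    if not 1 <= k <= len(data):
        raise ValueError("k must be between 1 and the number of examples")
    shuffled = list(data)
    rng.shuffle(shuffled)
    return [shuffled[i::k] for i in range(k)]


def accuracy(classifier: Callable[[Sequence[float]], int], subset: Sequence[Example]) -> float:
    """Fraction of examples in the subset that the classifier labels correctly."""
    correct = sum(1 for x, y in subset if classifier(x) == y)
    return correct / len(subset)


def evaluate_on_random_subset(classifier: Callable[[Sequence[float]], int],
                              subsets: Sequence[Sequence[Example]],
                              rng: random.Random) -> float:
    """Score one individual on a single randomly chosen subset, not the whole set."""
    return accuracy(classifier, rng.choice(subsets))
```

Each evaluation touches roughly 1/k of the training data, which is where the saving in evaluation time comes from; the price is a noisier fitness estimate.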

Feature information fusion using multiple neural networks and target identification application of FLIR image (다중 신경회로망을 이용한 특징정보 융합과 적외선영상에서의 표적식별에의 응용)

  • 선선구;박현욱
    • Journal of the Institute of Electronics Engineers of Korea SP
    • /
    • v.40 no.4
    • /
    • pp.266-274
    • /
    • 2003
  • Distance Fourier descriptors of local target boundaries and feature information fusion using multiple MLPs (multilayer perceptrons) are proposed. They are used to identify nonoccluded and partially occluded targets in natural FLIR (forward-looking infrared) images. After a target is segmented, radial Fourier descriptors are defined from the target boundary as global shape features. The target boundary is partitioned into four local boundaries to extract local shape features. In each local boundary, a distance function is defined from the boundary points and the line between two extreme points. Distance Fourier descriptors are then defined from the distance function as local shape features. One global feature vector and four local feature vectors are used as input data for multiple MLPs to determine the final identification result for the target. In the experiments, we show that the proposed method is superior to traditional feature sets with respect to identification performance.
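A rough sketch of a distance function and its Fourier descriptors for one local boundary. It assumes the "two extreme points" are the first and last points of the local boundary and uses the magnitudes of a plain discrete Fourier transform as descriptors; both choices, and the function names, are illustrative assumptions and not taken from the paper.

```python
import cmath
from math import hypot
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def chord_distances(boundary: Sequence[Point]) -> List[float]:
    """Perpendicular distance of each boundary point to the line through the endpoints."""
    (x0, y0), (x1, y1) = boundary[0], boundary[-1]
    length = hypot(x1 - x0, y1 - y0)
    if length == 0:
        raise ValueError("the two extreme points must be distinct")
    return [abs((x1 - x0) * (y0 - y) - (x0 - x) * (y1 - y0)) / length
            for x, y in boundary]


def fourier_magnitudes(signal: Sequence[float], count: int) -> List[float]:
    """Magnitudes of the first `count` DFT coefficients, normalized by signal length."""
    n = len(signal)
    magnitudes = []
    for k in range(count):
        coeff = sum(v * cmath.exp(-2j * cmath.pi * k * i / n)
                    for i, v in enumerate(signal)) / n
        magnitudes.append(abs(coeff))
    return magnitudes
```

The resulting magnitude vector for each of the four local boundaries would serve as one local feature vector fed to its own MLP.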

A Space Efficient Indexing Technique for DNA Sequences (공간 효율적인 DNA 시퀀스 인덱싱 방안)

  • Song, Hye-Ju;Park, Young-Ho;Loh, Woong-Kee
    • Journal of KIISE:Databases
    • /
    • v.36 no.6
    • /
    • pp.455-465
    • /
    • 2009
  • Suffix trees are widely used in similar sequence matching for DNA. They suffer from several problems, such as high time consumption, heavy disk and memory usage, and data skew, since DNA sequences are very large and do not fit in main memory. Thus, in this paper, we present a space-efficient indexing method called SENoM, which allows us to build trees without merging phases for the partitioned sub-trees. The proposed method works in two phases. In the first phase, we partition the suffixes of the input string based on a common variable-length prefix until the number of suffixes is smaller than a threshold. In the second phase, we construct a disk-based sub-tree from each suffix set and then write it to the disk. The proposed method, SENoM, eliminates complex merging phases. We show experimentally that the proposed method is effective as follows. SENoM reduces disk usage to less than 35% and memory usage to less than 20% compared with the TRELLIS algorithm. SENoM can also answer queries efficiently using the prefix tree even when the query sequence is long.
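The first phase (partitioning suffixes by a variable-length prefix) can be sketched in memory as follows. This is only an illustration of the partitioning idea: the `$` end marker, the function name, and the in-memory dictionaries are assumptions, and the disk-based sub-tree construction of the second phase is not shown.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def partition_suffixes(text: str, threshold: int) -> Dict[str, List[int]]:
    """Group suffix start positions by a shared prefix.

    A group's prefix is extended by one character at a time until the group
    holds fewer than `threshold` suffixes. A suffix that ends exactly at the
    current prefix is stored under prefix + "$" (a hypothetical end marker,
    assumed not to occur in the text).
    """
    result: Dict[str, List[int]] = {}
    pending: List[Tuple[str, List[int]]] = [("", list(range(len(text))))]
    while pending:
        prefix, positions = pending.pop()
        if len(positions) < threshold:
            if positions:
                result[prefix] = positions
            continue
        depth = len(prefix)
        groups: Dict[str, List[int]] = defaultdict(list)
        for pos in positions:
            if pos + depth < len(text):
                # Extend the prefix by the next character of this suffix.
                groups[prefix + text[pos + depth]].append(pos)
            else:
                # The suffix is exactly the prefix and cannot be extended.
                result.setdefault(prefix + "$", []).append(pos)
        pending.extend(groups.items())
    return result
```

Since every suffix lands in exactly one prefix group, a sub-tree built from each group is independent of the others, which is why no merging phase is needed afterwards.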

Study on the analysis of disproportionate data and hypothesis testing (불균형 자료 분석과 가설 검정에 관한 연구)

  • 장석환;송규문;김장한
    • The Korean Journal of Applied Statistics
    • /
    • v.5 no.2
    • /
    • pp.243-254
    • /
    • 1992
  • In the present study two sets of unbalanced two-way cross-classification data with and without empty cell(s) were used to evaluate empirically the various sums of squares in the analysis of variance table. Searle (1977) and Searle et al. (1981) developed a method of computing $R(\alpha \mid \mu, \beta)$ and $R(\beta \mid \mu, \alpha)$ by the use of a partitioned matrix of $X'X$ for the model of no interaction, interchanging the columns of $X$ in the order $\alpha, \mu, \beta$ and, accordingly, the elements in $b$. An alternative way of computing $R(\alpha \mid \mu, \beta)$, $R(\beta \mid \mu, \alpha)$ and $R(\gamma \mid \mu, \alpha, \beta)$ without interchanging the columns of $X$ has been found by means of $(X'X)^-$ derived using $W_2 = Z_2Z_2 - Z_2Z_1(Z_1Z_1)^-Z_1Z_2$. It is true that $R(\alpha \mid \mu, \beta, \gamma)_\Sigma = SSA_W$ and $R(\beta \mid \mu, \alpha, \gamma)_\Sigma = SSB_W$ where $SSA_W$ and means analysis and $R(\gamma \mid \mu, \alpha, \beta) = R(\gamma \mid \mu, \alpha, \beta)_\Sigma$ for the data without empty cell, but not for the data with empty cell(s). It is also noticed that for the data with empty cells under W-restrictions $R(\alpha \mid \mu, \beta, \gamma)_W = R(\mu, \alpha, \beta, \gamma)_W - R(\mu, \alpha, \beta, \gamma)_W = R(\alpha \mid \mu)$ and $R(\beta \mid \mu, \alpha, \gamma)_W = R(\mu, \alpha, \beta, \gamma)_W - R(\mu, \alpha, \beta, \gamma)_W = R(\beta \mid \mu)$, but $R(\gamma \mid \mu, \alpha, \beta)_W = R(\mu, \alpha, \beta, \gamma)_W - R(\mu, \alpha, \beta, \gamma)_W \neq R(\gamma \mid \mu, \alpha, \beta)$. The hypotheses $H_0 : K'b = 0$ commonly tested were examined in relation to the corresponding sums of squares for $R(\alpha \mid \mu)$, $R(\beta \mid \mu)$, $R(\alpha \mid \mu, \beta)$, $R(\beta \mid \mu, \alpha)$, $R(\alpha \mid \mu, \beta, \gamma)$, $R(\beta \mid \mu, \alpha, \gamma)$, and $R(\gamma \mid \mu, \alpha, \beta)$ under the restrictions.
