• Title/Summary/Keyword: Split-Algorithm

Search Result 316, Processing Time 0.032 seconds

An Item-based Collaborative Filtering Technique by Associative Relation Clustering in Personalized Recommender Systems (개인화 추천 시스템에서 연관 관계 군집에 의한 아이템 기반의 협력적 필터링 기술)

  • 정경용;김진현;정헌만;이정현
    • Journal of KIISE:Software and Applications
    • /
    • v.31 no.4
    • /
    • pp.467-477
    • /
    • 2004
  • While recommender systems were used by a few E-commerce sites former days, they are now becoming serious business tools that are re-shaping the world of I-commerce. And collaborative filtering has been a very successful recommendation technique in both research and practice. But there are two problems in personalized recommender systems, it is First-Rating problem and Sparsity problem. In this paper, we solve these problems using the associative relation clustering and “Lift” of association rules. We produce “Lift” between items using user's rating data. And we apply Threshold by -cut to the association between items. To make an efficiency of associative relation cluster higher, we use not only the existing Hypergraph Clique Clustering algorithm but also the suggested Split Cluster method. If the cluster is completed, we calculate a similarity iten in each inner cluster. And the index is saved in the database for the fast access. We apply the creating index to predict the preference for new items. To estimate the Performance, the suggested method is compared with existing collaborative filtering techniques. As a result, the proposed method is efficient for improving the accuracy of prediction through solving problems of existing collaborative filtering techniques.

A Study on Speech Recognition Using the HM-Net Topology Design Algorithm Based on Decision Tree State-clustering (결정트리 상태 클러스터링에 의한 HM-Net 구조결정 알고리즘을 이용한 음성인식에 관한 연구)

  • 정현열;정호열;오세진;황철준;김범국
    • The Journal of the Acoustical Society of Korea
    • /
    • v.21 no.2
    • /
    • pp.199-210
    • /
    • 2002
  • In this paper, we carried out the study on speech recognition using the KM-Net topology design algorithm based on decision tree state-clustering to improve the performance of acoustic models in speech recognition. The Korean has many allophonic and grammatical rules compared to other languages, so we investigate the allophonic variations, which defined the Korean phonetics, and construct the phoneme question set for phonetic decision tree. The basic idea of the HM-Net topology design algorithm is that it has the basic structure of SSS (Successive State Splitting) algorithm and split again the states of the context-dependent acoustic models pre-constructed. That is, it have generated. the phonetic decision tree using the phoneme question sets each the state of models, and have iteratively trained the state sequence of the context-dependent acoustic models using the PDT-SSS (Phonetic Decision Tree-based SSS) algorithm. To verify the effectiveness of the above algorithm we carried out the speech recognition experiments for 452 words of center for Korean language Engineering (KLE452) and 200 sentences of air flight reservation task (YNU200). Experimental results show that the recognition accuracy has progressively improved according to the number of states variations after perform the splitting of states in the phoneme, word and continuous speech recognition experiments respectively. Through the experiments, we have got the average 71.5%, 99.2% of the phoneme, word recognition accuracy when the state number is 2,000, respectively and the average 91.6% of the continuous speech recognition accuracy when the state number is 800. Also we haute carried out the word recognition experiments using the HTK (HMM Too1kit) which is performed the state tying, compared to share the parameters of the HM-Net topology design algorithm. In word recognition experiments, the HM-Net topology design algorithm has an average of 4.0% higher recognition accuracy than the context-dependent acoustic models generated by the HTK implying the effectiveness of it.

Evaluation of Sensitivity and Retrieval Possibility of Land Surface Temperature in the Mid-infrared Wavelength through Radiative Transfer Simulation (복사전달모의를 통한 중적외 파장역의 민감도 분석 및 지표면온도 산출 가능성 평가)

  • Choi, Youn-Young;Suh, Myoung-Seok;Cha, DongHwan;Seo, DooChun
    • Korean Journal of Remote Sensing
    • /
    • v.38 no.6_1
    • /
    • pp.1423-1444
    • /
    • 2022
  • In this study, the sensitivity of the mid-infrared radiance to atmospheric and surface factors was analyzed using the radiative transfer model, MODerate resolution atmospheric TRANsmission (MODTRAN6)'s simulation data. The possibility of retrieving the land surface temperature (LST) using only the mid-infrared bands at night was evaluated. Based on the sensitivity results, the LST retrieval algorithm that reflects various factors for night was developed, and the level of the LST retrieval algorithm was evaluated using reference LST and observed LST. Sensitivity experiments were conducted on the atmospheric profiles, carbon dioxide, ozone, diurnal variation of LST, land surface emissivity (LSE), and satellite viewing zenith angle (VZA), which mainly affect satellite remote sensing. To evaluate the possibility of using split-window method, the mid-infrared wavelength was divided into two bands based on the transmissivity. Regardless of the band, the top of atmosphere (TOA) temperature is most affected by atmospheric profile, and is affected in order of LSE, diurnal variation of LST, and satellite VZA. In all experiments, band 1, which corresponds to the atmospheric window, has lower sensitivity, whereas band 2, which includes ozone and water vapor absorption, has higher sensitivity. The evaluation results for the LST retrieval algorithm using prescribed LST showed that the correlation coefficient (CC), the bias and the root mean squared error (RMSE) is 0.999, 0.023K and 0.437K, respectively. Also, the validation with 26 in-situ observation data in 2021 showed that the CC, bias and RMSE is 0.993, 1.875K and 2.079K, respectively. The results of this study suggest that the LST can be retrieved using different characteristics of the two bands of mid-infrared to the atmospheric and surface conditions at night. Therefore, it is necessary to retrieve the LST using satellite data equipped with sensors in the mid-infrared bands.

Detecting near-duplication Video Using Motion and Image Pattern Descriptor (움직임과 영상 패턴 서술자를 이용한 중복 동영상 검출)

  • Jin, Ju-Kyong;Na, Sang-Il;Jenong, Dong-Seok
    • Journal of the Institute of Electronics Engineers of Korea SP
    • /
    • v.48 no.4
    • /
    • pp.107-115
    • /
    • 2011
  • In this paper, we proposed fast and efficient algorithm for detecting near-duplication based on content based retrieval in large scale video database. For handling large amounts of video easily, we split the video into small segment using scene change detection. In case of video services and copyright related business models, it is need to technology that detect near-duplicates, that longer matched video than to search video containing short part or a frame of original. To detect near-duplicate video, we proposed motion distribution and frame descriptor in a video segment. The motion distribution descriptor is constructed by obtaining motion vector from macro blocks during the video decoding process. When matching between descriptors, we use the motion distribution descriptor as filtering to improving matching speed. However, motion distribution has low discriminability. To improve discrimination, we decide to identification using frame descriptor extracted from selected representative frames within a scene segmentation. The proposed algorithm shows high success rate and low false alarm rate. In addition, the matching speed of this descriptor is very fast, we confirm this algorithm can be useful to practical application.

Error-Tolerant Music Information Retrieval Method Using Query-by-Humming (허밍 질의를 이용한 오류에 강한 악곡 정보 검색 기법)

  • 정현열;허성필
    • The Journal of the Acoustical Society of Korea
    • /
    • v.23 no.6
    • /
    • pp.488-496
    • /
    • 2004
  • This paper describes a music information retrieval system which uses humming as the key for retrieval Humming is an easy way for the user to input a melody. However, there are several problems with humming that degrade the retrieval of information. One problem is a human factor. Sometimes people do not sing accurately, especially if they are inexperienced or unaccompanied. Another problem arises from signal processing. Therefore, a music information retrieval method should be sufficiently robust to surmount various humming errors and signal processing problems. A retrieval system has to extract pitch from the user's humming. However pitch extraction is not perfect. It often captures half or double pitches. even if the extraction algorithms take the continuity of the pitch into account. Considering these problems. we propose a system that takes multiple pitch candidates into account. In addition to the frequencies of the pitch candidates. the confidence measures obtained from their powers are taken into consideration as well. We also propose the use of an algorithm with three dimensions that is an extension of the conventional DP algorithm, so that multiple pitch candidates can be treated. Moreover in the proposed algorithm. DP paths are changed dynamically to take deltaPitches and IOIratios of input and reference notes into account in order to treat notes being split or unified. We carried out an evaluation experiment to compare the proposed system with a conventional system. From the experiment. the proposed method gave better retrieval performance than the conventional system.

A Study on Parameters Estimation of Storage Function Model Using the Genetic Algorithms (유전자 알고리듬을 이용한 저류함수모형의 매개변수 추정에 관한 연구)

  • Park, Bong-Jin;Cha, Hyeong-Seon;Kim, Ju-Hwan
    • Journal of Korea Water Resources Association
    • /
    • v.30 no.4
    • /
    • pp.347-355
    • /
    • 1997
  • In this study, the applicability of genetic algorithms into the parameter estimation of storage function method for flood routing model is investigated. Genetic algorithm is mathematically established theory based on the process of Darwinian natural selection and survival of fittest. It can be represented as a kind of search algorithms for optima point in solution space and make a reach on optimal solutions through performance improvement of assumed model by applying the natural selection of life as mechanical learning province. Flood events recorded in the Daechung dam are selected and used for the parameter estimation and verification of the proposed parameter estimation method by the split sample method. The results are analyzed that the performance of the model are improved including peak discharge and time to peak and shown that the parameter Rsa, and f1 are most sensitive to storage function model. Based on the analysis for estimated parameters and the comparison with the results from experimental equations, the applicability of genetic algorithm is verified and the improvements of those equations will be used for the augmentation of flood control efficiency.

  • PDF

Content based Video Copy Detection Using Spatio-Temporal Ordinal Measure (시공간 순차 정보를 이용한 내용기반 복사 동영상 검출)

  • Jeong, Jae-Hyup;Kim, Tae-Wang;Yang, Hun-Jun;Jin, Ju-Kyong;Jeong, Dong-Seok
    • Journal of the Institute of Electronics Engineers of Korea SP
    • /
    • v.49 no.2
    • /
    • pp.113-121
    • /
    • 2012
  • In this paper, we proposed fast and efficient algorithm for detecting near-duplication based on content based retrieval in large scale video database. For handling large amounts of video easily, we split the video into small segment using scene change detection. In case of video services and copyright related business models, it is need to technology that detect near-duplicates, that longer matched video than to search video containing short part or a frame of original. To detect near-duplicate video, we proposed motion distribution and frame descriptor in a video segment. The motion distribution descriptor is constructed by obtaining motion vector from macro blocks during the video decoding process. When matching between descriptors, we use the motion distribution descriptor as filtering to improving matching speed. However, motion distribution has low discriminability. To improve discrimination, we decide to identification using frame descriptor extracted from selected representative frames within a scene segmentation. The proposed algorithm shows high success rate and low false alarm rate. In addition, the matching speed of this descriptor is very fast, we confirm this algorithm can be useful to practical application.

Hierarchical Clustering Approach of Multisensor Data Fusion: Application of SAR and SPOT-7 Data on Korean Peninsula

  • Lee, Sang-Hoon;Hong, Hyun-Gi
    • Proceedings of the KSRS Conference
    • /
    • 2002.10a
    • /
    • pp.65-65
    • /
    • 2002
  • In remote sensing, images are acquired over the same area by sensors of different spectral ranges (from the visible to the microwave) and/or with different number, position, and width of spectral bands. These images are generally partially redundant, as they represent the same scene, and partially complementary. For many applications of image classification, the information provided by a single sensor is often incomplete or imprecise resulting in misclassification. Fusion with redundant data can draw more consistent inferences for the interpretation of the scene, and can then improve classification accuracy. The common approach to the classification of multisensor data as a data fusion scheme at pixel level is to concatenate the data into one vector as if they were measurements from a single sensor. The multiband data acquired by a single multispectral sensor or by two or more different sensors are not completely independent, and a certain degree of informative overlap may exist between the observation spaces of the different bands. This dependence may make the data less informative and should be properly modeled in the analysis so that its effect can be eliminated. For modeling and eliminating the effect of such dependence, this study employs a strategy using self and conditional information variation measures. The self information variation reflects the self certainty of the individual bands, while the conditional information variation reflects the degree of dependence of the different bands. One data set might be very less reliable than others in the analysis and even exacerbate the classification results. The unreliable data set should be excluded in the analysis. To account for this, the self information variation is utilized to measure the degrees of reliability. The team of positively dependent bands can gather more information jointly than the team of independent ones. But, when bands are negatively dependent, the combined analysis of these bands may give worse information. Using the conditional information variation measure, the multiband data are split into two or more subsets according the dependence between the bands. Each subsets are classified separately, and a data fusion scheme at decision level is applied to integrate the individual classification results. In this study. a two-level algorithm using hierarchical clustering procedure is used for unsupervised image classification. Hierarchical clustering algorithm is based on similarity measures between all pairs of candidates being considered for merging. In the first level, the image is partitioned as any number of regions which are sets of spatially contiguous pixels so that no union of adjacent regions is statistically uniform. The regions resulted from the low level are clustered into a parsimonious number of groups according to their statistical characteristics. The algorithm has been applied to satellite multispectral data and airbone SAR data.

  • PDF

Extensions of X-means with Efficient Learning the Number of Clusters (X-means 확장을 통한 효율적인 집단 개수의 결정)

  • Heo, Gyeong-Yong;Woo, Young-Woon
    • Journal of the Korea Institute of Information and Communication Engineering
    • /
    • v.12 no.4
    • /
    • pp.772-780
    • /
    • 2008
  • K-means is one of the simplest unsupervised learning algorithms that solve the clustering problem. However K-means suffers the basic shortcoming: the number of clusters k has to be known in advance. In this paper, we propose extensions of X-means, which can estimate the number of clusters using Bayesian information criterion(BIC). We introduce two different versions of algorithm: modified X-means(MX-means) and generalized X-means(GX-means), which employ one full covariance matrix for one cluster and so can estimate the number of clusters efficiently without severe over-fitting which X-means suffers due to its spherical cluster assumption. The algorithms start with one cluster and try to split a cluster iteratively to maximize the BIC score. The former uses K-means algorithm to find a set of optimal clusters with current k, which makes it simple and fast. However it generates wrongly estimated centers when the clusters are overlapped. The latter uses EM algorithm to estimate the parameters and generates more stable clusters even when the clusters are overlapped. Experiments with synthetic data show that the purposed methods can provide a robust estimate of the number of clusters and cluster parameters compared to other existing top-down algorithms.

Multiple Description Coding of 3-D Data (3차원 데이터의 다중 부호화 기법)

  • Park, Sung-Bum;Kim, Chang-Su;Lee, Sang-Uk
    • The Journal of Korean Institute of Communications and Information Sciences
    • /
    • v.32 no.9C
    • /
    • pp.840-848
    • /
    • 2007
  • A multiple description coding (MDC) scheme for 3-D Data is presented. First, a plane-based 3-D data is split into two descriptions, each of which has identical contribution in 3-D surface reconstruction. In order to maximize the visual quality of reconstructed 3-D data, then, plane parameters are modified according to channel error condition. Finally, these descriptions are compressed and transmitted over distinct channels. In decoder, if two descriptions are available, we reconstruct a high quality 3-D data. If only one description is transmitted, however, 3-D surface recovery scheme reduces artifacts on erroneous 3-D surface, yielding a smooth 3-D surface. Therefore, the proposed algorithm guarantees acceptable quality reconstruction of 3-D data even though one channel is totally lost.