• Title/Summary/Keyword: Data Similarity

An Analysis of Data Traffic Considering the Delay and Cell Loss Probability (지연시간과 손실율을 고려한 데이터 트래픽 분석)

  • Lim, Seog-Ku
    • Journal of Digital Contents Society / v.5 no.1 / pp.7-11 / 2004
  • Many problems must be solved to construct a next-generation high-speed communication network. Among these, a basic consideration is the analysis of the characteristics of the traffic that flows into the network. Traffic characteristics of many currently offered Internet services show that network traffic exhibits self-similarity over a wide range of scales. Self-similarity is expressed by long-range dependence, a concept that contrasts with the Poisson model, which has relatively short-range dependence. Therefore, the design and dimensioning of a next-generation communication network first of all requires a traffic model that reflects burstiness and self-similarity. Here, self-similarity can be characterized by the Hurst parameter. In this paper, a calculation equation is derived that considers the queueing delay and self-similarity of data traffic, and it is compared with simulation results.
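
As a brief reminder of how the Hurst parameter relates to self-similarity, the block below gives a standard textbook characterization; it is not the calculation equation derived in the paper. The symbols are assumptions introduced here: $X$ is a stationary traffic series with variance $\sigma^2$, $X^{(m)}$ is the series of averages over non-overlapping blocks of length $m$, and $r(k)$ is the autocorrelation at lag $k$.

```latex
% Exactly second-order self-similar series with Hurst parameter H:
\operatorname{Var}\!\left(X^{(m)}\right) = \sigma^{2}\, m^{2H-2},
\qquad \tfrac{1}{2} < H < 1 .

% Long-range dependence: the autocorrelation decays so slowly
% that it is not summable.
r(k) \sim c\, k^{2H-2} \quad (k \to \infty),
\qquad \sum_{k} r(k) = \infty .
```

For $H = 1/2$ the block-average variance decays as $1/m$, as it does for uncorrelated samples; values of $H$ closer to 1 indicate burstier, more strongly long-range-dependent traffic.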

Moving Objects Modeling for Supporting Content and Similarity Searches (내용 및 유사도 검색을 위한 움직임 객체 모델링)

  • 복경수;김미희;신재룡;유재수;조기형
    • Journal of Korea Multimedia Society / v.7 no.5 / pp.617-632 / 2004
  • Video data includes moving objects that change spatial position over time. In this paper, we propose a new modeling method for a moving object contained in video data. To retrieve moving objects effectively, the proposed modeling method represents the spatial position and the size of a moving object. It also represents the visual features and the trajectory by considering the direction, distance, and speed of moving objects over time. Therefore, it allows various types of retrieval, such as visual-feature-based similarity retrieval, distance-based similarity retrieval, trajectory-based similarity retrieval, and weighted retrieval that mixes these types.

Measuring gameplay similarity between human and reinforcement learning artificial intelligence (사람과 강화학습 인공지능의 게임플레이 유사도 측정)

  • Heo, Min-Gu;Park, Chang-Hoon
    • Journal of Korea Game Society / v.20 no.6 / pp.63-74 / 2020
  • Recently, research on automating game tests using artificial intelligence agents instead of humans has been attracting attention. This paper aims to collect play data from humans and artificial intelligence agents and analyze their similarity as a preliminary study for game balancing automation. To create artificial intelligence that plays similarly to humans, constraints were added at the learning stage. Play data were obtained from 14 people and 60 artificial intelligence agents, each playing the Flappy Bird game 10 times. The collected data were compared and analyzed for movement trajectory, action position, and death position using the cosine similarity method. As a result of the analysis, an artificial intelligence agent with a similarity of 0.9 or more to humans was found.
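
The cosine similarity used in the comparison above can be sketched as follows. This is a minimal illustration, not the authors' code: the function name and the two sample trajectories are hypothetical, and it assumes each play has already been reduced to a fixed-length numeric vector.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Hypothetical data: character height sampled at five fixed time steps.
human_trajectory = [0.50, 0.55, 0.48, 0.60, 0.52]
agent_trajectory = [0.52, 0.53, 0.50, 0.58, 0.55]

similarity = cosine_similarity(human_trajectory, agent_trajectory)
print(f"trajectory similarity: {similarity:.3f}")
```

A value near 1 means the two vectors point in almost the same direction. Because the measure ignores vector length, two plays that differ only by a constant scale factor score as identical.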

A New Similarity Measure for Categorical Attribute-Based Clustering (범주형 속성 기반 군집화를 위한 새로운 유사 측도)

  • Kim, Min;Jeon, Joo-Hyuk;Woo, Kyung-Gu;Kim, Myoung-Ho
    • Journal of KIISE:Databases / v.37 no.2 / pp.71-81 / 2010
  • Cluster finding is widely used in numerous applications, such as pattern recognition, image analysis, and market analysis. The important factors that decide cluster quality are the similarity measure and the number of attributes. Similarity measures should be defined with respect to the data types. Existing similarity measures apply well to numerical attribute values. However, those measures do not work well when the data is described by categorical attributes, that is, when there is no inherent similarity measure between values. In high-dimensional spaces, conventional clustering algorithms tend to break down because of the sparsity of data points. To overcome this difficulty, a subspace clustering approach has been proposed. It is based on the observation that different clusters may exist in different subspaces. In this paper, we propose a new similarity measure for clustering high-dimensional categorical data. The measure is defined based on the fact that a good clustering is one in which each cluster has certain information that distinguishes it from other clusters. We also try to capture the attribute dependencies. This study is meaningful because no previous method uses both of these. Experimental results on real datasets show that the clusters obtained by our proposed similarity measure are satisfactory with respect to clustering accuracy.

A study on the ordering of PIM family similarity measures without marginal probability (주변 확률을 고려하지 않는 확률적 흥미도 측도 계열 유사성 측도의 서열화)

  • Park, Hee Chang
    • Journal of the Korean Data and Information Science Society / v.26 no.2 / pp.367-376 / 2015
  • Today, big data has become a hot keyword; it may be defined as a collection of data sets so huge and complex that they are difficult to process by traditional methods. Clustering identifies the information in a big database by assigning a set of objects to clusters so that objects in the same cluster are more similar to each other than to objects in other clusters. The similarity measures used in cluster analysis may be classified into various types depending on the nature of the data. In this paper, we computed upper and lower limits for probabilistic interestingness measure (PIM) based similarity measures without marginal probability, such as the Yule I and II, Michael, Digby, Baulieu, and Dispersion measures. We also compared these measures using real data and a simulated experiment. According to Warrens (2008), coefficients that have the same quantities in the numerator and denominator, that are bounded, and that are close to each other in the ordering are likely to be more similar. Thus, results on bounds provide a means of classifying various measures. Also, knowing which coefficients are similar provides insight into the stability of a given algorithm.
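
To make the idea of bounded, ordered coefficients concrete, the sketch below computes two classical measures on a 2x2 co-occurrence table with cell counts a, b, c, d. It assumes that "Yule I and II" correspond to Yule's Q and Yule's Y, which the abstract does not state; the function names and the sample counts are hypothetical.

```python
import math


def yule_q(a, b, c, d):
    """Yule's Q = (ad - bc) / (ad + bc), bounded in [-1, 1]."""
    ad, bc = a * d, b * c
    if ad + bc == 0:
        raise ValueError("Q is undefined when ad + bc = 0")
    return (ad - bc) / (ad + bc)


def yule_y(a, b, c, d):
    """Yule's Y = (sqrt(ad) - sqrt(bc)) / (sqrt(ad) + sqrt(bc)), bounded in [-1, 1]."""
    root_ad, root_bc = math.sqrt(a * d), math.sqrt(b * c)
    if root_ad + root_bc == 0:
        raise ValueError("Y is undefined when ad = bc = 0")
    return (root_ad - root_bc) / (root_ad + root_bc)


# Hypothetical counts: a = both present, b and c = one present, d = both absent.
q = yule_q(10, 5, 5, 10)
y = yule_y(10, 5, 5, 10)
print(f"Q = {q:.3f}, Y = {y:.3f}")
```

Both coefficients use only the products ad and bc, and they satisfy Q = 2Y / (1 + Y²), so |Y| never exceeds |Q|. That fixed ordering is the kind of relationship the bounds in the abstract are meant to expose.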

A Study on the Synthetic ECG Generation for User Recognition (사용자 인식을 위한 가상 심전도 신호 생성 기술에 관한 연구)

  • Kim, Min Gu;Kim, Jin Su;Pan, Sung Bum
    • Smart Media Journal / v.8 no.4 / pp.33-37 / 2019
  • Because ECG signals are time-series data acquired as time elapses, it is important to obtain comparison data of the same size as the enrolled data every time. This paper proposes a GAN (Generative Adversarial Network) model based on an auxiliary classifier to generate synthetic ECG signals, which may address the issue of differing data sizes. Cosine similarity and cross-correlation are used to examine the similarity of the synthetic ECG signals. The analysis shows that the average cosine similarity was 0.991 and the average Euclidean distance similarity based on cross-correlation was 0.25. These results indicate that the data size difference issue can be resolved, because synthetic ECG signals similar to real ECG signals can be generated even when the registered data and the comparison data differ in size.

A Clustering Scheme Considering the Structural Similarity of Metadata in Smartphone Sensing System (스마트폰 센싱에서 메타데이터의 구조적 유사도를 고려한 클러스터링 기법)

  • Min, Hong;Heo, Junyoung
    • The Journal of the Institute of Internet, Broadcasting and Communication / v.14 no.6 / pp.229-234 / 2014
  • With the association between sensor networks, which collect environmental information using numerous sensor nodes, and smartphones equipped with various sensors, many applications that understand users' context have been developed to let users interact with their environments. To share the collected data, the data should be stored with XML-formatted metadata containing semantic information. With distance-based clustering schemes, the efficiency of data collection decreases because metadata files are extended and changed according to the purpose of each system developer. In this paper, we propose a clustering scheme that considers the structural similarity of metadata to reduce cluster construction time and improve the similarity of metadata among member nodes in a cluster.

A Study on Measuring the Similarity Among Sampling Sites in Lake Yongdam with Water Quality Data Using Multivariate Techniques (다변량기법을 활용한 용담호 수질측정지점 유사성 연구)

  • Lee, Yosang;Kwon, Sehyug
    • Journal of Environmental Impact Assessment / v.18 no.6 / pp.401-409 / 2009
  • Multivariate statistical approaches that classify sampling sites by measuring their similarity from water quality data, and that characterize the resulting clusters, have been discussed for designing an optimal water quality monitoring network. For an empirical study, two years of data (2005, 2006) were collected in Yongdam reservoir at 9 sampling sites, with 2 depth levels and 7 important variables related to water quality. The similarity among sampling sites is measured with Euclidean distances of the water-quality variables, and the sites are classified by a hierarchical clustering method. The clustered sites are discussed using principal component variables, in view of their geographical characteristics and of reducing the number of measuring sites. The nine sampling sites are clustered as follows. Sites 5, 6, and 7 form one cluster characterized by shallow water depth and the main stream. Sites 2 and 4 are clustered into the same group by hydraulic characteristics derived from the main stream, but their patterns of water quality change appear to differ because site 2 is near the dam. Sites 3, 8, and 9 each stand alone because they lie on different tributaries.
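
The combination of Euclidean distance and hierarchical clustering described above can be sketched as follows. The abstract does not name the linkage rule, so single linkage is an assumption made here for brevity; the site labels and measurements are hypothetical and are not data from the paper.

```python
import math
from itertools import combinations


def euclidean(u, v):
    """Euclidean distance between two equal-length measurement vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(u, v)))


def single_linkage(points, n_clusters):
    """Agglomerative clustering: repeatedly merge the two closest clusters.

    points maps a site label to its measurement vector. The distance between
    two clusters is the smallest distance between any pair of their members
    (single linkage, an assumption of this sketch).
    """
    if n_clusters < 1:
        raise ValueError("n_clusters must be at least 1")
    clusters = [[label] for label in points]
    while len(clusters) > n_clusters:
        best = None
        for i, j in combinations(range(len(clusters)), 2):
            d = min(euclidean(points[p], points[q])
                    for p in clusters[i] for q in clusters[j])
            if best is None or d < best[0]:
                best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]  # merge j into i (i < j)
        del clusters[j]
    return [sorted(c) for c in clusters]


# Hypothetical standardized values of two water-quality variables per site.
sites = {
    "A": (0.1, 0.2),
    "B": (0.2, 0.1),
    "C": (2.0, 2.1),
    "D": (2.1, 1.9),
    "E": (5.0, 0.0),
}
groups = single_linkage(sites, 3)
print(groups)
```

In practice the variables would be standardized first so that no single variable dominates the distance, and a library routine with a chosen linkage rule would replace this quadratic loop.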

Deep learning-based custom problem recommendation algorithm to improve learning rate (학습률 향상을 위한 딥러닝 기반 맞춤형 문제 추천 알고리즘)

  • Lim, Min-Ah;Hwang, Seung-Yeon;Kim, Jeong-Jun
    • The Journal of the Institute of Internet, Broadcasting and Communication / v.22 no.5 / pp.171-176 / 2022
  • With the recent development of deep learning technology, the application areas of recommendation systems have also diversified. This paper studied algorithms to improve the learning rate and examined how the results vary by word through comparison with the performance characteristics of the Word2Vec model. The problem recommendation algorithm was implemented with values obtained from the meaning representation and text similarity testing that characterize the Word2Vec model. Using the text similarity values from Word2Vec's learning results, problems with high similarity can be recommended. In the experiments, accuracy decreased when the amount of data was small, and it was confirmed that the larger the amount of data in the data set, the higher the accuracy.

A METHOD OF IMAGE DATA RETRIEVAL BASED ON SELF-ORGANIZING MAPS

  • Lee, Mal-Rey;Oh, Jong-Chul
    • Journal of applied mathematics & informatics / v.9 no.2 / pp.793-806 / 2002
  • Feature-based similarity retrieval has become an important research issue in image database systems. The features of image data are useful for discriminating images. In this paper, we propose a high-speed k-nearest neighbor search algorithm based on Self-Organizing Maps. A Self-Organizing Map (SOM) provides a mapping from high-dimensional feature vectors onto a two-dimensional space. The mapping preserves the topology of the feature vectors. The map is called a topological feature map. A topological feature map preserves the mutual relations (similarity) in the feature space of the input data and clusters mutually similar feature vectors in neighboring nodes. Each node of the topological feature map holds a node vector and the similar images that are closest to that node vector. In a topological feature map, there are empty nodes to which no image is classified. We evaluate the performance of our algorithm using color feature vectors extracted from images. Promising results have been obtained in the experiments.
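
One way a trained topological feature map can speed up k-nearest neighbor search is sketched below. The abstract does not give the details of the proposed algorithm, so this is a generic illustration: the function names, the `radius` parameter, and the toy map are all hypothetical, and the map is assumed to be already trained.

```python
def squared_distance(u, v):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(u, v))


def knn_via_map(query, node_vectors, node_images, k, radius=1):
    """Approximate k-NN that only examines images stored near the best node.

    node_vectors: {(row, col): node vector} for every node of the map.
    node_images:  {(row, col): [(image_id, feature_vector), ...]};
                  empty nodes are simply absent from this dictionary.
    """
    # 1. Best-matching node: the node vector closest to the query.
    best = min(node_vectors,
               key=lambda pos: squared_distance(query, node_vectors[pos]))
    # 2. Collect candidates from that node and its grid neighbors.
    candidates = []
    for (row, col), images in node_images.items():
        if abs(row - best[0]) <= radius and abs(col - best[1]) <= radius:
            candidates.extend(images)
    # 3. Rank the candidates by their true distance to the query.
    candidates.sort(key=lambda item: squared_distance(query, item[1]))
    return [image_id for image_id, _ in candidates[:k]]


# Hypothetical 1x3 map with two-dimensional color features; node (0, 1) is empty.
node_vectors = {(0, 0): (0.0, 0.0), (0, 1): (0.5, 0.5), (0, 2): (1.0, 1.0)}
node_images = {
    (0, 0): [("img_a", (0.05, 0.0)), ("img_b", (0.1, 0.1))],
    (0, 2): [("img_c", (0.9, 1.0)), ("img_d", (1.0, 0.95))],
}
nearest = knn_via_map((0.95, 0.95), node_vectors, node_images, k=2)
print(nearest)
```

Because similar feature vectors are clustered in neighboring nodes, most images never need to be compared with the query. The trade-off is that the result is approximate: a true neighbor stored outside the searched neighborhood is missed, and a larger `radius` reduces that risk at the cost of more comparisons.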