• Title/Summary/Keyword: Data Similarity


Parameterization of Along-Wind Dispersion Coefficients based on Field and Wind Tunnel Data

  • Kang, Sung-Dae
    • Environmental Sciences Bulletin of The Korean Environmental Sciences Society
    • /
    • v.10 no.S_1
    • /
    • pp.11-22
    • /
    • 2001
  • Observations related to the along-wind dispersion of puffs were collected from 12 field sites and from a wind tunnel experiment and used to test simple similarity relations. Because most of the data came from concentration time series observed at fixed monitors, the basic observation was $\sigma_t$, the standard deviation of the concentration time series. These data also allowed the travel time, $t$, from the source to the receptor to be estimated, from which the puff advective speed, $u_e$, could be determined. The along-wind dispersion coefficient, $\sigma_x$, was then assumed to equal $\sigma_t u_e$. The data, which extended over four orders of magnitude, supported the similarity relations $\sigma_t = 0.1t$ and $\sigma_x = 1.8u^*t$, where $t$ is the travel time and $u^*$ is the friction velocity. About 50% of the observations were within a factor of two of the predictions based on the similarity relations.
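The two similarity relations quoted in the abstract above (time-series standard deviation equal to 0.1 times the travel time, and along-wind dispersion coefficient equal to 1.8 times the friction velocity times the travel time) can be evaluated with a minimal sketch such as the following. The function names, the units (seconds, metres per second), and the sample inputs are assumptions made for illustration and do not come from the paper.

```python
def along_wind_dispersion(travel_time_s, friction_velocity_ms):
    """Evaluate the two similarity relations from the abstract.

    Returns (sigma_t, sigma_x):
      sigma_t = 0.1 * t          standard deviation of the concentration time series
      sigma_x = 1.8 * u_star * t along-wind dispersion coefficient
    Units (s, m/s, m) are an assumption of this sketch.
    """
    sigma_t = 0.1 * travel_time_s
    sigma_x = 1.8 * friction_velocity_ms * travel_time_s
    return sigma_t, sigma_x


def within_factor_of_two(observed, predicted):
    """True when the observation lies between half and twice the prediction."""
    return 0.5 <= observed / predicted <= 2.0


# Hypothetical inputs: 1000 s travel time, friction velocity 0.3 m/s.
sigma_t, sigma_x = along_wind_dispersion(1000.0, 0.3)
print(sigma_t, sigma_x)
```

The `within_factor_of_two` helper mirrors the agreement criterion mentioned in the abstract, under which about half of the observations matched the predictions.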


The Methodology of the Golf Swing Similarity Measurement Using Deep Learning-Based 2D Pose Estimation

  • Jonghyuk, Park
    • Journal of the Korea Society of Computer and Information
    • /
    • v.28 no.1
    • /
    • pp.39-47
    • /
    • 2023
  • In this paper, we propose a method for measuring the similarity between golf swings in videos. Because deep learning-based artificial intelligence is known to be effective in computer vision, attempts to use it in video-based sports data analysis are increasing. In this study, the joint coordinates of a person in a golf swing video were obtained using a deep learning-based pose estimation model, and the similarity of each swing segment was measured from these coordinates. Driver swing videos from the GolfDB dataset were used to evaluate the proposed method. When swing similarity was measured by pairing the swing videos of 36 players, for 26 players their own other swing sequence was rated the most similar, and the average similarity ranking was about 5th. This confirms that similarity can be measured in detail even when the motions are performed similarly.

Approximate Top-k Labeled Subgraph Matching Scheme Based on Word Embedding (워드 임베딩 기반 근사 Top-k 레이블 서브그래프 매칭 기법)

  • Choi, Do-Jin;Oh, Young-Ho;Bok, Kyoung-Soo;Yoo, Jae-Soo
    • The Journal of the Korea Contents Association
    • /
    • v.22 no.8
    • /
    • pp.33-43
    • /
    • 2022
  • Labeled graphs are used to represent entities, their relationships, and their structures in real data such as knowledge graphs and protein interactions. With the rapid development of IT and the explosive increase in data, a subgraph matching technology is needed to provide the information that a user is interested in. In this paper, we propose an approximate Top-k labeled subgraph matching scheme that considers both the semantic similarity of labels and differences in graph structure. The proposed scheme uses a learning model based on FastText to capture the semantic similarity of labels. In addition, a label similarity graph (LSG) is used for approximate subgraph matching by calculating similarity values between labels in advance. The LSG resolves a limitation of existing schemes, in which subgraph expansion is possible only if the labels match exactly. The scheme supports structural similarity to a query graph by searching up to 2 hops. Based on the similarity values, we provide k subgraph matching results. We conduct various performance evaluations to show the superiority of the proposed scheme.

Efficient Time-Series Similarity Measurement and Ranking Based on Anomaly Detection (이상탐지 기반의 효율적인 시계열 유사도 측정 및 순위화)

  • Ji-Hyun Choi;Hyun Ahn
    • Journal of Internet Computing and Services
    • /
    • v.25 no.2
    • /
    • pp.39-47
    • /
    • 2024
  • Time series analysis is widely employed by many organizations to solve business problems, as it extracts various information and insights from chronologically ordered data. Among its applications, measuring time series similarity identifies time series with similar patterns, which is very important in applications such as time series search and clustering. In this study, we propose an efficient method for measuring time series similarity that focuses on anomalies rather than the entire series. We validate the proposed method by measuring and analyzing the rank correlation between the similarity measure computed on the subsets extracted by anomaly detection and the similarity measure computed on the whole time series. Experimental results, especially with stock time series data and an anomaly proportion of 10%, demonstrate a Spearman's rank correlation coefficient of up to 0.9. In conclusion, the proposed method can significantly reduce the computational cost of measuring time series similarity while providing reliable time series search and clustering results.
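The validation idea in the abstract above, comparing the ranking produced by an anomaly-only similarity with the ranking produced by a whole-series similarity, can be sketched as follows. This is only an illustration under stated assumptions: the abstract does not name the anomaly detector or the distance, so this sketch takes the points farthest from the mean as anomalies and uses Euclidean distance, and the data are synthetic. The Spearman helper uses the no-ties formula.

```python
import math
import random


def anomaly_indices(series, proportion=0.1):
    """Indices of the points farthest from the mean (assumed anomaly detector)."""
    mean = sum(series) / len(series)
    k = max(1, int(len(series) * proportion))
    ranked = sorted(range(len(series)),
                    key=lambda i: abs(series[i] - mean), reverse=True)
    return sorted(ranked[:k])


def distance(a, b, indices=None):
    """Euclidean distance over all points, or only over the given indices."""
    if indices is None:
        indices = range(len(a))
    return math.sqrt(sum((a[i] - b[i]) ** 2 for i in indices))


def ranks(values):
    """Rank positions (0 = smallest); assumes no tied values."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, i in enumerate(order):
        result[i] = rank
    return result


def spearman(x, y):
    """Spearman rank correlation via the no-ties formula."""
    n = len(x)
    d2 = sum((a - b) ** 2 for a, b in zip(ranks(x), ranks(y)))
    return 1 - 6 * d2 / (n * (n * n - 1))


# Synthetic query and candidates with increasing noise (hypothetical data).
random.seed(0)
query = [random.gauss(0, 1) for _ in range(200)]
candidates = [[q + random.gauss(0, s) for q in query]
              for s in (0.1, 0.5, 1.0, 2.0, 4.0)]

idx = anomaly_indices(query, proportion=0.1)
full = [distance(query, c) for c in candidates]          # whole series
partial = [distance(query, c, idx) for c in candidates]  # anomalies only
print(spearman(full, partial))
```

The anomaly-only distance touches 10% of the points here, which is where the cost saving would come from; a correlation close to 1 would indicate that the cheaper ranking agrees with the full one.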

Implementation of A Plagiarism Detecting System with Sentence and Syntactic Word Similarities (문장 및 어절 유사도를 이용한 표절 탐지 시스템 구현)

  • Maeng, Joosoo;Park, Ji Su;Shon, Jin Gon
    • KIPS Transactions on Software and Data Engineering
    • /
    • v.8 no.3
    • /
    • pp.109-114
    • /
    • 2019
  • The similarity detection method used in most plagiarism detection systems relies on the frequency of shared words obtained through morphological analysis. However, this method is limited in how accurately it can detect the degree of similarity, especially when similar words concerning the same topic are used, when sentences are excerpted partially and separately, or when the postpositions and endings of words are similar. To overcome this problem, we have designed and implemented a plagiarism detection system that provides more reliable similarity information by measuring sentence similarity and syntactic word similarity in addition to the conventional word similarity. We compared our system with a conventional system that uses only word similarity. The comparative experiment shows that our system can detect plagiarized documents whether or not the conventional system can detect them.

Practical Datasets for Similarity Measures and Their Threshold Values (유사도 측정 데이터 셋과 쓰레숄드)

  • Yang, Byoungju;Shim, Junho
    • The Journal of Society for e-Business Studies
    • /
    • v.18 no.1
    • /
    • pp.97-105
    • /
    • 2013
  • In the e-business domain, where the number of data objects is large, measuring similarity to find identical or similar objects is important. It requires comparing and computing the features of objects in pairs and therefore takes longer as the amount of data grows. Recent studies have presented various algorithms that perform it efficiently. Most of them demonstrate their performance advantage through empirical tests on particular data sets. In this paper, we introduce those data sets and present their characteristics and the meaningful threshold values inherent in each data set. The analysis of practical data sets with respect to their threshold values may serve as a reference baseline for future experiments on newly developed algorithms.

A Study on Prescription Similarity Analysis for Efficiency Improvement (처방 유사도 분석의 효율성 향상에 관한 연구)

  • Hwang, SuKyung;Woo, DongHyeon;Kim, KiWook;Lee, ByungWook
    • Journal of Korean Medical classics
    • /
    • v.35 no.4
    • /
    • pp.1-9
    • /
    • 2022
  • Objectives: This study aims to increase the efficiency of the prescription similarity analysis method that uses drug composition ratios. Methods: A controlled experiment compared the result generation time, the quantity of generated data, and the accuracy of results between the previous and new analysis methods on 12,598 formulas and 61 prescription groups. Results: The control group took 346 seconds on average and generated 768,478 results, while the test group took 24 seconds and generated 241,739 results. The test group adopted a selective calculation method that used only the overlapping data between two formulas instead of analyzing all possible cases. This simplified the data processing and reduced the quantity of data to be processed, making the system up to 14.47 times faster than the previous analysis method while producing equal results. Conclusions: The efficiency of similarity analysis can be improved by reducing the data span and simplifying the calculation process.
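One way to picture the selective calculation described in the abstract above is to compare only those formula pairs that share at least one ingredient, and to score each pair from the shared ingredients alone. The sketch below is an assumption-laden illustration, not the study's method: the inverted index, the similarity score (sum of the smaller composition ratio over shared ingredients), and the formula and ingredient names are all hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def composition_ratios(formula):
    """Convert ingredient doses into composition ratios that sum to 1."""
    total = sum(formula.values())
    return {herb: dose / total for herb, dose in formula.items()}


def overlap_similarity(a, b):
    """Assumed score: sum of the smaller ratio over shared ingredients only."""
    shared = a.keys() & b.keys()
    return sum(min(a[h], b[h]) for h in shared)


def candidate_pairs(formulas):
    """Pairs of formulas that share at least one ingredient."""
    index = defaultdict(set)  # ingredient -> names of formulas containing it
    for name, formula in formulas.items():
        for herb in formula:
            index[herb].add(name)
    pairs = set()
    for names in index.values():
        pairs.update(combinations(sorted(names), 2))
    return pairs


# Hypothetical formulas with doses per ingredient.
formulas = {
    "F1": {"A": 2, "B": 2},
    "F2": {"A": 1, "B": 3},
    "F3": {"C": 1, "D": 1},
}
ratios = {name: composition_ratios(f) for name, f in formulas.items()}

for left, right in sorted(candidate_pairs(formulas)):
    print(left, right, overlap_similarity(ratios[left], ratios[right]))
```

With these inputs only one of the three possible pairs is scored, because F3 shares no ingredient with the others; skipping such pairs is the kind of reduction in processed data the abstract reports.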

A Multimodal Profile Ensemble Approach to Development of Recommender Systems Using Big Data (빅데이터 기반 추천시스템 구현을 위한 다중 프로파일 앙상블 기법)

  • Kim, Minjeong;Cho, Yoonho
    • Journal of Intelligence and Information Systems
    • /
    • v.21 no.4
    • /
    • pp.93-110
    • /
    • 2015
  • A recommender system recommends products to customers who are likely to be interested in them. Various recommender systems have been developed on the basis of automated information filtering technology. Collaborative filtering (CF), one of the most successful recommendation algorithms, has been applied in many domains, such as recommending Web pages, books, movies, music, and products. However, CF is known to have a critical shortcoming. CF finds neighbors whose preferences are similar to those of the target customer and recommends the products those neighbors have liked most. Thus, CF works properly only when there is a sufficient number of ratings on common products from customers. When customer ratings are scarce, CF forms an inaccurate neighborhood, which results in poor recommendations. To improve the performance of CF-based recommender systems, most related studies have focused on developing novel algorithms under the assumption of a single profile, which is created from a user's item ratings, purchase transactions, or Web access logs. With the advent of big data, companies have come to collect more data and to use a wide variety of large-scale information. Many companies therefore recognize the importance of utilizing big data, because it allows them to improve their competitiveness and create new value. In particular, the use of personal big data in recommender systems is an emerging issue, because personal big data facilitate more accurate identification of users' preferences and behaviors. The proposed recommendation methodology is as follows. First, multimodal user profiles are created from personal big data in order to grasp the preferences and behavior of users from various viewpoints. We derive five user profiles based on personal information: rating, site preference, demographic, Internet usage, and topic in text.
Next, the similarity between users is calculated based on the profiles, and the neighbors of each user are found from the results. One of three ensemble approaches is applied to calculate the similarity: the similarity of the combined profile, the average similarity of each profile, or the weighted average similarity of each profile. Finally, the products that the neighbors prefer most are recommended to the target users. For the experiments, we used demographic data and a very large volume of Web log transactions for 5,000 panel users of a company that specializes in analyzing Web site rankings. R and SAS E-miner were used to implement the proposed recommender system and to conduct the topic analysis using keyword search, respectively. To evaluate recommendation performance, we used 60% of the data for training and 40% for testing. 5-fold cross validation was also conducted to enhance the reliability of our experiments. The F1 metric, a widely used combination metric that gives equal weight to recall and precision, was employed for evaluation. The proposed methodology achieved a significant improvement over the single-profile CF algorithm. In particular, the ensemble approach using weighted average similarity shows the highest performance: the rate of improvement in F1 is 16.9 percent for the ensemble approach using weighted average similarity and 8.1 percent for the ensemble approach using the average similarity of each profile. From these results, we conclude that the multimodal profile ensemble approach is a viable solution to the problems encountered when customer ratings are scarce. This study is significant in suggesting what kinds of information can be used to create profiles in a big data environment and how they can be combined and utilized effectively.
However, our methodology should be studied further with a view to real-world application. We need to compare the differences in recommendation accuracy when the proposed method is applied to different recommendation algorithms and then identify which combination shows the best performance.
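The three ensemble approaches named in the abstract above (similarity of the combined profile, average similarity of each profile, and weighted average similarity of each profile) can be sketched as follows. The abstract does not state which similarity function or weights were used, so cosine similarity, the profile vectors, and the weights below are assumptions chosen only to make the example concrete.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (assumed base measure)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def combined_profile_similarity(p, q):
    """Concatenate all profiles into one vector, then compare once."""
    keys = sorted(p)
    u = [x for k in keys for x in p[k]]
    v = [x for k in keys for x in q[k]]
    return cosine(u, v)


def average_similarity(p, q):
    """Unweighted mean of the per-profile similarities."""
    return sum(cosine(p[k], q[k]) for k in p) / len(p)


def weighted_average_similarity(p, q, weights):
    """Mean of per-profile similarities, weighted per profile."""
    total = sum(weights[k] for k in p)
    return sum(weights[k] * cosine(p[k], q[k]) for k in p) / total


# Two hypothetical users with two of the five profile types.
user_a = {"rating": [1, 0], "topic": [1, 1]}
user_b = {"rating": [1, 0], "topic": [1, -1]}
weights = {"rating": 3, "topic": 1}  # hypothetical weights

print(combined_profile_similarity(user_a, user_b))
print(average_similarity(user_a, user_b))
print(weighted_average_similarity(user_a, user_b, weights))
```

The three functions can give quite different values for the same pair of users, which is why the choice of ensemble approach matters when the neighborhood is formed.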

A New Unsupervised Learning Network and Competitive Learning Algorithm Using Relative Similarity (상대유사도를 이용한 새로운 무감독학습 신경망 및 경쟁학습 알고리즘)

  • 류영재;임영철
    • Journal of the Korean Institute of Intelligent Systems
    • /
    • v.10 no.3
    • /
    • pp.203-210
    • /
    • 2000
  • In this paper, we propose a new unsupervised learning network and competitive learning algorithm for pattern classification. The proposed network is based on relative similarity, a similarity measure between the input data and a cluster group. Accordingly, the proposed network and algorithm are called the relative similarity network (RSN) and its learning algorithm. The structure of the RSN is designed according to the definition of similarity and the learning rule, and pseudocode for the algorithm is described. In general pattern classification, the RSN achieved performance identical to that of WTA and SOM despite the removal of the learning rate. For patterns whose cluster groups have unclear boundaries, or whose cluster groups differ in density and size, the RSN produced more effective classification than the other networks.


Optimization of the Similarity Measure for User-based Collaborative Filtering Systems (사용자 기반의 협력필터링 시스템을 위한 유사도 측정의 최적화)

  • Lee, Soojung
    • The Journal of Korean Association of Computer Education
    • /
    • v.19 no.1
    • /
    • pp.111-118
    • /
    • 2016
  • How similarity is measured in collaborative filtering-based recommender systems greatly affects system performance, because items are recommended based on the preferences of similar users. To overcome the biggest problem of traditional similarity measures, data sparsity, this study proposes a new similarity measure that optimally combines an existing similarity measure with a value reflecting the number of co-rated items. We conducted experiments under various conditions to evaluate the performance of the proposed measure. The proposed measure yielded much better prediction quality than previous ones, with a maximum improvement of about 7% over the traditional Pearson correlation and about 4% over the cosine similarity.
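The idea in the abstract above, combining an existing similarity with a value that reflects how many items two users have both rated, can be illustrated with the sketch below. The exact combination and its optimization are not given in the abstract, so the linear mix, the parameter `alpha`, the saturation point `cap`, and the sample ratings are hypothetical.

```python
import math


def pearson_on_corated(ra, rb):
    """Pearson correlation over co-rated items; returns (correlation, count)."""
    common = sorted(ra.keys() & rb.keys())
    n = len(common)
    if n < 2:
        return 0.0, n
    mean_a = sum(ra[i] for i in common) / n
    mean_b = sum(rb[i] for i in common) / n
    cov = sum((ra[i] - mean_a) * (rb[i] - mean_b) for i in common)
    var_a = sum((ra[i] - mean_a) ** 2 for i in common)
    var_b = sum((rb[i] - mean_b) ** 2 for i in common)
    if var_a == 0 or var_b == 0:
        return 0.0, n
    return cov / math.sqrt(var_a * var_b), n


def combined_similarity(ra, rb, alpha=0.5, cap=6):
    """Assumed mix: alpha * Pearson + (1 - alpha) * co-rated coverage.

    alpha and cap are hypothetical tuning parameters of this sketch;
    coverage saturates at 1 once `cap` items are co-rated.
    """
    corr, n = pearson_on_corated(ra, rb)
    coverage = min(n, cap) / cap
    return alpha * corr + (1 - alpha) * coverage


# Hypothetical ratings keyed by item id.
user_a = {"i1": 1, "i2": 2, "i3": 3}
user_b = {"i1": 2, "i2": 4, "i3": 6, "i4": 1}

print(pearson_on_corated(user_a, user_b))
print(combined_similarity(user_a, user_b))
```

Here the two users correlate perfectly on only three co-rated items, so the coverage term pulls the combined score below the raw correlation; this is one simple way to discount similarities that rest on sparse evidence.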