• Title/Summary/Keyword: Similarity Algorithm

A study on the Prediction Performance of the Correspondence Mean Algorithm in Collaborative Filtering Recommendation (협업 필터링 추천에서 대응평균 알고리즘의 예측 성능에 관한 연구)

  • Lee, Seok-Jun;Lee, Hee-Choon
    • Information Systems Review / v.9 no.1 / pp.85-103 / 2007
  • This study evaluates the performance of collaborative filtering recommender algorithms, with the aim of predicting customer preferences more accurately. Prediction accuracy is compared, using MAE, between a neighborhood-based collaborative filtering algorithm and the correspondence mean algorithm. The experiments use the MovieLens 1 Million dataset. Both algorithms use Pearson's correlation coefficient and vector similarity, two commonly used measures, as similarity weights, and the analysis shows that the correspondence mean algorithm predicts customer preferences more accurately. Both similarity weights are calculated from the preference ratings of the movies that two customers have co-rated, and the similarity weight is overestimated when the number of co-rated movies is small. The study therefore aims to increase prediction accuracy by expanding the number of co-rated movies.
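The overestimation effect described in this abstract can be illustrated with a minimal sketch. The ratings below are hypothetical, and the function names `pearson_corated` and `mae` are introduced here for illustration only; this is not the paper's correspondence mean algorithm, just a plain Pearson correlation restricted to co-rated items plus the MAE metric.

```python
from math import sqrt

def pearson_corated(ratings_a, ratings_b):
    """Pearson correlation over the items both users rated.

    Returns (similarity, number_of_co_rated_items). The similarity is
    reported as 0.0 when it is undefined (fewer than two co-rated items
    or zero variance on either side).
    """
    common = sorted(set(ratings_a) & set(ratings_b))
    n = len(common)
    if n < 2:
        return 0.0, n
    xs = [ratings_a[i] for i in common]
    ys = [ratings_b[i] for i in common]
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0, n
    return cov / sqrt(var_x * var_y), n

def mae(predicted, actual):
    """Mean absolute error between predicted and actual ratings."""
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)

# Hypothetical ratings: only two movies (m1, m2) are co-rated.
alice = {"m1": 5, "m2": 3, "m3": 4, "m4": 1}
bob = {"m1": 4, "m2": 2, "m5": 5}

sim, n_common = pearson_corated(alice, bob)
# With exactly two co-rated items, a defined Pearson value is always +1 or -1,
# regardless of how alike the users really are.
print(sim, n_common)

error = mae([3.5, 4.0, 2.0], [4, 4, 1])
print(error)
```

With only two co-rated movies the correlation is forced to an extreme value, which is one way to see why a small overlap inflates the similarity weight.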

An Efficient Algorithm for Similarity Search in Large Biosequence Database (대용량 유전체를 위한 효율적인 유사성 검색 알고리즘)

  • Jeong, In-Seon;Park, Kyoung-Wook;Lim, Hyeong-Seok
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference / v.9 no.2 / pp.1073-1076 / 2005
  • Since the size of biosequence databases grows exponentially every year, it becomes impractical to use the Smith-Waterman algorithm for exact sequence similarity search. For fast sequence similarity search, researchers have proposed heuristic methods that use the frequency of characters in subsequences. These methods have the defect that different sequences are treated as the same sequence: because they use only the frequency of characters, their accuracy is lower than that of the Smith-Waterman algorithm. In this paper, we propose an algorithm that processes queries efficiently by indexing the frequency of characters together with the positional information of characters in subsequences. The experiments show that our algorithm improves the accuracy of sequence similarity search by approximately 5-20% over heuristic algorithms that use only the frequency of characters.
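The defect of frequency-only methods mentioned in this abstract can be shown with a small sketch. The abstract does not specify how positional information is encoded, so `positional_frequency_vector` below is a hypothetical stand-in (it counts characters separately in each segment of the subsequence), not the paper's index; the DNA alphabet and the example strings are likewise assumptions.

```python
from collections import Counter

ALPHABET = "ACGT"  # assumed DNA alphabet

def frequency_vector(subseq):
    """Count of each alphabet character in the subsequence."""
    counts = Counter(subseq)
    return tuple(counts[c] for c in ALPHABET)

def positional_frequency_vector(subseq, parts=2):
    """Hypothetical positional variant: one frequency vector per segment.

    The subsequence is cut into `parts` contiguous segments and the
    per-segment frequency vectors are concatenated.
    """
    size = -(-len(subseq) // parts)  # ceiling division
    vector = []
    for start in range(0, size * parts, size):
        vector.extend(frequency_vector(subseq[start:start + size]))
    return tuple(vector)

s1, s2 = "ACGT", "TGCA"

# Frequency alone cannot tell the two sequences apart.
same_plain = frequency_vector(s1) == frequency_vector(s2)

# Adding coarse positional information separates them.
same_positional = positional_frequency_vector(s1) == positional_frequency_vector(s2)

print(same_plain, same_positional)
```

The more segments are used, the closer the vector comes to the sequence itself, so the number of segments trades index size against discriminating power.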

GORank: Semantic Similarity Search for Gene Products using Gene Ontology (GORank: Gene Ontology를 이용한 유전자 산물의 의미적 유사성 검색)

  • Kim, Ki-Sung;Yoo, Sang-Won;Kim, Hyoung-Joo
    • Journal of KIISE:Databases / v.33 no.7 / pp.682-692 / 2006
  • Searching for gene products that have similar biological functions is crucial for bioinformatics. Modern biological databases provide functional descriptions of gene products using the Gene Ontology (GO). In this paper, we propose a technique for semantic similarity search for gene products using the GO annotation information. For this purpose, we define an information-theoretic measure of semantic similarity between gene products and propose an algorithm for semantic similarity search using this measure. We adapt Fagin's Threshold Algorithm to process the semantic similarity query as follows. First, we redefine the threshold for our measure, because our similarity function is not monotonic. Then we propose cluster-skipping and an access ordering for the inverted index lists to reduce the number of disk accesses. Experiments with real GO and annotation data show that GORank is efficient and scalable.
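For readers unfamiliar with the algorithm that GORank adapts, the following is a minimal sketch of the classic Threshold Algorithm with a monotone aggregation (a plain sum of non-negative scores, with missing scores treated as 0). It does not implement the paper's redefined threshold, cluster-skipping, or access ordering; the function name `threshold_top_k` and the score lists are hypothetical.

```python
import heapq

def threshold_top_k(score_lists, k):
    """Classic Threshold Algorithm for a sum of non-negative scores.

    score_lists: list of dicts mapping object -> score (missing means 0).
    Returns the k objects with the highest total score as (object, total).
    """
    # Sorted access: each list ordered by descending score.
    sorted_lists = [
        sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
        for scores in score_lists
    ]
    totals = {}
    max_depth = max(len(lst) for lst in sorted_lists)
    for depth in range(max_depth):
        threshold = 0.0
        for lst in sorted_lists:
            if depth < len(lst):
                obj, score = lst[depth]
                threshold += score
                if obj not in totals:
                    # Random access: fetch the object's score in every list.
                    totals[obj] = sum(s.get(obj, 0.0) for s in score_lists)
            # An exhausted list contributes 0: unseen objects score 0 there.
        top = heapq.nlargest(k, totals.items(), key=lambda kv: kv[1])
        # Stop once the k-th best total is at least the best possible
        # total of any object not yet seen.
        if len(top) == k and top[-1][1] >= threshold:
            return top
    return heapq.nlargest(k, totals.items(), key=lambda kv: kv[1])

# Hypothetical per-term similarity scores for three gene products.
list1 = {"g1": 0.9, "g2": 0.8, "g3": 0.1}
list2 = {"g2": 0.7, "g3": 0.6, "g1": 0.2}

result = threshold_top_k([list1, list2], k=1)
print(result)
```

The early stop is only valid because the sum is monotone; this is the property the abstract says does not hold for its measure, which is why the threshold has to be redefined there.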

The Similarity Measurement of Interior Design Images - Comparison between Measurement based on Perceptual Judgment and Measurement through Computing the Algorithm - (실내디자인 이미지의 유사성 측정 - 관찰자 직관 기반 측정법과 알고리즘 기반 정량적 측정법의 결과 비교를 중심으로 -)

  • Ryu, Hojeong;Ha, Mikyoung
    • Korean Institute of Interior Design Journal / v.24 no.2 / pp.32-41 / 2015
  • We live in an era of unlimited design competition. As the importance of design increases in all areas, including marketing, each country is doing its best to develop design. However, interior design rights are not yet adequately protected by intellectual property laws (IPLs), even though interior design occupies an important place in the design field. It is not easy to judge the similarity between two images that share a single common factor, because the factors that make up an interior design interact in complicated ways. From the IPL point of view, designs with a similar overall appearance are judged to be similar. Objective evaluation criteria are required, not only for designers but also for design examiners and judges, in order to protect interior design through IPLs. The objective of this study is to analyze whether a computer algorithm can be useful for deciding the similarity of interior design images. The study finds that Img2, a content-based image retrieval program, can be used to measure the degree of similarity. The simulation results of three descriptors (CEDD, FCTH, JCD) in Img2 showed patterns highly similar to the results of perceptual judgment by observers. In particular, Img2 was verified to be highly applicable to interior design images with a high score of similarity below 60, as perceptually judged by observers.

Improved Similarity Detection Algorithm of the Video Scene (개선된 비디오 장면 유사도 검출 알고리즘)

  • Yu, Ju-Won;Kim, Jong-Weon;Choi, Jong-Uk;Bae, Kyoung-Yul
    • The Journal of the Korea Contents Association / v.9 no.2 / pp.43-50 / 2009
  • In this paper, we propose a similarity detection method for video frame data that extracts feature data from the video frames and creates a 1-D signal. We find the boundaries of similar frames and generate representative frames within each boundary in order to measure the similarity between videos. The representative frames are blurred, and feature data are extracted using DOG values. Finally, we convert the feature data into a 1-D signal and compare content similarity. The experimental results show that the proposed algorithm obtains a similarity value above 0.9 under noise addition, rotation, resizing, frame deletion, and frame cutting.

Patent Document Similarity Based on Image Analysis Using the SIFT-Algorithm and OCR-Text

  • Park, Jeong Beom;Mandl, Thomas;Kim, Do Wan
    • International Journal of Contents / v.13 no.4 / pp.70-79 / 2017
  • Images are an important element in patents, and many experts use images to analyze a patent or to check differences between patents. However, there is little research on image analysis for patents, partly because image processing is an advanced technology and partly because patent images typically consist of visual parts as well as text and numbers. This study suggests two methods that use image processing: the Scale Invariant Feature Transform (SIFT) algorithm and Optical Character Recognition (OCR). The first method, which works with SIFT, uses image feature points. Through feature matching, it can be applied to calculate the similarity between documents containing these images. In the second method, OCR is used to extract text from the images. By using numbers extracted from an image, it is possible to extract the corresponding related text within the text passages. Subsequently, document similarity can be calculated based on the extracted text. Comparing the suggested methods with an existing text-only method for calculating similarity demonstrates their feasibility. Additionally, the correlation between the two similarity measures is low, which shows that they capture different aspects of the patent content.

User Similarity-based Path Prediction Method (사용자 유사도 기반 경로 예측 기법)

  • Nam, Sumin;Lee, Sukhoon
    • The Journal of Korean Institute of Information Technology / v.17 no.12 / pp.29-38 / 2019
  • A path prediction method using lifelogs requires a large amount of training data for accurate path prediction, and the path prediction performance is degraded when the training data is insufficient. The lack of training data can be addressed by using data from other users with similar movement patterns. Therefore, this paper proposes a path prediction algorithm based on user similarity. The proposed algorithm learns the path in a triple grid pattern and measures the similarity between users using the cosine similarity technique. Then, it predicts the path by applying the measured similarity to the learned model. For the evaluation, we measure the path prediction accuracy of the proposed method and compare it with existing algorithms. The proposed method achieves 66.6% accuracy, which is 1.8% higher than the other methods.
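The user-similarity step in this abstract can be sketched as a cosine similarity between sparse visit-count vectors. How the paper encodes its triple grid pattern is not stated here, so the representation below (a dict mapping a grid cell `(row, col)` to a visit count) and the sample users are assumptions made for illustration.

```python
from math import sqrt

def cosine_similarity(visits_u, visits_v):
    """Cosine similarity between two sparse visit-count vectors.

    Each argument maps a grid cell (row, col) to a visit count.
    Returns 0.0 if either user has no recorded visits.
    """
    shared = visits_u.keys() & visits_v.keys()
    dot = sum(visits_u[cell] * visits_v[cell] for cell in shared)
    norm_u = sqrt(sum(c * c for c in visits_u.values()))
    norm_v = sqrt(sum(c * c for c in visits_v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

# Hypothetical visit counts per grid cell.
user_a = {(0, 0): 1, (0, 1): 1}
user_b = {(0, 0): 1}
user_c = {(5, 5): 2}

sim_ab = cosine_similarity(user_a, user_b)  # partial overlap
sim_ac = cosine_similarity(user_a, user_c)  # no shared cells
print(sim_ab, sim_ac)
```

A similarity computed this way could then weight how much another user's data contributes to the prediction, which is the role the abstract assigns to it.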

Tabu Search Algorithm for Frequency Reassignment Problem in Mobile Communication Networks (주파수 재할당 문제 해결을 위한 타부 서치 알고리듬 개발)

  • Han, Junghee
    • Journal of Korean Institute of Industrial Engineers / v.31 no.1 / pp.1-9 / 2005
  • This paper proposes a heuristic algorithm for generalized GT problems that considers restrictions on the number of machine cells and on the maximum and minimum number of machines in a machine cell. The approach is split into two phases. In the first phase, we calculate the proposed similarity coefficient for each pair of machines and sort these values in descending order. If the machine pair with the largest similarity coefficient strictly satisfies the birds of a different feather (BODF) constraint in a machine cell, we assign the machine to the machine cell. In the second phase, we assign parts to the machine cell with the smallest number of exceptional elements. The result is a machine-part grouping. The proposed algorithm is compared with the modified p-median model for machine-part grouping.
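The first phase described in this abstract (score every machine pair, then sort in descending order) can be sketched as follows. The abstract does not give the formula of its similarity coefficient, so the Jaccard coefficient over the sets of parts each machine processes is used here purely as an assumed stand-in; the machine and part names are hypothetical, and the cell-size and BODF constraints are not modeled.

```python
from itertools import combinations

def jaccard(parts_a, parts_b):
    """Assumed stand-in coefficient: shared parts / all parts of both machines."""
    union = parts_a | parts_b
    return len(parts_a & parts_b) / len(union) if union else 0.0

def ranked_machine_pairs(machine_parts):
    """Score every machine pair and sort by descending similarity."""
    pairs = [
        (jaccard(machine_parts[a], machine_parts[b]), a, b)
        for a, b in combinations(sorted(machine_parts), 2)
    ]
    # sorted() is stable, so ties keep their alphabetical pair order.
    return sorted(pairs, key=lambda item: -item[0])

# Hypothetical machine-part incidence data.
machine_parts = {
    "M1": {"p1", "p2", "p3"},
    "M2": {"p1", "p2"},
    "M3": {"p4"},
}

ranking = ranked_machine_pairs(machine_parts)
for score, a, b in ranking:
    print(a, b, round(score, 3))
```

A grouping heuristic would walk this ranking from the top and accept a pair only when the cell constraints allow it.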

Feature-based Similarity Assessment for Re-using CAD Models (CAD 모델 재사용을 위한 특징형상기반 유사도 측정에 관한 연구)

  • Park, Byoung-Keon;Kim, Jay-Jung
    • Korean Journal of Computational Design and Engineering / v.16 no.1 / pp.21-30 / 2011
  • Similarity assessment of CAD models is an important issue for model reuse. In practice, many new mechanical parts are designed by modifying existing ones. Reusing parts saves designers time and effort. Design time would be further reduced if there were an efficient way to search for existing similar designs. This paper proposes an efficient similarity assessment algorithm for mechanical part models whose design history is embedded within the CAD model. Since the design history and detailed feature information can be retrieved through the CAD API, we can obtain an accurate and reliable assessment result. Our assessment algorithm is divided into two steps: (1) we select suitable parts by comparing the MSG (Model Signature Graph) extracted from a base feature of the required model; (2) the similarities of detailed features are assessed using their own attributes and reference structures. In addition, we propose an indexing method for managing a model database in the last part of this article.

Design of Spatial Similarity Measure for Moving Object Trajectories in Spatial Network (공간 네트워크에서 이동객체 궤적을 위한 공간 유사도 측정방법의 설계)

  • Bistao, Rabindra;Chang, Jae-Woo
    • Proceedings of the Korean Information Science Society Conference / 2006.10c / pp.83-87 / 2006
  • Similarity search in moving object trajectories is an active area of research. In this paper, we introduce a new measure that computes the spatial distance (similarity) between two trajectories of moving objects on road networks. In addition, we propose an algorithm that generates a sequence of matching edge pairs for the two trajectories that are to be compared and computes the spatial distance between them, which is non-Euclidean in nature. Using an example, we explain how our algorithm works to show the spatial similarity between trajectories of moving objects in a spatial network.
