• Title/Abstract/Keywords: Sequence similarity

Search results: 1,020 items (processing time: 0.024 s)

A Novel Similarity Measure for Sequence Data

  • Pandi, Mohammad. H.;Kashefi, Omid;Minaei, Behrouz
    • Journal of Information Processing Systems
    • /
    • Vol. 7, No. 3
    • /
    • pp.413-424
    • /
    • 2011
  • A variety of metrics have been introduced to measure the similarity of two given sequences. These metrics are widely used in applications ranging from spell correctors and categorizers to new sequence mining applications. Different metrics consider different aspects of sequences, but the essence of any sequence derives from the ordering of its elements. In this paper, we propose a novel sequence similarity measure based on all ordered pairs of one sequence, for which a Hasse diagram is built in the other sequence. In contrast with existing approaches, the proposed metric extracts all ordering features to capture sequence properties. We designed a clustering problem to evaluate the metric. Experimental results showed that the proposed metric achieves higher clustering purity than metrics such as d2, Smith-Waterman, Levenshtein, and Needleman-Wunsch. The limitation of those methods originates from sequence features they neglect, which the proposed metric takes into account.
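
The idea of comparing sequences through their ordered pairs can be illustrated with a minimal sketch. This is not the paper's Hasse-diagram measure: it is a simplified, hypothetical stand-in that takes the Jaccard overlap of the two ordered-pair sets, and the function names `ordered_pairs` and `ordered_pair_similarity` are assumptions made for this example.

```python
from itertools import combinations


def ordered_pairs(seq):
    """Return the set of (a, b) pairs such that a occurs before b in seq."""
    # combinations() preserves input order, so each pair is (earlier, later).
    return set(combinations(seq, 2))


def ordered_pair_similarity(s, t):
    """Jaccard overlap of the ordered-pair sets of s and t (illustrative only)."""
    p, q = ordered_pairs(s), ordered_pairs(t)
    if not p and not q:
        # Two sequences with no pairs at all are treated as identical here.
        return 1.0
    return len(p & q) / len(p | q)
```

With this sketch, a sequence and its reverse share no ordered pairs when all elements are distinct, while swapping two adjacent elements removes only the pair that was swapped.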

Mining Clusters of Sequence Data using Sequence Element-based Similarity Measure

  • 오승준;김재련
    • Korea Intelligent Information Systems Society: Conference Proceedings
    • /
    • Korea Intelligent Information Systems Society 2004 Fall Conference
    • /
    • pp.221-229
    • /
    • 2004
  • Recently, there has been enormous growth in the amount of commercial and scientific data, such as protein sequences, retail transactions, and web-logs. Such datasets consist of sequence data that have an inherent sequential nature. However, only a few of the existing clustering algorithms consider sequentiality. This study presents a method for clustering such sequence datasets. The similarity between sequences must be decided before clustering the sequences. This study proposes a new similarity measure to compute the similarity between two sequences using a sequence element. Two clustering algorithms using the proposed similarity measure are proposed: a hierarchical clustering algorithm and a scalable clustering algorithm that uses sampling and a k-nearest neighbor method. Using a splice dataset and synthetic datasets, we show that the quality of clusters generated by our proposed clustering algorithms is better than that of clusters produced by traditional clustering algorithms.
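
The hierarchical variant described above can be sketched as average-linkage agglomerative clustering driven by any pairwise similarity function. The abstract does not define the sequence element-based measure, so `element_similarity` below is a hypothetical placeholder (Jaccard overlap of element sets), and `agglomerative` is an assumed name for a generic textbook procedure rather than the authors' algorithm.

```python
def element_similarity(s, t):
    """Placeholder similarity: Jaccard overlap of the two element sets."""
    a, b = set(s), set(t)
    return len(a & b) / len(a | b) if a | b else 1.0


def agglomerative(seqs, sim, k):
    """Merge the two most similar clusters (average linkage) until k remain."""
    clusters = [[i] for i in range(len(seqs))]

    def link(c1, c2):
        # Average pairwise similarity between the members of two clusters.
        total = sum(sim(seqs[i], seqs[j]) for i in c1 for j in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > k:
        pairs = (
            (x, y)
            for x in range(len(clusters))
            for y in range(x + 1, len(clusters))
        )
        a, b = max(pairs, key=lambda p: link(clusters[p[0]], clusters[p[1]]))
        clusters[a] += clusters[b]  # a < b, so deleting b keeps index a valid
        del clusters[b]
    return clusters
```

Each cluster is returned as a list of indices into `seqs`. The loop recomputes every linkage at each step, which is fine for a small illustration but would not scale to the dataset sizes the abstract has in mind.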

Development of the Recommender System of Arabic Books Based on the Content Similarity

  • Alotaibi, Shaykhah Hajed;Khan, Muhammad Badruddin
    • International Journal of Computer Science & Network Security
    • /
    • Vol. 22, No. 8
    • /
    • pp.175-186
    • /
    • 2022
  • This article develops a content-based recommendation system for Arabic books that helps users search for the right book and predicts books suited to their literary style. The system directs users, from a large dataset, toward books that meet their needs. It makes its predictions from data gathered from different books and converted to vectors using TF-IDF. Recommendation algorithms such as cosine similarity, sequence matcher similarity, and semantic similarity then aggregate the data to produce recommendations. This approach is advantageous for recommending previously unrated books to users with unique interests. The best results were obtained with cosine similarity on the full content of the books, sequence matcher similarity on the Arabic titles, and semantic similarity on the English titles; these results were extremely close to the average human-assigned (annotated) similarity. A Flask web application with a simple interface was developed to show the Arabic books recommended by the cosine similarity, sequence matcher similarity, and semantic similarity algorithms across all the experiments conducted.
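
Two of the similarity computations named above can be sketched with the Python standard library. This is a sketch under assumptions: the abstract does not say which implementation the authors used, so reading "sequence matcher similarity" as `difflib.SequenceMatcher` is a guess, and the TF-IDF weighting below (raw term count times a smoothed inverse document frequency) is one common variant. The helper names `tfidf_vectors`, `cosine`, and `title_similarity` are hypothetical.

```python
import math
from collections import Counter
from difflib import SequenceMatcher


def tfidf_vectors(docs):
    """Build sparse TF-IDF dicts for a list of tokenized documents."""
    n = len(docs)
    # Document frequency: number of documents containing each term.
    df = Counter(term for doc in docs for term in set(doc))
    idf = {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}
    return [{t: tf * idf[t] for t, tf in Counter(doc).items()} for doc in docs]


def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def title_similarity(a, b):
    """Character-level match ratio in [0, 1] between two title strings."""
    return SequenceMatcher(None, a, b).ratio()
```

A recommender built on these pieces would rank candidate books by `cosine` over full-text vectors or by `title_similarity` over titles; semantic similarity needs a language model or lexical resource and is left out of the sketch.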

Differentially Expressed Genes of Potentially Allelopathic Rice in Response against Barnyardgrass

  • Junaedi, Ahmad;Jung, Woo-Suk;Chung, Ill-Min;Kim, Kwang-Ho
    • Journal of Crop Science and Biotechnology
    • /
    • Vol. 10, No. 4
    • /
    • pp.231-236
    • /
    • 2007
  • Differentially expressed genes (DEGs) were identified in a rice variety, Sathi, an indica type showing high allelopathic potential against barnyardgrass (Echinochloa crus-galli (L.) Beauv. var. frumentaceae). Rice plants were grown with and without barnyardgrass, and total RNA was extracted from rice leaves 45 days after seeding. Full DEG screening was performed by the GeneFishing™ method. The differentially expressed bands were re-amplified and sequenced, then analyzed with the Basic Local Alignment Search Tool (BLAST) to identify homologous sequences. Gel electrophoresis showed nine possible genes associated with allelopathic potential in Sathi: six genes (DEG-1, 4, 5, 7, 8, and 9) showed higher expression and three genes (DEG-2, 3, and 6) showed lower expression compared to the control. cDNA sequence analysis showed that DEG-7 and DEG-9 had the same sequence. From the RT-PCR results, DEG-6 and DEG-7 were considered true DEGs, whereas DEG-1, 2, 3, 4, 5, and 8 were considered putative DEGs. Results from BLASTN and BLASTX searches suggested that DEG-1 is homologous to a gene for S-adenosylmethionine synthetase, DEG-2 is homologous to a chloroplast gene for the ribulose 1,5-bisphosphate carboxylase large subunit, DEG-8 is homologous to oxysterol-binding protein with 85.7% sequence similarity, DEG-5 is homologous to histone 2B protein with 47.9% sequence similarity, DEG-6 is homologous to nicotianamine aminotransferase with 33.1% sequence similarity, DEG-3 has 98.8% similarity with a nucleotide sequence that has 33.1% similarity with the oxygen-evolving complex protein in photosystem II, DEG-7 is homologous to a nucleotide sequence that may be related to a putative serine/threonine protein kinase and a putative transposable element, and DEG-4 has 98.8% similarity with a nucleotide sequence for an unknown protein.

Gated Recurrent Unit Architecture for Context-Aware Recommendations with improved Similarity Measures

  • Kala, K.U.;Nandhini, M.
    • KSII Transactions on Internet and Information Systems (TIIS)
    • /
    • Vol. 14, No. 2
    • /
    • pp.538-561
    • /
    • 2020
  • Recommender Systems (RecSys) play a major role in e-commerce by recommending products that each user may like, thereby improving business outcomes. Although many types of RecSys exist in the research field, state-of-the-art RecSys focus on finding user similarity through sequence analysis (e.g., of purchase history or movie-watching history) and prediction techniques such as Recurrent Neural Networks in deep learning. That is, recommendation has been treated as a sequence prediction problem. However, evaluating similarities among customers is challenging when the temporal aspects, context, and multi-component ratings of the item records in customer sequences are considered. To address this issue, we propose a deep learning based model that learns customer similarity directly from sequence-to-sequence similarity as well as item-to-item similarity, considering all features of the item, contexts, and rating components, using the Dynamic Temporal Warping (DTW) distance measure for dynamic temporal matching and a 2D-GRU (Two Dimensional-Gated Recurrent Unit) architecture. This overcomes the limitation of non-linearity in the time dimension when measuring similarity, and finds patterns in temporal and spatial contexts more accurately and quickly. Experiments on the real-world movie dataset LDOS-CoMoDa demonstrate the efficacy and promising utility of the proposed personalized RecSys architecture.
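
The DTW distance mentioned above (more commonly called dynamic time warping) can be sketched with the classic dynamic-programming recurrence. This is a minimal textbook version for one-dimensional numeric sequences with absolute difference as the local cost. It is not the paper's model, which combines DTW with item features, contexts, and a 2D-GRU, and the function name `dtw_distance` is an assumption for this example.

```python
def dtw_distance(a, b):
    """Dynamic time warping distance between two numeric sequences."""
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] = minimal accumulated cost of aligning a[:i] with b[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of: step in a only, step in b only, or both.
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]
```

Because one element may align with several elements of the other sequence, two histories that differ only by a repeated value get a distance of zero, which is the tolerance to non-linear timing that the abstract refers to.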

A New Approach to Find Orthologous Proteins Using Sequence and Protein-Protein Interaction Similarity

  • Kim, Min-Kyung;Seol, Young-Joo;Park, Hyun-Seok;Jang, Seung-Hwan;Shin, Hang-Cheol;Cho, Kwang-Hwi
    • Genomics & Informatics
    • /
    • Vol. 7, No. 3
    • /
    • pp.141-147
    • /
    • 2009
  • Existing proteome-scale ortholog and paralog prediction methods are mainly based on sequence similarity. However, it is known that even the closest BLAST hit is often not the closest neighbor. For this reason, we added conserved interaction information to find orthologs. We propose a genome-scale, automated ortholog prediction method named OrthoInterBlast, which is based on both sequence and interaction similarity. When we applied this method to fly and yeast, 17% of the ortholog candidates differed from the results of Inparanoid. By adding protein-protein interaction information, proteins with low sequence similarity, which cannot be easily detected by sequence homology alone, can still be selected as orthologs.

Dispatching Rule based on Chromaticity and Color Sequence Priorities for the Gravure Printing Operation

  • 배재호
    • Journal of the Society of Korea Industrial and Systems Engineering
    • /
    • Vol. 43, No. 3
    • /
    • pp.10-20
    • /
    • 2020
  • This paper presents a method to measure the similarity of assigned jobs in the gravure printing operation based on chromaticity and color sequence, and to order the jobs accordingly. The proposed dispatching rule can be used to fulfill diverse manufacturing site requirements because its parameters can be adjusted to prioritize chromaticity or color sequence. In general, dispatching rules either ignore the job-changing time or require that the time be clearly defined. However, in the gravure printing operation targeted in this study, it is difficult to apply a general dispatching rule because the job-changing time is hard to quantify. Therefore, we propose a method for generalizing the assignment rules of the job planner, allocating relative similarity among assigned jobs, and determining the sequence of jobs accordingly. Chromaticity priority is determined by the arrangement of the color assignments in the printing operation; color sequence priority is determined by the addition, deletion, or change of a specific color sequence. Finally, the job similarity is determined by the dot product of the chromaticity and color sequence priorities. Implementation of the proposed dispatching rule at an actual manufacturing site showed that the planner produced the same job order as that obtained using the proposed rule. Therefore, this rule is expected to be useful at industrial sites where clear quantification of the job-changing time is not possible.

Comparison of Similarity to Digital Watermarking using Various Sequences

  • 송상주;박두순;김선형
    • Journal of the Korea Society of Computer and Information
    • /
    • Vol. 6, No. 4
    • /
    • pp.21-29
    • /
    • 2001
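  • In this paper, we apply a multiresolution transform to images using the wavelet transform algorithm, embed various watermark sequences in the significant coefficients of the mid-frequency band, and measure their robustness through similarity comparison. The wavelet transform has the advantage of combining frequency-domain and spatial-domain characteristics. As watermarks, we use random numbers, Gaussian sequences, chaos sequences, and Sobol sequences. Experiments against various attacks show that the chaos sequence achieves higher similarity than the other sequences, indicating that it is suitable for use as a watermark sequence in the future.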

A Hierarchical Clustering Algorithm Using Extended Sequence Element-based Similarity Measure

  • 오승준
    • Journal of the Korea Society of Computer and Information
    • /
    • Vol. 11, No. 5
    • /
    • pp.321-327
    • /
    • 2006
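  • Recently, commercial and scientific data have grown explosively. Such data are sequence data in which the items have an inherent order. However, few clustering studies consider the order among items. In this paper, we study a method for computing the similarity between such sequence data and a method for clustering them. In particular, we propose an extended similarity computation method that takes various conditions into account. Using the splice dataset, we show that the proposed clustering method outperforms existing methods.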

Modification of Existing Similarity Coefficients by Considering an Operation Sequence Ratio in Designing Cellular Manufacturing Systems

  • Yin, Yong;Yasuda, Kazuhiko
    • Industrial Engineering and Management Systems
    • /
    • Vol. 1, No. 1
    • /
    • pp.19-28
    • /
    • 2002
  • The operation sequence of parts is one of the most important production factors in the design of cellular manufacturing systems. Many approaches based on the similarity coefficient method (SCM) have been proposed in the literature to solve cell formation problems. However, most of them do not consider the operation sequence factor. This study presents an operation sequence ratio (OSR) and modifies some existing similarity coefficients using the OSR to solve cell formation problems that consider operation sequences. The computational results show that the OSR is useful and robust in solving cell formation problems with operation sequences.