• Title/Summary/Keyword: Similarity Search

Search Result 526, Processing Time 0.024 seconds

Similarity Search in Time Series Databases based on the Normalized Distance (정규 거리에 기반한 시계열 데이터베이스의 유사 검색 기법)

  • 이상준;이석호
    • Journal of KIISE:Databases
    • /
    • v.31 no.1
    • /
    • pp.23-29
    • /
    • 2004
  • In this paper, we propose a search method for time sequences which supports the normalized distance as a similarity measure. In many applications where the shape of the time sequence is a major consideration, the normalized distance is a more suitable similarity measure than the simple Lp distance. To support normalized distance queries, most of the previous work has the preprocessing step for vertical shifting which normalizes each sequence by its mean. The proposed method is motivated by the property of sequence for feature extraction. That is, the variation between two adjacent elements of a time sequence is invariant under vertical shifting. The extracted feature is indexed by the spatial access method such as R-tree. The proposed method can match time series of similar shape without vertical shifting and guarantees no false dismissals. The experiments are performed on real data(stock price movement) to verify the performance of the proposed method.

A Study on Web Services for Sequence Similarity search in the Workflow Environment (워크플로우 환경에서의 대규모 서열 유사성 검색 웹 서비스에 관한 연구)

  • Jun, Jin-Young
    • Journal of the Korea Society of Computer and Information
    • /
    • v.13 no.6
    • /
    • pp.41-49
    • /
    • 2008
  • In recent years, a life phenomenon using a workflow management tool in bioinformatics has been actively researched. Workflow management tool is the base which enables researchers to collaborate through the re-use and sharing of service, and a variety of workflow management tools including MyGrid project's Taverna, Kepler and BioWMS have been developed and used as the open source. This workflow management tool can model and automate different services in spatially-distant area in one working space based on the web service technology. Many tools and databases used in the bioinformatics are provided in the web services form and are used in the workflow management tool. In such the situation, the web services development and stable service offering for a sequence similarity search which is basically used in the bioinformatics can be essential in the bioinformatics field. In this paper, the similarity retrieval speed of biology sequence data was improved based on a Linux cluster, and the sequence similarity retrieval could be done for a short time by linking with the workflow management tool through developing it in the web services.

  • PDF

Semantic Conceptual Relational Similarity Based Web Document Clustering for Efficient Information Retrieval Using Semantic Ontology

  • Selvalakshmi, B;Subramaniam, M;Sathiyasekar, K
    • KSII Transactions on Internet and Information Systems (TIIS)
    • /
    • v.15 no.9
    • /
    • pp.3102-3119
    • /
    • 2021
  • In the modern rapid growing web era, the scope of web publication is about accessing the web resources. Due to the increased size of web, the search engines face many challenges, in indexing the web pages as well as producing result to the user query. Methodologies discussed in literatures towards clustering web documents suffer in producing higher clustering accuracy. Problem is mitigated using, the proposed scheme, Semantic Conceptual Relational Similarity (SCRS) based clustering algorithm which, considers the relationship of any document in two ways, to measure the similarity. One is with the number of semantic relations of any document class covered by the input document and the second is the number of conceptual relation the input document covers towards any document class. With a given data set Ds, the method estimates the SCRS measure for each document Di towards available class of documents. As a result, a class with maximum SCRS is identified and the document is indexed on the selected class. The SCRS measure is measured according to the semantic relevancy of input document towards each document of any class. Similarly, the input query has been measured for Query Relational Semantic Score (QRSS) towards each class of documents. Based on the value of QRSS measure, the document class is identified, retrieved and ranked based on the QRSS measure to produce final population. In both the way, the semantic measures are estimated based on the concepts available in semantic ontology. The proposed method had risen efficient result in indexing as well as search efficiency also has been improved.

KUGI: A Database and Search System for Korean Unigene and Pathway Information

  • Yang, Jin-Ok;Hahn, Yoon-Soo;Kim, Nam-Soon;Yu, Ung-Sik;Woo, Hyun-Goo;Chu, In-Sun;Kim, Yong-Sung;Yoo, Hyang-Sook;Kim, Sang-Soo
    • Proceedings of the Korean Society for Bioinformatics Conference
    • /
    • 2005.09a
    • /
    • pp.407-411
    • /
    • 2005
  • KUGI (Korean UniGene Information) database contains the annotation information of the cDNA sequences obtained from the disease samples prevalent in Korean. A total of about 157,000 5'-EST high throughput sequences collected from cDNA libraries of stomach, liver, and some cancer tissues or established cell lines from Korean patients were clustered to about 35,000 contigs. From each cluster a representative clone having the longest high quality sequence or the start codon was selected. We stored the sequences of the representative clones and the clustered contigs in the KUGI database together with their information analyzed by running Blast against RefSeq, human mRNA, and UniGene databases from NCBI. We provide a web-based search engine fur the KUGI database using two types of user interfaces: attribute-based search and similarity search of the sequences. For attribute-based search, we use DBMS technology while we use BLAST that supports various similarity search options. The search system allows not only multiple queries, but also various query types. The results are as follows: 1) information of clones and libraries, 2) accession keys, location on genome, gene ontology, and pathways to public databases, 3) links to external programs, and 4) sequence information of contig and 5'-end of clones. We believe that the KUGI database and search system may provide very useful information that can be used in the study for elucidating the causes of the disease that are prevalent in Korean.

  • PDF

Differentially Expressed Genes of Potentially Allelopathic Rice in Response against Barnyardgrass

  • Junaedi, Ahmad;Jung, Woo-Suk;Chung, Ill-Min;Kim, Kwang-Ho
    • Journal of Crop Science and Biotechnology
    • /
    • v.10 no.4
    • /
    • pp.231-236
    • /
    • 2007
  • Differentially expressed genes(DEG) were identified in a rice variety, Sathi, an indica type showing high allelopathic potential against barnyardgrass(Echinochloa crus-galli(L.) Beauv. var. frumentaceae). Rice plants were grown with and without barnyardgrass and total RNA was extracted from rice leaves at 45 days after seeding. DEG full-screening was performed by $GeneFishing^{TM}$ method. The differentially expressed bands were re-amplified and sequenced, then analyzed by Basic Local Alignment Search Tool(BLAST) searching for homology sequence identification. Gel electrophoresis showed nine possible genes associated with allelopathic potential in Sathi, six genes(namely DEG-1, 4, 5, 7, 8, and 9) showed higher expression, and three genes(DEG-2, 3 and 6) showed lower expression as compared to the control. cDNA sequence analysis showed that DEG-7 and DEG-9 had the same sequence. From RT PCR results, DEG-6 and DEG-7 were considered as true DEG, whereas DEG-1, 2, 3, 4, 5, and 8 were considered as putative DEG. Results from blast-n and blast-x search suggested that DEG-1 is homologous to a gene for S-adenosylmethionine synthetase, DEG-2 is homologous to a chloroplast gene for ribulose 1,5-bisphosphate carboxylase large subunit, DEG-8 is homologous to oxysterol-binding protein with an 85.7% sequence similarity, DEG-5 is homologous to histone 2B protein with a 47.9% sequence similarity, DEG-6 is homologous to nicotineamine aminotransferase with a 33.1% sequence similarity, DEG-3 has 98.8% similarity with nucleotides sequence that has 33.1% similarity with oxygen evolving complex protein in photosystem II, DEG-7 is homologous to nucleotides sequence that may relate with putative serin/threonine protein kinase and putative transposable element, and DEG-4 has 98.8% similarity with nucleotides sequence for an unknown protein.

  • PDF

An Effect of Similarity Judgement on Human Performance in Inspection Tasks (유사성(類似性) 판단(判斷)과 검사수행도(檢査遂行度)에 관한 연구)

  • Son, Il-Mun;Lee, Dong-Chun;Lee, Sang-Do
    • Journal of Korean Society for Quality Management
    • /
    • v.20 no.2
    • /
    • pp.109-117
    • /
    • 1992
  • An inspection task largely can be seen as a job divided up into a series of visual search and classification subtasks. In these subtasks, an Inspector must performs to compare the standard references proposed in visual environments and recalled in his memory with the visual stimuli to be inspected. It means that the judgement of similarity should be demanded on inspection tasks. Therefore, the inspector's ability for the judgement of similarity and the difference similarity between inspection materials are important factors to effect on performances in inspection tasks. In this paper, to analysis the effect of these factors on inspection time, an inspection task is designed and suggested by means of computer simulator. Especially, the skin conductance responses(SCR) of subjects are measured to evaluate the complexity of tasks due to the difference of similarity between materials. In the results of experiment, the more similar or different the difference of similarity between materials is, the shorter the inspection time is because of the reduction of task complexity. And, When the inspector's cognition for similarity between materials is consistanct, the inpsection time is improved. Concludingly, the consistency of reponses for similarity judgement becomes a measurement to present the performance levels. And the information of inspection time that due to the difference of similarity between materials must be considered in planning and scheduling inspection tasks.

  • PDF

A study on searching image by cluster indexing and sequential I/O (연속적 I/O와 클러스터 인덱싱 구조를 이용한 이미지 데이타 검색 연구)

  • Kim, Jin-Ok;Hwang, Dae-Joon
    • The KIPS Transactions:PartD
    • /
    • v.9D no.5
    • /
    • pp.779-788
    • /
    • 2002
  • There are many technically difficult issues in searching multimedia data such as image, video and audio because they are massive and more complex than simple text-based data. As a method of searching multimedia data, a similarity retrieval has been studied to retrieve automatically basic features of multimedia data and to make a search among data with retrieved features because exact match is not adaptable to a matrix of features of multimedia. In this paper, data clustering and its indexing are proposed as a speedy similarity-retrieval method of multimedia data. This approach clusters similar images on adjacent disk cylinders and then builds Indexes to access the clusters. To minimize the search cost, the hashing is adapted to index cluster. In addition, to reduce I/O time, the proposed searching takes just one I/O to look up the location of the cluster containing similar object and one sequential file I/O to read in this cluster. The proposed schema solves the problem of multi-dimension by using clustering and its indexing and has higher search efficiency than the content-based image retrieval that uses only clustering or indexing structure.

CS-Tree : Cell-based Signature Index Structure for Similarity Search in High-Dimensional Data (CS-트리 : 고차원 데이터의 유사성 검색을 위한 셀-기반 시그니쳐 색인 구조)

  • Song, Gwang-Taek;Jang, Jae-U
    • The KIPS Transactions:PartD
    • /
    • v.8D no.4
    • /
    • pp.305-312
    • /
    • 2001
  • Recently, high-dimensional index structures have been required for similarity search in such database applications s multimedia database and data warehousing. In this paper, we propose a new cell-based signature tree, called CS-tree, which supports efficient storage and retrieval on high-dimensional feature vectors. The proposed CS-tree partitions a high-dimensional feature space into a group of cells and represents a feature vector as its corresponding cell signature. By using cell signatures rather than real feature vectors, it is possible to reduce the height of our CS-tree, leading to efficient retrieval performance. In addition, we present a similarity search algorithm for efficiently pruning the search space based on cells. Finally, we compare the performance of our CS-tree with that of the X-tree being considered as an efficient high-dimensional index structure, in terms of insertion time, retrieval time for a k-nearest neighbor query, and storage overhead. It is shown from experimental results that our CS-tree is better on retrieval performance than the X-tree.

  • PDF

A Method of Highspeed Similarity Retrieval based on Self-Organizing Maps (자기 조직화 맵 기반 유사화상 검색의 고속화 수법)

  • Oh, Kun-Seok;Yang, Sung-Ki;Bae, Sang-Hyun;Kim, Pan-Koo
    • The KIPS Transactions:PartB
    • /
    • v.8B no.5
    • /
    • pp.515-522
    • /
    • 2001
  • Feature-based similarity retrieval become an important research issue in image database systems. The features of image data are useful to discrimination of images. In this paper, we propose the highspeed k-Nearest Neighbor search algorithm based on Self-Organizing Maps. Self-Organizing Map(SOM) provides a mapping from high dimensional feature vectors onto a two-dimensional space. A topological feature map preserves the mutual relations (similarity) in feature spaces of input data, and clusters mutually similar feature vectors in a neighboring nodes. Each node of the topological feature map holds a node vector and similar images that is closest to each node vector. We implemented about k-NN search for similar image classification as to (1) access to topological feature map, and (2) apply to pruning strategy of high speed search. We experiment on the performance of our algorithm using color feature vectors extracted from images. Promising results have been obtained in experiments.

  • PDF

Detecting Intentionally Biased Web Pages In terms of Hypertext Information (하이퍼텍스트 정보 관점에서 의도적으로 왜곡된 웹 페이지의 검출에 관한 연구)

  • Lee Woo Key
    • Journal of the Korea Society of Computer and Information
    • /
    • v.10 no.1 s.33
    • /
    • pp.59-66
    • /
    • 2005
  • The organization of the web is progressively more being used to improve search and analysis of information on the web as a large collection of heterogeneous documents. Most people begin at a Web search engine to find information. but the user's pertinent search results are often greatly diluted by irrelevant data or sometimes appear on target but still mislead the user in an unwanted direction. One of the intentional, sometimes vicious manipulations of Web databases is a intentionally biased web page like Google bombing that is based on the PageRank algorithm. one of many Web structuring techniques. In this thesis, we regard the World Wide Web as a directed labeled graph that Web pages represent nodes and link edges. In the Present work, we define the label of an edge as having a link context and a similarity measure between link context and target page. With this similarity, we can modify the transition matrix of the PageRank algorithm. By suggesting a motivating example, it is explained how our proposed algorithm can filter the Web intentionally biased web Pages effective about $60\%% rather than the conventional PageRank.

  • PDF