• Title/Summary/Keyword: Search Query

Search Result 690, Processing Time 0.022 seconds

Spherical Pyramid-Technique : An Efficient Indexing Technique for Similarity Search in High-Dimensional Data (구형 피라미드 기법 : 고차원 데이터의 유사성 검색을 위한 효율적인 색인 기법)

  • Lee, Dong-Ho;Jeong, Jin-Wan;Kim, Hyeong-Ju
    • Journal of KIISE:Software and Applications
    • /
    • v.26 no.11
    • /
    • pp.1270-1281
    • /
    • 1999
  • 피라미드 기법 1 은 d-차원의 공간을 2d개의 피라미드들로 분할하는 특별한 공간 분할 방식을 이용하여 고차원 데이타를 효율적으로 색인할 수 있는 새로운 색인 방법으로 제안되었다. 피라미드 기법은 고차원 사각형 형태의 영역 질의에는 효율적이나, 유사성 검색에 많이 사용되는 고차원 구형태의 영역 질의에는 비효율적인 면이 존재한다. 본 논문에서는 고차원 데이타를 많이 사용하는 유사성 검색에 효율적인 새로운 색인 기법으로 구형 피라미드 기법을 제안한다. 구형 피라미드 기법은 먼저 d-차원의 공간을 2d개의 구형 피라미드로 분할하고, 각 단일 구형 피라미드를 다시 구형태의 조각으로 분할하는 특별한 공간 분할 방법에 기반하고 있다. 이러한 공간 분할 방식은 피라미드 기법과 마찬가지로 d-차원 공간을 1-차원 공간으로 변환할 수 있다. 따라서, 변환된 1-차원 데이타를 다루기 위하여 B+-트리를 사용할 수 있다. 본 논문에서는 이렇게 분할된 공간에서 고차원 구형태의 영역 질의를 효율적으로 처리할 수 있는 알고리즘을 제안한다. 마지막으로, 인위적 데이타와 실제 데이타를 사용한 다양한 실험을 통하여 구형 피라미드 기법이 구형태의 영역 질의를 처리하는데 있어서 기존의 피라미드 기법보다 효율적임을 보인다.Abstract The Pyramid-Technique 1 was proposed as a new indexing method for high- dimensional data spaces using a special partitioning strategy that divides d-dimensional space into 2d pyramids. It is efficient for hypercube range query, but is not efficient for hypersphere range query which is frequently used in similarity search. In this paper, we propose the Spherical Pyramid-Technique, an efficient indexing method for similarity search in high-dimensional space. The Spherical Pyramid-Technique is based on a special partitioning strategy, which is to divide the d-dimensional data space first into 2d spherical pyramids, and then cut the single spherical pyramid into several spherical slices. This partition provides a transformation of d-dimensional space into 1-dimensional space as the Pyramid-Technique does. Thus, we are able to use a B+-tree to manage the transformed 1-dimensional data. We also propose the algorithm of processing hypersphere range query on the space partitioned by this partitioning strategy. Finally, we show that the Spherical Pyramid-Technique clearly outperforms the Pyramid-Technique in processing hypersphere range queries through various experiments using synthetic and real data.

KUGI: A Database and Search System for Korean Unigene and Pathway Information

  • Yang, Jin-Ok;Hahn, Yoon-Soo;Kim, Nam-Soon;Yu, Ung-Sik;Woo, Hyun-Goo;Chu, In-Sun;Kim, Yong-Sung;Yoo, Hyang-Sook;Kim, Sang-Soo
    • Proceedings of the Korean Society for Bioinformatics Conference
    • /
    • 2005.09a
    • /
    • pp.407-411
    • /
    • 2005
  • KUGI (Korean UniGene Information) database contains the annotation information of the cDNA sequences obtained from the disease samples prevalent in Korean. A total of about 157,000 5'-EST high throughput sequences collected from cDNA libraries of stomach, liver, and some cancer tissues or established cell lines from Korean patients were clustered to about 35,000 contigs. From each cluster a representative clone having the longest high quality sequence or the start codon was selected. We stored the sequences of the representative clones and the clustered contigs in the KUGI database together with their information analyzed by running Blast against RefSeq, human mRNA, and UniGene databases from NCBI. We provide a web-based search engine fur the KUGI database using two types of user interfaces: attribute-based search and similarity search of the sequences. For attribute-based search, we use DBMS technology while we use BLAST that supports various similarity search options. The search system allows not only multiple queries, but also various query types. The results are as follows: 1) information of clones and libraries, 2) accession keys, location on genome, gene ontology, and pathways to public databases, 3) links to external programs, and 4) sequence information of contig and 5'-end of clones. We believe that the KUGI database and search system may provide very useful information that can be used in the study for elucidating the causes of the disease that are prevalent in Korean.

  • PDF

A Video Information Management System for Supporting Caption- and Content-based Searches (주석 및 내용 기반 검색을 지원하는 동영상 정보 관리 시스템)

  • 전미경;김인홍;류시국;전용기;강현석
    • Journal of Korea Multimedia Society
    • /
    • v.2 no.3
    • /
    • pp.231-242
    • /
    • 1999
  • Generally, either caption-based search method or content-based search methods is used to retrieve video information. However, each search method has its limitations. Caption-based search is apt to lose consistency as for user's subjects, and content-based search is hard to extract general means. To enhance efficiency and correctness as for complementing each other, we propose the Integrated Video Data Model(IVDM) which integrates the two search methods, to device the model, we analyze video data and construct the structure of video information hierarchically. IVDM supports caption-based search as assigning meta-data by analyzing thematic-unit in the higher level, and also supports content-based search as extracting feature data by analyzing the content of video data in the lower level. We design Object-Oriented database schema of news video, based-on the IVDM. And we provide 4-type of queries and query processing algorithm to retrieve news video information.

  • PDF

PubMine: An Ontology-Based Text Mining System for Deducing Relationships among Biological Entities

  • Kim, Tae-Kyung;Oh, Jeong-Su;Ko, Gun-Hwan;Cho, Wan-Sup;Hou, Bo-Kyeng;Lee, Sang-Hyuk
    • Interdisciplinary Bio Central
    • /
    • v.3 no.2
    • /
    • pp.7.1-7.6
    • /
    • 2011
  • Background: Published manuscripts are the main source of biological knowledge. Since the manual examination is almost impossible due to the huge volume of literature data (approximately 19 million abstracts in PubMed), intelligent text mining systems are of great utility for knowledge discovery. However, most of current text mining tools have limited applicability because of i) providing abstract-based search rather than sentence-based search, ii) improper use or lack of ontology terms, iii) the design to be used for specific subjects, or iv) slow response time that hampers web services and real time applications. Results: We introduce an advanced text mining system called PubMine that supports intelligent knowledge discovery based on diverse bio-ontologies. PubMine improves query accuracy and flexibility with advanced search capabilities of fuzzy search, wildcard search, proximity search, range search, and the Boolean combinations. Furthermore, PubMine allows users to extract multi-dimensional relationships between genes, diseases, and chemical compounds by using OLAP (On-Line Analytical Processing) techniques. The HUGO gene symbols and the MeSH ontology for diseases, chemical compounds, and anatomy have been included in the current version of PubMine, which is freely available at http://pubmine.kobic.re.kr. Conclusions: PubMine is a unique bio-text mining system that provides flexible searches and analysis of biological entity relationships. We believe that PubMine would serve as a key bioinformatics utility due to its rapid response to enable web services for community and to the flexibility to accommodate general ontology.

An Implementation of XML document searching system based on Structure and Semantics Similarity (구조와 내용 유사도에 기반한 XML 웹 문서 검색시스템 구축)

  • Park Uchang;Seo Yeojin
    • Journal of Internet Computing and Services
    • /
    • v.6 no.2
    • /
    • pp.99-115
    • /
    • 2005
  • Extensible Markup Language (XML) is an Internet standard that is used to express and convert data, In order to find the necessary information out of XML documents, you need a search system for XML documents, In this research, we have developed a search system that can find documents that matches the structure and content of a given XML document, making the best use of XML structure, Search metrics take account of the similarity in tag names, tag values, and the structure of tags, After a search, the system displays the ranked results in the order of aggregate similarity, Three methods of query are provided: keyword search which is conventional; search with tag names and their values; and search with XML documents, These three methods enable users to choose the method that best suits their preference, resulting in the increase of the usefulness of the system.

  • PDF

Document Ranking of Web Document Retrieval Systems (웹 정보검색 시스템의 문서 순위 결정)

  • An, Dong-Un;Kang, In-Ho
    • Journal of Information Management
    • /
    • v.34 no.2
    • /
    • pp.55-66
    • /
    • 2003
  • The Web is rich with various sources of information. It contains the contents of documents, multimedia data, shopping materials and so on. Due to the massive and heterogeneous web document collections, users want to find various types of target pages. We can classify user queries as three categories according to users'intent, content search, the site search, and the service search. In this paper, we present that different strategies are needed to meet the need of a user. Also we show the properties of content information, link information and URL information according to the class of a user query. In the content search, content information showed the good result. However, we lost the performance by combining link information and URL information. In the site search, we could increase the performance by combining link information and URL information.

Prediction of Protein Secondary Structure Using the Weighted Combination of Homology Information of Protein Sequences (단백질 서열의 상동 관계를 가중 조합한 단백질 이차 구조 예측)

  • Chi, Sang-mun
    • Journal of the Korea Institute of Information and Communication Engineering
    • /
    • v.20 no.9
    • /
    • pp.1816-1821
    • /
    • 2016
  • Protein secondary structure is important for the study of protein evolution, structure and function of proteins which play crucial roles in most of biological processes. This paper try to effectively extract protein secondary structure information from the large protein structure database in order to predict the protein secondary structure of a query protein sequence. To find more remote homologous sequences of a query sequence in the protein database, we used PSI-BLAST which can perform gapped iterative searches and use profiles consisting of homologous protein sequences of a query protein. The secondary structures of the homologous sequences are weighed combined to the secondary structure prediction according to their relative degree of similarity to the query sequence. When homologous sequences with a neural network predictor were used, the accuracies were higher than those of current state-of-art techniques, achieving a Q3 accuracy of 92.28% and a Q8 accuracy of 88.79%.

A Web-document Recommending System using the Korean Thesaurus (한국어 시소러스를 이용한 웹 문서 추천 에이전트)

  • Seo, Min-Rye;Lee, Song-Wook;Seo, Jung-Yun
    • Journal of the Korea Institute of Information and Communication Engineering
    • /
    • v.13 no.1
    • /
    • pp.103-109
    • /
    • 2009
  • We build the web document recommending agent system which offers a certain amount of web documents to each user by monitoring and learning the user's action of web browsing. We also propose a method of query expansion using the Korean thesaurus. The queries to search for new web documents generate a candidate set using the Korean thesaurus. We extract the words which are mostly correlated with the queries, among the words in the candidate set, by using TF-IDF and mutual information. Then, we expand the query. If we adopt the system of query expansion, we can recommend a lot of web documents which have potential interests to users. We thus conclude that the system of query expansion is more effective than a base system of recommending web-documents to users.

Construction of Theme Melody Index by Transforming Melody to Time-series Data for Content-based Music Information Retrieval (내용기반 음악정보 검색을 위한 선율의 시계열 데이터 변환을 이용한 주제선율색인 구성)

  • Ha, Jin-Seok;Ku, Kyong-I;Park, Jae-Hyun;Kim, Yoo-Sung
    • The KIPS Transactions:PartD
    • /
    • v.10D no.3
    • /
    • pp.547-558
    • /
    • 2003
  • From the viewpoint of that music melody has the similar features to time-series data, music melody is transformed to a time-series data with normalization and corrections and the similarity between melodies is defined as the Euclidean distance between the transformed time-series data. Then, based the similarity between melodies of a music object, melodies are clustered and the representative of each cluster is extracted as one of theme melodies for the music. To construct the theme melody index, a theme melody is represented as a point of the multidimensional metric space of M-tree. For retrieval of user's query melody, the query melody is also transformed into a time-series data by the same way of indexing phase. To retrieve the similar melodies to the query melody given by user from the theme melody index the range query search algorithm is used. By the implementation of the prototype system using the proposed theme melody index we show the effectiveness of the proposed methods.

A Clustering Method Based on Path Similarities of XML Data (XML 데이타의 경로 유사성에 기반한 클러스터링 기법)

  • Choi Il-Hwan;Moon Bong-Ki;Kim Hyoung-Joo
    • Journal of KIISE:Databases
    • /
    • v.33 no.3
    • /
    • pp.342-352
    • /
    • 2006
  • Current studies on storing XML data are focused on either mapping XML data to existing RDBMS efficiently or developing a native XML storage. Some native XML storages store each XML node with parsed object form. Clustering, the physical arrangement of each object, can be an important factor to increase the performance with this storing method. In this paper, we propose re-clustering techniques that can store an XML document efficiently. Proposed clustering technique uses path similarities among data nodes, which can reduce page I/Os when returning query results. And proposed technique can process a path query only using small number of clusters as possible instead of using all clusters. This enables efficient processing of path query because we can reduce search space by skipping unnecessary data. Finally, we apply existing clustering techniques to store XML data and compare the performance with proposed technique. Our results show that the performance of XML storage can be improved by using a proper clustering technique.