• Title/Summary/Keyword: Cluster Retrieval

A study on searching image by cluster indexing and sequential I/O (연속적 I/O와 클러스터 인덱싱 구조를 이용한 이미지 데이타 검색 연구)

  • Kim, Jin-Ok;Hwang, Dae-Joon
    • The KIPS Transactions:PartD / v.9D no.5 / pp.779-788 / 2002
  • There are many technically difficult issues in searching multimedia data such as images, video and audio, because such data are massive and more complex than simple text. Because exact matching is not suitable for the feature matrices of multimedia data, similarity retrieval has been studied as a search method: it automatically extracts basic features of multimedia data and searches among the data using the extracted features. In this paper, data clustering and cluster indexing are proposed as a fast similarity-retrieval method for multimedia data. The approach stores clusters of similar images on adjacent disk cylinders and then builds indexes to access the clusters. To minimize search cost, hashing is used to index the clusters. In addition, to reduce I/O time, a search takes just one I/O to look up the location of the cluster containing similar objects and one sequential file I/O to read that cluster. The proposed scheme addresses the multi-dimensionality problem by combining clustering with indexing, and it achieves higher search efficiency than content-based image retrieval methods that use only clustering or only an indexing structure.
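
A minimal Python sketch of the two-I/O lookup described above, under stated assumptions: cluster centroids are given in advance, a query is routed to its nearest centroid, an in-memory dict stands in for the hash index over clusters, and an `io.BytesIO` buffer stands in for the disk on which each cluster's records are written contiguously. The names `build`, `search` and `DIM` and the record layout are hypothetical, not taken from the paper.

```python
import io
import math
import struct

DIM = 3  # hypothetical feature dimensionality
REC = struct.Struct(f"<i{DIM}f")  # one record: image id + feature vector


def nearest(centroids, v):
    """Index of the centroid closest to v (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda c: math.dist(centroids[c], v))


def build(images, centroids):
    """Write each cluster's records contiguously; return (disk, hash index)."""
    clusters = {}
    for img_id, vec in images:
        clusters.setdefault(nearest(centroids, vec), []).append((img_id, vec))
    disk = io.BytesIO()  # stand-in for adjacent disk cylinders
    index = {}  # hash table: cluster id -> (byte offset, record count)
    for cid, members in clusters.items():
        index[cid] = (disk.tell(), len(members))
        for img_id, vec in members:
            disk.write(REC.pack(img_id, *vec))
    return disk, index


def search(disk, index, centroids, query, k=2):
    """Return up to k image ids from the query's cluster, nearest first."""
    cid = nearest(centroids, query)
    if cid not in index:
        return []
    offset, count = index[cid]  # step 1: hash lookup of the cluster location
    disk.seek(offset)
    block = disk.read(count * REC.size)  # step 2: one sequential read
    hits = [(math.dist(vec, query), img_id)
            for img_id, *vec in REC.iter_unpack(block)]
    return [img_id for _, img_id in sorted(hits)[:k]]
```

Keeping each cluster contiguous is what turns the candidate scan into a single sequential read. Ranking inside the cluster by Euclidean distance is an assumption, since the abstract does not name the similarity measure.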

VIA-Based PC Cluster System for Efficient Information Retrieval (효율적인 정보 검색을 위한 VIA 기반 PC 클러스터 시스템)

  • Kang, Na-Young;Chung, Sang-Hwa;Jang, Han-Kook
    • Journal of KIISE:Computer Systems and Theory / v.29 no.10 / pp.539-549 / 2002
  • PC cluster-based Information Retrieval (IR) systems improve their performance by processing query terms in parallel on the cluster nodes. However, the TCP/IP-based communication used to exchange data between cluster nodes prevents further performance gains. User-level communication mechanisms solve this problem by eliminating time-consuming kernel access when exchanging data between cluster nodes. The Virtual Interface Architecture (VIA) is one of the representative user-level communication mechanisms, providing low latency and high bandwidth. In this paper, we propose a VIA-based parallel IR system on a PC cluster. The IR system is implemented using three communication methods: Scalable Coherent Interface (SCI)-based VIA, MPI on SCI-based VIA, and MPI on Fast Ethernet-based VIA. Through experiments, the performance of the three methods is analyzed from various aspects.

Content-Based Image Retrieval System using Feature Extraction of Image Objects (영상 객체의 특징 추출을 이용한 내용 기반 영상 검색 시스템)

  • Jung Seh-Hwan;Seo Kwang-Kyu
    • Journal of Korean Society of Industrial and Systems Engineering / v.27 no.3 / pp.59-65 / 2004
  • This paper explores an image segmentation and representation method that applies Vector Quantization (VQ) to color and texture for a content-based image retrieval system. The basic idea is to transform the raw pixel data into a small set of image regions that are coherent in color and texture space. These regions are then used for object-based image retrieval. The features used for retrieval are three color features from the HSV color model and five texture features from gray-level co-occurrence matrices. Once feature extraction has been performed on an image, each pixel is represented by an 8-dimensional feature vector. A VQ algorithm is used to cluster the pixel data into groups. A representative feature table based on the dominant groups is then built and used to retrieve similar images according to the objects within the image. The proposed method can retrieve similar images even when the objects are translated, scaled, or rotated.
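
The VQ step can be sketched as a k-means (Lloyd) codebook trained on the per-pixel 8-dimensional vectors, followed by a table of the codewords that cover a large share of pixels. This is an illustrative Python sketch, not the paper's implementation: the HSV and GLCM feature extraction is omitted, and the farthest-first initialization, the `min_share` threshold, and all function names are assumptions.

```python
import math


def quantize(codebook, v):
    """Index of the codeword nearest to v."""
    return min(range(len(codebook)), key=lambda j: math.dist(codebook[j], v))


def train_codebook(vectors, k, iters=20):
    """Lloyd/k-means codebook design with farthest-first initialization."""
    codebook = [list(vectors[0])]
    while len(codebook) < k:
        # Pick the vector farthest from every codeword chosen so far.
        far = max(vectors, key=lambda v: min(math.dist(c, v) for c in codebook))
        codebook.append(list(far))
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for v in vectors:
            groups[quantize(codebook, v)].append(v)
        for j, g in enumerate(groups):
            if g:  # keep the old codeword if its group is empty
                codebook[j] = [sum(col) / len(g) for col in zip(*g)]
    return codebook


def dominant_table(vectors, codebook, min_share=0.2):
    """(codeword, share of pixels) for groups covering at least min_share."""
    counts = [0] * len(codebook)
    for v in vectors:
        counts[quantize(codebook, v)] += 1
    n = len(vectors)
    return [(codebook[j], c / n) for j, c in enumerate(counts) if c / n >= min_share]
```

Comparing two images would then reduce to comparing their small dominant tables instead of every pixel; how the paper scores that comparison is not stated in the abstract.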

Contents-based Image Retrieval using Fuzzy ART Neural Network (퍼지 ART 신경망을 이용한 내용기반 영상검색)

  • 박상성;이만희;장동식;김재연
    • Journal of the Institute of Convergence Signal Processing / v.4 no.2 / pp.12-17 / 2003
  • This paper proposes a content-based image retrieval system based on the fuzzy ART neural network algorithm. When retrieving from a large image database, clustering is essential for fast retrieval. However, it is difficult to cluster huge amounts of image data appropriately, and current similarity-based retrieval methods suffer from problems such as low retrieval accuracy and long retrieval time, so a solution to these problems is needed. This paper presents a content-based image retrieval system using a neural network to address these problems. The retrieval system using the fuzzy ART algorithm normalizes the color and texture feature values of the input data to the range 0 to 1, clusters the input data, and then performs retrieval. Experiments with 300 images show a retrieval accuracy of approximately 87%.
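
The clustering step can be illustrated with the textbook Fuzzy ART rules: complement coding of the [0, 1]-normalized input, a choice function to rank categories, a vigilance test, and a weight update. This Python sketch assumes those standard rules; the parameter values (`rho`, `alpha`, `beta`) and the class name are hypothetical, and the abstract does not say how the paper's system uses the resulting categories at query time.

```python
def fuzzy_and(x, y):
    """Component-wise minimum (the fuzzy AND operator)."""
    return [min(a, b) for a, b in zip(x, y)]


class FuzzyART:
    def __init__(self, rho=0.75, alpha=0.001, beta=1.0):
        self.rho = rho      # vigilance: higher values give finer categories
        self.alpha = alpha  # choice parameter
        self.beta = beta    # learning rate (1.0 = fast learning)
        self.weights = []   # one weight vector per category

    def _choice(self, inp, j):
        w = self.weights[j]
        return sum(fuzzy_and(inp, w)) / (self.alpha + sum(w))

    def train(self, features):
        """Present one normalized feature vector; return its category index."""
        if any(not 0.0 <= f <= 1.0 for f in features):
            raise ValueError("features must be normalized to [0, 1]")
        inp = list(features) + [1.0 - f for f in features]  # complement coding
        norm_i = sum(inp)
        # Try existing categories from the highest choice value down.
        order = sorted(range(len(self.weights)),
                       key=lambda j: self._choice(inp, j), reverse=True)
        for j in order:
            match = fuzzy_and(inp, self.weights[j])
            if sum(match) / norm_i >= self.rho:  # vigilance test passed
                self.weights[j] = [self.beta * m + (1 - self.beta) * w
                                   for m, w in zip(match, self.weights[j])]
                return j
        self.weights.append(inp)  # no category resonates: create a new one
        return len(self.weights) - 1
```

For retrieval, a query's feature vector would be mapped to a category in the same way and compared only with the images stored in that category.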

A Content-based Audio Retrieval System Supporting Efficient Expansion of Audio Database (음원 데이터베이스의 효율적 확장을 지원하는 내용 기반 음원 검색 시스템)

  • Park, Ji Hun;Kang, Hyunchul
    • Journal of Digital Contents Society / v.18 no.5 / pp.811-820 / 2017
  • For content-based audio retrieval, one of the main functions of audio services, techniques that extract fingerprints from audio sources and store and index them in a database are widely used. However, as fingerprints of new audio sources are continually inserted into the database, both space efficiency and retrieval performance gradually deteriorate. Therefore, techniques are needed that support efficient expansion of the audio database without periodic reorganization, which would increase system operation costs. In this paper, we design a content-based audio retrieval system that solves this problem by using MapReduce and a NoSQL database in a cluster computing environment, based on Shazam's fingerprinting algorithm, and we evaluate its performance through a detailed set of experiments using real-world audio data.
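
A toy sketch of the Shazam-style fingerprinting the system builds on: pairs of spectrogram peaks are hashed as (anchor frequency, target frequency, time delta), and a query is identified by voting for a consistent time offset. In this Python sketch a dict stands in for the NoSQL store and the MapReduce jobs are omitted; peak picking is assumed to have been done already, and `FAN_OUT`, `MAX_DT` and the class and function names are hypothetical.

```python
from collections import Counter, defaultdict

FAN_OUT = 3  # hypothetical: target peaks paired with each anchor peak
MAX_DT = 10  # hypothetical: target-zone width in time frames


def fingerprints(peaks):
    """Yield ((f1, f2, dt), anchor_time) from (time, freq) spectrogram peaks."""
    peaks = sorted(peaks)
    for i, (t1, f1) in enumerate(peaks):
        paired = 0
        for t2, f2 in peaks[i + 1:]:
            dt = t2 - t1
            if dt > MAX_DT or paired == FAN_OUT:
                break
            if dt > 0:
                yield (f1, f2, dt), t1
                paired += 1


class FingerprintStore:
    """In-memory stand-in for a key-value table: hash -> [(track, time)]."""

    def __init__(self):
        self.table = defaultdict(list)

    def add_track(self, track_id, peaks):
        for h, t in fingerprints(peaks):
            self.table[h].append((track_id, t))

    def identify(self, query_peaks):
        """Return the track with the most hashes at one consistent offset."""
        votes = Counter()
        for h, tq in fingerprints(query_peaks):
            for track_id, t in self.table.get(h, []):
                votes[(track_id, t - tq)] += 1
        if not votes:
            return None
        (track_id, _offset), _count = votes.most_common(1)[0]
        return track_id
```

Because new tracks only append postings under their hash keys, this layout maps naturally onto a key-value store that grows without reorganization, which is the property the abstract targets.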

Library Management and Services for Software Component Reuse on the Web (Web 소프트웨어 컴포넌트 재사용을 위한 라이브러리 관리와 서비스)

  • Lee, Sung-Koo
    • Journal of KIISE:Software and Applications / v.29 no.1_2 / pp.10-19 / 2002
  • To search for and locate components in a collection on the Web, users need a Web browser. Since Web libraries tend to grow rapidly, an effective way to organize and manage such large libraries is needed. Traditional Web-based library (retrieval) systems provide various classification schemes and retrieval services to store and retrieve components. However, these systems lack valuable services, for example, one that lets users grasp the overall contents of the library at the beginning of retrieval. This paper discusses a Web-based library system that provides efficient management of object-oriented components and a set of services beyond simple component storage and retrieval. These services consist of component comprehension through a reverse engineering process, automated summary extraction, and comprehension-based retrieval. In addition, the performance of the automated cluster-based classification scheme adopted in the system is evaluated and compared with that of two other systems using traditional classification schemes.

Selection of Cluster Topic Words in Hierarchical Clustering using K-Means Algorithm

  • Lee Shin Won;Yi Sang Seon;An Dong Un;Chung Sung Jong
    • Proceedings of the IEEK Conference / 2004.08c / pp.885-889 / 2004
  • Fast, high-quality document clustering algorithms play an important role in data exploration by organizing large amounts of information into a small number of meaningful clusters. Hierarchical clustering improves retrieval performance and makes results easier for users to understand. To improve clustering, we implemented a hierarchical structure with variety and readability by carefully selecting cluster topic words and deciding the number of clusters dynamically. Selecting topic words is important because the hierarchical clustering structure summarizes the search results. We chose nouns as cluster topic words. The quality of the topic words increased by 33% through the following measures: only nouns are extracted as topic words for the top-level clusters, and topic words that have already been used are not reused for the child clusters.
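
The two topic-word rules (nouns only; no reuse of a word already chosen higher in the hierarchy) can be sketched as a recursive pass over the cluster tree. The tree format, the part-of-speech tags (assumed to come from an external tagger), frequency-based ranking, and `n` are assumptions for illustration; the abstract does not give the paper's scoring function, and this sketch applies the noun filter at every level.

```python
from collections import Counter


def select_topics(cluster, used=frozenset(), n=2):
    """Assign topic words to every node of a cluster tree.

    cluster: {"tokens": [(word, pos), ...], "children": [...]}  (hypothetical format)
    Only nouns (pos == "NOUN") are candidates, and words already chosen for an
    ancestor cluster are skipped so that child clusters do not repeat them.
    """
    counts = Counter(w for w, pos in cluster["tokens"]
                     if pos == "NOUN" and w not in used)
    topics = [w for w, _ in counts.most_common(n)]
    child_used = used | set(topics)
    return {
        "topics": topics,
        "children": [select_topics(c, child_used, n)
                     for c in cluster.get("children", [])],
    }
```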

Clustering XML Documents Considering The Weight of Large Items in Clusters (클러스터의 주요항목 가중치 기반 XML 문서 클러스터링)

  • Hwang, Jeong-Hee
    • The KIPS Transactions:PartD / v.14D no.1 s.111 / pp.1-8 / 2007
  • As the number of Web documents written in XML, a data exchange language for the advanced Internet, increases, these documents are becoming targets of information retrieval. Accordingly, there has been research on the structure, integration, and retrieval of XML documents. As basic research toward efficient query processing and retrieval, this paper proposes a clustering method for XML documents based on frequent structures. First, the trees representing XML documents are decomposed and frequent structures are extracted from them. Second, treating the frequent structures as the items of transactions, we perform clustering that considers the weight of large items to control cluster creation and cluster cohesion. Third, we show the superiority of our method through experiments that compare it with previous methods.

Intelligent Database Retrieval System using FCM

  • Jecong, Ihn;Park, Gyei-Kark;Hwang, Seung-Wook
    • Proceedings of the Korean Institute of Intelligent Systems Conference / 1995.10b / pp.40-44 / 1995
  • In this paper, we propose a retrieval system that uses linguistically expressed knowledge of a database, where the relations between data are constructed by FCM. Several algorithms have been proposed to solve the major problem of conventional retrieval systems, namely that the system gives no answer when no data exactly match the user's query, and to express database knowledge linguistically. This paper proposes an improved method for adding new clusters and a method for retrieving data from the database in response to a user's query. The validity of the retrieval system is shown by applying its algorithm to an example: the mail order service of a post office.
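
The abstract does not define FCM or the retrieval procedure. Assuming FCM means fuzzy c-means, the Python sketch here shows how membership degrees let a system return the closest cluster's records even when nothing matches the query exactly. Cluster centers and hard labels are assumed to be precomputed, the fuzzifier `m = 2` is a common default rather than the paper's value, the function names are hypothetical, and the linguistic-knowledge layer is not modeled.

```python
import math


def memberships(centers, x, m=2.0):
    """Fuzzy c-means membership degree of point x in each cluster center."""
    d = [math.dist(c, x) for c in centers]
    if 0.0 in d:  # x coincides with a center: crisp membership
        return [1.0 if di == 0.0 else 0.0 for di in d]
    p = 2.0 / (m - 1.0)
    return [1.0 / sum((di / dk) ** p for dk in d) for di in d]


def answer(records, centers, labels, query, m=2.0):
    """Records of the highest-membership cluster, nearest first, plus that membership.

    Returns a non-empty answer even when no record equals the query exactly.
    """
    u = memberships(centers, query, m)
    best = max(range(len(centers)), key=u.__getitem__)
    hits = [r for r, lab in zip(records, labels) if lab == best]
    return sorted(hits, key=lambda r: math.dist(r, query)), u[best]
```

The membership value returned alongside the records could be reported to the user as a degree of fit, which is one way a system can reply with approximate answers instead of none.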

Term Clustering and Duplicate Distribution for Efficient Parallel Information Retrieval (효율적인 병렬정보검색을 위한 색인어 군집화 및 분산저장 기법)

  • 강재호;양재완;정성원;류광렬;권혁철;정상화
    • Journal of KIISE:Software and Applications / v.30 no.1_2 / pp.129-139 / 2003
  • The PC cluster architecture is considered a cost-effective alternative to existing supercomputers for realizing a high-performance information retrieval (IR) system. To implement an efficient IR system on a PC cluster, it is essential to achieve maximum parallelism by distributing the data appropriately across the local hard disks of the PCs, so that the disk I/O and the subsequent computation are spread as evenly as possible over all PCs. If the terms in the inverted index file can be classified into clusters of closely related terms, parallelism can be maximized by distributing them to the PCs in an interleaved manner. One goal of this research is to develop methods for automatically clustering terms based on the likelihood of the terms co-occurring in the same query. In this paper, we also propose a method for duplicate distribution of inverted index records among the PCs to achieve fault tolerance as well as dynamic load balancing. Experiments with a large corpus demonstrated the efficiency and effectiveness of our method.
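
One simple way to realize the interleaved placement and duplicate distribution is round-robin assignment of each term cluster's terms across PCs, with a replica on the next PC that a scheduler can fall back to. This Python sketch is hypothetical: the term clusters are taken as given (the co-occurrence-based clustering is not shown), and the next-PC replica rule, least-loaded selection, and function names are assumptions rather than the paper's scheme.

```python
def distribute(term_clusters, num_pcs):
    """Interleave each cluster's terms across PCs; replicate each on the next PC.

    term_clusters: list of lists of terms likely to co-occur in one query.
    Returns {pc: {"primary": [...], "replica": [...]}}.
    """
    layout = {pc: {"primary": [], "replica": []} for pc in range(num_pcs)}
    slot = 0
    for cluster in term_clusters:
        for term in cluster:
            pc = slot % num_pcs  # consecutive related terms land on different PCs
            layout[pc]["primary"].append(term)
            layout[(pc + 1) % num_pcs]["replica"].append(term)
            slot += 1
    return layout


def pick_pc(term, layout, load):
    """Serve a term from whichever holder (primary or replica) is least loaded."""
    holders = [pc for pc, parts in layout.items()
               if term in parts["primary"] or term in parts["replica"]]
    return min(holders, key=lambda pc: load[pc])
```

Because consecutive terms of a cluster land on different PCs, a query touching several related terms spreads its disk I/O across machines, and the replica gives the scheduler a second holder to use when one PC is busy or down.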