• Title/Abstract/Keywords: 질의클러스터링 (query clustering)

Search results: 154 (processing time: 0.025 seconds)

A Sequential Indexing Method for Multidimensional Range Queries (다차원 범위 질의를 위한 순차 색인 기법)

  • Cha Guang-Ho
    • Journal of KIISE:Databases / v.32 no.3 / pp.254-262 / 2005
  • This paper presents a new sequential indexing method called segment-page indexing (SP-indexing) for multidimensional range queries. The design objectives of SP-indexing are twofold: (1) improving the range query performance of multidimensional indexing methods (MIMs) and (2) providing a compromise between optimal index clustering and the overhead of full index reorganization. Although more than ten years of database research has produced a great variety of MIMs, most efforts have focused on data-level clustering, and there have been fewer attempts to cluster indexes. As a result, most relevant index nodes are widely scattered on disk, and many random disk accesses are required during a search. SP-indexing avoids such scattering by storing the relevant nodes contiguously in a segment, which consists of a sequence of contiguous disk pages, and improves performance by offering sequential access within a segment. Experimental results show that, in terms of total elapsed time, SP-indexing improves query performance by up to several times compared with traditional MIMs that use small disk pages, and that it reduces the disk bandwidth wasted by simply using large pages.
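The benefit of contiguous segments comes from replacing many random accesses with a few sequential ones. The sketch below is a toy cost model, not the paper's: it assumes each maximal run of consecutive page numbers costs one seek plus a per-page transfer, and the `seek_ms` and `transfer_ms` values are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class DiskModel:
    seek_ms: float = 10.0      # hypothetical average seek + rotational latency
    transfer_ms: float = 0.1   # hypothetical transfer time per page


def count_runs(pages):
    """Number of maximal runs of consecutive page numbers; each run needs one seek."""
    runs = 0
    prev = None
    for p in sorted(set(pages)):
        if prev is None or p != prev + 1:
            runs += 1
        prev = p
    return runs


def elapsed_ms(pages, disk):
    """Toy elapsed time: one seek per run plus transfer time for every page read."""
    pages = set(pages)
    return count_runs(pages) * disk.seek_ms + len(pages) * disk.transfer_ms


disk = DiskModel()
scattered = [3, 97, 250, 512]   # relevant index nodes spread over the disk
segment = [40, 41, 42, 43]      # the same four nodes stored in one segment
print(elapsed_ms(scattered, disk), elapsed_ms(segment, disk))
```

Under these assumed parameters, the scattered layout pays four seeks while the segment layout pays one, which is the effect SP-indexing targets.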

Efficient Broadcast Data Clustering for Multipoint Queries in Mobile Environments (이동 환경에서 다중점 질의를 위한 효율적인 방송 데이타 클러스터링)

  • Bang, Su-Ho;Chung, Yon-Dohn;Kim, Myoung-Ho
    • Journal of KIISE:Databases / v.28 no.4 / pp.715-722 / 2001
  • Mobile computing has become a reality thanks to the convergence of two technologies: powerful portable computers and wireless networks. The restrictions of wireless networks, such as bandwidth and energy limitations, make data broadcasting an attractive data communication method. This paper addresses the clustering of wireless broadcast data for multipoint queries. By clustering broadcast data effectively, a mobile client can access the data on the air with short latency. In this paper, we define data affinity and segment affinity measures. Data affinity is the degree to which two data objects are accessed together by queries, and segment affinity is the degree to which two sets of data (i.e., segments) are accessed together by queries. Our method clusters data objects based on the data and segment affinity measures. We show that the performance of our method is scarcely influenced by growth in the number of queries.
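The two affinity measures can be made concrete by counting co-accessing queries. The sketch below is an assumption, not the authors' algorithm: it represents each query as a set of object IDs and builds a broadcast order by greedily merging the pair of segments with the highest segment affinity. For single-object segments, segment affinity reduces to data affinity.

```python
from itertools import combinations


def data_affinity(a, b, queries):
    """Number of queries that access both object a and object b."""
    return sum(1 for q in queries if a in q and b in q)


def segment_affinity(s, t, queries):
    """Number of queries that access at least one object in each of segments s and t."""
    s, t = set(s), set(t)
    return sum(1 for q in queries if q & s and q & t)


def cluster_broadcast(objects, queries):
    """Greedy agglomerative merging (hypothetical): returns one broadcast order."""
    segments = [[o] for o in objects]
    while len(segments) > 1:
        best, bi, bj = -1, 0, 1
        for i, j in combinations(range(len(segments)), 2):
            aff = segment_affinity(segments[i], segments[j], queries)
            if aff > best:
                best, bi, bj = aff, i, j
        merged = segments[bi] + segments[bj]
        segments = [s for k, s in enumerate(segments) if k not in (bi, bj)]
        segments.append(merged)
    return segments[0]


queries = [{"a", "b"}, {"a", "b"}, {"c", "d"}]
print(cluster_broadcast(["a", "b", "c", "d"], queries))
```

Objects that are queried together end up adjacent in the broadcast order, so a multipoint query can be answered from a short stretch of the broadcast cycle.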

Improvement of Retrieval Convenience through the Correlation Analysis between Social Value and Query Pattern (소셜지수와 질의패턴의 상관관계 분석을 통한 검색 편의성 향상)

  • Ahn, Moo-Hyun;Park, Gun-Woo;Lee, Sang-Hoon
    • Proceedings of the Korea Information Processing Society Conference / 2009.04a / pp.391-394 / 2009
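  • As the amount of information grows explosively, it is very difficult for web users to find the data they actually want. This difficulty stems from each user's different search intent and from query ambiguity, and many studies have been conducted to address it. Query logs are important data because they embody a searcher's intent. Therefore, if we analyze each web user's query-log patterns, cluster the users who issue similar queries, and apply these clusters to search, more useful information can be obtained. That is, web users who frequently issue queries related to a particular category are likely to be highly interested in that field, and their mutual social value is likely to be higher than that of people interested in other categories. When searching on a particular topic, a user who can inherit the queries and clicked URLs of web users highly interested in that field can reach the desired information more quickly. This study therefore proposes a way to improve the convenience of information retrieval: web users with high interest in each category are clustered through query-pattern analysis, and when someone searches for information in that category, the queries these users issued and the URLs they clicked are provided to them.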
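The cluster-then-inherit idea can be sketched as follows, under several assumptions that are not from the paper: each log entry is a `(query, clicked_url)` pair, a hypothetical keyword table stands in for a real query classifier, a user joins every category that makes up at least `threshold` of their queries, and recommendations are simply the most frequent pairs within the cluster. The URLs are placeholders.

```python
from collections import Counter, defaultdict


def category_profile(queries, categorize):
    """Share of a user's queries that fall into each category."""
    counts = Counter(categorize(q) for q in queries)
    total = sum(counts.values())
    return {cat: n / total for cat, n in counts.items()} if total else {}


def cluster_users(query_logs, categorize, threshold=0.5):
    """Put a user into every category that accounts for at least `threshold` of their queries."""
    clusters = defaultdict(set)
    for user, entries in query_logs.items():
        profile = category_profile([q for q, _ in entries], categorize)
        for cat, share in profile.items():
            if share >= threshold:
                clusters[cat].add(user)
    return clusters


def recommend(category, clusters, query_logs, top_n=3):
    """Most frequent (query, clicked URL) pairs among users clustered under `category`."""
    pairs = Counter()
    for user in clusters.get(category, ()):
        pairs.update(query_logs[user])
    return [pair for pair, _ in pairs.most_common(top_n)]


# Hypothetical keyword-to-category table standing in for a real query classifier.
KEYWORDS = {"python": "programming", "rust": "programming", "pasta": "cooking"}


def categorize(query):
    return KEYWORDS.get(query.split()[0], "other")


logs = {
    "u1": [("python list", "https://example.org/py-list"), ("rust borrow", "https://example.org/rust-borrow")],
    "u2": [("python list", "https://example.org/py-list"), ("pasta recipe", "https://example.org/pasta")],
    "u3": [("pasta sauce", "https://example.org/sauce")],
}
clusters = cluster_users(logs, categorize)
print(recommend("programming", clusters, logs, top_n=1))
```

The paper's social value measure is not modeled here; the sketch only shows how category-level user clusters can supply queries and URLs to a new searcher.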

Declustering Method for Moving Object Database (이동체 데이터베이스를 위한 디클러스터링 정책)

  • Seo YoungDuk;Hong EnSuk;Hong BongHee
    • The KIPS Transactions:PartD / v.11D no.7 s.96 / pp.1399-1408 / 2004
  • Because moving object databases hold a very large amount of spatio-temporal data, a single-disk system cannot achieve fast response time and high total throughput. A parallel processing system is therefore needed for efficient query processing. However, existing parallel processing systems do not consider the characteristics of moving object data, which are continuously reported to the moving object database. A new declustering scheme is therefore necessary for a high-performance system. In this paper, we propose new declustering policies for moving object data that enable efficient query processing. First, we consider the spatial part of the MBB (Minimum Bounding Box) and compute an SD (SemiAllocation Disk) value. Second, we consider the SD value together with the time at which the node was created, which we call SDT-Proximity. For a more accurate declustering effect, we also consider load balancing. Evaluation shows an average performance improvement of 15% over the Round-Robin method for query areas of 5% and 10%, and an average improvement of 6% over the Spatial Proximity method.
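The abstract does not give the SD or SDT-Proximity formulas, so the sketch below substitutes a generic proximity-based policy as an assumption: a new node goes to the disk whose existing nodes are least close to it in space and creation time, with the lighter-loaded disk winning ties. The `proximity` function and `time_weight` parameter are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class Node:
    xmin: float
    ymin: float
    xmax: float
    ymax: float
    t: float  # creation time of the index node


def center(n):
    return ((n.xmin + n.xmax) / 2, (n.ymin + n.ymax) / 2)


def proximity(a, b, time_weight=1.0):
    """Hypothetical closeness: high when MBB centers and creation times are near each other."""
    d_space = math.dist(center(a), center(b))
    d_time = abs(a.t - b.t)
    return 1.0 / (1.0 + d_space + time_weight * d_time)


def assign_disk(node, disks):
    """Pick the disk with the least total proximity to the new node; ties go to the lighter disk."""
    def cost(i):
        return (sum(proximity(node, other) for other in disks[i]), len(disks[i]))
    best = min(range(len(disks)), key=cost)
    disks[best].append(node)
    return best


disks = [[], []]
a = Node(0, 0, 1, 1, t=0)
b = Node(0, 0, 1, 1, t=0)   # same region, same time as a
print(assign_disk(a, disks), assign_disk(b, disks))
```

Placing spatially and temporally close nodes on different disks lets a range query fetch them in parallel, which is the goal the declustering policies above pursue.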

Phonetic Question Set Generation Algorithm (음소 질의어 집합 생성 알고리즘)

  • 김성아;육동석;권오일
    • The Journal of the Acoustical Society of Korea / v.23 no.2 / pp.173-179 / 2004
  • Because training data are insufficient in large-vocabulary continuous speech recognition, similar context-dependent phones can be clustered by decision trees so that they share data. Building the decision trees and using them to predict unseen triphones requires a phonetic question set. The phonetic question set, which contains categories of phones with similar co-articulation effects, is usually generated by phonetic or linguistic experts. This knowledge-based approach to generating the phonetic question set, however, may reduce the homogeneity of the clusters. Moreover, the experts must adjust the question sets whenever the language or the PLU (phone-like unit) of a recognition system changes. We therefore propose a data-driven method that automatically generates the phonetic question set. Because the proposed method generates the phone categories from the distribution of speech data, it does not depend on the language or the PLU and may improve the homogeneity of the clusters. In large-vocabulary speech recognition experiments, the proposed algorithm reduced the error rate by 14.3%.
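One common data-driven way to obtain phone categories is bottom-up clustering, where every intermediate cluster becomes a question such as "is the left context in {a, b}?". The sketch below assumes each phone is summarized by a hypothetical mean feature vector and uses Euclidean distance between centroids; the paper's actual statistics and distance may differ.

```python
import math


def generate_question_set(phone_means):
    """Agglomerative clustering of phones; every merged cluster becomes one question."""
    # Each cluster: (set of phones, centroid vector, number of phones merged)
    clusters = [(frozenset([p]), list(v), 1) for p, v in phone_means.items()]
    questions = []
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                dist = math.dist(clusters[i][1], clusters[j][1])
                if best is None or dist < best[0]:
                    best = (dist, i, j)
        _, i, j = best
        (si, ci, ni), (sj, cj, nj) = clusters[i], clusters[j]
        # Size-weighted centroid of the merged cluster
        centroid = [(a * ni + b * nj) / (ni + nj) for a, b in zip(ci, cj)]
        merged = (si | sj, centroid, ni + nj)
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
        questions.append(merged[0])
    return questions


# Hypothetical 2-D phone statistics
qs = generate_question_set({"a": [0.0, 0.0], "b": [0.0, 1.0], "c": [10.0, 10.0], "d": [10.0, 11.0]})
print([sorted(q) for q in qs])
```

Because the categories come from the data rather than from expert rules, the same procedure applies unchanged when the language or the PLU inventory changes.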

Relevance Feedback Method of an Extended Boolean Model using Hierarchical Clustering Techniques (계층적 클러스터링 기법을 이용한 확장 불리언 모델의 적합성 피드백 방법)

  • 최종필;김민구
    • Journal of KIISE:Software and Applications / v.31 no.10 / pp.1374-1385 / 2004
  • The relevance feedback process uses information obtained from a user about an initially retrieved set of documents to improve subsequent search formulations and retrieval performance. In the extended Boolean model, relevance feedback implies not only that new query terms must be identified, but also that the terms must be connected properly with the Boolean AND/OR operators. Salton et al. proposed a relevance feedback method for the extended Boolean model called the DNF (disjunctive normal form) method. However, this method has a critical problem in generating reformulated queries. In this study, we investigate the problem of the DNF method and propose a relevance feedback method that uses hierarchical clustering techniques to solve it. We report the results of experiments performed on two data sets: the DOE collection in TREC 1 and the Web TREC 10 collection.
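For context, a reformulated DNF query in the p-norm extended Boolean model is an OR of AND clauses, each scored with the standard unweighted p-norm formulas. The sketch below only evaluates such a query against a document's term weights in [0, 1]; it does not implement the authors' clustering-based reformulation, and the example weights are hypothetical.

```python
def pnorm_or(weights, p):
    """Unweighted p-norm OR: ((w1^p + ... + wn^p) / n)^(1/p)."""
    return (sum(w ** p for w in weights) / len(weights)) ** (1.0 / p)


def pnorm_and(weights, p):
    """Unweighted p-norm AND: 1 - (((1-w1)^p + ... + (1-wn)^p) / n)^(1/p)."""
    return 1.0 - (sum((1.0 - w) ** p for w in weights) / len(weights)) ** (1.0 / p)


def dnf_score(doc, query, p=2.0):
    """Score a DNF query (list of AND clauses, each a list of terms) against term weights."""
    clause_scores = [pnorm_and([doc.get(term, 0.0) for term in clause], p) for clause in query]
    return pnorm_or(clause_scores, p)


query = [["cluster", "query"], ["feedback"]]   # (cluster AND query) OR feedback
print(dnf_score({"cluster": 1.0, "query": 1.0}, query))
print(dnf_score({"feedback": 0.8}, query))
```

With p = 1 the operators behave like a vector-space average, and as p grows they approach strict Boolean AND/OR; choosing which terms form each AND clause is exactly the step the DNF method and the proposed clustering approach address.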

Web Crawling and PageRank Calculation for Community-Limited Search (커뮤니티 제한 검색을 위한 웹 크롤링 및 PageRank 계산)

  • Kim Gye-Jeong;Kim Min-Soo;Kim Yi-Reun;Whang Kyu-Young
    • Proceedings of the Korean Information Science Society Conference / 2005.07b / pp.1-3 / 2005
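  • Recently, many techniques for improving search quality have been studied in web search; representative examples include restricted search, focused crawling, and web clustering. However, restricted search cannot limit the search scope to semantically related sites, focused crawling has long query processing times because it clusters at query time, and web clustering incurs a large overhead because it must cluster a very large number of web pages. In this paper, we solve these problems by proposing community-limited search, which restricts the search scope to a specific community, together with a cluster crawler as the method for obtaining communities. We also propose a method that uses communities to compute PageRank in two steps: the first step computes PageRank locally within each community, and the second step uses these results to compute PageRank globally. Compared with the method proposed by Wang, the proposed method reduces the error of the PageRank approximation to about 59%.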
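A two-step computation of this kind can be sketched in a BlockRank-like style, which is an assumption rather than the paper's exact procedure: local PageRank within each community, a community-level PageRank that weights the local scores, and then a few global power iterations started from that estimate. The `global_iters` parameter is hypothetical; running it to convergence would simply yield ordinary global PageRank.

```python
from collections import defaultdict


def pagerank(nodes, edges, d=0.85, iters=50, start=None):
    """Power iteration; dangling nodes spread their rank uniformly over all nodes."""
    n = len(nodes)
    out = {v: [] for v in nodes}
    for u, v in edges:
        out[u].append(v)
    rank = dict(start) if start else {v: 1.0 / n for v in nodes}
    for _ in range(iters):
        new = {v: (1.0 - d) / n for v in nodes}
        for u in nodes:
            targets = out[u] if out[u] else nodes
            share = d * rank[u] / len(targets)
            for v in targets:
                new[v] += share
        rank = new
    return rank


def two_step_pagerank(community_of, edges, d=0.85, iters=50, global_iters=2):
    members = defaultdict(list)
    for page, comm in community_of.items():
        members[comm].append(page)
    # Step 1: local PageRank inside each community, using intra-community links only
    local = {}
    for pages in members.values():
        inside = set(pages)
        intra = [(u, v) for u, v in edges if u in inside and v in inside]
        local.update(pagerank(pages, intra, d, iters))
    # Weight each community by PageRank over the community graph (assumed combination rule)
    comm_edges = [(community_of[u], community_of[v]) for u, v in edges
                  if community_of[u] != community_of[v]]
    comm_rank = pagerank(list(members), comm_edges, d, iters)
    approx = {p: local[p] * comm_rank[community_of[p]] for p in community_of}
    # Step 2: a few global iterations starting from the combined local estimate
    return pagerank(list(community_of), edges, d, global_iters, start=approx)


community_of = {"a": "c1", "b": "c1", "c": "c2", "d": "c2"}
edges = [("a", "b"), ("b", "a"), ("c", "d"), ("d", "c"), ("b", "c")]
print(two_step_pagerank(community_of, edges))
```

Because most links stay inside a community, the local step captures most of the structure cheaply, and the global step only corrects for cross-community links.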

Performance Analysis on Declustering High-Dimensional Data by GRID Partitioning (그리드 분할에 의한 다차원 데이터 디클러스터링 성능 분석)

  • Kim, Hak-Cheol;Kim, Tae-Wan;Li, Ki-Joune
    • The KIPS Transactions:PartD / v.11D no.5 / pp.1011-1020 / 2004
  • A lot of work has been done to improve the I/O performance of systems that store and manage massive amounts of data by distributing the data across multiple disks and accessing them in parallel. Most previous work has focused on an efficient mapping from a grid cell, which is determined by the interval number in each dimension, to a disk number, assuming that each dimension is split into disjoint intervals so that the entire data space is GRID-partitioned. However, this work has ignored the effect of the GRID partitioning scheme on declustering performance. In this paper, we enhance the performance of mapping-function-based declustering algorithms by applying a good GRID partitioning method. For this purpose, we propose an estimation model that counts the number of grid cells intersected by a range query, and we apply the GRID partitioning scheme that minimizes the query result size among the possible schemes. While binary partitioning is common for high-dimensional data, we choose fewer dimensions than binary partitioning requires and split each of them several times, which reduces the number of grid cells touched by a query. Experimental results show that the proposed estimation model is accurate to within a 0.5% error ratio regardless of query size and dimensionality. By applying an efficient GRID partitioning scheme, we can also improve by up to 23 times the performance of the Kronecker Sequence, a mapping-function-based declustering algorithm known to be the best among mapping functions for high-dimensional data.
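The claim that splitting fewer dimensions more finely touches fewer cells can be checked with a simple estimate, which is an assumption and not the paper's model: in a unit dimension split into k equal intervals, a query interval of length s overlaps about s·k + 1 intervals (capped at k), and per-dimension counts multiply. The sketch compares schemes that keep the total number of cells fixed at 2^total_bits.

```python
import math


def expected_cells(intervals_per_dim, side):
    """Approximate grid cells touched by a cube query of side `side` in [0, 1]^d."""
    return math.prod(min(float(k), side * k + 1.0) for k in intervals_per_dim)


def best_scheme(dims, total_bits, side):
    """Split m of the dims into 2**(total_bits // m) intervals each (same total cell count)."""
    candidates = []
    for m in range(1, min(dims, total_bits) + 1):
        if total_bits % m:
            continue
        k = 2 ** (total_bits // m)
        scheme = [k] * m + [1] * (dims - m)
        candidates.append((expected_cells(scheme, side), m, k))
    return min(candidates)  # (estimated cells, dimensions split, intervals per dimension)


# 16-dimensional data, 2**16 cells in total, query side 0.25 in every dimension
print(expected_cells([2] * 16, 0.25))   # binary partition of all 16 dimensions
print(best_scheme(16, 16, 0.25))
```

Under this estimate, binary partitioning of all 16 dimensions touches about 657 cells, while splitting 8 dimensions into 4 intervals each touches 256, which matches the direction of the paper's argument.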

P2P query processing method between ontologies in internet environment (인터넷상의 온톨로지간의 P2P 질의처리 방안)

  • Kim, Byung-Gon;Oh, Sung-Kyun
    • Journal of Digital Contents Society / v.10 no.2 / pp.239-247 / 2009
  • In a simple network topology, a query must be delivered to all linked peers to be processed. This wastes transmission bandwidth and the throughput of each peer. To overcome this, an efficient routing technique that delivers a query to the proper peers is needed in addition to a query processing strategy. Clustering the peers of a P2P network is important for efficient routing: grouping peers with similar characteristics into the same cluster reduces the number of messages in the network compared with assigning peers to clusters randomly. In this paper, we propose clustering techniques for ontology-based P2P query processing. We propose a similarity measure, a cluster index structure, and query processing steps for an ontology-based P2P cluster environment.
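Cluster-based routing can be illustrated with a deliberately simple setup that is an assumption, not the paper's similarity measure or index: each peer is described by a set of ontology concept names, peers join the first cluster whose concept union is Jaccard-similar enough, and a query is routed only to the best-matching cluster instead of being flooded. Peer IDs and concepts are hypothetical.

```python
def jaccard(a, b):
    return len(a & b) / len(a | b) if a | b else 0.0


def cluster_peers(peers, threshold=0.5):
    """Greedy clustering: a peer joins the first cluster whose concept union is similar enough."""
    clusters = []  # list of (concept_union, member_ids)
    for peer_id, concepts in peers.items():
        for union, members in clusters:
            if jaccard(concepts, union) >= threshold:
                union |= concepts          # in-place update of the cluster's concept set
                members.append(peer_id)
                break
        else:
            clusters.append((set(concepts), [peer_id]))
    return clusters


def route(query_concepts, clusters):
    """Send the query only to the members of the most similar cluster."""
    _, members = max(clusters, key=lambda c: jaccard(query_concepts, c[0]))
    return members


peers = {
    "p1": {"Person", "Student"},
    "p2": {"Person", "Student", "Course"},
    "p3": {"Car", "Engine"},
}
clusters = cluster_peers(peers)
print(route({"Course"}, clusters))
```

Only two of the three peers receive the query here, which is the message saving that clustering similar peers is meant to provide.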

Frequent Itemset Creation using Bit Transaction Clustering in Data Mining (데이터 마이닝에서 비트 트랜잭션 클러스터링을 이용한 빈발항목 생성)

  • Kim Eui-Chan;Hwang Byung-Yeon
    • The KIPS Transactions:PartD / v.13D no.3 s.106 / pp.293-298 / 2006
  • Large amounts of data are stored in databases. To obtain information from these data, we use queries, but the information obtained this way is basic and simple. Data mining offers a variety of methods; in this paper, we deal with clustering and association rules. We present a method for finding better association rules and solve a problem of existing association rule mining. We propose and apply a new clustering method suited to association rules; unlike existing clustering methods, it is neither distance-based nor category-based. By finding the association rules of each cluster, we can obtain not only the rules found across all transactions but also rules that characterize each cluster. Through this study, we expect to reduce the number of transaction accesses in large databases and to find associations within small groups.
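Bit transactions make both clustering and support counting cheap bitwise operations. The sketch below is illustrative only: the greedy overlap-based clustering rule and the `min_overlap` parameter are assumptions (the abstract does not specify the paper's method), and frequent itemsets are found level by level with an Apriori-style join inside one cluster.

```python
from itertools import combinations


def to_bits(transaction, item_index):
    """Encode a transaction as an integer bitmask, one bit per item."""
    mask = 0
    for item in transaction:
        mask |= 1 << item_index[item]
    return mask


def support(itemset_mask, tx_masks):
    """Count transactions that contain every item of the itemset."""
    return sum(1 for t in tx_masks if (t & itemset_mask) == itemset_mask)


def cluster_transactions(tx_masks, min_overlap=1):
    """Hypothetical greedy rule: join the first cluster sharing >= min_overlap items."""
    clusters = []  # list of [union_mask, member_masks]
    for t in tx_masks:
        for c in clusters:
            if bin(c[0] & t).count("1") >= min_overlap:
                c[0] |= t
                c[1].append(t)
                break
        else:
            clusters.append([t, [t]])
    return [members for _, members in clusters]


def frequent_itemsets(tx_masks, n_items, min_support):
    """Level-wise search; returns {itemset_mask: support}."""
    result = {}
    level = [1 << i for i in range(n_items)]
    while level:
        frequent = [m for m in level if support(m, tx_masks) >= min_support]
        for m in frequent:
            result[m] = support(m, tx_masks)
        # Join two frequent k-itemsets that differ in exactly one item
        next_level = {a | b for a, b in combinations(frequent, 2)
                      if bin(a | b).count("1") == bin(a).count("1") + 1}
        level = sorted(next_level)
    return result


items = ["bread", "milk", "beer", "diapers"]
index = {item: i for i, item in enumerate(items)}
txs = [to_bits(t, index) for t in (["bread", "milk"], ["bread", "milk"], ["beer", "diapers"], ["beer", "diapers"])]
clusters = cluster_transactions(txs)
print(frequent_itemsets(clusters[0], len(items), min_support=2))
```

Mining each cluster separately touches only that cluster's transactions and can surface rules that would fall below the support threshold across the whole database.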