• Title/Summary/Keyword: XML Filtering

Search Result 50, Processing Time 0.034 seconds

FiST: XML Document Filtering by Sequencing Twig Patterns (가지형 패턴의 시퀀스화를 이용한 XML 문서 필터링)

  • Kwon Joon-Ho;Rao Praveen;Moon Bong-Ki;Lee Suk-Ho
    • Journal of KIISE:Databases
    • /
    • v.33 no.4
    • /
    • pp.423-436
    • /
    • 2006
  • In recent years, publish-subscribe (pub-sub) systems based on XML document filtering have received much attention. In a typical pub-sub system, subscribing users specify their interest in profiles expressed in the XPath language, and each new content is matched against the user profiles so that the content is delivered only to the interested subscribers. As the number of subscribed users and their profiles can grow very large, the scalability of the system is critical to the success of pub-sub services. In this paper, we propose a novel scalable filtering system called FiST(Filtering by Sequencing Twigs) that transforms twig patterns expressed in XPath and XML documents into sequences using Prufer's method. As a consequence, instead of matching linear paths of twig patterns individually and merging the matches during post-processing, FiST performs holistic matching of twig patterns with incoming documents. FiST organizes the sequences into a dynamic hash based index for efficient filtering. We demonstrate that our holistic matching approach yields lower filtering cost and good scalability under various situations.

XML Document Filtering based on Segments (세그먼트 기반의 XML 문서 필터링)

  • Kwon, Joon-Ho;Rao, Praveen;Moon, Bong-Ki;Lee, Suk-Ho
    • Journal of KIISE:Databases
    • /
    • v.35 no.4
    • /
    • pp.368-378
    • /
    • 2008
  • In recent years, publish-subscribe (pub-sub) systems based on XML document filtering have received much attention. In a typical pub-sub system, subscribed users specify their interest in profiles expressed in the XPath language, and each new content is matched against the user profiles so that the content is delivered to only the interested subscribers. As the number of subscribed users and their profiles can grow very large, the scalability of the system is critical to the success of pub-sub services. In this paper, we propose a fast and scalable XML filtering system called SFiST which is an extension of the FiST system. Sharable segments are extracted from twig patterns and stored into the hash-based Segment Table in SFiST system. Segments are used to represent user profiles as Terse Sequences and stored in the Compact Segment Index during filtering. Our experimental study shows that SFiST system has better performance than FiST system in terms of filtering time and memory usage.

Two-Dimensional Grouping Index for Efficient Processing of XML Filtering Queries (XML 필터링 질의의 효율적 처리를 위한 이차원 그룹핑 색인기법)

  • Yeo, Dae-Hwi;Lee, Jong-Hak
    • Journal of Information Technology and Architecture
    • /
    • v.10 no.1
    • /
    • pp.123-135
    • /
    • 2013
  • This paper presents a two-dimensional grouping index(2DG-index) for efficient processing of XML filtering queries. Recently, many index techniques have been suggested for the efficient processing of structural relationships among the elements in the XML database such as an ancestor- descendant and a parent-child relationship. However, these index techniques focus on simple path queries, and don't consider the path queries that include a condition value for filtering. The 2DG-index is an index structure that deals with the problem of clustering index entries in the twodimensional domain space that consists of a XML path identifier domain and a filtering data value domain. For performance evaluation, we have compared our proposed 2DG-index with the conventional one dimensional index structure such as the data grouping index (DG-index) and the path grouping index (PG-index). As the result of the performance evaluations, we have verified that our proposed 2DG-index can efficiently support the query processing in XML databases according to the query types.

PrimeFilter: An Efficient XML Data Filtering based on Prime Number Indexing (PrimeFilter: 소수 인덱싱 기법에 기반한 효율적 XML 데이타 필터링)

  • Kim, Jae-Hoon;Kim, Sang-Wook;Park, Seog
    • Journal of KIISE:Databases
    • /
    • v.35 no.5
    • /
    • pp.421-431
    • /
    • 2008
  • Recently XML is becoming a de facto standard for online data exchange between heterogeneous systems and also the research of streaming XML data filtering comes into the spotlight. Since streaming XML data filtering technique needs rapid matching of queries with XML data, it is required that the query processing should be efficiently performed. Until now, most of researches focused only on partial sharing of path expressions or efficient predicate processing and they were work for time and space efficiency. However, if containment relationship between queries is previously calculated and the lowest level query is matched with XML data, we can easily get a result that high level queries can match with the XML data without any other processing. That is, using this containment technique can be another optimal solution for streaming XML data filtering. In this paper, we suggest an efficient XML data filtering based on prime number indexing and containment relationship between queries. Through some experimental results, we present that our suggested method has a better performance than the existing method. All experiments have shown that our method has a more than two times better performance even though each experiment has its own distinct test purpose.

SemFilter: A Simple and Efficient Semantic XML Message Filtering (SemFilter: 단순하며 효율적인 시맨틱 XML 메시지 필터링)

  • Kim, Jae-Hoon;Park, Seog
    • Journal of KIISE:Computing Practices and Letters
    • /
    • v.14 no.7
    • /
    • pp.680-693
    • /
    • 2008
  • Recent studies on XML filtering assume that all data sources follow a single global schema defined in a filtering system. However, beyond this simple assumption, a filtering system can provide a service that allows data publishers to have their own schema; hence, the data sources will become heterogeneous. The number of data sources is expected to be large in a filtering system and the data sources are frequently published, updated, and disappeared, that is, dynamic. In this paper, we introduce implementing a simple and efficient XPath query translation method for such a dynamic environment. The method is especially targeted for a query which is composed based only on users' knowledge and experience without a graphical guidance of the global schema. When a user queries a large number of heterogeneous data, there is a high possibility that the query is not consistent with the same local schema assumed by the user. Our query translation method also supports a function for this problem. Some experimental results for query translation performance have shown that our method has reasonable performance, and is more practical than the existing method.

WFilter (Weighted Filter) for XML filtering (XML 필터링을 위한 WFilter(Weighted Filter))

  • 최정필;최오훈;백두권
    • Proceedings of the Korean Information Science Society Conference
    • /
    • 2003.10c
    • /
    • pp.253-255
    • /
    • 2003
  • XML 문서를 비롯하여 인터넷을 통해 교환되는 문서의 비약적인 증가로 인하여, 불필요한 문서에 대한 필터링 및 문서 내의 데이터를 필터링하여 정보를 선택적으로 사용하고자 하는 사용자의 요구가 증대되었다. 기존 XML 필터링 방식은 질의 구조에 의존적이기 때문에, 질의 증가에 따른 필터링 인덱스 구성 및 유지의 문제점을 야기할 수 있다. 본 논문에서는 정보 추출 분야에서 널리 사용되는 단어 벡터의 개념을 사용하여 선택적으로 질의에 가중치를 주어 데이터를 효율적으로 추출할 수 있는 XML WFilter (Weighted Filtering) 기법을 제안한다.

  • PDF

A Keyword-based Filtering Technique of Document-centric XML using NFA Representation (NFA 표현을 사용한 문서-중심적 XML의 키워드 기반 필터링 기법)

  • Lee, Kyoung-Han;Park, Seog
    • Journal of KIISE:Databases
    • /
    • v.33 no.5
    • /
    • pp.437-452
    • /
    • 2006
  • In this paper, we propose an extended XPath specification which includes a special matching character '%' used in the LIKE operation of SQL in order to solve the difficulty of writing some queries to filter element contents well, using the previous XPath specification. We also present a novel technique for filtering a collection of document-centric XMLs, called Pfilter, which is able to exploit the extended XPath specification. Owing to sharing the common prefix characters of the operands in value-based predicates, the Pfilter improves the performance in processing those. We show several performance studies, comparing Pfilter with Yfilter in respect to efficiency and scalability as using multi-query processing time (MQPT), and reporting the results with respect to inserting, deleting, and processing of value-based predicates. In conclusion, our approach provides a core algorithm for evaluating the contains() function of XPath queries in previous XML filtering researches, and a foundation for building XML-based distributed information systems.

Document Filtering Algorithm for Efficient Preprocessing of XML Information Retrieval (XML 정보검색의 효율적 전처리를 위한 문서여과 알고리즘)

  • Kong Yong-Hae;Kim Myung-Sook
    • Journal of the Korea Academia-Industrial cooperation Society
    • /
    • v.6 no.1
    • /
    • pp.1-11
    • /
    • 2005
  • The paper proposes a preprocessing method for efficient processing of XML queries in information retrieval with a large amount of XML documents. The conventional preprocessing methods filter out XML documents by parsing XML document for keyword of query or by comparing query signatures with signatures of XML document to be generated. But these methods are dependent on a query and are very in efficient for a large amount of XML documents. For this, we generate a universal DTD based on ontology of a domain. The universal DTD is applicable to the XML documents when they contain information of a same domain even when they have different structures and attributes. Then, using the universal DTD, we filter out the XML documents that are not bounded in the domain. We evaluate the performance of this method through experiments.

  • PDF

A Framework of XML Materialized Views Using Incremental Refresh (점진적 갱신에 기반을 둔 XML 형성뷰 관리 프레임워크)

  • Im, Jae-Guk;Gang, Hyeon-Cheol;Seo, Sang-Gu
    • The KIPS Transactions:PartD
    • /
    • v.8D no.4
    • /
    • pp.327-338
    • /
    • 2001
  • The view can provide the user an appropriate portion of the database through data integration and filtering. Views can be materialized for query performance improvement, and in that cse, their consistency needs to be maintained against the updates of the underlying data. They can be either recomputed or incrementally refreshed by reflecting the relevant updates. Since XML could represent the structural information of the documents, for the XML materialized views, new techniques that differ from the previous ones for incrementally refreshing the relational views are required. In this paper, we propose a framework of XML materialized view management where the XML view derived from the underlying XML documents are materialized and incrementally refreshed against the updates of the underlying documents.

  • PDF

Design of Filtering Agent Interface using XML E-Mail System (XML 이메일 시스템의 필터링 에이전트 인터페이스 설계)

  • Jeong, Ok-Ran;Cho, Dong-Sub
    • Proceedings of the KIEE Conference
    • /
    • 2002.11c
    • /
    • pp.476-480
    • /
    • 2002
  • 인터넷의 발달로 인하여 웹을 통한 문서 송수신이 많아지면서 종래의 인쇄 매체 상에 기술된 문서들은 점차 전자문서화 되기 시작했다. 이러한 문서들을 서로 다른 시스템 사이에서 상호 교환하기 위해서는 사용자가 원하는 논리적 구조를 태그로 구현할 수 있는 정형화된 문서 형태가 필요하다. 또한 이메일을 통한 개인적 정보를 얻고 또한 메일의 양이 갈수록 늘어나는 상황에서 카테고리별 자동 분류를 할 수 있는 에이전트가 현안이 되고 있다. 본 논문에서는 XML 형식의 메일에 XSL 문서를 임베디드하여 보내는 XML 이메일 시스템을 설계하여, 본 시스템을 이용하여 본문 내용을 카테고리별 자동 분류해주는 필터링 에이전트 인터페이스(Filtering Agent Interface)를 제안하고자 한다. XML 메일 서버를 통하여 수신된 메일은 XML과 XSL 형식에 따라 XML 메일 데이터베이스에 따로 저장되기 때문에 분석이 매우 용이하다는 장점을 이용하였다.

  • PDF