A Dynamic Signature Declustering Method using Signature Difference


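  • 강형일 (Department of Information and Communication Engineering, Chungbuk National University)
  • 강승헌 (Department of Information and Communication Engineering, Chungbuk National University)
  • 유재수 (School of Electrical and Electronic Engineering, Chungbuk National University)
  • 임병모 (Researcher, 한진정보통신)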
  • Published : 2000.03.31

Abstract

To process signature files in parallel, an effective signature file declustering method is needed. The Linear Code Decomposition Method (LCDM) used in the Hamming Filter performs well in some cases, but because it is static, it fails to decluster a signature file evenly when the signatures are skewed. It also suffers from limited scalability and non-determinism. In this paper we propose a new signature file declustering method, called the Inner-product method, which overcomes these problems of the LCDM. The Inner-product method declusters a signature file dynamically, based on the signature difference, which is computed from the inner product of signatures. We show through simulation experiments that the Inner-product method outperforms the LCDM under various data workloads.

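The abstract does not spell out how the signature difference is defined or how it drives placement, so the following is only a minimal sketch under stated assumptions, not the authors' algorithm. It assumes signatures are bit vectors stored as Python integers, that the inner product counts the positions where both signatures have a 1, and that the difference is the Hamming distance written in terms of inner products. The placement rule (`choose_disk`) and the helper names are hypothetical: among the least-loaded disks, a new signature goes to the disk whose stored signatures are least similar to it, so that similar signatures end up on different disks.

```python
from typing import List


def inner_product(a: int, b: int) -> int:
    """Inner product of two bit-vector signatures: count of shared 1 bits."""
    return bin(a & b).count("1")


def signature_difference(a: int, b: int) -> int:
    """Assumed definition: Hamming distance via inner products,
    <a,a> + <b,b> - 2<a,b>."""
    return inner_product(a, a) + inner_product(b, b) - 2 * inner_product(a, b)


def choose_disk(sig: int, disks: List[List[int]]) -> int:
    """Hypothetical greedy rule: keep loads balanced, then avoid
    co-locating similar signatures."""
    min_load = min(len(d) for d in disks)
    candidates = [i for i, d in enumerate(disks) if len(d) == min_load]

    def closest(i: int) -> float:
        # Smallest difference to anything already on disk i;
        # an empty disk is treated as infinitely different.
        if not disks[i]:
            return float("inf")
        return min(signature_difference(sig, s) for s in disks[i])

    # max() returns the first maximal candidate, so ties go to the lowest index.
    return max(candidates, key=closest)


def decluster(signatures: List[int], num_disks: int) -> List[List[int]]:
    """Place signatures one at a time, as they arrive (dynamic placement)."""
    disks: List[List[int]] = [[] for _ in range(num_disks)]
    for sig in signatures:
        disks[choose_disk(sig, disks)].append(sig)
    return disks


if __name__ == "__main__":
    layout = decluster([0b1100, 0b0011, 0b1100, 0b0011], num_disks=2)
    print(layout)
```

With the four signatures in the usage example, a plain round-robin assignment would put both copies of `0b1100` on the same disk, whereas this sketch places each pair of identical signatures on different disks, so a query matching them could be served by both disks in parallel.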
