Clustering of Web Objects with Similar Popularity Trends

Loh, Woong-Kee

doi:10.3745/KIPSTD.2008.15-D.4.485

The KIPS Transactions: Part D, Vol. 15-D, No. 4, pp. 485-494, 2008 (pISSN 1598-2866)

Korea Information Processing Society


Woong-Kee Loh (Division of Multimedia, Sungkyul University)

Published: August 29, 2008

Abstract

A huge number of web items, such as search keywords, images, and web pages, are being made widely available on the Web. The popularities of these web items change continuously over time, and mining temporal patterns in their popularities is an important problem for several web applications. For example, temporal patterns in the popularities of search keywords help web search companies predict which keywords will become popular, which in turn helps them set prices when selling search keywords to advertisers. However, the presence of millions of web items makes it difficult to scale previous techniques to this problem. This paper proposes an efficient method for mining temporal patterns in the popularities of web items. We treat the popularity of each web item as a time series and propose the gap measure to quantify the similarity between the popularities of two web items. To reduce the computational overhead of this measure, we present an efficient method based on the Fast Fourier Transform (FFT). We do not assume that the popularities of web items follow any particular probability distribution or are periodic. To find clusters of web items with similar popularity trends, we use a density-based clustering algorithm based on the gap measure. Experiments using the popularity trends of search keywords obtained from the Google Trends web site illustrate the scalability and usefulness of the proposed approach in real-world applications.
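The abstract does not define the gap measure or state any clustering parameters, so the following Python sketch does not reproduce the paper's method. It only illustrates, under stated assumptions, the two mechanisms the abstract names: (1) the standard FFT trick for scoring two equal-length series at every circular shift in O(n log n) time rather than O(n²), and (2) density-based clustering, here a plain DBSCAN, on top of the resulting distance. The distance `shift_distance` (minimum Euclidean distance between z-normalized series over all circular shifts) is a hypothetical stand-in for the gap measure; z-normalization is an assumed choice so that trends are compared by shape rather than absolute popularity. All function names, the toy series, and the `eps`/`min_pts` values are illustrative assumptions.

```python
import cmath
import math


def fft(a, invert=False):
    """Recursive radix-2 Cooley-Tukey FFT; len(a) must be a power of two.

    With invert=True this computes the unnormalized inverse transform
    (the caller divides by n).
    """
    n = len(a)
    if n == 1:
        return list(a)
    even = fft(a[0::2], invert)
    odd = fft(a[1::2], invert)
    sign = 1 if invert else -1
    out = [0j] * n
    for k in range(n // 2):
        twiddled = cmath.exp(sign * 2j * math.pi * k / n) * odd[k]
        out[k] = even[k] + twiddled
        out[k + n // 2] = even[k] - twiddled
    return out


def circular_cross_correlation(x, y):
    """Return c[k] = sum_t x[t] * y[(t + k) % n] for every shift k via FFT."""
    n = len(x)
    if n == 0 or n != len(y) or n & (n - 1):
        raise ValueError("series must be non-empty, equal length, power of two")
    X = fft([complex(v) for v in x])
    Y = fft([complex(v) for v in y])
    # conj(X) * Y in the frequency domain is cross-correlation in time
    spectrum = [xf.conjugate() * yf for xf, yf in zip(X, Y)]
    return [v.real / n for v in fft(spectrum, invert=True)]


def z_normalize(x):
    """Shift to mean 0 and scale to unit (population) standard deviation."""
    n = len(x)
    mean = sum(x) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in x) / n) or 1.0  # flat series
    return [(v - mean) / std for v in x]


def shift_distance(x, y):
    """Hypothetical stand-in for the gap measure (not defined in the abstract):
    the minimum Euclidean distance between z-normalized x and y over all
    circular shifts of y, using ||x - y_k||^2 = ||x||^2 + ||y||^2 - 2 c[k]."""
    x, y = z_normalize(x), z_normalize(y)
    sq_x = sum(v * v for v in x)
    sq_y = sum(v * v for v in y)
    best = min(sq_x + sq_y - 2.0 * c for c in circular_cross_correlation(x, y))
    return math.sqrt(max(0.0, best))  # clamp tiny negative rounding error


def dbscan(items, dist, eps, min_pts):
    """Plain DBSCAN: cluster ids 0, 1, ... per item, or -1 for noise."""
    n = len(items)
    # Each neighbourhood includes the point itself (distance 0).
    neighbours = [[j for j in range(n) if dist(items[i], items[j]) <= eps]
                  for i in range(n)]
    labels = [None] * n
    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        if len(neighbours[i]) < min_pts:
            labels[i] = -1  # noise for now; may become a border point later
            continue
        cluster += 1
        labels[i] = cluster
        seeds = list(neighbours[i])
        while seeds:
            j = seeds.pop()
            if labels[j] == -1:
                labels[j] = cluster  # former noise becomes a border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            if len(neighbours[j]) >= min_pts:  # only core points expand
                seeds.extend(neighbours[j])
    return labels


# Toy popularity series (illustrative, not data from the paper)
peak = [0, 1, 4, 9, 4, 1, 0, 0]
peak_later = [0, 0, 0, 1, 4, 9, 4, 1]  # same burst, two steps later
rising = [1, 2, 3, 4, 5, 6, 7, 8]
rising_more = [2, 3, 4, 5, 6, 7, 8, 9]
alternating = [1, 0, 1, 0, 1, 0, 1, 0]
items = [peak, peak_later, rising, rising_more, alternating]

labels = dbscan(items, shift_distance, eps=0.5, min_pts=2)
print(labels)  # [0, 0, 1, 1, -1]
```

In this toy run the two peaked series share a cluster despite the two-step offset, the two rising series share another, and the alternating series is reported as noise (-1). Note that the neighbourhood precomputation evaluates the distance for every pair of items, so for millions of web items this quadratic number of evaluations would itself become the bottleneck; the sketch makes no attempt to address that.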
