1 |
C. Li, J. Lu, and Y. Lu, "Efficient Merging and Filtering Algorithms for Approximate String Searches," Proc. the International Conference on Data Engineering, pp. 257-266, 2008.
|
2 |
A. Chowdhury, O. Frieder, D. Grossman, and M.C. McCabe, "Collection Statistics for Fast Duplicate Document Detection," ACM Transactions on Information Systems, Vol. 20, No. 2, pp. 171-191, 2002.
|
3 |
N. Shivakumar and H. Garcia-Molina, "Finding Near-replicas of Documents on the Web," International Workshop on the World Wide Web and Databases, pp. 204-212, 1998.
|
4 |
H. Yang and J. Callan, "Near-duplicate Detection by Instance-level Constrained Clustering," Proc. the International ACM SIGIR Conference, pp. 421-428, 2006.
|
5 |
L. Gravano, P.G. Ipeirotis, H.V. Jagadish, N. Koudas, S. Muthukrishnan, and D. Srivastava, "Approximate String Joins in a Database (almost) for Free," Proc. the International Conference on Very Large Data Bases, pp. 491-500, 2001.
|
6 |
S. Chaudhuri, V. Ganti, and R. Kaushik, "A Primitive Operator for Similarity Joins in Data Cleaning," Proc. the International Conference on Data Engineering, pp. 5-16, 2006.
|
7 |
S. Sarawagi and A. Kirpal, "Efficient Set Joins on Similarity Predicates," Proc. the ACM SIGMOD International Conference, pp. 743-754, 2004.
|
8 |
L. Huang, L. Wang, and X. Li, "Achieving Both High Precision and High Recall in Near-duplicate Detection," Proc. the Conference on Information and Knowledge Management, pp. 63-72, 2008.
|
9 |
J. Dean and S. Ghemawat, "MapReduce: Simplified Data Processing on Large Clusters," USENIX Symposium on Operating Systems Design and Implementation, pp. 137-150, 2004.
|
10 |
Yahoo!, "Hadoop". http://hadoop.apache.org, 2013.
|
11 |
R.J. Bayardo, Y. Ma, and R. Srikant, "Scaling up All Pairs Similarity Search," Proc. International World Wide Web Conference, pp. 131-140, 2007.
|
12 |
J. Lin, "Brute Force and Indexed Approaches to Pairwise Document Similarity Comparisons with MapReduce," Proc. the International ACM SIGIR Conference, pp. 155-161, 2009.
|
13 |
T. Elsayed, J. Lin, and D. Oard, "Pairwise Document Similarity in Large Collections with MapReduce," Proc. Annual Meeting of the Association of Computational Linguistics, pp. 265-268, 2008.
|
14 |
M. Theobald, J. Siddharth, and A. Paepcke, "SpotSigs: Robust and Efficient Near Duplicate Detection in Large Web Collections," Proc. the International ACM SIGIR Conference, pp. 563-570, 2008.
|
15 |
C. Xiao, W. Wang, X. Lin, and J.X. Yu, "Efficient Similarity Joins for Near Duplicate Detection," Proc. International World Wide Web Conference, pp. 131-140, 2008.
|
16 |
J.W. Kim, K.S. Candan, and J. Tatemura, "Efficient Overlap and Content Reuse Detection in Blogs and Online News Articles," Proc. International World Wide Web Conference, pp. 81-90, 2009.
|
17 |
P. Indyk and R. Motwani, "Approximate Nearest Neighbors: Towards Removing the Curse of Dimensionality," ACM Symposium on the Theory of Computing, pp. 604-613, 1998.
|
18 |
A. Gionis, P. Indyk, and R. Motwani, "Similarity Search in High Dimensions via Hashing," Proc. the International Conference on Very Large Data Bases, pp. 518-529, 1999.
|
19 |
N. Beckmann, H.P. Kriegel, R. Schneider, and B. Seeger, "The R*-tree: An Efficient and Robust Access Method for Points and Rectangles," Proc. the ACM SIGMOD International Conference, pp. 322-331, 1990.
|
20 |
J.T. Robinson, "The K-D-B-tree: A Search Structure for Large Multidimensional Dynamic Indexes," Proc. the ACM SIGMOD International Conference, pp. 10-18, 1981.
|
21 |
S. Berchtold, C. Bohm, and H.P. Kriegel, "The Pyramid-Technique: Towards Breaking the Curse of Dimensionality," Proc. the ACM SIGMOD International Conference, pp. 142-153, 1998.
|
22 |
A. Arasu, V. Ganti, and R. Kaushik, "Efficient Exact Set-similarity Joins," Proc. the International Conference on Very Large Data Bases, pp. 918-929, 2006.
|
23 |
K. Chakrabarti and S. Mehrotra, "The Hybrid Tree: An Index Structure for High Dimensional Feature Spaces," Proc. the International Conference on Data Engineering, pp. 440-447, 1999.
|
24 |
A. Andoni and P. Indyk, "Near-optimal Hashing Algorithms for Approximate Nearest Neighbor in High Dimensions," Communications of the ACM, Vol. 51, No. 1, pp. 117-122, 2008.
|
25 |
J. Zobel, A. Moffat, and K. Ramamohanarao, "Inverted Files versus Signature Files for Text Indexing," ACM Transactions on Database Systems, Vol. 23, No. 4, pp. 453-490, 1998.
|
26 |
Google Blog Search. http://blogsearch.google.com/blogsearch, 2013.
|
27 |
Google News. http://news.google.com, 2013.
|
28 |
Workshop on the Weblogging Ecosystem: Aggregation, Analysis and Dynamics, 2006.
|
29 |
K.H. Hyun, "Video Matching Algorithm of Content-Based Video Copy Detection for Copyright Protection," Journal of Korea Multimedia Society, Vol. 11, No. 3, pp. 315-322, 2008.
|
30 |
D. Metzler, Y. Bernstein, W.B. Croft, A. Moffat, and J. Zobel, "Similarity Measures for Tracking Information Flow," Proc. the Conference on Information and Knowledge Management, pp. 517-524, 2005.
|
31 |
X. Chen, B. Francia, M. Li, and B. McKinnon, "Shared Information and Program Plagiarism Detection," IEEE Transactions on Information Theory, Vol. 50, No. 7, pp. 1545-1551, 2004.
|
32 |
N. Shivakumar and H. Garcia-Molina, "SCAM: A Copy Detection Mechanism for Digital Documents," Second Annual Conference on the Theory and Practice of Digital Libraries, 1995.
|
33 |
S. Schleimer, D.S. Wilkerson, and A. Aiken, "Winnowing: Local Algorithms for Document Fingerprinting," Proc. the ACM SIGMOD International Conference, pp. 76-85, 2003.
|
34 |
Y. Bernstein and J. Zobel, "A Scalable System for Identifying Co-derivative Documents," Proc. String Processing and Information Retrieval Symp., pp. 56-67, 2004.
|