Tweet Entity Linking Method based on User Similarity for Entity Disambiguation

개체 중의성 해소를 위한 사용자 유사도 기반의 트윗 개체 링킹 기법

  • Received : 2016.06.07
  • Accepted : 2016.07.06
  • Published : 2016.09.15

Abstract

Entity linking methods designed for web documents cannot be applied directly to tweets, because tweets are much shorter than web documents. Tweet entity linking therefore draws on information about the user or about groups of users. However, users with too few tweets still suffer from data sparseness, and information from unrelated groups can degrade linking accuracy. To address these problems, we consider three features: the semantic relatedness within a single tweet, the user's own tweet set, and the tweet sets of other users. We further improve linking performance by assigning higher weights to the tweets of users who are highly similar to the target user. Experiments on real Twitter data show that the proposed method outperforms existing methods, and that using information from highly similar users is associated with both alleviating data sparseness and improving linking accuracy.
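To make the idea in the abstract concrete, the following is a minimal sketch of how three such features could be combined into one score per candidate entity. It is not the formulation used in the paper. The cosine similarity over entity-count profiles, the relative-frequency features, the weights `(0.4, 0.3, 0.3)`, and all function and variable names are assumptions chosen for illustration only.

```python
from collections import Counter
from math import sqrt


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity between two entity-count profiles (assumed user similarity)."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def relative_frequency(entity: str, profile: Counter) -> float:
    """Share of a profile's entity mentions that refer to `entity`."""
    total = sum(profile.values())
    return profile[entity] / total if total else 0.0


def score_candidates(candidates, tweet_relatedness, user_profile,
                     other_profiles, weights=(0.4, 0.3, 0.3)):
    """Combine three hypothetical features into one score per candidate.

    tweet_relatedness: candidate -> relatedness within the single tweet
                       (assumed to be precomputed, in [0, 1]).
    user_profile:      entity counts from the user's own earlier tweets.
    other_profiles:    entity counts from other users' tweets; each is
                       weighted by its similarity to the target user.
    """
    w_tweet, w_user, w_others = weights
    sims = [cosine_similarity(user_profile, p) for p in other_profiles]
    sim_total = sum(sims)
    scores = {}
    for entity in candidates:
        if sim_total > 0:
            # Similarity-weighted average: similar users contribute more,
            # unrelated users (similarity 0) contribute nothing.
            others = sum(s * relative_frequency(entity, p)
                         for s, p in zip(sims, other_profiles)) / sim_total
        else:
            others = 0.0
        scores[entity] = (w_tweet * tweet_relatedness.get(entity, 0.0)
                          + w_user * relative_frequency(entity, user_profile)
                          + w_others * others)
    return scores


# Toy example: a user with a sparse history mentions "apple".
candidates = ["Apple Inc.", "Apple (fruit)"]
tweet_relatedness = {"Apple Inc.": 0.5, "Apple (fruit)": 0.5}  # tweet alone is ambiguous
user_profile = Counter({"iPhone": 1})                          # only one earlier entity
other_profiles = [
    Counter({"iPhone": 5, "Apple Inc.": 4}),      # similar user (shares "iPhone")
    Counter({"Recipe": 6, "Apple (fruit)": 5}),   # unrelated user
]

scores = score_candidates(candidates, tweet_relatedness, user_profile, other_profiles)
best = max(scores, key=scores.get)
print(best)
```

In this toy case the tweet itself and the user's own sparse history cannot separate the two candidates, so the decision rests on the similar user's profile, while the unrelated user has similarity zero and no influence. This mirrors the motivation stated in the abstract, under the assumptions listed above.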

Acknowledgement

Supported by: National Research Foundation of Korea
