A Text Similarity Measurement Method Based on Singular Value Decomposition and Semantic Relevance

Li, Xu; Yao, Chunlong; Fan, Fenglong; Yu, Xiaoqiang

doi:10.3745/JIPS.02.0067

Journal of Information Processing Systems

Volume 13, Issue 4, pp. 863-875, 2017
ISSN 1976-913X (print); 2092-805X (online)

Korea Information Processing Society (한국정보처리학회)

Li, Xu (School of Information Science and Engineering, Dalian Polytechnic University) ;
Yao, Chunlong (School of Information Science and Engineering, Dalian Polytechnic University) ;
Fan, Fenglong (School of Information Science and Engineering, Dalian Polytechnic University) ;
Yu, Xiaoqiang (School of Information Science and Engineering, Dalian Polytechnic University)

Received : 2016.02.15
Accepted : 2017.03.14
Published : 2017.08.31

https://doi.org/10.3745/JIPS.02.0067

Abstract

Traditional text similarity measures based on word-frequency vectors ignore the semantic relationships between words; together with the high dimensionality and sparsity of document vectors, this is an obstacle to accurate text similarity calculation. To address these problems, an improved singular value decomposition (SVD) is used to reduce the dimensionality of the text representation model and remove noise from it. The optimal number of singular values is analyzed, and the semantic relevance between words is then calculated in the constructed semantic space. An inverted index construction algorithm and similarity definitions between vectors are proposed to calculate the similarity between two documents at the semantic level. Experimental results on a benchmark corpus demonstrate that the proposed method improves the F-measure.
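The general idea behind the SVD step can be illustrated with classic latent semantic analysis. This is a minimal sketch under stated assumptions, not the authors' method: it uses plain truncated SVD (the paper's "improved" SVD is not described here), plain cosine similarity instead of the paper's own similarity definitions, and no inverted index. The toy corpus, the `choose_k` cumulative-energy heuristic for picking the number of singular values, and all function names are hypothetical.

```python
import numpy as np

# Hypothetical toy term-document count matrix (rows = terms, columns = documents).
# Not data from the paper's benchmark corpus.
TERMS = ["car", "automobile", "engine", "flower", "petal"]
A = np.array([
    # d0  d1  d2  d3
    [1,   0,  0,  0],   # car
    [0,   1,  0,  0],   # automobile
    [1,   1,  0,  0],   # engine
    [0,   0,  1,  1],   # flower
    [0,   0,  1,  0],   # petal
], dtype=float)


def cosine(x, y):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    nx, ny = np.linalg.norm(x), np.linalg.norm(y)
    if nx == 0.0 or ny == 0.0:
        return 0.0
    return float(np.dot(x, y) / (nx * ny))


def choose_k(s, energy=0.8):
    """Hypothetical heuristic: smallest k whose singular values keep `energy`
    of sum(s**2). Not necessarily the criterion analyzed in the paper."""
    ratios = np.cumsum(s ** 2) / np.sum(s ** 2)
    # searchsorted finds the first index with ratio >= energy; clip for float round-off.
    return min(int(np.searchsorted(ratios, energy)) + 1, len(s))


def lsa(A, k):
    """Plain truncated SVD A ~ U_k diag(s_k) V_k^T.
    Returns term coordinates (rows of U_k * s_k) and
    document coordinates (rows of V_k * s_k) in the k-dimensional space."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    terms = U[:, :k] * s[:k]      # one row per term
    docs = Vt[:k, :].T * s[:k]    # one row per document
    return terms, docs


s = np.linalg.svd(A, compute_uv=False)   # singular values only
k = choose_k(s, energy=0.8)
terms, docs = lsa(A, k)

car, auto = TERMS.index("car"), TERMS.index("automobile")
print("k =", k)                                                        # k = 2
print("raw sim(car, automobile) =", round(cosine(A[car], A[auto]), 3))          # 0.0
print("LSA sim(car, automobile) =", round(cosine(terms[car], terms[auto]), 3))  # 1.0
print("raw sim(d0, d1) =", round(cosine(A[:, 0], A[:, 1]), 3))        # 0.5
print("LSA sim(d0, d1) =", round(cosine(docs[0], docs[1]), 3))        # 1.0
print("LSA sim(d0, d2) =", round(cosine(docs[0], docs[2]), 3))
```

In this toy matrix, "car" and "automobile" never co-occur, so their raw cosine is 0, yet both co-occur with "engine". After rank-2 truncation they fall on the same latent dimension, which is the kind of word-level semantic relevance the abstract refers to; documents d0 and d1 likewise become near-identical, while the flower documents stay unrelated.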

