Research on Keyword-Overlap Similarity Algorithm Optimization in Short English Text Based on Lexical Chunk Theory

Na Li, Cheng Li, Honglie Zhang

doi:10.3745/JIPS.02.0205

Journal of Information Processing Systems

Volume 19, Issue 5, Pages 631-640, 2023

pISSN 1976-913X, eISSN 2092-805X

Korea Information Processing Society (한국정보처리학회)


Na Li (Public Foreign Language Teaching and Research Department, Qiqihar University)
Cheng Li (College of Computer and Control Engineering, Qiqihar University)
Honglie Zhang (College of Computer and Control Engineering, Qiqihar University)

Received : 2022.12.14
Accepted : 2023.02.26
Published : 2023.10.31

https://doi.org/10.3745/JIPS.02.0205

Abstract

Short-text similarity calculation is an active topic in natural language processing research. Conventional keyword-overlap similarity algorithms consider only lexical item information and neglect the effect of word order. Some optimized variants incorporate word order, but their weights are hard to determine. Building on the keyword-overlap similarity algorithm, this paper proposes a short English text similarity algorithm based on lexical chunk theory (LC-SETSA), which introduces lexical chunk theory, a concept from cognitive psychology, into short English text similarity calculation for the first time. Lexical chunks are used to segment short English texts; the segmentation results preserve the semantic connotation and the fixed word order of the lexical chunks, and the overlap similarity of the lexical chunks is then calculated accordingly. Finally, comparative experiments are carried out, and the results show that the proposed algorithm is feasible, stable, and effective to a large extent.
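The contrast the abstract draws between word-level and chunk-level overlap can be illustrated with a minimal sketch. It is not the LC-SETSA algorithm itself, whose formulas are not given on this page. The chunk lexicon `CHUNKS`, the greedy longest-match segmentation, and the use of the Dice coefficient as the overlap measure are all assumptions made for illustration.

```python
from typing import List, Sequence, Set, Tuple

# Hypothetical chunk lexicon, for illustration only (not taken from the paper).
CHUNKS: Set[Tuple[str, ...]] = {
    ("as", "a", "matter", "of", "fact"),
    ("take", "care", "of"),
    ("look", "forward", "to"),
}
MAX_CHUNK_LEN = max(len(c) for c in CHUNKS)


def tokenize(text: str) -> List[str]:
    """Lowercase, split on whitespace, and strip surrounding punctuation."""
    words = (w.strip(".,!?;:") for w in text.lower().split())
    return [w for w in words if w]


def segment(tokens: Sequence[str]) -> List[Tuple[str, ...]]:
    """Greedy longest-match segmentation: known chunks first, else single words."""
    units: List[Tuple[str, ...]] = []
    i = 0
    while i < len(tokens):
        for n in range(min(MAX_CHUNK_LEN, len(tokens) - i), 0, -1):
            candidate = tuple(tokens[i:i + n])
            if n == 1 or candidate in CHUNKS:
                units.append(candidate)
                i += n
                break
    return units


def dice(a: set, b: set) -> float:
    """Dice overlap coefficient (assumed here as the overlap measure)."""
    if not a and not b:
        return 0.0
    return 2 * len(a & b) / (len(a) + len(b))


def word_overlap(s1: str, s2: str) -> float:
    """Plain keyword overlap: word order is ignored entirely."""
    return dice(set(tokenize(s1)), set(tokenize(s2)))


def chunk_overlap(s1: str, s2: str) -> float:
    """Overlap over chunk units: a chunk matches only with its word order intact."""
    return dice(set(segment(tokenize(s1))), set(segment(tokenize(s2))))


if __name__ == "__main__":
    s1 = "I take care of the dog"
    s2 = "care of the dog I take"   # same words, scrambled order
    print(word_overlap(s1, s2))     # 1.0
    print(chunk_overlap(s1, s2))    # 0.6
```

In this sketch the two sentences share every word, so the word-level score is 1.0. The chunk "take care of" survives only in the first sentence, so the chunk-level score drops to 0.6 without any separately tuned word-order weight.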

Acknowledgement

This research was funded by the Education Department of Heilongjiang Province of China (Grant No. 135309463 and 135509118).

