http://dx.doi.org/10.3837/tiis.2021.11.007

Question Similarity Measurement of Chinese Crop Diseases and Insect Pests Based on Mixed Information Extraction  

Zhou, Han (College of Information and Electrical Engineering, China Agricultural University)
Guo, Xuchao (College of Information and Electrical Engineering, China Agricultural University)
Liu, Chengqi (College of Information and Electrical Engineering, China Agricultural University)
Tang, Zhan (College of Information and Electrical Engineering, China Agricultural University)
Lu, Shuhan (University of Michigan)
Li, Lin (College of Information and Electrical Engineering, China Agricultural University)
Publication Information
KSII Transactions on Internet and Information Systems (TIIS) / v.15, no.11, 2021, pp. 3991-4010
Abstract
The Question Similarity Measurement of Chinese Crop Diseases and Insect Pests (QSM-CCD&IP) aims to judge what a user intends to ask when they input a question. This measurement is the basis of agricultural knowledge question answering (Q&A) systems, information retrieval, and other tasks. However, the corpora and measurement methods available in this field have deficiencies. In addition, general sentence-embedding methods ignore word boundary features and local context information, which can cause error propagation. These factors make the task challenging. To address these problems, this work first establishes a corpus of Chinese crop diseases and insect pests (CCDIP), which contains 13 categories. Then, taking the CCDIP as the research object, it proposes a Chinese agricultural text similarity matching model, AgrCQS, based on mixed information extraction. Specifically, the hybrid embedding layer enriches character information and improves the model's ability to recognize word boundaries. A multi-core convolutional neural network based on multi-weight (MM-CNN) extracts multi-scale local information. The self-attention mechanism enhances the model's ability to fuse global information. The performance of AgrCQS is verified on the CCDIP and on three benchmark datasets, namely AFQMC, LCQMC, and BQ. The accuracy rates are 93.92%, 74.42%, 86.35%, and 83.05%, respectively, which are higher than those of the baseline systems, without using any external knowledge. Additionally, the proposed modules can be extracted separately and applied to other models, thus providing a reference for related research.
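The pipeline outlined in the abstract (multi-scale local feature extraction followed by self-attention over the whole sentence) can be illustrated with a small NumPy sketch. This is not the authors' AgrCQS implementation: the hybrid embedding layer is replaced by random character vectors, and the kernel widths (1, 3, 5), the softmax mixing of scales, the mean pooling, and the cosine score are all assumptions chosen only to make the idea concrete.

```python
import numpy as np


def multi_scale_conv(x, kernels, scale_weights):
    """Mix 1D convolutions of several widths (illustrative stand-in for MM-CNN).

    x: (seq_len, dim) character embeddings.
    kernels: list of arrays shaped (width, dim, dim), each with odd width.
    scale_weights: (len(kernels),) unnormalized mixing weights (assumed design).
    """
    seq_len, dim = x.shape
    # Softmax over the per-scale weights so the mixture sums to one.
    mix = np.exp(scale_weights - scale_weights.max())
    mix /= mix.sum()
    out = np.zeros((seq_len, dim))
    for weight, kernel in zip(mix, kernels):
        width = kernel.shape[0]
        pad = width // 2
        padded = np.pad(x, ((pad, pad), (0, 0)))  # zero-pad along the sequence
        conv = np.zeros((seq_len, dim))
        for t in range(seq_len):
            window = padded[t:t + width]  # (width, dim) local context
            conv[t] = np.einsum("wd,wde->e", window, kernel)
        out += weight * np.maximum(conv, 0.0)  # ReLU, then weighted sum
    return out


def self_attention(h):
    """Scaled dot-product self-attention with queries = keys = values = h."""
    scores = h @ h.T / np.sqrt(h.shape[1])
    scores -= scores.max(axis=1, keepdims=True)  # numerical stability
    attn = np.exp(scores)
    attn /= attn.sum(axis=1, keepdims=True)
    return attn @ h


def encode(x, kernels, scale_weights):
    """Local features, then global fusion, then mean pooling to one vector."""
    local = multi_scale_conv(x, kernels, scale_weights)
    return self_attention(local).mean(axis=0)


def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))


rng = np.random.default_rng(0)
dim = 8
kernels = [rng.normal(size=(w, dim, dim)) * 0.1 for w in (1, 3, 5)]
scale_weights = np.zeros(len(kernels))  # equal mixing before any training

# Two hypothetical questions of 6 and 9 characters, already embedded.
q1 = rng.normal(size=(6, dim))
q2 = rng.normal(size=(9, dim))
v1 = encode(q1, kernels, scale_weights)
v2 = encode(q2, kernels, scale_weights)
score = cosine(v1, v2)
```

With untrained random weights the score carries no meaning; the sketch only shows how questions of different lengths are reduced to fixed-size vectors that can be compared. In a trained model the kernels and mixing weights would be learned, and the comparison would more likely be a classifier over the pair than a raw cosine.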
Keywords
Text semantic similarity; Short text similarity; Agricultural natural language processing; Chinese word segmentation