Summarizing the Differences in Chinese-Vietnamese Bilingual News

  • Wu, Jinjuan (Dept. of Yunnan Key Laboratory of Artificial Intelligence, Kunming University of Science and Technology) ;
  • Yu, Zhengtao (Dept. of Yunnan Key Laboratory of Artificial Intelligence, Kunming University of Science and Technology) ;
  • Liu, Shulong (Dept. of Yunnan Key Laboratory of Artificial Intelligence, Kunming University of Science and Technology) ;
  • Zhang, Yafei (Dept. of Yunnan Key Laboratory of Artificial Intelligence, Kunming University of Science and Technology) ;
  • Gao, Shengxiang (Dept. of Yunnan Key Laboratory of Artificial Intelligence, Kunming University of Science and Technology)
  • Received : 2017.11.06
  • Accepted : 2018.07.29
  • Published : 2019.12.31

Abstract

Summarizing the differences in Chinese-Vietnamese bilingual news supports the comparative analysis of news viewpoints between China and Vietnam. To address the cross-language problems that arise when analyzing these differences, we propose a new difference summarization method based on an undirected graph model. The method extracts elements to represent sentences and bridges the two languages using Wikipedia's multilingual concept description pages. First, we calculate the similarity between Chinese and Vietnamese news sentences and filter the bilingual sentences accordingly. Then, we construct an undirected graph with the filtered sentences as nodes and the similarity grade as the edge weight. Finally, using a random walk algorithm, we calculate each node's weight from the edge weights and extract the sentences with the highest weights as the difference summary. Experimental results show that the proposed approach achieved the highest score of 0.1837 on the annotated test set, outperforming state-of-the-art summarization models.
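The graph construction and random walk steps outlined in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the cross-lingual sentence similarities are already available as a symmetric matrix, and it substitutes a PageRank-style random walk with a damping factor, since the abstract does not specify the exact walk. The names `rank_sentences`, `top_sentences`, `min_similarity`, and `damping` are hypothetical and chosen only for this illustration.

```python
import numpy as np


def rank_sentences(similarity, min_similarity=0.0, damping=0.85,
                   tol=1e-8, max_iter=200):
    """Score sentence nodes with a PageRank-style random walk.

    similarity: square matrix of pairwise sentence similarities (assumed given).
    min_similarity: hypothetical filter threshold; weaker edges are dropped.
    damping: assumed damping factor, not taken from the paper.
    """
    w = np.asarray(similarity, dtype=float)
    w = (w + w.T) / 2.0               # undirected graph: enforce symmetry
    np.fill_diagonal(w, 0.0)          # no self-loops
    w[w < min_similarity] = 0.0       # filter out weak edges
    n = w.shape[0]

    # Row-normalize edge weights into transition probabilities.
    # Isolated nodes (zero row sum) jump uniformly to any node.
    row_sums = w.sum(axis=1, keepdims=True)
    safe_sums = np.where(row_sums > 0, row_sums, 1.0)
    transition = np.where(row_sums > 0, w / safe_sums, 1.0 / n)

    scores = np.full(n, 1.0 / n)
    for _ in range(max_iter):
        updated = (1.0 - damping) / n + damping * (transition.T @ scores)
        converged = np.abs(updated - scores).sum() < tol
        scores = updated
        if converged:
            break
    return scores


def top_sentences(sentences, similarity, k=2):
    """Return the k highest-scoring sentences in their original order."""
    scores = rank_sentences(similarity)
    best = np.argsort(-scores)[:k]
    return [sentences[i] for i in sorted(best)]
```

Calling `top_sentences(sentences, similarity, k)` on a list of filtered sentences and their similarity matrix returns the `k` highest-weighted sentences. How the extracted sentences are split between the Chinese and Vietnamese sides, and how "difference" is encoded in the similarity grade, are left open here because the abstract does not describe them.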

Acknowledgement

This work was supported by the National Key Research and Development Plan (Nos. 2018YFC0830105, 2018YFC0830100), the National Natural Science Foundation of China (Nos. 61732005, 61672271, 61761026, 61662041, 61762056), the High-tech Industry Development Project of Yunnan Province (No. 201606), and the Natural Science Foundation of Yunnan Province (No. 2018FB104).
