http://dx.doi.org/10.3837/tiis.2021.11.017

A Generation-based Text Steganography by Maintaining Consistency of Probability Distribution

Yang, Boya (College of Information and Electrical Engineering, China Agricultural University)
Peng, Wanli (College of Information and Electrical Engineering, China Agricultural University)
Xue, Yiming (College of Information and Electrical Engineering, China Agricultural University)
Zhong, Ping (College of Science, China Agricultural University)

Publication Information

KSII Transactions on Internet and Information Systems (TIIS) / v.15, no.11, 2021, pp. 4184-4202

Abstract

Text steganography combined with natural language generation has become increasingly popular. Existing methods usually embed secret information in the generated words by controlling the sampling step during text generation. They construct a candidate pool with a greedy strategy and encode only high-probability words, which distorts the statistical distribution of the text and seriously weakens the security of the steganography. To reduce the influence of the candidate pool on statistical imperceptibility, we propose a steganography method based on a new sampling strategy. Rather than filling the candidate pool only with high-probability words, we select words whose probabilities differ relatively little from that of the word actually sampled from the language model, which keeps the pool consistent with the model's probability distribution. In addition, we encode the candidate words according to how similar their probabilities are to that of the target word, which further preserves the probability distribution. Experimental results show that the proposed method can outperform state-of-the-art steganographic methods in terms of security.
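
The sampling idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names (`build_pool`, `embed`, `extract`), the fixed pool size of 2^k words, the tie-breaking by word id, and the use of a shared random seed so that sender and receiver sample the same target word are all assumptions made for illustration. The next-word distribution `probs` stands in for the output of a real language model.

```python
import random


def build_pool(probs, target, bits_per_word):
    """Return 2**bits_per_word word ids, ranked by how close their
    probability is to the probability of the sampled target word."""
    size = 2 ** bits_per_word
    # Smallest probability gap first; ties are broken by word id (assumption).
    ranked = sorted(
        range(len(probs)),
        key=lambda w: (abs(probs[w] - probs[target]), w),
    )
    return ranked[:size]


def embed(probs, bits, rng, bits_per_word=2):
    """Pick the word whose rank in the pool equals the secret bits."""
    # Sample the target word from the model distribution itself.
    target = rng.choices(range(len(probs)), weights=probs, k=1)[0]
    pool = build_pool(probs, target, bits_per_word)
    return pool[int(bits, 2)]


def extract(probs, word, rng, bits_per_word=2):
    """Rebuild the same pool (same seed assumed) and read the bits back."""
    target = rng.choices(range(len(probs)), weights=probs, k=1)[0]
    pool = build_pool(probs, target, bits_per_word)
    return format(pool.index(word), f"0{bits_per_word}b")


if __name__ == "__main__":
    # Toy next-word distribution over a six-word vocabulary.
    probs = [0.40, 0.25, 0.15, 0.10, 0.06, 0.04]
    sender, receiver = random.Random(7), random.Random(7)
    word = embed(probs, "10", sender)
    print(word, extract(probs, word, receiver))
```

Because the pool is centred on a word drawn from the model's own distribution, low-probability words can enter the pool whenever the sampled target is itself a low-probability word, in contrast to a pool that always holds the top-ranked words.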

Keywords

Steganography; Steganalysis; Linguistic Steganography; Probability Distribution; Imperceptibility

Reference

1	S. Zhang, Z. Yang, J. Yang and Y. Huang, "Provably Secure Generative Linguistic Steganography," in Proc. of Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021, pp. 3046-3055, 2021.
2	T. Fang, M. Jaggi, and K. Argyraki, "Generating steganographic text with LSTMs," in Proc. of the 55th Annual Meeting of the Association for Computational Linguistics - Student Research Workshop, pp. 100-106, Jul. 2017.
3	B. Li, S. Tan, M. Wang, J. Huang, "Investigation on cost assignment in spatial image steganography," IEEE Transactions on Information Forensics and Security, vol. 9, no. 8, pp. 1264-1277, 2014.
4	T. Y. Tew and K. Wong, "An Overview of Information Hiding in H.264/AVC Compressed Video," IEEE Transactions on Circuits and Systems for Video Technology, vol. 24, no. 2, pp. 305-319, Feb. 2014.
5	Y. Xue, J. Zhou, H. Zeng, P. Zhong, J. Wen, "An adaptive steganographic scheme for H.264/AVC video with distortion optimization," Signal Processing: Image Communication, vol. 76, pp. 22-30, 2019.
6	K. Bennett, "Linguistic steganography: Survey, analysis, and robustness concerns for hiding information in text," 2004.
7	M. Chapman and G. Davida, "Hiding the hidden: A software system for concealing ciphertext as innocuous text," Information and Communications Security, pp. 335-345, 1997.
8	F. Dai and Z. Cai, "Towards Near-imperceptible Steganographic Text," in Proc. of the 57th Annual Meeting of the Association for Computational Linguistics, pp. 4303-4308, Jul. 2019.
9	A. L. Maas, R. E. Daly, P. T. Pham, D. Huang, A. Y. Ng, and C. Potts, "Learning word vectors for sentiment analysis," in Proc. of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, vol. 1, pp. 142-150, Jun. 2011.
10	D. P. Kingma and J. Ba, "Adam: A Method for Stochastic Optimization," arXiv preprint arXiv:1412.6980, 2014.
11	P. Wayner, "Mimic functions," Cryptologia, vol. 16, no. 3, pp. 193-214, 1992.
12	J. Wen, X. Zhou, P. Zhong, and Y. Xue, "Convolutional neural network-based text steganalysis," IEEE Signal Processing Letters, vol. 26, no. 3, pp. 460-464, 2019.
13	Z. Yang, S. Jin, Y. Huang, Y. Zhang, and H. Li, "Automatically generate steganographic text based on Markov model and Huffman coding," arXiv preprint arXiv:1811.04720, 2018.
14	X. Liao, J. Yin, M. Chen and Z. Qin, "Adaptive Payload Distribution in Multiple Images Steganography Based on Image Texture Features," IEEE Transactions on Dependable and Secure Computing, 2020.
15	J. Wen, X. Zhou, M. Li, P. Zhong, Y. Xue, "A novel natural language steganographic framework based on image description neural network," Journal of Visual Communication and Image Representation, vol. 61, pp. 157-169, May 2019.
16	Z. Chen, L. Huang, Z. Yu, W. Yang, L. Li, X. Zheng, X. Zhao, "Linguistic steganography detection using statistical characteristics of correlations between words," in Proc. of International Workshop on Information Hiding, pp. 224-235, 2008.
17	H. H. Moraldo, "An approach for text steganography based on Markov chains," arXiv preprint arXiv:1409.0915, 2014.
18	Z. Yang, X. Guo, Z. Chen, Y. Huang and Y. Zhang, "RNN-Stega: Linguistic Steganography Based on Recurrent Neural Networks," IEEE Transactions on Information Forensics and Security, vol. 14, no. 5, pp. 1280-1295, May 2019.
19	Z. Yang, S. Zhang, Y. Hu, Z. Hu and Y. Huang, "VAE-Stega: Linguistic Steganography Based on Variational Auto-Encoder," IEEE Transactions on Information Forensics and Security, vol. 16, pp. 880-895, 2021.
20	Z. M. Ziegler, Y. Deng, and A. M. Rush, "Neural linguistic steganography," in Proc. of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), pp. 1210-1215, Nov. 2019.
21	A. Go, R. Bhayani, and L. Huang, "Twitter sentiment classification using distant supervision," CS224N Project Report, Stanford, vol. 1, no. 12, 2009.
22	"kaggle," 2017. [Online]. Available: https://www.kaggle.com/snapcrack/all-the-news/data
23	P. Meng, L. Hang, W. Yang, Z. Chen and H. Zheng, "Linguistic Steganography Detection Algorithm Using Statistical Language Model," in Proc. of 2009 International Conference on Information Technology and Computer Science, pp. 540-543, 2009.
24	Y. Zhu, S. Lu, L. Zheng, J. Guo, W. Zhang, J. Wang, Y. Yu, "Texygen: A benchmarking platform for text generation models," in Proc. of The 41st International ACM SIGIR Conference on Research & Development in Information Retrieval, pp. 1097-1100, 2018.
25	Z. Yang, K. Wang, J. Li, Y. Huang and Y. Zhang, "TS-RNN: Text Steganalysis Based on Recurrent Neural Networks," IEEE Signal Processing Letters, vol. 26, no. 12, pp. 1743-1747, Dec. 2019.
26	J. Devlin et al., "BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding," in Proc. of NAACL-HLT, pp. 4171-4186, Jun. 2019.
27	J. Shen, H. Ji and J. Han, "Near-imperceptible Neural Linguistic Steganography via Self-Adjusting," in Proc. of EMNLP, pp. 303-313, 2020.
28	S. Kullback and R. A. Leibler, "On Information and Sufficiency," Ann. Math. Statist., vol. 22, no. 1, pp. 79-86, Mar. 1951.
29	Y. Luo and Y. Huang, "Text steganography with high embedding rate: Using recurrent neural networks to generate Chinese classic poetry," in Proc. of ACM Workshop on Information Hiding and Multimedia Security, pp. 99-104, Jun. 2017.