Text Watermarking Based on Syntactic Constituent Movement

Korean title: 구문요소의 전치에 기반한 문서 워터마킹 (Document Watermarking Based on the Preposing of Syntactic Constituents)

  • Mi-Young Kim (김미영), School of Computer and Information, Sungshin Women's University
  • Published : 2009.02.28

Abstract

This paper explores text watermarking for agglutinative languages and develops a scheme that moves syntactic constituents within a syntactic tree. Agglutinative languages are well suited to syntactic tree-based natural language watermarking because their constituent order is relatively free. The proposed method consists of seven steps. First, we construct a syntactic dependency tree of the unmarked text. Second, we segment the syntactic tree into clauses. Third, we choose the target syntactic constituents, each of which may move only within its own clause. Fourth, we determine the movement direction of each target constituent. Fifth, we embed a watermark bit for each target constituent. Sixth, if the watermark bit does not coincide with the movement direction of the target constituent, we displace the constituent in the syntactic tree. Finally, we generate the marked text from the modified syntactic tree. In our experiments, the coverage of the method is 91.53%, and the rate of unnatural sentences in the marked text is 23.16%, which is better than that of previous systems. The results also show that the marked text keeps the same style and conveys the same information without semantic distortion.

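This paper proposes a text watermarking method for Korean sentences based on the movement of syntactic constituents. Because constituent order is free in agglutinative languages such as Korean, they provide a good environment for syntactic tree-based natural language watermarking. The proposed method consists of seven steps. First, we parse the sentence syntactically. Second, we segment the syntactic tree into clauses, so that each constituent moves only within the scope of its own clause. Third, we select the target constituents for movement. Fourth, we determine the most natural position to move to, so that changes in the meaning or style of the sentence are minimized even after the target constituent has moved. Fifth, we embed a watermark bit for each target constituent. Sixth, if the watermark bit does not correspond to the movement direction of the target constituent, we move the target constituent in the syntactic tree. Finally, we obtain the watermarked text from the transformed syntactic tree. The experimental results show that the coverage of the proposed method is 91.53% and that the proportion of unnatural sentences among the final watermarked sentences is 23.16%, a better result than previous systems. The watermarked sentences also keep the same style as the original sentences and express the same information without semantic distortion.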
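
The core idea of embedding a bit through constituent order can be illustrated with a minimal Python sketch. It is not the paper's implementation: the abstract does not specify how a bit maps to a movement direction, so the sketch assumes a hypothetical convention in which a bit is 1 when the target constituent precedes an anchor constituent in its clause and 0 otherwise. The role labels (`SUBJ`, `ADV`, `OBJ`, `VERB`), the choice of the adverbial as target and the subject as anchor, and all function names are assumptions made for illustration. Parsing and clause segmentation are skipped; each clause is given as a flat list of already labeled constituents.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Constituent:
    text: str
    role: str  # hypothetical role label, e.g. "SUBJ", "ADV", "OBJ", "VERB"


Clause = List[Constituent]

TARGET = "ADV"   # assumed target constituent (the one that moves)
ANCHOR = "SUBJ"  # assumed anchor that the target is ordered against


def is_markable(clause: Clause) -> bool:
    """A clause can carry a bit only if it has exactly one target and one anchor."""
    roles = [c.role for c in clause]
    return roles.count(TARGET) == 1 and roles.count(ANCHOR) == 1


def read_bit(clause: Clause) -> int:
    """Assumed convention: 1 if the target precedes the anchor, else 0."""
    roles = [c.role for c in clause]
    return 1 if roles.index(TARGET) < roles.index(ANCHOR) else 0


def embed_bit(clause: Clause, bit: int) -> Clause:
    """Move the target within its clause only when the current order disagrees with the bit."""
    if read_bit(clause) == bit:
        return list(clause)
    target = next(c for c in clause if c.role == TARGET)
    rest = [c for c in clause if c.role != TARGET]
    anchor_pos = [c.role for c in rest].index(ANCHOR)
    # Place the target directly before (bit 1) or directly after (bit 0) the anchor.
    insert_at = anchor_pos if bit == 1 else anchor_pos + 1
    return rest[:insert_at] + [target] + rest[insert_at:]


def embed(clauses: List[Clause], bits: List[int]) -> List[Clause]:
    """Embed bits into markable clauses in order; unmarkable clauses pass through unchanged."""
    remaining = iter(bits)
    marked = []
    for clause in clauses:
        bit = next(remaining, None) if is_markable(clause) else None
        marked.append(list(clause) if bit is None else embed_bit(clause, bit))
    return marked


def extract(clauses: List[Clause]) -> List[int]:
    """Read one bit from every markable clause."""
    return [read_bit(c) for c in clauses if is_markable(c)]


def surface(clause: Clause) -> str:
    return " ".join(c.text for c in clause)


# "나는 어제 책을 읽었다" (I read a book yesterday)
example = [
    Constituent("나는", "SUBJ"),
    Constituent("어제", "ADV"),
    Constituent("책을", "OBJ"),
    Constituent("읽었다", "VERB"),
]
marked = embed([example], [1])
print(surface(marked[0]))
print(extract(marked))
```

Because the target is reinserted next to the anchor, it never crosses the clause-final verb, which loosely mirrors the restriction that a constituent moves only within its own clause. Clauses without a target or an anchor carry no bit in this sketch, which is one simple way a method could fall short of full coverage.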
