Format-Controllable Text Editing in Real-Scene Images

  • Quang-Vinh Dang (Dept. of Artificial Intelligence Convergence, Chonnam National University)
  • Hyung-Jeong Yang (Dept. of Artificial Intelligence Convergence, Chonnam National University)
  • Soo-Hyung Kim (Dept. of Artificial Intelligence Convergence, Chonnam National University)
  • Published: 2024.10.31

Abstract

Scene text editing applications often need precise control over how text appears in an image, not only over what it says. Previous methods, however, have focused mainly on altering text content and have largely neglected control over text formatting. In this paper, we propose a text editing model that edits content and also controls format, using a diffusion model with denoising and text-aware losses. Given user-specified inputs such as text, size, and font, the model generates high-quality scene text images whose content and appearance both match the user's request. We evaluate the model using OCR accuracy on the ICDAR FST dataset, and the results show that our approach is competitive with existing methods.
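
As an illustration of the evaluation mentioned in the abstract, the following is a minimal sketch of how an OCR accuracy score could be computed. It assumes word-level, case-insensitive exact matching between the text an OCR model reads from each edited image and the text the user requested; the abstract does not state the exact matching rule, and the function name `word_accuracy` and the sample strings are hypothetical.

```python
def word_accuracy(predictions, targets, case_sensitive=False):
    """Fraction of images whose recognized text exactly matches the target text.

    Assumption: exact match after stripping whitespace (and lowercasing unless
    case_sensitive is True). The paper's actual protocol may differ.
    """
    if len(predictions) != len(targets):
        raise ValueError("predictions and targets must have the same length")
    if not targets:
        return 0.0

    def normalize(s):
        s = s.strip()
        return s if case_sensitive else s.lower()

    hits = sum(normalize(p) == normalize(t) for p, t in zip(predictions, targets))
    return hits / len(targets)


# Hypothetical OCR outputs for four edited images and the text the user requested.
recognized = ["OPEN", "coffee", "EX1T", "Sale"]
requested = ["open", "Coffee", "exit", "sale"]
print(word_accuracy(recognized, requested))  # 0.75 (three of four match)
```

A stricter variant would pass `case_sensitive=True`, which penalizes edits that render the right letters in the wrong case.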

Acknowledgement

This work was supported by the Institute of Information & Communications Technology Planning & Evaluation (IITP) under the Artificial Intelligence Convergence Innovation Human Resources Development (IITP-2023-RS-2023-00256629) grant funded by the Korea government (MSIT), and the National Research Foundation of Korea (NRF) grant funded by the Korea government (MSIT) (RS-2023-00219107).
