Review on the progress of AIGC visual content generation and traceability
2024, Vol. 29, No. 6, pp. 1535-1554
Print publication date: 2024-06-16
DOI: 10.11834/jig.240003
Liu Anan, Su Yuting, Wang Lanjun, Li Bin, Qian Zhenxing, Zhang Weiming, Zhou Linna, Zhang Xinpeng, Zhang Yongdong, Huang Jiwu and Yu Nenghai. 2024. Review on the progress of AIGC visual content generation and traceability. Journal of Image and Graphics, 29(6): 1535-1554
With the rapid development of digital media and the creative industries, artificial intelligence generated content (AIGC) technology has drawn growing attention for its innovative applications in visual content generation. This paper provides an in-depth discussion of research progress in AIGC visual content generation and traceability. First, image generation techniques are examined: starting from traditional methods based on generative adversarial networks (GANs), the latest advances based on GANs, autoregressive models, and diffusion probability models are systematically analyzed. Next, controllable image generation techniques are explored, highlighting the current state of methods that give creators precise control through additional information such as layouts and sketches, as well as through visual references. As image generation technology evolves and spreads, security concerns around generated images have surfaced; pre-screening and filtering alone can no longer meet practical needs, so traceability of generated content is urgently required for regulation. Accordingly, this paper then discusses traceability techniques for generated images, focusing on the role of watermarking in ensuring the reliability and security of generated content. According to where the watermark is embedded in the generation pipeline, existing watermark-related traceability methods are first categorized into watermark-free traceability, watermark pre-embedding, watermark post-embedding, and joint watermark-generation methods, and each category is analyzed in detail. The current state of watermark attacks against generated images is then reviewed, followed by a summary and outlook on generated-image traceability. Given the quality and security challenges in visual content generation, this paper aims to offer researchers a systematic perspective on visual content generation and traceability, to promote a safe and trustworthy digital media creation environment, and to guide the future development of related technologies.
In the contemporary digital era, characterized by rapid technological advancement, multimedia content creation, particularly visual content generation, has become an integral part of modern societal development. The exponential growth of digital media and the creative industry has drawn attention to artificial intelligence generated content (AIGC) technology. The groundbreaking applications of AIGC in visual content generation have not only equipped multimedia creators with novel tools and capabilities but also delivered substantial benefits across diverse domains, from cinema and gaming to the immersive landscapes of virtual reality. This review comprehensively introduces the advancements in AIGC technology, with particular emphasis on visual content generation and its critical facet of traceability. Initially, we trace the evolutionary path of image generation technology, from its inception with generative adversarial networks (GANs) to the latest advancements in Transformer-based autoregressive models and diffusion probability models. This progression reveals a remarkable leap in the quality and capability of image generation and underscores the rapid evolution of the field from its nascent stages to an era of explosive growth. First, we delve into the development of GANs, from text-conditioned methods to sophisticated techniques for style control and the construction of large-scale models. This line of work pioneered text-to-image generation, and owing to their strong scalability, GANs can further improve their performance as network parameters and dataset size are expanded. Furthermore, we explore the emergence of Transformer-based autoregressive models, such as DALL·E and CogView, which have heralded a new epoch in image generation. The basic strategy of autoregressive models is to first use a Transformer to predict the feature sequence of an image from conditional feature sequences such as text or sketches, and then use a specially trained decoding network to decode the predicted sequence into a complete image; with large-scale parameters, these models can generate highly realistic images. In addition, we examine the burgeoning interest in diffusion probability models, which are renowned for their stable training and high-quality outputs. Diffusion models first use an iterative stochastic process to gradually transform the observed data into a known noise distribution, and then reconstruct the original data by running this process in the opposite direction, from the noise distribution back to data. This stochastic formulation yields a more stable training procedure while also demonstrating impressive results in terms of generation quality and diversity.
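To make the two mechanisms just described concrete, the following equations sketch their standard formulations; the notation follows the common conventions of the autoregressive and denoising-diffusion literature (e.g., Ramesh et al., 2021; Ho et al., 2020, both in the references) rather than any single paper's exact definitions.

```latex
% Autoregressive text-to-image: the image feature sequence x_{1:N}
% (e.g., discrete visual tokens) is predicted token by token,
% conditioned on the text feature sequence c.
p_\theta(x \mid c) = \prod_{i=1}^{N} p_\theta\left(x_i \mid x_{<i},\, c\right)

% Diffusion forward process: data x_0 is gradually transformed into
% (approximately) Gaussian noise over T steps with variance schedule \beta_t.
q(x_t \mid x_{t-1}) = \mathcal{N}\left(x_t;\ \sqrt{1-\beta_t}\, x_{t-1},\ \beta_t \mathbf{I}\right)

% Reverse process: a learned network runs the transformation in the
% opposite direction, reconstructing data from the noise distribution.
p_\theta(x_{t-1} \mid x_t) = \mathcal{N}\left(x_{t-1};\ \mu_\theta(x_t, t),\ \Sigma_\theta(x_t, t)\right)
```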
As AIGC technology continues to advance, it faces challenges such as the need to further improve content quality and to exert precise control so that outputs align with specific requirements. Within this context, this review conducts a thorough exploration of controllable image generation technology, a pivotal research domain that strives to furnish meticulous control over the generated content through the integration of supplementary elements such as intricate layouts, detailed sketches, and precise visual references. This approach empowers creators to preserve their artistic autonomy while upholding exacting standards of quality. One facet that has garnered considerable academic attention is the use of visual references, which enables diverse styles and personalized outcomes by incorporating user-provided visual elements. This review underscores the profound potential of these methodologies and illustrates their transformative role across domains such as digital art and interactive media. The development of these technologies opens new horizons in digital creativity; however, it also presents profound challenges, particularly concerning image authenticity and the potential for malicious misuse, as exemplified by the creation of deepfakes and the proliferation of fake news. These challenges extend far beyond technical intricacies: they encompass substantial risks to individual privacy and security, as well as the broader societal consequences of eroding public trust and social stability. In response to these formidable challenges, watermark-related image traceability technology has emerged as an indispensable solution. It harnesses watermarking techniques to authenticate and verify AI-generated images, thereby safeguarding their integrity. In this review, we categorize these techniques into four distinct types: watermark-free embedding, watermark pre-embedding, watermark post-embedding, and joint generation methods. First, watermark-free embedding methods treat the traces left by the generation process as fingerprints; this inherent fingerprint information is used to attribute generated images to their source models and thus achieve traceability. Second, watermark pre-embedding methods embed the watermark into the input data, such as the noise or the training images, and then train the generation model on the watermarked data so that traceability information is carried into the generated images. Third, watermark post-embedding methods divide the production of watermarked images into two stages, image generation and watermark embedding, with the watermark embedded after the image has been generated. Finally, joint generation methods adaptively embed watermark information during the image generation process itself, fusing it with image features while minimizing damage to the generation process, and ultimately producing images that carry watermarks. Each of these approaches plays a pivotal role in verifying traceability across diverse scenarios and offers a robust defense against potential misuses of AI-generated imagery.
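As a concrete, much-simplified illustration of the post-embedding category, the sketch below tags an already-generated image with a binary model identifier using least-significant-bit substitution and then reads it back for attribution. This is a minimal stand-in assuming a uint8 RGB array; the function names and the fixed 32-bit payload are our own illustrative choices, not a scheme from the surveyed literature, and real systems use learned, robust embedding networks instead.

```python
import numpy as np

def embed_id_lsb(image: np.ndarray, model_id: int, n_bits: int = 32) -> np.ndarray:
    """Post-embedding: after the image has been generated, write an
    n_bits model identifier into the least-significant bits of the
    first n_bits pixel values of a uint8 image."""
    flat = np.array(image, dtype=np.uint8).ravel()  # fresh flattened copy
    for i in range(n_bits):
        bit = (model_id >> i) & 1
        flat[i] = (flat[i] & 0xFE) | bit  # overwrite the LSB with a payload bit
    return flat.reshape(image.shape)

def extract_id_lsb(image: np.ndarray, n_bits: int = 32) -> int:
    """Traceability check: recover the identifier from the LSBs."""
    flat = np.asarray(image, dtype=np.uint8).ravel()
    model_id = 0
    for i in range(n_bits):
        model_id |= (int(flat[i]) & 1) << i
    return model_id

# Usage: tag a stand-in "generated" image, then recover the ID for attribution.
generated = np.random.randint(0, 256, size=(64, 64, 3), dtype=np.uint8)
tagged = embed_id_lsb(generated, model_id=0x2024)
assert extract_id_lsb(tagged) == 0x2024
```

A plain LSB code like this is exactly the kind of fragile watermark that the removal and regeneration attacks discussed in this review can erase, which is what motivates the robust pre-embedding and joint-generation designs surveyed here.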
In conclusion, while AIGC technology offers promising new opportunities in visual content creation, it simultaneously poses significant challenges regarding the security and integrity of generated content. This comprehensive review covers the breadth of AIGC technology, starting from an overview of existing image generation technologies such as GANs, autoregressive models, and diffusion probability models, and then categorizing and analyzing controllable image generation technology from the perspectives of additional conditions and visual examples. In addition, the review focuses on watermark-related image traceability technology, discusses the various watermark embedding techniques and the current state of watermark attacks on generated images, and provides an extensive overview and future outlook of generated-image traceability technology. The aim is to offer researchers a detailed, systematic, and comprehensive perspective on the advancements in AIGC visual content generation and traceability, deepening the understanding of current research trends, challenges, and future directions in this rapidly evolving field.
Keywords: artificial intelligence generated content (AIGC); visual content generation; controllable image generation; security of generated content; traceability of generated images
Alam S, Jamil A, Saldhi A and Ahmad M. 2015. Digital image authentication and encryption using digital signature//Proceedings of 2015 International Conference on Advances in Computer Engineering and Applications. Ghaziabad, India: IEEE: 332-336 [DOI: 10.1109/icacea.2015.7164725]
Albright M and McCloskey S. 2019. Source generator attribution via inversion//Proceedings of 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops. Long Beach, USA: IEEE: 8: #3
Asnani V, Yin X, Hassner T and Liu X M. 2023. Reverse engineering of generative models: inferring model hyperparameters from generated images. IEEE Transactions on Pattern Analysis and Machine Intelligence, 45(12): 15477-15493 [DOI: 10.1109/TPAMI.2023.3301451]
Betker J, Goh G, Jing L, Brooks T, Wang J F, Li L J, Ouyang L, Zhuang J T, Lee J, Guo Y F, Manassra W, Dhariwal P, Chu C, Jiao Y X and Ramesh A. 2023. Improving image generation with better captions [EB/OL]. [2023-11-05]. https://cdn.openai.com/papers/dall-e-3.pdf
Bui T, Agarwal S, Yu N and Collomosse J. 2023. RoSteALS: robust steganography using autoencoder latent space//Proceedings of 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops. Vancouver, Canada: IEEE: 933-942 [DOI: 10.1109/cvprw59228.2023.00100]
Bui T, Yu N and Collomosse J. 2022. RepMix: representation mixing for robust attribution of synthesized images//Proceedings of the 17th European Conference on Computer Vision. Tel Aviv, Israel: Springer: 146-163 [DOI: 10.1007/978-3-031-19781-9_9]
Cui Y Q, Ren J, Xu H, He P F, Liu H, Sun L C, Xing Y and Tang J L. 2023. DiffusionShield: a watermark for copyright protection against generative diffusion models [EB/OL]. [2023-11-05]. https://arxiv.org/pdf/2306.04642.pdf
Dhariwal P and Nichol A. 2021. Diffusion models beat GANs on image synthesis//Advances in Neural Information Processing Systems, 34: 8780-8794
Ding M, Yang Z Y, Hong W Y, Zheng W D, Zhou C, Yin D, Lin J Y, Zou X, Shao Z, Yang H X and Tang J. 2021. CogView: mastering text-to-image generation via Transformers//Advances in Neural Information Processing Systems, 34: 19822-19835
Ding M, Zheng W D, Hong W Y and Tang J. 2022a. CogView2: faster and better text-to-image generation via hierarchical Transformers//Advances in Neural Information Processing Systems. New Orleans, USA: 35: 16890-16902.
Ding W P, Ming Y R, Cao Z H and Lin C T. 2022b. A generalized deep neural network approach for digital watermarking analysis. IEEE Transactions on Emerging Topics in Computational Intelligence, 6(3): 613-627 [DOI: 10.1109/tetci.2021.3055520]
Ditria L and Drummond T. 2023. Hey that's mine imperceptible watermarks are preserved in diffusion generated outputs [EB/OL]. [2023-11-05]. https://arxiv.org/pdf/2308.11123.pdf
Fan L X, Ng K W and Chan C S. 2019. Rethinking deep neural network ownership verification: embedding passports to defeat ambiguity attacks//Proceedings of the 33rd International Conference on Neural Information Processing Systems. Vancouver, Canada: Curran Associates Inc.: 4714-4723
Fei J W, Xia Z H, Tondi B and Barni M. 2022. Supervised GAN watermarking for intellectual property protection//2022 IEEE International Workshop on Information Forensics and Security. Shanghai, China: IEEE: 1-6 [DOI: 10.1109/wifs55849.2022.9975409]
Fernandez P, Couairon G, Jégou H, Douze M and Furon T. 2023. The stable signature: rooting watermarks in latent diffusion models [EB/OL]. [2023-11-05]. https://arxiv.org/pdf/2303.15435.pdf
Gal R, Alaluf Y, Atzmon Y, Patashnik O, Bermano A H, Chechik G and Cohen-Or D. 2022. An image is worth one word: personalizing text-to-image generation using textual inversion [EB/OL]. [2023-11-05]. https://arxiv.org/pdf/2208.01618.pdf
Girish S, Suri S, Rambhatla S and Shrivastava A. 2021. Towards discovery and attribution of open-world GAN generated images//Proceedings of 2021 IEEE/CVF International Conference on Computer Vision. Montreal, Canada: IEEE: 14074-14083 [DOI: 10.1109/iccv48922.2021.01383]
Goodfellow I, Pouget-Abadie J, Mirza M, Xu B, Warde-Farley D, Ozair S, Courville A and Bengio Y. 2014. Generative adversarial nets//Proceedings of the 27th International Conference on Neural Information Processing Systems. Montreal, Canada: MIT Press: 2672-2680
Ho J, Jain A and Abbeel P. 2020. Denoising diffusion probabilistic models//Proceedings of the 34th International Conference on Neural Information Processing Systems. Vancouver, Canada: Curran Associates Inc.: 6840-6851
Ho J and Salimans T. 2022. Classifier-free diffusion guidance [EB/OL]. [2023-11-05]. https://arxiv.org/pdf/2207.12598.pdf
Hu D H, Wang L, Jiang W J, Zheng S L and Li B. 2018. A novel image steganography method via deep convolutional generative adversarial networks. IEEE Access, 6: 38303-38314 [DOI: 10.1109/access.2018.2852771]
Kang M, Zhu J Y, Zhang R, Park J, Shechtman E, Paris S and Park T. 2023. Scaling up GANs for text-to-image synthesis//Proceedings of 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Vancouver, Canada: IEEE: 10124-10134 [DOI: 10.1109/cvpr52729.2023.00976]
Karras T, Laine S and Aila T. 2019. A style-based generator architecture for generative adversarial networks//Proceedings of 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Long Beach, USA: IEEE: 4396-4405 [DOI: 10.1109/cvpr.2019.00453]
Kingma D P and Welling M. 2022. Auto-encoding variational Bayes [EB/OL]. [2023-11-05]. https://arxiv.org/pdf/1312.6114.pdf
Kumari N, Zhang B L, Zhang R, Shechtman E and Zhu J Y. 2023. Multi-concept customization of text-to-image diffusion//Proceedings of 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Vancouver, Canada: IEEE: 1931-1941 [DOI: 10.1109/cvpr52729.2023.00192]
Li D X, Li J N and Hoi S C H. 2023a. BLIP-Diffusion: pre-trained subject representation for controllable text-to-image generation and editing [EB/OL]. [2023-11-05]. https://arxiv.org/pdf/2305.14720.pdf
Li J N, Li D X, Savarese S and Hoi S. 2023b. BLIP-2: bootstrapping language-image pre-training with frozen image encoders and large language models [EB/OL]. [2023-11-05]. https://arxiv.org/pdf/2301.12597.pdf
Li X Y. 2023. DiffWA: diffusion models for watermark attack [EB/OL]. [2023-11-05]. https://arxiv.org/pdf/2306.12790.pdf
Li Y H, Liu H T, Wu Q Y, Mu F Z, Yang J W, Gao J F, Li C Y and Lee Y J. 2023c. GLIGEN: open-set grounded text-to-image generation//Proceedings of 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Vancouver, Canada: IEEE: 22511-22521 [DOI: 10.1109/CVPR52729.2023.02156]
Liu A A, Zhang G K, Su Y T, Xu N, Zhang Y D and Wang L J. 2023. T2IW: joint text to image and watermark generation [EB/OL]. [2023-11-05]. https://arxiv.org/pdf/2309.03815.pdf
Ma Y H, Zhao Z Y, He X L, Li Z, Backes M and Zhang Y. 2023. Generative watermarking against unauthorized subject-driven image synthesis [EB/OL]. [2023-11-05]. https://arxiv.org/pdf/2306.07754.pdf
Marra F, Gragnaniello D, Verdoliva L and Poggi G. 2019. Do GANs leave artificial fingerprints?//Proceedings of 2019 IEEE Conference on Multimedia Information Processing and Retrieval. San Jose, USA: IEEE: 506-511 [DOI: 10.1109/MIPR.2019.00103]
Mou C, Wang X T, Xie L B, Wu Y Z, Zhang J, Qi Z A, Shan Y and Qie X H. 2023. T2I-Adapter: learning adapters to dig out more controllable ability for text-to-image diffusion models [EB/OL]. [2023-11-05]. https://arxiv.org/pdf/2302.08453.pdf
Nadimpalli A V and Rattani A. 2023. Proactive deepfake detection using GAN-based visible watermarking. ACM Transactions on Multimedia Computing, Communications, and Applications: #3625547 [DOI: 10.1145/3625547]
Nichol A and Dhariwal P. 2021. Improved denoising diffusion probabilistic models//Proceedings of the 38th International Conference on Machine Learning. Virtual Event, PMLR: 139: 8162-8171
Nichol A, Dhariwal P, Ramesh A, Shyam P, Mishkin P, McGrew B, Sutskever I and Chen M. 2022. GLIDE: towards photorealistic image generation and editing with text-guided diffusion models//Proceedings of 2022 International Conference on Machine Learning. Baltimore, Maryland, USA: PMLR: 16784-16804
Ong D S, Chan C S, Ng K W, Fan L X and Yang Q. 2021. Protecting intellectual property of generative adversarial networks from ambiguity attacks//Proceedings of 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Nashville, USA: IEEE: 3629-3638 [DOI: 10.1109/cvpr46437.2021.00363]
Qiao T, Ma Y Y, Zheng N, Wu H Z, Chen Y L, Xu M and Luo X Y. 2023. A novel model watermarking for protecting generative adversarial network. Computers and Security, 127: #103102 [DOI: 10.1016/j.cose.2023.103102]
Qiao T T, Zhang J, Xu D Q and Tao D C. 2019. MirrorGAN: learning text-to-image generation by redescription//Proceedings of 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Long Beach, USA: IEEE: 1505-1514 [DOI: 10.1109/CVPR.2019.00160]
Qin C, Zhang S, Yu N, Feng Y H, Yang X Y, Zhou Y B, Wang H, Niebles J C, Xiong C M, Savarese S, Ermon S, Fu Y and Xu R. 2023. UniControl: a unified diffusion model for controllable visual generation in the wild [EB/OL]. [2023-11-05]. https://arxiv.org/pdf/2305.11147.pdf
Raffel C, Shazeer N, Roberts A, Lee K, Narang S, Matena M, Zhou Y Q, Li W and Liu P J. 2020. Exploring the limits of transfer learning with a unified text-to-text Transformer. The Journal of Machine Learning Research, 21(1): 5485-5551
Ramesh A, Dhariwal P, Nichol A, Chu C and Chen M. 2022. Hierarchical text-conditional image generation with CLIP latents [EB/OL]. [2023-11-05]. https://arxiv.org/pdf/2204.06125.pdf
Ramesh A, Pavlov M, Goh G, Gray S, Voss C, Radford A, Chen M and Sutskever I. 2021. Zero-shot text-to-image generation//Proceedings of the 38th International Conference on Machine Learning. Virtual-only: PMLR: 8821-8831
Reed S, Akata Z, Mohan S, Tenka S, Schiele B and Lee H. 2016a. Learning what and where to draw//Proceedings of the 30th International Conference on Neural Information Processing Systems. Barcelona, Spain: Curran Associates Inc.: 217-225
Reed S, Akata Z, Yan X C, Logeswaran L, Schiele B and Lee H. 2016b. Generative adversarial text to image synthesis//Proceedings of the 33rd International Conference on Machine Learning. New York, USA: JMLR: 1060-1069
Rolfe J T. 2017. Discrete variational autoencoders [EB/OL]. [2024-01-07]. https://arxiv.org/pdf/1609.02200.pdf
Rombach R, Blattmann A, Lorenz D, Esser P and Ommer B. 2022. High-resolution image synthesis with latent diffusion models//Proceedings of 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New Orleans, USA: IEEE: 10674-10685 [DOI: 10.1109/cvpr52688.2022.01042]
Ruiz N, Li Y Z, Jampani V, Pritch Y, Rubinstein M and Aberman K. 2023. DreamBooth: fine tuning text-to-image diffusion models for subject-driven generation//Proceedings of 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Vancouver, Canada: IEEE: 22500-22510 [DOI: 10.1109/cvpr52729.2023.02155]
Saharia C, Chan W, Saxena S, Li L L, Whang J, Denton E, Ghasemipour S K S, Ayan B K, Mahdavi S S, Lopes R G, Salimans T, Ho J, Fleet D J and Norouzi M. 2022. Photorealistic text-to-image diffusion models with deep language understanding//Advances in Neural Information Processing Systems. New Orleans, USA: 35: 36479-36494.
Schuhmann C, Beaumont R, Vencu R, Gordon C, Wightman R, Cherti M, Coombes T, Katta A, Mullis C, Wortsman M, Schramowski P, Kundurthy S, Crowson K, Schmidt L, Kaczmarczyk R and Jitsev J. 2022. LAION-5B: an open large-scale dataset for training next generation image-text models//Advances in Neural Information Processing Systems. New Orleans, USA: 35: 25278-25294.
Sennrich R, Haddow B and Birch A. 2016. Neural machine translation of rare words with subword units [EB/OL]. [2023-11-05]. https://arxiv.org/pdf/1508.07909.pdf
Shi C Y, Chen L, Wang C Y, Zhou X and Qin Z L. 2023. Review on image forensic techniques based on deep learning. Mathematics, 11: #3134 [DOI: 10.20944/preprints202306.1179.v1]
Sohl-Dickstein J, Weiss E, Maheswaranathan N and Ganguli S. 2015. Deep unsupervised learning using nonequilibrium thermodynamics//Proceedings of the 32nd International Conference on Machine Learning. Lille, France: JMLR: 2256-2265
Tao M, Bao B K, Tang H and Xu C S. 2023. GALIP: generative adversarial CLIPs for text-to-image synthesis//Proceedings of 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Vancouver, Canada: IEEE: 14214-14223 [DOI: 10.1109/cvpr52729.2023.01366]
van den Oord A, Vinyals O and Kavukcuoglu K. 2017. Neural discrete representation learning//Proceedings of the 31st International Conference on Neural Information Processing Systems. Long Beach, USA: Curran Associates Inc.: 6309-6318
Vaswani A, Shazeer N, Parmar N, Uszkoreit J, Jones L, Gomez A N, Kaiser Ł and Polosukhin I. 2017. Attention is all you need//Proceedings of the 31st International Conference on Neural Information Processing Systems. Long Beach, USA: Curran Associates Inc.: 6000-6010
Wang Q, Li S, Zhang X P and Feng G R. 2023. Rethinking neural style transfer: generating personalized and watermarked stylized images//Proceedings of the 31st ACM International Conference on Multimedia. Ottawa, Canada: ACM: 6928-6937 [DOI: 10.1145/3581783.3612202]
Wang S Y, Wang O, Zhang R, Owens A and Efros A A. 2020. CNN-generated images are surprisingly easy to spot... for now//Proceedings of 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Seattle, USA: IEEE: 8692-8701 [DOI: 10.1109/cvpr42600.2020.00872]
Wu H Z, Liu G, Yao Y W and Zhang X P. 2021. Watermarking neural networks with watermarked images. IEEE Transactions on Circuits and Systems for Video Technology, 31(7): 2591-2601 [DOI: 10.1109/tcsvt.2020.3030671]
Wu H Z, Zhang J, Li Y, Yin Z X, Zhang X P, Tian H, Li B, Zhang W M and Yu N H. 2023. Overview of artificial intelligence model watermarking. Journal of Image and Graphics, 28(6): 1792-1810 [DOI: 10.11834/jig.230010]
Wu W Y and Liu S S. 2023. A comprehensive review and systematic analysis of artificial intelligence regulation policies [EB/OL]. [2023-11-05]. https://arxiv.org/pdf/2307.12218.pdf
Xiong C, Qin C, Feng G R and Zhang X P. 2023. Flexible and secure watermarking for latent diffusion model//Proceedings of the 31st ACM International Conference on Multimedia. Ottawa, Canada: ACM: 1668-1676 [DOI: 10.1145/3581783.3612448]
Xu T, Zhang P C, Huang Q Y, Zhang H, Gan Z, Huang X L and He X D. 2018. AttnGAN: fine-grained text to image generation with attentional generative adversarial networks//Proceedings of 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City, USA: IEEE: 1316-1324 [DOI: 10.1109/cvpr.2018.00143]
Yang T Y, Wang D D, Tang F, Zhao X Y, Cao J and Tang S. 2023. Progressive open space expansion for open-set model attribution//Proceedings of 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Vancouver, Canada: IEEE: 15856-15865 [DOI: 10.1109/cvpr52729.2023.01522]
Yin Z X, Yin H and Zhang X P. 2022. Neural network fragile watermarking with no model performance degradation//Proceedings of 2022 IEEE International Conference on Image Processing. Bordeaux, France: IEEE: 3958-3962 [DOI: 10.1109/ICIP46576.2022.9897413]
Yu F, Seff A, Zhang Y D, Song S R, Funkhouser T and Xiao J X. 2016. LSUN: construction of a large-scale image dataset using deep learning with humans in the loop [EB/OL]. [2023-11-05]. https://arxiv.org/pdf/1506.03365.pdf
Yu J H, Li X, Koh J Y, Zhang H, Pang R M, Qin J, Ku A, Xu Y Z, Baldridge J and Wu Y H. 2022a. Vector-quantized image modeling with improved VQGAN [EB/OL]. [2023-11-05]. https://arxiv.org/pdf/2110.04627.pdf
Yu J H, Xu Y Z, Koh J Y, Luong T, Baid G, Wang Z R, Vasudevan V, Ku A, Yang Y F, Ayan B K, Hutchinson B, Han W, Parekh Z, Li X, Zhang H, Baldridge J and Wu Y H. 2022b. Scaling autoregressive models for content-rich text-to-image generation [EB/OL]. [2023-11-05]. https://arxiv.org/pdf/2206.10789.pdf
Yu N, Davis L and Fritz M. 2019. Attributing fake images to GANs: learning and analyzing GAN fingerprints//Proceedings of 2019 IEEE/CVF International Conference on Computer Vision. Seoul, Korea (South): IEEE: 7555-7565 [DOI: 10.1109/iccv.2019.00765]
Yu N, Skripniuk V, Abdelnabi S and Fritz M. 2021. Artificial fingerprinting for generative models: rooting deepfake attribution in training data//Proceedings of 2021 IEEE/CVF International Conference on Computer Vision. Montreal, Canada: IEEE: 14428-14437 [DOI: 10.1109/iccv48922.2021.01418]
Yu N, Skripniuk V, Chen D F, Davis L and Fritz M. 2022c. Responsible disclosure of generative models using scalable fingerprinting [EB/OL]. [2023-11-05]. https://arxiv.org/pdf/2012.08726.pdf
Zeng Y W, Tan J X, You Z X, Qian Z X and Zhang X P. 2023. Watermarks for generative adversarial network based on steganographic invisible backdoor//Proceedings of 2023 IEEE International Conference on Multimedia and Expo. Brisbane, Australia: IEEE: 1211-1216 [DOI: 10.1109/icme55011.2023.00211]
Zhang H, Xu T, Li H S, Zhang S T, Wang X G, Huang X L and Metaxas D. 2017. StackGAN: text to photo-realistic image synthesis with stacked generative adversarial networks//Proceedings of 2017 IEEE International Conference on Computer Vision. Venice, Italy: IEEE: 5908-5916 [DOI: 10.1109/iccv.2017.629]
Zhang K A, Xu L, Cuesta-Infante A and Veeramachaneni K. 2019. Robust invisible video watermarking with attention [EB/OL]. [2023-11-05]. https://arxiv.org/pdf/1909.01285.pdf
Zhang L M, Rao A Y and Agrawala M. 2023. Adding conditional control to text-to-image diffusion models//Proceedings of 2023 IEEE/CVF International Conference on Computer Vision. Paris, France: IEEE: 3813-3824 [DOI: 10.1109/iccv51070.2023.00355]
Zhao X D, Zhang K X, Su Z H, Vasan S, Grishchenko I, Kruegel C, Vigna G, Wang Y X and Li L. 2023a. Invisible image watermarks are provably removable using generative AI [EB/OL]. [2023-11-05]. https://arxiv.org/pdf/2306.01953.pdf
Zhao Y, Liu B, Ding M, Liu B P, Zhu T Q and Yu X. 2023b. Proactive deepfake defence via identity watermarking//Proceedings of 2023 IEEE/CVF Winter Conference on Applications of Computer Vision. Waikoloa, USA: IEEE: 4591-4600 [DOI: 10.1109/wacv56688.2023.00458]