Large-scale datasets for facial tampering detection with inpainting techniques
2024, Vol. 29, No. 7: 1834-1848
Print publication date: 2024-07-16
DOI: 10.11834/jig.230422
Li Wei, Huang Tianqiang, Huang Liqing, Zheng Aokun, Xu Chao. 2024. Large-scale datasets for facial tampering detection with inpainting techniques. Journal of Image and Graphics, 29(07):1834-1848
Objective
With the continuous development of computer vision and the gradual maturation of deep learning, image synthesis methods have brought rich experiences to people's lives. However, maliciously tampered images used to spread false information can cause great harm to society, casting doubt on the authenticity of digital content in image media. Face editing, a common image tampering technique, forges faces by modifying facial-feature information. Image inpainting is one of the common tools for face editing, and its use for facial forgery likewise causes considerable disruption to people's lives. To provide data support for research on detecting this kind of tampering, this paper constructs a large-scale dataset for inpainting-based face tampering detection.
Method
Specifically, this paper uses source datasets of different quality (the high-quality face image dataset CelebA-HQ and the low-quality face video dataset FF++), segments the facial-feature regions with an image segmentation method, and then applies two deep-network inpainting methods, CTSDG (image inpainting via conditional texture and structure dual generation) and RFR (recurrent feature reasoning for image inpainting), together with one traditional inpainting method, SC (struct completion), to generate a large-scale inpainted-image dataset totaling 600 000 images.
Result
Experimental results show that, for images generated from the FF++ dataset, detection accuracy drops by 15% under the benchmark detection network ResNet-50 and by 5% under Xception-Net. Detection accuracy also varies considerably across facial parts; the eye region is the hardest to detect, with an accuracy of 0.91. Generalization experiments show that data generated from the same source dataset generalize to some extent across inpainted images of different facial parts, whereas datasets built from different source data show almost no cross-generalization. The dataset therefore also provides research data for generalization studies across source datasets, inpainting methods, and facial regions.
Conclusion
Tampering based on image inpainting can deceive tampering detectors to some extent, so research on detection methods for this kind of tampering is of practical significance. The large inpainting-based face tampering dataset provided here offers a new data source for the field, enriches data diversity, and provides a solid benchmark for in-depth study of this type of face tampering and its detection. The dataset is openly available at https://pan.baidu.com/s/1-9HIBya9X-geNDe5zcJldw?pwd=thli.
Objective
DeepFake technology, born of the continuous maturation of deep learning techniques, primarily uses neural networks to create fabricated faces. This method has enriched people's lives as computer vision advances and deep learning technologies mature. It has revolutionized the film industry by generating astonishing visuals at reduced production costs. Similarly, in the gaming industry, it has facilitated the creation of smooth and realistic animation effects. However, the malicious use of image manipulation to spread false information poses significant risks to society, casting doubt on the authenticity of digital content in visual media. Forgery techniques encompass four main categories: face reenactment, face replacement, face editing, and face synthesis. Face editing, a commonly employed image manipulation method, falsifies faces by modifying the facial-feature regions (eyebrows, eyes, nose, and mouth). As one of the common tools in face editing, image inpainting uses known content from an image to fill in missing areas, aiming to restore the image in a way that aligns as closely as possible with human perception. In the context of facial forgery, inpainting is primarily used for identity falsification: facial features are altered to achieve the effect of replacing a face. The use of image inpainting for facial manipulation likewise introduces significant disruption to people's lives. To support research on detection methods for such manipulations, this paper presents a large-scale dataset for face manipulation detection based on inpainting techniques.
Method
This paper focuses on image tampering detection, utilizing two classic datasets: the high-quality CelebA-HQ dataset, comprising 25 000 high-resolution (1 024 × 1 024 pixels) celebrity face images, and the low-quality FF++ dataset, consisting of 15 000 face images extracted from video frames. On the basis of these two datasets, facial-feature regions (eyebrows, eyes, nose, mouth, and the entire facial area) are segmented using image segmentation methods; corresponding mask images are created, and the segmented facial regions are directly obscured on the original image. Two deep neural network-based inpainting methods (image inpainting via conditional texture and structure dual generation (CTSDG) and recurrent feature reasoning for image inpainting (RFR)) and a traditional inpainting method (struct completion (SC)) are employed. The deep network methods require mask images to indicate the areas to inpaint, while the traditional method directly inpaints the segmented facial-feature images. Inpainting the facial regions with these three methods yields a large-scale dataset of 600 000 images. The dataset incorporates diverse pre-processing techniques and inpainting methods, and includes images of different qualities and inpainted facial regions. It serves as a valuable resource for training and testing in related detection tasks, offers rich data for subsequent research, and establishes a meaningful benchmark for future studies in face tampering detection.
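The mask-and-occlude step described above can be sketched in a few lines. This is a minimal NumPy illustration, assuming a pre-computed binary segmentation mask for one facial region; in the actual pipeline the mask comes from a face-parsing network, and the occluded image plus mask are fed to the inpainting networks (whose specific APIs are not shown here).

```python
import numpy as np

def occlude_region(image: np.ndarray, mask: np.ndarray) -> np.ndarray:
    """Blank out one facial region ahead of inpainting.

    image: H x W x 3 uint8 face image.
    mask:  H x W binary array, 1 where the facial part (e.g., the eyes)
           should be removed and later inpainted.
    Returns a copy of the image with the masked region set to white;
    mask-based inpainting networks take this occluded image together
    with the mask itself as input.
    """
    occluded = image.copy()
    occluded[mask.astype(bool)] = 255  # erase the known content there
    return occluded

# Toy example: 4 x 4 "image" with a 2 x 2 region marked for inpainting.
img = np.zeros((4, 4, 3), dtype=np.uint8)
msk = np.zeros((4, 4), dtype=np.uint8)
msk[1:3, 1:3] = 1
out = occlude_region(img, msk)
```

The original image is left untouched; only the returned copy carries the occlusion, so the real/fake pair can be stored side by side in the dataset.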
Result
We present comparative experiments conducted on the generated dataset. Experimental results indicate a 15% decrease in detection accuracy for images derived from the FF++ dataset under the ResNet-50 benchmark detection network, and a 5% decline under the Xception-Net network. Furthermore, detection accuracy varies significantly among facial regions, with the lowest accuracy recorded in the eye region at 0.91. Generalization experiments suggest that inpainted images from the same source dataset exhibit a certain degree of generalization across different facial regions, whereas minimal generalization is observed between datasets created from different source data. Consequently, this dataset also serves as valuable research data for studying the generalization of inpainted images across different facial regions. Visualization tools demonstrate that the detection network indeed focuses on the inpainted facial features, affirming its attention to the manipulated regions. This work provides new research perspectives for methods of detecting inpainting-based manipulations.
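The cross-condition evaluation protocol behind these generalization numbers can be illustrated with a small sketch: evaluate a detector trained under one condition (source dataset or facial region) on test sets from other conditions and tabulate the accuracies. This is a hypothetical pure-Python illustration of the bookkeeping only; the `detectors` here are toy threshold functions standing in for trained networks such as ResNet-50 or Xception, and the numbers are made up.

```python
def accuracy(preds, labels):
    """Fraction of correct real/fake predictions."""
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)

def generalization_matrix(detectors, test_sets):
    """Cross-condition accuracy grid.

    detectors: {train_condition: callable mapping a sample to 0/1}
    test_sets: {eval_condition: (samples, labels)}
    Returns {train_condition: {eval_condition: accuracy}}, i.e. rows
    are training conditions and columns are evaluation conditions.
    """
    return {
        train: {
            name: accuracy([f(s) for s in samples], labels)
            for name, (samples, labels) in test_sets.items()
        }
        for train, f in detectors.items()
    }

# Toy stand-in detectors (score thresholds) and two hypothetical
# evaluation conditions; diagonal cells are the in-domain accuracies.
detectors = {"eyes": lambda s: int(s > 0.5), "mouth": lambda s: int(s > 0.8)}
test_sets = {
    "eyes": ([0.9, 0.6, 0.2, 0.1], [1, 1, 0, 0]),
    "mouth": ([0.9, 0.6, 0.2, 0.1], [1, 0, 0, 0]),
}
grid = generalization_matrix(detectors, test_sets)
```

Reading the grid row by row directly gives the "same source, different region" and "different source" comparisons reported above.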
Conclusion
The use of image inpainting techniques for tampering introduces a challenging scenario that can deceive conventional tampering detectors to a certain extent, so researching detection methods for this type of tampering is of practical significance. The provided large-scale face tampering dataset, based on inpainting techniques, encompasses high- and low-quality images, employs three distinct inpainting methods, and targets various facial features. It offers a novel data source for the field, enhances diversity, and provides benchmark data for further exploration of inpainting-related forgeries. Given the scarcity of relevant datasets in this domain, we propose this dataset, totaling 600 000 inpainted images, as a benchmark for image inpainting tampering detection. It supports research on detection methodologies as well as studies of their generalization, filling a gap in the available datasets. The dataset's quality is evaluated through accuracy on manipulation detection networks, generalizability across inpainting networks and facial regions, and data visualization.
image tampering; deep learning; image inpainting; dataset; benchmark
Afchar D, Nozick V, Yamagishi J and Echizen I. 2018. MesoNet: a compact facial video forgery detection network//Proceedings of 2018 IEEE International Workshop on Information Forensics and Security (WIFS). Hong Kong, China: IEEE: 1-7 [DOI: 10.1109/WIFS.2018.8630761]
Amerini I, Galteri L, Caldelli R and Bimbo A D. 2020. Deepfake video detection through optical flow based CNN//Proceedings of 2019 IEEE/CVF International Conference on Computer Vision Workshop (ICCVW). Seoul, Korea (South): IEEE: 1205-1207 [DOI: 10.1109/ICCVW.2019.00152]
Bertalmio M, Sapiro G, Caselles V and Ballester C. 2000. Image inpainting//Proceedings of the 27th Annual Conference on Computer Graphics and Interactive Techniques. New York, USA: ACM: 417-424 [DOI: 10.1145/344779.344972]
Chan T F and Shen J H. 2001. Nontexture inpainting by curvature-driven diffusions. Journal of Visual Communication and Image Representation, 12(4): 436-449 [DOI: 10.1006/jvci.2001.0487]
Cheng W H, Hsieh C W, Lin S K, Wang C W and Wu J L. 2005. Robust algorithm for exemplar-based image inpainting//Proceedings of 2005 International Conference on Computer Graphics, Imaging and Visualization. [s.l.]: [s.n.]: 64-69
Chollet F. 2017. Xception: deep learning with depthwise separable convolutions//Proceedings of 2017 IEEE Conference on Computer Vision and Pattern Recognition. Honolulu, USA: IEEE: 1800-1807 [DOI: 10.1109/CVPR.2017.195]
Cozzolino D, Poggi G and Verdoliva L. 2017. Recasting residual-based local descriptors as convolutional neural networks: an application to image forgery detection//Proceedings of the 5th ACM Workshop on Information Hiding and Multimedia Security. Philadelphia, USA: ACM: 159-164 [DOI: 10.1145/3082031.3083247]
Criminisi A, Pérez P and Toyama K. 2004. Region filling and object removal by exemplar-based image inpainting. IEEE Transactions on Image Processing, 13(9): 1200-1212 [DOI: 10.1109/TIP.2004.833105]
Dale K, Sunkavalli K, Johnson M K, Vlasic D, Matusik W and Pfister H. 2011. Video face replacement. ACM Transactions on Graphics, 30(6): 1-10 [DOI: 10.1145/2070781.2024164]
Dolhansky B, Howes R, Pflaum B, Baram N and Ferrer C C. 2019. The deepfake detection challenge (DFDC) preview dataset [EB/OL]. [2023-06-20]. https://arxiv.org/pdf/1910.08854.pdf
Guo X F, Yang H Y and Huang D. 2021. Image inpainting via conditional texture and structure dual generation//Proceedings of 2021 IEEE/CVF International Conference on Computer Vision. Montreal, Canada: IEEE: 14134-14143 [DOI: 10.1109/ICCV48922.2021.01387]
He K M, Zhang X Y, Ren S Q and Sun J. 2016. Deep residual learning for image recognition//Proceedings of 2016 IEEE Conference on Computer Vision and Pattern Recognition. Las Vegas, USA: IEEE [DOI: 10.1109/CVPR.2016.90]
Huang J B, Kang S B, Ahuja N and Kopf J. 2014. Image completion using planar structure guidance. ACM Transactions on Graphics, 33(4): #129 [DOI: 10.1145/2601097.2601205]
Jiang L M, Li R, Wu W, Qian C and Loy C C. 2020. DeeperForensics-1.0: a large-scale dataset for real-world face forgery detection//Proceedings of 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Seattle, USA: IEEE: 2889-2898 [DOI: 10.1109/CVPR42600.2020.00296]
Khalid H, Tariq S, Kim M and Woo S S. 2021. FakeAVCeleb: a novel audio-video multimodal deepfake dataset [EB/OL]. [2023-06-20]. https://arxiv.org/pdf/2108.05080
Kietzmann J, Lee L W, McCarthy I P and Kietzmann T C. 2020. Deepfakes: trick or treat? Business Horizons, 63(2): 135-146 [DOI: 10.1016/j.bushor.2019.11.006]
Korshunova I, Shi W Z, Dambre J and Theis L. 2017. Fast face-swap using convolutional neural networks//Proceedings of 2017 IEEE International Conference on Computer Vision. Venice, Italy: IEEE: 3677-3685 [DOI: 10.1109/ICCV.2017.397]
Li H D, Li B, Tan S Q and Huang J W. 2018. Detection of deep network generated images using disparities in color components [EB/OL] [DOI: 10.1016/j.sigpro.2020.107616]
Li J M, Xie H T, Li J H, Wang Z Y and Zhang Y D. 2021. Frequency-aware discriminative feature learning supervised by single-center loss for face forgery detection//Proceedings of 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Nashville, USA: IEEE: 6454-6463 [DOI: 10.1109/CVPR46437.2021.00639]
Li J Y, Wang N, Zhang L F, Du B and Tao D C. 2020a. Recurrent feature reasoning for image inpainting//Proceedings of 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Seattle, USA: IEEE: 7760-7768 [DOI: 10.1109/CVPR42600.2020.00778]
Li Y Z, Yang X, Sun P, Qi H G and Lyu S W. 2020b. Celeb-DF: a large-scale challenging dataset for deepfake forensics//Proceedings of 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Seattle, USA: IEEE: 3204-3213 [DOI: 10.1109/CVPR42600.2020.00327]
Liu G L, Reda F A, Shih K J, Wang T C, Tao A and Catanzaro B. 2018. Image inpainting for irregular holes using partial convolutions//Proceedings of the 15th European Conference on Computer Vision (ECCV). Munich, Germany: Springer: 89-105 [DOI: 10.1007/978-3-030-01252-6]
Lugmayr A, Danelljan M, Romero A, Yu F, Timofte R and Van Gool L. 2022. RePaint: inpainting using denoising diffusion probabilistic models//Proceedings of 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New Orleans, USA: IEEE: 11461-11471 [DOI: 10.1109/CVPR52688.2022.01117]
Nirkin Y, Wolf L, Keller Y and Hassner T. 2020. DeepFake detection based on discrepancies between faces and their context. IEEE Transactions on Pattern Analysis and Machine Intelligence, 44(10): 6111-6121 [DOI: 10.1109/TPAMI.2021.3093446]
Pathak D, Krähenbühl P, Donahue J, Darrell T and Efros A A. 2016. Context encoders: feature learning by inpainting//Proceedings of 2016 IEEE Conference on Computer Vision and Pattern Recognition. Las Vegas, USA: IEEE: 2536-2544 [DOI: 10.1109/CVPR.2016.278]
Rössler A, Cozzolino D, Verdoliva L, Riess C, Thies J and Niessner M. 2019. FaceForensics++: learning to detect manipulated facial images//Proceedings of 2019 IEEE/CVF International Conference on Computer Vision. Seoul, Korea (South): IEEE: 1-11 [DOI: 10.1109/ICCV.2019.00009]
Thies J, Zollhöfer M and Nießner M. 2019. Deferred neural rendering: image synthesis using neural textures. ACM Transactions on Graphics, 38(4): #66 [DOI: 10.1145/3306346.3323035]
Thies J, Zollhöfer M, Stamminger M, Theobalt C and Nießner M. 2016. Face2Face: real-time face capture and reenactment of RGB videos//Proceedings of 2016 IEEE Conference on Computer Vision and Pattern Recognition. Las Vegas, USA: IEEE: 2387-2395 [DOI: 10.1109/CVPR.2016.262]
Van Den Oord A, Kalchbrenner N and Kavukcuoglu K. 2016. Pixel recurrent neural networks//Proceedings of the 33rd International Conference on Machine Learning. New York, USA: JMLR.org: 1747-1756 [DOI: 10.5555/3045390.3045575]
Xiong W, Yu J H, Lin Z, Yang J M, Lu X, Barnes C and Luo J B. 2019. Foreground-aware image inpainting//Proceedings of 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Long Beach, USA: IEEE: 5840-5848 [DOI: 10.1109/CVPR.2019.00599]
Yu C Q, Wang J B, Peng C, Gao C X, Yu G and Sang N. 2018a. BiSeNet: bilateral segmentation network for real-time semantic segmentation//Proceedings of the 15th European Conference on Computer Vision (ECCV). Munich, Germany: Springer: 325-341 [DOI: 10.1007/978-3-030-01261-8_20]
Yu J H, Lin Z, Yang J M, Shen X H, Lu X and Huang T S. 2018b. Generative image inpainting with contextual attention//Proceedings of 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City, USA: IEEE: 5505-5514 [DOI: 10.1109/CVPR.2018.00577]
Zhao H Q, Wei T Y, Zhou W B, Zheng W M, Chen D D and Yu N H. 2021. Multi-attentional deepfake detection//Proceedings of 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Nashville, USA: IEEE: 2185-2194 [DOI: 10.1109/CVPR46437.2021.00222]
Zollhöfer M, Thies J, Garrido P, Bradley D, Beeler T, Pérez P, Stamminger M, Nießner M and Theobalt C. 2018. State of the art on monocular 3D face reconstruction, tracking, and applications. Computer Graphics Forum, 37(2): 523-550 [DOI: 10.1111/cgf.13382]