Adaptive optical flow estimation-driven micro-expression recognition
2024, Vol. 29, No. 10, pages 3060-3073
Print publication date: 2024-10-16
DOI: 10.11834/jig.230566
Bao Yongtang, Wu Chenxi, Zhang Peng, Shan Caifeng. 2024. Adaptive optical flow estimation-driven micro-expression recognition. Journal of Image and Graphics, 29(10):3060-3073
Objective
Micro-expression recognition aims to automatically analyze and identify a subject's emotional category from involuntary facial muscle movements, and it has important application value in lie detection, psychological diagnosis, and other areas. However, current micro-expression recognition methods usually rely on offline optical flow estimation, which leaves micro-expression features insufficiently expressive. To address this problem, this paper proposes a micro-expression recognition model driven by adaptive optical flow estimation (adaptive micro-expression recognition, AdaMER).
Method
AdaMER jointly performs the two tasks of optical flow estimation and micro-expression classification in parallel to adaptively learn micro-expression-related motion features. First, a dense differential encoder-decoder is proposed to extract multi-level facial displacement information and realize adaptive optical flow estimation. Then, a vision Transformer is used to mine the discriminative micro-expression information embedded in the reconstructed optical flow. Finally, the micro-expression semantic information carried by the facial displacements is fused with the discriminative micro-expression information for classification.
Result
Extensive experiments are conducted on the composite micro-expression dataset built from SMIC (spontaneous micro-expression recognition), SAMM (spontaneous micro-facial movement dataset), and CASME II (the Chinese Academy of Sciences micro-expression). The results show that the proposed method achieves a UF1 (unweighted F1-score) of 82.89% and a UAR (unweighted average recall) of 85.95%, improvements of 1.77% and 4.85%, respectively, over the latest method FRL-DGT (feature representation learning with adaptive displacement generation and Transformer fusion).
Conclusion
The proposed method fuses the two tasks of adaptive optical flow estimation and micro-expression classification. On one hand, it realizes adaptive optical flow estimation in an end-to-end manner to perceive subtle facial movements and improve the description of subtle expressions; on the other hand, it fully exploits discriminative micro-expression information to boost micro-expression recognition performance.
Objective
Micro-expressions are brief, subtle facial muscle movements that involuntarily reveal emotions a person is trying to conceal. They are more indicative of a person's true feelings and motivations than macro-expressions. Micro-expression recognition aims to automatically analyze and identify a subject's emotional category from these involuntary facial muscle movements, and it has important application value in lie detection, psychological diagnosis, and other areas. In the early development of micro-expression recognition, local binary patterns and optical flow were widely used as features for training traditional machine learning models. However, such handcrafted features rely on manually designed rules, making it difficult to adapt to differences in micro-expression data across individuals and scenarios. Because deep learning can automatically learn optimal feature representations of an image, deep learning-based micro-expression recognition far exceeds traditional methods in recognition performance. Nevertheless, micro-expressions manifest as subtle facial changes, so the recognition task remains challenging. By analyzing the pixel movement between consecutive frames, optical flow can represent the dynamic information of micro-expressions, and deep learning-based methods accordingly describe facial muscle motion with optical flow information to improve recognition performance. However, existing methods usually extract the optical flow offline with off-the-shelf estimation techniques; this describes subtle expressions insufficiently and neglects static facial expression information, which restricts the recognition performance of the model.
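As context for the optical flow discussion above, the following toy numpy sketch illustrates the basic idea that pixel movement between consecutive frames can be recovered by searching for the displacement that best aligns them. This is exhaustive block matching over a single global integer shift, a deliberately simplified stand-in; real optical flow estimators (FlowNet, RAFT, TV-L1, and the adaptive estimation proposed here) compute dense, sub-pixel flow fields. All names in the sketch are illustrative.

```python
import numpy as np

def block_match_flow(prev, curr, max_disp=3):
    """Estimate a single global (dy, dx) displacement between two
    grayscale frames by exhaustive search over integer shifts."""
    best, best_err = (0, 0), np.inf
    for dy in range(-max_disp, max_disp + 1):
        for dx in range(-max_disp, max_disp + 1):
            # shift the previous frame and measure alignment error
            shifted = np.roll(np.roll(prev, dy, axis=0), dx, axis=1)
            err = np.mean((shifted - curr) ** 2)
            if err < best_err:
                best, best_err = (dy, dx), err
    return best

rng = np.random.default_rng(0)
frame0 = rng.random((32, 32))
# synthetic "motion": move the whole frame down 2 pixels, right 1 pixel
frame1 = np.roll(np.roll(frame0, 2, axis=0), 1, axis=1)
print(block_match_flow(frame0, frame1))  # (2, 1)
```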
Therefore, this study proposes a micro-expression recognition network based on adaptive optical flow estimation, which performs optical flow estimation and micro-expression classification jointly in parallel to adaptively learn micro-expression-related motion features.
Method
Micro-expression training samples are limited, which makes it difficult to train complex network models. Therefore, in the preprocessing stage, this study selects the apex frame and its neighboring frames in each micro-expression video sequence as training data. In addition, when loading the data, the original training pair is replaced, with a certain probability, by another image pair in the video sequence that contains motion information. Next, a deep network with a dense differential encoder-decoder performs adaptive optical flow estimation of facial muscle motion to improve the characterization of subtle expressions. In the dense differential encoder, ResNet18 extracts features from the two frames and from their difference map, and the branches processing the two frames share parameters. A motion enhancement module is added to the feature extraction branch of the difference image to accomplish inter-layer information interaction. In this module, a spatial attention mechanism makes the difference-map features focus on micro-expression-related motion; the features of the two frames are subtracted to preserve and amplify their difference; and both cues provide valid information for the subsequent network. The decoder maps the multi-level facial displacement information extracted by the dense differential encoder, together with the last-layer features of the two frames, to reconstruct the optical flow features. The vision Transformer is a deep learning model based on the self-attention mechanism with global perception capability compared with traditional convolutional neural networks; its feature extraction capability is used to mine the discriminative micro-expression information embedded in the reconstructed optical flow.
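The motion enhancement idea described above can be sketched schematically. The numpy snippet below is a hypothetical, simplified illustration of the two ingredients named in the text: spatial attention derived from the difference-map features, and subtraction of the two frame features. It is not the paper's actual module; all shapes, names, and the channel-mean pooling choice are assumptions.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def motion_enhance(feat_a, feat_b, diff_feat):
    """Hypothetical sketch of the motion-enhancement idea:
    (1) a spatial attention map from the difference features highlights
        motion-related locations,
    (2) subtracting the two frame features preserves and amplifies
        their inter-frame change,
    (3) both cues are combined for the next layer.
    All features are (C, H, W) arrays."""
    # channel-pooled spatial attention map, shape (1, H, W)
    attn = sigmoid(diff_feat.mean(axis=0, keepdims=True))
    enhanced_diff = diff_feat * attn   # motion-focused difference features
    frame_delta = feat_a - feat_b      # amplified inter-frame change
    return enhanced_diff + frame_delta

C, H, W = 4, 8, 8
a, b, d = np.ones((C, H, W)), np.zeros((C, H, W)), np.zeros((C, H, W))
out = motion_enhance(a, b, d)
print(out.shape)  # (4, 8, 8)
```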
Finally, the micro-expression semantic information extracted from facial displacement information and the discriminative micro-expression information extracted by the vision Transformer are fused to provide rich information for micro-expression classification. The optical flow estimation task is constrained by an endpoint error loss, which continuously reduces the Euclidean distance between the predicted and ground-truth optical flow. Cross-entropy losses are applied both to the features extracted by the vision Transformer and to the fused features, driving the network to learn micro-expression-related information. At the same time, the lower-motion-intensity image of the two frames is treated as a neutral expression (without motion information), and a KL-divergence loss is applied to the encoder's feature output to suppress irrelevant information. These loss functions interact to optimize the network.
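The three loss terms named above have standard forms. A minimal numpy sketch, assuming dense flow fields of shape (H, W, 2), single-sample logits, and discrete distributions for the KL term (the paper's exact tensor shapes and weighting are not specified here), might look like:

```python
import numpy as np

def epe_loss(flow_pred, flow_gt):
    """Endpoint error: mean Euclidean distance between predicted and
    ground-truth flow vectors; flows have shape (H, W, 2)."""
    return float(np.mean(np.sqrt(np.sum((flow_pred - flow_gt) ** 2, axis=-1))))

def cross_entropy(logits, label):
    """Cross-entropy for a single sample from raw logits."""
    z = logits - logits.max()              # subtract max for numerical stability
    log_probs = z - np.log(np.exp(z).sum())
    return float(-log_probs[label])

def kl_div(p, q, eps=1e-12):
    """KL divergence KL(p || q) between two discrete distributions."""
    p, q = p + eps, q + eps
    return float(np.sum(p * np.log(p / q)))

flow = np.zeros((4, 4, 2))
print(epe_loss(flow, flow))  # 0.0 for a perfect flow prediction
```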
Result
This study evaluates model performance on public datasets using the leave-one-subject-out cross-validation strategy, with face alignment and cropping performed on all samples to unify the datasets. To demonstrate the effectiveness of the proposed method, we compare it with existing mainstream methods on the composite dataset constructed from SMIC, SAMM, and CASME II. Our method achieves 82.89% UF1 and 85.59% UAR on the full dataset, 78.16% UF1 and 80.89% UAR on the SMIC part, 94.52% UF1 and 96.02% UAR on the CASME II part, and 73.24% UF1 and 75.83% UAR on the SAMM part. The method achieves the best results on the full dataset, the SMIC part, and the CASME II part, and the second-best results on the SAMM part. Compared with the recently proposed micro-expression method based on feature representation learning with adaptive displacement generation and Transformer fusion (FRL-DGT), our method improves UF1 by 1.77% and UAR by 4.85%.
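UF1 and UAR are the unweighted (macro-averaged) F1 and recall: per-class scores averaged with equal weight per class, which is why they are standard for class-imbalanced composite micro-expression benchmarks. A minimal numpy implementation of these metrics might look like:

```python
import numpy as np

def uf1_uar(y_true, y_pred, n_classes):
    """Unweighted F1 (macro F1) and unweighted average recall:
    per-class scores averaged with equal class weights."""
    f1s, recalls = [], []
    for c in range(n_classes):
        tp = np.sum((y_pred == c) & (y_true == c))
        fp = np.sum((y_pred == c) & (y_true != c))
        fn = np.sum((y_pred != c) & (y_true == c))
        f1s.append(2 * tp / (2 * tp + fp + fn) if (2 * tp + fp + fn) else 0.0)
        recalls.append(tp / (tp + fn) if (tp + fn) else 0.0)
    return float(np.mean(f1s)), float(np.mean(recalls))

# toy 3-class example: one class-1 sample is misclassified as class 2
y_true = np.array([0, 0, 1, 1, 2, 2])
y_pred = np.array([0, 0, 1, 2, 2, 2])
uf1, uar = uf1_uar(y_true, y_pred, 3)
print(uf1, uar)
```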
Conclusion
The micro-expression recognition model based on adaptive optical flow estimation proposed in this study fuses the two tasks of adaptive optical flow estimation and micro-expression classification. On one hand, it senses subtle facial movements in an end-to-end manner and improves the description of subtle expressions; on the other hand, it fully exploits discriminative micro-expression information and enhances micro-expression recognition performance.
Keywords: micro-expression recognition; adaptive optical flow estimation; motion features; differential encoder; feature fusion
Alkaddour M, Tariq U and Dhall A. 2022. Self-supervised approach for facial movement based optical flow. IEEE Transactions on Affective Computing, 13(4): 2071-2085 [DOI: 10.1109/TAFFC.2022.3197622]
Ben X Y, Ren Y, Zhang J P, Wang S J, Kpalma K, Meng W X and Liu Y J. 2022. Video-based facial micro-expression analysis: a survey of datasets, features and algorithms. IEEE Transactions on Pattern Analysis and Machine Intelligence, 44(9): 5826-5846 [DOI: 10.1109/TPAMI.2021.3067464]
Bhushan B. 2015. Study of facial micro-expressions in psychology//Mandal M K, Awasthi A, eds. Understanding Facial Expressions in Communication. New Delhi: Springer: 265-286
Chen B, Liu K H, Xu Y, Wu Q Q and Yao J F. 2023. Block division convolutional network with implicit deep features augmentation for micro-expression recognition. IEEE Transactions on Multimedia, 25: 1345-1358 [DOI: 10.1109/TMM.2022.3141616]
Chen G, Zhang S Q and Zhao X M. 2022. Video sequence-based human facial expression recognition using Transformer networks. Journal of Image and Graphics, 27(10): 3022-3030 [DOI: 10.11834/jig.210248]
Davison A K, Lansley C, Costen N, Tan K and Yap M H. 2018. SAMM: a spontaneous micro-facial movement dataset. IEEE Transactions on Affective Computing, 9(1): 116-129 [DOI: 10.1109/TAFFC.2016.2573832]
Dosovitskiy A, Beyer L, Kolesnikov A, Weissenborn D, Zhai X H, Unterthiner T, Dehghani M, Minderer M, Heigold G, Gelly S, Uszkoreit J and Houlsby N. 2021. An image is worth 16 × 16 words: Transformers for image recognition at scale//Proceedings of the 9th International Conference on Learning Representations. [s.l.]: ICLR
Dosovitskiy A, Fischer P, Ilg E, Häusser P, Hazirbas C, Golkov V, Van Der Smagt P, Cremers D and Brox T. 2015. FlowNet: learning optical flow with convolutional networks//Proceedings of 2015 IEEE International Conference on Computer Vision. Santiago, Chile: IEEE: 2758-2766 [DOI: 10.1109/ICCV.2015.316]
Ekman P and Friesen W V. 1975. Unmasking the Face: a Guide to Recognizing Emotions from Facial Clues. Englewood Cliffs: Prentice-Hall: 430-440
Fan X Q, Chen X L, Jiang M J, Shahid A R and Yan H. 2023. SelfME: self-supervised motion learning for micro-expression recognition//Proceedings of 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Vancouver, Canada: IEEE: 13834-13843 [DOI: 10.1109/CVPR52729.2023.01329]
Gan Y S, Liong S T, Yau W C, Huang Y C and Tan L K. 2019. OFF-ApexNet on micro-expression recognition system. Signal Processing: Image Communication, 74: 129-139 [DOI: 10.1016/j.image.2019.02.005]
He K M, Zhang X Y, Ren S Q and Sun J. 2016. Deep residual learning for image recognition//Proceedings of 2016 IEEE Conference on Computer Vision and Pattern Recognition. Las Vegas, USA: IEEE: 770-778 [DOI: 10.1109/CVPR.2016.90]
Ilg E, Mayer N, Saikia T, Keuper M, Dosovitskiy A and Brox T. 2017. FlowNet 2.0: evolution of optical flow estimation with deep networks//Proceedings of 2017 IEEE Conference on Computer Vision and Pattern Recognition. Honolulu, USA: IEEE: 1647-1655 [DOI: 10.1109/CVPR.2017.179]
Jiang S H, Campbell D, Lu Y, Li H D and Hartley R. 2021. Learning to estimate hidden motions with global motion aggregation//Proceedings of 2021 IEEE/CVF International Conference on Computer Vision. Montreal, Canada: IEEE: 9752-9761 [DOI: 10.1109/ICCV48922.2021.00963]
Khor H Q, See J, Liong S T, Phan R C W and Lin W Y. 2019. Dual-stream shallow networks for facial micro-expression recognition//Proceedings of 2019 IEEE International Conference on Image Processing. Taipei, China: IEEE: 36-40 [DOI: 10.1109/ICIP.2019.8802965]
Lei L, Chen T, Li S G and Li J F. 2021. Micro-expression recognition based on facial graph representation learning and facial action unit fusion//Proceedings of 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Nashville, USA: IEEE: 1571-1580 [DOI: 10.1109/CVPRW53098.2021.00173]
Lei L, Li J F, Chen T and Li S G. 2020. A novel graph-TCN with a graph structured representation for micro-expression recognition//Proceedings of the 28th ACM International Conference on Multimedia. Seattle, USA: ACM: 2237-2245 [DOI: 10.1145/3394171.3413714]
Li G H, Yuan Y F, Ben X Y and Zhang J P. 2020. Spatiotemporal attention network for microexpression recognition. Journal of Image and Graphics, 25(11): 2380-2390 [DOI: 10.11834/jig.200325]
Li X B, Pfister T, Huang X H, Zhao G Y and Pietikäinen M. 2013. A spontaneous micro-expression database: inducement, collection and baseline//Proceedings of the 10th IEEE International Conference and Workshops on Automatic Face and Gesture Recognition. Shanghai, China: IEEE: 1-6 [DOI: 10.1109/FG.2013.6553717]
Li Y T, Huang X H and Zhao G Y. 2018. Can micro-expression be recognized based on single apex frame?//Proceedings of the 25th IEEE International Conference on Image Processing. Athens, Greece: IEEE: 3094-3098 [DOI: 10.1109/ICIP.2018.8451376]
Li Y T, Huang X H and Zhao G Y. 2021. Joint local and global information learning with single apex frame detection for micro-expression recognition. IEEE Transactions on Image Processing, 30: 249-263 [DOI: 10.1109/TIP.2020.3035042]
Liu Y J, Zhang J K, Yan W J, Wang S J, Zhao G Y and Fu X L. 2016. A main directional mean optical flow feature for spontaneous micro-expression recognition. IEEE Transactions on Affective Computing, 7(4): 299-310 [DOI: 10.1109/TAFFC.2015.2485205]
Nguyen X B, Duong C N, Li X, Gauch S, Seo H S and Luu K. 2023. Micron-BERT: BERT-based facial micro-expression recognition//Proceedings of 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Vancouver, Canada: IEEE: 1482-1492 [DOI: 10.1109/CVPR52729.2023.00149]
Ronneberger O, Fischer P and Brox T. 2015. U-Net: convolutional networks for biomedical image segmentation//Proceedings of the 18th International Conference on Medical Image Computing and Computer-Assisted Intervention. Munich, Germany: Springer: 234-241 [DOI: 10.1007/978-3-319-24574-4_28]
See J, Yap M H, Li J T, Hong X P and Wang S J. 2019. MEGC 2019 – the second facial micro-expressions grand challenge//Proceedings of the 14th IEEE International Conference on Automatic Face and Gesture Recognition. Lille, France: IEEE: 1-5 [DOI: 10.1109/FG.2019.8756611]
Shreve M, Godavarthy S, Goldgof D and Sarkar S. 2011. Macro- and micro-expression spotting in long videos using spatio-temporal strain//Proceedings of 2011 IEEE International Conference on Automatic Face and Gesture Recognition. Santa Barbara, USA: IEEE: 51-56 [DOI: 10.1109/FG.2011.5771451]
Su Y T, Zhang J Q, Liu J and Zhai G T. 2021. Key facial components guided micro-expression recognition based on first & second-order motion//Proceedings of 2021 IEEE International Conference on Multimedia and Expo. Shenzhen, China: IEEE: 1-6 [DOI: 10.1109/ICME51207.2021.9428407]
Sun B, Cao S M, Li D L, He J and Yu L J. 2022. Dynamic micro-expression recognition using knowledge distillation. IEEE Transactions on Affective Computing, 13(2): 1037-1043 [DOI: 10.1109/TAFFC.2020.2986962]
Szajnberg N M. 2022. What the face reveals: basic and applied studies of spontaneous expression using the facial action coding system (FACS). Journal of the American Psychoanalytic Association, 70(3): 591-595 [DOI: 10.1177/00030651221107681]
Teed Z and Deng J. 2020. RAFT: recurrent all-pairs field transforms for optical flow//Proceedings of the 16th European Conference on Computer Vision. Glasgow, UK: Springer: 402-419 [DOI: 10.1007/978-3-030-58536-5_24]
Van Quang N, Chun J and Tokuyama T. 2019. CapsuleNet for micro-expression recognition//Proceedings of the 14th IEEE International Conference on Automatic Face and Gesture Recognition. Lille, France: IEEE: 1-7 [DOI: 10.1109/FG.2019.8756544]
Wang F P, Li J, Qi C, Wang L and Wang P. 2023. Multi-scale multi-modal micro-expression recognition algorithm based on Transformer [EB/OL]. [2023-07-08]. https://arxiv.org/pdf/2301.02969.pdf
Wang Y D, See J, Phan R C W and Oh Y H. 2014. LBP with six intersection points: reducing redundant information in LBP-TOP for micro-expression recognition//Proceedings of the 12th Asian Conference on Computer Vision. Singapore: Springer: 525-537 [DOI: 10.1007/978-3-319-16865-4_34]
Woo S, Park J, Lee J Y and Kweon I S. 2018. CBAM: convolutional block attention module//Proceedings of 2018 European Conference on Computer Vision. Munich, Germany: Springer: 3-19 [DOI: 10.1007/978-3-030-01234-2_1]
Xia Z Q, Peng W, Khor H Q, Feng X Y and Zhao G Y. 2020. Revealing the invisible with model and data shrinking for composite-database micro-expression recognition. IEEE Transactions on Image Processing, 29: 8590-8605 [DOI: 10.1109/TIP.2020.3018222]
Yan W J, Li X B, Wang S J, Zhao G Y, Liu Y J, Chen Y H and Fu X L. 2014. CASME II: an improved spontaneous micro-expression database and the baseline evaluation. PLoS One, 9(1): #e86041 [DOI: 10.1371/journal.pone.0086041]
Yu J H, Zhang C Y, Song Y and Cai W D. 2021. ICE-GAN: identity-aware and capsule-enhanced GAN with graph-based reasoning for micro-expression recognition and synthesis//Proceedings of 2021 International Joint Conference on Neural Networks. Shenzhen, China: IEEE: 1-8 [DOI: 10.1109/IJCNN52387.2021.9533988]
Zach C, Pock T and Bischof H. 2007. A duality based approach for realtime TV-L1 optical flow//Proceedings of the 29th Joint Pattern Recognition Symposium. Heidelberg, Germany: Springer: 214-223 [DOI: 10.1007/978-3-540-74936-3_22]
Zhai Z J, Zhao J H, Long C J, Xu W J, He S J and Zhao H J. 2023. Feature representation learning with adaptive displacement generation and Transformer fusion for micro-expression recognition//Proceedings of 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Vancouver, Canada: IEEE: 22086-22095 [DOI: 10.1109/CVPR52729.2023.02115]
Zhang L F, Hong X P, Arandjelović O and Zhao G Y. 2022. Short and long range relation based spatio-temporal Transformer for micro-expression recognition. IEEE Transactions on Affective Computing, 13(4): 1973-1985 [DOI: 10.1109/TAFFC.2022.3213509]
Zhao G Y and Pietikainen M. 2007. Dynamic texture recognition using local binary patterns with an application to facial expressions. IEEE Transactions on Pattern Analysis and Machine Intelligence, 29(6): 915-928 [DOI: 10.1109/TPAMI.2007.1110]
Zheng Z Q, Jiang Q S and Wang S F. 2020. Posed and spontaneous expression distinction through multi-task and adversarial learning. Journal of Image and Graphics, 25(11): 2370-2379 [DOI: 10.11834/jig.200264]
Zhou L, Mao Q R, Huang X H, Zhang F F and Zhang Z H. 2022. Feature refinement: an expression-specific feature learning and fusion method for micro-expression recognition. Pattern Recognition, 122: #108275 [DOI: 10.1016/j.patcog.2021.108275]
Zhou L, Mao Q R and Xue L Y. 2019. Dual-inception network for cross-database micro-expression recognition//Proceedings of the 14th IEEE International Conference on Automatic Face and Gesture Recognition. Lille, France: IEEE: 1-5 [DOI: 10.1109/FG.2019.8756579]