Biometric Data Protection in Videos Available in Datasets through Face Swapping

Authors

  • Héctor Caballero Hernández, Universidad Autónoma del Estado de México
  • Vianney Muñoz Jiménez
  • Marco Antonio Ramos Corchado

DOI:

https://doi.org/10.32870/recibe.v14i3.442

Keywords:

Biometric data, Mexican Sign Language, privacy, datasets, anonymization, artificial intelligence ethics

Abstract

Biometric data such as face, voice, and fingerprints are vulnerable to attacks that use artificial intelligence (AI) tools, because they encode characteristics of an individual that cannot be replaced. This work applies a facial anonymization technique based on face swapping and background removal to protect the privacy of the participants who appear in the LSM-VMX experimental dataset, which consists of 180 signs of Mexican Sign Language (LSM). The anonymization process was developed in Python: the participants' faces were swapped using the inswapper_128 model, and the scene backgrounds were removed using the rembg library with the U2Net model. To test whether the modified videos remain usable, LSM sign recognition models based on MediaPipe and a support vector machine (SVM) were trained on the videos from dataset A (original) and dataset B (modified). The average accuracy was 0.975 for dataset A and 0.983 for dataset B, which indicates that the changes did not degrade the performance of the model. Furthermore, facial similarity tests using the VGG-Face model and the SSIM metric were run to check that the anonymization process was successful. The results showed that the faces in dataset B differ from those in dataset A.
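As an illustration of the similarity check mentioned in the abstract, the following is a minimal sketch of the SSIM formula from Wang et al. (2004), computed over a single global window, together with a cosine distance that could be applied to face embeddings. It is not the authors' implementation. The function names, the use of flat grayscale pixel lists, the global (non-sliding) window, the population statistics, and the choice of cosine distance for embeddings are all assumptions made for the example. Practical SSIM implementations, such as structural_similarity in scikit-image, average the index over local sliding windows instead.

```python
import math


def global_ssim(x, y, data_range=255.0, k1=0.01, k2=0.03):
    """Single-window SSIM between two equal-length grayscale pixel sequences.

    Assumption: statistics are computed over the whole image at once
    (population mean, variance, covariance), not over sliding windows.
    """
    if len(x) != len(y) or len(x) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(x)
    # Stabilizing constants from the SSIM definition.
    c1 = (k1 * data_range) ** 2
    c2 = (k2 * data_range) ** 2
    mu_x = sum(x) / n
    mu_y = sum(y) / n
    var_x = sum((a - mu_x) ** 2 for a in x) / n
    var_y = sum((b - mu_y) ** 2 for b in y) / n
    cov_xy = sum((a - mu_x) * (b - mu_y) for a, b in zip(x, y)) / n
    numerator = (2 * mu_x * mu_y + c1) * (2 * cov_xy + c2)
    denominator = (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    return numerator / denominator


def cosine_distance(u, v):
    """Cosine distance between two embedding vectors (0 means same direction)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    if norm == 0:
        raise ValueError("embeddings must be non-zero")
    return 1.0 - dot / norm


# Hypothetical toy data: a 2x2 "face crop" flattened to a list, and its inverse.
original = [0, 255, 0, 255]
swapped = [255, 0, 255, 0]
same_score = global_ssim(original, original)  # identical crops score 1
diff_score = global_ssim(original, swapped)   # inverted structure scores below 0
```

Under these assumptions, an SSIM close to 1 between a frame from dataset A and the matching frame from dataset B would suggest the face was left largely unchanged, while a low SSIM or a large embedding distance would be consistent with a successful swap.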

Published

2026-01-13

How to Cite

Caballero Hernández, H., Muñoz Jiménez, V., & Ramos Corchado, M. A. (2026). Biometric Data Protection in Videos Available in Datasets through Face Swapping. ReCIBE, Electronic Journal of Computing, Informatics, Biomedical and Electronics, 14(3), C3–17. https://doi.org/10.32870/recibe.v14i3.442

Section

Computer Science & IT