DeepSense: An Explainable AI Multi-Modal Platform for Deepfake Detection Across Image, Audio, and Video

Authors

  • Syed Numaan, Mohammed Shameem Sarwar, Shaik Jamal, Naila Fathima, B.E. Department of CSE-AIML, Lords Institute of Engineering and Technology

Keywords:

AI

Abstract

The rapid proliferation of generative AI has given rise to highly realistic synthetic media, commonly known as
deepfakes, posing severe threats to personal identity, democratic processes, and digital trust. Existing detection
systems are predominantly uni-modal and opaque, offering little forensic evidence to support their binary
classifications. This paper presents DeepSense, a comprehensive, explainable AI-powered multi-modal deepfake
detection platform capable of concurrently analyzing static images, digital audio recordings, and video files. The
system integrates XceptionNet for image analysis, a hybrid XceptionNet+LSTM for video, and a CNN-BiLSTM
architecture for audio, achieving detection accuracies of 90.83%, 95.25%, and 98.32% respectively. Explainable
AI (XAI) techniques -- specifically Gradient-weighted Class Activation Mapping (Grad-CAM) for visual media and
high-resolution spectral feature visualization for audio -- are deeply integrated into the inference pipeline. The
Google Gemini 3.1 Flash LLM is employed to translate raw algorithmic outputs into natural-language forensic
narratives. DeepSense is deployed via an interactive Streamlit web interface, democratizing access to digital
forensics for non-technical users, journalists, and legal professionals.
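The abstract gives no implementation details, but the Grad-CAM step it references has a well-known core computation: channel weights are obtained by global-average-pooling the class-score gradients over each feature map, and the heatmap is the ReLU of the weighted sum of those maps. The sketch below illustrates only that combination step on synthetic arrays; the function name and toy shapes are hypothetical, and a real pipeline would take the activations and gradients from the paper's XceptionNet backbone.

```python
import numpy as np

def grad_cam_heatmap(feature_maps, gradients):
    """Combine conv feature maps and class gradients into a Grad-CAM heatmap.

    feature_maps: (K, H, W) activations of the chosen conv layer
    gradients:    (K, H, W) d(class score)/d(activation), same shape
    """
    # Channel weights alpha_k: global-average-pool the gradients.
    weights = gradients.mean(axis=(1, 2))  # shape (K,)
    # Weighted sum over channels, then ReLU to keep positive evidence only.
    cam = np.maximum((weights[:, None, None] * feature_maps).sum(axis=0), 0.0)
    # Normalize to [0, 1] so the map can be overlaid on the input image.
    if cam.max() > 0:
        cam = cam / cam.max()
    return cam

# Toy demonstration with random activations and gradients.
rng = np.random.default_rng(0)
fmap = rng.random((8, 7, 7))
grads = rng.random((8, 7, 7))
heat = grad_cam_heatmap(fmap, grads)
print(heat.shape)  # (7, 7)
```

In a deployed detector the resulting (H, W) map is upsampled to the input resolution and alpha-blended over the face image, which is what produces the visual explanations the paper describes.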

Published

2026-04-20

How to Cite

DeepSense: An Explainable AI Multi-Modal Platform for Deepfake Detection Across Image, Audio, and Video. (2026). International Journal of Engineering and Science Research, 16(2), 348-356. https://www.ijesr.org/index.php/ijesr/article/view/1636
