Violence Detection
Abstract
In today’s digital era, the proliferation of video content on surveillance systems and social media platforms has raised serious concerns about public safety, especially in identifying and preventing violent activities. Manual monitoring of such video data is not only labor-intensive but also prone to human error and delayed response. To address this challenge, we propose an automated Violence Detection System using deep learning techniques that enhances surveillance capabilities by efficiently identifying violent behavior in video streams.
Our system leverages Convolutional Neural Networks (CNNs) for spatial feature extraction and Long Short-Term Memory (LSTM) networks for temporal sequence analysis. The CNN, specifically VGG16, extracts high-level features from individual video frames, while the LSTM processes the resulting feature sequences to capture motion dynamics and context over time. Together, these components form a hybrid architecture that classifies video clips as violent or non-violent.
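The hybrid architecture described above can be sketched in Keras as a VGG16 backbone applied frame-by-frame via `TimeDistributed`, followed by an LSTM. This is an illustrative sketch, not the report's exact configuration: the sequence length (16), frame size (224), and LSTM width (64) are assumptions, and `weights=None` is used to keep the sketch self-contained where a real system would load pretrained ImageNet weights.

```python
from tensorflow.keras import layers, models
from tensorflow.keras.applications import VGG16

def build_model(seq_len=16, frame_size=224):
    # VGG16 backbone produces a 7x7x512 feature map per frame.
    # weights=None avoids downloading ImageNet weights for this sketch;
    # a real deployment would use weights="imagenet".
    vgg = VGG16(weights=None, include_top=False,
                input_shape=(frame_size, frame_size, 3))
    vgg.trainable = False  # use the CNN purely as a feature extractor

    model = models.Sequential([
        layers.Input(shape=(seq_len, frame_size, frame_size, 3)),
        layers.TimeDistributed(vgg),                       # per-frame spatial features
        layers.TimeDistributed(layers.GlobalAveragePooling2D()),
        layers.LSTM(64),                                   # temporal dynamics across frames
        layers.Dense(1, activation="sigmoid"),             # violent vs. non-violent
    ])
    model.compile(optimizer="adam", loss="binary_crossentropy",
                  metrics=["accuracy"])
    return model

model = build_model()
```

The sigmoid output doubles as the confidence score reported by the system: values near 1 indicate violence, values near 0 indicate non-violence.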
The backend is built with Python (the Flask framework) and integrated with a user-friendly web interface through which users upload video files for analysis. The model returns a classification result together with a confidence score and the inference time. Video data passes through a preprocessing pipeline of frame extraction, resizing, noise reduction, normalization, and sequence grouping. To mitigate class imbalance, techniques such as data augmentation and synthetic sampling are applied.
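The resizing, normalization, and sequence-grouping steps of the pipeline can be sketched with NumPy alone. This is a minimal sketch under assumed parameters (64x64 frames, 16-frame clips, stride 8); a real pipeline would decode video with OpenCV and apply proper interpolation and denoising rather than the nearest-neighbour resize used here.

```python
import numpy as np

def preprocess_video(frames, size=64, seq_len=16, stride=8):
    """Resize frames (nearest-neighbour), normalise to [0, 1], and group
    them into fixed-length overlapping sequences for the LSTM.
    `frames` has shape (num_frames, height, width, 3), dtype uint8."""
    n, h, w, _ = frames.shape
    # Nearest-neighbour resize via index sampling (a real pipeline would
    # use cv2.resize plus a denoising step).
    rows = np.arange(size) * h // size
    cols = np.arange(size) * w // size
    resized = frames[:, rows][:, :, cols]           # (n, size, size, 3)
    normed = resized.astype(np.float32) / 255.0     # normalisation to [0, 1]
    # Sequence grouping: sliding windows of seq_len frames.
    starts = range(0, n - seq_len + 1, stride)
    return np.stack([normed[s:s + seq_len] for s in starts])

# Stand-in for decoded video: 40 random 120x160 RGB frames.
video = np.random.randint(0, 256, (40, 120, 160, 3), dtype=np.uint8)
clips = preprocess_video(video)
print(clips.shape)  # (4, 16, 64, 64, 3): 4 clips of 16 frames each
```

The overlapping windows mean each frame can appear in more than one clip, which also acts as a mild form of augmentation at inference time.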
This system not only automates violence detection but also supports scalability, real-time processing, and integration with existing surveillance infrastructure. Deployed in public spaces, transport hubs, or educational institutions, the proposed model can contribute significantly to the early detection and prevention of violent incidents, thereby improving public safety and emergency response.