Student Abnormal Behavior Recognition
Abstract
An intelligent campus surveillance system enhances school safety through abnormal behavior recognition, a key task within action recognition in computer vision. Although Convolutional Neural Networks (CNNs) are widely used for action recognition, capturing comprehensive motion-sequence features from video remains challenging. This work addresses that challenge for video-based recognition of abnormal behavior on campus. It introduces a novel framework that combines long-range temporal video modeling with a global sparse uniform sampling strategy: each video is divided into three equal-length segments, from which snippets are sampled uniformly. The method forms a consensus over three temporal segment transformers (TST), which connect patches globally and compute self-attention using joint spatiotemporal factorization. The model is developed on the CABR50 dataset, which contains 50 abnormal action classes with more than 700 clips per class.
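The segment-based sampling and consensus described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not give the exact sampling rule or consensus function, so the sketch assumes TSN-style sampling (a random frame inside each segment during training, the centre frame at inference) and a mean consensus over per-snippet class scores. The function names `sample_segment_indices`, `segmental_consensus` and `predict_class` are hypothetical, and the TST backbone itself is not shown; `snippet_scores` stands in for its per-snippet outputs.

```python
import random
from typing import List, Optional, Sequence


def sample_segment_indices(num_frames: int, num_segments: int = 3,
                           training: bool = True,
                           rng: Optional[random.Random] = None) -> List[int]:
    """Pick one frame index from each of `num_segments` equal-length segments.

    Assumed TSN-style rule (hypothetical helper):
    training  -> a random frame inside each segment (sparse, covers the whole clip);
    inference -> the centre frame of each segment (deterministic).
    """
    if num_frames < num_segments:
        raise ValueError("need at least one frame per segment")
    rng = rng or random.Random()
    indices = []
    for k in range(num_segments):
        # Integer boundaries so the segments tile [0, num_frames) exactly.
        start = (k * num_frames) // num_segments
        end = ((k + 1) * num_frames) // num_segments
        if training:
            indices.append(rng.randrange(start, end))
        else:
            indices.append((start + end) // 2)
    return indices


def segmental_consensus(snippet_scores: Sequence[Sequence[float]]) -> List[float]:
    """Average class scores over snippets (assumed consensus: the mean, as in TSN)."""
    n = len(snippet_scores)
    num_classes = len(snippet_scores[0])
    return [sum(s[c] for s in snippet_scores) / n for c in range(num_classes)]


def predict_class(snippet_scores: Sequence[Sequence[float]]) -> int:
    """Return the index of the highest video-level (consensus) score."""
    video_scores = segmental_consensus(snippet_scores)
    return max(range(len(video_scores)), key=video_scores.__getitem__)
```

For a 90-frame clip at inference, `sample_segment_indices(90, training=False)` returns `[15, 45, 75]`, the centres of the segments `[0, 30)`, `[30, 60)` and `[60, 90)`. Averaging scores before taking the argmax lets each of the three snippets contribute evidence from a different part of the clip, which is the motivation for sparse sampling over the whole video rather than dense sampling of one short window.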