Real-time detection and recognition of human actions is a key problem in intelligent video surveillance systems. Because behavior recognition in video surveillance is affected by scene complexity, the classification performance of existing behavior recognition models is not satisfactory. To increase the processing efficiency of the network and address the low classification accuracy of human action recognition, we designed a deep learning model based on multiscale feature fusion in a three-dimensional (3D) convolutional network, which reduces the impact of constant appearance changes, background clutter, and pedestrian occlusion. After data preprocessing, the model alternates 3D convolution and 3D pooling operations to extract temporal and spatial features from consecutive frames, and then uses a feature pyramid structure to select three sets of feature layers at different scales. The model performs deconvolution operations in bottom-up order, fusing each result with the features of the previous layer; downsampling and fusion with the higher-level feature layers are then performed sequentially from top to bottom. The newly generated highest-level feature layer is used to recognize abnormal behavior. The proposed feature-fusion C3D network is compared with three state-of-the-art methods, C3D, R3D, and R(2+1)D, on the pedestrian abnormal action recognition (PAAR) dataset with the same parameters, and the accuracy is significantly improved.
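The two-pass fusion described in the abstract can be illustrated with a small shape trace. This is only a sketch under stated assumptions, not the authors' implementation: the clip size (3 channels, 16 frames, 112 x 112 pixels), the channel widths, the kernel sizes, the common fusion width of 256, and element-wise addition as the fusion operator are all hypothetical choices. The helper names (`conv3d_shape`, `pool3d_shape`, `deconv3d_shape`, `fuse`, `trace_fusion_shapes`) are invented for illustration. The code computes feature-map shapes only; it does not run a network.

```python
"""Shape trace for a C3D-style backbone with two-pass multiscale fusion.

All numeric values (clip size, channel widths, kernel sizes) are
illustrative assumptions, not values taken from the paper.
"""


def conv3d_shape(shape, out_channels, kernel=3, stride=1, padding=1):
    """Output shape (C, T, H, W) of a 3D convolution with a cubic kernel."""
    _, t, h, w = shape

    def out(n):
        return (n + 2 * padding - kernel) // stride + 1

    return (out_channels, out(t), out(h), out(w))


def pool3d_shape(shape, kernel):
    """Output shape of 3D pooling whose stride equals its kernel size."""
    c, t, h, w = shape
    kt, kh, kw = kernel
    return (c, t // kt, h // kh, w // kw)


def deconv3d_shape(shape, out_channels, kernel=2, stride=2):
    """Output shape of a 3D transposed convolution with no padding."""
    _, t, h, w = shape

    def out(n):
        return (n - 1) * stride + kernel

    return (out_channels, out(t), out(h), out(w))


def fuse(a, b):
    """Element-wise fusion (assumed to be addition) needs identical shapes."""
    if a != b:
        raise ValueError(f"cannot fuse shapes {a} and {b}")
    return a


def trace_fusion_shapes(clip_shape=(3, 16, 112, 112), width=256):
    # Assumed backbone: four stages, each one 3D convolution followed by
    # one 3D pooling. The first pooling keeps the temporal length.
    stages = [(64, (1, 2, 2)), (128, (2, 2, 2)),
              (256, (2, 2, 2)), (512, (2, 2, 2))]
    x = clip_shape
    pyramid = []
    for channels, pool_kernel in stages:
        x = conv3d_shape(x, channels)
        x = pool3d_shape(x, pool_kernel)
        pyramid.append(x)

    # Select three feature layers at different scales (here: the last three).
    selected = pyramid[-3:]

    # Assumed 1x1x1 convolutions bring all levels to a common channel width.
    fine, mid, coarse = (
        conv3d_shape(s, width, kernel=1, padding=0) for s in selected
    )

    # Pass 1: deconvolve the smaller map and fuse it with the previous layer.
    mid_up = fuse(deconv3d_shape(coarse, width), mid)
    fine_up = fuse(deconv3d_shape(mid_up, width), fine)

    # Pass 2: downsample with stride-2 convolutions and fuse with the
    # higher-level layers, ending at the highest-level feature layer.
    mid_down = fuse(conv3d_shape(fine_up, width, stride=2), mid_up)
    final = fuse(conv3d_shape(mid_down, width, stride=2), coarse)

    return {
        "selected": selected,
        "pass1": [fine_up, mid_up, coarse],
        "pass2": [fine_up, mid_down, final],
        "final": final,
    }


if __name__ == "__main__":
    for name, value in trace_fusion_shapes().items():
        print(name, value)
```

Under these assumptions the final fused layer has shape `(256, 2, 7, 7)`, and a classifier head would operate on that layer. The `fuse` check makes the main design constraint explicit: each deconvolution or strided convolution has to reproduce the exact temporal and spatial size of the layer it is merged with.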
CITATIONS
Cited by 1 patent.
Video surveillance
3D modeling
Video
Detection and tracking algorithms
Convolution
Evolutionary algorithms
Image fusion