Study of the Use of Features Extracted From the Lightweight and Fast Neural Networks in the Visual Object Trackers Based on Discriminative Correlation Filters

Authors

DOI:

https://doi.org/10.20535/RADAP.2025.101.%25p

Keywords:

visual object tracking, neural network features, discriminative correlation filters (DCF), alternating direction method of multipliers (ADMM)

Abstract

Introduction. Visual object tracking is an important computer vision problem with a wide range of applications. Although many tracking methods exist, not all of them can solve the problem in real time, especially on mobile and embedded platforms. One approach that achieves a trade-off between high processing speed and tracking reliability is based on discriminative correlation filters (DCF). The latest implementations of this approach use features extracted from large and computationally intensive neural networks, such as VGG or ResNet. On the one hand, this improved tracking quality; on the other, it made the trackers too computationally demanding and unacceptably slow for some applications. At the same time, features from lightweight and fast neural networks remain largely unstudied within the DCF-based tracking approach. This paper is intended to partially fill this gap.
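
For context, a minimal sketch of the core DCF formulation this class of trackers builds on (single feature channel, following MOSSE [22]); the notation is ours, not the paper's:

    \min_{f}\; \bigl\| f \star x - y \bigr\|_2^2 + \lambda \,\| f \|_2^2
    \quad\Longrightarrow\quad
    \hat{f}^{*} = \frac{\hat{y} \odot \hat{x}^{*}}{\hat{x} \odot \hat{x}^{*} + \lambda}

where x is the feature map extracted from the training patch, y is the desired (typically Gaussian-shaped) response, \star denotes circular cross-correlation, hats denote the 2D DFT, * complex conjugation, and \odot element-wise operations. Spatially regularized or boundary-constrained multichannel variants (e.g., SRDCF [17], BACF [26]) no longer admit this closed form; they are instead solved with ADMM [16], alternating between a filter-subproblem update, an auxiliary-variable update, and a dual (Lagrange multiplier) update.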

Theoretical results. Using a tracker based on a discriminative correlation filter (DCF) that employs the alternating direction method of multipliers (ADMM) as the optimizer, we analyzed features extracted from lightweight convolutional neural networks: SqueezeNet, MobileNetV3, ShuffleNet-9, and the tiny/nano YOLO detectors. In particular, we justified the selection of the specific layers from which it is most expedient to extract features and, using the VOT Challenge 2019 benchmark, estimated the tracking robustness and precision. Using a mobile PC and two popular embedded platforms (Nvidia Jetson Nano and Raspberry Pi 5), we also measured the average frame processing time for all tested features of the mentioned neural networks.
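
To illustrate the feature-extraction step, below is a minimal PyTorch/torchvision sketch that hooks an intermediate layer of SqueezeNet (one of the studied backbones) and measures the average forward time; the hooked layer index and input size are illustrative assumptions, not the layers and settings selected in this study:

    import time
    import torch
    import torchvision.models as models

    # Lightweight backbone in inference mode.
    net = models.squeezenet1_1(weights=models.SqueezeNet1_1_Weights.DEFAULT).eval()

    features = {}

    def hook(module, inputs, output):
        # Store the intermediate feature map produced during the forward pass.
        features["map"] = output.detach()

    # Hook a mid-level Fire module; index 6 is an illustrative choice,
    # not necessarily the layer justified in the paper.
    net.features[6].register_forward_hook(hook)

    x = torch.randn(1, 3, 224, 224)  # stand-in for a resized search-region patch

    with torch.no_grad():
        net(x)  # warm-up run
        t0 = time.perf_counter()
        n = 100
        for _ in range(n):
            net(x)
        avg = (time.perf_counter() - t0) / n

    print(features["map"].shape)                 # e.g. torch.Size([1, 256, 27, 27])
    print(f"average forward time: {avg * 1e3:.1f} ms")

The resulting multichannel map would then serve as the per-channel input x to the DCF sketched above; the same hook-and-time pattern applies to MobileNetV3, ShuffleNet, and the tiny/nano YOLO backbones.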

Conclusions. Our study found that the most efficient features, i.e., those providing the best trade-off between processing speed and tracking quality, are the ones extracted from the YOLOv3-tiny and SqueezeNet neural networks.

References

1. Ma Z., Wang L., Zhang H., Lu W., Yin J. (2020). RPT: Learning Point Set Representation for Siamese Visual Tracking. 16th European Conference Computer Vision - ECCV 2020 Workshops. ECCV 2020. Lecture Notes in Computer Science, vol. 12539, pp. 653-665. DOI: 10.1007/978-3-030-68238-5_43.

2. Yang Z., Miao J., Wei Y., Wang W., Wang X., Yang Y. (2024). Scalable Video Object Segmentation with Identification Mechanism. IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), vol. 46, Iss. 9, pp. 6247-6262. DOI: 10.1109/TPAMI.2024.3383592.

3. Danelljan M., Bhat G., Khan F. S., Felsberg M. (2019). ATOM: Accurate Tracking by Overlap Maximization. 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 4655-4664. DOI: 10.1109/CVPR.2019.00479.

4. Bhat G., Danelljan M., Gool L. V., Timofte R. (2019). Learning Discriminative Model Prediction for Tracking. 2019 IEEE/CVF International Conference on Computer Vision (ICCV), pp. 6181-6190. DOI: 10.1109/ICCV.2019.00628.

5. He K., Zhang X., Ren S., Sun J. (2016). Deep Residual Learning for Image Recognition. 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770-778. DOI: 10.1109/CVPR.2016.90.

6. Cui Y., Song T., Wu G., Wang L. (2023). MixFormerV2: Efficient Fully Transformer Tracking. Advances in Neural Information Processing Systems 36 (NeurIPS 2023), vol. 36, pp. 58736-58751.

7. Kristan M., Leonardis A., Matas J., Felsberg M. et al. (2022). The Tenth Visual Object Tracking VOT2022 Challenge Results. Computer Vision - ECCV 2022 Workshops. ECCV 2022. Lecture Notes in Computer Science, vol. 13808, pp. 431-460. DOI: 10.1007/978-3-031-25085-9_25.

8. Dosovitskiy A., Beyer L., Kolesnikov A., Weissenborn D. et al. (2021). An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale. International Conference on Learning Representations. DOI: 10.48550/arXiv.2010.11929.

9. Borsuk V., Vei R., Kupyn O., Martyniuk T., Krashenyi I., Matas J. (2022). FEAR: Fast, Efficient, Accurate and Robust Visual Tracker. Computer Vision - ECCV 2022. ECCV 2022. Lecture Notes in Computer Science, vol. 13682, pp. 644-663. DOI: 10.1007/978-3-031-20047-2_37.

10. Iandola F. N., Han S., Moskewicz M. W., Ashraf K. et al. (2016). SqueezeNet: AlexNet-level Accuracy with 50x Fewer Parameters and <0.5MB Model Size. arXiv.org. DOI: 10.48550/arXiv.1602.07360.

11. Howard A., Sandler M., Chu G., Chen L.-C. et al. (2019). Searching for MobileNetV3. 2019 IEEE/CVF International Conference on Computer Vision (ICCV), pp. 1314-1324. DOI: 10.1109/ICCV.2019.00140.

12. Zhang X., Zhou X., Lin M., Sun J. (2018). ShuffleNet: An Extremely Efficient Convolutional Neural Network for Mobile Devices. 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6848-6856. DOI: 10.1109/CVPR.2018.00716.

13. Redmon J., Farhadi A. (2018). YOLOv3: An Incremental Improvement. arXiv.org. DOI: 10.48550/arXiv.1804.02767.

14. Jocher G. (2020). Ultralytics YOLOv5. DOI: 10.5281/zenodo.3908559.

15. Kristan M., Matas J., Leonardis A., Felsberg M. et al. (2019). The Seventh Visual Object Tracking VOT2019 Challenge Results. 2019 IEEE/CVF International Conference on Computer Vision Workshop (ICCVW). DOI: 10.1109/ICCVW.2019.00276.

16. Boyd S., Parikh N., Chu E., Peleato B., Eckstein J. (2010). Distributed Optimization and Statistical Learning via the Alternating Direction Method of Multipliers. Foundations and Trends in Machine Learning, vol. 3, Iss. 1, pp. 1-122. DOI: 10.1561/2200000016.

17. Danelljan M., Häger G., Khan F. S., Felsberg M. (2015). Learning Spatially Regularized Correlation Filters for Visual Tracking. IEEE International Conference on Computer Vision (ICCV), pp. 4310-4318. DOI: 10.1109/ICCV.2015.490.

18. Danelljan M., Bhat G., Khan F. S., Felsberg M. (2017). ECO: Efficient Convolution Operators for Tracking. IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 6931-6939. DOI: 10.1109/CVPR.2017.733.

19. Danelljan M., Häger G., Khan F. S., Felsberg M. (2017). Discriminative Scale Space Tracking. IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 39, Iss. 8, pp. 1561-1575. DOI: 10.1109/TPAMI.2016.2609928.

20. Li F., Tian C., Zuo W., Zhang L., Yang M.-H. (2018). Learning Spatial-Temporal Regularized Correlation Filters for Visual Tracking. IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 4904-4913. DOI: 10.1109/CVPR.2018.00515.

21. Lukežič A., Vojíř T., Zajc L. Č., Matas J., Kristan M. (2017). Discriminative Correlation Filter Tracker with Channel and Spatial Reliability. IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 6309-6318. DOI: 10.1007/s11263-017-1061-3.

22. Bolme D. S., Beveridge R. J., Draper B. A., Lui Y. M. (2010). Visual Object Tracking using Adaptive Correlation Filters. IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 2544-2550. DOI: 10.1109/CVPR.2010.5539960.

23. Henriques J. F., Caseiro R., Martins P., Batista J. (2015). High-Speed Tracking with Kernelized Correlation Filters. IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), vol. 37, Iss. 3, pp. 583-596. DOI: 10.1109/TPAMI.2014.2345390.

24. Danelljan M., Robinson A., Khan F. S., Felsberg M. (2016). Beyond Correlation Filters: Learning Continuous Convolution Operators for Visual Tracking. 14th European Conference on Computer Vision - ECCV 2016. ECCV 2016. Lecture Notes in Computer Science, vol. 9909, pp. 472-488. DOI: 10.1007/978-3-319-46454-1_29.

25. Galoogahi H. K., Sim T., Lucey S. (2015). Correlation Filters with Limited Boundaries. IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 4630-4638. DOI: 10.1109/CVPR.2015.7299094.

26. Galoogahi H. K., Fagg A., Lucey S. (2017). Learning Background-Aware Correlation Filters for Visual Tracking. IEEE International Conference on Computer Vision (ICCV), pp. 1144-1152. DOI: 10.1109/ICCV.2017.129.

27. Yan B., Zhang X., Wang D., Lu H., Yang X. (2021). Alpha-Refine: Boosting Tracking Performance by Precise Bounding Box Estimation. 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 5289-5298. DOI: 10.1109/CVPR46437.2021.00525.

28. Lukežič A., Matas J., Kristan M. (2020). D3S – A Discriminative Single Shot Segmentation Tracker. 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 7131-7140. DOI: 10.1109/CVPR42600.2020.00716.

29. Xu T., Feng Z.-H., Wu X.-J., Kittler J. (2020). An Accelerated Correlation Filter Tracker. Pattern Recognition, vol. 102. DOI: 10.1016/j.patcog.2019.107172.

30. Ma C., Huang J.-B., Yang X., Yang M.-H. (2015). Hierarchical Convolutional Features for Visual Tracking. 2015 IEEE International Conference on Computer Vision (ICCV), pp. 3074-3082. DOI: 10.1109/ICCV.2015.352.

31. Hu J., Shen L., Albanie S., Sun G., Wu E. (2018). Squeeze-and-Excitation Networks. 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 7132–7141. DOI: 10.1109/CVPR.2018.00745.

32. Xie S., Girshick R., Dollár P., Tu Z., He K. (2017). Aggregated Residual Transformations for Deep Neural Networks. 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 5987-5995. DOI: 10.1109/CVPR.2017.634.

Published

2025-09-30

Issue

No. 101 (2025)

Section

Computing methods in radio electronics

How to Cite

“Study of the Use of Features Extracted From the Lightweight and Fast Neural Networks in the Visual Object Trackers Based on Discriminative Correlation Filters” (2025) Visnyk NTUU KPI Seriia - Radiotekhnika Radioaparatobuduvannia, (101), pp. 39–50. doi:10.20535/RADAP.2025.101.%p.