Abstract
Keypoint detection and description are the basis of many computer vision applications such as object recognition and image analysis. Deep learning-based methods have made great progress in the joint learning of keypoint detection and description. Low-level features have been shown to be helpful for both keypoint detection and description. However, current detectors and descriptors focus mainly on high-level features and overlook the importance of low-level features; they simply concatenate features and lack sufficient feature fusion. In this work, we propose a fusion representation learning network that fuses different levels of features for both the detector and the descriptor. Furthermore, we design an adaptive feature fusion structure for the descriptor. Extensive experiments on the HPatches, FM-Bench and Day-Night datasets demonstrate the superiority of our approach.
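The paper's architecture is not reproduced on this page, but to make the contrast with plain feature concatenation concrete, the sketch below shows one way adaptive multi-level feature fusion could look in PyTorch. The module name AdaptiveFusion, the channel widths, and the sigmoid gating scheme are illustrative assumptions, not the authors' actual design.

```python
# Minimal sketch of adaptive multi-level feature fusion (illustrative only;
# layer names, channel sizes, and the gating scheme are assumptions, not the
# paper's architecture).
import torch
import torch.nn as nn
import torch.nn.functional as F

class AdaptiveFusion(nn.Module):
    """Fuse a low-level and a high-level feature map with a learned gate."""

    def __init__(self, low_ch: int, high_ch: int, out_ch: int = 128):
        super().__init__()
        # Project both levels to a common channel width.
        self.low_proj = nn.Conv2d(low_ch, out_ch, kernel_size=1)
        self.high_proj = nn.Conv2d(high_ch, out_ch, kernel_size=1)
        # Predict a per-pixel fusion weight from the concatenated features.
        self.gate = nn.Conv2d(2 * out_ch, 1, kernel_size=3, padding=1)

    def forward(self, low: torch.Tensor, high: torch.Tensor) -> torch.Tensor:
        # Upsample the coarse high-level map to the low-level resolution.
        high = F.interpolate(high, size=low.shape[-2:],
                             mode="bilinear", align_corners=False)
        low, high = self.low_proj(low), self.high_proj(high)
        # alpha in (0, 1) decides, per location, how much each level contributes,
        # instead of a fixed concatenation of the two feature maps.
        alpha = torch.sigmoid(self.gate(torch.cat([low, high], dim=1)))
        fused = alpha * low + (1.0 - alpha) * high
        # L2-normalize along channels, as is common for dense descriptors.
        return F.normalize(fused, p=2, dim=1)

if __name__ == "__main__":
    fusion = AdaptiveFusion(low_ch=64, high_ch=256)
    low = torch.randn(1, 64, 120, 160)   # fine, shallow features
    high = torch.randn(1, 256, 30, 40)   # coarse, deep features
    print(fusion(low, high).shape)       # torch.Size([1, 128, 120, 160])
```

The learned gate is what distinguishes this from simple concatenation: the contribution of each feature level varies per location rather than being fixed by the network topology.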
Data availability
Data sharing is not applicable to this article as no datasets were generated or analyzed during the current study.
Acknowledgements
This work was supported in part by the Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Education under Grant NRF-2020R1F1A1072332, in part by the National Natural Science Foundation of China under Grant 61231010, and in part by a scholarship from the China Scholarship Council (CSC) under Grant No. 202006020119.
Ethics declarations
Conflict of interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Sun, S., Park, U., Sun, S. et al. Fusion representation learning for keypoint detection and description. Vis Comput 39, 5683–5692 (2023). https://doi.org/10.1007/s00371-022-02689-7