Abstract
Previous studies have verified that the functionality of black-box models can be stolen when full probability outputs are available. However, under the more practical hard-label setting, we observe that existing methods suffer from catastrophic performance degradation. We argue that this stems from the limited information carried by hard-label predictions and from the overfitting that hard labels induce. To this end, we propose a novel hard-label model stealing method, termed black-box dissector, which consists of two erasing-based modules. The first is a CAM-driven erasing strategy designed to mine the information hidden behind the victim model's hard labels. The second is a random-erasing-based self-knowledge distillation module that uses soft labels from the substitute model to mitigate overfitting. Extensive experiments on four widely used datasets consistently demonstrate that our method outperforms state-of-the-art methods, with an improvement of up to \(8.27\%\). We also validate the effectiveness and practical potential of our method on real-world APIs and against defense methods. Furthermore, our method benefits related tasks, i.e., transfer-based adversarial attacks.
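To make the two modules concrete, the following is a minimal PyTorch sketch rather than the authors' released implementation: it assumes a ResNet-like substitute exposing a final convolutional block named `layer4`, and the function names (`cam_erase`, `random_erase`, `train_step`) as well as the erasing threshold and loss weights are illustrative assumptions.

```python
# Minimal sketch of the two erasing-based modules (illustrative, not the paper's code).
import torch
import torch.nn.functional as F


def cam_erase(images, substitute, threshold=0.5, fill=0.0):
    """CAM-driven erasing: mask the regions the substitute attends to, so that
    re-querying the victim on the erased images reveals information hidden
    behind a single hard label."""
    feats = {}

    def hook(module, inputs, output):
        feats["map"] = output  # keep the last conv feature map for Grad-CAM

    handle = substitute.layer4.register_forward_hook(hook)  # assumes a ResNet-like model
    logits = substitute(images)
    handle.remove()

    score = logits.max(dim=1).values.sum()                  # logit of the predicted class
    grads = torch.autograd.grad(score, feats["map"])[0]     # d(score) / d(feature map)
    weights = grads.mean(dim=(2, 3), keepdim=True)          # global-average-pooled gradients
    cam = F.relu((weights * feats["map"]).sum(dim=1, keepdim=True))
    cam = F.interpolate(cam, size=images.shape[-2:], mode="bilinear", align_corners=False)
    cam_min = cam.amin(dim=(2, 3), keepdim=True)
    cam_max = cam.amax(dim=(2, 3), keepdim=True)
    cam = (cam - cam_min) / (cam_max - cam_min + 1e-8)      # normalize to [0, 1]
    mask = (cam > threshold).float()                        # erase the most salient region
    return (images * (1 - mask) + fill * mask).detach()


def random_erase(images, scale=(0.1, 0.3)):
    """Random erasing augmentation used by the self-distillation module."""
    out = images.clone()
    n, _, h, w = out.shape
    for i in range(n):
        area = torch.empty(1).uniform_(*scale).item() * h * w
        eh = ew = max(int(round(area ** 0.5)), 1)
        y = torch.randint(0, max(h - eh, 1), (1,)).item()
        x = torch.randint(0, max(w - ew, 1), (1,)).item()
        out[i, :, y:y + eh, x:x + ew] = torch.rand(1).item()
    return out


def train_step(substitute, optimizer, images, hard_labels, alpha=0.5, T=4.0):
    """One substitute update: cross-entropy on the victim's hard labels plus
    self-knowledge distillation on randomly erased views, using the substitute's
    own soft predictions as targets to mitigate overfitting."""
    substitute.train()
    with torch.no_grad():
        soft = F.softmax(substitute(images) / T, dim=1)      # soft labels from the substitute itself
    ce = F.cross_entropy(substitute(images), hard_labels)
    kd = F.kl_div(F.log_softmax(substitute(random_erase(images)) / T, dim=1),
                  soft, reduction="batchmean") * T * T
    loss = (1 - alpha) * ce + alpha * kd
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

In a full attack loop, the outputs of `cam_erase` would be re-queried against the victim API for fresh hard labels before the next round of `train_step` updates; this is how the first module draws extra information out of the victim's hard-label responses.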
Notes
- 1. For the purpose of protecting privacy, we hide the specific information of the victim model.
Acknowledgments
This work was supported by the National Science Fund for Distinguished Young Scholars (No. 62025603), the National Natural Science Foundation of China (No. U21B2037, No. 62176222, No. 62176223, No. 62176226, No. 62072386, No. 62072387, No. 62072389, and No. 62002305), the Guangdong Basic and Applied Basic Research Foundation (No. 2019B1515120049), and the Natural Science Foundation of Fujian Province of China (No. 2021J01002).
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
Cite this paper
Wang, Y. et al. (2022). Black-Box Dissector: Towards Erasing-Based Hard-Label Model Stealing Attack. In: Avidan, S., Brostow, G., Cissé, M., Farinella, G.M., Hassner, T. (eds) Computer Vision – ECCV 2022. ECCV 2022. Lecture Notes in Computer Science, vol 13665. Springer, Cham. https://doi.org/10.1007/978-3-031-20065-6_12
DOI: https://doi.org/10.1007/978-3-031-20065-6_12
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-20064-9
Online ISBN: 978-3-031-20065-6