Link to original content: https://doi.org/10.1145/3649329.3657314

Genetic Quantization-Aware Approximation for Non-Linear Operations in Transformers

Published: 07 November 2024

Abstract

Non-linear functions are prevalent in Transformers and their lightweight variants, incurring substantial and frequently underestimated hardware costs. Previous state-of-the-art works optimize these operations by piece-wise linear approximation and store the parameters in look-up tables (LUTs), but most of them require hardware-unfriendly high-precision arithmetic such as FP/INT32 and do not consider integer-only INT quantization. This paper proposes a genetic LUT-approximation algorithm, GQA-LUT, that automatically determines the approximation parameters with quantization awareness. The results demonstrate that GQA-LUT incurs negligible degradation on the challenging semantic segmentation task for both vanilla and linear Transformer models. Moreover, the proposed GQA-LUT enables INT8-based LUT-approximation, achieving area savings of 81.3-81.7% and power reductions of 79.3-80.2% compared with high-precision FP/INT32 alternatives. Code is available at https://github.com/PingchengDong/GQA-LUT.
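To make the approach concrete, the following is a minimal, illustrative sketch in Python/NumPy of the idea the abstract describes: a non-linear Transformer operation (GELU here) is approximated by piece-wise linear segments whose breakpoints are searched by a simple genetic algorithm, while the fitted slopes and intercepts are fake-quantized so the fitness reflects INT8 LUT storage. This is not the authors' released GQA-LUT implementation; the function names, the symmetric uniform quantizer, and all hyper-parameters below are assumptions for illustration only.

# Minimal sketch: genetic search for quantization-aware piece-wise linear
# LUT approximation of GELU. Illustrative only, not the paper's implementation.
import numpy as np

def gelu(x):
    # Tanh approximation of GELU, a common Transformer non-linearity.
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x**3)))

def fit_segments(breakpoints, xs, ys, n_bits=8):
    # Least-squares-fit one line per segment, then fake-quantize the
    # slopes/intercepts with a symmetric uniform quantizer so the returned
    # error reflects what an INT8 LUT would actually store.
    bps = np.sort(breakpoints)
    edges = np.concatenate(([xs.min()], bps, [xs.max()]))
    slopes, intercepts = [], []
    for lo, hi in zip(edges[:-1], edges[1:]):
        mask = (xs >= lo) & (xs <= hi)
        if mask.sum() < 2:  # degenerate segment: too few points to fit a line
            slopes.append(0.0)
            intercepts.append(0.0)
            continue
        k, b = np.polyfit(xs[mask], ys[mask], 1)
        slopes.append(k)
        intercepts.append(b)
    slopes, intercepts = np.asarray(slopes), np.asarray(intercepts)
    def fake_quant(v):
        scale = np.max(np.abs(v)) / (2 ** (n_bits - 1) - 1) + 1e-12
        return np.round(v / scale) * scale
    slopes, intercepts = fake_quant(slopes), fake_quant(intercepts)
    idx = np.clip(np.searchsorted(bps, xs), 0, len(slopes) - 1)
    mse = float(np.mean((slopes[idx] * xs + intercepts[idx] - ys) ** 2))
    return slopes, intercepts, mse

def genetic_search(xs, ys, n_segments=8, pop=32, gens=100, seed=0):
    # Evolve breakpoint positions: keep the best half of the population,
    # then refill it with uniform-crossover children plus Gaussian mutation.
    rng = np.random.default_rng(seed)
    population = rng.uniform(xs.min(), xs.max(), size=(pop, n_segments - 1))
    for _ in range(gens):
        fitness = np.array([fit_segments(p, xs, ys)[2] for p in population])
        parents = population[np.argsort(fitness)][: pop // 2]
        pa = parents[rng.integers(0, len(parents), pop - len(parents))]
        pb = parents[rng.integers(0, len(parents), pop - len(parents))]
        children = np.where(rng.random(pa.shape) < 0.5, pa, pb)
        children = children + rng.normal(0.0, 0.1, pa.shape)
        population = np.vstack([parents, np.clip(children, xs.min(), xs.max())])
    fitness = np.array([fit_segments(p, xs, ys)[2] for p in population])
    return np.sort(population[np.argmin(fitness)])

xs = np.linspace(-6.0, 6.0, 2048)
best = genetic_search(xs, gelu(xs))
print("breakpoints:", np.round(best, 3))
print("INT8-LUT MSE: %.2e" % fit_segments(best, xs, gelu(xs))[2])

In a deployment following the abstract's INT8-based LUT-approximation, only the quantized breakpoints, slopes, and intercepts would be stored in the LUT; the floating-point fitting above plays the role of an offline search step.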


Cited By

  • (2024) Boosting Weakly-Supervised Image Segmentation via Representation, Transform, and Compensator. IEEE Transactions on Circuits and Systems for Video Technology, 34(11):11013-11025. DOI: 10.1109/TCSVT.2024.3413778. Online publication date: Nov 2024.
  • (2024) Hardware-oriented algorithms for softmax and layer normalization of large language models. Science China Information Sciences, 67(10). DOI: 10.1007/s11432-024-4137-4. Online publication date: 12 Sep 2024.

        Published In

        DAC '24: Proceedings of the 61st ACM/IEEE Design Automation Conference
        June 2024
        2159 pages
        ISBN:9798400706011
        DOI:10.1145/3649329


        Publisher

        Association for Computing Machinery

        New York, NY, United States


        Author Tags

        1. non-linear function
        2. quantization-aware training
        3. integer-only arithmetic
        4. transformer
        5. look-up table
        6. genetic algorithm

        Qualifiers

        • Research-article

        Conference

        DAC '24: 61st ACM/IEEE Design Automation Conference
        June 23-27, 2024
        San Francisco, CA, USA

        Acceptance Rates

        Overall acceptance rate: 1,770 of 5,499 submissions (32%)


        Bibliometrics & Citations

        Article Metrics

        • Downloads (last 12 months): 94
        • Downloads (last 6 weeks): 94
        Reflects downloads up to 11 Dec 2024.

