SpAtten-Chip: A Fully-Integrated Energy-Scalable Transformer Accelerator Supporting Adaptive Model Configuration and Word Elimination for Language Understanding on Edge Devices

Zexi Ji*, Hanrui Wang*, Miaorong Wang, Win-San Khwa, Meng-Fan Chang, Song Han, Anantha P. Chandrakasan
MIT, TSMC Corporate Research
(* indicates equal contribution)

Abstract

Efficient natural language processing on the edge is needed to interpret voice commands, which have become a standard way to interact with the devices around us. Because of the tight power and compute constraints of edge devices, it is important to adapt the computation to the hardware conditions. We present a Transformer accelerator with a variable-depth adder tree to support different model dimensions, a SuperTransformer model from which SubTransformers of various sizes can be sampled to enable adaptive model configuration, and a dedicated word elimination unit to prune redundant tokens. We achieve up to 6.9× scalability in network latency and energy between the largest and smallest SubTransformers under the same operating conditions. Word elimination can reduce network energy by 16%, with a 14.5% drop in F1 score. At 0.68V and 80MHz, processing a 32-length input with our custom 2-layer Transformer model for intent detection and slot filling takes 0.61ms and 1.6μJ.
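As a rough illustration of the word-elimination idea (not the chip's actual datapath), the sketch below prunes tokens that receive little cumulative attention so that later layers process a shorter sequence. The function name, keep_ratio knob, and NumPy formulation are illustrative assumptions, not taken from the paper.

import numpy as np

def eliminate_words(tokens, attn_probs, keep_ratio=0.75):
    """Toy sketch of attention-based word elimination.

    tokens:     (L, d) token embeddings entering the next layer
    attn_probs: (H, L, L) softmax attention probabilities from the
                previous layer (H heads, L tokens)
    keep_ratio: fraction of tokens to keep (hypothetical knob; the
                on-chip pruning criterion may differ)
    """
    # Importance of a token = total attention it receives,
    # accumulated over all heads and query positions.
    importance = attn_probs.sum(axis=(0, 1))          # shape (L,)
    k = max(1, int(round(keep_ratio * len(tokens))))
    keep = np.sort(np.argsort(importance)[-k:])       # top-k, original order
    return tokens[keep], keep

# Example: 32 tokens, model dim 64, 4 heads
L, d, H = 32, 64, 4
tokens = np.random.randn(L, d).astype(np.float32)
attn = np.random.rand(H, L, L).astype(np.float32)
attn /= attn.sum(axis=-1, keepdims=True)              # normalize like softmax
pruned, kept_idx = eliminate_words(tokens, attn)
print(pruned.shape, kept_idx)

Dropping tokens this way shrinks the sequence length for every subsequent layer, which is where the energy savings quoted in the abstract come from.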

Citation

@INPROCEEDINGS{10244459,
 author={Ji, Zexi and Wang, Hanrui and Wang, Miaorong and Khwa, Win-San and Chang, Meng-Fan and Han, Song and Chandrakasan, Anantha P.},
 booktitle={2023 IEEE/ACM International Symposium on Low Power Electronics and Design (ISLPED)},
 title={A Fully-Integrated Energy-Scalable Transformer Accelerator Supporting Adaptive Model Configuration and Word Elimination for Language Understanding on Edge Devices},
 year={2023},
 volume={},
 number={},
 pages={1-6},
 keywords={Adaptation models;Computational modeling;Scalability;Image edge detection;Transformers;Hardware;Natural language processing;hardware accelerators;machine learning;natural language processing;transformers},
 doi={10.1109/ISLPED58423.2023.10244459}}

Acknowledgment

The authors would like to thank TSMC for funding and the TSMC University Shuttle Program for tapeout support.

Team Members