"MUX-PLMs: Pre-Training Language Models With Data Multiplexing", 2023-02-24:
Data multiplexing is a recently proposed method for improving a model's inference efficiency by processing multiple instances simultaneously using an ordered representation mixture. Prior work on data multiplexing only used task-specific Transformers without any pre-training, which limited their accuracy and generality.
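To make the idea concrete, here is a minimal sketch of ordered-mixture multiplexing, assuming fixed random orthogonal transforms per position (a common stand-in; the paper's actual multiplexing/demultiplexing modules are learned and differ in detail):

```python
import numpy as np

rng = np.random.default_rng(0)
N, d = 4, 8  # N = instances multiplexed together, d = hidden size

# One fixed random orthogonal transform per multiplexing position,
# so each instance occupies a distinct "direction" in the mixture.
phis = [np.linalg.qr(rng.standard_normal((d, d)))[0] for _ in range(N)]

def multiplex(xs):
    """Combine N instance vectors into one ordered mixture vector."""
    return sum(phi @ x for phi, x in zip(phis, xs)) / len(xs)

def demultiplex(h):
    """Per-position estimates: orthogonal transforms invert by transpose."""
    return [phi.T @ (h * N) for phi in phis]

xs = [rng.standard_normal(d) for _ in range(N)]
h = multiplex(xs)      # a single vector stands in for N inputs
outs = demultiplex(h)  # N position-wise estimates (x_i plus cross-talk)
```

In this sketch the i-th demultiplexed output equals x_i plus interference from the other N−1 instances; the trained demultiplexer in MUX-PLMs learns to suppress that cross-talk, which is why pre-training with multiplexing helps.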
In this paper, we develop pre-trained multiplexed language models (MUX-PLMs) that can be widely finetuned on any downstream task. Our approach includes a three-stage training procedure and novel multiplexing and demultiplexing modules for improving throughput and downstream task accuracy.
We demonstrate our method on BERT and ELECTRA pre-training objectives, with our MUX-BERT and MUX-ELECTRA models achieving 2×/5× inference speedup with a 2–4% drop in absolute performance on GLUE and 1–2% drop on token-level tasks.