Bibliography (7):

  1. https://research.google/blog/a-fast-wordpiece-tokenization-system/

  2. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding

  3. https://github.com/huggingface/tokenizers

  4. https://www.tensorflow.org/text/guide/subwords_tokenizer