Transformers reading list

I am currently working with transformer models. Transformers are a family of models that were developed principally for natural language processing. Their distinguishing feature is a mechanism called attention, which allows them to learn which tokens in a sequence are most relevant to the meaning of a particular token when it is being encoded or decoded. This allows the model to change the mathematical representation of the meaning of each token to reflect the context in which it occurs.
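As a rough illustration of the attention idea described above, here is a minimal sketch of scaled dot-product attention in plain Python. The toy tokens, the two-dimensional embeddings and the function names are all assumptions made up for this example. A real transformer would first pass each embedding through learned query, key and value projections, which are left out here to keep the sketch short.

```python
import math


def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries, keys, values):
    """Scaled dot-product attention over lists of equal-length vectors."""
    dim = len(keys[0])
    outputs, all_weights = [], []
    for q in queries:
        # Score this token's query against every token's key.
        scores = [
            sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(dim)
            for k in keys
        ]
        weights = softmax(scores)
        # The new representation is a weighted average of the value vectors.
        context = [
            sum(w * v[j] for w, v in zip(weights, values))
            for j in range(len(values[0]))
        ]
        outputs.append(context)
        all_weights.append(weights)
    return outputs, all_weights


# Hypothetical toy embeddings, used directly as queries, keys and values
# (an assumption: real models apply learned projections first).
tokens = ["river", "bank", "money"]
embeddings = [[1.0, 0.0], [0.7, 0.7], [0.0, 1.0]]

contextual, weights = attention(embeddings, embeddings, embeddings)
for token, row in zip(tokens, weights):
    print(token, [round(w, 2) for w in row])
```

Each row of `weights` says how much one token draws on every other token, and each vector in `contextual` is the token's original embedding blended with its context according to those weights.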

Transformers perform well on a wide range of natural language tasks, including text classification, named entity recognition, and sequence-to-sequence tasks like translation, summarisation and text generation. They are typically pre-trained on a large linguistic corpus using a general task like next word prediction or masked word prediction. Because the training targets for these tasks come from the text itself, the model can learn the features of natural language without needing labelled data (this is often described as unsupervised or self-supervised learning). You can then take the pre-trained model and fine-tune it to perform a more specific task using much less training data than you would otherwise need.
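To make the point about labelled data concrete, here is a small sketch of how both pre-training tasks can turn raw text into input and target pairs. The function names and the `[MASK]` placeholder are assumptions chosen for this example, and the masking is deliberately simplified: it hides one word at a time, whereas real pre-training usually masks a random subset of tokens and works on subword tokens rather than whole words.

```python
def next_word_examples(tokens):
    """Each prefix of the text is an input; the word that follows is the target."""
    return [(tokens[:i], tokens[i]) for i in range(1, len(tokens))]


def masked_word_examples(tokens, mask_token="[MASK]"):
    """Hide one word at a time; the hidden word is the target."""
    examples = []
    for i, word in enumerate(tokens):
        masked = tokens[:i] + [mask_token] + tokens[i + 1:]
        examples.append((masked, word))
    return examples


# A hypothetical one-sentence corpus, split on whitespace for simplicity.
sentence = "the cat sat on the mat".split()

next_word = next_word_examples(sentence)
masked = masked_word_examples(sentence)

print(next_word[0])
print(masked[1])
```

In both cases the targets are produced from the text itself, so nobody has to annotate anything by hand.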

While these models are very useful, they are also quite complex. They combine a number of concepts and components from different branches of deep learning, and sequence-to-sequence models in particular integrate two individually sophisticated architectures, called an encoder and a decoder.

Thanks to the frameworks developed by Hugging Face, you can use transformers without knowing all the details of their implementation. But I prefer to get my head around the details, even if it takes a bit of time and effort. It's easier to identify things you can do to improve a model if you have walked through the details of how it works.

This page is a collection of useful links for learning about transformers. It's mainly a note to myself so that I can easily find things that have helped me in the past. But hopefully it's helpful to others who are learning to use these models too.

Transformer architecture

Hugging Face

Working with transformers