The Vision Transformer Model

After the Transformer architecture changed how attention is implemented and achieved strong results in natural language processing, it was only a matter of time before it was applied to computer vision as well. This was achieved with the Vision Transformer (ViT).

In this tutorial, you will discover the architecture of the Vision Transformer model and its application to image classification.

After completing this tutorial, you will know:

How the ViT works in the context of image classification.
What the training process of the ViT entails.
How the ViT compares to convolutional neural networks in terms of inductive bias.

