DynamicViT: Efficient Vision Transformers with Dynamic Token Sparsification

This repository contains the PyTorch implementation of DynamicViT.
Created by Yongming Rao, Wenliang Zhao, Benlin Liu, Jiwen Lu, Jie Zhou, Cho-Jui Hsieh

Model Zoo
We provide our DynamicViT models pretrained on ImageNet:
Usage
Requirements
- torch>=1.7.0
- torchvision>=0.8.1
- timm==0.4.5
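You can check whether an environment matches the version pins above by comparing the installed package versions against them. The sketch below uses only the standard library (`importlib.metadata`, Python 3.8+). The `REQUIREMENTS` table and the helper names `parse_version`, `satisfies`, and `check_requirements` are hypothetical illustrations and are not part of this repository.

```python
import re
from importlib import metadata

# Hypothetical table mirroring the Requirements list (operator, version).
REQUIREMENTS = {
    "torch": (">=", "1.7.0"),
    "torchvision": (">=", "0.8.1"),
    "timm": ("==", "0.4.5"),
}


def parse_version(text):
    """Turn '1.7.0+cu110' or '0.8.1a0' into a comparable tuple like (1, 7, 0).

    Only the leading numeric release segment is kept. Local tags (+cu110)
    and pre-release suffixes (a0, rc1) are ignored, which is a simplification.
    """
    match = re.match(r"(\d+(?:\.\d+)*)", text)
    if match is None:
        raise ValueError(f"cannot parse version: {text!r}")
    parts = [int(p) for p in match.group(1).split(".")]
    # Pad to three components so "0.8" compares equal to "0.8.0".
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def satisfies(installed, op, required):
    """Return True if the installed version meets the (op, required) constraint."""
    have, want = parse_version(installed), parse_version(required)
    if op == ">=":
        return have >= want
    if op == "==":
        return have == want
    raise ValueError(f"unsupported operator: {op}")


def check_requirements(reqs=REQUIREMENTS):
    """Map each package name to (installed_version_or_None, ok_flag)."""
    report = {}
    for name, (op, required) in reqs.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = (None, False)  # package is not installed
            continue
        report[name] = (installed, satisfies(installed, op, required))
    return report


if __name__ == "__main__":
    for name, (installed, ok) in check_requirements().items():
        status = "OK" if ok else "MISSING/MISMATCH"
        print(f"{name:12s} {installed or '-':16s} {status}")
```

The versions are compared as integer tuples, so `1.10.0` correctly counts as newer than `1.7.0`, which a plain string comparison would get wrong. Because pre-release tags are dropped, the `packaging` library's `Version` class is the safer choice if full PEP 440 semantics matter.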
Data preparation: download and extract the ImageNet images from http://image-net.org/. The directory structure should be:
│ILSVRC2012/
├──train/
│  ├── n01440764
│  │   ├── n01440764_10026.JPEG
│  │   ├── n01440764_10027.JPEG
│  │   ├── ......
│  ├── ......
├──val/
│  ├── n01440764
│  │   ├── ILSVRC2012_val_00000293.JPEG
│  │   ├── ILSVRC2012_val_00002138.JPEG
│  │   ├── ......
│  ├── ......
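The layout above follows the one-folder-per-class convention that `torchvision.datasets.ImageFolder` reads: each split contains one sub-directory per WordNet ID, and the images sit directly inside it. We assume, without the README stating it, that the training scripts load each split this way. The standard-library sketch below checks an extracted tree before a long run starts. The helpers `summarize_split` and `check_imagenet_layout` are hypothetical and not part of this repository.

```python
import tempfile
from pathlib import Path

IMAGE_EXTENSIONS = {".jpeg", ".jpg", ".png"}


def summarize_split(split_dir):
    """Return {class_name: image_count} for one split, e.g. ILSVRC2012/train."""
    counts = {}
    for class_dir in sorted(Path(split_dir).iterdir()):
        if not class_dir.is_dir():
            continue  # stray files at the split level are ignored
        counts[class_dir.name] = sum(
            1 for f in class_dir.iterdir()
            if f.is_file() and f.suffix.lower() in IMAGE_EXTENSIONS
        )
    return counts


def check_imagenet_layout(root):
    """Check that root/train and root/val exist, share classes, and are non-empty."""
    root = Path(root)
    problems = []
    splits = {}
    for split in ("train", "val"):
        split_dir = root / split
        if not split_dir.is_dir():
            problems.append(f"missing directory: {split_dir}")
            continue
        splits[split] = summarize_split(split_dir)
    # Both splits should list the same set of class folders.
    if len(splits) == 2 and set(splits["train"]) != set(splits["val"]):
        problems.append("train/ and val/ contain different class folders")
    for split, counts in splits.items():
        empty = [name for name, n in counts.items() if n == 0]
        if empty:
            problems.append(f"{split}/ has empty class folders: {empty[:5]}")
    return splits, problems


if __name__ == "__main__":
    # Build a tiny fake tree in a temporary directory to demonstrate the check.
    with tempfile.TemporaryDirectory() as tmp:
        for split, fname in [("train", "n01440764_10026.JPEG"),
                             ("val", "ILSVRC2012_val_00000293.JPEG")]:
            d = Path(tmp) / "ILSVRC2012" / split / "n01440764"
            d.mkdir(parents=True)
            (d / fname).touch()
        splits, problems = check_imagenet_layout(Path(tmp) / "ILSVRC2012")
        print(splits, problems)
```

On a full ImageNet-1k extraction, both splits should report the same 1000 class folders and an empty problem list.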
Model preparation: download the pre-trained DeiT and LV-ViT models used for training DynamicViT:
sh download_pretrain.sh
Demo
We provide a Jupyter notebook where you can run the DynamicViT visualization.