Vision Transformer

Documentation

Attention Is All You Need
Attention_is_all_you_need_201706_Google_Brain.pdf
An Image is Worth 16x16 Words - Transformers for Image Recognition at Scale
ViT_google_research_202010.pdf
Training data-efficient image transformers & distillation through attention
DeiT_Facebook_202101.pdf
MLP-Mixer - An all-MLP Architecture for Vision
MLP-Mixer_Google_research_202105.pdf
Swin Transformer - seems to perform well
Hierarchical Vision Transformer using Shifted Windows
https://www.youtube.com/watch?v=FQVS_0Bja6o
https://arxiv.org/abs/2103.14030
https://github.com/SwinTransformer/Swin-Transformer-Object-Detection

See also

Favorite site