Vision Transformer
Documentation
- Attention Is All You Need
- Attention_is_all_you_need_201706_Google_Brain.pdf
- An Image is Worth 16x16 Words - Transformers for Image Recognition at Scale
- ViT_google_research_202010.pdf
- Training data-efficient image transformers & distillation through attention
- DeiT_Facebook_202101.pdf
- MLP-Mixer - An all-MLP Architecture for Vision
- MLP-Mixer_Google_research_202105.pdf
- Swin Transformer - seems to perform well
- Hierarchical Vision Transformer using Shifted Windows
- https://www.youtube.com/watch?v=FQVS_0Bja6o
- https://arxiv.org/abs/2103.14030
- https://github.com/SwinTransformer/Swin-Transformer-Object-Detection