Vision Transformer

Documentation

Attention is all you need: Attention_is_all_you_need_201706_Google_Brain.pdf

An Image is Worth 16x16 Words - Transformers for Image Recognition at Scale: ViT_google_research_202010.pdf

Training data-efficient image transformers & distillation through attention: DeiT_Facebook_202101.pdf

MLP-Mixer - An all-MLP Architecture for Vision: MLP-Mixer_Google_research_202105.pdf

Swin Transformer - Hierarchical Vision Transformer using Shifted Windows (seems to perform well): https://www.youtube.com/watch?v=FQVS_0Bja6o; https://arxiv.org/abs/2103.14030; https://github.com/SwinTransformer/Swin-Transformer-Object-Detection

See also

Favorite site