What Are Transformers in AI
#

Transformer Architecture
#

Background
#

Whether it is GPT, ChatGPT, DALL-E, Whisper, Stability AI, or almost anything else significant you see in the AI world nowadays, it exists because of the Transformer architecture. Transformers are a type of neural network architecture with several properties that make them effective for modeling data with long-range dependencies. They generally combine multi-headed attention mechanisms, residual connections, layer normalization, feedforward connections, and positional embeddings.
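The components listed above fit together in a few lines of code. Below is a minimal, untrained NumPy sketch of a single encoder block (random matrices stand in for learned weights; real implementations use a framework such as PyTorch and add dropout, masking, and training machinery):

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def layer_norm(x, eps=1e-5):
    # normalize each token's feature vector to zero mean, unit variance
    return (x - x.mean(-1, keepdims=True)) / np.sqrt(x.var(-1, keepdims=True) + eps)

def multi_head_attention(x, Wq, Wk, Wv, Wo, n_heads):
    seq, d = x.shape
    dh = d // n_heads
    # project input, then split into heads: (n_heads, seq, dh)
    def split(W):
        return (x @ W).reshape(seq, n_heads, dh).transpose(1, 0, 2)
    q, k, v = split(Wq), split(Wk), split(Wv)
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(dh)      # scaled dot-product
    out = softmax(scores) @ v                            # (n_heads, seq, dh)
    return out.transpose(1, 0, 2).reshape(seq, d) @ Wo   # concat heads, project

def encoder_block(x, params, n_heads=2):
    Wq, Wk, Wv, Wo, W1, W2 = params
    # attention sub-layer with residual connection + layer norm
    x = layer_norm(x + multi_head_attention(x, Wq, Wk, Wv, Wo, n_heads))
    # position-wise feedforward sub-layer, also with residual + layer norm
    ff = np.maximum(x @ W1, 0) @ W2                      # ReLU MLP
    return layer_norm(x + ff)

rng = np.random.default_rng(0)
d, seq = 8, 5
params = [rng.standard_normal(s) * 0.1 for s in
          [(d, d)] * 4 + [(d, 4 * d), (4 * d, d)]]
y = encoder_block(rng.standard_normal((seq, d)), params)
print(y.shape)  # (5, 8) — one d-dimensional vector per input token
```

Stacking several such blocks (plus positional embeddings at the input) gives the encoder side of the original Transformer.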

The precursors of Transformers were the RNN, LSTM, and GRU architectures. Transformers are based on the 2017 research paper “Attention Is All You Need”.
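One concrete piece of that paper is easy to reproduce: its sinusoidal positional embeddings, which inject token order into the otherwise order-agnostic attention mechanism. A small NumPy sketch of the published formula:

```python
import numpy as np

def sinusoidal_positions(seq_len, d_model):
    """PE[pos, 2i] = sin(pos / 10000^(2i/d_model)); odd dims use cos."""
    pos = np.arange(seq_len)[:, None]          # (seq_len, 1)
    i = np.arange(0, d_model, 2)[None, :]      # even feature indices
    angles = pos / np.power(10000.0, i / d_model)
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angles)               # even dimensions
    pe[:, 1::2] = np.cos(angles)               # odd dimensions
    return pe

pe = sinusoidal_positions(50, 16)
print(pe.shape)    # (50, 16)
print(pe[0, :4])   # position 0: sin(0)=0, cos(0)=1 -> [0. 1. 0. 1.]
```

These vectors are simply added to the token embeddings before the first attention layer; many later architectures replace them with learned or rotary position embeddings.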

Initially, Transformers were used for NLP-related tasks. Gradually, researchers explored the power of the Transformer architecture further, and as of 2023 it is used for hundreds of tasks across AI domains such as:

  • Text Models (NLP, NLU, NLG)
  • Vision Models (Computer Vision)
  • Audio Models (Audio Processing, Classification, Audio Generation)
  • Reinforcement (RL) Models
  • Time-series Models
  • Multimodal: OCR (extracting information from scanned documents), video classification, visual QA, table question answering
  • Graph Models

Since the journey began in 2017, roughly 200 Transformer-based architectures have been proposed (as of 2023) by various researchers for various purposes. Using these architectures and various benchmark datasets, thousands of models have been created that give SOTA performance on various tasks. Based on your needs, you can choose the architecture that helps you meet your project objective. There is a high chance you will find a pre-trained model you can use without training (zero-shot) or with a small fine-tuning (one-shot or few-shot) effort. For that, explore Huggingface and PapersWithCode.

This article lists all the major Transformer-related research papers, their creators, their capabilities, and their release dates.

Tasks a Transformer Can Do
#

Vision Tasks
#

  • Image classification
  • Semantic segmentation
  • Video classification
  • Object detection
  • Zero-shot object detection
  • Zero-shot image classification
  • Depth estimation

Multimodal Tasks
#

  • Image captioning
  • Document Question Answering
  • Image to Text
  • Text to Video
  • Visual Question Answering
  • Text to Image
  • Image to Image
  • Image Generation

Audio Tasks
#

  • Audio classification
  • Automatic speech recognition
  • Audio to Audio
  • Text to Speech
  • Voice Activity Detection
  • Audio Generation

Text Tasks
#

  • Text classification
  • Token classification (NER, POS, etc.)
  • Question answering
  • Causal language modeling
  • Masked language modeling
  • Translation
  • Summarization
  • Multiple choice
  • Sentence Similarity
  • Table Question Answering
  • Fill in the blank (mask filling)
  • Conversation
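Two of the text tasks above differ only in their attention mask: causal language modeling (GPT-style) restricts each token to earlier positions so the model can learn next-token prediction, while masked language modeling (BERT-style) attends bidirectionally and instead hides random tokens to be filled in. A tiny NumPy illustration of the two masks:

```python
import numpy as np

def causal_mask(seq_len):
    # Causal LM (GPT-style): token t attends only to positions <= t.
    return np.tril(np.ones((seq_len, seq_len), dtype=bool))

def bidirectional_mask(seq_len):
    # Masked LM (BERT-style): every token attends to every token;
    # training hides random input tokens and asks the model to recover them.
    return np.ones((seq_len, seq_len), dtype=bool)

# Positions a 4-token causal model may attend to (1 = allowed):
print(causal_mask(4).astype(int))
# [[1 0 0 0]
#  [1 1 0 0]
#  [1 1 1 0]
#  [1 1 1 1]]
```

In practice the mask is applied by setting disallowed attention scores to negative infinity before the softmax, so those positions receive zero weight.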

Frameworks Used for Developing Models Using the Above Architectures
#

As of May 2023, the following frameworks are used for creating models. TensorFlow and PyTorch are the two most popular frameworks; Keras is now part of TensorFlow (as tf.keras).

  • TensorFlow
  • Caffe
  • Caffe2
  • PyTorch
  • MXNet
  • Keras
  • Chainer
  • JAX

Number of Models in Model Repositories
#

There are many model repositories, but the most famous are listed below. These repositories host pre-trained models that you can download and use in your projects.

  • Huggingface: As of 2-Jul-2023, Huggingface hosts 243,495 models. In May 2023 it had 196,000+ models, and as of Sep 2021 there were around 10,000 models. You can see the exponential growth of the Huggingface model repository.
  • Another model repository, tfhub, has around 132,000+ models as of May 2023. Tfhub hosts TensorFlow-based models.
  • The Keras Model Zoo hosts around 3,500 models.
  • PyTorch Model Hub

Summary of 200+ Transformers
#

Below is a table that summarises these approximately 200 Transformers.

Note: Names starting with * are not Transformers; most of them are pre-Transformer-age architectures.
Help Needed: If you find that any arXiv paper link is incorrect, let me know via hari.prasad@vedavit-ps.com

Sno.TransformerPaper TitleTypeYearResearcher
1.*AlexNet PaperImageNet Classification with Deep Convolutional Neural NetworksCNNDec-2012University of Toronto, Google
2.*VGG16 PaperVery Deep Convolutional Networks for Large-Scale Image RecognitionCNNSep-2014University of Oxford
3.*VGG19 PaperVery Deep Convolutional Networks for Large-Scale Image RecognitionCNNApr-2015University of Oxford
4.*ResNet PaperDeep Residual Learning for Image RecognitionCNNDec-2015Microsoft Research
5.*InceptionResNet PaperInception-v4, Inception-ResNet and the Impact of Residual Connections on LearningCNNAug-2016Google
6.*ConvNeXt PaperConvolutional Neural Networks with Alternately Updated CliqueCNNDec-2016Cornell University, Tsinghua University
7.*DenseNet PaperDensely Connected Convolutional NetworksCNNJan-2017Cornell University, Tsinghua University
8.*MobileNetV1 PaperEfficient Convolutional Neural Networks for Mobile Vision ApplicationsAutoencodingApr-2017Google Inc.
9.*Xception PaperXception: Deep Learning with Depthwise Separable ConvolutionsCNNApr-2017Google
10.EncoderDecoder PaperLeveraging Pre-trained Checkpoints for Sequence Generation TasksSequence-to-SequenceMay-2017Google Research
11.*MobileNetV2 PaperInverted Residuals and Linear BottlenecksAutoencodingFeb-2018Google Inc.
12.Data2Vec PaperA General Framework for Self-supervised Learning in Speech, Vision and LanguageLanguage ModelMar-2018Facebook
13.GPT PaperImproving Language Understanding by Generative Pre-Training. Auto-regressive model for next token predictionAutoregressiveJun-2018OpenAI
14.BERT PaperPre-training of Deep Bidirectional Transformers for Language UnderstandingAutoencodingOct-2018Google
15.MarianMT PaperMachine translation models trained using OPUS dataAutoencodingOct-2018
16.BiT PaperGeneral Visual Representation LearningVision TransformerJan-2019Google AI
17.Transformer-XL PaperAttentive Language Models Beyond a Fixed-Length ContextAutoregressiveJan-2019Google/CMU
18.XLM PaperCross-lingual Language Model PretrainingBERT-basedJan-2019Facebook
19.CTRL PaperA Conditional Transformer Language Model for Controllable GenerationAutoencodingFeb-2019Salesforce
20.GPT-2 PaperLanguage Models are Unsupervised Multitask LearnersAutoregressiveFeb-2019OpenAI
21.Funnel Transformer PaperFiltering out Sequential Redundancy for Efficient Language ProcessingAutoregressiveApr-2019CMU/Google Brain
22.*EfficientNet B0 PaperEfficientNet: Rethinking Model Scaling for Convolutional Neural NetworksCNNMay-2019Google Research
23.ALBERT PaperA Lite BERT for Self-supervised Learning of Language Representations,Factorized BERTMay-2019Google Research and the Toyota Technological Institute at Chicago
24.EfficientNet PaperRethinking Model Scaling for Convolutional Neural NetworksVision TransformerMay-2019Google Brain
25.MobileNetV3 PaperSearching for MobileNetV3AutoencodingMay-2019Google
26.Nezha PaperNeural Contextualized Representation for Chinese Language UnderstandingAutoencodingMay-2019Huawei Noah’s Ark Lab
27.BART PaperDenoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and ComprehensionSequence-to-SequenceJun-2019Facebook
28.ERNIE PaperEnhanced Representation through Knowledge IntegrationAutoencodingJun-2019Baidu
29.ErnieM PaperEnhanced Multilingual Representation by Aligning Cross-lingual Semantics with Monolingual CorporaAutoencodingJun-2019Baidu
30.FlauBERT PaperUnsupervised Language Model Pre-training for FrenchAutoencodingJun-2019CNRS
31.LXMERT PaperLearning Cross-Modality Encoder Representations from Transformers for Open-Domain Question AnsweringAutoencodingJun-2019UNC Chapel Hill
32.Pegasus PaperPre-training with Extracted Gap-sentences for Abstractive SummarizationAutoregressiveJun-2019Google
33.XLNet PaperGeneralized Autoregressive Pretraining for Language UnderstandingAutoregressiveJun-2019Google/CMU
34.BioGpt Papergenerative pre-trained transformer for biomedical text generation and miningAutoregressiveJul-2019Microsoft Research AI4Science
35.Hubert PaperSelf-Supervised Speech Representation Learning by Masked Prediction of Hidden UnitsAutoencodingJul-2019Facebook
36.REALM PaperRetrieval-Augmented Language Model Pre-TrainingHybridJul-2019Google Research
37.SpeechToTextTransformer PaperFast Speech-to-Text Modeling with fairseqHybridJul-2019Facebook,
38.XLM-V PaperOvercoming the Vocabulary Bottleneck in Multilingual Masked Language ModelsMultilingualJul-2019Meta AI
39.RoBERTa PaperA Robustly Optimized BERT Pretraining ApproachBERT-basedAug-2019Facebook
40.GPT Neo PaperEleutherAI/gpt-neoAutoregressiveSep-2019EleutherAI
41.CamemBERT Papera Tasty French Language ModelAutoencodingOct-2019Inria/Facebook/Sorbonne
42.DialoGPT PaperLarge-Scale Generative Pre-training for Conversational Response GenerationAutoregressiveOct-2019Microsoft Research
43.DistilBERT Papersmaller, faster, cheaper and lighterAutoencodingOct-2019HuggingFace
44.LiLT PaperA Simple yet Effective Language-Independent Layout Transformer for Structured Document UnderstandingAutoencodingOct-2019South China University of Technology
45.LUKE PaperDeep Contextualized Entity Representations with Entity-aware Self-attentionAutoencodingOct-2019Studio Ousia
46.MobileBERT Papera Compact Task-Agnostic BERT for Resource-Limited DevicesAutoencodingOct-2019CMU/Google Brain
47.MT5 PaperA massively multilingual pre-trained text-to-text transformerAutoregressiveOct-2019Google AI
48.RAG PaperRetrieval-Augmented Generation for Knowledge-Intensive NLP TasksHybridOct-2019Facebook
49.ConvBERT PaperImproving BERT with Span-based Dynamic ConvolutionAutoencodingNov-2019YituTech
50.Megatron-GPT2 PaperTraining Multi-Billion Parameter Language Models Using Model ParallelismAutoregressiveNov-2019NVIDIA
51.PhoBERT PaperPre-trained language models for VietnameseBERT-basedNov-2019VinAI Research
52.RoBERTa-PreLayerNorm PaperA Fast, Extensible Toolkit for Sequence ModelingBERT-basedNov-2019Facebook
53.BERTweet PaperA pre-trained language model for English TweetsAutoencodingDec-2019VinAI Research
54.mBART PaperMultilingual Denoising Pre-training for Neural Machine TranslationAutoregressiveDec-2019Facebook
55.Megatron-BERT PaperTraining Multi-Billion Parameter Language Models Using Model ParallelismAutoregressiveDec-2019NVIDIA
56.SpeechToTextTransformer2 PaperLarge-Scale Self- and Semi-Supervised Learning for Speech TranslationHybridDec-2019Facebook,
57.BERT For Sequence Generation PaperLeveraging Pre-trained Checkpoints for Sequence Generation TasksAutoencodingFeb-2020Google
58.ConvNeXT PaperA ConvNet for the 2020sVision TransformerMar-2020Facebook AI
59.ELECTRA PaperPre-training text encoders as discriminators rather than generatorsAutoencodingApr-2020Google Research/Stanford University
60.Longformer PaperThe Long-Document TransformerAutoregressiveApr-2020AllenAI
61.RegNet PaperDesigning Network Design SpaceCNNApr-2020META Platforms
62.SqueezeBERT PaperWhat can computer vision teach NLP about efficient neural networks?BERT-basedApr-2020Berkeley
63.LayoutLM PaperPre-training of Text and Layout for Document Image UnderstandingAutoencodingMay-2020Microsoft Research Asia
64.MPNet PaperMasked and Permuted Pre-training for Language UnderstandingAutoencodingMay-2020Microsoft Research
65.VisualBERT PaperA Simple and Performant Baseline for Vision and LanguageBERT-basedMay-2020UCLA NLP
66.Conditional DETR PaperConditional DETR for Fast Training ConvergenceVision TransformerJun-2020Microsoft Research Asia
67.GPTBigCode Paperdon’t reach for the stars!AutoregressiveJun-2020BigCode
68.M-CTC-T PaperPseudo-Labeling For Massively Multilingual Speech RecognitionAutoencodingJun-2020Facebook
69.Pix2Struct PaperScreenshot Parsing as Pretraining for Visual Language UnderstandingHybridJun-2020Google
70.ProphetNet PaperPredicting Future N-gram for Sequence-to-Sequence Pre-trainingAutoregressiveJun-2020Microsoft Research
71.SEW PaperPerformance-Efficiency Trade-offs in Unsupervised Pre-training for Speech RecognitionVision Transformer (ViT)Jun-2020ASAPP
72.T5 PaperExploring the Limits of Transfer Learning with a Unified Text-to-Text TransformerAutoregressiveJun-2020Google AI
73.DeBERTa PaperDecoding-enhanced BERT with Disentangled AttentionAutoencodingJul-2020Microsoft
74.Informer PaperBeyond Efficient Transformer for Long Sequence Time-Series ForecastingAutoencodingJul-2020Beihang University, UC Berkeley, Rutgers University, SEDD Company
75.LED PaperThe Long-Document TransformerAutoregressiveJul-2020AllenAI
76.SwitchTransformers PaperScaling to Trillion Parameter Models with Simple and Efficient SparsityHybridJul-2020Google
77.Whisper PaperRobust Speech Recognition via Large-Scale Weak SupervisionAutoregressiveJul-2020OpenAI
78.XLM-ProphetNet PaperPredicting Future N-gram for Sequence-to-Sequence Pre-trainingHybridJul-2020Microsoft Research
79.XLM-RoBERTa PaperUnsupervised Cross-lingual Representation Learning at ScaleBERT-basedJul-2020Facebook AI,
80.Deformable DETR PaperDeformable Transformers for End-to-End Object DetectionVision TransformerAug-2020SenseTime Research
81.FNet PaperMixing Tokens with Fourier TransformsAutoencodingAug-2020Google Research
82.GPTSAN-japanese Paperreleased in the repository tanreinama/GPTSANAutoregressiveAug-2020
83.SEW-D PaperPerformance-Efficiency Trade-offs in Unsupervised Pre-training for Speech RecognitionVision Transformer (ViT)Aug-2020ASAPP
84.CPM PaperA Large-scale Generative Chinese Pre-trained Language ModelSequence-to-SequenceSep-2020Tsinghua University
85.GIT PaperA Generative Image-to-text Transformer for Vision and LanguageAutoencodingSep-2020Microsoft Research
86.LayoutXLM PaperMultimodal Pre-training for Multilingual Visually-rich Document UnderstandingAutoencodingSep-2020Microsoft Research Asia
87.DETR PaperEnd-to-End Object Detection with TransformersVision TransformerOct-2020Facebook
88.GPT NeoX PaperAn Open-Source Autoregressive Language ModelAutoregressiveOct-2020EleutherAI
89.RemBERT PaperRethinking embedding coupling in pre-trained language modelsBERT-basedOct-2020Google Research
90.RoCBert PaperRobust Chinese Bert with Multimodal Contrastive PretrainingBERT-basedOct-2020WeChatAI
91.TAPAS PaperWeakly Supervised Table Parsing via Pre-trainingHybridOct-2020Google AI
92.UPerNet PaperUnified Perceptual Parsing for Scene UnderstandingVision Transformer (ViT)Oct-2020Peking University
93.Vision Transformer (ViT) PaperTransformers for Image Recognition at ScaleVision Transformer (ViT)Oct-2020Google AI
94.Wav2Vec2 PaperA Framework for Self-Supervised Learning of Speech RepresentationsAutoregressiveOct-2020Facebook AI
95.PLBart PaperUnified Pre-training for Program Understanding and GenerationHybridNov-2020UCLA NLP
96.DiT PaperSelf-supervised Pre-training for Document Image TransformerVision TransformerDec-2020Microsoft Research
97.DPR PaperDense Passage Retrieval for Open-Domain Question AnsweringSequence-to-SequenceDec-2020Facebook
98.GLPN PaperGlobal-Local Path Networks for Monocular Depth Estimation with Vertical CutDepthAutoencodingDec-2020KAIST
99.LeViT PaperA Vision Transformer in ConvNet’s Clothing for Faster InferenceAutoencodingDec-2020Meta AI
100.NAT PaperNeighborhood Attention TransformerAutoencodingDec-2020SHI Labs
101.TAPEX PaperTable Pre-training via Learning a Neural SQL ExecutorHybridDec-2020Microsoft Research
102.VideoMAE PaperMasked Autoencoders are Data-Efficient Learners for Self-Supervised Video Pre-TrainingHybridDec-2020Multimedia Computing Group, Nanjing University
103.Wav2Vec2-Conformer PaperFast Speech-to-Text Modeling with FAIRSEQAutoregressiveDec-2020Facebook AI
104.CLIP PaperLearning Transferable Visual Models From Natural Language SupervisionVision-Language PretrainingJan-2021OpenAI
105.XLS-R PaperSelf-supervised Cross-lingual Speech Representation Learning at ScaleAutoregressiveJan-2021Facebook AI
106.Audio Spectrogram Transformer PaperAudio Spectrogram TransformerAudio TransformerFeb-2021MIT
107.M2M100 PaperBeyond English-Centric Multilingual Machine TranslationAutoregressiveFeb-2021Facebook
108.MEGA PaperMoving Average Equipped Gated AttentionAutoencodingFeb-2021Facebook
109.BEiT PaperBERT Pre-Training of Image TransformersVision TransformerMar-2021Microsoft
110.BigBird-Pegasus PaperTransformers for Longer SequencesSequence-to-SequenceMar-2021Google Research
111.BigBird-RoBERTa PaperTransformers for Longer SequencesAutoencodingMar-2021Google Research
112.CLIPSeg PaperImage Segmentation Using Text and Image PromptsVision-Language PretrainingMar-2021University of Göttingen
113.DPT PaperVision Transformers for Dense PredictionVision TransformerMar-2021Intel Labs
114.Perceiver IO PaperA General Architecture for Structured Inputs & OutputsHybridMar-2021Deepmind
115.Reformer PaperThe Efficient TransformerHybridMar-2021Google Research
116.RoFormer PaperEnhanced Transformer with Rotary Position EmbeddingHybridMar-2021ZhuiyiTechnology
117.Swin Transformer PaperHierarchical Vision Transformer using Shifted WindowsVision Transformer (ViT)Mar-2021Microsoft
118.TrOCR PaperTransformer-based Optical Character Recognition with Pre-trained ModelsHybridMar-2021Microsoft,
119.Wav2Vec2Phoneme PaperSimple and Effective Zero-shot Cross-lingual Phoneme RecognitionAutoregressiveMar-2021Facebook AI
120.X-CLIP PaperExpanding Language-Image Pretrained Models for General Video RecognitionHybridMar-2021Microsoft Research
121.XLSR-Wav2Vec2 PaperUnsupervised Cross-Lingual Representation Learning For Speech RecognitionAutoregressiveMar-2021Facebook AI
122.Blenderbot PaperRecipes for building an open-domain chatbotSequence-to-SequenceApr-2021Facebook
123.BlenderbotSmall PaperRecipes for building an open-domain chatbotSequence-to-SequenceApr-2021Facebook
124.BLIP PaperBootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and GenerationVision TransformerApr-2021Salesforce
125.ByT5 PaperTowards a token-free future with pre-trained byte-to-byte modelsSequence-to-SequenceApr-2021Google Research
126.CvT PaperIntroducing Convolutions to Vision TransformersVision TransformerApr-2021Microsoft
127.DeBERTa-v2 PaperDecoding-enhanced BERT with Disentangled AttentionAutoencodingApr-2021Microsoft
128.DeiT PaperTraining data-efficient image transformers & distillation through attentionVision TransformerApr-2021Facebook
129.GroupViT PaperSemantic Segmentation Emerges from Text SupervisionAutoencodingApr-2021UCSD, NVIDIA
130.LayoutLMv2 PaperMulti-modal Pre-training for Visually-Rich Document UnderstandingAutoencodingApr-2021Microsoft Research Asia
131.MaskFormer PaperPer-Pixel Classification is Not All You Need for Semantic SegmentationAutoencodingApr-2021Meta and UIUC
132.SegFormer PaperSimple and Efficient Design for Semantic Segmentation with TransformersHybridApr-2021NVIDIA
133.Time Series Transformer PaperHybridApr-2021HuggingFace.
134.TimeSformer PaperSpace-Time Attention All You Need for Video Understanding?HybridApr-2021Facebook
135.Trajectory Transformer PaperOffline Reinforcement Learning as One Big Sequence Modeling ProblemHybridApr-2021the University of California at Berkeley
136.UniSpeech PaperUnified Speech Representation Learning with Labeled and Unlabeled DataHybridApr-2021Microsoft Research
137.UniSpeechSat PaperUNIVERSAL SPEECH REPRESENTATION LEARNING WITH SPEAKER AWARE PRE-TRAININGHybridApr-2021Microsoft Research
138.ALIGN PaperScaling Up Visual and Vision-Language. Representation Learning With Noisy Text SupervisionVision TransformerMay-2021Google Research
139.BORT PaperOptimal Subarchitecture Extraction For BERTSequence-to-SequenceMay-2021Alexa
140.DePlot PaperOne-shot visual language reasoning by plot-to-table translationVision TransformerMay-2021Google AI
141.DETA PaperNMS Strikes BackSequence-to-SequenceMay-2021The University of Texas at Austin
142.DiNAT PaperDilated Neighborhood Attention TransformerVision TransformerMay-2021SHI Labs
143.Jukebox PaperA Generative Model for MusicAutoencodingMay-2021OpenAI
144.mBART-50 PaperMultilingual Translation with Extensible Multilingual Pretraining and FinetuningAutoregressiveMay-2021Facebook
145.Nyströmformer PaperA Nyström-Based Algorithm for Approximating Self-AttentionAutoencodingMay-2021the University of Wisconsin - Madison
146.ViT Hybrid PaperTransformers for Image Recognition at ScaleHybridMay-2021Google AI
147.X-MOD PaperLifting the Curse of Multilinguality by Pre-training Modular TransformersHybridMay-2021Meta AI
148.BARTpho PaperPre-trained Sequence-to-Sequence Models for VietnameseAutoregressiveJun-2021VinAI Research
149.BridgeTower PaperBuilding Bridges Between Encoders in Vision-Language Representation LearningVision TransformerJun-2021Harbin Institute of Technology/Microsoft Research Asia/Intel Labs
150.CodeGen PaperA Conversational Paradigm for Program SynthesisVision TransformerJun-2021Salesforce
151.GPT-J Paperreleased in the repository kingoflolz/mesh-transformer-jaxAutoregressiveJun-2021EleutherAI
152.LLaMA PaperOpen and Efficient Foundation Language ModelsAutoencodingJun-2021The FAIR team of Meta AI
153.MarkupLM PaperPre-training of Text and Markup Language for Visually-rich Document UnderstandingAutoencodingJun-2021Microsoft Research Asia
154.PoolFormer PaperMetaFormer is Actually What You Need for VisionAutoregressiveJun-2021Sea AI Labs
155.QDQBert PaperPrinciples and Empirical EvaluationBERT-basedJun-2021NVIDIA
156.ViLT PaperVision-and-Language Transformer Without Convolution or Region SupervisionVision Transformer (ViT)Jun-2021NAVER AI Lab/Kakao Enterprise/Kakao Brain
157.BARThez Papera Skilled Pretrained French Sequence-to-Sequence ModelAutoregressiveJul-2021École polytechnique
158.Donut PaperOCR-free Document Understanding TransformerTime Series TransformerJul-2021NAVER
159.ImageGPT PaperGenerative Pretraining from PixelsAutoregressiveJul-2021OpenAI
160.OPT PaperOpen Pre-trained Transformer Language ModelsHybridJul-2021Meta AI
161.Splinter PaperFew-Shot Question Answering by Pretraining Span SelectionHybridJul-2021Tel Aviv University,
162.XGLM PaperFew-shot Learning with Multilingual Language ModelsHybridJul-2021Facebook AI
163.YOSO PaperYou Only Sample (Almost)Object DetectionJul-2021the University of Wisconsin - Madison
164.EfficientFormer PaperVision Transformers at MobileNetSpeedVision TransformerAug-2021Snap Research
165.ESM PaperESM-1b. Biological structure and function emerge from scaling unsupervised learning to 250 million protein sequences. ESM-1v was released with the paper Language models enable zero-shot prediction of the effects of mutations on protein function. ESM-2 and ESMFold were released with the paper Language models of protein sequences at the scale of evolution enable accurate structure predictionProtein TransformerAug-2021Meta AI
166.Mask2Former PaperMasked-attention Mask Transformer for Universal Image SegmentationAutoencodingAug-2021FAIR and UIUC
167.MGP-STR PaperMulti-Granularity Prediction for Scene Text RecognitionAutoencodingAug-2021Alibaba Research
168.NLLB PaperScaling Human-Centered Machine TranslationAutoencodingAug-2021Meta
169.T5v1.1 Paperreleased in the repository google-research/text-to-text-transfer-transformerAutoregressiveAug-2021Google AI
170.TVLT PaperTextless Vision-Language TransformerHybridAug-2021UNC Chapel Hill
171.WavLM PaperLarge-Scale Self-Supervised Pre-Training for Full Stack Speech ProcessingAutoregressiveAug-2021Microsoft Research
172.XLM-RoBERTa-XL PaperLarger-Scale Transformers for Multilingual Masked Language ModelingBERT-basedAug-2021Facebook AI,
173.Chinese-CLIP PaperContrastive Vision-Language Pretraining in ChineseVision-Language PretrainingSep-2021OFA-Sys
174.CLAP Paper[Large-scale Contrastive Language-Audio Pretraining with Feature Fusion and Keyword-to-Caption Augmentation](https://arxiv.org/abs/2211.06687)Vision TransformerSep-2021LAION-AI
175.Decision Transformer PaperReinforcement Learning via Sequence ModelingVision TransformerSep-2021Berkeley/Facebook/Google
176.BLIP-2 PaperBootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language ModelsVision TransformerOct-2021Salesforce
177.CANINE PaperPre-training an Efficient Tokenization-Free Encoder for Language RepresentationVision TransformerOct-2021Google Research
178.Graphormer PaperDo Transformers Really Perform Bad for Graph Representation?AutoencodingOct-2021Microsoft
179.I-BERT PaperInteger-only BERT QuantizationAutoencodingOct-2021Berkeley
180.MatCha PaperEnhancing Visual Language Pretraining with Math Reasoning and Chart DerenderingAutoencodingOct-2021Google AI
181.mLUKE PaperThe Power of Entity Representations in Multilingual Pretrained Language ModelsAutoencodingOct-2021Studio Ousia
182.MobileViT PaperLight-weight, General-purpose, and Mobile-friendly Vision TransformerAutoencodingOct-2021Apple
183.OWL-ViT PaperSimple Open-Vocabulary Object Detection with Vision TransformersVision Transformer (ViT)Oct-2021Google AI
184.SpeechT5 PaperUnified-Modal Encoder-Decoder Pre-Training for Spoken Language ProcessingAutoregressiveOct-2021Microsoft Research
185.Swin Transformer V2 PaperScaling Up Capacity and ResolutionVision Transformer (ViT)Oct-2021Microsoft
186.ViTMAE PaperMasked Autoencoders Are Scalable Vision LearnersVision Transformer (ViT)Oct-2021Meta AI
187.BLOOM PaperThe architecture of BLOOM is essentially similar to GPT3, but has been trained on 46 different languages and 13 programming languages.Vision TransformerNov-2021BigScience workshop
188.ConvNeXTV2 PaperCo-designing and Scaling ConvNets with Masked AutoencodersVision TransformerNov-2021Facebook AI
189.CPM-Ant PaperSequence-to-SequenceNov-2021OpenBMB
190.GPT-Sw3 PaperBuilding the First Large-Scale Generative Language Model for SwedishAutoregressiveNov-2021AI-Sweden
191.LongT5 PaperEfficient Text-To-Text Transformer for Long SequencesAutoregressiveNov-2021Google AI
192.OneFormer PaperOne Transformer to Rule Universal Image SegmentationAutoregressiveNov-2021SHI Labs
193.Table Transformer PaperTowards Comprehensive Table Extraction From Unstructured DocumentsHybridNov-2021Microsoft Research
194.VAN PaperVisual Attention NetworkVision Transformer (ViT)Nov-2021Tsinghua University and Nankai University
195.AltCLIP PaperAltering the Language Encoder in CLIP for Extended Language CapabilitiesVision-Language PretrainingDec-2021BAAI
196.MVP PaperMulti-task Supervised Pre-training for Natural Language GenerationAutoencodingDec-2021RUC AI Box
197.NLLB-MOE PaperScaling Human-Centered Machine TranslationAutoencodingDec-2021Meta
198.PEGASUS-X PaperInvestigating Efficiently Extending Transformers for Long Input SummarizationAutoregressiveDec-2021Google
199.Swin2SR PaperSwinV2 Transformer for Compressed Image Super-Resolution and RestorationVision Transformer (ViT)Dec-2021University of Würzburg
200.UL2 PaperUnifying Language Learning ParadigmsHybridDec-2021Google Research
201.ViTMSN PaperMasked Siamese Networks for Label-Efficient LearningVision Transformer (ViT)Dec-2021Meta AI
202.YOLOS PaperRethinking Transformer in Vision through Object DetectionObject DetectionDec-2021Huazhong University of Science & Technology
203.FLAN-T5 Paperreleased in the repository google-research/t5xAutoregressiveFeb-2022Google AI
204.GPT NeoX Japanese Paperby Shinya Otani, Takayoshi Makabe, Anuj Arora, and Kyo Hattori.AutoregressiveFeb-2022ABEJA
205.LayoutLMv3 PaperPre-training for Document AI with Unified Text and Image MaskingAutoencodingMar-2022Microsoft Research Asia
206.FLAN-UL2 Paperreleased in the repository google-research/t5xAutoregressiveApr-2022Google AI
207.FLAVA PaperA Foundational Language And Vision Alignment ModelAutoencodingApr-2022Facebook AI

Authors of Above Papers
#

Sno.PaperAuthor
1.*AlexNet PaperAlex Krizhevsky, Ilya Sutskever, Geoffrey Hinton
2.*VGG16 PaperKaren Simonyan, Andrew Zisserman
3.*VGG19 PaperKaren Simonyan, Andrew Zisserman
4.*ResNet Paperby Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun.
5.*InceptionResNet PaperChristian Szegedy, Sergey Ioffe, Vincent Vanhoucke, Alex Alemi
6.*ConvNeXt PaperGao Huang, Yu Sun, Zhuang Liu, Daniel Sedra, Kilian Weinberger
7.*DenseNet PaperGao Huang, Zhuang Liu, Laurens van der Maaten, Kilian Q. Weinberger
8.*MobileNetV1 Paperby Andrew G. Howard, Menglong Zhu, Bo Chen, Dmitry Kalenichenko, Weijun Wang, Tobias Weyand, Marco Andreetto, Hartwig Adam.
9.*Xception PaperFrançois Chollet
10.EncoderDecoder Paperby Sascha Rothe, Shashi Narayan, Aliaksei Severyn.
11.*MobileNetV2 Paperby Mark Sandler, Andrew Howard, Menglong Zhu, Andrey Zhmoginov, Liang-Chieh Chen.
12.Data2Vec Paperby Alexei Baevski, Wei-Ning Hsu, Qiantong Xu, Arun Babu, Jiatao Gu, Michael Auli.
13.GPT Paperby Alec Radford, Karthik Narasimhan, Tim Salimans and Ilya Sutskever.
14.BERT Paperby Jacob Devlin, Ming-Wei Chang, Kenton Lee and Kristina Toutanova.
15.MarianMT Paperby Jörg Tiedemann. The Marian Framework is being developed by the Microsoft Translator Team.
16.BiT Paperby Alexander Kolesnikov, Lucas Beyer, Xiaohua Zhai, Joan Puigcerver, Jessica Yung, Sylvain Gelly, Neil Houlsby.
17.Transformer-XL Paperby Zihang Dai, Zhilin Yang, Yiming Yang, Jaime Carbonell, Quoc V. Le, Ruslan Salakhutdinov.
18.XLM Paperby Guillaume Lample and Alexis Conneau.
19.CTRL Paperby Nitish Shirish Keskar, Bryan McCann, Lav R. Varshney, Caiming Xiong and Richard Socher.
20.GPT-2 Paperby Alec Radford, Jeffrey Wu, Rewon Child, David Luan, Dario Amodeiand Ilya Sutskever.
21.Funnel Transformer Paperby Zihang Dai, Guokun Lai, Yiming Yang, Quoc V. Le.
22.*EfficientNet B0 PaperMingxing Tan, Quoc V. Le
23.ALBERT Paperby Zhenzhong Lan, Mingda Chen, Sebastian Goodman, Kevin Gimpel, Piyush Sharma, Radu Soricut.
24.EfficientNet Paperby Mingxing Tan, Quoc V. Le.
25.MobileNetV3 PaperAndrew Howard, Mark Sandler, Grace Chu, Liang-Chieh Chen, Bo Chen, Mingxing Tan, Weijun Wang, Yukun Zhu, Ruoming Pang, Vijay Vasudevan, Quoc V. Le, Hartwig Adam
26.Nezha Paperby Junqiu Wei, Xiaozhe Ren, Xiaoguang Li, Wenyong Huang, Yi Liao, Yasheng Wang, Jiashu Lin, Xin Jiang, Xiao Chen and Qun Liu.
27.BART Paperby Mike Lewis, Yinhan Liu, Naman Goyal, Marjan Ghazvininejad, Abdelrahman Mohamed, Omer Levy, Ves Stoyanov and Luke Zettlemoyer.
28.ERNIE Paperby Yu Sun, Shuohuan Wang, Yukun Li, Shikun Feng, Xuyi Chen, Han Zhang, Xin Tian, Danxiang Zhu, Hao Tian, Hua Wu.
29.ErnieM Paperby Xuan Ouyang, Shuohuan Wang, Chao Pang, Yu Sun, Hao Tian, Hua Wu, Haifeng Wang.
30.FlauBERT Paperby Hang Le, Loïc Vial, Jibril Frej, Vincent Segonne, Maximin Coavoux, Benjamin Lecouteux, Alexandre Allauzen, Benoît Crabbé, Laurent Besacier, Didier Schwab.
31.LXMERT Paperby Hao Tan and Mohit Bansal.
32.Pegasus Paperby Jingqing Zhang, Yao Zhao, Mohammad Saleh and Peter J. Liu.
33.XLNet Paperby Zhilin Yang, Zihang Dai, Yiming Yang, Jaime Carbonell, Ruslan Salakhutdinov, Quoc V. Le.
34.BioGpt Paperby Renqian Luo, Liai Sun, Yingce Xia, Tao Qin, Sheng Zhang, Hoifung Poon and Tie-Yan Liu.
35.Hubert Paperby Wei-Ning Hsu, Benjamin Bolte, Yao-Hung Hubert Tsai, Kushal Lakhotia, Ruslan Salakhutdinov, Abdelrahman Mohamed.
36.REALM Paperby Kelvin Guu, Kenton Lee, Zora Tung, Panupong Pasupat and Ming-Wei Chang.
37.SpeechToTextTransformer Paperby Changhan Wang, Yun Tang, Xutai Ma, Anne Wu, Dmytro Okhonko, Juan Pino.
38.XLM-V Paperby Davis Liang, Hila Gonen, Yuning Mao, Rui Hou, Naman Goyal, Marjan Ghazvininejad, Luke Zettlemoyer, Madian Khabsa.
39. RoBERTa Paper by Yinhan Liu, Myle Ott, Naman Goyal, Jingfei Du, Mandar Joshi, Danqi Chen, Omer Levy, Mike Lewis, Luke Zettlemoyer, Veselin Stoyanov.
40. GPT Neo Paper by Sid Black, Stella Biderman, Leo Gao, Phil Wang and Connor Leahy.
41. CamemBERT Paper by Louis Martin, Benjamin Muller, Pedro Javier Ortiz Suárez, Yoann Dupont, Laurent Romary, Éric Villemonte de la Clergerie, Djamé Seddah and Benoît Sagot.
42. DialoGPT Paper by Yizhe Zhang, Siqi Sun, Michel Galley, Yen-Chun Chen, Chris Brockett, Xiang Gao, Jianfeng Gao, Jingjing Liu, Bill Dolan.
43. DistilBERT Paper by Victor Sanh, Lysandre Debut and Thomas Wolf. The same method has been applied to compress GPT2 into DistilGPT2, RoBERTa into DistilRoBERTa, Multilingual BERT into DistilmBERT and a German version of DistilBERT.
44. LiLT Paper by Jiapeng Wang, Lianwen Jin, Kai Ding.
45. LUKE Paper by Ikuya Yamada, Akari Asai, Hiroyuki Shindo, Hideaki Takeda, Yuji Matsumoto.
46. MobileBERT Paper by Zhiqing Sun, Hongkun Yu, Xiaodan Song, Renjie Liu, Yiming Yang, and Denny Zhou.
47. MT5 Paper by Linting Xue, Noah Constant, Adam Roberts, Mihir Kale, Rami Al-Rfou, Aditya Siddhant, Aditya Barua, Colin Raffel.
48. RAG Paper by Patrick Lewis, Ethan Perez, Aleksandara Piktus, Fabio Petroni, Vladimir Karpukhin, Naman Goyal, Heinrich Küttler, Mike Lewis, Wen-tau Yih, Tim Rocktäschel, Sebastian Riedel, Douwe Kiela.
49. ConvBERT Paper by Zihang Jiang, Weihao Yu, Daquan Zhou, Yunpeng Chen, Jiashi Feng, Shuicheng Yan.
50. Megatron-GPT2 Paper by Mohammad Shoeybi, Mostofa Patwary, Raul Puri, Patrick LeGresley, Jared Casper and Bryan Catanzaro.
51. PhoBERT Paper by Dat Quoc Nguyen and Anh Tuan Nguyen.
52. RoBERTa-PreLayerNorm Paper by Myle Ott, Sergey Edunov, Alexei Baevski, Angela Fan, Sam Gross, Nathan Ng, David Grangier, Michael Auli.
53. BERTweet Paper by Dat Quoc Nguyen, Thanh Vu and Anh Tuan Nguyen.
54. mBART Paper by Yinhan Liu, Jiatao Gu, Naman Goyal, Xian Li, Sergey Edunov, Marjan Ghazvininejad, Mike Lewis, Luke Zettlemoyer.
55. Megatron-BERT Paper by Mohammad Shoeybi, Mostofa Patwary, Raul Puri, Patrick LeGresley, Jared Casper and Bryan Catanzaro.
56. SpeechToTextTransformer2 Paper by Changhan Wang, Anne Wu, Juan Pino, Alexei Baevski, Michael Auli, Alexis Conneau.
57. BERT For Sequence Generation Paper by Sascha Rothe, Shashi Narayan, Aliaksei Severyn.
58. ConvNeXT Paper by Zhuang Liu, Hanzi Mao, Chao-Yuan Wu, Christoph Feichtenhofer, Trevor Darrell, Saining Xie.
59. ELECTRA Paper by Kevin Clark, Minh-Thang Luong, Quoc V. Le, Christopher D. Manning.
60. Longformer Paper by Iz Beltagy, Matthew E. Peters, Arman Cohan.
61. RegNet Paper by Ilija Radosavovic, Raj Prateek Kosaraju, Ross Girshick, Kaiming He, Piotr Dollár.
62. SqueezeBERT Paper by Forrest N. Iandola, Albert E. Shaw, Ravi Krishna, and Kurt W. Keutzer.
63. LayoutLM Paper by Yiheng Xu, Minghao Li, Lei Cui, Shaohan Huang, Furu Wei, Ming Zhou.
64. MPNet Paper by Kaitao Song, Xu Tan, Tao Qin, Jianfeng Lu, Tie-Yan Liu.
65. VisualBERT Paper by Liunian Harold Li, Mark Yatskar, Da Yin, Cho-Jui Hsieh, Kai-Wei Chang.
66. Conditional DETR Paper by Depu Meng, Xiaokang Chen, Zejia Fan, Gang Zeng, Houqiang Li, Yuhui Yuan, Lei Sun, Jingdong Wang.
67. GPTBigCode Paper by Loubna Ben Allal, Raymond Li, Denis Kocetkov, Chenghao Mou, Christopher Akiki, Carlos Munoz Ferrandis, Niklas Muennighoff, Mayank Mishra, Alex Gu, Manan Dey, Logesh Kumar Umapathi, Carolyn Jane Anderson, Yangtian Zi, Joel Lamy Poirier, Hailey Schoelkopf, Sergey Troshin, Dmitry Abulkhanov, Manuel Romero, Michael Lappert, Francesco De Toni, Bernardo García del Río, Qian Liu, Shamik Bose, Urvashi Bhattacharyya, Terry Yue Zhuo, Ian Yu, Paulo Villegas, Marco Zocca, Sourab Mangrulkar, David Lansky, Huu Nguyen, Danish Contractor, Luis Villa, Jia Li, Dzmitry Bahdanau, Yacine Jernite, Sean Hughes, Daniel Fried, Arjun Guha, Harm de Vries, Leandro von Werra.
68. M-CTC-T Paper by Loren Lugosch, Tatiana Likhomanenko, Gabriel Synnaeve, and Ronan Collobert.
69. Pix2Struct Paper by Kenton Lee, Mandar Joshi, Iulia Turc, Hexiang Hu, Fangyu Liu, Julian Eisenschlos, Urvashi Khandelwal, Peter Shaw, Ming-Wei Chang, Kristina Toutanova.
70. ProphetNet Paper by Yu Yan, Weizhen Qi, Yeyun Gong, Dayiheng Liu, Nan Duan, Jiusheng Chen, Ruofei Zhang and Ming Zhou.
71. SEW Paper by Felix Wu, Kwangyoun Kim, Jing Pan, Kyu Han, Kilian Q. Weinberger, Yoav Artzi.
72. T5 Paper by Colin Raffel, Noam Shazeer, Adam Roberts, Katherine Lee, Sharan Narang, Michael Matena, Yanqi Zhou, Wei Li, Peter J. Liu.
73. DeBERTa Paper by Pengcheng He, Xiaodong Liu, Jianfeng Gao, Weizhu Chen.
74. Informer Paper by Haoyi Zhou, Shanghang Zhang, Jieqi Peng, Shuai Zhang, Jianxin Li, Hui Xiong, and Wancai Zhang.
75. LED Paper by Iz Beltagy, Matthew E. Peters, Arman Cohan.
76. SwitchTransformers Paper by William Fedus, Barret Zoph, Noam Shazeer.
77. Whisper Paper by Alec Radford, Jong Wook Kim, Tao Xu, Greg Brockman, Christine McLeavey, Ilya Sutskever.
78. XLM-ProphetNet Paper by Yu Yan, Weizhen Qi, Yeyun Gong, Dayiheng Liu, Nan Duan, Jiusheng Chen, Ruofei Zhang and Ming Zhou.
79. XLM-RoBERTa Paper by Alexis Conneau, Kartikay Khandelwal, Naman Goyal, Vishrav Chaudhary, Guillaume Wenzek, Francisco Guzmán, Edouard Grave, Myle Ott, Luke Zettlemoyer and Veselin Stoyanov.
80. Deformable DETR Paper by Xizhou Zhu, Weijie Su, Lewei Lu, Bin Li, Xiaogang Wang, Jifeng Dai.
81. FNet Paper by James Lee-Thorp, Joshua Ainslie, Ilya Eckstein, Santiago Ontanon.
82. GPTSAN-japanese Paper by Toshiyuki Sakamoto (tanreinama).
83. SEW-D Paper by Felix Wu, Kwangyoun Kim, Jing Pan, Kyu Han, Kilian Q. Weinberger, Yoav Artzi.
84. CPM Paper by Zhengyan Zhang, Xu Han, Hao Zhou, Pei Ke, Yuxian Gu, Deming Ye, Yujia Qin, Yusheng Su, Haozhe Ji, Jian Guan, Fanchao Qi, Xiaozhi Wang, Yanan Zheng, Guoyang Zeng, Huanqi Cao, Shengqi Chen, Daixuan Li, Zhenbo Sun, Zhiyuan Liu, Minlie Huang, Wentao Han, Jie Tang, Juanzi Li, Xiaoyan Zhu, Maosong Sun.
85. GIT Paper by Jianfeng Wang, Zhengyuan Yang, Xiaowei Hu, Linjie Li, Kevin Lin, Zhe Gan, Zicheng Liu, Ce Liu, Lijuan Wang.
86. LayoutXLM Paper by Yiheng Xu, Tengchao Lv, Lei Cui, Guoxin Wang, Yijuan Lu, Dinei Florencio, Cha Zhang, Furu Wei.
87. DETR Paper by Nicolas Carion, Francisco Massa, Gabriel Synnaeve, Nicolas Usunier, Alexander Kirillov, Sergey Zagoruyko.
88. GPT NeoX Paper by Sid Black, Stella Biderman, Eric Hallahan, Quentin Anthony, Leo Gao, Laurence Golding, Horace He, Connor Leahy, Kyle McDonell, Jason Phang, Michael Pieler, USVSN Sai Prashanth, Shivanshu Purohit, Laria Reynolds, Jonathan Tow, Ben Wang, Samuel Weinbach.
89. RemBERT Paper by Hyung Won Chung, Thibault Févry, Henry Tsai, M. Johnson, Sebastian Ruder.
90. RoCBert Paper by Hui Su, Weiwei Shi, Xiaoyu Shen, Xiao Zhou, Tuo Ji, Jiarui Fang, Jie Zhou.
91. TAPAS Paper by Jonathan Herzig, Paweł Krzysztof Nowak, Thomas Müller, Francesco Piccinno and Julian Martin Eisenschlos.
92. UPerNet Paper by Tete Xiao, Yingcheng Liu, Bolei Zhou, Yuning Jiang, Jian Sun.
93. Vision Transformer (ViT) Paper by Alexey Dosovitskiy, Lucas Beyer, Alexander Kolesnikov, Dirk Weissenborn, Xiaohua Zhai, Thomas Unterthiner, Mostafa Dehghani, Matthias Minderer, Georg Heigold, Sylvain Gelly, Jakob Uszkoreit, Neil Houlsby.
94. Wav2Vec2 Paper by Alexei Baevski, Henry Zhou, Abdelrahman Mohamed, Michael Auli.
95. PLBart Paper by Wasi Uddin Ahmad, Saikat Chakraborty, Baishakhi Ray, Kai-Wei Chang.
96. DiT Paper by Junlong Li, Yiheng Xu, Tengchao Lv, Lei Cui, Cha Zhang, Furu Wei.
97. DPR Paper by Vladimir Karpukhin, Barlas Oğuz, Sewon Min, Patrick Lewis, Ledell Wu, Sergey Edunov, Danqi Chen, and Wen-tau Yih.
98. GLPN Paper by Doyeon Kim, Woonghyun Ga, Pyungwhan Ahn, Donggyu Joo, Sehwan Chun, Junmo Kim.
99. LeViT Paper by Ben Graham, Alaaeldin El-Nouby, Hugo Touvron, Pierre Stock, Armand Joulin, Hervé Jégou, Matthijs Douze.
100. NAT Paper by Ali Hassani, Steven Walton, Jiachen Li, Shen Li, and Humphrey Shi.
101. TAPEX Paper by Qian Liu, Bei Chen, Jiaqi Guo, Morteza Ziyadi, Zeqi Lin, Weizhu Chen, Jian-Guang Lou.
102. VideoMAE Paper by Zhan Tong, Yibing Song, Jue Wang, Limin Wang.
103. Wav2Vec2-Conformer Paper by Changhan Wang, Yun Tang, Xutai Ma, Anne Wu, Sravya Popuri, Dmytro Okhonko, Juan Pino.
104. CLIP Paper by Alec Radford, Jong Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agarwal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, Gretchen Krueger, Ilya Sutskever.
105. XLS-R Paper by Arun Babu, Changhan Wang, Andros Tjandra, Kushal Lakhotia, Qiantong Xu, Naman Goyal, Kritika Singh, Patrick von Platen, Yatharth Saraf, Juan Pino, Alexei Baevski, Alexis Conneau, Michael Auli.
106. Audio Spectrogram Transformer Paper by Yuan Gong, Yu-An Chung, James Glass.
107. M2M100 Paper by Angela Fan, Shruti Bhosale, Holger Schwenk, Zhiyi Ma, Ahmed El-Kishky, Siddharth Goyal, Mandeep Baines, Onur Celebi, Guillaume Wenzek, Vishrav Chaudhary, Naman Goyal, Tom Birch, Vitaliy Liptchinsky, Sergey Edunov, Edouard Grave, Michael Auli, Armand Joulin.
108. MEGA Paper by Xuezhe Ma, Chunting Zhou, Xiang Kong, Junxian He, Liangke Gui, Graham Neubig, Jonathan May, and Luke Zettlemoyer.
109. BEiT Paper by Hangbo Bao, Li Dong, Furu Wei.
110. BigBird-Pegasus Paper by Manzil Zaheer, Guru Guruganesh, Avinava Dubey, Joshua Ainslie, Chris Alberti, Santiago Ontanon, Philip Pham, Anirudh Ravula, Qifan Wang, Li Yang, Amr Ahmed.
111. BigBird-RoBERTa Paper by Manzil Zaheer, Guru Guruganesh, Avinava Dubey, Joshua Ainslie, Chris Alberti, Santiago Ontanon, Philip Pham, Anirudh Ravula, Qifan Wang, Li Yang, Amr Ahmed.
112. CLIPSeg Paper by Timo Lüddecke and Alexander Ecker.
113. DPT Paper by René Ranftl, Alexey Bochkovskiy, Vladlen Koltun.
114. Perceiver IO Paper by Andrew Jaegle, Sebastian Borgeaud, Jean-Baptiste Alayrac, Carl Doersch, Catalin Ionescu, David Ding, Skanda Koppula, Daniel Zoran, Andrew Brock, Evan Shelhamer, Olivier Hénaff, Matthew M. Botvinick, Andrew Zisserman, Oriol Vinyals, João Carreira.
115. Reformer Paper by Nikita Kitaev, Łukasz Kaiser, Anselm Levskaya.
116. RoFormer Paper by Jianlin Su, Yu Lu, Shengfeng Pan, Bo Wen, Yunfeng Liu.
117. Swin Transformer Paper by Ze Liu, Yutong Lin, Yue Cao, Han Hu, Yixuan Wei, Zheng Zhang, Stephen Lin, Baining Guo.
118. TrOCR Paper by Minghao Li, Tengchao Lv, Lei Cui, Yijuan Lu, Dinei Florencio, Cha Zhang, Zhoujun Li, Furu Wei.
119. Wav2Vec2Phoneme Paper by Qiantong Xu, Alexei Baevski, Michael Auli.
120. X-CLIP Paper by Bolin Ni, Houwen Peng, Minghao Chen, Songyang Zhang, Gaofeng Meng, Jianlong Fu, Shiming Xiang, Haibin Ling.
121. XLSR-Wav2Vec2 Paper by Alexis Conneau, Alexei Baevski, Ronan Collobert, Abdelrahman Mohamed, Michael Auli.
122. Blenderbot Paper by Stephen Roller, Emily Dinan, Naman Goyal, Da Ju, Mary Williamson, Yinhan Liu, Jing Xu, Myle Ott, Kurt Shuster, Eric M. Smith, Y-Lan Boureau, Jason Weston.
123. BlenderbotSmall Paper by Stephen Roller, Emily Dinan, Naman Goyal, Da Ju, Mary Williamson, Yinhan Liu, Jing Xu, Myle Ott, Kurt Shuster, Eric M. Smith, Y-Lan Boureau, Jason Weston.
124. BLIP Paper by Junnan Li, Dongxu Li, Caiming Xiong, Steven Hoi.
125. ByT5 Paper by Linting Xue, Aditya Barua, Noah Constant, Rami Al-Rfou, Sharan Narang, Mihir Kale, Adam Roberts, Colin Raffel.
126. CvT Paper by Haiping Wu, Bin Xiao, Noel Codella, Mengchen Liu, Xiyang Dai, Lu Yuan, Lei Zhang.
127. DeBERTa-v2 Paper by Pengcheng He, Xiaodong Liu, Jianfeng Gao, Weizhu Chen.
128. DeiT Paper by Hugo Touvron, Matthieu Cord, Matthijs Douze, Francisco Massa, Alexandre Sablayrolles, Hervé Jégou.
129. GroupViT Paper by Jiarui Xu, Shalini De Mello, Sifei Liu, Wonmin Byeon, Thomas Breuel, Jan Kautz, Xiaolong Wang.
130. LayoutLMv2 Paper by Yang Xu, Yiheng Xu, Tengchao Lv, Lei Cui, Furu Wei, Guoxin Wang, Yijuan Lu, Dinei Florencio, Cha Zhang, Wanxiang Che, Min Zhang, Lidong Zhou.
131. MaskFormer Paper by Bowen Cheng, Alexander G. Schwing, Alexander Kirillov.
132. SegFormer Paper by Enze Xie, Wenhai Wang, Zhiding Yu, Anima Anandkumar, Jose M. Alvarez, Ping Luo.
133. Time Series Transformer Paper
134. TimeSformer Paper by Gedas Bertasius, Heng Wang, Lorenzo Torresani.
135. Trajectory Transformer Paper by Michael Janner, Qiyang Li, Sergey Levine.
136. UniSpeech Paper by Chengyi Wang, Yu Wu, Yao Qian, Kenichi Kumatani, Shujie Liu, Furu Wei, Michael Zeng, Xuedong Huang.
137. UniSpeechSat Paper by Sanyuan Chen, Yu Wu, Chengyi Wang, Zhengyang Chen, Zhuo Chen, Shujie Liu, Jian Wu, Yao Qian, Furu Wei, Jinyu Li, Xiangzhan Yu.
138. ALIGN Paper by Chao Jia, Yinfei Yang, Ye Xia, Yi-Ting Chen, Zarana Parekh, Hieu Pham, Quoc V. Le, Yunhsuan Sung, Zhen Li, Tom Duerig.
139. BORT Paper by Adrian de Wynter and Daniel J. Perry.
140. DePlot Paper by Fangyu Liu, Julian Martin Eisenschlos, Francesco Piccinno, Syrine Krichene, Chenxi Pang, Kenton Lee, Mandar Joshi, Wenhu Chen, Nigel Collier, Yasemin Altun.
141. DETA Paper by Jeffrey Ouyang-Zhang, Jang Hyun Cho, Xingyi Zhou, Philipp Krähenbühl.
142. DiNAT Paper by Ali Hassani and Humphrey Shi.
143. Jukebox Paper by Prafulla Dhariwal, Heewoo Jun, Christine Payne, Jong Wook Kim, Alec Radford, Ilya Sutskever.
144. mBART-50 Paper by Yuqing Tang, Chau Tran, Xian Li, Peng-Jen Chen, Naman Goyal, Vishrav Chaudhary, Jiatao Gu, Angela Fan.
145. Nyströmformer Paper by Yunyang Xiong, Zhanpeng Zeng, Rudrasis Chakraborty, Mingxing Tan, Glenn Fung, Yin Li, Vikas Singh.
146. ViT Hybrid Paper by Alexey Dosovitskiy, Lucas Beyer, Alexander Kolesnikov, Dirk Weissenborn, Xiaohua Zhai, Thomas Unterthiner, Mostafa Dehghani, Matthias Minderer, Georg Heigold, Sylvain Gelly, Jakob Uszkoreit, Neil Houlsby.
147. X-MOD Paper by Jonas Pfeiffer, Naman Goyal, Xi Lin, Xian Li, James Cross, Sebastian Riedel, Mikel Artetxe.
148. BARTpho Paper by Nguyen Luong Tran, Duong Minh Le and Dat Quoc Nguyen.
149. BridgeTower Paper by Xiao Xu, Chenfei Wu, Shachar Rosenman, Vasudev Lal, Wanxiang Che, Nan Duan.
150. CodeGen Paper by Erik Nijkamp, Bo Pang, Hiroaki Hayashi, Lifu Tu, Huan Wang, Yingbo Zhou, Silvio Savarese, Caiming Xiong.
151. GPT-J Paper by Ben Wang and Aran Komatsuzaki.
152. LLaMA Paper by Hugo Touvron, Thibaut Lavril, Gautier Izacard, Xavier Martinet, Marie-Anne Lachaux, Timothée Lacroix, Baptiste Rozière, Naman Goyal, Eric Hambro, Faisal Azhar, Aurelien Rodriguez, Armand Joulin, Edouard Grave, Guillaume Lample.
153. MarkupLM Paper by Junlong Li, Yiheng Xu, Lei Cui, Furu Wei.
154. PoolFormer Paper by Weihao Yu, Mi Luo, Pan Zhou, Chenyang Si, Yichen Zhou, Xinchao Wang, Jiashi Feng, Shuicheng Yan.
155. QDQBert Paper by Hao Wu, Patrick Judd, Xiaojie Zhang, Mikhail Isaev and Paulius Micikevicius.
156. ViLT Paper by Wonjae Kim, Bokyung Son, Ildoo Kim.
157. BARThez Paper by Moussa Kamal Eddine, Antoine J.-P. Tixier, Michalis Vazirgiannis.
158. Donut Paper by Geewook Kim, Teakgyu Hong, Moonbin Yim, Jeongyeon Nam, Jinyoung Park, Jinyeong Yim, Wonseok Hwang, Sangdoo Yun, Dongyoon Han, Seunghyun Park.
159. ImageGPT Paper by Mark Chen, Alec Radford, Rewon Child, Jeffrey Wu, Heewoo Jun, David Luan, Ilya Sutskever.
160. OPT Paper by Susan Zhang, Stephen Roller, Naman Goyal, Mikel Artetxe, Moya Chen, Shuohui Chen et al.
161. Splinter Paper by Ori Ram, Yuval Kirstain, Jonathan Berant, Amir Globerson, Omer Levy.
162. XGLM Paper by Xi Victoria Lin, Todor Mihaylov, Mikel Artetxe, Tianlu Wang, Shuohui Chen, Daniel Simig, Myle Ott, Naman Goyal, Shruti Bhosale, Jingfei Du, Ramakanth Pasunuru, Sam Shleifer, Punit Singh Koura, Vishrav Chaudhary, Brian O’Horo, Jeff Wang, Luke Zettlemoyer, Zornitsa Kozareva, Mona Diab, Veselin Stoyanov, Xian Li.
163. YOSO Paper
164. EfficientFormer Paper by Yanyu Li, Geng Yuan, Yang Wen, Ju Hu, Georgios Evangelidis, Sergey Tulyakov, Yanzhi Wang, Jian Ren.
165. ESM Paper by Alexander Rives, Joshua Meier, Tom Sercu, Siddharth Goyal, Zeming Lin, Jason Liu, Demi Guo, Myle Ott, C. Lawrence Zitnick, Jerry Ma, and Rob Fergus.
166. Mask2Former Paper by Bowen Cheng, Ishan Misra, Alexander G. Schwing, Alexander Kirillov, Rohit Girdhar.
167. MGP-STR Paper by Peng Wang, Cheng Da, and Cong Yao.
168. NLLB Paper by the NLLB team.
169. T5v1.1 Paper by Colin Raffel, Noam Shazeer, Adam Roberts, Katherine Lee, Sharan Narang, Michael Matena, Yanqi Zhou, Wei Li, Peter J. Liu.
170. TVLT Paper by Zineng Tang, Jaemin Cho, Yixin Nie, Mohit Bansal.
171. WavLM Paper by Sanyuan Chen, Chengyi Wang, Zhengyang Chen, Yu Wu, Shujie Liu, Zhuo Chen, Jinyu Li, Naoyuki Kanda, Takuya Yoshioka, Xiong Xiao, Jian Wu, Long Zhou, Shuo Ren, Yanmin Qian, Yao Qian, Jian Wu, Michael Zeng, Furu Wei.
172. XLM-RoBERTa-XL Paper by Naman Goyal, Jingfei Du, Myle Ott, Giri Anantharaman, Alexis Conneau.
173. Chinese-CLIP Paper by An Yang, Junshu Pan, Junyang Lin, Rui Men, Yichang Zhang, Jingren Zhou, Chang Zhou.
174. CLAP Paper
175. Decision Transformer Paper by Lili Chen, Kevin Lu, Aravind Rajeswaran, Kimin Lee, Aditya Grover, Michael Laskin, Pieter Abbeel, Aravind Srinivas, Igor Mordatch.
176. BLIP-2 Paper by Junnan Li, Dongxu Li, Silvio Savarese, Steven Hoi.
177. CANINE Paper by Jonathan H. Clark, Dan Garrette, Iulia Turc, John Wieting.
178. Graphormer Paper by Chengxuan Ying, Tianle Cai, Shengjie Luo, Shuxin Zheng, Guolin Ke, Di He, Yanming Shen, Tie-Yan Liu.
179. I-BERT Paper by Sehoon Kim, Amir Gholami, Zhewei Yao, Michael W. Mahoney, Kurt Keutzer.
180. MatCha Paper by Fangyu Liu, Francesco Piccinno, Syrine Krichene, Chenxi Pang, Kenton Lee, Mandar Joshi, Yasemin Altun, Nigel Collier, Julian Martin Eisenschlos.
181. mLUKE Paper by Ryokan Ri, Ikuya Yamada, and Yoshimasa Tsuruoka.
182. MobileViT Paper by Sachin Mehta and Mohammad Rastegari.
183. OWL-ViT Paper by Matthias Minderer, Alexey Gritsenko, Austin Stone, Maxim Neumann, Dirk Weissenborn, Alexey Dosovitskiy, Aravindh Mahendran, Anurag Arnab, Mostafa Dehghani, Zhuoran Shen, Xiao Wang, Xiaohua Zhai, Thomas Kipf, and Neil Houlsby.
184. SpeechT5 Paper by Junyi Ao, Rui Wang, Long Zhou, Chengyi Wang, Shuo Ren, Yu Wu, Shujie Liu, Tom Ko, Qing Li, Yu Zhang, Zhihua Wei, Yao Qian, Jinyu Li, Furu Wei.
185. Swin Transformer V2 Paper by Ze Liu, Han Hu, Yutong Lin, Zhuliang Yao, Zhenda Xie, Yixuan Wei, Jia Ning, Yue Cao, Zheng Zhang, Li Dong, Furu Wei, Baining Guo.
186. ViTMAE Paper by Kaiming He, Xinlei Chen, Saining Xie, Yanghao Li, Piotr Dollár, Ross Girshick.
187. BLOOM Paper
188. ConvNeXTV2 Paper by Sanghyun Woo, Shoubhik Debnath, Ronghang Hu, Xinlei Chen, Zhuang Liu, In So Kweon, Saining Xie.
189. CPM-Ant Paper
190. GPT-Sw3 Paper by Ariel Ekgren, Amaru Cuba Gyllensten, Evangelia Gogoulou, Alice Heiman, Severine Verlinden, Joey Öhman, Fredrik Carlsson, Magnus Sahlgren.
191. LongT5 Paper by Mandy Guo, Joshua Ainslie, David Uthus, Santiago Ontanon, Jianmo Ni, Yun-Hsuan Sung, Yinfei Yang.
192. OneFormer Paper by Jitesh Jain, Jiachen Li, MangTik Chiu, Ali Hassani, Nikita Orlov, Humphrey Shi.
193. Table Transformer Paper by Brandon Smock, Rohith Pesala, Robin Abraham.
194. VAN Paper by Meng-Hao Guo, Cheng-Ze Lu, Zheng-Ning Liu, Ming-Ming Cheng, Shi-Min Hu.
195. AltCLIP Paper by Zhongzhi Chen, Guang Liu, Bo-Wen Zhang, Fulong Ye, Qinghong Yang, Ledell Wu.
196. MVP Paper by Tianyi Tang, Junyi Li, Wayne Xin Zhao and Ji-Rong Wen.
197. NLLB-MOE Paper by the NLLB team.
198. PEGASUS-X Paper by Jason Phang, Yao Zhao, and Peter J. Liu.
199. Swin2SR Paper by Marcos V. Conde, Ui-Jin Choi, Maxime Burchi, Radu Timofte.
200. UL2 Paper by Yi Tay, Mostafa Dehghani, Vinh Q. Tran, Xavier Garcia, Dara Bahri, Tal Schuster, Huaixiu Steven Zheng, Neil Houlsby, Donald Metzler.
201. ViTMSN Paper by Mahmoud Assran, Mathilde Caron, Ishan Misra, Piotr Bojanowski, Florian Bordes, Pascal Vincent, Armand Joulin, Michael Rabbat, Nicolas Ballas.
202. YOLOS Paper by Yuxin Fang, Bencheng Liao, Xinggang Wang, Jiemin Fang, Jiyang Qi, Rui Wu, Jianwei Niu, Wenyu Liu.
203. FLAN-T5 Paper by Hyung Won Chung, Le Hou, Shayne Longpre, Barret Zoph, Yi Tay, William Fedus, Eric Li, Xuezhi Wang, Mostafa Dehghani, Siddhartha Brahma, Albert Webson, Shixiang Shane Gu, Zhuyun Dai, Mirac Suzgun, Xinyun Chen, Aakanksha Chowdhery, Sharan Narang, Gaurav Mishra, Adams Yu, Vincent Zhao, Yanping Huang, Andrew Dai, Hongkun Yu, Slav Petrov, Ed H. Chi, Jeff Dean, Jacob Devlin, Adam Roberts, Denny Zhou, Quoc V. Le, and Jason Wei.
204. GPT NeoX Japanese Paper
205. LayoutLMv3 Paper by Yupan Huang, Tengchao Lv, Lei Cui, Yutong Lu, Furu Wei.
206. FLAN-UL2 Paper by Hyung Won Chung, Le Hou, Shayne Longpre, Barret Zoph, Yi Tay, William Fedus, Eric Li, Xuezhi Wang, Mostafa Dehghani, Siddhartha Brahma, Albert Webson, Shixiang Shane Gu, Zhuyun Dai, Mirac Suzgun, Xinyun Chen, Aakanksha Chowdhery, Sharan Narang, Gaurav Mishra, Adams Yu, Vincent Zhao, Yanping Huang, Andrew Dai, Hongkun Yu, Slav Petrov, Ed H. Chi, Jeff Dean, Jacob Devlin, Adam Roberts, Denny Zhou, Quoc V. Le, and Jason Wei.
207. FLAVA Paper by Amanpreet Singh, Ronghang Hu, Vedanuj Goswami, Guillaume Couairon, Wojciech Galuba, Marcus Rohrbach, and Douwe Kiela.

Conclusion
#

I hope this article gave you a clear picture of the Transformer architecture: its variants, its types, its birth chronology, and its creators. As we have seen, the Transformer architecture has been a game-changer in natural language processing and computer vision, enabling breakthroughs in machine translation, language understanding, and image classification, among other fields.
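
All of the architectures listed above build on the same core mechanism: scaled dot-product attention. As a quick refresher, here is a minimal NumPy sketch of a single attention head computing softmax(QKᵀ/√d_k)·V. This is illustrative only; real Transformer implementations add learned Q/K/V projections, multiple heads, masking, and batching, and the variable names here are my own.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Compute softmax(Q K^T / sqrt(d_k)) V for one attention head."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)               # (seq_q, seq_k) similarity scores
    scores -= scores.max(axis=-1, keepdims=True)  # subtract row max for numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over the key dimension
    return weights @ V                            # weighted sum of value vectors

# Toy example: 3 tokens, embedding dimension 4
rng = np.random.default_rng(0)
Q = rng.normal(size=(3, 4))
K = rng.normal(size=(3, 4))
V = rng.normal(size=(3, 4))
out = scaled_dot_product_attention(Q, K, V)
print(out.shape)  # (3, 4): one output vector per query token
```

Every paper in the list above varies this recipe in some way, whether by making attention sparse (Longformer, BigBird), replacing it with mixing layers (FNet), or applying it to image patches (ViT) or audio frames (Wav2Vec2).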

There are many types of Transformers: autoregressive models like GPT, autoencoding models like BERT and its variants, and hybrid encoder-decoder models that combine the strengths of both. Additionally, there are many variants of the Transformer architecture, such as XLNet, RoBERTa, and T5, each with its own contributions and improvements.

The Transformer’s birth chronology spans just a few years, from the original paper in 2017 to the latest models being developed today. Its creators include some of the most prominent organizations in AI, such as Google, Meta (Facebook), and OpenAI.

As AI technology continues to evolve, we can expect more exciting developments in the field of Transformers, with ever more powerful and sophisticated models tackling increasingly complex tasks. The Transformer architecture has shown us that there is still much to explore in the world of deep learning, and we can’t wait to see what the future holds.

Author
Dr. Hari Thapliyal
dasarpai.com
linkedin.com/in/harithapliyal


Dr. Hari Thapliyaal

Dr. Hari Thapliyal is a seasoned professional and prolific blogger with a multifaceted background that spans the realms of Data Science, Project Management, and Advait-Vedanta Philosophy. Holding a Doctorate in AI/NLP from SSBM (Geneva, Switzerland), Hari has earned Master's degrees in Computers, Business Management, Data Science, and Economics, reflecting his dedication to continuous learning and a diverse skill set. With over three decades of experience in management and leadership, Hari has proven expertise in training, consulting, and coaching within the technology sector. His extensive 16+ years in all phases of software product development are complemented by a decade-long focus on course design, training, coaching, and consulting in Project Management. In the dynamic field of Data Science, Hari stands out with more than three years of hands-on experience in software development, training course development, training, and mentoring professionals. His areas of specialization include Data Science, AI, Computer Vision, NLP, complex machine learning algorithms, statistical modeling, pattern identification, and extraction of valuable insights. Hari's professional journey showcases his diverse experience in planning and executing multiple types of projects. He excels in driving stakeholders to identify and resolve business problems, consistently delivering excellent results. Beyond the professional sphere, Hari finds solace in long meditation, often seeking secluded places or immersing himself in the embrace of nature.

