GPT-Neo

GPT-Neo is the code name for a series of transformer-based language models loosely styled around the GPT architecture that we plan to train and open source. Our primary goal is to replicate a GPT-3 sized model and open source it to the public, for free.

Along the way we will be running experiments with alternative architectures and attention types, releasing any intermediate models, and writing up any findings on our blog.

We have a codebase built in Tensorflow-mesh (for training on TPUs), and one built with DeepSpeed (for training on GPUs). Both can scale to GPT-3+ sizes, but we currently lack the TPUs to train a 175B-parameter model to completion. Thankfully, we don't lack GPUs.

GPT-Neo is now fairly stable, and we will be releasing smaller-scale models shortly. GPT-NeoX is still a work in progress, and we will be releasing more updates as the project moves forward.

Features

  • Two implementations that can scale to GPT-3+ sizes are under development
    • GPT-Neo: Tensorflow-mesh (TPU) codebase
    • GPT-NeoX: DeepSpeed (GPU) codebase
  • GPT-2-scale models have finished training and are currently being evaluated
  • Tested at up to 200 billion parameters for a single training step (see the parameter-count sketch below)
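
To give a rough sense of what "GPT-3+ sized" and "200 billion parameters" mean in concrete terms, the sketch below estimates the parameter count of a decoder-only GPT model from its depth and width using the standard ~12·n_layer·d_model² approximation plus the token embedding. The function name and example shapes are illustrative only and are not taken from the GPT-Neo or GPT-NeoX codebases.

```python
def estimate_gpt_params(n_layer: int, d_model: int, n_vocab: int = 50257) -> int:
    """Rough parameter count for a decoder-only transformer.

    Each block has ~4*d_model^2 parameters in attention (Q, K, V, and output
    projections) and ~8*d_model^2 in the MLP (4x expansion), i.e. roughly
    12*d_model^2 per layer, plus the token embedding matrix.
    """
    per_layer = 12 * d_model ** 2
    embedding = n_vocab * d_model
    return n_layer * per_layer + embedding


# A GPT-3-like shape (96 layers, d_model 12288) lands near 1.75e11, i.e. ~175B.
print(f"{estimate_gpt_params(96, 12288):.3e}")

# A GPT-2 XL-like shape (48 layers, d_model 1600) lands near 1.6e9, i.e. ~1.5B.
print(f"{estimate_gpt_params(48, 1600):.3e}")
```

This is only back-of-the-envelope arithmetic; it ignores biases, layer norms, and positional embeddings, which contribute comparatively little at these scales.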
