
Imagen Video

Google's text-to-video AI.

Features

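A "text-conditional video generation system" that takes a text prompt as input and produces a video using video diffusion models.
Its distinguishing feature is that it first generates a low-resolution video from the text (24x48 pixels, 16 frames, 3 fps) and then upscales it, spatially and temporally, through a cascade of 7 diffusion models.
The final output is 1280x768 at 24 fps; it can generate videos about 5.3 seconds long.
Paper: Imagen Video: High Definition Video Generation with Diffusion Models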
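The frame counts and frame rates listed above fit together with simple arithmetic, which the following minimal Python sketch checks. It uses only the numbers given in this section (16 frames at 3 fps for the base video, 24 fps for the final output). The function and constant names are hypothetical, chosen for illustration, and the assumption that the clip duration stays fixed while only the frame rate increases is ours, not a statement of how the model is implemented.

```python
from fractions import Fraction

# Figures taken from the feature list above.
BASE_FRAMES = 16   # frames in the low-resolution base video
BASE_FPS = 3       # frame rate of the base video
FINAL_FPS = 24     # frame rate of the final output


def clip_duration(frames: int, fps: int) -> Fraction:
    """Duration in seconds of a clip with the given frame count and rate."""
    return Fraction(frames, fps)


def frames_needed(duration: Fraction, fps: int) -> int:
    """Frames required to cover `duration` seconds at `fps` frames per second."""
    return int(duration * fps)


# Assumption: the duration is unchanged by upscaling; only the frame rate grows.
duration = clip_duration(BASE_FRAMES, BASE_FPS)
final_frames = frames_needed(duration, FINAL_FPS)
temporal_factor = FINAL_FPS // BASE_FPS

print(f"duration: {float(duration):.1f} s")
print(f"frames at {FINAL_FPS} fps: {final_frames}")
print(f"temporal upsampling factor: {temporal_factor}x")
```

Under that assumption, 16 frames at 3 fps and the quoted 5.3-second output at 24 fps describe the same clip length (16/3 seconds), with the frame count multiplied by 8.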

See also

Deep learning

Favorite site

Imagen Video web site