Text-to-Video: The Task, Challenges and the Current State

Alara Dirik

Video samples generated with ModelScope.

Text-to-video is next in line in the long list of incredible advances in generative models. As the name suggests, text-to-video is a fairly new computer vision task that involves generating, from a text description, a sequence of images that is both temporally and spatially consistent. While this task might seem