Multimodal Neural Script Knowledge Models
merlot MERLOT is a model for learning what we are calling “neural script knowledge” — representations about what is going on in videos, spanning multiple video frames with associated captions. What’s here We are releasing the following: Code for the MERLOT model (in model/, with data processing in data/ Code for running MERLOT over visual story ordering. We plan to release: Information about the videos used in this work Code for adapting the model to other tasks (not strictly needed, […]
Read more