Multi-Modal Open-Domain Dialogue

Facebook AI Research

Abstract

Recent work in open-domain conversational agents has demonstrated that significant improvements in humanness and user preference can be achieved via massive scaling in both pre-training data and model size (Adiwardana et al., 2020; Roller et al., 2020). However, if we want to build agents with human-like abilities, we must expand beyond handling just text. A particularly important topic is the ability to see images and communicate about what is perceived. With the goal of engaging humans in multi-modal dialogue, we investigate combining components from state-of-the-art open-domain dialogue agents with those from state-of-the-art vision models. We study …
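One common way to combine a vision model with a Transformer dialogue model is to project pre-computed image features into the token embedding space and treat them as extra context tokens that the dialogue encoder attends over. The PyTorch sketch below illustrates this idea under stated assumptions: the 2048-dimensional image feature (e.g. from a frozen image encoder), the `LateFusionEncoder` name, and all dimensions are illustrative choices, not the paper's actual architecture.

```python
# Minimal sketch of fusing image features into a Transformer dialogue
# encoder. All names and dimensions here are hypothetical, not the
# paper's implementation.
import torch
import torch.nn as nn


class LateFusionEncoder(nn.Module):
    """Prepend a projected image feature to the token sequence, then encode."""

    def __init__(self, vocab_size=32000, d_model=512, image_dim=2048):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        # Linear projection maps the vision model's feature space into the
        # dialogue model's token embedding space.
        self.image_proj = nn.Linear(image_dim, d_model)
        layer = nn.TransformerEncoderLayer(d_model=d_model, nhead=8,
                                           batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)

    def forward(self, token_ids, image_features):
        # token_ids: (batch, seq_len); image_features: (batch, image_dim)
        tokens = self.embed(token_ids)                         # (B, T, d)
        image = self.image_proj(image_features).unsqueeze(1)   # (B, 1, d)
        # Treat the projected image as one extra "token" of context.
        fused = torch.cat([image, tokens], dim=1)              # (B, T+1, d)
        return self.encoder(fused)


# Toy usage: 2 dialogue contexts of 16 tokens each, plus 2048-dim image
# features (e.g. from a frozen, pre-trained image encoder).
model = LateFusionEncoder()
out = model(torch.randint(0, 32000, (2, 16)), torch.randn(2, 2048))
print(out.shape)  # torch.Size([2, 17, 512])
```

A design note on this style of fusion: because the image enters as ordinary sequence positions, the text-only dialogue model's weights and training recipe can be reused largely unchanged, with only the projection layer added for the new modality.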