“Internet-Augmented Dialogue Generation”, Mojtaba Komeili, Kurt Shuster, Jason Weston (2021-07-15):

[cf. WebGPT, LaMDA] The largest store of continually updating knowledge on our planet can be accessed via Internet search. In this work we study giving conversational agents access to this information. Large language models, even though they store an impressive amount of knowledge within their weights, are known to hallucinate facts when generating dialogue (Shuster et al 2021); moreover, those facts are frozen in time at the point of model training.

In contrast, we propose an approach that learns to generate an internet search query based on the context, and then conditions on the search results to finally generate a response, a method that can employ up-to-the-minute relevant information.

We train and evaluate such models on a newly collected dataset of human-human conversations whereby one of the speakers is given access to internet search during knowledge-driven discussions in order to ground their responses.

We find that search-query-based access to the internet in conversation provides superior performance compared to existing approaches that use either no augmentation or FAISS-based retrieval (Lewis et al 2020).
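To make the FAISS-based baseline concrete, the sketch below imitates what dense-retrieval augmentation does: embed every document into a vector index, then fetch the nearest documents by inner product, as a FAISS `IndexFlatIP` would. The toy hash-based encoder, the corpus, and all function names are illustrative stand-ins, not the paper's actual retriever.

```python
import numpy as np

def embed(text: str, dim: int = 16) -> np.ndarray:
    """Toy deterministic 'encoder': bucket tokens into a dense unit vector.
    A real system would use a learned dense encoder (e.g. DPR)."""
    vec = np.zeros(dim)
    for token in text.lower().split():
        vec[sum(ord(c) for c in token) % dim] += 1.0
    norm = np.linalg.norm(vec)
    return vec / norm if norm > 0 else vec

def retrieve(query: str, corpus: list[str], k: int = 2) -> list[str]:
    """Return the k highest inner-product documents, FAISS-IndexFlatIP-style."""
    index = np.stack([embed(doc) for doc in corpus])  # plays the role of the FAISS index
    scores = index @ embed(query)                     # inner products against the query
    top = np.argsort(-scores)[:k]                     # best-scoring documents first
    return [corpus[i] for i in top]
```

The key limitation the paper points out is visible here: the `corpus` is fixed at index-build time, so freshly published web documents are invisible to the retriever until the index is rebuilt.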

3.2 Search Engine-Augmented Generation (SEA): The previously described FAISS-based approaches can take advantage of many existing methods developed for QA and dialogue tasks, as we saw, but have several disadvantages. First, they may be difficult to update with real-time web documents; second, there may be a limit to the number of documents storable in local FAISS deployments; and third, such methods do not take advantage of the high-quality ranking that has been finely tuned in Internet search engines over decades of use. We thus consider using Internet search engines directly.

Method: Our proposed method consists of two components:

• A search query generator: an encoder-decoder Transformer that takes the dialogue context as input and generates a search query. This is given to the black-box search engine API, and n documents are returned.

• A FiD-style (Izacard & Grave 2020) encoder-decoder model that encodes each returned document individually, concatenates those encodings to the dialogue context encoding, and then generates the next response.
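The two-stage pipeline above can be sketched as follows. The `MockSeq2Seq` class is a toy stand-in for the paper's encoder-decoder Transformers, and `search` is a placeholder for the black-box search engine API; none of these names or encodings come from the paper.

```python
from dataclasses import dataclass

@dataclass
class MockSeq2Seq:
    """Stand-in for an encoder-decoder Transformer."""
    name: str

    def encode(self, text: str) -> list[float]:
        # Toy 'encoding': one float per token (a real model emits hidden states).
        return [float(len(tok)) for tok in text.split()]

    def decode(self, encoding: list[float]) -> str:
        # Toy 'generation' from an encoded sequence.
        return f"<{self.name} output over {len(encoding)} encoded tokens>"

def search(query: str, n: int = 3) -> list[str]:
    """Placeholder for the black-box search engine API: returns n documents."""
    return [f"document {i} for '{query}'" for i in range(n)]

def respond(context: str, query_generator: MockSeq2Seq, responder: MockSeq2Seq) -> str:
    # Stage 1: generate a search query from the dialogue context.
    query = query_generator.decode(query_generator.encode(context))
    # Stage 2 (FiD-style): encode each returned document *individually*
    # together with the context, concatenate all encodings, then decode
    # the next response from the joint encoding.
    encodings = [responder.encode(doc + " " + context) for doc in search(query)]
    joint = [x for enc in encodings for x in enc]  # concatenation across documents
    return responder.decode(joint)
```

The per-document encoding is what makes FiD scale: encoder cost grows linearly in the number of documents, and only the decoder attends across all of them jointly.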

We can train each of these modules separately if we have supervised data available for both tasks, the first module requiring (context, search query) pairs, and the second module requiring (context, response) pairs. As we will see, the data we collect in this work (detailed in §4) fulfills both of these requirements.
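A hypothetical sketch of how one logged conversation turn yields one supervised example for each module: the dict keys (`context`, `query`, `response`) are an assumed schema for illustration, not the paper's actual data format.

```python
def build_training_pairs(turns: list[dict]) -> tuple[list[tuple], list[tuple]]:
    """Split logged turns into the two supervised datasets:
    (context, search query) pairs for the query generator, and
    (context, response) pairs for the response generator."""
    query_pairs, response_pairs = [], []
    for t in turns:
        if t.get("query"):  # the speaker issued a search on this turn
            query_pairs.append((t["context"], t["query"]))
        response_pairs.append((t["context"], t["response"]))
    return query_pairs, response_pairs
```

Note that every turn supplies a (context, response) pair, while only turns where the speaker actually searched supply a (context, search query) pair.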