I gave GPT-3 access to Chrome with the objective "please buy me Airpods". Pretty interesting if you ask me 🤔

Jun 17, 2021 · 9:49 AM UTC

It successfully made it to the product page, but got sidetracked with Walmart's privacy policy.
Since even a simplified DOM is far too large for a single prompt, multiple prompts are given different chunks of the DOM, each generating their own "interaction". Another prompt then takes all the proposed interactions and selects the best one, sort of like a tournament bracket.
For more complex web pages, the time it takes to generate an action scales at O(log n) with the size of the DOM – really fast! It also gets around token limits, so you could technically process an infinitely large DOM!
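The chunk-then-select idea described above can be sketched roughly as follows. This is only an illustration, not the actual implementation: the names `Interaction`, `propose`, `select_best`, and `tournament` are hypothetical, and the GPT-3 calls are replaced by a trivial word-overlap score so the sketch runs on its own.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass(frozen=True)
class Interaction:
    action: str   # e.g. "click"
    element: str  # simplified DOM element the action targets


def chunk(items: Sequence, size: int) -> List[list]:
    """Split items into consecutive groups of at most `size`."""
    return [list(items[i:i + size]) for i in range(0, len(items), size)]


def overlap_score(objective: str, element: str) -> int:
    """Stand-in for the model's judgment (assumption): count shared words."""
    return len(set(objective.lower().split()) & set(element.lower().split()))


def propose(dom_chunk: Sequence[str], objective: str) -> Interaction:
    """One 'prompt' per DOM chunk: propose a single interaction."""
    best = max(dom_chunk, key=lambda el: overlap_score(objective, el))
    return Interaction("click", best)


def select_best(candidates: Sequence[Interaction], objective: str) -> Interaction:
    """One 'prompt' per group of proposals: keep the best one."""
    return max(candidates, key=lambda c: overlap_score(objective, c.element))


def tournament(dom: Sequence[str], objective: str,
               chunk_size: int = 4, fan_in: int = 2) -> Tuple[Interaction, int]:
    """Return the winning interaction and the number of selection rounds."""
    if not dom:
        raise ValueError("DOM must contain at least one element")
    candidates = [propose(c, objective) for c in chunk(dom, chunk_size)]
    rounds = 0
    while len(candidates) > 1:
        candidates = [select_best(g, objective) for g in chunk(candidates, fan_in)]
        rounds += 1
    return candidates[0], rounds


dom = [f"link footer item {i}" for i in range(15)] + ["button add airpods to cart"]
winner, rounds = tournament(dom, "buy airpods")
print(winner.element, rounds)
```

In this sketch the number of selection rounds grows logarithmically with the number of chunks, while the total number of prompts still grows linearly. The logarithmic time only holds if the prompts within each round run in parallel.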
Replying to @sharifshameem
This is amazing actually! Super curious about the implementation 🤔
Replying to @wichmaennchen
Each prompt contains a simplified version of the DOM, the objective, and possible actions (like click on an element).
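A prompt with those three ingredients might be assembled along these lines. The layout, the field labels, and the `build_prompt` and `format_case` helpers are assumptions made for illustration, and the worked examples placed before the final case reflect the few-shot approach mentioned elsewhere in the thread. The real prompt format is not shown in the source.

```python
from typing import Sequence, Tuple

# (simplified DOM lines, objective, expected answer) for one worked example
Example = Tuple[Sequence[str], str, str]


def format_case(dom_chunk: Sequence[str], objective: str,
                actions: Sequence[str]) -> str:
    """Render one case: objective, allowed actions, DOM chunk, answer slot."""
    return "\n".join([
        f"Objective: {objective}",
        f"Possible actions: {', '.join(actions)}",
        "DOM:",
        *dom_chunk,
        "Action:",
    ])


def build_prompt(dom_chunk: Sequence[str], objective: str,
                 actions: Sequence[str],
                 examples: Sequence[Example] = ()) -> str:
    """Prepend solved examples, then leave the final 'Action:' open."""
    parts = [
        format_case(ex_dom, ex_objective, actions) + " " + ex_answer
        for ex_dom, ex_objective, ex_answer in examples
    ]
    parts.append(format_case(dom_chunk, objective, actions))
    return "\n\n".join(parts)


prompt = build_prompt(
    dom_chunk=['<button id="3">Add to cart</button>'],
    objective="please buy me Airpods",
    actions=["click", "type"],
    examples=[(['<a id="1">Sign in</a>'], "log in", "click 1")],
)
print(prompt)
```

The model's completion after the final "Action:" would then be parsed into one proposed interaction for that chunk.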
Replying to @sharifshameem
Are you using their semantic search endpoint? I.e. searching over DOM elements to find the best matching one?
No, just the normal endpoint
Replying to @sharifshameem
This was without finetuning?
No fine tuning, just few shot prompting