I’ve been playing with using GPT-3 to control a browser for the last couple of days. Here’s a quick demo. As you can see, it's pretty neat! But also quite flaky. Will publish the source code shortly for others to try and improve.

Sep 29, 2022 · 11:38 PM UTC

This was inspired by Sharif's hack from a while back: nitter.net/sharifshameem/st… And of course by the impressive work Adept is doing. But this is just a toy, not very durable (yet?).
I gave GPT-3 access to Chrome with the objective "please buy me Airpods". Pretty interesting if you ask me 🤔
The hardest part was using the Chrome debugger protocol and Playwright to suck the DOM out of Chrome and turn it into something GPT-3 can read in a single prompt. A very smart person I found on Upwork helped me a ton with that. (Thanks Alex!)
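To give a rough idea of what "turn it into something GPT-3 can read" can look like, here is a minimal sketch of one possible approach, not the code from the demo. It assumes the page HTML is already available as a string (for example from Playwright's `page.content()`), and the `PageSimplifier` class, the `simplify` function, the tag and attribute lists, and the numbered output format are all made up for illustration. It keeps only interactive elements, drops scripts, styles, and most attributes, and gives each element a short numeric id that a prompt could refer to.

```python
from html.parser import HTMLParser

# Assumptions for illustration only: which tags and attributes are worth keeping.
INTERACTIVE = {"a", "button", "input", "select", "textarea"}
VOID = {"input"}  # elements that never have a closing tag
SKIPPED = {"script", "style"}
KEPT_ATTRS = {"type", "name", "placeholder", "aria-label", "alt", "title", "value"}


class PageSimplifier(HTMLParser):
    """Collects interactive elements and the text inside them."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.elements = []   # one dict per interactive element, in page order
        self.current = None  # interactive element whose text is being collected
        self.skip_depth = 0  # greater than 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in SKIPPED:
            self.skip_depth += 1
        elif tag in INTERACTIVE and not self.skip_depth:
            kept = [(k, v) for k, v in attrs if k in KEPT_ATTRS and v]
            element = {"tag": tag, "attrs": kept, "text": []}
            self.elements.append(element)
            if tag not in VOID:
                self.current = element

    def handle_endtag(self, tag):
        if tag in SKIPPED and self.skip_depth:
            self.skip_depth -= 1
        elif self.current is not None and tag == self.current["tag"]:
            self.current = None

    def handle_data(self, data):
        if self.current is not None and not self.skip_depth and data.strip():
            self.current["text"].append(data.strip())


def simplify(html: str) -> str:
    """Return one compact line per interactive element, each with a numeric id."""
    parser = PageSimplifier()
    parser.feed(html)
    parser.close()
    lines = []
    for i, el in enumerate(parser.elements):
        attrs = "".join(f' {k}="{v}"' for k, v in el["attrs"])
        if el["tag"] in VOID:
            lines.append(f'<{el["tag"]} id={i}{attrs}/>')
        else:
            text = " ".join(el["text"])
            lines.append(f'<{el["tag"]} id={i}{attrs}>{text}</{el["tag"]}>')
    return "\n".join(lines)


if __name__ == "__main__":
    sample = (
        '<div class="x"><script>var a = 1;</script>'
        '<a href="/cart" title="Cart"><span>View cart</span></a>'
        '<input type="text" placeholder="Search" class="big">'
        "<button>Buy <b>now</b></button></div>"
    )
    print(simplify(sample))
```

The trade-off is that stripping attributes like `class` keeps the prompt short but throws away hints the model might have used, so the kept-attribute list is the main thing to tune.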
Haven't played with the prompt much yet, so there's likely a ton of room to improve this.
Replying to @natfriedman
I've also been playing around with GPT-3 to do something similar, just with a bit of screenshotting instead. I think it's possible to get a full-fledged RPA bot like this, though you'd probably need to stack a couple more AI systems on top of it.
Replying to @natfriedman
I prototyped something similar! Another approach is to use machine vision to identify the UI elements and text, to avoid the DOM verbosity issues. I also did the HTML simplification, but it loses a lot of useful info, such as class names and other cues the LLM could use.
Replying to @natfriedman
Super cool to play around with! Took around 5s on Walmart to get blocked as a bot haha. I wonder what clever ways one could go about making the actions seem more human... time to fork!
One thought: simulating the jagged, stochastic patterns that cursor movements create in heat maps, even if very fast when navigating across many sections of a site, could prove useful as a masking mechanism for preventing API throttling …