Nat Friedman · Sep 29, 2022 · 11:38 PM UTC

Nat Friedman

29 Sep 2022

I’ve been playing with using GPT-3 to control a browser for the last couple of days. Here’s a quick demo. As you can see, it's pretty neat! But it's also quite flaky. I'll publish the source code shortly for others to try and improve.

Nat Friedman · Sep 29, 2022 · 11:38 PM UTC

Nat Friedman

@natfriedman

29 Sep 2022

This was inspired by Sharif's hack from a while back: nitter.net/sharifshameem/st… And of course by the impressive work Adept is doing. But this is just a toy, not very durable (yet?).

Sharif Shameem

@sharifshameem

17 Jun 2021

I gave GPT-3 access to Chrome with the objective "please buy me Airpods". Pretty interesting if you ask me 🤔

Nat Friedman · Sep 29, 2022 · 11:40 PM UTC

Nat Friedman

@natfriedman

29 Sep 2022

The hardest part was using the Chrome DevTools Protocol and Playwright to suck the DOM out of Chrome and turn it into something GPT-3 can read in a single prompt. A very smart person I found on Upwork helped me a ton with that (thanks, Alex!)
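To make the "turn the DOM into something GPT-3 can read" step concrete, here is a minimal sketch, not the code from the thread. It assumes the page HTML is already available as a string (for example from a browser automation tool) and uses only Python's standard library. The `PageFlattener` class, the `flatten` helper, the tag vocabulary, and the list of kept attributes are all hypothetical choices made for illustration.

```python
from html.parser import HTMLParser

# Assumed mapping from HTML tags to short prompt labels (illustrative only).
INTERACTIVE = {"a": "link", "button": "button", "input": "input",
               "select": "select", "textarea": "input"}
# Assumed whitelist of attributes worth keeping; class, style, href etc. are dropped.
KEEP_ATTRS = ("type", "placeholder", "aria-label", "alt", "title", "value")
# Tags whose contents never reach the prompt.
SKIP = {"script", "style", "noscript"}


class PageFlattener(HTMLParser):
    """Collapse an HTML page into one short, numbered line per element."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.lines = []
        self._skip_depth = 0
        self._current = None      # [label, attrs_text, text_parts] of the open element
        self._current_tag = None  # tag name that will close the open element

    def _emit(self, label, attrs_text, text):
        element_id = len(self.lines)  # ids are simply the output line index
        self.lines.append(f"<{label} id={element_id}{attrs_text}>{text}</{label}>")

    def handle_starttag(self, tag, attrs):
        if tag in SKIP:
            self._skip_depth += 1
        elif tag in INTERACTIVE and self._skip_depth == 0 and self._current is None:
            kept = "".join(f' {k}="{v}"' for k, v in attrs if k in KEEP_ATTRS and v)
            if tag == "input":
                # <input> has no closing tag, so emit it immediately.
                self._emit("input", kept, "")
            else:
                self._current = [INTERACTIVE[tag], kept, []]
                self._current_tag = tag

    def handle_endtag(self, tag):
        if tag in SKIP:
            self._skip_depth = max(0, self._skip_depth - 1)
        elif self._current is not None and tag == self._current_tag:
            label, kept, parts = self._current
            self._emit(label, kept, " ".join(parts))
            self._current = None
            self._current_tag = None

    def handle_data(self, data):
        text = " ".join(data.split())  # collapse whitespace
        if not text or self._skip_depth:
            return
        if self._current is not None:
            self._current[2].append(text)  # text inside a link/button/etc.
        else:
            self._emit("text", "", text)   # free-standing text


def flatten(html: str) -> str:
    """Return the compact, prompt-friendly rendering of an HTML string."""
    parser = PageFlattener()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.lines)
```

Calling `flatten(html)` on a page yields lines such as `<link id=1>About us</link>`, so a model could refer to an element by its id when choosing an action. Nested interactive elements are ignored in this sketch, and a real page would also need visibility filtering and some cap on prompt length.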

Nat Friedman · Sep 29, 2022 · 11:41 PM UTC

Nat Friedman

@natfriedman

29 Sep 2022

Haven't played with the prompt much yet, so there's likely a ton of room to improve this.

Nat Friedman · Sep 29, 2022 · 11:47 PM UTC

Nat Friedman

@natfriedman

29 Sep 2022

Here's the code, have at it: github.com/nat/natbot

GitHub - nat/natbot: Drive a browser with GPT-3

aNoobonaJourney · Sep 30, 2022 · 4:03 AM UTC

aNoobonaJourney @aNoobonaJourney

30 Sep 2022

Replying to @natfriedman

I've also been playing around with GPT-3 to do something similar, just with a bit of screenshotting instead. I think it's possible to get a full-fledged RPA bot like this; it probably needs a couple more AI systems stacked on top of it.

Nat Friedman · Sep 30, 2022 · 4:55 AM UTC

Nat Friedman

@natfriedman

30 Sep 2022

Share your code!

Joe Heitzeberg · Sep 30, 2022 · 2:34 AM UTC

Joe Heitzeberg

@jheitzeb

30 Sep 2022

Replying to @natfriedman

I prototyped something similar! Another approach is to use machine vision to identify the UI elements and text, to avoid the DOM/verbosity issues. I also did the HTML simplification, but it loses a lot of useful info, such as class names and other cues the LLM could use.

Nat Friedman · Sep 30, 2022 · 4:55 AM UTC

Nat Friedman

@natfriedman

30 Sep 2022

Share the code!

Calum Bird · Sep 30, 2022 · 12:38 AM UTC

Calum Bird @calumbirdo

30 Sep 2022

Replying to @natfriedman

Super cool to play around with! It took around 5s on Walmart to get blocked as a bot, haha. I wonder what clever ways one could go about making the actions seem more human... time to fork!

JJ — e/acc · Sep 30, 2022 · 3:51 AM UTC

JJ — e/acc

@JosephJacks_

30 Sep 2022

One thought: simulating the jagged, stochastic patterns that cursor movements create in heat maps, even when navigating very quickly across many sections of a site, could prove useful as a masking mechanism for preventing API throttling …
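As a rough illustration of the "jagged and stochastic" cursor idea, here is a minimal sketch that generates a noisy path between two screen points. The `jittered_path` function, its parameters, and the easing curve are assumptions made for this example. Nothing in the thread says whether a path like this would actually change how a site treats the traffic.

```python
import random


def jittered_path(start, end, steps=20, jitter=3.0, rng=None):
    """Return steps + 1 (x, y) points from start to end.

    Interior points get Gaussian noise with standard deviation `jitter`
    (in pixels); the first and last points are left exact.
    """
    rng = rng or random.Random()
    (x0, y0), (x1, y1) = start, end
    points = []
    for i in range(steps + 1):
        t = i / steps
        # Smoothstep easing: slow start, faster middle, slow finish.
        eased = t * t * (3 - 2 * t)
        x = x0 + (x1 - x0) * eased
        y = y0 + (y1 - y0) * eased
        if 0 < i < steps:
            x += rng.gauss(0, jitter)
            y += rng.gauss(0, jitter)
        points.append((x, y))
    return points
```

Each point could then be passed, with a short pause between moves, to whatever tool drives the cursor. Passing a seeded `random.Random` makes a path reproducible for debugging.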
