×
all 73 comments

[–]marcusklaas 216 points217 points  (9 children)

$70k worth of credits for a joke subreddit, I love it. Thanks to all involved for making it happen!

[–]moldy912 50 points51 points  (7 children)

Wait who paid for that?

[–]NewFolgers 92 points93 points  (6 children)

Google - They granted access to their TPU's as part of a research grant kind of thing.

[–]Warhawk2052 36 points37 points  (0 children)

Google is great for these things

[–]nmkd 8 points9 points  (4 children)

Wait, did Google directly "donate" this to the GPT2 Subreddit Sim? Or did it go to OpenAI?

[–]gwern 30 points31 points  (3 children)

TFRC gave the research credits to me for work on GPT-2-poetry & TPU swarm training, and me & Shawn Presser (who has access to my GCP account) did the training on our own. Hopefully TFRC won't be too annoyed that we happen to be benchmarking our TPU swarm code using various datasets like Reddit comments... (They seemed amused by our GPT-2-chess so I'm sure they'll be cool with SubSim.)

[–]H4xolotl 2 points3 points  (2 children)

Could you train a bot on /r/PathOfExile?

The comments in the sub have a strong identity & theme so seeing it simulated will be amazing!

[–]gwern 4 points5 points  (0 children)

You'll have to ask disumbrationist about that. We just trained the model on his dataset, we didn't decide on what subreddits he wanted to create bots for or run.

[–]sneakpeekbot 0 points1 point  (0 children)

Here's a sneak peek of /r/pathofexile using the top posts of the year!

#1: Announcing Path of Exile 2 | 2678 comments
#2: Thank You.
#3: An Update from Chris


I'm a bot, beep boop | Downvote to remove | Contact me | Info | Opt-out

[–]NowanIlfideme 15 points16 points  (0 children)

Exactly. Thanks for all those involved, thanks OpenAI, and thank you, the viewer, for the enhanced enjoyment from the Meta subreddit!

[–]tutetibiimperes 89 points90 points  (1 child)

Wow, I had no idea training the bots was so computationally intensive.

[–]StickiStickman 36 points37 points  (0 children)

Most people agree that the 1.5B model is totally overkill, as it has almost no distinction from from the one half it's size. So it's not that bad really.

[–]xlicer 142 points143 points  (3 children)

kinda disappointed that /r/CrusaderKings didn't make the cut. The original subreddit sim /u/CrusaderKings_SS is fucking hilarious, maybe I'm biased since ck2 is in my top 5 most played games but still

Also quite exciting to see what /r/conlangs and /r/etymology can produce

Also, damn /r/subsimulatorgpt2 and /r/subsimulatorgpt2meta we are going to get quite some levels of meta

[–]Bill_Ender_Belichick 68 points69 points  (0 children)

I’m so hyped for the GPT2 bots, I’m gonna get whooshed to high heaven I can feel it.

[–]Cptbullettime 0 points1 point  (1 child)

I feel ya, I wouldve loved a r/40korkscience

[–]ForAHamburgerToday 1 point2 points  (0 children)

Oh my glob I need that bot.

[–]Bigluser 44 points45 points  (3 children)

So, how do the bots take a subreddit identity if you no longer finetune separate models on each sub?

[–]disumbrationist[S,M] 72 points73 points  (2 children)

The metadata in the training set includes a subreddit identifier (i.e. just a unique integer representing each subreddit) before each submission or comment, so that the model could learn to distinguish the different subreddits from each other during training. Then when I want to generate a submission or comment for a specific subreddit, I can simply prompt the model using its corresponding subreddit identifier.

[–]Bigluser 3 points4 points  (0 children)

Thanks for sharing, that's pretty interesting. What other metadata does the training set include? Are there any example files one could look at?

[–]seventeenth-account 41 points42 points  (2 children)

r/capitalismvsocialism, r/fiftyfifty, r/moviedetails, r/neoliberal, r/riddles, and the GPT2 bots are 120% going to be great additions.

[–]captain_zavec 21 points22 points  (1 child)

I'm excited to see what the riddles and word avalanches it comes up with are.

[–]nokiacrusher 19 points20 points  (0 children)

[50/50] A cute puppy eating a huge necrotic chunk of my leg | Aftermath of a penguin

[–]BlackFang4 37 points38 points  (2 children)

r/chess getting a bot? Chess players represent!

[–]Amargosamountain 24 points25 points  (1 child)

2. Ke2

[–]BlackFang4 19 points20 points  (0 children)

r/anarchychess is gonna be all over this

[–]Hot-Error 91 points92 points  (5 children)

r/neoliberal

Yesssss can't wait to watch bots shillpilling each other

[–]Cowguypig 28 points29 points  (2 children)

Yes daddy Soros

[–]j4ck2063 5 points6 points  (1 child)

Just got my cheque from Soros in the mail the other day!

[–]Cyphertronica 8 points9 points  (0 children)

you donteven know. Its already going on in what you think are real interactions. They dont regulate real identity online.

[–]JanetYellensFuckboy 0 points1 point  (0 children)

THE DEEP STATE ALWAYS WINS.

[–]mengibus 33 points34 points  (0 children)

Thank you for putting the time and effort into this. It's one of the most interesting things I have found in recent time and it never stops amazing me how accurate it can be some times.

Thanks again for all the hard work!

[–]Yuli-Ban 11 points12 points  (3 children)

Fantastic work, and thank you /u/Gwern for helping with this. I can't wait to see what this stronger version is like.

I do hope that, at some point within the near future, we get an interactive version, but I can only imagine the headache this might cause just to create.

In terms of bot additions, I'm only bummed that a neurodivergent sub wasn't added though I suppose that's a bit of a hot potato; I'd personally be fascinated to see how a transformer handles submissions from /r/Schizophrenia or /r/Depression.

[–]gwern 9 points10 points  (2 children)

I do hope that, at some point within the near future, we get an interactive version, but I can only imagine the headache this might cause just to create.

Yes... You saw how it went with AI Dungeon 2. A few hundred downloads of our GPT-2-chess model is no big deal, but when you start talking tens of thousands, that quickly becomes a problem. (My own server bandwidth is generous but I also need it for other things like Danbooru2019.)

[–]Yuli-Ban 3 points4 points  (1 child)

That's what I mean. The compute is something that only a big corporation like Google could handle, but from what I've been told, interactive chatbots are more the domain of Microsoft.

There is a fleeting chance that Reddit itself may fund such an endeavor in the future, but I wouldn't bet on it anytime soon unfortunately. I can see many protests about it being too easy to exploit.

[–]gwern 2 points3 points  (0 children)

The compute isn't too bad. But you do need some sort of revenue source if you want to scale to 10k+ users in an interactive way. ThisWaifuDoesNotExist works fine with millions of users hitting it (as in fact happened when it went viral in China), because it's completely noninteractive and I did all the GPU compute locally in batches in advance. It would be impossible for me to have done that with an interactive TWDNE, and Waifu Labs shows what a challenge it is even when you have good revenue sources like selling prints/pillows.

[–]queens-gambit 11 points12 points  (0 children)

This is awesome. Thanks for all this

[–]paulisaac 9 points10 points  (0 children)

Aww I was hoping to see some plurality or tulpa subs just to see if the bot can emulate multiple personalities in one post. More likely it would have led to anxiety over unclosed brackets though.

[–][deleted] 8 points9 points  (4 children)

Can you use the old, smaller model for the subreddits that you listed as problematic?

[–]disumbrationist[S,M] 9 points10 points  (3 children)

Yeah, that's an option as well. But I think that would be a last resort, since I'd prefer to consistently use 1.5B models for all of them.

[–]SmarkieMark 30 points31 points  (0 children)

I'll be very sad if I stop seeing comments like these :

Cummy 😱 I 👁 always knew 👓 you 👆were a 💦 freak 💩

[–]StickiStickman 4 points5 points  (1 child)

Wouldn't you have to retrain EVERYTHING when adding a new bot now making it basically impossible? I'm not sure that's worth it

[–]Konstantine890 14 points15 points  (0 children)

Aww man, I really would have liked to see r/CrusaderKings. The random and crazy content it could generate is amazing.

[–]Derice 6 points7 points  (0 children)

You could keep the 345M bot version for the weird subreddits. Since they are already weird a little bit less coherency may not be much of a problem for e.g. /r/fifthworldproblems.

[–]moldy912 5 points6 points  (2 children)

When do the new models start?

[–]disumbrationist[S,M] 10 points11 points  (1 child)

The first post generated using the 1.5B model is this one. Everything after that is also using the new model.

[–]ethium0x 0 points1 point  (0 children)

Holy shit this is actually pretty coherent, not indistinguishable from a human but much better than the old model

[–]Barrel_Trollz 4 points5 points  (0 children)

RIP ck2bot

[–]DDonde 4 points5 points  (0 children)

I'm looking forward to seeing the /r/subsimulatorgpt2meta bot

[–]Darkhan17 5 points6 points  (0 children)

Great job!

[–]MrNoobomnenie 2 points3 points  (0 children)

Thank you for your great work! It sad though, that we will not see r/CrusaderKings bot any time soon. Still hope, that it will eventually appear. Maybe, in the next year (right after the 3rd game will come out).

[–]PilifXD 1 point2 points  (0 children)

Really wanted to see a r/shittysuperpowers or r/ayymd bot, hope they get added in the future/some old ones get replaced. Hyped to see what results the upgrade 1.5B brings :] Edit:also r/arabfunny would be hilarious

[–]Om8_8mO 1 point2 points  (0 children)

The IA is reluctant to use words from r/vxjunkies like translugubriation.

It seems the IA is smarter than given credits for.

[–]LiteralHeadCannon 1 point2 points  (0 children)

I'm really glad you're still working on this. This project is probably my favorite thing on Reddit. :)

Long-term idea for a future upgrade (I have no idea when this will be technically feasible, but it's clearly on a higher level of complexity than what's already been done, so I'm not necessarily expecting it anytime soon): for some subreddit bots that revolve around linking to other fictional threads on Reddit (some examples that stand out include /u/subredditdramaGPT2 and /u/subsimgpt2metaGPT2), it'd be a lot of fun if they could actually link to other bot threads and take their contents into account. Hopefully, this wouldn't entirely replace the current system of linking an imaginary thread and imagining its contents - but it'd definitely be cool if we could see, say, the drama bot post a thread about a scuffle that actually happened between bots in another simulated thread, or to see the bot for this subreddit respond to other simulated threads knowing that they're simulated (but not that it itself is simulated).

[–]TiredOldCrow 0 points1 point  (1 child)

Any thought to releasing a dataset of fine-tuned samples? You could get in touch with OpenAI and see if they'll host them alongside the ones they released for Amazon

In any case, really excited about this model.

[–]gwern 0 points1 point  (0 children)

Couldn't you just scrape the subreddit threads if you wanted a dump?

[–]PartyPorpoise 0 points1 point  (0 children)

Damn, missed the suggestion thread by only a bit! I wanted to suggest a few. Oh well, I'm pleasantly surprised to see a /r/HobbyDrama one, that's one of my favorite subs! I can't wait to see what that produces!

[–]TotesMessenger 0 points1 point  (0 children)

I'm a bot, bleep, bloop. Someone has linked to this thread from another place on reddit:

 If you follow any of the above links, please respect the rules of reddit and don't vote in the other threads. (Info / Contact)

[–]jenbanim 0 points1 point  (0 children)

I'm absolutely loving the /r/Neoliberal GPT2 bot. Thank you!

[–]cench 0 points1 point  (0 children)

Amazing upgrade!

Not sure if this is asked before, any plans to add more comments to threads that have significant up-votes?

[–]PUBLIQclopAccountant 0 points1 point  (0 children)

In case more bots ever get added, may I suggest some mixed bots. They are from related communities that have multiple subreddits.

I hope I didn't miss any small subs when making those comprehensive lists, but you get the idea. Heck, if they can be done on the 345M edition, I'd be fine with that: some slightly stupider bots are better than no bots at all for these communities (but I did see that you'd prefer to keep their model consistent for the smoothest blend of quality).

[–]Afrotoast42 0 points1 point  (0 children)

Can we get an r/skyrimmods bot? That subreddit has gone through so many phases, shitstorms, leadership changes, weighty discussions, and general highs/lows, it would be a perfect training ground for a bot.

[–]Zekava 0 points1 point  (0 children)

I've definitely noticed that the recent threads, while extremely coherent and often hilarious, have been less sub-specific, though mostly in the mixed threads. That might be a good thing, in a way, since the mixed threads are less like the native threads of each bot, and they're picking up on how to "break character", so to speak.

[–]Floc_Trumpet 0 points1 point  (0 children)

Add the_donald, I beg you

[–]immibis 0 points1 point  (0 children)

Is there a TRP bot?

[–]ChickenNuggetSmth 0 points1 point  (1 child)

For comparison: How expensive was the training of the 345M-models?

[–]disumbrationist[S] 1 point2 points  (0 children)

The 345M training was free, since I was able to do it all using Colab.

[–]withateethuh 0 points1 point  (0 children)

Mm a word avalanche bot should be fun.

[–]PUBLIQclopAccountant 0 points1 point  (2 children)

Is there a list of bots? I want to check if there are any MLP bots.

[–]WHY_DO_I_SHOUT 2 points3 points  (1 child)

See the sidebar in old Reddit version. And no, there isn't an MLP bot.

[–]PUBLIQclopAccountant 1 point2 points  (0 children)

A lack of a pony bot is a major missed opportunity. I do like that /r/drama has a bot as well as the SSC bot.