
[–]MrSmilingWolf[S] 2 points3 points  (0 children)

The main post is for the most mature one, but I've got three other smaller models using different architectures trained to varying degrees of maturity that I want to release, plus ONNX and quantized TFLite versions. I'll upload these at a later date.

[–]KichangKim 2 points3 points  (1 child)

am I the only one using focal loss for this? I know DeepDanbooru implements it, but was it ever used? It did miracles for my recall, esp. for rarer classes. Ping /u/KichangKim

Hi. And you are right. DeepDanbooru has a focal loss implementation, but it's not used yet. I'll try it for the next training.

[–]MrSmilingWolf[S] 1 point2 points  (0 children)

I highly recommend it. I started noticing diminishing returns when using BCE to train networks scaling up from top-500 to top-1000 tags with ResNet50, and severely so when scaling up again from top-1000 to top-2000.

Focal loss did a LOT of good for the bottom half of the classes.
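
For reference, here's a minimal TF/Keras sketch of the kind of binary focal loss being discussed (Lin et al., 2017), applied per tag on sigmoid outputs; the gamma/alpha values are the paper's defaults, not necessarily the settings used here:

```python
import tensorflow as tf

def binary_focal_loss(gamma=2.0, alpha=0.25):
    """Multi-label focal loss: gamma down-weights well-classified tags so the
    rarer, harder ones dominate the gradient; alpha rebalances positives vs negatives."""
    def loss_fn(y_true, y_pred):
        # y_pred: sigmoid probabilities, shape (batch, num_tags)
        y_true = tf.cast(y_true, y_pred.dtype)
        eps = tf.keras.backend.epsilon()
        p = tf.clip_by_value(y_pred, eps, 1.0 - eps)
        bce = -(y_true * tf.math.log(p) + (1.0 - y_true) * tf.math.log(1.0 - p))
        p_t = y_true * p + (1.0 - y_true) * (1.0 - p)              # probability of the true label
        alpha_t = y_true * alpha + (1.0 - y_true) * (1.0 - alpha)
        return tf.reduce_mean(alpha_t * tf.pow(1.0 - p_t, gamma) * bce)
    return loss_fn

# model.compile(optimizer="adam", loss=binary_focal_loss(), metrics=["binary_accuracy"])
```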

[–]gwern 1 point2 points  (7 children)

Are you clustering based on an embedding or just on clustering the predicted tags?

How many GPU-hours is 100 epochs?

Apply for Google TPUs perhaps?

TRC hands out TPUs like candy. Since you're using Tensorflow already, interfacing with TPUs shouldn't be too hard. Just be careful with egress/cross-region bandwidth.

Point is, I'd like to find out how well new archs, bigger archs, mobile archs work outside of Imagenet.

Yeah, tagging isn't remotely as well studied as categorizing. Do RegNets break on tagging or do they just need some tweaks? Who knows? Certainly not Facebook.

DL researchers generally tend to jump from 'categories' to 'semantic segmentation' or 'text captions', with not much in between. I assume it's the general absence of popular, heavily tagged datasets like, well, Danbooru20xx. (This is why I keep telling people that they can train their CLIP-like or DALL-E-like on Danbooru20xx without a problem: just turn the tags into a string by concatenating them with spaces & commas or something. It's not like CLIP isn't already mostly learning at the bag-of-words level anyway!)
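
As a toy illustration of the tag-concatenation idea (the function name, separator, and shuffling are arbitrary choices, not a prescribed format):

```python
import random

def tags_to_caption(tags, shuffle=True, sep=", "):
    """Turn a Danbooru tag list into a pseudo-caption for CLIP-like / DALL-E-like training.
    Shuffling keeps the model from latching onto a fixed tag order."""
    tags = list(tags)
    if shuffle:
        random.shuffle(tags)
    return sep.join(tag.replace("_", " ") for tag in tags)

print(tags_to_caption(["1girl", "long_hair", "school_uniform", "cherry_blossoms"]))
# e.g. "school uniform, cherry blossoms, 1girl, long hair"
```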

[–]MrSmilingWolf[S] 2 points3 points  (0 children)

Are you clustering based on an embedding or just on clustering the predicted tags?

Embeddings from the GlobalAveragePooling2D layer, right after the final projection layer to 3072 channels and the ReLU activation.

To keep the indexes' size somewhat sane I also throw some PCA in there. I choose the number of components to keep following the procedure from this post: https://towardsdatascience.com/how-to-tune-hyperparameters-of-tsne-7c0596a18868
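
Roughly what that pipeline looks like, as a sketch; the file name, layer name, image size, and component count below are placeholders rather than the exact values used:

```python
import numpy as np
import tensorflow as tf
from sklearn.decomposition import PCA

# Load the trained tagger and cut it at the pooling layer
# (layer name is a guess; check model.summary() for the real one).
tagger = tf.keras.models.load_model("tagger.h5")
embed_model = tf.keras.Model(
    inputs=tagger.input,
    outputs=tagger.get_layer("global_average_pooling2d").output,  # 3072-d embedding
)

# Embed one preprocessed image (random stand-in array here).
image = np.random.rand(1, 320, 320, 3).astype("float32")
embedding = embed_model.predict(image)                       # shape (1, 3072)

# After embedding the whole dataset, fit PCA once to shrink the index.
all_embeddings = np.random.rand(10000, 3072).astype("float32")  # stand-in for the full set
pca = PCA(n_components=256)  # 256 is illustrative; pick it via explained variance as in the linked post
reduced = pca.fit_transform(all_embeddings)                  # shape (10000, 256)
```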

How many GPU-hours is 100 epochs?

On a single p3.2xlarge instance, using TF 2.3 (ancient, but stable at least), it takes 6.75-7 hours to complete a single epoch over 1,553,694 images. That makes for around 700 hours, or 29 days of training, split more or less evenly into about one week per month over the last four months for budgeting reasons.

Last month I finally chased down the legacy bits in the model implementation and AGC code, so now I can train with TF 2.7 and expect a bump in performance using mixed precision on the V100.

[–]MrSmilingWolf[S] 2 points3 points  (5 children)

TRC hands out TPUs like candy

Long story short, I applied for a single-GPU quota on GCP 3 times about 18 months ago and was shot down every single time, which is why I started using AWS.

Fast forward to NYE: I applied to TRC thinking "as if", purely because of the quote above. They sent me a code for 5 TPUv3 and 5 TPUv2 devices, all on-demand, plus an obscene number of preemptible TPUv2 devices, no questions asked, in a matter of hours.

I cannot overstate how excited I am to scale up my experiments. There are a lot of things I've always wanted to try out, including longer schedules and different model sizes or archs, plus increasing the number of labels once I get a grasp of what works and what doesn't.

I'd really like to use Danbooru2021 for this, so out of curiosity, is there an ETA for its release?

[–]gwern 1 point2 points  (4 children)

Fast forward to NYE: I applied to TRC thinking "as if", purely because of the quote above. They sent me a code for 5 TPUv3 and 5 TPUv2 devices, all on-demand, plus an obscene number of preemptible TPUv2 devices, no questions asked, in a matter of hours.

Ah, that was your mistake. A logical person would assume that GPUs and TPUs are pretty much the same thing; a logical person would assume that if they couldn't get any GPUs at all, they definitely couldn't get TPUs. A logical person would be baffled by the reality of TRC: TRC is in a really weird position where they are siloed from GCP and they are their own little world. GPUs are GCP's thing, as are GCP credits. So, TRC can hand you literally a million bucks of TPU time (list-price)*, it ain't no thang - but they can't hand you so much as $100 of GCP credits to cover bucket storage costs or GPUs without them moving heaven & earth, because they have to go beg GCP for it. (EDIT: and if they do, they can only do so once because you can only apply 1 credit to an account?!) It's probably the single worst drawback of TRC and I've told them and anyone who will listen that this is completely crazy and self-sabotaging, but, well, big corporations...

I'd really like to use Danbooru2021 for this, so out of curiosity, is there an ETA for its release?

I was hoping I'd get it out by now, but I've run into 2 issues. (I had to delete a lot of my monthly crawls when my server ran out of space, and I forgot to redo them after getting a new server, so that is taking a while. And the BigQuery mirror is currently refusing to let me do anything at all with it due to permissions problems.) EDIT: resolved the perms, but it turns out the BQ stopped updating in November when Evazion moved the Danbooru DB to Kubernetes (?!) and now there's a completely different BQ mirror with a different schema... On the plus side, the new mirror seems to be way more comprehensive, and posts isn't too different in schema...

* I assume they gave you the usual 1-month quota. But show them you can use TPUs at all and ask politely, and they may be able to hand you some TPUv3-128s or something. Using TPU pods is really nice.

[–]MrSmilingWolf[S] 0 points1 point  (3 children)

I've checked the new danbooru1 dump, and from a quick look it seems two tables will have to be dumped to get the same data as in previous years: posts and tags. The former only has tags as a list of strings; the latter holds tag ids and categories.

Now, I don't think tags is strictly necessary for a classifier's purposes - in posts the tag strings are already separated into general, artist, character, copyright and meta, and the post counts needed to select the top-k popular tags can be calculated - but do you think you could dump and host it anyway? The tag creation date in particular could be useful for other kinds of analyses.
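
For what it's worth, a sketch of how the top-k selection could work straight from the posts dump, assuming a newline-delimited JSON file with a tag_string_general field (the field and file names are guesses based on the Danbooru API, not the actual schema of the new dump):

```python
import json
from collections import Counter

counts = Counter()
with open("posts.json") as f:
    for line in f:
        post = json.loads(line)
        # count every general tag on every post
        counts.update(post.get("tag_string_general", "").split())

# keep the 2000 most frequent general tags as the label set
top_tags = [tag for tag, _ in counts.most_common(2000)]
```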

Meanwhile I'll check what it will take to make fire-eggs' tools compatible with the new schema.

[–]gwern 0 points1 point  (0 children)

My current thinking after consulting with #iqdb is that Youstur's BQ mirror is never going to be restored or updated, so what I'm going to do is bite the bullet: I'll provide the Youstur JSON up to November 2021 and also dump each of the tables in danbooru1, and announce that future releases will contain only the danbooru1 tables and users should migrate their stuff permanently with the Youstur as a bridge.

The other tables should open up some interesting possibilities, like doing contrastive learning on caption text as well as tags, which would induce more OCR+translation capabilities (particularly with any text encoder pretrained or jointly trained on English+Japanese text corpuses).

[–]gwern 0 points1 point  (1 child)

Danbooru2021 is live.

[–]MrSmilingWolf[S] 0 points1 point  (0 children)

Best news of the week! Thank you for your work!

I've been thinking a lot about pretrained weights over the past week, and I'm not going to use them in the immediate future. In the short term I really want to prod at the different architectures and learn from their failure modes. And there's this other loss (ASL, arXiv:2009.14119) that I wanted to try out but couldn't risk GPU time on previously.
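
For context, a rough TF sketch of what ASL changes compared to plain BCE - asymmetric focusing plus probability shifting on the negatives; the hyperparameters below are the paper's defaults, not settings that were actually tried here:

```python
import tensorflow as tf

def asymmetric_loss(gamma_pos=0.0, gamma_neg=4.0, clip=0.05):
    """ASL (arXiv:2009.14119): focus positives and negatives with different gammas,
    and shift negative probabilities so very easy negatives contribute almost nothing."""
    def loss_fn(y_true, y_pred):
        # y_pred: sigmoid probabilities, shape (batch, num_tags)
        y_true = tf.cast(y_true, y_pred.dtype)
        eps = tf.keras.backend.epsilon()
        p = tf.clip_by_value(y_pred, eps, 1.0 - eps)
        p_shift = tf.clip_by_value(p - clip, eps, 1.0 - eps)   # max(p - m, 0), clipped for the log
        loss_pos = y_true * tf.pow(1.0 - p, gamma_pos) * tf.math.log(p)
        loss_neg = (1.0 - y_true) * tf.pow(p_shift, gamma_neg) * tf.math.log(1.0 - p_shift)
        return -tf.reduce_mean(loss_pos + loss_neg)
    return loss_fn
```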

I plan to implement and train ConvNeXt shortly though, so I will be able to load good pretrained weights (87.8% on PWC, i22k weights have been released) when the time comes.

[–]MrSmilingWolf[S] 1 point2 points  (0 children)

To whom it may concern: I just uploaded a few new model weights, together with data about model performance across a few different variant/mixup/activation settings.

[–]gwern 0 points1 point  (2 children)

BTW, have you ever considered using transfer learning instead of training an NFNet from scratch? If you look at Paperswithcode, I see released checkpoints as high as 87% ImageNet top-1 (SwinL) or 88% (BEiT-Large). The performance usually transfers downstream to tasks like semantic segmentation, which seems close in spirit to tagging.

[–]MrSmilingWolf[S] 0 points1 point  (1 child)

I did initially, but I remember thinking that 71M parameters was waaay too much for whatever that network was doing, and that was the smallest variant. So I scaled the parameters down and went my own merry way, later adopting timm's Lx variants in a tentative attempt to follow some standard, with some further simplification to keep them even simpler and more portable - ReLU instead of SiLU, and no ECA/SE, to make them a tad more mobile-friendly.

However, I noticed just now that NFNet-F4 and BEiT-Large have a comparable number of parameters (in the low 300M range), have been tested at similar image sizes (512px), but have different pretraining - ImageNet-1k for NFNet, ImageNet-22k for BEiT - and I wonder: given a budget in either epochs or TPU hours, which one could be finetuned better? Would they converge to similar scores at some point? How fast? How well does pretraining on real-life images transfer to drawings anyway?

Now those are some questions I could try to answer, for science.

[–]gwern 0 points1 point  (0 children)

I ignored NFNet there because PWC notes that it's pretrained on JFT-*, and Google pretty much never (ever?) releases checkpoints trained on JFT datasets. So you only get the lesser I1k-pretrained model for NFNet-F4, which drops you down to 85% - noticeably worse than an I21k BEiT at 88%. But since you're already using NFNet so heavily, sacrificing 3% may be worth the familiarity & tooling compared to learning BEiT. I expect it'd still work better than training from scratch, at least in terms of saving compute. At least in theory, even if D2020 is big enough that the pretraining prior doesn't help in the limit, the pretrained model ought to converge faster, so you can spend that compute elsewhere. Really, you're sabotaging yourself by not using the best pretrained model you can find.

Another good trick is precomputing CLIP embeddings (or a better CLIP-like model; surely someone's released checkpoints for one of the improved ones) and using them to inject knowledge, either as conditioning or as an auxiliary loss. You can do that with GANs etc. and they learn way faster when they start from an embedding, e.g. Projected GAN.
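
A minimal sketch of the precompute step, assuming the openai/CLIP package and its ViT-B/32 checkpoint (any stronger CLIP-like model would be used the same way; the image path is hypothetical):

```python
import torch
import clip
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

with torch.no_grad():
    image = preprocess(Image.open("some_post.jpg")).unsqueeze(0).to(device)
    embedding = model.encode_image(image)                         # shape (1, 512)
    embedding = embedding / embedding.norm(dim=-1, keepdim=True)  # unit-normalize before caching

# Cache `embedding` to disk, then feed it to the tagger as conditioning or as an auxiliary target.
```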

[–]insufficient_qualia 0 points1 point  (0 children)

How does it fare on this benchmark?

https://danbooru.donmai.us/posts/3343112