Uses the original v-diffusion cfg_sample.py with PLMS sampling.
It's bad at generating from text captions, or from an image prompt combined with a text caption,
but good with an image prompt alone (usage is similar to GAN latent inversion).
Rather than changing characters/content, it tends to improve quality,
though it still has strange biases, and a lot of things are undertrained.
These are all my generated samples (the 1st column is the original, resized to 256px to fit the figure).
I used only the image prompt. bs=4; all runs are shown, no cherry-picking.
https://ibb.co/SQr2x7f
https://ibb.co/cyTbvHj
https://ibb.co/FJmC6rj
https://ibb.co/YXDtpDv
https://ibb.co/XWqFmn6
I mostly used anime drawings by artists in the wild, from others' requests (half-body images with width & height > 700px are recommended).
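If a source drawing is smaller than that, upscaling it before prompting may help. A rough Pillow sketch (the filenames here are placeholders, not from my runs):

from PIL import Image

img = Image.open('input.jpg').convert('RGB')
w, h = img.size
if min(w, h) < 700:
    scale = 700 / min(w, h)  # bring the shorter side up to ~700px
    img = img.resize((round(w * scale), round(h * scale)), Image.LANCZOS)
img.save('prompt.jpg', quality=95)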
I lost the author information for the original drawings used in the samples; ask me if you're curious about an original.
I think of it as a quality-augmentation model.
Outputs from TADNE, Waifu Labs V2, Crypko Premium, DALL-E 2, halcy's tpuddim, CogView 1, AniGAN, ruDALL-E, and even VQGAN+CLIP used as image prompts can give pretty good results.
TADNE-chan, 2nd generation:
https://preview.redd.it/72djqpnv4wx81.png?width=256&format=png&auto=webp&s=07d6a6b34691b2235480cd66045ca5154644ea27
v-diffusion GitHub (cc12m is a 256px model):
https://github.com/crowsonkb/v-diffusion-pytorch
Fine-tuned weights (2.24 GB), hosted at the URL below.
Note that this is a halfway checkpoint; it was fine-tuned on a Danbooru SFW subset, and no text was used.
http://batbot.ai/models/v-diffusion/cc12m-danbooru-adam-lr5-1645.pt
About the runs: all use ":5", except the 1st and 2nd rows of https://ibb.co/SQr2x7f, which use ":10".
!python cfg_sample.py --images "filename.jpg:5" -n 4 -bs 4 --checkpoint cc12m-danbooru-adam-lr5-1645.pt
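The number after the colon is that prompt's guidance weight; raising it pulls the samples closer to the prompt image (that's what the ":10" rows above used). Same flags, only the weight changed:

!python cfg_sample.py --images "filename.jpg:10" -n 4 -bs 4 --checkpoint cc12m-danbooru-adam-lr5-1645.pt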
In the next release I'll try --size and CLIP guidance.
Loading the checkpoint throws:

TypeError: Value 'Danbooru2020_01xx; 512x512 scale+crop; dequantized' with dtype <U50 is not a valid JAX array type. Only arrays of numeric types are supported by JAX.
To fix it, change the code at cfg_sample.py#L86 to drop the '_info' metadata entry before loading the state dict:

param = torch.load(checkpoint, map_location='cpu')
param.pop('_info')  # remove the non-tensor metadata string
model.load_state_dict(param)
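Alternatively, you can strip the metadata from the checkpoint file once and re-save it, so the stock cfg_sample.py loads it unmodified. A minimal sketch (the output filename is my own choice):

import torch

ckpt = torch.load('cc12m-danbooru-adam-lr5-1645.pt', map_location='cpu')
ckpt.pop('_info', None)  # drop the non-tensor metadata string
torch.save(ckpt, 'cc12m-danbooru-adam-lr5-1645-clean.pt')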