
[–]emad_9608Emad Mostaque 76 points77 points  (15 children)

PixArt Sigma is a really nice model, especially given the dataset. I maintain 12m images is all you need.

[–]CrasHthe2nd[S] 47 points48 points  (11 children)

What they've achieved on such a small training budget is incredible. If the community picks up the reins and starts fine-tuning this, it's going to blow away any competition. Perfect timing, with SD3 looking more and more disappointing from the recent previews.

[–]CrasHthe2nd[S] 72 points73 points  (7 children)

I didn't realise I was replying to Emad 😂 I meant no disrespect. Just that the recent video showing the SD3 generations from the Discord doesn't seem to live up to the initial images that were shared on Twitter.

[–]FullOf_Bad_Ideas 37 points38 points  (0 children)

Bro 💀💀💀 I'm dying from laughing over here.

[–]emad_9608Emad Mostaque 36 points37 points  (2 children)

S'ok, when I left it was a really good series of models (LADD is super fast & edit is really good!). They promised to release it, so let's see, but sometimes models get worse. Cosine SDXL would have been a better model to release than SDXL; glad it got out there eventually.

I think SD3 will get redone eventually with a highly optimised dataset and everyone will use that tbh

[–]RadioheadTrader 7 points8 points  (0 children)

Models get better when the community adopts them and is excited to "work" on them. All this delaying and silence by SAI, after a strong announcement with the paper, is killing momentum. If there are questions about whether it's right or whether they can make it better, they should just put out a .9 / beta version and go to a faster / unannounced update timeline.

They don't have their hypeman anymore (you!), so they'd best keep the fire from burning too dim.

Release the SD3!

[–]More_Bid_2197 1 point2 points  (0 children)

How many times larger is the SD3 dataset than SDXL's?

[–]Remarkable_Emu5822 5 points6 points  (0 children)

Hahaha lmao 🤣

[–]PwanaZana 3 points4 points  (1 child)

"You have startled the witch!"

F in chat for my man. :)

[–]CrasHthe2nd[S] 2 points3 points  (0 children)

😂 curb your enthusiasm theme plays

[–]Hoodfu 22 points23 points  (1 child)

Based on the preview video that just came out, this isn't better than SD3, but it's extremely good. It remains to be seen what SD3 is like concerning censorship, but so far this PixArt model is uncensored. And the prompt following is fantastic. Prompt: National Geographic style, A giraffe wearing a pink trenchcoat with her hands in her pockets and a heavy gold necklace in a grocery store. She's surveying the vegetable section with a special interest in the red bell peppers. In the distance, a suspicious man wearing a white tank top and a green apron folds his arms.

<image>

[–]Jellybit 18 points19 points  (0 children)

He ain't folding his arms? TRASH. /s

Seriously, that prompt following is beyond impressive.

[–]somethingclassy 2 points3 points  (0 children)

What was the budget? Where can I read about their training process?

[–]UseHugeCondom 5 points6 points  (1 child)

I maintain 12m images is all you need

For now! If there's one thing I've learned, it's never to be an absolutist with computer science. Look at the 5 MB hard drives the size of refrigerators back in the day and how IBM said that was all we'd ever need. Or two years ago, when DALL-E 2 required a whole server farm of A100s to run, and now the same quality images can be genned on a local system with 4 GB of RAM and a 7-year-old graphics card.

[–]Hoodfu 9 points10 points  (0 children)

This PixArt model is 3 gigs of VRAM. Yeah, the most amazing thing to hit us in the last year is 3 gigs. The language model is 20 gigs, though. It just shows that it's actually less about the training images and more about what the language model can do with them.

[–]Flimsy_Dingo_7810 2 points3 points  (0 children)

Hi, any idea how to make PixArt work with ControlNet and IP-Adapters?

[–]Overall-Newspaper-21 24 points25 points  (31 children)

Any tutorial on how to use PixArt Sigma with ComfyUI?

[–]CrasHthe2nd[S] 11 points12 points  (30 children)

I'll see if I can post a workflow when I get home.

[–]CrasHthe2nd[S] 40 points41 points  (29 children)

[–]Wraithnaut 7 points8 points  (3 children)

In ComfyUI, the T5 Loader wants config.json and model.safetensors.index.json in the same folder as the two-part T5 text_encoder model files.

OSError: /mnt/sdb3/ComfyUI-2024-04/models/t5/pixart does not appear to have a file named config.json

With just config.json in place, this error goes away and you can load a model with path_type set to file, but because this is a two-part model you get unusable results. Setting path_type to folder gives this message:

OSError: Error no file named pytorch_model.bin, tf_model.h5, model.ckpt.index or flax_model.msgpack found in directory /mnt/sdb3/ComfyUI-2024-04/models/t5/pixart.

However, with model.safetensors.index.json also in place, you can use the path_type folder option and the T5 encoder will use both parts as intended.
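
For reference, the same two-part encoder can be loaded outside ComfyUI with plain transformers once those two JSON files are in place. A minimal sketch, assuming the folder path from the error above and that both weight shards sit in that directory:

```python
# Sketch: load the sharded T5 text encoder from a local folder.
# model.safetensors.index.json tells transformers how the weights are split
# across the two .safetensors shards, which is why the ComfyUI loader needs
# path_type "folder" and both JSON files present.
from transformers import T5EncoderModel

t5_dir = "/mnt/sdb3/ComfyUI-2024-04/models/t5/pixart"  # path from the error above
text_encoder = T5EncoderModel.from_pretrained(t5_dir)
print(text_encoder.config.d_model)  # succeeds only if config.json was found
```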

[–]-becausereasons- 0 points1 point  (2 children)

Hmm, I got an error telling me to "pip install accelerate", and now "Error occurred when executing T5v11Loader:

T5Tokenizer requires the SentencePiece library but it was not found in your environment. Checkout the instructions on the
installation page of its repo: https://github.com/google/sentencepiece#installation and follow the ones
that match your environment. Please note that you may need to restart your runtime after installation."

How do I actually install this stuff???

[–]Wraithnaut 0 points1 point  (0 children)

If an error mentions pip install followed by a package name, that means the package is missing and you can use that command to install it.

However, if you're not console savvy, you're probably looking at downloading the latest ComfyUI Portable and checking whether it came with the accelerate package.

[–]Wraithnaut 0 points1 point  (0 children)

Didn't see your edit, but because you are asking about pip, I presume you didn't use the manual install instructions for ComfyUI and instead downloaded the ComfyUI Portable version?

The portable version ships its own Python environment, separate from any system install. The file path will depend on where you unzipped ComfyUI Portable.

Enter the command `which python` to check which Python environment is active. Odds are it will say /usr/bin/python or something similar, which is the path of the system Python if you have it installed. Use the activation command described in ComfyUI's documentation to switch to the portable Python, then run `which python` again to check. Once you've verified the right Python is active, run `pip install accelerate` and you should be good to go. Or you'll get another missing-package message and need to pip install that one too. Repeat until it stops complaining about missing packages.
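
If you'd rather not worry about which interpreter is active, here's a small hedged sketch that installs into whichever Python is actually running it (the package names are the ones from the errors above; on the Windows portable build you'd run it with .\python_embeded\python.exe):

```python
# Sketch: install the missing packages into the interpreter that is running this
# script, so a system-wide pip can't target the wrong Python by accident.
import subprocess
import sys

for package in ("accelerate", "sentencepiece"):
    subprocess.check_call([sys.executable, "-m", "pip", "install", package])
```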

[–]ozzie123 2 points3 points  (1 child)

You are awesome. Take my poorman’s gold 🏅

[–]CrasHthe2nd[S] 0 points1 point  (0 children)

Thanks :)

[–]a_mimsy_borogove 2 points3 points  (14 children)

I'm kind of new, and I need help :(

I downloaded those models and loaded your Comfy workflow file, but Comfy says it's missing these nodes:

  • T5v11Loader
  • PixArtCheckpointLoader
  • PixArtResolutionSelect
  • T5TextEncode

Where do I get them? I use the ComfyUI that's installed together with StableSwarm, and it's the newest available version.

[–]CrasHthe2nd[S] 14 points15 points  (13 children)

If you have ComfyUI Manager installed (and if not, you really should 😊), you can open it and click "Install Missing Custom Nodes". If not, it's probably this custom node pack that's missing:

https://github.com/city96/ComfyUI_ExtraModels

[–]hexinx 1 point2 points  (8 children)

Thanks for this =)
Also, hoping someone can help me...

"Error occurred when executing T5v11Loader:
Using `low_cpu_mem_usage=True` or a `device_map` requires Accelerate: `pip install accelerate`"
I updated everything in ComfyUI and installed the custom node... I also manually did `python -m pip install -r requirements.txt` in "ComfyUI\custom_nodes\ComfyUI_ExtraModels".

[–]CrasHthe2nd[S] 2 points3 points  (4 children)

How much RAM and VRAM do you have?

[–]hexinx 3 points4 points  (3 children)

128GB RAM
24+48 GB VRAM

[–]CrasHthe2nd[S] 2 points3 points  (2 children)

Oh ok haha. Do you have xformers enabled? I know that's given me issues in the past.

[–]hexinx 1 point2 points  (1 child)

I'm not sure - I'm using the standalone version of ComfyUI. Also, it says "PixArt: Not using xformers!"

... Could you help?

[–]z0mBy91 0 points1 point  (2 children)

Like it says in the error, install accelerate via pip. Had the same error, that fixed it.

[–]hexinx 1 point2 points  (1 child)

Thank you - I need to do this in the custom node's folder, right?
Update: thank you! It worked - I had to do: .\python_embeded\python.exe -m pip install accelerate

[–]z0mBy91 0 points1 point  (0 children)

Perfect. Sorry, I just now saw that you actually answered :)

[–]a_mimsy_borogove 0 points1 point  (3 children)

Thanks! I installed all of it manually, and it's technically working (there are no errors), but it seems to be stuck on T5 text encode. It's maxing out all my computer's memory and just does nothing. Maybe my 16 GB of RAM is not enough? That T5 thing seems really heavy: two files of almost 10 GB each.

[–]CrasHthe2nd[S] 2 points3 points  (2 children)

Yeah, I think it's about 18 GB required. You can run it on CPU if you don't have the VRAM, but you will need that amount of actual RAM. Hopefully someone will quantise it soon to bring down the memory requirement.
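
One plausible route for that, hedged as a sketch rather than something the ComfyUI node currently exposes: load the T5 encoder in 8-bit via transformers and bitsandbytes (assumes `pip install bitsandbytes accelerate`; the folder path is illustrative):

```python
# Sketch: 8-bit loading roughly halves the T5 encoder's memory footprint
# compared with fp16, at a small quality cost.
import torch
from transformers import BitsAndBytesConfig, T5EncoderModel

text_encoder = T5EncoderModel.from_pretrained(
    "models/t5/pixart",                                    # illustrative local folder
    quantization_config=BitsAndBytesConfig(load_in_8bit=True),
    torch_dtype=torch.float16,                             # non-quantised parts stay fp16
)
```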

[–]a_mimsy_borogove -1 points0 points  (0 children)

I have 16 GB RAM and 6 GB video memory, so it seems like it's not going to work. :( I'll wait for someone to make a smaller version. I see that this one is described in the ComfyUI node as "XXL", so maybe they're planning to make smaller ones?

[–]turbokinetic 0 points1 point  (0 children)

Whoa it’s 4k?

[–]sdk401 0 points1 point  (3 children)

Followed all the steps, no errors, but I'm getting only white noise. What sampler should I use? It's set to euler-normal in the workflow; is that right?

<image>

[–]sdk401 0 points1 point  (2 children)

OK, figured it out, but the results are kinda bad anyway :)

[–]Flimsy_Dingo_7810 0 points1 point  (1 child)

Hey, do you know what the issue was, and why you were getting 'just noise'? I'm stuck in the same place.

[–]sdk401 1 point2 points  (0 children)

This comment explains what to do:

https://www.reddit.com/r/StableDiffusion/comments/1c4oytl/comment/kzuzigv/

You need to choose "path type: folder" in the first node, and put the configs in the same folder as the model. Look closely at the filenames: the downloads add the directory name to the filename, so you need to rename them correctly.

[–]ganduG 13 points14 points  (10 children)

Does it do well on multi-subject/object composition? That's usually the thing most of these prompt-adherence improvements fail at.

[–]CrasHthe2nd[S] 48 points49 points  (5 children)

Ummm, wow, OK, it handles it amazingly.

"a man on the left with brown spiky hair, wearing a white shirt with a blue bow tie and red striped trousers. he has purple high-top sneakers on. a woman on the right with long blonde curly hair, wearing a yellow summer dress and green high-heels."

<image>

This isn't cherry-picked either - this was literally the first batch I ran.

[–]ganduG 8 points9 points  (1 child)

Very impressive! Is this available to try in Comfy yet?

[–]CrasHthe2nd[S] 9 points10 points  (0 children)

Yep, I posted a link to a workflow and some instructions in another comment.

[–][deleted] 1 point2 points  (0 children)

Wow that is impressive

[–]ganduG 1 point2 points  (1 child)

Any idea why this prompt doesn't work well?

Photo of a british man wearing tshirt and jeans standing on the grass talking to a black crow sitting on a tree in the garden under the afternoon sun

Photo of a british man standing on the grass on the left, a crow sitting on a tree on the right, in a garden under the morning sun, blue sky with clouds


Here's another prompt where DALL-E does better:

photo of a firefighter wearing a black firefighters outfit climbs a steel ladder attached to a red firetruck, against a large oak tree. there are houses and trees in the background on a sunny day

Sigma vs DallE

[–]Careful_Ad_9077 1 point2 points  (0 children)

Try rephrasing them; maybe use a chat AI to suggest ways to do so.

[–]Careful_Ad_9077 2 points3 points  (2 children)

I am breaking these models (PixArt, DALL-E 3), but I am using a lot of subjects, like 5 or more.

realistic manga style, basketball players , the first player is a male (tall with red hair and confident looks), the second player is female( she has brown hair elf ears and parted hair) , the third player is female (she is short and has parted blue hair) , the fourth player is a female ( tall with orange hair, swept bangs and closed eyes), the fifth player is a female ( she is short with blue hair tied in a braid) the sixth player is a male ( he is tall and strong , he has green short hair in a bowl cut), a dynamic sports action scene

[–]ganduG 1 point2 points  (1 child)

Have you found it performing as well as DALL-E? Because I haven't; see this comment.

[–]Careful_Ad_9077 0 points1 point  (0 children)

If we ignore text generation, I have seen it perform at 60 to 80% of DALL-E 3, which is a huge step forward. I wonder how biased I am by the fact that in DALL-E 3 I have to walk on eggshells when prompting, and this one does not care. Like, in Sigma I can prompt for an athletic marble statue of Venus and get the obvious result, while DALL-E 3 will dog me.

[–]CrasHthe2nd[S] 2 points3 points  (0 children)

Good question. I'm out at the moment but I can give it a try in a bit.

[–]CrasHthe2nd[S] 31 points32 points  (6 children)

All images were generated by a first pass using PixArt Sigma for composition and then run through a second pass on SD1.5 to get the style and quality.

Image 1: a floating island in the sky with skyscrapers on it. red tendrils are reaching up from below enveloping the island. there is water below and the rest of the megacity in the background. the image is very stylized in black and white, with only red highlights for color

Image 2: a woman sits on the floor, viewed from behind. she has long messy brown hair which flows down her back and is coiled on the floor around her. she is sitting on a black marble circle with glowing alchemy symbols around it. she looks up at a beautiful night sky

Image 3: a giant floating black orb hovers menacingly above the planet, seen from the ground looking up into the clouds as it dwarfs the skyline. black and white manga style image. a beam of light is coming out of the orb firing down at the city below, causing a huge explosion

Image 4: a woman with long messy pink hair. she has turquoise eyes, and is wearing a white nurses outfit. she is standing with legs apart at the edge of a high precipice at night, black sky with a bright yellow full moon, with a sprawling city behind her in the background, red and white neon lights glowing in the darkness. little hearts float around her. she has a white nurses hat with bunny ears on it. she has a thick turquoise belt. she is wearing white high-top sneakers with pink laces, and the sneakers have little angel wings on the side

Image 5: a woman with long messy brown hair, viewed from the side, sitting astride a futuristic motorcycle, on the streets of a cyberpunk city at night. she has blue eyes, and a brown leather jacket over a black top. there is a bright full moon with a pale yellow tint in the sky. red and white neon lights glow in the darkness. she has a mischievous smile. she is wearing white high-top sneakers. the image is formatted like a movie poster

[–]Careful_Ad_9077 6 points7 points  (0 children)

Yeah, they look a bit shitty, but using the results in img2img with the same or a more detailed prompt in 1.5 is enough to get great-looking results.

[–]FoddNZ 2 points3 points  (3 children)

Thanks for the workflow and instructions. I'm a beginner in Comfy, and I need a workflow that adds a second pass through SDXL or SD1.5 for detail and refining. Do you have any suggestions?

[–]CrasHthe2nd[S] 5 points6 points  (1 child)

Add a checkpoint loader node, take the VAE connection and the image output connection from the end of my workflow and put them both into a new VAEEncode node. Then the latent output of that goes into a new KSampler, which is connected to your 1.5 model and encoded positive/negative prompts (you'll need to encode them again with the 1.5 CLIP in new nodes). Set denoise on the new KSampler to about 0.5 (experiment with different values). Essentially you're chaining two KSamplers together: one to do the composition, and a second to take that and do style and quality.
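
Outside ComfyUI, the same two-pass idea can be sketched with diffusers; this is a rough equivalent rather than the exact workflow, with the hub ids and the 0.5 strength as assumptions to experiment with:

```python
# Sketch: PixArt Sigma for composition, then an SD1.5 img2img pass for style/quality.
import torch
from diffusers import PixArtSigmaPipeline, StableDiffusionImg2ImgPipeline

prompt = "a woman with long messy brown hair sitting astride a futuristic motorcycle"

# First pass: composition from PixArt Sigma.
pixart = PixArtSigmaPipeline.from_pretrained(
    "PixArt-alpha/PixArt-Sigma-XL-2-1024-MS", torch_dtype=torch.float16
).to("cuda")
base_image = pixart(prompt=prompt).images[0]

# Second pass: img2img through an SD1.5 checkpoint at ~0.5 denoise,
# the equivalent of chaining a second KSampler onto the first result.
sd15 = StableDiffusionImg2ImgPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")
final_image = sd15(prompt=prompt, image=base_image, strength=0.5).images[0]
final_image.save("two_pass.png")
```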

[–]FoddNZ 0 points1 point  (0 children)

appreciated

[–]hexinx 1 point2 points  (0 children)

Can we "only use an SDXL model instead of theirs" etc... using just the T5 encoder?

[–]Future-Leek-8753 1 point2 points  (0 children)

Thank you for these.

[–]throwaway1512514 9 points10 points  (0 children)

Hope there is more guidance on how to run fp16/bf16 in local Comfy.

[–]DaniyarQQQ 8 points9 points  (1 child)

That looks really nice. One question: does it use some kind of LLM instead of CLIP like SD does?

[–]CrasHthe2nd[S] 12 points13 points  (0 children)

Yep, T5 I believe? I haven't dug too deep into the specifics yet.
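
For what it's worth, a quick hedged way to check (the pipeline class and hub id are assumptions, not something from this thread):

```python
# Sketch: inspect the text encoder class; PixArt uses a T5 encoder, not CLIP.
from diffusers import PixArtSigmaPipeline

pipe = PixArtSigmaPipeline.from_pretrained("PixArt-alpha/PixArt-Sigma-XL-2-1024-MS")
print(type(pipe.text_encoder).__name__)  # T5EncoderModel; SD1.5/SDXL would show CLIPTextModel
```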

[–]metal079 7 points8 points  (5 children)

Is there a way to finetune it?

[–]CrasHthe2nd[S] 15 points16 points  (4 children)

They've released fine-tuning code, but it's not implemented in kohya or OneTrainer yet; it's just pure Python.

[–]metal079 2 points3 points  (0 children)

Gotcha, thanks for letting me know

[–]tekmen0 2 points3 points  (2 children)

Maybe I can implement LoRA training code in kohya, but I need to be sure it's worth it.

[–]CrasHthe2nd[S] 0 points1 point  (0 children)

I, for one, would definitely make use of it 😁

[–]reddit22sd 0 points1 point  (0 children)

Yes please!

[–]volatilebool 5 points6 points  (4 children)

Really love #2

[–]CrasHthe2nd[S] 5 points6 points  (3 children)

This one is really good too.

<image>

[–]EliotLeo 1 point2 points  (2 children)

But you're only doing anime/non-realistic stuff with this model, correct?

[–]CrasHthe2nd[S] 2 points3 points  (1 child)

Only because that's the 1.5 model I was running it through after. It can do realistic stuff too.

[–]EliotLeo 0 points1 point  (0 children)

Cool, thanks! I'm having success with consistent characters, but now I'm finding issues with consistent clothing. I'm also trying to rely on as few tools as possible, so it's just Stability's web service and REST API for now.

[–]wyguyyyy 7 points8 points  (0 children)

Are people getting this working outside Comfy? A1111?

[–]crawlingrat 3 points4 points  (2 children)

Is it possible to create LoRA with this model? I have a bunch of images I’d love to train.

[–]CrasHthe2nd[S] 4 points5 points  (1 child)

Not to my knowledge, but it's only been out a couple of days so in time maybe.

[–]crawlingrat 0 points1 point  (0 children)

I hope so. This model is beautiful! The images are very clean too.

[–]kl0nkarn 2 points3 points  (3 children)

How does it do with text? Pretty poorly?

[–]CrasHthe2nd[S] 3 points4 points  (2 children)

Yeah sadly text doesn't work. But to be honest that's lowest on my list of priorities for an image generator - that sort of stuff can be added easily in post-processing.

[–]kl0nkarn 1 point2 points  (1 child)

Yeah, for sure. 1.5 models don't work too well with text, so I didn't expect this to perform. Would be pretty cool though!

[–]CrasHthe2nd[S] 0 points1 point  (0 children)

Even before passing it through the second 1.5 model it was still a jumbled mess.

[–]Rough-Copy-5611 2 points3 points  (3 children)

Is this available for demo on huggingface or something?

[–]CrasHthe2nd[S] 5 points6 points  (0 children)

Not that I can see, but they have a local demo you can run.

https://github.com/PixArt-alpha/PixArt-sigma?tab=readme-ov-file#3-pixart-demo

[–]ZaneA 3 points4 points  (1 child)

Yup! One was posted yesterday, check this out :) https://huggingface.co/spaces/artificialguybr/Pixart-Sigma

[–]Rough-Copy-5611 1 point2 points  (0 children)

Thanks!

[–]PwanaZana 2 points3 points  (1 child)

Is PixArt something we can use locally? In A1111?

[–]CrasHthe2nd[S] 1 point2 points  (0 children)

I don't think it's in A1111 yet but I posted a workflow for ComfyUI in another comment on here.

[–]hihajab 1 point2 points  (1 child)

How much VRAM do you need at minimum?

[–]LMLocalizer 0 points1 point  (0 children)

I run it locally on Linux with an AMD GPU with 12 GB of VRAM. It maxes out at 11.1 GB during inference if I use model offloading. (Not using ComfyUI, BTW, just a Gradio web UI.)
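
For anyone trying to fit a similar VRAM budget outside ComfyUI, a hedged diffusers sketch of that offloading approach (hub id assumed; exact peak usage will vary by GPU and OS):

```python
# Sketch: model CPU offloading keeps only the submodule currently in use on the GPU,
# which is what keeps peak VRAM near the figure mentioned above.
import torch
from diffusers import PixArtSigmaPipeline

pipe = PixArtSigmaPipeline.from_pretrained(
    "PixArt-alpha/PixArt-Sigma-XL-2-1024-MS", torch_dtype=torch.float16
)
pipe.enable_model_cpu_offload()  # requires accelerate; modules move to the GPU on demand
image = pipe("a red fox in a snowy forest, watercolor").images[0]
image.save("fox.png")
```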

[–]kidelaleronStability Staff 1 point2 points  (6 children)

They look refined with SD1.5 finetunes. Am I right?

[–]CrasHthe2nd[S] 6 points7 points  (5 children)

Yep. The image quality from Sigma right now doesn't match that of something like SDXL, so I'm running a second img2img pass on them to get better quality and style. The composition itself, though, is all Sigma.

[–]hellninja55 0 points1 point  (3 children)

Is it ComfyUI? Mind sharing the workflow (for PixArt + SDXL)?

[–]CrasHthe2nd[S] 1 point2 points  (2 children)

Workflow and instructions are in another comment 🙂

[–]hellninja55 0 points1 point  (1 child)

I did see you posted a workflow, but there is no SD model loading there.

[–]CrasHthe2nd[S] 0 points1 point  (0 children)

You can just pass the output of that into a new KSampler with about 0.5 denoise strength. There's an example of img2img in ComfyUI here:

https://comfyanonymous.github.io/ComfyUI_examples/img2img/

[–]kidelaleronStability Staff 0 points1 point  (0 children)

Reminds me of when people used to do this with base SDXL.

[–]turbokinetic 0 points1 point  (0 children)

How is this model any different from the others on Civitai?

[–]Familiar-Art-6233 0 points1 point  (1 child)

Can you finetune it?

[–]CrasHthe2nd[S] 2 points3 points  (0 children)

Yes, but it's purely Python-based at the moment. I'm trying to get it working but having issues with my environment. Hopefully kohya or OneTrainer will pick it up at some point.

[–]Overall-Newspaper-21 0 points1 point  (1 child)

Is PixArt Sigma better than ELLA?

[–]CrasHthe2nd[S] 1 point2 points  (0 children)

From what I've tested of both so far, yes, significantly.

[–]Apprehensive_Sky892 0 points1 point  (5 children)

Since a workflow is provided, I would suggest you change the flair to "Workflow Included".

[–]CrasHthe2nd[S] 0 points1 point  (4 children)

I tried but it wouldn't let me

[–]Apprehensive_Sky892 1 point2 points  (2 children)

I just tried it myself. Yes, there is an error when you do that with the current default UI, but if you switch to new.reddit.com then it works.

[–]CrasHthe2nd[S] 1 point2 points  (1 child)

Nice one, thanks!

[–]Apprehensive_Sky892 0 points1 point  (0 children)

You are welcome.

[–]Apprehensive_Sky892 0 points1 point  (0 children)

That's odd, maybe it has to do with the new reddit UI. Try doing it via the old UI: https://new.reddit.com/r/StableDiffusion/comments/1c4oytl/some_examples_of_pixart_sigmas_excellent_prompt/

(Note that it is new.reddit.com not www.reddit.com)