Animov-0.1 — High-resolution anime fine-tune of ModelScope text2video is now available in Auto1111! Trained on 384x384 anime fragments by strangeman3107, it makes 2-second-long videos with only 8.6 GB of VRAM (16 frames at 8 fps). Resource | Update (v.redd.it)
submitted 2 months ago by kabachuha
[–]Producing_It 40 points41 points42 points 2 months ago (8 children)
See, now it's already starting. Giving the community access to even mediocre text2vid software will set the foundation for the best synthetic video creation ever.
The precipice of online content has started to transform, for the foreseeable future.
AI DRAWN ANIMATION BABY!!!!
[–]kabachuha[S] 17 points18 points19 points 2 months ago (7 children)
Yeah, the prospect of having a limitless, uncontrolled animation factory at home is just too alluring to many. AI progress is exponential, but that exponent needs a seed. This seed has now been provided.
Things are now in motion that cannot be undone
[–]Plane_Savings402 6 points7 points8 points 2 months ago (6 children)
INFINITE ANIME, LET'S GO!
[–]kaptainkeel 5 points6 points7 points 2 months ago (5 children)
So, the question then is:
Which anime needs a sequel/new season the most?
[–]Plane_Savings402 10 points11 points12 points 2 months ago (0 children)
Well, we can start with remaking Berserk without the ugly 3D style.
[–]runslikewind 7 points8 points9 points 2 months ago (1 child)
I think I'll make an anime about a boy who gets hit by a truck and is transported to a fantasy world where he has OP abilities. Haven't seen one like that before.
[–]LightVelox 0 points1 point2 points 2 months ago (0 children)
To make it actually unique he needs to have a harem with a tsundere loli and some dumb girl with massive boobs, also some rapey villains
[–]GenociderX 2 points3 points4 points 2 months ago (0 children)
**proceeds to look at Hunter X Hunter**
[–]New_Priority3004 0 points1 point2 points 2 months ago (0 children)
Just give it to the No Game No Life fans. They have suffered enough.
[–]kabachuha[S] 15 points16 points17 points 2 months ago (10 children)
Made by strangeman3107 via https://github.com/ExponentialML/Text-To-Video-Finetuning. The original Diffusers weights https://huggingface.co/datasets/strangeman3107/animov-0.1
The converted weights to use in Auto1111 are here https://huggingface.co/kabachuha/animov-0.1-modelscope-original-format, and the conversion script is also available.
You can find the text2video plugin for sd-webui here https://github.com/deforum-art/sd-webui-text2video
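For anyone following along, here is a minimal sketch of fetching the converted weights and dropping them where the extension looks for them. It assumes the Hugging Face repo exposes the file under the name "text2video_pytorch_model.pth" (the filename mentioned later in the thread) and that the webui folder sits in the current directory; adjust both if your setup differs.

```python
# Minimal sketch (not from the thread): download the converted Animov-0.1
# weights and place them in the folder the sd-webui-text2video extension reads.
# Assumptions: the repo ships "text2video_pytorch_model.pth" under that name,
# and stable-diffusion-webui lives in the current working directory.
import shutil
from pathlib import Path

from huggingface_hub import hf_hub_download  # pip install huggingface_hub

downloaded = hf_hub_download(
    repo_id="kabachuha/animov-0.1-modelscope-original-format",
    filename="text2video_pytorch_model.pth",  # assumed filename
)

target_dir = Path("stable-diffusion-webui/models/ModelScope/t2v")
target_dir.mkdir(parents=True, exist_ok=True)
shutil.copy(downloaded, target_dir / "text2video_pytorch_model.pth")
print("Animov-0.1 weights installed; restart the webui to pick them up.")
```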
[–]CleomokaAIArt 5 points6 points7 points 2 months ago* (6 children)
I'm about to try it out (running a 3090).
So text2video is in (ran the Shutterstock video). Once I have everything downloaded and installed, do I just replace the original files with the new Animov ones in the t2v folder, with nothing else needed?
I will play around with it once I have it set up (I do video edits), thanks for your work! text2video actually becoming semi-usable would be such a huge step forward.
Edit: it's alive and I have 'anime'! Will see what I can do with it, I think I can push it fairly far.
[–]GenociderX 2 points3 points4 points 2 months ago (5 children)
Can I see the fruits of your work?
[–]CleomokaAIArt 3 points4 points5 points 2 months ago (4 children)
I was fooling around with a txt2video I did, with img2img batch enhancement and DaVinci resize and deflicker.
After I made the video I had to shrink it again for the Imgur GIF to show, but it's still good for a 1-second clip and a video that literally didn't exist an hour ago: Wine all day
[–]GenociderX 4 points5 points6 points 2 months ago (3 children)
That's still impressive. I can see this model in particular is definitely going to be better in a months time. Anime is just too good to pass up. Thanks for sharing
[–]CleomokaAIArt 1 point2 points3 points 2 months ago (2 children)
There seems to be a special sweet spot for fps, size, and steps. The resulting videos right now are much better; I just found the fps and it's now much more anime-ish. You can definitely get some workable videos. The biggest issue right now is that the resolution is just so small. I will try 512x512 after a few 384x384 runs (768x768 was a mess).
[–]HarmonicDiffusion 0 points1 point2 points 2 months ago (1 child)
So what is the sweet spot? No sense in mentioning it unless you want to share it.
Sharing is what allowed you to make the video in the first place, remember :)
[–]CleomokaAIArt 0 points1 point2 points 2 months ago* (0 children)
I haven't really found one where I could say "aha", but to get anything remotely close to usable, 384x384 (what it was trained on) is what I would suggest. 24 frames and 24 fps at 384x384 with around 50 steps is about where I start to see results, effectively a 1-second video. It's not that my PC can't handle more (I could go as high as 768x768 and 48 frames without CUDA errors), it's that any higher resolution turns into a hot mess.
This was my favourite and most impressive result (note quality is worse in gif form)
[–]P0ck3t 0 points1 point2 points 2 months ago (1 child)
Which of these is the file to add to A1111? I tried one and encountered issues, so I think I'm looking at the wrong link
[–]kabachuha[S] 0 points1 point2 points 2 months ago (0 children)
You can just replace the diffusion model and leave everything else the same, as it's the only thing trained here; you could even hurt the rest of the modules with the back-and-forth conversion.
Obviously, use the converted weights, since the whole point of that script is to bring Diffusers-tuned weights to Auto1111.
[–]CardAnarchist 9 points10 points11 points 2 months ago (1 child)
This is pretty mind blowing. Much more impressive to me than all the rotoscoping stuff.
Getting awfully tempting to pony up for a 4090 (and a whole new PC) so I can start messing with the fast growing video side of Stable Diffusion.
[–]kabachuha[S] 1 point2 points3 points 2 months ago (0 children)
Yeah, no more need to use dancing TikTokers as a source
[–]AsterJ 8 points9 points10 points 2 months ago (11 children)
No Shutterstock logo!
This could become amazing. There's a shit ton of anime to train on.
[–]kabachuha[S] 0 points1 point2 points 2 months ago (10 children)
The problem is labeling them all correctly. BLIP2 for autocaptioning has its own limitations and doesn't know about characters, locations, etc. that aren't absolutely famous.
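For context, here is a rough sketch of what BLIP-2 autocaptioning of a single extracted frame looks like with Hugging Face transformers. The checkpoint name and frame path are placeholders, and as the comment above notes, niche characters and locations will come back with generic captions.

```python
# Rough sketch of autocaptioning one extracted frame with BLIP-2 via
# Hugging Face transformers. Model id and file path are placeholders.
import torch
from PIL import Image
from transformers import Blip2Processor, Blip2ForConditionalGeneration

processor = Blip2Processor.from_pretrained("Salesforce/blip2-opt-2.7b")
model = Blip2ForConditionalGeneration.from_pretrained(
    "Salesforce/blip2-opt-2.7b", torch_dtype=torch.float16
).to("cuda")

frame = Image.open("frame_0001.png").convert("RGB")  # placeholder frame
inputs = processor(images=frame, return_tensors="pt").to("cuda", torch.float16)
generated = model.generate(**inputs, max_new_tokens=40)
print(processor.decode(generated[0], skip_special_tokens=True))
```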
[–]AsterJ 2 points3 points4 points 2 months ago (9 children)
I wonder if something like anidb can be used: https://anidb.net/episode/188651
It contains descriptions for each episode and says which characters appear in the episodes. Characters are also tagged with their physical and personality characteristics https://anidb.net/character/89242
[–]kabachuha[S] 2 points3 points4 points 2 months ago (8 children)
Wow! Just what I've been searching for! I'm making a ControlNet-like model on my GitHub (see the PR on the same fine-tuning repo) which should allow really long video generation, like whole episodes on a consumer PC, and the dataset was a major missing piece for its training.
[–]Yuli-Ban 0 points1 point2 points 2 months ago (3 children)
like whole episodes on a consumer PC
/u/saccharinemelody
[–]SaccharineMelody 0 points1 point2 points 2 months ago (2 children)
Mhm. Noted.
[–]kabachuha[S] 0 points1 point2 points 2 months ago (1 child)
I've seen your previous posts and yes, it's an exact replication of Microsoft's NUWA-XL model (the Flintstones demo one), but in ControlNet zero-convolutions style, so it won't change anything in the preexisting model and will require far fewer resources for training.
Here's the link https://github.com/kabachuha/InfiNet (WIP)
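As a side note, the "ControlNet zero-convolutions style" being referenced boils down to wiring a trainable side branch into a frozen base model through convolutions initialized to zero, so at the start of training the base model's output is untouched. A generic toy sketch of that trick (not the InfiNet code):

```python
# Toy illustration of ControlNet-style zero convolutions: the side branch is
# connected through a zero-initialized conv, so the frozen base model's output
# is unchanged at step 0 and only the new branch learns.
import torch
import torch.nn as nn

def zero_module(module: nn.Module) -> nn.Module:
    """Zero-initialize a layer so its initial contribution is exactly zero."""
    for p in module.parameters():
        nn.init.zeros_(p)
    return module

class ZeroConvBranch(nn.Module):
    def __init__(self, channels: int):
        super().__init__()
        self.branch = nn.Conv3d(channels, channels, kernel_size=3, padding=1)  # trainable side branch
        self.zero_conv = zero_module(nn.Conv3d(channels, channels, kernel_size=1))

    def forward(self, base_features: torch.Tensor, conditioning: torch.Tensor) -> torch.Tensor:
        # base_features come from the frozen video UNet; at init the added term is zero.
        return base_features + self.zero_conv(self.branch(conditioning))

x = torch.randn(1, 8, 16, 48, 48)        # (batch, channels, frames, h, w) toy latent
cond = torch.randn_like(x)
block = ZeroConvBranch(channels=8)
assert torch.allclose(block(x, cond), x)  # zero-init branch leaves the base output untouched
```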
[–]SaccharineMelody 0 points1 point2 points 2 months ago (0 children)
(Giddy) Yeeheehee
[–]SaccharineMelody 0 points1 point2 points 2 months ago (3 children)
If I knew how to train the text2video model (the GitHub doesn't tell me where to even start once I have the model downloaded, so I assume it requires prior knowledge, like with LoRAs), you don't even know. I'd probably be making full episodes right now off of hypnagogic 10-second 8-fps clips.
I'm with Tingle below. The sooner you release this (in an idiot-proof GUI form) the sooner we can get started on our project. The whole future is waiting!
[–]kabachuha[S] 0 points1 point2 points 2 months ago (0 children)
FYI, you can already make 8-second-long clips with my plugin after the latest optimizations https://www.reddit.com/r/StableDiffusion/comments/12o5qmo/auto1111_text2video_major_update_animate_pictures/
[–]Yuli-Ban 0 points1 point2 points 2 months ago (1 child)
Still just need a 5 minute proof of concept right now. If we can reliably do that, then whole shows and movies are indeed feasible.
I suppose the last issue beyond that is getting it to be coherent and at a smooth framerate. If it looks like the average one-click Stable Diffusion prompt output and animates like it too, then oof. But that's what agentic AI is supposed to be for in due time.
[–]SaccharineMelody 1 point2 points3 points 2 months ago (0 children)
Dude at this point even a 30-second proof of concept would make me elated.
[–]Enough_Spirit6123 7 points8 points9 points 2 months ago (0 children)
Ayo... MAPPA just never disappoints
[–]Rectangularbox23 5 points6 points7 points 2 months ago (0 children)
and so it begins…
[–]ExponentialCookie 4 points5 points6 points 2 months ago (2 children)
Amazing!
[–]kabachuha[S] 2 points3 points4 points 2 months ago (1 child)
Thanks for your work on the Fine-tune repo too!
[–]Cubey42 2 points3 points4 points 2 months ago (0 children)
Thanks to both of you for pushing the possibilities
[–]ninjasaid13 3 points4 points5 points 2 months ago (3 children)
How much VRAM for training?
[–]kabachuha[S] 5 points6 points7 points 2 months ago (2 children)
This fine-tune, due to its relatively high resolution, used more memory, so it would only fit into >30 GB (see the details here). But if you tune it with Torch2 at 256x256 and all optimization options turned on, you can train it on as little as 16 GB of VRAM https://github.com/ExponentialML/Text-To-Video-Finetuning#hardware
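For readers curious what "all optimization options turned on" typically means, here is a minimal sketch of the usual memory-saving levers, written against diffusers/PyTorch rather than the fine-tuning repo itself; the base-model id, subfolder and learning rate are assumptions for the sketch, not the repo's settings.

```python
# Not the fine-tuning repo's actual config -- just the usual memory levers it
# refers to, expressed in diffusers/PyTorch terms as an illustration.
import bitsandbytes as bnb  # optional 8-bit optimizer
from diffusers import UNet3DConditionModel

unet = UNet3DConditionModel.from_pretrained(
    "damo-vilab/text-to-video-ms-1.7b", subfolder="unet"  # assumed base checkpoint
).to("cuda")

unet.enable_gradient_checkpointing()               # trade compute for activation memory
unet.enable_xformers_memory_efficient_attention()  # cheaper attention (or Torch 2 SDPA)

optimizer = bnb.optim.AdamW8bit(unet.parameters(), lr=1e-5)  # 8-bit optimizer states

# Running the training step under fp16/bf16 autocast at 256x256 on top of these
# options is what brings the footprint down toward the ~16 GB figure above.
```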
[–]kaptainkeel 3 points4 points5 points 2 months ago (1 child)
Oof. The fact we're only at 256x256 and still using ~16GB... we're gonna need some more optimizations. Or new GPUs with actual VRAM.
[–]HeralaiasYak 0 points1 point2 points 1 month ago (0 children)
Well, if you want to pack x frames into memory, that means x times the memory requirements.
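A back-of-the-envelope illustration of that linear scaling, with assumed latent dimensions (4 latent channels, 8x spatial downscale, fp16), is below; the per-frame activations inside the UNet multiply the same way and are what actually dominate VRAM.

```python
# Illustration only: latent-video memory grows linearly with frames per sample.
# Channel count, downscale factor, and dtype size are assumptions.
def latent_megabytes(frames: int, height: int, width: int,
                     channels: int = 4, downscale: int = 8, bytes_per_el: int = 2) -> float:
    elements = frames * channels * (height // downscale) * (width // downscale)
    return elements * bytes_per_el / 1024**2

for frames in (1, 16, 24, 48):
    print(frames, "frames @ 384x384 ->", round(latent_megabytes(frames, 384, 384), 2), "MB of latents")
```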
[–]Plane_Savings402 2 points3 points4 points 2 months ago (0 children)
SADDLE UP BOYS! The future has arrived!
[–]Yuli-Ban 3 points4 points5 points 2 months ago* (0 children)
Neat proof of concept to know that this is possible. Alas, I can't even begin to figure out how to train/finetune a model— I'm still befuddled by how to even train a LoRA and at this point I'm almost too afraid to ask (hence why I'm waiting for the inevitable agentic AI to do it for me in the future). Looking less for anime and more for a specific cartoon style, but it's all beyond me at this point.
Edit: Wait, nevermind, I figured out how to do a LoRA. Song remains the same, though.
[–]HarmonicDiffusion 3 points4 points5 points 2 months ago (0 children)
And this, my friends, is why MJ will never be able to hold a candle to Stable Diffusion and its army of volunteers :)
[–]tomakorea 2 points3 points4 points 2 months ago (0 children)
If some drunk guy was abducted by aliens and they wanted to know what anime is, I'm pretty sure this is what they would produce and call "anime".
[–]WanderingPulsar 2 points3 points4 points 2 months ago (0 children)
Fuck, that's so cool. It's not there yet, but this shows the direction. One more year and people will be mass-producing OK-level anime left and right. Some manga artists would even release their own anime themselves instead of cutting a deal with a studio.
[–]SpecialistFruit1 2 points3 points4 points 2 months ago (0 children)
Unlimited Diffusion Works
[–]VocalBlur 2 points3 points4 points 2 months ago (0 children)
<image>
[–]ImpactFrames-YT 2 points3 points4 points 2 months ago (0 children)
Yes, it's not perfect, but it has timing, spacing and weight; it is just a couple of iterations away from full-blown animation production. With how difficult animation is to produce, I don't think handmade animation will ever be made again.
[–]Manson_79 1 point2 points3 points 2 months ago (0 children)
Amazing
[–]Zealousideal_Tip_915 1 point2 points3 points 2 months ago (1 child)
Physics too 😱
[–]kabachuha[S] 0 points1 point2 points 2 months ago (0 children)
True. Having the diffusion work in 3D (2+1D) instead of 2D definitely helps the AI understand causality and how things "work", unlike image-only models trained only on static references.
[–]Disastrous-Agency675 1 point2 points3 points 2 months ago (3 children)
Awesome! I’m still getting the memory error with txt2video but good for y’all I guess…
[–]kabachuha[S] 0 points1 point2 points 2 months ago (2 children)
Are you using Torch2 and have you updated the extension to the latest version? (And I don't know how much vram you have)
[–]Disastrous-Agency675 0 points1 point2 points 2 months ago (0 children)
I have 8 GB of VRAM, how do I update Torch?
[–]Disastrous-Agency675 0 points1 point2 points 2 months ago (0 children)
So I did figure out how to download Torch 2, but I kept getting a WinError 5 or something like that for permissions. Obviously I granted the user access to the folder, but I'm still getting the error.
[–]P0ck3t 0 points1 point2 points 2 months ago (3 children)
How do you add this to the current Modelscope text2video?
[–]Tadeo111 2 points3 points4 points 2 months ago (2 children)
You have to replace the "text2video_pytorch_model.pth" file in the "stable-diffusion-webui/models/ModelScope/t2v" folder.
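Written out as a small script, the same swap might look like the sketch below, which also backs up the stock ModelScope weights first; the paths follow the comment above and the Animov download location is a placeholder.

```python
# Sketch of the manual swap described above: back up the stock ModelScope
# weights, then copy the Animov .pth into their place. Adjust paths to your install.
import shutil
from pathlib import Path

t2v_dir = Path("stable-diffusion-webui/models/ModelScope/t2v")
stock = t2v_dir / "text2video_pytorch_model.pth"
animov = Path("animov-0.1/text2video_pytorch_model.pth")  # placeholder: wherever you downloaded it

if stock.exists():
    shutil.move(stock, stock.with_suffix(".pth.bak"))  # keep the original around to switch back
shutil.copy(animov, stock)
```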
[–]P0ck3t 1 point2 points3 points 2 months ago (1 child)
Can you easily switch between the models or is that just something for the future?
[–]kabachuha[S] 3 points4 points5 points 2 months ago (0 children)
Some sort of dropdown box will come with time.
[–]SwahReddit 0 points1 point2 points 2 months ago (1 child)
Awesome post u/kabachuha, didn't know we were there yet.
Trying to setup everything, and I'm getting this error when attempting to generate:
Exception occurred: memory_efficient_attention() got an unexpected keyword argument 'scale'
Posting in case someone else got this. Latest commit both for A1111 and the extension.
[–]kabachuha[S] 0 points1 point2 points 2 months ago (0 children)
Seems to be xformers-version related; there's an open issue on GitHub. I need to read the docs on when this argument appears/disappears. Or try updating xformers to the latest version yourself (or, even better, to Torch2).
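One quick way to confirm whether the installed xformers is new enough to accept the `scale` keyword (the mismatch behind the error above); this is a diagnostic sketch, not part of the extension.

```python
# Check whether the installed xformers build accepts the `scale` keyword
# that newer versions of memory_efficient_attention support.
import inspect
import xformers
import xformers.ops

sig = inspect.signature(xformers.ops.memory_efficient_attention)
print("xformers", xformers.__version__,
      "- accepts 'scale':", "scale" in sig.parameters)
# If this prints False, upgrading xformers (or moving to Torch 2's built-in
# scaled-dot-product attention, as suggested above) should resolve the mismatch.
```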
[–]buckjohnston 0 points1 point2 points 2 months ago* (4 children)
I want to train this so bad but it looks soo hard based on the instructions. Any chance we may ever get a plugin in auto1111 to train through GUI someday? Forcing myself to figure this out.
Also I am not sure how many training images or minutes of video to use
[–]kabachuha[S] 0 points1 point2 points 2 months ago (3 children)
If there is enough traction, the plugin will appear eventually (like Dreambooth's did).
The repo recommends using 16-frame-long clips for training, and then extending this value for inference.
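For illustration, here is a minimal OpenCV sketch of chopping a source video into 16-frame clips like the repo recommends; the stride and file name are placeholders, not the repo's own preprocessing.

```python
# Sketch: cut a source video into consecutive 16-frame clips with OpenCV.
import cv2

def extract_clips(video_path: str, clip_len: int = 16, stride: int = 16):
    """Yield lists of `clip_len` consecutive frames read from a video file."""
    cap = cv2.VideoCapture(video_path)
    clip = []
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        clip.append(frame)
        if len(clip) == clip_len:
            yield list(clip)
            clip = clip[stride:]  # non-overlapping when stride == clip_len
    cap.release()

for i, clip in enumerate(extract_clips("episode_01.mp4")):  # placeholder file name
    print(f"clip {i}: {len(clip)} frames")
```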
[–]buckjohnston 0 points1 point2 points 2 months ago (1 child)
Thanks, how many different 16-frame-long clips should I use in total?
[–]kabachuha[S] 1 point2 points3 points 2 months ago (0 children)
This fine-tune used around four hundred.
[–]SaccharineMelody 0 points1 point2 points 2 months ago (2 children)
/r/StableDiffusion won't agree, but this is the one "hypnagogic" era of AI I won't miss. I loved DALL-E Mini because of the memes (and static images are a bit different from video), but almost every video generation frustrates me (even the "funny" ones like Will Smith eating spaghetti), because I can see the pure utility just down the pipe and it bothers me that it's not good yet. I wish I could be having more fun with this, but I just need advanced HD video gen NOW (Veruca Salt pout)
but I just need advanced HD video gen NOW
[–]kabachuha[S] 0 points1 point2 points 2 months ago (1 child)
Wow, wow. I know, we are all horny for high-quality cinema/anime series made at home, but really calm down a bit. The problem with open-source AI is that
[–]SaccharineMelody 0 points1 point2 points 2 months ago* (0 children)
I know, we are all horny for high-quality cinema/anime series made at home, but really calm down a bit.
Just playing up Veruca Salt for that line mate ("I want it now!")
Also, I'm actually not that far removed from the fifth point. When I was first told about AI art, I didn't like the way it smelled at all, but Yuli-Ban convinced me that it wasn't going away and that anyone who got in early (~2023-2025) was basically going to be a kingmaker of the near future of media before the "deluge." You could stick to your guns and not use AI art and wind up getting swept away, or you could ride the crest of the wave.
I think a lot of the anger over AI art is justified but also overblown because everyone is expecting it to be a new iPhone or internet where 95% of people wind up making movies and games and Michelangelos when in reality I don't think that many more than already identify as "creators" will jump into the game (for the most part; I think everyone and their gran will play with AI art for a few minutes before consuming what others make). And I also don't think someone who uses AI for art is quite on the same level as artists. But that's all so hard to make out right now and it's only going to get worse before it gets better. It's not going to get better until AI can reliably make any media and people get used to all of it. Until then the anti-AI people will get louder, more vocal, and more widespread.
But at the same time, I think there's going to be loads of benefits and it'll just be a better quality of life for people who DO want to create things. If you can make an animated series on your desktop PC, that can cut out the entire studio system of capital building and influence you have to go through. At the very least, I would not mind a situation where AI art is not copyrightable and thus can't be commercialized (and occupies the same gray area as fanfiction and modding) because it's the quality improvements that matter.
I don't know, I just think that things are not going to be as radically different as we think they're going to be in 10-15 years (though the status quo won't be the same either) and that people are way too over-focused on the worst case and most extreme case scenarios.
[–]HeralaiasYak 0 points1 point2 points 1 month ago (0 children)
Could you share a ballpark figure of the dataset you need for finetuning? In terms of the number of 'clips'/seconds used for training?