Animating generated face test [Animation | Video] (v.redd.it)
submitted 7 months ago by Sixhaunt
[–]Sixhaunt[S] 216 points217 points218 points 7 months ago* (29 children)
u/MrBeforeMyTime sent me a good video to use as the driver for the image and we have been discussing it during development so shoutout to him.
The idea behind this is to be able to use a single photo of a person that you generated, and create a number of new photos from new angles and with new expressions so that it can be used to train a model. That way you can consistently generate a specific non-existent person to get around issues of using celebrities for comics and stories.
The process I used here was:
I'm going to try it with 4 different driving videos then I'll handpick good frames from all of them to train a new model with.
I have done this all on a google colab so I intend to release it once I've cleaned it up and touched it up more
edit: I'll post my google colab for it but keep in mind I just mashed together the google colabs for the various things that I mentioned above. It's not very optimized but it does the job and it's what I used for this video
https://colab.research.google.com/drive/11pf0SkMIhz-d5Lo-m7XakXrgVHhycWg6?usp=sharing
In the end you'll see the following files in google colab that you can download:
keep in mind that if your clip is long, it can produce a ton of photos, so downloading them might take a long time. If you just want the video at the end, that shouldn't be as big of a concern since you can just download the mp4
You can also view individual frames without downloading the entire zip by looking in the "frames" and "fixed" folders
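The packaging of the "frames" and "fixed" folders into downloadable zips can be sketched like this (a minimal sketch with a hypothetical `zip_folder` helper, not the notebook's actual code; only the folder names come from the post):

```python
import zipfile
from pathlib import Path

def zip_folder(folder: str, zip_path: str) -> int:
    """Pack every file under `folder` into `zip_path`; return the file count."""
    root = Path(folder)
    count = 0
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for f in sorted(root.rglob("*")):
            if f.is_file():
                # store paths relative to the folder so the zip unpacks cleanly
                zf.write(f, f.relative_to(root))
                count += 1
    return count

# e.g. zip_folder("frames", "frames.zip"); zip_folder("fixed", "fixed.zip")
```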
edit2: check out some of the frames I picked out from animating the image: https://www.reddit.com/r/StableDiffusion/comments/ys5xhb/training_a_model_of_a_fictional_person_any_name/
I have 27 total which should be enough to train on.
[–]joachim_s 40 points41 points42 points 7 months ago (5 children)
Questions:
[–]Sixhaunt[S] 46 points47 points48 points 7 months ago (3 children)
[–]joachim_s 1 point2 points3 points 7 months ago (2 children)
[–]Sixhaunt[S] 4 points5 points6 points 7 months ago (1 child)
Depends. Do you count the time spent creating the google colab? Because I can and do reuse it. Aside from that, it's just a matter of creating a face (I used one I made a while back) and a driving video, which someone else gave me. So in the end it's mostly just the time it takes to run the colab whenever I use it now.
[–]joachim_s 1 point2 points3 points 7 months ago (0 children)
Ok.
[–]LynnSpyre 0 points1 point2 points 7 months ago (0 children)
I did some fun experiments with this one. What I figured out is that it works really well if you keep your head straight. My computer got weird on longer clips, but at 90 seconds and 25-30 fps it was fine. Another issue is the size limitation, which caps you at 256 pixels wide unless you retrain the model, which is a chore. If the OP's doing it at 512, though, there's gotta be a way to do it. Either way, you can always upscale. I also found that DPM works better for rendering avatars for the Thin Spline Motion Model or First Order Model. First Order Model does the same thing, but it doesn't work as well. What it does have that Thin Spline doesn't is a nice utility for isolating the head at the right size from your driver video source.
[–]eugene20 43 points44 points45 points 7 months ago (0 children)
Really impressive consistency.
[–]GamingHubz 10 points11 points12 points 7 months ago (3 children)
I use https://github.com/harlanhong/CVPR2022-DaGAN it's supposedly faster than TPSMM.
[–]samcwl 1 point2 points3 points 7 months ago (1 child)
Did you manage to get this running on a colab?
[–]GamingHubz 0 points1 point2 points 7 months ago (0 children)
I did it locally
[–]MacabreGinger 8 points9 points10 points 7 months ago (1 child)
Thanks for sharing the process u/Sixhaunt . Unfortunately, I didn't understand a single thing because I'm a noob SD user and a total schmuck.
[–]Sixhaunt[S] 5 points6 points7 points 7 months ago (0 children)
To be fair, no SD was used at all in the making of this video. I used MidJourney for the original image of the woman, but the SD community is more technical and would make more use of this, so I posted it here, especially since the original image could just as easily have been made in SD. The purpose is also to use the results in SD for a new custom character model, but technically no SD was used in this video.
With the google colab though you can just run the "setup" block, then change the source.png to your own image and the driving.mp4 to your own custom video then just hit run on all the rest of the blocks and it will just work and give you a video like the one above. It will also create a zip file of still-frames for you to use for training.
Just be sure you're replacing the png and mp4 files with the same names and locations, or that you change the settings to point to your new files
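A quick sanity check before running the rest of the blocks could look like this (a hypothetical `check_inputs` helper, assuming the default source.png/driving.mp4 names described above):

```python
from pathlib import Path

def check_inputs(root: str = ".") -> list:
    """Return a list of problems with the expected notebook inputs.

    Assumes the default file names: source.png and driving.mp4 sitting
    next to each other (adjust if you changed the settings).
    """
    problems = []
    for name in ("source.png", "driving.mp4"):
        p = Path(root) / name
        if not p.exists():
            problems.append(f"missing {name}")
        elif p.stat().st_size == 0:
            problems.append(f"{name} is empty")
    return problems
```

An empty return value means both files are in place.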
[–]samcwl 2 points3 points4 points 7 months ago (1 child)
What is considered a good "driving video"?
[–]Sixhaunt[S] 2 points3 points4 points 7 months ago (0 children)
The most important thing, from what I've tested, is that you don't want your head to move too far from center. There should always be space between your head and the edges of the frame.
For head tilting keep in mind it varies for the following:
There are also 3 or 4 different models in Thin-Plate that are used for different framings of the person, so this applies only to the default (vox). The "ted" model, for example, is a full-body one with moving arms and such, like you might expect from someone giving a TED talk.
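The "space between your head and the edges" rule of thumb can be sketched as a margin check on a detected face box (a hypothetical helper; the 0.15 margin fraction is a guess, not a value from the post):

```python
def has_safe_margin(frame_w, frame_h, face_box, min_margin_frac=0.15):
    """True if the face box keeps at least `min_margin_frac` of the frame
    size clear on every side. face_box = (x0, y0, x1, y1) in pixels."""
    x0, y0, x1, y1 = face_box
    mx = frame_w * min_margin_frac
    my = frame_h * min_margin_frac
    return (x0 >= mx and y0 >= my
            and (frame_w - x1) >= mx and (frame_h - y1) >= my)
```

Running this over per-frame face detections would flag driving-video candidates where the head drifts too close to an edge.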
[–]cacoecacoe 5 points6 points7 points 7 months ago (4 children)
Why not use CodeFormer instead of GFPGan? I find the results consistently better, for anything photographic at least
[–]Sixhaunt[S] 20 points21 points22 points 7 months ago (2 children)
At first I tried both, using A1111's batch processing rather than the colab itself, but I found that GFPGan produced far better and more photo-realistic results. CodeFormer seems to change the facial structure less, but it also gives a less polished result, and for what I'm using it for, I don't care so much if the face changes as long as it's consistent, which it is. That way I can get the angles and shots I need to train on. Ideally CodeFormer would be implemented as a different option, but I'm sure someone else will whip up an improved version of this within an hour or two of working on it. It didn't take me long to set this up as it is. I started on it less than a day ago.
[–]cacoecacoe 5 points6 points7 points 7 months ago (1 child)
Strange, because my experience with GFPGan and CodeFormer has been the precise inverse of what you've described. However, different strokes I guess
I guess the fact that GFPGan does change the face more (a common complaint is that it changes faces too much and everyone ends up looking the same) is probably an advantage for animation.
[–]Sixhaunt[S] 3 points4 points5 points 7 months ago (0 children)
It probably was, although it didn't actually change the face shape much. Unfortunately it put a lot of makeup on her. The original face had worse skin, but it looked more natural and I liked it. I might try a version with CodeFormer, or blend them together or something, but if you want to see how it changed the face and what the input actually was, here you go:
https://imgur.com/a/HRIVuGE
keep in mind they aren't all from the same video frame or anything; I just chose an image from each set where she had roughly the same expression as the original photo
[–]TheMemo 8 points9 points10 points 7 months ago (0 children)
I find CodeFormer tends to 'invent' a face rather than fixing it.
[–]eugene20 1 point2 points3 points 7 months ago (1 child)
I'm new to colab; I've been running everything locally anyway. I just wanted to have a look at fixed.zip and frames.zip, but I couldn't figure out how to download them?
[–]Sixhaunt[S] 0 points1 point2 points 7 months ago (0 children)
Those output files are produced after you run it on your custom image and video. The notebook doesn't host the result files that I got, but elsewhere in this thread I've linked to the hand-selected frames I intend to use and to some comparisons of images from those various zips. I logged on to find so many comments that I'm just trying to answer them all right now.
I think it shows the in-progress videos within the colab page itself, just not the files for them. You should be able to see the driving video and input image I used on there as well as how it looked before upsizing and fixing the faces
[+][deleted] 7 months ago (2 children)
[–]LynnSpyre 0 points1 point2 points 7 months ago (3 children)
Okay, I've used this model before. Only issue with it is my graphics card. It gets weird on clips longer than 90 seconds. Either crashes or freezes
[–]Sixhaunt[S] 2 points3 points4 points 7 months ago (2 children)
I ran it on google colab so I didn't have to run or install any of it locally. I'm working on a new version of the colab right now though.
For my purposes I just need images of the face from different angles and with various expressions, so I'll be using a few 2-3 second clips and won't have the long-video issues. Although you could always cut a long video and process it in segments.
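Splitting a long video into short 2-3 second segments can be sketched like this (a hypothetical `split_into_clips` helper; each (start, end) pair could then be trimmed out with ffmpeg before running the colab on it):

```python
def split_into_clips(duration_s, clip_len=2.5):
    """Return (start, end) pairs in seconds covering `duration_s`,
    so a long driving video can be processed as short segments.
    clip_len defaults to the 2-3 s range mentioned above."""
    clips = []
    t = 0.0
    while t < duration_s:
        clips.append((t, min(t + clip_len, duration_s)))
        t += clip_len
    return clips
```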
[–]LynnSpyre 0 points1 point2 points 7 months ago (1 child)
Question: do you remember which pre-trained model you were using?
[–]Sixhaunt[S] 1 point2 points3 points 7 months ago (0 children)
I use the vox one
[–]pierrenay 152 points153 points154 points 7 months ago (15 children)
getting closer to the holy grail dude
[–]Sixhaunt[S] 36 points37 points38 points 7 months ago (10 children)
I ran it with two videos and extracted 9 frames so far that I really like and that are varied from each other. I have 2 more videos to run it with, then I'll hopefully have enough for dreambooth and can create a model for a custom person. Any suggestions on what to name her? I'll have to give her some sort of keyword name after all.
[–]mreo 13 points14 points15 points 7 months ago (0 children)
Ema Nymton ('Not My Name' backwards), from the '90s detective game 'Under a Killing Moon'.
[–]Fake_William_Shatner 12 points13 points14 points 7 months ago (3 children)
Name her Val Vette.
[–]malcolmrey 2 points3 points4 points 7 months ago (1 child)
i like that
[–]Fake_William_Shatner 1 point2 points3 points 7 months ago (0 children)
I was thinking of scarlet. Velvet cake. Valves. And I figure that this name could be mistaken and twisted a few different ways.
Plus, I think she's got a bit of a country accent the way the corners of her mouth press. It sounds like butter rollin' off a new stack of pancakes.
[–]velvetwool 0 points1 point2 points 7 months ago (0 children)
Mmmm nice name
[–]Mackle43221 1 point2 points3 points 7 months ago (0 children)
Ruby
[–]mreo -2 points-1 points0 points 7 months ago* (0 children)
accidental duplicate comment...
[–]pepe256 0 points1 point2 points 7 months ago (0 children)
Gene-vieve
[–]cyan2k 35 points36 points37 points 7 months ago (0 children)
Man, I can't even imagine what the SD/AI art landscape will look like in 1 year, 3 years, 5 years. Amazing.
Probably banned by every country or something, haha.
[–]o-o- 0 points1 point2 points 7 months ago (1 child)
Yep, what we've all been dreaming of since 1987.
[–]LordTuranian 0 points1 point2 points 7 months ago (0 children)
Good movie.
[–]Orc_ 0 points1 point2 points 7 months ago (0 children)
it's all coming together
[–]sheagryphon83 49 points50 points51 points 7 months ago (8 children)
Absolutely amazing, it is so smooth and lifelike. I've watched the vid several times now trying to find fault in the skin muscles and crow's feet, and I can't find any. Her crow's feet appear and disappear as they should as she talks, pulling and pushing her skin around… Simply amazing.
[–]Sixhaunt[S] 24 points25 points26 points 7 months ago (7 children)
That comes down to having a good driving video, I think. With other ones you need to be far more picky with frames. The biggest favor someone could do the community would be to record themselves making the faces and head movements that work well with this, so that it's easy to generate models with it. It would take some experimenting to get a good driving video though.
[–]Etonet 5 points6 points7 points 7 months ago (2 children)
What is a driving video?
[–]Sixhaunt[S] 8 points9 points10 points 7 months ago (1 child)
The video that has the expressions and motions that the picture is then animated from. Originally it was a tiktoker making the facial expressions (a brunette woman with a completely different face than the video above). The Thin-Plate AI then mapped the motion from the video onto the image of the person that I created with AI. The result was 256x256 though, so I had to upsize and fix the faces after.
[–]Etonet 0 points1 point2 points 7 months ago (0 children)
I see, thanks! Very cool
[–]Pretend-Marsupial258 1 point2 points3 points 7 months ago* (1 child)
There are video references on the internet for animators. Here's one I found, for example. It requires a login/account, but I bet there are other websites that don't require anything.
Edit: Stock sites like Shutterstock also have videos, but I don't know if the watermark will screw stuff up.
That's a really good idea! Worth registering for if those are free. I'll check it out more today
[–]LetterRip 0 points1 point2 points 7 months ago (1 child)
Interesting facial expressions video here,
https://www.youtube.com/watch?v=X1osDan-RZQ
Oh, thank you! I was planning to put together a bunch of 2-3 s clips for different facial expressions, then have it run on each clip. I just need to set up the repo for it and find a bunch of clips, but that video seems like it has a lot of gems. The driving video for the post above used a similar thing: I was recommended some tiktoker who was changing expressions and stuff, and there was a good closeup shot that did consistently well, so I pulled from it.
[–]Speedwolf89 38 points39 points40 points 7 months ago (3 children)
Now THIS is what I've been sticking around in this horny teen infested subreddit for.
[–]pepe256 31 points32 points33 points 7 months ago (1 child)
You don't think this was also motivated in some way by horniness? We adults are just more subtle about it
[–]Speedwolf89 1 point2 points3 points 7 months ago (0 children)
Hahh indeed.
[–]dreamer_2142 12 points13 points14 points 7 months ago (0 children)
Honestly? This is not that bad at all. Almost all the upvoted posts are great. A few memes too.
[–]Pretty-Spot-6346 16 points17 points18 points 7 months ago (1 child)
i know some awesome guys gonna make it easy for us, thank you
[–]Sixhaunt[S] 18 points19 points20 points 7 months ago (0 children)
I edited my reply to add my google colab for it, so you can do it right now with just a square image and a square video clip. Hopefully someone decides to cannibalize my code and make a better, more efficient version before I get the chance to, but this is exactly what I used for the video above.
[–]Ooze3d 12 points13 points14 points 7 months ago (3 children)
Amazing results. We’re getting very close to consistent animation and from that point on, the sky is the limit. We’re just a few years apart from actual ai movies.
[–]cool-beans-yeah 1 point2 points3 points 7 months ago (2 children)
How long you think? 5 years?
[–]Ooze3d 1 point2 points3 points 7 months ago (1 child)
The way this is going, probably much sooner than I’d consider possible. Conservatively, I’d say end of 2023 for the first few examples of actual short films with a plot (as in “not simply beautiful images edited together”). Probably still glitchy and always assisted by real footage for the movements. After that, another year to get to a point where it’s virtually indistinguishable from something shot on camera, and maybe another year where we can input what we want the subject to do and the use of actual footage is no longer needed.
But as I said, given the fact that this is all a worldwide collaborative project that’s going way faster than any other technological breakthrough I’ve witnessed or known of, I wouldn’t be surprised to see all that by the end of next year.
[–]cool-beans-yeah 0 points1 point2 points 7 months ago (0 children)
That would be wild!
[–]reddit22sd 12 points13 points14 points 7 months ago (0 children)
These are the posts I come to reddit for, excellent thinking!
[–]superluminary 12 points13 points14 points 7 months ago (1 child)
This is extremely impressive
[–]Sixhaunt[S] 9 points10 points11 points 7 months ago (0 children)
thanks! I just put an update out on how the still frames look that I'll be using for training: https://www.reddit.com/r/StableDiffusion/comments/ys5xhb/training_a_model_of_a_fictional_person_any_name/
If this all turns out well I intend to make a whole bunch of models for various fictional people and maybe take some commissions to turn people's creations into an SD model for them to use if they dont want to use my public code themselves
[–]Tax21996 8 points9 points10 points 7 months ago (0 children)
damn this one is so smooth
[–]Kaennh 8 points9 points10 points 7 months ago* (6 children)
Really cool!
Since I started tinkering with SD I've been obsessed with its potential to generate new animation workflows. I made a quick video (you can check it out here) using FILM + SD, but I also wanted to try TPSMM in the same way you have, to improve consistency... I'm pretty sure I will now that you've shared a notebook, so thanks for that!
A few questions:
- Does the driving video need to have specific dimensions (other than 1:1 proportion)?
- Have you considered Ebsynth as an alternative to achieve a more painterly look (I'm thinking about something similar to the Arcane style, perhaps)? Would it be possible to add it to the notebook? (Not asking you to, just asking if it's possible.)
[–]Sixhaunt[S] 1 point2 points3 points 7 months ago (4 children)
- Does the driving video need to have specific dimensions (other than 1:1 proportion)?
No. I've used driving videos that are 410x410, 512x512, and 380x380, and they all worked fine, but that's probably because they're downsized to 256x256 first.
The animation AI I used outputs 256x256 video, so I had to upsize the results and use GFPGan to unblur the faces afterward. So I don't think you get any advantage from an input video larger than 256x256, but it won't prevent it from working or anything.
Have you considered Ebsynth as an alternative to achieve a more painterly look (I'm thinking about something similar to Arcane style... perhaps)? Would it be possible to add it to the notebook?
I've had a local version of Ebsynth installed for a while now and I've gotten great results with it in the past; I just wasn't able to find a way to use it through google colab. Ultimately I want to be able to feed in a whole ton of images and videos and have it automatically produce a bunch of new AI "actors" for me, but that's too much effort without fully automating it.
If you're doing it manually, then using Ebsynth would probably be great, and it might even work better in terms of not straying from the original face, since you don't need to upsize it afterward and fix the faces (GFPGan tends to put too much makeup on the person).
[–]rangoonmeathelmet 0 points1 point2 points 7 months ago (3 children)
Is it possible to change the output aspect ratio to 16:9 or are you locked into 256x256?
[–]Sixhaunt[S] 1 point2 points3 points 7 months ago (2 children)
I think it's locked. The full-body one, which is called "ted", is like 340x340 or something, but it doesn't work for close-up faces.
You might be able to crop a video to a square containing the face, use this method to turn it into the other person, then stitch it back into the original video
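The crop-to-a-square-around-the-face idea can be sketched as computing a clamped square window per frame (a hypothetical helper; the face centre would come from whatever detector you use, and stitching back into the original video is left out):

```python
def square_crop_box(cx, cy, frame_w, frame_h, size):
    """Axis-aligned square of side `size` centred near (cx, cy),
    clamped so it stays fully inside the frame.
    Returns (x0, y0, x1, y1) pixel coordinates."""
    half = size // 2
    x0 = min(max(cx - half, 0), frame_w - size)
    y0 = min(max(cy - half, 0), frame_h - size)
    return (x0, y0, x0 + size, y0 + size)
```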
[–]rangoonmeathelmet 0 points1 point2 points 7 months ago (1 child)
Got it. Thank you!
I should mention that the demo they use doesn't have a perfectly square input video, so I think it crops it but still accepts it.
[–]Logseman 4 points5 points6 points 7 months ago (0 children)
This is both awe-inspiring and very scary.
[–]Seventh_Deadly_Bless 6 points7 points8 points 7 months ago (9 children)
95-97% humanlike.
Face muscles change volume from one frame to the next few. That's my biggest gripe.
Body language hints at anxiety/fear. But she also smiles. It's not too paradoxical a message, but it does bother me.
For the pluses :
Bone structure kept all the way through, pretty proportions of her features. Aligned teeth.
Stable Diffusion is good with surface rendering, which gives her realistic, healthy skin. The saturated, vibrant, painterly/impressionistic style makes the good pop out and hides the less good.
It's scarily good.
Question : What's the animation workflow ?
I know of an AI animation tool (Antidote ? Not sure of the name.), but it's nowhere near that capable. Especially paired with Stable Diffusion
I imagine you had to animate it manually, at least in part, almost celluloid-era style.
Which would be even more of an achievement.
[–]LetterRip 1 point2 points3 points 7 months ago* (7 children)
Pretty sure it is just optical flow automatic matching (thin plate spline), they aren't doing any animation.
https://arxiv.org/abs/2203.14367
https://studentsxstudents.com/the-future-of-image-animation-thin-plate-spline-motion-90e6cf807ea0?gi=643589a1b820
And this is the model used
https://cloud.tsinghua.edu.cn/f/da8d61d012014b12a9e4/?dl=1
[–]Seventh_Deadly_Bless 0 points1 point2 points 7 months ago (6 children)
Scratching my head.
This is obviously emergent tech, but I'm wondering if it's implemented through the same pytorch stack as Stable Diffusion.
I need to check the tech behind the Antidote thing I mentioned. Maybe it's an earlier implementation of the same tech.
What you describe is a deepfake workflow. I bet it's one of the earliest ones used to make pictures of famous people sing.
I feel like there's something I'm missing, though. I'll try to take a look tomorrow: it's getting late for me right now.
[–]LetterRip 3 points4 points5 points 7 months ago (5 children)
Yes, it uses pytorch (hence the '.pt' extension on the file). I think you might not understand these words?
Pytorch is a neural network framework. Diffusion is a type of generative neural network.
What you describe is a deepfake workflow.
Nope,
Deepfakes rely on a type of neural network called an autoencoder.[5][61] These consist of an encoder, which reduces an image to a lower dimensional latent space, and a decoder, which reconstructs the image from the latent representation.[62] Deepfakes utilize this architecture by having a universal encoder which encodes a person in to the latent space.[63] The latent representation contains key features about their facial features and body posture. This can then be decoded with a model trained specifically for the target.[5] This means the target's detailed information will be superimposed on the underlying facial and body features of the original video, represented in the latent space.[5]

A popular upgrade to this architecture attaches a generative adversarial network to the decoder.[63] A GAN trains a generator, in this case the decoder, and a discriminator in an adversarial relationship.[63] The generator creates new images from the latent representation of the source material, while the discriminator attempts to determine whether or not the image is generated.[63] This causes the generator to create images that mimic reality extremely well as any defects would be caught by the discriminator.[64] Both algorithms improve constantly in a zero sum game.[63] This makes deepfakes difficult to combat as they are constantly evolving; any time a defect is determined, it can be corrected.[64]
https://en.wikipedia.org/wiki/Deepfake
Optical flow is an older technology, used for match moving (having special effects sit in the proper 3D location in a video).
https://en.wikipedia.org/wiki/Optical_flow
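For context, the "thin plate spline" in TPSMM's name refers to the classic interpolation kernel U(r) = r² log r², used to fit a smooth 2-D warp through matched keypoints; a minimal sketch of just the kernel (a hypothetical helper, not the model's actual code):

```python
import math

def tps_kernel(r: float) -> float:
    """Thin-plate spline radial basis U(r) = r^2 * log(r^2).
    U(0) is defined as 0 by continuity."""
    if r == 0.0:
        return 0.0
    return r * r * math.log(r * r)
```

The full model solves for keypoint correspondences between the driving frame and the source image, then warps the source through a spline built from this kernel.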
[+]Seventh_Deadly_Bless comment score below threshold-5 points-4 points-3 points 7 months ago (4 children)
[–]ko0x 3 points4 points5 points 7 months ago* (2 children)
Nice, I tried something like this for a music video for a song of mine roughly 2 years ago, but stopped because colab is such a horrible, unfun workflow. Looks like I can give it another go soon.
[–]Sixhaunt[S] 3 points4 points5 points 7 months ago (1 child)
They have a Spaces page on huggingface if you don't want to run Thin-Plate through google colab. I just set one up that does it all start to finish, including upsizing the result, running the facial fixing, and packaging frames so you can hand-pick them for training data.
The main purpose is to generate sets of images like these for training: https://www.reddit.com/r/StableDiffusion/comments/ys5xhb/training_a_model_of_a_fictional_person_any_name/
[–]ko0x 0 points1 point2 points 7 months ago (0 children)
OK thanks, I'll look into that. I'm hoping we get close to running this locally, as easy to use as SD.
[–]allumfunkelnd 4 points5 points6 points 7 months ago (1 child)
This is how our quantum computer AIs will communicate with us in real time in the metaverse of the future. :-D Awesome! Thanks for sharing this and your workflow! The face of this Robo-Girl is stunning.
[–]ninjasaid13 0 points1 point2 points 7 months ago (0 children)
I think we'd be more likely to use analog computers for AI in the future, because they're much faster, at the cost of being less accurate, but that doesn't matter much in AI.
[–]pbinder 3 points4 points5 points 7 months ago (6 children)
I run SD on my desktop; is it possible to do all this locally and not through google colab?
[–]Sixhaunt[S] 5 points6 points7 points 7 months ago (5 children)
Yeah, I don't see why not.
[–]Vivarevo -1 points0 points1 point 7 months ago (1 child)
Wonder if it's possible to run low-quality video for a live feed
I think the processing takes longer than the video's runtime, so it probably wouldn't work for that, unfortunately, although some client-side upscaling isn't unheard of already
[–]jonesaid 0 points1 point2 points 7 months ago (2 children)
Is there a tutorial out there to set up the TPSMM locally?
I think their GitHub shows all the various ways you can use it and gives a quick tutorial
[–]NerdyRodent 1 point2 points3 points 7 months ago (0 children)
Sure is! How to Animate faces from Stable Diffusion! https://youtu.be/Z7TLukqckR0
[–]Maycrofy 1 point2 points3 points 7 months ago (0 children)
I mean, it looks like how animation would move in real life. It's very captivating.
[–]kim_en 1 point2 points3 points 7 months ago (0 children)
tf, I thought this kind of animation wouldn't come until after next year. Absolutely mind blowing.
[–]Dart_CZ 1 point2 points3 points 7 months ago (0 children)
What is she saying? I can't make out the first part, but the last part looks like "me, please". What are your guesses, guys?
[–]Unlimitles 1 point2 points3 points 7 months ago (0 children)
one day.....someone is going to use these things to Lure men to their dooms.
it's going to work....
[–]ptitrainvaloin 1 point2 points3 points 7 months ago* (0 children)
Great results 😁
Here's a tip I discovered that will surely help you along your journey for the purpose you stated: if you make a custom photo template for training with Textual Inversion, the more photorealistic your template's results are, the faster it trains (fewer steps) and the fewer images you need (fewer than what is regularly suggested in the field at present) to create your own model(s) and style(s) in even higher quality.
short example of a new photorealism_template.txt (in directory stable-diffusion-webui/textual_inversion_templates) you can create :
(photo highly detailed vivid) ([name]) [filewords]
(shot medium close-up high detail vivid) ([name]) [filewords]
(photogenic processing hyper detailed) ([name])
Etc... add some more lines to it.
The more variations you add the better, as long as you test your prompts before adding them to your template, to be sure they produce consistently good photorealism results.
Good luck, and continue to have fun experimenting!
Edit: input image(s) must be high quality; otherwise, garbage in -> garbage out
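Generating such a template file programmatically can be sketched like this (a hypothetical `write_ti_template` helper; it appends [filewords] to every line, which you could drop for some variations as in the example above):

```python
def write_ti_template(path, variations):
    """Write one prompt line per variation in the A1111 textual-inversion
    template format, using the [name] and [filewords] placeholders."""
    lines = [f"({v}) ([name]) [filewords]" for v in variations]
    with open(path, "w") as f:
        f.write("\n".join(lines) + "\n")
    return lines

# e.g. write_ti_template(
#     "stable-diffusion-webui/textual_inversion_templates/photorealism_template.txt",
#     ["photo highly detailed vivid", "photogenic processing hyper detailed"])
```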
[–]lagosta-alucinada 1 point2 points3 points 7 months ago (2 children)
u/savevideo
[–]SaveVideo 1 point2 points3 points 7 months ago (1 child)
Info | Feedback | Donate | DMCA | reddit video downloader | download video tiktok
good bot
[–]InMyFavor 1 point2 points3 points 7 months ago (7 children)
This is genuinely fucking nuts
[–]Sixhaunt[S] 1 point2 points3 points 7 months ago (6 children)
Just uploaded a new video with her too: https://www.reddit.com/r/AIActors/comments/ysxg6p/new_video_of_genevieve/
[–]InMyFavor 1 point2 points3 points 7 months ago (1 child)
Yooooooo
I almost have a completed model for her too which I'll release soon. Then anyone can use her for their projects since this woman doesn't actually exist and isn't a copyright issue like celebrity faces. I think people making visual novels will especially like it
[–]InMyFavor 0 points1 point2 points 7 months ago (0 children)
This is firmly on the other side of the uncanny valley.
[–]InMyFavor 0 points1 point2 points 7 months ago (2 children)
This is so crazy and borderline revolutionary and virtually no one mainstream is paying attention.
[–]Sixhaunt[S] 1 point2 points3 points 7 months ago (1 child)
It's crazy to think that this was my first try and it took less than a day to implement. I can only imagine what we'll be able to do even a few months from now.
I'm struggling to keep up as it is. In 6 months, I have no clue.
[–]Throwaway-sum 1 point2 points3 points 6 months ago (0 children)
This is nuts!! This only came out weeks ago? It feels like we are experiencing history in the making.
[–]unrealf8 2 points3 points4 points 7 months ago (1 child)
Ahh, that’s the major question I had about sd. Can I generate a character that I can consistently continue to generate art with. Love it!
check out some of the frames I pulled from this method which I'll be training with: https://www.reddit.com/r/StableDiffusion/comments/ys5xhb/training_a_model_of_a_fictional_person_any_name/
[–]Magikarpeles 3 points4 points5 points 7 months ago (1 child)
Hear me out
I'm listening
[–]HulkHunter 2 points3 points4 points 7 months ago (0 children)
Synthetic Reality becoming real.
[–]martsuia 2 points3 points4 points 7 months ago (0 children)
Looking at this feels like I’m dreaming.
[–]1Neokortex1[🍰] 1 point2 points3 points 7 months ago (0 children)
🚀🔥
[–]moahmo88 1 point2 points3 points 7 months ago (0 children)
Good job!
[–]MonoFauz 1 point2 points3 points 7 months ago (0 children)
The progress with this tech is so fast. Great job!
[–]TraditionLazy7213 0 points1 point2 points 7 months ago (0 children)
Thanks for sharing, amazing stuff
[–]JCNightcore 0 points1 point2 points 7 months ago (0 children)
This is amazing
[–]nano_peen 0 points1 point2 points 7 months ago (0 children)
Incredible consistency
[–]LeBaux 0 points1 point2 points 7 months ago (0 children)
We are all thinking it.
[–]TrevorxTravesty -2 points-1 points0 points 7 months ago (1 child)
This is going to be incredible when we’ll be able to do this with dead actors and see them shine again 😯 I’d love to be able to see some of my favorite people such as Robin Williams or Bruce Lee do stuff again 😞 I would love to make loving tributes to them.
[–]ObiWanCanShowMe 8 points9 points10 points 7 months ago (0 children)
That is not what OP is doing here. OP is generating different images (frames) of a fictional person by animating a still image of a face, so they can then make an SD model of this fictional person and consistently generate them without variation.
Think
picture of thepersonicreated with red hair in a warrior outfit
instead of
picture of a beautiful girl with red hair in a warrior outfit
The first one gets the same face every time; the second is random. It's DreamBooth trained on an SD-created face.
That said, what you suggested is already possible with deepfake which is only going to get better.
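The prompt difference above can be sketched in a few lines. This is only an illustration of the pattern: `thepersonicreated` is the hypothetical identifier token from the comment, and any rare token the model hasn't seen would work the same way.

```python
# DreamBooth-style prompting: a rare identifier token stands in for the
# trained face, while a generic description yields a new random person.

def make_prompt(description, token=None):
    """Build a prompt; with a token the subject is pinned to the trained
    face, without one the subject is random each generation."""
    subject = token if token else "a beautiful girl"
    return f"picture of {subject} {description}"

consistent = make_prompt("with red hair in a warrior outfit",
                         token="thepersonicreated")
generic = make_prompt("with red hair in a warrior outfit")
```

Every prompt containing the token resolves to the same fictional face, which is exactly what makes the trained model usable for comics and stories.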
[–][deleted] 7 months ago (1 child)
[removed]
[–]StableDiffusion-ModTeam[M] 1 point2 points3 points 7 months agolocked comment (0 children)
Your post/comment was removed because it contains hateful content.
[–]jonesaid 0 points1 point2 points 7 months ago (0 children)
I was wondering if something similar could be done using Euler a step-count variation to get different images of the same fictional person. I'm not sure the face stays the same at different step counts, though...
[–]omnidistancer 0 points1 point2 points 7 months ago (1 child)
I'm implementing something along the same lines but with different models for the motion transfer and upscaling (it could possibly go above 2k if everything works out). Very interesting to see your amazing results :)
Do you mind sharing the driving video, or at least some suggestions on how to get something similar? The expressions look amazing!
It's just a short clip of a TikToker making some facial expressions. I mentioned the guy who gave me the clip in my original comment, but I ended up having to find it again myself to get a higher-quality version.
I uploaded the short clip I used from the video here though: https://filebin.net/r0ynwdeg2emc61e0
[–][deleted] 0 points1 point2 points 7 months ago (0 children)
Wow, this was well done.
[–]Zyj 0 points1 point2 points 7 months ago (1 child)
That slight smile...
https://imgur.com/a/jfkksoh
there's some stills if you're interested.
[–]The_Irish_Rover26 0 points1 point2 points 7 months ago (0 children)
Very cool.
[–]Silly-Slacker-Person 0 points1 point2 points 7 months ago (1 child)
I wonder if soon it will be possible to animate two characters talking at the same time
I don't see why you can't make a face detector that crops the video around each head, runs those crops through a similar process to what I did, then splices them back into the original video to have as many people talking as you want
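That crop-and-splice idea can be sketched roughly as below. Everything here is a placeholder under stated assumptions: `detect_faces` is a stub standing in for a real detector (Haar cascade, RetinaFace, etc.), and `animate_crop` stands in for the motion-transfer step (e.g. TPSMM), which here just returns the crop unchanged.

```python
import numpy as np

def detect_faces(frame):
    """Stub detector returning (x, y, w, h) boxes; swap in a real face
    detector for actual use."""
    return [(10, 10, 64, 64), (120, 10, 64, 64)]

def animate_crop(crop):
    """Placeholder for the motion-transfer step that would re-animate
    the cropped face; here it returns the crop unchanged."""
    return crop

def process_frame(frame):
    """Crop each detected face, run it through the animation step, and
    splice the result back into the original frame."""
    out = frame.copy()
    for (x, y, w, h) in detect_faces(frame):
        out[y:y + h, x:x + w] = animate_crop(frame[y:y + h, x:x + w])
    return out

frame = np.zeros((256, 256, 3), dtype=np.uint8)
result = process_frame(frame)
```

Run per frame over a whole video, this would let each detected head be animated independently and merged back in place.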
[–]vs3a 0 points1 point2 points 7 months ago (0 children)
This reminds me of the Faestock-on-DeviantArt days.
Game changer!
[–]AlbertoUEDev 0 points1 point2 points 7 months ago (0 children)
Ohh I was looking something like this 🤩
[–]BinyaminDelta 0 points1 point2 points 7 months ago (0 children)
This is the future.
Hopefully these are the kind of graphics we'll see in the next Skyrim and Fallout games.
[–]yehiaserag 0 points1 point2 points 7 months ago (0 children)
Respect man, I wish you all the best. Even more respect because you are sharing with the community.
[–]InfiniteComboReviews 0 points1 point2 points 7 months ago (0 children)
This is awesome, but there is something very... off-putting about it. Like this is how I'd expect Skynet to try to infiltrate a human base or something.
[–]Promptmuse 0 points1 point2 points 7 months ago (0 children)
Wow, thanks for sharing your process.
Everyday I’m seeing something new and ground breaking.
[–]purplewhiteblack 0 points1 point2 points 7 months ago (0 children)
5 years from now is going to be crazy
[–]wrnj 0 points1 point2 points 7 months ago (1 child)
One question: how usable is a DreamBooth model trained only on one kind of closeup portrait with the same background and clothing? I noticed that if I train a model only with face selfies, the generations I get are 1:1 the kind of frames in the training data, with no variety whatsoever. Do you add some full-body images of the fictional person to the DreamBooth training? Thanks.
The plan today is to use the 27 images to train a good model for the face, then use that model to generate more photos of her. If I have difficulty getting certain shots, I can generate them with the normal 1.5 model and then inpaint the upper body using her model, giving a new training image with the right composition.
[–]widgia 0 points1 point2 points 7 months ago (0 children)
Impressive!
this is why!
https://youtu.be/zWcYN58Y9EM
[–]GoldenHolden01 0 points1 point2 points 7 months ago (0 children)
Holy shittttty
[–][deleted] 0 points1 point2 points 7 months ago (1 child)
When you say you'll "train an algorithm", what's that process actually entail?
I don't think I said that anywhere, from what I can tell. I trained a model using the Stable Diffusion/DreamBooth approach. It retrains the weights of the denoising model by feeding it images of a specific person from various angles and with various facial expressions, so it can replicate the same person. What I did was find a way to use a single image to generate all the input images required to train the model.
https://www.reddit.com/r/AIActors/comments/yssc2r/genevieve_model_progress/
This means you can generate a consistent person in Stable Diffusion without using celebrity names, instead using a person you generated from scratch.
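To make the training input concrete: DreamBooth-style fine-tuning consumes (image, prompt) pairs where every prompt contains the same identifier token. A minimal sketch of that data prep, where the file names, the token `genevieve`, and the class word are all illustrative assumptions:

```python
# Pair each extracted frame with an instance prompt containing the
# identifier token -- the basic input format for DreamBooth-style training.

def build_training_pairs(image_files, token, class_word="woman"):
    """Attach the same token-bearing instance prompt to every image."""
    prompt = f"a photo of {token} {class_word}"
    return [(path, prompt) for path in image_files]

# the 27 hand-picked frames mentioned in the thread (names are made up)
frames = [f"fixed/frame_{i:03d}.png" for i in range(27)]
pairs = build_training_pairs(frames, token="genevieve")
```

A real training run would load each image and feed these pairs to a DreamBooth trainer, optionally alongside regularization images of the generic class.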
REALLY nice! I've done similar stuff. What tools did you use to get this? This is super smooth
[–]gtoal 0 points1 point2 points 7 months ago (1 child)
You know the theory that everyone has a double... basically there are not enough faces to go around for everyone to get a unique one ;-) ... I suspect a real person can be found to match any realistic generated face, so using these to avoid litigation might not be as effective as you hope!
They wouldn't be able to get anywhere with litigation though. No input was ever of them, so the similarities wouldn't matter. It's already tough enough for established actors to take legal action over their likeness if it isn't explicitly them; Elliot Page tried to go after The Last of Us, for example. People have animated films or high-quality 3D renders of people who don't exist all the time, and it's never been an issue, even when some random person finds it looks an uncanny amount like them.
[–]Mystvearn2 0 points1 point2 points 7 months ago (2 children)
Wow. This is great.
Is there a YouTube video on the step by step process? Also, Is it possible to run this thing locally? I have a 3060 which I think can be of use. Don't really matter about the processing time.
[–]Sixhaunt[S] 0 points1 point2 points 7 months ago (1 child)
Someone reached out and wants to do a video about it, so I don't know if it's going to be a tutorial or a showcase, but for now I just have the Google Colab that I put together quickly. This was my first try at this, so it's still early on: it was done fairly lazily and isn't efficient, but you can find the link to the colab in the comments for reference. I just mashed together the demos for the different things I wanted to use, and I'm redoing the entire thing right now, so I'll have a better colab out in the future. You should be able to follow each part's local installation steps on your computer to run it locally, though.
[–]Mystvearn2 0 points1 point2 points 7 months ago (0 children)
Thanks. I have no coding background. I managed to install Stable Diffusion locally and install the model based on a YouTube tutorial. If you asked me to do it again without consulting the video, I'd be lost 😂
[–]LordTuranian 0 points1 point2 points 7 months ago (8 children)
How did you do this? This is amazing. I want to make something like this too.
[–]Sixhaunt[S] 1 point2 points3 points 7 months ago (7 children)
I explained it in a comment and linked to my google colab for it but basically:
[–]LordTuranian 0 points1 point2 points 7 months ago (6 children)
Oh, I accidentally skipped over that comment; I can't understand a lot of the language because I'm new to this kind of stuff. But thanks anyway. :)
With the Google Colab I made, you can just run the first section, which sets up the files and such, then swap out the default video and image with your own (you can see where they're located and what they're named in the "settings" section). Then you click the run/play buttons for each section in order until the end. It will take some time to process, but it will produce an mp4 file for you to download.
[–]LordTuranian 0 points1 point2 points 7 months ago (3 children)
How do I use the google colab on my PC? Do I just use it straight from the browser or do I have to use another program?
The nice thing about Google Colab is that it runs on Google's servers rather than your computer. It basically spins up a virtual machine to run the code; you control it through your browser and can download files from it afterwards. On the page, you just click the play button next to each chunk of code to run it. Do them in order, follow any instructions, and you'll get your results.
[–]LordTuranian 1 point2 points3 points 7 months ago (1 child)
Awesome. Thanks again.
no problem! I'm working on a new version of the colab along with someone else. I'm excited to show it off once it's working
A youtube channel called PromptMuse reached out to me the other day and is planning to cover this in a video soon, so it might be more digestible in that format.
I hadn't heard of the channel before she reached out, but it's actually really cool and covers a range of topics in the AI space, especially with SD.
[–]midihex 0 points1 point2 points 7 months ago (0 children)
A great use of TPSMM! I'm familiar with it, so here are some thoughts for you. The default output video quality of TPS is a bit meh (it's VBR, quality=5), so this is what I settled on, which is x264 lossless:

    import imageio
    from skimage import img_as_ubyte

    # lossless x264 output (crf 0) instead of TPS's default VBR quality=5
    imageio.mimsave(output_video_path,
                    [img_as_ubyte(frame) for frame in predictions],
                    codec='libx264rgb', pixelformat='rgb24',
                    output_params=['-crf', '0', '-s', '256x256', '-preset', 'veryslow'],
                    fps=fps)
Also, I'm not sure a pre-upscale before GFPGAN is needed for this usage: GFPGAN upscales anywhere up to 8x and then applies the face restore, and it can also use RealESRGAN for the parts of the frame that GFPGAN doesn't touch.
Saw someone mention CodeFormer: it's great for stills but falls apart with video; it can't keep coherency like GFPGAN.
Illustrious_Row_9971 on Reddit wrote a drag-and-drop Gradio colab version of TPS. I haven't got the link at the moment, but it should show up with a search, I think.
For the final output I always went to lossless (HUFFYUV or FFV1); it retains so much more detail than mp4.
[–]Automatic-Respect-23 0 points1 point2 points 7 months ago* (0 children)
Great job!
Can you please share the driving video?
edit:
Sometimes my photo doesn't fit the driving video, and the results are too poor to use for training. Do you have any suggestions?
Thanks a lot!