all 169 comments

[–]joachim_s 40 points41 points  (5 children)

Questions:

  1. How long did this clip take to make?
  2. How many frames/sec?

[–]Sixhaunt[S] 46 points47 points  (3 children)

  1. I'm not entirely sure, but a longer clip I'm processing right now took 26 mins for a 16s clip. The one I posted here is only 4s, so it took a lot less time. This is just using the default Google Colab machine.
  2. I don't know what the original was. The idea was to get frames at different angles to train on DreamBooth, so when it came to reconstructing it as a video again at the end for fun, I just set it to 20fps for the final output video. It might be slightly faster or slower than the original, but for my purposes it didn't matter.

[–]joachim_s 1 point2 points  (2 children)

  1. I’m asking about both time preparing for it AND processing time.

[–]Sixhaunt[S] 4 points5 points  (1 child)

Depends. Do you count the Google Colab creation time? Because I can and do reuse it. Aside from that, it's just a matter of creating a face (I used one I made a while back) and a driving video, which someone else gave me. So in the end it's mostly just the time it takes to run the Colab whenever I use it now.

[–]LynnSpyre 0 points1 point  (0 children)

I did some fun experiments with this one. What I figured out is that it works really well if you keep your head straight. My computer got weird on longer clips, but at 90 seconds and 25-30 fps it was fine. Another issue is the size limitation, which caps you at 256 pixels wide unless you retrain the model, which is a chore. If the OP's doing it at 512, though, there's gotta be a way to do it. Either way, you can always upscale. I also found that DPM works better for rendering avatars for the Thin-Plate Spline Motion Model or First Order Model. First Order Model does the same thing, but it doesn't work as well. What it does have that Thin-Plate doesn't is a nice utility for isolating the head at the right size from your driver video source.

[–]eugene20 43 points44 points  (0 children)

Really impressive consistency.

[–]GamingHubz 10 points11 points  (3 children)

I use https://github.com/harlanhong/CVPR2022-DaGAN; it's supposedly faster than TPSMM.

[–]samcwl 1 point2 points  (1 child)

Did you manage to get this running on a colab?

[–]GamingHubz 0 points1 point  (0 children)

I did it locally

[–]MacabreGinger 8 points9 points  (1 child)

Thanks for sharing the process u/Sixhaunt .
Unfortunately, I didn't understand a single thing because I'm a noob SD user and a total schmuck.

[–]Sixhaunt[S] 5 points6 points  (0 children)

To be fair, no SD was used at all in the making of this video. I used Midjourney for the original image of the woman, but the SD community is more technical and would make more use of this, so I posted it here, especially since the original image could just as easily have been made in SD. The purpose is also to use the results in SD for a new custom character model, but technically no SD was used in this video.

With the Google Colab, though, you can just run the "setup" block, then change source.png to your own image and driving.mp4 to your own custom video, then hit run on all the rest of the blocks and it will just work and give you a video like the one above. It will also create a zip file of still frames for you to use for training.

Just be sure you replace the png and mp4 files with the same names and locations, or change the settings to point to your new files.
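The replace-with-the-same-names step can also be scripted from a Colab cell rather than the file browser. A minimal sketch; the source.png / driving.mp4 names come from the comment above, but the working-directory layout is an assumption and may differ in the actual notebook:

```python
import shutil
from pathlib import Path

def stage_inputs(face_image: str, driving_video: str, work_dir: str = ".") -> None:
    """Copy your own image/video over the notebook's defaults, keeping the
    expected file names so the remaining cells run unmodified."""
    work = Path(work_dir)
    shutil.copy(face_image, work / "source.png")      # square image of the face
    shutil.copy(driving_video, work / "driving.mp4")  # square driving clip

# e.g. stage_inputs("/content/my_face.png", "/content/my_clip.mp4")
```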

[–]samcwl 2 points3 points  (1 child)

What is considered a good "driving video"?

[–]Sixhaunt[S] 2 points3 points  (0 children)

The most important thing from what I've tested is that you don't want your head to move too far from center. There should always be space between your head and the edges of the frame.

For head tilting keep in mind it varies for the following:

  • Roll - It handles this really well.
  • Pitch - It's finicky here; try not to tilt your head up or down too much, though there is some leeway, probably around 30 degrees in each direction.
  • Yaw - A max of maybe 45 degrees of motion, but it morphs the face a little, so restricting the tilt in this direction helps keep consistency.

There are also 3 or 4 different models in Thin-Plate that are used for different framings of the person, so this applies only to the default (vox). The "ted" model, for example, is a full-body one with moving arms and such, like you might expect from someone giving a TED talk.
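The keep-space-around-the-head rule is easy to pre-check mechanically before committing to a driving video. A small sketch of the geometry only; the 15% margin is an illustrative default rather than a number from this thread, and the face boxes would come from whatever face detector you already use:

```python
def box_has_margin(box, frame_size, margin_frac=0.15):
    """Return True if a face bounding box (x, y, w, h) keeps at least
    margin_frac of the frame width/height clear on every side."""
    x, y, w, h = box
    fw, fh = frame_size
    mx, my = fw * margin_frac, fh * margin_frac
    return x >= mx and y >= my and x + w <= fw - mx and y + h <= fh - my

# A centered face in a 256x256 frame passes; one touching an edge fails.
```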

[–]cacoecacoe 5 points6 points  (4 children)

Why not use CodeFormer instead of GFPGan? I find the results consistently better, for anything photographic at least.

[–]Sixhaunt[S] 20 points21 points  (2 children)

At first I tried both, using A1111's batch processing rather than the Colab itself, but I found that GFPGan produced far better and more photo-realistic results. CodeFormer seems to change the facial structure less, but it also gives a less polished result, and for what I'm using it for I don't care so much if the face changes as long as it's consistent, which it is. That way I can get the angles and shots I need to train on. Ideally CodeFormer would be implemented as a different option, but I'm sure someone else will whip up an improved version of this within an hour or two of working on it. It didn't take me long to set this up as it is; I started on it less than a day ago.

[–]cacoecacoe 5 points6 points  (1 child)

Strange, because my experience of GFPGan and CodeFormer has been the precise inverse of what you've described. However, different strokes, I guess.

I guess the fact that GFPGan does change the face more (a common complaint is that it changes faces too much and everyone ends up looking the same) is probably an advantage for animation.

[–]Sixhaunt[S] 3 points4 points  (0 children)

I guess the fact that GFPGan does change the face more (a common complaint is that it changes faces too much and everyone ends up looking the same) is probably an advantage for animation.

It probably was, although it didn't actually change the face shape much. Unfortunately it put a lot of makeup on her, though. The original face had worse skin, but it looked more natural and I liked it. I might try a version with CodeFormer, or blend them together or something, but if you want to see the way it changed the face and what the input actually was, here you go:

https://imgur.com/a/HRIVuGE

Keep in mind they aren't all of the same video frame or anything; I just chose an image from each set where she had roughly the same expression as the original photo.

[–]TheMemo 8 points9 points  (0 children)

I find CodeFormer tends to 'invent' a face rather than fixing it.

[–]eugene20 1 point2 points  (1 child)

I'm new to Colab; I've been running everything locally anyway. I just wanted to have a look at fixed.zip and frames.zip, but I couldn't figure out how to download them.

[–]Sixhaunt[S] 0 points1 point  (0 children)

Those output files are produced after you run it on your custom image and video. They don't host the file results that I got on there, but elsewhere in this thread I've linked to hand-selected frames I intend to use and to some comparisons of images from those various zips. I logged on to find so many comments that I'm just trying to answer them all right now.

I think it shows the in-progress videos within the Colab page itself, just not the files for them. You should be able to see the driving video and input image I used on there, as well as how it looked before upsizing and fixing the faces.

[–]LynnSpyre 0 points1 point  (3 children)

Okay, I've used this model before. The only issue with it is my graphics card: it gets weird on clips longer than 90 seconds, and either crashes or freezes.

[–]Sixhaunt[S] 2 points3 points  (2 children)

I ran it on Google Colab so I didn't have to run or install any of it locally. I'm working on a new version of the Colab right now, though.

For my purposes I just need images of the face from different angles and with various expressions, so I'll be using a few 2-3 second clips and won't have the long-video issues. Although you could always crop a video and process it in segments.

[–]LynnSpyre 0 points1 point  (1 child)

Question: do you remember which pre-trained model you were using?

[–]Sixhaunt[S] 1 point2 points  (0 children)

I use the vox one

[–]pierrenay 152 points153 points  (15 children)

getting closer to the holy grail dude

[–]Sixhaunt[S] 36 points37 points  (10 children)

I ran it with two videos and have extracted 9 frames so far that I really like and that are varied from each other. I have 2 more videos to do this with, then I'll hopefully have enough for DreamBooth and can create a model for a custom person. Any suggestions on what to name her? I'll have to give her some sort of keyword name, after all.

[–]mreo 13 points14 points  (0 children)

Ema Nymton: 'Not My Name' backwards, from the 90s detective game 'Under a Killing Moon'.

[–]Fake_William_Shatner 12 points13 points  (3 children)

Name her Val Vette.

[–]malcolmrey 2 points3 points  (1 child)

i like that

[–]Fake_William_Shatner 1 point2 points  (0 children)

I was thinking of scarlet. Velvet cake. Valves. And I figure that this name could be mistaken and twisted a few different ways.

Plus, I think she's got a bit of a country accent the way the corners of her mouth press. It sounds like butter rollin' off a new stack of pancakes.

[–]velvetwool 0 points1 point  (0 children)

Mmmm nice name

[–]mreo -2 points-1 points  (0 children)

accidental duplicate comment...

[–]pepe256 0 points1 point  (0 children)

Gene-vieve

[–]cyan2k 35 points36 points  (0 children)

Man, I can't even imagine what the SD/AI art landscape will look like in 1 year, 3 years, 5 years. Amazing.

Probably banned by every country or something, haha.

[–]o-o- 0 points1 point  (1 child)

Yep, what we've all been dreaming of since 1987.

[–]LordTuranian 0 points1 point  (0 children)

Good movie.

[–]Orc_ 0 points1 point  (0 children)

It's all coming together.

[–]sheagryphon83 49 points50 points  (8 children)

Absolutely amazing; it's so smooth and lifelike. I've watched the vid several times now trying to find fault in the skin muscles and crow's feet, and I can't find any. Her crow's feet appear and disappear as they should as she talks, pulling and pushing her skin around… Simply amazing.

[–]Sixhaunt[S] 24 points25 points  (7 children)

That comes down to having a good driving video, I think. With other ones you need to be far more picky with frames. The biggest favor someone could do for the community would be to record themselves making the faces and head movements that work well with this, so that it's easy to generate models with it. It would take some experimenting to get a good driving video, though.

[–]Etonet 5 points6 points  (2 children)

What is a driving video?

[–]Sixhaunt[S] 8 points9 points  (1 child)

The video that has the expressions and motions that the picture is then animated from. Originally it was a TikToker making the facial expressions (a brunette woman with a completely different face than the video above). The Thin-Plate AI then mapped the motion from the video onto the image of the person that I created with AI. The result was 256x256, though, so I had to upsize and fix the faces after.

[–]Etonet 0 points1 point  (0 children)

I see, thanks! Very cool

[–]Pretend-Marsupial258 1 point2 points  (1 child)

There are video references on the internet for animators. Here's one I found, for example. It requires a login/account, but I bet there are other websites that don't require anything.

Edit: Stock sites like Shutterstock also have videos, but I don't know if the watermark will screw stuff up.

[–]Sixhaunt[S] 0 points1 point  (0 children)

That's a really good idea! Worth registering for if those are free. I'll check it out more today.

[–]LetterRip 0 points1 point  (1 child)

Interesting facial expressions video here,

https://www.youtube.com/watch?v=X1osDan-RZQ

[–]Sixhaunt[S] 0 points1 point  (0 children)

Oh, thank you! I was planning to put together a bunch of 2-3s clips of different facial expressions, then have it run on each clip. I just need to set up the repo for it and find a bunch of clips, but that video seems like it would have a lot of gems. The driving video for the post above came from a similar thing: I was recommended some TikToker who was changing expressions and such, and there was a good closeup shot that did consistently well, so I pulled from it.

[–]Speedwolf89 38 points39 points  (3 children)

Now THIS is what I've been sticking around in this horny teen infested subreddit for.

[–]pepe256 31 points32 points  (1 child)

You don't think this was also motivated in some way by horniness? We adults are just more subtle about it

[–]Speedwolf89 1 point2 points  (0 children)

Hahh indeed.

[–]dreamer_2142 12 points13 points  (0 children)

Honestly? This is not that bad at all. Almost all the upvoted posts are great. A few memes too.

[–]Pretty-Spot-6346 16 points17 points  (1 child)

I knew some awesome guys were gonna make it easy for us. Thank you!

[–]Sixhaunt[S] 18 points19 points  (0 children)

I edited my reply to add my Google Colab for it, so you can do it right now with just a square image and a square video clip. Hopefully someone will cannibalize my code and make a better, more efficient version before I get the chance to, but this is exactly what I used for the video above.

[–]Ooze3d 12 points13 points  (3 children)

Amazing results. We're getting very close to consistent animation, and from that point on, the sky is the limit. We're just a few years away from actual AI movies.

[–]cool-beans-yeah 1 point2 points  (2 children)

How long do you think? 5 years?

[–]Ooze3d 1 point2 points  (1 child)

The way this is going, probably much sooner than I’d consider possible. Conservatively, I’d say end of 2023 for the first few examples of actual short films with a plot (as in “not simply beautiful images edited together”). Probably still glitchy and always assisted by real footage for the movements. After that, another year to get to a point where it’s virtually indistinguishable from something shot on camera, and maybe another year where we can input what we want the subject to do and the use of actual footage is no longer needed.

But as I said, given the fact that this is all a worldwide collaborative project that’s going way faster than any other technological breakthrough I’ve witnessed or known of, I wouldn’t be surprised to see all that by the end of next year.

[–]cool-beans-yeah 0 points1 point  (0 children)

That would be wild!

[–]reddit22sd 12 points13 points  (0 children)

These are the posts I come to reddit for, excellent thinking!

[–]superluminary 12 points13 points  (1 child)

This is extremely impressive

[–]Sixhaunt[S] 9 points10 points  (0 children)

thanks! I just put an update out on how the still frames look that I'll be using for training: https://www.reddit.com/r/StableDiffusion/comments/ys5xhb/training_a_model_of_a_fictional_person_any_name/

If this all turns out well, I intend to make a whole bunch of models for various fictional people, and maybe take some commissions to turn people's creations into an SD model for them to use, if they don't want to use my public code themselves.

[–]Tax21996 8 points9 points  (0 children)

damn this one is so smooth

[–]Kaennh 8 points9 points  (6 children)

Really cool!

Since I started tampering with SD I've been obsessed with its potential to generate new animation workflows. I made a quick video (you can check it out here) using FILM + SD, but I also wanted to try TPSMM in the same way you have, to improve consistency... I'm pretty sure I will now that you've shared a notebook, so thanks for that!

A few questions:

- Does the driving video need to have some specific dimensions (other than 1:1 proportion)?
- Have you considered EbSynth as an alternative to achieve a more painterly look (I'm thinking about something similar to the Arcane style, perhaps)? Would it be possible to add it to the notebook? (Not asking you to, just asking if it's possible.)

[–]Sixhaunt[S] 1 point2 points  (4 children)

- Does the driving video need to have some specific dimensions (other than 1:1 proportion)?

No. I've used driving videos that are 410x410, 512x512, and 380x380, and they all worked fine, but that's probably because they are downsized to 256x256 first.

The animation AI I used produces 256x256 videos, so I had to upsize the results and use GFPGan to unblur the faces after. So I don't think you get any advantage from an input video larger than 256x256, but it won't prevent it from working or anything.
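Since everything gets downsized to 256x256 anyway, you can pre-square and shrink a clip yourself with ffmpeg. A sketch that just builds the command; the center-crop-to-shortest-side-then-scale filter chain is my own choice for illustration, not necessarily what the model's loader does internally:

```python
import subprocess

def square_resize_cmd(src: str, dst: str, size: int = 256) -> list[str]:
    """ffmpeg command: center-crop to a square on the shorter side,
    then scale to size x size."""
    vf = f"crop='min(iw,ih)':'min(iw,ih)',scale={size}:{size}"
    return ["ffmpeg", "-y", "-i", src, "-vf", vf, dst]

# To actually run it:
# subprocess.run(square_resize_cmd("driving_hd.mp4", "driving.mp4"), check=True)
```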

Have you considered EbSynth as an alternative to achieve a more painterly look (I'm thinking about something similar to the Arcane style, perhaps)? Would it be possible to add it to the notebook?

I've had a local version of EbSynth installed for a while now and I've gotten great results with it in the past; I just wasn't able to find a way to use it through Google Colab. Ultimately I want to be able to feed in a whole ton of images and videos and have it automatically produce a bunch of new AI "actors" for me, but it's too much effort without fully automating it.

If you're doing it manually, then using EbSynth would probably be great and might even work better in terms of not straying from the original face, since you don't need to upsize it after and fix the faces (GFPGan puts too much makeup on the person).

[–]rangoonmeathelmet 0 points1 point  (3 children)

Is it possible to change the output aspect ratio to 16:9 or are you locked into 256x256?

[–]Sixhaunt[S] 1 point2 points  (2 children)

I think it's locked. The full-body one, which is called "ted", is 340x340 or something, but it doesn't work for close-up faces.

You might be able to crop a video to a square containing the face, use this method to turn it into the other person, then stitch it back into the original video.
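The crop-process-stitch idea can be sketched with ffmpeg as well. This is a hypothetical sketch, not part of the original workflow: the x, y coordinates are wherever you took the face square from, and the overlay simply pastes the re-animated square back at the same spot:

```python
def crop_cmd(src: str, dst: str, x: int, y: int, size: int) -> list[str]:
    """Cut a size x size square around the face out of the original video."""
    return ["ffmpeg", "-y", "-i", src,
            "-vf", f"crop={size}:{size}:{x}:{y}", dst]

def stitch_cmd(original: str, face_clip: str, x: int, y: int, dst: str) -> list[str]:
    """Paste the re-animated face square back over the original footage."""
    return ["ffmpeg", "-y", "-i", original, "-i", face_clip,
            "-filter_complex", f"[0:v][1:v]overlay={x}:{y}", dst]

# Run each with subprocess.run(cmd, check=True); seams at the square's edge
# would likely need feathering/color matching in practice.
```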

[–]rangoonmeathelmet 0 points1 point  (1 child)

Got it. Thank you!

[–]Sixhaunt[S] 0 points1 point  (0 children)

I should mention that the demo they use doesn't have a perfectly square input video, so I think it crops it but still accepts it.

[–]Logseman 4 points5 points  (0 children)

This is both awe-inspiring and very scary.

[–]Seventh_Deadly_Bless 6 points7 points  (9 children)

95-97% humanlike.

Face muscles change volume from one frame to the next few. My biggest grief.

Body language hints at anxiety/fear. But she also smiles. It's not too paradoxical a message, but it does bother me.

For the pluses:

Bone structure kept all the way through, pretty proportions to her features. Aligned teeth.

Stable Diffusion is good with surface rendering, which gives her realistic, healthy skin. The saturated, vibrant, painterly/impressionistic style makes the good pop out and hides the less good.

It's scarily good.

Question: What's the animation workflow?

I know of an AI animation tool (Antidote? Not sure of the name), but it's nowhere near that capable, especially paired with Stable Diffusion.

I imagine you had to animate it manually, at least in part, almost celluloid-era style.

Which would be even more of an achievement.

[–]LetterRip 1 point2 points  (7 children)

Pretty sure it is just optical flow automatic matching (thin plate spline), they aren't doing any animation.

https://arxiv.org/abs/2203.14367

https://studentsxstudents.com/the-future-of-image-animation-thin-plate-spline-motion-90e6cf807ea0?gi=643589a1b820

And this is the model used

https://cloud.tsinghua.edu.cn/f/da8d61d012014b12a9e4/?dl=1

[–]Seventh_Deadly_Bless 0 points1 point  (6 children)

Scratching my head.

This is obviously emergent tech, but I'm wondering if it's implemented through the same PyTorch stack as Stable Diffusion.

I need to check the tech behind the Antidote thing I mentioned. Maybe it's an earlier implementation of the same tech.

What you describe is a deepfake workflow. I bet it's one of the earliest ones used to make pictures of famous people sing.

I feel like there's something I'm missing, though. I'll try to take a look tomorrow: it's getting late for me right now.

[–]LetterRip 3 points4 points  (5 children)

This is obviously emergent tech, but I'm wondering if it's implemented through the same PyTorch stack as Stable Diffusion.

Yes, it uses PyTorch (hence the '.pt' extension on the model file). I think you might not understand these words?

PyTorch is a neural network framework. Diffusion is a type of generative neural network.

What you describe is a deepfake workflow.

Nope,

Deepfakes rely on a type of neural network called an autoencoder.[5][61] These consist of an encoder, which reduces an image to a lower dimensional latent space, and a decoder, which reconstructs the image from the latent representation.[62] Deepfakes utilize this architecture by having a universal encoder which encodes a person in to the latent space.[63] The latent representation contains key features about their facial features and body posture. This can then be decoded with a model trained specifically for the target.[5] This means the target's detailed information will be superimposed on the underlying facial and body features of the original video, represented in the latent space.[5]

A popular upgrade to this architecture attaches a generative adversarial network to the decoder.[63] A GAN trains a generator, in this case the decoder, and a discriminator in an adversarial relationship.[63] The generator creates new images from the latent representation of the source material, while the discriminator attempts to determine whether or not the image is generated.[63] This causes the generator to create images that mimic reality extremely well as any defects would be caught by the discriminator.[64] Both algorithms improve constantly in a zero sum game.[63] This makes deepfakes difficult to combat as they are constantly evolving; any time a defect is determined, it can be corrected.[64]

https://en.wikipedia.org/wiki/Deepfake

Optical flow is an older technology, used for match moving (having special effects sit in the proper 3D location of a video).

https://en.wikipedia.org/wiki/Optical_flow

[–]ko0x 3 points4 points  (2 children)

Nice. I tried something like this for a music video for a song of mine roughly 2 years ago, but stopped because Colab is such a horrible, unfun workflow. Looks like I can give it another go soon.

[–]Sixhaunt[S] 3 points4 points  (1 child)

They have a Spaces page on Hugging Face if you don't want to run Thin-Plate through Google Colab. I just set one up that does it all start to finish, including upsizing the result, running the facial fixing, and packaging the frames so you can hand-pick them for training data.

The main purpose is to generate sets of images like these for training: https://www.reddit.com/r/StableDiffusion/comments/ys5xhb/training_a_model_of_a_fictional_person_any_name/

[–]ko0x 0 points1 point  (0 children)

OK thanks, I'll look into that. I hope we're getting close to running this locally, as easy to use as SD.

[–]allumfunkelnd 4 points5 points  (1 child)

This is how our quantum computer AIs will communicate with us in real time in the metaverse of the future. :-D Awesome! Thanks for sharing this and your workflow! The face of this Robo-Girl is stunning.

[–]ninjasaid13 0 points1 point  (0 children)

I think we'd be more likely to use analog computers for AI in the future, because they are much faster, though at the cost of being less accurate; that doesn't matter much in AI.

[–]pbinder 3 points4 points  (6 children)

I run SD on my desktop; is it possible to do all this locally and not through google colab?

[–]Sixhaunt[S] 5 points6 points  (5 children)

Yeah, I don't see why not.

  1. Get Thin-Plate-Spline-Motion-Model set up locally and run the motion translation (Hugging Face even lets you do this part through their web UI).
  2. Use ffmpeg to cut the video into frames.
  3. Upsize and fix the faces of the frames. You can do that directly with Stable Diffusion and the Automatic1111 web UI using the batch img2img section.
  4. Use ffmpeg to combine the fixed and upsized images into a video.
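The two ffmpeg steps in that list can be sketched as small helpers that build the commands. The 20fps default and the frame-name pattern are assumptions for illustration; match them to your own clip:

```python
def split_cmd(video: str, frames_dir: str, fps: int = 20) -> list[str]:
    """Step 2: dump the animated result into numbered PNG frames."""
    return ["ffmpeg", "-y", "-i", video, "-vf", f"fps={fps}",
            f"{frames_dir}/frame_%04d.png"]

def join_cmd(frames_dir: str, out_video: str, fps: int = 20) -> list[str]:
    """Step 4: reassemble the upscaled, face-fixed frames into a video."""
    return ["ffmpeg", "-y", "-framerate", str(fps),
            "-i", f"{frames_dir}/frame_%04d.png",
            "-c:v", "libx264", "-pix_fmt", "yuv420p", out_video]

# Run each with subprocess.run(cmd, check=True), with the frames directory
# created beforehand; step 3 happens between the two, outside ffmpeg.
```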

[–]Vivarevo -1 points0 points  (1 child)

I wonder if it's possible to run low-quality video for a live feed.

[–]Sixhaunt[S] 1 point2 points  (0 children)

I think the processing takes longer than the video's runtime, so it probably wouldn't work for that, unfortunately, although upscaling to some extent on the client side isn't unheard of already.

[–]jonesaid 0 points1 point  (2 children)

Is there a tutorial out there to set up the TPSMM locally?

[–]Sixhaunt[S] 1 point2 points  (0 children)

I think their github shows all the various ways you can use it and gives a quick tutorial

[–]NerdyRodent 1 point2 points  (0 children)

Sure is! How to Animate faces from Stable Diffusion! https://youtu.be/Z7TLukqckR0

[–]Maycrofy 1 point2 points  (0 children)

I mean, it looks like how animation would move in real life. It's very captivating.

[–]kim_en 1 point2 points  (0 children)

TF, I thought this kind of animation would only come after next year. Absolutely mind-blowing.

[–]Dart_CZ 1 point2 points  (0 children)

What is she saying? I can't make out the first part, but the last part looks like "me, please". What are your guesses, guys?

[–]Unlimitles 1 point2 points  (0 children)

One day..... someone is going to use these things to lure men to their dooms.

It's going to work....

[–]ptitrainvaloin 1 point2 points  (0 children)

Great results 😁

Here's a tip I discovered that will surely help you with the purpose you stated: if you make a custom photo template for training with Textual Inversion, the more photorealistic the results of your new template are, the fewer steps and the fewer images required (less than what is regularly suggested in the field at present) to create your own model(s) and style(s), in even higher quality.

short example of a new photorealism_template.txt (in directory stable-diffusion-webui/textual_inversion_templates) you can create :

(photo highly detailed vivid) ([name]) [filewords]

(shot medium close-up high detail vivid) ([name]) filewords

(photogenic processing hyper detailed) ([name])

Etc... add some more lines to it.

The more variations you add the better, as long as you test your prompts before adding them to your template, to be sure they produce consistently good photorealism results.

Good luck, and continue to have fun experimenting!

***Edit: input image(s) must be of high quality; otherwise, garbage in -> garbage out.

[–]InMyFavor 1 point2 points  (7 children)

This is genuinely fucking nuts

[–]Sixhaunt[S] 1 point2 points  (6 children)

[–]InMyFavor 1 point2 points  (1 child)

Yooooooo

[–]Sixhaunt[S] 2 points3 points  (0 children)

I almost have a completed model for her too which I'll release soon. Then anyone can use her for their projects since this woman doesn't actually exist and isn't a copyright issue like celebrity faces. I think people making visual novels will especially like it

[–]InMyFavor 0 points1 point  (0 children)

This is firmly on the other side of the uncanny valley.

[–]InMyFavor 0 points1 point  (2 children)

This is so crazy and borderline revolutionary and virtually no one mainstream is paying attention.

[–]Sixhaunt[S] 1 point2 points  (1 child)

It's crazy to think that this was my first try and it took less than a day to implement. I can only imagine what we'll be able to do even a few months from now.

[–]InMyFavor 0 points1 point  (0 children)

I can barely keep up as it is now. In 6 months, I have no clue.

[–]Throwaway-sum 1 point2 points  (0 children)

This is nuts!! This only came out weeks ago? It feels like we are experiencing history in the making.

[–]unrealf8 2 points3 points  (1 child)

Ahh, that’s the major question I had about sd. Can I generate a character that I can consistently continue to generate art with. Love it!

[–]Sixhaunt[S] 1 point2 points  (0 children)

check out some of the frames I pulled from this method which I'll be training with: https://www.reddit.com/r/StableDiffusion/comments/ys5xhb/training_a_model_of_a_fictional_person_any_name/

[–]Magikarpeles 3 points4 points  (1 child)

Hear me out

[–]Sixhaunt[S] 2 points3 points  (0 children)

I'm listening.

[–]HulkHunter 2 points3 points  (0 children)

Synthetic Reality becoming real.

[–]martsuia 2 points3 points  (0 children)

Looking at this feels like I’m dreaming.

[–]1Neokortex1[🍰] 1 point2 points  (0 children)

🚀🔥

[–]moahmo88 1 point2 points  (0 children)

Good job!

[–]MonoFauz 1 point2 points  (0 children)

The progress with this tech is so fast. Great job!

[–]TraditionLazy7213 0 points1 point  (0 children)

Thanks for sharing, amazing stuff

[–]JCNightcore 0 points1 point  (0 children)

This is amazing

[–]nano_peen 0 points1 point  (0 children)

Incredible consistency

[–]LeBaux 0 points1 point  (0 children)

We are all thinking it.

[–]TrevorxTravesty -2 points-1 points  (1 child)

This is going to be incredible when we're able to do this with dead actors and see them shine again 😯 I'd love to see some of my favorite people, such as Robin Williams or Bruce Lee, do stuff again 😞 I would love to make loving tributes to them.

[–]ObiWanCanShowMe 8 points9 points  (0 children)

That is not what OP is doing here. OP is generating different images (frames) of a fictional person by animating a still image of a face, so they can then make an SD model for this fictional person, thus being able to consistently generate that fictional person without variations.

Think

picture of thepersonicreated with red hair in a warrior outfit

instead of

picture of a beautiful girl with red hair in a warrior outfit

The first one gets this same face; the second is random. It's a DreamBooth model of an SD-created person.

That said, what you suggested is already possible with deepfake which is only going to get better.

[–][deleted]  (1 child)

[removed]

    [–]StableDiffusion-ModTeam[M] 1 point2 points locked comment (0 children)

    Your post/comment was removed because it contains hateful content.

    [–]jonesaid 0 points1 point  (0 children)

    I was wondering if something similar could be done using Euler a step variation to get different images of the same fictional person. I'm not sure if the face stays the same at different steps though...

    [–]omnidistancer 0 points1 point  (1 child)

    I'm implementing something along the same lines but with different models for the motion transfer and upscaling(could possibly go above 2k if everything works out ok). Very interesting to see your amazing results :)

    Do you mind share the driving video or at least some suggestion on how to get something similar? The expressions look amazing!

    [–]Sixhaunt[S] 1 point2 points  (0 children)

    It's just a short clip of a tiktoker making some facial expressions. I mentioned in the original comment the guy who gave me the clip. I ended up having to find it again myself for a higher-quality version.

    I uploaded the short clip I used from the video here though: https://filebin.net/r0ynwdeg2emc61e0

    [–][deleted] 0 points1 point  (0 children)

    Wow, this was well done.

    [–]Zyj 0 points1 point  (1 child)

    That slight smile...

    [–]Sixhaunt[S] 0 points1 point  (0 children)

    https://imgur.com/a/jfkksoh

    there's some stills if you're interested.

    [–]The_Irish_Rover26 0 points1 point  (0 children)

    Very cool.

    [–]Silly-Slacker-Person 0 points1 point  (1 child)

    I wonder if soon it will be possible to animate two characters talking at the same time

    [–]Sixhaunt[S] 1 point2 points  (0 children)

    I dont see why you cant make a face detector that then crops videos around the heads, runs that video through a similar process to what i did, then splices it back into the original video to have as many people talking as you want

    [–]vs3a 0 points1 point  (0 children)

    This reminds me of the Faestock-from-DeviantArt days.

    [–][deleted] 0 points1 point  (0 children)

    Game changer!

    [–]AlbertoUEDev 0 points1 point  (0 children)

    Ohh I was looking something like this 🤩

    [–]BinyaminDelta 0 points1 point  (0 children)

    This is the future.

    [–]LordTuranian 0 points1 point  (0 children)

    Hopefully these are the kind of graphics we will see in the next Skyrim and Fallout game.

    [–]yehiaserag 0 points1 point  (0 children)

    Respect, man, I wish you all the best. Even more respect because you are sharing with the community.

    [–]InfiniteComboReviews 0 points1 point  (0 children)

    This is awesome, but there is something very... off-putting about it. Like this is how I'd expect Skynet to try to infiltrate a human base or something.

    [–]Promptmuse 0 points1 point  (0 children)

    Wow, thanks for sharing your process.

    Every day I’m seeing something new and groundbreaking.

    [–]purplewhiteblack 0 points1 point  (0 children)

    5 years from now is going to be crazy

    [–]wrnj 0 points1 point  (1 child)

    One question: how usable is a DreamBooth model created only with training images that are all the same kind of closeup portrait, with the same background and clothing? I noticed that if I train a model only with face selfies, the output generations I get are 1:1 the kind of frames that were in the training data, with no variety whatsoever.
    Do you add some kind of full-body images of the fictional person to the DreamBooth training set? Thanks.

    [–]Sixhaunt[S] 1 point2 points  (0 children)

    The plan today is to use the 27 images to train a good model for the face, then I'll be using that to generate more photos of her. If I have difficulty getting certain shots, I can do them with the normal 1.5 model and then infill the upper body with the model of her, to get a new training image with the right composition.

    [–]widgia 0 points1 point  (0 children)

    Impressive!

    [–]GoldenHolden01 0 points1 point  (0 children)

    Holy shittttty

    [–][deleted] 0 points1 point  (1 child)

    When you say you'll "train an algorithm", what's that process actually entail?

    [–]Sixhaunt[S] 0 points1 point  (0 children)

    When you say you'll "train an algorithm" , what's that process actually entail?

    I don't think I said that anywhere, from what I can tell. I trained a model using the StableDiffusion/DreamBooth algorithm. It retrains the weights of the denoising model, and it's done by feeding it data of a specific person from various angles and with various facial expressions so it can replicate that person. What I did was find a way to use a single image to generate all the input images required to train the model.

    https://www.reddit.com/r/AIActors/comments/yssc2r/genevieve_model_progress/

    This means you can generate a consistent person in Stable Diffusion without using celebrity names, instead using a person you generated from scratch.
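The "generate all the input images from one image" idea boils down to animating the face and then sampling frames across the clip so the training set covers a spread of poses and expressions. A minimal sketch of the sampling step (the function name is illustrative; the OP's actual selection method isn't specified):

```python
def sample_frame_indices(total_frames, n_samples):
    """Pick n_samples frame indices spread evenly across a clip,
    so the training images cover the full range of poses/expressions."""
    if n_samples >= total_frames:
        return list(range(total_frames))
    step = total_frames / n_samples
    return [int(i * step) for i in range(n_samples)]
```

For example, pulling 27 evenly spaced frames from a 20 fps, 16-second clip (320 frames) gives roughly one training image every 12 frames.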

    [–]LynnSpyre 0 points1 point  (0 children)

    REALLY nice! I've done similar stuff. What tools did you use to get this? This is super smooth

    [–]gtoal 0 points1 point  (1 child)

    You know the theory that everyone has a double... basically, there are not enough faces to go around for everyone to get a unique one ;-) ... I suspect that a person can be found to match any realistic generated face, so using these to avoid litigation might not be as effective as you hope!

    [–]Sixhaunt[S] 1 point2 points  (0 children)

    They wouldn't be able to get anywhere with litigation, though. No input was ever of them, so the similarities wouldn't matter. It's already tough enough for established actors to take legal action over their likeness when it isn't explicitly them; Elliot Page tried to go after The Last of Us, for example. People have made animated films or high-quality 3D renders of people that don't exist all the time, and it's never been an issue, even when some random person finds that it looks an uncanny amount like them.

    [–]Mystvearn2 0 points1 point  (2 children)

    Wow. This is great.

    Is there a YouTube video on the step-by-step process? Also, is it possible to run this thing locally? I have a 3060, which I think could be of use. The processing time doesn't really matter to me.

    [–]Sixhaunt[S] 0 points1 point  (1 child)

    Someone reached out and wants to do a video about it, so I don't know if it's going to be a tutorial or a showcase or what, but I just have the Google Colab that I put together quickly. This was my first try at this, so it's still early on; it was done fairly lazily and it's not efficient, but you can find the link to the colab in the comments for reference. I just mashed together the demos for the different things I wanted to use, but I'm redoing the entire thing right now and I'll have a better colab out in the future. You should be able to follow the local installation steps for each part to run it locally, though.

    [–]Mystvearn2 0 points1 point  (0 children)

    Thanks. I have no coding background. I managed to install Stable Diffusion locally and to install the model based on the YouTube tutorial. If you asked me to do it again without consulting the video, I'd be lost 😂

    [–]LordTuranian 0 points1 point  (8 children)

    How did you do this? This is amazing. I want to make something like this too.

    [–]Sixhaunt[S] 1 point2 points  (7 children)

    I explained it in a comment and linked to my google colab for it, but basically:

    • use a driving video plus a character image to generate a new video of the character with the Thin-Plate Spline Motion Model
    • upscale it 2× to 512×512, then fix the faces on each frame (I used GFPGAN)
    • recombine the frames into a video
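A small gotcha in the last step: the frames have to be recombined in numeric order, and a plain string sort puts frame_10 before frame_2. A minimal sketch of the ordering (helper name is illustrative, assuming each filename contains a frame number):

```python
import re

def natural_frame_order(filenames):
    """Sort frame filenames by the first number they contain, so that
    'frame_10.png' comes after 'frame_9.png' (a plain string sort would not)."""
    def frame_number(name):
        m = re.search(r'\d+', name)
        return int(m.group()) if m else -1
    return sorted(filenames, key=frame_number)
```

Zero-padded names (frame_0001.png) avoid the issue entirely, which is why most frame-extraction tools pad by default.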

    [–]LordTuranian 0 points1 point  (6 children)

    Oh, I accidentally skipped over that comment because I can't understand a lot of the terminology; I'm new to this kind of stuff. But thanks anyway. :)

    [–]Sixhaunt[S] 1 point2 points  (4 children)

    With the Google Colab I made, you can just run the first section, which sets up the files and such, then swap out the default video and image for your own (you can see where they are located and what they are named in the "settings" section). Then you just click the run/play buttons for each section, in order, until the end. It will take some time to process, but then it will produce an mp4 file for you to download.

    [–]LordTuranian 0 points1 point  (3 children)

    How do I use the google colab on my PC? Do I just use it straight from the browser or do I have to use another program?

    [–]Sixhaunt[S] 1 point2 points  (2 children)

    The nice thing about Google Colab is that it runs on Google's servers rather than your computer. It basically spins up a virtual machine to run the code; you control it through your browser and can download files from it afterwards. When you are on the page, you can basically just click the play button next to a chunk of code and it will run that code. Do it in order, follow any instructions, and you'll get your results.

    [–]LordTuranian 1 point2 points  (1 child)

    Awesome. Thanks again.

    [–]Sixhaunt[S] 1 point2 points  (0 children)

    no problem! I'm working on a new version of the colab along with someone else. I'm excited to show it off once it's working

    [–]Sixhaunt[S] 1 point2 points  (0 children)

    A youtube channel called PromptMuse reached out to me the other day and is planning to cover this in a video soon, so it might be more digestible in that format.

    I hadn't heard of the channel before she reached out, but it's actually really cool and covers a range of topics in the AI space, especially with SD.

    [–]midihex 0 points1 point  (0 children)

    A great use of TPSMM! I'm familiar with it, so here are some thoughts for you. The default output video quality of TPS is a bit meh (it's VBR, quality=5), so this is what I settled on...

    import imageio
    from skimage import img_as_ubyte

    imageio.mimsave(
        output_video_path,
        [img_as_ubyte(frame) for frame in predictions],
        codec='libx264rgb',
        pixelformat='rgb24',
        output_params=['-crf', '0', '-s', '256x256', '-preset', 'veryslow'],
        fps=fps,
    )

    Which is lossless x264 (CRF 0).

    Also, I'm not sure a pre-upscale before GFPGAN is needed for this usage: GFPGAN upscales anywhere up to 8× and then applies the face restore, and it can also use Real-ESRGAN for the parts of the frame that GFPGAN doesn't touch.

    Saw someone mention CodeFormer - it's great for stills but falls apart with video; it can't keep coherency like GFPGAN can.

    Illustrious_Row_9971 on Reddit wrote a Gradio colab version of TPS that you can drag and drop onto. I haven't got the link atm, but it'll show up with a search, I think.

    For the final output I always render to lossless (HuffYUV or FFV1); it retains so much more detail than mp4.

    [–]Automatic-Respect-23 0 points1 point  (0 children)

    Great job!

    Can you please share the driving video?

    edit:

    Sometimes my photo doesn't fit the driving video, and the results are too poor to use for training. Do you have any suggestions?

    Thanks a lot!