
[–]adventuringraw 22 points (0 children)

This makes all of the Unity developers very sad.

[–]m_nemo_syne 17 points (0 children)

This and other forms of "prompt engineering" are both really funny and really interesting new phenomena in the history of machine learning.

[–]ShhhWeKnow 5 points (8 children)

Is there a repo that has VQGAN and CLIP together? I'm not familiar with either of these, but curious about this. Sorry if that is a noob question...

[–]Wyrdcurt 4 points (6 children)

Not a repo, but a Colab notebook that draws from the CLIP and Taming Transformers (VQGAN) repos. It's a valid question, I'm surprised nobody else linked it!

[–]BonkoTheHun 2 points (2 children)

Is there a rundown anywhere of which fields need editing, and how, in order to generate an image? Sorry, I'm new to this; I think I've at least figured out where the text prompt goes, but that's about it.

[–]echoauditor 1 point (0 children)

Also wondering this.

[–]Ziddletwix 1 point (0 children)

I'm a month late, but happened to see this Google Doc that might have a few answers for you if you're still looking.

[–]CandleGlittering2546 0 points (2 children)

Hey, I have a question: how do you put a picture in the prompt?

init_image=

And also, what do you do with init_weight?

[–]Wyrdcurt 0 points (1 child)

It's not my notebook and I haven't used an image prompt with it myself, but at a glance it looks like all you need to do is set init_image to your image path. I would upload the image to the Colab file system first. I'm not sure about init_weight. I can see it modifies the MSE loss, but what that means in practice is beyond me. It looks like you can probably leave it alone, though; it's not required.
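
For what it's worth, here's a rough sketch of how those two fields usually fit together in notebooks like this one. The names, defaults, and exact weighting here are my assumptions, not the notebook's actual code:

```python
# Illustrative sketch of init_image / init_weight in a VQGAN+CLIP notebook.
# Everything here is an assumption about how the notebook is wired up.
import torch.nn.functional as F

init_image = "/content/my_photo.png"  # path after uploading the file to the Colab file system
init_weight = 0.0                     # 0 = the init image only seeds the starting latent

def total_loss(z, z_orig, clip_loss):
    """z: VQGAN latent being optimised; z_orig: latent encoded from the init image;
    clip_loss: how far the decoded image is from the text prompt, as judged by CLIP."""
    loss = clip_loss
    if init_weight:
        # MSE term pulling the evolving latent back toward the init image,
        # so a larger init_weight keeps the output closer to the starting picture
        loss = loss + F.mse_loss(z, z_orig) * init_weight / 2
    return loss
```

If that reading is right, leaving init_weight at 0 just uses your picture as a starting point, while something like 0.5 to 1.0 should keep the result visibly anchored to it.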

[–]assadollahi 0 points (0 children)

Here's a version of it that runs on your local computer: https://github.com/nerdyrodent/VQGAN-CLIP

[–][deleted]  (14 children)

[deleted]

    [–]devi83 1 point (11 children)

    Is there some way we can test to verify the claims either way?

    For example, if I say "Pac-Man. Unreal engine", should it return a "high-res" Pac-Man?

    And if so, what does that mean? Are there Unreal Engine screenshots of Pac-Man in high res in the training data? And if not, how do you explain the result?

    [–][deleted]  (9 children)

    [deleted]

      [–]devi83 1 point (8 children)

      So the concept of upscaling an image? Those neurons don't exist?

      [–][deleted]  (7 children)

      [deleted]

        [–]devi83 1 point (6 children)

        Try this: "Zion national park, upscale". Also try it with just "Zion national park".

        I'm curious to see how different they will be.

        [–]thunder_jaxx ML Engineer [S] 1 point (0 children)

        That's why I said I may be wrong, but your premise may be incomplete. Let me explain.

        One of the reasons I think such an upsampling-like computation is possible is that the text is changing the output images. So if the training distribution is devised in a way that a lot of "upsampled" images in the training data have associated text like "upsampled" or "unreal engine", then it can be argued that the network learns upsampling as a byproduct of the language conditioning. Again, I don't know how the distribution was devised, but it seems an interesting avenue to explore.
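
        To spell out the mechanism this rests on: in the VQGAN+CLIP setup the prompt steers the image at generation time through CLIP's similarity score, roughly like this (heavily simplified sketch; real notebooks add cutouts, augmentations, etc., and the names here are illustrative):

        ```python
        # Simplified view of where the prompt text enters the VQGAN+CLIP loop.
        import torch
        import clip  # OpenAI CLIP: https://github.com/openai/CLIP

        device = "cuda" if torch.cuda.is_available() else "cpu"
        perceptor, _ = clip.load("ViT-B/32", device=device)

        prompt = "Zion national park. unreal engine"
        text_emb = perceptor.encode_text(clip.tokenize(prompt).to(device))

        # z is the VQGAN latent being optimised (pseudocode for the VQGAN-side pieces):
        # for step in range(max_steps):
        #     image = vqgan.decode(z)                            # current candidate image
        #     img_emb = perceptor.encode_image(cutouts(image))   # CLIP's view of (crops of) that image
        #     loss = -torch.cosine_similarity(img_emb, text_emb).mean()  # pull the image toward the prompt
        #     loss.backward(); optimizer.step(); optimizer.zero_grad()
        ```

        So whatever associations CLIP absorbed from its web-scraped image-text pairs, e.g. "unreal engine" tending to appear next to crisp rendered screenshots, would show up here as a gradient nudging the output in that direction.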

        [–]orenog 1 point (1 child)

        !RemindMe 13 hours

        [–]RemindMeBot 0 points (0 children)

        I will be messaging you in 13 hours on 2021-06-04 11:19:48 UTC to remind you of this link

        [–]Veneck 0 points (0 children)

        This is actually an amazing showcase of its abilities. Blown away right now.

        [–]Designer-Writing-681 0 points (1 child)

        What can I do about the error "requirement pillow>=7.1.0, but you'll have pillow 6.2.2 which is incompatible"?

        [–]graphicteadatasci 0 points (0 children)

        Restart the runtime
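
        I'm not sure which dependency drags in the old pillow there, but if the message comes back after a restart, the usual generic Colab fix is to upgrade pillow yourself and then restart again so the new version is the one that actually gets imported (nothing specific to this notebook):

        ```python
        # Run in a Colab cell, then Runtime -> Restart runtime.
        !pip install --upgrade "pillow>=7.1.0"

        # After the restart, this should report 7.1.0 or newer:
        import PIL
        print(PIL.__version__)
        ```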

        [–]metaphorz99 0 points (0 children)

        Adding a rendering engine to the prompt gives fantastic results. But how does it work, hypothetically? I don't think ImageNet includes rendered images, or does it? What could be in the training data that makes adding 'unreal engine' have any effect?