all 15 comments

[–]thexdroid 6 points  (1 child)

Amazing! Like GPT-3, is there any API access already? I tried to find one but had no success.

[–]xjustwaitx 0 points  (0 children)

No, there is no API for now.

[–]FlyingNarwhal 6 points  (0 children)

The difference is striking!

[–]fakesoicansayshit 5 points  (0 children)

Jesus.

It knows colloquially now.

That's crazy.

[–]Simcurious 3 points  (0 children)

Wow this is mind blowing, thanks for posting!

[–]NTaya 3 points  (0 children)

The first one blew my mind for a very minor reason: the model casually calculated the difference in time between 5:00 PM and 9:30 PM in hours without being prompted to do so. LLMs can do that, more or less, but they usually require a lot of prompt engineering. Here the model demonstrates that capability in an entirely different context.

[–]adt 2 points  (5 children)

Awesome work /u/BeginningInfluence55

Did you use the same long two-shot prompt given in the PaLM paper (pp38: 'Figure 19: Each “Input” was independently prepended with the same 2-shot exemplar shown at the top')?

I will explain these jokes: (1) The problem with kleptomaniacs is that they always take things literally. Explanation: This joke is wordplay. Someone who "takes things literally" is someone who doesn't fully understand social cues and context, which is a negative trait. But the definition of kleptomania is someone who literally takes things.

(2) Always borrow money from a pessimist. They’ll never expect it back. Explanation: Most people expect you to pay them back when you borrow money, however a pessimist is someone who always assumes the worst, so if you borrow money from them, they will expect that you won't pay them back anyways.

[–]BeginningInfluence55[S] 5 points  (4 children)

Thank you, Alan.

> Did you use the same long two-shot prompt given in the PaLM paper (pp38: 'Figure 19: Each “Input” was independently prepended with the same 2-shot exemplar shown at the top')?

That's exactly how I did it. Engine was text-davinci-002, temperature 0.4. No regenerations. However, I left out the two-shot prompt here in the post to keep everything a bit clearer.

Here you can see the full prompt and completion of the language model joke example:

https://beta.openai.com/playground/p/DvPdJ0MxmVCHFc3NZ1pWQVml?model=text-davinci-002
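
For anyone who wants to replicate this outside the Playground, here is a rough sketch of the equivalent API call. It uses the OpenAI Python library as it existed at the time of this thread (the legacy pre-1.0 Completions endpoint); the joke placeholder and the exact continuation format ("(3) ... Explanation:") are my own assumptions rather than something specified in the PaLM paper, so treat the Playground link above as the reference for the exact prompt and completion.

```python
# Sketch of reproducing the joke-explanation run with the legacy OpenAI
# Python library (pre-1.0 Completions endpoint) -- not the original code.
import openai

openai.api_key = "YOUR_API_KEY"  # supply your own key

# The 2-shot exemplar from the PaLM paper (Figure 19), as quoted above.
two_shot_exemplar = """I will explain these jokes:
(1) The problem with kleptomaniacs is that they always take things literally.
Explanation: This joke is wordplay. Someone who "takes things literally" is someone who doesn't fully understand social cues and context, which is a negative trait. But the definition of kleptomania is someone who literally takes things.
(2) Always borrow money from a pessimist. They'll never expect it back.
Explanation: Most people expect you to pay them back when you borrow money, however a pessimist is someone who always assumes the worst, so if you borrow money from them, they will expect that you won't pay them back anyways.
"""

joke = "<joke to explain goes here>"  # placeholder, not from the original post

response = openai.Completion.create(
    engine="text-davinci-002",  # same engine as in the post
    # Continuation format below is an assumption about how to append the joke.
    prompt=two_shot_exemplar + "(3) " + joke + "\nExplanation:",
    temperature=0.4,            # same temperature as in the post
    max_tokens=256,
)
print(response["choices"][0]["text"].strip())
```

This is only useful if you want to script many jokes in one go; for a single example the Playground link above shows everything.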

[–]adt 4 points  (0 children)

Brilliant! A nice rigorous test!

[–]vzakharov 0 points  (2 children)

Why 0.4 and not 0? That makes the output non-deterministic, so we can’t really know whether regenerations would give better outputs.

[–]BeginningInfluence55[S] 0 points  (1 child)

Temperature 0 doesn't necessarily mean better outputs; it just means repeatable outputs. I like to give the model more freedom, and most of the time that improves the answers.

[–]vzakharov 0 points  (0 children)

But it does mean the output that’s most expected by the model, i.e. the one it itself “considers” best. Raising the temperature is basically improving the quality artificially, using “human” criteria rather than the model’s own. IMHO, of course.
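
To make the point concrete, here is a toy sketch (made-up logits, not from any real model) of how temperature reshapes the next-token distribution: near 0, essentially all probability sits on the single most likely token, which is why the output is repeatable, while at 0.4 some mass spreads to other tokens, so regenerations can differ.

```python
# Toy illustration with hypothetical logits: how sampling temperature
# reshapes a next-token probability distribution.
import numpy as np

def temperature_softmax(logits, temperature):
    """Turn raw logits into sampling probabilities at a given temperature."""
    scaled = np.array(logits, dtype=float) / max(temperature, 1e-8)
    exps = np.exp(scaled - scaled.max())  # subtract max for numerical stability
    return exps / exps.sum()

logits = [2.0, 1.5, 0.5]  # made-up scores for three candidate tokens

# Near-zero temperature: almost all mass on the top token -> repeatable output.
print(temperature_softmax(logits, 0.01))  # ~[1.00, 0.00, 0.00]

# Temperature 0.4, as used in the post: mass spreads to other tokens,
# so different regenerations can pick different continuations.
print(temperature_softmax(logits, 0.4))   # ~[0.76, 0.22, 0.02]
```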

[–]dmmd 1 point  (0 children)

This is amazing. So, there is no way to test PaLM at this time?

[–]ConfidentFlorida 0 points  (1 child)

Amazing. How would this do on the Turing test?

[–]waste_and_pine 2 points  (0 children)

I think it might fail the Turing test because it answers challenging questions more competently than the average human.