Perhaps I would be more concerned about the Robot Apocalypse if the performance of GPT-4 and Llama 2 on tasks important to my current research was not so disappointing.
Wait what? This is precisely why I’m worried.
LLMs can use language pretty well, but can they analyze language itself? Here's a research program that tests the metalinguistic abilities of LLMs. Many studies test the behavioral performance of LLMs; GPT-4 is the first model that can generate coherent metalinguistic analyses.
I haven't found an LLM that can solve this utterly trivial problem.
This rule doesn’t seem very natural :) it took me a while! but yeah, it’s not performing as well in phonology as in syntax. But with some prompting we got some cool results.
The unnaturalness is the point, since the intended task is a logic problem rather than a content question.

Nov 27, 2023 · 12:32 AM UTC

Replying to @retvitr @dmort27
Linguists 0 : GPT4 1
Since the goal here was to use LLMs for linguistics, and this issue was blocking, I think the correct score is Linguists 1 : GPT4 1
The problem is in BPE tokenization, which is really stupid and prevents LLMs from working correctly on subword units. @gwern had a nice blog post about it back in the GPT-3 days, and nothing has changed since.
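To see why tokenization gets in the way here, a minimal toy sketch of BPE-style pair merging (this is an illustration, not a real GPT tokenizer; the `merges` list is a made-up example vocabulary):

```python
# Toy illustration of why subword tokenization hides character
# structure from a language model: frequent character pairs get
# fused into single tokens, so the model never "sees" the letters.

# Hypothetical merge rules, applied in order (real BPE learns
# thousands of these from corpus frequencies).
merges = [("t", "h"), ("th", "e"), ("i", "n"), ("in", "g")]

def toy_bpe(word):
    """Greedy BPE-style merging: start from characters, fuse pairs."""
    tokens = list(word)
    for a, b in merges:
        i = 0
        while i < len(tokens) - 1:
            if tokens[i] == a and tokens[i + 1] == b:
                tokens[i:i + 2] = [a + b]  # fuse the pair in place
            else:
                i += 1
    return tokens

print(toy_bpe("thing"))  # ['th', 'ing'] -- the model's view
print(list("thing"))     # ['t', 'h', 'i', 'n', 'g'] -- what phonology needs
```

So a task stated over individual sounds or letters is asking the model to reason about units it was never given; this is why character-level manipulations (palindromes, sound changes) are disproportionately hard.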
Ah, that's very nice. I had considered tokenization, but had no problem, e.g., getting GPT-4 to produce arbitrary palindromes. This addresses a blocker that has been keeping me up at night, so thank you!
However, I'm still having trouble with this case (which is an actual sound change that has occurred in various languages around the world): chat.openai.com/share/5eba9c…. Any thoughts on how to restructure the prompt to help GPT-4?