Perhaps I would be more concerned about the Robot Apocalypse if the performance of GPT-4 and Llama 2 on tasks important to my current research was not so disappointing.
Wait what? This is precisely why I’m worried.
LLMs can use language pretty well, but can they analyze language itself? Here's a research program that tests the metalinguistic abilities of LLMs. Many studies test the behavioral performance of LLMs; GPT-4 is the first model that can generate coherent metalinguistic analyses.
I haven't found an LLM that can solve this utterly trivial problem.
This rule doesn’t seem very natural :) it took me a while! but yeah, it’s not performing as well in phonology as in syntax. But with some prompting we got some cool results.
The unnaturalness is the point, since the intended task is a logic problem rather than a content question.

Nov 27, 2023 · 12:32 AM UTC

Replying to @retvitr @dmort27
Linguists 0 : GPT4 1
Since the goal here was to use LLMs for linguistics, and this issue was blocking, I think the correct score is Linguists 1 : GPT4 1
The problem is in BPE tokenization, which is really stupid and prevents LLMs from working correctly on subword units. @gwern had a nice blog post about it back in the GPT-3 days, and nothing has changed since.
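To see why tokenization gets in the way here, a minimal toy sketch of BPE-style pair merging (this is an illustration, not a real GPT tokenizer; the `merges` list is a made-up example vocabulary):

```python
# Toy illustration of why subword tokenization hides character
# structure from a language model: frequent character pairs get
# fused into single tokens, so the model never "sees" the letters.

# Hypothetical merge rules, applied in order (real BPE learns
# thousands of these from corpus frequencies).
merges = [("t", "h"), ("th", "e"), ("i", "n"), ("in", "g")]

def toy_bpe(word):
    """Greedy BPE-style merging: start from characters, fuse pairs."""
    tokens = list(word)
    for a, b in merges:
        i = 0
        while i < len(tokens) - 1:
            if tokens[i] == a and tokens[i + 1] == b:
                tokens[i:i + 2] = [a + b]  # fuse the pair in place
            else:
                i += 1
    return tokens

print(toy_bpe("thing"))  # ['th', 'ing'] -- the model's view
print(list("thing"))     # ['t', 'h', 'i', 'n', 'g'] -- what phonology needs
```

So a task stated over individual sounds or letters is asking the model to reason about units it was never given; this is why character-level manipulations (palindromes, sound changes) are disproportionately hard.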
Ah, that's very nice. I had considered tokenization, but had no problem, e.g., getting GPT-4 to produce arbitrary palindromes. This addresses a blocker that has been keeping me up at night, so thank you!
However, I'm still having trouble with this case (which is an actual sound change that has occurred in various languages around the world): chat.openai.com/share/5eba9c…. Any thoughts on how to restructure the prompt to help GPT-4?