Using GPT-3 to make regular expressions legible:

Aug 21, 2022 · 10:19 AM UTC

A second example using a UUID regex:
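The regex from that screenshot isn't reproduced here, but for a sense of what's involved, a standard hyphenated UUID pattern looks like this (illustrative, not the one GPT-3 explained):

```python
import re

# A common pattern for RFC 4122-style UUIDs: hex groups of 8-4-4-4-12,
# separated by hyphens. Group names below are purely descriptive.
UUID_RE = re.compile(
    r"^[0-9a-f]{8}-"   # time_low
    r"[0-9a-f]{4}-"    # time_mid
    r"[0-9a-f]{4}-"    # time_hi_and_version
    r"[0-9a-f]{4}-"    # clock_seq
    r"[0-9a-f]{12}$",  # node
    re.IGNORECASE,
)

print(bool(UUID_RE.match("123e4567-e89b-12d3-a456-426614174000")))  # True
print(bool(UUID_RE.match("not-a-uuid")))                            # False
```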
Also possible to make regexes from prose descriptions, but be careful! You can never fully trust GPT-3. Generate at low/zero temperature. Make sure output is verbose and easy for you to verify. GPT-3 often emits code that appears right but isn’t quite. Correct it via dialog.
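One concrete way to "make sure output is easy to verify" is to run the generated regex against explicit positive and negative test cases before trusting it. A minimal sketch, with a hypothetical GPT-3 output and made-up cases:

```python
import re

# Hypothetical model output: a regex for bare semantic version strings.
candidate = r"^\d+\.\d+\.\d+$"

should_match = ["1.0.0", "12.34.56"]
should_not_match = ["1.0", "v1.0.0", "1.0.0-beta"]

pattern = re.compile(candidate)
for s in should_match:
    assert pattern.match(s), f"expected match: {s}"
for s in should_not_match:
    assert not pattern.match(s), f"unexpected match: {s}"
print("all test cases pass")
```

If any assertion fails, that failure is exactly the kind of correction you can feed back into the dialog.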
Writing code with GPT-3 is often best done incrementally through dialog, guided by human feedback. Here, GPT-3 infers the high-level intent of a regex specified only through test cases, and crafts a legible PCRE2 satisfying all of them:
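"Legible" here means a commented, free-spaced pattern. Python's `re.VERBOSE` flag gives the same effect as PCRE2's extended mode — whitespace is ignored and `#` starts a comment. A sketch with an illustrative date pattern (not the regex from the thread):

```python
import re

# A legible date regex in verbose mode: whitespace and comments are
# ignored, so each component can be annotated inline.
DATE_RE = re.compile(
    r"""
    ^
    (?P<year>\d{4})               -   # four-digit year
    (?P<month>0[1-9]|1[0-2])      -   # month 01-12
    (?P<day>0[1-9]|[12]\d|3[01])      # day 01-31
    $
    """,
    re.VERBOSE,
)

print(bool(DATE_RE.match("2022-08-21")))  # True
print(bool(DATE_RE.match("2022-13-01")))  # False
```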
Note my example here is, technically, very subtly wrong: it doesn’t match leap seconds. GPT-3 is a poor substitute for real wisdom.
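To make the leap-second gap concrete: the usual HH:MM:SS pattern caps seconds at 59, so the valid leap-second timestamp 23:59:60 is rejected. A sketch with an illustrative pattern of that shape:

```python
import re

# A typical time regex that caps seconds at 59 — the same subtle flaw
# described above. Pattern is illustrative, not the one from the thread.
TIME_RE = re.compile(r"^([01]\d|2[0-3]):[0-5]\d:[0-5]\d$")

print(bool(TIME_RE.match("23:59:59")))  # True
print(bool(TIME_RE.match("23:59:60")))  # False — but this is a real leap second
```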
Another quick demo, incrementally crafting a regex through instructional dialog. This is overkill for this one, obviously, but it illustrates how you can talk to it using examples, corrections, etc. Playground link: beta.openai.com/playground/p…
These dialogs are often productive in themselves, similar to “rubber duck debugging”. You learn through GPT-3’s misunderstandings how much the task was actually underspecified in your head.
See an extension of this prompt here too:
Using GPT-3 to translate a shell/awk one-liner into both Python 3 and plain English with a single prompt:
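The one-liner in the screenshot isn't reproduced here, but as an illustration of the kind of translation meant, here is a made-up awk example alongside a plain Python 3 equivalent:

```python
# Shell: awk -F, '{ total += $3 } END { print total }' data.csv
# Plain English: sum the third comma-separated field of every line.
# A straightforward Python 3 translation (illustrative helper name):

def sum_third_field(lines):
    total = 0.0
    for line in lines:
        fields = line.rstrip("\n").split(",")
        total += float(fields[2])
    return total

print(sum_third_field(["a,b,1.5", "c,d,2.5"]))  # 4.0
```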
There's @atr3gx, which does this using only regexps.
Yup! That’s a cool project. I like having a place for examples and dialog though, especially for complicated generations that I can only do piece-by-piece.
Does it actually manage to somewhat parse/break down the regex, or does it just soft-match it against near-identical ones it found in its training data? 🤔 i.e., will it do a decent job if you give it something that doesn't already exist many times in the wild?
Good question! Here’s something that isn’t in any textbook, and it does a great job, I think. This particular task it’s unusually good at.