Iâve been writing a lot of Janet lately, and Iâve been especially enjoying my time with the macro system.
Janet macros are Common Lisp-flavored unhygienic gensym
-style macros. They are extremely powerful, and very easy to write, but they can be pretty tricky to get right. Itâs easy to make mistakes that lead to unwanted variable capture, or to write macros that only work if theyâre expanded in particular contexts, and it can be pretty difficult to detect these problems ahead of time.
So people have spent a lot of time thinking about ways to write macros more safely â sometimes at the cost of expressiveness or simplicity â and almost all recent languages use some sort of hygienic macro system that defaults to doing the right thing.
But as far as I know, no one has approached macro systems from the other direction. No one looked at Common Lispâs macros and said âWhat if these macros arenât dangerous enough? What if we could make them even harder to write correctly, in order to marginally increase their power and expressiveness?â
So welcome to my blog post.
I want to show you an idea for a new kind of macro. A macro that can not only rewrite itself, but actually rewrite any form in your entire program.
I think that âwhat, no, why on earthâ is an entirely reasonable response to that statement, so letâs take a look at a motivating example together.
Lots of languages have something like defer
that will run an expression at the end of a block, whether or not the rest of the code raises an exception. defer
is just like wrapping the rest of the function in a try-finally block, except that, well, you donât actually have to do any wrapping. Which means no indentation increase, and no extra nested parentheses.
Hereâs an example of what defer
might look in Janet:
(do
(def f (file/open "foo.txt"))
(defer (file/close f))
(def contents (file/read f))
(do-something-dangerous-with contents))
Now, we canât implement defer
as a traditional macro, because a traditional macro can only rewrite itself. But what we really want to do is rewrite the parent form that the defer
appears in, to wrap its siblings in a finally expression:
(do
(def f (file/open "foo.txt"))
(finally
(do
(def contents (file/read f))
(do-something-dangerous-with contents))
(file/close f)))
So generalized macros let us write exactly this.1
I want to start at the punchline and work backwards, because my favorite part is how simple it is to write this. So although I donât actually expect this to make any sense yet, letâs go ahead and look at the implementation of this defer
macro:
(defmacro defer [] [expr]
(macro lefts rights
~(,;lefts (finally (do ,;rights) ,expr))))
Pretty tame, right?
So in order to do understand how this works, weâll have to change three things about macros:
- Macros no longer have to appear at the head of the form; they can appear anywhere within a form.
- Macros now have two argument lists: the forms âto the leftâ of the macro, and the forms âto the rightâ of the macro.
- Macros can either return new abstract syntax trees â like a traditional macro â or they can return new, anonymous macros as first-class values.
Letâs go through these ideas one at a time in slightly more detail, and then weâll circle back to the definition of defer
.
Macros can appear anywhere
Scheme already has something called âidentifier macros,â which can appear anywhere within a form. You can use them to say that foo
is a macro, and it can appear in any expression context, and then make (+ 1 foo)
expand to something like (+ 1 (some complicated expression))
.
But identifier macros can still only rewrite themselves. In order to do anything interesting with this, we need to addâŠ
Macros can see forms âto the leftâ and âto the rightâ
On the face of it this might sound like Iâm trying to introduce âinfix macros,â so that you could write something like (1 + 2)
and rewrite that to the traditional (+ 1 2)
syntax. And, to be clear, that is a thing you can do:
(defmacro + [left] [right]
[+ left right])
I think that infix macros could be very useful â weâll talk more about that in a bit â even if infix math is not particularly compelling to someone practiced in the prefixual arts.
But the real reason to support âinfixâ macros is for cases like defer
, where the (defer ...)
expression occurs in the middle of a form, and acts as sort of an âinfixâ expansion point. But in order for that to work, we need to addâŠ
First-class macros
I think this is the trickiest part to wrap your head around, but itâs the most important. This is the trick that allows macros to rewrite not only themselves, but also the forms around themselves â their parents, their grandparents, their⊠cousins? I guess? Any other form in your program, actually.
So the idea is that we can create first-class anonymous macros, and return them from our macro implementations. And then those macros will get expanded in the context of the parent form that they now appear in.
This is a lot like returning an anonymous function, except that functions are perfectly reasonable values to put in your abstract syntax trees⊠so itâs like returning a special function, a function with a little tag attached that says âhey, Iâm not a real runtime value, Iâm a macro, so you should call me before you finish macro expansion.â
And just to be super explicit: this is different from a macro returning a syntax tree that contains another macro invocation. You can already write ârecursive macros,â or macros that return invocations of other macros. But by creating actual new first-class macros at expansion time, you can close over macro arguments and reference them during the next phase of expansion.
So with these changes in mind, letâs come back to the implementation of defer
:
(defmacro defer [] [expr]
(macro lefts rights
~(,;lefts (finally (do ,;rights) ,expr))))
Our defer
macro takes two binding forms: []
and [expr]
. So it expects no arguments to the left â the word defer
has to appear at the beginning of its form â and it expects exactly one argument to its right. In other words, it looks like a normal, traditional prefix macro of one argument.
But then it returns an anonymous macro that closes over its expr
argument. So if we just look at one step of the expansion, weâll see an abstract syntax tree that looks like this:
(do
(def f (file/open "foo.txt"))
<macro>
(def contents (file/read f))
(do-something-dangerous-with contents))
But macro expansion isnât over. After expanding the (defer ...)
form, the macro expander will notice that it expanded to another macro, so it will expand that. Which winds up invoking our anonymous macro, passing it ['do '(def f ...)]
as its âleftâ arguments and ['(def contents ...) '(do-something...)]
as its ârightâ arguments.
And then that will return a replacement for the entire (do ...)
form, giving us our final result:
(do
(def f (file/open "foo.txt"))
(finally
(do
(def contents (file/read f))
(do-something-dangerous-with contents))
(file/close f)))
Neat, right?
This type of macro gives us a lot more freedom to decide how we want our code to look. Iâm honestly not really sure how much more freedom, because I havenât spent very much time with the idea yet. But Iâve been thinking about it for a while, and Iâve come up with a few examples of things that we can do with this â some much dumber than others.
Letâs take a look at a few of them.
Nest less
I think that reducing the number of nested parentheses and general indentation might be the most compelling use case for this sort of macro. I mean, really this is all defer
does: it lets you write finally
with a little less nesting, and with the expressions in a slightly different order.
I spend most of my programming time writing OCaml. OCaml doesnât have âblock scopeâ or âfunction scopeâ like most languages â it has expression scope. You introduce new âvariablesâ (they canât actually vary; all OCaml bindings are âconst
") using let ... in
, and the binding only exists on the right-hand side of that particular expression.
If you think about the nesting of the OCaml abstract syntax tree, it looks something like this:
(let x = 10 in
(let y = 20 in
(x + y)))
But of course you donât write OCaml like that. For one thing, the parentheses are redundant:
let x = 10 in
let y = 20 in
x + y
For another thing, this triangular indentation is really annoying. So you actually write it like this:
let x = 10 in
let y = 20 in
x + y
The parse tree for that expression is still nested, but it doesnât look nested â you always format your code linearly.
So lisps also have let
, but lisps donât have the luxury of leaving off the parentheses, so weâre back to the first example:
(let [x 10]
(let [y 20]
(+ x y)))
And if we tried to write that without indentation, thenâŠ
(let [x 10]
(let [y 20]
(+ x y)))
Immediate aneurysm.
Fortunately, every lisp dialect that I know of mitigates this problem substantially by allowing let
to introduce multiple bindings in a single form:
(let [x 10
y 20]
(+ x y))
Which means that we only have to increase the nesting by one level in the very common case that we have a series of let
expressions. But by using generalized macros instead, we can write a version of let
that doesnât increase indentation at all. Iâll call it def
:
(defmacro def [] [name value]
(macro lefts rights
~(,;lefts (let [,name ,value] ,;rights))))
def
lets us write code like this:
(def x 10)
(def y 20)
(+ x y)
Which gets transformed into code like this:
(let [x 10]
(let [y 20]
(+ x y)))
Of course Janet â and most lisps â have a âlinear assignmentâ form like this built into the language. In Janet itâs called â coincidentally enough â def
.
In fact in Janet, def
is actually the primitive way to create new bindings, and let
is a macro that just desugars to do
+ def
, which is very reasonable and pragmatic, but feels weird to me.
It feels weird to me because, in my mind, let
should be syntax sugar for fn
â Janetâs word for lambda
. Because, after all, these two expressions are equivalent:2
(let [x 10] (+ x 1))
((fn [x] (+ x 1)) 10)
let
allows us to write the expression in a much more natural order, but we can introduce new bindings without any let
s at all.
This might sound like weird mathematical lambda calculus trivia, but itâs not: itâs important to understand introducing new variables as a special-case of function application, even if this particular function application happens to be trivial.
Because we can apply the same technique that we just used â rewriting def
to let
, and rewriting let
to fn
â to do something much more interesting.
Generalized function application
So Haskell has something called do
notation. Youâve probably seen something like this before:
addAll :: [Int] -> [Int] -> [Int]
addAll xs ys = do
x <- xs
y <- ys
return (x + y)
ghci> addAll [1, 2] [10, 20]
[11,21,12,22]
This is equivalent to the following Janet code:
(defn add-all [xs ys]
(mapcat (fn [x]
(mapcat (fn [y]
[(+ x y)])
ys))
xs))
But I think the Haskell code is easier to read. Partly thatâs because the argument order to Janetâs mapcat
function makes the values weâre traversing appear in reverse order in our source code, and we could fix this by redefining mapcat
with a argument different order:
(defn add-all [xs ys]
(mapcat xs (fn [x]
(mapcat ys (fn [y]
[(+ x y)])))))
This reminds me of the transformation we did when we changed ((fn [x] (+ x 1)) 10)
into (let [x 10] (+ x 1))
. So what if we take it one step further, and do the same thing we did to get def
?
(defn add-all [xs ys]
(as x mapcat xs)
(as y mapcat ys)
[(+ x y)])
Itâs not quite as concise as Haskellâs do
notation: Haskell is able to use the type of the expression to determine what <-
means, so thereâs no need to specify the mapcat
bit: itâs implied from the fact that we gave it a list.
Janet doesnât have an analog for type classes, so we have to be a little more explicit, but this means that we can do more than just âbindâ with the as
macro. We can also map
:
(defn add-all [xs ys]
(as x mapcat xs)
(as y map ys)
(+ x y))
Implementing as
is just as easy as implementing def
:
(defmacro as [] [name f arg]
(macro lefts rights
~(,;lefts (,f (fn [,name] ,;rights) ,arg))))
If you havenât programmed in a language like Haskell, this particular syntax sugar might seem a little odd at first. But a specialized notation for generalized function application is extremely useful â we have it in OCaml too, through a syntax extension called ppx_let
:
let%bind x = xs in
let%bind y = ys in
return (x + y)
I think OCamlâs notation is actually more clear than Haskellâs â it highlights the symmetry between âordinaryâ let bindings and âfancyâ let bindings like these. And because it can do more than just bind
, we can also avoid the explicit return
in OCaml:
let%bind x = xs in
let%map y = ys in
x + y
map
and bind
arenât the only functions of this variety, either. Although monads are ubiquitous in OCaml, I spend a lot of my time working with arrows as well. And arrows have yet another notation: let%sub
and let%arr
.
All of these are generalizations of regular function application. Without worrying about what any of this means, just look at how similar the shape of these different type signatures are:
val (@@) : 'a -> ('a -> 'b) -> 'b
val map : 'a f -> ('a -> 'b) -> 'b f
val bind : 'a f -> ('a -> 'b f) -> 'b f
val sub : 'a s -> ('a r -> 'b s) -> 'b s
val arr : 'a r -> ('a -> 'b) -> 'b s
Okay, I know; this isnât supposed to be a blog post about monads or arrows. Letâs get back to macros.
Infix operators
So Janet has some very useful âthreadingâ macros that allow you to write code in a more âlinearâ fashion than you could without them. Theyâre useful when youâre performing a series of transformations to a value:
(filter |(string/has-prefix? "a" $)
(map string/ascii-lower
(map |($ :name)
people)))
With the power of threading macros, you could write that like this instead:
(->> people
(map |($ :name))
(map string/ascii-lower)
(filter |(string/has-prefix? "a" $)))
This is a lot like âmethod chainingâ in other languages:
people
.map(person => person.name)
.map(name => name.toLowerCase())
.filter(name => name.startsWith("a"))
Itâs easier for me to read the linear notation that uses the threading macro. But itâs not actually any easier to write it.
I like that method chaining allows me to write the code in the order of the operations: âStart with people, get the name, lowercase it, filter it to names that start with âa.'â
When writing the threading macro, though, the way you type this is âstart with people, okay wait, go back, surround it in parentheses, add a ->>
at the beginning, now move the cursor to the end of the form, and then get the nameâŠâ
I donât like that. And I know that there are fancy editors that allow you to easily wrap expressions in threading macros or any other without repositioning the cursor, but Iâd rather use a syntax that doesnât require a structural editor to work with comfortably.
So hereâs another way to write this:
(people
@ (map |($ :name))
@ (map string/ascii-lower)
@ (filter |(string/has-prefix? "a" name)))
This uses @
as an infix function application macro. I prefer |
myself, and thatâs the notation that I chose for Bauble, but |
is how you create short anonymous functions in Janet, so I donât want to step on that.
The main reason I prefer this notation is that itâs easier for me to type. I donât know that itâs any easier to read than ->
, but it allows me to write code in the order that I think it, and I like being able to choose a syntax that maps neatly onto my brain.
Another infix macro that I like is .
. .
is a convenient macro for looking up a keyword in a struct or table, so that you can write struct.key
instead of (get struct :key)
.
In Janet struct.key
parses as a single symbol, so we canât actually implement this as a generalized macro without a separate preprocessing step to split it into three symbols. But we can use it as struct . key
, which parses as three separate symbols:
(defmacro . lefts [key & rights]
~(,;(drop-last lefts)
(get ,(last lefts) ,(keyword key))
,;rights))
Which we can then use like this:
(print foo . bar)
(print (get foo :bar))
This is very goofy looking, but you could imagine a language where .
always parsed as its own symbol, so that we could just write foo.bar
and have that expand to (get foo :bar)
automatically.
Another infix macro that I think could be interesting is :
, to create a pair.
Janet already uses :
as a leader character for declaring keywords, so this wonât work in Janet. But you could imagine, again, a different language where :
is used as a short way to create a pair of two elements. So:
foo:bar
Would become:
(foo bar)
This might be useful in languages that use wrapped let
s, where you have to write:
(let ((x 10)
(y 20))
(+ x y))
Instead, you could write that as:
(let (x:10 y:20)
(+ x y))
But have it parse in exactly the same way.
You could imagine it in cond
as well:
(cond
(> x 0): "positive"
(< x 0): "negative"
(= x 0): "zero"
true: "nan")
Of course Janet doesnât require wrapping each entry in a cond
expression in parentheses, so this isnât as compelling in Janet.
For a slight variation on this, imagine a macro called ::
. Itâs just like :
â it creates a pair â but the pair appears in reverse order from how you write it. Weâre well off the Janet path now, but we could use this as a concise notation for adding type annotations without an explosion of parentheses.
Letâs say we have a â function? macro? â something called Int
that provides a type annotation to our compiler. Weâd normally write the type of an expression like this:
(def x (Int (compute-thing)))
But look all those close parens! I donât want to balance those. So instead, we could write:
(def x (compute-thing) :: Int)
Which is equivalent to the less intuitive (to me):
(def x Int:(compute-thing))
::
doesnât mean âtype annotation,â though, it just means âwrap in parentheses.â We could use it to do dumber things:
"hello" :: print
Which would expand, of course, to (print "hello")
. But⊠I donât know why you would want to do that.
Comment
Something that I occasionally wish for is a (comment ...)
macro that lets me ignore code.
You canât actually write such a macro in Janet. Janet has a macro called comment
in the standard library, but comment
always expands to nil, and nil
is not nothing. This means there are lots of places you canât use (comment ...)
:
(defn foo [a (comment this is a comment)]
(print a))
If you tried to compile that, youâd get an error:
compile error: unexpected type in destruction, got nil
Because after macro expansion, the compiler actually sees:
(defn foo [a nil]
(print a))
Which is not valid.
With generalized macros, though, you can write a comment that actually disappears:
(defmacro comment [] [&]
(macro lefts rights [;lefts ;rights]))
(defn foo [a (comment this is a comment)]
(print a))
After expansion, that will just be:
(defn foo [a]
(print a))
Like it never happened.
if
/else
One thing that sometimes trips me up when Iâm writing Janet is if
.
if
takes three forms: a boolean expression, an expression to evaluate if itâs truthy, and an expression to evaluate if itâs falsy. Which is nice and concise, but itâs different enough from other languages that I use â languages with explicit else
s â that sometimes Iâll write code like this by mistake:
(if should-do-something-important
(print "okay, performing important work:")
(perform-important-work))
That actually doesnât perform important work â (perform-important-work)
is the âelseâ section of that conditional. In order to do more than one thing in the âthenâ branch, we have to wrap all of the statements in do
.
And of course Janet has a when
macro that does exactly what I want:
(when should-do-something-important
(print "okay, performing important work:")
(perform-important-work))
Which doesnât have an else
branch, and usually when Iâm writing an if
without an else
I should just use when
in the first place.
But.
Generalized macros actually let us write if
with an explicit else
. Iâm not saying this is a good idea, but they let us write something like:
(if (empty? name)
(print "you must provide a valid name"))
(else
(print "okay i think it checks out"))
This example is a little weird because, in order for this to work nicely, weâll have to rename the built-in if
. Iâll call the ternary version if-then-else
for this example, and say that if
now means the same thing as Janetâs when
.
(defmacro else [] [else-exprs]
(macro lefts rights
(match (last lefts)
['if & then-exprs]
~(,;(drop-last lefts)
(if-then-else (do ,;then-exprs) (do ,;else-exprs))
,;rights)
(error "else must come immediately after if"))))
This is interesting because the else
macro actually rewrites the form before itself, changing the if
to an if-then-else
and then fussing with its arguments.
Weirder, more exotic things
I could keep going, but, as you have probably noticed, this is already a very long blog post. Thereâs a lot more that you can do with generalized macros â some of it useful, some of it unhinged, and we donât have time to talk about all of it.
So far weâve only seen macros that rewrite their parents or immediate siblings, but you can write macros that return macros that return macros, and use them to rewrite arbitrary forms anywhere in your program. You could write a macro that rewrites âthe nearest enclosing function definition,â recursively accumulating first-class macros until finally expanding all of them.
You can write actual left- and right-associative infix operators, and I think that if you tried hard enough, you could even use dynamic variables and controlled macro expansion to implement infix operator precedence (although I donât think you should).
You could implement (a weaker version of) Janetâs splice
built-in as a generalized macro. You could implement âidentifier macrosâ that look around themselves and expand to something different when they appear as the first argument to a (set ...)
form. You could, you couldâŠ
You could do a lot of things, but Iâm going to have to leave these as exercises to the reader, because itâs time to switch gears and talk about why you shouldnât do this.
Problem one: the repl
The main reason that this seems like a bad idea is that macros like this donât work at the top level.
If youâre just using the repl, and you type (defer (file/close f))
, what happens? One of the arguments to that macro is âeverything to the right.â But there isnât anything to the right! At least, not yet. And it wonât be able to supply everything to the right until you stop typing altogether.
This might not seem like a big deal for defer
â just donât use defer
at the repl â but it is a big deal for, say, def
. And I donât know an elegant way to solve this problem: in the general case, macros could look arbitrarily far ahead, so weâd have to wait until we closed the repl session to be able to expand them. And thatâs kinda gross.
Problem two: generalized macros donât always compose
All of the examples that weâve seen so far play nicely together, but itâs possible to write generalized macros that donât compose with one another.
The problem is that macro behavior can depend on expansion order. Regular macros always get a chance to run before their arguments get expanded, which is very convenient. Generalized macros donât have that luxury â because macros can see the forms around them, the order that you expand those forms matters.
In my implementation I chose to expand macros in a depth-first, left-to-right order. So macros always see the arguments âto the leftâ of themselves fully expanded, and the arguments âto the rightâ completely unexpanded.
And this can be problematic. For example, letâs say we make an infix alias for set
, called :=
:
(var x 0)
(x := 1)
Which expands to:
(var x 0)
(set x 1)
This is a trivial generalized macro to write.
Now letâs say we have another macro, which looks at the form to its left to see if it immediately follows set
. When it does, it rewrites that set
to something else. We could use this to implement some kind of custom associative data structure:
(defmacro at [] [dict key]
(macro lefts rights
(if (= lefts ['set])
~(assign ,dict ,key ,;rights)
~(,;lefts (lookup ,dict ,key) ,;rights))))
So that macro lets us write:
(set (at dict key) 10)
And have that expand to:
(assign dict key 10)
Meanwhile, if we write (print (at dict key))
, that will expand to (print (lookup dict key))
.
Each of these generalized macros make sense on their own. But if we try to use them together, they just donât work:
((at dict key) := 1)
Because (at dict key)
expands first. It looks at the forms to its left, sees that there arenât any, so after one step of expansion we have:
((lookup dict key) := 1)
Then we expand :=
, and finish with:
(set (lookup dict key) 1)
Which of course was not what we wanted.
I think that using generalized macros safely requires really understanding the effect that they have on the syntax tree of your program. Theyâre more like ->>
and friends â explicit syntax re-arrangers â than they are like other kinds of macros.
Problem three: you have created in your code a work of madness that no other human being can possibly hope to understand
ugh not again
Prior art
I feel like this approach is so simple that it must have been done before, but I canât find any references to it. That said, I have no idea how to search for it effectively! So if youâve seen this technique before, or if youâve heard of it being used in the past, Iâd love to hear about it.
Proof of concept
I implemented this macro system in Janet, in order to play around with it and test out my macro implementations.
It was pretty easy to write! It really is a very modest generalization of a traditional macro system. I didnât actually write a custom module loader that would let you use this as your default macro system in Janet code, but I wrote (the equivalent of) macex
, and adding the custom module loader would be pretty easy if you wanted to use it âfor real.â
You can look at the code here: https://github.com/ianthehenry/macaroni
Or take a peek at some of the tests, to see the examples in this post in action, as well as some weirder things that didnât make the cut.
-
Of course we could teach
do
to scan through all of its arguments and look fordefer
s, and implement this that way, but then this would only work indo
expressions. What if we want to do this inside a(fn [] ...)
block? Or anif
? Or awhile
? By makingdefer
itself do the transformation, we can use it anywhere â and make it easy to add new macros that behave likedefer
, without having to teachdo
about them. â©ïž -
Back in the Old Days, JavaScript only had function-scoped variables, so the only way to create new bindings in JavaScript code was to create and invoke an anonymous function like this. The pattern was so common that we called them âimmediately invoked function expressions,â and it was a real thing that you had to do in order to, for example, close over distinct values in a loop. â©ïž