Cowriting an Album With AI
Why a neural net is like ‘a chain of guitar pedals’
These days, artificial intelligence is showing off its creative chops.
GPT-2 and GPT-3 generate text so well that people are using them to author text-adventure games and books of poetry. Visual artists are using image-generation AI to create neural-net paintings. You can create utterly photorealistic pictures of people who don’t exist.
But in the world of music, things have lagged a bit. Certainly, there are some cool tools out there — like Google’s Magenta music-creation AI, which you can use to autocomplete MIDI melodies or have a piano plink out a tune while you keysmash. The most ambitious is probably OpenAI’s “Jukebox”, which generates entire new songs in the style of well-known musical artists, including lyrics and (crude) singing.
But the truth is, the music tools are less polished and less complete than text or image generation. The tunes these tools produce are usually only parts of songs — or with Jukebox, the songs are complete but very lo-fi, and they don’t have clear verse-chorus structures. With music AI, you can’t just set it and forget it: push a button and have a finished creation come out.
Why? It’s because music, I suspect, is so deeply multidimensional, more so even than textual or visual art. Timing, timbre, human voicing, structure, mood, a welter of instrument voices and styles — there’s just a lot going on in music, such that AI has more trouble cranking it out all on its own.
I’m sure technologists will eventually get there! But for now, AI needs help.
More precisely, it needs collaboration — collaboration with humans.
Recently, Robin Sloan decided to explore this idea, by creating an album using Jukebox as a writing partner. Sloan is a friend of mine from the early, antediluvian days of blogging, and also a phenomenal author; he’s written books like Mr. Penumbra’s 24-Hour Bookstore and novellas like Annabel Scheme. But he’s also a talented programmer who, a few years back, got interested in using neural nets to generate writing — and he created a literary autocompleter, trained on tons of sci-fi stories.
During lockdown last year, Robin got cabin fever like the rest of us, and started chatting with his friend Jesse Solomon Clark, a composer and music producer. They’d heard about OpenAI’s Jukebox and hatched a plan to craft an album of music by working with Jukebox as a creative partner.
So they devised a system, which worked like this:
- Jesse would perform and record short snippets of original music, maybe a minute long each. He’d send them to Robin — on an old-school monaural cassette tape!
- Robin would feed those sounds into Jukebox, along with snippets of lyrics, and use them as a “seed” to inspire Jukebox’s output (there’s a code sketch of this step just after this list)
- He’d listen to the snippets that Jukebox created — chunks of music typically about 10 or 20 seconds long — and pick the ones he liked best
- Then Robin would put those Jukebox creations on another monaural cassette tape, and send it back to Jesse…
- …who’d use those sounds as materials to craft full songs. The final songs aren’t composed only of Jukebox-created sounds and music; Jesse would also play instruments to accompany or echo what Jukebox had made, and he architected each song’s structure, with verses and choruses and motifs.
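For the technically curious, here’s roughly what that second step looks like. This is a condensed sketch of Jukebox’s “primed” sampling mode, adapted from OpenAI’s public sampling notebook for the openai/jukebox code; the artist, lyrics, and file names are placeholders I’ve invented, not the duo’s actual settings, and a real run needs a serious GPU plus the downloaded 5B model checkpoints.

```python
# Condensed sketch of "primed" Jukebox sampling, adapted from OpenAI's
# public sampling notebook (openai/jukebox). Placeholder values throughout.
from jukebox.hparams import Hyperparams, setup_hparams
from jukebox.make_models import MODELS, make_vqvae, make_prior
from jukebox.sample import _sample, load_prompts
from jukebox.utils.dist_utils import setup_dist_from_mpi

rank, local_rank, device = setup_dist_from_mpi()

hps = Hyperparams()
hps.sr = 44100                        # sample rate of input and output audio
hps.n_samples = 4                     # candidate continuations per run
hps.name = "shadow_take"              # output folder (placeholder name)
hps.levels = 3
hps.hop_fraction = [0.5, 0.5, 0.125]

# Build the VQ-VAE and the top-level (lyrics-conditioned) prior.
vqvae_name, *prior_names = MODELS["5b_lyrics"]
vqvae = make_vqvae(setup_hparams(vqvae_name, dict(sample_length=1048576)), device)
top_prior = make_prior(setup_hparams(prior_names[-1], dict()), vqvae, device)

# Ask for ~20 seconds of output, rounded to the model's token hop size.
hps.sample_length = (int(20 * hps.sr) // top_prior.raw_to_tokens) * top_prior.raw_to_tokens

# The conditioning "ask": an artist/genre the model knows, plus lyrics to sing.
# Artist, genre, and lyrics here are placeholders, not the duo's choices.
metas = [dict(artist="Frank Sinatra", genre="Pop",
              lyrics="We live on a shadow planet...",
              total_length=hps.sample_length, offset=0)] * hps.n_samples
labels = [None, None, top_prior.labeller.get_batch_labels(metas, device)]

# Load ~12 seconds of the seed snippet (placeholder path) as the primer...
duration = (int(12 * hps.sr) // top_prior.raw_to_tokens) * top_prior.raw_to_tokens
x = load_prompts(["jesse_snippet.wav"], duration, hps)

# ...encode it, then ask the top-level prior to keep going from there.
sampling_kwargs = [
    dict(temp=0.99, fp16=True, max_batch_size=16, chunk_size=32),
    dict(temp=0.99, fp16=True, max_batch_size=16, chunk_size=32),
    dict(temp=0.98, fp16=True, max_batch_size=3, chunk_size=16),
]
zs = top_prior.encode(x, start_level=0, end_level=hps.levels, bs_chunks=x.shape[0])
zs = _sample(zs, labels, sampling_kwargs, [None, None, top_prior], [2], hps)
# Jukebox writes the candidate clips to disk under hps.name for auditioning.
```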
The end result dropped this week — the album “Shadow Planet”, by The Cotton Modules (the band name Jesse and Robin adopted). It’s on Spotify, Bandcamp, and their own site, among other places.
It’s a gorgeous and fascinating creation! It’s deeply digital music — hell, when you’re listening to the neural-net output, you’re basically hearing matrix mathematics, right? Yet the Jukebox elements have a grungy, lo-fi tone; at times, it feels like stuff you’d hear coming out of a scratchy Victrola horn. You can often detect (or I think you can) how Jesse worked his own instrumentation around the Jukebox output, but they blend together so organically you can’t always be sure. And when you hear singing — there are bursts of it here and there — that’s pure Jukebox output. Neither Jesse nor Robin did any singing.
They’re not the first people to use AI as a collaborative tool for making music; back in 2019 I wrote a piece for Mother Jones where I spoke to a few recording artists who’d done a similar thing, with pre-Jukebox tools. But “Shadow Planet” is the most ambitious — and organic — use of AI I’ve yet encountered on an album.
So, what’s it like to have an AI as a writing partner? I wanted to find out!
I did a Zoom call earlier this week with Robin and Jesse. Below are some edited-for-length chunks of our conversation.
Feeding the beast: Seeding the AI with musical snippets
Clive: So talk me through how this process worked. You started off by taking snippets of music Jesse had composed and feeding them into Jukebox. Jesse, what type of snippets did you compose? What did that seed material sound like?
Jesse: It would be everything from drones, to music with a pop kind of feeling, to plucked harps or vocal ideas. About a minute long each.
Robin: Then I just picked out the parts I thought sounded most interesting: Oh, that’s so cool. Oh, that’s interesting. That’s beautiful.
Clive: Then you fed them as seeds into Jukebox. You can also, with Jukebox, pick which artists you want the AI to emulate?
Robin: Yep. You can supply a list of artists or genres that it knows, so you can say things like, “All right, here’s ten seconds of our music snippet. We’re going to start with this. It’s from Jesse and it’s got this great pluck, or it builds in this cool way. Then — keeping in mind that I want something that’s like Perry Como meets Daft Punk — I want you to keep going, computer… just pick it up from there. Oh, by the way, as if that wasn’t weird enough, you’re going to have to start singing these lyrics too.”
And it produces a little bit: 10 seconds, 15 seconds.
Clive: And then you’d pick your favorite outputs, and have it continue on with those?
Robin: The process becomes really interactive. At every step you’re asking it to produce a few options for you, literally just three or four. You either choose one and say, Cool, I’m going with that one — or you reject them all, which I did often. I’d just be like, that sounds too weird. And you just kind of keep doing it.
It’s like playing a board game or something. It’s as if you’re going through a maze; you just keep following that path. And I would usually have it open in the background because it takes a while! It’s not an instantaneous process at all. I would usually ask for about five seconds of music, and that would take about 10 minutes or 15 minutes to generate.
Clive: Slow!
Robin: For a lot of people, including professional musicians, that is so far from the real-time feedback that they’re accustomed to.
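To make that maze concrete: the loop Robin describes boils down to something like the sketch below. Everything in it is a stand-in, not Robin’s actual code: `generate_continuations` represents a real Jukebox sampling run like the one sketched earlier (each of which took him 10 or 15 minutes), and `pick_one` is the human ear.

```python
# Hypothetical sketch of the choose-or-reject loop. None of these helpers
# are part of the Jukebox API; they stand in for the real sampling runs.
import random

def generate_continuations(track, n_options=4):
    """Placeholder for a Jukebox sampling run: returns n_options candidate
    continuations of `track`. Each real run took 10-15 minutes."""
    return [track + [f"chunk-{random.randint(0, 999)}"] for _ in range(n_options)]

def pick_one(options):
    """Placeholder for the human step: audition each option, return the
    keeper, or None to reject them all and re-roll."""
    for i, opt in enumerate(options):
        print(f"option {i}: ends with {opt[-1]}")
    answer = input("keep which? (number, or 'none'): ").strip()
    return None if answer == "none" else options[int(answer)]

def grow_track(seed, steps=12):
    """Follow one path through the maze: extend the seed one accepted chunk
    at a time until the piece is long enough to print to tape."""
    track, accepted = [seed], 0
    while accepted < steps:
        choice = pick_one(generate_continuations(track))
        if choice is None:
            continue          # reject every option and sample again
        track, accepted = choice, accepted + 1
    return track
```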
An AI with a wonky sense of beats and melody
Clive: Does the AI have any sense of beats, or time? Is it creating things with that sort of structure in mind — the way musicians will play to the time signature of a song?
Jesse: It does not! It’s totally bizarre. It’s pulling from the history of popular music, which is almost all western scales and time signatures — but [the output] just kind of drifts. It’ll, like, go down a partial step. Or it’s in weird time signatures. That was difficult to work with. And there was a lot of garbage audio, like noisy, crunchy drums with a guitar, that was either a problem — or a thing to use. But really not easy.
Robin: It was interesting, though! Some of my favorite things that the model produced were weird things that really do break the grid. They’d do really funky things, where you’re like, “I was not aware of that chord. Are you sure that is a chord?”
Jesse: I agree, that was cool — it’s totally wild. It was so loaded with vibe and personality. It was so unpredictable.
In my regular music work, I use plenty of libraries of sounds, and they’re all general-purpose sound banks: orchestral or jazz or, you know, dubstep. But this — it was like a guy from another planet being, like, “RAAAAAAAARH.”
Why working with AI is like using a guitar pedal
Clive: I play electric guitar in my band, and what you guys are doing reminds me of when guitarists are screwing around with a guitar pedal. It’s analog, and you’re twisting the knobs to figure out, “what sounds does it create when I do this?”
It also reminds me of when I talk to AI guys who are working on their model and they’re like “Oh, yeah, I tuned those parameters for like six months.” It’s just like a guitar pedal. They’re turning knobs, trying to get the system to perform the way they need it to.
Robin: That’s actually true. AI, as practiced today, is a chain of guitar pedals! There are these feedbacks and nonlinearities, and you’re like — “I don’t know, man. All I know is when I turn that to eight, it’s awesome!”
Playing along with the AI, like a duet
Clive: So Jesse, you’re receiving all this material from Robin, that he generated with Jukebox. How are you shaping it into songs?
Jesse: So I’d get a tape, and it would play in mono, compressed on tape. And it would be some crunchy thing — then it would just kind of change into this kind of beastly thing with words, and weird kind of drones and stuff. And then it would stop.
And I’m like [laughs] “All right! So.” I’d import it, and then just kind of see what I could do with it. Sometimes it would be some crooner thing, a Björk kind of a vocal thing or something.
I didn’t do any auto-tuning or anything. I used it all totally raw. I totally loved its nature. It felt kind of human — it had these spaces and pauses. I liked that rawness about it.
But I also really didn’t want each track to be an ambient soundscape-y thing. I wanted it to be a song.
Clive: You wanted it to have structure, passages, refrains. You didn’t want just, like, glitchy experimental noise.
Jesse: Both of us agreed — we wanted it to have some allure, to kind of invite people in.
Clive: So you’d add instrumentation, and play drum parts, and things like that?
Jesse: Yes, usually the drums and things that are linear, I did. Oftentimes, though, I was creating sounds that kind of matched the vibe of what the AI did.
Clive: One of the interesting things about listening to the album is trying to figure out what came from the AI and what came from you.
For example, one of your tracks, “Mallow Opera”, opens with a motif played by what sound like really eerie strings or a synth, with maybe a voice behind it. Was that something purely from the AI? Or is that something you did instrumentally on top of the AI’s creations?
Jesse: I sort of cut up a thing from the AI and then played along to it. It’s playing with it, kind of like a band.
Robin: There’s a bunch of moments like that throughout the whole album. They’re my favorites — where Jesse is playing along with the AI. He’s like, “Cool, let’s do it!” And he comes in with a synthesized violin, sometimes in a harmony, sometimes just a straight unison. I think that’s a beautiful thing.
“Here’s our lead singer, who’s been dead for 50 years”
Clive: There are voices singing lyrics, in several places, most notably on the title track “Shadow Planet”. Were those voices purely AI too?
Robin: All AI.
Those were lyrics I supplied, for better and for worse.
But here’s an interesting bit of color on the generation process and the chaos in it: When you’ve specified your parameters and your lyrics, and it starts singing — that first rumble of speech is totally unpredictable. And often I’m like, “That’s not the voice I want! That’s not what I’m looking for.”
Once you’ve found a voice that you do like — as I did with “Shadow Planet” — I was like, “Oh, that guy is so cool, he’s just like dorky and awesome.” He’s like some lounge singer. But it’s like a freaking ghost. It’s the Ghost Tones! Here’s our lead singer, who’s been dead for 50 years!
Once I kind of locked him in and kept feeding him back into the AI, the AI was happy to kind of keep that going.
Clive: Tell me more about how the feedback process worked. If the AI produced something you liked, you’d take that output and — feed that back into Jukebox? To keep it generating stuff along the same track?
Robin: Yeah, exactly. So let’s say I ask it for five new seconds and it says, “SHADOW PLANET”. And I’m like, hell yeah. So I basically print to tape and then the next thing I feed back in ends with “SHADOW PLANET”.
As slow and often annoying as the AI was, those are moments when I felt like I was playing the AI, like I had learned how to play it as an instrument. Which is kind of cool.
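Mechanically, that feedback trick is just primed sampling again: you take the tail of the last keeper, the stretch that ends with the bit you loved, and make it the next run’s audio prompt. A minimal sketch, assuming 44.1 kHz audio held in a numpy array; the function name is mine, not Jukebox’s.

```python
# Hypothetical helper for the feed-it-back-in step; not part of Jukebox.
import numpy as np

SR = 44100  # Jukebox works with 44.1 kHz audio

def next_primer(keeper: np.ndarray, prompt_seconds: float = 12.0) -> np.ndarray:
    """Return the tail of an accepted generation, e.g. the part that ends
    with the "SHADOW PLANET" vocal, to prime the next sampling run."""
    n = int(prompt_seconds * SR)
    return keeper[-n:]
```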
A surprisingly high hit ratio
Clive: Jesse, of all the AI-generated material that Robin sent to you, what percentage of it wound up on the album?
Jesse: Oh, I would say like 80 percent.
Robin: That you used? Or stuff that I sent to you?
Clive: That’s more than I would have thought.
Robin: Of course, I was doing some of that. Yeah, I was his consigliere.
Clive: Right, yes — Robin, you were filtering out the terrible stuff before it got to Jesse. So let me ask you — of the stuff the AI produced when you fed it a seed, how often was it usable?
Robin: You know, it’s hard to say. One nice thing about this technique is that you could bail early if something was just not working. I did that a lot. So let’s say maybe, of the generations I embarked on, not more than a quarter eventually made it onto a tape.
Clive: That’s actually not a bad hit ratio, though! If you’re a human in a studio recording an album, you probably use way less than one quarter of your takes. I once did literally scores of takes for a guitar break on a song, because I didn’t really know what I was aiming for, and it took a long time to drill down.
The ethics of generating music based on *previous* songs
Clive: Did it feel weird to be creating new music that’s based on the audio files of existing music? I know some music folk wonder about the ethics of that.
Robin: Yeah, we talked about that stuff, asking each other what we thought was appropriate. That conversation is definitely reflected in my choices of how to generate the music.
I modified the sampling code quite a bit from the original bundle that came from OpenAI. Their presentation is very much like, “you can generate a song, and it sounds like the Beatles are performing it!” It’s very much this computational mimicry. But we both were like, “No, that’s not what we’re into.”
So I basically rewrote the code so that it’s finding these weird in-between parts in this big space, rather than just like zipping straight to Bruce Springsteen, Paul Simon, Bob Dylan, or whatever. That’s what we wanted.
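I don’t know exactly what Robin’s rewrite looks like, but one plausible way to find those “in-between parts” is to blend the conditioning vectors of several artists instead of committing to one. A hypothetical sketch in PyTorch; `artist_emb` stands in for the model’s learned artist-embedding table, and none of this is Robin’s actual code.

```python
# Hypothetical: steer generation "between" artists by averaging their
# conditioning embeddings, rather than conditioning on a single artist ID.
import torch

def blended_artist_embedding(artist_emb: torch.nn.Embedding,
                             artist_ids: list[int],
                             weights: list[float]) -> torch.Tensor:
    """Weighted average of several artists' conditioning vectors, landing
    somewhere in the middle of the model's artist space."""
    w = torch.tensor(weights)
    w = w / w.sum()                               # normalize mixing weights
    vecs = artist_emb(torch.tensor(artist_ids))   # (n_artists, emb_dim)
    return (w.unsqueeze(1) * vecs).sum(dim=0)     # (emb_dim,)
```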
Jesse: Also, we’re two dudes who are white. So it isn’t lost on us that we’re harvesting from the entire American songbook.
Robin: It’s not by chance that none of the voices singing in our songs is a Black jazz singer. Or a rapper, for that matter. The ethics, and making these decisions, a lot of it is murky, for me at least; but I think some things are very clear here. I do not think that people need to be chewing up hip hop in AI and producing robot hip hop, because if you do, you’re just like: Wow, you didn’t get it at all, really.
So I guess it’s a good thing that I find the crooner — the sort of dorky crooner — so appealing aesthetically! Because I also think there’s something about that figure, that voice, that actually works really well ethically. You’re like, with the dorky white crooners you’re fine. It’s good!
Living in the future
Clive: Robin, in your 2009 novella Annabel Scheme, the plot involves a musical mystery — some mysterious technology is producing pop songs that are blends of styles, like a new Beatles song, except performed by Nirvana. Will that eventually be a reality?
Robin: I absolutely think so. I actually think we could make one of the mashups I wrote about in that book right now, with the tools we used for this. It would be really difficult, not very much fun in the end — and not the thing that we wanted to do. But if we wanted to make, you know, a Radiohead song featuring vocals from John Lennon or something? You could try.
Clive Thompson is a contributing writer for the New York Times Magazine, a columnist for Wired and Smithsonian magazines, and a regular contributor to Mother Jones. He’s the author of Coders: The Making of a New Tribe and the Remaking of the World, and Smarter Than You Think: How Technology Is Changing Our Minds for the Better. He’s @pomeranian99 on Twitter and Instagram.