- See Also
- Links
- “A Disney Director Tried—and Failed—to Use an AI Hans Zimmer to Create a Soundtrack”, Heikkilä 2023
- “Whisper-AT: Noise-Robust Automatic Speech Recognizers Are Also Strong General Audio Event Taggers”, Gong et al 2023
- “Voice Conversion With Just Nearest Neighbors”, Baas et al 2023
- “SoundStorm: Efficient Parallel Audio Generation”, Borsos et al 2023
- “ImageBind: One Embedding Space To Bind Them All”, Girdhar et al 2023
- “TANGO: Text-to-Audio Generation Using Instruction-Tuned LLM and Latent Diffusion Model”, Ghosal et al 2023
- “CLaMP: Contrastive Language-Music Pre-training for Cross-Modal Symbolic Music Information Retrieval”, Wu et al 2023
- “Speak, Read and Prompt (SPEAR-TTS): High-Fidelity Text-to-Speech With Minimal Supervision”, Kharitonov et al 2023
- “Archisound: Audio Generation With Diffusion”, Schneider 2023
- “Msanii: High Fidelity Music Synthesis on a Shoestring Budget”, Maina 2023
- “Rock Guitar Tablature Generation via Natural Language Processing”, Casco-Rodriguez 2023
- “VALL-E: Neural Codec Language Models Are Zero-Shot Text to Speech Synthesizers”, Wang et al 2023
- “Robust Speech Recognition via Large-Scale Weak Supervision”, Radford et al 2022
- “High Fidelity Neural Audio Compression”, Défossez et al 2022
- “Hierarchical Diffusion Models for Singing Voice Neural Vocoder”, Takahashi et al 2022
- “RealSinger: Ultra-Realistic Singing Voice Generation via Stochastic Differential Equations”, Anonymous 2022
- “AudioLM: a Language Modeling Approach to Audio Generation”, Borsos et al 2022
- “MeloForm: Generating Melody With Musical Form Based on Expert Systems and Neural Networks”, Lu et al 2022
- “AI Composer Bias: Listeners like Music Less When They Think It Was Composed by an AI”, Shank et al 2022
- “Musika! Fast Infinite Waveform Music Generation”, Pasini & Schlüter 2022
- “Multitrack Music Transformer: Learning Long-Term Dependencies in Music With Diverse Instruments”, Dong et al 2022
- “CLAP: Learning Audio Concepts From Natural Language Supervision”, Elizalde et al 2022
- “BigVGAN: A Universal Neural Vocoder With Large-Scale Training”, Lee et al 2022
- “Tradformer: A Transformer Model of Traditional Music Transcriptions”, Casini & Sturm 2022
- “SymphonyNet: Symphony Generation With Permutation Invariant Language Model”, Liu et al 2022
- “It’s Raw! Audio Generation With State-Space Models”, Goel et al 2022
- “General-purpose, Long-context Autoregressive Modeling With Perceiver AR”, Hawthorne et al 2022
- “FIGARO: Generating Symbolic Music With Fine-Grained Artistic Control”, Rütte et al 2022
- “Steerable Discovery of Neural Audio Effects”, Steinmetz & Reiss 2021
- “Semi-Supervised Music Tagging Transformer”, Won et al 2021
- “AudioCLIP: Extending CLIP to Image, Text and Audio”, Guzhov et al 2021
- “MLP Singer: Towards Rapid Parallel Korean Singing Voice Synthesis”, Tae et al 2021
- “PriorGrad: Improving Conditional Denoising Diffusion Models With Data-Dependent Adaptive Prior”, Lee et al 2021
- “DiffSinger: Singing Voice Synthesis via Shallow Diffusion Mechanism”, Liu et al 2021
- “Symbolic Music Generation With Diffusion Models”, Mittal et al 2021
- “Interacting With GPT-2 to Generate Controlled and Believable Musical Sequences in ABC Notation”, Geerlings & Meroño-Peñuela 2020
- “HiFi-GAN: Generative Adversarial Networks for Efficient and High Fidelity Speech Synthesis”, Kong et al 2020
- “AI Song Contest: Human-AI Co-Creation in Songwriting”, Huang et al 2020
- “DeepSinger: Singing Voice Synthesis With Data Mined From the Web”, Ren et al 2020
- “Learning Music Helps You Read: Using Transfer to Study Linguistic Structure in Language Models”, Papadimitriou & Jurafsky 2020
- “15.ai”, Fifteen-kun & Pony Preservation Project 2020
- “Pop Music Transformer: Beat-based Modeling and Generation of Expressive Pop Piano Compositions”, Huang & Yang 2020
- “Writing the Next American Hit: Using GPT-2 to Explore the Possibility of Creating Successful AI-Generated Song Lyrics”, Barrio 2020
- “GPT-2 Preference Learning for Music Generation”, Gwern 2019
- “Encoding Musical Style With Transformer Autoencoders”, Choi et al 2019
- “GPT-2 Folk Music”, Branwen & Presser 2019
- “Parallel WaveGAN: A Fast Waveform Generation Model Based on Generative Adversarial Networks With Multi-resolution Spectrogram”, Yamamoto et al 2019
- “Low-dimensional Embodied Semantics for Music and Language”, Raposo et al 2019
- “MuseNet: a Deep Neural Network That Can Generate 4-minute Musical Compositions With 10 Different Instruments, and Can Combine Styles from Country to Mozart to the Beatles”, Payne 2019
- “Generative Modeling With Sparse Transformers: We’ve Developed the Sparse Transformer, a Deep Neural Network Which Sets New Records at Predicting What Comes next in a Sequence—whether Text, Images, or Sound. It Uses an Algorithmic Improvement of the attention Mechanism to Extract Patterns from Sequences 30× Longer Than Possible Previously”, Child & Gray 2019
- “Music Transformer: Generating Music With Long-Term Structure”, Huang et al 2018
- “FloWaveNet: A Generative Flow for Raw Audio”, Kim et al 2018
- “Piano Genie”, Donahue et al 2018
- “Music Transformer”, Huang et al 2018
- “This Time With Feeling: Learning Expressive Musical Performance”, Oore et al 2018
- “The Challenge of Realistic Music Generation: Modelling Raw Audio at Scale”, Dieleman et al 2018
- “The Sound of Pixels”, Zhao et al 2018
- “Efficient Neural Audio Synthesis”, Kalchbrenner et al 2018
- “Generating Structured Music through Self-Attention”, Huang et al 2018
- “Towards Deep Modeling of Music Semantics Using EEG Regularizers”, Raposo et al 2017
- “Objective-Reinforced Generative Adversarial Networks (ORGAN) for Sequence Generation Models”, Guimaraes et al 2017
- “Neural Audio Synthesis of Musical Notes With WaveNet Autoencoders”, Engel et al 2017
- “Tuning Recurrent Neural Networks With Reinforcement Learning”, Jaques et al 2017
- “SampleRNN: An Unconditional End-to-End Neural Audio Generation Model”, Mehri et al 2016
- “WaveNet: A Generative Model for Raw Audio”, Oord et al 2016
- “The ABC Music Standard 2.1: §3.1.1: X: Reference Number”, Walshaw 2011
- “Staring Emmy Straight in the Eye—And Doing My Best Not to Flinch”, Hofstadter & Cope 2001
- “Connectionist Music Composition Based on Melodic, Stylistic, and Psychophysical Constraints [Technical Report CU-CS–495–90]”, Mozer 1990
- Sort By Magic
- Wikipedia
- Miscellaneous
- Link Bibliography
See Also
Links
“A Disney Director Tried—and Failed—to Use an AI Hans Zimmer to Create a Soundtrack”, Heikkilä 2023
“Whisper-AT: Noise-Robust Automatic Speech Recognizers Are Also Strong General Audio Event Taggers”, Gong et al 2023
“Voice Conversion With Just Nearest Neighbors”, Baas et al 2023
“SoundStorm: Efficient Parallel Audio Generation”, Borsos et al 2023
“ImageBind: One Embedding Space To Bind Them All”, Girdhar et al 2023
“TANGO: Text-to-Audio Generation Using Instruction-Tuned LLM and Latent Diffusion Model”, Ghosal et al 2023
“CLaMP: Contrastive Language-Music Pre-training for Cross-Modal Symbolic Music Information Retrieval”, Wu et al 2023
“Speak, Read and Prompt (SPEAR-TTS): High-Fidelity Text-to-Speech With Minimal Supervision”, Kharitonov et al 2023
“Archisound: Audio Generation With Diffusion”, Schneider 2023
“Msanii: High Fidelity Music Synthesis on a Shoestring Budget”, Maina 2023
“Rock Guitar Tablature Generation via Natural Language Processing”, Casco-Rodriguez 2023
“VALL-E: Neural Codec Language Models Are Zero-Shot Text to Speech Synthesizers”, Wang et al 2023
“Robust Speech Recognition via Large-Scale Weak Supervision”, Radford et al 2022
“High Fidelity Neural Audio Compression”, Défossez et al 2022
“Hierarchical Diffusion Models for Singing Voice Neural Vocoder”, Takahashi et al 2022
“RealSinger: Ultra-Realistic Singing Voice Generation via Stochastic Differential Equations”, Anonymous 2022
“AudioLM: a Language Modeling Approach to Audio Generation”, Borsos et al 2022
“MeloForm: Generating Melody With Musical Form Based on Expert Systems and Neural Networks”, Lu et al 2022
“AI Composer Bias: Listeners like Music Less When They Think It Was Composed by an AI”, Shank et al 2022
“Musika! Fast Infinite Waveform Music Generation”, Pasini & Schlüter 2022
“Multitrack Music Transformer: Learning Long-Term Dependencies in Music With Diverse Instruments”, Dong et al 2022
“CLAP: Learning Audio Concepts From Natural Language Supervision”, Elizalde et al 2022
“BigVGAN: A Universal Neural Vocoder With Large-Scale Training”, Lee et al 2022
“Tradformer: A Transformer Model of Traditional Music Transcriptions”, Casini & Sturm 2022
“SymphonyNet: Symphony Generation With Permutation Invariant Language Model”, Liu et al 2022
“It’s Raw! Audio Generation With State-Space Models”, Goel et al 2022
“General-purpose, Long-context Autoregressive Modeling With Perceiver AR”, Hawthorne et al 2022
“FIGARO: Generating Symbolic Music With Fine-Grained Artistic Control”, Rütte et al 2022
“Steerable Discovery of Neural Audio Effects”, Steinmetz & Reiss 2021
“Semi-Supervised Music Tagging Transformer”, Won et al 2021
“AudioCLIP: Extending CLIP to Image, Text and Audio”, Guzhov et al 2021
“MLP Singer: Towards Rapid Parallel Korean Singing Voice Synthesis”, Tae et al 2021
“PriorGrad: Improving Conditional Denoising Diffusion Models With Data-Dependent Adaptive Prior”, Lee et al 2021
“DiffSinger: Singing Voice Synthesis via Shallow Diffusion Mechanism”, Liu et al 2021
“Symbolic Music Generation With Diffusion Models”, Mittal et al 2021
“Interacting With GPT-2 to Generate Controlled and Believable Musical Sequences in ABC Notation”, Geerlings & Meroño-Peñuela 2020
“HiFi-GAN: Generative Adversarial Networks for Efficient and High Fidelity Speech Synthesis”, Kong et al 2020
“AI Song Contest: Human-AI Co-Creation in Songwriting”, Huang et al 2020
“DeepSinger: Singing Voice Synthesis With Data Mined From the Web”, Ren et al 2020
“Learning Music Helps You Read: Using Transfer to Study Linguistic Structure in Language Models”, Papadimitriou & Jurafsky 2020
“15.ai”, Fifteen-kun & Pony Preservation Project 2020
“Pop Music Transformer: Beat-based Modeling and Generation of Expressive Pop Piano Compositions”, Huang & Yang 2020
“Writing the Next American Hit: Using GPT-2 to Explore the Possibility of Creating Successful AI-Generated Song Lyrics”, Barrio 2020
“GPT-2 Preference Learning for Music Generation”, Gwern 2019
“Encoding Musical Style With Transformer Autoencoders”, Choi et al 2019
“GPT-2 Folk Music”, Branwen & Presser 2019
“Parallel WaveGAN: A Fast Waveform Generation Model Based on Generative Adversarial Networks With Multi-resolution Spectrogram”, Yamamoto et al 2019
“Low-dimensional Embodied Semantics for Music and Language”, Raposo et al 2019
“MuseNet: a Deep Neural Network That Can Generate 4-minute Musical Compositions With 10 Different Instruments, and Can Combine Styles from Country to Mozart to the Beatles”, Payne 2019
“Generative Modeling With Sparse Transformers: We’ve Developed the Sparse Transformer, a Deep Neural Network Which Sets New Records at Predicting What Comes next in a Sequence—whether Text, Images, or Sound. It Uses an Algorithmic Improvement of the attention Mechanism to Extract Patterns from Sequences 30× Longer Than Possible Previously”, Child & Gray 2019
“Music Transformer: Generating Music With Long-Term Structure”, Huang et al 2018
“FloWaveNet: A Generative Flow for Raw Audio”, Kim et al 2018
“Piano Genie”, Donahue et al 2018
“Music Transformer”, Huang et al 2018
“This Time With Feeling: Learning Expressive Musical Performance”, Oore et al 2018
“The Challenge of Realistic Music Generation: Modelling Raw Audio at Scale”, Dieleman et al 2018
“The Sound of Pixels”, Zhao et al 2018
“Efficient Neural Audio Synthesis”, Kalchbrenner et al 2018
“Generating Structured Music through Self-Attention”, Huang et al 2018
“Towards Deep Modeling of Music Semantics Using EEG Regularizers”, Raposo et al 2017
“Objective-Reinforced Generative Adversarial Networks (ORGAN) for Sequence Generation Models”, Guimaraes et al 2017
“Neural Audio Synthesis of Musical Notes With WaveNet Autoencoders”, Engel et al 2017
“Tuning Recurrent Neural Networks With Reinforcement Learning”, Jaques et al 2017
“SampleRNN: An Unconditional End-to-End Neural Audio Generation Model”, Mehri et al 2016
“WaveNet: A Generative Model for Raw Audio”, Oord et al 2016
“The ABC Music Standard 2.1: §3.1.1: X: Reference Number”, Walshaw 2011
“Staring Emmy Straight in the Eye—And Doing My Best Not to Flinch”, Hofstadter & Cope 2001
“Connectionist Music Composition Based on Melodic, Stylistic, and Psychophysical Constraints [Technical Report CU-CS–495–90]”, Mozer 1990
Sort By Magic
Annotations sorted by machine learning into inferred 'tags'. This provides an alternative way to browse: instead of by date order, one can browse in topic order. The 'sorted' list has been automatically clustered into multiple sections & auto-labeled for easier browsing.
Beginning with the newest annotation, it uses the embedding of each annotation to attempt to create a list of nearest-neighbor annotations, creating a progression of topics. For more details, see the link.
music-generation
music-transformer
audio-synthesis
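The nearest-neighbor progression described above can be sketched as a greedy walk over annotation embeddings. This is an illustrative reconstruction, not the site's actual implementation: the function name `magic_sort`, the toy 2-D embeddings, and the start-at-newest convention are all assumptions.

```python
# Hypothetical sketch of the "Sort By Magic" ordering: starting from the
# newest annotation, repeatedly step to the most-similar unvisited
# annotation, producing a progression of topics rather than a date order.
import numpy as np

def magic_sort(embeddings: np.ndarray) -> list[int]:
    """Order items so each follows its nearest unvisited neighbor (cosine)."""
    n = len(embeddings)
    # Normalize rows so a dot product equals cosine similarity.
    unit = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    order = [0]                      # begin with the newest annotation (index 0)
    remaining = set(range(1, n))
    while remaining:
        current = unit[order[-1]]
        # Pick the unvisited annotation most similar to the current one.
        best = max(remaining, key=lambda i: float(current @ unit[i]))
        order.append(best)
        remaining.remove(best)
    return order

# Toy example: two loose clusters; clustered points end up adjacent.
emb = np.array([[1.0, 0.0], [0.0, 1.0], [0.9, 0.1], [0.1, 0.9]])
print(magic_sort(emb))  # → [0, 2, 3, 1]
```

Clustering the resulting chain into labeled sections (the tags above) would then be a separate step, e.g. cutting the chain where similarity between consecutive items drops.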
Wikipedia
Miscellaneous
- /doc/ai/music/2023-wang-figure1-vallevoicesynthesisautoregressivearchitecture.png
- /doc/ai/music/2021-07-03-purplesmartai-applejack-navysealcopypasta.mp3
- /doc/ai/music/2020-07-07-nshepperd-openaijukebox-gpt3-theuniverseisaglitch.mp3
- /doc/ai/music/2020-04-15-gpt2-midi-thesessionsabc-9199774293.mp3
- /doc/ai/music/2020-04-15-gpt2-midi-thesessionsabc-6791639443.mp3
- /doc/ai/music/2020-04-15-gpt2-midi-thesessionsabc-6473931123.mp3
- /doc/ai/music/2020-04-15-gpt2-midi-thesessionsabc-3400691.mp3
- /doc/ai/music/2020-04-15-gpt2-midi-thesessionsabc-33762535.mp3
- /doc/ai/music/2020-04-15-gpt2-midi-thesessionsabc-3374184109.mp3
- /doc/ai/music/2020-04-15-gpt2-midi-thesessionsabc-3308925389.mp3
- /doc/ai/music/2020-04-15-gpt2-midi-thesessionsabc-32506201.mp3
- /doc/ai/music/2020-04-15-gpt2-midi-thesessionsabc-2625946.mp3
- /doc/ai/music/2020-04-15-gpt2-midi-thesessionsabc-19337613.mp3
- /doc/ai/music/2020-04-15-gpt2-midi-thesessionsabc-1838864.mp3
- /doc/ai/music/2020-04-15-gpt2-midi-thesessionsabc-1772291.mp3
- /doc/ai/music/2020-04-15-gpt2-midi-thesessionsabc-145036110185.mp3
- /doc/ai/music/2020-04-15-gpt2-midi-thesessionsabc-11877811957.mp3
- /doc/ai/music/2020-04-15-gpt2-midi-pop_midi-settttiestodaffatta.mp3
- /doc/ai/music/2020-04-15-gpt2-midi-pop_midi-setsssgtroscnpelciope.mp3
- /doc/ai/music/2020-04-15-gpt2-midi-pop_midi-setgggeneraloperationsmcnewtonmixx.mp3
- /doc/ai/music/2020-04-15-gpt2-midi-pop_mid-ismirlmd_matchedg.mp3
- /doc/ai/music/2020-04-15-gpt2-midi-lmd_full-ee_214dda09c3020.mp3
- /doc/ai/music/2020-04-15-gpt2-midi-lmd_full-dd83b4c18d8897e6.mp3
- /doc/ai/music/2020-04-15-gpt2-midi-lmd_full-99ebb19c3ffcaac7.mp3
- /doc/ai/music/2020-04-15-gpt2-midi-lmd_full-8861e24a8b983dff.mp3
- /doc/ai/music/2020-04-15-gpt2-midi-lmd_full-554f3a38f2676bfe.mp3
- /doc/ai/music/2020-04-01-fifteenai-twilightsparkle-telephonecall.mp3
- /doc/ai/music/2020-03-30-fifteenai-twilightsparkle-sel-presentdaypresenttime.mp3
- /doc/ai/music/2020-03-28-fifteenai-ensemble-hellofellowhumans.mp3
- /doc/ai/music/2020-03-06-fifteenai-twilightsparkle-sithcode.mp3
- /doc/ai/music/2020-01-26-gwern-gpt2-preferencelearning-datacode.tar.xz
- /doc/ai/music/2020-01-25-gpt2-rl-final-bourreeasixdebriantes.mp3
- /doc/ai/music/2019-12-22-gpt2-preferencelearning-gwern-abcmusic.patch
- /doc/ai/music/2019-12-09-gpt2-abccombined-samples-top_p0.95.txt
- /doc/ai/music/2019-12-04-gpt2-combinedabc-invereshieshouse.mp3
- /doc/ai/music/2019-11-10-gpt2-irish-spaceless-50variantsonynbollanbane.mp3
- /doc/ai/music/2019-11-09-gpt2-nospaces-samples-top_p0.99.txt
- /doc/ai/music/2019-11-09-gpt2-irish-spaceless-50medley-topp0.99.mp3
- /doc/ai/music/2019-10-23-gwern-gpt2-folkrnn-irishmusic-samples.txt
- http://mtg.upf.edu/system/files/publications/Font-Roma-Serra-ACMM-2013.pdf
- https://blog.metabrainz.org/2022/02/16/acousticbrainz-making-a-hard-decision-to-end-the-project/
- https://blog.research.google/2023/01/google-research-2022-beyond-language.html
- https://blog.youtube/inside-youtube/ai-and-music-experiment/
- https://colinmeloy.substack.com/p/i-had-chatgpt-write-a-decemberists
- https://deepmind.google/discover/blog/transforming-the-future-of-music-creation/
- https://pitchfork.com/features/article/ai-music-experimentation-or-automation/
- https://twitter.com/vatsal_aggarwal/status/1612536555708743680
- https://www.404media.co/harry-styles-one-direction-ai-leaked-songs/
- https://www.danieldjohnson.com/2015/08/03/composing-music-with-recurrent-neural-networks/
- https://www.engadget.com/drew-carey-made-a-radio-show-with-ai-fans-werent-pleased-143014038.html
- https://www.karolpiczak.com/papers/Piczak2015-ESC-Dataset.pdf
- https://www.nytimes.com/2020/01/07/magazine/hologram-musicians.html
- https://www.vice.com/en/article/k7z8be/torswats-computer-generated-ai-voice-swatting
Link Bibliography
- https://arxiv.org/abs/2305.09636#google : “SoundStorm: Efficient Parallel Audio Generation”, Zalán Borsos, Matt Sharifi, Damien Vincent, Eugene Kharitonov, Neil Zeghidour, Marco Tagliasacchi
- https://arxiv.org/abs/2305.05665#facebook : “ImageBind: One Embedding Space To Bind Them All”, Rohit Girdhar, Alaaeldin El-Nouby, Zhuang Liu, Mannat Singh, Kalyan Vasudev Alwala, Armand Joulin, Ishan Misra
- https://arxiv.org/abs/2304.13731 : “TANGO: Text-to-Audio Generation Using Instruction-Tuned LLM and Latent Diffusion Model”, Deepanway Ghosal, Navonil Majumder, Ambuj Mehrish, Soujanya Poria
- https://raw.githubusercontent.com/flavioschneider/master-thesis/main/audio_diffusion_thesis.pdf : “Archisound: Audio Generation With Diffusion”, Flavio Schneider
- https://arxiv.org/abs/2301.02111#microsoft : “VALL-E: Neural Codec Language Models Are Zero-Shot Text to Speech Synthesizers”
- https://arxiv.org/abs/2210.13438#facebook : “High Fidelity Neural Audio Compression”, Alexandre Défossez, Jade Copet, Gabriel Synnaeve, Yossi Adi
- https://arxiv.org/abs/2210.07508#sony : “Hierarchical Diffusion Models for Singing Voice Neural Vocoder”, Naoya Takahashi, Mayank Kumar Singh, Yuki Mitsufuji
- 2022-shank.pdf : “AI Composer Bias: Listeners like Music Less When They Think It Was Composed by an AI”, Daniel B. Shank, Courtney Stefanik, Cassidy Stuhlsatz, Kaelyn Kacirek, Amy M. Belfi
- https://arxiv.org/abs/2206.04658#nvidia : “BigVGAN: A Universal Neural Vocoder With Large-Scale Training”, Sang-gil Lee, Wei Ping, Boris Ginsburg, Bryan Catanzaro, Sungroh Yoon
- https://arxiv.org/abs/2202.09729 : “It’s Raw! Audio Generation With State-Space Models”, Karan Goel, Albert Gu, Chris Donahue, Christopher Ré
- https://arxiv.org/abs/2202.07765#deepmind : “General-purpose, Long-context Autoregressive Modeling With Perceiver AR”
- https://arxiv.org/abs/2106.13043 : “AudioCLIP: Extending CLIP to Image, Text and Audio”, Andrey Guzhov, Federico Raue, Jörn Hees, Andreas Dengel
- https://fifteen.ai/ : “15.ai”, Fifteen-kun, Pony Preservation Project
- gpt-2-preference-learning : “GPT-2 Preference Learning for Music Generation”, Gwern
- gpt-2-music : “GPT-2 Folk Music”, Gwern Branwen, Shawn Presser
- https://openai.com/research/musenet : “MuseNet: a Deep Neural Network That Can Generate 4-minute Musical Compositions With 10 Different Instruments, and Can Combine Styles from Country to Mozart to the Beatles”, Christine Payne
- https://magenta.tensorflow.org/music-transformer : “Music Transformer: Generating Music With Long-Term Structure”, Cheng-Zhi Anna Huang, Ian Simon, Monica Dinculescu
- https://arxiv.org/abs/1811.02155 : “FloWaveNet: A Generative Flow for Raw Audio”, Sungwon Kim, Sang-gil Lee, Jongyoon Song, Jaehyeon Kim, Sungroh Yoon
- 2018-huang.pdf : “Generating Structured Music through Self-Attention”