Creator of OpenWebText and OpenGPT2. PyTorch Core Reviewer. PhD Student at @Cornell (interning at @MosaicML). Previously at @FacebookAI and @BrownUniversity.

Joined September 2015
Aaron Gokaslan retweeted
New year, new MME 🎉 @dskhudia and I profiled @Intel Gaudi2 accelerators for LLM training and inference, and found great performance and perf/$! databricks.com/blog/llm-trai…
Aaron Gokaslan retweeted
ByteDance announces Diffusion Model with Perceptual Loss

paper page: huggingface.co/papers/2401.0…

Diffusion models trained with mean squared error loss tend to generate unrealistic samples. Current state-of-the-art models rely on classifier-free guidance to improve sample quality, yet its surprising effectiveness is not fully understood. In this paper, we show that the effectiveness of classifier-free guidance partly originates from it being a form of implicit perceptual guidance. As a result, we can directly incorporate perceptual loss in diffusion training to improve sample quality. Since the score matching objective used in diffusion training strongly resembles the denoising autoencoder objective used in unsupervised training of perceptual networks, the diffusion model itself is a perceptual network and can be used to generate meaningful perceptual loss. We propose a novel self-perceptual objective that results in diffusion models capable of generating more realistic samples. For conditional generation, our method only improves sample quality without entanglement with the conditional input and therefore does not sacrifice sample diversity. Our method can also improve sample quality for unconditional generation, which was not possible with classifier-free guidance before.
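To make the "diffusion model as its own perceptual network" idea concrete, here is a minimal sketch of what such a self-perceptual loss could look like; the paper's exact objective differs, and the names here (self_perceptual_loss, noise_sched, the x0-prediction convention) are illustrative assumptions, not the paper's API:

```python
import copy
import torch
import torch.nn.functional as F

def self_perceptual_loss(model, x0, t, noise_sched):
    """Hypothetical sketch: score the denoiser's x0 prediction through
    the diffusion model's own features instead of raw pixel MSE.
    Assumes model(x_t, t) predicts x0 and noise_sched(t) returns
    (alpha_t, sigma_t) broadcastable against x0."""
    # In practice you would keep one frozen (or EMA) copy around,
    # not deepcopy the model on every training step.
    frozen = copy.deepcopy(model).eval().requires_grad_(False)

    eps = torch.randn_like(x0)
    alpha, sigma = noise_sched(t)
    x_t = alpha * x0 + sigma * eps
    x0_hat = model(x_t, t)                     # online model's prediction

    # Re-noise both the target and the prediction at a fresh timestep t2
    # with the SAME noise, then compare the frozen model's outputs.
    t2 = torch.rand_like(t)
    a2, s2 = noise_sched(t2)
    eps2 = torch.randn_like(x0)
    f_real = frozen(a2 * x0 + s2 * eps2, t2)
    f_pred = frozen(a2 * x0_hat + s2 * eps2, t2)
    return F.mse_loss(f_pred, f_real)
```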
Aaron Gokaslan retweeted
the OpenAI-NYT lawsuit is a big deal for copyright precedent. literally all popular models right now were trained on copyrighted data... except for one. my friend from school @SkyLi0n developed a diffusion model that's not trained on any copyrighted data. it's called CommonCanvas
Aaron Gokaslan retweeted
Heads up to SLURM users: Does your SLURM task get a single cpu-core instead of many? If so, you need to be aware that in recent SLURM versions srun no longer inherits --cpus-per-task. I explained how to diagnose and fix this issue here: github.com/stas00/ml-enginee… Even though our SLURM env isn't impacted, I updated our templates to future-proof them for when the SLURM version gets updated. This change in behavior was reported here: github.com/Lightning-AI/pyto…
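For reference, the symptom and the two common fixes look roughly like this (a hedged sbatch sketch; the behavior change landed around SLURM 22.05, but check your cluster's version and the linked write-up for specifics):

```bash
#!/bin/bash
#SBATCH --cpus-per-task=8      # the job allocation gets 8 cores per task...

# ...but on newer SLURM, srun no longer inherits --cpus-per-task,
# so a bare `srun python train.py` may end up pinned to a single core.

# Fix 1: pass the value to srun explicitly
srun --cpus-per-task=$SLURM_CPUS_PER_TASK python train.py

# Fix 2 (equivalent): export the variable srun reads
# export SRUN_CPUS_PER_TASK=$SLURM_CPUS_PER_TASK
# srun python train.py
```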
The Fourier Transform, explained in one sentence by Stuart Riffle. [bityl.co/NGqj]
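If you want that one sentence as runnable code, here is a minimal numpy sketch of the idea (spin the signal around a circle at the frequency of interest and average the points along the path); dft_bin is an illustrative name, not from the linked post:

```python
import numpy as np

def dft_bin(signal, k):
    """Strength of frequency k: rotate each sample to its spot on a
    circle spun at that frequency, then average the resulting points."""
    n = np.arange(len(signal))
    spun = signal * np.exp(-2j * np.pi * k * n / len(signal))
    return spun.mean()

# Sanity check against numpy's FFT (which omits the 1/N averaging)
x = np.random.randn(64)
assert np.allclose(dft_bin(x, 5) * len(x), np.fft.fft(x)[5])
```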
Aaron Gokaslan retweeted
999 github issues on the wall, 999 github issues, take one down, fix it, all done, 998 github issues on the wall 🎵
✨Introducing diffusion with learned adaptive noise, a new state-of-the-art model for density estimation✨ Our key idea is to learn the diffusion process from data (instead of it being fixed). This yields a tighter ELBO, faster training, and more! Paper: arxiv.org/pdf/2312.13236.pdf
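As a toy illustration of "learning the diffusion process from data": a VDM-style scalar sketch where the log-SNR schedule gamma(t) is a small monotone network trained jointly with the denoiser. This is a simplification for intuition (the paper learns a multivariate, adaptive schedule; see the link above), and all names below are illustrative:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MonotoneLogSNR(nn.Module):
    """Toy learned noise schedule gamma(t), monotone in t by construction
    (positive weights + monotone activations), trainable through the ELBO."""
    def __init__(self, hidden=64):
        super().__init__()
        self.w1 = nn.Parameter(torch.randn(hidden, 1))
        self.b1 = nn.Parameter(torch.zeros(hidden))
        self.w2 = nn.Parameter(torch.randn(1, hidden))

    def forward(self, t):                        # t: (batch, 1) in [0, 1]
        h = torch.tanh(t @ F.softplus(self.w1).T + self.b1)
        return h @ F.softplus(self.w2).T         # gamma(t), increasing in t

def diffuse(x0, gamma_t):
    """Sample z_t ~ q(z_t | x0) under the VDM parameterization
    alpha_t^2 = sigmoid(-gamma_t), sigma_t^2 = sigmoid(gamma_t)."""
    alpha = torch.sigmoid(-gamma_t).sqrt()
    sigma = torch.sigmoid(gamma_t).sqrt()
    return alpha * x0 + sigma * torch.randn_like(x0)
```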
It's crazy how many modern generative models are 15-year-old Aapo Hyvarinen papers.

Noise contrastive estimation => GANs
Score matching => diffusion
Ratio matching => discrete diffusion

If I were a student today, I'd carefully read Aapo's papers; they're a gold mine of ideas.
Aaron Gokaslan retweeted
Really, really thankful to @willknight for covering this *critical* piece of the AI puzzle. It's something few would otherwise pay attention to, yet will affect everything that AI will become in the future. Also great quotes from @ruchowdh & @YJernite wired.com/story/americas-ai-…
Aaron Gokaslan retweeted
Scrutiny into open-source datasets used for ML is a good thing. That being said, we should collectively aim to direct at least as much scrutiny (and require as much transparency) into closed-source datasets. Otherwise this likely leads to (even) less transparency in the data & datasets used to train AI models in the future.
Aaron Gokaslan retweeted
Academics: "You should finish your PhD with three papers that you are decidedly passionate about." Job market: "Minimum requirement: 8 top-tier conference papers for research scientist roles."
Since we just wrapped up an AI megaconference, it felt like a good day to plead for fewer papers. argmin.net/p/too-much-inform…
Aaron Gokaslan retweeted
Stories have 6 primary arcs:
• “Rags to riches” (rise)
• “Tragedy” (fall)
• “Man in a hole” (fall-rise)
• “Icarus” (rise-fall)
• “Cinderella” (rise-fall-rise)
• “Oedipus” (fall-rise-fall)

Programmatic analysis validates Kurt Vonnegut's legendary rejected master's thesis.
Aaron Gokaslan retweeted
One paper can change your life. But which one? Overproductivity doesn't just come from paper counting, but from the desperate acts of young researchers under extreme pressure to be part of that one paper.
Since we just wrapped up an AI megaconference, it felt like a good day to plead for fewer papers. argmin.net/p/too-much-inform…
At posters #1202 and #1203
Come check out my posters on MuLAN (Multivariate Learned Adaptive Noise) and CommonCanvas at the Diffusion Model Workshop at #NeurIPS #NeurIPS2023
Aaron Gokaslan retweeted
To the ACs of @CVPR #CVPR2024: if you have not, log in to OpenReview and re-check your assignments. There is a hard limit of 1 student/paper, and this caused some rather random matches to happen. Reviewers with ~0 topic experience and ~0 citations appointed instead of rising stars 🤯
2-bit LLaMAs are here! 🦙✨ The new QuIP# ("quip-sharp") algorithm enables running the largest 70B models on consumer-level 24GB GPUs with only a minimal drop in accuracy. Amazing work led by Cornell students @tsengalb99 @CheeJerry + colleagues @qingyao_sun @chrismdesa [1/n]
🧵 (1/n) 👉 Introducing QuIP#, a new SOTA LLM quantization method that uses incoherence processing from QuIP & lattices to achieve 2-bit LLMs with near-fp16 performance! Now you can run LLaMA 2 70B on a 24GB GPU w/out offloading! 💻 cornell-relaxml.github.io/qu…
Aaron Gokaslan retweeted
🧵 (1/n) 👉 Introducing QuIP#, a new SOTA LLM quantization method that uses incoherence processing from QuIP & lattices to achieve 2-bit LLMs with near-fp16 performance! Now you can run LLaMA 2 70B on a 24GB GPU w/out offloading! 💻 cornell-relaxml.github.io/qu…
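Rough intuition for the incoherence-processing step, as a toy sketch: rotate the weight matrix by random orthogonal matrices so no single entry is an outlier, quantize, then rotate back. QuIP# actually uses randomized Hadamard transforms and E8 lattice codebooks; the scalar 2-bit quantizer and helper names below are illustrative only:

```python
import torch

def random_orthogonal(n, seed=0):
    """Random orthogonal matrix via QR (the real method uses fast
    structured transforms; this is just the idea)."""
    g = torch.Generator().manual_seed(seed)
    q, r = torch.linalg.qr(torch.randn(n, n, generator=g))
    return q * torch.sign(torch.diag(r))    # fix column signs

def quantize_2bit(w):
    """Toy 4-level (2-bit) scalar quantizer; QuIP# uses lattice
    codebooks instead of a scalar grid."""
    levels = torch.tensor([-1.5, -0.5, 0.5, 1.5])
    scale = w.abs().mean()
    idx = torch.argmin((w.unsqueeze(-1) - levels * scale).abs(), dim=-1)
    return levels[idx] * scale

# Incoherence processing: rotate, quantize, rotate back.
W = torch.randn(256, 256)
U, V = random_orthogonal(256, seed=1), random_orthogonal(256, seed=2)
W_hat = U @ quantize_2bit(U.T @ W @ V) @ V.T
print((W - W_hat).norm() / W.norm())        # relative reconstruction error
```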
Aaron Gokaslan retweeted
Me on arXiv this week.
The more you know…