I received my PhD from UC Berkeley, where I was advised by Dawn Song and Jacob Steinhardt, and my BS from UChicago. I am now the director of the Center for AI Safety, and my research focuses on AI safety. My research is supported by the NSF GRFP and the Open Philanthropy AI Fellowship. I helped contribute the GELU activation function (the most-used activation in state-of-the-art models including BERT, GPT, Vision Transformers, etc.), the out-of-distribution detection baseline, and distribution shift benchmarks.
Works
- Understanding Large Language Models: Foundations and Safety (UC Berkeley Course)
Dawn Song and Dan Hendrycks
- The WMDP Benchmark: Measuring and Reducing Malicious Use With Unlearning
Nathaniel Li, Alexander Pan, Anjali Gopal, Summer Yue, Daniel Berrios, Alice Gatti, Justin D. Li, Ann-Kathrin Dombrowski, Shashwat Goel, Long Phan, Gabriel Mukobi, Nathan Helm-Burger, Rassin Lababidi, Lennart Justen, Andrew B. Liu, Michael Chen, Isabelle Barrass, Oliver Zhang, Xiaoyuan Zhu, Rishub Tamirisa, Bhrugu Bharathi, Adam Khoja, Zhenqi Zhao, Ariel Herbert-Voss, Cort B. Breuer, Andy Zou, Mantas Mazeika, Zifan Wang, Palash Oswal, Weiran Liu, Adam A. Hunt, Justin Tienken-Harder, Kevin Y. Shih, Kemper Talley, John Guan, Russell Kaplan, Ian Steneker, David Campbell, Brad Jokubaitis, Alex Levinson, Jean Wang, William Qian, Kallol Krishna Karmakar, Steven Basart, Stephen Fitz, Mindy Levine, Ponnurangam Kumaraguru, Uday Tupakula, Vijay Varadharajan, Yan Shoshitaishvili, Jimmy Ba, Kevin M. Esvelt, Alexandr Wang, Dan Hendrycks
- HarmBench: A Standardized Evaluation Framework for Automated Red Teaming and Robust Refusal
Mantas Mazeika, Long Phan, Xuwang Yin, Andy Zou, Zifan Wang, Norman Mu, Elham Sakhaee, Nathaniel Li, Steven Basart, Bo Li, David Forsyth, Dan Hendrycks
- Representation Engineering: A Top-Down Approach to AI Transparency
Andy Zou, Long Phan, Sarah Chen, James Campbell, Phillip Guo, Richard Ren, Alexander Pan, Xuwang Yin, Mantas Mazeika, Ann-Kathrin Dombrowski, Shashwat Goel, Nathaniel Li, Michael J. Byun, Zifan Wang, Alex Mallen, Steven Basart, Sanmi Koyejo, Dawn Song, Matt Fredrikson, Zico Kolter, Dan Hendrycks
- Can LLMs Follow Simple Rules?
Norman Mu, Sarah Chen, Zifan Wang, Sizhe Chen, David Karamardian, Lulwa Aljeraisy, Dan Hendrycks, David Wagner
- AI Deception: A Survey of Examples, Risks, and Potential Solutions
Peter S. Park, Simon Goldstein, Aidan O'Gara, Michael Chen, Dan Hendrycks
Patterns
- Do the Rewards Justify the Means? Measuring Trade-Offs Between Rewards and Ethical Behavior in the MACHIAVELLI Benchmark
Alexander Pan*, Chan Jun Shern*, Andy Zou*, Nathaniel Li, Steven Basart, Thomas Woodside, Jonathan Ng, Hanlin Zhang, Scott Emmons, Dan Hendrycks
ICML 2023
- A Unified Survey on Anomaly, Novelty, Open-Set, and Out-of-Distribution Detection: Solutions and Future Challenges
Mohammadreza Salehi, Hossein Mirzaei, Dan Hendrycks, Yixuan Li, Mohammad Hossein Rohban, Mohammad Sabokrou
TMLR
- Forecasting Future World Events with Neural Networks
Andy Zou, Tristan Xiao, Ryan Jia, Joe Kwon, Mantas Mazeika, Richard Li, Dawn Song, Jacob Steinhardt, Owain Evans, Dan Hendrycks
NeurIPS 2022
- How Would The Viewer Feel? Estimating Wellbeing From Video Scenarios
Mantas Mazeika*, Eric Tang*, Andy Zou, Steven Basart, Dawn Song, David Forsyth, Jacob Steinhardt, Dan Hendrycks
NeurIPS 2022
- OpenOOD: Benchmarking Generalized Out-of-Distribution Detection
Jingkang Yang, Pengyun Wang, Dejian Zou, Zitang Zhou, Kunyuan Ding, Wenxuan Peng, Haoqi Wang, Guangyao Chen, Bo Li, Yiyou Sun, Xuefeng Du, Kaiyang Zhou, Wayne Zhang, Dan Hendrycks, Yixuan Li, Ziwei Liu
NeurIPS 2022
- Actionable Guidance for High-Consequence AI Risk Management: Towards Standards Addressing AI Catastrophic Risks
Anthony M. Barrett, Dan Hendrycks, Jessica Newman, Brandie Nonnecke
- A Spectral View of Randomized Smoothing under Common Corruptions: Benchmarking and Improving Certified Robustness
Jiachen Sun, Akshay Mehra, Bhavya Kailkhura, Pin-Yu Chen, Dan Hendrycks, Jihun Hamm, Zhuoqing Mao
ECCV 2022
- PixMix: Dreamlike Pictures Comprehensively Improve Safety Measures
Dan Hendrycks*, Andy Zou*, Mantas Mazeika, Leonard Tang, Bo Li, Dawn Song, and Jacob Steinhardt
CVPR 2022
- Scaling Out-of-Distribution Detection for Real-World Settings
Dan Hendrycks*, Steven Basart*, Mantas Mazeika, Andy Zou, Joe Kwon, Mohammadreza Mostajabi, Jacob Steinhardt, Dawn Song
ICML 2022
- Unsolved Problems in ML Safety
Dan Hendrycks, Nicholas Carlini, John Schulman, and Jacob Steinhardt
Position Paper
- What Would Jiminy Cricket Do? Towards Agents That Behave Morally
Dan Hendrycks*, Mantas Mazeika*, Andy Zou, Sahil Patel, Christine Zhu, Jesus Navarro, Dawn Song, Bo Li, Jacob Steinhardt
NeurIPS 2021
- Measuring Coding Challenge Competence With APPS
Dan Hendrycks*, Steven Basart*, Saurav Kadavath, Mantas Mazeika, Akul Arora, Ethan Guo, Collin Burns, Samir Puranik, Horace He, Dawn Song, Jacob Steinhardt
NeurIPS 2021
- Measuring Mathematical Problem Solving With the MATH Dataset
Dan Hendrycks, Collin Burns, Saurav Kadavath, Akul Arora, Steven Basart, Eric Tang, Dawn Song, Jacob Steinhardt
NeurIPS 2021
- CUAD: An Expert-Annotated NLP Dataset for Legal Contract Review
Dan Hendrycks*, Collin Burns*, Anya Chen, Spencer Ball
NeurIPS 2021
- The Many Faces of Robustness: A Critical Analysis of Out-of-Distribution Generalization
Dan Hendrycks, Steven Basart*, Norman Mu*, Saurav Kadavath, Frank Wang, Evan Dorundo, Rahul Desai, Tyler Zhu, Samyak Parajuli, Mike Guo, Dawn Song, Jacob Steinhardt, Justin Gilmer
ICCV 2021
- Natural Adversarial Examples
Dan Hendrycks, Kevin Zhao*, Steven Basart*, Jacob Steinhardt, Dawn Song
CVPR 2021
- Measuring Massive Multitask Language Understanding
Dan Hendrycks, Collin Burns, Steven Basart, Andy Zou, Mantas Mazeika, Dawn Song, Jacob Steinhardt
ICLR 2021
- Aligning AI With Shared Human Values
Dan Hendrycks*, Collin Burns*, Steven Basart, Andrew Critch, Jerry Li, Dawn Song, Jacob Steinhardt
ICLR 2021
- Pretrained Transformers Improve Out-of-Distribution Robustness
Dan Hendrycks*, Xiaoyuan Liu*, Eric Wallace, Adam Dziedzic, Rishabh Krishnan, Dawn Song
ACL 2020
- AugMix: A Simple Data Processing Method to Improve Robustness and Uncertainty
Dan Hendrycks*, Norman Mu*, Ekin D. Cubuk, Barret Zoph, Justin Gilmer, Balaji Lakshminarayanan
ICLR 2020
- Using Self-Supervised Learning Can Improve Model Robustness and Uncertainty
Dan Hendrycks, Mantas Mazeika*, Saurav Kadavath*, Dawn Song
NeurIPS 2019
- Testing Robustness Against Unforeseen Adversaries
Daniel Kang*, Yi Sun*, Dan Hendrycks, Tom Brown, Jacob Steinhardt
- Adversarial Example Researchers Need to Expand What is Meant by ‘Robustness’
Justin Gilmer, Dan Hendrycks
Distill 2019
- Using Pre-Training Can Improve Model Robustness and Uncertainty
Dan Hendrycks, Kimin Lee, Mantas Mazeika
ICML 2019
- Deep Anomaly Detection with Outlier Exposure
Dan Hendrycks, Mantas Mazeika, Thomas Dietterich
ICLR 2019
- Benchmarking Neural Network Robustness to Common Corruptions and Perturbations
Dan Hendrycks, Thomas Dietterich
ICLR 2019
- Using Trusted Data to Train Deep Networks on Labels Corrupted by Severe Noise
Dan Hendrycks*, Mantas Mazeika*, Duncan Wilson, Kevin Gimpel
NeurIPS 2018
- Open Category Detection with PAC Guarantees
Si Liu, Risheek Garrepalli, Thomas G. Dietterich, Alan Fern, Dan Hendrycks
ICML 2018
- A Baseline for Detecting Misclassified and Out-of-Distribution Examples in Neural Networks
Dan Hendrycks, Kevin Gimpel
ICLR 2017
Service
- I co-organized the Workshop on Robustness and Uncertainty Estimation in Deep Learning at ICML 2019, 2020, 2021; an Adversarial Machine Learning workshop at ICML 2021; and the VISDA domain adaptation competition at NeurIPS 2021.
- I have reviewed for CVPR (2019, 2020), ICLR (2019, 2020), NeurIPS (2017, 2020), ICML (2018, 2019), ICCV (2019), ECCV (2018), Transactions on Affective Computing, AI and Ethics, IJCV, TPAMI, and JMLR.