- See Also
- Links
- “Universal Jailbreak Backdoors from Poisoned Human Feedback”, Rando & Tramèr 2023
- “Summon a Demon and Bind It: A Grounded Theory of LLM Red Teaming in the Wild”, Inie et al 2023
- “Tensor Trust: Interpretable Prompt Injection Attacks from an Online Game”, Toyer et al 2023
- “Prompt-Specific Poisoning Attacks on Text-to-Image Generative Models”, Shan et al 2023
- “Consistency Trajectory Models (CTM): Learning Probability Flow ODE Trajectory of Diffusion”, Kim et al 2023
- “How Robust Is Google’s Bard to Adversarial Image Attacks?”, Dong et al 2023
- “Why Do Universal Adversarial Attacks Work on Large Language Models?: Geometry Might Be the Answer”, Subhash et al 2023
- “Investigating the Existence of ‘Secret Language’ in Language Models”, Wang et al 2023
- “CLIPMasterPrints: Fooling Contrastive Language-Image Pre-training Using Latent Variable Evolution”, Freiberger et al 2023
- “On the Exploitability of Instruction Tuning”, Shu et al 2023
- “Evaluating Superhuman Models With Consistency Checks”, Fluri et al 2023
- “Large Language Models Sometimes Generate Purely Negatively-Reinforced Text”, Roger 2023
- “TrojText: Test-time Invisible Textual Trojan Insertion”, Liu et al 2023
- “Glaze: Protecting Artists from Style Mimicry by Text-to-Image Models”, Shan et al 2023
- “Facial Misrecognition Systems: Simple Weight Manipulations Force DNNs to Err Only on Specific Persons”, Zehavi & Shamir 2023
- “TrojanPuzzle: Covertly Poisoning Code-Suggestion Models”, Aghakhani et al 2023
- “SNAFUE: Diagnostics for Deep Neural Networks With Automated Copy/Paste Attacks”, Casper et al 2022
- “Are AlphaZero-like Agents Robust to Adversarial Perturbations?”, Lan et al 2022
- “Rickrolling the Artist: Injecting Invisible Backdoors into Text-Guided Image Generation Models”, Struppek et al 2022
- “On Optimal Learning Under Targeted Data Poisoning”, Hanneke et al 2022
- “BTD: Decompiling X86 Deep Neural Network Executables”, Liu et al 2022
- “Discovering Bugs in Vision Models Using Off-the-shelf Image Generation and Captioning”, Wiles et al 2022
- “Adversarially Trained Neural Representations May Already Be As Robust As Corresponding Biological Neural Representations”, Guo et al 2022
- “Flatten the Curve: Efficiently Training Low-Curvature Neural Networks”, Srinivas et al 2022
- “Why Robust Generalization in Deep Learning Is Difficult: Perspective of Expressive Power”, Li et al 2022
- “Diffusion Models for Adversarial Purification”, Nie et al 2022
- “Planting Undetectable Backdoors in Machine Learning Models”, Goldwasser et al 2022
- “Transfer Attacks Revisited: A Large-Scale Empirical Study in Real Computer Vision Settings”, Mao et al 2022
- “On the Effectiveness of Dataset Watermarking in Adversarial Settings”, Tekgul & Asokan 2022
- “An Equivalence Between Data Poisoning and Byzantine Gradient Attacks”, Farhadkhani et al 2022
- “Red Teaming Language Models With Language Models”, Perez et al 2022
- “WANLI: Worker and AI Collaboration for Natural Language Inference Dataset Creation”, Liu et al 2022
- “CommonsenseQA 2.0: Exposing the Limits of AI through Gamification”, Talmor et al 2022
- “Models in the Loop: Aiding Crowdworkers With Generative Annotation Assistants”, Bartolo et al 2021
- “Deep Reinforcement Learning Policies Learn Shared Adversarial Features Across MDPs”, Korkmaz 2021
- “PROMPT WAYWARDNESS: The Curious Case of Discretized Interpretation of Continuous Prompts”, Khashabi et al 2021
- “TnT Attacks! Universal Naturalistic Adversarial Patches Against Deep Neural Network Systems”, Doan et al 2021
- “AugMax: Adversarial Composition of Random Augmentations for Robust Training”, Wang et al 2021
- “Unrestricted Adversarial Attacks on ImageNet Competition”, Chen et al 2021
- “Partial Success in Closing the Gap between Human and Machine Vision”, Geirhos et al 2021
- “A Universal Law of Robustness via Isoperimetry”, Bubeck & Sellke 2021
- “Manipulating SGD With Data Ordering Attacks”, Shumailov et al 2021
- “Gradient-based Adversarial Attacks against Text Transformers”, Guo et al 2021
- “A Law of Robustness for Two-layers Neural Networks”, Bubeck et al 2021
- “Multimodal Neurons in Artificial Neural Networks [CLIP]”, Goh et al 2021
- “Words As a Window: Using Word Embeddings to Explore the Learned Representations of Convolutional Neural Networks”, Dharmaretnam et al 2021
- “Bot-Adversarial Dialogue for Safe Conversational Agents”, Xu et al 2021
- “Unadversarial Examples: Designing Objects for Robust Vision”, Salman et al 2020
- “Concealed Data Poisoning Attacks on NLP Models”, Wallace et al 2020
- “Recipes for Safety in Open-domain Chatbots”, Xu et al 2020
- “Uncovering the Limits of Adversarial Training against Norm-Bounded Adversarial Examples”, Gowal et al 2020
- “Dataset Cartography: Mapping and Diagnosing Datasets With Training Dynamics”, Swayamdipta et al 2020
- “Collaborative Learning in the Jungle (Decentralized, Byzantine, Heterogeneous, Asynchronous and Nonconvex Learning)”, El-Mhamdi et al 2020
- “Do Adversarially Robust ImageNet Models Transfer Better?”, Salman et al 2020
- “Smooth Adversarial Training”, Xie et al 2020
- “Sponge Examples: Energy-Latency Attacks on Neural Networks”, Shumailov et al 2020
- “Improving the Interpretability of FMRI Decoding Using Deep Neural Networks and Adversarial Robustness”, McClure et al 2020
- “Approximate Exploitability: Learning a Best Response in Large Games”, Timbers et al 2020
- “Radioactive Data: Tracing through Training”, Sablayrolles et al 2020
- “Adversarial Examples Improve Image Recognition”, Xie et al 2019
- “Fooling LIME and SHAP: Adversarial Attacks on Post Hoc Explanation Methods”, Slack et al 2019
- “The Bouncer Problem: Challenges to Remote Explainability”, Merrer & Tredan 2019
- “Distributionally Robust Language Modeling”, Oren et al 2019
- “Universal Adversarial Triggers for Attacking and Analyzing NLP”, Wallace et al 2019
- “Robustness Properties of Facebook’s ResNeXt WSL Models”, Orhan 2019
- “Intriguing Properties of Adversarial Training at Scale”, Xie & Yuille 2019
- “Adversarially Robust Generalization Just Requires More Unlabeled Data”, Zhai et al 2019
- “Are Labels Required for Improving Adversarial Robustness?”, Uesato et al 2019
- “Adversarial Policies: Attacking Deep Reinforcement Learning”, Gleave et al 2019
- “Adversarial Examples Are Not Bugs, They Are Features”, Ilyas et al 2019
- “Benchmarking Neural Network Robustness to Common Corruptions and Perturbations”, Hendrycks & Dietterich 2019
- “Smooth Adversarial Examples”, Zhang et al 2019
- “Fairwashing: the Risk of Rationalization”, Aïvodji et al 2019
- “Evolving Super Stimuli for Real Neurons Using Deep Generative Networks”, Ponce et al 2019
- “AdVersarial: Perceptual Ad Blocking Meets Adversarial Machine Learning”, Tramèr et al 2018
- “Adversarial Reprogramming of Text Classification Neural Networks”, Neekhara et al 2018
- “Benchmarking Neural Network Robustness to Common Corruptions and Surface Variations”, Hendrycks & Dietterich 2018
- “Adversarial Reprogramming of Neural Networks”, Elsayed et al 2018
- “Greedy Attack and Gumbel Attack: Generating Adversarial Examples for Discrete Data”, Yang et al 2018
- “Towards the First Adversarially Robust Neural Network Model on MNIST”, Schott et al 2018
- “Sensitivity and Generalization in Neural Networks: an Empirical Study”, Novak et al 2018
- “Adversarial Vulnerability for Any Classifier”, Fawzi et al 2018
- “Intriguing Properties of Adversarial Examples”, Cubuk et al 2018
- “First-order Adversarial Vulnerability of Neural Networks and Input Dimension”, Simon-Gabriel et al 2018
- “Adversarial Spheres”, Gilmer et al 2018
- “CycleGAN, a Master of Steganography”, Chu et al 2017
- “Adversarial Phenomenon in the Eyes of Bayesian Deep Learning”, Rawat et al 2017
- “Mitigating Adversarial Effects Through Randomization”, Xie et al 2017
- “Learning Universal Adversarial Perturbations With Generative Models”, Hayes & Danezis 2017
- “Robust Physical-World Attacks on Deep Learning Models”, Eykholt et al 2017
- “Towards Deep Learning Models Resistant to Adversarial Attacks”, Madry et al 2017
- “Ensemble Adversarial Training: Attacks and Defenses”, Tramèr et al 2017
- “The Space of Transferable Adversarial Examples”, Tramèr et al 2017
- “Adversarial Examples in the Physical World”, Kurakin et al 2016
- “Foveation-based Mechanisms Alleviate Adversarial Examples”, Luo et al 2015
- “Explaining and Harnessing Adversarial Examples”, Goodfellow et al 2014
- “Pixels Still Beat Text: Attacking the OpenAI CLIP Model With Text Patches and Adversarial Pixel Perturbations”
- NoaNabeshima
- “A Universal Law of Robustness”
- “Apple or iPod? Easy Fix for Adversarial Textual Attacks on OpenAI's CLIP Model!”
- “A Law of Robustness and the Importance of Overparameterization in Deep Learning”
- Sort By Magic
- Wikipedia
- Miscellaneous
- Link Bibliography
See Also
Links
“Universal Jailbreak Backdoors from Poisoned Human Feedback”, Rando & Tramèr 2023
“Summon a Demon and Bind It: A Grounded Theory of LLM Red Teaming in the Wild”, Inie et al 2023
“Tensor Trust: Interpretable Prompt Injection Attacks from an Online Game”, Toyer et al 2023
“Prompt-Specific Poisoning Attacks on Text-to-Image Generative Models”, Shan et al 2023
“Consistency Trajectory Models (CTM): Learning Probability Flow ODE Trajectory of Diffusion”, Kim et al 2023
“How Robust Is Google’s Bard to Adversarial Image Attacks?”, Dong et al 2023
“Why Do Universal Adversarial Attacks Work on Large Language Models?: Geometry Might Be the Answer”, Subhash et al 2023
“Investigating the Existence of ‘Secret Language’ in Language Models”, Wang et al 2023
“CLIPMasterPrints: Fooling Contrastive Language-Image Pre-training Using Latent Variable Evolution”, Freiberger et al 2023
“On the Exploitability of Instruction Tuning”, Shu et al 2023
“Evaluating Superhuman Models With Consistency Checks”, Fluri et al 2023
“Large Language Models Sometimes Generate Purely Negatively-Reinforced Text”, Roger 2023
“TrojText: Test-time Invisible Textual Trojan Insertion”, Liu et al 2023
“Glaze: Protecting Artists from Style Mimicry by Text-to-Image Models”, Shan et al 2023
“Facial Misrecognition Systems: Simple Weight Manipulations Force DNNs to Err Only on Specific Persons”, Zehavi & Shamir 2023
“TrojanPuzzle: Covertly Poisoning Code-Suggestion Models”, Aghakhani et al 2023
“SNAFUE: Diagnostics for Deep Neural Networks With Automated Copy/Paste Attacks”, Casper et al 2022
“Are AlphaZero-like Agents Robust to Adversarial Perturbations?”, Lan et al 2022
“Rickrolling the Artist: Injecting Invisible Backdoors into Text-Guided Image Generation Models”, Struppek et al 2022
“On Optimal Learning Under Targeted Data Poisoning”, Hanneke et al 2022
“BTD: Decompiling X86 Deep Neural Network Executables”, Liu et al 2022
“Discovering Bugs in Vision Models Using Off-the-shelf Image Generation and Captioning”, Wiles et al 2022
“Adversarially Trained Neural Representations May Already Be As Robust As Corresponding Biological Neural Representations”, Guo et al 2022
“Flatten the Curve: Efficiently Training Low-Curvature Neural Networks”, Srinivas et al 2022
“Why Robust Generalization in Deep Learning Is Difficult: Perspective of Expressive Power”, Li et al 2022
“Diffusion Models for Adversarial Purification”, Nie et al 2022
“Planting Undetectable Backdoors in Machine Learning Models”, Goldwasser et al 2022
“Transfer Attacks Revisited: A Large-Scale Empirical Study in Real Computer Vision Settings”, Mao et al 2022
“On the Effectiveness of Dataset Watermarking in Adversarial Settings”, Tekgul & Asokan 2022
“An Equivalence Between Data Poisoning and Byzantine Gradient Attacks”, Farhadkhani et al 2022
“Red Teaming Language Models With Language Models”, Perez et al 2022
“WANLI: Worker and AI Collaboration for Natural Language Inference Dataset Creation”, Liu et al 2022
“CommonsenseQA 2.0: Exposing the Limits of AI through Gamification”, Talmor et al 2022
“Models in the Loop: Aiding Crowdworkers With Generative Annotation Assistants”, Bartolo et al 2021
“Deep Reinforcement Learning Policies Learn Shared Adversarial Features Across MDPs”, Korkmaz 2021
“PROMPT WAYWARDNESS: The Curious Case of Discretized Interpretation of Continuous Prompts”, Khashabi et al 2021
“TnT Attacks! Universal Naturalistic Adversarial Patches Against Deep Neural Network Systems”, Doan et al 2021
“AugMax: Adversarial Composition of Random Augmentations for Robust Training”, Wang et al 2021
“Unrestricted Adversarial Attacks on ImageNet Competition”, Chen et al 2021
“Partial Success in Closing the Gap between Human and Machine Vision”, Geirhos et al 2021
“A Universal Law of Robustness via Isoperimetry”, Bubeck & Sellke 2021
“Manipulating SGD With Data Ordering Attacks”, Shumailov et al 2021
“Gradient-based Adversarial Attacks against Text Transformers”, Guo et al 2021
“A Law of Robustness for Two-layers Neural Networks”, Bubeck et al 2021
“Multimodal Neurons in Artificial Neural Networks [CLIP]”, Goh et al 2021
“Words As a Window: Using Word Embeddings to Explore the Learned Representations of Convolutional Neural Networks”, Dharmaretnam et al 2021
“Bot-Adversarial Dialogue for Safe Conversational Agents”, Xu et al 2021
“Unadversarial Examples: Designing Objects for Robust Vision”, Salman et al 2020
“Concealed Data Poisoning Attacks on NLP Models”, Wallace et al 2020
“Recipes for Safety in Open-domain Chatbots”, Xu et al 2020
“Uncovering the Limits of Adversarial Training against Norm-Bounded Adversarial Examples”, Gowal et al 2020
“Dataset Cartography: Mapping and Diagnosing Datasets With Training Dynamics”, Swayamdipta et al 2020
“Collaborative Learning in the Jungle (Decentralized, Byzantine, Heterogeneous, Asynchronous and Nonconvex Learning)”, El-Mhamdi et al 2020
“Do Adversarially Robust ImageNet Models Transfer Better?”, Salman et al 2020
“Smooth Adversarial Training”, Xie et al 2020
“Sponge Examples: Energy-Latency Attacks on Neural Networks”, Shumailov et al 2020
“Improving the Interpretability of FMRI Decoding Using Deep Neural Networks and Adversarial Robustness”, McClure et al 2020
“Approximate Exploitability: Learning a Best Response in Large Games”, Timbers et al 2020
“Radioactive Data: Tracing through Training”, Sablayrolles et al 2020
“Adversarial Examples Improve Image Recognition”, Xie et al 2019
“Fooling LIME and SHAP: Adversarial Attacks on Post Hoc Explanation Methods”, Slack et al 2019
“The Bouncer Problem: Challenges to Remote Explainability”, Merrer & Tredan 2019
“Distributionally Robust Language Modeling”, Oren et al 2019
“Universal Adversarial Triggers for Attacking and Analyzing NLP”, Wallace et al 2019
“Robustness Properties of Facebook’s ResNeXt WSL Models”, Orhan 2019
“Intriguing Properties of Adversarial Training at Scale”, Xie & Yuille 2019
“Adversarially Robust Generalization Just Requires More Unlabeled Data”, Zhai et al 2019
“Are Labels Required for Improving Adversarial Robustness?”, Uesato et al 2019
“Adversarial Policies: Attacking Deep Reinforcement Learning”, Gleave et al 2019
“Adversarial Examples Are Not Bugs, They Are Features”, Ilyas et al 2019
“Benchmarking Neural Network Robustness to Common Corruptions and Perturbations”, Hendrycks & Dietterich 2019
“Smooth Adversarial Examples”, Zhang et al 2019
“Fairwashing: the Risk of Rationalization”, Aïvodji et al 2019
“Evolving Super Stimuli for Real Neurons Using Deep Generative Networks”, Ponce et al 2019
“AdVersarial: Perceptual Ad Blocking Meets Adversarial Machine Learning”, Tramèr et al 2018
“Adversarial Reprogramming of Text Classification Neural Networks”, Neekhara et al 2018
“Benchmarking Neural Network Robustness to Common Corruptions and Surface Variations”, Hendrycks & Dietterich 2018
“Adversarial Reprogramming of Neural Networks”, Elsayed et al 2018
“Greedy Attack and Gumbel Attack: Generating Adversarial Examples for Discrete Data”, Yang et al 2018
“Towards the First Adversarially Robust Neural Network Model on MNIST”, Schott et al 2018
“Sensitivity and Generalization in Neural Networks: an Empirical Study”, Novak et al 2018
“Adversarial Vulnerability for Any Classifier”, Fawzi et al 2018
“Intriguing Properties of Adversarial Examples”, Cubuk et al 2018
“First-order Adversarial Vulnerability of Neural Networks and Input Dimension”, Simon-Gabriel et al 2018
“Adversarial Spheres”, Gilmer et al 2018
“CycleGAN, a Master of Steganography”, Chu et al 2017
“Adversarial Phenomenon in the Eyes of Bayesian Deep Learning”, Rawat et al 2017
“Mitigating Adversarial Effects Through Randomization”, Xie et al 2017
“Learning Universal Adversarial Perturbations With Generative Models”, Hayes & Danezis 2017
“Robust Physical-World Attacks on Deep Learning Models”, Eykholt et al 2017
“Towards Deep Learning Models Resistant to Adversarial Attacks”, Madry et al 2017
“Ensemble Adversarial Training: Attacks and Defenses”, Tramèr et al 2017
“The Space of Transferable Adversarial Examples”, Tramèr et al 2017
“Adversarial Examples in the Physical World”, Kurakin et al 2016
“Foveation-based Mechanisms Alleviate Adversarial Examples”, Luo et al 2015
“Explaining and Harnessing Adversarial Examples”, Goodfellow et al 2014
“Pixels Still Beat Text: Attacking the OpenAI CLIP Model With Text Patches and Adversarial Pixel Perturbations”
NoaNabeshima
“A Universal Law of Robustness”
“Apple or iPod? Easy Fix for Adversarial Textual Attacks on OpenAI's CLIP Model!”
“A Law of Robustness and the Importance of Overparameterization in Deep Learning”
Sort By Magic
Annotations sorted by machine learning into inferred 'tags'. This provides an alternative way to browse: instead of by date order, one can browse in topic order. The 'sorted' list has been automatically clustered into multiple sections & auto-labeled for easier browsing.
Beginning with the newest annotation, it uses the embedding of each annotation to attempt to create a list of nearest-neighbor annotations, creating a progression of topics. For more details, see the link.
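As a rough illustration only, the following is a minimal sketch of how such a greedy nearest-neighbor ordering over annotation embeddings could be computed; this is an assumption for exposition, not the site's actual implementation, and the array and function names are hypothetical:

```python
# Minimal sketch: greedy nearest-neighbor ordering of annotation embeddings.
# Assumes `embeddings` is an (n, d) array with the newest annotation at index 0;
# names are hypothetical, not the site's actual implementation.
import numpy as np

def sort_by_similarity(embeddings: np.ndarray) -> list[int]:
    """Return an ordering where each annotation is followed by its most
    similar not-yet-placed neighbor, starting from the newest (index 0)."""
    unit = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    order, placed = [0], {0}
    while len(order) < len(unit):
        sims = unit @ unit[order[-1]]   # cosine similarity to the last-placed item
        sims[list(placed)] = -np.inf    # never revisit an already-placed item
        nxt = int(np.argmax(sims))
        order.append(nxt)
        placed.add(nxt)
    return order
```

The resulting sequence can then be cut into contiguous runs and each run auto-labeled, yielding topic sections such as those below.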
deception
attack-analysis
robustness
resilience
Wikipedia
Miscellaneous
- https://chat.openai.com/share/312e82f0-cc5e-47f3-b368-b2c0c0f4ad3f
- https://lilianweng.github.io/posts/2023-10-25-adv-attack-llm/
- https://openai.com/research/attacking-machine-learning-with-adversarial-examples
- https://reiinakano.com/2019/06/21/robust-neural-style-transfer.html
- https://spectrum.ieee.org/its-too-easy-to-hide-bias-in-deeplearning-systems
- https://stanislavfort.github.io/2021/01/12/OpenAI_CLIP_adversarial_examples.html
- https://twitter.com/SebastienBubeck/status/1402645428504461319
- https://twitter.com/papayathreesome/status/1670170344953372676
- https://twitter.com/supercomposite/status/1567162288087470081
- https://win-vector.com/2016/09/11/adversarial-machine-learning/
- https://www.lesswrong.com/posts/7GQZyooNi5nqgoyyJ/mlsn-2-adversarial-training
- https://www.quantamagazine.org/cryptographers-show-how-to-hide-invisible-backdoors-in-ai-20230302/
- https://www.reddit.com/r/DotA2/comments/beyilz/openai_live_updates_thread_lessons_on_how_to_beat/
Link Bibliography
- https://arxiv.org/abs/2310.02279#sony : “Consistency Trajectory Models (CTM): Learning Probability Flow ODE Trajectory of Diffusion”
- https://arxiv.org/abs/2306.07567 : “Large Language Models Sometimes Generate Purely Negatively-Reinforced Text”, Fabien Roger
- https://arxiv.org/abs/2303.02242 : “TrojText: Test-time Invisible Textual Trojan Insertion”, Yepeng Liu, Bo Feng, Qian Lou
- https://arxiv.org/abs/2302.04222 : “Glaze: Protecting Artists from Style Mimicry by Text-to-Image Models”, Shawn Shan, Jenna Cryan, Emily Wenger, Haitao Zheng, Rana Hanocka, Ben Y. Zhao
- https://arxiv.org/abs/2211.03769 : “Are AlphaZero-like Agents Robust to Adversarial Perturbations?”, Li-Cheng Lan, Huan Zhang, Ti-Rong Wu, Meng-Yu Tsai, I-Chen Wu, Cho-Jui Hsieh
- https://arxiv.org/abs/2208.08831#deepmind : “Discovering Bugs in Vision Models Using Off-the-shelf Image Generation and Captioning”, Olivia Wiles, Isabela Albuquerque, Sven Gowal
- https://arxiv.org/abs/2205.07460 : “Diffusion Models for Adversarial Purification”, Weili Nie, Brandon Guo, Yujia Huang, Chaowei Xiao, Arash Vahdat, Anima Anandkumar
- https://swabhs.com/assets/pdf/wanli.pdf#allen : “WANLI: Worker and AI Collaboration for Natural Language Inference Dataset Creation”, Alisa Liu, Swabha Swayamdipta, Noah A. Smith, Yejin Choi
- https://arxiv.org/abs/2201.05320#allen : “CommonsenseQA 2.0: Exposing the Limits of AI through Gamification”, Alon Talmor, Ori Yoran, Ronan Le Bras, Chandra Bhagavatula, Yoav Goldberg, Yejin Choi, Jonathan Berant
- https://arxiv.org/abs/2110.13771#nvidia : “AugMax: Adversarial Composition of Random Augmentations for Robust Training”, Haotao Wang, Chaowei Xiao, Jean Kossaifi, Zhiding Yu, Anima Anandkumar, Zhangyang Wang
- https://arxiv.org/abs/2106.07411 : “Partial Success in Closing the Gap between Human and Machine Vision”
- https://arxiv.org/abs/2105.12806 : “A Universal Law of Robustness via Isoperimetry”, Sébastien Bubeck, Mark Sellke
- https://distill.pub/2021/multimodal-neurons/#openai : “Multimodal Neurons in Artificial Neural Networks [CLIP]”
- https://aclanthology.org/2021.naacl-main.235.pdf#facebook : “Bot-Adversarial Dialogue for Safe Conversational Agents”, Jing Xu, Da Ju, Margaret Li, Y-Lan Boureau, Jason Weston, Emily Dinan
- https://arxiv.org/abs/2006.14536#google : “Smooth Adversarial Training”, Cihang Xie, Mingxing Tan, Boqing Gong, Alan Yuille, Quoc V. Le
- https://arxiv.org/abs/2002.00937 : “Radioactive Data: Tracing through Training”, Alexandre Sablayrolles, Matthijs Douze, Cordelia Schmid, Hervé Jégou
- https://arxiv.org/abs/1911.09665 : “Adversarial Examples Improve Image Recognition”, Cihang Xie, Mingxing Tan, Boqing Gong, Jiang Wang, Alan Yuille, Quoc V. Le
- https://arxiv.org/abs/1706.06083 : “Towards Deep Learning Models Resistant to Adversarial Attacks”, Aleksander Madry, Aleksandar Makelov, Ludwig Schmidt, Dimitris Tsipras, Adrian Vladu