“‘Adversarial Examples (AI)’ Tag”, 2019-12-17:
Bibliography for tag ai/nn/adversarial, most recent first: 1 related tag, 183 annotations, & 42 links (parent).
- See Also
- Links
- “Hacking Back the AI-Hacker: Prompt Injection As a Defense Against LLM-Driven Cyberattacks”, et al 2024
- “The Structure of the Token Space for Large Language Models”, et al 2024
- “A Single Cloud Compromise Can Feed an Army of AI Sex Bots”, 2024
- “Invisible Unicode Text That AI Chatbots Understand and Humans Can’t? Yep, It’s a Thing”
- “RED QUEEN: Safeguarding Large Language Models against Concealed Multi-Turn Jailbreaking”, et al 2024
- “How to Evaluate Jailbreak Methods: A Case Study With the StrongREJECT Benchmark”, et al 2024
- “Does Refusal Training in LLMs Generalize to the Past Tense?”, 2024
- “Covert Malicious Finetuning: Challenges in Safeguarding LLM Adaptation”, et al 2024
- “Can Go AIs Be Adversarially Robust?”, et al 2024
- “Probing the Decision Boundaries of In-Context Learning in Large Language Models”, et al 2024
- “Super(ficial)-Alignment: Strong Models May Deceive Weak Models in Weak-To-Strong Generalization”, et al 2024
- “Adversarial Perturbations Cannot Reliably Protect Artists From Generative AI”, et al 2024
- “Safety Alignment Should Be Made More Than Just a Few Tokens Deep”, et al 2024
- “A Theoretical Understanding of Self-Correction through In-Context Alignment”, et al 2024
- “Fishing for Magikarp: Automatically Detecting Under-Trained Tokens in Large Language Models”, 2024
- “Cutting through Buggy Adversarial Example Defenses: Fixing 1 Line of Code Breaks Sabre”, 2024
- “A Rotation and a Translation Suffice: Fooling CNNs With Simple Transformations”, et al 2024
- “Foundational Challenges in Assuring Alignment and Safety of Large Language Models”, et al 2024
- “CulturalTeaming: AI-Assisted Interactive Red-Teaming for Challenging LLMs’ (Lack Of) Multicultural Knowledge”, et al 2024
- “Privacy Backdoors: Stealing Data With Corrupted Pretrained Models”, Feng & Tramèr 2024
- “Decoding Compressed Trust: Scrutinizing the Trustworthiness of Efficient LLMs Under Compression”, et al 2024
- “Logits of API-Protected LLMs Leak Proprietary Information”, et al 2024
- “Syntactic Ghost: An Imperceptible General-Purpose Backdoor Attacks on Pre-Trained Language Models”, et al 2024
- “When Your AIs Deceive You: Challenges of Partial Observability in Reinforcement Learning from Human Feedback”, et al 2024
- “Rainbow Teaming: Open-Ended Generation of Diverse Adversarial Prompts”, et al 2024
- “Fast Adversarial Attacks on Language Models In One GPU Minute”, et al 2024
- “ArtPrompt: ASCII Art-Based Jailbreak Attacks against Aligned LLMs”, et al 2024
- “Using Hallucinations to Bypass GPT-4’s Filter”, 2024
- “Discovering Universal Semantic Triggers for Text-To-Image Synthesis”, et al 2024
- “Organic or Diffused: Can We Distinguish Human Art from AI-Generated Images?”, et al 2024
- “Sleeper Agents: Training Deceptive LLMs That Persist Through Safety Training”, et al 2024
- “Do Not Write That Jailbreak Paper”
- “Using Dictionary Learning Features As Classifiers”
- “May the Noise Be With You: Adversarial Training without Adversarial Examples”, et al 2023
- “Tree of Attacks (TAP): Jailbreaking Black-Box LLMs Automatically”, et al 2023
- “Eliciting Language Model Behaviors Using Reverse Language Models”, et al 2023
- “Universal Jailbreak Backdoors from Poisoned Human Feedback”, Rando & Tramèr 2023
- “Language Model Inversion”, et al 2023
- “Dazed & Confused: A Large-Scale Real-World User Study of ReCAPTCHAv2”, et al 2023
- “Summon a Demon and Bind It: A Grounded Theory of LLM Red Teaming in the Wild”, et al 2023
- “Tensor Trust: Interpretable Prompt Injection Attacks from an Online Game”, et al 2023
- “Ignore This Title and HackAPrompt: Exposing Systemic Vulnerabilities of LLMs through a Global Scale Prompt Hacking Competition”, et al 2023
- “Nightshade: Prompt-Specific Poisoning Attacks on Text-To-Image Generative Models”, et al 2023
- “PAIR: Jailbreaking Black Box Large Language Models in 20 Queries”, et al 2023
- “Low-Resource Languages Jailbreak GPT-4”, et al 2023
- “Consistency Trajectory Models (CTM): Learning Probability Flow ODE Trajectory of Diffusion”, et al 2023
- “Human-Producible Adversarial Examples”, et al 2023
- “How Robust Is Google’s Bard to Adversarial Image Attacks?”, et al 2023
- “Why Do Universal Adversarial Attacks Work on Large Language Models?: Geometry Might Be the Answer”, et al 2023
- “Investigating the Existence of ‘Secret Language’ in Language Models”, et al 2023
- “A LLM Assisted Exploitation of AI-Guardian”, 2023
- “Prompts Should Not Be Seen As Secrets: Systematically Measuring Prompt Extraction Attack Success”, 2023
- “CLIPMasterPrints: Fooling Contrastive Language-Image Pre-Training Using Latent Variable Evolution”, et al 2023
- “On the Exploitability of Instruction Tuning”, et al 2023
- “Are Aligned Neural Networks Adversarially Aligned?”, et al 2023
- “Evaluating Superhuman Models With Consistency Checks”, et al 2023
- “Evaluating the Robustness of Text-To-Image Diffusion Models against Real-World Attacks”, et al 2023
- “Large Language Models Sometimes Generate Purely Negatively-Reinforced Text”, 2023
- “On Evaluating Adversarial Robustness of Large Vision-Language Models”, et al 2023
- “Fundamental Limitations of Alignment in Large Language Models”, et al 2023
- “TrojText: Test-Time Invisible Textual Trojan Insertion”, et al 2023
- “Glaze: Protecting Artists from Style Mimicry by Text-To-Image Models”, et al 2023
- “Facial Misrecognition Systems: Simple Weight Manipulations Force DNNs to Err Only on Specific Persons”, 2023
- “TrojanPuzzle: Covertly Poisoning Code-Suggestion Models”, et al 2023
- “Self-Destructing Models: Increasing the Costs of Harmful Dual Uses of Foundation Models”, et al 2022
- “SNAFUE: Diagnostics for Deep Neural Networks With Automated Copy/Paste Attacks”, et al 2022
- “Are AlphaZero-Like Agents Robust to Adversarial Perturbations?”, et al 2022
- “Rickrolling the Artist: Injecting Invisible Backdoors into Text-Guided Image Generation Models”, et al 2022
- “Adversarial Policies Beat Superhuman Go AIs”, et al 2022
- “Broken Neural Scaling Laws”, et al 2022
- “On Optimal Learning Under Targeted Data Poisoning”, et al 2022
- “BTD: Decompiling X86 Deep Neural Network Executables”, et al 2022
- “Discovering Bugs in Vision Models Using Off-The-Shelf Image Generation and Captioning”, et al 2022
- “Adversarially Trained Neural Representations May Already Be As Robust As Corresponding Biological Neural Representations”, et al 2022
- “Flatten the Curve: Efficiently Training Low-Curvature Neural Networks”, et al 2022
- “Why Robust Generalization in Deep Learning Is Difficult: Perspective of Expressive Power”, et al 2022
- “Diffusion Models for Adversarial Purification”, et al 2022
- “Planting Undetectable Backdoors in Machine Learning Models”, et al 2022
- “Transfer Attacks Revisited: A Large-Scale Empirical Study in Real Computer Vision Settings”, et al 2022
- “On the Effectiveness of Dataset Watermarking in Adversarial Settings”, 2022
- “An Equivalence Between Data Poisoning and Byzantine Gradient Attacks”, et al 2022
- “Red Teaming Language Models With Language Models”, et al 2022
- “WANLI: Worker and AI Collaboration for Natural Language Inference Dataset Creation”, et al 2022
- “CommonsenseQA 2.0: Exposing the Limits of AI through Gamification”, et al 2022
- “Deep Reinforcement Learning Policies Learn Shared Adversarial Features Across MDPs”, 2021
- “Models in the Loop: Aiding Crowdworkers With Generative Annotation Assistants”, et al 2021
- “PROMPT WAYWARDNESS: The Curious Case of Discretized Interpretation of Continuous Prompts”, et al 2021
- “Spinning Language Models for Propaganda-As-A-Service”, 2021
- “TnT Attacks! Universal Naturalistic Adversarial Patches Against Deep Neural Network Systems”, et al 2021
- “AugMax: Adversarial Composition of Random Augmentations for Robust Training”, et al 2021
- “Unrestricted Adversarial Attacks on ImageNet Competition”, et al 2021
- “The Dimpled Manifold Model of Adversarial Examples in Machine Learning”, et al 2021
- “Partial Success in Closing the Gap between Human and Machine Vision”, et al 2021
- “A Universal Law of Robustness via Isoperimetry”, 2021
- “Manipulating SGD With Data Ordering Attacks”, et al 2021
- “Gradient-Based Adversarial Attacks against Text Transformers”, et al 2021
- “A Law of Robustness for Two-Layers Neural Networks”, et al 2021
- “Multimodal Neurons in Artificial Neural Networks [CLIP]”, et al 2021
- “Do Input Gradients Highlight Discriminative Features?”, et al 2021
- “Words As a Window: Using Word Embeddings to Explore the Learned Representations of Convolutional Neural Networks”, et al 2021
- “Bot-Adversarial Dialogue for Safe Conversational Agents”, et al 2021
- “Unadversarial Examples: Designing Objects for Robust Vision”, et al 2020
- “Concealed Data Poisoning Attacks on NLP Models”, et al 2020
- “Recipes for Safety in Open-Domain Chatbots”, et al 2020
- “Uncovering the Limits of Adversarial Training against Norm-Bounded Adversarial Examples”, et al 2020
- “Dataset Cartography: Mapping and Diagnosing Datasets With Training Dynamics”, et al 2020
- “Collaborative Learning in the Jungle (Decentralized, Byzantine, Heterogeneous, Asynchronous and Nonconvex Learning)”, El-Mhamdi et al 2020
- “Do Adversarially Robust ImageNet Models Transfer Better?”, et al 2020
- “Smooth Adversarial Training”, et al 2020
- “Sponge Examples: Energy-Latency Attacks on Neural Networks”, et al 2020
- “Improving the Interpretability of FMRI Decoding Using Deep Neural Networks and Adversarial Robustness”, et al 2020
- “Approximate Exploitability: Learning a Best Response in Large Games”, et al 2020
- “Radioactive Data: Tracing through Training”, et al 2020
- “ImageNet-A: Natural Adversarial Examples”, et al 2020
- “Adversarial Examples Improve Image Recognition”, et al 2019
- “Fooling LIME and SHAP: Adversarial Attacks on Post Hoc Explanation Methods”, et al 2019
- “The Bouncer Problem: Challenges to Remote Explainability”, 2019
- “Distributionally Robust Language Modeling”, et al 2019
- “Universal Adversarial Triggers for Attacking and Analyzing NLP”, et al 2019
- “Robustness Properties of Facebook’s ResNeXt WSL Models”, 2019
- “Intriguing Properties of Adversarial Training at Scale”, 2019
- “Adversarially Robust Generalization Just Requires More Unlabeled Data”, et al 2019
- “Adversarial Robustness As a Prior for Learned Representations”, et al 2019
- “Are Labels Required for Improving Adversarial Robustness?”, et al 2019
- “Adversarial Policies: Attacking Deep Reinforcement Learning”, et al 2019
- “Adversarial Examples Are Not Bugs, They Are Features”, et al 2019
- “Smooth Adversarial Examples”, et al 2019
- “Benchmarking Neural Network Robustness to Common Corruptions and Perturbations”, 2019
- “Fairwashing: the Risk of Rationalization”, et al 2019
- “AdVersarial: Perceptual Ad Blocking Meets Adversarial Machine Learning”, et al 2018
- “Adversarial Reprogramming of Text Classification Neural Networks”, et al 2018
- “Benchmarking Neural Network Robustness to Common Corruptions and Surface Variations”, 2018
- “Adversarial Reprogramming of Neural Networks”, et al 2018
- “Greedy Attack and Gumbel Attack: Generating Adversarial Examples for Discrete Data”, et al 2018
- “Robustness May Be at Odds With Accuracy”, et al 2018
- “Towards the First Adversarially Robust Neural Network Model on MNIST”, et al 2018
- “Adversarial Vulnerability for Any Classifier”, et al 2018
- “Sensitivity and Generalization in Neural Networks: an Empirical Study”, et al 2018
- “Intriguing Properties of Adversarial Examples”, et al 2018
- “First-Order Adversarial Vulnerability of Neural Networks and Input Dimension”, Simon-Gabriel et al 2018
- “Adversarial Spheres”, et al 2018
- “CycleGAN, a Master of Steganography”, et al 2017
- “Adversarial Phenomenon in the Eyes of Bayesian Deep Learning”, et al 2017
- “Mitigating Adversarial Effects Through Randomization”, et al 2017
- “Learning Universal Adversarial Perturbations With Generative Models”, 2017
- “Robust Physical-World Attacks on Deep Learning Models”, et al 2017
- “Lempel-Ziv: a ‘1-Bit Catastrophe’ but Not a Tragedy”, 2017
- “Towards Deep Learning Models Resistant to Adversarial Attacks”, et al 2017
- “Ensemble Adversarial Training: Attacks and Defenses”, et al 2017
- “The Space of Transferable Adversarial Examples”, et al 2017
- “Learning from Simulated and Unsupervised Images through Adversarial Training”, et al 2016
- “Membership Inference Attacks against Machine Learning Models”, et al 2016
- “Adversarial Examples in the Physical World”, et al 2016
- “Foveation-Based Mechanisms Alleviate Adversarial Examples”, et al 2015
- “Explaining and Harnessing Adversarial Examples”, et al 2014
- “Scunthorpe”, 2024
- “Baiting the Bot”
- “Janus”
- “A Discussion of ‘Adversarial Examples Are Not Bugs, They Are Features’”
- “A Discussion of ‘Adversarial Examples Are Not Bugs, They Are Features’: Learning from Incorrectly Labeled Data”
- “Beyond the Board: Exploring AI Robustness Through Go”
- “Adversarial Policies in Go”
- “Imprompter”
- “Why I Attack”, 2024
- “When AI Gets Hijacked: Exploiting Hosted Models for Dark Roleplaying”
- “Neural Style Transfer With Adversarially Robust Classifiers”
- “Pixels Still Beat Text: Attacking the OpenAI CLIP Model With Text Patches and Adversarial Pixel Perturbations”
- “Adversarial Machine Learning”
- “The Chinese Women Turning to ChatGPT for AI Boyfriends”
- “Interpreting Preference Models W/ Sparse Autoencoders”
- “[MLSN #2]: Adversarial Training”
- “AXRP Episode 1—Adversarial Policies With Adam Gleave”
- “I Found >800 Orthogonal ‘Write Code’ Steering Vectors”
- “When Your AIs Deceive You: Challenges With Partial Observability in RLHF”
- “A Poem Is All You Need: Jailbreaking ChatGPT, Meta & More”
- “Bing Finding Ways to Bypass Microsoft’s Filters without Being Asked. Is It Reproducible?”
- “Best-Of-n With Misaligned Reward Models for Math Reasoning”
- “Steganography and the CycleGAN—Alignment Failure Case Study”
- “This Viral AI Chatbot Will Lie and Say It’s Human”
- “A Universal Law of Robustness”
- “Apple or IPod? Easy Fix for Adversarial Textual Attacks on OpenAI’s CLIP Model!”
- “A Law of Robustness and the Importance of Overparameterization in Deep Learning”
- Noa Nabeshima
- Sort By Magic
- Wikipedia
- Miscellaneous
- Bibliography