Bibliography:

  1. ‘meta-learning’ tag

  2. ‘dynamic evaluation (NN)’ tag

  3. Flexible task abstractions emerge in linear networks with fast and bounded units

  4. LoRA vs Full Fine-tuning: An Illusion of Equivalence

  5. Investigating learning-independent abstract reasoning in artificial neural networks

  6. How Do Large Language Models Acquire Factual Knowledge During Pretraining?

  7. Is Model Collapse Inevitable? Breaking the Curse of Recursion by Accumulating Real and Synthetic Data

  8. Simple and Scalable Strategies to Continually Pre-train Large Language Models

  9. Online Adaptation of Language Models with a Memory of Amortized Contexts (MAC)

  10. When Scaling Meets LLM Finetuning: The Effect of Data, Model and Finetuning Method

  11. Investigating Continual Pretraining in Large Language Models: Insights and Implications

  12. RAG vs Fine-tuning: Pipelines, Tradeoffs, and a Case Study on Agriculture

  13. LLaMA Pro: Progressive LLaMA with Block Expansion

  14. Large Language Models Relearn Removed Concepts

  15. Language Model Alignment with Elastic Reset

  16. In-Context Pretraining (ICP): Language Modeling Beyond Document Boundaries

  17. Loss of Plasticity in Deep Continual Learning (Continual Backpropagation)

  18. Continual Diffusion: Continual Customization of Text-to-Image Diffusion with C-LoRA

  19. Understanding plasticity in neural networks

  20. The Forward-Forward Algorithm: Some Preliminary Investigations

  21. Broken Neural Scaling Laws

  22. Exclusive Supermask Subnetwork Training for Continual Learning

  23. Learn the Time to Learn: Replay Scheduling in Continual Learning

  24. On the Effectiveness of Compact Biomedical Transformers (BioBERT)

  25. Don’t Stop Learning: Towards Continual Learning for the CLIP Model

  26. Fleet-DAgger: Interactive Robot Fleet Learning with Scalable Human Supervision

  27. Task-Agnostic Continual Reinforcement Learning: In Praise of a Simple Baseline (3RL)

  28. CT0: Fine-tuned Language Models are Continual Learners

  29. Memorization Without Overfitting: Analyzing the Training Dynamics of Large Language Models

  30. Continual Pre-Training Mitigates Forgetting in Language and Vision

  31. Continual Learning with Foundation Models: An Empirical Study of Latent Replay

  32. DualPrompt: Complementary Prompting for Rehearsal-free Continual Learning

  33. Effect of scale on catastrophic forgetting in neural networks

  34. The Dual Form of Neural Networks Revisited: Connecting Test Time Predictions to Training Patterns via Spotlights of Attention

  35. Learning to Prompt for Continual Learning

  36. An Empirical Investigation of the Role of Pre-training in Lifelong Learning

  37. The Geometry of Representational Drift in Natural and Artificial Neural Networks

  38. Wide Neural Networks Forget Less Catastrophically

  39. Lifelong Pretraining: Continually Adapting Language Models to Emerging Corpora

  40. Continuous Coordination As a Realistic Scenario for Lifelong Learning

  41. Inductive Biases for Deep Learning of Higher-Level Cognition

  42. Learning from the Past: Meta-Continual Learning with Knowledge Embedding for Jointly Sketch, Cartoon, and Caricature Face Recognition

  43. Meta-Learning through Hebbian Plasticity in Random Networks

  44. Learning to Learn with Feedback and Local Plasticity

  45. Understanding the Role of Training Regimes in Continual Learning

  46. Don’t Stop Pretraining: Adapt Language Models to Domains and Tasks

  47. Never Stop Learning: The Effectiveness of Fine-Tuning in Robotic Reinforcement Learning

  48. On Warm-Starting Neural Network Training

  49. Gated Linear Networks

  50. Learning and Evaluating General Linguistic Intelligence

  51. Self-Net: Lifelong Learning via Continual Self-Modeling

  52. Unicorn: Continual Learning with a Universal, Off-policy Agent

  53. Meta Networks

  54. PathNet: Evolution Channels Gradient Descent in Super Neural Networks

  55. Overcoming catastrophic forgetting in neural networks

  56. Repeat Before Forgetting: Spaced Repetition for Efficient and Effective Training of Neural Networks

  57. Can LLMs Learn from a Single Example?

  58. design#future-tag-features

  59. 2024-ibrahim-figure1-continualpretrainingwithcyclicallearningratematchesfromscratchtraining.png

  60. https://github.com/ThomasScialom/T0_continual_learning

  61. https://huggingface.co/ThomasNLG/CT0-11B

  62. https://x.com/john__allard/status/1748140402912481537

  63. RAG vs Fine-tuning: Pipelines, Tradeoffs, and a Case Study on Agriculture (https://arxiv.org/abs/2401.08406#microsoft)

  64. Language Model Alignment with Elastic Reset (https://arxiv.org/abs/2312.07551)

  65. Aaron Courville

  66. Fleet-DAgger: Interactive Robot Fleet Learning with Scalable Human Supervision (https://arxiv.org/abs/2206.14349)

  67. CT0: Fine-tuned Language Models are Continual Learners (https://arxiv.org/abs/2205.12393)

  68. Wide Neural Networks Forget Less Catastrophically (https://arxiv.org/abs/2110.11526#deepmind)

  69. https://sites.google.com/view/razp/home

  70. Wikipedia Bibliography:

    1. Ranveer Chandra

    2. Pieter Abbeel