I'm starting off with RL and these might be very trivial questions but I want to wrap my head around everything as best as I can. If you have any resources that would provide good intuitions behind applications of RL, please provide them in the comments too :) Thanks.
Questions:
- In which scenarios do we prefer supervised learning over offline reinforcement learning?
- How does the number of samples affect the training for each case? Does supervised learning converge faster?
- Are there examples where both have been applied to the same problem and compared head-to-head?
Intuition:
- Supervised Learning can be good for predicting a reward (or an action) given a state, but we cannot rely on it to maximize future rewards. Since it neither uses rollouts to maximize return nor does any planning, we cannot expect it to work well in problems with delayed rewards.
- Also, in a dynamic environment the data are not i.i.d.: each action changes the state, which in turn affects the actions taken later. So in continual settings, RL methods usually have to account for this distributional shift, while Supervised Learning typically assumes it away.
- Supervised Learning tries to predict the best action for each state, which may be correct most of the time, but it is a rigid approach for ever-changing environments. Reinforcement Learning learns from its own experience and is more adaptable.
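To make the delayed-reward intuition concrete, here is a toy sketch (my own illustration, not from any of the posts below): on a 4-state chain where reward only appears at the far end, behavioral cloning (pure supervised learning on the dataset's actions) just imitates the mediocre data-collection policy, while simple tabular offline Q-learning on the *same* dataset recovers the return-maximizing policy. The environment, policies, and hyperparameters are all made up for illustration.

```python
import random
from collections import Counter, defaultdict

random.seed(0)
N_STATES, GOAL = 4, 3          # chain 0-1-2-3; reward only on reaching GOAL
LEFT, RIGHT = 0, 1

def step(s, a):
    """Deterministic chain dynamics with a single delayed reward at GOAL."""
    s2 = min(s + 1, GOAL) if a == RIGHT else max(s - 1, 0)
    r = 1.0 if s2 == GOAL else 0.0
    return s2, r, s2 == GOAL   # (next state, reward, done)

# Collect an offline dataset from a mediocre behaviour policy (70% LEFT).
dataset = []
for _ in range(200):
    s = 0
    for _ in range(20):
        a = LEFT if random.random() < 0.7 else RIGHT
        s2, r, done = step(s, a)
        dataset.append((s, a, r, s2, done))
        if done:
            break
        s = s2

# Behavioural cloning = supervised learning: imitate the majority action.
counts = defaultdict(Counter)
for s, a, *_ in dataset:
    counts[s][a] += 1
bc_policy = {s: c.most_common(1)[0][0] for s, c in counts.items()}

# Offline tabular Q-learning on the same fixed dataset: maximise return.
Q = [[0.0, 0.0] for _ in range(N_STATES)]
gamma, alpha = 0.9, 0.5
for _ in range(200):                       # repeated sweeps over the dataset
    for s, a, r, s2, done in dataset:
        target = r if done else r + gamma * max(Q[s2])
        Q[s][a] += alpha * (target - Q[s][a])
rl_policy = {s: (RIGHT if Q[s][RIGHT] > Q[s][LEFT] else LEFT)
             for s in range(GOAL)}

print("BC policy:", bc_policy)   # imitates the dataset: mostly LEFT
print("RL policy:", rl_policy)   # recovers RIGHT, which reaches the reward
```

The point of the sketch: both methods see exactly the same data, but only the RL objective propagates the delayed reward backwards through the transitions, which is why behavioral cloning can only ever be as good as the policy that generated the data.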
For the answers, if possible, give a one-liner first; any detail and a source for the answer would be appreciated too. I want this post to be a nice guideline for anyone trying to apply RL. I'll edit in the answers to any questions answered below to compile all the information I get. If you feel I should be thinking about any other major questions or concerns, please mention those as well. Thank you!
[EDIT]: Resources I found regarding this:
RAIL Lecture by Sergey Levine: Imitation Learning vs. Offline Reinforcement Learning
Medium post by Sergey Levine: Decisions from Data: How Offline Reinforcement Learning Will Change How We Use Machine Learning
Medium post by Sergey Levine: Understanding the World Through Action: RL as a Foundation for Scalable Self-Supervised Learning
Research Paper by Sergey Levine: When Should We Prefer Offline Reinforcement Learning over Behavioral Cloning?
Research Paper by Sergey Levine: RVS: What is Essential for Offline RL via Supervised Learning?