Just for kicks I looked at the newly released dataset used for Reflection 70B to see how bad it is...

AmpedHorizon · 2024-10-03T05:12:35+00:00

As an AI language model, I see no problems.

schlammsuhler · 2024-10-03T06:07:00+00:00

This is testament of the crucial process of cleaning a dataset. As an Ai language model i cant do that and leave it to the peasants.

Waste_Election_8361 · 2024-10-03T04:16:52+00:00

As an AI language model, this post sends shivers down my spine.

a_beautiful_rhind · 2024-10-03T11:04:12+00:00

They wasted their compute training in refusals. Bwhahaha.

RoboticElfJedi · 2024-10-03T03:43:13+00:00

Are you saying that's bogus synthetic data, or pointing out that they trained their model to include "as an AI language model, I can't..." in the responses?

xadiant · 2024-10-03T03:47:39+00:00

The point is that there are way too many rookie mistakes in the dataset. It doesn't really matter that it's synthetic. A few dozen of "As an AI..." gibberish in FT dataset is enough to decrease quality considerably. Even I as a rookie Python dweller can write a crude script to remove those "poisoned" lines from the set. This is especially bad when you are doing something novel and you need as many as high quality examples possible.

greying_panda · 2024-10-03T07:40:29+00:00

Is the dataset meant to be entirely following the "reflection" format? If so, this is quite bad, given that the dataset can be easily filtered with just a regex, which would take out any of these weird artifacts, or LLM "explanations".

For example, the reflective dataset can be checked with something like \s*<thinking>.+?<\/thinking>\s*(<reflection>.+?<\/reflection>\s*)*<output>.+?<\/output>\s* (I don't actually know if this dataset is any good, it's just the only example I could find)

There might be the desire to mix the SFT dataset with a non-reflection dataset, but even then I'd expect that you mix with a known high quality one (or a mix of multiple). This just seems sloppy.

isaacrehg · 2024-10-03T17:43:47+00:00

If this was my dataset I'd write a Claude wrapper too

dreamyrhodes · 2024-10-03T10:29:55+00:00

AI slop feed into AI to produce more AI slop.

KitFlash · 2024-10-03T03:36:46+00:00

Am I missing something? 1832 hits... out of 89k+ lines?

edit: 890k*

debauch3ry · 2024-10-03T08:02:38+00:00

If this is the training data for a chat model, wouldn't you want to include examples of rejections so it doesn't nut out total rubbish? Like if a naive users asks it to do something it can't do, it probably should inform them of its limitations. Or have I misunderstood the point of the dataset?

Alarmed-Bread-2344 · 2024-10-03T16:37:54+00:00

This is the direction Anthropic has been pushing towards for years and Reddit glazes

DrVonSinistro · 2024-10-03T05:02:48+00:00

I failed to properly follow what happened with this. I downloaded the model and tried it only to see it was dog shit. Was it broken or was it just a bunch of clowns like the dudes that released The Day Before?

Inevitable-Start-653 · 2024-10-03T11:50:55+00:00

Interesting 🤔, so the guy actually kept his promise and released the training data.

Regardless of the poor quality of the model, maybe (just maybe) the guy genuinely thought he made something good and wasn't deliberately trying to fool everyone.

StyMaar · 2024-10-03T14:36:32+00:00

TFH, “1832 hits” on a a dataset seems ridiculously low (if it's the entire dataset) juste given how prominent it is even in research papers or random places of the internet…

(Why would the dataset makers not filter such an obvious marker is an open question though…)

vogelvogelvogelvogel · 2024-10-03T05:29:31+00:00

houston we have a problem

robertotomas · 2024-10-03T17:38:09+00:00

How many lines was it?

GanacheNegative1988 · 2024-10-03T20:58:22+00:00

Don't you wish you could issue a 'Delete From Model Where Subject IN(<bad answer subject like this list>)'?

n8rb · 2024-10-03T22:52:43+00:00

Now I'm curious, who is Dr. Hiroshi Nakajima and what the 2017 paper is it talking about?

TankAttack · 2024-10-04T12:01:37+00:00

Notepad++ ftw!

On-The-Red-Team · 2024-10-04T15:17:40+00:00

As soon as it refuses to order a pizza... I uninstall

Sharp_Common_4837 · 2024-10-04T17:32:36+00:00

Yikes this is an awful dataset!

Sicarius_The_First · 2024-10-03T05:29:05+00:00

It's very important for the AI to be safe and effective.

He wanted to make AGI, but ended up with a worst version of Phi-3.5.

ortegaalfredo · 2024-10-03T14:39:59+00:00

Yes, the training dataset is not perfect, but its easily fixable just with a grep.

Perhaps is only a sensation, but I see a lot of criticism in what this guy is doing, like if somebody do not want it implemented. I think the idea is valid and he just used a bad model as a base. I would like to see reflection implemented over Qwen2.5, because its basically the same thing that O1 is doing and we know it works for gpt4.

2024-10-03T11:22:10+00:00

Can someone link this from their post on it, and explain what this particular data file is used for?

swagonflyyyy · 2024-10-03T13:31:29+00:00

Makes me wanna abliterate it. Anybody got the link to the dataset?

Specialist_Cheek_539 · 2024-10-03T19:55:00+00:00

Can someone explain why this is bad? I’m a complete newbie to this and afaiu, the model is learning to not tell the answer when it comes across impossible request. Why does it hinder data quality?

Jean-Porte · 2024-10-03T07:09:59+00:00

What is the problem exactly ? That is situational awareness

Eralyon · 2024-10-03T17:26:34+00:00

Beating the dead horse............

Caffdy · 2024-10-03T06:29:06+00:00

Big if true

chumpat · 2024-10-03T17:30:23+00:00

Right this is so "bad" - explain why? You're also exposing yourself as a total clown by using windows.

LocalLLaMA

MODERATORS

LocalLLaMA

MODERATORS

Welcome to Reddit.

Want to add to the discussion?