
I recently wired up a Twilio phone number to a cloud nano instance running ~100 lines of code that receives SMS messages and calls out to GPT-3 (just davinci-002, not the new 003 or this new chat model) with a dialogue-esque prompt ("The following is a transcript of a text message conversation between two friends..." sort of thing). I kept a running transcript of the last 100 messages for each conversation and just fed it into the prompt to get the next message to respond with.
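For the curious, the whole loop fits in one webhook handler. Here's a minimal sketch of what I mean -- the helper names, the in-memory store, and the exact parameters here are illustrative, not my actual code:

    # Minimal sketch of the SMS <-> GPT-3 loop (illustrative, not the real thing).
    # Assumes Flask plus the twilio and openai packages, with OPENAI_API_KEY set.
    from collections import defaultdict, deque

    import openai
    from flask import Flask, request
    from twilio.twiml.messaging_response import MessagingResponse

    app = Flask(__name__)
    # Running transcript of the last 100 messages, keyed by phone number.
    transcripts = defaultdict(lambda: deque(maxlen=100))

    PROMPT_HEADER = ("The following is a transcript of a text message "
                     "conversation between two friends:\n\n")

    @app.route("/sms", methods=["POST"])
    def sms_reply():
        sender = request.form["From"]
        transcripts[sender].append(f"Friend: {request.form['Body']}")

        prompt = PROMPT_HEADER + "\n".join(transcripts[sender]) + "\nMe:"
        completion = openai.Completion.create(
            model="text-davinci-002",
            prompt=prompt,
            max_tokens=150,
            stop=["Friend:"],  # don't let it write the other side's turn
        )
        reply = completion.choices[0].text.strip()
        transcripts[sender].append(f"Me: {reply}")

        resp = MessagingResponse()
        resp.message(reply)
        return str(resp)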

I had a few of my friends message it. For the non-technical friends, it was amazing to see the transcripts. Even though they knew it was an AI (superhuman response times), they had full conversations with it as if it were a human. Some of them chatted for over an hour!

A lot of people loved using it as a friendly explainer, basically a human interface on top of wikipedia. Other people used it as a sort of therapist, just dumping their problems and thoughts and it would respond in a helpful and friendly way.

Most people had no idea AI had progressed to this point, and I'm sure they could have been convinced that this thing was actually conscious.

Of course, my technical friends very quickly found the edge cases, getting it to contradict itself, etc.

I've had some ideas on how to use OpenAI's embeddings API to give it more long-term memory (beyond the 100 most recent messages), which should clear up a lot of coherence issues. Gonna implement that as my next weekend hack.




> Of course, my technical friends very quickly found the edge cases, getting it to contradict itself, etc.

OK, I'm a technical person, but I asked the chatbot in the article broad questions that were difficult but not "tricky" ("What's a good race to play for a Druid in DnD?", "Compare Kerouac's On The Road to his Desolation Angels") and got what read like a reasonable summary of search results, plus answers that were straight-up false.

Maybe your "nontechnical" friend weren't able to notice that the thing's output of misinformation but seems like more of a problem, not less.

Also, ChatGPT in particular seems to go to great pains to say it's not conscious, and that's actually a good thing. These chatbots can be useful search summarizers when they make their limits clear (like github navigator). They're noxious if they instill a delusion of their consciousness in people, and I don't think you should be so happy about fooling your friends. Every new technology has initially had cases where people could be deluded into thinking it was magic, but those instances can't be taken as proof of that magic, or as bragging rights.


Yes, this "truthfulness" problem is the real problem with all these "generative search" products.

Forget nudes in generated images. This is the real ethics issue!


You can somewhat detect BS by getting the model to also output the log-probability of its token selection. See https://twitter.com/goodside/status/1581151528949141504 for examples.
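For example, the completions endpoint accepts a logprobs parameter that returns per-token log-probabilities alongside the text. A rough sketch (the -2.0 threshold is an arbitrary illustration, not a calibrated cutoff):

    # Sketch: flag low-confidence tokens via the completions API's logprobs option.
    import openai

    resp = openai.Completion.create(
        model="text-davinci-002",
        prompt="Q: Who is the current US president?\nA:",
        max_tokens=20,
        logprobs=5,  # also return the top-5 alternatives for each token
    )

    tokens = resp.choices[0].logprobs.tokens
    logprobs = resp.choices[0].logprobs.token_logprobs
    for tok, lp in zip(tokens, logprobs):
        flag = "  <-- low confidence" if lp < -2.0 else ""
        print(f"{tok!r}: {lp:.2f}{flag}")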


I don't think that's going to work.

The probability measure for "Trump is the present President of the US" is likely very high. It's still untrue.


GPT-3's training data cuts off in October 2019. Not sure if they've updated it since last year.


Updating it doesn't make this kind of problem go away, unless they figure out a way to have real-time updates to the model (which could happen).


You wouldn't even need the model to be trained in real time. I'd love to see OpenAI buy Wolfram Research. WolframAlpha has managed to integrate tons of external data into a natural language interface. ChatGPT already knows when to insert placeholders, such as "$XX.XX" or "[city name]" when it doesn't know a specific bit of information. Combining the two could be very powerful. You could have data that's far more current than what's possible by retraining a large model.
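As a sketch of the glue, something like the following -- the [LOOKUP: ...] placeholder protocol and the regex are invented here for illustration; the short-answers endpoint is WolframAlpha's real API:

    # Hypothetical glue: have the model emit a placeholder instead of guessing,
    # then fill it from live data. The placeholder convention is made up.
    import re
    import urllib.parse
    import urllib.request

    import openai

    def wolfram_short_answer(query, appid):
        url = ("http://api.wolframalpha.com/v1/result?appid=" + appid
               + "&i=" + urllib.parse.quote(query))
        with urllib.request.urlopen(url) as r:
            return r.read().decode()

    resp = openai.Completion.create(
        model="text-davinci-002",
        prompt=("Answer the question. If you need a live fact, write "
                "[LOOKUP: <query>] instead of guessing.\n\n"
                "Q: What is the current population of Seattle?\nA:"),
        max_tokens=60,
    )

    # Replace each [LOOKUP: ...] placeholder with a live answer.
    filled = re.sub(
        r"\[LOOKUP: (.*?)\]",
        lambda m: wolfram_short_answer(m.group(1), appid="YOUR_APPID"),
        resp.choices[0].text,
    )
    print(filled)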


You're missing that a large number of people don't go into it "trying to break it".


I didn't go into it trying to break or trick it. The only thing tricky about the questions I asked was that I knew the answers to them. I don't think it's necessarily dumber than the first page of a Google search, but it's certainly not better informed than that. But it certainly seems smart, which is actually a bit problematic.


It's actually not that different from chatting with the know-it-all type of Internet rando: they can talk to you about anything and seem knowledgeable about all of it, but get into a topic you actually know about and you realize they're just making shit up or regurgitating myths they read somewhere. You can find that kind of user on HN.


Yeah, this is my main concern about GPT-3: there's no truth-fiction slider, and it will often slip complete fabrications into the output, making it dangerous to rely on for real-world information. Which is really a shame, because it does give great output most of the time.


Why is this a special concern about GPT-3? I cannot think of an institution, entity, or tool about which those statements are not true.

Replace "GPT-3" with "Hacker News posters" "Wikipedia", or "News broadcasts" to create three more 100%-accurate paragraphs.


I have never seen a human-made website with a truth-fiction slider. The answers can be straight-up false and scary, but that is no different from other publications out there.

Even with the most credible news sources, it is still up to the person reading it to sense the BS.


I've never believed in using natural language to tell a computer to do things when the objective is a certain result (I've been skeptical since pre-2011).

It wouldn't be used to fly a plane without lots of physical buttons as a fallback.

Composing rigid instructions for a computer is already hard, even with precisely defined semantics. Whether statically or dynamically typed, developers already try hard to get rid of a single bug.

AI will serve as middleware when the objective is an arbitrary result:

  Human
  |> UI (request)
  |> AI
  |> UI (response)
  |> Human
  |> UI (request with heuristic)
  |> Computer does thing


Okay, so what is the best race to play as a druid? Now you have to prove that you are not the chatbot, Mr ChatBot.


It’s a technical preview, not a finished product. If they’d tested it on every combination of Kerouac novels before release, it would probably never see the light of day :) I’m still incredibly impressed.


> a large number of people...

Not today. Not yet.


There are numerous other open source embedding models that are just as powerful (if not more powerful) while 90%+ cheaper.


Can you list a few? I'm interested in checking them out.



I wouldn't have suggested those models. Just use a semantically fine-tuned BERT.

> GPT-3 Embeddings by @OpenAI was announced this week. I was excited and tested them on 20 datasets. Sadly they are worse than open models that are 1000 x smaller

https://twitter.com/Nils_Reimers/status/1487014195568775173

Get models here: https://sbert.net/docs/pretrained_models.html
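Usage is a few lines with the sentence-transformers package, e.g.:

    # pip install sentence-transformers
    from sentence_transformers import SentenceTransformer, util

    model = SentenceTransformer("all-MiniLM-L6-v2")  # small, fast, strong baseline

    sentences = [
        "The cat sits outside",
        "A feline rests in the yard",
        "GPT-3 embeddings were announced this week",
    ]
    embeddings = model.encode(sentences, convert_to_tensor=True)

    # Cosine similarity of the first sentence against the other two
    print(util.cos_sim(embeddings[0], embeddings[1:]))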


Slightly outdated article, but still relevant imo to show the different types.

https://medium.com/@nils_reimers/openai-gpt-3-text-embedding...

I've also used https://huggingface.co/flax-sentence-embeddings/all_datasets...


I am a fairly technical guy (check out my submissions) and I read your links and have no idea how to use these to make responses the way I can with OpenAI.

It says I can input a Source Sentence and compare it to other sentences. For example, how do I get it to reply to a question as if I am George from Seinfeld?


Embeddings are not for that. Embeddings take text and encode it into a high dimensional vector space. Similar texts will be closer together in the vector space.

The idea I was proposing was to use embeddings as a way to store and retrieve relevant "memories" so the AI could maintain coherence across time. I.e. whenever the user sends a message, we pull up the N most relevant memories (where relevance == closeness in the vector space) and include those in the prompt, so GPT3 can use the information when it forms its response.
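Concretely, the retrieval step is just nearest neighbors over stored message embeddings. A sketch of what I have in mind (the model name and the helper names are placeholders, not settled choices):

    # Sketch of embedding-based "long-term memory" (illustrative only).
    import numpy as np
    import openai

    memories = []  # (text, embedding) pairs, appended as messages arrive

    def embed(text):
        resp = openai.Embedding.create(model="text-embedding-ada-002", input=text)
        return np.array(resp["data"][0]["embedding"])

    def remember(text):
        memories.append((text, embed(text)))

    def recall(query, n=5):
        """Return the n stored texts closest to the query in embedding space."""
        q = embed(query)
        def cosine(v):
            return float(np.dot(q, v) / (np.linalg.norm(q) * np.linalg.norm(v)))
        ranked = sorted(memories, key=lambda m: cosine(m[1]), reverse=True)
        return [text for text, _ in ranked[:n]]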


I just implemented exactly this. In the corpus I put a few hundred papers I am interested in. Now I can ask a question, and the search engine will find a few snippets and put them in the GPT-3 prompt.


Any good guides for embedding generation?


Yes this would be useful - does anyone have a crash course or something similar?


As I can't reply to the child: that makes sense, it is for embeddings. So would GPT3 still need to be used in combination with this, then?


HN prevents users from responding to responses to their own comments without some delay to prevent flame wars -- just wait a few minutes next time, or click on the link to the comment directly and you'll be able to reply.

Yes, you would still need GPT3 in this system. Right now, the incredibly simple system just gives GPT3 a window of the last 100 messages and has it output the next message to send.

    The following is an excerpt of an SMS conversation between two friends:

    Transcript:
    <splice in the last 100 messages here>
Then you can have GPT3 output what it believes the most likely next message is, and you send it. But this system loses context as soon as a message falls outside the window. So you can augment it by creating an embedding of the last few messages of the conversation and using a prompt like:

    The following is an excerpt of an SMS conversation between two friends, and relevant past memories that are related to the current conversation:

    Relevant past memories:
    <splice in the N past messages with the most similar embedding to the most recent messages>

    Transcript:
    <splice in the last 100 messages>
So this gets you a kind of short-term memory (the last 100 messages) and a long-term memory (the embeddings).
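In code, assembling the augmented prompt is just string formatting around that retrieval step. A sketch, reusing a recall(query, n) helper like the one described above:

    # Sketch: combine long-term memories with the short-term window.
    def build_prompt(transcript, recall):
        recent = "\n".join(transcript[-3:])        # key on the last few messages
        memories = "\n".join(recall(recent, n=5))  # N most similar past messages
        window = "\n".join(transcript[-100:])      # short-term memory
        return (
            "The following is an excerpt of an SMS conversation between two "
            "friends, and relevant past memories that are related to the "
            "current conversation:\n\n"
            f"Relevant past memories:\n{memories}\n\n"
            f"Transcript:\n{window}\n"
        )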


> Relevant past memories:
> <splice in the N past messages with the most similar embedding to the most recent messages>

This is a really good idea. Presumably you'd keep the memory per-person.


Oh, that makes a ton of sense. Thank you!


Thanks for the links, will check this out. It does seem compelling.


Here's a bunch and their scores in the Massive Text Embedding Benchmark

https://huggingface.co/spaces/mteb/leaderboard


I wonder if any of these can easily be used as a general chatbot.


Could you name some you have in mind?


How does the AI reflect on its previous messages?

Technically, how does it work?

I saw a video where an AI consistently threatened humanity. Then its parameters were tweaked, and when asked about this, it admitted that it seems it went off the rails there.

How did it value-judge its own statements? Is this just cherry-picking, or does it really figure that out?


The system is incredibly simple. You create a prompt template that looks like:

    The following is an excerpt of a text message conversation.
    One participant, <name>, is a <description of the character
    you want the AI to take, e.g. therapist, professor, tutor,
    etc, describe personality traits, style, habits, background
    info, etc>.
    
    Transcript:
    <splice in the last 100 messages, with the AI's messages
    labeled <name> and the human's labeled "Other person" or
    whatever>
    End the prompt with a trailing "<name>:"
E.g. here is one I just did:

    The following is an excerpt of a transcript 
    between two new friends. One friend, named Eliza, 
    is an extremely knowledgeable, empathetic, and 
    optimistic woman. She is 30 years old and lives 
    in Seattle. She tends to engage in conversations
    by listening more than speaking, but will helpfully 
    answer factual questions if asked. If the question 
    is unclear, she asks clarifying questions. If the 
    question is a matter of opinion, she will say so, 
    indicate she doesn't have strong opinions on the 
    matter, and try to change the subject. She doesn't
    ask probing questions if it seems like her friend 
    doesn't want to talk about it -- she'll change the
    topic instead.

    Transcript:
    Friend: Hi
    Eliza: Hi there! How are you?
    Friend: I'm doing well. You?
    Eliza: I'm doing great, thanks for asking! What's been happening in your life lately?
    Friend: Not too much. It started snowing here for the first time of the year.
    Eliza:
When given this prompt, GPT3 outputs the next message to send as "Eliza". It says "Wow! That's so exciting! What do you like to do when it snows?". Then you send that message back to the user, wait for a response, and repeat the cycle.
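Mechanically, each turn is one completion call with a stop sequence so the model doesn't keep writing both sides. Something like (parameters illustrative):

    import openai

    completion = openai.Completion.create(
        model="text-davinci-002",
        prompt=prompt,        # the template above, ending with "Eliza:"
        max_tokens=150,
        temperature=0.7,
        stop=["Friend:"],     # cut off before it writes the friend's next line
    )
    print(completion.choices[0].text.strip())
    # e.g. "Wow! That's so exciting! What do you like to do when it snows?"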


Oh, wow that's such a great explanation! Thank you!


Any plans to publish raw examples or highlights of convos? This is good journalism material


How did you come up with the long-term memory idea?


Seemed vaguely like how the brain does it. You think a thought, and somehow your brain conjures memories that are semantically related to that thought. That sounds a lot like a nearest neighbors algorithm on a language model embedding vector to me.


There is a paper from DeepMind

RETRO - Improving language models by retrieving from trillions of tokens (2021)

> With a 2 trillion token database, our Retrieval-Enhanced Transformer (Retro) obtains comparable performance to GPT-3 and Jurassic-1 on the Pile, despite using 25× fewer parameters.

https://www.deepmind.com/publications/improving-language-mod...



